linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	xtensa: add MX irqchip	Max Filippov	2014-01-14	3	-0/+169
\| \| \| \| \| \| \| \|	MX is an interrupt distributor used in some SMP-capable xtensa configurations. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>
*	xtensa: move built-in PIC to drivers/irqchip	Max Filippov	2014-01-14	2	-0/+109
\| \| \| \| \| \| \| \| \|	Extract xtensa built-in interrupt controller implementation from xtensa/kernel/irq.c and move it to other irqchips, providing way to instantiate it from the device tree. Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Chris Zankel <chris@zankel.net>
*	Merge branch 'x86-urgent-for-linus' of ↵	Linus Torvalds	2013-12-29	6	-6/+8
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "There is a small EFI fix and a big power regression fix in this batch. My queue also had a fix for downing a CPU when there are insufficient number of IRQ vectors available, but I'm holding that one for now due to recent bug reports" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Don't select EFI from certain special ACPI drivers x86 idle: Repair large-server 50-watt idle-power regression
\| *	x86/efi: Don't select EFI from certain special ACPI drivers	Jan Beulich	2013-12-19	5	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 7ea6c6c1 ("Move cper.c from drivers/acpi/apei to drivers/firmware/efi") results in CONFIG_EFI being enabled even when the user doesn't want this. Since ACPI APEI used to build fine without UEFI (and as far as I know also has no functional depency on it), at least in that case using a reverse dependency is wrong (and a straight one isn't needed). Whether the same is true for ACPI_EXTLOG I don't know - if there is a functional dependency, it should depend on EFI rather than selecting it. It certainly has (currently) no build dependency. Adjust Kconfig and build logic so that the bad dependency gets avoided. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/52AF1EBC020000780010DBF9@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
\| *	x86 idle: Repair large-server 50-watt idle-power regression	Len Brown	2013-12-19	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Linux 3.10 changed the timing of how thread_info->flags is touched: x86: Use generic idle loop (7d1a941731fabf27e5fb6edbebb79fe856edb4e5) This caused Intel NHM-EX and WSM-EX servers to experience a large number of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote from deep C-states to shallow C-states, which caused these platforms to experience a significant increase in idle power. Note that this issue was already present before the commit above, however, it wasn't seen often enough to be noticed in power measurements. Here we extend an errata workaround from the Core2 EX "Dunnington" to extend to NHM-EX and WSM-EX, to prevent these immediate returns from MWAIT, reducing idle power on these platforms. While only acpi_idle ran on Dunnington, intel_idle may also run on these two newer systems. As of today, there are no other models that are known to need this tweak. Link: http://lkml.kernel.org/r/CAJvTdK=%2BaNN66mYpCGgbHGCHhYQAKx-vB0kJSWjVpsNb_hOAtQ@mail.gmail.com Signed-off-by: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/baff264285f6e585df757d58b17788feabc68918.1387403066.git.len.brown@intel.com Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* \|	Merge tag 'pm+acpi-3.13-rc6' of ↵	Linus Torvalds	2013-12-29	4	-34/+50
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes and new device IDs from Rafael Wysocki: - Fix for a cpufreq regression causing stale sysfs files to be left behind during system resume if cpufreq_add_dev() fails for one or more CPUs from Viresh Kumar. - Fix for a bug in cpufreq causing CONFIG_CPU_FREQ_DEFAULT_* to be ignored when the intel_pstate driver is used from Jason Baron. - System suspend fix for a memory leak in pm_vt_switch_unregister() that forgot to release objects after removing them from pm_vt_switch_list. From Masami Ichikawa. - Intel Valley View device ID and energy unit encoding update for the (recently added) Intel RAPL (Running Average Power Limit) driver from Jacob Pan. - Intel Bay Trail SoC GPIO and ACPI device IDs for the Low Power Subsystem (LPSS) ACPI driver from Paul Drews. * tag 'pm+acpi-3.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: powercap / RAPL: add support for ValleyView Soc PM / sleep: Fix memory leak in pm_vt_switch_unregister(). cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy drivers cpufreq: remove sysfs files for CPUs which failed to come back after resume ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs
\| \| \
\| \| \
\| *-. \	Merge branches 'powercap' and 'acpi-lpss' with new device IDs	Rafael J. Wysocki	2013-12-27	3	-2/+13
\| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* powercap: powercap / RAPL: add support for ValleyView Soc * acpi-lpss: ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs
\| \| \| * \|	ACPI: Add BayTrail SoC GPIO and LPSS ACPI IDs	Paul Drews	2013-11-30	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the new ACPI ID (INT33FC) for the BayTrail GPIO banks as seen on a BayTrail M System-On-Chip platform. This ACPI ID is used by the BayTrail GPIO (pinctrl) driver to manage the Low Power Subsystem (LPSS). Signed-off-by: Paul Drews <paul.drews@intel.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| * \| \|	powercap / RAPL: add support for ValleyView Soc	Jacob Pan	2013-12-22	1	-2/+11
\| \| \| \|/ \| \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for RAPL on Intel ValleyView based SoC platforms, such as Baytrail. Besides adding CPU ID, special energy unit encoding is handled for ValleyView. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| \| \|
\| \| \ \
\| *-. \| \|	Merge branches 'pm-cpufreq' and 'pm-sleep' containing PM fixes	Rafael J. Wysocki	2013-12-27	1	-32/+37
\| \|\ \\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* pm-cpufreq: cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy drivers cpufreq: remove sysfs files for CPUs which failed to come back after resume * pm-sleep: PM / sleep: Fix memory leak in pm_vt_switch_unregister().
\| \| * \| \|	cpufreq: Use CONFIG_CPU_FREQ_DEFAULT_* to set initial policy for setpolicy ↵	Jason Baron	2013-12-22	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	drivers When configuring a default governor (via CONFIG_CPU_FREQ_DEFAULT_*) with the intel_pstate driver, the desired default policy is not properly set. For example, setting 'CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE' ends up with the 'powersave' policy being set. Fix by configuring the correct default policy, if either 'powersave' or 'performance' are requested. Otherwise, fallback to what the driver originally set via its 'init' routine. Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
\| \| * \| \|	cpufreq: remove sysfs files for CPUs which failed to come back after resume	Viresh Kumar	2013-12-22	1	-32/+31
\| \| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are cases where cpufreq_add_dev() may fail for some CPUs during system resume. With the current code we will still have sysfs cpufreq files for those CPUs and struct cpufreq_policy would be already freed for them. Hence any operation on those sysfs files would result in kernel warnings. Example of problems resulting from resume errors (from Bjørn Mork): WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212() missing sysfs attribute operations for kobject: (null) Modules linked in: [stripped as irrelevant] CPU: 0 PID: 6055 Comm: grep Tainted: G D 3.13.0-rc2 #153 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006 ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000 ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310 Call Trace: [<ffffffff81380b0e>] dump_stack+0x55/0x76 [<ffffffff81038635>] warn_slowpath_common+0x7c/0x96 [<ffffffff811823c7>] ? sysfs_open_file+0x77/0x212 [<ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43 [<ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82 [<ffffffff81182382>] ? sysfs_open_file+0x32/0x212 [<ffffffff811823c7>] sysfs_open_file+0x77/0x212 [<ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac [<ffffffff81122562>] do_dentry_open+0x17c/0x257 [<ffffffff8112267e>] finish_open+0x41/0x4f [<ffffffff81130225>] do_last+0x80c/0x9ba [<ffffffff8112dbbd>] ? inode_permission+0x40/0x42 [<ffffffff81130606>] path_openat+0x233/0x4a1 [<ffffffff81130b7e>] do_filp_open+0x35/0x85 [<ffffffff8113b787>] ? __alloc_fd+0x172/0x184 [<ffffffff811232ea>] do_sys_open+0x6b/0xfa [<ffffffff811233a7>] SyS_openat+0xf/0x11 [<ffffffff8138c812>] system_call_fastpath+0x16/0x1b To fix this, remove those sysfs files or put the associated kobject in case of such errors. Also, to make it simple, remove the cpufreq sysfs links from all the CPUs (except for the policy->cpu) during suspend, as that operation won't result in a loss of sysfs file permissions and we can create those links during resume just fine. Fixes: 5302c3fb2e62 ("cpufreq: Perform light-weight init/teardown during suspend/resume") Reported-and-tested-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 3.12+ <stable@vger.kernel.org> # 3.12+ [rjw: Changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* \| \| \|	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	Linus Torvalds	2013-12-24	11	-94/+189
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull block fixes from Jens Axboe: - fix for a memory leak on certain unplug events - a collection of bcache fixes from Kent and Nicolas - a few null_blk fixes and updates form Matias - a marking of static of functions in the stec pci-e driver * 'for-linus' of git://git.kernel.dk/linux-block: null_blk: support submit_queues on use_per_node_hctx null_blk: set use_per_node_hctx param to false null_blk: corrections to documentation null_blk: warning on ignored submit_queues param null_blk: refactor init and init errors code paths null_blk: documentation null_blk: mem garbage on NUMA systems during init drivers: block: Mark the functions as static in skd_main.c bcache: New writeback PD controller bcache: bugfix for race between moving_gc and bucket_invalidate bcache: fix for gc and writeback race bcache: bugfix - moving_gc now moves only correct buckets bcache: fix for gc crashing when no sectors are used bcache: Fix heap_peek() macro bcache: Fix for can_attach_cache() bcache: Fix dirty_data accounting bcache: Use uninterruptible sleep in writeback bcache: kthread don't set writeback task to INTERUPTIBLE block: fix memory leaks on unplugging block device bcache: fix sparse non static symbol warning
\| * \| \| \|	null_blk: support submit_queues on use_per_node_hctx	Matias Bjørling	2013-12-21	1	-4/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the case of both the submit_queues param and use_per_node_hctx param are used. We limit the number af submit_queues to the number of online nodes. If the submit_queues is a multiple of nr_online_nodes, its trivial. Simply map them to the nodes. For example: 8 submit queues are mapped as node0[0,1], node1[2,3], ... If uneven, we are left with an uneven number of submit_queues that must be mapped. These are mapped toward the first node and onward. E.g. 5 submit queues mapped onto 4 nodes are mapped as node0[0,1], node1[2], ... Signed-off-by: Matias Bjorling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| * \| \| \|	null_blk: set use_per_node_hctx param to false	Matias Bjørling	2013-12-21	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The defaults for the module is to instantiate itself with blk-mq and a submit queue for each CPU node in the system. To save resources, initialize instead with a single submit queue. Signed-off-by: Matias Bjorling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| * \| \| \|	null_blk: warning on ignored submit_queues param	Matias Bjorling	2013-12-19	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Let the user know when the number of submission queues are being ignored. Signed-off-by: Matias Bjorling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| * \| \| \|	null_blk: refactor init and init errors code paths	Matias Bjorling	2013-12-19	1	-25/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simplify the initialization logic of the three block-layers. - The queue initialization is split into two parts. This allows reuse of code when initializing the sq-, bio- and mq-based layers. - Set submit_queues default value to 0 and always set it at init time. - Simplify the init error code paths. Signed-off-by: Matias Bjorling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| * \| \| \|	null_blk: mem garbage on NUMA systems during init	Matias Bjorling	2013-12-19	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For NUMA systems, initializing the blk-mq layer and using per node hctx. We initialize submit queues to 1, while blk-mq nr_hw_queues is initialized to the number of NUMA nodes. This makes the null_init_hctx function overwrite memory outside of what it allocated. In my case it lead to writing garbage into struct request_queue's mq_map. Signed-off-by: Matias Bjorling <m@bjorling.me> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \| \| \|	drivers: block: Mark the functions as static in skd_main.c	Rashika Kheria	2013-12-19	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Mark functions skd_skmsg_state_to_str() and skd_skreq_state_to_str() as static in skd_main.c because they are not used outside this file. This eliminates the following warnings in skd_main.c: drivers/block/skd_main.c:5272:13: warning: no previous prototype for ‘skd_skmsg_state_to_str’ [-Wmissing-prototypes] drivers/block/skd_main.c:5284:13: warning: no previous prototype for ‘skd_skreq_state_to_str’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
\| * \| \| \|	Merge branch 'bcache-for-3.13' of git://evilpiepirate.org/~kent/linux-bcache ↵	Jens Axboe	2013-12-17	9	-66/+111
\| \|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	into for-linus Kent writes: Jens - small pile of bcache fixes. I've been slacking on the writeback fixes but those definitely need to get into 3.13.
\| \| * \| \| \|	bcache: New writeback PD controller	Kent Overstreet	2013-12-16	4	-49/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The old writeback PD controller could get into states where it had throttled all the way down and take way too long to recover - it was too complicated to really understand what it was doing. This rewrites a good chunk of it to hopefully be simpler and make more sense, and it also pays more attention to units which should make the behaviour a bit easier to understand. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| \| * \| \| \|	bcache: bugfix for race between moving_gc and bucket_invalidate	Kent Overstreet	2013-12-16	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a possibility for a bucket to be invalidated by the allocator while moving_gc was copying it's contents to another bucket, if the bucket only held cached data. To prevent this moving checks for a stale ptr (to an invalidated bucket), before and after reads. It it finds one, it simply ignores moving that data. This only affects bcache if the moving_gc was turned on, note that it's off by default. Signed-off-by: Nicholas Swenson <nks@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| \| * \| \| \|	bcache: fix for gc and writeback race	Nicholas Swenson	2013-12-16	1	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Garbage collector needs to check keys in the writeback keybuf to make sure it's not invalidating buckets to which the writeback keys point to. Signed-off-by: Nicholas Swenson <nks@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| \| * \| \| \|	bcache: bugfix - moving_gc now moves only correct buckets	Nicholas Swenson	2013-12-16	3	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removed gc_move_threshold because picking buckets only by threshold could lead moving extra buckets (ei. if there are buckets at the threshold that aren't supposed to be moved do to space considerations). This is replaced by a GC_MOVE bit in the gc_mark bitmask. Now only marked buckets get moved. Signed-off-by: Nicholas Swenson <nks@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| \| * \| \| \|	bcache: fix for gc crashing when no sectors are used	Nicholas Swenson	2013-12-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Nicholas Swenson <nks@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| \| * \| \| \|	bcache: Fix heap_peek() macro	Nicholas Swenson	2013-12-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Nicholas Swenson <nks@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| \| * \| \| \|	bcache: Fix for can_attach_cache()	Nicholas Swenson	2013-12-16	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Nicholas Swenson <nks@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| \| * \| \| \|	bcache: Fix dirty_data accounting	Kent Overstreet	2013-12-16	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dirty data accounting wasn't quite right - firstly, we were adding the key we're inserting after it could have merged with another dirty key already in the btree, and secondly we could sometimes pass the wrong offset to bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is important when tracking dirty data by stripe. NOTE FOR BACKPORTERS: For 3.10 (and 3.11?) there's other accounting fixes necessary that got squashed in with other patches; the full patch against 3.10 is 408cc2f47eeac93a, available at: git://evilpiepirate.org/~kent/linux-bcache.git bcache-3.10-writeback-fixes Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10 diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 2a46036..4a12b2f 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -1817,7 +1817,8 @@ static bool fix_overlapping_extents(struct btree b, struct bkey insert, if (KEY_START(k) > KEY_START(insert) + sectors_found) goto check_failed; - if (KEY_PTRS(replace_key) != KEY_PTRS(k)) + if (KEY_PTRS(k) != KEY_PTRS(replace_key) \|\| + KEY_DIRTY(k) != KEY_DIRTY(replace_key)) goto check_failed; /* skip past gen */
\| \| * \| \| \|	bcache: Use uninterruptible sleep in writeback	Kent Overstreet	2013-12-16	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're just waiting on kthread_should_stop(), nothing else, so interruptible sleep was wrong here. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| \| * \| \| \|	bcache: kthread don't set writeback task to INTERUPTIBLE	Stefan Priebe	2013-12-16	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	at the beginning (schedule_timout_interuptible) and others do his on their own This prevents wrong load average calculation (load of 1 per thread) Signed-off-by: Kent Overstreet <kmo@daterainc.com>
\| \| * \| \| \|	bcache: fix sparse non static symbol warning	Wei Yongjun	2013-11-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes the following sparse warning: drivers/md/bcache/btree.c:2220:5: warning: symbol 'btree_insert_fn' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* \| \| \| \| \|	Merge branch 'for-3.13-fixes' of ↵	Linus Torvalds	2013-12-24	4	-12/+49
\|\ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "There's one interseting commit - "libata, freezer: avoid block device removal while system is frozen". It's an ugly hack working around a deadlock condition between driver core resume and block layer device removal paths through freezer which was made more reproducible by writeback being converted to workqueue some releases ago. The bug has nothing to do with libata but it's just an workaround which is easy to backport. After discussion, Rafael and I seem to agree that we don't really need kernel freezables - both kthread and workqueue. There are few specific workqueues which constitute PM operations and require freezing, which will be converted to use workqueue_set_max_active() instead. All other kernel freezer uses are planned to be removed, followed by the removal of kthread and workqueue freezer support, hopefully. Others are device-specific fixes. The most notable is the addition of NO_NCQ_TRIM which is used to disable queued TRIM commands to Micro M500 SSDs which otherwise suffers data corruption" * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: libata, freezer: avoid block device removal while system is frozen libata: implement ATA_HORKAGE_NO_NCQ_TRIM and apply it to Micro M500 SSDs libata: disable a disk via libata.force params ahci: bail out on ICH6 before using AHCI BAR ahci: imx: Explicitly clear IMX6Q_GPR13_SATA_MPLL_CLK_EN libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for Seagate Momentus SpinPoint M8
\| * \| \| \| \| \|	libata, freezer: avoid block device removal while system is frozen	Tejun Heo	2013-12-19	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Freezable kthreads and workqueues are fundamentally problematic in that they effectively introduce a big kernel lock widely used in the kernel and have already been the culprit of several deadlock scenarios. This is the latest occurrence. During resume, libata rescans all the ports and revalidates all pre-existing devices. If it determines that a device has gone missing, the device is removed from the system which involves invalidating block device and flushing bdi while holding driver core layer locks. Unfortunately, this can race with the rest of device resume. Because freezable kthreads and workqueues are thawed after device resume is complete and block device removal depends on freezable workqueues and kthreads (e.g. bdi_wq, jbd2) to make progress, this can lead to deadlock - block device removal can't proceed because kthreads are frozen and kthreads can't be thawed because device resume is blocked behind block device removal. 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue") made this particular deadlock scenario more visible but the underlying problem has always been there - the original forker task and jbd2 are freezable too. In fact, this is highly likely just one of many possible deadlock scenarios given that freezer behaves as a big kernel lock and we don't have any debug mechanism around it. I believe the right thing to do is getting rid of freezable kthreads and workqueues. This is something fundamentally broken. For now, implement a funny workaround in libata - just avoid doing block device hot[un]plug while the system is frozen. Kernel engineering at its finest. :( v2: Add EXPORT_SYMBOL_GPL(pm_freezing) for cases where libata is built as a module. v3: Comment updated and polling interval changed to 10ms as suggested by Rafael. v4: Add #ifdef CONFIG_FREEZER around the hack as pm_freezing is not defined when FREEZER is not configured thus breaking build. Reported by kbuild test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Tomaž Šolc <tomaz.solc@tablix.org> Reviewed-by: "Rafael J. Wysocki" <rjw@rjwysocki.net> Link: https://bugzilla.kernel.org/show_bug.cgi?id=62801 Link: http://lkml.kernel.org/r/20131213174932.GA27070@htj.dyndns.org Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Len Brown <len.brown@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Cc: kbuild test robot <fengguang.wu@intel.com>
\| * \| \| \| \| \|	libata: implement ATA_HORKAGE_NO_NCQ_TRIM and apply it to Micro M500 SSDs	Marc Carino	2013-12-17	1	-2/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Certain drives cannot handle queued TRIM commands properly, even though support is indicated in the IDENTIFY DEVICE buffer. This patch allows for disabling the commands for the affected drives and apply it to the Micron/Crucial M500 SSDs which exhibit incorrect protocol behavior when issued queued TRIM commands, which could lead to silent data corruption. tj: Merged two unnecessarily split patches and made minor edits including shortening horkage name. Signed-off-by: Marc Carino <marc.ceeeee@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/g/1387246554-7311-1-git-send-email-marc.ceeeee@gmail.com Cc: stable@vger.kernel.org # 3.12+
\| * \| \| \| \| \|	libata: disable a disk via libata.force params	Robin H. Johnson	2013-12-16	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A user on StackExchange had a failing SSD that's soldered directly onto the motherboard of his system. The BIOS does not give any option to disable it at all, so he can't just hide it from the OS via the BIOS. The old IDE layer had hdX=noprobe override for situations like this, but that was never ported to the libata layer. This patch implements a disable flag for libata.force. Example use: libata.force=2.0:disable [v2 of the patch, removed the nodisable flag per Tejun Heo] Signed-off-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Link: http://unix.stackexchange.com/questions/102648/how-to-tell-linux-kernel-3-0-to-completely-ignore-a-failing-disk Link: http://askubuntu.com/questions/352836/how-can-i-tell-linux-kernel-to-completely-ignore-a-disk-as-if-it-was-not-even-co Link: http://superuser.com/questions/599333/how-to-disable-kernel-probing-for-drive
\| * \| \| \| \| \|	ahci: bail out on ICH6 before using AHCI BAR	Paul Bolle	2013-12-16	1	-9/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The check for "combined mode" (which disables ahci support) on ICH6 is done after the first use of AHCI BAR. But if ahci is not enabled AHCI BAR is initialized to 0x00000000. (At least it is on the ICH6-M I tested this on. If I understand the datasheet correctly it should also be on ICH6R.) This apparently makes the call of pcim_iomap_regions_request_all() return -EINVAL. And we end up with ahci: probe of 0000:00:1f.2 failed with error -22 (at warning level) in the logs. So check for "combined mode" before calling pcim_iomap_regions_request_all(). Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Tejun Heo <tj@kernel.org>
\| * \| \| \| \| \|	ahci: imx: Explicitly clear IMX6Q_GPR13_SATA_MPLL_CLK_EN	Marek Vasut	2013-12-03	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We must clear this IMX6Q_GPR13_SATA_MPLL_CLK_EN bit on i.MX6Q, otherwise Linux will fail to find the attached drive on some boards. This entire fix was: Reported-by: Eric Nelson <eric.nelson@boundarydevices.com> Signed-off-by: Marek Vasut <marex@denx.de> Reviewed-by: Shawn Guo <shawn.guo@linaro.org> Cc: Richard Zhu <r65037@freescale.com> Cc: Linux-IDE <linux-ide@vger.kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
\| * \| \| \| \| \|	libata: add ATA_HORKAGE_BROKEN_FPDMA_AA quirk for Seagate Momentus SpinPoint M8	Michele Baldessari	2013-11-29	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've received multiple reports in Fedora via (BZ 907193) that the Seagate Momentus SpinPoint M8 errors out when enabling AA: [ 2.555905] ata2.00: failed to enable AA (error_mask=0x1) [ 2.568482] ata2.00: failed to enable AA (error_mask=0x1) Add the ATA_HORKAGE_BROKEN_FPDMA_AA for this specific harddisk. Reported-by: Nicholas <arealityfarbetween@googlemail.com> Signed-off-by: Michele Baldessari <michele@acksyn.org> Tested-by: Nicholas <arealityfarbetween@googlemail.com> Acked-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
* \| \| \| \| \| \|	Merge tag 'rdma-for-linus' of ↵	Linus Torvalds	2013-12-24	5	-15/+52
\|\ \ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband fixes from Roland Dreier: "Last batch of InfiniBand/RDMA changes for 3.13 / 2014: - Additional checks for uverbs to ensure forward compatibility, handle malformed input better. - Fix potential use-after-free in iWARP connection manager. - Make a function static" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/uverbs: Check access to userspace response buffer in extended command IB/uverbs: Check input length in flow steering uverbs IB/uverbs: Set error code when fail to consume all flow_spec items IB/uverbs: Check reserved fields in create_flow IB/uverbs: Check comp_mask in destroy_flow IB/uverbs: Check reserved field in extended command header IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA() IB/core: const'ify inbuf in struct ib_udata RDMA/iwcm: Don't touch cm_id after deref in rem_ref RDMA/cxgb4: Make _c4iw_write_mem_dma() static
\| \| \ \ \ \ \ \
\| \| \ \ \ \ \ \
\| *-. \ \ \ \ \ \	Merge branches 'cxgb4', 'flowsteer' and 'misc' into for-linus	Roland Dreier	2013-12-23	4	-14/+51
\| \|\ \ \ \ \ \ \ \
\| \| \| * \| \| \| \| \| \|	RDMA/iwcm: Don't touch cm_id after deref in rem_ref	Steve Wise	2013-12-16	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rem_ref() calls iwcm_deref_id(), which will wake up any blockers on cm_id_priv->destroy_comp if the refcnt hits 0. That will unblock someone in iw_destroy_cm_id() which will free the cmid. If that happens before rem_ref() calls test_bit(IWCM_F_CALLBACK_DESTROY, &cm_id_priv->flags), then the test_bit() will touch freed memory. The fix is to read the bit first, then deref. We should never be in iw_destroy_cm_id() with IWCM_F_CALLBACK_DESTROY set, and there is a BUG_ON() to make sure of that. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| * \| \| \| \| \| \| \|	IB/uverbs: Check access to userspace response buffer in extended command	Yann Droneaud	2013-12-20	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a check on the output buffer with access_ok(VERIFY_WRITE, ...) to ensure the whole buffer is in userspace memory before using the pointer in uverbs functions. If the buffer or a subset of it is not valid, returns -EFAULT to the caller. This will also catch invalid buffer before the final call to copy_to_user() which happen late in most uverb functions. Just like the check in read(2) syscall, it's a sanity check to detect invalid parameters provided by userspace. This particular check was added in vfs_read() by Linus Torvalds for v2.6.12 with following commit message: https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/?id=fd770e66c9a65b14ce114e171266cf6f393df502 Make read/write always do the full "access_ok()" tests. The actual user copy will do them too, but only for the range that ends up being actually copied. That hides bugs when the range has been clamped by file size or other issues. Note: there's no need to check input buffer since vfs_write() already does access_ok(VERIFY_READ, ...) as part of write() syscall. Link: http://marc.info/?i=cover.1387273677.git.ydroneaud@opteya.com Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| * \| \| \| \| \| \| \|	IB/uverbs: Check input length in flow steering uverbs	Yann Droneaud	2013-12-20	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since ib_copy_from_udata() doesn't check yet the available input data length before accessing userspace memory, an explicit check of this length is required to prevent: - reading past the user provided buffer, - underflow when subtracting the expected command size from the input length. This will ensure the newly added flow steering uverbs don't try to process truncated commands. Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| * \| \| \| \| \| \| \|	IB/uverbs: Set error code when fail to consume all flow_spec items	Yann Droneaud	2013-12-20	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the flow_spec items parsed count does not match the number of items declared in the flow_attr command, or if not all bytes are used for flow_spec items (eg. trailing garbage), a log message is reported and the function leave through the error path. Unfortunately the error code is currently not set. This patch set error code to -EINVAL in such cases, so that the error is reported to userspace instead of silently fail. Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| * \| \| \| \| \| \| \|	IB/uverbs: Check reserved fields in create_flow	Yann Droneaud	2013-12-20	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As noted by Daniel Vetter in its article "Botching up ioctls"[1] "Check all unused fields and flags and all the padding for whether it's 0, and reject the ioctl if that's not the case. Otherwise your nice plan for future extensions is going right down the gutters since someone will submit an ioctl struct with random stack garbage in the yet unused parts. Which then bakes in the ABI that those fields can never be used for anything else but garbage." It's important to ensure that reserved fields are set to known value, so that it will be possible to use them latter to extend the ABI. The same reasonning apply to comp_mask field present in newer uverbs command: per commit 22878dbc9173 ("IB/core: Better checking of userspace values for receive flow steering"), unsupported values in comp_mask are rejected. [1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| * \| \| \| \| \| \| \|	IB/uverbs: Check comp_mask in destroy_flow	Yann Droneaud	2013-12-20	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Just like the check added to create_flow in 22878dbc9173 ("IB/core: Better checking of userspace values for receive flow steering"), comp_mask must be checked in destroy_flow too. Since only empty comp_mask is currently supported, any other value must be rejected. This check was silently added in a previous patch[1] to move comp_mask in extended command header, part of previous patchset[2] against create/destroy_flow uverbs. The idea of moving comp_mask to the header was discarded for the final patchset[3]. Unfortunately the check added in destroy_flow uverb was not integrated in the final patchset. [1] http://marc.info/?i=40175eda10d670d098204da6aa4c327a0171ae5f.1381510045.git.ydroneaud@opteya.com [2] http://marc.info/?i=cover.1381510045.git.ydroneaud@opteya.com [3] http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com Cc: Matan Barak <matanb@mellanox.com> Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| * \| \| \| \| \| \| \|	IB/uverbs: Check reserved field in extended command header	Yann Droneaud	2013-12-20	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As noted by Daniel Vetter in its article "Botching up ioctls"[1] "Check all unused fields and flags and all the padding for whether it's 0, and reject the ioctl if that's not the case. Otherwise your nice plan for future extensions is going right down the gutters since someone will submit an ioctl struct with random stack garbage in the yet unused parts. Which then bakes in the ABI that those fields can never be used for anything else but garbage." It's important to ensure that reserved fields are set to known value, so that it will be possible to use them latter to extend the ABI. The same reasonning apply to comp_mask field present in newer uverbs command: per commit 22878dbc9173 ("IB/core: Better checking of userspace values for receive flow steering"), unsupported values in comp_mask are rejected. [1] http://blog.ffwll.ch/2013/11/botching-up-ioctls.html Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| * \| \| \| \| \| \| \|	IB/uverbs: New macro to set pointers to NULL if length is 0 in INIT_UDATA()	Roland Dreier	2013-12-20	2	-11/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Trying to have a ternary operator to choose between NULL (or 0) and the real pointer value in invocations leads to an impossible choice between a sparse error about a literal 0 used as a NULL pointer, and a gcc warning about "pointer/integer type mismatch in conditional expression." Rather than clutter the source with more casts, move the ternary operator into a new INIT_UDATA_BUF_OR_NULL() macro, which makes it easier to use and simplifies its callers. Reported-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| \| * \| \| \| \| \| \| \|	IB/core: const'ify inbuf in struct ib_udata	Yann Droneaud	2013-12-16	1	-1/+1
\| \| \|/ / / / / / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Userspace input buffer is not modified by kernel, so it can be 'const'. This is also a prerequisite to remove the implicit cast from INIT_UDATA(). Link: http://marc.info/?i=cover.1386798254.git.ydroneaud@opteya.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
\| * / / / / / / /	RDMA/cxgb4: Make _c4iw_write_mem_dma() static	Rashika	2013-12-15	1	-1/+1
\| \|/ / / / / / / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch marks the function _c4iw_write_mem_dma() as static because it is not used outside this file, which fixes the warning: drivers/infiniband/hw/cxgb4/mem.c:176:5: warning: no previous prototype for ‘_c4iw_write_mem_dma’ [-Wmissing-prototypes] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Roland Dreier <roland@purestorage.com>