summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'modules-next-for-linus' of ↵Linus Torvalds2014-04-0637-278/+331
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "Nothing major: the stricter permissions checking for sysfs broke a staging driver; fix included. Greg KH said he'd take the patch but hadn't as the merge window opened, so it's included here to avoid breaking build" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: staging: fix up speakup kobject mode Use 'E' instead of 'X' for unsigned module taint flag. VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms. kallsyms: fix percpu vars on x86-64 with relocation. kallsyms: generalize address range checking module: LLVMLinux: Remove unused function warning from __param_check macro Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE module: remove MODULE_GENERIC_TABLE module: allow multiple calls to MODULE_DEVICE_TABLE() per module module: use pr_cont
| * staging: fix up speakup kobject modeRusty Russell2014-04-0117-212/+210
| | | | | | | | | | | | | | | | | | | | | | | | | | It uses the unnecessary S_IFREG bit which broke when my stricter-checking-for-mode patch went in. Since we're fixing it anyway, the extra level of indirection is confusing for readers (ROOT_W == rw-r--r-- for example). Also, many of these are other-writable. Is that really intended? I'll-queue-this-patch-up-in-a-bit-by: Greg KH <greg@kroah.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * Use 'E' instead of 'X' for unsigned module taint flag.Rusty Russell2014-03-315-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | Takashi Iwai <tiwai@suse.de> says: > The letter 'X' has been already used for SUSE kernels for very long > time, to indicate the external supported modules. Can the new flag be > changed to another letter for avoiding conflict...? > (BTW, we also use 'N' for "no support", too.) Note: this code should be cleaned up, so we don't have such maps in three places! Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.Rusty Russell2014-03-247-16/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary of http://lkml.org/lkml/2014/3/14/363 : Ted: module_param(queue_depth, int, 444) Joe: 0444! Rusty: User perms >= group perms >= other perms? Joe: CLASS_ATTR, DEVICE_ATTR, SENSOR_ATTR and SENSOR_ATTR_2? Side effect of stricter permissions means removing the unnecessary S_IFREG from several callers. Note that the BUILD_BUG_ON_ZERO((perm) & 2) test was removed: a fair number of drivers fail this test, so that will be the debate for a future patch. Suggested-by: Joe Perches <joe@perches.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> for drivers/pci/slot.c Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * kallsyms: fix percpu vars on x86-64 with relocation.Rusty Russell2014-03-172-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | x86-64 has a problem: per-cpu variables are actually represented by their absolute offsets within the per-cpu area, but the symbols are not emitted as absolute. Thus kallsyms naively creates them as offsets from _text, meaning their values change if the kernel is relocated (especially noticeable with CONFIG_RANDOMIZE_BASE): $ egrep ' (gdt_|_(stext|_per_cpu_))' /root/kallsyms.nokaslr 0000000000000000 D __per_cpu_start 0000000000004000 D gdt_page 0000000000014280 D __per_cpu_end ffffffff810001c8 T _stext ffffffff81ee53c0 D __per_cpu_offset $ egrep ' (gdt_|_(stext|_per_cpu_))' /root/kallsyms.kaslr1 000000001f200000 D __per_cpu_start 000000001f204000 D gdt_page 000000001f214280 D __per_cpu_end ffffffffa02001c8 T _stext ffffffffa10e53c0 D __per_cpu_offset Making them absolute symbols is the Right Thing, but requires fixes to the relocs tool. So for the moment, we add a --absolute-percpu option which makes them absolute from a kallsyms perspective: $ egrep ' (gdt_|_(stext|_per_cpu_))' /proc/kallsyms # no KASLR 0000000000000000 A __per_cpu_start 000000000000a000 A gdt_page 0000000000013040 A __per_cpu_end ffffffff802001c8 T _stext ffffffff8099b180 D __per_cpu_offset ffffffff809a3000 D __per_cpu_load $ egrep ' (gdt_|_(stext|_per_cpu_))' /proc/kallsyms # With KASLR 0000000000000000 A __per_cpu_start 000000000000a000 A gdt_page 0000000000013040 A __per_cpu_end ffffffff89c001c8 T _stext ffffffff8a39d180 D __per_cpu_offset ffffffff8a3a5000 D __per_cpu_load Based-on-the-original-screenplay-by: Andy Honig <ahonig@google.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Kees Cook <keescook@chromium.org>
| * kallsyms: generalize address range checkingKees Cook2014-03-171-21/+32
| | | | | | | | | | | | | | | | | | This refactors the address range checks to be generalized instead of specific to text range checks, in preparation for other range checks. Also extracts logic for "is the symbol absolute" into a function. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * module: LLVMLinux: Remove unused function warning from __param_check macroMark Charlebois2014-03-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This code makes a compile time type check that is optimized away. Clang complains that it generates an unused function: linux/kernel/panic.c:471:1: warning: unused function '__check_panic' [-Wunused-function] core_param(panic, panic_timeout, int, 0644); ^ linux/moduleparam.h:283:2: note: expanded from macro 'core_param' param_check_##type(name, &(var)); \ ^ <scratch space>:87:1: note: expanded from here param_check_int ^ linux/moduleparam.h:369:34: note: expanded from macro 'param_check_int' #define param_check_int(name, p) __param_check(name, p, int) ^ linux/moduleparam.h:349:22: note: expanded from macro '__param_check' static inline type *__check_##name(void) { return(p); } ^ <scratch space>:88:1: note: expanded from here __check_panic GCC won't complain for a static inline function but would if it was just a static function. Adding the unused attribute to the function declaration removes the warning. Per request from Rusty Russell it is marked as __always_unused as the code is meant to be optimized away. This code works for both GCC and clang. Signed-off-by: Mark Charlebois <charlebm@gmail.com> Signed-off-by: Behan Webster <behanw@converseincode.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULEMathieu Desnoyers2014-03-139-5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Users have reported being unable to trace non-signed modules loaded within a kernel supporting module signature. This is caused by tracepoint.c:tracepoint_module_coming() refusing to take into account tracepoints sitting within force-loaded modules (TAINT_FORCED_MODULE). The reason for this check, in the first place, is that a force-loaded module may have a struct module incompatible with the layout expected by the kernel, and can thus cause a kernel crash upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y. Tracepoints, however, specifically accept TAINT_OOT_MODULE and TAINT_CRAP, since those modules do not lead to the "very likely system crash" issue cited above for force-loaded modules. With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed module is tainted re-using the TAINT_FORCED_MODULE taint flag. Unfortunately, this means that Tracepoints treat that module as a force-loaded module, and thus silently refuse to consider any tracepoint within this module. Since an unsigned module does not fit within the "very likely system crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag to specifically address this taint behavior, and accept those modules within Tracepoints. We use the letter 'X' as a taint flag character for a module being loaded that doesn't know how to sign its name (proposed by Steven Rostedt). Also add the missing 'O' entry to trace event show_module_flags() list for the sake of completeness. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> NAKed-by: Ingo Molnar <mingo@redhat.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: David Howells <dhowells@redhat.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * module: remove MODULE_GENERIC_TABLERusty Russell2014-03-132-15/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | MODULE_DEVICE_TABLE() calles MODULE_GENERIC_TABLE(); make it do the work directly. This also removes a wart introduced in the last patch, where the alias is defined to be an unknown struct type "struct type##__##name##_device_id" instead of "struct type##_device_id" (it's an extern so GCC doesn't care, but it's wrong). The other user of MODULE_GENERIC_TABLE (ISAPNP_CARD_TABLE) is unused, so delete it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * module: allow multiple calls to MODULE_DEVICE_TABLE() per moduleTom Gundersen2014-03-132-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 78551277e4df5: "Input: i8042 - add PNP modaliases" had a bug, where the second call to MODULE_DEVICE_TABLE() overrode the first resulting in not all the modaliases being exposed. This fixes the problem by including the name of the device_id table in the __mod_*_device_table alias, allowing us to export several device_id tables per module. Suggested-by: Kay Sievers <kay@vrfy.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Tom Gundersen <teg@jklm.no> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * module: use pr_contJiri Slaby2014-03-131-3/+3
| | | | | | | | | | | | | | | | | | When dumping loaded modules, we print them one by one in separate printks. Let's use pr_cont as they are continuation prints. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tileLinus Torvalds2014-04-0615-20/+1295
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull arch/tile updates from Chris Metcalf: "These fix a few stray build issues seen in linux-next, and also add the minimal required support for perf to tilegx" * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: arch/tile: remove unused variable 'devcap' tile: Fix vDSO compilation issue with allyesconfig perf tools: Allow building for tile tile/perf: Support perf_events on tilegx and tilepro tile: Enable NMIs on return from handle_nmi() without errors tile: Add support for handling PMC hardware tile: don't use __get_cpu_var() with structure-typed arguments tile: avoid overflow in ns2cycles
| * | arch/tile: remove unused variable 'devcap'Chris Metcalf2014-04-041-2/+0
| | | | | | | | | | | | | | | | | | | | | Commit 503275bf37 removed the use of the variable but not the variable itself. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
| * | tile: Fix vDSO compilation issue with allyesconfigKerry Sheh2014-04-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | make allyesconfig give the following build error on tile: tilegx-linux-gcc: error: arch/tile/kernel/vdso/vgettimeofday32.o: No such file or directory tilegx-linux-objcopy: 'arch/tile/kernel/vdso/vdso32.so.dbg': No such file or directory In case with CONFIG_MODVERSIONS, cmd_cc_o_c generate .tmp_<file>.o from <file>.c only. Fix it by execute rule_cc_o_c instead. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kerry Sheh <ksheh@tilera.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
| * | perf tools: Allow building for tileZhigang Lu2014-03-072-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Tested by building perf: - Cross-compiled for tile on x86_64 - Built natively on tile Signed-off-by: Zhigang Lu <zlu@tilera.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
| * | tile/perf: Support perf_events on tilegx and tileproZhigang Lu2014-03-075-0/+1048
| | | | | | | | | | | | | | | | | | | | | Add perf support for tile architecture. Signed-off-by: Zhigang Lu <zlu@tilera.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
| * | tile: Enable NMIs on return from handle_nmi() without errorsZhigang Lu2014-03-072-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NMI interrupts mask ALL interrupts before calling the handler, so we need to unmask NMIs according to the value handle_nmi() returns. If it returns zero, the NMIs should be re-enabled; if it returns a non-zero error, the NMIs should be disabled. Signed-off-by: Zhigang Lu <zlu@tilera.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
| * | tile: Add support for handling PMC hardwareZhigang Lu2014-03-076-12/+204
| | | | | | | | | | | | | | | | | | | | | The PMC module is used by perf_events, oprofile and watchdogs. Signed-off-by: Zhigang Lu <zlu@tilera.com> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
| * | tile: don't use __get_cpu_var() with structure-typed argumentsChris Metcalf2014-03-061-2/+2
| | | | | | | | | | | | | | | | | | This no longer works with the new per-cpu infrastructure. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
| * | tile: avoid overflow in ns2cyclesHenrik Austad2014-03-061-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 4cecf6d401a ("sched, x86: Avoid unnecessary overflow in sched_clock") and in recent patch "clocksource: avoid unnecessary overflow in cyclecounter_cyc2ns()" https://lkml.org/lkml/2014/3/4/17, the mult-shift approach is replaced by 2 steps to avoid storing a large, intermediate value that could overflow. arch/tile/kernel/time.c has a similar pattern in cycles2ns, and this copies the same pattern in this function CC: John Stultz <johnstul@us.ibm.com> CC: Mike Galbraith <bitbucket@online.de> CC: Salman Qazi <sqazi@google.com> Signed-off-by: Henrik Austad <henrik@austad.us> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
* | | Merge tag 'dm-3.15-changes' of ↵Linus Torvalds2014-04-0621-478/+2346
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper changes from Mike Snitzer: - Fix dm-cache corruption caused by discard_block_size > cache_block_size - Fix a lock-inversion detected by LOCKDEP in dm-cache - Fix a dangling bio bug in the dm-thinp target's process_deferred_bios error path - Fix corruption due to non-atomic transaction commit which allowed a metadata superblock to be written before all other metadata was successfully written -- this is common to all targets that use the persistent-data library's transaction manager (dm-thinp, dm-cache and dm-era). - Various small cleanups in the DM core - Add the dm-era target which is useful for keeping track of which blocks were written within a user defined period of time called an 'era'. Use cases include tracking changed blocks for backup software, and partially invalidating the contents of a cache to restore cache coherency after rolling back a vendor snapshot. - Improve the on-disk layout of multithreaded writes to the dm-thin-pool by splitting the pool's deferred bio list to be a per-thin device list and then sorting that list using an rb_tree. The subsequent read throughput of the data written via multiple threads improved by ~70%. - Simplify the multipath target's handling of queuing IO by pushing requests back to the request queue rather than queueing the IO internally. * tag 'dm-3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (24 commits) dm cache: fix a lock-inversion dm thin: sort the per thin deferred bios using an rb_tree dm thin: use per thin device deferred bio lists dm thin: simplify pool_is_congested dm thin: fix dangling bio in process_deferred_bios error path dm mpath: print more useful warnings in multipath_message() dm-mpath: do not activate failed paths dm mpath: remove extra nesting in map function dm mpath: remove map_io() dm mpath: reduce memory pressure when requeuing dm mpath: remove process_queued_ios() dm mpath: push back requests instead of queueing dm table: add dm_table_run_md_queue_async dm mpath: do not call pg_init when it is already running dm: use RCU_INIT_POINTER instead of rcu_assign_pointer in __unbind dm: stop using bi_private dm: remove dm_get_mapinfo dm: make dm_table_alloc_md_mempools static dm: take care to copy the space map roots before locking the superblock dm transaction manager: fix corruption due to non-atomic transaction commit ...
| * | | dm cache: fix a lock-inversionJoe Thornber2014-04-043-52/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When suspending a cache the policy is walked and the individual policy hints written to the metadata via sync_metadata(). This led to this lock order: policy->lock cache_metadata->root_lock When loading the cache target the policy is populated while the metadata lock is held: cache_metadata->root_lock policy->lock Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by ensuring the cache_metadata root_lock is held whilst all the hints are written, rather than being repeatedly locked while policy->lock is held (as was the case with each callout that policy_walk_mappings() made to the old save_hint() method). Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove locking correctness") build option. However, it is not clear how the LOCKDEP reported paths can lead to a deadlock since the two paths, suspending a target and loading a target, never occur at the same time. But that doesn't mean the same lock-inversion couldn't have occurred elsewhere. Reported-by: Marian Csontos <mcsontos@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
| * | | dm thin: sort the per thin deferred bios using an rb_treeMike Snitzer2014-04-041-2/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A thin-pool will allocate blocks using FIFO order for all thin devices which share the thin-pool. Because of this simplistic allocation the thin-pool's space can become fragmented quite easily; especially when multiple threads are requesting blocks in parallel. Sort each thin device's deferred_bio_list based on logical sector to help reduce fragmentation of the thin-pool's ondisk layout. The following tables illustrate the realized gains/potential offered by sorting each thin device's deferred_bio_list. An "io size"-sized random read of the device would result in "seeks/io" fragments being read, with an average "distance/seek" between each fragment. Data was written to a single thin device using multiple threads via iozone (8 threads, 64K for both the block_size and io_size). unsorted: io size seeks/io distance/seek -------------------------------------- 4k 0.000 0b 16k 0.013 11m 64k 0.065 11m 256k 0.274 10m 1m 1.109 10m 4m 4.411 10m 16m 17.097 11m 64m 60.055 13m 256m 148.798 25m 1g 809.929 21m sorted: io size seeks/io distance/seek -------------------------------------- 4k 0.000 0b 16k 0.000 1g 64k 0.001 1g 256k 0.003 1g 1m 0.011 1g 4m 0.045 1g 16m 0.181 1g 64m 0.747 1011m 256m 3.299 1g 1g 14.373 1g Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
| * | | dm thin: use per thin device deferred bio listsMike Snitzer2014-03-311-61/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The thin-pool previously only had a single deferred_bios list that would collect bios for all thin devices in the pool. Split this per-pool deferred_bios list out to per-thin deferred_bios_list -- doing so enables increased parallelism when processing deferred bios. And now that each thin device has it's own deferred_bios_list we can sort all bios in the list using logical sector. The requeue code in error handling path is also cleaner as a side-effect. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
| * | | dm thin: simplify pool_is_congestedMike Snitzer2014-03-311-11/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pool is congested if the pool is in PM_OUT_OF_DATA_SPACE mode. This is more explicit/clear/efficient than inferring whether or not the pool is congested by checking if retry_on_resume_list is empty. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
| * | | dm thin: fix dangling bio in process_deferred_bios error pathMike Snitzer2014-03-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If unable to ensure_next_mapping() we must add the current bio, which was removed from the @bios list via bio_list_pop, back to the deferred_bios list before all the remaining @bios. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org
| * | | dm mpath: print more useful warnings in multipath_message()Jose Castillo2014-03-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The warning message "Unrecognised multipath message received" is displayed in two different situations in multipath_message(): when the number of arguments passed is invalid and when the string passed in argv[0] is not recognized. Make it easier to identify where the problem is by making these warnings more specific with additional context for each case. Signed-off-by: Jose Castillo <jcastillo@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm-mpath: do not activate failed pathsHannes Reinecke2014-03-271-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | activate_path() is run without a lock, so the path might be set to failed before activate_path() had a chance to run. This patch add a check for ->active in activate_path() to avoid unnecessary overhead by calling functions which are known to be failing. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm mpath: remove extra nesting in map functionMike Snitzer2014-03-271-22/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return early for case when no path exists, and when the pathgroup isn't ready. This eliminates the need for extra nesting for the the common case. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de>
| * | | dm mpath: remove map_io()Hannes Reinecke2014-03-271-13/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | multipath_map() is now just a wrapper around map_io(), so we can rename map_io() to multipath_map(). Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
| * | | dm mpath: reduce memory pressure when requeuingHannes Reinecke2014-03-271-23/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When multipath needs to requeue I/O in the block layer the per-request context shouldn't be allocated, as it will be freed immediately afterwards anyway. Avoiding this memory allocation will reduce memory pressure during requeuing. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
| * | | dm mpath: remove process_queued_ios()Hannes Reinecke2014-03-271-42/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | process_queued_ios() has served 3 functions: 1) select pg and pgpath if none is selected 2) start pg_init if requested 3) dispatch queued IOs when pg is ready Basically, a call to queue_work(process_queued_ios) can be replaced by dm_table_run_md_queue_async(), which runs request queue and ends up calling map_io(), which does 1), 2) and 3). Exception is when !pg_ready() (which means either pg_init is running or requested), then multipath_busy() prevents map_io() being called from request_fn. If pg_init is running, it should be ok as long as pg_init_done() does the right thing when pg_init is completed, I.e.: restart pg_init if !pg_ready() or call dm_table_run_md_queue_async() to kick map_io(). If pg_init is requested, we have to make sure the request is detected and pg_init will be started. pg_init is requested in 3 places: a) __choose_pgpath() in map_io() b) __choose_pgpath() in multipath_ioctl() c) pg_init retry in pg_init_done() a) is ok because map_io() calls __pg_init_all_paths(), which does 2). b) needs a call to __pg_init_all_paths(), which does 2). c) needs a call to __pg_init_all_paths(), which does 2). So this patch removes process_queued_ios() and ensures that __pg_init_all_paths() is called at the appropriate locations. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
| * | | dm mpath: push back requests instead of queueingHannes Reinecke2014-03-271-78/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no reason why multipath needs to queue requests internally for queue_if_no_path or pg_init; we should rather push them back onto the request queue. And while we're at it we can simplify the conditional statement in map_io() to make it easier to read. Since mpath no longer does internal queuing of I/O the table info no longer emits the internal queue_size. Instead it displays 1 if queuing is being used or 0 if it is not. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
| * | | dm table: add dm_table_run_md_queue_asyncMike Snitzer2014-03-274-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce dm_table_run_md_queue_async() to run the request_queue of the mapped_device associated with a request-based DM table. Also add dm_md_get_queue() wrapper to extract the request_queue from a mapped_device. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
| * | | dm mpath: do not call pg_init when it is already runningHannes Reinecke2014-03-271-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves condition checks as a preparation of following patches and has no effect on behaviour. process_queued_ios() is the only caller of __pg_init_all_paths() and 2 condition checks are moved from outside to inside without side effects. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
| * | | dm: use RCU_INIT_POINTER instead of rcu_assign_pointer in __unbindMonam Agarwal2014-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace rcu_assign_pointer(p, NULL) with RCU_INIT_POINTER(p, NULL). The rcu_assign_pointer() ensures that the initialization of a structure is carried out before storing a pointer to that structure. And in the case of the NULL pointer, there is no structure to initialize. So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL). Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm: stop using bi_privateMikulas Patocka2014-03-271-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Device mapper uses the bio structure's bi_private field as a pointer to dm_target_io or dm_rq_clone_bio_info. But a bio structure is embedded in the dm_target_io and dm_rq_clone_bio_info structures, so the pointer to the structure that contains the bio can be found with the container_of() macro. Remove the use of bi_private and use container_of() instead. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm: remove dm_get_mapinfoMikulas Patocka2014-03-272-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove dm_get_mapinfo() because no target uses it. Targets can allocate per-bio data using ti->per_bio_data_size, this is much more flexible than union map_info. Leave union map_info only for the request-based multipath target's use. Also delete the unused "unsigned long long ll" field of union map_info. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm: make dm_table_alloc_md_mempools staticMikulas Patocka2014-03-272-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the function dm_table_alloc_md_mempools static because it is not called from another file. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm: take care to copy the space map roots before locking the superblockJoe Thornber2014-03-273-81/+127
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In theory copying the space map root can fail, but in practice it never does because we're careful to check what size buffer is needed. But make certain we're able to copy the space map roots before locking the superblock. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # drop dm-era and dm-cache changes as needed
| * | | dm transaction manager: fix corruption due to non-atomic transaction commitJoe Thornber2014-03-275-27/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The persistent-data library used by dm-thin, dm-cache, etc is transactional. If anything goes wrong, such as an io error when writing new metadata or a power failure, then we roll back to the last transaction. Atomicity when committing a transaction is achieved by: a) Never overwriting data from the previous transaction. b) Writing the superblock last, after all other metadata has hit the disk. This commit and the following commit ("dm: take care to copy the space map roots before locking the superblock") fix a bug associated with (b). When committing it was possible for the superblock to still be written in spite of an io error occurring during the preceeding metadata flush. With these commits we're careful not to take the write lock out on the superblock until after the metadata flush has completed. Change the transaction manager's semantics for dm_tm_commit() to assume all data has been flushed _before_ the single superblock that is passed in. As a prerequisite, split the block manager's block unlocking and flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush(). Now the unlocking must be done separately. This issue was discovered by forcing io errors at the crucial time using dm-flakey. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
| * | | dm cache: remove remainder of distinct discard block sizeHeinz Mauelshagen2014-03-274-77/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Discard block size not being equal to cache block size causes data corruption by erroneously avoiding migrations in issue_copy() because the discard state is being cleared for a group of cache blocks when it should not. Completely remove all code that enabled a distinction between the cache block size and discard block size. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm cache: prevent corruption caused by discard_block_size > cache_block_sizeMike Snitzer2014-03-271-34/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the discard block size is larger than the cache block size we will not properly quiesce IO to a region that is about to be discarded. This results in a race between a cache migration where no copy is needed, and a write to an adjacent cache block that's within the same large discard block. Workaround this by limiting the discard_block_size to cache_block_size. Also limit the max_discard_sectors to cache_block_size. A more comprehensive fix that introduces range locking support in the bio_prison and proper quiescing of a discard range that spans multiple cache blocks is already in development. Reported-by: Morgan Mears <Morgan.Mears@netapp.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Acked-by: Heinz Mauelshagen <heinzm@redhat.com> Cc: stable@vger.kernel.org
| * | | dm bitset: only flush the current word if it has been dirtiedJoe Thornber2014-03-272-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This change offers a big performance boost for dm-era. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm: add era targetJoe Thornber2014-03-274-0/+1851
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dm-era is a target that behaves similar to the linear target. In addition it keeps track of which blocks were written within a user defined period of time called an 'era'. Each era target instance maintains the current era as a monotonically increasing 32-bit counter. Use cases include tracking changed blocks for backup software, and partially invalidating the contents of a cache to restore cache coherency after rolling back a vendor snapshot. dm-era is primarily expected to be paired with the dm-cache target. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | | | Merge tag 'iommu-updates-v3.15' of ↵Linus Torvalds2014-04-0619-990/+1754
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU upates from Joerg Roedel: "This time a few more updates queued up. - Rework VT-d code to support ACPI devices - Improvements for memory and PCI hotplug support in the VT-d driver - Device-tree support for OMAP IOMMU - Convert OMAP IOMMU to use devm_* interfaces - Fixed PASID support for AMD IOMMU - Other random cleanups and fixes for OMAP, ARM-SMMU and SHMOBILE IOMMU Most of the changes are in the VT-d driver because some rework was necessary for better hotplug and ACPI device support" * tag 'iommu-updates-v3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (75 commits) iommu/vt-d: Fix error handling in ANDD processing iommu/vt-d: returning free pointer in get_domain_for_dev() iommu/vt-d: Only call dmar_acpi_dev_scope_init() if DRHD units present iommu/vt-d: Check for NULL pointer in dmar_acpi_dev_scope_init() iommu/amd: Fix logic to determine and checking max PASID iommu/vt-d: Include ACPI devices in iommu=pt iommu/vt-d: Finally enable translation for non-PCI devices iommu/vt-d: Remove to_pci_dev() in intel_map_page() iommu/vt-d: Remove pdev from intel_iommu_attach_device() iommu/vt-d: Remove pdev from iommu_no_mapping() iommu/vt-d: Make domain_add_dev_info() take struct device iommu/vt-d: Make domain_remove_one_dev_info() take struct device iommu/vt-d: Rename 'hwdev' variables to 'dev' now that that's the norm iommu/vt-d: Remove some pointless to_pci_dev() calls iommu/vt-d: Make get_valid_domain_for_dev() take struct device iommu/vt-d: Make iommu_should_identity_map() take struct device iommu/vt-d: Handle RMRRs for non-PCI devices iommu/vt-d: Make get_domain_for_dev() take struct device iommu/vt-d: Make domain_context_mapp{ed,ing}() take struct device iommu/vt-d: Make device_to_iommu() cope with non-PCI devices ...
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| *-------. \ \ \ Merge branches 'iommu/fixes', 'arm/smmu', 'x86/amd', 'arm/omap', ↵Joerg Roedel2014-04-0219-990/+1754
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'arm/shmobile' and 'x86/vt-d' into next
| | | | | | * | | | iommu/vt-d: Fix error handling in ANDD processingDavid Woodhouse2014-04-011-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we failed to find an ACPI device to correspond to an ANDD record, we would fail to increment our pointer and would just process the same record over and over again, with predictable results. Turn it from a while() loop into a for() loop to let the 'continue' in the error paths work correctly. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| | | | | | * | | | iommu/vt-d: returning free pointer in get_domain_for_dev()Dan Carpenter2014-03-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we hit this error condition then we want to return a NULL pointer and not a freed variable. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| | | | | | * | | | iommu/vt-d: Only call dmar_acpi_dev_scope_init() if DRHD units presentDavid Woodhouse2014-03-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As pointed out by Jörg and fixed in commit 11f1a7768 ("iommu/vt-d: Check for NULL pointer in dmar_acpi_dev_scope_init(), this code path can bizarrely get exercised even on AMD IOMMU systems with IRQ remapping enabled. In addition to the defensive check for NULL which Jörg added, let's also just avoid calling the function at all if there aren't an Intel IOMMU units in the system. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>