summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'modules-for-v4.19' of ↵Linus Torvalds2018-08-175-85/+100
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: "Summary of modules changes for the 4.19 merge window: - Fix modules kallsyms for livepatch. Livepatch modules can have SHN_UNDEF symbols in their module symbol tables for later symbol resolution, but kallsyms shouldn't be returning these symbols - Some code cleanups and minor reshuffling in load_module() were done to log the module name when module signature verification fails" * tag 'modules-for-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: kernel/module: Use kmemdup to replace kmalloc+memcpy ARM: module: fix modsign build error modsign: log module name in the event of an error module: replace VMLINUX_SYMBOL_STR() with __stringify() or string literal module: print sensible error code module: setup load info before module_sig_check() module: make it clear when we're handling the module copy in info->hdr module: exclude SHN_UNDEF symbols from kallsyms api
| * kernel/module: Use kmemdup to replace kmalloc+memcpyzhong jiang2018-08-021-4/+2
| | | | | | | | | | | | | | | | we prefer to the kmemdup rather than kmalloc+memcpy. so just replace them. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * ARM: module: fix modsign build errorArnd Bergmann2018-07-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The asm/module.h header file can not be included standalone, which breaks the module signing code after a recent change: In file included from kernel/module-internal.h:13, from kernel/module_signing.c:17: arch/arm/include/asm/module.h:37:27: error: 'struct module' declared inside parameter list will not be visible outside of this definition or declaration [-Werror] u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val); This adds a forward declaration of struct module to make it all work. Fixes: f314dfea16a0 ("modsign: log module name in the event of an error") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * modsign: log module name in the event of an errorJessica Yu2018-07-023-27/+32
| | | | | | | | | | | | | | | | | | | | Now that we have the load_info struct all initialized (including info->name, which contains the name of the module) before module_sig_check(), make the load_info struct and hence module name available to mod_verify_sig() so that we can log the module name in the event of an error. Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * module: replace VMLINUX_SYMBOL_STR() with __stringify() or string literalMasahiro Yamada2018-06-252-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | With the special case handling for Blackfin and Metag was removed by commit 94e58e0ac312 ("export.h: remove code for prefixing symbols with underscore"), VMLINUX_SYMBOL_STR() is now equivalent to __stringify(). Replace the remaining usages to prepare for the entire removal of VMLINUX_SYMBOL_STR(). Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * module: print sensible error codeJason A. Donenfeld2018-06-251-2/+2
| | | | | | | | | | | | | | | | | | Printing "err 0" to the user in the warning message is not particularly useful, especially when this gets transformed into a -ENOENT for the remainder of the call chain. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * module: setup load info before module_sig_check()Jessica Yu2018-06-221-34/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to be able to log the module name in early error messages, such as when module signature verification fails. Previously, the module name is set in layout_and_allocate(), meaning that any error messages that happen before (such as those in module_sig_check()) won't be logged with a module name, which isn't terribly helpful. In order to do this, reshuffle the order in load_module() and set up load info earlier so that we can log the module name along with these error messages. This requires splitting rewrite_section_headers() out of setup_load_info(). While we're at it, clean up and split up the operations done in layout_and_allocate(), setup_load_info(), and rewrite_section_headers() more cleanly so these functions only perform what their names suggest. Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * module: make it clear when we're handling the module copy in info->hdrJessica Yu2018-06-221-21/+21
| | | | | | | | | | | | | | | | | | | | In load_module(), it's not always clear whether we're handling the temporary module copy in info->hdr (which is freed at the end of load_module()) or if we're handling the module already allocated and copied to it's final place. Adding an info->mod field and using it whenever we're handling the temporary copy makes that explicitly clear. Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * module: exclude SHN_UNDEF symbols from kallsyms apiJessica Yu2018-06-181-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Livepatch modules are special in that we preserve their entire symbol tables in order to be able to apply relocations after module load. The unwanted side effect of this is that undefined (SHN_UNDEF) symbols of livepatch modules are accessible via the kallsyms api and this can confuse symbol resolution in livepatch (klp_find_object_symbol()) and cause subtle bugs in livepatch. Have the module kallsyms api skip over SHN_UNDEF symbols. These symbols are usually not available for normal modules anyway as we cut down their symbol tables to just the core (non-undefined) symbols, so this should really just affect livepatch modules. Note that this patch doesn't affect the display of undefined symbols in /proc/kallsyms. Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Tested-by: Josh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
* | Merge tag 'vla-leftovers-v4.19-rc1' of ↵Linus Torvalds2018-08-172-2/+10
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull VLA removal leftovers from Kees Cook: - bus/imx-weim: Use maximum register count to avoid VLA - drm/i2c/tda9950: Use maximum CEC message size to avoid VLA * tag 'vla-leftovers-v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: bus: imx-weim: Remove VLA usage drm/i2c: tda9950: Remove VLA usage
| * | bus: imx-weim: Remove VLA usageKees Cook2018-08-131-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the quest to remove all stack VLA usage from the kernel[1], this switches to using a maximum size and adds a sanity check. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Pengutronix Kernel Team <kernel@pengutronix.de> Cc: Fabio Estevam <fabio.estevam@nxp.com> Cc: NXP Linux Team <linux-imx@nxp.com> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Shawn Guo <shawnguo@kernel.org>
| * | drm/i2c: tda9950: Remove VLA usageKees Cook2018-08-131-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the quest to remove all stack VLA usage from the kernel[1], this sets the buffer to maximum size and adds a sanity check. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Cc: David Airlie <airlied@linux.ie> Cc: Hans Verkuil <hans.verkuil@cisco.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: dri-devel@lists.freedesktop.org Signed-off-by: Kees Cook <keescook@chromium.org>
* | | x86/speculation/l1tf: Exempt zeroed PTEs from inversionSean Christopherson2018-08-171-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that we should *not* invert all not-present mappings, because the all zeroes case is obviously special. clear_page() does not undergo the XOR logic to invert the address bits, i.e. PTE, PMD and PUD entries that have not been individually written will have val=0 and so will trigger __pte_needs_invert(). As a result, {pte,pmd,pud}_pfn() will return the wrong PFN value, i.e. all ones (adjusted by the max PFN mask) instead of zero. A zeroed entry is ok because the page at physical address 0 is reserved early in boot specifically to mitigate L1TF, so explicitly exempt them from the inversion when reading the PFN. Manifested as an unexpected mprotect(..., PROT_NONE) failure when called on a VMA that has VM_PFNMAP and was mmap'd to as something other than PROT_NONE but never used. mprotect() sends the PROT_NONE request down prot_none_walk(), which walks the PTEs to check the PFNs. prot_none_pte_entry() gets the bogus PFN from pte_pfn() and returns -EACCES because it thinks mprotect() is trying to adjust a high MMIO address. [ This is a very modified version of Sean's original patch, but all credit goes to Sean for doing this and also pointing out that sometimes the __pte_needs_invert() function only gets the protection bits, not the full eventual pte. But zero remains special even in just protection bits, so that's ok. - Linus ] Fixes: f22cc87f6c1f ("x86/speculation/l1tf: Invert all not present mappings") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'for-4.19/dm-changes' of ↵Linus Torvalds2018-08-1715-334/+690
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - A couple stable fixes for the DM writecache target. - A stable fix for the DM cache target that fixes the potential for data corruption after an unclean shutdown of a cache device using writeback mode. - Update DM integrity target to allow the metadata to be stored on a separate device from data. - Fix DM kcopyd and the snapshot target to cond_resched() where appropriate and be more efficient with processing completed work. - A few fixes and improvements for DM crypt. - Add DM delay target feature to configure delay of flushes independent of writes. - Update DM thin-provisioning target to include metadata_low_watermark threshold in pool status. - Fix stale DM thin-provisioning Documentation. * tag 'for-4.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits) dm writecache: fix a crash due to reading past end of dirty_bitmap dm crypt: don't decrease device limits dm cache metadata: set dirty on all cache blocks after a crash dm snapshot: remove stale FIXME in snapshot_map() dm snapshot: improve performance by switching out_of_order_list to rbtree dm kcopyd: avoid softlockup in run_complete_job dm cache metadata: save in-core policy_hint_size to on-disk superblock dm thin: stop no_space_timeout worker when switching to write-mode dm kcopyd: return void from dm_kcopyd_copy() dm thin: include metadata_low_watermark threshold in pool status dm writecache: report start_sector in status line dm crypt: convert essiv from ahash to shash dm crypt: use wake_up_process() instead of a wait queue dm integrity: recalculate checksums on creation dm integrity: flush journal on suspend when using separate metadata device dm integrity: use version 2 for separate metadata dm integrity: allow separate metadata device dm integrity: add ic->start in get_data_sector() dm integrity: report provided data sectors in the status dm integrity: implement fair range locks ...
| * | | dm writecache: fix a crash due to reading past end of dirty_bitmapMikulas Patocka2018-08-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wc->dirty_bitmap_size is in bytes so must multiply it by 8, not by BITS_PER_LONG, to get number of bitmap_bits. Fixes crash in find_next_bit() that was reported: https://bugzilla.kernel.org/show_bug.cgi?id=200819 Reported-by: edo.rus@gmail.com Fixes: 48debafe4f2f ("dm: add writecache target") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm crypt: don't decrease device limitsMikulas Patocka2018-08-131-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dm-crypt should only increase device limits, it should not decrease them. This fixes a bug where the user could creates a crypt device with 1024 sector size on the top of scsi device that had 4096 logical block size. The limit 4096 would be lost and the user could incorrectly send 1024-I/Os to the crypt device. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm cache metadata: set dirty on all cache blocks after a crashIlya Dryomov2018-08-091-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quoting Documentation/device-mapper/cache.txt: The 'dirty' state for a cache block changes far too frequently for us to keep updating it on the fly. So we treat it as a hint. In normal operation it will be written when the dm device is suspended. If the system crashes all cache blocks will be assumed dirty when restarted. This got broken in commit f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata") in 4.9, which removed the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk flag) when loading cache blocks. This results in data corruption on an unclean shutdown with dirty cache blocks on the fast device. After the crash those blocks are considered clean and may get evicted from the cache at any time. This can be demonstrated by doing a lot of reads to trigger individual evictions, but uncache is more predictable: ### Disable auto-activation in lvm.conf to be able to do uncache in ### time (i.e. see uncache doing flushing) when the fix is applied. # xfs_io -d -c 'pwrite -b 4M -S 0xaa 0 1G' /dev/vdb # vgcreate vg_cache /dev/vdb /dev/vdc # lvcreate -L 1G -n lv_slowdev vg_cache /dev/vdb # lvcreate -L 512M -n lv_cachedev vg_cache /dev/vdc # lvcreate -L 256M -n lv_metadev vg_cache /dev/vdc # lvconvert --type cache-pool --cachemode writeback vg_cache/lv_cachedev --poolmetadata vg_cache/lv_metadev # lvconvert --type cache vg_cache/lv_slowdev --cachepool vg_cache/lv_cachedev # xfs_io -d -c 'pwrite -b 4M -S 0xbb 0 512M' /dev/mapper/vg_cache-lv_slowdev # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ # dmsetup status vg_cache-lv_slowdev 0 2097152 cache 8 27/65536 128 8192/8192 1 100 0 0 0 8192 7065 2 metadata2 writeback 2 migration_threshold 2048 smq 0 rw - ^^^^ 7065 * 64k = 441M yet to be written to the slow device # echo b >/proc/sysrq-trigger # vgchange -ay vg_cache # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ # lvconvert --uncache vg_cache/lv_slowdev Flushing 0 blocks for cache vg_cache/lv_slowdev. Logical volume "lv_cachedev" successfully removed Logical volume vg_cache/lv_slowdev is not cached. # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................ 0fe00010: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................ This is the case with both v1 and v2 cache pool metatata formats. After applying this patch: # vgchange -ay vg_cache # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ # lvconvert --uncache vg_cache/lv_slowdev Flushing 3724 blocks for cache vg_cache/lv_slowdev. ... Flushing 71 blocks for cache vg_cache/lv_slowdev. Logical volume "lv_cachedev" successfully removed Logical volume vg_cache/lv_slowdev is not cached. # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ Cc: stable@vger.kernel.org Fixes: f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm snapshot: remove stale FIXME in snapshot_map()Mike Snitzer2018-08-091-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ae1093be ("dm snapshot: use mutex instead of rw_semaphore") eliminated the need to worry about read vs write locking. So remove a FIXME in snapshot_map() that is concerned about selectively taking a write lock. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm snapshot: improve performance by switching out_of_order_list to rbtreeDavid Jeffery2018-08-081-13/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | copy_complete()'s processing of out_of_order_list can result in quadratic complexity in the worst case. As such it was the source of consuming too much cpu and the source of significant loss in performance. Fix this by converting out_of_order_list to an rbtree. This improved a dm-snapshot test copy workload from 32 seconds to 4 seconds. Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Brett Hull <bhull@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm kcopyd: avoid softlockup in run_complete_jobJohn Pittman2018-08-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was reported that softlockups occur when using dm-snapshot ontop of slow (rbd) storage. E.g.: [ 4047.990647] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [kworker/10:23:26177] ... [ 4048.034151] Workqueue: kcopyd do_work [dm_mod] [ 4048.034156] RIP: 0010:copy_callback+0x41/0x160 [dm_snapshot] ... [ 4048.034190] Call Trace: [ 4048.034196] ? __chunk_is_tracked+0x70/0x70 [dm_snapshot] [ 4048.034200] run_complete_job+0x5f/0xb0 [dm_mod] [ 4048.034205] process_jobs+0x91/0x220 [dm_mod] [ 4048.034210] ? kcopyd_put_pages+0x40/0x40 [dm_mod] [ 4048.034214] do_work+0x46/0xa0 [dm_mod] [ 4048.034219] process_one_work+0x171/0x370 [ 4048.034221] worker_thread+0x1fc/0x3f0 [ 4048.034224] kthread+0xf8/0x130 [ 4048.034226] ? max_active_store+0x80/0x80 [ 4048.034227] ? kthread_bind+0x10/0x10 [ 4048.034231] ret_from_fork+0x35/0x40 [ 4048.034233] Kernel panic - not syncing: softlockup: hung tasks Fix this by calling cond_resched() after run_complete_job()'s callout to the dm_kcopyd_notify_fn (which is dm-snap.c:copy_callback in the above trace). Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm cache metadata: save in-core policy_hint_size to on-disk superblockMike Snitzer2018-08-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | policy_hint_size starts as 0 during __write_initial_superblock(). It isn't until the policy is loaded that policy_hint_size is set in-core (cmd->policy_hint_size). But it never got recorded in the on-disk superblock because __commit_transaction() didn't deal with transfering the in-core cmd->policy_hint_size to the on-disk superblock. The in-core cmd->policy_hint_size gets initialized by metadata_open()'s __begin_transaction_flags() which re-reads all superblock fields. Because the superblock's policy_hint_size was never properly stored, when the cache was created, hints_array_available() would always return false when re-activating a previously created cache. This means __load_mappings() always considered the hints invalid and never made use of the hints (these hints served to optimize). Another detremental side-effect of this oversight is the cache_check utility would fail with: "invalid hint width: 0" Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm thin: stop no_space_timeout worker when switching to write-modeHou Tao2018-08-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now both check_for_space() and do_no_space_timeout() will read & write pool->pf.error_if_no_space. If these functions run concurrently, as shown in the following case, the default setting of "queue_if_no_space" can get lost. precondition: * error_if_no_space = false (aka "queue_if_no_space") * pool is in Out-of-Data-Space (OODS) mode * no_space_timeout worker has been queued CPU 0: CPU 1: // delete a thin device process_delete_mesg() // check_for_space() invoked by commit() set_pool_mode(pool, PM_WRITE) pool->pf.error_if_no_space = \ pt->requested_pf.error_if_no_space // timeout, pool is still in OODS mode do_no_space_timeout // "queue_if_no_space" config is lost pool->pf.error_if_no_space = true pool->pf.mode = new_mode Fix it by stopping no_space_timeout worker when switching to write mode. Fixes: bcc696fac11f ("dm thin: stay in out-of-data-space mode once no_space_timeout expires") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm kcopyd: return void from dm_kcopyd_copy()Mike Snitzer2018-07-316-63/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | dm_kcopyd_copy() only ever returns 0 so there is no need for callers to account for possible failure. Same goes for dm_kcopyd_zero(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm thin: include metadata_low_watermark threshold in pool statusAndy Grover2018-07-302-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The metadata low watermark threshold is set by the kernel. But the kernel depends on userspace to extend the thinpool metadata device when the threshold is crossed. Since the metadata low watermark threshold is not visible to userspace, upon receiving an event, userspace cannot tell that the kernel wants the metadata device extended, instead of some other eventing condition. Making it visible (but not settable) enables userspace to affirmatively know the kernel is asking for a metadata device extension, by comparing metadata_low_watermark against nr_free_blocks_metadata, also reported in status. Current solutions like dmeventd have their own thresholds for extending the data and metadata devices, and both devices are checked against their thresholds on each event. This lessens the value of the kernel-set threshold, since userspace will either extend the metadata device sooner, when receiving another event; or will receive the metadata lowater event and do nothing, if dmeventd's threshold is less than the kernel's. (This second case is dangerous. The metadata lowater event will not be re-sent, so no further event will be generated before the metadata device is out if space, unless some other event causes userspace to recheck its thresholds.) Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm writecache: report start_sector in status lineMikulas Patocka2018-07-271-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | Fixes: d284f8248c7 ("dm writecache: support optional offset for start of device") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm crypt: convert essiv from ahash to shashKees Cook2018-07-271-17/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparing to remove all stack VLA usage from the kernel[1], remove the discouraged use of AHASH_REQUEST_ON_STACK in favor of the smaller SHASH_DESC_ON_STACK by converting from ahash-wrapped-shash to direct shash. The stack allocation will be made a fixed size in a later patch to the crypto subsystem. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm crypt: use wake_up_process() instead of a wait queueMikulas Patocka2018-07-271-15/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a small simplification of dm-crypt - use wake_up_process() instead of a wait queue in a case where only one process may be waiting. dm-writecache uses a similar pattern. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm integrity: recalculate checksums on creationMikulas Patocka2018-07-272-4/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When using external metadata device and internal hash, recalculate the checksums when the device is created - so that dm-integrity doesn't have to overwrite the device. The superblock stores the last position when the recalculation ended, so that it is properly restarted. Integrity tags that haven't been recalculated yet are ignored. Also bump the target version. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm integrity: flush journal on suspend when using separate metadata deviceMikulas Patocka2018-07-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flush the journal on suspend when using separate data and metadata devices, so that the metadata device can be discarded and the table can be reloaded with a linear target pointing to the data device. NOTE: the journal is deliberately not flushed when using the same device for metadata and data, so that the journal replay code is tested. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm integrity: use version 2 for separate metadataMikulas Patocka2018-07-271-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use version "2" in the superblock when data and metadata devices are separate, so that the device is not accidentally read by older kernel version. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm integrity: allow separate metadata deviceMikulas Patocka2018-07-271-54/+149
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the ability to store DM integrity metadata on a separate device. This feature is activated with the option "meta_device:/dev/device". Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm integrity: add ic->start in get_data_sector()Mikulas Patocka2018-07-271-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A small refactoring. Add the variable ic->start to the result returned by get_data_sector() and not in the callers. This is a prerequisite for the commit that adds the ability to use an external metadata device. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm integrity: report provided data sectors in the statusMikulas Patocka2018-07-271-1/+3
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm integrity: implement fair range locksMikulas Patocka2018-07-271-9/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dm-integrity locks a range of sectors to prevent concurrent I/O or journal writeback. These locks were not fair - so that many small overlapping I/Os could starve a large I/O indefinitely. Fix this by making the range locks fair. The ranges that are waiting are added to the list "wait_list". If a new I/O overlaps some of the waiting I/Os, it is not dispatched, but it is also added to that wait list. Entries on the wait list are processed in first-in-first-out order, so that an I/O can't starve indefinitely. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm integrity: decouple common code in dm_integrity_map_continue()Mikulas Patocka2018-07-271-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Decouple how dm_integrity_map_continue() responds to being out of free sectors and when add_new_range() fails. This has no functional change, but helps prepare for the next commit. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm integrity: change 'suspending' variable from bool to intMikulas Patocka2018-07-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Early alpha processors can't write a byte or short atomically - they read 8 bytes, modify the byte or two bytes in registers and write back 8 bytes. The modification of the variable "suspending" may race with modification of the variable "failed". Fix this by changing "suspending" to an int. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm delay: add flush as a third class of IOMikulas Patocka2018-07-272-5/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new class for dm-delay that delays flush requests. Previously, flushes were delayed as writes, but it caused problems if the user needed to create a device with one or a few slow sectors for the purpose of testing - all flushes would be forwarded to this device and delayed, and that skews the test results. Fix this by allowing to select 0 delay for flushes. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm delay: refactor repetitive codeMikulas Patocka2018-07-271-120/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dm-delay has a lot of code that is repeated for delaying read and write bios. Repetitive code is generally bad; refactor out the repetitive code in preperation for adding another delay class for flush bios. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm cache: only allow a single io_mode cache feature to be requestedJohn Pittman2018-07-271-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | More than one io_mode feature can be requested when creating a dm cache device (as is: last one wins). The io_mode selections are incompatible with one another, we should force them to be selected exclusively. Add a counter to check for more than one io_mode selection. Fixes: 629d0a8a1a10 ("dm cache metadata: add "metadata2" feature") Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | dm thin: update stale "Status" DocumentationMike Snitzer2018-07-271-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Documentation/device-mapper-/thin-provisioning.txt's "Status" section no longer reflected the current fitness level of DM thin-provisioning. That is, DM thinp is no longer "EXPERIMENTAL". It has since seen considerable improvement, has been fairly widely deployed and has performed in a robust manner. Update Documentation to dispel concern raised by potential DM thinp users. Reported-by: Drew Hastings <dhastings@crucialwebhost.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | | | Merge tag 'fsnotify_for_v4.19-rc1' of ↵Linus Torvalds2018-08-179-138/+157
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "fsnotify cleanups from Amir and a small inotify improvement" * tag 'fsnotify_for_v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify: Add flag IN_MASK_CREATE for inotify_add_watch() fanotify: factor out helpers to add/remove mark fsnotify: add helper to get mask from connector fsnotify: let connector point to an abstract object fsnotify: pass connp and object type to fsnotify_add_mark() fsnotify: use typedef fsnotify_connp_t for brevity
| * | | | inotify: Add flag IN_MASK_CREATE for inotify_add_watch()Henry Wilson2018-06-273-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The flag IN_MASK_CREATE is introduced as a flag for inotiy_add_watch() which prevents inotify from modifying any existing watches when invoked. If the pathname specified in the call has a watched inode associated with it and IN_MASK_CREATE is specified, fail with an errno of EEXIST. Use of IN_MASK_CREATE with IN_MASK_ADD is reserved for future use and will return EINVAL. RATIONALE In the current implementation, there is no way to prevent inotify_add_watch() from modifying existing watch descriptors. Even if the caller keeps a record of all watch descriptors collected, this is only sufficient to detect that an existing watch descriptor may have been modified. The assumption that a particular path will map to the same inode over multiple calls to inotify_add_watch() cannot be made as files can be renamed or deleted. It is also not possible to assume that two distinct paths do no map to the same inode, due to hard-links or a dereferenced symbolic link. Further uses of inotify_add_watch() to revert the change may cause other watch descriptors to be modified or created, merely compunding the problem. There is currently no system call such as inotify_modify_watch() to explicity modify a watch descriptor, which would be able to revert unwanted changes. Thus the caller cannot guarantee to be able to revert any changes to existing watch decriptors. Additionally the caller cannot assume that the events that are associated with a watch descriptor are within the set requested, as any future calls to inotify_add_watch() may unintentionally modify a watch descriptor's mask. Thus it cannot currently be guaranteed that a watch descriptor will only generate events which have been requested. The program must filter events which come through its watch descriptor to within its expected range. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Henry Wilson <henry.wilson@acentic.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | | | fanotify: factor out helpers to add/remove markAmir Goldstein2018-06-271-57/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Factor out helpers fanotify_add_mark() and fanotify_remove_mark() to reduce duplicated code. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | | | fsnotify: add helper to get mask from connectorAmir Goldstein2018-06-273-12/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use a helper to get the mask from the object (i.e. i_fsnotify_mask) to generalize code of add/remove inode/vfsmount mark. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | | | fsnotify: let connector point to an abstract objectAmir Goldstein2018-06-275-37/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the code to attach/detach a connector to object more generic by letting the fsnotify connector point to an abstract fsnotify_connp_t. Code that needs to dereference an inode or mount object now uses the helpers fsnotify_conn_{inode,mount}. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | | | fsnotify: pass connp and object type to fsnotify_add_mark()Amir Goldstein2018-06-274-39/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of passing inode and vfsmount arguments to fsnotify_add_mark() and its _locked variant, pass an abstract object pointer and the object type. The helpers fsnotify_obj_{inode,mount} are added to get the concrete object pointer from abstract object pointer. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | | | fsnotify: use typedef fsnotify_connp_t for brevityAmir Goldstein2018-06-273-16/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The object marks manipulation functions fsnotify_destroy_marks() fsnotify_find_mark() and their helpers take an argument of type struct fsnotify_mark_connector __rcu ** to dereference the connector pointer. use a typedef to describe this type for brevity. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* | | | | Merge tag 'for_v4.19-rc1' of ↵Linus Torvalds2018-08-179-45/+33
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF and ext2 update from Jan Kara. * tag 'for_v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: use ktime_get_real_seconds for timestamps udf: convert inode stamps to timespec64
| * | | | | ext2: use ktime_get_real_seconds for timestampsArnd Bergmann2018-06-272-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | get_seconds() is deprecated because of the y2038 overflow, so users should migrate to 64-bit timestamps using ktime_get_real_seconds(). In ext2, the timestamps in the superblock and in the inode are all limited to 32-bit, and this won't get fixed, so let's just stop using the deprecated interface and keep truncating. All users of ext2 should migrate to ext4 before 2038 to prevent this from causing problems. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jan Kara <jack@suse.cz>
| * | | | | udf: convert inode stamps to timespec64Arnd Bergmann2018-06-277-41/+28
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VFS structures are finally converted to always use 64-bit timestamps, and this file system can represent a long range of on-disk timestamps already, so now let's fit in the missing bits for udf. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jan Kara <jack@suse.cz>