summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-4.19/dm-changes' of ↵Linus Torvalds2018-08-1715-334/+690
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - A couple stable fixes for the DM writecache target. - A stable fix for the DM cache target that fixes the potential for data corruption after an unclean shutdown of a cache device using writeback mode. - Update DM integrity target to allow the metadata to be stored on a separate device from data. - Fix DM kcopyd and the snapshot target to cond_resched() where appropriate and be more efficient with processing completed work. - A few fixes and improvements for DM crypt. - Add DM delay target feature to configure delay of flushes independent of writes. - Update DM thin-provisioning target to include metadata_low_watermark threshold in pool status. - Fix stale DM thin-provisioning Documentation. * tag 'for-4.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits) dm writecache: fix a crash due to reading past end of dirty_bitmap dm crypt: don't decrease device limits dm cache metadata: set dirty on all cache blocks after a crash dm snapshot: remove stale FIXME in snapshot_map() dm snapshot: improve performance by switching out_of_order_list to rbtree dm kcopyd: avoid softlockup in run_complete_job dm cache metadata: save in-core policy_hint_size to on-disk superblock dm thin: stop no_space_timeout worker when switching to write-mode dm kcopyd: return void from dm_kcopyd_copy() dm thin: include metadata_low_watermark threshold in pool status dm writecache: report start_sector in status line dm crypt: convert essiv from ahash to shash dm crypt: use wake_up_process() instead of a wait queue dm integrity: recalculate checksums on creation dm integrity: flush journal on suspend when using separate metadata device dm integrity: use version 2 for separate metadata dm integrity: allow separate metadata device dm integrity: add ic->start in get_data_sector() dm integrity: report provided data sectors in the status dm integrity: implement fair range locks ...
| * dm writecache: fix a crash due to reading past end of dirty_bitmapMikulas Patocka2018-08-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | wc->dirty_bitmap_size is in bytes so must multiply it by 8, not by BITS_PER_LONG, to get number of bitmap_bits. Fixes crash in find_next_bit() that was reported: https://bugzilla.kernel.org/show_bug.cgi?id=200819 Reported-by: edo.rus@gmail.com Fixes: 48debafe4f2f ("dm: add writecache target") Cc: stable@vger.kernel.org # 4.18 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm crypt: don't decrease device limitsMikulas Patocka2018-08-131-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | dm-crypt should only increase device limits, it should not decrease them. This fixes a bug where the user could creates a crypt device with 1024 sector size on the top of scsi device that had 4096 logical block size. The limit 4096 would be lost and the user could incorrectly send 1024-I/Os to the crypt device. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm cache metadata: set dirty on all cache blocks after a crashIlya Dryomov2018-08-091-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quoting Documentation/device-mapper/cache.txt: The 'dirty' state for a cache block changes far too frequently for us to keep updating it on the fly. So we treat it as a hint. In normal operation it will be written when the dm device is suspended. If the system crashes all cache blocks will be assumed dirty when restarted. This got broken in commit f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata") in 4.9, which removed the code that consulted cmd->clean_when_opened (CLEAN_SHUTDOWN on-disk flag) when loading cache blocks. This results in data corruption on an unclean shutdown with dirty cache blocks on the fast device. After the crash those blocks are considered clean and may get evicted from the cache at any time. This can be demonstrated by doing a lot of reads to trigger individual evictions, but uncache is more predictable: ### Disable auto-activation in lvm.conf to be able to do uncache in ### time (i.e. see uncache doing flushing) when the fix is applied. # xfs_io -d -c 'pwrite -b 4M -S 0xaa 0 1G' /dev/vdb # vgcreate vg_cache /dev/vdb /dev/vdc # lvcreate -L 1G -n lv_slowdev vg_cache /dev/vdb # lvcreate -L 512M -n lv_cachedev vg_cache /dev/vdc # lvcreate -L 256M -n lv_metadev vg_cache /dev/vdc # lvconvert --type cache-pool --cachemode writeback vg_cache/lv_cachedev --poolmetadata vg_cache/lv_metadev # lvconvert --type cache vg_cache/lv_slowdev --cachepool vg_cache/lv_cachedev # xfs_io -d -c 'pwrite -b 4M -S 0xbb 0 512M' /dev/mapper/vg_cache-lv_slowdev # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ # dmsetup status vg_cache-lv_slowdev 0 2097152 cache 8 27/65536 128 8192/8192 1 100 0 0 0 8192 7065 2 metadata2 writeback 2 migration_threshold 2048 smq 0 rw - ^^^^ 7065 * 64k = 441M yet to be written to the slow device # echo b >/proc/sysrq-trigger # vgchange -ay vg_cache # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ # lvconvert --uncache vg_cache/lv_slowdev Flushing 0 blocks for cache vg_cache/lv_slowdev. Logical volume "lv_cachedev" successfully removed Logical volume vg_cache/lv_slowdev is not cached. # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................ 0fe00010: aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa ................ This is the case with both v1 and v2 cache pool metatata formats. After applying this patch: # vgchange -ay vg_cache # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ # lvconvert --uncache vg_cache/lv_slowdev Flushing 3724 blocks for cache vg_cache/lv_slowdev. ... Flushing 71 blocks for cache vg_cache/lv_slowdev. Logical volume "lv_cachedev" successfully removed Logical volume vg_cache/lv_slowdev is not cached. # xfs_io -d -c 'pread -v 254M 512' /dev/mapper/vg_cache-lv_slowdev | head -n 2 0fe00000: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ 0fe00010: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................ Cc: stable@vger.kernel.org Fixes: f177940a8091 ("dm cache metadata: switch to using the new cursor api for loading metadata") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm snapshot: remove stale FIXME in snapshot_map()Mike Snitzer2018-08-091-2/+0
| | | | | | | | | | | | | | | | | | Commit ae1093be ("dm snapshot: use mutex instead of rw_semaphore") eliminated the need to worry about read vs write locking. So remove a FIXME in snapshot_map() that is concerned about selectively taking a write lock. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm snapshot: improve performance by switching out_of_order_list to rbtreeDavid Jeffery2018-08-081-13/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | copy_complete()'s processing of out_of_order_list can result in quadratic complexity in the worst case. As such it was the source of consuming too much cpu and the source of significant loss in performance. Fix this by converting out_of_order_list to an rbtree. This improved a dm-snapshot test copy workload from 32 seconds to 4 seconds. Signed-off-by: David Jeffery <djeffery@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Brett Hull <bhull@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm kcopyd: avoid softlockup in run_complete_jobJohn Pittman2018-08-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was reported that softlockups occur when using dm-snapshot ontop of slow (rbd) storage. E.g.: [ 4047.990647] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [kworker/10:23:26177] ... [ 4048.034151] Workqueue: kcopyd do_work [dm_mod] [ 4048.034156] RIP: 0010:copy_callback+0x41/0x160 [dm_snapshot] ... [ 4048.034190] Call Trace: [ 4048.034196] ? __chunk_is_tracked+0x70/0x70 [dm_snapshot] [ 4048.034200] run_complete_job+0x5f/0xb0 [dm_mod] [ 4048.034205] process_jobs+0x91/0x220 [dm_mod] [ 4048.034210] ? kcopyd_put_pages+0x40/0x40 [dm_mod] [ 4048.034214] do_work+0x46/0xa0 [dm_mod] [ 4048.034219] process_one_work+0x171/0x370 [ 4048.034221] worker_thread+0x1fc/0x3f0 [ 4048.034224] kthread+0xf8/0x130 [ 4048.034226] ? max_active_store+0x80/0x80 [ 4048.034227] ? kthread_bind+0x10/0x10 [ 4048.034231] ret_from_fork+0x35/0x40 [ 4048.034233] Kernel panic - not syncing: softlockup: hung tasks Fix this by calling cond_resched() after run_complete_job()'s callout to the dm_kcopyd_notify_fn (which is dm-snap.c:copy_callback in the above trace). Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm cache metadata: save in-core policy_hint_size to on-disk superblockMike Snitzer2018-08-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | policy_hint_size starts as 0 during __write_initial_superblock(). It isn't until the policy is loaded that policy_hint_size is set in-core (cmd->policy_hint_size). But it never got recorded in the on-disk superblock because __commit_transaction() didn't deal with transfering the in-core cmd->policy_hint_size to the on-disk superblock. The in-core cmd->policy_hint_size gets initialized by metadata_open()'s __begin_transaction_flags() which re-reads all superblock fields. Because the superblock's policy_hint_size was never properly stored, when the cache was created, hints_array_available() would always return false when re-activating a previously created cache. This means __load_mappings() always considered the hints invalid and never made use of the hints (these hints served to optimize). Another detremental side-effect of this oversight is the cache_check utility would fail with: "invalid hint width: 0" Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm thin: stop no_space_timeout worker when switching to write-modeHou Tao2018-08-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now both check_for_space() and do_no_space_timeout() will read & write pool->pf.error_if_no_space. If these functions run concurrently, as shown in the following case, the default setting of "queue_if_no_space" can get lost. precondition: * error_if_no_space = false (aka "queue_if_no_space") * pool is in Out-of-Data-Space (OODS) mode * no_space_timeout worker has been queued CPU 0: CPU 1: // delete a thin device process_delete_mesg() // check_for_space() invoked by commit() set_pool_mode(pool, PM_WRITE) pool->pf.error_if_no_space = \ pt->requested_pf.error_if_no_space // timeout, pool is still in OODS mode do_no_space_timeout // "queue_if_no_space" config is lost pool->pf.error_if_no_space = true pool->pf.mode = new_mode Fix it by stopping no_space_timeout worker when switching to write mode. Fixes: bcc696fac11f ("dm thin: stay in out-of-data-space mode once no_space_timeout expires") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm kcopyd: return void from dm_kcopyd_copy()Mike Snitzer2018-07-316-63/+27
| | | | | | | | | | | | | | dm_kcopyd_copy() only ever returns 0 so there is no need for callers to account for possible failure. Same goes for dm_kcopyd_zero(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm thin: include metadata_low_watermark threshold in pool statusAndy Grover2018-07-302-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The metadata low watermark threshold is set by the kernel. But the kernel depends on userspace to extend the thinpool metadata device when the threshold is crossed. Since the metadata low watermark threshold is not visible to userspace, upon receiving an event, userspace cannot tell that the kernel wants the metadata device extended, instead of some other eventing condition. Making it visible (but not settable) enables userspace to affirmatively know the kernel is asking for a metadata device extension, by comparing metadata_low_watermark against nr_free_blocks_metadata, also reported in status. Current solutions like dmeventd have their own thresholds for extending the data and metadata devices, and both devices are checked against their thresholds on each event. This lessens the value of the kernel-set threshold, since userspace will either extend the metadata device sooner, when receiving another event; or will receive the metadata lowater event and do nothing, if dmeventd's threshold is less than the kernel's. (This second case is dangerous. The metadata lowater event will not be re-sent, so no further event will be generated before the metadata device is out if space, unless some other event causes userspace to recheck its thresholds.) Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm writecache: report start_sector in status lineMikulas Patocka2018-07-271-1/+5
| | | | | | | | | | | | Fixes: d284f8248c7 ("dm writecache: support optional offset for start of device") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm crypt: convert essiv from ahash to shashKees Cook2018-07-271-17/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparing to remove all stack VLA usage from the kernel[1], remove the discouraged use of AHASH_REQUEST_ON_STACK in favor of the smaller SHASH_DESC_ON_STACK by converting from ahash-wrapped-shash to direct shash. The stack allocation will be made a fixed size in a later patch to the crypto subsystem. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm crypt: use wake_up_process() instead of a wait queueMikulas Patocka2018-07-271-15/+10
| | | | | | | | | | | | | | | | | | This is a small simplification of dm-crypt - use wake_up_process() instead of a wait queue in a case where only one process may be waiting. dm-writecache uses a similar pattern. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm integrity: recalculate checksums on creationMikulas Patocka2018-07-272-4/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When using external metadata device and internal hash, recalculate the checksums when the device is created - so that dm-integrity doesn't have to overwrite the device. The superblock stores the last position when the recalculation ended, so that it is properly restarted. Integrity tags that haven't been recalculated yet are ignored. Also bump the target version. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm integrity: flush journal on suspend when using separate metadata deviceMikulas Patocka2018-07-271-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | Flush the journal on suspend when using separate data and metadata devices, so that the metadata device can be discarded and the table can be reloaded with a linear target pointing to the data device. NOTE: the journal is deliberately not flushed when using the same device for metadata and data, so that the journal replay code is tested. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm integrity: use version 2 for separate metadataMikulas Patocka2018-07-271-3/+13
| | | | | | | | | | | | | | | | | | Use version "2" in the superblock when data and metadata devices are separate, so that the device is not accidentally read by older kernel version. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm integrity: allow separate metadata deviceMikulas Patocka2018-07-271-54/+149
| | | | | | | | | | | | | | | | Add the ability to store DM integrity metadata on a separate device. This feature is activated with the option "meta_device:/dev/device". Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm integrity: add ic->start in get_data_sector()Mikulas Patocka2018-07-271-3/+4
| | | | | | | | | | | | | | | | | | | | A small refactoring. Add the variable ic->start to the result returned by get_data_sector() and not in the callers. This is a prerequisite for the commit that adds the ability to use an external metadata device. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm integrity: report provided data sectors in the statusMikulas Patocka2018-07-271-1/+3
| | | | | | | | | | Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm integrity: implement fair range locksMikulas Patocka2018-07-271-9/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dm-integrity locks a range of sectors to prevent concurrent I/O or journal writeback. These locks were not fair - so that many small overlapping I/Os could starve a large I/O indefinitely. Fix this by making the range locks fair. The ranges that are waiting are added to the list "wait_list". If a new I/O overlaps some of the waiting I/Os, it is not dispatched, but it is also added to that wait list. Entries on the wait list are processed in first-in-first-out order, so that an I/O can't starve indefinitely. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm integrity: decouple common code in dm_integrity_map_continue()Mikulas Patocka2018-07-271-3/+7
| | | | | | | | | | | | | | | | | | | | Decouple how dm_integrity_map_continue() responds to being out of free sectors and when add_new_range() fails. This has no functional change, but helps prepare for the next commit. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm integrity: change 'suspending' variable from bool to intMikulas Patocka2018-07-271-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Early alpha processors can't write a byte or short atomically - they read 8 bytes, modify the byte or two bytes in registers and write back 8 bytes. The modification of the variable "suspending" may race with modification of the variable "failed". Fix this by changing "suspending" to an int. Cc: stable@vger.kernel.org Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm delay: add flush as a third class of IOMikulas Patocka2018-07-272-5/+32
| | | | | | | | | | | | | | | | | | | | | | | | Add a new class for dm-delay that delays flush requests. Previously, flushes were delayed as writes, but it caused problems if the user needed to create a device with one or a few slow sectors for the purpose of testing - all flushes would be forwarded to this device and delayed, and that skews the test results. Fix this by allowing to select 0 delay for flushes. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm delay: refactor repetitive codeMikulas Patocka2018-07-271-120/+103
| | | | | | | | | | | | | | | | | | dm-delay has a lot of code that is repeated for delaying read and write bios. Repetitive code is generally bad; refactor out the repetitive code in preperation for adding another delay class for flush bios. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm cache: only allow a single io_mode cache feature to be requestedJohn Pittman2018-07-271-4/+15
| | | | | | | | | | | | | | | | | | | | | | More than one io_mode feature can be requested when creating a dm cache device (as is: last one wins). The io_mode selections are incompatible with one another, we should force them to be selected exclusively. Add a counter to check for more than one io_mode selection. Fixes: 629d0a8a1a10 ("dm cache metadata: add "metadata2" feature") Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm thin: update stale "Status" DocumentationMike Snitzer2018-07-271-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Documentation/device-mapper-/thin-provisioning.txt's "Status" section no longer reflected the current fitness level of DM thin-provisioning. That is, DM thinp is no longer "EXPERIMENTAL". It has since seen considerable improvement, has been fairly widely deployed and has performed in a robust manner. Update Documentation to dispel concern raised by potential DM thinp users. Reported-by: Drew Hastings <dhastings@crucialwebhost.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | Merge tag 'fsnotify_for_v4.19-rc1' of ↵Linus Torvalds2018-08-179-138/+157
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "fsnotify cleanups from Amir and a small inotify improvement" * tag 'fsnotify_for_v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify: Add flag IN_MASK_CREATE for inotify_add_watch() fanotify: factor out helpers to add/remove mark fsnotify: add helper to get mask from connector fsnotify: let connector point to an abstract object fsnotify: pass connp and object type to fsnotify_add_mark() fsnotify: use typedef fsnotify_connp_t for brevity
| * | inotify: Add flag IN_MASK_CREATE for inotify_add_watch()Henry Wilson2018-06-273-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The flag IN_MASK_CREATE is introduced as a flag for inotiy_add_watch() which prevents inotify from modifying any existing watches when invoked. If the pathname specified in the call has a watched inode associated with it and IN_MASK_CREATE is specified, fail with an errno of EEXIST. Use of IN_MASK_CREATE with IN_MASK_ADD is reserved for future use and will return EINVAL. RATIONALE In the current implementation, there is no way to prevent inotify_add_watch() from modifying existing watch descriptors. Even if the caller keeps a record of all watch descriptors collected, this is only sufficient to detect that an existing watch descriptor may have been modified. The assumption that a particular path will map to the same inode over multiple calls to inotify_add_watch() cannot be made as files can be renamed or deleted. It is also not possible to assume that two distinct paths do no map to the same inode, due to hard-links or a dereferenced symbolic link. Further uses of inotify_add_watch() to revert the change may cause other watch descriptors to be modified or created, merely compunding the problem. There is currently no system call such as inotify_modify_watch() to explicity modify a watch descriptor, which would be able to revert unwanted changes. Thus the caller cannot guarantee to be able to revert any changes to existing watch decriptors. Additionally the caller cannot assume that the events that are associated with a watch descriptor are within the set requested, as any future calls to inotify_add_watch() may unintentionally modify a watch descriptor's mask. Thus it cannot currently be guaranteed that a watch descriptor will only generate events which have been requested. The program must filter events which come through its watch descriptor to within its expected range. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Henry Wilson <henry.wilson@acentic.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | fanotify: factor out helpers to add/remove markAmir Goldstein2018-06-271-57/+29
| | | | | | | | | | | | | | | | | | | | | | | | Factor out helpers fanotify_add_mark() and fanotify_remove_mark() to reduce duplicated code. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | fsnotify: add helper to get mask from connectorAmir Goldstein2018-06-273-12/+31
| | | | | | | | | | | | | | | | | | | | | | | | Use a helper to get the mask from the object (i.e. i_fsnotify_mask) to generalize code of add/remove inode/vfsmount mark. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | fsnotify: let connector point to an abstract objectAmir Goldstein2018-06-275-37/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the code to attach/detach a connector to object more generic by letting the fsnotify connector point to an abstract fsnotify_connp_t. Code that needs to dereference an inode or mount object now uses the helpers fsnotify_conn_{inode,mount}. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | fsnotify: pass connp and object type to fsnotify_add_mark()Amir Goldstein2018-06-274-39/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of passing inode and vfsmount arguments to fsnotify_add_mark() and its _locked variant, pass an abstract object pointer and the object type. The helpers fsnotify_obj_{inode,mount} are added to get the concrete object pointer from abstract object pointer. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | fsnotify: use typedef fsnotify_connp_t for brevityAmir Goldstein2018-06-273-16/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The object marks manipulation functions fsnotify_destroy_marks() fsnotify_find_mark() and their helpers take an argument of type struct fsnotify_mark_connector __rcu ** to dereference the connector pointer. use a typedef to describe this type for brevity. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
* | | Merge tag 'for_v4.19-rc1' of ↵Linus Torvalds2018-08-179-45/+33
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull UDF and ext2 update from Jan Kara. * tag 'for_v4.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: use ktime_get_real_seconds for timestamps udf: convert inode stamps to timespec64
| * | | ext2: use ktime_get_real_seconds for timestampsArnd Bergmann2018-06-272-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | get_seconds() is deprecated because of the y2038 overflow, so users should migrate to 64-bit timestamps using ktime_get_real_seconds(). In ext2, the timestamps in the superblock and in the inode are all limited to 32-bit, and this won't get fixed, so let's just stop using the deprecated interface and keep truncating. All users of ext2 should migrate to ext4 before 2038 to prevent this from causing problems. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jan Kara <jack@suse.cz>
| * | | udf: convert inode stamps to timespec64Arnd Bergmann2018-06-277-41/+28
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | The VFS structures are finally converted to always use 64-bit timestamps, and this file system can represent a long range of on-disk timestamps already, so now let's fit in the missing bits for udf. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jan Kara <jack@suse.cz>
* | | Merge tag 'for-linus-4.19-ofs1' of ↵Linus Torvalds2018-08-162-12/+10
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "Orangefs: one cleanup and Souptick's vm_fault_t patch: - add new return type vm_fault_t (Souptick Joarder) - remove redundant pointer (Colin Ian King)" * tag 'for-linus-4.19-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: remove redundant pointer orangefs_inode orangefs: Adding new return type vm_fault_t
| * | | orangefs: remove redundant pointer orangefs_inodeColin Ian King2018-08-141-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pointer orangefs_inode is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'orangefs_inode' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| * | | orangefs: Adding new return type vm_fault_tSouptick Joarder2018-08-141-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use new return type vm_fault_t for fault handler. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type. See the following commit 1c8f422059ae ("mm: change return type to vm_fault_t") Fixed checkpatch.pl warning. Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
* | | | Merge tag 'vfio-v4.19-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds2018-08-162-1/+15
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull VFIO updates from Alex Williamson: - mark switch fall-through cases (Gustavo A. R. Silva) - disable binding SR-IOV enabled PFs (Alex Williamson) * tag 'vfio-v4.19-rc1' of git://github.com/awilliam/linux-vfio: vfio-pci: Disable binding to PFs with SR-IOV enabled vfio: Mark expected switch fall-throughs
| * | | | vfio-pci: Disable binding to PFs with SR-IOV enabledAlex Williamson2018-08-061-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We expect to receive PFs with SR-IOV disabled, however some host drivers leave SR-IOV enabled at unbind. This puts us in a state where we can potentially assign both the PF and the VF, leading to both functionality as well as security concerns due to lack of managing the SR-IOV state as well as vendor dependent isolation from the PF to VF. If we were to attempt to actively disable SR-IOV on driver probe, we risk VF bound drivers blocking, potentially risking live lock scenarios. Therefore simply refuse to bind to PFs with SR-IOV enabled with a warning message indicating the issue. Users can resolve this by re-binding to the host driver and disabling SR-IOV before attempting to use the device with vfio-pci. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | | | vfio: Mark expected switch fall-throughsGustavo A. R. Silva2018-08-062-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | | | | Merge branch 'linus' of ↵Linus Torvalds2018-08-1624-907/+649
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal Pull thermal management updates from Eduardo Valentin: - rework tsens driver to add support for tsens-v2 (Amit Kucheria) - rework armada thermal driver to use syscon and multichannel support (Miquel Raynal) - fixes to TI SoC, IMX, Exynos, RCar, and hwmon drivers * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal: (34 commits) thermal: armada: fix copy-paste error in armada_thermal_probe() thermal: rcar_thermal: avoid NULL dereference in absence of IRQ resources thermal: samsung: Remove Exynos5440 clock handling left-overs thermal: tsens: Fix negative temperature reporting thermal: tsens: switch from of_iomap() to devm_ioremap_resource() thermal: tsens: Rename variable thermal: tsens: Add generic support for TSENS v2 IP thermal: tsens: Rename tsens-8996 to tsens-v2 for reuse thermal: tsens: Add support to split up register address space into two dt: thermal: tsens: Document the fallback DT property for v2 of TSENS IP thermal: tsens: Get rid of unused fields in structure thermal_hwmon: Pass the originating device down to hwmon_device_register_with_info thermal_hwmon: Sanitize attribute name passed to hwmon dt-bindings: thermal: armada: add reference to new bindings dt-bindings: cp110: add the thermal node in the syscon file dt-bindings: cp110: update documentation since DT de-duplication dt-bindings: ap806: add the thermal node in the syscon file dt-bindings: cp110: prepare the syscon file to list other syscons nodes dt-bindings: ap806: prepare the syscon file to list other syscons nodes dt-bindings: cp110: rename cp110 syscon file ...
| * | | | | thermal: armada: fix copy-paste error in armada_thermal_probe()Wei Yongjun2018-08-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The return value from devm_kzalloc() is not checked correctly. The test is done against a wrong variable. Fix it. Fixes: e72f03ef2543 ("thermal: armada: use the resource managed registration helper alternative") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
| * | | | | thermal: rcar_thermal: avoid NULL dereference in absence of IRQ resourcesSimon Horman2018-07-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure that the base address used by a call to rcar_thermal_common_write() may be NULL if the SOC supports interrupts for use with the thermal device but none are defined in DT as is the case for R-Car H1 (r8a7779). Guard against this condition to prevent a NULL dereference when the device is probed. Tested on: * R-Mobile APE6 (r8a73a4) / APE6EVM * R-Car H1 (r8a7779) / Marzen * R-Car H2 (r8a7790) / Lager * R-Car M2-W (r8a7791) / Koelsch * R-Car M2-N (r8a7793) / Gose * R-Car D3 ES1.0 (r8a77995) / Draak Fixes: 1969d9dc2079 ("thermal: rcar_thermal: add r8a77995 support") Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
| * | | | | thermal: samsung: Remove Exynos5440 clock handling left-oversKrzysztof Kozlowski2018-07-281-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8014220d48e7 ("thermal: samsung: Remove support for Exynos5440") removed the Exynos5440 specific part of code for accessing TMU interrupt registers but the surrounding clock handling was left. Clean it up. Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
| * | | | | thermal: tsens: Fix negative temperature reportingAmit Kucheria2018-07-281-8/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current code will always return 0xffffffff in case of negative temperatures due to a bug in how the binary sign extension is being done. Use sign_extend32() instead. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
| * | | | | thermal: tsens: switch from of_iomap() to devm_ioremap_resource()Amit Kucheria2018-07-281-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | devm_ioremap_resources() automatically requests resources (so that the I/O region shows up in /proc/iomem) and devm_ wrappers do better error handling and unmapping of the I/O region when needed. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
| * | | | | thermal: tsens: Rename variableAmit Kucheria2018-07-282-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're actually reading the temperature from the status register. Fix the variable name to reflect that. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>