summaryrefslogtreecommitdiffstats
path: root/drivers/md (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-blockLinus Torvalds2020-08-162-7/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "A few fixes on the block side of things: - Discard granularity fix (Coly) - rnbd cleanups (Guoqing) - md error handling fix (Dan) - md sysfs fix (Junxiao) - Fix flush request accounting, which caused an IO slowdown for some configurations (Ming) - Properly propagate loop flag for partition scanning (Lennart)" * tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block: block: fix double account of flush request's driver tag loop: unset GENHD_FL_NO_PART_SCAN on LOOP_CONFIGURE rnbd: no need to set bi_end_io in rnbd_bio_map_kern rnbd: remove rnbd_dev_submit_io md-cluster: Fix potential error pointer dereference in resize_bitmaps() block: check queue's limits.discard_granularity in __blkdev_issue_discard() md: get sysfs entry after redundancy attr group create
| * Merge branch 'md-next' of ↵Jens Axboe2020-08-072-7/+11
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-5.9 Pull MD fixes from Song. * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md-cluster: Fix potential error pointer dereference in resize_bitmaps() md: get sysfs entry after redundancy attr group create
| | * md-cluster: Fix potential error pointer dereference in resize_bitmaps()Dan Carpenter2020-08-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The error handling calls md_bitmap_free(bitmap) which checks for NULL but will Oops if we pass an error pointer. Let's set "bitmap" to NULL on this error path. Fixes: afd756286083 ("md-cluster/raid10: resize all the bitmaps before start reshape") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
| | * md: get sysfs entry after redundancy attr group createJunxiao Bi2020-08-061-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "sync_completed" and "degraded" belongs to redundancy attr group, it was not exist yet when md device was created. Reported-by: kernel test robot <rong.a.chen@intel.com> Fixes: e1a86dbbbd6a ("md: fix deadlock causing by sysfs_notify") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Song Liu <songliubraving@fb.com>
* | | Merge tag 'locking-urgent-2020-08-10' of ↵Linus Torvalds2020-08-112-2/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Thomas Gleixner: "A set of locking fixes and updates: - Untangle the header spaghetti which causes build failures in various situations caused by the lockdep additions to seqcount to validate that the write side critical sections are non-preemptible. - The seqcount associated lock debug addons which were blocked by the above fallout. seqcount writers contrary to seqlock writers must be externally serialized, which usually happens via locking - except for strict per CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot validate that the lock is held. This new debug mechanism adds the concept of associated locks. sequence count has now lock type variants and corresponding initializers which take a pointer to the associated lock used for writer serialization. If lockdep is enabled the pointer is stored and write_seqcount_begin() has a lockdep assertion to validate that the lock is held. Aside of the type and the initializer no other code changes are required at the seqcount usage sites. The rest of the seqcount API is unchanged and determines the type at compile time with the help of _Generic which is possible now that the minimal GCC version has been moved up. Adding this lockdep coverage unearthed a handful of seqcount bugs which have been addressed already independent of this. While generally useful this comes with a Trojan Horse twist: On RT kernels the write side critical section can become preemtible if the writers are serialized by an associated lock, which leads to the well known reader preempts writer livelock. RT prevents this by storing the associated lock pointer independent of lockdep in the seqcount and changing the reader side to block on the lock when a reader detects that a writer is in the write side critical section. - Conversion of seqcount usage sites to associated types and initializers" * tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) locking/seqlock, headers: Untangle the spaghetti monster locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header x86/headers: Remove APIC headers from <asm/smp.h> seqcount: More consistent seqprop names seqcount: Compress SEQCNT_LOCKNAME_ZERO() seqlock: Fold seqcount_LOCKNAME_init() definition seqlock: Fold seqcount_LOCKNAME_t definition seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g hrtimer: Use sequence counter with associated raw spinlock kvm/eventfd: Use sequence counter with associated spinlock userfaultfd: Use sequence counter with associated spinlock NFSv4: Use sequence counter with associated spinlock iocost: Use sequence counter with associated spinlock raid5: Use sequence counter with associated spinlock vfs: Use sequence counter with associated spinlock timekeeping: Use sequence counter with associated raw spinlock xfrm: policy: Use sequence counters with associated lock netfilter: nft_set_rbtree: Use sequence counter with associated rwlock netfilter: conntrack: Use sequence counter with associated spinlock sched: tasks: Use sequence counter with associated spinlock ...
| * \ \ Merge branch 'WIP.locking/seqlocks' into locking/urgentIngo Molnar2020-08-062-2/+2
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | Pick up the full seqlock series PeterZ is working on. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | raid5: Use sequence counter with associated spinlockAhmed S. Darwish2020-07-292-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A sequence counter write side critical section must be protected by some form of locking to serialize writers. A plain seqcount_t does not contain the information of which lock must be held when entering a write side critical section. Use the new seqcount_spinlock_t data type, which allows to associate a spinlock with the sequence counter. This enables lockdep to verify that the spinlock used for writer serialization is held when the write side critical section is entered. If lockdep is disabled this lock association is compiled out and has neither storage size nor runtime overhead. Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Song Liu <song@kernel.org> Link: https://lkml.kernel.org/r/20200720155530.1173732-20-a.darwish@linutronix.de
* | | | | Merge tag 'for-5.9/dm-changes' of ↵Linus Torvalds2020-08-0713-111/+326
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mike Snitzer: - DM multipath locking fixes around m->flags tests and improvements to bio-based code so that it follows patterns established by request-based code. - Request-based DM core improvement to eliminate unnecessary call to blk_mq_queue_stopped(). - Add "panic_on_corruption" error handling mode to DM verity target. - DM bufio fix to to perform buffer cleanup from a workqueue rather than wait for IO in reclaim context from shrinker. - DM crypt improvement to optionally avoid async processing via workqueues for reads and/or writes -- via "no_read_workqueue" and "no_write_workqueue" features. This more direct IO processing improves latency and throughput with faster storage. Avoiding workqueue IO submission for writes (DM_CRYPT_NO_WRITE_WORKQUEUE) is a requirement for adding zoned block device support to DM crypt. - Add zoned block device support to DM crypt. Makes use of DM_CRYPT_NO_WRITE_WORKQUEUE and a new optional feature (DM_CRYPT_WRITE_INLINE) that allows write completion to wait for encryption to complete. This allows write ordering to be preserved, which is needed for zoned block devices. - Fix DM ebs target's check for REQ_OP_FLUSH. - Fix DM core's report zones support to not report more zones than were requested. - A few small compiler warning fixes. - DM dust improvements to return output directly to the user rather than require they scrape the system log for output. * tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm: don't call report zones for more than the user requested dm ebs: Fix incorrect checking for REQ_OP_FLUSH dm init: Set file local variable static dm ioctl: Fix compilation warning dm raid: Remove empty if statement dm verity: Fix compilation warning dm crypt: Enable zoned block device support dm crypt: add flags to optionally bypass kcryptd workqueues dm bufio: do buffer cleanup from a workqueue dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue() dm dust: add interface to list all badblocks dm dust: report some message results directly back to user dm verity: add "panic_on_corruption" error handling mode dm mpath: use double checked locking in fast path dm mpath: rename current_pgpath to pgpath in multipath_prepare_ioctl dm mpath: rework __map_bio() dm mpath: factor out multipath_queue_bio dm mpath: push locking down to must_push_back_rq() dm mpath: take m->lock spinlock when testing QUEUE_IF_NO_PATH dm mpath: changes from initial m->flags locking audit
| * | | | | dm: don't call report zones for more than the user requestedJohannes Thumshirn2020-08-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't call report zones for more zones than the user actually requested, otherwise this can lead to out-of-bounds accesses in the callback functions. Such a situation can happen if the target's ->report_zones() callback function returns 0 because we've reached the end of the target and then restart the report zones on the second target. We're again calling into ->report_zones() and ultimately into the user supplied callback function but when we're not subtracting the number of zones already processed this may lead to out-of-bounds accesses in the user callbacks. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Fixes: d41003513e61 ("block: rework zone reporting") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm ebs: Fix incorrect checking for REQ_OP_FLUSHJohn Dorminy2020-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | REQ_OP_FLUSH was being treated as a flag, but the operation part of bio->bi_opf must be treated as a whole. Change to accessing the operation part via bio_op(bio) and checking for equality. Signed-off-by: John Dorminy <jdorminy@redhat.com> Acked-by: Heinz Mauelshagen <heinzm@redhat.com> Fixes: d3c7b35c20d60 ("dm: add emulated block size target") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm init: Set file local variable staticDamien Le Moal2020-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declare dm_allowed_targets as static to avoid the warning: drivers/md/dm-init.c:39:12: warning: symbol 'dm_allowed_targets' was not declared. Should it be static? when compiling with C=1. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm ioctl: Fix compilation warningDamien Le Moal2020-08-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In retrieve_status(), when copying the target type name in the target_type string field of struct dm_target_spec, copy at most DM_MAX_TYPE_NAME - 1 character to avoid the compilation warning: warning: ‘__builtin_strncpy’ specified bound 16 equals destination size [-Wstringop-truncation] when compiling with W-1. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm raid: Remove empty if statementDamien Le Moal2020-08-041-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In super_init_validation(), remove a body-less if statement testing only variables to avoid a compilation warning when compiling with W=1. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm verity: Fix compilation warningDamien Le Moal2020-08-041-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the case !CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG, declare the functions verity_verify_root_hash(), verity_verify_is_sig_opt_arg(), verity_verify_sig_parse_opt_args() and verity_verify_sig_opts_cleanup() as inline to avoid a "no previous prototype for xxx" compilation warning when compiling with W=1. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm crypt: Enable zoned block device supportDamien Le Moal2020-07-201-7/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable support for zoned block devices. This is done by: 1) implementing the target report_zones method. 2) adding the DM_TARGET_ZONED_HM flag to the target features. 3) setting DM_CRYPT_NO_WRITE_WORKQUEUE flag to avoid IO processing via workqueue. 4) Introducing inline write encryption completion to preserve write ordering. The last point is implemented by introducing the internal flag DM_CRYPT_WRITE_INLINE. When set, kcryptd_crypt_write_convert() always waits inline for the completion of a write request encryption if the request is not already completed once crypt_convert() returns. Completion of write request encryption is signaled using the restart completion by kcryptd_async_done(). This mechanism allows using ciphers that have an asynchronous implementation, isolating dm-crypt from any potential request completion reordering for these ciphers. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm crypt: add flags to optionally bypass kcryptd workqueuesIgnat Korchagin2020-07-201-8/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow up to [1] that detailed latency problems associated with dm-crypt's use of workqueues when processing IO. Current dm-crypt implementation creates a significant IO performance overhead (at least on small IO block sizes) for both latency and throughput. We suspect offloading IO request processing into workqueues and async threads is more harmful these days with the modern fast storage. I also did some digging into the dm-crypt git history and much of this async processing is not needed anymore, because the reasons it was added are mostly gone from the kernel. More details can be found in [2] (see "Git archeology" section). This change adds DM_CRYPT_NO_READ_WORKQUEUE and DM_CRYPT_NO_WRITE_WORKQUEUE flags for read and write BIOs, which direct dm-crypt to not offload crypto operations into kcryptd workqueues. In addition, writes are not buffered to be sorted in the dm-crypt red-black tree, but dispatched immediately. For cases, where crypto operations cannot happen (hard interrupt context, for example the read path of some NVME drivers), we offload the work to a tasklet rather than a workqueue. These flags only ensure no async BIO processing in the dm-crypt module. It is worth noting that some Crypto API implementations may offload encryption into their own workqueues, which are independent of the dm-crypt and its configuration. However upon enabling these new flags dm-crypt will instruct Crypto API not to backlog crypto requests. To give an idea of the performance gains for certain workloads, consider the script, and results when tested against various devices, detailed here: https://www.redhat.com/archives/dm-devel/2020-July/msg00138.html [1]: https://www.spinics.net/lists/dm-crypt/msg07516.html [2]: https://blog.cloudflare.com/speeding-up-linux-disk-encryption/ Signed-off-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm bufio: do buffer cleanup from a workqueueMikulas Patocka2020-07-201-19/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, DM bufio's waiting for IO from reclaim context in its shrinker has caused kswapd to block; which results in systemic IO stalls and even deadlock, e.g.: https://www.redhat.com/archives/dm-devel/2020-March/msg00025.html Here is Dave Chinner's problem description that motivated this fix, from: https://lore.kernel.org/linux-fsdevel/20190809215733.GZ7777@dread.disaster.area/ "Waiting for IO in kswapd reclaim context is considered harmful - kswapd context shrinker reclaim should be as non-blocking as possible, and any back-off to wait for IO to complete should be done by the high level reclaim core once it's completed an entire reclaim scan cycle of everything.... What follows from that, and is pertinent in this situation, is that if you don't block kswapd, then other reclaim contexts are not going to get stuck waiting for it regardless of the reclaim context they use." Continued elsewhere: "The only way to fix this problem once and for all is to stop using the shrinker as a mechanism to issue and wait on IO. If you need background writeback of dirty buffers, do it from a WQ_MEM_RECLAIM workqueue that isn't directly in the memory reclaim path and so can issue writeback and block safely from a GFP_KERNEL context. Kick the workqueue from the shrinker context, but get rid of the IO submission and waiting from the shrinker and all the GFP_NOFS memory reclaim recursion problems go away." As such, this commit moves buffer cleanup to a workqueue. Suggested-by: Dave Chinner <dchinner@redhat.com> Reported-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: Gabriel Krisman Bertazi <krisman@collabora.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue()Ming Lei2020-07-201-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dm_stop_queue() only uses blk_mq_quiesce_queue() so it doesn't formally stop the blk-mq queue; therefore there is no point making the blk_mq_queue_stopped() check -- it will never be stopped. In addition, even though dm_stop_queue() actually tries to quiesce hw queues via blk_mq_quiesce_queue(), checking with blk_queue_quiesced() to avoid unnecessary queue quiesce isn't reliable because: the QUEUE_FLAG_QUIESCED flag is set before synchronize_rcu() and dm_stop_queue() may be called when synchronize_rcu() from another blk_mq_quiesce_queue() is in-progress. Fixes: 7b17c2f7292ba ("dm: Fix a race condition related to stopping and starting queues") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm dust: add interface to list all badblocksyangerkun2020-07-201-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This interface may help anyone who want to know all badblocks without querying for each block. [Bryan: DMEMIT message if no blocks are in the bad block list.] Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Bryan Gurney <bgurney@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm dust: report some message results directly back to useryangerkun2020-07-201-13/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some messages (queryblock, countbadblocks, removebadblock) are best reported directly to user directly. Do so with DMEMIT. [Bryan: maintain __func__ output in DMEMIT messages] Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Bryan Gurney <bgurney@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm verity: add "panic_on_corruption" error handling modeJeongHyeon Lee2020-07-132-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Samsung smart phones may need the ability to panic on corruption. Not all devices provide the bootloader support needed to use the existing "restart_on_corruption" mode. Additional details for why Samsung needs this new mode can be found here: https://www.redhat.com/archives/dm-devel/2020-June/msg00235.html Signed-off-by: jhs2.lee <jhs2.lee@samsung.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm mpath: use double checked locking in fast pathMike Snitzer2020-07-131-9/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fast-path code biased toward lazy acknowledgement of bit being set (primarily only for initialization). Multipath code is very retry oriented so even if state is missed it'll recover. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm mpath: rename current_pgpath to pgpath in multipath_prepare_ioctlMike Snitzer2020-07-131-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Makes consistent with __map_bio() and multipath_clone_and_map(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm mpath: rework __map_bio()Mike Snitzer2020-07-131-14/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | so that it follows same pattern as request-based multipath_clone_and_map() Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm mpath: factor out multipath_queue_bioMike Snitzer2020-07-131-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enables further cleanup. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm mpath: push locking down to must_push_back_rq()Mike Snitzer2020-07-131-11/+14
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm mpath: take m->lock spinlock when testing QUEUE_IF_NO_PATHMike Snitzer2020-07-131-18/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix multipath_end_io, multipath_end_io_bio and multipath_busy to take m->lock while testing if MPATHF_QUEUE_IF_NO_PATH bit is set. These are all slow-path cases when no paths are available so extra locking isn't a performance hit. Correctness matters most. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * | | | | dm mpath: changes from initial m->flags locking auditMike Snitzer2020-07-131-3/+13
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | Fix locking in slow-paths where m->lock should be taken. Signed-off-by: Mike Snitzer <snitzer@rredhat.com>
* | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2020-08-072-19/+19
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc updates from Andrew Morton: - a few MM hotfixes - kthread, tools, scripts, ntfs and ocfs2 - some of MM Subsystems affected by this patch series: kthread, tools, scripts, ntfs, ocfs2 and mm (hofixes, pagealloc, slab-generic, slab, slub, kcsan, debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore, sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan). * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits) mm: vmscan: consistent update to pgrefill mm/vmscan.c: fix typo khugepaged: khugepaged_test_exit() check mmget_still_valid() khugepaged: retract_page_tables() remember to test exit khugepaged: collapse_pte_mapped_thp() protect the pmd lock khugepaged: collapse_pte_mapped_thp() flush the right range mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible mm: thp: replace HTTP links with HTTPS ones mm/page_alloc: fix memalloc_nocma_{save/restore} APIs mm/page_alloc.c: skip setting nodemask when we are in interrupt mm/page_alloc: fallbacks at most has 3 elements mm/page_alloc: silence a KASAN false positive mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask() mm/page_alloc.c: simplify pageblock bitmap access mm/page_alloc.c: extract the common part in pfn_to_bitidx() mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits mm/shuffle: remove dynamic reconfiguration mm/memory_hotplug: document why shuffle_zone() is relevant mm/page_alloc: remove nr_free_pagecache_pages() mm: remove vm_total_pages ...
| * | | | | mm, treewide: rename kzfree() to kfree_sensitive()Waiman Long2020-08-072-19/+19
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As said by Linus: A symmetric naming is only helpful if it implies symmetries in use. Otherwise it's actively misleading. In "kzalloc()", the z is meaningful and an important part of what the caller wants. In "kzfree()", the z is actively detrimental, because maybe in the future we really _might_ want to use that "memfill(0xdeadbeef)" or something. The "zero" part of the interface isn't even _relevant_. The main reason that kzfree() exists is to clear sensitive information that should not be leaked to other future users of the same memory objects. Rename kzfree() to kfree_sensitive() to follow the example of the recently added kvfree_sensitive() and make the intention of the API more explicit. In addition, memzero_explicit() is used to clear the memory to make sure that it won't get optimized away by the compiler. The renaming is done by using the command sequence: git grep -w --name-only kzfree |\ xargs sed -i 's/kzfree/kfree_sensitive/' followed by some editing of the kfree_sensitive() kerneldoc and adding a kzfree backward compatibility macro in slab.h. [akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h] [akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more] Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: James Morris <jmorris@namei.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Joe Perches <joe@perches.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: David Rientjes <rientjes@google.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: "Jason A . Donenfeld" <Jason@zx2c4.com> Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'powerpc-5.9-1' of ↵Linus Torvalds2020-08-071-1/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for (optionally) using queued spinlocks & rwlocks. - Support for a new faster system call ABI using the scv instruction on Power9 or later. - Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on Power10 and future processors, leaving us with no way to implement the functionality it requests. This risks breaking userspace, though we believe it is unused in practice. - A bug fix for, and then the removal of, our custom stack expansion checking. We now allow stack expansion up to the rlimit, like other architectures. - Remove the remnants of our (previously disabled) topology update code, which tried to react to NUMA layout changes on virtualised systems, but was prone to crashes and other problems. - Add PMU support for Power10 CPUs. - A change to our signal trampoline so that we don't unbalance the link stack (branch return predictor) in the signal delivery path. - Lots of other cleanups, refactorings, smaller features and so on as usual. Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong, YueHaibing. * tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits) selftests/powerpc: Fix pkey syscall redefinitions powerpc: Fix circular dependency between percpu.h and mmu.h powerpc/powernv/sriov: Fix use of uninitialised variable selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs powerpc/40x: Fix assembler warning about r0 powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric powerpc/papr_scm: Fetch nvdimm performance stats from PHYP cpuidle: pseries: Fixup exit latency for CEDE(0) cpuidle: pseries: Add function to parse extended CEDE records cpuidle: pseries: Set the latency-hint before entering CEDE selftests/powerpc: Fix online CPU selection powerpc/perf: Consolidate perf_callchain_user_[64|32]() powerpc/pseries/hotplug-cpu: Remove double free in error path powerpc/pseries/mobility: Add pr_debug() for device tree changes powerpc/pseries/mobility: Set pr_fmt() powerpc/cacheinfo: Warn if cache object chain becomes unordered powerpc/cacheinfo: Improve diagnostics about malformed cache lists powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages powerpc/cacheinfo: Set pr_fmt() powerpc: fix function annotations to avoid section mismatch warnings with gcc-10 ...
| * | | | | libnvdimm/nvdimm/flush: Allow architecture to override the flush barrierAneesh Kumar K.V2020-07-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Architectures like ppc64 provide persistent memory specific barriers that will ensure that all stores for which the modifications are written to persistent storage by preceding dcbfps and dcbstps instructions have updated persistent storage before any data access or data transfer caused by subsequent instructions is initiated. This is in addition to the ordering done by wmb() Update nvdimm core such that architecture can use barriers other than wmb to ensure all previous writes are architecturally visible for the platform buffer flush. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200701072235.223558-5-aneesh.kumar@linux.ibm.com
* | | | | | Merge branch 'hch.init_path' of ↵Linus Torvalds2020-08-074-27/+317
|\ \ \ \ \ \ | |_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull init and set_fs() cleanups from Al Viro: "Christoph's 'getting rid of ksys_...() uses under KERNEL_DS' series" * 'hch.init_path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (50 commits) init: add an init_dup helper init: add an init_utimes helper init: add an init_stat helper init: add an init_mknod helper init: add an init_mkdir helper init: add an init_symlink helper init: add an init_link helper init: add an init_eaccess helper init: add an init_chmod helper init: add an init_chown helper init: add an init_chroot helper init: add an init_chdir helper init: add an init_rmdir helper init: add an init_unlink helper init: add an init_umount helper init: add an init_mount helper init: mark create_dev as __init init: mark console_on_rootfs as __init init: initialize ramdisk_execute_command at compile time devtmpfs: refactor devtmpfsd() ...
| * | | | | init: add an init_stat helperChristoph Hellwig2020-07-311-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a simple helper to stat with a kernel space file name and switch the early init code over to it. Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | | | md: rewrite md_setup_drive to avoid ioctlsChristoph Hellwig2020-07-163-85/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | md_setup_drive knows it works with md devices, so it is rather pointless to open a file descriptor and issue ioctls. Just call directly into the relevant low-level md routines after getting a handle to the device using blkdev_get_by_dev instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: NeilBrown <neilb@suse.de> Acked-by: Song Liu <song@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | md: simplify md_setup_driveChristoph Hellwig2020-07-161-102/+101
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the loop over the possible arrays into the caller to remove a level of indentation for the whole function. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: NeilBrown <neilb@suse.de> Acked-by: Song Liu <song@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | md: remove the kernel version of md_u.hChristoph Hellwig2020-07-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mdp_major can just move to drivers/md/md.h. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | md: remove the autoscan partition re-readChristoph Hellwig2020-07-161-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | devfs is long gone, and autoscan works just fine without this these days. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: NeilBrown <neilb@suse.de> Acked-by: Song Liu <song@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | md: replace the RAID_AUTORUN ioctl with a direct function callChristoph Hellwig2020-07-163-21/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of using a spcial RAID_AUTORUN ioctl that only exists for non-modular builds and is only called from the early init code, just call the actual function directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: NeilBrown <neilb@suse.de> Acked-by: Song Liu <song@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | md: move the early init autodetect code to drivers/md/Christoph Hellwig2020-07-162-0/+318
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just like the NFS and CIFS root code this better lives with the driver it is tightly integrated with. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Song Liu <song@kernel.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'for-5.9/block-merge-20200804' of git://git.kernel.dk/linux-blockLinus Torvalds2020-08-051-20/+2
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block stacking updates from Jens Axboe: "The stacking related fixes depended on both the core block and drivers branches, so here's a topic branch with that change. Outside of that, a late fix from Johannes for zone revalidation" * tag 'for-5.9/block-merge-20200804' of git://git.kernel.dk/linux-block: block: don't do revalidate zones on invalid devices block: remove blk_queue_stack_limits block: remove bdev_stack_limits block: inherit the zoned characteristics in blk_stack_limits
| * | | | | block: remove bdev_stack_limitsChristoph Hellwig2020-07-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function is just a tiny wrapper around blk_stack_limit and has two callers. Simplify the stack a bit by open coding it in the two callers. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | block: inherit the zoned characteristics in blk_stack_limitsChristoph Hellwig2020-07-201-19/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lift the code from device mapper into blk_stack_limits to inherity the stacking limitations. This ensures we do the right thing for all stacked zoned block devices. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Tested-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | Merge branch 'for-5.9/drivers' into for-5.9/block-mergeJens Axboe2020-07-207-36/+101
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * for-5.9/drivers: (38 commits) block: add max_active_zones to blk-sysfs block: add max_open_zones to blk-sysfs s390/dasd: Use struct_size() helper s390/dasd: fix inability to use DASD with DIAG driver md-cluster: fix wild pointer of unlock_all_bitmaps() md/raid5-cache: clear MD_SB_CHANGE_PENDING before flushing stripes md: fix deadlock causing by sysfs_notify md: improve io stats accounting md: raid0/linear: fix dereference before null check on pointer mddev rsxx: switch from 'pci_free_consistent()' to 'dma_free_coherent()' nvme: remove ns->disk checks nvme-pci: use standard block status symbolic names nvme-pci: use the consistent return type of nvme_pci_iod_alloc_size() nvme-pci: add a blank line after declarations nvme-pci: fix some comments issues nvme-pci: remove redundant segment validation nvme: document quirked Intel models nvme: expose reconnect_delay and ctrl_loss_tmo via sysfs nvme: support for zoned namespaces nvme: support for multiple Command Sets Supported and Effects log pages ...
| * \ \ \ \ \ Merge branch 'for-5.9/block' into for-5.9/block-mergeJens Axboe2020-07-2033-486/+122
| |\ \ \ \ \ \ | | |_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * for-5.9/block: (124 commits) blk-cgroup: show global disk stats in root cgroup io.stat blk-cgroup: make iostat functions visible to stat printing block: improve discard bio alignment in __blkdev_issue_discard() block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers block: defer flush request no matter whether we have elevator block: make blk_timeout_init() static block: remove retry loop in ioc_release_fn() block: remove unnecessary ioc nested locking block: integrate bd_start_claiming into __blkdev_get block: use bd_prepare_to_claim directly in the loop driver block: refactor bd_start_claiming block: simplify the restart case in __blkdev_get Revert "blk-rq-qos: remove redundant finish_wait to rq_qos_wait." block: always remove partitions from blk_drop_partitions() block: relax jiffies rounding for timeouts blk-mq: remove redundant validation in __blk_mq_end_request() blk-mq: Remove unnecessary local variable writeback: remove bdi->congested_fn writeback: remove struct bdi_writeback_congested writeback: remove {set,clear}_wb_congested ...
* | | | | | | Merge tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-blockLinus Torvalds2020-08-0525-355/+913
|\ \ \ \ \ \ \ | | |_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver updates from Jens Axboe: - NVMe: - ZNS support (Aravind, Keith, Matias, Niklas) - Misc cleanups, optimizations, fixes (Baolin, Chaitanya, David, Dongli, Max, Sagi) - null_blk zone capacity support (Aravind) - MD: - raid5/6 fixes (ChangSyun) - Warning fixes (Damien) - raid5 stripe fixes (Guoqing, Song, Yufen) - sysfs deadlock fix (Junxiao) - raid10 deadlock fix (Vitaly) - struct_size conversions (Gustavo) - Set of bcache updates/fixes (Coly) * tag 'for-5.9/drivers-20200803' of git://git.kernel.dk/linux-block: (117 commits) md/raid5: Allow degraded raid6 to do rmw md/raid5: Fix Force reconstruct-write io stuck in degraded raid5 raid5: don't duplicate code for different paths in handle_stripe raid5-cache: hold spinlock instead of mutex in r5c_journal_mode_show md: print errno in super_written md/raid5: remove the redundant setting of STRIPE_HANDLE md: register new md sysfs file 'uuid' read-only md: fix max sectors calculation for super 1.0 nvme-loop: remove extra variable in create ctrl nvme-loop: set ctrl state connecting after init nvme-multipath: do not fall back to __nvme_find_path() for non-optimized paths nvme-multipath: fix logic for non-optimized paths nvme-rdma: fix controller reset hang during traffic nvme-tcp: fix controller reset hang during traffic nvmet: introduce the passthru Kconfig option nvmet: introduce the passthru configfs interface nvmet: Add passthru enable/disable helpers nvmet: add passthru code to process commands nvme: export nvme_find_get_ns() and nvme_put_ns() nvme: introduce nvme_ctrl_get_by_path() ...
| * | | | | | md/raid5: Allow degraded raid6 to do rmwChangSyun Peng2020-08-031-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Degraded raid6 always do reconstruct-write now. With raid6 xor supported, we can do rmw in degraded raid6. This patch can reduce many read IOs to improve performance. If the failed disk is P, Q or the disk we want to write to, we may need to do reconstruct-write in max degraded raid6. In this situation we can not read enough data from handle_stripe_dirtying() so we have to set force_rcw in handle_stripe_fill() to read all data. Reviewed-by: Alex Wu <alexwu@synology.com> Reviewed-by: BingJing Chang <bingjingc@synology.com> Reviewed-by: Danny Shih <dannyshih@synology.com> Signed-off-by: ChangSyun Peng <allenpeng@synology.com> Signed-off-by: Song Liu <songliubraving@fb.com>
| * | | | | | md/raid5: Fix Force reconstruct-write io stuck in degraded raid5ChangSyun Peng2020-08-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In degraded raid5, we need to read parity to do reconstruct-write when data disks fail. However, we can not read parity from handle_stripe_dirtying() in force reconstruct-write mode. Reproducible Steps: 1. Create degraded raid5 mdadm -C /dev/md2 --assume-clean -l5 -n3 /dev/sda2 /dev/sdb2 missing 2. Set rmw_level to 0 echo 0 > /sys/block/md2/md/rmw_level 3. IO to raid5 Now some io may be stuck in raid5. We can use handle_stripe_fill() to read the parity in this situation. Cc: <stable@vger.kernel.org> # v4.4+ Reviewed-by: Alex Wu <alexwu@synology.com> Reviewed-by: BingJing Chang <bingjingc@synology.com> Reviewed-by: Danny Shih <dannyshih@synology.com> Signed-off-by: ChangSyun Peng <allenpeng@synology.com> Signed-off-by: Song Liu <songliubraving@fb.com>
| * | | | | | raid5: don't duplicate code for different paths in handle_stripeGuoqing Jiang2020-08-031-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we can see, R5_LOCKED is set and s.locked is increased whether R5_ReWrite is set or not, so move it to common path. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
| * | | | | | raid5-cache: hold spinlock instead of mutex in r5c_journal_mode_showGuoqing Jiang2020-08-031-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace mddev_lock with spin_lock to align with other show methods in raid5_attrs. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>