summaryrefslogtreecommitdiffstats
path: root/block/genhd.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* block: call into the file system for bdev_mark_deadChristoph Hellwig2023-08-211-20/+24
| | | | | | | | | | | | | | Combine the newly merged bdev_mark_dead helper with the existing mark_dead holder operation so that all operations that invalidate a device that is dead or being removed now go through the holder ops. This allows file systems to explicitly shutdown either ASAP (for a surprise removal) or after writing back data (for an orderly removal), and do so not only for the main device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Message-Id: <20230811100828.1897174-15-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
* block: consolidate __invalidate_device and fsync_bdevChristoph Hellwig2023-08-211-2/+1
| | | | | | | | | | | | | | | | | | | | | | | We currently have two interfaces that take a block_devices and the find a mounted file systems to flush or invaldidate data on it. Both are a bit problematic because they only work for the "main" block devices that is used as s_dev for the super_block, and because they don't call into the file system at all. Merge the two into a new bdev_mark_dead helper that does both the syncing and invalidation and which is properly documented. This is in preparation of merging the functionality into the ->mark_dead holder operation so that it will work on additional block devices used by a file systems and give us a single entry point for invalidation of dead devices or media. Note that a single standalone fsync_bdev call for an obscure ioctl remains for now, but that one will also be deal with in a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Message-Id: <20230811100828.1897174-14-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
* block: fix the exclusive open mask in disk_scan_partitionsChristoph Hellwig2023-06-211-1/+1
| | | | | | | | | | | FMODE_EXEC has nothing to do with exclusive opens, and even is of the wrong type. We need to check for BLK_OPEN_EXCL here. Fixes: 985958b8584c ("block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230621124914.185992-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()Yu Kuai2023-06-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | After commit 2736e8eeb0cc ("block: use the holder as indication for exclusive opens"), blkdev_get_by_dev() will warn if holder is NULL and mode contains 'FMODE_EXCL'. holder from blkdev_get_by_dev() from disk_scan_partitions() is always NULL, hence it should not use 'FMODE_EXCL', which is broben by the commit. For consequence, WARN_ON_ONCE() will be triggered from blkdev_get_by_dev() if user scan partitions with device opened exclusively. Fix this problem by removing 'FMODE_EXCL' from disk_scan_partitions(), as it used to be. Reported-by: syzbot+00cd27751f78817f167b@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=00cd27751f78817f167b Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230618140402.7556-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: fix blktrace debugfs entries leakageYu Kuai2023-06-151-1/+4
| | | | | | | | | | | | | | | | | | | | | Commit 99d055b4fd4b ("block: remove per-disk debugfs files in blk_unregister_queue") moves blk_trace_shutdown() from blk_release_queue() to blk_unregister_queue(), this is safe if blktrace is created through sysfs, however, there is a regression in corner case. blktrace can still be enabled after del_gendisk() through ioctl if the disk is opened before del_gendisk(), and if blktrace is not shutdown through ioctl before closing the disk, debugfs entries will be leaked. Fix this problem by shutdown blktrace in disk_release(), this is safe because blk_trace_remove() is reentrant. Fixes: 99d055b4fd4b ("block: remove per-disk debugfs files in blk_unregister_queue") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230610022003.2557284-4-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig2023-06-121-4/+4
| | | | | | | | | | | | | | The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: use the holder as indication for exclusive opensChristoph Hellwig2023-06-121-3/+2
| | | | | | | | | | | | | | | | | | The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move more code to early-lookup.cChristoph Hellwig2023-06-051-92/+0
| | | | | | | | | | | | blk_lookup_devt is only used by code in early-lookup.c, so move it there. printk_all_partitions and it's helper bdevt_str are only used by the early init code in init/do_mounts.c, so they should go there as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230531125535.676098-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: add a mark_dead holder operationChristoph Hellwig2023-06-051-0/+24
| | | | | | | | | | | | | Add a mark_dead method to blk_holder_ops that is called from blk_mark_disk_dead to notify the holder that the block device it is using has been marked dead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: introduce holder opsChristoph Hellwig2023-06-051-2/+4
| | | | | | | | | | | | | | Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: delete partitions later in del_gendiskChristoph Hellwig2023-06-051-5/+19
| | | | | | | | | | | | | | | | | | | Delay dropping the block_devices for partitions in del_gendisk until after the call to blk_mark_disk_dead, so that we can implementat notification of removed devices in blk_mark_disk_dead. This requires splitting a lower-level drop_partition helper out of delete_partition and using that from del_gendisk, while having a common loop for the whole device and partitions that calls remove_inode_hash, fsync_bdev and __invalidate_device before the call to blk_mark_disk_dead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: avoid repeated work in blk_mark_disk_deadChristoph Hellwig2023-06-051-1/+3
| | | | | | | | | | | | | Check if GD_DEAD is already set in blk_mark_disk_dead, and don't duplicate the work already done. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: consolidate the shutdown logic in blk_mark_disk_dead and del_gendiskChristoph Hellwig2023-06-051-14/+12
| | | | | | | | | | | | | | | | | | | | | | blk_mark_disk_dead does very similar work a a section of del_gendisk: - set the GD_DEAD flag - set the capacity to zero - start a queue drain but del_gendisk also sets QUEUE_FLAG_DYING on the queue if it is owned by the disk, sets the capacity to zero before starting the drain, and both with sending a uevent and kernel message for this fake capacity change. Move the exact logic from the more heavily used del_gendisk into blk_mark_disk_dead and then call blk_mark_disk_dead from del_gendisk. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Replace all non-returning strlcpy with strscpyAzeem Shaikh2023-06-011-1/+1
| | | | | | | | | | | | | | | | | | strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated [1]. In an effort to remove strlcpy() completely [2], replace strlcpy() here with strscpy(). No return values were used, so direct replacement is safe. [1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [2] https://github.com/KSPP/linux/issues/89 Signed-off-by: Azeem Shaikh <azeemshaikh38@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230530155608.272266-1-azeemshaikh38@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linuxLinus Torvalds2023-05-061-14/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more block updates from Jens Axboe: - MD pull request via Song: - Improve raid5 sequential IO performance on spinning disks, which fixes a regression since v6.0 (Jan Kara) - Fix bitmap offset types, which fixes an issue introduced in this merge window (Jonathan Derrick) - Cleanup of hweight type used for cgroup writeback (Maxim) - Fix a regression with the "has_submit_bio" changes across partitions (Ming) - Cleanup of QUEUE_FLAG_ADD_RANDOM clearing. We used to set this flag on queues non blk-mq queues, and hence some drivers clear it unconditionally. Since all of these have since been converted to true blk-mq drivers, drop the useless clear as the bit is not set (Chaitanya) - Fix the flags being set in a bio for a flush for drbd (Christoph) - Cleanup and deduplication of the code handling setting block device capacity (Damien) - Fix for ublk handling IO timeouts (Ming) - Fix for a regression in blk-cgroup teardown (Tao) - NBD documentation and code fixes (Eric) - Convert blk-integrity to using device_attributes rather than a second kobject to manage lifetimes (Thomas) * tag 'for-6.4/block-2023-05-06' of git://git.kernel.dk/linux: ublk: add timeout handler drbd: correctly submit flush bio on barrier mailmap: add mailmap entries for Jens Axboe block: Skip destroyed blkg when restart in blkg_destroy_all() writeback: fix call of incorrect macro md: Fix bitmap offset type in sb writer md/raid5: Improve performance for sequential IO docs nbd: userspace NBD now favors github over sourceforge block nbd: use req.cookie instead of req.handle uapi nbd: add cookie alias to handle uapi nbd: improve doc links to userspace spec blk-integrity: register sysfs attributes on struct device blk-integrity: convert to struct device_attribute blk-integrity: use sysfs_emit block/drivers: remove dead clear of random flag block: sync part's ->bd_has_submit_bio with disk's block: Cleanup set_capacity()/bdev_set_nr_sectors()
| * blk-integrity: register sysfs attributes on struct deviceThomas Weißschuh2023-04-271-8/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "integrity" kobject only acted as a holder for static sysfs entries. It also was embedded into struct gendisk without managing it, violating assumptions of the driver core. Instead register the sysfs entries directly onto the struct device. Also drop the now unused member integrity_kobj from struct gendisk. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20230309-kobj_release-gendisk_integrity-v3-3-ceccb4493c46@weissschuh.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: Cleanup set_capacity()/bdev_set_nr_sectors()Damien Le Moal2023-04-241-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code for setting a block device capacity (bd_nr_sectors field of struct block_device) is duplicated in set_capacity() and bdev_set_nr_sectors(). Clean this up by making bdev_set_nr_sectors() a block layer internal function defined in block/bdev.c instead of having this function statically defined in block/partitions/core.c. With this change, set_capacity() implementation can be simplified to only calling bdev_set_nr_sectors(). Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230424131318.79935-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'driver-core-6.4-rc1' of ↵Linus Torvalds2023-04-271-13/+7
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...
| * \ Merge 6.3-rc5 into driver-core-nextGreg Kroah-Hartman2023-04-031-1/+1
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | We need the fixes in here for testing, as well as the driver core changes for documentation updates to build on. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | driver core: class: remove dev_kobj from struct classGreg Kroah-Hartman2023-03-311-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dev_kobj field in struct class is now only written to, but never read from, so it can be removed as it is useless. Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20230331093318.82288-5-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | driver core: remove CONFIG_SYSFS_DEPRECATED and CONFIG_SYSFS_DEPRECATED_V2Greg Kroah-Hartman2023-03-061-12/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_SYSFS_DEPRECATED was added in commit 88a22c985e35 ("CONFIG_SYSFS_DEPRECATED") in 2006 to allow systems with older versions of some tools (i.e. Fedora 3's version of udev) to boot properly. Four years later, in 2010, the option was attempted to be removed as most of userspace should have been fixed up properly by then, but some kernel developers clung to those old systems and refused to update, so we added CONFIG_SYSFS_DEPRECATED_V2 in commit e52eec13cd6b ("SYSFS: Allow boot time switching between deprecated and modern sysfs layout") to allow them to continue to boot properly, and we allowed a boot time parameter to be used to switch back to the old format if needed. Over time, the logic that was covered under these config options was slowly removed from individual driver subsystems successfully, removed, and the only thing that is now left in the kernel are some changes in the block layer's representation in sysfs where real directories are used instead of symlinks like normal. Because the original changes were done to userspace tools in 2006, and all distros that use those tools are long end-of-life, and older non-udev-based systems do not care about the block layer's sysfs representation, it is time to finally remove this old logic and the config entries from the kernel. Cc: Jonathan Corbet <corbet@lwn.net> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: linux-block@vger.kernel.org Cc: linux-doc@vger.kernel.org Acked-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20230223073326.2073220-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | Merge tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linuxLinus Torvalds2023-04-261-0/+3
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: - drbd patches, bringing us closer to unifying the out-of-tree version and the in tree one (Andreas, Christoph) - support for auto-quiesce for the s390 dasd driver (Stefan) - MD pull request via Song: - md/bitmap: Optimal last page size (Jon Derrick) - Various raid10 fixes (Yu Kuai, Li Nan) - md: add error_handlers for raid0 and linear (Mariusz Tkaczyk) - NVMe pull request via Christoph: - Drop redundant pci_enable_pcie_error_reporting (Bjorn Helgaas) - Validate nvmet module parameters (Chaitanya Kulkarni) - Fence TCP socket on receive error (Chris Leech) - Fix async event trace event (Keith Busch) - Minor cleanups (Chaitanya Kulkarni, zhenwei pi) - Fix and cleanup nvmet Identify handling (Damien Le Moal, Christoph Hellwig) - Fix double blk_mq_complete_request race in the timeout handler (Lei Yin) - Fix irq locking in nvme-fcloop (Ming Lei) - Remove queue mapping helper for rdma devices (Sagi Grimberg) - use structured request attribute checks for nbd (Jakub) - fix blk-crypto race conditions between keyslot management (Eric) - add sed-opal support for reading read locking range attributes (Ondrej) - make fault injection configurable for null_blk (Akinobu) - clean up the request insertion API (Christoph) - clean up the queue running API (Christoph) - blkg config helper cleanups (Tejun) - lazy init support for blk-iolatency (Tejun) - various fixes and tweaks to ublk (Ming) - remove hybrid polling. It hasn't really been useful since we got async polled IO support, and these days we don't support sync polled IO at all (Keith) - misc fixes, cleanups, improvements (Zhong, Ondrej, Colin, Chengming, Chaitanya, me) * tag 'for-6.4/block-2023-04-21' of git://git.kernel.dk/linux: (118 commits) nbd: fix incomplete validation of ioctl arg ublk: don't return 0 in case of any failure sed-opal: geometry feature reporting command null_blk: Always check queue mode setting from configfs block: ublk: switch to ioctl command encoding blk-mq: fix the blk_mq_add_to_requeue_list call in blk_kick_flush block, bfq: Fix division by zero error on zero wsum fault-inject: fix build error when FAULT_INJECTION_CONFIGFS=y and CONFIGFS_FS=m block: store bdev->bd_disk->fops->submit_bio state in bdev block: re-arrange the struct block_device fields for better layout md/raid5: remove unused working_disks variable md/raid10: don't call bio_start_io_acct twice for bio which experienced read error md/raid10: fix memleak of md thread md/raid10: fix memleak for 'conf->bio_split' md/raid10: fix leak of 'r10bio->remaining' for recovery md/raid10: don't BUG_ON() in raise_barrier() md: fix soft lockup in status_resync md: add error_handlers for raid0 and linear md: Use optimal I/O size for last bitmap page md: Fix types in sb writer ...
| * | | block: store bdev->bd_disk->fops->submit_bio state in bdevJens Axboe2023-04-161-0/+3
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | We have a long chain of memory dereferencing just to whether or not this disk has a special submit_bio helper. As that's not necessarily the common case, add a bd_has_submit_bio state in the bdev to avoid traversing this memory dependency chain if we don't need to. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* / | block: don't set GD_NEED_PART_SCAN if scan partition failedYu Kuai2023-04-071-1/+7
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if disk_scan_partitions() failed, GD_NEED_PART_SCAN will still set, and partition scan will be proceed again when blkdev_get_by_dev() is called. However, this will cause a problem that re-assemble partitioned raid device will creat partition for underlying disk. Test procedure: mdadm -CR /dev/md0 -l 1 -n 2 /dev/sda /dev/sdb -e 1.0 sgdisk -n 0:0:+100MiB /dev/md0 blockdev --rereadpt /dev/sda blockdev --rereadpt /dev/sdb mdadm -S /dev/md0 mdadm -A /dev/md0 /dev/sda /dev/sdb Test result: underlying disk partition and raid partition can be observed at the same time Note that this can still happen in come corner cases that GD_NEED_PART_SCAN can be set for underlying disk while re-assemble raid device. Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* / block: fix wrong mode for blkdev_put() from disk_scan_partitions()Yu Kuai2023-03-071-1/+1
|/ | | | | | | | | | | | | | | | | If disk_scan_partitions() is called with 'FMODE_EXCL', blkdev_get_by_dev() will be called without 'FMODE_EXCL', however, follow blkdev_put() is still called with 'FMODE_EXCL', which will cause 'bd_holders' counter to leak. Fix the problem by using the right mode for blkdev_put(). Reported-by: syzbot+2bcc0d79e548c4f62a59@syzkaller.appspotmail.com Link: https://lore.kernel.org/lkml/f9649d501bc8c3444769418f6c26263555d9d3be.camel@linux.ibm.com/T/ Tested-by: Julian Ruess <julianr@linux.ibm.com> Fixes: e5cfefa97bcc ("block: fix scan partition for exclusively open device again") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linuxLinus Torvalds2023-03-031-9/+28
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - Don't access released socket during error recovery (Akinobu Mita) - Bring back auto-removal of deleted namespaces during sequential scan (Christoph Hellwig) - Fix an error code in nvme_auth_process_dhchap_challenge (Dan Carpenter) - Show well known discovery name (Daniel Wagner) - Add a missing endianess conversion in effects masking (Keith Busch) - Fix for a regression introduced in blk-rq-qos during init in this merge window (Breno) - Reorder a few fields in struct blk_mq_tag_set, eliminating a few holes and shrinking it (Christophe) - Remove redundant bdev_get_queue() NULL checks (Juhyung) - Add sed-opal single user mode support flag (Luca) - Remove SQE128 check in ublk as it isn't needed, saving some memory (Ming) - Op specific segment checking for cloned requests (Uday) - Exclusive open partition scan fixes (Yu) - Loop offset/size checking before assigning them in the device (Zhong) - Bio polling fixes (me) * tag 'block-6.3-2023-03-03' of git://git.kernel.dk/linux: blk-mq: enforce op-specific segment limits in blk_insert_cloned_request nvme-fabrics: show well known discovery name nvme-tcp: don't access released socket during error recovery nvme-auth: fix an error code in nvme_auth_process_dhchap_challenge() nvme: bring back auto-removal of deleted namespaces during sequential scan blk-iocost: Pass gendisk to ioc_refresh_params nvme: fix sparse warning on effects masking block: be a bit more careful in checking for NULL bdev while polling block: clear bio->bi_bdev when putting a bio back in the cache loop: loop_set_status_from_info() check before assignment ublk: remove check IO_URING_F_SQE128 in ublk_ch_uring_cmd block: remove more NULL checks after bdev_get_queue() blk-mq: Reorder fields in 'struct blk_mq_tag_set' block: fix scan partition for exclusively open device again block: Revert "block: Do not reread partition table on exclusively open device" sed-opal: add support flag for SUM in status ioctl
| * block: fix scan partition for exclusively open device againYu Kuai2023-02-171-4/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As explained in commit 36369f46e917 ("block: Do not reread partition table on exclusively open device"), reread partition on the device that is exclusively opened by someone else is problematic. This patch will make sure partition scan will only be proceed if current thread open the device exclusively, or the device is not opened exclusively, and in the later case, other scanners and exclusive openers will be blocked temporarily until partition scan is done. Fixes: 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") Cc: <stable@vger.kernel.org> Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230217022200.3092987-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: Revert "block: Do not reread partition table on exclusively open device"Yu Kuai2023-02-171-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 36369f46e91785688a5f39d7a5590e3f07981316. This patch can't fix the problem in a corner case that device can be opened exclusively after the checking and before blkdev_get_by_dev(). We'll use a new solution to fix the problem in the next patch, and the new solution doesn't need to change apis. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230217022200.3092987-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'driver-core-6.3-rc1' of ↵Linus Torvalds2023-02-241-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.3-rc1. There's a lot of changes this development cycle, most of the work falls into two different categories: - fw_devlink fixes and updates. This has gone through numerous review cycles and lots of review and testing by lots of different devices. Hopefully all should be good now, and Saravana will be keeping a watch for any potential regression on odd embedded systems. - driver core changes to work to make struct bus_type able to be moved into read-only memory (i.e. const) The recent work with Rust has pointed out a number of areas in the driver core where we are passing around and working with structures that really do not have to be dynamic at all, and they should be able to be read-only making things safer overall. This is the contuation of that work (started last release with kobject changes) in moving struct bus_type to be constant. We didn't quite make it for this release, but the remaining patches will be finished up for the release after this one, but the groundwork has been laid for this effort. Other than that we have in here: - debugfs memory leak fixes in some subsystems - error path cleanups and fixes for some never-able-to-be-hit codepaths. - cacheinfo rework and fixes - Other tiny fixes, full details are in the shortlog All of these have been in linux-next for a while with no reported problems" [ Geert Uytterhoeven points out that that last sentence isn't true, and that there's a pending report that has a fix that is queued up - Linus ] * tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits) debugfs: drop inline constant formatting for ERR_PTR(-ERROR) OPP: fix error checking in opp_migrate_dentry() debugfs: update comment of debugfs_rename() i3c: fix device.h kernel-doc warnings dma-mapping: no need to pass a bus_type into get_arch_dma_ops() driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place Revert "driver core: add error handling for devtmpfs_create_node()" Revert "devtmpfs: add debug info to handle()" Revert "devtmpfs: remove return value of devtmpfs_delete_node()" driver core: cpu: don't hand-override the uevent bus_type callback. devtmpfs: remove return value of devtmpfs_delete_node() devtmpfs: add debug info to handle() driver core: add error handling for devtmpfs_create_node() driver core: bus: update my copyright notice driver core: bus: add bus_get_dev_root() function driver core: bus: constify bus_unregister() driver core: bus: constify some internal functions driver core: bus: constify bus_get_kset() driver core: bus: constify bus_register/unregister_notifier() driver core: remove private pointer from struct bus_type ...
| * driver core: make struct device_type.devnode() take a const *Greg Kroah-Hartman2023-01-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The devnode() callback in struct device_type should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Cc: Jens Axboe <axboe@kernel.dk> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Ben Widawsky <bwidawsk@kernel.org> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: Joel Stanley <joel@jms.id.au> Cc: Alistar Popple <alistair@popple.id.au> Cc: Eddie James <eajames@linux.ibm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Jilin Yuan <yuanjilin@cdjrlc.com> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Won Chung <wonchung@google.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20230111113018.459199-7-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"Christoph Hellwig2023-02-141-9/+8
| | | | | | | | | | | | | | | | This reverts commit 178fa7d49815ea8001f43ade37a22072829fd8ab. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"Christoph Hellwig2023-02-141-2/+2
| | | | | | | | | | | | | | | | | | This reverts commit c43332fe028c252a2a28e46be70a530f64fc3c9d as it is not needed without moving to disk references in the blkg. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | blk-cgroup: delay calling blkcg_exit_disk until disk_releaseChristoph Hellwig2023-02-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While del_gendisk ensures there is no outstanding I/O on the queue, it can't prevent block layer users from building new I/O. This leads to a NULL ->root_blkg reference in bio_associate_blkg when allocating a new bio on a shut down file system. Delay freeing the blk-cgroup subsystems from del_gendisk until disk_release to make sure the blkg and throttle information is still avaіlable for bio submitters, even if those bios will immediately fail. This now can cause a case where disk_release is called on a disk that hasn't been added. That's mostly harmless, except for a case in blk_throttl_exit that now needs to check for a NULL ->td pointer. Fixes: 178fa7d49815 ("blk-cgroup: delay blk-cgroup initialization until add_disk") Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230208063514.171485-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: stub out and deprecated the capability attribute on the gendiskChristoph Hellwig2023-02-061-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The capability attribute was added in 2017 to expose the kernel internal GENHD_FL_MEDIA_CHANGE_NOTIFY to userspace without ever adding a value to an UAPI header, and without ever setting it in any driver until it was finally removed in Linux 5.7. Deprecate the file and always return 0 instead of exposing the other internal and frequently renumbered other gendisk flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230203150209.3199115-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | blk-cgroup: delay blk-cgroup initialization until add_diskChristoph Hellwig2023-02-031-8/+9
|/ | | | | | | | | | | | | | | There is no need to initialize the cgroup code before the disk is marked live. Moving the cgroup initialization earlier will help to have a fully initialized struct device in the gendisk for the cgroup code to use in the future. Similarly tear the cgroup information down in del_gendisk to be symmetric and because none of the cgroup tracking is needed once non-passthrough I/O stops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'block-2023-01-06' of git://git.kernel.dk/linuxLinus Torvalds2023-01-061-0/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "The big change here is obviously the revert of the pktcdvd driver removal. Outside of that, just minor tweaks. In detail: - Re-instate the pktcdvd driver, which necessitates adding back bio_copy_data_iter() and the fops->devnode() hook for now (me) - Fix for splitting of a bio marked as NOWAIT, causing either nowait reads or writes to error with EAGAIN even if parts of the IO completed (me) - Fix for ublk, punting management commands to io-wq as they can all easily block for extended periods of time (Ming) - Removal of SRCU dependency for the block layer (Paul)" * tag 'block-2023-01-06' of git://git.kernel.dk/linux: block: Remove "select SRCU" Revert "pktcdvd: remove driver." Revert "block: remove devnode callback from struct block_device_operations" Revert "block: bio_copy_data_iter" ublk: honor IO_URING_F_NONBLOCK for handling control command block: don't allow splitting of a REQ_NOWAIT bio block: handle bio_split_to_limits() NULL return
| * Revert "block: remove devnode callback from struct block_device_operations"Jens Axboe2023-01-041-0/+11
| | | | | | | | | | | | | | | | This reverts commit 85d6ce58e493ac8b7122e2fbe3f41b94d6ebdc11. We're reinstating the pktcdvd driver, which needs this API. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'driver-core-6.2-rc1' of ↵Linus Torvalds2022-12-161-2/+2
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core and kernfs changes for 6.2-rc1. The "big" change in here is the addition of a new macro, container_of_const() that will preserve the "const-ness" of a pointer passed into it. The "problem" of the current container_of() macro is that if you pass in a "const *", out of it can comes a non-const pointer unless you specifically ask for it. For many usages, we want to preserve the "const" attribute by using the same call. For a specific example, this series changes the kobj_to_dev() macro to use it, allowing it to be used no matter what the const value is. This prevents every subsystem from having to declare 2 different individual macros (i.e. kobj_const_to_dev() and kobj_to_dev()) and having the compiler enforce the const value at build time, which having 2 macros would not do either. The driver for all of this have been discussions with the Rust kernel developers as to how to properly mark driver core, and kobject, objects as being "non-mutable". The changes to the kobject and driver core in this pull request are the result of that, as there are lots of paths where kobjects and device pointers are not modified at all, so marking them as "const" allows the compiler to enforce this. So, a nice side affect of the Rust development effort has been already to clean up the driver core code to be more obvious about object rules. All of this has been bike-shedded in quite a lot of detail on lkml with different names and implementations resulting in the tiny version we have in here, much better than my original proposal. Lots of subsystem maintainers have acked the changes as well. Other than this change, included in here are smaller stuff like: - kernfs fixes and updates to handle lock contention better - vmlinux.lds.h fixes and updates - sysfs and debugfs documentation updates - device property updates All of these have been in the linux-next tree for quite a while with no problems" * tag 'driver-core-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (58 commits) device property: Fix documentation for fwnode_get_next_parent() firmware_loader: fix up to_fw_sysfs() to preserve const usb.h: take advantage of container_of_const() device.h: move kobj_to_dev() to use container_of_const() container_of: add container_of_const() that preserves const-ness of the pointer driver core: fix up missed drivers/s390/char/hmcdrv_dev.c class.devnode() conversion. driver core: fix up missed scsi/cxlflash class.devnode() conversion. driver core: fix up some missing class.devnode() conversions. driver core: make struct class.devnode() take a const * driver core: make struct class.dev_uevent() take a const * cacheinfo: Remove of_node_put() for fw_token device property: Add a blank line in Kconfig of tests device property: Rename goto label to be more precise device property: Move PROPERTY_ENTRY_BOOL() a bit down device property: Get rid of __PROPERTY_ENTRY_ARRAY_EL*SIZE*() kernfs: fix all kernel-doc warnings and multiple typos driver core: pass a const * into of_device_uevent() kobject: kset_uevent_ops: make name() callback take a const * kobject: kset_uevent_ops: make filter() callback take a const * kobject: make kobject_namespace take a const * ...
| * driver core: make struct class.dev_uevent() take a const *Greg Kroah-Hartman2022-11-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dev_uevent() in struct class should not be modifying the device that is passed into it, so mark it as a const * and propagate the function signature changes out into all relevant subsystems that use this callback. Cc: Jens Axboe <axboe@kernel.dk> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Russ Weight <russell.h.weight@intel.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Johan Hovold <johan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Keith Busch <kbusch@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Wolfram Sang <wsa+renesas@sang-engineering.com> Cc: Raed Salem <raeds@nvidia.com> Cc: Chen Zhongjin <chenzhongjin@huawei.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Avihai Horon <avihaih@nvidia.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Jakob Koschel <jakobkoschel@gmail.com> Cc: Antoine Tenart <atenart@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Wang Yufen <wangyufen@huawei.com> Cc: linux-block@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-media@vger.kernel.org Cc: linux-nvme@lists.infradead.org Cc: linux-pm@vger.kernel.org Cc: linux-rdma@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Sebastian Reichel <sre@kernel.org> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Link: https://lore.kernel.org/r/20221123122523.1332370-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linuxLinus Torvalds2022-12-131-19/+16
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: - NVMe pull requests via Christoph: - Support some passthrough commands without CAP_SYS_ADMIN (Kanchan Joshi) - Refactor PCIe probing and reset (Christoph Hellwig) - Various fabrics authentication fixes and improvements (Sagi Grimberg) - Avoid fallback to sequential scan due to transient issues (Uday Shankar) - Implement support for the DEAC bit in Write Zeroes (Christoph Hellwig) - Allow overriding the IEEE OUI and firmware revision in configfs for nvmet (Aleksandr Miloserdov) - Force reconnect when number of queue changes in nvmet (Daniel Wagner) - Minor fixes and improvements (Uros Bizjak, Joel Granados, Sagi Grimberg, Christoph Hellwig, Christophe JAILLET) - Fix and cleanup nvme-fc req allocation (Chaitanya Kulkarni) - Use the common tagset helpers in nvme-pci driver (Christoph Hellwig) - Cleanup the nvme-pci removal path (Christoph Hellwig) - Use kstrtobool() instead of strtobool (Christophe JAILLET) - Allow unprivileged passthrough of Identify Controller (Joel Granados) - Support io stats on the mpath device (Sagi Grimberg) - Minor nvmet cleanup (Sagi Grimberg) - MD pull requests via Song: - Code cleanups (Christoph) - Various fixes - Floppy pull request from Denis: - Fix a memory leak in the init error path (Yuan) - Series fixing some batch wakeup issues with sbitmap (Gabriel) - Removal of the pktcdvd driver that was deprecated more than 5 years ago, and subsequent removal of the devnode callback in struct block_device_operations as no users are now left (Greg) - Fix for partition read on an exclusively opened bdev (Jan) - Series of elevator API cleanups (Jinlong, Christoph) - Series of fixes and cleanups for blk-iocost (Kemeng) - Series of fixes and cleanups for blk-throttle (Kemeng) - Series adding concurrent support for sync queues in BFQ (Yu) - Series bringing drbd a bit closer to the out-of-tree maintained version (Christian, Joel, Lars, Philipp) - Misc drbd fixes (Wang) - blk-wbt fixes and tweaks for enable/disable (Yu) - Fixes for mq-deadline for zoned devices (Damien) - Add support for read-only and offline zones for null_blk (Shin'ichiro) - Series fixing the delayed holder tracking, as used by DM (Yu, Christoph) - Series enabling bio alloc caching for IRQ based IO (Pavel) - Series enabling userspace peer-to-peer DMA (Logan) - BFQ waker fixes (Khazhismel) - Series fixing elevator refcount issues (Christoph, Jinlong) - Series cleaning up references around queue destruction (Christoph) - Series doing quiesce by tagset, enabling cleanups in drivers (Christoph, Chao) - Series untangling the queue kobject and queue references (Christoph) - Misc fixes and cleanups (Bart, David, Dawei, Jinlong, Kemeng, Ye, Yang, Waiman, Shin'ichiro, Randy, Pankaj, Christoph) * tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linux: (247 commits) blktrace: Fix output non-blktrace event when blk_classic option enabled block: sed-opal: Don't include <linux/kernel.h> sed-opal: allow using IOC_OPAL_SAVE for locking too blk-cgroup: Fix typo in comment block: remove bio_set_op_attrs nvmet: don't open-code NVME_NS_ATTR_RO enumeration nvme-pci: use the tagset alloc/free helpers nvme: add the Apple shared tag workaround to nvme_alloc_io_tag_set nvme: only set reserved_tags in nvme_alloc_io_tag_set for fabrics controllers nvme: consolidate setting the tagset flags nvme: pass nr_maps explicitly to nvme_alloc_io_tag_set block: bio_copy_data_iter nvme-pci: split out a nvme_pci_ctrl_is_dead helper nvme-pci: return early on ctrl state mismatch in nvme_reset_work nvme-pci: rename nvme_disable_io_queues nvme-pci: cleanup nvme_suspend_queue nvme-pci: remove nvme_pci_disable nvme-pci: remove nvme_disable_admin_queue nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl nvme: use nvme_wait_ready in nvme_shutdown_ctrl ...
| * block: remove devnode callback from struct block_device_operationsGreg Kroah-Hartman2022-12-031-11/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | With the removal of the pktcdvd driver, there are no in-kernel users of the devnode callback in struct block_device_operations, so it can be safely removed. If it is needed for new block drivers in the future, it can be brought back. Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20221203140747.1942969-1-gregkh@linuxfoundation.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: Do not reread partition table on exclusively open deviceJan Kara2022-12-011-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") we allow rereading of partition table although there are users of the block device. This has an undesirable consequence that e.g. if sda and sdb are assembled to a RAID1 device md0 with partitions, BLKRRPART ioctl on sda will rescan partition table and create sda1 device. This partition device under a raid device confuses some programs (such as libstorage-ng used for initial partitioning for distribution installation) leading to failures. Fix the problem refusing to rescan partitions if there is another user that has the block device exclusively open. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20221130135344.2ul4cyfstfs3znxg@quack3 Fixes: 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221130175653.24299-1-jack@suse.cz [axboe: fold in followup fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: remove delayed holder registrationChristoph Hellwig2022-11-161-4/+0
| | | | | | | | | | | | | | | | | | | | Now that dm has been fixed to track of holder registrations before add_disk, the somewhat buggy block layer code can be safely removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20221115141054.1051801-8-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: clear ->slave_dir when dropping the main slave_dir referenceChristoph Hellwig2022-11-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Zero out the pointer to ->slave_dir so that the holder code doesn't incorrectly treat the object as alive when add_disk failed or after del_gendisk was called. Fixes: 89f871af1b26 ("dm: delay registering the gendisk") Reported-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20221115141054.1051801-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * blk-mq: move the srcu_struct used for quiescing to the tagsetChristoph Hellwig2022-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All I/O submissions have fairly similar latencies, and a tagset-wide quiesce is a fairly common operation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Chao Leng <lengchao@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-12-hch@lst.de [axboe: fix whitespace] Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: set the disk capacity to 0 in blk_mark_disk_deadChristoph Hellwig2022-11-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nvme and xen-blkfront are already doing this to stop buffered writes from creating dirty pages that can't be written out later. Move it to the common code. This also removes the comment about the ordering from nvme, as bd_mutex not only is gone entirely, but also hasn't been used for locking updates to the disk size long before that, and thus the ordering requirement documented there doesn't apply any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Chao Leng <lengchao@huawei.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20221101150050.3510-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * blk-mq: move queue_is_mq out of blk_mq_cancel_work_syncJinlong Chen2022-10-311-1/+3
| | | | | | | | | | | | | | | | | | | | The only caller that needs queue_is_mq check is del_gendisk, so move the check into it. Signed-off-by: Jinlong Chen <nickyc975@zju.edu.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221030094730.1275463-1-nickyc975@zju.edu.cn Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: Fix possible memory leak for rq_wb on add_disk failureChen Zhongjin2022-10-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kmemleak reported memory leaks in device_add_disk(): kmemleak: 3 new suspected memory leaks unreferenced object 0xffff88800f420800 (size 512): comm "modprobe", pid 4275, jiffies 4295639067 (age 223.512s) hex dump (first 32 bytes): 04 00 00 00 08 00 00 00 01 00 00 00 00 00 00 00 ................ 00 e1 f5 05 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d3662699>] kmalloc_trace+0x26/0x60 [<00000000edc7aadc>] wbt_init+0x50/0x6f0 [<0000000069601d16>] wbt_enable_default+0x157/0x1c0 [<0000000028fc393f>] blk_register_queue+0x2a4/0x420 [<000000007345a042>] device_add_disk+0x6fd/0xe40 [<0000000060e6aab0>] nbd_dev_add+0x828/0xbf0 [nbd] ... It is because the memory allocated in wbt_enable_default() is not released in device_add_disk() error path. Normally, these memory are freed in: del_gendisk() rq_qos_exit() rqos->ops->exit(rqos); wbt_exit() So rq_qos_exit() is called to free the rq_wb memory for wbt_init(). However in the error path of device_add_disk(), only blk_unregister_queue() is called and make rq_wb memory leaked. Add rq_qos_exit() to the error path to fix it. Fixes: 83cbce957446 ("block: add error handling for device_add_disk / add_disk") Signed-off-by: Chen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221029071355.35462-1-chenzhongjin@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: fix memory leak for elevator on add_disk failureYu Kuai2022-10-221-4/+8
|/ | | | | | | | | | | | The default elevator is allocated in the beginning of device_add_disk(), however, it's not freed in the following error path. Fixes: 50e34d78815e ("block: disable the elevator int del_gendisk") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Yan <yanaijie@huawei.com> Link: https://lore.kernel.org/r/20221022021615.2756171-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge branch 'for-6.1/block' into block-6.1Jens Axboe2022-10-101-0/+7
|\ | | | | | | | | | | | | | | | | | | | | Merge in later fixes. * for-6.1/block: block: fix leaking minors of hidden disks block: avoid sign extend problem with default queue flags mask blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init() block: Remove the repeat word 'can' MAINTAINERS: Update SED-Opal Maintainers