summaryrefslogtreecommitdiffstats
path: root/block/bdev.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'vfs-6.8.super' of ↵Linus Torvalds2024-01-081-107/+151
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs super updates from Christian Brauner: "This contains the super work for this cycle including the long-awaited series by Jan to make it possible to prevent writing to mounted block devices: - Writing to mounted devices is dangerous and can lead to filesystem corruption as well as crashes. Furthermore syzbot comes with more and more involved examples how to corrupt block device under a mounted filesystem leading to kernel crashes and reports we can do nothing about. Add tracking of writers to each block device and a kernel cmdline argument which controls whether other writeable opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are allowed. Note that this effectively only prevents modification of the particular block device's page cache by other writers. The actual device content can still be modified by other means - e.g. by issuing direct scsi commands, by doing writes through devices lower in the storage stack (e.g. in case loop devices, DM, or MD are involved) etc. But blocking direct modifications of the block device page cache is enough to give filesystems a chance to perform data validation when loading data from the underlying storage and thus prevent kernel crashes. Syzbot can use this cmdline argument option to avoid uninteresting crashes. Also users whose userspace setup does not need writing to mounted block devices can set this option for hardening. We expect that this will be interesting to quite a few workloads. Btrfs is currently opted out of this because they still haven't merged patches we require for this to work from three kernel releases ago. - Reimplement block device freezing and thawing as holder operations on the block device. This allows us to extend block device freezing to all devices associated with a superblock and not just the main device. It also allows us to remove get_active_super() and thus another function that scans the global list of superblocks. Freezing via additional block devices only works if the filesystem chooses to use @fs_holder_ops for these additional devices as well. That currently only includes ext4 and xfs. Earlier releases switched get_tree_bdev() and mount_bdev() to use @fs_holder_ops. The remaining nilfs2 open-coded version of mount_bdev() has been converted to rely on @fs_holder_ops as well. So block device freezing for the main block device will continue to work as before. There should be no regressions in functionality. The only special case is btrfs where block device freezing for the main block device never worked because sb->s_bdev isn't set. Block device freezing for btrfs can be fixed once they can switch to @fs_holder_ops but that can happen whenever they're ready" * tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) block: Fix a memory leak in bdev_open_by_dev() super: don't bother with WARN_ON_ONCE() super: massage wait event mechanism ext4: Block writes to journal device xfs: Block writes to log device fs: Block writes to mounted block devices btrfs: Do not restrict writes to btrfs devices block: Add config option to not allow writing to mounted devices block: Remove blkdev_get_by_*() functions bcachefs: Convert to bdev_open_by_path() fs: handle freezing from multiple devices fs: remove dead check nilfs2: simplify device handling fs: streamline thaw_super_locked ext4: simplify device handling xfs: simplify device handling fs: simplify setup_bdev_super() calls blkdev: comment fs_holder_ops porting: document block device freeze and thaw changes fs: remove unused helper ...
| * block: Fix a memory leak in bdev_open_by_dev()Christophe JAILLET2023-12-281-2/+4
| | | | | | | | | | | | | | | | | | | | | | If we early exit here, 'handle' needs to be freed, or some memory leaks. Fixes: ed5cc702d311 ("block: Add config option to not allow writing to mounted devices") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/8eaec334781e695810aaa383b55de00ca4ab1352.1703439383.git.christophe.jaillet@wanadoo.fr Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
| * block: Add config option to not allow writing to mounted devicesJan Kara2023-11-181-1/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Writing to mounted devices is dangerous and can lead to filesystem corruption as well as crashes. Furthermore syzbot comes with more and more involved examples how to corrupt block device under a mounted filesystem leading to kernel crashes and reports we can do nothing about. Add tracking of writers to each block device and a kernel cmdline argument which controls whether other writeable opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are allowed. We will make filesystems use this flag for used devices. Note that this effectively only prevents modification of the particular block device's page cache by other writers. The actual device content can still be modified by other means - e.g. by issuing direct scsi commands, by doing writes through devices lower in the storage stack (e.g. in case loop devices, DM, or MD are involved) etc. But blocking direct modifications of the block device page cache is enough to give filesystems a chance to perform data validation when loading data from the underlying storage and thus prevent kernel crashes. Syzbot can use this cmdline argument option to avoid uninteresting crashes. Also users whose userspace setup does not need writing to mounted block devices can set this option for hardening. Link: https://lore.kernel.org/all/60788e5d-5c7c-1142-e554-c21d709acfd9@linaro.org Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231101174325.10596-3-jack@suse.cz Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
| * block: Remove blkdev_get_by_*() functionsJan Kara2023-11-181-64/+30
| | | | | | | | | | | | | | | | | | | | | | | | blkdev_get_by_*() and blkdev_put() functions are now unused. Remove them. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231101174325.10596-2-jack@suse.cz Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
| * bdev: implement freeze and thaw holder operationsChristian Brauner2023-11-181-33/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old method of implementing block device freeze and thaw operations required us to rely on get_active_super() to walk the list of all superblocks on the system to find any superblock that might use the block device. This is wasteful and not very pleasant overall. Now that we can finally go straight from block device to owning superblock things become way simpler. Link: https://lore.kernel.org/r/20231024-vfs-super-freeze-v2-5-599c19f4faac@kernel.org Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
| * bdev: surface the error from sync_blockdev()Christian Brauner2023-11-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | When freeze_super() is called, sync_filesystem() will be called which calls sync_blockdev() and already surfaces any errors. Do the same for block devices that aren't owned by a superblock and also for filesystems that don't call sync_blockdev() internally but implicitly rely on bdev_freeze() to do it. Link: https://lore.kernel.org/r/20231024-vfs-super-freeze-v2-3-599c19f4faac@kernel.org Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
| * bdev: rename freeze and thaw helpersChristian Brauner2023-11-181-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have bdev_mark_dead() etc and we're going to move block device freezing to holder ops in the next patch. Make the naming consistent: * freeze_bdev() -> bdev_freeze() * thaw_bdev() -> bdev_thaw() Also document the return code. Link: https://lore.kernel.org/r/20231024-vfs-super-freeze-v2-2-599c19f4faac@kernel.org Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | block: update the stable_writes flag in bdev_addChristoph Hellwig2023-11-201-0/+2
|/ | | | | | | | | | | | | | | | | | | Propagate the per-queue stable_write flags into each bdev inode in bdev_add. This makes sure devices that require stable writes have it set for I/O on the block device node as well. Note that this doesn't cover the case of a flag changing on a live device yet. We should handle that as well, but I plan to cover it as part of a more general rework of how changing runtime paramters on block devices works. Fixes: 1cb039f3dc16 ("bdi: replace BDI_CAP_STABLE_WRITES with a queue and a sb flag") Reported-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231025141020.192413-3-hch@lst.de Tested-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
* Merge tag 'mm-nonmm-stable-2023-11-02-14-08' of ↵Linus Torvalds2023-11-031-3/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "As usual, lots of singleton and doubleton patches all over the tree and there's little I can say which isn't in the individual changelogs. The lengthier patch series are - 'kdump: use generic functions to simplify crashkernel reservation in arch', from Baoquan He. This is mainly cleanups and consolidation of the 'crashkernel=' kernel parameter handling - After much discussion, David Laight's 'minmax: Relax type checks in min() and max()' is here. Hopefully reduces some typecasting and the use of min_t() and max_t() - A group of patches from Oleg Nesterov which clean up and slightly fix our handling of reads from /proc/PID/task/... and which remove task_struct.thread_group" * tag 'mm-nonmm-stable-2023-11-02-14-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (64 commits) scripts/gdb/vmalloc: disable on no-MMU scripts/gdb: fix usage of MOD_TEXT not defined when CONFIG_MODULES=n .mailmap: add address mapping for Tomeu Vizoso mailmap: update email address for Claudiu Beznea tools/testing/selftests/mm/run_vmtests.sh: lower the ptrace permissions .mailmap: map Benjamin Poirier's address scripts/gdb: add lx_current support for riscv ocfs2: fix a spelling typo in comment proc: test ProtectionKey in proc-empty-vm test proc: fix proc-empty-vm test with vsyscall fs/proc/base.c: remove unneeded semicolon do_io_accounting: use sig->stats_lock do_io_accounting: use __for_each_thread() ocfs2: replace BUG_ON() at ocfs2_num_free_extents() with ocfs2_error() ocfs2: fix a typo in a comment scripts/show_delta: add __main__ judgement before main code treewide: mark stuff as __ro_after_init fs: ocfs2: check status values proc: test /proc/${pid}/statm compiler.h: move __is_constexpr() to compiler.h ...
| * treewide: mark stuff as __ro_after_initAlexey Dobriyan2023-10-181-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | __read_mostly predates __ro_after_init. Many variables which are marked __read_mostly should have been __ro_after_init from day 1. Also, mark some stuff as "const" and "__init" while I'm at it. [akpm@linux-foundation.org: revert sysctl_nr_open_min, sysctl_nr_open_max changes due to arm warning] [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/4f6bb9c0-abba-4ee4-a7aa-89265e886817@p183 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | block: move bdev_mark_dead out of disk_check_media_changeChristoph Hellwig2023-10-281-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | disk_check_media_change is mostly called from ->open where it makes little sense to mark the file system on the device as dead, as we are just opening it. So instead of calling bdev_mark_dead from disk_check_media_change move it into the few callers that are not in an open instance. This avoid calling into bdev_mark_dead and thus taking s_umount with open_mutex held. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231017184823.1383356-4-hch@lst.de Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | fs: Avoid grabbing sb->s_umount under bdev->bd_holder_lockJan Kara2023-10-281-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The implementation of bdev holder operations such as fs_bdev_mark_dead() and fs_bdev_sync() grab sb->s_umount semaphore under bdev->bd_holder_lock. This is problematic because it leads to disk->open_mutex -> sb->s_umount lock ordering which is counterintuitive (usually we grab higher level (e.g. filesystem) locks first and lower level (e.g. block layer) locks later) and indeed makes lockdep complain about possible locking cycles whenever we open a block device while holding sb->s_umount semaphore. Implement a function bdev_super_lock_shared() which safely transitions from holding bdev->bd_holder_lock to holding sb->s_umount on alive superblock without introducing the problematic lock dependency. We use this function fs_bdev_sync() and fs_bdev_mark_dead(). Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231018152924.3858-1-jack@suse.cz Link: https://lore.kernel.org/r/20231017184823.1383356-1-hch@lst.de Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | block: Use bdev_open_by_dev() in blkdev_open()Jan Kara2023-10-281-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert blkdev_open() to use bdev_open_by_dev(). To be able to propagate handle from blkdev_open() to blkdev_release() we need to stop using existence of file->private_data to determine exclusive block device opens. Use bdev_handle->mode for this purpose since file->f_flags isn't usable for this (O_EXCL is cleared from the flags during open). Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-2-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
* | block: Provide bdev_open_* functionsJan Kara2023-10-281-0/+48
|/ | | | | | | | | | | | | | Create struct bdev_handle that contains all parameters that need to be passed to blkdev_put() and provide bdev_open_* functions that return this structure instead of plain bdev pointer. This will eventually allow us to pass one more argument to blkdev_put() (renamed to bdev_release()) without too much hassle. Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-1-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
* Merge tag 'vfs-6.6-merge-2' of ↵Christian Brauner2023-08-231-4/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ssh://gitolite.kernel.org/pub/scm/fs/xfs/xfs-linux Pull filesystem freezing updates from Darrick Wong: New code for 6.6: * Allow the kernel to initiate a freeze of a filesystem. The kernel and userspace can both hold a freeze on a filesystem at the same time; the freeze is not lifted until /both/ holders lift it. This will enable us to fix a longstanding bug in XFS online fsck. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Message-Id: <20230822182604.GB11286@frogsfrogsfrogs> Signed-off-by: Christian Brauner <brauner@kernel.org>
| * fs: distinguish between user initiated freeze and kernel initiated freezeDarrick J. Wong2023-07-171-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace can freeze a filesystem using the FIFREEZE ioctl or by suspending the block device; this state persists until userspace thaws the filesystem with the FITHAW ioctl or resuming the block device. Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for the fsfreeze ioctl") we only allow the first freeze command to succeed. The kernel may decide that it is necessary to freeze a filesystem for its own internal purposes, such as suspends in progress, filesystem fsck activities, or quiescing a device prior to removal. Userspace thaw commands must never break a kernel freeze, and kernel thaw commands shouldn't undo userspace's freeze command. Introduce a couple of freeze holder flags and wire it into the sb_writers state. One kernel and one userspace freeze are allowed to coexist at the same time; the filesystem will not thaw until both are lifted. I wonder if the f2fs/gfs2 code should be using a kernel freeze here, but for now we'll use FREEZE_HOLDER_USERSPACE to preserve existing behaviors. Cc: mcgrof@kernel.org Cc: jack@suse.cz Cc: hch@infradead.org Cc: ruansy.fnst@fujitsu.com Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz>
* | block: call into the file system for ioctl BLKFLSBUFChristoph Hellwig2023-08-211-16/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BLKFLSBUF is a historic ioctl that is called on a file handle to a block device and syncs either the file system mounted on that block device if there is one, or otherwise the just the data on the block device. Replace the get_super based syncing with a holder operation to remove the last usage of get_super, and to also support syncing the file system if the block device is not the main block device stored in s_dev. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Message-Id: <20230811100828.1897174-16-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | block: call into the file system for bdev_mark_deadChristoph Hellwig2023-08-211-21/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Combine the newly merged bdev_mark_dead helper with the existing mark_dead holder operation so that all operations that invalidate a device that is dead or being removed now go through the holder ops. This allows file systems to explicitly shutdown either ASAP (for a surprise removal) or after writing back data (for an orderly removal), and do so not only for the main device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Message-Id: <20230811100828.1897174-15-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | block: consolidate __invalidate_device and fsync_bdevChristoph Hellwig2023-08-211-5/+28
|/ | | | | | | | | | | | | | | | | | | | | | | We currently have two interfaces that take a block_devices and the find a mounted file systems to flush or invaldidate data on it. Both are a bit problematic because they only work for the "main" block devices that is used as s_dev for the super_block, and because they don't call into the file system at all. Merge the two into a new bdev_mark_dead helper that does both the syncing and invalidation and which is properly documented. This is in preparation of merging the functionality into the ->mark_dead holder operation so that it will work on additional block devices used by a file systems and give us a single entry point for invalidation of dead devices or media. Note that a single standalone fsync_bdev call for an obscure ioctl remains for now, but that one will also be deal with in a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Message-Id: <20230811100828.1897174-14-hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
* block: Improve kernel-doc headersBart Van Assche2023-06-211-0/+1
| | | | | | | | | | | | | | | | Fix the documentation of the devt_from_partuuid() return value. Fix the following two recently introduced kernel-doc warnings: block/bdev.c:570: warning: Function parameter or member 'hops' not described in 'bd_finish_claiming' block/early-lookup.c:46: warning: Function parameter or member 'devt' not described in 'devt_from_partuuid' Cc: Christoph Hellwig <hch@lst.de> Fixes: 0718afd47f70 ("block: introduce holder ops") Fixes: cf056a431215 ("init: improve the name_to_dev_t interface") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230621165054.743815-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: document the holder argument to blkdev_get_by_pathChristoph Hellwig2023-06-201-0/+1
| | | | | | | Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230620043536.707249-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: replace fmode_t with a block-specific type for block open flagsChristoph Hellwig2023-06-121-16/+16
| | | | | | | | | | | | | | The only overlap between the block open flags mapped into the fmode_t and other uses of fmode_t are FMODE_READ and FMODE_WRITE. Define a new blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and ->ioctl and stop abusing fmode_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: use the holder as indication for exclusive opensChristoph Hellwig2023-06-121-16/+21
| | | | | | | | | | | | | | | | | | The current interface for exclusive opens is rather confusing as it requires both the FMODE_EXCL flag and a holder. Remove the need to pass FMODE_EXCL and just key off the exclusive open off a non-NULL holder. For blkdev_put this requires adding the holder argument, which provides better debug checking that only the holder actually releases the hold, but at the same time allows removing the now superfluous mode argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove the unused mode argument to ->releaseChristoph Hellwig2023-06-121-7/+7
| | | | | | | | | | | | The mode argument to the ->release block_device_operation is never used, so remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: pass a gendisk to ->openChristoph Hellwig2023-06-121-1/+1
| | | | | | | | | | | | ->open is only called on the whole device. Make that explicit by passing a gendisk instead of the block_device. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Jack Wang <jinpu.wang@ionos.com> [rnbd] Link: https://lore.kernel.org/r/20230608110258.189493-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: also call ->open for incremental partition opensChristoph Hellwig2023-06-121-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | For whole devices ->open is called for each open, but for partitions it is only called on the first open of a partition, e.g.: open("/dev/vdb", ...) open("/dev/vdb", ...) - 2 call to ->open open("/dev/vdb1", ...) open("/dev/vdb", ...) - 2 call to ->open open("/dev/vdb", ...) open("/dev/vdb", ...) - just open call to ->open This is problematic as various block drivers look at open flags and might not do all the required setup if the earlier open was with an odd flag like O_NDELAY or the magic 3 ioctl-only open mode. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Phillip Potter <phil@philpotter.co.uk> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230608110258.189493-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: introduce holder opsChristoph Hellwig2023-06-051-12/+29
| | | | | | | | | | | | | | Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and installed in the block_device for exclusive claims. It will be used to allow the block layer to call back into the user of the block device for thing like notification of a removed device or a device resize. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: turn bdev_lock into a mutexChristoph Hellwig2023-06-051-14/+13
| | | | | | | | | | | | | There is no reason for this lock to spin, and being able to sleep under it will come in handy soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: refactor bd_may_claimChristoph Hellwig2023-06-051-18/+22
| | | | | | | | | | | | | | The long if/else chain obsfucates the actual logic. Tidy it up to be more structured. Also drop the whole argument, as it can be trivially derived from bdev using bdev_whole, and having the bdev_whole in the function makes it easier to follow. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: factor out a bd_end_claim helper from blkdev_putChristoph Hellwig2023-06-051-30/+33
| | | | | | | | | | | | Move all the logic to release an exclusive claim into a helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20230601094459.1350643-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: sync part's ->bd_has_submit_bio with disk'sMing Lei2023-04-251-1/+4
| | | | | | | | | | | | | | | submit_bio() always uses bio->bi_bdev->bd_has_submit_bio to decide if disk's ->submit_bio() is called, and bio->bi_bdev could point to one partition device. So we have to sync part bdev's ->bd_has_submit_bio with disk's. Reported-by: Changhui Zhong <czhong@redhat.com> Link: https://lore.kernel.org/linux-block/ZEdItaPqif8fp85H@ovpn-8-24.pek2.redhat.com/T/#t Fixes: 9f4107b07b17 ("block: store bdev->bd_disk->fops->submit_bio state in bdev") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230425034154.110099-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Cleanup set_capacity()/bdev_set_nr_sectors()Damien Le Moal2023-04-241-0/+8
| | | | | | | | | | | | | | | The code for setting a block device capacity (bd_nr_sectors field of struct block_device) is duplicated in set_capacity() and bdev_set_nr_sectors(). Clean this up by making bdev_set_nr_sectors() a block layer internal function defined in block/bdev.c instead of having this function statically defined in block/partitions/core.c. With this change, set_capacity() implementation can be simplified to only calling bdev_set_nr_sectors(). Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230424131318.79935-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: store bdev->bd_disk->fops->submit_bio state in bdevJens Axboe2023-04-161-0/+1
| | | | | | | | | We have a long chain of memory dereferencing just to whether or not this disk has a special submit_bio helper. As that's not necessarily the common case, add a bd_has_submit_bio state in the bdev to avoid traversing this memory dependency chain if we don't need to. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove ->rw_pageChristoph Hellwig2023-02-031-78/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | The ->rw_page method is a special purpose bypass of the usual bio handling path that is limited to single-page reads and writes and synchronous which causes a lot of extra code in the drivers, callers and the block layer. The only remaining user is the MM swap code. Switch that swap code to simply submit a single-vec on-stack bio an synchronously wait on it based on a newly added QUEUE_FLAG_SYNCHRONOUS flag set by the drivers that currently implement ->rw_page instead. While this touches one extra cache line and executes extra code, it simplifies the block layer and drivers and ensures that all feastures are properly supported by all drivers, e.g. right now ->rw_page bypassed cgroup writeback entirely. [akpm@linux-foundation.org: fix comment typo, per Dan] Link: https://lkml.kernel.org/r/20230125133436.447864-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Keith Busch <kbusch@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* block: bdev & blktrace: use consistent function doc. notationRandy Dunlap2022-12-011-2/+2
| | | | | | | | | | | | | | Use only one hyphen in kernel-doc notation between the function name and its short description. The is the documented kerenl-doc format. It also fixes the HTML presentation to be consistent with other functions. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-block@vger.kernel.org Link: https://lore.kernel.org/r/20221201070331.25685-1-rdunlap@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* vfs: support STATX_DIOALIGN on block devicesEric Biggers2022-09-121-0/+23
| | | | | | | | | | | | | | | Add support for STATX_DIOALIGN to block devices, so that direct I/O alignment restrictions are exposed to userspace in a generic way. Note that this breaks the tradition of stat operating only on the block device node, not the block device itself. However, it was felt that doing this is preferable, in order to make the interface useful and avoid needing separate interfaces for regular files and block devices. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Link: https://lore.kernel.org/r/20220827065851.135710-3-ebiggers@kernel.org
* block: stop using bdevname in bdev_write_inodeChristoph Hellwig2022-07-141-6/+4
| | | | | | | | | | | | Just use the %pg format specifier instead. Also reformat the printk statement to be more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220713055317.1888500-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'exfat-for-5.19-rc1' of ↵Linus Torvalds2022-05-251-0/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - fix referencing wrong parent directory information during rename - introduce a sys_tz mount option to use system timezone - improve performance while zeroing a cluster with dirsync mount option - fix slab-out-bounds in exat_clear_bitmap() reported from syzbot * tag 'exfat-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: check if cluster num is valid exfat: reduce block requests when zeroing a cluster block: add sync_blockdev_range() exfat: introduce mount option 'sys_tz' exfat: fix referencing wrong parent directory information after renaming
| * block: add sync_blockdev_range()Yuezhang Mo2022-05-231-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sync_blockdev_range() is to support syncing multiple sectors with as few block device requests as possible, it is helpful to make the block device to give full play to its performance. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
* | block: turn bdev->bd_openers into an atomic_tChristoph Hellwig2022-04-181-8/+8
|/ | | | | | | | | | | | | All manipulation of bd_openers is under disk->open_mutex and will remain so for the foreseeable future. But at least one place reads it without the lock (blkdev_get) and there are more to be added. So make sure the compiler does not do turn the increments and decrements into non-atomic sequences by using an atomic_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220330052917.2566582-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge branch 'akpm' (patches from Andrew)Linus Torvalds2022-03-231-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
| * fs: allocate inode by using alloc_inode_sb()Muchun Song2022-03-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | block: remove redundant semicolonNian Yanchuan2022-02-271-1/+1
| | | | | | | | | | | | | | | | Remove redundant semicolon from block/bdev.c Signed-off-by: Nian Yanchuan <yanchuan@nfschina.com> Link: https://lore.kernel.org/r/20220227170124.GA14658@localhost.localdomain Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: default BLOCK_LEGACY_AUTOLOAD to yChristoph Hellwig2022-02-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | As Luis reported, losetup currently doesn't properly create the loop device without this if the device node already exists because old scripts created it manually. So default to y for now and remove the aggressive removal schedule. Reported-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220225181440.1351591-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | block: deprecate autoloading based on dev_tChristoph Hellwig2022-02-021-3/+6
|/ | | | | | | | | | Make the legacy dev_t based autoloading optional and add a deprecation warning. This kind of autoloading has ceased to be useful about 20 years ago. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220104071647.164918-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* mm: remove cleancacheChristoph Hellwig2022-01-221-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "remove Xen tmem leftovers". Since the removal of the Xen tmem driver in 2019, the cleancache hooks are entirely unused, as are large parts of frontswap. This series against linux-next (with the folio changes included) removes cleancaches, and cuts down frontswap to the bits actually used by zswap. This patch (of 13): The cleancache subsystem is unused since the removal of Xen tmem driver in commit 814bbf49dcd0 ("xen: remove tmem driver"). [akpm@linux-foundation.org: remove now-unreachable code] Link: https://lkml.kernel.org/r/20211224062246.1258487-1-hch@lst.de Link: https://lkml.kernel.org/r/20211224062246.1258487-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* bdev: Improve lookup_bdev documentationMatthew Wilcox (Oracle)2021-12-131-6/+6
| | | | | | | | | | Add a Context section and rewrite the rest to be clearer. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Link: https://lore.kernel.org/r/20211213171113.3097631-1-willy@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Remove redundant initialization of variable retColin Ian King2021-11-291-1/+1
| | | | | | | | | | The variable ret is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20211126230652.1175636-1-colin.i.king@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove the GENHD_FL_HIDDEN check in blkdev_get_no_openChristoph Hellwig2021-11-291-8/+0
| | | | | | | | Hidden gendisks never hash the block device inode, so this can't happen. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE to disk->event_flagsChristoph Hellwig2021-11-291-1/+1
| | | | | | | | | GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE is all about the event reporting mechanism, so move it to the event_flags field. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211122130625.1136848-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>