summaryrefslogtreecommitdiffstats
path: root/block/holder.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* block: fix deadlock between bd_link_disk_holder and partition scanLi Nan2024-02-231-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'open_mutex' of gendisk is used to protect open/close block devices. But in bd_link_disk_holder(), it is used to protect the creation of symlink between holding disk and slave bdev, which introduces some issues. When bd_link_disk_holder() is called, the driver is usually in the process of initialization/modification and may suspend submitting io. At this time, any io hold 'open_mutex', such as scanning partitions, can cause deadlocks. For example, in raid: T1 T2 bdev_open_by_dev lock open_mutex [1] ... efi_partition ... md_submit_bio md_ioctl mddev_syspend -> suspend all io md_add_new_disk bind_rdev_to_array bd_link_disk_holder try lock open_mutex [2] md_handle_request -> wait mddev_resume T1 scan partition, T2 add a new device to raid. T1 waits for T2 to resume mddev, but T2 waits for open_mutex held by T1. Deadlock occurs. Fix it by introducing a local mutex 'blk_holder_mutex' to replace 'open_mutex'. Fixes: 1b0a2d950ee2 ("md: use new apis to suspend array for ioctls involed array reconfiguration") Reported-by: mgperkow@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218459 Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20240221090122.1281868-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: don't allow a disk link holder to itselfYu Kuai2022-11-161-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After creating a dm device, then user can reload such dm with itself, and dead loop will be triggered because dm keep looking up to itself. Test procedures: 1) dmsetup create test --table "xxx sda", assume dm-0 is created 2) dmsetup suspend test 3) dmsetup reload test --table "xxx dm-0" 4) dmsetup resume test Test result: BUG: TASK stack guard page was hit at 00000000736a261f (stack is 000000008d12c88d..00000000c8dd82d5) stack guard page: 0000 [#1] PREEMPT SMP CPU: 29 PID: 946 Comm: systemd-udevd Not tainted 6.1.0-rc3-next-20221101-00006-g17640ca3b0ee #1295 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 RIP: 0010:dm_prepare_ioctl+0xf/0x1e0 Code: da 48 83 05 4a 7c 99 0b 01 41 89 c4 eb cd e8 b8 1f 40 00 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 48 83 05 a1 5a 99 0b 01 <41> 56 49 89 d6 41 55 4c 8d af 90 02 00 00 9 RSP: 0018:ffffc90002090000 EFLAGS: 00010206 RAX: ffff8881049d6800 RBX: ffff88817e589000 RCX: 0000000000000000 RDX: ffffc90002090010 RSI: ffffc9000209001c RDI: ffff88817e589000 RBP: 00000000484a101d R08: 0000000000000000 R09: 0000000000000007 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005331 R13: 0000000000005331 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fddf9609200(0000) GS:ffff889fbfd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc9000208fff8 CR3: 0000000179043000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> dm_blk_ioctl+0x50/0x1c0 ? dm_prepare_ioctl+0xe0/0x1e0 dm_blk_ioctl+0x88/0x1c0 dm_blk_ioctl+0x88/0x1c0 ......(a lot of same lines) dm_blk_ioctl+0x88/0x1c0 dm_blk_ioctl+0x88/0x1c0 blkdev_ioctl+0x184/0x3e0 __x64_sys_ioctl+0xa3/0x110 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fddf7306577 Code: b3 66 90 48 8b 05 11 89 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 88 8 RSP: 002b:00007ffd0b2ec318 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00005634ef478320 RCX: 00007fddf7306577 RDX: 0000000000000000 RSI: 0000000000005331 RDI: 0000000000000007 RBP: 0000000000000007 R08: 00005634ef4843e0 R09: 0000000000000080 R10: 00007fddf75cfb38 R11: 0000000000000246 R12: 00000000030d4000 R13: 0000000000000000 R14: 0000000000000000 R15: 00005634ef48b800 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:dm_prepare_ioctl+0xf/0x1e0 Code: da 48 83 05 4a 7c 99 0b 01 41 89 c4 eb cd e8 b8 1f 40 00 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 48 83 05 a1 5a 99 0b 01 <41> 56 49 89 d6 41 55 4c 8d af 90 02 00 00 9 RSP: 0018:ffffc90002090000 EFLAGS: 00010206 RAX: ffff8881049d6800 RBX: ffff88817e589000 RCX: 0000000000000000 RDX: ffffc90002090010 RSI: ffffc9000209001c RDI: ffff88817e589000 RBP: 00000000484a101d R08: 0000000000000000 R09: 0000000000000007 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000005331 R13: 0000000000005331 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fddf9609200(0000) GS:ffff889fbfd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc9000208fff8 CR3: 0000000179043000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: disabled ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fix the problem by forbidding a disk to create link to itself. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221115141054.1051801-11-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: store the holder kobject in bd_holder_diskYu Kuai2022-11-161-5/+6
| | | | | | | | | | | | | | We hold a reference to the holder kobject for each bd_holder_disk, so to make the code a bit more robust, use a reference to it instead of the block_device. As long as no one clears ->bd_holder_dir in before freeing the disk, this isn't strictly required, but it does make the code more clear and more robust. Orignally-From: Christoph Hellwig <hch@lst.de> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221115141054.1051801-10-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: fix use after free for bd_holder_dirYu Kuai2022-11-161-6/+15
| | | | | | | | | | | | | | | | | | | | | | | Currently, the caller of bd_link_disk_holer() get 'bdev' by blkdev_get_by_dev(), which will look up 'bdev' by inode number 'dev'. Howerver, it's possible that del_gendisk() can be called currently, and 'bd_holder_dir' can be freed before bd_link_disk_holer() access it, thus use after free is triggered. t1: t2: bdev = blkdev_get_by_dev del_gendisk kobject_put(bd_holder_dir) kobject_free() bd_link_disk_holder Fix the problem by checking disk is still live and grabbing a reference to 'bd_holder_dir' first in bd_link_disk_holder(). Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20221115141054.1051801-9-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove delayed holder registrationChristoph Hellwig2022-11-161-51/+21
| | | | | | | | | | Now that dm has been fixed to track of holder registrations before add_disk, the somewhat buggy block layer code can be safely removed. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20221115141054.1051801-8-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove WARN_ON() from bd_link_disk_holderLi Nan2022-06-231-4/+0
| | | | | | | | | | | Since commit 83cbce957446("block: add error handling for device_add_disk / add_disk"), bdev->bd_holder_dir can not be empty now, so remove WARN_ON() from bd_link_disk_holder. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623074100.2251301-1-linan122@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove genhd.hChristoph Hellwig2022-02-021-1/+1
| | | | | | | | | | | | There is no good reason to keep genhd.h separate from the main blkdev.h header that includes it. So fold the contents of genhd.h into blkdev.h and remove genhd.h entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: drop unused includes in <linux/genhd.h>Christoph Hellwig2021-10-181-0/+1
| | | | | | | | | | Drop various include not actually used in genhd.h itself, and move the remaning includes closer together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: add back the bd_holder_dir reference in bd_link_disk_holderChristoph Hellwig2021-08-211-0/+7
| | | | | | | | | | | | | | This essentially reverts "block: remove the extra kobject reference in bd_link_disk_holder". That commit dropped the extra reference because the condition in the comment can't be true. But it turns out that comment did not actually describe the problematic situation, so add back the extra reference and document it properly. Fixes: fbd9a39542ec ("block: remove the extra kobject reference in bd_link_disk_holder") Reported-by: Tushar Sugandhi <tusharsu@linux.microsoft.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: support delayed holder registrationChristoph Hellwig2021-08-091-17/+51
| | | | | | | | | | | | | | | | | device mapper needs to register holders before it is ready to do I/O. Currently it does so by registering the disk early, which can leave the disk and queue in a weird half state where the queue is registered with the disk, except for sysfs and the elevator. And this state has been a bit promlematic before, and will get more so when sorting out the responsibilities between the queue and the disk. Support registering holders on an initialized but not registered disk instead by delaying the sysfs registration until the disk is registered. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210804094147.459763-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: look up holders by bdevChristoph Hellwig2021-08-091-9/+9
| | | | | | | | | Invert they way the holder relations are tracked. This very slightly reduces the memory overhead for partitioned devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210804094147.459763-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove the extra kobject reference in bd_link_disk_holderChristoph Hellwig2021-08-091-6/+0
| | | | | | | | | | | Since commit 0d02129e76ed ("block: merge struct block_device and struct hd_struct") there is no way for the bdev to go away as long as there is a holder, so remove the extra references. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210804094147.459763-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: make the block holder code optionalChristoph Hellwig2021-08-091-0/+139
Move the block holder code into a separate file as it is not in any way related to the other block_dev.c code, and add a new selectable config option for it so that we don't have to build it without any remapped drivers selected. The Kconfig symbol contains a _DEPRECATED suffix to match the comments added in commit 49731baa41df ("block: restore multiple bd_link_disk_holder() support"). Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Link: https://lore.kernel.org/r/20210804094147.459763-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>