Commit message | Author | Age | Files | Lines
* dm crypt: support using trusted keys | Ahmad Fatoum | 2021-02-03 | 3 | -2/+24
  Commit 27f5411a718c ("dm crypt: support using encrypted keys") extended dm-crypt to allow use of "encrypted" keys along with "user" and "logon". Along the same lines, teach dm-crypt to support "trusted" keys as well.

  Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm crypt: replaced #if defined with IS_ENABLED | Ahmad Fatoum | 2021-02-03 | 1 | -5/+2
  IS_ENABLED(CONFIG_ENCRYPTED_KEYS) is true whether the option is built-in or a module, so use it instead of an #if defined check for each case separately.

  The other #if was there to avoid a "static function defined but unused" warning. As we now always build the call site when the function is defined, we can remove that first #if guard.

  Suggested-by: Arnd Bergmann <arnd@arndb.de>
  Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
  Acked-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
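The claim that IS_ENABLED() covers both the built-in and the module case can be tried out in userspace. The following is only a sketch modeled on the preprocessor trick in the kernel's include/linux/kconfig.h; all `DEMO_`/`demo_` names and the `CONFIG_DEMO_*` options are hypothetical, and plain `||` stands in for the kernel's internal `__or()` helper.

```c
#include <assert.h>

/* Hypothetical options: one as if set to =y, one as if set to =m. */
#define CONFIG_DEMO_BUILTIN 1
#define CONFIG_DEMO_MODULAR_MODULE 1
/* CONFIG_DEMO_OFF is intentionally left undefined. */

/* If the option expands to 1, the placeholder injects an extra argument
 * so that the "second argument" becomes 1; otherwise it stays 0. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define demo_take_second_arg(ignored, val, ...) val
#define demo_is_defined(x) demo_is_defined_(x)
#define demo_is_defined_(val) demo_is_defined__(DEMO_ARG_PLACEHOLDER_##val)
#define demo_is_defined__(arg1_or_junk) \
	demo_take_second_arg(arg1_or_junk 1, 0, 0)

#define DEMO_IS_BUILTIN(option) demo_is_defined(option)
#define DEMO_IS_MODULE(option)  demo_is_defined(option##_MODULE)
#define DEMO_IS_ENABLED(option) \
	(DEMO_IS_BUILTIN(option) || DEMO_IS_MODULE(option))

int demo_enabled_builtin(void) { return DEMO_IS_ENABLED(CONFIG_DEMO_BUILTIN); }
int demo_enabled_modular(void) { return DEMO_IS_ENABLED(CONFIG_DEMO_MODULAR); }
int demo_enabled_off(void)     { return DEMO_IS_ENABLED(CONFIG_DEMO_OFF); }
```

A single `DEMO_IS_ENABLED(...)` test therefore replaces a pair of `#if defined(...)` checks, one for the built-in symbol and one for the `_MODULE` symbol.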
* dm writecache: fix unnecessary NULL check warnings | Tian Tao | 2021-02-03 | 1 | -4/+2
  Remove NULL checks before vfree() to fix these warnings:

    ./drivers/md/dm-writecache.c:2008:2-7: WARNING: NULL check before some freeing functions is not needed.
    ./drivers/md/dm-writecache.c:2024:2-7: WARNING: NULL check before some freeing functions is not needed.

  Signed-off-by: Tian Tao <tiantao6@hisilicon.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm writecache: fix performance degradation in ssd mode | Mikulas Patocka | 2021-02-03 | 1 | -1/+1
  Fix a thinko in ssd_commit_superblock. region.count is in sectors, not bytes. This bug doesn't corrupt data, but it causes performance degradation.

  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Fixes: dc8a01ae1dbd ("dm writecache: optimize superblock write")
  Cc: stable@vger.kernel.org # v5.7+
  Reported-by: J. Bruce Fields <bfields@redhat.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
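To see why confusing sectors and bytes costs performance rather than correctness, consider the unit conversion itself. This is a minimal sketch, assuming 512-byte sectors; the helper names and the 4096-byte figure are illustrative and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SECTOR_SHIFT 9 /* 512-byte sectors */

/* Convert a byte length into a sector count, as a sector-based field expects. */
uint64_t demo_bytes_to_sectors(uint64_t bytes)
{
	return bytes >> DEMO_SECTOR_SHIFT;
}

/* Amount of data actually covered when a value is interpreted as sectors. */
uint64_t demo_sectors_to_bytes(uint64_t sectors)
{
	return sectors << DEMO_SECTOR_SHIFT;
}
```

With these assumptions, a 4096-byte region is 8 sectors, whereas passing the byte value 4096 as a sector count would cover 2 MiB: far more I/O than intended on every commit, but still a write of valid data.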
* dm integrity: introduce the "fix_hmac" argument | Mikulas Patocka | 2021-02-03 | 2 | -13/+136
  The "fix_hmac" argument improves security of internal_hash and journal_mac:
  - the section number is mixed into the mac, so that an attacker can't copy sectors from one journal section to another journal section
  - the superblock is protected by journal_mac
  - a 16-byte salt stored in the superblock is mixed into the mac, so that the attacker can't detect that two disks have the same hmac key, and can't move sectors from one disk to another

  Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
  Reported-by: Daniel Glockner <dg@emlix.com>
  Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> # ReST fix
  Tested-by: Milan Broz <gmazyland@gmail.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm persistent data: fix return type of shadow_root() | Jinoh Kang | 2021-02-03 | 2 | -2/+2
  shadow_root() truncates 64-bit dm_block_t into 32-bit int. This is not an issue in practice, since dm metadata as of v5.11 can only hold at most 4161600 blocks (255 index entries * ~16k metadata blocks).

  Nevertheless, this can confuse users debugging some specific data corruption scenarios. Also, DM_SM_METADATA_MAX_BLOCKS may be bumped in the future, or persistent-data may find its use in other places.

  Therefore, switch the return type of shadow_root from int to dm_block_t.

  Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
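The truncation and the block limit can be checked with a few lines of C. This is a sketch only: `demo_block_t` stands in for dm_block_t, a `uint32_t` return models the old 32-bit return value (chosen so the narrowing is well defined), and the per-entry figure 16320 is inferred from 4161600 / 255 rather than quoted from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t demo_block_t; /* stand-in for the 64-bit dm_block_t */

/* Models the old behaviour: the upper 32 bits are silently dropped. */
uint32_t demo_shadow_root_narrow(demo_block_t root)
{
	return (uint32_t)root;
}

/* Models the fixed behaviour: the full 64-bit block number is returned. */
demo_block_t demo_shadow_root_wide(demo_block_t root)
{
	return root;
}

/* Upper bound quoted in the commit message: 255 index entries times
 * roughly 16k blocks each (16320 is an inferred value). */
uint64_t demo_max_metadata_blocks(void)
{
	return 255ULL * 16320ULL;
}
```

Every block number below the quoted limit fits in 32 bits, which is why the old return type was harmless in practice; a value such as 2^32 + 5 would come back as 5.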
* dm: cleanup of front padding calculation | Jeffle Xu | 2021-02-03 | 1 | -6/+10
  Add two helper macros that calculate the offset of the bio in struct dm_io and struct dm_target_io, respectively. Also simplify the front padding calculation in dm_alloc_md_mempools().

  Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
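The idea of an "offset of the bio" macro can be sketched with offsetof() on nested structures. The struct layouts and all `demo_`/`DEMO_` names below are hypothetical stand-ins, not the real dm structures; only the nesting (a bio embedded in a target_io, which is embedded in an io) follows the commit description.

```c
#include <assert.h>
#include <stddef.h>

struct demo_bio { int flags; };

struct demo_target_io {
	unsigned int magic;
	void *owner;
	struct demo_bio clone;      /* embedded bio */
};

struct demo_io {
	unsigned int magic;
	void *md;
	struct demo_target_io tio;  /* embedded target_io */
};

/* Offset of the embedded bio inside each containing structure. */
#define DEMO_TARGET_IO_BIO_OFFSET (offsetof(struct demo_target_io, clone))
#define DEMO_IO_BIO_OFFSET \
	(offsetof(struct demo_target_io, clone) + offsetof(struct demo_io, tio))

/* Walk back from the embedded bio to the enclosing structures. */
struct demo_target_io *demo_tio_from_bio(struct demo_bio *bio)
{
	return (struct demo_target_io *)((char *)bio - DEMO_TARGET_IO_BIO_OFFSET);
}

struct demo_io *demo_io_from_bio(struct demo_bio *bio)
{
	return (struct demo_io *)((char *)bio - DEMO_IO_BIO_OFFSET);
}
```

The same offsets give the "front padding": the number of bytes that must be reserved in front of a bio so the enclosing structure fits in one allocation.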
* dm integrity: fix spelling mistake "flusing" -> "flushing" | Colin Ian King | 2021-02-03 | 1 | -1/+1
  There is a spelling mistake in a dm_integrity_io_error error message. Fix it.

  Signed-off-by: Colin Ian King <colin.king@canonical.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm crypt: Spelling s/cihper/cipher/ | Geert Uytterhoeven | 2021-02-03 | 1 | -1/+1
  Fix a misspelling of "cipher".

  Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm dust: remove h from printk format specifier | Tom Rix | 2021-02-03 | 1 | -1/+1
  See Documentation/core-api/printk-formats.rst.

  commit cbacb5ab0aa0 ("docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]")

  Standard integer promotion is already done, so %hx and %hhx are useless; do not encourage the use of %hh[xudi] or %h[xudi].

  Signed-off-by: Tom Rix <trix@redhat.com>
  Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* block: fix memory leak of bvec | Ming Lei | 2021-02-02 | 1 | -1/+1
  bio_init() clears the bio instance, so the bvec index has to be set after bio_init(); otherwise bio->bi_io_vec may be leaked.

  Fixes: 3175199ab0ac ("block: split bio_kmalloc from bio_alloc_bioset")
  Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Cc: Damien Le Moal <damien.lemoal@wdc.com>
  Reviewed-by: Christoph Hellwig <hch@lst.de>
  Signed-off-by: Ming Lei <ming.lei@redhat.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
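The ordering problem is easy to reproduce in miniature: any field written before an initializer that zeroes the whole object is lost. The sketch below uses a hypothetical `demo_bio` with a `pool_idx` field standing in for the bvec index, and it assumes the free path consults that index to decide whether the vector must be released.

```c
#include <assert.h>
#include <string.h>

struct demo_bio {
	void *io_vec;          /* vector owned by the bio */
	unsigned int pool_idx; /* 0 means "nothing to release" in this sketch */
};

/* Like the bio_init() described above: wipes the whole instance. */
void demo_bio_init(struct demo_bio *bio)
{
	memset(bio, 0, sizeof(*bio));
}

/* Buggy order: the index is set first and then wiped by the init call. */
void demo_setup_buggy(struct demo_bio *bio, void *vec, unsigned int idx)
{
	bio->pool_idx = idx;
	demo_bio_init(bio);
	bio->io_vec = vec;
}

/* Fixed order: initialize first, then record the index. */
void demo_setup_fixed(struct demo_bio *bio, void *vec, unsigned int idx)
{
	demo_bio_init(bio);
	bio->pool_idx = idx;
	bio->io_vec = vec;
}

/* Returns 1 if a free path keyed on pool_idx would release the vector. */
int demo_would_release_vec(const struct demo_bio *bio)
{
	return bio->io_vec != NULL && bio->pool_idx != 0;
}
```

In the buggy order the vector is attached but the record needed to release it is gone, which is the shape of the leak the commit describes.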
* md: use rdev_read_only in restart_array | Christoph Hellwig | 2021-02-01 | 1 | -1/+1
  Make the read-only check in restart_array identical to the other two read-only checks.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* md: check for NULL ->meta_bdev before calling bdev_read_only | Christoph Hellwig | 2021-02-01 | 1 | -5/+8
  ->meta_bdev is optional and not set for most arrays. Add a rdev_read_only helper that calls bdev_read_only for both devices in a safe way.

  Fixes: 6f0d9689b670 ("block: remove the NULL bdev check in bdev_read_only")
  Reported-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
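A helper of the kind described might look like the following userspace sketch. The `demo_` types are hypothetical stand-ins for the md and block structures, and `demo_bdev_read_only()` is assumed, like bdev_read_only() after the referenced commit, to require a non-NULL argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_bdev { bool read_only; };

struct demo_rdev {
	struct demo_bdev *bdev;      /* always present */
	struct demo_bdev *meta_bdev; /* optional, NULL for most arrays */
};

/* No NULL check here: callers must pass a valid device. */
static bool demo_bdev_read_only(const struct demo_bdev *bdev)
{
	return bdev->read_only;
}

/* Checks both devices, touching meta_bdev only when it exists. */
bool demo_rdev_read_only(const struct demo_rdev *rdev)
{
	return demo_bdev_read_only(rdev->bdev) ||
	       (rdev->meta_bdev && demo_bdev_read_only(rdev->meta_bdev));
}
```

Keeping the NULL test in one helper means every read-only check in the driver behaves the same way, which is what the follow-up restart_array change relies on.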
* block: drop removed argument from kernel-doc of blk_execute_rq() | Lukas Bulwahn | 2021-01-29 | 1 | -1/+0
  Commit 684da7628d93 ("block: remove unnecessary argument from blk_execute_rq") changes the signature of blk_execute_rq(), but fails to adjust its kernel-doc.

  Hence, make htmldocs warns on ./block/blk-exec.c:78:

    warning: Excess function parameter 'q' description in 'blk_execute_rq'

  Drop removed argument from kernel-doc of blk_execute_rq() as well.

  Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
  Acked-by: Guoqing Jiang <Guoqing.jiang@cloud.ionos.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: remove typo in kernel-doc of set_disk_ro() | Lukas Bulwahn | 2021-01-29 | 1 | -1/+1
  Commit 52f019d43c22 ("block: add a hard-readonly flag to struct gendisk") provides some kernel-doc for set_disk_ro(), but introduces a small typo.

  Hence, make htmldocs warns on ./block/genhd.c:1441:

    warning: Function parameter or member 'read_only' not described in 'set_disk_ro'
    warning: Excess function parameter 'ready_only' description in 'set_disk_ro'

  Remove that typo in the kernel-doc for set_disk_ro().

  Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
  Reviewed-by: Christoph Hellwig <hch@lst.de>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-cgroup: Remove obsolete macro | Baolin Wang | 2021-01-28 | 1 | -2/+0
  Remove the obsolete 'MAX_KEY_LEN' macro.

  Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme-core: check bdev value for NULL | Chaitanya Kulkarni | 2021-01-27 | 1 | -1/+2
  The nvme-core sets the bdev to NULL when an admin command is issued from IOCTL in the following path, e.g. nvme list :-

    block_ioctl()
     blkdev_ioctl()
      nvme_ioctl()
       nvme_user_cmd()
        nvme_submit_user_cmd()

  The commit 309dca309fc3 ("block: store a block_device pointer in struct bio") now uses bdev unconditionally in the macro bio_set_dev() and assumes that the bdev value is not NULL, which results in the following crash, since that's where bdev is actually accessed :-

    void bio_associate_blkg_from_css(struct bio *bio,
                                     struct cgroup_subsys_state *css)
    {
        if (bio->bi_blkg)
            blkg_put(bio->bi_blkg);

        if (css && css->parent) {
            bio->bi_blkg = blkg_tryget_closest(bio, css);
        } else {
    -------------->  blkg_get(bio->bi_bdev->bd_disk->queue->root_blkg);
            bio->bi_blkg = bio->bi_bdev->bd_disk->queue->root_blkg;
        }
    }
    EXPORT_SYMBOL_GPL(bio_associate_blkg_from_css);

    [ 345.385947] BUG: kernel NULL pointer dereference, address: 0000000000000690
    [ 345.387103] #PF: supervisor read access in kernel mode
    [ 345.387894] #PF: error_code(0x0000) - not-present page
    [ 345.388756] PGD 162a2b067 P4D 162a2b067 PUD 1633eb067 PMD 0
    [ 345.389625] Oops: 0000 [#1] SMP NOPTI
    [ 345.390206] CPU: 15 PID: 4100 Comm: nvme Tainted: G OE 5.11.0-rc5blk+ #141
    [ 345.391377] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba52764
    [ 345.393074] RIP: 0010:bio_associate_blkg_from_css.cold.47+0x58/0x21f
    [ 345.396362] RSP: 0018:ffffc90000dbbce8 EFLAGS: 00010246
    [ 345.397078] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
    [ 345.398114] RDX: 0000000000000000 RSI: ffff888813be91f0 RDI: ffff888813be91f8
    [ 345.399039] RBP: ffffc90000dbbd30 R08: 0000000000000001 R09: 0000000000000001
    [ 345.399950] R10: 0000000064c66670 R11: 00000000ef955201 R12: ffff888812d32800
    [ 345.401031] R13: 0000000000000000 R14: ffff888113e51540 R15: ffff888113e51540
    [ 345.401976] FS: 00007f3747f1d780(0000) GS:ffff888813a00000(0000) knlGS:0000000000000000
    [ 345.402997] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 345.403737] CR2: 0000000000000690 CR3: 000000081a4bc000 CR4: 00000000003506e0
    [ 345.404685] Call Trace:
    [ 345.405031] bio_associate_blkg+0x71/0x1c0
    [ 345.405649] nvme_submit_user_cmd+0x1aa/0x38e [nvme_core]
    [ 345.406348] nvme_user_cmd.isra.73.cold.98+0x54/0x92 [nvme_core]
    [ 345.407117] nvme_ioctl+0x226/0x260 [nvme_core]
    [ 345.407707] blkdev_ioctl+0x1c8/0x2b0
    [ 345.408183] block_ioctl+0x3f/0x50
    [ 345.408627] __x64_sys_ioctl+0x84/0xc0
    [ 345.409117] do_syscall_64+0x33/0x40
    [ 345.409592] entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [ 345.410233] RIP: 0033:0x7f3747632107
    [ 345.413125] RSP: 002b:00007ffe461b6648 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
    [ 345.414086] RAX: ffffffffffffffda RBX: 00000000007b7fd0 RCX: 00007f3747632107
    [ 345.414998] RDX: 00007ffe461b6650 RSI: 00000000c0484e41 RDI: 0000000000000004
    [ 345.415966] RBP: 0000000000000004 R08: 00000000007b7fe8 R09: 00000000007b9080
    [ 345.416883] R10: 00007ffe461b62c0 R11: 0000000000000206 R12: 00000000007b7fd0
    [ 345.417808] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000

  Add a NULL check before we set the bdev for bio.

  This issue is found on block/for-next tree.

  Fixes: 309dca309fc3 ("block: store a block_device pointer in struct bio")
  Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Reviewed-by: Christoph Hellwig <hch@lst.de>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* mm: only make map_swap_entry available for CONFIG_HIBERNATION | Jens Axboe | 2021-01-27 | 1 | -1/+5
  Current tree spews this on compile:

    mm/swapfile.c:2290:17: warning: ‘map_swap_entry’ defined but not used [-Wunused-function]
     2290 | static sector_t map_swap_entry(swp_entry_t entry, struct block_device **bdev)
          |                 ^~~~~~~~~~~~~~

  if !CONFIG_HIBERNATION, as we don't use the function unless we have that config option set.

  Fixes: 48d15436fde6 ("mm: remove get_swap_bio")
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* mm: remove get_swap_bio | Christoph Hellwig | 2021-01-27 | 3 | -43/+13
  Just reuse the block_device and sector from the swap_info structure, just as used by the SWP_SYNCHRONOUS path. Also remove the checks for NULL returns from bio_alloc as that can't happen for sleeping allocations.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nilfs2: remove cruft in nilfs_alloc_seg_bio | Christoph Hellwig | 2021-01-27 | 1 | -4/+0
  bio_alloc never returns NULL when it can sleep.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nfs/blocklayout: remove cruft in bl_alloc_init_bio | Christoph Hellwig | 2021-01-27 | 1 | -5/+0
  bio_alloc never returns NULL when it can sleep.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* md/raid6: refactor raid5_read_one_chunk | Christoph Hellwig | 2021-01-27 | 1 | -63/+45
  Refactor raid5_read_one_chunk so that all simple checks are done before allocating the bio.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Acked-by: Song Liu <song@kernel.org>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* md: remove md_bio_alloc_sync | Christoph Hellwig | 2021-01-27 | 1 | -9/+1
  md_bio_alloc_sync is never called with a NULL mddev, and ->sync_set is initialized in md_run, so it always must be initialized as well. Just open code the remaining call to bio_alloc_bioset.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Acked-by: Song Liu <song@kernel.org>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* md: simplify sync_page_io | Christoph Hellwig | 2021-01-27 | 1 | -13/+13
  Use an on-stack bio and biovec for the single page synchronous I/O.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Acked-by: Song Liu <song@kernel.org>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* md: remove bio_alloc_mddev | Christoph Hellwig | 2021-01-27 | 4 | -15/+3
  bio_alloc_mddev is never called with a NULL mddev, and ->bio_set is initialized in md_run, so it always must be initialized as well. Just open code the remaining call to bio_alloc_bioset.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Acked-by: Song Liu <song@kernel.org>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* drbd: remove drbd_req_make_private_bio | Christoph Hellwig | 2021-01-27 | 3 | -14/+8
  Open code drbd_req_make_private_bio in the two callers to prepare for further changes. Also don't bother to initialize bi_next as the bio code already does that.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* drbd: remove bio_alloc_drbd | Christoph Hellwig | 2021-01-27 | 4 | -17/+2
  Given that drbd_md_io_bio_set is initialized during module initialization, and the module fails to load if that initialization fails, there is no need to fall back to plain bio_alloc.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* f2fs: remove FAULT_ALLOC_BIO | Christoph Hellwig | 2021-01-27 | 4 | -28/+4
  Sleeping bio allocations do not fail, which means that injecting an error into sleeping bio allocations is a little silly.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* f2fs: use blkdev_issue_flush in __submit_flush_wait | Christoph Hellwig | 2021-01-27 | 3 | -13/+3
  Use the blkdev_issue_flush helper instead of duplicating it.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* dm-clone: use blkdev_issue_flush in commit_metadata | Christoph Hellwig | 2021-01-27 | 1 | -13/+1
  Use blkdev_issue_flush instead of open coding it.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: use an on-stack bio in blkdev_issue_flush | Christoph Hellwig | 2021-01-27 | 23 | -38/+33
  There is no point in allocating memory for a synchronous flush.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: split bio_kmalloc from bio_alloc_bioset | Christoph Hellwig | 2021-01-27 | 2 | -87/+86
  bio_kmalloc shares almost no logic with the bio_set based fast path in bio_alloc_bioset. Split it into an entirely separate implementation.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-crypto: use bio_kmalloc in blk_crypto_clone_bio | Christoph Hellwig | 2021-01-27 | 1 | -1/+1
  Use bio_kmalloc instead of open coding it.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Eric Biggers <ebiggers@google.com>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* btrfs: use bio_kmalloc in __alloc_device | Christoph Hellwig | 2021-01-27 | 1 | -1/+1
  Use bio_kmalloc instead of open coding it.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Josef Bacik <josef@toxicpanda.com>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* zonefs: use bio_alloc in zonefs_file_dio_append | Christoph Hellwig | 2021-01-27 | 1 | -1/+1
  Use bio_alloc instead of open coding it.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
  Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bfq: Use only idle IO periods for think time calculations | Jan Kara | 2021-01-27 | 1 | -1/+9
  Currently, whenever a bfq queue has a request queued, we add now - last_completion_time to the think time statistics. This is however misleading when the process is able to submit several requests in parallel: if the queue has a request completed at time T0 and then queues new requests at times T1 and T2, we will add T1-T0 and T2-T0 to the think time statistics, which doesn't make any sense (the queue's think time is penalized by the queue being able to submit more IO). So add to the think time statistics only time intervals when the queue had no IO pending.

  Signed-off-by: Jan Kara <jack@suse.cz>
  Acked-by: Paolo Valente <paolo.valente@linaro.org>
  [axboe: fix whitespace on empty line]
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
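The T0/T1/T2 scenario can be replayed with a small model. This is not the BFQ code: the `demo_queue` structure, its fields, and the use of a simple pending counter to decide whether the queue is idle are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct demo_queue {
	uint64_t last_completion; /* time of the most recent completion */
	unsigned int pending;     /* requests queued or in flight */
	uint64_t think_total;     /* sum of recorded think-time samples */
	unsigned int samples;     /* number of recorded samples */
};

/* Record a think-time sample only if the queue had no IO pending. */
void demo_request_queued(struct demo_queue *q, uint64_t now)
{
	if (q->pending == 0) {
		q->think_total += now - q->last_completion;
		q->samples++;
	}
	q->pending++;
}

/* A completion marks the start of a potential think-time interval. */
void demo_request_completed(struct demo_queue *q, uint64_t now)
{
	assert(q->pending > 0);
	q->pending--;
	q->last_completion = now;
}
```

With a completion at T0 = 100 and new requests at T1 = 130 and T2 = 150, only T1 - T0 = 30 is recorded; the second request arrives while IO is already pending and contributes nothing.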
* bfq: Use 'ttime' local variable | Jan Kara | 2021-01-27 | 1 | -1/+1
  Use local variable 'ttime' instead of dereferencing bfqq.

  Signed-off-by: Jan Kara <jack@suse.cz>
  Acked-by: Paolo Valente <paolo.valente@linaro.org>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bfq: Avoid false bfq queue merging | Jan Kara | 2021-01-27 | 1 | -0/+1
  bfq_setup_cooperator() uses bfqd->in_serv_last_pos to detect whether it makes sense to merge the current bfq queue with the in-service queue. However, if the in-service queue is freshly scheduled and didn't dispatch any requests yet, bfqd->in_serv_last_pos is stale and contains the value from the previously scheduled bfq queue, which can thus result in a bogus decision that the two queues should be merged. This bug can be observed for example with the following fio jobfile:

    [global]
    direct=0
    ioengine=sync
    invalidate=1
    size=1g
    rw=read

    [reader]
    numjobs=4
    directory=/mnt

  where the 4 processes will end up in the one shared bfq queue although they do IO to physically very distant files (for some reason I was able to observe this only with slice_idle=1ms setting).

  Fix the problem by invalidating bfqd->in_serv_last_pos when switching in-service queue.

  Fixes: 058fdecc6de7 ("block, bfq: fix in-service-queue check for queue merging")
  CC: stable@vger.kernel.org
  Signed-off-by: Jan Kara <jack@suse.cz>
  Acked-by: Paolo Valente <paolo.valente@linaro.org>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
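The stale-position problem and its fix can be modeled in a few lines. The sketch below is an illustration under stated assumptions: the `demo_sched` structure, the `DEMO_POS_INVALID` sentinel, and the distance threshold are hypothetical, and the commit message does not say which value the kernel uses to mark the position as invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_POS_INVALID UINT64_MAX /* hypothetical "no position yet" marker */

struct demo_sched {
	int in_service_queue;      /* id of the queue currently in service */
	uint64_t in_serv_last_pos; /* end position of its last dispatch */
};

/* Dispatching a request updates the last position of the in-service queue. */
void demo_dispatch(struct demo_sched *s, uint64_t end_pos)
{
	s->in_serv_last_pos = end_pos;
}

/* The fix: switching queues invalidates the position left by the old one. */
void demo_set_in_service(struct demo_sched *s, int queue_id)
{
	s->in_service_queue = queue_id;
	s->in_serv_last_pos = DEMO_POS_INVALID;
}

/* Merge heuristic: only trust the position if the queue has dispatched. */
bool demo_close_to_in_service(const struct demo_sched *s, uint64_t pos,
			      uint64_t threshold)
{
	uint64_t last = s->in_serv_last_pos;

	if (last == DEMO_POS_INVALID)
		return false;
	return (pos > last ? pos - last : last - pos) <= threshold;
}
```

Without the invalidation in `demo_set_in_service()`, a request near the previous queue's last position would look "close" to a freshly scheduled queue that has not dispatched anything yet.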
* blkcg: delete redundant get/put operations for queue | Chunguang Xu | 2021-01-26 | 1 | -5/+8
  When calling blkcg_schedule_throttle(), for the same queue, redundant get/put operations can be removed.

  Signed-off-by: Chunguang Xu <brookxu@tencent.com>
  Acked-by: Tejun Heo <tj@kernel.org>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: unexport truncate_bdev_range | Christoph Hellwig | 2021-01-26 | 2 | -8/+2
  truncate_bdev_range is only used in always built-in block layer code, so remove the export and the !CONFIG_BLOCK stub.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk: wbt: remove unused parameter from wbt_should_throttle | Lei Chen | 2021-01-26 | 1 | -2/+2
  The first parameter, rwb, is not used by this function, so just remove it.

  Signed-off-by: Lei Chen <lennychen@tencent.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bdev: Do not return EBUSY if bdev discard races with write | Jan Kara | 2021-01-26 | 1 | -6/+4
  blkdev_fallocate() tries to detect whether a discard raced with an overlapping write by calling invalidate_inode_pages2_range(). However this check can give both false negatives (when writing using direct IO or when writeback already writes out the written pagecache range) and false positives (when write is not actually overlapping but ends in the same page when blocksize < pagesize). This actually causes issues for qemu which is getting confused by EBUSY errors.

  Fix the problem by removing this conflicting write detection since it is inherently racy and thus of little use anyway.

  Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
  CC: "Darrick J. Wong" <darrick.wong@oracle.com>
  Link: https://lore.kernel.org/qemu-devel/20201111153913.41840-1-mlevitsk@redhat.com
  Signed-off-by: Jan Kara <jack@suse.cz>
  Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
  Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
  Reviewed-by: Christoph Hellwig <hch@lst.de>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: inherit BIO_REMAPPED when cloning bios | Christoph Hellwig | 2021-01-26 | 3 | -0/+6
  Cloned bios can be used on the same device, in which case we need to inherit the BIO_REMAPPED flag to avoid a double partition remap. When the cloned bios are used on another device, bio_set_dev will clear the flag.

  Fixes: 309dca309fc3 ("block: store a block_device pointer in struct bio")
  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bcache: use bio_set_dev to assign ->bi_bdev | Christoph Hellwig | 2021-01-26 | 1 | -1/+1
  Always use the bio_set_dev helper to assign ->bi_bdev to make sure other state related to the device is up to date.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme: use bio_set_dev to assign ->bi_bdev | Christoph Hellwig | 2021-01-26 | 3 | -4/+4
  Always use the bio_set_dev helper to assign ->bi_bdev to make sure other state related to the device is up to date.

  Signed-off-by: Christoph Hellwig <hch@lst.de>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* bfq: bfq_check_waker() should be static | Jens Axboe | 2021-01-26 | 1 | -1/+2
  It's only used in the same file, mark it appropriately static.

  Fixes: 71217df39dc6 ("block, bfq: make waker-queue detection more robust")
  Reported-by: kernel test robot <lkp@intel.com>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block, bfq: make waker-queue detection more robust | Paolo Valente | 2021-01-25 | 2 | -110/+108
  In the presence of many parallel I/O flows, the detection of waker bfq_queues suffers from false positives. This commit addresses this issue by making the filtering of actual wakers more selective. In more detail, a candidate waker must be found to meet waker requirements three times before being promoted to actual waker.

  Tested-by: Jan Kara <jack@suse.cz>
  Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
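The "three detections before promotion" rule amounts to a small counter-based filter. The following sketch is a simplified model, not the BFQ implementation: the state structure, the reset-on-new-candidate policy, and all `demo_`/`DEMO_` names are assumptions, and any timing conditions the real code may apply are left out.

```c
#include <assert.h>

#define DEMO_NO_QUEUE (-1)
#define DEMO_WAKER_DETECTIONS 3 /* threshold taken from the commit message */

struct demo_waker_state {
	int candidate;           /* queue currently being evaluated */
	unsigned int detections; /* times the candidate met the requirements */
	int waker;               /* promoted waker, or DEMO_NO_QUEUE */
};

void demo_waker_init(struct demo_waker_state *s)
{
	s->candidate = DEMO_NO_QUEUE;
	s->detections = 0;
	s->waker = DEMO_NO_QUEUE;
}

/* Called each time queue_id is found to meet the waker requirements. */
void demo_candidate_seen(struct demo_waker_state *s, int queue_id)
{
	assert(queue_id != DEMO_NO_QUEUE);
	if (s->candidate != queue_id) {
		/* A different queue: restart the count with this candidate. */
		s->candidate = queue_id;
		s->detections = 1;
		return;
	}
	if (++s->detections >= DEMO_WAKER_DETECTIONS)
		s->waker = queue_id;
}
```

A queue that matches only once or twice by coincidence, as can happen with many parallel flows, is never promoted in this model.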
* block, bfq: save also injection state on queue merging | Paolo Valente | 2021-01-25 | 2 | -0/+13
  To prevent injection information from being lost on bfq_queue merging, the amount of service that a bfq_queue receives must also be saved and restored when the bfq_queue is merged and split, respectively.

  Tested-by: Jan Kara <jack@suse.cz>
  Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block, bfq: save also weight-raised service on queue merging | Paolo Valente | 2021-01-25 | 2 | -0/+3
  To prevent weight-raising information from being lost on bfq_queue merging, the amount of service that a bfq_queue receives must also be saved and restored when the bfq_queue is merged and split, respectively.

  Tested-by: Jan Kara <jack@suse.cz>
  Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block, bfq: fix switch back from soft-rt weitgh-raising | Paolo Valente | 2021-01-25 | 1 | -2/+20
  A bfq_queue may happen to be deemed as soft real-time while it is still enjoying interactive weight-raising. If this happens because of a false positive, then the bfq_queue is likely to lose its soft real-time status soon. Upon losing such a status, the bfq_queue must get back its interactive weight-raising, if its interactive period is not over yet. But this case is not handled. This commit corrects this error.

  Tested-by: Jan Kara <jack@suse.cz>
  Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
  Signed-off-by: Jens Axboe <axboe@kernel.dk>