summaryrefslogtreecommitdiffstats
path: root/drivers/block (follow)
Commit message (Collapse)AuthorAgeFilesLines
* nbd: pass nbd_sock to nbd_read_reply() instead of indexLi Nan2023-11-211-13/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a socket is processing ioctl 'NBD_SET_SOCK', config->socks might be krealloc in nbd_add_socket(), and a garbage request is received now, a UAF may occurs. T1 nbd_ioctl __nbd_ioctl nbd_add_socket blk_mq_freeze_queue T2 recv_work nbd_read_reply sock_xmit krealloc config->socks def config->socks Pass nbd_sock to nbd_read_reply(). And introduce a new function sock_xmit_recv(), which differs from sock_xmit only in the way it get socket. ================================================================== BUG: KASAN: use-after-free in sock_xmit+0x525/0x550 Read of size 8 at addr ffff8880188ec428 by task kworker/u12:1/18779 Workqueue: knbd4-recv recv_work Call Trace: __dump_stack dump_stack+0xbe/0xfd print_address_description.constprop.0+0x19/0x170 __kasan_report.cold+0x6c/0x84 kasan_report+0x3a/0x50 sock_xmit+0x525/0x550 nbd_read_reply+0xfe/0x2c0 recv_work+0x1c2/0x750 process_one_work+0x6b6/0xf10 worker_thread+0xdd/0xd80 kthread+0x30a/0x410 ret_from_fork+0x22/0x30 Allocated by task 18784: kasan_save_stack+0x1b/0x40 kasan_set_track set_alloc_info __kasan_kmalloc __kasan_kmalloc.constprop.0+0xf0/0x130 slab_post_alloc_hook slab_alloc_node slab_alloc __kmalloc_track_caller+0x157/0x550 __do_krealloc krealloc+0x37/0xb0 nbd_add_socket +0x2d3/0x880 __nbd_ioctl nbd_ioctl+0x584/0x8e0 __blkdev_driver_ioctl blkdev_ioctl+0x2a0/0x6e0 block_ioctl+0xee/0x130 vfs_ioctl __do_sys_ioctl __se_sys_ioctl+0x138/0x190 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Freed by task 18784: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x40 __kasan_slab_free.part.0+0x13f/0x1b0 slab_free_hook slab_free_freelist_hook slab_free kfree+0xcb/0x6c0 krealloc+0x56/0xb0 nbd_add_socket+0x2d3/0x880 __nbd_ioctl nbd_ioctl+0x584/0x8e0 __blkdev_driver_ioctl blkdev_ioctl+0x2a0/0x6e0 block_ioctl+0xee/0x130 vfs_ioctl __do_sys_ioctl __se_sys_ioctl+0x138/0x190 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x61/0xc6 Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230911023308.3467802-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block/null_blk: Fix double blk_mq_start_request() warningChengming Zhou2023-11-201-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION is enabled, null_queue_rq() would return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE for the request, which has been marked as MQ_RQ_IN_FLIGHT by blk_mq_start_request(). Then null_queue_rqs() put these requests in the rqlist, return back to the block layer core, which would try to queue them individually again, so the warning in blk_mq_start_request() triggered. Fix it by splitting the null_queue_rq() into two parts: the first is the preparation of request, the second is the handling of request. We put the blk_mq_start_request() after the preparation part, which may fail and return back to the block layer core. The throttling also belongs to the preparation part, so move it before blk_mq_start_request(). And change the return type of null_handle_cmd() to void, since it always return BLK_STS_OK now. Reported-by: <syzbot+fcc47ba2476570cbbeb0@syzkaller.appspotmail.com> Closes: https://lore.kernel.org/all/0000000000000e6aac06098aee0c@google.com/ Fixes: d78bfa1346ab ("block/null_blk: add queue_rqs() support") Suggested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20231120032521.1012037-1-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nbd: fix null-ptr-dereference while accessing 'nbd->config'Li Nan2023-11-201-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | Memory reordering may occur in nbd_genl_connect(), causing config_refs to be set to 1 while nbd->config is still empty. Opening nbd at this time will cause null-ptr-dereference. T1 T2 nbd_open nbd_get_config_unlocked nbd_genl_connect nbd_alloc_and_init_config //memory reordered refcount_set(&nbd->config_refs, 1) // 2 nbd->config ->null point nbd->config = config // 1 Fix it by adding smp barrier to guarantee the execution sequence. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231116162316.1740402-4-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nbd: factor out a helper to get nbd_config without holding 'config_lock'Li Nan2023-11-201-8/+19
| | | | | | | | | | There are no functional changes, just to make code cleaner and prepare to fix null-ptr-dereference while accessing 'nbd->config'. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231116162316.1740402-3-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nbd: fold nbd config initialization into nbd_alloc_config()Li Nan2023-11-201-22/+19
| | | | | | | | | | There are no functional changes, make the code cleaner and prepare to fix null-ptr-dereference while accessing 'nbd->config'. Signed-off-by: Li Nan <linan122@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231116162316.1740402-2-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nbd: fix uaf in nbd_openLi Lingfeng2023-11-071-2/+9
| | | | | | | | | | | | | | | | Commit 4af5f2e03013 ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk") cleans up disk by blk_cleanup_disk() and it won't set disk->private_data as NULL as before. UAF may be triggered in nbd_open() if someone tries to open nbd device right after nbd_put() since nbd has been free in nbd_dev_remove(). Fix this by implementing ->free_disk and free private data in it. Fixes: 4af5f2e03013 ("nbd: use blk_mq_alloc_disk and blk_cleanup_disk") Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20231107103435.2074904-1-lilingfeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2023-11-051-1/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: "vhost,virtio,vdpa: features, fixes, cleanups. vdpa/mlx5: - VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK - new maintainer vdpa: - support for vq descriptor mappings - decouple reset of iotlb mapping from device reset and fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits) vdpa_sim: implement .reset_map support vdpa/mlx5: implement .reset_map driver op vhost-vdpa: clean iotlb map during reset for older userspace vdpa: introduce .compat_reset operation callback vhost-vdpa: introduce IOTLB_PERSIST backend feature bit vhost-vdpa: reset vendor specific mapping to initial state in .release vdpa: introduce .reset_map operation callback virtio_pci: add check for common cfg size virtio-blk: fix implicit overflow on virtio_max_dma_size virtio_pci: add build offset check for the new common cfg items virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit vduse: make vduse_class constant vhost-scsi: Spelling s/preceeding/preceding/g virtio: kdoc for struct virtio_pci_modern_device vdpa: Update sysfs ABI documentation MAINTAINERS: Add myself as mlx5_vdpa driver virtio-balloon: correct the comment of virtballoon_migratepage() mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK vdpa/mlx5: Update cvq iotlb mapping on ASID change vdpa/mlx5: Make iotlb helper functions more generic ...
| * virtio-blk: fix implicit overflow on virtio_max_dma_sizezhenwei pi2023-11-011-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | The following codes have an implicit conversion from size_t to u32: (u32)max_size = (size_t)virtio_max_dma_size(vdev); This may lead overflow, Ex (size_t)4G -> (u32)0. Once virtio_max_dma_size() has a larger size than U32_MAX, use U32_MAX instead. Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20230904061045.510460-1-pizhenwei@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | Merge tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linuxLinus Torvalds2023-11-014-132/+237
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: - Improvements to the queue_rqs() support, and adding null_blk support for that as well (Chengming) - Series improving badblocks support (Coly) - Key store support for sed-opal (Greg) - IBM partition string handling improvements (Jan) - Make number of ublk devices supported configurable (Mike) - Cancelation improvements for ublk (Ming) - MD pull requests via Song: - Handle timeout in md-cluster, by Denis Plotnikov - Cleanup pers->prepare_suspend, by Yu Kuai - Rewrite mddev_suspend(), by Yu Kuai - Simplify md_seq_ops, by Yu Kuai - Reduce unnecessary locking array_state_store(), by Mariusz Tkaczyk - Make rdev add/remove independent from daemon thread, by Yu Kuai - Refactor code around quiesce() and mddev_suspend(), by Yu Kuai - NVMe pull request via Keith: - nvme-auth updates (Mark) - nvme-tcp tls (Hannes) - nvme-fc annotaions (Kees) - Misc cleanups and improvements (Jiapeng, Joel) * tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linux: (95 commits) block: ublk_drv: Remove unused function md: cleanup pers->prepare_suspend() nvme-auth: allow mixing of secret and hash lengths nvme-auth: use transformed key size to create resp nvme-auth: alloc nvme_dhchap_key as single buffer nvmet-tcp: use 'spin_lock_bh' for state_lock() powerpc/pseries: PLPKS SED Opal keystore support block: sed-opal: keystore access for SED Opal keys block:sed-opal: SED Opal keystore ublk: simplify aborting request ublk: replace monitor with cancelable uring_cmd ublk: quiesce request queue when aborting queue ublk: rename mm_lock as lock ublk: move ublk_cancel_dev() out of ub->mutex ublk: make sure io cmd handled in submitter task context ublk: don't get ublk device reference in ublk_abort_queue() ublk: Make ublks_max configurable ublk: Limit dev_id/ub_number values md-cluster: check for timeout while a new disk adding nvme: rework NVME_AUTH Kconfig selection ...
| * | block: ublk_drv: Remove unused functionJiapeng Chong2023-10-191-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function are defined in the ublk_drv.c file, but not called elsewhere, so delete the unused function. drivers/block/ublk_drv.c:1211:20: warning: unused function 'ublk_abort_io_cmds'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6938 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Fixes: b4e1353f4651 ("ublk: simplify aborting request") Reviewed-by: Ming Lei <ming.lei@rehdat.com> Link: https://lore.kernel.org/r/20231019030444.53680-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: simplify aborting requestMing Lei2023-10-171-30/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now ublk_abort_queue() is run exclusively with ublk_queue_rq() and the ubq_daemon task, so simplify aborting request: - set UBLK_IO_FLAG_ABORTED in ublk_abort_queue() just for aborting this request - abort request in ublk_queue_rq() if ubq->canceling is set Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231009093324.957829-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: replace monitor with cancelable uring_cmdMing Lei2023-10-171-89/+119
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Monitor work actually introduces one extra context for handling abort, this way is easy to cause race, and also introduce extra delay when handling aborting. Now we start to support cancelable uring_cmd, so use it instead: 1) this cancel callback is either run from the uring cmd submission task context or called after the io_uring context is exit, so the callback is run exclusively with ublk_ch_uring_cmd() and __ublk_rq_task_work(). 2) the previous patch freezes request queue when calling ublk_abort_queue(), which is now completely exclusive with ublk_queue_rq() and ublk_ch_uring_cmd()/__ublk_rq_task_work(). 3) in timeout handler, if all IOs are in-flight, then all uring commands are completed, uring command canceling can't help us to provide forward progress any more, so call ublk_abort_requests() in timeout handler. This way simplifies aborting queue, and is helpful for adding new feature, such as, relax the limit of using single task for handling one queue. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231009093324.957829-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: quiesce request queue when aborting queueMing Lei2023-10-171-9/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far aborting queue ends request when the ubq daemon is exiting, and it can be run concurrently with ublk_queue_rq(), this way is fragile and we depend on the tricky usage of UBLK_IO_FLAG_ABORTED for avoiding such race. Quiesce queue when aborting queue, and the two code paths can be run completely exclusively, then it becomes easier to add new ublk feature, such as relaxing single same task limit for each queue. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231009093324.957829-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: rename mm_lock as lockMing Lei2023-10-171-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename mm_lock field of ublk_device as lock, so that this lock can be reused for protecting access of ub->ub_disk, which will be used for simplifying ublk_abort_queue() by quiesce queue in next patch. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231009093324.957829-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: move ublk_cancel_dev() out of ub->mutexMing Lei2023-10-171-17/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ublk_cancel_dev() just calls ublk_cancel_queue() to cancel all pending io commands after ublk request queue is idle. The only protection is just the read & write of ubq->nr_io_ready and avoid duplicated command cancel, so add one per-queue lock with cancel flag for providing this protection, meantime move ublk_cancel_dev() out of ub->mutex. Then we needn't to call io_uring_cmd_complete_in_task() to cancel pending command. And the same cancel logic will be re-used for cancelable uring command. This patch basically reverts commit ac5902f84bb5 ("ublk: fix AB-BA lockdep warning"). Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231009093324.957829-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: make sure io cmd handled in submitter task contextMing Lei2023-10-171-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In well-done ublk server implementation, ublk io command won't be linked into any link chain. Meantime they are always handled in no-wait style, so basically io cmd is always handled in submitter task context. However, the server may set IOSQE_ASYNC, or io command is linked to one chain mistakenly, then we may still run into io-wq context and ctx->uring_lock isn't held. So in case of IO_URING_F_UNLOCKED, schedule this command by io_uring_cmd_complete_in_task to force running it in submitter task. Then ublk_ch_uring_cmd_local() is guaranteed to run with context uring_lock held, and we needn't to worry about sync among submission code path any more. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231009093324.957829-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: don't get ublk device reference in ublk_abort_queue()Ming Lei2023-10-171-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ublk_abort_queue() is called in ublk_daemon_monitor_work(), in which it is guaranteed that the device is live because monitor work is canceled when removing device, so no need to get the device reference. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231009093324.957829-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: Make ublks_max configurableMike Christie2023-10-171-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are converting tcmu applications to ublk, but have systems with up to 1k devices. This patch allows us to configure the ublks_max from userspace with the ublks_max modparam. Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231012150600.6198-3-michael.christie@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: Limit dev_id/ub_number valuesMike Christie2023-10-171-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dev_id/ub_number is used for the ublk dev's char device's minor number so it has to fit into MINORMASK. This patch adds checks to prevent userspace from passing a number that's too large and limits what can be allocated by the ublk_index_idr for the case where userspace has the kernel allocate the dev_id/ub_number. Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231012150600.6198-2-michael.christie@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | aoe: replace strncpy with strscpyJustin Stitt2023-10-041-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `strncpy` is deprecated for use on NUL-terminated destination strings [1]. `aoe_iflist` is expected to be NUL-terminated which is evident by its use with string apis later on like `strspn`: | p = aoe_iflist + strspn(aoe_iflist, WHITESPACE); It also seems `aoe_iflist` does not need to be NUL-padded which means `strscpy` [2] is a suitable replacement due to the fact that it guarantees NUL-termination on the destination buffer while not unnecessarily NUL-padding. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Cc: Xu Panda <xu.panda@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com> Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230919-strncpy-drivers-block-aoe-aoenet-c-v2-1-3d5d158410e9@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | null_blk: replace strncpy with strscpyJustin Stitt2023-10-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `strncpy` is deprecated for use on NUL-terminated destination strings [1]. We should favor a more robust and less ambiguous interface. We expect that both `nullb->disk_name` and `disk->disk_name` be NUL-terminated: | snprintf(nullb->disk_name, sizeof(nullb->disk_name), | "%s", config_item_name(&dev->group.cg_item)); ... | pr_info("disk %s created\n", nullb->disk_name); It seems like NUL-padding may be required due to __assign_disk_name() utilizing a memcpy as opposed to a `str*cpy` api. | static inline void __assign_disk_name(char *name, struct gendisk *disk) | { | if (disk) | memcpy(name, disk->disk_name, DISK_NAME_LEN); | else | memset(name, 0, DISK_NAME_LEN); | } Then we go and print it with `__print_disk_name` which wraps `nullb_trace_disk_name()`. | #define __print_disk_name(name) nullb_trace_disk_name(p, name) This function obviously expects a NUL-terminated string. | const char *nullb_trace_disk_name(struct trace_seq *p, char *name) | { | const char *ret = trace_seq_buffer_ptr(p); | | if (name && *name) | trace_seq_printf(p, "disk=%s, ", name); | trace_seq_putc(p, 0); | | return ret; | } >From the above, we need both 1) a NUL-terminated string and 2) a NUL-padded string. So, let's use strscpy_pad() as per Kees' suggestion from v1. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Cc: Kees Cook <keescook@chromium.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230919-strncpy-drivers-block-null_blk-main-c-v3-1-10cf0a87a2c3@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block/null_blk: add queue_rqs() supportChengming Zhou2023-09-221-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add batched mq_ops.queue_rqs() support in null_blk for testing. The implementation is much easy since null_blk doesn't have commit_rqs(). We simply handle each request one by one, if errors are encountered, leave them in the passed in list and return back. There is about 3.6% improvement in IOPS of fio/t/io_uring on null_blk with hw_queue_depth=256 on my test VM, from 1.09M to 1.13M. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-6-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | blk-mq: update driver tags request table when start requestChengming Zhou2023-09-221-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now we update driver tags request table in blk_mq_get_driver_tag(), so the driver that support queue_rqs() have to update that inflight table by itself. Move it to blk_mq_start_request(), which is a better place where we setup the deadline for request timeout check. And it's just where the request becomes inflight. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-5-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | Merge tag 'hardening-v6.7-rc1' of ↵Linus Torvalds2023-10-311-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "One of the more voluminous set of changes is for adding the new __counted_by annotation[1] to gain run-time bounds checking of dynamically sized arrays with UBSan. - Add LKDTM test for stuck CPUs (Mark Rutland) - Improve LKDTM selftest behavior under UBSan (Ricardo Cañuelo) - Refactor more 1-element arrays into flexible arrays (Gustavo A. R. Silva) - Analyze and replace strlcpy and strncpy uses (Justin Stitt, Azeem Shaikh) - Convert group_info.usage to refcount_t (Elena Reshetova) - Add __counted_by annotations (Kees Cook, Gustavo A. R. Silva) - Add Kconfig fragment for basic hardening options (Kees Cook, Lukas Bulwahn) - Fix randstruct GCC plugin performance mode to stay in groups (Kees Cook) - Fix strtomem() compile-time check for small sources (Kees Cook)" * tag 'hardening-v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (56 commits) hwmon: (acpi_power_meter) replace open-coded kmemdup_nul reset: Annotate struct reset_control_array with __counted_by kexec: Annotate struct crash_mem with __counted_by virtio_console: Annotate struct port_buffer with __counted_by ima: Add __counted_by for struct modsig and use struct_size() MAINTAINERS: Include stackleak paths in hardening entry string: Adjust strtomem() logic to allow for smaller sources hardening: x86: drop reference to removed config AMD_IOMMU_V2 randstruct: Fix gcc-plugin performance mode to stay in group mailbox: zynqmp: Annotate struct zynqmp_ipi_pdata with __counted_by drivers: thermal: tsens: Annotate struct tsens_priv with __counted_by irqchip/imx-intmux: Annotate struct intmux_data with __counted_by KVM: Annotate struct kvm_irq_routing_table with __counted_by virt: acrn: Annotate struct vm_memory_region_batch with __counted_by hwmon: Annotate struct gsc_hwmon_platform_data with __counted_by sparc: Annotate struct cpuinfo_tree with __counted_by isdn: kcapi: replace deprecated strncpy with strscpy_pad isdn: replace deprecated strncpy with strscpy NFS/flexfiles: Annotate struct nfs4_ff_layout_segment with __counted_by nfs41: Annotate struct nfs4_file_layout_dsaddr with __counted_by ...
| * | | drbd: Annotate struct fifo_buffer with __counted_byKees Cook2023-10-021-1/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct fifo_buffer. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: drbd-dev@lists.linbit.com Cc: linux-block@vger.kernel.org Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20230915200316.never.707-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
* | | block: move bdev_mark_dead out of disk_check_media_changeChristoph Hellwig2023-10-282-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | disk_check_media_change is mostly called from ->open where it makes little sense to mark the file system on the device as dead, as we are just opening it. So instead of calling bdev_mark_dead from disk_check_media_change move it into the few callers that are not in an open instance. This avoid calling into bdev_mark_dead and thus taking s_umount with open_mutex held. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20231017184823.1383356-4-hch@lst.de Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
* | | zram: Convert to use bdev_open_by_dev()Jan Kara2023-10-282-18/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert zram to use bdev_open_by_dev() and pass the handle around. CC: Minchan Kim <minchan@kernel.org> CC: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-8-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
* | | xen/blkback: Convert to bdev_open_by_dev()Jan Kara2023-10-283-23/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert xen/blkback to use bdev_open_by_dev() and pass the handle around. CC: xen-devel@lists.xenproject.org Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-7-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
* | | rnbd-srv: Convert to use bdev_open_by_path()Jan Kara2023-10-282-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert rnbd-srv to use bdev_open_by_path() and pass the handle around. CC: Jack Wang <jinpu.wang@ionos.com> CC: "Md. Haris Iqbal" <haris.iqbal@ionos.com> Acked-by: "Md. Haris Iqbal" <haris.iqbal@ionos.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-6-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
* | | pktcdvd: Convert to bdev_open_by_dev()Jan Kara2023-10-281-35/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert pktcdvd to use bdev_open_by_dev(). Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-5-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
* | | drdb: Convert to use bdev_open_by_path()Jan Kara2023-10-282-33/+34
| |/ |/| | | | | | | | | | | | | | | | | | | Convert drdb to use bdev_open_by_path(). CC: drbd-dev@lists.linbit.com Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-4-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
* | Merge tag 'block-6.6-2023-10-06' of git://git.kernel.dk/linuxLinus Torvalds2023-10-071-1/+2
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "Just two minor fixes, for nbd and md" * tag 'block-6.6-2023-10-06' of git://git.kernel.dk/linux: nbd: don't call blk_mark_disk_dead nbd_clear_sock_ioctl md/raid5: release batch_last before waiting for another stripe_head
| * | nbd: don't call blk_mark_disk_dead nbd_clear_sock_ioctlChristoph Hellwig2023-10-041-1/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blk_mark_disk_dead is the proper interface to shut down a block device, but it also makes the disk unusable forever. nbd_clear_sock_ioctl on the other hand wants to shut down the file system, but allow the block device to be used again when when connected to another socket. Switch nbd to use disk_force_media_change and nbd_bdev_reset to go back to a behavior of the old __invalidate_device call, with the added benefit of incrementing the device generation as there is no guarantee the old content comes back when the device is reconnected. Reported-by: Samuel Holland <samuel.holland@sifive.com> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 0c1c9a27ce90 ("nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl") Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Samuel Holland <samuel.holland@sifive.com> Link: https://lore.kernel.org/r/20231003153106.1331363-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | rbd: take header_rwsem in rbd_dev_refresh() only when updatingIlya Dryomov2023-09-261-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rbd_dev_refresh() has been holding header_rwsem across header and parent info read-in unnecessarily for ages. With commit 870611e4877e ("rbd: get snapshot context after exclusive lock is ensured to be held"), the potential for deadlocks became much more real owning to a) header_rwsem now nesting inside lock_rwsem and b) rw_semaphores not allowing new readers after a writer is registered. For example, assuming that I/O request 1, I/O request 2 and header read-in request all target the same OSD: 1. I/O request 1 comes in and gets submitted 2. watch error occurs 3. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and releases lock_rwsem 4. after reestablishing the watch, rbd_reregister_watch() calls rbd_dev_refresh() which takes header_rwsem for write and submits a header read-in request 5. I/O request 2 comes in: after taking lock_rwsem for read in __rbd_img_handle_request(), it blocks trying to take header_rwsem for read in rbd_img_object_requests() 6. another watch error occurs 7. rbd_watch_errcb() blocks trying to take lock_rwsem for write 8. I/O request 1 completion is received by the messenger but can't be processed because lock_rwsem won't be granted anymore 9. header read-in request completion can't be received, let alone processed, because the messenger is stranded Change rbd_dev_refresh() to take header_rwsem only for actually updating rbd_dev->header. Header and parent info read-in don't need any locking. Cc: stable@vger.kernel.org # 0b035401c570: rbd: move rbd_dev_refresh() definition Cc: stable@vger.kernel.org # 510a7330c82a: rbd: decouple header read-in from updating rbd_dev->header Cc: stable@vger.kernel.org # c10311776f0a: rbd: decouple parent info read-in from updating rbd_dev Cc: stable@vger.kernel.org Fixes: 870611e4877e ("rbd: get snapshot context after exclusive lock is ensured to be held") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
* | rbd: decouple parent info read-in from updating rbd_devIlya Dryomov2023-09-261-62/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike header read-in, parent info read-in is already decoupled in get_parent_info(), but it's buried in rbd_dev_v2_parent_info() along with the processing logic. Separate the initial read-in and update read-in logic into rbd_dev_setup_parent() and rbd_dev_update_parent() respectively and have rbd_dev_v2_parent_info() just populate struct parent_image_info (i.e. what get_parent_info() did). Some existing QoI issues, like flatten of a standalone clone being disregarded on refresh, remain. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
* | rbd: decouple header read-in from updating rbd_dev->headerIlya Dryomov2023-09-261-92/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Make rbd_dev_header_info() populate a passed struct rbd_image_header instead of rbd_dev->header and introduce rbd_dev_update_header() for updating mutable fields in rbd_dev->header upon refresh. The initial read-in of both mutable and immutable fields in rbd_dev_image_probe() passes in rbd_dev->header so no update step is required there. rbd_init_layout() is now called directly from rbd_dev_image_probe() instead of individually in format 1 and format 2 implementations. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
* | rbd: move rbd_dev_refresh() definitionIlya Dryomov2023-09-261-35/+33
|/ | | | | | | | | Move rbd_dev_refresh() definition further down to avoid having to move struct parent_image_info definition in the next commit. This spares some forward declarations too. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
* Merge tag 'block-6.6-2023-09-08' of git://git.kernel.dk/linuxLinus Torvalds2023-09-092-3/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: - Fix null_blk polled IO timeout handling (Chengming) - Regression fix for swapped arguments in drbd bvec_set_page() (Christoph) - String length handling fix for s390 dasd (Heiko) - Fixes for blk-throttle accounting (Yu) - Fix page pinning issue for same page segments (Christoph) - Remove redundant file_remove_privs() call (Christoph) - Fix a regression in partition handling for devices not supporting partitions (Li) * tag 'block-6.6-2023-09-08' of git://git.kernel.dk/linux: drbd: swap bvec_set_page len and offset block: fix pin count management when merging same-page segments null_blk: fix poll request timeout handling s390/dasd: fix string length handling block: don't add or resize partition on the disk with GENHD_FL_NO_PART block: remove the call to file_remove_privs in blkdev_write_iter blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice() blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice() blk-throttle: fix wrong comparation while 'carryover_ios/bytes' is negative blk-throttle: print signed value 'carryover_bytes/ios' for user
| * drbd: swap bvec_set_page len and offsetChristoph Böhmwalder2023-09-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bvec_set_page has the following signature: static inline void bvec_set_page(struct bio_vec *bv, struct page *page, unsigned int len, unsigned int offset) However, the usage in DRBD swaps the len and offset parameters. This leads to a bvec with length=0 instead of the intended length=4096, which causes sock_sendmsg to return -EIO. This leaves DRBD unable to transmit any pages and thus completely broken. Swapping the parameters fixes the regression. Fixes: eeac7405c735 ("drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()") Reported-by: Serguei Ivantsov <manowar@gsc-game.com> Link: https://lore.kernel.org/regressions/CAKH+VT3YLmAn0Y8=q37UTDShqxDLsqPcQ4hBMzY7HPn7zNx+RQ@mail.gmail.com/ Cc: stable@vger.kernel.org Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://lore.kernel.org/r/20230906133034.948817-1-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * null_blk: fix poll request timeout handlingChengming Zhou2023-09-011-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing io_uring benchmark on /dev/nullb0, it's easy to crash the kernel if poll requests timeout triggered, as reported by David. [1] BUG: kernel NULL pointer dereference, address: 0000000000000008 Workqueue: kblockd blk_mq_timeout_work RIP: 0010:null_timeout_rq+0x4e/0x91 Call Trace: ? null_timeout_rq+0x4e/0x91 blk_mq_handle_expired+0x31/0x4b bt_iter+0x68/0x84 ? bt_tags_iter+0x81/0x81 __sbitmap_for_each_set.constprop.0+0xb0/0xf2 ? __blk_mq_complete_request_remote+0xf/0xf bt_for_each+0x46/0x64 ? __blk_mq_complete_request_remote+0xf/0xf ? percpu_ref_get_many+0xc/0x2a blk_mq_queue_tag_busy_iter+0x14d/0x18e blk_mq_timeout_work+0x95/0x127 process_one_work+0x185/0x263 worker_thread+0x1b5/0x227 This is indeed a race problem between null_timeout_rq() and null_poll(). null_poll() null_timeout_rq() spin_lock(&nq->poll_lock) list_splice_init(&nq->poll_list, &list) spin_unlock(&nq->poll_lock) while (!list_empty(&list)) req = list_first_entry() list_del_init() ... blk_mq_add_to_batch() // req->rq_next = NULL spin_lock(&nq->poll_lock) // rq->queuelist->next == NULL list_del_init(&rq->queuelist) spin_unlock(&nq->poll_lock) Fix these problems by setting requests state to MQ_RQ_COMPLETE under nq->poll_lock protection, in which null_timeout_rq() can safely detect this race and early return. Note this patch just fix the kernel panic when request timeout happen. [1] https://lore.kernel.org/all/3893581.1691785261@warthog.procyon.org.uk/ Fixes: 0a593fbbc245 ("null_blk: poll queue support") Reported-by: David Howells <dhowells@redhat.com> Tested-by: David Howells <dhowells@redhat.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Link: https://lore.kernel.org/r/20230901120306.170520-2-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds2023-09-061-3/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "Mixed with some fixes and cleanups, this brings in reasonably complete fscrypt support to CephFS! The list of things which don't work with encryption should be fairly short, mostly around the edges: fallocate (not supported well in CephFS to begin with), copy_file_range (requires re-encryption), non-default striping patterns. This was a multi-year effort principally by Jeff Layton with assistance from Xiubo Li, Luís Henriques and others, including several dependant changes in the MDS, netfs helper library and fscrypt framework itself" * tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client: (53 commits) ceph: make num_fwd and num_retry to __u32 ceph: make members in struct ceph_mds_request_args_ext a union rbd: use list_for_each_entry() helper libceph: do not include crypto/algapi.h ceph: switch ceph_lookup/atomic_open() to use new fscrypt helper ceph: fix updating i_truncate_pagecache_size for fscrypt ceph: wait for OSD requests' callbacks to finish when unmounting ceph: drop messages from MDS when unmounting ceph: update documentation regarding snapshot naming limitations ceph: prevent snapshot creation in encrypted locked directories ceph: add support for encrypted snapshot names ceph: invalidate pages when doing direct/sync writes ceph: plumb in decryption during reads ceph: add encryption support to writepage and writepages ceph: add read/modify/write to ceph_sync_write ceph: align data in pages in ceph_sync_write ceph: don't use special DIO path for encrypted inodes ceph: add truncate size handling support for fscrypt ceph: add object version support for sync read libceph: allow ceph_osdc_new_request to accept a multi-op read ...
| * rbd: use list_for_each_entry() helperJinjie Ruan2023-08-301-3/+1
| | | | | | | | | | | | | | | | | | | | Convert list_for_each() to list_for_each_entry() so that the tmp list_head pointer and list_entry() call are no longer needed, which can reduce a few lines of code. No functional changed. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linuxLinus Torvalds2023-08-303-29/+340
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ...
| * | ublk: zoned: support REQ_OP_ZONE_RESET_ALLMing Lei2023-08-211-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There isn't any reason to not support REQ_OP_ZONE_RESET_ALL given everything is actually handled in userspace, not mention it is pretty easy to support RESET_ALL. So enable REQ_OP_ZONE_RESET_ALL and let userspace handle it. Verified by 'tools/zbc_reset_zone -all /dev/ublkb0' in libzbc[1] with libublk-rs based ublk-zoned target prototype[2], follows command line for creating ublk-zoned: cargo run --example zoned -- add -1 1024 # add $dev_id $DEV_SIZE [1] https://github.com/westerndigitalcorporation/libzbc [2] https://github.com/ming1/libublk-rs/tree/zoned.v2 Cc: Niklas Cassel <Niklas.Cassel@wdc.com> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Andreas Hindborg <a.hindborg@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230810124326.321472-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | drivers/rnbd: restore sysfs interface to rnbd-clientLi Zhijian2023-08-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 137380c0ec40 renamed 'rnbd-client' to 'rnbd_client', this changed sysfs interface to /sys/devices/virtual/rnbd_client/ctl/map_device from /sys/devices/virtual/rnbd-client/ctl/map_device. CC: Ivan Orlov <ivan.orlov0322@gmail.com> CC: "Md. Haris Iqbal" <haris.iqbal@ionos.com> CC: Jack Wang <jinpu.wang@ionos.com> Fixes: 137380c0ec40 ("block/rnbd: make all 'class' structures const") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Link: https://lore.kernel.org/r/20230816022210.2501228-1-lizhijian@fujitsu.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: Switch to memdup_user_nul() helperRuan Jinjie2023-08-151-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use memdup_user_nul() helper instead of open-coding to simplify the code. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230815114815.1551171-1-ruanjinjie@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: fix 'warn: variable dereferenced before check 'req'' from SmatchMing Lei2023-08-111-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The added check of 'req_op(req) == REQ_OP_ZONE_APPEND' should have been done after the request is confirmed as valid. Actually here, the request should always been true, so add one WARN_ON_ONCE(!req), meantime move the zone_append check after checking the request. Cc: Andreas Hindborg <a.hindborg@samsung.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Fixes: 29802d7ca33b ("ublk: enable zoned storage support") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230811135216.420404-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | swim3: mark swim3_init() staticArnd Bergmann2023-08-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the module init function, which by definition is used only locally, so mark it static to avoid a warning: drivers/block/swim3.c:1280:5: error: no previous prototype for 'swim3_init' [-Werror=missing-prototypes] Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: Fix signedness bug returning warningLi Zetao2023-08-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two warnings reported by smatch: drivers/block/ublk_drv.c:445 ublk_setup_iod_zoned() warn: signedness bug returning '(-95)' drivers/block/ublk_drv.c:963 ublk_setup_iod() warn: signedness bug returning '(-5)' The type of "blk_status_t" is either be a u32 or u8, but this two functions return a negative value when not supported or failed. Use the error code of the blk module to fix these warnings. Fixes: 29802d7ca33b ("ublk: enable zoned storage support") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202308100201.TCRhgdvN-lkp@intel.com/ Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230810084836.3535322-1-lizetao1@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | ublk: enable zoned storage supportAndreas Hindborg2023-08-081-15/+316
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add zoned storage support to ublk: report_zones and operations: - REQ_OP_ZONE_OPEN - REQ_OP_ZONE_CLOSE - REQ_OP_ZONE_FINISH - REQ_OP_ZONE_RESET - REQ_OP_ZONE_APPEND The zone append feature uses the `addr` field of `struct ublksrv_io_cmd` to communicate ALBA back to the kernel. Therefore ublk must be used with the user copy feature (UBLK_F_USER_COPY) for zoned storage support to be available. Without this feature, ublk will not allow zoned storage support. Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230804114610.179530-4-nmi@metaspace.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>