author     Jan Kara <jack@suse.cz>        2021-06-03 12:47:21 +0200
committer  Jens Axboe <axboe@kernel.dk>   2021-06-03 20:01:27 +0200
commit     613471549f366cdf4170b81ce0f99f3867ec4d16 (patch)
tree       2c22039a2d4fa5942529f6c47178cf9072a5d9f6
parent     null_blk: Fix null pointer dereference on nullb->disk on blk_cleanup_disk call (diff)
block: Do not pull requests from the scheduler when we cannot dispatch them
Provided the device driver does not implement dispatch budget accounting (which only SCSI does), the loop in __blk_mq_do_dispatch_sched() pulls requests from the IO scheduler for as long as it is willing to give out any. That defeats scheduling heuristics inside the scheduler by creating a false impression that the device can take more IO when it in fact cannot.

For example, with the BFQ IO scheduler on top of a virtio-blk device, setting the blkio cgroup weight has barely any impact on the observed throughput of async IO, because __blk_mq_do_dispatch_sched() always drains all the IO queued in BFQ. BFQ first submits IO from higher-weight cgroups, but once that is all dispatched, it gives out IO of lower-weight cgroups as well. We then have to wait for all this IO to be dispatched to the disk (which means a lot of it actually has to complete) before the IO scheduler is queried again for more requests. This completely destroys any service differentiation.

So grab the request tag for a request pulled out of the IO scheduler already in __blk_mq_do_dispatch_sched(), and do not pull any more requests if we cannot get it, because we are unlikely to be able to dispatch them. That way only a single request will wait on the dispatch list for a tag to free up.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210603104721.6309-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-rw-r--r--  block/blk-mq-sched.c | 12
-rw-r--r--  block/blk-mq.c       |  2
-rw-r--r--  block/blk-mq.h       |  2
3 files changed, 14 insertions, 2 deletions
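As context for the diff below, here is a minimal sketch of the dequeue loop in __blk_mq_do_dispatch_sched() as it behaves after this patch (paraphrased, with the dispatch-budget and has_work checks omitted for brevity; not verbatim kernel code):

	do {
		struct request *rq = e->type->ops.dispatch_request(hctx);

		if (!rq)
			break;
		list_add_tail(&rq->queuelist, &rq_list);
		count++;
		if (rq->mq_hctx != hctx)
			multi_hctxs = true;
		/* No driver tag available: the device cannot take more IO,
		 * so stop pulling requests out of the IO scheduler. */
		if (!blk_mq_get_driver_tag(rq))
			break;
	} while (count < max_dispatch);

With this arrangement, at most one request (the last one pulled) ends up waiting on the dispatch list for a tag to become free.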
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 045b6878b8c5..a9182d2f8ad3 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -168,9 +168,19 @@ static int __blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 		 * in blk_mq_dispatch_rq_list().
 		 */
 		list_add_tail(&rq->queuelist, &rq_list);
+		count++;
 		if (rq->mq_hctx != hctx)
 			multi_hctxs = true;
-	} while (++count < max_dispatch);
+
+		/*
+		 * If we cannot get tag for the request, stop dequeueing
+		 * requests from the IO scheduler. We are unlikely to be able
+		 * to submit them anyway and it creates false impression for
+		 * scheduling heuristics that the device can take more IO.
+		 */
+		if (!blk_mq_get_driver_tag(rq))
+			break;
+	} while (count < max_dispatch);
 
 	if (!count) {
 		if (run_queue)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f11d4018ce2e..4261adee9964 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1104,7 +1104,7 @@ static bool __blk_mq_get_driver_tag(struct request *rq)
 	return true;
 }
 
-static bool blk_mq_get_driver_tag(struct request *rq)
+bool blk_mq_get_driver_tag(struct request *rq)
 {
 	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 556368d2c5b6..4b1ca7b7bbeb 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -260,6 +260,8 @@ static inline void blk_mq_put_driver_tag(struct request *rq)
 	__blk_mq_put_driver_tag(rq->mq_hctx, rq);
 }
 
+bool blk_mq_get_driver_tag(struct request *rq);
+
 static inline void blk_mq_clear_mq_map(struct blk_mq_queue_map *qmap)
 {
 	int cpu;