summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorJens Axboe <axboe@kernel.dk>2024-09-18 19:58:19 +0200
committerJens Axboe <axboe@kernel.dk>2024-09-19 19:56:55 +0200
commit04beb6e0e08c30c6f845f50afb7d7953603d7a6f (patch)
treecbeada242f3abebae44b5db0d3d3890a888d7542
parentio_uring/sqpoll: do the napi busy poll outside the submission block (diff)
downloadlinux-04beb6e0e08c30c6f845f50afb7d7953603d7a6f.tar.xz
linux-04beb6e0e08c30c6f845f50afb7d7953603d7a6f.zip
io_uring: check for presence of task_work rather than TIF_NOTIFY_SIGNAL
If some part of the kernel adds task_work that needs executing, in terms of signaling it'll generally use TWA_SIGNAL or TWA_RESUME. Those two directly translate to TIF_NOTIFY_SIGNAL or TIF_NOTIFY_RESUME, and can be used for a variety of use case outside of task_work. However, io_cqring_wait_schedule() only tests explicitly for TIF_NOTIFY_SIGNAL. This means it can miss if task_work got added for the task, but used a different kind of signaling mechanism (or none at all). Normally this doesn't matter as any task_work will be run once the task exits to userspace, except if: 1) The ring is setup with DEFER_TASKRUN 2) The local work item may generate normal task_work For condition 2, this can happen when closing a file and it's the final put of that file, for example. This can cause stalls where a task is waiting to make progress inside io_cqring_wait(), but there's nothing else that will wake it up. Hence change the "should we schedule or loop around" check to check for the presence of task_work explicitly, rather than just TIF_NOTIFY_SIGNAL as the mechanism. While in there, also change the ordering of what type of task_work first in terms of ordering, to both make it consistent with other task_work runs in io_uring, but also to better handle the case of defer task_work generating normal task_work, like in the above example. Reported-by: Jan Hendrik Farr <kernel@jfarr.cc> Link: https://github.com/axboe/liburing/issues/1235 Cc: stable@vger.kernel.org Fixes: 846072f16eed ("io_uring: mimimise io_cqring_wait_schedule") Signed-off-by: Jens Axboe <axboe@kernel.dk>
-rw-r--r--io_uring/io_uring.c4
1 files changed, 2 insertions, 2 deletions
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1aca501efaf6..11c455b638a2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2461,7 +2461,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
return 1;
if (unlikely(!llist_empty(&ctx->work_llist)))
return 1;
- if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL)))
+ if (unlikely(task_work_pending(current)))
return 1;
if (unlikely(task_sigpending(current)))
return -EINTR;
@@ -2568,9 +2568,9 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
* If we got woken because of task_work being processed, run it
* now rather than let the caller do another wait loop.
*/
- io_run_task_work();
if (!llist_empty(&ctx->work_llist))
io_run_local_work(ctx, nr_wait);
+ io_run_task_work();
/*
* Non-local task_work will be run on exit to userspace, but