diff options
author | Tejun Heo <tj@kernel.org> | 2024-09-04 09:54:28 +0200 |
---|---|---|
committer | Tejun Heo <tj@kernel.org> | 2024-09-04 09:54:28 +0200 |
commit | 7c65ae81ea86a6ed8086e1a5651acd766187f19b (patch) | |
tree | 2f1311de3eed23d24215d77c45d5031260a65010 /tools/sched_ext | |
parent | Merge branch 'tip/sched/core' into for-6.12 (diff) | |
download | linux-7c65ae81ea86a6ed8086e1a5651acd766187f19b.tar.xz linux-7c65ae81ea86a6ed8086e1a5651acd766187f19b.zip |
sched_ext: Don't call put_prev_task_scx() before picking the next task
fd03c5b85855 ("sched: Rework pick_next_task()") changed the definition of
pick_next_task() from:
pick_next_task() := pick_task() + set_next_task(.first = true)
to:
pick_next_task(prev) := pick_task() + put_prev_task() + set_next_task(.first = true)
making invoking put_prev_task() pick_next_task()'s responsibility. This
reordering allows pick_task() to be shared between regular and core-sched
paths and put_prev_task() to know the next task.
sched_ext depended on put_prev_task_scx() enqueueing the current task before
pick_next_task_scx() is called. While pulling sched/core changes,
70cc76aa0d80 ("Merge branch 'tip/sched/core' into for-6.12") added an
explicit put_prev_task_scx() call for SCX tasks in pick_next_task_scx()
before picking the first task as a workaround.
Clean it up and adopt the conventions that other sched classes are
following.
The operation of keeping running the current task was spread and required
the task to be put on the local DSQ before picking:
- balance_one() used SCX_TASK_BAL_KEEP to indicate that the task is still
runnable, hasn't exhausted its slice, and thus should keep running.
- put_prev_task_scx() enqueued the task to local DSQ if SCX_TASK_BAL_KEEP
is set. It also called do_enqueue_task() with SCX_ENQ_LAST if it is the
only runnable task. do_enqueue_task() in turn decided whether to use the
local DSQ depending on SCX_OPS_ENQ_LAST.
Consolidate the logic in balance_one() as it always knows whether it is
going to keep the current task. balance_one() now considers all conditions
where the current task should be kept and uses SCX_TASK_BAL_KEEP to tell
pick_next_task_scx() to keep the current task instead of picking one from
the local DSQ. Accordingly, SCX_ENQ_LAST handling is removed from
put_prev_task_scx() and do_enqueue_task() and pick_next_task_scx() is
updated to pick the current task if SCX_TASK_BAL_KEEP is set.
The workaround put_prev_task[_scx]() calls are replaced with
put_prev_set_next_task().
This causes two behavior changes observable from the BPF scheduler:
- When a task keep running, it no longer goes through enqueue/dequeue cycle
and thus ops.stopping/running() transitions. The new behavior is better
and all the existing schedulers should be able to handle the new behavior.
- The BPF scheduler cannot keep executing the current task by enqueueing
SCX_ENQ_LAST task to the local DSQ. If SCX_OPS_ENQ_LAST is specified, the
BPF scheduler is responsible for resuming execution after each
SCX_ENQ_LAST. SCX_OPS_ENQ_LAST is mostly useful for cases where scheduling
decisions are not made on the local CPU - e.g. central or userspace-driven
schedulin - and the new behavior is more logical and shouldn't pose any
problems. SCX_OPS_ENQ_LAST demonstration from scx_qmap is dropped as it
doesn't fit that well anymore and the last task handling is moved to the
end of qmap_dispatch().
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Andrea Righi <righi.andrea@gmail.com>
Cc: Changwoo Min <multics69@gmail.com>
Cc: Daniel Hodges <hodges.daniel.scott@gmail.com>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Diffstat (limited to 'tools/sched_ext')
-rw-r--r-- | tools/sched_ext/scx_qmap.bpf.c | 22 |
1 files changed, 18 insertions, 4 deletions
diff --git a/tools/sched_ext/scx_qmap.bpf.c b/tools/sched_ext/scx_qmap.bpf.c index 892278f12dce..7e7f28f4117e 100644 --- a/tools/sched_ext/scx_qmap.bpf.c +++ b/tools/sched_ext/scx_qmap.bpf.c @@ -205,8 +205,7 @@ void BPF_STRUCT_OPS(qmap_enqueue, struct task_struct *p, u64 enq_flags) /* * All enqueued tasks must have their core_sched_seq updated for correct - * core-sched ordering, which is why %SCX_OPS_ENQ_LAST is specified in - * qmap_ops.flags. + * core-sched ordering. Also, take a look at the end of qmap_dispatch(). */ tctx->core_sched_seq = core_sched_tail_seqs[idx]++; @@ -214,7 +213,7 @@ void BPF_STRUCT_OPS(qmap_enqueue, struct task_struct *p, u64 enq_flags) * If qmap_select_cpu() is telling us to or this is the last runnable * task on the CPU, enqueue locally. */ - if (tctx->force_local || (enq_flags & SCX_ENQ_LAST)) { + if (tctx->force_local) { tctx->force_local = false; scx_bpf_dispatch(p, SCX_DSQ_LOCAL, slice_ns, enq_flags); return; @@ -285,6 +284,7 @@ void BPF_STRUCT_OPS(qmap_dispatch, s32 cpu, struct task_struct *prev) { struct task_struct *p; struct cpu_ctx *cpuc; + struct task_ctx *tctx; u32 zero = 0, batch = dsp_batch ?: 1; void *fifo; s32 i, pid; @@ -349,6 +349,21 @@ void BPF_STRUCT_OPS(qmap_dispatch, s32 cpu, struct task_struct *prev) cpuc->dsp_cnt = 0; } + + /* + * No other tasks. @prev will keep running. Update its core_sched_seq as + * if the task were enqueued and dispatched immediately. + */ + if (prev) { + tctx = bpf_task_storage_get(&task_ctx_stor, prev, 0, 0); + if (!tctx) { + scx_bpf_error("task_ctx lookup failed"); + return; + } + + tctx->core_sched_seq = + core_sched_tail_seqs[weight_to_idx(prev->scx.weight)]++; + } } void BPF_STRUCT_OPS(qmap_tick, struct task_struct *p) @@ -701,6 +716,5 @@ SCX_OPS_DEFINE(qmap_ops, .cpu_offline = (void *)qmap_cpu_offline, .init = (void *)qmap_init, .exit = (void *)qmap_exit, - .flags = SCX_OPS_ENQ_LAST, .timeout_ms = 5000U, .name = "qmap"); |