diff options
| author | Tejun Heo <tj@kernel.org> | 2026-04-12 17:30:52 -1000 |
|---|---|---|
| committer | Tejun Heo <tj@kernel.org> | 2026-04-13 06:13:59 -1000 |
| commit | 3d3667f265148d856bc6eb54d1bd780a94e38da7 (patch) | |
| tree | cb558ba0bc3e000b5112395bc8dce6a8cb1c36ed /tools/perf/scripts/python/stackcollapse.py | |
| parent | 49d78adf9555bbc02ccb65a28325e3e57e9c52ed (diff) | |
tools/sched_ext: Kick home CPU for stranded tasks in scx_qmap
scx_qmap uses global BPF queue maps (BPF_MAP_TYPE_QUEUE) that any CPU's
ops.dispatch() can pop from. When a CPU pops a task that can't run on it
(e.g. a pinned per-CPU kthread), it inserts the task into SHARED_DSQ.
consume_dispatch_q() then skips the task due to affinity mismatch, leaving it
stranded until some CPU in its allowed mask calls ops.dispatch(). This doesn't
cause indefinite stalls -- the periodic tick keeps firing (can_stop_idle_tick()
returns false when softirq is pending) -- but can cause noticeable scheduling
delays.
After inserting to SHARED_DSQ, kick the task's home CPU if this CPU can't run
it. There's a small race window where the home CPU can enter idle before the
kick lands -- if a per-CPU kthread like ksoftirqd is the stranded task, this
can trigger a "NOHZ tick-stop error" warning. The kick arrives shortly after
and the home CPU drains the task.
Rather than fully eliminating the warning by routing pinned tasks to local or
global DSQs, the current code keeps them going through the normal BPF queue
path and documents the race and the resulting warning in detail. scx_qmap is an
example scheduler and having tasks go through the usual dispatch path is useful
for testing. The detailed comment also serves as a reference for other
schedulers that may encounter similar warnings.
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Diffstat (limited to 'tools/perf/scripts/python/stackcollapse.py')
0 files changed, 0 insertions, 0 deletions
