| author | Tejun Heo <tj@kernel.org> | 2026-03-22 10:33:08 -1000 |
|---|---|---|
| committer | Tejun Heo <tj@kernel.org> | 2026-03-22 14:05:25 -1000 |
| commit | 76edc2761ab8bd27fe4c4b8b2fb71baefc4a31e8 | |
| tree | d980b826248deb574455d8304c60e69b8987e8d6 | |
| parent | f03ffe53ab6ffc798ed8291090cebf19c6e5fa3b | |
sched_ext: Use irq_work_queue_on() in schedule_deferred()
schedule_deferred() uses irq_work_queue(), which always queues on the
calling CPU. The deferred work can correctly run from any CPU, and the
_locked() path already processes remote rqs from the calling CPU. However,
when falling through to the irq_work path, queuing on the target CPU is
preferable: the work can run sooner via IPI delivery rather than waiting
for the calling CPU to re-enable IRQs.
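For reference, a condensed sketch of the pre-patch flow, abbreviated from
schedule_deferred() in kernel/sched/ext.c (flag and field names as in
mainline sched_ext; comments shortened, details elided):

```c
static void schedule_deferred(struct rq *rq)
{
	lockdep_assert_rq_held(rq);

	/* In wakeup: task_woken_scx() will run the deferred actions. */
	if (rq->scx.flags & SCX_RQ_IN_WAKEUP)
		return;

	/* In balance: a balance callback runs before the rq lock drops. */
	if (rq->scx.flags & SCX_RQ_IN_BALANCE) {
		queue_balance_callback(rq, &rq->scx.deferred_bal_cb,
				       deferred_bal_cb_workfn);
		return;
	}

	/*
	 * Fallthrough: no scheduler hook available. irq_work_queue()
	 * queues on the calling CPU, so the work only runs once this
	 * CPU re-enables IRQs.
	 */
	irq_work_queue(&rq->scx.deferred_irq_work);
}
```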
Currently, only reenqueue operations use this path: either a BPF-initiated
reenqueue targeting a remote rq, or an IMMED reenqueue when the target CPU
is busy running userspace (not in balance or wakeup, so the _locked() fast
paths aren't available). Use irq_work_queue_on() to target the owning CPU
instead.
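Sketched, the fallthrough branch then becomes the following, where
irq_work_queue_on() and cpu_of() are the standard kernel helpers for
queuing irq work on a specific CPU and resolving an rq's CPU:

```c
	/*
	 * Queue on the rq's owning CPU rather than the calling CPU so
	 * the work can be kicked via IPI instead of waiting for the
	 * calling CPU to re-enable IRQs.
	 */
	irq_work_queue_on(&rq->scx.deferred_irq_work, cpu_of(rq));
```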
This improves IMMED reenqueue latency when tasks are dispatched to
remote local DSQs. Testing on a 24-CPU AMD Ryzen 3900X with scx_qmap
-I -F 50 (ALWAYS_ENQ_IMMED, every 50th enqueue forced to prev_cpu's
local DSQ) under heavy mixed load (2x CPU oversubscription, yield and
context-switch pressure, SCHED_FIFO bursts, periodic fork storms, mixed
nice levels, C-states disabled), measuring local DSQ residence time
(insert to remove) over 5 x 120s runs (~1.2M tasks per set):
>128us outliers: 71 -> 39 (-45%)
>256us outliers: 59 -> 36 (-39%)
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
