|
Add a kernel module that benchmarks queue_work() throughput on an
unbound workqueue to measure pool->lock contention under different
affinity scope configurations (cache vs cache_shard).
The module spawns N kthreads (default: num_online_cpus()), each bound
to a different CPU. All threads start simultaneously and queue work
items, measuring the latency of each queue_work() call. Results are
reported as p50/p90/p95 latencies for each affinity scope.
The affinity scope is switched between runs via the workqueue's sysfs
affinity_scope attribute (WQ_SYSFS), avoiding the need for any new
exported symbols.
The module runs as __init-only, returning -EAGAIN to auto-unload,
and can be re-run via insmod.
Example of the output:
running 50 threads, 50000 items/thread
cpu 6806017 items/sec p50=2574 p90=5068 p95=5818 ns
smt 6821040 items/sec p50=2624 p90=5168 p95=5949 ns
cache_shard 1633653 items/sec p50=5337 p90=9694 p95=11207 ns
cache 286069 items/sec p50=72509 p90=82304 p95=85009 ns
numa 319403 items/sec p50=63745 p90=73480 p95=76505 ns
system 308461 items/sec p50=66561 p90=75714 p95=78048 ns
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|