diff options
| author | Breno Leitao <leitao@debian.org> | 2026-04-01 06:03:53 -0700 |
|---|---|---|
| committer | Tejun Heo <tj@kernel.org> | 2026-04-01 10:24:18 -1000 |
| commit | 5920d046f7ae3bf9cf51b9d915c1fff13d299d84 (patch) | |
| tree | ed761045ca9464f571d378e54fd47006f93739f4 /include/linux/workqueue.h | |
| parent | 9dc42c9070282c81058a875fea5acae057610980 (diff) | |
workqueue: add WQ_AFFN_CACHE_SHARD affinity scope
On systems where many CPUs share one LLC, unbound workqueues using
WQ_AFFN_CACHE collapse to a single worker pool, causing heavy spinlock
contention on pool->lock. For example, Chuck Lever measured 39% of
cycles lost to native_queued_spin_lock_slowpath on a 12-core shared-L3
NFS-over-RDMA system.
The existing affinity hierarchy (cpu, smt, cache, numa, system) offers
no intermediate option between per-LLC and per-SMT-core granularity.
Add WQ_AFFN_CACHE_SHARD, which subdivides each LLC into groups of at
most wq_cache_shard_size cores (default 8, tunable via boot parameter).
Shards are always split on core (SMT group) boundaries so that
Hyper-Threading siblings are never placed in different pods. Cores are
distributed across shards as evenly as possible -- for example, 36 cores
in a single LLC with max shard size 8 produces 5 shards of 8+7+7+7+7
cores.
The implementation follows the same comparator pattern as other affinity
scopes: precompute_cache_shard_ids() pre-fills the cpu_shard_id[] array
from the already-initialized WQ_AFFN_CACHE and WQ_AFFN_SMT topology,
and cpus_share_cache_shard() is passed to init_pod_type().
Benchmark on NVIDIA Grace (72 CPUs, single LLC, 50k items/thread), show
cache_shard delivers ~5x the throughput and ~6.5x lower p50 latency
compared to cache scope on this 72-core single-LLC system.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Diffstat (limited to 'include/linux/workqueue.h')
| -rw-r--r-- | include/linux/workqueue.h | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 75634a09576a..ab6cb70ca1a5 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -133,6 +133,7 @@ enum wq_affn_scope { WQ_AFFN_CPU, /* one pod per CPU */ WQ_AFFN_SMT, /* one pod per SMT */ WQ_AFFN_CACHE, /* one pod per LLC */ + WQ_AFFN_CACHE_SHARD, /* synthetic sub-LLC shards */ WQ_AFFN_NUMA, /* one pod per NUMA node */ WQ_AFFN_SYSTEM, /* one pod across the whole system */ |
