| author | Chen Yu <yu.c.chen@intel.com> | 2026-05-13 13:39:17 -0700 |
|---|---|---|
| committer | Peter Zijlstra <peterz@infradead.org> | 2026-05-18 21:33:15 +0200 |
| commit | c1e7fe5e75ed11fa85368e5a186472afd3858f3a | |
| tree | 325215e8e568e11d70c9674eca7b48b886f024e2 | |
| parent | 808915f982c2a52f5d148510ecfab52284de67cf | |
sched/cache: Add user control to adjust the aggressiveness of cache-aware scheduling
Introduce a set of debugfs knobs to control how aggressively
cache-aware scheduling aggregates tasks.
(1) aggr_tolerance
With sched_cache enabled, the scheduler uses a process's memory
footprint as a proxy for its LLC footprint to decide whether
aggregating its tasks on the preferred LLC could cause cache
contention. If the footprint exceeds the LLC size, aggregation is
skipped. Since the kernel cannot efficiently track per-task cache
usage (resctrl is a user-space interface), userspace can provide a
more accurate hint.
Introduce /sys/kernel/debug/sched/llc_balancing/aggr_tolerance to
let users control how strictly the footprint limits aggregation.
Values range from 0 to 100:
- 0: Cache-aware scheduling is disabled.
- 1: Strict; tasks whose footprint exceeds the LLC size are skipped.
- >=100: Aggressive; tasks are aggregated regardless of footprint.
For example, with a 32MB L3 cache:
- aggr_tolerance=1 -> tasks with footprint > 32MB are skipped.
- aggr_tolerance=99 -> tasks with footprint > ~784GB are skipped
  ((1 + (99 - 1) * 256) * 32MB = 802848MB, about 784GB).
The same /sys/kernel/debug/sched/llc_balancing/aggr_tolerance knob
also controls how strictly the number of active threads is considered
during cache-aware load balancing. The number of SMT siblings is taken
into account as well: high SMT counts reduce the aggregation capacity,
preventing excessive task aggregation on SMT-heavy systems such as
Power10/Power11.
Yangyu suggested introducing separate aggregation controls for the
number of active threads and for the memory footprint check. Since
per-process/task-group controls are planned, fine-grained tunables are
deferred to that implementation.
(2) epoch_period, epoch_affinity_timeout, imb_pct and overaggr_pct
are also exposed as tunables.
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Suggested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Suggested-by: Tingyin Duan <tingyin.duan@gmail.com>
Suggested-by: Jianyong Wu <jianyong.wu@outlook.com>
Suggested-by: Yangyu Chen <cyy@cyyself.name>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tingyin Duan <tingyin.duan@gmail.com>
Link: https://patch.msgid.link/1c62cc060ba2b33d7b1f0ed98b3390128edbae93.1778703694.git.tim.c.chen@linux.intel.com
