linux.git - Linux kernel source tree


author	Chen Yu <yu.c.chen@intel.com>	2026-05-13 13:39:17 -0700
committer	Peter Zijlstra <peterz@infradead.org>	2026-05-18 21:33:15 +0200
commit	c1e7fe5e75ed11fa85368e5a186472afd3858f3a (patch)
tree	325215e8e568e11d70c9674eca7b48b886f024e2 /scripts/stackusage
parent	808915f982c2a52f5d148510ecfab52284de67cf (diff)

sched/cache: Add user control to adjust the aggressiveness of cache-aware scheduling

Introduce a set of debugfs knobs to control how aggressively the cache aware scheduling does the task aggregation.

(1) aggr_tolerance

With sched_cache enabled, the scheduler uses a process's footprint as a proxy for its LLC footprint to determine if aggregating tasks on the preferred LLC could cause cache contention. If the footprint exceeds the LLC size, aggregation is skipped.

Since the kernel cannot efficiently track per-task cache usage (resctrl is user-space only), userspace can provide a more accurate hint.

Introduce /sys/kernel/debug/sched/llc_balancing/aggr_tolerance to let users control how strictly footprint limits aggregation. Values range from 0 to 100:

  - 0: Cache-aware scheduling is disabled.
  - 1: Strict; tasks with footprint larger than LLC size are skipped.
  - >=100: Aggressive; tasks are aggregated regardless of footprint.

For example, with a 32MB L3 cache:

  - aggr_tolerance=1 -> tasks with footprint > 32MB are skipped.
  - aggr_tolerance=99 -> tasks with footprint > 784GB are skipped (784GB = (1 + (99 - 1) * 256) * 32MB).

Similarly, /sys/kernel/debug/sched/llc_balancing/aggr_tolerance also controls how strictly the number of active threads is considered when doing cache aware load balance. The number of SMTs is also considered. High SMT counts reduce the aggregation capacity, preventing excessive task aggregation on SMT-heavy systems like Power10/Power11.

Yangyu suggested introducing separate aggregation controls for the number of active threads and memory footprint checks. Since there are plans to add per-process/task group controls, fine-grained tunables are deferred to that implementation.

(2) epoch_period, epoch_affinity_timeout, imb_pct, overaggr_pct are also turned into tunables.
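The footprint arithmetic in the aggr_tolerance example can be checked with a short userspace sketch. This is not the kernel implementation: the scaling rule 1 + (aggr_tolerance - 1) * 256 is inferred only from the single worked example in the message (784GB at aggr_tolerance=99 with a 32MB LLC), and the names scale_factor and may_aggregate are hypothetical helpers introduced for illustration. The thread-count and SMT adjustments are not modelled because the message gives no formula for them.

```python
MB = 1024 * 1024
GB = 1024 * MB


def scale_factor(aggr_tolerance: int) -> int:
    """Multiplier applied to the LLC size for tolerances 1..99.

    Assumption: taken from the commit message's worked example,
    (1 + (99 - 1) * 256) * 32MB = 784GB. The in-kernel formula may differ.
    """
    if not 1 <= aggr_tolerance <= 99:
        raise ValueError("scale factor is only defined for 1..99")
    return 1 + (aggr_tolerance - 1) * 256


def may_aggregate(footprint: int, llc_size: int, aggr_tolerance: int) -> bool:
    """Model only the footprint check described in the commit message."""
    if aggr_tolerance <= 0:
        # 0: cache-aware scheduling is disabled, so nothing is aggregated.
        return False
    if aggr_tolerance >= 100:
        # >=100: aggregate regardless of footprint.
        return True
    # Tasks whose footprint exceeds the scaled LLC size are skipped.
    return footprint <= scale_factor(aggr_tolerance) * llc_size


if __name__ == "__main__":
    llc = 32 * MB
    for tol in (1, 50, 99):
        limit = scale_factor(tol) * llc
        print(f"aggr_tolerance={tol}: limit = {limit / GB:.2f} GB")
```

Under this assumed rule, aggr_tolerance=1 gives a limit equal to the LLC size and aggr_tolerance=99 gives roughly 784GB, matching the two cases listed in the message.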
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Suggested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Suggested-by: Tingyin Duan <tingyin.duan@gmail.com>
Suggested-by: Jianyong Wu <jianyong.wu@outlook.com>
Suggested-by: Yangyu Chen <cyy@cyyself.name>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tingyin Duan <tingyin.duan@gmail.com>
Link: https://patch.msgid.link/1c62cc060ba2b33d7b1f0ed98b3390128edbae93.1778703694.git.tim.c.chen@linux.intel.com

Diffstat (limited to 'scripts/stackusage')

0 files changed, 0 insertions, 0 deletions

