author Chen Yu <yu.c.chen@intel.com> 2026-05-13 13:39:17 -0700
committer Peter Zijlstra <peterz@infradead.org> 2026-05-18 21:33:15 +0200
commit c1e7fe5e75ed11fa85368e5a186472afd3858f3a
tree 325215e8e568e11d70c9674eca7b48b886f024e2 /scripts/stackusage
parent 808915f982c2a52f5d148510ecfab52284de67cf
sched/cache: Add user control to adjust the aggressiveness of cache-aware scheduling
Introduce a set of debugfs knobs to control how aggressively the cache aware scheduling does the task aggregation.

(1) aggr_tolerance

With sched_cache enabled, the scheduler uses a process's footprint as a proxy for its LLC footprint to determine if aggregating tasks on the preferred LLC could cause cache contention. If the footprint exceeds the LLC size, aggregation is skipped. Since the kernel cannot efficiently track per-task cache usage (resctrl is user-space only), userspace can provide a more accurate hint.

Introduce /sys/kernel/debug/sched/llc_balancing/aggr_tolerance to let users control how strictly footprint limits aggregation. Values range from 0 to 100:

  - 0: Cache-aware scheduling is disabled.
  - 1: Strict; tasks with footprint larger than LLC size are skipped.
  - >=100: Aggressive; tasks are aggregated regardless of footprint.

For example, with a 32MB L3 cache:

  - aggr_tolerance=1 -> tasks with footprint > 32MB are skipped.
  - aggr_tolerance=99 -> tasks with footprint > 784GB are skipped
    (784GB = (1 + (99 - 1) * 256) * 32MB).

Similarly, /sys/kernel/debug/sched/llc_balancing/aggr_tolerance also controls how strictly the number of active threads is considered when doing cache aware load balance. The number of SMTs is also considered. High SMT counts reduce the aggregation capacity, preventing excessive task aggregation on SMT-heavy systems like Power10/Power11.

Yangyu suggested introducing separate aggregation controls for the number of active threads and memory footprint checks. Since there are plans to add per-process/task group controls, fine-grained tunables are deferred to that implementation.

(2) epoch_period, epoch_affinity_timeout, imb_pct, overaggr_pct are also turned into tunables.
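The footprint arithmetic in the aggr_tolerance example can be sketched as follows. This is a minimal illustration, not the kernel implementation: the function names footprint_limit and should_aggregate are hypothetical, and the general formula is extrapolated from the single worked example in this message, (1 + (99 - 1) * 256) * 32MB, so the actual scaling used by the scheduler may differ for other values.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def footprint_limit(aggr_tolerance: int, llc_bytes: int) -> int:
    """Footprint threshold in bytes for 1 <= aggr_tolerance <= 99.

    Assumption: generalizes the commit's worked example
    (1 + (99 - 1) * 256) * 32MB to any tolerance in that range.
    """
    return (1 + (aggr_tolerance - 1) * 256) * llc_bytes


def should_aggregate(footprint_bytes: int, llc_bytes: int,
                     aggr_tolerance: int) -> bool:
    """Hypothetical helper mirroring the three documented regimes."""
    if aggr_tolerance <= 0:
        # 0: cache-aware scheduling is disabled.
        return False
    if aggr_tolerance >= 100:
        # >=100: aggregate regardless of footprint.
        return True
    # Otherwise skip tasks whose footprint exceeds the scaled limit.
    return footprint_bytes <= footprint_limit(aggr_tolerance, llc_bytes)


if __name__ == "__main__":
    llc = 32 * MIB
    for tol in (1, 99):
        limit = footprint_limit(tol, llc)
        print(f"aggr_tolerance={tol}: limit = {limit / GIB:.2f} GiB")
```

With a 32MB LLC, a tolerance of 1 gives a limit equal to the LLC size, and a tolerance of 99 gives 25089 * 32MB, which is the roughly 784GB figure quoted above.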
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Suggested-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>
Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Suggested-by: Tingyin Duan <tingyin.duan@gmail.com>
Suggested-by: Jianyong Wu <jianyong.wu@outlook.com>
Suggested-by: Yangyu Chen <cyy@cyyself.name>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Tingyin Duan <tingyin.duan@gmail.com>
Link: https://patch.msgid.link/1c62cc060ba2b33d7b1f0ed98b3390128edbae93.1778703694.git.tim.c.chen@linux.intel.com
Diffstat (limited to 'scripts/stackusage')
0 files changed, 0 insertions, 0 deletions