<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/cgroup, branch v6.10</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'sched-urgent-2024-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2024-05-19T18:38:15+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-05-19T18:38:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8dde191aabba42e9c16c8d9c853a72a062db27ee'/>
<id>8dde191aabba42e9c16c8d9c853a72a062db27ee</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:

 - Fix a sched_balance_newidle setting bug

 - Fix bug in the setting of /sys/fs/cgroup/test/cpu.max.burst

 - Fix variable-shadowing build warning

 - Extend sched-domains debug output

 - Fix documentation

 - Fix comments

* tag 'sched-urgent-2024-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
  sched/fair: Remove stale FREQUENCY_UTIL comment
  sched/fair: Fix initial util_avg calculation
  docs: cgroup-v1: Clarify that domain levels are system-specific
  sched/debug: Dump domains' level
  sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level
  arch/topology: Fix variable naming to avoid shadowing
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull scheduler fixes from Ingo Molnar:

 - Fix a sched_balance_newidle setting bug

 - Fix bug in the setting of /sys/fs/cgroup/test/cpu.max.burst

 - Fix variable-shadowing build warning

 - Extend sched-domains debug output

 - Fix documentation

 - Fix comments

* tag 'sched-urgent-2024-05-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()
  sched/fair: Remove stale FREQUENCY_UTIL comment
  sched/fair: Fix initial util_avg calculation
  docs: cgroup-v1: Clarify that domain levels are system-specific
  sched/debug: Dump domains' level
  sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level
  arch/topology: Fix variable naming to avoid shadowing
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level</title>
<updated>2024-05-17T07:48:24+00:00</updated>
<author>
<name>Vitalii Bursov</name>
<email>vitaly@bursov.com</email>
</author>
<published>2024-04-30T15:05:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a1fd0b9d751f840df23ef0e75b691fc00cfd4743'/>
<id>a1fd0b9d751f840df23ef0e75b691fc00cfd4743</id>
<content type='text'>
Change relax_domain_level checks so that it would be possible
to include or exclude all domains from newidle balancing.

This matches the behavior described in the documentation:

  -1   no request. use system default or follow request of others.
   0   no search.
   1   search siblings (hyperthreads in a core).

"2" enables levels 0 and 1, level_max excludes the last (level_max)
level, and level_max+1 includes all levels.

Fixes: 1d3504fcf560 ("sched, cpuset: customize sched domains, core")
Signed-off-by: Vitalii Bursov &lt;vitaly@bursov.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Reviewed-by: Valentin Schneider &lt;vschneid@redhat.com&gt;
Link: https://lore.kernel.org/r/bd6de28e80073c79466ec6401cdeae78f0d4423d.1714488502.git.vitaly@bursov.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change relax_domain_level checks so that it would be possible
to include or exclude all domains from newidle balancing.

This matches the behavior described in the documentation:

  -1   no request. use system default or follow request of others.
   0   no search.
   1   search siblings (hyperthreads in a core).

"2" enables levels 0 and 1, level_max excludes the last (level_max)
level, and level_max+1 includes all levels.

Fixes: 1d3504fcf560 ("sched, cpuset: customize sched domains, core")
Signed-off-by: Vitalii Bursov &lt;vitaly@bursov.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Tested-by: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Reviewed-by: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Reviewed-by: Valentin Schneider &lt;vschneid@redhat.com&gt;
Link: https://lore.kernel.org/r/bd6de28e80073c79466ec6401cdeae78f0d4423d.1714488502.git.vitaly@bursov.com
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints</title>
<updated>2024-05-14T19:43:17+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>hawk@kernel.org</email>
</author>
<published>2024-05-01T14:04:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=21c38a3bd4ee3fb7337d013a638302fb5e5f9dc2'/>
<id>21c38a3bd4ee3fb7337d013a638302fb5e5f9dc2</id>
<content type='text'>
This closely resembles helpers added for the global cgroup_rstat_lock in
commit fc29e04ae1ad ("cgroup/rstat: add cgroup_rstat_lock helpers and
tracepoints"). This is for the per CPU lock cgroup_rstat_cpu_lock.

Based on production workloads, we observe the fast-path "update" function
cgroup_rstat_updated() is invoked around 3 million times per sec, while the
"flush" function cgroup_rstat_flush_locked(), walking each possible CPU,
can see periodic spikes of 700 invocations/sec.

For this reason, the tracepoints are split into normal and fastpath
versions for this per-CPU lock. Making it feasible for production to
continuously monitor the non-fastpath tracepoint to detect lock contention
issues. The reason for monitoring is that lock disables IRQs which can
disturb e.g. softirq processing on the local CPUs involved. When the
global cgroup_rstat_lock stops disabling IRQs (e.g converted to a mutex),
this per CPU lock becomes the next bottleneck that can introduce latency
variations.

A practical bpftrace script for monitoring contention latency:

 bpftrace -e '
   tracepoint:cgroup:cgroup_rstat_cpu_lock_contended {
     @start[tid]=nsecs; @cnt[probe]=count()}
   tracepoint:cgroup:cgroup_rstat_cpu_locked {
     if (args-&gt;contended) {
       @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}
     @cnt[probe]=count()}
   interval:s:1 {time("%H:%M:%S "); print(@wait_ns); print(@cnt); clear(@cnt);}'

Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This closely resembles helpers added for the global cgroup_rstat_lock in
commit fc29e04ae1ad ("cgroup/rstat: add cgroup_rstat_lock helpers and
tracepoints"). This is for the per CPU lock cgroup_rstat_cpu_lock.

Based on production workloads, we observe the fast-path "update" function
cgroup_rstat_updated() is invoked around 3 million times per sec, while the
"flush" function cgroup_rstat_flush_locked(), walking each possible CPU,
can see periodic spikes of 700 invocations/sec.

For this reason, the tracepoints are split into normal and fastpath
versions for this per-CPU lock. Making it feasible for production to
continuously monitor the non-fastpath tracepoint to detect lock contention
issues. The reason for monitoring is that lock disables IRQs which can
disturb e.g. softirq processing on the local CPUs involved. When the
global cgroup_rstat_lock stops disabling IRQs (e.g converted to a mutex),
this per CPU lock becomes the next bottleneck that can introduce latency
variations.

A practical bpftrace script for monitoring contention latency:

 bpftrace -e '
   tracepoint:cgroup:cgroup_rstat_cpu_lock_contended {
     @start[tid]=nsecs; @cnt[probe]=count()}
   tracepoint:cgroup:cgroup_rstat_cpu_locked {
     if (args-&gt;contended) {
       @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}
     @cnt[probe]=count()}
   interval:s:1 {time("%H:%M:%S "); print(@wait_ns); print(@cnt); clear(@cnt);}'

Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup/cpuset: Remove outdated comment in sched_partition_write()</title>
<updated>2024-04-25T17:16:19+00:00</updated>
<author>
<name>Xiu Jianfeng</name>
<email>xiujianfeng@huawei.com</email>
</author>
<published>2024-04-25T09:30:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b7d56d953a67c839a514e382596d5af29b8d6d87'/>
<id>b7d56d953a67c839a514e382596d5af29b8d6d87</id>
<content type='text'>
The comment here is outdated and can cause confusion, from the code
perspective, there’s also no need for new comment, so just remove it.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The comment here is outdated and can cause confusion, from the code
perspective, there’s also no need for new comment, so just remove it.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Acked-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup/cpuset: Fix incorrect top_cpuset flags</title>
<updated>2024-04-24T03:31:18+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2024-04-24T01:00:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=04d63da4da53e6064683f4970d34ce997f9daee3'/>
<id>04d63da4da53e6064683f4970d34ce997f9daee3</id>
<content type='text'>
Commit 8996f93fc388 ("cgroup/cpuset: Statically initialize more
members of top_cpuset") uses an incorrect "&lt;" relational operator for
the CS_SCHED_LOAD_BALANCE bit when initializing the top_cpuset. This
results in load_balancing turned off by default in the top cpuset which
is bad for performance.

Fix this by using the BIT() helper macro to set the desired top_cpuset
flags and avoid similar mistake from being made in the future.

Fixes: 8996f93fc388 ("cgroup/cpuset: Statically initialize more members of top_cpuset")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 8996f93fc388 ("cgroup/cpuset: Statically initialize more
members of top_cpuset") uses an incorrect "&lt;" relational operator for
the CS_SCHED_LOAD_BALANCE bit when initializing the top_cpuset. This
results in load_balancing turned off by default in the top cpuset which
is bad for performance.

Fix this by using the BIT() helper macro to set the desired top_cpuset
flags and avoid similar mistake from being made in the future.

Fixes: 8996f93fc388 ("cgroup/cpuset: Statically initialize more members of top_cpuset")
Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup/cpuset: Avoid clearing CS_SCHED_LOAD_BALANCE twice</title>
<updated>2024-04-23T16:00:43+00:00</updated>
<author>
<name>Xiu Jianfeng</name>
<email>xiujianfeng@huawei.com</email>
</author>
<published>2024-04-23T02:44:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e8784765fae6edc47efb68d425c65e3633d70cd6'/>
<id>e8784765fae6edc47efb68d425c65e3633d70cd6</id>
<content type='text'>
In cpuset_css_online(), CS_SCHED_LOAD_BALANCE will be cleared twice,
the former one in the is_in_v2_mode() case could be removed because
is_in_v2_mode() can be true for cgroup v1 if the "cpuset_v2_mode"
mount option is specified, that balance flag change isn't appropriate
for this particular case.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In cpuset_css_online(), CS_SCHED_LOAD_BALANCE will be cleared twice,
the former one in the is_in_v2_mode() case could be removed because
is_in_v2_mode() can be true for cgroup v1 if the "cpuset_v2_mode"
mount option is specified, that balance flag change isn't appropriate
for this particular case.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup/cpuset: Statically initialize more members of top_cpuset</title>
<updated>2024-04-22T19:51:33+00:00</updated>
<author>
<name>Xiu Jianfeng</name>
<email>xiujianfeng@huawei.com</email>
</author>
<published>2024-04-20T09:46:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8996f93fc3880d000a0a0cb40c724d5830e140fd'/>
<id>8996f93fc3880d000a0a0cb40c724d5830e140fd</id>
<content type='text'>
Initializing top_cpuset.relax_domain_level and setting
CS_SCHED_LOAD_BALANCE to top_cpuset.flags in cpuset_init() could be
completed at the time of top_cpuset definition by compiler.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Initializing top_cpuset.relax_domain_level and setting
CS_SCHED_LOAD_BALANCE to top_cpuset.flags in cpuset_init() could be
completed at the time of top_cpuset definition by compiler.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Reviewed-by: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: Avoid unnecessary looping in cgroup_no_v1()</title>
<updated>2024-04-19T15:43:36+00:00</updated>
<author>
<name>Xiu Jianfeng</name>
<email>xiujianfeng@huawei.com</email>
</author>
<published>2024-04-19T08:53:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=19fc8a896565ecebb3951664fd0eeab0a80314a1'/>
<id>19fc8a896565ecebb3951664fd0eeab0a80314a1</id>
<content type='text'>
No need to continue the for_each_subsys loop after the token matches the
name of subsys and cgroup_no_v1_mask is set.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
No need to continue the for_each_subsys loop after the token matches the
name of subsys and cgroup_no_v1_mask is set.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup, legacy_freezer: update comment for freezer_css_offline()</title>
<updated>2024-04-18T16:01:50+00:00</updated>
<author>
<name>Xiu Jianfeng</name>
<email>xiujianfeng@huawei.com</email>
</author>
<published>2024-04-18T12:43:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f71bfbe1e281a707e7766eb0a59ba056dc0f29f9'/>
<id>f71bfbe1e281a707e7766eb0a59ba056dc0f29f9</id>
<content type='text'>
After commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"),
system_freezing_count was replaced by freezer_active.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"),
system_freezing_count was replaced by freezer_active.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: don't call cgroup1_pidlist_destroy_all() for v2</title>
<updated>2024-04-18T15:56:58+00:00</updated>
<author>
<name>Xiu Jianfeng</name>
<email>xiujianfeng@huawei.com</email>
</author>
<published>2024-04-18T02:19:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=15a0b5fe1ad6fbecfa6517750718089e12ee8344'/>
<id>15a0b5fe1ad6fbecfa6517750718089e12ee8344</id>
<content type='text'>
Currently cgroup1_pidlist_destroy_all() will be called when releasing
cgroup even if the cgroup is on default hierarchy, however it doesn't
make any sense for v2 to destroy pidlist of v1.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently cgroup1_pidlist_destroy_all() will be called when releasing
cgroup even if the cgroup is on default hierarchy, however it doesn't
make any sense for v2 to destroy pidlist of v1.

Signed-off-by: Xiu Jianfeng &lt;xiujianfeng@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
