<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/cgroup/rstat.c, branch v6.15</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>cgroup: rstat: call cgroup_rstat_updated_list with cgroup_rstat_lock</title>
<updated>2025-04-01T17:33:26+00:00</updated>
<author>
<name>Shakeel Butt</name>
<email>shakeel.butt@linux.dev</email>
</author>
<published>2025-04-01T17:09:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7d6c63c3191427a69ffd1383146df01f695d6195'/>
<id>7d6c63c3191427a69ffd1383146df01f695d6195</id>
<content type='text'>
The commit 093c8812de2d ("cgroup: rstat: Cleanup flushing functions and
locking") during cleanup accidentally changed the code to call
cgroup_rstat_updated_list() without cgroup_rstat_lock which is required.
Fix it.

Fixes: 093c8812de2d ("cgroup: rstat: Cleanup flushing functions and locking")
Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reported-by: Breno Leitao &lt;leitao@debian.org&gt;
Reported-by: Venkat Rao Bagalkote &lt;venkat88@linux.ibm.com&gt;
Closes: https://lore.kernel.org/all/6564c3d6-9372-4352-9847-1eb3aea07ca4@linux.ibm.com/
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The commit 093c8812de2d ("cgroup: rstat: Cleanup flushing functions and
locking") during cleanup accidentally changed the code to call
cgroup_rstat_updated_list() without cgroup_rstat_lock which is required.
Fix it.

Fixes: 093c8812de2d ("cgroup: rstat: Cleanup flushing functions and locking")
Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reported-by: Breno Leitao &lt;leitao@debian.org&gt;
Reported-by: Venkat Rao Bagalkote &lt;venkat88@linux.ibm.com&gt;
Closes: https://lore.kernel.org/all/6564c3d6-9372-4352-9847-1eb3aea07ca4@linux.ibm.com/
Signed-off-by: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'cgroup-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup</title>
<updated>2025-03-24T23:49:40+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-03-24T23:49:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=94dc216ad848ebee06ce7692fcfcbb2e9b3e643c'/>
<id>94dc216ad848ebee06ce7692fcfcbb2e9b3e643c</id>
<content type='text'>
Pull cgroup updates from Tejun Heo:

 - Add deprecation info messages to cgroup1-only features

 - rstat updates including a bug fix and breaking up a critical section
   to reduce interrupt latency impact

 - Other misc and doc updates

* tag 'cgroup-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: rstat: Cleanup flushing functions and locking
  cgroup/rstat: avoid disabling irqs for O(num_cpu)
  mm: Fix a build breakage in memcontrol-v1.c
  blk-cgroup: Simplify policy files registration
  cgroup: Update file naming comment
  cgroup: Add deprecation message to legacy freezer controller
  mm: Add transformation message for per-memcg swappiness
  RFC cgroup/cpuset-v1: Add deprecation messages to sched_relax_domain_level
  cgroup/cpuset-v1: Add deprecation messages to memory_migrate
  cgroup/cpuset-v1: Add deprecation messages to mem_exclusive and mem_hardwall
  cgroup: Print message when /proc/cgroups is read on v2-only system
  cgroup/blkio: Add deprecation messages to reset_stats
  cgroup/cpuset-v1: Add deprecation messages to memory_spread_page and memory_spread_slab
  cgroup/cpuset-v1: Add deprecation messages to sched_load_balance and memory_pressure_enabled
  cgroup, docs: Be explicit about independence of RT_GROUP_SCHED and non-cpu controllers
  cgroup/rstat: Fix forceidle time in cpu.stat
  cgroup/misc: Remove unused misc_cg_res_total_usage
  cgroup/cpuset: Move procfs cpuset attribute under cgroup-v1.c
  cgroup: update comment about dropping cgroup kn refs
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull cgroup updates from Tejun Heo:

 - Add deprecation info messages to cgroup1-only features

 - rstat updates including a bug fix and breaking up a critical section
   to reduce interrupt latency impact

 - Other misc and doc updates

* tag 'cgroup-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: rstat: Cleanup flushing functions and locking
  cgroup/rstat: avoid disabling irqs for O(num_cpu)
  mm: Fix a build breakage in memcontrol-v1.c
  blk-cgroup: Simplify policy files registration
  cgroup: Update file naming comment
  cgroup: Add deprecation message to legacy freezer controller
  mm: Add transformation message for per-memcg swappiness
  RFC cgroup/cpuset-v1: Add deprecation messages to sched_relax_domain_level
  cgroup/cpuset-v1: Add deprecation messages to memory_migrate
  cgroup/cpuset-v1: Add deprecation messages to mem_exclusive and mem_hardwall
  cgroup: Print message when /proc/cgroups is read on v2-only system
  cgroup/blkio: Add deprecation messages to reset_stats
  cgroup/cpuset-v1: Add deprecation messages to memory_spread_page and memory_spread_slab
  cgroup/cpuset-v1: Add deprecation messages to sched_load_balance and memory_pressure_enabled
  cgroup, docs: Be explicit about independence of RT_GROUP_SCHED and non-cpu controllers
  cgroup/rstat: Fix forceidle time in cpu.stat
  cgroup/misc: Remove unused misc_cg_res_total_usage
  cgroup/cpuset: Move procfs cpuset attribute under cgroup-v1.c
  cgroup: update comment about dropping cgroup kn refs
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: rstat: Cleanup flushing functions and locking</title>
<updated>2025-03-20T16:53:02+00:00</updated>
<author>
<name>Yosry Ahmed</name>
<email>yosry.ahmed@linux.dev</email>
</author>
<published>2025-03-19T21:00:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=093c8812de2d3436744fd10edab9bf816b94f833'/>
<id>093c8812de2d3436744fd10edab9bf816b94f833</id>
<content type='text'>
Now that the rstat lock is being re-acquired on every CPU iteration in
cgroup_rstat_flush_locked(), having the initially acquire the lock is
unnecessary and unclear.

Inline cgroup_rstat_flush_locked() into cgroup_rstat_flush() and move
the lock/unlock calls to the beginning and ending of the loop body to
make the critical section obvious.

cgroup_rstat_flush_hold/release() do not make much sense with the lock
being dropped and reacquired internally. Since it has no external
callers, remove it and explicitly acquire the lock in
cgroup_base_stat_cputime_show() instead.

This leaves the code with a single flushing function,
cgroup_rstat_flush().

Signed-off-by: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that the rstat lock is being re-acquired on every CPU iteration in
cgroup_rstat_flush_locked(), having the initially acquire the lock is
unnecessary and unclear.

Inline cgroup_rstat_flush_locked() into cgroup_rstat_flush() and move
the lock/unlock calls to the beginning and ending of the loop body to
make the critical section obvious.

cgroup_rstat_flush_hold/release() do not make much sense with the lock
being dropped and reacquired internally. Since it has no external
callers, remove it and explicitly acquire the lock in
cgroup_base_stat_cputime_show() instead.

This leaves the code with a single flushing function,
cgroup_rstat_flush().

Signed-off-by: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup/rstat: avoid disabling irqs for O(num_cpu)</title>
<updated>2025-03-19T17:25:00+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2025-03-19T07:13:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0efc297a3c4974dbd609ee36fc6345720b6ca735'/>
<id>0efc297a3c4974dbd609ee36fc6345720b6ca735</id>
<content type='text'>
cgroup_rstat_flush_locked() grabs the irq safe cgroup_rstat_lock while
iterating all possible cpus. It only drops the lock if there is
scheduler or spin lock contention. If neither, then interrupts can be
disabled for a long time. On large machines this can disable interrupts
for a long enough time to drop network packets. On 400+ CPU machines
I've seen interrupt disabled for over 40 msec.

Prevent rstat from disabling interrupts while processing all possible
cpus. Instead drop and reacquire cgroup_rstat_lock for each cpu. This
approach was previously discussed in
https://lore.kernel.org/lkml/ZBz%2FV5a7%2F6PZeM7S@slm.duckdns.org/,
though this was in the context of an non-irq rstat spin lock.

Benchmark this change with:
1) a single stat_reader process with 400 threads, each reading a test
   memcg's memory.stat repeatedly for 10 seconds.
2) 400 memory hog processes running in the test memcg and repeatedly
   charging memory until oom killed. Then they repeat charging and oom
   killing.

v6.14-rc6 with CONFIG_IRQSOFF_TRACER with stat_reader and hogs, finds
interrupts are disabled by rstat for 45341 usec:
  #  =&gt; started at: _raw_spin_lock_irq
  #  =&gt; ended at:   cgroup_rstat_flush
  #
  #
  #                    _------=&gt; CPU#
  #                   / _-----=&gt; irqs-off/BH-disabled
  #                  | / _----=&gt; need-resched
  #                  || / _---=&gt; hardirq/softirq
  #                  ||| / _--=&gt; preempt-depth
  #                  |||| / _-=&gt; migrate-disable
  #                  ||||| /     delay
  #  cmd     pid     |||||| time  |   caller
  #     \   /        ||||||  \    |    /
  stat_rea-96532    52d....    0us*: _raw_spin_lock_irq
  stat_rea-96532    52d.... 45342us : cgroup_rstat_flush
  stat_rea-96532    52d.... 45342us : tracer_hardirqs_on &lt;-cgroup_rstat_flush
  stat_rea-96532    52d.... 45343us : &lt;stack trace&gt;
   =&gt; memcg1_stat_format
   =&gt; memory_stat_format
   =&gt; memory_stat_show
   =&gt; seq_read_iter
   =&gt; vfs_read
   =&gt; ksys_read
   =&gt; do_syscall_64
   =&gt; entry_SYSCALL_64_after_hwframe

With this patch the CONFIG_IRQSOFF_TRACER doesn't find rstat to be the
longest holder. The longest irqs-off holder has irqs disabled for
4142 usec, a huge reduction from previous 45341 usec rstat finding.

Running stat_reader memory.stat reader for 10 seconds:
- without memory hogs: 9.84M accesses =&gt; 12.7M accesses
-    with memory hogs: 9.46M accesses =&gt; 11.1M accesses
The throughput of memory.stat access improves.

The mode of memory.stat access latency after grouping by of 2 buckets:
- without memory hogs: 64 usec =&gt; 16 usec
-    with memory hogs: 64 usec =&gt;  8 usec
The memory.stat latency improves.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Greg Thelen &lt;gthelen@google.com&gt;
Tested-by: Greg Thelen &lt;gthelen@google.com&gt;
Acked-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Reviewed-by: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cgroup_rstat_flush_locked() grabs the irq safe cgroup_rstat_lock while
iterating all possible cpus. It only drops the lock if there is
scheduler or spin lock contention. If neither, then interrupts can be
disabled for a long time. On large machines this can disable interrupts
for a long enough time to drop network packets. On 400+ CPU machines
I've seen interrupt disabled for over 40 msec.

Prevent rstat from disabling interrupts while processing all possible
cpus. Instead drop and reacquire cgroup_rstat_lock for each cpu. This
approach was previously discussed in
https://lore.kernel.org/lkml/ZBz%2FV5a7%2F6PZeM7S@slm.duckdns.org/,
though this was in the context of an non-irq rstat spin lock.

Benchmark this change with:
1) a single stat_reader process with 400 threads, each reading a test
   memcg's memory.stat repeatedly for 10 seconds.
2) 400 memory hog processes running in the test memcg and repeatedly
   charging memory until oom killed. Then they repeat charging and oom
   killing.

v6.14-rc6 with CONFIG_IRQSOFF_TRACER with stat_reader and hogs, finds
interrupts are disabled by rstat for 45341 usec:
  #  =&gt; started at: _raw_spin_lock_irq
  #  =&gt; ended at:   cgroup_rstat_flush
  #
  #
  #                    _------=&gt; CPU#
  #                   / _-----=&gt; irqs-off/BH-disabled
  #                  | / _----=&gt; need-resched
  #                  || / _---=&gt; hardirq/softirq
  #                  ||| / _--=&gt; preempt-depth
  #                  |||| / _-=&gt; migrate-disable
  #                  ||||| /     delay
  #  cmd     pid     |||||| time  |   caller
  #     \   /        ||||||  \    |    /
  stat_rea-96532    52d....    0us*: _raw_spin_lock_irq
  stat_rea-96532    52d.... 45342us : cgroup_rstat_flush
  stat_rea-96532    52d.... 45342us : tracer_hardirqs_on &lt;-cgroup_rstat_flush
  stat_rea-96532    52d.... 45343us : &lt;stack trace&gt;
   =&gt; memcg1_stat_format
   =&gt; memory_stat_format
   =&gt; memory_stat_show
   =&gt; seq_read_iter
   =&gt; vfs_read
   =&gt; ksys_read
   =&gt; do_syscall_64
   =&gt; entry_SYSCALL_64_after_hwframe

With this patch the CONFIG_IRQSOFF_TRACER doesn't find rstat to be the
longest holder. The longest irqs-off holder has irqs disabled for
4142 usec, a huge reduction from previous 45341 usec rstat finding.

Running stat_reader memory.stat reader for 10 seconds:
- without memory hogs: 9.84M accesses =&gt; 12.7M accesses
-    with memory hogs: 9.46M accesses =&gt; 11.1M accesses
The throughput of memory.stat access improves.

The mode of memory.stat access latency after grouping by of 2 buckets:
- without memory hogs: 64 usec =&gt; 16 usec
-    with memory hogs: 64 usec =&gt;  8 usec
The memory.stat latency improves.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Greg Thelen &lt;gthelen@google.com&gt;
Tested-by: Greg Thelen &lt;gthelen@google.com&gt;
Acked-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Reviewed-by: Yosry Ahmed &lt;yosry.ahmed@linux.dev&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup/rstat: Fix forceidle time in cpu.stat</title>
<updated>2025-02-27T17:19:17+00:00</updated>
<author>
<name>Abel Wu</name>
<email>wuyun.abel@bytedance.com</email>
</author>
<published>2025-02-09T06:13:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c4af66a95aa3bc1d4f607ebd4eea524fb58946e3'/>
<id>c4af66a95aa3bc1d4f607ebd4eea524fb58946e3</id>
<content type='text'>
The commit b824766504e4 ("cgroup/rstat: add force idle show helper")
retrieves forceidle_time outside cgroup_rstat_lock for non-root cgroups
which can be potentially inconsistent with other stats.

Rather than reverting that commit, fix it in a way that retains the
effort of cleaning up the ifdef-messes.

Fixes: b824766504e4 ("cgroup/rstat: add force idle show helper")
Signed-off-by: Abel Wu &lt;wuyun.abel@bytedance.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The commit b824766504e4 ("cgroup/rstat: add force idle show helper")
retrieves forceidle_time outside cgroup_rstat_lock for non-root cgroups
which can be potentially inconsistent with other stats.

Rather than reverting that commit, fix it in a way that retains the
effort of cleaning up the ifdef-messes.

Fixes: b824766504e4 ("cgroup/rstat: add force idle show helper")
Signed-off-by: Abel Wu &lt;wuyun.abel@bytedance.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: Remove steal time from usage_usec</title>
<updated>2025-02-07T21:02:17+00:00</updated>
<author>
<name>Muhammad Adeel</name>
<email>Muhammad.Adeel@ibm.com</email>
</author>
<published>2025-02-07T14:24:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=db5fd3cf8bf41b84b577b8ad5234ea95f327c9be'/>
<id>db5fd3cf8bf41b84b577b8ad5234ea95f327c9be</id>
<content type='text'>
The CPU usage time is the time when user, system or both are using the CPU.
Steal time is the time when CPU is waiting to be run by the Hypervisor. It
should not be added to the CPU usage time, hence removing it from the
usage_usec entry.

Fixes: 936f2a70f2077 ("cgroup: add cpu.stat file to root cgroup")
Acked-by: Axel Busch &lt;axel.busch@ibm.com&gt;
Acked-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Muhammad Adeel &lt;muhammad.adeel@ibm.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The CPU usage time is the time when user, system or both are using the CPU.
Steal time is the time when CPU is waiting to be run by the Hypervisor. It
should not be added to the CPU usage time, hence removing it from the
usage_usec entry.

Fixes: 936f2a70f2077 ("cgroup: add cpu.stat file to root cgroup")
Acked-by: Axel Busch &lt;axel.busch@ibm.com&gt;
Acked-by: Michal Koutný &lt;mkoutny@suse.com&gt;
Signed-off-by: Muhammad Adeel &lt;muhammad.adeel@ibm.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup/rstat: Tracking cgroup-level niced CPU time</title>
<updated>2024-10-08T18:50:48+00:00</updated>
<author>
<name>Joshua Hahn</name>
<email>joshua.hahn6@gmail.com</email>
</author>
<published>2024-10-02T18:47:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=aefa398d93d5db7c555be78a605ff015357f127d'/>
<id>aefa398d93d5db7c555be78a605ff015357f127d</id>
<content type='text'>
Cgroup-level CPU statistics currently include time spent on
user/system processes, but do not include niced CPU time (despite
already being tracked). This patch exposes niced CPU time to the
userspace, allowing users to get a better understanding of their
hardware limits and can facilitate more informed workload distribution.

A new field 'ntime' is added to struct cgroup_base_stat as opposed to
struct task_cputime to minimize footprint.

Signed-off-by: Joshua Hahn &lt;joshua.hahnjy@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cgroup-level CPU statistics currently include time spent on
user/system processes, but do not include niced CPU time (despite
already being tracked). This patch exposes niced CPU time to the
userspace, allowing users to get a better understanding of their
hardware limits and can facilitate more informed workload distribution.

A new field 'ntime' is added to struct cgroup_base_stat as opposed to
struct task_cputime to minimize footprint.

Signed-off-by: Joshua Hahn &lt;joshua.hahnjy@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup/rstat: add force idle show helper</title>
<updated>2024-07-05T18:32:08+00:00</updated>
<author>
<name>Chen Ridong</name>
<email>chenridong@huawei.com</email>
</author>
<published>2024-07-04T14:01:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b824766504e49f3fdcbb8c722e70996a78c3636e'/>
<id>b824766504e49f3fdcbb8c722e70996a78c3636e</id>
<content type='text'>
In the function cgroup_base_stat_cputime_show, there are five
instances of #ifdef, which makes the code not concise.
To address this, add the function cgroup_force_idle_show
to make the code more succinct.

Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the function cgroup_base_stat_cputime_show, there are five
instances of #ifdef, which makes the code not concise.
To address this, add the function cgroup_force_idle_show
to make the code more succinct.

Signed-off-by: Chen Ridong &lt;chenridong@huawei.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup/rstat: add cgroup_rstat_cpu_lock helpers and tracepoints</title>
<updated>2024-05-14T19:43:17+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>hawk@kernel.org</email>
</author>
<published>2024-05-01T14:04:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=21c38a3bd4ee3fb7337d013a638302fb5e5f9dc2'/>
<id>21c38a3bd4ee3fb7337d013a638302fb5e5f9dc2</id>
<content type='text'>
This closely resembles helpers added for the global cgroup_rstat_lock in
commit fc29e04ae1ad ("cgroup/rstat: add cgroup_rstat_lock helpers and
tracepoints"). This is for the per CPU lock cgroup_rstat_cpu_lock.

Based on production workloads, we observe the fast-path "update" function
cgroup_rstat_updated() is invoked around 3 million times per sec, while the
"flush" function cgroup_rstat_flush_locked(), walking each possible CPU,
can see periodic spikes of 700 invocations/sec.

For this reason, the tracepoints are split into normal and fastpath
versions for this per-CPU lock. Making it feasible for production to
continuously monitor the non-fastpath tracepoint to detect lock contention
issues. The reason for monitoring is that lock disables IRQs which can
disturb e.g. softirq processing on the local CPUs involved. When the
global cgroup_rstat_lock stops disabling IRQs (e.g converted to a mutex),
this per CPU lock becomes the next bottleneck that can introduce latency
variations.

A practical bpftrace script for monitoring contention latency:

 bpftrace -e '
   tracepoint:cgroup:cgroup_rstat_cpu_lock_contended {
     @start[tid]=nsecs; @cnt[probe]=count()}
   tracepoint:cgroup:cgroup_rstat_cpu_locked {
     if (args-&gt;contended) {
       @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}
     @cnt[probe]=count()}
   interval:s:1 {time("%H:%M:%S "); print(@wait_ns); print(@cnt); clear(@cnt);}'

Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This closely resembles helpers added for the global cgroup_rstat_lock in
commit fc29e04ae1ad ("cgroup/rstat: add cgroup_rstat_lock helpers and
tracepoints"). This is for the per CPU lock cgroup_rstat_cpu_lock.

Based on production workloads, we observe the fast-path "update" function
cgroup_rstat_updated() is invoked around 3 million times per sec, while the
"flush" function cgroup_rstat_flush_locked(), walking each possible CPU,
can see periodic spikes of 700 invocations/sec.

For this reason, the tracepoints are split into normal and fastpath
versions for this per-CPU lock. Making it feasible for production to
continuously monitor the non-fastpath tracepoint to detect lock contention
issues. The reason for monitoring is that lock disables IRQs which can
disturb e.g. softirq processing on the local CPUs involved. When the
global cgroup_rstat_lock stops disabling IRQs (e.g converted to a mutex),
this per CPU lock becomes the next bottleneck that can introduce latency
variations.

A practical bpftrace script for monitoring contention latency:

 bpftrace -e '
   tracepoint:cgroup:cgroup_rstat_cpu_lock_contended {
     @start[tid]=nsecs; @cnt[probe]=count()}
   tracepoint:cgroup:cgroup_rstat_cpu_locked {
     if (args-&gt;contended) {
       @wait_ns=hist(nsecs-@start[tid]); delete(@start[tid]);}
     @cnt[probe]=count()}
   interval:s:1 {time("%H:%M:%S "); print(@wait_ns); print(@cnt); clear(@cnt);}'

Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup/rstat: desc member cgrp in cgroup_rstat_flush_release</title>
<updated>2024-04-18T01:56:30+00:00</updated>
<author>
<name>Jesper Dangaard Brouer</name>
<email>hawk@kernel.org</email>
</author>
<published>2024-04-17T10:59:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=97a46a66ad7d912638c5eefa1f567b4a4a5b58b8'/>
<id>97a46a66ad7d912638c5eefa1f567b4a4a5b58b8</id>
<content type='text'>
Recent change to cgroup_rstat_flush_release added a
parameter cgrp, which is used by tracepoint to correlate
with other tracepoints that also have this cgrp.

The kernel test robot detected kernel doc was missing
a description of this member.

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202404170821.HwZGISTY-lkp@intel.com/
Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Recent change to cgroup_rstat_flush_release added a
parameter cgrp, which is used by tracepoint to correlate
with other tracepoints that also have this cgrp.

The kernel test robot detected kernel doc was missing
a description of this member.

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202404170821.HwZGISTY-lkp@intel.com/
Signed-off-by: Jesper Dangaard Brouer &lt;hawk@kernel.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
