<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/mm/bpf_memcontrol.c, branch master</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>bpf: Revert "bpf: drop KF_ACQUIRE flag on BPF kfunc bpf_get_root_mem_cgroup()"</title>
<updated>2026-01-21T17:38:16+00:00</updated>
<author>
<name>Matt Bobrowski</name>
<email>mattbobrowski@google.com</email>
</author>
<published>2026-01-21T09:00:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d0f5d4f8f3285349bfb94fa84555ab267d2c041d'/>
<id>d0f5d4f8f3285349bfb94fa84555ab267d2c041d</id>
<content type='text'>
This reverts commit e463b6de9da1 ("bpf: drop KF_ACQUIRE flag on BPF
kfunc bpf_get_root_mem_cgroup()").

The original commit removed the KF_ACQUIRE flag from
bpf_get_root_mem_cgroup() under the assumption that it resulted in
simplified usage. This stemmed from the fact that
bpf_get_root_mem_cgroup() inherently returns a reference to an object
which technically isn't reference counted, therefore there is no
strong requirement to call a matching bpf_put_mem_cgroup() on the
returned reference.

Although technically correct, as per the arguments in the thread [0],
dropping the KF_ACQUIRE flag and losing reference tracking semantics
negatively impacted the usability of bpf_get_root_mem_cgroup() in
practice.

[0] https://lore.kernel.org/bpf/878qdx6yut.fsf@linux.dev/

Link: https://lore.kernel.org/bpf/CAADnVQ+6d1Lj4dteAv8u62d7kj3Ze5io6bqM0xeQd-UPk9ZgJQ@mail.gmail.com/
Signed-off-by: Matt Bobrowski &lt;mattbobrowski@google.com&gt;
Link: https://lore.kernel.org/r/20260121090001.240166-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit e463b6de9da1 ("bpf: drop KF_ACQUIRE flag on BPF
kfunc bpf_get_root_mem_cgroup()").

The original commit removed the KF_ACQUIRE flag from
bpf_get_root_mem_cgroup() under the assumption that it resulted in
simplified usage. This stemmed from the fact that
bpf_get_root_mem_cgroup() inherently returns a reference to an object
which technically isn't reference counted, therefore there is no
strong requirement to call a matching bpf_put_mem_cgroup() on the
returned reference.

Although technically correct, as per the arguments in the thread [0],
dropping the KF_ACQUIRE flag and losing reference tracking semantics
negatively impacted the usability of bpf_get_root_mem_cgroup() in
practice.

[0] https://lore.kernel.org/bpf/878qdx6yut.fsf@linux.dev/

Link: https://lore.kernel.org/bpf/CAADnVQ+6d1Lj4dteAv8u62d7kj3Ze5io6bqM0xeQd-UPk9ZgJQ@mail.gmail.com/
Signed-off-by: Matt Bobrowski &lt;mattbobrowski@google.com&gt;
Link: https://lore.kernel.org/r/20260121090001.240166-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: drop KF_ACQUIRE flag on BPF kfunc bpf_get_root_mem_cgroup()</title>
<updated>2026-01-14T03:19:13+00:00</updated>
<author>
<name>Matt Bobrowski</name>
<email>mattbobrowski@google.com</email>
</author>
<published>2026-01-13T08:39:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e463b6de9da17995a2ddabf199cc00c65a8a5392'/>
<id>e463b6de9da17995a2ddabf199cc00c65a8a5392</id>
<content type='text'>
With the BPF verifier now treating pointers to struct types returned
from BPF kfuncs as implicitly trusted by default, there is no need for
bpf_get_root_mem_cgroup() to be annotated with the KF_ACQUIRE flag.

bpf_get_root_mem_cgroup() does not acquire any references, but rather
simply returns a NULL pointer or a pointer to a struct mem_cgroup
object that is valid for the entire lifetime of the kernel.

This simplifies BPF programs using this kfunc by removing the
requirement to pair the call with bpf_put_mem_cgroup().

Signed-off-by: Matt Bobrowski &lt;mattbobrowski@google.com&gt;
Acked-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20260113083949.2502978-2-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the BPF verifier now treating pointers to struct types returned
from BPF kfuncs as implicitly trusted by default, there is no need for
bpf_get_root_mem_cgroup() to be annotated with the KF_ACQUIRE flag.

bpf_get_root_mem_cgroup() does not acquire any references, but rather
simply returns a NULL pointer or a pointer to a struct mem_cgroup
object that is valid for the entire lifetime of the kernel.

This simplifies BPF programs using this kfunc by removing the
requirement to pair the call with bpf_put_mem_cgroup().

Signed-off-by: Matt Bobrowski &lt;mattbobrowski@google.com&gt;
Acked-by: Kumar Kartikeya Dwivedi &lt;memxor@gmail.com&gt;
Link: https://lore.kernel.org/r/20260113083949.2502978-2-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Remove redundant KF_TRUSTED_ARGS flag from all kfuncs</title>
<updated>2026-01-02T20:04:28+00:00</updated>
<author>
<name>Puranjay Mohan</name>
<email>puranjay@kernel.org</email>
</author>
<published>2026-01-02T18:00:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7646c7afd9a95db0b0cb4ad066ed90f6024da67d'/>
<id>7646c7afd9a95db0b0cb4ad066ed90f6024da67d</id>
<content type='text'>
Now that KF_TRUSTED_ARGS is the default for all kfuncs, remove the
explicit KF_TRUSTED_ARGS flag from all kfunc definitions and remove the
flag itself.

Acked-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Reviewed-by: Emil Tsalapatis &lt;emil@etsalapatis.com&gt;
Signed-off-by: Puranjay Mohan &lt;puranjay@kernel.org&gt;
Link: https://lore.kernel.org/r/20260102180038.2708325-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that KF_TRUSTED_ARGS is the default for all kfuncs, remove the
explicit KF_TRUSTED_ARGS flag from all kfunc definitions and remove the
flag itself.

Acked-by: Eduard Zingerman &lt;eddyz87@gmail.com&gt;
Reviewed-by: Emil Tsalapatis &lt;emil@etsalapatis.com&gt;
Signed-off-by: Puranjay Mohan &lt;puranjay@kernel.org&gt;
Link: https://lore.kernel.org/r/20260102180038.2708325-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: introduce BPF kfuncs to access memcg statistics and events</title>
<updated>2025-12-23T06:20:22+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>roman.gushchin@linux.dev</email>
</author>
<published>2025-12-23T04:41:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=99430ab8b804c26b8a0dec93fcbfe75469f3edc7'/>
<id>99430ab8b804c26b8a0dec93fcbfe75469f3edc7</id>
<content type='text'>
Introduce BPF kfuncs to conveniently access memcg data:
  - bpf_mem_cgroup_vm_events(),
  - bpf_mem_cgroup_memory_events(),
  - bpf_mem_cgroup_usage(),
  - bpf_mem_cgroup_page_state(),
  - bpf_mem_cgroup_flush_stats().

These functions are useful for implementing BPF OOM policies, but
also can be used to accelerate access to the memcg data. Reading
it through cgroupfs is much more expensive, roughly 5x, mostly
because of the need to convert the data into the text and back.

JP Kobryn:
An experiment was setup to compare the performance of a program that
uses the traditional method of reading memory.stat vs a program using
the new kfuncs. The control program opens up the root memory.stat file
and for 1M iterations reads, converts the string values to numeric data,
then seeks back to the beginning. The experimental program sets up the
requisite libbpf objects and for 1M iterations invokes a bpf program
which uses the kfuncs to fetch all available stats for node_stat_item,
memcg_stat_item, and vm_event_item types.

The results showed a significant perf benefit on the experimental side,
outperforming the control side by a margin of 93%. In kernel mode,
elapsed time was reduced by 80%, while in user mode, over 99% of time
was saved.

control: elapsed time
real    0m38.318s
user    0m25.131s
sys     0m13.070s

experiment: elapsed time
real    0m2.789s
user    0m0.187s
sys     0m2.512s

control: perf data
33.43% a.out libc.so.6         [.] __vfscanf_internal
 6.88% a.out [kernel.kallsyms] [k] vsnprintf
 6.33% a.out libc.so.6         [.] _IO_fgets
 5.51% a.out [kernel.kallsyms] [k] format_decode
 4.31% a.out libc.so.6         [.] __GI_____strtoull_l_internal
 3.78% a.out [kernel.kallsyms] [k] string
 3.53% a.out [kernel.kallsyms] [k] number
 2.71% a.out libc.so.6         [.] _IO_sputbackc
 2.41% a.out [kernel.kallsyms] [k] strlen
 1.98% a.out a.out             [.] main
 1.70% a.out libc.so.6         [.] _IO_getline_info
 1.51% a.out libc.so.6         [.] __isoc99_sscanf
 1.47% a.out [kernel.kallsyms] [k] memory_stat_format
 1.47% a.out [kernel.kallsyms] [k] memcpy_orig
 1.41% a.out [kernel.kallsyms] [k] seq_buf_printf

experiment: perf data
10.55% memcgstat bpf_prog_..._query [k] bpf_prog_16aab2f19fa982a7_query
 6.90% memcgstat [kernel.kallsyms]  [k] memcg_page_state_output
 3.55% memcgstat [kernel.kallsyms]  [k] _raw_spin_lock
 3.12% memcgstat [kernel.kallsyms]  [k] memcg_events
 2.87% memcgstat [kernel.kallsyms]  [k] __memcg_slab_post_alloc_hook
 2.73% memcgstat [kernel.kallsyms]  [k] kmem_cache_free
 2.70% memcgstat [kernel.kallsyms]  [k] entry_SYSRETQ_unsafe_stack
 2.25% memcgstat [kernel.kallsyms]  [k] __memcg_slab_free_hook
 2.06% memcgstat [kernel.kallsyms]  [k] get_page_from_freelist

Signed-off-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Co-developed-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Signed-off-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Link: https://lore.kernel.org/r/20251223044156.208250-5-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce BPF kfuncs to conveniently access memcg data:
  - bpf_mem_cgroup_vm_events(),
  - bpf_mem_cgroup_memory_events(),
  - bpf_mem_cgroup_usage(),
  - bpf_mem_cgroup_page_state(),
  - bpf_mem_cgroup_flush_stats().

These functions are useful for implementing BPF OOM policies, but
also can be used to accelerate access to the memcg data. Reading
it through cgroupfs is much more expensive, roughly 5x, mostly
because of the need to convert the data into the text and back.

JP Kobryn:
An experiment was setup to compare the performance of a program that
uses the traditional method of reading memory.stat vs a program using
the new kfuncs. The control program opens up the root memory.stat file
and for 1M iterations reads, converts the string values to numeric data,
then seeks back to the beginning. The experimental program sets up the
requisite libbpf objects and for 1M iterations invokes a bpf program
which uses the kfuncs to fetch all available stats for node_stat_item,
memcg_stat_item, and vm_event_item types.

The results showed a significant perf benefit on the experimental side,
outperforming the control side by a margin of 93%. In kernel mode,
elapsed time was reduced by 80%, while in user mode, over 99% of time
was saved.

control: elapsed time
real    0m38.318s
user    0m25.131s
sys     0m13.070s

experiment: elapsed time
real    0m2.789s
user    0m0.187s
sys     0m2.512s

control: perf data
33.43% a.out libc.so.6         [.] __vfscanf_internal
 6.88% a.out [kernel.kallsyms] [k] vsnprintf
 6.33% a.out libc.so.6         [.] _IO_fgets
 5.51% a.out [kernel.kallsyms] [k] format_decode
 4.31% a.out libc.so.6         [.] __GI_____strtoull_l_internal
 3.78% a.out [kernel.kallsyms] [k] string
 3.53% a.out [kernel.kallsyms] [k] number
 2.71% a.out libc.so.6         [.] _IO_sputbackc
 2.41% a.out [kernel.kallsyms] [k] strlen
 1.98% a.out a.out             [.] main
 1.70% a.out libc.so.6         [.] _IO_getline_info
 1.51% a.out libc.so.6         [.] __isoc99_sscanf
 1.47% a.out [kernel.kallsyms] [k] memory_stat_format
 1.47% a.out [kernel.kallsyms] [k] memcpy_orig
 1.41% a.out [kernel.kallsyms] [k] seq_buf_printf

experiment: perf data
10.55% memcgstat bpf_prog_..._query [k] bpf_prog_16aab2f19fa982a7_query
 6.90% memcgstat [kernel.kallsyms]  [k] memcg_page_state_output
 3.55% memcgstat [kernel.kallsyms]  [k] _raw_spin_lock
 3.12% memcgstat [kernel.kallsyms]  [k] memcg_events
 2.87% memcgstat [kernel.kallsyms]  [k] __memcg_slab_post_alloc_hook
 2.73% memcgstat [kernel.kallsyms]  [k] kmem_cache_free
 2.70% memcgstat [kernel.kallsyms]  [k] entry_SYSRETQ_unsafe_stack
 2.25% memcgstat [kernel.kallsyms]  [k] __memcg_slab_free_hook
 2.06% memcgstat [kernel.kallsyms]  [k] get_page_from_freelist

Signed-off-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Co-developed-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Signed-off-by: JP Kobryn &lt;inwardvessel@gmail.com&gt;
Link: https://lore.kernel.org/r/20251223044156.208250-5-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: introduce bpf_get_root_mem_cgroup() BPF kfunc</title>
<updated>2025-12-23T06:20:21+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>roman.gushchin@linux.dev</email>
</author>
<published>2025-12-23T04:41:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5c7db3239c9fbe3c62cb0d89b64959ea23af2de9'/>
<id>5c7db3239c9fbe3c62cb0d89b64959ea23af2de9</id>
<content type='text'>
Introduce a BPF kfunc to get a trusted pointer to the root memory
cgroup. It's very handy to traverse the full memcg tree, e.g.
for handling a system-wide OOM.

It's possible to obtain this pointer by traversing the memcg tree
up from any known memcg, but it's sub-optimal and makes BPF programs
more complex and less efficient.

bpf_get_root_mem_cgroup() has a KF_ACQUIRE | KF_RET_NULL semantics,
however in reality it's not necessary to bump the corresponding
reference counter - root memory cgroup is immortal, reference counting
is skipped, see css_get(). Once set, root_mem_cgroup is always a valid
memcg pointer. It's safe to call bpf_put_mem_cgroup() for the pointer
obtained with bpf_get_root_mem_cgroup(), it's effectively a no-op.

Signed-off-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Link: https://lore.kernel.org/r/20251223044156.208250-4-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Introduce a BPF kfunc to get a trusted pointer to the root memory
cgroup. It's very handy to traverse the full memcg tree, e.g.
for handling a system-wide OOM.

It's possible to obtain this pointer by traversing the memcg tree
up from any known memcg, but it's sub-optimal and makes BPF programs
more complex and less efficient.

bpf_get_root_mem_cgroup() has a KF_ACQUIRE | KF_RET_NULL semantics,
however in reality it's not necessary to bump the corresponding
reference counter - root memory cgroup is immortal, reference counting
is skipped, see css_get(). Once set, root_mem_cgroup is always a valid
memcg pointer. It's safe to call bpf_put_mem_cgroup() for the pointer
obtained with bpf_get_root_mem_cgroup(), it's effectively a no-op.

Signed-off-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Link: https://lore.kernel.org/r/20251223044156.208250-4-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: introduce BPF kfuncs to deal with memcg pointers</title>
<updated>2025-12-23T06:20:21+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>roman.gushchin@linux.dev</email>
</author>
<published>2025-12-23T04:41:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5904db9891f80f84283648121e2d8c8a506296a8'/>
<id>5904db9891f80f84283648121e2d8c8a506296a8</id>
<content type='text'>
To effectively operate with memory cgroups in BPF there is a need
to convert css pointers to memcg pointers. A simple container_of
cast which is used in the kernel code can't be used in BPF because
from the verifier's point of view that's a out-of-bounds memory access.

Introduce helper get/put kfuncs which can be used to get
a refcounted memcg pointer from the css pointer:
  - bpf_get_mem_cgroup,
  - bpf_put_mem_cgroup.

bpf_get_mem_cgroup() can take both memcg's css and the corresponding
cgroup's "self" css. It allows it to be used with the existing cgroup
iterator which iterates over cgroup tree, not memcg tree.

Signed-off-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Link: https://lore.kernel.org/r/20251223044156.208250-3-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To effectively operate with memory cgroups in BPF there is a need
to convert css pointers to memcg pointers. A simple container_of
cast which is used in the kernel code can't be used in BPF because
from the verifier's point of view that's a out-of-bounds memory access.

Introduce helper get/put kfuncs which can be used to get
a refcounted memcg pointer from the css pointer:
  - bpf_get_mem_cgroup,
  - bpf_put_mem_cgroup.

bpf_get_mem_cgroup() can take both memcg's css and the corresponding
cgroup's "self" css. It allows it to be used with the existing cgroup
iterator which iterates over cgroup tree, not memcg tree.

Signed-off-by: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Link: https://lore.kernel.org/r/20251223044156.208250-3-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
