diff options
| author | Amery Hung <ameryhung@gmail.com> | 2026-04-10 18:54:17 -0700 |
|---|---|---|
| committer | Alexei Starovoitov <ast@kernel.org> | 2026-04-10 21:22:32 -0700 |
| commit | 5063e775889948c0475ccdf21c74a6191b7b6482 (patch) | |
| tree | 13d391ce9a74e8b003873a13511b700810c122c5 /rust/kernel/ptr/git@git.tavy.me:linux.git | |
| parent | 78ee02a966ad76966be516ed3d56860d7a58fe7e (diff) | |
bpf: Use kmalloc_nolock() universally in local storage
Switch to kmalloc_nolock() universally in local storage. Socket local
storage didn't move to kmalloc_nolock() when BPF memory allocator was
replaced by it for performance reasons. Now that kfree_rcu() supports
freeing memory allocated by kmalloc_nolock(), we can move the remaining
local storages to use kmalloc_nolock() and cleanup the cluttered free
paths.
Use kfree() instead of kfree_nolock() in bpf_selem_free_trace_rcu() and
bpf_local_storage_free_trace_rcu(). Both callbacks run in process context
where spinning is allowed, so kfree_nolock() is unnecessary.
Benchmark:
./bench -p 1 local-storage-create --storage-type socket \
--batch-size {16,32,64}
The benchmark is a microbenchmark stress-testing how fast local storage
can be created. There is no measurable throughput change for socket local
storage after switching from kzalloc() to kmalloc_nolock().
Socket local storage
batch creation speed diff
--------------- ---- ------------------ ----
Baseline 16 433.9 ± 0.6 k/s
32 434.3 ± 1.4 k/s
64 434.2 ± 0.7 k/s
After 16 439.0 ± 1.9 k/s +1.2%
32 437.3 ± 2.0 k/s +0.7%
64 435.8 ± 2.5k/s +0.4%
Also worth noting that the baseline got a 5% throughput boost when sheaf
replaces percpu partial slab recently [0].
[0] https://lore.kernel.org/bpf/20260123-sheaves-for-all-v4-0-041323d506f7@suse.cz/
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260411015419.114016-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Diffstat (limited to 'rust/kernel/ptr/git@git.tavy.me:linux.git')
0 files changed, 0 insertions, 0 deletions
