linux.git/kernel/bpf/stackmap.c, branch v7.2-rc1

bpf: Fix build_id caching in stack_map_get_build_id_offset()

2026-06-22T00:58:14+00:00

This patch is a follow up to recent implementation of
stack_map_get_build_id_offset_sleepable() [1].

stack_map_get_build_id_offset() and its sleepable variant each cached
only the last successfully resolved VMA, with separate bookkeeping in
each function. A run of IPs in a VMA with no usable build ID will
repeat the lookup for every frame: find_vma() in the non-sleepable
path, a VMA lock and a blocking build_id_parse_file() in the sleepable.

Factor the per-call cache into a shared struct stack_map_build_id_cache
with two independent slots [2][3], used by both functions:

  * resolved   - last VMA that produced a build ID (file, build_id and
                 range), reused to skip the lookup and the parse;
  * unresolved - last VMA with no usable build ID (range only), reused to
                 emit a raw IP without another lookup or parse.

Keeping the slots independent means a build-ID-less VMA no longer evicts
the last resolved build ID, so a trace alternating between a binary and a
region without one stops re-resolving the binary on every return.

The shared lookup tests [vm_start, vm_end), matching the sleepable path;
the non-sleepable path previously reused a build ID for ip == vm_end
(range_in_vma() is inclusive) and now re-resolves it correctly.

[1] https://lore.kernel.org/bpf/20260525223948.1920986-1-ihor.solodrai@linux.dev/
[2] https://lore.kernel.org/bpf/CAEf4Bza2fRDGhLQoPE-EzM7F34xaEJfi5Exmxb-iWVUN3F06=g@mail.gmail.com/
[3] https://lore.kernel.org/bpf/CAEf4BzZXJFr=1iiVx937ht=4PYQkQHg=eFk810zhMDzXQG3ihw@mail.gmail.com/

Suggested-by: Andrii Nakryiko 
Signed-off-by: Ihor Solodrai 
Link: https://lore.kernel.org/r/20260615195536.1065107-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov

bpf: Cache build IDs in sleepable stackmap path

2026-05-28T21:58:13+00:00

Stack traces often contain adjacent IPs from the same VMA or from
different VMAs backed by the same ELF file. Cache the last successfully
parsed build id together with the resolved VMA range and backing file
so the sleepable build id path can avoid repeated VMA locking and file
parsing in common cases.

Suggested-by: Mykyta Yatsenko 
Signed-off-by: Ihor Solodrai 
Signed-off-by: Andrii Nakryiko 
Acked-by: Mykyta Yatsenko 
Acked-by: Andrii Nakryiko 
Link: https://lore.kernel.org/bpf/20260525223948.1920986-4-ihor.solodrai@linux.dev

bpf: Avoid faultable build ID reads under mm locks

2026-05-28T21:58:13+00:00

Sleepable build ID parsing can block in __kernel_read() [1], so the
stackmap sleepable path must not call it while holding mmap_lock or a
per-VMA read lock.

The issue and the fix are conceptually similar to a recent procfs
patch [2]. A similar VMA locking pattern has already been used in
PROCMAP_QUERY [3].

Resolve each covered VMA with a stable read-side reference, preferring
lock_vma_under_rcu() and falling back to mmap_read_trylock() only long
enough to acquire the VMA read lock. Take a reference to the backing
file, drop the VMA lock, and then parse the build ID through
(sleepable) build_id_parse_file().

We have to use mmap_read_trylock() (and give up on failure) in this
context because taking mmap_read_lock() is generally unsafe on code
paths reachable from BPF programs [4], and may lead to deadlocks.

[1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/
[2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/
[3] https://lore.kernel.org/all/20250808152850.2580887-1-surenb@google.com/
[4] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/

Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers")
Suggested-by: Puranjay Mohan 
Signed-off-by: Ihor Solodrai 
Signed-off-by: Andrii Nakryiko 
Link: https://lore.kernel.org/bpf/20260525223948.1920986-3-ihor.solodrai@linux.dev

bpf: Factor out stack_map build ID helpers

2026-05-28T21:58:13+00:00

Factor out helpers from stack_map_get_build_id_offset() in
preparation for adding a sleepable build ID resolution path:
stack_map_build_id_set_ip(), stack_map_build_id_offset(), and
stack_map_build_id_set_valid().

While here, refactor stack_map_get_build_id_offset():
  * use continue-driven control flow in the main loop and remove
    build_id_valid label
  * update prev_vma and prev_build_id on the fall-back-to-IP branch so
    the cache reflects the actual VMA seen on the previous IP [1]
  * guard fetch_build_id() with vma_is_anonymous() [2] to skip parse
    attempts that would otherwise fail the ELF magic check

[1] https://lore.kernel.org/bpf/CAEf4Bzac9uWWqBvzH0iFzKvJcq3vxscZ3pKm0sUHmN-F-z9wVQ@mail.gmail.com/
[2] https://lore.kernel.org/bpf/226398c1ff3f2b686c0aeb010408d85fb15df13f9ff60a045bee31e79b9e41e9@mail.kernel.org/

Signed-off-by: Ihor Solodrai 
Signed-off-by: Andrii Nakryiko 
Acked-by: Mykyta Yatsenko 
Link: https://lore.kernel.org/bpf/20260525223948.1920986-2-ihor.solodrai@linux.dev

Merge tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

2025-12-04T00:54:54+00:00

Pull bpf updates from Alexei Starovoitov:

 - Convert selftests/bpf/test_tc_edt and test_tc_tunnel from .sh to
   test_progs runner (Alexis Lothoré)

 - Convert selftests/bpf/test_xsk to test_progs runner (Bastien
   Curutchet)

 - Replace bpf memory allocator with kmalloc_nolock() in
   bpf_local_storage (Amery Hung), and in bpf streams and range tree
   (Puranjay Mohan)

 - Introduce support for indirect jumps in BPF verifier and x86 JIT
   (Anton Protopopov) and arm64 JIT (Puranjay Mohan)

 - Remove runqslower bpf tool (Hoyeon Lee)

 - Fix corner cases in the verifier to close several syzbot reports
   (Eduard Zingerman, KaFai Wan)

 - Several improvements in deadlock detection in rqspinlock (Kumar
   Kartikeya Dwivedi)

 - Implement "jmp" mode for BPF trampoline and corresponding
   DYNAMIC_FTRACE_WITH_JMP. It improves "fexit" program type performance
   from 80 M/s to 136 M/s. With Steven's Ack. (Menglong Dong)

 - Add ability to test non-linear skbs in BPF_PROG_TEST_RUN (Paul
   Chaignon)

 - Do not let BPF_PROG_TEST_RUN emit invalid GSO types to stack (Daniel
   Borkmann)

 - Generalize buildid reader into bpf_dynptr (Mykyta Yatsenko)

 - Optimize bpf_map_update_elem() for map-in-map types (Ritesh
   Oedayrajsingh Varma)

 - Introduce overwrite mode for BPF ring buffer (Xu Kuohai)

* tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (169 commits)
  bpf: optimize bpf_map_update_elem() for map-in-map types
  bpf: make kprobe_multi_link_prog_run always_inline
  selftests/bpf: do not hardcode target rate in test_tc_edt BPF program
  selftests/bpf: remove test_tc_edt.sh
  selftests/bpf: integrate test_tc_edt into test_progs
  selftests/bpf: rename test_tc_edt.bpf.c section to expose program type
  selftests/bpf: Add success stats to rqspinlock stress test
  rqspinlock: Precede non-head waiter queueing with AA check
  rqspinlock: Disable spinning for trylock fallback
  rqspinlock: Use trylock fallback when per-CPU rqnode is busy
  rqspinlock: Perform AA checks immediately
  rqspinlock: Enclose lock/unlock within lock entry acquisitions
  bpf: Remove runqslower tool
  selftests/bpf: Remove usage of lsm/file_alloc_security in selftest
  bpf: Disable file_alloc_security hook
  bpf: check for insn arrays in check_ptr_alignment
  bpf: force BPF_F_RDONLY_PROG on insn array creation
  bpf: Fix exclusive map memory leak
  selftests/bpf: Make CS length configurable for rqspinlock stress test
  selftests/bpf: Add lock wait time stats to rqspinlock stress test
  ...

perf: Support deferred user unwind

2025-10-29T09:29:58+00:00

Add support for deferred userspace unwind to perf.

Where perf currently relies on in-place stack unwinding; from NMI
context and all that. This moves the userspace part of the unwind to
right before the return-to-userspace.

This has two distinct benefits, the biggest is that it moves the
unwind to a faultable context. It becomes possible to fault in debug
info (.eh_frame, SFrame etc.) that might not otherwise be readily
available. And secondly, it de-duplicates the user callchain where
multiple samples happen during the same kernel entry.

To facilitate this the perf interface is extended with a new record
type:

  PERF_RECORD_CALLCHAIN_DEFERRED

and two new attribute flags:

  perf_event_attr::defer_callchain - to request the user unwind be deferred
  perf_event_attr::defer_output    - to request PERF_RECORD_CALLCHAIN_DEFERRED records

The existing PERF_RECORD_SAMPLE callchain section gets a new
context type:

  PERF_CONTEXT_USER_DEFERRED

After which will come a single entry, denoting the 'cookie' of the
deferred callchain that should be attached here, matching the 'cookie'
field of the above mentioned PERF_RECORD_CALLCHAIN_DEFERRED.

The 'defer_callchain' flag is expected on all events with
PERF_SAMPLE_CALLCHAIN. The 'defer_output' flag is expect on the event
responsible for collecting side-band events (like mmap, comm etc.).
Setting 'defer_output' on multiple events will get you duplicated
PERF_RECORD_CALLCHAIN_DEFERRED records.

Based on earlier patches by Josh and Steven.

Signed-off-by: Peter Zijlstra (Intel) 
Link: https://patch.msgid.link/20251023150002.GR4067720@noisy.programming.kicks-ass.net

bpf: Fix stackmap overflow check in __bpf_get_stackid()

2025-10-28T16:20:27+00:00

Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
 contains more stack entries than the stack map bucket can hold,
 leading to an out-of-bounds write in the bucket's data array.

Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0")
Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
Signed-off-by: Arnaud Lecomte 
Signed-off-by: Andrii Nakryiko 
Acked-by: Yonghong Song 
Acked-by: Song Liu 
Link: https://lore.kernel.org/bpf/20251025192941.1500-1-contact@arnaud-lcm.com

Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b

bpf: Refactor stack map trace depth calculation into helper function

2025-10-28T16:20:27+00:00

Extract the duplicated maximum allowed depth computation for stack
traces stored in BPF stacks from bpf_get_stackid() and __bpf_get_stack()
into a dedicated stack_map_calculate_max_depth() helper function.

This unifies the logic for:
- The max depth computation
- Enforcing the sysctl_perf_event_max_stack limit

No functional changes for existing code paths.

Signed-off-by: Arnaud Lecomte 
Signed-off-by: Andrii Nakryiko 
Acked-by: Yonghong Song 
Acked-by: Song Liu 
Link: https://lore.kernel.org/bpf/20251025192858.31424-1-contact@arnaud-lcm.com

Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

2025-10-01T00:58:11+00:00

Pull bpf updates from Alexei Starovoitov:

 - Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc
   (Amery Hung)

   Applied as a stable branch in bpf-next and net-next trees.

 - Support reading skb metadata via bpf_dynptr (Jakub Sitnicki)

   Also a stable branch in bpf-next and net-next trees.

 - Enforce expected_attach_type for tailcall compatibility (Daniel
   Borkmann)

 - Replace path-sensitive with path-insensitive live stack analysis in
   the verifier (Eduard Zingerman)

   This is a significant change in the verification logic. More details,
   motivation, long term plans are in the cover letter/merge commit.

 - Support signed BPF programs (KP Singh)

   This is another major feature that took years to materialize.

   Algorithm details are in the cover letter/marge commit

 - Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich)

 - Add support for may_goto instruction to arm64 JIT (Puranjay Mohan)

 - Fix USDT SIB argument handling in libbpf (Jiawei Zhao)

 - Allow uprobe-bpf program to change context registers (Jiri Olsa)

 - Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and
   Puranjay Mohan)

 - Allow access to union arguments in tracing programs (Leon Hwang)

 - Optimize rcu_read_lock() + migrate_disable() combination where it's
   used in BPF subsystem (Menglong Dong)

 - Introduce bpf_task_work_schedule*() kfuncs to schedule deferred
   execution of BPF callback in the context of a specific task using the
   kernel’s task_work infrastructure (Mykyta Yatsenko)

 - Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya
   Dwivedi)

 - Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi)

 - Improve the precision of tnum multiplier verifier operation
   (Nandakumar Edamana)

 - Use tnums to improve is_branch_taken() logic (Paul Chaignon)

 - Add support for atomic operations in arena in riscv JIT (Pu Lehui)

 - Report arena faults to BPF error stream (Puranjay Mohan)

 - Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin
   Monnet)

 - Add bpf_strcasecmp() kfunc (Rong Tao)

 - Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao
   Chen)

* tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits)
  libbpf: Replace AF_ALG with open coded SHA-256
  selftests/bpf: Add stress test for rqspinlock in NMI
  selftests/bpf: Add test case for different expected_attach_type
  bpf: Enforce expected_attach_type for tailcall compatibility
  bpftool: Remove duplicate string.h header
  bpf: Remove duplicate crypto/sha2.h header
  libbpf: Fix error when st-prefix_ops and ops from differ btf
  selftests/bpf: Test changing packet data from kfunc
  selftests/bpf: Add stacktrace map lookup_and_delete_elem test case
  selftests/bpf: Refactor stacktrace_map case with skeleton
  bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE
  selftests/bpf: Fix flaky bpf_cookie selftest
  selftests/bpf: Test changing packet data from global functions with a kfunc
  bpf: Emit struct bpf_xdp_sock type in vmlinux BTF
  selftests/bpf: Task_work selftest cleanup fixes
  MAINTAINERS: Delete inactive maintainers from AF_XDP
  bpf: Mark kfuncs as __noclone
  selftests/bpf: Add kprobe multi write ctx attach test
  selftests/bpf: Add kprobe write ctx attach test
  selftests/bpf: Add uprobe context ip register change test
  ...

bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE

2025-09-25T23:12:14+00:00

The stacktrace map can be easily full, which will lead to failure in
obtaining the stack. In addition to increasing the size of the map,
another solution is to delete the stack_id after looking it up from
the user, so extend the existing bpf_map_lookup_and_delete_elem()
functionality to stacktrace map types.

Signed-off-by: Tao Chen 
Signed-off-by: Andrii Nakryiko 
Link: https://lore.kernel.org/bpf/20250925175030.1615837-1-chen.dylane@linux.dev