linux.git/kernel/bpf, branch master

Merge tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

2026-06-17T08:18:14+00:00

Pull bpf updates from Alexei Starovoitov:
 "Major changes:

   - Recover from BPF arena page faults using a scratch page and add
     ptep_try_set() for lockless empty-slot installs on x86 and arm64.

     This allows BPF kfuncs to access arena pointers directly.

     The 'arena_direct_access' stable branch was created for this work
     and was pulled into sched-ext and bpf-next trees (Tejun Heo, Kumar
     Kartikeya Dwivedi)

   - Lift old restriction and support 6+ arguments in BPF programs and
     kfuncs on x86 and arm64 (Yonghong Song, Puranjay Mohan)

  Other features and fixes:

   - Add 24-bit BTF vlen and reclaim unused bits in the BTF UAPI to ease
     addition of new BTF kinds (Alan Maguire)

   - Raise the maximum BPF call chain depth from 8 to 16 frames (Alexei
     Starovoitov)

   - Refactor object relationship tracking in the verifier and fix a
     dynptr use-after-free bug (Amery Hung)

   - Harden the signed program loader and reject exclusive maps as inner
     maps (Daniel Borkmann)

   - Replace the verifier min/max bounds fields with a circular number
     (cnum) representation and improve 32->64 bit range refinements
     (Eduard Zingerman)

   - Introduce the arena library and runtime (libarena) with a buddy
     allocator, rbtree and SPMC queue data structures, ASAN support and
     a parallel test harness. Allow subprograms to return arena pointers
     and switch to a BTF type-tag based __arena annotation (Emil
     Tsalapatis)

   - Cache build IDs in the sleepable stackmap path and avoid faultable
     build ID reads under mm locks (Ihor Solodrai)

   - Introduce the tracing_multi link to attach a single BPF program to
     many kernel functions at once. Allow specifying the uprobe_multi
     target via FD (Jiri Olsa)

   - Extend the bpf_list family of kfuncs with bpf_list_add/del(), and
     bpf_list_is_first/is_last/empty() (Kaitao Cheng)

   - Extend the BPF syscall with common attributes support for
     prog_load, btf_load and map_create (Leon Hwang)

   - Wrap rhashtable as BPF map (Mykyta Yatsenko, Herbert Xu)

   - Add sleepable support for tracepoint programs and fix deadlocks in
     LRU map due to NMI reentry (Mykyta Yatsenko)

   - Fix OOB access in bpf_flow_keys, fix nullness analysis of inner
     arrays, enforce write checks for global subprograms (Nuoqi Gui)

   - Report the maximum combined stack depth and print a breakdown of
     instructions processed per subprogram (Paul Chaignon)

   - Add an XDP load-balancer benchmark and arm64 JIT support for stack
     arguments (Puranjay Mohan)

   - Add kfuncs to traverse over wakeup_sources (Samuel Wu)

   - Allow sleepable BPF programs to use LPM trie maps directly (Vlad
     Poenaru)

   - Many more fixes and cleanups across the verifier, BTF, sockmap,
     devmap, bpffs, security hooks, s390/riscv/loongarch JITs,
     rqspinlock, libbpf, bpftool, selftests"

* tag 'bpf-next-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (336 commits)
  selftests/bpf: Work around llvm stack overflow in crypto progs
  selftests/bpf: add test for bpf_msg_pop_data() overflow
  bpf, sockmap: fix integer overflow in bpf_msg_pop_data() bounds check
  sockmap: Fix use-after-free in udp_bpf_recvmsg()
  bpf, sockmap: keep sk_msg copy state in sync
  bpf, sockmap: Fix wrong rsge offset in bpf_msg_push_data()
  bpf, sockmap: reject overflowing copy + len in bpf_msg_push_data()
  selftsets/bpf: Retry map update on helper_fill_hashmap()
  selftests/bpf: Add test for sleepable lsm_cgroup rejection
  selftests/bpf: Add test to verify the fix for bpf_setsockopt() helper
  bpf: Fix bpf_get/setsockopt to tos for ipv4-mapped ipv6 socket
  selftests/bpf: Avoid static LLVM linking for cross builds
  selftests/bpf: Use common CFLAGS for urandom_read
  selftests/bpf: Initialize operation name before use
  tools/bpf: build: Append extra cflags
  libbpf: Initialize CFLAGS before including Makefile.include
  bpftool: Append extra host flags
  bpftool: Avoid adding EXTRA_CFLAGS to HOST_CFLAGS
  bpftool: Pass host flags to bootstrap libbpf
  selftests/bpf: correct CONFIG_PPC64 macro name in comment
  ...

bpf: Add support to specify uprobe_multi target via file descriptor

2026-06-15T00:24:25+00:00

Allow uprobe_multi link to identify the target binary by an already
opened file descriptor.

Adding new BPF_F_UPROBE_MULTI_PATH_FD flag and the path_fd field for
the attr.link_create.uprobe_multi struct.

When the flag is set, we resolve the target from path_fd, without the
flag, we keep the existing string path behavior.

I don't see a use case for supporting O_PATH file descriptors, because
we need to read the binary first to get probes offsets, so I'm using
the CLASS(fd, f), which fails for O_PATH fds.

Assisted-by: Codex:GPT-5.4
Signed-off-by: Jiri Olsa 
Link: https://lore.kernel.org/r/20260611114230.950379-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov

Merge tag 'vfs-7.2-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

2026-06-14T22:24:54+00:00

Pull simple_xattr updates from Christian Brauner:
 "This reworks the simple xattr api to make it more efficient and easier
  to use for all consumers.

  The simple_xattr hash table moves from the inode into a per-superblock
  cache, removing the per-inode overhead for the common case of few or
  no xattrs. The interface now passes struct simple_xattrs ** so lazy
  allocation is handled internally instead of by every caller, kernfs
  xattr operations on kernfs nodes shared between multiple superblocks
  are properly serialized, and tmpfs constructs "security.foo" xattr
  names with kasprintf() instead of kmalloc() plus two memcpy()s.

  A follow-up fix links kernfs nodes to their parent before the LSM init
  hook runs: with the per-sb cache kernfs_xattr_set() computes the cache
  via kernfs_root(kn), which faulted on a freshly allocated node when
  selinux_kernfs_init_security() called into it - reproducible as a NULL
  pointer dereference on the first cgroup mkdir on SELinux-enabled
  systems.

  On top of this bpffs gains support for trusted.* and security.* xattrs
  so that user space and BPF LSM programs can attach metadata - for
  example a content hash or a security label - to pinned objects and
  directories and inspect it uniformly like on other filesystems. The
  store is in-memory and non-persistent, living only for the lifetime of
  the mount like everything else in bpffs"

* tag 'vfs-7.2-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  bpf: Add simple xattr support to bpffs
  kernfs: link kn to its parent before the LSM init hook
  simpe_xattr: use per-sb cache
  simple_xattr: change interface to pass struct simple_xattrs **
  tmpfs: simplify constructing "security.foo" xattr names
  kernfs: fix xattr race condition with multiple superblocks

bpf: Raise maximum call chain depth to 16 frames

2026-06-14T20:47:38+00:00

Bump MAX_CALL_FRAMES from 8 to 16 to allow deeper call chains
that Rust-BPF requires and update selftests.

Link: https://lore.kernel.org/r/20260613180755.29671-1-alexei.starovoitov@gmail.com
Signed-off-by: Alexei Starovoitov

bpf: Fix setting retval to -EPERM for cgroup hooks not returning errno

2026-06-13T03:33:16+00:00

When a cgroup BPF program exits with 0, bpf_prog_run_array_cg() sets
the hook return value to -EPERM if it is not a valid errno. This is
correct for errno-based hooks, which return 0 on success and negative
errno on failure, but wrong for boolean and void LSM hooks. Boolean
LSM hooks should only return true or false, and void LSM hooks have
no return value at all.

Fix it by skipping setting -EPERM for hooks not returning errno.

Fixes: 69fd337a975c ("bpf: per-cgroup lsm flavor")
Signed-off-by: Xu Kuohai 
Link: https://lore.kernel.org/r/20260610201724.733943-2-xukuohai@huaweicloud.com
Signed-off-by: Alexei Starovoitov

bpf: Run generic devmap egress prog on private skb

2026-06-13T01:21:01+00:00

Generic XDP devmap multi redirect uses skb_clone() for intermediate
destinations and sends the last destination with the original skb. This
can leave multiple destinations sharing the same packet data.

This becomes visible after generic devmap egress-program support was
added: a devmap egress program may mutate packet data, and another
destination sharing the same data can observe that mutation.

Native XDP broadcast redirect does not have this issue because
xdpf_clone() copies the frame data for each destination. Generic XDP
should provide the same per-destination isolation before running a
devmap egress program.

Fix this by making cloned skbs private before running the generic devmap
egress program. Use skb_copy() instead of skb_unshare() so allocation
failure does not consume the skb and the existing caller error paths keep
their ownership semantics.

Fixes: 2ea5eabaf04a ("bpf: devmap: Implement devmap prog execution for generic XDP")
Suggested-by: Jiayuan Chen 
Suggested-by: Jakub Kicinski 
Reviewed-by: Toke Høiland-Jørgensen 
Signed-off-by: Sun Jian 
Link: https://lore.kernel.org/r/20260612114032.244616-2-sun.jian.kdev@gmail.com
Signed-off-by: Alexei Starovoitov

bpf: Tighten cgroup storage cookie checks for prog arrays

2026-06-10T23:16:46+00:00

The fix in commit abad3d0bad72 ("bpf: Fix oob access in cgroup local
storage") is still incomplete. The prog-array compatibility check
treats a program with no cgroup storage as compatible with any stored
storage cookie. This allows a storage-less program to bridge a tail
call chain between an entry program and a storage-using callee even
though cgroup local storage at runtime still follows the caller's
context, that is, A -> B(no storage) -> C(storage) path.

Requiring exact cookie equality would break the legitimate case of a
storage-less leaf program being tail called from a storage-using one.
Instead, only accept a zero storage cookie if the program cannot
perform tail calls itself. This keeps A -> B(no storage) working
while rejecting the A -> B(no storage) -> C(storage) bridge.

Fixes: abad3d0bad72 ("bpf: Fix oob access in cgroup local storage")
Reported-by: Lin Ma 
Signed-off-by: Daniel Borkmann 
Acked-by: Yonghong Song 
Link: https://lore.kernel.org/r/20260610105539.705887-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov

bpf: Cancel special fields on map value recycle

2026-06-10T04:23:11+00:00

Map update and delete paths currently call bpf_obj_free_fields() when a
value is being replaced or recycled. That makes field destruction depend
on the context of the update/delete operation. For tracing programs this
can include NMI context, where referenced kptr destructors, uptr
unpinning, and graph root destruction are not generally safe.

Introduce bpf_obj_cancel_fields() for the reusable-value path. It only
performs NMI-safe cleanup for timer, workqueue, and task_work fields.
Fields that need full destruction are left attached to the recycled value
and are destroyed by the final cleanup path instead.

Switch array and hashtab update/delete/recycle paths to this cancel
helper. Keep bpf_obj_free_fields() for final map destruction and for
bpf_mem_alloc destructors. Preallocated hashtabs do not have allocator
destructors, so teardown continues to walk the normal and extra elements
and fully destroy their fields.

This deliberately relaxes the eager-free semantics of map update/delete
for special fields. Programs that relied on a recycled map slot becoming
empty immediately after update/delete were relying on behavior that
cannot be implemented safely from every BPF execution context without
offloading arbitrary destructors.

There is a chance this change breaks programs making assumptions
regarding the eager freeing of fields. If so, we can relax semantics to
cancellation only when irqs_disabled() is true in the future. However,
theoretically, map values that get reused eagerly already have weaker
guarantees as parallel users can recreate freed fields before the new
element becomes visible again.

Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr")
Signed-off-by: Justin Suess 
Co-developed-by: Kumar Kartikeya Dwivedi 
Signed-off-by: Kumar Kartikeya Dwivedi 
Link: https://lore.kernel.org/r/20260609202548.3571690-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov

bpf: Reject bpf_obj_drop() from tracing progs

2026-06-10T04:23:11+00:00

bpf_obj_drop() runs bpf_obj_free_fields() synchronously for
program-allocated objects. When such an object contains NMI unsafe
fields, tracing programs that can run from arbitrary instrumented
context can reach that destruction from unsafe contexts, including NMI.

NMI is likely one instance of this problem, and other instances would
include possible unsafe reentrancy. Deferring bpf_obj_drop() is not
appealing either: it would add delayed-free machinery to a release
operation that otherwise has straightforward synchronous ownership
semantics.

Reject bpf_obj_drop() and bpf_percpu_obj_drop() from tracing programs
that may run from unsafe contexts unless every field in the object's BTF
record is explicitly NMI safe. Do not reject sleepable
BPF_PROG_TYPE_TRACING programs, since they are not the arbitrary/NMI
contexts that motivate the restriction.

Note that while bpf_rb_root and bpf_list_head would be NMI safe on their
own to free, the objects recursively held by them may not be; be
conservative and just mark them as not NMI safe for now.

Use a whitelist for the NMI-safe field set instead of listing only known
NMI unsafe fields. Locks, async fields, unreferenced kptrs, and
refcounts are known to be NMI safe because their destruction is either a
no-op, simple state reset, or async cancellation. Referenced kptrs,
percpu referenced kptrs, uptrs, graph roots, graph nodes, and any future
field type are rejected until audited for arbitrary tracing and NMI
contexts. This is less susceptible to future changes in fields that were
previously safe by exclusion, and to new fields being added without
updating this check.

Convert the existing recursive local-object drop success case to a
syscall program in the same commit, since this verifier change makes the
old tracing program form invalid. The test still exercises
bpf_obj_drop() releasing a referenced task kptr from a safe program
type.

Fixes: ac9f06050a35 ("bpf: Introduce bpf_obj_drop")
Signed-off-by: Justin Suess 
Co-developed-by: Kumar Kartikeya Dwivedi 
Signed-off-by: Kumar Kartikeya Dwivedi 
Link: https://lore.kernel.org/r/20260609202548.3571690-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov

bpf: Allow sleepable programs to use LPM trie maps directly

2026-06-09T19:42:38+00:00

The previous change relaxed the rcu_dereference annotations in
lpm_trie.c so the trie walks no longer trip lockdep when reached from a
sleepable BPF program holding only rcu_read_lock_trace().  By itself
that only helps tries reached as the inner map of a map-of-maps, or
from the classic-RCU syscall path: a sleepable program that references
an LPM trie directly is still rejected at load time by
check_map_prog_compatibility(), whose sleepable whitelist omits
BPF_MAP_TYPE_LPM_TRIE:

  Sleepable programs can only use array, hash, ringbuf and local storage maps

LPM trie nodes are allocated from a bpf_mem_alloc (trie->ma) and freed
with bpf_mem_cache_free_rcu(), which chains a regular RCU grace period
into a Tasks Trace grace period before the node -- and the value
embedded in it that trie_lookup_elem() returns to the program -- is
released.  That is the same reclaim discipline BPF_MAP_TYPE_HASH relies
on for sleepable access, so a value handed to a sleepable reader cannot
be freed while the program is still running under rcu_read_lock_trace().
The writer paths take trie->lock across the walk and never relied on the
RCU read-side lock to keep nodes alive.

Add BPF_MAP_TYPE_LPM_TRIE to the sleepable map whitelist so these
programs can use LPM tries directly.

Signed-off-by: Vlad Poenaru 
Reviewed-by: Emil Tsalapatis 
Link: https://lore.kernel.org/r/20260609135558.193287-3-vlad.wing@gmail.com
Signed-off-by: Alexei Starovoitov