linux.git/kernel/bpf/fixups.c, branch v7.2-rc1

bpf: Add support for tracing_multi link session

2026-06-07T17:03:01+00:00

Add support for session attachment with the tracing_multi link.

Add a new BPF_TRACE_FSESSION_MULTI program attach type that follows
the BPF_TRACE_FSESSION behaviour but on the tracing_multi link.

Such a program is called on entry to and exit from the attached
function and can pass a cookie value from the entry invocation to the
exit invocation.
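
The entry/exit pairing described above can be pictured with a small
user-space model. This is only a sketch: the struct, the function names
and the is_return flag are hypothetical and are not the kernel or libbpf
interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-call session state; not a kernel structure. */
struct toy_session {
    uint64_t cookie;      /* written on entry, read back on exit */
    uint64_t exit_seen;   /* cookie value observed by the exit call */
};

/* One handler serves both entry and exit, like a session program. */
static void toy_session_prog(struct toy_session *s, bool is_return,
                             uint64_t value)
{
    if (!is_return) {
        s->cookie = value;        /* entry: stash a value in the cookie */
        return;
    }
    s->exit_seen = s->cookie;     /* exit: the same cookie is visible */
}

/* Simulates one traced call: entry, the function body, then exit. */
static uint64_t toy_trace_call(uint64_t entry_value)
{
    struct toy_session s = { 0, 0 };

    toy_session_prog(&s, false, entry_value);
    /* ... the traced function would run here ... */
    toy_session_prog(&s, true, 0);
    return s.exit_seen;
}
```

A typical use of such a cookie is to record a timestamp on entry and
compute the elapsed time on exit.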

Signed-off-by: Jiri Olsa 
Link: https://lore.kernel.org/r/20260606123955.345967-16-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov

bpf: Add multi tracing attach types

2026-06-07T17:03:01+00:00

Add new program attach types for multi tracing attachment:
  BPF_TRACE_FENTRY_MULTI
  BPF_TRACE_FEXIT_MULTI

and their base support in verifier code.

Programs with these attach types will use a specific link attachment
interface that arrives in the following changes.

This was suggested by Andrii some (long) time ago and turned out
to be easier than having a special program flag for that.

BPF programs with these attach types have the 'bpf_multi_func' function
set as their attach_btf_id and keep a module reference when it's
specified by attach_prog_fd.

They are also accepted as sleepable programs during verification,
and the real validation for specific BTF_IDs/functions will happen
during the multi link attachment in the following changes.

Suggested-by: Andrii Nakryiko 
Signed-off-by: Jiri Olsa 
Link: https://lore.kernel.org/r/20260606123955.345967-11-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov

bpf: Refactor object relationship tracking and fix dynptr UAF bug

2026-06-02T01:31:41+00:00

Refactor object relationship tracking in the verifier and fix a dynptr
use-after-free bug where file/skb dynptrs are not invalidated when the
parent referenced object is freed.

Add parent_id to bpf_reg_state to precisely track child-parent
relationships. A child object's parent_id points to the parent object's
id. This replaces the PTR_TO_MEM-specific dynptr_id.

Remove ref_obj_id from bpf_reg_state by folding its role into the
existing id field. Previously, id tracked pointer identity for null
checking while ref_obj_id tracked the owning reference for lifetime
management. These are now unified: acquire helpers and kfuncs set id
to the acquired reference id, and release paths use id directly.

Add reg_is_referenced() which checks if a register is referenced by
looking up its id in the reference array. This replaces all former
ref_obj_id checks.
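
As a rough illustration of the lookup, a minimal sketch with simplified
stand-in types follows. The toy_* names and the two-field structs are
assumptions made for this example; the real bpf_reg_state and
bpf_reference_state carry many more fields.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for verifier state (assumed for this sketch). */
struct toy_ref { uint32_t id; uint32_t parent_id; };
struct toy_reg { uint32_t id; uint32_t parent_id; };

/* A register is referenced if its id matches an acquired reference. */
static bool toy_reg_is_referenced(const struct toy_ref *refs, size_t nr_refs,
                                  const struct toy_reg *reg)
{
    if (!reg->id)
        return false;             /* id 0 means "no identity" here */
    for (size_t i = 0; i < nr_refs; i++)
        if (refs[i].id == reg->id)
            return true;
    return false;
}

/* Demo: references with ids 3 and 7 are live. */
static bool toy_demo_is_referenced(uint32_t reg_id)
{
    const struct toy_ref refs[] = { { 3, 0 }, { 7, 3 } };
    const struct toy_reg reg = { reg_id, 0 };

    return toy_reg_is_referenced(refs, 2, &reg);
}
```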

For release_reference(), invalidating an object now also invalidates
all descendants by traversing the object tree. This is done using
stack-based DFS to avoid recursive call chains of release_reference() ->
unmark_stack_slots_dynptr() -> release_reference(). Referenced objects
encountered during tree traversal are reported as leaked references.
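
The traversal can be sketched with an explicit stack as below. This is
a simplified model, not the verifier code: toy_obj, the fixed-size stack
and the return value are assumptions, and the leaked-reference reporting
is left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_OBJS 8

struct toy_obj {
    uint32_t id;          /* identity of this object */
    uint32_t parent_id;   /* id of the parent object, 0 if none */
    bool valid;
};

/* Invalidate the object with root_id and all of its descendants using
 * an explicit stack instead of recursion. Returns the number of objects
 * invalidated, or -1 if the table is too large for this toy stack. */
static int toy_release_tree(struct toy_obj *objs, size_t n, uint32_t root_id)
{
    uint32_t stack[TOY_MAX_OBJS];
    size_t top = 0;
    int invalidated = 0;

    if (n > TOY_MAX_OBJS)
        return -1;
    stack[top++] = root_id;
    while (top) {
        uint32_t cur = stack[--top];

        for (size_t i = 0; i < n; i++) {
            if (!objs[i].valid)
                continue;
            if (objs[i].id == cur) {
                objs[i].valid = false;        /* the object itself */
                invalidated++;
            } else if (objs[i].parent_id == cur && top < TOY_MAX_OBJS) {
                stack[top++] = objs[i].id;    /* visit the child later */
            }
        }
    }
    return invalidated;
}

/* Demo: parent -> dynptr -> slice, plus one unrelated object. */
static int toy_demo_release(void)
{
    struct toy_obj objs[] = {
        { 1, 0, true },   /* referenced parent object */
        { 2, 1, true },   /* dynptr derived from the parent */
        { 3, 2, true },   /* slice derived from the dynptr */
        { 4, 0, true },   /* unrelated object, must stay valid */
    };
    int n = toy_release_tree(objs, 4, 1);

    return objs[3].valid ? n : -1;
}
```

Releasing id 1 in the demo invalidates the parent, the dynptr and the
slice, and leaves the unrelated object alone.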

Add parent_id to bpf_reference_state to enable hierarchical reference
tracking. When acquiring a reference, a parent_id can be specified to
link the new reference to an existing one (e.g., referenced dynptrs
acquire a reference with parent_id linking to the parent object's
reference).

Pointer casting:

For pointer casting helpers (bpf_sk_fullsock, bpf_tcp_sock), instead of
propagating ref_obj_id, the cast result reuses the same reference id as
the source pointer. Since the cast may return NULL for a non-NULL input,
the NULL case is explored as a separate verifier branch. This allows
releasing any of the original or cast pointers to invalidate all others.

Referenced dynptrs:

When constructing a referenced dynptr, acquire an intermediate reference
with parent_id linking to the parent referenced object. The dynptr and
all clones share the same parent_id (pointing to the intermediate ref)
but get unique ids for independent slice tracking. Releasing a
referenced dynptr releases the parent reference, which in turn
invalidates all clones and their derived slices.

Owning to non-owning reference conversion:

After converting owning to non-owning by clearing id (e.g.,
object(id=1) -> object(id=0)), the verifier releases the reference
state via release_reference_nomark().

Note that the error message "reference has not been acquired before" in
the helper and kfunc release paths is removed. This message was already
unreachable. The verifier only calls release_reference() after
confirming the reference is valid, so the condition could never trigger
in practice.

Fixes: 870c28588afa ("bpf: net_sched: Add basic bpf qdisc kfuncs")
Signed-off-by: Amery Hung 
Acked-by: Eduard Zingerman 
Link: https://lore.kernel.org/r/20260529014936.2811085-6-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.1-rc5

2026-05-25T13:33:30+00:00

Cross-merge BPF and other fixes after downstream PR.

Signed-off-by: Alexei Starovoitov

bpf,x86: Fix exception unwinding with outgoing stack arguments

2026-05-17T20:53:24+00:00

When a main program with exception_boundary has outgoing stack
arguments (e.g. from calling subprogs with >5 args), bpf_throw() fails
to correctly restore callee-saved registers, causing a kernel crash.

The x86 JIT allocates the outgoing stack arg area below the
callee-saved registers via 'sub rsp, outgoing_rsp' in the prologue.
When bpf_throw() unwinds, it captures the main program's sp (which
includes this outgoing area) and passes it to the exception callback.
The callback sets rsp and rbp, followed by pop_callee_regs, but rsp
points into the outgoing arg area rather than the callee-saved
registers, so the pops restore garbage values. Returning to the
kernel with corrupted callee-saved registers causes a crash.

Fix this by adjusting the sp (adding stack_arg_sp_adjust) passed to
the exception callback, so it points to the bottom of the callee-saved
registers instead of the outgoing arg area. When stack_arg_sp_adjust
is 0 (the common case), this is a no-op.
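
The effect of the adjustment can be shown with a toy frame layout. The
array, the word counts and the values are made up for illustration; only
the idea of adding the outgoing area size to sp comes from the text
above.

```c
#include <stdint.h>

/* Toy frame, lowest address first: outgoing arg slots, then the
 * callee-saved register values. Sizes and contents are illustrative. */
enum { TOY_OUT_WORDS = 2, TOY_SAVED_WORDS = 3 };

static const uint64_t toy_frame[TOY_OUT_WORDS + TOY_SAVED_WORDS] = {
    0xdead, 0xbeef,      /* outgoing stack arg area (garbage for pops) */
    0x11, 0x22, 0x33,    /* callee-saved register values */
};

/* Returns the first value a pop sequence would restore when it starts
 * at sp_index plus the given adjustment (both counted in words). */
static uint64_t toy_first_pop(unsigned int sp_index,
                              unsigned int sp_adjust_words)
{
    return toy_frame[sp_index + sp_adjust_words];
}
```

Without the adjustment the first pop reads from the outgoing area; with
an adjustment equal to the outgoing area size it reads the first saved
register.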

Fixes: 324c3ca6eed6 ("bpf,x86: Implement JIT support for stack arguments")
Acked-by: Kumar Kartikeya Dwivedi 
Signed-off-by: Yonghong Song 
Link: https://lore.kernel.org/r/20260517150702.288031-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov

bpf: Clean up redundant stack arg checks for non-JITed programs

2026-05-17T00:46:16+00:00

Remove a redundant stack_arg_cnt check in __bpf_prog_select_runtime()
and start the stack arg loop from index 0 in bpf_fixup_call_args().
Both changes are no-ops that simplify the code:

In __bpf_prog_select_runtime(), the subprog_info[0].stack_arg_cnt
check is unreachable:
  - when there is only a main program (no bpf-to-bpf calls),
    subprog_info[0].stack_arg_cnt is always 0 because the main
    program's arg_cnt is forced to 1
  - when bpf-to-bpf calls use stack args and JIT succeeds,
    fp->bpf_func is set and this code is skipped
  - when JIT fails, bpf_fixup_call_args() rejects the program
    before we get to __bpf_prog_select_runtime().

In bpf_fixup_call_args(), starting the loop at i=1 skipped subprog 0,
which is safe since the main program always has arg_cnt=1 and thus
bpf_in_stack_arg_cnt() returns 0. Starting at i=0 removes the need
to reason about this invariant.
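
A small model of the invariant is shown below. The helper body is an
assumption derived from the 5-register-argument limit, and the toy_*
names are hypothetical; it is not the kernel's bpf_in_stack_arg_cnt().

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_REG_ARGS 5   /* arguments passed in r1-r5 */

/* Assumed shape of the helper: args beyond the fifth live on the stack. */
static unsigned int toy_in_stack_arg_cnt(unsigned int arg_cnt)
{
    return arg_cnt > TOY_MAX_REG_ARGS ? arg_cnt - TOY_MAX_REG_ARGS : 0;
}

/* True if any subprog from index `first` onwards needs stack args. */
static bool toy_uses_stack_args(const unsigned int *arg_cnt, size_t n,
                                size_t first)
{
    for (size_t i = first; i < n; i++)
        if (toy_in_stack_arg_cnt(arg_cnt[i]))
            return true;
    return false;
}

/* Index 0 is the main program, whose arg_cnt is always 1, so starting
 * the loop at 0 or at 1 gives the same answer. */
static bool toy_demo_same_result(void)
{
    const unsigned int with_stack[] = { 1, 3, 7 };
    const unsigned int without[]    = { 1, 5, 2 };

    return toy_uses_stack_args(with_stack, 3, 0) ==
           toy_uses_stack_args(with_stack, 3, 1) &&
           toy_uses_stack_args(without, 3, 0) ==
           toy_uses_stack_args(without, 3, 1);
}
```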

Signed-off-by: Yonghong Song 
Link: https://lore.kernel.org/r/20260515225101.824054-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov

bpf: Reject stack arguments in non-JITed programs

2026-05-13T16:27:31+00:00

The interpreter does not understand the bpf register r11
(BPF_REG_PARAMS) used for stack arguments. So reject interpreter
usage if stack arguments are used either in the main program or
any subprogram.

Signed-off-by: Yonghong Song 
Link: https://lore.kernel.org/r/20260513045049.2390444-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov

bpf: Support stack arguments for bpf functions

2026-05-13T16:27:30+00:00

Currently BPF functions (subprogs) are limited to 5 register arguments.
With [1], the compiler can emit code that passes additional arguments
via a dedicated stack area through bpf register BPF_REG_PARAMS (r11),
introduced in an earlier patch ([2]).

The compiler uses positive r11 offsets for incoming (callee-side) args
and negative r11 offsets for outgoing (caller-side) args, following the
x86_64/arm64 calling convention direction. There is an 8-byte gap at
offset 0 separating two regions:
  Incoming (callee reads):   r11+8 (arg6), r11+16 (arg7), ...
  Outgoing (caller writes):  r11-8 (arg6), r11-16 (arg7), ...
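
The offset rule above can be written as two small functions. The names
are hypothetical; the arithmetic follows the layout listed above, with
argument numbers starting at 1.

```c
/* Offset from r11 at which the callee reads argument `argno`.
 * Returns 0 for arguments 1-5, which are passed in registers. */
static int toy_incoming_off(unsigned int argno)
{
    return argno > 5 ? (int)(argno - 5) * 8 : 0;
}

/* Offset from r11 at which the caller writes argument `argno`. */
static int toy_outgoing_off(unsigned int argno)
{
    return -toy_incoming_off(argno);
}
```

For example, arg6 is read at r11+8 and written at r11-8, and arg8 is
read at r11+24 and written at r11-24.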

The following is an example to show how stack arguments are saved
and transferred between caller and callee:

  int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
    ...
    bar(a1, a2, a3, a4, a5, a6, a7, a8);
    ...
  }

  Caller (foo)                           Callee (bar)
  ============                           ============
  Incoming (positive offsets):           Incoming (positive offsets):

  r11+8:  [incoming arg 6]               r11+8:  [incoming arg 6] <-+
  r11+16: [incoming arg 7]               r11+16: [incoming arg 7] <-|+
                                         r11+24: [incoming arg 8] <-||+
  Outgoing (negative offsets):                                      |||
  r11-8:  [outgoing arg 6 to bar] -------->-------------------------+||
  r11-16: [outgoing arg 7 to bar] -------->--------------------------+|
  r11-24: [outgoing arg 8 to bar] -------->---------------------------+

If the bpf function has more than one call:

  int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
    ...
    bar1(a1, a2, a3, a4, a5, a6, a7, a8);
    ...
    bar2(a1, a2, a3, a4, a5, a6, a7, a8, a9);
    ...
  }

  Caller (foo)                             Callee (bar2)
  ============                             ==============
  Incoming (positive offsets):             Incoming (positive offsets):

  r11+8:  [incoming arg 6]                 r11+8:  [incoming arg 6] <+
  r11+16: [incoming arg 7]                 r11+16: [incoming arg 7] <|+
                                           r11+24: [incoming arg 8] <||+
  Outgoing for bar2 (negative offsets):    r11+32: [incoming arg 9] <|||+
  r11-8:  [outgoing arg 6] ---->----------->-------------------------+|||
  r11-16: [outgoing arg 7] ---->----------->--------------------------+||
  r11-24: [outgoing arg 8] ---->----------->---------------------------+|
  r11-32: [outgoing arg 9] ---->----------->----------------------------+

The verifier tracks outgoing stack arguments in stack_arg_regs[] and
out_stack_arg_cnt in bpf_func_state, separately from the regular
r10 stack. The callee does not copy incoming args; it reads them
directly from the caller's outgoing slots at positive r11 offsets.
Similar to stacksafe(), introduce stack_arg_safe() to perform the
pruning check.

Outgoing stack arg slots are invalidated when the callee returns
(e.g. in prepare_func_exit), not at call time. This allows the callee to
read incoming args from the caller's outgoing slots during
verification. The following are a few examples.

Example 1:
  *(u64 *)(r11 - 8) = r6;
  *(u64 *)(r11 - 16) = r7;
  call bar1;                // arg6 = r6, arg7 = r7
  call bar2;                // expects 2 stack arguments; fails, slots invalid

Example 2:
To fix Example 1, store the arguments again before the second call:
  *(u64 *)(r11 - 8) = r6;
  *(u64 *)(r11 - 16) = r7;
  call bar1;                // arg6 = r6, arg7 = r7
  *(u64 *)(r11 - 8) = r8;
  *(u64 *)(r11 - 16) = r9;
  call bar2;                // arg6 = r8, arg7 = r9

Example 3:
The compiler can hoist the shared stack arg stores above the branch:
  *(u64 *)(r11 - 16) = r7;
  if cond goto else;
    *(u64 *)(r11 - 8) = r8;
    call bar1;               // arg6 = r8, arg7 = r7
    goto end;
  else:
    *(u64 *)(r11 - 8) = r9;
    call bar2;               // arg6 = r9, arg7 = r7
  end:

Example 4:
Within a loop:
  loop:
    *(u64 *)(r11 - 8) = r6;  // arg6, stored on every iteration
    call bar;                // slot is invalidated when bar returns
    if ... goto loop;
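
The invalidate-on-return rule behind Examples 1 and 2 can be modelled
in user space as follows. This is a sketch only: the slot array and the
toy_* helpers are hypothetical and much simpler than the verifier's
stack_arg_regs[] tracking.

```c
#include <stdbool.h>

#define TOY_MAX_SLOTS 4

struct toy_out_args {
    bool written[TOY_MAX_SLOTS];   /* slot i holds arg (6 + i) */
};

/* Models a store such as *(u64 *)(r11 - 8) = rX for slot 0. */
static void toy_store(struct toy_out_args *s, unsigned int slot)
{
    if (slot < TOY_MAX_SLOTS)
        s->written[slot] = true;
}

/* Returns false if the callee expects a slot that is not initialized. */
static bool toy_call(struct toy_out_args *s, unsigned int nr_stack_args)
{
    bool ok = nr_stack_args <= TOY_MAX_SLOTS;

    for (unsigned int i = 0; ok && i < nr_stack_args; i++)
        ok = s->written[i];
    /* Callee returned: every outgoing slot becomes invalid again. */
    for (unsigned int i = 0; i < TOY_MAX_SLOTS; i++)
        s->written[i] = false;
    return ok;
}

/* Example 1: the second call reuses slots without rewriting them. */
static bool toy_example1(void)
{
    struct toy_out_args s = { { false } };

    toy_store(&s, 0);
    toy_store(&s, 1);
    return toy_call(&s, 2) && toy_call(&s, 2);
}

/* Example 2: the slots are rewritten before the second call. */
static bool toy_example2(void)
{
    struct toy_out_args s = { { false } };

    toy_store(&s, 0);
    toy_store(&s, 1);
    if (!toy_call(&s, 2))
        return false;
    toy_store(&s, 0);
    toy_store(&s, 1);
    return toy_call(&s, 2);
}
```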

A separate max_out_stack_arg_cnt field in bpf_subprog_info tracks
the deepest outgoing slot actually written. It is intended to
reject programs that write to slots beyond what any callee expects.
This is necessary for the JIT.

Similar to typical compiler-generated code, enforce the following
ordering:
  - all stack arg reads must come before any stack arg write
  - all stack arg reads must come before any call to a bpf function,
    kfunc or helper
This is needed because the JIT may emit 'mov' insns for reads and
writes that use the same register, and because a call to a bpf
function, kfunc or helper invalidates all arguments immediately after
the call.
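
The ordering rule can be expressed as a check over a sequence of
operations. The enum and the function are hypothetical and only
illustrate the rule; the verifier enforces it on real instructions.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_op { TOY_ARG_READ, TOY_ARG_WRITE, TOY_CALL };

/* Accepts a sequence only if every stack arg read precedes the first
 * stack arg write and the first call. */
static bool toy_order_ok(const enum toy_op *ops, size_t n)
{
    bool reads_closed = false;

    for (size_t i = 0; i < n; i++) {
        if (ops[i] == TOY_ARG_READ) {
            if (reads_closed)
                return false;     /* read after a write or a call */
        } else {
            reads_closed = true;  /* first write or call ends the reads */
        }
    }
    return true;
}

/* Reads first, then a write and a call: accepted. */
static bool toy_demo_good(void)
{
    const enum toy_op ops[] = {
        TOY_ARG_READ, TOY_ARG_READ, TOY_ARG_WRITE, TOY_CALL,
    };

    return toy_order_ok(ops, 4);
}

/* A read after a call: rejected. */
static bool toy_demo_bad(void)
{
    const enum toy_op ops[] = { TOY_ARG_READ, TOY_CALL, TOY_ARG_READ };

    return toy_order_ok(ops, 3);
}
```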

For callback functions with stack arguments, the kernel needs to set
up the parameter types (including stack parameters) properly so that
the callback function can retrieve this information for verification.

Global subprogs and freplace with >5 args are not yet supported.

  [1] https://github.com/llvm/llvm-project/pull/189060
  [2] https://lore.kernel.org/bpf/20260423033506.2542005-1-yonghong.song@linux.dev/

Signed-off-by: Yonghong Song 
Link: https://lore.kernel.org/r/20260513045015.2385013-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov

bpf: Fix s16 truncation for large bpf-to-bpf call offsets

2026-05-11T15:27:02+00:00

Currently, the BPF instruction set allows bpf-to-bpf calls (or internal
calls, pseudo calls) to use a 32-bit imm field to represent the relative
jump offset.

However, when JIT is disabled or falls back to the interpreter, the
verifier invokes bpf_patch_call_args() to rewrite the call instruction.
In this function, the 32-bit imm is downcast to s16 and stored in the off
field.

    void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth)
    {
        stack_depth = max_t(u32, stack_depth, 1);
        insn->off = (s16) insn->imm;
        insn->imm = interpreters_args[(round_up(stack_depth, 32) / 32) - 1] -
            __bpf_call_base_args;
        insn->code = BPF_JMP | BPF_CALL_ARGS;
    }

If the original imm exceeds the s16 range (i.e., a jump offset greater
than 32767 instructions), this downcast silently truncates the offset,
resulting in an incorrect call target.
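
The truncation is easy to reproduce outside the kernel. The helper
below is a portable model of keeping only the low 16 bits as a signed
value; its name is made up for this example.

```c
#include <stdint.h>

/* Model of (s16)imm: keep the low 16 bits and sign-extend them. */
static int32_t toy_truncate_s16(int32_t imm)
{
    int32_t low = (int32_t)((uint32_t)imm & 0xFFFFu);

    return low >= 0x8000 ? low - 0x10000 : low;
}
```

With this model an offset of 40000 instructions becomes -25536, so the
call would jump backwards instead of forwards.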

Fix this by:
1. In bpf_patch_call_args(), keeping the imm field unchanged and using the
   off field to store the index of the interpreter function.
2. In ___bpf_prog_run() for the JMP_CALL_ARGS case, retrieving the
   interpreter function pointer from the interpreters_args array using the
   off field as the index, and passing the original imm to calculate the
   last argument of the interpreter function.
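
A minimal sketch of the new encoding follows. The three-entry table,
the toy_* names and the parameterless interpreter stubs are assumptions
for illustration; the real interpreters_args array and its function
signature differ.

```c
#include <stdint.h>

#define TOY_STACK_STEP 32

struct toy_insn {
    int16_t off;   /* after patching: index into the interpreter table */
    int32_t imm;   /* call offset, left untouched by patching */
};

/* Stub interpreters that just report the stack size they provide. */
static unsigned int toy_run_32(void) { return 32; }
static unsigned int toy_run_64(void) { return 64; }
static unsigned int toy_run_96(void) { return 96; }

static unsigned int (*const toy_interpreters[])(void) = {
    toy_run_32, toy_run_64, toy_run_96,
};

/* Patch step: imm keeps the call offset, off records the table index.
 * The caller must keep stack_depth <= 96 for this toy table. */
static void toy_patch_call(struct toy_insn *insn, uint32_t stack_depth)
{
    if (stack_depth < 1)
        stack_depth = 1;
    insn->off = (int16_t)((stack_depth + TOY_STACK_STEP - 1) /
                          TOY_STACK_STEP - 1);
}

/* Run step: select the interpreter by direct array indexing. */
static unsigned int toy_dispatch(const struct toy_insn *insn)
{
    return toy_interpreters[insn->off]();
}

/* Demo: a large call offset survives patching unchanged. */
static int toy_demo(void)
{
    struct toy_insn insn = { 0, 40000 };

    toy_patch_call(&insn, 40);
    return insn.imm == 40000 && insn.off == 1 && toy_dispatch(&insn) == 64;
}
```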

After these changes, the truncation issue is resolved, and __bpf_call_base_args
is also no longer needed and can be removed, which makes the code cleaner.

Performance: In ___bpf_prog_run() for the JMP_CALL_ARGS case, changing the
retrieval of the interpreter function pointer from pointer addition to
direct array indexing improves performance. A possible reason is that
array indexing allows better instruction-level parallelism. See the v5
discussion [1] for more details.

[1] https://lore.kernel.org/bpf/f120c3c4-6999-414a-b514-518bb64b4758@zju.edu.cn/

To avoid requiring bpftool changes, keep the new imm/off encoding internal
and restore the legacy xlated dump layout in bpf_insn_prepare_dump().
For bpf-to-bpf call offsets that do not fit in s16, export off as 0 instead
of a truncated and misleading value.
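
The dump rule for the off field can be sketched as below. The function
name is hypothetical and this is not the code in
bpf_insn_prepare_dump(); it only shows the fits-in-s16 decision.

```c
#include <stdint.h>

/* Value exported in the off field of a dumped bpf-to-bpf call:
 * the call offset if it fits in s16, otherwise 0. */
static int16_t toy_dump_off(int32_t imm)
{
    return (imm >= INT16_MIN && imm <= INT16_MAX) ? (int16_t)imm : 0;
}
```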

Fixes: 1ea47e01ad6e ("bpf: add support for bpf_call to interpreter")
Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump")
Suggested-by: Xu Kuohai 
Suggested-by: Puranjay Mohan 
Co-developed-by: Tianci Cao 
Signed-off-by: Tianci Cao 
Co-developed-by: Shenghao Yuan 
Signed-off-by: Shenghao Yuan 
Signed-off-by: Yazhou Tang 
Link: https://lore.kernel.org/r/20260506094714.419842-3-tangyazhou@zju.edu.cn
Signed-off-by: Alexei Starovoitov

bpf: Fix out-of-bounds read in bpf_patch_call_args()

2026-05-11T15:27:01+00:00

The interpreters_args array only accommodates stack depths up to
MAX_BPF_STACK (512 bytes). However, do_misc_fixups() may allow a larger
stack depth if JIT is requested.

If JIT compilation later fails and falls back to the interpreter, the
verifier invokes bpf_patch_call_args() with this oversized stack depth.
This causes a load-time out-of-bounds (OOB) read when calculating the
interpreter function pointer index.

Fix this by changing bpf_patch_call_args() to return an int and explicitly
rejecting the JIT fallback (returning -EINVAL) if the stack depth exceeds
MAX_BPF_STACK.
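
The index calculation and the added bound check can be sketched as
follows. The toy_* names are hypothetical; the 32-byte step and the
512-byte limit come from the text and the code quoted in the related
fix.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_MAX_BPF_STACK 512
#define TOY_NR_INTERPRETERS (TOY_MAX_BPF_STACK / 32)   /* 16 entries */

/* Returns the interpreter table index for a stack depth, or -EINVAL
 * if the depth is larger than any interpreter variant supports. */
static int toy_interp_index(uint32_t stack_depth)
{
    if (stack_depth > TOY_MAX_BPF_STACK)
        return -EINVAL;
    if (stack_depth < 1)
        stack_depth = 1;
    return (int)((stack_depth + 31) / 32) - 1;
}
```

A depth of 512 maps to index 15, the last valid entry. Without the
check, a depth of 513 would round up to 544 and produce index 16, one
past the end of the table.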

Fixes: 1ea47e01ad6e ("bpf: add support for bpf_call to interpreter")
Co-developed-by: Tianci Cao 
Signed-off-by: Tianci Cao 
Co-developed-by: Shenghao Yuan 
Signed-off-by: Shenghao Yuan 
Signed-off-by: Yazhou Tang 
Acked-by: Xu Kuohai 
Link: https://lore.kernel.org/r/20260506094714.419842-2-tangyazhou@zju.edu.cn
Signed-off-by: Alexei Starovoitov