diff options
| author | Tejun Heo <tj@kernel.org> | 2026-04-10 07:54:06 -1000 |
|---|---|---|
| committer | Tejun Heo <tj@kernel.org> | 2026-04-10 07:54:06 -1000 |
| commit | d1d3c1c6ae3691869be9d94730d6e5325aaae8c6 (patch) | |
| tree | cb57e9b4b9dedb5fdb92bde8751c24e5ca7b9366 /include/net/phy/git@git.tavy.me:linux.git | |
| parent | 2193af26a149acfb7a66f49397665640c2a60d8c (diff) | |
sched_ext: Add verifier-time kfunc context filter
Move enforcement of SCX context-sensitive kfunc restrictions from per-task
runtime kf_mask checks to BPF verifier-time filtering, using the BPF core's
struct_ops context information.
A shared .filter callback is attached to each context-sensitive BTF set
and consults a per-op allow table (scx_kf_allow_flags[]) indexed by SCX
ops member offset. Disallowed calls are now rejected at program load time
instead of at runtime.
The old model split reachability across two places: each SCX_CALL_OP*()
set bits naming its op context, and each kfunc's scx_kf_allowed() check
OR'd together the bits it accepted. A kfunc was callable when those two
masks overlapped. The new model transposes the result to the caller side -
each op's allow flags directly list the kfunc groups it may call. The old
bit assignments were:
Call-site bits:
ops.select_cpu = ENQUEUE | SELECT_CPU
ops.enqueue = ENQUEUE
ops.dispatch = DISPATCH
ops.cpu_release = CPU_RELEASE
Kfunc-group accepted bits:
enqueue group = ENQUEUE | DISPATCH
select_cpu group = SELECT_CPU | ENQUEUE
dispatch group = DISPATCH
cpu_release group = CPU_RELEASE
Intersecting them yields the reachability now expressed directly by
scx_kf_allow_flags[]:
ops.select_cpu -> SELECT_CPU | ENQUEUE
ops.enqueue -> SELECT_CPU | ENQUEUE
ops.dispatch -> ENQUEUE | DISPATCH
ops.cpu_release -> CPU_RELEASE
Unlocked ops carried no kf_mask bits and reached only unlocked kfuncs;
that maps directly to UNLOCKED in the new table.
Equivalence was checked by walking every (op, kfunc-group) combination
across SCX ops, SYSCALL, and non-SCX struct_ops callers against the old
scx_kf_allowed() runtime checks. With two intended exceptions (see below),
all combinations reach the same verdict; disallowed calls are now caught at
load time instead of firing scx_error() at runtime.
scx_bpf_dsq_move_set_slice() and scx_bpf_dsq_move_set_vtime() are
exceptions: they have no runtime check at all, but the new filter rejects
them from ops outside dispatch/unlocked. The affected cases are nonsensical
- the values these setters store are only read by
scx_bpf_dsq_move{,_vtime}(), which is itself restricted to
dispatch/unlocked, so a setter call from anywhere else was already dead
code.
Runtime scx_kf_mask enforcement is left in place by this patch and removed
in a follow-up.
Original-patch-by: Juntong Deng <juntong.deng@outlook.com>
Original-patch-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Diffstat (limited to 'include/net/phy/git@git.tavy.me:linux.git')
0 files changed, 0 insertions, 0 deletions
