linux-stable.git/kernel/sched/ext_internal.h, branch linux-rolling-lts

sched/ext: Implement cgroup_set_idle() callback

2026-05-17T15:15:36+00:00

[ Upstream commit 347ed2d566dabb06c7970fff01129c4f59995ed6 ]

Implement the missing cgroup_set_idle() callback that was marked as a
TODO. This allows BPF schedulers to be notified when a cgroup's idle
state changes, enabling them to adjust their scheduling behavior
accordingly.

The implementation follows the same pattern as other cgroup callbacks
like cgroup_set_weight() and cgroup_set_bandwidth(). It checks if the
BPF scheduler has implemented the callback and invokes it with the
appropriate parameters.

Fixes a spelling error in the cgroup_set_bandwidth() documentation.

tj: s/scx_cgroup_rwsem/scx_cgroup_ops_rwsem/ to fix build breakage.

Signed-off-by: zhidao su 
Signed-off-by: Tejun Heo 
Stable-dep-of: 80afd4c84bc8 ("sched_ext: Read scx_root under scx_cgroup_ops_rwsem in cgroup setters")
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

sched_ext: Fix SCX_EFLAG_INITIALIZED being a no-op flag

2026-03-12T11:09:22+00:00

[ Upstream commit 749989b2d90ddc7dd253ad3b11a77cf882721acf ]

SCX_EFLAG_INITIALIZED is the sole member of enum scx_exit_flags with no
explicit value, so the compiler assigns it 0. This makes the bitwise OR
in scx_ops_init() a no-op:

    sch->exit_info->flags |= SCX_EFLAG_INITIALIZED; /* |= 0 */

As a result, BPF schedulers cannot distinguish whether ops.init()
completed successfully by inspecting exit_info->flags.

Assign the value 1LLU << 0 so the flag is actually set.

Fixes: f3aec2adce8d ("sched_ext: Add SCX_EFLAG_INITIALIZED to indicate successful ops.init()")
Signed-off-by: David Carlier 
Signed-off-by: Tejun Heo 
Signed-off-by: Sasha Levin

sched_ext: Fix SCX_KICK_WAIT to work reliably

2026-02-06T15:57:45+00:00

commit a379fa1e2cae15d7422b4eead83a6366f2f445cb upstream.

SCX_KICK_WAIT is used to synchronously wait for the target CPU to complete
a reschedule and can be used to implement operations like core scheduling.

This used to be implemented by scx_next_task_picked() incrementing pnt_seq,
which was always called when a CPU picks the next task to run, allowing
SCX_KICK_WAIT to reliably wait for the target CPU to enter the scheduler and
pick the next task.

However, commit b999e365c298 ("sched_ext: Replace scx_next_task_picked()
with switch_class()") replaced scx_next_task_picked() with the
switch_class() callback, which is only called when switching between sched
classes. This broke SCX_KICK_WAIT because pnt_seq would no longer be
reliably incremented unless the previous task was SCX and the next task was
not.

This fix leverages commit 4c95380701f5 ("sched/ext: Fold balance_scx() into
pick_task_scx()") which refactored the pick path making put_prev_task_scx()
the natural place to track task switches for SCX_KICK_WAIT. The fix moves
pnt_seq increment to put_prev_task_scx() and also increments it in
pick_task_scx() to handle cases where the same task is re-selected, whether
by BPF scheduler decision or slice refill. The semantics: If the current
task on the target CPU is SCX, SCX_KICK_WAIT waits until the CPU enters the
scheduling path. This provides sufficient guarantee for use cases like core
scheduling while keeping the operation self-contained within SCX.

v2: - Also increment pnt_seq in pick_task_scx() to handle same-task
      re-selection (Andrea Righi).
    - Use smp_cond_load_acquire() for the busy-wait loop for better
      architecture optimization (Peter Zijlstra).

Reported-by: Wen-Fang Liu 
Link: http://lkml.kernel.org/r/228ebd9e6ed3437996dffe15735a9caa@honor.com
Cc: Peter Zijlstra 
Reviewed-by: Andrea Righi 
Signed-off-by: Tejun Heo 
Signed-off-by: Christian Loehle 
Signed-off-by: Greg Kroah-Hartman

sched_ext: Add SCX_EFLAG_INITIALIZED to indicate successful ops.init()

2025-09-23T19:03:26+00:00

ops.exit() may be called even if the loading failed before ops.init()
finishes successfully. This is because ops.exit() allows rich exit info
communication. Add SCX_EFLAG_INITIALIZED flag to scx_exit_info.flags to
indicate whether ops.init() finished successfully.

This enables BPF schedulers to distinguish between exit scenarios and
handle cleanup appropriately based on initialization state.

Acked-by: Andrea Righi 
Signed-off-by: Tejun Heo

sched_ext: Use bitfields for boolean warning flags

2025-09-23T19:03:26+00:00

Convert warned_zero_slice and warned_deprecated_rq in scx_sched struct to
single-bit bitfields. While this doesn't reduce struct size immediately,
it prepares for future bitfield additions.

v2: Update patch description.

Acked-by: Andrea Righi 
Signed-off-by: Tejun Heo

sched_ext: deprecation warn for scx_bpf_cpu_rq()

2025-09-03T21:51:57+00:00

scx_bpf_cpu_rq() works on an unlocked rq which generally isn't safe.
For the common use-cases scx_bpf_locked_rq() and
scx_bpf_cpu_curr() work, so add a deprecation warning
to scx_bpf_cpu_rq() so it can eventually be removed.

Signed-off-by: Christian Loehle 
Signed-off-by: Tejun Heo

sched_ext: Put event_stats_cpu in struct scx_sched_pcpu

2025-09-03T21:33:28+00:00

scx_sched.event_stats_cpu is the percpu counters that are used to track
stats. Introduce struct scx_sched_pcpu and move the counters inside. This
will ease adding more per-cpu fields. No functional changes.

Signed-off-by: Tejun Heo 
Acked-by: Andrea Righi

sched_ext: Move internal type and accessor definitions to ext_internal.h

2025-09-03T21:33:28+00:00

There currently isn't a place to place SCX-internal types and accessors to
be shared between ext.c and ext_idle.c. Create kernel/sched/ext_internal.h
and move internal type and accessor definitions there. This trims ext.c a
bit and makes future additions easier. Pure code reorganization. No
functional changes.

Signed-off-by: Tejun Heo 
Acked-by: Andrea Righi