<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/sched, branch master</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>Merge tag 'sched_ext-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2026-03-31T21:23:12+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-03-31T21:23:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9147566d801602c9e7fc7f85e989735735bf38ba'/>
<id>9147566d801602c9e7fc7f85e989735735bf38ba</id>
<content type='text'>
Pull sched_ext fixes from Tejun Heo:

 - Fix SCX_KICK_WAIT deadlock where multiple CPUs waiting for each other
   in hardirq context form a cycle. Move the wait to a balance callback
   which can drop the rq lock and process IPIs.

 - Fix inconsistent NUMA node lookup in scx_select_cpu_dfl() where
   the waker_node used cpu_to_node() while prev_cpu used
   scx_cpu_node_if_enabled(), leading to undefined behavior when
   per-node idle tracking is disabled.

* tag 'sched_ext-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test
  sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback
  sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull sched_ext fixes from Tejun Heo:

 - Fix SCX_KICK_WAIT deadlock where multiple CPUs waiting for each other
   in hardirq context form a cycle. Move the wait to a balance callback
   which can drop the rq lock and process IPIs.

 - Fix inconsistent NUMA node lookup in scx_select_cpu_dfl() where
   the waker_node used cpu_to_node() while prev_cpu used
   scx_cpu_node_if_enabled(), leading to undefined behavior when
   per-node idle tracking is disabled.

* tag 'sched_ext-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test
  sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback
  sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()
</pre>
</div>
</content>
</entry>
<entry>
<title>sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback</title>
<updated>2026-03-30T18:37:27+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2026-03-29T00:18:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=415cb193bb9736f0e830286c72a6fa8eb2a9cc5c'/>
<id>415cb193bb9736f0e830286c72a6fa8eb2a9cc5c</id>
<content type='text'>
SCX_KICK_WAIT busy-waits in kick_cpus_irq_workfn() using
smp_cond_load_acquire() until the target CPU's kick_sync advances. Because
the irq_work runs in hardirq context, the waiting CPU cannot reschedule and
its own kick_sync never advances. If multiple CPUs form a wait cycle, all
CPUs deadlock.

Replace the busy-wait in kick_cpus_irq_workfn() with resched_curr() to
force the CPU through do_pick_task_scx(), which queues a balance callback
to perform the wait. The balance callback drops the rq lock and enables
IRQs following the sched_core_balance() pattern, so the CPU can process
IPIs while waiting. The local CPU's kick_sync is advanced on entry to
do_pick_task_scx() and continuously during the wait, ensuring any CPU that
starts waiting for us sees the advancement and cannot form cyclic
dependencies.

Fixes: 90e55164dad4 ("sched_ext: Implement SCX_KICK_WAIT")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Christian Loehle &lt;christian.loehle@arm.com&gt;
Link: https://lore.kernel.org/r/20260316100249.1651641-1-christian.loehle@arm.com
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Christian Loehle &lt;christian.loehle@arm.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
SCX_KICK_WAIT busy-waits in kick_cpus_irq_workfn() using
smp_cond_load_acquire() until the target CPU's kick_sync advances. Because
the irq_work runs in hardirq context, the waiting CPU cannot reschedule and
its own kick_sync never advances. If multiple CPUs form a wait cycle, all
CPUs deadlock.

Replace the busy-wait in kick_cpus_irq_workfn() with resched_curr() to
force the CPU through do_pick_task_scx(), which queues a balance callback
to perform the wait. The balance callback drops the rq lock and enables
IRQs following the sched_core_balance() pattern, so the CPU can process
IPIs while waiting. The local CPU's kick_sync is advanced on entry to
do_pick_task_scx() and continuously during the wait, ensuring any CPU that
starts waiting for us sees the advancement and cannot form cyclic
dependencies.

Fixes: 90e55164dad4 ("sched_ext: Implement SCX_KICK_WAIT")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Christian Loehle &lt;christian.loehle@arm.com&gt;
Link: https://lore.kernel.org/r/20260316100249.1651641-1-christian.loehle@arm.com
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Tested-by: Christian Loehle &lt;christian.loehle@arm.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()</title>
<updated>2026-03-22T00:22:37+00:00</updated>
<author>
<name>Cheng-Yang Chou</name>
<email>yphbchou0911@gmail.com</email>
</author>
<published>2026-03-21T10:54:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=db08b1940f4beb25460b4a4e9da3446454f2e8fe'/>
<id>db08b1940f4beb25460b4a4e9da3446454f2e8fe</id>
<content type='text'>
In the WAKE_SYNC path of scx_select_cpu_dfl(), waker_node was computed
with cpu_to_node(), while node (for prev_cpu) was computed with
scx_cpu_node_if_enabled(). When scx_builtin_idle_per_node is disabled,
idle_cpumask(waker_node) is called with a real node ID even though
per-node idle tracking is disabled, resulting in undefined behavior.

Fix by using scx_cpu_node_if_enabled() for waker_node as well, ensuring
both variables are computed consistently.

Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Cheng-Yang Chou &lt;yphbchou0911@gmail.com&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the WAKE_SYNC path of scx_select_cpu_dfl(), waker_node was computed
with cpu_to_node(), while node (for prev_cpu) was computed with
scx_cpu_node_if_enabled(). When scx_builtin_idle_per_node is disabled,
idle_cpumask(waker_node) is called with a real node ID even though
per-node idle tracking is disabled, resulting in undefined behavior.

Fix by using scx_cpu_node_if_enabled() for waker_node as well, ensuring
both variables are computed consistently.

Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks")
Cc: stable@vger.kernel.org # v6.15+
Signed-off-by: Cheng-Yang Chou &lt;yphbchou0911@gmail.com&gt;
Reviewed-by: Andrea Righi &lt;arighi@nvidia.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: idle: Consolidate the handling of two special cases</title>
<updated>2026-03-16T19:29:47+00:00</updated>
<author>
<name>Rafael J. Wysocki</name>
<email>rafael.j.wysocki@intel.com</email>
</author>
<published>2026-03-13T12:25:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f4c31b07b136839e0fb3026f8a5b6543e3b14d2f'/>
<id>f4c31b07b136839e0fb3026f8a5b6543e3b14d2f</id>
<content type='text'>
There are two special cases in the idle loop that are handled
inconsistently even though they are analogous.

The first one is when a cpuidle driver is absent and the default CPU
idle time power management implemented by the architecture code is used.
In that case, the scheduler tick is stopped every time before invoking
default_idle_call().

The second one is when a cpuidle driver is present, but there is only
one idle state in its table.  In that case, the scheduler tick is never
stopped at all.

Since each of these approaches has its drawbacks, reconcile them with
the help of one simple heuristic.  Namely, stop the tick if the CPU has
been woken up by it in the previous iteration of the idle loop, or let
it tick otherwise.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Christian Loehle &lt;christian.loehle@arm.com&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Qais Yousef &lt;qyousef@layalina.io&gt;
Reviewed-by: Aboorva Devarajan &lt;aboorvad@linux.ibm.com&gt;
Fixes: ed98c3491998 ("sched: idle: Do not stop the tick before cpuidle_idle_call()")
[ rjw: Added Fixes tag, changelog edits ]
Link: https://patch.msgid.link/4741364.LvFx2qVVIh@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are two special cases in the idle loop that are handled
inconsistently even though they are analogous.

The first one is when a cpuidle driver is absent and the default CPU
idle time power management implemented by the architecture code is used.
In that case, the scheduler tick is stopped every time before invoking
default_idle_call().

The second one is when a cpuidle driver is present, but there is only
one idle state in its table.  In that case, the scheduler tick is never
stopped at all.

Since each of these approaches has its drawbacks, reconcile them with
the help of one simple heuristic.  Namely, stop the tick if the CPU has
been woken up by it in the previous iteration of the idle loop, or let
it tick otherwise.

Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Reviewed-by: Christian Loehle &lt;christian.loehle@arm.com&gt;
Reviewed-by: Frederic Weisbecker &lt;frederic@kernel.org&gt;
Reviewed-by: Qais Yousef &lt;qyousef@layalina.io&gt;
Reviewed-by: Aboorva Devarajan &lt;aboorvad@linux.ibm.com&gt;
Fixes: ed98c3491998 ("sched: idle: Do not stop the tick before cpuidle_idle_call()")
[ rjw: Added Fixes tag, changelog edits ]
Link: https://patch.msgid.link/4741364.LvFx2qVVIh@rafael.j.wysocki
Signed-off-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'sched-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2026-03-15T17:49:47+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-03-15T17:49:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=63724e9519a312d7d0b8767d0aeb53bc15a7fdd5'/>
<id>63724e9519a312d7d0b8767d0aeb53bc15a7fdd5</id>
<content type='text'>
Pull scheduler fixes from Ingo Molnar:
 "More MM-CID fixes, mostly fixing hangs/races:

   - Fix CID hangs due to a race between concurrent forks

   - Fix vfork()/CLONE_VM MMCID bug causing hangs

   - Remove pointless preemption guard

   - Fix CID task list walk performance regression on large systems
     by removing the known-flaky and slow counting logic using
     for_each_process_thread() in mm_cid_*fixup_tasks_to_cpus(), and
     implementing a simple sched_mm_cid::node list instead"

* tag 'sched-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/mmcid: Avoid full tasklist walks
  sched/mmcid: Remove pointless preempt guard
  sched/mmcid: Handle vfork()/CLONE_VM correctly
  sched/mmcid: Prevent CID stalls due to concurrent forks
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull scheduler fixes from Ingo Molnar:
 "More MM-CID fixes, mostly fixing hangs/races:

   - Fix CID hangs due to a race between concurrent forks

   - Fix vfork()/CLONE_VM MMCID bug causing hangs

   - Remove pointless preemption guard

   - Fix CID task list walk performance regression on large systems
     by removing the known-flaky and slow counting logic using
     for_each_process_thread() in mm_cid_*fixup_tasks_to_cpus(), and
     implementing a simple sched_mm_cid::node list instead"

* tag 'sched-urgent-2026-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/mmcid: Avoid full tasklist walks
  sched/mmcid: Remove pointless preempt guard
  sched/mmcid: Handle vfork()/CLONE_VM correctly
  sched/mmcid: Prevent CID stalls due to concurrent forks
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'sched_ext-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext</title>
<updated>2026-03-13T21:54:56+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-03-13T21:54:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8369b2e97d806537dcdba1d6b3bb46fb1407dab0'/>
<id>8369b2e97d806537dcdba1d6b3bb46fb1407dab0</id>
<content type='text'>
Pull sched_ext fixes from Tejun Heo:

 - Fix data races flagged by KCSAN: add missing READ_ONCE()/WRITE_ONCE()
   annotations for lock-free accesses to module parameters and dsq-&gt;seq

 - Fix silent truncation of upper 32 enqueue flags (SCX_ENQ_PREEMPT and
   above) when passed through the int sched_class interface

 - Documentation updates: scheduling class precedence, task ownership
   state machine, example scheduler descriptions, config list cleanup

 - Selftest fix for format specifier and buffer length in
   file_write_long()

* tag 'sched_ext-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Use WRITE_ONCE() for the write side of scx_enable helper pointer
  sched_ext: Fix enqueue_task_scx() truncation of upper enqueue flags
  sched_ext: Documentation: Update sched-ext.rst
  sched_ext: Use READ_ONCE() for scx_slice_bypass_us in scx_bypass()
  sched_ext: Documentation: Mention scheduling class precedence
  sched_ext: Document task ownership state machine
  sched_ext: Use READ_ONCE() for lock-free reads of module param variables
  sched_ext/selftests: Fix format specifier and buffer length in file_write_long()
  sched_ext: Use WRITE_ONCE() for the write side of dsq-&gt;seq update
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull sched_ext fixes from Tejun Heo:

 - Fix data races flagged by KCSAN: add missing READ_ONCE()/WRITE_ONCE()
   annotations for lock-free accesses to module parameters and dsq-&gt;seq

 - Fix silent truncation of upper 32 enqueue flags (SCX_ENQ_PREEMPT and
   above) when passed through the int sched_class interface

 - Documentation updates: scheduling class precedence, task ownership
   state machine, example scheduler descriptions, config list cleanup

 - Selftest fix for format specifier and buffer length in
   file_write_long()

* tag 'sched_ext-for-7.0-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Use WRITE_ONCE() for the write side of scx_enable helper pointer
  sched_ext: Fix enqueue_task_scx() truncation of upper enqueue flags
  sched_ext: Documentation: Update sched-ext.rst
  sched_ext: Use READ_ONCE() for scx_slice_bypass_us in scx_bypass()
  sched_ext: Documentation: Mention scheduling class precedence
  sched_ext: Document task ownership state machine
  sched_ext: Use READ_ONCE() for lock-free reads of module param variables
  sched_ext/selftests: Fix format specifier and buffer length in file_write_long()
  sched_ext: Use WRITE_ONCE() for the write side of dsq-&gt;seq update
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/mmcid: Avoid full tasklist walks</title>
<updated>2026-03-11T11:01:07+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@kernel.org</email>
</author>
<published>2026-03-10T20:29:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=192d852129b1b7c4f0ddbab95d0de1efd5ee1405'/>
<id>192d852129b1b7c4f0ddbab95d0de1efd5ee1405</id>
<content type='text'>
Chasing vfork()'ed tasks on a CID ownership mode switch requires a full
task list walk, which is obviously expensive on large systems.

Avoid that by keeping a list of tasks using a mm MMCID entity in mm::mm_cid
and walk this list instead. This removes the proven to be flaky counting
logic and avoids a full task list walk in the case of vfork()'ed tasks.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20260310202526.183824481@kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Chasing vfork()'ed tasks on a CID ownership mode switch requires a full
task list walk, which is obviously expensive on large systems.

Avoid that by keeping a list of tasks using a mm MMCID entity in mm::mm_cid
and walk this list instead. This removes the proven to be flaky counting
logic and avoids a full task list walk in the case of vfork()'ed tasks.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20260310202526.183824481@kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/mmcid: Remove pointless preempt guard</title>
<updated>2026-03-11T11:01:06+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@kernel.org</email>
</author>
<published>2026-03-10T20:29:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7574ac6e49789ddee1b1be9b2afb42b4a1b4b1f4'/>
<id>7574ac6e49789ddee1b1be9b2afb42b4a1b4b1f4</id>
<content type='text'>
This is a leftover from the early versions of this function where it could
be invoked without mm::mm_cid::lock held.

Remove it and add lockdep asserts instead.

Fixes: 653fda7ae73d ("sched/mmcid: Switch over to the new mechanism")
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20260310202526.116363613@kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is a leftover from the early versions of this function where it could
be invoked without mm::mm_cid::lock held.

Remove it and add lockdep asserts instead.

Fixes: 653fda7ae73d ("sched/mmcid: Switch over to the new mechanism")
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20260310202526.116363613@kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/mmcid: Handle vfork()/CLONE_VM correctly</title>
<updated>2026-03-11T11:01:06+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@kernel.org</email>
</author>
<published>2026-03-10T20:28:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=28b5a1395036d6c7a6c8034d85ad3d7d365f192c'/>
<id>28b5a1395036d6c7a6c8034d85ad3d7d365f192c</id>
<content type='text'>
Matthieu and Jiri reported stalls where a task endlessly loops in
mm_get_cid() when scheduling in.

It turned out that the logic which handles vfork()'ed tasks is broken. It
is invoked when the number of tasks associated to a process is smaller than
the number of MMCID users. It then walks the task list to find the
vfork()'ed task, but accounts all the already processed tasks as well.

If that double processing brings the number of to be handled tasks to 0,
the walk stops and the vfork()'ed task's CID is not fixed up. As a
consequence a subsequent schedule in fails to acquire a (transitional) CID
and the machine stalls.

Cure this by removing the accounting condition and make the fixup always
walk the full task list if it could not find the exact number of users in
the process' thread list.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Closes: https://lore.kernel.org/b24ffcb3-09d5-4e48-9070-0b69bc654281@kernel.org
Reported-by: Matthieu Baerts &lt;matttbe@kernel.org&gt;
Reported-by: Jiri Slaby &lt;jirislaby@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20260310202526.048657665@kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Matthieu and Jiri reported stalls where a task endlessly loops in
mm_get_cid() when scheduling in.

It turned out that the logic which handles vfork()'ed tasks is broken. It
is invoked when the number of tasks associated to a process is smaller than
the number of MMCID users. It then walks the task list to find the
vfork()'ed task, but accounts all the already processed tasks as well.

If that double processing brings the number of to be handled tasks to 0,
the walk stops and the vfork()'ed task's CID is not fixed up. As a
consequence a subsequent schedule in fails to acquire a (transitional) CID
and the machine stalls.

Cure this by removing the accounting condition and make the fixup always
walk the full task list if it could not find the exact number of users in
the process' thread list.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Closes: https://lore.kernel.org/b24ffcb3-09d5-4e48-9070-0b69bc654281@kernel.org
Reported-by: Matthieu Baerts &lt;matttbe@kernel.org&gt;
Reported-by: Jiri Slaby &lt;jirislaby@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20260310202526.048657665@kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/mmcid: Prevent CID stalls due to concurrent forks</title>
<updated>2026-03-11T11:01:06+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@kernel.org</email>
</author>
<published>2026-03-10T20:28:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b2e48c429ec54715d16fefa719dd2fbded2e65be'/>
<id>b2e48c429ec54715d16fefa719dd2fbded2e65be</id>
<content type='text'>
A newly forked task is accounted as MMCID user before the task is visible
in the process' thread list and the global task list. This creates the
following problem:

 CPU1			CPU2
 fork()
   sched_mm_cid_fork(tnew1)
     tnew1-&gt;mm.mm_cid_users++;
     tnew1-&gt;mm_cid.cid = getcid()
-&gt; preemption
			fork()
			  sched_mm_cid_fork(tnew2)
			    tnew2-&gt;mm.mm_cid_users++;
                            // Reaches the per CPU threshold
			    mm_cid_fixup_tasks_to_cpus()
			    for_each_other(current, p)
			         ....

As tnew1 is not visible yet, this fails to fix up the already allocated CID
of tnew1. As a consequence a subsequent schedule in might fail to acquire a
(transitional) CID and the machine stalls.

Move the invocation of sched_mm_cid_fork() after the new task becomes
visible in the thread and the task list to prevent this.

This also makes it symmetrical vs. exit() where the task is removed as CID
user before the task is removed from the thread and task lists.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20260310202525.969061974@kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A newly forked task is accounted as MMCID user before the task is visible
in the process' thread list and the global task list. This creates the
following problem:

 CPU1			CPU2
 fork()
   sched_mm_cid_fork(tnew1)
     tnew1-&gt;mm.mm_cid_users++;
     tnew1-&gt;mm_cid.cid = getcid()
-&gt; preemption
			fork()
			  sched_mm_cid_fork(tnew2)
			    tnew2-&gt;mm.mm_cid_users++;
                            // Reaches the per CPU threshold
			    mm_cid_fixup_tasks_to_cpus()
			    for_each_other(current, p)
			         ....

As tnew1 is not visible yet, this fails to fix up the already allocated CID
of tnew1. As a consequence a subsequent schedule in might fail to acquire a
(transitional) CID and the machine stalls.

Move the invocation of sched_mm_cid_fork() after the new task becomes
visible in the thread and the task list to prevent this.

This also makes it symmetrical vs. exit() where the task is removed as CID
user before the task is removed from the thread and task lists.

Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20260310202525.969061974@kernel.org
</pre>
</div>
</content>
</entry>
</feed>
