linux.git/kernel/workqueue.c, branch v7.2-rc1

Merge tag 'wq-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

2026-06-17T10:57:44+00:00

Pull workqueue updates from Tejun Heo:

 - Continued progress toward making alloc_workqueue() unbound by
   default: more callers converted to WQ_PERCPU / system_percpu_wq /
   system_dfl_wq, and new warnings for queues that use neither WQ_PERCPU
   nor WQ_UNBOUND or the legacy system_wq / system_unbound_wq.

 - Misc: drop the now-trivial apply_wqattrs_lock()/unlock() wrappers,
   forbid the TEST_WORKQUEUE benchmark from being built-in, and fix a
   spurious pointer level in the worker debug-dump path.

* tag 'wq-for-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  drm/bridge: anx7625: Add WQ_PERCPU add to alloc_workqueue
  wifi: ath6kl: fix invalid workqueue flags in ath6kl_usb_create()
  btrfs: Drop WQ_PERCPU from ordered_flags in btrfs_init_workqueues()
  workqueue: Add warnings and ensure one among WQ_PERCPU or WQ_UNBOUND is present
  workqueue: Add warnings and fallback if system_{unbound}_wq is used
  workqueue: drop spurious '*' from print_worker_info() fn declaration
  workqueue: forbid TEST_WORKQUEUE from being built-in
  workqueue: drop apply_wqattrs_lock()/unlock() wrappers
  umh: replace use of system_unbound_wq with system_dfl_wq
  rapidio: rio: add WQ_PERCPU to alloc_workqueue users
  media: ddbridge: add WQ_PERCPU to alloc_workqueue users
  platform: cznic: turris-omnia-mcu: replace use of system_wq with system_percpu_wq
  media: synopsys: hdmirx: replace use of system_unbound_wq with system_dfl_wq
  virt: acrn: Add WQ_PERCPU to alloc_workqueue users

Merge tag 'sched-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip

2026-06-15T09:20:18+00:00

Pull scheduler updates from Ingo Molnar:
 "SMP load-balancing updates:

   - A large series to introduce infrastructure for cache-aware load
     balancing, with the goal of co-locating tasks that share data
     within the same Last Level Cache (LLC) domain. By improving cache
     locality, the scheduler can reduce cache bouncing and cache misses,
     ultimately improving data access efficiency.

     Implemented by Chen Yu and Tim Chen, based on early prototype work
     by Peter Zijlstra, with fixes by Jianyong Wu, Peter Zijlstra and
     Shrikanth Hegde.

   - A series to simplify CONFIG_SCHED_SMT ifdef usage (Shrikanth Hegde)

  Fair scheduler updates:

   - A series to improve SD_ASYM_CPUCAPACITY scheduling by introducing
     SMT awareness (Andrea Righi, K Prateek Nayak)

   - A series to optimize cfs_rq and sched_entity allocation for better
     data locality (Zecheng Li)

   - A preparatory series to change fair/cgroup scheduling to a single
     runqueue, without the final change (Peter Zijlstra)

   - Auto-manage ext/fair dl_server bandwidth (Andrea Righi)

   - Fix cpu_util runnable_avg arithmetic (Hongyan Xia)

   - Optimize update_tg_load_avg()'s rate-limiting code (Rik van Riel)

   - Allow account_cfs_rq_runtime() to throttle current hierarchy
     (K Prateek Nayak)

   - Update util_est after updating util_avg during dequeue, to fix the
     util signal update logic, which reduces signal noise (Vincent
     Guittot)

  Scheduler topology updates:

   - Allow multiple domains to claim sched_domain_shared (K Prateek
     Nayak)

   - Add parameter to split LLC (Peter Zijlstra)

  Core scheduler updates:

   - Use trace_call__() to save a static branch (Gabriele Monaco)

  Scheduler statistics updates:

   - Drop now-stale mul_u64_u64_div_u64() cputime over-approximation
     guard (Nicolas Pitre)

  Deadline scheduler updates:

   - Reject debugfs dl_server writes for offline CPUs (Andrea Righi)

   - Fix replenishment logic for non-deferred servers (Yuri Andriaccio)

  RT scheduling updates:

   - Turn RT_PUSH_IPI default off for non PREEMPT_RT (Steven Rostedt)

   - Update default bandwidth for real-time tasks to 1.0 (Yuri
     Andriaccio)

  Proxy scheduling updates:

   - A series to implement Optimized Donor Migration for Proxy Execution
     (John Stultz, Peter Zijlstra)

   - Various proxy scheduling cleanups and fixes (Peter Zijlstra,
     K Prateek Nayak)

  Misc fixes, improvements and cleanups by Aaron Lu, Andrea Righi,
  Zenghui Yu, Chen Yu, Guanyou.Chen, John Stultz, Shrikanth Hegde,
  Peter Zijlstra, Liang Luo and Yiyang Chen"

* tag 'sched-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (91 commits)
  sched/fair: Fix newidle vs core-sched
  sched/deadline: Use task_on_rq_migrating() helper
  sched/core: Combine separate 'else' and 'if' statements
  sched/fair: Fix cpu_util runnable_avg arithmetic
  sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime()
  sched/fair: Move the throttled tasks to a local list in tg_unthrottle_up()
  sched/fair: Call update_curr() before unthrottling the hierarchy
  sched/fair: Use throttled_csd_list for local unthrottle
  sched/fair: Convert cfs bandwidth throttling to use guards
  sched/fair: Allocate cfs_tg_state with percpu allocator
  sched/fair: Remove task_group->se pointer array
  sched/fair: Co-locate cfs_rq and sched_entity in cfs_tg_state
  sched: restore timer_slack_ns when resetting RT policy on fork
  MAINTAINERS: Fix spelling mistake in Peter's name
  sched: Simplify ttwu_runnable()
  sched/proxy: Remove superfluous clear_task_blocked_in()
  sched/proxy: Remove PROXY_WAKING
  sched/proxy: Switch proxy to use p->is_blocked
  sched/proxy: Only return migrate when needed
  sched: Be more strict about p->is_blocked
  ...

workqueue: Add warnings and ensure one among WQ_PERCPU or WQ_UNBOUND is present

2026-05-29T17:56:48+00:00

Currently there are no checks in order to enforce the use of one between
WQ_PERCPU or WQ_UNBOUND.

So act as following:
- if neither of them is present, set WQ_PERCPU
- if both are present, remove WQ_PERCPU

Along with this change, WARN_ONCE(), so that the code still uses both or
neither of them, can be changed.

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo 
Signed-off-by: Marco Crivellari 
Signed-off-by: Tejun Heo

workqueue: Add warnings and fallback if system_{unbound}_wq is used

2026-05-29T17:55:33+00:00

Currently many users transitioned already to the new introduced workqueue
(system_percpu_wq, system_dfl_wq), but there are new users who still use the
older system_wq and system_unbound_wq.

This change try to push this transition forward, by warning whether the old
workqueues are used.

Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo 
Signed-off-by: Marco Crivellari 
Signed-off-by: Tejun Heo

workqueue: drop spurious '*' from print_worker_info() fn declaration

2026-05-27T18:22:56+00:00

print_worker_info() declares its local 'fn' as work_func_t * but
worker->current_func has type work_func_t (a function pointer). The
extra level of indirection is wrong and only happens to be harmless
today because every supported Linux architecture has
sizeof(work_func_t) == sizeof(work_func_t *):
copy_from_kernel_nofault() reads the correct number of bytes by
accident, and %ps still resolves the printed address because the
stored value is the function address regardless of declared type.

On any future ABI where sizeof(void (*)()) differs from
sizeof(void *), the nofault copy would transfer the wrong number of
bytes and the subsequent %ps would print an incorrect address.

Match the field type so the intent is explicit and the code does not
silently rely on equal pointer sizes.

Fixes: 3d1cb2059d93 ("workqueue: include workqueue info when printing debug dump of a worker task")
Signed-off-by: Breno Leitao 
Signed-off-by: Tejun Heo

sched: Simplify ifdeffery around cpu_smt_mask

2026-05-19T10:17:37+00:00

Now, that cpu_smt_mask is defined as cpumask_of(cpu) for
CONFIG_SCHED_SMT=n, it is possible to get rid of the ifdeffery.

Effectively,
 - This makes sched_smt_present is defined always

 - cpumask_weight(cpumask_of(cpu)) == 1. So sched_smt_present_inc/dec
   will never enable the sched_smt_present. Which is expected.

 - Paths that were compile-time eliminated become runtime guarded
   using static keys.

 - Defines set_idle_cores, test_idle_cores, etc which could likely benefit
   the CONFIG_SCHED_SMT=n systems to use the same optimizations within the
   LLC at wakeups.

 - This will expose sched_smt_present symbol for CONFIG_SCHED_SMT=n.
   Likely not a concern.

 - There is a bloat of code CONFIG_SCHED_SMT=n. (NR_CPUS=2048)
   add/remove: 24/18 grow/shrink: 26/28 up/down: 6396/-3188 (3208)
   Total: Before=30629880, After=30633088, chg +0.01%

 - No code bloat for CONFIG_SCHED_SMT=y, which is expected.

 - Add comments around stop_core_cpuslocked on why ifdefs are not
   removed.

 - This leaves the remaining uses of CONFIG_SCHED_SMT mainly for
   topology building bits which has a policy based decision.

Signed-off-by: Shrikanth Hegde 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Phil Auld 
Reviewed-by: Valentin Schneider 
Acked-by: Tejun Heo 
Tested-by: K Prateek Nayak 
Link: https://patch.msgid.link/20260515172456.542799-3-sshegde@linux.ibm.com

workqueue: drop apply_wqattrs_lock()/unlock() wrappers

2026-05-11T18:58:15+00:00

The apply_wqattrs_lock()/unlock() helpers were introduced by
commit a0111cf6710b ("workqueue: separate out and refactor the locking
of applying attrs") to encapsulate the get_online_cpus() (later
cpus_read_lock()) + mutex_lock(&wq_pool_mutex) acquire pair that was
duplicated across the apply-attrs paths.

Since commit 19af45757383 ("workqueue: Remove cpus_read_lock() from
apply_wqattrs_lock()") removed the cpus_read_lock() (pwq creation and
installation now operate on wq_online_cpumask, so CPU hotplug no longer
needs to be excluded), the wrappers have been one-line forwarders to
mutex_lock(&wq_pool_mutex)/mutex_unlock(&wq_pool_mutex).

They no longer encode any non-trivial locking rule and obscure the fact
that callers just take the existing wq_pool_mutex. This align with the
"unnecessary" helpers that got discussed in [1]

Inline the eight call sites and remove the wrappers. No functional
change.

Link: https://lore.kernel.org/all/afs_44-6ToJJVZTn@gmail.com/ [1]
Signed-off-by: Breno Leitao 
Signed-off-by: Tejun Heo

workqueue: Fix wq->cpu_pwq leak in alloc_and_link_pwqs() WQ_UNBOUND path

2026-05-08T17:59:44+00:00

For WQ_UNBOUND workqueues, alloc_and_link_pwqs() allocates wq->cpu_pwq
via alloc_percpu() and then calls apply_workqueue_attrs_locked(). On
failure it returns the error directly, bypassing the enomem: label
which holds the only free_percpu(wq->cpu_pwq) in this function.

The caller's error path kfree()s wq without touching wq->cpu_pwq,
leaking one percpu pointer table (nr_cpu_ids * sizeof(void *) bytes) per
failed call.

If kmemleak is enabled, we can see:

  unreferenced object (percpu) 0xc0fffa5b121048 (size 8):
    comm "insmod", pid 776, jiffies 4294682844
    backtrace (crc 0):
      pcpu_alloc_noprof+0x665/0xac0
      __alloc_workqueue+0x33f/0xa20
      alloc_workqueue_noprof+0x60/0x100

Route the error through the existing enomem: cleanup and any error
before this one.

Cc: stable@kernel.org
Fixes: 636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu pool_workqueues")
Signed-off-by: Breno Leitao 
Signed-off-by: Tejun Heo

workqueue: Release PENDING in __queue_work() drain/destroy reject path

2026-05-08T17:59:27+00:00

The caller of __queue_work() owns WORK_STRUCT_PENDING, won via
test_and_set_bit() in queue_work_on()/__queue_delayed_work(). The
state machine documented above __queue_work() requires that owner
to either hand the token to a pwq (insert_work() -> set_work_pwq()),
hand it to a timer, or release it via set_work_pool_and_clear_pending().
try_to_grab_pending() relies on this: when it observes
"PENDING && off-queue" it busy-loops, trusting the current owner to
make progress.

The (__WQ_DESTROYING | __WQ_DRAINING) early-return path violates that
contract. It WARN_ONCE()s and bare-returns, leaving work->data with
PENDING set, WORK_STRUCT_PWQ clear, and work->entry empty.

The path is reachable without explicit API abuse: queue_delayed_work()
arms a timer with PENDING set; if drain_workqueue() runs while the
timer is still pending, delayed_work_timer_fn() -> __queue_work() in
softirq context hits the WARN, current is not a wq worker so
is_chained_work() is false, and the work is silently dropped with
PENDING leaked.

Mirror what clear_pending_if_disabled() already does on its analogous
reject path: unpack the off-queue data and call
set_work_pool_and_clear_pending() to release the token before
returning.

I was able to reproduce this by queueing several slow works on
a max_active=1 wq, arm a delayed_work whose timer fires while
drain_workqueue() is blocked, then call cancel_delayed_work_sync().
Without this patch the cancel livelocks at 100% CPU; with it the cancel
returns immediately.

Signed-off-by: Breno Leitao 
Signed-off-by: Tejun Heo

workqueue: Annotate alloc_workqueue_va() with __printf(1, 0)

2026-04-29T19:44:16+00:00

alloc_workqueue_va() forwards its va_list to __alloc_workqueue() which
ultimately feeds vsnprintf(). __alloc_workqueue() already carries
__printf(1, 0); the new wrapper needs the same annotation so format
string checking propagates through the forwarding.

Fixes: 0de4cb473aed ("workqueue: fix devm_alloc_workqueue() va_list misuse")
Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-kbuild-all/202604300347.2LgXyteh-lkp@intel.com/
Signed-off-by: Tejun Heo