<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/tools/workqueue, branch master</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>tools/workqueue: add CACHE_SHARD support to wq_dump.py</title>
<updated>2026-04-01T20:24:18+00:00</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2026-04-01T13:03:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=738390a5321c7d34f468bc69f7232db711210bc0'/>
<id>738390a5321c7d34f468bc69f7232db711210bc0</id>
<content type='text'>
The WQ_AFFN_CACHE_SHARD affinity scope was added to the kernel but
wq_dump.py was not updated to enumerate it. Add the missing constant
lookup and include it in the affinity scopes iteration so that drgn
output shows the CACHE_SHARD pod topology alongside the other scopes.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The WQ_AFFN_CACHE_SHARD affinity scope was added to the kernel but
wq_dump.py was not updated to enumerate it. Add the missing constant
lookup and include it in the affinity scopes iteration so that drgn
output shows the CACHE_SHARD pod topology alongside the other scopes.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tools/workqueue/wq_dump.py: add NODE prefix to all node columns</title>
<updated>2026-03-07T18:13:56+00:00</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2026-03-07T17:47:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=25e1a46cc3b7dccb1e2c86ddb15a1c8b92b564f0'/>
<id>25e1a46cc3b7dccb1e2c86ddb15a1c8b92b564f0</id>
<content type='text'>
Previously only the first node column showed "NODE 0" while subsequent
columns showed just the bare node number, making it unclear what the
numbers refer to.

Add the "NODE" prefix to all node columns and remove the now-unnecessary
first/else branching.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously only the first node column showed "NODE 0" while subsequent
columns showed just the bare node number, making it unclear what the
numbers refer to.

Add the "NODE" prefix to all node columns and remove the now-unnecessary
first/else branching.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tools/workqueue/wq_dump.py: fix column alignment in node_nr/max_active section</title>
<updated>2026-03-07T18:13:53+00:00</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2026-03-07T17:47:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=15e1fab91b3d69773735447cbc151a8a96c4f4a0'/>
<id>15e1fab91b3d69773735447cbc151a8a96c4f4a0</id>
<content type='text'>
On larger machines with many CPUs, max_active values such as 2048
exceed the hardcoded minimum field width of 3 characters, causing the
header and data columns to misalign.

Widen the format specifiers to accommodate 4-digit values and
right-align each nr/max as a single string to keep the output compact.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On larger machines with many CPUs, max_active values such as 2048
exceed the hardcoded minimum field width of 3 characters, causing the
header and data columns to misalign.

Widen the format specifiers to accommodate 4-digit values and
right-align each nr/max as a single string to keep the output compact.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tools/workqueue/wq_dump.py: remove backslash separator from node_nr/max_active header</title>
<updated>2026-03-07T18:13:49+00:00</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2026-03-07T17:47:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=18a1efe0156efd583a85e34d9869f92b02473705'/>
<id>18a1efe0156efd583a85e34d9869f92b02473705</id>
<content type='text'>
Remove the backslash separator between the workqueue name and the
data columns in the "Unbound workqueue -&gt; node_nr/max_active" header
for cleaner output.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the backslash separator between the workqueue name and the
data columns in the "Unbound workqueue -&gt; node_nr/max_active" header
for cleaner output.

Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>workqueue: remove unnecessary import and function in wq_monitor.py</title>
<updated>2024-03-25T20:11:32+00:00</updated>
<author>
<name>Kemeng Shi</name>
<email>shikemeng@huaweicloud.com</email>
</author>
<published>2024-03-21T15:04:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8034b31464c53d6e182f65a293a87b50ddf6dd7e'/>
<id>8034b31464c53d6e182f65a293a87b50ddf6dd7e</id>
<content type='text'>
Remove unnecessary import and function in wq_monitor.py

Signed-off-by: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove unnecessary import and function in wq_monitor.py

Signed-off-by: Kemeng Shi &lt;shikemeng@huaweicloud.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>workqueue: Implement BH workqueues to eventually replace tasklets</title>
<updated>2024-02-04T21:28:06+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-02-04T21:28:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4cb1ef64609f9b0254184b2947824f4b46ccab22'/>
<id>4cb1ef64609f9b0254184b2947824f4b46ccab22</id>
<content type='text'>
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws such as
the execution code accessing the tasklet item after the execution is
complete which can lead to subtle use-after-free in certain usage scenarios
and less-developed flush and cancel mechanisms.

This patch implements BH workqueues which share the same semantics and
features of regular workqueues but execute their work items in the softirq
context. As there is always only one BH execution context per CPU, none of
the concurrency management mechanisms applies and a BH workqueue can be
thought of as a convenience wrapper around softirq.

Except for the inability to sleep while executing and lack of max_active
adjustments, BH workqueues and work items should behave the same as regular
workqueues and work items.

Currently, the execution is hooked to tasklet[_hi]. However, the goal is to
convert all tasklet users over to BH workqueues. Once the conversion is
complete, tasklet can be removed and BH workqueues can directly take over
the tasklet softirqs.

system_bh[_highpri]_wq are added. As queue-wide flushing doesn't exist in
tasklet, all existing tasklet users should be able to use the system BH
workqueues without creating their own workqueues.

v3: - Add missing interrupt.h include.

v2: - Instead of using tasklets, hook directly into its softirq action
      functions - tasklet[_hi]_action(). This is slightly cheaper and closer
      to the eventual code structure we want to arrive at. Suggested by Lai.

    - Lai also pointed out several places which need NULL worker-&gt;task
      handling or can use clarification. Updated.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/CAHk-=wjDW53w4-YcSmgKC5RruiRLHmJ1sXeYdp_ZgVoBw=5byA@mail.gmail.com
Tested-by: Allen Pais &lt;allen.lkml@gmail.com&gt;
Reviewed-by: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The only generic interface to execute asynchronously in the BH context is
tasklet; however, it's marked deprecated and has some design flaws such as
the execution code accessing the tasklet item after the execution is
complete which can lead to subtle use-after-free in certain usage scenarios
and less-developed flush and cancel mechanisms.

This patch implements BH workqueues which share the same semantics and
features of regular workqueues but execute their work items in the softirq
context. As there is always only one BH execution context per CPU, none of
the concurrency management mechanisms applies and a BH workqueue can be
thought of as a convenience wrapper around softirq.

Except for the inability to sleep while executing and lack of max_active
adjustments, BH workqueues and work items should behave the same as regular
workqueues and work items.

Currently, the execution is hooked to tasklet[_hi]. However, the goal is to
convert all tasklet users over to BH workqueues. Once the conversion is
complete, tasklet can be removed and BH workqueues can directly take over
the tasklet softirqs.

system_bh[_highpri]_wq are added. As queue-wide flushing doesn't exist in
tasklet, all existing tasklet users should be able to use the system BH
workqueues without creating their own workqueues.

v3: - Add missing interrupt.h include.

v2: - Instead of using tasklets, hook directly into its softirq action
      functions - tasklet[_hi]_action(). This is slightly cheaper and closer
      to the eventual code structure we want to arrive at. Suggested by Lai.

    - Lai also pointed out several places which need NULL worker-&gt;task
      handling or can use clarification. Updated.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: http://lkml.kernel.org/r/CAHk-=wjDW53w4-YcSmgKC5RruiRLHmJ1sXeYdp_ZgVoBw=5byA@mail.gmail.com
Tested-by: Allen Pais &lt;allen.lkml@gmail.com&gt;
Reviewed-by: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tools/workqueue/wq_dump.py: Add node_nr/max_active dump</title>
<updated>2024-01-29T18:11:25+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-01-29T18:11:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=07daa99b7fd7adfffa22180184e39ec124e73013'/>
<id>07daa99b7fd7adfffa22180184e39ec124e73013</id>
<content type='text'>
Print out per-node nr/max_active numbers to improve visibility into
node_nr_active operations.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Print out per-node nr/max_active numbers to improve visibility into
node_nr_active operations.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tools/workqueue/wq_dump.py: Clean up code and drop duplicate information</title>
<updated>2024-01-25T16:22:03+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2024-01-25T16:21:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a6b48c83d28e21ddcd6a080128bb73f9e3d130ac'/>
<id>a6b48c83d28e21ddcd6a080128bb73f9e3d130ac</id>
<content type='text'>
- Factor out wq_type_str()

- Improve formatting so that it adapts to actual field widths.

- Drop duplicate information from "Workqueue -&gt; rescuer" section. If
  anything, we should add more rescuer-specific info - e.g. the number of
  work items rescued.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Factor out wq_type_str()

- Improve formatting so that it adapts to actual field widths.

- Drop duplicate information from "Workqueue -&gt; rescuer" section. If
  anything, we should add more rescuer-specific info - e.g. the number of
  work items rescued.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tools/workqueue: Add rescuers printing to wq_dump.py</title>
<updated>2024-01-16T18:47:22+00:00</updated>
<author>
<name>Juri Lelli</name>
<email>juri.lelli@redhat.com</email>
</author>
<published>2024-01-16T16:19:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ab5e5b99a949b9f282c605d00557b2c727856485'/>
<id>ab5e5b99a949b9f282c605d00557b2c727856485</id>
<content type='text'>
Retrieving rescuers information (e.g., affinity and name) is quite
useful when debugging workqueues configurations.

Add printing of such information to the existing wq_dump.py script.

Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Retrieving rescuers information (e.g., affinity and name) is quite
useful when debugging workqueues configurations.

Add printing of such information to the existing wq_dump.py script.

Signed-off-by: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>workqueue: Implement non-strict affinity scope for unbound workqueues</title>
<updated>2023-08-08T01:57:25+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2023-08-08T01:57:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8639ecebc9b1796d7074751a350462f5e1c61cd4'/>
<id>8639ecebc9b1796d7074751a350462f5e1c61cd4</id>
<content type='text'>
An unbound workqueue can be served by multiple worker_pools to improve
locality. The segmentation is achieved by grouping CPUs into pods. By
default, the cache boundaries according to cpus_share_cache() define the
CPUs are grouped. Let's a workqueue is allowed to run on all CPUs and the
system has two L3 caches. The workqueue would be mapped to two worker_pools
each serving one L3 cache domains.

While this improves locality, because the pod boundaries are strict, it
limits the total bandwidth a given issuer can consume. For example, let's
say there is a thread pinned to a CPU issuing enough work items to saturate
the whole machine. With the machine segmented into two pods, no matter how
many work items it issues, it can only use half of the CPUs on the system.

While this limitation has existed for a very long time, it wasn't very
pronounced because the affinity grouping used to be always by NUMA nodes.
With cache boundaries as the default and support for even finer grained
scopes (smt and cpu), it is now an a lot more pressing problem.

This patch implements non-strict affinity scope where the pod boundaries
aren't enforced strictly. Going back to the previous example, the workqueue
would still be mapped to two worker_pools; however, the affinity enforcement
would be soft. The workers in both pools would have their cpus_allowed set
to the whole machine thus allowing the scheduler to migrate them anywhere on
the machine. However, whenever an idle worker is woken up, the workqueue
code asks the scheduler to bring back the task within the pod if the worker
is outside. ie. work items start executing within its affinity scope but can
be migrated outside as the scheduler sees fit. This removes the hard cap on
utilization while maintaining the benefits of affinity scopes.

After the earlier -&gt;__pod_cpumask changes, the implementation is pretty
simple. When non-strict which is the new default:

* pool_allowed_cpus() returns @pool-&gt;attrs-&gt;cpumask instead of
  -&gt;__pod_cpumask so that the workers are allowed to run on any CPU that
  the associated workqueues allow.

* If the idle worker task's -&gt;wake_cpu is outside the pod, kick_pool() sets
  the field to a CPU within the pod.

This would be the first use of task_struct-&gt;wake_cpu outside scheduler
proper, so it isn't clear whether this would be acceptable. However, other
methods of migrating tasks are significantly more expensive and are likely
prohibitively so if we want to do this on every work item. This needs
discussion with scheduler folks.

There is also a race window where setting -&gt;wake_cpu wouldn't be effective
as the target task is still on CPU. However, the window is pretty small and
this being a best-effort optimization, it doesn't seem to warrant more
complexity at the moment.

While the non-strict cache affinity scopes seem to be the best option, the
performance picture interacts with the affinity scope and is a bit
complicated to fully discuss in this patch, so the behavior is made easily
selectable through wqattrs and sysfs and the next patch will add
documentation to discuss performance implications.

v2: pool-&gt;attrs-&gt;affn_strict is set to true for per-cpu worker_pools.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
An unbound workqueue can be served by multiple worker_pools to improve
locality. The segmentation is achieved by grouping CPUs into pods. By
default, the cache boundaries according to cpus_share_cache() define the
CPUs are grouped. Let's a workqueue is allowed to run on all CPUs and the
system has two L3 caches. The workqueue would be mapped to two worker_pools
each serving one L3 cache domains.

While this improves locality, because the pod boundaries are strict, it
limits the total bandwidth a given issuer can consume. For example, let's
say there is a thread pinned to a CPU issuing enough work items to saturate
the whole machine. With the machine segmented into two pods, no matter how
many work items it issues, it can only use half of the CPUs on the system.

While this limitation has existed for a very long time, it wasn't very
pronounced because the affinity grouping used to be always by NUMA nodes.
With cache boundaries as the default and support for even finer grained
scopes (smt and cpu), it is now an a lot more pressing problem.

This patch implements non-strict affinity scope where the pod boundaries
aren't enforced strictly. Going back to the previous example, the workqueue
would still be mapped to two worker_pools; however, the affinity enforcement
would be soft. The workers in both pools would have their cpus_allowed set
to the whole machine thus allowing the scheduler to migrate them anywhere on
the machine. However, whenever an idle worker is woken up, the workqueue
code asks the scheduler to bring back the task within the pod if the worker
is outside. ie. work items start executing within its affinity scope but can
be migrated outside as the scheduler sees fit. This removes the hard cap on
utilization while maintaining the benefits of affinity scopes.

After the earlier -&gt;__pod_cpumask changes, the implementation is pretty
simple. When non-strict which is the new default:

* pool_allowed_cpus() returns @pool-&gt;attrs-&gt;cpumask instead of
  -&gt;__pod_cpumask so that the workers are allowed to run on any CPU that
  the associated workqueues allow.

* If the idle worker task's -&gt;wake_cpu is outside the pod, kick_pool() sets
  the field to a CPU within the pod.

This would be the first use of task_struct-&gt;wake_cpu outside scheduler
proper, so it isn't clear whether this would be acceptable. However, other
methods of migrating tasks are significantly more expensive and are likely
prohibitively so if we want to do this on every work item. This needs
discussion with scheduler folks.

There is also a race window where setting -&gt;wake_cpu wouldn't be effective
as the target task is still on CPU. However, the window is pretty small and
this being a best-effort optimization, it doesn't seem to warrant more
complexity at the moment.

While the non-strict cache affinity scopes seem to be the best option, the
performance picture interacts with the affinity scope and is a bit
complicated to fully discuss in this patch, so the behavior is made easily
selectable through wqattrs and sysfs and the next patch will add
documentation to discuss performance implications.

v2: pool-&gt;attrs-&gt;affn_strict is set to true for per-cpu worker_pools.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
