<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/include/linux/sched.h, branch v6.16-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2025-06-02T23:00:26+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-06-02T23:00:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fd1f8473503e5bf897bd3e8efe3545c0352954e6'/>
<id>fd1f8473503e5bf897bd3e8efe3545c0352954e6</id>
<content type='text'>
Pull more MM updates from Andrew Morton:

 - "zram: support algorithm-specific parameters" from Sergey Senozhatsky
   adds infrastructure for passing algorithm-specific parameters into
   zram. A single parameter `winbits' is implemented at this time.

 - "memcg: nmi-safe kmem charging" from Shakeel Butt makes memcg
   charging nmi-safe, which is required by BFP, which can operate in NMI
   context.

 - "Some random fixes and cleanup to shmem" from Kemeng Shi implements
   small fixes and cleanups in the shmem code.

 - "Skip mm selftests instead when kernel features are not present" from
   Zi Yan fixes some issues in the MM selftest code.

 - "mm/damon: build-enable essential DAMON components by default" from
   SeongJae Park reworks DAMON Kconfig to make it easier to enable
   CONFIG_DAMON.

 - "sched/numa: add statistics of numa balance task migration" from Libo
   Chen adds more info into sysfs and procfs files to improve visibility
   into the NUMA balancer's task migration activity.

 - "selftests/mm: cow and gup_longterm cleanups" from Mark Brown
   provides various updates to some of the MM selftests to make them
   play better with the overall containing framework.

* tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (43 commits)
  mm/khugepaged: clean up refcount check using folio_expected_ref_count()
  selftests/mm: fix test result reporting in gup_longterm
  selftests/mm: report unique test names for each cow test
  selftests/mm: add helper for logging test start and results
  selftests/mm: use standard ksft_finished() in cow and gup_longterm
  selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled
  sched/numa: add statistics of numa balance task
  sched/numa: fix task swap by skipping kernel threads
  tools/testing: check correct variable in open_procmap()
  tools/testing/vma: add missing function stub
  mm/gup: update comment explaining why gup_fast() disables IRQs
  selftests/mm: two fixes for the pfnmap test
  mm/khugepaged: fix race with folio split/free using temporary reference
  mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order
  mmu_notifiers: remove leftover stub macros
  selftests/mm: deduplicate test names in madv_populate
  kcov: rust: add flags for KCOV with Rust
  mm: rust: make CONFIG_MMU ifdefs more narrow
  mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
  mm/damon/Kconfig: enable CONFIG_DAMON by default
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull more MM updates from Andrew Morton:

 - "zram: support algorithm-specific parameters" from Sergey Senozhatsky
   adds infrastructure for passing algorithm-specific parameters into
   zram. A single parameter `winbits' is implemented at this time.

 - "memcg: nmi-safe kmem charging" from Shakeel Butt makes memcg
   charging nmi-safe, which is required by BFP, which can operate in NMI
   context.

 - "Some random fixes and cleanup to shmem" from Kemeng Shi implements
   small fixes and cleanups in the shmem code.

 - "Skip mm selftests instead when kernel features are not present" from
   Zi Yan fixes some issues in the MM selftest code.

 - "mm/damon: build-enable essential DAMON components by default" from
   SeongJae Park reworks DAMON Kconfig to make it easier to enable
   CONFIG_DAMON.

 - "sched/numa: add statistics of numa balance task migration" from Libo
   Chen adds more info into sysfs and procfs files to improve visibility
   into the NUMA balancer's task migration activity.

 - "selftests/mm: cow and gup_longterm cleanups" from Mark Brown
   provides various updates to some of the MM selftests to make them
   play better with the overall containing framework.

* tag 'mm-stable-2025-06-01-14-06' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (43 commits)
  mm/khugepaged: clean up refcount check using folio_expected_ref_count()
  selftests/mm: fix test result reporting in gup_longterm
  selftests/mm: report unique test names for each cow test
  selftests/mm: add helper for logging test start and results
  selftests/mm: use standard ksft_finished() in cow and gup_longterm
  selftests/damon/_damon_sysfs: skip testcases if CONFIG_DAMON_SYSFS is disabled
  sched/numa: add statistics of numa balance task
  sched/numa: fix task swap by skipping kernel threads
  tools/testing: check correct variable in open_procmap()
  tools/testing/vma: add missing function stub
  mm/gup: update comment explaining why gup_fast() disables IRQs
  selftests/mm: two fixes for the pfnmap test
  mm/khugepaged: fix race with folio split/free using temporary reference
  mm: add CONFIG_PAGE_BLOCK_ORDER to select page block order
  mmu_notifiers: remove leftover stub macros
  selftests/mm: deduplicate test names in madv_populate
  kcov: rust: add flags for KCOV with Rust
  mm: rust: make CONFIG_MMU ifdefs more narrow
  mmu_gather: move tlb flush for VM_PFNMAP/VM_MIXEDMAP vmas into free_pgtables()
  mm/damon/Kconfig: enable CONFIG_DAMON by default
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/numa: add statistics of numa balance task</title>
<updated>2025-06-01T05:46:15+00:00</updated>
<author>
<name>Chen Yu</name>
<email>yu.c.chen@intel.com</email>
</author>
<published>2025-05-23T12:51:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ad6b26b6a0a79166b53209df2ca1cf8636296382'/>
<id>ad6b26b6a0a79166b53209df2ca1cf8636296382</id>
<content type='text'>
On systems with NUMA balancing enabled, it has been found that tracking
task activities resulting from NUMA balancing is beneficial.  NUMA
balancing employs two mechanisms for task migration: one is to migrate
a task to an idle CPU within its preferred node, and the other is to
swap tasks located on different nodes when they are on each other's
preferred nodes.

The kernel already provides NUMA page migration statistics in
/sys/fs/cgroup/mytest/memory.stat and /proc/{PID}/sched.  However, it
lacks statistics regarding task migration and swapping.  Therefore,
relevant counts for task migration and swapping should be added.

The following two new fields:

numa_task_migrated
numa_task_swapped

will be shown in /sys/fs/cgroup/{GROUP}/memory.stat, /proc/{PID}/sched
and /proc/vmstat.

Introducing both per-task and per-memory cgroup (memcg) NUMA balancing
statistics facilitates a rapid evaluation of the performance and
resource utilization of the target workload.  For instance, users can
first identify the container with high NUMA balancing activity and then
further pinpoint a specific task within that group, and subsequently
adjust the memory policy for that task.  In short, although it is
possible to iterate through /proc/$pid/sched to locate the problematic
task, the introduction of aggregated NUMA balancing activity for tasks
within each memcg can assist users in identifying the task more
efficiently through a divide-and-conquer approach.

As Libo Chen pointed out, the memcg event relies on the text names in
vmstat_text, and /proc/vmstat generates corresponding items based on
vmstat_text.  Thus, the relevant task migration and swapping events
introduced in vmstat_text also need to be populated by
count_vm_numa_event(), otherwise these values are zero in /proc/vmstat.

In theory, task migration and swap events are part of the scheduler's
activities.  The reason for exposing them through the
memory.stat/vmstat interface is that we already have NUMA balancing
statistics in memory.stat/vmstat, and these events are closely related
to each other.  Following Shakeel's suggestion, we describe the
end-to-end flow/story of all these events occurring on a timeline for
future reference:

The goal of NUMA balancing is to co-locate a task and its memory pages
on the same NUMA node.  There are two strategies: migrate the pages to
the task's node, or migrate the task to the node where its pages
reside.

Suppose a task p1 is running on Node 0, but its pages are located on
Node 1.  NUMA page fault statistics for p1 reveal its "page footprint"
across nodes.  If NUMA balancing detects that most of p1's pages are on
Node 1:

1.Page Migration Attempt:
The Numa balance first tries to migrate p1's pages to Node 0.
The numa_page_migrate counter increments.

2.Task Migration Strategies:
After the page migration finishes, Numa balance checks every
1 second to see if p1 can be migrated to Node 1.

Case 2.1: Idle CPU Available

  If Node 1 has an idle CPU, p1 is directly scheduled there.  This
  event is logged as numa_task_migrated.

Case 2.2: No Idle CPU (Task Swap)

  If all CPUs on Node1 are busy, direct migration could cause CPU
  contention or load imbalance.  Instead: The Numa balance selects a
  candidate task p2 on Node 1 that prefers Node 0 (e.g., due to its own
  page footprint).  p1 and p2 are swapped.  This cross-node swap is
  recorded as numa_task_swapped.

Link: https://lkml.kernel.org/r/d00edb12ba0f0de3c5222f61487e65f2ac58f5b1.1748493462.git.yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/7ef90a88602ed536be46eba7152ed0d33bad5790.1748002400.git.yu.c.chen@intel.com
Signed-off-by: Chen Yu &lt;yu.c.chen@intel.com&gt;
Tested-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Tested-by: Madadi Vineeth Reddy &lt;vineethr@linux.ibm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Venkat Rao Bagalkote &lt;venkat88@linux.ibm.com&gt;
Cc: Aubrey Li &lt;aubrey.li@intel.com&gt;
Cc: Ayush Jain &lt;Ayush.jain3@amd.com&gt;
Cc: "Chen, Tim C" &lt;tim.c.chen@intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Libo Chen &lt;libo.chen@oracle.com&gt;
Cc: Mel Gorman &lt;mgorman &lt;mgorman@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On systems with NUMA balancing enabled, it has been found that tracking
task activities resulting from NUMA balancing is beneficial.  NUMA
balancing employs two mechanisms for task migration: one is to migrate
a task to an idle CPU within its preferred node, and the other is to
swap tasks located on different nodes when they are on each other's
preferred nodes.

The kernel already provides NUMA page migration statistics in
/sys/fs/cgroup/mytest/memory.stat and /proc/{PID}/sched.  However, it
lacks statistics regarding task migration and swapping.  Therefore,
relevant counts for task migration and swapping should be added.

The following two new fields:

numa_task_migrated
numa_task_swapped

will be shown in /sys/fs/cgroup/{GROUP}/memory.stat, /proc/{PID}/sched
and /proc/vmstat.

Introducing both per-task and per-memory cgroup (memcg) NUMA balancing
statistics facilitates a rapid evaluation of the performance and
resource utilization of the target workload.  For instance, users can
first identify the container with high NUMA balancing activity and then
further pinpoint a specific task within that group, and subsequently
adjust the memory policy for that task.  In short, although it is
possible to iterate through /proc/$pid/sched to locate the problematic
task, the introduction of aggregated NUMA balancing activity for tasks
within each memcg can assist users in identifying the task more
efficiently through a divide-and-conquer approach.

As Libo Chen pointed out, the memcg event relies on the text names in
vmstat_text, and /proc/vmstat generates corresponding items based on
vmstat_text.  Thus, the relevant task migration and swapping events
introduced in vmstat_text also need to be populated by
count_vm_numa_event(), otherwise these values are zero in /proc/vmstat.

In theory, task migration and swap events are part of the scheduler's
activities.  The reason for exposing them through the
memory.stat/vmstat interface is that we already have NUMA balancing
statistics in memory.stat/vmstat, and these events are closely related
to each other.  Following Shakeel's suggestion, we describe the
end-to-end flow/story of all these events occurring on a timeline for
future reference:

The goal of NUMA balancing is to co-locate a task and its memory pages
on the same NUMA node.  There are two strategies: migrate the pages to
the task's node, or migrate the task to the node where its pages
reside.

Suppose a task p1 is running on Node 0, but its pages are located on
Node 1.  NUMA page fault statistics for p1 reveal its "page footprint"
across nodes.  If NUMA balancing detects that most of p1's pages are on
Node 1:

1.Page Migration Attempt:
The Numa balance first tries to migrate p1's pages to Node 0.
The numa_page_migrate counter increments.

2.Task Migration Strategies:
After the page migration finishes, Numa balance checks every
1 second to see if p1 can be migrated to Node 1.

Case 2.1: Idle CPU Available

  If Node 1 has an idle CPU, p1 is directly scheduled there.  This
  event is logged as numa_task_migrated.

Case 2.2: No Idle CPU (Task Swap)

  If all CPUs on Node1 are busy, direct migration could cause CPU
  contention or load imbalance.  Instead: The Numa balance selects a
  candidate task p2 on Node 1 that prefers Node 0 (e.g., due to its own
  page footprint).  p1 and p2 are swapped.  This cross-node swap is
  recorded as numa_task_swapped.

Link: https://lkml.kernel.org/r/d00edb12ba0f0de3c5222f61487e65f2ac58f5b1.1748493462.git.yu.c.chen@intel.com
Link: https://lkml.kernel.org/r/7ef90a88602ed536be46eba7152ed0d33bad5790.1748002400.git.yu.c.chen@intel.com
Signed-off-by: Chen Yu &lt;yu.c.chen@intel.com&gt;
Tested-by: K Prateek Nayak &lt;kprateek.nayak@amd.com&gt;
Tested-by: Madadi Vineeth Reddy &lt;vineethr@linux.ibm.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Venkat Rao Bagalkote &lt;venkat88@linux.ibm.com&gt;
Cc: Aubrey Li &lt;aubrey.li@intel.com&gt;
Cc: Ayush Jain &lt;Ayush.jain3@amd.com&gt;
Cc: "Chen, Tim C" &lt;tim.c.chen@intel.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Jonathan Corbet &lt;corbet@lwn.net&gt;
Cc: Libo Chen &lt;libo.chen@oracle.com&gt;
Cc: Mel Gorman &lt;mgorman &lt;mgorman@suse.de&gt;
Cc: Michal Hocko &lt;mhocko@kernel.org&gt;
Cc: Michal Koutný &lt;mkoutny@suse.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2025-06-01T02:12:53+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-06-01T02:12:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7d4e49a77d9930c69751b9192448fda6ff9100f1'/>
<id>7d4e49a77d9930c69751b9192448fda6ff9100f1</id>
<content type='text'>
Pull non-MM updates from Andrew Morton:

 - "hung_task: extend blocking task stacktrace dump to semaphore" from
   Lance Yang enhances the hung task detector.

   The detector presently dumps the blocking tasks's stack when it is
   blocked on a mutex. Lance's series extends this to semaphores

 - "nilfs2: improve sanity checks in dirty state propagation" from
   Wentao Liang addresses a couple of minor flaws in nilfs2

 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn
   fixes a couple of issues in the gdb scripts

 - "Support kdump with LUKS encryption by reusing LUKS volume keys" from
   Coiby Xu addresses a usability problem with kdump.

   When the dump device is LUKS-encrypted, the kdump kernel may not have
   the keys to the encrypted filesystem. A full writeup of this is in
   the series [0/N] cover letter

 - "sysfs: add counters for lockups and stalls" from Max Kellermann adds
   /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and
   /sys/kernel/rcu_stall_count

 - "fork: Page operation cleanups in the fork code" from Pasha Tatashin
   implements a number of code cleanups in fork.c

 - "scripts/gdb/symbols: determine KASLR offset on s390 during early
   boot" from Ilya Leoshkevich fixes some s390 issues in the gdb
   scripts

* tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits)
  llist: make llist_add_batch() a static inline
  delayacct: remove redundant code and adjust indentation
  squashfs: add optional full compressed block caching
  crash_dump, nvme: select CONFIGFS_FS as built-in
  scripts/gdb/symbols: determine KASLR offset on s390 during early boot
  scripts/gdb/symbols: factor out pagination_off()
  scripts/gdb/symbols: factor out get_vmlinux()
  kernel/panic.c: format kernel-doc comments
  mailmap: update and consolidate Casey Connolly's name and email
  nilfs2: remove wbc-&gt;for_reclaim handling
  fork: define a local GFP_VMAP_STACK
  fork: check charging success before zeroing stack
  fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code
  fork: clean-up ifdef logic around stack allocation
  kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count
  kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count
  x86/crash: make the page that stores the dm crypt keys inaccessible
  x86/crash: pass dm crypt keys to kdump kernel
  Revert "x86/mm: Remove unused __set_memory_prot()"
  crash_dump: retrieve dm crypt keys in kdump kernel
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull non-MM updates from Andrew Morton:

 - "hung_task: extend blocking task stacktrace dump to semaphore" from
   Lance Yang enhances the hung task detector.

   The detector presently dumps the blocking tasks's stack when it is
   blocked on a mutex. Lance's series extends this to semaphores

 - "nilfs2: improve sanity checks in dirty state propagation" from
   Wentao Liang addresses a couple of minor flaws in nilfs2

 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn
   fixes a couple of issues in the gdb scripts

 - "Support kdump with LUKS encryption by reusing LUKS volume keys" from
   Coiby Xu addresses a usability problem with kdump.

   When the dump device is LUKS-encrypted, the kdump kernel may not have
   the keys to the encrypted filesystem. A full writeup of this is in
   the series [0/N] cover letter

 - "sysfs: add counters for lockups and stalls" from Max Kellermann adds
   /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and
   /sys/kernel/rcu_stall_count

 - "fork: Page operation cleanups in the fork code" from Pasha Tatashin
   implements a number of code cleanups in fork.c

 - "scripts/gdb/symbols: determine KASLR offset on s390 during early
   boot" from Ilya Leoshkevich fixes some s390 issues in the gdb
   scripts

* tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits)
  llist: make llist_add_batch() a static inline
  delayacct: remove redundant code and adjust indentation
  squashfs: add optional full compressed block caching
  crash_dump, nvme: select CONFIGFS_FS as built-in
  scripts/gdb/symbols: determine KASLR offset on s390 during early boot
  scripts/gdb/symbols: factor out pagination_off()
  scripts/gdb/symbols: factor out get_vmlinux()
  kernel/panic.c: format kernel-doc comments
  mailmap: update and consolidate Casey Connolly's name and email
  nilfs2: remove wbc-&gt;for_reclaim handling
  fork: define a local GFP_VMAP_STACK
  fork: check charging success before zeroing stack
  fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code
  fork: clean-up ifdef logic around stack allocation
  kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count
  kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count
  x86/crash: make the page that stores the dm crypt keys inaccessible
  x86/crash: pass dm crypt keys to kdump kernel
  Revert "x86/mm: Remove unused __set_memory_prot()"
  crash_dump: retrieve dm crypt keys in kdump kernel
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next</title>
<updated>2025-05-28T22:24:36+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-05-28T22:24:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1b98f357dadd6ea613a435fbaef1a5dd7b35fd21'/>
<id>1b98f357dadd6ea613a435fbaef1a5dd7b35fd21</id>
<content type='text'>
Pull networking updates from Paolo Abeni:
 "Core:

   - Implement the Device Memory TCP transmit path, allowing zero-copy
     data transmission on top of TCP from e.g. GPU memory to the wire.

   - Move all the IPv6 routing tables management outside the RTNL scope,
     under its own lock and RCU. The route control path is now 3x times
     faster.

   - Convert queue related netlink ops to instance lock, reducing again
     the scope of the RTNL lock. This improves the control plane
     scalability.

   - Refactor the software crc32c implementation, removing unneeded
     abstraction layers and improving significantly the related
     micro-benchmarks.

   - Optimize the GRO engine for UDP-tunneled traffic, for a 10%
     performance improvement in related stream tests.

   - Cover more per-CPU storage with local nested BH locking; this is a
     prep work to remove the current per-CPU lock in local_bh_disable()
     on PREMPT_RT.

   - Introduce and use nlmsg_payload helper, combining buffer bounds
     verification with accessing payload carried by netlink messages.

  Netfilter:

   - Rewrite the procfs conntrack table implementation, improving
     considerably the dump performance. A lot of user-space tools still
     use this interface.

   - Implement support for wildcard netdevice in netdev basechain and
     flowtables.

   - Integrate conntrack information into nft trace infrastructure.

   - Export set count and backend name to userspace, for better
     introspection.

  BPF:

   - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops
     programs and can be controlled in similar way to traditional qdiscs
     using the "tc qdisc" command.

   - Refactor the UDP socket iterator, addressing long standing issues
     WRT duplicate hits or missed sockets.

  Protocols:

   - Improve TCP receive buffer auto-tuning and increase the default
     upper bound for the receive buffer; overall this improves the
     single flow maximum thoughput on 200Gbs link by over 60%.

   - Add AFS GSSAPI security class to AF_RXRPC; it provides transport
     security for connections to the AFS fileserver and VL server.

   - Improve TCP multipath routing, so that the sources address always
     matches the nexthop device.

   - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS,
     and thus preventing DoS caused by passing around problematic FDs.

   - Retire DCCP socket. DCCP only receives updates for bugs, and major
     distros disable it by default. Its removal allows for better
     organisation of TCP fields to reduce the number of cache lines hit
     in the fast path.

   - Extend TCP drop-reason support to cover PAWS checks.

  Driver API:

   - Reorganize PTP ioctl flag support to require an explicit opt-in for
     the drivers, avoiding the problem of drivers not rejecting new
     unsupported flags.

   - Converted several device drivers to timestamping APIs.

   - Introduce per-PHY ethtool dump helpers, improving the support for
     dump operations targeting PHYs.

  Tests and tooling:

   - Add support for classic netlink in user space C codegen, so that
     ynl-c can now read, create and modify links, routes addresses and
     qdisc layer configuration.

   - Add ynl sub-types for binary attributes, allowing ynl-c to output
     known struct instead of raw binary data, clarifying the classic
     netlink output.

   - Extend MPTCP selftests to improve the code-coverage.

   - Add tests for XDP tail adjustment in AF_XDP.

  New hardware / drivers:

   - OpenVPN virtual driver: offload OpenVPN data channels processing to
     the kernel-space, increasing the data transfer throughput WRT the
     user-space implementation.

   - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC.

   - Broadcom asp-v3.0 ethernet driver.

   - AMD Renoir ethernet device.

   - ReakTek MT9888 2.5G ethernet PHY driver.

   - Aeonsemi 10G C45 PHYs driver.

  Drivers:

   - Ethernet high-speed NICs:
       - nVidia/Mellanox (mlx5):
           - refactor the steering table handling to significantly
             reduce the amount of memory used
           - add support for complex matches in H/W flow steering
           - improve flow streeing error handling
           - convert to netdev instance locking
       - Intel (100G, ice, igb, ixgbe, idpf):
           - ice: add switchdev support for LLDP traffic over VF
           - ixgbe: add firmware manipulation and regions devlink support
           - igb: introduce support for frame transmission premption
           - igb: adds persistent NAPI configuration
           - idpf: introduce RDMA support
           - idpf: add initial PTP support
       - Meta (fbnic):
           - extend hardware stats coverage
           - add devlink dev flash support
       - Broadcom (bnxt):
           - add support for RX-side device memory TCP
       - Wangxun (txgbe):
           - implement support for udp tunnel offload
           - complete PTP and SRIOV support for AML 25G/10G devices

   - Ethernet NICs embedded and virtual:
       - Google (gve):
           - add device memory TCP TX support
       - Amazon (ena):
           - support persistent per-NAPI config
       - Airoha:
           - add H/W support for L2 traffic offload
           - add per flow stats for flow offloading
       - RealTek (rtl8211): add support for WoL magic packet
       - Synopsys (stmmac):
           - dwmac-socfpga 1000BaseX support
           - add Loongson-2K3000 support
           - introduce support for hardware-accelerated VLAN stripping
       - Broadcom (bcmgenet):
           - expose more H/W stats
       - Freescale (enetc, dpaa2-eth):
           - enetc: add MAC filter, VLAN filter RSS and loopback support
           - dpaa2-eth: convert to H/W timestamping APIs
       - vxlan: convert FDB table to rhashtable, for better scalabilty
       - veth: apply qdisc backpressure on full ring to reduce TX drops

   - Ethernet switches:
       - Microchip (kzZ88x3): add ETS scheduler support

   - Ethernet PHYs:
       - RealTek (rtl8211):
           - add support for WoL magic packet
           - add support for PHY LEDs

   - CAN:
       - Adds RZ/G3E CANFD support to the rcar_canfd driver.
       - Preparatory work for CAN-XL support.
       - Add self-tests framework with support for CAN physical interfaces.

   - WiFi:
       - mac80211:
           - scan improvements with multi-link operation (MLO)
       - Qualcomm (ath12k):
           - enable AHB support for IPQ5332
           - add monitor interface support to QCN9274
           - add multi-link operation support to WCN7850
           - add 802.11d scan offload support to WCN7850
           - monitor mode for WCN7850, better 6 GHz regulatory
       - Qualcomm (ath11k):
           - restore hibernation support
       - MediaTek (mt76):
           - WiFi-7 improvements
           - implement support for mt7990
       - Intel (iwlwifi):
           - enhanced multi-link single-radio (EMLSR) support on 5 GHz links
           - rework device configuration
       - RealTek (rtw88):
           - improve throughput for RTL8814AU
       - RealTek (rtw89):
           - add multi-link operation support
           - STA/P2P concurrency improvements
           - support different SAR configs by antenna

   - Bluetooth:
       - introduce HCI Driver protocol
       - btintel_pcie: do not generate coredump for diagnostic events
       - btusb: add HCI Drv commands for configuring altsetting
       - btusb: add RTL8851BE device 0x0bda:0xb850
       - btusb: add new VID/PID 13d3/3584 for MT7922
       - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925
       - btnxpuart: implement host-wakeup feature"

* tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits)
  selftests/bpf: Fix bpf selftest build warning
  selftests: netfilter: Fix skip of wildcard interface test
  net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames
  net: openvswitch: Fix the dead loop of MPLS parse
  calipso: Don't call calipso functions for AF_INET sk.
  selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem
  net_sched: hfsc: Address reentrant enqueue adding class to eltree twice
  octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback
  octeontx2-pf: QOS: Perform cache sync on send queue teardown
  net: mana: Add support for Multi Vports on Bare metal
  net: devmem: ncdevmem: remove unused variable
  net: devmem: ksft: upgrade rx test to send 1K data
  net: devmem: ksft: add 5 tuple FS support
  net: devmem: ksft: add exit_wait to make rx test pass
  net: devmem: ksft: add ipv4 support
  net: devmem: preserve sockc_err
  page_pool: fix ugly page_pool formatting
  net: devmem: move list_add to net_devmem_bind_dmabuf.
  selftests: netfilter: nft_queue.sh: include file transfer duration in log message
  net: phy: mscc: Fix memory leak when using one step timestamping
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull networking updates from Paolo Abeni:
 "Core:

   - Implement the Device Memory TCP transmit path, allowing zero-copy
     data transmission on top of TCP from e.g. GPU memory to the wire.

   - Move all the IPv6 routing tables management outside the RTNL scope,
     under its own lock and RCU. The route control path is now 3x times
     faster.

   - Convert queue related netlink ops to instance lock, reducing again
     the scope of the RTNL lock. This improves the control plane
     scalability.

   - Refactor the software crc32c implementation, removing unneeded
     abstraction layers and improving significantly the related
     micro-benchmarks.

   - Optimize the GRO engine for UDP-tunneled traffic, for a 10%
     performance improvement in related stream tests.

   - Cover more per-CPU storage with local nested BH locking; this is a
     prep work to remove the current per-CPU lock in local_bh_disable()
     on PREMPT_RT.

   - Introduce and use nlmsg_payload helper, combining buffer bounds
     verification with accessing payload carried by netlink messages.

  Netfilter:

   - Rewrite the procfs conntrack table implementation, improving
     considerably the dump performance. A lot of user-space tools still
     use this interface.

   - Implement support for wildcard netdevice in netdev basechain and
     flowtables.

   - Integrate conntrack information into nft trace infrastructure.

   - Export set count and backend name to userspace, for better
     introspection.

  BPF:

   - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops
     programs and can be controlled in similar way to traditional qdiscs
     using the "tc qdisc" command.

   - Refactor the UDP socket iterator, addressing long standing issues
     WRT duplicate hits or missed sockets.

  Protocols:

   - Improve TCP receive buffer auto-tuning and increase the default
     upper bound for the receive buffer; overall this improves the
     single flow maximum thoughput on 200Gbs link by over 60%.

   - Add AFS GSSAPI security class to AF_RXRPC; it provides transport
     security for connections to the AFS fileserver and VL server.

   - Improve TCP multipath routing, so that the sources address always
     matches the nexthop device.

   - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS,
     and thus preventing DoS caused by passing around problematic FDs.

   - Retire DCCP socket. DCCP only receives updates for bugs, and major
     distros disable it by default. Its removal allows for better
     organisation of TCP fields to reduce the number of cache lines hit
     in the fast path.

   - Extend TCP drop-reason support to cover PAWS checks.

  Driver API:

   - Reorganize PTP ioctl flag support to require an explicit opt-in for
     the drivers, avoiding the problem of drivers not rejecting new
     unsupported flags.

   - Converted several device drivers to timestamping APIs.

   - Introduce per-PHY ethtool dump helpers, improving the support for
     dump operations targeting PHYs.

  Tests and tooling:

   - Add support for classic netlink in user space C codegen, so that
     ynl-c can now read, create and modify links, routes addresses and
     qdisc layer configuration.

   - Add ynl sub-types for binary attributes, allowing ynl-c to output
     known struct instead of raw binary data, clarifying the classic
     netlink output.

   - Extend MPTCP selftests to improve the code-coverage.

   - Add tests for XDP tail adjustment in AF_XDP.

  New hardware / drivers:

   - OpenVPN virtual driver: offload OpenVPN data channels processing to
     the kernel-space, increasing the data transfer throughput WRT the
     user-space implementation.

   - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC.

   - Broadcom asp-v3.0 ethernet driver.

   - AMD Renoir ethernet device.

   - ReakTek MT9888 2.5G ethernet PHY driver.

   - Aeonsemi 10G C45 PHYs driver.

  Drivers:

   - Ethernet high-speed NICs:
       - nVidia/Mellanox (mlx5):
           - refactor the steering table handling to significantly
             reduce the amount of memory used
           - add support for complex matches in H/W flow steering
           - improve flow streeing error handling
           - convert to netdev instance locking
       - Intel (100G, ice, igb, ixgbe, idpf):
           - ice: add switchdev support for LLDP traffic over VF
           - ixgbe: add firmware manipulation and regions devlink support
           - igb: introduce support for frame transmission premption
           - igb: adds persistent NAPI configuration
           - idpf: introduce RDMA support
           - idpf: add initial PTP support
       - Meta (fbnic):
           - extend hardware stats coverage
           - add devlink dev flash support
       - Broadcom (bnxt):
           - add support for RX-side device memory TCP
       - Wangxun (txgbe):
           - implement support for udp tunnel offload
           - complete PTP and SRIOV support for AML 25G/10G devices

   - Ethernet NICs embedded and virtual:
       - Google (gve):
           - add device memory TCP TX support
       - Amazon (ena):
           - support persistent per-NAPI config
       - Airoha:
           - add H/W support for L2 traffic offload
           - add per flow stats for flow offloading
       - RealTek (rtl8211): add support for WoL magic packet
       - Synopsys (stmmac):
           - dwmac-socfpga 1000BaseX support
           - add Loongson-2K3000 support
           - introduce support for hardware-accelerated VLAN stripping
       - Broadcom (bcmgenet):
           - expose more H/W stats
       - Freescale (enetc, dpaa2-eth):
           - enetc: add MAC filter, VLAN filter RSS and loopback support
           - dpaa2-eth: convert to H/W timestamping APIs
       - vxlan: convert FDB table to rhashtable, for better scalabilty
       - veth: apply qdisc backpressure on full ring to reduce TX drops

   - Ethernet switches:
       - Microchip (kzZ88x3): add ETS scheduler support

   - Ethernet PHYs:
       - RealTek (rtl8211):
           - add support for WoL magic packet
           - add support for PHY LEDs

   - CAN:
       - Adds RZ/G3E CANFD support to the rcar_canfd driver.
       - Preparatory work for CAN-XL support.
       - Add self-tests framework with support for CAN physical interfaces.

   - WiFi:
       - mac80211:
           - scan improvements with multi-link operation (MLO)
       - Qualcomm (ath12k):
           - enable AHB support for IPQ5332
           - add monitor interface support to QCN9274
           - add multi-link operation support to WCN7850
           - add 802.11d scan offload support to WCN7850
           - monitor mode for WCN7850, better 6 GHz regulatory
       - Qualcomm (ath11k):
           - restore hibernation support
       - MediaTek (mt76):
           - WiFi-7 improvements
           - implement support for mt7990
       - Intel (iwlwifi):
           - enhanced multi-link single-radio (EMLSR) support on 5 GHz links
           - rework device configuration
       - RealTek (rtw88):
           - improve throughput for RTL8814AU
       - RealTek (rtw89):
           - add multi-link operation support
           - STA/P2P concurrency improvements
           - support different SAR configs by antenna

   - Bluetooth:
       - introduce HCI Driver protocol
       - btintel_pcie: do not generate coredump for diagnostic events
       - btusb: add HCI Drv commands for configuring altsetting
       - btusb: add RTL8851BE device 0x0bda:0xb850
       - btusb: add new VID/PID 13d3/3584 for MT7922
       - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925
       - btnxpuart: implement host-wakeup feature"

* tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits)
  selftests/bpf: Fix bpf selftest build warning
  selftests: netfilter: Fix skip of wildcard interface test
  net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames
  net: openvswitch: Fix the dead loop of MPLS parse
  calipso: Don't call calipso functions for AF_INET sk.
  selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem
  net_sched: hfsc: Address reentrant enqueue adding class to eltree twice
  octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback
  octeontx2-pf: QOS: Perform cache sync on send queue teardown
  net: mana: Add support for Multi Vports on Bare metal
  net: devmem: ncdevmem: remove unused variable
  net: devmem: ksft: upgrade rx test to send 1K data
  net: devmem: ksft: add 5 tuple FS support
  net: devmem: ksft: add exit_wait to make rx test pass
  net: devmem: ksft: add ipv4 support
  net: devmem: preserve sockc_err
  page_pool: fix ugly page_pool formatting
  net: devmem: move list_add to net_devmem_bind_dmabuf.
  selftests: netfilter: nft_queue.sh: include file transfer duration in log message
  net: phy: mscc: Fix memory leak when using one step timestamping
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2025-05-26T23:04:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-05-26T23:04:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=785cdec46e9227f9433884ed3b436471e944007c'/>
<id>785cdec46e9227f9433884ed3b436471e944007c</id>
<content type='text'>
Pull core x86 updates from Ingo Molnar:
 "Boot code changes:

   - A large series of changes to reorganize the x86 boot code into a
     better isolated and easier to maintain base of PIC early startup
     code in arch/x86/boot/startup/, by Ard Biesheuvel.

     Motivation &amp; background:

  	| Since commit
  	|
  	|    c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C")
  	|
  	| dated Jun 6 2017, we have been using C code on the boot path in a way
  	| that is not supported by the toolchain, i.e., to execute non-PIC C
  	| code from a mapping of memory that is different from the one provided
  	| to the linker. It should have been obvious at the time that this was a
  	| bad idea, given the need to sprinkle fixup_pointer() calls left and
  	| right to manipulate global variables (including non-pointer variables)
  	| without crashing.
  	|
  	| This C startup code has been expanding, and in particular, the SEV-SNP
  	| startup code has been expanding over the past couple of years, and
  	| grown many of these warts, where the C code needs to use special
  	| annotations or helpers to access global objects.

     This tree includes the first phase of this work-in-progress x86
     boot code reorganization.

  Scalability enhancements and micro-optimizations:

   - Improve code-patching scalability (Eric Dumazet)

   - Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper)

  CPU features enumeration updates:

   - Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S.
     Darwish)

   - Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish,
     Thomas Gleixner)

   - Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish)

  Memory management changes:

   - Allow temporary MMs when IRQs are on (Andy Lutomirski)

   - Opt-in to IRQs-off activate_mm() (Andy Lutomirski)

   - Simplify choose_new_asid() and generate better code (Borislav
     Petkov)

   - Simplify 32-bit PAE page table handling (Dave Hansen)

   - Always use dynamic memory layout (Kirill A. Shutemov)

   - Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov)

   - Make 5-level paging support unconditional (Kirill A. Shutemov)

   - Stop prefetching current-&gt;mm-&gt;mmap_lock on page faults (Mateusz
     Guzik)

   - Predict valid_user_address() returning true (Mateusz Guzik)

   - Consolidate initmem_init() (Mike Rapoport)

  FPU support and vector computing:

   - Enable Intel APX support (Chang S. Bae)

   - Reorgnize and clean up the xstate code (Chang S. Bae)

   - Make task_struct::thread constant size (Ingo Molnar)

   - Restore fpu_thread_struct_whitelist() to fix
     CONFIG_HARDENED_USERCOPY=y (Kees Cook)

   - Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg
     Nesterov)

   - Always preserve non-user xfeatures/flags in __state_perm (Sean
     Christopherson)

  Microcode loader changes:

   - Help users notice when running old Intel microcode (Dave Hansen)

   - AMD: Do not return error when microcode update is not necessary
     (Annie Li)

   - AMD: Clean the cache if update did not load microcode (Boris
     Ostrovsky)

  Code patching (alternatives) changes:

   - Simplify, reorganize and clean up the x86 text-patching code (Ingo
     Molnar)

   - Make smp_text_poke_batch_process() subsume
     smp_text_poke_batch_finish() (Nikolay Borisov)

   - Refactor the {,un}use_temporary_mm() code (Peter Zijlstra)

  Debugging support:

   - Add early IDT and GDT loading to debug relocate_kernel() bugs
     (David Woodhouse)

   - Print the reason for the last reset on modern AMD CPUs (Yazen
     Ghannam)

   - Add AMD Zen debugging document (Mario Limonciello)

   - Fix opcode map (!REX2) superscript tags (Masami Hiramatsu)

   - Stop decoding i64 instructions in x86-64 mode at opcode (Masami
     Hiramatsu)

  CPU bugs and bug mitigations:

   - Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov)

   - Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov)

   - Restructure and harmonize the various CPU bug mitigation methods
     (David Kaplan)

   - Fix spectre_v2 mitigation default on Intel (Pawan Gupta)

  MSR API:

   - Large MSR code and API cleanup (Xin Li)

   - In-kernel MSR API type cleanups and renames (Ingo Molnar)

  PKEYS:

   - Simplify PKRU update in signal frame (Chang S. Bae)

  NMI handling code:

   - Clean up, refactor and simplify the NMI handling code (Sohil Mehta)

   - Improve NMI duration console printouts (Sohil Mehta)

  Paravirt guests interface:

   - Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov)

  SEV support:

   - Share the sev_secrets_pa value again (Tom Lendacky)

  x86 platform changes:

   - Introduce the &lt;asm/amd/&gt; header namespace (Ingo Molnar)

   - i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to
     &lt;asm/amd/fch.h&gt; (Mario Limonciello)

  Fixes and cleanups:

   - x86 assembly code cleanups and fixes (Uros Bizjak)

   - Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy
     Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav
     Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David
     Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf,
     Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan
     Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank
     Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)"

* tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits)
  x86/bugs: Fix spectre_v2 mitigation default on Intel
  x86/bugs: Restructure ITS mitigation
  x86/xen/msr: Fix uninitialized variable 'err'
  x86/msr: Remove a superfluous inclusion of &lt;asm/asm.h&gt;
  x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only
  x86/mm/64: Make 5-level paging support unconditional
  x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model
  x86/mm/64: Always use dynamic memory layout
  x86/bugs: Fix indentation due to ITS merge
  x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor()
  x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter
  x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter
  x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2()
  x86/cpuid: Rename have_cpuid_p() to cpuid_feature()
  x86/cpuid: Set &lt;asm/cpuid/api.h&gt; as the main CPUID header
  x86/cpuid: Move CPUID(0x2) APIs into &lt;cpuid/api.h&gt;
  x86/msr: Add rdmsrl_on_cpu() compatibility wrapper
  x86/mm: Fix kernel-doc descriptions of various pgtable methods
  x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too
  x86/boot: Defer initialization of VM space related global variables
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull core x86 updates from Ingo Molnar:
 "Boot code changes:

   - A large series of changes to reorganize the x86 boot code into a
     better isolated and easier to maintain base of PIC early startup
     code in arch/x86/boot/startup/, by Ard Biesheuvel.

     Motivation &amp; background:

  	| Since commit
  	|
  	|    c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C")
  	|
  	| dated Jun 6 2017, we have been using C code on the boot path in a way
  	| that is not supported by the toolchain, i.e., to execute non-PIC C
  	| code from a mapping of memory that is different from the one provided
  	| to the linker. It should have been obvious at the time that this was a
  	| bad idea, given the need to sprinkle fixup_pointer() calls left and
  	| right to manipulate global variables (including non-pointer variables)
  	| without crashing.
  	|
  	| This C startup code has been expanding, and in particular, the SEV-SNP
  	| startup code has been expanding over the past couple of years, and
  	| grown many of these warts, where the C code needs to use special
  	| annotations or helpers to access global objects.

     This tree includes the first phase of this work-in-progress x86
     boot code reorganization.

  Scalability enhancements and micro-optimizations:

   - Improve code-patching scalability (Eric Dumazet)

   - Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper)

  CPU features enumeration updates:

   - Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S.
     Darwish)

   - Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish,
     Thomas Gleixner)

   - Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish)

  Memory management changes:

   - Allow temporary MMs when IRQs are on (Andy Lutomirski)

   - Opt-in to IRQs-off activate_mm() (Andy Lutomirski)

   - Simplify choose_new_asid() and generate better code (Borislav
     Petkov)

   - Simplify 32-bit PAE page table handling (Dave Hansen)

   - Always use dynamic memory layout (Kirill A. Shutemov)

   - Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov)

   - Make 5-level paging support unconditional (Kirill A. Shutemov)

   - Stop prefetching current-&gt;mm-&gt;mmap_lock on page faults (Mateusz
     Guzik)

   - Predict valid_user_address() returning true (Mateusz Guzik)

   - Consolidate initmem_init() (Mike Rapoport)

  FPU support and vector computing:

   - Enable Intel APX support (Chang S. Bae)

   - Reorgnize and clean up the xstate code (Chang S. Bae)

   - Make task_struct::thread constant size (Ingo Molnar)

   - Restore fpu_thread_struct_whitelist() to fix
     CONFIG_HARDENED_USERCOPY=y (Kees Cook)

   - Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg
     Nesterov)

   - Always preserve non-user xfeatures/flags in __state_perm (Sean
     Christopherson)

  Microcode loader changes:

   - Help users notice when running old Intel microcode (Dave Hansen)

   - AMD: Do not return error when microcode update is not necessary
     (Annie Li)

   - AMD: Clean the cache if update did not load microcode (Boris
     Ostrovsky)

  Code patching (alternatives) changes:

   - Simplify, reorganize and clean up the x86 text-patching code (Ingo
     Molnar)

   - Make smp_text_poke_batch_process() subsume
     smp_text_poke_batch_finish() (Nikolay Borisov)

   - Refactor the {,un}use_temporary_mm() code (Peter Zijlstra)

  Debugging support:

   - Add early IDT and GDT loading to debug relocate_kernel() bugs
     (David Woodhouse)

   - Print the reason for the last reset on modern AMD CPUs (Yazen
     Ghannam)

   - Add AMD Zen debugging document (Mario Limonciello)

   - Fix opcode map (!REX2) superscript tags (Masami Hiramatsu)

   - Stop decoding i64 instructions in x86-64 mode at opcode (Masami
     Hiramatsu)

  CPU bugs and bug mitigations:

   - Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov)

   - Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov)

   - Restructure and harmonize the various CPU bug mitigation methods
     (David Kaplan)

   - Fix spectre_v2 mitigation default on Intel (Pawan Gupta)

  MSR API:

   - Large MSR code and API cleanup (Xin Li)

   - In-kernel MSR API type cleanups and renames (Ingo Molnar)

  PKEYS:

   - Simplify PKRU update in signal frame (Chang S. Bae)

  NMI handling code:

   - Clean up, refactor and simplify the NMI handling code (Sohil Mehta)

   - Improve NMI duration console printouts (Sohil Mehta)

  Paravirt guests interface:

   - Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov)

  SEV support:

   - Share the sev_secrets_pa value again (Tom Lendacky)

  x86 platform changes:

   - Introduce the &lt;asm/amd/&gt; header namespace (Ingo Molnar)

   - i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to
     &lt;asm/amd/fch.h&gt; (Mario Limonciello)

  Fixes and cleanups:

   - x86 assembly code cleanups and fixes (Uros Bizjak)

   - Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy
     Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav
     Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David
     Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf,
     Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan
     Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank
     Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)"

* tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits)
  x86/bugs: Fix spectre_v2 mitigation default on Intel
  x86/bugs: Restructure ITS mitigation
  x86/xen/msr: Fix uninitialized variable 'err'
  x86/msr: Remove a superfluous inclusion of &lt;asm/asm.h&gt;
  x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only
  x86/mm/64: Make 5-level paging support unconditional
  x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model
  x86/mm/64: Always use dynamic memory layout
  x86/bugs: Fix indentation due to ITS merge
  x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor()
  x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter
  x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter
  x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2()
  x86/cpuid: Rename have_cpuid_p() to cpuid_feature()
  x86/cpuid: Set &lt;asm/cpuid/api.h&gt; as the main CPUID header
  x86/cpuid: Move CPUID(0x2) APIs into &lt;cpuid/api.h&gt;
  x86/msr: Add rdmsrl_on_cpu() compatibility wrapper
  x86/mm: Fix kernel-doc descriptions of various pgtable methods
  x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too
  x86/boot: Defer initialization of VM space related global variables
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nf_dup{4, 6}: Move duplication check to task_struct</title>
<updated>2025-05-23T11:57:12+00:00</updated>
<author>
<name>Sebastian Andrzej Siewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-05-12T10:28:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a1f1acb9c5db9b385c9b3eb1f27f897c06df49ae'/>
<id>a1f1acb9c5db9b385c9b3eb1f27f897c06df49ae</id>
<content type='text'>
nf_skb_duplicated is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.

Due to the recursion involved, the simplest change is to make it a
per-task variable.

Move the per-CPU variable nf_skb_duplicated to task_struct and name it
in_nf_duplicate. Add it to the existing bitfield so it doesn't use
additional memory.

Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Cc: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Ben Segall &lt;bsegall@google.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Valentin Schneider &lt;vschneid@redhat.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
nf_skb_duplicated is a per-CPU variable and relies on disabled BH for its
locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT
this data structure requires explicit locking.

Due to the recursion involved, the simplest change is to make it a
per-task variable.

Move the per-CPU variable nf_skb_duplicated to task_struct and name it
in_nf_duplicate. Add it to the existing bitfield so it doesn't use
additional memory.

Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Juri Lelli &lt;juri.lelli@redhat.com&gt;
Cc: Vincent Guittot &lt;vincent.guittot@linaro.org&gt;
Cc: Dietmar Eggemann &lt;dietmar.eggemann@arm.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Ben Segall &lt;bsegall@google.com&gt;
Cc: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Valentin Schneider &lt;vschneid@redhat.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched,livepatch: Untangle cond_resched() and live-patching</title>
<updated>2025-05-14T11:16:24+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2025-05-09T11:36:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=676e8cf70cb0533e1118e29898c9a9c33ae3a10f'/>
<id>676e8cf70cb0533e1118e29898c9a9c33ae3a10f</id>
<content type='text'>
With the goal of deprecating / removing VOLUNTARY preempt, live-patch
needs to stop relying on cond_resched() to make forward progress.

Instead, rely on schedule() with TASK_FREEZABLE set. Just like
live-patching, the freezer needs to be able to stop tasks in a safe /
known state.

[bigeasy: use likely() in __klp_sched_try_switch() and update comments]

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Tested-by: Petr Mladek &lt;pmladek@suse.com&gt;
Tested-by: Miroslav Benes &lt;mbenes@suse.cz&gt;
Acked-by: Miroslav Benes &lt;mbenes@suse.cz&gt;
Acked-by: Josh Poimboeuf &lt;jpoimboe@kernel.org&gt;
Link: https://lore.kernel.org/r/20250509113659.wkP_HJ5z@linutronix.de
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the goal of deprecating / removing VOLUNTARY preempt, live-patch
needs to stop relying on cond_resched() to make forward progress.

Instead, rely on schedule() with TASK_FREEZABLE set. Just like
live-patching, the freezer needs to be able to stop tasks in a safe /
known state.

[bigeasy: use likely() in __klp_sched_try_switch() and update comments]

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Petr Mladek &lt;pmladek@suse.com&gt;
Tested-by: Petr Mladek &lt;pmladek@suse.com&gt;
Tested-by: Miroslav Benes &lt;mbenes@suse.cz&gt;
Acked-by: Miroslav Benes &lt;mbenes@suse.cz&gt;
Acked-by: Josh Poimboeuf &lt;jpoimboe@kernel.org&gt;
Link: https://lore.kernel.org/r/20250509113659.wkP_HJ5z@linutronix.de
</pre>
</div>
</content>
</entry>
<entry>
<title>hung_task: replace blocker_mutex with encoded blocker</title>
<updated>2025-05-12T00:54:07+00:00</updated>
<author>
<name>Lance Yang</name>
<email>ioworker0@gmail.com</email>
</author>
<published>2025-04-14T14:59:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e711faaafbe54a884f33b53472434063d342f6d4'/>
<id>e711faaafbe54a884f33b53472434063d342f6d4</id>
<content type='text'>
Patch series "hung_task: extend blocking task stacktrace dump to
semaphore", v5.

Inspired by mutex blocker tracking[1], this patch series extend the
feature to not only dump the blocker task holding a mutex but also to
support semaphores.  Unlike mutexes, semaphores lack explicit ownership
tracking, making it challenging to identify the root cause of hangs.  To
address this, we introduce a last_holder field to the semaphore structure,
which is updated when a task successfully calls down() and cleared during
up().

The assumption is that if a task is blocked on a semaphore, the holders
must not have released it.  While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.

With this change, the hung task detector can now show blocker task's info
like below:

[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr  8 12:19:07 2025]       Tainted: G            E      6.14.0-rc6+ #1
[Tue Apr  8 12:19:07 2025] "echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr  8 12:19:07 2025] task:cat             state:D stack:0     pid:945   tgid:945   ppid:828    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  &lt;TASK&gt;
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0xe3/0xf0
[Tue Apr  8 12:19:07 2025]  ? __folio_mod_stat+0x2a/0x80
[Tue Apr  8 12:19:07 2025]  ? set_ptes.constprop.0+0x27/0x90
[Tue Apr  8 12:19:07 2025]  __down_common+0x155/0x280
[Tue Apr  8 12:19:07 2025]  down+0x53/0x70
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x23/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  &lt;/TASK&gt;
[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr  8 12:19:07 2025] task:cat             state:S stack:0     pid:938   tgid:938   ppid:584    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  &lt;TASK&gt;
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0x77/0xf0
[Tue Apr  8 12:19:07 2025]  ? __pfx_process_timeout+0x10/0x10
[Tue Apr  8 12:19:07 2025]  msleep_interruptible+0x49/0x60
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x2d/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  &lt;/TASK&gt;


This patch (of 3):

This patch replaces 'struct mutex *blocker_mutex' with 'unsigned long
blocker', as only one blocker is active at a time.

The blocker filed can store both the lock addrees and the lock type, with
LSB used to encode the type as Masami suggested, making it easier to
extend the feature to cover other types of locks.

Also, once the lock type is determined, we can directly extract the
address and cast it to a lock pointer ;)

Link: https://lkml.kernel.org/r/20250414145945.84916-1-ioworker0@gmail.com
Link: https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com [1]
Link: https://lkml.kernel.org/r/20250414145945.84916-2-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang &lt;mingzhe.yang@ly.com&gt;
Signed-off-by: Lance Yang &lt;ioworker0@gmail.com&gt;
Reviewed-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Suggested-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Suggested-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Cc: Anna Schumaker &lt;anna.schumaker@oracle.com&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Joel Granados &lt;joel.granados@kernel.org&gt;
Cc: John Stultz &lt;jstultz@google.com&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sergey Senozhatsky &lt;senozhatsky@chromium.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tomasz Figa &lt;tfiga@chromium.org&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yongliang Gao &lt;leonylgao@tencent.com&gt;
Cc: Zi Li &lt;amaindex@outlook.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Patch series "hung_task: extend blocking task stacktrace dump to
semaphore", v5.

Inspired by mutex blocker tracking[1], this patch series extend the
feature to not only dump the blocker task holding a mutex but also to
support semaphores.  Unlike mutexes, semaphores lack explicit ownership
tracking, making it challenging to identify the root cause of hangs.  To
address this, we introduce a last_holder field to the semaphore structure,
which is updated when a task successfully calls down() and cleared during
up().

The assumption is that if a task is blocked on a semaphore, the holders
must not have released it.  While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.

With this change, the hung task detector can now show blocker task's info
like below:

[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr  8 12:19:07 2025]       Tainted: G            E      6.14.0-rc6+ #1
[Tue Apr  8 12:19:07 2025] "echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr  8 12:19:07 2025] task:cat             state:D stack:0     pid:945   tgid:945   ppid:828    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  &lt;TASK&gt;
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0xe3/0xf0
[Tue Apr  8 12:19:07 2025]  ? __folio_mod_stat+0x2a/0x80
[Tue Apr  8 12:19:07 2025]  ? set_ptes.constprop.0+0x27/0x90
[Tue Apr  8 12:19:07 2025]  __down_common+0x155/0x280
[Tue Apr  8 12:19:07 2025]  down+0x53/0x70
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x23/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  &lt;/TASK&gt;
[Tue Apr  8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr  8 12:19:07 2025] task:cat             state:S stack:0     pid:938   tgid:938   ppid:584    task_flags:0x400000 flags:0x00000000
[Tue Apr  8 12:19:07 2025] Call Trace:
[Tue Apr  8 12:19:07 2025]  &lt;TASK&gt;
[Tue Apr  8 12:19:07 2025]  __schedule+0x491/0xbd0
[Tue Apr  8 12:19:07 2025]  ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr  8 12:19:07 2025]  schedule+0x27/0xf0
[Tue Apr  8 12:19:07 2025]  schedule_timeout+0x77/0xf0
[Tue Apr  8 12:19:07 2025]  ? __pfx_process_timeout+0x10/0x10
[Tue Apr  8 12:19:07 2025]  msleep_interruptible+0x49/0x60
[Tue Apr  8 12:19:07 2025]  read_dummy_semaphore+0x2d/0x60
[Tue Apr  8 12:19:07 2025]  full_proxy_read+0x5f/0xa0
[Tue Apr  8 12:19:07 2025]  vfs_read+0xbc/0x350
[Tue Apr  8 12:19:07 2025]  ? __count_memcg_events+0xa5/0x140
[Tue Apr  8 12:19:07 2025]  ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr  8 12:19:07 2025]  ? handle_mm_fault+0x180/0x260
[Tue Apr  8 12:19:07 2025]  ksys_read+0x66/0xe0
[Tue Apr  8 12:19:07 2025]  do_syscall_64+0x51/0x120
[Tue Apr  8 12:19:07 2025]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr  8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr  8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr  8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e
[Tue Apr  8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003
[Tue Apr  8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000
[Tue Apr  8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr  8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr  8 12:19:07 2025]  &lt;/TASK&gt;


This patch (of 3):

This patch replaces 'struct mutex *blocker_mutex' with 'unsigned long
blocker', as only one blocker is active at a time.

The blocker filed can store both the lock addrees and the lock type, with
LSB used to encode the type as Masami suggested, making it easier to
extend the feature to cover other types of locks.

Also, once the lock type is determined, we can directly extract the
address and cast it to a lock pointer ;)

Link: https://lkml.kernel.org/r/20250414145945.84916-1-ioworker0@gmail.com
Link: https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com [1]
Link: https://lkml.kernel.org/r/20250414145945.84916-2-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang &lt;mingzhe.yang@ly.com&gt;
Signed-off-by: Lance Yang &lt;ioworker0@gmail.com&gt;
Reviewed-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Suggested-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Suggested-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Cc: Anna Schumaker &lt;anna.schumaker@oracle.com&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Joel Granados &lt;joel.granados@kernel.org&gt;
Cc: John Stultz &lt;jstultz@google.com&gt;
Cc: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Sergey Senozhatsky &lt;senozhatsky@chromium.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tomasz Figa &lt;tfiga@chromium.org&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yongliang Gao &lt;leonylgao@tencent.com&gt;
Cc: Zi Li &lt;amaindex@outlook.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/fpu: Make task_struct::thread constant size</title>
<updated>2025-04-14T06:18:29+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2025-04-09T21:11:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cb7ca40a3882360ce87191793449d48df0b29184'/>
<id>cb7ca40a3882360ce87191793449d48df0b29184</id>
<content type='text'>
Turn thread.fpu into a pointer. Since most FPU code internals work by passing
around the FPU pointer already, the code generation impact is small.

This allows us to remove the old kludge of task_struct being variable size:

  struct task_struct {

       ...
       /*
        * New fields for task_struct should be added above here, so that
        * they are included in the randomized portion of task_struct.
        */
       randomized_struct_fields_end

       /* CPU-specific state of this task: */
       struct thread_struct            thread;

       /*
        * WARNING: on x86, 'thread_struct' contains a variable-sized
        * structure.  It *MUST* be at the end of 'task_struct'.
        *
        * Do not put anything below here!
        */
  };

... which creates a number of problems, such as requiring thread_struct to be
the last member of the struct - not allowing it to be struct-randomized, etc.

But the primary motivation is to allow the decoupling of task_struct from
hardware details (&lt;asm/processor.h&gt; in particular), and to eventually allow
the per-task infrastructure:

   DECLARE_PER_TASK(type, name);
   ...
   per_task(current, name) = val;

... which requires task_struct to be a constant size struct.

The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
now that the FPU structure is not embedded in the task struct anymore, which
reduces text footprint a bit.

Fixed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Chang S. Bae &lt;chang.seok.bae@intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/r/20250409211127.3544993-4-mingo@kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Turn thread.fpu into a pointer. Since most FPU code internals work by passing
around the FPU pointer already, the code generation impact is small.

This allows us to remove the old kludge of task_struct being variable size:

  struct task_struct {

       ...
       /*
        * New fields for task_struct should be added above here, so that
        * they are included in the randomized portion of task_struct.
        */
       randomized_struct_fields_end

       /* CPU-specific state of this task: */
       struct thread_struct            thread;

       /*
        * WARNING: on x86, 'thread_struct' contains a variable-sized
        * structure.  It *MUST* be at the end of 'task_struct'.
        *
        * Do not put anything below here!
        */
  };

... which creates a number of problems, such as requiring thread_struct to be
the last member of the struct - not allowing it to be struct-randomized, etc.

But the primary motivation is to allow the decoupling of task_struct from
hardware details (&lt;asm/processor.h&gt; in particular), and to eventually allow
the per-task infrastructure:

   DECLARE_PER_TASK(type, name);
   ...
   per_task(current, name) = val;

... which requires task_struct to be a constant size struct.

The fpu_thread_struct_whitelist() quirk to hardened usercopy can be removed,
now that the FPU structure is not embedded in the task struct anymore, which
reduces text footprint a bit.

Fixed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Chang S. Bae &lt;chang.seok.bae@intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Link: https://lore.kernel.org/r/20250409211127.3544993-4-mingo@kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2025-04-01T17:06:52+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-04-01T17:06:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d6b02199cde4b9cb99b311eeab1cdbe23165082c'/>
<id>d6b02199cde4b9cb99b311eeab1cdbe23165082c</id>
<content type='text'>
Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
</pre>
</div>
</content>
</entry>
</feed>
