summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2026-05-28mm/damon: add node_eligible_mem_bp goal metricRavi Jonnalagadda
Background and Motivation ========================= In heterogeneous memory systems, controlling memory distribution across NUMA nodes is essential for performance optimization. This patch enables system-wide page distribution with target-state goals such as "maintain 60% of scheme-eligible memory on DRAM" using PA-mode DAMON schemes. Rather than using absolute thresholds, this metric tracks the ratio of memory that matches each scheme's access pattern filters on a target node, enabling the quota system to automatically adjust migration aggressiveness to maintain the desired distribution. What This Metric Measures ========================= node_eligible_mem_bp: scheme_eligible_bytes_on_node / total_scheme_eligible_bytes * 10000 Two-Scheme Setup for Hot Page Distribution ========================================== For maintaining 60% of hot memory on DRAM (node 0) and 40% on CXL (node 1): PULL scheme: migrate_hot to node 0 goal: node_eligible_mem_bp, nid=0, target=6000 addr filter: node 1 address range (only migrate FROM CXL) "Move hot pages to DRAM if less than 60% of hot data is in DRAM" PUSH scheme: migrate_hot to node 1 goal: node_eligible_mem_bp, nid=1, target=4000 addr filter: node 0 address range (only migrate FROM DRAM) "Move hot pages to CXL if less than 40% of hot data is in CXL" Each scheme independently measures its own eligible memory and adjusts its quota to achieve its target ratio. The schemes work in concert through DAMON's unified monitoring context, with the quota autotuner balancing their relative aggressiveness. Implementation Details ====================== The implementation adds a new quota goal metric type DAMOS_QUOTA_NODE_ELIGIBLE_MEM_BP to the existing DAMOS quota goal framework. When this metric is configured for a scheme: 1. During each quota adjustment cycle, damos_get_node_eligible_mem_bp() is called to calculate the current memory distribution. 2. The function iterates through all regions that match the scheme's access pattern (via __damos_valid_target()) and calculates: - Total eligible bytes across all nodes - Eligible bytes specifically on the target node (goal->nid) 3. For each eligible region, damos_calc_eligible_bytes() walks through the physical address range, using damon_get_folio() to look up each folio and determine its NUMA node via folio_nid(). 4. Large folios are handled by calculating the exact overlap between the region boundaries and folio boundaries, ensuring accurate byte counts even when regions partially span folios. 5. The ratio (node_eligible / total_eligible * 10000) is returned as basis points, which the quota autotuner uses to adjust the scheme's effective quota size (esz). The implementation requires CONFIG_DAMON_PADDR since damon_get_folio() is only available for physical address space monitoring. Testing Results =============== Functionally tested on a two-node heterogeneous memory system with DRAM (node 0) and CXL memory (node 1). A PUSH+PULL scheme configuration using migrate_hot actions was used to reach a target hot memory ratio between the two tiers. With the TEMPORAL tuner, the system converges quickly to the target distribution. The tuner drives esz to maximum when under goal and to zero once the goal is met, forming a simple on/off feedback loop that stabilizes at the desired ratio. With the CONSIST tuner, the scheme still converges but more slowly, as it migrates and then throttles itself based on quota feedback. The time to reach the goal varies depending on workload intensity. Note: This metric works with both TEMPORAL and CONSIST goal tuners. Link: https://lore.kernel.org/20260428030520.701-1-ravis.opensrc@gmail.com Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@gmail.com> Suggested-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: Honggyu Kim <honggyu.kim@sk.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Yunjeong Mun <yunjeong.mun@sk.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-28vmalloc: optimize vfree with free_pages_bulk()Ryan Roberts
Whenever vmalloc allocates high order pages (e.g. for a huge mapping) it must immediately split_page() to order-0 so that it remains compatible with users that want to access the underlying struct page. Commit a06157804399 ("mm/vmalloc: request large order pages from buddy allocator") recently made it much more likely for vmalloc to allocate high order pages which are subsequently split to order-0. Unfortunately this had the side effect of causing performance regressions for tight vmalloc/vfree loops (e.g. test_vmalloc.ko benchmarks). See Closes: tag. This happens because the high order pages must be gotten from the buddy but then because they are split to order-0, when they are freed they are freed to the order-0 pcp. Previously allocation was for order-0 pages so they were recycled from the pcp. It would be preferable if when vmalloc allocates an (e.g.) order-3 page that it also frees that order-3 page to the order-3 pcp, then the regression could be removed. So let's do exactly that; update stats separately first as coalescing is hard to do correctly without complexity. Use free_pages_bulk() which uses the new __free_contig_range() API to batch-free contiguous ranges of pfns. This not only removes the regression, but significantly improves performance of vfree beyond the baseline. A selection of test_vmalloc benchmarks running on arm64 server class system. mm-new is the baseline. Commit a06157804399 ("mm/vmalloc: request large order pages from buddy allocator") was added in v6.19-rc1 where we see regressions. Then with this change performance is much better. (>0 is faster, <0 is slower, (R)/(I) = statistically significant Regression/Improvement): +-----------------+----------------------------------------------------------+-------------------+--------------------+ | Benchmark | Result Class | mm-new | this series | +=================+==========================================================+===================+====================+ | micromm/vmalloc | fix_align_alloc_test: p:1, h:0, l:500000 (usec) | 1331843.33 | (I) 67.17% | | | fix_size_alloc_test: p:1, h:0, l:500000 (usec) | 415907.33 | -5.14% | | | fix_size_alloc_test: p:4, h:0, l:500000 (usec) | 755448.00 | (I) 53.55% | | | fix_size_alloc_test: p:16, h:0, l:500000 (usec) | 1591331.33 | (I) 57.26% | | | fix_size_alloc_test: p:16, h:1, l:500000 (usec) | 1594345.67 | (I) 68.46% | | | fix_size_alloc_test: p:64, h:0, l:100000 (usec) | 1071826.00 | (I) 79.27% | | | fix_size_alloc_test: p:64, h:1, l:100000 (usec) | 1018385.00 | (I) 84.17% | | | fix_size_alloc_test: p:256, h:0, l:100000 (usec) | 3970899.67 | (I) 77.01% | | | fix_size_alloc_test: p:256, h:1, l:100000 (usec) | 3821788.67 | (I) 89.44% | | | fix_size_alloc_test: p:512, h:0, l:100000 (usec) | 7795968.00 | (I) 82.67% | | | fix_size_alloc_test: p:512, h:1, l:100000 (usec) | 6530169.67 | (I) 118.09% | | | full_fit_alloc_test: p:1, h:0, l:500000 (usec) | 626808.33 | -0.98% | | | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) | 532145.67 | -1.68% | | | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) | 537032.67 | -0.96% | | | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec) | 8805069.00 | (I) 74.58% | | | pcpu_alloc_test: p:1, h:0, l:500000 (usec) | 500824.67 | 4.35% | | | random_size_align_alloc_test: p:1, h:0, l:500000 (usec) | 1637554.67 | (I) 76.99% | | | random_size_alloc_test: p:1, h:0, l:500000 (usec) | 4556288.67 | (I) 72.23% | | | vm_map_ram_test: p:1, h:0, l:500000 (usec) | 107371.00 | -0.70% | +-----------------+----------------------------------------------------------+-------------------+--------------------+ Link: https://lore.kernel.org/20260401101634.2868165-3-usama.anjum@arm.com Fixes: a06157804399 ("mm/vmalloc: request large order pages from buddy allocator") Closes: https://lore.kernel.org/all/66919a28-bc81-49c9-b68f-dd7c73395a0d@arm.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Co-developed-by: Muhammad Usama Anjum <usama.anjum@arm.com> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Sterba <dsterba@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-28mm/page_alloc: optimize free_contig_range()Ryan Roberts
Patch series "mm: Free contiguous order-0 pages efficiently", v6. A recent change to vmalloc caused some performance benchmark regressions (see [1]). I'm attempting to fix that (and at the same time significantly improve beyond the baseline) by freeing a contiguous set of order-0 pages as a batch. At the same time I observed that free_contig_range() was essentially doing the same thing as vfree() so I've fixed it there too. While at it, optimize the __free_contig_frozen_range() as well. Check that the contiguous range falls in the same section. If they aren't enabled, the if conditions get optimized out by the compiler as memdesc_section() returns 0. See num_pages_contiguous() for more details about it. This patch (of 3): Decompose the range of order-0 pages to be freed into the set of largest possible power-of-2 size and aligned chunks and free them to the pcp or buddy. This improves on the previous approach which freed each order-0 page individually in a loop. Testing shows performance to be improved by more than 10x in some cases. Since each page is order-0, we must decrement each page's reference count individually and only consider the page for freeing as part of a high order chunk if the reference count goes to zero. Additionally free_pages_prepare() must be called for each individual order-0 page too, so that the struct page state and global accounting state can be appropriately managed. But once this is done, the resulting high order chunks can be freed as a unit to the pcp or buddy. This significantly speeds up the free operation but also has the side benefit that high order blocks are added to the pcp instead of each page ending up on the pcp order-0 list; memory remains more readily available in high orders. vmalloc will shortly become a user of this new optimized free_contig_range() since it aggressively allocates high order non-compound pages, but then calls split_page() to end up with contiguous order-0 pages. These can now be freed much more efficiently. The execution time of the following function was measured in a server class arm64 machine: static int page_alloc_high_order_test(void) { unsigned int order = HPAGE_PMD_ORDER; struct page *page; int i; for (i = 0; i < 100000; i++) { page = alloc_pages(GFP_KERNEL, order); if (!page) return -1; split_page(page, order); free_contig_range(page_to_pfn(page), 1UL << order); } return 0; } Execution time before: 4097358 usec Execution time after: 729831 usec Perf trace before: 99.63% 0.00% kthreadd [kernel.kallsyms] [.] kthread | ---kthread 0xffffb33c12a26af8 | |--98.13%--0xffffb33c12a26060 | | | |--97.37%--free_contig_range | | | | | |--94.93%--___free_pages | | | | | | | |--55.42%--__free_frozen_pages | | | | | | | | | --43.20%--free_frozen_page_commit | | | | | | | | | --35.37%--_raw_spin_unlock_irqrestore | | | | | | | |--11.53%--_raw_spin_trylock | | | | | | | |--8.19%--__preempt_count_dec_and_test | | | | | | | |--5.64%--_raw_spin_unlock | | | | | | | |--2.37%--__get_pfnblock_flags_mask.isra.0 | | | | | | | --1.07%--free_frozen_page_commit | | | | | --1.54%--__free_frozen_pages | | | --0.77%--___free_pages | --0.98%--0xffffb33c12a26078 alloc_pages_noprof Perf trace after: 8.42% 2.90% kthreadd [kernel.kallsyms] [k] __free_contig_range | |--5.52%--__free_contig_range | | | |--5.00%--free_prepared_contig_range | | | | | |--1.43%--__free_frozen_pages | | | | | | | --0.51%--free_frozen_page_commit | | | | | |--1.08%--_raw_spin_trylock | | | | | --0.89%--_raw_spin_unlock | | | --0.52%--free_pages_prepare | --2.90%--ret_from_fork kthread 0xffffae1c12abeaf8 0xffffae1c12abe7a0 | --2.69%--vfree __free_contig_range Link: https://lore.kernel.org/20260401101634.2868165-1-usama.anjum@arm.com Link: https://lore.kernel.org/20260401101634.2868165-2-usama.anjum@arm.com Link: https://lore.kernel.org/all/66919a28-bc81-49c9-b68f-dd7c73395a0d@arm.com [1] Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Co-developed-by: Muhammad Usama Anjum <usama.anjum@arm.com> Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Sterba <dsterba@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Terrell <terrelln@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-28mm/vmscan: add balance_pgdat begin/end tracepointsBunyod Suvonov
Vmscan has six main reclaim entry points: try_to_free_pages() for direct reclaim, try_to_free_mem_cgroup_pages() for memcg reclaim, mem_cgroup_shrink_node() for memcg soft limit reclaim, node_reclaim() for node reclaim, shrink_all_memory() for hibernation reclaim, and balance_pgdat() for kswapd reclaim. All of them, except for shrink_all_memory() and balance_pgdat(), already have begin/end tracepoints. This makes it harder to trace which reclaim path is responsible for memory reclaim activity, because kswapd reclaim cannot be identified as cleanly as other reclaim entry points, even though it is the main background reclaim path under memory pressure. There may be no need to trace shrink_all_memory() as it is primarily used during hibernation. So this patch adds the missing tracepoint pair for balance_pgdat(). The begin tracepoint records the node id, requested reclaim order, and the requested classzone bound (highest_zoneidx). The end tracepoint records the node id, the reclaim order that balance_pgdat() finished with, the requested classzone bound, and nr_reclaimed. Together, they show the requested reclaim order and classzone bound, whether reclaim fell back to a lower order, and how much reclaim work was done. The end tracepoint also records highest_zoneidx even though it does not change within a balance_pgdat() invocation. This keeps the end event self-contained, so users can analyze reclaim results directly from end events without depending on begin/end correlation, which is less convenient when tracing is filtered or records are dropped. It also makes it straightforward to relate nr_reclaimed and the final reclaim order to the requested classzone bound. Link: https://lore.kernel.org/20260424031418.174597-1-b.suvonov@sjtu.edu.cn Link: https://lore.kernel.org/20260423103753.546582-1-b.suvonov@sjtu.edu.cn Signed-off-by: Bunyod Suvonov <b.suvonov@sjtu.edu.cn> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kairui Song <kasong@tencent.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-28mm: convert vmemmap_p?d_populate() to static functionsChengkaitao
Since the vmemmap_p?d_populate functions are unused outside the mm subsystem, we can remove their external declarations and convert them to static functions. Link: https://lore.kernel.org/20260423101441.7089-1-kaitao.cheng@linux.dev Signed-off-by: Chengkaitao <chengkaitao@kylinos.cn> Acked-by: David Hildenbrand (arm) <david@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-28mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoisonWupeng Ma
Two concurrent madvise(MADV_HWPOISON) calls on the same hugetlb page can trigger a recursive spinlock self-deadlock (AA deadlock) on hugetlb_lock when racing with a concurrent unmap: thread#0 thread#1 -------- -------- madvise(folio, MADV_HWPOISON) -> poisons the folio successfully madvise(folio, MADV_HWPOISON) unmap(folio) try_memory_failure_hugetlb get_huge_page_for_hwpoison spin_lock_irq(&hugetlb_lock) <- held __get_huge_page_for_hwpoison hugetlb_update_hwpoison() -> MF_HUGETLB_FOLIO_PRE_POISONED goto out: folio_put() refcount: 1 -> 0 free_huge_folio() spin_lock_irqsave(&hugetlb_lock) -> AA DEADLOCK! The out: path in __get_huge_page_for_hwpoison() calls folio_put() to drop the GUP reference while the hugetlb_lock is still held by the hugetlb.c wrapper get_huge_page_for_hwpoison(). If concurrent unmap has released the page table mapping reference, folio_put() drops the folio refcount to zero, triggering free_huge_folio() which attempts to re-acquire the non-recursive hugetlb_lock. Fix this by moving hugetlb_lock acquisition from the hugetlb.c wrapper into get_huge_page_for_hwpoison(). Place spin_unlock_irq() before the folio_put() at the out: label so the folio is always released outside the lock. [akpm@linux-foundation.org: fix race, rename label per Miaohe] Link: https://sashiko.dev/#/patchset/20260522010305.4099834-1-mawupeng1@huawei.com Link: https://lore.kernel.org/f39f405e-4b4b-8f79-70fe-a2b5b62114eb@huawei.com Link: https://lore.kernel.org/20260522010305.4099834-1-mawupeng1@huawei.com Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()") Signed-off-by: Wupeng Ma <mawupeng1@huawei.com> Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Acked-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-28ring-buffer: Add persistent ring buffer invalid-page inject testMasami Hiramatsu (Google)
Add a self-corrupting test for the persistent ring buffer. This will inject an erroneous value to some sub-buffer pages (where the index is even or multiples of 5) in the persistent ring buffer when the kernel panics, and checks whether the number of detected invalid pages and the total entry_bytes are the same as the recorded values after reboot. This ensures that the kernel can correctly recover a partially corrupted persistent ring buffer after a reboot or panic. The test only runs on the persistent ring buffer whose name is "ptracingtest". The user has to fill it with events before a kernel panic. To run the test, enable CONFIG_RING_BUFFER_PERSISTENT_INJECT and add the following kernel cmdline: reserve_mem=20M:2M:trace trace_instance=ptracingtest^traceoff@trace panic=1 Run the following commands after the 1st boot: cd /sys/kernel/tracing/instances/ptracingtest echo 1 > tracing_on echo 1 > events/enable sleep 3 echo c > /proc/sysrq-trigger After panic message, the kernel will reboot and run the verification on the persistent ring buffer, e.g. Ring buffer meta [2] invalid buffer page detected Ring buffer meta [2] is from previous boot! (318 pages discarded) Ring buffer testing [2] invalid pages: PASSED (318/318) Ring buffer testing [2] entry_bytes: PASSED (1300476/1300476) Link: https://patch.msgid.link/20260522171051.260140328@kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-28spmi: use kzalloc_flex in main allocationRosen Penev
Add a flexible array member to avoid indexing past the struct. Signed-off-by: Rosen Penev <rosenp@gmail.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2026-05-28spmi: clean up kernel-doc in spmi.hRandy Dunlap
Fix all kernel-doc warnings in spmi.h: Warning: include/linux/spmi.h:114 function parameter 'ctrl' not described in 'spmi_controller_put' Warning: include/linux/spmi.h:144 struct member 'shutdown' not described in 'spmi_driver' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2026-05-28net: make page_pool_get_stats() voidJakub Kicinski
The kdoc for page_pool_get_stats() is missing a Returns: statement. Looking at this function, I have no idea what is the purpose of the bool it returns. My guess was that maybe the static inline stub returns false if CONFIG_PAGE_POOL_STATS=n but such static inline helper doesn't exist at all. All callers pass a pointer to a struct on the stack. Make this function void. Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260526155722.2790742-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-28crypto: api - Remove per-tfm refcountEric Biggers
This reverts commit ae131f4970f0 ("crypto: api - Add crypto_tfm_get"). The refcount in struct crypto_tfm was added solely to support crypto_clone_tfm(). Before then it was a simple non-refcounted object. Since crypto_clone_tfm() has been removed, remove the refcount as well. Note that this eliminates an expensive atomic operation from every tfm freeing operation. So this revert doesn't just remove unused code, but it also fixes a performance regression. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://patch.msgid.link/20260522053028.91165-5-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-28crypto: cipher - Remove crypto_clone_cipher()Eric Biggers
Since the only caller of crypto_clone_cipher() was cmac_clone_tfm() which has been removed, remove crypto_clone_cipher() as well. Note that no tests need to be removed, as this function had no tests. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://patch.msgid.link/20260522053028.91165-3-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-28crypto: hash - Remove support for cloning hash tfmsEric Biggers
Hash transformation cloning no longer has a user, and there's a good chance no new one will appear because the library API solves the problem in a much simpler and more efficient way. Remove support for it. Note that no tests need to be removed, as this feature had no tests. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://patch.msgid.link/20260522053028.91165-2-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-29Merge tag 'dd-lifetimes-7.2-rc1' of ↵Danilo Krummrich
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core into drm-rust-next Higher-Ranked Lifetime Types for Rust device drivers Replace drvdata() with registration data on the auxiliary bus. Private data is now scoped to the registration object, removing the ordering constraints and lifetime complications that came with drvdata(). Add Higher-Ranked Lifetime Types (HRT) so driver structs can borrow device resources like pci::Bar and IoMem directly, tied to the device binding scope. This removes the need for Devres indirection and ARef<Device> in most driver code. This is a stable tag for other trees to merge. Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-05-29Merge patch series "rust: device: Higher-Ranked Lifetime Types for device ↵Danilo Krummrich
drivers" Danilo Krummrich <dakr@kernel.org> says: Currently, Rust device drivers access device resources such as PCI BAR mappings and I/O memory regions through Devres<T>. Devres::access() provides zero-overhead access by taking a &Device<Bound> reference as proof that the device is still bound. Since a &Device<Bound> is available in almost all contexts by design, Devres is mostly a type-system level proof that the resource is valid, but it can also be used from scopes without this guarantee through its try_access() accessor. This works well in general, but has a few limitations: - Every access to a device resource goes through Devres::access(), which despite zero cost, adds boilerplate to every access site. - Destructors do not receive a &Device<Bound>, so they must use try_access(), which can fail. In practice the access succeeds if teardown ordering is correct, but the type system can't express this, forcing drivers to handle a failure path that should never be taken. - Sharing a resource across components (e.g. passing a BAR to a sub-component) requires Arc<Devres<T>>. - Device references must be stored as ARef<Device> rather than plain &Device borrows. These limitations stem from the driver's bus device private data being 'static -- the driver struct cannot borrow from the device reference it receives in probe(), even though it structurally cannot outlive the device binding. This series introduces Higher-Ranked Lifetime Types (HRT) for Rust device drivers. An HRT is a type that is generic over a lifetime -- it does not have a fixed lifetime, but can be instantiated with any lifetime chosen by the caller. Bus driver traits use a Generic Associated Type (GAT) type Data<'bound> to introduce the lifetime on the private data, rather than parameterizing the Driver trait itself. This avoids a driver trait global lifetime and avoids the need for ForLt for bus device private data, making the bus implementations much simpler. ForLt is only needed for auxiliary registration data, where the lifetime is not introduced by a trait callback but must be threaded through Registration. With HRT, driver structs carry a lifetime parameter tied to the device binding scope -- the interval of a bus device being bound to a driver. Device resources like pci::Bar<'bound> and IoMem<'bound> are handed out with this lifetime, so the compiler enforces at build time that they do not escape the binding scope. Before: struct MyDriver { pdev: ARef<pci::Device>, bar: Devres<pci::Bar<BAR_SIZE>>, } let io = self.bar.access(dev)?; io.read32(OFFSET); After: struct MyDriver<'bound> { pdev: &'bound pci::Device, bar: pci::Bar<'bound, BAR_SIZE>, } self.bar.read32(OFFSET); Lifetime-parameterized device resources can be put into a Devres at any point via Bar::into_devres() / IoMem::into_devres(), providing the exact same semantics as before. This is useful for resources shared across subsystem boundaries where revocation is needed. This also synergizes with the upcoming self-referential initialization support in pin-init, which allows one field of the driver struct to borrow another during initialization without unsafe code. The same pattern is applied to auxiliary device registration data as a first example beyond bus device private data. Registration<F: ForLt> can hold lifetime-parameterized data tied to the parent driver's binding scope. Since the auxiliary bus guarantees that the parent remains bound while the auxiliary device is registered, the registration data can safely borrow the parent's device resources. More generally, binding resource lifetimes to a registration scope applies to every registration that is scoped to a driver binding -- auxiliary devices, class devices, IRQ handlers, workqueues. A follow-up series extends this to class device registrations, starting with DRM, so that class device callbacks (IOCTLs, etc.) can safely access device resources through the separate registration data bound to the registration's lifetime without Devres indirection. Thanks to Gary for coming up with the ForLt implementation; thanks to Alice for the early discussions around lifetime-parameterized private data that helped shape the direction of this work. Link: https://patch.msgid.link/20260525202921.124698-1-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-05-28net: remove SIOCSHWTSTAMP and SIOCGHWTSTAMP from ndo_eth_ioctl commentXuan Zhuo
Since commit 4ee58e1e5680 ("net: promote SIOCSHWTSTAMP and SIOCGHWTSTAMP ioctls to dedicated handlers"), SIOCSHWTSTAMP and SIOCGHWTSTAMP are no longer dispatched through dev_eth_ioctl() / ndo_eth_ioctl(). They are now handled by their own dedicated functions dev_set_hwtstamp() and dev_get_hwtstamp() in the ioctl path. However, the comment describing ndo_eth_ioctl in netdevice.h still lists these two ioctls, which is misleading for driver developers who may incorrectly assume they need to handle hardware timestamping commands in their ndo_eth_ioctl implementation. Remove the stale references from the comment to accurately reflect that ndo_eth_ioctl only handles SIOCGMIIPHY, SIOCGMIIREG and SIOCSMIIREG. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260527120936.24169-1-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.1-rc6). Conflicts: drivers/net/phy/air_en8811h.c d895767c33781 ("net: phy: air_en8811h: add AN8811HB MCU assert/deassert support") dddfadd75197e ("net: phy: Add Airoha phy library for shared code") 5226bb6634cdf ("net: phy: air_phy_lib: Factorize BuckPBus register accessors") e08f0ea6daf2e ("net: phy: Rename Airoha common BuckPBus register accessors") net/sched/sch_netem.c a2f6ed7b4873 ("net/sched: netem: add per-impairment extended statistics") 9552b11e3eda ("net/sched: fix packet loop on netem when duplicate is on") Adjacent changes: drivers/dpll/zl3073x/core.c c1224569cef0 ("dpll: zl3073x: make frequency monitor a per-device attribute") 54e65df8cf18 ("dpll: zl3073x: report FFO as DPLL vs input reference offset") net/iucv/af_iucv.c 347fdd4df85f ("af_iucv: convert to getsockopt_iter") 3589d20a666c ("net/iucv: fix locking in .getsockopt") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-28Merge tag 'acpi-7.1-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI support fixes from Rafael Wysocki: "Fix three issues in the ACPI button driver: a possible crash due to a button press after unloading the driver (introduced during the 6.15 development cycle), function keys breakage on Toshiba Tecra X40 due to missing ACPI events (introduced during the 7.0 development cycle), and a missing probe rollback path item that has not been added by mistake during a recent update" * tag 'acpi-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: button: Add missing device class clearing on probe failures ACPI: button: Enable wakeup GPEs for ACPI buttons at probe time ACPI: button: Fix ACPI GPE handler leak during removal
2026-05-28Merge tag 'net-7.1-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "This is again significantly bigger than the same point into the previous cycle, but at least smaller than last week. I'm not aware of any pending regression for the current cycle. Including fixes from netfilter. Current release - regressions: - netfilter: walk fib6_siblings under RCU Previous releases - regressions: - netlink: fix sending unassigned nsid after assigned one - bridge: fix sleep in atomic context in netlink path - sched: fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop - ipv4: fix net->ipv4.sysctl_local_reserved_ports UaF - eth: tun: free page on short-frame rejection in tun_xdp_one() Previous releases - always broken: - skbuff: fix missing zerocopy reference in pskb_carve helpers - handshake: drain pending requests at net namespace exit - ethtool: - rss: avoid modifying the RSS context response - module: avoid leaking a netdev ref on module flash errors - coalesce: cap profile updates at NET_DIM_PARAMS_NUM_PROFILES - netfilter: fix dst corruption in same register operation - nfc: hci: fix out-of-bounds read in HCP header parsing - ipv6: exthdrs: refresh nh pointer after ipv6_hop_jumbo() - eth: - vti: use ip6_tnl.net in vti6_changelink(). - vxlan: do not reuse cached ip_hdr() value after skb_tunnel_check_pmtu()" * tag 'net-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits) dpll: zl3073x: make frequency monitor a per-device attribute dpll: zl3073x: use __dpll_device_change_ntf() and remove change_work dpll: export __dpll_device_change_ntf() for use under dpll_lock net/handshake: Drain pending requests at net namespace exit net/handshake: Verify file-reference balance in submit paths net/handshake: Close the submit-side sock_hold race net/handshake: hand off the pinned file reference to accept_doit net/handshake: Take a long-lived file reference at submit net/handshake: Pass negative errno through handshake_complete() nvme-tcp: store negative errno in queue->tls_err net/handshake: Use spin_lock_bh for hn_lock net: skbuff: fix missing zerocopy reference in pskb_carve helpers net: hibmcge: move dma_rmb() after dma_sync_single_for_cpu() in RX path net: hibmcge: disable Relaxed Ordering to fix RX packet corruption selftests/tc-testing: Add netem test case exercising loops selftests/tc-testing: Add mirred test cases exercising loops net/sched: act_mirred: Fix return code in early mirred redirect error paths net/sched: act_mirred: Fix blockcast recursion bypass leading to stack overflow net/sched: Fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop net/sched: fix packet loop on netem when duplicate is on ...
2026-05-28bitops: Define generic___bitrev8/16/32 for reuseJinjie Ruan
Define generic___bitrev8/16/32 using the implementation in <linux/bitrev.h>, so they can be reused in <asm/bitrev.h>, such as RISCV. Reviewed-by: Yury Norov <ynorov@nvidia.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Yury Norov <ynorov@nvidia.com>
2026-05-28bitmap: fix find helper documentationYury Norov
find_nth_and_bit() and find_nth_and_andnot_bit() may return a value greater than @size when the requested bit does not exist, matching find_nth_bit(). Document that correctly. All current users are safe against the '>=' vs '==' conditions. Also fix the for_each_clear_bitrange_from() parameter descriptions so they describe clear ranges instead of set ranges. Signed-off-by: Yury Norov <ynorov@nvidia.com>
2026-05-28bitmap: drop bitmap_print_to_pagebuf()Yury Norov
Now that all users of bitmap_print_to_pagebuf() are switched to the alternatives, drop the function. Signed-off-by: Yury Norov <ynorov@nvidia.com>
2026-05-28cpumask: switch cpumap_print_to_pagebuf() to using scnprintf()Yury Norov
In preparation for removing bitmap_print_to_pagebuf(), switch cpumap_print_to_pagebuf() to using scnprintf("%*pbl"). Signed-off-by: Yury Norov <ynorov@nvidia.com>
2026-05-28block: export passthrough stats enabledKeith Busch
A user can enable io accounting for passthrough requests, so export the helper that checks if the request should be tracked. This will enable stacking drivers to to report iostats for passthrough workloads. Since the stacking request_queue may not be the one providing the request, the API has to add a parameter for the caller to specify which one to check. Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260528010041.1533124-2-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-28drm/tegra: tegra_drm.h: fix all uapi kernel-doc warningsRandy Dunlap
Add 2 struct member descriptions and convert #define macro constants comments to kernel-doc comments to eliminate all kernel-doc warnings: Warning: include/uapi/drm/tegra_drm.h:353 struct member 'cmdbuf' not described in 'drm_tegra_reloc' Warning: include/uapi/drm/tegra_drm.h:353 struct member 'target' not described in 'drm_tegra_reloc' Warning: include/uapi/drm/tegra_drm.h:780 This comment starts with '/**', but isn't a kernel-doc comment. * Specify that bit 39 of the patched-in address should be set to switch Warning: include/uapi/drm/tegra_drm.h:832 This comment starts with '/**', but isn't a kernel-doc comment. * Execute `words` words of Host1x opcodes specified in the `gather_data_ptr` Warning: include/uapi/drm/tegra_drm.h:837 This comment starts with '/**', but isn't a kernel-doc comment. * Wait for a syncpoint to reach a value before continuing with further Warning: include/uapi/drm/tegra_drm.h:842 This comment starts with '/**', but isn't a kernel-doc comment. * Wait for a syncpoint to reach a value before continuing with further Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260427184454.693794-1-rdunlap@infradead.org
2026-05-28gpu: host1x: Allow entries in BO caches to be freedMikko Perttunen
When a buffer object is pinned via host1x_bo_pin() with a cache, the resulting mapping is kept in the cache so it can be reused on subsequent pins. Each mapping held a reference to the underlying host1x_bo (taken in tegra_bo_pin / gather_bo_pin), so as long as a mapping was cached, the bo itself could not be freed. However, the only way to remove the cached mapping was through the free path of the buffer object. This meant that if a bo got cached, it could never get freed again. Resolve the circularity by holding a weak reference to the bo from the cache side. This is done by having the .pin callbacks not bump the bo's refcount -- instead the common Host1x bo code does so, except for the cache reference. Also move the remove-cache-mapping-on-free code into a common function inside Host1x code. This is only called from the TegraDRM GEM buffers since those are the only ones that can be cached at the moment. Reported-by: Aaron Kling <webgeek1234@gmail.com> Fixes: 1f39b1dfa53c ("drm/tegra: Implement buffer object cache") Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Tested-by: Aaron Kling <webgeek1234@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260515-host1x-bocache-leak-v1-1-a0375f68aeab@nvidia.com
2026-05-28gpu: host1x: trace: fix string fields in host1x tracesArtur Kowalski
Use __assign_str and __get_str as required by tracing subsystem. Fixes string fields being rejected by the verifier and unreadable from userspace. Tested on v6.18.21. Signed-off-by: Artur Kowalski <arturkow2000@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://patch.msgid.link/20260519-host1x-tracing-v1-1-55afb8cbd186@gmail.com
2026-05-28block: add a bio_endio_status helperChristoph Hellwig
Add a helper that sets bi_status and call bio_endio() as that is a very common pattern and convert the core block code over to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Md Haris Iqbal <haris.iqbal@linux.dev> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@kernel.org> Link: https://patch.msgid.link/20260528084632.2505277-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-28bvec: make the bvec_iter helpers inline functionsChristoph Hellwig
The macros are impossible to follow due to the lack of visual type information and all the braces. Replace them with inline helpers to improve on that. Because the calling conventions are a bit problematic with a lot of passing structures by value, all the helpers are marked as __always_inline so that they are force inlined. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://patch.msgid.link/20260527151043.2349900-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-28block: mark biovec_init_pool staticChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20260527150646.2349405-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-28Disable -Wattribute-alias for clang-23 and newerNathan Chancellor
Clang recently added support for -Wattribute-alias [1], which results in the same warnings that necessitated commit bee20031772a ("disable -Wattribute-alias warning for SYSCALL_DEFINEx()") for GCC. kernel/time/itimer.c:325:1: error: alias and aliasee have different types 'long (unsigned int)' and 'long (typeof (__builtin_choose_expr((__builtin_types_compatible_p(typeof ((unsigned int)0), typeof (0LL)) || __builtin_types_compatible_p(typeof ((unsigned int)0), typeof (0ULL))), 0LL, 0L)))' (aka 'long (long)') [-Werror,-Wattribute-alias] 325 | SYSCALL_DEFINE1(alarm, unsigned int, seconds) | ^ include/linux/syscalls.h:225:36: note: expanded from macro 'SYSCALL_DEFINE1' 225 | #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__) | ^ include/linux/syscalls.h:236:2: note: expanded from macro 'SYSCALL_DEFINEx' 236 | __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) | ^ include/linux/syscalls.h:251:18: note: expanded from macro '__SYSCALL_DEFINEx' 251 | __attribute__((alias(__stringify(__se_sys##name)))); \ | ^ kernel/time/itimer.c:325:1: note: aliasee is declared here include/linux/syscalls.h:225:36: note: expanded from macro 'SYSCALL_DEFINE1' 225 | #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__) | ^ include/linux/syscalls.h:236:2: note: expanded from macro 'SYSCALL_DEFINEx' 236 | __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) | ^ include/linux/syscalls.h:255:18: note: expanded from macro '__SYSCALL_DEFINEx' 255 | asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ | ^ <scratch space>:16:1: note: expanded from here 16 | __se_sys_alarm | ^ Disable the warnings in the same way for clang-23 and newer. Disable the warning about unknown warning options to avoid breaking the build for versions of clang-23 that do not have -Wattribute-alias, such as ones deployed by vendors like Android or CI systems or when bisecting LLVM between llvmorg-23-init and release/23.x. Cc: stable@vger.kernel.org Closes: https://github.com/ClangBuiltLinux/linux/issues/2163 Link: https://github.com/llvm/llvm-project/commit/40da6920a0d71d49dfa2392b09153600b0759f5e [1] Link: https://patch.msgid.link/20260515-syscall-disable-attribute-alias-for-clang-v1-1-9a9d95d41df6@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-05-28Merge tag 'tee-fixes-for-v7.1' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/jenswi/linux-tee into arm/fixes TEE fixes for v7.1 Fixing: - params_from_user() cleanup in error path in tee_ioctl_supp_recv() - possible tee_shm leak in error path in register_shm_helper() - padding in struct tee_ioctl_object_invoke_arg * tag 'tee-fixes-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jenswi/linux-tee: tee: fix params_from_user() error path in tee_ioctl_supp_recv tee: shm: fix shm leak in register_shm_helper() tee: fix tee_ioctl_object_invoke_arg padding Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2026-05-28dpll: export __dpll_device_change_ntf() for use under dpll_lockIvan Vecera
Export __dpll_device_change_ntf() so that drivers can send device change notifications from within device callbacks, which are already called under dpll_lock. Using dpll_device_change_ntf() in that context would deadlock. Add lockdep_assert_held() to catch misuse without the lock held. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260526074525.1451008-2-ivecera@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-28ASoC: soc-card: add snd_soc_card_set_topology_name()Kuninori Morimoto
Some drivers want to use topology name, but currently each drivers are setting it by own method. This patch adds new snd_soc_card_set_topology_name() and do it by same method. Almost all driver doesn't set topology name, let's remove fixed name array, and use devm_kasprintf() instead. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Reviewed-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Link: https://patch.msgid.link/878q942wce.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2026-05-28net: Introduce skb tc depth field to track packet loopsJamal Hadi Salim
Add a 2-bit per-skb tc depth field to track packet loops across the stack. The previous per-CPU loop counters like MIRRED_NEST_LIMIT assume a single call stack and lose state in two cases: 1) When a packet is queued and reprocessed later (e.g., egress->ingress via backlog), the per-cpu state is gone by the time it is dequeued. 2) With XPS/RPS a packet may arrive on one CPU and be processed on another. A per-skb field solves both by travelling with the packet itself. The field fits in existing padding, using 2 bits that were previously a hole: pahole before(-) and after (+) diff looks like: __u8 slow_gro:1; /* 132: 3 1 */ __u8 csum_not_inet:1; /* 132: 4 1 */ __u8 unreadable:1; /* 132: 5 1 */ + __u8 tc_depth:2; /* 132: 6 1 */ - /* XXX 2 bits hole, try to pack */ /* XXX 1 byte hole, try to pack */ __u16 tc_index; /* 134 2 */ There used to be a ttl field which was removed as part of tc_verd in commit aec745e2c520 ("net-tc: remove unused tc_verd fields"). It was already unused by that time, due to remove earlier in commit c19ae86a510c ("tc: remove unused redirect ttl"). The first user of this field is netem, which increments tc_depth on duplicated packets before re-enqueueing them at the root qdisc. On re-entry, netem skips duplication for any skb with tc_depth already set, bounding recursion to a single level regardless of tree topology. The other user is mirred which increments it on each pass and limits to depth to MIRRED_DEFER_LIMIT (3). The new field was called ttl in earlier versions of this patch but renamed to tc_depth to avoid confusion with IP ttl. Note (looking at you Sashiko! Dont ignore me and continue bringing this up): 1. Since both mirred and netem utilize the same 2-bit tc_depth field it is possible when netem and mirred are used together that netem qdisc to skip the duplication step. This is a known trade-off, as a 2-bit field cannot independently track both features' recursion depths and it is not considered sane to have a setup that addresses both features on at the same time. 2. skb_scrub_packet does not clear tc_depth. This means a packet's loop history is preserved even across namespaces. While this might be restrictive for some topologies, it is also design intent to provide robustness against loops across namespaces. Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260525122556.973584-2-jhs@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-05-28Merge branch 'locking/context' into locking/corePeter Zijlstra
2026-05-28seqlock: Allow UBSAN_ALIGNMENT to fail optimizingHeiko Carstens
With gcc-15 and gcc-16 with UBSAN_ALIGNMENT enabled the compiler fails to inline and optimize __scoped_seqlock_bug() away on s390: s390x-16.1.0-ld: kernel/sched/build_policy.o: in function `__scoped_seqlock_next': /.../seqlock.h:1286:(.text+0x22030): undefined reference to `__scoped_seqlock_bug' Fix this by adding UBSAN_ALIGNMENT to the list of config options where a not inlined empty __scoped_seqlock_bug() is allowed. Closes: https://lore.kernel.org/r/20260515092057.810542-1-arnd@kernel.org/ Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260519110315.1385307-1-hca@linux.ibm.com
2026-05-28compiler-context-analysis: Bump required Clang version to 23Marco Elver
Clang 23 introduces several major improvements: 1. Support for multiple arguments in the `guarded_by` and `pt_guarded_by` attributes [1]. This allows defining variables protected by multiple context locks, where read access requires holding at least one lock (shared or exclusive), and write access requires holding all of them exclusively. 2. Function pointer support [2]. We can now add attributes to function pointers just like we do on normal functions. 3. A fix to use arrays of locks [3]. Each index is now correctly treated as a separate lock instance. 4. A fix for implicit member access in attributes [4]. This allows to use __guarded_by(&foo->lock) correctly. Overall that makes it worthwhile bumping the compiler version instead of trying to make both Clang 22 and later work while supporting these new features. Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://github.com/llvm/llvm-project/pull/186838 [1] Link: https://github.com/llvm/llvm-project/pull/191187 [2] Link: https://github.com/llvm/llvm-project/pull/148551 [3] Link: https://github.com/llvm/llvm-project/pull/194457 [4] Link: https://patch.msgid.link/20260515124426.2227783-1-elver@google.com
2026-05-28thunderbolt: Prevent XDomain delayed work use-after-free on disconnectMichael Bommarito
tb_xdp_handle_request() runs on system_wq and queues xd->state_work via queue_delayed_work() in three request handlers: PROPERTIES_CHANGED_REQUEST, UUID_REQUEST (via start_handshake), and LINK_STATE_CHANGE_REQUEST. Similarly, update_xdomain() queues xd->properties_changed_work when local properties change. Concurrently, tb_xdomain_remove() calls stop_handshake() which does cancel_delayed_work_sync() on both delayed works. Later, tb_xdomain_unregister() calls device_unregister() which eventually frees the xdomain. Since commit 559c1e1e0134 ("thunderbolt: Run tb_xdp_handle_request() in system workqueue") moved the request handler off tb->wq, the handler and the remove path are no longer serialized. If queue_delayed_work() executes after cancel_delayed_work_sync() but before the xdomain is freed, the delayed work fires on a freed object. Add xd->removing that tb_xdomain_remove() sets under xd->lock before calling stop_handshake(). Each external queue site holds the same lock and checks removing before calling queue_delayed_work(). This provides the mutual exclusion needed: either the queue site acquires the lock first and queues work that the subsequent cancel will see, or the remove path acquires the lock first and the queue site observes removing == true and skips the queue. Fixes: 559c1e1e0134 ("thunderbolt: Run tb_xdp_handle_request() in system workqueue") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2026-05-28Merge drm/drm-next into drm-misc-nextThomas Zimmermann
Backmerging to get GEM LRU fixes from commit 379e8f1c ("drm/gem: Make the GEM LRU lock part of drm_device") and other updates from v7.1-rc5. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2026-05-28Merge v7.1-rc5 into drm-nextSimona Vetter
Boris Brezillion needs the gem lru fixes 379e8f1ca5e9 ("drm/gem: Make the GEM LRU lock part of drm_device") backmerged for drm-misc-next. That also means we need to sort out the rename conflict in panthor with the fixup patch from Boris from drm-tip. Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
2026-05-28PCI: Add pci_ats_required() for CXL.cache capable devicesNicolin Chen
Controlled by IOMMU drivers, ATS can be enabled "on demand", when a given PASID on a device is attached to an I/O page table. This is working, even when a device has no translation on its RID (i.e., RID is IOMMU bypassed). However, certain PCIe devices require non-PASID ATS on their RID even when the RID is IOMMU bypassed. Call this "ATS always on" in IOMMU term. For example, CXL spec r4.0 notes in sec 3.2.5.13 Memory Type on CXL.cache: "To source requests on CXL.cache, devices need to get the Host Physical Address (HPA) from the Host by means of an ATS request on CXL.io." In other words, the CXL.cache capability requires ATS; otherwise, it can't access host physical memory. Introduce a new pci_ats_required() helper for the IOMMU driver to scan a PCI device and shift ATS policies between "on demand" and "always on". Add the support for CXL.cache devices first. Pre-CXL devices will be added in quirks.c file. Note that pci_ats_required() validates against pci_ats_supported(), so we ensure that untrusted devices (e.g. external ports) will not be always on. This maintains the existing ATS security policy regarding potential side- channel attacks via ATS. Cc: linux-cxl@vger.kernel.org Suggested-by: Vikram Sethi <vsethi@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-05-28iommu, debugobjects: avoid gcc-16.1 section mismatch warningsArnd Bergmann
gcc-16 has gained some more advanced inter-procedual optimization techniques that enable it to inline the dummy_tlb_add_page() and dummy_tlb_flush() function pointers into a specialized version of __arm_v7s_unmap: WARNING: modpost: vmlinux: section mismatch in reference: __arm_v7s_unmap+0x2cc (section: .text) -> dummy_tlb_add_page (section: .init.text) ERROR: modpost: Section mismatches detected. >From what I can tell, the transformation is correct, as this is only called when __arm_v7s_unmap() is called from arm_v7s_do_selftests(), which is also __init. Since __arm_v7s_unmap() however is not __init, gcc cannot inline the inner function calls directly. In debug_objects_selftest(), the same thing happens. Both the caller and the leaf function are __init, but the IPA pulls it into a non-init one: WARNING: modpost: vmlinux: section mismatch in reference: lookup_object_or_alloc+0x7c (section: .text.lookup_object_or_alloc) -> is_static_object (section: .init.text) Marking the affected functions as not "__init" would reliably avoid this issue but is not a good solution because it removes an otherwise correct annotation. I tried marking the functions as 'noinline', but that ended up not covering all the affected configurations. With some more experimenting, I found that marking these functions as __attribute__((noipa)) is both logical and reliable. In order to keep the syntax readable, add a custom macro for this in include/linux/compiler_attributes.h next to other related macros and use it to annotate both files. Link: https://lore.kernel.org/all/abRB6g-48ZX6Yl2r@willie-the-truck/ Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: linux-kbuild@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-05-27ipv6: guard against possible NULL deref in __in6_dev_stats_get()Eric Dumazet
dev_get_by_index_rcu() could return NULL if the original physical device is unregistered. Found by Sashiko. Fixes: e1ae5c2ea478 ("vrf: Increment Icmp6InMsgs on the original netdev") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260526145529.3587126-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-27compiler-clang.h: Drop explicit version number from "all" diagnostic macroNathan Chancellor
This is more consistent with what commit 7efa84b5cdd6 ("compiler-gcc.h: Introduce __diag_GCC_all") did for GCC. Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-16-b3b8cda46bdd@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-05-27compiler-clang.h: Remove __cleanup -Wunused-variable workaroundNathan Chancellor
Now that the minimum supported version of LLVM for building the kernel has been raised to 17.0.1, the redefinition of __cleanup with __maybe_unused added to it is unnecessary because the referenced LLVM change is present in all supported LLVM versions. Drop it. Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-15-b3b8cda46bdd@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-05-27mshv: Add conditional VMBus dependencyMichael Kelley
When the VMBus driver is not part of the kernel (CONFIG_HYPERV_VMBUS=n), the MSHV root driver fails to link: ERROR: modpost: "hv_vmbus_exists" [drivers/hv/mshv_root.ko] undefined! Fix this while meeting these requirements: * It must be possible to include the MSHV root driver without the VMBus driver. In such case, the MSHV root driver can be built-in to the kernel image, or it can be built as a separate module. * If both the MSHV root driver and the VMBus driver are present, the MSHV root driver and VMBus driver can both be built-in, or they can both be separate modules. Or the MSHV root driver can be a module while the VMBus driver can be built-in, but the reverse is disallowed. Regardless of the build choices, the VMBus driver must be loaded before the MSHV driver in order for the SynIC to be managed properly (see comments in the MSHV SynIC code). The fix has two parts: * Add a Kconfig entry for MSHV_ROOT to depend on HYPERV_VMBUS if HYPERV_VMBUS is present. The entry disallows MSHV_ROOT being built-in when HYPERV_VMBUS is a module, but without requiring that HYPERV_VMBUS be built. * Add a stub implementation of hv_vmbus_exists() for when the VMBus driver is not present so that the MSHV root driver has no module dependency on VMBus. When the VMBus driver *is* present, the module dependency ensures that the VMBus driver loads first when both are built as modules. Existing code ensures that the VMBus driver loads first if it is built-in. The VMBus driver uses subsys_initcall(), which is initcall level 4. The MSHV root driver uses module_init(), which becomes device_init() when built-in, and device_init() is initcall level 6. Reported-by: Arnd Bergmann <arnd@arndb.de> Closes: https://lore.kernel.org/all/20260520074044.923728-1-arnd@kernel.org/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Hardik Garg <hargar@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-05-27hyperv: Clean up and fix the guest ID comment in hvgdk.hDexuan Cui
Change the "64 bit" to "64-bit", and the "Os" to "OS". Remove the obsolete paragraph since the guideline has been published in the Hypervisor Top Level Functional Specification for many years. The "OS Type" is 0x1 for Linux, not 0x100. No functional change. Fixes: 83ba0c4f3f31 ("Drivers: hv: Cleanup the guest ID computation") Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-05-27ACPICA: Update version to 20260408Saket Dumbre
Update ACPI_CA_VERSION to match the 20260408 upstream release. Link: https://github.com/acpica/acpica/commit/232ff3f8ae1a Signed-off-by: Saket Dumbre <saket.dumbre@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/1881459.TLkxdtWsSY@rafael.j.wysocki
2026-05-27ACPICA: Update the copyright year to 2026Pawel Chmielewski
Update copyright notices in all ACPICA files. Link: https://github.com/acpica/acpica/commit/9def02549a9c Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/4379132.1IzOArtZ34@rafael.j.wysocki