summaryrefslogtreecommitdiff
path: root/arch/x86/Kconfig
AgeCommit message (Collapse)Author
2026-04-20x86/shstk: Prevent deadlock during shstk sigreturnRick Edgecombe
During sigreturn the shadow stack signal frame is popped. The kernel does this by reading the shadow stack using normal read accesses. When it can't assume the memory is shadow stack, it takes extra steps to makes sure it is reading actual shadow stack memory and not other normal readable memory. It does this by holding the mmap read lock while doing the access and checking the flags of the VMA. Unfortunately that is not safe. If the read of the shadow stack sigframe hits a page fault, the fault handler will try to recursively grab another mmap read lock. This normally works ok, but if a writer on another CPU is also waiting, the second read lock could fail and cause a deadlock. Fix this by not holding mmap lock during the read access to userspace. Instead use mmap_lock_speculate_...() to watch for changes between dropping mmap lock and the userspace access. Retry if anything grabbed an mmap write lock in between and could have changed the VMA. These mmap_lock_speculate_...() helpers use mm::mm_lock_seq, which is only available when PER_VMA_LOCK is configured. So make X86_USER_SHADOW_STACK depend on it. On x86, PER_VMA_LOCK is a default configuration for SMP kernels. So drop support for the other configs under the assumption that the !SMP shadow stack user base does not exist. Currently there is a check that skips the lookup work when the SSP can be assumed to be on a shadow stack. While reorganizing the function, remove the optimization to make the tricky code flows more common, such that issues like this cannot escape detection for so long. Fixes: 7fad2a432cd3 ("x86/shstk: Check that signal frame is shadow stack mem") Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Reviewed-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-14Merge tag 'x86_fred_for_v7.1_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 FRED updates from Borislav Petkov: "We made the FRED support an opt-in initially out of fear of it breaking machines left and right in the case of a hw bug in the first generation of machines supporting it. Now that that the FRED code has seen a lot of hammering, flip the logic to be opt-out as is the usual case with new hw features" * tag 'x86_fred_for_v7.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fred: Remove kernel log message when initializing exceptions x86/fred: Enable FRED by default
2026-04-14Merge tag 'x86-platform-2026-04-13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: - Remove M486/M486SX/ELAN support, first minimal step (Ingo Molnar) - Print AGESA string from DMI additional information entry (Yazen Ghannam, Mario Limonciello) - Improve and fix the DMI code (Mario Limonciello): - Correct an indexing error in <linux/dmi.h> - Adjust dmi_decode() to use enums <linux/dmi.h> - Add pr_fmt() for dmi_scan.c to fix & standardize the log prefixes * tag 'x86-platform-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/CPU/AMD: Print AGESA string from DMI additional information entry firmware: dmi: Add pr_fmt() for dmi_scan.c firmware: dmi: Adjust dmi_decode() to use enums firmware: dmi: Correct an indexing error in dmi.h x86/cpu: Remove M486/M486SX/ELAN support
2026-04-05mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVEDavid Hildenbrand (Arm)
Patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION". While working on memory hotplug code cleanups, I realized that CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE is not really required anymore. Changing that revealed some rather nasty looking CONFIG_MIGRATION handling. Let's clean that up by introducing a dedicated CONFIG_NUMA_MIGRATION option and reducing the dependencies that CONFIG_MIGRATION has. This patch (of 2): All architectures that select CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE also select CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG. So we can just remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE. For CONFIG_MIGRATION, make it depend on CONFIG_MEMORY_HOTREMOVE instead, and make CONFIG_MEMORY_HOTREMOVE select CONFIG_MIGRATION (just like CONFIG_CMA and CONFIG_COMPACTION already do). We'll clean up CONFIG_MIGRATION next. Link: https://lkml.kernel.org/r/20260319-config_migration-v1-0-42270124966f@kernel.org Link: https://lkml.kernel.org/r/20260319-config_migration-v1-1-42270124966f@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Byungchul Park <byungchul@sk.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-30x86/cpu: Remove M486/M486SX/ELAN supportIngo Molnar
In the x86 architecture we have various complicated hardware emulation facilities on x86-32 to support ancient 32-bit CPUs that very very few people are using with modern kernels. This compatibility glue is sometimes even causing problems that people spend time to resolve, which time could be spent on other things. As Linus recently remarked: > I really get the feeling that it's time to leave i486 support behind. > There's zero real reason for anybody to waste one second of > development effort on this kind of issue. Implement the first step and remove M486/M486SX/ELAN support: CONFIG_M486SX CONFIG_M486 CONFIG_MELAN [ There's no recent M486=y kernel package for any mainstream x86 32-bit distribution available that I've been able to find, so actual users should not be impacted, and any legacy users can keep using older kernels. ] Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ahmed S. Darwish <darwi@linutronix.de> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20251214084710.3606385-2-mingo@kernel.org
2026-03-27x86/fred: Enable FRED by defaultH. Peter Anvin
When FRED was added to the mainline kernel, it was set up as an explicit opt-in due to the risk of regressions before hardware was available publicly. Now, Panther Lake (Core Ultra 300 series) has been released, and benchmarking by Phoronix has shown that it provides a significant performance benefit on most workloads: https://www.phoronix.com/review/intel-fred-panther-lake Accordingly, enable FRED by default if the CPU supports it. FRED can of course still be disabled via the fred=off command line option. Touch up Kconfig help too. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://patch.msgid.link/20260325230151.1898287-2-hpa@zytor.com
2026-02-27x86/apic: Enable TSC coupled programming modeThomas Gleixner
The TSC deadline timer is directly coupled to the TSC and setting the next deadline is tedious as the clockevents core code converts the CLOCK_MONOTONIC based absolute expiry time to a relative expiry by reading the current time from the TSC. It converts that delta to cycles and hands the result to lapic_next_deadline(), which then has read to the TSC and add the delta to program the timer. The core code now supports coupled clock event devices and can provide the expiry time in TSC cycles directly without reading the TSC at all. This obviouly works only when the TSC is the current clocksource, but that's the default for all modern CPUs which implement the TSC deadline timer. If the TSC is not the current clocksource (e.g. early boot) then the core code falls back to the relative set_next_event() callback as before. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.076565985@kernel.org
2026-02-27x86: Inline TSC reads in timekeepingThomas Gleixner
Avoid the overhead of the indirect call for a single instruction to read the TSC. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.741886362@kernel.org
2026-02-12Merge tag 'mm-stable-2026-02-11-19-22' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "powerpc/64s: do not re-activate batched TLB flush" makes arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev) It adds a generic enter/leave layer and switches architectures to use it. Various hacks were removed in the process. - "zram: introduce compressed data writeback" implements data compression for zram writeback (Richard Chang and Sergey Senozhatsky) - "mm: folio_zero_user: clear page ranges" adds clearing of contiguous page ranges for hugepages. Large improvements during demand faulting are demonstrated (David Hildenbrand) - "memcg cleanups" tidies up some memcg code (Chen Ridong) - "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos stats" improves DAMOS stat's provided information, deterministic control, and readability (SeongJae Park) - "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few issues in the hugetlb cgroup charging selftests (Li Wang) - "Fix va_high_addr_switch.sh test failure - again" addresses several issues in the va_high_addr_switch test (Chunyu Hu) - "mm/damon/tests/core-kunit: extend existing test scenarios" improves the KUnit test coverage for DAMON (Shu Anzai) - "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to transiently return -EAGAIN (Shivank Garg) - "arch, mm: consolidate hugetlb early reservation" reworks and consolidates a pile of straggly code related to reservation of hugetlb memory from bootmem and creation of CMA areas for hugetlb (Mike Rapoport) - "mm: clean up anon_vma implementation" cleans up the anon_vma implementation in various ways (Lorenzo Stoakes) - "tweaks for __alloc_pages_slowpath()" does a little streamlining of the page allocator's slowpath code (Vlastimil Babka) - "memcg: separate private and public ID namespaces" cleans up the memcg ID code and prevents the internal-only private IDs from being exposed to userspace (Shakeel Butt) - "mm: hugetlb: allocate frozen gigantic folio" cleans up the allocation of frozen folios and avoids some atomic refcount operations (Kefeng Wang) - "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement of memory betewwn the active and inactive LRUs and adds auto-tuning of the ratio-based quotas and of monitoring intervals (SeongJae Park) - "Support page table check on PowerPC" makes CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan) - "nodemask: align nodes_and{,not} with underlying bitmap ops" makes nodes_and() and nodes_andnot() propagate the return values from the underlying bit operations, enabling some cleanup in calling code (Yury Norov) - "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up some DAMON internal interfaces (SeongJae Park) - "mm/khugepaged: cleanups and scan limit fix" does some cleanup work in khupaged and fixes a scan limit accounting issue (Shivank Garg) - "mm: balloon infrastructure cleanups" goes to town on the balloon infrastructure and its page migration function. Mainly cleanups, also some locking simplification (David Hildenbrand) - "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds additional tracepoints to the page reclaim code (Jiayuan Chen) - "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is part of Marco's kernel-wide migration from the legacy workqueue APIs over to the preferred unbound workqueues (Marco Crivellari) - "Various mm kselftests improvements/fixes" provides various unrelated improvements/fixes for the mm kselftests (Kevin Brodsky) - "mm: accelerate gigantic folio allocation" greatly speeds up gigantic folio allocation, mainly by avoiding unnecessary work in pfn_range_valid_contig() (Kefeng Wang) - "selftests/damon: improve leak detection and wss estimation reliability" improves the reliability of two of the DAMON selftests (SeongJae Park) - "mm/damon: cleanup kdamond, damon_call(), damos filter and DAMON_MIN_REGION" does some cleanup work in the core DAMON code (SeongJae Park) - "Docs/mm/damon: update intro, modules, maintainer profile, and misc" performs maintenance work on the DAMON documentation (SeongJae Park) - "mm: add and use vma_assert_stabilised() helper" refactors and cleans up the core VMA code. The main aim here is to be able to use the mmap write lock's lockdep state to perform various assertions regarding the locking which the VMA code requires (Lorenzo Stoakes) - "mm, swap: swap table phase II: unify swapin use" removes some old swap code (swap cache bypassing and swap synchronization) which wasn't working very well. Various other cleanups and simplifications were made. The end result is a 20% speedup in one benchmark (Kairui Song) - "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM available on 64-bit alpha, loongarch, mips, parisc, and um. Various cleanups were performed along the way (Qi Zheng) * tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits) mm/memory: handle non-split locks correctly in zap_empty_pte_table() mm: move pte table reclaim code to memory.c mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config um: mm: enable MMU_GATHER_RCU_TABLE_FREE parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE mips: mm: enable MMU_GATHER_RCU_TABLE_FREE LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles zsmalloc: make common caches global mm: add SPDX id lines to some mm source files mm/zswap: use %pe to print error pointers mm/vmscan: use %pe to print error pointers mm/readahead: fix typo in comment mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file() mm: refactor vma_map_pages to use vm_insert_pages mm/damon: unify address range representation with damon_addr_range mm/cma: replace snprintf with strscpy in cma_new_area ...
2026-02-10Merge tag 'x86_paravirt_for_v7.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt updates from Borislav Petkov: - A nice cleanup to the paravirt code containing a unification of the paravirt clock interface, taming the include hell by splitting the pv_ops structure and removing of a bunch of obsolete code (Juergen Gross) * tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted() x86/paravirt: Remove trailing semicolons from alternative asm templates x86/pvlocks: Move paravirt spinlock functions into own header x86/paravirt: Specify pv_ops array in paravirt macros x86/paravirt: Allow pv-calls outside paravirt.h objtool: Allow multiple pv_ops arrays x86/xen: Drop xen_mmu_ops x86/xen: Drop xen_cpu_ops x86/xen: Drop xen_irq_ops x86/paravirt: Move pv_native_*() prototypes to paravirt.c x86/paravirt: Introduce new paravirt-base.h header x86/paravirt: Move paravirt_sched_clock() related code into tsc.c x86/paravirt: Use common code for paravirt_steal_clock() riscv/paravirt: Use common code for paravirt_steal_clock() loongarch/paravirt: Use common code for paravirt_steal_clock() arm64/paravirt: Use common code for paravirt_steal_clock() arm/paravirt: Use common code for paravirt_steal_clock() sched: Move clock related paravirt code to kernel/sched paravirt: Remove asm/paravirt_api_clock.h x86/paravirt: Move thunk macros to paravirt_types.h ...
2026-02-10Merge tag 'x86_microcode_for_v7.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loader update from Borislav Petkov: - Since debugging the microcode loader makes sense on baremetal too (it was used in a guest only until now), extend it to be able to do that too * tag 'x86_microcode_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Allow loader debugging to be enabled on baremetal too
2026-02-10Merge tag 'x86_cache_for_v7.0_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Extend the resctrl machinery to support telemetry monitoring on Intel (Tony Luck) The practical usage of this is being able to tell how much energy or how much work can be attributed to a group of tasks tracked under a single idenitifier. Prepend this work with proper refactoring of resctrl domains handling code. * tag 'x86_cache_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) x86,fs/resctrl: Update documentation for telemetry events x86/resctrl: Enable RDT_RESOURCE_PERF_PKG fs/resctrl: Move RMID initialization to first mount x86,fs/resctrl: Compute number of RMIDs as minimum across resources fs/resctrl: Move allocation/free of closid_num_dirty_rmid[] x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG x86/resctrl: Add energy/perf choices to rdt boot option x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp() fs/resctrl: Refactor mkdir_mondata_subdir() x86/resctrl: Read telemetry events x86/resctrl: Find and enable usable telemetry events x86,fs/resctrl: Add architectural event pointer x86,fs/resctrl: Fill in details of events for performance and energy GUIDs x86/resctrl: Discover hardware telemetry events fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains x86,fs/resctrl: Add and initialize a resource for package scope monitoring x86,fs/resctrl: Add an architectural hook called for first mount x86,fs/resctrl: Support binary fixed point event counters x86,fs/resctrl: Handle events that can be read from any CPU ...
2026-02-10Merge tag 'x86-cpu-2026-02-09' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: - CPU model updates (Andrew Cooper): - amd: Correct the microcode table for Zenbleed - amd: Use ZEN_MODEL_STEP_UCODE() for erratum_1386_microcode[] - Drop vestigial PBE logic in AMD/Hygon/Centaur/Cyrix - tsx: Set default TSX mode to auto (Nikolay Borisov) - Drop unused Kconfig symbol X86_P6_NOP (Randy Dunlap) * tag 'x86-cpu-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsx: Set default TSX mode to auto x86/cpu: Drop unused Kconfig symbol X86_P6_NOP x86/cpu: Drop vestigial PBE logic in AMD/Hygon/Centaur/Cyrix x86/cpu/amd: Use ZEN_MODEL_STEP_UCODE() for erratum_1386_microcode[] x86/cpu/amd: Correct the microcode table for Zenbleed
2026-02-06mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREEQi Zheng
The PT_RECLAIM can work on all architectures that support MMU_GATHER_RCU_TABLE_FREE, except for those that have selected HAVE_ARCH_TLB_REMOVE_TABLE,so make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE && !HAVE_ARCH_TLB_REMOVE_TABLE. BTW, change PT_RECLAIM to be enabled by default, since nobody should want to turn it off. Link: https://lkml.kernel.org/r/83b034810935a9ff18e425b085e065bb0acb28f3.1769515122.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-28bpf,x86: Use single ftrace_ops for direct callsJiri Olsa
Using single ftrace_ops for direct calls update instead of allocating ftrace_ops object for each trampoline. With single ftrace_ops object we can use update_ftrace_direct_* api that allows multiple ip sites updates on single ftrace_ops object. Adding HAVE_SINGLE_FTRACE_DIRECT_OPS config option to be enabled on each arch that supports this. At the moment we can enable this only on x86 arch, because arm relies on ftrace_ops object representing just single trampoline image (stored in ftrace_ops::direct_call). Archs that do not support this will continue to use *_ftrace_direct api. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-10-jolsa@kernel.org
2026-01-20mm: introduce CONFIG_ARCH_HAS_LAZY_MMU_MODEKevin Brodsky
Architectures currently opt in for implementing lazy_mmu helpers by defining __HAVE_ARCH_ENTER_LAZY_MMU_MODE. In preparation for introducing a generic lazy_mmu layer that will require storage in task_struct, let's switch to a cleaner approach: instead of defining a macro, select a CONFIG option. This patch introduces CONFIG_ARCH_HAS_LAZY_MMU_MODE and has each arch select it when it implements lazy_mmu helpers. __HAVE_ARCH_ENTER_LAZY_MMU_MODE is removed and <linux/pgtable.h> relies on the new CONFIG instead. On x86, lazy_mmu helpers are only implemented if PARAVIRT_XXL is selected. This creates some complications in arch/x86/boot/, because a few files manually undefine PARAVIRT* options. As a result <asm/paravirt.h> does not define the lazy_mmu helpers, but this breaks the build as <linux/pgtable.h> only defines them if !CONFIG_ARCH_HAS_LAZY_MMU_MODE. There does not seem to be a clean way out of this - let's just undefine that new CONFIG too. Link: https://lkml.kernel.org/r/20251215150323.2218608-7-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-14x86/microcode/AMD: Allow loader debugging to be enabled on baremetal tooBorislav Petkov (AMD)
Debugging the loader on baremetal does make sense, so enable it there too. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20260108165028.27417-1-bp@kernel.org
2026-01-12x86/paravirt: Use common code for paravirt_steal_clock()Juergen Gross
Remove the arch-specific variant of paravirt_steal_clock() and use the common one instead. With all archs supporting Xen now having been switched to the common variant, including paravirt.h can be dropped from drivers/xen/time.c. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-12-jgross@suse.com
2026-01-12x86/paravirt: Remove PARAVIRT_DEBUG config optionJuergen Gross
The only effect of CONFIG_PARAVIRT_DEBUG set is that instead of doing a call using a NULL pointer a BUG() is being raised. While the BUG() will be a little bit easier to analyse, the call of NULL isn't really that difficult to find the reason for. Remove the config option to make paravirt coding a little bit less annoying. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-4-jgross@suse.com
2026-01-09x86/resctrl: Read telemetry eventsTony Luck
Introduce intel_aet_read_event() to read telemetry events for resource RDT_RESOURCE_PERF_PKG. There may be multiple aggregators tracking each package, so scan all of them and add up all counters. Aggregators may return an invalid data indication if they have received no records for a given RMID. The user will see "Unavailable" if none of the aggregators on a package provide valid counts. Resctrl now uses readq() so depends on X86_64. Update Kconfig. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-09x86/resctrl: Discover hardware telemetry eventsTony Luck
Each CPU collects data for telemetry events that it sends to the nearest telemetry event aggregator either when the value of MSR_IA32_PQR_ASSOC.RMID changes, or when a two millisecond timer expires. There is a feature type ("energy" or "perf"), GUID, and MMIO region associated with each aggregator. This combination links to an XML description of the set of telemetry events tracked by the aggregator. XML files are published by Intel in a GitHub repository¹. The telemetry event aggregators maintain per-RMID per-event counts of the total seen for all the CPUs. There may be multiple telemetry event aggregators per package. There are separate sets of aggregators for each feature type. Aggregators in a set may have different GUIDs. All aggregators with the same feature type and GUID are symmetric keeping counts for the same set of events for the CPUs that provide data to them. The XML file for each aggregator provides the following information: 0) Feature type of the events ("perf" or "energy") 1) Which telemetry events are tracked by the aggregator. 2) The order in which the event counters appear for each RMID. 3) The value type of each event counter (integer or fixed-point). 4) The number of RMIDs supported. 5) Which additional aggregator status registers are included. 6) The total size of the MMIO region for an aggregator. Introduce struct event_group that condenses the relevant information from an XML file. Hereafter an "event group" refers to a group of events of a particular feature type (event_group::pfname set to "energy" or "perf") with a particular GUID. Use event_group::pfname to determine the feature id needed to obtain the aggregator details. It will later be used in console messages and with the rdt= boot parameter. The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events. This driver provides intel_pmt_get_regions_by_feature() to list all available telemetry event aggregators of a given feature type. The list includes the "guid", the base address in MMIO space for the region where the event counters are exposed, and the package id where the all the CPUs that report to this aggregator are located. Call INTEL_PMT_TELEMETRY's intel_pmt_get_regions_by_feature() for each event group to obtain a private copy of that event group's aggregator data. Duplicate the aggregator data between event groups that have the same feature type but different GUID. Further processing on this private copy will be unique to the event group. ¹https://github.com/intel/Intel-PMT [ bp: Zap text explaining the code, s/guid/GUID/g ] Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
2026-01-06x86/tsx: Set default TSX mode to autoNikolay Borisov
At SUSE we've been releasing our kernels with TSX enabled for the past 6 years and some customers have started to rely on it. Furthermore, the last known vulnerability concerning TSX was TAA (CVE-2019-11135) and a significant amount time has passed since then without anyone reporting any issues. Intel has released numerous processors which do not have the TAA vulnerability (Cooper/Ice Lake, Sapphire/Emerald/Granite Rappids) yet TSX remains being disabled by default. The main aim of this patch is to reduce the divergence between SUSE's configuration and the upstream by switching the default TSX mode to auto. I believe this strikes the right balance between keeping it enabled where appropriate (i.e every machine which doesn't contain the TAA vulnerability) and disabling it preventively. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251112190548.750746-1-nik.borisov@suse.com
2025-12-05Merge tag 'mm-stable-2025-12-03-21-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki) Rework the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT) "ksm: fix exec/fork inheritance" (xu xin) Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec "mm/zswap: misc cleanup of code and documentations" (SeongJae Park) Some light maintenance work on the zswap code "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira) Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn) Minor alterations to the page allocator's per-cpu-pages feature "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra) Address a scalability issue in userfaultfd's UFFDIO_MOVE operation "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov) "drivers/base/node: fold node register and unregister functions" (Donet Tom) Clean up the NUMA node handling code a little "mm: some optimizations for prot numa" (Kefeng Wang) Cleanups and small optimizations to the NUMA allocation hinting code "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn) Address long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang) Remove some now-unnecessary work from page reclaim "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park) Enhance the DAMOS auto-tuning feature "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan) Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes) Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare() "Fix stale IOTLB entries for kernel address space" (Lu Baolu) Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang) Clean up and optimize the folio splitting code "mm, swap: misc cleanup and bugfix" (Kairui Song) Some cleanups and a minor fix in the swap discard code "mm/damon: misc documentation fixups" (SeongJae Park) "mm/damon: support pin-point targets removal" (SeongJae Park) Permit userspace to remove a specific monitoring target in the middle of the current targets list "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo) A couple of cleanups related to mm header file inclusion "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He) improve the selection of swap devices for NUMA machines "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista) Change the memory block labels from macros to enums so they will appear in kernel debug info "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes) Address an inefficiency when KSM unmerges an address range "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park) Fix leaks and unhandled malloc() failures in DAMON userspace unit tests "some cleanups for pageout()" (Baolin Wang) Clean up a couple of minor things in the page scanner's writeback-for-eviction code "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu) Move hugetlb's sysfs/sysctl handling code into a new file "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes) Make the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes) Reduce mmap lock contention for callers performing VMA guard region operations "vma_start_write_killable" (Matthew Wilcox) Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park) Add additional userspace testing of DAMON's "commit" feature "mm/damon: misc cleanups" (SeongJae Park) "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes) Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another "mm: support device-private THP" (Balbir Singh) Introduce support for Transparent Huge Page (THP) migration in zone device-private memory "Optimize folio split in memory failure" (Zi Yan) "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang) Some more cleanups in the folio splitting code "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes) Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t "reparent the THP split queue" (Muchun Song) Reparent the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources "unify PMD scan results and remove redundant cleanup" (Wei Yang) A little cleanup in the hugepage collapse code "zram: introduce writeback bio batching" (Sergey Senozhatsky) Improve zram writeback efficiency by introducing batched bio writeback support "memcg: cleanup the memcg stats interfaces" (Shakeel Butt) Clean up our handling of the interrupt safety of some memcg stats "make vmalloc gfp flags usage more apparent" (Vishal Moola) Clean up vmalloc's handling of incoming GFP flags "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang) Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension "mm: swap: small fixes and comment cleanups" (Youngjun Park) Fix a small bug and clean up some of the swap code "initial work on making VMA flags a bitmap" (Lorenzo Stoakes) Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park) Address a possible bug in the swap discard code and clean things up a little [ This merge also reverts commit ebb9aeb980e5 ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ] * tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm: fix vma_start_write_killable() signal handling mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate mm/swapfile: fix list iteration when next node is removed during discard fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling mm/kfence: add reboot notifier to disable KFENCE on shutdown memcg: remove inc/dec_lruvec_kmem_state helpers selftests/mm/uffd: initialize char variable to Null mm: fix DEBUG_RODATA_TEST indentation in Kconfig mm: introduce VMA flags bitmap type tools/testing/vma: eliminate dependency on vma->__vm_flags mm: simplify and rename mm flags function for clarity mm: declare VMA flags by bit zram: fix a spelling mistake mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity mm/vmscan: skip increasing kswapd_failures when reclaim was boosted pagemap: update BUDDY flag documentation mm: swap: remove scan_swap_map_slots() references from comments mm: swap: change swap_alloc_slow() to void mm, swap: remove redundant comment for read_swap_cache_async mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational ...
2025-12-03Merge tag 'bpf-next-6.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Convert selftests/bpf/test_tc_edt and test_tc_tunnel from .sh to test_progs runner (Alexis Lothoré) - Convert selftests/bpf/test_xsk to test_progs runner (Bastien Curutchet) - Replace bpf memory allocator with kmalloc_nolock() in bpf_local_storage (Amery Hung), and in bpf streams and range tree (Puranjay Mohan) - Introduce support for indirect jumps in BPF verifier and x86 JIT (Anton Protopopov) and arm64 JIT (Puranjay Mohan) - Remove runqslower bpf tool (Hoyeon Lee) - Fix corner cases in the verifier to close several syzbot reports (Eduard Zingerman, KaFai Wan) - Several improvements in deadlock detection in rqspinlock (Kumar Kartikeya Dwivedi) - Implement "jmp" mode for BPF trampoline and corresponding DYNAMIC_FTRACE_WITH_JMP. It improves "fexit" program type performance from 80 M/s to 136 M/s. With Steven's Ack. (Menglong Dong) - Add ability to test non-linear skbs in BPF_PROG_TEST_RUN (Paul Chaignon) - Do not let BPF_PROG_TEST_RUN emit invalid GSO types to stack (Daniel Borkmann) - Generalize buildid reader into bpf_dynptr (Mykyta Yatsenko) - Optimize bpf_map_update_elem() for map-in-map types (Ritesh Oedayrajsingh Varma) - Introduce overwrite mode for BPF ring buffer (Xu Kuohai) * tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (169 commits) bpf: optimize bpf_map_update_elem() for map-in-map types bpf: make kprobe_multi_link_prog_run always_inline selftests/bpf: do not hardcode target rate in test_tc_edt BPF program selftests/bpf: remove test_tc_edt.sh selftests/bpf: integrate test_tc_edt into test_progs selftests/bpf: rename test_tc_edt.bpf.c section to expose program type selftests/bpf: Add success stats to rqspinlock stress test rqspinlock: Precede non-head waiter queueing with AA check rqspinlock: Disable spinning for trylock fallback rqspinlock: Use trylock fallback when per-CPU rqnode is busy rqspinlock: Perform AA checks immediately rqspinlock: Enclose lock/unlock within lock entry acquisitions bpf: Remove runqslower tool selftests/bpf: Remove usage of lsm/file_alloc_security in selftest bpf: Disable file_alloc_security hook bpf: check for insn arrays in check_ptr_alignment bpf: force BPF_F_RDONLY_PROG on insn array creation bpf: Fix exclusive map memory leak selftests/bpf: Make CS length configurable for rqspinlock stress test selftests/bpf: Add lock wait time stats to rqspinlock stress test ...
2025-12-01Merge tag 'core-bugs-2025-12-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull bug handling infrastructure updates from Ingo Molnar: "Core updates: - Improve WARN(), which has vararg printf like arguments, to work with the x86 #UD based WARN-optimizing infrastructure by hiding the format in the bug_table and replacing this first argument with the address of the bug-table entry, while making the actual function that's called a UD1 instruction (Peter Zijlstra) - Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch (Ingo Molnar, s390 support by Heiko Carstens) Fixes and cleanups: - bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens) - <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS (Peter Zijlstra)" * tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERS x86/bug: Fix BUG_FORMAT vs KASLR x86_64/bug: Inline the UD1 x86/bug: Implement WARN_ONCE() x86_64/bug: Implement __WARN_printf() x86/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED x86/bug: Add BUG_FORMAT basics bug: Allow architectures to provide __WARN_printf() bug: Implement WARN_ON() using __WARN_FLAGS() bug: Add report_bug_entry() bug: Add BUG_FORMAT_ARGS infrastructure bug: Clean up CONFIG_GENERIC_BUG_RELATIVE_POINTERS bug: Add BUG_FORMAT infrastructure x86: Rework __bug_table helpers bugs/s390: Remove private WARN_ON() implementation bugs/core: Reorganize fields in the first line of WARNING output, add ->comm[] output bugs/sh: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output bugs/parisc: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output bugs/riscv: Concatenate 'cond_str' with '__FILE__' in __BUG_FLAGS(), to extend WARN_ON/BUG_ON output bugs/riscv: Pass in 'cond_str' to __BUG_FLAGS() ...
2025-12-01Merge tag 'perf-core-2025-12-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: "Callchain support: - Add support for deferred user-space stack unwinding for perf, enabled on x86. (Peter Zijlstra, Steven Rostedt) - unwind_user/x86: Enable frame pointer unwinding on x86 (Josh Poimboeuf) x86 PMU support and infrastructure: - x86/insn: Simplify for_each_insn_prefix() (Peter Zijlstra) - x86/insn,uprobes,alternative: Unify insn_is_nop() (Peter Zijlstra) Intel PMU driver: - Large series to prepare for and implement architectural PEBS support for Intel platforms such as Clearwater Forest (CWF) and Panther Lake (PTL). (Dapeng Mi, Kan Liang) - Check dynamic constraints (Kan Liang) - Optimize PEBS extended config (Peter Zijlstra) - cstates: - Remove PC3 support from LunarLake (Zhang Rui) - Add Pantherlake support (Zhang Rui) - Clearwater Forest support (Zide Chen) AMD PMU driver: - x86/amd: Check event before enable to avoid GPF (George Kennedy) Fixes and cleanups: - task_work: Fix NMI race condition (Peter Zijlstra) - perf/x86: Fix NULL event access and potential PEBS record loss (Dapeng Mi) - Misc other fixes and cleanups (Dapeng Mi, Ingo Molnar, Peter Zijlstra)" * tag 'perf-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) perf/x86/intel: Fix and clean up intel_pmu_drain_arch_pebs() type use perf/x86/intel: Optimize PEBS extended config perf/x86/intel: Check PEBS dyn_constraints perf/x86/intel: Add a check for dynamic constraints perf/x86/intel: Add counter group support for arch-PEBS perf/x86/intel: Setup PEBS data configuration and enable legacy groups perf/x86/intel: Update dyn_constraint base on PEBS event precise level perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR perf/x86/intel: Process arch-PEBS records or record fragments perf/x86/intel/ds: Factor out PEBS group processing code to functions perf/x86/intel/ds: Factor out PEBS record processing code to functions perf/x86/intel: Initialize architectural PEBS perf/x86/intel: Correct large PEBS flag check perf/x86/intel: Replace x86_pmu.drain_pebs calling with static call perf/x86: Fix NULL event access and potential PEBS record loss perf/x86: Remove redundant is_x86_event() prototype entry,unwind/deferred: Fix unwind_reset_info() placement unwind_user/x86: Fix arch=um build perf: Support deferred user unwind unwind_user/x86: Teach FP unwind about start of function ...
2025-11-27x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERSPeter Zijlstra
Linus figured less #ifdef is more better and making x86-32 use GENERIC_BUG_RELATIVE_POINTERS removes one layer of macro magic from the bug.h bits. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2025-11-24x86/ftrace: Implement DYNAMIC_FTRACE_WITH_JMPMenglong Dong
Implement the DYNAMIC_FTRACE_WITH_JMP for x86_64. In ftrace_call_replace, we will use JMP32_INSN_OPCODE instead of CALL_INSN_OPCODE if the address should use "jmp". Meanwhile, adjust the direct call in the ftrace_regs_caller. The RSB is balanced in the "jmp" mode. Take the function "foo" for example: original_caller: call foo -> foo: call fentry -> fentry: [do ftrace callbacks ] move tramp_addr to stack RET -> tramp_addr tramp_addr: [..] call foo_body -> foo_body: [..] RET -> back to tramp_addr [..] RET -> back to original_caller Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20251118123639.688444-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-16iommu/sva: invalidate stale IOTLB entries for kernel address spaceLu Baolu
Introduce a new IOMMU interface to flush IOTLB paging cache entries for the CPU kernel address space. This interface is invoked from the x86 architecture code that manages combined user and kernel page tables, specifically before any kernel page table page is freed and reused. This addresses the main issue with vfree() which is a common occurrence and can be triggered by unprivileged users. While this resolves the primary problem, it doesn't address some extremely rare case related to memory unplug of memory that was present as reserved memory at boot, which cannot be triggered by unprivileged users. The discussion can be found at the link below. Enable SVA on x86 architecture since the IOMMU can now receive notification to flush the paging cache before freeing the CPU kernel page table pages. Link: https://lkml.kernel.org/r/20251022082635.2462433-9-baolu.lu@linux.intel.com Link: https://lore.kernel.org/linux-iommu/04983c62-3b1d-40d4-93ae-34ca04b827e5@intel.com/ Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Suggested-by: Jann Horn <jannh@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-29unwind_user/x86: Enable frame pointer unwinding on x86Josh Poimboeuf
Use ARCH_INIT_USER_FP_FRAME to describe how frame pointers are unwound on x86, and enable CONFIG_HAVE_UNWIND_USER_FP accordingly so the unwind_user interfaces can be used. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20250827193828.347397433@kernel.org
2025-10-14livepatch: Add CONFIG_KLP_BUILDJosh Poimboeuf
In preparation for introducing klp-build, add a new CONFIG_KLP_BUILD option. The initial version will only be supported on x86-64. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-11Merge tag 'x86_cleanups_for_v6.18_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - Simplify inline asm flag output operands now that the minimum compiler version supports the =@ccCOND syntax - Remove a bunch of AS_* Kconfig symbols which detect assembler support for various instruction mnemonics now that the minimum assembler version supports them all - The usual cleanups all over the place * tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__ x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h> x86/mtrr: Remove license boilerplate text with bad FSF address x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h> x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h> x86/entry/fred: Push __KERNEL_CS directly x86/kconfig: Remove CONFIG_AS_AVX512 crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ crypto: X86 - Remove CONFIG_AS_VAES crypto: x86 - Remove CONFIG_AS_GFNI x86/kconfig: Drop unused and needless config X86_64_SMP
2025-10-04Merge tag 'x86_tdx_for_6.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 TDX updates from Dave Hansen: "The biggest change here is making TDX and kexec play nicely together. Before this, the memory encryption hardware (which doesn't respect cache coherency) could write back old cachelines on top of data in the new kernel, so kexec and TDX were made mutually exclusive. This removes the limitation. There is also some work to tighten up a hardware bug workaround and some MAINTAINERS updates. - Make TDX and kexec work together - Skip TDX bug workaround when the bug is not present - Update maintainers entries" * tag 'x86_tdx_for_6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/virt/tdx: Use precalculated TDVPR page physical address KVM/TDX: Explicitly do WBINVD when no more TDX SEAMCALLs x86/virt/tdx: Update the kexec section in the TDX documentation x86/virt/tdx: Remove the !KEXEC_CORE dependency x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL x86/sme: Use percpu boolean to control WBINVD during kexec x86/kexec: Consolidate relocate_kernel() function parameters x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present x86/tdx: Tidy reset_pamt functions x86/tdx: Eliminate duplicate code in tdx_clear_page() MAINTAINERS: Add KVM mail list to the TDX entry MAINTAINERS: Add Rick Edgecombe as a TDX reviewer MAINTAINERS: Update the file list in the TDX entry.
2025-10-02Merge tag 'mm-stable-2025-10-01-19-00' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...
2025-09-30Merge tag 'timers-vdso-2025-09-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull VDSO updates from Thomas Gleixner: - Further consolidation of the VDSO infrastructure and the common data store - Simplification of the related Kconfig logic - Improve the VDSO selftest suite * tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests: vDSO: Drop vdso_test_clock_getres selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64() selftests: vDSO: vdso_test_abi: Test CPUTIME clocks selftests: vDSO: vdso_test_abi: Use explicit indices for name array selftests: vDSO: vdso_test_abi: Drop clock availability tests selftests: vDSO: vdso_test_abi: Use ksft_finished() selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper vdso: Add struct __kernel_old_timeval forward declaration to gettime.h vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO vdso: Drop Kconfig GENERIC_VDSO_TIME_NS vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE vdso: Drop kconfig GENERIC_COMPAT_VDSO vdso: Drop kconfig GENERIC_VDSO_32 riscv: vdso: Untangle Kconfig logic time: Build generic update_vsyscall() only with generic time vDSO vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs vdso: Move ENABLE_COMPAT_VDSO from core to arm64 ARM: VDSO: Remove cntvct_ok global variable vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
2025-09-30Merge tag 'core-core-2025-09-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull TIF bit unification updates from Thomas Gleixner: "A set of changes to consolidate the generic TIF (thread info flag) bits accross architectures. All architectures define the same set of generic TIF bits. This makes it pointlessly hard to add a new generic TIF bit or to change an existing one. Provide a generic variant and convert the architectures which utilize the generic entry code over to use it. The TIF space is divided into 16 generic bits and 16 architecture specific bits, which turned out to provide enough space on both sides" * tag 'core-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: LoongArch: Fix bitflag conflict for TIF_FIXADE riscv: Use generic TIF bits loongarch: Use generic TIF bits s390/entry: Remove unused TIF flags s390: Use generic TIF bits x86: Use generic TIF bits asm-generic: Provide generic TIF infrastructure
2025-09-30Merge tag 'x86_apic_for_v6.18_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV and apic updates from Borislav Petkov: - Add functionality to provide runtime firmware updates for the non-x86 parts of an AMD platform like the security processor (ASP) firmware, modules etc, for example. The intent being that these updates are interim, live fixups before a proper BIOS update can be attempted - Add guest support for AMD's Secure AVIC feature which gives encrypted guests the needed protection against a malicious hypervisor generating unexpected interrupts and injecting them into such guest, thus interfering with its operation in an unexpected and negative manner. The advantage of this scheme is that the guest determines which interrupts and when to accept them vs leaving that to the benevolence (or not) of the hypervisor - Strictly separate the startup code from the rest of the kernel where former is executed from the initial 1:1 mapping of memory. The problem was that the toolchain-generated version of the code was being executed from a different mapping of memory than what was "assumed" during code generation, needing an ever-growing pile of fixups for absolute memory references which are invalid in the early, 1:1 memory mapping during boot. The major advantage of this is that there's no need to check the 1:1 mapping portion of the code for absolute relocations anymore and get rid of the RIP_REL_REF() macro sprinkling all over the place. For more info, see Ard's very detailed writeup on this [1] - The usual cleanups and fixes Link: https://lore.kernel.org/r/CAMj1kXEzKEuePEiHB%2BHxvfQbFz0sTiHdn4B%2B%2BzVBJ2mhkPkQ4Q@mail.gmail.com [1] * tag 'x86_apic_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits) x86/boot: Drop erroneous __init annotation from early_set_pages_state() crypto: ccp - Add AMD Seamless Firmware Servicing (SFS) driver crypto: ccp - Add new HV-Fixed page allocation/free API x86/sev: Add new dump_rmp parameter to snp_leak_pages() API x86/startup/sev: Document the CPUID flow in the boot #VC handler objtool: Ignore __pi___cfi_ prefixed symbols x86/sev: Zap snp_abort() x86/apic/savic: Do not use snp_abort() x86/boot: Get rid of the .head.text section x86/boot: Move startup code out of __head section efistub/x86: Remap inittext read-execute when needed x86/boot: Create a confined code area for startup code x86/kbuild: Incorporate boot/startup/ via Kbuild makefile x86/boot: Revert "Reject absolute references in .head.text" x86/boot: Check startup code for absence of absolute relocations objtool: Add action to check for absence of absolute relocations x86/sev: Export startup routines for later use x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object x86/sev: Provide PIC aliases for SEV related data objects x86/boot: Provide PIC aliases for 5-level paging related constants ...
2025-09-30Merge tag 'x86_cpu_for_v6.18_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpuid updates from Borislav Petkov: - Make UMIP instruction detection more robust - Correct and cleanup AMD CPU topology detection; document the relevant CPUID leaves topology parsing precedence on AMD - Add support for running the kernel as guest on FreeBSD's Bhyve hypervisor - Cleanups and improvements * tag 'x86_cpu_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/umip: Fix decoding of register forms of 0F 01 (SGDT and SIDT aliases) x86/umip: Check that the instruction opcode is at least two bytes Documentation/x86/topology: Detail CPUID leaves used for topology enumeration x86/cpu/topology: Define AMD64_CPUID_EXT_FEAT MSR x86/cpu/topology: Check for X86_FEATURE_XTOPOLOGY instead of passing has_xtopology x86/cpu/cacheinfo: Simplify cacheinfo_amd_init_llc_id() using _cpuid4_info x86/cpu: Rename and move CPU model entry for Diamond Rapids x86/cpu: Detect FreeBSD Bhyve hypervisor
2025-09-30Merge tag 'x86_microcode_for_v6.18_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode loading updates from Borislav Petkov: - Add infrastructure to be able to debug the microcode loader in a guest - Refresh Intel old microcode revisions * tag 'x86_microcode_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Add microcode loader debugging functionality x86/microcode: Add microcode= cmdline parsing x86/microcode/intel: Refresh the revisions that determine old_microcode
2025-09-30Merge tag 'x86_build_for_v6.18_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Borislav Petkov: - Remove and simplify a bunch of cc-option and compiler version checks in the build machinery now that the minimal version of both compilers supports them * tag 'x86_build_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Kconfig: Clean up LLVM version checks in IBT configurations x86/build: Remove cc-option from -mskip-rax-setup x86/build: Remove cc-option from -mno-fp-ret-in-387 x86/build: Clean up stack alignment flags in CC_FLAGS_FPU x86/build: Remove cc-option from stack alignment flags x86/build: Remove cc-option for GCC retpoline flags
2025-09-30Merge tag 'sched-core-2025-09-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core scheduler changes: - Make migrate_{en,dis}able() inline, to improve performance (Menglong Dong) - Move STDL_INIT() functions out-of-line (Peter Zijlstra) - Unify the SCHED_{SMT,CLUSTER,MC} Kconfig (Peter Zijlstra) Fair scheduling: - Defer throttling to when tasks exit to user-space, to reduce the chance & impact of throttle-preemption with held locks and other resources (Aaron Lu, Valentin Schneider) - Get rid of sched_domains_curr_level hack for tl->cpumask(), as the warning was getting triggered on certain topologies (Peter Zijlstra) Misc cleanups & fixes: - Header cleanups (Menglong Dong) - Fix race in push_dl_task() (Harshit Agarwal)" * tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix some typos in include/linux/preempt.h sched: Make migrate_{en,dis}able() inline rcu: Replace preempt.h with sched.h in include/linux/rcupdate.h arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c sched/fair: Do not balance task to a throttled cfs_rq sched/fair: Do not special case tasks in throttled hierarchy sched/fair: update_cfs_group() for throttled cfs_rqs sched/fair: Propagate load for throttled cfs_rq sched/fair: Get rid of throttled_lb_pair() sched/fair: Task based throttle time accounting sched/fair: Switch to task based throttle model sched/fair: Implement throttle task work and related helpers sched/fair: Add related data structure for task based throttle sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig sched: Move STDL_INIT() functions out-of-line sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask() sched/deadline: Fix race in push_dl_task()
2025-09-29Merge tag 'hardening-v6.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: "One notable addition is the creation of the 'transitional' keyword for kconfig so CONFIG renaming can go more smoothly. This has been a long-standing deficiency, and with the renaming of CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI support), this came up again. The breadth of the diffstat is mainly this renaming. - Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva) - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure (Junjie Cao) - Add str_assert_deassert() helper (Lad Prabhakar) - gcc-plugins: Remove TODO_verify_il for GCC >= 16 - kconfig: Fix BrokenPipeError warnings in selftests - kconfig: Add transitional symbol attribute for migration support - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI" * tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lib/string_choices: Add str_assert_deassert() helper kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI kconfig: Add transitional symbol attribute for migration support kconfig: Fix BrokenPipeError warnings in selftests gcc-plugins: Remove TODO_verify_il for GCC >= 16 stddef: Introduce __TRAILING_OVERLAP() stddef: Remove token-pasting in TRAILING_OVERLAP() lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
2025-09-24kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFIKees Cook
The kernel's CFI implementation uses the KCFI ABI specifically, and is not strictly tied to a particular compiler. In preparation for GCC supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with associated options). Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will enable CONFIG_CFI during olddefconfig. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-22x86/Kconfig: Reenable PTDUMP on i386Alexander Popov
The commit f9aad622006bd64c ("mm: rename GENERIC_PTDUMP and PTDUMP_CORE") has broken PTDUMP and the Kconfig options that use it on ARCH=i386, including CONFIG_DEBUG_WX. CONFIG_GENERIC_PTDUMP was renamed into CONFIG_ARCH_HAS_PTDUMP, but it was mistakenly moved from "config X86" to "config X86_64". That made PTDUMP unavailable for i386. Move CONFIG_ARCH_HAS_PTDUMP back to "config X86" to fix it. [ bp: Massage commit message. ] Fixes: f9aad622006bd64c ("mm: rename GENERIC_PTDUMP and PTDUMP_CORE") Signed-off-by: Alexander Popov <alex.popov@linux.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org
2025-09-21x86/Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"David Hildenbrand
Now handled by the core automatically once SPARSEMEM_VMEMMAP_ENABLE is selected. Link: https://lkml.kernel.org/r/20250901150359.867252-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-17x86: Use generic TIF bitsThomas Gleixner
No point in defining generic items and the upcoming RSEQ optimizations are only available with this _and_ the generic entry infrastructure, which is already used by x86. So no further action required here. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2025-09-16Merge branch 'x86/urgent' into x86/apic, to resolve conflictIngo Molnar
Conflicts: arch/x86/include/asm/sev.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-09-15x86/cpu: Detect FreeBSD Bhyve hypervisorDavid Woodhouse
Detect the Bhyve hypervisor and enable 15-bit MSI support if available. Detecting Bhyve used to be a purely cosmetic issue of the kernel printing 'Hypervisor detected: Bhyve' at boot time. But FreeBSD 15.0 will support¹ the 15-bit MSI enlightenment to support more than 255 vCPUs (http://david.woodhou.se/ExtDestId.pdf) which means there's now actually some functional reason to do so. ¹ https://github.com/freebsd/freebsd-src/commit/313a68ea20b4 [ bp: Massage, move tail comment ontop. ] Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Ahmed S. Darwish <darwi@linutronix.de> Link: https://lore.kernel.org/03802f6f7f5b5cf8c5e8adfe123c397ca8e21093.camel@infradead.org
2025-09-05x86/virt/tdx: Remove the !KEXEC_CORE dependencyKai Huang
During kexec it is now guaranteed that all dirty cachelines of TDX private memory are flushed before jumping to the new kernel. The TDX private memory from the old kernel will remain as TDX private memory in the new kernel, but it is OK because kernel read/write to TDX private memory will never cause machine check, except on the platforms with the TDX partial write erratum, which has already been handled. It is safe to allow kexec to work together with TDX now. Remove the !KEXEC_CORE dependency. Signed-off-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Link: https://lore.kernel.org/all/20250901160930.1785244-6-pbonzini%40redhat.com