| Age | Commit message (Collapse) | Author |
|
During sigreturn the shadow stack signal frame is popped. The kernel does
this by reading the shadow stack using normal read accesses. When it can't
assume the memory is shadow stack, it takes extra steps to makes sure it is
reading actual shadow stack memory and not other normal readable memory. It
does this by holding the mmap read lock while doing the access and checking
the flags of the VMA.
Unfortunately that is not safe. If the read of the shadow stack sigframe
hits a page fault, the fault handler will try to recursively grab another
mmap read lock. This normally works ok, but if a writer on another CPU is
also waiting, the second read lock could fail and cause a deadlock.
Fix this by not holding mmap lock during the read access to userspace.
Instead use mmap_lock_speculate_...() to watch for changes between dropping
mmap lock and the userspace access. Retry if anything grabbed an mmap write
lock in between and could have changed the VMA.
These mmap_lock_speculate_...() helpers use mm::mm_lock_seq, which is only
available when PER_VMA_LOCK is configured. So make X86_USER_SHADOW_STACK
depend on it. On x86, PER_VMA_LOCK is a default configuration for SMP
kernels. So drop support for the other configs under the assumption that
the !SMP shadow stack user base does not exist.
Currently there is a check that skips the lookup work when the SSP can be
assumed to be on a shadow stack. While reorganizing the function, remove
the optimization to make the tricky code flows more common, such that
issues like this cannot escape detection for so long.
Fixes: 7fad2a432cd3 ("x86/shstk: Check that signal frame is shadow stack mem")
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "maple_tree: Replace big node with maple copy" (Liam Howlett)
Mainly prepararatory work for ongoing development but it does reduce
stack usage and is an improvement.
- "mm, swap: swap table phase III: remove swap_map" (Kairui Song)
Offers memory savings by removing the static swap_map. It also yields
some CPU savings and implements several cleanups.
- "mm: memfd_luo: preserve file seals" (Pratyush Yadav)
File seal preservation to LUO's memfd code
- "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan
Chen)
Additional userspace stats reportng to zswap
- "arch, mm: consolidate empty_zero_page" (Mike Rapoport)
Some cleanups for our handling of ZERO_PAGE() and zero_pfn
- "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu
Han)
A robustness improvement and some cleanups in the kmemleak code
- "Improve khugepaged scan logic" (Vernon Yang)
Improve khugepaged scan logic and reduce CPU consumption by
prioritizing scanning tasks that access memory frequently
- "Make KHO Stateless" (Jason Miu)
Simplify Kexec Handover by transitioning KHO from an xarray-based
metadata tracking system with serialization to a radix tree data
structure that can be passed directly to the next kernel
- "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
Ballasi and Steven Rostedt)
Enhance vmscan's tracepointing
- "mm: arch/shstk: Common shadow stack mapping helper and
VM_NOHUGEPAGE" (Catalin Marinas)
Cleanup for the shadow stack code: remove per-arch code in favour of
a generic implementation
- "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)
Fix a WARN() which can be emitted the KHO restores a vmalloc area
- "mm: Remove stray references to pagevec" (Tal Zussman)
Several cleanups, mainly udpating references to "struct pagevec",
which became folio_batch three years ago
- "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
Shutsemau)
Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
pages encode their relationship to the head page
- "mm/damon/core: improve DAMOS quota efficiency for core layer
filters" (SeongJae Park)
Improve two problematic behaviors of DAMOS that makes it less
efficient when core layer filters are used
- "mm/damon: strictly respect min_nr_regions" (SeongJae Park)
Improve DAMON usability by extending the treatment of the
min_nr_regions user-settable parameter
- "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)
The proper fix for a previously hotfixed SMP=n issue. Code
simplifications and cleanups ensued
- "mm: cleanups around unmapping / zapping" (David Hildenbrand)
A bunch of cleanups around unmapping and zapping. Mostly
simplifications, code movements, documentation and renaming of
zapping functions
- "support batched checking of the young flag for MGLRU" (Baolin Wang)
Batched checking of the young flag for MGLRU. It's part cleanups; one
benchmark shows large performance benefits for arm64
- "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)
memcg cleanup and robustness improvements
- "Allow order zero pages in page reporting" (Yuvraj Sakshith)
Enhance free page reporting - it is presently and undesirably order-0
pages when reporting free memory.
- "mm: vma flag tweaks" (Lorenzo Stoakes)
Cleanup work following from the recent conversion of the VMA flags to
a bitmap
- "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
Park)
Add some more developer-facing debug checks into DAMON core
- "mm/damon: test and document power-of-2 min_region_sz requirement"
(SeongJae Park)
An additional DAMON kunit test and makes some adjustments to the
addr_unit parameter handling
- "mm/damon/core: make passed_sample_intervals comparisons
overflow-safe" (SeongJae Park)
Fix a hard-to-hit time overflow issue in DAMON core
- "mm/damon: improve/fixup/update ratio calculation, test and
documentation" (SeongJae Park)
A batch of misc/minor improvements and fixups for DAMON
- "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
Hildenbrand)
Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
movement was required.
- "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)
A somewhat random mix of fixups, recompression cleanups and
improvements in the zram code
- "mm/damon: support multiple goal-based quota tuning algorithms"
(SeongJae Park)
Extend DAMOS quotas goal auto-tuning to support multiple tuning
algorithms that users can select
- "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)
Fix the khugpaged sysfs handling so we no longer spam the logs with
reams of junk when starting/stopping khugepaged
- "mm: improve map count checks" (Lorenzo Stoakes)
Provide some cleanups and slight fixes in the mremap, mmap and vma
code
- "mm/damon: support addr_unit on default monitoring targets for
modules" (SeongJae Park)
Extend the use of DAMON core's addr_unit tunable
- "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)
Cleanups to khugepaged and is a base for Nico's planned khugepaged
mTHP support
- "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)
Code movement and cleanups in the memhotplug and sparsemem code
- "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
CONFIG_MIGRATION" (David Hildenbrand)
Rationalize some memhotplug Kconfig support
- "change young flag check functions to return bool" (Baolin Wang)
Cleanups to change all young flag check functions to return bool
- "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
Law and SeongJae Park)
Fix a few potential DAMON bugs
- "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
Stoakes)
Convert a lot of the existing use of the legacy vm_flags_t data type
to the new vma_flags_t type which replaces it. Mainly in the vma
code.
- "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)
Expand the mmap_prepare functionality, which is intended to replace
the deprecated f_op->mmap hook which has been the source of bugs and
security issues for some time. Cleanups, documentation, extension of
mmap_prepare into filesystem drivers
- "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)
Simplify and clean up zap_huge_pmd(). Additional cleanups around
vm_normal_folio_pmd() and the softleaf functionality are performed.
* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm: fix deferred split queue races during migration
mm/khugepaged: fix issue with tracking lock
mm/huge_memory: add and use has_deposited_pgtable()
mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
mm/huge_memory: separate out the folio part of zap_huge_pmd()
mm/huge_memory: use mm instead of tlb->mm
mm/huge_memory: remove unnecessary sanity checks
mm/huge_memory: deduplicate zap deposited table call
mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
mm/huge_memory: add a common exit path to zap_huge_pmd()
mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
mm/huge: avoid big else branch in zap_huge_pmd()
mm/huge_memory: simplify vma_is_specal_huge()
mm: on remap assert that input range within the proposed VMA
mm: add mmap_action_map_kernel_pages[_full]()
uio: replace deprecated mmap hook with mmap_prepare in uio_info
drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
mm: allow handling of stacked mmap_prepare hooks in more drivers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FRED updates from Borislav Petkov:
"We made the FRED support an opt-in initially out of fear of it
breaking machines left and right in the case of a hw bug in the first
generation of machines supporting it.
Now that that the FRED code has seen a lot of hammering, flip the
logic to be opt-out as is the usual case with new hw features"
* tag 'x86_fred_for_v7.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fred: Remove kernel log message when initializing exceptions
x86/fred: Enable FRED by default
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updates from Ingo Molnar:
- Remove M486/M486SX/ELAN support, first minimal step (Ingo Molnar)
- Print AGESA string from DMI additional information entry (Yazen
Ghannam, Mario Limonciello)
- Improve and fix the DMI code (Mario Limonciello):
- Correct an indexing error in <linux/dmi.h>
- Adjust dmi_decode() to use enums <linux/dmi.h>
- Add pr_fmt() for dmi_scan.c to fix & standardize the log prefixes
* tag 'x86-platform-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Print AGESA string from DMI additional information entry
firmware: dmi: Add pr_fmt() for dmi_scan.c
firmware: dmi: Adjust dmi_decode() to use enums
firmware: dmi: Correct an indexing error in dmi.h
x86/cpu: Remove M486/M486SX/ELAN support
|
|
Patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
CONFIG_MIGRATION".
While working on memory hotplug code cleanups, I realized that
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE is not really required anymore.
Changing that revealed some rather nasty looking CONFIG_MIGRATION
handling.
Let's clean that up by introducing a dedicated CONFIG_NUMA_MIGRATION
option and reducing the dependencies that CONFIG_MIGRATION has.
This patch (of 2):
All architectures that select CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE also
select CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG. So we can just remove
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE.
For CONFIG_MIGRATION, make it depend on CONFIG_MEMORY_HOTREMOVE instead,
and make CONFIG_MEMORY_HOTREMOVE select CONFIG_MIGRATION (just like
CONFIG_CMA and CONFIG_COMPACTION already do).
We'll clean up CONFIG_MIGRATION next.
Link: https://lkml.kernel.org/r/20260319-config_migration-v1-0-42270124966f@kernel.org
Link: https://lkml.kernel.org/r/20260319-config_migration-v1-1-42270124966f@kernel.org
Signed-off-by: David Hildenbrand (Arm) <david@kernel.org>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: "Huang, Ying" <ying.huang@linux.alibaba.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In the x86 architecture we have various complicated hardware emulation
facilities on x86-32 to support ancient 32-bit CPUs that very very few
people are using with modern kernels. This compatibility glue is sometimes
even causing problems that people spend time to resolve, which time could
be spent on other things.
As Linus recently remarked:
> I really get the feeling that it's time to leave i486 support behind.
> There's zero real reason for anybody to waste one second of
> development effort on this kind of issue.
Implement the first step and remove M486/M486SX/ELAN support:
CONFIG_M486SX
CONFIG_M486
CONFIG_MELAN
[ There's no recent M486=y kernel package for any mainstream x86
32-bit distribution available that I've been able to find, so
actual users should not be impacted, and any legacy users can
keep using older kernels. ]
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ahmed S. Darwish <darwi@linutronix.de>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20251214084710.3606385-2-mingo@kernel.org
|
|
When FRED was added to the mainline kernel, it was set up as an explicit
opt-in due to the risk of regressions before hardware was available publicly.
Now, Panther Lake (Core Ultra 300 series) has been released, and benchmarking
by Phoronix has shown that it provides a significant performance benefit on
most workloads:
https://www.phoronix.com/review/intel-fred-panther-lake
Accordingly, enable FRED by default if the CPU supports it. FRED can of
course still be disabled via the fred=off command line option.
Touch up Kconfig help too.
Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Link: https://patch.msgid.link/20260325230151.1898287-2-hpa@zytor.com
|
|
The TSC deadline timer is directly coupled to the TSC and setting the next
deadline is tedious as the clockevents core code converts the
CLOCK_MONOTONIC based absolute expiry time to a relative expiry by reading
the current time from the TSC. It converts that delta to cycles and hands
the result to lapic_next_deadline(), which then has read to the TSC and add
the delta to program the timer.
The core code now supports coupled clock event devices and can provide the
expiry time in TSC cycles directly without reading the TSC at all.
This obviouly works only when the TSC is the current clocksource, but
that's the default for all modern CPUs which implement the TSC deadline
timer. If the TSC is not the current clocksource (e.g. early boot) then the
core code falls back to the relative set_next_event() callback as before.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163430.076565985@kernel.org
|
|
Avoid the overhead of the indirect call for a single instruction to read
the TSC.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260224163429.741886362@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "powerpc/64s: do not re-activate batched TLB flush" makes
arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)
It adds a generic enter/leave layer and switches architectures to use
it. Various hacks were removed in the process.
- "zram: introduce compressed data writeback" implements data
compression for zram writeback (Richard Chang and Sergey Senozhatsky)
- "mm: folio_zero_user: clear page ranges" adds clearing of contiguous
page ranges for hugepages. Large improvements during demand faulting
are demonstrated (David Hildenbrand)
- "memcg cleanups" tidies up some memcg code (Chen Ridong)
- "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos
stats" improves DAMOS stat's provided information, deterministic
control, and readability (SeongJae Park)
- "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few
issues in the hugetlb cgroup charging selftests (Li Wang)
- "Fix va_high_addr_switch.sh test failure - again" addresses several
issues in the va_high_addr_switch test (Chunyu Hu)
- "mm/damon/tests/core-kunit: extend existing test scenarios" improves
the KUnit test coverage for DAMON (Shu Anzai)
- "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a
glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to
transiently return -EAGAIN (Shivank Garg)
- "arch, mm: consolidate hugetlb early reservation" reworks and
consolidates a pile of straggly code related to reservation of
hugetlb memory from bootmem and creation of CMA areas for hugetlb
(Mike Rapoport)
- "mm: clean up anon_vma implementation" cleans up the anon_vma
implementation in various ways (Lorenzo Stoakes)
- "tweaks for __alloc_pages_slowpath()" does a little streamlining of
the page allocator's slowpath code (Vlastimil Babka)
- "memcg: separate private and public ID namespaces" cleans up the
memcg ID code and prevents the internal-only private IDs from being
exposed to userspace (Shakeel Butt)
- "mm: hugetlb: allocate frozen gigantic folio" cleans up the
allocation of frozen folios and avoids some atomic refcount
operations (Kefeng Wang)
- "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement
of memory betewwn the active and inactive LRUs and adds auto-tuning
of the ratio-based quotas and of monitoring intervals (SeongJae Park)
- "Support page table check on PowerPC" makes
CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)
- "nodemask: align nodes_and{,not} with underlying bitmap ops" makes
nodes_and() and nodes_andnot() propagate the return values from the
underlying bit operations, enabling some cleanup in calling code
(Yury Norov)
- "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up
some DAMON internal interfaces (SeongJae Park)
- "mm/khugepaged: cleanups and scan limit fix" does some cleanup work
in khupaged and fixes a scan limit accounting issue (Shivank Garg)
- "mm: balloon infrastructure cleanups" goes to town on the balloon
infrastructure and its page migration function. Mainly cleanups, also
some locking simplification (David Hildenbrand)
- "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds
additional tracepoints to the page reclaim code (Jiayuan Chen)
- "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is
part of Marco's kernel-wide migration from the legacy workqueue APIs
over to the preferred unbound workqueues (Marco Crivellari)
- "Various mm kselftests improvements/fixes" provides various unrelated
improvements/fixes for the mm kselftests (Kevin Brodsky)
- "mm: accelerate gigantic folio allocation" greatly speeds up gigantic
folio allocation, mainly by avoiding unnecessary work in
pfn_range_valid_contig() (Kefeng Wang)
- "selftests/damon: improve leak detection and wss estimation
reliability" improves the reliability of two of the DAMON selftests
(SeongJae Park)
- "mm/damon: cleanup kdamond, damon_call(), damos filter and
DAMON_MIN_REGION" does some cleanup work in the core DAMON code
(SeongJae Park)
- "Docs/mm/damon: update intro, modules, maintainer profile, and misc"
performs maintenance work on the DAMON documentation (SeongJae Park)
- "mm: add and use vma_assert_stabilised() helper" refactors and cleans
up the core VMA code. The main aim here is to be able to use the mmap
write lock's lockdep state to perform various assertions regarding
the locking which the VMA code requires (Lorenzo Stoakes)
- "mm, swap: swap table phase II: unify swapin use" removes some old
swap code (swap cache bypassing and swap synchronization) which
wasn't working very well. Various other cleanups and simplifications
were made. The end result is a 20% speedup in one benchmark (Kairui
Song)
- "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM
available on 64-bit alpha, loongarch, mips, parisc, and um. Various
cleanups were performed along the way (Qi Zheng)
* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits)
mm/memory: handle non-split locks correctly in zap_empty_pte_table()
mm: move pte table reclaim code to memory.c
mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
um: mm: enable MMU_GATHER_RCU_TABLE_FREE
parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE
mips: mm: enable MMU_GATHER_RCU_TABLE_FREE
LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE
alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE
mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h
mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles
zsmalloc: make common caches global
mm: add SPDX id lines to some mm source files
mm/zswap: use %pe to print error pointers
mm/vmscan: use %pe to print error pointers
mm/readahead: fix typo in comment
mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file()
mm: refactor vma_map_pages to use vm_insert_pages
mm/damon: unify address range representation with damon_addr_range
mm/cma: replace snprintf with strscpy in cma_new_area
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Borislav Petkov:
- A nice cleanup to the paravirt code containing a unification of the
paravirt clock interface, taming the include hell by splitting the
pv_ops structure and removing of a bunch of obsolete code (Juergen
Gross)
* tag 'x86_paravirt_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
x86/paravirt: Use XOR r32,r32 to clear register in pv_vcpu_is_preempted()
x86/paravirt: Remove trailing semicolons from alternative asm templates
x86/pvlocks: Move paravirt spinlock functions into own header
x86/paravirt: Specify pv_ops array in paravirt macros
x86/paravirt: Allow pv-calls outside paravirt.h
objtool: Allow multiple pv_ops arrays
x86/xen: Drop xen_mmu_ops
x86/xen: Drop xen_cpu_ops
x86/xen: Drop xen_irq_ops
x86/paravirt: Move pv_native_*() prototypes to paravirt.c
x86/paravirt: Introduce new paravirt-base.h header
x86/paravirt: Move paravirt_sched_clock() related code into tsc.c
x86/paravirt: Use common code for paravirt_steal_clock()
riscv/paravirt: Use common code for paravirt_steal_clock()
loongarch/paravirt: Use common code for paravirt_steal_clock()
arm64/paravirt: Use common code for paravirt_steal_clock()
arm/paravirt: Use common code for paravirt_steal_clock()
sched: Move clock related paravirt code to kernel/sched
paravirt: Remove asm/paravirt_api_clock.h
x86/paravirt: Move thunk macros to paravirt_types.h
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loader update from Borislav Petkov:
- Since debugging the microcode loader makes sense on baremetal too (it
was used in a guest only until now), extend it to be able to do that
too
* tag 'x86_microcode_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Allow loader debugging to be enabled on baremetal too
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
- Extend the resctrl machinery to support telemetry monitoring on
Intel (Tony Luck)
The practical usage of this is being able to tell how much energy or
how much work can be attributed to a group of tasks tracked under a
single idenitifier. Prepend this work with proper refactoring of
resctrl domains handling code.
* tag 'x86_cache_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
x86,fs/resctrl: Update documentation for telemetry events
x86/resctrl: Enable RDT_RESOURCE_PERF_PKG
fs/resctrl: Move RMID initialization to first mount
x86,fs/resctrl: Compute number of RMIDs as minimum across resources
fs/resctrl: Move allocation/free of closid_num_dirty_rmid[]
x86/resctrl: Handle number of RMIDs supported by RDT_RESOURCE_PERF_PKG
x86/resctrl: Add energy/perf choices to rdt boot option
x86,fs/resctrl: Handle domain creation/deletion for RDT_RESOURCE_PERF_PKG
fs/resctrl: Refactor rmdir_mondata_subdir_allrdtgrp()
fs/resctrl: Refactor mkdir_mondata_subdir()
x86/resctrl: Read telemetry events
x86/resctrl: Find and enable usable telemetry events
x86,fs/resctrl: Add architectural event pointer
x86,fs/resctrl: Fill in details of events for performance and energy GUIDs
x86/resctrl: Discover hardware telemetry events
fs/resctrl: Emphasize that L3 monitoring resource is required for summing domains
x86,fs/resctrl: Add and initialize a resource for package scope monitoring
x86,fs/resctrl: Add an architectural hook called for first mount
x86,fs/resctrl: Support binary fixed point event counters
x86,fs/resctrl: Handle events that can be read from any CPU
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu updates from Ingo Molnar:
- CPU model updates (Andrew Cooper):
- amd: Correct the microcode table for Zenbleed
- amd: Use ZEN_MODEL_STEP_UCODE() for erratum_1386_microcode[]
- Drop vestigial PBE logic in AMD/Hygon/Centaur/Cyrix
- tsx: Set default TSX mode to auto (Nikolay Borisov)
- Drop unused Kconfig symbol X86_P6_NOP (Randy Dunlap)
* tag 'x86-cpu-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsx: Set default TSX mode to auto
x86/cpu: Drop unused Kconfig symbol X86_P6_NOP
x86/cpu: Drop vestigial PBE logic in AMD/Hygon/Centaur/Cyrix
x86/cpu/amd: Use ZEN_MODEL_STEP_UCODE() for erratum_1386_microcode[]
x86/cpu/amd: Correct the microcode table for Zenbleed
|
|
The PT_RECLAIM can work on all architectures that support
MMU_GATHER_RCU_TABLE_FREE, except for those that have selected
HAVE_ARCH_TLB_REMOVE_TABLE,so make PT_RECLAIM depends on
MMU_GATHER_RCU_TABLE_FREE && !HAVE_ARCH_TLB_REMOVE_TABLE.
BTW, change PT_RECLAIM to be enabled by default, since nobody should want
to turn it off.
Link: https://lkml.kernel.org/r/83b034810935a9ff18e425b085e065bb0acb28f3.1769515122.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Magnus Lindholm <linmag7@gmail.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Using single ftrace_ops for direct calls update instead of allocating
ftrace_ops object for each trampoline.
With single ftrace_ops object we can use update_ftrace_direct_* api
that allows multiple ip sites updates on single ftrace_ops object.
Adding HAVE_SINGLE_FTRACE_DIRECT_OPS config option to be enabled on
each arch that supports this.
At the moment we can enable this only on x86 arch, because arm relies
on ftrace_ops object representing just single trampoline image (stored
in ftrace_ops::direct_call). Archs that do not support this will continue
to use *_ftrace_direct api.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/bpf/20251230145010.103439-10-jolsa@kernel.org
|
|
Architectures currently opt in for implementing lazy_mmu helpers by
defining __HAVE_ARCH_ENTER_LAZY_MMU_MODE.
In preparation for introducing a generic lazy_mmu layer that will require
storage in task_struct, let's switch to a cleaner approach: instead of
defining a macro, select a CONFIG option.
This patch introduces CONFIG_ARCH_HAS_LAZY_MMU_MODE and has each arch
select it when it implements lazy_mmu helpers.
__HAVE_ARCH_ENTER_LAZY_MMU_MODE is removed and <linux/pgtable.h> relies on
the new CONFIG instead.
On x86, lazy_mmu helpers are only implemented if PARAVIRT_XXL is selected.
This creates some complications in arch/x86/boot/, because a few files
manually undefine PARAVIRT* options. As a result <asm/paravirt.h> does
not define the lazy_mmu helpers, but this breaks the build as
<linux/pgtable.h> only defines them if !CONFIG_ARCH_HAS_LAZY_MMU_MODE.
There does not seem to be a clean way out of this - let's just undefine
that new CONFIG too.
Link: https://lkml.kernel.org/r/20251215150323.2218608-7-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>
Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc]
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Juegren Gross <jgross@suse.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Debugging the loader on baremetal does make sense, so enable it there
too.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260108165028.27417-1-bp@kernel.org
|
|
Remove the arch-specific variant of paravirt_steal_clock() and use
the common one instead.
With all archs supporting Xen now having been switched to the common
variant, including paravirt.h can be dropped from drivers/xen/time.c.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260105110520.21356-12-jgross@suse.com
|
|
The only effect of CONFIG_PARAVIRT_DEBUG set is that instead of doing a call
using a NULL pointer a BUG() is being raised.
While the BUG() will be a little bit easier to analyse, the call of NULL isn't
really that difficult to find the reason for.
Remove the config option to make paravirt coding a little bit less annoying.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260105110520.21356-4-jgross@suse.com
|
|
Introduce intel_aet_read_event() to read telemetry events for resource
RDT_RESOURCE_PERF_PKG. There may be multiple aggregators tracking each
package, so scan all of them and add up all counters. Aggregators may return
an invalid data indication if they have received no records for a given RMID.
The user will see "Unavailable" if none of the aggregators on a package
provide valid counts.
Resctrl now uses readq() so depends on X86_64. Update Kconfig.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
|
|
Each CPU collects data for telemetry events that it sends to the nearest
telemetry event aggregator either when the value of MSR_IA32_PQR_ASSOC.RMID
changes, or when a two millisecond timer expires.
There is a feature type ("energy" or "perf"), GUID, and MMIO region associated
with each aggregator. This combination links to an XML description of the
set of telemetry events tracked by the aggregator. XML files are published
by Intel in a GitHub repository¹.
The telemetry event aggregators maintain per-RMID per-event counts of the
total seen for all the CPUs. There may be multiple telemetry event aggregators
per package.
There are separate sets of aggregators for each feature type. Aggregators
in a set may have different GUIDs. All aggregators with the same feature
type and GUID are symmetric keeping counts for the same set of events for
the CPUs that provide data to them.
The XML file for each aggregator provides the following information:
0) Feature type of the events ("perf" or "energy")
1) Which telemetry events are tracked by the aggregator.
2) The order in which the event counters appear for each RMID.
3) The value type of each event counter (integer or fixed-point).
4) The number of RMIDs supported.
5) Which additional aggregator status registers are included.
6) The total size of the MMIO region for an aggregator.
Introduce struct event_group that condenses the relevant information from
an XML file. Hereafter an "event group" refers to a group of events of a
particular feature type (event_group::pfname set to "energy" or "perf") with
a particular GUID.
Use event_group::pfname to determine the feature id needed to obtain the
aggregator details. It will later be used in console messages and with the
rdt= boot parameter.
The INTEL_PMT_TELEMETRY driver enumerates support for telemetry events.
This driver provides intel_pmt_get_regions_by_feature() to list all available
telemetry event aggregators of a given feature type. The list includes the
"guid", the base address in MMIO space for the region where the event counters
are exposed, and the package id where the all the CPUs that report to this
aggregator are located.
Call INTEL_PMT_TELEMETRY's intel_pmt_get_regions_by_feature() for each event
group to obtain a private copy of that event group's aggregator data. Duplicate
the aggregator data between event groups that have the same feature type
but different GUID. Further processing on this private copy will be unique
to the event group.
¹https://github.com/intel/Intel-PMT
[ bp: Zap text explaining the code, s/guid/GUID/g ]
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
|
|
At SUSE we've been releasing our kernels with TSX enabled for the past 6
years and some customers have started to rely on it. Furthermore, the last
known vulnerability concerning TSX was TAA (CVE-2019-11135) and a
significant amount time has passed since then without anyone reporting any
issues. Intel has released numerous processors which do not have the
TAA vulnerability (Cooper/Ice Lake, Sapphire/Emerald/Granite Rappids)
yet TSX remains being disabled by default.
The main aim of this patch is to reduce the divergence between SUSE's
configuration and the upstream by switching the default TSX mode to
auto. I believe this strikes the right balance between keeping it
enabled where appropriate (i.e every machine which doesn't contain the
TAA vulnerability) and disabling it preventively.
Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20251112190548.750746-1-nik.borisov@suse.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki)
Rework the vmalloc() code to support non-blocking allocations
(GFP_ATOIC, GFP_NOWAIT)
"ksm: fix exec/fork inheritance" (xu xin)
Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not
inherited across fork/exec
"mm/zswap: misc cleanup of code and documentations" (SeongJae Park)
Some light maintenance work on the zswap code
"mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira)
Enhance the /sys/kernel/debug/page_owner debug feature by adding
unique identifiers to differentiate the various stack traces so
that userspace monitoring tools can better match stack traces over
time
"mm/page_alloc: pcp->batch cleanups" (Joshua Hahn)
Minor alterations to the page allocator's per-cpu-pages feature
"Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra)
Address a scalability issue in userfaultfd's UFFDIO_MOVE operation
"kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov)
"drivers/base/node: fold node register and unregister functions" (Donet Tom)
Clean up the NUMA node handling code a little
"mm: some optimizations for prot numa" (Kefeng Wang)
Cleanups and small optimizations to the NUMA allocation hinting
code
"mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn)
Address long lock hold times at boot on large machines. These were
causing (harmless) softlockup warnings
"optimize the logic for handling dirty file folios during reclaim" (Baolin Wang)
Remove some now-unnecessary work from page reclaim
"mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park)
Enhance the DAMOS auto-tuning feature
"mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan)
Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace
configuration
"expand mmap_prepare functionality, port more users" (Lorenzo Stoakes)
Enhance the new(ish) file_operations.mmap_prepare() method and port
additional callsites from the old ->mmap() over to ->mmap_prepare()
"Fix stale IOTLB entries for kernel address space" (Lu Baolu)
Fix a bug (and possible security issue on non-x86) in the IOMMU
code. In some situations the IOMMU could be left hanging onto a
stale kernel pagetable entry
"mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang)
Clean up and optimize the folio splitting code
"mm, swap: misc cleanup and bugfix" (Kairui Song)
Some cleanups and a minor fix in the swap discard code
"mm/damon: misc documentation fixups" (SeongJae Park)
"mm/damon: support pin-point targets removal" (SeongJae Park)
Permit userspace to remove a specific monitoring target in the
middle of the current targets list
"mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo)
A couple of cleanups related to mm header file inclusion
"mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He)
improve the selection of swap devices for NUMA machines
"mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista)
Change the memory block labels from macros to enums so they will
appear in kernel debug info
"ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes)
Address an inefficiency when KSM unmerges an address range
"mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park)
Fix leaks and unhandled malloc() failures in DAMON userspace unit
tests
"some cleanups for pageout()" (Baolin Wang)
Clean up a couple of minor things in the page scanner's
writeback-for-eviction code
"mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu)
Move hugetlb's sysfs/sysctl handling code into a new file
"introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes)
Make the VMA guard regions available in /proc/pid/smaps and
improves the mergeability of guarded VMAs
"mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes)
Reduce mmap lock contention for callers performing VMA guard region
operations
"vma_start_write_killable" (Matthew Wilcox)
Start work on permitting applications to be killed when they are
waiting on a read_lock on the VMA lock
"mm/damon/tests: add more tests for online parameters commit" (SeongJae Park)
Add additional userspace testing of DAMON's "commit" feature
"mm/damon: misc cleanups" (SeongJae Park)
"make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes)
Address the possible loss of a VMA's VM_SOFTDIRTY flag when that
VMA is merged with another
"mm: support device-private THP" (Balbir Singh)
Introduce support for Transparent Huge Page (THP) migration in zone
device-private memory
"Optimize folio split in memory failure" (Zi Yan)
"mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang)
Some more cleanups in the folio splitting code
"mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes)
Clean up our handling of pagetable leaf entries by introducing the
concept of 'software leaf entries', of type softleaf_t
"reparent the THP split queue" (Muchun Song)
Reparent the THP split queue to its parent memcg. This is in
preparation for addressing the long-standing "dying memcg" problem,
wherein dead memcg's linger for too long, consuming memory
resources
"unify PMD scan results and remove redundant cleanup" (Wei Yang)
A little cleanup in the hugepage collapse code
"zram: introduce writeback bio batching" (Sergey Senozhatsky)
Improve zram writeback efficiency by introducing batched bio
writeback support
"memcg: cleanup the memcg stats interfaces" (Shakeel Butt)
Clean up our handling of the interrupt safety of some memcg stats
"make vmalloc gfp flags usage more apparent" (Vishal Moola)
Clean up vmalloc's handling of incoming GFP flags
"mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang)
Teach soft dirty and userfaultfd write protect tracking to use
RISC-V's Svrsw60t59b extension
"mm: swap: small fixes and comment cleanups" (Youngjun Park)
Fix a small bug and clean up some of the swap code
"initial work on making VMA flags a bitmap" (Lorenzo Stoakes)
Start work on converting the vma struct's flags to a bitmap, so we
stop running out of them, especially on 32-bit
"mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park)
Address a possible bug in the swap discard code and clean things
up a little
[ This merge also reverts commit ebb9aeb980e5 ("vfio/nvgrace-gpu:
register device memory for poison handling") because it looks
broken to me, I've asked for clarification - Linus ]
* tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits)
mm: fix vma_start_write_killable() signal handling
mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate
mm/swapfile: fix list iteration when next node is removed during discard
fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling
mm/kfence: add reboot notifier to disable KFENCE on shutdown
memcg: remove inc/dec_lruvec_kmem_state helpers
selftests/mm/uffd: initialize char variable to Null
mm: fix DEBUG_RODATA_TEST indentation in Kconfig
mm: introduce VMA flags bitmap type
tools/testing/vma: eliminate dependency on vma->__vm_flags
mm: simplify and rename mm flags function for clarity
mm: declare VMA flags by bit
zram: fix a spelling mistake
mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity
mm/vmscan: skip increasing kswapd_failures when reclaim was boosted
pagemap: update BUDDY flag documentation
mm: swap: remove scan_swap_map_slots() references from comments
mm: swap: change swap_alloc_slow() to void
mm, swap: remove redundant comment for read_swap_cache_async
mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Convert selftests/bpf/test_tc_edt and test_tc_tunnel from .sh to
test_progs runner (Alexis Lothoré)
- Convert selftests/bpf/test_xsk to test_progs runner (Bastien
Curutchet)
- Replace bpf memory allocator with kmalloc_nolock() in
bpf_local_storage (Amery Hung), and in bpf streams and range tree
(Puranjay Mohan)
- Introduce support for indirect jumps in BPF verifier and x86 JIT
(Anton Protopopov) and arm64 JIT (Puranjay Mohan)
- Remove runqslower bpf tool (Hoyeon Lee)
- Fix corner cases in the verifier to close several syzbot reports
(Eduard Zingerman, KaFai Wan)
- Several improvements in deadlock detection in rqspinlock (Kumar
Kartikeya Dwivedi)
- Implement "jmp" mode for BPF trampoline and corresponding
DYNAMIC_FTRACE_WITH_JMP. It improves "fexit" program type performance
from 80 M/s to 136 M/s. With Steven's Ack. (Menglong Dong)
- Add ability to test non-linear skbs in BPF_PROG_TEST_RUN (Paul
Chaignon)
- Do not let BPF_PROG_TEST_RUN emit invalid GSO types to stack (Daniel
Borkmann)
- Generalize buildid reader into bpf_dynptr (Mykyta Yatsenko)
- Optimize bpf_map_update_elem() for map-in-map types (Ritesh
Oedayrajsingh Varma)
- Introduce overwrite mode for BPF ring buffer (Xu Kuohai)
* tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (169 commits)
bpf: optimize bpf_map_update_elem() for map-in-map types
bpf: make kprobe_multi_link_prog_run always_inline
selftests/bpf: do not hardcode target rate in test_tc_edt BPF program
selftests/bpf: remove test_tc_edt.sh
selftests/bpf: integrate test_tc_edt into test_progs
selftests/bpf: rename test_tc_edt.bpf.c section to expose program type
selftests/bpf: Add success stats to rqspinlock stress test
rqspinlock: Precede non-head waiter queueing with AA check
rqspinlock: Disable spinning for trylock fallback
rqspinlock: Use trylock fallback when per-CPU rqnode is busy
rqspinlock: Perform AA checks immediately
rqspinlock: Enclose lock/unlock within lock entry acquisitions
bpf: Remove runqslower tool
selftests/bpf: Remove usage of lsm/file_alloc_security in selftest
bpf: Disable file_alloc_security hook
bpf: check for insn arrays in check_ptr_alignment
bpf: force BPF_F_RDONLY_PROG on insn array creation
bpf: Fix exclusive map memory leak
selftests/bpf: Make CS length configurable for rqspinlock stress test
selftests/bpf: Add lock wait time stats to rqspinlock stress test
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull bug handling infrastructure updates from Ingo Molnar:
"Core updates:
- Improve WARN(), which has vararg printf like arguments, to work
with the x86 #UD based WARN-optimizing infrastructure by hiding the
format in the bug_table and replacing this first argument with the
address of the bug-table entry, while making the actual function
that's called a UD1 instruction (Peter Zijlstra)
- Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch (Ingo
Molnar, s390 support by Heiko Carstens)
Fixes and cleanups:
- bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens)
- <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS (Peter
Zijlstra)"
* tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERS
x86/bug: Fix BUG_FORMAT vs KASLR
x86_64/bug: Inline the UD1
x86/bug: Implement WARN_ONCE()
x86_64/bug: Implement __WARN_printf()
x86/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED
x86/bug: Add BUG_FORMAT basics
bug: Allow architectures to provide __WARN_printf()
bug: Implement WARN_ON() using __WARN_FLAGS()
bug: Add report_bug_entry()
bug: Add BUG_FORMAT_ARGS infrastructure
bug: Clean up CONFIG_GENERIC_BUG_RELATIVE_POINTERS
bug: Add BUG_FORMAT infrastructure
x86: Rework __bug_table helpers
bugs/s390: Remove private WARN_ON() implementation
bugs/core: Reorganize fields in the first line of WARNING output, add ->comm[] output
bugs/sh: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/parisc: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/riscv: Concatenate 'cond_str' with '__FILE__' in __BUG_FLAGS(), to extend WARN_ON/BUG_ON output
bugs/riscv: Pass in 'cond_str' to __BUG_FLAGS()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance events updates from Ingo Molnar:
"Callchain support:
- Add support for deferred user-space stack unwinding for perf,
enabled on x86. (Peter Zijlstra, Steven Rostedt)
- unwind_user/x86: Enable frame pointer unwinding on x86 (Josh
Poimboeuf)
x86 PMU support and infrastructure:
- x86/insn: Simplify for_each_insn_prefix() (Peter Zijlstra)
- x86/insn,uprobes,alternative: Unify insn_is_nop() (Peter Zijlstra)
Intel PMU driver:
- Large series to prepare for and implement architectural PEBS
support for Intel platforms such as Clearwater Forest (CWF) and
Panther Lake (PTL). (Dapeng Mi, Kan Liang)
- Check dynamic constraints (Kan Liang)
- Optimize PEBS extended config (Peter Zijlstra)
- cstates:
- Remove PC3 support from LunarLake (Zhang Rui)
- Add Pantherlake support (Zhang Rui)
- Clearwater Forest support (Zide Chen)
AMD PMU driver:
- x86/amd: Check event before enable to avoid GPF (George Kennedy)
Fixes and cleanups:
- task_work: Fix NMI race condition (Peter Zijlstra)
- perf/x86: Fix NULL event access and potential PEBS record loss
(Dapeng Mi)
- Misc other fixes and cleanups (Dapeng Mi, Ingo Molnar, Peter
Zijlstra)"
* tag 'perf-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
perf/x86/intel: Fix and clean up intel_pmu_drain_arch_pebs() type use
perf/x86/intel: Optimize PEBS extended config
perf/x86/intel: Check PEBS dyn_constraints
perf/x86/intel: Add a check for dynamic constraints
perf/x86/intel: Add counter group support for arch-PEBS
perf/x86/intel: Setup PEBS data configuration and enable legacy groups
perf/x86/intel: Update dyn_constraint base on PEBS event precise level
perf/x86/intel: Allocate arch-PEBS buffer and initialize PEBS_BASE MSR
perf/x86/intel: Process arch-PEBS records or record fragments
perf/x86/intel/ds: Factor out PEBS group processing code to functions
perf/x86/intel/ds: Factor out PEBS record processing code to functions
perf/x86/intel: Initialize architectural PEBS
perf/x86/intel: Correct large PEBS flag check
perf/x86/intel: Replace x86_pmu.drain_pebs calling with static call
perf/x86: Fix NULL event access and potential PEBS record loss
perf/x86: Remove redundant is_x86_event() prototype
entry,unwind/deferred: Fix unwind_reset_info() placement
unwind_user/x86: Fix arch=um build
perf: Support deferred user unwind
unwind_user/x86: Teach FP unwind about start of function
...
|
|
Linus figured less #ifdef is more better and making x86-32 use
GENERIC_BUG_RELATIVE_POINTERS removes one layer of macro magic from
the bug.h bits.
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
Implement the DYNAMIC_FTRACE_WITH_JMP for x86_64. In ftrace_call_replace,
we will use JMP32_INSN_OPCODE instead of CALL_INSN_OPCODE if the address
should use "jmp".
Meanwhile, adjust the direct call in the ftrace_regs_caller. The RSB is
balanced in the "jmp" mode. Take the function "foo" for example:
original_caller:
call foo -> foo:
call fentry -> fentry:
[do ftrace callbacks ]
move tramp_addr to stack
RET -> tramp_addr
tramp_addr:
[..]
call foo_body -> foo_body:
[..]
RET -> back to tramp_addr
[..]
RET -> back to original_caller
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20251118123639.688444-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introduce a new IOMMU interface to flush IOTLB paging cache entries for
the CPU kernel address space. This interface is invoked from the x86
architecture code that manages combined user and kernel page tables,
specifically before any kernel page table page is freed and reused.
This addresses the main issue with vfree() which is a common occurrence
and can be triggered by unprivileged users. While this resolves the
primary problem, it doesn't address some extremely rare case related to
memory unplug of memory that was present as reserved memory at boot, which
cannot be triggered by unprivileged users. The discussion can be found at
the link below.
Enable SVA on x86 architecture since the IOMMU can now receive
notification to flush the paging cache before freeing the CPU kernel page
table pages.
Link: https://lkml.kernel.org/r/20251022082635.2462433-9-baolu.lu@linux.intel.com
Link: https://lore.kernel.org/linux-iommu/04983c62-3b1d-40d4-93ae-34ca04b827e5@intel.com/
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Suggested-by: Jann Horn <jannh@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robin Murohy <robin.murphy@arm.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Yi Lai <yi1.lai@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use ARCH_INIT_USER_FP_FRAME to describe how frame pointers are unwound
on x86, and enable CONFIG_HAVE_UNWIND_USER_FP accordingly so the
unwind_user interfaces can be used.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20250827193828.347397433@kernel.org
|
|
In preparation for introducing klp-build, add a new CONFIG_KLP_BUILD
option. The initial version will only be supported on x86-64.
Acked-by: Petr Mladek <pmladek@suse.com>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- Simplify inline asm flag output operands now that the minimum
compiler version supports the =@ccCOND syntax
- Remove a bunch of AS_* Kconfig symbols which detect assembler support
for various instruction mnemonics now that the minimum assembler
version supports them all
- The usual cleanups all over the place
* tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__
x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h>
x86/mtrr: Remove license boilerplate text with bad FSF address
x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h>
x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h>
x86/entry/fred: Push __KERNEL_CS directly
x86/kconfig: Remove CONFIG_AS_AVX512
crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
crypto: X86 - Remove CONFIG_AS_VAES
crypto: x86 - Remove CONFIG_AS_GFNI
x86/kconfig: Drop unused and needless config X86_64_SMP
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TDX updates from Dave Hansen:
"The biggest change here is making TDX and kexec play nicely together.
Before this, the memory encryption hardware (which doesn't respect
cache coherency) could write back old cachelines on top of data in the
new kernel, so kexec and TDX were made mutually exclusive. This
removes the limitation.
There is also some work to tighten up a hardware bug workaround and
some MAINTAINERS updates.
- Make TDX and kexec work together
- Skip TDX bug workaround when the bug is not present
- Update maintainers entries"
* tag 'x86_tdx_for_6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/virt/tdx: Use precalculated TDVPR page physical address
KVM/TDX: Explicitly do WBINVD when no more TDX SEAMCALLs
x86/virt/tdx: Update the kexec section in the TDX documentation
x86/virt/tdx: Remove the !KEXEC_CORE dependency
x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL
x86/sme: Use percpu boolean to control WBINVD during kexec
x86/kexec: Consolidate relocate_kernel() function parameters
x86/tdx: Skip clearing reclaimed pages unless X86_BUG_TDX_PW_MCE is present
x86/tdx: Tidy reset_pamt functions
x86/tdx: Eliminate duplicate code in tdx_clear_page()
MAINTAINERS: Add KVM mail list to the TDX entry
MAINTAINERS: Add Rick Edgecombe as a TDX reviewer
MAINTAINERS: Update the file list in the TDX entry.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "mm, swap: improve cluster scan strategy" from Kairui Song improves
performance and reduces the failure rate of swap cluster allocation
- "support large align and nid in Rust allocators" from Vitaly Wool
permits Rust allocators to set NUMA node and large alignment when
perforning slub and vmalloc reallocs
- "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend
DAMOS_STAT's handling of the DAMON operations sets for virtual
address spaces for ops-level DAMOS filters
- "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren
Baghdasaryan reduces mmap_lock contention during reads of
/proc/pid/maps
- "mm/mincore: minor clean up for swap cache checking" from Kairui Song
performs some cleanup in the swap code
- "mm: vm_normal_page*() improvements" from David Hildenbrand provides
code cleanup in the pagemap code
- "add persistent huge zero folio support" from Pankaj Raghav provides
a block layer speedup by optionalls making the
huge_zero_pagepersistent, instead of releasing it when its refcount
falls to zero
- "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to
the recently added Kexec Handover feature
- "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo
Stoakes turns mm_struct.flags into a bitmap. To end the constant
struggle with space shortage on 32-bit conflicting with 64-bit's
needs
- "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap
code
- "selftests/mm: Fix false positives and skip unsupported tests" from
Donet Tom fixes a few things in our selftests code
- "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised"
from David Hildenbrand "allows individual processes to opt-out of
THP=always into THP=madvise, without affecting other workloads on the
system".
It's a long story - the [1/N] changelog spells out the considerations
- "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on
the memdesc project. Please see
https://kernelnewbies.org/MatthewWilcox/Memdescs and
https://blogs.oracle.com/linux/post/introducing-memdesc
- "Tiny optimization for large read operations" from Chi Zhiling
improves the efficiency of the pagecache read path
- "Better split_huge_page_test result check" from Zi Yan improves our
folio splitting selftest code
- "test that rmap behaves as expected" from Wei Yang adds some rmap
selftests
- "remove write_cache_pages()" from Christoph Hellwig removes that
function and converts its two remaining callers
- "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD
selftests issues
- "introduce kernel file mapped folios" from Boris Burkov introduces
the concept of "kernel file pages". Using these permits btrfs to
account its metadata pages to the root cgroup, rather than to the
cgroups of random inappropriate tasks
- "mm/pageblock: improve readability of some pageblock handling" from
Wei Yang provides some readability improvements to the page allocator
code
- "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON
to understand arm32 highmem
- "tools: testing: Use existing atomic.h for vma/maple tests" from
Brendan Jackman performs some code cleanups and deduplication under
tools/testing/
- "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes
a couple of 32-bit issues in tools/testing/radix-tree.c
- "kasan: unify kasan_enabled() and remove arch-specific
implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific
initialization code into a common arch-neutral implementation
- "mm: remove zpool" from Johannes Weiner removes zspool - an
indirection layer which now only redirects to a single thing
(zsmalloc)
- "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a
couple of cleanups in the fork code
- "mm: remove nth_page()" from David Hildenbrand makes rather a lot of
adjustments at various nth_page() callsites, eventually permitting
the removal of that undesirable helper function
- "introduce kasan.write_only option in hw-tags" from Yeoreum Yun
creates a KASAN read-only mode for ARM, using that architecture's
memory tagging feature. It is felt that a read-only mode KASAN is
suitable for use in production systems rather than debug-only
- "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does
some tidying in the hugetlb folio allocation code
- "mm: establish const-correctness for pointer parameters" from Max
Kellermann makes quite a number of the MM API functions more accurate
about the constness of their arguments. This was getting in the way
of subsystems (in this case CEPH) when they attempt to improving
their own const/non-const accuracy
- "Cleanup free_pages() misuse" from Vishal Moola fixes a number of
code sites which were confused over when to use free_pages() vs
__free_pages()
- "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the
mapletree code accessible to Rust. Required by nouveau and by its
forthcoming successor: the new Rust Nova driver
- "selftests/mm: split_huge_page_test: split_pte_mapped_thp
improvements" from David Hildenbrand adds a fix and some cleanups to
the thp selftesting code
- "mm, swap: introduce swap table as swap cache (phase I)" from Chris
Li and Kairui Song is the first step along the path to implementing
"swap tables" - a new approach to swap allocation and state tracking
which is expected to yield speed and space improvements. This
patchset itself yields a 5-20% performance benefit in some situations
- "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc
layer to clean up the ptdesc code a little
- "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some
issues in our 5-level pagetable selftesting code
- "Minor fixes for memory allocation profiling" from Suren Baghdasaryan
addresses a couple of minor issues in relatively new memory
allocation profiling feature
- "Small cleanups" from Matthew Wilcox has a few cleanups in
preparation for more memdesc work
- "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from
Quanmin Yan makes some changes to DAMON in furtherance of supporting
arm highmem
- "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad
Anjum adds that compiler check to selftests code and fixes the
fallout, by removing dead code
- "Improvements to Victim Process Thawing and OOM Reaper Traversal
Order" from zhongjinji makes a number of improvements in the OOM
killer: mainly thawing a more appropriate group of victim threads so
they can release resources
- "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park
is a bunch of small and unrelated fixups for DAMON
- "mm/damon: define and use DAMON initialization check function" from
SeongJae Park implement reliability and maintainability improvements
to a recently-added bug fix
- "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from
SeongJae Park provides additional transparency to userspace clients
of the DAMON_STAT information
- "Expand scope of khugepaged anonymous collapse" from Dev Jain removes
some constraints on khubepaged's collapsing of anon VMAs. It also
increases the success rate of MADV_COLLAPSE against an anon vma
- "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()"
from Lorenzo Stoakes moves us further towards removal of
file_operations.mmap(). This patchset concentrates upon clearing up
the treatment of stacked filesystems
- "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau
provides some fixes and improvements to mlock's tracking of large
folios. /proc/meminfo's "Mlocked" field became more accurate
- "mm/ksm: Fix incorrect accounting of KSM counters during fork" from
Donet Tom fixes several user-visible KSM stats inaccuracies across
forks and adds selftest code to verify these counters
- "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses
some potential but presently benign issues in KSM's mm_slot handling
* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits)
mm: swap: check for stable address space before operating on the VMA
mm: convert folio_page() back to a macro
mm/khugepaged: use start_addr/addr for improved readability
hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list
alloc_tag: fix boot failure due to NULL pointer dereference
mm: silence data-race in update_hiwater_rss
mm/memory-failure: don't select MEMORY_ISOLATION
mm/khugepaged: remove definition of struct khugepaged_mm_slot
mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL
hugetlb: increase number of reserving hugepages via cmdline
selftests/mm: add fork inheritance test for ksm_merging_pages counter
mm/ksm: fix incorrect KSM counter handling in mm_struct during fork
drivers/base/node: fix double free in register_one_node()
mm: remove PMD alignment constraint in execmem_vmalloc()
mm/memory_hotplug: fix typo 'esecially' -> 'especially'
mm/rmap: improve mlock tracking for large folios
mm/filemap: map entire large folio faultaround
mm/fault: try to map the entire file folio in finish_fault()
mm/rmap: mlock large folios in try_to_unmap_one()
mm/rmap: fix a mlock race condition in folio_referenced_one()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO updates from Thomas Gleixner:
- Further consolidation of the VDSO infrastructure and the common data
store
- Simplification of the related Kconfig logic
- Improve the VDSO selftest suite
* tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests: vDSO: Drop vdso_test_clock_getres
selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64()
selftests: vDSO: vdso_test_abi: Test CPUTIME clocks
selftests: vDSO: vdso_test_abi: Use explicit indices for name array
selftests: vDSO: vdso_test_abi: Drop clock availability tests
selftests: vDSO: vdso_test_abi: Use ksft_finished()
selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO
selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper
vdso: Add struct __kernel_old_timeval forward declaration to gettime.h
vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO
vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE
vdso: Drop kconfig GENERIC_COMPAT_VDSO
vdso: Drop kconfig GENERIC_VDSO_32
riscv: vdso: Untangle Kconfig logic
time: Build generic update_vsyscall() only with generic time vDSO
vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs
vdso: Move ENABLE_COMPAT_VDSO from core to arm64
ARM: VDSO: Remove cntvct_ok global variable
vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull TIF bit unification updates from Thomas Gleixner:
"A set of changes to consolidate the generic TIF (thread info flag)
bits accross architectures.
All architectures define the same set of generic TIF bits. This makes
it pointlessly hard to add a new generic TIF bit or to change an
existing one.
Provide a generic variant and convert the architectures which utilize
the generic entry code over to use it. The TIF space is divided into
16 generic bits and 16 architecture specific bits, which turned out to
provide enough space on both sides"
* tag 'core-core-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
LoongArch: Fix bitflag conflict for TIF_FIXADE
riscv: Use generic TIF bits
loongarch: Use generic TIF bits
s390/entry: Remove unused TIF flags
s390: Use generic TIF bits
x86: Use generic TIF bits
asm-generic: Provide generic TIF infrastructure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV and apic updates from Borislav Petkov:
- Add functionality to provide runtime firmware updates for the non-x86
parts of an AMD platform like the security processor (ASP) firmware,
modules etc, for example. The intent being that these updates are
interim, live fixups before a proper BIOS update can be attempted
- Add guest support for AMD's Secure AVIC feature which gives encrypted
guests the needed protection against a malicious hypervisor
generating unexpected interrupts and injecting them into such guest,
thus interfering with its operation in an unexpected and negative
manner.
The advantage of this scheme is that the guest determines which
interrupts and when to accept them vs leaving that to the benevolence
(or not) of the hypervisor
- Strictly separate the startup code from the rest of the kernel where
former is executed from the initial 1:1 mapping of memory.
The problem was that the toolchain-generated version of the code was
being executed from a different mapping of memory than what was
"assumed" during code generation, needing an ever-growing pile of
fixups for absolute memory references which are invalid in the early,
1:1 memory mapping during boot.
The major advantage of this is that there's no need to check the 1:1
mapping portion of the code for absolute relocations anymore and get
rid of the RIP_REL_REF() macro sprinkling all over the place.
For more info, see Ard's very detailed writeup on this [1]
- The usual cleanups and fixes
Link: https://lore.kernel.org/r/CAMj1kXEzKEuePEiHB%2BHxvfQbFz0sTiHdn4B%2B%2BzVBJ2mhkPkQ4Q@mail.gmail.com [1]
* tag 'x86_apic_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (49 commits)
x86/boot: Drop erroneous __init annotation from early_set_pages_state()
crypto: ccp - Add AMD Seamless Firmware Servicing (SFS) driver
crypto: ccp - Add new HV-Fixed page allocation/free API
x86/sev: Add new dump_rmp parameter to snp_leak_pages() API
x86/startup/sev: Document the CPUID flow in the boot #VC handler
objtool: Ignore __pi___cfi_ prefixed symbols
x86/sev: Zap snp_abort()
x86/apic/savic: Do not use snp_abort()
x86/boot: Get rid of the .head.text section
x86/boot: Move startup code out of __head section
efistub/x86: Remap inittext read-execute when needed
x86/boot: Create a confined code area for startup code
x86/kbuild: Incorporate boot/startup/ via Kbuild makefile
x86/boot: Revert "Reject absolute references in .head.text"
x86/boot: Check startup code for absence of absolute relocations
objtool: Add action to check for absence of absolute relocations
x86/sev: Export startup routines for later use
x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object
x86/sev: Provide PIC aliases for SEV related data objects
x86/boot: Provide PIC aliases for 5-level paging related constants
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpuid updates from Borislav Petkov:
- Make UMIP instruction detection more robust
- Correct and cleanup AMD CPU topology detection; document the relevant
CPUID leaves topology parsing precedence on AMD
- Add support for running the kernel as guest on FreeBSD's Bhyve
hypervisor
- Cleanups and improvements
* tag 'x86_cpu_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/umip: Fix decoding of register forms of 0F 01 (SGDT and SIDT aliases)
x86/umip: Check that the instruction opcode is at least two bytes
Documentation/x86/topology: Detail CPUID leaves used for topology enumeration
x86/cpu/topology: Define AMD64_CPUID_EXT_FEAT MSR
x86/cpu/topology: Check for X86_FEATURE_XTOPOLOGY instead of passing has_xtopology
x86/cpu/cacheinfo: Simplify cacheinfo_amd_init_llc_id() using _cpuid4_info
x86/cpu: Rename and move CPU model entry for Diamond Rapids
x86/cpu: Detect FreeBSD Bhyve hypervisor
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode loading updates from Borislav Petkov:
- Add infrastructure to be able to debug the microcode loader in a guest
- Refresh Intel old microcode revisions
* tag 'x86_microcode_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Add microcode loader debugging functionality
x86/microcode: Add microcode= cmdline parsing
x86/microcode/intel: Refresh the revisions that determine old_microcode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Borislav Petkov:
- Remove and simplify a bunch of cc-option and compiler version checks
in the build machinery now that the minimal version of both compilers
supports them
* tag 'x86_build_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Kconfig: Clean up LLVM version checks in IBT configurations
x86/build: Remove cc-option from -mskip-rax-setup
x86/build: Remove cc-option from -mno-fp-ret-in-387
x86/build: Clean up stack alignment flags in CC_FLAGS_FPU
x86/build: Remove cc-option from stack alignment flags
x86/build: Remove cc-option for GCC retpoline flags
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Core scheduler changes:
- Make migrate_{en,dis}able() inline, to improve performance
(Menglong Dong)
- Move STDL_INIT() functions out-of-line (Peter Zijlstra)
- Unify the SCHED_{SMT,CLUSTER,MC} Kconfig (Peter Zijlstra)
Fair scheduling:
- Defer throttling to when tasks exit to user-space, to reduce the
chance & impact of throttle-preemption with held locks and other
resources (Aaron Lu, Valentin Schneider)
- Get rid of sched_domains_curr_level hack for tl->cpumask(), as the
warning was getting triggered on certain topologies (Peter
Zijlstra)
Misc cleanups & fixes:
- Header cleanups (Menglong Dong)
- Fix race in push_dl_task() (Harshit Agarwal)"
* tag 'sched-core-2025-09-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix some typos in include/linux/preempt.h
sched: Make migrate_{en,dis}able() inline
rcu: Replace preempt.h with sched.h in include/linux/rcupdate.h
arch: Add the macro COMPILE_OFFSETS to all the asm-offsets.c
sched/fair: Do not balance task to a throttled cfs_rq
sched/fair: Do not special case tasks in throttled hierarchy
sched/fair: update_cfs_group() for throttled cfs_rqs
sched/fair: Propagate load for throttled cfs_rq
sched/fair: Get rid of throttled_lb_pair()
sched/fair: Task based throttle time accounting
sched/fair: Switch to task based throttle model
sched/fair: Implement throttle task work and related helpers
sched/fair: Add related data structure for task based throttle
sched: Unify the SCHED_{SMT,CLUSTER,MC} Kconfig
sched: Move STDL_INIT() functions out-of-line
sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
sched/deadline: Fix race in push_dl_task()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"One notable addition is the creation of the 'transitional' keyword for
kconfig so CONFIG renaming can go more smoothly.
This has been a long-standing deficiency, and with the renaming of
CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI
support), this came up again.
The breadth of the diffstat is mainly this renaming.
- Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)
- lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
(Junjie Cao)
- Add str_assert_deassert() helper (Lad Prabhakar)
- gcc-plugins: Remove TODO_verify_il for GCC >= 16
- kconfig: Fix BrokenPipeError warnings in selftests
- kconfig: Add transitional symbol attribute for migration support
- kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"
* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib/string_choices: Add str_assert_deassert() helper
kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
kconfig: Add transitional symbol attribute for migration support
kconfig: Fix BrokenPipeError warnings in selftests
gcc-plugins: Remove TODO_verify_il for GCC >= 16
stddef: Introduce __TRAILING_OVERLAP()
stddef: Remove token-pasting in TRAILING_OVERLAP()
lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
|
|
The kernel's CFI implementation uses the KCFI ABI specifically, and is
not strictly tied to a particular compiler. In preparation for GCC
supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with
associated options).
Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will
enable CONFIG_CFI during olddefconfig.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
The commit
f9aad622006bd64c ("mm: rename GENERIC_PTDUMP and PTDUMP_CORE")
has broken PTDUMP and the Kconfig options that use it on ARCH=i386, including
CONFIG_DEBUG_WX.
CONFIG_GENERIC_PTDUMP was renamed into CONFIG_ARCH_HAS_PTDUMP, but it was
mistakenly moved from "config X86" to "config X86_64". That made PTDUMP
unavailable for i386.
Move CONFIG_ARCH_HAS_PTDUMP back to "config X86" to fix it.
[ bp: Massage commit message. ]
Fixes: f9aad622006bd64c ("mm: rename GENERIC_PTDUMP and PTDUMP_CORE")
Signed-off-by: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
|
|
Now handled by the core automatically once SPARSEMEM_VMEMMAP_ENABLE is
selected.
Link: https://lkml.kernel.org/r/20250901150359.867252-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
No point in defining generic items and the upcoming RSEQ optimizations are
only available with this _and_ the generic entry infrastructure, which is
already used by x86. So no further action required here.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
|
|
Conflicts:
arch/x86/include/asm/sev.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Detect the Bhyve hypervisor and enable 15-bit MSI support if available.
Detecting Bhyve used to be a purely cosmetic issue of the kernel printing
'Hypervisor detected: Bhyve' at boot time.
But FreeBSD 15.0 will support¹ the 15-bit MSI enlightenment to support
more than 255 vCPUs (http://david.woodhou.se/ExtDestId.pdf) which means
there's now actually some functional reason to do so.
¹ https://github.com/freebsd/freebsd-src/commit/313a68ea20b4
[ bp: Massage, move tail comment ontop. ]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ahmed S. Darwish <darwi@linutronix.de>
Link: https://lore.kernel.org/03802f6f7f5b5cf8c5e8adfe123c397ca8e21093.camel@infradead.org
|
|
During kexec it is now guaranteed that all dirty cachelines of TDX
private memory are flushed before jumping to the new kernel. The TDX
private memory from the old kernel will remain as TDX private memory in
the new kernel, but it is OK because kernel read/write to TDX private
memory will never cause machine check, except on the platforms with the
TDX partial write erratum, which has already been handled.
It is safe to allow kexec to work together with TDX now. Remove the
!KEXEC_CORE dependency.
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Link: https://lore.kernel.org/all/20250901160930.1785244-6-pbonzini%40redhat.com
|