summaryrefslogtreecommitdiff
path: root/arch/arm64/include
AgeCommit message (Collapse)Author
2026-01-24Merge tag 'kvmarm-fixes-6.19-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.19 - Ensure early return semantics are preserved for pKVM fault handlers - Fix case where the kernel runs with the guest's PAN value when CONFIG_ARM64_PAN is not set - Make stage-1 walks to set the access flag respect the access permission of the underlying stage-2, when enabled - Propagate computed FGT values to the pKVM view of the vCPU at vcpu_load() - Correctly program PXN and UXN privilege bits for hVHE's stage-1 page tables - Check that the VM is actually using VGICv3 before accessing the GICv3 CPU interface - Delete some unused code
2026-01-23arm64: errata: Workaround for SI L1 downstream coherency issueLucas Wei
When software issues a Cache Maintenance Operation (CMO) targeting a dirty cache line, the CPU and DSU cluster may optimize the operation by combining the CopyBack Write and CMO into a single combined CopyBack Write plus CMO transaction presented to the interconnect (MCN). For these combined transactions, the MCN splits the operation into two separate transactions, one Write and one CMO, and then propagates the write and optionally the CMO to the downstream memory system or external Point of Serialization (PoS). However, the MCN may return an early CompCMO response to the DSU cluster before the corresponding Write and CMO transactions have completed at the external PoS or downstream memory. As a result, stale data may be observed by external observers that are directly connected to the external PoS or downstream memory. This erratum affects any system topology in which the following conditions apply: - The Point of Serialization (PoS) is located downstream of the interconnect. - A downstream observer accesses memory directly, bypassing the interconnect. Conditions: This erratum occurs only when all of the following conditions are met: 1. Software executes a data cache maintenance operation, specifically, a clean or clean&invalidate by virtual address (DC CVAC or DC CIVAC), that hits on unique dirty data in the CPU or DSU cache. This results in a combined CopyBack and CMO being issued to the interconnect. 2. The interconnect splits the combined transaction into separate Write and CMO transactions and returns an early completion response to the CPU or DSU before the write has completed at the downstream memory or PoS. 3. A downstream observer accesses the affected memory address after the early completion response is issued but before the actual memory write has completed. This allows the observer to read stale data that has not yet been updated at the PoS or downstream memory. The implementation of workaround put a second loop of CMOs at the same virtual address whose operation meet erratum conditions to wait until cache data be cleaned to PoC. This way of implementation mitigates performance penalty compared to purely duplicate original CMO. Signed-off-by: Lucas Wei <lucaswei@google.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-23KVM: arm64: Trap MTE access and discovery when MTE is disabledFuad Tabba
If MTE is not supported by the hardware, or is disabled in the kernel configuration (`CONFIG_ARM64_MTE=n`) or command line (`arm64.nomte`), the kernel stops advertising MTE to userspace and avoids using MTE instructions. However, this is a software-level disable only. When MTE hardware is present and enabled by EL3 firmware, leaving `HCR_EL2.ATA` set allows the host to execute MTE instructions (STG, LDG, etc.) and access allocation tags in physical memory. Prevent this by clearing `HCR_EL2.ATA` when MTE is disabled. Remove it from the `HCR_HOST_NVHE_FLAGS` default, and conditionally set it in `cpu_prepare_hyp_mode()` only when `system_supports_mte()` returns true. This causes MTE instructions to trap to EL2 when `HCR_EL2.ATA` is cleared. Additionally, set `HCR_EL2.TID5` when MTE is disabled. This traps reads of `GMID_EL1` (Multiple tag transfer ID register) to EL2, preventing the discovery of MTE parameters (such as tag block size) when the feature is suppressed. Early boot code in `head.S` temporarily keeps `HCR_ATA` set to avoid special-casing initialization paths. This is safe because this code executes before untrusted code runs and will clear `HCR_ATA` if MTE is disabled. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260122112218.531948-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23Merge branch kvm-arm64/pkvm-features-6.20 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/pkvm-features-6.20: : . : pKVM guest feature trapping fixes, courtesy of Fuad Tabba. : . KVM: arm64: Prevent host from managing timer offsets for protected VMs KVM: arm64: Check whether a VM IOCTL is allowed in pKVM KVM: arm64: Track KVM IOCTLs and their associated KVM caps KVM: arm64: Do not allow KVM_CAP_ARM_MTE for any guest in pKVM KVM: arm64: Include VM type when checking VM capabilities in pKVM KVM: arm64: Introduce helper to calculate fault IPA offset KVM: arm64: Fix MTE flag initialization for protected VMs KVM: arm64: Fix Trace Buffer trap polarity for protected VMs KVM: arm64: Fix Trace Buffer trapping for protected VMs Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23Merge branch kvm-arm64/feat_idst into kvmarm-master/nextMarc Zyngier
* kvm-arm64/feat_idst: : . : Add support for FEAT_IDST, allowing ID registers that are not implemented : to be reported as a normal trap rather than as an UNDEF exception. : . KVM: arm64: selftests: Add a test for FEAT_IDST KVM: arm64: pkvm: Report optional ID register traps with a 0x18 syndrome KVM: arm64: pkvm: Add a generic synchronous exception injection primitive KVM: arm64: Force trap of GMID_EL1 when the guest doesn't have MTE KVM: arm64: Handle CSSIDR2_EL1 and SMIDR_EL1 in a generic way KVM: arm64: Handle FEAT_IDST for sysregs without specific handlers KVM: arm64: Add a generic synchronous exception injection primitive KVM: arm64: Add trap routing for GMID_EL1 arm64: Repaint ID_AA64MMFR2_EL1.IDS description Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23Merge branch kvm-arm64/vtcr into kvmarm-master/nextMarc Zyngier
* kvm-arm64/vtcr: : . : VTCR_EL2 conversion to the configuration-driven RESx framework, : fixing a couple of UXN/PXN/XN bugs in the process. : . KVM: arm64: nv: Return correct RES0 bits for FGT registers KVM: arm64: Always populate FGT masks at boot time KVM: arm64: Honor UX/PX attributes for EL2 S1 mappings KVM: arm64: Convert VTCR_EL2 to config-driven sanitisation KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co arm64: Convert VTCR_EL2 to sysreg infratructure arm64: Convert ID_AA64MMFR0_EL1.TGRAN{4,16,64}_2 to UnsignedEnum KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers KVM: arm64: Don't blindly set set PSTATE.PAN on guest exit KVM: arm64: nv: Respect stage-2 write permssion when setting stage-1 AF KVM: arm64: Remove unused vcpu_{clear,set}_wfx_traps() KVM: arm64: Remove unused parameter in synchronize_vcpu_pstate() KVM: arm64: Remove extra argument for __pvkm_host_{share,unshare}_hyp() KVM: arm64: Inject UNDEF for a register trap without accessor KVM: arm64: Copy FGT traps to unprotected pKVM VCPU on VCPU load KVM: arm64: Fix EL2 S1 XN handling for hVHE setups KVM: arm64: gic: Check for vGICv3 when clearing TWI Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-23Merge branch arm64/for-next/cpufeature into kvmarm-master/nextMarc Zyngier
Merge arm64/for-next/cpufeature in to resolve conflicts resulting from the removal of CONFIG_PAN. * arm64/for-next/cpufeature: arm64: Add support for FEAT_{LS64, LS64_V} KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1 KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memory KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B KVM: arm64: Add exit to userspace on {LD,ST}64B* outside of memslots arm64: Unconditionally enable PAN support arm64: Unconditionally enable LSE support arm64: Add support for TSV110 Spectre-BHB mitigation Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-22arm64: Add support for FEAT_{LS64, LS64_V}Yicong Yang
Armv8.7 introduces single-copy atomic 64-byte loads and stores instructions and its variants named under FEAT_{LS64, LS64_V}. These features are identified by ID_AA64ISAR1_EL1.LS64 and the use of such instructions in userspace (EL0) can be trapped. As st64bv (FEAT_LS64_V) and st64bv0 (FEAT_LS64_ACCDATA) can not be tell apart, FEAT_LS64 and FEAT_LS64_ACCDATA which will be supported in later patch will be exported to userspace, FEAT_LS64_V will be enabled only in kernel. In order to support the use of corresponding instructions in userspace: - Make ID_AA64ISAR1_EL1.LS64 visbile to userspace - Add identifying and enabling in the cpufeature list - Expose these support of these features to userspace through HWCAP3 and cpuinfo ld64b/st64b (FEAT_LS64) and st64bv (FEAT_LS64_V) is intended for special memory (device memory) so requires support by the CPU, system and target memory location (device that support these instructions). The HWCAP3_LS64, implies the support of CPU and system (since no identification method from system, so SoC vendors should advertise support in the CPU if system also support them). Otherwise for ld64b/st64b the atomicity may not be guaranteed or a DABT will be generated, so users (probably userspace driver developer) should make sure the target memory (device) also have the support. For st64bv 0xffffffffffffffff will be returned as status result for unsupported memory so user should check it. Document the restrictions along with HWCAP3_LS64. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guestYicong Yang
Using FEAT_{LS64, LS64_V} instructions in a guest is also controlled by HCRX_EL2.{EnALS, EnASR}. Enable it if guest has related feature. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1Yicong Yang
Instructions introduced by FEAT_{LS64, LS64_V} is controlled by HCRX_EL2.{EnALS, EnASR}. Configure all of these to allow usage at EL0/1. This doesn't mean these instructions are always available in EL0/1 if provided. The hypervisor still have the control at runtime. Acked-by: Will Deacon <will@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22KVM: arm64: Handle DABT caused by LS64* instructions on unsupported memoryYicong Yang
If FEAT_LS64WB not supported, FEAT_LS64* instructions only support to access Device/Uncacheable memory, otherwise a data abort for unsupported Exclusive or atomic access (0x35, UAoEF) is generated per spec. It's implementation defined whether the target exception level is routed and is possible to implemented as route to EL2 on a VHE VM according to DDI0487L.b Section C3.2.6 Single-copy atomic 64-byte load/store. If it's implemented as generate the DABT to the final enabled stage (stage-2), inject the UAoEF back to the guest after checking the memslot is valid. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22arm64: Unconditionally enable PAN supportMarc Zyngier
FEAT_PAN has been around since ARMv8.1 (over 11 years ago), has no compiler dependency (we have our own accessors), and is a great security benefit. Drop CONFIG_ARM64_PAN, and make the support unconditionnal. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22arm64: Unconditionally enable LSE supportMarc Zyngier
LSE atomics have been in the architecture since ARMv8.1 (released in 2014), and are hopefully supported by all modern toolchains. Drop the optional nature of LSE support in the kernel, and always compile the support in, as this really is very little code. LL/SC still is the default, and the switch to LSE is done dynamically. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-20treewide: provide a generic clear_user_page() variantDavid Hildenbrand
Patch series "mm: folio_zero_user: clear page ranges", v11. This series adds clearing of contiguous page ranges for hugepages. The series improves on the current discontiguous clearing approach in two ways: - clear pages in a contiguous fashion. - use batched clearing via clear_pages() wherever exposed. The first is useful because it allows us to make much better use of hardware prefetchers. The second, enables advertising the real extent to the processor. Where specific instructions support it (ex. string instructions on x86; "mops" on arm64 etc), a processor can optimize based on this because, instead of seeing a sequence of 8-byte stores, or a sequence of 4KB pages, it sees a larger unit being operated on. For instance, AMD Zen uarchs (for extents larger than LLC-size) switch to a mode where they start eliding cacheline allocation. This is helpful not just because it results in higher bandwidth, but also because now the cache is not evicting useful cachelines and replacing them with zeroes. Demand faulting a 64GB region shows performance improvement: $ perf bench mem mmap -p $pg-sz -f demand -s 64GB -l 5 baseline +series (GBps +- %stdev) (GBps +- %stdev) pg-sz=2MB 11.76 +- 1.10% 25.34 +- 1.18% [*] +115.47% preempt=* pg-sz=1GB 24.85 +- 2.41% 39.22 +- 2.32% + 57.82% preempt=none|voluntary pg-sz=1GB (similar) 52.73 +- 0.20% [#] +112.19% preempt=full|lazy [*] This improvement is because switching to sequential clearing allows the hardware prefetchers to do a much better job. [#] For pg-sz=1GB a large part of the improvement is because of the cacheline elision mentioned above. preempt=full|lazy improves upon that because, not needing explicit invocations of cond_resched() to ensure reasonable preemption latency, it can clear the full extent as a single unit. In comparison the maximum extent used for preempt=none|voluntary is PROCESS_PAGES_NON_PREEMPT_BATCH (32MB). When provided the full extent the processor forgoes allocating cachelines on this path almost entirely. (The hope is that eventually, in the fullness of time, the lazy preemption model will be able to do the same job that none or voluntary models are used for, allowing us to do away with cond_resched().) Raghavendra also tested previous version of the series on AMD Genoa and sees similar improvement [1] with preempt=lazy. $ perf bench mem map -p $page-size -f populate -s 64GB -l 10 base patched change pg-sz=2MB 12.731939 GB/sec 26.304263 GB/sec 106.6% pg-sz=1GB 26.232423 GB/sec 61.174836 GB/sec 133.2% This patch (of 8): Let's drop all variants that effectively map to clear_page() and provide it in a generic variant instead. We'll use the macro clear_user_page to indicate whether an architecture provides it's own variant. Also, clear_user_page() is only called from the generic variant of clear_user_highpage(), so define it only if the architecture does not provide a clear_user_highpage(). And, for simplicity define it in linux/highmem.h. Note that for parisc, clear_page() and clear_user_page() map to clear_page_asm(), so we can just get rid of the custom clear_user_page() implementation. There is a clear_user_page_asm() function on parisc, that seems to be unused. Not sure what's up with that. Link: https://lkml.kernel.org/r/20260107072009.1615991-1-ankur.a.arora@oracle.com Link: https://lkml.kernel.org/r/20260107072009.1615991-2-ankur.a.arora@oracle.com Signed-off-by: David Hildenbrand <david@redhat.com> Co-developed-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ankur Arora <ankur.a.arora@oracle.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Hildenbrand <david@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20arm64: mm: replace TIF_LAZY_MMU with is_lazy_mmu_mode_active()Kevin Brodsky
The generic lazy_mmu layer now tracks whether a task is in lazy MMU mode. As a result we no longer need a TIF flag for that purpose - let's use the new is_lazy_mmu_mode_active() helper instead. The explicit check for in_interrupt() is no longer necessary either as is_lazy_mmu_mode_active() always returns false in interrupt context. Link: https://lkml.kernel.org/r/20251215150323.2218608-11-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20mm: enable lazy_mmu sections to nestKevin Brodsky
Despite recent efforts to prevent lazy_mmu sections from nesting, it remains difficult to ensure that it never occurs - and in fact it does occur on arm64 in certain situations (CONFIG_DEBUG_PAGEALLOC). Commit 1ef3095b1405 ("arm64/mm: Permit lazy_mmu_mode to be nested") made nesting tolerable on arm64, but without truly supporting it: the inner call to leave() disables the batching optimisation before the outer section ends. This patch actually enables lazy_mmu sections to nest by tracking the nesting level in task_struct, in a similar fashion to e.g. pagefault_{enable,disable}(). This is fully handled by the generic lazy_mmu helpers that were recently introduced. lazy_mmu sections were not initially intended to nest, so we need to clarify the semantics w.r.t. the arch_*_lazy_mmu_mode() callbacks. This patch takes the following approach: * The outermost calls to lazy_mmu_mode_{enable,disable}() trigger calls to arch_{enter,leave}_lazy_mmu_mode() - this is unchanged. * Nested calls to lazy_mmu_mode_{enable,disable}() are not forwarded to the arch via arch_{enter,leave} - lazy MMU remains enabled so the assumption is that these callbacks are not relevant. However, existing code may rely on a call to disable() to flush any batched state, regardless of nesting. arch_flush_lazy_mmu_mode() is therefore called in that situation. A separate interface was recently introduced to temporarily pause the lazy MMU mode: lazy_mmu_mode_{pause,resume}(). pause() fully exits the mode *regardless of the nesting level*, and resume() restores the mode at the same nesting level. pause()/resume() are themselves allowed to nest, so we actually store two nesting levels in task_struct: enable_count and pause_count. A new helper is_lazy_mmu_mode_active() is introduced to determine whether we are currently in lazy MMU mode; this will be used in subsequent patches to replace the various ways arch's currently track whether the mode is enabled. In summary (enable/pause represent the values *after* the call): lazy_mmu_mode_enable() -> arch_enter() enable=1 pause=0 lazy_mmu_mode_enable() -> ø enable=2 pause=0 lazy_mmu_mode_pause() -> arch_leave() enable=2 pause=1 lazy_mmu_mode_resume() -> arch_enter() enable=2 pause=0 lazy_mmu_mode_disable() -> arch_flush() enable=1 pause=0 lazy_mmu_mode_disable() -> arch_leave() enable=0 pause=0 Note: is_lazy_mmu_mode_active() is added to <linux/sched.h> to allow arch headers included by <linux/pgtable.h> to use it. Link: https://lkml.kernel.org/r/20251215150323.2218608-10-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20mm: bail out of lazy_mmu_mode_* in interrupt contextKevin Brodsky
The lazy MMU mode cannot be used in interrupt context. This is documented in <linux/pgtable.h>, but isn't consistently handled across architectures. arm64 ensures that calls to lazy_mmu_mode_* have no effect in interrupt context, because such calls do occur in certain configurations - see commit b81c688426a9 ("arm64/mm: Disable barrier batching in interrupt contexts"). Other architectures do not check this situation, most likely because it hasn't occurred so far. Let's handle this in the new generic lazy_mmu layer, in the same fashion as arm64: bail out of lazy_mmu_mode_* if in_interrupt(). Also remove the arm64 handling that is now redundant. Both arm64 and x86/Xen also ensure that any lazy MMU optimisation is disabled while in interrupt (see queue_pte_barriers() and xen_get_lazy_mode() respectively). This will be handled in the generic layer in a subsequent patch. Link: https://lkml.kernel.org/r/20251215150323.2218608-9-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20mm: introduce CONFIG_ARCH_HAS_LAZY_MMU_MODEKevin Brodsky
Architectures currently opt in for implementing lazy_mmu helpers by defining __HAVE_ARCH_ENTER_LAZY_MMU_MODE. In preparation for introducing a generic lazy_mmu layer that will require storage in task_struct, let's switch to a cleaner approach: instead of defining a macro, select a CONFIG option. This patch introduces CONFIG_ARCH_HAS_LAZY_MMU_MODE and has each arch select it when it implements lazy_mmu helpers. __HAVE_ARCH_ENTER_LAZY_MMU_MODE is removed and <linux/pgtable.h> relies on the new CONFIG instead. On x86, lazy_mmu helpers are only implemented if PARAVIRT_XXL is selected. This creates some complications in arch/x86/boot/, because a few files manually undefine PARAVIRT* options. As a result <asm/paravirt.h> does not define the lazy_mmu helpers, but this breaks the build as <linux/pgtable.h> only defines them if !CONFIG_ARCH_HAS_LAZY_MMU_MODE. There does not seem to be a clean way out of this - let's just undefine that new CONFIG too. Link: https://lkml.kernel.org/r/20251215150323.2218608-7-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-15KVM: arm64: Check whether a VM IOCTL is allowed in pKVMFuad Tabba
Certain VM IOCTLs are tied to specific VM features. Since pKVM does not support all features, restrict which IOCTLs are allowed depending on whether the associated feature is supported. Use the existing VM capability check as the source of truth to whether an IOCTL is allowed for a particular VM by mapping the IOCTLs with their associated capabilities. Suggested-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-9-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15KVM: arm64: Track KVM IOCTLs and their associated KVM capsFuad Tabba
Track KVM IOCTLs (VM IOCTLs for now), and the associated KVM capability that enables that IOCTL. Add a function that performs the lookup. This will be used by CoCo VM Hypervisors (e.g., pKVM) to determine whether a particular KVM IOCTL is allowed for its VMs. Suggested-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> [maz: don't expose KVM_CAP_BASIC to userspace, and rely on NR_VCPUS as a proxy for this] Link: https://patch.msgid.link/20251211104710.151771-8-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15KVM: arm64: Do not allow KVM_CAP_ARM_MTE for any guest in pKVMFuad Tabba
Supporting MTE in pKVM introduces significant complexity to the hypervisor at EL2, even for non-protected VMs, since it would require EL2 to handle tag management. For now, do not allow KVM_CAP_ARM_MTE for any VM type in protected mode. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15KVM: arm64: Include VM type when checking VM capabilities in pKVMFuad Tabba
Certain features and capabilities are restricted in protected mode. Most of these features are restricted only for protected VMs, but some are restricted for ALL VMs in protected mode. Extend the pKVM capability check to pass the VM (kvm), and use that when determining supported features. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15KVM: arm64: Introduce helper to calculate fault IPA offsetFuad Tabba
This 12-bit FAR fault IPA offset mask is hard-coded as 'GENMASK(11, 0)' in several places to reconstruct the full fault IPA. Introduce FAR_TO_FIPA_OFFSET() to calculate this value in a shared header and replace all open-coded instances to improve readability. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251211104710.151771-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15KVM: arm64: Add a generic synchronous exception injection primitiveMarc Zyngier
Maybe in a surprising way, we don't currently have a generic way to inject a synchronous exception at the EL the vcpu is currently running at. Extract such primitive from the UNDEF injection code. Reviewed-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com> Link: https://patch.msgid.link/20260108173233.2911955-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15KVM: arm64: Honor UX/PX attributes for EL2 S1 mappingsMarc Zyngier
Now that we potentially have two bits to deal with when setting execution permissions, make sure we correctly handle them when both when building the page tables and when reading back from them. Reported-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251210173024.561160-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and coMarc Zyngier
None of the registers we manage in the feature dependency infrastructure so far has any RES1 bit. This is about to change, as VTCR_EL2 has its bit 31 being RES1. In order to not fail the consistency checks by not describing a bit, add RES1 bits to the set of immutable bits. This requires some extra surgery for the FGT handling, as we now need to track RES1 bits there as well. There are no RES1 FGT bits *yet*. Watch this space. Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251210173024.561160-5-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-15arm64: Convert VTCR_EL2 to sysreg infratructureMarc Zyngier
Our definition of VTCR_EL2 is both partial (tons of fields are missing) and totally inconsistent (some constants are shifted, some are not). They are also expressed in terms of TCR, which is rather inconvenient. Replace the ad-hoc definitions with the the generated version. This results in a bunch of additional changes to make the code with the unshifted nature of generated enumerations. The register data was extracted from the BSD licenced AARCHMRS (AARCHMRS_OPENSOURCE_A_profile_FAT-2025-09_ASL0). Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20251210173024.561160-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-14KVM: arm64: Calculate hyp VA size only oncePetteri Kangaslampi
Calculate the hypervisor's VA size only once to maintain consistency between the memory layout and MMU initialization logic. Previously the two would be inconsistent when the kernel is configured for less than IDMAP_VA_BITS of VA space. Signed-off-by: Petteri Kangaslampi <pekangas@google.com> Tested-by: Vincent Donnefort <vdonnefort@google.com> Link: https://patch.msgid.link/20260113194409.2970324-2-pekangas@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-12arm64/paravirt: Use common code for paravirt_steal_clock()Juergen Gross
Remove the arch-specific variant of paravirt_steal_clock() and use the common one instead. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-9-jgross@suse.com
2026-01-12sched: Move clock related paravirt code to kernel/schedJuergen Gross
Paravirt clock related functions are available in multiple archs. In order to share the common parts, move the common static keys to kernel/sched/ and remove them from the arch specific files. Make a common paravirt_steal_clock() implementation available in kernel/sched/cputime.c, guarding it with a new config option CONFIG_HAVE_PV_STEAL_CLOCK_GEN, which can be selected by an arch in case it wants to use that common variant. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-7-jgross@suse.com
2026-01-12paravirt: Remove asm/paravirt_api_clock.hJuergen Gross
All architectures supporting CONFIG_PARAVIRT share the same contents of asm/paravirt_api_clock.h: #include <asm/paravirt.h> So remove all incarnations of asm/paravirt_api_clock.h and remove the only place where it is included, as there asm/paravirt.h is included anyway. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> # powerpc, scheduler bits Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260105110520.21356-6-jgross@suse.com
2026-01-10KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkersWill Deacon
Commit ddcadb297ce5 ("KVM: arm64: Ignore EAGAIN for walks outside of a fault") introduced a new walker flag ('KVM_PGTABLE_WALK_HANDLE_FAULT') to KVM's page-table code. When set, the walk logic maintains its previous behaviour of terminating a walk as soon as the visitor callback returns an error. However, when the flag is clear, the walk will continue if the visitor returns -EAGAIN and the error is then suppressed and returned as zero to the caller. Clearing the flag is beneficial when write-protecting a range of IPAs with kvm_pgtable_stage2_wrprotect() but is not useful in any other cases, either because we are operating on a single page (e.g. kvm_pgtable_stage2_mkyoung() or kvm_phys_addr_ioremap()) or because the early termination is desirable (e.g. when mapping pages from a fault in user_mem_abort()). Subsequently, commit e912efed485a ("KVM: arm64: Introduce the EL1 pKVM MMU") hooked up pKVM's hypercall interface to the MMU code at EL1 but failed to propagate any of the walker flags. As a result, page-table walks at EL2 fail to set KVM_PGTABLE_WALK_HANDLE_FAULT even when the early termination semantics are desirable on the fault handling path. Rather than complicate the pKVM hypercall interface, invert the flag so that the whole thing can be simplified and only pass the new flag ('KVM_PGTABLE_WALK_IGNORE_EAGAIN') from the wrprotect code. Cc: Fuad Tabba <tabba@google.com> Cc: Quentin Perret <qperret@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oupton@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Quentin Perret <qperret@google.com> Link: https://msgid.link/20260105154939.11041-2-will@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-09Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Do not return false if !preemptible() in current_in_efi(). EFI runtime services can now run with preemption enabled - Fix uninitialised variable in the arm MPAM driver, reported by sparse - Fix partial kasan_reset_tag() use in change_memory_common() when calculating page indices or comparing ranges - Save/restore TCR2_EL1 during suspend/resume, otherwise the E0POE bit is lost * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix cleared E0POE bit after cpu_suspend()/resume() arm64: mm: Fix incomplete tag reset in change_memory_common() arm_mpam: Stop using uninitialized variables in __ris_msmon_read() arm64/efi: Don't fail check current_in_efi() if preemptible
2026-01-09KVM: arm64: Don't blindly set set PSTATE.PAN on guest exitMarc Zyngier
We set PSTATE.PAN to 1 on exiting from a guest if PAN support has been compiled in and that it exists on the HW. However, this is not necessarily correct. In a nVHE configuration, there is no notion of PAN at EL2, so setting PSTATE.PAN to anything is pointless. Furthermore, not setting PAN to 0 when CONFIG_ARM64_PAN isn't set means we run with the *guest's* PSTATE.PAN (which might be set to 1), and we will explode on the next userspace access. Yes, the architecture is delightful in that particular corner. Fix the whole thing by always setting PAN to something when running VHE (which implies PAN support), and only ignore it when running nVHE. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20260107124600.2736328-1-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-09KVM: arm64: Remove unused vcpu_{clear,set}_wfx_traps()Dongxu Sun
Function vcpu_{clear,set}_wfx_traps() are unused since commit 0b5afe05377d7 ("KVM: arm64: Add early_param to control WFx trapping"). Remove it. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Dongxu Sun <sundongxu1024@163.com> Link: https://msgid.link/20260109080226.761107-1-sundongxu1024@163.com Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-09arm64: Fix cleared E0POE bit after cpu_suspend()/resume()Yeoreum Yun
TCR2_ELx.E0POE is set during smp_init(). However, this bit is not reprogrammed when the CPU enters suspension and later resumes via cpu_resume(), as __cpu_setup() does not re-enable E0POE and there is no save/restore logic for the TCR2_ELx system register. As a result, the E0POE feature no longer works after cpu_resume(). To address this, save and restore TCR2_EL1 in the cpu_suspend()/cpu_resume() path, rather than adding related logic to __cpu_setup(), taking into account possible future extensions of the TCR2_ELx feature. Fixes: bf83dae90fbc ("arm64: enable the Permission Overlay Extension for EL0") Cc: <stable@vger.kernel.org> # 6.12.x Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-01-08KVM: arm64: Fix EL2 S1 XN handling for hVHE setupsMarc Zyngier
The current XN implementation is tied to the EL2 translation regime, and fall flat on its face with the EL2&0 one that is used for hVHE, as the permission bit for privileged execution is a different one. Fixes: 6537565fd9b7f ("KVM: arm64: Adjust EL2 stage-1 leaf AP bits when ARM64_KVM_HVHE is set") Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Fuad Tabba <tabba@google.com> Link: https://msgid.link/20251210173024.561160-2-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2026-01-07KVM: arm64: Shuffle KVM_HOST_DATA_FLAG_* indicesMark Rutland
There's a gap in the KVM_HOST_DATA_FLAG_* indices since the removal of KVM_HOST_DATA_FLAG_HOST_SVE_ENABLED and KVM_HOST_DATA_FLAG_HOST_SME_ENABLED in commits: * 459f059be702 ("KVM: arm64: Remove VHE host restore of CPACR_EL1.ZEN") * 407a99c4654e ("KVM: arm64: Remove VHE host restore of CPACR_EL1.SMEN") Shuffle the indices to remove the gap, as Oliver requested at the time of the removals: https://lore.kernel.org/linux-arm-kernel/Z6qC4qn47ONfDCSH@linux.dev/ There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Fuad Tabba <tabba@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Will Deacon <will@kernel.org> Tested-by: Fuad Tabba <tabba@google.com> Reviewed-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://patch.msgid.link/20260106173707.3292074-3-mark.rutland@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-06arm64/efi: Don't fail check current_in_efi() if preemptibleBen Horgan
As EFI runtime services can now be run without disabling preemption remove the check for non preemptible in current_in_efi(). Without this change, firmware errors that were previously recovered from by __efi_runtime_kernel_fixup_exception() will lead to a kernel oops. Fixes: a5baf582f4c0 ("arm64/efi: Call EFI runtime services without disabling preemption") Signed-off-by: Ben Horgan <ben.horgan@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Richard Lyu <richard.lyu@suse.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-01-05arm64: Avoid memcpy() for syscall_get_arguments()Jinjie Ruan
Do not use memcpy() to extract syscall arguments from struct pt_regs but rather just perform direct assignments. Update syscall_set_arguments() too to keep syscall_get_arguments() and syscall_set_arguments() in sync. With Generic Entry patch[1] and turn on audit, the performance benchmarks from perf bench basic syscall on kunpeng920 gives roughly a 1% performance uplift. | Metric | W/O this patch | With this patch | Change | | ---------- | -------------- | --------------- | --------- | | Total time | 2.241 [sec] | 2.211 [sec] | ↓1.36% | | usecs/op | 0.224157 | 0.221146 | ↓1.36% | | ops/sec | 4,461,157 | 4,501,409 | ↑0.9% | Disassembly shows that using direct assignment causes syscall_set_arguments() to be inlined and cuts the instruction count by five or six compared to memcpy(). Because __audit_syscall_entry() only uses four syscall arguments, the compiler has also elided the copy of regs->regs[4] and regs->regs[5]. Before: <syscall_get_arguments.constprop.0>: aa0103e2 mov x2, x1 91002003 add x3, x0, #0x8 f9408804 ldr x4, [x0, #272] f8008444 str x4, [x2], #8 a9409404 ldp x4, x5, [x0, #8] a9009424 stp x4, x5, [x1, #8] a9418400 ldp x0, x1, [x0, #24] a9010440 stp x0, x1, [x2, #16] f9401060 ldr x0, [x3, #32] f9001040 str x0, [x2, #32] d65f03c0 ret d503201f nop After: a9408e82 ldp x2, x3, [x20, #8] 2a1603e0 mov w0, w22 f9400e84 ldr x4, [x20, #24] f9408a81 ldr x1, [x20, #272] 9401c4ba bl ffff800080215ca8 <__audit_syscall_entry> This also aligns the implementation with x86 and RISC-V. [1]: https://lore.kernel.org/all/20251126071446.3234218-1-ruanjinjie@huawei.com/ Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-05syscall.h: Remove unused SYSCALL_MAX_ARGSJinjie Ruan
The "SYSCALL_MAX_ARGS" appears to have been unused since commit 32d92586629a ("syscalls: Remove start and number from syscall_set_arguments() args"), so remove it. Fixes: 32d92586629a ("syscalls: Remove start and number from syscall_set_arguments() args") Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Will Deacon <will@kernel.org>
2025-12-14arm64/simd: Avoid pointless clearing of FP/SIMD bufferArd Biesheuvel
The buffer provided to kernel_neon_begin() is only used if the task is scheduled out while the FP/SIMD is in use by the kernel, or when such a section is interrupted by a softirq that also uses the FP/SIMD. IOW, this happens rarely, and even if it happened often, there is still no reason for this buffer to be cleared beforehand, which happens unconditionally, due to the use of a compound literal expression. So define that buffer variable explicitly, and mark it as __uninitialized so that it will not get cleared, even when -ftrivial-auto-var-init is in effect. This requires some preprocessor gymnastics, due to the fact that the variable must be defined throughout the entire guarded scope, and the expression ({ struct user_fpsimd_state __uninitialized st; &st; }) is problematic in that regard, even though the compilers seem to permit it. So instead, repeat the 'for ()' trick that is also used in the implementation of the guarded scope helpers. Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Kees Cook <keescook@chromium.org> Cc: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Fixes: 4fa617cc6851 ("arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack") Link: https://lore.kernel.org/r/20251209054848.998878-2-ardb@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-05Merge tag 'driver-core-6.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "Arch Topology: - Move parse_acpi_topology() from arm64 to common code for reuse in RISC-V CPU: - Expose housekeeping CPUs through /sys/devices/system/cpu/housekeeping - Print a newline (or 0x0A) instead of '(null)' reading /sys/devices/system/cpu/nohz_full when nohz_full= is not set debugfs - Remove (broken) 'no-mount' mode - Remove redundant access mode checks in debugfs_get_tree() and debugfs_create_*() functions Devres: - Remove unused devm_free_percpu() helper - Move devm_alloc_percpu() from device.h to devres.h Firmware Loader: - Replace simple_strtol() with kstrtoint() - Do not call cancel_store() when no upload is in progress kernfs: - Increase struct super_block::maxbytes to MAX_LFS_FILESIZE - Fix a missing unwind path in __kernfs_new_node() Misc: - Increase the name size in struct auxiliary_device_id to 40 characters - Replace system_unbound_wq with system_dfl_wq and add WQ_PERCPU to alloc_workqueue() Platform: - Replace ERR_PTR() with IOMEM_ERR_PTR() in platform ioremap functions Rust: - Auxiliary: - Unregister auxiliary device on parent device unbind - Move parent() to impl Device; implement device context aware parent() for Device<Bound> - Illustrate how to safely obtain a driver's device private data when calling from an auxiliary driver into the parant device driver - DebugFs: - Implement support for binary large objects - Device: - Let probe() return the driver's device private data as pinned initializer, i.e. impl PinInit<Self, Error> - Implement safe accessor for a driver's device private data for Device<Bound> (returned reference can't out-live driver binding and guarantees the correct private data type) - Implement AsBusDevice trait, to be used by class device abstractions to derive the bus device type of the parent device - DMA: - Store raw pointer of allocation as NonNull - Use start_ptr() and start_ptr_mut() to inherit correct mutability of self - FS: - Add file::Offset type alias - I2C: - Add abstractions for I2C device / driver infrastructure - Implement abstractions for manual I2C device registrations - I/O: - Use "kernel vertical" style for imports - Define ResourceSize as resource_size_t - Move ResourceSize to top-level I/O module - Add type alias for phys_addr_t - Implement Rust version of read_poll_timeout_atomic() - PCI: - Use "kernel vertical" style for imports - Move I/O and IRQ infrastructure to separate files - Add support for PCI interrupt vectors - Implement TryInto<IrqRequest<'a>> for IrqVector<'a> to convert an IrqVector bound to specific pci::Device into an IrqRequest bound to the same pci::Device's parent Device - Leverage pin_init_scope() to get rid of redundant Result in IRQ methods - PinInit: - Add {pin_}init_scope() to execute code before creating an initializer - Platform: - Leverage pin_init_scope() to get rid of redundant Result in IRQ methods - Timekeeping: - Implement abstraction of udelay() - Uaccess: - Implement read_slice_partial() and read_slice_file() for UserSliceReader - Implement write_slice_partial() and write_slice_file() for UserSliceWriter sysfs: - Prepare the constification of struct attribute" * tag 'driver-core-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (75 commits) rust: pci: fix build failure when CONFIG_PCI_MSI is disabled debugfs: Fix default access mode config check debugfs: Remove broken no-mount mode debugfs: Remove redundant access mode checks driver core: Check drivers_autoprobe for all added devices driver core: WQ_PERCPU added to alloc_workqueue users driver core: replace use of system_unbound_wq with system_dfl_wq tick/nohz: Expose housekeeping CPUs in sysfs tick/nohz: avoid showing '(null)' if nohz_full= not set sysfs/cpu: Use DEVICE_ATTR_RO for nohz_full attribute kernfs: fix memory leak of kernfs_iattrs in __kernfs_new_node fs/kernfs: raise sb->maxbytes to MAX_LFS_FILESIZE mod_devicetable: Bump auxiliary_device_id name size sysfs: simplify attribute definition macros samples/kobject: constify 'struct foo_attribute' samples/kobject: add is_visible() callback to attribute group sysfs: attribute_group: enable const variants of is_visible() sysfs: introduce __SYSFS_FUNCTION_ALTERNATIVE() sysfs: transparently handle const pointers in ATTRIBUTE_GROUPS() sysfs: attribute_group: allow registration of const attribute ...
2025-12-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "ARM: - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM - Minor fixes to KVM and selftests Loongarch: - Get VM PMU capability from HW GCFG register - Add AVEC basic support - Use 64-bit register definition for EIOINTC - Add KVM timer test cases for tools/selftests RISC/V: - SBI message passing (MPXY) support for KVM guest - Give a new, more specific error subcode for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually in small chunks - Fix guest page fault within HLV* instructions - Flush VS-stage TLB after VCPU migration for Andes cores s390: - Always allocate ESCA (Extended System Control Area), instead of starting with the basic SCA and converting to ESCA with the addition of the 65th vCPU. The price is increased number of exits (and worse performance) on z10 and earlier processor; ESCA was introduced by z114/z196 in 2010 - VIRT_XFER_TO_GUEST_WORK support - Operation exception forwarding support - Cleanups x86: - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE caching is disabled, as there can't be any relevant SPTEs to zap - Relocate a misplaced export - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM - Leave KVM's user-return notifier registered even when disabling virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping the notifier registered is ok; the kernel does not use the MSRs and the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down - Use the checked version of {get,put}_user() - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NTP corrections - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path; the only reason they were handled in the fast path was to paper of a bug in the core #MC code, and that has long since been fixed - Add emulator support for AVX MOV instructions, to play nice with emulated devices whose guest drivers like to access PCI BARs with large multi-byte instructions x86 (AMD): - Fix a few missing "VMCB dirty" bugs - Fix the worst of KVM's lack of EFER.LMSLE emulation - Add AVIC support for addressing 4k vCPUs in x2AVIC mode - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT - Fix a bug where KVM corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM x86 (Intel): - Use the root role from kvm_mmu_page to construct EPTPs instead of the current vCPU state, partly as worthwhile cleanup, but mostly to pave the way for tracking per-root TLB flushes, and elide EPT flushes on pCPU migration if the root is clean from a previous flush - Add a few missing nested consistency checks - Rip out support for doing "early" consistency checks via hardware as the functionality hasn't been used in years and is no longer useful in general; replace it with an off-by-default module param to WARN if hardware fails a check that KVM does not perform - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32] on VM-Enter - Misc cleanups - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertantly trigger lock contention in the TDX-Module; KVM was either working around these in weird, ugly ways, or was simply oblivious to them (though even Yan's devilish selftests could only break individual VMs, not the host kernel) - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU, if creating said vCPU failed partway through - Fix a few sparse warnings (bad annotation, 0 != NULL) - Use struct_size() to simplify copying TDX capabilities to userspace - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected Selftests: - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM - Forcefully override ARCH from x86_64 to x86 to play nice with specifying ARCH=x86_64 on the command line - Extend a bunch of nested VMX to validate nested SVM as well - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to verify KVM can save/restore nested VMX state when L1 is using 5-level paging, but L2 is not - Clean up the guest paging code in anticipation of sharing the core logic for nested EPT and nested NPT guest_memfd: - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references - Enhance KVM selftests to make it easer to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors - Misc cleanups Generic: - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup - Fix a goof in the dirty ring documentation - Fix choice of target for directed yield across different calls to kvm_vcpu_on_spin(); the function was always starting from the first vCPU instead of continuing the round-robin search" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits) KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions ...
2025-12-02Merge tag 'fpsimd-on-stack-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull arm64 FPSIMD on-stack buffer updates from Eric Biggers: "This is a core arm64 change. However, I was asked to take this because most uses of kernel-mode FPSIMD are in crypto or CRC code. In v6.8, the size of task_struct on arm64 increased by 528 bytes due to the new 'kernel_fpsimd_state' field. This field was added to allow kernel-mode FPSIMD code to be preempted. Unfortunately, 528 bytes is kind of a lot for task_struct. This regression in the task_struct size was noticed and reported. Recover that space by making this state be allocated on the stack at the beginning of each kernel-mode FPSIMD section. To make it easier for all the users of kernel-mode FPSIMD to do that correctly, introduce and use a 'scoped_ksimd' abstraction" * tag 'fpsimd-on-stack-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (23 commits) lib/crypto: arm64: Move remaining algorithms to scoped ksimd API lib/crypto: arm/blake2b: Move to scoped ksimd API arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack arm64/fpu: Enforce task-context only for generic kernel mode FPU net/mlx5: Switch to more abstract scoped ksimd guard API on arm64 arm64/xorblocks: Switch to 'ksimd' scoped guard API crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API crypto/arm64: polyval - Switch to 'ksimd' scoped guard API crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API raid6: Move to more abstract 'ksimd' guard API crypto: aegis128-neon - Move to more abstract 'ksimd' guard API crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API ...
2025-12-02Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "These are the arm64 updates for 6.19. The biggest part is the Arm MPAM driver under drivers/resctrl/. There's a patch touching mm/ to handle spurious faults for huge pmd (similar to the pte version). The corresponding arm64 part allows us to avoid the TLB maintenance if a (huge) page is reused after a write fault. There's EFI refactoring to allow runtime services with preemption enabled and the rest is the usual perf/PMU updates and several cleanups/typos. Summary: Core features: - Basic Arm MPAM (Memory system resource Partitioning And Monitoring) driver under drivers/resctrl/ which makes use of the fs/rectrl/ API Perf and PMU: - Avoid cycle counter on multi-threaded CPUs - Extend CSPMU device probing and add additional filtering support for NVIDIA implementations - Add support for the PMUs on the NoC S3 interconnect - Add additional compatible strings for new Cortex and C1 CPUs - Add support for data source filtering to the SPE driver - Add support for i.MX8QM and "DB" PMU in the imx PMU driver Memory managemennt: - Avoid broadcast TLBI if page reused in write fault - Elide TLB invalidation if the old PTE was not valid - Drop redundant cpu_set_*_tcr_t0sz() macros - Propagate pgtable_alloc() errors outside of __create_pgd_mapping() - Propagate return value from __change_memory_common() ACPI and EFI: - Call EFI runtime services without disabling preemption - Remove unused ACPI function Miscellaneous: - ptrace support to disable streaming on SME-only systems - Improve sysreg generation to include a 'Prefix' descriptor - Replace __ASSEMBLY__ with __ASSEMBLER__ - Align register dumps in the kselftest zt-test - Remove some no longer used macros/functions - Various spelling corrections" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits) arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic arm64/pageattr: Propagate return value from __change_memory_common arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS KVM: arm64: selftests: Consider all 7 possible levels of cache KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros Documentation/arm64: Fix the typo of register names ACPI: GTDT: Get rid of acpi_arch_timer_mem_init() perf: arm_spe: Add support for filtering on data source perf: Add perf_event_attr::config4 perf/imx_ddr: Add support for PMU in DB (system interconnects) perf/imx_ddr: Get and enable optional clks perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe() dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT arm64: mm: use untagged address to calculate page index MAINTAINERS: new entry for MPAM Driver arm_mpam: Add kunit tests for props_mismatch() arm_mpam: Add kunit test for bitmap reset arm_mpam: Add helper to reset saved mbwu state ...
2025-12-02Merge tag 'kvmarm-6.19' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.19 - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner. - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ. - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU. - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM. - Minor fixes to KVM and selftests
2025-12-02Merge tag 'core-uaccess-2025-11-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scoped user access updates from Thomas Gleixner: "Scoped user mode access and related changes: - Implement the missing u64 user access function on ARM when CONFIG_CPU_SPECTRE=n. This makes it possible to access a 64bit value in generic code with [unsafe_]get_user(). All other architectures and ARM variants provide the relevant accessors already. - Ensure that ASM GOTO jump label usage in the user mode access helpers always goes through a local C scope label indirection inside the helpers. This is required because compilers are not supporting that a ASM GOTO target leaves a auto cleanup scope. GCC silently fails to emit the cleanup invocation and CLANG fails the build. [ Editor's note: gcc-16 will have fixed the code generation issue in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm goto jumps [PR122835]"). But we obviously have to deal with clang and older versions of gcc, so.. - Linus ] This provides generic wrapper macros and the conversion of affected architecture code to use them. - Scoped user mode access with auto cleanup Access to user mode memory can be required in hot code paths, but if it has to be done with user controlled pointers, the access is shielded with a speculation barrier, so that the CPU cannot speculate around the address range check. Those speculation barriers impact performance quite significantly. This cost can be avoided by "masking" the provided pointer so it is guaranteed to be in the valid user memory access range and otherwise to point to a guaranteed unpopulated address space. This has to be done without branches so it creates an address dependency for the access, which the CPU cannot speculate ahead. This results in repeating and error prone programming patterns: if (can_do_masked_user_access()) from = masked_user_read_access_begin((from)); else if (!user_read_access_begin(from, sizeof(*from))) return -EFAULT; unsafe_get_user(val, from, Efault); user_read_access_end(); return 0; Efault: user_read_access_end(); return -EFAULT; which can be replaced with scopes and automatic cleanup: scoped_user_read_access(from, Efault) unsafe_get_user(val, from, Efault); return 0; Efault: return -EFAULT; - Convert code which implements the above pattern over to scope_user.*.access(). This also corrects a couple of imbalanced masked_*_begin() instances which are harmless on most architectures, but prevent PowerPC from implementing the masking optimization. - Add a missing speculation barrier in copy_from_user_iter()" * tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required scm: Convert put_cmsg() to scoped user access iov_iter: Add missing speculation barrier to copy_from_user_iter() iov_iter: Convert copy_from_user_iter() to masked user access select: Convert to scoped user access x86/futex: Convert to scoped user access futex: Convert to get/put_user_inline() uaccess: Provide put/get_user_inline() uaccess: Provide scoped user access regions arm64: uaccess: Use unsafe wrappers for ASM GOTO s390/uaccess: Use unsafe wrappers for ASM GOTO riscv/uaccess: Use unsafe wrappers for ASM GOTO powerpc/uaccess: Use unsafe wrappers for ASM GOTO x86/uaccess: Use unsafe wrappers for ASM GOTO uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user() ARM: uaccess: Implement missing __get_user_asm_dword()
2025-12-01Merge tag 'core-bugs-2025-12-01' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull bug handling infrastructure updates from Ingo Molnar: "Core updates: - Improve WARN(), which has vararg printf like arguments, to work with the x86 #UD based WARN-optimizing infrastructure by hiding the format in the bug_table and replacing this first argument with the address of the bug-table entry, while making the actual function that's called a UD1 instruction (Peter Zijlstra) - Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch (Ingo Molnar, s390 support by Heiko Carstens) Fixes and cleanups: - bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens) - <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS (Peter Zijlstra)" * tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERS x86/bug: Fix BUG_FORMAT vs KASLR x86_64/bug: Inline the UD1 x86/bug: Implement WARN_ONCE() x86_64/bug: Implement __WARN_printf() x86/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED x86/bug: Add BUG_FORMAT basics bug: Allow architectures to provide __WARN_printf() bug: Implement WARN_ON() using __WARN_FLAGS() bug: Add report_bug_entry() bug: Add BUG_FORMAT_ARGS infrastructure bug: Clean up CONFIG_GENERIC_BUG_RELATIVE_POINTERS bug: Add BUG_FORMAT infrastructure x86: Rework __bug_table helpers bugs/s390: Remove private WARN_ON() implementation bugs/core: Reorganize fields in the first line of WARNING output, add ->comm[] output bugs/sh: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output bugs/parisc: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output bugs/riscv: Concatenate 'cond_str' with '__FILE__' in __BUG_FLAGS(), to extend WARN_ON/BUG_ON output bugs/riscv: Pass in 'cond_str' to __BUG_FLAGS() ...
2025-12-01Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/nextOliver Upton
* kvm-arm64/nv-xnx-haf: (22 commits) : Support for FEAT_XNX and FEAT_HAF in nested : : Add support for a couple of MMU-related features that weren't : implemented by KVM's software page table walk: : : - FEAT_XNX: Allows the hypervisor to describe execute permissions : separately for EL0 and EL1 : : - FEAT_HAF: Hardware update of the Access Flag, which in the context of : nested means software walkers must also set the Access Flag. : : The series also adds some basic support for testing KVM's emulation of : the AT instruction, including the implementation detail that AT sets the : Access Flag in KVM. KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2 ... Signed-off-by: Oliver Upton <oupton@kernel.org>