path: root/arch/arm64
Age | Commit message | Author
2 daysMerge tag 'rust-fixes-7.1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull Rust fixes from Miguel Ojeda: "Toolchain and infrastructure: - Fix 'rustc-option' (the Makefile one) when cross-compiling that leads to build or boot failures in certain configs - Work around a Rust compiler bug (already fixed for Rust 1.98.0) that leads to boot failures in certain configs due to missing 'uwtable' LLVM module flags - Support a Rust compiler change (starting with Rust 1.98.0) in the unstable target specification JSON files - Forbid Rust + arm + KASAN configs, which do not build 'kernel' crate: - Fix NOMMU build by adding a missing helper" * tag 'rust-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: rust: x86: support Rust >= 1.98.0 target spec rust: arm64: set uwtable llvm module flag for CONFIG_UNWIND_TABLES rust: helpers: add is_vmalloc_addr wrapper for NOMMU builds rust: kasan/kbuild: fix rustc-option when cross-compiling ARM: Do not select HAVE_RUST when KASAN is enabled
3 daysMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "arm64: - Correctly drop the ITS translation cache reference when it actually gets invalidated - Take the SRCU lock for SW page table walks - Restore POR_EL0 access to host EL0, avoiding POR_EL0 becoming inaccessible from EL0 after running a guest - Reassign nested_mmus array behind mmu_lock, ensuring that vcpu init and MMU notifiers are mutually exclusive - Correctly handle FEAT_XNX at stage-2 s390: - More fixes for the new page table management and nested virtualization x86: - More fixes for GHCB issues: - Read start/end indices of page size change requests exactly once per vmexit - Unmap and unpin the GHCB as needed on vCPU free" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (23 commits) KVM: arm64: Correctly identify executable PTEs at stage-2 KVM: arm64: nv: Fix handling of XN[0] when !FEAT_XNX KVM: arm64: Reassign nested_mmus array behind mmu_lock KVM: arm64: Restore POR_EL0 access to host EL0 KVM: arm64: Take the SRCU lock for page table walks in fault injection and AT emulation KVM: arm64: vgic-its: Drop the translation cache reference only for the erased entry KVM: SEV: Unmap and unpin the GHCB as needed on vCPU free KVM: SEV: Decouple the need to sync the GHCB SA from the need to free the SA KVM: SEV: Move sev_free_vcpu() down below sev_es_unmap_ghcb() KVM: Don't WARN if memory is dirtied without a vCPU when the VM is dying KVM: SEV: Read start/end indices of PSC requests exactly once per #VMGEXIT KVM: SEV: Add an anonymous "psc" struct to track current PSC metadata KVM: SEV: Make it more obvious when KVM is writing back the current PSC index KVM: s390: Remove ptep_zap_softleaf_entry() KVM: s390: Fix possible reference leak in fault-in code KVM: s390: Prevent memslots outside the ASCE range KVM: s390: Lock pte when making page secure KVM: s390: Fix fault-in code KVM: s390: vsie: Fix rmap handling in _do_shadow_crste() KVM: s390: Fix guest / virtual address confusion in _essa_clear_cbrl() ...
3 daysMerge tag 'kvmarm-fixes-7.1-5' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.1, take #5 - Correctly drop the ITS translation cache reference when it actually gets invalidated - Take the SRCU lock for SW page table walks - Restore POR_EL0 access to host EL0, avoiding POR_EL0 becoming inaccessible from EL0 after running a guest - Reassign nested_mmus array behind mmu_lock, ensuring that vcpu init and MMU notifiers are mutually exclusive - Correctly handle FEAT_XNX at stage-2
4 daysKVM: arm64: Correctly identify executable PTEs at stage-2Oliver Upton
KVM invalidates the I-cache before installing an executable PTE on implementations without DIC. Unfortunately, support for FEAT_XNX broke this check as KVM_PTE_LEAF_ATTR_HI_S2_XN was expanded to a bitfield. Fix it by reusing kvm_pgtable_stage2_pte_prot() and testing the abstract permission bits instead. Fixes: 2608563b466b ("KVM: arm64: Add support for FEAT_XNX stage-2 permissions") Reported-by: Sashiko (gemini/gemini-3.1-pro-preview) Signed-off-by: Oliver Upton <oupton@kernel.org> Reviewed-by: Wei-Lin Chang <weilin.chang@arm.com> Link: https://patch.msgid.link/20260602165901.52800-3-oupton@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
4 daysKVM: arm64: nv: Fix handling of XN[0] when !FEAT_XNXOliver Upton
XN has already been extracted from its bitfield position so using FIELD_PREP() on the mask that clears XN[0] is completely broken, having the effect of unconditionally granting execute permissions... Fix the obvious mistake by manipulating the right bit. Cc: stable@vger.kernel.org Fixes: d93febe2ed2e ("KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2") Reviewed-by: Wei-Lin Chang <weilin.chang@arm.com> Signed-off-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/20260602165901.52800-2-oupton@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
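The failure mode described above can be reproduced in a few lines of plain C. This is only a sketch: the helper names are hypothetical, `field_prep()` is a simplified stand-in for the kernel's FIELD_PREP(), and the XN field position (descriptor bits [54:53]) is an assumption made for illustration rather than something taken from the patch.

```c
#include <stdint.h>

/* Assumption for illustration: stage-2 XN lives in descriptor bits [54:53]. */
#define S2_XN_SHIFT 53
#define S2_XN_MASK  (UINT64_C(3) << S2_XN_SHIFT)

/* Simplified stand-in for the kernel's FIELD_PREP() helper. */
static inline uint64_t field_prep(uint64_t mask, unsigned int shift, uint64_t val)
{
	return (val << shift) & mask;
}

/*
 * Buggy pattern: xn has already been extracted (it is a 2-bit value),
 * but it is ANDed with a constant positioned at bits [54:53].
 * The result is always 0, i.e. "executable everywhere".
 */
static uint64_t clear_xn0_buggy(uint64_t xn)
{
	return xn & field_prep(S2_XN_MASK, S2_XN_SHIFT, 0x2);
}

/* Fixed pattern: manipulate the extracted value directly, keeping only XN[1]. */
static uint64_t clear_xn0_fixed(uint64_t xn)
{
	return xn & 0x2;
}
```

With `xn == 3`, the buggy variant yields 0 while the fixed variant yields 2, so the execute-never intent carried by XN[1] survives.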
4 daysKVM: arm64: Reassign nested_mmus array behind mmu_lockHyunwoo Kim
kvm->arch.nested_mmus[] is walked under kvm->mmu_lock, including from the MMU notifier path (kvm_unmap_gfn_range() -> kvm_nested_s2_unmap()), which can run at any time. kvm_vcpu_init_nested() reallocates the array and frees the old buffer while holding only kvm->arch.config_lock, so such a walker can reference the freed array. Allocate the new array outside of mmu_lock, as the allocation can sleep. Under the lock, copy the existing entries, fix up the back pointers and reassign the array. Free the old buffer after dropping the lock, as kvfree() can sleep as well. Fixes: 4f128f8e1aaac ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/aiKIVVeIr1aAB1yp@v4bel Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
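The allocate, swap, free ordering described above can be sketched in userspace C. This is not the kernel code: `struct mmu_table`, `table_grow()` and the pthread mutex are hypothetical stand-ins for the nested_mmus array, kvm_vcpu_init_nested() and kvm->mmu_lock, and the back-pointer fix-up is omitted.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the array guarded by kvm->mmu_lock. */
struct mmu_table {
	pthread_mutex_t lock;
	int *entries;
	size_t nr;
};

/* Grow the table to new_nr entries; returns 0 on success, -1 on allocation failure. */
static int table_grow(struct mmu_table *t, size_t new_nr)
{
	int *new_arr, *old;

	/* Allocate before taking the lock: the allocation may block. */
	new_arr = calloc(new_nr, sizeof(*new_arr));
	if (!new_arr)
		return -1;

	pthread_mutex_lock(&t->lock);
	if (new_nr <= t->nr) {
		/* Nothing to do: keep the current array. */
		pthread_mutex_unlock(&t->lock);
		free(new_arr);
		return 0;
	}
	if (t->nr)
		memcpy(new_arr, t->entries, t->nr * sizeof(*new_arr));
	old = t->entries;
	t->entries = new_arr;	/* walkers holding the lock never see a freed array */
	t->nr = new_nr;
	pthread_mutex_unlock(&t->lock);

	/* Free only after dropping the lock, as freeing may block too. */
	free(old);
	return 0;
}
```

The point of the ordering is that anything that can sleep happens outside the critical section, while the pointer swap that walkers depend on happens inside it.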
4 daysKVM: arm64: Restore POR_EL0 access to host EL0Joey Gouly
CPTR_EL2.E0POE was being cleared in __deactivate_cptr_traps_vhe(), which meant that any accesses to POR_EL0 from host EL0 would trap and be reported to userspace as an Illegal instruction. This would happen after running any VM, regardless of whether it used POE or not. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Link: https://sashiko.dev/#/patchset/20260602155430.2088142-1-maz@kernel.org?part=1 Link: https://patch.msgid.link/20260604105434.2297268-1-joey.gouly@arm.com Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
4 daysKVM: arm64: Take the SRCU lock for page table walks in fault injection and ↵Hyunwoo Kim
AT emulation walk_s1() and kvm_walk_nested_s2() expect to be called while holding kvm->srcu to guard against memslot changes. While this is generally the case, __kvm_at_s12() and __kvm_find_s1_desc_level() call into the respective walkers without taking kvm->srcu. Fix by acquiring kvm->srcu prior to the table walk in both instances. Cc: stable@vger.kernel.org Fixes: 50f77dc87f13 ("KVM: arm64: Populate level on S1PTW SEA injection") Fixes: be04cebf3e78 ("KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W}") Suggested-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/aiAZfdeyanIvP8SD@v4bel Signed-off-by: Marc Zyngier <maz@kernel.org>
4 daysKVM: arm64: vgic-its: Drop the translation cache reference only for the ↵Hyunwoo Kim
erased entry vgic_its_invalidate_cache() walks the per-ITS translation cache with xa_for_each() and drops the cache's reference on each entry with vgic_put_irq(). It puts the iterated pointer, though, rather than the value returned by xa_erase(). The function is called from contexts that do not exclude one another: the ITS command handlers hold its_lock, the GITS_CTLR write path holds cmd_lock, and the path that clears EnableLPIs in a redistributor's GICR_CTLR holds neither. Two or more of them can drain the same cache concurrently, and if each one observes the same entry, erases it and then puts it, the single reference the cache holds on that entry is dropped more than once. The entry can then be freed while an ITE still maps it. xa_erase() is atomic and returns the previous entry, so put only the entry that this context actually removed. The cache reference is then dropped exactly once per entry even when the invalidations run concurrently, and the behavior is unchanged when only one context runs. Fixes: 8201d1028caa ("KVM: arm64: vgic-its: Maintain a translation cache per ITS") Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Oliver Upton <oupton@kernel.org> Link: https://patch.msgid.link/ah2c5lu4JbUg7dj-@v4bel Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
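The "put only what you actually removed" rule can be illustrated with C11 atomics. This is a sketch under stated assumptions: `slot_erase()` is a hypothetical single-slot analogue of xa_erase() built on atomic_exchange(), and `struct irq` with its plain counter stands in for the real reference-counted object.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reference-counted object held by a one-slot cache. */
struct irq {
	atomic_int refcount;
};

static void irq_put(struct irq *irq)
{
	atomic_fetch_sub(&irq->refcount, 1);
}

/* Atomically empty the slot and return what it held (analogue of xa_erase()). */
static struct irq *slot_erase(_Atomic(struct irq *) *slot)
{
	return atomic_exchange(slot, (struct irq *)NULL);
}

/*
 * Safe invalidation: drop the cache's reference only if this caller is the
 * one that removed the entry. A concurrent caller gets NULL and does nothing.
 */
static void invalidate_slot(_Atomic(struct irq *) *slot)
{
	struct irq *removed = slot_erase(slot);

	if (removed)
		irq_put(removed);
}
```

Calling `invalidate_slot()` twice on the same slot drops the reference once, which is the property the fix relies on when several invalidation paths race.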
6 daysMerge tag 'soc-fixes-7.1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "Following the previous set of fixes, this addresses another significant number of small issues found in firmware drivers (tee, optee, qcomtee, qcom ice, exynos acpm) through various tools. This is about error handling, resource leaks, concurrency and a use-after-free bug. The fixes for the Qualcomm ICE driver also introduce interface changes in the UFS and MMC drivers using it. Outside of firmware drivers, there are a few fixes across the tree: - Minor driver code mistakes in the Atmel EBI memory controller, the i.MX soc ID driver and socfpga boot logic - A defconfig change to avoid a boot time regression on multiple qualcomm boards - Device tree fixes for qualcomm, at91 and gemini, addressing mostly minor configuration mistakes" * tag 'soc-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (28 commits) firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion firmware: samsung: acpm: Fix missing LKMM barriers in sequence allocator firmware: samsung: acpm: Fix false timeouts and Use-After-Free in polling ARM: dts: gemini: Fix partition offsets ARM: socfpga: Fix OF node refcount leak in SMP setup soc: qcom: ice: Fix the error code when 'qcom,ice' property is not found arm64: dts: qcom: eliza: Add power-domain and iface clk for ice node arm64: dts: qcom: milos: Add power-domain and iface clk for ice node tee: qcomtee: add missing va_end in early return qcomtee_object_user_init() tee: fix params_from_user() error path in tee_ioctl_supp_recv tee: shm: fix shm leak in register_shm_helper() tee: fix tee_ioctl_object_invoke_arg padding arm64: defconfig: Enable PCI M.2 power sequencing driver scsi: ufs: ufs-qcom: Remove NULL check from devm_of_qcom_ice_get() mmc: sdhci-msm: Remove NULL check from devm_of_qcom_ice_get() soc: qcom: ice: Return proper error codes from devm_of_qcom_ice_get() instead of NULL soc: qcom: ice: Return -ENODEV if the
ICE platform device is not found soc: qcom: ice: Fix race between qcom_ice_probe() and of_qcom_ice_get() ARM: dts: microchip: sam9x7: fix GMAC clock configuration firmware: samsung: acpm: Fix mailbox channel leak on probe error ...
10 daysrust: arm64: set uwtable llvm module flag for CONFIG_UNWIND_TABLESAlice Ryhl
Due to a rustc bug [1] the -Cforce-unwind-tables=y flag only emits the uwtable annotation for functions, but not for the module. This means that compiler-generated functions such as 'asan.module_ctor' do not receive the uwtable annotation. When CONFIG_UNWIND_PATCH_PAC_INTO_SCS is enabled, this leads to boot failures because the dwarf information emitted for the kasan constructors is wrong, which causes the SCS boot patching code to patch the constructor in an illegal manner. Specifically, the paciasp instruction is patched, but the autiasp instruction is not. This mismatch leads to a crash when the constructor is called during boot. ================================================================== BUG: KASAN: global-out-of-bounds in do_basic_setup+0x4c/0x90 Read of size 8 at addr ffffffe3cc7eb488 by task swapper/0/1 Specifically the faulting instruction is the (*fn)() to invoke the constructor in do_ctors() of the init/main.c file. Once the fix lands in rustc, this flag can be made conditional on the rustc version. Note that passing the flag on a rustc with the fix present has no effect. [ The fix [1] has landed for Rust 1.98.0 (expected release on 2026-08-20). Thus add a version check as discussed. - Miguel ] Fixes: d077242d68a3 ("rust: support for shadow call stack sanitizer") Cc: stable@kernel.org Link: https://github.com/rust-lang/rust/pull/156973 [1] Reported-by: Bo Ye <bo.ye@mediatek.com> Debugged-by: Isaac Manjarres <isaacmanjarres@google.com> Debugged-by: Sami Tolvanen <samitolvanen@google.com> Tested-by: Isaac Manjarres <isaacmanjarres@google.com> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20260527-uwtable-module-flag-v1-1-caa41342be4b@google.com [ Adjusted link and comment. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
10 daysMerge tag 'kvmarm-fixes-7.1-4' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.1, take #4 - Restore CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC to its former glory by making sure the config symbol is correctly spelled out in the code - Don't reset the AArch32 view of the PMU counters to zero when the guest is writing to them - Fix an assorted collection of memory leaks in the newly added tracing code - Fix the capping of ZCR_EL2 which could be used in an unsanitised way by an L2 guest
11 daysKVM: arm64: Correctly cap ZCR_EL2 provided by a guest hypervisorMark Brown
ZCR_EL2 can be updated by a VHE guest hypervisor either using ZCR_EL2 (which traps) or ZCR_EL1 (which does not trap). KVM handles both in different ways: - on ZCR_EL2 trap, ZCR_EL2.LEN is immediately capped at the VM's own VL limit. This has the potential to break existing SW that relies on the full LEN field to be stateful. - on ZCR_EL1 access, we do absolutely nothing. On restoring the SVE context for an L2 guest, we directly restore the guest hypervisor's view of ZCR_EL2 into the physical ZCR_EL2. If the guest's view of the register was updated using the ZCR_EL2 accessor, the value has already been sanitised (with the caveat mentioned above). But if the guest used ZCR_EL1, the raw value is written into the HW, and the L2 guest can now access VLs that it shouldn't. Fix all the above by moving the VL capping to the restore points, ensuring that: - the HW is always programmed with a capped value, irrespective of the accessor being used, - the ZCR_EL2.LEN field is always completely stateful, irrespective of the accessor being used. Additionally, move ZCR_EL2 to be a sanitised register, ensuring that only the LEN field is actually stateful. This requires some creative construction of the RES0 mask, as the sysreg generation script does not yet generate RAZ/WI fields. Fixes: b3d29a823099 ("KVM: arm64: nv: Handle ZCR_EL2 traps") Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260529-kvm-arm64-fix-zcr-len-nv-v2-1-86cad51992bd@kernel.org [maz: rewrote commit message, tidy up access_zcr_el2()] Signed-off-by: Marc Zyngier <maz@kernel.org>
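The "cap at restore, keep the stored value stateful" idea can be sketched as a pure function. The names `zcr_hw_value()` and `vm_max_len` are hypothetical and the real restore path is more involved; the only architectural detail assumed here is that ZCR_ELx.LEN occupies bits [3:0].

```c
#include <stdint.h>

#define ZCR_LEN_MASK UINT64_C(0xf)	/* ZCR_ELx.LEN, bits [3:0] */

/*
 * Compute the value to program into hardware when restoring the SVE context.
 * The guest hypervisor's stored view (guest_zcr) is not modified, so its LEN
 * field stays fully stateful; only the hardware copy is capped.
 */
static uint64_t zcr_hw_value(uint64_t guest_zcr, uint64_t vm_max_len)
{
	uint64_t len = guest_zcr & ZCR_LEN_MASK;

	if (len > vm_max_len)
		len = vm_max_len;
	return len;
}
```

Because the cap is applied where the hardware is programmed, it no longer matters which accessor the guest hypervisor used to write the register.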
11 daysMerge tag 'qcom-arm64-defconfig-fixes-for-7.1' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm Arm64 defconfig fixes for v7.1 A number of targets now depends on the M.2 PCIe power sequencing driver, enable this to keep these devices functional with a defconfig build. * tag 'qcom-arm64-defconfig-fixes-for-7.1' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: defconfig: Enable PCI M.2 power sequencing driver Signed-off-by: Arnd Bergmann <arnd@arndb.de>
11 daysMerge tag 'qcom-arm64-fixes-for-7.1' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm Arm64 DeviceTree fixes for v7.1 Add missing power-domain and iface clocks for the ICE node of Eliza and Milos to avoid the validation errors that resulted from late binding changes. Also drop the reference clock for the USB QMP PHYs, for the same reason. Avoid touching the 20th I2C bus on the Hamoa-based (X Elite) Dell laptops, as this conflicts with the battery management firmware. * tag 'qcom-arm64-fixes-for-7.1' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: eliza: Add power-domain and iface clk for ice node arm64: dts: qcom: milos: Add power-domain and iface clk for ice node arm64: dts: qcom: x1-dell-thena: remove i2c20 (battery SMBus) and reserve its pins arm64: dts: qcom: glymur: Drop RPMh CXO clocks from QMP PHYs Signed-off-by: Arnd Bergmann <arnd@arndb.de>
13 daysKVM: arm64: Fix memory leak in hyp_trace_unload()Vincent Donnefort
During trace remote loading, hyp_trace_load() allocates the descriptor pages but fails to store the allocated size in trace_buffer->desc_size. As a result, when unloading the trace buffer, hyp_trace_unload() calls free_pages_exact() with a size of 0 which fails to free the memory. Fix this by updating the descriptor size in trace_buffer->desc_size. Fixes: 3aed038aac8d ("KVM: arm64: Add trace remote for the nVHE/pKVM hyp") Reported-by: Sashiko <sashiko-bot@kernel.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://patch.msgid.link/20260521124613.911067-4-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
13 daysKVM: arm64: Fix rollback in hyp_trace_buffer_share_hyp()Vincent Donnefort
When sharing the trace buffer with the hypervisor, if sharing a page fails, the rollback path in hyp_trace_buffer_share_hyp() misses unsharing the metadata page (meta_va) which was successfully shared before entering the page sharing loop. Additionally, if a failure occurs, the cleanup calls hyp_trace_buffer_unshare_hyp() with an incorrect CPU index. Since that CPU's pages were already rolled back locally in the loop, this leads to duplicate unsharing attempts. Fix both issues affecting the rollback. Fixes: 3aed038aac8d ("KVM: arm64: Add trace remote for the nVHE/pKVM hyp") Reported-by: Sashiko <sashiko-bot@kernel.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://patch.msgid.link/20260521124613.911067-3-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
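A rollback that undoes exactly what was done, including the metadata page, might look like the following. This is a flattened sketch: `struct page_state`, `share_page()` and `share_buffer()` are hypothetical, and the per-CPU nesting of the real buffer is collapsed into a single array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: 'refuse' makes sharing fail, for demonstration. */
struct page_state {
	bool shared;
	bool refuse;
};

static int share_page(struct page_state *p)
{
	if (p->refuse)
		return -1;
	p->shared = true;
	return 0;
}

static void unshare_page(struct page_state *p)
{
	p->shared = false;
}

/* Share the meta page, then every data page; undo everything on failure. */
static int share_buffer(struct page_state *meta, struct page_state *pages, size_t nr)
{
	size_t i;
	int ret;

	ret = share_page(meta);
	if (ret)
		return ret;

	for (i = 0; i < nr; i++) {
		ret = share_page(&pages[i]);
		if (ret)
			goto rollback;
	}
	return 0;

rollback:
	/* Unshare only the pages shared so far (indices 0..i-1), each once. */
	while (i--)
		unshare_page(&pages[i]);
	/* The meta page was shared before the loop, so it must be undone too. */
	unshare_page(meta);
	return ret;
}
```

Keeping the rollback in one place, driven by the loop index, avoids both problems named above: nothing is unshared twice and nothing that was shared is left behind.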
13 daysKVM: arm64: Fix meta-page unsharing in pKVM hyp tracingVincent Donnefort
As the hyp_trace_buffer_unshare_hyp() function name suggests we should unshare all the previously shared pages, otherwise we leak hyp-shared pages which won't be reusable for hyp memory. Fix the typo by calling __unshare_page() on the meta-page, ensuring all previously shared pages are correctly unshared. Fixes: 3aed038aac8d ("KVM: arm64: Add trace remote for the nVHE/pKVM hyp") Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://patch.msgid.link/20260521124613.911067-2-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
13 daysKVM: arm64: PMU: Preserve AArch32 counter low bitsQiang Ma
AArch32 writes to PMU event counters cannot update the top 32 bits, even when PMUv3p5 makes the counters 64-bit. KVM therefore needs to preserve the existing high half and only update the low half written by the guest, unless the caller explicitly forces a full reset through PMCR.P. The current code masks @val down to the old high half before taking lower_32_bits(val), which means the low half is always zero. As a result, AArch32 writes to event counters discard the guest-provided low 32 bits instead of storing them. Build the new value from the old high 32 bits and the low 32 bits of the value supplied by the guest. Fixes: 26d2d0594d70 ("KVM: arm64: PMU: Do not let AArch32 change the counters' top 32 bits") Signed-off-by: Qiang Ma <maqianga@uniontech.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://patch.msgid.link/20260526074640.791991-1-maqianga@uniontech.com Cc: stable@vger.kernel.org
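The difference between the broken and the corrected merge is easiest to see side by side. This is a sketch with hypothetical function names; it mirrors the description above rather than the exact kernel code.

```c
#include <stdint.h>

#define HIGH_HALF UINT64_C(0xffffffff00000000)

/*
 * Buggy pattern: the guest value is overwritten with the old high half
 * before its low 32 bits are taken, so the low half is always zero.
 */
static uint64_t counter_write_buggy(uint64_t old, uint64_t val)
{
	val = old & HIGH_HALF;
	return val | (uint32_t)val;
}

/* Fixed pattern: old high 32 bits combined with the guest's low 32 bits. */
static uint64_t counter_write_fixed(uint64_t old, uint64_t val)
{
	return (old & HIGH_HALF) | (uint32_t)val;
}
```

For an old counter value of 0x0000000100000005 and a guest write of 0x1234, the buggy variant produces 0x0000000100000000 and the fixed variant produces 0x0000000100001234.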
2026-05-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "arm64: - Fix ITS EventID sanitisation when restoring an interrupt translation table. - Fix PPI memory leak when failing to initialise a vcpu. - Correctly return an error when the validation of a hypervisor trace descriptor fails, and limit this validation to protected mode only. RISC-V: - Fix invalid HVA warning in steal-time recording - Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info() and pmu_snapshot_set_shmem() - Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler - Fix sign extension of value for MMIO loads s390: - Fix bugs in vSIE (nested virtualization) and UCONTROL, caused by the page table rewrite. x86: - Apply erratum #1235 workaround (disable AVIC IPI virtualization) on Hygon Family 18h, just like on AMD Family 17h. - When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM, return the VM's configured APIC bus frequency instead of the default. This is less confusing (read: not wrong) and makes it easier to fill in CPUID information that communicates the APIC bus frequency to the guest. 
Selftests: - Do not include glibc-internal <bits/endian.h>; it worked by chance and broke building KVM selftests with musl" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Disable AVIC IPI virtualization on Hygon Family 18h (erratum #1235) KVM: selftests: Verify that KVM returns the configured APIC cycle length KVM: x86: Return the VM's configured APIC bus frequency when queried KVM: selftests: elf: Include <endian.h> instead of <bits/endian.h> KVM: s390: Properly reset zero bit in PGSTE KVM: s390: vsie: Fix redundant rmap entries KVM: s390: vsie: Fix unshadowing logic KVM: s390: Fix leaking kvm_s390_mmu_cache in case of errors KVM: s390: vsie: Fix memory leak when unshadowing KVM: arm64: Fix nVHE/pKVM hyp tracing error on invalid desc KVM: arm64: vgic: Free private_irqs when init fails after allocation KVM: arm64: vgic-its: Reject restored DTE with out-of-range num_eventid_bits RISC-V: KVM: Fix sign extension for MMIO loads RISC-V: KVM: Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler riscv: kvm: return SBI_ERR_FAILURE for pmu_event_info() when OOM riscv: kvm: return SBI_ERR_FAILURE for pmu_snapshot_set_shmem() when OOM RISC-V: KVM: Fix invalid HVA warning in steal-time recording
2026-05-23Merge tag 'kvm-s390-master-7.1-2' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: some vSIE and UCONTROL fixes Fix some memory issues and some hangs in vSIE.
2026-05-23Merge tag 'kvmarm-fixes-7.1-3' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.1, take #3 - Fix ITS EventID sanitisation when restoring an interrupt translation table. - Fix PPI memory leak when failing to initialise a vcpu. - Correctly return an error when the validation of a hypervisor trace descriptor fails, and limit this validation to protected mode only.
2026-05-22Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Handle probe on hinted conditional branch instructions. BC.cond instructions can be simulated in the same way as B.cond instructions, so extend the decode mask for B.cond to cover BC.cond - Flush the walk cache when unsharing PMD tables. Recent changes to huge_pmd_unshare() introduced mmu_gather::unshared_tables but the arm64 code was still treating the TLB flushing as only targeting leaf entries (TLBI VALE1IS). Fix it by using non-leaf-only instructions (TLBI VAE1IS) when tlb->unshared_tables is set * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: tlb: Flush walk cache when unsharing PMD tables arm64: probes: Handle probes on hinted conditional branch instructions
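Extending a decode mask so that one check covers both branch forms can be sketched as below. The macro and function names are hypothetical, and the sketch assumes the A64 encodings in which B.cond and BC.cond share the top byte 0x54 and differ only in bit 4 (0 for B.cond, 1 for BC.cond).

```c
#include <stdbool.h>
#include <stdint.h>

#define COND_BRANCH_VALUE	0x54000000u
#define BCOND_ONLY_MASK		0xff000010u	/* bit 4 must be 0: B.cond only */
#define COND_BRANCH_MASK	0xff000000u	/* bit 4 ignored: B.cond and BC.cond */

/* Narrow decode: matches B.cond but rejects BC.cond. */
static bool is_bcond_only(uint32_t insn)
{
	return (insn & BCOND_ONLY_MASK) == COND_BRANCH_VALUE;
}

/* Widened decode: both forms are simulated the same way, so accept both. */
static bool is_cond_branch(uint32_t insn)
{
	return (insn & COND_BRANCH_MASK) == COND_BRANCH_VALUE;
}
```

Since the hint bit does not change the branch semantics, dropping it from the mask lets the existing B.cond simulation path handle BC.cond as well.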
2026-05-22arm64: tlb: Flush walk cache when unsharing PMD tablesZeng Heng
When huge_pmd_unshare() is called to unshare a PMD table, the tlb_unshare_pmd_ptdesc() function sets tlb->unshared_tables=true but the aarch64 tlb_flush() only checked tlb->freed_tables to determine whether to use TLBF_NONE (vae1is, invalidates walk cache) or TLBF_NOWALKCACHE (vale1is, leaf-only). This caused the stale PMD page table entry to remain in the walk cache after unshare, potentially leading to incorrect page table walks. Fix by including unshared_tables in the check, so that when unsharing tables, TLBF_NONE is used and the walk cache is properly invalidated. Here is the detailed distinction between vae1is and vale1is:
| Instruction Combination  | Actual Invalidation Scope                         |
| ------------------------ | ------------------------------------------------- |
| `VAE1IS` + TTL=`0`       | All entries at all levels (full invalidation)     |
| `VAE1IS` + TTL=`2` (L2)  | Non-leaf at Level 0/1 + leaf at Level 2           |
| `VALE1IS` + TTL=`0`      | Leaf entries at all levels (non-leaf not cleared) |
| `VALE1IS` + TTL=`2` (L2) | Leaf entry at Level 2 only                        |
Signed-off-by: Zeng Heng <zengheng4@huawei.com> Fixes: 8ce720d5bd91 ("mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather") Cc: <stable@vger.kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
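The corrected flag selection reduces to one extra condition. The sketch below uses a hypothetical `struct gather` and `pick_tlb_flags()`; only the field and flag names mentioned in the commit message are taken from the source.

```c
#include <stdbool.h>

enum tlb_flags {
	TLBF_NONE,		/* vae1is: also invalidates the walk cache */
	TLBF_NOWALKCACHE,	/* vale1is: leaf entries only */
};

/* Hypothetical reduction of mmu_gather to the two fields that matter here. */
struct gather {
	bool freed_tables;
	bool unshared_tables;
};

/* Leaf-only invalidation is safe only if no table was freed or unshared. */
static enum tlb_flags pick_tlb_flags(const struct gather *tlb)
{
	if (tlb->freed_tables || tlb->unshared_tables)
		return TLBF_NONE;
	return TLBF_NOWALKCACHE;
}
```

Before the fix, the equivalent of this check looked only at `freed_tables`, so an unshare operation fell through to the leaf-only path.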
2026-05-21Merge branch ↵Bjorn Andersson
'20260416-qcom_ice_power_and_clk_vote-v5-13-5ccf5d7e2846@oss.qualcomm.com' into arm64-fixes-for-7.1 Merge the fixes to add power-domain and correct clocks for the ICE block in Eliza and Milos through a topic branch, to allow them to be merged also into arm64-for-7.2 to resolve the merge conflicts that would otherwise appear. Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-21arm64: dts: qcom: eliza: Add power-domain and iface clk for ice nodeHarshal Dev
Qualcomm in-line crypto engine (ICE) platform driver specifies and votes for its own resources. Before accessing ICE hardware during probe, to avoid potential unclocked register access issues (when clk_ignore_unused is not passed on the kernel command line), in addition to the 'core' clock the 'iface' clock should also be turned on by the driver. This can only be done if the GCC_UFS_PHY_GDSC power domain is enabled. Specify both the GCC_UFS_PHY_GDSC power domain and the 'iface' clock in the ICE node for eliza. Fixes: af20af39fc09b ("arm64: dts: qcom: Introduce Eliza Soc base dtsi") Signed-off-by: Harshal Dev <harshal.dev@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Kuldeep Singh <kuldeep.singh@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260416-qcom_ice_power_and_clk_vote-v5-13-5ccf5d7e2846@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-21arm64: dts: qcom: milos: Add power-domain and iface clk for ice nodeHarshal Dev
Qualcomm in-line crypto engine (ICE) platform driver specifies and votes for its own resources. Before accessing ICE hardware during probe, to avoid potential unclocked register access issues (when clk_ignore_unused is not passed on the kernel command line), in addition to the 'core' clock the 'iface' clock should also be turned on by the driver. This can only be done if the UFS_PHY_GDSC power domain is enabled. Specify both the UFS_PHY_GDSC power domain and the 'iface' clock in the ICE node for milos. Fixes: 04bb37433330e ("arm64: dts: qcom: milos: Add UFS nodes") Signed-off-by: Harshal Dev <harshal.dev@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Kuldeep Singh <kuldeep.singh@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260416-qcom_ice_power_and_clk_vote-v5-12-5ccf5d7e2846@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-21Merge tag 'trace-ringbuffer-v7.1-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer fixes from Steven Rostedt: - Fix reporting MISSED EVENTS in trace iterator When the "trace" file is read with tracing enabled, if the writer were to pass the iterator reader, it resets, sets a "missed_events" flag and continues. The tracing output checks for missed events and if there are some, it prints out "[LOST EVENTS]" to let the user know events were dropped. But the clearing of the missed_events happened when the tracing system queried the ring buffer iterator about missed events. This was premature as the ring buffer is per CPU, and the tracing code reads all the CPU buffers and checks for missed events when it is read. If the CPU iterator that had missed events isn't printed next, the output for the LOST EVENTS is lost. Clear the missed_events flag when the iterator moves to the next event and not when the missed_events flag is queried. Also clear it on reset. - Flush and stop the persistent ring buffer on panic On panic the persistent ring buffer is used to debug what caused the panic. But on some architectures, it requires flushing the memory from cache, otherwise, the ring buffer persistent memory may not have the last events and this could also cause the ring buffer to be corrupted on the next boot. - Fix nr_subbufs initialization in simple_ring_buffer_init_mm The remote simple ring buffer meta data nr_subbufs is initialized too early and gets cleared later on, making it zero and not reflect the actual number of sub-buffers. - Fix unload_page for simple_ring_buffer init rollback On error, the pages loaded need to be unloaded. To unload a page it is expected that: page = load_page(va); -> unload_page(page). But the code was doing: unload_page(va) and not unload_page(page). - Create output file from cmd_check_undefined The check for undefined symbols checks if the file *.o.checked exists and if so it skips doing the work. 
But the *.o.checked file never was created making every build do the work even when it was already done previously. * tag 'trace-ringbuffer-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Create output file from cmd_check_undefined tracing: Fix unload_page for simple_ring_buffer init rollback tracing: Fix nr_subbufs initialization in simple_ring_buffer_init_mm() ring-buffer: Flush and stop persistent ring buffer on panic ring-buffer: Fix reporting of missed events in iterator
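The missed-events fix in the ring-buffer pull above comes down to making the query side-effect free and clearing the flag only when the iterator moves on. A minimal sketch, with a hypothetical `struct rb_iter` and helper names that are not the kernel's:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU iterator state. */
struct rb_iter {
	bool missed_events;
	size_t pos;
};

/* Query only: may be called any number of times before the event is printed. */
static bool iter_missed_events(const struct rb_iter *it)
{
	return it->missed_events;
}

/* Moving to the next event is what consumes the flag. */
static void iter_advance(struct rb_iter *it)
{
	it->pos++;
	it->missed_events = false;
}

/* A reset also starts from a clean state. */
static void iter_reset(struct rb_iter *it)
{
	it->pos = 0;
	it->missed_events = false;
}
```

With this split, a caller that inspects several per-CPU iterators before deciding which one to print still sees the flag when it finally prints from the iterator that lost events.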
2026-05-21Merge tag 'soc-fixes-7.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: - The ff-a firmware driver gets 11 individual bugfixes for a number of issues with robustness to buggy firmware or client implementations. Another firmware fix addresses suspend to RAM via PSCI firmware. - The final code change is for the old Arm Integrator reference platform that recently started exposing an old NULL pointer dereference bug. - The MAINTAINERS file gets two updates, notably James Tai and Yu-Chun Lin are stepping up as co-maintainers for the Realtek platform. - The remaining patches are all for devicetree files. Two of these are for riscv boards, the rest are all for Renesas Arm platforms, addressing build time checking issues as well as minor configuration problems. * tag 'soc-fixes-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (30 commits) firmware: psci: Set pm_set_resume/suspend_via_firmware() for SYSTEM_SUSPEND ARM: realtek: MAINTAINERS: Include pin controller drivers MAINTAINERS: Add maintainers for ARM/REALTEK ARCHITECTURE ARM: integrator: Fix early initialization firmware: arm_ffa: Fix sched-recv callback partition lookup firmware: arm_ffa: Snapshot notifier callbacks under lock firmware: arm_ffa: Align RxTx buffer size before mapping firmware: arm_ffa: Validate framework notification message layout firmware: arm_ffa: Keep framework RX release under lock firmware: arm_ffa: Bound PARTITION_INFO_GET_REGS copies firmware: arm_ffa: Unregister bus notifier on teardown for FF-A v1.0 firmware: arm_ffa: Fix per-vcpu self notifications handling in workqueue firmware: arm_ffa: Avoid collapsing NPI work from different CPUs firmware: arm_ffa: Skip free_pages on RX buffer alloc failure firmware: arm_ffa: Check for NULL FF-A ID table while driver registration riscv: dts: microchip: fix icicle i2c pinctrl configuration riscv: dts: starfive: jh7110: Drop CAMSS node arm64: dts: renesas: r9a09g056: Add #mux-state-cells to usb20phyrst arm64: dts: renesas:
r9a09g057: Add #mux-state-cells to usb2{0,1}phyrst ARM: dts: renesas: rskrza1: Drop superfluous cells ...
2026-05-21KVM: arm64: Fix CONFIG_PKVM_DISABLE_STAGE2_ON_PANICVincent Donnefort
A typo in the config guard in __hyp_do_panic broke the stage-2 disabling and made backtraces for pKVM quite unreliable. Fix that typo. Fixes: 9019e82c7e46 ("KVM: arm64: Add PKVM_DISABLE_STAGE2_ON_PANIC") Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://patch.msgid.link/20260520220830.273289-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
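How a misspelled config guard fails silently can be sketched in plain C. The misspelling below is hypothetical (the commit message does not say what the actual typo was), and the `#define` only stands in for what Kconfig would generate:

```c
#include <stdbool.h>

/* Stand-in for the Kconfig-generated definition (assumption for this sketch). */
#define CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC 1

/*
 * Hypothetical misspelling: the macro tested here is never defined, so the
 * preprocessor drops the guarded branch without any warning.
 */
bool sketch_stage2_disabled_typo(void)
{
#ifdef CONFIG_PKVM_DISABLE_STAGE2_ON_PANC
    return true;
#else
    return false;
#endif
}

/* Correctly spelled guard: the guarded branch is compiled in. */
bool sketch_stage2_disabled_fixed(void)
{
#ifdef CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC
    return true;
#else
    return false;
#endif
}
```

Both functions build cleanly, which is why this class of bug tends to be noticed only through its runtime effect.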
2026-05-21ring-buffer: Flush and stop persistent ring buffer on panicMasami Hiramatsu (Google)
On real hardware, panic and machine reboot may not flush hardware cache to memory. This means the persistent ring buffer, which relies on a coherent state of memory, may not have its events written to the buffer and they may be lost. Moreover, there may be inconsistency with the counters which are used for validation of the integrity of the persistent ring buffer which may cause all data to be discarded. To avoid this issue, stop recording of the ring buffer on panic and flush the cache of the ring buffer's memory. Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-20KVM: arm64: Fix nVHE/pKVM hyp tracing error on invalid descVincent Donnefort
pKVM must validate the host-provided tracing buffer descriptor. However, if an error is found, the hypervisor would just return 0 to the host. Fix the return value on validation failure. While at it, rename the function to hyp_trace_desc_is_valid() and skip validation for the nVHE mode as we trust host-provided data in that case. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Fixes: 680a04c333fa ("KVM: arm64: Add tracing capability for the nVHE/pKVM hyp") Link: https://lore.kernel.org/r/20260514162624.3477857-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-20KVM: arm64: vgic: Free private_irqs when init fails after allocationMichael Bommarito
Companion to commit 250f25367b58 ("KVM: arm64: Tear down vGIC on failed vCPU creation"), which added the missing kvm_vgic_vcpu_destroy() call to the kvm_share_hyp() failure path in kvm_arch_vcpu_create(). The kvm_vgic_vcpu_init() failure path immediately above it has the same shape and still needs the same cleanup. Call kvm_vgic_vcpu_destroy() when kvm_vgic_vcpu_init() fails so private IRQs allocated before a redistributor iodev registration failure are released before the failed vCPU is freed. Fixes: 03b3d00a70b5 ("KVM: arm64: vgic: Allocate private interrupts on demand") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://lore.kernel.org/r/20260519135042.2219239-1-michael.bommarito@gmail.com Signed-off-by: Marc Zyngier <maz@kernel.org>
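The shape of the fix, calling the destroy helper when init fails after it has already allocated, can be sketched in userspace C. All names are hypothetical stand-ins, and `fail_late` simulates the redistributor iodev registration failure:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-vCPU vGIC state. */
struct sketch_vcpu {
    int *private_irqs;
};

/* Allocates first, then may fail: the situation the commit describes. */
int sketch_vgic_vcpu_init(struct sketch_vcpu *vcpu, int fail_late)
{
    vcpu->private_irqs = calloc(32, sizeof(*vcpu->private_irqs));
    if (!vcpu->private_irqs)
        return -ENOMEM;
    if (fail_late)
        return -EBUSY; /* simulated registration failure after allocation */
    return 0;
}

void sketch_vgic_vcpu_destroy(struct sketch_vcpu *vcpu)
{
    free(vcpu->private_irqs);
    vcpu->private_irqs = NULL;
}

int sketch_vcpu_create(struct sketch_vcpu *vcpu, int fail_late)
{
    int err = sketch_vgic_vcpu_init(vcpu, fail_late);

    if (err) {
        /* Without this call the allocation above would leak. */
        sketch_vgic_vcpu_destroy(vcpu);
        return err;
    }
    return 0;
}
```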
2026-05-20KVM: arm64: vgic-its: Reject restored DTE with out-of-range num_eventid_bitsMichael Bommarito
Userspace can restore an ITS Device Table Entry whose Size field encodes more EventID bits than the virtual ITS supports. The live MAPD path rejects that state, but vgic_its_restore_dte() accepts it and stores the out-of-range value in dev->num_eventid_bits. Reject restored DTEs with num_eventid_bits > VITS_TYPER_IDBITS before allocating the device. This mirrors the MAPD check and prevents the restored state from reaching vgic_its_restore_itt(), where the unchecked value can be converted into an oversized scan_its_table() range. Fixes: 57a9a117154c ("KVM: arm64: vgic-its: Device table save/restore") Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://lore.kernel.org/r/20260519132519.2142458-1-michael.bommarito@gmail.com Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
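A minimal sketch of the added bound check follows. The field layout (Size in the low five bits, encoded as bits minus one) and the limit of 16 are assumptions made for illustration, and all names are hypothetical:

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_DTE_SIZE_MASK 0x1fULL /* assumed: low 5 bits hold (bits - 1) */
#define SKETCH_TYPER_IDBITS  16      /* assumed limit of the virtual ITS */

/*
 * Decode the EventID width from a restored entry and reject it before any
 * device state is allocated or stored.
 */
int sketch_restore_dte(uint64_t dte, uint8_t *num_eventid_bits)
{
    uint8_t bits = (uint8_t)((dte & SKETCH_DTE_SIZE_MASK) + 1);

    if (bits > SKETCH_TYPER_IDBITS)
        return -EINVAL;

    *num_eventid_bits = bits;
    return 0;
}
```

Rejecting early keeps the out-of-range value from ever reaching the later table-scan step, which is the point the commit message makes.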
2026-05-19Merge tag 'mm-hotfixes-stable-2026-05-18-21-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "14 hotfixes. 9 are for MM. 10 are cc:stable and the remainder are for post-7.1 issues or aren't deemed suitable for backporting. There's a two-patch MAINTAINERS series from Mike Rapoport which updates us for the new KEXEC/KDUMP/crash/LUO/etc arrangements. And another two-patch series from Muchun Song to fix a couple of memory-hotplug issues. Otherwise singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-05-18-21-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/memory: fix spurious warning when unmapping device-private/exclusive pages mm: fix __vm_normal_page() to handle missing support for pmd_special()/pud_special() drivers/base/memory: fix memory block reference leak in poison accounting mm/memory_hotplug: fix memory block reference leak on remove lib: kunit_iov_iter: fix test fail on powerpc mm/page_alloc: fix initialization of tags of the huge zero folio with init_on_free MAINTAINERS: add kexec@ list to LIVE UPDATE ENTRY MAINTAINERS: add tree for KDUMP and KEXEC selftests/mm: run_vmtests.sh: fix destructive tests invocation scripts/gdb: slab: update field names of struct kmem_cache scripts/gdb: mm: cast untyped symbols in x86_page_ops mm/damon: fix damos_stat tracepoint format for sz_applied mm/damon/sysfs-schemes: call missing mem_cgroup_iter_break() mm/migrate_device: fix spinlock leak in migrate_vma_insert_huge_pmd_page
2026-05-19arm64: probes: Handle probes on hinted conditional branch instructionsVladimir Murzin
BC.cond instructions introduced by FEAT_HBC cannot be executed out-of-line, like other branch instructions. However, they can be simulated in the same way as B.cond instructions. Extend the B.cond decoder mask to match BC.cond instructions as well, and handle them using the existing B.cond simulation path. Fixes: 7f86d128e437 ("arm64: add HWCAP for FEAT_HBC (hinted conditional branches)") Cc: <stable@vger.kernel.org> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
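The mask change can be illustrated with a small decoder sketch. It assumes the A64 encodings in which B.cond and BC.cond share the top byte 0x54 and differ only in bit 4; the mask constants and function names are illustrative, not copied from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BCOND_VALUE    0x54000000u
#define SKETCH_BCOND_MASK_OLD 0xff000010u /* requires bit 4 == 0: B.cond only */
#define SKETCH_BCOND_MASK_NEW 0xff000000u /* ignores bit 4: B.cond and BC.cond */

bool sketch_insn_matches(uint32_t insn, uint32_t mask)
{
    return (insn & mask) == SKETCH_BCOND_VALUE;
}

/* Branch offset in bytes: sign-extended imm19 (bits 23:5) times 4. */
int64_t sketch_bcond_offset(uint32_t insn)
{
    uint32_t imm19 = (insn >> 5) & 0x7ffffu;
    int64_t off = imm19;

    if (imm19 & 0x40000u) /* sign bit of the 19-bit field */
        off -= 0x80000;
    return off * 4;
}
```

Because both forms share the offset and condition fields, a single simulation routine can serve either once the decoder accepts both.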
2026-05-18arm64: defconfig: Enable PCI M.2 power sequencing driverManivannan Sadhasivam
POWER_SEQUENCING_PCIE_M2 driver handles power supply to the PCIe M.2 connectors and is required on a wide variety of ARM64 platforms such as Qcom Snapdragon X Elite laptops and Mediatek Dojo Chromebooks. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260514065017.11305-1-manivannan.sadhasivam@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-17Merge tag 'sched-urgent-2026-05-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: - Fix ARM64-specific rseq regressions (Mark Rutland) * tag 'sched-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64/entry: Fix arm64-specific rseq brokenness
2026-05-13mm/page_alloc: fix initialization of tags of the huge zero folio with ↵David Hildenbrand (Arm)
init_on_free __GFP_ZEROTAGS semantics are currently a bit weird, but effectively this flag is only ever set alongside __GFP_ZERO and __GFP_SKIP_KASAN. If we run with init_on_free, we will zero out pages during __free_pages_prepare(), to skip zeroing on the allocation path. However, when allocating with __GFP_ZEROTAGS set, post_alloc_hook() will consequently not only skip clearing page content, but also skip clearing tag memory. Not clearing tags through __GFP_ZEROTAGS is irrelevant for most pages that will get mapped to user space through set_pte_at() later: set_pte_at() and friends will detect that the tags have not been initialized yet (PG_mte_tagged not set), and initialize them. However, for the huge zero folio, which will be mapped through a PMD marked as special, this initialization will not be performed, ending up exposing whatever tags were still set for the pages. The docs (Documentation/arch/arm64/memory-tagging-extension.rst) state that allocation tags are set to 0 when a page is first mapped to user space. That no longer holds with the huge zero folio when init_on_free is enabled. Fix it by decoupling __GFP_ZEROTAGS from __GFP_ZERO, passing to tag_clear_highpages() whether we want to also clear page content. Invert the meaning of the tag_clear_highpages() return value to have clearer semantics. Reproduced with the huge zero folio by modifying the check_buffer_fill arm64/mte selftest to use a 2 MiB area, after making sure that pages have a non-0 tag set when freeing (note that, during boot, we will not actually initialize tags, but only set KASAN_TAG_KERNEL in the page flags). $ ./check_buffer_fill 1..20 ... not ok 17 Check initial tags with private mapping, sync error mode and mmap memory not ok 18 Check initial tags with private mapping, sync error mode and mmap/mprotect memory ... This code needs more cleanups; we'll tackle that next, like decoupling __GFP_ZEROTAGS from __GFP_SKIP_KASAN. 
[akpm@linux-foundation.org: s/__GPF_ZERO/__GFP_ZERO/, per David] Link: https://lore.kernel.org/20260421-zerotags-v2-1-05cb1035482e@kernel.org Fixes: adfb6609c680 ("mm/huge_memory: initialise the tags of the huge zero folio") Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Lance Yang <lance.yang@linux.dev> Cc: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "arm64: - Add the pKVM side of the workaround for ARM's erratum 4193714, provided that the EL3 firmware does its part of the job. KVM will refuse to initialise otherwise - Correctly handle 52bit VAs for guest EL2 stage-1 translations when running under NV with E2H==0 - Correctly deal with permission faults in guest_memfd memslots - Fix the steal-time selftest after the infrastructure was reworked - Make sure the host cannot pass a nonsensical clock update to the EL2 tracing infrastructure - Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390 ability to run arm64 guests, which will inevitably lead to arm64 code being directly used on s390 - Make sure that EL2 is configured with both exception entry and exit being Context Synchronization Events - Handle the current vcpu being NULL on EL2 panic - Fix the selftest_vcpu memcache being empty at the point of donation or sharing - Check that the memcache has enough capacity before engaging on the share/donate path - Fix __deactivate_fgt() to use its parameter rather than a variable in the macro context s390: - Fix array overrun with large numbers of PCI devices x86: - Never use L0's PAUSE loop exiting while L2 is running, since it's unlikely that a nested guest will help solve the hypervisor's spinlock contention - Fix emulation of MOVNTDQA - Fix typo in Xen hypercall tracepoint - Add back an optimization that was left behind when recently fixing a bug - Add module parameter to disable CET, whose implementation seems to have issues. 
For now it remains enabled by default Generic: - Reject offset causing an unsigned overflow in kvm_reset_dirty_gfn() Documentation: - Update stale links Selftests: - Fix guest_memfd_test with host page size > guest page size" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) KVM: VMX: introduce module parameter to disable CET KVM: x86: Swap the dst and src operand for MOVNTDQA KVM: x86: use again the flush argument of __link_shadow_page() KVM: selftests: Ensure gmem file sizes are multiple of host page size Documentation: kvm: update links in the references section of AMD Memory Encryption KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running KVM: x86: Fix Xen hypercall tracepoint argument assignment KVM: Reject wrapped offset in kvm_reset_dirty_gfn() KVM: arm64: Pre-check vcpu memcache for host->guest donate KVM: arm64: Pre-check vcpu memcache for host->guest share KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache KVM: arm64: Fix __deactivate_fgt macro parameter typo KVM: arm64: Guard against NULL vcpu on VHE hyp panic path KVM: arm64: Make EL2 exception entry and exit context-synchronization events MAINTAINERS: Add Steffen as reviewer for KVM/arm64 KVM: arm64: Remove potential UB on nvhe tracing clock update KVM: selftests: arm64: Fix steal_time test after UAPI refactoring KVM: arm64: Handle permission faults with guest_memfd KVM: arm64: nv: Consider the DS bit when translating TCR_EL2 KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests ...
2026-05-12arm64: dts: qcom: x1-dell-thena: remove i2c20 (battery SMBus) and reserve ↵Val Packett
its pins i2c20 is used by the battmgr service on the ADSP to communicate with the SBS interface of the battery. Initializing it from Linux would break the battmgr functionality when booted in EL2. Mark those pins as reserved. Fixes: e7733b42111c ("arm64: dts: qcom: Add support for Dell Inspiron 7441 / Latitude 7455") Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Abel Vesa <abel.vesa@oss.qualcomm.com> Signed-off-by: Val Packett <val@packett.cool> Link: https://lore.kernel.org/r/20260312005731.12488-2-val@packett.cool Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-11arm64: dts: qcom: glymur: Drop RPMh CXO clocks from QMP PHYsAbel Vesa
On Glymur, all QMP PHYs except the one used by USB SS0 take their reference clock from the TCSR clock controller. Since these TCSR clocks already derive from RPMH_CXO_CLK as their sole parent, there is no need to provide an extra `clkref` clock to the PHY nodes. Drop the extra RPMh CXO clock inputs and use the TCSR clocks as the PHY reference clocks instead. This also fixes the devicetree schema validation, as the bindings do not allow a separate `clkref` clock. Fixes: 4eee57dd4df9 ("arm64: dts: qcom: glymur: Add USB related nodes") Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Reported-by: Rob Herring <robh@kernel.org> Closes: https://lore.kernel.org/r/20260410145205.GA554754-robh@kernel.org/ Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260414-dts-glymur-drop-rpmh-cxo-clk-from-qmpphys-v1-1-ab12d77c4aec@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-08Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: - ptrace(PTRACE_SETREGSET) fix to zero the target's fpsimd_state rather than the tracer's * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's
2026-05-08arm64/entry: Fix arm64-specific rseq brokennessMark Rutland
Mathias Stearn reports that since v6.19, there are two big issues affecting rseq: (1) On arm64 specifically, rseq critical sections aren't aborted when they should be. (2) The 'cpu_id_start' field is no longer written by the kernel in all cases it used to be, including some cases where TCMalloc depends on the kernel clobbering the field. This patch fixes issue #1. This patch DOES NOT fix issue #2, which will need to be addressed by other patches. The arm64-specific brokenness is a result of commits: 2fc0e4b4126c ("rseq: Record interrupt from user space") 39a167560a61 ("rseq: Optimize event setting") The first commit failed to add a call to rseq_note_user_irq_entry() on arm64. Thus arm64 never sets rseq_event::user_irq to record that it may be necessary to abort an active rseq critical section upon return to userspace. On its own, this commit had no functional impact as the value of rseq_event::user_irq was not consumed. The second commit relied upon rseq_event::user_irq to determine whether or not to bother to perform rseq work when returning to userspace. As rseq_event::user_irq wasn't set on arm64, this work would be skipped, and consequently an active rseq critical section would not be aborted. Fix this by giving arm64 syscall-specific entry/exit paths, and performing the relevant logic in syscall and non-syscall paths, including calling rseq_note_user_irq_entry() for non-syscall entry. Currently arm64 cannot use syscall_enter_from_user_mode(), syscall_exit_to_user_mode(), and irqentry_exit_to_user_mode(), due to ordering constraints with exception masking, and risk of ABI breakage for syscall tracing/audit/etc. For the moment the entry/exit logic is left as arm64-specific, directly using enter_from_user_mode() and exit_to_user_mode(), but mirroring the generic code. I intend to follow up with refactoring/cleanup, as we did for kernel mode entry paths in commit: 041aa7a85390 ("entry: Split preemption from irqentry_exit_to_kernel_mode()") ... 
which will allow arm64 to use the GENERIC_IRQ_ENTRY functions directly. Fixes: 39a167560a61 ("rseq: Optimize event setting") Reported-by: Mathias Stearn <mathias@mongodb.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/regressions/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com/ Link: https://patch.msgid.link/20260508142023.3268622-1-mark.rutland@arm.com
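The interaction between the two commits can be modelled with a toy state machine. This is only an illustrative model with hypothetical names, not kernel code: `note_user_irq` stands for whether the entry path calls the equivalent of rseq_note_user_irq_entry().

```c
#include <stdbool.h>

/* Toy model of the per-task rseq bookkeeping (hypothetical layout). */
struct sketch_task {
    bool user_irq;            /* models rseq_event::user_irq */
    bool in_critical_section; /* an rseq critical section is active */
    bool aborted;             /* the critical section was aborted */
};

/* Non-syscall entry from user space. */
void sketch_irq_entry(struct sketch_task *t, bool note_user_irq)
{
    if (note_user_irq)
        t->user_irq = true;
}

/* Return to user space: rseq work is skipped unless user_irq was recorded. */
void sketch_exit_to_user(struct sketch_task *t)
{
    if (!t->user_irq)
        return;
    if (t->in_critical_section)
        t->aborted = true;
    t->user_irq = false;
}
```

With `note_user_irq` false, the exit path returns early and the active critical section survives, which mirrors the arm64 regression described above.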
2026-05-07KVM: arm64: Pre-check vcpu memcache for host->guest donateFuad Tabba
__pkvm_host_donate_guest() flips the host stage-2 PTE for the donated page to a non-valid annotation via host_stage2_set_owner_metadata_locked() and then calls kvm_pgtable_stage2_map() to install the matching guest stage-2 mapping. The map's return value is wrapped in WARN_ON() and otherwise discarded, asserting that the call cannot fail. WARN_ON() at nVHE EL2 panics, so this assertion is only correct if the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail with -ENOMEM even at PAGE_SIZE granularity: the donate path verifies PKVM_NOPAGE for the guest IPA before the map, so the walker must allocate fresh page-table pages from the vcpu memcache, and the host controls the vcpu memcache via the topup interface. An under-provisioned donation request would otherwise turn a recoverable -ENOMEM into a fatal hyp panic. Bound the worst-case walker allocation alongside the existing __host_check_page_state_range() / __guest_check_page_state_range() pre-checks, using the helper introduced for host->guest share. If the vcpu memcache holds fewer pages than kvm_mmu_cache_min_pages(), return -ENOMEM before any state mutation. Fixes: 1e579adca177 ("KVM: arm64: Introduce __pkvm_host_donate_guest()") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-07KVM: arm64: Pre-check vcpu memcache for host->guest shareFuad Tabba
__pkvm_host_share_guest() ends with kvm_pgtable_stage2_map() to install the guest stage-2 mapping, after a forward pass that mutates the host vmemmap (sets PKVM_PAGE_SHARED_OWNED and increments host_share_guest_count) for every page in the range. The map's return value is wrapped in WARN_ON() and otherwise discarded, asserting that the call cannot fail. WARN_ON() at nVHE EL2 panics, so this assertion is only correct if the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail with -ENOMEM when the stage-2 walker exhausts the caller's memcache, and the host controls the vcpu memcache via the topup interface, so an under-provisioned share request would otherwise turn a recoverable -ENOMEM into a fatal hyp panic. Bound the worst-case walker allocation in the existing pre-check pass so that kvm_pgtable_stage2_map() cannot fail at the call site, using kvm_mmu_cache_min_pages() -- the same bound host EL1 uses for its own stage-2 maps. If the vcpu memcache holds fewer pages, return -ENOMEM before any state mutation. Fixes: d0bd3e6570ae ("KVM: arm64: Introduce __pkvm_host_share_guest()") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
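The ordering this commit enforces, checking capacity before mutating any state, can be sketched as follows. The types and names are hypothetical, and `min_pages` stands in for the bound that kvm_mmu_cache_min_pages() provides in the kernel:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the vcpu memcache and per-page ownership state. */
struct sketch_memcache {
    size_t nr_pages;
};

struct sketch_page_state {
    int shared;
    unsigned int share_count;
};

int sketch_share_guest(struct sketch_memcache *mc, size_t min_pages,
                       struct sketch_page_state *pages, size_t nr)
{
    size_t i;

    /* Pre-check: fail while nothing has been modified yet. */
    if (mc->nr_pages < min_pages)
        return -ENOMEM;

    /* Forward pass that mutates per-page state. */
    for (i = 0; i < nr; i++) {
        pages[i].shared = 1;
        pages[i].share_count++;
    }

    /* Model the map step consuming the worst case; it cannot run out here. */
    mc->nr_pages -= min_pages;
    return 0;
}
```

On failure the caller sees -ENOMEM with all page state untouched, so the error stays recoverable.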
2026-05-07KVM: arm64: Seed pkvm_ownership_selftest vcpu memcacheFuad Tabba
The hypercall handlers call pkvm_refill_memcache() to top up the hyp_vcpu memcache before invoking __pkvm_host_{share,donate}_guest(). pkvm_ownership_selftest invokes those functions directly with a static selftest_vcpu that has an empty memcache. Seed selftest_vcpu's memcache from the prepopulated selftest pages, leaving the remainder for selftest_vm.pool. Required by the memcache-sufficiency pre-check added in the following patches. Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-07KVM: arm64: Fix __deactivate_fgt macro parameter typoFuad Tabba
__deactivate_fgt() declares its first parameter as "htcxt" but the body references "hctxt". The parameter is unused; the macro silently captures "hctxt" from the enclosing scope. Both existing callers (__deactivate_traps_hfgxtr() and __deactivate_traps_ich_hfgxtr()) happen to define a local "struct kvm_cpu_context *hctxt", so the macro works by coincidence. A future caller without an "hctxt" local in scope, or naming it differently, would compile but bind to the wrong context. Align the parameter name with the sibling __activate_fgt() macro. The "vcpu" parameter remains unused in the body, kept for API symmetry with __activate_fgt() (which uses it). Fixes: f5a5a406b4b8 ("KVM: arm64: Propagate and handle Fine-Grained UNDEF bits") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
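The capture-by-coincidence behaviour is easy to reproduce outside the kernel. In this sketch the macros and struct are hypothetical; only the two spellings, "htcxt" and "hctxt", come from the commit message:

```c
struct sketch_ctxt {
    int id;
};

/* Buggy: the parameter is spelled "htcxt" but the body names "hctxt". */
#define SKETCH_CTXT_ID_BUGGY(htcxt) ((hctxt)->id)

/* Fixed: the parameter and the body agree. */
#define SKETCH_CTXT_ID_FIXED(hctxt) ((hctxt)->id)

int sketch_read_buggy(struct sketch_ctxt *hctxt, struct sketch_ctxt *other)
{
    (void)other; /* the macro argument is silently dropped */
    return SKETCH_CTXT_ID_BUGGY(other); /* actually reads hctxt->id */
}

int sketch_read_fixed(struct sketch_ctxt *hctxt, struct sketch_ctxt *other)
{
    (void)hctxt;
    return SKETCH_CTXT_ID_FIXED(other); /* reads other->id as written */
}
```

The buggy variant compiles only because a variable named `hctxt` happens to be in scope at the call site.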
2026-05-07KVM: arm64: Guard against NULL vcpu on VHE hyp panic pathFuad Tabba
On VHE, __hyp_call_panic() unconditionally calls __deactivate_traps(vcpu) on the vcpu pointer read from host_ctxt->__hyp_running_vcpu. That pointer is cleared after every guest exit (and is never set when no guest is running), so an unexpected EL2 exception landing in _guest_exit_panic, e.g. via the el2t*_invalid / el2h_irq_invalid vectors, reaches this function with vcpu == NULL. __deactivate_traps() then dereferences vcpu via ___deactivate_traps() -> vserror_state_is_nested() -> vcpu_has_nv() -> vcpu->arch.features, faulting inside the panic handler and obscuring the original failure. The nVHE counterpart (hyp_panic() in arch/arm64/kvm/hyp/nvhe/switch.c) already guards its vcpu-using cleanup with "if (vcpu)"; mirror that here. sysreg_restore_host_state_vhe() does not depend on vcpu and continues to run unconditionally, preserving panic forensics. The trailing panic("...VCPU:%p", vcpu) prints "(null)" safely via printk's %p handling. Fixes: 6a0259ed29bb ("KVM: arm64: Remove hyp_panic arguments") Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
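The guard has a simple shape, shown here as a userspace sketch with hypothetical types. The vcpu-dependent cleanup is skipped when no guest was running, while the host-state restore runs either way:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the vcpu and host context. */
struct sketch_vcpu {
    int traps_active;
};

struct sketch_host_ctxt {
    struct sketch_vcpu *running_vcpu; /* NULL when no guest is running */
    int host_state_restored;
};

void sketch_hyp_call_panic(struct sketch_host_ctxt *host)
{
    struct sketch_vcpu *vcpu = host->running_vcpu;

    /* Only touch vcpu state if there is a vcpu. */
    if (vcpu)
        vcpu->traps_active = 0;

    /* Does not depend on vcpu, so it always runs. */
    host->host_state_restored = 1;
}
```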
2026-05-07KVM: arm64: Make EL2 exception entry and exit context-synchronization eventsFuad Tabba
SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI 0487 M.b D24.2.175 (p. D24-9754): - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally a CSE. - FEAT_ExS: the reset value is architecturally UNKNOWN; software must set the bit to make the entry/exit a CSE. INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after MSRs to context-switching system registers (HCR_EL2, ZCR_EL2, ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not architecturally backed unless EOS=1 (and, for entry, EIS=1). Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON was setting both unconditionally. The conversion made SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only models unconditionally-RES1 fields and EIS/EOS are RES1 only when FEAT_ExS is absent, the auto-generated mask is UL(0). The seven other bits dropped from the old mask (positions 4, 5, 16, 18, 23, 28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS are the only bits whose semantics changed for FEAT_ExS hardware and where the kernel relies on the value being 1. Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are unconditionally CSEs regardless of whether FEAT_ExS is implemented. This matches the pairing in arch/arm64/kvm/config.c which treats EIS and EOS together as RES1 under !FEAT_ExS. 
Fixes: 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure") Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com> Assisted-by: Gemini:gemini-3.1-pro review-prompts Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://patch.msgid.link/20260501112149.2824881-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
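As a small illustration of the fix, the two bits can be ORed into an initial value explicitly. The bit positions (EOS at 11, EIS at 22) are taken from the commit message; the macro and function names are hypothetical, and `base` stands for whatever other bits the init value carries:

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BIT(n)    (UINT64_C(1) << (n))
#define SKETCH_SCTLR_EOS SKETCH_BIT(11) /* exception exit is a CSE */
#define SKETCH_SCTLR_EIS SKETCH_BIT(22) /* exception entry is a CSE */

/* Set both bits explicitly instead of relying on an auto-generated RES1 mask. */
uint64_t sketch_init_sctlr_el2(uint64_t base)
{
    return base | SKETCH_SCTLR_EIS | SKETCH_SCTLR_EOS;
}

/* True only if both entry and exit are context-synchronization events. */
bool sketch_entry_exit_are_cse(uint64_t sctlr)
{
    uint64_t both = SKETCH_SCTLR_EIS | SKETCH_SCTLR_EOS;

    return (sctlr & both) == both;
}
```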