path: root/arch
Age | Commit message | Author
2026-05-21 | Merge tag 'soc-fixes-7.1' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: - The ff-a firmware driver gets 11 individual bugfixes for a number of issues with robustness to buggy firmware or client implementations. Another firmware fix addresses suspend to RAM via PSCI firmware. - The final code change is for the old Arm Integrator reference platform that recently started exposing an old NULL pointer dereference bug. - The MAINTAINERS file gets two updates, notably James Tai and Yu-Chun Lin are stepping up as co-maintainers for the Realtek platform. - The remaining patches are all for devicetree files. Two of these are for riscv boards, the rest are all for Renesas Arm platforms, addressing build time checking issues as well as minor configuration problems. * tag 'soc-fixes-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (30 commits) firmware: psci: Set pm_set_resume/suspend_via_firmware() for SYSTEM_SUSPEND ARM: realtek: MAINTAINERS: Include pin controller drivers MAINTAINERS: Add maintainers for ARM/REALTEK ARCHITECTURE ARM: integrator: Fix early initialization firmware: arm_ffa: Fix sched-recv callback partition lookup firmware: arm_ffa: Snapshot notifier callbacks under lock firmware: arm_ffa: Align RxTx buffer size before mapping firmware: arm_ffa: Validate framework notification message layout firmware: arm_ffa: Keep framework RX release under lock firmware: arm_ffa: Bound PARTITION_INFO_GET_REGS copies firmware: arm_ffa: Unregister bus notifier on teardown for FF-A v1.0 firmware: arm_ffa: Fix per-vcpu self notifications handling in workqueue firmware: arm_ffa: Avoid collapsing NPI work from different CPUs firmware: arm_ffa: Skip free_pages on RX buffer alloc failure firmware: arm_ffa: Check for NULL FF-A ID table while driver registration riscv: dts: microchip: fix icicle i2c pinctrl configuration riscv: dts: starfive: jh7110: Drop CAMSS node arm64: dts: renesas: r9a09g056: Add #mux-state-cells to usb20phyrst arm64: dts: renesas:
r9a09g057: Add #mux-state-cells to usb2{0,1}phyrst ARM: dts: renesas: rskrza1: Drop superfluous cells ...
2026-05-21 | Revert "drivers: net: 3com: 3c509: Remove this driver" | Maciej W. Rozycki
This reverts commit 91f3a27ae9f66d81a5906461762c37c8a2bcab06. Contrary to the assumption stated with the original commit description this driver is in use and I'm going to maintain it for the foreseeable future. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Link: https://patch.msgid.link/alpine.DEB.2.21.2605201204260.1450@angie.orcam.me.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21 | KVM: arm64: Fix CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC | Vincent Donnefort
A typo in the config guard in __hyp_do_panic broke the stage-2 disabling and made backtraces for pKVM quite unreliable. Fix that typo. Fixes: 9019e82c7e46 ("KVM: arm64: Add PKVM_DISABLE_STAGE2_ON_PANIC") Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Link: https://patch.msgid.link/20260520220830.273289-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-21 | LoongArch: Remove unused code to avoid build warning | Huacai Chen
After commit feee6b2989165631b1 ("mm/memory_hotplug: shrink zones when offlining memory"), __remove_pages() doesn't need the "zone" parameter so the "page" variable is also unused. Remove the unused code to avoid this build warning: arch/loongarch/mm/init.c: In function 'arch_remove_memory': arch/loongarch/mm/init.c:134:22: warning: variable 'page' set but not used [-Wunused-but-set-variable=] 134 | struct page *page = pfn_to_page(start_pfn); Cc: <stable@vger.kernel.org> Reviewed-by: Guo Ren <guoren@kernel.org> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-21 | LoongArch: Avoid initrd overlap during kernel relocation | WANG Rui
Validate the relocation address against the initrd region specified via "initrd=" or "initrdmem=" on the command line. Reject relocation targets that overlap the initrd to prevent memory corruption during early boot. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
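The overlap test described above can be sketched as a half-open interval check. This is only an illustration: `ranges_overlap()` and `reloc_target_ok()` are hypothetical names, not functions from the patch, and the sketch assumes that neither range wraps around the end of the address space.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if [a_start, a_start + a_size) and
 * [b_start, b_start + b_size) share at least one byte.
 * Assumes neither range wraps around the address space.
 */
static bool ranges_overlap(uint64_t a_start, uint64_t a_size,
			   uint64_t b_start, uint64_t b_size)
{
	if (a_size == 0 || b_size == 0)
		return false;
	return a_start < b_start + b_size && b_start < a_start + a_size;
}

/*
 * Hypothetical check: accept a relocation target only if a kernel
 * image of kernel_size bytes placed there would not touch the initrd.
 */
static bool reloc_target_ok(uint64_t target, uint64_t kernel_size,
			    uint64_t initrd_start, uint64_t initrd_size)
{
	return !ranges_overlap(target, kernel_size, initrd_start, initrd_size);
}
```

A target that ends exactly where the initrd begins is accepted, because both ranges are treated as half-open.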
2026-05-21 | LoongArch: Skip relocation-time KASLR if already applied | WANG Rui
When the kernel is relocated during early boot (efistub or kexec_file), a randomized load address may have already been selected and applied. In this case, performing KASLR again in relocate.c is unnecessary. Note: strictly-defined KASLR means the kernel's final runtime address has a random offset from the kernel's load address, which is implemented in relocate.c; broadly-defined KASLR means the kernel's final runtime address has a random offset from the kernel's link address (a.k.a. VMLINUX_LOAD_ADDRESS), which also includes the efistub implementation, kexec_file implementation and QEMU direct kernel boot. kaslr_disabled() returning true only means strictly-defined KASLR is disabled. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2026-05-21 | efi/loongarch: Randomize kernel preferred address for KASLR | WANG Rui
Introduce efi_get_kimg_kaslr_address() helper to compute the preferred kernel image load address dynamically when CONFIG_RANDOMIZE_BASE is enabled. The function derives a random offset by using the EFI-provided randomness combined with the timer tick value, and constrains it within CONFIG_RANDOMIZE_BASE_MAX_OFFSET. Update EFI_KIMG_PREFERRED_ADDRESS to call this helper so that the EFI stub can select a randomized load address when KASLR is active, while preserving the original base address behavior when KASLR is disabled or "nokaslr" is specified. Note: LoongArch can't KASLR for hibernation, so set efi_nokaslr to true if "resume=<devname>" is explicitly specified in cmdline. Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: WANG Rui <wangrui@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
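One way to picture "a random offset constrained within a maximum" is the sketch below. It is not the code from the patch: `kaslr_pick_offset()` is a hypothetical name, and the XOR mixing, the modulo reduction, and the alignment parameter are all assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical sketch: derive an aligned offset in [0, max_offset)
 * from two entropy inputs. The XOR mixing and modulo reduction are
 * assumptions, not the method used by the actual helper.
 */
static uint64_t kaslr_pick_offset(uint64_t efi_random, uint64_t timer_tick,
				  uint64_t max_offset, uint64_t align)
{
	uint64_t seed = efi_random ^ timer_tick;
	uint64_t slots;

	/* No room for even one aligned slot: fall back to offset 0. */
	if (align == 0 || max_offset < align)
		return 0;

	slots = max_offset / align;
	return (seed % slots) * align;
}
```

Returning 0 when no slot fits corresponds to keeping the original base address, which is also the behavior the message describes for the disabled and "nokaslr" cases.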
2026-05-21 | ring-buffer: Flush and stop persistent ring buffer on panic | Masami Hiramatsu (Google)
On real hardware, panic and machine reboot may not flush hardware cache to memory. This means the persistent ring buffer, which relies on a coherent state of memory, may not have its events written to the buffer and they may be lost. Moreover, there may be inconsistency with the counters which are used for validation of the integrity of the persistent ring buffer which may cause all data to be discarded. To avoid this issue, stop recording of the ring buffer on panic and flush the cache of the ring buffer's memory. Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-20 | x86/mm: Disable broadcast TLB flush when PCID is disabled | Tom Lendacky
Booting with "nopcid" clears X86_FEATURE_PCID and keeps CR4.PCIDE from being set to one. On AMD CPUs that support INVLPGB, broadcast TLB flushing remains enabled. There are two checks that decide whether the global ASID code runs, mm_global_asid() and consider_global_asid(), that key off of the X86_FEATURE_INVLPGB feature. Once an mm becomes active on more than three CPUs, consider_global_asid() assigns it a global ASID, after which flush_tlb_mm_range() takes the broadcast_tlb_flush() path using a non-zero PCID. Issuing an INVLPGB with a non-zero PCID while CR4.PCIDE is not set results in a #GP: Oops: general protection fault, kernel NULL pointer dereference 0x1: 0000 [#1] SMP NOPTI CPU: 158 UID: 0 PID: 3119 Comm: snap Not tainted 7.1.0-rc3 #1 PREEMPT(full) Hardware name: ... RIP: 0010:broadcast_tlb_flush Code: ... 89 da 48 83 c8 07 <0f> 01 fe eb 08 cc cc cc ... Call Trace: <TASK> flush_tlb_mm_range ptep_clear_flush wp_page_copy ? _raw_spin_unlock __handle_mm_fault handle_mm_fault do_user_addr_fault exc_page_fault asm_exc_page_fault All processors that support broadcast TLB invalidation also have PCID support, so it is only the "nopcid" scenario that is of concern. In this situation just disable the broadcast TLB support using the CPUID dependency support by making X86_FEATURE_INVLPGB dependent on X86_FEATURE_PCID. [ bp: Massage commit message. ] Fixes: 4afeb0ed1753 ("x86/mm: Enable broadcast TLB invalidation for multi-threaded processes") Suggested-by: Dave Hansen <dave.hansen@intel.com> Assisted-by: Claude:claude-opus-4.7 Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Rik van Riel <riel@surriel.com> Cc: <stable@kernel.org> Link: https://patch.msgid.link/b915acfd63e8b2a094fdeb8dc608738072518764.1779296450.git.thomas.lendacky@amd.com
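The effect of making one feature depend on another can be modeled with a small dependency table. The sketch below is a simplified user-space model, not the kernel's CPUID dependency code: the `FEAT_*` names, `feature_deps[]`, and `clear_feature()` are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for X86_FEATURE_PCID and X86_FEATURE_INVLPGB. */
enum feature { FEAT_PCID, FEAT_INVLPGB, FEAT_COUNT };

struct feature_dep {
	enum feature feature;	/* this feature ...          */
	enum feature depends;	/* ... requires this feature */
};

/* The dependency the commit message describes: INVLPGB needs PCID. */
static const struct feature_dep feature_deps[] = {
	{ FEAT_INVLPGB, FEAT_PCID },
};

/* Clear a feature and, recursively, every feature that depends on it. */
static void clear_feature(bool caps[FEAT_COUNT], enum feature f)
{
	size_t i;

	caps[f] = false;
	for (i = 0; i < sizeof(feature_deps) / sizeof(feature_deps[0]); i++) {
		if (feature_deps[i].depends == f && caps[feature_deps[i].feature])
			clear_feature(caps, feature_deps[i].feature);
	}
}

/* Model of booting with "nopcid" on a CPU that has both features. */
static bool invlpgb_usable_after_nopcid(void)
{
	bool caps[FEAT_COUNT] = { true, true };

	clear_feature(caps, FEAT_PCID);
	return caps[FEAT_INVLPGB];
}
```

With the dependency in place, code that keys off the INVLPGB feature sees it as absent once PCID is cleared, so the broadcast path is never taken.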
2026-05-20 | s390/topology: Use zero-based numbering for containing entities | Alexandra Winter
Start the numbering scheme for higher-level topology structures (like socket, book, drawer) at zero, matching the convention for other hardware identifiers like e.g. CPU numbers. Hardware documentation, the Hardware Management Console and other tools like zmemtopo also use zero-based numbering for these containing entities. Aligning the numbering in sysfs, procfs, and tools like lscpu improves user experience by making it easier to correlate topology information across different interfaces. If available, Linux on s390 derives this physical topology information from the stsi function code 15 store_topology instruction, which is defined to start at 1 for the lowest numbered container id. Subtract one, so drawer_id, book_id and socket_id in cpu_topology[] start with 0 for the lowest numbered entity; and /proc/cpuinfo and tools like 'lscpu -ye' display the expected values. Display only, no functional change intended. Example: In a partition with 3 cores in a system with 8 cores per socket; 2 sockets per book; 4 books per drawer; and 4 drawers:

Before this fix: $ lscpu -ye
  CPU NODE DRAWER BOOK SOCKET CORE L1d:L1i:L2 ONLINE CONFIGURED POLARIZATION ADDRESS
    0    0      2    4      1    0 0:0:0      yes    yes        vert-high          0
    1    0      2    4      1    0 1:1:1      yes    yes        vert-high          1
    2    0      2    4      1    1 2:2:2      yes    yes        vert-medium        2
    3    0      2    4      1    1 3:3:3      yes    yes        vert-medium        3
    4    0      2    4      2    3 4:4:4      yes    yes        vert-low           4
    5    0      2    4      2    3 5:5:5      yes    yes        vert-low           5

After this fix: $ lscpu -ye
  CPU NODE DRAWER BOOK SOCKET CORE L1d:L1i:L2 ONLINE CONFIGURED POLARIZATION ADDRESS
    0    0      1    3      0    0 0:0:0      yes    yes        vert-high          0
    1    0      1    3      0    0 1:1:1      yes    yes        vert-high          1
    2    0      1    3      0    1 2:2:2      yes    yes        vert-medium        2
    3    0      1    3      0    1 3:3:3      yes    yes        vert-medium        3
    4    0      1    3      1    3 4:4:4      yes    yes        vert-low           4
    5    0      1    3      1    3 5:5:5      yes    yes        vert-low           5

For KVM guests, qemu emulates the stsi FC15 store_topology instruction. This emulation currently erroneously starts id numbering at 0. A qemu fix is proposed that makes this emulation compliant to the stsi architecture. In case a guest with this patch is running on a qemu without the other fix, it can happen that ids of 255 are displayed erroneously. z/VM currently does not provide or emulate physical topology information to its guests. So this patch does not change anything for z/VM guests. Fixes: 10d385895055 ("[S390] topology: expose core identifier") Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
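The "ids of 255" remark follows from unsigned wraparound when one is subtracted from an id that is already 0. The sketch below assumes the container id is handled as an 8-bit unsigned value, which is inferred from the 255 in the message and not confirmed by it; `topology_id_zero_based()` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Hypothetical conversion from a 1-based container id (as defined for
 * stsi FC15) to the 0-based id shown to users. Assumes an 8-bit id.
 */
static uint8_t topology_id_zero_based(uint8_t stsi_id)
{
	/* An input of 0, as from the unfixed qemu emulation, wraps to 255. */
	return (uint8_t)(stsi_id - 1);
}
```

A compliant id of 1 maps to 0, while a non-compliant 0 maps to 255, which matches the value the message warns about.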
2026-05-20 | KVM: arm64: Fix nVHE/pKVM hyp tracing error on invalid desc | Vincent Donnefort
pKVM must validate the host-provided tracing buffer descriptor. However, if an error is found, the hypervisor would just return 0 to the host. Fix the return value on validation failure. While at it, rename the function to hyp_trace_desc_is_valid() and skip validation for the nVHE mode as we trust host-provided data in that case. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Fixes: 680a04c333fa ("KVM: arm64: Add tracing capability for the nVHE/pKVM hyp") Link: https://lore.kernel.org/r/20260514162624.3477857-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-20 | KVM: arm64: vgic: Free private_irqs when init fails after allocation | Michael Bommarito
Companion to commit 250f25367b58 ("KVM: arm64: Tear down vGIC on failed vCPU creation"), which added the missing kvm_vgic_vcpu_destroy() call to the kvm_share_hyp() failure path in kvm_arch_vcpu_create(). The kvm_vgic_vcpu_init() failure path immediately above it has the same shape and still needs the same cleanup. Call kvm_vgic_vcpu_destroy() when kvm_vgic_vcpu_init() fails so private IRQs allocated before a redistributor iodev registration failure are released before the failed vCPU is freed. Fixes: 03b3d00a70b5 ("KVM: arm64: vgic: Allocate private interrupts on demand") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://lore.kernel.org/r/20260519135042.2219239-1-michael.bommarito@gmail.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-20 | KVM: arm64: vgic-its: Reject restored DTE with out-of-range num_eventid_bits | Michael Bommarito
Userspace can restore an ITS Device Table Entry whose Size field encodes more EventID bits than the virtual ITS supports. The live MAPD path rejects that state, but vgic_its_restore_dte() accepts it and stores the out-of-range value in dev->num_eventid_bits. Reject restored DTEs with num_eventid_bits > VITS_TYPER_IDBITS before allocating the device. This mirrors the MAPD check and prevents the restored state from reaching vgic_its_restore_itt(), where the unchecked value can be converted into an oversized scan_its_table() range. Fixes: 57a9a117154c ("KVM: arm64: vgic-its: Device table save/restore") Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://lore.kernel.org/r/20260519132519.2142458-1-michael.bommarito@gmail.com Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
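The bounds check described above can be sketched as follows. The field layout and the limit are assumptions made for illustration: `EXAMPLE_DTE_SIZE_MASK` (a 5-bit Size field storing the bit count minus one) and `EXAMPLE_VITS_TYPER_IDBITS` (16) are stand-ins, and `check_restored_dte()` is a hypothetical function, not the one in the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define EXAMPLE_VITS_TYPER_IDBITS	16u
#define EXAMPLE_DTE_SIZE_MASK		0x1fu

/*
 * Hypothetical restore-time check: decode the EventID width from a
 * saved Device Table Entry and reject it if it exceeds what the
 * virtual ITS supports, before any device state is allocated.
 */
static int check_restored_dte(uint64_t dte)
{
	unsigned int num_eventid_bits =
		(unsigned int)(dte & EXAMPLE_DTE_SIZE_MASK) + 1;

	if (num_eventid_bits > EXAMPLE_VITS_TYPER_IDBITS)
		return -EINVAL;

	return 0;
}
```

Doing the check before allocation means a rejected entry leaves nothing to clean up, and the oversized value never reaches a later table scan.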
2026-05-19 | x86/kvm/vmx: Fix VMX vs hrtimer_rearm_deferred() | Peter Zijlstra
Vishal reported that KVM unit test 'x2apic' started failing after commit 0e98eb14814e ("entry: Prepare for deferred hrtimer rearming"). The reason is that KVM/VMX is injecting interrupts while it has interrupts disabled, for a context that will enable interrupts, this means that regs->flags.X86_EFLAGS_IF == 0 and irqentry_exit() will not do the right thing. Notably, irqentry_exit() must not call hrtimer_rearm_deferred() when the return context does not have IF set, because this will cause problems vs NMIs. Therefore, fix up the state after the injection. Fixes: 0e98eb14814e ("entry: Prepare for deferred hrtimer rearming") Reported-by: "Verma, Vishal L" <vishal.l.verma@intel.com> Suggested-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: "Verma, Vishal L" <vishal.l.verma@intel.com> Tested-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Link: https://patch.msgid.link/20260423155936.957351833@infradead.org Closes: https://lore.kernel.org/r/70cd3e97fbb796e2eb2ff8cd4b7614ada05a5f24.camel%40intel.com
2026-05-19 | x86/kvm/vmx: Move IRQ/NMI dispatch from KVM into x86 core | Peter Zijlstra
Move the VMX interrupt dispatch magic into the x86 core code. This isolates KVM from the FRED/IDT decisions and reduces the amount of EXPORT_SYMBOL_FOR_KVM(). Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: "Verma, Vishal L" <vishal.l.verma@intel.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260508091829.GO3126523@noisy.programming.kicks-ass.net
2026-05-19 | Merge tag 'mm-hotfixes-stable-2026-05-18-21-07' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "14 hotfixes. 9 are for MM. 10 are cc:stable and the remainder are for post-7.1 issues or aren't deemed suitable for backporting. There's a two-patch MAINTAINERS series from Mike Rapoport which updates us for the new KEXEC/KDUMP/crash/LUO/etc arrangements. And another two-patch series from Muchun Song to fix a couple of memory-hotplug issues. Otherwise singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-05-18-21-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/memory: fix spurious warning when unmapping device-private/exclusive pages mm: fix __vm_normal_page() to handle missing support for pmd_special()/pud_special() drivers/base/memory: fix memory block reference leak in poison accounting mm/memory_hotplug: fix memory block reference leak on remove lib: kunit_iov_iter: fix test fail on powerpc mm/page_alloc: fix initialization of tags of the huge zero folio with init_on_free MAINTAINERS: add kexec@ list to LIVE UPDATE ENTRY MAINTAINERS: add tree for KDUMP and KEXEC selftests/mm: run_vmtests.sh: fix destructive tests invocation scripts/gdb: slab: update field names of struct kmem_cache scripts/gdb: mm: cast untyped symbols in x86_page_ops mm/damon: fix damos_stat tracepoint format for sz_applied mm/damon/sysfs-schemes: call missing mem_cgroup_iter_break() mm/migrate_device: fix spinlock leak in migrate_vma_insert_huge_pmd_page
2026-05-19 | x86/vdso: Fix incorrect size in munmap() on map_vdso() failure | Guilherme Giacomo Simoes
In map_vdso(), if a failure occurs during the installation of the VVAR mappings, the error path attempts to clean up previously allocated mappings using do_munmap(). However, the cleanup for the VVAR mapping is incorrectly using image->size (the size of the vDSO text) instead of the actual size allocated for the VVAR area. Replace the incorrect do_munmap() image->size parameter with the constant VDSO_NR_PAGES * PAGE_SIZE. Ensure the unmap size exactly matches the size used during the vdso_install_vvar_mapping() phase to provide a symmetrical and complete teardown of the memory region. Fixes: e93d2521b27f ("x86/vdso: Split virtual clock pages into dedicated mapping") Signed-off-by: Guilherme Giacomo Simoes <trintaeoitogc@gmail.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20260503191609.551817-1-trintaeoitogc@gmail.com
2026-05-19 | arm64: probes: Handle probes on hinted conditional branch instructions | Vladimir Murzin
BC.cond instructions introduced by FEAT_HBC cannot be executed out-of-line, like other branch instructions. However, they can be simulated in the same way as B.cond instructions. Extend the B.cond decoder mask to match BC.cond instructions as well, and handle them using the existing B.cond simulation path. Fixes: 7f86d128e437 ("arm64: add HWCAP for FEAT_HBC (hinted conditional branches)") Cc: <stable@vger.kernel.org> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
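In the A64 encoding, B.cond and BC.cond share the top byte 0x54 and differ only in bit 4 (0 for B.cond, 1 for BC.cond). The sketch below shows how widening a decoder mask lets a single pattern match both. The macro names are hypothetical, and the concrete mask values are an assumption about how such a decoder might be written, not a quote from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed decoder constants, for illustration only. */
#define EXAMPLE_BCOND_VALUE	0x54000000u
#define EXAMPLE_MASK_STRICT	0xff000010u	/* also requires bit 4 == 0 */
#define EXAMPLE_MASK_WIDE	0xff000000u	/* ignores bit 4            */

/* True if insn matches the conditional-branch pattern under mask. */
static bool insn_matches_bcond(uint32_t insn, uint32_t mask)
{
	return (insn & mask) == EXAMPLE_BCOND_VALUE;
}
```

With the strict mask only `0x54000000` (B.EQ to itself) matches. With the wide mask `0x54000010` (BC.EQ to itself) matches as well, so both can be routed to the same simulation path.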
2026-05-18 | arm64: defconfig: Enable PCI M.2 power sequencing driver | Manivannan Sadhasivam
POWER_SEQUENCING_PCIE_M2 driver handles power supply to the PCIe M.2 connectors and is required on a wide variety of ARM64 platforms such as Qcom Snapdragon X Elite laptops and Mediatek Dojo Chromebooks. Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260514065017.11305-1-manivannan.sadhasivam@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-18 | RISC-V: KVM: Fix sign extension for MMIO loads | Jiakai Xu
The kvm_riscv_vcpu_mmio_return() function handles MMIO read results by writing the data back to the guest register. For signed load instructions (LB, LH, LW on RV64), the value needs sign-extension from a smaller integer to unsigned long. The current code uses: (ulong)data << shift >> shift but (ulong) makes the right shift a logical shift (zero-extend) rather than an arithmetic shift (sign-extend), causing incorrect results when the MMIO device returns a negative value. For example, LB reading 0x80 would return 128 instead of -128. Fix this by casting to (long) after the left shift so that the subsequent right shift is arithmetic and correctly propagates the sign bit: (long)((ulong)data << shift) >> shift Additionally, remove the unnecessary shift assignment for LBU (unsigned byte load) since it does not need sign extension. This makes LBU consistent with LHU and LWU which already keep shift = 0. Fixes: b91f0e4cb8a3 ("RISC-V: KVM: Factor-out instruction emulation into separate sources") Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Assisted-by: OpenClaw:DeepSeek-V3.2 Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260514081752.472987-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
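The difference between the two expressions can be reproduced in user space. The sketch below uses fixed-width 64-bit types in place of `ulong` and `long` (an assumption matching RV64), and the function names are hypothetical. It relies on right shifts of negative signed values being arithmetic, which holds for GCC and Clang but is implementation-defined in ISO C.

```c
#include <stdint.h>

/* Pre-fix form: all operands are unsigned, so >> is a logical shift. */
static uint64_t mmio_extend_logical(uint64_t data, unsigned int shift)
{
	return data << shift >> shift;
}

/*
 * Post-fix form: the left-shifted value is reinterpreted as signed,
 * so the right shift replicates the sign bit.
 */
static uint64_t mmio_extend_arithmetic(uint64_t data, unsigned int shift)
{
	return (uint64_t)((int64_t)(data << shift) >> shift);
}
```

For a one-byte load on a 64-bit register the shift is 56. The logical form turns 0x80 back into 0x80 (128), while the arithmetic form yields 0xffffffffffffff80 (-128), matching the LB example in the message.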
2026-05-18 | RISC-V: KVM: Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler | Jiakai Xu
The SBI v0.1 SEND_IPI handler iterates over the hart mask and calls kvm_get_vcpu_by_id() to find the target vcpu for each set bit. When a guest provides a hart mask containing bits for non-existent vcpu_ids, kvm_get_vcpu_by_id() returns NULL, which is then unconditionally dereferenced by kvm_riscv_vcpu_set_interrupt(), causing a kernel crash. Fix this by adding a NULL check before dereferencing the return value. If the target vcpu is not found, skip it and continue processing the remaining valid harts. Fixes: a046c2d8578c ("RISC-V: KVM: Reorganize SBI code by moving SBI v0.1 to its own file") Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Assisted-by: OpenClaw:DeepSeek-V3.2 Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260517124414.420919-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
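The skip-on-NULL pattern can be sketched with a toy model of the handler loop. Everything here is hypothetical: `struct vcpu`, `get_vcpu_by_id()`, and `send_ipi()` are simplified stand-ins for the KVM structures and functions named in the message.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical, heavily simplified vCPU. */
struct vcpu {
	int id;
	int ipi_pending;
};

/* Stand-in for a lookup by id: returns NULL if no vCPU has that id. */
static struct vcpu *get_vcpu_by_id(struct vcpu *vcpus, size_t n, int id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (vcpus[i].id == id)
			return &vcpus[i];
	return NULL;
}

/* Walk the guest-supplied hart mask, skipping harts that do not exist. */
static int send_ipi(struct vcpu *vcpus, size_t n, unsigned long hart_mask)
{
	int delivered = 0;
	unsigned int bit;

	for (bit = 0; bit < sizeof(hart_mask) * CHAR_BIT; bit++) {
		struct vcpu *target;

		if (!(hart_mask & (1UL << bit)))
			continue;

		target = get_vcpu_by_id(vcpus, n, (int)bit);
		if (!target)
			continue;	/* the added NULL check */

		target->ipi_pending = 1;
		delivered++;
	}
	return delivered;
}

/* Demo: two vCPUs (ids 0 and 1) and an arbitrary guest-chosen mask. */
static int demo_send_ipi(unsigned long hart_mask)
{
	struct vcpu vcpus[2] = { { 0, 0 }, { 1, 0 } };

	return send_ipi(vcpus, 2, hart_mask);
}
```

A mask of 0xb names harts 0, 1 and 3. Hart 3 does not exist, so it is skipped and the two valid harts are still signaled.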
2026-05-18 | riscv: kvm: return SBI_ERR_FAILURE for pmu_event_info() when OOM | Osama Abdelkader
kvm_riscv_vcpu_pmu_event_info() returned -ENOMEM from the SBI extension handler, which caused kvm_riscv_vcpu_sbi_ecall() to abort KVM_RUN and surface the error to userspace instead of completing the ECALL with a negative SBI error in a0. Use SBI_ERR_FAILURE and the normal retdata path, matching other PMU handlers and kvm_sbi_ext_pmu_handler comment. Fixes: e309fd113b9f ("RISC-V: KVM: Implement get event info function") Cc: stable@vger.kernel.org Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260514173642.41448-2-osama.abdelkader@gmail.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-05-18 | riscv: kvm: return SBI_ERR_FAILURE for pmu_snapshot_set_shmem() when OOM | Osama Abdelkader
kvm_riscv_vcpu_pmu_snapshot_set_shmem() returned -ENOMEM from the SBI extension handler, which caused kvm_riscv_vcpu_sbi_ecall() to abort KVM_RUN and surface the error to userspace instead of completing the ECALL with a negative SBI error in a0. Use SBI_ERR_FAILURE and the normal retdata path, matching other PMU handlers and kvm_sbi_ext_pmu_handler comment. Fixes: c2f41ddbcdd7 ("RISC-V: KVM: Implement SBI PMU Snapshot feature") Cc: stable@vger.kernel.org Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://lore.kernel.org/r/20260514173642.41448-1-osama.abdelkader@gmail.com Signed-off-by: Anup Patel <anup@brainfault.org>
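The two PMU fixes above share one idea: an allocation failure is reported to the guest through the SBI return data, while the handler itself returns 0 so that the run loop continues. The sketch below models that split. `struct sbi_return` and both handler functions are hypothetical simplifications, and allocation failure is simulated with a flag; `SBI_ERR_FAILURE` is -1 as defined by the SBI specification.

```c
#include <errno.h>
#include <stdbool.h>

#define SBI_SUCCESS	0
#define SBI_ERR_FAILURE	(-1)	/* value from the SBI specification */

/* Hypothetical, simplified return data handed back to the guest. */
struct sbi_return {
	long err_val;		/* ends up in the guest's a0 */
	unsigned long out_val;	/* ends up in the guest's a1 */
};

/* Pre-fix shape: the host error code escapes to the caller. */
static int pmu_handler_old(bool alloc_failed, struct sbi_return *ret)
{
	if (alloc_failed)
		return -ENOMEM;	/* caller aborts the run loop */
	ret->err_val = SBI_SUCCESS;
	return 0;
}

/* Post-fix shape: the guest sees an SBI error, the host sees success. */
static int pmu_handler_new(bool alloc_failed, struct sbi_return *ret)
{
	ret->err_val = alloc_failed ? SBI_ERR_FAILURE : SBI_SUCCESS;
	return 0;
}

/* Demo helpers: handler result and guest-visible error on simulated OOM. */
static int demo_old_rc_on_oom(void)
{
	struct sbi_return ret = { 0, 0 };

	return pmu_handler_old(true, &ret);
}

static long demo_new_err_on_oom(void)
{
	struct sbi_return ret = { 0, 0 };

	return pmu_handler_new(true, &ret) == 0 ? ret.err_val : 1;
}
```

In the old shape a nonzero handler result is what surfaces to userspace. In the new shape the same condition only changes what the guest reads back from the ECALL.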
2026-05-18 | RISC-V: KVM: Fix invalid HVA warning in steal-time recording | Jiakai Xu
kvm_riscv_vcpu_record_steal_time() assumes that the steal-time shared memory GPA (vcpu->arch.sta.shmem) is always backed by a valid guest memory slot. However, this assumption is not guaranteed by the KVM userspace ABI. A malicious or buggy userspace can set the STA shared memory GPA via KVM_SET_ONE_REG without establishing a corresponding memory region via KVM_SET_USER_MEMORY_REGION. In such cases, the GPA cannot be translated to a valid HVA and kvm_vcpu_gfn_to_hva() returns an error address. The current implementation incorrectly treats this as a kernel warning using WARN_ON(), which may escalate to a kernel panic when panic_on_warn is enabled. This is not a kernel bug condition but a normal invalid configuration from userspace, and should be handled gracefully. Fix it by removing WARN_ON() and treating invalid HVA as a normal failure case, resetting the STA shared memory state. Fixes: e9f12b5fff8ad0 ("RISC-V: KVM: Implement SBI STA extension") Signed-off-by: Jiakai Xu <xujiakai2025@iscas.ac.cn> Signed-off-by: Jiakai Xu <jiakaiPeanut@gmail.com> Assisted-by: OpenClaw:DeepSeek-V3.2 Reviewed-by: Nutty Liu <nutty.liu@hotmail.com> Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260415075216.2757427-1-xujiakai2025@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2026-05-17 | Merge tag 'x86-urgent-2026-05-17' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: - Fix x86 boot crash for non-kjump kexecs (David Woodhouse) * tag 'x86-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/kexec: Push kjump return address even for non-kjump kexec
2026-05-17 | Merge tag 'sched-urgent-2026-05-17' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: - Fix ARM64-specific rseq regressions (Mark Rutland) * tag 'sched-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: arm64/entry: Fix arm64-specific rseq brokenness
2026-05-17 | Merge tag 'ras-urgent-2026-05-17' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull MCE fix from Ingo Molnar: - Fix an MCE polling interval adjustment regression (Borislav Petkov) * tag 'ras-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Restore MCA polling interval halving
2026-05-17 | Merge tag 'riscv-for-linus-7.1-rc4' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Paul Walmsley: "Relatively low-impact fixes. Probably the most notable one is that we no longer ask the monitor-mode firmware to delegate misaligned access handling to the kernel by default, since the kernel code needs significant improvement to match the functionality of the firmware. This change avoids functional problems at some cost in performance, but shouldn't affect any system with misaligned access handling in hardware. - Disable satp register probing when no5lvl is specified on the kernel command line - Fix a CFI-related issue with the misaligned access speed measurement code - Reduce the CFI shadow stack size limit from 4GB to 2GB (following ARM64 GCS) - Prevent the kernel from requesting delegation of misaligned access faults unless a new Kconfig option, RISCV_SBI_FWFT_DELEGATE_MISALIGNED, is enabled. This will depend on CONFIG_NONPORTABLE until the deficiencies of the kernel misaligned access fixup code are fixed - Fix some potential uninitialized memory accesses in error paths in compat_riscv_gpr_set() and compat_restore_sigcontext() - Fix a bug in the RISC-V MIPS vendor errata patching code where a logical-and was used in place of a bitwise-and - Drop some unnecessary code in riscv_fill_hwcap_from_isa_string() - Use macros for isa2hwcap indices in riscv_fill_hwcap(), rather than open-coding them - Fix some documentation typos (one affecting 'make htmldocs')" * tag 'riscv-for-linus-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: misaligned: Make enabling delegation depend on NONPORTABLE riscv: Docs: fix unmatched quote warning riscv: cfi: reduce shadow stack size limit from 4GB to 2GB riscv: cpufeature: Use pre-defined ISA ext macros to index isa2hwcap riscv: mm: Fixup no5lvl failure when vaddr is invalid riscv: Fix register corruption from uninitialized cregs on error riscv: errata: Fix bitwise vs logical AND in MIPS errata patching 
Documentation: riscv: cmodx: fix typos riscv: cpufeature: Drop this_hwcap clear in T-Head vector workaround riscv: Define __riscv_copy_{,vec_}{words,bytes}_unaligned() using SYM_TYPED_FUNC_START
2026-05-16 | Merge tag 'powerpc-7.1-3' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Madhavan Srinivasan: - Fix preempt count leak in sysfs show paths - Fix error handling in pika_dtm_thread - Remove pmac_low_i2c_{lock,unlock}() - Enable all windfarms by default - Fix dead default for GUEST_STATE_BUFFER_TEST - Remove redundant preempt_disable|enable() calls from arch_irq_work_raise() Thanks to Aboorva Devarajan, Ally Heev, Amit Machhiwal, Bart Van Assche, Christophe Leroy, Christophe Leroy (CS GROUP), Dan Carpenter, Gautam Menghani, Harsh Prateek Bora, Julian Braha, Krzysztof Kozlowski, Linus Walleij, Ma Ke, Ritesh Harjani (IBM), and Sayali Patil * tag 'powerpc-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/time: Remove redundant preempt_disable|enable() calls from arch_irq_work_raise() powerpc/hv-gpci: fix preempt count leak in sysfs show paths powerpc: fix dead default for GUEST_STATE_BUFFER_TEST powerpc/powermac: Remove pmac_low_i2c_{lock,unlock}() powerpc/warp: Fix error handling in pika_dtm_thread powerpc: 82xx: fix uninitialized pointers with free attribute powerpc/g5: Enable all windfarms by default
2026-05-16 | ARM: dts: microchip: sam9x7: fix GMAC clock configuration | Mihai Sain
The GMAC node incorrectly listed four clocks, including a separate tx_clk and a TSU GCK clock sourced from ID 67. According to the SAM9X7 clocking scheme, the GMAC uses only three clocks: HCLK, PCLK, and the TSU GCK derived from the GMAC peripheral clock (ID 24). Remove the unused tx_clk, update the clock-names accordingly, and correct the assigned clock to use GCK 24 instead of GCK 67. This aligns the device tree with the actual hardware clock topology and prevents misconfiguration of the GMAC clock tree. [root@SAM9X75 ~]$ cat /sys/kernel/debug/clk/clk_summary | grep gmac gmac_gclk 1 1 1 266666666 0 0 50000 Y f802c000.ethernet tsu_clk f802c000.ethernet tsu_clk gmac_clk 2 2 0 266666666 0 0 50000 Y f802c000.ethernet hclk f802c000.ethernet pclk Fixes: 41af45af8bc3 ("ARM: dts: at91: sam9x7: add device tree for SoC") Signed-off-by: Mihai Sain <mihai.sain@microchip.com> Link: https://lore.kernel.org/r/20260309075329.1528-5-mihai.sain@microchip.com [claudiu.beznea: massaged the patch description] Signed-off-by: Claudiu Beznea <claudiu.beznea@tuxon.dev>
2026-05-15 | Merge tag 'for-linus-7.1b-rc4-tag' of ↵ | Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - one simple cleanup - a fix for a corner case when running as Xen PV dom0 - a fix of a regression for Xen PV guests, introduced in 7.0 * tag 'for-linus-7.1b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: Tolerate nested XEN_LAZY_MMU entering/leaving x86/xen: Fix xen_e820_swap_entry_with_ram() xen/arm: Replace __ASSEMBLY__ with __ASSEMBLER__ in interface.h
2026-05-15ARM: Do not select HAVE_RUST when KASAN is enabledNathan Chancellor
When KASAN is enabled, such as with allmodconfig, the build fails when building the Rust code with: error: kernel-address sanitizer is not supported for this target error: aborting due to 1 previous error make[4]: *** [rust/Makefile:654: rust/core.o] Error 1 The arm-unknown-linux-gnueabi target does not support KASAN, so avoid saying Rust is supported when it is enabled. Cc: stable@vger.kernel.org Fixes: ccb8ce526807 ("ARM: 9441/1: rust: Enable Rust support for ARMv7") Link: https://github.com/Rust-for-Linux/linux/issues/1234 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Christian Schrefl <chrisi.schrefl@gmail.com> Link: https://patch.msgid.link/20260511-arm-avoid-rust-with-kasan-v1-1-24d55f4a900b@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2026-05-14Merge tag 'acpi-7.1-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI support fixes from Rafael Wysocki: "These fix several platform drivers that use the ACPI companion of the given platform device without checking its presence, which may lead to a NULL pointer dereference or other kind of malfunction if the driver is forced to match a device without an ACPI companion via driver override, and restore debug log level for some messages in the ACPI CPPC library: - Check ACPI_COMPANION() against NULL during probe in several core ACPI device drivers (Rafael Wysocki) - Restore log level of messages in amd_set_max_freq_ratio() (Mario Limonciello)" * tag 'acpi-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: PAD: xen: Check ACPI_COMPANION() against NULL ACPI: driver: Check ACPI_COMPANION() against NULL during probe Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"
2026-05-14Merge branch 'acpi-cppc'Rafael J. Wysocki
Merge a revert of an ACPI CPPC commit that increased the log level of some debug messages which turned out to be a bad idea: - Restore log level of messages in amd_set_max_freq_ratio() (Mario Limonciello) * acpi-cppc: Revert "ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"
2026-05-14x86/xen: Tolerate nested XEN_LAZY_MMU entering/leavingJuergen Gross
With the support of nested lazy mmu sections it can happen that arch_enter_lazy_mmu_mode() is being called twice without a call of arch_leave_lazy_mmu_mode() in between, as the lazy_mmu_*() helpers are not disabling preemption when checking for nested lazy mmu sections. This is a problem when running as a Xen PV guest, as xen_enter_lazy_mmu() and xen_leave_lazy_mmu() don't tolerate this case. Fix that in xen_enter_lazy_mmu() and xen_leave_lazy_mmu() in order not to hurt all other lazy mmu mode users. Fixes: 291b3abed657 ("x86/xen: use lazy_mmu_state when context-switching") Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260508143933.493013-1-jgross@suse.com>
2026-05-14x86/xen: Fix xen_e820_swap_entry_with_ram()Juergen Gross
When swapping a non-page-aligned E820 map entry with RAM, the start address of the modified entry is calculated incorrectly (the offset into the page is subtracted instead of being added to the page address). Fixes: be35d91c8880 ("xen: tolerate ACPI NVS memory overlapping with Xen allocated memory") Reported-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260505102417.208138-1-jgross@suse.com>
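For illustration only (this is not the kernel code from the patch), the address arithmetic described above can be sketched in plain C. The helper name `sketch_swapped_start`, the `SKETCH_*` macros, and the 4 KiB page size are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Hypothetical helper: an entry that started at old_start is moved so that
 * it lives in the page at new_page, keeping its offset into the page.
 */
uint64_t sketch_swapped_start(uint64_t old_start, uint64_t new_page)
{
	uint64_t offset = old_start & ~SKETCH_PAGE_MASK; /* offset into the page */

	assert((new_page & ~SKETCH_PAGE_MASK) == 0);     /* must be page-aligned */

	/* Correct form: add the offset. The bug described above subtracted it. */
	return new_page + offset;
}
```

With these assumptions, an entry starting at 0x1234 that is moved to the page at 0x8000 starts at 0x8234; subtracting the offset would instead yield 0x7dcc, which lies in the preceding page.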
2026-05-14powerpc/time: Remove redundant preempt_disable|enable() calls from ↵Sayali Patil
arch_irq_work_raise() A kernel panic is observed when handling machine check exceptions from real mode. BUG: Unable to handle kernel data access on read at 0xc00000006be21300 Oops: Kernel access of bad area, sig: 11 [#1] MSR: 8000000000001003 <SF,ME,RI,LE> CR: 88222248 XER: 00000005 CFAR: c00000000003ffc4 DAR: c00000006be21300 DSISR: 40000000 IRQMASK: 0 NIP [c000000000029e40] arch_irq_work_raise+0x10/0x70 LR [c00000000003ffc8] machine_check_queue_event+0xa8/0x150 Call Trace: [c0000000179d3c70] [c00000000003ff64] machine_check_queue_event+0x44/0x150 [c0000000179d3d30] [c0000000000084e0] machine_check_early_common+0x1f0/0x2c0 The crash occurs because arch_irq_work_raise() calls preempt_disable() from machine check exception (MCE) handlers running in real mode. In this context, accessing the preempt_count can fault, leading to the panic. The preempt_disable()/preempt_enable() pair in arch_irq_work_raise() was originally added by commit 0fe1ac48bef0 ("powerpc/perf_event: Fix oops due to perf_event_do_pending call") to avoid races while raising irq work from exception context. Later, commit 471ba0e686cb ("irq_work: Do not raise an IPI when queueing work on the local CPU") added preemption protection in irq_work_queue() path, while commit 20b876918c06 ("irq_work: Use per cpu atomics instead of regular atomics") added equivalent protection in irq_work_queue_on() before reaching arch_irq_work_raise(): irq_work_queue() / irq_work_queue_on() -> preempt_disable() -> __irq_work_queue_local() -> irq_work_raise() -> arch_irq_work_raise() As a result, callers other than mce_irq_work_raise() already execute with preemption disabled, making the additional preempt_disable()/preempt_enable() pair in arch_irq_work_raise() redundant. The arch_irq_work_raise() function executes in NMI context when called from MCE handler. Hence we will not be preempted or scheduled out since we are in NMI context with MSR[EE]=0. 
Therefore, it is safe to remove the preempt_disable()/preempt_enable() calls from here. Remove it to avoid accessing preempt_count from real mode context. Fixes: cc15ff327569 ("powerpc/mce: Avoid using irq_work_queue() in realmode") Suggested-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Sayali Patil <sayalip@linux.ibm.com> [Maddy: Fixed the commit title] Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260513081413.222490-1-sayalip@linux.ibm.com
2026-05-13riscv: misaligned: Make enabling delegation depend on NONPORTABLEVivian Wang
The unaligned access emulation code in Linux has various deficiencies. For example, it doesn't emulate vector instructions [1] [2], and doesn't emulate KVM guest accesses. Therefore, requesting misaligned exception delegation with SBI FWFT actually regresses vector instructions' and KVM guests' behavior. Until Linux can handle it properly, guard these sbi_fwft_set() calls behind RISCV_SBI_FWFT_DELEGATE_MISALIGNED, which in turn depends on NONPORTABLE. Those who are sure that this wouldn't be a problem can enable this option, perhaps getting better performance. The rest of the existing code proceeds as before, except as if SBI_FWFT_MISALIGNED_EXC_DELEG is not available, to handle any remaining address misaligned exceptions on a best-effort basis. The KVM SBI FWFT implementation is also not touched, but it is disabled if the firmware emulates unaligned accesses. Cc: stable@vger.kernel.org Fixes: cf5a8abc6560 ("riscv: misaligned: request misaligned exception from SBI") Reported-by: Songsong Zhang <U2FsdGVkX1@gmail.com> # KVM Link: https://lore.kernel.org/linux-riscv/38ce44c1-08cf-4e3f-8ade-20da224f529c@iscas.ac.cn/ [1] Link: https://lore.kernel.org/linux-riscv/b3cfcdac-0337-4db0-a611-258f2868855f@iscas.ac.cn/ [2] Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20260401-riscv-misaligned-dont-delegate-v2-1-5014a288c097@iscas.ac.cn Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-05-13riscv: cfi: reduce shadow stack size limit from 4GB to 2GBZong Li
Follow the ARM64 GCS (Guarded Control Stack) implementation approach by reducing the shadow stack size allocation from min(RLIMIT_STACK, 4GB) to min(RLIMIT_STACK/2, 2GB). See commit 506496bcbb42 ("arm64/gcs: Ensure that new threads have a GCS") Rationale: 1. Shadow stacks only store return addresses (8 bytes per entry), not local variables, function parameters, or saved registers. A 2GB shadow stack is far more than sufficient for any practical application, even with extremely deep recursion. Using half the size maintains adequate margin while being more resource-efficient. 2. On memory-constrained systems (e.g., platforms with only 4GB of physical memory, which is a common configuration), allocating 4GB of virtual address space for shadow stack per process/thread can lead to virtual memory allocation failures when the overcommit mode is set to OVERCOMMIT_GUESS or OVERCOMMIT_NEVER: Error: "__vm_enough_memory: not enough memory for the allocation" This reduces virtual address space consumption by 50% while maintaining more than adequate space for return address storage. Signed-off-by: Zong Li <zong.li@sifive.com> Link: https://patch.msgid.link/20260428024105.645162-1-zong.li@sifive.com [pjw@kernel.org: clean up patch description] Signed-off-by: Paul Walmsley <pjw@kernel.org>
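As a rough sketch of the sizing rule stated above, min(RLIMIT_STACK/2, 2GB), and not the actual riscv implementation: the function name `sketch_shadow_stack_size` and the `SKETCH_SZ_2G` macro are hypothetical, and any page rounding the kernel may apply is left out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SZ_2G (2ULL << 30)

static_assert(SKETCH_SZ_2G == 0x80000000ULL, "2 GiB expressed in bytes");

/*
 * Hypothetical helper: default shadow stack size under the new policy,
 * min(rlimit_stack / 2, 2 GiB). An unlimited RLIMIT_STACK (all bits set)
 * therefore falls back to the 2 GiB cap.
 */
uint64_t sketch_shadow_stack_size(uint64_t rlimit_stack)
{
	uint64_t half = rlimit_stack / 2;

	return half < SKETCH_SZ_2G ? half : SKETCH_SZ_2G;
}
```

Under this sketch, an 8 MiB stack limit gives a 4 MiB shadow stack. At 8 bytes per return address, the 2 GiB cap still leaves room for 2^28 (268,435,456) entries.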
2026-05-13mm/page_alloc: fix initialization of tags of the huge zero folio with ↵David Hildenbrand (Arm)
init_on_free __GFP_ZEROTAGS semantics are currently a bit weird, but effectively this flag is only ever set alongside __GFP_ZERO and __GFP_SKIP_KASAN. If we run with init_on_free, we will zero out pages during __free_pages_prepare(), to skip zeroing on the allocation path. However, when allocating with __GFP_ZEROTAGS set, post_alloc_hook() will consequently not only skip clearing page content, but also skip clearing tag memory. Not clearing tags through __GFP_ZEROTAGS is irrelevant for most pages that will get mapped to user space through set_pte_at() later: set_pte_at() and friends will detect that the tags have not been initialized yet (PG_mte_tagged not set), and initialize them. However, for the huge zero folio, which will be mapped through a PMD marked as special, this initialization will not be performed, ending up exposing whatever tags were still set for the pages. The docs (Documentation/arch/arm64/memory-tagging-extension.rst) state that allocation tags are set to 0 when a page is first mapped to user space. That no longer holds with the huge zero folio when init_on_free is enabled. Fix it by decoupling __GFP_ZEROTAGS from __GFP_ZERO, passing to tag_clear_highpages() whether we want to also clear page content. Invert the meaning of the tag_clear_highpages() return value to have clearer semantics. Reproduced with the huge zero folio by modifying the check_buffer_fill arm64/mte selftest to use a 2 MiB area, after making sure that pages have a non-0 tag set when freeing (note that, during boot, we will not actually initialize tags, but only set KASAN_TAG_KERNEL in the page flags). $ ./check_buffer_fill 1..20 ... not ok 17 Check initial tags with private mapping, sync error mode and mmap memory not ok 18 Check initial tags with private mapping, sync error mode and mmap/mprotect memory ... This code needs more cleanups; we'll tackle that next, like decoupling __GFP_ZEROTAGS from __GFP_SKIP_KASAN. 
[akpm@linux-foundation.org: s/__GPF_ZERO/__GFP_ZERO/, per David] Link: https://lore.kernel.org/20260421-zerotags-v2-1-05cb1035482e@kernel.org Fixes: adfb6609c680 ("mm/huge_memory: initialise the tags of the huge zero folio") Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Lance Yang <lance.yang@linux.dev> Cc: Brendan Jackman <jackmanb@google.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Will Deacon <will@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "arm64: - Add the pKVM side of the workaround for ARM's erratum 4193714, provided that the EL3 firmware does its part of the job. KVM will refuse to initialise otherwise - Correctly handle 52bit VAs for guest EL2 stage-1 translations when running under NV with E2H==0 - Correctly deal with permission faults in guest_memfd memslots - Fix the steal-time selftest after the infrastructure was reworked - Make sure the host cannot pass a non-sensical clock update to the EL2 tracing infrastructure - Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390 ability to run arm64 guests, which will inevitably lead to arm64 code being directly used on s390 - Make sure that EL2 is configured with both exception entry and exit being Context Synchronization Events - Handle the current vcpu being NULL on EL2 panic - Fix the selftest_vcpu memcache being empty at the point of donation or sharing - Check that the memcache has enough capacity before engaging on the share/donate path - Fix __deactivate_fgt() to use its parameter rather than a variable in the macro context s390: - Fix array overrun with large amounts of PCI devices x86: - Never use L0's PAUSE loop exiting while L2 is running, since it's unlikely that a nested guest will help solving the hypervisor's spinlock contention - Fix emulation of MOVNTDQA - Fix typo in Xen hypercall tracepoint - Add back an optimization that was left behind when recently fixing a bug - Add module parameter to disable CET, whose implementation seems to have issues. 
For now it remains enabled by default Generic: - Reject offset causing an unsigned overflow in kvm_reset_dirty_gfn() Documentation: - Update stale links Selftests: - Fix guest_memfd_test with host page size > guest page size" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) KVM: VMX: introduce module parameter to disable CET KVM: x86: Swap the dst and src operand for MOVNTDQA KVM: x86: use again the flush argument of __link_shadow_page() KVM: selftests: Ensure gmem file sizes are multiple of host page size Documentation: kvm: update links in the references section of AMD Memory Encryption KVM: nSVM: Never use L0's PAUSE loop exiting while L2 is running KVM: x86: Fix Xen hypercall tracepoint argument assignment KVM: Reject wrapped offset in kvm_reset_dirty_gfn() KVM: arm64: Pre-check vcpu memcache for host->guest donate KVM: arm64: Pre-check vcpu memcache for host->guest share KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache KVM: arm64: Fix __deactivate_fgt macro parameter typo KVM: arm64: Guard against NULL vcpu on VHE hyp panic path KVM: arm64: Make EL2 exception entry and exit context-synchronization events MAINTAINERS: Add Steffen as reviewer for KVM/arm64 KVM: arm64: Remove potential UB on nvhe tracing clock update KVM: selftests: arm64: Fix steal_time test after UAPI refactoring KVM: arm64: Handle permission faults with guest_memfd KVM: arm64: nv: Consider the DS bit when translating TCR_EL2 KVM: arm64: Work around C1-Pro erratum 4193714 for protected guests ...
2026-05-13KVM: x86: Rate-limit global clock updates on vCPU loadLei Chen
commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"") dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE. As a result, kvm_arch_vcpu_load() can queue global clock update requests every time a vCPU is scheduled when the master clock is disabled or when the vCPU is loaded for the first time. Restore the throttling with a per-VM ratelimit state and gate KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU scheduling does not generate a steady stream of redundant clock update requests. Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"") Signed-off-by: Lei Chen <lei.chen@smartx.com> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/ Link: https://patch.msgid.link/20260409142226.2581-1-lei.chen@smartx.com Signed-off-by: Sean Christopherson <seanjc@google.com>
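The throttling idea described above can be sketched as a small user-space interval/burst limiter. This is only loosely modelled on the kernel's `__ratelimit()` and is not the KVM code; `struct sketch_ratelimit`, `sketch_ratelimit_ok()`, and the explicit `now` argument are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-VM state: at most `burst` events per `interval` ticks. */
struct sketch_ratelimit {
	uint64_t interval;  /* window length, in arbitrary ticks */
	unsigned int burst; /* events allowed per window */
	uint64_t begin;     /* start of the current window */
	unsigned int used;  /* events already allowed in this window */
};

/* Returns true if the caller may proceed, false if it should be throttled. */
bool sketch_ratelimit_ok(struct sketch_ratelimit *rs, uint64_t now)
{
	assert(rs->interval > 0 && rs->burst > 0);

	/* Open a fresh window once the previous one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->used = 0;
	}

	if (rs->used < rs->burst) {
		rs->used++;
		return true;
	}
	return false;
}
```

A caller in the spirit of the commit would queue the clock update request only when the limiter returns true, so repeated vCPU loads inside one window collapse into a single request.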
2026-05-13x86/virt: Silence RCU lockdep splat in emergency virt callback pathMikhail Gavrilov
x86_virt_invoke_kvm_emergency_callback() reaches rcu_dereference() through machine_crash_shutdown() with IRQs disabled but with RCU not necessarily watching the crashing CPU, which triggers a suspicious RCU usage splat on debug kernels (CONFIG_PROVE_RCU=y) during panic/kdump: WARNING: suspicious RCU usage arch/x86/virt/hw.c:52 suspicious rcu_dereference_check() usage! rcu_scheduler_active = 2, debug_locks = 1 1 lock held by tee/11119: #0: ffff8881fa32c440 (sb_writers#3){.+.+}-{0:0}, at: ksys_write Call Trace: <TASK> dump_stack_lvl+0x84/0xd0 lockdep_rcu_suspicious.cold+0x37/0x8f x86_virt_invoke_kvm_emergency_callback+0x5f/0x70 x86_svm_emergency_disable_virtualization_cpu+0x2a/0x30 x86_virt_emergency_disable_virtualization_cpu+0x6b/0x90 native_machine_crash_shutdown+0x72/0x170 __crash_kexec+0x137/0x280 panic+0xce/0xd0 sysrq_handle_crash+0x1f/0x20 __handle_sysrq.cold+0x192/0x335 write_sysrq_trigger+0x8c/0xc0 proc_reg_write+0x1c3/0x3c0 vfs_write+0x1d0/0xf80 ksys_write+0x116/0x250 do_syscall_64+0x11c/0x1480 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> A truly correct fix is non-trivial: the RCU usage genuinely is wrong in panic context (RCU may ignore the crashing CPU during synchronization), and a concurrent KVM module unload could in principle race with the callback read; see commit 2baa33a8ddd6 ("KVM: x86: Leave user-return notifier registered on reboot/shutdown") which notes that nothing prevents module unload during panic/reboot. However, the alternatives are worse: - smp_store_release()/smp_load_acquire() handles ordering but not liveness; the kernel still needs to keep the module text alive while the callback is in flight. - Taking a lock in the panic path is risky — any lock could be held by a CPU that has already been NMI'd to a halt. Use rcu_dereference_raw() to silence the splat and accept the vanishingly small remaining race. 
Panic context inherently cannot guarantee complete correctness; the goal here is to keep debug builds quiet on the kdump path so the splat doesn't obscure the actual kernel state being captured. Reproducible on a debug kernel (CONFIG_PROVE_LOCKING=y, CONFIG_PROVE_RCU=y) with kvm_amd or kvm_intel loaded by triggering kdump: echo c > /proc/sysrq-trigger Suggested-by: Sean Christopherson <seanjc@google.com> Fixes: 428afac5a8ea ("KVM: x86: Move bulk of emergency virtualizaton logic to virt subsystem") Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://patch.msgid.link/20260504235435.90957-1-mikhail.v.gavrilov@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-05-13x86/mce: Restore MCA polling interval halvingBorislav Petkov (AMD)
RongQing reported that the MCA polling interval doesn't halve when an error gets logged. It was traced down to the commit in Fixes:, because: mce_timer_fn() |-> mce_poll_banks() |-> machine_check_poll() |-> mce_log() which will queue the work and return. Now, back in mce_timer_fn(): /* * Alert userspace if needed. If we logged an MCE, reduce the polling * interval, otherwise increase the polling interval. */ if (mce_notify_irq()) <--- here we haven't run the notifier chain yet, so mce_need_notify is not set yet, so this won't hit and we won't halve the interval iv. Now the notifier chain runs. mce_early_notifier() sets the bit, does mce_notify_irq(), that clears the bit and then the notifier chain a little later logs the error. So this is a silly timing issue. But, that's all unnecessary. All that needs to happen here is that the "should we notify of a logged MCE" question mce_notify_irq() asks should simply be a question to the mce gen pool: "Are you empty?" And that then turns into a simple yes or no answer and it all JustWorks(tm). So do that and also distribute the functionality where it belongs: - Print that MCE events have been logged in mce_log() - Trigger the mcelog tool specific work in the first notifier As a result, mce_notify_irq() can go now. Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector") Reported-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20260112082747.2842-1-lirongqing@baidu.com
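The adaptive polling described above (halve the interval when something was logged, otherwise back off) can be sketched as a pure function. This is an illustration under assumptions, not the mce code: the function name, the `pool_empty` flag standing in for the "Are you empty?" question, and the explicit `min_iv`/`max_iv` bounds are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: compute the next polling interval.
 * pool_empty == false means events were logged since the last poll,
 * so poll more often; otherwise relax towards max_iv.
 */
unsigned long sketch_next_poll_interval(unsigned long iv, bool pool_empty,
					unsigned long min_iv,
					unsigned long max_iv)
{
	assert(min_iv <= max_iv);

	if (!pool_empty) {
		unsigned long half = iv / 2;

		return half > min_iv ? half : min_iv;
	}

	/* Nothing logged: double the interval, saturating at max_iv. */
	if (iv > max_iv / 2)
		return max_iv;
	return iv * 2;
}
```

Because the decision depends only on whether the pool is empty at the time of the check, it does not matter whether the notifier chain has already run, which is the timing dependency the commit removes.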
2026-05-13KVM: VMX: introduce module parameter to disable CETPaolo Bonzini
There have been reports of host hangs caused by CET virtualization. Until these are analyzed further, introduce a module parameter that makes it possible to easily disable it. Link: https://lore.kernel.org/all/85548beb-1486-40f9-beb4-632c78e3360b@proxmox.com/ Cc: David Riley <d.riley@proxmox.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-12Merge tag 'kvm-s390-master-7.1-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: pci: fix array indexing For large numbers of PCI devices it's possible to overrun the arrays, as the index was miscalculated in two places.
2026-05-12KVM: x86: Swap the dst and src operand for MOVNTDQASean Christopherson
Swap the MOVNTDQA operands, as MOVNTDQA does NOT in fact have "the same characteristics as 0F E7 (MOVNTDQ)"; MOVNTDQA loads from memory and stores to registers, while MOVNTDQ loads from registers and stores to memory. Per the SDM: MOVNTDQ - Move packed integer values in xmm1 to m128 using non-temporal hint. MOVNTDQA - Move double quadword from m128 to xmm1 using non-temporal hint if WC memory type. Reported-by: Josh Eads <josheads@google.com> Fixes: c57d9bafbd0b ("KVM: x86: Add support for emulating MOVNTDQA") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260506213514.2781948-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
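To make the operand direction concrete, here is a toy model in C. It is not KVM's emulator: `struct sketch_cpu` and both helper names are invented for this example, and the non-temporal hint and alignment requirements of the real instructions are ignored.

```c
#include <stdint.h>
#include <string.h>

/* Toy register file: 16 XMM registers of 16 bytes each (hypothetical). */
struct sketch_cpu {
	uint8_t xmm[16][16];
};

/* MOVNTDQ m128, xmm1: the register is the source, memory the destination. */
void sketch_movntdq(uint8_t *m128, const struct sketch_cpu *cpu,
		    unsigned int reg)
{
	memcpy(m128, cpu->xmm[reg & 15], 16);
}

/* MOVNTDQA xmm1, m128: memory is the source, the register the destination. */
void sketch_movntdqa(struct sketch_cpu *cpu, unsigned int reg,
		     const uint8_t *m128)
{
	memcpy(cpu->xmm[reg & 15], m128, 16);
}
```

Treating MOVNTDQA like MOVNTDQ would write the register contents to memory instead of loading from it, which is the kind of mix-up the commit corrects.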
2026-05-12KVM: x86: use again the flush argument of __link_shadow_page()Paolo Bonzini
Except in the case of parentless nested-TDP pages, mmu_page_zap_pte() clears the SPTE but leaves the invalid_list empty. In this case, using kvm_flush_remote_tlbs() as kvm_mmu_remote_flush_or_zap() does is overkill. Avoid flushing the entirety of the remote TLBs unless the invalid_list was populated: instead, use a more efficient gfn-targeting flush (if available) and skip it altogether if the caller guarantees that a TLB flush is not necessary. Based-on: <20260503201029.106481-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20260503210917.121840-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
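The flush policy described above can be summarised as a small decision function. This is a sketch of the stated policy only, not the code in the MMU: the enum, the function name, and the three boolean inputs are hypothetical, and falling back to a full flush when no gfn-targeted flush is available is an assumption.

```c
#include <stdbool.h>

enum sketch_flush {
	SKETCH_FLUSH_NONE,      /* caller guarantees no flush is needed */
	SKETCH_FLUSH_GFN_RANGE, /* flush only the affected gfn range */
	SKETCH_FLUSH_ALL,       /* flush all remote TLBs */
};

/* Hypothetical helper: choose the cheapest flush that is still sufficient. */
enum sketch_flush sketch_pick_flush(bool invalid_list_populated,
				    bool flush_requested,
				    bool has_range_flush)
{
	/* Pages were queued for zapping: a full remote flush is warranted. */
	if (invalid_list_populated)
		return SKETCH_FLUSH_ALL;

	/* Only an SPTE was cleared and the caller says no flush is needed. */
	if (!flush_requested)
		return SKETCH_FLUSH_NONE;

	/* Assumption: without a targeted flush, fall back to a full one. */
	return has_range_flush ? SKETCH_FLUSH_GFN_RANGE : SKETCH_FLUSH_ALL;
}
```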
2026-05-12arm64: dts: qcom: x1-dell-thena: remove i2c20 (battery SMBus) and reserve ↵Val Packett
its pins i2c20 is used by the battmgr service on the ADSP to communicate with the SBS interface of the battery. Initializing it from Linux would break the battmgr functionality when booted in EL2. Mark those pins as reserved. Fixes: e7733b42111c ("arm64: dts: qcom: Add support for Dell Inspiron 7441 / Latitude 7455") Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Abel Vesa <abel.vesa@oss.qualcomm.com> Signed-off-by: Val Packett <val@packett.cool> Link: https://lore.kernel.org/r/20260312005731.12488-2-val@packett.cool Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-05-12Merge tag 'kvmarm-fixes-7.1-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.1, take #2 - Add the pKVM side of the workaround for ARM's erratum 4193714, provided that the EL3 firmware does its part of the job. KVM will refuse to initialise otherwise. - Correctly handle 52bit VAs for guest EL2 stage-1 translations when running under NV with E2H==0. - Correctly deal with permission faults in guest_memfd memslots. - Fix the steal-time selftest after the infrastructure was reworked. - Make sure the host cannot pass a non-sensical clock update to the EL2 tracing infrastructure. - Appoint Steffen Eiden as a reviewer in anticipation of the KVM/s390 ability to run arm64 guests, which will inevitably lead to arm64 code being directly used on s390. - Make sure that EL2 is configured with both exception entry and exit being Context Synchronization Events. - Handle the current vcpu being NULL on EL2 panic. - Fix the selftest_vcpu memcache being empty at the point of donation or sharing. - Check that the memcache has enough capacity before engaging on the share/donate path. - Fix __deactivate_fgt() to use its parameter rather than a variable in the macro context.