linux.git/include/linux/kvm_host.h, branch v7.1-rc2

Merge tag 'kvm-x86-svm-7.1' of https://github.com/kvm-x86/linux into HEAD

2026-04-13T17:00:43+00:00

KVM SVM changes for 7.1

 - Fix and optimize IRQ window inhibit handling for AVIC (the tracking needs to
   be per-vCPU, e.g. so that KVM doesn't prematurely re-enable AVIC if multiple
   vCPUs have to-be-injected IRQs).

 - Fix an undefined behavior warning where a crafty userspace can read the
   "avic" module param before it's fully initialized.

 - Fix a (likely benign) bug in the "OS-visible workarounds" handling, where
   KVM could clobber state when enabling virtualization on multiple CPUs in
   parallel, and clean up and optimize the code.

 - Drop a WARN in KVM_MEMORY_ENCRYPT_REG_REGION where KVM complains about a
   "too large" size based purely on user input, and clean up and harden the
   related pinning code.

 - Disallow synchronizing a VMSA of an already-launched/encrypted vCPU, as
   doing so for an SNP guest will trigger an RMP violation #PF and crash the
   host.

 - Protect all of sev_mem_enc_register_region() with kvm->lock to ensure
   sev_guest() is stable for the entirety of the function.

 - Lock all vCPUs when synchronizing VMSAs for SNP guests to ensure the VMSA
   page isn't actively being used.

 - Overhaul KVM's APIs for detecting SEV+ guests so that VM-scoped queries are
   required to hold kvm->lock (KVM has had multiple bugs due to "is SEV?" checks
   becoming stale), enforced by lockdep.  Add and use vCPU-scoped APIs when
   possible/appropriate, as all checks that originate from a vCPU are
   guaranteed to be stable.

 - Convert a pile of kvm->lock SEV code to guard().
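
As a rough userspace illustration of what converting lock/unlock pairs to
guard() buys, the sketch below uses the GCC/Clang cleanup attribute with a
pthread mutex. DEMO_GUARD, demo_vm and demo_register_region() are hypothetical
names invented for this example; they are not the kernel's guard()
implementation or any KVM function.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a VM with lock-protected state. */
struct demo_vm {
	pthread_mutex_t lock;
	int sev_active;
	int regions;
};

/* Cleanup handler: runs automatically when the guard variable leaves scope. */
void demo_unlock(pthread_mutex_t **lock)
{
	assert(*lock != NULL);
	pthread_mutex_unlock(*lock);
}

/* Illustrative macro, loosely modeled on the idea behind guard(mutex)(...). */
#define DEMO_GUARD(lockp) \
	pthread_mutex_t *demo_guard_ \
		__attribute__((cleanup(demo_unlock), unused)) = \
		(pthread_mutex_lock(lockp), (lockp))

/* Every return path drops the lock without an explicit unlock or goto. */
int demo_register_region(struct demo_vm *vm)
{
	DEMO_GUARD(&vm->lock);

	if (!vm->sev_active)
		return -1;	/* lock is released on this early return */

	vm->regions++;
	return 0;		/* and on the normal return */
}
```

Because the unlock is tied to scope exit, the "is SEV?" check and the work
that depends on it stay under the same lock on every path.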

Merge tag 'kvm-x86-vmxon-7.1' of https://github.com/kvm-x86/linux into HEAD

2026-04-13T11:04:48+00:00

KVM x86 VMXON and EFER.SVME extraction for 7.1

Move _only_ VMXON+VMXOFF and EFER.SVME toggling (versus all of VMX and SVM
enabling) out of KVM and into the core kernel so that non-KVM TDX
enabling, e.g. for trusted I/O, can make SEAMCALLs without needing to ensure
KVM is fully loaded.

TIO isn't a hypervisor, and isn't trying to be a hypervisor. Specifically, TIO
should _never_ have its own VMCSes (that are visible to the host; the
TDX-Module has its own VMCSes to do SEAMCALL/SEAMRET), and so there is simply
no reason to move that functionality out of KVM.

With that out of the way, dealing with VMXON/VMXOFF and EFER.SVME is a fairly
simple refcounting game.
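
The "refcounting game" can be pictured with a minimal userspace sketch: the
first user enables the feature and the last user disables it. All names here
(demo_virt_get(), demo_virt_put(), demo_hw_enabled) are hypothetical, and the
flag merely stands in for VMXON/VMXOFF or EFER.SVME; this is not the actual
core-kernel code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical state; demo_hw_enabled stands in for "VMXON done / SVME set". */
pthread_mutex_t demo_virt_lock = PTHREAD_MUTEX_INITIALIZER;
int demo_virt_users;
int demo_hw_enabled;

/* The first user turns the feature on; later users only bump the count. */
void demo_virt_get(void)
{
	pthread_mutex_lock(&demo_virt_lock);
	if (demo_virt_users++ == 0)
		demo_hw_enabled = 1;	/* would be VMXON or setting EFER.SVME */
	pthread_mutex_unlock(&demo_virt_lock);
}

/* The last user turns the feature off. */
void demo_virt_put(void)
{
	pthread_mutex_lock(&demo_virt_lock);
	assert(demo_virt_users > 0);
	if (--demo_virt_users == 0)
		demo_hw_enabled = 0;	/* would be VMXOFF or clearing EFER.SVME */
	pthread_mutex_unlock(&demo_virt_lock);
}
```

With this shape, KVM and a non-KVM user can each take and drop a reference
independently, and neither needs to know whether the other is loaded.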

Merge tag 'kvm-x86-mmio-7.1' of https://github.com/kvm-x86/linux into HEAD

2026-04-13T10:49:14+00:00

KVM x86 emulated MMIO changes for 7.1

Copy single-chunk MMIO write values into a persistent (per-fragment) field to
fix use-after-free stack bugs due to KVM dereferencing a stack pointer after an
exit to userspace.

Clean up and comment the emulated MMIO code to try to make it easier to
maintain (not necessarily "easy", but "easier").

KVM: SEV: Disallow LAUNCH_FINISH if vCPUs are actively being created

2026-04-03T16:37:36+00:00

Reject LAUNCH_FINISH for SEV-ES and SNP VMs if KVM is actively creating
one or more vCPUs, as KVM needs to process and encrypt each vCPU's VMSA.
Letting userspace create vCPUs while LAUNCH_FINISH is in-progress is
"fine", at least in the current code base, as kvm_for_each_vcpu() operates
on online_vcpus, LAUNCH_FINISH (all SEV+ sub-ioctls) holds kvm->mutex, and
fully onlining a vCPU in kvm_vm_ioctl_create_vcpu() is done under
kvm->mutex.  I.e. there's no difference between an in-progress vCPU and a
vCPU that is created entirely after LAUNCH_FINISH.

However, given that concurrent LAUNCH_FINISH and vCPU creation can't
possibly work (for any reasonable definition of "work"), since userspace
can't guarantee whether a particular vCPU will be encrypted or not,
disallow the combination as a hardening measure, to reduce the probability
of introducing bugs in the future, and to avoid having to reason about the
safety of future changes related to LAUNCH_FINISH.
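
A minimal sketch of the kind of check described above, assuming a
"created but not yet online" vCPU can be detected by comparing two counters.
The struct, the created_vcpus counter as used here, the function name and the
-EBUSY return value are assumptions made for illustration, not the code in
this patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified view of the two vCPU counters. */
struct demo_kvm {
	int created_vcpus;	/* bumped when vCPU creation starts */
	int online_vcpus;	/* bumped once a vCPU is fully onlined */
};

/* Reject the operation while any vCPU is created but not yet online. */
int demo_launch_finish(const struct demo_kvm *kvm)
{
	assert(kvm->online_vcpus <= kvm->created_vcpus);

	if (kvm->created_vcpus != kvm->online_vcpus)
		return -EBUSY;	/* error code chosen for illustration only */

	/* ... would process and encrypt each online vCPU's VMSA here ... */
	return 0;
}
```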

Cc: Jethro Beekman 
Closes: https://lore.kernel.org/all/b31f7c6e-2807-4662-bcdd-eea2c1e132fa@fortanix.com
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260310234829.2608037-5-seanjc@google.com
Signed-off-by: Sean Christopherson

KVM: arm64: gic-v5: Introduce kvm_arm_vgic_v5_ops and register them

2026-03-19T18:21:29+00:00

Only the KVM_DEV_ARM_VGIC_GRP_CTRL->KVM_DEV_ARM_VGIC_CTRL_INIT op is
currently supported. All other ops are stubbed out.

Co-authored-by: Timothy Hayes 
Signed-off-by: Timothy Hayes 
Signed-off-by: Sascha Bischoff 
Reviewed-by: Jonathan Cameron 
Link: https://patch.msgid.link/20260319154937.3619520-36-sascha.bischoff@arm.com
Signed-off-by: Marc Zyngier

Merge tag 'kvm-x86-generic-7.0-rc3' of https://github.com/kvm-x86/linux into HEAD

2026-03-11T17:01:55+00:00

KVM generic changes for 7.0

 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
   unnecessary and confusing, triggered compiler warnings due to
   -Wflex-array-member-not-at-end.

 - Document that vcpu->mutex is taken outside of kvm->slots_lock and
   kvm->slots_arch_lock, which is intentional and desirable despite being
   rather unintuitive.

KVM: Bury kvm_{en,dis}able_virtualization() in kvm_main.c once more

2026-03-04T16:53:10+00:00

Now that TDX handles doing VMXON without KVM's involvement, bury the
top-level APIs to enable and disable virtualization back in kvm_main.c.

No functional change intended.

Reviewed-by: Dan Williams 
Reviewed-by: Chao Gao 
Tested-by: Chao Gao 
Tested-by: Sagi Shahar 
Link: https://patch.msgid.link/20260214012702.2368778-16-seanjc@google.com
Signed-off-by: Sean Christopherson

KVM: x86: Move kvm_rebooting to x86

2026-03-04T16:52:19+00:00

Move kvm_rebooting, which is only read by x86, to KVM x86 so that it can
be moved again to core x86 code.  Add a "shutdown" arch hook to facilitate
setting the flag in KVM x86, along with a pile of comments to provide more
context around what KVM x86 is doing and why.

Reviewed-by: Chao Gao 
Acked-by: Dave Hansen 
Tested-by: Chao Gao 
Reviewed-by: Dan Williams 
Tested-by: Sagi Shahar 
Link: https://patch.msgid.link/20260214012702.2368778-2-seanjc@google.com
Signed-off-by: Sean Christopherson

KVM: x86: Use scratch field in MMIO fragment to hold small write values

2026-03-03T00:02:52+00:00

When exiting to userspace to service an emulated MMIO write, copy the
to-be-written value to a scratch field in the MMIO fragment if the size
of the data payload is 8 bytes or less, i.e. can fit in a single chunk,
instead of pointing the fragment directly at the source value.
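
The idea can be sketched in plain C as follows. The struct layout, field
names and demo_setup_write() are hypothetical and only mirror the description
above (copy payloads of 8 bytes or less into per-fragment storage, otherwise
keep pointing at the source); they are not KVM's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fragment layout; field names are illustrative. */
struct demo_mmio_fragment {
	uint64_t gpa;
	const void *data;	/* where the payload lives */
	uint64_t val;		/* per-fragment scratch storage */
	unsigned int len;
};

void demo_setup_write(struct demo_mmio_fragment *frag, uint64_t gpa,
		      const void *src, unsigned int len)
{
	assert(src != NULL && len > 0);

	frag->gpa = gpa;
	frag->len = len;

	if (len <= sizeof(frag->val)) {
		/* Small write: snapshot the value so src may go out of scope. */
		frag->val = 0;
		memcpy(&frag->val, src, len);
		frag->data = &frag->val;
	} else {
		/* Large write: the caller must keep src alive for the fragment. */
		frag->data = src;
	}
}
```

After a small write is set up this way, the fragment no longer depends on the
lifetime of the caller's local variable.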

This fixes a class of use-after-free bugs that occur when the emulator
initiates a write using an on-stack, local variable as the source, the
write splits a page boundary, *and* both pages are MMIO pages.  Because
KVM's ABI only allows for physically contiguous MMIO requests, accesses
that split MMIO pages are separated into two fragments, and are sent to
userspace one at a time.  When KVM attempts to complete userspace MMIO in
response to KVM_RUN after the first fragment, KVM will detect the second
fragment and generate a second userspace exit, and reference the on-stack
variable.
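
To make the splitting concrete, here is a small sketch of how an access that
crosses a page boundary breaks into two physically contiguous chunks. The
4096-byte page size, struct demo_chunk and demo_split_access() are
assumptions for illustration and do not correspond to KVM's internal helpers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for the example */

struct demo_chunk {
	uint64_t gpa;
	unsigned int len;
};

/*
 * Split [gpa, gpa + len) at the page boundary. Returns the number of chunks
 * written to out[], which is 1 or 2 for accesses no larger than one page.
 */
int demo_split_access(uint64_t gpa, unsigned int len, struct demo_chunk out[2])
{
	/* Bytes left in the page that contains gpa. */
	unsigned int room = DEMO_PAGE_SIZE -
			    (unsigned int)(gpa & (DEMO_PAGE_SIZE - 1));

	assert(len > 0 && len <= DEMO_PAGE_SIZE);

	if (len <= room) {
		out[0] = (struct demo_chunk){ gpa, len };
		return 1;
	}

	out[0] = (struct demo_chunk){ gpa, room };
	out[1] = (struct demo_chunk){ gpa + room, len - room };
	return 2;
}
```

For example, a 4-byte write at 0xffe yields two 2-byte chunks, at 0xffe and
0x1000, and each chunk would be handed to userspace in a separate exit.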

The issue is most visible if the second KVM_RUN is performed by a separate
task, in which case the stack of the initiating task can show up as truly
freed data.

  ==================================================================
  BUG: KASAN: use-after-free in complete_emulated_mmio+0x305/0x420
  Read of size 1 at addr ffff888009c378d1 by task syz-executor417/984

  CPU: 1 PID: 984 Comm: syz-executor417 Not tainted 5.10.0-182.0.0.95.h2627.eulerosv2r13.x86_64 #3
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
  Call Trace:
  dump_stack+0xbe/0xfd
  print_address_description.constprop.0+0x19/0x170
  __kasan_report.cold+0x6c/0x84
  kasan_report+0x3a/0x50
  check_memory_region+0xfd/0x1f0
  memcpy+0x20/0x60
  complete_emulated_mmio+0x305/0x420
  kvm_arch_vcpu_ioctl_run+0x63f/0x6d0
  kvm_vcpu_ioctl+0x413/0xb20
  __se_sys_ioctl+0x111/0x160
  do_syscall_64+0x30/0x40
  entry_SYSCALL_64_after_hwframe+0x67/0xd1
  RIP: 0033:0x42477d
  Code: <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007faa8e6890e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00000000004d7338 RCX: 000000000042477d
  RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
  RBP: 00000000004d7330 R08: 00007fff28d546df R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d733c
  R13: 0000000000000000 R14: 000000000040a200 R15: 00007fff28d54720

  The buggy address belongs to the page:
  page:0000000029f6a428 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9c37
  flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
  raw: 000fffffc0000000 0000000000000000 ffffea0000270dc8 0000000000000000
  raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
  page dumped because: kasan: bad access detected

  Memory state around the buggy address:
  ffff888009c37780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff888009c37800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  >ffff888009c37880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                   ^
  ffff888009c37900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff888009c37980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ==================================================================

The bug can also be reproduced with a targeted KVM-Unit-Test by hacking
KVM to fill a large on-stack variable in complete_emulated_mmio(), i.e. by
overwriting the data value with garbage.

Limit the use of the scratch fields to 8-byte or smaller accesses, and to
just writes, as larger accesses and reads are not affected thanks to
implementation details in the emulator, but add a sanity check to ensure
those details don't change in the future.  Specifically, KVM never uses
on-stack variables for accesses larger than 8 bytes, e.g. uses an operand
in the emulator context, and *all* reads are buffered through the mem_read
cache.

Note!  Using the scratch field for reads is not only unnecessary, it's
also extremely difficult to handle correctly.  As above, KVM buffers all
reads through the mem_read cache, and heavily relies on that behavior when
re-emulating the instruction after a userspace MMIO read exit.  If a read
splits a page, the first page is NOT an MMIO page, and the second page IS
an MMIO page, then the MMIO fragment needs to point at _just_ the second
chunk of the destination, i.e. its position in the mem_read cache.  Taking
the "obvious" approach of copying the fragment value into the destination
when re-emulating the instruction would clobber the first chunk of the
destination, i.e. would clobber the data that was read from guest memory.

Fixes: f78146b0f923 ("KVM: Fix page-crossing MMIO")
Suggested-by: Yashu Zhang 
Reported-by: Yashu Zhang 
Closes: https://lore.kernel.org/all/369eaaa2b3c1425c85e8477066391bc7@huawei.com
Cc: stable@vger.kernel.org
Tested-by: Tom Lendacky 
Tested-by: Rick Edgecombe 
Link: https://patch.msgid.link/20260225012049.920665-2-seanjc@google.com
Signed-off-by: Sean Christopherson

KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER

2026-02-28T14:31:35+00:00

All architectures now use MMU notifier for KVM page table management.
Remove the Kconfig symbol and the code that is used when it is
disabled.

Signed-off-by: Paolo Bonzini