<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/x86, branch v6.8</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm</title>
<updated>2024-03-10T16:27:39+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-03-10T16:27:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=137e0ec05aeb657472d2f84dbb3081016160334b'/>
<id>137e0ec05aeb657472d2f84dbb3081016160334b</id>
<content type='text'>
Pull kvm fixes from Paolo Bonzini:
 "KVM GUEST_MEMFD fixes for 6.8:

   - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
     to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not
     writable from userspace, so there would be no way to write to a
     read-only guest_memfd).

   - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
     clear that such VMs are purely for development and testing.

   - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long-term
     plan is to support confidential VMs with deterministic private
     memory (SNP and TDX) only in the TDP MMU.

   - Fix a bug in a GUEST_MEMFD dirty logging test that caused false
     passes.

  x86 fixes:

   - Fix missing marking of a guest page as dirty when emulating an
     atomic access.

   - Check for mmu_notifier invalidation events before faulting in the
     pfn, and before acquiring mmu_lock, to avoid unnecessary work and
     lock contention on preemptible kernels (including
     CONFIG_PREEMPT_DYNAMIC in non-preemptible mode).

   - Disable AMD DebugSwap by default; it breaks VMSA signing and will
     be re-enabled with a better VM creation API in 6.10.

   - Do the cache flush of converted pages in svm_register_enc_region()
     before dropping kvm-&gt;lock, to avoid a race with unregistering of
     the same region and the consequent use-after-free issue"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  SEV: disable SEV-ES DebugSwap by default
  KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing
  KVM: SVM: Flush pages under kvm-&gt;lock to fix UAF in svm_register_enc_region()
  KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive
  KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases
  KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU
  KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP
  KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY
  KVM: x86: Mark target gfn of emulated atomic instruction as dirty
</content>
</entry>
<entry>
<title>SEV: disable SEV-ES DebugSwap by default</title>
<updated>2024-03-09T16:42:25+00:00</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2024-03-09T16:24:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5abf6dceb066f2b02b225fd561440c98a8062681'/>
<id>5abf6dceb066f2b02b225fd561440c98a8062681</id>
<content type='text'>
The DebugSwap feature of SEV-ES provides a way for confidential guests to use
data breakpoints.  However, because the status of the DebugSwap feature is
recorded in the VMSA, enabling it by default invalidates the attestation
signatures.  In 6.10 we will introduce a new API to create SEV VMs that
will allow enabling DebugSwap based on what the user tells KVM to do.
At the same time, we will change the legacy KVM_SEV_ES_INIT API to never
enable DebugSwap.

For compatibility with kernels that pre-date the introduction of DebugSwap,
as well as with those where KVM_SEV_ES_INIT will never enable it, do not enable
the feature by default.  For now, anyone who wants to use it can set the
sev_es_debug_swap_enabled module parameter, though doing so will trigger a
warning.
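
As a hedged sketch (names taken from this message; the exact kvm-amd
code will differ), the opt-in amounts to an off-by-default module
parameter plus a warning when it is set:

    #include &lt;linux/module.h&gt;

    /* Off by default: enabling DebugSwap changes the VMSA contents and
     * therefore invalidates attestation signatures. */
    static bool sev_es_debug_swap_enabled;
    module_param(sev_es_debug_swap_enabled, bool, 0444);

    static void sev_es_init_features(void)  /* illustrative hook */
    {
            if (sev_es_debug_swap_enabled)
                    pr_warn_once("DebugSwap enabled; attestation may fail\n");
    }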

Fixes: d1f85fbe836e ("KVM: SEV: Enable data breakpoints in SEV-ES")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of https://github.com/kvm-x86/linux into HEAD</title>
<updated>2024-03-09T16:42:17+00:00</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2024-03-09T16:20:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=39fee313fdd489480eb2c453535ab73672c39a27'/>
<id>39fee313fdd489480eb2c453535ab73672c39a27</id>
<content type='text'>
KVM GUEST_MEMFD fixes for 6.8:

 - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to
   avoid creating ABI that KVM can't sanely support (a sketch of the
   check follows this list).

 - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly
   clear that such VMs are purely a development and testing vehicle, and
   come with zero guarantees.

 - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long-term plan
   is to support confidential VMs with deterministic private memory (SNP
   and TDX) only in the TDP MMU.

 - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes
   when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
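
A minimal sketch of the exclusivity check from the first item above
(function name is hypothetical; the real check lives in KVM's memslot
flag validation):

    static bool kvm_memslot_flags_valid(u32 flags)  /* hypothetical */
    {
            /* A guest_memfd slot is not writable from userspace, so a
             * read-only guest_memfd slot could never be populated. */
            if ((flags &amp; KVM_MEM_GUEST_MEMFD) &amp;&amp; (flags &amp; KVM_MEM_READONLY))
                    return false;
            return true;
    }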
</content>
</entry>
<entry>
<title>Merge tag 'kvm-x86-fixes-6.8-2' of https://github.com/kvm-x86/linux into HEAD</title>
<updated>2024-03-09T16:42:06+00:00</updated>
<author>
<name>Paolo Bonzini</name>
<email>pbonzini@redhat.com</email>
</author>
<published>2024-03-09T16:18:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1b6c146df555bcfb9ad55dbb745ee364a9a0159f'/>
<id>1b6c146df555bcfb9ad55dbb745ee364a9a0159f</id>
<content type='text'>
KVM x86 fixes for 6.8, round 2:

 - When emulating an atomic access, mark the gfn as dirty in the memslot
   to fix a bug where KVM could fail to mark the slot as dirty during live
   migration, ultimately resulting in guest data corruption due to a dirty
   page not being re-copied from the source to the target.

 - Check for mmu_notifier invalidation events before faulting in the pfn,
   and before acquiring mmu_lock, to avoid unnecessary work and lock
   contention.  Contending mmu_lock is especially problematic on preemptible
   kernels, as KVM may yield mmu_lock in response to the contention; the
   yielding vCPUs make it difficult for the task that triggered the
   invalidation to make forward progress, which severely degrades overall
   performance.  (An illustrative sketch of the early check follows the
   link below.)

   Note that, due to another kernel bug, this fix isn't limited to
   preemptible kernels, as any kernel built with CONFIG_PREEMPT_DYNAMIC=y
   will yield contended rwlocks and spinlocks.

   https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
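
An illustrative sketch of the early check (the helper name is an
assumption, not the real KVM function):

    /* Snapshot the invalidation sequence count, then bail before the
     * expensive pfn fault and before taking mmu_lock if an invalidation
     * covering this gfn is in flight. */
    fault-&gt;mmu_seq = kvm-&gt;mmu_invalidate_seq;
    smp_rmb();
    if (mmu_invalidate_in_progress(kvm, fault-&gt;gfn))  /* assumed helper */
            return RET_PF_RETRY;  /* retry instead of contending mmu_lock */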
</content>
</entry>
<entry>
<title>Merge tag 'hyperv-fixes-signed-20240303' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux</title>
<updated>2024-03-05T20:38:50+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-03-05T20:38:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1c46d04a0dae3fcb8d6505de0091499b626cdb35'/>
<id>1c46d04a0dae3fcb8d6505de0091499b626cdb35</id>
<content type='text'>
Pull hyperv fixes from Wei Liu:

 - Multiple fixes, cleanups, and documentation updates for Hyper-V core
   code and drivers

* tag 'hyperv-fixes-signed-20240303' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
  Drivers: hv: vmbus: make hv_bus const
  x86/hyperv: Allow 15-bit APIC IDs for VTL platforms
  x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()
  x86/mm: Regularize set_memory_p() parameters and make non-static
  x86/hyperv: Use slow_virt_to_phys() in page transition hypervisor callback
  Documentation: hyperv: Add overview of PCI pass-thru device support
  Drivers: hv: vmbus: Update indentation in create_gpadl_header()
  Drivers: hv: vmbus: Remove duplication and cleanup code in create_gpadl_header()
  fbdev/hyperv_fb: Fix logic error for Gen2 VMs in hvfb_getmem()
  Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory
  hv_utils: Allow implicit ICTIMESYNCFLAG_SYNC
</content>
</entry>
<entry>
<title>Merge tag 'x86_urgent_for_v6.8_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2024-03-03T17:43:03+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-03-03T17:43:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=73d35f83354c2b54a0bf52fef94b40f483680b91'/>
<id>73d35f83354c2b54a0bf52fef94b40f483680b91</id>
<content type='text'>
Pull x86 fixes from Borislav Petkov:

 - Do not reserve SETUP_RNG_SEED setup data in the e820 map as it should
   be used by kexec only

 - Detect the MKTME feature earlier in the boot process so that the
   physical address size supported by the CPU is corrected and the MTRR
   masks are programmed properly, letting TDX systems boot without
   disable_mtrr_cleanup on the cmdline

 - Make sure the different address sizes supported by the CPU are read
   out as early as possible

* tag 'x86_urgent_for_v6.8_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/e820: Don't reserve SETUP_RNG_SEED in e820
  x86/cpu/intel: Detect TME keyid bits before setting MTRR mask registers
  x86/cpu: Allow reducing x86_phys_bits during early_identify_cpu()
</content>
</entry>
<entry>
<title>x86/e820: Don't reserve SETUP_RNG_SEED in e820</title>
<updated>2024-03-01T18:27:20+00:00</updated>
<author>
<name>Jiri Bohac</name>
<email>jbohac@suse.cz</email>
</author>
<published>2024-01-31T00:04:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7fd817c906503b6813ea3b41f5fdf4192449a707'/>
<id>7fd817c906503b6813ea3b41f5fdf4192449a707</id>
<content type='text'>
SETUP_RNG_SEED in setup_data is supplied by kexec and should
not be reserved in the e820 map.

Doing so reserves 16 bytes of RAM when booting with kexec.
(16 bytes because data-&gt;len is zeroed by parse_setup_data so only
sizeof(setup_data) is reserved.)

When kexec is used repeatedly, each boot adds two entries to the
kexec-provided e820 map as the 16-byte range splits a larger
range of usable memory. Eventually all of the 128 available entries
get used up. The next split will result in losing usable memory
as the new entries cannot be added to the e820 map.
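
A hedged sketch of the fix's shape in the reservation loop (simplified;
not the exact diff):

    static void __init e820__reserve_setup_data(void)
    {
            struct setup_data *data;
            u64 pa = boot_params.hdr.setup_data;

            while (pa) {
                    data = early_memremap(pa, sizeof(*data));
                    /* SETUP_RNG_SEED is consumed and re-created by kexec;
                     * reserving it splits a usable range on every boot. */
                    if (data-&gt;type != SETUP_RNG_SEED)
                            e820__range_update(pa, sizeof(*data) + data-&gt;len,
                                               E820_TYPE_RAM,
                                               E820_TYPE_RESERVED_KERN);
                    pa = data-&gt;next;
                    early_memunmap(data, sizeof(*data));
            }
    }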

Fixes: 68b8e9713c8e ("x86/setup: Use rng seeds from setup_data")
Signed-off-by: Jiri Bohac &lt;jbohac@suse.cz&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Cc: &lt;stable@kernel.org&gt;
Link: https://lore.kernel.org/r/ZbmOjKnARGiaYBd5@dwarf.suse.cz
</content>
</entry>
<entry>
<title>x86/hyperv: Allow 15-bit APIC IDs for VTL platforms</title>
<updated>2024-03-01T08:36:22+00:00</updated>
<author>
<name>Saurabh Sengar</name>
<email>ssengar@linux.microsoft.com</email>
</author>
<published>2024-01-15T17:57:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0d63e4c0ebc2b5c329babde44fd61d3f08db814d'/>
<id>0d63e4c0ebc2b5c329babde44fd61d3f08db814d</id>
<content type='text'>
The current method for signaling the compatibility of a Hyper-V host
with MSIs featuring 15-bit APIC IDs relies on a synthetic cpuid leaf.
However, for higher VTLs, this leaf is not reported, due to the absence
of an IO-APIC.

As an alternative, assume that when running at a high VTL, the host
supports 15-bit APIC IDs. This assumption is safe, as Hyper-V does not
employ any architectural MSIs at higher VTLs.

This unblocks startup of VTL2 environments with more than 256 CPUs.
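
A sketch of the described fallback (all names here are illustrative,
not the actual Hyper-V code):

    static bool hv_msi_ext_dest_id(void)  /* illustrative */
    {
            /* Higher VTLs have no IO-APIC, so the synthetic cpuid leaf
             * is absent; assume 15-bit APIC ID support there instead. */
            if (hv_running_at_high_vtl())           /* assumed helper */
                    return true;
            return hv_cpuid_reports_ext_dest_id();  /* assumed helper */
    }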

Signed-off-by: Saurabh Sengar &lt;ssengar@linux.microsoft.com&gt;
Reviewed-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Link: https://lore.kernel.org/r/1705341460-18394-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;1705341460-18394-1-git-send-email-ssengar@linux.microsoft.com&gt;
</content>
</entry>
<entry>
<title>x86/hyperv: Make encrypted/decrypted changes safe for load_unaligned_zeropad()</title>
<updated>2024-03-01T08:31:42+00:00</updated>
<author>
<name>Michael Kelley</name>
<email>mhklinux@outlook.com</email>
</author>
<published>2024-01-16T02:20:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0f34d11234868dc979730a905717c15067a7d205'/>
<id>0f34d11234868dc979730a905717c15067a7d205</id>
<content type='text'>
In a CoCo VM, when transitioning memory from encrypted to decrypted, or
vice versa, the caller of set_memory_encrypted() or set_memory_decrypted()
is responsible for ensuring the memory isn't in use and isn't referenced
while the transition is in progress.  The transition has multiple steps,
and the memory is in an inconsistent state until all steps are complete.
A reference while the state is inconsistent could result in an exception
that can't be cleanly fixed up.

However, the kernel load_unaligned_zeropad() mechanism could cause a stray
reference that can't be prevented by the caller of set_memory_encrypted()
or set_memory_decrypted(), so there's specific code to handle this case.
But a CoCo VM running on Hyper-V may be configured to run with a paravisor,
with the #VC or #VE exception routed to the paravisor. There's no
architectural way to forward the exceptions back to the guest kernel, and
in such a case, the load_unaligned_zeropad() specific code doesn't work.

To avoid this problem, mark pages as "not present" while a transition
is in progress. If load_unaligned_zeropad() causes a stray reference, a
normal page fault is generated instead of #VC or #VE, and the
page-fault-based fixup handlers for load_unaligned_zeropad() resolve the
reference. When the encrypted/decrypted transition is complete, mark the
pages as "present" again.

Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Reviewed-by: Kuppuswamy Sathyanarayanan &lt;sathyanarayanan.kuppuswamy@linux.intel.com&gt;
Link: https://lore.kernel.org/r/20240116022008.1023398-4-mhklinux@outlook.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20240116022008.1023398-4-mhklinux@outlook.com&gt;
</content>
</entry>
<entry>
<title>x86/mm: Regularize set_memory_p() parameters and make non-static</title>
<updated>2024-03-01T08:31:41+00:00</updated>
<author>
<name>Michael Kelley</name>
<email>mhklinux@outlook.com</email>
</author>
<published>2024-01-16T02:20:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=030ad7af94371f1faeecfc12dda296d8b5a17ef8'/>
<id>030ad7af94371f1faeecfc12dda296d8b5a17ef8</id>
<content type='text'>
set_memory_p() is currently static.  It has parameters that don't
match set_memory_p() under arch/powerpc and that aren't congruent
with the other set_memory_* functions. There's no good reason for
the difference.

Fix this by making the parameters consistent, and update the one
existing call site.  Make the function non-static and add it to
include/asm/set_memory.h so that it is completely parallel to
set_memory_np() and is usable in other modules.

No functional change.
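
For reference, a sketch of the regularized prototype, parallel to its
sibling (header path as given above):

    /* include/asm/set_memory.h */
    int set_memory_p(unsigned long addr, int numpages);
    int set_memory_np(unsigned long addr, int numpages);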

Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Reviewed-by: Rick Edgecombe &lt;rick.p.edgecombe@intel.com&gt;
Acked-by: Kirill A. Shutemov &lt;kirill.shutemov@linux.intel.com&gt;
Link: https://lore.kernel.org/r/20240116022008.1023398-3-mhklinux@outlook.com
Signed-off-by: Wei Liu &lt;wei.liu@kernel.org&gt;
Message-ID: &lt;20240116022008.1023398-3-mhklinux@outlook.com&gt;
</content>
</entry>
</feed>
