linux.git/arch/x86/platform/efi, branch master

x86/efi: Restore IRQ state in EFI page fault handler

2026-05-05T07:31:28+00:00

The kernel's softirq API does not permit re-enabling softirqs while IRQs
are disabled. The reason for this is that local_bh_enable() will not
only re-enable delivery of softirqs over the back of IRQs, it will also
handle any pending softirqs immediately, regardless of whether IRQs are
enabled at that point.

For this reason, commit

  d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs")

disables softirqs only when IRQs are enabled, as it is not permitted
otherwise, but also unnecessary, given that asynchronous softirq
delivery never happens to begin with while IRQs are disabled.

However, this does mean that entering a kernel mode FPU section with
IRQs enabled and leaving it with IRQs disabled leads to problems, as
identified by Sashiko [0]: the EFI page fault handler is called from
page_fault_oops() with IRQs disabled, and thus ends the kernel mode FPU
section with IRQs disabled as well, regardless of whether IRQs were
enabled when it was started. This may result in schedule() being called
with a non-zero preempt_count, causing a BUG().

So take care to re-enable IRQs when handling any EFI page faults if they
were taken with IRQs enabled.

[0] https://sashiko.dev/#/patchset/20260430074107.27051-1-ivan.hu%40canonical.com

Cc: Eric Biggers 
Cc: Ivan Hu 
Cc: x86@kernel.org
Cc: 
Fixes: d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs")
Reviewed-by: Eric Biggers 
Signed-off-by: Ard Biesheuvel

x86/efi: Fix graceful fault handling after FPU softirq changes

2026-05-04T10:41:51+00:00

Since commit d02198550423 ("x86/fpu: Improve crypto performance by
making kernel-mode FPU reliably usable in softirqs"), kernel_fpu_begin()
calls fpregs_lock() which uses local_bh_disable() instead of the
previous preempt_disable(). This sets SOFTIRQ_OFFSET in preempt_count
during the entire EFI runtime service call, causing in_interrupt() to
return true in normal task context.

The graceful page fault handler efi_crash_gracefully_on_page_fault()
uses in_interrupt() to bail out for faults in real interrupt context.
With SOFTIRQ_OFFSET now set, the handler always bails out, leaving EFI
firmware page faults unhandled. This escalates to die() which also sees
in_interrupt() as true and calls panic("Fatal exception in interrupt"),
resulting in a hard system freeze. On systems with buggy firmware that
triggers page faults during EFI runtime calls (e.g., accessing unmapped
memory in GetTime()), this causes an unrecoverable hang instead of the
expected graceful EFI_ABORTED recovery.

Fix by replacing in_interrupt() with !in_task(). This preserves the
original intent of bailing for interrupts or NMI faults, while no longer
falsely triggering from the FPU code path's local_bh_disable().

Fixes: d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs")
Cc: 
Signed-off-by: Ivan Hu 
[ardb: Sashiko spotted that using 'in_hardirq() || in_nmi()' leaves a
       window where a softirq may be taken before fpregs_lock() is
       called, but after efi_rts_work.efi_rts_id has been assigned,
       and any page faults occurring in that window will then be
       misidentified as having been caused by the firmware. Instead,
       use !in_task(), which incorporates in_serving_softirq(). ]
Signed-off-by: Ard Biesheuvel

Merge tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

2026-04-18T18:29:14+00:00

Pull memblock updates from Mike Rapoport:

 - improve debuggability of reserve_mem kernel parameter handling with
   print outs in case of a failure and debugfs info showing what was
   actually reserved

 - Make memblock_free_late() and free_reserved_area() use the same core
   logic for freeing the memory to buddy and ensure it takes care of
   updating memblock arrays when ARCH_KEEP_MEMBLOCK is enabled.

* tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  x86/alternative: delay freeing of smp_locks section
  memblock: warn when freeing reserved memory before memory map is initialized
  memblock, treewide: make memblock_free() handle late freeing
  memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y
  memblock: extract page freeing from free_reserved_area() into a helper
  memblock: make free_reserved_area() more robust
  mm: move free_reserved_area() to mm/memblock.c
  powerpc: opal-core: pair alloc_pages_exact() with free_pages_exact()
  powerpc: fadump: pair alloc_pages_exact() with free_pages_exact()
  memblock: reserve_mem: fix end caclulation in reserve_mem_release_by_name()
  memblock: move reserve_bootmem_range() to memblock.c and make it static
  memblock: Add reserve_mem debugfs info
  memblock: Print out errors on reserve_mem parser

Merge tag 'integrity-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity

2026-04-17T22:42:01+00:00

Pull integrity updates from Mimi Zohar:
 "There are two main changes, one feature removal, some code cleanup,
  and a number of bug fixes.

  Main changes:
   - Detecting secure boot mode was limited to IMA. Make detecting
     secure boot mode accessible to EVM and other LSMs
   - IMA sigv3 support was limited to fsverity. Add IMA sigv3 support
     for IMA regular file hashes and EVM portable signatures

  Remove:
   - Remove IMA support for asychronous hash calculation originally
     added for hardware acceleration

  Cleanup:
   - Remove unnecessary Kconfig CONFIG_MODULE_SIG and CONFIG_KEXEC_SIG
     tests
   - Add descriptions of the IMA atomic flags

  Bug fixes:
   - Like IMA, properly limit EVM "fix" mode
   - Define and call evm_fix_hmac() to update security.evm
   - Fallback to using i_version to detect file change for filesystems
     that do not support STATX_CHANGE_COOKIE
   - Address missing kernel support for configured (new) TPM hash
     algorithms
   - Add missing crypto_shash_final() return value"

* tag 'integrity-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
  evm: Enforce signatures version 3 with new EVM policy 'bit 3'
  integrity: Allow sigv3 verification on EVM_XATTR_PORTABLE_DIGSIG
  ima: add support to require IMA sigv3 signatures
  ima: add regular file data hash signature version 3 support
  ima: Define asymmetric_verify_v3() to verify IMA sigv3 signatures
  ima: remove buggy support for asynchronous hashes
  integrity: Eliminate weak definition of arch_get_secureboot()
  ima: Add code comments to explain IMA iint cache atomic_flags
  ima_fs: Correctly create securityfs files for unsupported hash algos
  ima: check return value of crypto_shash_final() in boot aggregate
  ima: Define and use a digest_size field in the ima_algo_desc structure
  powerpc/ima: Drop unnecessary check for CONFIG_MODULE_SIG
  ima: efi: Drop unnecessary check for CONFIG_MODULE_SIG/CONFIG_KEXEC_SIG
  ima: fallback to using i_version to detect file change
  evm: fix security.evm for a file with IMA signature
  s390: Drop unnecessary CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT
  evm: Don't enable fix mode when secure boot is enabled
  integrity: Make arch_ima_get_secureboot integrity-wide

Merge tag 'x86_cpu_for_7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2026-04-14T21:24:45+00:00

Pull x86 cpu updates from Dave Hansen:

 - Complete LASS enabling: deal with vsyscall and EFI

   The existing Linear Address Space Separation (LASS) support punted
   on support for common EFI and vsyscall configs. Complete the
   implementation by supporting EFI and vsyscall=xonly.

 - Clean up CPUID usage in newer Intel "avs" audio driver and update the
   x86-cpuid-db file

* tag 'x86_cpu_for_7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tools/x86/kcpuid: Update bitfields to x86-cpuid-db v3.0
  ASoC: Intel: avs: Include CPUID header at file scope
  ASoC: Intel: avs: Check maximum valid CPUID leaf
  x86/cpu: Remove LASS restriction on vsyscall emulation
  x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE
  x86/vsyscall: Restore vsyscall=xonly mode under LASS
  x86/traps: Consolidate user fixups in the #GP handler
  x86/vsyscall: Reorganize the page fault emulation code
  x86/cpu: Remove LASS restriction on EFI
  x86/efi: Disable LASS while executing runtime services
  x86/cpu: Defer LASS enabling until userspace comes up

memblock, treewide: make memblock_free() handle late freeing

2026-04-01T08:20:15+00:00

It shouldn't be responsibility of memblock users to detect if they free
memory allocated from memblock late and should use memblock_free_late().

Make memblock_free() and memblock_phys_free() take care of late memory
freeing and drop memblock_free_late().

Link: https://patch.msgid.link/20260323074836.3653702-9-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft)

x86/efi: efi_unmap_boot_services: fix calculation of ranges_to_free size

2026-03-20T14:31:10+00:00

ranges_to_free array should have enough room to store the entire EFI
memmap plus an extra element for NULL entry.
The calculation of this array size wrongly adds 1 to the overall size
instead of adding 1 to the number of elements.

Add parentheses to properly size the array.

Reported-by: Guenter Roeck 
Fixes: a4b0bf6a40f3 ("x86/efi: defer freeing of boot services memory")
Signed-off-by: Mike Rapoport (Microsoft) 
Signed-off-by: Ard Biesheuvel

integrity: Make arch_ima_get_secureboot integrity-wide

2026-03-05T16:10:08+00:00

EVM and other LSMs need the ability to query the secure boot status of
the system, without directly calling the IMA arch_ima_get_secureboot
function. Refactor the secure boot status check into a general function
named arch_get_secureboot.

Reported-and-suggested-by: Mimi Zohar 
Suggested-by: Roberto Sassu 
Signed-off-by: Coiby Xu 
Acked-by: Ard Biesheuvel 
Signed-off-by: Mimi Zohar

x86/efi: Disable LASS while executing runtime services

2026-03-03T17:49:45+00:00

Ideally, EFI runtime services should switch to kernel virtual addresses
after SetVirtualAddressMap(). However, firmware implementations are
known to be buggy in this regard and continue to access physical
addresses. The kernel maintains a 1:1 mapping of all runtime services
code and data regions to avoid breaking such firmware.

LASS enforcement relies on bit 63 of the virtual address, which would
block such accesses to the lower half. Unfortunately, not doing anything
could lead to #GP faults when users update to a kernel with LASS
enabled.

One option is to use a STAC/CLAC pair to temporarily disable LASS data
enforcement. However, there is no guarantee that the stray accesses
would only touch data and not perform instruction fetches. Also, relying
on the AC bit would depend on the runtime calls preserving RFLAGS, which
is highly unlikely in practice.

Instead, use the big hammer and switch off the entire LASS mechanism
temporarily by clearing CR4.LASS. Runtime services are called in the
context of efi_mm, which has explicitly unmapped any memory EFI isn't
allowed to touch (including userspace). So, do this right after
switching to efi_mm to avoid any security impact.

Some runtime services can be invoked during boot when LASS isn't active.
Use a global variable (similar to efi_mm) to save and restore the
correct CR4.LASS state. The runtime calls are serialized with the
efi_runtime_lock, so no concurrency issues are expected.

Signed-off-by: Sohil Mehta 
Signed-off-by: Dave Hansen 
Tested-by: Tony Luck 
Tested-by: Maciej Wieczor-Retman 
Link: https://patch.msgid.link/20260120234730.2215498-3-sohil.mehta@intel.com

x86/efi: defer freeing of boot services memory

2026-02-25T11:02:48+00:00

efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE
and EFI_BOOT_SERVICES_DATA using memblock_free_late().

There are two issue with that: memblock_free_late() should be used for
memory allocated with memblock_alloc() while the memory reserved with
memblock_reserve() should be freed with free_reserved_area().

More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y
efi_free_boot_services() is called before deferred initialization of the
memory map is complete.

Benjamin Herrenschmidt reports that this causes a leak of ~140MB of
RAM on EC2 t3a.nano instances which only have 512MB or RAM.

If the freed memory resides in the areas that memory map for them is
still uninitialized, they won't be actually freed because
memblock_free_late() calls memblock_free_pages() and the latter skips
uninitialized pages.

Using free_reserved_area() at this point is also problematic because
__free_page() accesses the buddy of the freed page and that again might
end up in uninitialized part of the memory map.

Delaying the entire efi_free_boot_services() could be problematic
because in addition to freeing boot services memory it updates
efi.memmap without any synchronization and that's undesirable late in
boot when there is concurrency.

More robust approach is to only defer freeing of the EFI boot services
memory.

Split efi_free_boot_services() in two. First efi_unmap_boot_services()
collects ranges that should be freed into an array then
efi_free_boot_services() later frees them after deferred init is complete.

Link: https://lore.kernel.org/all/ec2aaef14783869b3be6e3c253b2dcbf67dbc12a.camel@kernel.crashing.org
Fixes: 916f676f8dc0 ("x86, efi: Retain boot service code until after switching to virtual mode")
Cc: 
Signed-off-by: Mike Rapoport (Microsoft) 
Reviewed-by: Benjamin Herrenschmidt 
Signed-off-by: Ard Biesheuvel