linux-stable.git/arch/s390/kernel/entry.S, branch linux-rolling-stable

Merge tag 's390-7.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

2026-03-28T16:50:11+00:00

Pull s390 fixes from Vasily Gorbik:

 - Add array_index_nospec() to syscall dispatch table lookup to prevent
   limited speculative out-of-bounds access with user-controlled syscall
   number

 - Mark array_index_mask_nospec() __always_inline since GCC may emit an
   out-of-line call instead of the inline data dependency sequence the
   mitigation relies on

 - Clear r12 on kernel entry to prevent potential speculative use of
   user value in system_call, ext/io/mcck interrupt handlers

* tag 's390-7.0-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/entry: Scrub r12 register on kernel entry
  s390/syscalls: Add spectre boundary for syscall dispatch table
  s390/barrier: Make array_index_mask_nospec() __always_inline

s390/entry: Scrub r12 register on kernel entry

2026-03-27T23:43:39+00:00

Before commit f33f2d4c7c80 ("s390/bp: remove TIF_ISOLATE_BP"),
all entry handlers loaded r12 with the current task pointer
(lg %r12,__LC_CURRENT) for use by the BPENTER/BPEXIT macros. That
commit removed TIF_ISOLATE_BP, dropping both the branch prediction
macros and the r12 load, but did not add r12 to the register clearing
sequence.

Add the missing xgr %r12,%r12 to make the register scrub consistent
across all entry points.

Fixes: f33f2d4c7c80 ("s390/bp: remove TIF_ISOLATE_BP")
Cc: stable@kernel.org
Reviewed-by: Ilya Leoshkevich 
Signed-off-by: Vasily Gorbik

KVM: s390: vsie: Avoid injecting machine check on signal

2026-03-16T15:56:39+00:00

The recent XFER_TO_GUEST_WORK change resulted in a situation, where the
vsie code would interpret a signal during work as a machine check during
SIE as both use the EINTR return code.
The exit_reason of the sie64a function has nothing to do with the
kvm_run exit_reason. Rename it and define a specific code for machine
checks instead of abusing -EINTR.
rename exit_reason into sie_return to avoid the naming conflict
and change the code flow in vsie.c to have a separate variable for rc
and sie_return.

Fixes: 2bd1337a1295e ("KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions")
Signed-off-by: Christian Borntraeger 
Reviewed-by: Heiko Carstens 
Reviewed-by: Claudio Imbrenda

s390/bug: Implement __WARN_printf()

2026-01-27T11:16:16+00:00

This is the s390 variant of commit 5b472b6e5bd9 ("x86_64/bug: Implement
__WARN_printf()"). See the x86 commit for the general idea; there are only
implementation details which are different.

With the new exception based __WARN_printf() implementation the generated
code for a simple WARN() is simplified.

For example:

void foo(int a) { WARN(a, "bar"); }

Before this change the generated code looks like this:

0000000000000210 :
 210:   c0 04 00 00 00 00       jgnop   210 
 216:   ec 26 00 06 00 7c       cgijne  %r2,0,222 
 21c:   c0 f4 00 00 00 00       jg      21c 
                        21e: R_390_PC32DBL      __s390_indirect_jump_r14+0x2
 222:   eb ef f0 88 00 24       stmg    %r14,%r15,136(%r15)
 228:   b9 04 00 ef             lgr     %r14,%r15
 22c:   e3 f0 ff e8 ff 71       lay     %r15,-24(%r15)
 232:   e3 e0 f0 98 00 24       stg     %r14,152(%r15)
 238:   c0 20 00 00 00 00       larl    %r2,238 
                        23a: R_390_PC32DBL      .LC48+0x2
 23e:   c0 e5 00 00 00 00       brasl   %r14,23e 
                        240: R_390_PLT32DBL     __warn_printk+0x2
 244:   af 00 00 00             mc      0,0
 248:   eb ef f0 a0 00 04       lmg     %r14,%r15,160(%r15)
 24e:   c0 f4 00 00 00 00       jg      24e 
                        250: R_390_PC32DBL      __s390_indirect_jump_r14+0x2

With this change the generated code looks like this:

0000000000000210 :
 210:   c0 04 00 00 00 00       jgnop   210 
 216:   ec 26 00 06 00 7c       cgijne  %r2,0,222 
 21c:   c0 f4 00 00 00 00       jg      21c 
                        21e: R_390_PC32DBL      __s390_indirect_jump_r14+0x2
 222:   c0 20 00 00 00 00       larl    %r2,222 
                        224: R_390_PC32DBL      __bug_table+0x2
 228:   c0 f4 00 00 00 00       jg      228 
                        22a: R_390_PLT32DBL     __WARN_trap+0x2

Downside is that the call trace now starts at __WARN_trap():

------------[ cut here ]------------
bar
WARNING: arch/s390/kernel/setup.c:1017 at 0x0, CPU#0: swapper/0/0
...
Krnl PSW : 0704c00180000000 000003ffe0f6a3b4 (__WARN_trap+0x4/0x10)
...
Krnl Code: 000003ffe0f6a3ac: 0707                bcr     0,%r7
           000003ffe0f6a3ae: 0707                bcr     0,%r7
          *000003ffe0f6a3b0: af000001            mc      1,0
          >000003ffe0f6a3b4: 07fe                bcr     15,%r14
           000003ffe0f6a3b6: 47000700            bc      0,1792
           000003ffe0f6a3ba: 0707                bcr     0,%r7
           000003ffe0f6a3bc: 0707                bcr     0,%r7
           000003ffe0f6a3be: 0707                bcr     0,%r7
Call Trace:
 [<000003ffe0f6a3b4>] __WARN_trap+0x4/0x10
([<000003ffe185a54c>] start_kernel+0x53c/0x5d8)
 [<000003ffe010002e>] startup_continue+0x2e/0x40

Which isn't too helpful. This can be addressed by just skipping __WARN_trap(),
which will be addressed in a later patch.

Reviewed-by: Sven Schnelle 
Signed-off-by: Heiko Carstens

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

2025-12-06T01:01:20+00:00

Pull KVM updates from Paolo Bonzini:
 "ARM:

   - Support for userspace handling of synchronous external aborts
     (SEAs), allowing the VMM to potentially handle the abort in a
     non-fatal manner

   - Large rework of the VGIC's list register handling with the goal of
     supporting more active/pending IRQs than available list registers
     in hardware. In addition, the VGIC now supports EOImode==1 style
     deactivations for IRQs which may occur on a separate vCPU than the
     one that acked the IRQ

   - Support for FEAT_XNX (user / privileged execute permissions) and
     FEAT_HAF (hardware update to the Access Flag) in the software page
     table walkers and shadow MMU

   - Allow page table destruction to reschedule, fixing long
     need_resched latencies observed when destroying a large VM

   - Minor fixes to KVM and selftests

  Loongarch:

   - Get VM PMU capability from HW GCFG register

   - Add AVEC basic support

   - Use 64-bit register definition for EIOINTC

   - Add KVM timer test cases for tools/selftests

  RISC/V:

   - SBI message passing (MPXY) support for KVM guest

   - Give a new, more specific error subcode for the case when in-kernel
     AIA virtualization fails to allocate IMSIC VS-file

   - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually
     in small chunks

   - Fix guest page fault within HLV* instructions

   - Flush VS-stage TLB after VCPU migration for Andes cores

  s390:

   - Always allocate ESCA (Extended System Control Area), instead of
     starting with the basic SCA and converting to ESCA with the
     addition of the 65th vCPU. The price is increased number of exits
     (and worse performance) on z10 and earlier processor; ESCA was
     introduced by z114/z196 in 2010

   - VIRT_XFER_TO_GUEST_WORK support

   - Operation exception forwarding support

   - Cleanups

  x86:

   - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO
     SPTE caching is disabled, as there can't be any relevant SPTEs to
     zap

   - Relocate a misplaced export

   - Fix an async #PF bug where KVM would clear the completion queue
     when the guest transitioned in and out of paging mode, e.g. when
     handling an SMI and then returning to paged mode via RSM

   - Leave KVM's user-return notifier registered even when disabling
     virtualization, as long as kvm.ko is loaded. On reboot/shutdown,
     keeping the notifier registered is ok; the kernel does not use the
     MSRs and the callback will run cleanly and restore host MSRs if the
     CPU manages to return to userspace before the system goes down

   - Use the checked version of {get,put}_user()

   - Fix a long-lurking bug where KVM's lack of catch-up logic for
     periodic APIC timers can result in a hard lockup in the host

   - Revert the periodic kvmclock sync logic now that KVM doesn't use a
     clocksource that's subject to NTP corrections

   - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the
     latter behind CONFIG_CPU_MITIGATIONS

   - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast
     path; the only reason they were handled in the fast path was to
     paper of a bug in the core #MC code, and that has long since been
     fixed

   - Add emulator support for AVX MOV instructions, to play nice with
     emulated devices whose guest drivers like to access PCI BARs with
     large multi-byte instructions

  x86 (AMD):

   - Fix a few missing "VMCB dirty" bugs

   - Fix the worst of KVM's lack of EFER.LMSLE emulation

   - Add AVIC support for addressing 4k vCPUs in x2AVIC mode

   - Fix incorrect handling of selective CR0 writes when checking
     intercepts during emulation of L2 instructions

   - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32]
     on VMRUN and #VMEXIT

   - Fix a bug where KVM corrupt the guest code stream when re-injecting
     a soft interrupt if the guest patched the underlying code after the
     VM-Exit, e.g. when Linux patches code with a temporary INT3

   - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits
     to userspace, and extend KVM "support" to all policy bits that
     don't require any actual support from KVM

  x86 (Intel):

   - Use the root role from kvm_mmu_page to construct EPTPs instead of
     the current vCPU state, partly as worthwhile cleanup, but mostly to
     pave the way for tracking per-root TLB flushes, and elide EPT
     flushes on pCPU migration if the root is clean from a previous
     flush

   - Add a few missing nested consistency checks

   - Rip out support for doing "early" consistency checks via hardware
     as the functionality hasn't been used in years and is no longer
     useful in general; replace it with an off-by-default module param
     to WARN if hardware fails a check that KVM does not perform

   - Fix a currently-benign bug where KVM would drop the guest's
     SPEC_CTRL[63:32] on VM-Enter

   - Misc cleanups

   - Overhaul the TDX code to address systemic races where KVM (acting
     on behalf of userspace) could inadvertantly trigger lock contention
     in the TDX-Module; KVM was either working around these in weird,
     ugly ways, or was simply oblivious to them (though even Yan's
     devilish selftests could only break individual VMs, not the host
     kernel)

   - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a
     TDX vCPU, if creating said vCPU failed partway through

   - Fix a few sparse warnings (bad annotation, 0 != NULL)

   - Use struct_size() to simplify copying TDX capabilities to userspace

   - Fix a bug where TDX would effectively corrupt user-return MSR
     values if the TDX Module rejects VP.ENTER and thus doesn't clobber
     host MSRs as expected

  Selftests:

   - Fix a math goof in mmu_stress_test when running on a single-CPU
     system/VM

   - Forcefully override ARCH from x86_64 to x86 to play nice with
     specifying ARCH=x86_64 on the command line

   - Extend a bunch of nested VMX to validate nested SVM as well

   - Add support for LA57 in the core VM_MODE_xxx macro, and add a test
     to verify KVM can save/restore nested VMX state when L1 is using
     5-level paging, but L2 is not

   - Clean up the guest paging code in anticipation of sharing the core
     logic for nested EPT and nested NPT

  guest_memfd:

   - Add NUMA mempolicy support for guest_memfd, and clean up a variety
     of rough edges in guest_memfd along the way

   - Define a CLASS to automatically handle get+put when grabbing a
     guest_memfd from a memslot to make it harder to leak references

   - Enhance KVM selftests to make it easer to develop and debug
     selftests like those added for guest_memfd NUMA support, e.g. where
     test and/or KVM bugs often result in hard-to-debug SIGBUS errors

   - Misc cleanups

  Generic:

   - Use the recently-added WQ_PERCPU when creating the per-CPU
     workqueue for irqfd cleanup

   - Fix a goof in the dirty ring documentation

   - Fix choice of target for directed yield across different calls to
     kvm_vcpu_on_spin(); the function was always starting from the first
     vCPU instead of continuing the round-robin search"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
  ...

KVM: s390: Enable and disable interrupts in entry code

2025-11-27T14:39:46+00:00

Move enabling and disabling of interrupts around the SIE instruction to
entry code. Enabling interrupts only after the __TI_sie flag has been set
guarantees that the SIE instruction is not executed if an interrupt happens
between enabling interrupts and the execution of the SIE instruction.
Interrupt handlers and machine check handler forward the PSW to the
sie_exit label in such cases.

This is a prerequisite for VIRT_XFER_TO_GUEST_WORK to prevent that guest
context is entered when e.g. a scheduler IPI, indicating that a reschedule
is required, happens right before the SIE instruction, which could lead to
long delays.

Signed-off-by: Heiko Carstens 
Tested-by: Andrew Donnellan 
Signed-off-by: Andrew Donnellan 
Reviewed-by: Janosch Frank 
Signed-off-by: Janosch Frank

s390/entry: Use lay instead of aghik

2025-11-26T11:28:23+00:00

Use the lay instruction instead of aghik. aghik is only available since
z196, therefore compiling the kernel for z10 results in this error:

   arch/s390/kernel/entry.S: Assembler messages:
   arch/s390/kernel/entry.S:165: Error: Unrecognized opcode: `aghik'

Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-kbuild-all/202511261518.nBbQN5h7-lkp@intel.com/
Fixes: f5730d44e05e ("s390: Add stackprotector support")
Reviewed-by: Jan Polensky 
Signed-off-by: Heiko Carstens

s390: Add stackprotector support

2025-11-24T10:45:21+00:00

Stackprotector support was previously unavailable on s390 because by
default compilers generate code which is not suitable for the kernel:
the canary value is accessed via thread local storage, where the address
of thread local storage is within access registers 0 and 1.

Using those registers also for the kernel would come with a significant
performance impact and more complicated kernel entry/exit code, since
access registers contents would have to be exchanged on every kernel entry
and exit.

With the upcoming gcc 16 release new compiler options will become available
which allow to generate code suitable for the kernel. [1]

Compiler option -mstack-protector-guard=global instructs gcc to generate
stackprotector code that refers to a global stackprotector canary value via
symbol __stack_chk_guard. Access to this value is guaranteed to occur via
larl and lgrl instructions.

Furthermore, compiler option -mstack-protector-guard-record generates a
section containing all code addresses that reference the canary value.

To allow for per task canary values the instructions which load the address
of __stack_chk_guard are patched so they access a lowcore field instead: a
per task canary value is available within the task_struct of each task, and
is written to the per-cpu lowcore location on each context switch.

Also add sanity checks and debugging option to be consistent with other
kernel code patching mechanisms.

Full debugging output can be enabled with the following kernel command line
options:

debug_stackprotector
bootdebug
ignore_loglevel
earlyprintk
dyndbg="file stackprotector.c +p"

Example debug output:

stackprot: 0000021e402d4eda: c010005a9ae3 -> c01f00070240

where ":  -> ".

[1] gcc commit 0cd1f03939d5 ("s390: Support global stack protector")

Reviewed-by: Sven Schnelle 
Signed-off-by: Heiko Carstens

s390/syscalls: Switch to generic system call table generation

2025-11-17T10:10:39+00:00

The s390 syscall.tbl format differs slightly from most others, and
therefore requires an s390 specific system call table generation
script.

With compat support gone use the opportunity to switch to generic
system call table generation. The abi for all 64 bit system calls is
now common, since there is no need to specify if system call entry
points are only for 64 bit anymore.

Furthermore create the system call table in C instead of assembler
code in order to get type checking for all system call functions
contained within the table.

Reviewed-by: Arnd Bergmann 
Signed-off-by: Heiko Carstens

s390: Remove compat support

2025-11-17T10:10:38+00:00

There shouldn't be any 31 bit code around anymore that matters.
Remove the compat layer support required to run 31 bit code.

Reason for removal is code simplification and reduced test effort.

Note that this comes without any deprecation warnings added to config
options, or kernel messages, since most likely those would be ignored
anyway.

If it turns out there is still a reason to keep the compat layer this
can be reverted at any time in the future.

Reviewed-by: Arnd Bergmann 
Signed-off-by: Heiko Carstens