summaryrefslogtreecommitdiff
path: root/arch/s390/kvm
AgeCommit message (Collapse)Author
7 daysKVM: s390: Fix possible reference leak in fault-in codeClaudio Imbrenda
If kvm_s390_new_mmu_cache() fails, kvm_s390_faultin_gfn() returns without releasing the faulted page. Fix this by moving the allocation of the memory cache outside of the loop. There is no reason to check at every iteration. Opportunistically fix a comment. Fixes: e907ae530133 ("KVM: s390: Add helper functions for fault handling") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260602142356.169458-10-imbrenda@linux.ibm.com>
7 daysKVM: s390: Prevent memslots outside the ASCE rangeClaudio Imbrenda
With KVM_S390_VM_MEM_LIMIT_SIZE, userspace can set the highest address allowed for the VM. Creating a memslot that lies over the maximum address does not make sense and is only a potential source of bugs. Prevent creation of memslots over the maximum address, and prevent the maximum address from being reduced below the end of existing memslots. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260602142356.169458-9-imbrenda@linux.ibm.com>
7 daysKVM: s390: Lock pte when making page secureClaudio Imbrenda
Make sure _kvm_s390_pv_make_secure() takes the pte lock for the given address when attempting to make the page secure. One of the steps in making the page secure is freezing the folio using folio_ref_freeze(), which temporarily sets the reference count to 0. Any attempt to get such a folio while frozen will fail and cause a warning to be printed. Other users of folio_ref_freeze() make sure that the page is not mapped while it's being frozen, thus preventing gup functions from being able to access it. For _kvm_s390_pv_make_secure(), this is not possible, because the page needs to be mapped in order for the import to succeed. By taking the pte lock, gup functions will be blocked until the import operation is done, thus avoiding the race. In theory this does not completely solve the issue: if a page is mapped through multiple mappings, locking one pte does not protect from calling gup on it through the other mapping. In practice this does not happen and it is a decent stopgap solution until a more correct solution is available. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260602142356.169458-8-imbrenda@linux.ibm.com>
7 daysKVM: s390: Fix fault-in codeClaudio Imbrenda
Fix the fault-in code so that it does not return success if a concurrent unmap event invalidated the fault-in process between the best-effort lockless check and the proper check with lock. The new behaviour is to retry, like the best-effort lockless check already did. This prevents the fault-in handler from returning success without having actually faulted in the requested page. Fixes: e907ae530133 ("KVM: s390: Add helper functions for fault handling") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260602142356.169458-7-imbrenda@linux.ibm.com>
7 daysKVM: s390: vsie: Fix rmap handling in _do_shadow_crste()Claudio Imbrenda
Fix _do_shadow_crste() to also apply a mask on the reverse address, to prevent spurious entries from being created, like already done in gmap_protect_rmap(). Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260602142356.169458-6-imbrenda@linux.ibm.com>
7 daysKVM: s390: Fix guest / virtual address confusion in _essa_clear_cbrl()Claudio Imbrenda
Until now, gmap_helper_zap_one_page() was being called with the guest absolute address, but it expects a userspace virtual address. This meant that in the best case the requested pages were not being discarded, and in the worst case that the wrong pages were being discarded. Fix this by converting the guest absolute address to host virtual before passing it to gmap_helper_zap_one_page(). Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260602142356.169458-5-imbrenda@linux.ibm.com>
7 daysKVM: s390: Fix _gmap_crstep_xchg_atomic()Claudio Imbrenda
The previous incorrect behaviour cleared the vsie_notif bit without returning false, which allowed shadow crstes to be installed without the vsie_notif bit. Return false and do not perform the operation if an unshadow event has been triggered, but still attempt to clear the vsie_notif bit from the existing crste. This will prevent the installation of shadow crstes without vsie_notif bit and will also prevent the caller from looping forever if it was not checking for the sg->invalidated flag. Fixes: b827ef02f409 ("KVM: s390: Remove non-atomic dat_crstep_xchg()") Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260602142356.169458-3-imbrenda@linux.ibm.com>
7 daysKVM: s390: Fix _gmap_unmap_crste()Claudio Imbrenda
In _gmap_unmap_crste(), the crste to be unmapped is zapped calling gmap_crstep_xchg_atomic() exactly once, and expecting it to succeed. This is a reasonable sanity check, since kvm->mmu_lock is being held in write mode, and thus no races should be possible. An upcoming patch will change the behaviour of gmap_crstep_xchg_atomic() to return false and clear the vsie_notif bit if the operation triggers an unshadow operation. With the new behaviour, an unmap operation that triggers an unshadow would cause the VM to be killed. Prepare for the change by checking if the vsie_notif bit was set in the old crste if gmap_crstep_xchg_atomic() fails the first time, and try a second time. The second time no failures are allowed. Fixes: b827ef02f409 ("KVM: s390: Remove non-atomic dat_crstep_xchg()") Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260602142356.169458-2-imbrenda@linux.ibm.com>
2026-05-22KVM: s390: Properly reset zero bit in PGSTEClaudio Imbrenda
In case of memory pressure, it's possible that a guest page gets freed and then almost immediately reused by the guest. If CMMA is enabled, _essa_clear_cbrl() will discard all pages that are either unused or zero. If a discarded page is reused before _essa_clear_cbrl() is called, and the pgste.zero bit is not cleared, the page will be discarded despite not being unused. When calling _gmap_ptep_xchg(), always clear the pgste.zero bit. This prevents the page from being accidentally discarded when not unused. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-05-22KVM: s390: vsie: Fix redundant rmap entriesClaudio Imbrenda
The address passed to the gmap rmap was not being masked. As a consequence several different (but functionally equivalent) rmap entries were being created for each shadowed table. Fix this by properly masking the address depending on the table level. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-05-22KVM: s390: vsie: Fix unshadowing logicClaudio Imbrenda
In some cases (i.e. under extreme memory pressure on the host), attempting to shadow memory will result in the same memory being unshadowed, causing a loop. Add a PGSTE bit to distinguish between shadowed memory and shadowed DAT tables, fix the unshadowing logic in _gmap_ptep_xchg() to prevent unnecessary unshadowing and perform better checks. Also fix the unshadowing logic in _gmap_crstep_xchg_atomic() which did not unshadow properly when the large page would become unprotected. Opportunistically add a check in gmap_protect_rmap() to make sure it won't be called with level == TABLE_TYPE_PAGE_TABLE. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-05-22KVM: s390: Fix leaking kvm_s390_mmu_cache in case of errorsClaudio Imbrenda
Fix a memory leak that can happen if gmap_ucas_map_one() or kvm_s390_mmu_cache_topup() return error values. Also fix a similar issue in gmap_set_limit(). Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reported-by: Jiaxin Fan <jiaxin.fan@ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-05-22KVM: s390: vsie: Fix memory leak when unshadowingClaudio Imbrenda
When performing a partial unshadowing, the rmap was being leaked. Add the missing kfree(). Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-05-12Merge tag 'kvm-s390-master-7.1-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: pci: fix array indexing For large amounts of PCI devices its possible to overrun the arrays as the index was miscalculated in 2 places.
2026-04-27KVM: s390: pci: Fix aisb calculationMatthew Rosato
The current implementation of aisb calculation will erroneously index via an unsigned long * as well as multiply by 8B for every 64-bits in the offset; only one or the other is required. This throws off aisb calculations once the number of devices exceeds 64, and can result in out-of-bounds access as well as failure to indicate summary bits associated with those devices in guests. Fix this by converting to a physical address before applying the offset, as is already done in arch/s390/pci/pci_irq.c. Fixes: 3c5a1b6f0a18 ("KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding") Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-04-17KVM: s390: pci: fix GAIT table indexing due to double-scaling pointer arithmeticJunrui Luo
kvm_s390_pci_aif_enable(), kvm_s390_pci_aif_disable(), and aen_host_forward() index the GAIT by manually multiplying the index with sizeof(struct zpci_gaite). Since aift->gait is already a struct zpci_gaite pointer, this double-scales the offset, accessing element aisb*16 instead of aisb. This causes out-of-bounds accesses when aisb >= 32 (with ZPCI_NR_DEVICES=512) Fix by removing the erroneous sizeof multiplication. Fixes: 3c5a1b6f0a18 ("KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding") Fixes: 73f91b004321 ("KVM: s390: pci: enable host forwarding of Adapter Event Notifications") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-04-13Merge tag 'kvm-s390-next-7.1-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - ESA nesting support - 4k memslots - LPSW/E fix
2026-04-07KVM: s390: vsie: Fix races with partial gmap invalidationsClaudio Imbrenda
Introduce a new boolean flag, used for shadow gmaps, to keep track of whether the gmap has been invalidated, either partially or totally. Use the new flag to check whether shadow gmap invalidations happened during shadowing. In such cases, abort whatever was going on, return -EAGAIN and let the caller try again. Fixes: 19d6c5b80443 ("KVM: s390: vsie: Fix unshadowing while shadowing") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260407161721.247044-1-imbrenda@linux.ibm.com>
2026-04-07KVM: s390: ucontrol: Fix memslot handlingClaudio Imbrenda
Fix memslots handling for UCONTROL guests. Attempts to delete user memslots will fail, as they should, without the risk of a NULL pointer dereference. Fixes: 413c98f24c63 ("KVM: s390: fake memslot for ucontrol VMs") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-04-07KVM: s390: Allow 4k granularity for memslotsClaudio Imbrenda
Until now memslots on s390 needed to have 1M granularity and be 1M aligned. Since the new gmap code can handle memslots with 4k granularity and alignment, remove the restrictions. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-04-07KVM: s390: Add alignment checks for hugepagesClaudio Imbrenda
When backing a guest page with a large page, check that the alignment of the guest page matches the alignment of the host physical page backing it within the large page. Also check that the memslot is large enough to fit the large page. Those checks are currently not needed, because memslots are guaranteed to be 1m-aligned, but this will change. Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-04-07KVM: s390: Add some useful mask macrosClaudio Imbrenda
Add _{SEGMENT,REGION3}_FR_MASK, similar to _{SEGMENT,REGION3}_MASK, but working on gfn/pfn instead of addresses. Use them in gaccess.c instead of using the normal masks plus gpa_to_gfn(). Also add _PAGES_PER_{SEGMENT,REGION3} to make future code more readable. Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-04-02KVM: s390: Add KVM capability for ESA mode guestsHendrik Brueckner
Now that all the bits are properly addressed, provide a mechanism for testing ESA mode guests in nested configurations. Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com> [farman@us.ibm.com: Updated commit message] Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-04-02KVM: s390: vsie: Accommodate ESA prefix pagesEric Farman
The prefix page address occupies a different number of bits for z/Architecture versus ESA mode. Adjust the definition to cover both, and permit an ESA mode address within the nested codepath. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-04-02KVM: s390: vsie: Disable some bits when in ESA modeEric Farman
In the event that a nested guest is put in ESA mode, ensure that some bits are scrubbed from the shadow SCB. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-04-02KVM: s390: vsie: Allow non-zarch guestsEric Farman
Linux/KVM runs in z/Architecture-only mode. Although z/Architecture is built upon a long history of hardware refinements, any other CPU mode is not permitted. Allow a userspace to explicitly enable the use of ESA mode for nested guests, otherwise usage will be rejected. Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-03-31KVM: s390: Fix lpsw/e breaking event handlingJanosch Frank
LPSW and LPSWE need to set the gbea on completion but currently don't. Time to fix this up. LPSWEY was designed to not set the bear. Fixes: 48a3e950f4cee ("KVM: s390: Add support for machine checks.") Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-03-31KVM: s390: only deliver service interrupt with payloadEric Farman
Routine __inject_service() may set both the SERVICE and SERVICE_EV pending bits, and in the case of a pure service event the corresponding trip through __deliver_service_ev() will clear the SERVICE_EV bit only. This necessitates an additional trip through __deliver_service() for the other pending interrupt bit, however it is possible that the external interrupt parameters are zero and there is nothing to be delivered to the guest. To avoid sending empty data to the guest, let's only write out the SCLP data when there is something for the guest to do, otherwise bail out. Signed-off-by: Eric Farman <farman@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2026-03-26KVM: s390: Fix KVM_S390_VCPU_FAULT ioctlClaudio Imbrenda
A previous commit changed the behaviour of the KVM_S390_VCPU_FAULT ioctl. The current (wrong) implementation will trigger a guest addressing exception if the requested address lies outside of a memslot, unless the VM is UCONTROL. Restore the previous behaviour by open coding the fault-in logic. Fixes: 3762e905ec2e ("KVM: s390: use __kvm_faultin_pfn()") Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix guest page tables protectionClaudio Imbrenda
When shadowing, the guest page tables are write-protected, in order to trap changes and properly unshadow the shadow mapping for the nested guest. Already shadowed levels are skipped, so that only the needed levels are write protected. Currently the levels that get write protected are exactly one level too deep: the last level (nested guest memory) gets protected in the wrong way, and will be protected again correctly a few lines afterwards; most importantly, the highest non-shadowed level does *not* get write protected. Moreover, if the nested guest is running in a real address space, there are no DAT tables to shadow. Write protect the correct levels, so that all the levels that need to be protected are protected, and avoid double protecting the last level; skip attempting to shadow the DAT tables when the nested guest is running in a real address space. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix unshadowing while shadowingClaudio Imbrenda
If shadowing causes the shadow gmap to get unshadowed, exit early to prevent an attempt to dereference the parent pointer, which at this point is NULL. Opportunistically add some more checks to prevent NULL parents. Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Fixes: e5f98a6899bd ("KVM: s390: Add some helper functions needed for vSIE") Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix refcount overflow for shadow gmapsClaudio Imbrenda
In most cases gmap_put() was not called when it should have. Add the missing gmap_put() in vsie_run(). Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix nested guest memory shadowingClaudio Imbrenda
Fix _do_shadow_pte() to use the correct pointer (guest pte instead of nested guest) to set up the new pte. Add a check to return -EOPNOTSUPP if the mapping for the nested guest is writeable but the same page in the guest is only read-only. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: Correctly handle guest mappings without struct pageClaudio Imbrenda
Introduce a new special softbit for large pages, like already presend for normal pages, and use it to mark guest mappings that do not have struct pages. Whenever a leaf DAT entry becomes dirty, check the special softbit and only call SetPageDirty() if there is an actual struct page. Move the logic to mark pages dirty inside _gmap_ptep_xchg() and _gmap_crstep_xchg_atomic(), to avoid needlessly duplicating the code. Fixes: 5a74e3d93417 ("KVM: s390: KVM-specific bitfields and helper functions") Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: Fix gmap_link()Claudio Imbrenda
The slow path of the fault handler ultimately called gmap_link(), which assumed the fault was a major fault, and blindly called dat_link(). In case of minor faults, things were not always handled properly; in particular the prefix and vsie marker bits were ignored. Move dat_link() into gmap.c, renaming it accordingly. Once moved, the new _gmap_link() function will be able to correctly honour the prefix and vsie markers. This will cause spurious unshadows in some uncommon cases. Fixes: 94fd9b16cc67 ("KVM: s390: KVM page table management functions: lifecycle management") Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix check for pre-existing shadow mappingClaudio Imbrenda
When shadowing a nested guest, a check is performed and no shadowing is attempted if the nested guest is already shadowed. The existing check was incomplete; fix it by also checking whether the leaf DAT table entry in the existing shadow gmap has the same protection as the one specified in the guest DAT entry. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: Remove non-atomic dat_crstep_xchg()Claudio Imbrenda
In practice dat_crstep_xchg() is racy and hard to use correctly. Simply remove it and replace its uses with dat_crstep_xchg_atomic(). This solves some actual races that lead to system hangs / crashes. Opportunistically fix an alignment issue in _gmap_crstep_xchg_atomic(). Fixes: 589071eaaa8f ("KVM: s390: KVM page table management functions: clear and replace") Fixes: 94fd9b16cc67 ("KVM: s390: KVM page table management functions: lifecycle management") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-26KVM: s390: vsie: Fix dat_split_ste()Claudio Imbrenda
If the guest misbehaves and puts the page tables for its nested guest inside the memory of the nested guest itself, and the guest and nested guest are being mapped with large pages, the shadow mapping will lose synchronization with the actual mapping, since this will cause the large page with the vsie notification bit to be split, but the vsie notification bit will not be propagated to the resulting small pages. Fix this by propagating the vsie_notif bit from large pages to normal pages when splitting a large page. Fixes: 2db149a0a6c5 ("KVM: s390: KVM page table management functions: walks") Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-24Merge tag 'kvm-s390-master-7.0-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for 7.0 - fix deadlock in new memory management - handle kernel faults on donated memory properly - fix bounds checking for irq routing + selftest - fix invalid machine checks + logging
2026-03-16KVM: s390: vsie: Avoid injecting machine check on signalChristian Borntraeger
The recent XFER_TO_GUEST_WORK change resulted in a situation, where the vsie code would interpret a signal during work as a machine check during SIE as both use the EINTR return code. The exit_reason of the sie64a function has nothing to do with the kvm_run exit_reason. Rename it and define a specific code for machine checks instead of abusing -EINTR. rename exit_reason into sie_return to avoid the naming conflict and change the code flow in vsie.c to have a separate variable for rc and sie_return. Fixes: 2bd1337a1295e ("KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions") Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-03-16KVM: s390: log machine checks more aggressivelyChristian Borntraeger
KVM will reinject machine checks that happen during guest activity. From a host perspective this machine check is no longer visible and even for the guest, the guest might decide to only kill a userspace program or even ignore the machine check. As this can be a disruptive event nevertheless, we should log this not only in the VM debug event (that gets lost after guest shutdown) but also on the global KVM event as well as syslog. Consolidate the logging and log with loglevel 2 and higher. Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
2026-03-16KVM: s390: Limit adapter indicator access to mapped pageJanosch Frank
While we check the address for errors, we don't seem to check the bit offsets and since they are 32 and 64 bits a lot of memory can be reached indirectly via those offsets. Fixes: 84223598778b ("KVM: s390: irq routing for adapter interrupts.") Suggested-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-03-11Merge tag 'kvm-x86-generic-7.0-rc3' of https://github.com/kvm-x86/linux into ↵Paolo Bonzini
HEAD KVM generic changes for 7.0 - Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being unnecessary and confusing, triggered compiler warnings due to -Wflex-array-member-not-at-end. - Document that vcpu->mutex is take outside of kvm->slots_lock and kvm->slots_arch_lock, which is intentional and desirable despite being rather unintuitive.
2026-03-06KVM: s390: Fix a deadlockClaudio Imbrenda
In some scenarios, a deadlock can happen, involving _do_shadow_pte(). Convert all usages of pgste_get_lock() to pgste_get_trylock() in _do_shadow_pte() and return -EAGAIN. All callers can already deal with -EAGAIN being returned. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2026-02-28KVM: always define KVM_CAP_SYNC_MMUPaolo Bonzini
KVM_CAP_SYNC_MMU is provided by KVM's MMU notifiers, which are now always available. Move the definition from individual architectures to common code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-02-28KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIERPaolo Bonzini
All architectures now use MMU notifier for KVM page table management. Remove the Kconfig symbol and the code that is used when it is disabled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-10KVM: s390: Increase permitted SE header size to 1 MiBSteffen Eiden
Relax the maximum allowed Secure Execution (SE) header size from 8 KiB to 1 MiB. This allows individual secure guest images to run on a wider range of physical machines. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>