Age | Commit message | Author
10 days | x86/apic: KVM: Use cpu_physical_id() to get APIC ID of running vCPU for AVIC | Sean Christopherson
Use cpu_physical_id() instead of default_cpu_present_to_apicid() when getting the APIC ID of the pCPU on which a vCPU is running/loaded, as the kernel has gone way off the rails if a vCPU is loaded on a pCPU that has been physically removed from the system. Even if the impossible were to happen, the absolute worst-case scenario is that hardware will ring the AVIC doorbell on the wrong pCPU, i.e. a severely broken system will experience mild performance issues.
Kill off KVM's superfluous kvm_cpu_get_apicid() wrapper along with the for-KVM export of default_cpu_present_to_apicid(), as they existed purely for the wonky AVIC usage.
Cc: Kai Huang <kai.huang@intel.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Message-ID: <20260612185459.591892-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
10 days | KVM: x86/mmu: Expose number of shadow MMU shadow pages as a stat | Sean Christopherson
Turn arch.n_used_mmu_pages into a stat, mmu_shadow_pages, as the number of live shadow pages is arguably _the_ most critical datapoint when it comes to analyzing the shadow MMU. Before the TDP MMU came along, i.e. when the shadow MMU was the only MMU, explicitly tracking the number of shadow pages wasn't as interesting, because the same information could more or less be gleaned from the pages_{1g,2m,4k} stats. But with the TDP MMU, where the shadow MMU is only used for nested TDP, it becomes extremely difficult, if not impossible, to determine which SPTEs are coming from the TDP MMU, and which are coming from the shadow MMU. E.g. when triaging/debugging shadow MMU performance issues due to "too many shadow pages", being able to observe that 99%+ of all shadow pages are unsync is critical to being able to deduce that KVM is effectively leaking shadow pages. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260612133727.411902-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
10 days | Merge tag 'kvm-s390-next-7.2-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD | Paolo Bonzini
* Fix S390_USER_OPEREXEC so it can now be enabled regardless of other unrelated capabilities
* Fix handling of the _PAGE_UNUSED pte bit that could lead to guest memory corruption in some scenarios
* A bunch of misc gmap fixes (locking, behaviour under memory pressure)
* Fix CMMA dirty tracking
10 days | drm/i915/cdclk: Fix up CDCLK_FREQ_DECIMAL without a full PLL re-enable | Ville Syrjälä
The GOP (and even Bspec on some platforms) is a bit inconsistent on what the CDCLK_FREQ_DECIMAL divider should be. Currently any mismatch there causes a full CDCLK PLL disable+re-enable, which we really don't want to do if any displays are currently active. Let's instead just reprogram CDCLK_FREQ_DECIMAL when that is the only thing amiss. For any other (more serious) mismatch we still punt to the full PLL reprogramming. We also need to tweak the bxt_cdclk_cd2x_pipe() stuff a bit to consistently select pipe==NONE since we have no idea which pipes are enabled at this point. Since we're not actually changing the CDCLK frequency here we don't need to sync the update to any pipe. Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/16209 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20260612173653.7830-2-ville.syrjala@linux.intel.com Reviewed-by: Michał Grzelak <michal.grzelak@intel.com> (cherry picked from commit 3f9de66f8acbf8ff45a91b4920605ed10c6b7c06) Fixes: ba91b9eecb47 ("drm/i915/cdclk: Decouple cdclk from state->modeset") Fixes: d66a21947e21 ("drm/i915/bxt: Sanitize CDCLK to fix breakage during S4 resume") Fixes: c73666f394fc ("drm/i915/skl: If needed sanitize bios programmed cdclk") Cc: <stable@vger.kernel.org> # v4.5+ Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
10 days | KVM: x86: Unconditionally recompute CR8 intercept on PPR update | Carlos López
The TPR_THRESHOLD field in the VMCS is used by VMX to induce VM exits when the guest's virtual TPR falls under the specified threshold, allowing KVM to inject previously masked interrupts. KVM handles these VM exits in handle_tpr_below_threshold(). Commit eb90f3417a0c ("KVM: vmx: speed up TPR below threshold vmexits") optimized this function by calling apic_update_ppr() instead of raising KVM_REQ_EVENT. apic_update_ppr() then raises KVM_REQ_EVENT if there is a pending, deliverable interrupt. However, if there are no new interrupts pending, apic_update_ppr() does not issue the request. Thus, kvm_lapic_update_cr8_intercept() and vmx_update_cr8_intercept() are not called before VM entry, which results in a high, stale TPR_THRESHOLD. This is problematic due to the following sentence in 28.2.1.1 "VM-Execution Control Fields" in the SDM: The following check is performed if the “use TPR shadow” VM-execution control is 1 and the “virtualize APIC accesses” and “virtual-interrupt delivery” VM-execution controls are both 0: the value of bits 3:0 of the TPR threshold VM-execution control field should not be greater than the value of bits 7:4 of VTPR. This error condition is typically not observed when KVM runs on a bare metal system because modern processors support APICv, which enables virtual-interrupt delivery, and which KVM uses when possible. This causes the processor to no longer generate TPR-below-threshold exits and to no longer check TPR_THRESHOLD on entry. However, when running on older platforms, or under nested virtualization on a hypervisor that does not support virtual-interrupt delivery and enforces this check (like Hyper-V) this can cause a VM entry failure with hardware error 0x7, as seen in [1]. Call kvm_lapic_update_cr8_intercept() if apic_update_ppr() does not find a deliverable interrupt (and thus does not raise KVM_REQ_EVENT). Remove calls to kvm_lapic_update_cr8_intercept() on paths that end up in apic_update_ppr(), as they now become redundant. 
This ensures that any path that updates the guest's PPR also figures out if KVM needs to wait for a TPR change (using TPR_THRESHOLD on VMX or CR8 intercepts on SVM). Link: https://github.com/coconut-svsm/svsm/issues/1081 [1] Tested-by: Stefano Garzarella <sgarzare@redhat.com> Cc: stable@vger.kernel.org Fixes: eb90f3417a0c ("KVM: vmx: speed up TPR below threshold vmexits") Signed-off-by: Carlos López <clopez@suse.de> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260618174347.1981064-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
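The VM-entry check quoted from the SDM above can be sketched as a small predicate. This is only an illustration of the quoted rule, not KVM code: `tpr_threshold_valid()` and `tpr_threshold_example()` are hypothetical names, and the sketch assumes the "use TPR shadow" control is 1 with the other two controls 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the quoted VM-entry check would pass. */
static bool tpr_threshold_valid(uint32_t tpr_threshold, uint8_t vtpr)
{
    uint32_t threshold = tpr_threshold & 0xf;  /* bits 3:0 of TPR_THRESHOLD */
    uint32_t vtpr_hi = (vtpr >> 4) & 0xf;      /* bits 7:4 of VTPR */

    /* The threshold must not be greater than VTPR[7:4]. */
    return threshold <= vtpr_hi;
}

/* A stale, high threshold fails the check once the guest lowers its TPR. */
static void tpr_threshold_example(void)
{
    assert(tpr_threshold_valid(0x8, 0x80));   /* threshold 8, VTPR[7:4] = 8 */
    assert(!tpr_threshold_valid(0x8, 0x20));  /* stale threshold, VTPR[7:4] = 2 */
    assert(tpr_threshold_valid(0x0, 0x20));   /* recomputed threshold */
}
```

The second assertion corresponds to the stale TPR_THRESHOLD scenario described in the commit message: the threshold was never recomputed after the guest's TPR dropped.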
10 days | KVM: VMX: Grab vmcs12 on CR8 interception update iff vCPU is in guest mode | Sean Christopherson
When updating CR8 intercepts, get vmcs12 if and only if the vCPU is in guest mode so that a future change can update CR8 intercepts during vCPU creation without running afoul of get_vmcs12()'s lockdep assertion.
  ------------[ cut here ]------------
  debug_locks && !(lock_is_held(&(&vcpu->mutex)->dep_map) || !refcount_read(&vcpu->kvm->users_count))
  WARNING: arch/x86/kvm/vmx/nested.h:61 at get_vmcs12 arch/x86/kvm/vmx/nested.h:60 [inline], CPU#0: syz.2.19/5879
  WARNING: arch/x86/kvm/vmx/nested.h:61 at vmx_update_cr8_intercept+0x3de/0x4e0 arch/x86/kvm/vmx/vmx.c:6879, CPU#0: syz.2.19/5879
  Modules linked in:
  CPU: 0 UID: 0 PID: 5879 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
  RIP: 0010:get_vmcs12 arch/x86/kvm/vmx/nested.h:60 [inline]
  RIP: 0010:vmx_update_cr8_intercept+0x3de/0x4e0 arch/x86/kvm/vmx/vmx.c:6879
  Call Trace:
   <TASK>
   apic_update_ppr arch/x86/kvm/lapic.c:984 [inline]
   kvm_lapic_reset+0x1c24/0x2980 arch/x86/kvm/lapic.c:3023
   kvm_vcpu_reset+0x44c/0x1bf0 arch/x86/kvm/x86.c:12986
   kvm_arch_vcpu_create+0x746/0x8b0 arch/x86/kvm/x86.c:12847
   kvm_vm_ioctl_create_vcpu+0x428/0x930 virt/kvm/kvm_main.c:4201
   kvm_vm_ioctl+0x893/0xd50 virt/kvm/kvm_main.c:5159
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:597 [inline]
   __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
   do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
   do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   </TASK>
No functional change intended.
Reported-by: syzbot ci <syzbot+ci493c6d734b63e050@syzkaller.appspotmail.com>
Closes: https://lore.kernel.org/all/6a2adf3b.3b0a2d4e.8c8d1.0012.GAE@google.com
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260618174347.1981064-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
10 days | KVM: x86: WARN (once) if RTC pending EOI tracking goes off the rails | Sean Christopherson
WARN once if KVM's tracking for pending EOIs for Real-Time Clock IRQs goes off the rails, as there's no reason to bug the host or risk a DoS due to spamming dmesg with endless WARNs. Absolute worst case scenario, guest time will go awry. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-ID: <20260618174527.1982333-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
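The "warn once" idea can be illustrated with a userspace sketch. It is not the kernel's WARN_ON_ONCE() implementation; `warn_once()`, `warnings_emitted`, and the per-call-site `warned` flag are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_emitted;  /* number of messages actually printed */

/*
 * Report a failed sanity check at most once per call site. The caller
 * owns the "warned" flag, so repeated hits cannot flood the log.
 * Returns the condition so callers can still branch on it.
 */
static bool warn_once(bool cond, bool *warned, const char *msg)
{
    assert(warned != NULL);

    if (cond && !*warned) {
        *warned = true;
        warnings_emitted++;
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return cond;
}
```

A caller would keep one static flag per check, for example `static bool warned;` followed by `if (warn_once(count < 0, &warned, "count underflow")) return;`. The condition is still evaluated on every call; only the reporting is rate-limited to one message.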
10 days | KVM: x86: WARN and fail kvm_set_irq() if a PIC or I/O APIC vector is invalid | Sean Christopherson
WARN and return an error up the stack if the PIC or I/O APIC encounters an invalid vector when injecting an IRQ, as there is no danger to the host and thus no justification for potentially panicking the kernel. Don't bug the VM either, as the risk of corrupting the guest is minuscule, and the guest might even be completely tolerant of a lost interrupt. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-ID: <20260618185213.2019937-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
10 days | KVM: x86: Bug the VM, not the kernel, if the ISR count {under,over}flows | Sean Christopherson
Bug the VM, not the host kernel, if KVM's ISR count {under,over}flows when tracking in-flight ISRs. There is zero danger to the host if KVM messes up its IRQ tracking. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-ID: <20260618185350.2020845-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
10 days | KVM: x86/mmu: Bug the VM, not the host kernel, if KVM write-protects upper SPTEs | Sean Christopherson
Instead of bugging the host kernel, WARN and terminate the VM if KVM attempts to write-protect at a level that cannot use leaf SPTEs. There is no reason to bring down the entire host; even terminating the VM is likely overkill, but in theory a missed write could corrupt guest memory, so play it safe.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20260618185641.2022368-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
10 days | KVM: x86: Replace BUG_ON() with WARN_ON_ONCE() on "bad" nested GPA translation | Sean Christopherson
If KVM attempts to translate what it thinks is an L2 GPA with a non-nested MMU, simply WARN and return the GPA, i.e. trust the MMU more than the caller, as there is zero reason to potentially panic the host kernel just because KVM misused an API. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Message-ID: <20260618185746.2023283-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
10 days | KVM: Replace guest-triggerable BUG_ON() in ioeventfd datamatch with get_unaligned() | Sean Christopherson
Drop a BUG_ON() that has been reachable since it was first added, way back in 2009, and instead use get_unaligned() to perform potentially-unaligned accesses.
For a given store, KVM x86's emulator tracks the entire value in the destination operand, x86_emulate_ctxt.dst. If the destination is memory, and the target splits multiple pages and/or is emulated MMIO, then KVM handles each fragment independently. E.g. on a page split starting at page offset 0xffc, KVM writes 4 bytes to the first page, then the remaining bytes to the second page, using ctxt->dst as the source for both (with appropriate offsets).
If the destination splits a page *and* hits emulated MMIO on the second page, then KVM will complete the write to the first page, then emulate the MMIO access to the second page. If there is a datamatch-enabled ioeventfd at offset 0 of the second page, then KVM will process the remainder of the store as a potential ioeventfd signal.
Putting it all together, if the guest emits a store that splits a page starting at page offset N, and the second page has a datamatch-enabled ioeventfd at offset 0, then KVM will check for datamatch using &dst.valptr[N] as the source. Due to dst (and thus dst.valptr) being 32-byte aligned, if N is not aligned to @len, the BUG_ON() fires. E.g. with a 16-byte store at page offset 0xffc, to an ioeventfd of len 8, all initial checks in ioeventfd_in_range() will succeed, and the BUG_ON() fires due to @val being 4-byte aligned, but not 8-byte aligned.
  ------------[ cut here ]------------
  kernel BUG at arch/x86/kvm/../../../virt/kvm/eventfd.c:783!
  Oops: invalid opcode: 0000 [#1] SMP
  CPU: 0 UID: 1000 PID: 615 Comm: repro Not tainted 7.1.0-rc2-ff238429d1ea #365 PREEMPT
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:ioeventfd_write+0x6c/0x70 [kvm]
  Call Trace:
   <TASK>
   __kvm_io_bus_write+0x85/0xb0 [kvm]
   kvm_io_bus_write+0x53/0x80 [kvm]
   vcpu_mmio_write+0x66/0xf0 [kvm]
   emulator_read_write_onepage+0x12a/0x540 [kvm]
   emulator_read_write+0x109/0x2b0 [kvm]
   x86_emulate_insn+0x4f8/0xfb0 [kvm]
   x86_emulate_instruction+0x181/0x790 [kvm]
   kvm_mmu_page_fault+0x313/0x630 [kvm]
   vmx_handle_exit+0x18a/0x590 [kvm_intel]
   kvm_arch_vcpu_ioctl_run+0xc81/0x1c90 [kvm]
   kvm_vcpu_ioctl+0x2d5/0x970 [kvm]
   __x64_sys_ioctl+0x8a/0xd0
   do_syscall_64+0xb7/0x890
   entry_SYSCALL_64_after_hwframe+0x4b/0x53
  RIP: 0033:0x7f19c931a9bf
   </TASK>
  Modules linked in: kvm_intel kvm irqbypass
  ---[ end trace 0000000000000000 ]---
In a perfect world, the fix would be to simply delete the BUG_ON(), as KVM x86 doesn't perform alignment checks on "normal" memory accesses at CPL0. Sadly, C99 ruins all the fun; while the x86 architecture plays nice, dereferencing an unaligned pointer directly is undefined behavior in C, e.g. triggers splats when running with CONFIG_UBSAN_ALIGNMENT=y.
Fixes: d34e6b175e61 ("KVM: add ioeventfd support")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260612225241.678509-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
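As a rough userspace analogue of the approach described above, an unaligned load can be expressed with memcpy(), which has no alignment requirement on its source. This is a sketch only: `load_unaligned()` and `datamatch_hit()` are hypothetical stand-ins, not the kernel's get_unaligned() or the actual ioeventfd code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 1/2/4/8-byte value from a possibly misaligned address.
 * Copying into a properly typed local avoids dereferencing an
 * unaligned pointer, which is undefined behavior in C.
 */
static uint64_t load_unaligned(const void *p, size_t len)
{
    uint8_t v8;
    uint16_t v16;
    uint32_t v32;
    uint64_t v64;

    assert(len == 1 || len == 2 || len == 4 || len == 8);

    switch (len) {
    case 1: memcpy(&v8, p, sizeof(v8)); return v8;
    case 2: memcpy(&v16, p, sizeof(v16)); return v16;
    case 4: memcpy(&v32, p, sizeof(v32)); return v32;
    case 8: memcpy(&v64, p, sizeof(v64)); return v64;
    default: return 0;  /* unreachable when the assertion holds */
    }
}

/* Hypothetical datamatch check: compare the stored value, aligned or not. */
static bool datamatch_hit(const void *val, size_t len, uint64_t datamatch)
{
    return load_unaligned(val, len) == datamatch;
}
```

With this shape, a source pointer such as `buf + 4` for an 8-byte compare is handled the same way as an aligned one, so there is no alignment precondition left to assert on.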
10 days | ALSA: seq: Fix uninitialised heap leak in snd_seq_event_dup() | HyeongJun An
snd_seq_event_dup() copies an incoming event into a pool cell and, in the UMP-enabled build, clears the trailing cell->ump.raw.extra word that the memcpy() did not cover. The guard deciding whether to clear it compares the copied size against sizeof(cell->event):
    memcpy(&cell->ump, event, size);
    if (size < sizeof(cell->event))
        cell->ump.raw.extra = 0;
For a legacy (non-UMP) event, size == sizeof(struct snd_seq_event) == sizeof(cell->event), so the condition is false and the extra word keeps stale data. The cell pool is allocated with kvmalloc() (not zeroed) and cells are reused via a free list, so that word holds uninitialised heap or leftover event data.
When such a cell is delivered to a UMP client (client->midi_version > 0) that set SNDRV_SEQ_FILTER_NO_CONVERT -- so the legacy event reaches it unconverted -- snd_seq_read() reads it out as the larger struct snd_seq_ump_event and copies the stale word to user space, a 4-byte kernel heap infoleak to an unprivileged /dev/snd/seq client.
Compare against sizeof(cell->ump) instead, so the trailing word is zeroed for every event shorter than the UMP cell.
Fixes: 46397622a3fa ("ALSA: seq: Add UMP support")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: HyeongJun An <sammiee5311@gmail.com>
Link: https://patch.msgid.link/20260623233841.853326-1-sammiee5311@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
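The size-guard bug described above can be reproduced in miniature. The structures below are hypothetical simplifications (a 28-byte legacy event and a 32-byte UMP event with one trailing word), not the real ALSA definitions, and `dup_old_guard()` / `dup_new_guard()` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts: the UMP form is one 32-bit word longer. */
struct legacy_event { uint32_t words[7]; };
struct ump_event { uint32_t words[7]; uint32_t extra; };

union cell {
    struct legacy_event event;
    struct ump_event ump;
};

/* Old guard: compares against the legacy size, so a full-size legacy
 * event never clears the trailing word. */
static void dup_old_guard(union cell *cell, const void *event, size_t size)
{
    assert(size <= sizeof(cell->ump));
    memcpy(&cell->ump, event, size);
    if (size < sizeof(cell->event))
        cell->ump.extra = 0;
}

/* New guard: compares against the UMP size, so anything shorter than
 * the full cell gets its trailing word zeroed. */
static void dup_new_guard(union cell *cell, const void *event, size_t size)
{
    assert(size <= sizeof(cell->ump));
    memcpy(&cell->ump, event, size);
    if (size < sizeof(cell->ump))
        cell->ump.extra = 0;
}
```

Pre-filling a cell with a pattern such as 0xAA before calling each function stands in for a recycled, non-zeroed pool cell: with the old guard the pattern survives in `extra`, while with the new guard it reads back as zero.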
10 days | KVM: s390: Return failure in case of failure in kvm_s390_set_cmma_bits() | Claudio Imbrenda
If the allocation of the bits array failed, kvm_s390_set_cmma_bits() would return 0 instead of an error code. Rework the function to use the __free() macros and thus simplify the code flow; when the above mentioned allocation fails, simply return -ENOMEM. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260623153331.233784-10-imbrenda@linux.ibm.com>
10 days | KVM: s390: selftests: Fix cmma selftest | Claudio Imbrenda
The existing cmma selftest depended on the host allocating page tables for all present memslots. Since the gmap rewrite, memory that is not accessed by the guest might not have page tables allocated yet. This caused the test to fail due to a mismatch in the assertion. Fix by having the guest access also the second half of the test memslot, thus guaranteeing that its page tables are present. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260623153331.233784-9-imbrenda@linux.ibm.com>
10 days | KVM: s390: Fix cmma dirty tracking | Claudio Imbrenda
It is possible that some guest memory areas have not been touched yet when starting migration mode, and thus have no ptes allocated. Only existing and allocated ptes should count toward the total of dirty cmma entries. When starting migration mode, enable the migration_mode flag immediately, so that any subsequent ESSA will trap in the host and cause cmma_dirty_pages to be increased as needed. Subsequently, set the cmma_d bit on all existing cmma-clean PGSTEs, increasing cmma_dirty_pages as needed. Skipping cmma-dirty pages prevents double counting. Conversely, when disabling migration mode, set cmma_dirty_pages to 0 and clear the cmma_d bit in all existing PGSTEs. The invariant is that when migration mode is off, no PGSTE has its cmma_d bit set, and cmma_dirty_pages is 0. kvm->slots_lock protects kvm_s390_vm_start_migration() and kvm_s390_vm_stop_migration() from each other and from kvm_s390_get_cmma_bits(). Also fix dat_get_cmma() to properly wrap around if the first attempt reached the end of guest memory without finding cmma-dirty pages. [ imbrenda: Moved kvm_s390_sync_request_broadcast() before gmap_set_cmma_all_dirty() ] Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260623153331.233784-8-imbrenda@linux.ibm.com>
10 days | KVM: s390: Fix locking in kvm_s390_set_mem_control() | Claudio Imbrenda
Add the missing locking around dat_reset_cmma(). Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260623153331.233784-7-imbrenda@linux.ibm.com>
10 days | KVM: s390: Fix handle_{sske,pfmf} under memory pressure | Claudio Imbrenda
Under heavy memory pressure, handle_sske() and handle_pfmf() might cause an endless loop if the mmu cache runs empty, the atomic allocations fail, and the top-up function also fails. While quite unlikely, that scenario is not impossible. Fix the issue by not ignoring the return value of kvm_s390_mmu_cache_topup(), and appropriately returning an error code in case of failure. Fixes: e38c884df921 ("KVM: s390: Switch to new gmap") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260623153331.233784-6-imbrenda@linux.ibm.com>
10 days | KVM: s390: Fix code typo in gmap_protect_asce_top_level() | Claudio Imbrenda
The correct length to pass to kvm_s390_get_guest_pages() is asce.tl + 1, not asce.dt + 1. It was a typo, which, due to fortuitous circumstances, did not cause bugs. It should nonetheless be fixed. Fixes: e5f98a6899bd ("KVM: s390: Add some helper functions needed for vSIE") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260623153331.233784-5-imbrenda@linux.ibm.com>
10 days | KVM: s390: Do not set special large pages dirty | Claudio Imbrenda
Special pages / folios should not be set dirty. This also applies to large pages. Add a missing check in gmap_clear_young_crste() to prevent setting the large page dirty if it is a special page. Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260623153331.233784-4-imbrenda@linux.ibm.com>
10 days | KVM: s390: Fix dat_peek_cmma() overflow | Claudio Imbrenda
If userspace passes a start address that is out of bounds, _dat_walk_gfn_range() will fail with -EFAULT, but state.end will not be touched and will stay 0. This will cause *count to underflow and report a very high number, and the function will end up erroneously reporting success. Fix by only setting *count if the end address is not smaller than the starting address. This way invalid starting addresses will correctly return -EFAULT and *count will correctly indicate that no values have been returned. Fixes: 7b368470e1a4 ("KVM: s390: KVM page table management functions: CMMA") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260623153331.233784-3-imbrenda@linux.ibm.com>
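The underflow described above is ordinary unsigned wraparound. The following sketch shows the arithmetic only; `count_unchecked()` and `count_checked()` are hypothetical helpers, and the assumption that the count is computed as end minus start comes from the commit message, not from the actual function body.

```c
#include <assert.h>
#include <stdint.h>

/* Unchecked: if the walk failed and end stayed 0, this wraps around. */
static uint64_t count_unchecked(uint64_t start, uint64_t end)
{
    return end - start;
}

/* Checked: only report a count when end is not smaller than start. */
static uint64_t count_checked(uint64_t start, uint64_t end)
{
    return end >= start ? end - start : 0;
}

static void cmma_count_example(void)
{
    /* Failed walk: start is out of bounds, end was never updated. */
    assert(count_unchecked(0x1000, 0) == 0xFFFFFFFFFFFFF000ULL);
    assert(count_checked(0x1000, 0) == 0);

    /* Successful walk: both variants agree. */
    assert(count_unchecked(0x10, 0x18) == 8);
    assert(count_checked(0x10, 0x18) == 8);
}
```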
10 days | s390/mm: Fix handling of _PAGE_UNUSED pte bit | Claudio Imbrenda
The _PAGE_UNUSED softbit should not really be lying around. Its sole purpose is to signal to try_to_unmap_one() and try_to_migrate_one() that the page can be discarded instead of being moved / swapped. KVM has no way to know why a page is being unmapped, so it sets the bit on userspace ptes corresponding to unused guest pages every time they get unmapped. KVM has no reasonable way to clear the bit once the page is in use again. While set_ptes() checks and clears the bit, other paths that set new ptes did not. This led to used pages being thrown out as if they were unused, causing guest corruption. Fix the issue by clearing the _PAGE_UNUSED bit for present ptes in set_pte(), i.e. whenever a present pte is getting set. The check in set_ptes() is then redundant and can be removed. Also fix gmap_helper_try_set_pte_unused() to only set the bit if the pte is present; the _PAGE_UNUSED bit is only defined for present ptes and thus should not be set for non-present ptes. Fixes: c98175b7917f ("KVM: s390: Add gmap_helper_set_unused()") Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260623153331.233784-2-imbrenda@linux.ibm.com>
10 days | KVM: s390: Fix typo in UCONTROL documentation | Eric Farman
Small typo noticed while writing the USER_OPEREXEC selftest. Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260507200836.3500368-4-farman@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
10 days | KVM: s390: selftests: Extended user_operexec tests | Eric Farman
There is a possibility that the user_operexec capability only works if facility bit 74 is enabled. This is now fixed, but add a selftest to demonstrate that. Signed-off-by: Eric Farman <farman@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260507200836.3500368-3-farman@linux.ibm.com>
10 days | KVM: s390: Fix S390_USER_OPEREXEC enablement without STFLE 74 | Eric Farman
The KVM_CAP_S390_USER_OPEREXEC capability allows operation exceptions to be forwarded to userspace. But the actual enablement at the hardware level occurs in kvm_arch_vcpu_postcreate(), and only if STFLE.74 or user_instr0 are enabled. The latter is associated with a separate capability (KVM_CAP_S390_USER_INSTR0), so the only way this happens for the USER_OPEREXEC capability is if STFLE.74 is enabled. KVM unconditionally enables this bit in kvm_arch_init_vm(), but the guest could disable it from the CPU model and thus ignore this capability. Add USER_OPEREXEC to the check in kvm_arch_vcpu_postcreate(), such that either capability would enable this type of exception. Fixes: 8e8678e740ec ("KVM: s390: Add capability that forwards operation exceptions") Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> [Fixed patch title, as recommended by frankja@linux.ibm.com] Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260507200836.3500368-2-farman@linux.ibm.com>
10 days | apparmor: advertise the tcp fast open fix is applied | John Johansen
The fix for tcp-fast-open ensures that the connect permission is being mediated correctly but it didn't add an artifact to the feature set to advertise the fix is available. Add an artifact so that the test suite can identify if the fix has not been properly applied or a new unexpected regression has occurred. Fixes: 4d587cd8a7215 ("apparmor: mediate the implicit connect of TCP fast open sendmsg") Signed-off-by: John Johansen <john.johansen@canonical.com>
10 days | eth: fbnic: fix ordering of heartbeat vs ownership | Jakub Kicinski
When requesting ownership of the NIC (MAC/PHY control), we set up the heartbeat to look stale:
    /* Initialize heartbeat, set last response to 1 second in the past
     * so that we will trigger a timeout if the firmware doesn't respond
     */
    fbd->last_heartbeat_response = req_time - HZ;
    fbd->last_heartbeat_request = req_time;
The response handler then sets:
    fbd->last_heartbeat_response = jiffies;
for which we wait via:
    fbnic_fw_init_heartbeat() -> fbnic_fw_heartbeat_current()
The scheme is a bit odd, but it should work in principle.
Fix the ordering of operations. We have to set up the stale heartbeat before we send the message. Otherwise if the response is very fast we will override it. This triggers on QEMU if we run on the core that handles the IRQ, and results in ndo_open failing with ETIMEDOUT.
The change in ordering doesn't impact releasing the ownership. Both ndo_stop and heartbeat check are under rtnl_lock.
Fixes: 20d2e88cc746 ("eth: fbnic: Add initial messaging to notify FW of our presence")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260622154753.827506-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
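The ordering problem can be modelled sequentially in userspace. Everything here is a simplified assumption: `struct fw_state`, `TICKS_PER_SEC` (a stand-in for HZ), the `fast_response` flag that simulates the firmware answering before the sender continues, and the comparison in `heartbeat_current()` are all hypothetical and do not mirror the driver's real code.

```c
#include <assert.h>
#include <stdbool.h>

#define TICKS_PER_SEC 100UL  /* stand-in for HZ */

struct fw_state {
    unsigned long last_heartbeat_request;
    unsigned long last_heartbeat_response;
};

/* Response handler: record when the firmware answered. */
static void fw_response(struct fw_state *fw, unsigned long now)
{
    fw->last_heartbeat_response = now;
}

/* Make the heartbeat look one second stale relative to the request. */
static void mark_stale(struct fw_state *fw, unsigned long req_time)
{
    assert(req_time >= TICKS_PER_SEC);
    fw->last_heartbeat_response = req_time - TICKS_PER_SEC;
    fw->last_heartbeat_request = req_time;
}

/* Old order: send first. A very fast response is then overwritten. */
static void request_send_first(struct fw_state *fw, unsigned long now,
                               bool fast_response)
{
    if (fast_response)
        fw_response(fw, now);  /* firmware answered immediately */
    mark_stale(fw, now);       /* clobbers the response just recorded */
}

/* New order: mark stale first, so any response lands afterwards. */
static void request_stale_first(struct fw_state *fw, unsigned long now,
                                bool fast_response)
{
    mark_stale(fw, now);
    if (fast_response)
        fw_response(fw, now);
}

/* Hypothetical freshness test: response is not older than the request. */
static bool heartbeat_current(const struct fw_state *fw)
{
    return fw->last_heartbeat_response >= fw->last_heartbeat_request;
}
```

In this model, a fast response with the old order leaves the heartbeat looking stale even though the firmware did answer, which matches the timeout symptom the commit message describes.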
10 days | Merge branch 'ipv6-fix-error-handling-in-disable_ipv6-sysctl' | Jakub Kicinski
Fernando Fernandez Mancera says:
====================
ipv6: fix error handling in disable_ipv6 sysctl
While working on a different IPv6 patch series I spotted multiple minor bugs around sysctl error handling and notifications. In general, they are not serious issues.
In addition, there is one more issue in the forwarding sysctl, as it does not check for CAP_NET_ADMIN for the namespace. I am keeping that patch out of this series and I am aiming it at the net-next tree once it re-opens.
During v3, Ido pointed out that it is unnecessary to reset the position pointer when the return value is negative, as new_sync_write() only advances ppos when the return value is positive. That means we can get rid of that operation in the ipv4/ipv6 sysctls. That is going to be sent to net-next too.
====================
Link: https://patch.msgid.link/20260622130857.5115-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days | ipv6: fix missing notification for ignore_routes_with_linkdown | Fernando Fernandez Mancera
When changing the ignore_routes_with_linkdown sysctl for a specific interface, the RTM_NEWNETCONF netlink notification was not being emitted to userspace. Fix this by emitting the notification when needed. In addition, fix bogus return value for successful "all" and specific interface write operation leading to a wrong reset of the position pointer. Fixes: 35103d11173b ("net: ipv6 sysctl option to ignore routes when nexthop link is down") Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260622130857.5115-7-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days | ipv6: fix state corruption during proxy_ndp sysctl restart | Fernando Fernandez Mancera
When handling proxy_ndp, if rtnl_net_trylock() fails, the operation is retried but as the value was already modified by the initial proc_dointvec() call, the restarted syscall will read the newly modified value as the 'old' state. Fix this by taking the RTNL lock before parsing the input value if the operation is a write. Fixes: c92d5491a6d9 ("netconf: add support for IPv6 proxy_ndp") Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260622130857.5115-6-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
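The restart problem can be modelled with a toy handler. This is a sketch under stated assumptions: `struct ndp_ctl`, `SKETCH_RESTART`, the `lock_ok` flag (standing in for the trylock result), and the `changes_seen` counter (standing in for whatever work depends on detecting a change) are all hypothetical and not taken from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RESTART (-1)  /* hypothetical "retry the whole write" code */

struct ndp_ctl {
    int value;         /* the sysctl value */
    int changes_seen;  /* how many old != new transitions were detected */
};

/* Old order: store the new value, then try the lock. On restart the
 * already-modified value is read back as the "old" state. */
static int write_parse_first(struct ndp_ctl *c, int new_val, bool lock_ok)
{
    int old;

    assert(c);
    old = c->value;
    c->value = new_val;        /* stand-in for parsing into the variable */
    if (!lock_ok)
        return SKETCH_RESTART;
    if (old != c->value)
        c->changes_seen++;
    return 0;
}

/* New order: take the lock before touching the value. */
static int write_lock_first(struct ndp_ctl *c, int new_val, bool lock_ok)
{
    int old;

    assert(c);
    if (!lock_ok)
        return SKETCH_RESTART;  /* nothing has been modified yet */
    old = c->value;
    c->value = new_val;
    if (old != c->value)
        c->changes_seen++;
    return 0;
}
```

Calling each variant once with `lock_ok == false` and then again with `lock_ok == true` shows the difference: the first variant never notices the 0 to 1 transition, while the second does.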
10 days | ipv6: fix error handling in disable_policy sysctl | Fernando Fernandez Mancera
When writing to the disable_policy sysctl, if proc_dointvec() fails to parse the input, it returns a negative error code. The current implementation is resetting the position argument even if an error occurred during proc_dointvec() and not only during sysctl restart. Fix this by checking the return value of proc_dointvec() and returning early on failure. Fixes: df789fe75206 ("ipv6: Provide ipv6 version of "disable_policy" sysctl") Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260622130857.5115-5-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days | ipv6: fix error handling in forwarding sysctl | Fernando Fernandez Mancera
When writing to the forwarding sysctl, if proc_dointvec() fails to parse the input, it returns a negative error code. The current implementation is overwriting that error for write operations. This results in a silent failure, it returns a successful write although the configuration was not modified at all. When modifying the "all" variant it can also modify the configuration of existing interfaces to the wrong value. Fix this by checking the return value of proc_dointvec() and returning early on failure. In addition, adjust return code of addrconf_fixup_forwarding() for successful operation. Fixes: b325fddb7f86 ("ipv6: Fix sysctl unregistration deadlock") Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260622130857.5115-4-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 days | ipv6: fix error handling in ignore_routes_with_linkdown sysctl | Fernando Fernandez Mancera
When writing to the ignore_routes_with_linkdown sysctl, if proc_dointvec() fails to parse the input, it returns a negative error code. The current implementation is overwriting that error for write operations. This results in a silent failure, it returns a successful write although the configuration was not modified at all. When modifying the "all" variant it can also modify the configuration of existing interfaces to the wrong value. Fix this by checking the return value of proc_dointvec() and returning early on failure. Fixes: 35103d11173b ("net: ipv6 sysctl option to ignore routes when nexthop link is down") Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260622130857.5115-3-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysipv6: fix error handling in disable_ipv6 sysctlFernando Fernandez Mancera
When writing to the disable_ipv6 sysctl, if proc_dointvec() fails to parse the input, it returns a negative error code. The current implementation overwrites that error for write operations. This results in a silent failure: the write is reported as successful although the configuration was not modified at all. When modifying the "all" variant, it can also set the configuration of existing interfaces to the wrong value. Fix this by checking the return value of proc_dointvec() and returning early on failure. Fixes: 56d417b12e57 ("IPv6: Add 'autoconf' and 'disable_ipv6' module parameters") Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260622130857.5115-2-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
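The three sysctl fixes above share one shape: a parse error gets overwritten before the handler returns. The following is a minimal userspace sketch of that shape, not the kernel code. parse_int() is a hypothetical stand-in for proc_dointvec(), and both handler names are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for proc_dointvec(): parse a decimal int and
 * return 0 on success or -EINVAL on malformed or out-of-range input. */
static int parse_int(const char *buf, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(buf, &end, 10);
    if (errno || end == buf || *end != '\0' || v < INT_MIN || v > INT_MAX)
        return -EINVAL;
    *out = (int)v;
    return 0;
}

/* Buggy shape (hypothetical): the parse error is overwritten, so the
 * caller sees success even though nothing valid was written. */
static int write_handler_buggy(const char *buf, int *conf)
{
    int val = *conf;
    int ret = parse_int(buf, &val);

    ret = 0;        /* the error from parse_int() is lost here */
    *conf = val;
    return ret;
}

/* Fixed shape (hypothetical): check the return value and bail out
 * early, leaving the configuration untouched on failure. */
static int write_handler_fixed(const char *buf, int *conf)
{
    int val = *conf;
    int ret = parse_int(buf, &val);

    if (ret)
        return ret;
    *conf = val;
    return 0;
}
```

With input such as "abc", the buggy handler returns 0 while the fixed one returns -EINVAL. In both cases the stored value stays as it was.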
10 daysMAINTAINERS: Orphan SUNPLUS ETHERNET DRIVERWells Lu
I have left Sunplus and no longer have access to the relevant hardware to test or maintain this driver. Mark the driver as orphaned. Signed-off-by: Wells Lu <wellslutw@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260622180721.28334-1-wellslutw@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysnet: au1000: move free_irq out of the close-time spinlocked sectionRunyu Xiao
au1000_close() calls free_irq() while aup->lock is still held with spin_lock_irqsave(). free_irq() can sleep because it takes the IRQ descriptor request mutex, so it does not belong inside the close-time spinlocked section. This was found by our static analysis tool and then confirmed by manual review of the in-tree au1000_close() .ndo_stop path. The reviewed path keeps aup->lock held across the MAC reset, queue stop and free_irq(dev->irq, dev). A directed runtime validation kept that ndo_stop carrier and the same free_irq(dev->irq, dev) operation under the driver lock. Lockdep reported "BUG: sleeping function called from invalid context" and "Invalid wait context" while free_irq() was taking desc->request_mutex, with au1000_close() and free_irq() on the stack. Drop aup->lock before freeing the IRQ. The protected close-time work still stops the device and queue before IRQ teardown, but the sleepable IRQ core path now runs outside the spinlocked section. Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260619151816.1144289-1-runyu.xiao@seu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 dayssctp: fix err_chunk memory leaks in INIT handlingXin Long
When sctp_verify_init() encounters unrecognized parameters, it allocates an err_chunk to report them. However, this chunk is leaked in several code paths: 1. In sctp_sf_do_5_1B_init(), if security_sctp_assoc_request() fails after sctp_verify_init() has populated err_chunk, the function returns immediately without freeing it. 2. In sctp_sf_do_unexpected_init(), the same leak occurs on the security_sctp_assoc_request() failure path. 3. In sctp_sf_do_unexpected_init(), on the success path after copying unrecognized parameters to the INIT-ACK, the function returns without freeing err_chunk, unlike sctp_sf_do_5_1B_init() which properly frees it. Fix all three leaks by adding sctp_chunk_free(err_chunk) calls before returning in the error paths and on the success path in sctp_sf_do_unexpected_init(). Fixes: c081d53f97a1 ("security: pass asoc to sctp_assoc_request and sctp_sk_clone") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Sashiko <sashiko-bot@kernel.org> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/0656704f1b0158287c98aec09ba36c83e4a537ab.1781970534.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
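The leak pattern described above can be sketched in userspace with an allocation counter. This is only an illustration under assumptions: struct chunk, chunk_alloc(), chunk_free(), verify_init() and handle_init() are hypothetical names, not the SCTP functions.

```c
#include <assert.h>
#include <stdlib.h>

static int live_chunks;   /* number of chunks currently allocated */

struct chunk { int unused; };

static struct chunk *chunk_alloc(void)
{
    struct chunk *c = malloc(sizeof(*c));

    if (c)
        live_chunks++;
    return c;
}

static void chunk_free(struct chunk *c)
{
    if (c) {
        live_chunks--;
        free(c);
    }
}

/* Hypothetical verify step: allocates an error chunk when the input
 * carries unrecognized parameters, otherwise leaves it NULL. */
static int verify_init(int has_unknown_params, struct chunk **err_chunk)
{
    *err_chunk = has_unknown_params ? chunk_alloc() : NULL;
    return 1;
}

/* Hypothetical handler: every return after verify_init() frees
 * err_chunk, including the early return on a failed security check. */
static int handle_init(int has_unknown_params, int security_ok)
{
    struct chunk *err_chunk;

    if (!verify_init(has_unknown_params, &err_chunk))
        return -1;

    if (!security_ok) {
        chunk_free(err_chunk);   /* freed on the error path */
        return -1;
    }

    /* ... unrecognized parameters would be copied into the reply ... */
    chunk_free(err_chunk);       /* freed on the success path */
    return 0;
}
```

After any call to handle_init(), live_chunks should be back at zero. Removing either chunk_free() call reproduces the kind of leak the commit describes.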
10 daysnet/sched: cls_api: Handle TC_ACT_CONSUMED in tcf_qevent_handleJamal Hadi Salim
tcf_classify() can return TC_ACT_CONSUMED while the skb is held by the defragmentation engine (e.g. act_ct on out-of-order fragments). When that happens the skb is no longer owned by the caller and must not be touched again. tcf_qevent_handle() did not handle TC_ACT_CONSUMED: it fell through the switch and returned the skb to the caller as if classification had passed. The only qdisc that wires up qevents today is RED, via three call sites (qe_mark on RED_PROB_MARK/HARD_MARK, qe_early_drop on congestion_drop). In this case red_enqueue() continued to operate on an skb it no longer owned (enqueueing it, dropping it, or updating statistics), resulting in a UAF.

  tc qdisc add dev eth0 root handle 1: red ... qevent early_drop block 10
  tc filter add block 10 ... action ct

(with ct defrag enabled and traffic that produces out-of-order fragments, e.g. a fragmented UDP stream)

Handle TC_ACT_CONSUMED in tcf_qevent_handle() the same way the ingress and egress fast paths do: treat it as stolen and return NULL without touching the skb. Unlike the TC_ACT_STOLEN case, the skb must not be dropped/freed here, as it is no longer owned by us.

Fixes: 3f14b377d01d ("net/sched: act_ct: fix skb leak and crash on ooo frags")
Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260620130749.226642-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
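The ownership rule in this fix can be sketched as a small switch. This is a rough userspace model under assumptions: the ex_ names and enum values are hypothetical and do not match the kernel's TC_ACT_* constants or the real tcf_qevent_handle() body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical verdicts; values are illustrative only. */
enum ex_verdict { EX_ACT_OK, EX_ACT_SHOT, EX_ACT_STOLEN, EX_ACT_CONSUMED };

/* Hypothetical packet; the flag records whether the handler released it. */
struct ex_pkt { int freed_by_handler; };

/* Returns the packet if the caller may keep using it, NULL otherwise. */
static struct ex_pkt *ex_qevent_handle(struct ex_pkt *pkt, enum ex_verdict v)
{
    switch (v) {
    case EX_ACT_SHOT:
    case EX_ACT_STOLEN:
        /* The handler still owns the packet here, so it releases it. */
        pkt->freed_by_handler = 1;
        return NULL;
    case EX_ACT_CONSUMED:
        /* Ownership already moved elsewhere: return NULL and do not
         * touch the packet at all. */
        return NULL;
    default:
        return pkt;
    }
}
```

The point of the separate EX_ACT_CONSUMED case is that it returns NULL like the stolen case but leaves the packet untouched. Without it, the verdict falls into the default branch and the caller keeps using a packet it no longer owns.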
10 daysMerge branch 'drop-skb-metadata-before-lwt-encapsulation'Jakub Kicinski
Jakub Sitnicki says: ==================== Drop skb metadata before LWT encapsulation See description for patch 1. ==================== Link: https://patch.msgid.link/20260619-bpf-lwt-drop-skb-metadata-v3-0-71d6a33ab76b@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysselftests/bpf: Add LWT encap tests for skb metadataJakub Sitnicki
Test that an LWT encapsulation does not silently corrupt XDP metadata sitting in the skb headroom. Exercise all three LWT dispatch paths: - BPF LWT xmit prog reserves headroom on the LWT .xmit redirect, - mpls pushes an MPLS label on the LWT .xmit redirect, - seg6 in encap mode runs on the LWT .input redirect, - ioam6 encap inserts an IOAM Hop-by-Hop option on LWT .output redirect. Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://patch.msgid.link/20260619-bpf-lwt-drop-skb-metadata-v3-2-71d6a33ab76b@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
10 daysnet: lwtunnel: Drop skb metadata before LWT encapsulationJakub Sitnicki
skb metadata is meant for passing information between XDP and TC. It lives in the skb headroom, immediately before skb->data. LWT programs cannot access the __sk_buff->data_meta pseudo-pointer to metadata. However, LWT encapsulation prepends outer headers, moving skb->data back over the headroom where the metadata sits.

On an RX-originated (forwarded) packet that still carries XDP metadata this goes wrong in two different ways, depending on the encap type:

1. Non-BPF LWT encaps (mpls, seg6, ioam6 ...) call skb_push()/skb_pull() and silently overwrite the metadata that sits in the headroom.

2. BPF LWT xmit calls bpf_skb_change_head(), which uses skb_data_move(). That helper expects metadata immediately before skb->data. But since the IP output path runs LWT xmit before neighbour output has built the outgoing L2 header, for forwarded packets skb->data points at the L3 header while skb_mac_header() still points at the old L2 header. skb_data_move() sees metadata ending at skb_mac_header(), not before skb->data, warns and clears metadata:

  WARNING: CPU: 21 PID: 454557 at include/linux/skbuff.h:4609 skb_data_move+0x47/0x90
  CPU: 21 UID: 0 PID: 454557 Comm: napi/iconduit-g Tainted: G O 6.18.21 #1
  RIP: 0010:skb_data_move+0x47/0x90
  Call Trace:
   <IRQ>
   bpf_skb_change_head+0xe6/0x1a0
   bpf_prog_...+0x213/0x2e3
   run_lwt_bpf.isra.0+0x1d3/0x360
   bpf_xmit+0x46/0xe0
   lwtunnel_xmit+0xa1/0xf0
   ip_finish_output2+0x1e7/0x5e0
   ip_output+0x63/0x100
   __netif_receive_skb_one_core+0x85/0xa0
   process_backlog+0x9c/0x150
   __napi_poll+0x2b/0x190
   net_rx_action+0x40b/0x7f0
   handle_softirqs+0xd2/0x270
   do_softirq+0x3f/0x60
   </IRQ>

That is what happens. As for how to fix it: a received packet that carries metadata can reach an encap through any of the three LWT redirect modes:

  LWTUNNEL_STATE_INPUT_REDIRECT
    ip6_rcv_finish
    dst_input
    lwtunnel_input

  LWTUNNEL_STATE_OUTPUT_REDIRECT
    ip6_rcv_finish
    dst_input
    ip6_forward
    ip6_forward_finish
    dst_output
    lwtunnel_output

  LWTUNNEL_STATE_XMIT_REDIRECT
    ip6_rcv_finish
    dst_input
    ip6_forward
    ip6_forward_finish
    dst_output
    ip6_output
    ip6_finish_output
    ip6_finish_output2
    lwtunnel_xmit

Every encap funnels through the three LWT dispatch helpers, so drop the metadata there, right before handing the skb to the encap op. This single chokepoint covers all encap types and all three redirect modes:

- lwtunnel_input(): seg6, rpl, ila, seg6_local
- lwtunnel_output(): ioam6
- lwtunnel_xmit(): mpls, LWT BPF xmit

Alternatively, we could clear the metadata right after the TC ingress hook. That would require a compromise, however: metadata would become inaccessible from TC egress (in setups where it actually reaches the hook intact, that is, without any L2 tunnels on the path).

Fixes: 8989d328dfe7 ("net: Helper to move packet data and metadata after skb_push/pull")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20260619-bpf-lwt-drop-skb-metadata-v3-1-71d6a33ab76b@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
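The idea of dropping metadata before an encap prepends headers can be modelled with a plain buffer. This is a simplified sketch, assuming a hypothetical struct ex_buf in which metadata occupies the bytes just before the data offset. It is not the skb layout or the kernel helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer: packet data starts at room[data_off] and
 * metadata sits in room[data_off - meta_len .. data_off - 1]. */
struct ex_buf {
    unsigned char room[64];
    size_t data_off;
    size_t meta_len;
};

/* Forget the metadata so later readers never see clobbered bytes. */
static void ex_meta_drop(struct ex_buf *b)
{
    b->meta_len = 0;
}

/* Prepend an outer header of len bytes. The header is written over the
 * headroom where metadata used to live, so metadata is dropped first.
 * Returns 0 on success, -1 if there is not enough headroom. */
static int ex_encap(struct ex_buf *b, const void *hdr, size_t len)
{
    if (len > b->data_off)
        return -1;
    ex_meta_drop(b);
    b->data_off -= len;
    memcpy(b->room + b->data_off, hdr, len);
    return 0;
}
```

Calling ex_meta_drop() inside the one function every encap goes through mirrors the chokepoint argument in the commit message: individual encap types do not each need to know about metadata.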
10 daysMerge tag 'nfs-for-7.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New features: - XPRTRDMA: Decouple req recycling from RPC completion - NFS: Expose FMODE_NOWAIT for read-only files Bugfixes: - SUNRPC: - Fix sunrpc sysfs error handling - Fix uninitialized xprt_create_args structure - XPRTRDMA: - Harden connect and reply handling - NFS: - Fix EOF updates after fallocate/zero-range - Keep PG_UPTODATE clear after read errors in page groups - Use nfsi->rwsem to protect traversal of the file lock list - Prevent resource leak in nfs_alloc_server() - NFSv4: - Clear exception state on successful mkdir retry - Don't skip revalidate when holding a dir delegation and attrs are stale - pNFS: - Fix use-after-free in pnfs_update_layout() - Defer return_range callbacks until after inode unlock - Fix LAYOUTCOMMIT retry loop on OLD_STATEID - Reject zero-length r_addr in nfs4_decode_mp_ds_addr - NFS/flexfiles: - Reject zero-length filehandle version arrays - Fix checking if a layout is striped - Fixes for honoring FF_FLAGS_NO_IO_THRU_MDS Other cleanups and improvements: - Remove the fileid field from struct nfs_inode - Move long-delayed xprtrdma work onto the system_dfl_long_wq - Convert xprtrdma send buffer free list to an llist - Show "<redacted>" for cert_serial and privkey_serial mount options" * tag 'nfs-for-7.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (42 commits) NFS: Use common error handling code in nfs_alloc_server() NFS: Prevent resource leak in nfs_alloc_server() NFSv4/pNFS: reject zero-length r_addr in nfs4_decode_mp_ds_addr nfs: don't skip revalidate on directory delegation when attrs flagged stale xprtrdma: Return sendctx slot after Send preparation failure xprtrdma: Repost Receive buffers for malformed replies xprtrdma: Sanitize the reply credit grant after parsing xprtrdma: Fix bcall rep leak and unbounded peek xprtrdma: Resize reply buffers before reposting receives xprtrdma: Check frwr_wp_create() during connect xprtrdma: Initialize re_id before removal registration 
xprtrdma: Fix ep kref imbalance on ADDR_CHANGE xprtrdma: Convert send buffer free list to llist NFS: correct CONFIG_NFS_V4 macro name in #endif comment nfs: use nfsi->rwsem to protect traversal of the file lock list NFSv4.1/pNFS: fix LAYOUTCOMMIT retry loop on OLD_STATEID nfs: expose FMODE_NOWAIT for read-only files nfs: add nowait version of nfs_start_io_direct NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS in pg_get_mirror_count_write NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS on fatal DS connect errors ...
10 daysMerge tag 'f2fs-for-7.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "The changes primarily focus on filesystem error reporting, reducing memory footprint by reverting in-memory data structures used for runtime validation, honoring FDP hints, and adding trace and debug logs. In addition, there are critical bug fixes resolving out-of-bounds read vulnerabilities in inline directory and ACL handling, potential deadlocks in balance_fs, use-after-free issues in atomic writes, and false data/node type assignments in large sections. Enhancements: - Revert in-memory sit version and block bitmaps - support to report fserror - add trace_f2fs_fault_report - add iostat latency tracking for direct IO - add logs in f2fs_disable_checkpoint() - honor per-I/O write streams for direct writes - map data writes to FDP streams - skip inode folio lookup for cached overwrite - skip direct I/O iostat context when disabled - revert "check in-memory block bitmap" - revert "check in-memory sit version bitmap" Fixes: - optimize representative type determination in GC - fix incorrect FI_NO_EXTENT handling in __destroy_extent_node() - fix potential deadlock in f2fs_balance_fs() - fix potential deadlock in gc_merge path of f2fs_balance_fs() - atomic: fix UAF issue on f2fs_inode_info.atomic_inode - fix missing read bio submission on large folio error - pass correct iostat type for single node writes - fix to do sanity check on f2fs_get_node_folio_ra() - validate orphan inode entry count - keep atomic write retry from zeroing original data - read COW data with the original inode during atomic write - validate inline dentry name lengths before conversion - validate dentry name length before lookup compares it - reject setattr size changes on large folio files - revert "remove non-uptodate folio from the page cache in move_data_block" - validate ACL entry sizes in f2fs_acl_from_disk() - bound i_inline_xattr_size for non-inline-xattr inodes - fix listxattr handling of 
corrupted xattr entries - fix to round down start offset of fallocate for pin file" * tag 'f2fs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits) f2fs: fix to round down start offset of fallocate for pin file f2fs: fix listxattr handling of corrupted xattr entries f2fs: skip direct I/O iostat context when disabled f2fs: remove unneeded f2fs_is_compressed_page() f2fs: avoid unnecessary fscrypt_finalize_bounce_page() f2fs: avoid unnecessary sanity check on ckpt_valid_blocks f2fs: misc cleanup in f2fs_record_stop_reason() f2fs: fix wrong description in printed log f2fs: bound i_inline_xattr_size for non-inline-xattr inodes f2fs: validate ACL entry sizes in f2fs_acl_from_disk() Revert "f2fs: remove non-uptodate folio from the page cache in move_data_block" f2fs: Split f2fs_write_end_io() f2fs: Rename f2fs_post_read_wq into f2fs_wq f2fs: Prepare for supporting delayed bio completion f2fs: reject setattr size changes on large folio files f2fs: validate dentry name length before lookup compares it f2fs: validate inline dentry name lengths before conversion f2fs: read COW data with the original inode during atomic write f2fs: skip inode folio lookup for cached overwrite f2fs: keep atomic write retry from zeroing original data ...
10 daysMerge tag 'x86-urgent-2026-06-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: - Prevent NULL dereference on theoretical missing IO bitmap (Li RongQing) * tag 'x86-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ioperm: Prevent NULL dereference on theoretical missing IO bitmap
11 daysMerge tag 'timers-urgent-2026-06-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc timer fixes from Ingo Molnar: - Fix timekeeping locking order bug in the timekeeping init code (Mikhail Gavrilov) - Fix u64 multiplication bug in the posix-cpu-timers code on 32-bit kernels (Zhan Xusheng) - Fix macro name in comment block (Ethan Nelson-Moore) - Fix off-by-one bug in the compat settimeofday() usecs validation code (Wang Yan) * tag 'timers-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time: Fix off-by-one in compat settimeofday() usec validation hrtimer: Correct CONFIG_NO_HZ_COMMON macro name in comment posix-cpu-timers: Use u64 multiplication in update_rlimit_cpu() timekeeping: Register default clocksource before taking tk_core.lock
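The class of bug behind the posix-cpu-timers fix, a multiplication carried out in a 32-bit type before the result is widened, can be shown in a few lines. This is a sketch under assumptions: uint32_t stands in for a 32-bit unsigned long, EX_NSEC_PER_SEC and both function names are hypothetical, and the actual update_rlimit_cpu() code is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NSEC_PER_SEC 1000000000u

/* Both operands are 32-bit, so the product wraps modulo 2^32 before it
 * is converted to the 64-bit return type. */
static uint64_t secs_to_ns_truncating(uint32_t secs)
{
    return secs * EX_NSEC_PER_SEC;
}

/* Widening one operand first makes the multiplication itself 64-bit. */
static uint64_t secs_to_ns_wide(uint32_t secs)
{
    return (uint64_t)secs * EX_NSEC_PER_SEC;
}
```

For 5 seconds, the truncating version yields 705032704 instead of 5000000000 on platforms where unsigned int is 32 bits wide.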
11 daysMerge tag 'smp-urgent-2026-06-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc CPU hotplug fixes from Ingo Molnar: - Fix CPU hotplug error handling rollback bug (Bradley Morgan) - Fix possible output OOB write bug in the sysfs hotplug states printing code (Bradley Morgan) * tag 'smp-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu: hotplug: Bound hotplug states sysfs output cpu: hotplug: Preserve per instance callback errors
11 daysMerge tag 'perf-urgent-2026-06-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fix from Ingo Molnar: - Fix event::addr_filter_ranges lifetime bug (Peter Zijlstra) * tag 'perf-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix addr_filter_ranges lifetime
11 daysMerge tag 'ipsec-2026-06-22' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-06-22 1) xfrm: use compat translator only for u64 alignment mismatch Gate the XFRM_USER_COMPAT translator on COMPAT_FOR_U64_ALIGNMENT so 32-bit compat tasks on arches whose 32-bit ABI already matches the native 64-bit layout are no longer rejected with -EOPNOTSUPP. From Sanman Pradhan. 2) net: af_key: initialize alg_key_len for IPComp states Initialize the alg_key_len to 0 in the IPComp branch of pfkey_msg2xfrm_state() so an uninitialized value cannot drive xfrm_alg_len() into a slab-out-of-bounds kmemdup during XFRM_MSG_MIGRATE. From Zijing Yin. 3) xfrm: Fix dev use-after-free in xfrm async resumption Stash the original skb->dev and extend the RCU critical section across xfrm_rcv_cb() and transport_finish() to prevent a tunnel-device UAF and original-device refcount leak when a callback replaces skb->dev. From Dong Chenchen. 4) xfrm: Fix xfrm state cache insertion race Move the state-validity check inside xfrm_state_lock in the input state cache insertion path so a state cannot be killed between the check and the insert. From Herbert Xu. 5) xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[] Add READ_ONCE()/WRITE_ONCE() annotations on xfrm_policy_count and xfrm_policy_default to silence the KCSAN data race reported on net->xfrm.policy_count. From Eric Dumazet. 6) espintcp: use sk_msg_free_partial to fix partial send Replace the manual skmsg accounting in espintcp with sk_msg_free_partial() so the skmsg stays consistent on every iteration and the partial-send accounting bugs go away. From Sabrina Dubroca. 7) xfrm: validate selector family and prefixlen during match Reject mismatched address families in xfrm_selector_match() and bound prefixlen in addr4_match()/addr_match() to prevent the shift-out-of-bounds syzbot reported when an AF_UNSPEC selector with a large prefixlen is matched against an IPv4 flow. 
From Eric Dumazet. * tag 'ipsec-2026-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: validate selector family and prefixlen during match espintcp: use sk_msg_free_partial to fix partial send xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[] xfrm: Fix xfrm state cache insertion race xfrm: Fix dev use-after-free in xfrm async resumption net: af_key: initialize alg_key_len for IPComp states xfrm: use compat translator only for u64 alignment mismatch ==================== Link: https://patch.msgid.link/20260622075726.29685-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
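Item 7 above concerns a shift whose count depends on an unchecked prefix length. The sketch below shows a bounded IPv4 prefix match in host byte order. ex_addr4_match() is a hypothetical name, and rejecting out-of-range prefix lengths is an assumption about the policy, not a statement of what the xfrm fix does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compare the top prefixlen bits of two IPv4 addresses given in host
 * byte order. Shifting a 32-bit value by 32 or more, or by a negative
 * count, is undefined in C, so the count is validated first. */
static bool ex_addr4_match(uint32_t a, uint32_t b, unsigned int prefixlen)
{
    uint32_t mask;

    if (prefixlen > 32)
        return false;   /* assumption: out-of-range prefixes never match */
    if (prefixlen == 0)
        return true;    /* empty prefix matches everything */

    mask = ~(uint32_t)0 << (32 - prefixlen);   /* shift count is 0..31 */
    return ((a ^ b) & mask) == 0;
}
```

A prefix length such as 128, valid for IPv6 but not IPv4, is rejected up front and never reaches the shift.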
11 daysMerge tag 'locking-urgent-2026-06-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Ingo Molnar: - Fix the incorrect RCU protection in rt_spin_unlock() (Thomas Gleixner) * tag 'locking-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()
11 daysMerge tag 'core-urgent-2026-06-23' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc core fixes from Ingo Molnar: - Fix an MM-CID race that can cause an OOB write (Rik van Riel) - Fix a debugobjects OOM handling race (Thomas Gleixner) * tag 'core-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: debugobjects: Plug race against a concurrent OOM disable sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path