| Age | Commit message (Collapse) | Author |
|
Use cpu_physical_id() instead of default_cpu_present_to_apicid() when
getting the APIC ID of the pCPU on which a vCPU is running/loaded, as the
kernel has gone way off the rails if a vCPU is loaded on a pCPU that has
been physically removed from the system. Even if the impossible were to
happen, the absolutely worst case scenario is that hardware will ring the
AIVC doorbell on the wrong pCPU, i.e. a severely broken system will
experience mild performance issues.
Kill off KVM's superfluous kvm_cpu_get_apicid() wrapper along with the
for-KVM export of default_cpu_present_to_apicid(), as they existed purely
for the wonky AVIC usage.
Cc: Kai Huang <kai.huang@intel.com>
Cc: Yosry Ahmed <yosry@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Naveen N Rao (AMD) <naveen@kernel.org>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Yosry Ahmed <yosry@kernel.org>
Message-ID: <20260612185459.591892-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Turn arch.n_used_mmu_pages into a stat, mmu_shadow_pages, as the number of
live shadow pages is arguably _the_ most critical datapoint when it comes
to analyzing the shadow MMU. Before the TDP MMU came along, i.e. when the
shadow MMU was the only MMU, explicitly tracking the number of shadow pages
wasn't as interesting, because the same information could more or less be
gleaned from the pages_{1g,2m,4k} stats. But with the TDP MMU, where the
shadow MMU is only used for nested TDP, it becomes extremely difficult, if
not impossible, to determine which SPTEs are coming from the TDP MMU, and
which are coming from the shadow MMU.
E.g. when triaging/debugging shadow MMU performance issues due to "too many
shadow pages", being able to observe that 99%+ of all shadow pages are
unsync is critical to being able to deduce that KVM is effectively leaking
shadow pages.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260612133727.411902-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
* Fix S390_USER_OPEREXEC so it can now be enabled regardless of other
unrelated capabilities
* Fix handling of the _PAGE_UNUSED pte bit that could lead to guest
memory corruption in some scenarios
* A bunch of misc gmap fixes (locking, behaviour under memory pressure)
* Fix CMMA dirty tracking
|
|
The GOP (and even Bspec on some platforms) is a bit inconsistent
on what the CDCLK_FREQ_DECIMAL divider should be. Currently any
mismatch there causes a full CDCLK PLL disable+re-enable, which
we really don't want to do if any displays are currently active.
Let's instead just reprogram CDCLK_FREQ_DECIMAL when that is the
only thing amiss. For any other (more serious) mismatch we still
punt to the full PLL reprogramming.
We also need to tweak the bxt_cdclk_cd2x_pipe() stuff a bit to
consistently select pipe==NONE since we have no idea which pipes
are enabled at this point. Since we're not actually changing the
CDCLK frequency here we don't need to sync the update to any
pipe.
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/work_items/16209
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20260612173653.7830-2-ville.syrjala@linux.intel.com
Reviewed-by: Michał Grzelak <michal.grzelak@intel.com>
(cherry picked from commit 3f9de66f8acbf8ff45a91b4920605ed10c6b7c06)
Fixes: ba91b9eecb47 ("drm/i915/cdclk: Decouple cdclk from state->modeset")
Fixes: d66a21947e21 ("drm/i915/bxt: Sanitize CDCLK to fix breakage during S4 resume")
Fixes: c73666f394fc ("drm/i915/skl: If needed sanitize bios programmed cdclk")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
The TPR_THRESHOLD field in the VMCS is used by VMX to induce VM exits
when the guest's virtual TPR falls under the specified threshold,
allowing KVM to inject previously masked interrupts.
KVM handles these VM exits in handle_tpr_below_threshold().
Commit eb90f3417a0c ("KVM: vmx: speed up TPR below threshold vmexits")
optimized this function by calling apic_update_ppr() instead of raising
KVM_REQ_EVENT. apic_update_ppr() then raises KVM_REQ_EVENT if there is
a pending, deliverable interrupt.
However, if there are no new interrupts pending, apic_update_ppr() does
not issue the request. Thus, kvm_lapic_update_cr8_intercept() and
vmx_update_cr8_intercept() are not called before VM entry, which results
in a high, stale TPR_THRESHOLD. This is problematic due to the following
sentence in 28.2.1.1 "VM-Execution Control Fields" in the SDM:
The following check is performed if the “use TPR shadow” VM-execution
control is 1 and the “virtualize APIC accesses” and “virtual-interrupt
delivery” VM-execution controls are both 0: the value of bits 3:0 of
the TPR threshold VM-execution control field should not be greater
than the value of bits 7:4 of VTPR.
This error condition is typically not observed when KVM runs on a bare
metal system because modern processors support APICv, which enables
virtual-interrupt delivery, and which KVM uses when possible. This
causes the processor to no longer generate TPR-below-threshold exits
and to no longer check TPR_THRESHOLD on entry. However, when running
on older platforms, or under nested virtualization on a hypervisor that
does not support virtual-interrupt delivery and enforces this check
(like Hyper-V) this can cause a VM entry failure with hardware error
0x7, as seen in [1].
Call kvm_lapic_update_cr8_intercept() if apic_update_ppr() does not
find a deliverable interrupt (and thus does not raise KVM_REQ_EVENT).
Remove calls to kvm_lapic_update_cr8_intercept() on paths that end up in
apic_update_ppr(), as they now become redundant. This ensures that any
path that updates the guest's PPR also figures out if KVM needs to wait
for a TPR change (using TPR_THRESHOLD on VMX or CR8 intercepts on SVM).
Link: https://github.com/coconut-svsm/svsm/issues/1081 [1]
Tested-by: Stefano Garzarella <sgarzare@redhat.com>
Cc: stable@vger.kernel.org
Fixes: eb90f3417a0c ("KVM: vmx: speed up TPR below threshold vmexits")
Signed-off-by: Carlos López <clopez@suse.de>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260618174347.1981064-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When updating CR8 intercepts, get vmcs12 if and only if the vCPU is in
guest mode so that a future change can have update CR8 intercepts during
vCPU creation, without running afoul of get_vmcs12()'s lockdep assertion.
------------[ cut here ]------------
debug_locks && !(lock_is_held(&(&vcpu->mutex)->dep_map) || !refcount_read(&vcpu->kvm->users_count))
WARNING: arch/x86/kvm/vmx/nested.h:61 at get_vmcs12 arch/x86/kvm/vmx/nested.h:60 [inline], CPU#0: syz.2.19/5879
WARNING: arch/x86/kvm/vmx/nested.h:61 at vmx_update_cr8_intercept+0x3de/0x4e0 arch/x86/kvm/vmx/vmx.c:6879, CPU#0: syz.2.19/5879
Modules linked in:
CPU: 0 UID: 0 PID: 5879 Comm: syz.2.19 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:get_vmcs12 arch/x86/kvm/vmx/nested.h:60 [inline]
RIP: 0010:vmx_update_cr8_intercept+0x3de/0x4e0 arch/x86/kvm/vmx/vmx.c:6879
Call Trace:
<TASK>
apic_update_ppr arch/x86/kvm/lapic.c:984 [inline]
kvm_lapic_reset+0x1c24/0x2980 arch/x86/kvm/lapic.c:3023
kvm_vcpu_reset+0x44c/0x1bf0 arch/x86/kvm/x86.c:12986
kvm_arch_vcpu_create+0x746/0x8b0 arch/x86/kvm/x86.c:12847
kvm_vm_ioctl_create_vcpu+0x428/0x930 virt/kvm/kvm_main.c:4201
kvm_vm_ioctl+0x893/0xd50 virt/kvm/kvm_main.c:5159
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
No functional change intended.
Reported-by: syzbot ci <syzbot+ci493c6d734b63e050@syzkaller.appspotmail.com>
Closes: https://lore.kernel.org/all/6a2adf3b.3b0a2d4e.8c8d1.0012.GAE@google.com
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260618174347.1981064-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
WARN once if KVM's tracking for pending EOIs for Real-Time Clock IRQs goes
off the rails, as there's no reason to bug the host or risk a DoS due to
spamming dmesg with endless WARNs. Absolute worst case scenario, guest
time will go awry.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20260618174527.1982333-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
WARN and return an error up the stack if the PIC or I/O APIC encounters an
invalid vector when injecting an IRQ, as there is no danger to the host and
thus no justification for potentially panicking the kernel. Don't bug the
VM either, as the risk of corrupting the guest is minuscule, and the guest
might even be completely tolerant of a lost interrupt.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20260618185213.2019937-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Bug the VM, not the host kernel, if KVM's ISR count {under,over}flows when
tracking in-flight ISRs. There is zero danger to the host if KVM messes up
its IRQ tracking.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20260618185350.2020845-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Instead of bugging the host kernel, WARN and terminate the VM if KVM
attempts to write-protect at a level that cannot use leaf SPTEs.
There is no reason to bring down the entire host; even termininating
the VM is likely overkill, but in theory a missed write could corrupt
guest memory, so play it safe.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20260618185641.2022368-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If KVM attempts to translate what it thinks is an L2 GPA with a non-nested
MMU, simply WARN and return the GPA, i.e. trust the MMU more than the
caller, as there is zero reason to potentially panic the host kernel just
because KVM misused an API.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20260618185746.2023283-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
get_unaligned()
Drop a BUG_ON() that has been reachable since it was first added, way back
in 2009, and instead use get_unaligned() to perform potentially-unaligned
accesses.
For a given store, KVM x86's emulator tracks the entire value in the
destination operand, x86_emulate_ctxt.dst. If the destination is memory,
and the target splits multiple pages and/or is emulated MMIO, then KVM
handles each fragment independently. E.g. on a page split starting at page
offset 0xffc, KVM writes 4 bytes to the first page, then the remaining
bytes to the second page, using ctxt->dst as the source for both (with
appropriate offsets).
If the destination splits a page *and* hits emulated MMIO on the second
page, then KVM will complete the write to the first page, then emulate the
MMIO access to the second page. If there is a datamatch-enabled ioeventfd
at offset 0 of the second page, then KVM will process the remainder of the
store as a potential ioeventfd signal.
Putting it all together, if the guest emits a store that splits a page
starting at page offset N, and the second page has a datamatch-enabled
ioeventfd at offset 0, then KVM will check for datamatch using
&dst.valptr[N] as the source. Due to dst (and thus dst.valptr) being
32-byte aligned, if N is not aligned to @len, the BUG_ON() fires.
E.g. with a 16-byte store at page offset 0xffc, to an ioeventfd of len 8,
all initial checks in ioeventfd_in_range() will succeed, and the BUG_ON()
fires due to @val being 4-byte aligned, but not 8-byte aligned.
------------[ cut here ]------------
kernel BUG at arch/x86/kvm/../../../virt/kvm/eventfd.c:783!
Oops: invalid opcode: 0000 [#1] SMP
CPU: 0 UID: 1000 PID: 615 Comm: repro Not tainted 7.1.0-rc2-ff238429d1ea #365 PREEMPT
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:ioeventfd_write+0x6c/0x70 [kvm]
Call Trace:
<TASK>
__kvm_io_bus_write+0x85/0xb0 [kvm]
kvm_io_bus_write+0x53/0x80 [kvm]
vcpu_mmio_write+0x66/0xf0 [kvm]
emulator_read_write_onepage+0x12a/0x540 [kvm]
emulator_read_write+0x109/0x2b0 [kvm]
x86_emulate_insn+0x4f8/0xfb0 [kvm]
x86_emulate_instruction+0x181/0x790 [kvm]
kvm_mmu_page_fault+0x313/0x630 [kvm]
vmx_handle_exit+0x18a/0x590 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0xc81/0x1c90 [kvm]
kvm_vcpu_ioctl+0x2d5/0x970 [kvm]
__x64_sys_ioctl+0x8a/0xd0
do_syscall_64+0xb7/0x890
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f19c931a9bf
</TASK>
Modules linked in: kvm_intel kvm irqbypass
---[ end trace 0000000000000000 ]---
In a perfect world, the fix would be to simply delete the BUG_ON(), as KVM
x86 doesn't perform alignment checks on "normal" memory accesses at CPL0.
Sadly, C99 ruins all the fun; while the x86 architecture plays nice,
dereferencing an unaligned pointer directly is undefined behavior in C,
e.g. triggers splats when running with CONFIG_UBSAN_ALIGNMENT=y.
Fixes: d34e6b175e61 ("KVM: add ioeventfd support")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260612225241.678509-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
snd_seq_event_dup() copies an incoming event into a pool cell and, in
the UMP-enabled build, clears the trailing cell->ump.raw.extra word that
the memcpy() did not cover. The guard deciding whether to clear it
compares the copied size against sizeof(cell->event):
memcpy(&cell->ump, event, size);
if (size < sizeof(cell->event))
cell->ump.raw.extra = 0;
For a legacy (non-UMP) event, size == sizeof(struct snd_seq_event) ==
sizeof(cell->event), so the condition is false and the extra word keeps
stale data. The cell pool is allocated with kvmalloc() (not zeroed) and
cells are reused via a free list, so that word holds uninitialised heap
or leftover event data.
When such a cell is delivered to a UMP client (client->midi_version > 0)
that set SNDRV_SEQ_FILTER_NO_CONVERT -- so the legacy event reaches it
unconverted -- snd_seq_read() reads it out as the larger struct
snd_seq_ump_event and copies the stale word to user space, a 4-byte
kernel heap infoleak to an unprivileged /dev/snd/seq client.
Compare against sizeof(cell->ump) instead, so the trailing word is zeroed
for every event shorter than the UMP cell.
Fixes: 46397622a3fa ("ALSA: seq: Add UMP support")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: HyeongJun An <sammiee5311@gmail.com>
Link: https://patch.msgid.link/20260623233841.853326-1-sammiee5311@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
If the allocation of the bits array failed, kvm_s390_set_cmma_bits()
would return 0 instead of an error code.
Rework the function to use the __free() macros and thus simplify the
code flow; when the above mentioned allocation fails, simply return
-ENOMEM.
Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-10-imbrenda@linux.ibm.com>
|
|
The existing cmma selftest depended on the host allocating page tables
for all present memslots. Since the gmap rewrite, memory that is not
accessed by the guest might not have page tables allocated yet.
This caused the test to fail due to a mismatch in the assertion.
Fix by having the guest access also the second half of the test
memslot, thus guaranteeing that its page tables are present.
Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-9-imbrenda@linux.ibm.com>
|
|
It is possible that some guest memory areas have not been touched yet
when starting migration mode, and thus have no ptes allocated. Only
existing and allocated ptes should count toward the total of dirty cmma
entries.
When starting migration mode, enable the migration_mode flag
immediately, so that any subsequent ESSA will trap in the host and
cause cmma_dirty_pages to be increased as needed.
Subsequently, set the cmma_d bit on all existing cmma-clean PGSTEs,
increasing cmma_dirty_pages as needed. Skipping cmma-dirty pages
prevents double counting.
Conversely, when disabling migration mode, set cmma_dirty_pages to 0
and clear the cmma_d bit in all existing PGSTEs.
The invariant is that when migration mode is off, no PGSTE has its
cmma_d bit set, and cmma_dirty_pages is 0. kvm->slots_lock protects
kvm_s390_vm_start_migration() and kvm_s390_vm_stop_migration() from
each other and from kvm_s390_get_cmma_bits().
Also fix dat_get_cmma() to properly wrap around if the first attempt
reached the end of guest memory without finding cmma-dirty pages.
[ imbrenda: Moved kvm_s390_sync_request_broadcast() before gmap_set_cmma_all_dirty() ]
Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-8-imbrenda@linux.ibm.com>
|
|
Add the missing locking around dat_reset_cmma().
Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-7-imbrenda@linux.ibm.com>
|
|
Under heavy memory pressure, handle_sske() and handle_pfmf() might
cause an endless loop if the mmu cache runs empty, the atomic
allocations fail, and the top-up function also fails. While quite
unlikely, that scenario is not impossible.
Fix the issue by not ignoring the return value of
kvm_s390_mmu_cache_topup(), and appropriately returning an error code
in case of failure.
Fixes: e38c884df921 ("KVM: s390: Switch to new gmap")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-6-imbrenda@linux.ibm.com>
|
|
The correct length to pass to kvm_s390_get_guest_pages() is asce.tl + 1,
not asce.dt + 1. It was a typo, which, due to fortuitous circumstances,
did not cause bugs. It should nonetheless be fixed.
Fixes: e5f98a6899bd ("KVM: s390: Add some helper functions needed for vSIE")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-5-imbrenda@linux.ibm.com>
|
|
Special pages / folios should not be set dirty. This also applies to
large pages.
Add a missing check in gmap_clear_young_crste() to prevent setting the
large page dirty if it is a special page.
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-4-imbrenda@linux.ibm.com>
|
|
If userspace passes a start address that is out of bounds,
_dat_walk_gfn_range() will fail with -EFAULT, but state.end will not be
touched and will stay 0. This will cause *count to underflow and report
a very high number, and the function will end up erroneously reporting
success.
Fix by only setting *count if the end address is not smaller than the
starting address. This way invalid starting addresses will correctly
return -EFAULT and *count will correctly indicate that no values have
been returned.
Fixes: 7b368470e1a4 ("KVM: s390: KVM page table management functions: CMMA")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-3-imbrenda@linux.ibm.com>
|
|
The _PAGE_UNUSED softbit should not really be lying around. Its sole
purpose is to signal to try_to_unmap_one() and try_to_migrate_one()
that the page can be discarded instead of being moved / swapped.
KVM has no way to know why a page is being unmapped, so it sets the bit
on userspace ptes corresponding to unused guest pages every time they
get unmapped. KVM has no reasonable way to clear the bit once the page
is in use again.
While set_ptes() checks and clears the bit, other paths that set new
ptes did not. This led to used pages being thrown out as if they were
unused, causing guest corruption.
Fix the issue by clearing the _PAGE_UNUSED bit for present ptes in
set_pte(), i.e. whenever a present pte is getting set. The check in
set_ptes() is then redundant and can be removed.
Also fix gmap_helper_try_set_pte_unused() to only set the bit if the
pte is present; the _PAGE_UNUSED bit is only defined for present ptes
and thus should not be set for non-present ptes.
Fixes: c98175b7917f ("KVM: s390: Add gmap_helper_set_unused()")
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260623153331.233784-2-imbrenda@linux.ibm.com>
|
|
Small typo noticed while writing the USER_OPEREXEC selftest.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260507200836.3500368-4-farman@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
|
|
There is a possibility that the user_operexec capability
only works if facility bit 74 is enabled. This is now fixed,
but add a selftest to demonstrate that.
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260507200836.3500368-3-farman@linux.ibm.com>
|
|
The KVM_CAP_S390_USER_OPEREXEC capability allows operation exceptions
to be forwarded to userspace. But the actual enablement at the hardware
level occurs in kvm_arch_vcpu_postcreate(), and only if STFLE.74 or
user_instr0 are enabled. The latter is associated with a separate
capability (KVM_CAP_S390_USER_INSTR0), so the only way this happens
for the USER_OPEREXEC capability is if STFLE.74 is enabled. KVM
unconditionally enables this bit in kvm_arch_init_vm(), but the guest
could disable it from the CPU model and thus ignore this capability.
Add USER_OPEREXEC to the check in kvm_arch_vcpu_postcreate(), such that
either capability would enable this type of exception.
Fixes: 8e8678e740ec ("KVM: s390: Add capability that forwards operation exceptions")
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
[Fixed patch title, as recommended by frankja@linux.ibm.com]
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20260507200836.3500368-2-farman@linux.ibm.com>
|
|
The fix for tcp-fast-open ensures that the connect permission is being
mediated correctly but it didn't add an artifact to the feature set to
advertise the fix is available. Add an artifact so that the test suite
can identify if the fix has not been properly applied or a new
unexpected regression has occurred.
Fixes: 4d587cd8a7215 ("apparmor: mediate the implicit connect of TCP fast open sendmsg")
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
When requesting ownership of the NIC (MAC/PHY control), we set up
the heartbeat to look stale:
/* Initialize heartbeat, set last response to 1 second in the past
* so that we will trigger a timeout if the firmware doesn't respond
*/
fbd->last_heartbeat_response = req_time - HZ;
fbd->last_heartbeat_request = req_time;
The response handler then sets:
fbd->last_heartbeat_response = jiffies;
for which we wait via:
fbnic_fw_init_heartbeat() -> fbnic_fw_heartbeat_current()
The scheme is a bit odd, but it should work in principle.
Fix the ordering of operations. We have to set up the stale heartbeat
before we send the message. Otherwise if the response is very fast
we will override it. This triggers on QEMU if we run on the core
that handles the IRQ, and results in ndo_open failing with ETIMEDOUT.
The change in ordering doesn't impact releasing the ownership.
Both ndo_stop and heartbeat check are under rtnl_lock.
Fixes: 20d2e88cc746 ("eth: fbnic: Add initial messaging to notify FW of our presence")
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20260622154753.827506-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Fernando Fernandez Mancera says:
====================
ipv6: fix error handling in disable_ipv6 sysctl
While working on a different IPv6 patch series I have spotted multiple
minor bugs around sysctl error handling and notifications. In general,
they are not serious issues.
In addition, there is one more issue in forwarding sysctl as it does not
check for CAP_NET_ADMIN for the namespace. I am keeping that patch out
of this series and I am aiming it at the net-next tree once it re-opens.
During v3, Ido's pointed out that it is unnecessary to reset the
position pointer when the return value is negative as at
new_sync_write() the ppos is only advanced when ret return value is
positive. That means we can get rid of that operation in ipv4/ipv6
sysctls. That is going to be sent to net-next too.
====================
Link: https://patch.msgid.link/20260622130857.5115-1-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When changing the ignore_routes_with_linkdown sysctl for a specific
interface, the RTM_NEWNETCONF netlink notification was not being emitted
to userspace. Fix this by emitting the notification when needed.
In addition, fix bogus return value for successful "all" and specific
interface write operation leading to a wrong reset of the position
pointer.
Fixes: 35103d11173b ("net: ipv6 sysctl option to ignore routes when nexthop link is down")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-7-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When handling proxy_ndp, if rtnl_net_trylock() fails, the operation is
retried but as the value was already modified by the initial
proc_dointvec() call, the restarted syscall will read the newly modified
value as the 'old' state.
Fix this by taking the RTNL lock before parsing the input value if the
operation is a write.
Fixes: c92d5491a6d9 ("netconf: add support for IPv6 proxy_ndp")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-6-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When writing to the disable_policy sysctl, if proc_dointvec() fails to
parse the input, it returns a negative error code. The current
implementation is resetting the position argument even if an error
occurred during proc_dointvec() and not only during sysctl restart.
Fix this by checking the return value of proc_dointvec() and returning
early on failure.
Fixes: df789fe75206 ("ipv6: Provide ipv6 version of "disable_policy" sysctl")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-5-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When writing to the forwarding sysctl, if proc_dointvec() fails to parse
the input, it returns a negative error code. The current implementation
is overwriting that error for write operations.
This results in a silent failure, it returns a successful write although
the configuration was not modified at all. When modifying the "all"
variant it can also modify the configuration of existing interfaces to
the wrong value.
Fix this by checking the return value of proc_dointvec() and returning
early on failure. In addition, adjust return code of
addrconf_fixup_forwarding() for successful operation.
Fixes: b325fddb7f86 ("ipv6: Fix sysctl unregistration deadlock")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-4-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When writing to the ignore_routes_with_linkdown sysctl, if
proc_dointvec() fails to parse the input, it returns a negative error
code. The current implementation is overwriting that error for write
operations.
This results in a silent failure, it returns a successful write although
the configuration was not modified at all. When modifying the "all"
variant it can also modify the configuration of existing interfaces to
the wrong value.
Fix this by checking the return value of proc_dointvec() and returning
early on failure.
Fixes: 35103d11173b ("net: ipv6 sysctl option to ignore routes when nexthop link is down")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-3-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When writing to the disable_ipv6 sysctl, if proc_dointvec() fails to
parse the input, it returns a negative error code. The current
implementation is overwriting that error for write operations.
This results in a silent failure, it returns a successful write although
the configuration was not modified at all. When modifying the "all"
variant it can also modify the configuration of existing interfaces to
the wrong value.
Fix this by checking the return value of proc_dointvec() and returning
early on failure.
Fixes: 56d417b12e57 ("IPv6: Add 'autoconf' and 'disable_ipv6' module parameters")
Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Link: https://patch.msgid.link/20260622130857.5115-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I have left Sunplus and no longer have access to the relevant hardware
to test or maintain this driver. Mark the driver as orphaned.
Signed-off-by: Wells Lu <wellslutw@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20260622180721.28334-1-wellslutw@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
au1000_close() calls free_irq() while aup->lock is still held with
spin_lock_irqsave(). free_irq() can sleep because it takes the IRQ
descriptor request mutex, so it does not belong inside the close-time
spinlocked section.
This was found by our static analysis tool and then confirmed by manual
review of the in-tree au1000_close() .ndo_stop path. The reviewed path
keeps aup->lock held across the MAC reset, queue stop and
free_irq(dev->irq, dev).
A directed runtime validation kept that ndo_stop carrier and the same
free_irq(dev->irq, dev) operation under the driver lock. Lockdep reported
"BUG: sleeping function called from invalid context" and "Invalid wait
context" while free_irq() was taking desc->request_mutex, with
au1000_close() and free_irq() on the stack.
Drop aup->lock before freeing the IRQ. The protected close-time work still
stops the device and queue before IRQ teardown, but the sleepable IRQ core
path now runs outside the spinlocked section.
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260619151816.1144289-1-runyu.xiao@seu.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When sctp_verify_init() encounters unrecognized parameters, it allocates an
err_chunk to report them. However, this chunk is leaked in several code
paths:
1. In sctp_sf_do_5_1B_init(), if security_sctp_assoc_request() fails after
sctp_verify_init() has populated err_chunk, the function returns
immediately without freeing it.
2. In sctp_sf_do_unexpected_init(), the same leak occurs on the
security_sctp_assoc_request() failure path.
3. In sctp_sf_do_unexpected_init(), on the success path after copying
unrecognized parameters to the INIT-ACK, the function returns without
freeing err_chunk, unlike sctp_sf_do_5_1B_init() which properly frees
it.
Fix all three leaks by adding sctp_chunk_free(err_chunk) calls before
returning in the error paths and on the success path in
sctp_sf_do_unexpected_init().
Fixes: c081d53f97a1 ("security: pass asoc to sctp_assoc_request and sctp_sk_clone")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/0656704f1b0158287c98aec09ba36c83e4a537ab.1781970534.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tcf_classify() can return TC_ACT_CONSUMED while the skb is held by the
defragmentation engine (e.g. act_ct on out-of-order fragments). When
that happens the skb is no longer owned by the caller and must not be
touched again.
tcf_qevent_handle() did not handle TC_ACT_CONSUMED: it fell through the
switch and returned the skb to the caller as if classification had
passed. The only qdisc that wires up qevents today is RED, via three call sites
(qe_mark on RED_PROB_MARK/HARD_MARK, qe_early_drop on congestion_drop)
red_enqueue() was continuing to operate on an skb it no longer owns in this
case -- enqueueing it, dropping it, or updating statistics. Resulting in a UAF.
tc qdisc add dev eth0 root handle 1: red ... qevent early_drop block 10
tc filter add block 10 ... action ct
(with ct defrag enabled and traffic that produces out-of-order
fragments, e.g. a fragmented UDP stream)
Handle TC_ACT_CONSUMED in tcf_qevent_handle() the same way the ingress
and egress fast paths do: treat it as stolen and return NULL without
touching the skb. Unlike the TC_ACT_STOLEN case, the skb must not be
dropped/freed here, as it is no longer owned by us.
Fixes: 3f14b377d01d ("net/sched: act_ct: fix skb leak and crash on ooo frags")
Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20260620130749.226642-1-jhs@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jakub Sitnicki says:
====================
Drop skb metadata before LWT encapsulation
See description for patch 1.
====================
Link: https://patch.msgid.link/20260619-bpf-lwt-drop-skb-metadata-v3-0-71d6a33ab76b@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Test that an LWT encapsulation does not silently corrupt XDP metadata
sitting in the skb headroom. Exercise all three LWT dispatch paths:
- BPF LWT xmit prog reserves headroom on the LWT .xmit redirect,
- mpls pushes an MPLS label on the LWT .xmit redirect,
- seg6 in encap mode runs on the LWT .input redirect,
- ioam6 encap inserts an IOAM Hop-by-Hop option on LWT .output redirect.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20260619-bpf-lwt-drop-skb-metadata-v3-2-71d6a33ab76b@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
skb metadata is meant for passing information between XDP and TC. It lives
in the skb headroom, immediately before skb->data. LWT programs cannot
access the __sk_buff->data_meta pseudo-pointer to metadata.
However, LWT encapsulation prepends outer headers, moving skb->data back
over the headroom where the metadata sits. On an RX-originated (forwarded)
packet that still carries XDP metadata this goes wrong in two different
ways, depending on the encap type:
1. Non-BPF LWT encaps (mpls, seg6, ioam6 ...) call skb_push()/skb_pull()
and silently overwrite the metadata that sits in the headroom.
2) BPF LWT xmit calls bpf_skb_change_head(), which uses skb_data_move().
That helper expects metadata immediately before skb->data. But since
the IP output path runs LWT xmit before neighbour output has built
the outgoing L2 header, for forwarded packets skb->data points at the
L3 header while skb_mac_header() still points at the old L2 header.
skb_data_move() sees metadata ending at skb_mac_header(), not before
skb->data, warns and clears metadata:
WARNING: CPU: 21 PID: 454557 at include/linux/skbuff.h:4609 skb_data_move+0x47/0x90
CPU: 21 UID: 0 PID: 454557 Comm: napi/iconduit-g Tainted: G O 6.18.21 #1
RIP: 0010:skb_data_move+0x47/0x90
Call Trace:
<IRQ>
bpf_skb_change_head+0xe6/0x1a0
bpf_prog_...+0x213/0x2e3
run_lwt_bpf.isra.0+0x1d3/0x360
bpf_xmit+0x46/0xe0
lwtunnel_xmit+0xa1/0xf0
ip_finish_output2+0x1e7/0x5e0
ip_output+0x63/0x100
__netif_receive_skb_one_core+0x85/0xa0
process_backlog+0x9c/0x150
__napi_poll+0x2b/0x190
net_rx_action+0x40b/0x7f0
handle_softirqs+0xd2/0x270
do_softirq+0x3f/0x60
</IRQ>
That is what happens, as for how to fix it - a received packet that
carries metadata can reach an encap through any of the three LWT
redirect modes:
LWTUNNEL_STATE_INPUT_REDIRECT
ip6_rcv_finish
dst_input
lwtunnel_input
LWTUNNEL_STATE_OUTPUT_REDIRECT
ip6_rcv_finish
dst_input
ip6_forward
ip6_forward_finish
dst_output
lwtunnel_output
LWTUNNEL_STATE_XMIT_REDIRECT
ip6_rcv_finish
dst_input
ip6_forward
ip6_forward_finish
dst_output
ip6_output
ip6_finish_output
ip6_finish_output2
lwtunnel_xmit
Every encap funnels through the three LWT dispatch helpers, so drop the
metadata there, right before handing the skb to the encap op. This
single chokepoint covers all encap types and all three redirect modes:
- lwtunnel_input(): seg6, rpl, ila, seg6_local
- lwtunnel_output(): ioam6
- lwtunnel_xmit(): mpls, LWT BPF xmit
Alternatively, we could clear the metadata right after TC ingress hook.
That would require a compromise, however. Metadata would become
inaccessible from TC egress (in setups where it actually reaches the
hook it tact, that is without any L2 tunnels on path).
Fixes: 8989d328dfe7 ("net: Helper to move packet data and metadata after skb_push/pull")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20260619-bpf-lwt-drop-skb-metadata-v3-1-71d6a33ab76b@cloudflare.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull NFS client updates from Anna Schumaker:
"New features:
- XPRTRDMA: Decouple req recycling from RPC completion
- NFS: Expose FMODE_NOWAIT for read-only files
Bugfixes:
- SUNRPC:
- Fix sunrpc sysfs error handling
- Fix uninitialized xprt_create_args structure
- XPRTRDMA:
- Harden connect and reply handling
- NFS:
- Fix EOF updates after fallocate/zero-range
- Keep PG_UPTODATE clear after read errors in page groups
- Use nfsi->rwsem to protect traversal of the file lock list
- Prevent resource leak in nfs_alloc_server()
- NFSv4:
- Clear exception state on successful mkdir retry
- Don't skip revalidate when holding a dir delegation and attrs are stale
- pNFS:
- Fix use-after-free in pnfs_update_layout()
- Defer return_range callbacks until after inode unlock
- Fix LAYOUTCOMMIT retry loop on OLD_STATEID
- Reject zero-length r_addr in nfs4_decode_mp_ds_addr
- NFS/flexfiles:
- Reject zero-length filehandle version arrays
- Fix checking if a layout is striped
- Fixes for honoring FF_FLAGS_NO_IO_THRU_MDS
Other cleanups and improvements:
- Remove the fileid field from struct nfs_inode
- Move long-delayed xprtrdma work onto the system_dfl_long_wq
- Convert xprtrdma send buffer free list to an llist
- Show "<redacted>" for cert_serial and privkey_serial mount options"
* tag 'nfs-for-7.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (42 commits)
NFS: Use common error handling code in nfs_alloc_server()
NFS: Prevent resource leak in nfs_alloc_server()
NFSv4/pNFS: reject zero-length r_addr in nfs4_decode_mp_ds_addr
nfs: don't skip revalidate on directory delegation when attrs flagged stale
xprtrdma: Return sendctx slot after Send preparation failure
xprtrdma: Repost Receive buffers for malformed replies
xprtrdma: Sanitize the reply credit grant after parsing
xprtrdma: Fix bcall rep leak and unbounded peek
xprtrdma: Resize reply buffers before reposting receives
xprtrdma: Check frwr_wp_create() during connect
xprtrdma: Initialize re_id before removal registration
xprtrdma: Fix ep kref imbalance on ADDR_CHANGE
xprtrdma: Convert send buffer free list to llist
NFS: correct CONFIG_NFS_V4 macro name in #endif comment
nfs: use nfsi->rwsem to protect traversal of the file lock list
NFSv4.1/pNFS: fix LAYOUTCOMMIT retry loop on OLD_STATEID
nfs: expose FMODE_NOWAIT for read-only files
nfs: add nowait version of nfs_start_io_direct
NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS in pg_get_mirror_count_write
NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS on fatal DS connect errors
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"The changes primarily focus on filesystem error reporting, reducing
memory footprint by reverting in-memory data structures used for
runtime validation, honoring FDP hints, and adding trace and debug
logs. In addition, there are critical bug fixes resolving
out-of-bounds read vulnerabilities in inline directory and ACL
handling, potential deadlocks in balance_fs, use-after-free issues in
atomic writes, and false data/node type assignments in large sections.
Enhancements:
- Revert in-memory sit version and block bitmaps
- support to report fserror
- add trace_f2fs_fault_report
- add iostat latency tracking for direct IO
- add logs in f2fs_disable_checkpoint()
- honor per-I/O write streams for direct writes
- map data writes to FDP streams
- skip inode folio lookup for cached overwrite
- skip direct I/O iostat context when disabled
- revert "check in-memory block bitmap"
- revert "check in-memory sit version bitmap"
Fixes:
- optimize representative type determination in GC
- fix incorrect FI_NO_EXTENT handling in __destroy_extent_node()
- fix potential deadlock in f2fs_balance_fs()
- fix potential deadlock in gc_merge path of f2fs_balance_fs()
- atomic: fix UAF issue on f2fs_inode_info.atomic_inode
- fix missing read bio submission on large folio error
- pass correct iostat type for single node writes
- fix to do sanity check on f2fs_get_node_folio_ra()
- validate orphan inode entry count
- keep atomic write retry from zeroing original data
- read COW data with the original inode during atomic write
- validate inline dentry name lengths before conversion
- validate dentry name length before lookup compares it
- reject setattr size changes on large folio files
- revert "remove non-uptodate folio from the page cache in move_data_block"
- validate ACL entry sizes in f2fs_acl_from_disk()
- bound i_inline_xattr_size for non-inline-xattr inodes
- fix listxattr handling of corrupted xattr entries
- fix to round down start offset of fallocate for pin file"
* tag 'f2fs-for-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (42 commits)
f2fs: fix to round down start offset of fallocate for pin file
f2fs: fix listxattr handling of corrupted xattr entries
f2fs: skip direct I/O iostat context when disabled
f2fs: remove unneeded f2fs_is_compressed_page()
f2fs: avoid unnecessary fscrypt_finalize_bounce_page()
f2fs: avoid unnecessary sanity check on ckpt_valid_blocks
f2fs: misc cleanup in f2fs_record_stop_reason()
f2fs: fix wrong description in printed log
f2fs: bound i_inline_xattr_size for non-inline-xattr inodes
f2fs: validate ACL entry sizes in f2fs_acl_from_disk()
Revert "f2fs: remove non-uptodate folio from the page cache in move_data_block"
f2fs: Split f2fs_write_end_io()
f2fs: Rename f2fs_post_read_wq into f2fs_wq
f2fs: Prepare for supporting delayed bio completion
f2fs: reject setattr size changes on large folio files
f2fs: validate dentry name length before lookup compares it
f2fs: validate inline dentry name lengths before conversion
f2fs: read COW data with the original inode during atomic write
f2fs: skip inode folio lookup for cached overwrite
f2fs: keep atomic write retry from zeroing original data
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
- Prevent NULL dereference on theoretical missing IO bitmap (Li
RongQing)
* tag 'x86-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ioperm: Prevent NULL dereference on theoretical missing IO bitmap
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc timer fixes from Ingo Molnar:
- Fix timekeeping locking order bug in the timekeeping init code
(Mikhail Gavrilov)
- Fix u64 multiplication bug in the posix-cpu-timers code on 32-bit
kernels (Zhan Xusheng)
- Fix macro name in comment block (Ethan Nelson-Moore)
- Fix off-by-one bug in the compat settimeofday() usecs validation code
(Wang Yan)
* tag 'timers-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Fix off-by-one in compat settimeofday() usec validation
hrtimer: Correct CONFIG_NO_HZ_COMMON macro name in comment
posix-cpu-timers: Use u64 multiplication in update_rlimit_cpu()
timekeeping: Register default clocksource before taking tk_core.lock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc CPU hotplug fixes from Ingo Molnar:
- Fix CPU hotplug error handling rollback bug (Bradley Morgan)
- Fix possible output OOB write bug in the sysfs hotplug states
printing code (Bradley Morgan)
* tag 'smp-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: hotplug: Bound hotplug states sysfs output
cpu: hotplug: Preserve per instance callback errors
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fix from Ingo Molnar:
- Fix event::addr_filter_ranges lifetime bug (Peter Zijlstra)
* tag 'perf-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix addr_filter_ranges lifetime
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2026-06-22
1) xfrm: use compat translator only for u64 alignment mismatch
Gate the XFRM_USER_COMPAT translator on COMPAT_FOR_U64_ALIGNMENT
so 32-bit compat tasks on arches whose 32-bit ABI already matches
the native 64-bit layout are no longer rejected with -EOPNOTSUPP.
From Sanman Pradhan.
2) net: af_key: initialize alg_key_len for IPComp states
Initialize the alg_key_len to 0 in the IPComp branch of
pfkey_msg2xfrm_state() so an uninitialized value cannot drive
xfrm_alg_len() into a slab-out-of-bounds kmemdup during
XFRM_MSG_MIGRATE. From Zijing Yin.
3) xfrm: Fix dev use-after-free in xfrm async resumption
Stash the original skb->dev and extend the RCU critical section
across xfrm_rcv_cb() and transport_finish() to prevent a
tunnel-device UAF and original-device refcount leak when a
callback replaces skb->dev. From Dong Chenchen.
4) xfrm: Fix xfrm state cache insertion race
Move the state-validity check inside xfrm_state_lock in the
input state cache insertion path so a state cannot be killed
between the check and the insert. From Herbert Xu.
5) xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
Add READ_ONCE()/WRITE_ONCE() annotations on xfrm_policy_count
and xfrm_policy_default to silence the KCSAN data race reported
on net->xfrm.policy_count. From Eric Dumazet.
6) espintcp: use sk_msg_free_partial to fix partial send
Replace the manual skmsg accounting in espintcp with
sk_msg_free_partial() so the skmsg stays consistent on every
iteration and the partial-send accounting bugs go away.
From Sabrina Dubroca.
7) xfrm: validate selector family and prefixlen during match
Reject mismatched address families in xfrm_selector_match() and
bound prefixlen in addr4_match()/addr_match() to prevent the
shift-out-of-bounds syzbot reported when an AF_UNSPEC selector
with a large prefixlen is matched against an IPv4 flow.
From Eric Dumazet.
* tag 'ipsec-2026-06-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: validate selector family and prefixlen during match
espintcp: use sk_msg_free_partial to fix partial send
xfrm: annotate data-races around xfrm_policy_count[] and xfrm_policy_default[]
xfrm: Fix xfrm state cache insertion race
xfrm: Fix dev use-after-free in xfrm async resumption
net: af_key: initialize alg_key_len for IPComp states
xfrm: use compat translator only for u64 alignment mismatch
====================
Link: https://patch.msgid.link/20260622075726.29685-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
- Fix the incorrect RCU protection in rt_spin_unlock() (Thomas
Gleixner)
* tag 'locking-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rt: Fix the incorrect RCU protection in rt_spin_unlock()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc core fixes from Ingo Molnar:
- Fix an MM-CID race that can cause an OOB write (Rik van Riel)
- Fix a debugobjects OOM handling race (Thomas Gleixner)
* tag 'core-urgent-2026-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Plug race against a concurrent OOM disable
sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path
|