summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
7 daysipv6: fix possible infinite loop in fib6_select_path()Jiayuan Chen
Found while auditing the same pattern Sashiko reported in rt6_fill_node() [1]. Apply the same fix as commit f8d8ce1b515a ("ipv6: fix possible infinite loop in fib6_info_uses_dev()"). Writers holding tb6_lock can list_del_rcu(&first->fib6_siblings) without waiting for RCU readers; first->fib6_siblings.next then still points into the old ring and this softirq-side walker never reaches &first->fib6_siblings as its terminator. fib6_purge_rt() always WRITE_ONCE()s first->fib6_nsiblings to 0 before list_del_rcu(), so an inside-loop check is a reliable detach signal. [1] https://sashiko.dev/#/patchset/20260526020227.4857-1-jiayuan.chen%40linux.dev Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260527053133.180695-2-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysipv6: fix possible infinite loop in rt6_fill_node()Jiayuan Chen
Sashiko reported this issue [1]. Apply the same fix as commit f8d8ce1b515a ("ipv6: fix possible infinite loop in fib6_info_uses_dev()"). Writers holding tb6_lock can list_del_rcu(&rt->fib6_siblings) without waiting for RCU readers; rt->fib6_siblings.next then still points into the old ring and this softirq-side walker never reaches &rt->fib6_siblings, causing a CPU stall. fib6_del_route() always WRITE_ONCE()s rt->fib6_nsiblings to 0 before list_del_rcu(), so an inside-loop check is a reliable detach signal. [1] https://sashiko.dev/#/patchset/20260526020227.4857-1-jiayuan.chen%40linux.dev Fixes: d9ccb18f83ea ("ipv6: Fix soft lockups in fib6_select_path under high next hop churn") Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260527053133.180695-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysbpf: sockmap: fix tail fragment offset in bpf_msg_push_dataYuqi Xu
When bpf_msg_push_data() inserts data in the middle of a scatterlist entry, it splits the original entry into a left fragment and a right fragment. The right fragment offset is page-local, but the code advances it with `start`, which is the message-global insertion point. For inserts into a non-first SG entry, this over-advances the offset and leaves the split layout inconsistent. Advance the right fragment offset by the fragment-local delta, `start - offset`, which matches the length removed from the front of the original entry. Fixes: 6fff607e2f14 ("bpf: sk_msg program helper bpf_msg_push_data") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yuqi Xu <xuyq21@lenovo.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Link: https://patch.msgid.link/8b129d10566aa3eb43f61a8f9757bcf51707d324.1779636774.git.xuyq21@lenovo.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysvsock/virtio: bind uarg before filling zerocopy skbJingguo Tan
virtio_transport_send_pkt_info() allocates or reuses the zerocopy uarg before entering the send loop, but virtio_transport_alloc_skb() still fills the skb before it inherits that uarg. When fixed-buffer vectored zerocopy hits MAX_SKB_FRAGS, io_sg_from_iter() may partially attach managed frags and return -EMSGSIZE. The rollback path call kfree_skb() to free an skb that carries SKBFL_MANAGED_FRAG_REFS but no uarg, so skb_release_data() falls through to ordinary frag unref. Pass the uarg into virtio_transport_alloc_skb() and bind it immediately before virtio_transport_fill_skb(). This keeps control or no-payload skbs untouched while ensuring success and rollback share one lifetime rule. Fixes: 581512a6dc93 ("vsock/virtio: MSG_ZEROCOPY flag support") Signed-off-by: Lin Ma <malin89@huawei.com> Signed-off-by: Rongzhen Cui <cuirongzhen@huawei.com> Signed-off-by: Jingguo Tan <tanjingguo@huawei.com> Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20260527023301.1075581-1-malin89@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 daysMerge commit 'kvm-psc-for-7.1' into HEADPaolo Bonzini
7 daysKVM: SEV: Use READ_ONCE() when reading entries/indices from PSC bufferSean Christopherson
Use READ_ONCE() when reading entries/indices from the guest-accessible Page State Change buffer to defend against TOCTOU bugs. Don't bother with READ_ONCE()/WRITE_ONCE() for cases where KVM is writing (and not consuming the result!), as the guest isn't supposed to touch the buffer while it's being processed. I.e. using READ_ONCE() is all about protecting against misbehaving guests. Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260501202250.2115252-11-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 daysKVM: SEV: Check PSC request indices against the actual size of the bufferSean Christopherson
When processing Page State Change (PSC) requests, validate the PSC buffer against the effective size of the scratch area, which could be less than the maximum size if the guest provided a pointer that isn't exactly at the start of the GHCB shared buffer. Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260501202250.2115252-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 daysKVM: SEV: Don't explicitly pass PSC buffer to snp_begin_psc()Sean Christopherson
Stop explicitly passing the PSC buffer to snp_begin_psc(): it *must* be the scratch area. This will allow fixing a variety of bugs without further complicating the code. No functional change intended. Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260501202250.2115252-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 daysKVM: SEV: WARN if KVM attempts to setup scratch area with min_len==0Sean Christopherson
Now that all paths in KVM properly validate the length needed for the scratch area, and are guaranteed to pass in a non-zero length, WARN if KVM attempts to configured the scratch area with min_len==0 to guard against future bugs. Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260501202250.2115252-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 daysKVM: SEV: Compute the correct max length of the in-GHCB scratch areaSean Christopherson
When setting the length of the GHCB scratch area, and the area is in the GHCB shared buffer, set the effective length of the scratch area to the max possible size given the start of the guest-provided pointer, and the end of the shared buffer. The code was "fine" when first introduced, as KVM doesn't consult the length of the buffer when emulating MMIO, because the passed in @len always specifies the *max* size required. But for PSC requests, the incoming @len is just the minimum length (to process the header), and KVM needs to know the full size of the scratch area to avoid buffer overflows (spoiler alert). Opportunistically rename @len => @min_len to better reflect its role. Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260501202250.2115252-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 daysKVM: SEV: Use the size of the PSC header as the minimum size for PSC requestsSean Christopherson
When handling a Page State Change (PSC) #VMGEXIT use the size of the PSC header as the minimum size for the scratch area. Per the GHCB spec, PSC requests do NOT provide the length, i.e. using control->exit_info_2 for the length is completely made up behavior. The existing code "works", e.g. even though Linux-as-a-guest always passes '0', because KVM doesn't do anything with the length when the request is in the GHCB's shared buffer. Use the header as the min length. Once the header is retrieved, KVM can use the specified indices to compute the full size of the request. Fixes: 9b54e248d264 ("KVM: SEV: Add support to handle Page State Change VMGEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260501202250.2115252-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 daysKVM: SEV: Ignore Port I/O requests of length '0'Sean Christopherson
Explicitly ignore Port I/O requests of length '0' (or count '0'), so that setting up the software scratch area (and other code) doesn't have to worry about underflowing the length, and to allow for WARNing on trying to configure the scratch area with len==0. Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260501202250.2115252-5-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 daysKVM: SEV: Reject MMIO requests larger than 8 bytes with GHCB v2+Sean Christopherson
When using GHCB v2+, reject MMIO requests that are larger than 8 bytes. Per the GHCB spec: SW_EXITINFO2 must be less than or equal to 0x7fffffff for version 1 and less than or equal to 0x8 for all other versions. Fixes: 4af663c2f64a ("KVM: SEV: Allow per-guest configuration of GHCB protocol version") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260501202250.2115252-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 daysKVM: SEV: Ignore MMIO requests of length '0'Sean Christopherson
Explicitly ignore MMIO requests of length '0', so that setting up the software scratch area (and other code) doesn't have to worry about underflowing the length, and to allow for special casing '0' in the future. Fixes: 8f423a80d299 ("KVM: SVM: Support MMIO for an SEV-ES guest") Cc: stable@vger.kernel.org Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260501202250.2115252-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 daysKVM: SEV: Require in-GHCB scratch area if GHCB v2+ is in useMichael Roth
As per the GHCB spec, when using GHCB v2+ require the software scratch area to reside in the GHCB's shared buffer. Note, things like Page State Change (PSC) requests _rely_ on this behavior, as the guest can't provide a length when making the request, i.e. the size of the guest payload is bounded by the size of the shared buffer. Failure to force usage of the GHCB, and a slew of other flaws, lets a malicious SNP guest corrupt host kernel heap memory, and leak host heap layout information. setup_vmgexit_scratch() allocates a buffer via kvzalloc(exit_info_2), where exit_info_2 is guest-controlled. With exit_info_2=24, this yields a 24-byte allocation in kmalloc-cg-32 (32-byte slab objects). The buffer holds an 8-byte psc_hdr followed by 8-byte psc_entry structs, so only entries[0] and entries[1] are in-bounds. snp_begin_psc() validates end_entry against VMGEXIT_PSC_MAX_COUNT (253) but NOT against the actual buffer size: idx_end = hdr->end_entry; if (idx_end >= VMGEXIT_PSC_MAX_COUNT) { // checks 253, not buffer snp_complete_psc(svm, ...); return 1; } for (idx = idx_start; idx <= idx_end; idx++) { entry_start = entries[idx]; // OOB when idx >= 2 The guest sets end_entry=10+, causing the host to iterate entries[2+] which are OOB into adjacent slab objects. For each OOB entry: - The host reads 8 bytes (OOB READ / info leak oracle) - If the data passes PSC validation, __snp_complete_one_psc() writes cur_page = 1 or 512 into the entry (OOB WRITE, sev.c:3806) - If validation fails, the error response reveals whether adjacent memory is zero vs non-zero (information disclosure to guest) The guest controls allocation size (exit_info_2), entry range (cur_entry/end_entry), and can fire unlimited VMGEXITs to repeatedly hit different slab positions. By exploiting the variety of bugs, a malicious SEV-SNP guest can: - OOB read adjacent kmalloc-cg-32 objects (heap layout disclosure) - OOB write cur_page bits into adjacent objects (heap corruption) - Trigger use-after-free conditions across VMGEXITs E.g. with KASAN enabled, a single insmod of the PoC guest module produces 73 KASAN reports: BUG: KASAN: slab-out-of-bounds in snp_begin_psc+0x126/0x890 Read of size 8 at addr ffff888219ffb5e0 by task qemu-system-x86/2199 BUG: KASAN: slab-out-of-bounds in snp_begin_psc+0x468/0x890 Write of size 8 at addr ffff888351566648 by task qemu-system-x86/2199 The buggy address belongs to the object at ffff888XXXXXXXXX which belongs to the cache kmalloc-cg-32 of size 32 The buggy address is located N bytes to the right of allocated 32-byte region [ffff888XXXXXXXXX, ffff888XXXXXXXXX) Breakdown: 62 slab-out-of-bounds (reads + writes past allocation) 7 slab-use-after-free 4 use-after-free All credit to Stan for the wonderful description and reproducer! Reported-by: Stan Shaw <shawstan96@gmail.com> Cc: Michael Roth <michael.roth@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Peter Gonda <pgonda@google.com> Cc: Jacky Li <jackyli@google.com> Fixes: 4af663c2f64a ("KVM: SEV: Allow per-guest configuration of GHCB protocol version") Cc: stable@vger.kernel.org Signed-off-by: Michael Roth <michael.roth@amd.com> [sean: write changelog] Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260501202250.2115252-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 daysMerge tag 'block-7.1-20260529' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block fix from Jens Axboe: "Just a single fix for the block side, making a slight tweak to a fix from this cycle" * tag 'block-7.1-20260529' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: blk-mq: reinsert cached request to the list
7 daysMerge tag 'io_uring-7.1-20260529' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fix from Jens Axboe: "Just a single fix for a regression introduced in this cycle, where we should ensure the node is visible before the entry is added to the tctx list" * tag 'io_uring-7.1-20260529' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/tctx: set ->io_uring before publishing the tctx node
7 daysMerge tag 'kvmarm-fixes-7.1-4' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.1, take #4 - Restore CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC to its former glory by making sure the config symbol is correctly spelled out in the code - Don't reset the AArch32 view of the PMU counters to zero when the guest is writing to them - Fix an assorted collection of memory leaks in the newly added tracing code - Fix the capping of ZCR_EL2 which could be used in an unsanitised way by an L2 guest
7 daysMerge tag 'kvm-x86-fixes-7.1-rc6' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 fixes for 7.1-rcN - Include the kernel's linux/mman.h in KVM selftests to ensure MADV_COLLAPSE is defined, as older libc versions may not provide it. - Include execinfo.h if and only if KVM selftests are building against glibc, and provide a test_dump_stack() for non-glibc builds. - Fudge around an RCU splat in the emegerncy reboot code that is technically a legitimate flaw, but in practice is a non-issue and fixing the flaw, e.g. by adding locking, would incur meaningful risk, i.e. do more harm than good. - Rate-limit global clock updates once again (but without delayed work), as KVM was subtly relying on the old rate-limiting for NPT correction to guard against "update storms" when running without a master clock on systems with overcommitted CPUs. - Fix a brown paper bag goof where KVM checked if ERAPS is "dirty" instead of marking it dirty when emulating INVPCID. - Flush the TLB when transitioning from xAVIC => x2AVIC to ensure the CPU TLB doesn't contain AVIC-tagged entries for the APIC base GPA.
7 daysMerge tag 'cxl-fixes-7.1-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull Compute Express Link (CXL) fixes from Dave Jiang: - cxl/test: update mock dev array before calling platform_device_add() * tag 'cxl-fixes-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/test: Update mock dev array before calling platform_device_add()
7 daysMerge tag 'iommu-fixes-v7.1-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - Fix compile warning with gcc-16.1 - Intel VT-d: Simplify calculate_psi_aligned_address() - MAINTAINERS updates * tag 'iommu-fixes-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: MAINTAINERS: Add my employer to my entries MAINTAINERS: Add Vasant Hegde to reviewers of AMD IOMMU iommu, debugobjects: avoid gcc-16.1 section mismatch warnings iommu/vt-d: Simplify calculate_psi_aligned_address()
7 daysMerge tag 'sound-7.1-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of recent small fixes and quirks. We still see a bit more changes than wished, but most of them are device-specific ones that are pretty safe to apply, while a core fix is a typical UAF fix for PCM core that was recently caught by fuzzer; so overall nothing looks really worrisome. Core: - Fix a UAF in PCM OSS proc interface HD-audio: - Fix memory leaks in CS35L56 driver - Various device-specific quirks for Realtek and CS420x codecs USB-audio: - Quirk for TAE1160 USB Audio - Fix for Scarlett2 Gen4 direct monitor gain ASoC: - Fixes for QCom q6asm-dai, Intel bytcht_es8316, and simple-mux codec FireWire: - Fix for Motu DSP event queue protection" * tag 'sound-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: codecs: simple-mux: Fix enum control bounds check ALSA: usb-audio: Add iface reset and delay quirk for TAE1160 USB Audio ALSA: hda/cs420x: Add CS4208 fixup for iMac16,1 ALSA: hda/realtek: add quirk for HP Dragonfly Folio G3 2-in-1 ALSA: hda/realtek: Fix speaker output on ASUS ROG Strix G615LP ASoC: qcom: q6asm-dai: use pointer type with kzalloc_obj() ASoC: qcom: q6asm-dai: remove unnecessary braces ASoC: qcom: q6asm-dai: fix error handling in prepare and set_params ASoC: qcom: q6asm-dai: close stream only when running ASoC: qcom: q6asm-dai: do not set stream state in event and trigger callbacks ASoC: Intel: bytcht_es8316: Fix MCLK leak on init errors ALSA: hda/realtek: Limit mic boost on Positivo DN140 ALSA: scarlett2: Fix 2i2 Gen 4 direct monitor gain on firmware 2417 ALSA: pcm: oss: Fix setup list UAF on proc write error ALSA: hda: cs35l56: Fix system name string leaks ALSA: hda/realtek: Add HDA_CODEC_QUIRK for Lenovo Yoga Slim 7 14AGP11 ALSA: hda/realtek: Fix incorrect comment for ALC299_FIXUP_PREDATOR_SPK ALSA: firewire-motu: Protect register DSP event queue positions
7 daysMerge tag 'hid-for-linus-2026052801' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Benjamin Tissoires: - buffer overflow fix for lenovo (Kean) and wacom (Lee Jones) drivers - segfaults prevention in lenovo-go driver when used with an emulated device (Louis Clinckx) - cleanup of resources in u2fzero (Myeonghun Pak) - a quirk for a USB mouse and a cleanup in hid.h (hlleng and Liu Kai) * tag 'hid-for-linus-2026052801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: wacom: Fix OOB write in wacom_hid_set_device_mode() HID: lenovo-go: drop dead NULL check on to_usb_interface() HID: lenovo-go: reject non-USB transports in probe HID: lenovo: Fix buffer over-read and unaligned access in X12 Tab raw_event handler HID: quirks: Add ALWAYS_POLL quirk for SIGMACHIP USB mouse HID: remove duplicate hid_warn_ratelimited definition HID: u2fzero: free allocated URB on probe errors
7 daysmmc: sdhci: add signal voltage switch in sdhci_resume_hostJisheng Zhang
I met one suspend/resume issue with sdr104 capable sdio wifi card (with "keep-power-in-suspend" set in DT property): After resuming from suspend to ram, the sdio wifi card stops working. Further debug shows that although ios shows the sdio card is at sdr104 mode, the voltage is still at 3V3. This is due to missing the calling of ->start_signal_voltage_switch() in sdhci_resume_host(). Fix this issue by adding ->start_signal_voltage_switch() in sdhci_resume_host(). This also matches what we do for sdhci_runtime_resume_host(). Then the question is: why this issue hasn't reported and fixed for so long time. IMHO, several reasons: Some host controllers just kick off the runtime resume for system resume, so they benefit from the well supported runtime pm code; Some platforms just use the old sdio wifi card which doesn't need signal voltage switch at all, the default voltage is 3v3 after resuming. Fixes: 6308d2905bd3 ("mmc: sdhci: add quirk for keeping card power during suspend") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulfh@kernel.org>
7 daysmmc: dw_mmc-rockchip: Add missing private data for very old controllersHeiko Stuebner
The really old controllers (rk2928, rk3066, rk3188) do not support UHS speeds at all, and thus never handled phase data. For that reason it never had a parse_dt callback and no driver private data at all. Commit ff6f0286c896 ("mmc: dw_mmc-rockchip: Add memory clock auto-gating support") makes the private data sort of mandatory, because the init function checks whether phases are configured internally or through the clock controller. This results in the old SoCs then experiencing NULL-pointer dereferences when they try to access that private-data struct. While we could have if (priv) conditionals in all places, it's way less cluttery to just give the old types their private-data struct. Fixes: ff6f0286c896 ("mmc: dw_mmc-rockchip: Add memory clock auto-gating support") Cc: stable@vger.kernel.org Signed-off-by: Heiko Stuebner <heiko@sntech.de> Acked-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Ulf Hansson <ulfh@kernel.org>
7 daysmmc: litex_mmc: Set mandatory idle clocks before CMD0Inochi Amaoto
The litex_mmc driver assumes the card is already probed in the BIOS and skip the phy initialization. This will cause the command fail like the following when the old card is unplugged and then insert a new card: [ 62.923593] litex-mmc f0004000.mmc: Command (cmd 8) error, status -110 [ 62.949717] litex-mmc f0004000.mmc: Command (cmd 55) error, status -110 [ 62.976606] litex-mmc f0004000.mmc: Command (cmd 55) error, status -110 [ 63.002516] litex-mmc f0004000.mmc: Command (cmd 55) error, status -110 [ 63.028442] litex-mmc f0004000.mmc: Command (cmd 55) error, status -110 Add required clock settings and initialization for the CMD 0, so it can probe the new card. Fixes: 92e099104729 ("mmc: Add driver for LiteX's LiteSDCard interface") Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Reviewed-by: Gabriel Somlo <gsomlo@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulfh@kernel.org>
7 daysmmc: litex_mmc: Use DIV_ROUND_UP for more accurate clock calculationInochi Amaoto
The previous clock uses roundup_pow_of_two() to calculate the core clock frequency. It does not meet the actual hardware meaning. The actual frequency is calculated by "ref_clk / ((div >> 1) << 1)". Fix the clock divider calculation. Fixes: 92e099104729 ("mmc: Add driver for LiteX's LiteSDCard interface") Signed-off-by: Inochi Amaoto <inochiama@gmail.com> Reviewed-by: Gabriel Somlo <gsomlo@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulfh@kernel.org>
7 daysmmc: renesas_sdhi: Add OF entry for RZ/G2H SoCLad Prabhakar
The RZ/G2H (R8A774E1) SoC was previously handled via the generic "renesas,rcar-gen3-sdhi" fallback compatible string. However, because the SDHI IP on RZ/G2H is identical with the R-Car H3-N (R8A77951), it requires the specific quirks and configuration defined in `of_r8a7795_compatible` rather than the generic Gen3 data. Add the explicit "renesas,sdhi-r8a774e1" match entry to map it correctly. Note that the DT binding file renesas,sdhi.yaml does not need an update as the entry for this SoC is already present. Fixes: 31941342888d ("arm64: dts: renesas: r8a774e1: Add SDHI nodes") Cc: stable@vger.kernel.org Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Ulf Hansson <ulfh@kernel.org>
7 daysmmc: sdhci-of-dwcmshc: Fix reset, clk, and SDIO support for Eswin EIC7700Huan He
The EIC7700 code in sdhci-of-dwcmshc uses host->mmc->caps2 to select different configuration paths for different card types. The current logic distinguishes eMMC and SD, but does not handle SDIO separately. Update the EIC7700 card-type checks so that eMMC, SD and SDIO are distinguished explicitly. Switch the reset path to dwcmshc_reset() so that pending interrupt state is cleared consistently, and use sdhci_enable_clk() so the clock enable sequence follows the standard SDHCI flow. Fixes: 32b2633219d3 ("mmc: sdhci-of-dwcmshc: Add support for Eswin EIC7700") Signed-off-by: Huan He <hehuan1@eswincomputing.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulfh@kernel.org>
7 daysfirmware: samsung: acpm: Fix infinite loop on sequence number exhaustionTudor Ambarus
Sashiko identified a possible infinite loop [1]. ACPM IPC sequence numbers are tracked via a 64-bit bitmap. Previously, acpm_prepare_xfer() used a do...while loop to search for a free sequence number. If all 63 available sequence numbers are leaked due to transient hardware timeouts or mailbox failures, the bitmap becomes full. The next call to acpm_prepare_xfer() would enter an infinite loop. Fix this by utilizing the kernel's optimized bitmap search functions (find_next_zero_bit / find_first_zero_bit). If the pool is completely exhausted, log the failure and return -EBUSY to allow the kernel to fail gracefully instead of hanging. Furthermore, drop the allocation loop entirely. Because acpm_prepare_xfer() is strictly called under the 'tx_lock' mutex, sequence number allocations are perfectly serialized. If find_next_zero_bit() locates a free bit, a single test_and_set_bit_lock() is mathematically guaranteed to succeed. To enforce this locking invariant, wrap the allocation in a WARN_ON_ONCE. If the atomic set fails, it indicates the driver's mutex serialization is fundamentally broken. The warning generates a stack trace for debugging, while returning -EIO immediately aborts the transfer to prevent silent payload corruption. Cc: stable@vger.kernel.org Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver") Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1] Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Link: https://patch.msgid.link/20260505-acpm-fixes-sashiko-reports-v5-7-43b5ee7f1674@linaro.org Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
7 daysfirmware: samsung: acpm: Fix missing LKMM barriers in sequence allocatorTudor Ambarus
Sashiko identified memory ordering races in [1]. The ACPM driver uses a globally shared 'bitmap_seqnum' to track available sequence numbers. Even though threads now strictly free their own sequence numbers, the allocation and freeing of these bits across concurrent threads are effectively lockless operations and require explicit LKMM memory barriers. Previously, the driver used plain bitwise operators (test_bit, set_bit, clear_bit), which lack ordering guarantees. This creates two race conditions on weakly ordered architectures like ARM64: 1. Polling Release Violation: The polling thread copies its payload and calls clear_bit(). Without a release barrier, the CPU can reorder the memory operations, making the cleared bit globally visible before the payload reads have fully completed. 2. TX Acquire Violation: The TX thread loops on test_bit(), calls set_bit(), and then wipes the payload buffer via memset(). Without an acquire barrier, the CPU can speculatively execute the memset() before the bit is safely and formally claimed. If these reorderings overlap, a new TX thread can claim the sequence number and overwrite the buffer while the original polling thread is still actively reading from it. Fix this by upgrading the bitwise operators. Wrap the TX allocation in test_and_set_bit_lock() to establish formal LKMM Acquire semantics, and pair it with clear_bit_unlock() in the polling path to enforce Release semantics. Cc: stable@vger.kernel.org Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver") Closes: https://sashiko.dev/#/patchset/20260423-acpm-fixes-sashiko-reports-v1-0-2217b790925e%40linaro.org [1] Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Link: https://patch.msgid.link/20260505-acpm-fixes-sashiko-reports-v5-6-43b5ee7f1674@linaro.org Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
7 daysfirmware: samsung: acpm: Fix false timeouts and Use-After-Free in pollingTudor Ambarus
Sashiko identified severe races in the polling state machine [1]. In the ACPM driver's polling mode, threads waited for responses by monitoring the globally shared 'bitmap_seqnum'. This caused false timeouts because if a thread processed its response and freed the sequence number, a concurrent TX thread could immediately reallocate it before the polling thread woke up. Additionally, the driver suffered from a cross-thread Use-After-Free (UAF) preemption race. Previously, acpm_get_rx() cleared the sequence number of whichever RX message it drained from the hardware queue. This meant Thread A could globally free Thread B's sequence slot while Thread B was asleep. A new Thread C could then steal the slot, overwrite the buffer, and leave Thread B to wake up to corrupted state or a timeout. Fix this by rewriting the polling state machine: 1. Decouple polling from the global allocator by introducing a per-slot 'completed' flag, synchronized via smp_store_release() and smp_load_acquire(). 2. Strip acpm_get_saved_rx() out of acpm_get_rx() to make it a pure queue-draining function. Introduce a 'native_match' boolean argument which evaluates to true only if the thread natively processed its own sequence number during the call. This explicitly informs the polling loop whether it must retrieve its payload from the cross-thread cache. 3. Centralize the cache fallback and sequence number free (clear_bit) inside the polling loop. Crucially, the free operation now strictly targets the thread's own TX sequence number (xfer->txd[0]), rather than the drained RX sequence number. This enforces strict ownership: a thread only ever frees its own allocated sequence slot, and only at the exact moment it completes its poll, eliminating the UAF window. Furthermore, explicitly guard the 'native_match' assignment with an if (rx_seqnum == tx_seqnum) check, even for zero-length (no payload) responses. While an unguarded assignment wouldn't crash (because the cache fallback acpm_get_saved_rx() safely returns early on zero-length transfers) doing so would "lie" to the state machine. If a thread drained the queue and found another thread's zero-length message, setting native_match = true would falsely convince the polling loop that it natively handled its own response. Maintaining a rigorous state machine requires that native_match is only set when a thread explicitly processes its own sequence number. Cc: stable@vger.kernel.org Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver") Closes: https://sashiko.dev/#/patchset/20260429-acpm-fixes-sashiko-reports-v3-0-47cf74ab09ad%40linaro.org [1] Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Link: https://patch.msgid.link/20260505-acpm-fixes-sashiko-reports-v5-5-43b5ee7f1674@linaro.org Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
7 daysRevert "media: renesas: vsp1: brx: Fix format propagation"Laurent Pinchart
This reverts commit 937f3e6b51f1cea079be9ba642665f2bf8bcc31f. The change to format propagation in the BRx broke configuration of the DRM pipeline. Revert it to fix the regression. The original commit was meant to fix a v4l2-compliance failure, with no known userspace applications being affected beside test tools. Reverting is the simplest option, a more comprehensive fix can be developed (and tested more thoroughly) later. Reported-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Closes: https://lore.kernel.org/linux-media/CA+V-a8t481xuwava0nb7uY9CUPqFWZ_8EP0xrK3BgumP7HDcLg@mail.gmail.com Fixes: 937f3e6b51f1 ("media: renesas: vsp1: brx: Fix format propagation") Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/T2H Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/20260506215650.1897177-3-laurent.pinchart+renesas@ideasonboard.com Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
7 daysRevert "media: renesas: vsp1: Initialize format on all pads"Laurent Pinchart
This reverts commit 133ac42af0a1b389e8b7b3dc7c1cc8c30ff162b6. The change to format initialization, along with the change to format propagation in the BRx in commit 937f3e6b51f1 ("media: renesas: vsp1: brx: Fix format propagation"), broke configuration of the DRM pipeline. Revert it to fix the regression. The original commit was meant to fix a v4l2-compliance failure, with no known userspace applications being affected beside test tools. Reverting is the simplest option, a more comprehensive fix can be developed (and tested more thoroughly) later. Fixes: 133ac42af0a1 ("media: renesas: vsp1: Initialize format on all pads") Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> # On RZ/T2H Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/20260506215650.1897177-2-laurent.pinchart+renesas@ideasonboard.com Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
7 daysKVM: arm64: Correctly cap ZCR_EL2 provided by a guest hypervisorMark Brown
ZCR_EL2 can be updated by a VHE guest hypervisor either using ZCR_EL2 (which traps) or ZCR_EL1 (which does not trap). KVM handles both in different way: - on ZCR_EL2 trap, ZCR_EL2.LEN is immediately capped at the VM's own VL limit. This has the potential to break existing SW that relies on the full LEN field to be stateful. - on ZCR_EL1 access, we do absolutely nothing. On restoring the SVE context for an L2 guest, we directly restore the guest hypervisor's view of ZCR_EL2 into the physical ZCR_EL2. If the guest's view of the register was updated using the ZCR_EL2 accessor, the value has already been sanitised (with the caveat mentioned above). But if the guest used ZCR_EL1, the raw value is written into the HW, and the L2 guest can now access VLs that it shouldn't. Fix all the above by moving the VL capping to the restore points, ensuring that: - the HW is always programmed with a capped value, irrespective of the accessor being used, - the ZCR_EL2.LEN field is always completely stateful, irrespective of the accessor being used. Additionally, move ZCR_EL2 to be a sanitised register, ensuring that only the LEN field is actually stateful. This requires some creative construction of the RES0 mask, as the sysreg generation script does not yet generate RAZ/WI fields. Fixes: b3d29a823099 ("KVM: arm64: nv: Handle ZCR_EL2 traps") Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260529-kvm-arm64-fix-zcr-len-nv-v2-1-86cad51992bd@kernel.org [maz: rewrote commit message, tidy up access_zcr_el2()] Signed-off-by: Marc Zyngier <maz@kernel.org>
7 daysRevert "esp: fix page frag reference leak on skb_to_sgvec failure"Steffen Klassert
This reverts commit 2982e599fff6faa21c8df147d96fc7af6c1a2f24. The patch does not fully fix the issue and the Author does not match the 'Signed-off-by:' tag, so revert it for now. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
7 daysdrm: prevent integer overflows in dumb buffer creation helpersRajat Gupta
Fix integer overflow issues in the dumb buffer creation path: 1. drm_mode_create_dumb() does not bound width, height, or bpp before passing them to driver callbacks. Downstream helpers (e.g. drm_gem_dma_dumb_create_internal) perform pitch/size alignment in u32 arithmetic that can overflow for extreme values. Add hard limits: width and height < 8192, bpp <= 32. No legitimate software rendering use case exceeds these. 2. drm_mode_align_dumb() uses roundup(pitch, hw_pitch_align) without checking for overflow. If pitch is near U32_MAX, roundup() wraps to a small value, making subsequent check_mul_overflow() pass with a much smaller pitch than intended. Add an overflow check after roundup. 3. drm_mode_align_dumb() uses ALIGN(size, hw_size_align) which only works correctly for power-of-two alignment values. Replace with roundup() which works for any alignment. Suggested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Rajat Gupta <rajat.gupta@oss.qualcomm.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
7 daysmm/cma: fix reserved page leak on activation failureMuchun Song
If cma_activate_area() fails after allocating only part of the range bitmaps, the cleanup path still has to release the reserved pages when CMA_RESERVE_PAGES_ON_ERROR is clear. That is still worth doing even in this __init path. A bitmap_zalloc() failure does not necessarily mean the system cannot make further progress: freeing the reserved CMA pages can return a substantial amount of memory to the buddy allocator and may relieve the temporary memory shortage that caused the allocation failure in the first place. However, the cleanup path currently uses the bitmap-freeing bound for page release as well. That is only correct for ranges whose bitmap allocation already succeeded. The failed range and all later ranges still keep their reserved pages, so a partial bitmap allocation failure can permanently leak them. Fix this by releasing reserved pages for all ranges. Use the saved early_pfn[] value for ranges whose bitmap allocation already succeeded and for the failed range, and use cmr->early_pfn for later ranges whose bitmap allocation was never attempted. Link: https://lore.kernel.org/20260523060123.2207992-1-songmuchun@bytedance.com Fixes: c009da4258f9 ("mm, cma: support multiple contiguous ranges, if requested") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Acked-by: Usama Arif <usama.arif@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Frank van der Linden <fvdl@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
7 daysmm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoisonWupeng Ma
Two concurrent madvise(MADV_HWPOISON) calls on the same hugetlb page can trigger a recursive spinlock self-deadlock (AA deadlock) on hugetlb_lock when racing with a concurrent unmap: thread#0 thread#1 -------- -------- madvise(folio, MADV_HWPOISON) -> poisons the folio successfully madvise(folio, MADV_HWPOISON) unmap(folio) try_memory_failure_hugetlb get_huge_page_for_hwpoison spin_lock_irq(&hugetlb_lock) <- held __get_huge_page_for_hwpoison hugetlb_update_hwpoison() -> MF_HUGETLB_FOLIO_PRE_POISONED goto out: folio_put() refcount: 1 -> 0 free_huge_folio() spin_lock_irqsave(&hugetlb_lock) -> AA DEADLOCK! The out: path in __get_huge_page_for_hwpoison() calls folio_put() to drop the GUP reference while the hugetlb_lock is still held by the hugetlb.c wrapper get_huge_page_for_hwpoison(). If concurrent unmap has released the page table mapping reference, folio_put() drops the folio refcount to zero, triggering free_huge_folio() which attempts to re-acquire the non-recursive hugetlb_lock. Fix this by moving hugetlb_lock acquisition from the hugetlb.c wrapper into get_huge_page_for_hwpoison(). Place spin_unlock_irq() before the folio_put() at the out: label so the folio is always released outside the lock. [akpm@linux-foundation.org: fix race, rename label per Miaohe] Link: https://sashiko.dev/#/patchset/20260522010305.4099834-1-mawupeng1@huawei.com Link: https://lore.kernel.org/f39f405e-4b4b-8f79-70fe-a2b5b62114eb@huawei.com Link: https://lore.kernel.org/20260522010305.4099834-1-mawupeng1@huawei.com Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()") Signed-off-by: Wupeng Ma <mawupeng1@huawei.com> Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Acked-by: Muchun Song <muchun.song@linux.dev> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
7 daysmm/hugetlb: restore reservation on error in hugetlb folio copy pathsDavid Carlier
Two sites in mm/hugetlb.c allocate a hugetlb folio via alloc_hugetlb_folio() (consuming a VMA reservation) and then call copy_user_large_folio(), which became int-returning in commit 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults") and can now fail (e.g. -EHWPOISON on a hwpoisoned source page). On the failure path, folio_put() restores the global hugetlb pool count through free_huge_folio(), but the per-VMA reservation map entry is left marked consumed: - hugetlb_mfill_atomic_pte() resubmission path (UFFDIO_COPY) - copy_hugetlb_page_range() fork-time CoW path when hugetlb_try_dup_anon_rmap() fails (rare: pinned hugetlb anon folio under fork) User-visible effect: on UFFDIO_COPY into a private hugetlb VMA where the resubmission copy fails, the reservation for that address is leaked from the VMA's reserve map. A subsequent fault at the same address takes the no-reservation path, and under hugetlb pool pressure the task is SIGBUSed at an address it had previously reserved. The fork-time CoW path leaks the same way in the child VMA's reserve map, though it requires the much rarer combination of pinned hugetlb anon page + hwpoisoned source. Add the missing restore_reserve_on_error() call before folio_put() on both error paths. Link: https://lore.kernel.org/20260520044912.6751-1-devnexen@gmail.com Fixes: 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults") Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Muchun Song <muchun.song@linux.dev> Cc: David Hildenbrand <david@kernel.org> Cc: Mina Almasry <almasrymina@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: yuehaibing <yuehaibing@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
7 daysmm/cma_debug: fix invalid accesses for inactive CMA areasMuchun Song
cma_activate_area() can fail after allocating range bitmaps. Its cleanup path frees those bitmaps, but only clears cma->count and cma->available_count. It leaves cma->nranges and each range's count in place, so cma_debugfs_init() can still register debugfs files for an area that never activated successfully. That exposes two problems. Reading the bitmap file can make debugfs walk a freed range bitmap and trigger an invalid memory access. Reading maxchunk can also take cma->lock even though that lock is initialized only on the successful activation path. Fix this by creating debugfs entries only for CMA areas that reached CMA_ACTIVATED. c009da4258f9 introduced the invalid access to bitmap file. 2e32b947606d introduced the invalid access to cma->lock. This change applies to both issues. So I added two Fixes tags. Link: https://lore.kernel.org/20260520061025.3971821-1-songmuchun@bytedance.com Fixes: c009da4258f9 ("mm, cma: support multiple contiguous ranges, if requested") Fixes: 2e32b947606d ("mm: cma: add functions to get region pages counters") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Frank van der Linden <fvdl@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Stefan Strogin <stefan.strogin@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
7 daysmemcg: use round-robin victim selection in refill_stockShakeel Butt
Harry Yoo reported that get_random_u32_below() is not safe to call in the nmi context and memcg charge draining can happen in nmi context. More specifically get_random_u32_below() is neither reentrant- nor NMI-safe: it acquires a per-cpu local_lock via local_lock_irqsave() on the batched_entropy_u32 state. An NMI that lands on a CPU mid-update of the ChaCha batch state and recurses into the random subsystem would corrupt that state. The memcg_stock local_trylock prevents re-entry on the percpu stock itself, but cannot protect an unrelated subsystem's per-cpu lock. Replace the random pick with a per-cpu round-robin counter stored in memcg_stock_pcp and serialized by the same local_trylock that already guards cached[] and nr_pages[]. No atomics, no random calls, no extra locks needed. Link: https://lore.kernel.org/20260521223751.3794625-1-shakeel.butt@linux.dev Fixes: f735eebe55f8f ("memcg: multi-memcg percpu charge cache") Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reported-by: Harry Yoo <harry@kernel.org> Closes: https://lore.kernel.org/4e20f643-6983-4b6e-b12d-c6c4eb20ae0c@kernel.org/ Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
7 daysmm/hugetlb: avoid false positive lockdep assertionLorenzo Stoakes
Commit 081056dc00a2 ("mm/hugetlb: unshare page tables during VMA split, not before") changed the locking model around hugetlbfs PMD unsharing on VMA split, but did not update the function which asserts the locks, hugetlb_vma_assert_locked(). This function asserts that either the hugetlb VMA lock is held (if a shared mapping) or that the reservation map lock is held (if private). If you get an unfortunate race between something which results in one of these locks being released and a hugetlb VMA split and you have CONFIG_LOCKDEP enabled, you can therefore see a false positive assertion arise when there is in fact no issue. Since this change introduced a new take_locks parameter to hugetlb_unshare_pmds(), which, when set to false, indicates that locking is sufficient, simply pass this to the unsharing logic and predicate the lock assertions on this. This is safe, as we already asserted the file rmap lock and the VMA write lock prior to this (implying exclusive mmap write lock), so we cannot be raced by either rmap or page fault page table walkers which the asserted locks are intended to protect against (we don't mind GUP-fast). Separate out huge_pmd_unshare() into __huge_pmd_unshare() to add a check_locks parameter, and update hugetlb_unshare_pmds() to pass this parameter to it. This leaves all other callers of huge_pmd_unshare() still correctly asserting the locks. The below reproducer will trigger the assert in a kernel with CONFIG_LOCKDEP enabled by racing process teardown (which will release the hugetlb lock) against a hugetlb split. void execute_one(void) { void *ptr; pid_t pid; /* * Create a hugetlb mapping spanning a PUD entry. * * We force the hugetlb page allocation with populate and * noreserve. * * |---------------------| * | | * |---------------------| * 0 PUD boundary */ ptr = mmap(0, PUD_SIZE, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED | MAP_ANON | MAP_NORESERVE | MAP_HUGETLB | MAP_POPULATE, -1, 0); if (ptr == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); } /* * Fork but with a bogus stack pointer so we try to execute code in * a non-VM_EXEC VMA, causing segfault + teardown via exit_mmap(). * * The clone will cause PMD page table sharing between the * processes first via: * copy_process() -> ... -> huge_pte_alloc() -> huge_pmd_share() * * Then tear down and release the hugetlb 'VMA' lock via: * exit_mmap() -> ... -> vma_close() -> hugetlb_vma_lock_free() */ pid = syscall(__NR_clone, 0, 2 * PMD_SIZE, 0, 0, 0); if (pid < 0) { perror("clone"); exit(EXIT_FAILURE); } if (pid == 0) { /* Pop stack... */ return; } /* * We are the parent process. * * Race the child process's teardown with a PMD unshare. * * We do this by triggering: * * __split_vma() -> hugetlb_split() -> hugetlb_unshare_pmds() * * Which, importantly, doesn't hold the hugetlb VMA lock (nor can * it), meaning we assert in hugetlb_vma_assert_locked(). * * . * |----------.----------| * | . | * |----------.----------| * 0 . PUD boundary */ mmap(0, PUD_SIZE / 2, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_ANON | MAP_PRIVATE, -1, 0); } int main(void) { int i; /* Kick off fork children. */ for (i = 0; i < NUM_FORKS; i++) { pid_t pid = fork(); if (pid < 0) { perror("fork"); exit(EXIT_FAILURE); } /* Fork children do their work and exit. */ if (!pid) { int j; for (j = 0; j < NUM_ITERS; j++) execute_one(); return EXIT_SUCCESS; } } /* If we succeeded, wait on children. */ for (i = 0; i < NUM_FORKS; i++) wait(NULL); return EXIT_SUCCESS; } [ljs@kernel.org: account for the !CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING case] Link: https://lore.kernel.org/agWZsPGYid08uU6O@lucifer Link: https://lore.kernel.org/20260513085658.45264-1-ljs@kernel.org Fixes: 081056dc00a2 ("mm/hugetlb: unshare page tables during VMA split, not before") Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Jann Horn <jannh@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
7 daysMerge tag 'amd-drm-fixes-7.1-2026-05-28' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-7.1-2026-05-28: amdgpu: - GEM_OP warning fix - GEM_OP locking fix - Userq fixes - DCN 2.1 refclk fix - SI fix - HMM fixes amdkfd: - svm_range_set_attr locking fix - CRIU restore fix - KFD debugger fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20260528211843.893681-1-alexander.deucher@amd.com
7 daysMerge tag 'drm-xe-fixes-2026-05-28' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes - Restore IDLEDLY regiter on engine reset (Bala) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/ahhBUt8fDqjB-mQq@intel.com
8 daysMerge tag 'drm-intel-fixes-2026-05-27' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes - Fix potential UAF in TTM object purge (Janusz Krzysztofik) - Use polling when irqs are unavailable [aux] (Michał Grzelak) - Fix HDR pre-CSC LUT programming loop [color] (Pranay Samala) - Block DC states on vblank enable when Panel Replay supported [psr] (Jouni Högander) - Use DC_OFF wake reference to block DC6 on vblank enable [psr] (Jouni Högander) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tursulin@igalia.com> Link: https://patch.msgid.link/ahajfPMJE_qKzig5@linux
8 daysnet: pcs: pcs-mtk-lynxi: fix bpi-r3 serdes configurationFrank Wunderlich
Commit 8871389da151 introduces common pcs dts properties which writes rx=normal,tx=normal polarity to register SGMSYS_QPHY_WRAP_CTRL of switch. This is initialized with tx-bit set and so change inverts polarity compared to before. It looks like mt7531 has tx polarity inverted in hardware and set tx-bit by default to restore the normal polarity. The MT7531 datasheet quite clearly states: Register 000050EC QPHY_WRAP_CTRL -- QPHY wrapper control Reset value: 0x00000501 BIT 1 RX_BIT_POLARITY -- RX bit polarity control 1'b0: normal 1'b1: inverted BIT 0 TX_BIT_POLARITY -- TX bit polarity control (TX default inversed in MT7531) 1'b0: normal 1'b1: inverted Till this patch the register write was only called when mediatek,pnswap property was set which cannot be done for switch because the fw-node param was always NULL from switch driver in the mtk_pcs_lynxi_create call. Do not configure switch side like it's done before. Fixes: 8871389da151 ("net: pcs: pcs-mtk-lynxi: deprecate "mediatek,pnswap"") Signed-off-by: Frank Wunderlich <frank-w@public-files.de> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260526153239.30194-1-linux@fw-web.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysMerge tag 'for-net-2026-05-28' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_core: Rework hci_dev_do_reset() to use hci_sync functions - hci_conn: Fix memory leak in hci_le_big_terminate() - hci_sync: Set HCI_CMD_DRAIN_WORKQUEUE during device close - hci_sync: Reset device counters in hci_dev_close_sync() - hci_sync: fix UAF in hci_le_create_cis_sync - L2CAP: Fix possible crash on l2cap_ecred_conn_rsp - L2CAP: fix chan ref leak in l2cap_chan_timeout() on !conn - L2CAP: use chan timer to close channels in cleanup_listen() - L2CAP: clear chan->ident on ECRED reconfiguration success - ISO: fix UAF in iso_recv_frame - ISO: serialize iso_sock_clear_timer with socket lock - HIDP: fix missing length checks in hidp_input_report() - 6lowpan: check skb_clone() return value in send_mcast_pkt() - btusb: Allow firmware re-download when version matches - hci_qca: Use 100 ms SSR delay for rampatch and NVM loading * tag 'for-net-2026-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: hci_sync: Reset device counters in hci_dev_close_sync() Bluetooth: hci_sync: Set HCI_CMD_DRAIN_WORKQUEUE during device close Bluetooth: hci_core: Rework hci_dev_do_reset() to use hci_sync functions Bluetooth: ISO: serialize iso_sock_clear_timer with socket lock Bluetooth: ISO: fix UAF in iso_recv_frame Bluetooth: L2CAP: Fix possible crash on l2cap_ecred_conn_rsp Bluetooth: l2cap: clear chan->ident on ECRED reconfiguration success Bluetooth: hci_qca: Use 100 ms SSR delay for rampatch and NVM loading Bluetooth: hci_sync: fix UAF in hci_le_create_cis_sync Bluetooth: 6lowpan: check skb_clone() return value in send_mcast_pkt() Bluetooth: btusb: Allow firmware re-download when version matches Bluetooth: HIDP: fix missing length checks in hidp_input_report() Bluetooth: L2CAP: use chan timer to close channels in cleanup_listen() Bluetooth: L2CAP: fix chan ref leak in l2cap_chan_timeout() on !conn Bluetooth: hci_conn: Fix memory leak in hci_le_big_terminate() ==================== Link: https://patch.msgid.link/20260528131839.462344-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 dayssctp: fix race between sctp_wait_for_connect and peeloffZhenghang Xiao
sctp_wait_for_connect() drops and re-acquires the socket lock while waiting for the association to reach ESTABLISHED state. During this window, another thread can peeloff the association to a new socket via getsockopt(SCTP_SOCKOPT_PEELOFF), changing asoc->base.sk. After re-acquiring the old socket lock, sctp_wait_for_connect() returns success without noticing the migration — the caller then accesses the association under the wrong lock in sctp_datamsg_from_user(). Add the same sk != asoc->base.sk check that sctp_wait_for_sndbuf() already has, returning an error if the association was migrated while we slept. Fixes: 668c9beb9020 ("sctp: implement assign_number for sctp_stream_interleave") Signed-off-by: Zhenghang Xiao <kipreyyy@gmail.com> Acked-by: Xin Long <lucien.xin@gmail.com> Link: https://patch.msgid.link/20260527032411.60959-1-kipreyyy@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
8 daysMerge branch ↵Jakub Kicinski
'net-mana-fix-null-dereferences-during-teardown-after-attach-failure' Dipayaan Roy says: ==================== net: mana: Fix NULL dereferences during teardown after attach failure When mana_attach() fails (e.g. during queue allocation), the error cleanup frees apc->tx_qp and apc->rxqs and sets them to NULL. Multiple subsequent teardown paths can then dereference these NULL pointers, causing kernel panics. Patch 1 adds NULL guards in the low-level teardown functions (mana_fence_rqs, mana_destroy_vport, mana_dealloc_queues) so they are safe to call regardless of queue initialization state. This covers all callers: mana_remove(), mana_change_mtu() recovery, and internal error paths in mana_alloc_queues(). Patch 2 adds an early exit in mana_detach() for already-detached ports, making it safe for non-close callers. This allows the queue reset handler to safely retry mana_attach() without redundant teardown. ==================== Link: https://patch.msgid.link/20260525081129.1230035-1-dipayanroy@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>