Age | Commit message | Author
2025-12-10dm: ignore discard return valueChaitanya Kulkarni
__blkdev_issue_discard() always returns 0, making all error checking at call sites dead code. For dm-thin, change the issue_discard() return type to void; in passdown_double_checking_shared_status(), drop the assignment of issue_discard()'s return value to r; and in end_discard(), hardcode r to 0, the only value __blkdev_issue_discard() returns. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10MAINTAINERS: add Benjamin Marzinski as a device mapper maintainerMikulas Patocka
Ben will be working with me as a maintainer, so add him to the MAINTAINERS file. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Mike Snitzer <snitzer@kernel.org>
2025-12-10dm-mpath: Simplify the setup_scsi_dh codeBenjamin Marzinski
There's no point to the MPATHF_RETAIN_ATTACHED_HW_HANDLER flag any more. The way setup_scsi_dh() worked, if that flag wasn't set, it would attempt to attach any passed in hardware handler. This would always fail if a different hardware handler was attached, which caused setup_scsi_dh() to rerun as if the flag was set. So the code would already retain any attached handler, because attaching a different one would always fail. Also, the code had a bug. If attached_handler_name was NULL but there was a scsi device handler attached (because either scsi_dh_attached_handler_name() failed to allocate a name, or a handler got attached after it was called) the code would loop endlessly. Instead, ignore MPATHF_RETAIN_ATTACHED_HW_HANDLER, and always free the passed in handler if *attached_handler_name is set. This simplifies the code, and avoids the endless loop bug, while keeping the functionality the same. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10dm vdo: fix kerneldoc warningsMatthew Sakai
Fix kerneldoc warnings across the dm-vdo target. Also remove some unhelpful or inaccurate doc comments, and fix some format inconsistencies that did not produce warnings. No functional changes. Suggested-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-12-10dm-bufio: align write boundary on physical block sizeMikulas Patocka
There may be devices with physical block size larger than 4k. If dm-bufio sends I/O that is not aligned on physical block size, performance is degraded. The 4k minimum alignment limit is there because some SSDs report logical and physical block size 512 despite having 4k internally - so dm-bufio shouldn't send I/Os not aligned on 4k boundary, because they perform badly (the SSD does read-modify-write for them). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: stable@vger.kernel.org
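The alignment rule described above can be sketched in plain C. This is only an illustration under stated assumptions, not the dm-bufio code: the names `MIN_WRITE_ALIGN`, `write_alignment()` and `align_write_range()` are hypothetical, and the block size is assumed to be a power of two.

```c
#include <stdint.h>

/* Hypothetical constant: the 4k floor described in the commit message. */
#define MIN_WRITE_ALIGN 4096u

/* Use the physical block size as the alignment, but never go below 4k. */
static uint32_t write_alignment(uint32_t physical_block_size)
{
	return physical_block_size > MIN_WRITE_ALIGN ?
	       physical_block_size : MIN_WRITE_ALIGN;
}

/*
 * Widen the byte range [*start, *end) outward so that both ends sit on
 * the chosen alignment. Assumes the alignment is a power of two.
 */
static void align_write_range(uint64_t *start, uint64_t *end,
			      uint32_t physical_block_size)
{
	uint64_t align = write_alignment(physical_block_size);

	*start &= ~(align - 1);                    /* round start down */
	*end = (*end + align - 1) & ~(align - 1);  /* round end up */
}
```

With a device reporting 512-byte blocks the range is still widened to 4k boundaries; with a 16k physical block size it is widened to 16k boundaries.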
2025-12-10dm-crypt: enable DM_TARGET_ATOMIC_WRITESMikulas Patocka
Allow handling of bios with REQ_ATOMIC flag set. Don't split these bios and fail them if they overrun the hard limit "BIO_MAX_VECS << PAGE_SHIFT". In order to simplify the code, this commit joins the logic that avoids splitting emulated zone append bios with the logic that avoids splitting atomic write bios. Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: John Garry <john.g.garry@oracle.com>
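A rough model of the split decision described above, written as standalone C. The constants and the `classify_bio()` helper are assumptions made for illustration (the values 256 and 12 are typical, not taken from this commit), and the soft limit stands in for whatever size dm-crypt normally splits at.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration; the kernel defines the real ones. */
#define SKETCH_BIO_MAX_VECS 256u
#define SKETCH_PAGE_SHIFT   12u
#define SKETCH_HARD_LIMIT   ((size_t)SKETCH_BIO_MAX_VECS << SKETCH_PAGE_SHIFT)

enum bio_verdict { BIO_PROCESS_WHOLE, BIO_SPLIT, BIO_FAIL };

/*
 * no_split covers both cases the commit joins together: emulated zone
 * append bios and atomic write bios. Such bios are never split; they are
 * failed only when they exceed the hard limit. Other bios are split once
 * they exceed the (smaller) soft limit.
 */
static enum bio_verdict classify_bio(size_t bytes, size_t soft_limit,
				     bool no_split)
{
	if (no_split)
		return bytes > SKETCH_HARD_LIMIT ? BIO_FAIL : BIO_PROCESS_WHOLE;
	return bytes > soft_limit ? BIO_SPLIT : BIO_PROCESS_WHOLE;
}
```

Under these assumed values the hard limit works out to 1 MiB, so a 1 MiB atomic write is processed whole and a 2 MiB one is failed rather than split.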
2025-12-10dm: test for REQ_ATOMIC in dm_accept_partial_bio()Mikulas Patocka
Any bio with REQ_ATOMIC flag set should never be split or partially completed, so BUG_ON() on this scenario in dm_accept_partial_bio() (whose intent is to allow partial completions). Also, we must reject atomic bio to targets that don't support them, otherwise this BUG could be triggered by stray bios that have the REQ_ATOMIC set. Signed-off-by: John Garry <john.g.garry@oracle.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Tested-by: John Garry <john.g.garry@oracle.com>
2025-12-10dm-verity: remove useless mempoolMikulas Patocka
v->fec->extra_pool has zero reserved entries, so we can remove it and use the kernel cache directly. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org>
2025-12-10dm-verity: disable recursive forward error correctionMikulas Patocka
There are two problems with the recursive correction: 1. It may cause denial-of-service. In fec_read_bufs, there is a loop that has 253 iterations. For each iteration, we may call verity_hash_for_block recursively. There is a limit of 4 nested recursions - that means that there may be at most 253^4 (4 billion) iterations. Red Hat QE team actually created an image that pushes dm-verity to this limit - and this image just makes the udev-worker process get stuck in the 'D' state. 2. It doesn't work. In fec_read_bufs we store data into the variable "fio->bufs", but fio->bufs is shared between recursive invocations. If "verity_hash_for_block" invoked correction recursively, it would overwrite partially filled fio->bufs. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reported-by: Guangwu Zhang <guazhang@redhat.com> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Eric Biggers <ebiggers@kernel.org>
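The worst-case figure quoted above is simple arithmetic: a loop of 253 iterations nested 4 levels deep. A small sketch (the helper name `worst_case_iterations()` is made up for this note) that reproduces it:

```c
#include <stdint.h>

/*
 * Each recursion level runs the loop again for every iteration of the
 * level above it, so the total is loop_iterations raised to max_depth.
 */
static uint64_t worst_case_iterations(uint64_t loop_iterations,
				      unsigned int max_depth)
{
	uint64_t total = 1;

	for (unsigned int d = 0; d < max_depth; d++)
		total *= loop_iterations;
	return total;
}
```

For 253 iterations and 4 levels this gives 4097152081, i.e. the "4 billion" in the message.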
2025-12-10drm/xe/guc: Recommend GuC v70.54.0 for BMG, PTLJulia Filipchuk
UAPI compatibility version 1.27.0 Update recommended GuC version for BMG, PTL. Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20251125014134.2075988-15-julia.filipchuk@intel.com
2025-12-10drm/xe/guc: Recommend GuC v70.53.0 for MTL, DG2, LNLJulia Filipchuk
UAPI compatibility version 1.26.0 Update recommended GuC version for MTL, DG2, LNL. Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20251125014134.2075988-14-julia.filipchuk@intel.com
2025-12-10crypto: arm64/ghash - Fix incorrect output from ghash-neonEric Biggers
Commit 9a7c987fb92b ("crypto: arm64/ghash - Use API partial block handling") made ghash_finup() pass the wrong buffer to ghash_do_simd_update(). As a result, ghash-neon now produces incorrect outputs when the message length isn't divisible by 16 bytes. Fix this. (I didn't notice this earlier because this code is reached only on CPUs that support NEON but not PMULL. I haven't yet found a way to get qemu-system-aarch64 to emulate that configuration.) Fixes: 9a7c987fb92b ("crypto: arm64/ghash - Use API partial block handling") Cc: stable@vger.kernel.org Reported-by: Diederik de Haas <diederik@cknow-tech.com> Closes: https://lore.kernel.org/linux-crypto/DETXT7QI62KE.F3CGH2VWX1SC@cknow-tech.com/ Tested-by: Diederik de Haas <diederik@cknow-tech.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Link: https://lore.kernel.org/r/20251209223417.112294-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-12-10accel/amdxdna: Fix cu_idx being cleared by memset() during command setupLizhi Hou
For one command type, cu_idx is assigned before calling memset() on the command structure. This results in cu_idx being overwritten, causing the firmware to receive an incomplete or invalid command and leading to unexpected command failures. Fix this by moving the memset() call before initializing cu_idx so that all fields are populated in the correct order. Fixes: 71829d7f2f70 ("accel/amdxdna: Use MSG_OP_CHAIN_EXEC_NPU when supported") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://patch.msgid.link/20251209211639.1636888-1-lizhi.hou@amd.com
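The ordering problem is easy to reproduce outside the driver. The following is a minimal sketch with a made-up structure (`struct sketch_cmd` and both setup functions are hypothetical, not the amdxdna definitions):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical command layout, for illustration only. */
struct sketch_cmd {
	uint32_t cu_idx;
	uint32_t payload[4];
};

/* Buggy order: the field is assigned, then wiped by memset(). */
static void setup_buggy(struct sketch_cmd *cmd, uint32_t cu_idx)
{
	cmd->cu_idx = cu_idx;
	memset(cmd, 0, sizeof(*cmd));
}

/* Fixed order: clear the structure first, then populate its fields. */
static void setup_fixed(struct sketch_cmd *cmd, uint32_t cu_idx)
{
	memset(cmd, 0, sizeof(*cmd));
	cmd->cu_idx = cu_idx;
}
```

After setup_buggy() the consumer sees cu_idx as 0 regardless of the value passed in; after setup_fixed() the value survives.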
2025-12-10ALSA: hda: intel-dsp-config: Prefer legacy driver as fallbackTakashi Iwai
When config table entries don't match the device to be probed, we currently fall back to SND_INTEL_DSP_DRIVER_ANY, which allows any driver to bind to it. This was set on the assumption (or hope) that all controller drivers should cover the devices generally, but in practice, this caused a problem as reported recently. Namely, when a specific kconfig for SOF isn't set for the modern Intel chips like Alderlake, a wrong driver (AVS) got probed and failed. This is because we have entries like: #if IS_ENABLED(CONFIG_SND_SOC_SOF_ALDERLAKE) /* Alder Lake / Raptor Lake */ { .flags = FLAG_SOF | FLAG_SOF_ONLY_IF_DMIC_OR_SOUNDWIRE, .device = PCI_DEVICE_ID_INTEL_HDA_ADL_S, }, .... #endif so this entry is effective only when CONFIG_SND_SOC_SOF_ALDERLAKE is set. If not set, there is no matching entry, hence it returns SND_INTEL_DSP_DRIVER_ANY as fallback. OTOH, if the kconfig is set, it explicitly falls back to SND_INTEL_DSP_DRIVER_LEGACY when no DMIC or SoundWire is found -- that was the working scenario. That being said, the current setup may be broken for modern Intel chips that are supposed to work with either SOF or legacy driver when the corresponding kconfig is missing. To address the problem above, this patch changes the fallback driver to the legacy driver, i.e. return SND_INTEL_DSP_DRIVER_LEGACY type as much as possible. When CONFIG_SND_HDA_INTEL is also disabled, the fallback is set to SND_INTEL_DSP_DRIVER_ANY type, just to be sure. Reported-by: Askar Safin <safinaskar@gmail.com> Closes: https://lore.kernel.org/all/20251014034156.4480-1-safinaskar@gmail.com/ Tested-by: Askar Safin <safinaskar@gmail.com> Reviewed-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20251210131553.184404-1-tiwai@suse.de
2025-12-10drm/i915: Fix BO alloc flagsLoïc Molinari
I915_BO_ALLOC_NOTHP must be added to the I915_BO_ALLOC_FLAGS mask in order to pass GEM_BUG_ON() valid flags checks. v2: - Add Tvrtko's A-b Reported-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Closes: https://lore.kernel.org/intel-gfx/d73adfa8-d61b-46b3-9385-dde53d8db8ad@intel.com/ Fixes: a8a9a590221c ("drm/i915: Use huge tmpfs mountpoint helpers") Suggested-by: Tvrtko Ursulin <tursulin@ursulin.net> Signed-off-by: Loïc Molinari <loic.molinari@collabora.com> Acked-by: Tvrtko Ursulin <tursulin@ursulin.net> Acked-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://patch.msgid.link/20251210143617.712808-1-loic.molinari@collabora.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-12-10drm/i915/cx0: Convert C10 PHY PLL SSC state mismatch WARN to a debug messageImre Deak
On C10 PHY PLLs the SSC is enabled by programming the XELPDP_PORT_CLOCK_CTL / XELPDP_SSC_ENABLE_PLLB flag and the PHY_C10_VDR_PLL 4..8 registers: - If SSC is enabled XELPDP_SSC_ENABLE_PLLB is set and the PHY_C10_VDR_PLL registers are programmed to non-zero values. - If SSC is disabled XELPDP_SSC_ENABLE_PLLB is cleared and the PHY_C10_VDR_PLL registers are programmed to zeroed-out values. The driver's state checker verifies if the above settings are consistent, i.e. if XELPDP_SSC_ENABLE_PLLB being set corresponds to the PHY_C10_VDR_PLL registers being zeroed-out or not. On WCL the BIOS programs non-zero values to the PHY_C10_VDR_PLL 4..8 registers, but does not set the XELPDP_SSC_ENABLE_PLLB flag. This will trigger the following PLL state check warning during driver loading: <4>[ 44.457809] xe 0000:00:02.0: [drm] PHY B: SSC enabled state (no), doesn't match PLL configuration (SSC-enabled) <4>[ 44.457833] WARNING: CPU: 4 PID: 298 at drivers/gpu/drm/i915/display/intel_cx0_phy.c:2281 intel_cx0pll_readout_hw_state+0x221/0x620 [xe] It's not clear whether the HW uses the PHY_C10_VDR_PLL 4..8 register values if the XELPDP_SSC_ENABLE_PLLB flag is cleared, or just ignores them in this case. Since the driver always programs the register values according to the above, it still makes sense to verify that the programming happened correctly. To avoid the state check WARN during driver loading due to the way BIOS programs the registers, convert the WARN to a debug message. While at it clarify the debug message. v2: Clarify the debug message. (Jani) Cc: Jani Nikula <jani.nikula@intel.com> Cc: Mika Kahola <mika.kahola@intel.com> Reviewed-by: Mika Kahola <mika.kahola@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://patch.msgid.link/20251209153407.1791839-1-imre.deak@intel.com
2025-12-10drm/gem: Fix builds with CONFIG_MMU=nBoris Brezillon
drm_gem_get_unmapped_area() relies on mm_get_unmapped_area() which is only available if CONFIG_MMU=y. Fixes: 99bda20d6d4c ("drm/gem: Introduce drm_gem_get_unmapped_area() fop") Cc: Loïc Molinari <loic.molinari@collabora.com> Reviewed-by: Loïc Molinari <loic.molinari@collabora.com> Link: https://patch.msgid.link/20251209171151.2449120-1-boris.brezillon@collabora.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-12-10drm/i915/psr: Allow async flip when Selective Fetch enabledJouni Högander
Now that Selective Fetch performs a full frame update on async flip and vblank evasion is done as needed, we can allow async flip even when Selective Fetch is enabled. Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20251204070718.1090778-4-jouni.hogander@intel.com
2025-12-10drm/i915/psr: Perform full frame update on async flipJouni Högander
According to bspec, selective fetch is not supported with async flips, and bspec instructs a full frame update on async flip. v4: - check crtc_state->async_flip_planes in psr2_sel_fetch_pipe_state_supported v3: - rebase - fix old_crtc_state->pipe_srcsz_early_tpt - fix using intel_atomic_get_new_crtc_state v2: - check also crtc_state->async_flip_planes in psr2_sel_fetch_plane_state_supported Bspec: 55229 Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20251204070718.1090778-3-jouni.hogander@intel.com
2025-12-10drm/i915/psr: Set plane id bit in crtc_state->async_flip_planes for PSRJouni Högander
Currently plane id bit is set in crtc_state->async_flip_planes only when async flip toggle workaround is needed. We want to utilize crtc_state->async_flip_planes further in Selective Fetch calculation. v2: - rework if-else if to if-if - added comment updated Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20251204070718.1090778-2-jouni.hogander@intel.com
2025-12-10mm/slub: reset KASAN tag in defer_free() before accessing freed memoryDeepanshu Kartikey
When CONFIG_SLUB_TINY is enabled, kfree_nolock() calls kasan_slab_free() before defer_free(). On ARM64 with MTE (Memory Tagging Extension), kasan_slab_free() poisons the memory and changes the tag from the original (e.g., 0xf3) to a poison tag (0xfe). When defer_free() then tries to write to the freed object to build the deferred free list via llist_add(), the pointer still has the old tag, causing a tag mismatch and triggering a KASAN use-after-free report: BUG: KASAN: slab-use-after-free in defer_free+0x3c/0xbc mm/slub.c:6537 Write at addr f3f000000854f020 by task kworker/u8:6/983 Pointer tag: [f3], memory tag: [fe] Fix this by calling kasan_reset_tag() before accessing the freed memory. This is safe because defer_free() is part of the allocator itself and is expected to manipulate freed memory for bookkeeping purposes. Fixes: af92793e52c3 ("slab: Introduce kmalloc_nolock() and kfree_nolock().") Cc: stable@vger.kernel.org Reported-by: syzbot+7a25305a76d872abcfa1@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7a25305a76d872abcfa1 Tested-by: syzbot+7a25305a76d872abcfa1@syzkaller.appspotmail.com Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://patch.msgid.link/20251210022024.3255826-1-kartikey406@gmail.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
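The tag mismatch in the report can be modelled in user space. This is a simplified sketch, not the KASAN implementation: it assumes the tag lives in the top byte of the address and that 0xff acts as a match-all tag, and every `sketch_`/`SKETCH_` name is invented for the example.

```c
#include <stdint.h>

#define SKETCH_TAG_SHIFT  56
#define SKETCH_TAG_KERNEL 0xffu   /* assumed match-all tag */

/* Extract the tag stored in the top byte of a tagged address. */
static uint8_t sketch_ptr_tag(uint64_t addr)
{
	return (uint8_t)(addr >> SKETCH_TAG_SHIFT);
}

/* Replace the top byte of the address with the given tag. */
static uint64_t sketch_set_tag(uint64_t addr, uint8_t tag)
{
	uint64_t mask = (uint64_t)0xff << SKETCH_TAG_SHIFT;

	return (addr & ~mask) | ((uint64_t)tag << SKETCH_TAG_SHIFT);
}

/* Model of resetting the tag so the access is no longer checked. */
static uint64_t sketch_reset_tag(uint64_t addr)
{
	return sketch_set_tag(addr, SKETCH_TAG_KERNEL);
}

/* Simplified check: match-all tag passes, otherwise tags must be equal. */
static int sketch_access_allowed(uint64_t addr, uint8_t memory_tag)
{
	uint8_t tag = sketch_ptr_tag(addr);

	return tag == SKETCH_TAG_KERNEL || tag == memory_tag;
}
```

Using the address from the report, the stale pointer carries tag f3 while the poisoned memory carries fe, so the modelled check fails until the tag is reset.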
2025-12-10Merge branches 'fixes' and 'misc' into for-nextRussell King (Oracle)
2025-12-10ARM: fix branch predictor hardeningRussell King (Oracle)
__do_user_fault() may be called with indeterminate interrupt enable state, which means we may be preemptible at this point. This causes problems when calling harden_branch_predictor(). For example, when called from a data abort, do_alignment_fault()->do_bad_area(). Move harden_branch_predictor() out of __do_user_fault() and into the calling contexts. Moving it into do_kernel_address_page_fault(), we can be sure that interrupts will be disabled here. Converting do_translation_fault() to use do_kernel_address_page_fault() rather than do_bad_area() means that we keep branch predictor handling for translation faults. Interrupts will also be disabled at this call site. do_sect_fault() needs special handling, so detect user mode accesses to kernel-addresses, and add an explicit call to branch predictor hardening. Finally, add branch predictor hardening to do_alignment() for the faulting case (user mode accessing kernel addresses) before interrupts are enabled. This should cover all cases where harden_branch_predictor() is called, ensuring that it always has interrupts disabled, also ensuring that it is called early in each call path. Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com> Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2025-12-10ARM: fix hash_name() faultRussell King (Oracle)
Zizhi Wo reports: "During the execution of hash_name()->load_unaligned_zeropad(), a potential memory access beyond the PAGE boundary may occur. For example, when the filename length is near the PAGE_SIZE boundary. This triggers a page fault, which leads to a call to do_page_fault()->mmap_read_trylock(). If we can't acquire the lock, we have to fall back to the mmap_read_lock() path, which calls might_sleep(). This breaks RCU semantics because path lookup occurs under an RCU read-side critical section." This is seen with CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_KFENCE=y. Kernel addresses (with the exception of the vectors/kuser helper page) do not have VMAs associated with them. If the vectors/kuser helper page faults, then there are two possibilities: 1. if the fault happened while in kernel mode, then we're basically dead, because the CPU won't be able to vector through this page to handle the fault. 2. if the fault happened while in user mode, that means the page was protected from user access, and we want to fault anyway. Thus, we can handle kernel addresses from any context entirely separately without going anywhere near the mmap lock. This gives us an entirely non-sleeping path for all kernel mode kernel address faults. As we handle the kernel address faults before interrupts are enabled, this change has the side effect of improving the branch predictor hardening, but does not completely solve the issue. Reported-by: Zizhi Wo <wozizhi@huaweicloud.com> Reported-by: Xie Yuanbin <xieyuanbin1@huawei.com> Link: https://lore.kernel.org/r/20251126090505.3057219-1-wozizhi@huaweicloud.com Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com> Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2025-12-10ARM: allow __do_kernel_fault() to report execution of memory faultsRussell King (Oracle)
Allow __do_kernel_fault() to detect the execution of memory, so we can provide the same fault message as do_page_fault() would do. This is required when we split the kernel address fault handling from the main do_page_fault() code path. Reviewed-by: Xie Yuanbin <xieyuanbin1@huawei.com> Tested-by: Xie Yuanbin <xieyuanbin1@huawei.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2025-12-10selftests: netfilter: prefer xfail in case race wasn't triggeredFlorian Westphal
Jakub says: "We try to reserve SKIP for tests skipped because tool is missing in env, something isn't built into the kernel etc." Use xfail: we can't force the race condition to appear at will, so it's expected that the test 'fails' occasionally. Fixes: 78a588363587 ("selftests: netfilter: add conntrack clash resolution test case") Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20251206175647.5c32f419@kernel.org/ Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-10netfilter: always set route tuple out ifindexLorenzo Bianconi
Always set nf_flow_route tuple out ifindex even if the indev is not one of the flowtable configured devices since otherwise the outdev lookup in nf_flow_offload_ip_hook() or nf_flow_offload_ipv6_hook() for FLOW_OFFLOAD_XMIT_NEIGH flowtable entries will fail. The above issue occurs in the following configuration since IP6IP6 tunnel does not support flowtable acceleration yet: $ip addr show 5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:11:22:33:22:55 brd ff:ff:ff:ff:ff:ff link-netns ns1 inet6 2001:db8:1::2/64 scope global nodad valid_lft forever preferred_lft forever inet6 fe80::211:22ff:fe33:2255/64 scope link tentative proto kernel_ll valid_lft forever preferred_lft forever 6: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:22:22:33:22:55 brd ff:ff:ff:ff:ff:ff link-netns ns3 inet6 2001:db8:2::1/64 scope global nodad valid_lft forever preferred_lft forever inet6 fe80::222:22ff:fe33:2255/64 scope link tentative proto kernel_ll valid_lft forever preferred_lft forever 7: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN group default qlen 1000 link/tunnel6 2001:db8:2::1 peer 2001:db8:2::2 permaddr a85:e732:2c37:: inet6 2002:db8:1::1/64 scope global nodad valid_lft forever preferred_lft forever inet6 fe80::885:e7ff:fe32:2c37/64 scope link proto kernel_ll valid_lft forever preferred_lft forever $ip -6 route show 2001:db8:1::/64 dev eth0 proto kernel metric 256 pref medium 2001:db8:2::/64 dev eth1 proto kernel metric 256 pref medium 2002:db8:1::/64 dev tun0 proto kernel metric 256 pref medium default via 2002:db8:1::2 dev tun0 metric 1024 pref medium $nft list ruleset table inet filter { flowtable ft { hook ingress priority filter devices = { eth0, eth1 } } chain forward { type filter hook forward priority filter; policy accept; meta l4proto { tcp, udp } flow add @ft } } Fixes: b5964aac51e0 ("netfilter: flowtable: consolidate xmit path") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-12-10ipvs: fix ipv4 null-ptr-deref in route error pathSlavin Liu
The IPv4 code path in __ip_vs_get_out_rt() calls dst_link_failure() without ensuring skb->dev is set, leading to a NULL pointer dereference in fib_compute_spec_dst() when ipv4_link_failure() attempts to send ICMP destination unreachable messages. The issue emerged after commit ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") started calling __ip_options_compile() from ipv4_link_failure(). This code path eventually calls fib_compute_spec_dst() which dereferences skb->dev. An attempt was made to fix the NULL skb->dev dereference in commit 0113d9c9d1cc ("ipv4: fix null-deref in ipv4_link_failure"), but it only addressed the immediate dev_net(skb->dev) dereference by using a fallback device. The fix was incomplete because fib_compute_spec_dst() later in the call chain still accesses skb->dev directly, which remains NULL when IPVS calls dst_link_failure(). The crash occurs when: 1. IPVS processes a packet in NAT mode with a misconfigured destination 2. Route lookup fails in __ip_vs_get_out_rt() before establishing a route 3. The error path calls dst_link_failure(skb) with skb->dev == NULL 4. ipv4_link_failure() → ipv4_send_dest_unreach() → __ip_options_compile() → fib_compute_spec_dst() 5. fib_compute_spec_dst() dereferences NULL skb->dev Apply the same fix used for IPv6 in commit 326bf17ea5d4 ("ipvs: fix ipv6 route unreach panic"): set skb->dev from skb_dst(skb)->dev before calling dst_link_failure(). 
KASAN: null-ptr-deref in range [0x0000000000000328-0x000000000000032f] CPU: 1 PID: 12732 Comm: syz.1.3469 Not tainted 6.6.114 #2 RIP: 0010:__in_dev_get_rcu include/linux/inetdevice.h:233 RIP: 0010:fib_compute_spec_dst+0x17a/0x9f0 net/ipv4/fib_frontend.c:285 Call Trace: <TASK> spec_dst_fill net/ipv4/ip_options.c:232 spec_dst_fill net/ipv4/ip_options.c:229 __ip_options_compile+0x13a1/0x17d0 net/ipv4/ip_options.c:330 ipv4_send_dest_unreach net/ipv4/route.c:1252 ipv4_link_failure+0x702/0xb80 net/ipv4/route.c:1265 dst_link_failure include/net/dst.h:437 __ip_vs_get_out_rt+0x15fd/0x19e0 net/netfilter/ipvs/ip_vs_xmit.c:412 ip_vs_nat_xmit+0x1d8/0xc80 net/netfilter/ipvs/ip_vs_xmit.c:764 Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") Signed-off-by: Slavin Liu <slavin452@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
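The shape of the fix (give the skb a device from its dst before reporting the link failure) can be sketched with stand-in types. None of the structures or functions below are the kernel's; they are hypothetical mock-ups, and the NULL check before the assignment is a choice made for this sketch.

```c
#include <stddef.h>

/* Mock-ups of the objects involved; not the kernel definitions. */
struct sketch_dev { int ifindex; };
struct sketch_dst { struct sketch_dev *dev; };
struct sketch_skb {
	struct sketch_dev *dev;
	struct sketch_dst *dst;
};

/*
 * Stand-in for the ICMP error path, which needs skb->dev. The real code
 * would dereference it and crash; the mock returns -1 instead.
 */
static int sketch_link_failure(const struct sketch_skb *skb)
{
	if (!skb->dev)
		return -1;
	return skb->dev->ifindex;
}

/* Error path after the fix: borrow the device from the dst first. */
static int sketch_report_failure(struct sketch_skb *skb)
{
	if (!skb->dev && skb->dst)
		skb->dev = skb->dst->dev;
	return sketch_link_failure(skb);
}
```

Calling sketch_link_failure() directly on an skb without a device hits the failing case; going through sketch_report_failure() supplies the device first.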
2025-12-10netfilter: nf_conncount: fix leaked ct in error pathsFernando Fernandez Mancera
There are some situations where ct might be leaked because error paths skip the refcounted check and return immediately. To solve this, make sure that the check is always called. Fixes: be102eb6a0e7 ("netfilter: nf_conncount: rework API to use sk_buff directly") Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Signed-off-by: Florian Westphal <fw@strlen.de>
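One common way to guarantee that such a check always runs is to route every exit through a single cleanup label. The sketch below shows that pattern with invented names (`struct sketch_ct`, `sketch_count()`); whether the actual patch uses a label or restructures the code differently is not stated in the message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted conntrack entry. */
struct sketch_ct { int refcount; };

/*
 * Every exit, including the early error, goes through out_put so that
 * the reference taken by the caller's lookup is dropped exactly once.
 */
static int sketch_count(struct sketch_ct *ct, bool refcounted,
			bool fail_early)
{
	int err = 0;

	if (fail_early) {
		err = -1;
		goto out_put;   /* a bare "return -1" here would leak ct */
	}

	/* ... normal processing would go here ... */

out_put:
	if (refcounted)
		ct->refcount--;
	return err;
}
```

The error is still reported to the caller, but the reference count ends up balanced on both the success and the failure path.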
2025-12-10rbd: stop selecting CRC32, CRYPTO, and CRYPTO_AESIlya Dryomov
None of the RBD code directly requires CRC32, CRYPTO, or CRYPTO_AES. These options are needed by CEPH_LIB code and they are selected there directly. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Dongsheng Yang <dongsheng.yang@linux.dev>
2025-12-10ceph: stop selecting CRC32, CRYPTO, and CRYPTO_AESEric Biggers
None of the CEPH_FS code directly requires CRC32, CRYPTO, or CRYPTO_AES. These options do get selected indirectly anyway via CEPH_LIB, which does need them, but there is no need for CEPH_FS to select them too. Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-12-10libceph: make decode_pool() more resilient against corrupted osdmapsIlya Dryomov
If the osdmap is (maliciously) corrupted such that the encoded length of ceph_pg_pool envelope is less than what is expected for a particular encoding version, out-of-bounds reads may ensue because the only bounds check that is there is based on that length value. This patch adds explicit bounds checks for each field that is decoded or skipped. Cc: stable@vger.kernel.org Reported-by: ziming zhang <ezrakiez@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Tested-by: ziming zhang <ezrakiez@gmail.com>
2025-12-10libceph: Amend checking to fix `make W=1` build breakageAndy Shevchenko
In a few cases the code compares 32-bit value to a SIZE_MAX derived constant which is much higher than that value on 64-bit platforms, Clang, in particular, is not happy about this net/ceph/osdmap.c:1441:10: error: result of comparison of constant 4611686018427387891 with expression of type 'u32' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 1441 | if (len > (SIZE_MAX - sizeof(*pg)) / sizeof(u32)) | ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/ceph/osdmap.c:1624:10: error: result of comparison of constant 2305843009213693945 with expression of type 'u32' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 1624 | if (len > (SIZE_MAX - sizeof(*pg)) / (2 * sizeof(u32))) | ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by casting to size_t. Note, that possible replacement of SIZE_MAX by U32_MAX may lead to the behaviour changes on the corner cases. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
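The check in question guards an allocation size computation against overflow. A standalone sketch of the cast described above, using a made-up header type (`struct sketch_pg`) and helper name:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size header preceding the u32 array. */
struct sketch_pg { uint32_t len; };

/*
 * Would sizeof(header) + len * sizeof(u32) overflow size_t?
 * Casting len to size_t makes both operands the same width, so the
 * comparison stays meaningful on 32-bit builds and is simply never
 * true on 64-bit ones.
 */
static bool alloc_would_overflow(uint32_t len)
{
	return (size_t)len >
	       (SIZE_MAX - sizeof(struct sketch_pg)) / sizeof(uint32_t);
}
```

On a 64-bit build no u32 value can trip this check, which is exactly what the diagnostic points out; on a 32-bit build large values still can, which is why replacing SIZE_MAX with U32_MAX could change behaviour in corner cases.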
2025-12-10ceph: Amend checking to fix `make W=1` build breakageAndy Shevchenko
In a few cases the code compares 32-bit value to a SIZE_MAX derived constant which is much higher than that value on 64-bit platforms, Clang, in particular, is not happy about this fs/ceph/snap.c:377:10: error: result of comparison of constant 2305843009213693948 with expression of type 'u32' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] 377 | if (num > (SIZE_MAX - sizeof(*snapc)) / sizeof(u64)) | ~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by casting to size_t. Note, that possible replacement of SIZE_MAX by U32_MAX may lead to the behaviour changes on the corner cases. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-12-10ceph: add trace points to the MDS clientMax Kellermann
This patch adds trace points to the Ceph filesystem MDS client: - request submission (CEPH_MSG_CLIENT_REQUEST) and completion (CEPH_MSG_CLIENT_REPLY) - capabilities (CEPH_MSG_CLIENT_CAPS) These are the central pieces that are useful for analyzing MDS latency/performance problems from the client's perspective. In the long run, all doutc() calls should be replaced with tracepoints. This way, the Ceph filesystem can be traced at any time (without spamming the kernel log). Additionally, trace points can be used in BPF programs (which can even dereference the pointer parameters and extract more values). Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2025-12-10libceph: fix log output race condition in OSD clientSimon Buttgereit
OSD client logging has a problem in get_osd() and put_osd(). For one logging output, refcount_read() is called twice. If the refcount value changes between the two calls, the logging output is not consistent. This patch prints out only the resulting value. [ idryomov: don't make the log messages more verbose ] Signed-off-by: Simon Buttgereit <simon.buttgereit@tu-ilmenau.de> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
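The inconsistency comes from sampling a shared counter twice for one message. A user-space sketch with C11 atomics (the function name and message format are invented, and this is not the libceph code) that samples the counter once:

```c
#include <stdatomic.h>
#include <stdio.h>

/*
 * Take a reference and format a log line from a single snapshot of the
 * counter. Reading the counter twice ("old -> new") could mix values
 * from two different moments if another thread changes it in between.
 */
static int sketch_get(atomic_int *ref, char *buf, size_t len)
{
	int now;

	atomic_fetch_add(ref, 1);
	now = atomic_load(ref);   /* one read, used for the whole message */
	snprintf(buf, len, "get ref -> %d", now);
	return now;
}
```

The logged value may still be stale by the time it is read, but the message itself can no longer contradict itself.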
2025-12-10blk-mq: delete task running check in blk_hctx_poll()Fengnan Chang
blk_hctx_poll() always checks if the task is running or not, and returns 1 if the task is running. This is a leftover from when polled IO was purely for synchronous IO, and doesn't make sense anymore when polled IO is purely asynchronous. Similarly, marking the task as TASK_RUNNING is also superfluous, as the task very much has to be running to enter the function in the first place. It looks like there has been this judgment for historical reasons, and in very early versions of this function the user would set the process state to TASK_UNINTERRUPTIBLE. Signed-off-by: Diangang Li <lidiangang@bytedance.com> Signed-off-by: Fengnan Chang <changfengnan@bytedance.com> [axboe: kill all remnants of task running, pointless now. massage message] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-12-10Merge branch 'bpf-fix-bpf_d_path-helper-prototype'Alexei Starovoitov
Shuran Liu says: ==================== bpf: fix bpf_d_path() helper prototype Hi, This series fixes a verifier issue with bpf_d_path() and adds a regression test to cover its use within a hook function. Patch 1 updates the bpf_d_path() helper prototype so that the second argument is marked as MEM_WRITE. This makes it explicit to the verifier that the helper writes into the provided buffer. Patch 2 extends the existing d_path selftest to cover incorrect verifier assumptions caused by an incorrect function prototype. The test program calls bpf_d_path() and checks if the first character of the path can be read. It ensures the verifier does not assume the buffer remains unwritten. Changelog ========= v5: - Moved the temporary file for the fallocate test from /tmp to /dev/shm Since bpf CI's 9P filesystem under /tmp does not support fallocate. v4: - Use the fallocate hook instead of an LSM hook to simplify the selftest, as suggested by Matt and Alexei. - Add a utility function in test_d_path.c to load the BPF program, improving code reuse. v3: - Switch the pathname prefix loop to use bpf_for() instead of #pragma unroll, as suggested by Matt. - Remove /tmp/bpf_d_path_test in the test cleanup path. - Add the missing Reviewed-by tags. v2: - Merge the new test into the existing d_path selftest rather than creating new files. - Add PID filtering in the LSM program to avoid nondeterministic failures due to unrelated processes triggering bprm_check_security. - Synchronize child execution using a pipe to ensure deterministic updates to the PID. Thanks for your time and reviews. ==================== Link: https://patch.msgid.link/20251206141210.3148-1-electronlsr@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10selftests/bpf: add regression test for bpf_d_path()Shuran Liu
Add a regression test for bpf_d_path() to cover incorrect verifier assumptions caused by an incorrect function prototype. The test attaches to the fallocate hook, calls bpf_d_path() and verifies that a simple prefix comparison on the returned pathname behaves correctly after the fix in patch 1. It ensures the verifier does not assume the buffer remains unwritten. Co-developed-by: Zesen Liu <ftyg@live.com> Signed-off-by: Zesen Liu <ftyg@live.com> Co-developed-by: Peili Gao <gplhust955@gmail.com> Signed-off-by: Peili Gao <gplhust955@gmail.com> Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Shuran Liu <electronlsr@gmail.com> Link: https://lore.kernel.org/r/20251206141210.3148-3-electronlsr@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-10drm/{i915, xe}/stolen: make insert_node, area_address, area_size optionalJani Nikula
Since the stolen memory hooks are function pointers, make some of them optional instead of having to define them for xe. insert_node, area_address, and area_size are only needed on platforms not supported by xe. Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Link: https://patch.msgid.link/0dbb460e8bd1df29df98862d08fcdfda03912673.1764930576.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
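The optional-hook pattern described above can be sketched in plain C. This is a minimal illustration, not the driver code: the struct name, the hook signatures, and the fallback value of 0 are all assumptions made for the example; only the hook names area_address and area_size come from the commit text.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook table. Field names follow the commit text; the
 * signatures are assumed for illustration only. */
struct toy_stolen_hooks {
	uint64_t (*area_address)(const void *drm); /* optional, may be NULL */
	uint64_t (*area_size)(const void *drm);    /* optional, may be NULL */
};

/* Wrapper that tolerates a missing optional hook instead of requiring
 * every implementer to provide a stub. */
static uint64_t toy_stolen_area_size(const struct toy_stolen_hooks *hooks,
				     const void *drm)
{
	if (!hooks || !hooks->area_size)
		return 0; /* assumed fallback when the hook is absent */
	return hooks->area_size(drm);
}

/* Example implementation a parent driver might supply. */
static uint64_t toy_example_area_size(const void *drm)
{
	(void)drm;
	return 4096;
}
```

A caller that fills in only the hooks it needs leaves the rest NULL, and the wrapper decides what an absent hook means.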
2025-12-10drm/{i915, xe}/stolen: move stolen memory handling to display parent interfaceJani Nikula
Call the stolen memory interface through the display parent interface. This makes xe compat gem/i915_gem_stolen.h redundant, and it can be removed. v2: Rebase, convert one more call that appeared Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Link: https://patch.msgid.link/350c82c49fe40f6319d14d309180e2e2752145ac.1764930576.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-12-10drm/xe/stolen: unify interface with i915Jani Nikula
Have i915_gem_stolen_node_offset() return u64, and pass const pointer to them. Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Link: https://patch.msgid.link/e1ae0c5d3cc6f59d6e4f4ce810a6e9b3870109f8.1764930576.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-12-10drm/i915/fbc: let to_intel_display() do its generic magicJani Nikula
to_intel_display() generics can handle struct intel_plane_state, struct intel_atomic_state, and struct intel_crtc just fine. Pass them directly. Reviewed-by: Vinod Govindapillai <vinod.govindapillai@intel.com> Link: https://patch.msgid.link/14d0979eea358fb3713640eae74a7a8801cd8eec.1764930576.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-12-10bpf: Fix verifier assumptions of bpf_d_path's output bufferShuran Liu
Commit 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking") started distinguishing read vs write accesses performed by helpers. The second argument of bpf_d_path() is a pointer to a buffer that the helper fills with the resulting path. However, its prototype currently uses ARG_PTR_TO_MEM without MEM_WRITE. Before 37cce22dbd51, helper accesses were conservatively treated as potential writes, so this mismatch did not cause issues. Since that commit, the verifier may incorrectly assume that the buffer contents are unchanged across the helper call and base its optimizations on this wrong assumption. This can lead to misbehaviour in BPF programs that read back the buffer, such as prefix comparisons on the returned path. Fix this by marking the second argument of bpf_d_path() as ARG_PTR_TO_MEM | MEM_WRITE so that the verifier correctly models the write to the caller-provided buffer. Fixes: 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking") Co-developed-by: Zesen Liu <ftyg@live.com> Signed-off-by: Zesen Liu <ftyg@live.com> Co-developed-by: Peili Gao <gplhust955@gmail.com> Signed-off-by: Peili Gao <gplhust955@gmail.com> Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Shuran Liu <electronlsr@gmail.com> Reviewed-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20251206141210.3148-2-electronlsr@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
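The effect of the missing MEM_WRITE flag can be illustrated with a toy model in userspace C. This is a sketch under stated assumptions and is not how the kernel verifier is implemented: the TOY_* flags, the state struct, and both functions are hypothetical names introduced here. It only shows why an analysis that trusts the prototype keeps stale knowledge about a buffer when the write is not declared.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for argument-type flags; not the kernel values. */
enum toy_arg_flags {
	TOY_ARG_PTR_TO_MEM = 1u << 0,
	TOY_MEM_WRITE      = 1u << 1,
};

/* What the toy analysis believes about the first byte of a buffer. */
struct toy_buf_state {
	bool known;          /* true if the analysis thinks it knows the byte */
	unsigned char value; /* believed value, meaningful only when known */
};

/* Model a helper call: the analysis forgets the buffer contents only if
 * the prototype declares that the helper writes to the argument. */
static void toy_model_helper_call(struct toy_buf_state *st,
				  unsigned int arg_flags)
{
	if (arg_flags & TOY_MEM_WRITE)
		st->known = false;
}

/* A comparison on the byte may be decided at analysis time only while
 * the byte is still believed to be known. */
static bool toy_can_fold_compare(const struct toy_buf_state *st)
{
	return st->known;
}
```

With only TOY_ARG_PTR_TO_MEM, the toy analysis still treats a zero-initialized buffer as zero after the call, which mirrors the stale assumption the commit describes for prefix comparisons on the returned path.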
2025-12-10Merge branch 'inet-frags-flush-pending-skbs-in-fqdir_pre_exit'Jakub Kicinski
Jakub Kicinski says: ==================== inet: frags: flush pending skbs in fqdir_pre_exit() Fix the issue reported by NIPA starting on Sep 18th [1], where pernet_ops_rwsem is constantly held by a reader, preventing writers from grabbing it (specifically driver modules from loading). The fact that reports started around that time seems coincidental. The issue seems to be skbs queued for defrag preventing conntrack from exiting. First patch fixes another theoretical issue, it's mostly a leftover from an attempt to get rid of the inet_frag_queue refcnt, which I gave up on (still think it's doable but a bit of a time sink). Second patch is a minor refactor. The real fix is in the third patch. It's the simplest fix I can think of which is to flush the frag queues. Perhaps someone has a better suggestion? Last patch adds an explicit warning for conntrack getting stuck, as this seems like something that can easily happen if bugs sneak in. The warning will hopefully save us the first 20% of the investigation effort. Link: https://lore.kernel.org/20251001082036.0fc51440@kernel.org # [1] ==================== Link: https://patch.msgid.link/20251207010942.1672972-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10netfilter: conntrack: warn when cleanup is stuckJakub Kicinski
nf_conntrack_cleanup_net_list() calls schedule() so it does not show up as a hung task. Add an explicit check to make debugging leaked skbs/conntrack references more obvious. Acked-by: Florian Westphal <fw@strlen.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10inet: frags: flush pending skbs in fqdir_pre_exit()Jakub Kicinski
We have been seeing occasional deadlocks on pernet_ops_rwsem since September in NIPA. The stuck task was usually modprobe (often loading a driver like ipvlan), trying to take the lock as a Writer. lockdep does not track readers for rwsems so the read wasn't obvious from the reports. On closer inspection the Reader holding the lock was conntrack looping forever in nf_conntrack_cleanup_net_list(). Based on past experience with occasional NIPA crashes I looked through the tests which run before the crash and noticed that the crash follows ip_defrag.sh. An immediate red flag. Scouring through (de)fragmentation queues reveals skbs sitting around, holding conntrack references. The problem is that since conntrack depends on nf_defrag_ipv6, nf_defrag_ipv6 will load first. Since nf_defrag_ipv6 loads first its netns exit hooks run _after_ conntrack's netns exit hook. Flush all fragment queue SKBs during fqdir_pre_exit() to release conntrack references before conntrack cleanup runs. Also flush the queues in timer expiry handlers when they discover fqdir->dead is set, in case a packet sneaks in while we're running the pre_exit flush. The commit under Fixes is not exactly the culprit, but I think previously the timer firing would eventually unblock the spinning conntrack. Fixes: d5dd88794a13 ("inet: fix various use-after-free in defrags units") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
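The ordering argument above can be made concrete with a toy model. It assumes, as the message implies, that exit hooks run in reverse registration order and that pre_exit work for all subsystems completes before any exit hook runs. The types and the function below are hypothetical and do not correspond to kernel code.

```c
#include <stddef.h>

enum toy_phase { TOY_PRE_EXIT, TOY_EXIT };

struct toy_event {
	int subsys;           /* registration index, 0 = registered first */
	enum toy_phase phase;
};

/* Fill out[] with the teardown sequence for n subsystems registered in
 * the order 0..n-1. out[] must have room for 2 * n events.
 * Returns the number of events written. */
static size_t toy_teardown_order(size_t n, struct toy_event *out)
{
	size_t k = 0;

	/* Phase 1: every pre_exit hook, last registered first. */
	for (size_t i = n; i-- > 0; )
		out[k++] = (struct toy_event){ (int)i, TOY_PRE_EXIT };

	/* Phase 2: every exit hook, last registered first. */
	for (size_t i = n; i-- > 0; )
		out[k++] = (struct toy_event){ (int)i, TOY_EXIT };

	return k;
}
```

With defrag as subsystem 0 and conntrack as subsystem 1, the defrag exit hook comes last, after conntrack's, while the defrag pre_exit step still precedes conntrack's exit. That is why flushing in the pre_exit step releases the references in time in this model.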
2025-12-10inet: frags: add inet_frag_queue_flush()Jakub Kicinski
Instead of exporting inet_frag_rbtree_purge() which requires that caller takes care of memory accounting, add a new helper. We will need to call it from a few places in the next patch. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-10inet: frags: avoid theoretical race in ip_frag_reinit()Jakub Kicinski
In ip_frag_reinit() we want to move the frag timeout timer into the future. If the timer fires in the meantime we inadvertently schedule it again, and since the timer assumes a ref on frag_queue we need to acquire one to balance things out. This is technically racy: we should have acquired the reference _before_ we touch the timer, as it may fire again before we take the ref. Avoid this entire dance by using mod_timer_pending() which only modifies the timer if it's pending (and which exists since Linux v2.6.30). Note that this was the only place we ever took a ref on frag_queue since Eric's conversion to RCU. So we could potentially replace the whole refcnt field with an atomic flag and a bit more RCU. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251207010942.1672972-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
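The difference between the two timer calls can be sketched with a userspace toy. The struct and the toy_* functions are hypothetical; they model only the one property the message relies on, namely that mod_timer() arms an inactive timer while mod_timer_pending() leaves it alone.

```c
#include <stdbool.h>

/* Minimal stand-in for a timer: armed or not, plus an expiry time. */
struct toy_timer {
	bool pending;
	unsigned long expires;
};

/* Models mod_timer(): always (re)arms the timer.
 * Returns true if the timer was already pending. */
static bool toy_mod_timer(struct toy_timer *t, unsigned long expires)
{
	bool was_pending = t->pending;

	t->pending = true;
	t->expires = expires;
	return was_pending;
}

/* Models mod_timer_pending(): updates the expiry only if the timer is
 * still pending and never re-arms one that has already fired.
 * Returns true if the timer was modified. */
static bool toy_mod_timer_pending(struct toy_timer *t, unsigned long expires)
{
	if (!t->pending)
		return false;
	t->expires = expires;
	return true;
}
```

If the timer has already fired, the first variant re-arms it, so the caller would have to take an extra reference to match. The second leaves it inactive, so no reference juggling is needed.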
2025-12-10Merge branch 'selftests-fix-build-warnings-and-errors' (part)Jakub Kicinski
Guenter Roeck says: ==================== selftests: Fix build warnings and errors This series fixes build warnings and errors observed when trying to build selftests. ==================== Link: https://patch.msgid.link/20251205171010.515236-1-linux@roeck-us.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>