summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2026-05-02RDMA/ionic: Fix typo in format stringJason Gunthorpe
Applying the corrupted patch by hand mangled the format string, put the s in the right place. Cc: stable@vger.kernel.org Fixes: 654a27f25530 ("RDMA/ionic: bound node_desc sysfs read with %.64s") Link: https://patch.msgid.link/r/1-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Reported-by: Brad Spengler <brad.spengler@opensrcsec.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02ip6_gre: Use cached t->net in ip6erspan_changelink().Maoyi Xie
After commit 5e72ce3e3980 ("net: ipv6: Use link netns in newlink() of rtnl_link_ops"), ip6erspan_newlink() correctly resolves the per-netns ip6gre hash via link_net. ip6erspan_changelink() was not converted in that series and still uses dev_net(dev), which diverges from the device's creation netns after IFLA_NET_NS_FD migration. This re-inserts the tunnel into the wrong per-netns hash. The original netns keeps a stale entry. When that netns is later destroyed, ip6gre_exit_rtnl_net() walks the stale entry, producing a slab-use-after-free reported by KASAN, followed by a kernel BUG at net/core/dev.c (LIST_POISON1) in unregister_netdevice_many_notify(). Reachable from an unprivileged user namespace (unshare --user --map-root-user --net). ip6gre_changelink() earlier in the same file already uses the cached t->net; only ip6erspan_changelink() has the wrong shape. Fixes: 2d665034f239 ("net: ip6_gre: Fix ip6erspan hlen calculation") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260430103318.3206018-1-maoyi.xie@ntu.edu.sg Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-02Merge branch 'replace-direct-dequeue-call-with-qdisc_dequeue_peeked'Jakub Kicinski
Jamal Hadi Salim says: ==================== Replace direct dequeue call with qdisc_dequeue_peeked When sfb and red qdiscs have children (eg qfq qdisc) whose peek() callback is qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from its child (red/sfb in this case), it will do the following: 1a. do a peek() - and when sensing there's an skb the child can offer, then - the child in this case(red/sfb) calls its child's (qfq) peek. qfq does the right thing and will return the gso_skb queue packet. Note: if there wasnt a gso_skb entry then qfq will store it there. 1b. invoke a dequeue() on the child (red/sfb). And herein lies the problem. - red/sfb will call the child's dequeue() which will essentially just try to grab something of qfq's queue. The right thing to do in #1b is to grab the skb off gso_skb queue. This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked() method instead. Patch 1 fixes the issue for red qdisc. Patch 2 fixes it for sfb. Patch 3 adds testcases for the two setups. ==================== Link: https://patch.msgid.link/20260430152957.194015-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-02selftests/tc-testing: Add tests that force red and sfb to dequeue from ↵Victor Nogueira
child's gso_skb Create 4 test cases: - Force red to dequeue from its child's gso_skb with qfq leaf - Force sfb to dequeue from its child's gso_skb with qfq leaf - Force red to dequeue from its child's gso_skb with dualpi2 leaf - Force sfb to dequeue from its child's gso_skb with dualpi2 leaf All of them have tbf followed by red (or sfb) followed by qfq (or dualpi2). Since tbf calls its child's peek followed by qdisc_dequeue_peeked, it will force red/sfb to call their child's peek. In this case, since the child (qfq/dualpi2) has qdisc_peek_dequeued as its peek callback, the packet will be stored in its gso_skb queue. During the subsequent call to qdisc_dequeue_peeked, red/sfb will have to dequeue from the child's gso_skb to retrieve the packet. Not doing so will cause a NULL ptr deref which was happening before a recent fix. Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20260430152957.194015-4-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-02net/sched: sch_sfb: Replace direct dequeue call with peek and ↵Victor Nogueria
qdisc_dequeue_peeked When sfb has children (eg qfq qdisc) whose peek() callback is qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from its child (sfb in this case), it will do the following: 1a. do a peek() - and when sensing there's an skb the child can offer, then - the child in this case(sfb) calls its child's (qfq) peek. qfq does the right thing and will return the gso_skb queue packet. Note: if there wasnt a gso_skb entry then qfq will store it there. 1b. invoke a dequeue() on the child (sfb). And herein lies the problem. - sfb will call the child's dequeue() which will essentially just try to grab something of qfq's queue. [ 127.594489][ T453] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f] [ 127.594741][ T453] CPU: 2 UID: 0 PID: 453 Comm: ping Not tainted 7.1.0-rc1-00035-gac961974495b-dirty #793 PREEMPT(full) [ 127.595059][ T453] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 127.595254][ T453] RIP: 0010:qfq_dequeue+0x35c/0x1650 [sch_qfq] [ 127.595461][ T453] Code: 00 fc ff df 80 3c 02 00 0f 85 17 0e 00 00 4c 8d 73 48 48 89 9d b8 02 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 76 0c 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b [ 127.596081][ T453] RSP: 0018:ffff88810e5af440 EFLAGS: 00010216 [ 127.596337][ T453] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: dffffc0000000000 [ 127.596623][ T453] RDX: 0000000000000009 RSI: 0000001880000000 RDI: ffff888104fd82b0 [ 127.596917][ T453] RBP: ffff888104fd8000 R08: ffff888104fd8280 R09: 1ffff110211893a3 [ 127.597165][ T453] R10: 1ffff110211893a6 R11: 1ffff110211893a7 R12: 0000001880000000 [ 127.597404][ T453] R13: ffff888104fd82b8 R14: 0000000000000048 R15: 0000000040000000 [ 127.597644][ T453] FS: 00007fc380cbfc40(0000) GS:ffff88816f2a8000(0000) knlGS:0000000000000000 [ 127.597956][ T453] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 127.598160][ T453] CR2: 00005610aa9890a8 CR3: 000000010369e000 CR4: 0000000000750ef0 [ 127.598390][ T453] PKRU: 55555554 [ 127.598509][ T453] Call Trace: [ 127.598629][ T453] <TASK> [ 127.598718][ T453] ? mark_held_locks+0x40/0x70 [ 127.598890][ T453] ? srso_alias_return_thunk+0x5/0xfbef5 [ 127.599053][ T453] sfb_dequeue+0x88/0x4d0 [ 127.599174][ T453] ? ktime_get+0x137/0x230 [ 127.599328][ T453] ? srso_alias_return_thunk+0x5/0xfbef5 [ 127.599480][ T453] ? qdisc_peek_dequeued+0x7b/0x350 [sch_qfq] [ 127.599670][ T453] ? srso_alias_return_thunk+0x5/0xfbef5 [ 127.599831][ T453] tbf_dequeue+0x6b1/0x1098 [sch_tbf] [ 127.599988][ T453] __qdisc_run+0x169/0x1900 The right thing to do in #1b is to grab the skb off gso_skb queue. This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked() method instead. Fixes: e13e02a3c68d ("net_sched: SFB flow scheduler") Signed-off-by: Victor Nogueria <victor@mojatatu.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430152957.194015-3-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-02net/sched: sch_red: Replace direct dequeue call with peek and ↵Jamal Hadi Salim
qdisc_dequeue_peeked When red qdisc has children (eg qfq qdisc) whose peek() callback is qdisc_peek_dequeued(), we could get a kernel panic. When the parent of such qdiscs (eg illustrated in patch #3 as tbf) wants to retrieve an skb from its child (red in this case), it will do the following: 1a. do a peek() - and when sensing there's an skb the child can offer, then - the child in this case(red) calls its child's (qfq) peek. qfq does the right thing and will return the gso_skb queue packet. Note: if there wasnt a gso_skb entry then qfq will store it there. 1b. invoke a dequeue() on the child (red). And herein lies the problem. - red will call the child's dequeue() which will essentially just try to grab something of qfq's queue. [ 78.667668][ T363] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f] [ 78.667927][ T363] CPU: 1 UID: 0 PID: 363 Comm: ping Not tainted 7.1.0-rc1-00033-g46f74a3f7d57-dirty #790 PREEMPT(full) [ 78.668263][ T363] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 78.668486][ T363] RIP: 0010:qfq_dequeue+0x446/0xc90 [sch_qfq] [ 78.668718][ T363] Code: 54 c0 e8 dd 90 00 f1 48 c7 c7 e0 03 54 c0 48 89 de e8 ce 90 00 f1 48 8d 7b 48 b8 ff ff 37 00 48 89 fa 48 c1 e0 2a 48 c1 ea 03 <80> 3c 02 00 74 05 e8 ef a1 e1 f1 48 8b 7b 48 48 8d 54 24 58 48 8d [ 78.669312][ T363] RSP: 0018:ffff88810de573e0 EFLAGS: 00010216 [ 78.669533][ T363] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 [ 78.669790][ T363] RDX: 0000000000000009 RSI: 0000000000000004 RDI: 0000000000000048 [ 78.670044][ T363] RBP: ffff888110dc4000 R08: ffffffffb1b0885a R09: fffffbfff6ba9078 [ 78.670297][ T363] R10: 0000000000000003 R11: ffff888110e31c80 R12: 0000001880000000 [ 78.670560][ T363] R13: ffff888110dc4150 R14: ffff888110dc42b8 R15: 0000000000000200 [ 78.670814][ T363] FS: 00007f66a8f09c40(0000) GS:ffff888163428000(0000) knlGS:0000000000000000 [ 78.671110][ T363] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 78.671324][ T363] CR2: 000055db4c6a30a8 CR3: 000000010da67000 CR4: 0000000000750ef0 [ 78.671585][ T363] PKRU: 55555554 [ 78.671713][ T363] Call Trace: [ 78.671843][ T363] <TASK> [ 78.671936][ T363] ? __pfx_qfq_dequeue+0x10/0x10 [sch_qfq] [ 78.672148][ T363] ? __pfx__printk+0x10/0x10 [ 78.672322][ T363] ? srso_alias_return_thunk+0x5/0xfbef5 [ 78.672496][ T363] ? lockdep_hardirqs_on_prepare+0xa8/0x1a0 [ 78.672706][ T363] ? srso_alias_return_thunk+0x5/0xfbef5 [ 78.672875][ T363] ? trace_hardirqs_on+0x19/0x1a0 [ 78.673047][ T363] red_dequeue+0x65/0x270 [sch_red] [ 78.673217][ T363] ? srso_alias_return_thunk+0x5/0xfbef5 [ 78.673385][ T363] tbf_dequeue.cold+0xb0/0x70c [sch_tbf] [ 78.673566][ T363] __qdisc_run+0x169/0x1900 The right thing to do in #1b is to grab the skb off gso_skb queue. This patchset fixes that issue by changing #1b to use qdisc_dequeue_peeked() method instead. Fixes: 77be155cba4e ("pkt_sched: Add peek emulation for non-work-conserving qdiscs.") Reported-by: Manas <ghandatmanas@gmail.com> Reported-by: Rakshit Awasthi <rakshitawasthi17@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430152957.194015-2-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-02amd-xgbe: fix PTP addend overflow causing frozen clockGregory Fuchedgi
XGBE_PTP_ACT_CLK_FREQ and XGBE_V2_PTP_ACT_CLK_FREQ were 10x too large (500MHz/1GHz instead of 50MHz/100MHz), causing the computed addend to overflow the 32-bit tstamp_addend. In the general case this would result in the clock advancing at the wrong rate. For v2 (PCI), ptpclk_rate is hardcoded to 125MHz, so the addend formula (ACT_CLK_FREQ << 32) / ptpclk_rate yields exactly 8 * 2^32, and when stored to the 32-bit tstamp_addend the value is zero. With addend = 0 the hardware accumulator never overflows and the PTP clock is fully stopped. For v1 (platform), ptpclk_rate is read from ACPI/DT so the exact overflow behavior depends on the firmware-reported frequency. Define the constants as NSEC_PER_SEC / SSINC so the relationship is explicit and cannot drift out of sync. Fixes: fbd47be098b5 ("amd-xgbe: add hardware PTP timestamping support") Tested-by: Gregory Fuchedgi <gfuchedgi@gmail.com> Signed-off-by: Gregory Fuchedgi <gfuchedgi@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260429-fix-xgbe-ptp-addend-v1-1-fca5b0ca5e62@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-02net: wan: fsl_ucc_hdlc: fix ucc_hdlc_removeHolger Brunck
If the driver is used in a non tdm mode priv->utdm is a NULL pointer. Therefore we need to check this pointer first before checking si_regs. Fixes: c19b6d246a35 ("drivers/net: support hdlc function for QE-UCC") Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-02net: wan: fsl_ucc_hdlc: fix uhdlc_memcleanHolger Brunck
Unmapping of uf_regs is done from ucc_fast_free and doesn't need to be done explicitly. If already unmapped ucc_fast_free will crash. Fixes: c19b6d246a35 ("drivers/net: support hdlc function for QE-UCC") Signed-off-by: Holger Brunck <holger.brunck@hitachienergy.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-02MAINTAINERS: Add self for the DEC LANCE network driverMaciej W. Rozycki
Like with the rest of DECstation and TURBOchannel hardware I have been handling the DEC LANCE network driver for some 25 years now anyway. Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/alpine.DEB.2.21.2604271113520.28583@angie.orcam.me.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-02tools/headers: Regenerate stddef.h to fix BPF selftestsPaul Chaignon
With commit dacbfc167808 ("crypto: af_alg - Annotate struct af_alg_iv with __counted_by"), two selftests, test_tag and crypto_sanity, now indirectly rely on the __counted_by macro. On systems with commit dacbfc167808 in the installed UAPI headers, the selftests build fails with: In file included from tools/testing/selftests/bpf/prog_tests/crypto_sanity.c:7: /usr/include/linux/if_alg.h:45:22: error: expected ‘:’, ‘,’, ‘;’, ‘}’ or ‘__attribute__’ before ‘__counted_by’ 45 | __u8 iv[] __counted_by(ivlen); | ^~~~~~~~~~~~ This patch fixes it by regenerating stddef.h in tools/include using the instructions from commit a778f5d46b62 ("tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI"). Fixes: dacbfc167808 ("crypto: af_alg - Annotate struct af_alg_iv with __counted_by") Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Tested-by: Ihor Solodrai <ihor.solodrai@linux.dev> Link: https://lore.kernel.org/r/8da8ef16055aa452d940668ed5359ce54adc6b0b.1777715500.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-01riscv: mm: Fixup no5lvl failure when vaddr is invalidGuo Ren (Alibaba DAMO Academy)
Unlike no4lvl, no5lvl still continues to detect satp, which requires va=pa mapping. When pa=0x800000000000, no5lvl would fail in Sv48 mode due to an illegal VA value of 0x800000000000. So, prevent detecting the satp flow for no5lvl, when vaddr is invalid. Add the is_vaddr_valid() function for checking. Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line") Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Björn Töpel <bjorn@rivosinc.com> Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Tested-by: Fangyu Yu <fangyu.yu@linux.alibaba.com> Link: https://patch.msgid.link/20260125055212.433163-1-guoren@kernel.org [pjw@kernel.org: cleaned up commit message] Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-05-01riscv: Fix register corruption from uninitialized cregs on errorMichael Neuling
compat_riscv_gpr_set() calls cregs_to_regs() unconditionally, even when user_regset_copyin() fails. Since cregs is an uninitialized stack variable, a copyin failure causes uninitialized stack data to be written into the target task's pt_regs, corrupting its register state and potentially leaking kernel stack contents. compat_restore_sigcontext() has the same issue: it calls cregs_to_regs() even when __copy_from_user() fails, leading to the same corruption of the signal-returning task's register state on error. Only call cregs_to_regs() when the user copy succeeds. Fixes: 4608c159594f ("riscv: compat: ptrace: Add compat_arch_ptrace implement") Fixes: 7383ee05314b ("riscv: compat: signal: Add rt_frame implementation") Signed-off-by: Michael Neuling <mikey@neuling.org> Assisted-by: Cursor:claude-4.6-opus-high-thinking Link: https://patch.msgid.link/20260501062320.2339562-1-mikey@neuling.org Signed-off-by: Paul Walmsley <pjw@kernel.org>
2026-05-01ksmbd: validate inherited ACE SID lengthShota Zaizen
smb_inherit_dacl() walks the parent directory DACL loaded from the security descriptor xattr. It verifies that each ACE contains the fixed SID header before using it, but does not verify that the variable-length SID described by sid.num_subauth is fully contained in the ACE. A malformed inheritable ACE can advertise more subauthorities than are present in the ACE. compare_sids() may then read past the ACE. smb_set_ace() also clamps the copied destination SID, but used the unchecked source SID count to compute the inherited ACE size. That could advance the temporary inherited ACE buffer pointer and nt_size accounting past the allocated buffer. Fix this by validating the parent ACE SID count and SID length before using the SID during inheritance. Compute the inherited ACE size from the copied SID so the size matches the bounded destination SID. Reject the inherited DACL if size accumulation would overflow smb_acl.size or the security descriptor allocation size. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Shota Zaizen <s@zaizen.me> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-01ksmbd: fix kernel-doc warnings from ksmbd_conn_get/put()Namjae Jeon
The kernel test robot reported W=1 build warnings for ksmbd_conn_get() and ksmbd_conn_put() due to missing parameter descriptions. Add the @conn description to fix these warnings. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-01ksmbd: fail share config requests when path allocation failsShuhao Fu
Non-pipe shares must have a duplicated backing path before they can be published. share_config_request() currently calls kstrndup() for that path, but if the allocation fails it leaves ret unchanged. If veto list parsing succeeds and share->name exists, the partially built share is still inserted into the global share table with share->path left NULL. A later share-root SMB2 create uses tree_conn->share_conf->path as the lookup root. If the share was published with path == NULL, that request passes a NULL pathname into do_getname_kernel()/strlen() and can crash the ksmbd worker. Set ret = -ENOMEM when path duplication fails so the incomplete share is destroyed before publication. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Shuhao Fu <sfual@cse.ust.hk> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-01ksmbd: close durable scavenger races against m_fp_list lookupsDaeMyung Kang
ksmbd_durable_scavenger() has two related races against any walker that iterates f_ci->m_fp_list, including ksmbd_lookup_fd_inode() (used by ksmbd_vfs_rename) and the share-mode checks in fs/smb/server/smb_common.c. (1) fp->node list-head reuse. Durable-preserved handles can remain linked on f_ci->m_fp_list after session teardown so share-mode checks still see them while the handle is reconnectable. The scavenger collected expired handles by adding fp->node to a local scavenger_list after removing them from the global durable idr. Because fp->node is the same list_head used by m_fp_list, list_add(&fp->node, &scavenger_list) overwrites the m_fp_list links and corrupts both lists. CONFIG_DEBUG_LIST can report this on the share-mode walk path. (2) Refcount race against m_fp_list walkers. The scavenger qualifies an expired durable handle with atomic_read(&fp->refcount) > 1 and fp->conn under global_ft.lock, removes fp from global_ft, then drops global_ft.lock before unlinking fp from m_fp_list and freeing it. During that gap fp is still linked on m_fp_list with f_state == FP_INITED. ksmbd_lookup_fd_inode() under m_lock read calls ksmbd_fp_get() (atomic_inc_not_zero on refcount that is still 1) and takes a live reference; the scavenger then unlinks and frees fp while the holder owns a reference, leading to UAF on the holder's subsequent ksmbd_fd_put() and on any field reads performed by a concurrent share-mode walker that iterates m_fp_list without taking ksmbd_fp_get() (smb_check_perm_dleases-like paths). Fix both: * Stop reusing fp->node as a scavenger-private list node. Remove one expired handle from global_ft under global_ft.lock, take an explicit transient reference, drop the lock, unlink fp->node from m_fp_list under f_ci->m_lock, then drop both the durable lifetime and transient references with atomic_sub_and_test(2, &fp->refcount). If the scavenger is the last putter the close runs there; otherwise an in-flight holder that already raced through the m_fp_list lookup owns the final close via its ksmbd_fd_put() path. The one-at-a-time disposal can rescan the durable idr when multiple handles expire in the same pass, but durable scavenging is a background expiration path and the final full scan recomputes min_timeout before the next wait. * Clear fp->persistent_id inside __ksmbd_remove_durable_fd() right after idr_remove(), so a delayed final close from a holder that snatched fp does not re-issue idr_remove() on a persistent id that idr_alloc_cyclic() in ksmbd_open_durable_fd() may have already handed out to a brand-new durable handle. * Bypass the per-conn open_files_count decrement in __put_fd_final() when fp is detached from any session table (fp->conn cleared by session_fd_check() at durable preserve -- paired with the volatile_id clear at unpublish, so checking fp->conn alone is sufficient). The walker that owns the final close runs from an unrelated work->conn whose stats.open_files_count never tracked this durable fp; without this guard the holder would underflow that unrelated counter. The two races are folded into one patch because patch (1) alone cleans up the corrupted list but leaves a deterministic UAF window for m_fp_list walkers that the transient-reference and persistent_id discipline in (2) close; bisecting onto an intermediate state would land on a UAF that pre-patch chaos merely made less reproducible. Validation: * CONFIG_DEBUG_LIST coverage for the list_head reuse path. * KASAN-enabled direct SMB2 durable-handle coverage that exercised ksmbd_durable_scavenger() and non-NULL ksmbd_lookup_fd_inode() returns while durable handles expired under concurrent rename lookups, with no KASAN, UAF, list-corruption, ODEBUG, or WARNING reports. * checkpatch --strict * make -j$(nproc) M=fs/smb/server Fixes: d484d621d40f ("ksmbd: add durable scavenger timer") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-01ksmbd: harden file lifetime during session teardownDaeMyung Kang
__close_file_table_ids() is the per-session teardown that closes every fp belonging to a session (or to one tree connect on that session) by walking the session's volatile-id idr. The current loop has three related problems on busy or racing workloads: * Sleeping under ft->lock. The session-teardown skip callback, session_fd_check(), already sleeps in ksmbd_vfs_copy_durable_owner() -> kstrdup(GFP_KERNEL) and down_write(&fp->f_ci->m_lock) (a rw_semaphore). Running the callback inside write_lock(&ft->lock) trips CONFIG_DEBUG_ATOMIC_SLEEP / CONFIG_PROVE_LOCKING on a durable-fd workload. * Refcount accounting blind to f_state. The unconditional atomic_dec_and_test(&fp->refcount) does not distinguish FP_INITED (idr-owned reference still intact) from FP_CLOSED (an earlier ksmbd_close_fd() already consumed the idr-owned reference while leaving fp in the idr because a holder kept refcount non-zero). When the latter races with teardown the same path over-decrements into a holder reference and ksmbd_fd_put() later UAFs that holder. * FP_NEW window. Between __open_id() publishing fp into the session idr and ksmbd_update_fstate(..., FP_INITED) committing the transition at the end of smb2_open(), an fp is in FP_NEW and an intervening teardown that takes a transient reference and unpublishes the volatile id leaves the original idr-owned reference orphaned -- the opener is unaware that fp has been unpublished, returns success to the client, and the fp leaks at refcount = 1. Refactor __close_file_table_ids() to take a transient reference on fp and unpublish fp from the session idr *under ft->lock* before calling skip() outside the lock. A transient ref protects lifetime but not concurrent field mutation, so the idr_remove() is what keeps __ksmbd_lookup_fd() through this session's idr from granting a new ksmbd_fp_get() reference to an fp whose fp->conn / fp->tcon / fp->volatile_id / op->conn / lock_list links are about to be rewritten by session_fd_check(). Durable reconnect is unaffected because it reaches fp through the global durable table (ksmbd_lookup_durable_fd -> global_ft). Decide n_to_drop together with any FP_INITED -> FP_CLOSED transition under ft->lock so teardown and ksmbd_close_fd() never both consume the idr-owned reference. See ksmbd_mark_fp_closed() for the per-state accounting. For the FP_NEW path to be safe, the opener has to learn that fp was unpublished: ksmbd_update_fstate() now returns -ENOENT when an FP_NEW -> FP_INITED transition finds f_state already advanced or the volatile id cleared (both committed by teardown under ft->lock); smb2_open() propagates that as STATUS_OBJECT_NAME_INVALID and drops the original reference via ksmbd_fd_put(). The list removal cannot be left for a deferred final putter because fp->volatile_id has already been cleared and __ksmbd_remove_fd() will intentionally skip both idr_remove() and list_del_init(). Move the m_fp_list unlink in __ksmbd_remove_fd() above the volatile-id check so that an FP_NEW fp that happened to be added to m_fp_list (smb2_open() adds fp->node before ksmbd_update_fstate() runs) is still cleaned up on the deferred putter path; list_del_init() on an empty node is a no-op and remains safe for fps that were never added. Add a defensive guard in session_fd_check() that refuses non-FP_INITED fps so that even if a teardown reaches an FP_NEW fp it falls into the close branch (where the n_to_drop = 1 accounting keeps the opener's reference alive) instead of the durable-preserve branch (which mutates fp->conn / fp->tcon). Validation on a debug kernel additionally built with CONFIG_DEBUG_LIST and CONFIG_DEBUG_OBJECTS_WORK used a same-session two-tcon workload (open/write storm on one tcon, 50 tree disconnects on the other) and reported no list-corruption, work_struct ODEBUG, sleep-in-atomic, lockdep or kmemleak reports. Reverting only the __close_file_table_ids() hunk while keeping a forced-is_reconnectable() harness produced the expected sleep-in-atomic at vfs_cache.c:1095, confirming the ft->lock-out-of-sleepable-skip discipline. KASAN-enabled direct SMB2 coverage with durable handles enabled exercised ksmbd_close_tree_conn_fds(), ksmbd_close_session_fds(), the FP_NEW failure path, tree_conn_fd_check(), and a non-zero session_fd_check() durable-preserve return. This produced no KASAN, DEBUG_LIST, ODEBUG, or WARNING reports. Fixes: f44158485826 ("cifsd: add file operations") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-01ksmbd: centralize ksmbd_conn final release to plug transport leakDaeMyung Kang
ksmbd_conn_free() is one of four sites that can observe the last refcount drop of a struct ksmbd_conn. The other three fs/smb/server/connection.c ksmbd_conn_r_count_dec() fs/smb/server/oplock.c __free_opinfo() fs/smb/server/vfs_cache.c session_fd_check() end the conn with a bare kfree(), skipping ida_destroy(&conn->async_ida) and conn->transport->ops->free_transport(conn->transport). Whenever one of them is the last putter, the embedded async_ida and the entire transport struct leak -- for TCP, that is also the struct socket and the kvec iov. __free_opinfo() being a final putter is not theoretical. opinfo_put() queues the callback via call_rcu(&opinfo->rcu, free_opinfo_rcu), so ksmbd_server_terminate_conn() can deposit N opinfo releases in RCU and have ksmbd_conn_free() run in the handler thread before any of them fire. ksmbd_conn_free() then observes refcnt > 0 and short-circuits; the last RCU-delivered __free_opinfo() falls onto its bare kfree(conn) branch and the transport is lost. A/B validation in a QEMU/virtme guest, mounting //127.0.0.1/testshare: each iteration holds 8 files open via sleep processes, force-closes TCP with "ss -K sport = :445", kills the holders, lazy-umounts; repeated 10 times, then ksmbd shutdown and kmemleak scan. state conn_alloc conn_free tcp_free opi_rcu kmemleak ---------- ---------- --------- -------- ------- -------- pre-patch 20 20 10 160 7 with patch 20 20 20 160 0 Pre-patch conn_free=20 with tcp_free=10 directly demonstrates the bare-kfree paths skipping transport cleanup; kmemleak backtraces point into struct tcp_transport / iov. With this patch tcp_free matches conn_free at 20/20 and kmemleak is clean. Move the per-struct final release into __ksmbd_conn_release_work() and route the three bare-kfree final-put sites through a new ksmbd_conn_put(). Those sites now pair ida_destroy() and free_transport() with kfree(conn) regardless of which holder happens to release the last reference. stop_sessions() only triggers the transport shutdown and does not itself drop the last conn reference, so it is unaffected. The centralized release reaches sock_release() -> tcp_close() -> lock_sock_nested() (might_sleep) from every final putter, including __free_opinfo() invoked from an RCU softirq callback, which trips CONFIG_DEBUG_ATOMIC_SLEEP. Defer the release to a dedicated ksmbd_conn_wq workqueue so ksmbd_conn_put() is safe from any non-sleeping context. Make ksmbd_file own a strong connection reference while fp->conn is non-NULL so durable-preserve and final-close paths cannot dereference a stale connection. ksmbd_open_fd() and ksmbd_reopen_durable_fd() take the reference via ksmbd_conn_get() (the latter also reorders the fp->conn / fp->tcon assignments before __open_id() so the published fp is never observed with fp->conn == NULL); session_fd_check() and __ksmbd_close_fd() drop it via ksmbd_conn_put(). With that invariant, session_fd_check() can take a local conn pointer once and use it across the m_op_list and lock_list iterations even though op->conn puts may otherwise drop the last reference. At module exit the workqueue is flushed and destroyed after rcu_barrier(), so any release queued by a trailing RCU callback is drained before the inode hash and module text go away. Fixes: ee426bfb9d09 ("ksmbd: add refcnt to ksmbd_conn struct") Signed-off-by: DaeMyung Kang <charsyam@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-01Merge branch 'ipv6-fix-ecmp-route-failover-on-carrier-loss'Jakub Kicinski
Sagarika Sharma says: ==================== ipv6: fix ECMP route failover on carrier loss This patchset resolves an issue where established IPv6 connections are unable to transition to alternative ECMP nexthops upon carrier loss. Unlike IPv4, the IPv6 routing subsystem does not actively invalidate cached destinations during a NETDEV_CHANGE event. Sockets persist with dead routes, leading to stalled traffic or connection drops. This series introduces a fix to trigger route invalidation by updating the route serial number on link carrier loss and provides a corresponding selftest to validate the failover behavior for IPv4 and IPv6. ==================== Link: https://patch.msgid.link/20260430200909.527827-1-sharmasagarika@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01selftest: net: Add test for TCP flow failover with ECMP routes.Kuniyuki Iwashima
Without the previous commit, TCP failed to switch to alternative IPv6 routes immediately upon carrier loss. It would persist with the dead route until reaching the threshold net.ipv4.tcp_retries1, leading to unnecessary delays in failover. Let's add a selftest for this scenario to ensure TCP fails over immediately upon a carrier loss event. Before: TEST: TCP IPv4 failover [ OK ] TEST: TCP IPv6 failover [FAIL] After: TEST: TCP IPv4 failover [ OK ] TEST: TCP IPv6 failover [ OK ] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Sagarika Sharma <sharmasagarika@google.com> Link: https://patch.msgid.link/20260430200909.527827-3-sharmasagarika@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01ipv6: update route serial number on NETDEV_CHANGESagarika Sharma
When using IPv6 ECMP routes, if a netdev listed as a nexthop experiences a carrier change event (e.g., a bond device generating a NETDEV_CHANGE event after its slaves go linkdown), established connections utilizing that nexthop fail to fail over to other available nexthops. Instead, these connections stall or drop. This happens because the IPv6 FIB code does not invalidate the socket's cached destination when a NETDEV_CHANGE event occurs. While fib6_ifdown() correctly marks the nexthop with RTNH_F_LINKDOWN, it leaves the route's serial number unchanged. As a result, sockets with a previously cached dst do not realize the route is no longer viable and continue to try using the non-functional nexthop. This behavior contrasts with IPv4, which actively flushes cached destinations on a NETDEV_CHANGE event (see fib_netdev_event() in net/ipv4/fib_frontend.c). Fix this by updating the route serial number in fib6_ifdown() when setting RTNH_F_LINKDOWN. This invalidates stale cached destinations, forcing sockets to perform a new route lookup and fail over to a functioning nexthop. Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)") Signed-off-by: Sagarika Sharma <sharmasagarika@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430200909.527827-2-sharmasagarika@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01net/sched: sch_pie: annotate more data-races in pie_dump_stats()Eric Dumazet
My prior patch missed few READ_ONCE()/WRITE_ONCE() annotations. Fixes: 5154561d9b11 ("net/sched: sch_pie: annotate data-races in pie_dump_stats()") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430080056.35104-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01net: usb: cdc_ncm: add Apple Mac USB-C direct networking quirkAlex Cheema
Apple Silicon Macs expose two CDC NCM "private" data interfaces over USB-C with VID:PID 0x05ac:0x1905 and product string "Mac". This is the same protocol Apple already ships on iPhone (0x05ac:0x12a8) and iPad (0x05ac:0x12ab) for RemoteXPC since iOS 17 -- both data interfaces lack an interrupt status endpoint, so they rely on the FLAG_LINK_INTR- conditional bind path introduced in commit 3ec8d7572a69 ("CDC-NCM: add support for Apple's private interface"). The id_table currently has entries for iPhone and iPad but not for the Mac. Without a match, cdc_ncm falls through to the generic CDC NCM class-match entry, which uses the FLAG_LINK_INTR-having cdc_ncm_info struct, so bind_common() fails on the missing status endpoint and no netdev appears. Add id_table entries for both interface numbers (0 and 2) of the Mac, bound to the existing apple_private_interface_info driver_info. Verified empirically on a Mac Studio M3 Ultra running macOS 26.5: when a Mac is connected via USB-C, ioreg shows VID 0x05ac, PID 0x1905, product string "Mac", with two NCM data interfaces at numbers 0 and 2. The same PID is presented by all current Apple Silicon Mac models (MacBook Pro/Air, Mac mini, Mac Studio across the M-series), mirroring Apple's single-PID-per-family pattern from iPhone/iPad. After this patch, plugging a Mac into a Linux host running the patched kernel produces two enx... interfaces (one per data interface), "ip -br link" lists them as UP, and standard userspace networking (DHCP, NetworkManager shared mode, etc.) works without any modprobe overrides or out-of-tree modules. Signed-off-by: Alex Cheema <alex@exolabs.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260429175739.34426-1-alex@exolabs.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01ipv4: igmp: annotate data-races in igmp_heard_query()Eric Dumazet
Multiple cpus can run igmp_heard_query() concurrently. Add missing READ_ONCE()/WRITE_ONCE() over following in_dev fields. - mr_qrv - mr_qi - mr_qri - mr_v1_seen - mr_v2_seen Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+ae9a171f239b14485310@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69f38675.050a0220.3cbe47.0002.GAE@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430164836.872079-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01net: rtnetlink: zero ifla_vf_broadcast to avoid stack infoleak in ↵Kai Zen
rtnl_fill_vfinfo rtnl_fill_vfinfo() declares struct ifla_vf_broadcast on the stack without initialisation: struct ifla_vf_broadcast vf_broadcast; The struct contains a single fixed 32-byte field: /* include/uapi/linux/if_link.h */ struct ifla_vf_broadcast { __u8 broadcast[32]; }; The function then copies dev->broadcast into it using dev->addr_len as the length: memcpy(vf_broadcast.broadcast, dev->broadcast, dev->addr_len); On Ethernet devices (the overwhelming majority of SR-IOV NICs) dev->addr_len is 6, so only the first 6 bytes of broadcast[] are written. The remaining 26 bytes retain whatever was previously on the kernel stack. The full struct is then handed to userspace via: nla_put(skb, IFLA_VF_BROADCAST, sizeof(vf_broadcast), &vf_broadcast) leaking up to 26 bytes of uninitialised kernel stack per VF per RTM_GETLINK request, repeatable. The other vf_* structs in the same function are explicitly zeroed for exactly this reason - see the memset() calls for ivi, vf_vlan_info, node_guid and port_guid a few lines above. vf_broadcast was simply missed when it was added. Reachability: any unprivileged local process can open AF_NETLINK / NETLINK_ROUTE without capabilities and send RTM_GETLINK with an IFLA_EXT_MASK attribute carrying RTEXT_FILTER_VF. The kernel walks each VF and emits IFLA_VF_BROADCAST, leaking 26 bytes of stack per VF per request. Stack residue at this call site can include return addresses and transient sensitive data; KASAN with stack instrumentation, or KMSAN, will flag the nla_put() when reproduced. Zero the on-stack struct before the partial memcpy, matching the existing pattern used for the other vf_* structs in the same function. Fixes: 75345f888f70 ("ipoib: show VF broadcast address") Cc: stable@vger.kernel.org Signed-off-by: Kai Zen <kai.aizen.dev@gmail.com> Link: https://patch.msgid.link/3c506e8f936e52b57620269b55c348af05d413a2.1777557228.git.kai.aizen.dev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01ipmr: prevent info-leak in pmr_cache_report()Eric Dumazet
Yiming Qian reported: <quote> ipmr_cache_report()` allocates a report skb with `alloc_skb(128, GFP_ATOMIC)` and appends a `struct igmphdr` using `skb_put()`. In the non-`IGMPMSG_WHOLEPKT` path it initializes only: - `igmp->type` - `igmp->code` but does not initialize: - `igmp->csum` - `igmp->group` Later, `igmpmsg_netlink_event()` copies the bytes after `sizeof(struct igmpmsg)` into the `IPMRA_CREPORT_PKT` netlink attribute and emits `RTM_NEWCACHEREPORT` on `RTNLGRP_IPV4_MROUTE_R`. As a result, 6 bytes of stale heap data from the skb head are disclosed to userspace. </quote> Let's use skb_put_zero() instead of skb_put() to fix this bug. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260430070611.4004529-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01Merge tag 'drm-fixes-2026-05-02' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Fixes for rc2, the usual amdgpu/xe double header, I think xe had a couple of weeks combined due to some maintainer access issues, otherwise there's just a few misc fixes and documentation fixups. core and helpers: - calculate framebuffer geometry with format helpers - fix docs amdgpu: - GFX12 fix for CONFIG_DRM_DEBUG_MM configs - Fix DC analog support - Userq fixes - GART placement fix - Aldebaran SMU fixes - AMDGPU_INFO_READ_MMR_REG fix - UVD 3.1 fix - GC 6 TCC fix - Fix root reservation in amdgpu_vm_handle_fault() - RAS fix - Module reload fix for APUs - Fix build for CONFIG_DRM_FBDEV_EMULATION=n - IGT DWB regression fix - GC 11.5.4 fix - VCN user fence fixes - JPEG user fence fixes - SMU 13.0.6 fix - VCN 3/4 IB parser fixes - NV3x+ dGPU vblank fix - DCE6/8 fixes for LVDS/eDP panels without an EDID amdkfd: - Fix for when CONFIG_HSA_AMD is not set - SVM fixes xe: - uapi: Add missing pad and extensions check - uapi: Reject unsafe PAT indices for CPU cached memory - Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge - Xe3p tuning and workaround fixes - USE drm mm instead of drm SA for CCS read/write - Fix leaks and null derefs - Fix Wa_18022495364 appletbdrm: - allocate protocol buffers with kvzalloc() dma-buf: - fix docs imagination: - avoid segfault in debugfs ofdrm: - put PCI device reference on errors udl: - increase USB timeout" * tag 'drm-fixes-2026-05-02' of https://gitlab.freedesktop.org/drm/kernel: (77 commits) drm/xe/uapi: Reject coh_none PAT index for CPU_ADDR_MIRROR drm/xe/uapi: Reject coh_none PAT index for CPU cached memory in madvise drm/xe/xelp: Fix Wa_18022495364 drm/xe/gsc: Fix BO leak on error in query_compatibility_version() drm/xe/eustall: Fix drm_dev_put called before stream disable in close drm/xe: Fix error cleanup in xe_exec_queue_create_ioctl() drm/xe: Fix dma-buf attachment leak in xe_gem_prime_import() drm/xe: Fix bo leak in xe_dma_buf_init_obj() on allocation failure drm/xe/bo: Fix bo leak on GGTT flag validation in xe_bo_init_locked() drm/xe/bo: Fix bo leak on unaligned size validation in xe_bo_init_locked() drm/xe: Fix potential NULL deref in xe_exec_queue_tlb_inval_last_fence_put_unlocked drm/xe/vf: Use drm mm instead of drm sa for CCS read/write drm/xe: Add memory pool with shadow support drm/xe/debugfs: Correct printing of register whitelist ranges drm/xe: Mark ROW_CHICKEN5 as a masked register drm/xe/tuning: Use proper register offset for GAMSTLB_CTRL drm/xe/xe3p_lpg: Add missing indirect ring state feature flag drm/xe: Drop redundant rtp entries for Wa_14019988906 & Wa_14019877138 drm/xe/vm: Add missing pad and extensions check drm/xe: Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge() ...
2026-05-01net: usb: r8152: add TRENDnet TUC-ET2G v2.0Aleksander Jan Bajkowski
The TRENDnet TUC-ET2G V2.0 is an RTL8156B based 2.5G Ethernet controller. Add the vendor and product ID values to the driver. This makes Ethernet work with the adapter. Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Birger Koblitz <mail@birger-koblitz.de> Link: https://patch.msgid.link/20260430213435.21821-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01Merge tag 'nf-26-05-01' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains Netfilter fixes for net: 1) Replace skb_try_make_writable() by skb_ensure_writable() in nft_fwd_netdev and the flowtable to deal with uncloned packets having their network header in paged fragments. 2) Drop packet if output device does not exist and ensure sufficient headroom in nft_fwd_netdev before transmitting the skb. 3) Use the existing dup recursion counter in nft_fwd_netdev for the neigh_xmit variant, from Weiming Shi. 4) Add .check_hooks interface to x_tables to detach the control plane hook check based on the match/target configuration. Then, update nft_compat to use .check_hooks from .validate path, this fixes a lack of hook validation for several match/targets. 5) Fix incorrect .usersize in xt_CT, from Florian Westphal. 6) Fix a memleak with netdev tables in dormant state, from Florian Westphal. 7) Several patches to check if the packet is a fragment, then skip layer 4 inspection, for x_tables and nf_tables; as well as common nf_socket infrastructure. The xt_hashlimit match drops fragments to stay consistent with the existing approach when failing to parse the layer 4 protocol header. 8) Ensure sufficient headroom in the flowtable before transmitting the skb. 9) Fix the flowtable inline vlan approach for double-tagged vlan: Reverse the iteration over .encap[] since it represents the encapsulation as seen from the ingress path. Postpone pushing layer 2 header so output device is available to calculate needed headroom. Finally, add and use nf_flow_vlan_push() to fix it. 10) Fix flowtable inline pppoe with GSO packets. Moreover, use FLOW_OFFLOAD_XMIT_DIRECT to fill up destination hardware address since neighbour cache does not exist in pppoe. 11) Use skb_pull_rcsum() to decapsulate vlan and pppoe headers, for double-tagged vlan in particular this should provide some benefits in certain scenarios. More notes regarding 9-11): - sashiko is also signalling to use it for IPIP headers, but that needs more adjustments such setting skb->protocol after removing the IPIP header, will follow up in a separated patch. - I plan to submit selftests to cover double-tagged-vlan. As for pppoe, it should be possible but that would mandate a few userspace dependencies. This has been semi-automatically tested by me and reporters describing broken double-vlan-tagged and pppoe currently in the flowtable. * tag 'nf-26-05-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header netfilter: flowtable: fix inline pppoe encapsulation in xmit path netfilter: flowtable: fix inline vlan encapsulation in xmit path netfilter: flowtable: ensure sufficient headroom in xmit path netfilter: xtables: fix L4 header parsing for non-first fragments netfilter: nf_tables: skip L4 header parsing for non-first fragments netfilter: nf_socket: skip socket lookup for non-first fragments netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables netfilter: xt_CT: fix usersize for v1 and v2 revision netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate netfilter: x_tables: add .check_hooks to matches and targets netfilter: nft_fwd_netdev: use recursion counter in neigh egress path netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding netfilter: replace skb_try_make_writable() by skb_ensure_writable() ==================== Link: https://patch.msgid.link/20260501122237.296262-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Avoid writing an uninitialised stack variable to POR_EL0 on sigreturn if the poe_context record is absent - Reserve one more page for the early 4K-page kernel mapping to cover the extra [_text, _stext) split introduced by the non-executable read-only mapping - Force the arch_local_irq_*() wrappers to be __always_inline so that noinstr entry and idle paths cannot call out-of-line, instrumentable copies - Fix potential sign extension in the arm64 SCS unwinder's DWARF advance_loc4 decoding - Tolerate arm64 ACPI platforms with only WFI and no deeper PSCI idle states, restoring cpuidle registration on such systems - Include the UAPI <asm/ptrace.h> header in the arm64 GCS libc test rather than carrying a duplicate struct user_gcs definition (the original #ifdef NT_ARM_GCS was wrong to cover the structure definition as it would be masked out if the toolchain defined it) * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: signal: Preserve POR_EL0 if poe_context is missing arm64: Reserve an extra page for early kernel mapping kselftest/arm64: Include <asm/ptrace.h> for user_gcs definition ACPI: arm64: cpuidle: Tolerate platforms with no deep PSCI idle states arm64/irqflags: __always_inline the arch_local_irq_*() helpers arm64/scs: Fix potential sign extension issue of advance_loc4
2026-05-01smb: smbdirect: fix MR registration for coalesced SG listsYi Kuo
ib_dma_map_sg() modifies the provided scatterlist and returns the number of mapped entries, which can be fewer than the requested mr->sgt.nents if the DMA controller coalesces contiguous memory segments. Passing the original, uncoalesced count to ib_map_mr_sg() causes memory registration failures if coalescing actually occurs. Capture the actual mapped count returned by ib_dma_map_sg() and pass it to ib_map_mr_sg() to ensure correct MR registration. Also update the ib_dma_map_sg() error logging to drop the error pointer formatting, since the return value is an integer count rather than an error code. Ensure a proper error code (-EIO) is assigned when DMA mapping or MR registration fails. Fixes: de5ef8ec3c46 ("smb: smbdirect: introduce smbdirect_mr.c with client mr code") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221408 Reviewed-by: Stefan Metzmacher <metze@samba.org> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Yi Kuo <yi@yikuo.dev> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-01smb: smbdirect: introduce and use include/linux/smbdirect.hStefan Metzmacher
This makes it easier to rebuild cifs.ko and ksmbd.ko against a running kernel. Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/linux-cifs/aehrPuY60VMcYGU8@infradead.org/ Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-01smb: smbdirect: make use of DEFAULT_SYMBOL_NAMESPACE and EXPORT_SYMBOL_GPLStefan Metzmacher
This is a better solution than EXPORT_SYMBOL_FOR_MODULES(__sym, "cifs,ksmbd") as it makes it possible to rebuild smbdirect.ko against a running kernel and then load the existing cifs.ko and ksmbd.ko from the running kernel. Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/linux-cifs/aehrPuY60VMcYGU8@infradead.org/ Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-01accel/qaic: fix incorrect counter check in RAS message decodeAlok Tiwari
The UE and UE_NF cases check ce_count against UINT_MAX before incrementing their respective counters. This is logically incorrect and prevents ue_count and ue_nf_count from incrementing when ce_count reaches UINT_MAX. Fixes: c11a50b170e7 ("accel/qaic: Add Reliability, Accessibility, Serviceability (RAS)") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com> Link: https://patch.msgid.link/20260410112015.592546-1-alok.a.tiwari@oracle.com
2026-05-01Merge tag 'selinux-pr-20260501' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fixes from Paul Moore: - Ensure SELinux is always properly accessing its own sock LSM state - Only reserve an xattr slot for SELinux if it will be used - Fix a SELinux auditing regression in the directory avdcache * tag 'selinux-pr-20260501' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: fix avdcache auditing selinux: don't reserve xattr slot when we won't fill it selinux: use sk blob accessor in socket permission helpers
2026-05-01futex: Drop CLONE_THREAD requirement for private default hash allocDavidlohr Bueso
Currently need_futex_hash_allocate_default() depends on strict pthread semantics, abusing CLONE_THREAD. This breaks the non-concurrency assumptions when doing the mm->futex_ref pcpu allocations, leading to bugs[0] when sharing the mm in other ways; ie: BUG: KASAN: slab-use-after-free in futex_hash_put ... where the +1 bias can end up on a percpu counter that mm->futex_ref no longer points at. Loosen the check to cover any CLONE_VM clone, except vfork(). Excluding vfork keeps the existing paths untouched (no overhead), and we can't race in the first place: either the parent is suspended and the child runs alone, or mm->futex_ref is already allocated from an earlier CLONE_VM. Link: https://lore.kernel.org/all/CAL_bE8LsmCQ-FAtYDuwbJhOkt9p2wwYQwAbMh=PifC=VsiBM6A@mail.gmail.com/ [0] Fixes: d9b05321e21e ("futex: Move futex_hash_free() back to __mmput()") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-05-01Merge tag 's390-7.1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Alexander Gordeev: - Reject zero-length writes from userspace that corrupt Debug Facility buffers - Replace one s390 PCI maintainer - Remove SCLP_OFB Kconfig option and enable the guarded code unconditionally - Replace incorrect use of phys_to_folio() to virt_to_folio() in do_secure_storage_access() * tag 's390-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: Fix phys_to_folio() usage in do_secure_storage_access() s390/sclp: Remove SCLP_OFB Kconfig option MAINTAINERS: Replace one of the maintainers for s390/pci s390/debug: Reject zero-length input in debug_input_flush_fn() s390/debug: Reject zero-length input before trimming a newline
2026-05-01rseq: Don't advertise time slice extensions if disabledThomas Gleixner
If time slice extensions have been disabled on the kernel command line, then advertising them in RSEQ flags is wrong. Adjust the conditionals to reflect reality, fixup the misleading comments about the gap of these flags and the rseq::flags field. Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension") Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://patch.msgid.link/20260428224427.437059375%40kernel.org Cc: stable@vger.kernel.org
2026-05-01rseq: Protect rseq_reset() against interruptsThomas Gleixner
rseq_reset() uses memset() to clear the tasks rseq data. That's racy against membarrier() and preemption. Guard it with irqsave to cure this. Fixes: faba9d250eae ("rseq: Introduce struct rseq_data") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://patch.msgid.link/20260428224427.353887714%40kernel.org Cc: stable@vger.kernel.org
2026-05-01rseq: Set rseq::cpu_id_start to 0 on unregistrationThomas Gleixner
The RSEQ rework changed that to RSEQ_CPU_UNINITILIZED, which is obviously incompatible. Revert back to the original behavior. Fixes: 0f085b41880e ("rseq: Provide and use rseq_set_ids()") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://patch.msgid.link/20260428224427.271566313%40kernel.org Cc: stable@vger.kernel.org
2026-05-01selftests/rseq: Don't run tests with runner scripts outside of the scriptsMark Brown
The rseq selftests include two runner scripts run_param_test.sh and run_syscall_errors_test.sh which set up the environment for test binaries and run them with various parameters. Currently we list these test binaries in TEST_GEN_PROGS but this results in the kselftest framework running them directly as well as via the runners, resulting in duplication and spurious failures when the environment is not correctly set up (eg, if glibc tries to use rseq). Move the binaries the runners invoke to TEST_GEN_PROGS_EXTENDED, binaries listed there are built but not run by the framework. The param_test benchmarks are not moved since they are not run by run_param_test.sh. Fixes: 830969e7821a ("selftests/rseq: Implement time slice extension test") Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260423-selftests-rseq-use-runner-v1-1-e13a133754c1@kernel.org Cc: stable@vger.kernel.org
2026-05-01Merge tag 'v7.1-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbdLinus Torvalds
Pull smb server fixes from Steve French: - Fix shutdown (stop sessions) - Fix readdir unsupported info level * tag 'v7.1-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd: ksmbd: rewrite stop_sessions() with restartable iteration smb: server: handle readdir_info_level_struct_sz() error
2026-05-01Merge tag 'block-7.1-20260430' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block fixes from Jens Axboe: - MD pull request via Yu: - Fix a raid5 UAF on IO across the reshape position - Avoid failing RAID1/RAID10 devices for invalid IO errors - Fix RAID10 divide-by-zero when far_copies is zero - Restore bitmap grow through sysfs - Use mddev_is_dm() instead of open-coding gendisk checks - Use ATTRIBUTE_GROUPS() for md default sysfs attributes - Replace open-coded wait loops with wait_event helpers - NVMe pull request via Keith: - Target data transfer size configuation (Aurelien) - Enable P2P for RDMA (Shivaji Kant) - TCP target updates (Maurizio, Alistair, Chaitanya, Shivam Kumar) - TCP host updates (Alistair, Chaitanya) - Authentication updates (Alistair, Daniel, Chris Leech) - Multipath fixes (John Garry) - New quirks (Alan Cui, Tao Jiang) - Apple driver fix (Fedor Pchelkin) - PCI admin doorbell update fix (Keith) - Properly propagate CDROM read-only state to the block layer * tag 'block-7.1-20260430' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (35 commits) md: use ATTRIBUTE_GROUPS() for md default sysfs attributes md: use mddev_is_dm() instead of open-coding gendisk checks md/raid1: replace wait loop with wait_event_idle() in raid1_write_request() md/md-bitmap: add a none backend for bitmap grow md/md-bitmap: split bitmap sysfs groups md: factor bitmap creation away from sysfs handling md: use mddev_lock_nointr() in mddev_suspend_and_lock_nointr() md: replace wait loop with wait_event() in md_handle_request() md/raid10: fix divide-by-zero in setup_geo() with zero far_copies md/raid1,raid10: don't fail devices for invalid IO errors MAINTAINERS: Add Xiao Ni as md/raid reviewer md/raid5: Fix UAF on IO across the reshape position cdrom, scsi: sr: propagate read-only status to block layer via set_disk_ro() nvme-auth: Hash DH shared secret to create session key nvme-pci: fix missed admin queue sq doorbell write nvme-auth: Include SC_C in RVAL controller hash nvme-tcp: teardown circular locking fixes nvmet-tcp: Don't clear tls_key when freeing sq Revert "nvmet-tcp: Don't free SQ on authentication success" nvme: skip trace completion for host path errors ...
2026-05-01Merge tag 'io_uring-7.1-20260430' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull io_uring fixes from Jens Axboe: - Remove dead struct io_buffer_list member - Fix for incrementally consumed buffers with recvmsg multishot, which requires a minimum value left in a buffer for any receive for the headers. If there's still a bit of buffer left but it's smaller than that value, then userspace will see a spurious -EFAULT returned in the CQE - Locking fix for the DEFER_TASKRUN retry list, which otherwise could race with fallback cancelations. If the task is exiting with task_work left in both the normal and retry list AND the exit cleanup races with the task running task work, then entries could either be doubly completed or lost - Cap NAPI busy poll timeout to something sane, to avoid syzbot running into excessive polling and triggering warnings around that * tag 'io_uring-7.1-20260430' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: io_uring/tw: serialize ctx->retry_llist with ->uring_lock io_uring/napi: cap busy_poll_to 10 msec io_uring/kbuf: support min length left for incremental buffers io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member
2026-05-01parisc: Fix 64-bit kernel build when CONFIG_COMPAT=nHelge Deller
VDSO32_SYMBOL() is used in signal.c, defining the value to zero avoids liker issues when CONFIG_COMPAT=n. Signed-off-by: Helge Deller <deller@gmx.de>
2026-05-01Merge tag 'spi-fix-v7.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "There are a couple of nasty issues fixed here in the axiado and rockchip drivers. We've also got more of the fixes from Johan here, this time for the two Cadence drivers, plus a couple of other similar fixes from John and Felix" * tag 'spi-fix-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: amlogic-spisg: initialize completion before requesting IRQ spi: axiado: replace usleep_range() with udelay() in IRQ path spi: cadence-quadspi: fix runtime pm and clock imbalance on unbind spi: cadence-quadspi: fix unclocked access on unbind spi: cadence-quadspi: fix clock imbalance on probe failure spi: cadence-quadspi: fix runtime pm disable imbalance on probe failure spi: cadence: fix clock imbalance on probe failure spi: cadence: fix unclocked access on unbind spi: rockchip: Drop unused and broken CR0 macros spi: rockchip: Read ISR, not IMR, to detect cs-inactive IRQ spi: rzv2h-rspi: Fix silent failure in clock setup error path
2026-05-01arm64: signal: Preserve POR_EL0 if poe_context is missingKevin Brodsky
Commit 2e8a1acea859 ("arm64: signal: Improve POR_EL0 handling to avoid uaccess failures") delayed the write to POR_EL0 in rt_sigreturn to avoid spurious uaccess failures. This change however relies on the poe_context frame record being present: on a system supporting POE, calling sigreturn without a poe_context record now results in writing arbitrary data from the kernel stack into POR_EL0. Fix this by adding a __valid_fields member to struct user_access_state, and zeroing the struct on allocation. restore_poe_context() then indicates that the por_el0 field is valid by setting the corresponding bit in __valid_fields, and restore_user_access_state() only touches POR_EL0 if there is a valid value to set it to. This is in line with how POR_EL0 was originally handled; all frame records are currently optional, except fpsimd_context. To ensure that __valid_fields is kept in sync, fields (currently just por_el0) are now accessed via accessors and prefixed with __ to discourage direct access. Fixes: 2e8a1acea859 ("arm64: signal: Improve POR_EL0 handling to avoid uaccess failures") Cc: <stable@vger.kernel.org> Reported-by: Will Deacon <will@kernel.org> Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-05-01Merge tag 'regulator-fix-v7.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fix from Mark Brown: "A fix from Arnd re-adding a dependency on gpiolib which was implicitly pulled in via an OF specific route which got removed as part of a cleanup" * tag 'regulator-fix-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: rpi-panel-attiny: add back GPIOLIB dependency
2026-05-01Merge tag 'regmap-v7.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "A fix from Colin for a spelling mistake in a dev_warn() message" * tag 'regmap-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: sdw-mbq: Fix spelling mistake "undeferable" -> "undeferrable"