Age | Commit message | Author
2026-01-20 | ipv6: annotate data-race in ndisc_router_discovery() | Eric Dumazet
syzbot found that ndisc_router_discovery() could read and write in6_dev->ra_mtu without holding a lock [1].

This looks fine: IFLA_INET6_RA_MTU is best effort. Add READ_ONCE()/WRITE_ONCE() to document the race.

Note that we might also reject illegal MTU values (mtu < IPV6_MIN_MTU || mtu > skb->dev->mtu) in a future patch.

[1]
BUG: KCSAN: data-race in ndisc_router_discovery / ndisc_router_discovery

read to 0xffff888119809c20 of 4 bytes by task 25817 on cpu 1:
 ndisc_router_discovery+0x151d/0x1c90 net/ipv6/ndisc.c:1558
 ndisc_rcv+0x2ad/0x3d0 net/ipv6/ndisc.c:1841
 icmpv6_rcv+0xe5a/0x12f0 net/ipv6/icmp.c:989
 ip6_protocol_deliver_rcu+0xb2a/0x10d0 net/ipv6/ip6_input.c:438
 ip6_input_finish+0xf0/0x1d0 net/ipv6/ip6_input.c:489
 NF_HOOK include/linux/netfilter.h:318 [inline]
 ip6_input+0x5e/0x140 net/ipv6/ip6_input.c:500
 ip6_mc_input+0x27c/0x470 net/ipv6/ip6_input.c:590
 dst_input include/net/dst.h:474 [inline]
 ip6_rcv_finish+0x336/0x340 net/ipv6/ip6_input.c:79
 ...

write to 0xffff888119809c20 of 4 bytes by task 25816 on cpu 0:
 ndisc_router_discovery+0x155a/0x1c90 net/ipv6/ndisc.c:1559
 ndisc_rcv+0x2ad/0x3d0 net/ipv6/ndisc.c:1841
 icmpv6_rcv+0xe5a/0x12f0 net/ipv6/icmp.c:989
 ip6_protocol_deliver_rcu+0xb2a/0x10d0 net/ipv6/ip6_input.c:438
 ip6_input_finish+0xf0/0x1d0 net/ipv6/ip6_input.c:489
 NF_HOOK include/linux/netfilter.h:318 [inline]
 ip6_input+0x5e/0x140 net/ipv6/ip6_input.c:500
 ip6_mc_input+0x27c/0x470 net/ipv6/ip6_input.c:590
 dst_input include/net/dst.h:474 [inline]
 ip6_rcv_finish+0x336/0x340 net/ipv6/ip6_input.c:79
 ...

value changed: 0x00000000 -> 0xe5400659

Fixes: 49b99da2c9ce ("ipv6: add IFLA_INET6_RA_MTU to expose mtu value")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rocco Yue <rocco.yue@mediatek.com>
Link: https://patch.msgid.link/20260118152941.2563857-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
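The annotation pattern described in this commit can be sketched in userspace as follows. This is only an illustration: the READ_ONCE()/WRITE_ONCE() definitions are simplified volatile-cast stand-ins for the kernel macros (they assume GCC/Clang `__typeof__`), and `struct inet6_dev_sketch`, `update_ra_mtu()` and `ra_mtu_valid()` are hypothetical names, not the code of the patch. The bounds check mirrors the possible future validation mentioned in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel macros (assumption, not the kernel's definitions). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

#define IPV6_MIN_MTU 1280

/* Hypothetical, reduced stand-in for the per-device IPv6 state. */
struct inet6_dev_sketch {
	uint32_t ra_mtu;
};

/* The check the commit message says a future patch might add. */
static bool ra_mtu_valid(uint32_t mtu, uint32_t dev_mtu)
{
	return !(mtu < IPV6_MIN_MTU || mtu > dev_mtu);
}

/* Lockless best-effort update: both accesses are marked as intentionally racy. */
static void update_ra_mtu(struct inet6_dev_sketch *idev, uint32_t mtu)
{
	if (READ_ONCE(idev->ra_mtu) != mtu)
		WRITE_ONCE(idev->ra_mtu, mtu);
}
```

The markers do not add locking; they tell the compiler (and KCSAN) that the concurrent access is deliberate and must be performed as a single load or store.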
2026-01-20 | mISDN: annotate data-race around dev->work | Eric Dumazet
dev->work can be read locklessly in mISDN_read() and mISDN_poll(). Add READ_ONCE()/WRITE_ONCE() annotations.

BUG: KCSAN: data-race in mISDN_ioctl / mISDN_read

write to 0xffff88812d848280 of 4 bytes by task 10864 on cpu 1:
 misdn_add_timer drivers/isdn/mISDN/timerdev.c:175 [inline]
 mISDN_ioctl+0x2fb/0x550 drivers/isdn/mISDN/timerdev.c:233
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xce/0x140 fs/ioctl.c:583
 __x64_sys_ioctl+0x43/0x50 fs/ioctl.c:583
 x64_sys_call+0x14b0/0x3000 arch/x86/include/generated/asm/syscalls_64.h:17
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xd8/0x2c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff88812d848280 of 4 bytes by task 10857 on cpu 0:
 mISDN_read+0x1f2/0x470 drivers/isdn/mISDN/timerdev.c:112
 do_loop_readv_writev fs/read_write.c:847 [inline]
 vfs_readv+0x3fb/0x690 fs/read_write.c:1020
 do_readv+0xe7/0x210 fs/read_write.c:1080
 __do_sys_readv fs/read_write.c:1165 [inline]
 __se_sys_readv fs/read_write.c:1162 [inline]
 __x64_sys_readv+0x45/0x50 fs/read_write.c:1162
 x64_sys_call+0x2831/0x3000 arch/x86/include/generated/asm/syscalls_64.h:20
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xd8/0x2c0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00000000 -> 0x00000001

Fixes: 1b2b03f8e514 ("Add mISDN core files")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260118132528.2349573-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | net: airoha_eth: increase max MTU to 9220 for DSA jumbo frames | Sayantan Nandy
The industry standard jumbo frame MTU is 9216 bytes. When using the DSA subsystem, a 4-byte tag is added to each Ethernet frame. Increase AIROHA_MAX_MTU to 9220 bytes (9216 + 4) so that users can set a standard 9216-byte MTU on DSA ports. The underlying hardware supports significantly larger frame sizes (approximately 16K). However, the maximum MTU is limited to 9220 bytes for now, as this is sufficient to support standard jumbo frames and does not incur additional memory allocation overhead. Signed-off-by: Sayantan Nandy <sayantann11@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260119073658.6216-1-sayantann11@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
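The arithmetic behind the new limit can be written down as a small sketch. Only AIROHA_MAX_MTU and the numbers 9216, 4 and 9220 come from the commit message; `JUMBO_MTU`, `DSA_TAG_LEN` and `max_dsa_port_mtu()` are hypothetical names used for illustration.

```c
#include <assert.h>

#define JUMBO_MTU      9216 /* standard jumbo frame MTU cited in the commit */
#define DSA_TAG_LEN    4    /* per-frame DSA tag size cited in the commit */
#define AIROHA_MAX_MTU (JUMBO_MTU + DSA_TAG_LEN)

static_assert(AIROHA_MAX_MTU == 9220,
	      "conduit limit must cover a 9216-byte port MTU plus the DSA tag");

/* Hypothetical helper: largest MTU a DSA user port can get for a given conduit limit. */
static int max_dsa_port_mtu(int conduit_max_mtu)
{
	return conduit_max_mtu - DSA_TAG_LEN;
}
```

With the previous, smaller limit the same subtraction would leave a DSA port short of the standard 9216-byte jumbo MTU.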
2026-01-20 | net: txgbe: remove the redundant data return in SW-FW mailbox | Jiawen Wu
For these two firmware mailbox commands, in txgbe_test_hostif() and txgbe_set_phy_link_hostif(), there is no need to read data from the buffer. Under the current setting, OEM firmware will cause the driver to fail to probe, because OEM firmware returns more link information in a larger OEM structure, txgbe_hic_ephy_getlink. However, the current driver does not support the OEM function, so fix it in a way that does not involve reading the returned data. Fixes: d84a3ff9aae8 ("net: txgbe: Restrict the use of mismatched FW versions") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Link: https://patch.msgid.link/2914AB0BC6158DDA+20260119065935.6015-1-jiawenwu@trustnetic.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | Merge branch 'fix-some-bugs-in-the-flow-director-of-hns3-driver' | Jakub Kicinski
Jijie Shao says: ==================== fix some bugs in the flow director of HNS3 driver This patchset fixes two bugs in the flow director: 1. Incorrect definition of HCLGE_FD_AD_COUNTER_NUM_M 2. Incorrect assignment of HCLGE_FD_AD_NXT_KEY ==================== Link: https://patch.msgid.link/20260119132840.410513-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | net: hns3: fix the HCLGE_FD_AD_NXT_KEY error setting issue | Jijie Shao
Use next_input_key instead of counter_id to set HCLGE_FD_AD_NXT_KEY. Fixes: 117328680288 ("net: hns3: Add input key and action config support for flow director") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20260119132840.410513-3-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | net: hns3: fix wrong GENMASK() for HCLGE_FD_AD_COUNTER_NUM_M | Jijie Shao
HCLGE_FD_AD_COUNTER_NUM_M should be at GENMASK(19, 13), rather than at GENMASK(20, 13), because bit 20 is HCLGE_FD_AD_NXT_STEP_B. This patch corrects the wrong definition. Fixes: 117328680288 ("net: hns3: Add input key and action config support for flow director") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://patch.msgid.link/20260119132840.410513-2-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
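The overlap described here is easy to check numerically. The sketch below uses userspace stand-ins for the kernel's BIT() and GENMASK() restricted to 32-bit values; `COUNTER_NUM_M_OLD` and `masks_overlap()` are hypothetical names introduced only for the comparison.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's BIT() and GENMASK() on 32-bit values. */
#define BIT(n)        (UINT32_C(1) << (n))
#define GENMASK(h, l) ((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

#define HCLGE_FD_AD_NXT_STEP_B    20              /* bit number, per the commit message */
#define COUNTER_NUM_M_OLD         GENMASK(20, 13) /* old, too-wide definition */
#define HCLGE_FD_AD_COUNTER_NUM_M GENMASK(19, 13) /* corrected definition */

/* Nonzero when two field masks share at least one bit. */
static int masks_overlap(uint32_t a, uint32_t b)
{
	return (a & b) != 0;
}
```

The old mask evaluates to 0x001fe000 and includes bit 20, so writing the counter field could clobber the next-step flag; the corrected mask is 0x000fe000.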
2026-01-20 | net: stmmac: fix resume: calculate tso last_segment | Russell King (Oracle)
Tao Wang reports that sometimes, after resume, stmmac can watchdog:

 NETDEV WATCHDOG: CPU: x: transmit queue x timed out xx ms

When this occurs, the DMA transmit descriptors contain:

 eth0: 221 [0x0000000876d10dd0]: 0x73660cbe 0x8 0x42 0xb04416a0
 eth0: 222 [0x0000000876d10de0]: 0x77731d40 0x8 0x16a0 0x90000000

where descriptor 221 is the TSO header and 222 is the TSO payload. tdes3 for descriptor 221 (0xb04416a0) has both bit 29 (first descriptor) and bit 28 (last descriptor) set, which is incorrect. The following packet also has bit 28 set, but isn't marked as a first descriptor, and this causes the transmit DMA to stall.

This occurs because stmmac_tso_allocator() populates the first descriptor, but does not set .last_segment correctly. There are two places where this matters: one is later in stmmac_tso_xmit() where we use it to update the TSO header descriptor. The other is in the ring/chain mode clean_desc3() which is a performance optimisation.

Rather than using tx_q->tx_skbuff_dma[].last_segment to determine whether the first descriptor entry is the only segment, calculate the number of descriptor entries used. If there is only one descriptor, then the first is also the last, so mark it as such.

Further work will be necessary to either eliminate .last_segment entirely or set it correctly. Code analysis also indicates that a similar issue exists with .is_jumbo. These will be the subject of a future patch.

Reported-by: Tao Wang <tao03.wang@horizon.auto>
Fixes: c2837423cb54 ("net: stmmac: Rework TX Coalesce logic")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vhq8O-00000005N5s-0Ke5@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
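The descriptor dump can be decoded with a few lines of C. The bit positions (29 for first descriptor, 28 for last descriptor) come from the commit message; the macro and function names are illustrative, and `tx_desc_count()` is only an assumed way of counting ring entries with wraparound, not the code of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define TDES3_FIRST_DESCRIPTOR (UINT32_C(1) << 29)
#define TDES3_LAST_DESCRIPTOR  (UINT32_C(1) << 28)

static bool tdes3_is_first(uint32_t tdes3)
{
	return (tdes3 & TDES3_FIRST_DESCRIPTOR) != 0;
}

static bool tdes3_is_last(uint32_t tdes3)
{
	return (tdes3 & TDES3_LAST_DESCRIPTOR) != 0;
}

/* Hypothetical: number of ring entries from first_entry to last_entry, inclusive. */
static unsigned int tx_desc_count(unsigned int first_entry,
				  unsigned int last_entry,
				  unsigned int ring_size)
{
	return (last_entry - first_entry + ring_size) % ring_size + 1;
}

/* The first descriptor is also the last only when the packet uses one entry. */
static bool first_is_also_last(unsigned int first_entry,
			       unsigned int last_entry,
			       unsigned int ring_size)
{
	return tx_desc_count(first_entry, last_entry, ring_size) == 1;
}
```

Applied to the dump above, descriptor 221 (0xb04416a0) decodes as both first and last even though the packet spans entries 221 and 222, which is the inconsistency the commit describes.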
2026-01-20 | be2net: fix data race in be_get_new_eqd | David Yang
In be_get_new_eqd(), statistics of pkts, protected by u64_stats_sync, are read and accumulated in ignorance of possible u64_stats_fetch_retry() events. Before the commit in question, these statistics were retrieved one by one directly from queues. Fix this by reading them into temporary variables first. Fixes: 209477704187 ("be2net: set interrupt moderation for Skyhawk-R using EQ-DB") Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260119153440.1440578-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
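Why accumulating inside the retry loop is wrong can be shown with a single-threaded model. Everything below is a hypothetical sketch: `struct ring_stats`, `stats_fetch_begin()` and `stats_fetch_retry()` only imitate the shape of the u64_stats_sync API, and the `racing_writes` field is a test hook that simulates a writer updating the counter between the read and the retry check.

```c
#include <stdint.h>

/* Hypothetical model of a counter protected by a sequence counter. */
struct ring_stats {
	unsigned int seq;   /* bumped by the writer on each update */
	uint64_t pkts;
	int racing_writes;  /* test hook: writer updates that race with the reader */
};

static unsigned int stats_fetch_begin(const struct ring_stats *s)
{
	return s->seq;
}

/* Returns nonzero if the stats changed since stats_fetch_begin(). */
static int stats_fetch_retry(struct ring_stats *s, unsigned int start)
{
	if (s->racing_writes > 0) { /* simulate a concurrent writer */
		s->racing_writes--;
		s->pkts++;
		s->seq += 2;
	}
	return s->seq != start;
}

/* Buggy: adds to the total inside the loop, so a retried read is counted twice. */
static uint64_t sum_pkts_buggy(struct ring_stats *rings, int n)
{
	uint64_t total = 0;

	for (int i = 0; i < n; i++) {
		unsigned int start;

		do {
			start = stats_fetch_begin(&rings[i]);
			total += rings[i].pkts;
		} while (stats_fetch_retry(&rings[i], start));
	}
	return total;
}

/* Fixed: snapshot into a temporary and accumulate only after a consistent read. */
static uint64_t sum_pkts_fixed(struct ring_stats *rings, int n)
{
	uint64_t total = 0;

	for (int i = 0; i < n; i++) {
		unsigned int start;
		uint64_t pkts;

		do {
			start = stats_fetch_begin(&rings[i]);
			pkts = rings[i].pkts;
		} while (stats_fetch_retry(&rings[i], start));
		total += pkts;
	}
	return total;
}
```

In the buggy variant each retry adds the stale value and then the fresh one; the fixed variant overwrites the temporary on retry, which is the "read into temporary variables first" approach named in the commit.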
2026-01-20 | idpf: Fix data race in idpf_net_dim | David Yang
In idpf_net_dim(), some statistics, protected by u64_stats_sync, are read and accumulated in ignorance of possible u64_stats_fetch_retry() events. The correct way to copy statistics is already illustrated by idpf_add_queue_stats(). Fix this by reading them into temporary variables first. Fixes: c2d548cad150 ("idpf: add TX splitq napi poll support") Fixes: 3a8845af66ed ("idpf: add RX splitq napi poll support") Signed-off-by: David Yang <mmyangfl@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260119162720.1463859-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | net: hns3: fix data race in hns3_fetch_stats | David Yang
In hns3_fetch_stats(), ring statistics, protected by u64_stats_sync, are read and accumulated in ignorance of possible u64_stats_fetch_retry() events. These statistics are already accumulated by hns3_ring_stats_update(). Fix this by reading them into a temporary buffer first. Fixes: b20d7fe51e0d ("net: hns3: add some statitics info to tx process") Signed-off-by: David Yang <mmyangfl@gmail.com> Link: https://patch.msgid.link/20260119160759.1455950-1-mmyangfl@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | octeontx2-pf: Remove unnecessary bounds check | Simon Horman
active_fec is a 2-bit unsigned field, and thus can only have the values 0-3. So checking that it is less than 4 is unnecessary. Simplify the code by dropping this check. As it no longer fits well where it is, move FEC_MAX_INDEX towards the top of the file. And add the prefix OTX2. I believe this is more idiomatic. Flagged by Smatch as: ...//otx2_ethtool.c:1024 otx2_get_fecparam() warn: always true condition '(pfvf->linfo.fec < 4) => (0-3 < 4)' No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Hariprasad Kelam <hkelam@marvell.com> Link: https://patch.msgid.link/20260119-oob-v1-1-a4147e75e770@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
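The reasoning that a 2-bit unsigned field never reaches 4 can be demonstrated directly. `struct link_info_sketch`, `store_fec()` and `fec_always_below_4()` are hypothetical names; the real field lives in the driver's link-info structure, which is not reproduced here.

```c
/* Hypothetical reduced structure: only the 2-bit field matters for the example. */
struct link_info_sketch {
	unsigned int fec : 2; /* 2-bit unsigned field: can hold only 0..3 */
};

/* Store an arbitrary value and read back what the field actually holds. */
static unsigned int store_fec(unsigned int raw)
{
	struct link_info_sketch linfo = { 0 };

	linfo.fec = raw; /* the value is reduced modulo 4 on store */
	return linfo.fec;
}

/* Returns 1 if no input in 0..255 can make the field read back as 4 or more. */
static int fec_always_below_4(void)
{
	for (unsigned int raw = 0; raw < 256; raw++)
		if (store_fec(raw) >= 4)
			return 0;
	return 1;
}
```

Since the stored value is always in the range 0..3, a `fec < 4` test is always true, which is what the Smatch warning reports.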
2026-01-20 | net: add kdoc for napi_consume_skb() | Jakub Kicinski
Looks like AI reviewers miss that napi_consume_skb() must have a real budget passed to it. Let's see if adding a real kdoc will help them figure this out. Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20260119224140.1362729-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | net: usb: r8152: fix transmit queue timeout | Mingj Ye
When the TX queue length reaches the threshold, the netdev watchdog immediately detects a TX queue timeout. This patch updates the trans_start timestamp of the transmit queue on every asynchronous USB URB submission along the transmit path, ensuring that the network watchdog accurately reflects ongoing transmission activity. Signed-off-by: Mingj Ye <insyelu@gmail.com> Reviewed-by: Hayes Wang <hayeswang@realtek.com> Link: https://patch.msgid.link/20260120015949.84996-1-insyelu@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | selftests: drv-net: fix missing include in ncdevmem | Jakub Kicinski
Commit ca9d74eb5f6a ("uapi: add INT_MAX and INT_MIN constants") recently removed some includes of limits.h in uAPI headers. ncdevmem.c was depending on them:

ncdevmem.c: In function ‘ethtool_add_flow’:
ncdevmem.c:369:60: error: ‘INT_MAX’ undeclared (first use in this function)
  369 |         if (endptr == id_start || flow_id < 0 || flow_id > INT_MAX)
      |                                                            ^~~~~~~
ncdevmem.c:77:1: note: ‘INT_MAX’ is defined in header ‘<limits.h>’; did you forget to ‘#include <limits.h>’?

Reviewed-by: Mina Almasry <almasrymina@google.com>
Link: https://patch.msgid.link/20260120180319.1673271-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
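The general remedy for this kind of breakage is to include the header that defines a symbol directly instead of relying on another header pulling it in. A minimal sketch, assuming a hypothetical helper `parse_flow_id()` built around the condition quoted in the compiler error (it is not the function from ncdevmem.c):

```c
#include <limits.h> /* INT_MAX: included explicitly, not via some other header */
#include <stdlib.h> /* strtol */

/* Hypothetical helper: parse a non-negative flow id, or return -1 on bad input. */
static int parse_flow_id(const char *id_start)
{
	char *endptr;
	long flow_id = strtol(id_start, &endptr, 10);

	if (endptr == id_start || flow_id < 0 || flow_id > INT_MAX)
		return -1;
	return (int)flow_id;
}
```

With the explicit include, the code keeps compiling even if unrelated headers stop including limits.h.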
2026-01-20 | clk: qcom: gfx3d: add parent to parent request map | Dmitry Baryshkov
After commit d228ece36345 ("clk: divider: remove round_rate() in favor of determine_rate()") determining GFX3D clock rate crashes, because the passed parent map doesn't provide the expected best_parent_hw clock (with the round_rate path before the offending commit the best_parent_hw was ignored).

Set the field in parent_req in addition to setting it in the req, fixing the crash.

 clk_hw_round_rate (drivers/clk/clk.c:1764) (P)
 clk_divider_bestdiv (drivers/clk/clk-divider.c:336)
 divider_determine_rate (drivers/clk/clk-divider.c:358)
 clk_alpha_pll_postdiv_determine_rate (drivers/clk/qcom/clk-alpha-pll.c:1275)
 clk_core_determine_round_nolock (drivers/clk/clk.c:1606)
 clk_core_round_rate_nolock (drivers/clk/clk.c:1701)
 __clk_determine_rate (drivers/clk/clk.c:1741)
 clk_gfx3d_determine_rate (drivers/clk/qcom/clk-rcg2.c:1268)
 clk_core_determine_round_nolock (drivers/clk/clk.c:1606)
 clk_core_round_rate_nolock (drivers/clk/clk.c:1701)
 clk_core_round_rate_nolock (drivers/clk/clk.c:1710)
 clk_round_rate (drivers/clk/clk.c:1804)
 dev_pm_opp_set_rate (drivers/opp/core.c:1440 (discriminator 1))
 msm_devfreq_target (drivers/gpu/drm/msm/msm_gpu_devfreq.c:51)
 devfreq_set_target (drivers/devfreq/devfreq.c:360)
 devfreq_update_target (drivers/devfreq/devfreq.c:426)
 devfreq_monitor (drivers/devfreq/devfreq.c:458)
 process_one_work (arch/arm64/include/asm/jump_label.h:36 include/trace/events/workqueue.h:110 kernel/workqueue.c:3284)
 worker_thread (kernel/workqueue.c:3356 (discriminator 2) kernel/workqueue.c:3443 (discriminator 2))
 kthread (kernel/kthread.c:467)
 ret_from_fork (arch/arm64/kernel/entry.S:861)

Fixes: 55213e1acec9 ("clk: qcom: Add gfx3d ping-pong PLL frequency switching")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Reviewed-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Reviewed-by: Brian Masney <bmasney@redhat.com>
Link: https://lore.kernel.org/r/20260117-db820-fix-gfx3d-v1-1-0f8894d71d63@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-20 | net: fclone allocation small optimization | Eric Dumazet
After skb allocation, initial skb->fclone value is 0 (SKB_FCLONE_UNAVAILABLE). We can replace one RMW sequence with a single OR instruction.

 movzbl 0x7e(%r13),%eax // skb->fclone = SKB_FCLONE_ORIG;
 and $0xf3,%al
 or $0x4,%al
 mov %al,0x7e(%r13)
->
 or $0x4,0x7e(%r13) // skb->fclone |= SKB_FCLONE_ORIG;

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260116164402.1872649-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
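The equivalence the commit relies on can be checked on a plain byte. The field position (bits 2-3) is read off the masks in the disassembly above (0xf3 and 0x4) and is therefore specific to that build; `FCLONE_SHIFT`, `FCLONE_MASK` and the three functions are hypothetical names for this sketch.

```c
#include <stdint.h>

#define FCLONE_SHIFT           2                      /* inferred from the 0xf3/0x4 masks */
#define FCLONE_MASK            (0x3u << FCLONE_SHIFT) /* 0x0c, the bits cleared by 0xf3 */
#define SKB_FCLONE_UNAVAILABLE 0
#define SKB_FCLONE_ORIG        1

/* Plain assignment to the bitfield: clear the field, then set it. */
static uint8_t set_fclone_rmw(uint8_t flags)
{
	return (uint8_t)((flags & ~FCLONE_MASK) | (SKB_FCLONE_ORIG << FCLONE_SHIFT));
}

/* The optimized form: a single OR, valid only if the field is known to be 0. */
static uint8_t set_fclone_or(uint8_t flags)
{
	return (uint8_t)(flags | (SKB_FCLONE_ORIG << FCLONE_SHIFT));
}

/* Returns 1 if both forms agree for every byte whose fclone field is 0. */
static int equivalent_when_field_is_zero(void)
{
	for (unsigned int b = 0; b < 256; b++) {
		if ((b & FCLONE_MASK) != 0)
			continue; /* field not SKB_FCLONE_UNAVAILABLE, precondition unmet */
		if (set_fclone_rmw((uint8_t)b) != set_fclone_or((uint8_t)b))
			return 0;
	}
	return 1;
}
```

The two forms differ as soon as the field is nonzero, which is why the optimization depends on the freshly allocated skb having fclone equal to SKB_FCLONE_UNAVAILABLE.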
2026-01-20 | Merge branch 'convert-the-micrel-bindings-to-dt-schema' | Jakub Kicinski
Stefan Eichenberger says: ==================== Convert the Micrel bindings to DT schema Convert the device tree bindings for the Micrel PHYs and switches to DT schema. ==================== Link: https://patch.msgid.link/20260116130948.79558-1-eichest@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | dt-bindings: net: micrel: Convert micrel-ksz90x1.txt to DT schema | Stefan Eichenberger
Convert the micrel-ksz90x1.txt to DT schema. Create a separate YAML file for this PHY series. The old naming of ksz90x1 would be misleading in this case, so rename it to gigabit, as it contains ksz9xx1 and lan8xxx gigabit PHYs. Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260116130948.79558-3-eichest@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | dt-bindings: net: micrel: Convert to DT schema | Stefan Eichenberger
Convert the devicetree bindings for the Micrel PHYs and switches to DT schema. Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20260116130948.79558-2-eichest@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | dt-bindings: net: sparx5: do not require phys when RGMII is used | Robert Marko
LAN969x has 2 dedicated RGMII ports, so regular SERDES lanes are not used for RGMII. So, let's not require phys to be defined when any of the rgmii phy-modes are set. Signed-off-by: Robert Marko <robert.marko@sartura.hr> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20260115114021.111324-11-robert.marko@sartura.hr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | selftests: drv-net: extend HW timestamp test with ioctl | Vadim Fedorenko
Extend HW timestamp tests to check that ioctl interface is not broken and configuration setups and requests are equal to netlink interface. Some linter warnings are disabled because of ctypes classes. Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260116062121.1230184-2-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | net: remove legacy way to get/set HW timestamp config | Vadim Fedorenko
With all drivers converted to use ndo_hwstamp callbacks the legacy way can be removed, marking ioctl interface as deprecated. Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20260116062121.1230184-1-vadim.fedorenko@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | net: split kmalloc_reserve() to allow inlining | Eric Dumazet
kmalloc_reserve() is too big to be inlined. Put the slow path in a new out-of-line function: kmalloc_pfmemalloc()

Then let kmalloc_reserve() set skb->pfmemalloc only when/if the slow path is taken.

This makes __alloc_skb() faster:
- kmalloc_reserve() is now automatically inlined by both gcc and clang.
- No more expensive RMW (skb->pfmemalloc = pfmemalloc).
- No more expensive stack canary (for CONFIG_STACKPROTECTOR_STRONG=y).
- Removal of two prefetches that were coming too late for modern cpus.

Text size increase is quite small compared to the cpu savings (~0.7 %)

$ size net/core/skbuff.clang.before.o net/core/skbuff.clang.after.o
   text    data     bss     dec     hex filename
  72507    5897       0   78404   13244 net/core/skbuff.clang.before.o
  72681    5897       0   78578   132f2 net/core/skbuff.clang.after.o

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260116041359.181104-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | Merge branch 'eth-fbnic-update-ipc-mailbox-support' | Jakub Kicinski
Mohsin Bashir says: ==================== eth: fbnic: Update IPC mailbox support Update IPC mailbox support for fbnic to cater for several changes. ==================== Link: https://patch.msgid.link/20260115003353.4150771-1-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | eth: fbnic: Update RX mbox timeout value | Mohsin Bashir
While waiting for completions on read requests, the driver uses different timeout values for different messages. Make use of a single timeout value. Introduce a wrapper function to handle the wait, which also simplifies maintaining the 80 char line limit. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-6-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | eth: fbnic: Remove retry support | Mohsin Bashir
The driver retries sensor read requests from firmware, but this is unnecessary. A functioning firmware should respond to each request within the timeout period. Remove the retry logic and set the timeout to the sum of all retry timeouts. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-5-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | eth: fbnic: Reuse RX mailbox pages | Mohsin Bashir
Currently, the RX mailbox frees and reallocates a page for each received message. Since FW Rx messages are processed synchronously, and nothing holds these pages (unlike skbs which we hand over to the stack), reuse the pages and put them back on the Rx ring. Now that we ensure the ring is always fully populated we don't have to worry about filling it up after partial population during init, either. Update fbnic_mbx_process_rx_msgs() to recycle pages after message processing. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-4-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | eth: fbnic: Allocate all pages for RX mailbox | Mohsin Bashir
Now that memory is allocated with GFP_KERNEL, allocation failures should be extremely rare. Ensure the FW communication ring is always fully populated with free pages, and hard fail initialization otherwise. This enables simplifications in next patches. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-3-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | eth: fbnic: Use GFP_KERNEL to allocting mbx pages | Mohsin Bashir
Replace GFP_ATOMIC with GFP_KERNEL for mailbox RX page allocation. Since the interrupt handler is threaded, GFP_KERNEL is a safe option to reduce allocation failures. Also remove __GFP_NOWARN so the kernel reports a warning on allocation failure to aid debugging. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-2-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | bpf: Simplify bpf_timer_cancel() | Mykyta Yatsenko
Remove lock from the bpf_timer_cancel() helper. The lock does not protect from concurrent modification of the bpf_async_cb data fields as those are modified in the callback without locking. Use guard(rcu)() instead of pair of explicit lock()/unlock(). Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260120-timer_nolock-v6-4-670ffdd787b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 | bpf: Introduce lock-free bpf_async_update_prog_callback() | Mykyta Yatsenko
Introduce bpf_async_update_prog_callback(): lock-free update of cb->prog and cb->callback_fn. This function allows updating prog and callback_fn fields of the struct bpf_async_cb without holding lock. For now use it under the lock from __bpf_async_set_callback(), in the next patches that lock will be removed.

Lock-free algorithm:
 * Acquire a guard reference on prog to prevent it from being freed during the retry loop.
 * Retry loop:
    1. Each iteration acquires a new prog reference and stores it in cb->prog via xchg. The previous prog is released.
    2. The loop condition checks if both cb->prog and cb->callback_fn match what we just wrote. If either differs, a concurrent writer overwrote our value, and we must retry.
    3. When we retry, our previously-stored prog was already released by the concurrent writer or will be released by us after overwriting.
 * Release guard reference.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20260120-timer_nolock-v6-3-670ffdd787b4@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
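The steps of the lock-free algorithm can be sketched in userspace with C11 atomics. This is an illustration of the described algorithm under stated assumptions, not the kernel implementation: `struct prog`, `struct async_cb`, `prog_get()`, `prog_put()`, `async_cb_init()`, `noop_cb()` and `async_update_prog_callback()` are all hypothetical, the reference count never frees anything, and sequentially consistent atomics stand in for the kernel's xchg and READ_ONCE primitives.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef void (*callback_t)(void);

struct prog {
	atomic_int refcnt;
};

struct async_cb {
	_Atomic(struct prog *) prog;
	_Atomic(callback_t) callback_fn;
};

static void prog_get(struct prog *p)
{
	if (p)
		atomic_fetch_add(&p->refcnt, 1);
}

static void prog_put(struct prog *p)
{
	if (p)
		atomic_fetch_sub(&p->refcnt, 1); /* a real put would free at zero */
}

static void async_cb_init(struct async_cb *cb)
{
	atomic_init(&cb->prog, NULL);
	atomic_init(&cb->callback_fn, NULL);
}

/* Example callback used only to have something to store. */
static void noop_cb(void)
{
}

static void async_update_prog_callback(struct async_cb *cb, callback_t fn,
					struct prog *prog)
{
	struct prog *prev;

	prog_get(prog); /* guard reference: keeps prog alive across retries */
	do {
		prog_get(prog); /* reference that cb->prog will own */
		prev = atomic_exchange(&cb->prog, prog);
		atomic_store(&cb->callback_fn, fn);
		prog_put(prev); /* release whatever was stored before */
		/* retry if a concurrent writer replaced either field */
	} while (atomic_load(&cb->prog) != prog ||
		 atomic_load(&cb->callback_fn) != fn);
	prog_put(prog); /* drop the guard reference */
}
```

Each loop iteration takes exactly one reference and releases exactly one (the previous occupant of `cb->prog`), so references stay balanced no matter how many times the loop retries.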
2026-01-20 | bpf: Remove unnecessary arguments from bpf_async_set_callback() | Mykyta Yatsenko
Remove unused arguments from __bpf_async_set_callback(). Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260120-timer_nolock-v6-2-670ffdd787b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 | bpf: Factor out timer deletion helper | Mykyta Yatsenko
Move the timer deletion logic into a dedicated bpf_timer_delete() helper so it can be reused by later patches. Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260120-timer_nolock-v6-1-670ffdd787b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20 | Merge tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux | Jakub Kicinski
Pavel Begunkov says:

====================
Add support for providers with large rx buffer

Many modern NICs support configurable receive buffer lengths, and zcrx and memory providers can use buffers larger than 4K to improve performance. When paired with hw-gro, larger rx buffer sizes can drastically reduce the number of buffers traversing the stack and save a lot of processing time. It also allows giving users larger contiguous chunks of data. Single stream benchmarks showed up to ~30% CPU util improvement. E.g. comparison for 4K vs 32K buffers using a 200Gbit NIC:

packets=23987040 (MB=2745098), rps=199559 (MB/s=22837)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    1.53    0.00   27.78    2.72    1.31   66.45    0.22

packets=24078368 (MB=2755550), rps=200319 (MB/s=22924)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    0.69    0.00    8.26   31.65    1.83   57.00    0.57

This series adds net infrastructure for memory providers configuring the size and implements it for bnxt. It's an opt-in feature for drivers, they should advertise support for the parameter in the qops and must check if the hardware supports the given size. It's limited to memory providers as it drastically simplifies implementation. It doesn't affect the fast path zcrx uAPI, and the user exposed parameter is defined in zcrx terms, which allows it to be flexible and adjusted in the future.

A liburing example can be found at [2]

full branch:
[1] https://github.com/isilence/linux.git zcrx/large-buffers-v8
Liburing example:
[2] https://github.com/isilence/liburing.git zcrx/rx-buf-len

* tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux:
  io_uring/zcrx: document area chunking parameter
  selftests: iou-zcrx: test large chunk sizes
  eth: bnxt: support qcfg provided rx page size
  eth: bnxt: adjust the fill level of agg queues with larger buffers
  eth: bnxt: store rx buffer size per queue
  net: pass queue rx page size from memory provider
  net: add bare bone queue configs
  net: reduce indent of struct netdev_queue_mgmt_ops members
  net: memzero mp params when closing a queue
====================

Link: https://patch.msgid.link/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20 | xfs: add media verification ioctl | Darrick J. Wong
Add a new privileged ioctl so that xfs_scrub can ask the kernel to verify the media of the devices backing an xfs filesystem, and have any resulting media errors reported to fsnotify and xfs_healer. To accomplish this, the kernel allocates a folio between the base page size and 1MB, and issues read IOs to a gradually incrementing range of one of the storage devices underlying an xfs filesystem. If any error occurs, that raw error is reported to the calling process. If the error happens to be one of the ones that the kernel considers indicative of data loss, then it will also be reported to xfs_healthmon and fsnotify. Driving the verification from the kernel enables xfs (and by extension xfs_scrub) to have precise control over the size and error handling of IOs that are issued to the underlying block device, and to emit notifications about problems to other relevant kernel subsystems immediately. Note that the caller is also allowed to reduce the size of the IO and to ask for a relaxation period after each IO. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-20 | xfs: check if an open file is on the health monitored fs | Darrick J. Wong
Create a new ioctl for the healthmon file that checks that a given fd points to the same filesystem that the healthmon file is monitoring. This allows xfs_healer to check that when it reopens a mountpoint to perform repairs, the file that it gets matches the filesystem that generated the corruption report. (Note that xfs_healer doesn't maintain an open fd to a filesystem that it's monitoring so that it doesn't pin the mount.) Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-20 | xfs: allow toggling verbose logging on the health monitoring file | Darrick J. Wong
Make it so that we can reconfigure the health monitoring device by calling the XFS_IOC_HEALTH_MONITOR ioctl on it. As of right now we can only toggle the verbose flag, but this is less annoying than having to close the monitor fd and reopen it. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-20 | xfs: convey file I/O errors to the health monitor | Darrick J. Wong
Connect the fserror reporting to the health monitor so that xfs can send events about file I/O errors to the xfs_healer daemon. These events are entirely informational because xfs cannot regenerate user data, so hopefully the fsnotify I/O error event gets noticed by the relevant management systems. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-20 | xfs: convey externally discovered fsdax media errors to the health monitor | Darrick J. Wong
Connect the fsdax media failure notification code to the health monitor so that xfs can send events about that to the xfs_healer daemon. Later on we'll add the ability for the xfs_scrub media scan (phase 6) to report the errors that it finds to the kernel so that those are also logged by xfs_healer. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-20 | xfs: convey filesystem shutdown events to the health monitor | Darrick J. Wong
Connect the filesystem shutdown code to the health monitor so that xfs can send events about that to the xfs_healer daemon. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-20 | xfs: convey metadata health events to the health monitor | Darrick J. Wong
Connect the filesystem metadata health event collection system to the health monitor so that xfs can send events to xfs_healer as it collects information. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-20 | xfs: convey filesystem unmount events to the health monitor | Darrick J. Wong
In xfs_healthmon_unmount, send events to xfs_healer so that it knows that nothing further can be done for the filesystem. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2026-01-20 | xfs: create event queuing, formatting, and discovery infrastructure | Darrick J. Wong
Create the basic infrastructure that we need to report health events to userspace. We need a compact form for recording critical information about an event and queueing them; a means to notice that we've lost some events; and a means to format the events into something that userspace can handle. Make the kernel export C structures via read().

In a previous iteration of this new subsystem, I wanted to explore data exchange formats that are more flexible and easier for humans to read than C structures. The thought being that when we want to rev (or worse, enlarge) the event format, it ought to be trivially easy to do that in a way that doesn't break old userspace.

I looked at formats such as protobufs and capnproto. These look really nice in that extending the wire format is fairly easy, you can give it a data schema and it generates the serialization code for you, handles endianness problems, etc. The huge downside is that neither supports C all that well. Too hard, and didn't want to port either of those huge sprawling libraries first to the kernel and then again to xfsprogs.

Then I thought, how about JSON? Javascript objects are human readable, the kernel can emit json without much fuss (it's all just strings!) and there are plenty of interpreters for python/rust/c/etc. There's a proposed schema format for json, which means that xfs can publish a description of the events that the kernel will emit. Userspace consumers (e.g. xfsprogs/xfs_healer) can embed the same schema document and use it to validate the incoming events from the kernel, which means they can discard events that they don't understand, or garbage being emitted due to bugs.

However, json has a huge crutch -- javascript is well known for its vague definitions of what are numbers. This makes expressing a large number rather fraught, because the runtime is free to represent a number in nearly any way it wants. Stupider ones will truncate values to word size, others will roll out doubles for uint52_t (yes, fifty-two) with the resulting loss of precision. Not good when you're dealing with discrete units.

It just so happens that python's json library is smart enough to see a sequence of digits and put them in a u64 (at least on x86_64/aarch64) but an actual javascript interpreter (pasting into Firefox) isn't necessarily so clever.

It turns out that none of the proposed json schemas were ever ratified even in an open-consensus way, so json blobs are still just loosely structured blobs. The parsing in userspace was also noticeably slow and memory-consumptive.

Hence only the C interface survives.

Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
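The precision concern raised in this message can be made concrete with a small sketch. It is not part of the patch: the function names are hypothetical, and it assumes IEEE-754 binary64 doubles, which carry 53 bits of significand. It checks whether a 64-bit counter survives being stored in a double, which is what a number-as-double JSON consumer would effectively do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2^64 as a double; a double at or above this cannot be cast back to u64. */
#define TWO_POW_64 18446744073709551616.0

/*
 * Return true if v comes back unchanged after a round trip through a
 * double. Assumes IEEE-754 binary64 (an assumption of this sketch).
 */
bool survives_double(uint64_t v)
{
	double d = (double)v;

	if (d >= TWO_POW_64)	/* rounded up past UINT64_MAX */
		return false;
	return (uint64_t)d == v;
}

/* Count how many of the n values starting at base the round trip alters. */
unsigned int count_mangled(uint64_t base, unsigned int n)
{
	unsigned int i, bad = 0;

	assert(base <= UINT64_MAX - n);	/* keep base + i from wrapping */
	for (i = 0; i < n; i++)
		if (!survives_double(base + i))
			bad++;
	return bad;
}
```

Under that assumption, every integer up to 2^53 round-trips, while above it only some do, so a block number or byte offset near the top of the u64 range can silently change. A fixed-width field in a C structure has no such ambiguity.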
2026-01-20xfs: start creating infrastructure for health monitoringDarrick J. Wong
Start creating helper functions and infrastructure to pass filesystem health events to a health monitoring file. Since this is an administrative interface, we only support a single health monitor process per filesystem, so we don't need to use anything fancy such as notifier chains (== tons of indirect calls). Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
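As a rough userspace illustration of the single-monitor design described here (not the code in this patch), the sketch below keeps one monitor pointer per filesystem and delivers events with a direct call instead of walking a callback chain. Every name, the queue depth, and the lost-event counter are assumptions made for the example, and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 4	/* hypothetical; small so overflow is easy to see */

struct health_event {		/* hypothetical fixed-size record */
	uint32_t type;
	uint64_t payload;
};

struct health_monitor {
	struct health_event queue[QUEUE_DEPTH];
	unsigned int head;	/* index of the oldest queued event */
	unsigned int count;	/* number of events currently queued */
	uint64_t lost;		/* events dropped because the queue was full */
};

struct fs_mount {
	struct health_monitor *monitor;	/* at most one; NULL if none */
};

/* Attach a monitor; fails if the filesystem already has one. */
bool monitor_attach(struct fs_mount *mp, struct health_monitor *hm)
{
	if (mp->monitor)
		return false;
	hm->head = 0;
	hm->count = 0;
	hm->lost = 0;
	mp->monitor = hm;
	return true;
}

/* Direct call into the single monitor: no list walk, no indirect calls. */
void report_event(struct fs_mount *mp, uint32_t type, uint64_t payload)
{
	struct health_monitor *hm = mp->monitor;
	unsigned int tail;

	if (!hm)
		return;		/* nobody is listening */
	if (hm->count == QUEUE_DEPTH) {
		hm->lost++;	/* remember that an event was dropped */
		return;
	}
	tail = (hm->head + hm->count) % QUEUE_DEPTH;
	hm->queue[tail].type = type;
	hm->queue[tail].payload = payload;
	hm->count++;
}

/* Remove the oldest event; returns false if the queue is empty. */
bool monitor_pop(struct health_monitor *hm, struct health_event *out)
{
	if (hm->count == 0)
		return false;
	*out = hm->queue[hm->head];
	hm->head = (hm->head + 1) % QUEUE_DEPTH;
	hm->count--;
	return true;
}
```

With only one consumer, the reporting path is a NULL check plus a store, which is the cost argument the message makes against notifier chains.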
2026-01-20Revert "Merge branch 'netkit-support-for-io_uring-zero-copy-and-af_xdp'"Jakub Kicinski
This reverts commit 77b9c4a438fc66e2ab004c411056b3fb71a54f2c, reversing changes made to 4515ec4ad58a37e70a9e1256c0b993958c9b7497:

  931420a2fc36 ("selftests/net: Add netkit container tests")
  ab771c938d9a ("selftests/net: Make NetDrvContEnv support queue leasing")
  6be87fbb2776 ("selftests/net: Add env for container based tests")
  61d99ce3dfc2 ("selftests/net: Add bpf skb forwarding program")
  920da3634194 ("netkit: Add xsk support for af_xdp applications")
  eef51113f8af ("netkit: Add netkit notifier to check for unregistering devices")
  b5ef109d22d4 ("netkit: Implement rtnl_link_ops->alloc and ndo_queue_create")
  b5c3fa4a0b16 ("netkit: Add single device mode for netkit")
  0073d2fd679d ("xsk: Proxy pool management for leased queues")
  1ecea95dd3b5 ("xsk: Extend xsk_rcv_check validation")
  804bf334d08a ("net: Proxy netdev_queue_get_dma_dev for leased queues")
  0caa9a8ddec3 ("net: Proxy net_mp_{open,close}_rxq for leased queues")
  ff8889ff9107 ("net, ethtool: Disallow leased real rxqs to be resized")
  9e2103f36110 ("net: Add lease info to queue-get response")
  31127deddef4 ("net: Implement netdev_nl_queue_create_doit")
  a5546e18f77c ("net: Add queue-create operation")

The series will conflict with io_uring work, and the code needs more polish.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20selftests/bpf: update verifier test for default trusted pointer semanticsMatt Bobrowski
Replace the verifier test for default trusted pointer semantics, which previously relied on BPF kfunc bpf_get_root_mem_cgroup(), with a new test utilizing dedicated BPF kfuncs defined within the bpf_testmod. bpf_get_root_mem_cgroup() was modified such that it again relies on KF_ACQUIRE semantics, therefore no longer making it a suitable candidate to test BPF verifier default trusted pointer semantics against. Link: https://lore.kernel.org/bpf/20260113083949.2502978-2-mattbobrowski@google.com Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20260120091630.3420452-1-mattbobrowski@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20tools/net/ynl: Makefile's install target now installs ynltoolMichel Lind
This tool is built by default, but was not being installed by default when running `make install`. Fix this by calling ynltool's install target. Signed-off-by: Michel Lind <michel@michel-slm.name> Link: https://patch.msgid.link/aWqr9gUT4hWZwwcI@mbp-m3-fedora.vm Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20Merge branch 'bpf-fix-memory-access-flags-in-helper-prototypes'Alexei Starovoitov
Zesen Liu says:

====================
bpf: Fix memory access flags in helper prototypes

This series adds missing memory access flags (MEM_RDONLY or MEM_WRITE) to several bpf helper function prototypes that use ARG_PTR_TO_MEM but lack the correct flag. It also adds a new check in verifier to ensure the flag is specified.

Missing memory access flags in helper prototypes can lead to critical correctness issues when the verifier tries to perform code optimization. After commit 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking"), the verifier relies on the memory access flags, rather than treating all arguments in helper functions as potentially modifying the pointed-to memory.

Using ARG_PTR_TO_MEM alone without flags does not make sense because:

- If the helper does not change the argument, missing MEM_RDONLY causes the verifier to incorrectly reject a read-only buffer.
- If the helper does change the argument, missing MEM_WRITE causes the verifier to incorrectly assume the memory is unchanged, leading to errors in code optimization.

We have already seen several reports regarding this:

- commit ac44dcc788b9 ("bpf: Fix verifier assumptions of bpf_d_path's output buffer") adds MEM_WRITE to bpf_d_path;
- commit 2eb7648558a7 ("bpf: Specify access type of bpf_sysctl_get_name args") adds MEM_WRITE to bpf_sysctl_get_name.

This series looks through all prototypes in the kernel and completes the flags. It also adds check_mem_arg_rw_flag_ok() and wires it into check_func_proto() to statically restrict ARG_PTR_TO_MEM from appearing without memory access flags.

Changelog
=========

v3:
- Rebased to bpf-next to address check_func_proto() signature changes, as suggested by Eduard Zingerman.

v2:
- Add missing MEM_RDONLY flags to protos with ARG_PTR_TO_FIXED_SIZE_MEM.
====================

Link: https://patch.msgid.link/20260120-helper_proto-v3-0-27b0180b4e77@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20bpf: Require ARG_PTR_TO_MEM with memory flagZesen Liu
Add check to ensure that ARG_PTR_TO_MEM is used with either MEM_WRITE or MEM_RDONLY.

Using ARG_PTR_TO_MEM alone without flags does not make sense because:

- If the helper does not change the argument, missing MEM_RDONLY causes the verifier to incorrectly reject a read-only buffer.
- If the helper does change the argument, missing MEM_WRITE causes the verifier to incorrectly assume the memory is unchanged, leading to errors in code optimization.

Co-developed-by: Shuran Liu <electronlsr@gmail.com>
Signed-off-by: Shuran Liu <electronlsr@gmail.com>
Co-developed-by: Peili Gao <gplhust955@gmail.com>
Signed-off-by: Peili Gao <gplhust955@gmail.com>
Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com>
Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com>
Signed-off-by: Zesen Liu <ftyghome@gmail.com>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260120-helper_proto-v3-2-27b0180b4e77@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
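The rule this commit enforces can be modeled in a few lines of userspace C. The sketch below is not the kernel's check: the bit layout, the numeric values, the struct, and the function name are all assumptions chosen for illustration. Only the identifiers ARG_PTR_TO_MEM, MEM_RDONLY and MEM_WRITE come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: low byte is the base type, higher bits are flags. */
#define BASE_TYPE_MASK	0xffu
#define ARG_DONTCARE	0u
#define ARG_PTR_TO_MEM	1u
#define ARG_CONST_SIZE	2u
#define MEM_RDONLY	(1u << 8)
#define MEM_WRITE	(1u << 9)

#define MODEL_MAX_ARGS	5	/* assumed number of argument slots */

/* Simplified stand-in for a helper prototype: just the argument types. */
struct func_proto_model {
	uint32_t arg_type[MODEL_MAX_ARGS];
};

/*
 * Return false if any ARG_PTR_TO_MEM argument carries neither
 * MEM_RDONLY nor MEM_WRITE; arguments of other types are ignored.
 */
bool model_mem_arg_rw_flag_ok(const struct func_proto_model *fn)
{
	size_t i;

	assert(fn != NULL);
	for (i = 0; i < MODEL_MAX_ARGS; i++) {
		uint32_t t = fn->arg_type[i];

		if ((t & BASE_TYPE_MASK) != ARG_PTR_TO_MEM)
			continue;
		if (!(t & (MEM_RDONLY | MEM_WRITE)))
			return false;	/* access intent left unspecified */
	}
	return true;
}
```

Running a check of this kind once per prototype, rather than at each call site, means a prototype that omits the flag is caught when it is validated instead of surfacing later as a wrongly rejected or wrongly optimized program.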