linux-stable.git - Linux kernel stable tree

Age	Commit message	Author
4 days	qede: sync udp_tunnel ports outside qede_lock in the recovery path	Denis V. Lunev
	A TX timeout on a qede NIC that has VXLAN/GENEVE tunnel ports configured wedges the rtnetlink control plane of the whole machine:

	  NETDEV WATCHDOG: ens6f1 (qede): transmit queue 2 timed out 10226 ms
	  [qede_tx_timeout:586(ens6f1)]TX timeout on queue 2!
	  [qede_recovery_handler:2665(ens6f0)]Starting a recovery process

	The recovery path deadlocks on the driver's own mutex:

	  qede_sp_task
	    rtnl_lock()
	    mutex_lock(&edev->qede_lock)            <- taken
	    qede_recovery_handler
	      qede_load
	        udp_tunnel_nic_reset_ntf
	          __udp_tunnel_nic_device_sync
	            info->sync_table == qede_udp_tunnel_sync
	              mutex_lock(&edev->qede_lock)  <- same task: deadlock

	The mutex is not recursive, so the kworker blocks on itself with rtnl_lock held, and neither lock is ever released. Every task that calls rtnl_lock() afterwards (ip, ovs-vswitchd, lldpad, IPv6 addrconf, sshd) blocks forever while the node still answers ping. In a vmcore from an affected production node rtnl_mutex.owner decodes to the very kworker blocked at the innermost mutex_lock() above.

	Re-sync the tunnel ports from qede_sp_task() after the internal lock is dropped, still under rtnl_lock as the udp_tunnel API requires. This mirrors qede_open(), which calls udp_tunnel_nic_reset_ntf() under rtnl without the internal lock.

	qede_recovery_handler() now returns whether it has successfully reloaded an open device, and the caller re-syncs the ports only in that case. This keeps the old gating exactly: a device that was down or a failed recovery returns false, as those paths never reached the udp_tunnel_nic_reset_ntf() call before either.

	This was the only user of the qede_lock()/qede_unlock() helpers, so remove them.

	Fixes: 8cd160a29415 ("qede: convert to new udp_tunnel_nic infra")
	Signed-off-by: Denis V. Lunev <den@openvz.org>
	CC: Andrew Lunn <andrew+netdev@lunn.ch>
	CC: "David S. Miller" <davem@davemloft.net>
	CC: Eric Dumazet <edumazet@google.com>
	CC: Jakub Kicinski <kuba@kernel.org>
	CC: Paolo Abeni <pabeni@redhat.com>
	Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
	Link: https://patch.msgid.link/20260726104311.1782900-1-den@openvz.org
	Signed-off-by: Paolo Abeni <pabeni@redhat.com>
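The lock ordering described in the qede fix can be modelled in userspace. The following is only a sketch: `struct fake_dev`, `dev_lock()`, `tunnel_sync()`, `recovery_handler()` and `sp_task()` are hypothetical stand-ins, not the driver's code, and the non-recursive mutex is modelled by a flag that asserts on re-acquisition instead of blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; not the real qede structures. */
struct fake_dev {
    bool lock_held;   /* models edev->qede_lock, a non-recursive mutex */
    bool is_open;     /* device was up before the TX timeout */
    int sync_calls;   /* how often the tunnel ports were re-synced */
};

static void dev_lock(struct fake_dev *d)
{
    /* A second acquisition by the same task would block forever. */
    assert(!d->lock_held);
    d->lock_held = true;
}

static void dev_unlock(struct fake_dev *d)
{
    assert(d->lock_held);
    d->lock_held = false;
}

/* Models the sync callback, which takes the internal lock itself. */
static void tunnel_sync(struct fake_dev *d)
{
    dev_lock(d);
    d->sync_calls++;
    dev_unlock(d);
}

/* Runs with the lock held; reports whether an open device was reloaded. */
static bool recovery_handler(struct fake_dev *d, bool reload_ok)
{
    assert(d->lock_held);
    return d->is_open && reload_ok;
}

/* Models the fixed slow-path task: re-sync only after the lock is dropped. */
static void sp_task(struct fake_dev *d, bool reload_ok)
{
    bool resync;

    dev_lock(d);
    resync = recovery_handler(d, reload_ok);
    dev_unlock(d);

    if (resync)
        tunnel_sync(d);
}
```

Calling `tunnel_sync()` from inside `recovery_handler()` would trip the assertion in `dev_lock()`, which is the userspace analogue of the self-deadlock in the call chain above.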
4 days	octeontx2-pf: Set correct sequence for carrier off and tx queue stop	Suman Ghosh
	During a link down event, we were calling netif_tx_stop_all_queues() first and then netif_carrier_off(). This can cause a potential race since the carrier is still on during the down event. This patch reverses the calling order to fix the issue. Fixes: 50fe6c02e5ad ("octeontx2-pf: Register and handle link notifications") Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260724072831.2415281-1-rkannoth@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
4 days	net: libwx: fix FDIR ATR queue mismatch for software VLAN packets	Jiawen Wu
	When TX VLAN hardware offload is disabled, VLAN tags are embedded in the packet payload (software VLAN). Previously, the driver failed to set the WX_TX_FLAGS_SW_VLAN flag for these packets during transmission. This missing flag caused the txgbe FDIR ATR logic to fall through to the default hash calculation path. This resulted in asymmetric hash values for Tx and Rx flows, preventing return packets from being steered to the same queue as the transmit packets. Fix this by detecting software VLANs via eth_type_vlan(skb->protocol) and setting WX_TX_FLAGS_SW_VLAN. This ensures the ATR feature selects the correct hashing algorithm to maintain Tx/Rx queue symmetry. Fixes: b501d261a5b3 ("net: txgbe: add FDIR ATR support") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/0879DA38A8E32701+20260724074657.10773-1-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
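The software-VLAN detection can be illustrated with a small userspace analogue of eth_type_vlan(). This is a sketch under assumptions: the EtherType is passed in host byte order, and `SKETCH_TX_FLAGS_SW_VLAN` and `tx_flags_for()` are hypothetical names, since the log does not show the value of WX_TX_FLAGS_SW_VLAN or the driver's transmit code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_IP     0x0800  /* IPv4 */
#define ETH_P_8021Q  0x8100  /* 802.1Q VLAN tag */
#define ETH_P_8021AD 0x88A8  /* 802.1ad service VLAN tag */

/* Hypothetical flag bit; the real WX_TX_FLAGS_SW_VLAN value is not shown. */
#define SKETCH_TX_FLAGS_SW_VLAN (1u << 0)

/* Userspace analogue of eth_type_vlan(), taking a host-order EtherType. */
static bool is_vlan_ethertype(uint16_t ethertype)
{
    return ethertype == ETH_P_8021Q || ethertype == ETH_P_8021AD;
}

/*
 * With hardware VLAN insertion disabled the tag sits in the payload, so
 * the packet's outer protocol is itself a VLAN EtherType.
 */
static uint32_t tx_flags_for(uint16_t protocol)
{
    uint32_t flags = 0;

    if (is_vlan_ethertype(protocol))
        flags |= SKETCH_TX_FLAGS_SW_VLAN;
    return flags;
}
```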
5 days	net: mana: Return error code from mana_create_rxq()	Aditya Garg
	mana_create_rxq() returns a struct mana_rxq pointer and returns NULL on any failure. The caller, mana_add_rx_queues(), cannot tell what went wrong and hardcodes the error as -ENOMEM. As a result the actual failure reported by the lower layers (for example -EPROTO from a failed HW request) is masked and every RX queue creation failure looks like an out-of-memory error. Return an ERR_PTR() encoded error code from mana_create_rxq() on failure instead of NULL. The caller now propagates the returned error code directly instead of substituting -ENOMEM. Fixes: ca9c54d2d6a5 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)") Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260727113759.2881500-1-gargaditya@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
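The ERR_PTR() convention used by the mana fix can be sketched in userspace. The helpers below imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() semantics; `struct rxq`, `create_rxq()`, `add_rx_queue()` and the `hw_status` parameter are hypothetical and only stand in for the driver's functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel helpers from <linux/err.h>. */
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* The top MAX_ERRNO addresses are reserved for encoded error codes. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct rxq { int id; };

/* hw_status simulates the result of a hardware request (0 or -errno). */
static struct rxq *create_rxq(int hw_status)
{
    struct rxq *q;

    if (hw_status < 0)
        return ERR_PTR(hw_status);   /* keep the real cause */

    q = calloc(1, sizeof(*q));
    if (!q)
        return ERR_PTR(-ENOMEM);
    return q;
}

/* The caller propagates the encoded error instead of assuming -ENOMEM. */
static int add_rx_queue(int hw_status)
{
    struct rxq *q = create_rxq(hw_status);

    if (IS_ERR(q))
        return (int)PTR_ERR(q);

    free(q);
    return 0;
}
```

With a NULL return, `add_rx_queue(-EPROTO)` could only report a generic failure; with the encoded pointer it returns -EPROTO unchanged.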
5 days	net: stmmac: Fix E2E delay mechanism	Nazim Amirul
	For the E2E delay mechanism, "received DELAY_REQ without timestamp" error messages show up for dwmac v3.70+ and dwxgmac IPs. This issue affects socfpga platforms, Agilex7 (dwmac 3.70) and Agilex5 (dwxgmac). According to the databook, to enable timestamping for all events, the SNAPTYPSEL bits in the MAC_Timestamp_Control register must be set to 2'b01, and the TSEVNTENA bit must be cleared to 1'b0. Commit 3cb958027cb8 ("net: stmmac: Fix E2E delay mechanism") already addresses this problem for all dwmacs above version v4.10. However, the same holds true for v3.70 and above, as well as for dwxgmac. Update the check accordingly. Fixes: 14f347334bf2 ("net: stmmac: Correctly take timestamp for PTPv2") Fixes: f2fb6b6275eb ("net: stmmac: enable timestamp snapshot for required PTP packets in dwmac v5.10a") Fixes: 3cb958027cb8 ("net: stmmac: Fix E2E delay mechanism") Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com> Link: https://patch.msgid.link/20260728060904.31993-1-muhammad.nazim.amirul.nazle.asmade@altera.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue	Jakub Kicinski
	Tony Nguyen says:

	====================
	Intel Wired LAN Driver Updates 2026-07-28 (idpf, ice, igc, igbvf, e1000)

	Michael Bommarito adds bounds checking to ensure the interrupt vector array stays in-bounds on idpf.
	Josh adjusts the minimum value for Tx ring descriptors to prevent Tx timeouts in flow based scheduling mode in idpf.
	Yuho Choi frees the IRQ name in the error path to prevent a memory leak for idpf.
	Aaron Ma adds a wait for reset completion before returning from resume on the ice driver.
	Dawid completely disables and clears VF interrupts during reset on ice.
	Dawei Feng adjusts the error paths for ice loopback test setup and e1000 probe to prevent memory leaks.
	Przemek ignores expected -EBUSY errors that can occur during reset and cause disabling of DPLL on ice.
	David Carlier removes napi_synchronize() during igc_down for igc.
	Matt Vollrath removes an incorrect decrement of count in igbvf which could cause a leak due to an off-by-one issue.

	* '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
	  e1000: fix memory leak in e1000_probe()
	  igbvf: Fix leak in TX DMA error cleanup
	  igc: remove napi_synchronize() in igc_down()
	  ice: suppress DPLL errors during reset recovery
	  ice: fix memory leak in ice_lbtest_prepare_rings()
	  ice: fix VF interrupts cleanup
	  ice: wait for reset completion in ice_resume()
	  idpf: Fix mailbox IRQ name leak on request failure
	  idpf: adjust TxQ ring count minimum
	  idpf: bound interrupt-vector register fill to the allocated array
	====================

	Link: https://patch.msgid.link/20260728210909.3042004-1-anthony.l.nguyen@intel.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
5 days	net: sxgbe: check descriptor ring allocation failures	Chenguang Zhao
	sxgbe_open() ignores the return value of init_dma_desc_rings() and continues to program DMA with invalid ring addresses when allocation fails. Check the return value and disconnect the PHY on failure. Fixes: 1edb9ca69e8a ("net: sxgbe: add basic framework for Samsung 10Gb ethernet driver") Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
5 days	net: sxgbe: free TX rings on RX allocation failure	Chenguang Zhao
	When RX descriptor ring allocation fails, init_dma_desc_rings() only frees the partially allocated RX rings and returns. The TX rings that were allocated earlier in the same function are leaked. Rearrange error labels to clean up TX rings upon RX failures. Fixes: 1edb9ca69e8a ("net: sxgbe: add basic framework for Samsung 10Gb ethernet driver") Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
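The label rearrangement follows the usual kernel unwind pattern, where a later failure jumps to a label that also releases everything allocated earlier. A minimal userspace sketch, assuming a hypothetical `struct rings`, an `init_rings()` helper, and a `fail_rx` test hook that simulates the RX allocation failing:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct rings {
    void *tx;
    void *rx;
};

/* fail_rx is a hypothetical hook that simulates an RX allocation failure. */
static int init_rings(struct rings *r, int fail_rx)
{
    r->tx = malloc(64);
    if (!r->tx)
        goto err_tx;

    r->rx = fail_rx ? NULL : malloc(64);
    if (!r->rx)
        goto err_rx;

    return 0;

err_rx:
    /* RX failed after TX succeeded: the TX ring must be released too. */
    free(r->tx);
    r->tx = NULL;
err_tx:
    return -ENOMEM;
}

static void free_rings(struct rings *r)
{
    free(r->rx);
    free(r->tx);
    r->rx = NULL;
    r->tx = NULL;
}
```

Ordering the labels in reverse allocation order means each failure point falls through exactly the cleanups it needs.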
6 days	octeontx2-af: Block VFs from clobbering special CGX PKIND state	Hariprasad Kelam
	PF and VF NIX LFs that share a CGX LMAC reuse the same hardware PKIND programming. When HiGig2 or EDSA parsing is enabled, a VF NIX LF alloc must not reset the LMAC RX PKIND or default TX parse config over the PF setup. Add cgx_get_pkind() and rvu_cgx_is_pkind_config_permitted() so VFs skip cgx_set_pkind(), rvu_npc_set_pkind(), and NIX_AF_LFX_TX_PARSE_CFG updates when the LMAC is using NPC_RX_HIGIG_PKIND or NPC_RX_EDSA_PKIND. Fixes: 94d942c5fb97 ("octeontx2-af: Config pkind for CGX mapped PFs") Cc: Geetha sowjanya <gakula@marvell.com> Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Link: https://patch.msgid.link/20260722081229.1653619-1-rkannoth@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
6 days	e1000: fix memory leak in e1000_probe()	Dawei Feng
	In the e1000_probe() path, e1000_sw_init() allocates adapter->tx_ring and adapter->rx_ring. If the subsequent CE4100-specific MDIO BAR mapping fails, the error handling jumps past the ring cleanup code, leaking both allocations. Fix this leak by moving the err_mdio_ioremap label above the ring deallocation logic. This guarantees the proper release of these resources and prevents the memory leak. The bug was first flagged by an experimental analysis tool we are developing for kernel memory-management bugs while analyzing v6.13-rc1. The tool is still under development and is not yet publicly available. Manual inspection confirms that the bug is still present in v7.1-rc6. An x86_64 allyesconfig build showed no new warnings. As we do not have a CE4100 reference platform to test with, no runtime testing was able to be performed. Fixes: 5377a4160bb65 ("e1000: Add support for the CE4100 reference platform") Cc: stable@vger.kernel.org Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 days	igbvf: Fix leak in TX DMA error cleanup	Matt Vollrath
	If an error is encountered while mapping TX buffers, the driver should unmap any buffers already mapped for that skb. Because count is incremented before each frag mapping, it will always match the correct number of unmappings needed when dma_error is reached. Decrementing count before the while loop in dma_error causes an off-by-one error. If any mapping was successful before an unsuccessful mapping, exactly one DMA mapping (the head) would leak. This bug was introduced by a 2010 fix for an endless loop in dma_error. All other affected drivers have already been fixed. Fixes: c1fa347f20f1 ("e1000/e1000e/igb/igbvf/ixgb/ixgbe: Fix tests of unsigned in *_tx_map()") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-4-7-opus Signed-off-by: Matt Vollrath <tactii@gmail.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
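The off-by-one can be reproduced with a simplified model of the mapping loop. This sketch is an assumption about the control flow as the message describes it, not the driver source: `tx_map()`, the `mapped[]` array, `fail_at` and the `buggy` switch are hypothetical, and ring-index wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * mapped[] needs nr_frags + 1 slots: slot 0 is the head, the rest are frags.
 * fail_at is the slot whose mapping fails (-1 for no failure).
 * buggy re-enables the decrement that the fix removes.
 */
static int tx_map(int *mapped, int nr_frags, int fail_at, bool buggy)
{
    int i = 0, count = 0, f;

    if (fail_at == 0)
        goto dma_error;       /* head failed: nothing to unmap */
    mapped[i] = 1;            /* head mapped */

    for (f = 0; f < nr_frags; f++) {
        count++;              /* counts buffers mapped before this frag */
        i++;
        if (fail_at == i)
            goto dma_error;
        mapped[i] = 1;
    }
    return 0;

dma_error:
    if (buggy && count)
        count--;              /* the removed decrement: skips one unmap */
    while (count--) {
        i--;                  /* step back to the last mapped slot */
        mapped[i] = 0;
    }
    return -1;
}
```

With two frags and a failure on the second, the fixed path unmaps slots 1 and 0, while the buggy path stops after slot 1 and leaves the head mapped.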
6 days	igc: remove napi_synchronize() in igc_down()	David Carlier
	When an AF_XDP zero-copy application is killed abruptly, the XSK pool is torn down but NAPI keeps polling. igc_clean_rx_irq_zc() then returns the full budget on every poll, so napi_complete_done() never clears NAPI_STATE_SCHED. igc_down() calls napi_synchronize() before napi_disable(), so it spins forever waiting for that bit and the interface never goes down. Drop the napi_synchronize() and let napi_disable() do the job -- it sets NAPI_STATE_DISABLE, which forces the stuck poll to complete. Reorder it ahead of igc_set_queue_napi() so the NAPI mapping is cleared only after polling has stopped, matching the recent igb fix b1e067240379. Fixes: fc9df2a0b520 ("igc: Enable RX via AF_XDP zero-copy") Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Tested-by: Moriya Kadosh <moriyax.kadosh@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 days	ice: suppress DPLL errors during reset recovery	Przemyslaw Korba
	During reset recovery, the admin queue returns EBUSY which is expected behavior. However, the DPLL subsystem was logging these as errors and incrementing the error counter, potentially leading to unnecessary warnings and even disabling the DPLL periodic worker if the threshold was reached. Suppress error logging and error counter increments when the admin queue returns EBUSY, as this is expected during reset recovery and not a real failure condition. test case: - ethtool --reset eth3 irq-shared dma-shared filter-shared offload-shared mac-shared phy-shared ram-shared - observe if dmesg EBUSY errors are gone Fixes: d7999f5ea64b ("ice: implement dpll interface to control cgu") Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
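The suppression logic amounts to treating -EBUSY as a non-event for error accounting. A rough sketch, where `struct dpll_state`, `handle_aq_result()` and the threshold of 3 are hypothetical and do not reflect the ice driver's actual counters or limits:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical threshold; the driver's real limit is not given in the log. */
#define SKETCH_ERR_THRESHOLD 3

struct dpll_state {
    int err_count;
    bool worker_enabled;
};

/* ret is the admin queue result: 0 on success or a negative errno. */
static void handle_aq_result(struct dpll_state *s, int ret)
{
    if (ret == 0)
        return;

    /* Expected while a reset is in flight: neither log nor count it. */
    if (ret == -EBUSY)
        return;

    s->err_count++;
    if (s->err_count >= SKETCH_ERR_THRESHOLD)
        s->worker_enabled = false;
}
```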
6 days	ice: fix memory leak in ice_lbtest_prepare_rings()	Dawei Feng
	ice_lbtest_prepare_rings() frees Rx rings only when ice_vsi_start_all_rx_rings() fails. If ice_vsi_setup_rx_rings() fails after allocating some descriptors, or if ice_vsi_cfg_lan() fails after the Rx rings were prepared, the function reaches the Tx cleanup path without releasing the initialized Rx resources. Fix this by adding separate unwind paths for Rx setup failure and LAN configuration failure. The Rx setup failure path releases the partially prepared Rx rings before freeing Tx rings, while later failures first undo the LAN Tx configuration and then release the Rx rings in reverse setup order. The bug was first flagged by an experimental analysis tool we are developing for kernel memory-management bugs while analyzing v6.13-rc1. The tool is still under development and is not yet publicly available. Manual inspection confirms that the bug is still present in v7.1-rc7. An x86_64 allyesconfig build showed no new warnings. As we do not have an Intel E800 Series adapter available to run the ethtool offline loopback selftest, no runtime testing was able to be performed. Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest") Cc: stable@vger.kernel.org Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 days	ice: fix VF interrupts cleanup	Dawid Osuchowski
	When a virtual function sends an IRQ map command, the PF will set up interrupts according to that request. However, because these interrupts are never reset, the next time Virtual Function initializes, the interrupts are still enabled for a given VF, which leads to performance degradation in certain cases due to interrupts being unexpectedly enabled and thus causing interrupt floods. Cc: stable@vger.kernel.org Fixes: 1071a8358a28 ("ice: Implement virtchnl commands for AVF support") Suggested-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Patryk Holda <patryk.holda@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 days	ice: wait for reset completion in ice_resume()	Aaron Ma
	ice_resume() schedules an asynchronous PF reset and returns immediately. The reset runs later in ice_service_task(). If userspace tries to bring up the net device before the reset finishes, ice_open() fails with -EBUSY:

	  ice_resume()
	    ice_schedule_reset()          # sets ICE_PFR_REQ, returns
	  ...
	  ice_open()
	    ice_is_reset_in_progress()    # ICE_PFR_REQ still set, -EBUSY
	  ...
	  ice_service_task()
	    ice_do_reset()
	      ice_rebuild()               # clears ICE_PFR_REQ, too late

	Reproduced on E800 series NICs during suspend/resume with irdma enabled, where the aux device probe widens the race window.

	  ice 0000:81:00.0: can't open net device while reset is in progress

	Add a best-effort wait (10s timeout, matching ice_devlink_info_get()) for the reset to complete before returning from ice_resume(). In practice the reset completes in ~300ms.

	Fixes: 769c500dcc1e ("ice: Add advanced power mgmt for WoL")
	Cc: stable@vger.kernel.org
	Reviewed-by: Kohei Enju <kohei@enjuk.jp>
	Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
	Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
	Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
	Tested-by: Alexander Nowlin <alexander.nowlin@intel.com>
	Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 days	idpf: Fix mailbox IRQ name leak on request failure	Yuho Choi
	idpf_mb_intr_req_irq() allocates the mailbox IRQ name before calling request_irq(). On success, the name is released later through kfree(free_irq()), but request_irq() failure returns without freeing it. Free the allocated name on the request_irq() failure path. Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request") Signed-off-by: Yuho Choi <dbgh9129@gmail.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
6 days	idpf: adjust TxQ ring count minimum	Joshua Hay
	Set the TxQ ring count minimum to 128 descriptors. Any lower than this, and the queue will stall and trigger Tx timeouts in flow based scheduling mode. This is because next_to_clean might never be updated. In flow based scheduling mode, next_to_clean is only updated after a descriptor completion is processed, i.e. after the RE bit is set in the last descriptor of a Tx packet. This will never happen with a ring size of 64 and an IDPF_TX_SPLITQ_RE_MIN_GAP of 64. No matter what the value of last_re is initialized/set to, the calculated gap will be at most 63 and never trigger the RE bit. Even a ring size of 96 does not solve this. Because of how infrequently next_to_clean is updated and how small the ring is, IDPF_DESC_UNUSED will be much smaller on average. This increases the chance the queue will be stopped because a multi-descriptor packet, e.g. a large LSO packet, does not see enough resources on the ring. In this case, the queue will trigger the stop logic. The queue permanently stalls because there is no chance for a descriptor completion to update next_to_clean since it is dependent on a packet being sent. Fixes: 5f417d551324 ("idpf: replace flow scheduling buffer ring with buffer pool") Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
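The arithmetic behind the 64-descriptor case can be checked exhaustively. The sketch assumes the gap is the wrapped distance between next_to_use and last_re on the ring, which is an interpretation of the message rather than the driver's exact expression; `ring_gap()` and `re_bit_reachable()` are hypothetical helpers, and `SKETCH_RE_MIN_GAP` only mirrors the value 64 quoted for IDPF_TX_SPLITQ_RE_MIN_GAP.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the value quoted for IDPF_TX_SPLITQ_RE_MIN_GAP in the message. */
#define SKETCH_RE_MIN_GAP 64u

/* Wrapped distance from last_re to next_to_use; always below ring_size. */
static unsigned int ring_gap(unsigned int ntu, unsigned int last_re,
                             unsigned int ring_size)
{
    return (ntu + ring_size - last_re) % ring_size;
}

/* True if any (next_to_use, last_re) pair reaches the minimum gap. */
static bool re_bit_reachable(unsigned int ring_size)
{
    unsigned int ntu, last_re;

    for (ntu = 0; ntu < ring_size; ntu++)
        for (last_re = 0; last_re < ring_size; last_re++)
            if (ring_gap(ntu, last_re, ring_size) >= SKETCH_RE_MIN_GAP)
                return true;
    return false;
}
```

Under this model a 64-entry ring never reaches a gap of 64, so the RE bit is never requested, while a 128-entry ring does.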
6 days	idpf: bound interrupt-vector register fill to the allocated array	Michael Bommarito
	idpf_get_reg_intr_vecs() fills the caller-allocated reg_vals[] array from the VIRTCHNL2_OP_ALLOC_VECTORS reply in adapter->req_vec_chunks, bounding its inner loop only by the per-chunk num_vectors. The array is sized separately: idpf_intr_reg_init() allocates kzalloc_objs(struct idpf_vec_regs, total_vecs) from caps.num_allocated_vectors and only checks the returned count after the fill. The sum of per-chunk num_vectors is never reconciled against total_vecs, so a reply with a small num_allocated_vectors but chunks summing higher writes past the end of reg_vals[]. Impact: a control plane (a PF or hypervisor device model) that returns a VIRTCHNL2_OP_ALLOC_VECTORS reply whose per-chunk num_vectors sum exceeds num_allocated_vectors writes struct idpf_vec_regs entries past the end of the reg_vals kmalloc allocation (KASAN slab-out-of-bounds write). Bound the fill loop to the array capacity passed in by the callers, mirroring the sibling idpf_vport_get_q_reg(). The existing num_regs < num_vecs check then rejects an undersized reply without the out-of-bounds write happening first. Fixes: d4d558718266 ("idpf: initialize interrupts and enable vport") Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Samuel Salin <Samuel.salin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
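Bounding the fill loop by the destination capacity, and not only by each chunk's count, can be sketched as follows. `struct vec_chunk` and `fill_regs()` are hypothetical simplifications of the chunked reply and the register array described above.

```c
#include <assert.h>

/* Simplified stand-in for one chunk of the ALLOC_VECTORS reply. */
struct vec_chunk {
    unsigned int start;        /* first vector index in the chunk */
    unsigned int num_vectors;  /* count claimed by the control plane */
};

/*
 * Fill reg_vals[] from the chunks, never writing more than capacity
 * entries. Returns the number of entries written so the caller can
 * still reject a reply that does not match what it expected.
 */
static unsigned int fill_regs(unsigned int *reg_vals, unsigned int capacity,
                              const struct vec_chunk *chunks,
                              unsigned int num_chunks)
{
    unsigned int n = 0, c, i;

    for (c = 0; c < num_chunks; c++)
        for (i = 0; i < chunks[c].num_vectors && n < capacity; i++)
            reg_vals[n++] = chunks[c].start + i;

    return n;
}
```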
6 days	net: ethernet: mtk_eth_soc: pass eth to mtk_handle_irq_rx in poll_controller	Chenguang Zhao
	mtk_handle_irq_rx expects a struct mtk_eth * (matching the request_irq cookie), but mtk_poll_controller incorrectly passed the net_device *. Calling ndo_poll_controller with CONFIG_NET_POLL_CONTROLLER enabled would then crash. Fixes: 8186f6e382d8 ("net-next: mediatek: fix compile error inside mtk_poll_controller()") Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Link: https://patch.msgid.link/20260723055735.885112-1-chenguang.zhao@linux.dev Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 days	ethtool: Embed FEC hist ranges as buffer in struct	Eric Joyner
	When a driver's .get_fec_stats() handler is called and the driver supports FEC histogram stats, the driver supplies the histogram bin ranges via a pointer. This pointer is assigned while under the netdev ops lock in fec_prepare_data(), but the actual data is only read after the lock is released; so this allows the driver to change the ranges (e.g. from another .get_fec_stats() call) while the current call chain is reading them in fec_fill_reply(). Fix this by adding an ethtool core-owned buffer, ranges_buf, to struct ethtool_fec_hist. Drivers whose ranges are built dynamically (currently just mlx5) fill ranges_buf and then point the existing ranges pointer at it, giving ethtool a consistent copy that stays valid after the netdev ops lock is dropped and later in fec_fill_reply(). Drivers whose ranges are compile-time constants (bnxt, netdevsim) are unaffected by the potential race and keep setting the existing ranges pointer to their constant array, without making copies. Fixes: cc2f08129925 ("ethtool: add FEC bins histogram report") Signed-off-by: Eric Joyner <eric.joyner@amd.com> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://patch.msgid.link/20260723041342.39238-1-eric.joyner@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days	rtase: fix double free of multi-frag skb on DMA map failure	Yun Lu
	In rtase_start_xmit(), when the head buffer DMA mapping fails after rtase_xmit_frags() has mapped all fragments, the error path clears the fragment descriptors with rtase_tx_clear_range(), which frees the skb through the last-frag slot and accounts tx_dropped. Control then falls through to the common error label, which frees the same skb a second time and counts it again. Return right after clearing the fragments when the skb owns frags; the no-frag case still drops through and frees the head skb once. Fixes: d6e882b89fdf ("rtase: Implement .ndo_start_xmit function") Signed-off-by: Yun Lu <luyun@kylinos.cn> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Justin Lai <justinlai0215@realtek.com> Link: https://patch.msgid.link/20260721023836.6691-1-luyun_611@163.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
7 days	forcedeth: fix UAF of txrx_stats in nv_remove	Chenguang Zhao
	nv_remove() frees the per-CPU txrx_stats before unregister_netdev(). Until unregister completes, ndo_get_stats64, the NAPI/xmit data path, and nv_close()/drain may still access txrx_stats, leading to a use-after-free. Free the stats only after unregister_netdev(). Fixes: f4b633b911fd ("forcedeth: use per cpu to collect xmit/recv statistics") Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20260723092637.2135095-1-chenguang.zhao@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	bnge/bng_re: fix ring ID widths	Vikas Gupta
	Firmware requires more than 16 bits to address TX ring IDs for its internal QP management. Widen the associated HSI ring ID fields to 32 bits. The values firmware assigns remain within 24 bits, bounded by the hardware doorbell XID field. The fw_ring_id field belongs to bnge_ring_struct, a common struct shared by all ring types, so widening it to u32 applies uniformly across TX, RX, CP, and NQ rings but firmware assigns values within 16-bit range for all ring types except TX, which requires the wider field. Note that, Thor Ultra hardware has not yet been deployed and no firmware has been released to field, so backward compatibility is not a concern. Fixes: 42d1c54d6248 ("bnge/bng_re: Add a new HSI") Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Reviewed-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Reviewed-by: Dharmender Garg <dharmender.garg@broadcom.com> Reviewed-by: Yendapally Reddy Dhananjaya Reddy <yendapally.reddy@broadcom.com> Link: https://patch.msgid.link/20260721063731.2622500-1-vikas.gupta@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	net: airoha: fix ETS channel derivation in airoha_tc_setup_qdisc_ets()	Lorenzo Bianconi
	Derive the hardware QoS channel from opt->parent instead of opt->handle in airoha_tc_setup_qdisc_ets(). The ETS qdisc handle is either user-specified or auto-allocated by qdisc_alloc_handle() and bears no relation to the HTB leaf classid that identifies the hardware channel. HTB derives the channel from TC_H_MIN(opt->classid), and ETS is always attached as a child of an HTB leaf, so its opt->parent matches that classid. Using opt->handle instead can cause two ETS qdiscs on different HTB leaves to collide on the same hardware channel, corrupting scheduler configuration and stats. Fixes: 20bf7d07c956 ("net: airoha: Add sched ETS offload support") Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20260720-airoha-ets-handle-fix-v2-1-6f7129ddc06f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
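The difference between the two derivations comes down to which 32-bit handle the minor number is taken from. In the sketch below the TC_H_* macros match the definitions in the uapi header linux/pkt_sched.h; `struct ets_opt` and `channel_from_parent()` are hypothetical, and any further mapping the driver applies to the minor number is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Same definitions as in the uapi header linux/pkt_sched.h. */
#define TC_H_MAJ_MASK 0xFFFF0000U
#define TC_H_MIN_MASK 0x0000FFFFU
#define TC_H_MIN(h) ((h) & TC_H_MIN_MASK)
#define TC_H_MAKE(maj, min) (((maj) & TC_H_MAJ_MASK) | ((min) & TC_H_MIN_MASK))

/* Hypothetical reduction of the ETS offload parameters. */
struct ets_opt {
    uint32_t handle;  /* user-chosen or auto-allocated qdisc handle */
    uint32_t parent;  /* classid of the HTB leaf the qdisc hangs off */
};

/* The channel follows the HTB leaf, not the ETS qdisc's own handle. */
static uint32_t channel_from_parent(const struct ets_opt *opt)
{
    return TC_H_MIN(opt->parent);
}
```

Two ETS qdiscs attached to leaves 1:1 and 1:2 then map to different channels regardless of what handles they were given.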
11 days	idpf: fix max_vport related crash on allocation error during init	Emil Tantilov
	Set adapter->max_vports only after successful allocation of vports, netdevs and vport_config buffers. This fixes possible crashes on reset or rmmod, following failed allocation on init

	[ 305.981402] idpf 0000:83:00.0: enabling device (0100 -> 0102)
	[ 305.994464] idpf 0000:83:00.0: Device HW Reset initiated
	[ 320.416872] BUG: kernel NULL pointer dereference, address: 0000000000000000
	[ 320.416918] #PF: supervisor read access in kernel mode
	[ 320.416942] #PF: error_code(0x0000) - not-present page
	[ 320.416963] PGD 2099657067 P4D 0
	[ 320.416983] Oops: Oops: 0000 [#1] SMP NOPTI
	...
	[ 320.417093] RIP: 0010:idpf_remove+0x118/0x200 [idpf]
	[ 320.417130] Code: 8b bb 98 09 00 00 e8 17 0f 5b e5 48 8b bb e8 08 00 00 e8 0b 0f 5b e5 66 83 bb 28 06 00 00 00 48 8b bb 20 06 00 00 74 49 31 ed <48> 8b 04 ef 48 85 c0 74 2f 48 8b 78 20 e8 66 58 91 e5 48 8b 83 20
	[ 320.417183] RSP: 0018:ff7322212903fdb8 EFLAGS: 00010246
	[ 320.417205] RAX: 0000000000000000 RBX: ff4463de40300000 RCX: ff7322212903fd4c
	[ 320.417228] RDX: 0000000000000001 RSI: ffffffffa7f7d100 RDI: 0000000000000000
	[ 320.417250] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
	[ 320.417272] R10: 0000000000000001 R11: ff4463de3a638f58 R12: ff4463be89ac7000
	[ 320.417294] R13: ff4463be89ac7198 R14: ff4463be94fc7198 R15: ffffffffc0f10f20
	[ 320.417317] FS: 00007f963c0e6740(0000) GS:ff4463fdd65d8000(0000) knlGS:0000000000000000
	[ 320.417342] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[ 320.417362] CR2: 0000000000000000 CR3: 00000020ba674002 CR4: 0000000000773ef0
	[ 320.417385] PKRU: 55555554
	[ 320.417398] Call Trace:
	[ 320.417412] <TASK>
	[ 320.417429] pci_device_remove+0x42/0xb0
	[ 320.417459] device_release_driver_internal+0x1a9/0x210
	[ 320.417492] driver_detach+0x4b/0x90
	[ 320.417516] bus_remove_driver+0x70/0x100
	[ 320.417539] pci_unregister_driver+0x2e/0xb0
	[ 320.417564] __do_sys_delete_module.constprop.0+0x190/0x2f0
	[ 320.417592] ? kmem_cache_free+0x31e/0x550
	[ 320.417619] ? lockdep_hardirqs_on_prepare+0xde/0x190
	[ 320.417644] ? do_syscall_64+0x38/0x6b0
	[ 320.417665] do_syscall_64+0xc8/0x6b0
	[ 320.417683] ? clear_bhb_loop+0x30/0x80
	[ 320.417706] entry_SYSCALL_64_after_hwframe+0x76/0x7e
	[ 320.417727] RIP: 0033:0x7f963bb30beb

	Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
	Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
	Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
	Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
	Reviewed-by: Simon Horman <horms@kernel.org>
	Tested-by: Samuel Salin <Samuel.salin@intel.com>
	Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
	Link: https://patch.msgid.link/20260717185340.3595286-13-anthony.l.nguyen@intel.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	ice: reject out-of-range ptype in ice_parser_profile_init	Aleksandr Loktionov
	set_bit(rslt->ptype, prof->ptypes) operates on a DECLARE_BITMAP of ICE_FLOW_PTYPE_MAX (1024) bits. Nothing prevents a malicious VF from providing ptype >= 1024 through VIRTCHNL, resulting in a write past the end of the bitmap and a kernel page fault.

	Reproduced with a custom kernel module injecting a crafted VIRTCHNL_OP_ADD_RSS_CFG on E810-C QSFP (8086:1592), FW 4.91 0x800214af 1.3909.0, ICE COMMS DDP 1.3.53.0, kernel 7.1.0-rc1.

	  crash_parser: ice_parser_profile_init @ ffffffffc0d61b60
	  crash_parser: setting ptype=0xffff (max valid=1023)
	  crash_parser: calling ice_parser_profile_init -- expect OOB crash!
	  BUG: kernel NULL pointer dereference, address: 0000000000000000
	  Oops: Oops: 0002 [#1] SMP NOPTI
	  CPU: 56 UID: 0 PID: 165011 Comm: insmod Kdump: loaded Tainted: G S U OE 7.1.0-rc1 #1
	  Hardware name: Intel Corporation S2600BPB/S2600BPB
	  RIP: 0010:ice_parser_profile_init+0x2d/0x1d0 [ice]
	  Call Trace:
	   <TASK>
	   ? __pfx_ice_parser_profile_init+0x10/0x10 [ice]
	   crash_init+0x127/0xff0 [crash_parser]
	   do_one_initcall+0x45/0x310
	   do_init_module+0x64/0x270
	   init_module_from_file+0xcc/0xf0
	   idempotent_init_module+0x17b/0x280
	   __x64_sys_finit_module+0x6e/0xe0

	Bail out early with -EINVAL when ptype is out of range.

	Fixes: e312b3a1e209 ("ice: add API for parser profile initialization")
	Cc: stable@vger.kernel.org
	Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
	Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
	Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
	Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
	Link: https://patch.msgid.link/20260717185340.3595286-12-anthony.l.nguyen@intel.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
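The range check before the bitmap write can be sketched in userspace. `SKETCH_PTYPE_MAX` mirrors the 1024-bit size quoted for ICE_FLOW_PTYPE_MAX; `struct profile` and `profile_set_ptype()` are hypothetical and use a plain array of unsigned long in place of DECLARE_BITMAP and set_bit().

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/* Mirrors the bitmap size quoted for ICE_FLOW_PTYPE_MAX. */
#define SKETCH_PTYPE_MAX 1024
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITMAP_LONGS \
    ((SKETCH_PTYPE_MAX + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

struct profile {
    unsigned long ptypes[SKETCH_BITMAP_LONGS];
};

/* Reject untrusted indices before touching the bitmap. */
static int profile_set_ptype(struct profile *p, unsigned int ptype)
{
    if (ptype >= SKETCH_PTYPE_MAX)
        return -EINVAL;

    p->ptypes[ptype / SKETCH_BITS_PER_LONG] |=
        1UL << (ptype % SKETCH_BITS_PER_LONG);
    return 0;
}
```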
11 days	ice: prevent tstamp ring allocation for non-PF VSI types	Paul Greenwalt
	The pf->txtime_txqs bitmap tracks which Tx queues have ETF (Earliest TxTime First) offload enabled. This bitmap is indexed by queue number and is set by ice_offload_txtime(), which only operates on PF VSI queues. However, ice_is_txtime_ena() does not check the VSI type before consulting the bitmap. When ETF offload is enabled on PF Tx queue 0, bit 0 is set in pf->txtime_txqs. During a subsequent PCI reset rebuild, the CTRL VSI's Tx queue 0 is reconfigured and ice_is_txtime_ena() is called for that ring. Since it only checks pf->txtime_txqs by queue index without distinguishing VSI type, it finds bit 0 set and returns true, matching the PF VSI's ETF queue, not the CTRL VSI's. This causes ice_vsi_cfg_txq() to spuriously allocate a tstamp_ring for the CTRL VSI ring. Since CTRL VSI rings have no associated netdev, ice_clean_tx_ring() takes an early return at the !netdev check before reaching ice_free_tx_tstamp_ring(), leaking the allocation. Each PCI reset leaks one 64-byte tstamp_ring. Fix this by restricting ice_is_txtime_ena() to return true only for PF VSI rings, since txtime_txqs is only meaningful for PF VSI queues. Fixes: ccde82e90946 ("ice: add E830 Earliest TxTime First Offload support") Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260717185340.3595286-11-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	ice: fix PTP Call Trace during PTP release	Paul Greenwalt
	If a PF reset occurs when the PTP state is ICE_PTP_UNINIT, then ice_ptp_rebuild() will update the state to ICE_PTP_ERROR. This will result in the following PTP release call trace during driver unload: kernel BUG at lib/list_debug.c:52! ice_ptp_release+0x332/0x3c0 [ice] ice_deinit_features.part.0+0x10e/0x120 [ice] ice_remove+0x100/0x220 [ice] This was observed when passing PF1 through to a VM. ice_ptp_init() fails because ctrl_pf is NULL and sets the state to ICE_PTP_UNINIT. Fix by detecting the ICE_PTP_UNINIT state in ice_ptp_rebuild() and returning without error, preventing the invalid state transition to ICE_PTP_ERROR. The only valid path to ICE_PTP_ERROR is from ICE_PTP_RESETTING after a failed rebuild. Fixes: 8293e4cb2ff5 ("ice: introduce PTP state machine") Cc: stable@vger.kernel.org Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260717185340.3595286-10-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	ice: use READ_ONCE() to access cached PHC time	Sergey Temerkhanov
	ptp.cached_phc_time is a 64-bit value updated by a periodic work item on one CPU and read locklessly on another. On 32-bit or non-atomic architectures this can result in a torn read. Use READ_ONCE() to enforce a single atomic load. Fixes: 77a781155a65 ("ice: enable receive hardware timestamping") Cc: stable@vger.kernel.org Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260717185340.3595286-9-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	ice: fix LAG recipe to profile association	Marcin Szycik
	ice_init_lag() associates recipes with profiles, assuming that Link Aggregation-related profiles always have a profile ID lower than 70 (ICE_PROFID_IPV6_GTPU_IPV6_TCP_INNER). This value seems arbitrary and might not hold for some versions of the DDP package, i.e. LAG profiles may have a profile ID greater than 70. That would leave the switch misconfigured and LAG not working properly. Fix it by checking up to the maximum profile ID. Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface") Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Dave Ertman <david.m.ertman@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260717185340.3595286-7-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	ice: pass the return value of skb_checksum_help()	Michal Swiatkowski
	skb_checksum_help() can fail. Pass its return value back to the caller. Consolidate this software path behind a goto. Instead of just returning an error, try calculating the checksum in software first. There is a check for TSO in checksum_sw_fb. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260717185340.3595286-4-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	ice: allow creating VFs when !CONFIG_ICE_SWITCHDEV	Vincent Chen
	Currently ice_eswitch_attach_vf() is called unconditionally in ice_start_vfs(), which causes VF creation to fail when CONFIG_ICE_SWITCHDEV is not defined. Fix this by adding switchdev mode checks at the call sites before calling ice_eswitch_attach_vf(), consistent with how ice_eswitch_attach_sf() is already handled in ice_devlink_port_new(). This is similar to commit aacca7a83b97 ("ice: allow creating VFs for !CONFIG_NET_SWITCHDEV") which fixed the same issue for the previous ice_eswitch_configure() API. Fixes: 415db8399d06 ("ice: make representor code generic") Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://patch.msgid.link/20260717185340.3595286-2-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	octeontx2-pf: tc: fix egress ratelimiting	Hariprasad Kelam
	The egress rate calculation computes an incorrect mantissa and exponent, causing up to ~50% deviation from the configured rate at lower speeds. Rework the computation to follow the hardware rate formula: rate = 2 * (1 + mantissa/256) * 2^exp / (1 << div_exp) Keep div_exp = 0 and derive exp and mantissa from half of the requested rate. Rates below 2 Mbps are floored to the smallest encodable step (exp = 0, mantissa = 0). Fixes: e638a83f167e ("octeontx2-pf: TC_MATCHALL egress ratelimiting offload") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Nitin Shetty J <nshettyj@marvell.com> Link: https://patch.msgid.link/20260717084349.2227796-1-nshettyj@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
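As a worked illustration of the formula above, here is a small integer sketch with div_exp kept at 0. It is not the driver's implementation: `struct rate_enc`, `rate_encode()` and `rate_decode()` are hypothetical names, the rate is assumed to be in Mbps and to fit in 32 bits, and no hardware field widths are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two encoded fields. */
struct rate_enc {
	uint32_t exp;
	uint32_t mantissa;
};

/* rate = 2 * (1 + mantissa/256) * 2^exp / (1 << div_exp), in integers. */
uint64_t rate_decode(struct rate_enc e, uint32_t div_exp)
{
	return (((2ULL * (256 + e.mantissa)) << e.exp) / 256) >> div_exp;
}

/* Derive exp and mantissa for div_exp = 0 from a rate in Mbps. */
struct rate_enc rate_encode(uint32_t rate_mbps)
{
	struct rate_enc e = { 0, 0 };
	uint64_t r = rate_mbps;

	/* Below 2 Mbps: floor to the smallest step (exp = 0, mantissa = 0). */
	if (r < 2)
		return e;

	/* Largest exp with 2^(exp+1) <= rate, i.e. floor(log2(rate / 2)). */
	while ((r >> (e.exp + 2)) != 0)
		e.exp++;

	/* Solve rate = 2^(exp+1) * (256 + mantissa) / 256 for mantissa. */
	e.mantissa = (uint32_t)(((r * 256) >> (e.exp + 1)) - 256);
	return e;
}
```

For example, 1000 Mbps gives half = 500, so exp = 8 (2^8 = 256 <= 500 < 512) and mantissa = 244; decoding yields 2 * (1 + 244/256) * 256 = 1000.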
11 days	net/mlx5e: Reject unsupported CB Shaper TSA in ETS validation	Alexei Lazar
	Credit Based (CB) TSA is not supported by the mlx5 driver, so reject any configurations that specify it. Fixes: 08fb1dacdd76 ("net/mlx5e: Support DCBNL IEEE ETS") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20260717075125.1244877-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	net/mlx5e: Report zero bandwidth for non-ETS traffic classes	Alexei Lazar
	The IEEE 802.1Qaz standard defines that bandwidth allocation percentages only apply to Enhanced Transmission Selection (ETS) traffic classes. For STRICT and VENDOR transmission selection algorithms, bandwidth percentage values are not applicable. Currently, due to a hardware limitation, the get operation reports a bandwidth of 100 for non-ETS traffic classes, regardless of their TSA type. Fix this by reporting 0 for non-ETS traffic classes. Fixes: 820c2c5e773d ("net/mlx5e: Read ETS settings directly from firmware") Signed-off-by: Alexei Lazar <alazar@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20260717075125.1244877-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	net/mlx5: E-Switch, fix zero num_dest in prio_tag egress vlan rule	Yael Chemla
	esw_egress_acl_vlan_create() hardcodes num_dest=0 in its mlx5_add_flow_rules() call. When invoked from the non-bond path fwd_dest is NULL and num_dest=0 is correct. When invoked from esw_acl_egress_ofld_rules_create() during a bond event, fwd_dest is non-NULL and flow_act.action carries MLX5_FLOW_CONTEXT_ACTION_FWD_DEST, but _mlx5_add_flow_rules() rejects a non-NULL dest pointer paired with dest_num<=0 and returns -EINVAL. The error propagates as "configure slave vport egress fwd, err(-22)". The passive vport's egress ACL table ends up with its flow groups allocated but no FTEs, so prio-tagged packets are not popped and bond failover is broken on prio_tag_required devices. Fix by passing fwd_dest ? 1 : 0 as num_dest to match the actual number of destinations supplied. Fixes: bf773dc0e6d5 ("net/mlx5: E-Switch, Introduce APIs to enable egress acl forward-to-vport rule") Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260717073306.1242399-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
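The mismatch between the destination pointer and the destination count can be modelled in a few lines. This is a user-space sketch only: `struct flow_dest`, `add_flow_rules()` and the two `egress_vlan_create_*()` helpers are hypothetical stand-ins, and the validation rule is the one the commit message describes (a non-NULL dest with a count <= 0 is rejected with -EINVAL).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a flow destination. */
struct flow_dest {
	int vport;
};

/* Models the check described above: dest != NULL requires num_dest > 0. */
int add_flow_rules(const struct flow_dest *dest, int num_dest)
{
	if (dest && num_dest <= 0)
		return -EINVAL;
	return 0;
}

/* Old behaviour: the count is hardcoded to 0 whatever the caller passes. */
int egress_vlan_create_old(const struct flow_dest *fwd_dest)
{
	return add_flow_rules(fwd_dest, 0);
}

/* Fixed behaviour: the count follows the destination actually supplied. */
int egress_vlan_create_fixed(const struct flow_dest *fwd_dest)
{
	return add_flow_rules(fwd_dest, fwd_dest ? 1 : 0);
}
```

In this model the non-bond path (NULL destination) succeeds in both variants, while the bond path (non-NULL destination) only succeeds with the fixed count.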
11 days	net/mlx5: Fix MCIA register buffer overflow on 32 dword reads	Gal Pressman
	The MCIA register can return up to 32 dwords (128 bytes) when the device advertises the mcia_32dwords capability, but struct mlx5_ifc_mcia_reg_bits only defines dword_0..11, leaving room for just 12 dwords (48 bytes) of data. mlx5_query_mcia() clamps the read size to mlx5_mcia_max_bytes() and then memcpy()s that many bytes out of the register, potentially reading past the end of the 'out' buffer. On kernels built with FORTIFY_SOURCE this is caught as a buffer overflow while reading the module EEPROM via ethtool: detected buffer overflow in memcpy kernel BUG at lib/string_helpers.c:1048! RIP: 0010:fortify_panic+0x13/0x20 Call Trace: mlx5_query_mcia.isra.0+0x200/0x210 [mlx5_core] mlx5_query_module_eeprom_by_page+0x4a/0xa0 [mlx5_core] mlx5e_get_module_eeprom_by_page+0xbb/0x120 [mlx5_core] eeprom_prepare_data+0xf3/0x170 ethnl_default_doit+0xf1/0x3b0 Extend the mcia_reg layout to 32 dwords. Fixes: 271907ee2f29 ("net/mlx5: Query the maximum MCIA register read size from firmware") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Alex Lazar <alazar@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260717072338.1240582-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	hinic: remove unused ethtool RSS user configuration buffers	Chenguang Zhao
	rss_indir_user and rss_hkey_user are allocated and filled in __set_rss_rxfh() when the user configures RSS via ethtool, but nothing ever reads them. hinic_get_rxfh() fetches the state from the device, and the hardware is programmed from the original indir/key arguments. These buffers only leaked on driver unload. Drop the unused allocations, memcpys, and struct fields. Fixes: 4fdc51bb4e92 ("hinic: add support for rss parameters with ethtool") Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260722025353.328179-1-chenguang.zhao@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 days	octeontx2-vf: set TC flower flag on MCAM entry allocation	Suman Ghosh
	When MCAM entries are allocated for a VF netdev via the devlink mcam_count parameter, only OTX2_FLAG_NTUPLE_SUPPORT was set. That enabled ethtool ntuple filters but not tc flower offload. Also set OTX2_FLAG_TC_FLOWER_SUPPORT when entries are successfully allocated. Fixes: 2da489432747 ("octeontx2-pf: devlink params support to set mcam entry count") Signed-off-by: Suman Ghosh <sumang@marvell.com> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260715052007.2099851-1-rkannoth@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 days	net: stmmac: enable the MAC on link up for all supported speeds	vadik likholetov
	stmmac_mac_link_down() clears the MAC's transmit and receive enable bits. stmmac_mac_link_up() is expected to set them again through stmmac_mac_set(..., true), but it first switches on the negotiated speed and returns early for a speed the switch does not list. The MAC is then left gated off. The speed selection is split into three switches, keyed on the interface. The generic branch -- taken for everything that is neither USXGMII nor XLGMII, so including PHY_INTERFACE_MODE_10GBASER -- lists only SPEED_2500, SPEED_1000, SPEED_100 and SPEED_10. MGBE on Tegra234 runs 10GBASE-R into an Aquantia AQR113C. That PHY does rate matching, so phylink_link_up() replaces the media speed with the MAC-side interface speed before calling into the MAC: case RATE_MATCH_PAUSE: speed = phylink_interface_max_speed(link_state.interface); duplex = DUPLEX_FULL; The driver is therefore called as stmmac_mac_link_up(interface=10GBASER, speed=10000, duplex=1) which falls through to "default: return;". The interface stops passing traffic after the first link flap. The failure is easy to misread. The link still comes up, because the PHY is polled over MDIO and needs no MAC, so the interface reports carrier 1 at the media speed. The DMA is untouched, so its start bits stay set and descriptors are still consumed. Only the MAC itself is gated off: the receiver counts nothing (mmc_rx_framecount_gb stops advancing, RE is 0) and nothing reaches the wire (TE is 0). The interface survives boot only because stmmac_hw_setup(), called from ndo_open, enables the MAC unconditionally -- so the problem appears only once the cable has been unplugged and plugged back in, and "ip link set dev <ethX> down && ip link set dev <ethX> up" appears to fix it. The interface is not what the speed bits depend on: with the single exception of 2.5G, which is selected through the XGMII block on USXGMII and through the regular speed bits otherwise, each speed maps to one field of struct mac_link. 
The per-interface switches are speed validation, and phylink already validates the speed against priv->hw->link.caps. So collapse the three switches into one keyed on the speed alone, keeping the interface test only for the 2.5G case. This covers 10G on 10GBASE-R, and equally 5G, and 1G/100/10 on USXGMII, all of which hit "default: return;" today. A core that does not support a speed leaves the corresponding mac_link field at 0, and phylink will not offer it that speed in the first place. For dwxgmac2 at 10G, link.xgmii.speed10000 is XGMAC_CONFIG_SS_10000, which is 0 and is the correct speed selection for a 10GBASE-R MAC: ctrl then equals old_ctrl, the register write is skipped, and execution reaches stmmac_mac_set(..., true). Log an error in the default case, since a speed with no entry here leaves the MAC disabled and the symptom does not point at the cause. Fixes: d8ca113724e7 ("net: stmmac: tegra: Add MGBE support") Suggested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: vadik likholetov <vadikas@gmail.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260713074911.30090-1-vadikas@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
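The collapsed speed selection can be pictured roughly as below. This is a simplified sketch, not the stmmac code: `struct mac_link_model`, `enum iface` and `select_speed_bits()` are hypothetical, the XLGMII speeds are left out, and the field values are whatever the caller fills in. What it keeps from the description is the single switch keyed on speed, the interface test for 2.5G only, and a default case that reports failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical interface classes; only USXGMII is special-cased. */
enum iface { IFACE_OTHER, IFACE_USXGMII };

/* Hypothetical model of per-speed register values (0 may be valid). */
struct mac_link_model {
	uint32_t speed10, speed100, speed1000, speed2500;
	uint32_t xgmii_speed2500, xgmii_speed5000, xgmii_speed10000;
};

/*
 * Pick the speed-select bits from the speed alone. Returns false for a
 * speed with no entry, in which case the caller must not enable the MAC
 * silently and should log an error instead.
 */
bool select_speed_bits(const struct mac_link_model *l, enum iface iface,
		       int speed, uint32_t *bits)
{
	switch (speed) {
	case 10000:
		*bits = l->xgmii_speed10000;
		return true;
	case 5000:
		*bits = l->xgmii_speed5000;
		return true;
	case 2500:
		/* The only speed whose field depends on the interface. */
		*bits = (iface == IFACE_USXGMII) ? l->xgmii_speed2500
						 : l->speed2500;
		return true;
	case 1000:
		*bits = l->speed1000;
		return true;
	case 100:
		*bits = l->speed100;
		return true;
	case 10:
		*bits = l->speed10;
		return true;
	default:
		return false;
	}
}
```

A 10G request on a non-USXGMII interface now resolves to a value (possibly 0, as with XGMAC_CONFIG_SS_10000) instead of taking the early return, so the caller can go on to enable the MAC.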
11 days	net: stmmac: reset residual action in L3L4 filters on delete	Nazim Amirul
	When deleting an L3/L4 flower filter entry, the action field is not reset. If a filter was previously configured with a drop action, that action may persist and affect subsequent filter configurations unintentionally. Clear the action field when the filter entry is deleted. Fixes: 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260714023716.29865-5-muhammad.nazim.amirul.nazle.asmade@altera.com Reviewed-by: Jakub Raczynski <j.raczynski@samsung.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 days	net: stmmac: fix l3l4 filter rejecting unsupported offload requests	Nazim Amirul
	The basic flow parser in tc_add_basic_flow() does not validate match keys before proceeding. Unsupported offload configurations such as partial protocol masks, non-IPv4 network proto, or non-TCP/UDP transport proto are silently accepted instead of returning -EOPNOTSUPP. Add validation to return -EOPNOTSUPP early for: - No network or transport proto present in the key - Partial protocol mask (only full mask supported) - Network proto is not IPv4 - Transport proto is not TCP or UDP Each rejection includes an extack message so the user knows which part of the match is unsupported. Also propagate -EOPNOTSUPP from tc_add_basic_flow() in tc_add_flow() by returning it directly rather than using break. The break was silently discarding the error for FLOW_CLS_REPLACE operations where entry->in_use is already true, causing tc_add_flow() to return 0 (success) for unsupported replace requests. Fixes: 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260714023716.29865-4-muhammad.nazim.amirul.nazle.asmade@altera.com Reviewed-by: Jakub Raczynski <j.raczynski@samsung.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
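The four rejections listed above could look roughly like this in a stand-alone model. It is a sketch under assumptions: `struct basic_key` and `validate_basic_flow()` are hypothetical, values are in host byte order for simplicity, a zero field is treated as "not present", and the message strings are illustrative rather than the driver's extack texts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define PROTO_IPV4 0x0800 /* EtherType for IPv4 */
#define PROTO_TCP 6
#define PROTO_UDP 17

/* Hypothetical model of the basic match key (also used for the mask). */
struct basic_key {
	uint16_t n_proto;  /* network protocol, host order in this model */
	uint8_t ip_proto;  /* transport protocol */
};

/* Return 0 if the match can be offloaded, -EOPNOTSUPP with a reason if not. */
int validate_basic_flow(const struct basic_key *key,
			const struct basic_key *mask, const char **msg)
{
	*msg = NULL;

	if (!key->n_proto || !key->ip_proto) {
		*msg = "network or transport protocol missing";
		return -EOPNOTSUPP;
	}
	if (mask->n_proto != 0xffff || mask->ip_proto != 0xff) {
		*msg = "only full protocol masks are supported";
		return -EOPNOTSUPP;
	}
	if (key->n_proto != PROTO_IPV4) {
		*msg = "only IPv4 is supported";
		return -EOPNOTSUPP;
	}
	if (key->ip_proto != PROTO_TCP && key->ip_proto != PROTO_UDP) {
		*msg = "only TCP and UDP are supported";
		return -EOPNOTSUPP;
	}
	return 0;
}
```

The caller is expected to return this error directly rather than break out of a switch, so that a replace request on an entry already in use cannot turn the failure into a 0 return.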
11 days	net: stmmac: xgmac: fix l4 filter port overwrite on register update	Nazim Amirul
	The XGMAC_L4_ADDR register holds both source and destination port match values. The current implementation overwrites the entire register when configuring either port, so setting one silently erases the other. Fix this by reading the register first, then masking and updating only the relevant field before writing back. Fixes: 425eabddaf0f ("net: stmmac: Implement L3/L4 Filters using TC Flower") Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com> Signed-off-by: Nazim Amirul <muhammad.nazim.amirul.nazle.asmade@altera.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260714023716.29865-3-muhammad.nazim.amirul.nazle.asmade@altera.com Reviewed-by: Jakub Raczynski <j.raczynski@samsung.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
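The read-modify-write pattern described above, sketched in user space. The register is modelled as a plain variable, and the field layout (source port in bits 15:0, destination port in bits 31:16) is an assumption made for illustration; `reg_read()`, `reg_write()` and `set_l4_port()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: source port in the low half, destination in the high. */
#define L4_SP_MASK  0x0000ffffu
#define L4_DP_MASK  0xffff0000u
#define L4_DP_SHIFT 16

/* Stand-in for the memory-mapped register. */
static uint32_t l4_addr_reg;

uint32_t reg_read(void)
{
	return l4_addr_reg;
}

void reg_write(uint32_t val)
{
	l4_addr_reg = val;
}

/* Update one port field and leave the other one untouched. */
void set_l4_port(bool is_src, uint16_t port)
{
	uint32_t val = reg_read();

	if (is_src) {
		val &= ~L4_SP_MASK;
		val |= port;
	} else {
		val &= ~L4_DP_MASK;
		val |= (uint32_t)port << L4_DP_SHIFT;
	}
	reg_write(val);
}
```

Configuring the source port and then the destination port leaves both values in the register, which is the behaviour a blind write of the whole register cannot provide.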
12 days	net: stmmac: dwmac4: mask interrupts when stopping DMA in suspend	Luis Lang
	Since commit 1b9707e6f1a9 ("net: stmmac: enable RPS and RBU interrupts"), suspending causes an interrupt storm from the RPS interrupt. Fix this by adding a deinit_chan() op to stmmac_dma_ops, which masks all default dma channel interrupts. This is called from stmmac_stop_all_dma(), so interrupts don't trigger while suspending. Fixes: 1b9707e6f1a9 ("net: stmmac: enable RPS and RBU interrupts") Suggested-by: Andrew Lunn <andrew@lunn.ch> Suggested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Luis Lang <luis.la@mail.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260720111534.163416-1-luis.la@mail.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days	net: dpaa: fix mode setting	Michael Walle
	Before converting to the phylink interface, the init function would have set a non-reserved I/F mode in the maccfg2 register. After converting to phylink, 0 is written as mode, which is a reserved value (although it's the hardware default). Without a valid mode, an SGMII link is never established between the MAC and the PHY, so .link_up(), which could set the correct mode according to the actual speed, is never called. Fix it by setting the maximum speed of the phy_interface_t in use in .mac_config() - just like the driver did before the phylink conversion. Fixes: 5d93cfcf7360 ("net: dpaa: Convert to phylink") Suggested-by: Sean Anderson <sean.anderson@linux.dev> Signed-off-by: Michael Walle <mwalle@kernel.org> Reviewed-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20260717132401.2653252-1-mwalle@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
12 days	net: hip04: fix RX buffer leak on build_skb failure	Fan Wu
	When build_skb() fails in hip04_rx_poll(), the driver jumps to the refill path without releasing the current RX buffer and its DMA mapping. Installing a replacement buffer then overwrites the slot references and leaks both resources. Keep the current slot intact and return budget so NAPI retries the same buffer. Also free a newly allocated RX fragment when dma_map_single() fails. This issue was found by an in-house static analysis tool. Fixes: 701a0fd52318 ("hip04_eth: fix missing error handle for build_skb failed") Cc: stable@vger.kernel.org Signed-off-by: Fan Wu <fanwu01@zju.edu.cn> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260712142729.2057636-1-fanwu01@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days	net: stmmac: intel: skip SerDes reconfig when rate is unchanged	Markus Breitenberger
	intel_mac_finish() is registered as the phylink mac_finish() callback for the Elkhart Lake SGMII ports. phylink calls it at the end of every major link reconfiguration, including the initial one during probe. The callback selects the PMC ModPHY LCPLL programming for the requested MAC-side interface and then power-cycles the SerDes. On Elkhart Lake that ModPHY is also used by the on-die AHCI SATA PHY. Reapplying the programming during the initial boot-time link-up disturbs the shared analog block while it is still driving SATA, so the SATA link fails to train: ata1: SATA link down (SStatus 1 SControl 300) The disk carrying the root filesystem is never detected and the system hangs at rootwait. Ethernet itself comes up normally, which makes the failure look unrelated to the network driver. Before mac_finish() runs, the legacy SerDes power-up path has already programmed SERDES_GCR0 for the current interface. The 1G and 2.5G ModPHY tables selected by mac_finish() correspond to the SerDes lane rate, so read that rate back from SERDES_GCR0 and skip the PMC reprogramming and SerDes power-cycle when it already matches the selected interface. This keeps the disruptive reprogramming out of the boot path when the SerDes is configured correctly, while preserving the previous behavior when a real SGMII/1000BASE-X to 2500BASE-X rate change is needed. If the register read fails, reconfigure as before. Fixes: a42f6b3f1cc1 ("net: stmmac: configure SerDes according to the interface mode") Cc: stable@vger.kernel.org Signed-off-by: Markus Breitenberger <bre@keba.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20260713171619.192452-1-bre@breiti.cc Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 days	gve: fix Rx queue stall on alloc failure	Eddie Phillips
	When the system is under extreme memory pressure, page allocations can fail during the Rx buffer refill loop. If the number of buffers posted to hardware falls below a critical low threshold and the refill loop exits due to allocation failures, the queue can stall: 1. The device drops incoming packets because there are no descriptors. 2. Since no packets are processed, no Rx completions are generated. 3. Because no completions occur, NAPI is never scheduled, preventing the refill loop from running again even after memory is freed. This results in a permanent queue stall. Resolve this by introducing a starvation recovery timer for each Rx queue. If the number of buffers posted to hardware falls below a critical low threshold, start a timer to periodically reschedule NAPI. Once NAPI runs and successfully refills the queue above the threshold, the timer is not rescheduled. The threshold is set to 32 because a single maximum-sized Receive Segment Coalescing (RSC) packet can consume up to 19 descriptors in the Rx path. Lower thresholds (such as 8 or 16) would be insufficient to process a complete maximum-sized RSC packet, risking packet drops or unexpected hardware behavior under memory pressure. Setting the threshold to 32 guarantees a safe margin to handle at least one full RSC packet. Cc: stable@vger.kernel.org Fixes: 9b8dd5e5ea48 ("gve: DQO: Add RX path") Reviewed-by: Jordan Rhee <jordanrhee@google.com> Signed-off-by: Eddie Phillips <eddiephillips@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/20260709211906.3322883-1-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
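The arm/disarm decision for the recovery timer can be reduced to a small state model. This is only a sketch of the idea, not the gve code: `struct rxq_model` and `rx_refill()` are hypothetical, allocation success is passed in as a number, and the timer is represented by a flag. The threshold of 32 is the one named in the commit text.

```c
#include <assert.h>
#include <stdbool.h>

#define RX_STARVATION_THRESHOLD 32 /* value given in the commit text */

/* Hypothetical model of one Rx queue. */
struct rxq_model {
	unsigned int posted; /* buffers currently posted to hardware */
	bool timer_armed;    /* stands in for the starvation timer */
};

/*
 * Try to post `want` buffers when only `avail` allocations succeed.
 * If the queue is still below the threshold afterwards, keep the timer
 * armed so that NAPI is rescheduled even without Rx completions.
 */
void rx_refill(struct rxq_model *q, unsigned int want, unsigned int avail)
{
	unsigned int n = want < avail ? want : avail;

	q->posted += n;
	q->timer_armed = q->posted < RX_STARVATION_THRESHOLD;
}
```

In this model a refill that only gets 5 buffers leaves the timer armed, and a later refill that brings the queue above 32 buffers clears it, so the timer is not rescheduled once the queue has recovered.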
13 days	pds_core: check for workqueue allocation failure	Nikhil P. Rao
	pdsc_init_pf() does not check whether create_singlethread_workqueue() succeeded. Fail probe on failure. The workqueue is set up before the timer and mutexes, so its failure path must unwind only the earlier setup. Fixes: c2dbb0904310 ("pds_core: health timer and workqueue") Reported-by: sashiko-bot <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260629200358.2626129-1-nikhil.rao%40amd.com?part=2 Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://patch.msgid.link/20260714212713.1788438-1-nikhil.rao@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>