linux.git/drivers/net, branch v7.2-rc1

Merge tag 'net-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

2026-06-25T19:25:36+00:00

Pull networking fixes from Jakub Kicinski:
 "Including fixes from netfilter and IPsec.

  Current release - regressions:

   - do not acquire dev->tx_global_lock in netdev_watchdog_up()

   - ethtool: keep rtnl_lock for ops using ethtool_op_get_link()

   - fix deadlock in nested UP notifier events

  Current release - new code bugs:

   - eth:
      - cn20k: fix subbank free list indexing for search order
      - airoha: fix BQL underflow in shared QDMA TX ring

  Previous releases - regressions:

   - netfilter:
     - flowtable: fix offloaded ct timeout never being extended
     - nf_conncount: prevent connlimit drops for early confirmed ct

  Previous releases - always broken:

   - require CAP_NET_ADMIN in the originating netns when modifying
     cross-netns devices

   - report NAPI thread PID in the caller's pid namespace

   - mac802154: fix dirty frag in in-place crypto for IOT radios

   - sctp: hold socket lock when dumping endpoints in sctp_diag, avoid
     an overflow

   - eth: gve: fix header buffer corruption with header-split and HW-GRO

   - af_key: initialize alg_key_len for IPComp states, prevent OOB read"

* tag 'net-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (213 commits)
  selftests: bonding: add a test for VLAN propagation over a bonded real device
  vlan: defer real device state propagation to netdev_work
  net: add the driver-facing netdev_work scheduling API
  net: turn the rx_mode work into a generic netdev_work facility
  net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()
  rxrpc: Fix rxrpc_rotate_tx_rotate() to check there's something to rotate
  rxrpc: Fix leak of released call in recvmsg(MSG_PEEK)
  rxrpc: Fix socket notification race
  rxrpc: Fix potential infinite loop in rxrpc_recvmsg()
  rxrpc: Fix oob challenge leak in cleanup after notification failure
  rxrpc: Fix the reception of a reply packet before data transmission
  afs: Fix uncancelled rxrpc OOB message handler
  afs: Fix further netns teardown to cancel the preallocation charger
  rxrpc: Fix double unlock in rxrpc_recvmsg()
  rxrpc: Fix leak of connection from OOB challenge
  rxrpc: Fix ACKALL packet handling
  net: hns3: differentiate autoneg default values between copper and fiber
  net: hns3: fix permanent link down deadlock after reset
  net: hns3: refactor MAC autoneg and speed configuration
  net: hns3: unify copper port ksettings configuration path
  ...

net: ethtool: keep rtnl_lock for ops using ethtool_op_get_link()

2026-06-25T17:18:34+00:00

Breno reports the following splat on mlx5:

  RTNL: assertion failed at net/core/dev.c (2241)
  WARNING: net/core/dev.c:2241 at netif_state_change+0xed/0x130, CPU#5: ethtool/1335
  RIP: 0010:netif_state_change+0xf9/0x130
  Call Trace:
    
     __linkwatch_sync_dev+0xea/0x120
     ethtool_op_get_link+0xe/0x20
     __ethtool_get_link+0x26/0x40
     linkstate_prepare_data+0x51/0x200
     ethnl_default_doit+0x213/0x470
     genl_family_rcv_msg_doit+0xdd/0x110

Looks like I missed ethtool_op_get_link() trying to sync linkwatch,
which needs rtnl_lock. Not all drivers do this - bnxt doesn't,
it just returns the link state, so add an opt-in bit.
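
As a rough illustration only (this is not the patch), the opt-in idea can
be sketched in userspace C. Every demo_* name is hypothetical, and
rtnl_lock is simulated with a counter: the dispatcher takes the lock only
for devices whose link handler has opted in.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel's structures or names. */
struct demo_dev {
	bool link_up;
	bool get_link_needs_rtnl;	/* opt-in bit set by the driver */
};

static int demo_rtnl_depth;		/* simulated rtnl_lock hold count */
static int demo_rtnl_acquisitions;	/* how often the lock was taken */

static void demo_rtnl_lock(void)
{
	demo_rtnl_depth++;
	demo_rtnl_acquisitions++;
}

static void demo_rtnl_unlock(void)
{
	demo_rtnl_depth--;
}

/*
 * Returns the link state (0 or 1), or -1 if a handler that needs the
 * lock was called without it (the situation behind the splat).
 */
static int demo_get_link(const struct demo_dev *dev)
{
	if (dev->get_link_needs_rtnl && demo_rtnl_depth == 0)
		return -1;
	return dev->link_up;
}

/* GET path: take the simulated lock only for handlers that opted in. */
static int demo_linkstate_doit(const struct demo_dev *dev)
{
	bool locked = dev->get_link_needs_rtnl;
	int ret;

	if (locked)
		demo_rtnl_lock();
	ret = demo_get_link(dev);
	if (locked)
		demo_rtnl_unlock();
	return ret;
}
```

In this sketch a device that only returns its link state never touches
the lock, while one that opted in always runs its handler with it held.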

Reported-by: Breno Leitao 
Fixes: 45079e00133e ("net: ethtool: optionally skip rtnl_lock on Netlink path for GET ops")
Acked-by: Stanislav Fomichev 
Reviewed-by: Breno Leitao 
Acked-by: Harshitha Ramamurthy 
Link: https://patch.msgid.link/20260624190439.2521219-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski

net: hns3: differentiate autoneg default values between copper and fiber

2026-06-25T16:15:44+00:00

Fix a link loss issue during driver initialization on optical ports
connected to forced-mode (non-autoneg) remote switches.

Previously, during driver probe or initialization, hclge_configure()
blindly hardcoded hdev->hw.mac.req_autoneg to AUTONEG_ENABLE for all
media types. While this is necessary for copper (BASE-T) ports to
establish a link, many high-speed optical (fiber) ports in data
centers are connected to switches running in forced mode (fixed speed,
autoneg disabled). Forcing autoneg on these optical ports during
initialization causes a permanent link failure since the remote end
refuses to respond to autoneg pulses.

Fix this by implementing media-type differentiated initialization in
hclge_init_ae_dev(). Copper ports continue to default to
AUTONEG_ENABLE, while optical ports strictly inherit the preset
autoneg status pre-configured by the firmware (hdev->hw.mac.autoneg),
preserving native compatibility with forced-mode network environments.
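
A minimal userspace sketch of the media-type split described above. The
demo_* types and constants are hypothetical stand-ins and do not
reproduce the driver's code:

```c
#include <stdint.h>

#define DEMO_AUTONEG_DISABLE	0
#define DEMO_AUTONEG_ENABLE	1

enum demo_media { DEMO_MEDIA_COPPER, DEMO_MEDIA_FIBER };

/* Hypothetical subset of the MAC state discussed in the text. */
struct demo_mac_init {
	enum demo_media media_type;
	uint8_t autoneg;	/* value pre-configured by firmware */
	uint8_t req_autoneg;	/* value the driver will request */
};

/* Copper defaults to autoneg on; fiber inherits the firmware preset. */
static void demo_init_req_autoneg(struct demo_mac_init *mac)
{
	if (mac->media_type == DEMO_MEDIA_COPPER)
		mac->req_autoneg = DEMO_AUTONEG_ENABLE;
	else
		mac->req_autoneg = mac->autoneg;
}
```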

Fixes: 05eb60e9648c ("net: hns3: using user configure after hardware reset")
Signed-off-by: Shuaisong Yang 
Signed-off-by: Jijie Shao 
Link: https://patch.msgid.link/20260624141319.271439-5-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski

net: hns3: fix permanent link down deadlock after reset

2026-06-25T16:15:44+00:00

Fix a critical race condition that leaves the network interface
permanently Link Down after a hardware reset under specific ethtool
sequences.

This issue exclusively manifests in firmware-controlled PHY topologies
where the driver relies on the IMP firmware to arbitrate link parameters.
Standard devices driven by the kernel's native PHY_LIB are unaffected.

The deadlock occurs via the following path:
1. User disables autoneg and forces an unmatched speed, forcing link
   down: `ethtool -s ethx autoneg off speed 10 duplex full`
2. User re-enables autoneg: `ethtool -s ethx autoneg on`. The netdev
   stack passes cmd->base.speed as SPEED_UNKNOWN (0xffffffff).
3. Driver saves req_autoneg=1, but before the interface can link up,
   a hardware reset is triggered.
4. During reset recovery, MAC init reads the un-synchronized runtime
   state mac.autoneg (which is still 0/OFF), misinterprets it as
   forced mode, and pushes the cached SPEED_UNKNOWN into the hardware
   registers, causing the MAC firmware state machine to freeze.
   Meanwhile, PHY init reads req_autoneg=1 and enables PHY autoneg.

Since the MAC is frozen with 0xffffffff and PHY is running autoneg,
they mismatch permanently.

Fix this by:
1. Intercepting SPEED_UNKNOWN/DUPLEX_UNKNOWN in
   hclge_set_phy_link_ksettings() and hclge_cfg_mac_speed_dup_h() to
   prevent it from corrupting the driver's cached valid configuration.
2. Saving req_autoneg in hclge_set_autoneg().
3. Aligning the state judgment in hclge_set_autoneg_speed_dup() to use
   req_autoneg instead of the un-synchronized runtime mac.autoneg,
   ensuring both MAC and PHY consistently enter the autoneg branch to
   eliminate configuration discrepancies during reset recovery.
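
The first and third points can be sketched in userspace C as follows.
This is an assumption-laden illustration, not the driver code: the
demo_* names are hypothetical, and the constants only mirror the
0xffffffff speed value quoted above plus an assumed 0xff "unknown"
duplex value.

```c
#include <stdint.h>

#define DEMO_SPEED_UNKNOWN	0xffffffffu
#define DEMO_DUPLEX_UNKNOWN	0xffu	/* assumed value for this sketch */

/* Hypothetical subset of the cached link configuration. */
struct demo_link_cfg {
	uint32_t req_speed;
	uint8_t req_duplex;
	uint8_t req_autoneg;	/* what the user last asked for */
	uint8_t autoneg;	/* runtime state, may lag behind */
};

/* Cache a user request; "unknown" values never overwrite valid ones. */
static void demo_cache_request(struct demo_link_cfg *cfg, uint8_t autoneg,
			       uint32_t speed, uint8_t duplex)
{
	cfg->req_autoneg = autoneg;
	if (speed != DEMO_SPEED_UNKNOWN)
		cfg->req_speed = speed;
	if (duplex != DEMO_DUPLEX_UNKNOWN)
		cfg->req_duplex = duplex;
}

/* Reset recovery decides on the requested state, not the runtime one. */
static int demo_restore_uses_autoneg(const struct demo_link_cfg *cfg)
{
	return cfg->req_autoneg != 0;
}
```

Replaying steps 1 to 4 against this sketch, the cached speed stays at 10
after autoneg is re-enabled, and recovery takes the autoneg branch even
though the runtime autoneg field is still 0.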

Fixes: 05eb60e9648c ("net: hns3: using user configure after hardware reset")
Signed-off-by: Shuaisong Yang 
Signed-off-by: Jijie Shao 
Link: https://patch.msgid.link/20260624141319.271439-4-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski

net: hns3: refactor MAC autoneg and speed configuration

2026-06-25T16:15:44+00:00

Extract the MAC autoneg and speed/duplex/lane configuration logic out
of hclge_mac_init() and encapsulate it into a new dedicated helper
function hclge_set_autoneg_speed_dup().

In the init path (hclge_init_ae_dev), this helper is now called after
hclge_update_port_info() so that firmware-reported autoneg values are
already populated before applying the link configuration.

Introduce a separate req_lane_num field in struct hclge_mac to isolate
the user-requested lane count from mac.lane_num, which firmware may
overwrite via hclge_get_sfp_info() with stale values from a prior link
lifecycle (e.g., lane_num=4 from 100G). During probe, req_lane_num is
initialized to 0, which instructs firmware to auto-select the correct
lane count for the current speed, rather than reusing the firmware-
reported mac.lane_num that may be inconsistent with the target speed.
This prevents probe failures from mismatched (speed, lane_num) pairs.

In the reset path (hclge_reset_ae_dev), it runs immediately after
hclge_mac_init(), using the previously cached req_* values to restore
the link without re-querying firmware.

Signed-off-by: Shuaisong Yang 
Signed-off-by: Jijie Shao 
Link: https://patch.msgid.link/20260624141319.271439-3-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski

net: hns3: unify copper port ksettings configuration path

2026-06-25T16:15:44+00:00

Refactor hns3_set_link_ksettings() and hclge_set_phy_link_ksettings()
to unify the configuration path for copper ports.

Previously, netdevs with a native kernel PHY attached bypassed the main
MAC parameter caching logic and returned early via
phy_ethtool_ksettings_set(). This prevented the driver from updating the
hdev->hw.mac.req_* variables for kernel PHY setups, leaving them
out of sync during reset recovery.

Clean this up by routing all copper port configurations through
ops->set_phy_link_ksettings(), and perform driver-level or kernel-level
PHY arbitration inside hclge_set_phy_link_ksettings() via
hnae3_dev_phy_imp_supported(). This ensures that the user's intended link
profiles (req_speed, req_duplex, req_autoneg) are uniformly recorded
across all copper and fiber deployment topologies, laying the groundwork
for stable reset recovery.

For copper ports where neither IMP firmware nor a kernel PHY is available
(e.g. PHY_INEXISTENT), hclge_set_phy_link_ksettings() returns -ENODEV.
In hns3_set_link_ksettings(), this is caught so the configuration falls
through to the existing MAC-level path (check_ksettings_param ->
cfg_mac_speed_dup_h), preserving compatibility with PHY-less copper
deployments.
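
The fall-through on -ENODEV can be pictured with a small userspace
sketch. The demo_* functions and the PHY enum are hypothetical; the real
handlers take ksettings arguments and perform more checks.

```c
#include <errno.h>

enum demo_phy_kind { DEMO_PHY_IMP, DEMO_PHY_KERNEL, DEMO_PHY_NONE };

/* PHY-level path: succeeds if some PHY owner exists, else -ENODEV. */
static int demo_set_phy_link(enum demo_phy_kind kind)
{
	switch (kind) {
	case DEMO_PHY_IMP:
	case DEMO_PHY_KERNEL:
		return 0;
	default:
		return -ENODEV;
	}
}

/* MAC-level path; records that it ran so the caller can observe it. */
static int demo_set_mac_link(int *mac_path_used)
{
	*mac_path_used = 1;
	return 0;
}

/* Try the PHY path first; only -ENODEV falls through to the MAC path. */
static int demo_set_link_ksettings(enum demo_phy_kind kind, int *mac_path_used)
{
	int ret = demo_set_phy_link(kind);

	if (ret != -ENODEV)
		return ret;
	return demo_set_mac_link(mac_path_used);
}
```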

Signed-off-by: Shuaisong Yang 
Signed-off-by: Jijie Shao 
Link: https://patch.msgid.link/20260624141319.271439-2-shaojijie@huawei.com
Signed-off-by: Jakub Kicinski

net: mana: Optimize irq affinity for low vcpu configs

2026-06-25T16:10:35+00:00

Before commit 755391121038 ("net: mana: Allocate MSI-X vectors
dynamically"), all the MANA IRQs were assigned statically and together
during early driver load.

Since that commit, IRQ allocation for MANA is done in two phases: the
HWC IRQ is allocated early, and the queue IRQs are added dynamically at
a later point. By then, the IRQ weights on the vCPUs can have become
imbalanced, and if the IRQ count is greater than the vCPU count, the
topology-aware IRQ distribution logic in MANA can place multiple MANA
IRQs on the same vCPU while sibling vCPUs have none (case 1).

On SMP-enabled, low-vCPU systems this is a bigger problem, because the
softirq handling overhead of two IRQs sharing a vCPU is much higher than
when they are spread across sibling vCPUs.

In such cases, throughput drops significantly when many parallel TCP
connections are tested.

Fix the affinity assignment logic, in cases where the IRQ count is greater
than the vCPU count and when IRQs are added dynamically, by utilizing all
the vCPUs irrespective of their NUMA/core bindings (case 2).
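
One way to picture "utilizing all the vCPUs" is a plain round-robin over
the online vCPUs. The helper below is a hypothetical userspace sketch
under that assumption; it does not reproduce the driver's affinity code
and ignores the HWC IRQ.

```c
/*
 * Hypothetical helper: assign queue IRQ i to vCPU (i % nr_cpus), so that
 * every vCPU receives one IRQ before any vCPU receives a second one.
 * NUMA node and core sibling relationships are deliberately ignored.
 * Returns 0 on success, -1 on invalid arguments.
 */
static int demo_spread_queue_irqs(int *irq_cpu, int nr_irqs, int nr_cpus)
{
	int i;

	if (!irq_cpu || nr_irqs < 0 || nr_cpus <= 0)
		return -1;

	for (i = 0; i < nr_irqs; i++)
		irq_cpu[i] = i % nr_cpus;

	return 0;
}
```

With 4 queue IRQs and 4 vCPUs this yields vCPUs 0, 1, 2, 3, which is the
queue placement listed under case 2.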

We also studied setting the affinity and hint to NULL. With that
approach, if IRQs other than MANA's are already allocated on the VM when
the MANA IRQs are allocated, the MANA queue IRQs cluster again (case 3).

=======================================================
Case 1: without this patch
=======================================================
4 vCPUs (2 cores), 5 MANA IRQs (1 HWC + 4 queue)

        TYPE            effective vCPU aff
=======================================================
IRQ0:   HWC             0
IRQ1:   mana_q1         0
IRQ2:   mana_q2         2
IRQ3:   mana_q3         0
IRQ4:   mana_q4         3

%soft on each vCPU (mpstat -P ALL 1) on receiver
vCPU            0       1       2       3
=======================================================
pass 1:         38.85   0.03    24.89   24.65
pass 2:         39.15   0.03    24.57   25.28
pass 3:         40.36   0.03    23.20   23.17

=======================================================
Case 2: with this patch
=======================================================
4 vCPUs (2 cores), 5 MANA IRQs (1 HWC + 4 queue)

        TYPE            effective vCPU aff
=======================================================
IRQ0:   HWC             0
IRQ1:   mana_q1         0
IRQ2:   mana_q2         1
IRQ3:   mana_q3         2
IRQ4:   mana_q4         3

%soft on each vCPU (mpstat -P ALL 1) on receiver
vCPU            0       1       2       3
=======================================================
pass 1:         15.42   15.85   14.99   14.51
pass 2:         15.53   15.94   15.81   15.93
pass 3:         16.41   16.35   16.40   16.36

=======================================================
Case 3: with affinity set to NULL
=======================================================
4 vCPUs (2 cores), 5 MANA IRQs (1 HWC + 4 queue)

        TYPE            effective vCPU aff
=======================================================
IRQ0:   HWC             0
IRQ1:   mana_q1         2
IRQ2:   mana_q2         3
IRQ3:   mana_q3         2
IRQ4:   mana_q4         3

=======================================================
Throughput impact (in Gbps, same env)
=======================================================
TCP conn        with patch      w/o patch       aff NULL
20480           15.65           7.73            5.25
10240           15.63           8.93            5.77
8192            15.64           9.69            7.16
6144            15.64           13.16           9.33
4096            15.69           15.75           13.50
2048            15.69           15.83           13.61
1024            15.71           15.28           13.60

Fixes: 755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
Cc: stable@vger.kernel.org
Co-developed-by: Erni Sri Satya Vennela 
Signed-off-by: Erni Sri Satya Vennela 
Signed-off-by: Shradha Gupta 
Reviewed-by: Haiyang Zhang 
Reviewed-by: Simon Horman 
Reviewed-by: Yury Norov 
Link: https://patch.msgid.link/20260624072138.1632849-1-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski

net: sparx5: unregister blocking notifier on init failure

2026-06-25T15:54:04+00:00

sparx5_register_notifier_blocks() registers the switchdev blocking
notifier before allocating the ordered workqueue. If the workqueue
allocation fails, the error path unregisters the switchdev and netdevice
notifiers, but leaves the blocking notifier registered.

Add a separate error label for the workqueue allocation failure path and
unregister the switchdev blocking notifier there.
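
The shape of the fix is the usual goto-based unwind, with one label per
acquired resource. The userspace sketch below assumes a registration
order of netdevice notifier, switchdev notifier, switchdev blocking
notifier, workqueue; all demo_* names and the fail_step parameter are
hypothetical.

```c
#include <stdbool.h>

/* Tracks which hypothetical resources are currently registered. */
struct demo_notifiers {
	bool netdev_nb;
	bool switchdev_nb;
	bool switchdev_blocking_nb;
	bool owq;
};

/* Register one resource, or fail on request to exercise the unwind. */
static int demo_register(bool *slot, bool fail)
{
	if (fail)
		return -1;
	*slot = true;
	return 0;
}

/* fail_step selects which step (1..4) fails; 0 means none. */
static int demo_register_notifier_blocks(struct demo_notifiers *s, int fail_step)
{
	int err;

	err = demo_register(&s->netdev_nb, fail_step == 1);
	if (err)
		return err;

	err = demo_register(&s->switchdev_nb, fail_step == 2);
	if (err)
		goto err_netdev_nb;

	err = demo_register(&s->switchdev_blocking_nb, fail_step == 3);
	if (err)
		goto err_switchdev_nb;

	err = demo_register(&s->owq, fail_step == 4);
	if (err)
		goto err_switchdev_blocking_nb;	/* the label the fix adds */

	return 0;

err_switchdev_blocking_nb:
	s->switchdev_blocking_nb = false;
err_switchdev_nb:
	s->switchdev_nb = false;
err_netdev_nb:
	s->netdev_nb = false;
	return err;
}
```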

Fixes: d6fce5141929 ("net: sparx5: add switching support")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li 
Reviewed-by: Simon Horman 
Link: https://patch.msgid.link/20260623115714.2192074-1-haoxiang_li2024@163.com
Signed-off-by: Jakub Kicinski

octeontx2-af: Free BPID bitmap on setup failure

2026-06-25T15:47:59+00:00

nix_setup_bpids() allocates bp->bpids with rvu_alloc_bitmap(), which uses
a plain kcalloc(). If any of the following devm_kcalloc() allocations for
the BPID mapping arrays fails, the function returns without freeing the
bitmap. Free the BPID bitmap before returning from those error paths.
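
The leak comes from mixing a plain allocation with device-managed ones.
The userspace sketch below models that with a toy "managed" allocator
that is released at device teardown; every demo_* name, the array names
and the sizes are hypothetical, and nothing here is the driver's code.

```c
#include <stdlib.h>

#define DEMO_MAX_MANAGED 8

/* Toy stand-in for device-managed memory: freed together at teardown. */
struct demo_device {
	void *managed[DEMO_MAX_MANAGED];
	int nr_managed;
};

static void *demo_managed_calloc(struct demo_device *dev, size_t n,
				 size_t size, int fail)
{
	void *p;

	if (fail || dev->nr_managed == DEMO_MAX_MANAGED)
		return NULL;
	p = calloc(n, size);
	if (p)
		dev->managed[dev->nr_managed++] = p;
	return p;
}

static void demo_device_release(struct demo_device *dev)
{
	while (dev->nr_managed > 0)
		free(dev->managed[--dev->nr_managed]);
}

struct demo_bp {
	unsigned long *bpids;	/* plain allocation: freed by hand */
	int *fn_map;		/* managed */
	int *ref_cnt;		/* managed */
};

/* fail_step selects which managed allocation (1 or 2) fails; 0 = none. */
static int demo_setup_bpids(struct demo_device *dev, struct demo_bp *bp,
			    int fail_step)
{
	bp->bpids = calloc(4, sizeof(*bp->bpids));
	if (!bp->bpids)
		return -1;

	bp->fn_map = demo_managed_calloc(dev, 64, sizeof(*bp->fn_map),
					 fail_step == 1);
	if (!bp->fn_map)
		goto free_bpids;

	bp->ref_cnt = demo_managed_calloc(dev, 64, sizeof(*bp->ref_cnt),
					  fail_step == 2);
	if (!bp->ref_cnt)
		goto free_bpids;

	return 0;

free_bpids:
	/* Managed arrays go away with the device; the bitmap does not. */
	free(bp->bpids);
	bp->bpids = NULL;
	return -1;
}
```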

Fixes: d6212d2e41a0 ("octeontx2-af: Create BPIDs free pool")
Cc: stable@vger.kernel.org
Signed-off-by: Haoxiang Li 
Reviewed-by: Simon Horman 
Link: https://patch.msgid.link/20260623114316.2182271-1-haoxiang_li2024@163.com
Signed-off-by: Jakub Kicinski

net: enetc: fix potential divide-by-zero when num_vsi is zero

2026-06-25T15:40:08+00:00

On the i.MX94 series, none of the standalone ENETCs support SR-IOV, so
pf->caps.num_vsi is zero. This leads to a divide-by-zero in
enetc4_default_rings_allocation() when distributing rings among the PF
and VFs.

Division by zero is undefined behavior in C. On ARM64, the UDIV/SDIV
instructions silently return zero rather than raising an exception, so
the issue does not cause a visible crash. However, relying on this
behavior is incorrect and poses a cross-platform compatibility risk.

Add an explicit check for num_vsi == 0 and return early after the PF's
rings have been configured.
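
A minimal sketch of the guard, assuming a simple "PF first, remainder
split evenly among VFs" policy. The function, struct and parameter names
are hypothetical and the real ring accounting is more involved.

```c
/* Hypothetical result of a default ring split. */
struct demo_ring_split {
	int pf_rings;		/* rings given to the PF */
	int rings_per_vf;	/* rings given to each VF, 0 if no VFs */
};

/*
 * Give the PF up to pf_wanted rings, then divide what is left among
 * num_vsi VFs. Inputs are assumed to be non-negative. With num_vsi == 0
 * the function returns before the division is reached.
 */
static struct demo_ring_split demo_default_rings(int total_rings,
						  int pf_wanted, int num_vsi)
{
	struct demo_ring_split split = { 0, 0 };

	split.pf_rings = total_rings < pf_wanted ? total_rings : pf_wanted;

	if (num_vsi == 0)
		return split;

	split.rings_per_vf = (total_rings - split.pf_rings) / num_vsi;
	return split;
}
```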

Fixes: 2d673b0e2f8d ("net: enetc: add standalone ENETC support for i.MX94")
Signed-off-by: Wei Fang 
Reviewed-by: Maxime Chevallier 
Link: https://patch.msgid.link/20260624072726.1238903-1-wei.fang@oss.nxp.com
Signed-off-by: Jakub Kicinski