<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/net/bonding, branch linux-6.16.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>bonding: don't set oif to bond dev when getting NS target destination</title>
<updated>2025-09-25T09:16:44+00:00</updated>
<author>
<name>Hangbin Liu</name>
<email>liuhangbin@gmail.com</email>
</author>
<published>2025-09-16T08:01:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e2019c7d3a3e31833bf4c8d7d3258a54e99653b8'/>
<id>e2019c7d3a3e31833bf4c8d7d3258a54e99653b8</id>
<content type='text'>
[ Upstream commit a8ba87f04ca9cdec06776ce92dce1395026dc3bb ]

Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.

Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.

Reported-by: David Wilder &lt;wilder@us.ibm.com&gt;
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh &lt;jv@jvosburgh.net&gt;
Tested-by: David Wilder &lt;wilder@us.ibm.com&gt;
Acked-by: Jay Vosburgh &lt;jv@jvosburgh.net&gt;
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Link: https://patch.msgid.link/20250916080127.430626-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit a8ba87f04ca9cdec06776ce92dce1395026dc3bb ]

Unlike IPv4, IPv6 routing strictly requires the source address to be valid
on the outgoing interface. If the NS target is set to a remote VLAN interface,
and the source address is also configured on a VLAN over a bond interface,
setting the oif to the bond device will fail to retrieve the correct
destination route.

Fix this by not setting the oif to the bond device when retrieving the NS
target destination. This allows the correct destination device (the VLAN
interface) to be determined, so that bond_verify_device_path can return the
proper VLAN tags for sending NS messages.

Reported-by: David Wilder &lt;wilder@us.ibm.com&gt;
Closes: https://lore.kernel.org/netdev/aGOKggdfjv0cApTO@fedora/
Suggested-by: Jay Vosburgh &lt;jv@jvosburgh.net&gt;
Tested-by: David Wilder &lt;wilder@us.ibm.com&gt;
Acked-by: Jay Vosburgh &lt;jv@jvosburgh.net&gt;
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Link: https://patch.msgid.link/20250916080127.430626-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: set random address only when slaves already exist</title>
<updated>2025-09-25T09:16:43+00:00</updated>
<author>
<name>Hangbin Liu</name>
<email>liuhangbin@gmail.com</email>
</author>
<published>2025-09-10T02:43:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f3439ed58636f822885d0f0647070a6da0b8e855'/>
<id>f3439ed58636f822885d0f0647070a6da0b8e855</id>
<content type='text'>
[ Upstream commit 35ae4e86292ef7dfe4edbb9942955c884e984352 ]

After commit 5c3bf6cba791 ("bonding: assign random address if device
address is same as bond"), bonding will erroneously randomize the MAC
address of the first interface added to the bond if fail_over_mac =
follow.

Correct this by additionally testing for the bond being empty before
randomizing the MAC.

Fixes: 5c3bf6cba791 ("bonding: assign random address if device address is same as bond")
Reported-by: Qiuling Ren &lt;qren@redhat.com&gt;
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Link: https://patch.msgid.link/20250910024336.400253-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 35ae4e86292ef7dfe4edbb9942955c884e984352 ]

After commit 5c3bf6cba791 ("bonding: assign random address if device
address is same as bond"), bonding will erroneously randomize the MAC
address of the first interface added to the bond if fail_over_mac =
follow.

Correct this by additionally testing for the bond being empty before
randomizing the MAC.

Fixes: 5c3bf6cba791 ("bonding: assign random address if device address is same as bond")
Reported-by: Qiuling Ren &lt;qren@redhat.com&gt;
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Link: https://patch.msgid.link/20250910024336.400253-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: send LACPDUs periodically in passive mode after receiving partner's LACPDU</title>
<updated>2025-08-28T14:34:49+00:00</updated>
<author>
<name>Hangbin Liu</name>
<email>liuhangbin@gmail.com</email>
</author>
<published>2025-08-15T06:19:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4f58c0bfd264a2768d3b5f00b52d37151e4085f2'/>
<id>4f58c0bfd264a2768d3b5f00b52d37151e4085f2</id>
<content type='text'>
[ Upstream commit 0599640a21e98f0d6a3e9ff85c0a687c90a8103b ]

When `lacp_active` is set to `off`, the bond operates in passive mode, meaning
it only "speaks when spoken to." However, the current kernel implementation
only sends an LACPDU in response when the partner's state changes.

As a result, once LACP negotiation succeeds, the actor stops sending LACPDUs
until the partner times out and sends an "expired" LACPDU. This causes
continuous LACP state flapping.

According to IEEE 802.1AX-2014, 6.4.13 Periodic Transmission machine. The
values of Partner_Oper_Port_State.LACP_Activity and
Actor_Oper_Port_State.LACP_Activity determine whether periodic transmissions
take place. If either or both parameters are set to Active LACP, then periodic
transmissions occur; if both are set to Passive LACP, then periodic
transmissions do not occur.

To comply with this, we remove the `!bond-&gt;params.lacp_active` check in
`ad_periodic_machine()`. Instead, we initialize the actor's port's
`LACP_STATE_LACP_ACTIVITY` state based on `lacp_active` setting.

Additionally, we avoid setting the partner's state to
`LACP_STATE_LACP_ACTIVITY` in the EXPIRED state, since we should not assume
the partner is active by default.

This ensures that in passive mode, the bond starts sending periodic LACPDUs
after receiving one from the partner, and avoids flapping due to inactivity.

Fixes: 3a755cd8b7c6 ("bonding: add new option lacp_active")
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Link: https://patch.msgid.link/20250815062000.22220-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 0599640a21e98f0d6a3e9ff85c0a687c90a8103b ]

When `lacp_active` is set to `off`, the bond operates in passive mode, meaning
it only "speaks when spoken to." However, the current kernel implementation
only sends an LACPDU in response when the partner's state changes.

As a result, once LACP negotiation succeeds, the actor stops sending LACPDUs
until the partner times out and sends an "expired" LACPDU. This causes
continuous LACP state flapping.

According to IEEE 802.1AX-2014, 6.4.13 Periodic Transmission machine. The
values of Partner_Oper_Port_State.LACP_Activity and
Actor_Oper_Port_State.LACP_Activity determine whether periodic transmissions
take place. If either or both parameters are set to Active LACP, then periodic
transmissions occur; if both are set to Passive LACP, then periodic
transmissions do not occur.

To comply with this, we remove the `!bond-&gt;params.lacp_active` check in
`ad_periodic_machine()`. Instead, we initialize the actor's port's
`LACP_STATE_LACP_ACTIVITY` state based on `lacp_active` setting.

Additionally, we avoid setting the partner's state to
`LACP_STATE_LACP_ACTIVITY` in the EXPIRED state, since we should not assume
the partner is active by default.

This ensures that in passive mode, the bond starts sending periodic LACPDUs
after receiving one from the partner, and avoids flapping due to inactivity.

Fixes: 3a755cd8b7c6 ("bonding: add new option lacp_active")
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Link: https://patch.msgid.link/20250815062000.22220-3-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: update LACP activity flag after setting lacp_active</title>
<updated>2025-08-28T14:34:49+00:00</updated>
<author>
<name>Hangbin Liu</name>
<email>liuhangbin@gmail.com</email>
</author>
<published>2025-08-15T06:19:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3310f0ebba5ebcc9b971f939155f2cd08c013a6a'/>
<id>3310f0ebba5ebcc9b971f939155f2cd08c013a6a</id>
<content type='text'>
[ Upstream commit b64d035f77b1f02ab449393342264b44950a75ae ]

The port's actor_oper_port_state activity flag should be updated immediately
after changing the lacp_active option to reflect the current mode correctly.

Fixes: 3a755cd8b7c6 ("bonding: add new option lacp_active")
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Link: https://patch.msgid.link/20250815062000.22220-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit b64d035f77b1f02ab449393342264b44950a75ae ]

The port's actor_oper_port_state activity flag should be updated immediately
after changing the lacp_active option to reflect the current mode correctly.

Fixes: 3a755cd8b7c6 ("bonding: add new option lacp_active")
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Link: https://patch.msgid.link/20250815062000.22220-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: core: Convert dev_set_mac_address() to struct sockaddr_storage</title>
<updated>2025-05-27T06:25:43+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2025-05-21T20:46:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9ca6804ab7c34f65fcf2e29333a39e7807c30b60'/>
<id>9ca6804ab7c34f65fcf2e29333a39e7807c30b60</id>
<content type='text'>
All users of dev_set_mac_address() are now using a struct sockaddr_storage.
Convert the internal data type to struct sockaddr_storage, drop the casts,
and update pointer types.

Acked-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
Link: https://patch.msgid.link/20250521204619.2301870-6-kees@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All users of dev_set_mac_address() are now using a struct sockaddr_storage.
Convert the internal data type to struct sockaddr_storage, drop the casts,
and update pointer types.

Acked-by: Gustavo A. R. Silva &lt;gustavoars@kernel.org&gt;
Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
Link: https://patch.msgid.link/20250521204619.2301870-6-kees@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next</title>
<updated>2025-05-26T16:32:48+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2025-05-26T16:30:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fdb061195f53e5b6d12595fc32a1a9c1130f0c23'/>
<id>fdb061195f53e5b6d12595fc32a1a9c1130f0c23</id>
<content type='text'>
Steffen Klassert says:

====================
1) Remove some unnecessary strscpy_pad() size arguments.
   From Thorsten Blum.

2) Correct use of xso.real_dev on bonding offloads.
   Patchset from Cosmin Ratiu.

3) Add hardware offload configuration to XFRM_MSG_MIGRATE.
   From Chiachang Wang.

4) Refactor migration setup during cloning. This was
   done after the clone was created. Now it is done
   in the cloning function itself.
   From Chiachang Wang.

5) Validate assignment of maximal possible SEQ number.
   Prevent from setting to the maximum sequrnce number
   as this would cause for traffic drop.
   From Leon Romanovsky.

6) Prevent configuration of interface index when offload
   is used. Hardware can't handle this case.i
   From Leon Romanovsky.

7) Always use kfree_sensitive() for SA secret zeroization.
   From Zilin Guan.

ipsec-next-2025-05-23

* tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: use kfree_sensitive() for SA secret zeroization
  xfrm: prevent configuration of interface index when offload is used
  xfrm: validate assignment of maximal possible SEQ number
  xfrm: Refactor migration setup during the cloning process
  xfrm: Migrate offload configuration
  bonding: Fix multiple long standing offload races
  bonding: Mark active offloaded xfrm_states
  xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
  xfrm: Remove unneeded device check from validate_xmit_xfrm
  xfrm: Use xdo.dev instead of xdo.real_dev
  net/mlx5: Avoid using xso.real_dev unnecessarily
  xfrm: Remove unnecessary strscpy_pad() size arguments
====================

Link: https://patch.msgid.link/20250523075611.3723340-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Steffen Klassert says:

====================
1) Remove some unnecessary strscpy_pad() size arguments.
   From Thorsten Blum.

2) Correct use of xso.real_dev on bonding offloads.
   Patchset from Cosmin Ratiu.

3) Add hardware offload configuration to XFRM_MSG_MIGRATE.
   From Chiachang Wang.

4) Refactor migration setup during cloning. This was
   done after the clone was created. Now it is done
   in the cloning function itself.
   From Chiachang Wang.

5) Validate assignment of maximal possible SEQ number.
   Prevent from setting to the maximum sequrnce number
   as this would cause for traffic drop.
   From Leon Romanovsky.

6) Prevent configuration of interface index when offload
   is used. Hardware can't handle this case.i
   From Leon Romanovsky.

7) Always use kfree_sensitive() for SA secret zeroization.
   From Zilin Guan.

ipsec-next-2025-05-23

* tag 'ipsec-next-2025-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
  xfrm: use kfree_sensitive() for SA secret zeroization
  xfrm: prevent configuration of interface index when offload is used
  xfrm: validate assignment of maximal possible SEQ number
  xfrm: Refactor migration setup during the cloning process
  xfrm: Migrate offload configuration
  bonding: Fix multiple long standing offload races
  bonding: Mark active offloaded xfrm_states
  xfrm: Add explicit dev to .xdo_dev_state_{add,delete,free}
  xfrm: Remove unneeded device check from validate_xmit_xfrm
  xfrm: Use xdo.dev instead of xdo.real_dev
  net/mlx5: Avoid using xso.real_dev unnecessarily
  xfrm: Remove unnecessary strscpy_pad() size arguments
====================

Link: https://patch.msgid.link/20250523075611.3723340-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: assign random address if device address is same as bond</title>
<updated>2025-04-28T11:37:54+00:00</updated>
<author>
<name>Hangbin Liu</name>
<email>liuhangbin@gmail.com</email>
</author>
<published>2025-04-24T04:22:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5c3bf6cba7911f470afd748606be5c03a9512fcc'/>
<id>5c3bf6cba7911f470afd748606be5c03a9512fcc</id>
<content type='text'>
This change addresses a MAC address conflict issue in failover scenarios,
similar to the problem described in commit a951bc1e6ba5 ("bonding: correct
the MAC address for 'follow' fail_over_mac policy").

In fail_over_mac=follow mode, the bonding driver expects the formerly active
slave to swap MAC addresses with the newly active slave during failover.
However, under certain conditions, two slaves may end up with the same MAC
address, which breaks this policy:

1) ip link set eth0 master bond0
   -&gt; bond0 adopts eth0's MAC address (MAC0).

2) ip link set eth1 master bond0
   -&gt; eth1 is added as a backup with its own MAC (MAC1).

3) ip link set eth0 nomaster
   -&gt; eth0 is released and restores its MAC (MAC0).
   -&gt; eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.

4) ip link set eth0 master bond0
   -&gt; eth0 is re-added to bond0, now both eth0 and eth1 have MAC0.

This results in a MAC address conflict and violates the expected behavior
of the failover policy.

To fix this, we assign a random MAC address to any newly added slave if
its current MAC address matches that of the bond. The original (permanent)
MAC address is saved and will be restored when the device is released
from the bond.

This ensures that each slave has a unique MAC address during failover
transitions, preserving the integrity of the fail_over_mac=follow policy.

Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Acked-by: Jay Vosburgh &lt;jv@jvosburgh.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change addresses a MAC address conflict issue in failover scenarios,
similar to the problem described in commit a951bc1e6ba5 ("bonding: correct
the MAC address for 'follow' fail_over_mac policy").

In fail_over_mac=follow mode, the bonding driver expects the formerly active
slave to swap MAC addresses with the newly active slave during failover.
However, under certain conditions, two slaves may end up with the same MAC
address, which breaks this policy:

1) ip link set eth0 master bond0
   -&gt; bond0 adopts eth0's MAC address (MAC0).

2) ip link set eth1 master bond0
   -&gt; eth1 is added as a backup with its own MAC (MAC1).

3) ip link set eth0 nomaster
   -&gt; eth0 is released and restores its MAC (MAC0).
   -&gt; eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.

4) ip link set eth0 master bond0
   -&gt; eth0 is re-added to bond0, now both eth0 and eth1 have MAC0.

This results in a MAC address conflict and violates the expected behavior
of the failover policy.

To fix this, we assign a random MAC address to any newly added slave if
its current MAC address matches that of the bond. The original (permanent)
MAC address is saved and will be restored when the device is released
from the bond.

This ensures that each slave has a unique MAC address during failover
transitions, preserving the integrity of the fail_over_mac=follow policy.

Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
Signed-off-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Acked-by: Jay Vosburgh &lt;jv@jvosburgh.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2025-04-17T19:26:50+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-04-17T19:23:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=240ce924d2718b8f6f622f2a9a9c219b9da736e8'/>
<id>240ce924d2718b8f6f622f2a9a9c219b9da736e8</id>
<content type='text'>
Cross-merge networking fixes after downstream PR (net-6.15-rc3).

No conflicts. Adjacent changes:

tools/net/ynl/pyynl/ynl_gen_c.py
  4d07bbf2d456 ("tools: ynl-gen: don't declare loop iterator in place")
  7e8ba0c7de2b ("tools: ynl: don't use genlmsghdr in classic netlink")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cross-merge networking fixes after downstream PR (net-6.15-rc3).

No conflicts. Adjacent changes:

tools/net/ynl/pyynl/ynl_gen_c.py
  4d07bbf2d456 ("tools: ynl-gen: don't declare loop iterator in place")
  7e8ba0c7de2b ("tools: ynl: don't use genlmsghdr in classic netlink")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: Fix multiple long standing offload races</title>
<updated>2025-04-16T09:02:49+00:00</updated>
<author>
<name>Cosmin Ratiu</name>
<email>cratiu@nvidia.com</email>
</author>
<published>2025-04-11T07:49:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d2fddbd3479928e52061e1c8dd302006b6283ce8'/>
<id>d2fddbd3479928e52061e1c8dd302006b6283ce8</id>
<content type='text'>
Refactor the bonding ipsec offload operations to fix a number of
long-standing control plane races between state migration and user
deletion and a few other issues.

xfrm state deletion can happen concurrently with
bond_change_active_slave() operation. This manifests itself as a
bond_ipsec_del_sa() call with x-&gt;lock held, followed by a
bond_ipsec_free_sa() a bit later from a wq. The alternate path of
these calls coming from xfrm_dev_state_flush() can't happen, as that
needs the RTNL lock and bond_change_active_slave() already holds it.

1. bond_ipsec_del_sa_all() might call xdo_dev_state_delete() a second
   time on an xfrm state that was concurrently killed. This is bad.
2. bond_ipsec_add_sa_all() can add a state on the new device, but
   pending bond_ipsec_free_sa() calls from the old device will then hit
   the WARN_ON() and then, worse, call xdo_dev_state_free() on the new
   device without a corresponding xdo_dev_state_delete().
3. Resolve a sleeping in atomic context introduced by the mentioned
   "Fixes" commit.

bond_ipsec_del_sa_all() and bond_ipsec_add_sa_all() now acquire x-&gt;lock
and check for x-&gt;km.state to help with problems 1 and 2. And since
xso.real_dev is now a private pointer managed by the bonding driver in
xfrm state, make better use of it to fully fix problems 1 and 2. In
bond_ipsec_del_sa_all(), set xso.real_dev to NULL while holding both the
mutex and x-&gt;lock, which makes sure that neither bond_ipsec_del_sa() nor
bond_ipsec_free_sa() could run concurrently.

Fix problem 3 by moving the list cleanup (which requires the mutex) from
bond_ipsec_del_sa() (called from atomic context) to bond_ipsec_free_sa()

Finally, simplify bond_ipsec_del_sa() and bond_ipsec_free_sa() by using
xso-&gt;real_dev directly, since it's now protected by locks and can be
trusted to always reflect the offload device.

Fixes: 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex")
Signed-off-by: Cosmin Ratiu &lt;cratiu@nvidia.com&gt;
Reviewed-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Reviewed-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Tested-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Refactor the bonding ipsec offload operations to fix a number of
long-standing control plane races between state migration and user
deletion and a few other issues.

xfrm state deletion can happen concurrently with
bond_change_active_slave() operation. This manifests itself as a
bond_ipsec_del_sa() call with x-&gt;lock held, followed by a
bond_ipsec_free_sa() a bit later from a wq. The alternate path of
these calls coming from xfrm_dev_state_flush() can't happen, as that
needs the RTNL lock and bond_change_active_slave() already holds it.

1. bond_ipsec_del_sa_all() might call xdo_dev_state_delete() a second
   time on an xfrm state that was concurrently killed. This is bad.
2. bond_ipsec_add_sa_all() can add a state on the new device, but
   pending bond_ipsec_free_sa() calls from the old device will then hit
   the WARN_ON() and then, worse, call xdo_dev_state_free() on the new
   device without a corresponding xdo_dev_state_delete().
3. Resolve a sleeping in atomic context introduced by the mentioned
   "Fixes" commit.

bond_ipsec_del_sa_all() and bond_ipsec_add_sa_all() now acquire x-&gt;lock
and check for x-&gt;km.state to help with problems 1 and 2. And since
xso.real_dev is now a private pointer managed by the bonding driver in
xfrm state, make better use of it to fully fix problems 1 and 2. In
bond_ipsec_del_sa_all(), set xso.real_dev to NULL while holding both the
mutex and x-&gt;lock, which makes sure that neither bond_ipsec_del_sa() nor
bond_ipsec_free_sa() could run concurrently.

Fix problem 3 by moving the list cleanup (which requires the mutex) from
bond_ipsec_del_sa() (called from atomic context) to bond_ipsec_free_sa()

Finally, simplify bond_ipsec_del_sa() and bond_ipsec_free_sa() by using
xso-&gt;real_dev directly, since it's now protected by locks and can be
trusted to always reflect the offload device.

Fixes: 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex")
Signed-off-by: Cosmin Ratiu &lt;cratiu@nvidia.com&gt;
Reviewed-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Reviewed-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Tested-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: Mark active offloaded xfrm_states</title>
<updated>2025-04-16T09:01:58+00:00</updated>
<author>
<name>Cosmin Ratiu</name>
<email>cratiu@nvidia.com</email>
</author>
<published>2025-04-11T07:49:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fd4e41ebf66cb8b43de2f640b97314c4ee3b4499'/>
<id>fd4e41ebf66cb8b43de2f640b97314c4ee3b4499</id>
<content type='text'>
When the active link is changed for a bond device, the existing xfrm
states need to be migrated over to the new link. This is done with:
- bond_ipsec_del_sa_all() goes through the offloaded states list and
  removes all of them from hw.
- bond_ipsec_add_sa_all() re-offloads all states to the new device.

But because the offload status of xfrm states isn't marked in any way,
there can be bugs.

When all bond links are down, bond_ipsec_del_sa_all() unoffloads
everything from the previous active link. If the same link then comes
back up, nothing gets reoffloaded by bond_ipsec_add_sa_all().
This results in a stack trace like this a bit later when user space
removes the offloaded rules, because mlx5e_xfrm_del_state() is asked to
remove a rule that's no longer offloaded:

 [] Call Trace:
 []  &lt;TASK&gt;
 []  ? __warn+0x7d/0x110
 []  ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
 []  ? report_bug+0x16d/0x180
 []  ? handle_bug+0x4f/0x90
 []  ? exc_invalid_op+0x14/0x70
 []  ? asm_exc_invalid_op+0x16/0x20
 []  ? mlx5e_xfrm_del_state+0x73/0xa0 [mlx5_core]
 []  ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
 []  bond_ipsec_del_sa+0x1ab/0x200 [bonding]
 []  xfrm_dev_state_delete+0x1f/0x60
 []  __xfrm_state_delete+0x196/0x200
 []  xfrm_state_delete+0x21/0x40
 []  xfrm_del_sa+0x69/0x110
 []  xfrm_user_rcv_msg+0x11d/0x300
 []  ? release_pages+0xca/0x140
 []  ? copy_to_user_tmpl.part.0+0x110/0x110
 []  netlink_rcv_skb+0x54/0x100
 []  xfrm_netlink_rcv+0x31/0x40
 []  netlink_unicast+0x1fc/0x2d0
 []  netlink_sendmsg+0x1e4/0x410
 []  __sock_sendmsg+0x38/0x60
 []  sock_write_iter+0x94/0xf0
 []  vfs_write+0x338/0x3f0
 []  ksys_write+0xba/0xd0
 []  do_syscall_64+0x4c/0x100
 []  entry_SYSCALL_64_after_hwframe+0x4b/0x53

There's also another theoretical bug:
Calling bond_ipsec_del_sa_all() multiple times can result in corruption
in the driver implementation if the double-free isn't tolerated. This
isn't nice.

Before the "Fixes" commit, xs-&gt;xso.real_dev was set to NULL when an xfrm
state was unoffloaded from a device, but a race with netdevsim's
.xdo_dev_offload_ok() accessing real_dev was considered a sufficient
reason to not set real_dev to NULL anymore. This unfortunately
introduced the new bugs.

Since .xdo_dev_offload_ok() was significantly refactored by [1] and
there are no more users in the stack of xso.real_dev, that
race is now gone and xs-&gt;xso.real_dev can now once again be used to
represent which device (if any) currently holds the offloaded rule.

Go one step further and set real_dev after add/before delete calls, to
catch any future driver misuses of real_dev.

[1] https://lore.kernel.org/netdev/cover.1739972570.git.leon@kernel.org/
Fixes: f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference")
Signed-off-by: Cosmin Ratiu &lt;cratiu@nvidia.com&gt;
Reviewed-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Reviewed-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Tested-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the active link is changed for a bond device, the existing xfrm
states need to be migrated over to the new link. This is done with:
- bond_ipsec_del_sa_all() goes through the offloaded states list and
  removes all of them from hw.
- bond_ipsec_add_sa_all() re-offloads all states to the new device.

But because the offload status of xfrm states isn't marked in any way,
there can be bugs.

When all bond links are down, bond_ipsec_del_sa_all() unoffloads
everything from the previous active link. If the same link then comes
back up, nothing gets reoffloaded by bond_ipsec_add_sa_all().
This results in a stack trace like this a bit later when user space
removes the offloaded rules, because mlx5e_xfrm_del_state() is asked to
remove a rule that's no longer offloaded:

 [] Call Trace:
 []  &lt;TASK&gt;
 []  ? __warn+0x7d/0x110
 []  ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
 []  ? report_bug+0x16d/0x180
 []  ? handle_bug+0x4f/0x90
 []  ? exc_invalid_op+0x14/0x70
 []  ? asm_exc_invalid_op+0x16/0x20
 []  ? mlx5e_xfrm_del_state+0x73/0xa0 [mlx5_core]
 []  ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core]
 []  bond_ipsec_del_sa+0x1ab/0x200 [bonding]
 []  xfrm_dev_state_delete+0x1f/0x60
 []  __xfrm_state_delete+0x196/0x200
 []  xfrm_state_delete+0x21/0x40
 []  xfrm_del_sa+0x69/0x110
 []  xfrm_user_rcv_msg+0x11d/0x300
 []  ? release_pages+0xca/0x140
 []  ? copy_to_user_tmpl.part.0+0x110/0x110
 []  netlink_rcv_skb+0x54/0x100
 []  xfrm_netlink_rcv+0x31/0x40
 []  netlink_unicast+0x1fc/0x2d0
 []  netlink_sendmsg+0x1e4/0x410
 []  __sock_sendmsg+0x38/0x60
 []  sock_write_iter+0x94/0xf0
 []  vfs_write+0x338/0x3f0
 []  ksys_write+0xba/0xd0
 []  do_syscall_64+0x4c/0x100
 []  entry_SYSCALL_64_after_hwframe+0x4b/0x53

There's also another theoretical bug:
Calling bond_ipsec_del_sa_all() multiple times can result in corruption
in the driver implementation if the double-free isn't tolerated. This
isn't nice.

Before the "Fixes" commit, xs-&gt;xso.real_dev was set to NULL when an xfrm
state was unoffloaded from a device, but a race with netdevsim's
.xdo_dev_offload_ok() accessing real_dev was considered a sufficient
reason to not set real_dev to NULL anymore. This unfortunately
introduced the new bugs.

Since .xdo_dev_offload_ok() was significantly refactored by [1] and
there are no more users in the stack of xso.real_dev, that
race is now gone and xs-&gt;xso.real_dev can now once again be used to
represent which device (if any) currently holds the offloaded rule.

Go one step further and set real_dev after add/before delete calls, to
catch any future driver misuses of real_dev.

[1] https://lore.kernel.org/netdev/cover.1739972570.git.leon@kernel.org/
Fixes: f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference")
Signed-off-by: Cosmin Ratiu &lt;cratiu@nvidia.com&gt;
Reviewed-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;razor@blackwall.org&gt;
Reviewed-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Tested-by: Hangbin Liu &lt;liuhangbin@gmail.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
