<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/xfrm, branch v6.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>xfrm_output: Force software GSO only in tunnel mode</title>
<updated>2025-02-21T07:20:06+00:00</updated>
<author>
<name>Cosmin Ratiu</name>
<email>cratiu@nvidia.com</email>
</author>
<published>2025-02-19T10:52:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0aae2867aa6067f73d066bc98385e23c8454a1d7'/>
<id>0aae2867aa6067f73d066bc98385e23c8454a1d7</id>
<content type='text'>
The cited commit fixed a software GSO bug with VXLAN + IPSec in tunnel
mode. Unfortunately, it is slightly broader than necessary, as it also
severely affects performance for Geneve + IPSec transport mode over a
device capable of both HW GSO and IPSec crypto offload. In this case,
xfrm_output unnecessarily triggers software GSO instead of letting the
HW do it. In simple iperf3 tests over Geneve + IPSec transport mode over
a back-2-back pair of NICs with MTU 1500, the performance was observed
to be up to 6x worse when doing software GSO compared to leaving it to
the hardware.

This commit makes xfrm_output only trigger software GSO in crypto
offload cases for already encapsulated packets in tunnel mode, as not
doing so would then cause the inner tunnel skb-&gt;inner_networking_header
to be overwritten and break software GSO for that packet later if the
device turns out to not be capable of HW GSO.

Taking a closer look at the conditions for the original bug, to better
understand the reasons for this change:
- vxlan_build_skb -&gt; iptunnel_handle_offloads sets inner_protocol and
  inner network header.
- then, udp_tunnel_xmit_skb -&gt; ip_tunnel_xmit adds outer transport and
  network headers.
- later in the xmit path, xfrm_output -&gt; xfrm_outer_mode_output -&gt;
  xfrm4_prepare_output -&gt; xfrm4_tunnel_encap_add overwrites the inner
  network header with the one set in ip_tunnel_xmit before adding the
  second outer header.
- __dev_queue_xmit -&gt; validate_xmit_skb checks whether GSO segmentation
  needs to happen based on dev features. In the original bug, the hw
  couldn't segment the packets, so skb_gso_segment was invoked.
- deep in the .gso_segment callback machinery, __skb_udp_tunnel_segment
  tries to use the wrong inner network header, expecting the one set in
  iptunnel_handle_offloads but getting the one set by xfrm instead.
- a bit later, ipv6_gso_segment accesses the wrong memory based on that
  wrong inner network header.

With the new change, the original bug (or similar ones) cannot happen
again, as xfrm will now trigger software GSO before applying a tunnel.
This concern doesn't exist in packet offload mode, when the HW adds
encapsulation headers. For the non-offloaded packets (crypto in SW),
software GSO is still done unconditionally in the else branch.

Reviewed-by: Dragos Tatulea &lt;dtatulea@nvidia.com&gt;
Reviewed-by: Yael Chemla &lt;ychemla@nvidia.com&gt;
Reviewed-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Fixes: a204aef9fd77 ("xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output")
Signed-off-by: Cosmin Ratiu &lt;cratiu@nvidia.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The cited commit fixed a software GSO bug with VXLAN + IPSec in tunnel
mode. Unfortunately, it is slightly broader than necessary, as it also
severely affects performance for Geneve + IPSec transport mode over a
device capable of both HW GSO and IPSec crypto offload. In this case,
xfrm_output unnecessarily triggers software GSO instead of letting the
HW do it. In simple iperf3 tests over Geneve + IPSec transport mode over
a back-2-back pair of NICs with MTU 1500, the performance was observed
to be up to 6x worse when doing software GSO compared to leaving it to
the hardware.

This commit makes xfrm_output only trigger software GSO in crypto
offload cases for already encapsulated packets in tunnel mode, as not
doing so would then cause the inner tunnel skb-&gt;inner_networking_header
to be overwritten and break software GSO for that packet later if the
device turns out to not be capable of HW GSO.

Taking a closer look at the conditions for the original bug, to better
understand the reasons for this change:
- vxlan_build_skb -&gt; iptunnel_handle_offloads sets inner_protocol and
  inner network header.
- then, udp_tunnel_xmit_skb -&gt; ip_tunnel_xmit adds outer transport and
  network headers.
- later in the xmit path, xfrm_output -&gt; xfrm_outer_mode_output -&gt;
  xfrm4_prepare_output -&gt; xfrm4_tunnel_encap_add overwrites the inner
  network header with the one set in ip_tunnel_xmit before adding the
  second outer header.
- __dev_queue_xmit -&gt; validate_xmit_skb checks whether GSO segmentation
  needs to happen based on dev features. In the original bug, the hw
  couldn't segment the packets, so skb_gso_segment was invoked.
- deep in the .gso_segment callback machinery, __skb_udp_tunnel_segment
  tries to use the wrong inner network header, expecting the one set in
  iptunnel_handle_offloads but getting the one set by xfrm instead.
- a bit later, ipv6_gso_segment accesses the wrong memory based on that
  wrong inner network header.

With the new change, the original bug (or similar ones) cannot happen
again, as xfrm will now trigger software GSO before applying a tunnel.
This concern doesn't exist in packet offload mode, when the HW adds
encapsulation headers. For the non-offloaded packets (crypto in SW),
software GSO is still done unconditionally in the else branch.

Reviewed-by: Dragos Tatulea &lt;dtatulea@nvidia.com&gt;
Reviewed-by: Yael Chemla &lt;ychemla@nvidia.com&gt;
Reviewed-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Fixes: a204aef9fd77 ("xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output")
Signed-off-by: Cosmin Ratiu &lt;cratiu@nvidia.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: fix tunnel mode TX datapath in packet offload mode</title>
<updated>2025-02-21T07:13:12+00:00</updated>
<author>
<name>Alexandre Cassen</name>
<email>acassen@corp.free.fr</email>
</author>
<published>2025-02-19T10:20:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5eddd76ec2fd1988f0a3450fde9730b10dd22992'/>
<id>5eddd76ec2fd1988f0a3450fde9730b10dd22992</id>
<content type='text'>
Packets that match the output xfrm policy are delivered to the netstack.
In IPsec packet mode for tunnel mode, the HW is responsible for building
the hard header and outer IP header. In such a situation, the inner
header may refer to a network that is not directly reachable by the host,
resulting in a failed neighbor resolution. The packet is then dropped.
xfrm policy defines the netdevice to use for xmit so we can send packets
directly to it.

Makes direct xmit exclusive to tunnel mode, since some rules may apply
in transport mode.

Fixes: f8a70afafc17 ("xfrm: add TX datapath support for IPsec packet offload mode")
Signed-off-by: Alexandre Cassen &lt;acassen@corp.free.fr&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Packets that match the output xfrm policy are delivered to the netstack.
In IPsec packet mode for tunnel mode, the HW is responsible for building
the hard header and outer IP header. In such a situation, the inner
header may refer to a network that is not directly reachable by the host,
resulting in a failed neighbor resolution. The packet is then dropped.
xfrm policy defines the netdevice to use for xmit so we can send packets
directly to it.

Makes direct xmit exclusive to tunnel mode, since some rules may apply
in transport mode.

Fixes: f8a70afafc17 ("xfrm: add TX datapath support for IPsec packet offload mode")
Signed-off-by: Alexandre Cassen &lt;acassen@corp.free.fr&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'ipsec-2025-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec</title>
<updated>2025-01-27T23:15:12+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-01-27T23:15:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=463ec95a162d5036f81831c2cd8d6d661b1e7b9f'/>
<id>463ec95a162d5036f81831c2cd8d6d661b1e7b9f</id>
<content type='text'>
Steffen Klassert says:

====================
pull request (net): ipsec 2025-01-27

1) Fix incrementing the upper 32 bit sequence numbers for GSO skbs.
   From Jianbo Liu.

2) Fix an out-of-bounds read on xfrm state lookup.
   From Florian Westphal.

3) Fix secpath handling on packet offload mode.
   From Alexandre Cassen.

4) Fix the usage of skb-&gt;sk in the xfrm layer.

5) Don't disable preemption while looking up cache state
   to fix PREEMPT_RT.
   From Sebastian Sewior.

* tag 'ipsec-2025-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: Don't disable preemption while looking up cache state.
  xfrm: Fix the usage of skb-&gt;sk
  xfrm: delete intermediate secpath entry in packet offload mode
  xfrm: state: fix out-of-bounds read during lookup
  xfrm: replay: Fix the update of replay_esn-&gt;oseq_hi for GSO
====================

Link: https://patch.msgid.link/20250127060757.3946314-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Steffen Klassert says:

====================
pull request (net): ipsec 2025-01-27

1) Fix incrementing the upper 32 bit sequence numbers for GSO skbs.
   From Jianbo Liu.

2) Fix an out-of-bounds read on xfrm state lookup.
   From Florian Westphal.

3) Fix secpath handling on packet offload mode.
   From Alexandre Cassen.

4) Fix the usage of skb-&gt;sk in the xfrm layer.

5) Don't disable preemption while looking up cache state
   to fix PREEMPT_RT.
   From Sebastian Sewior.

* tag 'ipsec-2025-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: Don't disable preemption while looking up cache state.
  xfrm: Fix the usage of skb-&gt;sk
  xfrm: delete intermediate secpath entry in packet offload mode
  xfrm: state: fix out-of-bounds read during lookup
  xfrm: replay: Fix the update of replay_esn-&gt;oseq_hi for GSO
====================

Link: https://patch.msgid.link/20250127060757.3946314-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: Don't disable preemption while looking up cache state.</title>
<updated>2025-01-24T06:46:11+00:00</updated>
<author>
<name>Sebastian Sewior</name>
<email>bigeasy@linutronix.de</email>
</author>
<published>2025-01-23T16:20:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6c9b7db96db62ee9ad8d359d90ff468d462518c4'/>
<id>6c9b7db96db62ee9ad8d359d90ff468d462518c4</id>
<content type='text'>
For the state cache lookup xfrm_input_state_lookup() first disables
preemption, to remain on the CPU and then retrieves a per-CPU pointer.
Within the preempt-disable section it also acquires
netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be
acquired with explicit disabled preemption (such as by get_cpu())
because this lock becomes a sleeping lock on PREEMPT_RT.

To remain on the same CPU is just an optimisation for the CPU local
lookup. The actual modification of the per-CPU variable happens with
netns_xfrm::xfrm_state_lock acquired.

Remove get_cpu() and use the state_cache_input on the current CPU.

Reported-by: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/
Fixes: 81a331a0e72dd ("xfrm: Add an inbound percpu state cache.")
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For the state cache lookup xfrm_input_state_lookup() first disables
preemption, to remain on the CPU and then retrieves a per-CPU pointer.
Within the preempt-disable section it also acquires
netns_xfrm::xfrm_state_lock, a spinlock_t. This lock must not be
acquired with explicit disabled preemption (such as by get_cpu())
because this lock becomes a sleeping lock on PREEMPT_RT.

To remain on the same CPU is just an optimisation for the CPU local
lookup. The actual modification of the per-CPU variable happens with
netns_xfrm::xfrm_state_lock acquired.

Remove get_cpu() and use the state_cache_input on the current CPU.

Reported-by: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Closes: https://lore.kernel.org/all/CAADnVQKkCLaj=roayH=Mjiiqz_svdf1tsC3OE4EC0E=mAD+L1A@mail.gmail.com/
Fixes: 81a331a0e72dd ("xfrm: Add an inbound percpu state cache.")
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: Fix the usage of skb-&gt;sk</title>
<updated>2025-01-20T06:06:53+00:00</updated>
<author>
<name>Steffen Klassert</name>
<email>steffen.klassert@secunet.com</email>
</author>
<published>2025-01-16T10:46:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1620c88887b16940e00dbe57dd38c74eda9bad9e'/>
<id>1620c88887b16940e00dbe57dd38c74eda9bad9e</id>
<content type='text'>
xfrm assumed to always have a full socket at skb-&gt;sk.
This is not always true, so fix it by converting to a
full socket before it is used.

Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
xfrm assumed to always have a full socket at skb-&gt;sk.
This is not always true, so fix it by converting to a
full socket before it is used.

Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: remove init_dummy_netdev()</title>
<updated>2025-01-14T03:06:51+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2025-01-13T00:34:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f835bdae716751fa20451508150e5fdd5f5b2be3'/>
<id>f835bdae716751fa20451508150e5fdd5f5b2be3</id>
<content type='text'>
init_dummy_netdev() can initialize statically declared or embedded
net_devices. Such netdevs did not come from alloc_netdev_mqs().
After recent work by Breno, there are the only two cases where
we have do that.

Switch those cases to alloc_netdev_mqs() and delete init_dummy_netdev().
Dealing with static netdevs is not worth the maintenance burden.

Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Reviewed-by: Joe Damato &lt;jdamato@fastly.com&gt;
Link: https://patch.msgid.link/20250113003456.3904110-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
init_dummy_netdev() can initialize statically declared or embedded
net_devices. Such netdevs did not come from alloc_netdev_mqs().
After recent work by Breno, there are the only two cases where
we have do that.

Switch those cases to alloc_netdev_mqs() and delete init_dummy_netdev().
Dealing with static netdevs is not worth the maintenance burden.

Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Reviewed-by: Joe Damato &lt;jdamato@fastly.com&gt;
Link: https://patch.msgid.link/20250113003456.3904110-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: Support ESN context update to hardware for TX</title>
<updated>2025-01-07T12:12:11+00:00</updated>
<author>
<name>Jianbo Liu</name>
<email>jianbol@nvidia.com</email>
</author>
<published>2024-12-19T12:37:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=373b79af3a209e25e011c5980cb59ed3252fa2f0'/>
<id>373b79af3a209e25e011c5980cb59ed3252fa2f0</id>
<content type='text'>
Previously xfrm_dev_state_advance_esn() was added for RX only. But
it's possible that ESN context also need to be synced to hardware for
TX, so call it for outbound in this patch.

Signed-off-by: Jianbo Liu &lt;jianbol@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously xfrm_dev_state_advance_esn() was added for RX only. But
it's possible that ESN context also need to be synced to hardware for
TX, so call it for outbound in this patch.

Signed-off-by: Jianbo Liu &lt;jianbol@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: iptfs: add tracepoint functionality</title>
<updated>2024-12-05T09:02:36+00:00</updated>
<author>
<name>Christian Hopps</name>
<email>chopps@labn.net</email>
</author>
<published>2024-11-14T07:07:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ed58b186c7737bf0db1ebf57207b30fe740e1d07'/>
<id>ed58b186c7737bf0db1ebf57207b30fe740e1d07</id>
<content type='text'>
Add tracepoints to the IP-TFS code.

Signed-off-by: Christian Hopps &lt;chopps@labn.net&gt;
Tested-by: Antony Antony &lt;antony.antony@secunet.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add tracepoints to the IP-TFS code.

Signed-off-by: Christian Hopps &lt;chopps@labn.net&gt;
Tested-by: Antony Antony &lt;antony.antony@secunet.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: iptfs: handle reordering of received packets</title>
<updated>2024-12-05T09:02:29+00:00</updated>
<author>
<name>Christian Hopps</name>
<email>chopps@labn.net</email>
</author>
<published>2024-11-14T07:07:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6be02e3e4f376fea468846c8562655ca5ee18204'/>
<id>6be02e3e4f376fea468846c8562655ca5ee18204</id>
<content type='text'>
Handle the receipt of the outer tunnel packets out-of-order. Pointers to
the out-of-order packets are saved in a window (array) awaiting needed
prior packets. When the required prior packets are received the now
in-order packets are then passed on to the regular packet receive code.
A timer is used to consider missing earlier packet as lost so the
algorithm will advance.

Signed-off-by: Christian Hopps &lt;chopps@labn.net&gt;
Tested-by: Antony Antony &lt;antony.antony@secunet.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Handle the receipt of the outer tunnel packets out-of-order. Pointers to
the out-of-order packets are saved in a window (array) awaiting needed
prior packets. When the required prior packets are received the now
in-order packets are then passed on to the regular packet receive code.
A timer is used to consider missing earlier packet as lost so the
algorithm will advance.

Signed-off-by: Christian Hopps &lt;chopps@labn.net&gt;
Tested-by: Antony Antony &lt;antony.antony@secunet.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: iptfs: add skb-fragment sharing code</title>
<updated>2024-12-05T09:02:22+00:00</updated>
<author>
<name>Christian Hopps</name>
<email>chopps@labn.net</email>
</author>
<published>2024-11-14T07:07:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5f2b6a9095743a6bf1f34c43c4fe78fa8bdf5ad7'/>
<id>5f2b6a9095743a6bf1f34c43c4fe78fa8bdf5ad7</id>
<content type='text'>
Avoid copying the inner packet data by sharing the skb data fragments
from the output packet skb into new inner packet skb.

Signed-off-by: Christian Hopps &lt;chopps@labn.net&gt;
Tested-by: Antony Antony &lt;antony.antony@secunet.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Avoid copying the inner packet data by sharing the skb data fragments
from the output packet skb into new inner packet skb.

Signed-off-by: Christian Hopps &lt;chopps@labn.net&gt;
Tested-by: Antony Antony &lt;antony.antony@secunet.com&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
