<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/tls, branch v5.19</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>net/tls: Remove the context from the list in tls_device_down</title>
<updated>2022-07-24T20:40:56+00:00</updated>
<author>
<name>Maxim Mikityanskiy</name>
<email>maximmi@nvidia.com</email>
</author>
<published>2022-07-21T09:11:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f6336724a4d4220c89a4ec38bca84b03b178b1a3'/>
<id>f6336724a4d4220c89a4ec38bca84b03b178b1a3</id>
<content type='text'>
tls_device_down takes a reference on all contexts it's going to move to
the degraded state (software fallback). If sk_destruct runs afterwards,
it can reduce the reference counter back to 1 and return early without
destroying the context. Then tls_device_down will release the reference
it took and call tls_device_free_ctx. However, the context will still
stay in tls_device_down_list forever. The list will contain an item,
memory for which is released, making a memory corruption possible.

Fix the above bug by properly removing the context from all lists before
any call to tls_device_free_ctx.

Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down")
Signed-off-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Reviewed-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
tls_device_down takes a reference on all contexts it's going to move to
the degraded state (software fallback). If sk_destruct runs afterwards,
it can reduce the reference counter back to 1 and return early without
destroying the context. Then tls_device_down will release the reference
it took and call tls_device_free_ctx. However, the context will still
stay in tls_device_down_list forever. The list will contain an item,
memory for which is released, making a memory corruption possible.

Fix the above bug by properly removing the context from all lists before
any call to tls_device_free_ctx.

Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down")
Signed-off-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Reviewed-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/tls: Fix race in TLS device down flow</title>
<updated>2022-07-18T10:33:07+00:00</updated>
<author>
<name>Tariq Toukan</name>
<email>tariqt@nvidia.com</email>
</author>
<published>2022-07-15T08:42:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f08d8c1bb97c48f24a82afaa2fd8c140f8d3da8b'/>
<id>f08d8c1bb97c48f24a82afaa2fd8c140f8d3da8b</id>
<content type='text'>
Socket destruction flow and tls_device_down function sync against each
other using tls_device_lock and the context refcount, to guarantee the
device resources are freed via tls_dev_del() by the end of
tls_device_down.

In the following unfortunate flow, this won't happen:
- refcount is decreased to zero in tls_device_sk_destruct.
- tls_device_down starts, skips the context as refcount is zero, going
  all the way until it flushes the gc work, and returns without freeing
  the device resources.
- only then, tls_device_queue_ctx_destruction is called, queues the gc
  work and frees the context's device resources.

Solve it by decreasing the refcount in the socket's destruction flow
under the tls_device_lock, for perfect synchronization.  This does not
slow down the common likely destructor flow, in which both the refcount
is decreased and the spinlock is acquired, anyway.

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Reviewed-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Socket destruction flow and tls_device_down function sync against each
other using tls_device_lock and the context refcount, to guarantee the
device resources are freed via tls_dev_del() by the end of
tls_device_down.

In the following unfortunate flow, this won't happen:
- refcount is decreased to zero in tls_device_sk_destruct.
- tls_device_down starts, skips the context as refcount is zero, going
  all the way until it flushes the gc work, and returns without freeing
  the device resources.
- only then, tls_device_queue_ctx_destruction is called, queues the gc
  work and frees the context's device resources.

Solve it by decreasing the refcount in the socket's destruction flow
under the tls_device_lock, for perfect synchronization.  This does not
slow down the common likely destructor flow, in which both the refcount
is decreased and the spinlock is acquired, anyway.

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Reviewed-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/tls: Check for errors in tls_device_init</title>
<updated>2022-07-14T17:12:39+00:00</updated>
<author>
<name>Tariq Toukan</name>
<email>tariqt@nvidia.com</email>
</author>
<published>2022-07-14T07:07:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3d8c51b25a235e283e37750943bbf356ef187230'/>
<id>3d8c51b25a235e283e37750943bbf356ef187230</id>
<content type='text'>
Add missing error checks in tls_device_init.

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reviewed-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add missing error checks in tls_device_init.

Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Reported-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reviewed-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Link: https://lore.kernel.org/r/20220714070754.1428-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "tls: rx: move counting TlsDecryptErrors for sync"</title>
<updated>2022-07-06T12:10:59+00:00</updated>
<author>
<name>Gal Pressman</name>
<email>gal@nvidia.com</email>
</author>
<published>2022-07-05T11:08:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a069a90554168ac4cc81af65f000557d2a8a0745'/>
<id>a069a90554168ac4cc81af65f000557d2a8a0745</id>
<content type='text'>
This reverts commit 284b4d93daee56dff3e10029ddf2e03227f50dbf.
When using TLS device offload and coming from tls_device_reencrypt()
flow, -EBADMSG error in tls_do_decryption() should not be counted
towards the TLSTlsDecryptError counter.

Move the counter increase back to the decrypt_internal() call site in
decrypt_skb_update().
This also fixes an issue where:
	if (n_sgin &lt; 1)
		return -EBADMSG;

Errors in decrypt_internal() were not counted after the cited patch.

Fixes: 284b4d93daee ("tls: rx: move counting TlsDecryptErrors for sync")
Cc: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reviewed-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Reviewed-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Signed-off-by: Gal Pressman &lt;gal@nvidia.com&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 284b4d93daee56dff3e10029ddf2e03227f50dbf.
When using TLS device offload and coming from tls_device_reencrypt()
flow, -EBADMSG error in tls_do_decryption() should not be counted
towards the TLSTlsDecryptError counter.

Move the counter increase back to the decrypt_internal() call site in
decrypt_skb_update().
This also fixes an issue where:
	if (n_sgin &lt; 1)
		return -EBADMSG;

Errors in decrypt_internal() were not counted after the cited patch.

Fixes: 284b4d93daee ("tls: rx: move counting TlsDecryptErrors for sync")
Cc: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reviewed-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Reviewed-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Signed-off-by: Gal Pressman &lt;gal@nvidia.com&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sock: redo the psock vs ULP protection check</title>
<updated>2022-06-23T08:08:30+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-06-20T19:13:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e34a07c0ae3906f97eb18df50902e2a01c1015b6'/>
<id>e34a07c0ae3906f97eb18df50902e2a01c1015b6</id>
<content type='text'>
Commit 8a59f9d1e3d4 ("sock: Introduce sk-&gt;sk_prot-&gt;psock_update_sk_prot()")
has moved the inet_csk_has_ulp(sk) check from sk_psock_init() to
the new tcp_bpf_update_proto() function. I'm guessing that this
was done to allow creating psocks for non-inet sockets.

Unfortunately the destruction path for psock includes the ULP
unwind, so we need to fail the sk_psock_init() itself.
Otherwise if ULP is already present we'll notice that later,
and call tcp_update_ulp() with the sk_proto of the ULP
itself, which will most likely result in the ULP looping
its callbacks.

Fixes: 8a59f9d1e3d4 ("sock: Introduce sk-&gt;sk_prot-&gt;psock_update_sk_prot()")
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reviewed-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Tested-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Link: https://lore.kernel.org/r/20220620191353.1184629-2-kuba@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 8a59f9d1e3d4 ("sock: Introduce sk-&gt;sk_prot-&gt;psock_update_sk_prot()")
has moved the inet_csk_has_ulp(sk) check from sk_psock_init() to
the new tcp_bpf_update_proto() function. I'm guessing that this
was done to allow creating psocks for non-inet sockets.

Unfortunately the destruction path for psock includes the ULP
unwind, so we need to fail the sk_psock_init() itself.
Otherwise if ULP is already present we'll notice that later,
and call tcp_update_ulp() with the sk_proto of the ULP
itself, which will most likely result in the ULP looping
its callbacks.

Fixes: 8a59f9d1e3d4 ("sock: Introduce sk-&gt;sk_prot-&gt;psock_update_sk_prot()")
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reviewed-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Reviewed-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Tested-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Link: https://lore.kernel.org/r/20220620191353.1184629-2-kuba@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "net/tls: fix tls_sk_proto_close executed repeatedly"</title>
<updated>2022-06-23T08:08:30+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-06-20T19:13:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1b205d948fbb06a7613d87dcea0ff5fd8a08ed91'/>
<id>1b205d948fbb06a7613d87dcea0ff5fd8a08ed91</id>
<content type='text'>
This reverts commit 69135c572d1f84261a6de2a1268513a7e71753e2.

This commit was just papering over the issue, ULP should not
get -&gt;update() called with its own sk_prot. Each ULP would
need to add this check.

Fixes: 69135c572d1f ("net/tls: fix tls_sk_proto_close executed repeatedly")
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reviewed-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/r/20220620191353.1184629-1-kuba@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 69135c572d1f84261a6de2a1268513a7e71753e2.

This commit was just papering over the issue, ULP should not
get -&gt;update() called with its own sk_prot. Each ULP would
need to add this check.

Fixes: 69135c572d1f ("net/tls: fix tls_sk_proto_close executed repeatedly")
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Reviewed-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/r/20220620191353.1184629-1-kuba@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/tls: fix tls_sk_proto_close executed repeatedly</title>
<updated>2022-06-20T09:01:52+00:00</updated>
<author>
<name>Ziyang Xuan</name>
<email>william.xuanziyang@huawei.com</email>
</author>
<published>2022-06-20T04:35:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=69135c572d1f84261a6de2a1268513a7e71753e2'/>
<id>69135c572d1f84261a6de2a1268513a7e71753e2</id>
<content type='text'>
After setting the sock ktls, update ctx-&gt;sk_proto to sock-&gt;sk_prot by
tls_update(), so now ctx-&gt;sk_proto-&gt;close is tls_sk_proto_close(). When
close the sock, tls_sk_proto_close() is called for sock-&gt;sk_prot-&gt;close
is tls_sk_proto_close(). But ctx-&gt;sk_proto-&gt;close() will be executed later
in tls_sk_proto_close(). Thus tls_sk_proto_close() executed repeatedly
occurred. That will trigger the following bug.

=================================================================
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:tls_sk_proto_close+0xd8/0xaf0 net/tls/tls_main.c:306
Call Trace:
 &lt;TASK&gt;
 tls_sk_proto_close+0x356/0xaf0 net/tls/tls_main.c:329
 inet_release+0x12e/0x280 net/ipv4/af_inet.c:428
 __sock_release+0xcd/0x280 net/socket.c:650
 sock_close+0x18/0x20 net/socket.c:1365

Updating a proto which is same with sock-&gt;sk_prot is incorrect. Add proto
and sock-&gt;sk_prot equality check at the head of tls_update() to fix it.

Fixes: 95fa145479fb ("bpf: sockmap/tls, close can race with map free")
Reported-by: syzbot+29c3c12f3214b85ad081@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan &lt;william.xuanziyang@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After setting the sock ktls, update ctx-&gt;sk_proto to sock-&gt;sk_prot by
tls_update(), so now ctx-&gt;sk_proto-&gt;close is tls_sk_proto_close(). When
close the sock, tls_sk_proto_close() is called for sock-&gt;sk_prot-&gt;close
is tls_sk_proto_close(). But ctx-&gt;sk_proto-&gt;close() will be executed later
in tls_sk_proto_close(). Thus tls_sk_proto_close() executed repeatedly
occurred. That will trigger the following bug.

=================================================================
KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
RIP: 0010:tls_sk_proto_close+0xd8/0xaf0 net/tls/tls_main.c:306
Call Trace:
 &lt;TASK&gt;
 tls_sk_proto_close+0x356/0xaf0 net/tls/tls_main.c:329
 inet_release+0x12e/0x280 net/ipv4/af_inet.c:428
 __sock_release+0xcd/0x280 net/socket.c:650
 sock_close+0x18/0x20 net/socket.c:1365

Updating a proto which is same with sock-&gt;sk_prot is incorrect. Add proto
and sock-&gt;sk_prot equality check at the head of tls_update() to fix it.

Fixes: 95fa145479fb ("bpf: sockmap/tls, close can race with map free")
Reported-by: syzbot+29c3c12f3214b85ad081@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan &lt;william.xuanziyang@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tls: Rename TLS_INFO_ZC_SENDFILE to TLS_INFO_ZC_TX</title>
<updated>2022-06-10T04:51:57+00:00</updated>
<author>
<name>Maxim Mikityanskiy</name>
<email>maximmi@nvidia.com</email>
</author>
<published>2022-06-08T15:34:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b489a6e5871690735752f8875f411e4d0cd8e5df'/>
<id>b489a6e5871690735752f8875f411e4d0cd8e5df</id>
<content type='text'>
To embrace possible future optimizations of TLS, rename zerocopy
sendfile definitions to more generic ones:

* setsockopt: TLS_TX_ZEROCOPY_SENDFILE- &gt; TLS_TX_ZEROCOPY_RO
* sock_diag: TLS_INFO_ZC_SENDFILE -&gt; TLS_INFO_ZC_RO_TX

RO stands for readonly and emphasizes that the application shouldn't
modify the data being transmitted with zerocopy to avoid potential
disconnection.

Fixes: c1318b39c7d3 ("tls: Add opt-in zerocopy mode of sendfile()")
Signed-off-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Link: https://lore.kernel.org/r/20220608153425.3151146-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To embrace possible future optimizations of TLS, rename zerocopy
sendfile definitions to more generic ones:

* setsockopt: TLS_TX_ZEROCOPY_SENDFILE- &gt; TLS_TX_ZEROCOPY_RO
* sock_diag: TLS_INFO_ZC_SENDFILE -&gt; TLS_INFO_ZC_RO_TX

RO stands for readonly and emphasizes that the application shouldn't
modify the data being transmitted with zerocopy to avoid potential
disconnection.

Fixes: c1318b39c7d3 ("tls: Add opt-in zerocopy mode of sendfile()")
Signed-off-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Link: https://lore.kernel.org/r/20220608153425.3151146-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: tls: fix messing up lists when bpf enabled</title>
<updated>2022-05-20T00:55:06+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2022-05-18T20:56:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1c2133114d2d11c10ffb0da4e12904bde0478beb'/>
<id>1c2133114d2d11c10ffb0da4e12904bde0478beb</id>
<content type='text'>
Artem points out that skb may try to take over the skb and
queue it to its own list. Unlink the skb before calling out.

Fixes: b1a2c1786330 ("tls: rx: clear ctx-&gt;recv_pkt earlier")
Reported-by: Artem Savkov &lt;asavkov@redhat.com&gt;
Tested-by: Artem Savkov &lt;asavkov@redhat.com&gt;
Link: https://lore.kernel.org/r/20220518205644.2059468-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Artem points out that skb may try to take over the skb and
queue it to its own list. Unlink the skb before calling out.

Fixes: b1a2c1786330 ("tls: rx: clear ctx-&gt;recv_pkt earlier")
Reported-by: Artem Savkov &lt;asavkov@redhat.com&gt;
Tested-by: Artem Savkov &lt;asavkov@redhat.com&gt;
Link: https://lore.kernel.org/r/20220518205644.2059468-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tls: Add opt-in zerocopy mode of sendfile()</title>
<updated>2022-05-19T10:14:11+00:00</updated>
<author>
<name>Boris Pismenny</name>
<email>borisp@nvidia.com</email>
</author>
<published>2022-05-18T09:27:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c1318b39c7d36bd5139a9c71044ff2b2d3c6f9d8'/>
<id>c1318b39c7d36bd5139a9c71044ff2b2d3c6f9d8</id>
<content type='text'>
TLS device offload copies sendfile data to a bounce buffer before
transmitting. It allows to maintain the valid MAC on TLS records when
the file contents change and a part of TLS record has to be
retransmitted on TCP level.

In many common use cases (like serving static files over HTTPS) the file
contents are not changed on the fly. In many use cases breaking the
connection is totally acceptable if the file is changed during
transmission, because it would be received corrupted in any case.

This commit allows to optimize performance for such use cases to
providing a new optional mode of TLS sendfile(), in which the extra copy
is skipped. Removing this copy improves performance significantly, as
TLS and TCP sendfile perform the same operations, and the only overhead
is TLS header/trailer insertion.

The new mode can only be enabled with the new socket option named
TLS_TX_ZEROCOPY_SENDFILE on per-socket basis. It preserves backwards
compatibility with existing applications that rely on the copying
behavior.

The new mode is safe, meaning that unsolicited modifications of the file
being sent can't break integrity of the kernel. The worst thing that can
happen is sending a corrupted TLS record, which is in any case not
forbidden when using regular TCP sockets.

Sockets other than TLS device offload are not affected by the new socket
option. The actual status of zerocopy sendfile can be queried with
sock_diag.

Performance numbers in a single-core test with 24 HTTPS streams on
nginx, under 100% CPU load:

* non-zerocopy: 33.6 Gbit/s
* zerocopy: 79.92 Gbit/s

CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz

Signed-off-by: Boris Pismenny &lt;borisp@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Signed-off-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20220518092731.1243494-1-maximmi@nvidia.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
TLS device offload copies sendfile data to a bounce buffer before
transmitting. It allows to maintain the valid MAC on TLS records when
the file contents change and a part of TLS record has to be
retransmitted on TCP level.

In many common use cases (like serving static files over HTTPS) the file
contents are not changed on the fly. In many use cases breaking the
connection is totally acceptable if the file is changed during
transmission, because it would be received corrupted in any case.

This commit allows to optimize performance for such use cases to
providing a new optional mode of TLS sendfile(), in which the extra copy
is skipped. Removing this copy improves performance significantly, as
TLS and TCP sendfile perform the same operations, and the only overhead
is TLS header/trailer insertion.

The new mode can only be enabled with the new socket option named
TLS_TX_ZEROCOPY_SENDFILE on per-socket basis. It preserves backwards
compatibility with existing applications that rely on the copying
behavior.

The new mode is safe, meaning that unsolicited modifications of the file
being sent can't break integrity of the kernel. The worst thing that can
happen is sending a corrupted TLS record, which is in any case not
forbidden when using regular TCP sockets.

Sockets other than TLS device offload are not affected by the new socket
option. The actual status of zerocopy sendfile can be queried with
sock_diag.

Performance numbers in a single-core test with 24 HTTPS streams on
nginx, under 100% CPU load:

* non-zerocopy: 33.6 Gbit/s
* zerocopy: 79.92 Gbit/s

CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz

Signed-off-by: Boris Pismenny &lt;borisp@nvidia.com&gt;
Signed-off-by: Tariq Toukan &lt;tariqt@nvidia.com&gt;
Signed-off-by: Maxim Mikityanskiy &lt;maximmi@nvidia.com&gt;
Reviewed-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Link: https://lore.kernel.org/r/20220518092731.1243494-1-maximmi@nvidia.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
