<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/xdp, branch master</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf</title>
<updated>2026-05-10T01:42:54+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-05-10T01:42:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=515186b7be488f37c63c2436fc2d1a160ef9bb95'/>
<id>515186b7be488f37c63c2436fc2d1a160ef9bb95</id>
<content type='text'>
Pull bpf fixes from Alexei Starovoitov:

 - Fix sk_local_storage diag dump via netlink (Amery Hung)

 - Fix off-by-one in arena direct-value access (Junyoung Jang)

 - Reject TCP_NODELAY in bpf-tcp congestion control (KaFai Wan)

 - Fix type confusion in bpf_*_sock() (Kuniyuki Iwashima)

 - Reject TX-only AF_XDP sockets (Linpu Yu)

 - Don't run arg-tracking analysis twice on main subprog (Paul Chaignon)

 - Fix NULL pointer dereference in bpf_sk_storage_clone and fib lookup
   (Weiming Shi)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix off-by-one boundary validation in arena direct-value access
  xskmap: reject TX-only AF_XDP sockets
  bpf: Don't run arg-tracking analysis twice on main subprog
  bpf: Free reuseport cBPF prog after RCU grace period.
  bpf: tcp: Fix type confusion in sol_tcp_sockopt().
  bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().
  bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().
  mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
  selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.
  bpf: tcp: Fix type confusion in bpf_tcp_sock().
  tools/headers: Regenerate stddef.h to fix BPF selftests
  bpf: Fix sk_local_storage diag dumping uninitialized special fields
  bpf: Fix NULL pointer dereference in bpf_skb_fib_lookup()
  sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().
  bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
  selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
  selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
  bpf: Reject TCP_NODELAY in bpf-tcp-cc
  bpf: Reject TCP_NODELAY in TCP header option callbacks
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull bpf fixes from Alexei Starovoitov:

 - Fix sk_local_storage diag dump via netlink (Amery Hung)

 - Fix off-by-one in arena direct-value access (Junyoung Jang)

 - Reject TCP_NODELAY in bpf-tcp congestion control (KaFai Wan)

 - Fix type confusion in bpf_*_sock() (Kuniyuki Iwashima)

 - Reject TX-only AF_XDP sockets (Linpu Yu)

 - Don't run arg-tracking analysis twice on main subprog (Paul Chaignon)

 - Fix NULL pointer dereference in bpf_sk_storage_clone and fib lookup
   (Weiming Shi)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix off-by-one boundary validation in arena direct-value access
  xskmap: reject TX-only AF_XDP sockets
  bpf: Don't run arg-tracking analysis twice on main subprog
  bpf: Free reuseport cBPF prog after RCU grace period.
  bpf: tcp: Fix type confusion in sol_tcp_sockopt().
  bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().
  bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().
  mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
  selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.
  bpf: tcp: Fix type confusion in bpf_tcp_sock().
  tools/headers: Regenerate stddef.h to fix BPF selftests
  bpf: Fix sk_local_storage diag dumping uninitialized special fields
  bpf: Fix NULL pointer dereference in bpf_skb_fib_lookup()
  sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().
  bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
  selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
  selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
  bpf: Reject TCP_NODELAY in bpf-tcp-cc
  bpf: Reject TCP_NODELAY in TCP header option callbacks
</pre>
</div>
</content>
</entry>
<entry>
<title>xskmap: reject TX-only AF_XDP sockets</title>
<updated>2026-05-09T23:17:01+00:00</updated>
<author>
<name>Linpu Yu</name>
<email>linpu5433@gmail.com</email>
</author>
<published>2026-05-08T14:43:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bf6d507f7e3c65751d52fd8caf1ea4e003922624'/>
<id>bf6d507f7e3c65751d52fd8caf1ea4e003922624</id>
<content type='text'>
XSKMAP entries are used as redirect targets for incoming XDP frames.
A TX-only AF_XDP socket lacks an Rx ring and cannot handle redirected
traffic, but xsk_map_update_elem() currently allows such sockets to
be inserted into the map.

Redirecting packets to such a socket on the veth generic-XDP path
causes a kernel crash in xsk_generic_rcv().

This became possible after xsk_is_setup_for_bpf_map() was removed from
the XSKMAP update path, which allowed bound TX-only sockets to be
inserted into the map.

Reject TX-only sockets during XSKMAP updates to avoid the crash.
They remain fully operational for pure Tx purposes outside XSKMAP.

Fixes: 968be23ceaca ("xsk: Fix possible segfault at xskmap entry insertion")
Reported-by: Juefei Pu &lt;tomapufckgml@gmail.com&gt;
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Signed-off-by: Yifan Wu &lt;yifanwucs@gmail.com&gt;
Signed-off-by: Linpu Yu &lt;linpu5433@gmail.com&gt;
Reviewed-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Link: https://lore.kernel.org/r/20260508144344.694-1-linpu5433@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
XSKMAP entries are used as redirect targets for incoming XDP frames.
A TX-only AF_XDP socket lacks an Rx ring and cannot handle redirected
traffic, but xsk_map_update_elem() currently allows such sockets to
be inserted into the map.

Redirecting packets to such a socket on the veth generic-XDP path
causes a kernel crash in xsk_generic_rcv().

This became possible after xsk_is_setup_for_bpf_map() was removed from
the XSKMAP update path, which allowed bound TX-only sockets to be
inserted into the map.

Reject TX-only sockets during XSKMAP updates to avoid the crash.
They remain fully operational for pure Tx purposes outside XSKMAP.

Fixes: 968be23ceaca ("xsk: Fix possible segfault at xskmap entry insertion")
Reported-by: Juefei Pu &lt;tomapufckgml@gmail.com&gt;
Reported-by: Yuan Tan &lt;yuantan098@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Signed-off-by: Yifan Wu &lt;yifanwucs@gmail.com&gt;
Signed-off-by: Linpu Yu &lt;linpu5433@gmail.com&gt;
Reviewed-by: Jason Xing &lt;kerneljasonxing@gmail.com&gt;
Link: https://lore.kernel.org/r/20260508144344.694-1-linpu5433@gmail.com
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>xsk: fix u64 descriptor address truncation on 32-bit architectures</title>
<updated>2026-05-06T02:27:51+00:00</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2026-05-02T20:07:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=203cee647f551abc87b992045cd920b117ff990a'/>
<id>203cee647f551abc87b992045cd920b117ff990a</id>
<content type='text'>
In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
descriptor address into skb_shinfo(skb)-&gt;destructor_arg (void *) via a
uintptr_t cast:

    skb_shinfo(skb)-&gt;destructor_arg = (void *)((uintptr_t)addr | 0x1UL);

On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
mode the chunk offset is encoded in bits 48-63 of the descriptor
address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
lost entirely. The completion queue then returns a truncated address to
userspace, making buffer recycling impossible.

Fix this by handling the 32-bit case directly in
xsk_skb_destructor_set_addr(): when !CONFIG_64BIT, allocate an
xsk_addrs struct (the same path already used for multi-descriptor
SKBs) to store the full u64 address. The existing tagged-pointer logic
in xsk_skb_destructor_is_addr() stays unchanged: slab pointers returned
from kmem_cache_zalloc() are always word-aligned and therefore have
bit 0 clear, which correctly identifies them as a struct pointer
rather than an inline tagged address on every architecture.

Factor the shared kmem_cache_zalloc + destructor_arg assignment into
__xsk_addrs_alloc() and add a wrapper xsk_addrs_alloc() that handles
the inline-to-list upgrade (is_addr check + get_addr + num_descs = 1).
The three former open-coded kmem_cache_zalloc call sites now reduce to
a single call each.

Propagate the -ENOMEM from xsk_skb_destructor_set_addr() through
xsk_skb_init_misc() so the caller can clean up the skb via kfree_skb()
before skb-&gt;destructor is installed.

The overhead is one extra kmem_cache_zalloc per first descriptor on
32-bit only; 64-bit builds are completely unchanged.

Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-9-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In copy mode TX, xsk_skb_destructor_set_addr() stores the 64-bit
descriptor address into skb_shinfo(skb)-&gt;destructor_arg (void *) via a
uintptr_t cast:

    skb_shinfo(skb)-&gt;destructor_arg = (void *)((uintptr_t)addr | 0x1UL);

On 32-bit architectures uintptr_t is 32 bits, so the upper 32 bits of
the descriptor address are silently dropped. In XDP_ZEROCOPY unaligned
mode the chunk offset is encoded in bits 48-63 of the descriptor
address (XSK_UNALIGNED_BUF_OFFSET_SHIFT = 48), meaning the offset is
lost entirely. The completion queue then returns a truncated address to
userspace, making buffer recycling impossible.

Fix this by handling the 32-bit case directly in
xsk_skb_destructor_set_addr(): when !CONFIG_64BIT, allocate an
xsk_addrs struct (the same path already used for multi-descriptor
SKBs) to store the full u64 address. The existing tagged-pointer logic
in xsk_skb_destructor_is_addr() stays unchanged: slab pointers returned
from kmem_cache_zalloc() are always word-aligned and therefore have
bit 0 clear, which correctly identifies them as a struct pointer
rather than an inline tagged address on every architecture.

Factor the shared kmem_cache_zalloc + destructor_arg assignment into
__xsk_addrs_alloc() and add a wrapper xsk_addrs_alloc() that handles
the inline-to-list upgrade (is_addr check + get_addr + num_descs = 1).
The three former open-coded kmem_cache_zalloc call sites now reduce to
a single call each.

Propagate the -ENOMEM from xsk_skb_destructor_set_addr() through
xsk_skb_init_misc() so the caller can clean up the skb via kfree_skb()
before skb-&gt;destructor is installed.

The overhead is one extra kmem_cache_zalloc per first descriptor on
32-bit only; 64-bit builds are completely unchanged.

Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-9-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xsk: fix xsk_addrs slab leak on multi-buffer error path</title>
<updated>2026-05-06T02:27:51+00:00</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2026-05-02T20:07:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e0f229025a8e774a695017a376c4a01279c0e66e'/>
<id>e0f229025a8e774a695017a376c4a01279c0e66e</id>
<content type='text'>
When xsk_build_skb() / xsk_build_skb_zerocopy() sees the first
continuation descriptor, it promotes destructor_arg from an inlined
address to a freshly allocated xsk_addrs (num_descs = 1). The counter
is bumped to &gt;= 2 only at the very end of a successful build (by calling
xsk_inc_num_desc()).

If the build fails in between (e.g. alloc_page() returns NULL with
-EAGAIN, or the MAX_SKB_FRAGS overflow hits), we jump to free_err, skip
calling xsk_inc_num_desc() to increment num_descs and leave the half-built
skb attached to xs-&gt;skb for the app to retry. The skb now has
1) destructor_arg = a real xsk_addrs pointer,
2) num_descs = 1

If the app never retries and just close()s the socket, xsk_release()
calls xsk_drop_skb() -&gt; xsk_consume_skb(), which decides whether to
free xsk_addrs by testing num_descs &gt; 1:

    if (unlikely(num_descs &gt; 1))
        kmem_cache_free(xsk_tx_generic_cache, destructor_arg);

Because num_descs is exactly 1 the branch is skipped and the
xsk_addrs object is leaked to the xsk_tx_generic_cache slab.

Fix it by directly testing if destructor_arg is still addr. Or else it
is modified and used to store the newly allocated memory from
xsk_tx_generic_cache regardless of increment of num_desc, which we
need to handle.

Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-8-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When xsk_build_skb() / xsk_build_skb_zerocopy() sees the first
continuation descriptor, it promotes destructor_arg from an inlined
address to a freshly allocated xsk_addrs (num_descs = 1). The counter
is bumped to &gt;= 2 only at the very end of a successful build (by calling
xsk_inc_num_desc()).

If the build fails in between (e.g. alloc_page() returns NULL with
-EAGAIN, or the MAX_SKB_FRAGS overflow hits), we jump to free_err, skip
calling xsk_inc_num_desc() to increment num_descs and leave the half-built
skb attached to xs-&gt;skb for the app to retry. The skb now has
1) destructor_arg = a real xsk_addrs pointer,
2) num_descs = 1

If the app never retries and just close()s the socket, xsk_release()
calls xsk_drop_skb() -&gt; xsk_consume_skb(), which decides whether to
free xsk_addrs by testing num_descs &gt; 1:

    if (unlikely(num_descs &gt; 1))
        kmem_cache_free(xsk_tx_generic_cache, destructor_arg);

Because num_descs is exactly 1 the branch is skipped and the
xsk_addrs object is leaked to the xsk_tx_generic_cache slab.

Fix it by directly testing if destructor_arg is still addr. Or else it
is modified and used to store the newly allocated memory from
xsk_tx_generic_cache regardless of increment of num_desc, which we
need to handle.

Closes: https://lore.kernel.org/all/20260419045824.D9E5EC2BCAF@smtp.kernel.org/
Fixes: 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-8-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xsk: avoid skb leak in XDP_TX_METADATA case</title>
<updated>2026-05-06T02:27:50+00:00</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2026-05-02T20:07:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8c2cff50afdd2b53c7cc2ca2297301c0ffd3e802'/>
<id>8c2cff50afdd2b53c7cc2ca2297301c0ffd3e802</id>
<content type='text'>
Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs-&gt;skb is
   NULL) and users enable metadata feature.
2. xsk_skb_metadata() returns a error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.

Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-7-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means no frag and xs-&gt;skb is
   NULL) and users enable metadata feature.
2. xsk_skb_metadata() returns a error code.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'.
4. there is no chance to free this skb anymore.

Closes: https://lore.kernel.org/all/20260415085204.3F87AC19424@smtp.kernel.org/
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-7-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xsk: prevent CQ desync when freeing half-built skbs in xsk_build_skb()</title>
<updated>2026-05-06T02:27:50+00:00</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2026-05-02T20:07:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3dec153ae484e3b2ddac841156e197ba54c8df94'/>
<id>3dec153ae484e3b2ddac841156e197ba54c8df94</id>
<content type='text'>
Once xsk_skb_init_misc() has been called on an skb, its destructor is
set to xsk_destruct_skb(), which submits the descriptor address(es) to
the completion queue and advances the CQ producer. If such an skb is
subsequently freed via kfree_skb() along an error path - before the
skb has ever been handed to the driver - the destructor still runs and
submits a bogus, half-initialized address to the CQ.

Postpone the init phase when we believe the allocation of first frag is
successfully completed. Before this init, skb can be safely freed by
kfree_skb().

Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/
Fixes: c30d084960cf ("xsk: avoid overwriting skb fields for multi-buffer traffic")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-6-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Once xsk_skb_init_misc() has been called on an skb, its destructor is
set to xsk_destruct_skb(), which submits the descriptor address(es) to
the completion queue and advances the CQ producer. If such an skb is
subsequently freed via kfree_skb() along an error path - before the
skb has ever been handed to the driver - the destructor still runs and
submits a bogus, half-initialized address to the CQ.

Postpone the init phase when we believe the allocation of first frag is
successfully completed. Before this init, skb can be safely freed by
kfree_skb().

Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/
Fixes: c30d084960cf ("xsk: avoid overwriting skb fields for multi-buffer traffic")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-6-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xsk: fix use-after-free of xs-&gt;skb in xsk_build_skb() free_err path</title>
<updated>2026-05-06T02:27:50+00:00</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2026-05-02T20:07:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0f3776583d282550dbafe6082a914efcf9094d59'/>
<id>0f3776583d282550dbafe6082a914efcf9094d59</id>
<content type='text'>
When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding
any frags, so nr_frags stays at 0. The caller then sets xs-&gt;skb = skb
to accumulate subsequent descriptors.

If a continuation descriptor fails (e.g. alloc_page returns NULL with
-EAGAIN), we jump to free_err where the condition:

  if (skb &amp;&amp; !skb_shinfo(skb)-&gt;nr_frags)
      kfree_skb(skb);

evaluates to true because nr_frags is still 0 (the first descriptor
used the linear area, not frags). This frees the skb while xs-&gt;skb
still points to it, creating a dangling pointer. On the next transmit
attempt or socket close, xs-&gt;skb is dereferenced, causing a
use-after-free or double-free.

Fix by using a !xs-&gt;skb check to handle first frag situation, ensuring
we only free skbs that were freshly allocated in this call
(xs-&gt;skb is NULL) and never free an in-progress multi-buffer skb that
the caller still references.

Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-5-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When xsk_build_skb() processes multi-buffer packets in copy mode, the
first descriptor stores data into the skb linear area without adding
any frags, so nr_frags stays at 0. The caller then sets xs-&gt;skb = skb
to accumulate subsequent descriptors.

If a continuation descriptor fails (e.g. alloc_page returns NULL with
-EAGAIN), we jump to free_err where the condition:

  if (skb &amp;&amp; !skb_shinfo(skb)-&gt;nr_frags)
      kfree_skb(skb);

evaluates to true because nr_frags is still 0 (the first descriptor
used the linear area, not frags). This frees the skb while xs-&gt;skb
still points to it, creating a dangling pointer. On the next transmit
attempt or socket close, xs-&gt;skb is dereferenced, causing a
use-after-free or double-free.

Fix by using a !xs-&gt;skb check to handle first frag situation, ensuring
we only free skbs that were freshly allocated in this call
(xs-&gt;skb is NULL) and never free an in-progress multi-buffer skb that
the caller still references.

Closes: https://lore.kernel.org/all/20260415082654.21026-4-kerneljasonxing@gmail.com/
Fixes: 6b9c129c2f93 ("xsk: remove @first_frag from xsk_build_skb()")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-5-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xsk: handle NULL dereference of the skb without frags issue</title>
<updated>2026-05-06T02:27:50+00:00</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2026-05-02T20:07:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8cd3c1c6e7d9a1f0954159ec5f2fdaa7f6a48bd8'/>
<id>8cd3c1c6e7d9a1f0954159ec5f2fdaa7f6a48bd8</id>
<content type='text'>
When a first descriptor (xs-&gt;skb == NULL) triggers -EOVERFLOW in
xsk_build_skb_zerocopy() (e.g., MAX_SKB_FRAGS exceeded), the
free_err -EOVERFLOW handler unconditionally dereferences xs-&gt;skb
via xsk_inc_num_desc(xs-&gt;skb) and xsk_drop_skb(xs-&gt;skb), causing
a NULL pointer dereference.

Fix this by guarding the existing xsk_inc_num_desc()/xsk_drop_skb()
calls with an xs-&gt;skb check (for the continuation case), and add
an else branch for the first-descriptor case that manually cancels
the one reserved CQ slot and increments invalid_descs by one to
account for the single invalid descriptor.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-4-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a first descriptor (xs-&gt;skb == NULL) triggers -EOVERFLOW in
xsk_build_skb_zerocopy() (e.g., MAX_SKB_FRAGS exceeded), the
free_err -EOVERFLOW handler unconditionally dereferences xs-&gt;skb
via xsk_inc_num_desc(xs-&gt;skb) and xsk_drop_skb(xs-&gt;skb), causing
a NULL pointer dereference.

Fix this by guarding the existing xsk_inc_num_desc()/xsk_drop_skb()
calls with an xs-&gt;skb check (for the continuation case), and add
an else branch for the first-descriptor case that manually cancels
the one reserved CQ slot and increments invalid_descs by one to
account for the single invalid descriptor.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-4-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xsk: free the skb when hitting the upper bound MAX_SKB_FRAGS</title>
<updated>2026-05-06T02:27:50+00:00</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2026-05-02T20:07:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0bb7a9caf5c1d6e25ba376ea6b39261ad28550f4'/>
<id>0bb7a9caf5c1d6e25ba376ea6b39261ad28550f4</id>
<content type='text'>
Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs-&gt;skb is NULL) and
   hit the limit MAX_SKB_FRAGS.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
   is why bug can be triggered.
4. there is no chance to free this skb anymore.

Note that if in this case the xs-&gt;skb is not NULL, xsk_build_skb() will
call xsk_drop_skb(xs-&gt;skb) to do the right thing.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix it by explicitly adding kfree_skb() before returning back to its
caller.

How to reproduce it in virtio_net:
1. the current skb is the first one (which means xs-&gt;skb is NULL) and
   hit the limit MAX_SKB_FRAGS.
2. xsk_build_skb_zerocopy() returns -EOVERFLOW.
3. the caller xsk_build_skb() clears skb by using 'skb = NULL;'. This
   is why bug can be triggered.
4. there is no chance to free this skb anymore.

Note that if in this case the xs-&gt;skb is not NULL, xsk_build_skb() will
call xsk_drop_skb(xs-&gt;skb) to do the right thing.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Link: https://patch.msgid.link/20260502200722.53960-3-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xsk: reject sw-csum UMEM binding to IFF_TX_SKB_NO_LINEAR devices</title>
<updated>2026-05-06T02:27:49+00:00</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2026-05-02T20:07:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d73a9a63f9f7f7c17637731fd28daf3665992d1e'/>
<id>d73a9a63f9f7f7c17637731fd28daf3665992d1e</id>
<content type='text'>
skb_checksum_help() is a common helper that writes the folded
16-bit checksum back via skb-&gt;data + csum_start + csum_offset,
i.e. it relies on the skb's linear head and fails (with WARN_ONCE
and -EINVAL) when skb_headlen() is 0.

AF_XDP generic xmit takes two very different paths depending on the
netdev. Drivers that advertise IFF_TX_SKB_NO_LINEAR (e.g. virtio_net)
skip the "copy payload into a linear head" step on purpose as a
performance optimisation: xsk_build_skb_zerocopy() only attaches UMEM
pages as frags and never calls skb_put(), so skb_headlen() stays 0
for the whole skb. For these skbs there is simply no linear area for
skb_checksum_help() to write the csum into - the sw-csum fallback is
structurally inapplicable.

The patch tries to catch this and reject the combination with error at
setup time. Rejecting at bind() converts this silent per-packet failure
into a synchronous, actionable -EOPNOTSUPP at setup time. HW csum and
launch_time metadata on IFF_TX_SKB_NO_LINEAR drivers are unaffected
because they do not call skb_checksum_help().

Without the patch, every descriptor carrying 'XDP_TX_METADATA |
XDP_TXMD_FLAGS_CHECKSUM' produces:
1) a WARN_ONCE "offset (N) &gt;= skb_headlen() (0)" from skb_checksum_help(),
2) sendmsg() returning -EINVAL without consuming the descriptor
   (invalid_descs is not incremented),
3) a wedged TX ring: __xsk_generic_xmit() does not advance the
    consumer on non-EOVERFLOW errors, so the next sendmsg() re-reads
    the same descriptor and re-hits the same WARN until the socket
    is closed.

Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/#t
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Link: https://patch.msgid.link/20260502200722.53960-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
skb_checksum_help() is a common helper that writes the folded
16-bit checksum back via skb-&gt;data + csum_start + csum_offset,
i.e. it relies on the skb's linear head and fails (with WARN_ONCE
and -EINVAL) when skb_headlen() is 0.

AF_XDP generic xmit takes two very different paths depending on the
netdev. Drivers that advertise IFF_TX_SKB_NO_LINEAR (e.g. virtio_net)
skip the "copy payload into a linear head" step on purpose as a
performance optimisation: xsk_build_skb_zerocopy() only attaches UMEM
pages as frags and never calls skb_put(), so skb_headlen() stays 0
for the whole skb. For these skbs there is simply no linear area for
skb_checksum_help() to write the csum into - the sw-csum fallback is
structurally inapplicable.

The patch tries to catch this and reject the combination with error at
setup time. Rejecting at bind() converts this silent per-packet failure
into a synchronous, actionable -EOPNOTSUPP at setup time. HW csum and
launch_time metadata on IFF_TX_SKB_NO_LINEAR drivers are unaffected
because they do not call skb_checksum_help().

Without the patch, every descriptor carrying 'XDP_TX_METADATA |
XDP_TXMD_FLAGS_CHECKSUM' produces:
1) a WARN_ONCE "offset (N) &gt;= skb_headlen() (0)" from skb_checksum_help(),
2) sendmsg() returning -EINVAL without consuming the descriptor
   (invalid_descs is not incremented),
3) a wedged TX ring: __xsk_generic_xmit() does not advance the
    consumer on non-EOVERFLOW errors, so the next sendmsg() re-reads
    the same descriptor and re-hits the same WARN until the socket
    is closed.

Closes: https://lore.kernel.org/all/20260419045822.843BFC2BCAF@smtp.kernel.org/#t
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Acked-by: Stanislav Fomichev &lt;sdf@fomichev.me&gt;
Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Alexander Lobakin &lt;aleksander.lobakin@intel.com&gt;
Fixes: 30c3055f9c0d ("xsk: wrap generic metadata handling onto separate function")
Link: https://patch.msgid.link/20260502200722.53960-2-kerneljasonxing@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
