<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/net/netns, branch master</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>ipv6: flowlabel: enforce per-netns limit for unprivileged callers</title>
<updated>2026-05-08T21:59:14+00:00</updated>
<author>
<name>Maoyi Xie</name>
<email>maoyi.xie@ntu.edu.sg</email>
</author>
<published>2026-05-06T08:24:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e68eadffb724b36ffd3d5619e0efcaf29ec2a175'/>
<id>e68eadffb724b36ffd3d5619e0efcaf29ec2a175</id>
<content type='text'>
fl_size, fl_ht and ip6_fl_lock in net/ipv6/ip6_flowlabel.c are
file scope and shared across netns. mem_check() reads fl_size to
decide whether to deny non-CAP_NET_ADMIN callers. capable() runs
against init_user_ns, so an unprivileged user in any non-init
userns can push fl_size past FL_MAX_SIZE - FL_MAX_SIZE / 4 and
starve every other unprivileged userns on the host.

Add struct netns_ipv6::flowlabel_count, bumped and decremented
next to fl_size in fl_intern, ip6_fl_gc and ip6_fl_purge. The new
field fills the existing 4-byte hole after ipmr_seq, so struct
netns_ipv6 stays the same size on 64-bit builds.

Bump FL_MAX_SIZE from 4096 to 8192. It has been 4096 since the
file was added. Machines and connection counts have grown.

mem_check() folds an extra per-netns ceiling into the existing
non-CAP_NET_ADMIN conditional. The ceiling is half of the total
budget that unprivileged callers have ever been able to use, i.e.
(FL_MAX_SIZE - FL_MAX_SIZE / 4) / 2 = 3072 entries. With
FL_MAX_SIZE doubled, this preserves the original per-user reach
of 3K (what an unprivileged caller could already obtain before
this change), while forcing an attacker to spread allocations
across at least two netns to exhaust the global non-CAP_NET_ADMIN
budget.

CAP_NET_ADMIN against init_user_ns still bypasses both caps.

The previous patch took ip6_fl_lock across mem_check and
fl_intern, so the new flowlabel_count read in mem_check and the
new flowlabel_count++ in fl_intern run under the same critical
section. flowlabel_count is therefore plain int, like fl_size.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Maoyi Xie &lt;maoyi.xie@ntu.edu.sg&gt;
Link: https://patch.msgid.link/20260506082416.2259567-3-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fl_size, fl_ht and ip6_fl_lock in net/ipv6/ip6_flowlabel.c are
file scope and shared across netns. mem_check() reads fl_size to
decide whether to deny non-CAP_NET_ADMIN callers. capable() runs
against init_user_ns, so an unprivileged user in any non-init
userns can push fl_size past FL_MAX_SIZE - FL_MAX_SIZE / 4 and
starve every other unprivileged userns on the host.

Add struct netns_ipv6::flowlabel_count, bumped and decremented
next to fl_size in fl_intern, ip6_fl_gc and ip6_fl_purge. The new
field fills the existing 4-byte hole after ipmr_seq, so struct
netns_ipv6 stays the same size on 64-bit builds.

Bump FL_MAX_SIZE from 4096 to 8192. It has been 4096 since the
file was added. Machines and connection counts have grown.

mem_check() folds an extra per-netns ceiling into the existing
non-CAP_NET_ADMIN conditional. The ceiling is half of the total
budget that unprivileged callers have ever been able to use, i.e.
(FL_MAX_SIZE - FL_MAX_SIZE / 4) / 2 = 3072 entries. With
FL_MAX_SIZE doubled, this preserves the original per-user reach
of 3K (what an unprivileged caller could already obtain before
this change), while forcing an attacker to spread allocations
across at least two netns to exhaust the global non-CAP_NET_ADMIN
budget.

CAP_NET_ADMIN against init_user_ns still bypasses both caps.

The previous patch took ip6_fl_lock across mem_check and
fl_intern, so the new flowlabel_count read in mem_check and the
new flowlabel_count++ in fl_intern run under the same critical
section. flowlabel_count is therefore plain int, like fl_size.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Maoyi Xie &lt;maoyi.xie@ntu.edu.sg&gt;
Link: https://patch.msgid.link/20260506082416.2259567-3-maoyixie.tju@gmail.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipmr: Add __rcu to netns_ipv4.mrt.</title>
<updated>2026-05-05T02:26:13+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2026-05-02T18:07:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a6039776c7994dd0b9a4acce23a3f897d1688cbf'/>
<id>a6039776c7994dd0b9a4acce23a3f897d1688cbf</id>
<content type='text'>
kernel test robot reported this Sparse warning:

  $ make C=1 net/ipv4/ipmr.o
  net/ipv4/ipmr.c:312:24: error: incompatible types in comparison expression (different address spaces):
  net/ipv4/ipmr.c:312:24:    struct mr_table [noderef] __rcu *
  net/ipv4/ipmr.c:312:24:    struct mr_table *

Let's add __rcu annotation to netns_ipv4.mrt.

Fixes: b3b6babf4751 ("ipmr: Free mr_table after RCU grace period.")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202605030032.glNApko7-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260502180755.359554-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
kernel test robot reported this Sparse warning:

  $ make C=1 net/ipv4/ipmr.o
  net/ipv4/ipmr.c:312:24: error: incompatible types in comparison expression (different address spaces):
  net/ipv4/ipmr.c:312:24:    struct mr_table [noderef] __rcu *
  net/ipv4/ipmr.c:312:24:    struct mr_table *

Let's add __rcu annotation to netns_ipv4.mrt.

Fixes: b3b6babf4751 ("ipmr: Free mr_table after RCU grace period.")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202605030032.glNApko7-lkp@intel.com/
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Link: https://patch.msgid.link/20260502180755.359554-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2026-04-02T18:03:13+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2026-04-02T17:57:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8ffb33d7709b59ff60560f48960a73bd8a55be95'/>
<id>8ffb33d7709b59ff60560f48960a73bd8a55be95</id>
<content type='text'>
Cross-merge networking fixes after downstream PR (net-7.0-rc7).

Conflicts:

net/vmw_vsock/af_vsock.c
  b18c83388874 ("vsock: initialize child_ns_mode_locked in vsock_net_init()")
  0de607dc4fd8 ("vsock: add G2H fallback for CIDs not owned by H2G transport")

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
  ceee35e5674a ("bnxt_en: Refactor some basic ring setup and adjustment logic")
  57cdfe0dc70b ("bnxt_en: Resize RSS contexts on channel count change")

drivers/net/wireless/intel/iwlwifi/mld/mac80211.c
  4d56037a02bd ("wifi: iwlwifi: mld: block EMLSR during TDLS connections")
  687a95d204e7 ("wifi: iwlwifi: mld: correctly set wifi generation data")

drivers/net/wireless/intel/iwlwifi/mld/scan.h
  b6045c899e37 ("wifi: iwlwifi: mld: Refactor scan command handling")
  ec66ec6a5a8f ("wifi: iwlwifi: mld: Fix MLO scan timing")

drivers/net/wireless/intel/iwlwifi/mvm/fw.c
  078df640ef05 ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v
2")
  323156c3541e ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cross-merge networking fixes after downstream PR (net-7.0-rc7).

Conflicts:

net/vmw_vsock/af_vsock.c
  b18c83388874 ("vsock: initialize child_ns_mode_locked in vsock_net_init()")
  0de607dc4fd8 ("vsock: add G2H fallback for CIDs not owned by H2G transport")

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
  ceee35e5674a ("bnxt_en: Refactor some basic ring setup and adjustment logic")
  57cdfe0dc70b ("bnxt_en: Resize RSS contexts on channel count change")

drivers/net/wireless/intel/iwlwifi/mld/mac80211.c
  4d56037a02bd ("wifi: iwlwifi: mld: block EMLSR during TDLS connections")
  687a95d204e7 ("wifi: iwlwifi: mld: correctly set wifi generation data")

drivers/net/wireless/intel/iwlwifi/mld/scan.h
  b6045c899e37 ("wifi: iwlwifi: mld: Refactor scan command handling")
  ec66ec6a5a8f ("wifi: iwlwifi: mld: Fix MLO scan timing")

drivers/net/wireless/intel/iwlwifi/mvm/fw.c
  078df640ef05 ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v
2")
  323156c3541e ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported")

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mpls: add seqcount to protect the platform_label{,s} pair</title>
<updated>2026-03-27T01:32:14+00:00</updated>
<author>
<name>Sabrina Dubroca</name>
<email>sd@queasysnail.net</email>
</author>
<published>2026-03-23T23:25:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=629ec78ef8608d955ce217880cdc3e1873af3a15'/>
<id>629ec78ef8608d955ce217880cdc3e1873af3a15</id>
<content type='text'>
The RCU-protected codepaths (mpls_forward, mpls_dump_routes) can have
an inconsistent view of platform_labels vs platform_label in case of a
concurrent resize (resize_platform_label_table, under
platform_mutex). This can lead to OOB accesses.

This patch adds a seqcount, so that we get a consistent snapshot.

Note that mpls_label_ok is also susceptible to this, so the check
against RTA_DST in rtm_to_route_config, done outside platform_mutex,
is not sufficient. This value gets passed to mpls_label_ok once more
in both mpls_route_add and mpls_route_del, so there is no issue, but
that additional check must not be removed.

Reported-by: Yuan Tan &lt;tanyuan98@outlook.com&gt;
Reported-by: Yifan Wu &lt;yifanwucs@gmail.com&gt;
Reported-by: Juefei Pu &lt;tomapufckgml@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Fixes: 7720c01f3f590 ("mpls: Add a sysctl to control the size of the mpls label table")
Fixes: dde1b38e873c ("mpls: Convert mpls_dump_routes() to RCU.")
Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Link: https://patch.msgid.link/cd8fca15e3eb7e212b094064cd83652e20fd9d31.1774284088.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The RCU-protected codepaths (mpls_forward, mpls_dump_routes) can have
an inconsistent view of platform_labels vs platform_label in case of a
concurrent resize (resize_platform_label_table, under
platform_mutex). This can lead to OOB accesses.

This patch adds a seqcount, so that we get a consistent snapshot.

Note that mpls_label_ok is also susceptible to this, so the check
against RTA_DST in rtm_to_route_config, done outside platform_mutex,
is not sufficient. This value gets passed to mpls_label_ok once more
in both mpls_route_add and mpls_route_del, so there is no issue, but
that additional check must not be removed.

Reported-by: Yuan Tan &lt;tanyuan98@outlook.com&gt;
Reported-by: Yifan Wu &lt;yifanwucs@gmail.com&gt;
Reported-by: Juefei Pu &lt;tomapufckgml@gmail.com&gt;
Reported-by: Xin Liu &lt;bird@lzu.edu.cn&gt;
Fixes: 7720c01f3f590 ("mpls: Add a sysctl to control the size of the mpls label table")
Fixes: dde1b38e873c ("mpls: Convert mpls_dump_routes() to RCU.")
Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Link: https://patch.msgid.link/cd8fca15e3eb7e212b094064cd83652e20fd9d31.1774284088.git.sd@queasysnail.net
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2026-03-26T19:09:57+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2026-01-08T19:37:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9ebcf66cd6bcaa6c8275c18b7799507156361218'/>
<id>9ebcf66cd6bcaa6c8275c18b7799507156361218</id>
<content type='text'>
Cross-merge networking fixes after downstream PR (net-7.0-rc6).

No conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Cross-merge networking fixes after downstream PR (net-7.0-rc6).

No conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec</title>
<updated>2026-03-24T14:16:28+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2026-03-24T14:16:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=51a209ee33428ed688b1c00e0521a5b5b8ff483f'/>
<id>51a209ee33428ed688b1c00e0521a5b5b8ff483f</id>
<content type='text'>
Steffen Klassert says:

====================
pull request (net): ipsec 2026-03-23

1) Add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi.
   From Sabrina Dubroca.

2) Fix the condition on x-&gt;pcpu_num in xfrm_sa_len by using the
   proper check. From Sabrina Dubroca.

3) Call xdo_dev_state_delete during state update to properly cleanup
   the xdo device state. From Sabrina Dubroca.

4) Fix a potential skb leak in espintcp when async crypto is used.
   From Sabrina Dubroca.

5) Validate inner IPv4 header length in IPTFS payload to avoid
   parsing malformed packets. From Roshan Kumar.

6) Fix skb_put() panic on non-linear skb during IPTFS reassembly.
   From Fernando Fernandez Mancera.

7) Silence various sparse warnings related to RCU, state, and policy
   handling. From Sabrina Dubroca.

8) Fix work re-schedule race after cancel in xfrm_nat_keepalive_net_fini().
   From Hyunwoo Kim.

9) Prevent policy_hthresh.work from racing with netns teardown by using
   a proper cleanup mechanism. From Minwoo Ra.

10) Validate that the family of the source and destination addresses match
    in pfkey_send_migrate(). From Eric Dumazet.

11) Only publish mode_data after the clone is setup in the IPTFS receive path.
    This prevents leaving x-&gt;mode_data pointing at freed memory on error.
    From Paul Moses.

Please pull or let me know if there are problems.

ipsec-2026-03-23

* tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: iptfs: only publish mode_data after clone setup
  af_key: validate families in pfkey_send_migrate()
  xfrm: prevent policy_hthresh.work from racing with netns teardown
  xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
  xfrm: avoid RCU warnings around the per-netns netlink socket
  xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfo
  xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfo
  xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini}
  xfrm: state: silence sparse warnings during netns exit
  xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_proto
  xfrm: state: add xfrm_state_deref_prot to state_by* walk under lock
  xfrm: state: fix sparse warnings around XFRM_STATE_INSERT
  xfrm: state: fix sparse warnings in xfrm_state_init
  xfrm: state: fix sparse warnings on xfrm_state_hold_rcu
  xfrm: iptfs: fix skb_put() panic on non-linear skb during reassembly
  xfrm: iptfs: validate inner IPv4 header length in IPTFS payload
  esp: fix skb leak with espintcp and async crypto
  xfrm: call xdo_dev_state_delete during state update
  xfrm: fix the condition on x-&gt;pcpu_num in xfrm_sa_len
  xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi
====================

Link: https://patch.msgid.link/20260323083440.2741292-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Steffen Klassert says:

====================
pull request (net): ipsec 2026-03-23

1) Add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi.
   From Sabrina Dubroca.

2) Fix the condition on x-&gt;pcpu_num in xfrm_sa_len by using the
   proper check. From Sabrina Dubroca.

3) Call xdo_dev_state_delete during state update to properly cleanup
   the xdo device state. From Sabrina Dubroca.

4) Fix a potential skb leak in espintcp when async crypto is used.
   From Sabrina Dubroca.

5) Validate inner IPv4 header length in IPTFS payload to avoid
   parsing malformed packets. From Roshan Kumar.

6) Fix skb_put() panic on non-linear skb during IPTFS reassembly.
   From Fernando Fernandez Mancera.

7) Silence various sparse warnings related to RCU, state, and policy
   handling. From Sabrina Dubroca.

8) Fix work re-schedule race after cancel in xfrm_nat_keepalive_net_fini().
   From Hyunwoo Kim.

9) Prevent policy_hthresh.work from racing with netns teardown by using
   a proper cleanup mechanism. From Minwoo Ra.

10) Validate that the family of the source and destination addresses match
    in pfkey_send_migrate(). From Eric Dumazet.

11) Only publish mode_data after the clone is setup in the IPTFS receive path.
    This prevents leaving x-&gt;mode_data pointing at freed memory on error.
    From Paul Moses.

Please pull or let me know if there are problems.

ipsec-2026-03-23

* tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
  xfrm: iptfs: only publish mode_data after clone setup
  af_key: validate families in pfkey_send_migrate()
  xfrm: prevent policy_hthresh.work from racing with netns teardown
  xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini()
  xfrm: avoid RCU warnings around the per-netns netlink socket
  xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfo
  xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfo
  xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini}
  xfrm: state: silence sparse warnings during netns exit
  xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_proto
  xfrm: state: add xfrm_state_deref_prot to state_by* walk under lock
  xfrm: state: fix sparse warnings around XFRM_STATE_INSERT
  xfrm: state: fix sparse warnings in xfrm_state_init
  xfrm: state: fix sparse warnings on xfrm_state_hold_rcu
  xfrm: iptfs: fix skb_put() panic on non-linear skb during reassembly
  xfrm: iptfs: validate inner IPv4 header length in IPTFS payload
  esp: fix skb leak with espintcp and async crypto
  xfrm: call xdo_dev_state_delete during state update
  xfrm: fix the condition on x-&gt;pcpu_num in xfrm_sa_len
  xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi
====================

Link: https://patch.msgid.link/20260323083440.2741292-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udp: Remove UDP-Lite SNMP stats.</title>
<updated>2026-03-14T01:57:44+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@google.com</email>
</author>
<published>2026-03-11T05:19:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7accba6fd1ab60fb4f3a5c15c52d6fbb3af7f3a3'/>
<id>7accba6fd1ab60fb4f3a5c15c52d6fbb3af7f3a3</id>
<content type='text'>
Since UDP and UDP-Lite shared most of the code, we have had
to check the protocol every time we increment SNMP stats.

Now that the UDP-Lite paths are dead, let's remove UDP-Lite
SNMP stats.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Link: https://patch.msgid.link/20260311052020.1213705-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since UDP and UDP-Lite shared most of the code, we have had
to check the protocol every time we increment SNMP stats.

Now that the UDP-Lite paths are dead, let's remove UDP-Lite
SNMP stats.

Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@google.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Link: https://patch.msgid.link/20260311052020.1213705-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vsock: add G2H fallback for CIDs not owned by H2G transport</title>
<updated>2026-03-12T09:59:36+00:00</updated>
<author>
<name>Alexander Graf</name>
<email>graf@amazon.com</email>
</author>
<published>2026-03-04T23:00:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0de607dc4fd80ede3b2a35e8a72f99c7a0bbc321'/>
<id>0de607dc4fd80ede3b2a35e8a72f99c7a0bbc321</id>
<content type='text'>
When no H2G transport is loaded, vsock currently routes all CIDs to the
G2H transport (commit 65b422d9b61b ("vsock: forward all packets to the
host when no H2G is registered"). Extend that existing behavior: when
an H2G transport is loaded but does not claim a given CID, the
connection falls back to G2H in the same way.

This matters in environments like Nitro Enclaves, where an instance may
run nested VMs via vhost-vsock (H2G) while also needing to reach sibling
enclaves at higher CIDs through virtio-vsock-pci (G2H). With the old
code, any CID &gt; 2 was unconditionally routed to H2G when vhost was
loaded, making those enclaves unreachable without setting
VMADDR_FLAG_TO_HOST explicitly on every connect.

Requiring every application to set VMADDR_FLAG_TO_HOST creates friction:
tools like socat, iperf, and others would all need to learn about it.
The flag was introduced 6 years ago and I am still not aware of any tool
that supports it. Even if there was support, it would be cumbersome to
use. The most natural experience is a single CID address space where H2G
only wins for CIDs it actually owns, and everything else falls through to
G2H, extending the behavior that already exists when H2G is absent.

To give user space at least a hint that the kernel applied this logic,
automatically set the VMADDR_FLAG_TO_HOST on the remote address so it
can determine the path taken via getpeername().

Add a per-network namespace sysctl net.vsock.g2h_fallback (default 1).
At 0 it forces strict routing: H2G always wins for CID &gt; VMADDR_CID_HOST,
or ENODEV if H2G is not loaded.

Signed-off-by: Alexander Graf &lt;graf@amazon.com&gt;
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Link: https://patch.msgid.link/20260304230027.59857-1-graf@amazon.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When no H2G transport is loaded, vsock currently routes all CIDs to the
G2H transport (commit 65b422d9b61b ("vsock: forward all packets to the
host when no H2G is registered"). Extend that existing behavior: when
an H2G transport is loaded but does not claim a given CID, the
connection falls back to G2H in the same way.

This matters in environments like Nitro Enclaves, where an instance may
run nested VMs via vhost-vsock (H2G) while also needing to reach sibling
enclaves at higher CIDs through virtio-vsock-pci (G2H). With the old
code, any CID &gt; 2 was unconditionally routed to H2G when vhost was
loaded, making those enclaves unreachable without setting
VMADDR_FLAG_TO_HOST explicitly on every connect.

Requiring every application to set VMADDR_FLAG_TO_HOST creates friction:
tools like socat, iperf, and others would all need to learn about it.
The flag was introduced 6 years ago and I am still not aware of any tool
that supports it. Even if there was support, it would be cumbersome to
use. The most natural experience is a single CID address space where H2G
only wins for CIDs it actually owns, and everything else falls through to
G2H, extending the behavior that already exists when H2G is absent.

To give user space at least a hint that the kernel applied this logic,
automatically set the VMADDR_FLAG_TO_HOST on the remote address so it
can determine the path taken via getpeername().

Add a per-network namespace sysctl net.vsock.g2h_fallback (default 1).
At 0 it forces strict routing: H2G always wins for CID &gt; VMADDR_CID_HOST,
or ENODEV if H2G is not loaded.

Signed-off-by: Alexander Graf &lt;graf@amazon.com&gt;
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Stefano Garzarella &lt;sgarzare@redhat.com&gt;
Link: https://patch.msgid.link/20260304230027.59857-1-graf@amazon.com
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>xfrm: avoid RCU warnings around the per-netns netlink socket</title>
<updated>2026-03-12T06:16:02+00:00</updated>
<author>
<name>Sabrina Dubroca</name>
<email>sd@queasysnail.net</email>
</author>
<published>2026-03-09T10:32:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d87f8bc47fbf012a7f115e311d0603d97e47c34c'/>
<id>d87f8bc47fbf012a7f115e311d0603d97e47c34c</id>
<content type='text'>
net-&gt;xfrm.nlsk is used in 2 types of contexts:
 - fully under RCU, with rcu_read_lock + rcu_dereference and a NULL check
 - in the netlink handlers, with requests coming from a userspace socket

In the 2nd case, net-&gt;xfrm.nlsk is guaranteed to stay non-NULL and the
object is alive, since we can't enter the netns destruction path while
the user socket holds a reference on the netns.

After adding the __rcu annotation to netns_xfrm.nlsk (which silences
sparse warnings in the RCU users and __net_init code), we need to tell
sparse that the 2nd case is safe. Add a helper for that.

Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
net-&gt;xfrm.nlsk is used in 2 types of contexts:
 - fully under RCU, with rcu_read_lock + rcu_dereference and a NULL check
 - in the netlink handlers, with requests coming from a userspace socket

In the 2nd case, net-&gt;xfrm.nlsk is guaranteed to stay non-NULL and the
object is alive, since we can't enter the netns destruction path while
the user socket holds a reference on the netns.

After adding the __rcu annotation to netns_xfrm.nlsk (which silences
sparse warnings in the RCU users and __net_init code), we need to tell
sparse that the 2nd case is safe. Add a helper for that.

Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: add ip_local_port_step_width sysctl to improve port usage distribution</title>
<updated>2026-03-11T01:59:39+00:00</updated>
<author>
<name>Fernando Fernandez Mancera</name>
<email>fmancera@suse.de</email>
</author>
<published>2026-03-09T02:39:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7da62262ec96a4b345d207b6bcd2ddf5231b7f7d'/>
<id>7da62262ec96a4b345d207b6bcd2ddf5231b7f7d</id>
<content type='text'>
With the current port selection algorithm, ports after a reserved port
range or long time used port are used more often than others [1]. This
causes an uneven port usage distribution. This combines with cloud
environments blocking connections between the application server and the
database server if there was a previous connection with the same source
port, leading to connectivity problems between applications on cloud
environments.

The real issue here is that these firewalls cannot cope with
standards-compliant port reuse. This is a workaround for such situations
and an improvement on the distribution of ports selected.

The proposed solution is to implement a variant of RFC 6056 Algorithm 5.
The step size is selected randomly on every connect() call ensuring it
is a coprime with respect to the size of the range of ports we want to
scan. This way, we can ensure that all ports within the range are
scanned before returning an error. To enable this algorithm, the user
must configure the new sysctl option "net.ipv4.ip_local_port_step_width".

In addition, on graphs generated we can observe that the distribution of
source ports is more even with the proposed approach. [2]

[1] https://0xffsoftware.com/port_graph_current_alg.html

[2] https://0xffsoftware.com/port_graph_random_step_alg.html

Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Fernando Fernandez Mancera &lt;fmancera@suse.de&gt;
Link: https://patch.msgid.link/20260309023946.5473-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With the current port selection algorithm, ports after a reserved port
range or long time used port are used more often than others [1]. This
causes an uneven port usage distribution. This combines with cloud
environments blocking connections between the application server and the
database server if there was a previous connection with the same source
port, leading to connectivity problems between applications on cloud
environments.

The real issue here is that these firewalls cannot cope with
standards-compliant port reuse. This is a workaround for such situations
and an improvement on the distribution of ports selected.

The proposed solution is to implement a variant of RFC 6056 Algorithm 5.
The step size is selected randomly on every connect() call ensuring it
is a coprime with respect to the size of the range of ports we want to
scan. This way, we can ensure that all ports within the range are
scanned before returning an error. To enable this algorithm, the user
must configure the new sysctl option "net.ipv4.ip_local_port_step_width".

In addition, on graphs generated we can observe that the distribution of
source ports is more even with the proposed approach. [2]

[1] https://0xffsoftware.com/port_graph_current_alg.html

[2] https://0xffsoftware.com/port_graph_random_step_alg.html

Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Fernando Fernandez Mancera &lt;fmancera@suse.de&gt;
Link: https://patch.msgid.link/20260309023946.5473-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
