<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/mptcp/ctrl.c, branch v6.17</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mptcp: track fallbacks accurately via mibs</title>
<updated>2025-07-25T18:29:04+00:00</updated>
<author>
<name>Paolo Abeni</name>
<email>pabeni@redhat.com</email>
</author>
<published>2025-07-23T14:32:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c65c2e3bae6912b6baf8ea12eeace220e8e99b47'/>
<id>c65c2e3bae6912b6baf8ea12eeace220e8e99b47</id>
<content type='text'>
Add the mibs required to cover the few possible fallback causes still
lacking suck info.

Move the relevant mib increment into the fallback helper, so that no
eventual future fallback operation will miss a paired mib increment.

Additionally track failed fallback via its own mib, such mib is
incremented only when a fallback mandated by the protocol fails - due to
racing subflow creation.

While at the above, rename an existing helper to reduce long lines
problems all along.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20250723-net-next-mptcp-track-fallbacks-v1-1-a83cce08f2d5@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the mibs required to cover the few possible fallback causes still
lacking suck info.

Move the relevant mib increment into the fallback helper, so that no
eventual future fallback operation will miss a paired mib increment.

Additionally track failed fallback via its own mib, such mib is
incremented only when a fallback mandated by the protocol fails - due to
racing subflow creation.

While at the above, rename an existing helper to reduce long lines
problems all along.

Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20250723-net-next-mptcp-track-fallbacks-v1-1-a83cce08f2d5@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mptcp: sysctl: add available_path_managers</title>
<updated>2025-03-20T09:14:49+00:00</updated>
<author>
<name>Geliang Tang</name>
<email>tanggeliang@kylinos.cn</email>
</author>
<published>2025-03-13T10:21:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fa3ee9dd8067e178b459f1df3081fd25fc03c4b5'/>
<id>fa3ee9dd8067e178b459f1df3081fd25fc03c4b5</id>
<content type='text'>
Similarly to net.mptcp.available_schedulers, this patch adds a new one
net.mptcp.available_path_managers to list the available path managers.

Signed-off-by: Geliang Tang &lt;tanggeliang@kylinos.cn&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-11-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Similarly to net.mptcp.available_schedulers, this patch adds a new one
net.mptcp.available_path_managers to list the available path managers.

Signed-off-by: Geliang Tang &lt;tanggeliang@kylinos.cn&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-11-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mptcp: sysctl: map pm_type to path_manager</title>
<updated>2025-03-20T09:14:49+00:00</updated>
<author>
<name>Geliang Tang</name>
<email>tanggeliang@kylinos.cn</email>
</author>
<published>2025-03-13T10:20:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7982ed0edda3379e17c9cfe64e761af34ef63546'/>
<id>7982ed0edda3379e17c9cfe64e761af34ef63546</id>
<content type='text'>
This patch adds a new proc_handler "proc_pm_type" for "pm_type" to
map old path manager sysctl "pm_type" to the newly added "path_manager".

	path_manager		   pm_type

	MPTCP_PM_TYPE_KERNEL    -&gt; "kernel"
	MPTCP_PM_TYPE_USERSPACE -&gt; "userspace"

It is important to add this to keep a compatibility with the now
deprecated pm_type sysctl knob.

Signed-off-by: Geliang Tang &lt;tanggeliang@kylinos.cn&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-10-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds a new proc_handler "proc_pm_type" for "pm_type" to
map old path manager sysctl "pm_type" to the newly added "path_manager".

	path_manager		   pm_type

	MPTCP_PM_TYPE_KERNEL    -&gt; "kernel"
	MPTCP_PM_TYPE_USERSPACE -&gt; "userspace"

It is important to add this to keep a compatibility with the now
deprecated pm_type sysctl knob.

Signed-off-by: Geliang Tang &lt;tanggeliang@kylinos.cn&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-10-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mptcp: sysctl: map path_manager to pm_type</title>
<updated>2025-03-20T09:14:49+00:00</updated>
<author>
<name>Geliang Tang</name>
<email>tanggeliang@kylinos.cn</email>
</author>
<published>2025-03-13T10:20:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=573b653401a8fa9bf61c281fa922e42f1e5c51e6'/>
<id>573b653401a8fa9bf61c281fa922e42f1e5c51e6</id>
<content type='text'>
This patch maps the newly added path manager sysctl "path_manager"
to the old one "pm_type".

	path_manager   pm_type

	"kernel"    -&gt; MPTCP_PM_TYPE_KERNEL
	"userspace" -&gt; MPTCP_PM_TYPE_USERSPACE
	others      -&gt; __MPTCP_PM_TYPE_NR

It is important to add this to keep a compatibility with the now
deprecated pm_type sysctl knob.

Signed-off-by: Geliang Tang &lt;tanggeliang@kylinos.cn&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-9-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch maps the newly added path manager sysctl "path_manager"
to the old one "pm_type".

	path_manager   pm_type

	"kernel"    -&gt; MPTCP_PM_TYPE_KERNEL
	"userspace" -&gt; MPTCP_PM_TYPE_USERSPACE
	others      -&gt; __MPTCP_PM_TYPE_NR

It is important to add this to keep a compatibility with the now
deprecated pm_type sysctl knob.

Signed-off-by: Geliang Tang &lt;tanggeliang@kylinos.cn&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-9-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mptcp: sysctl: set path manager by name</title>
<updated>2025-03-20T09:14:48+00:00</updated>
<author>
<name>Geliang Tang</name>
<email>tanggeliang@kylinos.cn</email>
</author>
<published>2025-03-13T10:20:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=595c26d122d120b6b2f65495e31c857d0c8c6374'/>
<id>595c26d122d120b6b2f65495e31c857d0c8c6374</id>
<content type='text'>
Similar to net.mptcp.scheduler, a new net.mptcp.path_manager sysctl knob
is added to determine which path manager will be used by each newly
created MPTCP socket by setting the name of it.

Dealing with an explicit name is easier than with a number, especially
when more PMs will be introduced.

This sysctl knob makes the old one "pm_type" deprecated.

Signed-off-by: Geliang Tang &lt;tanggeliang@kylinos.cn&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-8-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Similar to net.mptcp.scheduler, a new net.mptcp.path_manager sysctl knob
is added to determine which path manager will be used by each newly
created MPTCP socket by setting the name of it.

Dealing with an explicit name is easier than with a number, especially
when more PMs will be introduced.

This sysctl knob makes the old one "pm_type" deprecated.

Signed-off-by: Geliang Tang &lt;tanggeliang@kylinos.cn&gt;
Reviewed-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Link: https://patch.msgid.link/20250313-net-next-mptcp-pm-ops-intro-v1-8-f4e4a88efc50@kernel.org
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mptcp: blackhole: avoid checking the state twice</title>
<updated>2025-02-25T02:23:44+00:00</updated>
<author>
<name>Matthieu Baerts (NGI0)</name>
<email>matttbe@kernel.org</email>
</author>
<published>2025-02-21T15:44:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8275ac799ee15e972841eb77b694d63f5e888519'/>
<id>8275ac799ee15e972841eb77b694d63f5e888519</id>
<content type='text'>
A small cleanup, reordering the conditions to avoid checking things
twice.

The code here is called in case of timeout on a TCP connection, before
triggering a retransmission. But it only acts on SYN + MPC packets.

So the conditions can be re-order to exit early in case of non-MPTCP
SYN + MPC. This also reduce the indentation levels.

Reviewed-by: Mat Martineau &lt;martineau@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-10-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A small cleanup, reordering the conditions to avoid checking things
twice.

The code here is called in case of timeout on a TCP connection, before
triggering a retransmission. But it only acts on SYN + MPC packets.

So the conditions can be re-order to exit early in case of non-MPTCP
SYN + MPC. This also reduce the indentation levels.

Reviewed-by: Mat Martineau &lt;martineau@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20250221-net-next-mptcp-pm-misc-cleanup-3-v1-10-2b70ab1cee79@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted</title>
<updated>2025-01-30T13:02:19+00:00</updated>
<author>
<name>Matthieu Baerts (NGI0)</name>
<email>matttbe@kernel.org</email>
</author>
<published>2025-01-29T12:24:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e598d8981fd34470b78a1ae777dbf131b15d5bf2'/>
<id>e598d8981fd34470b78a1ae777dbf131b15d5bf2</id>
<content type='text'>
The Fixes commit mentioned this:

&gt; An MPTCP firewall blackhole can be detected if the following SYN
&gt; retransmission after a fallback to "plain" TCP is accepted.

But in fact, this blackhole was detected if any following SYN
retransmissions after a fallback to TCP was accepted.

That's because 'mptcp_subflow_early_fallback()' will set 'request_mptcp'
to 0, and 'mpc_drop' will never be reset to 0 after.

This is an issue, because some not so unusual situations might cause the
kernel to detect a false-positive blackhole, e.g. a client trying to
connect to a server while the network is not ready yet, causing a few
SYN retransmissions, before reaching the end server.

Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau &lt;martineau@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The Fixes commit mentioned this:

&gt; An MPTCP firewall blackhole can be detected if the following SYN
&gt; retransmission after a fallback to "plain" TCP is accepted.

But in fact, this blackhole was detected if any following SYN
retransmissions after a fallback to TCP was accepted.

That's because 'mptcp_subflow_early_fallback()' will set 'request_mptcp'
to 0, and 'mpc_drop' will never be reset to 0 after.

This is an issue, because some not so unusual situations might cause the
kernel to detect a false-positive blackhole, e.g. a client trying to
connect to a server while the network is not ready yet, causing a few
SYN retransmissions, before reaching the end server.

Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Reviewed-by: Mat Martineau &lt;martineau@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mptcp: sysctl: add syn_retrans_before_tcp_fallback</title>
<updated>2025-01-19T01:32:01+00:00</updated>
<author>
<name>Matthieu Baerts (NGI0)</name>
<email>matttbe@kernel.org</email>
</author>
<published>2025-01-17T17:40:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5b4fd35343d72f4d4f964de2a9fe36143cc18f39'/>
<id>5b4fd35343d72f4d4f964de2a9fe36143cc18f39</id>
<content type='text'>
The number of SYN + MPC retransmissions before falling back to TCP was
fixed to 2. This is certainly a good default value, but having a fixed
number can be a problem in some environments.

The current behaviour means that if all packets are dropped, there will
be:

- The initial SYN + MPC

- 2 retransmissions with MPC

- The next ones will be without MPTCP.

So typically ~3 seconds before falling back to TCP. In some networks
where some temporally blackholes are unfortunately frequent, or when a
client tries to initiate connections while the network is not ready yet,
this can cause new connections not to have MPTCP connections.

In such environments, it is now possible to increase the number of SYN
retransmissions with MPTCP options to make sure MPTCP is used.

Interesting values are:

- 0: the first retransmission will be done without MPTCP options: quite
     aggressive, but also a higher risk of detecting false-positive
     MPTCP blackholes.

- &gt;= 128: all SYN retransmissions will keep the MPTCP options: back to
          the &lt; 6.12 behaviour.

The default behaviour is not changed here.

Reviewed-by: Mat Martineau &lt;martineau@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20250117-net-next-mptcp-syn_retrans_before_tcp_fallback-v1-1-ab4b187099b0@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The number of SYN + MPC retransmissions before falling back to TCP was
fixed to 2. This is certainly a good default value, but having a fixed
number can be a problem in some environments.

The current behaviour means that if all packets are dropped, there will
be:

- The initial SYN + MPC

- 2 retransmissions with MPC

- The next ones will be without MPTCP.

So typically ~3 seconds before falling back to TCP. In some networks
where some temporally blackholes are unfortunately frequent, or when a
client tries to initiate connections while the network is not ready yet,
this can cause new connections not to have MPTCP connections.

In such environments, it is now possible to increase the number of SYN
retransmissions with MPTCP options to make sure MPTCP is used.

Interesting values are:

- 0: the first retransmission will be done without MPTCP options: quite
     aggressive, but also a higher risk of detecting false-positive
     MPTCP blackholes.

- &gt;= 128: all SYN retransmissions will keep the MPTCP options: back to
          the &lt; 6.12 behaviour.

The default behaviour is not changed here.

Reviewed-by: Mat Martineau &lt;martineau@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20250117-net-next-mptcp-syn_retrans_before_tcp_fallback-v1-1-ab4b187099b0@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mptcp: sysctl: blackhole timeout: avoid using current-&gt;nsproxy</title>
<updated>2025-01-09T16:53:34+00:00</updated>
<author>
<name>Matthieu Baerts (NGI0)</name>
<email>matttbe@kernel.org</email>
</author>
<published>2025-01-08T15:34:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=92cf7a51bdae24a32c592adcdd59a773ae149289'/>
<id>92cf7a51bdae24a32c592adcdd59a773ae149289</id>
<content type='text'>
As mentioned in the previous commit, using the 'net' structure via
'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current-&gt;nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'pernet' structure can be obtained from the table-&gt;data using
container_of().

Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Reviewed-by: Mat Martineau &lt;martineau@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-3-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As mentioned in the previous commit, using the 'net' structure via
'current' is not recommended for different reasons:

- Inconsistency: getting info from the reader's/writer's netns vs only
  from the opener's netns.

- current-&gt;nsproxy can be NULL in some cases, resulting in an 'Oops'
  (null-ptr-deref), e.g. when the current task is exiting, as spotted by
  syzbot [1] using acct(2).

The 'pernet' structure can be obtained from the table-&gt;data using
container_of().

Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com [1]
Suggested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Reviewed-by: Mat Martineau &lt;martineau@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-3-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mptcp: sysctl: sched: avoid using current-&gt;nsproxy</title>
<updated>2025-01-09T16:53:34+00:00</updated>
<author>
<name>Matthieu Baerts (NGI0)</name>
<email>matttbe@kernel.org</email>
</author>
<published>2025-01-08T15:34:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d38e26e36206ae3d544d496513212ae931d1da0a'/>
<id>d38e26e36206ae3d544d496513212ae931d1da0a</id>
<content type='text'>
Using the 'net' structure via 'current' is not recommended for different
reasons.

First, if the goal is to use it to read or write per-netns data, this is
inconsistent with how the "generic" sysctl entries are doing: directly
by only using pointers set to the table entry, e.g. table-&gt;data. Linked
to that, the per-netns data should always be obtained from the table
linked to the netns it had been created for, which may not coincide with
the reader's or writer's netns.

Another reason is that access to current-&gt;nsproxy-&gt;netns can oops if
attempted when current-&gt;nsproxy had been dropped when the current task
is exiting. This is what syzbot found, when using acct(2):

  Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
  KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
  CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
  RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
  Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 &lt;80&gt; 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
  RSP: 0018:ffffc900034774e8 EFLAGS: 00010206

  RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
  RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
  RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
  R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
  R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
  FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   &lt;TASK&gt;
   proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
   __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
   __kernel_write+0xf6/0x140 fs/read_write.c:632
   do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
   acct_pin_kill+0x2d/0x100 kernel/acct.c:192
   pin_kill+0x194/0x7c0 fs/fs_pin.c:44
   mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
   cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
   task_work_run+0x14e/0x250 kernel/task_work.c:239
   exit_task_work include/linux/task_work.h:43 [inline]
   do_exit+0xad8/0x2d70 kernel/exit.c:938
   do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
   get_signal+0x2576/0x2610 kernel/signal.c:3017
   arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
   exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
   syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
   do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7fee3cb87a6a
  Code: Unable to access opcode bytes at 0x7fee3cb87a40.
  RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
  RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
  RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
  RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
  R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
  R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
   &lt;/TASK&gt;
  Modules linked in:
  ---[ end trace 0000000000000000 ]---
  RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
  Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 &lt;80&gt; 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
  RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
  RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
  RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
  RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
  R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
  R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
  FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  ----------------
  Code disassembly (best guess), 1 bytes skipped:
     0:	42 80 3c 38 00       	cmpb   $0x0,(%rax,%r15,1)
     5:	0f 85 fe 02 00 00    	jne    0x309
     b:	4d 8b a4 24 08 09 00 	mov    0x908(%r12),%r12
    12:	00
    13:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
    1a:	fc ff df
    1d:	49 8d 7c 24 28       	lea    0x28(%r12),%rdi
    22:	48 89 fa             	mov    %rdi,%rdx
    25:	48 c1 ea 03          	shr    $0x3,%rdx
  * 29:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1) &lt;-- trapping instruction
    2d:	0f 85 cc 02 00 00    	jne    0x2ff
    33:	4d 8b 7c 24 28       	mov    0x28(%r12),%r15
    38:	48                   	rex.W
    39:	8d                   	.byte 0x8d
    3a:	84 24 c8             	test   %ah,(%rax,%rcx,8)

Here with 'net.mptcp.scheduler', the 'net' structure is not really
needed, because the table-&gt;data already has a pointer to the current
scheduler, the only thing needed from the per-netns data.
Simply use 'data', instead of getting (most of the time) the same thing,
but from a longer and indirect way.

Fixes: 6963c508fd7a ("mptcp: only allow set existing scheduler for net.mptcp.scheduler")
Cc: stable@vger.kernel.org
Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Reviewed-by: Mat Martineau &lt;martineau@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-2-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Using the 'net' structure via 'current' is not recommended for different
reasons.

First, if the goal is to use it to read or write per-netns data, this is
inconsistent with how the "generic" sysctl entries are doing: directly
by only using pointers set to the table entry, e.g. table-&gt;data. Linked
to that, the per-netns data should always be obtained from the table
linked to the netns it had been created for, which may not coincide with
the reader's or writer's netns.

Another reason is that access to current-&gt;nsproxy-&gt;netns can oops if
attempted when current-&gt;nsproxy had been dropped when the current task
is exiting. This is what syzbot found, when using acct(2):

  Oops: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN PTI
  KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
  CPU: 1 UID: 0 PID: 5924 Comm: syz-executor Not tainted 6.13.0-rc5-syzkaller-00004-gccb98ccef0e5 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
  RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
  Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 &lt;80&gt; 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
  RSP: 0018:ffffc900034774e8 EFLAGS: 00010206

  RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
  RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
  RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
  R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
  R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
  FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   &lt;TASK&gt;
   proc_sys_call_handler+0x403/0x5d0 fs/proc/proc_sysctl.c:601
   __kernel_write_iter+0x318/0xa80 fs/read_write.c:612
   __kernel_write+0xf6/0x140 fs/read_write.c:632
   do_acct_process+0xcb0/0x14a0 kernel/acct.c:539
   acct_pin_kill+0x2d/0x100 kernel/acct.c:192
   pin_kill+0x194/0x7c0 fs/fs_pin.c:44
   mnt_pin_kill+0x61/0x1e0 fs/fs_pin.c:81
   cleanup_mnt+0x3ac/0x450 fs/namespace.c:1366
   task_work_run+0x14e/0x250 kernel/task_work.c:239
   exit_task_work include/linux/task_work.h:43 [inline]
   do_exit+0xad8/0x2d70 kernel/exit.c:938
   do_group_exit+0xd3/0x2a0 kernel/exit.c:1087
   get_signal+0x2576/0x2610 kernel/signal.c:3017
   arch_do_signal_or_restart+0x90/0x7e0 arch/x86/kernel/signal.c:337
   exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
   exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
   __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
   syscall_exit_to_user_mode+0x150/0x2a0 kernel/entry/common.c:218
   do_syscall_64+0xda/0x250 arch/x86/entry/common.c:89
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7fee3cb87a6a
  Code: Unable to access opcode bytes at 0x7fee3cb87a40.
  RSP: 002b:00007fffcccac688 EFLAGS: 00000202 ORIG_RAX: 0000000000000037
  RAX: 0000000000000000 RBX: 00007fffcccac710 RCX: 00007fee3cb87a6a
  RDX: 0000000000000041 RSI: 0000000000000000 RDI: 0000000000000003
  RBP: 0000000000000003 R08: 00007fffcccac6ac R09: 00007fffcccacac7
  R10: 00007fffcccac710 R11: 0000000000000202 R12: 00007fee3cd49500
  R13: 00007fffcccac6ac R14: 0000000000000000 R15: 00007fee3cd4b000
   &lt;/TASK&gt;
  Modules linked in:
  ---[ end trace 0000000000000000 ]---
  RIP: 0010:proc_scheduler+0xc6/0x3c0 net/mptcp/ctrl.c:125
  Code: 03 42 80 3c 38 00 0f 85 fe 02 00 00 4d 8b a4 24 08 09 00 00 48 b8 00 00 00 00 00 fc ff df 49 8d 7c 24 28 48 89 fa 48 c1 ea 03 &lt;80&gt; 3c 02 00 0f 85 cc 02 00 00 4d 8b 7c 24 28 48 8d 84 24 c8 00 00
  RSP: 0018:ffffc900034774e8 EFLAGS: 00010206
  RAX: dffffc0000000000 RBX: 1ffff9200068ee9e RCX: ffffc90003477620
  RDX: 0000000000000005 RSI: ffffffff8b08f91e RDI: 0000000000000028
  RBP: 0000000000000001 R08: ffffc90003477710 R09: 0000000000000040
  R10: 0000000000000040 R11: 00000000726f7475 R12: 0000000000000000
  R13: ffffc90003477620 R14: ffffc90003477710 R15: dffffc0000000000
  FS:  0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fee3cd452d8 CR3: 000000007d116000 CR4: 00000000003526f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  ----------------
  Code disassembly (best guess), 1 bytes skipped:
     0:	42 80 3c 38 00       	cmpb   $0x0,(%rax,%r15,1)
     5:	0f 85 fe 02 00 00    	jne    0x309
     b:	4d 8b a4 24 08 09 00 	mov    0x908(%r12),%r12
    12:	00
    13:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
    1a:	fc ff df
    1d:	49 8d 7c 24 28       	lea    0x28(%r12),%rdi
    22:	48 89 fa             	mov    %rdi,%rdx
    25:	48 c1 ea 03          	shr    $0x3,%rdx
  * 29:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1) &lt;-- trapping instruction
    2d:	0f 85 cc 02 00 00    	jne    0x2ff
    33:	4d 8b 7c 24 28       	mov    0x28(%r12),%r15
    38:	48                   	rex.W
    39:	8d                   	.byte 0x8d
    3a:	84 24 c8             	test   %ah,(%rax,%rcx,8)

Here with 'net.mptcp.scheduler', the 'net' structure is not really
needed, because the table-&gt;data already has a pointer to the current
scheduler, the only thing needed from the per-netns data.
Simply use 'data', instead of getting (most of the time) the same thing,
but from a longer and indirect way.

Fixes: 6963c508fd7a ("mptcp: only allow set existing scheduler for net.mptcp.scheduler")
Cc: stable@vger.kernel.org
Reported-by: syzbot+e364f774c6f57f2c86d1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/67769ecb.050a0220.3a8527.003f.GAE@google.com
Suggested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Reviewed-by: Mat Martineau &lt;martineau@kernel.org&gt;
Signed-off-by: Matthieu Baerts (NGI0) &lt;matttbe@kernel.org&gt;
Link: https://patch.msgid.link/20250108-net-sysctl-current-nsproxy-v1-2-5df34b2083e8@kernel.org
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
