<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ipv4/fib_semantics.c, branch v6.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>ipv4: remove fib_info_devhash[]</title>
<updated>2024-10-07T23:46:27+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2024-10-04T13:47:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a3f5f4c2f9b6bc2aa6f5a3e8e23b7519e4f2e3e3'/>
<id>a3f5f4c2f9b6bc2aa6f5a3e8e23b7519e4f2e3e3</id>
<content type='text'>
Upcoming per-netns RTNL conversion needs to get rid
of shared hash tables.

fib_info_devhash[] is one of them.

It is unclear why we used a hash table, because
a single hlist_head per net device was cheaper and scalable.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20241004134720.579244-5-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Upcoming per-netns RTNL conversion needs to get rid
of shared hash tables.

fib_info_devhash[] is one of them.

It is unclear why we used a hash table, because
a single hlist_head per net device was cheaper and scalable.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20241004134720.579244-5-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: remove fib_info_lock</title>
<updated>2024-10-07T23:38:58+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2024-10-04T13:47:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=143ca845ec0c625c410768c36e1a949ef4ed1915'/>
<id>143ca845ec0c625c410768c36e1a949ef4ed1915</id>
<content type='text'>
After the prior patch, fib_info_lock became redundant
because all of its users are holding RTNL.

BH protection is not needed.

Remove the READ_ONCE()/WRITE_ONCE() annotations around fib_info_cnt,
since it is protected by RTNL.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20241004134720.579244-4-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After the prior patch, fib_info_lock became redundant
because all of its users are holding RTNL.

BH protection is not needed.

Remove the READ_ONCE()/WRITE_ONCE() annotations around fib_info_cnt,
since it is protected by RTNL.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20241004134720.579244-4-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: use rcu in ip_fib_check_default()</title>
<updated>2024-10-07T23:38:58+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2024-10-04T13:47:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fc38b28365e5f1396209d2878a34065468765087'/>
<id>fc38b28365e5f1396209d2878a34065468765087</id>
<content type='text'>
fib_info_devhash[] is not resized in fib_info_hash_move().

fib_nh structs are already freed after an rcu grace period.

This will allow to remove fib_info_lock in the following patch.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20241004134720.579244-3-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fib_info_devhash[] is not resized in fib_info_hash_move().

fib_nh structs are already freed after an rcu grace period.

This will allow to remove fib_info_lock in the following patch.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20241004134720.579244-3-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: remove fib_devindex_hashfn()</title>
<updated>2024-10-07T23:38:58+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2024-10-04T13:47:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8a0f62fdeb9ea66ad3d0e959c7c4addbabeac1be'/>
<id>8a0f62fdeb9ea66ad3d0e959c7c4addbabeac1be</id>
<content type='text'>
fib_devindex_hashfn() converts a 32bit ifindex value to a 8bit hash.

It makes no sense doing this from fib_info_hashfn() and
fib_find_info_nh().

It is better to keep as many bits as possible to let
fib_info_hashfn_result() have better spread.

Only fib_info_devhash_bucket() needs to make this operation,
we can 'inline' trivial fib_devindex_hashfn() in it.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20241004134720.579244-2-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fib_devindex_hashfn() converts a 32bit ifindex value to a 8bit hash.

It makes no sense doing this from fib_info_hashfn() and
fib_find_info_nh().

It is better to keep as many bits as possible to let
fib_info_hashfn_result() have better spread.

Only fib_info_devhash_bucket() needs to make this operation,
we can 'inline' trivial fib_devindex_hashfn() in it.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20241004134720.579244-2-edumazet@google.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: avoid quadratic behavior in FIB insertion of common address</title>
<updated>2024-10-03T23:15:30+00:00</updated>
<author>
<name>Alexandre Ferrieux</name>
<email>alexandre.ferrieux@gmail.com</email>
</author>
<published>2024-10-01T23:14:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9b8ca04854fd1253a58aeb1bd089c191cb5a074c'/>
<id>9b8ca04854fd1253a58aeb1bd089c191cb5a074c</id>
<content type='text'>
Mix netns into all IPv4 FIB hashes to avoid massive collision when
inserting the same address in many netns.

Signed-off-by: Alexandre Ferrieux &lt;alexandre.ferrieux@orange.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Link: https://patch.msgid.link/20241001231438.3855035-1-alexandre.ferrieux@orange.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Mix netns into all IPv4 FIB hashes to avoid massive collision when
inserting the same address in many netns.

Signed-off-by: Alexandre Ferrieux &lt;alexandre.ferrieux@orange.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reviewed-by: Kuniyuki Iwashima &lt;kuniyu@amazon.com&gt;
Link: https://patch.msgid.link/20241001231438.3855035-1-alexandre.ferrieux@orange.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: delete redundant judgment statements</title>
<updated>2024-08-23T13:27:45+00:00</updated>
<author>
<name>Li Zetao</name>
<email>lizetao1@huawei.com</email>
</author>
<published>2024-08-22T04:32:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4c180887775f3e3fb46b9e447cf3ea539286919d'/>
<id>4c180887775f3e3fb46b9e447cf3ea539286919d</id>
<content type='text'>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.

Signed-off-by: Li Zetao &lt;lizetao1@huawei.com&gt;
Reviewed-by: Petr Machata &lt;petrm@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The initial value of err is -ENOBUFS, and err is guaranteed to be
less than 0 before all goto errout. Therefore, on the error path
of errout, there is no need to repeatedly judge that err is less than 0,
and delete redundant judgments to make the code more concise.

Signed-off-by: Li Zetao &lt;lizetao1@huawei.com&gt;
Reviewed-by: Petr Machata &lt;petrm@nvidia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: Centralize TOS matching</title>
<updated>2024-08-20T12:57:08+00:00</updated>
<author>
<name>Ido Schimmel</name>
<email>idosch@nvidia.com</email>
</author>
<published>2024-08-14T12:52:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1fa3314c14c6a20d098991a0a6980f9b18b2f930'/>
<id>1fa3314c14c6a20d098991a0a6980f9b18b2f930</id>
<content type='text'>
The TOS field in the IPv4 flow information structure ('flowi4_tos') is
matched by the kernel against the TOS selector in IPv4 rules and routes.
The field is initialized differently by different call sites. Some treat
it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as
RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC
791 TOS and initialize it using IPTOS_RT_MASK.

What is common to all these call sites is that they all initialize the
lower three DSCP bits, which fits the TOS definition in the initial IPv4
specification (RFC 791).

Therefore, the kernel only allows configuring IPv4 FIB rules that match
on the lower three DSCP bits which are always guaranteed to be
initialized by all call sites:

 # ip -4 rule add tos 0x1c table 100
 # ip -4 rule add tos 0x3c table 100
 Error: Invalid tos.

While this works, it is unlikely to be very useful. RFC 791 that
initially defined the TOS and IP precedence fields was updated by RFC
2474 over twenty five years ago where these fields were replaced by a
single six bits DSCP field.

Extending FIB rules to match on DSCP can be done by adding a new DSCP
selector while maintaining the existing semantics of the TOS selector
for applications that rely on that.

A prerequisite for allowing FIB rules to match on DSCP is to adjust all
the call sites to initialize the high order DSCP bits and remove their
masking along the path to the core where the field is matched on.

However, making this change alone will result in a behavior change. For
example, a forwarded IPv4 packet with a DS field of 0xfc will no longer
match a FIB rule that was configured with 'tos 0x1c'.

This behavior change can be avoided by masking the upper three DSCP bits
in 'flowi4_tos' before comparing it against the TOS selectors in FIB
rules and routes.

Implement the above by adding a new function that checks whether a given
DSCP value matches the one specified in the IPv4 flow information
structure and invoke it from the three places that currently match on
'flowi4_tos'.

Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK
since the latter is not uAPI and we should be able to remove it at some
point.

Include &lt;linux/ip.h&gt; in &lt;linux/in_route.h&gt; since the former defines
IPTOS_TOS_MASK which is used in the definition of RT_TOS() in
&lt;linux/in_route.h&gt;.

No regressions in FIB tests:

 # ./fib_tests.sh
 [...]
 Tests passed: 218
 Tests failed:   0

And FIB rule tests:

 # ./fib_rule_tests.sh
 [...]
 Tests passed: 116
 Tests failed:   0

Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The TOS field in the IPv4 flow information structure ('flowi4_tos') is
matched by the kernel against the TOS selector in IPv4 rules and routes.
The field is initialized differently by different call sites. Some treat
it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as
RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC
791 TOS and initialize it using IPTOS_RT_MASK.

What is common to all these call sites is that they all initialize the
lower three DSCP bits, which fits the TOS definition in the initial IPv4
specification (RFC 791).

Therefore, the kernel only allows configuring IPv4 FIB rules that match
on the lower three DSCP bits which are always guaranteed to be
initialized by all call sites:

 # ip -4 rule add tos 0x1c table 100
 # ip -4 rule add tos 0x3c table 100
 Error: Invalid tos.

While this works, it is unlikely to be very useful. RFC 791 that
initially defined the TOS and IP precedence fields was updated by RFC
2474 over twenty five years ago where these fields were replaced by a
single six bits DSCP field.

Extending FIB rules to match on DSCP can be done by adding a new DSCP
selector while maintaining the existing semantics of the TOS selector
for applications that rely on that.

A prerequisite for allowing FIB rules to match on DSCP is to adjust all
the call sites to initialize the high order DSCP bits and remove their
masking along the path to the core where the field is matched on.

However, making this change alone will result in a behavior change. For
example, a forwarded IPv4 packet with a DS field of 0xfc will no longer
match a FIB rule that was configured with 'tos 0x1c'.

This behavior change can be avoided by masking the upper three DSCP bits
in 'flowi4_tos' before comparing it against the TOS selectors in FIB
rules and routes.

Implement the above by adding a new function that checks whether a given
DSCP value matches the one specified in the IPv4 flow information
structure and invoke it from the three places that currently match on
'flowi4_tos'.

Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK
since the latter is not uAPI and we should be able to remove it at some
point.

Include &lt;linux/ip.h&gt; in &lt;linux/in_route.h&gt; since the former defines
IPTOS_TOS_MASK which is used in the definition of RT_TOS() in
&lt;linux/in_route.h&gt;.

No regressions in FIB tests:

 # ./fib_tests.sh
 [...]
 Tests passed: 218
 Tests failed:   0

And FIB rule tests:

 # ./fib_rule_tests.sh
 [...]
 Tests passed: 116
 Tests failed:   0

Signed-off-by: Ido Schimmel &lt;idosch@nvidia.com&gt;
Signed-off-by: Paolo Abeni &lt;pabeni@redhat.com&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2024-07-15T20:19:17+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>kuba@kernel.org</email>
</author>
<published>2024-07-15T20:05:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=51b35d4f9d8860b7c181e545a90bc0ad370efbdb'/>
<id>51b35d4f9d8860b7c181e545a90bc0ad370efbdb</id>
<content type='text'>
Merge in late fixes to prepare for the 6.11 net-next PR.

Conflicts:
  93c3a96c301f ("net: pse-pd: Do not return EOPNOSUPP if config is null")
  4cddb0f15ea9 ("net: ethtool: pse-pd: Fix possible null-deref")
  30d7b6727724 ("net: ethtool: Add new power limit get and set features")
https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge in late fixes to prepare for the 6.11 net-next PR.

Conflicts:
  93c3a96c301f ("net: pse-pd: Do not return EOPNOSUPP if config is null")
  4cddb0f15ea9 ("net: ethtool: pse-pd: Fix possible null-deref")
  30d7b6727724 ("net: ethtool: Add new power limit get and set features")
https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/

Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: fix source address selection with route leak</title>
<updated>2024-07-14T14:34:15+00:00</updated>
<author>
<name>Nicolas Dichtel</name>
<email>nicolas.dichtel@6wind.com</email>
</author>
<published>2024-07-10T08:14:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6807352353561187a718e87204458999dbcbba1b'/>
<id>6807352353561187a718e87204458999dbcbba1b</id>
<content type='text'>
By default, an address assigned to the output interface is selected when
the source address is not specified. This is problematic when a route,
configured in a vrf, uses an interface from another vrf (aka route leak).
The original vrf does not own the selected source address.

Let's add a check against the output interface and call the appropriate
function to select the source address.

CC: stable@vger.kernel.org
Fixes: 8cbb512c923d ("net: Add source address lookup op for VRF")
Signed-off-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20240710081521.3809742-2-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
By default, an address assigned to the output interface is selected when
the source address is not specified. This is problematic when a route,
configured in a vrf, uses an interface from another vrf (aka route leak).
The original vrf does not own the selected source address.

Let's add a check against the output interface and call the appropriate
function to select the source address.

CC: stable@vger.kernel.org
Fixes: 8cbb512c923d ("net: Add source address lookup op for VRF")
Signed-off-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Reviewed-by: David Ahern &lt;dsahern@kernel.org&gt;
Link: https://patch.msgid.link/20240710081521.3809742-2-nicolas.dichtel@6wind.com
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: remove NULL-pointer net parameter in ip_metrics_convert</title>
<updated>2024-06-05T09:06:00+00:00</updated>
<author>
<name>Jason Xing</name>
<email>kernelxing@tencent.com</email>
</author>
<published>2024-05-31T15:46:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=61e2bbafb00e4b9a5de45e6448a7b6b818658576'/>
<id>61e2bbafb00e4b9a5de45e6448a7b6b818658576</id>
<content type='text'>
When I was doing some experiments, I found that when using the first
parameter, namely, struct net, in ip_metrics_convert() always triggers NULL
pointer crash. Then I digged into this part, realizing that we can remove
this one due to its uselessness.

Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When I was doing some experiments, I found that when using the first
parameter, namely, struct net, in ip_metrics_convert() always triggers NULL
pointer crash. Then I digged into this part, realizing that we can remove
this one due to its uselessness.

Signed-off-by: Jason Xing &lt;kernelxing@tencent.com&gt;
Reviewed-by: Simon Horman &lt;horms@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
