<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/ipv4/fib_semantics.c, branch linux-4.0.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>fib_trie: Move fib_find_alias to file where it is used</title>
<updated>2015-01-25T22:47:16+00:00</updated>
<author>
<name>Alexander Duyck</name>
<email>alexander.h.duyck@redhat.com</email>
</author>
<published>2015-01-22T23:51:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=02525368f48c197bce6e4251ff7bde92fa6f026e'/>
<id>02525368f48c197bce6e4251ff7bde92fa6f026e</id>
<content type='text'>
The function fib_find_alias is only accessed by functions in fib_trie.c as
such it makes sense to relocate it and cast it as static so that the
compiler can take advantage of optimizations it can do to it as a local
function.

Signed-off-by: Alexander Duyck &lt;alexander.h.duyck@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The function fib_find_alias is only accessed by functions in fib_trie.c as
such it makes sense to relocate it and cast it as static so that the
compiler can take advantage of optimizations it can do to it as a local
function.

Signed-off-by: Alexander Duyck &lt;alexander.h.duyck@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netlink: make nlmsg_end() and genlmsg_end() void</title>
<updated>2015-01-18T06:03:45+00:00</updated>
<author>
<name>Johannes Berg</name>
<email>johannes.berg@intel.com</email>
</author>
<published>2015-01-16T21:09:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=053c095a82cf773075e83d7233b5cc19a1f73ece'/>
<id>053c095a82cf773075e83d7233b5cc19a1f73ece</id>
<content type='text'>
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

  if (genlmsg_end(...) &lt; 0) { ... }

be a whole bunch of dead code. Many places also simply do

  return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

  if (my_function(...))
    /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb-&gt;len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

-	return nlmsg_end(...);
+	nlmsg_end(...);
+	return 0;

I could have preserved all the function's return values by returning
skb-&gt;len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with &lt;= 0 in dump functionality, but that could just
be changed to &lt; 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for &lt;0 or &lt;=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.

Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

  if (genlmsg_end(...) &lt; 0) { ... }

be a whole bunch of dead code. Many places also simply do

  return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

  if (my_function(...))
    /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb-&gt;len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

-	return nlmsg_end(...);
+	nlmsg_end(...);
+	return 0;

I could have preserved all the function's return values by returning
skb-&gt;len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with &lt;= 0 in dump functionality, but that could just
be changed to &lt; 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for &lt;0 or &lt;=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.

Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: tcp: add RTAX_CC_ALGO fib handling</title>
<updated>2015-01-06T03:55:24+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>dborkman@redhat.com</email>
</author>
<published>2015-01-05T22:57:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ea697639992d96da98016b8934e68a73876a2264'/>
<id>ea697639992d96da98016b8934e68a73876a2264</id>
<content type='text'>
This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
control metric to be set up and dumped back to user space.

While the internal representation of RTAX_CC_ALGO is handled as a u32
key, we avoided to expose this implementation detail to user space, thus
instead, we chose the netlink attribute that is being exchanged between
user space to be the actual congestion control algorithm name, similarly
as in the setsockopt(2) API in order to allow for maximum flexibility,
even for 3rd party modules.

It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as
it should have been stored in RTAX_FEATURES instead, we first thought
about reusing it for the congestion control key, but it brings more
complications and/or confusion than worth it.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
control metric to be set up and dumped back to user space.

While the internal representation of RTAX_CC_ALGO is handled as a u32
key, we avoided to expose this implementation detail to user space, thus
instead, we chose the netlink attribute that is being exchanged between
user space to be the actual congestion control algorithm name, similarly
as in the setsockopt(2) API in order to allow for maximum flexibility,
even for 3rd party modules.

It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as
it should have been stored in RTAX_FEATURES instead, we first thought
about reusing it for the congestion control key, but it brings more
complications and/or confusion than worth it.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: fix nexthop attlen check in fib_nh_match</title>
<updated>2014-10-14T19:59:37+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@resnulli.us</email>
</author>
<published>2014-10-13T14:34:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f76936d07c4eeb36d8dbb64ebd30ab46ff85d9f7'/>
<id>f76936d07c4eeb36d8dbb64ebd30ab46ff85d9f7</id>
<content type='text'>
fib_nh_match does not match nexthops correctly. Example:

ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
                          nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
                          nexthop via 192.168.122.15 dev eth0

Del command is successful and route is removed. After this patch
applied, the route is correctly matched and result is:
RTNETLINK answers: No such process

Please consider this for stable trees as well.

Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fib_nh_match does not match nexthops correctly. Example:

ip route add 172.16.10/24 nexthop via 192.168.122.12 dev eth0 \
                          nexthop via 192.168.122.13 dev eth0
ip route del 172.16.10/24 nexthop via 192.168.122.14 dev eth0 \
                          nexthop via 192.168.122.15 dev eth0

Del command is successful and route is removed. After this patch
applied, the route is correctly matched and result is:
RTNETLINK answers: No such process

Please consider this for stable trees as well.

Fixes: 4e902c57417c4 ("[IPv4]: FIB configuration using struct fib_config")
Signed-off-by: Jiri Pirko &lt;jiri@resnulli.us&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: fix a race in update_or_create_fnhe()</title>
<updated>2014-09-06T00:15:50+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-09-04T05:21:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=caa415270c732505240bb60171c44a7838c555e8'/>
<id>caa415270c732505240bb60171c44a7838c555e8</id>
<content type='text'>
nh_exceptions is effectively used under rcu, but lacks proper
barriers. Between kzalloc() and setting of nh-&gt;nh_exceptions(),
we need a proper memory barrier.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: 4895c771c7f00 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
nh_exceptions is effectively used under rcu, but lacks proper
barriers. Between kzalloc() and setting of nh-&gt;nh_exceptions(),
we need a proper memory barrier.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: 4895c771c7f00 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation</title>
<updated>2014-05-07T21:14:32+00:00</updated>
<author>
<name>Sergey Popovich</name>
<email>popovich_sergei@mail.ru</email>
</author>
<published>2014-05-06T15:23:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=aeefa1ecfc799b0ea2c4979617f14cecd5cccbfd'/>
<id>aeefa1ecfc799b0ea2c4979617f14cecd5cccbfd</id>
<content type='text'>
Increment fib_info_cnt in fib_create_info() right after successfuly
alllocating fib_info structure, overwise fib_metrics allocation failure
leads to fib_info_cnt incorrectly decremented in free_fib_info(), called
on error path from fib_create_info().

Signed-off-by: Sergey Popovich &lt;popovich_sergei@mail.ru&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Increment fib_info_cnt in fib_create_info() right after successfuly
alllocating fib_info structure, overwise fib_metrics allocation failure
leads to fib_info_cnt incorrectly decremented in free_fib_info(), called
on error path from fib_create_info().

Signed-off-by: Sergey Popovich &lt;popovich_sergei@mail.ru&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif</title>
<updated>2014-04-16T19:05:11+00:00</updated>
<author>
<name>Cong Wang</name>
<email>cwang@twopensource.com</email>
</author>
<published>2014-04-15T23:25:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6a662719c9868b3d6c7d26b3a085f0cd3cc15e64'/>
<id>6a662719c9868b3d6c7d26b3a085f0cd3cc15e64</id>
<content type='text'>
As suggested by Julian:

	Simply, flowi4_iif must not contain 0, it does not
	look logical to ignore all ip rules with specified iif.

because in fib_rule_match() we do:

        if (rule-&gt;iifindex &amp;&amp; (rule-&gt;iifindex != fl-&gt;flowi_iif))
                goto out;

flowi4_iif should be LOOPBACK_IFINDEX by default.

We need to move LOOPBACK_IFINDEX to include/net/flow.h:

1) It is mostly used by flowi_iif

2) Fix the following compile error if we use it in flow.h
by the patches latter:

In file included from include/linux/netfilter.h:277:0,
                 from include/net/netns/netfilter.h:5,
                 from include/net/net_namespace.h:21,
                 from include/linux/netdevice.h:43,
                 from include/linux/icmpv6.h:12,
                 from include/linux/ipv6.h:61,
                 from include/net/ipv6.h:16,
                 from include/linux/sunrpc/clnt.h:27,
                 from include/linux/nfs_fs.h:30,
                 from init/do_mounts.c:32:
include/net/flow.h: In function ‘flowi4_init_output’:
include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)

Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Julian Anastasov &lt;ja@ssi.bg&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Cong Wang &lt;cwang@twopensource.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As suggested by Julian:

	Simply, flowi4_iif must not contain 0, it does not
	look logical to ignore all ip rules with specified iif.

because in fib_rule_match() we do:

        if (rule-&gt;iifindex &amp;&amp; (rule-&gt;iifindex != fl-&gt;flowi_iif))
                goto out;

flowi4_iif should be LOOPBACK_IFINDEX by default.

We need to move LOOPBACK_IFINDEX to include/net/flow.h:

1) It is mostly used by flowi_iif

2) Fix the following compile error if we use it in flow.h
by the patches latter:

In file included from include/linux/netfilter.h:277:0,
                 from include/net/netns/netfilter.h:5,
                 from include/net/net_namespace.h:21,
                 from include/linux/netdevice.h:43,
                 from include/linux/icmpv6.h:12,
                 from include/linux/ipv6.h:61,
                 from include/net/ipv6.h:16,
                 from include/linux/sunrpc/clnt.h:27,
                 from include/linux/nfs_fs.h:30,
                 from init/do_mounts.c:32:
include/net/flow.h: In function ‘flowi4_init_output’:
include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)

Cc: Eric Biederman &lt;ebiederm@xmission.com&gt;
Cc: Julian Anastasov &lt;ja@ssi.bg&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: Cong Wang &lt;cwang@twopensource.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: make fib_detect_death static</title>
<updated>2013-12-28T22:01:46+00:00</updated>
<author>
<name>Stephen Hemminger</name>
<email>stephen@networkplumber.org</email>
</author>
<published>2013-12-28T19:05:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c9cb6b6ec18ca91d090178d3c6d1dde7428c791b'/>
<id>c9cb6b6ec18ca91d090178d3c6d1dde7428c791b</id>
<content type='text'>
Make fib_detect_death function static only used in one file.

Signed-off-by: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make fib_detect_death function static only used in one file.

Signed-off-by: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>fib: Use const struct nl_info * in rtmsg_fib</title>
<updated>2013-10-18T18:42:15+00:00</updated>
<author>
<name>Joe Perches</name>
<email>joe@perches.com</email>
</author>
<published>2013-10-17T20:34:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9877b25382e770618d0a36a3024d8a3c67eee9ea'/>
<id>9877b25382e770618d0a36a3024d8a3c67eee9ea</id>
<content type='text'>
The rtmsg_fib function doesn't modify this argument so mark
it const.

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The rtmsg_fib function doesn't modify this argument so mark
it const.

Signed-off-by: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: use next hop exceptions also for input routes</title>
<updated>2013-06-29T04:27:47+00:00</updated>
<author>
<name>Timo Teräs</name>
<email>timo.teras@iki.fi</email>
</author>
<published>2013-06-27T07:27:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2ffae99d1fac272952b5a395759823717760ce37'/>
<id>2ffae99d1fac272952b5a395759823717760ce37</id>
<content type='text'>
Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
assmued that "locally destined, and routed packets, never trigger
PMTU events or redirects that will be processed by us".

However, it seems that tunnel devices do trigger PMTU events in certain
cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
skb_dst(skb)-&gt;ops-&gt;update_pmtu to propage mtu information from the
outer flows. These can cause the inner flow mtu to be decreased. If
next hop exceptions are not consulted for pmtu, IP fragmentation will
not be done properly for these routes.

It also seems that we really need to have the PMTU information always
for netfilter TCPMSS clamp-to-pmtu feature to work properly.

So for the time being, cache separate copies of input routes for
each next hop exception.

Signed-off-by: Timo Teräs &lt;timo.teras@iki.fi&gt;
Reviewed-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
assmued that "locally destined, and routed packets, never trigger
PMTU events or redirects that will be processed by us".

However, it seems that tunnel devices do trigger PMTU events in certain
cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
skb_dst(skb)-&gt;ops-&gt;update_pmtu to propage mtu information from the
outer flows. These can cause the inner flow mtu to be decreased. If
next hop exceptions are not consulted for pmtu, IP fragmentation will
not be done properly for these routes.

It also seems that we really need to have the PMTU information always
for netfilter TCPMSS clamp-to-pmtu feature to work properly.

So for the time being, cache separate copies of input routes for
each next hop exception.

Signed-off-by: Timo Teräs &lt;timo.teras@iki.fi&gt;
Reviewed-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
