<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ipv6/ip6_fib.c, branch v4.2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>ipv6: Fix a potential deadlock when creating pcpu rt</title>
<updated>2015-08-17T21:28:03+00:00</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2015-08-14T18:05:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9c7370a166b4e157137bfbfe2ad296d57147547c'/>
<id>9c7370a166b4e157137bfbfe2ad296d57147547c</id>
<content type='text'>
rt6_make_pcpu_route() is called under read_lock(&amp;table-&gt;tb6_lock).
rt6_make_pcpu_route() calls ip6_rt_pcpu_alloc(rt), which then
calls dst_alloc().  dst_alloc() _may_ call ip6_dst_gc(), which takes
the write_lock(&amp;table-&gt;tb6_lock).  A visualized version:

read_lock(&amp;table-&gt;tb6_lock);
rt6_make_pcpu_route();
=&gt; ip6_rt_pcpu_alloc();
=&gt; dst_alloc();
=&gt; ip6_dst_gc();
=&gt; write_lock(&amp;table-&gt;tb6_lock); /* oops */

The fix is to call read_unlock() before calling ip6_rt_pcpu_alloc(), so that
a GC triggered by dst_alloc() can take the write lock.

A reported stack:

[141625.537638] INFO: rcu_sched self-detected stall on CPU { 27}  (t=60000 jiffies g=4159086 c=4159085 q=2139)
[141625.547469] Task dump for CPU 27:
[141625.550881] mtr             R  running task        0 22121  22081 0x00000008
[141625.558069]  0000000000000000 ffff88103f363d98 ffffffff8106e488 000000000000001b
[141625.565641]  ffffffff81684900 ffff88103f363db8 ffffffff810702b0 0000000008000000
[141625.573220]  ffffffff81684900 ffff88103f363de8 ffffffff8108df9f ffff88103f375a00
[141625.580803] Call Trace:
[141625.583345]  &lt;IRQ&gt;  [&lt;ffffffff8106e488&gt;] sched_show_task+0xc1/0xc6
[141625.589650]  [&lt;ffffffff810702b0&gt;] dump_cpu_task+0x35/0x39
[141625.595144]  [&lt;ffffffff8108df9f&gt;] rcu_dump_cpu_stacks+0x6a/0x8c
[141625.601320]  [&lt;ffffffff81090606&gt;] rcu_check_callbacks+0x1f6/0x5d4
[141625.607669]  [&lt;ffffffff810940c8&gt;] update_process_times+0x2a/0x4f
[141625.613925]  [&lt;ffffffff8109fbee&gt;] tick_sched_handle+0x32/0x3e
[141625.619923]  [&lt;ffffffff8109fc2f&gt;] tick_sched_timer+0x35/0x5c
[141625.625830]  [&lt;ffffffff81094a1f&gt;] __hrtimer_run_queues+0x8f/0x18d
[141625.632171]  [&lt;ffffffff81094c9e&gt;] hrtimer_interrupt+0xa0/0x166
[141625.638258]  [&lt;ffffffff8102bf2a&gt;] local_apic_timer_interrupt+0x4e/0x52
[141625.645036]  [&lt;ffffffff8102c36f&gt;] smp_apic_timer_interrupt+0x39/0x4a
[141625.651643]  [&lt;ffffffff8140b9e8&gt;] apic_timer_interrupt+0x68/0x70
[141625.657895]  &lt;EOI&gt;  [&lt;ffffffff81346ee8&gt;] ? dst_destroy+0x7c/0xb5
[141625.664188]  [&lt;ffffffff813d45b5&gt;] ? fib6_flush_trees+0x20/0x20
[141625.670272]  [&lt;ffffffff81082b45&gt;] ? queue_write_lock_slowpath+0x60/0x6f
[141625.677140]  [&lt;ffffffff8140aa33&gt;] _raw_write_lock_bh+0x23/0x25
[141625.683218]  [&lt;ffffffff813d4553&gt;] __fib6_clean_all+0x40/0x82
[141625.689124]  [&lt;ffffffff813d45b5&gt;] ? fib6_flush_trees+0x20/0x20
[141625.695207]  [&lt;ffffffff813d6058&gt;] fib6_clean_all+0xe/0x10
[141625.700854]  [&lt;ffffffff813d60d3&gt;] fib6_run_gc+0x79/0xc8
[141625.706329]  [&lt;ffffffff813d0510&gt;] ip6_dst_gc+0x85/0xf9
[141625.711718]  [&lt;ffffffff81346d68&gt;] dst_alloc+0x55/0x159
[141625.717105]  [&lt;ffffffff813d09b5&gt;] __ip6_dst_alloc.isra.32+0x19/0x63
[141625.723620]  [&lt;ffffffff813d1830&gt;] ip6_pol_route+0x36a/0x3e8
[141625.729441]  [&lt;ffffffff813d18d6&gt;] ip6_pol_route_output+0x11/0x13
[141625.735700]  [&lt;ffffffff813f02c8&gt;] fib6_rule_action+0xa7/0x1bf
[141625.741698]  [&lt;ffffffff813d18c5&gt;] ? ip6_pol_route_input+0x17/0x17
[141625.748043]  [&lt;ffffffff81357c48&gt;] fib_rules_lookup+0xb5/0x12a
[141625.754050]  [&lt;ffffffff81141628&gt;] ? poll_select_copy_remaining+0xf9/0xf9
[141625.761002]  [&lt;ffffffff813f0535&gt;] fib6_rule_lookup+0x37/0x5c
[141625.766914]  [&lt;ffffffff813d18c5&gt;] ? ip6_pol_route_input+0x17/0x17
[141625.773260]  [&lt;ffffffff813d008c&gt;] ip6_route_output+0x7a/0x82
[141625.779177]  [&lt;ffffffff813c44c8&gt;] ip6_dst_lookup_tail+0x53/0x112
[141625.785437]  [&lt;ffffffff813c45c3&gt;] ip6_dst_lookup_flow+0x2a/0x6b
[141625.791604]  [&lt;ffffffff813ddaab&gt;] rawv6_sendmsg+0x407/0x9b6
[141625.797423]  [&lt;ffffffff813d7914&gt;] ? do_ipv6_setsockopt.isra.8+0xd87/0xde2
[141625.804464]  [&lt;ffffffff8139d4b4&gt;] inet_sendmsg+0x57/0x8e
[141625.810028]  [&lt;ffffffff81329ba3&gt;] sock_sendmsg+0x2e/0x3c
[141625.815588]  [&lt;ffffffff8132be57&gt;] SyS_sendto+0xfe/0x143
[141625.821063]  [&lt;ffffffff813dd551&gt;] ? rawv6_setsockopt+0x5e/0x67
[141625.827146]  [&lt;ffffffff8132c9f8&gt;] ? sock_common_setsockopt+0xf/0x11
[141625.833660]  [&lt;ffffffff8132c08c&gt;] ? SyS_setsockopt+0x81/0xa2
[141625.839565]  [&lt;ffffffff8140ac17&gt;] entry_SYSCALL_64_fastpath+0x12/0x6a

Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info")
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
CC: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Reported-by: Steinar H. Gunderson &lt;sgunderson@bigfoot.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: Create percpu rt6_info</title>
<updated>2015-05-25T17:25:35+00:00</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2015-05-23T03:56:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d52d3997f843ffefaa8d8462790ffcaca6c74192'/>
<id>d52d3997f843ffefaa8d8462790ffcaca6c74192</id>
<content type='text'>
After the patch
'ipv6: Only create RTF_CACHE routes after encountering pmtu exception',
we need to compensate for the performance hit (bouncing dst-&gt;__refcnt).

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Cc: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: Only create RTF_CACHE routes after encountering pmtu exception</title>
<updated>2015-05-25T17:25:33+00:00</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2015-05-23T03:56:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=45e4fd26683c9a5f88600d91b08a484f7f09226a'/>
<id>45e4fd26683c9a5f88600d91b08a484f7f09226a</id>
<content type='text'>
This patch creates RTF_CACHE routes only after encountering a pmtu
exception.

After ip6_rt_update_pmtu() has inserted the RTF_CACHE route into the fib6
tree, rt-&gt;rt6i_node-&gt;fn_sernum is bumped, which makes ip6_dst_check()
fail and triggers a relookup.

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Cc: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Cc: Steffen Klassert &lt;steffen.klassert@secunet.com&gt;
Cc: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: fix ECMP route replacement</title>
<updated>2015-05-20T16:02:26+00:00</updated>
<author>
<name>Michal Kubeček</name>
<email>mkubecek@suse.cz</email>
</author>
<published>2015-05-18T18:54:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=27596472473a02cfef2908a6bcda7e55264ba6b7'/>
<id>27596472473a02cfef2908a6bcda7e55264ba6b7</id>
<content type='text'>
When replacing an IPv6 multipath route with "ip route replace", i.e.
NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only the first
matching route without fixing its siblings, resulting in a corrupted
siblings linked list; removing one of the siblings can then end in an
infinite loop.

The IPv6 ECMP implementation differs from the IPv4 one, so route
replacement cannot work in exactly the same way. This should be a
reasonable approximation:

1. If the new route is ECMP-able and there is a matching ECMP-able one
already, replace it and all its siblings (if any).

2. If the new route is ECMP-able and no matching ECMP-able route exists,
replace the first matching non-ECMP-able route (if any) or just add the
new one.

3. If the new route is not ECMP-able, replace the first matching
non-ECMP-able route (if any) or add the new route.

We also need to remove the NLM_F_REPLACE flag after replacing the old
route(s) with the first nexthop of an ECMP route, so that each subsequent
nexthop does not replace the previous one.

Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)")
Signed-off-by: Michal Kubecek &lt;mkubecek@suse.cz&gt;
Acked-by: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: coding style: comparison for inequality with NULL</title>
<updated>2015-03-31T17:51:54+00:00</updated>
<author>
<name>Ian Morris</name>
<email>ipm@chirality.org.uk</email>
</author>
<published>2015-03-29T13:00:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=53b24b8f94cb15e38e332db82177cf3f0f4df0c5'/>
<id>53b24b8f94cb15e38e332db82177cf3f0f4df0c5</id>
<content type='text'>
The ipv6 code uses a mixture of coding styles. In some instances the check
for a NULL pointer is done as x != NULL and sometimes as x. The latter is
preferred according to checkpatch, and this patch makes the code consistent
by adopting it.

No changes detected by objdiff.

Signed-off-by: Ian Morris &lt;ipm@chirality.org.uk&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-01-28T00:59:56+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-01-28T00:59:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=95f873f2fff96c592c5d863e2a39825bd8bf0500'/>
<id>95f873f2fff96c592c5d863e2a39825bd8bf0500</id>
<content type='text'>
Conflicts:
	arch/arm/boot/dts/imx6sx-sdb.dts
	net/sched/cls_bpf.c

Two simple sets of overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too</title>
<updated>2015-01-27T08:22:14+00:00</updated>
<author>
<name>Hannes Frederic Sowa</name>
<email>hannes@stressinduktion.org</email>
</author>
<published>2015-01-26T14:11:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6e9e16e6143b725662e47026a1d0f270721cdd24'/>
<id>6e9e16e6143b725662e47026a1d0f270721cdd24</id>
<content type='text'>
Lubomir Rintel reported that when a route is replaced, the interface
reference counter isn't correctly decremented.

To quote bug &lt;https://bugzilla.kernel.org/show_bug.cgi?id=91941&gt;:
| [root@rhel7-5 lkundrak]# sh -x lal
| + ip link add dev0 type dummy
| + ip link set dev0 up
| + ip link add dev1 type dummy
| + ip link set dev1 up
| + ip addr add 2001:db8:8086::2/64 dev dev0
| + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
| + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
| + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
| + ip link del dev0 type dummy
| Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
|  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
|
| Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
|  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2

When replacing an rt6_info, we must walk all parent nodes and check
whether the rt6_info being replaced was propagated to them. If so,
replace it with a live one.

Fixes: 4a287eba2de3957 ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
Reported-by: Lubomir Rintel &lt;lkundrak@v3.sk&gt;
Signed-off-by: Hannes Frederic Sowa &lt;hannes@stressinduktion.org&gt;
Tested-by: Lubomir Rintel &lt;lkundrak@v3.sk&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>netlink: make nlmsg_end() and genlmsg_end() void</title>
<updated>2015-01-18T06:03:45+00:00</updated>
<author>
<name>Johannes Berg</name>
<email>johannes.berg@intel.com</email>
</author>
<published>2015-01-16T21:09:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=053c095a82cf773075e83d7233b5cc19a1f73ece'/>
<id>053c095a82cf773075e83d7233b5cc19a1f73ece</id>
<content type='text'>
Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This turns the very common pattern of

  if (genlmsg_end(...) &lt; 0) { ... }

into a whole bunch of dead code. Many places also simply do

  return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

  if (my_function(...))
    /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb-&gt;len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

-	return nlmsg_end(...);
+	nlmsg_end(...);
+	return 0;

I could have preserved all the functions' return values by returning
skb-&gt;len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with &lt;= 0 in dump functionality, but that could just
be changed to &lt; 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for &lt;0 or &lt;=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.

Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: fib6: convert cfg metric to u32 outside of table write lock</title>
<updated>2015-01-06T03:55:24+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2015-01-05T22:57:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e715b6d3a5ef55834778d49224e60e8ccb5bf45f'/>
<id>e715b6d3a5ef55834778d49224e60e8ccb5bf45f</id>
<content type='text'>
Do the nla validation earlier, outside the write lock.

This is needed by a follow-up patch that must be able to call
request_module(), which can sleep.

Joint work with Daniel Borkmann.

Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: fib6: fib6_commit_metrics: fix potential NULL pointer dereference</title>
<updated>2015-01-06T03:55:24+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>dborkman@redhat.com</email>
</author>
<published>2015-01-05T22:57:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0409c9a5a7546043d32e8c46df0dcaf65f97f659'/>
<id>0409c9a5a7546043d32e8c46df0dcaf65f97f659</id>
<content type='text'>
When IPv6 host routes with metrics attached are being added, we fetch
the metrics store from the dst via COW through dst_metrics_write_ptr(),
as introduced by commit e5fd387ad5b3.

One remaining problem here is that we actually call into inet_getpeer()
and may end up allocating/creating a new peer from the kmemcache, which
may fail.

Example trace from perf probe (inet_getpeer:41) where create is 1:

ip 6877 [002] 4221.391591: probe:inet_getpeer: (ffffffff8165e293)
  85e294 inet_getpeer.part.7 (&lt;- kmem_cache_alloc())
  85e578 inet_getpeer
  8eb333 ipv6_cow_metrics
  8f10ff fib6_commit_metrics

Therefore, a check for NULL on the return of dst_metrics_write_ptr()
is necessary here.

Joint work with Florian Westphal.

Fixes: e5fd387ad5b3 ("ipv6: do not overwrite inetpeer metrics prematurely")
Cc: Michal Kubeček &lt;mkubecek@suse.cz&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
