<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/netfilter, branch v3.4.112</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get</title>
<updated>2016-03-21T01:17:56+00:00</updated>
<author>
<name>Andrey Vagin</name>
<email>avagin@openvz.org</email>
</author>
<published>2014-01-29T18:34:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9b3dd23a4533ee0a288e4c6c77276c6791426a1a'/>
<id>9b3dd23a4533ee0a288e4c6c77276c6791426a1a</id>
<content type='text'>
commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

Lets look at destroy_conntrack:

hlist_nulls_del_rcu(&amp;ct-&gt;tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
...
nf_conntrack_free(ct)
	kmem_cache_free(net-&gt;ct.nf_conntrack_cachep, ct);

net-&gt;ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.

The hash is protected by rcu, so readers look up conntracks without
locks.
A conntrack is removed from the hash, but in this moment a few readers
still can use the conntrack. Then this conntrack is released and another
thread creates conntrack with the same address and the equal tuple.
After this a reader starts to validate the conntrack:
* It's not dying, because a new conntrack was created
* nf_ct_tuple_equal() returns true.

But this conntrack is not initialized yet, so it can not be used by two
threads concurrently. In this case BUG_ON may be triggered from
nf_nat_setup_info().

Florian Westphal suggested to check the confirm bit too. I think it's
right.

task 1			task 2			task 3
			nf_conntrack_find_get
			 ____nf_conntrack_find
destroy_conntrack
 hlist_nulls_del_rcu
 nf_conntrack_free
 kmem_cache_free
						__nf_conntrack_alloc
						 kmem_cache_alloc
						 memset(&amp;ct-&gt;tuplehash[IP_CT_DIR_MAX],
			 if (nf_ct_is_dying(ct))
			 if (!nf_ct_tuple_equal()

I'm not sure, that I have ever seen this race condition in a real life.
Currently we are investigating a bug, which is reproduced on a few nodes.
In our case one conntrack is initialized from a few tasks concurrently,
we don't have any other explanation for this.

&lt;2&gt;[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
...
&lt;4&gt;[46267.083951] RIP: 0010:[&lt;ffffffffa01e00a4&gt;]  [&lt;ffffffffa01e00a4&gt;] nf_nat_setup_info+0x564/0x590 [nf_nat]
...
&lt;4&gt;[46267.085549] Call Trace:
&lt;4&gt;[46267.085622]  [&lt;ffffffffa023421b&gt;] alloc_null_binding+0x5b/0xa0 [iptable_nat]
&lt;4&gt;[46267.085697]  [&lt;ffffffffa02342bc&gt;] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
&lt;4&gt;[46267.085770]  [&lt;ffffffffa0234521&gt;] nf_nat_fn+0x111/0x260 [iptable_nat]
&lt;4&gt;[46267.085843]  [&lt;ffffffffa0234798&gt;] nf_nat_out+0x48/0xd0 [iptable_nat]
&lt;4&gt;[46267.085919]  [&lt;ffffffff814841b9&gt;] nf_iterate+0x69/0xb0
&lt;4&gt;[46267.085991]  [&lt;ffffffff81494e70&gt;] ? ip_finish_output+0x0/0x2f0
&lt;4&gt;[46267.086063]  [&lt;ffffffff81484374&gt;] nf_hook_slow+0x74/0x110
&lt;4&gt;[46267.086133]  [&lt;ffffffff81494e70&gt;] ? ip_finish_output+0x0/0x2f0
&lt;4&gt;[46267.086207]  [&lt;ffffffff814b5890&gt;] ? dst_output+0x0/0x20
&lt;4&gt;[46267.086277]  [&lt;ffffffff81495204&gt;] ip_output+0xa4/0xc0
&lt;4&gt;[46267.086346]  [&lt;ffffffff814b65a4&gt;] raw_sendmsg+0x8b4/0x910
&lt;4&gt;[46267.086419]  [&lt;ffffffff814c10fa&gt;] inet_sendmsg+0x4a/0xb0
&lt;4&gt;[46267.086491]  [&lt;ffffffff814459aa&gt;] ? sock_update_classid+0x3a/0x50
&lt;4&gt;[46267.086562]  [&lt;ffffffff81444d67&gt;] sock_sendmsg+0x117/0x140
&lt;4&gt;[46267.086638]  [&lt;ffffffff8151997b&gt;] ? _spin_unlock_bh+0x1b/0x20
&lt;4&gt;[46267.086712]  [&lt;ffffffff8109d370&gt;] ? autoremove_wake_function+0x0/0x40
&lt;4&gt;[46267.086785]  [&lt;ffffffff81495e80&gt;] ? do_ip_setsockopt+0x90/0xd80
&lt;4&gt;[46267.086858]  [&lt;ffffffff8100be0e&gt;] ? call_function_interrupt+0xe/0x20
&lt;4&gt;[46267.086936]  [&lt;ffffffff8118cb10&gt;] ? ub_slab_ptr+0x20/0x90
&lt;4&gt;[46267.087006]  [&lt;ffffffff8118cb10&gt;] ? ub_slab_ptr+0x20/0x90
&lt;4&gt;[46267.087081]  [&lt;ffffffff8118f2e8&gt;] ? kmem_cache_alloc+0xd8/0x1e0
&lt;4&gt;[46267.087151]  [&lt;ffffffff81445599&gt;] sys_sendto+0x139/0x190
&lt;4&gt;[46267.087229]  [&lt;ffffffff81448c0d&gt;] ? sock_setsockopt+0x16d/0x6f0
&lt;4&gt;[46267.087303]  [&lt;ffffffff810efa47&gt;] ? audit_syscall_entry+0x1d7/0x200
&lt;4&gt;[46267.087378]  [&lt;ffffffff810ef795&gt;] ? __audit_syscall_exit+0x265/0x290
&lt;4&gt;[46267.087454]  [&lt;ffffffff81474885&gt;] ? compat_sys_setsockopt+0x75/0x210
&lt;4&gt;[46267.087531]  [&lt;ffffffff81474b5f&gt;] compat_sys_socketcall+0x13f/0x210
&lt;4&gt;[46267.087607]  [&lt;ffffffff8104dea3&gt;] ia32_sysret+0x0/0x5
&lt;4&gt;[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 &lt;0f&gt; 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
&lt;1&gt;[46267.088023] RIP  [&lt;ffffffffa01e00a4&gt;] nf_nat_setup_info+0x564/0x590

Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Cc: Patrick McHardy &lt;kaber@trash.net&gt;
Cc: Jozsef Kadlecsik &lt;kadlec@blackhole.kfki.hu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Signed-off-by: Andrey Vagin &lt;avagin@openvz.org&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit c6825c0976fa7893692e0e43b09740b419b23c09 upstream.

Lets look at destroy_conntrack:

hlist_nulls_del_rcu(&amp;ct-&gt;tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
...
nf_conntrack_free(ct)
	kmem_cache_free(net-&gt;ct.nf_conntrack_cachep, ct);

net-&gt;ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.

The hash is protected by rcu, so readers look up conntracks without
locks.
A conntrack is removed from the hash, but in this moment a few readers
still can use the conntrack. Then this conntrack is released and another
thread creates conntrack with the same address and the equal tuple.
After this a reader starts to validate the conntrack:
* It's not dying, because a new conntrack was created
* nf_ct_tuple_equal() returns true.

But this conntrack is not initialized yet, so it can not be used by two
threads concurrently. In this case BUG_ON may be triggered from
nf_nat_setup_info().

Florian Westphal suggested to check the confirm bit too. I think it's
right.

task 1			task 2			task 3
			nf_conntrack_find_get
			 ____nf_conntrack_find
destroy_conntrack
 hlist_nulls_del_rcu
 nf_conntrack_free
 kmem_cache_free
						__nf_conntrack_alloc
						 kmem_cache_alloc
						 memset(&amp;ct-&gt;tuplehash[IP_CT_DIR_MAX],
			 if (nf_ct_is_dying(ct))
			 if (!nf_ct_tuple_equal()

I'm not sure, that I have ever seen this race condition in a real life.
Currently we are investigating a bug, which is reproduced on a few nodes.
In our case one conntrack is initialized from a few tasks concurrently,
we don't have any other explanation for this.

&lt;2&gt;[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
...
&lt;4&gt;[46267.083951] RIP: 0010:[&lt;ffffffffa01e00a4&gt;]  [&lt;ffffffffa01e00a4&gt;] nf_nat_setup_info+0x564/0x590 [nf_nat]
...
&lt;4&gt;[46267.085549] Call Trace:
&lt;4&gt;[46267.085622]  [&lt;ffffffffa023421b&gt;] alloc_null_binding+0x5b/0xa0 [iptable_nat]
&lt;4&gt;[46267.085697]  [&lt;ffffffffa02342bc&gt;] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
&lt;4&gt;[46267.085770]  [&lt;ffffffffa0234521&gt;] nf_nat_fn+0x111/0x260 [iptable_nat]
&lt;4&gt;[46267.085843]  [&lt;ffffffffa0234798&gt;] nf_nat_out+0x48/0xd0 [iptable_nat]
&lt;4&gt;[46267.085919]  [&lt;ffffffff814841b9&gt;] nf_iterate+0x69/0xb0
&lt;4&gt;[46267.085991]  [&lt;ffffffff81494e70&gt;] ? ip_finish_output+0x0/0x2f0
&lt;4&gt;[46267.086063]  [&lt;ffffffff81484374&gt;] nf_hook_slow+0x74/0x110
&lt;4&gt;[46267.086133]  [&lt;ffffffff81494e70&gt;] ? ip_finish_output+0x0/0x2f0
&lt;4&gt;[46267.086207]  [&lt;ffffffff814b5890&gt;] ? dst_output+0x0/0x20
&lt;4&gt;[46267.086277]  [&lt;ffffffff81495204&gt;] ip_output+0xa4/0xc0
&lt;4&gt;[46267.086346]  [&lt;ffffffff814b65a4&gt;] raw_sendmsg+0x8b4/0x910
&lt;4&gt;[46267.086419]  [&lt;ffffffff814c10fa&gt;] inet_sendmsg+0x4a/0xb0
&lt;4&gt;[46267.086491]  [&lt;ffffffff814459aa&gt;] ? sock_update_classid+0x3a/0x50
&lt;4&gt;[46267.086562]  [&lt;ffffffff81444d67&gt;] sock_sendmsg+0x117/0x140
&lt;4&gt;[46267.086638]  [&lt;ffffffff8151997b&gt;] ? _spin_unlock_bh+0x1b/0x20
&lt;4&gt;[46267.086712]  [&lt;ffffffff8109d370&gt;] ? autoremove_wake_function+0x0/0x40
&lt;4&gt;[46267.086785]  [&lt;ffffffff81495e80&gt;] ? do_ip_setsockopt+0x90/0xd80
&lt;4&gt;[46267.086858]  [&lt;ffffffff8100be0e&gt;] ? call_function_interrupt+0xe/0x20
&lt;4&gt;[46267.086936]  [&lt;ffffffff8118cb10&gt;] ? ub_slab_ptr+0x20/0x90
&lt;4&gt;[46267.087006]  [&lt;ffffffff8118cb10&gt;] ? ub_slab_ptr+0x20/0x90
&lt;4&gt;[46267.087081]  [&lt;ffffffff8118f2e8&gt;] ? kmem_cache_alloc+0xd8/0x1e0
&lt;4&gt;[46267.087151]  [&lt;ffffffff81445599&gt;] sys_sendto+0x139/0x190
&lt;4&gt;[46267.087229]  [&lt;ffffffff81448c0d&gt;] ? sock_setsockopt+0x16d/0x6f0
&lt;4&gt;[46267.087303]  [&lt;ffffffff810efa47&gt;] ? audit_syscall_entry+0x1d7/0x200
&lt;4&gt;[46267.087378]  [&lt;ffffffff810ef795&gt;] ? __audit_syscall_exit+0x265/0x290
&lt;4&gt;[46267.087454]  [&lt;ffffffff81474885&gt;] ? compat_sys_setsockopt+0x75/0x210
&lt;4&gt;[46267.087531]  [&lt;ffffffff81474b5f&gt;] compat_sys_socketcall+0x13f/0x210
&lt;4&gt;[46267.087607]  [&lt;ffffffff8104dea3&gt;] ia32_sysret+0x0/0x5
&lt;4&gt;[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 &lt;0f&gt; 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
&lt;1&gt;[46267.088023] RIP  [&lt;ffffffffa01e00a4&gt;] nf_nat_setup_info+0x564/0x590

Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Cc: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Cc: Patrick McHardy &lt;kaber@trash.net&gt;
Cc: Jozsef Kadlecsik &lt;kadlec@blackhole.kfki.hu&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Signed-off-by: Andrey Vagin &lt;avagin@openvz.org&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nf_conntrack: Support expectations in different zones</title>
<updated>2016-03-21T01:17:48+00:00</updated>
<author>
<name>Joe Stringer</name>
<email>joestringer@nicira.com</email>
</author>
<published>2015-07-22T04:37:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2908d41798415ec9561186c0fd667d6bf282ae31'/>
<id>2908d41798415ec9561186c0fd667d6bf282ae31</id>
<content type='text'>
commit 4b31814d20cbe5cd4ccf18089751e77a04afe4f2 upstream.

When zones were originally introduced, the expectation functions were
all extended to perform lookup using the zone. However, insertion was
not modified to check the zone. This means that two expectations which
are intended to apply for different connections that have the same tuple
but exist in different zones cannot both be tracked.

Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones")
Signed-off-by: Joe Stringer &lt;joestringer@nicira.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4b31814d20cbe5cd4ccf18089751e77a04afe4f2 upstream.

When zones were originally introduced, the expectation functions were
all extended to perform lookup using the zone. However, insertion was
not modified to check the zone. This means that two expectations which
are intended to apply for different connections that have the same tuple
but exist in different zones cannot both be tracked.

Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones")
Signed-off-by: Joe Stringer &lt;joestringer@nicira.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipvs: fix memory leak in ip_vs_ctl.c</title>
<updated>2015-09-18T01:20:40+00:00</updated>
<author>
<name>Tommi Rantala</name>
<email>tt.rantala@gmail.com</email>
</author>
<published>2015-05-07T12:12:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6c25c7671c3f8bffec2ce9d88cf5673efa4fe2d3'/>
<id>6c25c7671c3f8bffec2ce9d88cf5673efa4fe2d3</id>
<content type='text'>
commit f30bf2a5cac6c60ab366c4bc6db913597bf4d6ab upstream.

Fix memory leak introduced in commit a0840e2e165a ("IPVS: netns,
ip_vs_ctl local vars moved to ipvs struct."):

unreferenced object 0xffff88005785b800 (size 2048):
  comm "(-localed)", pid 1434, jiffies 4294755650 (age 1421.089s)
  hex dump (first 32 bytes):
    bb 89 0b 83 ff ff ff ff b0 78 f0 4e 00 88 ff ff  .........x.N....
    04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;ffffffff8262ea8e&gt;] kmemleak_alloc+0x4e/0xb0
    [&lt;ffffffff811fba74&gt;] __kmalloc_track_caller+0x244/0x430
    [&lt;ffffffff811b88a0&gt;] kmemdup+0x20/0x50
    [&lt;ffffffff823276b7&gt;] ip_vs_control_net_init+0x1f7/0x510
    [&lt;ffffffff8231d630&gt;] __ip_vs_init+0x100/0x250
    [&lt;ffffffff822363a1&gt;] ops_init+0x41/0x190
    [&lt;ffffffff82236583&gt;] setup_net+0x93/0x150
    [&lt;ffffffff82236cc2&gt;] copy_net_ns+0x82/0x140
    [&lt;ffffffff810ab13d&gt;] create_new_namespaces+0xfd/0x190
    [&lt;ffffffff810ab49a&gt;] unshare_nsproxy_namespaces+0x5a/0xc0
    [&lt;ffffffff810833e3&gt;] SyS_unshare+0x173/0x310
    [&lt;ffffffff8265cbd7&gt;] system_call_fastpath+0x12/0x6f
    [&lt;ffffffffffffffff&gt;] 0xffffffffffffffff

Fixes: a0840e2e165a ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.")
Signed-off-by: Tommi Rantala &lt;tt.rantala@gmail.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f30bf2a5cac6c60ab366c4bc6db913597bf4d6ab upstream.

Fix memory leak introduced in commit a0840e2e165a ("IPVS: netns,
ip_vs_ctl local vars moved to ipvs struct."):

unreferenced object 0xffff88005785b800 (size 2048):
  comm "(-localed)", pid 1434, jiffies 4294755650 (age 1421.089s)
  hex dump (first 32 bytes):
    bb 89 0b 83 ff ff ff ff b0 78 f0 4e 00 88 ff ff  .........x.N....
    04 00 00 00 a4 01 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;ffffffff8262ea8e&gt;] kmemleak_alloc+0x4e/0xb0
    [&lt;ffffffff811fba74&gt;] __kmalloc_track_caller+0x244/0x430
    [&lt;ffffffff811b88a0&gt;] kmemdup+0x20/0x50
    [&lt;ffffffff823276b7&gt;] ip_vs_control_net_init+0x1f7/0x510
    [&lt;ffffffff8231d630&gt;] __ip_vs_init+0x100/0x250
    [&lt;ffffffff822363a1&gt;] ops_init+0x41/0x190
    [&lt;ffffffff82236583&gt;] setup_net+0x93/0x150
    [&lt;ffffffff82236cc2&gt;] copy_net_ns+0x82/0x140
    [&lt;ffffffff810ab13d&gt;] create_new_namespaces+0xfd/0x190
    [&lt;ffffffff810ab49a&gt;] unshare_nsproxy_namespaces+0x5a/0xc0
    [&lt;ffffffff810833e3&gt;] SyS_unshare+0x173/0x310
    [&lt;ffffffff8265cbd7&gt;] system_call_fastpath+0x12/0x6f
    [&lt;ffffffffffffffff&gt;] 0xffffffffffffffff

Fixes: a0840e2e165a ("IPVS: netns, ip_vs_ctl local vars moved to ipvs struct.")
Signed-off-by: Tommi Rantala &lt;tt.rantala@gmail.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipvs: uninitialized data with IP_VS_IPV6</title>
<updated>2015-06-19T03:40:33+00:00</updated>
<author>
<name>Dan Carpenter</name>
<email>dan.carpenter@oracle.com</email>
</author>
<published>2014-12-06T13:49:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4dd86a6aea75dba2284caa49817897582a0fe684'/>
<id>4dd86a6aea75dba2284caa49817897582a0fe684</id>
<content type='text'>
commit 3b05ac3824ed9648c0d9c02d51d9b54e4e7e874f upstream.

The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.

The same issue is there in app_tcp_pkt_in().  Thanks to Julian Anastasov
for noticing that.

Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Cc: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 3b05ac3824ed9648c0d9c02d51d9b54e4e7e874f upstream.

The app_tcp_pkt_out() function expects "*diff" to be set and ends up
using uninitialized data if CONFIG_IP_VS_IPV6 is turned on.

The same issue is there in app_tcp_pkt_in().  Thanks to Julian Anastasov
for noticing that.

Signed-off-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Acked-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Cc: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: make skb_gso_segment error handling more robust</title>
<updated>2015-06-19T03:40:33+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2014-10-20T11:49:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=812fbfa11d44a4e59623229cbd61833fed1c1768'/>
<id>812fbfa11d44a4e59623229cbd61833fed1c1768</id>
<content type='text'>
commit 330966e501ffe282d7184fde4518d5e0c24bc7f8 upstream.

skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL.  This can happen when GSO is used for header verification.

However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.

Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.

However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.

It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Ben Hutchings &lt;ben@decadent.org.uk&gt;
[lizf: Backported to 3.4: drop some hunks as there are fewer skb_gso_segment()
 users in 3.4]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 330966e501ffe282d7184fde4518d5e0c24bc7f8 upstream.

skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL.  This can happen when GSO is used for header verification.

However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.

Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.

However, there have been issues with some protocol handlers erronously not
respecting the specified feature mask in some cases.

It is preferable to get 'have to turn off hw offloading, else slow' reports
rather than 'kernel crashes'.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Ben Hutchings &lt;ben@decadent.org.uk&gt;
[lizf: Backported to 3.4: drop some hunks as there are fewer skb_gso_segment()
 users in 3.4]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipvs: add missing ip_vs_pe_put in sync code</title>
<updated>2015-06-19T03:40:23+00:00</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2015-02-21T19:03:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bd637e58a1e3605bea04d6b503576765d2e3c7c5'/>
<id>bd637e58a1e3605bea04d6b503576765d2e3c7c5</id>
<content type='text'>
commit 528c943f3bb919aef75ab2fff4f00176f09a4019 upstream.

ip_vs_conn_fill_param_sync() gets in param.pe a module
reference for persistence engine from __ip_vs_pe_getbyname()
but forgets to put it. Problem occurs in backup for
sync protocol v1 (2.6.39).

Also, pe_data usually comes in sync messages for
connection templates and ip_vs_conn_new() copies
the pointer only in this case. Make sure pe_data
is not leaked if it comes unexpectedly for normal
connections. Leak can happen only if bogus messages
are sent to backup server.

Fixes: fe5e7a1efb66 ("IPVS: Backup, Adding Version 1 receive capability")
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 528c943f3bb919aef75ab2fff4f00176f09a4019 upstream.

ip_vs_conn_fill_param_sync() gets in param.pe a module
reference for persistence engine from __ip_vs_pe_getbyname()
but forgets to put it. Problem occurs in backup for
sync protocol v1 (2.6.39).

Also, pe_data usually comes in sync messages for
connection templates and ip_vs_conn_new() copies
the pointer only in this case. Make sure pe_data
is not leaked if it comes unexpectedly for normal
connections. Leak can happen only if bogus messages
are sent to backup server.

Fixes: fe5e7a1efb66 ("IPVS: Backup, Adding Version 1 receive capability")
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: xt_socket: fix a stack corruption bug</title>
<updated>2015-06-19T03:40:18+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-02-16T03:03:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=cea9eddd391dcdee27449ba212b2d89832f108ce'/>
<id>cea9eddd391dcdee27449ba212b2d89832f108ce</id>
<content type='text'>
commit 78296c97ca1fd3b104f12e1f1fbc06c46635990b upstream.

As soon as extract_icmp6_fields() returns, its local storage (automatic
variables) is deallocated and can be overwritten.

Lets add an additional parameter to make sure storage is valid long
enough.

While we are at it, adds some const qualifiers.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: b64c9256a9b76 ("tproxy: added IPv6 support to the socket match")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 78296c97ca1fd3b104f12e1f1fbc06c46635990b upstream.

As soon as extract_icmp6_fields() returns, its local storage (automatic
variables) is deallocated and can be overwritten.

Lets add an additional parameter to make sure storage is valid long
enough.

While we are at it, adds some const qualifiers.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Fixes: b64c9256a9b76 ("tproxy: added IPv6 support to the socket match")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipvs: rerouting to local clients is not needed anymore</title>
<updated>2015-04-14T09:34:02+00:00</updated>
<author>
<name>Julian Anastasov</name>
<email>ja@ssi.bg</email>
</author>
<published>2014-12-18T20:41:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f0a6690e94cd24aa1e1346f5f473d6a3c1f22a50'/>
<id>f0a6690e94cd24aa1e1346f5f473d6a3c1f22a50</id>
<content type='text'>
commit 579eb62ac35845686a7c4286c0a820b4eb1f96aa upstream.

commit f5a41847acc5 ("ipvs: move ip_route_me_harder for ICMP")
from 2.6.37 introduced ip_route_me_harder() call for responses to
local clients, so that we can provide valid rt_src after SNAT.
It was used by TCP to provide valid daddr for ip_send_reply().
After commit 0a5ebb8000c5 ("ipv4: Pass explicit daddr arg to
ip_send_reply()." from 3.0 this rerouting is not needed anymore
and should be avoided, especially in LOCAL_IN.

Fixes 3.12.33 crash in xfrm reported by Florian Wiessner:
"3.12.33 - BUG xfrm_selector_match+0x25/0x2f6"

Reported-by: Smart Weblications GmbH - Florian Wiessner &lt;f.wiessner@smart-weblications.de&gt;
Tested-by: Smart Weblications GmbH - Florian Wiessner &lt;f.wiessner@smart-weblications.de&gt;
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
[Julian: Backported to 3.4]
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Acked-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 579eb62ac35845686a7c4286c0a820b4eb1f96aa upstream.

commit f5a41847acc5 ("ipvs: move ip_route_me_harder for ICMP")
from 2.6.37 introduced ip_route_me_harder() call for responses to
local clients, so that we can provide valid rt_src after SNAT.
It was used by TCP to provide valid daddr for ip_send_reply().
After commit 0a5ebb8000c5 ("ipv4: Pass explicit daddr arg to
ip_send_reply()." from 3.0 this rerouting is not needed anymore
and should be avoided, especially in LOCAL_IN.

Fixes 3.12.33 crash in xfrm reported by Florian Wiessner:
"3.12.33 - BUG xfrm_selector_match+0x25/0x2f6"

Reported-by: Smart Weblications GmbH - Florian Wiessner &lt;f.wiessner@smart-weblications.de&gt;
Tested-by: Smart Weblications GmbH - Florian Wiessner &lt;f.wiessner@smart-weblications.de&gt;
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
[Julian: Backported to 3.4]
Signed-off-by: Julian Anastasov &lt;ja@ssi.bg&gt;
Acked-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inetpeer: get rid of ip_id_count</title>
<updated>2014-08-14T00:42:35+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-06-02T12:26:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ad52eef552c7896ec6024ee72fc126167fe5c4e2'/>
<id>ad52eef552c7896ec6024ee72fc126167fe5c4e2</id>
<content type='text'>
[ Upstream commit 73f156a6e8c1074ac6327e0abd1169e95eb66463 ]

Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 73f156a6e8c1074ac6327e0abd1169e95eb66463 ]

Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nf_conntrack_dccp: fix skb_header_pointer API usages</title>
<updated>2014-04-03T18:58:46+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>dborkman@redhat.com</email>
</author>
<published>2014-01-05T23:57:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8b0da794d8fb4cc4ba410d4bb66fcd0feea0c7d1'/>
<id>8b0da794d8fb4cc4ba410d4bb66fcd0feea0c7d1</id>
<content type='text'>
commit b22f5126a24b3b2f15448c3f2a254fc10cbc2b92 upstream.

Some occurences in the netfilter tree use skb_header_pointer() in
the following way ...

  struct dccp_hdr _dh, *dh;
  ...
  skb_header_pointer(skb, dataoff, sizeof(_dh), &amp;dh);

... where dh itself is a pointer that is being passed as the copy
buffer. Instead, we need to use &amp;_dh as the forth argument so that
we're copying the data into an actual buffer that sits on the stack.

Currently, we probably could overwrite memory on the stack (e.g.
with a possibly mal-formed DCCP packet), but unintentionally, as
we only want the buffer to be placed into _dh variable.

Fixes: 2bc780499aa3 ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b22f5126a24b3b2f15448c3f2a254fc10cbc2b92 upstream.

Some occurences in the netfilter tree use skb_header_pointer() in
the following way ...

  struct dccp_hdr _dh, *dh;
  ...
  skb_header_pointer(skb, dataoff, sizeof(_dh), &amp;dh);

... where dh itself is a pointer that is being passed as the copy
buffer. Instead, we need to use &amp;_dh as the forth argument so that
we're copying the data into an actual buffer that sits on the stack.

Currently, we probably could overwrite memory on the stack (e.g.
with a possibly mal-formed DCCP packet), but unintentionally, as
we only want the buffer to be placed into _dh variable.

Fixes: 2bc780499aa3 ("[NETFILTER]: nf_conntrack: add DCCP protocol support")
Signed-off-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
