<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/core, branch v4.2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm: make page pfmemalloc check more robust</title>
<updated>2015-08-21T21:30:10+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2015-08-21T21:11:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2f064f3485cd29633ad1b3cfb00cc519509a3d72'/>
<id>2f064f3485cd29633ad1b3cfb00cc519509a3d72</id>
<content type='text'>
Commit c48a11c7ad26 ("netvm: propagate page-&gt;pfmemalloc to skb") added
checks for page-&gt;pfmemalloc to __skb_fill_page_desc():

        if (page-&gt;pfmemalloc &amp;&amp; !page-&gt;mapping)
                skb-&gt;pfmemalloc = true;

It assumes page-&gt;mapping == NULL implies that page-&gt;pfmemalloc can be
trusted.  However, __delete_from_page_cache() can set set page-&gt;mapping
to NULL and leave page-&gt;index value alone.  Due to being in union, a
non-zero page-&gt;index will be interpreted as true page-&gt;pfmemalloc.

So the assumption is invalid if the networking code can see such a page.
And it seems it can.  We have encountered this with a NFS over loopback
setup when such a page is attached to a new skbuf.  There is no copying
going on in this case so the page confuses __skb_fill_page_desc which
interprets the index as pfmemalloc flag and the network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping which is the case here and
that leads to hangs when the nfs client waits for a response from the
server which has been dropped and thus never arrive.

The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead.  We can reuse the index
again but define it to an impossible value (-1UL).  This is the page
index so it should never see the value that large.  Replace all direct
users of page-&gt;pfmemalloc by page_is_pfmemalloc which will hide this
nastiness from unspoiled eyes.

The information will get lost if somebody wants to use page-&gt;index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g.  what SLAB and SLUB do).

[akpm@linux-foundation.org: fix blooper in slub]
Fixes: c48a11c7ad26 ("netvm: propagate page-&gt;pfmemalloc to skb")
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Debugged-by: Vlastimil Babka &lt;vbabka@suse.com&gt;
Debugged-by: Jiri Bohac &lt;jbohac@suse.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.6+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit c48a11c7ad26 ("netvm: propagate page-&gt;pfmemalloc to skb") added
checks for page-&gt;pfmemalloc to __skb_fill_page_desc():

        if (page-&gt;pfmemalloc &amp;&amp; !page-&gt;mapping)
                skb-&gt;pfmemalloc = true;

It assumes page-&gt;mapping == NULL implies that page-&gt;pfmemalloc can be
trusted.  However, __delete_from_page_cache() can set set page-&gt;mapping
to NULL and leave page-&gt;index value alone.  Due to being in union, a
non-zero page-&gt;index will be interpreted as true page-&gt;pfmemalloc.

So the assumption is invalid if the networking code can see such a page.
And it seems it can.  We have encountered this with a NFS over loopback
setup when such a page is attached to a new skbuf.  There is no copying
going on in this case so the page confuses __skb_fill_page_desc which
interprets the index as pfmemalloc flag and the network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping which is the case here and
that leads to hangs when the nfs client waits for a response from the
server which has been dropped and thus never arrive.

The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead.  We can reuse the index
again but define it to an impossible value (-1UL).  This is the page
index so it should never see the value that large.  Replace all direct
users of page-&gt;pfmemalloc by page_is_pfmemalloc which will hide this
nastiness from unspoiled eyes.

The information will get lost if somebody wants to use page-&gt;index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g.  what SLAB and SLUB do).

[akpm@linux-foundation.org: fix blooper in slub]
Fixes: c48a11c7ad26 ("netvm: propagate page-&gt;pfmemalloc to skb")
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Debugged-by: Vlastimil Babka &lt;vbabka@suse.com&gt;
Debugged-by: Jiri Bohac &lt;jbohac@suse.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.6+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: fix wrong skb_get() usage / crash in IGMP/MLD parsing code</title>
<updated>2015-08-14T00:08:39+00:00</updated>
<author>
<name>Linus Lüssing</name>
<email>linus.luessing@c0d3.blue</email>
</author>
<published>2015-08-13T03:54:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a516993f0ac1694673412eb2d16a091eafa77d2a'/>
<id>a516993f0ac1694673412eb2d16a091eafa77d2a</id>
<content type='text'>
The recent refactoring of the IGMP and MLD parsing code into
ipv6_mc_check_mld() / ip_mc_check_igmp() introduced a potential crash /
BUG() invocation for bridges:

I wrongly assumed that skb_get() could be used as a simple reference
counter for an skb which is not the case. skb_get() bears additional
semantics, a user count. This leads to a BUG() invocation in
pskb_expand_head() / kernel panic if pskb_may_pull() is called on an skb
with a user count greater than one - unfortunately the refactoring did
just that.

Fixing this by removing the skb_get() call and changing the API: The
caller of ipv6_mc_check_mld() / ip_mc_check_igmp() now needs to
additionally check whether the returned skb_trimmed is a clone.

Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Brenden Blanco &lt;bblanco@plumgrid.com&gt;
Signed-off-by: Linus Lüssing &lt;linus.luessing@c0d3.blue&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The recent refactoring of the IGMP and MLD parsing code into
ipv6_mc_check_mld() / ip_mc_check_igmp() introduced a potential crash /
BUG() invocation for bridges:

I wrongly assumed that skb_get() could be used as a simple reference
counter for an skb which is not the case. skb_get() bears additional
semantics, a user count. This leads to a BUG() invocation in
pskb_expand_head() / kernel panic if pskb_may_pull() is called on an skb
with a user count greater than one - unfortunately the refactoring did
just that.

Fixing this by removing the skb_get() call and changing the API: The
caller of ipv6_mc_check_mld() / ip_mc_check_igmp() now needs to
additionally check whether the returned skb_trimmed is a clone.

Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Brenden Blanco &lt;bblanco@plumgrid.com&gt;
Signed-off-by: Linus Lüssing &lt;linus.luessing@c0d3.blue&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>inet: fix races with reqsk timers</title>
<updated>2015-08-11T04:17:29+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2015-08-10T16:09:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2235f2ac75fd2501c251b0b699a9632e80239a6d'/>
<id>2235f2ac75fd2501c251b0b699a9632e80239a6d</id>
<content type='text'>
reqsk_queue_destroy() and reqsk_queue_unlink() should use
del_timer_sync() instead of del_timer() before calling reqsk_put(),
otherwise we could free a req still used by another cpu.

But before doing so, reqsk_queue_destroy() must release syn_wait_lock
spinlock or risk a dead lock, as reqsk_timer_handler() might
need to take this same spinlock from reqsk_queue_unlink() (called from
inet_csk_reqsk_queue_drop())

Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
reqsk_queue_destroy() and reqsk_queue_unlink() should use
del_timer_sync() instead of del_timer() before calling reqsk_put(),
otherwise we could free a req still used by another cpu.

But before doing so, reqsk_queue_destroy() must release syn_wait_lock
spinlock or risk a dead lock, as reqsk_timer_handler() might
need to take this same spinlock from reqsk_queue_unlink() (called from
inet_csk_reqsk_queue_drop())

Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: pktgen: don't abuse current-&gt;state in pktgen_thread_worker()</title>
<updated>2015-08-07T06:52:44+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2015-08-04T16:33:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7ba8bd75ddc6b041b5716dbb29e49df3e9cc2928'/>
<id>7ba8bd75ddc6b041b5716dbb29e49df3e9cc2928</id>
<content type='text'>
Commit 1fbe4b46caca "net: pktgen: kill the Wait for kthread_stop
code in pktgen_thread_worker()" removed (in particular) the final
__set_current_state(TASK_RUNNING) and I didn't notice the previous
set_current_state(TASK_INTERRUPTIBLE). This triggers the warning
in __might_sleep() after return.

Afaics, we can simply remove both set_current_state()'s, and we
could do this a long ago right after ef87979c273a2 "pktgen: better
scheduler friendliness" which changed pktgen_thread_worker() to
use wait_event_interruptible_timeout().

Reported-by: Huang Ying &lt;ying.huang@intel.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 1fbe4b46caca "net: pktgen: kill the Wait for kthread_stop
code in pktgen_thread_worker()" removed (in particular) the final
__set_current_state(TASK_RUNNING) and I didn't notice the previous
set_current_state(TASK_INTERRUPTIBLE). This triggers the warning
in __might_sleep() after return.

Afaics, we can simply remove both set_current_state()'s, and we
could do this a long ago right after ef87979c273a2 "pktgen: better
scheduler friendliness" which changed pktgen_thread_worker() to
use wait_event_interruptible_timeout().

Reported-by: Huang Ying &lt;ying.huang@intel.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Fix skb_set_peeked use-after-free bug</title>
<updated>2015-08-07T04:55:47+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2015-08-04T07:42:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a0a2a6602496a45ae838a96db8b8173794b5d398'/>
<id>a0a2a6602496a45ae838a96db8b8173794b5d398</id>
<content type='text'>
The commit 738ac1ebb96d02e0d23bc320302a6ea94c612dec ("net: Clone
skb before setting peeked flag") introduced a use-after-free bug
in skb_recv_datagram.  This is because skb_set_peeked may create
a new skb and free the existing one.  As it stands the caller will
continue to use the old freed skb.

This patch fixes it by making skb_set_peeked return the new skb
(or the old one if unchanged).

Fixes: 738ac1ebb96d ("net: Clone skb before setting peeked flag")
Reported-by: Brenden Blanco &lt;bblanco@plumgrid.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Tested-by: Brenden Blanco &lt;bblanco@plumgrid.com&gt;
Reviewed-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The commit 738ac1ebb96d02e0d23bc320302a6ea94c612dec ("net: Clone
skb before setting peeked flag") introduced a use-after-free bug
in skb_recv_datagram.  This is because skb_set_peeked may create
a new skb and free the existing one.  As it stands the caller will
continue to use the old freed skb.

This patch fixes it by making skb_set_peeked return the new skb
(or the old one if unchanged).

Fixes: 738ac1ebb96d ("net: Clone skb before setting peeked flag")
Reported-by: Brenden Blanco &lt;bblanco@plumgrid.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Tested-by: Brenden Blanco &lt;bblanco@plumgrid.com&gt;
Reviewed-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket</title>
<updated>2015-07-30T22:59:12+00:00</updated>
<author>
<name>Sowmini Varadhan</name>
<email>sowmini.varadhan@oracle.com</email>
</author>
<published>2015-07-30T13:50:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8a68173691f036613e3d4e6bf8dc129d4a7bf383'/>
<id>8a68173691f036613e3d4e6bf8dc129d4a7bf383</id>
<content type='text'>
The newsk returned by sk_clone_lock should hold a get_net()
reference if, and only if, the parent is not a kernel socket
(making this similar to sk_alloc()).

E.g,. for the SYN_RECV path, tcp_v4_syn_recv_sock-&gt;..inet_csk_clone_lock
sets up the syn_recv newsk from sk_clone_lock. When the parent (listen)
socket is a kernel socket (defined in sk_alloc() as having
sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt
and should not hold a get_net() reference.

Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the
      netns of kernel sockets.")
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Sowmini Varadhan &lt;sowmini.varadhan@oracle.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The newsk returned by sk_clone_lock should hold a get_net()
reference if, and only if, the parent is not a kernel socket
(making this similar to sk_alloc()).

E.g,. for the SYN_RECV path, tcp_v4_syn_recv_sock-&gt;..inet_csk_clone_lock
sets up the syn_recv newsk from sk_clone_lock. When the parent (listen)
socket is a kernel socket (defined in sk_alloc() as having
sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt
and should not hold a get_net() reference.

Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the
      netns of kernel sockets.")
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Sowmini Varadhan &lt;sowmini.varadhan@oracle.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: fix recv with flags MSG_WAITALL | MSG_PEEK</title>
<updated>2015-07-27T08:06:53+00:00</updated>
<author>
<name>Sabrina Dubroca</name>
<email>sd@queasysnail.net</email>
</author>
<published>2015-07-24T16:19:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dfbafc995304ebb9a9b03f65083e6e9cea143b20'/>
<id>dfbafc995304ebb9a9b03f65083e6e9cea143b20</id>
<content type='text'>
Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called
with flags = MSG_WAITALL | MSG_PEEK.

sk_wait_data waits for sk_receive_queue not empty, but in this case,
the receive queue is not empty, but does not contain any skb that we
can use.

Add a "last skb seen on receive queue" argument to sk_wait_data, so
that it sleeps until the receive queue has new skbs.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258
Reported-by: Enrico Scholz &lt;rh-bugzilla@ensc.de&gt;
Reported-by: Dan Searle &lt;dan@censornet.com&gt;
Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called
with flags = MSG_WAITALL | MSG_PEEK.

sk_wait_data waits for sk_receive_queue not empty, but in this case,
the receive queue is not empty, but does not contain any skb that we
can use.

Add a "last skb seen on receive queue" argument to sk_wait_data, so
that it sleeps until the receive queue has new skbs.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258
Reported-by: Enrico Scholz &lt;rh-bugzilla@ensc.de&gt;
Reported-by: Dan Searle &lt;dan@censornet.com&gt;
Signed-off-by: Sabrina Dubroca &lt;sd@queasysnail.net&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: net_cls: fix false-positive "suspicious RCU usage"</title>
<updated>2015-07-25T07:13:18+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2015-07-22T09:23:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cc9f4daa638e660f7a910b8094122561470ac331'/>
<id>cc9f4daa638e660f7a910b8094122561470ac331</id>
<content type='text'>
In dev_queue_xmit() net_cls protected with rcu-bh.

[  270.730026] ===============================
[  270.730029] [ INFO: suspicious RCU usage. ]
[  270.730033] 4.2.0-rc3+ #2 Not tainted
[  270.730036] -------------------------------
[  270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage!
[  270.730041] other info that might help us debug this:
[  270.730043] rcu_scheduler_active = 1, debug_locks = 1
[  270.730045] 2 locks held by dhclient/748:
[  270.730046]  #0:  (rcu_read_lock_bh){......}, at: [&lt;ffffffff81682b70&gt;] __dev_queue_xmit+0x50/0x960
[  270.730085]  #1:  (&amp;qdisc_tx_lock){+.....}, at: [&lt;ffffffff81682d60&gt;] __dev_queue_xmit+0x240/0x960
[  270.730090] stack backtrace:
[  270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2
[  270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[  270.730100]  0000000000000001 ffff8800bafeba58 ffffffff817ad487 0000000000000007
[  270.730103]  ffff880232a0a780 ffff8800bafeba88 ffffffff810ca4f2 ffff88022fb23e00
[  270.730105]  ffff880232a0a780 ffff8800bafebb68 ffff8800bafebb68 ffff8800bafebaa8
[  270.730108] Call Trace:
[  270.730121]  [&lt;ffffffff817ad487&gt;] dump_stack+0x4c/0x65
[  270.730148]  [&lt;ffffffff810ca4f2&gt;] lockdep_rcu_suspicious+0xe2/0x120
[  270.730153]  [&lt;ffffffff816a62d2&gt;] task_cls_state+0x92/0xa0
[  270.730158]  [&lt;ffffffffa00b534f&gt;] cls_cgroup_classify+0x4f/0x120 [cls_cgroup]
[  270.730164]  [&lt;ffffffff816aac74&gt;] tc_classify_compat+0x74/0xc0
[  270.730166]  [&lt;ffffffff816ab573&gt;] tc_classify+0x33/0x90
[  270.730170]  [&lt;ffffffffa00bcb0a&gt;] htb_enqueue+0xaa/0x4a0 [sch_htb]
[  270.730172]  [&lt;ffffffff81682e26&gt;] __dev_queue_xmit+0x306/0x960
[  270.730174]  [&lt;ffffffff81682b70&gt;] ? __dev_queue_xmit+0x50/0x960
[  270.730176]  [&lt;ffffffff816834a3&gt;] dev_queue_xmit_sk+0x13/0x20
[  270.730185]  [&lt;ffffffff81787770&gt;] dev_queue_xmit+0x10/0x20
[  270.730187]  [&lt;ffffffff8178b91c&gt;] packet_snd.isra.62+0x54c/0x760
[  270.730190]  [&lt;ffffffff8178be25&gt;] packet_sendmsg+0x2f5/0x3f0
[  270.730203]  [&lt;ffffffff81665245&gt;] ? sock_def_readable+0x5/0x190
[  270.730210]  [&lt;ffffffff817b64bb&gt;] ? _raw_spin_unlock+0x2b/0x40
[  270.730216]  [&lt;ffffffff8173bcbc&gt;] ? unix_dgram_sendmsg+0x5cc/0x640
[  270.730219]  [&lt;ffffffff8165f367&gt;] sock_sendmsg+0x47/0x50
[  270.730221]  [&lt;ffffffff8165f42f&gt;] sock_write_iter+0x7f/0xd0
[  270.730232]  [&lt;ffffffff811fd4c7&gt;] __vfs_write+0xa7/0xf0
[  270.730234]  [&lt;ffffffff811fe5b8&gt;] vfs_write+0xb8/0x190
[  270.730236]  [&lt;ffffffff811fe8c2&gt;] SyS_write+0x52/0xb0
[  270.730239]  [&lt;ffffffff817b6bae&gt;] entry_SYSCALL_64_fastpath+0x12/0x76

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In dev_queue_xmit() net_cls protected with rcu-bh.

[  270.730026] ===============================
[  270.730029] [ INFO: suspicious RCU usage. ]
[  270.730033] 4.2.0-rc3+ #2 Not tainted
[  270.730036] -------------------------------
[  270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage!
[  270.730041] other info that might help us debug this:
[  270.730043] rcu_scheduler_active = 1, debug_locks = 1
[  270.730045] 2 locks held by dhclient/748:
[  270.730046]  #0:  (rcu_read_lock_bh){......}, at: [&lt;ffffffff81682b70&gt;] __dev_queue_xmit+0x50/0x960
[  270.730085]  #1:  (&amp;qdisc_tx_lock){+.....}, at: [&lt;ffffffff81682d60&gt;] __dev_queue_xmit+0x240/0x960
[  270.730090] stack backtrace:
[  270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2
[  270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[  270.730100]  0000000000000001 ffff8800bafeba58 ffffffff817ad487 0000000000000007
[  270.730103]  ffff880232a0a780 ffff8800bafeba88 ffffffff810ca4f2 ffff88022fb23e00
[  270.730105]  ffff880232a0a780 ffff8800bafebb68 ffff8800bafebb68 ffff8800bafebaa8
[  270.730108] Call Trace:
[  270.730121]  [&lt;ffffffff817ad487&gt;] dump_stack+0x4c/0x65
[  270.730148]  [&lt;ffffffff810ca4f2&gt;] lockdep_rcu_suspicious+0xe2/0x120
[  270.730153]  [&lt;ffffffff816a62d2&gt;] task_cls_state+0x92/0xa0
[  270.730158]  [&lt;ffffffffa00b534f&gt;] cls_cgroup_classify+0x4f/0x120 [cls_cgroup]
[  270.730164]  [&lt;ffffffff816aac74&gt;] tc_classify_compat+0x74/0xc0
[  270.730166]  [&lt;ffffffff816ab573&gt;] tc_classify+0x33/0x90
[  270.730170]  [&lt;ffffffffa00bcb0a&gt;] htb_enqueue+0xaa/0x4a0 [sch_htb]
[  270.730172]  [&lt;ffffffff81682e26&gt;] __dev_queue_xmit+0x306/0x960
[  270.730174]  [&lt;ffffffff81682b70&gt;] ? __dev_queue_xmit+0x50/0x960
[  270.730176]  [&lt;ffffffff816834a3&gt;] dev_queue_xmit_sk+0x13/0x20
[  270.730185]  [&lt;ffffffff81787770&gt;] dev_queue_xmit+0x10/0x20
[  270.730187]  [&lt;ffffffff8178b91c&gt;] packet_snd.isra.62+0x54c/0x760
[  270.730190]  [&lt;ffffffff8178be25&gt;] packet_sendmsg+0x2f5/0x3f0
[  270.730203]  [&lt;ffffffff81665245&gt;] ? sock_def_readable+0x5/0x190
[  270.730210]  [&lt;ffffffff817b64bb&gt;] ? _raw_spin_unlock+0x2b/0x40
[  270.730216]  [&lt;ffffffff8173bcbc&gt;] ? unix_dgram_sendmsg+0x5cc/0x640
[  270.730219]  [&lt;ffffffff8165f367&gt;] sock_sendmsg+0x47/0x50
[  270.730221]  [&lt;ffffffff8165f42f&gt;] sock_write_iter+0x7f/0xd0
[  270.730232]  [&lt;ffffffff811fd4c7&gt;] __vfs_write+0xa7/0xf0
[  270.730234]  [&lt;ffffffff811fe5b8&gt;] vfs_write+0xb8/0x190
[  270.730236]  [&lt;ffffffff811fe8c2&gt;] SyS_write+0x52/0xb0
[  270.730239]  [&lt;ffffffff817b6bae&gt;] entry_SYSCALL_64_fastpath+0x12/0x76

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: ratelimit warnings about dst entry refcount underflow or overflow</title>
<updated>2015-07-21T07:11:19+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2015-07-17T11:01:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8bf4ada2e21378816b28205427ee6b0e1ca4c5f1'/>
<id>8bf4ada2e21378816b28205427ee6b0e1ca4c5f1</id>
<content type='text'>
Kernel generates a lot of warnings when dst entry reference counter
overflows and becomes negative. That bug was seen several times at
machines with outdated 3.10.y kernels. Most like it's already fixed
in upstream. Anyway that flood completely kills machine and makes
further debugging impossible.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Kernel generates a lot of warnings when dst entry reference counter
overflows and becomes negative. That bug was seen several times at
machines with outdated 3.10.y kernels. Most like it's already fixed
in upstream. Anyway that flood completely kills machine and makes
further debugging impossible.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Fix skb csum races when peeking</title>
<updated>2015-07-15T22:59:58+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2015-07-13T12:01:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=89c22d8c3b278212eef6a8cc66b570bc840a6f5a'/>
<id>89c22d8c3b278212eef6a8cc66b570bc840a6f5a</id>
<content type='text'>
When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.

This is in fact bogus for the MSG_PEEK case as this is done without
any locking.  So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.

This patch fixes this by only storing the result if the skb is not
shared.  This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we calculate the checksum on the recv path, we store the
result in the skb as an optimisation in case we need the checksum
again down the line.

This is in fact bogus for the MSG_PEEK case as this is done without
any locking.  So multiple threads can peek and then store the result
to the same skb, potentially resulting in bogus skb states.

This patch fixes this by only storing the result if the skb is not
shared.  This preserves the optimisations for the few cases where
it can be done safely due to locking or other reasons, e.g., SIOCINQ.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
