<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/core, branch v4.19.78</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>net: Unpublish sk from sk_reuseport_cb before call_rcu</title>
<updated>2019-10-07T16:57:22+00:00</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2019-09-27T23:00:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d5b1db1c7ce4198bbbd51160350bdd446c8ed2ba'/>
<id>d5b1db1c7ce4198bbbd51160350bdd446c8ed2ba</id>
<content type='text'>
[ Upstream commit 8c7138b33e5c690c308b2a7085f6313fdcb3f616 ]

The "reuse-&gt;sock[]" array is shared by multiple sockets.  The going away
sk must unpublish itself from "reuse-&gt;sock[]" before making call_rcu()
call.  However, this unpublish-action is currently done after a grace
period and it may cause use-after-free.

The fix is to move reuseport_detach_sock() to sk_destruct().
Due to the above reason, any socket with sk_reuseport_cb has
to go through the rcu grace period before freeing it.

It is a rather old bug (~3 yrs).  The Fixes tag is not necessary
the right commit but it is the one that introduced the SOCK_RCU_FREE
logic and this fix is depending on it.

Fixes: a4298e4522d6 ("net: add SOCK_RCU_FREE socket flag")
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Suggested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 8c7138b33e5c690c308b2a7085f6313fdcb3f616 ]

The "reuse-&gt;sock[]" array is shared by multiple sockets.  The going away
sk must unpublish itself from "reuse-&gt;sock[]" before making call_rcu()
call.  However, this unpublish-action is currently done after a grace
period and it may cause use-after-free.

The fix is to move reuseport_detach_sock() to sk_destruct().
Due to the above reason, any socket with sk_reuseport_cb has
to go through the rcu grace period before freeing it.

It is a rather old bug (~3 yrs).  The Fixes tag is not necessary
the right commit but it is the one that introduced the SOCK_RCU_FREE
logic and this fix is depending on it.

Fixes: a4298e4522d6 ("net: add SOCK_RCU_FREE socket flag")
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Suggested-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: allow narrow loads of some sk_reuseport_md fields with offset &gt; 0</title>
<updated>2019-09-21T05:16:59+00:00</updated>
<author>
<name>Ilya Leoshkevich</name>
<email>iii@linux.ibm.com</email>
</author>
<published>2019-08-20T15:50:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d9f79f0ace550f450851a5c07f88eed585b46214'/>
<id>d9f79f0ace550f450851a5c07f88eed585b46214</id>
<content type='text'>
[ Upstream commit 2c238177bd7f4b14bdf7447cc1cd9bb791f147e6 ]

test_select_reuseport fails on s390 due to verifier rejecting
test_select_reuseport_kern.o with the following message:

	; data_check.eth_protocol = reuse_md-&gt;eth_protocol;
	18: (69) r1 = *(u16 *)(r6 +22)
	invalid bpf_context access off=22 size=2

This is because on big-endian machines casts from __u32 to __u16 are
generated by referencing the respective variable as __u16 with an offset
of 2 (as opposed to 0 on little-endian machines).

The verifier already has all the infrastructure in place to allow such
accesses, it's just that they are not explicitly enabled for
eth_protocol field. Enable them for eth_protocol field by using
bpf_ctx_range instead of offsetof.

Ditto for ip_protocol, bind_inany and len, since they already allow
narrowing, and the same problem can arise when working with them.

Fixes: 2dbb9b9e6df6 ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 2c238177bd7f4b14bdf7447cc1cd9bb791f147e6 ]

test_select_reuseport fails on s390 due to verifier rejecting
test_select_reuseport_kern.o with the following message:

	; data_check.eth_protocol = reuse_md-&gt;eth_protocol;
	18: (69) r1 = *(u16 *)(r6 +22)
	invalid bpf_context access off=22 size=2

This is because on big-endian machines casts from __u32 to __u16 are
generated by referencing the respective variable as __u16 with an offset
of 2 (as opposed to 0 on little-endian machines).

The verifier already has all the infrastructure in place to allow such
accesses, it's just that they are not explicitly enabled for
eth_protocol field. Enable them for eth_protocol field by using
bpf_ctx_range instead of offsetof.

Ditto for ip_protocol, bind_inany and len, since they already allow
narrowing, and the same problem can arise when working with them.

Fixes: 2dbb9b9e6df6 ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udp: correct reuseport selection with connected sockets</title>
<updated>2019-09-21T05:16:43+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2019-09-13T01:16:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fdd60d80c4294b7203d6f9d075a57da0a8d85fba'/>
<id>fdd60d80c4294b7203d6f9d075a57da0a8d85fba</id>
<content type='text'>
[ Upstream commit acdcecc61285faed359f1a3568c32089cc3a8329 ]

UDP reuseport groups can hold a mix unconnected and connected sockets.
Ensure that connections only receive all traffic to their 4-tuple.

Fast reuseport returns on the first reuseport match on the assumption
that all matches are equal. Only if connections are present, return to
the previous behavior of scoring all sockets.

Record if connections are present and if so (1) treat such connected
sockets as an independent match from the group, (2) only return
2-tuple matches from reuseport and (3) do not return on the first
2-tuple reuseport match to allow for a higher scoring match later.

New field has_conns is set without locks. No other fields in the
bitmap are modified at runtime and the field is only ever set
unconditionally, so an RMW cannot miss a change.

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit acdcecc61285faed359f1a3568c32089cc3a8329 ]

UDP reuseport groups can hold a mix unconnected and connected sockets.
Ensure that connections only receive all traffic to their 4-tuple.

Fast reuseport returns on the first reuseport match on the assumption
that all matches are equal. Only if connections are present, return to
the previous behavior of scoring all sockets.

Record if connections are present and if so (1) treat such connected
sockets as an independent match from the group, (2) only return
2-tuple matches from reuseport and (3) do not return on the first
2-tuple reuseport match to allow for a higher scoring match later.

New field has_conns is set without locks. No other fields in the
bitmap are modified at runtime and the field is only ever set
unconditionally, so an RMW cannot miss a change.

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.com
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Paolo Abeni &lt;pabeni@redhat.com&gt;
Acked-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list</title>
<updated>2019-09-19T07:09:30+00:00</updated>
<author>
<name>Shmulik Ladkani</name>
<email>shmulik@metanetworks.com</email>
</author>
<published>2019-09-06T09:23:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=821302dd0c51d29269ef73a595bdff294419e2cd'/>
<id>821302dd0c51d29269ef73a595bdff294419e2cd</id>
<content type='text'>
[ Upstream commit 3dcbdb134f329842a38f0e6797191b885ab00a00 ]

Historically, support for frag_list packets entering skb_segment() was
limited to frag_list members terminating on exact same gso_size
boundaries. This is verified with a BUG_ON since commit 89319d3801d1
("net: Add frag_list support to skb_segment"), quote:

    As such we require all frag_list members terminate on exact MSS
    boundaries.  This is checked using BUG_ON.
    As there should only be one producer in the kernel of such packets,
    namely GRO, this requirement should not be difficult to maintain.

However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"),
the "exact MSS boundaries" assumption no longer holds:
An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
leaves the frag_list members as originally merged by GRO with the
original 'gso_size'. Example of such programs are bpf-based NAT46 or
NAT64.

This lead to a kernel BUG_ON for flows involving:
 - GRO generating a frag_list skb
 - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
 - skb_segment() of the skb

See example BUG_ON reports in [0].

In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"),
skb_segment() was modified to support the "gso_size mangling" case of
a frag_list GRO'ed skb, but *only* for frag_list members having
head_frag==true (having a page-fragment head).

Alas, GRO packets having frag_list members with a linear kmalloced head
(head_frag==false) still hit the BUG_ON.

This commit adds support to skb_segment() for a 'head_skb' packet having
a frag_list whose members are *non* head_frag, with gso_size mangled, by
disabling SG and thus falling-back to copying the data from the given
'head_skb' into the generated segmented skbs - as suggested by Willem de
Bruijn [1].

Since this approach involves the penalty of skb_copy_and_csum_bits()
when building the segments, care was taken in order to enable this
solution only when required:
 - untrusted gso_size, by testing SKB_GSO_DODGY is set
   (SKB_GSO_DODGY is set by any gso_size mangling functions in
    net/core/filter.c)
 - the frag_list is non empty, its item is a non head_frag, *and* the
   headlen of the given 'head_skb' does not match the gso_size.

[0]
https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/
https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/

[1]
https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/

Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Suggested-by: Willem de Bruijn &lt;willemdebruijn.kernel@gmail.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Alexander Duyck &lt;alexander.duyck@gmail.com&gt;
Signed-off-by: Shmulik Ladkani &lt;shmulik.ladkani@gmail.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 3dcbdb134f329842a38f0e6797191b885ab00a00 ]

Historically, support for frag_list packets entering skb_segment() was
limited to frag_list members terminating on exact same gso_size
boundaries. This is verified with a BUG_ON since commit 89319d3801d1
("net: Add frag_list support to skb_segment"), quote:

    As such we require all frag_list members terminate on exact MSS
    boundaries.  This is checked using BUG_ON.
    As there should only be one producer in the kernel of such packets,
    namely GRO, this requirement should not be difficult to maintain.

However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"),
the "exact MSS boundaries" assumption no longer holds:
An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
leaves the frag_list members as originally merged by GRO with the
original 'gso_size'. Example of such programs are bpf-based NAT46 or
NAT64.

This lead to a kernel BUG_ON for flows involving:
 - GRO generating a frag_list skb
 - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
 - skb_segment() of the skb

See example BUG_ON reports in [0].

In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"),
skb_segment() was modified to support the "gso_size mangling" case of
a frag_list GRO'ed skb, but *only* for frag_list members having
head_frag==true (having a page-fragment head).

Alas, GRO packets having frag_list members with a linear kmalloced head
(head_frag==false) still hit the BUG_ON.

This commit adds support to skb_segment() for a 'head_skb' packet having
a frag_list whose members are *non* head_frag, with gso_size mangled, by
disabling SG and thus falling-back to copying the data from the given
'head_skb' into the generated segmented skbs - as suggested by Willem de
Bruijn [1].

Since this approach involves the penalty of skb_copy_and_csum_bits()
when building the segments, care was taken in order to enable this
solution only when required:
 - untrusted gso_size, by testing SKB_GSO_DODGY is set
   (SKB_GSO_DODGY is set by any gso_size mangling functions in
    net/core/filter.c)
 - the frag_list is non empty, its item is a non head_frag, *and* the
   headlen of the given 'head_skb' does not match the gso_size.

[0]
https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/
https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/

[1]
https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/

Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Suggested-by: Willem de Bruijn &lt;willemdebruijn.kernel@gmail.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Alexander Duyck &lt;alexander.duyck@gmail.com&gt;
Signed-off-by: Shmulik Ladkani &lt;shmulik.ladkani@gmail.com&gt;
Reviewed-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: Alexander Duyck &lt;alexander.h.duyck@linux.intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Fix null de-reference of device refcount</title>
<updated>2019-09-19T07:09:29+00:00</updated>
<author>
<name>Subash Abhinov Kasiviswanathan</name>
<email>subashab@codeaurora.org</email>
</author>
<published>2019-09-10T20:02:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=88a46756f016552581fc07e0a5d5c23b5a26737f'/>
<id>88a46756f016552581fc07e0a5d5c23b5a26737f</id>
<content type='text'>
[ Upstream commit 10cc514f451a0f239aa34f91bc9dc954a9397840 ]

In event of failure during register_netdevice, free_netdev is
invoked immediately. free_netdev assumes that all the netdevice
refcounts have been dropped prior to it being called and as a
result frees and clears out the refcount pointer.

However, this is not necessarily true as some of the operations
in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for
invocation after a grace period. The IPv4 callback in_dev_rcu_put
tries to access the refcount after free_netdev is called which
leads to a null de-reference-

44837.761523:   &lt;6&gt; Unable to handle kernel paging request at
                    virtual address 0000004a88287000
44837.761651:   &lt;2&gt; pc : in_dev_finish_destroy+0x4c/0xc8
44837.761654:   &lt;2&gt; lr : in_dev_finish_destroy+0x2c/0xc8
44837.762393:   &lt;2&gt; Call trace:
44837.762398:   &lt;2&gt;  in_dev_finish_destroy+0x4c/0xc8
44837.762404:   &lt;2&gt;  in_dev_rcu_put+0x24/0x30
44837.762412:   &lt;2&gt;  rcu_nocb_kthread+0x43c/0x468
44837.762418:   &lt;2&gt;  kthread+0x118/0x128
44837.762424:   &lt;2&gt;  ret_from_fork+0x10/0x1c

Fix this by waiting for the completion of the call_rcu() in
case of register_netdevice errors.

Fixes: 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.")
Cc: Sean Tranchetti &lt;stranche@codeaurora.org&gt;
Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 10cc514f451a0f239aa34f91bc9dc954a9397840 ]

In event of failure during register_netdevice, free_netdev is
invoked immediately. free_netdev assumes that all the netdevice
refcounts have been dropped prior to it being called and as a
result frees and clears out the refcount pointer.

However, this is not necessarily true as some of the operations
in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for
invocation after a grace period. The IPv4 callback in_dev_rcu_put
tries to access the refcount after free_netdev is called which
leads to a null de-reference-

44837.761523:   &lt;6&gt; Unable to handle kernel paging request at
                    virtual address 0000004a88287000
44837.761651:   &lt;2&gt; pc : in_dev_finish_destroy+0x4c/0xc8
44837.761654:   &lt;2&gt; lr : in_dev_finish_destroy+0x2c/0xc8
44837.762393:   &lt;2&gt; Call trace:
44837.762398:   &lt;2&gt;  in_dev_finish_destroy+0x4c/0xc8
44837.762404:   &lt;2&gt;  in_dev_rcu_put+0x24/0x30
44837.762412:   &lt;2&gt;  rcu_nocb_kthread+0x43c/0x468
44837.762418:   &lt;2&gt;  kthread+0x118/0x128
44837.762424:   &lt;2&gt;  ret_from_fork+0x10/0x1c

Fix this by waiting for the completion of the call_rcu() in
case of register_netdevice errors.

Fixes: 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.")
Cc: Sean Tranchetti &lt;stranche@codeaurora.org&gt;
Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: fix skb use after free in netpoll</title>
<updated>2019-09-10T09:33:38+00:00</updated>
<author>
<name>Feng Sun</name>
<email>loyou85@gmail.com</email>
</author>
<published>2019-08-26T06:46:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6a2bd826163052ed3b7f4817de46d4d89d78263c'/>
<id>6a2bd826163052ed3b7f4817de46d4d89d78263c</id>
<content type='text'>
[ Upstream commit 2c1644cf6d46a8267d79ed95cb9b563839346562 ]

After commit baeababb5b85d5c4e6c917efe2a1504179438d3b
("tun: return NET_XMIT_DROP for dropped packets"),
when tun_net_xmit drop packets, it will free skb and return NET_XMIT_DROP,
netpoll_send_skb_on_dev will run into following use after free cases:
1. retry netpoll_start_xmit with freed skb;
2. queue freed skb in npinfo-&gt;txq.
queue_process will also run into use after free case.

hit netpoll_send_skb_on_dev first case with following kernel log:

[  117.864773] kernel BUG at mm/slub.c:306!
[  117.864773] invalid opcode: 0000 [#1] SMP PTI
[  117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P           OE     5.3.0-050300rc5-generic #201908182231
[  117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0
[  117.864781] Call Trace:
[  117.864781]  ? tun_net_xmit+0x21c/0x460
[  117.864781]  kfree_skbmem+0x4e/0x60
[  117.864782]  kfree_skb+0x3a/0xa0
[  117.864782]  tun_net_xmit+0x21c/0x460
[  117.864782]  netpoll_start_xmit+0x11d/0x1b0
[  117.864788]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864789]  __br_forward+0x1b9/0x1e0 [bridge]
[  117.864789]  ? skb_clone+0x53/0xd0
[  117.864790]  ? __skb_clone+0x2e/0x120
[  117.864790]  deliver_clone+0x37/0x50 [bridge]
[  117.864790]  maybe_deliver+0x89/0xc0 [bridge]
[  117.864791]  br_flood+0x6c/0x130 [bridge]
[  117.864791]  br_dev_xmit+0x315/0x3c0 [bridge]
[  117.864792]  netpoll_start_xmit+0x11d/0x1b0
[  117.864792]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864792]  netpoll_send_udp+0x2c6/0x3e8
[  117.864793]  write_msg+0xd9/0xf0 [netconsole]
[  117.864793]  console_unlock+0x386/0x4e0
[  117.864793]  vprintk_emit+0x17e/0x280
[  117.864794]  vprintk_default+0x29/0x50
[  117.864794]  vprintk_func+0x4c/0xbc
[  117.864794]  printk+0x58/0x6f
[  117.864795]  loop_fun+0x24/0x41 [printmsg_loop]
[  117.864795]  kthread+0x104/0x140
[  117.864795]  ? 0xffffffffc05b1000
[  117.864796]  ? kthread_park+0x80/0x80
[  117.864796]  ret_from_fork+0x35/0x40

Signed-off-by: Feng Sun &lt;loyou85@gmail.com&gt;
Signed-off-by: Xiaojun Zhao &lt;xiaojunzhao141@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 2c1644cf6d46a8267d79ed95cb9b563839346562 ]

After commit baeababb5b85d5c4e6c917efe2a1504179438d3b
("tun: return NET_XMIT_DROP for dropped packets"),
when tun_net_xmit drop packets, it will free skb and return NET_XMIT_DROP,
netpoll_send_skb_on_dev will run into following use after free cases:
1. retry netpoll_start_xmit with freed skb;
2. queue freed skb in npinfo-&gt;txq.
queue_process will also run into use after free case.

hit netpoll_send_skb_on_dev first case with following kernel log:

[  117.864773] kernel BUG at mm/slub.c:306!
[  117.864773] invalid opcode: 0000 [#1] SMP PTI
[  117.864774] CPU: 3 PID: 2627 Comm: loop_printmsg Kdump: loaded Tainted: P           OE     5.3.0-050300rc5-generic #201908182231
[  117.864775] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  117.864775] RIP: 0010:kmem_cache_free+0x28d/0x2b0
[  117.864781] Call Trace:
[  117.864781]  ? tun_net_xmit+0x21c/0x460
[  117.864781]  kfree_skbmem+0x4e/0x60
[  117.864782]  kfree_skb+0x3a/0xa0
[  117.864782]  tun_net_xmit+0x21c/0x460
[  117.864782]  netpoll_start_xmit+0x11d/0x1b0
[  117.864788]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864789]  __br_forward+0x1b9/0x1e0 [bridge]
[  117.864789]  ? skb_clone+0x53/0xd0
[  117.864790]  ? __skb_clone+0x2e/0x120
[  117.864790]  deliver_clone+0x37/0x50 [bridge]
[  117.864790]  maybe_deliver+0x89/0xc0 [bridge]
[  117.864791]  br_flood+0x6c/0x130 [bridge]
[  117.864791]  br_dev_xmit+0x315/0x3c0 [bridge]
[  117.864792]  netpoll_start_xmit+0x11d/0x1b0
[  117.864792]  netpoll_send_skb_on_dev+0x1b8/0x200
[  117.864792]  netpoll_send_udp+0x2c6/0x3e8
[  117.864793]  write_msg+0xd9/0xf0 [netconsole]
[  117.864793]  console_unlock+0x386/0x4e0
[  117.864793]  vprintk_emit+0x17e/0x280
[  117.864794]  vprintk_default+0x29/0x50
[  117.864794]  vprintk_func+0x4c/0xbc
[  117.864794]  printk+0x58/0x6f
[  117.864795]  loop_fun+0x24/0x41 [printmsg_loop]
[  117.864795]  kthread+0x104/0x140
[  117.864795]  ? 0xffffffffc05b1000
[  117.864796]  ? kthread_park+0x80/0x80
[  117.864796]  ret_from_fork+0x35/0x40

Signed-off-by: Feng Sun &lt;loyou85@gmail.com&gt;
Signed-off-by: Xiaojun Zhao &lt;xiaojunzhao141@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: make sure EPOLLOUT wont be missed</title>
<updated>2019-09-06T08:22:07+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-08-17T04:26:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0a6a9c473080801b7a738b828b4e651206febc79'/>
<id>0a6a9c473080801b7a738b828b4e651206febc79</id>
<content type='text'>
[ Upstream commit ef8d8ccdc216f797e66cb4a1372f5c4c285ce1e4 ]

As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
under memory pressure"), it is crucial we properly set SOCK_NOSPACE
when needed.

However, Jason patch had a bug, because the 'nonblocking' status
as far as sk_stream_wait_memory() is concerned is governed
by MSG_DONTWAIT flag passed at sendmsg() time :

    long timeo = sock_sndtimeo(sk, flags &amp; MSG_DONTWAIT);

So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
cleared, if sk-&gt;sk_sndtimeo has been set to a small (but not zero)
value.

This patch removes the 'noblock' variable since we must always
set SOCK_NOSPACE if -EAGAIN is returned.

It also renames the do_nonblock label since we might reach this
code path even if we were in blocking mode.

Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Reported-by: Vladimir Rutsky  &lt;rutsky@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Jason Baron &lt;jbaron@akamai.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit ef8d8ccdc216f797e66cb4a1372f5c4c285ce1e4 ]

As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
under memory pressure"), it is crucial we properly set SOCK_NOSPACE
when needed.

However, Jason patch had a bug, because the 'nonblocking' status
as far as sk_stream_wait_memory() is concerned is governed
by MSG_DONTWAIT flag passed at sendmsg() time :

    long timeo = sock_sndtimeo(sk, flags &amp; MSG_DONTWAIT);

So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
cleared, if sk-&gt;sk_sndtimeo has been set to a small (but not zero)
value.

This patch removes the 'noblock' variable since we must always
set SOCK_NOSPACE if -EAGAIN is returned.

It also renames the do_nonblock label since we might reach this
code path even if we were in blocking mode.

Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Jason Baron &lt;jbaron@akamai.com&gt;
Reported-by: Vladimir Rutsky  &lt;rutsky@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Jason Baron &lt;jbaron@akamai.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: fix ifindex collision during namespace removal</title>
<updated>2019-08-09T15:52:31+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@mellanox.com</email>
</author>
<published>2019-07-28T12:56:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=edb7ad69c439cdb960d9f519233d8d9771e329b5'/>
<id>edb7ad69c439cdb960d9f519233d8d9771e329b5</id>
<content type='text'>
[ Upstream commit 55b40dbf0e76b4bfb9d8b3a16a0208640a9a45df ]

Commit aca51397d014 ("netns: Fix arbitrary net_device-s corruptions
on net_ns stop.") introduced a possibility to hit a BUG in case device
is returning back to init_net and two following conditions are met:
1) dev-&gt;ifindex value is used in a name of another "dev%d"
   device in init_net.
2) dev-&gt;name is used by another device in init_net.

Under real life circumstances this is hard to get. Therefore this has
been present happily for over 10 years. To reproduce:

$ ip a
1: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: dummy0: &lt;BROADCAST,NOARP&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
3: enp0s2: &lt;BROADCAST,MULTICAST&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
$ ip netns add ns1
$ ip -n ns1 link add dummy1ns1 type dummy
$ ip -n ns1 link add dummy2ns1 type dummy
$ ip link set enp0s2 netns ns1
$ ip -n ns1 link set enp0s2 name dummy0
[  100.858894] virtio_net virtio0 dummy0: renamed from enp0s2
$ ip link add dev4 type dummy
$ ip -n ns1 a
1: lo: &lt;LOOPBACK&gt; mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy1ns1: &lt;BROADCAST,NOARP&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff
3: dummy2ns1: &lt;BROADCAST,NOARP&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff
4: dummy0: &lt;BROADCAST,MULTICAST&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
$ ip a
1: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: dummy0: &lt;BROADCAST,NOARP&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
4: dev4: &lt;BROADCAST,NOARP&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff
$ ip netns del ns1
[  158.717795] default_device_exit: failed to move dummy0 to init_net: -17
[  158.719316] ------------[ cut here ]------------
[  158.720591] kernel BUG at net/core/dev.c:9824!
[  158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI
[  158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18
[  158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[  158.727508] Workqueue: netns cleanup_net
[  158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f
[  158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
[  158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
[  158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
[  158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
[  158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
[  158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
[  158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
[  158.750638] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
[  158.752944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
[  158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  158.762758] Call Trace:
[  158.763882]  ? dev_change_net_namespace+0xbb0/0xbb0
[  158.766148]  ? devlink_nl_cmd_set_doit+0x520/0x520
[  158.768034]  ? dev_change_net_namespace+0xbb0/0xbb0
[  158.769870]  ops_exit_list.isra.0+0xa8/0x150
[  158.771544]  cleanup_net+0x446/0x8f0
[  158.772945]  ? unregister_pernet_operations+0x4a0/0x4a0
[  158.775294]  process_one_work+0xa1a/0x1740
[  158.776896]  ? pwq_dec_nr_in_flight+0x310/0x310
[  158.779143]  ? do_raw_spin_lock+0x11b/0x280
[  158.780848]  worker_thread+0x9e/0x1060
[  158.782500]  ? process_one_work+0x1740/0x1740
[  158.784454]  kthread+0x31b/0x420
[  158.786082]  ? __kthread_create_on_node+0x3f0/0x3f0
[  158.788286]  ret_from_fork+0x3a/0x50
[  158.789871] ---[ end trace defd6c657c71f936 ]---
[  158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f
[  158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
[  158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
[  158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
[  158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
[  158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
[  158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
[  158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
[  158.829899] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
[  158.834923] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
[  158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fix this by checking if a device with the same name exists in init_net
and fallback to original code - dev%d to allocate name - in case it does.

This was found using syzkaller.

Fixes: aca51397d014 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.")
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 55b40dbf0e76b4bfb9d8b3a16a0208640a9a45df ]

Commit aca51397d014 ("netns: Fix arbitrary net_device-s corruptions
on net_ns stop.") introduced a possibility to hit a BUG in case device
is returning back to init_net and two following conditions are met:
1) dev-&gt;ifindex value is used in a name of another "dev%d"
   device in init_net.
2) dev-&gt;name is used by another device in init_net.

Under real life circumstances this is hard to get. Therefore this has
been present happily for over 10 years. To reproduce:

$ ip a
1: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: dummy0: &lt;BROADCAST,NOARP&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
3: enp0s2: &lt;BROADCAST,MULTICAST&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
$ ip netns add ns1
$ ip -n ns1 link add dummy1ns1 type dummy
$ ip -n ns1 link add dummy2ns1 type dummy
$ ip link set enp0s2 netns ns1
$ ip -n ns1 link set enp0s2 name dummy0
[  100.858894] virtio_net virtio0 dummy0: renamed from enp0s2
$ ip link add dev4 type dummy
$ ip -n ns1 a
1: lo: &lt;LOOPBACK&gt; mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy1ns1: &lt;BROADCAST,NOARP&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 16:63:4c:38:3e:ff brd ff:ff:ff:ff:ff:ff
3: dummy2ns1: &lt;BROADCAST,NOARP&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether aa:9e:86:dd:6b:5d brd ff:ff:ff:ff:ff:ff
4: dummy0: &lt;BROADCAST,MULTICAST&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
$ ip a
1: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: dummy0: &lt;BROADCAST,NOARP&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 86:89:3f:86:61:29 brd ff:ff:ff:ff:ff:ff
4: dev4: &lt;BROADCAST,NOARP&gt; mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 5a:e1:4a:b6:ec:f8 brd ff:ff:ff:ff:ff:ff
$ ip netns del ns1
[  158.717795] default_device_exit: failed to move dummy0 to init_net: -17
[  158.719316] ------------[ cut here ]------------
[  158.720591] kernel BUG at net/core/dev.c:9824!
[  158.722260] invalid opcode: 0000 [#1] SMP KASAN PTI
[  158.723728] CPU: 0 PID: 56 Comm: kworker/u2:1 Not tainted 5.3.0-rc1+ #18
[  158.725422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
[  158.727508] Workqueue: netns cleanup_net
[  158.728915] RIP: 0010:default_device_exit.cold+0x1d/0x1f
[  158.730683] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
[  158.736854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
[  158.738752] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
[  158.741369] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
[  158.743418] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
[  158.745626] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
[  158.748405] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
[  158.750638] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
[  158.752944] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  158.755245] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
[  158.757654] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  158.760012] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  158.762758] Call Trace:
[  158.763882]  ? dev_change_net_namespace+0xbb0/0xbb0
[  158.766148]  ? devlink_nl_cmd_set_doit+0x520/0x520
[  158.768034]  ? dev_change_net_namespace+0xbb0/0xbb0
[  158.769870]  ops_exit_list.isra.0+0xa8/0x150
[  158.771544]  cleanup_net+0x446/0x8f0
[  158.772945]  ? unregister_pernet_operations+0x4a0/0x4a0
[  158.775294]  process_one_work+0xa1a/0x1740
[  158.776896]  ? pwq_dec_nr_in_flight+0x310/0x310
[  158.779143]  ? do_raw_spin_lock+0x11b/0x280
[  158.780848]  worker_thread+0x9e/0x1060
[  158.782500]  ? process_one_work+0x1740/0x1740
[  158.784454]  kthread+0x31b/0x420
[  158.786082]  ? __kthread_create_on_node+0x3f0/0x3f0
[  158.788286]  ret_from_fork+0x3a/0x50
[  158.789871] ---[ end trace defd6c657c71f936 ]---
[  158.792273] RIP: 0010:default_device_exit.cold+0x1d/0x1f
[  158.795478] Code: 84 e8 18 c9 3e fe 0f 0b e9 70 90 ff ff e8 36 e4 52 fe 89 d9 4c 89 e2 48 c7 c6 80 d6 25 84 48 c7 c7 20 c0 25 84 e8 f4 c8 3e
[  158.804854] RSP: 0018:ffff8880347e7b90 EFLAGS: 00010282
[  158.807865] RAX: 000000000000003b RBX: 00000000ffffffef RCX: 0000000000000000
[  158.811794] RDX: 0000000000000000 RSI: ffffffff8128013d RDI: ffffed10068fcf64
[  158.816652] RBP: ffff888033550170 R08: 000000000000003b R09: fffffbfff0b94b9c
[  158.820930] R10: fffffbfff0b94b9b R11: ffffffff85ca5cdf R12: ffff888032f28000
[  158.825113] R13: dffffc0000000000 R14: ffff8880335501b8 R15: 1ffff110068fcf72
[  158.829899] FS:  0000000000000000(0000) GS:ffff888036000000(0000) knlGS:0000000000000000
[  158.834923] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  158.838164] CR2: 00007fe8b45d21d0 CR3: 00000000340b4005 CR4: 0000000000360ef0
[  158.841917] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  158.845149] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fix this by checking if a device with the same name exists in init_net
and fallback to original code - dev%d to allocate name - in case it does.

This was found using syzkaller.

Fixes: aca51397d014 ("netns: Fix arbitrary net_device-s corruptions on net_ns stop.")
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: fix tcp_set_congestion_control() use from bpf hook</title>
<updated>2019-07-28T06:29:26+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2019-07-19T02:28:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c60f57dfe995172c2f01e59266e3ffa3419c6cd9'/>
<id>c60f57dfe995172c2f01e59266e3ffa3419c6cd9</id>
<content type='text'>
[ Upstream commit 8d650cdedaabb33e85e9b7c517c0c71fcecc1de9 ]

Neal reported incorrect use of ns_capable() from bpf hook.

bpf_setsockopt(...TCP_CONGESTION...)
  -&gt; tcp_set_congestion_control()
   -&gt; ns_capable(sock_net(sk)-&gt;user_ns, CAP_NET_ADMIN)
    -&gt; ns_capable_common()
     -&gt; current_cred()
      -&gt; rcu_dereference_protected(current-&gt;cred, 1)

Accessing 'current' in bpf context makes no sense, since packets
are processed from softirq context.

As Neal stated : The capability check in tcp_set_congestion_control()
was written assuming a system call context, and then was reused from
a BPF call site.

The fix is to add a new parameter to tcp_set_congestion_control(),
so that the ns_capable() call is only performed under the right
context.

Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Lawrence Brakmo &lt;brakmo@fb.com&gt;
Reported-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Lawrence Brakmo &lt;brakmo@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 8d650cdedaabb33e85e9b7c517c0c71fcecc1de9 ]

Neal reported incorrect use of ns_capable() from bpf hook.

bpf_setsockopt(...TCP_CONGESTION...)
  -&gt; tcp_set_congestion_control()
   -&gt; ns_capable(sock_net(sk)-&gt;user_ns, CAP_NET_ADMIN)
    -&gt; ns_capable_common()
     -&gt; current_cred()
      -&gt; rcu_dereference_protected(current-&gt;cred, 1)

Accessing 'current' in bpf context makes no sense, since packets
are processed from softirq context.

As Neal stated : The capability check in tcp_set_congestion_control()
was written assuming a system call context, and then was reused from
a BPF call site.

The fix is to add a new parameter to tcp_set_congestion_control(),
so that the ns_capable() call is only performed under the right
context.

Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Lawrence Brakmo &lt;brakmo@fb.com&gt;
Reported-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Lawrence Brakmo &lt;brakmo@fb.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: neigh: fix multiple neigh timer scheduling</title>
<updated>2019-07-28T06:29:24+00:00</updated>
<author>
<name>Lorenzo Bianconi</name>
<email>lorenzo.bianconi@redhat.com</email>
</author>
<published>2019-07-14T21:36:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=257441a072010d880a79048cd3158cde32ff8258'/>
<id>257441a072010d880a79048cd3158cde32ff8258</id>
<content type='text'>
[ Upstream commit 071c37983d99da07797294ea78e9da1a6e287144 ]

Neigh timer can be scheduled multiple times from userspace adding
multiple neigh entries and forcing the neigh timer scheduling passing
NTF_USE in the netlink requests.
This will result in a refcount leak and in the following dump stack:

[   32.465295] NEIGH: BUG, double timer add, state is 8
[   32.465308] CPU: 0 PID: 416 Comm: double_timer_ad Not tainted 5.2.0+ #65
[   32.465311] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[   32.465313] Call Trace:
[   32.465318]  dump_stack+0x7c/0xc0
[   32.465323]  __neigh_event_send+0x20c/0x880
[   32.465326]  ? ___neigh_create+0x846/0xfb0
[   32.465329]  ? neigh_lookup+0x2a9/0x410
[   32.465332]  ? neightbl_fill_info.constprop.0+0x800/0x800
[   32.465334]  neigh_add+0x4f8/0x5e0
[   32.465337]  ? neigh_xmit+0x620/0x620
[   32.465341]  ? find_held_lock+0x85/0xa0
[   32.465345]  rtnetlink_rcv_msg+0x204/0x570
[   32.465348]  ? rtnl_dellink+0x450/0x450
[   32.465351]  ? mark_held_locks+0x90/0x90
[   32.465354]  ? match_held_lock+0x1b/0x230
[   32.465357]  netlink_rcv_skb+0xc4/0x1d0
[   32.465360]  ? rtnl_dellink+0x450/0x450
[   32.465363]  ? netlink_ack+0x420/0x420
[   32.465366]  ? netlink_deliver_tap+0x115/0x560
[   32.465369]  ? __alloc_skb+0xc9/0x2f0
[   32.465372]  netlink_unicast+0x270/0x330
[   32.465375]  ? netlink_attachskb+0x2f0/0x2f0
[   32.465378]  netlink_sendmsg+0x34f/0x5a0
[   32.465381]  ? netlink_unicast+0x330/0x330
[   32.465385]  ? move_addr_to_kernel.part.0+0x20/0x20
[   32.465388]  ? netlink_unicast+0x330/0x330
[   32.465391]  sock_sendmsg+0x91/0xa0
[   32.465394]  ___sys_sendmsg+0x407/0x480
[   32.465397]  ? copy_msghdr_from_user+0x200/0x200
[   32.465401]  ? _raw_spin_unlock_irqrestore+0x37/0x40
[   32.465404]  ? lockdep_hardirqs_on+0x17d/0x250
[   32.465407]  ? __wake_up_common_lock+0xcb/0x110
[   32.465410]  ? __wake_up_common+0x230/0x230
[   32.465413]  ? netlink_bind+0x3e1/0x490
[   32.465416]  ? netlink_setsockopt+0x540/0x540
[   32.465420]  ? __fget_light+0x9c/0xf0
[   32.465423]  ? sockfd_lookup_light+0x8c/0xb0
[   32.465426]  __sys_sendmsg+0xa5/0x110
[   32.465429]  ? __ia32_sys_shutdown+0x30/0x30
[   32.465432]  ? __fd_install+0xe1/0x2c0
[   32.465435]  ? lockdep_hardirqs_off+0xb5/0x100
[   32.465438]  ? mark_held_locks+0x24/0x90
[   32.465441]  ? do_syscall_64+0xf/0x270
[   32.465444]  do_syscall_64+0x63/0x270
[   32.465448]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
receiving a netlink request with NTF_USE flag set

Reported-by: Marek Majkowski &lt;marek@cloudflare.com&gt;
Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table")
Signed-off-by: Lorenzo Bianconi &lt;lorenzo.bianconi@redhat.com&gt;
Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 071c37983d99da07797294ea78e9da1a6e287144 ]

Neigh timer can be scheduled multiple times from userspace adding
multiple neigh entries and forcing the neigh timer scheduling passing
NTF_USE in the netlink requests.
This will result in a refcount leak and in the following dump stack:

[   32.465295] NEIGH: BUG, double timer add, state is 8
[   32.465308] CPU: 0 PID: 416 Comm: double_timer_ad Not tainted 5.2.0+ #65
[   32.465311] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
[   32.465313] Call Trace:
[   32.465318]  dump_stack+0x7c/0xc0
[   32.465323]  __neigh_event_send+0x20c/0x880
[   32.465326]  ? ___neigh_create+0x846/0xfb0
[   32.465329]  ? neigh_lookup+0x2a9/0x410
[   32.465332]  ? neightbl_fill_info.constprop.0+0x800/0x800
[   32.465334]  neigh_add+0x4f8/0x5e0
[   32.465337]  ? neigh_xmit+0x620/0x620
[   32.465341]  ? find_held_lock+0x85/0xa0
[   32.465345]  rtnetlink_rcv_msg+0x204/0x570
[   32.465348]  ? rtnl_dellink+0x450/0x450
[   32.465351]  ? mark_held_locks+0x90/0x90
[   32.465354]  ? match_held_lock+0x1b/0x230
[   32.465357]  netlink_rcv_skb+0xc4/0x1d0
[   32.465360]  ? rtnl_dellink+0x450/0x450
[   32.465363]  ? netlink_ack+0x420/0x420
[   32.465366]  ? netlink_deliver_tap+0x115/0x560
[   32.465369]  ? __alloc_skb+0xc9/0x2f0
[   32.465372]  netlink_unicast+0x270/0x330
[   32.465375]  ? netlink_attachskb+0x2f0/0x2f0
[   32.465378]  netlink_sendmsg+0x34f/0x5a0
[   32.465381]  ? netlink_unicast+0x330/0x330
[   32.465385]  ? move_addr_to_kernel.part.0+0x20/0x20
[   32.465388]  ? netlink_unicast+0x330/0x330
[   32.465391]  sock_sendmsg+0x91/0xa0
[   32.465394]  ___sys_sendmsg+0x407/0x480
[   32.465397]  ? copy_msghdr_from_user+0x200/0x200
[   32.465401]  ? _raw_spin_unlock_irqrestore+0x37/0x40
[   32.465404]  ? lockdep_hardirqs_on+0x17d/0x250
[   32.465407]  ? __wake_up_common_lock+0xcb/0x110
[   32.465410]  ? __wake_up_common+0x230/0x230
[   32.465413]  ? netlink_bind+0x3e1/0x490
[   32.465416]  ? netlink_setsockopt+0x540/0x540
[   32.465420]  ? __fget_light+0x9c/0xf0
[   32.465423]  ? sockfd_lookup_light+0x8c/0xb0
[   32.465426]  __sys_sendmsg+0xa5/0x110
[   32.465429]  ? __ia32_sys_shutdown+0x30/0x30
[   32.465432]  ? __fd_install+0xe1/0x2c0
[   32.465435]  ? lockdep_hardirqs_off+0xb5/0x100
[   32.465438]  ? mark_held_locks+0x24/0x90
[   32.465441]  ? do_syscall_64+0xf/0x270
[   32.465444]  do_syscall_64+0x63/0x270
[   32.465448]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
receiving a netlink request with NTF_USE flag set

Reported-by: Marek Majkowski &lt;marek@cloudflare.com&gt;
Fixes: 0c5c2d308906 ("neigh: Allow for user space users of the neighbour table")
Signed-off-by: Lorenzo Bianconi &lt;lorenzo.bianconi@redhat.com&gt;
Reviewed-by: David Ahern &lt;dsahern@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
