<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/core, branch v5.4.58</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>bpf: sockmap: Require attach_bpf_fd when detaching a program</title>
<updated>2020-08-07T07:34:02+00:00</updated>
<author>
<name>Lorenz Bauer</name>
<email>lmb@cloudflare.com</email>
</author>
<published>2020-06-29T09:56:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ca7ace8fd26d9ae4be3cf69f474ddcfb0e8506ce'/>
<id>ca7ace8fd26d9ae4be3cf69f474ddcfb0e8506ce</id>
<content type='text'>
commit bb0de3131f4c60a9bf976681e0fe4d1e55c7a821 upstream.

The sockmap code currently ignores the value of attach_bpf_fd when
detaching a program. This is contrary to the usual behaviour of
checking that attach_bpf_fd represents the currently attached
program.

Ensure that attach_bpf_fd is indeed the currently attached
program. It turns out that all sockmap selftests already do this,
which indicates that this is unlikely to cause breakage.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bb0de3131f4c60a9bf976681e0fe4d1e55c7a821 upstream.

The sockmap code currently ignores the value of attach_bpf_fd when
detaching a program. This is contrary to the usual behaviour of
checking that attach_bpf_fd represents the currently attached
program.

Ensure that attach_bpf_fd is indeed the currently attached
program. It turns out that all sockmap selftests already do this,
which indicates that this is unlikely to cause breakage.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Lorenz Bauer &lt;lmb@cloudflare.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Link: https://lore.kernel.org/bpf/20200629095630.7933-5-lmb@cloudflare.com
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>udp: Copy has_conns in reuseport_grow().</title>
<updated>2020-07-31T16:39:31+00:00</updated>
<author>
<name>Kuniyuki Iwashima</name>
<email>kuniyu@amazon.co.jp</email>
</author>
<published>2020-07-21T06:15:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6735c126d27200db85c338410ffd34649dd572ff'/>
<id>6735c126d27200db85c338410ffd34649dd572ff</id>
<content type='text'>
[ Upstream commit f2b2c55e512879a05456eaf5de4d1ed2f7757509 ]

If an unconnected socket in a UDP reuseport group connect()s, has_conns is
set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all
sockets in udp_hslot looking for the connected socket with the highest
score.

However, when the number of sockets bound to the port exceeds max_socks,
reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2()
to return without scanning all sockets, resulting in that packets sent to
connected sockets may be distributed to unconnected sockets.

Therefore, reuseport_grow() should copy has_conns.

Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: Benjamin Herrenschmidt &lt;benh@amazon.com&gt;
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.co.jp&gt;
Acked-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit f2b2c55e512879a05456eaf5de4d1ed2f7757509 ]

If an unconnected socket in a UDP reuseport group connect()s, has_conns is
set to 1. Then, when a packet is received, udp[46]_lib_lookup2() scans all
sockets in udp_hslot looking for the connected socket with the highest
score.

However, when the number of sockets bound to the port exceeds max_socks,
reuseport_grow() resets has_conns to 0. It can cause udp[46]_lib_lookup2()
to return without scanning all sockets, resulting in that packets sent to
connected sockets may be distributed to unconnected sockets.

Therefore, reuseport_grow() should copy has_conns.

Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: Benjamin Herrenschmidt &lt;benh@amazon.com&gt;
Signed-off-by: Kuniyuki Iwashima &lt;kuniyu@amazon.co.jp&gt;
Acked-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rtnetlink: Fix memory(net_device) leak when -&gt;newlink fails</title>
<updated>2020-07-31T16:39:30+00:00</updated>
<author>
<name>Weilong Chen</name>
<email>chenweilong@huawei.com</email>
</author>
<published>2020-07-15T12:58:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=01c928350641cee51b96d9b0a6bfcff1cde18321'/>
<id>01c928350641cee51b96d9b0a6bfcff1cde18321</id>
<content type='text'>
[ Upstream commit cebb69754f37d68e1355a5e726fdac317bcda302 ]

When vlan_newlink call register_vlan_dev fails, it might return error
with dev-&gt;reg_state = NETREG_UNREGISTERED. The rtnl_newlink should
free the memory. But currently rtnl_newlink only free the memory which
state is NETREG_UNINITIALIZED.

BUG: memory leak
unreferenced object 0xffff8881051de000 (size 4096):
  comm "syz-executor139", pid 560, jiffies 4294745346 (age 32.445s)
  hex dump (first 32 bytes):
    76 6c 61 6e 32 00 00 00 00 00 00 00 00 00 00 00  vlan2...........
    00 45 28 03 81 88 ff ff 00 00 00 00 00 00 00 00  .E(.............
  backtrace:
    [&lt;0000000047527e31&gt;] kmalloc_node include/linux/slab.h:578 [inline]
    [&lt;0000000047527e31&gt;] kvmalloc_node+0x33/0xd0 mm/util.c:574
    [&lt;000000002b59e3bc&gt;] kvmalloc include/linux/mm.h:753 [inline]
    [&lt;000000002b59e3bc&gt;] kvzalloc include/linux/mm.h:761 [inline]
    [&lt;000000002b59e3bc&gt;] alloc_netdev_mqs+0x83/0xd90 net/core/dev.c:9929
    [&lt;000000006076752a&gt;] rtnl_create_link+0x2c0/0xa20 net/core/rtnetlink.c:3067
    [&lt;00000000572b3be5&gt;] __rtnl_newlink+0xc9c/0x1330 net/core/rtnetlink.c:3329
    [&lt;00000000e84ea553&gt;] rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3397
    [&lt;0000000052c7c0a9&gt;] rtnetlink_rcv_msg+0x540/0x990 net/core/rtnetlink.c:5460
    [&lt;000000004b5cb379&gt;] netlink_rcv_skb+0x12b/0x3a0 net/netlink/af_netlink.c:2469
    [&lt;00000000c71c20d3&gt;] netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    [&lt;00000000c71c20d3&gt;] netlink_unicast+0x4c6/0x690 net/netlink/af_netlink.c:1329
    [&lt;00000000cca72fa9&gt;] netlink_sendmsg+0x735/0xcc0 net/netlink/af_netlink.c:1918
    [&lt;000000009221ebf7&gt;] sock_sendmsg_nosec net/socket.c:652 [inline]
    [&lt;000000009221ebf7&gt;] sock_sendmsg+0x109/0x140 net/socket.c:672
    [&lt;000000001c30ffe4&gt;] ____sys_sendmsg+0x5f5/0x780 net/socket.c:2352
    [&lt;00000000b71ca6f3&gt;] ___sys_sendmsg+0x11d/0x1a0 net/socket.c:2406
    [&lt;0000000007297384&gt;] __sys_sendmsg+0xeb/0x1b0 net/socket.c:2439
    [&lt;000000000eb29b11&gt;] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
    [&lt;000000006839b4d0&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: cb626bf566eb ("net-sysfs: Fix reference count leak")
Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Signed-off-by: Weilong Chen &lt;chenweilong@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit cebb69754f37d68e1355a5e726fdac317bcda302 ]

When vlan_newlink call register_vlan_dev fails, it might return error
with dev-&gt;reg_state = NETREG_UNREGISTERED. The rtnl_newlink should
free the memory. But currently rtnl_newlink only free the memory which
state is NETREG_UNINITIALIZED.

BUG: memory leak
unreferenced object 0xffff8881051de000 (size 4096):
  comm "syz-executor139", pid 560, jiffies 4294745346 (age 32.445s)
  hex dump (first 32 bytes):
    76 6c 61 6e 32 00 00 00 00 00 00 00 00 00 00 00  vlan2...........
    00 45 28 03 81 88 ff ff 00 00 00 00 00 00 00 00  .E(.............
  backtrace:
    [&lt;0000000047527e31&gt;] kmalloc_node include/linux/slab.h:578 [inline]
    [&lt;0000000047527e31&gt;] kvmalloc_node+0x33/0xd0 mm/util.c:574
    [&lt;000000002b59e3bc&gt;] kvmalloc include/linux/mm.h:753 [inline]
    [&lt;000000002b59e3bc&gt;] kvzalloc include/linux/mm.h:761 [inline]
    [&lt;000000002b59e3bc&gt;] alloc_netdev_mqs+0x83/0xd90 net/core/dev.c:9929
    [&lt;000000006076752a&gt;] rtnl_create_link+0x2c0/0xa20 net/core/rtnetlink.c:3067
    [&lt;00000000572b3be5&gt;] __rtnl_newlink+0xc9c/0x1330 net/core/rtnetlink.c:3329
    [&lt;00000000e84ea553&gt;] rtnl_newlink+0x66/0x90 net/core/rtnetlink.c:3397
    [&lt;0000000052c7c0a9&gt;] rtnetlink_rcv_msg+0x540/0x990 net/core/rtnetlink.c:5460
    [&lt;000000004b5cb379&gt;] netlink_rcv_skb+0x12b/0x3a0 net/netlink/af_netlink.c:2469
    [&lt;00000000c71c20d3&gt;] netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
    [&lt;00000000c71c20d3&gt;] netlink_unicast+0x4c6/0x690 net/netlink/af_netlink.c:1329
    [&lt;00000000cca72fa9&gt;] netlink_sendmsg+0x735/0xcc0 net/netlink/af_netlink.c:1918
    [&lt;000000009221ebf7&gt;] sock_sendmsg_nosec net/socket.c:652 [inline]
    [&lt;000000009221ebf7&gt;] sock_sendmsg+0x109/0x140 net/socket.c:672
    [&lt;000000001c30ffe4&gt;] ____sys_sendmsg+0x5f5/0x780 net/socket.c:2352
    [&lt;00000000b71ca6f3&gt;] ___sys_sendmsg+0x11d/0x1a0 net/socket.c:2406
    [&lt;0000000007297384&gt;] __sys_sendmsg+0xeb/0x1b0 net/socket.c:2439
    [&lt;000000000eb29b11&gt;] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
    [&lt;000000006839b4d0&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: cb626bf566eb ("net-sysfs: Fix reference count leak")
Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Signed-off-by: Weilong Chen &lt;chenweilong@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net-sysfs: add a newline when printing 'tx_timeout' by sysfs</title>
<updated>2020-07-31T16:39:30+00:00</updated>
<author>
<name>Xiongfeng Wang</name>
<email>wangxiongfeng2@huawei.com</email>
</author>
<published>2020-07-21T07:02:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=274b40b6df6cea30f794b83eb04198f1acca15b0'/>
<id>274b40b6df6cea30f794b83eb04198f1acca15b0</id>
<content type='text'>
[ Upstream commit 9bb5fbea59f36a589ef886292549ca4052fe676c ]

When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to
add a newline for easy reading.

root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
0root@syzkaller:~#

Signed-off-by: Xiongfeng Wang &lt;wangxiongfeng2@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 9bb5fbea59f36a589ef886292549ca4052fe676c ]

When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to
add a newline for easy reading.

root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
0root@syzkaller:~#

Signed-off-by: Xiongfeng Wang &lt;wangxiongfeng2@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dev: Defer free of skbs in flush_backlog</title>
<updated>2020-07-31T16:39:30+00:00</updated>
<author>
<name>Subash Abhinov Kasiviswanathan</name>
<email>subashab@codeaurora.org</email>
</author>
<published>2020-07-23T17:31:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d109acd58052ce4affc01e931765045c368ed31e'/>
<id>d109acd58052ce4affc01e931765045c368ed31e</id>
<content type='text'>
[ Upstream commit 7df5cb75cfb8acf96c7f2342530eb41e0c11f4c3 ]

IRQs are disabled when freeing skbs in input queue.
Use the IRQ safe variant to free skbs here.

Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context")
Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 7df5cb75cfb8acf96c7f2342530eb41e0c11f4c3 ]

IRQs are disabled when freeing skbs in input queue.
Use the IRQ safe variant to free skbs here.

Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context")
Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: fix cgroup_sk_alloc() for sk_clone_lock()</title>
<updated>2020-07-22T07:32:49+00:00</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2020-07-02T18:52:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=94886c86e833dbc8995202b6c6aaff592b7abd24'/>
<id>94886c86e833dbc8995202b6c6aaff592b7abd24</id>
<content type='text'>
[ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]

When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd-&gt;val is always zero. So it's safe to factor out the code
to make it more readable.

The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd-&gt;val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.

This bug seems to be introduced since the beginning, commit
d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229af
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

Fixes: bd1060a1d671 ("sock, cgroup: add sock-&gt;sk_cgroup")
Reported-by: Cameron Berkenpas &lt;cam@neo-zeon.de&gt;
Reported-by: Peter Geis &lt;pgwipeout@gmail.com&gt;
Reported-by: Lu Fengqi &lt;lufq.fnst@cn.fujitsu.com&gt;
Reported-by: Daniël Sonck &lt;dsonck92@gmail.com&gt;
Reported-by: Zhang Qiang &lt;qiang.zhang@windriver.com&gt;
Tested-by: Cameron Berkenpas &lt;cam@neo-zeon.de&gt;
Tested-by: Peter Geis &lt;pgwipeout@gmail.com&gt;
Tested-by: Thomas Lamprecht &lt;t.lamprecht@proxmox.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Zefan Li &lt;lizefan@huawei.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]

When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd-&gt;val is always zero. So it's safe to factor out the code
to make it more readable.

The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd-&gt;val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.

This bug seems to be introduced since the beginning, commit
d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229af
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

Fixes: bd1060a1d671 ("sock, cgroup: add sock-&gt;sk_cgroup")
Reported-by: Cameron Berkenpas &lt;cam@neo-zeon.de&gt;
Reported-by: Peter Geis &lt;pgwipeout@gmail.com&gt;
Reported-by: Lu Fengqi &lt;lufq.fnst@cn.fujitsu.com&gt;
Reported-by: Daniël Sonck &lt;dsonck92@gmail.com&gt;
Reported-by: Zhang Qiang &lt;qiang.zhang@windriver.com&gt;
Tested-by: Cameron Berkenpas &lt;cam@neo-zeon.de&gt;
Tested-by: Peter Geis &lt;pgwipeout@gmail.com&gt;
Tested-by: Thomas Lamprecht &lt;t.lamprecht@proxmox.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Zefan Li &lt;lizefan@huawei.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched: consistently handle layer3 header accesses in the presence of VLANs</title>
<updated>2020-07-22T07:32:48+00:00</updated>
<author>
<name>Toke Høiland-Jørgensen</name>
<email>toke@redhat.com</email>
</author>
<published>2020-07-03T20:26:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9b7fd81cf9b6ca322cffc6aff4be0f2d54e9689a'/>
<id>9b7fd81cf9b6ca322cffc6aff4be0f2d54e9689a</id>
<content type='text'>
[ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ]

There are a couple of places in net/sched/ that check skb-&gt;protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb-&gt;protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb-&gt;protocol to use a helper that will always see the VLAN ethertype.

However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).

To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.

To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.

v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
  bpf_skb_ecn_set_ce()

v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb-&gt;protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
  calling the helper twice

Reported-by: Ilya Ponetayev &lt;i.ponetaev@ndmsystems.com&gt;
Fixes: d8b9605d2697 ("net: sched: fix skb-&gt;protocol use in case of accelerated vlan path")
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit d7bf2ebebc2bd61ab95e2a8e33541ef282f303d4 ]

There are a couple of places in net/sched/ that check skb-&gt;protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb-&gt;protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb-&gt;protocol to use a helper that will always see the VLAN ethertype.

However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).

To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.

To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.

v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
  bpf_skb_ecn_set_ce()

v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb-&gt;protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
  calling the helper twice

Reported-by: Ilya Ponetayev &lt;i.ponetaev@ndmsystems.com&gt;
Fixes: d8b9605d2697 ("net: sched: fix skb-&gt;protocol use in case of accelerated vlan path")
Signed-off-by: Toke Høiland-Jørgensen &lt;toke@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok()</title>
<updated>2020-07-16T06:16:45+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2020-07-02T22:45:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=baef8d1027b0c037128c8be56338e6e066317e68'/>
<id>baef8d1027b0c037128c8be56338e6e066317e68</id>
<content type='text'>
commit 63960260457a02af2a6cb35d75e6bdb17299c882 upstream.

When evaluating access control over kallsyms visibility, credentials at
open() time need to be used, not the "current" creds (though in BPF's
case, this has likely always been the same). Plumb access to associated
file-&gt;f_cred down through bpf_dump_raw_ok() and its callers now that
kallsysm_show_value() has been refactored to take struct cred.

Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: bpf@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump")
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 63960260457a02af2a6cb35d75e6bdb17299c882 upstream.

When evaluating access control over kallsyms visibility, credentials at
open() time need to be used, not the "current" creds (though in BPF's
case, this has likely always been the same). Plumb access to associated
file-&gt;f_cred down through bpf_dump_raw_ok() and its callers now that
kallsysm_show_value() has been refactored to take struct cred.

Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: bpf@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 7105e828c087 ("bpf: allow for correlation of maps and helpers in dump")
Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>bpf, sockmap: RCU dereferenced psock may be used outside RCU block</title>
<updated>2020-07-16T06:16:37+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-06-25T23:13:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b709a08bc4d7e39f9a5a370142ba41c02af09c44'/>
<id>b709a08bc4d7e39f9a5a370142ba41c02af09c44</id>
<content type='text'>
[ Upstream commit 8025751d4d55a2f32be6bdf825b6a80c299875f5 ]

If an ingress verdict program specifies message sizes greater than
skb-&gt;len and there is an ENOMEM error due to memory pressure we
may call the rcv_msg handler outside the strp_data_ready() caller
context. This is because on an ENOMEM error the strparser will
retry from a workqueue. The caller currently protects the use of
psock by calling the strp_data_ready() inside a rcu_read_lock/unlock
block.

But, in above workqueue error case the psock is accessed outside
the read_lock/unlock block of the caller. So instead of using
psock directly we must do a look up against the sk again to
ensure the psock is available.

There is an an ugly piece here where we must handle
the case where we paused the strp and removed the psock. On
psock removal we first pause the strparser and then remove
the psock. If the strparser is paused while an skb is
scheduled on the workqueue the skb will be dropped on the
flow and kfree_skb() is called. If the workqueue manages
to get called before we pause the strparser but runs the rcvmsg
callback after the psock is removed we will hit the unlikely
case where we run the sockmap rcvmsg handler but do not have
a psock. For now we will follow strparser logic and drop the
skb on the floor with skb_kfree(). This is ugly because the
data is dropped. To date this has not caused problems in practice
because either the application controlling the sockmap is
coordinating with the datapath so that skbs are "flushed"
before removal or we simply wait for the sock to be closed before
removing it.

This patch fixes the describe RCU bug and dropping the skb doesn't
make things worse. Future patches will improve this by allowing
the normal case where skbs are not merged to skip the strparser
altogether. In practice many (most?) use cases have no need to
merge skbs so its both a code complexity hit as seen above and
a performance issue. For example, in the Cilium case we always
set the strparser up to return sbks 1:1 without any merging and
have avoided above issues.

Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/159312679888.18340.15248924071966273998.stgit@john-XPS-13-9370
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 8025751d4d55a2f32be6bdf825b6a80c299875f5 ]

If an ingress verdict program specifies message sizes greater than
skb-&gt;len and there is an ENOMEM error due to memory pressure we
may call the rcv_msg handler outside the strp_data_ready() caller
context. This is because on an ENOMEM error the strparser will
retry from a workqueue. The caller currently protects the use of
psock by calling the strp_data_ready() inside a rcu_read_lock/unlock
block.

But, in above workqueue error case the psock is accessed outside
the read_lock/unlock block of the caller. So instead of using
psock directly we must do a look up against the sk again to
ensure the psock is available.

There is an an ugly piece here where we must handle
the case where we paused the strp and removed the psock. On
psock removal we first pause the strparser and then remove
the psock. If the strparser is paused while an skb is
scheduled on the workqueue the skb will be dropped on the
flow and kfree_skb() is called. If the workqueue manages
to get called before we pause the strparser but runs the rcvmsg
callback after the psock is removed we will hit the unlikely
case where we run the sockmap rcvmsg handler but do not have
a psock. For now we will follow strparser logic and drop the
skb on the floor with skb_kfree(). This is ugly because the
data is dropped. To date this has not caused problems in practice
because either the application controlling the sockmap is
coordinating with the datapath so that skbs are "flushed"
before removal or we simply wait for the sock to be closed before
removing it.

This patch fixes the describe RCU bug and dropping the skb doesn't
make things worse. Future patches will improve this by allowing
the normal case where skbs are not merged to skip the strparser
altogether. In practice many (most?) use cases have no need to
merge skbs so its both a code complexity hit as seen above and
a performance issue. For example, in the Cilium case we always
set the strparser up to return sbks 1:1 without any merging and
have avoided above issues.

Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls")
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/159312679888.18340.15248924071966273998.stgit@john-XPS-13-9370
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf, sockmap: RCU splat with redirect and strparser error or TLS</title>
<updated>2020-07-16T06:16:37+00:00</updated>
<author>
<name>John Fastabend</name>
<email>john.fastabend@gmail.com</email>
</author>
<published>2020-06-25T23:12:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2000bb546525890584a3cf0ce4787270250b582c'/>
<id>2000bb546525890584a3cf0ce4787270250b582c</id>
<content type='text'>
[ Upstream commit 93dd5f185916b05e931cffae636596f21f98546e ]

There are two paths to generate the below RCU splat the first and
most obvious is the result of the BPF verdict program issuing a
redirect on a TLS socket (This is the splat shown below). Unlike
the non-TLS case the caller of the *strp_read() hooks does not
wrap the call in a rcu_read_lock/unlock. Then if the BPF program
issues a redirect action we hit the RCU splat.

However, in the non-TLS socket case the splat appears to be
relatively rare, because the skmsg caller into the strp_data_ready()
is wrapped in a rcu_read_lock/unlock. Shown here,

 static void sk_psock_strp_data_ready(struct sock *sk)
 {
	struct sk_psock *psock;

	rcu_read_lock();
	psock = sk_psock(sk);
	if (likely(psock)) {
		if (tls_sw_has_ctx_rx(sk)) {
			psock-&gt;parser.saved_data_ready(sk);
		} else {
			write_lock_bh(&amp;sk-&gt;sk_callback_lock);
			strp_data_ready(&amp;psock-&gt;parser.strp);
			write_unlock_bh(&amp;sk-&gt;sk_callback_lock);
		}
	}
	rcu_read_unlock();
 }

If the above was the only way to run the verdict program we
would be safe. But, there is a case where the strparser may throw an
ENOMEM error while parsing the skb. This is a result of a failed
skb_clone, or alloc_skb_for_msg while building a new merged skb when
the msg length needed spans multiple skbs. This will in turn put the
skb on the strp_wrk workqueue in the strparser code. The skb will
later be dequeued and verdict programs run, but now from a
different context without the rcu_read_lock()/unlock() critical
section in sk_psock_strp_data_ready() shown above. In practice
I have not seen this yet, because as far as I know most users of the
verdict programs are also only working on single skbs. In this case no
merge happens which could trigger the above ENOMEM errors. In addition
the system would need to be under memory pressure. For example, we
can't hit the above case in selftests because we missed having tests
to merge skbs. (Added in later patch)

To fix the below splat extend the rcu_read_lock/unnlock block to
include the call to sk_psock_tls_verdict_apply(). This will fix both
TLS redirect case and non-TLS redirect+error case. Also remove
psock from the sk_psock_tls_verdict_apply() function signature its
not used there.

[ 1095.937597] WARNING: suspicious RCU usage
[ 1095.940964] 5.7.0-rc7-02911-g463bac5f1ca79 #1 Tainted: G        W
[ 1095.944363] -----------------------------
[ 1095.947384] include/linux/skmsg.h:284 suspicious rcu_dereference_check() usage!
[ 1095.950866]
[ 1095.950866] other info that might help us debug this:
[ 1095.950866]
[ 1095.957146]
[ 1095.957146] rcu_scheduler_active = 2, debug_locks = 1
[ 1095.961482] 1 lock held by test_sockmap/15970:
[ 1095.964501]  #0: ffff9ea6b25de660 (sk_lock-AF_INET){+.+.}-{0:0}, at: tls_sw_recvmsg+0x13a/0x840 [tls]
[ 1095.968568]
[ 1095.968568] stack backtrace:
[ 1095.975001] CPU: 1 PID: 15970 Comm: test_sockmap Tainted: G        W         5.7.0-rc7-02911-g463bac5f1ca79 #1
[ 1095.977883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 1095.980519] Call Trace:
[ 1095.982191]  dump_stack+0x8f/0xd0
[ 1095.984040]  sk_psock_skb_redirect+0xa6/0xf0
[ 1095.986073]  sk_psock_tls_strp_read+0x1d8/0x250
[ 1095.988095]  tls_sw_recvmsg+0x714/0x840 [tls]

v2: Improve commit message to identify non-TLS redirect plus error case
    condition as well as more common TLS case. In the process I decided
    doing the rcu_read_unlock followed by the lock/unlock inside branches
    was unnecessarily complex. We can just extend the current rcu block
    and get the same effeective without the shuffling and branching.
    Thanks Martin!

Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls")
Reported-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Reported-by: kernel test robot &lt;rong.a.chen@intel.com&gt;
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Acked-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Link: https://lore.kernel.org/bpf/159312677907.18340.11064813152758406626.stgit@john-XPS-13-9370
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 93dd5f185916b05e931cffae636596f21f98546e ]

There are two paths to generate the below RCU splat the first and
most obvious is the result of the BPF verdict program issuing a
redirect on a TLS socket (This is the splat shown below). Unlike
the non-TLS case the caller of the *strp_read() hooks does not
wrap the call in a rcu_read_lock/unlock. Then if the BPF program
issues a redirect action we hit the RCU splat.

However, in the non-TLS socket case the splat appears to be
relatively rare, because the skmsg caller into the strp_data_ready()
is wrapped in a rcu_read_lock/unlock. Shown here,

 static void sk_psock_strp_data_ready(struct sock *sk)
 {
	struct sk_psock *psock;

	rcu_read_lock();
	psock = sk_psock(sk);
	if (likely(psock)) {
		if (tls_sw_has_ctx_rx(sk)) {
			psock-&gt;parser.saved_data_ready(sk);
		} else {
			write_lock_bh(&amp;sk-&gt;sk_callback_lock);
			strp_data_ready(&amp;psock-&gt;parser.strp);
			write_unlock_bh(&amp;sk-&gt;sk_callback_lock);
		}
	}
	rcu_read_unlock();
 }

If the above was the only way to run the verdict program we
would be safe. But, there is a case where the strparser may throw an
ENOMEM error while parsing the skb. This is a result of a failed
skb_clone, or alloc_skb_for_msg while building a new merged skb when
the msg length needed spans multiple skbs. This will in turn put the
skb on the strp_wrk workqueue in the strparser code. The skb will
later be dequeued and verdict programs run, but now from a
different context without the rcu_read_lock()/unlock() critical
section in sk_psock_strp_data_ready() shown above. In practice
I have not seen this yet, because as far as I know most users of the
verdict programs are also only working on single skbs. In this case no
merge happens which could trigger the above ENOMEM errors. In addition
the system would need to be under memory pressure. For example, we
can't hit the above case in selftests because we missed having tests
to merge skbs. (Added in later patch)

To fix the below splat extend the rcu_read_lock/unnlock block to
include the call to sk_psock_tls_verdict_apply(). This will fix both
TLS redirect case and non-TLS redirect+error case. Also remove
psock from the sk_psock_tls_verdict_apply() function signature its
not used there.

[ 1095.937597] WARNING: suspicious RCU usage
[ 1095.940964] 5.7.0-rc7-02911-g463bac5f1ca79 #1 Tainted: G        W
[ 1095.944363] -----------------------------
[ 1095.947384] include/linux/skmsg.h:284 suspicious rcu_dereference_check() usage!
[ 1095.950866]
[ 1095.950866] other info that might help us debug this:
[ 1095.950866]
[ 1095.957146]
[ 1095.957146] rcu_scheduler_active = 2, debug_locks = 1
[ 1095.961482] 1 lock held by test_sockmap/15970:
[ 1095.964501]  #0: ffff9ea6b25de660 (sk_lock-AF_INET){+.+.}-{0:0}, at: tls_sw_recvmsg+0x13a/0x840 [tls]
[ 1095.968568]
[ 1095.968568] stack backtrace:
[ 1095.975001] CPU: 1 PID: 15970 Comm: test_sockmap Tainted: G        W         5.7.0-rc7-02911-g463bac5f1ca79 #1
[ 1095.977883] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 1095.980519] Call Trace:
[ 1095.982191]  dump_stack+0x8f/0xd0
[ 1095.984040]  sk_psock_skb_redirect+0xa6/0xf0
[ 1095.986073]  sk_psock_tls_strp_read+0x1d8/0x250
[ 1095.988095]  tls_sw_recvmsg+0x714/0x840 [tls]

v2: Improve commit message to identify non-TLS redirect plus error case
    condition as well as more common TLS case. In the process I decided
    doing the rcu_read_unlock followed by the lock/unlock inside branches
    was unnecessarily complex. We can just extend the current rcu block
    and get the same effeective without the shuffling and branching.
    Thanks Martin!

Fixes: e91de6afa81c1 ("bpf: Fix running sk_skb program types with ktls")
Reported-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Reported-by: kernel test robot &lt;rong.a.chen@intel.com&gt;
Signed-off-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Acked-by: Jakub Sitnicki &lt;jakub@cloudflare.com&gt;
Link: https://lore.kernel.org/bpf/159312677907.18340.11064813152758406626.stgit@john-XPS-13-9370
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
