<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/core, branch v4.9.232</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>net-sysfs: add a newline when printing 'tx_timeout' by sysfs</title>
<updated>2020-07-31T14:44:06+00:00</updated>
<author>
<name>Xiongfeng Wang</name>
<email>wangxiongfeng2@huawei.com</email>
</author>
<published>2020-07-21T07:02:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c0e3a9a4c4c96b4b2ebaaa198404a601f201ee5b'/>
<id>c0e3a9a4c4c96b4b2ebaaa198404a601f201ee5b</id>
<content type='text'>
[ Upstream commit 9bb5fbea59f36a589ef886292549ca4052fe676c ]

When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to
add a newline for easy reading.

root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
0root@syzkaller:~#

Signed-off-by: Xiongfeng Wang &lt;wangxiongfeng2@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 9bb5fbea59f36a589ef886292549ca4052fe676c ]

When I cat 'tx_timeout' by sysfs, it displays as follows. It's better to
add a newline for easy reading.

root@syzkaller:~# cat /sys/devices/virtual/net/lo/queues/tx-0/tx_timeout
0root@syzkaller:~#

Signed-off-by: Xiongfeng Wang &lt;wangxiongfeng2@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dev: Defer free of skbs in flush_backlog</title>
<updated>2020-07-31T14:44:06+00:00</updated>
<author>
<name>Subash Abhinov Kasiviswanathan</name>
<email>subashab@codeaurora.org</email>
</author>
<published>2020-07-23T17:31:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b7441315b8d2b25a2bdcc71957148b8ad843955a'/>
<id>b7441315b8d2b25a2bdcc71957148b8ad843955a</id>
<content type='text'>
[ Upstream commit 7df5cb75cfb8acf96c7f2342530eb41e0c11f4c3 ]

IRQs are disabled when freeing skbs in input queue.
Use the IRQ safe variant to free skbs here.

Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context")
Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 7df5cb75cfb8acf96c7f2342530eb41e0c11f4c3 ]

IRQs are disabled when freeing skbs in input queue.
Use the IRQ safe variant to free skbs here.

Fixes: 145dd5f9c88f ("net: flush the softnet backlog in process context")
Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: fix cgroup_sk_alloc() for sk_clone_lock()</title>
<updated>2020-07-22T07:10:48+00:00</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2020-07-02T18:52:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=51fbad61b1dc2a082c7f7dbc3b1299a1e40c061a'/>
<id>51fbad61b1dc2a082c7f7dbc3b1299a1e40c061a</id>
<content type='text'>
[ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]

When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd-&gt;val is always zero. So it's safe to factor out the code
to make it more readable.

The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd-&gt;val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.

This bug seems to be introduced since the beginning, commit
d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229af
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

Fixes: bd1060a1d671 ("sock, cgroup: add sock-&gt;sk_cgroup")
Reported-by: Cameron Berkenpas &lt;cam@neo-zeon.de&gt;
Reported-by: Peter Geis &lt;pgwipeout@gmail.com&gt;
Reported-by: Lu Fengqi &lt;lufq.fnst@cn.fujitsu.com&gt;
Reported-by: Daniël Sonck &lt;dsonck92@gmail.com&gt;
Reported-by: Zhang Qiang &lt;qiang.zhang@windriver.com&gt;
Tested-by: Cameron Berkenpas &lt;cam@neo-zeon.de&gt;
Tested-by: Peter Geis &lt;pgwipeout@gmail.com&gt;
Tested-by: Thomas Lamprecht &lt;t.lamprecht@proxmox.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Zefan Li &lt;lizefan@huawei.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit ad0f75e5f57ccbceec13274e1e242f2b5a6397ed ]

When we clone a socket in sk_clone_lock(), its sk_cgrp_data is
copied, so the cgroup refcnt must be taken too. And, unlike the
sk_alloc() path, sock_update_netprioidx() is not called here.
Therefore, it is safe and necessary to grab the cgroup refcnt
even when cgroup_sk_alloc is disabled.

sk_clone_lock() is in BH context anyway, the in_interrupt()
would terminate this function if called there. And for sk_alloc()
skcd-&gt;val is always zero. So it's safe to factor out the code
to make it more readable.

The global variable 'cgroup_sk_alloc_disabled' is used to determine
whether to take these reference counts. It is impossible to make
the reference counting correct unless we save this bit of information
in skcd-&gt;val. So, add a new bit there to record whether the socket
has already taken the reference counts. This obviously relies on
kmalloc() to align cgroup pointers to at least 4 bytes,
ARCH_KMALLOC_MINALIGN is certainly larger than that.

This bug seems to be introduced since the beginning, commit
d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
tried to fix it but not compeletely. It seems not easy to trigger until
the recent commit 090e28b229af
("netprio_cgroup: Fix unlimited memory leak of v2 cgroups") was merged.

Fixes: bd1060a1d671 ("sock, cgroup: add sock-&gt;sk_cgroup")
Reported-by: Cameron Berkenpas &lt;cam@neo-zeon.de&gt;
Reported-by: Peter Geis &lt;pgwipeout@gmail.com&gt;
Reported-by: Lu Fengqi &lt;lufq.fnst@cn.fujitsu.com&gt;
Reported-by: Daniël Sonck &lt;dsonck92@gmail.com&gt;
Reported-by: Zhang Qiang &lt;qiang.zhang@windriver.com&gt;
Tested-by: Cameron Berkenpas &lt;cam@neo-zeon.de&gt;
Tested-by: Peter Geis &lt;pgwipeout@gmail.com&gt;
Tested-by: Thomas Lamprecht &lt;t.lamprecht@proxmox.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Zefan Li &lt;lizefan@huawei.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Roman Gushchin &lt;guro@fb.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Do not clear the sock TX queue in sk_set_socket()</title>
<updated>2020-06-30T19:38:39+00:00</updated>
<author>
<name>Tariq Toukan</name>
<email>tariqt@mellanox.com</email>
</author>
<published>2020-06-22T20:26:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=673212158fc97950d8f9f512e28e1bb275aad463'/>
<id>673212158fc97950d8f9f512e28e1bb275aad463</id>
<content type='text'>
[ Upstream commit 41b14fb8724d5a4b382a63cb4a1a61880347ccb8 ]

Clearing the sock TX queue in sk_set_socket() might cause unexpected
out-of-order transmit when called from sock_orphan(), as outstanding
packets can pick a different TX queue and bypass the ones already queued.

This is undesired in general. More specifically, it breaks the in-order
scheduling property guarantee for device-offloaded TLS sockets.

Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
explicitly only where needed.

Fixes: e022f0b4a03f ("net: Introduce sk_tx_queue_mapping")
Signed-off-by: Tariq Toukan &lt;tariqt@mellanox.com&gt;
Reviewed-by: Boris Pismenny &lt;borisp@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 41b14fb8724d5a4b382a63cb4a1a61880347ccb8 ]

Clearing the sock TX queue in sk_set_socket() might cause unexpected
out-of-order transmit when called from sock_orphan(), as outstanding
packets can pick a different TX queue and bypass the ones already queued.

This is undesired in general. More specifically, it breaks the in-order
scheduling property guarantee for device-offloaded TLS sockets.

Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
explicitly only where needed.

Fixes: e022f0b4a03f ("net: Introduce sk_tx_queue_mapping")
Signed-off-by: Tariq Toukan &lt;tariqt@mellanox.com&gt;
Reviewed-by: Boris Pismenny &lt;borisp@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: fix memleak in register_netdevice()</title>
<updated>2020-06-30T19:38:38+00:00</updated>
<author>
<name>Yang Yingliang</name>
<email>yangyingliang@huawei.com</email>
</author>
<published>2020-06-16T09:39:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=40b56938e09cb48f797ff01fb97f841fef531f04'/>
<id>40b56938e09cb48f797ff01fb97f841fef531f04</id>
<content type='text'>
[ Upstream commit 814152a89ed52c722ab92e9fbabcac3cb8a39245 ]

I got a memleak report when doing some fuzz test:

unreferenced object 0xffff888112584000 (size 13599):
  comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
  hex dump (first 32 bytes):
    74 61 70 30 00 00 00 00 00 00 00 00 00 00 00 00  tap0............
    00 ee d9 19 81 88 ff ff 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;000000002f60ba65&gt;] __kmalloc_node+0x309/0x3a0
    [&lt;0000000075b211ec&gt;] kvmalloc_node+0x7f/0xc0
    [&lt;00000000d3a97396&gt;] alloc_netdev_mqs+0x76/0xfc0
    [&lt;00000000609c3655&gt;] __tun_chr_ioctl+0x1456/0x3d70
    [&lt;000000001127ca24&gt;] ksys_ioctl+0xe5/0x130
    [&lt;00000000b7d5e66a&gt;] __x64_sys_ioctl+0x6f/0xb0
    [&lt;00000000e1023498&gt;] do_syscall_64+0x56/0xa0
    [&lt;000000009ec0eb12&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff888111845cc0 (size 8):
  comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
  hex dump (first 8 bytes):
    74 61 70 30 00 88 ff ff                          tap0....
  backtrace:
    [&lt;000000004c159777&gt;] kstrdup+0x35/0x70
    [&lt;00000000d8b496ad&gt;] kstrdup_const+0x3d/0x50
    [&lt;00000000494e884a&gt;] kvasprintf_const+0xf1/0x180
    [&lt;0000000097880a2b&gt;] kobject_set_name_vargs+0x56/0x140
    [&lt;000000008fbdfc7b&gt;] dev_set_name+0xab/0xe0
    [&lt;000000005b99e3b4&gt;] netdev_register_kobject+0xc0/0x390
    [&lt;00000000602704fe&gt;] register_netdevice+0xb61/0x1250
    [&lt;000000002b7ca244&gt;] __tun_chr_ioctl+0x1cd1/0x3d70
    [&lt;000000001127ca24&gt;] ksys_ioctl+0xe5/0x130
    [&lt;00000000b7d5e66a&gt;] __x64_sys_ioctl+0x6f/0xb0
    [&lt;00000000e1023498&gt;] do_syscall_64+0x56/0xa0
    [&lt;000000009ec0eb12&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff88811886d800 (size 512):
  comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff c0 66 3d a3 ff ff ff ff  .........f=.....
  backtrace:
    [&lt;0000000050315800&gt;] device_add+0x61e/0x1950
    [&lt;0000000021008dfb&gt;] netdev_register_kobject+0x17e/0x390
    [&lt;00000000602704fe&gt;] register_netdevice+0xb61/0x1250
    [&lt;000000002b7ca244&gt;] __tun_chr_ioctl+0x1cd1/0x3d70
    [&lt;000000001127ca24&gt;] ksys_ioctl+0xe5/0x130
    [&lt;00000000b7d5e66a&gt;] __x64_sys_ioctl+0x6f/0xb0
    [&lt;00000000e1023498&gt;] do_syscall_64+0x56/0xa0
    [&lt;000000009ec0eb12&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9

If call_netdevice_notifiers() failed, then rollback_registered()
calls netdev_unregister_kobject() which holds the kobject. The
reference cannot be put because the netdev won't be add to todo
list, so it will leads a memleak, we need put the reference to
avoid memleak.

Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Signed-off-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 814152a89ed52c722ab92e9fbabcac3cb8a39245 ]

I got a memleak report when doing some fuzz test:

unreferenced object 0xffff888112584000 (size 13599):
  comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
  hex dump (first 32 bytes):
    74 61 70 30 00 00 00 00 00 00 00 00 00 00 00 00  tap0............
    00 ee d9 19 81 88 ff ff 00 00 00 00 00 00 00 00  ................
  backtrace:
    [&lt;000000002f60ba65&gt;] __kmalloc_node+0x309/0x3a0
    [&lt;0000000075b211ec&gt;] kvmalloc_node+0x7f/0xc0
    [&lt;00000000d3a97396&gt;] alloc_netdev_mqs+0x76/0xfc0
    [&lt;00000000609c3655&gt;] __tun_chr_ioctl+0x1456/0x3d70
    [&lt;000000001127ca24&gt;] ksys_ioctl+0xe5/0x130
    [&lt;00000000b7d5e66a&gt;] __x64_sys_ioctl+0x6f/0xb0
    [&lt;00000000e1023498&gt;] do_syscall_64+0x56/0xa0
    [&lt;000000009ec0eb12&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff888111845cc0 (size 8):
  comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
  hex dump (first 8 bytes):
    74 61 70 30 00 88 ff ff                          tap0....
  backtrace:
    [&lt;000000004c159777&gt;] kstrdup+0x35/0x70
    [&lt;00000000d8b496ad&gt;] kstrdup_const+0x3d/0x50
    [&lt;00000000494e884a&gt;] kvasprintf_const+0xf1/0x180
    [&lt;0000000097880a2b&gt;] kobject_set_name_vargs+0x56/0x140
    [&lt;000000008fbdfc7b&gt;] dev_set_name+0xab/0xe0
    [&lt;000000005b99e3b4&gt;] netdev_register_kobject+0xc0/0x390
    [&lt;00000000602704fe&gt;] register_netdevice+0xb61/0x1250
    [&lt;000000002b7ca244&gt;] __tun_chr_ioctl+0x1cd1/0x3d70
    [&lt;000000001127ca24&gt;] ksys_ioctl+0xe5/0x130
    [&lt;00000000b7d5e66a&gt;] __x64_sys_ioctl+0x6f/0xb0
    [&lt;00000000e1023498&gt;] do_syscall_64+0x56/0xa0
    [&lt;000000009ec0eb12&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff88811886d800 (size 512):
  comm "ip", pid 3048, jiffies 4294911734 (age 343.491s)
  hex dump (first 32 bytes):
    00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00  .....N..........
    ff ff ff ff ff ff ff ff c0 66 3d a3 ff ff ff ff  .........f=.....
  backtrace:
    [&lt;0000000050315800&gt;] device_add+0x61e/0x1950
    [&lt;0000000021008dfb&gt;] netdev_register_kobject+0x17e/0x390
    [&lt;00000000602704fe&gt;] register_netdevice+0xb61/0x1250
    [&lt;000000002b7ca244&gt;] __tun_chr_ioctl+0x1cd1/0x3d70
    [&lt;000000001127ca24&gt;] ksys_ioctl+0xe5/0x130
    [&lt;00000000b7d5e66a&gt;] __x64_sys_ioctl+0x6f/0xb0
    [&lt;00000000e1023498&gt;] do_syscall_64+0x56/0xa0
    [&lt;000000009ec0eb12&gt;] entry_SYSCALL_64_after_hwframe+0x44/0xa9

If call_netdevice_notifiers() failed, then rollback_registered()
calls netdev_unregister_kobject() which holds the kobject. The
reference cannot be put because the netdev won't be add to todo
list, so it will leads a memleak, we need put the reference to
avoid memleak.

Reported-by: Hulk Robot &lt;hulkci@huawei.com&gt;
Signed-off-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: core: device_rename: Use rwsem instead of a seqcount</title>
<updated>2020-06-30T19:38:33+00:00</updated>
<author>
<name>Ahmed S. Darwish</name>
<email>a.darwish@linutronix.de</email>
</author>
<published>2020-06-03T14:49:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=815be3b360ddbe50f28a789d5225f26926582e17'/>
<id>815be3b360ddbe50f28a789d5225f26926582e17</id>
<content type='text'>
[ Upstream commit 11d6011c2cf29f7c8181ebde6c8bc0c4d83adcd7 ]

Sequence counters write paths are critical sections that must never be
preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.

Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
netdev name retrieval.") handled a deadlock, observed with
CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
infinitely spinning: it got scheduled after the seqcount write side
blocked inside its own critical section.

To fix that deadlock, among other issues, the commit added a
cond_resched() inside the read side section. While this will get the
non-preemptible kernel eventually unstuck, the seqcount reader is fully
exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.

The fix is also still broken: if the seqcount reader belongs to a
real-time scheduling policy, it can spin forever and the kernel will
livelock.

Disabling preemption over the seqcount write side critical section will
not work: inside it are a number of GFP_KERNEL allocations and mutex
locking through the drivers/base/ :: device_rename() call chain.

&gt;From all the above, replace the seqcount with a rwsem.

Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.)
Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
Cc: &lt;stable@vger.kernel.org&gt;
Reported-by: kbuild test robot &lt;lkp@intel.com&gt; [ v1 missing up_read() on error exit ]
Reported-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt; [ v1 missing up_read() on error exit ]
Signed-off-by: Ahmed S. Darwish &lt;a.darwish@linutronix.de&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 11d6011c2cf29f7c8181ebde6c8bc0c4d83adcd7 ]

Sequence counters write paths are critical sections that must never be
preempted, and blocking, even for CONFIG_PREEMPTION=n, is not allowed.

Commit 5dbe7c178d3f ("net: fix kernel deadlock with interface rename and
netdev name retrieval.") handled a deadlock, observed with
CONFIG_PREEMPTION=n, where the devnet_rename seqcount read side was
infinitely spinning: it got scheduled after the seqcount write side
blocked inside its own critical section.

To fix that deadlock, among other issues, the commit added a
cond_resched() inside the read side section. While this will get the
non-preemptible kernel eventually unstuck, the seqcount reader is fully
exhausting its slice just spinning -- until TIF_NEED_RESCHED is set.

The fix is also still broken: if the seqcount reader belongs to a
real-time scheduling policy, it can spin forever and the kernel will
livelock.

Disabling preemption over the seqcount write side critical section will
not work: inside it are a number of GFP_KERNEL allocations and mutex
locking through the drivers/base/ :: device_rename() call chain.

&gt;From all the above, replace the seqcount with a rwsem.

Fixes: 5dbe7c178d3f (net: fix kernel deadlock with interface rename and netdev name retrieval.)
Fixes: 30e6c9fa93cf (net: devnet_rename_seq should be a seqcount)
Fixes: c91f6df2db49 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name)
Cc: &lt;stable@vger.kernel.org&gt;
Reported-by: kbuild test robot &lt;lkp@intel.com&gt; [ v1 missing up_read() on error exit ]
Reported-by: Dan Carpenter &lt;dan.carpenter@oracle.com&gt; [ v1 missing up_read() on error exit ]
Signed-off-by: Ahmed S. Darwish &lt;a.darwish@linutronix.de&gt;
Reviewed-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/rt, net: Use CONFIG_PREEMPTION.patch</title>
<updated>2020-06-30T19:38:33+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-10-15T19:18:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e8f45586d06372c044002b11f12db7a9ba674739'/>
<id>e8f45586d06372c044002b11f12db7a9ba674739</id>
<content type='text'>
[ Upstream commit 2da2b32fd9346009e9acdb68c570ca8d3966aba7 ]

CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Update the comment to use CONFIG_PREEMPTION.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20191015191821.11479-22-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 2da2b32fd9346009e9acdb68c570ca8d3966aba7 ]

CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Update the comment to use CONFIG_PREEMPTION.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Sebastian Andrzej Siewior &lt;bigeasy@linutronix.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20191015191821.11479-22-bigeasy@linutronix.de
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: rtnl_configure_link: fix dev flags changes arg to __dev_notify_flags</title>
<updated>2020-06-03T06:16:47+00:00</updated>
<author>
<name>Roopa Prabhu</name>
<email>roopa@cumulusnetworks.com</email>
</author>
<published>2018-09-12T20:21:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fba023f8dd69bbd728cea29a2b8277f5bd805d1b'/>
<id>fba023f8dd69bbd728cea29a2b8277f5bd805d1b</id>
<content type='text'>
commit 56a49d7048703f5ffdb84d3a0ee034108fba6850 upstream.

This fix addresses https://bugzilla.kernel.org/show_bug.cgi?id=201071

Commit 5025f7f7d506 wrongly relied on __dev_change_flags to notify users of
dev flag changes in the case when dev-&gt;rtnl_link_state = RTNL_LINK_INITIALIZED.
Fix it by indicating flag changes explicitly to __dev_notify_flags.

Fixes: 5025f7f7d506 ("rtnetlink: add rtnl_link_state check in rtnl_configure_link")
Reported-By: Liam mcbirnie &lt;liam.mcbirnie@boeing.com&gt;
Signed-off-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 56a49d7048703f5ffdb84d3a0ee034108fba6850 upstream.

This fix addresses https://bugzilla.kernel.org/show_bug.cgi?id=201071

Commit 5025f7f7d506 wrongly relied on __dev_change_flags to notify users of
dev flag changes in the case when dev-&gt;rtnl_link_state = RTNL_LINK_INITIALIZED.
Fix it by indicating flag changes explicitly to __dev_notify_flags.

Fixes: 5025f7f7d506 ("rtnetlink: add rtnl_link_state check in rtnl_configure_link")
Reported-By: Liam mcbirnie &lt;liam.mcbirnie@boeing.com&gt;
Signed-off-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>netprio_cgroup: Fix unlimited memory leak of v2 cgroups</title>
<updated>2020-05-20T06:15:39+00:00</updated>
<author>
<name>Zefan Li</name>
<email>lizefan@huawei.com</email>
</author>
<published>2020-05-09T03:32:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d1d65ba019d21cdbf3e2b83a0a7e0e8512a4b526'/>
<id>d1d65ba019d21cdbf3e2b83a0a7e0e8512a4b526</id>
<content type='text'>
[ Upstream commit 090e28b229af92dc5b40786ca673999d59e73056 ]

If systemd is configured to use hybrid mode which enables the use of
both cgroup v1 and v2, systemd will create new cgroup on both the default
root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
task to the two cgroups. If the task does some network thing then the v2
cgroup can never be freed after the session exited.

One of our machines ran into OOM due to this memory leak.

In the scenario described above when sk_alloc() is called
cgroup_sk_alloc() thought it's in v2 mode, so it stores
the cgroup pointer in sk-&gt;sk_cgrp_data and increments
the cgroup refcnt, but then sock_update_netprioidx()
thought it's in v1 mode, so it stores netprioidx value
in sk-&gt;sk_cgrp_data, so the cgroup refcnt will never be freed.

Currently we do the mode switch when someone writes to the ifpriomap
cgroup control file. The easiest fix is to also do the switch when
a task is attached to a new cgroup.

Fixes: bd1060a1d671 ("sock, cgroup: add sock-&gt;sk_cgroup")
Reported-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Tested-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 090e28b229af92dc5b40786ca673999d59e73056 ]

If systemd is configured to use hybrid mode which enables the use of
both cgroup v1 and v2, systemd will create new cgroup on both the default
root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
task to the two cgroups. If the task does some network thing then the v2
cgroup can never be freed after the session exited.

One of our machines ran into OOM due to this memory leak.

In the scenario described above when sk_alloc() is called
cgroup_sk_alloc() thought it's in v2 mode, so it stores
the cgroup pointer in sk-&gt;sk_cgrp_data and increments
the cgroup refcnt, but then sock_update_netprioidx()
thought it's in v1 mode, so it stores netprioidx value
in sk-&gt;sk_cgrp_data, so the cgroup refcnt will never be freed.

Currently we do the mode switch when someone writes to the ifpriomap
cgroup control file. The easiest fix is to also do the switch when
a task is attached to a new cgroup.

Fixes: bd1060a1d671 ("sock, cgroup: add sock-&gt;sk_cgroup")
Reported-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Tested-by: Yang Yingliang &lt;yangyingliang@huawei.com&gt;
Signed-off-by: Zefan Li &lt;lizefan@huawei.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: fix a potential recursive NETDEV_FEAT_CHANGE</title>
<updated>2020-05-20T06:15:39+00:00</updated>
<author>
<name>Cong Wang</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2020-05-07T19:19:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=74af5e3ecf14b11c822870d9235d51d383401f9e'/>
<id>74af5e3ecf14b11c822870d9235d51d383401f9e</id>
<content type='text'>
[ Upstream commit dd912306ff008891c82cd9f63e8181e47a9cb2fb ]

syzbot managed to trigger a recursive NETDEV_FEAT_CHANGE event
between bonding master and slave. I managed to find a reproducer
for this:

  ip li set bond0 up
  ifenslave bond0 eth0
  brctl addbr br0
  ethtool -K eth0 lro off
  brctl addif br0 bond0
  ip li set br0 up

When a NETDEV_FEAT_CHANGE event is triggered on a bonding slave,
it captures this and calls bond_compute_features() to fixup its
master's and other slaves' features. However, when syncing with
its lower devices by netdev_sync_lower_features() this event is
triggered again on slaves when the LRO feature fails to change,
so it goes back and forth recursively until the kernel stack is
exhausted.

Commit 17b85d29e82c intentionally lets __netdev_update_features()
return -1 for such a failure case, so we have to just rely on
the existing check inside netdev_sync_lower_features() and skip
NETDEV_FEAT_CHANGE event only for this specific failure case.

Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
Reported-by: syzbot+e73ceacfd8560cc8a3ca@syzkaller.appspotmail.com
Reported-by: syzbot+c2fb6f9ddcea95ba49b5@syzkaller.appspotmail.com
Cc: Jarod Wilson &lt;jarod@redhat.com&gt;
Cc: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Reviewed-by: Jay Vosburgh &lt;jay.vosburgh@canonical.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit dd912306ff008891c82cd9f63e8181e47a9cb2fb ]

syzbot managed to trigger a recursive NETDEV_FEAT_CHANGE event
between bonding master and slave. I managed to find a reproducer
for this:

  ip li set bond0 up
  ifenslave bond0 eth0
  brctl addbr br0
  ethtool -K eth0 lro off
  brctl addif br0 bond0
  ip li set br0 up

When a NETDEV_FEAT_CHANGE event is triggered on a bonding slave,
it captures this and calls bond_compute_features() to fixup its
master's and other slaves' features. However, when syncing with
its lower devices by netdev_sync_lower_features() this event is
triggered again on slaves when the LRO feature fails to change,
so it goes back and forth recursively until the kernel stack is
exhausted.

Commit 17b85d29e82c intentionally lets __netdev_update_features()
return -1 for such a failure case, so we have to just rely on
the existing check inside netdev_sync_lower_features() and skip
NETDEV_FEAT_CHANGE event only for this specific failure case.

Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack")
Reported-by: syzbot+e73ceacfd8560cc8a3ca@syzkaller.appspotmail.com
Reported-by: syzbot+c2fb6f9ddcea95ba49b5@syzkaller.appspotmail.com
Cc: Jarod Wilson &lt;jarod@redhat.com&gt;
Cc: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Reviewed-by: Jay Vosburgh &lt;jay.vosburgh@canonical.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
