<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/netlink, branch v4.9-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>netlink: do not enter direct reclaim from netlink_dump()</title>
<updated>2016-10-07T00:53:13+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-10-05T19:13:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d35c99ff77ecb2eb239731b799386f3b3637a31e'/>
<id>d35c99ff77ecb2eb239731b799386f3b3637a31e</id>
<content type='text'>
Since linux-3.15, netlink_dump() can use skb allocations of up to
16384 bytes.

Due to the ~320 bytes of struct skb_shared_info overhead, we end up using
order-3 (on x86) page allocations, which might trigger direct reclaim and
add stress.
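
As a rough check of the arithmetic above, the sketch below computes the
page order for a 16384-byte payload with and without the approximate
320-byte overhead. It assumes 4096-byte pages (as on x86) and models the
allocation as one power-of-two run of pages; page_order is a hypothetical
helper, not a kernel function.

```python
PAGE_SIZE = 4096        # assumption: x86 page size
SHINFO_OVERHEAD = 320   # approximate figure quoted in the message


def page_order(nbytes, page_size=PAGE_SIZE):
    """Smallest order such that page_size * 2**order holds nbytes."""
    pages = -(-nbytes // page_size)   # ceiling division
    return (pages - 1).bit_length()   # ceil(log2(pages)) for one or more pages


payload = 16384
print(page_order(payload))                    # 2: exactly four pages
print(page_order(payload + SHINFO_OVERHEAD))  # 3: spills into a fifth page
```

Under these assumptions, the overhead alone pushes the request from four
pages to the next power of two, which is why the order-3 case arises.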

The intent was really to attempt a large allocation but immediately
fall back to a smaller one (order-1 on x86) in case of memory stress.

On recent kernels (linux-4.4), we can remove __GFP_DIRECT_RECLAIM to
meet the goal. Old kernels would need to remove __GFP_WAIT.
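
The attempt-then-fall-back pattern described above can be sketched as
follows. This is a toy model, not the kernel code: make_allocator and its
allow_direct_reclaim flag are hypothetical stand-ins for an allocator
called with or without __GFP_DIRECT_RECLAIM, and the sizes are
illustrative.

```python
def make_allocator(under_stress):
    """Build a toy allocator; all names here are hypothetical."""
    def try_alloc(size, allow_direct_reclaim):
        # Toy model: under memory stress, only requests that are
        # allowed to enter direct reclaim succeed.
        if under_stress and not allow_direct_reclaim:
            return None
        return bytearray(size)
    return try_alloc


def alloc_dump_buffer(try_alloc, large_size, small_size):
    """Try a large buffer without reclaim, then fall back to a small one."""
    # Opportunistic attempt: must fail fast instead of reclaiming.
    buf = try_alloc(large_size, allow_direct_reclaim=False)
    if buf is not None:
        return buf, large_size
    # Fallback: smaller request that is allowed to reclaim.
    return try_alloc(small_size, allow_direct_reclaim=True), small_size


# Illustrative sizes only (order-3 and order-1 with 4096-byte pages).
buf, size = alloc_dump_buffer(make_allocator(under_stress=True), 32768, 8192)
print(size)
```

The point of the design is that the large request is never allowed to add
pressure; only the small, necessary request may reclaim.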

While we are at it, since we do an order-3 allocation, allow using
all the allocated bytes instead of 16384 to reduce syscalls during
large dumps.

iproute2 already uses 32KB recvmsg() buffer sizes.

Alexei provided an initial patch downsizing to SKB_WITH_OVERHEAD(16384).

Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Reviewed-by: Greg Rose &lt;grose@lightfleet.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since linux-3.15, netlink_dump() can use skb allocations of up to
16384 bytes.

Due to the ~320 bytes of struct skb_shared_info overhead, we end up using
order-3 (on x86) page allocations, which might trigger direct reclaim and
add stress.

The intent was really to attempt a large allocation but immediately
fall back to a smaller one (order-1 on x86) in case of memory stress.

On recent kernels (linux-4.4), we can remove __GFP_DIRECT_RECLAIM to
meet the goal. Old kernels would need to remove __GFP_WAIT.

While we are at it, since we do an order-3 allocation, allow using
all the allocated bytes instead of 16384 to reduce syscalls during
large dumps.

iproute2 already uses 32KB recvmsg() buffer sizes.

Alexei provided an initial patch downsizing to SKB_WITH_OVERHEAD(16384).

Fixes: 9063e21fb026 ("netlink: autosize skb lengthes")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Greg Thelen &lt;gthelen@google.com&gt;
Reviewed-by: Greg Rose &lt;grose@lightfleet.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netlink: don't forget to release a rhashtable_iter structure</title>
<updated>2016-09-08T00:29:38+00:00</updated>
<author>
<name>Andrey Vagin</name>
<email>avagin@openvz.org</email>
</author>
<published>2016-09-07T04:31:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=733ade23de1b72c1f11c5e4a1a9020a6f48decd2'/>
<id>733ade23de1b72c1f11c5e4a1a9020a6f48decd2</id>
<content type='text'>
This bug was detected by kmemleak:
unreferenced object 0xffff8804269cc3c0 (size 64):
  comm "criu", pid 1042, jiffies 4294907360 (age 13.713s)
  hex dump (first 32 bytes):
    a0 32 cc 2c 04 88 ff ff 00 00 00 00 00 00 00 00  .2.,............
    00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
  backtrace:
    [&lt;ffffffff8184dffa&gt;] kmemleak_alloc+0x4a/0xa0
    [&lt;ffffffff8124720f&gt;] kmem_cache_alloc_trace+0x10f/0x280
    [&lt;ffffffffa02864cc&gt;] __netlink_diag_dump+0x26c/0x290 [netlink_diag]

v2: keep the reference to the rhashtable_iter structure so that it
    can be released from netlink_diag_dump_done

Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Fixes: ad202074320c ("netlink: Use rhashtable walk interface in diag dump")
Signed-off-by: Andrei Vagin &lt;avagin@openvz.org&gt;
Acked-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This bug was detected by kmemleak:
unreferenced object 0xffff8804269cc3c0 (size 64):
  comm "criu", pid 1042, jiffies 4294907360 (age 13.713s)
  hex dump (first 32 bytes):
    a0 32 cc 2c 04 88 ff ff 00 00 00 00 00 00 00 00  .2.,............
    00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
  backtrace:
    [&lt;ffffffff8184dffa&gt;] kmemleak_alloc+0x4a/0xa0
    [&lt;ffffffff8124720f&gt;] kmem_cache_alloc_trace+0x10f/0x280
    [&lt;ffffffffa02864cc&gt;] __netlink_diag_dump+0x26c/0x290 [netlink_diag]

v2: keep the reference to the rhashtable_iter structure so that it
    can be released from netlink_diag_dump_done

Cc: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Fixes: ad202074320c ("netlink: Use rhashtable walk interface in diag dump")
Signed-off-by: Andrei Vagin &lt;avagin@openvz.org&gt;
Acked-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: make genetlink ctrl ops const</title>
<updated>2016-09-01T21:09:00+00:00</updated>
<author>
<name>stephen hemminger</name>
<email>stephen@networkplumber.org</email>
</author>
<published>2016-08-31T22:22:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=12d8de6d952372102db2faedd19913dbfa883c5d'/>
<id>12d8de6d952372102db2faedd19913dbfa883c5d</id>
<content type='text'>
Signed-off-by: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Stephen Hemminger &lt;stephen@networkplumber.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netlink: Use rhashtable walk interface in diag dump</title>
<updated>2016-08-19T21:40:25+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2016-08-19T08:21:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ad202074320cd75b31b8cdb58cca0d4ef6aaea8a'/>
<id>ad202074320cd75b31b8cdb58cca0d4ef6aaea8a</id>
<content type='text'>
This patch converts the diag dumping code to use the rhashtable
walk code instead of going through rhashtable by hand.  The lock
nl_table_lock is now only taken while we process the multicast
list as it's not needed for the rhashtable walk.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch converts the diag dumping code to use the rhashtable
walk code instead of going through rhashtable by hand.  The lock
nl_table_lock is now only taken while we process the multicast
list as it's not needed for the rhashtable walk.

Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net/netlink/af_netlink.h: Remove unused structure.</title>
<updated>2016-06-10T05:26:24+00:00</updated>
<author>
<name>Fabien Siron</name>
<email>fabien.siron@epita.fr</email>
</author>
<published>2016-06-07T13:02:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=21aff3b905ad9e5e52b18a755c13fe755bd6ab3d'/>
<id>21aff3b905ad9e5e52b18a755c13fe755bd6ab3d</id>
<content type='text'>
Signed-off-by: Fabien Siron &lt;fabien.siron@epita.fr&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Fabien Siron &lt;fabien.siron@epita.fr&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netlink: Fix dump skb leak/double free</title>
<updated>2016-05-17T02:05:15+00:00</updated>
<author>
<name>Herbert Xu</name>
<email>herbert@gondor.apana.org.au</email>
</author>
<published>2016-05-16T09:28:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=92964c79b357efd980812c4de5c1fd2ec8bb5520'/>
<id>92964c79b357efd980812c4de5c1fd2ec8bb5520</id>
<content type='text'>
When we free cb-&gt;skb after a dump, we do it after releasing the
lock.  This means that a new dump could have started in the
meantime and we'll end up freeing their skb instead of ours.

This patch saves the skb and module before we unlock so we free
the right memory.
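
The save-before-unlock idea can be sketched as below. This is a minimal
model under stated assumptions, not the kernel code: DumpState,
finish_dump and free_skb are hypothetical names, and a threading.Lock
stands in for the dump mutex.

```python
import threading


class DumpState:
    """Toy stand-in for the per-socket dump callback state."""

    def __init__(self, skb):
        self.lock = threading.Lock()
        self.running = True
        self.skb = skb


def finish_dump(state, free_skb):
    with state.lock:
        state.running = False   # a new dump may start once we unlock
        skb = state.skb         # save our own buffer before unlocking
    # Lock released: state.skb may already belong to a newer dump,
    # so free only the reference saved above.
    free_skb(skb)


freed = []
finish_dump(DumpState("ours"), freed.append)
print(freed)
```

Reading the shared field again after the unlock is what the original bug
amounts to; the local variable removes that window.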

Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Reported-by: Baozeng Ding &lt;sploving1@gmail.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we free cb-&gt;skb after a dump, we do it after releasing the
lock.  This means that a new dump could have started in the
meantime and we'll end up freeing their skb instead of ours.

This patch saves the skb and module before we unlock so we free
the right memory.

Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
Reported-by: Baozeng Ding &lt;sploving1@gmail.com&gt;
Signed-off-by: Herbert Xu &lt;herbert@gondor.apana.org.au&gt;
Acked-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2016-04-23T22:51:33+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2016-04-23T22:26:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1602f49b58abcb0d34a5f0a29d68e7c1769547aa'/>
<id>1602f49b58abcb0d34a5f0a29d68e7c1769547aa</id>
<content type='text'>
Conflicts were two cases of simple overlapping changes,
nothing serious.

In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts were two cases of simple overlapping changes,
nothing serious.

In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netlink: don't send NETLINK_URELEASE for unbound sockets</title>
<updated>2016-04-11T03:32:23+00:00</updated>
<author>
<name>Dmitry Ivanov</name>
<email>dmitrijs.ivanovs@ubnt.com</email>
</author>
<published>2016-04-07T07:31:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e27260203912b40751fa353d009eaa5a642c739f'/>
<id>e27260203912b40751fa353d009eaa5a642c739f</id>
<content type='text'>
All existing users of NETLINK_URELEASE use it to clean up resources that
were previously allocated to a socket via some command. As a result, no
users require getting this notification for unbound sockets.

Sending it for unbound sockets, however, is a problem because any user
(including unprivileged users) can create a socket that uses the same ID
as an existing socket. Binding this new socket will fail, but if the
NETLINK_URELEASE notification is generated for such sockets, the users
thereof will be tricked into thinking the socket that they allocated the
resources for is closed.

In the nl80211 case, this will cause destruction of virtual interfaces
that still belong to an existing hostapd process; this is the case that
Dmitry noticed. In the NFC case, it will cause a poll abort. In the case
of netlink log/queue it will cause them to stop reporting events, as if
NFULNL_CFG_CMD_UNBIND/NFQNL_CFG_CMD_UNBIND had been called.

Fix this problem by checking that the socket is bound before generating
the NETLINK_URELEASE notification.
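
A minimal sketch of the check described above, assuming a toy socket
model: Sock, release and notify_urelease are hypothetical names used
only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sock:
    portid: int
    bound: bool


def release(sock, notify_urelease):
    # Only a socket that was successfully bound owns its port ID, so
    # only then should listeners be told that the ID was released.
    if sock.bound:
        notify_urelease(sock.portid)


notified = []
release(Sock(portid=42, bound=False), notified.append)  # failed bind: silent
release(Sock(portid=42, bound=True), notified.append)   # real owner: notify
print(notified)
```

In this model, a socket that merely reused an existing ID and failed to
bind can no longer trigger cleanup of the real owner's resources.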

Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Ivanov &lt;dima@ubnt.com&gt;
Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All existing users of NETLINK_URELEASE use it to clean up resources that
were previously allocated to a socket via some command. As a result, no
users require getting this notification for unbound sockets.

Sending it for unbound sockets, however, is a problem because any user
(including unprivileged users) can create a socket that uses the same ID
as an existing socket. Binding this new socket will fail, but if the
NETLINK_URELEASE notification is generated for such sockets, the users
thereof will be tricked into thinking the socket that they allocated the
resources for is closed.

In the nl80211 case, this will cause destruction of virtual interfaces
that still belong to an existing hostapd process; this is the case that
Dmitry noticed. In the NFC case, it will cause a poll abort. In the case
of netlink log/queue it will cause them to stop reporting events, as if
NFULNL_CFG_CMD_UNBIND/NFQNL_CFG_CMD_UNBIND had been called.

Fix this problem by checking that the socket is bound before generating
the NETLINK_URELEASE notification.

Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Ivanov &lt;dima@ubnt.com&gt;
Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rhashtable: accept GFP flags in rhashtable_walk_init</title>
<updated>2016-04-05T08:56:32+00:00</updated>
<author>
<name>Bob Copeland</name>
<email>me@bobcopeland.com</email>
</author>
<published>2016-03-02T15:09:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8f6fd83c6c5ec66a4a70c728535ddcdfef4f3697'/>
<id>8f6fd83c6c5ec66a4a70c728535ddcdfef4f3697</id>
<content type='text'>
In certain cases, the 802.11 mesh pathtable code wants to
iterate over all of the entries in the forwarding table from
the receive path, which is inside an RCU read-side critical
section.  Enable walks inside atomic sections by allowing
GFP_ATOMIC allocations for the walker state.

Change all existing callsites to pass in GFP_KERNEL.

Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: Bob Copeland &lt;me@bobcopeland.com&gt;
[also adjust gfs2/glock.c and rhashtable tests]
Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In certain cases, the 802.11 mesh pathtable code wants to
iterate over all of the entries in the forwarding table from
the receive path, which is inside an RCU read-side critical
section.  Enable walks inside atomic sections by allowing
GFP_ATOMIC allocations for the walker state.

Change all existing callsites to pass in GFP_KERNEL.

Acked-by: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: Bob Copeland &lt;me@bobcopeland.com&gt;
[also adjust gfs2/glock.c and rhashtable tests]
Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netlink: add support for NIC driver ioctls</title>
<updated>2016-03-22T19:45:44+00:00</updated>
<author>
<name>David Decotigny</name>
<email>decot@googlers.com</email>
</author>
<published>2016-03-21T17:15:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=025c68186e07afaededa84143f1a22f273cd3f67'/>
<id>025c68186e07afaededa84143f1a22f273cd3f67</id>
<content type='text'>
By returning -ENOIOCTLCMD, sock_do_ioctl() falls back to calling
dev_ioctl(), which provides support for NIC driver ioctls, which
includes ethtool support. This is similar to the way ioctls are handled
in udp.c or tcp.c.
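
The fallback described above can be sketched as follows. This is a
simplified model: the function bodies and fake_dev_ioctl are
hypothetical, and commands are represented by name strings instead of
real ioctl numbers.

```python
ENOIOCTLCMD = 515  # kernel-internal code meaning "no ioctl command"


def netlink_ioctl(cmd, arg):
    # Model of the protocol handler: it handles no ioctls itself.
    return -ENOIOCTLCMD


def fake_dev_ioctl(cmd, arg):
    # Hypothetical device layer that only knows the ethtool command.
    return 0 if cmd == "SIOCETHTOOL" else -ENOIOCTLCMD


def sock_do_ioctl(proto_ioctl, dev_ioctl, cmd, arg):
    err = proto_ioctl(cmd, arg)
    if err == -ENOIOCTLCMD:
        # The protocol declined, so let the device layer try.
        err = dev_ioctl(cmd, arg)
    return err


print(sock_do_ioctl(netlink_ioctl, fake_dev_ioctl, "SIOCETHTOOL", None))
```

Any other return value from the protocol handler would be passed back
unchanged, which is why the specific error code matters here.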

This removes the requirement that ethtool, for example, be tied to the
support of a specific L3 protocol (ethtool uses an AF_INET socket
today).

Signed-off-by: David Decotigny &lt;decot@googlers.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
By returning -ENOIOCTLCMD, sock_do_ioctl() falls back to calling
dev_ioctl(), which provides support for NIC driver ioctls, which
includes ethtool support. This is similar to the way ioctls are handled
in udp.c or tcp.c.

This removes the requirement that ethtool, for example, be tied to the
support of a specific L3 protocol (ethtool uses an AF_INET socket
today).

Signed-off-by: David Decotigny &lt;decot@googlers.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
