<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/netlink, branch v4.6-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>netlink: add support for NIC driver ioctls</title>
<updated>2016-03-22T19:45:44+00:00</updated>
<author>
<name>David Decotigny</name>
<email>decot@googlers.com</email>
</author>
<published>2016-03-21T17:15:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=025c68186e07afaededa84143f1a22f273cd3f67'/>
<id>025c68186e07afaededa84143f1a22f273cd3f67</id>
<content type='text'>
By returning -ENOIOCTLCMD, sock_do_ioctl() falls back to calling
dev_ioctl(), which provides support for NIC driver ioctls, which
includes ethtool support. This is similar to the way ioctls are handled
in udp.c or tcp.c.

This removes the requirement that ethtool for example be tied to the
support of a specific L3 protocol (ethtool uses an AF_INET socket
today).

Signed-off-by: David Decotigny &lt;decot@googlers.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
By returning -ENOIOCTLCMD, sock_do_ioctl() falls back to calling
dev_ioctl(), which provides support for NIC driver ioctls, which
includes ethtool support. This is similar to the way ioctls are handled
in udp.c or tcp.c.

This removes the requirement that ethtool for example be tied to the
support of a specific L3 protocol (ethtool uses an AF_INET socket
today).

Signed-off-by: David Decotigny &lt;decot@googlers.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nfnetlink: Revert "nfnetlink: add support for memory mapped netlink"</title>
<updated>2016-02-18T16:42:22+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2016-02-18T14:03:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c5b0db3263b92526bc0c1b6380c0c99f91f069fc'/>
<id>c5b0db3263b92526bc0c1b6380c0c99f91f069fc</id>
<content type='text'>
reverts commit 3ab1f683bf8b ("nfnetlink: add support for memory mapped
netlink")'

Like previous commits in the series, remove wrappers that are not needed
after mmapped netlink removal.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
reverts commit 3ab1f683bf8b ("nfnetlink: add support for memory mapped
netlink")'

Like previous commits in the series, remove wrappers that are not needed
after mmapped netlink removal.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "genl: Add genlmsg_new_unicast() for unicast message allocation"</title>
<updated>2016-02-18T16:42:19+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2016-02-18T14:03:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=263ea09084d172cac6e40459a690babe8de8e448'/>
<id>263ea09084d172cac6e40459a690babe8de8e448</id>
<content type='text'>
This reverts commit bb9b18fb55b0 ("genl: Add genlmsg_new_unicast() for
unicast message allocation")'.

Nothing wrong with it; its no longer needed since this was only for
mmapped netlink support.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit bb9b18fb55b0 ("genl: Add genlmsg_new_unicast() for
unicast message allocation")'.

Nothing wrong with it; its no longer needed since this was only for
mmapped netlink support.

Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netlink: remove mmapped netlink support</title>
<updated>2016-02-18T16:42:18+00:00</updated>
<author>
<name>Florian Westphal</name>
<email>fw@strlen.de</email>
</author>
<published>2016-02-18T14:03:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d1b4c689d4130bcfd3532680b64db562300716b6'/>
<id>d1b4c689d4130bcfd3532680b64db562300716b6</id>
<content type='text'>
mmapped netlink has a number of unresolved issues:

- TX zerocopy support had to be disabled more than a year ago via
  commit 4682a0358639b29cf ("netlink: Always copy on mmap TX.")
  because the content of the mmapped area can change after netlink
  attribute validation but before message processing.

- RX support was implemented mainly to speed up nfqueue dumping packet
  payload to userspace.  However, since commit ae08ce0021087a5d812d2
  ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy
  with the socket-based interface too (via the skb_zerocopy helper).

The other problem is that skbs attached to mmaped netlink socket
behave different from normal skbs:

- they don't have a shinfo area, so all functions that use skb_shinfo()
(e.g. skb_clone) cannot be used.

- reserving headroom prevents userspace from seeing the content as
it expects message to start at skb-&gt;head.
See for instance
commit aa3a022094fa ("netlink: not trim skb for mmaped socket when dump").

- skbs handed e.g. to netlink_ack must have non-NULL skb-&gt;sk, else we
crash because it needs the sk to check if a tx ring is attached.

Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf359
("netfilter: nfnetlink: use original skbuff when acking batches").

mmaped netlink also didn't play nicely with the skb_zerocopy helper
used by nfqueue and openvswitch.  Daniel Borkmann fixed this via
commit 6bb0fef489f6 ("netlink, mmap: fix edge-case leakages in nf queue
zero-copy")' but at the cost of also needing to provide remaining
length to the allocation function.

nfqueue also has problems when used with mmaped rx netlink:
- mmaped netlink doesn't allow use of nfqueue batch verdict messages.
  Problem is that in the mmap case, the allocation time also determines
  the ordering in which the frame will be seen by userspace (A
  allocating before B means that A is located in earlier ring slot,
  but this also means that B might get a lower sequence number then A
  since seqno is decided later.  To fix this we would need to extend the
  spinlocked region to also cover the allocation and message setup which
  isn't desirable.
- nfqueue can now be configured to queue large (GSO) skbs to userspace.
  Queing GSO packets is faster than having to force a software segmentation
  in the kernel, so this is a desirable option.  However, with a mmap based
  ring one has to use 64kb per ring slot element, else mmap has to fall back
  to the socket path (NL_MMAP_STATUS_COPY) for all large packets.

To use the mmap interface, userspace not only has to probe for mmap netlink
support, it also has to implement a recv/socket receive path in order to
handle messages that exceed the size of an rx ring element.

Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Ken-ichirou MATSUZAWA &lt;chamaken@gmail.com&gt;
Cc: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Cc: Patrick McHardy &lt;kaber@trash.net&gt;
Cc: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
mmapped netlink has a number of unresolved issues:

- TX zerocopy support had to be disabled more than a year ago via
  commit 4682a0358639b29cf ("netlink: Always copy on mmap TX.")
  because the content of the mmapped area can change after netlink
  attribute validation but before message processing.

- RX support was implemented mainly to speed up nfqueue dumping packet
  payload to userspace.  However, since commit ae08ce0021087a5d812d2
  ("netfilter: nfnetlink_queue: zero copy support") we avoid one copy
  with the socket-based interface too (via the skb_zerocopy helper).

The other problem is that skbs attached to mmaped netlink socket
behave different from normal skbs:

- they don't have a shinfo area, so all functions that use skb_shinfo()
(e.g. skb_clone) cannot be used.

- reserving headroom prevents userspace from seeing the content as
it expects message to start at skb-&gt;head.
See for instance
commit aa3a022094fa ("netlink: not trim skb for mmaped socket when dump").

- skbs handed e.g. to netlink_ack must have non-NULL skb-&gt;sk, else we
crash because it needs the sk to check if a tx ring is attached.

Also not obvious, leads to non-intuitive bug fixes such as 7c7bdf359
("netfilter: nfnetlink: use original skbuff when acking batches").

mmaped netlink also didn't play nicely with the skb_zerocopy helper
used by nfqueue and openvswitch.  Daniel Borkmann fixed this via
commit 6bb0fef489f6 ("netlink, mmap: fix edge-case leakages in nf queue
zero-copy")' but at the cost of also needing to provide remaining
length to the allocation function.

nfqueue also has problems when used with mmaped rx netlink:
- mmaped netlink doesn't allow use of nfqueue batch verdict messages.
  Problem is that in the mmap case, the allocation time also determines
  the ordering in which the frame will be seen by userspace (A
  allocating before B means that A is located in earlier ring slot,
  but this also means that B might get a lower sequence number then A
  since seqno is decided later.  To fix this we would need to extend the
  spinlocked region to also cover the allocation and message setup which
  isn't desirable.
- nfqueue can now be configured to queue large (GSO) skbs to userspace.
  Queing GSO packets is faster than having to force a software segmentation
  in the kernel, so this is a desirable option.  However, with a mmap based
  ring one has to use 64kb per ring slot element, else mmap has to fall back
  to the socket path (NL_MMAP_STATUS_COPY) for all large packets.

To use the mmap interface, userspace not only has to probe for mmap netlink
support, it also has to implement a recv/socket receive path in order to
handle messages that exceed the size of an rx ring element.

Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Ken-ichirou MATSUZAWA &lt;chamaken@gmail.com&gt;
Cc: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Cc: Patrick McHardy &lt;kaber@trash.net&gt;
Cc: Thomas Graf &lt;tgraf@suug.ch&gt;
Signed-off-by: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>openvswitch: allow management from inside user namespaces</title>
<updated>2016-02-11T14:53:19+00:00</updated>
<author>
<name>Tycho Andersen</name>
<email>tycho.andersen@canonical.com</email>
</author>
<published>2016-02-05T16:20:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4a92602aa1cd5bbaeedbd9536ff992f7d26fe9d1'/>
<id>4a92602aa1cd5bbaeedbd9536ff992f7d26fe9d1</id>
<content type='text'>
Operations with the GENL_ADMIN_PERM flag fail permissions checks because
this flag means we call netlink_capable, which uses the init user ns.

Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations
which should be allowed inside a user namespace.

The motivation for this is to be able to run openvswitch in unprivileged
containers. I've tested this and it seems to work, but I really have no
idea about the security consequences of this patch, so thoughts would be
much appreciated.

v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
    massive one

Reported-by: James Page &lt;james.page@canonical.com&gt;
Signed-off-by: Tycho Andersen &lt;tycho.andersen@canonical.com&gt;
CC: Eric Biederman &lt;ebiederm@xmission.com&gt;
CC: Pravin Shelar &lt;pshelar@ovn.org&gt;
CC: Justin Pettit &lt;jpettit@nicira.com&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Operations with the GENL_ADMIN_PERM flag fail permissions checks because
this flag means we call netlink_capable, which uses the init user ns.

Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations
which should be allowed inside a user namespace.

The motivation for this is to be able to run openvswitch in unprivileged
containers. I've tested this and it seems to work, but I really have no
idea about the security consequences of this patch, so thoughts would be
much appreciated.

v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
    massive one

Reported-by: James Page &lt;james.page@canonical.com&gt;
Signed-off-by: Tycho Andersen &lt;tycho.andersen@canonical.com&gt;
CC: Eric Biederman &lt;ebiederm@xmission.com&gt;
CC: Pravin Shelar &lt;pshelar@ovn.org&gt;
CC: Justin Pettit &lt;jpettit@nicira.com&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netlink: not trim skb for mmaped socket when dump</title>
<updated>2016-01-30T04:25:17+00:00</updated>
<author>
<name>Ken-ichirou MATSUZAWA</name>
<email>chamaken@gmail.com</email>
</author>
<published>2016-01-29T01:45:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=aa3a022094fac7f6e48050e139fa8a5a2e3265ce'/>
<id>aa3a022094fac7f6e48050e139fa8a5a2e3265ce</id>
<content type='text'>
We should not trim skb for mmaped socket since its buf size is fixed
and userspace will read as frame which data equals head. mmaped
socket will not call recvmsg, means max_recvmsg_len is 0,
skb_reserve was not called before commit: db65a3aaf29e.

Fixes: db65a3aaf29e (netlink: Trim skb to alloc size to avoid MSG_TRUNC)
Signed-off-by: Ken-ichirou MATSUZAWA &lt;chamas@h4.dion.ne.jp&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We should not trim skb for mmaped socket since its buf size is fixed
and userspace will read as frame which data equals head. mmaped
socket will not call recvmsg, means max_recvmsg_len is 0,
skb_reserve was not called before commit: db65a3aaf29e.

Fixes: db65a3aaf29e (netlink: Trim skb to alloc size to avoid MSG_TRUNC)
Signed-off-by: Ken-ichirou MATSUZAWA &lt;chamas@h4.dion.ne.jp&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>genetlink: Fix off-by-one in genl_allocate_reserve_groups()</title>
<updated>2016-01-13T15:28:06+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2016-01-13T15:28:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b8e429a2feac623a34e21099a4a69de29b6d873e'/>
<id>b8e429a2feac623a34e21099a4a69de29b6d873e</id>
<content type='text'>
The bug fix for adding n_groups to the computation forgot
to adjust "&gt;=" to "&gt;" to keep the condition correct.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The bug fix for adding n_groups to the computation forgot
to adjust "&gt;=" to "&gt;" to keep the condition correct.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux</title>
<updated>2016-01-13T05:21:27+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2016-01-13T05:21:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ddb5388ffd0ad75d07e7b78181a0b579824ba6f0'/>
<id>ddb5388ffd0ad75d07e7b78181a0b579824ba6f0</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>net: netlink: Fix multicast group storage allocation for families with more than one groups</title>
<updated>2016-01-12T21:40:15+00:00</updated>
<author>
<name>Matti Vaittinen</name>
<email>matti.vaittinen@nokia.com</email>
</author>
<published>2016-01-11T12:26:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ccdf6ce6a8dba374668ae9b4d763e19903611c38'/>
<id>ccdf6ce6a8dba374668ae9b4d763e19903611c38</id>
<content type='text'>
Multicast groups are stored in global buffer. Check for needed buffer size
incorrectly compares buffer size to first id for family. This means that
for families with more than one mcast id one may allocate too small buffer
and end up writing rest of the groups to some unallocated memory. Fix the
buffer size check to compare allocated space to last mcast id for the
family.

Tested on ARM using kernel 3.14

Signed-off-by: Matti Vaittinen &lt;matti.vaittinen@nokia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Multicast groups are stored in global buffer. Check for needed buffer size
incorrectly compares buffer size to first id for family. This means that
for families with more than one mcast id one may allocate too small buffer
and end up writing rest of the groups to some unallocated memory. Fix the
buffer size check to compare allocated space to last mcast id for the
family.

Tested on ARM using kernel 3.14

Signed-off-by: Matti Vaittinen &lt;matti.vaittinen@nokia.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netlink: add a start callback for starting a netlink dump</title>
<updated>2015-12-16T04:25:20+00:00</updated>
<author>
<name>Tom Herbert</name>
<email>tom@herbertland.com</email>
</author>
<published>2015-12-15T23:41:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fc9e50f5a5a4e1fa9ba2756f745a13e693cf6a06'/>
<id>fc9e50f5a5a4e1fa9ba2756f745a13e693cf6a06</id>
<content type='text'>
The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.

Signed-off-by: Tom Herbert &lt;tom@herbertland.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The start callback allows the caller to set up a context for the
dump callbacks. Presumably, the context can then be destroyed in
the done callback.

Signed-off-by: Tom Herbert &lt;tom@herbertland.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
