<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ipv6/ip6mr.c, branch v4.12</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>net: Fix inconsistent teardown and release of private netdev state.</title>
<updated>2017-06-07T19:53:24+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2017-05-08T16:52:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cf124db566e6b036b8bcbe8decbed740bdfac8c6'/>
<id>cf124db566e6b036b8bcbe8decbed740bdfac8c6</id>
<content type='text'>
Network devices can allocate reasources and private memory using
netdev_ops-&gt;ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops-&gt;ndo_uninit() or netdev-&gt;destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops-&gt;ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev-&gt;destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev-&gt;destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops-&gt;ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops-&gt;ndo_uninit().  But
it is not able to invoke netdev-&gt;destructor().

This is because netdev-&gt;destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev-&gt;destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev-&gt;destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev-&gt;priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev-&gt;destructor(), except for
free_netdev().

netdev-&gt;needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops-&gt;ndo_init() succeeds, by invoking both ndo_ops-&gt;ndo_uninit()
and netdev-&gt;priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev-&gt;priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Network devices can allocate reasources and private memory using
netdev_ops-&gt;ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops-&gt;ndo_uninit() or netdev-&gt;destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops-&gt;ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev-&gt;destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev-&gt;destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops-&gt;ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops-&gt;ndo_uninit().  But
it is not able to invoke netdev-&gt;destructor().

This is because netdev-&gt;destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev-&gt;destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev-&gt;destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev-&gt;priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev-&gt;destructor(), except for
free_netdev().

netdev-&gt;needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops-&gt;ndo_init() succeeds, by invoking both ndo_ops-&gt;ndo_uninit()
and netdev-&gt;priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev-&gt;priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2017-04-22T03:23:53+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2017-04-22T03:23:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fb796707d7a6c9b24fdf80b9b4f24fa5ffcf0ec5'/>
<id>fb796707d7a6c9b24fdf80b9b4f24fa5ffcf0ec5</id>
<content type='text'>
Both conflict were simple overlapping changes.

In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
conversion of the driver in net-next to use in-netdev stats.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Both conflict were simple overlapping changes.

In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
conversion of the driver in net-next to use in-netdev stats.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ip6mr: fix notification device destruction</title>
<updated>2017-04-21T19:35:47+00:00</updated>
<author>
<name>Nikolay Aleksandrov</name>
<email>nikolay@cumulusnetworks.com</email>
</author>
<published>2017-04-21T17:42:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=723b929ca0f79c0796f160c2eeda4597ee98d2b8'/>
<id>723b929ca0f79c0796f160c2eeda4597ee98d2b8</id>
<content type='text'>
Andrey Konovalov reported a BUG caused by the ip6mr code which is caused
because we call unregister_netdevice_many for a device that is already
being destroyed. In IPv4's ipmr that has been resolved by two commits
long time ago by introducing the "notify" parameter to the delete
function and avoiding the unregister when called from a notifier, so
let's do the same for ip6mr.

The trace from Andrey:
------------[ cut here ]------------
kernel BUG at net/core/dev.c:6813!
invalid opcode: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Workqueue: netns cleanup_net
task: ffff880069208000 task.stack: ffff8800692d8000
RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
FS:  0000000000000000(0000) GS:ffff88006cb00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
Call Trace:
 unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
 unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
 ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
 notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
 call_netdevice_notifiers net/core/dev.c:1663
 rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
 unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
 unregister_netdevice_many net/core/dev.c:7880
 default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
 ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
 cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
 process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
 worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
 kthread+0x35e/0x430 kernel/kthread.c:231
 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe &lt;0f&gt;
0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
---[ end trace e0b29c57e9b3292c ]---

Reported-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Tested-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Andrey Konovalov reported a BUG caused by the ip6mr code which is caused
because we call unregister_netdevice_many for a device that is already
being destroyed. In IPv4's ipmr that has been resolved by two commits
long time ago by introducing the "notify" parameter to the delete
function and avoiding the unregister when called from a notifier, so
let's do the same for ip6mr.

The trace from Andrey:
------------[ cut here ]------------
kernel BUG at net/core/dev.c:6813!
invalid opcode: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 1 PID: 1165 Comm: kworker/u4:3 Not tainted 4.11.0-rc7+ #251
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Workqueue: netns cleanup_net
task: ffff880069208000 task.stack: ffff8800692d8000
RIP: 0010:rollback_registered_many+0x348/0xeb0 net/core/dev.c:6813
RSP: 0018:ffff8800692de7f0 EFLAGS: 00010297
RAX: ffff880069208000 RBX: 0000000000000002 RCX: 0000000000000001
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88006af90569
RBP: ffff8800692de9f0 R08: ffff8800692dec60 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000000 R12: ffff88006af90070
R13: ffff8800692debf0 R14: dffffc0000000000 R15: ffff88006af90000
FS:  0000000000000000(0000) GS:ffff88006cb00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe7e897d870 CR3: 00000000657e7000 CR4: 00000000000006e0
Call Trace:
 unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
 unregister_netdevice_many+0xc8/0x120 net/core/dev.c:7880
 ip6mr_device_event+0x362/0x3f0 net/ipv6/ip6mr.c:1346
 notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
 __raw_notifier_call_chain kernel/notifier.c:394
 raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
 call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1647
 call_netdevice_notifiers net/core/dev.c:1663
 rollback_registered_many+0x919/0xeb0 net/core/dev.c:6841
 unregister_netdevice_many.part.105+0x87/0x440 net/core/dev.c:7881
 unregister_netdevice_many net/core/dev.c:7880
 default_device_exit_batch+0x4fa/0x640 net/core/dev.c:8333
 ops_exit_list.isra.4+0x100/0x150 net/core/net_namespace.c:144
 cleanup_net+0x5a8/0xb40 net/core/net_namespace.c:463
 process_one_work+0xc04/0x1c10 kernel/workqueue.c:2097
 worker_thread+0x223/0x19c0 kernel/workqueue.c:2231
 kthread+0x35e/0x430 kernel/kthread.c:231
 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Code: 3c 32 00 0f 85 70 0b 00 00 48 b8 00 02 00 00 00 00 ad de 49 89
47 78 e9 93 fe ff ff 49 8d 57 70 49 8d 5f 78 eb 9e e8 88 7a 14 fe &lt;0f&gt;
0b 48 8b 9d 28 fe ff ff e8 7a 7a 14 fe 48 b8 00 00 00 00 00
RIP: rollback_registered_many+0x348/0xeb0 RSP: ffff8800692de7f0
---[ end trace e0b29c57e9b3292c ]---

Reported-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Tested-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: ipv6: Refactor inet6_netconf_notify_devconf to take event</title>
<updated>2017-03-29T05:32:42+00:00</updated>
<author>
<name>David Ahern</name>
<email>dsa@cumulusnetworks.com</email>
</author>
<published>2017-03-28T21:28:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=85b3daada4cab8cc36888f5d025058bbc8737497'/>
<id>85b3daada4cab8cc36888f5d025058bbc8737497</id>
<content type='text'>
Refactor inet6_netconf_notify_devconf to take the event as an input arg.

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Refactor inet6_netconf_notify_devconf to take the event as an input arg.

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv6: check sk sk_type and protocol early in ip_mroute_set/getsockopt</title>
<updated>2017-02-27T02:25:46+00:00</updated>
<author>
<name>Xin Long</name>
<email>lucien.xin@gmail.com</email>
</author>
<published>2017-02-24T08:29:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=99253eb750fda6a644d5188fb26c43bad8d5a745'/>
<id>99253eb750fda6a644d5188fb26c43bad8d5a745</id>
<content type='text'>
Commit 5e1859fbcc3c ("ipv4: ipmr: various fixes and cleanups") fixed
the issue for ipv4 ipmr:

  ip_mroute_setsockopt() &amp; ip_mroute_getsockopt() should not
  access/set raw_sk(sk)-&gt;ipmr_table before making sure the socket
  is a raw socket, and protocol is IGMP

The same fix should be done for ipv6 ipmr as well.

This patch can fix the panic caused by overwriting the same offset
as ipmr_table as in raw_sk(sk) when accessing other type's socket
by ip_mroute_setsockopt().

Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 5e1859fbcc3c ("ipv4: ipmr: various fixes and cleanups") fixed
the issue for ipv4 ipmr:

  ip_mroute_setsockopt() &amp; ip_mroute_getsockopt() should not
  access/set raw_sk(sk)-&gt;ipmr_table before making sure the socket
  is a raw socket, and protocol is IGMP

The same fix should be done for ipv6 ipmr as well.

This patch can fix the panic caused by overwriting the same offset
as ipmr_table as in raw_sk(sk) when accessing other type's socket
by ip_mroute_setsockopt().

Signed-off-by: Xin Long &lt;lucien.xin@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: ipv6: remove nowait arg to rt6_fill_node</title>
<updated>2017-01-18T20:43:59+00:00</updated>
<author>
<name>David Ahern</name>
<email>dsa@cumulusnetworks.com</email>
</author>
<published>2017-01-17T23:51:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fd61c6ba313de6758aeeab58fe03bd9fbbc8cea9'/>
<id>fd61c6ba313de6758aeeab58fe03bd9fbbc8cea9</id>
<content type='text'>
All callers of rt6_fill_node pass 0 for nowait arg. Remove the arg and
simplify rt6_fill_node accordingly.

rt6_fill_node passes the nowait of 0 to ip6mr_get_route. Remove the
nowait arg from it as well.

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All callers of rt6_fill_node pass 0 for nowait arg. Remove the arg and
simplify rt6_fill_node accordingly.

rt6_fill_node passes the nowait of 0 to ip6mr_get_route. Remove the
nowait arg from it as well.

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipmr, ip6mr: add RTNH_F_UNRESOLVED flag to unresolved cache entries</title>
<updated>2017-01-03T15:04:31+00:00</updated>
<author>
<name>Nikolay Aleksandrov</name>
<email>nikolay@cumulusnetworks.com</email>
</author>
<published>2017-01-03T11:13:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1708ebc9636a249e104b83c6d105f15244825281'/>
<id>1708ebc9636a249e104b83c6d105f15244825281</id>
<content type='text'>
While working with ipmr, we noticed that it is impossible to determine
if an entry is actually unresolved or its IIF interface has disappeared
(e.g. virtual interface got deleted). These entries look almost
identical to user-space when dumping or receiving notifications. So in
order to recognize them add a new RTNH_F_UNRESOLVED flag which is set when
sending an unresolved cache entry to user-space.

Suggested-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
While working with ipmr, we noticed that it is impossible to determine
if an entry is actually unresolved or its IIF interface has disappeared
(e.g. virtual interface got deleted). These entries look almost
identical to user-space when dumping or receiving notifications. So in
order to recognize them add a new RTNH_F_UNRESOLVED flag which is set when
sending an unresolved cache entry to user-space.

Suggested-by: Roopa Prabhu &lt;roopa@cumulusnetworks.com&gt;
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Replace &lt;asm/uaccess.h&gt; with &lt;linux/uaccess.h&gt; globally</title>
<updated>2016-12-24T19:46:01+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-12-24T19:46:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7c0f6ba682b9c7632072ffbedf8d328c8f3c42ba'/>
<id>7c0f6ba682b9c7632072ffbedf8d328c8f3c42ba</id>
<content type='text'>
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*&lt;asm/uaccess.h&gt;'
  sed -i -e "s!$PATT!#include &lt;linux/uaccess.h&gt;!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*&lt;asm/uaccess.h&gt;'
  sed -i -e "s!$PATT!#include &lt;linux/uaccess.h&gt;!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: pim: add all RFC7761 message types</title>
<updated>2016-10-31T20:18:30+00:00</updated>
<author>
<name>Nikolay Aleksandrov</name>
<email>nikolay@cumulusnetworks.com</email>
</author>
<published>2016-10-31T12:21:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=56245cae19f5ccb371fa63b09bb6b9ce7c0f1266'/>
<id>56245cae19f5ccb371fa63b09bb6b9ce7c0f1266</id>
<content type='text'>
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route</title>
<updated>2016-09-26T03:41:39+00:00</updated>
<author>
<name>Nikolay Aleksandrov</name>
<email>nikolay@cumulusnetworks.com</email>
</author>
<published>2016-09-25T21:08:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2cf750704bb6d7ed8c7d732e071dd1bc890ea5e8'/>
<id>2cf750704bb6d7ed8c7d732e071dd1bc890ea5e8</id>
<content type='text'>
Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
instead of the previous dst_pid which was copied from in_skb's portid.
Since the skb is new the portid is 0 at that point so the packets are sent
to the kernel and we get scheduling while atomic or a deadlock (depending
on where it happens) by trying to acquire rtnl two times.
Also since this is RTM_GETROUTE, it can be triggered by a normal user.

Here's the sleeping while atomic trace:
[ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
[ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 7858.212881] 2 locks held by swapper/0/0:
[ 7858.213013]  #0:  (((&amp;mrt-&gt;ipmr_expire_timer))){+.-...}, at: [&lt;ffffffff810fbbf5&gt;] call_timer_fn+0x5/0x350
[ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [&lt;ffffffff8161e005&gt;] ipmr_expire_process+0x25/0x130
[ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
[ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
[ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
[ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
[ 7858.215251] Call Trace:
[ 7858.215412]  &lt;IRQ&gt;  [&lt;ffffffff813a7804&gt;] dump_stack+0x85/0xc1
[ 7858.215662]  [&lt;ffffffff810a4a72&gt;] ___might_sleep+0x192/0x250
[ 7858.215868]  [&lt;ffffffff810a4b9f&gt;] __might_sleep+0x6f/0x100
[ 7858.216072]  [&lt;ffffffff8165bea3&gt;] mutex_lock_nested+0x33/0x4d0
[ 7858.216279]  [&lt;ffffffff815a7a5f&gt;] ? netlink_lookup+0x25f/0x460
[ 7858.216487]  [&lt;ffffffff8157474b&gt;] rtnetlink_rcv+0x1b/0x40
[ 7858.216687]  [&lt;ffffffff815a9a0c&gt;] netlink_unicast+0x19c/0x260
[ 7858.216900]  [&lt;ffffffff81573c70&gt;] rtnl_unicast+0x20/0x30
[ 7858.217128]  [&lt;ffffffff8161cd39&gt;] ipmr_destroy_unres+0xa9/0xf0
[ 7858.217351]  [&lt;ffffffff8161e06f&gt;] ipmr_expire_process+0x8f/0x130
[ 7858.217581]  [&lt;ffffffff8161dfe0&gt;] ? ipmr_net_init+0x180/0x180
[ 7858.217785]  [&lt;ffffffff8161dfe0&gt;] ? ipmr_net_init+0x180/0x180
[ 7858.217990]  [&lt;ffffffff810fbc95&gt;] call_timer_fn+0xa5/0x350
[ 7858.218192]  [&lt;ffffffff810fbbf5&gt;] ? call_timer_fn+0x5/0x350
[ 7858.218415]  [&lt;ffffffff8161dfe0&gt;] ? ipmr_net_init+0x180/0x180
[ 7858.218656]  [&lt;ffffffff810fde10&gt;] run_timer_softirq+0x260/0x640
[ 7858.218865]  [&lt;ffffffff8166379b&gt;] ? __do_softirq+0xbb/0x54f
[ 7858.219068]  [&lt;ffffffff816637c8&gt;] __do_softirq+0xe8/0x54f
[ 7858.219269]  [&lt;ffffffff8107a948&gt;] irq_exit+0xb8/0xc0
[ 7858.219463]  [&lt;ffffffff81663452&gt;] smp_apic_timer_interrupt+0x42/0x50
[ 7858.219678]  [&lt;ffffffff816625bc&gt;] apic_timer_interrupt+0x8c/0xa0
[ 7858.219897]  &lt;EOI&gt;  [&lt;ffffffff81055f16&gt;] ? native_safe_halt+0x6/0x10
[ 7858.220165]  [&lt;ffffffff810d64dd&gt;] ? trace_hardirqs_on+0xd/0x10
[ 7858.220373]  [&lt;ffffffff810298e3&gt;] default_idle+0x23/0x190
[ 7858.220574]  [&lt;ffffffff8102a20f&gt;] arch_cpu_idle+0xf/0x20
[ 7858.220790]  [&lt;ffffffff810c9f8c&gt;] default_idle_call+0x4c/0x60
[ 7858.221016]  [&lt;ffffffff810ca33b&gt;] cpu_startup_entry+0x39b/0x4d0
[ 7858.221257]  [&lt;ffffffff8164f995&gt;] rest_init+0x135/0x140
[ 7858.221469]  [&lt;ffffffff81f83014&gt;] start_kernel+0x50e/0x51b
[ 7858.221670]  [&lt;ffffffff81f82120&gt;] ? early_idt_handler_array+0x120/0x120
[ 7858.221894]  [&lt;ffffffff81f8243f&gt;] x86_64_start_reservations+0x2a/0x2c
[ 7858.222113]  [&lt;ffffffff81f8257c&gt;] x86_64_start_kernel+0x13b/0x14a

Fixes: 2942e9005056 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
instead of the previous dst_pid which was copied from in_skb's portid.
Since the skb is new the portid is 0 at that point so the packets are sent
to the kernel and we get scheduling while atomic or a deadlock (depending
on where it happens) by trying to acquire rtnl two times.
Also since this is RTM_GETROUTE, it can be triggered by a normal user.

Here's the sleeping while atomic trace:
[ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
[ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
[ 7858.212881] 2 locks held by swapper/0/0:
[ 7858.213013]  #0:  (((&amp;mrt-&gt;ipmr_expire_timer))){+.-...}, at: [&lt;ffffffff810fbbf5&gt;] call_timer_fn+0x5/0x350
[ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [&lt;ffffffff8161e005&gt;] ipmr_expire_process+0x25/0x130
[ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
[ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
[ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
[ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
[ 7858.215251] Call Trace:
[ 7858.215412]  &lt;IRQ&gt;  [&lt;ffffffff813a7804&gt;] dump_stack+0x85/0xc1
[ 7858.215662]  [&lt;ffffffff810a4a72&gt;] ___might_sleep+0x192/0x250
[ 7858.215868]  [&lt;ffffffff810a4b9f&gt;] __might_sleep+0x6f/0x100
[ 7858.216072]  [&lt;ffffffff8165bea3&gt;] mutex_lock_nested+0x33/0x4d0
[ 7858.216279]  [&lt;ffffffff815a7a5f&gt;] ? netlink_lookup+0x25f/0x460
[ 7858.216487]  [&lt;ffffffff8157474b&gt;] rtnetlink_rcv+0x1b/0x40
[ 7858.216687]  [&lt;ffffffff815a9a0c&gt;] netlink_unicast+0x19c/0x260
[ 7858.216900]  [&lt;ffffffff81573c70&gt;] rtnl_unicast+0x20/0x30
[ 7858.217128]  [&lt;ffffffff8161cd39&gt;] ipmr_destroy_unres+0xa9/0xf0
[ 7858.217351]  [&lt;ffffffff8161e06f&gt;] ipmr_expire_process+0x8f/0x130
[ 7858.217581]  [&lt;ffffffff8161dfe0&gt;] ? ipmr_net_init+0x180/0x180
[ 7858.217785]  [&lt;ffffffff8161dfe0&gt;] ? ipmr_net_init+0x180/0x180
[ 7858.217990]  [&lt;ffffffff810fbc95&gt;] call_timer_fn+0xa5/0x350
[ 7858.218192]  [&lt;ffffffff810fbbf5&gt;] ? call_timer_fn+0x5/0x350
[ 7858.218415]  [&lt;ffffffff8161dfe0&gt;] ? ipmr_net_init+0x180/0x180
[ 7858.218656]  [&lt;ffffffff810fde10&gt;] run_timer_softirq+0x260/0x640
[ 7858.218865]  [&lt;ffffffff8166379b&gt;] ? __do_softirq+0xbb/0x54f
[ 7858.219068]  [&lt;ffffffff816637c8&gt;] __do_softirq+0xe8/0x54f
[ 7858.219269]  [&lt;ffffffff8107a948&gt;] irq_exit+0xb8/0xc0
[ 7858.219463]  [&lt;ffffffff81663452&gt;] smp_apic_timer_interrupt+0x42/0x50
[ 7858.219678]  [&lt;ffffffff816625bc&gt;] apic_timer_interrupt+0x8c/0xa0
[ 7858.219897]  &lt;EOI&gt;  [&lt;ffffffff81055f16&gt;] ? native_safe_halt+0x6/0x10
[ 7858.220165]  [&lt;ffffffff810d64dd&gt;] ? trace_hardirqs_on+0xd/0x10
[ 7858.220373]  [&lt;ffffffff810298e3&gt;] default_idle+0x23/0x190
[ 7858.220574]  [&lt;ffffffff8102a20f&gt;] arch_cpu_idle+0xf/0x20
[ 7858.220790]  [&lt;ffffffff810c9f8c&gt;] default_idle_call+0x4c/0x60
[ 7858.221016]  [&lt;ffffffff810ca33b&gt;] cpu_startup_entry+0x39b/0x4d0
[ 7858.221257]  [&lt;ffffffff8164f995&gt;] rest_init+0x135/0x140
[ 7858.221469]  [&lt;ffffffff81f83014&gt;] start_kernel+0x50e/0x51b
[ 7858.221670]  [&lt;ffffffff81f82120&gt;] ? early_idt_handler_array+0x120/0x120
[ 7858.221894]  [&lt;ffffffff81f8243f&gt;] x86_64_start_reservations+0x2a/0x2c
[ 7858.222113]  [&lt;ffffffff81f8257c&gt;] x86_64_start_kernel+0x13b/0x14a

Fixes: 2942e9005056 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
Signed-off-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
