<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/openvswitch, branch v4.13</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>openvswitch: fix skb_panic due to the incorrect actions attrlen</title>
<updated>2017-08-16T21:12:37+00:00</updated>
<author>
<name>Liping Zhang</name>
<email>zlpnobody@gmail.com</email>
</author>
<published>2017-08-16T05:30:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=494bea39f3201776cdfddc232705f54a0bd210c4'/>
<id>494bea39f3201776cdfddc232705f54a0bd210c4</id>
<content type='text'>
For sw_flow_actions, the actions_len only represents the kernel part's
size, and when we dump the actions to the userspace, we will do the
convertions, so it's true size may become bigger than the actions_len.

But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
to alloc the skbuff, so the user_skb's size may become insufficient and
oops will happen like this:
  skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
  ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:&lt;NULL&gt;
  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:129!
  [...]
  Call Trace:
   &lt;IRQ&gt;
   [&lt;ffffffff8148be82&gt;] skb_put+0x43/0x44
   [&lt;ffffffff8148fabf&gt;] skb_zerocopy+0x6c/0x1f4
   [&lt;ffffffffa0290d36&gt;] queue_userspace_packet+0x3a3/0x448 [openvswitch]
   [&lt;ffffffffa0292023&gt;] ovs_dp_upcall+0x30/0x5c [openvswitch]
   [&lt;ffffffffa028d435&gt;] output_userspace+0x132/0x158 [openvswitch]
   [&lt;ffffffffa01e6890&gt;] ? ip6_rcv_finish+0x74/0x77 [ipv6]
   [&lt;ffffffffa028e277&gt;] do_execute_actions+0xcc1/0xdc8 [openvswitch]
   [&lt;ffffffffa028e3f2&gt;] ovs_execute_actions+0x74/0x106 [openvswitch]
   [&lt;ffffffffa0292130&gt;] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
   [&lt;ffffffffa0292b77&gt;] ? key_extract+0x63c/0x8d5 [openvswitch]
   [&lt;ffffffffa029848b&gt;] ovs_vport_receive+0xa1/0xc3 [openvswitch]
  [...]

Also we can find that the actions_len is much little than the orig_len:
  crash&gt; struct sw_flow_actions 0xffff8812f539d000
  struct sw_flow_actions {
    rcu = {
      next = 0xffff8812f5398800,
      func = 0xffffe3b00035db32
    },
    orig_len = 1384,
    actions_len = 592,
    actions = 0xffff8812f539d01c
  }

So as a quick fix, use the orig_len instead of the actions_len to alloc
the user_skb.

Last, this oops happened on our system running a relative old kernel, but
the same risk still exists on the mainline, since we use the wrong
actions_len from the beginning.

Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
Cc: Neil McKee &lt;neil.mckee@inmon.com&gt;
Signed-off-by: Liping Zhang &lt;zlpnobody@gmail.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For sw_flow_actions, the actions_len only represents the kernel part's
size, and when we dump the actions to the userspace, we will do the
convertions, so it's true size may become bigger than the actions_len.

But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
to alloc the skbuff, so the user_skb's size may become insufficient and
oops will happen like this:
  skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
  ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:&lt;NULL&gt;
  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:129!
  [...]
  Call Trace:
   &lt;IRQ&gt;
   [&lt;ffffffff8148be82&gt;] skb_put+0x43/0x44
   [&lt;ffffffff8148fabf&gt;] skb_zerocopy+0x6c/0x1f4
   [&lt;ffffffffa0290d36&gt;] queue_userspace_packet+0x3a3/0x448 [openvswitch]
   [&lt;ffffffffa0292023&gt;] ovs_dp_upcall+0x30/0x5c [openvswitch]
   [&lt;ffffffffa028d435&gt;] output_userspace+0x132/0x158 [openvswitch]
   [&lt;ffffffffa01e6890&gt;] ? ip6_rcv_finish+0x74/0x77 [ipv6]
   [&lt;ffffffffa028e277&gt;] do_execute_actions+0xcc1/0xdc8 [openvswitch]
   [&lt;ffffffffa028e3f2&gt;] ovs_execute_actions+0x74/0x106 [openvswitch]
   [&lt;ffffffffa0292130&gt;] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
   [&lt;ffffffffa0292b77&gt;] ? key_extract+0x63c/0x8d5 [openvswitch]
   [&lt;ffffffffa029848b&gt;] ovs_vport_receive+0xa1/0xc3 [openvswitch]
  [...]

Also we can find that the actions_len is much little than the orig_len:
  crash&gt; struct sw_flow_actions 0xffff8812f539d000
  struct sw_flow_actions {
    rcu = {
      next = 0xffff8812f5398800,
      func = 0xffffe3b00035db32
    },
    orig_len = 1384,
    actions_len = 592,
    actions = 0xffff8812f539d01c
  }

So as a quick fix, use the orig_len instead of the actions_len to alloc
the user_skb.

Last, this oops happened on our system running a relative old kernel, but
the same risk still exists on the mainline, since we use the wrong
actions_len from the beginning.

Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
Cc: Neil McKee &lt;neil.mckee@inmon.com&gt;
Signed-off-by: Liping Zhang &lt;zlpnobody@gmail.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>openvswitch: fix potential out of bound access in parse_ct</title>
<updated>2017-07-24T23:25:06+00:00</updated>
<author>
<name>Liping Zhang</name>
<email>zlpnobody@gmail.com</email>
</author>
<published>2017-07-23T09:52:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c'/>
<id>69ec932e364b1ba9c3a2085fe96b76c8a3f71e7c</id>
<content type='text'>
Before the 'type' is validated, we shouldn't use it to fetch the
ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
may happen.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Liping Zhang &lt;zlpnobody@gmail.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Before the 'type' is validated, we shouldn't use it to fetch the
ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
may happen.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Liping Zhang &lt;zlpnobody@gmail.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>openvswitch: Fix for force/commit action failures</title>
<updated>2017-07-15T21:40:21+00:00</updated>
<author>
<name>Greg Rose</name>
<email>gvrose8192@gmail.com</email>
</author>
<published>2017-07-14T19:42:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8b97ac5bda17cfaa257bcab6180af0f43a2e87e0'/>
<id>8b97ac5bda17cfaa257bcab6180af0f43a2e87e0</id>
<content type='text'>
When there is an established connection in direction A-&gt;B, it is
possible to receive a packet on port B which then executes
ct(commit,force) without first performing ct() - ie, a lookup.
In this case, we would expect that this packet can delete the existing
entry so that we can commit a connection with direction B-&gt;A. However,
currently we only perform a check in skb_nfct_cached() for whether
OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
lookup previously occurred. In the above scenario, a lookup has not
occurred but we should still be able to statelessly look up the
existing entry and potentially delete the entry if it is in the
opposite direction.

This patch extends the check to also hint that if the action has the
force flag set, then we will lookup the existing entry so that the
force check at the end of skb_nfct_cached has the ability to delete
the connection.

Fixes: dd41d330b03 ("openvswitch: Add force commit.")
CC: Pravin Shelar &lt;pshelar@nicira.com&gt;
CC: dev@openvswitch.org
Signed-off-by: Joe Stringer &lt;joe@ovn.org&gt;
Signed-off-by: Greg Rose &lt;gvrose8192@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When there is an established connection in direction A-&gt;B, it is
possible to receive a packet on port B which then executes
ct(commit,force) without first performing ct() - ie, a lookup.
In this case, we would expect that this packet can delete the existing
entry so that we can commit a connection with direction B-&gt;A. However,
currently we only perform a check in skb_nfct_cached() for whether
OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
lookup previously occurred. In the above scenario, a lookup has not
occurred but we should still be able to statelessly look up the
existing entry and potentially delete the entry if it is in the
opposite direction.

This patch extends the check to also hint that if the action has the
force flag set, then we will lookup the existing entry so that the
force check at the end of skb_nfct_cached has the ability to delete
the connection.

Fixes: dd41d330b03 ("openvswitch: Add force commit.")
CC: Pravin Shelar &lt;pshelar@nicira.com&gt;
CC: dev@openvswitch.org
Signed-off-by: Joe Stringer &lt;joe@ovn.org&gt;
Signed-off-by: Greg Rose &lt;gvrose8192@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>openvswitch: fix mis-ordered comment lines for ovs_skb_cb</title>
<updated>2017-07-03T12:54:00+00:00</updated>
<author>
<name>Daniel Axtens</name>
<email>dja@axtens.net</email>
</author>
<published>2017-07-03T11:46:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=52427fa0631269c62885dc48e0c32e2ad6e17f8c'/>
<id>52427fa0631269c62885dc48e0c32e2ad6e17f8c</id>
<content type='text'>
I was trying to wrap my head around meaning of mru, and realised
that the second line of the comment defining it had somehow
ended up after the line defining cutlen, leading to much confusion.

Reorder the lines to make sense.

Signed-off-by: Daniel Axtens &lt;dja@axtens.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I was trying to wrap my head around meaning of mru, and realised
that the second line of the comment defining it had somehow
ended up after the line defining cutlen, leading to much confusion.

Reorder the lines to make sense.

Signed-off-by: Daniel Axtens &lt;dja@axtens.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>datapath: Avoid using stack larger than 1024.</title>
<updated>2017-07-01T16:09:59+00:00</updated>
<author>
<name>Tonghao Zhang</name>
<email>xiangxia.m.yue@gmail.com</email>
</author>
<published>2017-06-30T00:27:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9cc9a5cb176ccb4f2cda5ac34da5a659926f125f'/>
<id>9cc9a5cb176ccb4f2cda5ac34da5a659926f125f</id>
<content type='text'>
When compiling OvS-master on 4.4.0-81 kernel,
there is a warning:

    CC [M]  /root/ovs/datapath/linux/datapath.o
    /root/ovs/datapath/linux/datapath.c: In function
    'ovs_flow_cmd_set':
    /root/ovs/datapath/linux/datapath.c:1221:1: warning:
    the frame size of 1040 bytes is larger than 1024 bytes
    [-Wframe-larger-than=]

This patch factors out match-init and action-copy to avoid
"Wframe-larger-than=1024" warning. Because mask is only
used to get actions, we new a function to save some
stack space.

Signed-off-by: Tonghao Zhang &lt;xiangxia.m.yue@gmail.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When compiling OvS-master on 4.4.0-81 kernel,
there is a warning:

    CC [M]  /root/ovs/datapath/linux/datapath.o
    /root/ovs/datapath/linux/datapath.c: In function
    'ovs_flow_cmd_set':
    /root/ovs/datapath/linux/datapath.c:1221:1: warning:
    the frame size of 1040 bytes is larger than 1024 bytes
    [-Wframe-larger-than=]

This patch factors out match-init and action-copy to avoid
"Wframe-larger-than=1024" warning. Because mask is only
used to get actions, we new a function to save some
stack space.

Signed-off-by: Tonghao Zhang &lt;xiangxia.m.yue@gmail.com&gt;
Acked-by: Pravin B Shelar &lt;pshelar@ovn.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: store port/representator id in metadata_dst</title>
<updated>2017-06-25T15:42:01+00:00</updated>
<author>
<name>Jakub Kicinski</name>
<email>jakub.kicinski@netronome.com</email>
</author>
<published>2017-06-23T20:11:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3fcece12bc1b6dcdf0986f2cd9e8f63b1f9b6aa0'/>
<id>3fcece12bc1b6dcdf0986f2cd9e8f63b1f9b6aa0</id>
<content type='text'>
Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
representators and control messages over single set of hardware queues.
Control messages and muxed traffic may need ordered delivery.

Those requirements make it hard to comfortably use TC infrastructure today
unless we have a way of attaching metadata to skbs at the upper device.
Because single set of queues is used for many netdevs stopping TC/sched
queues of all of them reliably is impossible and lower device has to
retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
the fastpath.

This patch attempts to enable port/representative devs to attach metadata
to skbs which carry port id.  This way representatives can be queueless and
all queuing can be performed at the lower netdev in the usual way.

Traffic arriving on the port/representative interfaces will be have
metadata attached and will subsequently be queued to the lower device for
transmission.  The lower device should recognize the metadata and translate
it to HW specific format which is most likely either a special header
inserted before the network headers or descriptor/metadata fields.

Metadata is associated with the lower device by storing the netdev pointer
along with port id so that if TC decides to redirect or mirror the new
netdev will not try to interpret it.

This is mostly for SR-IOV devices since switches don't have lower netdevs
today.

Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Signed-off-by: Sridhar Samudrala &lt;sridhar.samudrala@intel.com&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
representators and control messages over single set of hardware queues.
Control messages and muxed traffic may need ordered delivery.

Those requirements make it hard to comfortably use TC infrastructure today
unless we have a way of attaching metadata to skbs at the upper device.
Because single set of queues is used for many netdevs stopping TC/sched
queues of all of them reliably is impossible and lower device has to
retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
the fastpath.

This patch attempts to enable port/representative devs to attach metadata
to skbs which carry port id.  This way representatives can be queueless and
all queuing can be performed at the lower netdev in the usual way.

Traffic arriving on the port/representative interfaces will be have
metadata attached and will subsequently be queued to the lower device for
transmission.  The lower device should recognize the metadata and translate
it to HW specific format which is most likely either a special header
inserted before the network headers or descriptor/metadata fields.

Metadata is associated with the lower device by storing the netdev pointer
along with port id so that if TC decides to redirect or mirror the new
netdev will not try to interpret it.

This is mostly for SR-IOV devices since switches don't have lower netdevs
today.

Signed-off-by: Jakub Kicinski &lt;jakub.kicinski@netronome.com&gt;
Signed-off-by: Sridhar Samudrala &lt;sridhar.samudrala@intel.com&gt;
Signed-off-by: Simon Horman &lt;horms@verge.net.au&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vxlan: get rid of redundant vxlan_dev.flags</title>
<updated>2017-06-20T17:37:02+00:00</updated>
<author>
<name>Matthias Schiffer</name>
<email>mschiffer@universe-factory.net</email>
</author>
<published>2017-06-19T08:03:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dc5321d79697db1b610c25fa4fad1aec7533ea3e'/>
<id>dc5321d79697db1b610c25fa4fad1aec7533ea3e</id>
<content type='text'>
There is no good reason to keep the flags twice in vxlan_dev and
vxlan_config.

Signed-off-by: Matthias Schiffer &lt;mschiffer@universe-factory.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is no good reason to keep the flags twice in vxlan_dev and
vxlan_config.

Signed-off-by: Matthias Schiffer &lt;mschiffer@universe-factory.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>networking: convert many more places to skb_put_zero()</title>
<updated>2017-06-16T15:48:35+00:00</updated>
<author>
<name>Johannes Berg</name>
<email>johannes.berg@intel.com</email>
</author>
<published>2017-06-16T12:29:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b080db585384b9f037e015c0c28d1ad33be41dfc'/>
<id>b080db585384b9f037e015c0c28d1ad33be41dfc</id>
<content type='text'>
There were many places that my previous spatch didn't find,
as pointed out by yuan linyu in various patches.

The following spatch found many more and also removes the
now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

Apply it to the tree (with one manual fixup to keep the
comment in vxlan.c, which spatch removed.)

Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There were many places that my previous spatch didn't find,
as pointed out by yuan linyu in various patches.

The following spatch found many more and also removes the
now unnecessary casts:

    @@
    identifier p, p2;
    expression len;
    expression skb;
    type t, t2;
    @@
    (
    -p = skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    |
    -p = (t)skb_put(skb, len);
    +p = skb_put_zero(skb, len);
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, len);
    |
    -memset(p, 0, len);
    )

    @@
    type t, t2;
    identifier p, p2;
    expression skb;
    @@
    t *p;
    ...
    (
    -p = skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    |
    -p = (t *)skb_put(skb, sizeof(t));
    +p = skb_put_zero(skb, sizeof(t));
    )
    ... when != p
    (
    p2 = (t2)p;
    -memset(p2, 0, sizeof(*p));
    |
    -memset(p, 0, sizeof(*p));
    )

    @@
    expression skb, len;
    @@
    -memset(skb_put(skb, len), 0, len);
    +skb_put_zero(skb, len);

Apply it to the tree (with one manual fixup to keep the
comment in vxlan.c, which spatch removed.)

Signed-off-by: Johannes Berg &lt;johannes.berg@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2017-06-15T15:59:32+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2017-06-15T15:31:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0ddead90b223faae475f3296a50bf574b7f7c69a'/>
<id>0ddead90b223faae475f3296a50bf574b7f7c69a</id>
<content type='text'>
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Fix inconsistent teardown and release of private netdev state.</title>
<updated>2017-06-07T19:53:24+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2017-05-08T16:52:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cf124db566e6b036b8bcbe8decbed740bdfac8c6'/>
<id>cf124db566e6b036b8bcbe8decbed740bdfac8c6</id>
<content type='text'>
Network devices can allocate reasources and private memory using
netdev_ops-&gt;ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops-&gt;ndo_uninit() or netdev-&gt;destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops-&gt;ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev-&gt;destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev-&gt;destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops-&gt;ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops-&gt;ndo_uninit().  But
it is not able to invoke netdev-&gt;destructor().

This is because netdev-&gt;destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev-&gt;destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev-&gt;destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev-&gt;priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev-&gt;destructor(), except for
free_netdev().

netdev-&gt;needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops-&gt;ndo_init() succeeds, by invoking both ndo_ops-&gt;ndo_uninit()
and netdev-&gt;priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev-&gt;priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Network devices can allocate reasources and private memory using
netdev_ops-&gt;ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops-&gt;ndo_uninit() or netdev-&gt;destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops-&gt;ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev-&gt;destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev-&gt;destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops-&gt;ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops-&gt;ndo_uninit().  But
it is not able to invoke netdev-&gt;destructor().

This is because netdev-&gt;destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev-&gt;destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev-&gt;destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev-&gt;priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev-&gt;destructor(), except for
free_netdev().

netdev-&gt;needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops-&gt;ndo_init() succeeds, by invoking both ndo_ops-&gt;ndo_uninit()
and netdev-&gt;priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev-&gt;priv_destructor() and optionally call free_netdev().

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
