<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/net/bonding, branch v4.7</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>bonding: set carrier off for devices created through netlink</title>
<updated>2016-07-14T23:17:43+00:00</updated>
<author>
<name>Beniamino Galvani</name>
<email>bgalvani@redhat.com</email>
</author>
<published>2016-07-13T16:25:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=005db31d5f5f7c31cfdc43505d77eb3ca5cf8ec6'/>
<id>005db31d5f5f7c31cfdc43505d77eb3ca5cf8ec6</id>
<content type='text'>
Commit e826eafa65c6 ("bonding: Call netif_carrier_off after
register_netdevice") moved netif_carrier_off() from bond_init() to
bond_create(), but the latter is called only for initial default
devices and ones created through sysfs:

 $ modprobe bonding
 $ echo +bond1 &gt; /sys/class/net/bonding_masters
 $ ip link add bond2 type bond
 $ grep "MII Status" /proc/net/bonding/*
 /proc/net/bonding/bond0:MII Status: down
 /proc/net/bonding/bond1:MII Status: down
 /proc/net/bonding/bond2:MII Status: up

Ensure that carrier is initially off also for devices created through
netlink.

Signed-off-by: Beniamino Galvani &lt;bgalvani@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit e826eafa65c6 ("bonding: Call netif_carrier_off after
register_netdevice") moved netif_carrier_off() from bond_init() to
bond_create(), but the latter is called only for initial default
devices and ones created through sysfs:

 $ modprobe bonding
 $ echo +bond1 &gt; /sys/class/net/bonding_masters
 $ ip link add bond2 type bond
 $ grep "MII Status" /proc/net/bonding/*
 /proc/net/bonding/bond0:MII Status: down
 /proc/net/bonding/bond1:MII Status: down
 /proc/net/bonding/bond2:MII Status: up

Ensure that carrier is initially off also for devices created through
netlink.

Signed-off-by: Beniamino Galvani &lt;bgalvani@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: fix enslavement slave link notifications</title>
<updated>2016-07-05T18:51:55+00:00</updated>
<author>
<name>Aviv Heller</name>
<email>avivh@mellanox.com</email>
</author>
<published>2016-07-05T09:09:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a30b016808e214c6864ad579ef867b3fe0a314f8'/>
<id>a30b016808e214c6864ad579ef867b3fe0a314f8</id>
<content type='text'>
Currently, link notifications are not sent by
bond_set_slave_link_state() upon enslavement if
the slave is enslaved when up.

This happens because slave-&gt;link default init value
is 0, which is the same as BOND_LINK_UP, resulting
in bond_set_slave_link_state() ignoring this transition.

This patch sets the default value of slave-&gt;link to
BOND_LINK_NOCHANGE, assuring it will count as a state
transition and thus trigger notification logic.

Signed-off-by: Aviv Heller &lt;avivh@mellanox.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: Saeed Mahameed &lt;saeedm@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, link notifications are not sent by
bond_set_slave_link_state() upon enslavement if
the slave is enslaved when up.

This happens because slave-&gt;link default init value
is 0, which is the same as BOND_LINK_UP, resulting
in bond_set_slave_link_state() ignoring this transition.

This patch sets the default value of slave-&gt;link to
BOND_LINK_NOCHANGE, assuring it will count as a state
transition and thus trigger notification logic.

Signed-off-by: Aviv Heller &lt;avivh@mellanox.com&gt;
Reviewed-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Signed-off-by: Saeed Mahameed &lt;saeedm@mellanox.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: prevent out of bound accesses</title>
<updated>2016-07-01T10:06:09+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-06-30T14:13:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f87fda00b6ed232a817c655b8d179b48bde8fdbe'/>
<id>f87fda00b6ed232a817c655b8d179b48bde8fdbe</id>
<content type='text'>
ether_addr_equal_64bits() requires some care about its arguments,
namely that 8 bytes might be read, even if last 2 byte values are not
used.

KASan detected a violation with null_mac_addr and lacpdu_mcast_addr
in bond_3ad.c

Same problem with mac_bcast[] and mac_v6_allmcast[] in bond_alb.c :
Although the 8-byte alignment was there, KASan would detect out
of bound accesses.

Fixes: 815117adaf5b ("bonding: use ether_addr_equal_unaligned for bond addr compare")
Fixes: bb54e58929f3 ("bonding: Verify RX LACPDU has proper dest mac-addr")
Fixes: 885a136c52a8 ("bonding: use compare_ether_addr_64bits() in ALB")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Acked-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Acked-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Acked-by: Ding Tianhong &lt;dingtianhong@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ether_addr_equal_64bits() requires some care about its arguments,
namely that 8 bytes might be read, even if last 2 byte values are not
used.

KASan detected a violation with null_mac_addr and lacpdu_mcast_addr
in bond_3ad.c

Same problem with mac_bcast[] and mac_v6_allmcast[] in bond_alb.c :
Although the 8-byte alignment was there, KASan would detect out
of bound accesses.

Fixes: 815117adaf5b ("bonding: use ether_addr_equal_unaligned for bond addr compare")
Fixes: bb54e58929f3 ("bonding: Verify RX LACPDU has proper dest mac-addr")
Fixes: 885a136c52a8 ("bonding: use compare_ether_addr_64bits() in ALB")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Acked-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Acked-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Acked-by: Ding Tianhong &lt;dingtianhong@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: fix 802.3ad aggregator reselection</title>
<updated>2016-06-28T08:19:18+00:00</updated>
<author>
<name>Jay Vosburgh</name>
<email>jay.vosburgh@canonical.com</email>
</author>
<published>2016-06-23T21:20:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0622cab0341cac6b30da177b0faa39fae0680e71'/>
<id>0622cab0341cac6b30da177b0faa39fae0680e71</id>
<content type='text'>
Since commit 7bb11dc9f59d ("bonding: unify all places where
actor-oper key needs to be updated."), the logic in bonding to handle
selection between multiple aggregators has not functioned.

	This affects only configurations wherein the bonding slaves
connect to two discrete aggregators (e.g., two independent switches, each
with LACP enabled), thus creating two separate aggregation groups within a
single bond.

	The cause is a change in 7bb11dc9f59d to no longer set
AD_PORT_BEGIN on a port after a link state change, which would cause the
port to be reselected for attachment to an aggregator as if were newly
added to the bond.  We cannot restore the prior behavior, as it
contradicts IEEE 802.1AX 5.4.12, which requires ports that "become
inoperable" (lose carrier, setting port_enabled=false as per 802.1AX
5.4.7) to remain selected (i.e., assigned to the aggregator).  As the port
now remains selected, the aggregator selection logic is not invoked.

	A side effect of this change is that aggregators in bonding will
now contain ports that are link down.  The aggregator selection logic
does not currently handle this situation correctly, causing incorrect
aggregator selection.

	This patch makes two changes to repair the aggregator selection
logic in bonding to function as documented and within the confines of the
standard:

	First, the aggregator selection and related logic now utilizes the
number of active ports per aggregator, not the number of selected ports
(as some selected ports may be down).  The ad_select "bandwidth" and
"count" options only consider ports that are link up.

	Second, on any carrier state change of any slave, the aggregator
selection logic is explicitly called to insure the correct aggregator is
active.

Reported-by: Veli-Matti Lintu &lt;veli-matti.lintu@opinsys.fi&gt;
Fixes: 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated.")
Signed-off-by: Jay Vosburgh &lt;jay.vosburgh@canonical.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit 7bb11dc9f59d ("bonding: unify all places where
actor-oper key needs to be updated."), the logic in bonding to handle
selection between multiple aggregators has not functioned.

	This affects only configurations wherein the bonding slaves
connect to two discrete aggregators (e.g., two independent switches, each
with LACP enabled), thus creating two separate aggregation groups within a
single bond.

	The cause is a change in 7bb11dc9f59d to no longer set
AD_PORT_BEGIN on a port after a link state change, which would cause the
port to be reselected for attachment to an aggregator as if were newly
added to the bond.  We cannot restore the prior behavior, as it
contradicts IEEE 802.1AX 5.4.12, which requires ports that "become
inoperable" (lose carrier, setting port_enabled=false as per 802.1AX
5.4.7) to remain selected (i.e., assigned to the aggregator).  As the port
now remains selected, the aggregator selection logic is not invoked.

	A side effect of this change is that aggregators in bonding will
now contain ports that are link down.  The aggregator selection logic
does not currently handle this situation correctly, causing incorrect
aggregator selection.

	This patch makes two changes to repair the aggregator selection
logic in bonding to function as documented and within the confines of the
standard:

	First, the aggregator selection and related logic now utilizes the
number of active ports per aggregator, not the number of selected ports
(as some selected ports may be down).  The ad_select "bandwidth" and
"count" options only consider ports that are link up.

	Second, on any carrier state change of any slave, the aggregator
selection logic is explicitly called to insure the correct aggregator is
active.

Reported-by: Veli-Matti Lintu &lt;veli-matti.lintu@opinsys.fi&gt;
Fixes: 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated.")
Signed-off-by: Jay Vosburgh &lt;jay.vosburgh@canonical.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: fix bond_get_stats()</title>
<updated>2016-03-19T03:14:15+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-03-18T00:23:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fe30937b65354c7fec244caebbdaae68e28ca797'/>
<id>fe30937b65354c7fec244caebbdaae68e28ca797</id>
<content type='text'>
bond_get_stats() can be called from rtnetlink (with RTNL held)
or from /proc/net/dev seq handler (with RCU held)

The logic added in commit 5f0c5f73e5ef ("bonding: make global bonding
stats more reliable") kind of assumed only one cpu could run there.

If multiple threads are reading /proc/net/dev, stats can be really
messed up after a while.

A second problem is that some fields are 32bit, so we need to properly
handle the wrap around problem.

Given that RTNL is not always held, we need to use
bond_for_each_slave_rcu().

Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Cc: Jay Vosburgh &lt;j.vosburgh@gmail.com&gt;
Cc: Veaceslav Falico &lt;vfalico@gmail.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bond_get_stats() can be called from rtnetlink (with RTNL held)
or from /proc/net/dev seq handler (with RCU held)

The logic added in commit 5f0c5f73e5ef ("bonding: make global bonding
stats more reliable") kind of assumed only one cpu could run there.

If multiple threads are reading /proc/net/dev, stats can be really
messed up after a while.

A second problem is that some fields are 32bit, so we need to properly
handle the wrap around problem.

Given that RTNL is not always held, we need to use
bond_for_each_slave_rcu().

Fixes: 5f0c5f73e5ef ("bonding: make global bonding stats more reliable")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Cc: Jay Vosburgh &lt;j.vosburgh@gmail.com&gt;
Cc: Veaceslav Falico &lt;vfalico@gmail.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: remove duplicate set of flag IFF_MULTICAST</title>
<updated>2016-03-19T02:14:47+00:00</updated>
<author>
<name>Zhang Shengju</name>
<email>zhangshengju@cmss.chinamobile.com</email>
</author>
<published>2016-03-16T09:59:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1098cee6e1b11e4fe80b2765eb54d412d4294f03'/>
<id>1098cee6e1b11e4fe80b2765eb54d412d4294f03</id>
<content type='text'>
Remove unnecessary set of flag IFF_MULTICAST, since ether_setup
already does this.

Signed-off-by: Zhang Shengju &lt;zhangshengju@cmss.chinamobile.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove unnecessary set of flag IFF_MULTICAST, since ether_setup
already does this.

Signed-off-by: Zhang Shengju &lt;zhangshengju@cmss.chinamobile.com&gt;
Reviewed-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: bonding: use __ethtool_get_ksettings</title>
<updated>2016-02-26T03:06:46+00:00</updated>
<author>
<name>David Decotigny</name>
<email>decot@googlers.com</email>
</author>
<published>2016-02-24T18:58:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9856909c2abb32a795d9782bff2ce987696ffb9b'/>
<id>9856909c2abb32a795d9782bff2ce987696ffb9b</id>
<content type='text'>
Signed-off-by: David Decotigny &lt;decot@googlers.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: David Decotigny &lt;decot@googlers.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2016-02-23T05:09:14+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2016-02-23T05:09:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b633353115e352d3c31c12d4c61978c810f05ea1'/>
<id>b633353115e352d3c31c12d4c61978c810f05ea1</id>
<content type='text'>
Conflicts:
	drivers/net/phy/bcm7xxx.c
	drivers/net/phy/marvell.c
	drivers/net/vxlan.c

All three conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	drivers/net/phy/bcm7xxx.c
	drivers/net/phy/marvell.c
	drivers/net/vxlan.c

All three conflicts were cases of simple overlapping changes.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: don't use stale speed and duplex information</title>
<updated>2016-02-16T20:14:01+00:00</updated>
<author>
<name>Jay Vosburgh</name>
<email>jay.vosburgh@canonical.com</email>
</author>
<published>2016-02-08T20:10:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=266b495f11d6706018f66250cb02a788ff2490d7'/>
<id>266b495f11d6706018f66250cb02a788ff2490d7</id>
<content type='text'>
There is presently a race condition between the bonding periodic
link monitor and the updating of a slave's speed and duplex.  The former
occurs on a periodic basis, and the latter in response to a driver's
calling of netif_carrier_on.

	It is possible for the periodic monitor to run between the
driver call of netif_carrier_on and the receipt of the NETDEV_CHANGE
event that causes bonding to update the slave's speed and duplex.  This
manifests most notably as a report that a slave is up and "0 Mbps full
duplex" after enslavement, but in principle could report an incorrect
speed and duplex after any link up event if the device comes up with a
different speed or duplex.  This affects the 802.3ad aggregator
selection, as the speed and duplex are selection criteria.

	This is fixed by updating the speed and duplex in the periodic
monitor, prior to using that information.

	This was done historically in bonding, but the call to
bond_update_speed_duplex was removed in commit 876254ae2758 ("bonding:
don't call update_speed_duplex() under spinlocks"), as it might sleep
under lock.  Later, the locking was changed to only hold RTNL, and so
after commit 876254ae2758 ("bonding: don't call update_speed_duplex()
under spinlocks") this call is again safe.

Tested-by: "Tantilov, Emil S" &lt;emil.s.tantilov@intel.com&gt;
Cc: Veaceslav Falico &lt;vfalico@gmail.com&gt;
Cc: dingtianhong &lt;dingtianhong@huawei.com&gt;
Fixes: 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks")
Signed-off-by: Jay Vosburgh &lt;jay.vosburgh@canonical.com&gt;
Acked-by: Ding Tianhong &lt;dingtianhong@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is presently a race condition between the bonding periodic
link monitor and the updating of a slave's speed and duplex.  The former
occurs on a periodic basis, and the latter in response to a driver's
calling of netif_carrier_on.

	It is possible for the periodic monitor to run between the
driver call of netif_carrier_on and the receipt of the NETDEV_CHANGE
event that causes bonding to update the slave's speed and duplex.  This
manifests most notably as a report that a slave is up and "0 Mbps full
duplex" after enslavement, but in principle could report an incorrect
speed and duplex after any link up event if the device comes up with a
different speed or duplex.  This affects the 802.3ad aggregator
selection, as the speed and duplex are selection criteria.

	This is fixed by updating the speed and duplex in the periodic
monitor, prior to using that information.

	This was done historically in bonding, but the call to
bond_update_speed_duplex was removed in commit 876254ae2758 ("bonding:
don't call update_speed_duplex() under spinlocks"), as it might sleep
under lock.  Later, the locking was changed to only hold RTNL, and so
after commit 876254ae2758 ("bonding: don't call update_speed_duplex()
under spinlocks") this call is again safe.

Tested-by: "Tantilov, Emil S" &lt;emil.s.tantilov@intel.com&gt;
Cc: Veaceslav Falico &lt;vfalico@gmail.com&gt;
Cc: dingtianhong &lt;dingtianhong@huawei.com&gt;
Fixes: 876254ae2758 ("bonding: don't call update_speed_duplex() under spinlocks")
Signed-off-by: Jay Vosburgh &lt;jay.vosburgh@canonical.com&gt;
Acked-by: Ding Tianhong &lt;dingtianhong@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bonding: Fix ARP monitor validation</title>
<updated>2016-02-13T10:54:55+00:00</updated>
<author>
<name>Jay Vosburgh</name>
<email>jay.vosburgh@canonical.com</email>
</author>
<published>2016-02-02T21:35:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=21a75f0915dde8674708b39abfcda113911c49b1'/>
<id>21a75f0915dde8674708b39abfcda113911c49b1</id>
<content type='text'>
The current logic in bond_arp_rcv will accept an incoming ARP for
validation if (a) the receiving slave is either "active" (which includes
the currently active slave, or the current ARP slave) or, (b) there is a
currently active slave, and it has received an ARP since it became active.
For case (b), the receiving slave isn't the currently active slave, and is
receiving the original broadcast ARP request, not an ARP reply from the
target.

	This logic can fail if there is no currently active slave.  In
this situation, the ARP probe logic cycles through all slaves, assigning
each in turn as the "current_arp_slave" for one arp_interval, then setting
that one as "active," and sending an ARP probe from that slave.  The
current logic expects the ARP reply to arrive on the sending
current_arp_slave, however, due to switch FDB updating delays, the reply
may be directed to another slave.

	This can arise if the bonding slaves and switch are working, but
the ARP target is not responding.  When the ARP target recovers, a
condition may result wherein the ARP target host replies faster than the
switch can update its forwarding table, causing each ARP reply to be sent
to the previous current_arp_slave.  This will never pass the logic in
bond_arp_rcv, as neither of the above conditions (a) or (b) are met.

	Some experimentation on a LAN shows ARP reply round trips in the
200 usec range, but my available switches never update their FDB in less
than 4000 usec.

	This patch changes the logic in bond_arp_rcv to additionally
accept an ARP reply for validation on any slave if there is a current ARP
slave and it sent an ARP probe during the previous arp_interval.

Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works")
Cc: Veaceslav Falico &lt;vfalico@gmail.com&gt;
Cc: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Signed-off-by: Jay Vosburgh &lt;jay.vosburgh@canonical.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current logic in bond_arp_rcv will accept an incoming ARP for
validation if (a) the receiving slave is either "active" (which includes
the currently active slave, or the current ARP slave) or, (b) there is a
currently active slave, and it has received an ARP since it became active.
For case (b), the receiving slave isn't the currently active slave, and is
receiving the original broadcast ARP request, not an ARP reply from the
target.

	This logic can fail if there is no currently active slave.  In
this situation, the ARP probe logic cycles through all slaves, assigning
each in turn as the "current_arp_slave" for one arp_interval, then setting
that one as "active," and sending an ARP probe from that slave.  The
current logic expects the ARP reply to arrive on the sending
current_arp_slave, however, due to switch FDB updating delays, the reply
may be directed to another slave.

	This can arise if the bonding slaves and switch are working, but
the ARP target is not responding.  When the ARP target recovers, a
condition may result wherein the ARP target host replies faster than the
switch can update its forwarding table, causing each ARP reply to be sent
to the previous current_arp_slave.  This will never pass the logic in
bond_arp_rcv, as neither of the above conditions (a) or (b) are met.

	Some experimentation on a LAN shows ARP reply round trips in the
200 usec range, but my available switches never update their FDB in less
than 4000 usec.

	This patch changes the logic in bond_arp_rcv to additionally
accept an ARP reply for validation on any slave if there is a current ARP
slave and it sent an ARP probe during the previous arp_interval.

Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works")
Cc: Veaceslav Falico &lt;vfalico@gmail.com&gt;
Cc: Andy Gospodarek &lt;gospo@cumulusnetworks.com&gt;
Signed-off-by: Jay Vosburgh &lt;jay.vosburgh@canonical.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
