<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/tipc, branch v4.19</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>tipc: fix info leak from kernel tipc_event</title>
<updated>2018-10-18T23:49:53+00:00</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2018-10-18T15:38:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b06f9d9f1a907dd03f203e2ce9e27e318c22ba01'/>
<id>b06f9d9f1a907dd03f203e2ce9e27e318c22ba01</id>
<content type='text'>
We initialize a struct tipc_event allocated on the kernel stack to
zero to avert info leak to user space.

Reported-by: syzbot+057458894bc8cada4dee@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We initialize a struct tipc_event allocated on the kernel stack to
zero to avert info leak to user space.

Reported-by: syzbot+057458894bc8cada4dee@syzkaller.appspotmail.com
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tipc: fix unsafe rcu locking when accessing publication list</title>
<updated>2018-10-16T05:33:27+00:00</updated>
<author>
<name>Tung Nguyen</name>
<email>tung.q.nguyen@dektech.com.au</email>
</author>
<published>2018-10-12T20:46:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d3092b2efca1cd1d492d0b08499a2066c5ca8cec'/>
<id>d3092b2efca1cd1d492d0b08499a2066c5ca8cec</id>
<content type='text'>
The binding table's 'cluster_scope' list is rcu protected to handle
races between threads changing the list and those traversing the list at
the same moment. We have now found that the function named_distribute()
uses the regular list_for_each() macro to traverse the said list.
Likewise, the function tipc_named_withdraw() is removing items from the
same list using the regular list_del() call. When these two functions
execute in parallel we see occasional crashes.

This commit fixes this by adding the missing _rcu() suffixes.

Signed-off-by: Tung Nguyen &lt;tung.q.nguyen@dektech.com.au&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The binding table's 'cluster_scope' list is rcu protected to handle
races between threads changing the list and those traversing the list at
the same moment. We have now found that the function named_distribute()
uses the regular list_for_each() macro to traverse the said list.
Likewise, the function tipc_named_withdraw() is removing items from the
same list using the regular list_del() call. When these two functions
execute in parallel we see occasional crashes.

This commit fixes this by adding the missing _rcu() suffixes.

Signed-off-by: Tung Nguyen &lt;tung.q.nguyen@dektech.com.au&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tipc: initialize broadcast link stale counter correctly</title>
<updated>2018-10-16T05:03:34+00:00</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2018-10-11T20:02:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4af00f4cc1ba34da4654ac31830843cae871642d'/>
<id>4af00f4cc1ba34da4654ac31830843cae871642d</id>
<content type='text'>
In the commit referred to below we added link tolerance as an additional
criteria for declaring broadcast transmission "stale" and resetting the
unicast links to the affected node.

Unfortunately, this 'improvement' introduced two bugs, which each and
one alone cause only limited problems, but combined lead to seemingly
stochastic unicast link resets, depending on the amount of broadcast
traffic transmitted.

The first issue, a missing initialization of the 'tolerance' field of
the receiver broadcast link, was recently fixed by commit 047491ea334a
("tipc: set link tolerance correctly in broadcast link").

Ths second issue, where we omit to reset the 'stale_cnt' field of
the same link after a 'stale' period is over, leads to this counter
accumulating over time, and in the absence of the 'tolerance' criteria
leads to the above described symptoms. This commit adds the missing
initialization.

Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet retransmission")
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the commit referred to below we added link tolerance as an additional
criteria for declaring broadcast transmission "stale" and resetting the
unicast links to the affected node.

Unfortunately, this 'improvement' introduced two bugs, which each and
one alone cause only limited problems, but combined lead to seemingly
stochastic unicast link resets, depending on the amount of broadcast
traffic transmitted.

The first issue, a missing initialization of the 'tolerance' field of
the receiver broadcast link, was recently fixed by commit 047491ea334a
("tipc: set link tolerance correctly in broadcast link").

Ths second issue, where we omit to reset the 'stale_cnt' field of
the same link after a 'stale' period is over, leads to this counter
accumulating over time, and in the absence of the 'tolerance' criteria
leads to the above described symptoms. This commit adds the missing
initialization.

Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet retransmission")
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tipc: eliminate possible recursive locking detected by LOCKDEP</title>
<updated>2018-10-11T17:23:48+00:00</updated>
<author>
<name>Ying Xue</name>
<email>ying.xue@windriver.com</email>
</author>
<published>2018-10-11T11:57:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a1f8dd34e64af689e95122921fb2ca83dedd4c4e'/>
<id>a1f8dd34e64af689e95122921fb2ca83dedd4c4e</id>
<content type='text'>
When booting kernel with LOCKDEP option, below warning info was found:

WARNING: possible recursive locking detected
4.19.0-rc7+ #14 Not tainted
--------------------------------------------
swapper/0/1 is trying to acquire lock:
00000000dcfc0fc8 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at: spin_lock_bh
include/linux/spinlock.h:334 [inline]
00000000dcfc0fc8 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at:
tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850

but task is already holding lock:
00000000cbb9b036 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at: spin_lock_bh
include/linux/spinlock.h:334 [inline]
00000000cbb9b036 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at:
tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&amp;(&amp;list-&gt;lock)-&gt;rlock#4);
  lock(&amp;(&amp;list-&gt;lock)-&gt;rlock#4);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by swapper/0/1:
 #0: 00000000f7539d34 (pernet_ops_rwsem){+.+.}, at:
register_pernet_subsys+0x19/0x40 net/core/net_namespace.c:1051
 #1: 00000000cbb9b036 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at:
spin_lock_bh include/linux/spinlock.h:334 [inline]
 #1: 00000000cbb9b036 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at:
tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc7+ #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1af/0x295 lib/dump_stack.c:113
 print_deadlock_bug kernel/locking/lockdep.c:1759 [inline]
 check_deadlock kernel/locking/lockdep.c:1803 [inline]
 validate_chain kernel/locking/lockdep.c:2399 [inline]
 __lock_acquire+0xf1e/0x3c60 kernel/locking/lockdep.c:3411
 lock_acquire+0x1db/0x520 kernel/locking/lockdep.c:3900
 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
 _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
 spin_lock_bh include/linux/spinlock.h:334 [inline]
 tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850
 tipc_link_bc_create+0xb5/0x1f0 net/tipc/link.c:526
 tipc_bcast_init+0x59b/0xab0 net/tipc/bcast.c:521
 tipc_init_net+0x472/0x610 net/tipc/core.c:82
 ops_init+0xf7/0x520 net/core/net_namespace.c:129
 __register_pernet_operations net/core/net_namespace.c:940 [inline]
 register_pernet_operations+0x453/0xac0 net/core/net_namespace.c:1011
 register_pernet_subsys+0x28/0x40 net/core/net_namespace.c:1052
 tipc_init+0x83/0x104 net/tipc/core.c:140
 do_one_initcall+0x109/0x70a init/main.c:885
 do_initcall_level init/main.c:953 [inline]
 do_initcalls init/main.c:961 [inline]
 do_basic_setup init/main.c:979 [inline]
 kernel_init_freeable+0x4bd/0x57f init/main.c:1144
 kernel_init+0x13/0x180 init/main.c:1063
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

The reason why the noise above was complained by LOCKDEP is because we
nested to hold l-&gt;wakeupq.lock and l-&gt;inputq-&gt;lock in tipc_link_reset
function. In fact it's unnecessary to move skb buffer from l-&gt;wakeupq
queue to l-&gt;inputq queue while holding the two locks at the same time.
Instead, we can move skb buffers in l-&gt;wakeupq queue to a temporary
list first and then move the buffers of the temporary list to l-&gt;inputq
queue, which is also safe for us.

Fixes: 3f32d0be6c16 ("tipc: lock wakeup &amp; inputq at tipc_link_reset()")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Acked-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When booting kernel with LOCKDEP option, below warning info was found:

WARNING: possible recursive locking detected
4.19.0-rc7+ #14 Not tainted
--------------------------------------------
swapper/0/1 is trying to acquire lock:
00000000dcfc0fc8 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at: spin_lock_bh
include/linux/spinlock.h:334 [inline]
00000000dcfc0fc8 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at:
tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850

but task is already holding lock:
00000000cbb9b036 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at: spin_lock_bh
include/linux/spinlock.h:334 [inline]
00000000cbb9b036 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at:
tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&amp;(&amp;list-&gt;lock)-&gt;rlock#4);
  lock(&amp;(&amp;list-&gt;lock)-&gt;rlock#4);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by swapper/0/1:
 #0: 00000000f7539d34 (pernet_ops_rwsem){+.+.}, at:
register_pernet_subsys+0x19/0x40 net/core/net_namespace.c:1051
 #1: 00000000cbb9b036 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at:
spin_lock_bh include/linux/spinlock.h:334 [inline]
 #1: 00000000cbb9b036 (&amp;(&amp;list-&gt;lock)-&gt;rlock#4){+...}, at:
tipc_link_reset+0xfa/0xdf0 net/tipc/link.c:849

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc7+ #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1af/0x295 lib/dump_stack.c:113
 print_deadlock_bug kernel/locking/lockdep.c:1759 [inline]
 check_deadlock kernel/locking/lockdep.c:1803 [inline]
 validate_chain kernel/locking/lockdep.c:2399 [inline]
 __lock_acquire+0xf1e/0x3c60 kernel/locking/lockdep.c:3411
 lock_acquire+0x1db/0x520 kernel/locking/lockdep.c:3900
 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
 _raw_spin_lock_bh+0x31/0x40 kernel/locking/spinlock.c:168
 spin_lock_bh include/linux/spinlock.h:334 [inline]
 tipc_link_reset+0x125/0xdf0 net/tipc/link.c:850
 tipc_link_bc_create+0xb5/0x1f0 net/tipc/link.c:526
 tipc_bcast_init+0x59b/0xab0 net/tipc/bcast.c:521
 tipc_init_net+0x472/0x610 net/tipc/core.c:82
 ops_init+0xf7/0x520 net/core/net_namespace.c:129
 __register_pernet_operations net/core/net_namespace.c:940 [inline]
 register_pernet_operations+0x453/0xac0 net/core/net_namespace.c:1011
 register_pernet_subsys+0x28/0x40 net/core/net_namespace.c:1052
 tipc_init+0x83/0x104 net/tipc/core.c:140
 do_one_initcall+0x109/0x70a init/main.c:885
 do_initcall_level init/main.c:953 [inline]
 do_initcalls init/main.c:961 [inline]
 do_basic_setup init/main.c:979 [inline]
 kernel_init_freeable+0x4bd/0x57f init/main.c:1144
 kernel_init+0x13/0x180 init/main.c:1063
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413

The reason why the noise above was complained by LOCKDEP is because we
nested to hold l-&gt;wakeupq.lock and l-&gt;inputq-&gt;lock in tipc_link_reset
function. In fact it's unnecessary to move skb buffer from l-&gt;wakeupq
queue to l-&gt;inputq queue while holding the two locks at the same time.
Instead, we can move skb buffers in l-&gt;wakeupq queue to a temporary
list first and then move the buffers of the temporary list to l-&gt;inputq
queue, which is also safe for us.

Fixes: 3f32d0be6c16 ("tipc: lock wakeup &amp; inputq at tipc_link_reset()")
Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Signed-off-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Acked-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tipc: queue socket protocol error messages into socket receive buffer</title>
<updated>2018-10-11T05:56:07+00:00</updated>
<author>
<name>Parthasarathy Bhuvaragan</name>
<email>parthasarathy.bhuvaragan@ericsson.com</email>
</author>
<published>2018-10-10T15:50:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e7eb05823806502747eadc31039cecfd7836ddeb'/>
<id>e7eb05823806502747eadc31039cecfd7836ddeb</id>
<content type='text'>
In tipc_sk_filter_rcv(), when we detect protocol messages with error we
call tipc_sk_conn_proto_rcv() and let it reset the connection and notify
the socket by calling sk-&gt;sk_state_change().

However, tipc_sk_filter_rcv() may have been called from the function
tipc_backlog_rcv(), in which case the socket lock is held and the socket
already awake. This means that the sk_state_change() call is ignored and
the error notification lost. Now the receive queue will remain empty and
the socket sleeps forever.

In this commit, we convert the protocol message into a connection abort
message and enqueue it into the socket's receive queue. By this addition
to the above state change we cover all conditions.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Parthasarathy Bhuvaragan &lt;parthasarathy.bhuvaragan@ericsson.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In tipc_sk_filter_rcv(), when we detect protocol messages with error we
call tipc_sk_conn_proto_rcv() and let it reset the connection and notify
the socket by calling sk-&gt;sk_state_change().

However, tipc_sk_filter_rcv() may have been called from the function
tipc_backlog_rcv(), in which case the socket lock is held and the socket
already awake. This means that the sk_state_change() call is ignored and
the error notification lost. Now the receive queue will remain empty and
the socket sleeps forever.

In this commit, we convert the protocol message into a connection abort
message and enqueue it into the socket's receive queue. By this addition
to the above state change we cover all conditions.

Acked-by: Ying Xue &lt;ying.xue@windriver.com&gt;
Signed-off-by: Parthasarathy Bhuvaragan &lt;parthasarathy.bhuvaragan@ericsson.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tipc: set link tolerance correctly in broadcast link</title>
<updated>2018-10-11T05:56:07+00:00</updated>
<author>
<name>Jon Maloy</name>
<email>jon.maloy@ericsson.com</email>
</author>
<published>2018-10-10T15:34:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=047491ea334a454fa0647ec99dadcc6dd38417e0'/>
<id>047491ea334a454fa0647ec99dadcc6dd38417e0</id>
<content type='text'>
In the patch referred to below we added link tolerance as an additional
criteria for declaring broadcast transmission "stale" and resetting the
affected links.

However, the 'tolerance' field of the broadcast link is never set, and
remains at zero. This renders the whole commit without the intended
improving effect, but luckily also with no negative effect.

In this commit we add the missing initialization.

Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet retransmission")
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the patch referred to below we added link tolerance as an additional
criteria for declaring broadcast transmission "stale" and resetting the
affected links.

However, the 'tolerance' field of the broadcast link is never set, and
remains at zero. This renders the whole commit without the intended
improving effect, but luckily also with no negative effect.

In this commit we add the missing initialization.

Fixes: a4dc70d46cf1 ("tipc: extend link reset criteria for stale packet retransmission")
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tipc: ignore STATE_MSG on wrong link session</title>
<updated>2018-10-02T05:35:30+00:00</updated>
<author>
<name>LUU Duc Canh</name>
<email>canh.d.luu@dektech.com.au</email>
</author>
<published>2018-09-26T20:28:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d949cfedbcbab4e91590576cbace2671924ad69c'/>
<id>d949cfedbcbab4e91590576cbace2671924ad69c</id>
<content type='text'>
The initial session number when a link is created is based on a random
value, taken from struct tipc_net-&gt;random. It is then incremented for
each link reset to avoid mixing protocol messages from different link
sessions.

However, when a bearer is reset all its links are deleted, and will
later be re-created using the same random value as the first time.
This means that if the link never went down between creation and
deletion we will still sometimes have two subsequent sessions with
the same session number. In virtual environments with potentially
long transmission times this has turned out to be a real problem.

We now fix this by randomizing the session number each time a link
is created.

With a session number size of 16 bits this gives a risk of session
collision of 1/64k. To reduce this further, we also introduce a sanity
check on the very first STATE message arriving at a link. If this has
an acknowledge value differing from 0, which is logically impossible,
we ignore the message. The final risk for session collision is hence
reduced to 1/4G, which should be sufficient.

Signed-off-by: LUU Duc Canh &lt;canh.d.luu@dektech.com.au&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The initial session number when a link is created is based on a random
value, taken from struct tipc_net-&gt;random. It is then incremented for
each link reset to avoid mixing protocol messages from different link
sessions.

However, when a bearer is reset all its links are deleted, and will
later be re-created using the same random value as the first time.
This means that if the link never went down between creation and
deletion we will still sometimes have two subsequent sessions with
the same session number. In virtual environments with potentially
long transmission times this has turned out to be a real problem.

We now fix this by randomizing the session number each time a link
is created.

With a session number size of 16 bits this gives a risk of session
collision of 1/64k. To reduce this further, we also introduce a sanity
check on the very first STATE message arriving at a link. If this has
an acknowledge value differing from 0, which is logically impossible,
we ignore the message. The final risk for session collision is hence
reduced to 1/4G, which should be sufficient.

Signed-off-by: LUU Duc Canh &lt;canh.d.luu@dektech.com.au&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tipc: fix failover problem</title>
<updated>2018-09-29T18:45:14+00:00</updated>
<author>
<name>LUU Duc Canh</name>
<email>canh.d.luu@dektech.com.au</email>
</author>
<published>2018-09-26T19:00:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c140eb166d681f66bd7e99fb121357db1a503e7f'/>
<id>c140eb166d681f66bd7e99fb121357db1a503e7f</id>
<content type='text'>
We see the following scenario:
1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
   Since there is a second working link, failover procedure is started.
2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
   A on node 2. The node item 1-&gt;2 goes to state FAILINGOVER.
3) Linke endpoint A/2 receives the failover, and is supposed to take
   down its parallell link endpoint B/2, while producing a FAILOVER
   message to send back to A/1.
4) However, B/2 has already been deleted, so no FAILOVER message can
   created.
5) Node 1-&gt;2 remains in state FAILINGOVER forever, refusing to receive
   any messages that can bring B/1 up again. We are left with a non-
   redundant link between node 1 and 2.

We fix this with letting endpoint A/2 build a dummy FAILOVER message
to send to back to A/1, so that the situation can be resolved.

Signed-off-by: LUU Duc Canh &lt;canh.d.luu@dektech.com.au&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We see the following scenario:
1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
   Since there is a second working link, failover procedure is started.
2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
   A on node 2. The node item 1-&gt;2 goes to state FAILINGOVER.
3) Linke endpoint A/2 receives the failover, and is supposed to take
   down its parallell link endpoint B/2, while producing a FAILOVER
   message to send back to A/1.
4) However, B/2 has already been deleted, so no FAILOVER message can
   created.
5) Node 1-&gt;2 remains in state FAILINGOVER forever, refusing to receive
   any messages that can bring B/1 up again. We are left with a non-
   redundant link between node 1 and 2.

We fix this with letting endpoint A/2 build a dummy FAILOVER message
to send to back to A/1, so that the situation can be resolved.

Signed-off-by: LUU Duc Canh &lt;canh.d.luu@dektech.com.au&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tipc: lock wakeup &amp; inputq at tipc_link_reset()</title>
<updated>2018-09-26T03:48:56+00:00</updated>
<author>
<name>Parthasarathy Bhuvaragan</name>
<email>parthasarathy.bhuvaragan@ericsson.com</email>
</author>
<published>2018-09-25T20:09:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3f32d0be6c16b902b687453c962d17eea5b8ea19'/>
<id>3f32d0be6c16b902b687453c962d17eea5b8ea19</id>
<content type='text'>
In tipc_link_reset() we copy the wakeup queue to input queue using
skb_queue_splice_init(link-&gt;wakeupq, link-&gt;inputq).
This is performed without holding any locks. The lists might be
simultaneously be accessed by other cpu threads in tipc_sk_rcv(),
something leading to to random missing packets.

Signed-off-by: Parthasarathy Bhuvaragan &lt;parthasarathy.bhuvaragan@ericsson.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In tipc_link_reset() we copy the wakeup queue to input queue using
skb_queue_splice_init(link-&gt;wakeupq, link-&gt;inputq).
This is performed without holding any locks. The lists might be
simultaneously be accessed by other cpu threads in tipc_sk_rcv(),
something leading to to random missing packets.

Signed-off-by: Parthasarathy Bhuvaragan &lt;parthasarathy.bhuvaragan@ericsson.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tipc: reset bearer if device carrier not ok</title>
<updated>2018-09-26T03:48:56+00:00</updated>
<author>
<name>Parthasarathy Bhuvaragan</name>
<email>parthasarathy.bhuvaragan@ericsson.com</email>
</author>
<published>2018-09-25T19:56:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=94b6ddce71780575fbbf9d2c36afc8440e61a281'/>
<id>94b6ddce71780575fbbf9d2c36afc8440e61a281</id>
<content type='text'>
If we detect that under lying carrier detects errors and goes down,
we reset the bearer.

Signed-off-by: Parthasarathy Bhuvaragan &lt;parthasarathy.bhuvaragan@ericsson.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we detect that under lying carrier detects errors and goes down,
we reset the bearer.

Signed-off-by: Parthasarathy Bhuvaragan &lt;parthasarathy.bhuvaragan@ericsson.com&gt;
Signed-off-by: Jon Maloy &lt;jon.maloy@ericsson.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
