<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/sctp/outqueue.c, branch linux-3.6.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>sctp: Implement quick failover draft from tsvwg</title>
<updated>2012-07-22T19:13:46+00:00</updated>
<author>
<name>Neil Horman</name>
<email>nhorman@tuxdriver.com</email>
</author>
<published>2012-07-21T07:56:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5aa93bcf66f4af094d6f11096e81d5501a0b4ba5'/>
<id>5aa93bcf66f4af094d6f11096e81d5501a0b4ba5</id>
<content type='text'>
I've seen several attempts recently made to do quick failover of sctp transports
by reducing various retransmit timers and counters.  While its possible to
implement a faster failover on multihomed sctp associations, its not
particularly robust, in that it can lead to unneeded retransmits, as well as
false connection failures due to intermittent latency on a network.

Instead, lets implement the new ietf quick failover draft found here:
http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

This will let the sctp stack identify transports that have had a small number of
errors, and avoid using them quickly until their reliability can be
re-established.  I've tested this out on two virt guests connected via multiple
isolated virt networks and believe its in compliance with the above draft and
works well.

Signed-off-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
CC: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
CC: Sridhar Samudrala &lt;sri@us.ibm.com&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: linux-sctp@vger.kernel.org
CC: joe@perches.com
Acked-by: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
I've seen several attempts recently made to do quick failover of sctp transports
by reducing various retransmit timers and counters.  While its possible to
implement a faster failover on multihomed sctp associations, its not
particularly robust, in that it can lead to unneeded retransmits, as well as
false connection failures due to intermittent latency on a network.

Instead, lets implement the new ietf quick failover draft found here:
http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05

This will let the sctp stack identify transports that have had a small number of
errors, and avoid using them quickly until their reliability can be
re-established.  I've tested this out on two virt guests connected via multiple
isolated virt networks and believe its in compliance with the above draft and
works well.

Signed-off-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
CC: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
CC: Sridhar Samudrala &lt;sri@us.ibm.com&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: linux-sctp@vger.kernel.org
CC: joe@perches.com
Acked-by: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: cleanup unsigned to unsigned int</title>
<updated>2012-04-15T16:44:40+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>eric.dumazet@gmail.com</email>
</author>
<published>2012-04-15T05:58:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=95c961747284a6b83a5e2d81240e214b0fa3464d'/>
<id>95c961747284a6b83a5e2d81240e214b0fa3464d</id>
<content type='text'>
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use of "unsigned int" is preferred to bare "unsigned" in net tree.

Signed-off-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd</title>
<updated>2011-12-20T18:58:37+00:00</updated>
<author>
<name>Thomas Graf</name>
<email>tgraf@redhat.com</email>
</author>
<published>2011-12-19T04:11:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a76c0adf60f6ca5ff3481992e4ea0383776b24d2'/>
<id>a76c0adf60f6ca5ff3481992e4ea0383776b24d2</id>
<content type='text'>
When checking whether a DATA chunk fits into the estimated rwnd a
full sizeof(struct sk_buff) is added to the needed chunk size. This
quickly exhausts the available rwnd space and leads to packets being
sent which are much below the PMTU limit. This can lead to much worse
performance.

The reason for this behaviour was to avoid putting too much memory
pressure on the receiver. The concept is not completely irational
because a Linux receiver does in fact clone an skb for each DATA chunk
delivered. However, Linux also reserves half the available socket
buffer space for data structures therefore usage of it is already
accounted for.

When proposing to change this the last time it was noted that this
behaviour was introduced to solve a performance issue caused by rwnd
overusage in combination with small DATA chunks.

Trying to reproduce this I found that with the sk_buff overhead removed,
the performance would improve significantly unless socket buffer limits
are increased.

The following numbers have been gathered using a patched iperf
supporting SCTP over a live 1 Gbit ethernet network. The -l option
was used to limit DATA chunk sizes. The numbers listed are based on
the average of 3 test runs each. Default values have been used for
sk_(r|w)mem.

Chunk
Size    Unpatched     No Overhead
-------------------------------------
   4    15.2 Kbit [!]   12.2 Mbit [!]
   8    35.8 Kbit [!]   26.0 Mbit [!]
  16    95.5 Kbit [!]   54.4 Mbit [!]
  32   106.7 Mbit      102.3 Mbit
  64   189.2 Mbit      188.3 Mbit
 128   331.2 Mbit      334.8 Mbit
 256   537.7 Mbit      536.0 Mbit
 512   766.9 Mbit      766.6 Mbit
1024   810.1 Mbit      808.6 Mbit

Signed-off-by: Thomas Graf &lt;tgraf@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When checking whether a DATA chunk fits into the estimated rwnd a
full sizeof(struct sk_buff) is added to the needed chunk size. This
quickly exhausts the available rwnd space and leads to packets being
sent which are much below the PMTU limit. This can lead to much worse
performance.

The reason for this behaviour was to avoid putting too much memory
pressure on the receiver. The concept is not completely irational
because a Linux receiver does in fact clone an skb for each DATA chunk
delivered. However, Linux also reserves half the available socket
buffer space for data structures therefore usage of it is already
accounted for.

When proposing to change this the last time it was noted that this
behaviour was introduced to solve a performance issue caused by rwnd
overusage in combination with small DATA chunks.

Trying to reproduce this I found that with the sk_buff overhead removed,
the performance would improve significantly unless socket buffer limits
are increased.

The following numbers have been gathered using a patched iperf
supporting SCTP over a live 1 Gbit ethernet network. The -l option
was used to limit DATA chunk sizes. The numbers listed are based on
the average of 3 test runs each. Default values have been used for
sk_(r|w)mem.

Chunk
Size    Unpatched     No Overhead
-------------------------------------
   4    15.2 Kbit [!]   12.2 Mbit [!]
   8    35.8 Kbit [!]   26.0 Mbit [!]
  16    95.5 Kbit [!]   54.4 Mbit [!]
  32   106.7 Mbit      102.3 Mbit
  64   189.2 Mbit      188.3 Mbit
 128   331.2 Mbit      334.8 Mbit
 256   537.7 Mbit      536.0 Mbit
 512   766.9 Mbit      766.6 Mbit
1024   810.1 Mbit      808.6 Mbit

Signed-off-by: Thomas Graf &lt;tgraf@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: HEARTBEAT negotiation after ASCONF</title>
<updated>2011-08-25T02:41:09+00:00</updated>
<author>
<name>Michio Honda</name>
<email>micchie@sfc.wide.ad.jp</email>
</author>
<published>2011-06-16T01:54:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f207c050fb1c385e34946e57107e639831c7d557'/>
<id>f207c050fb1c385e34946e57107e639831c7d557</id>
<content type='text'>
This patch fixes BUG that the ASCONF receiver transmits DATA chunks
to the newly added UNCONFIRMED destination.

Signed-off-by: Michio Honda &lt;micchie@sfc.wide.ad.jp&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch fixes BUG that the ASCONF receiver transmits DATA chunks
to the newly added UNCONFIRMED destination.

Signed-off-by: Michio Honda &lt;micchie@sfc.wide.ad.jp&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6</title>
<updated>2011-07-14T14:56:40+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2011-07-14T14:56:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6a7ebdf2fd15417e87b4fd02ff411aeaca34da5f'/>
<id>6a7ebdf2fd15417e87b4fd02ff411aeaca34da5f</id>
<content type='text'>
Conflicts:
	net/bluetooth/l2cap_core.c
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Conflicts:
	net/bluetooth/l2cap_core.c
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: Enforce retransmission limit during shutdown</title>
<updated>2011-07-07T21:08:44+00:00</updated>
<author>
<name>Thomas Graf</name>
<email>tgraf@infradead.org</email>
</author>
<published>2011-07-07T00:28:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f8d9605243280f1870dd2c6c37a735b925c15f3c'/>
<id>f8d9605243280f1870dd2c6c37a735b925c15f3c</id>
<content type='text'>
When initiating a graceful shutdown while having data chunks
on the retransmission queue with a peer which is in zero
window mode the shutdown is never completed because the
retransmission error count is reset periodically by the
following two rules:

 - Do not timeout association while doing zero window probe.
 - Reset overall error count when a heartbeat request has
   been acknowledged.

The graceful shutdown will wait for all outstanding TSN to
be acknowledged before sending the SHUTDOWN request. This
never happens due to the peer's zero window not acknowledging
the continuously retransmitted data chunks. Although the
error counter is incremented for each failed retransmission,
the receiving of the SACK announcing the zero window clears
the error count again immediately. Also heartbeat requests
continue to be sent periodically. The peer acknowledges these
requests causing the error counter to be reset as well.

This patch changes behaviour to only reset the overall error
counter for the above rules while not in shutdown. After
reaching the maximum number of retransmission attempts, the
T5 shutdown guard timer is scheduled to give the receiver
some additional time to recover. The timer is stopped as soon
as the receiver acknowledges any data.

The issue can be easily reproduced by establishing a sctp
association over the loopback device, constantly queueing
data at the sender while not reading any at the receiver.
Wait for the window to reach zero, then initiate a shutdown
by killing both processes simultaneously. The association
will never be freed and the chunks on the retransmission
queue will be retransmitted indefinitely.

Signed-off-by: Thomas Graf &lt;tgraf@infradead.org&gt;
Acked-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When initiating a graceful shutdown while having data chunks
on the retransmission queue with a peer which is in zero
window mode the shutdown is never completed because the
retransmission error count is reset periodically by the
following two rules:

 - Do not timeout association while doing zero window probe.
 - Reset overall error count when a heartbeat request has
   been acknowledged.

The graceful shutdown will wait for all outstanding TSN to
be acknowledged before sending the SHUTDOWN request. This
never happens due to the peer's zero window not acknowledging
the continuously retransmitted data chunks. Although the
error counter is incremented for each failed retransmission,
the receiving of the SACK announcing the zero window clears
the error count again immediately. Also heartbeat requests
continue to be sent periodically. The peer acknowledges these
requests causing the error counter to be reset as well.

This patch changes behaviour to only reset the overall error
counter for the above rules while not in shutdown. After
reaching the maximum number of retransmission attempts, the
T5 shutdown guard timer is scheduled to give the receiver
some additional time to recover. The timer is stopped as soon
as the receiver acknowledges any data.

The issue can be easily reproduced by establishing a sctp
association over the loopback device, constantly queueing
data at the sender while not reading any at the receiver.
Wait for the window to reach zero, then initiate a shutdown
by killing both processes simultaneously. The association
will never be freed and the chunks on the retransmission
queue will be retransmitted indefinitely.

Signed-off-by: Thomas Graf &lt;tgraf@infradead.org&gt;
Acked-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: Add ASCONF operation on the single-homed host</title>
<updated>2011-06-02T09:04:53+00:00</updated>
<author>
<name>Michio Honda</name>
<email>micchie@sfc.wide.ad.jp</email>
</author>
<published>2011-04-26T11:19:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8a07eb0a50aebc8c95478d49c28c7f8419a26cef'/>
<id>8a07eb0a50aebc8c95478d49c28c7f8419a26cef</id>
<content type='text'>
In this case, the SCTP association transmits an ASCONF packet
including addition of the new IP address and deletion of the old
address.  This patch implements this functionality.
In this case, the ASCONF chunk is added to the beginning of the
queue, because the other chunks cannot be transmitted in this state.

Signed-off-by: Michio Honda &lt;micchie@sfc.wide.ad.jp&gt;
Signed-off-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Acked-by: Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In this case, the SCTP association transmits an ASCONF packet
including addition of the new IP address and deletion of the old
address.  This patch implements this functionality.
In this case, the ASCONF chunk is added to the beginning of the
queue, because the other chunks cannot be transmitted in this state.

Signed-off-by: Michio Honda &lt;micchie@sfc.wide.ad.jp&gt;
Signed-off-by: YOSHIFUJI Hideaki &lt;yoshfuji@linux-ipv6.org&gt;
Acked-by: Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: move chunk from retransmit queue to abandoned list</title>
<updated>2011-04-20T08:51:06+00:00</updated>
<author>
<name>Wei Yongjun</name>
<email>yjwei@cn.fujitsu.com</email>
</author>
<published>2011-04-19T21:32:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4c6a6f42131dd750dcfe3c71e63bfc046e5a227e'/>
<id>4c6a6f42131dd750dcfe3c71e63bfc046e5a227e</id>
<content type='text'>
If there is still data waiting to retransmit and remain in
retransmit queue, while doing the next retransmit, if the
chunk is abandoned, we should move it to abandoned list.

Signed-off-by: Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If there is still data waiting to retransmit and remain in
retransmit queue, while doing the next retransmit, if the
chunk is abandoned, we should move it to abandoned list.

Signed-off-by: Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: remove completely unsed EMPTY state</title>
<updated>2011-04-20T08:51:03+00:00</updated>
<author>
<name>Vlad Yasevich</name>
<email>vladislav.yasevich@hp.com</email>
</author>
<published>2011-04-19T21:28:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0b8f9e25b0aaf5a5d9fd844a97e5c17746b865d4'/>
<id>0b8f9e25b0aaf5a5d9fd844a97e5c17746b865d4</id>
<content type='text'>
SCTP does not SCTP_STATE_EMPTY and we can never be in
that state.  Remove useless code.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
SCTP does not SCTP_STATE_EMPTY and we can never be in
that state.  Remove useless code.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: teach CACC algorithm about removed transports</title>
<updated>2011-04-20T04:45:21+00:00</updated>
<author>
<name>Vlad Yasevich</name>
<email>vladislav.yasevich@hp.com</email>
</author>
<published>2011-04-18T19:13:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f246a7b7c5b9df0ea0f0807a7101995af5e83213'/>
<id>f246a7b7c5b9df0ea0f0807a7101995af5e83213</id>
<content type='text'>
When we have have to remove a transport due to ASCONF, we move
the data to a new active path.  This can trigger CACC algorithm
to not mark that data as missing when SACKs arrive.  This is
because the transport passed to the CACC algorithm is the one
this data is sitting on, not the one it was sent on (that one
may be gone).  So, by sending the original transport (even if
it's NULL), we may start marking data as missing.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we have have to remove a transport due to ASCONF, we move
the data to a new active path.  This can trigger CACC algorithm
to not mark that data as missing when SACKs arrive.  This is
because the transport passed to the CACC algorithm is the one
this data is sitting on, not the one it was sent on (that one
may be gone).  So, by sending the original transport (even if
it's NULL), we may start marking data as missing.

Signed-off-by: Vlad Yasevich &lt;vladislav.yasevich@hp.com&gt;
Signed-off-by: Wei Yongjun &lt;yjwei@cn.fujitsu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
