<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/sctp/ulpqueue.c, branch v4.7</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>sctp: signal sk_data_ready earlier on data chunks reception</title>
<updated>2016-05-02T01:06:10+00:00</updated>
<author>
<name>Marcelo Ricardo Leitner</name>
<email>marcelo.leitner@gmail.com</email>
</author>
<published>2016-04-29T17:17:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0970f5b3665933f5f0d069607c78fb10bd918b62'/>
<id>0970f5b3665933f5f0d069607c78fb10bd918b62</id>
<content type='text'>
Dave Miller pointed out that fb586f25300f ("sctp: delay calls to
sk_data_ready() as much as possible") may insert latency specially if
the receiving application is running on another CPU and that it would be
better if we signalled as early as possible.

This patch thus basically inverts the logic on fb586f25300f and signals
it as early as possible, similar to what we had before.

Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible")
Reported-by: Dave Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Dave Miller pointed out that fb586f25300f ("sctp: delay calls to
sk_data_ready() as much as possible") may insert latency specially if
the receiving application is running on another CPU and that it would be
better if we signalled as early as possible.

This patch thus basically inverts the logic on fb586f25300f and signals
it as early as possible, similar to what we had before.

Fixes: fb586f25300f ("sctp: delay calls to sk_data_ready() as much as possible")
Reported-by: Dave Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: simplify sk_receive_queue locking</title>
<updated>2016-04-15T21:22:20+00:00</updated>
<author>
<name>Marcelo Ricardo Leitner</name>
<email>marcelo.leitner@gmail.com</email>
</author>
<published>2016-04-13T22:12:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=311b21774f1389f9c34eac4da90c43c95fc2b62b'/>
<id>311b21774f1389f9c34eac4da90c43c95fc2b62b</id>
<content type='text'>
SCTP already serializes access to rcvbuf through its sock lock:
sctp_recvmsg takes it right in the start and release at the end, while
rx path will also take the lock before doing any socket processing. On
sctp_rcv() it will check if there is an user using the socket and, if
there is, it will queue incoming packets to the backlog. The backlog
processing will do the same. Even timers will do such check and
re-schedule if an user is using the socket.

Simplifying this will allow us to remove sctp_skb_list_tail and get ride
of some expensive lockings.  The lists that it is used on are also
mangled with functions like __skb_queue_tail and __skb_unlink in the
same context, like on sctp_ulpq_tail_event() and sctp_clear_pd().
sctp_close() will also purge those while using only the sock lock.

Therefore the lockings performed by sctp_skb_list_tail() are not
necessary. This patch removes this function and replaces its calls with
just skb_queue_splice_tail_init() instead.

The biggest gain is at sctp_ulpq_tail_event(), because the events always
contain a list, even if it's queueing a single skb and this was
triggering expensive calls to spin_lock_irqsave/_irqrestore for every
data chunk received.

As SCTP will deliver each data chunk on a corresponding recvmsg, the
more effective the change will be.
Before this patch, with chunks with 30 bytes:
netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000
400000 -s 400000 400000
on a 10Gbit link with 1500 MTU:

SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

425984 425984     30    60.00       137.45   7.34     7.36     52.504  52.608

With it:

SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

425984 425984     30    60.00       179.10   7.97     6.70     43.740  36.788

Signed-off-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
SCTP already serializes access to rcvbuf through its sock lock:
sctp_recvmsg takes it right in the start and release at the end, while
rx path will also take the lock before doing any socket processing. On
sctp_rcv() it will check if there is an user using the socket and, if
there is, it will queue incoming packets to the backlog. The backlog
processing will do the same. Even timers will do such check and
re-schedule if an user is using the socket.

Simplifying this will allow us to remove sctp_skb_list_tail and get ride
of some expensive lockings.  The lists that it is used on are also
mangled with functions like __skb_queue_tail and __skb_unlink in the
same context, like on sctp_ulpq_tail_event() and sctp_clear_pd().
sctp_close() will also purge those while using only the sock lock.

Therefore the lockings performed by sctp_skb_list_tail() are not
necessary. This patch removes this function and replaces its calls with
just skb_queue_splice_tail_init() instead.

The biggest gain is at sctp_ulpq_tail_event(), because the events always
contain a list, even if it's queueing a single skb and this was
triggering expensive calls to spin_lock_irqsave/_irqrestore for every
data chunk received.

As SCTP will deliver each data chunk on a corresponding recvmsg, the
more effective the change will be.
Before this patch, with chunks with 30 bytes:
netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000
400000 -s 400000 400000
on a 10Gbit link with 1500 MTU:

SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

425984 425984     30    60.00       137.45   7.34     7.36     52.504  52.608

With it:

SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

425984 425984     30    60.00       179.10   7.97     6.70     43.740  36.788

Signed-off-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: delay calls to sk_data_ready() as much as possible</title>
<updated>2016-04-14T03:04:44+00:00</updated>
<author>
<name>Marcelo Ricardo Leitner</name>
<email>marcelo.leitner@gmail.com</email>
</author>
<published>2016-04-08T19:41:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fb586f25300f4587c7ebd097a604bf269b25bfa7'/>
<id>fb586f25300f4587c7ebd097a604bf269b25bfa7</id>
<content type='text'>
Currently processing of multiple chunks in a single SCTP packet leads to
multiple calls to sk_data_ready, causing multiple wake up signals which
are costy and doesn't make it wake up any faster.

With this patch it will note that the wake up is pending and will do it
before leaving the state machine interpreter, latest place possible to
do it realiably and cleanly.

Note that sk_data_ready events are not dependent on asocs, unlike waking
up writers.

v2: series re-checked
v3: use local vars to cleanup the code, suggested by Jakub Sitnicki
Signed-off-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently processing of multiple chunks in a single SCTP packet leads to
multiple calls to sk_data_ready, causing multiple wake up signals which
are costy and doesn't make it wake up any faster.

With this patch it will note that the wake up is pending and will do it
before leaving the state machine interpreter, latest place possible to
do it realiably and cleanly.

Note that sk_data_ready events are not dependent on asocs, unlike waking
up writers.

v2: series re-checked
v3: use local vars to cleanup the code, suggested by Jakub Sitnicki
Signed-off-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: introduce SO_INCOMING_CPU</title>
<updated>2014-11-11T18:00:06+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2014-11-11T13:54:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2c8c56e15df3d4c2af3d656e44feb18789f75837'/>
<id>2c8c56e15df3d4c2af3d656e44feb18789f75837</id>
<content type='text'>
Alternative to RPS/RFS is to use hardware support for multiple
queues.

Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.

Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.

We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.

After accept(), connect(), or even file descriptor passing around
processes, applications can use :

 int cpu;
 socklen_t len = sizeof(cpu);

 getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &amp;cpu, &amp;len);

And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Alternative to RPS/RFS is to use hardware support for multiple
queues.

Then split a set of million of sockets into worker threads, each
one using epoll() to manage events on its own socket pool.

Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
know after accept() or connect() on which queue/cpu a socket is managed.

We normally use one cpu per RX queue (IRQ smp_affinity being properly
set), so remembering on socket structure which cpu delivered last packet
is enough to solve the problem.

After accept(), connect(), or even file descriptor passing around
processes, applications can use :

 int cpu;
 socklen_t len = sizeof(cpu);

 getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &amp;cpu, &amp;len);

And use this information to put the socket into the right silo
for optimal performance, as all networking stack should run
on the appropriate cpu, without need to send IPI (RPS/RFS).

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: add support for busy polling to sctp protocol</title>
<updated>2014-04-20T22:18:55+00:00</updated>
<author>
<name>Neil Horman</name>
<email>nhorman@tuxdriver.com</email>
</author>
<published>2014-04-17T19:26:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8465a5fcd1ceba8f2b55121d47b73f4025401490'/>
<id>8465a5fcd1ceba8f2b55121d47b73f4025401490</id>
<content type='text'>
The busy polling socket option adds support for sockets to busy wait on data
arriving on the napi queue from which they have most recently received a frame.
Currently only tcp and udp support this feature, but theres no reason sctp can't
do so as well.  Add it in so appliations can take advantage of it

Signed-off-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
CC: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: Daniel Borkmann &lt;dborkman@redhat.com&gt;
CC: netdev@vger.kernel.org
Acked-by: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
Acked-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The busy polling socket option adds support for sockets to busy wait on data
arriving on the napi queue from which they have most recently received a frame.
Currently only tcp and udp support this feature, but theres no reason sctp can't
do so as well.  Add it in so appliations can take advantage of it

Signed-off-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
CC: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
CC: "David S. Miller" &lt;davem@davemloft.net&gt;
CC: Daniel Borkmann &lt;dborkman@redhat.com&gt;
CC: netdev@vger.kernel.org
Acked-by: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
Acked-by: Daniel Borkmann &lt;dborkman@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Fix use after free by removing length arg from sk_data_ready callbacks.</title>
<updated>2014-04-11T20:15:36+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2014-04-11T20:15:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=676d23690fb62b5d51ba5d659935e9f7d9da9f8e'/>
<id>676d23690fb62b5d51ba5d659935e9f7d9da9f8e</id>
<content type='text'>
Several spots in the kernel perform a sequence like:

	skb_queue_tail(&amp;sk-&gt;s_receive_queue, skb);
	sk-&gt;sk_data_ready(sk, skb-&gt;len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb-&gt;len access is potentially
to freed up memory.

Furthermore, the skb-&gt;len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Several spots in the kernel perform a sequence like:

	skb_queue_tail(&amp;sk-&gt;s_receive_queue, skb);
	sk-&gt;sk_data_ready(sk, skb-&gt;len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb-&gt;len access is potentially
to freed up memory.

Furthermore, the skb-&gt;len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: fix checkpatch errors with open brace '{' and trailing statements</title>
<updated>2013-12-26T18:47:48+00:00</updated>
<author>
<name>wangweidong</name>
<email>wangweidong1@huawei.com</email>
</author>
<published>2013-12-23T04:16:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8d72651d86e9c702d37dd9ef9f084ce027af90a7'/>
<id>8d72651d86e9c702d37dd9ef9f084ce027af90a7</id>
<content type='text'>
fix checkpatch errors below:
ERROR: that open brace { should be on the previous line
ERROR: open brace '{' following function declarations go on the next line
ERROR: trailing statements should be on next line

Signed-off-by: Wang Weidong &lt;wangweidong1@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fix checkpatch errors below:
ERROR: that open brace { should be on the previous line
ERROR: open brace '{' following function declarations go on the next line
ERROR: trailing statements should be on next line

Signed-off-by: Wang Weidong &lt;wangweidong1@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: fix checkpatch errors with (foo*)|foo * bar|foo* bar</title>
<updated>2013-12-26T18:47:47+00:00</updated>
<author>
<name>wangweidong</name>
<email>wangweidong1@huawei.com</email>
</author>
<published>2013-12-23T04:16:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=26ac8e5fe1562831e68ccd9f7057aade37aab2a3'/>
<id>26ac8e5fe1562831e68ccd9f7057aade37aab2a3</id>
<content type='text'>
fix checkpatch errors below:
ERROR: "(foo*)" should be "(foo *)"
ERROR: "foo * bar" should be "foo *bar"
ERROR: "foo* bar" should be "foo *bar"

Signed-off-by: Wang Weidong &lt;wangweidong1@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fix checkpatch errors below:
ERROR: "(foo*)" should be "(foo *)"
ERROR: "foo * bar" should be "foo *bar"
ERROR: "foo* bar" should be "foo *bar"

Signed-off-by: Wang Weidong &lt;wangweidong1@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: fix checkpatch errors with space required or prohibited</title>
<updated>2013-12-26T18:47:47+00:00</updated>
<author>
<name>wangweidong</name>
<email>wangweidong1@huawei.com</email>
</author>
<published>2013-12-23T04:16:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=cb3f837ba95d7774978e86fc17ddf970cf7d15a4'/>
<id>cb3f837ba95d7774978e86fc17ddf970cf7d15a4</id>
<content type='text'>
fix checkpatch errors while the space is required or prohibited
to the "=,()++..."

Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Signed-off-by: Wang Weidong &lt;wangweidong1@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fix checkpatch errors while the space is required or prohibited
to the "=,()++..."

Acked-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Signed-off-by: Wang Weidong &lt;wangweidong1@huawei.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sctp: Fix FSF address in file headers</title>
<updated>2013-12-06T17:37:56+00:00</updated>
<author>
<name>Jeff Kirsher</name>
<email>jeffrey.t.kirsher@intel.com</email>
</author>
<published>2013-12-06T14:28:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4b2f13a25133b115eb56771bd4a8e71a82aea968'/>
<id>4b2f13a25133b115eb56771bd4a8e71a82aea968</id>
<content type='text'>
Several files refer to an old address for the Free Software Foundation
in the file header comment.  Resolve by replacing the address with
the URL &lt;http://www.gnu.org/licenses/&gt; so that we do not have to keep
updating the header comments anytime the address changes.

CC: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
CC: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Signed-off-by: Jeff Kirsher &lt;jeffrey.t.kirsher@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Several files refer to an old address for the Free Software Foundation
in the file header comment.  Resolve by replacing the address with
the URL &lt;http://www.gnu.org/licenses/&gt; so that we do not have to keep
updating the header comments anytime the address changes.

CC: Vlad Yasevich &lt;vyasevich@gmail.com&gt;
CC: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Signed-off-by: Jeff Kirsher &lt;jeffrey.t.kirsher@intel.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
