linux.git/net/sctp/inqueue.c, branch v4.13

sctp: remove the typedef sctp_chunkhdr_t

2017-07-01T16:08:41+00:00

This patch removes the typedef sctp_chunkhdr_t and replaces it with
struct sctp_chunkhdr wherever the typedef was used.

It also fixes some indentation and uses sizeof(variable) instead
of sizeof(type), especially in sctp_new.
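
As a minimal user-space sketch of the sizeof(variable) convention mentioned
above: the struct below is a hypothetical stand-in that only mirrors the
on-wire chunk header layout (type, flags, length); it is not the kernel
definition, and demo_clear_hdr() is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a chunk header; not the kernel's struct. */
struct demo_chunkhdr {
	uint8_t  type;
	uint8_t  flags;
	uint16_t length;
};

/*
 * Zero a header using sizeof(*ch) rather than sizeof(struct demo_chunkhdr).
 * If the pointer's type ever changes, the size follows it automatically.
 */
static size_t demo_clear_hdr(struct demo_chunkhdr *ch)
{
	memset(ch, 0, sizeof(*ch));
	return sizeof(*ch);
}
```

The point of the convention is that the size expression cannot silently
drift away from the object it describes.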

Signed-off-by: Xin Long 
Signed-off-by: David S. Miller

sctp: rename WORD_TRUNC/ROUND macros

2016-09-22T07:13:26+00:00

Rename them to something more meaningful these days, especially because
they operate on packet headers or lengths, which are tied not to any CPU
arch but to the protocol itself.

So, WORD_TRUNC becomes SCTP_TRUNC4 and WORD_ROUND becomes SCTP_PAD4.
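
A rough user-space sketch of what the two helpers compute, assuming they
round a length up (pad) or down (truncate) to a multiple of 4 bytes, as the
SCTP chunk padding rule requires. The demo_* names are hypothetical and the
bodies are an assumption, not a copy of the kernel macros.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of 4 (assumed SCTP_PAD4 behaviour). */
static size_t demo_pad4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Round len down to a multiple of 4 (assumed SCTP_TRUNC4 behaviour). */
static size_t demo_trunc4(size_t len)
{
	return len & ~(size_t)3;
}
```

For instance, a 5-byte chunk pads to 8 bytes, while 7 bytes of available
space truncates to 4.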

Reported-by: David Laight 
Reported-by: David Miller 
Signed-off-by: Marcelo Ricardo Leitner 
Signed-off-by: David S. Miller

sctp: linearize early if it's not GSO

2016-08-20T00:09:42+00:00

Because otherwise, when CRC computation is still needed, it is far more
expensive than on a linear buffer, to the point that it affects
performance.

It's so expensive that a netperf test gives perf output as below:

Overhead  Command         Shared Object       Symbol
  18,62%  netserver       [kernel.vmlinux]    [k] crc32_generic_shift
   2,57%  netserver       [kernel.vmlinux]    [k] __pskb_pull_tail
   1,94%  netserver       [kernel.vmlinux]    [k] fib_table_lookup
   1,90%  netserver       [kernel.vmlinux]    [k] copy_user_enhanced_fast_string
   1,66%  swapper         [kernel.vmlinux]    [k] intel_idle
   1,63%  netserver       [kernel.vmlinux]    [k] _raw_spin_lock
   1,59%  netserver       [sctp]              [k] sctp_packet_transmit
   1,55%  netserver       [kernel.vmlinux]    [k] memcpy_erms
   1,42%  netserver       [sctp]              [k] sctp_rcv

# netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

212992 212992  12000    10.00      3016.42   2.88     3.78     1.874   2.462

After patch:
Overhead  Command         Shared Object      Symbol
   2,75%  netserver       [kernel.vmlinux]   [k] memcpy_erms
   2,63%  netserver       [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
   2,39%  netserver       [kernel.vmlinux]   [k] fib_table_lookup
   2,04%  netserver       [kernel.vmlinux]   [k] __pskb_pull_tail
   1,91%  netserver       [kernel.vmlinux]   [k] _raw_spin_lock
   1,91%  netserver       [sctp]             [k] sctp_packet_transmit
   1,72%  netserver       [mlx4_en]          [k] mlx4_en_process_rx_cq
   1,68%  netserver       [sctp]             [k] sctp_rcv

# netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

212992 212992  12000    10.00      3681.77   3.83     3.46     2.045   1.849

Fixes: 3acb50c18d8d ("sctp: delay as much as possible skb_linearize")
Signed-off-by: Marcelo Ricardo Leitner 
Signed-off-by: David S. Miller

sctp: fix BH handling on socket backlog

2016-07-25T18:22:22+00:00

Now that the backlog processing is called with BH enabled, we have to
disable BH before taking the socket lock via bh_lock_sock(), otherwise
it may deadlock:

sctp_backlog_rcv()
                bh_lock_sock(sk);

                if (sock_owned_by_user(sk)) {
                        if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
                                sctp_chunk_free(chunk);
                        else
                                backloged = 1;
                } else
                        sctp_inq_push(inqueue, chunk);

                bh_unlock_sock(sk);

Here sctp_inq_push() was disabling/enabling BH itself, but enabling BH
runs any pending softirq, which may then try to re-lock the socket in
sctp_rcv().

[  219.187215]  
[  219.187217]  [] _raw_spin_lock+0x20/0x30
[  219.187223]  [] sctp_rcv+0x48c/0xba0 [sctp]
[  219.187225]  [] ? nf_iterate+0x62/0x80
[  219.187226]  [] ip_local_deliver_finish+0x94/0x1e0
[  219.187228]  [] ip_local_deliver+0x6f/0xf0
[  219.187229]  [] ? ip_rcv_finish+0x3b0/0x3b0
[  219.187230]  [] ip_rcv_finish+0xd8/0x3b0
[  219.187232]  [] ip_rcv+0x282/0x3a0
[  219.187233]  [] ? update_curr+0x66/0x180
[  219.187235]  [] __netif_receive_skb_core+0x524/0xa90
[  219.187236]  [] ? update_cfs_shares+0x30/0xf0
[  219.187237]  [] ? __enqueue_entity+0x6c/0x70
[  219.187239]  [] ? enqueue_entity+0x204/0xdf0
[  219.187240]  [] __netif_receive_skb+0x18/0x60
[  219.187242]  [] process_backlog+0x9e/0x140
[  219.187243]  [] net_rx_action+0x22c/0x370
[  219.187245]  [] __do_softirq+0x112/0x2e7
[  219.187247]  [] do_softirq_own_stack+0x1c/0x30
[  219.187247]  
[  219.187248]  [] do_softirq.part.14+0x38/0x40
[  219.187249]  [] __local_bh_enable_ip+0x7d/0x80
[  219.187254]  [] sctp_inq_push+0x68/0x80 [sctp]
[  219.187258]  [] sctp_backlog_rcv+0x151/0x1c0 [sctp]
[  219.187260]  [] __release_sock+0x87/0xf0
[  219.187261]  [] release_sock+0x30/0xa0
[  219.187265]  [] sctp_accept+0x17d/0x210 [sctp]
[  219.187266]  [] ? prepare_to_wait_event+0xf0/0xf0
[  219.187268]  [] inet_accept+0x3c/0x130
[  219.187269]  [] SYSC_accept4+0x103/0x210
[  219.187271]  [] ? _raw_spin_unlock_bh+0x1a/0x20
[  219.187272]  [] ? release_sock+0x8c/0xa0
[  219.187276]  [] ? sctp_inet_listen+0x62/0x1b0 [sctp]
[  219.187277]  [] SyS_accept+0x10/0x20
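
The ordering problem shown in the trace above can be illustrated with a toy
user-space model. Everything here is hypothetical (the demo_* names, the
depth counter, the "pending softirq" flag); it models only the rule that a
pending softirq runs when the BH-disable depth drops to zero, and it is not
kernel code.

```c
#include <assert.h>
#include <stdbool.h>

static int  demo_bh_depth;        /* nesting depth of "BH disabled" */
static bool demo_softirq_pending; /* a packet is waiting for softirq */
static bool demo_sock_locked;     /* socket spinlock held on this CPU */
static int  demo_self_deadlocks;  /* softirq ran while lock was held */

static void demo_bh_disable(void)
{
	demo_bh_depth++;
}

/* Re-enabling BH at depth zero runs the pending softirq immediately. */
static void demo_bh_enable(void)
{
	if (--demo_bh_depth == 0 && demo_softirq_pending) {
		demo_softirq_pending = false;
		/* The rx path would spin on a lock this CPU already holds. */
		if (demo_sock_locked)
			demo_self_deadlocks++;
	}
}

/* Models a push helper that toggles BH around the queue operation. */
static void demo_inq_push(void)
{
	demo_bh_disable();
	demo_bh_enable();
}

/* Returns how many times the softirq ran with the socket lock held. */
static int demo_backlog_rcv(bool disable_bh_first)
{
	demo_bh_depth = 0;
	demo_self_deadlocks = 0;
	demo_sock_locked = false;
	demo_softirq_pending = true;

	if (disable_bh_first)
		demo_bh_disable();
	demo_sock_locked = true;   /* stands in for bh_lock_sock() */
	demo_inq_push();
	demo_sock_locked = false;  /* stands in for bh_unlock_sock() */
	if (disable_bh_first)
		demo_bh_enable();

	return demo_self_deadlocks;
}
```

In this model, taking the lock with BH still enabled lets the softirq run
inside the locked region, while disabling BH first defers it until after
the unlock.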

Fixes: 860fbbc343bf ("sctp: prepare for socket backlog behavior change")
Cc: Eric Dumazet 
Signed-off-by: Marcelo Ricardo Leitner 
Signed-off-by: David S. Miller

sctp: do not clear chunk->ecn_ce_done flag

2016-07-14T01:10:14+00:00

We should not clear that flag when switching to a new skb from a GSO skb
because it would cause ECN processing to happen multiple times per GSO
skb, which is not wanted. Instead, let it be processed once per chunk,
that is, once per IP header available.

Fixes: 90017accff61 ("sctp: Add GSO support")
Signed-off-by: Marcelo Ricardo Leitner 
Signed-off-by: David S. Miller

sctp: avoid identifying address family many times for a chunk

2016-07-14T01:10:14+00:00

Identifying address family operations during the rx path is not
expensive, but it's ugly to have it done multiple times, especially
when we already validated it during initial rx processing.

This patch takes advantage of the now shared sctp_input_cb and makes the
pointer to the operations readily available.
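
The idea can be sketched in user space as "look the operations up once,
keep the pointer in the per-packet control block". All names below are
hypothetical stand-ins; the real sctp_input_cb and address family
structures have different contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-family operations table entry. */
struct demo_af {
	int         family;
	const char *name;
};

static const struct demo_af demo_afs[] = {
	{ 4, "ipv4" },
	{ 6, "ipv6" },
};

static int demo_lookups; /* counts how often the table is searched */

static const struct demo_af *demo_get_af(int family)
{
	demo_lookups++;
	for (size_t i = 0; i < sizeof(demo_afs) / sizeof(demo_afs[0]); i++)
		if (demo_afs[i].family == family)
			return &demo_afs[i];
	return NULL;
}

/* Hypothetical control block carrying the cached pointer. */
struct demo_input_cb {
	const struct demo_af *af;
};

/* Initial rx processing: validate the family and cache the result. */
static int demo_rx_init(struct demo_input_cb *cb, int family)
{
	cb->af = demo_get_af(family);
	return cb->af ? 0 : -1;
}

/* Later rx steps reuse the cached pointer instead of looking it up. */
static const char *demo_rx_step(const struct demo_input_cb *cb)
{
	return cb->af->name;
}
```

However many later steps run, the table is searched only once per packet.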

Signed-off-by: Marcelo Ricardo Leitner 
Signed-off-by: David S. Miller

sctp: allow GSO frags to access the chunk too

2016-07-14T01:10:14+00:00

SCTP will try to access original IP headers on sctp_recvmsg in order to
copy the addresses used. There are also other places that do similar access
to IP or even SCTP headers. But after 90017accff61 ("sctp: Add GSO
support") they aren't always there because they are only present in the
header skb.

SCTP handles the queueing of incoming data by cloning the incoming skb
and limiting to only the relevant payload. This clone has its cb updated
to something different and it's then queued on socket rx queue. Thus we
need to fix this in two places.

For the rx path, not related to socket queue yet, this patch uses a
partially copied sctp_input_cb for such GSO frags. This restores the
ability to access the headers for this part of the code.

Regarding the socket rx queue, it removes the iif member from sctp_event
and also adds a chunk pointer to it.

With these changes we're always able to reach the headers again.

The biggest change here is that now the sctp_chunk struct and the
original skb are only freed after the application consumed the buffer.
Note however that the original payload was already like this due to the
skb cloning.

For iif, SCTP's IPv4 code doesn't use it, so no change is necessary.
IPv6 now can fetch it directly from original's IPv6 CB as the original
skb is still accessible.

In the future we can probably simplify the sctp_v*_skb_iif() stuff, as
sctp_v4_skb_iif() was called but its return value was not used, and now
it's not even called, but such cleanup is out of scope for this change.

Fixes: 90017accff61 ("sctp: Add GSO support")
Signed-off-by: Marcelo Ricardo Leitner 
Signed-off-by: David S. Miller

sctp: Add GSO support

2016-06-03T23:37:21+00:00

SCTP has this peculiarity that its packets cannot be just segmented to
(P)MTU. Its chunks must be contained in IP segments, padding respected.
So we can't just generate a big skb, set gso_size to the fragmentation
point and deliver it to the IP layer.

This patch takes a different approach. SCTP will now build a skb as it
would be if it was received using GRO. That is, there will be a cover
skb with protocol headers and children ones containing the actual
segments, already segmented to a way that respects SCTP RFCs.

With that, we can tell skb_segment() to just split based on frag_list,
trusting its sizes are already in accordance.

This way SCTP can benefit from GSO and instead of passing several
packets through the stack, it can pass a single large packet.
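
A simplified user-space sketch of the pre-segmentation idea: chunks are
packed into segments so that no chunk is split and each chunk is padded to
4 bytes. The function name, the flat length arrays and the omission of all
header sizes are assumptions made for illustration; this is not how the
kernel builds the frag_list.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pack chunk lengths into segments of at most max_seg bytes each.
 * Writes the resulting segment sizes to seg_len and returns how many
 * segments were produced, or 0 if a padded chunk cannot fit in one
 * segment or seg_len is too small.
 */
static size_t demo_pack_chunks(const size_t *chunk_len, size_t n_chunks,
			       size_t max_seg, size_t *seg_len,
			       size_t max_segs)
{
	size_t n_segs = 0;
	size_t cur = 0; /* bytes already placed in the open segment */

	for (size_t i = 0; i < n_chunks; i++) {
		/* Each chunk occupies a multiple of 4 bytes on the wire. */
		size_t padded = (chunk_len[i] + 3) & ~(size_t)3;

		if (padded > max_seg)
			return 0;
		if (cur + padded > max_seg) {
			/* Chunk does not fit: close the open segment. */
			if (n_segs == max_segs)
				return 0;
			seg_len[n_segs++] = cur;
			cur = 0;
		}
		cur += padded;
	}
	if (cur) {
		if (n_segs == max_segs)
			return 0;
		seg_len[n_segs++] = cur;
	}
	return n_segs;
}
```

Because the segment sizes are decided up front, a later splitting step only
needs to cut at the recorded boundaries rather than at a fixed gso_size.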

v2:
- Added support for receiving GSO frames, as requested by Dave Miller.
- Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
- Added heuristics similar to what we have in TCP for not generating
  single GSO packets that fill cwnd.
v3:
- consider sctphdr size in skb_gso_transport_seglen()
- rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for
  unsupported GSO")

Signed-off-by: Marcelo Ricardo Leitner 
Tested-by: Xin Long 
Signed-off-by: David S. Miller

sctp: delay as much as possible skb_linearize

2016-06-03T23:37:21+00:00

This patch is a preparation for the GSO one. In order to successfully
handle GSO packets on rx path we must not call skb_linearize, otherwise
it defeats any gain GSO may have had.

This patch thus delays the call to skb_linearize as much as possible,
leaving it to sctp_inq_pop(). For that, the sanity checks now know how
to deal with fragments.

One positive side-effect of this is that if the socket is backlogged,
the linearization can happen during backlog processing instead of
during softirq.

With this move, it's evident that a check for non-linearity in
sctp_inq_pop was ineffective and is now removed. Note that a similar
check is performed a bit below this one.

Signed-off-by: Marcelo Ricardo Leitner 
Tested-by: Xin Long 
Signed-off-by: David S. Miller

sctp: prepare for socket backlog behavior change

2016-05-02T21:02:26+00:00

sctp_inq_push() will soon be called without BH being blocked
when generic socket code flushes the socket backlog.

It is very possible SCTP can be converted to not rely on BH,
but this needs to be done by SCTP experts.

Signed-off-by: Eric Dumazet 
Acked-by: Marcelo Ricardo Leitner 
Signed-off-by: David S. Miller