linux-stable.git/include/net/tcp.h, branch linux-4.5.y

tcp: do not drop syn_recv on all icmp reports

2016-02-09T09:15:37+00:00

Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets
were leading to RST.

This is of course incorrect.

Only a specific list of ICMP messages should be able to drop a SYN_RECV.

For instance, a REDIRECT on SYN_RECV shall be ignored, as we do
not hold a dst per SYN_RECV pseudo request.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Reported-by: Petr Novopashenniy 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller
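
The idea of an explicit allow-list can be sketched in plain userspace C.
The helper name icmp_aborts_syn_recv() is hypothetical, and the particular
set of types and codes treated as fatal is an assumption made for
illustration; the commit message above does not enumerate the list. ICMP
type and code numbers follow RFC 792 and are defined locally so the
sketch compiles on its own.

```c
#include <stdbool.h>

/* ICMP types (RFC 792), defined locally for a self-contained sketch. */
enum {
	ICMP_DEST_UNREACH  = 3,
	ICMP_REDIRECT      = 5,
	ICMP_TIME_EXCEEDED = 11,
	ICMP_PARAMETERPROB = 12,
};

/* Codes for ICMP_DEST_UNREACH. */
enum {
	ICMP_NET_UNREACH  = 0,
	ICMP_HOST_UNREACH = 1,
	ICMP_FRAG_NEEDED  = 4,
};

/*
 * Hypothetical helper: decide whether an ICMP error may drop a request
 * in SYN_RECV state. Anything not explicitly listed (a REDIRECT, for
 * instance) is ignored. The chosen list is an assumption.
 */
static bool icmp_aborts_syn_recv(int type, int code)
{
	switch (type) {
	case ICMP_PARAMETERPROB:
	case ICMP_TIME_EXCEEDED:
		return true;
	case ICMP_DEST_UNREACH:
		return code == ICMP_NET_UNREACH || code == ICMP_HOST_UNREACH;
	default:
		return false;
	}
}
```

With a default of "ignore", a message type that nobody considered, such
as the REDIRECT from the bug report, cannot tear down a pending request.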

tcp: Change reference to experimental CWND RFC.

2016-01-29T20:23:58+00:00

Signed-off-by: Jörg Thalheim 
Signed-off-by: David S. Miller

net: tcp_memcontrol: simplify linkage between socket and page counter

2016-01-15T00:00:49+00:00

There won't be any separate counters for socket memory consumed by
protocols other than TCP in the future.  Remove the indirection and link
sockets directly to their owning memory cgroup.

Signed-off-by: Johannes Weiner 
Reviewed-by: Vladimir Davydov 
Acked-by: David S. Miller 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

net: tcp_memcontrol: sanitize tcp memory accounting callbacks

2016-01-15T00:00:49+00:00

There won't be a tcp control soft limit, so integrating the memcg code
into the global skmem limiting scheme complicates things unnecessarily.
Replace this with simple and clear charge and uncharge calls--hidden
behind a jump label--to account skb memory.

Note that this is not purely aesthetic: as a result of shoehorning the
per-memcg code into the same memory accounting functions that handle the
global level, the old code would compare the per-memcg consumption
against the smaller of the per-memcg limit and the global limit.  This
allowed the total consumption of multiple sockets to exceed the global
limit, as long as the individual sockets stayed within bounds.  After
this change, the code will always compare the per-memcg consumption to
the per-memcg limit, and the global consumption to the global limit, and
thus close this loophole.

Without a soft limit, the per-memcg memory pressure state in sockets is
generally questionable.  However, we did it until now, so we continue to
enter it when the hard limit is hit, and packets are dropped, to let
other sockets in the cgroup know that they shouldn't grow their transmit
windows, either.  However, keep it simple in the new callback model and
leave memory pressure lazily when the next packet is accepted (as
opposed to doing it synchronously when packets are processed).  When
packets are dropped, network performance will already be in the toilet,
so that should be a reasonable trade-off.

As described above, consumption is now checked on the per-memcg level
and the global level separately.  Likewise, memory pressure states are
maintained on both the per-memcg level and the global level, and a
socket is considered under pressure when either level asserts as much.

Signed-off-by: Johannes Weiner 
Reviewed-by: Vladimir Davydov 
Acked-by: David S. Miller 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds
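
A toy model of the two-level accounting described above, as a userspace
sketch rather than the kernel code. All names (struct mem_level,
sk_charge(), sk_under_pressure()) are hypothetical, and the global level
is assumed to use the same simple policy as the per-memcg level. Each
level is compared only against its own limit, pressure is entered when a
charge fails, and it is left lazily on the next successful charge.

```c
#include <stdbool.h>

/* One accounting level: either a memcg or the global pool (toy model). */
struct mem_level {
	long usage;
	long limit;
	bool pressure;
};

/* Charge one level against its own limit only. */
static bool level_charge(struct mem_level *lv, long pages)
{
	if (lv->usage + pages > lv->limit) {
		lv->pressure = true;	/* hard limit hit: enter pressure */
		return false;
	}
	lv->usage += pages;
	lv->pressure = false;		/* leave pressure lazily on success */
	return true;
}

static void level_uncharge(struct mem_level *lv, long pages)
{
	lv->usage -= pages;
}

/* Charge both levels; roll back the memcg charge if the global one fails. */
static bool sk_charge(struct mem_level *global, struct mem_level *memcg,
		      long pages)
{
	if (!level_charge(memcg, pages))
		return false;
	if (!level_charge(global, pages)) {
		level_uncharge(memcg, pages);
		return false;
	}
	return true;
}

/* A socket is under pressure when either level says so. */
static bool sk_under_pressure(const struct mem_level *global,
			      const struct mem_level *memcg)
{
	return global->pressure || memcg->pressure;
}
```

Because the global counter is checked on its own, two groups that each
stay within their per-memcg limit still cannot push total consumption
past the global limit.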

ipv4: Namespecify the tcp_keepalive_intvl sysctl knob

2016-01-10T22:32:09+00:00

This is the final part required to namespaceify the tcp
keep alive mechanism.

Signed-off-by: Nikolay Borisov 
Signed-off-by: David S. Miller

ipv4: Namespecify tcp_keepalive_probes sysctl knob

2016-01-10T22:32:09+00:00

This is required to have full tcp keepalive mechanism namespace
support.

Signed-off-by: Nikolay Borisov 
Signed-off-by: David S. Miller

ipv4: Namespaceify tcp_keepalive_time sysctl knob

2016-01-10T22:32:09+00:00

Different net namespaces might have different requirements as to
the keepalive time of tcp sockets. This might be required in cases
where different firewall rules are in place which require tcp
socket timeouts to be increased/decreased independently of the host.

Signed-off-by: Nikolay Borisov 
Signed-off-by: David S. Miller
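
What per-namespace keepalive knobs mean for a socket can be sketched as
follows. This is a userspace illustration with hypothetical names
(struct netns_sketch, struct sock_sketch, sketch_keepalive_time()), not
the kernel's data structures. It assumes a per-socket override that
falls back to the value of the namespace the socket lives in, with
values in seconds for readability.

```c
/* Hypothetical per-namespace keepalive settings, in seconds / counts. */
struct netns_sketch {
	int keepalive_time;
	int keepalive_intvl;
	int keepalive_probes;
};

/* Hypothetical socket: 0 in a field means "no per-socket override". */
struct sock_sketch {
	const struct netns_sketch *net;
	int keepalive_time;
};

/* Prefer the per-socket value, else the owning namespace's value. */
static int sketch_keepalive_time(const struct sock_sketch *sk)
{
	return sk->keepalive_time ? sk->keepalive_time
				  : sk->net->keepalive_time;
}
```

Two sockets without an override can therefore see different keepalive
times purely because they belong to different namespaces.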

net: add inet_sk_transparent() helper

2015-12-22T22:03:05+00:00

Avoids cluttering tcp_v4_send_reset when followup patch extends
it to deal with timewait sockets.

Suggested-by: Eric Dumazet 
Signed-off-by: Florian Westphal 
Acked-by: Eric Dumazet 
Acked-by: Hannes Frederic Sowa 
Signed-off-by: David S. Miller

net: diag: Support destroying TCP sockets.

2015-12-16T04:26:52+00:00

This implements SOCK_DESTROY for TCP sockets. It causes all
blocking calls on the socket to fail fast with ECONNABORTED and
causes a protocol close of the socket. It informs the other end
of the connection by sending a RST, i.e., initiating a TCP ABORT
as per RFC 793. ECONNABORTED was chosen for consistency with
FreeBSD.

Signed-off-by: Lorenzo Colitti 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller

tcp/dccp: fix hashdance race for passive sessions

2015-10-23T12:42:21+00:00

Multiple cpus can process duplicates of incoming ACK messages
matching a SYN_RECV request socket. This is a rare event under
normal operations, but definitely can happen.

Only one must win the race, otherwise corruption would occur.

To fix this without adding new atomic ops, we use logic in
inet_ehash_nolisten() to detect the request was present in the same
ehash bucket where we try to insert the new child.

If request socket was not found, we have to undo the child creation.

This actually removes a spin_lock()/spin_unlock() pair in
reqsk_queue_unlink() for the fast path.

Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller
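
The "only one must win" rule can be sketched in userspace C. Everything
here is a simplified stand-in: struct bucket, bucket_replace() and
complete_handshake() are hypothetical names, a fixed array replaces the
hash chain, and a pthread mutex replaces the bucket lock. The point is
that the presence check and the replacement happen under the one lock
that the insertion needs anyway, and the loser undoes its child.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BUCKET_SLOTS 8

/* Toy hash bucket: one lock protecting a few slots. */
struct bucket {
	pthread_mutex_t lock;
	const void *slot[BUCKET_SLOTS];
};

/*
 * Swap the request for the child only if the request is still present.
 * Returns true for the single caller that wins the race.
 */
static bool bucket_replace(struct bucket *b, const void *req,
			   const void *child)
{
	bool won = false;

	pthread_mutex_lock(&b->lock);
	for (size_t i = 0; i < BUCKET_SLOTS; i++) {
		if (b->slot[i] == req) {
			b->slot[i] = child;
			won = true;
			break;
		}
	}
	pthread_mutex_unlock(&b->lock);
	return won;
}

/* Create a child for the final ACK; undo the creation if we lost. */
static void *complete_handshake(struct bucket *b, const void *req)
{
	void *child = malloc(1);	/* stand-in for a child socket */

	if (!child)
		return NULL;
	if (!bucket_replace(b, req, child)) {
		free(child);		/* request already gone: undo */
		return NULL;
	}
	return child;
}
```

A second caller processing a duplicate ACK finds the request missing,
frees its child and returns NULL, so the bucket never holds two children
for the same request.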