linux.git/net/ipv4/tcp_timer.c, branch v5.1

tcp: Refactor pingpong code

2019-01-27T21:29:43+00:00

Instead of using pingpong as a single bit of information, we refactor the
code to treat it as a counter. When an interactive session is detected,
we set the pingpong count to TCP_PINGPONG_THRESH. When the pingpong count
is >= TCP_PINGPONG_THRESH, we consider the session to be in pingpong mode.

This patch is a pure refactor and sets the foundation for the next patch.
This patch itself does not change any pingpong logic.

Signed-off-by: Wei Wang 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

tcp: retry more conservatively on local congestion

2019-01-17T23:12:26+00:00

Previously, when the sender failed to retransmit a data packet on
timeout due to congestion in the local host (e.g. throttling in the
qdisc), it retried after one RTO, capped at 500ms.

In low-RTT networks such as data centers, the RTO is often far
below the default minimum of 200ms (and the 500ms cap). Local host
congestion can then trigger a retry storm, pouring gas on the
fire. Worse yet, the retry counter (icsk_retransmits) is not
properly updated, so the aggressive retries may exceed the system
limit (15 rounds) until the packet finally slips through.

In such rare events, it's wise to retry more conservatively (500ms)
and update the stats properly to reflect these incidents and follow
the system limit. Note that this is consistent with the behavior
when a keep-alive probe is dropped due to local congestion.
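The retry-storm argument can be checked with a small simulation. This is a sketch under stated assumptions: the names are hypothetical, times are in milliseconds, the local qdisc is assumed to drop every attempt inside the window, and counting every failed attempt in the retry counter is this model's reading of "update the stats properly", not the patch's exact code.

```c
#include <assert.h>
#include <stdbool.h>

/* 500 ms from the commit text; the macro name is hypothetical. */
#define MODEL_RESOURCE_PROBE_MS 500u

struct model_rtx_state {
	unsigned int rto_ms;      /* current retransmission timeout */
	unsigned int retransmits; /* stands in for icsk_retransmits */
};

/* Delay before the next retry after a local send failure. */
static unsigned int local_congestion_delay(unsigned int rto_ms, bool patched)
{
	if (patched)
		return MODEL_RESOURCE_PROBE_MS;  /* new: always back off 500 ms */
	/* old: one RTO, capped at 500 ms */
	return rto_ms < MODEL_RESOURCE_PROBE_MS ? rto_ms : MODEL_RESOURCE_PROBE_MS;
}

/* Count retry attempts that fit in window_ms while every send fails. */
static unsigned int simulate_local_congestion(struct model_rtx_state *s,
					      unsigned int window_ms,
					      bool patched)
{
	unsigned int elapsed = 0, attempts = 0;
	unsigned int delay = local_congestion_delay(s->rto_ms, patched);

	assert(delay > 0);
	while (elapsed + delay <= window_ms) {
		elapsed += delay;
		attempts++;
		if (patched)
			s->retransmits++; /* assumed: each failure is counted */
	}
	return attempts;
}
```

With a 5 ms RTO and a one-second window, the old policy makes 200 attempts without moving the retry counter, while the new one makes 2 attempts and the counter advances toward the system limit.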

Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Reviewed-by: Neal Cardwell 
Reviewed-by: Soheil Hassas Yeganeh 
Signed-off-by: David S. Miller

tcp: simplify window probe aborting on USER_TIMEOUT

2019-01-17T23:12:26+00:00

Previously we used the next unsent skb's timestamp to determine
when to abort a socket stalling on window probes. This no longer
works because the skb timestamp reflects the last, not the first,
transmission.

Instead we can estimate how long the socket has been stalling
with the probe count and the exponential backoff behavior.
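A minimal sketch of that estimate, assuming each probe interval doubles from a base value up to a cap and that the elapsed stall is the sum of intervals 0 through probes_out. The function names and millisecond units are hypothetical; the kernel works in jiffies with its own helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Estimated stall time after probes_out probes: interval i lasts
 * base_ms << i, capped at max_ms (summing 0..probes_out is assumed). */
static unsigned long probe_stall_estimate_ms(unsigned int probes_out,
					     unsigned long base_ms,
					     unsigned long max_ms)
{
	unsigned long total = 0, interval = base_ms;

	assert(base_ms > 0 && base_ms <= max_ms);
	for (unsigned int i = 0; i <= probes_out; i++) {
		total += interval < max_ms ? interval : max_ms;
		if (interval < max_ms)
			interval <<= 1; /* exponential backoff */
	}
	return total;
}

/* Abort once the estimated stall reaches the user timeout. */
static bool probe_should_abort(unsigned int probes_out, unsigned long base_ms,
			       unsigned long max_ms,
			       unsigned long user_timeout_ms)
{
	return probe_stall_estimate_ms(probes_out, base_ms, max_ms) >=
	       user_timeout_ms;
}
```

The point of the design is that the estimate needs only the probe count, so it no longer depends on which transmission an skb timestamp happens to record.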

Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Reviewed-by: Neal Cardwell 
Reviewed-by: Soheil Hassas Yeganeh 
Signed-off-by: David S. Miller

tcp: create a helper to model exponential backoff

2019-01-17T23:12:26+00:00

Create a helper to model TCP exponential backoff for the next patch.
This is a pure refactor with no behavior change.
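The closed form below is a userspace sketch of the kind of helper described, written in milliseconds. The names are hypothetical, and it assumes the usual doubling-then-linear backoff shape: the timeout doubles from a base RTO until it would exceed the cap, and every later round adds the cap.

```c
#include <assert.h>

/* Floor of log2(x) for x > 0. */
static unsigned int floor_log2(unsigned long x)
{
	unsigned int r = 0;

	assert(x > 0);
	while (x >>= 1)
		r++;
	return r;
}

/* Total time covered by rounds 0..boundary of exponential backoff. */
static unsigned long model_timeout_ms(unsigned int boundary,
				      unsigned long rto_base_ms,
				      unsigned long rto_max_ms)
{
	unsigned int linear_backoff_thresh;

	assert(rto_base_ms > 0 && rto_base_ms <= rto_max_ms);
	/* Last round whose doubled timeout still fits under the cap. */
	linear_backoff_thresh = floor_log2(rto_max_ms / rto_base_ms);
	if (boundary <= linear_backoff_thresh)
		return ((2UL << boundary) - 1) * rto_base_ms;
	/* Doubling phase, then one capped timeout per extra round. */
	return ((2UL << linear_backoff_thresh) - 1) * rto_base_ms +
	       (unsigned long)(boundary - linear_backoff_thresh) * rto_max_ms;
}
```

With a 200 ms base and a 120 s cap, 15 rounds give model_timeout_ms(15, 200, 120000) == 924600, about 15.4 minutes, which matches the figure commonly cited for the default tcp_retries2 of 15.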

Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Reviewed-by: Neal Cardwell 
Reviewed-by: Soheil Hassas Yeganeh 
Signed-off-by: David S. Miller

tcp: properly track retry time on passive Fast Open

2019-01-17T23:12:26+00:00

This patch addresses a corner case in the timeout behavior of a
passive Fast Open socket. A passive Fast Open server may write
and close the socket while it is still retrying the SYN-ACK to
complete the handshake. After the handshake completes, the server
does not properly stamp the recovery start time (tp->retrans_stamp
is 0), so the socket may abort immediately on the very first FIN
timeout instead of retrying until it reaches the system or
user-specified limit.
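Why a zero retrans_stamp aborts so early can be shown with a tiny model: if elapsed recovery time is computed as "now minus stamp", a zero stamp turns it into the raw clock value, which is usually far beyond any retry budget. This is a sketch under assumptions: the names are hypothetical, and it models the fix simply as making sure a stamp exists before the check; where the real patch records the stamp is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

struct model_fo_sock {
	unsigned long retrans_stamp_ms; /* 0 means no recovery start recorded */
};

/* Elapsed recovery time as a timer would compute it. */
static unsigned long recovery_elapsed_ms(const struct model_fo_sock *sk,
					 unsigned long now_ms)
{
	assert(now_ms >= sk->retrans_stamp_ms);
	return now_ms - sk->retrans_stamp_ms;
}

/* Hypothetical fix: record a recovery start time if none exists. */
static void stamp_if_unset(struct model_fo_sock *sk, unsigned long now_ms)
{
	if (!sk->retrans_stamp_ms)
		sk->retrans_stamp_ms = now_ms;
}

/* Does this FIN timeout exhaust the retry budget? */
static bool fin_timeout_aborts(struct model_fo_sock *sk, unsigned long now_ms,
			       unsigned long limit_ms, bool patched)
{
	if (patched)
		stamp_if_unset(sk, now_ms);
	return recovery_elapsed_ms(sk, now_ms) >= limit_ms;
}
```

Without the stamp, a clock reading of one hour already exceeds a roughly 15-minute budget on the first FIN timeout; with it, the socket keeps retrying until the budget is genuinely spent.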

Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Reviewed-by: Neal Cardwell 
Reviewed-by: Soheil Hassas Yeganeh 
Signed-off-by: David S. Miller

tcp: always set retrans_stamp on recovery

2019-01-17T23:12:26+00:00

Previously a TCP socket's retrans_stamp was not set if the
retransmission failed to send. As a result, if a socket has local
trouble retransmitting packets, deciding when to abort it is
complicated without knowing the start time of the recovery, since
retrans_stamp may remain zero.

This complication causes sub-optimal behavior: TCP may use the
latest, instead of the first, retransmission time to compute the
elapsed time of a connection stalled by local issues. TCP may then
disregard the TCP retries settings and keep retrying until it finally
succeeds: not a good idea when the local host is already strained.

The simple fix is to always timestamp the start of a recovery.
It's worth noting that retrans_stamp is also used to compare echo
timestamp values to detect spurious recovery. This patch does
not break that because retrans_stamp is still later than when the
original packet was sent.
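The difference between stamping only on a successful send and stamping the first attempt can be modeled in a few lines. This is a hedged sketch: the struct and function names are hypothetical, times are milliseconds, and a failed send stands in for a local drop such as a qdisc rejection.

```c
#include <assert.h>
#include <stdbool.h>

struct model_sock {
	unsigned long retrans_stamp_ms; /* 0 = not yet stamped */
};

/* One retransmission attempt at now_ms; sent == false is a local failure. */
static void on_retransmit_attempt(struct model_sock *sk, unsigned long now_ms,
				  bool sent, bool patched)
{
	assert(now_ms != 0); /* 0 is reserved for "unstamped" */
	if (patched) {
		/* New: the first attempt marks recovery start, sent or not. */
		if (!sk->retrans_stamp_ms)
			sk->retrans_stamp_ms = now_ms;
	} else if (sent && !sk->retrans_stamp_ms) {
		/* Old: only a successful send was stamped. */
		sk->retrans_stamp_ms = now_ms;
	}
}
```

For attempts at 1000, 2000 and 4000 ms where only the last one gets out, the old rule stamps 4000 (the latest attempt) and the new rule stamps 1000 (the first), so elapsed-time checks measure the whole stall.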

Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Reviewed-by: Neal Cardwell 
Reviewed-by: Soheil Hassas Yeganeh 
Signed-off-by: David S. Miller

tcp: exit if nothing to retransmit on RTO timeout

2019-01-17T23:12:26+00:00

Previously TCP only warned if its RTO timer fired while the
retransmission queue was empty, but this would cause a NULL pointer
dereference later on. It's better to avoid such a catastrophic failure
and simply exit with a warning.
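The guard amounts to checking the head of the retransmission queue before using it. Below is a minimal userspace sketch; the types and names are hypothetical, and fprintf() stands in for a kernel warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct model_skb {
	unsigned int seq;
};

struct model_sock {
	struct model_skb *rtx_head; /* first skb awaiting retransmission */
	unsigned int retransmitted; /* count of retransmissions performed */
};

enum rto_result { RTO_RETRANSMITTED, RTO_NOTHING_TO_DO };

static enum rto_result model_retransmit_timer(struct model_sock *sk)
{
	struct model_skb *skb;

	assert(sk != NULL);
	skb = sk->rtx_head;
	if (skb == NULL) {
		/* Warn and bail out instead of dereferencing NULL later. */
		fprintf(stderr, "warning: RTO fired with empty rtx queue\n");
		return RTO_NOTHING_TO_DO;
	}
	sk->retransmitted++; /* stands in for resending skb */
	return RTO_RETRANSMITTED;
}
```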

Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Reviewed-by: Neal Cardwell 
Reviewed-by: Soheil Hassas Yeganeh 
Signed-off-by: David S. Miller

tcp: change txhash on SYN-data timeout

2019-01-10T21:55:41+00:00

Previously, upon SYN timeouts the sender recomputes the txhash to
try a different path. However, this does not apply to the initial
timeout of SYN-data (active Fast Open). Therefore an active IPv6
Fast Open connection may incur a one-second RTO penalty before it
takes a new path, because only the second SYN retransmission uses a
new flow label.

This patch removes this undesirable behavior so that Fast Open changes
the flow label just like regular connections do. This also helps
avoid falsely disabling Fast Open on the sender, which is triggered
after two consecutive SYN timeouts on Fast Open.
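A small model of the before and after behavior, under loud assumptions: the struct, the xorshift32 "rehash" standing in for picking a fresh random txhash, and the rule that later timeouts always move to a new path (taken from the text's remark about the second SYN retransmission) are all hypothetical. Only the 20-bit width of the IPv6 flow label is a protocol fact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct model_syn_sock {
	uint32_t txhash;       /* nonzero; low 20 bits act as the flow label */
	unsigned int timeouts; /* SYN timeouts seen so far */
	bool syn_data;         /* active Fast Open: the SYN carries data */
};

/* Hypothetical rehash: xorshift32 never maps a nonzero value to itself. */
static uint32_t model_rethink_txhash(uint32_t h)
{
	assert(h != 0);
	h ^= h << 13;
	h ^= h >> 17;
	h ^= h << 5;
	return h;
}

static uint32_t model_flow_label(const struct model_syn_sock *sk)
{
	return sk->txhash & 0xFFFFFu; /* IPv6 flow label is 20 bits */
}

static void on_syn_timeout(struct model_syn_sock *sk, bool patched)
{
	bool rehash;

	if (sk->timeouts == 0)
		rehash = patched || !sk->syn_data; /* old code exempted SYN-data */
	else
		rehash = true; /* later timeouts move to a new path anyway */
	if (rehash)
		sk->txhash = model_rethink_txhash(sk->txhash);
	sk->timeouts++;
}
```

Under this model an unpatched SYN-data connection keeps its original hash through the first retransmission, while the patched one switches paths immediately, just as a regular SYN does.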

Signed-off-by: Yuchung Cheng 
Reviewed-by: Neal Cardwell 
Signed-off-by: David S. Miller

tcp: fix SNMP TCP timeout under-estimation

2018-12-01T01:22:41+00:00

Previously the SNMP TCPTIMEOUTS counter had inconsistent accounting:
1. It counts all SYN and SYN-ACK timeouts.
2. It counts timeouts in other states, except recurring timeouts and
   timeouts in the fast recovery or disorder state.

Such selective accounting makes analysis difficult. For example, a
monitoring system needs to collect many other SNMP counters to infer
the total number of timeout events. This patch makes the TCPTIMEOUTS
counter simply count every retransmit timeout (SYN, data, or FIN).
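The two accounting rules can be compared side by side. This sketch follows the commit text's description rather than the kernel source; the enum, struct and function names are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_ca_state { CA_OPEN, CA_DISORDER, CA_RECOVERY };

struct model_timeout_event {
	bool syn_or_synack;        /* timeout while handshaking */
	unsigned int retransmits;  /* > 0 means a recurring timeout */
	enum model_ca_state ca_state;
};

/* Old rule as described: every SYN/SYN-ACK timeout, but otherwise only
 * first timeouts outside fast recovery and disorder. */
static bool old_counts_timeout(const struct model_timeout_event *ev)
{
	if (ev->syn_or_synack)
		return true;
	return ev->retransmits == 0 && ev->ca_state != CA_RECOVERY &&
	       ev->ca_state != CA_DISORDER;
}

/* Tally TCPTIMEOUTS over a list of events under either rule. */
static unsigned int tally_timeouts(const struct model_timeout_event *evs,
				   unsigned int n, bool patched)
{
	unsigned int count = 0;

	assert(evs != NULL || n == 0);
	for (unsigned int i = 0; i < n; i++) {
		/* New rule: every retransmit timeout is counted. */
		bool counted = patched ? true : old_counts_timeout(&evs[i]);

		if (counted)
			count++;
	}
	return count;
}
```

For a SYN timeout, a recurring data timeout, a timeout during fast recovery and a first timeout in the open state, the old rule counts 2 of the 4 events and the new rule counts all 4.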

Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Signed-off-by: Neal Cardwell 
Signed-off-by: David S. Miller

tcp: fix off-by-one bug on aborting window-probing socket

2018-12-01T01:22:41+00:00

Previously there was an off-by-one bug in determining when to abort
a stalled window-probing socket. This patch fixes it so that the check
is consistent with tcp_write_timeout().
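An off-by-one in an abort check usually comes down to a strict versus non-strict comparison. The sketch below assumes exactly that (old check "probes_out > max_probes", fixed check "probes_out >= max_probes"); the function name is hypothetical and the real patch's code is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/* How many zero-window probes go out before the socket aborts. */
static unsigned int probes_sent_before_abort(unsigned int max_probes,
					     bool fixed)
{
	unsigned int probes_out = 0;

	for (;;) {
		bool give_up = fixed ? probes_out >= max_probes
				     : probes_out > max_probes;

		if (give_up)
			return probes_out;
		probes_out++; /* probe timer fired and sent another probe */
	}
}
```

Under this assumption a limit of 15 yields 16 probes with the old check and exactly 15 with the fixed one.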

Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Signed-off-by: Neal Cardwell 
Signed-off-by: David S. Miller