linux-stable.git/net/ipv4/tcp_timer.c, branch v6.8

tcp: use tp->total_rto to track number of linear timeouts in SYN_SENT state

2023-11-16T23:35:12+00:00

In commit ccce324dabfe ("tcp: make the first N SYN RTO backoffs linear")
David used icsk->icsk_backoff field to track the number of linear timeouts.

Since then, tp->total_rto has been added.

This commit uses tp->total_rto instead of icsk->icsk_backoff
so that tcp_ld_RTO_revert() no longer can trigger an overflow
in inet_csk_rto_backoff(). Other than the potential UBSAN
report, there was no issue because receiving an ICMP message
currently aborts the connect().

In the following patch, we want to adhere to RFC 6069
and RFC 1122 4.2.3.9.

Signed-off-by: Eric Dumazet 
Cc: David Morley 
Cc: Neal Cardwell 
Cc: Yuchung Cheng 
Signed-off-by: David S. Miller

tcp: add support for usec resolution in TCP TS values

2023-10-23T08:35:01+00:00

Back in 2015, Van Jacobson suggested to use usec resolution in TCP TS values.
This has been implemented in our private kernels.

Goals were :

1) better observability of delays in networking stacks.
2) better disambiguation of events based on TSval/ecr values.
3) building block for congestion control modules needing usec resolution.

Back then we implemented a schem based on private SYN options
to negotiate the feature.

For upstream submission, we chose to use a route attribute,
because this feature is probably going to be used in private
networks [1] [2].

ip route add 10/8 ... features tcp_usec_ts

Note that RFC 7323 recommends a
  "timestamp clock frequency in the range 1 ms to 1 sec per tick.",
but also mentions
  "the maximum acceptable clock frequency is one tick every 59 ns."

[1] Unfortunately RFC 7323 5.5 (Outdated Timestamps) suggests
to invalidate TS.Recent values after a flow was idle for more
than 24 days. This is the part making usec_ts a problem
for peers following this recommendation for long living
idle flows.

[2] Attempts to standardize usec ts went nowhere:

https://www.ietf.org/proceedings/97/slides/slides-97-tcpm-tcp-options-for-low-latency-00.pdf
https://datatracker.ietf.org/doc/draft-wang-tcpm-low-latency-opt/

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

tcp: rename tcp_time_stamp() to tcp_time_stamp_ts()

2023-10-23T08:35:01+00:00

This helper returns a TSval from a TCP socket.

It currently calls tcp_time_stamp_ms() but will soon
be able to return a usec based TSval, depending
on an upcoming tp->tcp_usec_ts field.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

tcp: rename tcp_skb_timestamp()

2023-10-23T08:35:01+00:00

This helper returns a 32bit TCP TSval from skb->tstamp.

As we are going to support usec or ms units soon, rename it
to tcp_skb_timestamp_ts() and add a boolean to select the unit.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

tcp: add tcp_time_stamp_ms() helper

2023-10-23T08:35:00+00:00

In preparation of adding usec TCP TS values, add tcp_time_stamp_ms()
for contexts needing ms based values.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

tcp: record last received ipv6 flowlabel

2023-10-10T08:02:59+00:00

In order to better estimate whether a data packet has been
retransmitted or is the result of a TLP, we save the last received
ipv6 flowlabel.

To make space for this field we resize the "ato" field in
inet_connection_sock as the current value of TCP_DELACK_MAX can be
fully contained in 8 bits and add a compile_time_assert ensuring this
field is the required size.

v2: addressed kernel bot feedback about dccp_delack_timer()
v3: addressed build error introduced by commit bbf80d713fe7 ("tcp:
derive delack_max from rto_min")

Signed-off-by: David Morley 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Tested-by: David Morley 
Reviewed-by: Eric Dumazet 
Signed-off-by: Paolo Abeni

tcp: new TCP_INFO stats for RTO events

2023-09-16T12:42:34+00:00

The 2023 SIGCOMM paper "Improving Network Availability with Protective
ReRoute" has indicated Linux TCP's RTO-triggered txhash rehashing can
effectively reduce application disruption during outages. To better
measure the efficacy of this feature, this patch adds three more
detailed stats during RTO recovery and exports via TCP_INFO.
Applications and monitoring systems can leverage this data to measure
the network path diversity and end-to-end repair latency during network
outages to improve their network infrastructure.

The following counters are added to tcp_sock in order to track RTO
events over the lifetime of a TCP socket.

1. u16 total_rto - Counts the total number of RTO timeouts.
2. u16 total_rto_recoveries - Counts the total number of RTO recoveries.
3. u32 total_rto_time - Counts the total time spent (ms) in RTO
                        recoveries. (time spent in CA_Loss and
                        CA_Recovery states)

To compute total_rto_time, we add a new u32 rto_stamp field to
tcp_sock. rto_stamp records the start timestamp (ms) of the last RTO
recovery (CA_Loss).

Corresponding fields are also added to the tcp_info struct.

Signed-off-by: Aananth V 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Reviewed-by: Eric Dumazet 
Signed-off-by: David S. Miller

tcp: indent an if statement

2023-09-15T12:55:50+00:00

Indent this if statement one tab.

Signed-off-by: Dan Carpenter 
Signed-off-by: David S. Miller

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

2023-08-18T19:44:56+00:00

Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/sfc/tc.c
  fa165e194997 ("sfc: don't unregister flow_indr if it was never registered")
  3bf969e88ada ("sfc: add MAE table machinery for conntrack table")
https://lore.kernel.org/all/20230818112159.7430e9b4@canb.auug.org.au/

No adjacent changes.

Signed-off-by: Jakub Kicinski

net: fix the RTO timer retransmitting skb every 1ms if linear option is enabled

2023-08-15T19:24:04+00:00

In the real workload, I encountered an issue which could cause the RTO
timer to retransmit the skb per 1ms with linear option enabled. The amount
of lost-retransmitted skbs can go up to 1000+ instantly.

The root cause is that if the icsk_rto happens to be zero in the 6th round
(which is the TCP_THIN_LINEAR_RETRIES value), then it will always be zero
due to the changed calculation method in tcp_retransmit_timer() as follows:

icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);

Above line could be converted to
icsk->icsk_rto = min(0 << 1, TCP_RTO_MAX) = 0

Therefore, the timer expires so quickly without any doubt.

I read through the RFC 6298 and found that the RTO value can be rounded
up to a certain value, in Linux, say TCP_RTO_MIN as default, which is
regarded as the lower bound in this patch as suggested by Eric.

Fixes: 36e31b0af587 ("net: TCP thin linear timeouts")
Suggested-by: Eric Dumazet 
Signed-off-by: Jason Xing 
Reviewed-by: Eric Dumazet 
Signed-off-by: David S. Miller