<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/net/ipv4, branch v5.6-rc4</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>tcp: fix TFO SYNACK undo to avoid double-timestamp-undo</title>
<updated>2020-02-24T01:23:35+00:00</updated>
<author>
<name>Neal Cardwell</name>
<email>ncardwell@google.com</email>
</author>
<published>2020-02-22T16:21:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dad8cea7add96a353fa1898b5ccefbb72da66f29'/>
<id>dad8cea7add96a353fa1898b5ccefbb72da66f29</id>
<content type='text'>
In a rare corner case the new logic for undo of SYNACK RTO could
result in triggering the warning in tcp_fastretrans_alert() that says:
        WARN_ON(tp-&gt;retrans_out != 0);

The warning looked like:

WARNING: CPU: 1 PID: 1 at net/ipv4/tcp_input.c:2818 tcp_ack+0x13e0/0x3270

The sequence that tickles this bug is:
 - Fast Open server receives TFO SYN with data, sends SYNACK
 - (client receives SYNACK and sends ACK, but ACK is lost)
 - server app sends some data packets
 - (N of the first data packets are lost)
 - server receives client ACK that has a TS ECR matching first SYNACK,
   and also SACKs suggesting the first N data packets were lost
    - server performs TS undo of SYNACK RTO, then immediately
      enters recovery
    - buggy behavior then performed a *second* undo that caused
      the connection to be in CA_Open with retrans_out != 0

Basically, the incoming ACK packet with SACK blocks causes us to first
undo the cwnd reduction from the SYNACK RTO, but then immediately
enters fast recovery, which then makes us eligible for undo again. And
then tcp_rcv_synrecv_state_fastopen() accidentally performs an undo
using a "mash-up" of state from two different loss recovery phases: it
uses the timestamp info from the ACK of the original SYNACK, and the
undo_marker from the fast recovery.

This fix refines the logic to only invoke the tcp_try_undo_loss()
inside tcp_rcv_synrecv_state_fastopen() if the connection is still in
CA_Loss.  If peer SACKs triggered fast recovery, then
tcp_rcv_synrecv_state_fastopen() can't safely undo.

Fixes: 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit")
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In a rare corner case the new logic for undo of SYNACK RTO could
result in triggering the warning in tcp_fastretrans_alert() that says:
        WARN_ON(tp-&gt;retrans_out != 0);

The warning looked like:

WARNING: CPU: 1 PID: 1 at net/ipv4/tcp_input.c:2818 tcp_ack+0x13e0/0x3270

The sequence that tickles this bug is:
 - Fast Open server receives TFO SYN with data, sends SYNACK
 - (client receives SYNACK and sends ACK, but ACK is lost)
 - server app sends some data packets
 - (N of the first data packets are lost)
 - server receives client ACK that has a TS ECR matching first SYNACK,
   and also SACKs suggesting the first N data packets were lost
    - server performs TS undo of SYNACK RTO, then immediately
      enters recovery
    - buggy behavior then performed a *second* undo that caused
      the connection to be in CA_Open with retrans_out != 0

Basically, the incoming ACK packet with SACK blocks causes us to first
undo the cwnd reduction from the SYNACK RTO, but then immediately
enters fast recovery, which then makes us eligible for undo again. And
then tcp_rcv_synrecv_state_fastopen() accidentally performs an undo
using a "mash-up" of state from two different loss recovery phases: it
uses the timestamp info from the ACK of the original SYNACK, and the
undo_marker from the fast recovery.

This fix refines the logic to only invoke the tcp_try_undo_loss()
inside tcp_rcv_synrecv_state_fastopen() if the connection is still in
CA_Loss.  If peer SACKs triggered fast recovery, then
tcp_rcv_synrecv_state_fastopen() can't safely undo.

Fixes: 794200d66273 ("tcp: undo cwnd on Fast Open spurious SYNACK retransmit")
Signed-off-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipv4: ensure rcu_read_lock() in cipso_v4_error()</title>
<updated>2020-02-23T05:45:55+00:00</updated>
<author>
<name>Matteo Croce</name>
<email>mcroce@redhat.com</email>
</author>
<published>2020-02-21T11:28:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3e72dfdf8227b052393f71d820ec7599909dddc2'/>
<id>3e72dfdf8227b052393f71d820ec7599909dddc2</id>
<content type='text'>
Similarly to commit c543cb4a5f07 ("ipv4: ensure rcu_read_lock() in
ipv4_link_failure()"), __ip_options_compile() must be called under rcu
protection.

Fixes: 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error")
Suggested-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Matteo Croce &lt;mcroce@redhat.com&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Similarly to commit c543cb4a5f07 ("ipv4: ensure rcu_read_lock() in
ipv4_link_failure()"), __ip_options_compile() must be called under rcu
protection.

Fixes: 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error")
Suggested-by: Guillaume Nault &lt;gnault@redhat.com&gt;
Signed-off-by: Matteo Croce &lt;mcroce@redhat.com&gt;
Acked-by: Paul Moore &lt;paul@paul-moore.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>udp: rehash on disconnect</title>
<updated>2020-02-20T00:34:11+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2020-02-19T19:16:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=303d0403b8c25e994e4a6e45389e173cf8706fb5'/>
<id>303d0403b8c25e994e4a6e45389e173cf8706fb5</id>
<content type='text'>
As of the below commit, udp sockets bound to a specific address can
coexist with one bound to the any addr for the same port.

The commit also phased out the use of socket hashing based only on
port (hslot), in favor of always hashing on {addr, port} (hslot2).

The change broke the following behavior with disconnect (AF_UNSPEC):

    server binds to 0.0.0.0:1337
    server connects to 127.0.0.1:80
    server disconnects
    client connects to 127.0.0.1:1337
    client sends "hello"
    server reads "hello"	// times out, packet did not find sk

On connect the server acquires a specific source addr suitable for
routing to its destination. On disconnect it reverts to the any addr.

The connect call triggers a rehash to a different hslot2. On
disconnect, add the same to return to the original hslot2.

Skip this step if the socket is going to be unhashed completely.

Fixes: 4cdeeee9252a ("net: udp: prefer listeners bound to an address")
Reported-by: Pavel Roskin &lt;plroskin@gmail.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As of the below commit, udp sockets bound to a specific address can
coexist with one bound to the any addr for the same port.

The commit also phased out the use of socket hashing based only on
port (hslot), in favor of always hashing on {addr, port} (hslot2).

The change broke the following behavior with disconnect (AF_UNSPEC):

    server binds to 0.0.0.0:1337
    server connects to 127.0.0.1:80
    server disconnects
    client connects to 127.0.0.1:1337
    client sends "hello"
    server reads "hello"	// times out, packet did not find sk

On connect the server acquires a specific source addr suitable for
routing to its destination. On disconnect it reverts to the any addr.

The connect call triggers a rehash to a different hslot2. On
disconnect, add the same to return to the original hslot2.

Skip this step if the socket is going to be unhashed completely.

Fixes: 4cdeeee9252a ("net: udp: prefer listeners bound to an address")
Reported-by: Pavel Roskin &lt;plroskin@gmail.com&gt;
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>icmp: introduce helper for nat'd source address in network device context</title>
<updated>2020-02-13T22:19:00+00:00</updated>
<author>
<name>Jason A. Donenfeld</name>
<email>Jason@zx2c4.com</email>
</author>
<published>2020-02-11T19:47:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0b41713b606694257b90d61ba7e2712d8457648b'/>
<id>0b41713b606694257b90d61ba7e2712d8457648b</id>
<content type='text'>
This introduces a helper function to be called only by network drivers
that wraps calls to icmp[v6]_send in a conntrack transformation, in case
NAT has been used. We don't want to pollute the non-driver path, though,
so we introduce this as a helper to be called by places that actually
make use of this, as suggested by Florian.

Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This introduces a helper function to be called only by network drivers
that wraps calls to icmp[v6]_send in a conntrack transformation, in case
NAT has been used. We don't want to pollute the non-driver path, though,
so we introduce this as a helper to be called by places that actually
make use of this, as suggested by Florian.

Signed-off-by: Jason A. Donenfeld &lt;Jason@zx2c4.com&gt;
Cc: Florian Westphal &lt;fw@strlen.de&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2020-02-04T13:32:20+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-02-04T13:32:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=33b40134e5cfbbccad7f3040d1919889537a3df7'/>
<id>33b40134e5cfbbccad7f3040d1919889537a3df7</id>
<content type='text'>
Pull networking fixes from David Miller:

 1) Use after free in rxrpc_put_local(), from David Howells.

 2) Fix 64-bit division error in mlxsw, from Nathan Chancellor.

 3) Make sure we clear various bits of TCP state in response to
    tcp_disconnect(). From Eric Dumazet.

 4) Fix netlink attribute policy in cls_rsvp, from Eric Dumazet.

 5) txtimer must be deleted in stmmac suspend(), from Nicolin Chen.

 6) Fix TC queue mapping in bnxt_en driver, from Michael Chan.

 7) Various netdevsim fixes from Taehee Yoo (use of uninitialized data,
    snapshot panics, stack out of bounds, etc.)

 8) cls_tcindex changes hash table size after allocating the table, fix
    from Cong Wang.

 9) Fix regression in the enforcement of session ID uniqueness in l2tp.
    We only have to enforce uniqueness for IP based tunnels not UDP
    ones. From Ridge Kennedy.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
  gtp: use __GFP_NOWARN to avoid memalloc warning
  l2tp: Allow duplicate session creation with UDP
  r8152: Add MAC passthrough support to new device
  net_sched: fix an OOB access in cls_tcindex
  qed: Remove set but not used variable 'p_link'
  tc-testing: add missing 'nsPlugin' to basic.json
  tc-testing: fix eBPF tests failure on linux fresh clones
  net: hsr: fix possible NULL deref in hsr_handle_frame()
  netdevsim: remove unused sdev code
  netdevsim: use __GFP_NOWARN to avoid memalloc warning
  netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs
  netdevsim: fix stack-out-of-bounds in nsim_dev_debugfs_init()
  netdevsim: fix panic in nsim_dev_take_snapshot_write()
  netdevsim: disable devlink reload when resources are being used
  netdevsim: fix using uninitialized resources
  bnxt_en: Fix TC queue mapping.
  bnxt_en: Fix logic that disables Bus Master during firmware reset.
  bnxt_en: Fix RDMA driver failure with SRIOV after firmware reset.
  bnxt_en: Refactor logic to re-enable SRIOV after firmware reset detected.
  net: stmmac: Delete txtimer in suspend()
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull networking fixes from David Miller:

 1) Use after free in rxrpc_put_local(), from David Howells.

 2) Fix 64-bit division error in mlxsw, from Nathan Chancellor.

 3) Make sure we clear various bits of TCP state in response to
    tcp_disconnect(). From Eric Dumazet.

 4) Fix netlink attribute policy in cls_rsvp, from Eric Dumazet.

 5) txtimer must be deleted in stmmac suspend(), from Nicolin Chen.

 6) Fix TC queue mapping in bnxt_en driver, from Michael Chan.

 7) Various netdevsim fixes from Taehee Yoo (use of uninitialized data,
    snapshot panics, stack out of bounds, etc.)

 8) cls_tcindex changes hash table size after allocating the table, fix
    from Cong Wang.

 9) Fix regression in the enforcement of session ID uniqueness in l2tp.
    We only have to enforce uniqueness for IP based tunnels not UDP
    ones. From Ridge Kennedy.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (46 commits)
  gtp: use __GFP_NOWARN to avoid memalloc warning
  l2tp: Allow duplicate session creation with UDP
  r8152: Add MAC passthrough support to new device
  net_sched: fix an OOB access in cls_tcindex
  qed: Remove set but not used variable 'p_link'
  tc-testing: add missing 'nsPlugin' to basic.json
  tc-testing: fix eBPF tests failure on linux fresh clones
  net: hsr: fix possible NULL deref in hsr_handle_frame()
  netdevsim: remove unused sdev code
  netdevsim: use __GFP_NOWARN to avoid memalloc warning
  netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs
  netdevsim: fix stack-out-of-bounds in nsim_dev_debugfs_init()
  netdevsim: fix panic in nsim_dev_take_snapshot_write()
  netdevsim: disable devlink reload when resources are being used
  netdevsim: fix using uninitialized resources
  bnxt_en: Fix TC queue mapping.
  bnxt_en: Fix logic that disables Bus Master during firmware reset.
  bnxt_en: Fix RDMA driver failure with SRIOV after firmware reset.
  bnxt_en: Refactor logic to re-enable SRIOV after firmware reset detected.
  net: stmmac: Delete txtimer in suspend()
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>proc: convert everything to "struct proc_ops"</title>
<updated>2020-02-04T03:05:26+00:00</updated>
<author>
<name>Alexey Dobriyan</name>
<email>adobriyan@gmail.com</email>
</author>
<published>2020-02-04T01:37:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=97a32539b9568bb653683349e5a76d02ff3c3e2c'/>
<id>97a32539b9568bb653683349e5a76d02ff3c3e2c</id>
<content type='text'>
The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in
seq_file.h.

Conversion rule is:

	llseek		=&gt; proc_lseek
	unlocked_ioctl	=&gt; proc_ioctl

	xxx		=&gt; proc_xxx

	delete ".owner = THIS_MODULE" line

[akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c]
[sfr@canb.auug.org.au: fix kernel/sched/psi.c]
  Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in
seq_file.h.

Conversion rule is:

	llseek		=&gt; proc_lseek
	unlocked_ioctl	=&gt; proc_ioctl

	xxx		=&gt; proc_xxx

	delete ".owner = THIS_MODULE" line

[akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c]
[sfr@canb.auug.org.au: fix kernel/sched/psi.c]
  Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
Signed-off-by: Alexey Dobriyan &lt;adobriyan@gmail.com&gt;
Signed-off-by: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: Reduce SYN resend delay if a suspicous ACK is received</title>
<updated>2020-02-02T21:33:21+00:00</updated>
<author>
<name>SeongJae Park</name>
<email>sjpark@amazon.de</email>
</author>
<published>2020-02-02T03:38:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9603d47bad4642118fa19fd1562569663d9235f6'/>
<id>9603d47bad4642118fa19fd1562569663d9235f6</id>
<content type='text'>
When closing a connection, the two acks that required to change closing
socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order.  This is possible in RSS disabled environments such as a
connection inside a host.

For example, expected state transitions and required packets for the
disconnection will be similar to below flow.

	 00 (Process A)				(Process B)
	 01 ESTABLISHED				ESTABLISHED
	 02 close()
	 03 FIN_WAIT_1
	 04 		---FIN--&gt;
	 05 					CLOSE_WAIT
	 06 		&lt;--ACK---
	 07 FIN_WAIT_2
	 08 		&lt;--FIN/ACK---
	 09 TIME_WAIT
	 10 		---ACK--&gt;
	 11 					LAST_ACK
	 12 CLOSED				CLOSED

In some cases such as LINGER option applied socket, the FIN and FIN/ACK
will be substituted to RST and RST/ACK, but there is no difference in
the main logic.

The acks in lines 6 and 8 are the acks.  If the line 8 packet is
processed before the line 6 packet, it will be just ignored as it is not
a expected packet, and the later process of the line 6 packet will
change the status of Process A to FIN_WAIT_2, but as it has already
handled line 8 packet, it will not go to TIME_WAIT and thus will not
send the line 10 packet to Process B.  Thus, Process B will left in
CLOSE_WAIT status, as below.

	 00 (Process A)				(Process B)
	 01 ESTABLISHED				ESTABLISHED
	 02 close()
	 03 FIN_WAIT_1
	 04 		---FIN--&gt;
	 05 					CLOSE_WAIT
	 06 				(&lt;--ACK---)
	 07	  			(&lt;--FIN/ACK---)
	 08 				(fired in right order)
	 09 		&lt;--FIN/ACK---
	 10 		&lt;--ACK---
	 11 		(processed in reverse order)
	 12 FIN_WAIT_2

Later, if the Process B sends SYN to Process A for reconnection using
the same port, Process A will responds with an ACK for the last flow,
which has no increased sequence number.  Thus, Process A will send RST,
wait for TIMEOUT_INIT (one second in default), and then try
reconnection.  If reconnections are frequent, the one second latency
spikes can be a big problem.  Below is a tcpdump results of the problem:

    14.436259 IP 127.0.0.1.45150 &gt; 127.0.0.1.4242: Flags [S], seq 2560603644
    14.436266 IP 127.0.0.1.4242 &gt; 127.0.0.1.45150: Flags [.], ack 5, win 512
    14.436271 IP 127.0.0.1.45150 &gt; 127.0.0.1.4242: Flags [R], seq 2541101298
    /* ONE SECOND DELAY */
    15.464613 IP 127.0.0.1.45150 &gt; 127.0.0.1.4242: Flags [S], seq 2560603644

This commit mitigates the problem by reducing the delay for the next SYN
if the suspicous ACK is received while in SYN_SENT state.

Following commit will add a selftest, which can be also helpful for
understanding of this issue.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: SeongJae Park &lt;sjpark@amazon.de&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When closing a connection, the two acks that required to change closing
socket's status to FIN_WAIT_2 and then TIME_WAIT could be processed in
reverse order.  This is possible in RSS disabled environments such as a
connection inside a host.

For example, expected state transitions and required packets for the
disconnection will be similar to below flow.

	 00 (Process A)				(Process B)
	 01 ESTABLISHED				ESTABLISHED
	 02 close()
	 03 FIN_WAIT_1
	 04 		---FIN--&gt;
	 05 					CLOSE_WAIT
	 06 		&lt;--ACK---
	 07 FIN_WAIT_2
	 08 		&lt;--FIN/ACK---
	 09 TIME_WAIT
	 10 		---ACK--&gt;
	 11 					LAST_ACK
	 12 CLOSED				CLOSED

In some cases such as LINGER option applied socket, the FIN and FIN/ACK
will be substituted to RST and RST/ACK, but there is no difference in
the main logic.

The acks in lines 6 and 8 are the acks.  If the line 8 packet is
processed before the line 6 packet, it will be just ignored as it is not
a expected packet, and the later process of the line 6 packet will
change the status of Process A to FIN_WAIT_2, but as it has already
handled line 8 packet, it will not go to TIME_WAIT and thus will not
send the line 10 packet to Process B.  Thus, Process B will left in
CLOSE_WAIT status, as below.

	 00 (Process A)				(Process B)
	 01 ESTABLISHED				ESTABLISHED
	 02 close()
	 03 FIN_WAIT_1
	 04 		---FIN--&gt;
	 05 					CLOSE_WAIT
	 06 				(&lt;--ACK---)
	 07	  			(&lt;--FIN/ACK---)
	 08 				(fired in right order)
	 09 		&lt;--FIN/ACK---
	 10 		&lt;--ACK---
	 11 		(processed in reverse order)
	 12 FIN_WAIT_2

Later, if the Process B sends SYN to Process A for reconnection using
the same port, Process A will responds with an ACK for the last flow,
which has no increased sequence number.  Thus, Process A will send RST,
wait for TIMEOUT_INIT (one second in default), and then try
reconnection.  If reconnections are frequent, the one second latency
spikes can be a big problem.  Below is a tcpdump results of the problem:

    14.436259 IP 127.0.0.1.45150 &gt; 127.0.0.1.4242: Flags [S], seq 2560603644
    14.436266 IP 127.0.0.1.4242 &gt; 127.0.0.1.45150: Flags [.], ack 5, win 512
    14.436271 IP 127.0.0.1.45150 &gt; 127.0.0.1.4242: Flags [R], seq 2541101298
    /* ONE SECOND DELAY */
    15.464613 IP 127.0.0.1.45150 &gt; 127.0.0.1.4242: Flags [S], seq 2560603644

This commit mitigates the problem by reducing the delay for the next SYN
if the suspicous ACK is received while in SYN_SENT state.

Following commit will add a selftest, which can be also helpful for
understanding of this issue.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: SeongJae Park &lt;sjpark@amazon.de&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: clear tp-&gt;segs_{in|out} in tcp_disconnect()</title>
<updated>2020-02-01T06:12:37+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2020-01-31T18:44:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=784f8344de750a41344f4bbbebb8507a730fc99c'/>
<id>784f8344de750a41344f4bbbebb8507a730fc99c</id>
<content type='text'>
tp-&gt;segs_in and tp-&gt;segs_out need to be cleared in tcp_disconnect().

tcp_disconnect() is rarely used, but it is worth fixing it.

Fixes: 2efd055c53c0 ("tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Marcelo Ricardo Leitner &lt;mleitner@redhat.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
tp-&gt;segs_in and tp-&gt;segs_out need to be cleared in tcp_disconnect().

tcp_disconnect() is rarely used, but it is worth fixing it.

Fixes: 2efd055c53c0 ("tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Marcelo Ricardo Leitner &lt;mleitner@redhat.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: clear tp-&gt;data_segs{in|out} in tcp_disconnect()</title>
<updated>2020-02-01T06:12:22+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2020-01-31T18:32:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=db7ffee6f3eb3683cdcaeddecc0a630a14546fe3'/>
<id>db7ffee6f3eb3683cdcaeddecc0a630a14546fe3</id>
<content type='text'>
tp-&gt;data_segs_in and tp-&gt;data_segs_out need to be cleared
in tcp_disconnect().

tcp_disconnect() is rarely used, but it is worth fixing it.

Fixes: a44d6eacdaf5 ("tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Martin KaFai Lau &lt;kafai@fb.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
tp-&gt;data_segs_in and tp-&gt;data_segs_out need to be cleared
in tcp_disconnect().

tcp_disconnect() is rarely used, but it is worth fixing it.

Fixes: a44d6eacdaf5 ("tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Martin KaFai Lau &lt;kafai@fb.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tcp: clear tp-&gt;delivered in tcp_disconnect()</title>
<updated>2020-02-01T06:12:18+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2020-01-31T18:22:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2fbdd56251b5c62f96589f39eded277260de7267'/>
<id>2fbdd56251b5c62f96589f39eded277260de7267</id>
<content type='text'>
tp-&gt;delivered needs to be cleared in tcp_disconnect().

tcp_disconnect() is rarely used, but it is worth fixing it.

Fixes: ddf1af6fa00e ("tcp: new delivery accounting")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
tp-&gt;delivered needs to be cleared in tcp_disconnect().

tcp_disconnect() is rarely used, but it is worth fixing it.

Fixes: ddf1af6fa00e ("tcp: new delivery accounting")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Neal Cardwell &lt;ncardwell@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: Jakub Kicinski &lt;kuba@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
