linux-stable.git/net/ipv4/tcp_input.c, branch linux-4.2.y

tcp: initialize tp->copied_seq in case of cross SYN connection

2015-12-15T05:25:38+00:00

[ Upstream commit 142a2e7ece8d8ac0e818eb2c91f99ca894730e2a ]

Dmitry provided a syzkaller (http://github.com/google/syzkaller)
generated program that triggers the WARNING at
net/ipv4/tcp.c:1729 in tcp_recvmsg() :

WARN_ON(tp->copied_seq != tp->rcv_nxt &&
        !(flags & (MSG_PEEK | MSG_TRUNC)));

His program is specifically attempting a Cross SYN TCP exchange,
that we support (for the pleasure of hackers ?), but it looks we
lack proper tcp->copied_seq initialization.

Thanks again Dmitry for your report and testings.

Signed-off-by: Eric Dumazet 
Reported-by: Dmitry Vyukov 
Tested-by: Dmitry Vyukov 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: fix potential huge kmalloc() calls in TCP_REPAIR

2015-12-15T05:25:37+00:00

[ Upstream commit 5d4c9bfbabdb1d497f21afd81501e5c54b0c85d9 ]

tcp_send_rcvq() is used for re-injecting data into tcp receive queue.

Problems :

- No check against size is performed, allowed user to fool kernel in
  attempting very large memory allocations, eventually triggering
  OOM when memory is fragmented.

- In case of fault during the copy we do not return correct errno.

Lets use alloc_skb_with_frags() to cook optimal skbs.

Fixes: 292e8d8c8538 ("tcp: Move rcvq sending to tcp_input.c")
Fixes: c0e88ff0f256 ("tcp: Repair socket queues")
Signed-off-by: Eric Dumazet 
Cc: Pavel Emelyanov 
Acked-by: Pavel Emelyanov 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

tcp: don't use F-RTO on non-recurring timeouts

2015-07-16T00:17:21+00:00

Currently F-RTO may repeatedly send new data packets on non-recurring
timeouts in CA_Loss mode. This is a bug because F-RTO (RFC5682)
should only be used on either new recovery or recurring timeouts.

This exacerbates the recovery progress during frequent timeout &
repair, because we prioritize sending new data packets instead of
repairing the holes when the bandwidth is already scarce.

Fix it by correcting the test of a new recovery episode.

Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
Signed-off-by: David S. Miller

tcp: fill shinfo->gso_size at last moment

2015-06-11T23:33:11+00:00

In commit cd7d8498c9a5 ("tcp: change tcp_skb_pcount() location") we stored
gso_segs in a temporary cache hot location.

This patch does the same for gso_size.

This allows to save 2 cache line misses in tcp xmit path for
the last packet that is considered but not sent because of
various conditions (cwnd, tso defer, receiver window, TSQ...)

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

tcp: fill shinfo->gso_type at last moment

2015-06-11T23:33:11+00:00

Our goal is to touch skb_shinfo(skb) only when absolutely needed,
to avoid two cache line misses in TCP output path for last skb
that is considered but not sent because of various conditions
(cwnd, tso defer, receiver window, TSQ...)

A packet is GSO only when skb_shinfo(skb)->gso_size is not zero.

We can set skb_shinfo(skb)->gso_type to sk->sk_gso_type even for
non GSO packets.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

tcp: export tcp_enter_cwr()

2015-06-11T07:09:12+00:00

Upcoming tcp_cdg uses tcp_enter_cwr() to initiate PRR. Export this
function so that CDG can be compiled as a module.

Cc: Eric Dumazet 
Cc: Yuchung Cheng 
Cc: Stephen Hemminger 
Cc: Neal Cardwell 
Cc: David Hayes 
Cc: Andreas Petlund 
Cc: Dave Taht 
Cc: Nicolas Kuhn 
Signed-off-by: Kenneth Klette Jonassen 
Acked-by: Yuchung Cheng 
Signed-off-by: David S. Miller

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

2015-05-23T05:22:35+00:00

Conflicts:
	drivers/net/ethernet/cadence/macb.c
	drivers/net/phy/phy.c
	include/linux/skbuff.h
	net/ipv4/tcp.c
	net/switchdev/switchdev.c

Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD}
renaming overlapping with net-next changes of various
sorts.

phy.c was a case of two changes, one adding a local
variable to a function whilst the second was removing
one.

tcp.c overlapped a deadlock fix with the addition of new tcp_info
statistic values.

macb.c involved the addition of two zyncq device entries.

skbuff.h involved adding back ipv4_daddr to nf_bridge_info
whilst net-next changes put two other existing members of
that struct into a union.

Signed-off-by: David S. Miller

tcp: fix a potential deadlock in tcp_get_info()

2015-05-22T17:46:06+00:00

Taking socket spinlock in tcp_get_info() can deadlock, as
inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
while packet processing can use the reverse locking order.

We could avoid this locking for TCP_LISTEN states, but lockdep would
certainly get confused as all TCP sockets share same lockdep classes.

[  523.722504] ======================================================
[  523.728706] [ INFO: possible circular locking dependency detected ]
[  523.734990] 4.1.0-dbg-DEV #1676 Not tainted
[  523.739202] -------------------------------------------------------
[  523.745474] ss/18032 is trying to acquire lock:
[  523.750002]  (slock-AF_INET){+.-...}, at: [] tcp_get_info+0x2c4/0x360
[  523.758129]
[  523.758129] but task is already holding lock:
[  523.763968]  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [] inet_diag_dump_icsk+0x1d5/0x6c0
[  523.774661]
[  523.774661] which lock already depends on the new lock.
[  523.774661]
[  523.782850]
[  523.782850] the existing dependency chain (in reverse order) is:
[  523.790326]
-> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
[  523.796599]        [] lock_acquire+0xbb/0x270
[  523.802565]        [] _raw_spin_lock+0x38/0x50
[  523.808628]        [] __inet_hash_nolisten+0x78/0x110
[  523.815273]        [] tcp_v4_syn_recv_sock+0x24b/0x350
[  523.822067]        [] tcp_check_req+0x3c1/0x500
[  523.828199]        [] tcp_v4_do_rcv+0x239/0x3d0
[  523.834331]        [] tcp_v4_rcv+0xa8e/0xc10
[  523.840202]        [] ip_local_deliver_finish+0x133/0x3e0
[  523.847214]        [] ip_local_deliver+0xaa/0xc0
[  523.853440]        [] ip_rcv_finish+0x168/0x5c0
[  523.859624]        [] ip_rcv+0x307/0x420

Lets use u64_sync infrastructure instead. As a bonus, 64bit
arches get optimized, as these are nop for them.

Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

tcp: don't over-send F-RTO probes

2015-05-19T20:36:57+00:00

After sending the new data packets to probe (step 2), F-RTO may
incorrectly send more probes if the next ACK advances SND_UNA and
does not sack new packet. However F-RTO RFC 5682 probes at most
once. This bug may cause sender to always send new data instead of
repairing holes, inducing longer HoL blocking on the receiver for
the application.

Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
Signed-off-by: David S. Miller

tcp: only undo on partial ACKs in CA_Loss

2015-05-19T20:36:57+00:00

Undo based on TCP timestamps should only happen on ACKs that advance
SND_UNA, according to the Eifel algorithm in RFC 3522:

Section 3.2:

  (4) If the value of the Timestamp Echo Reply field of the
      acceptable ACK's Timestamps option is smaller than the
      value of RetransmitTS, then proceed to step (5),

Section Terminology:
   We use the term 'acceptable ACK' as defined in [RFC793].  That is an
   ACK that acknowledges previously unacknowledged data.

This is because upon receiving an out-of-order packet, the receiver
returns the last timestamp that advances RCV_NXT, not the current
timestamp of the packet in the DUPACK. Without checking the flag,
the DUPACK will cause tcp_packet_delayed() to return true and
tcp_try_undo_loss() will revert cwnd reduction.

Note that we check the condition in CA_Recovery already by only
calling tcp_try_undo_partial() if FLAG_SND_UNA_ADVANCED is set or
tcp_try_undo_recovery() if snd_una crosses high_seq.

Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
Signed-off-by: David S. Miller