linux-stable.git/net/ipv4/proc.c, branch v4.14.331

tcp: tcp_fragment() should apply sane memory limits

2019-06-17T17:52:44+00:00

commit f070ef2ac66716357066b683fb0baf55f8191a2e upstream.

Jonathan Looney reported that a malicious peer can force a sender
to fragment its retransmit queue into tiny skbs, inflating memory
usage and/or overflowing 32-bit counters.

TCP allows an application to queue up to sk_sndbuf bytes,
so we need to give some allowance for non-malicious splitting
of the retransmit queue.

A new SNMP counter is added to monitor how many times TCP
refused to split an skb because the allowance was exceeded.

Note that this counter might increase when applications
use the SO_SNDBUF socket option to lower sk_sndbuf.

CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the
	socket is already using more than half the allowed space
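
As a rough illustration of the guard this commit describes, here is a user-space sketch in C. The struct sock_model, the function may_fragment() and the split_refused field are hypothetical names. The threshold used here (refuse once queued memory exceeds twice the send buffer) is an assumption about the shape of the check, not a quote from the patch.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the few socket fields involved. */
struct sock_model {
	int64_t wmem_queued;    /* bytes currently charged to the write queue */
	int64_t sndbuf;         /* send buffer limit, models sk_sndbuf */
	uint64_t split_refused; /* models the new SNMP counter */
};

/*
 * Return 0 if splitting an skb may proceed, -ENOMEM otherwise.
 * Assumption: the allowance is twice sndbuf, so ordinary splitting of a
 * full send buffer is still permitted.
 */
static int may_fragment(struct sock_model *sk)
{
	if (sk->wmem_queued / 2 > sk->sndbuf) {
		sk->split_refused++; /* count every refused split */
		return -ENOMEM;
	}
	return 0;
}
```

Lowering sndbuf while data is already queued makes this check trip sooner, which is consistent with the note that the counter can increase when applications shrink SO_SNDBUF.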

Signed-off-by: Eric Dumazet 
Reported-by: Jonathan Looney 
Acked-by: Neal Cardwell 
Acked-by: Yuchung Cheng 
Reviewed-by: Tyler Hicks 
Cc: Bruce Curtis 
Cc: Jonathan Lemon 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

ip: discard IPv4 datagrams with overlapping segments.

2018-09-19T20:43:47+00:00

This behavior is required in IPv6, and there is little need
to tolerate overlapping fragments in IPv4. This change
simplifies the code and eliminates potential DDoS attack vectors.

Tested: ran ip_defrag selftest (not yet available upstream).

Suggested-by: David S. Miller 
Signed-off-by: Peter Oskolkov 
Signed-off-by: Eric Dumazet 
Cc: Florian Westphal 
Acked-by: Stephen Hemminger 
Signed-off-by: David S. Miller 
(cherry picked from commit 7969e5c40dfd04799d4341f1b7cd266b6e47f227)
Signed-off-by: Greg Kroah-Hartman

inet: frags: break the 2GB limit for frags storage

2018-09-19T20:43:46+00:00

Some users are willing to provision huge amounts of memory to be able
to perform reassembly reasonably well under pressure.

Current memory tracking is using one atomic_t and integers.

Switch to atomic_long_t so that 64bit arches can use more than 2GB,
without any cost for 32bit arches.

Note that this patch avoids an overflow error, if high_thresh was set
to ~2GB, since this test in inet_frag_alloc() was never true :

if (... || frag_mem_limit(nf) > nf->high_thresh)
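
The overflow mentioned above can be illustrated with a small user-space sketch. It models the old 32-bit accounting as int32_t and the new accounting as int64_t. The helper names (charge32, over_limit32, charge64, over_limit64) are hypothetical, and the wraparound is emulated with unsigned arithmetic so the example itself has no undefined behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Add bytes to a 32-bit signed counter with two's-complement wraparound. */
static int32_t charge32(int32_t mem, uint32_t bytes)
{
	uint32_t u = (uint32_t)mem + bytes;

	if (u <= (uint32_t)INT32_MAX)
		return (int32_t)u;
	/* Map the upper half of the unsigned range onto negative values. */
	return -(int32_t)(UINT32_MAX - u) - 1;
}

/* Same shape as the limit test: is usage above the high threshold? */
static bool over_limit32(int32_t mem, int32_t high_thresh)
{
	return mem > high_thresh;
}

/* 64-bit accounting: no wrap at 2GB. */
static int64_t charge64(int64_t mem, uint32_t bytes)
{
	return mem + bytes;
}

static bool over_limit64(int64_t mem, int64_t high_thresh)
{
	return mem > high_thresh;
}
```

With a threshold at INT32_MAX, a 32-bit counter turns negative before it can exceed the threshold, so over_limit32() never reports true. The 64-bit variant compares correctly for thresholds such as the 16000000000 used in the test below.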

Tested:

$ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh



$ grep FRAG /proc/net/sockstat
FRAG: inuse 14705885 memory 16000002880

$ nstat -n ; sleep 1 ; nstat | grep Reas
IpReasmReqds                    3317150            0.0
IpReasmFails                    3317112            0.0

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
(cherry picked from commit 3e67f106f619dcfaf6f4e2039599bdb69848c714)
Signed-off-by: Greg Kroah-Hartman

inet: frags: remove some helpers

2018-09-19T20:43:46+00:00

Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem()

Also since we use rhashtable we can bring back the number of fragments
in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
removed in commit 434d305405ab ("inet: frag: don't account number
of fragment queues")

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
(cherry picked from commit 6befe4a78b1553edb6eed3a78b4bcd9748526672)
Signed-off-by: Greg Kroah-Hartman

tcp: Revert "tcp: remove header prediction"

2017-08-30T18:20:09+00:00

This reverts commit 45f119bf936b1f9f546a0b139c5b56f9bb2bdc78.

Eric Dumazet says:
  We found at Google a significant regression caused by
  45f119bf936b1f9f546a0b139c5b56f9bb2bdc78 tcp: remove header prediction

  In typical RPC (TCP_RR), when a TCP socket receives data, we now call
  tcp_ack() while we used to not call it.

  This touches enough cache lines to cause a slowdown.

So the problem does not seem to be the header prediction (HP) removal
itself but the tcp_ack() call. Therefore, it might be possible to remove
HP after all, provided one finds a way to elide tcp_ack() in most cases.

Reported-by: Eric Dumazet 
Signed-off-by: Florian Westphal 
Signed-off-by: David S. Miller

tcp: remove unused mib counters

2017-07-31T21:37:50+00:00

The removed counters were used by tcp prequeue and header prediction.
The use of TCPFORWARDRETRANS was removed in January.

Signed-off-by: Florian Westphal 
Signed-off-by: David S. Miller

tcp: add TCPMemoryPressuresChrono counter

2017-06-08T15:26:19+00:00

DRAM supply shortage and poor memory pressure tracking in the TCP
stack make any change in SO_SNDBUF/SO_RCVBUF (or equivalent autotuning
limits) and tcp_mem[] quite hazardous.

The TCPMemoryPressures SNMP counter indicates that the tcp_mem sysctl
limits are being hit, but it only tracks the number of transitions.

If TCP stack behavior under stress were perfect:
1) It would maintain memory usage close to the limit.
2) Memory pressure state would be entered for short times.

We certainly prefer 100 events lasting 10 ms each to one event
lasting 200 seconds.

This patch adds a new SNMP counter tracking cumulative duration of
memory pressure events, given in ms units.

$ cat /proc/sys/net/ipv4/tcp_mem
3088    4117    6176
$ grep TCP /proc/net/sockstat
TCP: inuse 180 orphan 0 tw 2 alloc 234 mem 4140
$ nstat -n ; sleep 10 ; nstat |grep Pressure
TcpExtTCPMemoryPressures        1700
TcpExtTCPMemoryPressuresChrono  5209
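
The difference between counting transitions and accumulating duration can be sketched in user space as follows. The struct pressure_stats and the three helpers are hypothetical names for illustration; timestamps are passed in as plain milliseconds rather than read from a kernel clock.

```c
#include <stdbool.h>
#include <stdint.h>

struct pressure_stats {
	uint64_t pressures;  /* number of entries into memory pressure */
	uint64_t chrono_ms;  /* cumulative time spent under pressure, in ms */
	uint64_t entered_ms; /* start time of the current episode */
	bool under_pressure;
};

/* Record a transition into memory pressure (ignored if already in it). */
static void enter_pressure(struct pressure_stats *s, uint64_t now_ms)
{
	if (s->under_pressure)
		return;
	s->under_pressure = true;
	s->entered_ms = now_ms;
	s->pressures++;
}

/* Leave memory pressure and add the episode length to the total. */
static void leave_pressure(struct pressure_stats *s, uint64_t now_ms)
{
	if (!s->under_pressure)
		return;
	s->under_pressure = false;
	s->chrono_ms += now_ms - s->entered_ms;
}

/* Mean episode length in microseconds, 0 if there were no episodes. */
static uint64_t mean_episode_us(const struct pressure_stats *s)
{
	return s->pressures ? s->chrono_ms * 1000 / s->pressures : 0;
}
```

Applied to the sample output above, 5209 ms spread over 1700 transitions in the 10 second window works out to roughly 3 ms per episode, which the transition count alone could not show.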

v2: Used EXPORT_SYMBOL_GPL() instead of EXPORT_SYMBOL() as David
instructed.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

net/tcp_fastopen: Add snmp counter for blackhole detection

2017-04-24T18:27:17+00:00

This counter records the number of times the firewall blackhole issue is
detected and active TFO is disabled.

Signed-off-by: Wei Wang 
Acked-by: Yuchung Cheng 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller

tcp: remove tcp_tw_recycle

2017-03-17T03:33:56+00:00

The tcp_tw_recycle was already broken for connections
behind NAT, since the per-destination timestamp is not
monotonically increasing for multiple machines behind
a single destination address.

After the randomization of TCP timestamp offsets
in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets
for each connection), tcp_tw_recycle is broken for all
types of connections for the same reason: the timestamps
received from a single machine are no longer monotonically
increasing.

Remove tcp_tw_recycle, since it is not functional. Also, remove
the PAWSPassive SNMP counter since it is only used for
tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req
since the strict argument is only set when tcp_tw_recycle is
enabled.
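
The failure mode described above can be sketched with a simplified per-destination timestamp check. This is a user-space model under stated assumptions: struct peer_ts, ts_not_older() and accept_syn() are hypothetical names, and the real kernel logic also involved a PAWS window and time bounds that are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-destination cache entry: one slot per remote address. */
struct peer_ts {
	bool valid;
	uint32_t last_tsval;
};

/* Wraparound-safe "a is not older than b" for 32-bit TCP timestamps. */
static bool ts_not_older(uint32_t a, uint32_t b)
{
	return (uint32_t)(a - b) < 0x80000000u;
}

/*
 * Accept a SYN only if its timestamp did not go backwards relative to the
 * last one seen from the same address; remember it when accepted.
 */
static bool accept_syn(struct peer_ts *peer, uint32_t tsval)
{
	if (peer->valid && !ts_not_older(tsval, peer->last_tsval))
		return false;
	peer->valid = true;
	peer->last_tsval = tsval;
	return true;
}
```

Two hosts behind one NAT address share a single peer_ts slot, so a SYN from the host with the lower timestamp clock is rejected after the other host has connected. With randomized per-connection offsets, the same happens between connections from a single host.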

Signed-off-by: Soheil Hassas Yeganeh 
Signed-off-by: Eric Dumazet 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Cc: Lutz Vieweg 
Cc: Florian Westphal 
Signed-off-by: David S. Miller

net: add LINUX_MIB_PFMEMALLOCDROP counter

2017-02-03T04:34:19+00:00

Debugging issues caused by pfmemalloc is often tedious.

Add a new SNMP counter to more easily diagnose these problems.

Signed-off-by: Eric Dumazet 
Cc: Josef Bacik 
Acked-by: Josef Bacik 
Signed-off-by: David S. Miller