linux.git/Documentation/networking/ip-sysctl.txt, branch v4.17

Merge tag 'staging-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

2018-04-27T16:37:12+00:00

Pull staging fixes from Greg KH:
 "Here are two staging driver fixups for 4.17-rc3.

  The first is the remaining stragglers of the irda code removal that
  you pointed out during the merge window. The second is a fix for the
  wilc1000 driver due to a patch that got merged in 4.17-rc1.

  Both of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: wilc1000: fix NULL pointer exception in host_int_parse_assoc_resp_info()
  staging: irda: remove remaining remnants of irda code removal

docs: ip-sysctl.txt: fix name of some ipv6 variables

2018-04-19T19:20:09+00:00

The names of the following proc/sysctl entries were incorrectly
documented:

    /proc/sys/net/ipv6/conf//max_dst_opts_number
    /proc/sys/net/ipv6/conf//max_hbh_opts_number
    /proc/sys/net/ipv6/conf//max_dst_opts_length
    /proc/sys/net/ipv6/conf//max_hbh_length

Their names were set to the name of the symbol in the .data field of
the control table instead of their .procname.
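
To illustrate the distinction, here is a minimal userspace sketch, not kernel code. `struct sysctl_entry` is a hypothetical, stripped-down stand-in for the kernel's `struct ctl_table`; the backing variable names are assumed to resemble the kernel's `.data` symbols, their values are placeholders, and the procnames are assumed to use the hop-by-hop ("hbh") spelling. A lookup succeeds only by procname, never by the name of the `.data` symbol.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's struct ctl_table:
 * only the user-visible name and a pointer to the backing variable. */
struct sysctl_entry {
    const char *procname;   /* file name visible under /proc/sys */
    int *data;              /* backing variable; its C name is never exposed */
};

/* Backing variables; names are assumptions, values are placeholders. */
static int max_dst_opts_cnt = 1, max_hbh_opts_cnt = 2;
static int max_dst_opts_len = 3, max_hbh_opts_len = 4;

static const struct sysctl_entry ipv6_opts_table[] = {
    { "max_dst_opts_number", &max_dst_opts_cnt },
    { "max_hbh_opts_number", &max_hbh_opts_cnt },
    { "max_dst_opts_length", &max_dst_opts_len },
    { "max_hbh_length",      &max_hbh_opts_len },
};

/* Resolve an entry the way a /proc path lookup would: by procname only. */
const int *sysctl_lookup(const char *name)
{
    size_t n = sizeof(ipv6_opts_table) / sizeof(ipv6_opts_table[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(ipv6_opts_table[i].procname, name) == 0)
            return ipv6_opts_table[i].data;
    return NULL;   /* e.g. the .data symbol name is not a valid entry */
}
```

Documenting `max_dst_opts_cnt` instead of `max_dst_opts_number` therefore points readers at a file that does not exist.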

Signed-off-by: Olivier Gayot 
Signed-off-by: David S. Miller

staging: irda: remove remaining remnants of irda code removal

2018-04-16T09:26:49+00:00

There were some places in the documentation where irda was mentioned, as
well as an old MAINTAINERS entry and the networking sysctl entries.
Clean these all out, as this stuff really is finally gone.

Reported-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

inet: frags: break the 2GB limit for frags storage

2018-04-01T03:25:39+00:00

Some users are willing to provision huge amounts of memory to be able
to perform reassembly reasonably well under pressure.

Current memory tracking uses one atomic_t and integers.

Switch to atomic_long_t so that 64bit arches can use more than 2GB,
without any cost for 32bit arches.

Note that this patch also fixes an overflow error: if high_thresh was
set to ~2GB, this test in inet_frag_alloc() was never true:

if (... || frag_mem_limit(nf) > nf->high_thresh)
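
The overflow can be shown with a simplified model that uses plain integers instead of kernel atomics; `frag_over_limit_32()`, `frag_over_limit_64()` and `fits_in_int32()` are hypothetical helpers. With 32-bit storage, a threshold at INT32_MAX (~2GB) can never be exceeded, because no 32-bit value is greater than it, and a 16000000000-byte threshold cannot be stored at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Before: usage and thresholds held in 32-bit ints (atomic_t). */
bool frag_over_limit_32(int32_t mem, int32_t high_thresh)
{
    return mem > high_thresh;   /* always false when high_thresh == INT32_MAX */
}

/* After: atomic_long_t, which is 64 bits wide on 64-bit arches. */
bool frag_over_limit_64(int64_t mem, int64_t high_thresh)
{
    return mem > high_thresh;
}

/* True if a requested threshold survives storage in a 32-bit int. */
bool fits_in_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

With 64-bit storage, the sockstat figure from the test below (16000002880 bytes in use against a 16000000000 threshold) correctly registers as over the limit.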

Tested:

$ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh



$ grep FRAG /proc/net/sockstat
FRAG: inuse 14705885 memory 16000002880

$ nstat -n ; sleep 1 ; nstat | grep Reas
IpReasmReqds                    3317150            0.0
IpReasmFails                    3317112            0.0

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

inet: frags: use rhashtables for reassembly units

2018-04-01T03:25:39+00:00

Some applications still rely on IP fragmentation, and to be fair, the
Linux reassembly unit does not work under any serious load.

It uses static hash tables of 1024 buckets, with up to 128 items per bucket (!!!).

A work queue is supposed to garbage collect items when the host is under
memory pressure, and to do a hash rebuild, changing the seed used in hash
computations.

This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
which occurs every 5 seconds if the host is under fire.

Then there is the problem of sharing this hash table across all netns.

It is time to switch to rhashtables, and to allocate one of them per netns
to speed up netns dismantling, since this is a critical metric these days.

Lookup now uses RCU. A followup patch will even remove the refcount
hold/release left over from the prior implementation and save a couple
of atomic operations.
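
The core idea, a hash table owned by each netns that grows with its load instead of a fixed 1024-bucket table shared by every namespace, can be sketched in userspace C. This is a deliberately simplified stand-in, not rhashtable: it resizes synchronously on insert, has no RCU or locking, and uses a toy hash; `struct frag_key` and all `frag_table_*` names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reassembly key, loosely modeled on an IPv4 fragment queue. */
struct frag_key {
    uint32_t saddr, daddr;
    uint16_t id;
    uint8_t protocol;
};

struct frag_node {
    struct frag_key key;
    struct frag_node *next;
};

/* One table per network namespace instead of one shared global table. */
struct frag_table {
    struct frag_node **buckets;
    size_t nbuckets;   /* always a power of two */
    size_t nelems;
};

/* Simple integer mixing for illustration; not the kernel's hash. */
static uint32_t frag_hash(const struct frag_key *k)
{
    uint32_t h = k->saddr * 0x9e3779b1u;
    h ^= k->daddr + 0x7f4a7c15u + (h << 6) + (h >> 2);
    h ^= (((uint32_t)k->id << 8) | k->protocol) + (h << 6) + (h >> 2);
    return h;
}

static int key_eq(const struct frag_key *a, const struct frag_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->id == b->id && a->protocol == b->protocol;
}

/* nbuckets must be a power of two. */
int frag_table_init(struct frag_table *t, size_t nbuckets)
{
    t->buckets = calloc(nbuckets, sizeof(*t->buckets));
    t->nbuckets = nbuckets;
    t->nelems = 0;
    return t->buckets ? 0 : -1;
}

/* Double the bucket array and rehash every node (synchronously here). */
static int frag_table_grow(struct frag_table *t)
{
    size_t n = t->nbuckets * 2;
    struct frag_node **nb = calloc(n, sizeof(*nb));

    if (!nb)
        return -1;
    for (size_t i = 0; i < t->nbuckets; i++) {
        struct frag_node *e = t->buckets[i];
        while (e) {
            struct frag_node *next = e->next;
            size_t b = frag_hash(&e->key) & (n - 1);
            e->next = nb[b];
            nb[b] = e;
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->nbuckets = n;
    return 0;
}

struct frag_node *frag_table_lookup(const struct frag_table *t,
                                    const struct frag_key *k)
{
    struct frag_node *e = t->buckets[frag_hash(k) & (t->nbuckets - 1)];

    for (; e; e = e->next)
        if (key_eq(&e->key, k))
            return e;
    return NULL;
}

int frag_table_insert(struct frag_table *t, const struct frag_key *k)
{
    /* Grow once the table would be more than 75% full, keeping chains short. */
    if ((t->nelems + 1) * 4 > t->nbuckets * 3 && frag_table_grow(t))
        return -1;
    struct frag_node *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    e->key = *k;
    size_t b = frag_hash(k) & (t->nbuckets - 1);
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->nelems++;
    return 0;
}

/* Tearing down one namespace touches only that namespace's table. */
void frag_table_destroy(struct frag_table *t)
{
    for (size_t i = 0; i < t->nbuckets; i++) {
        struct frag_node *e = t->buckets[i];
        while (e) {
            struct frag_node *next = e->next;
            free(e);
            e = next;
        }
    }
    free(t->buckets);
}
```

Starting from 16 buckets, inserting 5000 keys grows this table to 8192 buckets, so chains stay short instead of reaching the 128-item buckets described above.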

Before this patch, 16 CPUs (16 RX queue NIC) could not handle more
than a 1 Mpps fragment DDoS.

After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
of storage for the fragments (the exact number depends on fragments being
evicted after timeout).

$ grep FRAG /proc/net/sockstat
FRAG: inuse 1966916 memory 2140004608

A followup patch will change the limits for 64bit arches.

Signed-off-by: Eric Dumazet 
Cc: Kirill Tkhai 
Cc: Herbert Xu 
Cc: Florian Westphal 
Cc: Jesper Dangaard Brouer 
Cc: Alexander Aring 
Cc: Stefan Schmidt 
Signed-off-by: David S. Miller

Documentation: ip-sysctl.txt: clarify disable_ipv6

2018-03-30T16:20:52+00:00

Clarify that when disable_ipv6 is enabled, even the IPv6 routes for the
selected interface are deleted, and from then on it is not possible to
add addresses or routes to that interface.

Signed-off-by: Lorenzo Bianconi 
Signed-off-by: David S. Miller

doc: Change the udp/sctp rmem/wmem default value.

2018-03-16T16:03:30+00:00

The SK_MEM_QUANTUM was changed from PAGE_SIZE to 4096.

Signed-off-by: Tonghao Zhang 
Signed-off-by: David S. Miller

net/ipv6: Add support for path selection using hash of 5-tuple

2018-03-04T18:04:23+00:00

Some operators prefer IPv6 path selection to use a standard 5-tuple
hash rather than just an L3 hash with the flow label. To that end,
add support to IPv6 for multipath hash policy similar to bf4e0a3db97eb
("net: ipv4: add support for ECMP hash policy choice"). The default
is still L3 which covers source and destination addresses along with
flow label and IPv6 protocol.
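
A minimal sketch of the two policies: the L3 hash covers the addresses, flow label and next header, while the L4 hash covers the standard 5-tuple. FNV-1a is used only as a convenient stand-in for the kernel's flow hash, `struct flow6`, `mp_hash()` and `mp_select_path()` are hypothetical names, and the policy values 0 (L3) and 1 (L4) are assumed to mirror the net.ipv6.fib_multipath_hash_policy sysctl.

```c
#include <stddef.h>
#include <stdint.h>

/* Policy values assumed to mirror net.ipv6.fib_multipath_hash_policy. */
enum mp_hash_policy { MP_HASH_L3 = 0, MP_HASH_L4 = 1 };

/* Hypothetical flattened view of the header fields a router can hash on. */
struct flow6 {
    uint8_t saddr[16], daddr[16];
    uint32_t flowlabel;   /* only the low 20 bits are meaningful */
    uint8_t nexthdr;      /* IPv6 protocol, e.g. 17 for UDP */
    uint16_t sport, dport;
};

/* FNV-1a, used here purely as a simple stand-in hash. */
static uint32_t fnv1a(uint32_t h, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static uint32_t fnv1a_u32(uint32_t h, uint32_t v)
{
    const uint8_t b[4] = { (uint8_t)(v >> 24), (uint8_t)(v >> 16),
                           (uint8_t)(v >> 8), (uint8_t)v };
    return fnv1a(h, b, sizeof(b));
}

uint32_t mp_hash(const struct flow6 *f, enum mp_hash_policy policy)
{
    uint32_t h = 2166136261u;   /* FNV-1a offset basis */

    h = fnv1a(h, f->saddr, sizeof(f->saddr));
    h = fnv1a(h, f->daddr, sizeof(f->daddr));
    if (policy == MP_HASH_L4)
        /* 5-tuple: ports join addresses and protocol; flow label ignored */
        h = fnv1a_u32(h, ((uint32_t)f->sport << 16) | f->dport);
    else
        /* L3: flow label joins addresses and protocol; ports ignored */
        h = fnv1a_u32(h, f->flowlabel & 0xFFFFFu);
    return fnv1a(h, &f->nexthdr, 1);
}

/* Pick one of npaths equal-cost next hops. */
unsigned int mp_select_path(const struct flow6 *f, enum mp_hash_policy policy,
                            unsigned int npaths)
{
    return mp_hash(f, policy) % npaths;
}
```

With the L3 policy, two UDP flows between the same hosts that differ only in source port always hash identically and share a path; with the L4 policy their hashes differ, so they can be spread across paths.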

Signed-off-by: David Ahern 
Reviewed-by: Ido Schimmel 
Tested-by: Ido Schimmel 
Reviewed-by: Nikolay Aleksandrov 
Signed-off-by: David S. Miller

doc: Change the min default value of tcp_wmem/tcp_rmem.

2018-02-05T15:05:49+00:00

The SK_MEM_QUANTUM was changed from PAGE_SIZE to 4096, so the
tcp_wmem/tcp_rmem minimum default values are now 4096.
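
The arithmetic behind this change can be sketched as follows. `tcp_mem_min_default()` and `mem_quanta()` are hypothetical helpers, and rounding a charge up to whole quanta is assumed to follow the kernel's sk_mem_pages() helper. Before the change the minimum tracked the page size (65536 on a 64 KiB-page architecture); afterwards it is 4096 everywhere.

```c
/* Accounting quantum after bd68a2a854ad ("net: set SK_MEM_QUANTUM to 4096"). */
#define SK_MEM_QUANTUM_4K 4096L

/* Default tcp_rmem/tcp_wmem minimum: the quantum, which used to be PAGE_SIZE. */
long tcp_mem_min_default(long page_size, int after_change)
{
    return after_change ? SK_MEM_QUANTUM_4K : page_size;
}

/* Whole quanta charged for 'bytes', rounded up (assumed, as in sk_mem_pages()). */
long mem_quanta(long bytes, long quantum)
{
    return (bytes + quantum - 1) / quantum;
}
```

On 4 KiB-page architectures nothing changes; on 64 KiB-page ones, a 1500-byte charge used to reserve 65536 bytes and now reserves 4096.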

Fixes: bd68a2a854ad ("net: set SK_MEM_QUANTUM to 4096")
Cc: Eric Dumazet 
Signed-off-by: Tonghao Zhang 
Signed-off-by: David S. Miller

tcp: pause Fast Open globally after third consecutive timeout

2017-12-13T20:51:12+00:00

Prior to this patch, active Fast Open was paused on a specific
destination IP address if previous connections to that IP address
had experienced recurring timeouts. But recent experiments by
Microsoft (https://goo.gl/cykmn7) and Mozilla browsers indicate
the issue is often caused by broken middle-boxes sitting close to
the client. Therefore the user experience is much better if Fast
Open is disabled outright, globally, to avoid further timeouts on
connections toward other destinations.

This patch changes the destination-IP disablement to a global
disablement if a connection experiences recurring timeouts
or aborts due to timeout. Repeated incidents still increase
the pause time exponentially, starting from one hour.
This is extremely conservative but an unfortunate compromise to
minimize bad experience due to broken middle-boxes.
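
The global pause with exponential backoff can be sketched as a simplified model, not the kernel implementation. `struct tfo_global` and the `tfo_*` functions are hypothetical, what counts as an incident (here, a caller invoking `tfo_record_incident()`) is left abstract, and the cap of 2^6 times the base pause is an assumption; only the one-hour starting point and the exponential growth come from the description above.

```c
#include <stdbool.h>

#define TFO_BASE_PAUSE_SEC 3600UL   /* first pause: one hour */
#define TFO_MAX_SHIFT 6U            /* assumed cap on the doubling */

/* Hypothetical global (not per-destination) Fast Open blackhole state. */
struct tfo_global {
    unsigned int incidents;         /* consecutive timeout incidents */
    unsigned long disabled_at;      /* time of last incident, in seconds */
};

void tfo_record_incident(struct tfo_global *g, unsigned long now)
{
    g->incidents++;
    g->disabled_at = now;
}

/* Pause doubles with each incident: 1h, 2h, 4h, ... up to the cap. */
unsigned long tfo_pause_sec(const struct tfo_global *g)
{
    unsigned int shift;

    if (g->incidents == 0)
        return 0;
    shift = g->incidents - 1;
    if (shift > TFO_MAX_SHIFT)
        shift = TFO_MAX_SHIFT;
    return TFO_BASE_PAUSE_SEC << shift;
}

/* Active Fast Open to any destination resumes only after the pause elapses. */
bool tfo_active_allowed(const struct tfo_global *g, unsigned long now)
{
    return g->incidents == 0 || now - g->disabled_at >= tfo_pause_sec(g);
}
```

Because the state is global, one incident pauses Fast Open toward every destination, which matches the rationale that the broken middle-box usually sits near the client.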

Reported-by: Dragana Damjanovic 
Reported-by: Patrick McManus 
Signed-off-by: Yuchung Cheng 
Reviewed-by: Wei Wang 
Reviewed-by: Neal Cardwell 
Reviewed-by: Eric Dumazet 
Signed-off-by: David S. Miller