linux-stable.git/include/net/udp.h, branch v4.7.2

net: snmp: kill STATS_BH macros

2016-04-28T02:48:25+00:00

There is nothing related to BH in SNMP counters anymore,
since linux-3.0.

Rename helpers to use __ prefix instead of _BH prefix,
for contexts where preemption is disabled.

This more closely matches convention used to update
percpu variables.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

net: udp: rename UDP_INC_STATS_BH()

2016-04-28T02:48:23+00:00

Rename UDP_INC_STATS_BH() to __UDP_INC_STATS(),
and UDP6_INC_STATS_BH() to __UDP6_INC_STATS()

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

net: snmp: kill various STATS_USER() helpers

2016-04-28T02:48:22+00:00

In the old days (before linux-3.0), SNMP counters were duplicated,
one for user context, and one for BH context.

After commit 8f0ea0fe3a03 ("snmp: reduce percpu needs by 50%")
we have a single copy, and what really matters is preemption being
enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
respectively.

We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(),
NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(),
SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(),
UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER()

Following patches will rename __BH helpers to make clear their
usage is not tied to BH being disabled.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

udp: Add GRO functions to UDP socket

2016-04-07T20:53:29+00:00

This patch adds GRO functions (gro_receive and gro_complete) to UDP
sockets. udp_gro_receive is changed to perform socket lookup on a
packet. If a socket is found the related GRO functions are called.

This features obsoletes using UDP offload infrastructure for GRO
(udp_offload). This has the advantage of not being limited to provide
offload on a per port basis, GRO is now applied to whatever individual
UDP sockets are bound to.  This also allows the possbility of
"application defined GRO"-- that is we can attach something like
a BPF program to a UDP socket to perfrom GRO on an application
layer protocol.

Signed-off-by: Tom Herbert 
Signed-off-by: David S. Miller

udp: Add udp6_lib_lookup_skb and udp4_lib_lookup_skb

2016-04-07T20:53:14+00:00

Add externally visible functions to lookup a UDP socket by skb. This
will be used for GRO in UDP sockets. These functions also check
if skb->dst is set, and if it is not skb->dev is used to get dev_net.
This allows calling lookup functions before dst has been set on the
skbuff.

Signed-off-by: Tom Herbert 
Signed-off-by: David S. Miller

udp: remove headers from UDP packets before queueing

2016-04-05T20:29:37+00:00

Remove UDP transport headers before queueing packets for reception.
This change simplifies a follow-up patch to add MSG_PEEK support.

Signed-off-by: Sam Kumar 
Signed-off-by: Willem de Bruijn 
Signed-off-by: David S. Miller

udp: no longer use SLAB_DESTROY_BY_RCU

2016-04-05T02:11:19+00:00

Tom Herbert would like not touching UDP socket refcnt for encapsulated
traffic. For this to happen, we need to use normal RCU rules, with a grace
period before freeing a socket. UDP sockets are not short lived in the
high usage case, so the added cost of call_rcu() should not be a concern.

This actually removes a lot of complexity in UDP stack.

Multicast receives no longer need to hold a bucket spinlock.

Note that ip early demux still needs to take a reference on the socket.

Same remark for functions used by xt_socket and xt_PROXY netfilter modules,
but this might be changed later.

Performance for a single UDP socket receiving flood traffic from
many RX queues/cpus.

Simple udp_rx using simple recvfrom() loop :
438 kpps instead of 374 kpps : 17 % increase of the peak rate.

v2: Addressed Willem de Bruijn feedback in multicast handling
 - keep early demux break in __udp4_lib_demux_lookup()

Signed-off-by: Eric Dumazet 
Cc: Tom Herbert 
Cc: Willem de Bruijn 
Tested-by: Tom Herbert 
Signed-off-by: David S. Miller

sock: struct proto hash function may error

2016-02-11T08:54:14+00:00

In order to support fast reuseport lookups in TCP, the hash function
defined in struct proto must be capable of returning an error code.
This patch changes the function signature of all related hash functions
to return an integer and handles or propagates this return value at
all call sites.

Signed-off-by: Craig Gallek 
Signed-off-by: David S. Miller

soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF

2016-01-05T03:49:59+00:00

Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.

Signed-off-by: Craig Gallek 
Acked-by: Alexei Starovoitov 
Signed-off-by: David S. Miller

soreuseport: fast reuseport UDP socket selection

2016-01-05T03:49:58+00:00

Include a struct sock_reuseport instance when a UDP socket binds to
a specific address for the first time with the reuseport flag set.
When selecting a socket for an incoming UDP packet, use the information
available in sock_reuseport if present.

This required adding an additional field to the UDP source address
equality function to differentiate between exact and wildcard matches.
The original use case allowed wildcard matches when checking for
existing port uses during bind.  The new use case of adding a socket
to a reuseport group requires exact address matching.

Performance test (using a machine with 2 CPU sockets and a total of
48 cores):  Create reuseport groups of varying size.  Use one socket
from this group per user thread (pinning each thread to a different
core) calling recvmmsg in a tight loop.  Record number of messages
received per second while saturating a 10G link.
  10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
  20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
  40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)

This work is based off a similar implementation written by
Ying Cai  for implementing policy-based reuseport
selection.

Signed-off-by: Craig Gallek 
Signed-off-by: David S. Miller