linux-stable.git/net/core/dst.c, branch linux-4.16.y

net: Remove dst->next

2017-11-30T14:54:27+00:00

There are no more users.

Signed-off-by: David S. Miller 
Reviewed-by: Eric Dumazet

xfrm: Move dst->path into struct xfrm_dst

2017-11-30T14:54:26+00:00

The first member of an IPSEC route bundle chain sets it's dst->path to
the underlying ipv4/ipv6 route that carries the bundle.

Stated another way, if one were to follow the xfrm_dst->child chain of
the bundle, the final non-NULL pointer would be the path and point to
either an ipv4 or an ipv6 route.

This is largely used to make sure that PMTU events propagate down to
the correct ipv4 or ipv6 route.

When we don't have the top of an IPSEC bundle 'dst->path == dst'.

Move it down into xfrm_dst and key off of dst->xfrm.

Signed-off-by: David S. Miller 
Reviewed-by: Eric Dumazet

ipv6: Move dst->from into struct rt6_info.

2017-11-30T14:54:26+00:00

The dst->from value is only used by ipv6 routes to track where
a route "came from".

Any time we clone or copy a core ipv6 route in the ipv6 routing
tables, we have the copy/clone's ->from point to the base route.

This is used to handle route expiration properly.

Only ipv6 uses this mechanism, and only ipv6 code references
it.  So it is safe to move it into rt6_info.

Signed-off-by: David S. Miller 
Reviewed-by: Eric Dumazet

xfrm: Move child route linkage into xfrm_dst.

2017-11-30T14:54:26+00:00

XFRM bundle child chains look like this:

	xdst1 --> xdst2 --> xdst3 --> path_dst

All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL.
The final child pointer in the chain, here called 'path_dst', is some
other kind of route such as an ipv4 or ipv6 one.

The xfrm output path pops routes, one at a time, via the child
pointer, until we hit one which has a dst->xfrm pointer which
is NULL.

We can easily preserve the above mechanisms with child sitting
only in the xfrm_dst structure.  All children in the chain
before we break out of the xfrm_output() loop have dst->xfrm
non-NULL and are therefore xfrm_dst objects.

Since we break out of the loop when we find dst->xfrm NULL, we
will not try to dereference 'dst' as if it were an xfrm_dst.

Signed-off-by: David S. Miller

net: Create and use new helper xfrm_dst_child().

2017-11-30T14:54:25+00:00

Only IPSEC routes have a non-NULL dst->child pointer.  And IPSEC
routes are identified by a non-NULL dst->xfrm pointer.

Signed-off-by: David S. Miller

net: dst: move cpu inside ifdef to avoid compilation warning

2017-10-10T22:55:58+00:00

If CONFIG_DST_CACHE is not selected cpu variable
will be unused and we will see a compilation warning.
Move it under the ifdef.

Reported-by: kbuild test robot 
Fixes: d66f2b91f95b ("bpf: don't rely on the verifier lock for metadata_dst allocation")
Signed-off-by: Jakub Kicinski 
Signed-off-by: David S. Miller

bpf: don't rely on the verifier lock for metadata_dst allocation

2017-10-10T19:30:16+00:00

bpf_skb_set_tunnel_*() functions require allocation of per-cpu
metadata_dst.  The allocation happens upon verification of the
first program using those helpers.  In preparation for removing
the verifier lock, use cmpxchg() to make sure we only allocate
the metadata_dsts once.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Simon Horman 
Acked-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Signed-off-by: David S. Miller

net: check type when freeing metadata dst

2017-08-21T17:57:38+00:00

Commit 3fcece12bc1b ("net: store port/representator id in metadata_dst")
added a new type field to metadata_dst, but metadata_dst_free() wasn't
updated to check it before freeing the METADATA_IP_TUNNEL specific dst
cache entry.

This is not currently causing problems since it's far enough back in the
struct to be zeroed for the only other type currently in existance
(METADATA_HW_PORT_MUX), but nevertheless it's not correct.

Fixes: 3fcece12bc1b ("net: store port/representator id in metadata_dst")
Signed-off-by: David Lamparter 
Cc: Jakub Kicinski 
Cc: Sridhar Samudrala 
Cc: Simon Horman 
Cc: David S. Miller 
Signed-off-by: David S. Miller

ipv4: convert dst_metrics.refcnt from atomic_t to refcount_t

2017-08-18T22:14:07+00:00

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

net: store port/representator id in metadata_dst

2017-06-25T15:42:01+00:00

Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
representators and control messages over single set of hardware queues.
Control messages and muxed traffic may need ordered delivery.

Those requirements make it hard to comfortably use TC infrastructure today
unless we have a way of attaching metadata to skbs at the upper device.
Because single set of queues is used for many netdevs stopping TC/sched
queues of all of them reliably is impossible and lower device has to
retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
the fastpath.

This patch attempts to enable port/representative devs to attach metadata
to skbs which carry port id.  This way representatives can be queueless and
all queuing can be performed at the lower netdev in the usual way.

Traffic arriving on the port/representative interfaces will be have
metadata attached and will subsequently be queued to the lower device for
transmission.  The lower device should recognize the metadata and translate
it to HW specific format which is most likely either a special header
inserted before the network headers or descriptor/metadata fields.

Metadata is associated with the lower device by storing the netdev pointer
along with port id so that if TC decides to redirect or mirror the new
netdev will not try to interpret it.

This is mostly for SR-IOV devices since switches don't have lower netdevs
today.

Signed-off-by: Jakub Kicinski 
Signed-off-by: Sridhar Samudrala 
Signed-off-by: Simon Horman 
Signed-off-by: David S. Miller