linux-stable.git/net/core/skbuff.c, branch v3.7

net: gro: fix possible panic in skb_gro_receive()

2012-12-07T19:39:29+00:00

commit 2e71a6f8084e (net: gro: selective flush of packets) added
a bug for skbs using frag_list. This part of the GRO stack is rarely
used, as it needs skb not using a page fragment for their skb->head.

Most drivers do use a page fragment, but some of them use GFP_KERNEL
allocations for the initial fill of their RX ring buffer.

napi_gro_flush() overwrite skb->prev that was used for these skb to
point to the last skb in frag_list.

Fix this using a separate field in struct napi_gro_cb to point to the
last fragment.

Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller

net: fix secpath kmemleak

2012-10-22T19:16:07+00:00

Mike Kazantsev found 3.5 kernels and beyond were leaking memory,
and tracked the faulty commit to a1c7fff7e18f59e ("net:
netdev_alloc_skb() use build_skb()")

While this commit seems fine, it uncovered a bug introduced
in commit bad43ca8325 ("net: introduce skb_try_coalesce()), in function
kfree_skb_partial()"):

If head is stolen, we free the sk_buff,
without removing references on secpath (skb->sp).

So IPsec + IP defrag/reassembly (using skb coalescing), or
TCP coalescing could leak secpath objects.

Fix this bug by calling skb_release_head_state(skb) to properly
release all possible references to linked objects.

Reported-by: Mike Kazantsev 
Signed-off-by: Eric Dumazet 
Bisected-by: Mike Kazantsev 
Tested-by: Mike Kazantsev 
Signed-off-by: David S. Miller

net: remove skb recycling

2012-10-07T04:40:54+00:00

Over time, skb recycling infrastructure got litle interest and
many bugs. Generic rx path skb allocation is now using page
fragments for efficient GRO / TCP coalescing, and recyling
a tx skb for rx path is not worth the pain.

Last identified bug is that fat skbs can be recycled
and it can endup using high order pages after few iterations.

With help from Maxime Bizon, who pointed out that commit
87151b8689d (net: allow pskb_expand_head() to get maximum tailroom)
introduced this regression for recycled skbs.

Instead of fixing this bug, lets remove skb recycling.

Drivers wanting really hot skbs should use build_skb() anyway,
to allocate/populate sk_buff right before netif_receive_skb()

Signed-off-by: Eric Dumazet 
Cc: Maxime Bizon 
Signed-off-by: David S. Miller

use skb_end_offset() in skb_try_coalesce()

2012-10-01T20:43:17+00:00

Commit ec47ea824774(skb: Add inline helper for getting the skb end offset from
head) introduces this helper function, skb_end_offset(),
we should make use of it.

Signed-off-by: Weiping Pan 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

2012-09-28T18:40:49+00:00

Conflicts:
	drivers/net/team/team.c
	drivers/net/usb/qmi_wwan.c
	net/batman-adv/bat_iv_ogm.c
	net/ipv4/fib_frontend.c
	net/ipv4/route.c
	net/l2tp/l2tp_netlink.c

The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.

qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

With help from Antonio Quartulli.

Signed-off-by: David S. Miller

net: use bigger pages in __netdev_alloc_frag

2012-09-27T23:29:35+00:00

We currently use percpu order-0 pages in __netdev_alloc_frag
to deliver fragments used by __netdev_alloc_skb()

Depending on NIC driver and arch being 32 or 64 bit, it allows a page to
be split in several fragments (between 1 and 8), assuming PAGE_SIZE=4096

Switching to bigger pages (32768 bytes for PAGE_SIZE=4096 case) allows :

- Better filling of space (the ending hole overhead is less an issue)

- Less calls to page allocator or accesses to page->_count

- Could allow struct skb_shared_info futures changes without major
  performance impact.

This patch implements a transparent fallback to smaller
pages in case of memory pressure.

It also uses a standard "struct page_frag" instead of a custom one.

Signed-off-by: Eric Dumazet 
Cc: Alexander Duyck 
Cc: Benjamin LaHaise 
Signed-off-by: David S. Miller

net: use a per task frag allocator

2012-09-24T20:31:37+00:00

We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

Its done to increase probability of coalescing small write() into
single segments in skbs still in write queue (not yet sent)

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page

Its also quite inefficient to build TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, thats order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits on other network devices, since 8x less frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

Its possible some SG enabled hardware cant cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment in sub fragments, since some arches have PAGE_SIZE=65536

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)

Signed-off-by: Eric Dumazet 
Cc: Ben Hutchings 
Cc: Vijay Subramanian 
Cc: Alexander Duyck 
Tested-by: Vijay Subramanian 
Signed-off-by: David S. Miller

net/core: fix comment in skb_try_coalesce

2012-09-19T21:29:13+00:00

It should be the skb which is not cloned

Signed-off-by: Li RongQing 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller

netvm: allow skb allocation to use PFMEMALLOC reserves

2012-08-01T01:42:46+00:00

Change the skb allocation API to indicate RX usage and use this to fall
back to the PFMEMALLOC reserve when needed.  SKBs allocated from the
reserve are tagged in skb->pfmemalloc.  If an SKB is allocated from the
reserve and the socket is later found to be unrelated to page reclaim, the
packet is dropped so that the memory remains available for page reclaim.
Network protocols are expected to recover from this packet loss.

[a.p.zijlstra@chello.nl: Ideas taken from various patches]
[davem@davemloft.net: Use static branches, coding style corrections]
[sebastian@breakpoint.cc: Avoid unnecessary cast, fix !CONFIG_NET build]
Signed-off-by: Mel Gorman 
Acked-by: David S. Miller 
Cc: Neil Brown 
Cc: Peter Zijlstra 
Cc: Mike Christie 
Cc: Eric B Munson 
Cc: Eric Dumazet 
Cc: Sebastian Andrzej Siewior 
Cc: Mel Gorman 
Cc: Christoph Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

skbuff: export skb_copy_ubufs

2012-07-22T19:39:33+00:00

Export skb_copy_ubufs so that modules can orphan frags.

Signed-off-by: Michael S. Tsirkin 
Signed-off-by: David S. Miller