linux-stable.git/net/core/dev.c, branch v3.5.7

net: do not disable sg for packets requiring no checksum

2012-10-12T20:47:05+00:00

[ Upstream commit c0d680e577ff171e7b37dbdb1b1bf5451e851f04 ]

A change in a series of VLAN-related changes appears to have
inadvertently disabled the use of the scatter gather feature of
network cards for transmission of non-IP ethernet protocols like ATA
over Ethernet (AoE).  Below is a reference to the commit that
introduces a "harmonize_features" function that turns off scatter
gather when the NIC does not support hardware checksumming for the
ethernet protocol of an sk buff.

  commit f01a5236bd4b140198fbcc550f085e8361fd73fa
  Author: Jesse Gross 
  Date:   Sun Jan 9 06:23:31 2011 +0000

      net offloading: Generalize netif_get_vlan_features().

The can_checksum_protocol function is not equipped to consider a
protocol that does not require checksumming.  Calling it for a
protocol that requires no checksum is inappropriate.

The patch below has harmonize_features call can_checksum_protocol when
the protocol needs a checksum, so that the network layer is not forced
to perform unnecessary skb linearization on the transmission of AoE
packets.  Unnecessary linearization results in decreased performance
and increased memory pressure, as reported here:

  http://www.spinics.net/lists/linux-mm/msg15184.html

The problem has probably not been widely experienced yet, because
only recently has the kernel.org-distributed aoe driver acquired the
ability to use payloads of over a page in size, with the patchset
recently included in the mm tree:

  https://lkml.org/lkml/2012/8/28/140

The coraid.com-distributed aoe driver already could use payloads of
greater than a page in size, but its users generally do not use the
newest kernels.

Signed-off-by: Ed Cashin 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: small bug on rxhash calculation

2012-10-12T20:47:03+00:00

[ Upstream commit 6862234238e84648c305526af2edd98badcad1e0 ]

In the current rxhash calculation function, while the
sorting of the ports/addrs is coherent (you get the
same rxhash for packets sharing the same 4-tuple, in
both directions), ports and addrs are sorted
independently. This implies packets from a connection
between the same addresses but crossed ports hash to
the same rxhash.

For example, traffic between A=S:l and B=L:s is hashed
(in both directions) from {L, S, {s, l}}. The same
rxhash is obtained for packets between C=S:s and D=L:l.

This patch ensures that you either swap both addrs and ports,
or you swap none. Traffic between A and B, and traffic
between C and D, get their rxhash from different sources
({L, S, {l, s}} for A<->B, and {L, S, {s, l}} for C<->D)

The patch is co-written with Eric Dumazet 

Signed-off-by: Chema Gonzalez 
Signed-off-by: Eric Dumazet 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

af_packet: don't emit packet on orig fanout group

2012-10-02T17:38:54+00:00

[ Upstream commit c0de08d04215031d68fa13af36f347a6cfa252ca ]

If a packet is emitted on one socket in one group of fanout sockets,
it is transmitted again. It is thus read again on one of the sockets
of the fanout group. This result in a loop for software which
generate packets when receiving one.
This retransmission is not the intended behavior: a fanout group
must behave like a single socket. The packet should not be
transmitted on a socket if it originates from a socket belonging
to the same fanout group.

This patch fixes the issue by changing the transmission check to
take fanout group info account.

Reported-by: Aleksandr Kotov 
Signed-off-by: Eric Leblond 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net/core: Fix potential memory leak in dev_set_alias()

2012-10-02T17:38:40+00:00

[ Upstream commit 7364e445f62825758fa61195d237a5b8ecdd06ec ]

Do not leak memory by updating pointer with potentially NULL realloc return value.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: Allow driver to limit number of GSO segments per skb

2012-10-02T17:38:39+00:00

[ Upstream commit 30b678d844af3305cda5953467005cebb5d7b687 ]

A peer (or local user) may cause TCP to use a nominal MSS of as little
as 88 (actual MSS of 76 with timestamps).  Given that we have a
sufficiently prodigious local sender and the peer ACKs quickly enough,
it is nevertheless possible to grow the window for such a connection
to the point that we will try to send just under 64K at once.  This
results in a single skb that expands to 861 segments.

In some drivers with TSO support, such an skb will require hundreds of
DMA descriptors; a substantial fraction of a TX ring or even more than
a full ring.  The TX queue selected for the skb may stall and trigger
the TX watchdog repeatedly (since the problem skb will be retried
after the TX reset).  This particularly affects sfc, for which the
issue is designated as CVE-2012-3412.

Therefore:
1. Add the field net_device::gso_max_segs holding the device-specific
   limit.
2. In netif_skb_features(), if the number of segments is too high then
   mask out GSO features to force fall back to software GSO.

Signed-off-by: Ben Hutchings 
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman

net: feed /dev/random with the MAC address when registering a device

2012-08-15T14:52:48+00:00

commit 7bf2357524408b97fec58344caf7397f8140c3fd upstream.

Cc: David Miller 
Cc: Linus Torvalds 
Signed-off-by: "Theodore Ts'o" 
Signed-off-by: Greg Kroah-Hartman

net: Statically initialize init_net.dev_base_head

2012-07-18T20:32:27+00:00

This change eliminates an initialization-order hazard most
recently seen when netprio_cgroup is built into the kernel.

With thanks to Eric Dumazet for catching a bug.

Signed-off-by: Mark Rustad 
Acked-by: Eric Dumazet 
Signed-off-by: David S. Miller

net: cgroup: fix out of bounds accesses

2012-07-09T21:50:54+00:00

dev->priomap is allocated by extend_netdev_table() called from
update_netdev_tables().
And this is only called if write_priomap() is called.

But if write_priomap() is not called, it seems we can have out of bounds
accesses in cgrp_destroy(), read_priomap() & skb_update_prio()

With help from Gao Feng

Signed-off-by: Eric Dumazet 
Cc: Neil Horman 
Cc: Gao feng 
Acked-by: Gao feng 
Signed-off-by: David S. Miller

net: Downgrade CAP_SYS_MODULE deprecated message from error to warning.

2012-06-28T23:55:05+00:00

Make logging level consistent with other deprecation messages in net
subsystem.

Signed-off-by: Vinson Lee 
Cc: David Mackey 
Signed-off-by: David S. Miller

net: remove skb_orphan_try()

2012-06-15T22:30:15+00:00

Orphaning skb in dev_hard_start_xmit() makes bonding behavior
unfriendly for applications sending big UDP bursts : Once packets
pass the bonding device and come to real device, they might hit a full
qdisc and be dropped. Without orphaning, the sender is automatically
throttled because sk->sk_wmemalloc reaches sk->sk_sndbuf (assuming
sk_sndbuf is not too big)

We could try to defer the orphaning adding another test in
dev_hard_start_xmit(), but all this seems of little gain,
now that BQL tends to make packets more likely to be parked
in Qdisc queues instead of NIC TX ring, in cases where performance
matters.

Reverts commits :
fc6055a5ba31 net: Introduce skb_orphan_try()
87fd308cfc6b net: skb_tx_hash() fix relative to skb_orphan_try()
and removes SKBTX_DRV_NEEDS_SK_REF flag

Reported-and-bisected-by: Jean-Michel Hautbois 
Signed-off-by: Eric Dumazet 
Tested-by: Oliver Hartkopp 
Acked-by: Oliver Hartkopp 
Signed-off-by: David S. Miller