<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/net/core/skbuff.c, branch linux-4.3.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>net: check both type and procotol for tcp sockets</title>
<updated>2016-01-23T04:55:48+00:00</updated>
<author>
<name>WANG Cong</name>
<email>xiyou.wangcong@gmail.com</email>
</author>
<published>2015-12-17T07:39:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=45076a809b04108a8af7a6d55cf7e4d48ae9e43b'/>
<id>45076a809b04108a8af7a6d55cf7e4d48ae9e43b</id>
<content type='text'>
[ Upstream commit ac5cc977991d2dce85fc734a6c71ddb33f6fe3c1 ]

Dmitry reported the following out-of-bound access:

Call Trace:
 [&lt;ffffffff816cec2e&gt;] __asan_report_load4_noabort+0x3e/0x40
mm/kasan/report.c:294
 [&lt;ffffffff84affb14&gt;] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
 [&lt;     inline     &gt;] SYSC_setsockopt net/socket.c:1746
 [&lt;ffffffff84aed7ee&gt;] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
 [&lt;ffffffff85c18c76&gt;] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185

This is because we mistake a raw socket as a tcp socket.
We should check both sk-&gt;sk_type and sk-&gt;sk_protocol to ensure
it is a tcp socket.

Willem points out __skb_complete_tx_timestamp() needs to fix as well.

Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Willem de Bruijn &lt;willemdebruijn.kernel@gmail.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit ac5cc977991d2dce85fc734a6c71ddb33f6fe3c1 ]

Dmitry reported the following out-of-bound access:

Call Trace:
 [&lt;ffffffff816cec2e&gt;] __asan_report_load4_noabort+0x3e/0x40
mm/kasan/report.c:294
 [&lt;ffffffff84affb14&gt;] sock_setsockopt+0x1284/0x13d0 net/core/sock.c:880
 [&lt;     inline     &gt;] SYSC_setsockopt net/socket.c:1746
 [&lt;ffffffff84aed7ee&gt;] SyS_setsockopt+0x1fe/0x240 net/socket.c:1729
 [&lt;ffffffff85c18c76&gt;] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185

This is because we mistake a raw socket as a tcp socket.
We should check both sk-&gt;sk_type and sk-&gt;sk_protocol to ensure
it is a tcp socket.

Willem points out __skb_complete_tx_timestamp() needs to fix as well.

Reported-by: Dmitry Vyukov &lt;dvyukov@google.com&gt;
Cc: Willem de Bruijn &lt;willemdebruijn.kernel@gmail.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Signed-off-by: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Acked-by: Willem de Bruijn &lt;willemb@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>skbuff: Fix offset error in skb_reorder_vlan_header</title>
<updated>2016-01-23T04:55:48+00:00</updated>
<author>
<name>Vlad Yasevich</name>
<email>vyasevich@gmail.com</email>
</author>
<published>2015-12-14T22:44:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=faaf14faa7cd26ea5b58212ae5308864c9dd3153'/>
<id>faaf14faa7cd26ea5b58212ae5308864c9dd3153</id>
<content type='text'>
[ Upstream commit f654861569872d10dcb79d9d7ca219b316f94ff0 ]

skb_reorder_vlan_header is called after the vlan header has
been pulled.  As a result the offset of the begining of
the mac header has been incrased by 4 bytes (VLAN_HLEN).
When moving the mac addresses, include this incrase in
the offset calcualation so that the mac addresses are
copied correctly.

Fixes: a6e18ff1117 (vlan: Fix untag operations of stacked vlans with REORDER_HEADER off)
CC: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
CC: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: Vladislav Yasevich &lt;vyasevich@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit f654861569872d10dcb79d9d7ca219b316f94ff0 ]

skb_reorder_vlan_header is called after the vlan header has
been pulled.  As a result the offset of the begining of
the mac header has been incrased by 4 bytes (VLAN_HLEN).
When moving the mac addresses, include this incrase in
the offset calcualation so that the mac addresses are
copied correctly.

Fixes: a6e18ff1117 (vlan: Fix untag operations of stacked vlans with REORDER_HEADER off)
CC: Nicolas Dichtel &lt;nicolas.dichtel@6wind.com&gt;
CC: Patrick McHardy &lt;kaber@trash.net&gt;
Signed-off-by: Vladislav Yasevich &lt;vyasevich@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vlan: Fix untag operations of stacked vlans with REORDER_HEADER off</title>
<updated>2016-01-23T04:55:48+00:00</updated>
<author>
<name>Vlad Yasevich</name>
<email>vyasevich@gmail.com</email>
</author>
<published>2015-11-16T20:43:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=adf6790b7576e17dc793022d664d39c9eaf9123e'/>
<id>adf6790b7576e17dc793022d664d39c9eaf9123e</id>
<content type='text'>
[ Upstream commit a6e18ff111701b4ff6947605bfbe9594ec42a6e8 ]

When we have multiple stacked vlan devices all of which have
turned off REORDER_HEADER flag, the untag operation does not
locate the ethernet addresses correctly for nested vlans.
The reason is that in case of REORDER_HEADER flag being off,
the outer vlan headers are put back and the mac_len is adjusted
to account for the presense of the header.  Then, the subsequent
untag operation, for the next level vlan, always use VLAN_ETH_HLEN
to locate the begining of the ethernet header and that ends up
being a multiple of 4 bytes short of the actuall beginning
of the mac header (the multiple depending on the how many vlan
encapsulations ethere are).

As a reslult, if there are multiple levles of vlan devices
with REODER_HEADER being off, the recevied packets end up
being dropped.

To solve this, we use skb-&gt;mac_len as the offset.  The value
is always set on receive path and starts out as a ETH_HLEN.
The value is also updated when the vlan header manupations occur
so we know it will be correct.

Signed-off-by: Vladislav Yasevich &lt;vyasevic@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit a6e18ff111701b4ff6947605bfbe9594ec42a6e8 ]

When we have multiple stacked vlan devices all of which have
turned off REORDER_HEADER flag, the untag operation does not
locate the ethernet addresses correctly for nested vlans.
The reason is that in case of REORDER_HEADER flag being off,
the outer vlan headers are put back and the mac_len is adjusted
to account for the presense of the header.  Then, the subsequent
untag operation, for the next level vlan, always use VLAN_ETH_HLEN
to locate the begining of the ethernet header and that ends up
being a multiple of 4 bytes short of the actuall beginning
of the mac header (the multiple depending on the how many vlan
encapsulations ethere are).

As a reslult, if there are multiple levles of vlan devices
with REODER_HEADER being off, the recevied packets end up
being dropped.

To solve this, we use skb-&gt;mac_len as the offset.  The value
is always set on receive path and starts out as a ETH_HLEN.
The value is also updated when the vlan header manupations occur
so we know it will be correct.

Signed-off-by: Vladislav Yasevich &lt;vyasevic@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>skbuff: Fix skb checksum partial check.</title>
<updated>2015-09-29T23:48:46+00:00</updated>
<author>
<name>Pravin B Shelar</name>
<email>pshelar@nicira.com</email>
</author>
<published>2015-09-29T00:24:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=31b33dfb0a144469dd805514c9e63f4993729a48'/>
<id>31b33dfb0a144469dd805514c9e63f4993729a48</id>
<content type='text'>
Earlier patch 6ae459bda tried to detect void ckecksum partial
skb by comparing pull length to checksum offset. But it does
not work for all cases since checksum-offset depends on
updates to skb-&gt;data.

Following patch fixes it by validating checksum start offset
after skb-data pointer is updated. Negative value of checksum
offset start means there is no need to checksum.

Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull")
Reported-by: Andrew Vagin &lt;avagin@odin.com&gt;
Signed-off-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Earlier patch 6ae459bda tried to detect void ckecksum partial
skb by comparing pull length to checksum offset. But it does
not work for all cases since checksum-offset depends on
updates to skb-&gt;data.

Following patch fixes it by validating checksum start offset
after skb-data pointer is updated. Negative value of checksum
offset start means there is no need to checksum.

Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull")
Reported-by: Andrew Vagin &lt;avagin@odin.com&gt;
Signed-off-by: Pravin B Shelar &lt;pshelar@nicira.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-08-28T04:45:31+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-08-28T04:45:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0d36938bb82a7775c21ce0a7429f08ba13d025b6'/>
<id>0d36938bb82a7775c21ce0a7429f08ba13d025b6</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>net-next: Fix warning while make xmldocs caused by skbuff.c</title>
<updated>2015-08-25T21:20:48+00:00</updated>
<author>
<name>Masanari Iida</name>
<email>standby24x7@gmail.com</email>
</author>
<published>2015-08-24T13:56:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d7499160107dd1367cf34873564b522a5516430c'/>
<id>d7499160107dd1367cf34873564b522a5516430c</id>
<content type='text'>
This patch fix following warnings.

.//net/core/skbuff.c:407: warning: No description found
for parameter 'len'
.//net/core/skbuff.c:407: warning: Excess function parameter
 'length' description in '__netdev_alloc_skb'
.//net/core/skbuff.c:476: warning: No description found
 for parameter 'len'
.//net/core/skbuff.c:476: warning: Excess function parameter
'length' description in '__napi_alloc_skb'

Signed-off-by: Masanari Iida &lt;standby24x7@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch fix following warnings.

.//net/core/skbuff.c:407: warning: No description found
for parameter 'len'
.//net/core/skbuff.c:407: warning: Excess function parameter
 'length' description in '__netdev_alloc_skb'
.//net/core/skbuff.c:476: warning: No description found
 for parameter 'len'
.//net/core/skbuff.c:476: warning: Excess function parameter
'length' description in '__napi_alloc_skb'

Signed-off-by: Masanari Iida &lt;standby24x7@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>mm: make page pfmemalloc check more robust</title>
<updated>2015-08-21T21:30:10+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2015-08-21T21:11:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2f064f3485cd29633ad1b3cfb00cc519509a3d72'/>
<id>2f064f3485cd29633ad1b3cfb00cc519509a3d72</id>
<content type='text'>
Commit c48a11c7ad26 ("netvm: propagate page-&gt;pfmemalloc to skb") added
checks for page-&gt;pfmemalloc to __skb_fill_page_desc():

        if (page-&gt;pfmemalloc &amp;&amp; !page-&gt;mapping)
                skb-&gt;pfmemalloc = true;

It assumes page-&gt;mapping == NULL implies that page-&gt;pfmemalloc can be
trusted.  However, __delete_from_page_cache() can set set page-&gt;mapping
to NULL and leave page-&gt;index value alone.  Due to being in union, a
non-zero page-&gt;index will be interpreted as true page-&gt;pfmemalloc.

So the assumption is invalid if the networking code can see such a page.
And it seems it can.  We have encountered this with a NFS over loopback
setup when such a page is attached to a new skbuf.  There is no copying
going on in this case so the page confuses __skb_fill_page_desc which
interprets the index as pfmemalloc flag and the network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping which is the case here and
that leads to hangs when the nfs client waits for a response from the
server which has been dropped and thus never arrive.

The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead.  We can reuse the index
again but define it to an impossible value (-1UL).  This is the page
index so it should never see the value that large.  Replace all direct
users of page-&gt;pfmemalloc by page_is_pfmemalloc which will hide this
nastiness from unspoiled eyes.

The information will get lost if somebody wants to use page-&gt;index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g.  what SLAB and SLUB do).

[akpm@linux-foundation.org: fix blooper in slub]
Fixes: c48a11c7ad26 ("netvm: propagate page-&gt;pfmemalloc to skb")
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Debugged-by: Vlastimil Babka &lt;vbabka@suse.com&gt;
Debugged-by: Jiri Bohac &lt;jbohac@suse.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.6+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit c48a11c7ad26 ("netvm: propagate page-&gt;pfmemalloc to skb") added
checks for page-&gt;pfmemalloc to __skb_fill_page_desc():

        if (page-&gt;pfmemalloc &amp;&amp; !page-&gt;mapping)
                skb-&gt;pfmemalloc = true;

It assumes page-&gt;mapping == NULL implies that page-&gt;pfmemalloc can be
trusted.  However, __delete_from_page_cache() can set set page-&gt;mapping
to NULL and leave page-&gt;index value alone.  Due to being in union, a
non-zero page-&gt;index will be interpreted as true page-&gt;pfmemalloc.

So the assumption is invalid if the networking code can see such a page.
And it seems it can.  We have encountered this with a NFS over loopback
setup when such a page is attached to a new skbuf.  There is no copying
going on in this case so the page confuses __skb_fill_page_desc which
interprets the index as pfmemalloc flag and the network stack drops
packets that have been allocated using the reserves unless they are to
be queued on sockets handling the swapping which is the case here and
that leads to hangs when the nfs client waits for a response from the
server which has been dropped and thus never arrive.

The struct page is already heavily packed so rather than finding another
hole to put it in, let's do a trick instead.  We can reuse the index
again but define it to an impossible value (-1UL).  This is the page
index so it should never see the value that large.  Replace all direct
users of page-&gt;pfmemalloc by page_is_pfmemalloc which will hide this
nastiness from unspoiled eyes.

The information will get lost if somebody wants to use page-&gt;index
obviously but that was the case before and the original code expected
that the information should be persisted somewhere else if that is
really needed (e.g.  what SLAB and SLUB do).

[akpm@linux-foundation.org: fix blooper in slub]
Fixes: c48a11c7ad26 ("netvm: propagate page-&gt;pfmemalloc to skb")
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Debugged-by: Vlastimil Babka &lt;vbabka@suse.com&gt;
Debugged-by: Jiri Bohac &lt;jbohac@suse.com&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Acked-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: &lt;stable@vger.kernel.org&gt;	[3.6+]
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: fix wrong skb_get() usage / crash in IGMP/MLD parsing code</title>
<updated>2015-08-14T00:08:39+00:00</updated>
<author>
<name>Linus Lüssing</name>
<email>linus.luessing@c0d3.blue</email>
</author>
<published>2015-08-13T03:54:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a516993f0ac1694673412eb2d16a091eafa77d2a'/>
<id>a516993f0ac1694673412eb2d16a091eafa77d2a</id>
<content type='text'>
The recent refactoring of the IGMP and MLD parsing code into
ipv6_mc_check_mld() / ip_mc_check_igmp() introduced a potential crash /
BUG() invocation for bridges:

I wrongly assumed that skb_get() could be used as a simple reference
counter for an skb which is not the case. skb_get() bears additional
semantics, a user count. This leads to a BUG() invocation in
pskb_expand_head() / kernel panic if pskb_may_pull() is called on an skb
with a user count greater than one - unfortunately the refactoring did
just that.

Fixing this by removing the skb_get() call and changing the API: The
caller of ipv6_mc_check_mld() / ip_mc_check_igmp() now needs to
additionally check whether the returned skb_trimmed is a clone.

Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Brenden Blanco &lt;bblanco@plumgrid.com&gt;
Signed-off-by: Linus Lüssing &lt;linus.luessing@c0d3.blue&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The recent refactoring of the IGMP and MLD parsing code into
ipv6_mc_check_mld() / ip_mc_check_igmp() introduced a potential crash /
BUG() invocation for bridges:

I wrongly assumed that skb_get() could be used as a simple reference
counter for an skb which is not the case. skb_get() bears additional
semantics, a user count. This leads to a BUG() invocation in
pskb_expand_head() / kernel panic if pskb_may_pull() is called on an skb
with a user count greater than one - unfortunately the refactoring did
just that.

Fixing this by removing the skb_get() call and changing the API: The
caller of ipv6_mc_check_mld() / ip_mc_check_igmp() now needs to
additionally check whether the returned skb_trimmed is a clone.

Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code")
Reported-by: Brenden Blanco &lt;bblanco@plumgrid.com&gt;
Signed-off-by: Linus Lüssing &lt;linus.luessing@c0d3.blue&gt;
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net</title>
<updated>2015-06-14T06:56:52+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2015-06-14T06:56:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=25c43bf13b1657d9a2f6a2565e9159ce31517aa5'/>
<id>25c43bf13b1657d9a2f6a2565e9159ce31517aa5</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>net: don't wait for order-3 page allocation</title>
<updated>2015-06-12T00:33:44+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2015-06-11T23:50:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fb05e7a89f500cfc06ae277bdc911b281928995d'/>
<id>fb05e7a89f500cfc06ae277bdc911b281928995d</id>
<content type='text'>
We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and add latency. Commit 5640f7685831e0
introduces the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but to improve performance. But direct memory
compaction has high overhead. The benefit of order-3 allocation can't
compensate the overhead of direct memory compaction.

This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the alloction will still success, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered, skb_page_frag_refill will
fallback to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is waken up and doing
compaction, so chances are allocation could success next time.

alloc_skb_with_frags is the same.

The mellanox driver does similar thing, if this is accepted, we must fix
the driver too.

V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer

Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Debabrata Banerjee &lt;dbavatar@gmail.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We saw excessive direct memory compaction triggered by skb_page_frag_refill.
This causes performance issues and add latency. Commit 5640f7685831e0
introduces the order-3 allocation. According to the changelog, the order-3
allocation isn't a must-have but to improve performance. But direct memory
compaction has high overhead. The benefit of order-3 allocation can't
compensate the overhead of direct memory compaction.

This patch makes the order-3 page allocation atomic. If there is no memory
pressure and memory isn't fragmented, the alloction will still success, so we
don't sacrifice the order-3 benefit here. If the atomic allocation fails,
direct memory compaction will not be triggered, skb_page_frag_refill will
fallback to order-0 immediately, hence the direct memory compaction overhead is
avoided. In the allocation failure case, kswapd is waken up and doing
compaction, so chances are allocation could success next time.

alloc_skb_with_frags is the same.

The mellanox driver does similar thing, if this is accepted, we must fix
the driver too.

V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
V2: make the changelog clearer

Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Debabrata Banerjee &lt;dbavatar@gmail.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
