<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/mm/percpu.c, branch v4.16.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>mm: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()</title>
<updated>2018-03-19T16:38:50+00:00</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@virtuozzo.com</email>
</author>
<published>2018-03-19T15:32:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f52ba1fef7b92e74d58efef8eae7b6f48c6d218d'/>
<id>f52ba1fef7b92e74d58efef8eae7b6f48c6d218d</id>
<content type='text'>
In case of memory deficit and low percpu memory pages,
pcpu_balance_workfn() takes pcpu_alloc_mutex for a long
time (as it makes memory allocations itself and waits
for memory reclaim). If tasks doing pcpu_alloc() are
choosen by OOM killer, they can't exit, because they
are waiting for the mutex.

The patch makes pcpu_alloc() to care about killing signal
and use mutex_lock_killable(), when it's allowed by GFP
flags. This guarantees, a task does not miss SIGKILL
from OOM killer.

Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In case of memory deficit and low percpu memory pages,
pcpu_balance_workfn() takes pcpu_alloc_mutex for a long
time (as it makes memory allocations itself and waits
for memory reclaim). If tasks doing pcpu_alloc() are
choosen by OOM killer, they can't exit, because they
are waiting for the mutex.

The patch makes pcpu_alloc() to care about killing signal
and use mutex_lock_killable(), when it's allowed by GFP
flags. This guarantees, a task does not miss SIGKILL
from OOM killer.

Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: include linux/sched.h for cond_resched()</title>
<updated>2018-03-19T16:38:18+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-03-14T15:27:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=71546d100422bcc2c543dadeb9328728997cd23a'/>
<id>71546d100422bcc2c543dadeb9328728997cd23a</id>
<content type='text'>
microblaze build broke due to missing declaration of the
cond_resched() invocation added recently.  Let's include linux/sched.h
explicitly.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: kbuild test robot &lt;fengguang.wu@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
microblaze build broke due to missing declaration of the
cond_resched() invocation added recently.  Let's include linux/sched.h
explicitly.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: kbuild test robot &lt;fengguang.wu@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: add a schedule point in pcpu_balance_workfn()</title>
<updated>2018-02-23T16:52:34+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2018-02-23T16:12:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=accd4f36a7d11c2d54544007eb65e10604dcf2f5'/>
<id>accd4f36a7d11c2d54544007eb65e10604dcf2f5</id>
<content type='text'>
When a large BPF percpu map is destroyed, I have seen
pcpu_balance_workfn() holding cpu for hundreds of milliseconds.

On KASAN config and 112 hyperthreads, average time to destroy a chunk
is ~4 ms.

[ 2489.841376] destroy chunk 1 in 4148689 ns
...
[ 2490.093428] destroy chunk 32 in 4072718 ns

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a large BPF percpu map is destroyed, I have seen
pcpu_balance_workfn() holding cpu for hundreds of milliseconds.

On KASAN config and 112 hyperthreads, average time to destroy a chunk
is ~4 ms.

[ 2489.841376] destroy chunk 1 in 4148689 ns
...
[ 2490.093428] destroy chunk 32 in 4072718 ns

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: allow select gfp to be passed to underlying allocators</title>
<updated>2018-02-18T13:33:01+00:00</updated>
<author>
<name>Dennis Zhou</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2018-02-16T18:09:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=554fef1c39ee148623a496e04569dabb11463406'/>
<id>554fef1c39ee148623a496e04569dabb11463406</id>
<content type='text'>
The prior patch added support for passing gfp flags through to the
underlying allocators. This patch allows users to pass along gfp flags
(currently only __GFP_NORETRY and __GFP_NOWARN) to the underlying
allocators. This should allow users to decide if they are ok with
failing allocations recovering in a more graceful way.

Additionally, gfp passing was done as additional flags in the previous
patch. Instead, change this to caller passed semantics. GFP_KERNEL is
also removed as the default flag. It continues to be used for internally
caused underlying percpu allocations.

V2:
Removed gfp_percpu_mask in favor of doing it inline.
Removed GFP_KERNEL as a default flag for __alloc_percpu_gfp.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Suggested-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The prior patch added support for passing gfp flags through to the
underlying allocators. This patch allows users to pass along gfp flags
(currently only __GFP_NORETRY and __GFP_NOWARN) to the underlying
allocators. This should allow users to decide if they are ok with
failing allocations recovering in a more graceful way.

Additionally, gfp passing was done as additional flags in the previous
patch. Instead, change this to caller passed semantics. GFP_KERNEL is
also removed as the default flag. It continues to be used for internally
caused underlying percpu allocations.

V2:
Removed gfp_percpu_mask in favor of doing it inline.
Removed GFP_KERNEL as a default flag for __alloc_percpu_gfp.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Suggested-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: add __GFP_NORETRY semantics to the percpu balancing path</title>
<updated>2018-02-18T13:33:00+00:00</updated>
<author>
<name>Dennis Zhou</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2018-02-16T18:07:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=47504ee04b9241548ae2c28be7d0b01cff3b7aa6'/>
<id>47504ee04b9241548ae2c28be7d0b01cff3b7aa6</id>
<content type='text'>
Percpu memory using the vmalloc area based chunk allocator lazily
populates chunks by first requesting the full virtual address space
required for the chunk and subsequently adding pages as allocations come
through. To ensure atomic allocations can succeed, a workqueue item is
used to maintain a minimum number of empty pages. In certain scenarios,
such as reported in [1], it is possible that physical memory becomes
quite scarce which can result in either a rather long time spent trying
to find free pages or worse, a kernel panic.

This patch adds support for __GFP_NORETRY and __GFP_NOWARN passing them
through to the underlying allocators. This should prevent any
unnecessary panics potentially caused by the workqueue item. The passing
of gfp around is as additional flags rather than a full set of flags.
The next patch will change these to caller passed semantics.

V2:
Added const modifier to gfp flags in the balance path.
Removed an extra whitespace.

[1] https://lkml.org/lkml/2018/2/12/551

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Suggested-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Percpu memory using the vmalloc area based chunk allocator lazily
populates chunks by first requesting the full virtual address space
required for the chunk and subsequently adding pages as allocations come
through. To ensure atomic allocations can succeed, a workqueue item is
used to maintain a minimum number of empty pages. In certain scenarios,
such as reported in [1], it is possible that physical memory becomes
quite scarce which can result in either a rather long time spent trying
to find free pages or worse, a kernel panic.

This patch adds support for __GFP_NORETRY and __GFP_NOWARN passing them
through to the underlying allocators. This should prevent any
unnecessary panics potentially caused by the workqueue item. The passing
of gfp around is as additional flags rather than a full set of flags.
The next patch will change these to caller passed semantics.

V2:
Added const modifier to gfp flags in the balance path.
Removed an extra whitespace.

[1] https://lkml.org/lkml/2018/2/12/551

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Suggested-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: match chunk allocator declarations with definitions</title>
<updated>2018-02-18T13:33:00+00:00</updated>
<author>
<name>Dennis Zhou</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2018-02-15T16:08:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=15d9f3d116c02a485441d758d9ca0a2e4f3b30be'/>
<id>15d9f3d116c02a485441d758d9ca0a2e4f3b30be</id>
<content type='text'>
At some point the function declaration parameters got out of sync with
the function definitions in percpu-vm.c and percpu-km.c. This patch
makes them match again.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
At some point the function declaration parameters got out of sync with
the function definitions in percpu-vm.c and percpu-km.c. This patch
makes them match again.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Acked-by: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: hack to let the CRIS architecture to boot until they clean up</title>
<updated>2017-11-27T20:53:12+00:00</updated>
<author>
<name>Nicolas Pitre</name>
<email>nicolas.pitre@linaro.org</email>
</author>
<published>2017-11-27T20:51:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=abee210500ed15a22787009d9210b9a34911afcc'/>
<id>abee210500ed15a22787009d9210b9a34911afcc</id>
<content type='text'>
Commit 438a506180 ("percpu: don't forget to free the temporary struct
pcpu_alloc_info") uncovered a problem on the CRIS architecture where
the bootmem allocator is initialized with virtual addresses. Given it
has:

    #define __va(x) ((void *)((unsigned long)(x) | 0x80000000))

then things just work out because the end result is the same whether you
give this a physical or a virtual address.

Untill you call memblock_free_early(__pa(address)) that is, because
values from __pa() don't match with the virtual addresses stuffed in the
bootmem allocator anymore.

Avoid freeing the temporary pcpu_alloc_info memory on that architecture
until they fix things up to let the kernel boot like it did before.

Signed-off-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 438a506180 ("percpu: don't forget to free the temporary struct pcpu_alloc_info")
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 438a506180 ("percpu: don't forget to free the temporary struct
pcpu_alloc_info") uncovered a problem on the CRIS architecture where
the bootmem allocator is initialized with virtual addresses. Given it
has:

    #define __va(x) ((void *)((unsigned long)(x) | 0x80000000))

then things just work out because the end result is the same whether you
give this a physical or a virtual address.

Untill you call memblock_free_early(__pa(address)) that is, because
values from __pa() don't match with the virtual addresses stuffed in the
bootmem allocator anymore.

Avoid freeing the temporary pcpu_alloc_info memory on that architecture
until they fix things up to let the kernel boot like it did before.

Signed-off-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Fixes: 438a506180 ("percpu: don't forget to free the temporary struct pcpu_alloc_info")
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu</title>
<updated>2017-11-15T22:17:11+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-11-15T22:17:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=766ec76a27aa9dfdfee3a80f29ddc1f7539c71f9'/>
<id>766ec76a27aa9dfdfee3a80f29ddc1f7539c71f9</id>
<content type='text'>
Pull percpu update from Tejun Heo:
 "Another minor pull request. It only contains one commit which can
  reclaim a bit of memory wasted during boot on UP"

* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: don't forget to free the temporary struct pcpu_alloc_info
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull percpu update from Tejun Heo:
 "Another minor pull request. It only contains one commit which can
  reclaim a bit of memory wasted during boot on UP"

* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: don't forget to free the temporary struct pcpu_alloc_info
</pre>
</div>
</content>
</entry>
<entry>
<title>mm, percpu: add support for __GFP_NOWARN flag</title>
<updated>2017-10-19T12:13:49+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-10-17T14:55:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0ea7eeec24be5f04ae80d68f5b1ea3a11f49de2f'/>
<id>0ea7eeec24be5f04ae80d68f5b1ea3a11f49de2f</id>
<content type='text'>
Add an option for pcpu_alloc() to support __GFP_NOWARN flag.
Currently, we always throw a warning when size or alignment
is unsupported (and also dump stack on failed allocation
requests). The warning itself is harmless since we return
NULL anyway for any failed request, which callers are
required to handle anyway. However, it becomes harmful when
panic_on_warn is set.

The rationale for the WARN() in pcpu_alloc() is that it can
be tracked when larger than supported allocation requests are
made such that allocations limits can be tweaked if warranted.
This makes sense for in-kernel users, however, there are users
of pcpu allocator where allocation size is derived from user
space requests, e.g. when creating BPF maps. In these cases,
the requests should fail gracefully without throwing a splat.

The current work-around was to check allocation size against
the upper limit of PCPU_MIN_UNIT_SIZE from call-sites for
bailing out prior to a call to pcpu_alloc() in order to
avoid throwing the WARN(). This is bad in multiple ways since
PCPU_MIN_UNIT_SIZE is an implementation detail, and having
the checks on call-sites only complicates the code for no
good reason. Thus, lets fix it generically by supporting the
__GFP_NOWARN flag that users can then use with calling the
__alloc_percpu_gfp() helper instead.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add an option for pcpu_alloc() to support __GFP_NOWARN flag.
Currently, we always throw a warning when size or alignment
is unsupported (and also dump stack on failed allocation
requests). The warning itself is harmless since we return
NULL anyway for any failed request, which callers are
required to handle anyway. However, it becomes harmful when
panic_on_warn is set.

The rationale for the WARN() in pcpu_alloc() is that it can
be tracked when larger than supported allocation requests are
made such that allocations limits can be tweaked if warranted.
This makes sense for in-kernel users, however, there are users
of pcpu allocator where allocation size is derived from user
space requests, e.g. when creating BPF maps. In these cases,
the requests should fail gracefully without throwing a splat.

The current work-around was to check allocation size against
the upper limit of PCPU_MIN_UNIT_SIZE from call-sites for
bailing out prior to a call to pcpu_alloc() in order to
avoid throwing the WARN(). This is bad in multiple ways since
PCPU_MIN_UNIT_SIZE is an implementation detail, and having
the checks on call-sites only complicates the code for no
good reason. Thus, lets fix it generically by supporting the
__GFP_NOWARN flag that users can then use with calling the
__alloc_percpu_gfp() helper instead.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: don't forget to free the temporary struct pcpu_alloc_info</title>
<updated>2017-10-04T13:53:32+00:00</updated>
<author>
<name>Nicolas Pitre</name>
<email>nicolas.pitre@linaro.org</email>
</author>
<published>2017-10-03T22:29:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=438a50618095061920d3a30d4c5ca1ef2e0ff860'/>
<id>438a50618095061920d3a30d4c5ca1ef2e0ff860</id>
<content type='text'>
Unlike the SMP case, the !SMP case does not free the memory for struct
pcpu_alloc_info allocated in setup_per_cpu_areas(). And to give it a
chance of being reused by the page allocator later, align it to a page
boundary just like its size.

Signed-off-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Acked-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Unlike the SMP case, the !SMP case does not free the memory for struct
pcpu_alloc_info allocated in setup_per_cpu_areas(). And to give it a
chance of being reused by the page allocator later, align it to a page
boundary just like its size.

Signed-off-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Acked-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
