linux-stable.git/mm/percpu.c, branch v4.16.2

mm: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()

2018-03-19T16:38:50+00:00

In case of memory deficit and low percpu memory pages,
pcpu_balance_workfn() takes pcpu_alloc_mutex for a long
time (as it makes memory allocations itself and waits
for memory reclaim). If tasks doing pcpu_alloc() are
choosen by OOM killer, they can't exit, because they
are waiting for the mutex.

The patch makes pcpu_alloc() to care about killing signal
and use mutex_lock_killable(), when it's allowed by GFP
flags. This guarantees, a task does not miss SIGKILL
from OOM killer.

Signed-off-by: Kirill Tkhai 
Signed-off-by: Tejun Heo

percpu: include linux/sched.h for cond_resched()

2018-03-19T16:38:18+00:00

microblaze build broke due to missing declaration of the
cond_resched() invocation added recently.  Let's include linux/sched.h
explicitly.

Signed-off-by: Tejun Heo 
Reported-by: kbuild test robot

percpu: add a schedule point in pcpu_balance_workfn()

2018-02-23T16:52:34+00:00

When a large BPF percpu map is destroyed, I have seen
pcpu_balance_workfn() holding cpu for hundreds of milliseconds.

On KASAN config and 112 hyperthreads, average time to destroy a chunk
is ~4 ms.

[ 2489.841376] destroy chunk 1 in 4148689 ns
...
[ 2490.093428] destroy chunk 32 in 4072718 ns

Signed-off-by: Eric Dumazet 
Signed-off-by: Tejun Heo

percpu: allow select gfp to be passed to underlying allocators

2018-02-18T13:33:01+00:00

The prior patch added support for passing gfp flags through to the
underlying allocators. This patch allows users to pass along gfp flags
(currently only __GFP_NORETRY and __GFP_NOWARN) to the underlying
allocators. This should allow users to decide if they are ok with
failing allocations recovering in a more graceful way.

Additionally, gfp passing was done as additional flags in the previous
patch. Instead, change this to caller passed semantics. GFP_KERNEL is
also removed as the default flag. It continues to be used for internally
caused underlying percpu allocations.

V2:
Removed gfp_percpu_mask in favor of doing it inline.
Removed GFP_KERNEL as a default flag for __alloc_percpu_gfp.

Signed-off-by: Dennis Zhou 
Suggested-by: Daniel Borkmann 
Acked-by: Christoph Lameter 
Signed-off-by: Tejun Heo

percpu: add __GFP_NORETRY semantics to the percpu balancing path

2018-02-18T13:33:00+00:00

Percpu memory using the vmalloc area based chunk allocator lazily
populates chunks by first requesting the full virtual address space
required for the chunk and subsequently adding pages as allocations come
through. To ensure atomic allocations can succeed, a workqueue item is
used to maintain a minimum number of empty pages. In certain scenarios,
such as reported in [1], it is possible that physical memory becomes
quite scarce which can result in either a rather long time spent trying
to find free pages or worse, a kernel panic.

This patch adds support for __GFP_NORETRY and __GFP_NOWARN passing them
through to the underlying allocators. This should prevent any
unnecessary panics potentially caused by the workqueue item. The passing
of gfp around is as additional flags rather than a full set of flags.
The next patch will change these to caller passed semantics.

V2:
Added const modifier to gfp flags in the balance path.
Removed an extra whitespace.

[1] https://lkml.org/lkml/2018/2/12/551

Signed-off-by: Dennis Zhou 
Suggested-by: Daniel Borkmann 
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Acked-by: Christoph Lameter 
Signed-off-by: Tejun Heo

percpu: match chunk allocator declarations with definitions

2018-02-18T13:33:00+00:00

At some point the function declaration parameters got out of sync with
the function definitions in percpu-vm.c and percpu-km.c. This patch
makes them match again.

Signed-off-by: Dennis Zhou 
Acked-by: Christoph Lameter 
Signed-off-by: Tejun Heo

percpu: hack to let the CRIS architecture to boot until they clean up

2017-11-27T20:53:12+00:00

Commit 438a506180 ("percpu: don't forget to free the temporary struct
pcpu_alloc_info") uncovered a problem on the CRIS architecture where
the bootmem allocator is initialized with virtual addresses. Given it
has:

    #define __va(x) ((void *)((unsigned long)(x) | 0x80000000))

then things just work out because the end result is the same whether you
give this a physical or a virtual address.

Untill you call memblock_free_early(__pa(address)) that is, because
values from __pa() don't match with the virtual addresses stuffed in the
bootmem allocator anymore.

Avoid freeing the temporary pcpu_alloc_info memory on that architecture
until they fix things up to let the kernel boot like it did before.

Signed-off-by: Nicolas Pitre 
Signed-off-by: Tejun Heo 
Fixes: 438a506180 ("percpu: don't forget to free the temporary struct pcpu_alloc_info")

Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

2017-11-15T22:17:11+00:00

Pull percpu update from Tejun Heo:
 "Another minor pull request. It only contains one commit which can
  reclaim a bit of memory wasted during boot on UP"

* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: don't forget to free the temporary struct pcpu_alloc_info

mm, percpu: add support for __GFP_NOWARN flag

2017-10-19T12:13:49+00:00

Add an option for pcpu_alloc() to support __GFP_NOWARN flag.
Currently, we always throw a warning when size or alignment
is unsupported (and also dump stack on failed allocation
requests). The warning itself is harmless since we return
NULL anyway for any failed request, which callers are
required to handle anyway. However, it becomes harmful when
panic_on_warn is set.

The rationale for the WARN() in pcpu_alloc() is that it can
be tracked when larger than supported allocation requests are
made such that allocations limits can be tweaked if warranted.
This makes sense for in-kernel users, however, there are users
of pcpu allocator where allocation size is derived from user
space requests, e.g. when creating BPF maps. In these cases,
the requests should fail gracefully without throwing a splat.

The current work-around was to check allocation size against
the upper limit of PCPU_MIN_UNIT_SIZE from call-sites for
bailing out prior to a call to pcpu_alloc() in order to
avoid throwing the WARN(). This is bad in multiple ways since
PCPU_MIN_UNIT_SIZE is an implementation detail, and having
the checks on call-sites only complicates the code for no
good reason. Thus, lets fix it generically by supporting the
__GFP_NOWARN flag that users can then use with calling the
__alloc_percpu_gfp() helper instead.

Signed-off-by: Daniel Borkmann 
Cc: Tejun Heo 
Cc: Mark Rutland 
Acked-by: Alexei Starovoitov 
Signed-off-by: David S. Miller

percpu: don't forget to free the temporary struct pcpu_alloc_info

2017-10-04T13:53:32+00:00

Unlike the SMP case, the !SMP case does not free the memory for struct
pcpu_alloc_info allocated in setup_per_cpu_areas(). And to give it a
chance of being reused by the page allocator later, align it to a page
boundary just like its size.

Signed-off-by: Nicolas Pitre 
Acked-by: Dennis Zhou 
Signed-off-by: Tejun Heo