linux-stable.git/mm/percpu.c, branch v4.14.78

percpu: stop leaking bitmap metadata blocks

2018-10-18T07:16:23+00:00

commit 6685b357363bfe295e3ae73665014db4aed62c58 upstream.

The commit ca460b3c9627 ("percpu: introduce bitmap metadata blocks")
introduced bitmap metadata blocks. These metadata blocks are allocated
whenever a new chunk is created, but they are never freed. Fix it.

Fixes: ca460b3c9627 ("percpu: introduce bitmap metadata blocks")
Signed-off-by: Mike Rapoport 
Cc: stable@vger.kernel.org
Signed-off-by: Dennis Zhou 
Signed-off-by: Greg Kroah-Hartman

percpu: include linux/sched.h for cond_resched()

2018-05-09T07:51:48+00:00

commit 71546d100422bcc2c543dadeb9328728997cd23a upstream.

microblaze build broke due to missing declaration of the
cond_resched() invocation added recently.  Let's include linux/sched.h
explicitly.

Signed-off-by: Tejun Heo 
Reported-by: kbuild test robot 
Cc: Guenter Roeck 
Signed-off-by: Greg Kroah-Hartman

percpu: add __GFP_NORETRY semantics to the percpu balancing path

2018-04-08T12:26:29+00:00

commit 47504ee04b9241548ae2c28be7d0b01cff3b7aa6 upstream.

Percpu memory using the vmalloc area based chunk allocator lazily
populates chunks by first requesting the full virtual address space
required for the chunk and subsequently adding pages as allocations come
through. To ensure atomic allocations can succeed, a workqueue item is
used to maintain a minimum number of empty pages. In certain scenarios,
such as reported in [1], it is possible that physical memory becomes
quite scarce which can result in either a rather long time spent trying
to find free pages or worse, a kernel panic.

This patch adds support for __GFP_NORETRY and __GFP_NOWARN passing them
through to the underlying allocators. This should prevent any
unnecessary panics potentially caused by the workqueue item. The passing
of gfp around is as additional flags rather than a full set of flags.
The next patch will change these to caller passed semantics.

V2:
Added const modifier to gfp flags in the balance path.
Removed an extra whitespace.

[1] https://lkml.org/lkml/2018/2/12/551

Signed-off-by: Dennis Zhou 
Suggested-by: Daniel Borkmann 
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Acked-by: Christoph Lameter 
Signed-off-by: Tejun Heo 
Signed-off-by: Greg Kroah-Hartman

mm, percpu: add support for __GFP_NOWARN flag

2017-10-19T12:13:49+00:00

Add an option for pcpu_alloc() to support __GFP_NOWARN flag.
Currently, we always throw a warning when size or alignment
is unsupported (and also dump stack on failed allocation
requests). The warning itself is harmless since we return
NULL anyway for any failed request, which callers are
required to handle anyway. However, it becomes harmful when
panic_on_warn is set.

The rationale for the WARN() in pcpu_alloc() is that it can
be tracked when larger than supported allocation requests are
made such that allocations limits can be tweaked if warranted.
This makes sense for in-kernel users, however, there are users
of pcpu allocator where allocation size is derived from user
space requests, e.g. when creating BPF maps. In these cases,
the requests should fail gracefully without throwing a splat.

The current work-around was to check allocation size against
the upper limit of PCPU_MIN_UNIT_SIZE from call-sites for
bailing out prior to a call to pcpu_alloc() in order to
avoid throwing the WARN(). This is bad in multiple ways since
PCPU_MIN_UNIT_SIZE is an implementation detail, and having
the checks on call-sites only complicates the code for no
good reason. Thus, lets fix it generically by supporting the
__GFP_NOWARN flag that users can then use with calling the
__alloc_percpu_gfp() helper instead.

Signed-off-by: Daniel Borkmann 
Cc: Tejun Heo 
Cc: Mark Rutland 
Acked-by: Alexei Starovoitov 
Signed-off-by: David S. Miller

percpu: fix iteration to prevent skipping over block

2017-09-28T14:39:27+00:00

The iterator functions pcpu_next_md_free_region and
pcpu_next_fit_region use the block offset to determine if they have
checked the area in the prior iteration. However, this causes an issue
when the block offset is greater than subsequent block contig hints. If
within the iterator it moves to check subsequent blocks, it may fail in
the second predicate due to the block offset not being cleared. Thus,
this causes the allocator to skip over blocks leading to false failures
when allocating from the reserved chunk. While this happens in the
general case as well, it will only fail if it cannot allocate a new
chunk.

This patch resets the block offset to 0 to pass the second predicate
when checking subseqent blocks within the iterator function.

Signed-off-by: Dennis Zhou 
Reported-and-tested-by: Luis Henriques 
Signed-off-by: Tejun Heo

percpu: update header to contain bitmap allocator explanation.

2017-07-26T21:41:06+00:00

The other patches contain a lot of information, so adding this
information in a separate patch. It adds my copyright and a brief
explanation of how the bitmap allocator works. There is a minor typo as
well in the prior explanation so that is fixed.

Signed-off-by: Dennis Zhou 
Reviewed-by: Josef Bacik 
Signed-off-by: Tejun Heo

percpu: update pcpu_find_block_fit to use an iterator

2017-07-26T21:41:06+00:00

The simple, and expensive, way to find a free area is to iterate over
the entire bitmap until an area is found that fits the allocation size
and alignment. This patch makes use of an iterate that find an area to
check by using the block level contig hints. It will only return an area
that can fit the size and alignment request. If the request can fit
inside a block, it returns the first_free bit to start checking from to
see if it can be fulfilled prior to the contig hint. The pcpu_alloc_area
check has a bound of a block size added in case it is wrong.

Signed-off-by: Dennis Zhou 
Reviewed-by: Josef Bacik 
Signed-off-by: Tejun Heo

percpu: use metadata blocks to update the chunk contig hint

2017-07-26T21:41:06+00:00

The largest free region will either be a block level contig hint or an
aggregate over the left_free and right_free areas of blocks. This is a
much smaller set of free areas that need to be checked than a full
traverse.

Signed-off-by: Dennis Zhou 
Reviewed-by: Josef Bacik 
Signed-off-by: Tejun Heo

percpu: update free path to take advantage of contig hints

2017-07-26T21:41:06+00:00

The bitmap allocator must keep metadata consistent. The easiest way is
to scan after every allocation for each affected block and the entire
chunk. This is rather expensive.

The free path can take advantage of current contig hints to prevent
scanning within the start and end block.  If a scan is needed, it can
be done by scanning backwards from the start and forwards from the end
to identify the entire free area this can be combined with. The blocks
can then be updated by some basic checks rather than complete block
scans.

A chunk scan happens when the freed area makes a page free, a block
free, or spans across blocks. This is necessary as the contig hint at
this point could span across blocks. The check uses the minimum of page
size and the block size to allow for variable sized blocks. There is a
tradeoff here with not updating after every free. It is possible a
contig hint in one block can be merged with the contig hint in the next
block. This means the contig hint can be off by up to a page. However,
if the chunk's contig hint is contained in one block, the contig hint
will be accurate.

Signed-off-by: Dennis Zhou 
Reviewed-by: Josef Bacik 
Signed-off-by: Tejun Heo

percpu: update alloc path to only scan if contig hints are broken

2017-07-26T21:41:06+00:00

Metadata is kept per block to keep track of where the contig hints are.
Scanning can be avoided when the contig hints are not broken. In that
case, left and right contigs have to be managed manually.

This patch changes the allocation path hint updating to only scan when
contig hints are broken.

Signed-off-by: Dennis Zhou 
Reviewed-by: Josef Bacik 
Signed-off-by: Tejun Heo