<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/mm/percpu.c, branch v4.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>mm, percpu: add support for __GFP_NOWARN flag</title>
<updated>2017-10-19T12:13:49+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2017-10-17T14:55:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0ea7eeec24be5f04ae80d68f5b1ea3a11f49de2f'/>
<id>0ea7eeec24be5f04ae80d68f5b1ea3a11f49de2f</id>
<content type='text'>
Add an option for pcpu_alloc() to support __GFP_NOWARN flag.
Currently, we always throw a warning when size or alignment
is unsupported (and also dump stack on failed allocation
requests). The warning itself is harmless since we return
NULL for any failed request, which callers are
required to handle anyway. However, it becomes harmful when
panic_on_warn is set.

The rationale for the WARN() in pcpu_alloc() is that it can
be tracked when larger than supported allocation requests are
made such that allocation limits can be tweaked if warranted.
This makes sense for in-kernel users; however, there are users
of pcpu allocator where allocation size is derived from user
space requests, e.g. when creating BPF maps. In these cases,
the requests should fail gracefully without throwing a splat.

The current work-around was to check allocation size against
the upper limit of PCPU_MIN_UNIT_SIZE from call-sites for
bailing out prior to a call to pcpu_alloc() in order to
avoid throwing the WARN(). This is bad in multiple ways since
PCPU_MIN_UNIT_SIZE is an implementation detail, and having
the checks on call-sites only complicates the code for no
good reason. Thus, let's fix it generically by supporting the
__GFP_NOWARN flag, which users can then pass when calling the
__alloc_percpu_gfp() helper instead.
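
As a rough userspace illustration only (this is not the kernel
code), the intended behaviour can be modelled as below. All
model_* and MODEL_* names, the flag value and the size cap are
hypothetical stand-ins; a real caller would instead pass
__GFP_NOWARN in the gfp mask given to __alloc_percpu_gfp().

<![CDATA[
```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the kernel's real flag values and limits differ. */
#define MODEL_GFP_NOWARN 0x1u
#define MODEL_MAX_SIZE   (32u * 1024u)	/* illustrative cap only */

static int model_warn_count;	/* number of warnings emitted so far */

/*
 * Model of an allocator that always fails unsupported sizes with NULL,
 * but only warns about them when the caller did not ask for silence.
 */
static void *model_pcpu_alloc(size_t size, unsigned int flags)
{
	if (size == 0 || size > MODEL_MAX_SIZE) {
		if (!(flags & MODEL_GFP_NOWARN)) {
			model_warn_count++;
			fprintf(stderr, "illegal size (%zu) requested\n", size);
		}
		return NULL;	/* callers must handle failure either way */
	}
	return malloc(size);
}
```
]]>

In this model a caller whose size comes from user space passes
MODEL_GFP_NOWARN and simply checks for NULL, with no size check
of its own before the call.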

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add an option for pcpu_alloc() to support __GFP_NOWARN flag.
Currently, we always throw a warning when size or alignment
is unsupported (and also dump stack on failed allocation
requests). The warning itself is harmless since we return
NULL for any failed request, which callers are
required to handle anyway. However, it becomes harmful when
panic_on_warn is set.

The rationale for the WARN() in pcpu_alloc() is that it can
be tracked when larger than supported allocation requests are
made such that allocation limits can be tweaked if warranted.
This makes sense for in-kernel users; however, there are users
of pcpu allocator where allocation size is derived from user
space requests, e.g. when creating BPF maps. In these cases,
the requests should fail gracefully without throwing a splat.

The current work-around was to check allocation size against
the upper limit of PCPU_MIN_UNIT_SIZE from call-sites for
bailing out prior to a call to pcpu_alloc() in order to
avoid throwing the WARN(). This is bad in multiple ways since
PCPU_MIN_UNIT_SIZE is an implementation detail, and having
the checks on call-sites only complicates the code for no
good reason. Thus, let's fix it generically by supporting the
__GFP_NOWARN flag, which users can then pass when calling the
__alloc_percpu_gfp() helper instead.

Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: fix iteration to prevent skipping over block</title>
<updated>2017-09-28T14:39:27+00:00</updated>
<author>
<name>Dennis Zhou</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2017-09-27T21:35:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1fa4df3e688902d033dfda796eb83ae6ad8d0488'/>
<id>1fa4df3e688902d033dfda796eb83ae6ad8d0488</id>
<content type='text'>
The iterator functions pcpu_next_md_free_region and
pcpu_next_fit_region use the block offset to determine if they have
checked the area in the prior iteration. However, this causes an issue
when the block offset is greater than subsequent block contig hints. If
within the iterator it moves to check subsequent blocks, it may fail in
the second predicate due to the block offset not being cleared. Thus,
this causes the allocator to skip over blocks leading to false failures
when allocating from the reserved chunk. While this happens in the
general case as well, it will only fail if it cannot allocate a new
chunk.

This patch resets the block offset to 0 to pass the second predicate
when checking subsequent blocks within the iterator function.
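
The effect can be sketched with a much simplified userspace model
(not the kernel iterators themselves). It keeps only the contig
hint check and leaves out the left_free/right_free handling; the
model_* names, the block size and the reset_block_off switch are
assumptions made for illustration.

<![CDATA[
```c
#define MODEL_BLOCK_BITS 8	/* bits per metadata block; illustrative */

struct model_block {
	int contig_hint;	/* length of the block's largest free run */
	int contig_hint_start;	/* offset of that run within the block */
};

/*
 * Return the chunk-wide bit offset of the first contig hint at or after
 * start_off, or -1 if none is found. With reset_block_off == 0 the stale
 * offset from the starting block is kept, which models the skipping.
 */
static int model_next_hint(const struct model_block *blocks, int nr_blocks,
			   int start_off, int reset_block_off)
{
	int i = start_off / MODEL_BLOCK_BITS;
	int block_off = start_off % MODEL_BLOCK_BITS;

	for (; i < nr_blocks; i++) {
		const struct model_block *b = &blocks[i];

		/* the second condition is the "checked before" predicate */
		if (b->contig_hint && b->contig_hint_start >= block_off)
			return i * MODEL_BLOCK_BITS + b->contig_hint_start;

		/* later blocks are unvisited, so the offset must not apply */
		if (reset_block_off)
			block_off = 0;
	}
	return -1;
}
```
]]>

Starting at bit 5 of the first block with a hint at offset 2 in the
second block, the model without the reset finds nothing, while the
model with the reset returns that hint.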

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reported-and-tested-by: Luis Henriques &lt;lhenriques@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The iterator functions pcpu_next_md_free_region and
pcpu_next_fit_region use the block offset to determine if they have
checked the area in the prior iteration. However, this causes an issue
when the block offset is greater than subsequent block contig hints. If
within the iterator it moves to check subsequent blocks, it may fail in
the second predicate due to the block offset not being cleared. Thus,
this causes the allocator to skip over blocks leading to false failures
when allocating from the reserved chunk. While this happens in the
general case as well, it will only fail if it cannot allocate a new
chunk.

This patch resets the block offset to 0 to pass the second predicate
when checking subsequent blocks within the iterator function.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reported-and-tested-by: Luis Henriques &lt;lhenriques@suse.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: update header to contain bitmap allocator explanation.</title>
<updated>2017-07-26T21:41:06+00:00</updated>
<author>
<name>Dennis Zhou (Facebook)</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2017-07-24T23:02:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5e81ee3e6a79cc9fa85af5c3db0f1f269709bbf1'/>
<id>5e81ee3e6a79cc9fa85af5c3db0f1f269709bbf1</id>
<content type='text'>
The other patches contain a lot of information, so adding this
information in a separate patch. It adds my copyright and a brief
explanation of how the bitmap allocator works. There is a minor typo as
well in the prior explanation so that is fixed.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The other patches contain a lot of information, so adding this
information in a separate patch. It adds my copyright and a brief
explanation of how the bitmap allocator works. There is a minor typo as
well in the prior explanation so that is fixed.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: update pcpu_find_block_fit to use an iterator</title>
<updated>2017-07-26T21:41:06+00:00</updated>
<author>
<name>Dennis Zhou (Facebook)</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2017-07-24T23:02:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b4c2116cfae65b09761b7ba34453733e745a6f77'/>
<id>b4c2116cfae65b09761b7ba34453733e745a6f77</id>
<content type='text'>
The simple, and expensive, way to find a free area is to iterate over
the entire bitmap until an area is found that fits the allocation size
and alignment. This patch makes use of an iterator that finds an area to
check by using the block level contig hints. It will only return an area
that can fit the size and alignment request. If the request can fit
inside a block, it returns the first_free bit to start checking from to
see if it can be fulfilled prior to the contig hint. The pcpu_alloc_area
check has a bound of a block size added in case it is wrong.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The simple, and expensive, way to find a free area is to iterate over
the entire bitmap until an area is found that fits the allocation size
and alignment. This patch makes use of an iterator that finds an area to
check by using the block level contig hints. It will only return an area
that can fit the size and alignment request. If the request can fit
inside a block, it returns the first_free bit to start checking from to
see if it can be fulfilled prior to the contig hint. The pcpu_alloc_area
check has a bound of a block size added in case it is wrong.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: use metadata blocks to update the chunk contig hint</title>
<updated>2017-07-26T21:41:06+00:00</updated>
<author>
<name>Dennis Zhou (Facebook)</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2017-07-24T23:02:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=525ca84daec01825b0d037f5fcf60adb7f510118'/>
<id>525ca84daec01825b0d037f5fcf60adb7f510118</id>
<content type='text'>
The largest free region will either be a block level contig hint or an
aggregate over the left_free and right_free areas of blocks. This is a
much smaller set of free areas that need to be checked than a full
traverse.
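
A minimal userspace sketch of that idea, assuming a simplified
per-block record (the model_* names and the block size are
hypothetical, and this is not the kernel implementation):

<![CDATA[
```c
#define MODEL_BLOCK_BITS 8	/* bits per metadata block; illustrative */

struct model_block_md {
	int left_free;		/* free bits at the start of the block */
	int right_free;		/* free bits at the end of the block */
	int contig_hint;	/* largest free run inside the block */
};

/*
 * Largest free run in the chunk, derived from block metadata only:
 * either some block's contig hint, or a run that crosses block
 * boundaries (right_free of one block, any fully free blocks after it,
 * then left_free of the block that ends the run).
 */
static int model_chunk_contig_hint(const struct model_block_md *b, int nr)
{
	int best = 0;
	int run = 0;	/* free bits carried over from previous blocks */
	int i;

	for (i = 0; i < nr; i++) {
		run += b[i].left_free;
		if (run > best)
			best = run;
		if (b[i].contig_hint > best)
			best = b[i].contig_hint;
		/* a fully free block extends the run; otherwise restart it */
		if (b[i].left_free != MODEL_BLOCK_BITS)
			run = b[i].right_free;
	}
	return run > best ? run : best;
}
```
]]>

The loop touches one record per block rather than every bit of the
chunk's bitmap.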

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The largest free region will either be a block level contig hint or an
aggregate over the left_free and right_free areas of blocks. This is a
much smaller set of free areas that need to be checked than a full
traverse.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: update free path to take advantage of contig hints</title>
<updated>2017-07-26T21:41:06+00:00</updated>
<author>
<name>Dennis Zhou (Facebook)</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2017-07-24T23:02:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b185cd0dc61c14875155e7bcc3f2c139b6feefd2'/>
<id>b185cd0dc61c14875155e7bcc3f2c139b6feefd2</id>
<content type='text'>
The bitmap allocator must keep metadata consistent. The easiest way is
to scan after every allocation for each affected block and the entire
chunk. This is rather expensive.

The free path can take advantage of current contig hints to prevent
scanning within the start and end block.  If a scan is needed, it can
be done by scanning backwards from the start and forwards from the end
to identify the entire free area this can be combined with. The blocks
can then be updated by some basic checks rather than complete block
scans.

A chunk scan happens when the freed area makes a page free, a block
free, or spans across blocks. This is necessary as the contig hint at
this point could span across blocks. The check uses the minimum of page
size and the block size to allow for variable sized blocks. There is a
tradeoff here with not updating after every free. It is possible a
contig hint in one block can be merged with the contig hint in the next
block. This means the contig hint can be off by up to a page. However,
if the chunk's contig hint is contained in one block, the contig hint
will be accurate.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The bitmap allocator must keep metadata consistent. The easiest way is
to scan after every allocation for each affected block and the entire
chunk. This is rather expensive.

The free path can take advantage of current contig hints to prevent
scanning within the start and end block.  If a scan is needed, it can
be done by scanning backwards from the start and forwards from the end
to identify the entire free area this can be combined with. The blocks
can then be updated by some basic checks rather than complete block
scans.

A chunk scan happens when the freed area makes a page free, a block
free, or spans across blocks. This is necessary as the contig hint at
this point could span across blocks. The check uses the minimum of page
size and the block size to allow for variable sized blocks. There is a
tradeoff here with not updating after every free. It is possible a
contig hint in one block can be merged with the contig hint in the next
block. This means the contig hint can be off by up to a page. However,
if the chunk's contig hint is contained in one block, the contig hint
will be accurate.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: update alloc path to only scan if contig hints are broken</title>
<updated>2017-07-26T21:41:06+00:00</updated>
<author>
<name>Dennis Zhou (Facebook)</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2017-07-24T23:02:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fc3043345a648a49978c6fb0bf8c188b7cfe0ab3'/>
<id>fc3043345a648a49978c6fb0bf8c188b7cfe0ab3</id>
<content type='text'>
Metadata is kept per block to keep track of where the contig hints are.
Scanning can be avoided when the contig hints are not broken. In that
case, left and right contigs have to be managed manually.

This patch changes the allocation path hint updating to only scan when
contig hints are broken.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Metadata is kept per block to keep track of where the contig hints are.
Scanning can be avoided when the contig hints are not broken. In that
case, left and right contigs have to be managed manually.

This patch changes the allocation path hint updating to only scan when
contig hints are broken.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: keep track of the best offset for contig hints</title>
<updated>2017-07-26T21:41:05+00:00</updated>
<author>
<name>Dennis Zhou (Facebook)</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2017-07-24T23:02:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=268625a6f9df6a7c9b0ae7707a8a1cd5a9993bd2'/>
<id>268625a6f9df6a7c9b0ae7707a8a1cd5a9993bd2</id>
<content type='text'>
This patch makes the contig hint starting offset optimization from the
previous patch as honest as it can be. For both chunk and block starting
offsets, make sure it keeps the starting offset with the best alignment.

The block skip optimization is added in a later patch when the
pcpu_find_block_fit iterator is swapped in.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch makes the contig hint starting offset optimization from the
previous patch as honest as it can be. For both chunk and block starting
offsets, make sure it keeps the starting offset with the best alignment.

The block skip optimization is added in a later patch when the
pcpu_find_block_fit iterator is swapped in.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: skip chunks if the alloc does not fit in the contig hint</title>
<updated>2017-07-26T21:41:05+00:00</updated>
<author>
<name>Dennis Zhou (Facebook)</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2017-07-24T23:02:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=13f966373f9296c0da2fb2764654cce520b3a6b4'/>
<id>13f966373f9296c0da2fb2764654cce520b3a6b4</id>
<content type='text'>
This patch adds chunk-&gt;contig_bits_start to keep track of the contig
hint's offset and the check to skip the chunk if it does not fit. If
the chunk's contig hint starting offset cannot satisfy an allocation,
the allocator assumes there is enough memory pressure in this chunk to
either use a different chunk or create a new one. This accepts a less
tight packing for a smoother latency curve.
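
The skip check can be pictured roughly as follows. This is a
userspace sketch under assumed names (model_chunk_may_fit and
MODEL_ALIGN are hypothetical), with sizes and offsets counted in
bitmap bits and align taken to be a power of two:

<![CDATA[
```c
#include <assert.h>

/* Round x up to a multiple of a; a must be a power of two. */
#define MODEL_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/*
 * Return non-zero if an allocation of alloc_bits with the given alignment
 * fits in the contig hint that starts at contig_start and is contig_bits
 * long. A zero result means the chunk would be skipped.
 */
static int model_chunk_may_fit(int contig_start, int contig_bits,
			       int alloc_bits, int align)
{
	int pad;

	assert(align > 0 && (align & (align - 1)) == 0);

	/* bits lost to aligning the start of the hint */
	pad = MODEL_ALIGN(contig_start, align) - contig_start;

	return pad + alloc_bits <= contig_bits;
}
```
]]>

For a hint of 6 bits starting at bit 3 and an alignment of 4, one
bit is lost to alignment, so a 4-bit request fits and a 6-bit
request causes the chunk to be skipped.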

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds chunk-&gt;contig_bits_start to keep track of the contig
hint's offset and the check to skip the chunk if it does not fit. If
the chunk's contig hint starting offset cannot satisfy an allocation,
the allocator assumes there is enough memory pressure in this chunk to
either use a different chunk or create a new one. This accepts a less
tight packing for a smoother latency curve.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>percpu: add first_bit to keep track of the first free in the bitmap</title>
<updated>2017-07-26T21:41:05+00:00</updated>
<author>
<name>Dennis Zhou (Facebook)</name>
<email>dennisszhou@gmail.com</email>
</author>
<published>2017-07-24T23:02:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=86b442fbce74d6cd0805410ef228776cbd0338d7'/>
<id>86b442fbce74d6cd0805410ef228776cbd0338d7</id>
<content type='text'>
This patch adds first_bit to keep track of the first free bit in the
bitmap. This hint helps prevent scanning of fully allocated blocks.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds first_bit to keep track of the first free bit in the
bitmap. This hint helps prevent scanning of fully allocated blocks.

Signed-off-by: Dennis Zhou &lt;dennisszhou@gmail.com&gt;
Reviewed-by: Josef Bacik &lt;jbacik@fb.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
