diff options
| author | Shengming Hu <hu.shengming@zte.com.cn> | 2026-04-30 22:04:41 +0800 |
|---|---|---|
| committer | Vlastimil Babka (SUSE) <vbabka@kernel.org> | 2026-05-14 09:13:00 +0200 |
| commit | dc795d4c0282a4fbfbcd76a70c09ca0888678443 (patch) | |
| tree | 43a2fb3c8a3bb07734870c09322498771aacc479 /scripts | |
| parent | 5d6919055dec134de3c40167a490f33c74c12581 (diff) | |
mm/slub: defer freelist construction until after bulk allocation from a new slab
Allocations from a fresh slab can consume all of its objects, and the
freelist built during slab allocation is discarded immediately as a result.
Instead of special-casing the whole-slab bulk refill case, defer freelist
construction until after objects are emitted from a fresh slab.
new_slab() now only allocates the slab and initializes its metadata.
refill_objects() then obtains a fresh slab and lets alloc_from_new_slab()
emit objects directly, building a freelist only for the objects left
unallocated; the same change is applied to alloc_single_from_new_slab().
To keep CONFIG_SLAB_FREELIST_RANDOM=y/n on the same path, introduce a
small iterator abstraction for walking free objects in allocation order.
The iterator is used both for filling the sheaf and for building the
freelist of the remaining objects.
Also mark setup_object() inline. After this optimization, the compiler no
longer consistently inlines this helper in the hot path, which can hurt
performance. Explicitly marking it inline restores the expected code
generation.
This reduces per-object overhead when allocating from a fresh slab.
The most direct benefit is in the paths that allocate objects first and
only build a freelist for the remainder afterward: bulk allocation from
a new slab in refill_objects(), single-object allocation from a new slab
in ___slab_alloc(), and the corresponding early-boot paths that now use
the same deferred-freelist scheme. Since refill_objects() is also used to
refill sheaves, the optimization is not limited to the small set of
kmem_cache_alloc_bulk()/kmem_cache_free_bulk() users; regular allocation
workloads may benefit as well when they refill from a fresh slab.
In slub_bulk_bench, the time per object drops by about 42% to 70% with
CONFIG_SLAB_FREELIST_RANDOM=n, and by about 58% to 69% with
CONFIG_SLAB_FREELIST_RANDOM=y. This benchmark is intended to isolate the
cost removed by this change: each iteration allocates exactly
slab->objects from a fresh slab. That makes it a near best-case scenario
for deferred freelist construction, because the old path still built a
full freelist even when no objects remained, while the new path avoids
that work. Realistic workloads may see smaller end-to-end gains depending
on how often allocations reach this fresh-slab refill path.
Benchmark results (slub_bulk_bench):
Machine: qemu-system-x86 -m 1024M -smp 8 -enable-kvm -cpu host
Kernel: Linux 7.1.0-rc1-next-20260429
Config: x86_64_defconfig
Cpu: 0
Rounds: 20
Total: 256MB
- CONFIG_SLAB_FREELIST_RANDOM=n -
obj_size=16, batch=256:
before: 5.44 +- 0.07 ns/object
after: 3.12 +- 0.03 ns/object
delta: -42.6%
obj_size=32, batch=128:
before: 7.57 +- 0.32 ns/object
after: 3.79 +- 0.07 ns/object
delta: -49.9%
obj_size=64, batch=64:
before: 11.27 +- 0.09 ns/object
after: 4.83 +- 0.06 ns/object
delta: -57.2%
obj_size=128, batch=32:
before: 19.38 +- 0.13 ns/object
after: 6.43 +- 0.08 ns/object
delta: -66.8%
obj_size=256, batch=32:
before: 23.59 +- 0.18 ns/object
after: 6.97 +- 0.07 ns/object
delta: -70.5%
obj_size=512, batch=32:
before: 21.06 +- 0.14 ns/object
after: 7.12 +- 0.17 ns/object
delta: -66.2%
- CONFIG_SLAB_FREELIST_RANDOM=y -
obj_size=16, batch=256:
before: 9.42 +- 0.11 ns/object
after: 4.36 +- 0.19 ns/object
delta: -53.7%
obj_size=32, batch=128:
before: 12.19 +- 0.62 ns/object
after: 4.93 +- 0.07 ns/object
delta: -59.6%
obj_size=64, batch=64:
before: 17.01 +- 0.73 ns/object
after: 6.14 +- 0.12 ns/object
delta: -63.9%
obj_size=128, batch=32:
before: 23.71 +- 1.10 ns/object
after: 8.35 +- 0.18 ns/object
delta: -64.8%
obj_size=256, batch=32:
before: 29.20 +- 0.35 ns/object
after: 9.44 +- 1.32 ns/object
delta: -67.7%
obj_size=512, batch=32:
before: 29.35 +- 0.79 ns/object
after: 9.21 +- 0.34 ns/object
delta: -68.6%
Link: https://github.com/HSM6236/slub_bulk_test.git
Suggested-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Harry Yoo (Oracle) <harry@kernel.org>
Reviewed-by: Hao Li <hao.li@linux.dev>
Tested-by: Hao Li <hao.li@linux.dev>
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
Link: https://patch.msgid.link/202604302204413066CxdJnJ3RAGH_7iE4EBIO@zte.com.cn
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Diffstat (limited to 'scripts')
0 files changed, 0 insertions, 0 deletions
