| author | Rik van Riel <riel@surriel.com> | 2026-05-26 18:37:39 -0400 |
|---|---|---|
| committer | Johannes Thumshirn <johannes.thumshirn@wdc.com> | 2026-06-09 18:22:45 +0200 |
| commit | 23fd95663b070cf781f37b8058a8c055f168110b (patch) | |
| tree | 6fcf861401856b2a0ff9aacba4ea4481169653df /include/linux/timerqueue_types.h | |
| parent | 538e5bdbc8996d3f6ce65565dda7df58e6e97e04 (diff) | |
btrfs: allocate eb-attached btree pages as movable

Extent buffer pages allocated by alloc_extent_buffer() are attached to
btree_inode->i_mapping (the buffer_tree path), reach the LRU, and are
served by the btree_migrate_folio aops in fs/btrfs/disk-io.c. They are
migratable in practice once their owning extent buffer reaches refs == 1,
which happens naturally. The buddy allocator, however, classifies pages
by their GFP flags, and bare GFP_NOFS lands them in MIGRATE_UNMOVABLE
pageblocks. The result: every btree_inode page we read in pins an
unmovable pageblock from the page-superblock allocator's perspective,
even though the page itself can be moved.

Have each caller of btrfs_alloc_page_array(), btrfs_alloc_folio_array(),
and alloc_eb_folio_array() pass in the full GFP mask directly, instead
of having the functions calculate it from boolean flags.

The alloc_extent_buffer() call site passes GFP_NOFS | __GFP_NOFAIL |
__GFP_MOVABLE. All other call sites pass plain GFP_NOFS.
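
As a rough illustration of the mask selection described above, here is a
minimal user-space sketch. It is not the kernel code: the MODEL_* flag
values, the gfp_model_t type, the call-site enum, and model_gfp_for_site()
are all hypothetical stand-ins, and the real GFP bit values differ.

```c
/* Illustrative stand-ins only; the kernel's real GFP bits have other values. */
typedef unsigned int gfp_model_t;
#define MODEL_GFP_NOFS    0x1u
#define MODEL_GFP_NOFAIL  0x2u
#define MODEL_GFP_MOVABLE 0x4u

/* Hypothetical enumeration of the two kinds of call site in this message. */
enum model_call_site {
	MODEL_SITE_ALLOC_EXTENT_BUFFER,	/* eb attached to btree_inode->i_mapping */
	MODEL_SITE_OTHER,		/* every other caller */
};

/* Each caller supplies its own full mask rather than boolean flags. */
static gfp_model_t model_gfp_for_site(enum model_call_site site)
{
	if (site == MODEL_SITE_ALLOC_EXTENT_BUFFER)
		return MODEL_GFP_NOFS | MODEL_GFP_NOFAIL | MODEL_GFP_MOVABLE;
	return MODEL_GFP_NOFS;
}
```

Only the alloc_extent_buffer() path ends up with the movable bit set in
this model; every other site keeps the plain mask.
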

Three categories of caller stay on bare GFP_NOFS, deliberately:

- alloc_dummy_extent_buffer / btrfs_clone_extent_buffer: the
  resulting eb is EXTENT_BUFFER_UNMAPPED, folio->mapping stays NULL,
  and the folios never enter the LRU or get migrate_folio aops.
  Tagging them __GFP_MOVABLE would violate the page allocator's
  migrability contract, and they would defeat compaction in MOVABLE
  pageblocks, where isolate_migratepages_block skips non-LRU
  non-movable_ops pages outright.

- btrfs_alloc_page_array callers in fs/btrfs/raid56.c (stripe
  pages), fs/btrfs/inode.c (encoded reads), fs/btrfs/ioctl.c (io_uring
  encoded reads), fs/btrfs/relocation.c (relocation buffers): same
  contract violation. raid56 stripe_pages additionally persist in
  the stripe cache (RBIO_CACHE_SIZE=1024) well beyond a single I/O,
  so they are not transient enough to hand-wave the contract.

- btrfs_alloc_folio_array caller in fs/btrfs/scrub.c (stripe
  folios): same -- stripe->folios[] are private buffers freed via
  folio_put in release_scrub_stripe.
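
The eligibility rule behind the three categories above can be sketched as
a simple predicate. This is a simplified user-space model, not kernel
code: struct model_buffer and model_may_tag_movable() are hypothetical
names, and the model ignores the separate movable_ops path that
compaction also accepts.

```c
#include <stdbool.h>

/* Hypothetical model of the properties this message lists; not a kernel struct. */
struct model_buffer {
	bool has_mapping;	/* folio->mapping is non-NULL */
	bool reaches_lru;	/* folios are placed on the LRU */
	bool has_migrate_folio;	/* the mapping provides a migrate_folio aop */
};

/* A buffer may carry the movable hint only if all three properties hold. */
static bool model_may_tag_movable(const struct model_buffer *b)
{
	return b->has_mapping && b->reaches_lru && b->has_migrate_folio;
}
```

In this model an eb-attached btree folio satisfies all three conditions,
while an EXTENT_BUFFER_UNMAPPED eb, a raid56 stripe page, or a scrub
stripe folio fails at least the first one.
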

This change targets the dominant fragmentation source observed on the
page-superblock series: ~28 GB of btree_inode pages parked across
many tainted superpageblocks on a 250 GB test system with btrfs root,
preventing 1 GiB hugepage allocation from those regions. With the
movable hint, those pages now land in MOVABLE pageblocks, where the
existing background defragger drains them through the standard
PB_has_movable gate, with no LRU-sample fallback needed.

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: David Sterba <dsterba@suse.com>
