linux.git/fs/xfs, branch v6.15

xfs: Fix comment on xfs_trans_ail_update_bulk()

2025-05-14T13:37:50+00:00

This function doesn't take the AIL lock, but should be called
with AIL lock held. Also (hopefuly) simplify the comment.

Signed-off-by: Carlos Maiolino 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Chandan Babu R 
Signed-off-by: Carlos Maiolino

xfs: Fix a comment on xfs_ail_delete

2025-05-14T13:37:50+00:00

It doesn't return anything.

Reviewed-by: Christoph Hellwig 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Carlos Maiolino 
Reviewed-by: Chandan Babu R 
Signed-off-by: Carlos Maiolino

xfs: Fail remount with noattr2 on a v5 with v4 enabled

2025-05-14T13:37:50+00:00

Bug: When we compile the kernel with CONFIG_XFS_SUPPORT_V4=y,
remount with "-o remount,noattr2" on a v5 XFS does not
fail explicitly.

Reproduction:
mkfs.xfs -f /dev/loop0
mount /dev/loop0 /mnt/scratch
mount -o remount,noattr2 /dev/loop0 /mnt/scratch

However, with CONFIG_XFS_SUPPORT_V4=n, the remount
correctly fails explicitly. This is because the way the
following 2 functions are defined:

static inline bool xfs_has_attr2 (struct xfs_mount *mp)
{
	return !IS_ENABLED(CONFIG_XFS_SUPPORT_V4) ||
		(mp->m_features & XFS_FEAT_ATTR2);
}
static inline bool xfs_has_noattr2 (const struct xfs_mount *mp)
{
	return mp->m_features & XFS_FEAT_NOATTR2;
}

xfs_has_attr2() returns true when CONFIG_XFS_SUPPORT_V4=n
and hence, the following if condition in
xfs_fs_validate_params() succeeds and returns -EINVAL:

/*
 * We have not read the superblock at this point, so only the attr2
 * mount option can set the attr2 feature by this stage.
 */

if (xfs_has_attr2(mp) && xfs_has_noattr2(mp)) {
	xfs_warn(mp, "attr2 and noattr2 cannot both be specified.");
	return -EINVAL;
}

With CONFIG_XFS_SUPPORT_V4=y, xfs_has_attr2() always return
false and hence no error is returned.

Fix: Check if the existing mount has crc enabled(i.e, of
type v5 and has attr2 enabled) and the
remount has noattr2, if yes, return -EINVAL.

I have tested xfs/{189,539} in fstests with v4
and v5 XFS with both CONFIG_XFS_SUPPORT_V4=y/n and
they both behave as expected.

This patch also fixes remount from noattr2 -> attr2 (on a v4 xfs).

Related discussion in [1]

[1] https://lore.kernel.org/all/Z65o6nWxT00MaUrW@dread.disaster.area/

Signed-off-by: Nirjhar Roy (IBM) 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Carlos Maiolino 
Signed-off-by: Carlos Maiolino

xfs: fix zoned GC data corruption due to wrong bv_offset

2025-05-14T13:37:49+00:00

xfs_zone_gc_write_chunk writes out the data buffer read in earlier using
the same bio, and currenly looks at bv_offset for the offset into the
scratch folio for that.  But commit 26064d3e2b4d ("block: fix adding
folio to bio") changed how bv_page and bv_offset are calculated for
adding larger folios, breaking this fragile logic.

Switch to extracting the full physical address from the old bio_vec,
and calculate the offset into the folio from that instead.

This fixes data corruption during garbage collection with heavy rockdsb
workloads.  Thanks to Hans for tracking down the culprit commit during
long bisection sessions.

Fixes: 26064d3e2b4d ("block: fix adding folio to bio")
Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection")
Reported-by: Hans Holmberg 
Signed-off-by: Christoph Hellwig 
Reviewed-by: Hans Holmberg 
Tested-by: Hans Holmberg 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Carlos Maiolino

xfs: free up mp->m_free[0].count in error case

2025-05-14T13:37:49+00:00

In xfs_init_percpu_counters(), memory for mp->m_free[0].count wasn't freed
in error case. Free it up in this patch.

Signed-off-by: Wengang Wang 
Fixes: 712bae96631852 ("xfs: generalize the freespace and reserved blocks handling")
Reviewed-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Carlos Maiolino

XFS: fix zoned gc threshold math for 32-bit arches

2025-04-22T14:03:14+00:00

xfs_zoned_need_gc makes use of mult_frac() to calculate the threshold
for triggering the zoned garbage collector, but, turns out mult_frac()
doesn't properly work with 64-bit data types and this caused build
failures on some 32-bit architectures.

Fix this by essentially open coding mult_frac() in a 64-bit friendly
way.

Notice we don't need to bother with counters underflow here because
xfs_estimate_freecounter() will always return a positive value, as it
leverages percpu_counter_read_positive to read such counters.

Fixes: 845abeb1f06a ("xfs: add tunable threshold parameter for triggering zone GC")
Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-kbuild-all/202504181233.F7D9Atra-lkp@intel.com/
Signed-off-by: Carlos Maiolino 
Tested-by: Guenter Roeck 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Hans Holmberg 
Signed-off-by: Carlos Maiolino

xfs: fix fsmap for internal zoned devices

2025-04-16T10:56:10+00:00

Filesystems with an internal zoned rt section use xfs_rtblock_t values
that are relative to the start of the data device.  When fsmap reports
on internal rt sections, it reports the space used by the data section
as "OWN_FS".

Unfortunately, the logic for resuming a query isn't quite right, so
xfs/273 fails because it stress-tests GETFSMAP with a single-record
buffer.  If we enter the "report fake space as OWN_FS" block with a
nonzero key[0].fmr_length, we should add that to key[0].fmr_physical
and recheck if we still need to emit the fake record.  We should /not/
just return 0 from the whole function because that prevents all rmap
record iteration.

If we don't enter that block, the resumption is still wrong.
keys[*].fmr_physical is a reflection of what we copied out to userspace
on a previous query, which means that it already accounts for rgstart.
It is not correct to add rtstart_daddr when computing start_rtb or
end_rtb, so stop that.

While we're at it, add a xfs_has_zoned to make it clear that this is a
zoned filesystem thing.

Fixes: e50ec7fac81aa2 ("xfs: enable fsmap reporting for internal RT devices")
Signed-off-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Carlos Maiolino

xfs: Fix spelling mistake "drity" -> "dirty"

2025-04-16T08:43:23+00:00

There is a spelling mistake in fs/xfs/xfs_log.c. Fix it.

Signed-off-by: Zhang Xianwei 
Reviewed-by: Darrick J. Wong 
Signed-off-by: Carlos Maiolino

xfs: compute buffer address correctly in xmbuf_map_backing_mem

2025-04-14T09:22:52+00:00

Prior to commit e614a00117bc2d, xmbuf_map_backing_mem relied on
folio_file_page to return the base page for the xmbuf's loff_t in the
xfile, and set b_addr to the page_address of that base page.

Now that folio_file_page has been removed from xmbuf_map_backing_mem, we
always set b_addr to the folio_address of the folio.  This is correct
for the situation where the folio size matches the buffer size, but it's
totally wrong if tmpfs uses large folios.  We need to use
offset_in_folio here.

Found via xfs/801, which demonstrated evidence of corruption of an
in-memory rmap btree block right after initializing an adjacent block.

Fixes: e614a00117bc2d ("xfs: cleanup mapping tmpfs folios into the buffer cache")
Signed-off-by: Darrick J. Wong 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Carlos Maiolino

xfs: add tunable threshold parameter for triggering zone GC

2025-04-14T08:41:33+00:00

Presently we start garbage collection late - when we start running
out of free zones to backfill max_open_zones. This is a reasonable
default as it minimizes write amplification. The longer we wait,
the more blocks are invalidated and reclaim cost less in terms
of blocks to relocate.

Starting this late however introduces a risk of GC being outcompeted
by user writes. If GC can't keep up, user writes will be forced to
wait for free zones with high tail latencies as a result.

This is not a problem under normal circumstances, but if fragmentation
is bad and user write pressure is high (multiple full-throttle
writers) we will "bottom out" of free zones.

To mitigate this, introduce a zonegc_low_space tunable that lets the
user specify a percentage of how much of the unused space that GC
should keep available for writing. A high value will reclaim more of
the space occupied by unused blocks, creating a larger buffer against
write bursts.

This comes at a cost as write amplification is increased. To
illustrate this using a sample workload, setting zonegc_low_space to
60% avoids high (500ms) max latencies while increasing write
amplification by 15%.

Signed-off-by: Hans Holmberg 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Carlos Maiolino