path: root/fs/xfs
Age | Commit message | Author
10 days | xfs: Remove mention of PageWriteback | Matthew Wilcox (Oracle)
Update a comment to refer to folios instead of pages. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
10 days | xfs: abort mount if xfs_fs_reserve_ag_blocks fails | Christoph Hellwig
xfs_mountfs currently ignores all errors from xfs_fs_reserve_ag_blocks, which can lead to the mount path continuing on corruption errors. Fix the check to only ignore -ENOSPC as in other callers, and unwind for all other errors. Fixes: 81ed94751b15 ("xfs: fix log intent recovery ENOSPC shutdowns when inactivating inodes") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
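The error policy described above can be illustrated with a small userspace sketch. It tolerates only -ENOSPC from a reservation step and propagates every other error. `mount_reserve_step()` and the `reserve_*()` stubs are hypothetical stand-ins, not the actual `xfs_mountfs()` code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the per-AG reservation step: 0 or a negative errno. */
typedef int (*reserve_fn)(void);

/* Returns 0 if mounting may continue, otherwise the error to unwind with. */
static int mount_reserve_step(reserve_fn reserve)
{
	int error = reserve();

	if (error == -ENOSPC)
		return 0;	/* out of space: tolerated, as in the other callers */
	return error;		/* anything else (e.g. corruption) aborts the mount */
}

/* Stubs for the possible outcomes; XFS reports corruption as EUCLEAN on Linux. */
static int reserve_ok(void)      { return 0; }
static int reserve_nospc(void)   { return -ENOSPC; }
static int reserve_corrupt(void) { return -EUCLEAN; }
```

Only the errno comparison matters here. The real fix also unwinds the partially set up mount, which the sketch does not model.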
10 days | xfs: factor rtgroup geom write pointer reporting into a helper | Christoph Hellwig
The write pointer reporting stands out a bit better if we move it into a separate helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
10 days | xfs: drop the RTG reference later in xfs_ioc_rtgroup_geometry | Christoph Hellwig
Keep the rtgroup reference until after reporting the write pointer, as that uses it. Right now this is not a major issue as we don't support shrinking file systems in a way that makes RTGs go away, but let's stick to the proper reference counting to prepare for that. Fixes: c6ce65cb17aa ("xfs: add write pointer to xfs_rtgroup_geometry") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
10 days | xfs: fix rtgroup cleanup in CoW fork repair | Yingjie Gao
xrep_cow_find_bad_rt() initializes scrub rtgroup state before the force-rebuild path calls xrep_cow_mark_file_range(). If that call fails, the code jumps directly to out_rtg, which skips the scrub rtgroup cleanup and only drops the local rtgroup reference. Remove the unnecessary jump so the function falls through to out_sr, ensuring the realtime cursors, lock state, and sr->rtg reference are released before returning. Fixes: fd97fe111208 ("xfs: fix CoW forks for realtime files") Cc: <stable@vger.kernel.org> # v6.14 Signed-off-by: Yingjie Gao <gaoyingjie@uniontech.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
10 days | xfs: fix error returns in CoW fork repair | Yingjie Gao
xrep_cow_find_bad() returns success after the cleanup labels even if AG setup, btree queries, or bitmap updates failed. This can make repair continue with an incomplete bad-file-offset bitmap instead of stopping at the original error. The force-rebuild path has a related cleanup problem. If xrep_cow_mark_file_range() fails, the function returns directly and skips the scrub AG context and perag cleanup. Let the force-rebuild path fall through to the existing cleanup code and return the saved error after cleanup. Fixes: dbbdbd008632 ("xfs: repair problems in CoW forks") Cc: <stable@vger.kernel.org> # v6.8 Signed-off-by: Yingjie Gao <gaoyingjie@uniontech.com> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
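Both CoW fork repair fixes move toward the same cleanup pattern. On failure the code jumps to the cleanup labels instead of returning directly, then returns the saved error after cleanup. In this sketch, `struct repair_state`, `mark_range()` and `find_bad()` are hypothetical. Simple counters stand in for the scrub context and the perag/rtgroup reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counters standing in for resources taken by the repair code. */
struct repair_state {
	int ctx_held;	/* scrub AG/rtgroup context is set up */
	int ref_held;	/* perag/rtgroup reference is held */
	bool scanned;	/* work that only happens on success */
};

/* Hypothetical stand-in for the force-rebuild marking step; fails on request. */
static int mark_range(bool fail)
{
	return fail ? -EIO : 0;
}

static int find_bad(struct repair_state *st, bool mark_fails)
{
	int error;

	st->ref_held++;		/* take the group reference */
	st->ctx_held++;		/* set up the scrub context */

	error = mark_range(mark_fails);
	if (error)
		goto out_ctx;	/* not "return error": cleanup must still run */

	st->scanned = true;

out_ctx:
	st->ctx_held--;		/* tear down the scrub context */
	st->ref_held--;		/* drop the group reference */
	return error;		/* the saved error, not a hard-coded 0 */
}
```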
10 days | xfs: fix overlapping extents returned for pNFS LAYOUTGET | Dai Ngo
xfs_fs_map_blocks() currently passes XFS_BMAPI_ENTIRE to xfs_bmapi_read(), which causes the bmap code to expand the mapping to cover the entire extent rather than the requested range. A single LAYOUTGET request from the client can cause the server to issue multiple calls to xfs_fs_map_blocks() for different offsets within the same extent. Because of the XFS_BMAPI_ENTIRE flag, these calls can produce overlapping mappings. As a result, the LAYOUTGET reply sent to the NFS client may contain overlapping extents. This creates ambiguity in extent selection for a given file range, which can lead to incorrect device selection, inconsistent handling of datastate, and ultimately data corruption or protocol violations on the client side. Problem discovered with xfstest generic/075 test using NFSv4.2 mount with SCSI layout. Fix this by replacing the XFS_BMAPI_ENTIRE flag with '0' so that xfs_bmapi_read() returns only the mapping for the requested range. Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET"). Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
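This minimal sketch shows why returning the whole extent produces overlaps. It assumes one extent and two requests for adjacent ranges inside it. `map_request()` is a hypothetical helper, not `xfs_bmapi_read()`. Its `entire` flag only mimics the effect of `XFS_BMAPI_ENTIRE`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open range [off, off + len). */
struct range {
	uint64_t off;
	uint64_t len;
};

/* Map a request against one extent; "entire" returns the whole extent. */
static struct range map_request(struct range ext, struct range req, bool entire)
{
	uint64_t ext_end = ext.off + ext.len;
	uint64_t req_end = req.off + req.len;
	uint64_t start, end;

	if (entire)
		return ext;	/* whole extent, regardless of the request */

	/* Clamp the mapping to the intersection of request and extent. */
	start = req.off > ext.off ? req.off : ext.off;
	end = req_end < ext_end ? req_end : ext_end;
	return (struct range){ .off = start, .len = end > start ? end - start : 0 };
}

static bool overlaps(struct range a, struct range b)
{
	return a.off < b.off + b.len && b.off < a.off + a.len;
}
```

With the whole-extent behaviour, both requests report the same extent, so the replies overlap. With clamping, adjacent requests yield adjacent, non-overlapping mappings.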
10 days | xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path | Dai Ngo
xfs_fs_map_blocks() acquires the data map lock and then calls xfs_bmapi_read(). If xfs_bmapi_read() fails, the function currently still falls through to xfs_bmbt_to_iomap(), which consumes an uninitialized imap record and may return invalid data to the caller. Fix this by releasing the data map lock and returning immediately when xfs_bmapi_read() reports an error. This prevents xfs_bmbt_to_iomap() from being called with an uninitialized xfs_bmbt_irec. Fixes: 527851124d10f ("xfs: implement pNFS export operations") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
10 days | xfs: handle racing deletions in xfs_zone_gc_iter_irec | Hans Holmberg
Under heavy garbage collection pressure from RocksDB workloads, filesystem shutdowns can occur in xfs_zone_gc_iter_irec when xfs_iget() returns -EINVAL for deleted files. Fix this by handling -EINVAL just like we handle -ENOENT, allowing zone GC to safely ignore stale mappings. Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-05-21 | xfs: fix a buffer lookup against removal race | Christoph Hellwig
When a buffer is freed either by LRU eviction or because it is unset, the lockref is marked as dead instantly, which prevents the buffer from being used after it is found in the buffer hash by xfs_buf_lookup and xfs_buf_find_insert. But the latter will then not add the new buffer to the hash because it already found an existing buffer. Fix this in two places: remove the buffer from the hash before marking the lockref dead so that no buffer with a dead lockref can be found in the hash, and, if xfs_buf_find_insert still finds one due to store reordering, handle this case correctly instead of returning an unhashed buffer. Fixes: 67fe4303972e ("xfs: don't keep a reference for buffers on the LRU") Reported-by: Andrey Albershteyn <aalbersh@redhat.com> Reported-by: Carlos Maiolino <cem@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-05-11 | xfs: Fix typo in comment | Md Shofiqul Islam
Fix spelling mistake in comment: - occured -> occurred Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Md Shofiqul Islam <shofiqtest@gmail.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-05-11 | xfs: fix the "limiting open zones" message | Christoph Hellwig
The xfs logging macros already include a newline, so remove the explicit \n, which adds an extra one. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Andrey Albershteyn <aalbersh@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-27 | xfs: flush delalloc blocks on ENOSPC in xfs_trans_alloc_icreate | Ravi Singh
xfs_trans_alloc_icreate() can fail with ENOSPC when delalloc reservations have consumed most of the available block count (fdblocks). xfs_trans_alloc() already retries internally with xfs_blockgc_flush_all(), but that only trims post-EOF speculative preallocation and may not free enough space for the transaction reservation. Add a retry with xfs_flush_inodes() when xfs_trans_alloc() returns ENOSPC. This forces writeback of all dirty inodes via sync_inodes_sb(), converting delalloc reservations to real allocations and freeing the over-reserved portion back to fdblocks. This fixes all callers of xfs_trans_alloc_icreate() and removes the existing caller-level retry from xfs_create(), which is now handled centrally. Signed-off-by: Ravi Singh <ravising@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
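The retry-once-after-flush logic can be sketched with a toy free-block counter. `struct toy_fs`, `trans_alloc()`, `flush_inodes()` and `trans_alloc_icreate()` are hypothetical models of the behaviour described above, not the kernel functions with similar names. Here `flush_inodes()` simply returns the delalloc over-reservation to the free count, which is the effect the commit attributes to writeback.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical toy model of the free space accounting. */
struct toy_fs {
	long fdblocks;		/* free blocks available for reservations */
	long delalloc_excess;	/* over-reserved delalloc blocks */
};

/* Reserve blocks for a transaction, or fail with -ENOSPC. */
static int trans_alloc(struct toy_fs *fs, long blocks)
{
	if (fs->fdblocks < blocks)
		return -ENOSPC;
	fs->fdblocks -= blocks;
	return 0;
}

/* Model of forced writeback: converting delalloc frees the excess. */
static void flush_inodes(struct toy_fs *fs)
{
	fs->fdblocks += fs->delalloc_excess;
	fs->delalloc_excess = 0;
}

/* Retry once after flushing when the first attempt hits ENOSPC. */
static int trans_alloc_icreate(struct toy_fs *fs, long blocks)
{
	int error = trans_alloc(fs, blocks);

	if (error == -ENOSPC) {
		flush_inodes(fs);
		error = trans_alloc(fs, blocks);
	}
	return error;
}
```

Centralizing the retry in one helper, as the commit does, means every caller gets it. Callers no longer carry their own loop.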
2026-04-27 | xfs: check da node block pad field during scrub | Yuto Ohnuki
The da node block header (xfs_da3_node_hdr) contains a __pad32 field that should always be zero. Add a check for this during directory and attribute btree scrubbing. Since old kernels may have written non-zero padding without issues, flag this as an optimization opportunity (preen) rather than corruption. Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-27 | xfs: fix memory leak for data allocated by xfs_zone_gc_data_alloc() | Wilfred Mallawa
In xfs_zone_gc_mount(), on error, a struct xfs_zone_gc_data allocated with xfs_zone_gc_data_alloc() is freed with kfree(); however, this doesn't free the underlying folios or the rmap_irecs. Use xfs_zone_gc_data_free() to correctly free this memory. Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") Cc: stable@vger.kernel.org # v6.15 Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
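A plain free of the container leaks whatever the container owns. This userspace sketch shows an allocator/destructor pair for such a container. `struct gc_data`, `gc_data_alloc()` and `gc_data_free()` are hypothetical and only loosely modelled on the GC data. The point is that error paths should call the matching free helper rather than free() on the outer struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container that owns nested allocations. */
struct gc_data {
	void **folios;		/* array of separately allocated buffers */
	int nr_folios;		/* how many entries of folios[] are allocated */
	void *rmap_irecs;	/* another owned allocation */
};

/* Frees the nested allocations first, then the container itself. */
static void gc_data_free(struct gc_data *d)
{
	if (!d)
		return;
	for (int i = 0; i < d->nr_folios; i++)
		free(d->folios[i]);
	free(d->folios);
	free(d->rmap_irecs);
	free(d);
}

static struct gc_data *gc_data_alloc(int nr)
{
	struct gc_data *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->folios = calloc(nr, sizeof(*d->folios));
	d->rmap_irecs = malloc(64);
	if (!d->folios || !d->rmap_irecs)
		goto fail;
	for (; d->nr_folios < nr; d->nr_folios++) {
		d->folios[d->nr_folios] = malloc(4096);
		if (!d->folios[d->nr_folios])
			goto fail;
	}
	return d;
fail:
	gc_data_free(d);	/* a bare free(d) here would leak the members */
	return NULL;
}
```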
2026-04-27 | xfs: fix memory leak on error in xfs_alloc_zone_info() | Wilfred Mallawa
Currently, the 0th index of the zi_used_bucket_bitmap array is not freed on error due to the pre-decrement then evaluate semantic of the while loop used in xfs_alloc_zone_info(). Fix it by allowing for the i == 0 case to be covered. Fixes: 080d01c41d44 ("xfs: implement zoned garbage collection") Cc: stable@vger.kernel.org # v6.15 Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
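The off-by-one is easy to reproduce in isolation. The commit message does not quote the exact loop, so this sketch assumes the unwind has the shape `while (--i > 0)`, where `i` is the number of entries allocated so far. With that shape, index 0 is never visited. `tracked_free()`, `unwind_buggy()` and `unwind_fixed()` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts frees so a test can see whether index 0 is released. */
static int nr_freed;

static void tracked_free(void *p)
{
	if (p)
		nr_freed++;
	free(p);
}

/* Assumed buggy shape: "--i > 0" stops before index 0 and leaks it. */
static void unwind_buggy(void **bufs, int i)
{
	while (--i > 0)
		tracked_free(bufs[i]);
}

/* Fixed shape: "--i >= 0" also visits index 0 (equivalently: while (i--)). */
static void unwind_fixed(void **bufs, int i)
{
	while (--i >= 0)
		tracked_free(bufs[i]);
}
```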
2026-04-27 | xfs: check directory data block header padding in scrub | Yuto Ohnuki
Add the missing scrub check for the pad field in directory data block headers. Old kernels may have written non-zero padding without issue, and the write path now self-heals stale padding on modification. Flag non-zero padding as an optimization opportunity (preen) rather than corruption. Add xchk_fblock_set_preen helper for reporting file fork block issues that could be optimized. The trace event xchk_fblock_preen already exists. Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
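This sketch shows the preen-versus-corrupt classification described in the two padding checks. It assumes a simplified header with a magic number and an explicit pad word. The flag names, struct layout and magic value are hypothetical, not the XFS scrub API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scrub outcome flags. */
enum scrub_flags {
	SCRUB_OK	= 0,
	SCRUB_PREEN	= 1 << 0,	/* could be optimized, not damage */
	SCRUB_CORRUPT	= 1 << 1,	/* real corruption */
};

/* Hypothetical on-disk header with an explicit padding word. */
struct data_hdr {
	uint32_t magic;
	uint32_t pad;
};

static unsigned int check_data_hdr(const struct data_hdr *hdr, uint32_t want_magic)
{
	unsigned int flags = SCRUB_OK;

	if (hdr->magic != want_magic)
		flags |= SCRUB_CORRUPT;
	/* Old kernels may have left garbage here: report it, but only as preen. */
	if (hdr->pad != 0)
		flags |= SCRUB_PREEN;
	return flags;
}
```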
2026-04-27 | xfs: zero directory data block padding on write verification | Yuto Ohnuki
Old kernels did not zero the pad field in xfs_dir3_data_hdr when initializing directory data blocks, so existing filesystems may have non-zero padding on disk. Zero the pad field in xfs_dir3_data_write_verify alongside the existing LSN and checksum updates. The pad field is pure alignment padding with no runtime meaning, so zeroing it during write verification is safe and has no additional I/O cost. This lets filesystems gradually self-heal stale non-zero padding as directories are modified, without requiring an explicit repair pass. Suggested-by: Dave Chinner <dgc@kernel.org> Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-27 | xfs: zero entire directory data block header region at init | Yuto Ohnuki
xfs_dir3_data_init currently zeroes only the xfs_dir3_blk_hdr portion of the directory data block header, then manually initializes the bestfree entries in a loop. This leaves the pad field in xfs_dir3_data_hdr uninitialized and requires explicit zeroing of each bestfree slot. Zero the entire header region (geo->data_entry_offset bytes) unconditionally before setting individual fields. This covers all current and future header fields, all padding (implicit and explicit), and the bestfree array, so the manual zeroing loop for bestfree can be removed. Suggested-by: Dave Chinner <dgc@kernel.org> Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
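The zero-first initialization can be sketched as follows. The struct layout is hypothetical. `hdr_size` plays the role of `geo->data_entry_offset` from the commit message. A single memset() replaces per-field zeroing of the pad and the bestfree slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bestfree {
	uint16_t offset;
	uint16_t length;
};

/* Hypothetical header: block header fields, an explicit pad, and bestfree[]. */
struct data_hdr {
	uint32_t magic;
	uint32_t pad;
	struct bestfree bf[3];
};

/*
 * hdr_size is the size of the whole header region, which may be larger
 * than the struct itself.
 */
static void data_init(void *block, size_t hdr_size, uint32_t magic)
{
	struct data_hdr *hdr = block;

	assert(hdr_size >= sizeof(*hdr));
	memset(block, 0, hdr_size);	/* pad, implicit padding and bf[] become 0 */
	hdr->magic = magic;		/* then set only the fields that need values */
}
```

Zeroing the whole region also covers header fields added later, which is the motivation given in the commit.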
2026-04-27 | xfs: remove the meaningless XFS_ALLOC_FLAG_FREEING | Jinliang Zheng
In xfs_refcount_finish_one(), there's no need to pass XFS_ALLOC_FLAG_FREEING to xfs_alloc_read_agf(). So remove it. Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-13 | Merge tag 'xfs-merge-7.1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux | Linus Torvalds
Pull xfs updates from Carlos Maiolino:
"There aren't any new features. The whole series is just a collection of bug fixes and code refactoring. Some new information was added (a couple of new tracepoints and new data in mountstats), but there are no big changes"
* tag 'xfs-merge-7.1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (41 commits)
  xfs: fix number of GC bvecs
  xfs: untangle the open zones reporting in mountinfo
  xfs: expose the number of open zones in sysfs
  xfs: reduce special casing for the open GC zone
  xfs: streamline GC zone selection
  xfs: refactor GC zone selection helpers
  xfs: rename xfs_zone_gc_iter_next to xfs_zone_gc_iter_irec
  xfs: put the open zone later xfs_open_zone_put
  xfs: add a separate tracepoint for stealing an open zone for GC
  xfs: delay initial open of the GC zone
  xfs: fix a resource leak in xfs_alloc_buftarg()
  xfs: handle too many open zones when mounting
  xfs: refactor xfs_mount_zones
  xfs: fix integer overflow in busy extent sort comparator
  xfs: fix integer overflow in deferred intent sort comparators
  xfs: fold xfs_setattr_size into xfs_vn_setattr_size
  xfs: remove a duplicate assert in xfs_setattr_size
  xfs: return default quota limits for IDs without a dquot
  xfs: start gc on zonegc_low_space attribute updates
  xfs: don't decrement the buffer LRU count for in-use buffers
  ...
2026-04-13 | Merge tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux | Linus Torvalds
Pull block updates from Jens Axboe:
 - Add shared memory zero-copy I/O support for ublk, bypassing per-I/O copies between kernel and userspace by matching registered buffer PFNs at I/O time. Includes selftests.
 - Refactor bio integrity to support filesystem initiated integrity operations and arbitrary buffer alignment.
 - Clean up bio allocation, splitting bio_alloc_bioset() into clear fast and slow paths. Add bio_await() and bio_submit_or_kill() helpers, unify synchronous bi_end_io callbacks.
 - Fix zone write plug refcount handling and plug removal races. Add support for serializing zone writes at QD=1 for rotational zoned devices, yielding significant throughput improvements.
 - Add SED-OPAL ioctls for Single User Mode management and a STACK_RESET command.
 - Add io_uring passthrough (uring_cmd) support to the BSG layer.
 - Replace pp_buf in partition scanning with struct seq_buf.
 - zloop improvements and cleanups.
 - drbd genl cleanup, switching to pre_doit/post_doit.
 - NVMe pull request via Keith:
    - Fabrics authentication updates
    - Enhanced block queue limits support
    - Workqueue usage updates
    - A new write zeroes device quirk
    - Tagset cleanup fix for loop device
 - MD pull requests via Yu Kuai:
    - Fix raid5 soft lockup in retry_aligned_read()
    - Fix raid10 deadlock with check operation and nowait requests
    - Fix raid1 overlapping writes on writemostly disks
    - Fix sysfs deadlock on array_state=clear
    - Proactive RAID-5 parity building with llbitmap, with write_zeroes_unmap optimization for initial sync
    - Fix llbitmap barrier ordering, rdev skipping, and bitmap_ops version mismatch fallback
    - Fix bcache use-after-free and uninitialized closure
    - Validate raid5 journal metadata payload size
    - Various cleanups
 - Various other fixes, improvements, and cleanups
* tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (146 commits)
  ublk: fix tautological comparison warning in ublk_ctrl_reg_buf
  scsi: bsg: fix buffer overflow in scsi_bsg_uring_cmd()
  block: refactor blkdev_zone_mgmt_ioctl
  MAINTAINERS: update ublk driver maintainer email
  Documentation: ublk: address review comments for SHMEM_ZC docs
  ublk: allow buffer registration before device is started
  ublk: replace xarray with IDA for shmem buffer index allocation
  ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
  ublk: verify all pages in multi-page bvec fall within registered range
  ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
  xfs: use bio_await in xfs_zone_gc_reset_sync
  block: add a bio_submit_or_kill helper
  block: factor out a bio_await helper
  block: unify the synchronous bi_end_io callbacks
  xfs: fix number of GC bvecs
  selftests/ublk: add read-only buffer registration test
  selftests/ublk: add filesystem fio verify test for shmem_zc
  selftests/ublk: add hugetlbfs shmem_zc test for loop target
  selftests/ublk: add shared memory zero-copy test
  selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target
  ...
2026-04-13 | Merge tag 'vfs-7.1-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs | Linus Torvalds
Pull vfs integrity updates from Christian Brauner:
"This adds support to generate and verify integrity information (aka T10 PI) in the file system, instead of the automatic below the covers support that is currently used. The implementation is based on refactoring the existing block layer PI code to be reusable for this use case, and then adding relatively small wrappers for the file system use case. These are then used in iomap to implement the semantics, and wired up in XFS with a small amount of glue code. Compared to the baseline this does not change performance for writes, but increases read performance up to 15% for 4k I/O, with the benefit decreasing with larger I/O sizes as even the baseline maxes out the device quickly on my older enterprise SSD"
* tag 'vfs-7.1-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  xfs: support T10 protection information
  iomap: support T10 protection information
  iomap: support ioends for buffered reads
  iomap: add a bioset pointer to iomap_read_folio_ops
  ntfs3: remove copy and pasted iomap code
  iomap: allow file systems to hook into buffered read bio submission
  iomap: only call into ->submit_read when there is a read_ctx
  iomap: pass the iomap_iter to ->submit_read
  iomap: refactor iomap_bio_read_folio_range
  block: pass a maxlen argument to bio_iov_iter_bounce
  block: add fs_bio_integrity helpers
  block: make max_integrity_io_size public
  block: prepare generation / verification helpers for fs usage
  block: add a bdev_has_integrity_csum helper
  block: factor out a bio_integrity_setup_default helper
  block: factor out a bio_integrity_action helper
2026-04-07 | xfs: use bio_await in xfs_zone_gc_reset_sync | Christoph Hellwig
Replace the open-coded bio wait logic with the new bio_await helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://patch.msgid.link/20260407140538.633364-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07 | xfs: fix number of GC bvecs | Christoph Hellwig
GC scratch allocations can wrap around and use the same buffer twice, and the current code fails to account for that. So far this worked due to rounding in the block layer, but changes to the bio allocator drop the over-provisioning and generic/256 or generic/361 will now usually fail when running against the current block tree. Simplify the allocation to always pass the maximum value that is easier to verify, as a saving of up to one bvec per allocation isn't worth the effort to verify a complicated calculated value. Fixes: 102f444b57b3 ("xfs: rework zone GC buffer management") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Link: https://patch.msgid.link/20260407140538.633364-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07 | xfs: fix number of GC bvecs | Christoph Hellwig
GC scratch allocations can wrap around and use the same buffer twice, and the current code fails to account for that. So far this worked due to rounding in the block layer, but changes to the bio allocator drop the over-provisioning and generic/256 or generic/361 will now usually fail when running against the current block tree. Simplify the allocation to always pass the maximum value that is easier to verify, as a saving of up to one bvec per allocation isn't worth the effort to verify a complicated calculated value. Fixes: 102f444b57b3 ("xfs: rework zone GC buffer management") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: untangle the open zones reporting in mountinfo | Christoph Hellwig
Keeping a value per line makes parsing much easier, so move the maximum number of open zones into a separate line, and also add a new line for the number of open GC zones. While that currently has to be either 0 or 1, having a value future-proofs the interface for adding more open GC zones if needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: expose the number of open zones in sysfs | Christoph Hellwig
Add a sysfs attribute for the current number of open zones so that it can be trivially read from userspace in monitoring or testing software. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: reduce special casing for the open GC zone | Christoph Hellwig
Currently the open zone used for garbage collection is a special snowflake, and it has been a bit annoying for some further zoned XFS work I've been doing. Remove the zi_open_gc_zone field and instead track the open GC zone in the zi_open_zones list together with the normal open zones, and keep an extra pointer and a reference to it in the GC thread's data structure. This means anything iterating over open zones just has to look at zi_open_zones, and the lifetime rules are consistent. It also helps to add support for multiple open GC zones if we ever need them, and removes a bit of code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: streamline GC zone selection | Christoph Hellwig
Currently, picking the GC target zone is a bit odd as it is done both in the main "can we start new GC cycles" routine and in the low-level block allocator for GC. This was mostly done to work around the rules for when code in a waitqueue wait loop can sleep. But with a trick to check if the process state has been set to running to discover if the wait loop has to be retried, all this becomes much simpler. We can select a GC zone just before writing, and bail out of starting new work if we can't find a usable zone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: refactor GC zone selection helpers | Christoph Hellwig
Merge xfs_zone_gc_ensure_target into xfs_zone_gc_select_target to keep all zone selection code together. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: rename xfs_zone_gc_iter_next to xfs_zone_gc_iter_irec | Christoph Hellwig
This function returns the current iterator position, which makes the _next postfix a bit misleading. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: put the open zone later xfs_open_zone_put | Christoph Hellwig
The open zone is what holds the rtg reference for us. This doesn't matter until we support shrinking, and even then is rather theoretical because we can't shrink away a just filled zone in a tiny race window, but let's play safe here. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: add a separate tracepoint for stealing an open zone for GC | Christoph Hellwig
The case where we have to reuse an already open zone warrants a different trace point vs the normal opening of a GC zone. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: delay initial open of the GC zone | Christoph Hellwig
The code currently used to select the new GC target zone when the previous one is full also handles the case where there is no current GC target zone at all. Make use of that to simplify the logic in xfs_zone_gc_mount. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: fix a resource leak in xfs_alloc_buftarg() | Haoxiang Li
In the error path, call fs_put_dax() to drop the DAX device reference. Fixes: 6f643c57d57c ("xfs: implement ->notify_failure() for XFS") Cc: stable@vger.kernel.org Signed-off-by: Haoxiang Li <lihaoxiang@isrc.iscas.ac.cn> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: handle too many open zones when mounting | Christoph Hellwig
When running on conventional zones or devices, the zoned allocator does not have a real write pointer, but instead fakes it up at mount time based on the last block recorded in the rmap. This can create spurious "open" zones when the last written blocks in a conventional zone are invalidated. Add a loop to the mount code to find the conventional zone with the highest used block in the rmap tree and "finish" it until we are below the open zones limit. While we're at it, also error out if there are too many open sequential zones, which can only happen when the user overrode the max open zones limit (or with really buggy hardware reducing the limit, but not much we can do about that). Fixes: 4e4d52075577 ("xfs: add the zoned space allocator") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
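The mount-time loop described above can be sketched with an in-memory zone table. `struct zone`, `count_open()` and `limit_open_zones()` are hypothetical. The sketch assumes "below the limit" means the open count must not exceed `max_open`. It returns -1 where the kernel would fail the mount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory view of a zone at mount time. */
struct zone {
	bool conventional;		/* no hardware write pointer */
	bool open;			/* partially written, counts against the limit */
	unsigned long highest_used;	/* last used block according to the rmap */
};

static unsigned int count_open(const struct zone *z, size_t n)
{
	unsigned int open = 0;

	for (size_t i = 0; i < n; i++)
		open += z[i].open;
	return open;
}

/*
 * Finish open conventional zones, highest used block first, until the open
 * count fits max_open. Fails if only sequential zones are left over the limit.
 */
static int limit_open_zones(struct zone *z, size_t n, unsigned int max_open)
{
	while (count_open(z, n) > max_open) {
		struct zone *victim = NULL;

		for (size_t i = 0; i < n; i++) {
			if (z[i].open && z[i].conventional &&
			    (!victim || z[i].highest_used > victim->highest_used))
				victim = &z[i];
		}
		if (!victim)
			return -1;	/* too many open sequential zones */
		victim->open = false;	/* "finish" the zone */
	}
	return 0;
}
```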
2026-04-07 | xfs: refactor xfs_mount_zones | Christoph Hellwig
xfs_mount_zones has grown a bit too big and unorganized. Split the zone reporting loop into a separate helper, hiding the rtg variable there. Print the mount message last, and also keep the VFS writeback chunk size last instead of in the middle of the logic to calculate the free/available blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: fix integer overflow in busy extent sort comparator | Yuto Ohnuki
xfs_extent_busy_ag_cmp() subtracts two uint32_t values (group numbers and block numbers) and returns the result as s32. When the difference exceeds INT_MAX, the result overflows and the sort order is corrupted. Use cmp_int() instead, as was done in commit 362c49098086 ("xfs: fix integer overflow in bmap intent sort comparator"). Fixes: 4a137e09151e ("xfs: keep a reference to the pag for busy extents") Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: fix integer overflow in deferred intent sort comparators | Yuto Ohnuki
xfs_extent_free_diff_items(), xfs_refcount_update_diff_items(), and xfs_rmap_update_diff_items() subtract two uint32_t group numbers and return the result as int, which can overflow when the difference exceeds INT_MAX. Use cmp_int() instead, as was done in commit 362c49098086 ("xfs: fix integer overflow in bmap intent sort comparator"). Fixes: c13418e8eb37 ("xfs: give xfs_rmap_intent its own perag reference") Fixes: f6b384631e1e ("xfs: give xfs_extfree_intent its own perag reference") Fixes: 00e7b3bac1dc ("xfs: give xfs_refcount_intent its own perag reference") Signed-off-by: Yuto Ohnuki <ytohnuki@amazon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
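The overflow and its fix can be shown in a few lines. The `cmp_int()` macro below mirrors the usual definition of the kernel helper. It is written out only to keep the sketch self-contained. `cmp_subtract()` is a hypothetical reproduction of the subtract-and-truncate pattern described in both commits.

```c
#include <assert.h>
#include <stdint.h>

/* Yields -1, 0 or 1 by comparing, never by subtracting. */
#define cmp_int(l, r) (((l) > (r)) - ((l) < (r)))

/* Buggy pattern: unsigned difference squeezed into a signed 32-bit result. */
static int32_t cmp_subtract(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b);
}

static int cmp_fixed(uint32_t a, uint32_t b)
{
	return cmp_int(a, b);
}
```

For `a = 0xFFFFFFF0` and `b = 1`, the difference exceeds INT_MAX. On the usual two's complement targets it wraps to a negative value, so the buggy comparator sorts the larger group number first.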
2026-04-07 | xfs: fold xfs_setattr_size into xfs_vn_setattr_size | Christoph Hellwig
xfs_vn_setattr_size is the only caller of xfs_setattr_size, so merge the two functions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-04-07 | xfs: remove a duplicate assert in xfs_setattr_size | Christoph Hellwig
There is already an assert at the beginning of the function that checks for uid and gid changes, among many other things. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-31 | xfs: return default quota limits for IDs without a dquot | Ravi Singh
When an ID has no dquot on disk, Q_XGETQUOTA returns -ENOENT even though default quota limits are configured and enforced against that ID. This means unprivileged users who have never used any resources cannot see the limits that apply to them. When xfs_qm_dqget() returns -ENOENT for a non-zero ID, return a zero-usage response with the default limits filled in from m_quotainfo rather than propagating the error. This is consistent with the enforcement behavior in xfs_qm_adjust_dqlimits(), which pushes the same default limits into a dquot when it is first allocated. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ravi Singh <ravising@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
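This sketch shows the fallback, assuming a lookup callback that reports -ENOENT for IDs without an on-disk dquot. `struct quota_report`, `get_quota()` and `lookup_missing()` are hypothetical and much simpler than the real quota structures. ID 0 is excluded because the commit applies the fallback only to non-zero IDs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified quota report. */
struct quota_report {
	uint64_t blk_softlimit;
	uint64_t blk_hardlimit;
	uint64_t blk_used;
};

/* Lookup callback: returns -ENOENT when the ID has no dquot on disk. */
typedef int (*dquot_lookup_fn)(uint32_t id, struct quota_report *out);

static int get_quota(uint32_t id, dquot_lookup_fn lookup,
		     const struct quota_report *defaults,
		     struct quota_report *out)
{
	int error = lookup(id, out);

	/* No dquot for a non-zero ID: report zero usage and the default limits. */
	if (error == -ENOENT && id != 0) {
		out->blk_softlimit = defaults->blk_softlimit;
		out->blk_hardlimit = defaults->blk_hardlimit;
		out->blk_used = 0;
		return 0;
	}
	return error;
}

/* Example lookup that never finds a dquot. */
static int lookup_missing(uint32_t id, struct quota_report *out)
{
	(void)id;
	(void)out;
	return -ENOENT;
}
```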
2026-03-30 | xfs: start gc on zonegc_low_space attribute updates | Hans Holmberg
Start gc if the agressiveness of zone garbage collection is changed by the user (if the file system is not read only). Without this change, the new setting will not be taken into account until the gc thread is woken up by e.g. a write. Cc: stable@vger.kernel.org # v6.15 Fixes: 845abeb1f06a8a ("xfs: add tunable threshold parameter for triggering zone GC") Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-30 | xfs: don't decrement the buffer LRU count for in-use buffers | Christoph Hellwig
XFS buffers are added to the LRU when they are unused, but are only removed from the LRU lazily when the LRU list scan finds a used buffer. So far this only happens when the LRU counter hits 0. That is suboptimal because buffers that were added to the LRU but are in use again still consume LRU scanning resources and are aged while actually in use. Fix this by checking for in-use buffers and removing them from the LRU before decrementing the LRU counter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-30 | xfs: switch (back) to a per-buftarg buffer hash | Christoph Hellwig
The per-AG buffer hashes were added when all buffer lookups took a per-hash lock. Since then we've made lookups entirely lockless and removed the need for a hash-wide lock for inserts and removals as well. With this there is no need to shard the hash, so reduce resource usage by using a per-buftarg hash for all buftargs.

Long after this was initially written, syzbot found a problem in the buffer cache teardown order. This change happens to fix that as well, because it does the entire buffer cache teardown in one place instead of splitting it between destroying the buftarg and the perag structures.

Link: https://lore.kernel.org/linux-xfs/aLeUdemAZ5wmtZel@dread.disaster.area/
Reported-by: syzbot+0391d34e801643e2809b@syzkaller.appspotmail.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: syzbot+0391d34e801643e2809b@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-30 | xfs: use a lockref for the buffer reference count | Christoph Hellwig
The lockref structure allows incrementing and decrementing counters like an atomic_t on the fast path, while still allowing complex slow-path operations as if the counter were protected by a lock. The only slow-path operations that actually need to take the lock are the final put, LRU evictions and marking a buffer stale.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-30 | xfs: don't keep a reference for buffers on the LRU | Christoph Hellwig
Currently the buffer cache adds a reference to b_hold for buffers that are on the LRU. This seems to go all the way back, and it allows releasing buffers from the LRU using xfs_buf_rele. But it makes xfs_buf_rele really complicated and differs from how other LRUs are implemented in Linux.

Switch to not having a reference for buffers in the LRU, and use a separate negative hold value to mark buffers as dead. This simplifies xfs_buf_rele, which now only deals with the last "real" reference, and prepares for using the lockref primitive.

This also removes the b_lock protection for removing buffers from the buffer hash. This is the desired outcome because the rhashtable is fully internally synchronized, and previously the lock was held mostly because of ordering constraints in xfs_buf_rele_cached.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-26 | Merge branch 'xfs-7.0-fixes' into for-next | Carlos Maiolino
Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-03-26 | xfs: remove file_path tracepoint data | Darrick J. Wong
The xfile/xmbuf shmem file descriptions are no longer as detailed as they were when online fsck was first merged. Moving to static strings in commit 60382993a2e180 ("xfs: get rid of the xchk_xfile_*_descr calls") removed a memory allocation and hence a source of failure. However, this makes encoding the description in the tracepoints something of a waste of memory.

David Laight also points out that file_path doesn't zero the whole buffer, which exposes stale trace bytes, and Steven Rostedt wonders why we're not using a dynamic array for the file path.

I don't think this is worth fixing, so let's just rip it out.

Cc: rostedt@goodmis.org
Cc: david.laight.linux@gmail.com
Link: https://lore.kernel.org/linux-xfs/20260323172204.work.979-kees@kernel.org/
Cc: stable@vger.kernel.org # v6.11
Fixes: 19ebc8f84ea12e ("xfs: fix file_path handling in tracepoints")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>