linux.git - Linux kernel source tree

author	Qu Wenruo <wqu@suse.com>	2026-06-04 09:59:47 +0930
committer	Johannes Thumshirn <johannes.thumshirn@wdc.com>	2026-06-09 18:22:46 +0200
commit	ff66fe6662330226b3f486014c375538d91c44aa (patch)
tree	23ab758eda3e1dac4b9427a0c6dd5bd0629f0200 /include/linux/debugobjects.h
parent	66ff4d366e7eb4d31813d2acabf3af512ce03aa5 (diff)

btrfs: fix incorrect buffered IO fallback for append direct writes

[BUG]
With the previous bug of short direct writes fixed, test case
generic/362 (*) still fails with the following error when the nodatasum
mount option is used:

  generic/362 0s ... - output mismatch (see /home/adam/xfstests/results//generic/362.out.bad)
      --- tests/generic/362.out	2024-08-24 15:31:37.200000000 +0930
      +++ /home/adam/xfstests/results//generic/362.out.bad	2026-05-27 10:13:09.072485767 +0930
      @@ -1,2 +1,3 @@
       QA output created by 362
      +Wrong file size after first write, got 8192 expected 4096
       Silence is golden
      ...

*: If the test case has been executed before with the default data
   checksum, the failure will not reproduce. The following fix is needed
   to make it reliably reproducible:

   https://lore.kernel.org/linux-btrfs/20260528111659.87113-1-wqu@suse.com/

[CAUSE]
Inside btrfs_dio_iomap_begin() for a direct write, we increase the isize
if the write end is beyond the current isize.

But if the direct IO finishes short, we revert the isize neither to the
previous value nor to the short write end.

Then if we need to fall back to buffered writes, and the write has the
IOCB_APPEND flag, the buffered write will be positioned at the
incorrect isize.

The call chain looks like this:

 btrfs_direct_write(pos=0, length=4K)
 |- __iomap_dio_rw()
 |  |- iomap_iter()
 |  |  |- btrfs_dio_iomap_begin()
 |  |     |- btrfs_get_blocks_direct_write()
 |  |        |- i_size_write()
 |  |           Which updates the isize to the write end (4K).
 |  |
 |  |- iomap_dio_iter()
 |  |  Failed with -EFAULT on the first page.
 |  |
 |  |- iomap_iter()
 |  |  |- btrfs_dio_iomap_end()
 |  |     Detects a short write, returns -ENOTBLK
 |  |- if (ret == -ENOTBLK) { ret = 0;}
 |     Which resets the return value.
 |
 |- ret = iomap_dio_complete()
 |  Which returns 0.
 |
 |- btrfs_buffered_write(iocb, from);
    |- generic_write_checks()
       |- iocb->ki_pos = i_size_read()
          Which is still the new size (4K), rather than the original
          isize 0.

[FIX]
Introduce the following btrfs_dio_data members:

- old_isize
- updated_isize
  Set if the direct write has enlarged the isize.

Then if we get a short write and btrfs_dio_data::updated_isize is set,
revert to the correct isize based on old_isize and the current file
position.

Here we call i_size_write() without holding an extent lock, which is a
very special case where it is safe to do so:

- Only a single writer can be enlarging isize
  Enlarging isize will take the exclusive inode lock.

- Buffered readers need to wait for the ordered extent (OE) we're holding
  Buffered readers will lock the extent and wait for the OE of the folio
  range.
  Sometimes we can skip the OE wait, but since all page cache is
  invalidated, the OE wait can not be skipped.

But I do not think this is the most elegant solution, nor does it cover
all cases. E.g. if the bio is submitted but the IO failed, we are unable
to do the revert.

I believe the more elegant one would be to extend the EXTENT_DIO_LOCKED
lifespan for direct writes, so that we can update the isize when a write
beyond EOF has finished successfully.

However that change is too huge for a small bug fix. So only implement
the minimal partial fix for now.

[REASON FOR NO FIXES TAG]
The bug is again very old; before commit f85781fb505e ("btrfs: switch to
iomap for direct IO") we were already increasing isize without a proper
rollback for short writes.

Thus only a CC to stable.

CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
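
For illustration only, the interaction described in [CAUSE] and [FIX] can be modelled in userspace C. This is a sketch under stated assumptions and not the kernel patch: every `model_*` name is hypothetical, the struct merely mirrors the two members named above, and the revert rule (new isize = max(old_isize, pos + bytes actually written)) is an assumption about what "based on old_isize and current file position" means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; only the file size is modelled. */
struct model_inode {
	uint64_t isize;
};

/* Mirrors the two members named in the commit message (layout assumed). */
struct model_dio_data {
	uint64_t old_isize;
	bool updated_isize;
};

/* Models the iomap_begin step: isize is enlarged before any data is written. */
static void model_dio_begin(struct model_inode *inode, struct model_dio_data *dd,
			    uint64_t pos, uint64_t length)
{
	dd->old_isize = inode->isize;
	dd->updated_isize = false;
	if (pos + length > inode->isize) {
		inode->isize = pos + length;
		dd->updated_isize = true;
	}
}

/*
 * Models the iomap_end step with the fix applied: on a short write that
 * had enlarged isize, fall back to max(old_isize, pos + written).
 * The max() rule is an assumption made for this sketch.
 */
static void model_dio_end(struct model_inode *inode, const struct model_dio_data *dd,
			  uint64_t pos, uint64_t length, uint64_t written)
{
	uint64_t end = pos + written;

	assert(written <= length);
	if (written == length || !dd->updated_isize)
		return;
	inode->isize = end > dd->old_isize ? end : dd->old_isize;
}

/*
 * Runs one append write on an empty file: a direct write that completes
 * only dio_written bytes, followed by a buffered fallback for the rest.
 * Returns the final file size.
 */
static uint64_t model_append_write(bool with_fix, uint64_t length, uint64_t dio_written)
{
	struct model_inode inode = { .isize = 0 };
	struct model_dio_data dd;
	uint64_t pos = inode.isize;	/* append: start at the current isize */

	model_dio_begin(&inode, &dd, pos, length);
	if (with_fix)
		model_dio_end(&inode, &dd, pos, length, dio_written);

	/* Buffered fallback with append semantics repositions to the current isize. */
	uint64_t fallback_pos = inode.isize;
	uint64_t fallback_end = fallback_pos + (length - dio_written);

	if (fallback_end > inode.isize)
		inode.isize = fallback_end;
	return inode.isize;
}
```

In this model, `model_append_write(false, 4096, 0)` ends with a file size of 8192, matching the "got 8192 expected 4096" failure quoted in [BUG], while `model_append_write(true, 4096, 0)` ends with 4096.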

