diff options
| author | Filipe Manana <fdmanana@suse.com> | 2026-03-23 15:50:13 +0000 |
|---|---|---|
| committer | David Sterba <dsterba@suse.com> | 2026-04-07 18:56:06 +0200 |
| commit | b48c980b6a7e409050bb3067165db31cc6205e3e (patch) | |
| tree | c740c3c27ebcfca852750c110ca8d995233ab9ad /rust/kernel/gpu/git@git.tavy.me:linux-stable.git | |
| parent | 18addf9ec82f857bb36891bb5d93cc4a82281360 (diff) | |
btrfs: fix deadlock between reflink and transaction commit when using flushoncommit
When using the flushoncommit mount option, we can have a deadlock between
a transaction commit and a reflink operation that copied an inline extent
to an offset beyond the current i_size of the destination node.
The deadlock happens like this:
1) Task A clones an inline extent from inode X to an offset of inode Y
that is beyond Y's current i_size. This means we copied the inline
extent's data to a folio of inode Y that is beyond its EOF, using a
call to copy_inline_to_page();
2) Task B starts a transaction commit and calls
btrfs_start_delalloc_flush() to flush delalloc;
3) The delalloc flushing sees the new dirty folio of inode Y and when it
attempts to flush it, it ends up at extent_writepage() and sees that
the offset of the folio is beyond the i_size of inode Y, so it attempts
to invalidate the folio by calling folio_invalidate(), which ends up at
btrfs' folio invalidate callback - btrfs_invalidate_folio(). There it
tries to lock the folio's range in inode Y's extent io tree, but it
blocks since it's currently locked by task A - during a reflink we lock
the inodes and the source and destination ranges after flushing all
delalloc and waiting for ordered extent completion - after that we
don't expect to have dirty folios in the ranges, the exception is if
we have to copy an inline extent's data (because the destination offset
is not zero);
4) Task A then attempts to start a transaction to update the inode item,
and then it's blocked since the current transaction is in the
TRANS_STATE_COMMIT_START state. Therefore task A has to wait for the
current transaction to become unblocked (its state >=
TRANS_STATE_UNBLOCKED).
So task A is waiting for the transaction commit done by task B, and
the later waiting on the extent lock of inode Y that is currently
held by task A.
Syzbot recently reported this with the following stack traces:
INFO: task kworker/u8:7:1053 blocked for more than 143 seconds.
Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:7 state:D stack:23520 pid:1053 tgid:1053 ppid:2 task_flags:0x4208060 flags:0x00080000
Workqueue: writeback wb_workfn (flush-btrfs-46)
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5298 [inline]
__schedule+0x1553/0x5240 kernel/sched/core.c:6911
__schedule_loop kernel/sched/core.c:6993 [inline]
schedule+0x164/0x360 kernel/sched/core.c:7008
wait_extent_bit fs/btrfs/extent-io-tree.c:811 [inline]
btrfs_lock_extent_bits+0x59c/0x700 fs/btrfs/extent-io-tree.c:1914
btrfs_lock_extent fs/btrfs/extent-io-tree.h:152 [inline]
btrfs_invalidate_folio+0x43d/0xc40 fs/btrfs/inode.c:7704
extent_writepage fs/btrfs/extent_io.c:1852 [inline]
extent_write_cache_pages fs/btrfs/extent_io.c:2580 [inline]
btrfs_writepages+0x12ff/0x2440 fs/btrfs/extent_io.c:2713
do_writepages+0x32e/0x550 mm/page-writeback.c:2554
__writeback_single_inode+0x133/0x11a0 fs/fs-writeback.c:1750
writeback_sb_inodes+0x995/0x19d0 fs/fs-writeback.c:2042
wb_writeback+0x456/0xb70 fs/fs-writeback.c:2227
wb_do_writeback fs/fs-writeback.c:2374 [inline]
wb_workfn+0x41a/0xf60 fs/fs-writeback.c:2414
process_one_work kernel/workqueue.c:3276 [inline]
process_scheduled_works+0xb6e/0x18c0 kernel/workqueue.c:3359
worker_thread+0xa53/0xfc0 kernel/workqueue.c:3440
kthread+0x388/0x470 kernel/kthread.c:436
ret_from_fork+0x51e/0xb90 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
INFO: task syz.4.64:6910 blocked for more than 143 seconds.
Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz.4.64 state:D stack:22752 pid:6910 tgid:6905 ppid:5944 task_flags:0x400140 flags:0x00080002
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5298 [inline]
__schedule+0x1553/0x5240 kernel/sched/core.c:6911
__schedule_loop kernel/sched/core.c:6993 [inline]
schedule+0x164/0x360 kernel/sched/core.c:7008
wait_current_trans+0x39f/0x590 fs/btrfs/transaction.c:535
start_transaction+0x6a7/0x1650 fs/btrfs/transaction.c:705
clone_copy_inline_extent fs/btrfs/reflink.c:299 [inline]
btrfs_clone+0x128a/0x24d0 fs/btrfs/reflink.c:529
btrfs_clone_files+0x271/0x3f0 fs/btrfs/reflink.c:750
btrfs_remap_file_range+0x76b/0x1320 fs/btrfs/reflink.c:903
vfs_copy_file_range+0xda7/0x1390 fs/read_write.c:1600
__do_sys_copy_file_range fs/read_write.c:1683 [inline]
__se_sys_copy_file_range+0x2fb/0x480 fs/read_write.c:1650
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5f73afc799
RSP: 002b:00007f5f7315e028 EFLAGS: 00000246 ORIG_RAX: 0000000000000146
RAX: ffffffffffffffda RBX: 00007f5f73d75fa0 RCX: 00007f5f73afc799
RDX: 0000000000000005 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007f5f73b92c99 R08: 0000000000000863 R09: 0000000000000000
R10: 00002000000000c0 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5f73d76038 R14: 00007f5f73d75fa0 R15: 00007fff138a5068
</TASK>
INFO: task syz.4.64:6975 blocked for more than 143 seconds.
Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz.4.64 state:D stack:24736 pid:6975 tgid:6905 ppid:5944 task_flags:0x400040 flags:0x00080002
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5298 [inline]
__schedule+0x1553/0x5240 kernel/sched/core.c:6911
__schedule_loop kernel/sched/core.c:6993 [inline]
schedule+0x164/0x360 kernel/sched/core.c:7008
wb_wait_for_completion+0x3e8/0x790 fs/fs-writeback.c:227
__writeback_inodes_sb_nr+0x24c/0x2d0 fs/fs-writeback.c:2838
try_to_writeback_inodes_sb+0x9a/0xc0 fs/fs-writeback.c:2886
btrfs_start_delalloc_flush fs/btrfs/transaction.c:2175 [inline]
btrfs_commit_transaction+0x82e/0x31a0 fs/btrfs/transaction.c:2364
btrfs_ioctl+0xca7/0xd00 fs/btrfs/ioctl.c:5206
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:597 [inline]
__se_sys_ioctl+0xff/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5f73afc799
RSP: 002b:00007f5f7313d028 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007f5f73d76090 RCX: 00007f5f73afc799
RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000004
RBP: 00007f5f73b92c99 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5f73d76128 R14: 00007f5f73d76090 R15: 00007fff138a5068
</TASK>
Fix this by updating the i_size of the destination inode of a reflink
operation after we copy an inline extent's data to an offset beyond the
i_size and before attempting to start a transaction to update the inode's
item.
Reported-by: syzbot+63056bf627663701bbbf@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/69bba3fe.050a0220.227207.002f.GAE@google.com/
Fixes: 05a5a7621ce6 ("Btrfs: implement full reflink support for inline extents")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Diffstat (limited to 'rust/kernel/gpu/git@git.tavy.me:linux-stable.git')
0 files changed, 0 insertions, 0 deletions
