<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/bcachefs/error.c, branch v6.15</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>bcachefs: More informative error message when shutting down due to error</title>
<updated>2025-04-29T02:42:17+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-04-29T00:25:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e5a3b8cf3330a774e5f5f06a2b7cf20116447297'/>
<id>e5a3b8cf3330a774e5f5f06a2b7cf20116447297</id>
<content type='text'>
Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: Error ratelimiting is no longer only during fsck</title>
<updated>2025-04-20T23:41:38+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-04-18T02:44:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=417f01e726036b564e2e14c39b2be58e93bf7971'/>
<id>417f01e726036b564e2e14c39b2be58e93bf7971</id>
<content type='text'>
We now more often do repair automatically, without the user invoking
fsck - and sometimes that can involve fixing lots of errors, so let's
avoid flooding the dmesg log.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We now more often do repair automatically, without the user invoking
fsck - and sometimes that can involve fixing lots of errors, so let's
avoid flooding the dmesg log.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: use nonblocking variant of print_string_as_lines in error path</title>
<updated>2025-04-03T16:11:42+00:00</updated>
<author>
<name>Bharadwaj Raju</name>
<email>bharadwaj.raju777@gmail.com</email>
</author>
<published>2025-04-02T18:15:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=570f5050bb0739f24aeb94034d8ec134c450b4aa'/>
<id>570f5050bb0739f24aeb94034d8ec134c450b4aa</id>
<content type='text'>
The inconsistency error path calls print_string_as_lines, which calls
console_lock, which is a potentially-sleeping function and so can't be
called in an atomic context.

Replace calls to it with the nonblocking variant which is safe to call.

Signed-off-by: Bharadwaj Raju &lt;bharadwaj.raju777@gmail.com&gt;
Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The inconsistency error path calls print_string_as_lines, which calls
console_lock, which is a potentially-sleeping function and so can't be
called in an atomic context.

Replace calls to it with the nonblocking variant which is safe to call.

Signed-off-by: Bharadwaj Raju &lt;bharadwaj.raju777@gmail.com&gt;
Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: Fix scheduling while atomic from logging changes</title>
<updated>2025-04-03T16:11:42+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-04-01T17:01:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b2ffadcc7f8fd2059e389d640f9c81febd606daf'/>
<id>b2ffadcc7f8fd2059e389d640f9c81febd606daf</id>
<content type='text'>
Two fixes from the recent logging changes:

bch2_inconsistent(), bch2_fs_inconsistent() be called from interrupt
context, or with rcu_read_lock() held.

The one syzbot found is in
  bch2_bkey_pick_read_device
  bch2_dev_rcu
  bch2_fs_inconsistent

We're starting to switch to lift the printbufs up to higher levels so we
can emit better log messages and print them all in one go (avoid
garbling), so that conversion will help with spotting these in the
future; when we declare a printbuf it must be flagged if we're in an
atomic context.

Secondly, in btree_node_write_endio:

00085 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321
00085 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 618, name: bch-reclaim/fa6
00085 preempt_count: 10001, expected: 0
00085 RCU nest depth: 0, expected: 0
00085 4 locks held by bch-reclaim/fa6/618:
00085  #0: ffffff80d7ccad68 (&amp;j-&gt;reclaim_lock){+.+.}-{4:4}, at: bch2_journal_reclaim_thread+0x84/0x198
00085  #1: ffffff80d7c84218 (&amp;c-&gt;btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x1c0/0x440
00085  #2: ffffff80cd3f8140 (bcachefs_btree){+.+.}-{0:0}, at: __bch2_trans_get+0x22c/0x440
00085  #3: ffffff80c3823c20 (&amp;vblk-&gt;vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x58/0x130
00085 irq event stamp: 328
00085 hardirqs last  enabled at (327): [&lt;ffffffc080073a14&gt;] finish_task_switch.isra.0+0xbc/0x2a0
00085 hardirqs last disabled at (328): [&lt;ffffffc080971a10&gt;] el1_interrupt+0x20/0x60
00085 softirqs last  enabled at (0): [&lt;ffffffc08002f920&gt;] copy_process+0x7c8/0x2118
00085 softirqs last disabled at (0): [&lt;0000000000000000&gt;] 0x0
00085 Preemption disabled at:
00085 [&lt;ffffffc08003ada0&gt;] irq_enter_rcu+0x18/0x90
00085 CPU: 8 UID: 0 PID: 618 Comm: bch-reclaim/fa6 Not tainted 6.14.0-rc6-ktest-g04630bde23e8 #18798
00085 Hardware name: linux,dummy-virt (DT)
00085 Call trace:
00085  show_stack+0x1c/0x30 (C)
00085  dump_stack_lvl+0x84/0xc0
00085  dump_stack+0x14/0x20
00085  __might_resched+0x180/0x288
00085  __might_sleep+0x4c/0x88
00085  __kmalloc_node_track_caller_noprof+0x34c/0x3e0
00085  krealloc_noprof+0x1a0/0x2d8
00085  bch2_printbuf_make_room+0x9c/0x120
00085  bch2_prt_printf+0x60/0x1b8
00085  btree_node_write_endio+0x1b0/0x2d8
00085  bio_endio+0x138/0x1f0
00085  btree_node_write_endio+0xe8/0x2d8
00085  bio_endio+0x138/0x1f0
00085  blk_update_request+0x220/0x4c0
00085  blk_mq_end_request+0x28/0x148
00085  virtblk_request_done+0x64/0xe8
00085  blk_mq_complete_request+0x34/0x40
00085  virtblk_done+0x78/0x130
00085  vring_interrupt+0x6c/0xb0
00085  __handle_irq_event_percpu+0x8c/0x2e0
00085  handle_irq_event+0x50/0xb0
00085  handle_fasteoi_irq+0xc4/0x250
00085  handle_irq_desc+0x44/0x60
00085  generic_handle_domain_irq+0x20/0x30
00085  gic_handle_irq+0x54/0xc8
00085  call_on_irq_stack+0x24/0x40

Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Two fixes from the recent logging changes:

bch2_inconsistent(), bch2_fs_inconsistent() be called from interrupt
context, or with rcu_read_lock() held.

The one syzbot found is in
  bch2_bkey_pick_read_device
  bch2_dev_rcu
  bch2_fs_inconsistent

We're starting to switch to lift the printbufs up to higher levels so we
can emit better log messages and print them all in one go (avoid
garbling), so that conversion will help with spotting these in the
future; when we declare a printbuf it must be flagged if we're in an
atomic context.

Secondly, in btree_node_write_endio:

00085 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321
00085 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 618, name: bch-reclaim/fa6
00085 preempt_count: 10001, expected: 0
00085 RCU nest depth: 0, expected: 0
00085 4 locks held by bch-reclaim/fa6/618:
00085  #0: ffffff80d7ccad68 (&amp;j-&gt;reclaim_lock){+.+.}-{4:4}, at: bch2_journal_reclaim_thread+0x84/0x198
00085  #1: ffffff80d7c84218 (&amp;c-&gt;btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x1c0/0x440
00085  #2: ffffff80cd3f8140 (bcachefs_btree){+.+.}-{0:0}, at: __bch2_trans_get+0x22c/0x440
00085  #3: ffffff80c3823c20 (&amp;vblk-&gt;vqs[i].lock){-.-.}-{3:3}, at: virtblk_done+0x58/0x130
00085 irq event stamp: 328
00085 hardirqs last  enabled at (327): [&lt;ffffffc080073a14&gt;] finish_task_switch.isra.0+0xbc/0x2a0
00085 hardirqs last disabled at (328): [&lt;ffffffc080971a10&gt;] el1_interrupt+0x20/0x60
00085 softirqs last  enabled at (0): [&lt;ffffffc08002f920&gt;] copy_process+0x7c8/0x2118
00085 softirqs last disabled at (0): [&lt;0000000000000000&gt;] 0x0
00085 Preemption disabled at:
00085 [&lt;ffffffc08003ada0&gt;] irq_enter_rcu+0x18/0x90
00085 CPU: 8 UID: 0 PID: 618 Comm: bch-reclaim/fa6 Not tainted 6.14.0-rc6-ktest-g04630bde23e8 #18798
00085 Hardware name: linux,dummy-virt (DT)
00085 Call trace:
00085  show_stack+0x1c/0x30 (C)
00085  dump_stack_lvl+0x84/0xc0
00085  dump_stack+0x14/0x20
00085  __might_resched+0x180/0x288
00085  __might_sleep+0x4c/0x88
00085  __kmalloc_node_track_caller_noprof+0x34c/0x3e0
00085  krealloc_noprof+0x1a0/0x2d8
00085  bch2_printbuf_make_room+0x9c/0x120
00085  bch2_prt_printf+0x60/0x1b8
00085  btree_node_write_endio+0x1b0/0x2d8
00085  bio_endio+0x138/0x1f0
00085  btree_node_write_endio+0xe8/0x2d8
00085  bio_endio+0x138/0x1f0
00085  blk_update_request+0x220/0x4c0
00085  blk_mq_end_request+0x28/0x148
00085  virtblk_request_done+0x64/0xe8
00085  blk_mq_complete_request+0x34/0x40
00085  virtblk_done+0x78/0x130
00085  vring_interrupt+0x6c/0xb0
00085  __handle_irq_event_percpu+0x8c/0x2e0
00085  handle_irq_event+0x50/0xb0
00085  handle_fasteoi_irq+0xc4/0x250
00085  handle_irq_desc+0x44/0x60
00085  generic_handle_domain_irq+0x20/0x30
00085  gic_handle_irq+0x54/0xc8
00085  call_on_irq_stack+0x24/0x40

Reported-by: syzbot+c82cd2906e2f192410bb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: Better printing of inconsistency errors</title>
<updated>2025-03-29T17:26:13+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-03-26T14:41:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6d77ce4a273b319f6e9e8d2b6b2415a13bdea66d'/>
<id>6d77ce4a273b319f6e9e8d2b6b2415a13bdea66d</id>
<content type='text'>
Build up and emit the error message for an inconsistency error all at
once, instead of spread over multiple printk calls, so they're not
jumbled in the dmesg log.

Also, add better indenting.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Build up and emit the error message for an inconsistency error all at
once, instead of spread over multiple printk calls, so they're not
jumbled in the dmesg log.

Also, add better indenting.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: bch2_count_fsck_err()</title>
<updated>2025-03-29T17:26:13+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-03-28T16:15:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7337f9f14e0e2dbd2da50ade0cd7e58df6c7af6d'/>
<id>7337f9f14e0e2dbd2da50ade0cd7e58df6c7af6d</id>
<content type='text'>
Factor out a helper from __bch2_fsck_err(), for counting the error in
the superblock and deciding whether to print or ratelimit - will be used
to replace some log_fsck_err() calls, where we want to lift out printing
the error message.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Factor out a helper from __bch2_fsck_err(), for counting the error in
the superblock and deciding whether to print or ratelimit - will be used
to replace some log_fsck_err() calls, where we want to lift out printing
the error message.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: Better helpers for inconsistency errors</title>
<updated>2025-03-29T02:31:47+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-03-28T15:59:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b00750c2e5f09b90fdd370ff3f9581b880ad86fd'/>
<id>b00750c2e5f09b90fdd370ff3f9581b880ad86fd</id>
<content type='text'>
An inconsistency error often happens as part of an event with multiple
error messages, and we want to build up one single error message with
proper indenting to produce more readable log messages that don't get
garbled.

Add new helpers that emit messages to a printbuf instead of printing
them directly, next patch will convert to use them.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
An inconsistency error often happens as part of an event with multiple
error messages, and we want to build up one single error message with
proper indenting to produce more readable log messages that don't get
garbled.

Add new helpers that emit messages to a printbuf instead of printing
them directly, next patch will convert to use them.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: Consistent indentation of multiline fsck errors</title>
<updated>2025-03-29T02:31:47+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-03-26T17:21:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1ece53237e83edb12cb6c1a8b91f54735e64d95f'/>
<id>1ece53237e83edb12cb6c1a8b91f54735e64d95f</id>
<content type='text'>
Add the new helper printbuf_indent_add_nextline(), and use it in
__bch2_fsck_err() to centralize setting the indentation of multiline
fsck errors.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add the new helper printbuf_indent_add_nextline(), and use it in
__bch2_fsck_err() to centralize setting the indentation of multiline
fsck errors.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: fs-common.c -&gt; namei.c</title>
<updated>2025-03-24T13:50:36+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-03-20T14:53:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4fcd4de0a659a1b9b151d9f88e6ec67a6c05fba5'/>
<id>4fcd4de0a659a1b9b151d9f88e6ec67a6c05fba5</id>
<content type='text'>
name &lt;-&gt; inode, code for managing the relationships between inodes and
dirents.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
name &lt;-&gt; inode, code for managing the relationships between inodes and
dirents.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: Kick devices out after too many write IO errors</title>
<updated>2025-03-15T01:02:16+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-02-26T23:44:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=981e3801443f507d74e2dae5710452642c96e8e3'/>
<id>981e3801443f507d74e2dae5710452642c96e8e3</id>
<content type='text'>
We're improving our handling of write errors - we shouldn't write
degraded data just because a write failed once, we should retry it (on
other devices, if possible).

But for this to work, we need to kick devices out when they're only
returning errors - otherwise those retries will loop infinitely.

This adds a configurable timeout - if writes are failing for too long,
we'll set that device read-only.

In the future we should also implement more tracking and another knob
for an "allowed error rate", so that we can kick out drives that are
acting "unhealthy".

Another thing we'll want is a mechanism (likely in userspace) for
bringing a device back in after a transient error - perhaps a cable was
jiggled, or there was a controller reset.

After transient errors we also need a mechanism to walk (from the
journal) recent btree updates that weren't flushed to that device and
treat them as "degraded", since unflushed data may well not have been
written. Out of scope for this patch, but becoming relevant.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We're improving our handling of write errors - we shouldn't write
degraded data just because a write failed once, we should retry it (on
other devices, if possible).

But for this to work, we need to kick devices out when they're only
returning errors - otherwise those retries will loop infinitely.

This adds a configurable timeout - if writes are failing for too long,
we'll set that device read-only.

In the future we should also implement more tracking and another knob
for an "allowed error rate", so that we can kick out drives that are
acting "unhealthy".

Another thing we'll want is a mechanism (likely in userspace) for
bringing a device back in after a transient error - perhaps a cable was
jiggled, or there was a controller reset.

After transient errors we also need a mechanism to walk (from the
journal) recent btree updates that weren't flushed to that device and
treat them as "degraded", since unflushed data may well not have been
written. Out of scope for this patch, but becoming relevant.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
</feed>
