<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/bcachefs/error.h, branch v6.15</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>bcachefs: Error ratelimiting is no longer only during fsck</title>
<updated>2025-04-20T23:41:38+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-04-18T02:44:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=417f01e726036b564e2e14c39b2be58e93bf7971'/>
<id>417f01e726036b564e2e14c39b2be58e93bf7971</id>
<content type='text'>
We now more often do repair automatically, without the user invoking
fsck - and sometimes that can involve fixing lots of errors, so let's
avoid flooding the dmesg log.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We now more often do repair automatically, without the user invoking
fsck - and sometimes that can involve fixing lots of errors, so let's
avoid flooding the dmesg log.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: Better printing of inconsistency errors</title>
<updated>2025-03-29T17:26:13+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-03-26T14:41:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6d77ce4a273b319f6e9e8d2b6b2415a13bdea66d'/>
<id>6d77ce4a273b319f6e9e8d2b6b2415a13bdea66d</id>
<content type='text'>
Build up and emit the error message for an inconsistency error all at
once, instead of spread over multiple printk calls, so they're not
jumbled in the dmesg log.

Also, add better indenting.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Build up and emit the error message for an inconsistency error all at
once, instead of spread over multiple printk calls, so they're not
jumbled in the dmesg log.

Also, add better indenting.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: bch2_count_fsck_err()</title>
<updated>2025-03-29T17:26:13+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-03-28T16:15:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7337f9f14e0e2dbd2da50ade0cd7e58df6c7af6d'/>
<id>7337f9f14e0e2dbd2da50ade0cd7e58df6c7af6d</id>
<content type='text'>
Factor out a helper from __bch2_fsck_err(), for counting the error in
the superblock and deciding whether to print or ratelimit - will be used
to replace some log_fsck_err() calls, where we want to lift out printing
the error message.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Factor out a helper from __bch2_fsck_err(), for counting the error in
the superblock and deciding whether to print or ratelimit - will be used
to replace some log_fsck_err() calls, where we want to lift out printing
the error message.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: Better helpers for inconsistency errors</title>
<updated>2025-03-29T02:31:47+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-03-28T15:59:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b00750c2e5f09b90fdd370ff3f9581b880ad86fd'/>
<id>b00750c2e5f09b90fdd370ff3f9581b880ad86fd</id>
<content type='text'>
An inconsistency error often happens as part of an event with multiple
error messages, and we want to build up one single error message with
proper indenting to produce more readable log messages that don't get
garbled.

Add new helpers that emit messages to a printbuf instead of printing
them directly, next patch will convert to use them.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
An inconsistency error often happens as part of an event with multiple
error messages, and we want to build up one single error message with
proper indenting to produce more readable log messages that don't get
garbled.

Add new helpers that emit messages to a printbuf instead of printing
them directly, next patch will convert to use them.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: Kick devices out after too many write IO errors</title>
<updated>2025-03-15T01:02:16+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-02-26T23:44:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=981e3801443f507d74e2dae5710452642c96e8e3'/>
<id>981e3801443f507d74e2dae5710452642c96e8e3</id>
<content type='text'>
We're improving our handling of write errors - we shouldn't write
degraded data just because a write failed once, we should retry it (on
other devices, if possible).

But for this to work, we need to kick devices out when they're only
returning errors - otherwise those retries will loop infinitely.

This adds a configurable timeout - if writes are failing for too long,
we'll set that device read-only.

In the future we should also implement more tracking and another knob
for an "allowed error rate", so that we can kick out drives that are
acting "unhealthy".

Another thing we'll want is a mechanism (likely in userspace) for
bringing a device back in after a transient error - perhaps a cable was
jiggled, or there was a controller reset.

After transient errors we also need a mechanism to walk (from the
journal) recent btree updates that weren't flushed to that device and
treat them as "degraded", since unflushed data may well not have been
written. Out of scope for this patch, but becoming relevant.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We're improving our handling of write errors - we shouldn't write
degraded data just because a write failed once, we should retry it (on
other devices, if possible).

But for this to work, we need to kick devices out when they're only
returning errors - otherwise those retries will loop infinitely.

This adds a configurable timeout - if writes are failing for too long,
we'll set that device read-only.

In the future we should also implement more tracking and another knob
for an "allowed error rate", so that we can kick out drives that are
acting "unhealthy".

Another thing we'll want is a mechanism (likely in userspace) for
bringing a device back in after a transient error - perhaps a cable was
jiggled, or there was a controller reset.

After transient errors we also need a mechanism to walk (from the
journal) recent btree updates that weren't flushed to that device and
treat them as "degraded", since unflushed data may well not have been
written. Out of scope for this patch, but becoming relevant.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: Finish bch2_account_io_completion() conversions</title>
<updated>2025-03-15T01:02:16+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-02-28T19:38:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b31c070407edda710ad087d143353ddd0f2c9499'/>
<id>b31c070407edda710ad087d143353ddd0f2c9499</id>
<content type='text'>
More prep work for automatically kicking devices out after too many IO
errors.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
More prep work for automatically kicking devices out after too many IO
errors.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: bch2_account_io_completion()</title>
<updated>2025-03-15T01:02:16+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-02-28T19:07:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3526bca36b31731eb468cddaa3ad0f6ebc5d7520'/>
<id>3526bca36b31731eb468cddaa3ad0f6ebc5d7520</id>
<content type='text'>
We need to start accounting successes for every IO, not just failures,
so introduce a unified hook for io completion accounting and convert
io_read.c.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We need to start accounting successes for every IO, not just failures,
so introduce a unified hook for io completion accounting and convert
io_read.c.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: bch2_write_op_error() now prints info about data update</title>
<updated>2025-03-15T01:02:14+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-02-10T22:04:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1ccbcd320577271c85d9a5bfbdd3394cb9baadb3'/>
<id>1ccbcd320577271c85d9a5bfbdd3394cb9baadb3</id>
<content type='text'>
A user has been seeing the "error verifying existing checksum while
rewriting existing data (memory corruption?)" error.

This generally indicates a hardware issue (and that may be the case
here), but it might also indicate a bug, in which case we need more
information to look for patterns.

Reported-by: Roland Vet &lt;vet.roland@protonmail.com&gt;
Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A user has been seeing the "error verifying existing checksum while
rewriting existing data (memory corruption?)" error.

This generally indicates a hardware issue (and that may be the case
here), but it might also indicate a bug, in which case we need more
information to look for patterns.

Reported-by: Roland Vet &lt;vet.roland@protonmail.com&gt;
Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: bch2_inum_offset_err_msg_trans() no longer handles transaction restarts</title>
<updated>2025-03-15T01:02:12+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-02-07T18:37:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=06284963e3d86c5e0cf56982c2d947eeb0f30871'/>
<id>06284963e3d86c5e0cf56982c2d947eeb0f30871</id>
<content type='text'>
we're starting to use error messages with paths in fsck_errors(), where
we do not want nested transaction restart handling, so let's prepare for
that.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
we're starting to use error messages with paths in fsck_errors(), where
we do not want nested transaction restart handling, so let's prepare for
that.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bcachefs: bch2_indirect_extent_missing_error() prints path, not just inode number</title>
<updated>2025-03-15T01:02:12+00:00</updated>
<author>
<name>Kent Overstreet</name>
<email>kent.overstreet@linux.dev</email>
</author>
<published>2025-02-07T06:33:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=45f0e6c838e5d9af3f013adb4ba9aad3bcbcbe3b'/>
<id>45f0e6c838e5d9af3f013adb4ba9aad3bcbcbe3b</id>
<content type='text'>
We want all error messages converted to print paths, not just inode
numbers - users want this information, and it speeds up debugging too.

Auditing and converting all error messages is going to be a big project,
so for the moment we're just doing this incrementally.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We want all error messages converted to print paths, not just inode
numbers - users want this information, and it speeds up debugging too.

Auditing and converting all error messages is going to be a big project,
so for the moment we're just doing this incrementally.

Signed-off-by: Kent Overstreet &lt;kent.overstreet@linux.dev&gt;
</pre>
</div>
</content>
</entry>
</feed>
