<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/erofs, branch v6.13</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>erofs: use buffered I/O for file-backed mounts by default</title>
<updated>2024-12-16T13:02:07+00:00</updated>
<author>
<name>Gao Xiang</name>
<email>hsiangkao@linux.alibaba.com</email>
</author>
<published>2024-12-12T13:43:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6422cde1b0d5a31b206b263417c1c2b3c80fe82c'/>
<id>6422cde1b0d5a31b206b263417c1c2b3c80fe82c</id>
<content type='text'>
For many use cases (e.g. container images are just fetched from remote),
performance will be impacted if underlay page cache is up-to-date but
direct i/o flushes dirty pages first.

Instead, let's use buffered I/O by default to keep in sync with loop
devices and add a (re)mount option to explicitly give a try to use
direct I/O if supported by the underlying files.

The container startup time is improved as below:
[workload] docker.io/library/workpress:latest
                                     unpack        1st run  non-1st runs
EROFS snapshotter buffered I/O file  4.586404265s  0.308s   0.198s
EROFS snapshotter direct I/O file    4.581742849s  2.238s   0.222s
EROFS snapshotter loop               4.596023152s  0.346s   0.201s
Overlayfs snapshotter                5.382851037s  0.206s   0.214s

Fixes: fb176750266a ("erofs: add file-backed mount support")
Cc: Derek McGowan &lt;derek@mcg.dev&gt;
Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241212134336.2059899-1-hsiangkao@linux.alibaba.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For many use cases (e.g. container images are just fetched from remote),
performance will be impacted if underlay page cache is up-to-date but
direct i/o flushes dirty pages first.

Instead, let's use buffered I/O by default to keep in sync with loop
devices and add a (re)mount option to explicitly give a try to use
direct I/O if supported by the underlying files.

The container startup time is improved as below:
[workload] docker.io/library/workpress:latest
                                     unpack        1st run  non-1st runs
EROFS snapshotter buffered I/O file  4.586404265s  0.308s   0.198s
EROFS snapshotter direct I/O file    4.581742849s  2.238s   0.222s
EROFS snapshotter loop               4.596023152s  0.346s   0.201s
Overlayfs snapshotter                5.382851037s  0.206s   0.214s

Fixes: fb176750266a ("erofs: add file-backed mount support")
Cc: Derek McGowan &lt;derek@mcg.dev&gt;
Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241212134336.2059899-1-hsiangkao@linux.alibaba.com
</pre>
</div>
</content>
</entry>
<entry>
<title>erofs: reference `struct erofs_device_info` for erofs_map_dev</title>
<updated>2024-12-16T13:02:06+00:00</updated>
<author>
<name>Gao Xiang</name>
<email>hsiangkao@linux.alibaba.com</email>
</author>
<published>2024-12-12T23:54:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f8d920a402aec3482931cb5f1539ed438740fc49'/>
<id>f8d920a402aec3482931cb5f1539ed438740fc49</id>
<content type='text'>
Record `m_sb` and `m_dif` to replace `m_fscache`, `m_daxdev`, `m_fp`
and `m_dax_part_off` in order to simplify the codebase.

Note that `m_bdev` is still left since it can be assigned from
`sb-&gt;s_bdev` directly.

Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241212235401.2857246-1-hsiangkao@linux.alibaba.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Record `m_sb` and `m_dif` to replace `m_fscache`, `m_daxdev`, `m_fp`
and `m_dax_part_off` in order to simplify the codebase.

Note that `m_bdev` is still left since it can be assigned from
`sb-&gt;s_bdev` directly.

Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241212235401.2857246-1-hsiangkao@linux.alibaba.com
</pre>
</div>
</content>
</entry>
<entry>
<title>erofs: use `struct erofs_device_info` for the primary device</title>
<updated>2024-12-16T13:01:59+00:00</updated>
<author>
<name>Gao Xiang</name>
<email>hsiangkao@linux.alibaba.com</email>
</author>
<published>2024-12-16T12:53:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7b00af2c5414dc01e0718deef7ead81102867636'/>
<id>7b00af2c5414dc01e0718deef7ead81102867636</id>
<content type='text'>
Instead of just listing each one directly in `struct erofs_sb_info`
except that we still use `sb-&gt;s_bdev` for the primary block device.

Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241216125310.930933-2-hsiangkao@linux.alibaba.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Instead of just listing each one directly in `struct erofs_sb_info`
except that we still use `sb-&gt;s_bdev` for the primary block device.

Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241216125310.930933-2-hsiangkao@linux.alibaba.com
</pre>
</div>
</content>
</entry>
<entry>
<title>erofs: add erofs_sb_free() helper</title>
<updated>2024-12-12T16:26:27+00:00</updated>
<author>
<name>Gao Xiang</name>
<email>hsiangkao@linux.alibaba.com</email>
</author>
<published>2024-12-12T13:35:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e2de3c1bf6a0c99b089bd706a62da8f988918858'/>
<id>e2de3c1bf6a0c99b089bd706a62da8f988918858</id>
<content type='text'>
Unify the common parts of erofs_fc_free() and erofs_kill_sb() as
erofs_sb_free().

Thus, fput() in erofs_fc_get_tree() is no longer needed, too.

Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241212133504.2047178-1-hsiangkao@linux.alibaba.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Unify the common parts of erofs_fc_free() and erofs_kill_sb() as
erofs_sb_free().

Thus, fput() in erofs_fc_get_tree() is no longer needed, too.

Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241212133504.2047178-1-hsiangkao@linux.alibaba.com
</pre>
</div>
</content>
</entry>
<entry>
<title>erofs: fix PSI memstall accounting</title>
<updated>2024-12-12T16:24:40+00:00</updated>
<author>
<name>Gao Xiang</name>
<email>hsiangkao@linux.alibaba.com</email>
</author>
<published>2024-11-27T08:52:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1a2180f6859c73c674809f9f82e36c94084682ba'/>
<id>1a2180f6859c73c674809f9f82e36c94084682ba</id>
<content type='text'>
Max Kellermann recently reported psi_group_cpu.tasks[NR_MEMSTALL] is
incorrect in the 6.11.9 kernel.

The root cause appears to be that, since the problematic commit, bio
can be NULL, causing psi_memstall_leave() to be skipped in
z_erofs_submit_queue().

Reported-by: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Closes: https://lore.kernel.org/r/CAKPOu+8tvSowiJADW2RuKyofL_CSkm_SuyZA7ME5vMLWmL6pqw@mail.gmail.com
Fixes: 9e2f9d34dd12 ("erofs: handle overlapped pclusters out of crafted images properly")
Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241127085236.3538334-1-hsiangkao@linux.alibaba.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Max Kellermann recently reported psi_group_cpu.tasks[NR_MEMSTALL] is
incorrect in the 6.11.9 kernel.

The root cause appears to be that, since the problematic commit, bio
can be NULL, causing psi_memstall_leave() to be skipped in
z_erofs_submit_queue().

Reported-by: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Closes: https://lore.kernel.org/r/CAKPOu+8tvSowiJADW2RuKyofL_CSkm_SuyZA7ME5vMLWmL6pqw@mail.gmail.com
Fixes: 9e2f9d34dd12 ("erofs: handle overlapped pclusters out of crafted images properly")
Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241127085236.3538334-1-hsiangkao@linux.alibaba.com
</pre>
</div>
</content>
</entry>
<entry>
<title>erofs: fix rare pcluster memory leak after unmounting</title>
<updated>2024-12-12T16:24:12+00:00</updated>
<author>
<name>Gao Xiang</name>
<email>hsiangkao@linux.alibaba.com</email>
</author>
<published>2024-12-03T07:28:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b10a1e5643e505c367c7e16aa6d8a9a0dc07354b'/>
<id>b10a1e5643e505c367c7e16aa6d8a9a0dc07354b</id>
<content type='text'>
There may still exist some pcluster with valid reference counts
during unmounting.  Instead of introducing another synchronization
primitive, just try again as unmounting is relatively rare.  This
approach is similar to z_erofs_cache_invalidate_folio().

It was also reported by syzbot as a UAF due to commit f5ad9f9a603f
("erofs: free pclusters if no cached folio is attached"):

BUG: KASAN: slab-use-after-free in do_raw_spin_trylock+0x72/0x1f0 kernel/locking/spinlock_debug.c:123
..
 queued_spin_trylock include/asm-generic/qspinlock.h:92 [inline]
 do_raw_spin_trylock+0x72/0x1f0 kernel/locking/spinlock_debug.c:123
 __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
 _raw_spin_trylock+0x20/0x80 kernel/locking/spinlock.c:138
 spin_trylock include/linux/spinlock.h:361 [inline]
 z_erofs_put_pcluster fs/erofs/zdata.c:959 [inline]
 z_erofs_decompress_pcluster fs/erofs/zdata.c:1403 [inline]
 z_erofs_decompress_queue+0x3798/0x3ef0 fs/erofs/zdata.c:1425
 z_erofs_decompressqueue_work+0x99/0xe0 fs/erofs/zdata.c:1437
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa68/0x1840 kernel/workqueue.c:3310
 worker_thread+0x870/0xd30 kernel/workqueue.c:3391
 kthread+0x2f2/0x390 kernel/kthread.c:389
 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 &lt;/TASK&gt;

However, it seems a long outstanding memory leak.  Fix it now.

Fixes: f5ad9f9a603f ("erofs: free pclusters if no cached folio is attached")
Reported-by: syzbot+7ff87b095e7ca0c5ac39@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/674c1235.050a0220.ad585.0032.GAE@google.com
Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241203072821.1885740-1-hsiangkao@linux.alibaba.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There may still exist some pcluster with valid reference counts
during unmounting.  Instead of introducing another synchronization
primitive, just try again as unmounting is relatively rare.  This
approach is similar to z_erofs_cache_invalidate_folio().

It was also reported by syzbot as a UAF due to commit f5ad9f9a603f
("erofs: free pclusters if no cached folio is attached"):

BUG: KASAN: slab-use-after-free in do_raw_spin_trylock+0x72/0x1f0 kernel/locking/spinlock_debug.c:123
..
 queued_spin_trylock include/asm-generic/qspinlock.h:92 [inline]
 do_raw_spin_trylock+0x72/0x1f0 kernel/locking/spinlock_debug.c:123
 __raw_spin_trylock include/linux/spinlock_api_smp.h:89 [inline]
 _raw_spin_trylock+0x20/0x80 kernel/locking/spinlock.c:138
 spin_trylock include/linux/spinlock.h:361 [inline]
 z_erofs_put_pcluster fs/erofs/zdata.c:959 [inline]
 z_erofs_decompress_pcluster fs/erofs/zdata.c:1403 [inline]
 z_erofs_decompress_queue+0x3798/0x3ef0 fs/erofs/zdata.c:1425
 z_erofs_decompressqueue_work+0x99/0xe0 fs/erofs/zdata.c:1437
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa68/0x1840 kernel/workqueue.c:3310
 worker_thread+0x870/0xd30 kernel/workqueue.c:3391
 kthread+0x2f2/0x390 kernel/kthread.c:389
 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 &lt;/TASK&gt;

However, it seems a long outstanding memory leak.  Fix it now.

Fixes: f5ad9f9a603f ("erofs: free pclusters if no cached folio is attached")
Reported-by: syzbot+7ff87b095e7ca0c5ac39@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/674c1235.050a0220.ad585.0032.GAE@google.com
Reviewed-by: Chao Yu &lt;chao@kernel.org&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241203072821.1885740-1-hsiangkao@linux.alibaba.com
</pre>
</div>
</content>
</entry>
<entry>
<title>erofs: handle NONHEAD !delta[1] lclusters gracefully</title>
<updated>2024-11-18T10:50:14+00:00</updated>
<author>
<name>Gao Xiang</name>
<email>hsiangkao@linux.alibaba.com</email>
</author>
<published>2024-11-15T17:36:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0bc8061ffc733a0a246b8689b2d32a3e9204f43c'/>
<id>0bc8061ffc733a0a246b8689b2d32a3e9204f43c</id>
<content type='text'>
syzbot reported a WARNING in iomap_iter_done:
 iomap_fiemap+0x73b/0x9b0 fs/iomap/fiemap.c:80
 ioctl_fiemap fs/ioctl.c:220 [inline]

Generally, NONHEAD lclusters won't have delta[1]==0, except for crafted
images and filesystems created by pre-1.0 mkfs versions.

Previously, it would immediately bail out if delta[1]==0, which led to
inadequate decompressed lengths (thus FIEMAP is impacted).  Treat it as
delta[1]=1 to work around these legacy mkfs versions.

`lclusterbits &gt; 14` is illegal for compact indexes, error out too.

Reported-by: syzbot+6c0b301317aa0156f9eb@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/67373c0c.050a0220.2a2fcc.0079.GAE@google.com
Tested-by: syzbot+6c0b301317aa0156f9eb@syzkaller.appspotmail.com
Fixes: d95ae5e25326 ("erofs: add support for the full decompressed length")
Fixes: 001b8ccd0650 ("erofs: fix compact 4B support for 16k block size")
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241115173651.3339514-1-hsiangkao@linux.alibaba.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
syzbot reported a WARNING in iomap_iter_done:
 iomap_fiemap+0x73b/0x9b0 fs/iomap/fiemap.c:80
 ioctl_fiemap fs/ioctl.c:220 [inline]

Generally, NONHEAD lclusters won't have delta[1]==0, except for crafted
images and filesystems created by pre-1.0 mkfs versions.

Previously, it would immediately bail out if delta[1]==0, which led to
inadequate decompressed lengths (thus FIEMAP is impacted).  Treat it as
delta[1]=1 to work around these legacy mkfs versions.

`lclusterbits &gt; 14` is illegal for compact indexes, error out too.

Reported-by: syzbot+6c0b301317aa0156f9eb@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/67373c0c.050a0220.2a2fcc.0079.GAE@google.com
Tested-by: syzbot+6c0b301317aa0156f9eb@syzkaller.appspotmail.com
Fixes: d95ae5e25326 ("erofs: add support for the full decompressed length")
Fixes: 001b8ccd0650 ("erofs: fix compact 4B support for 16k block size")
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241115173651.3339514-1-hsiangkao@linux.alibaba.com
</pre>
</div>
</content>
</entry>
<entry>
<title>erofs: clarify direct I/O support</title>
<updated>2024-11-18T10:50:14+00:00</updated>
<author>
<name>Gao Xiang</name>
<email>hsiangkao@linux.alibaba.com</email>
</author>
<published>2024-11-15T07:46:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b49c0215b176e9c2e0998e7929eeb9261c9a7919'/>
<id>b49c0215b176e9c2e0998e7929eeb9261c9a7919</id>
<content type='text'>
Currently, only filesystems backed by block devices support direct I/O.

Also remove the unnecessary strict checks that can be supported with iomap.

Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241115074625.2520728-1-hsiangkao@linux.alibaba.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, only filesystems backed by block devices support direct I/O.

Also remove the unnecessary strict checks that can be supported with iomap.

Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241115074625.2520728-1-hsiangkao@linux.alibaba.com
</pre>
</div>
</content>
</entry>
<entry>
<title>erofs: fix blksize &lt; PAGE_SIZE for file-backed mounts</title>
<updated>2024-11-18T10:50:14+00:00</updated>
<author>
<name>Hongzhen Luo</name>
<email>hongzhen@linux.alibaba.com</email>
</author>
<published>2024-10-15T10:38:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bae0854160939a64a092516ff1b2f221402b843b'/>
<id>bae0854160939a64a092516ff1b2f221402b843b</id>
<content type='text'>
Adjust sb-&gt;s_blocksize{,_bits} directly for file-backed
mounts when the fs block size is smaller than PAGE_SIZE.

Previously, EROFS used sb_set_blocksize(), which caused
a panic if bdev-backed mounts is not used.

Fixes: fb176750266a ("erofs: add file-backed mount support")
Signed-off-by: Hongzhen Luo &lt;hongzhen@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241015103836.3757438-1-hongzhen@linux.alibaba.com
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Adjust sb-&gt;s_blocksize{,_bits} directly for file-backed
mounts when the fs block size is smaller than PAGE_SIZE.

Previously, EROFS used sb_set_blocksize(), which caused
a panic if bdev-backed mounts is not used.

Fixes: fb176750266a ("erofs: add file-backed mount support")
Signed-off-by: Hongzhen Luo &lt;hongzhen@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241015103836.3757438-1-hongzhen@linux.alibaba.com
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>erofs: get rid of `buf-&gt;kmap_type`</title>
<updated>2024-11-18T10:50:14+00:00</updated>
<author>
<name>Gao Xiang</name>
<email>hsiangkao@linux.alibaba.com</email>
</author>
<published>2024-11-14T09:58:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ec4f59d1a99de86e5c14cf97946e94d5cef98ab0'/>
<id>ec4f59d1a99de86e5c14cf97946e94d5cef98ab0</id>
<content type='text'>
After commit 927e5010ff5b ("erofs: use kmap_local_page() only for
erofs_bread()"), `buf-&gt;kmap_type` actually has no use at all.

Let's get rid of `buf-&gt;kmap_type` now.

Suggested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241114095813.839866-1-hsiangkao@linux.alibaba.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit 927e5010ff5b ("erofs: use kmap_local_page() only for
erofs_bread()"), `buf-&gt;kmap_type` actually has no use at all.

Let's get rid of `buf-&gt;kmap_type` now.

Suggested-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Gao Xiang &lt;hsiangkao@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20241114095813.839866-1-hsiangkao@linux.alibaba.com
</pre>
</div>
</content>
</entry>
</feed>
