<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/ext4/inode.c, branch v6.16</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'ext4_for_linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4</title>
<updated>2025-05-28T19:12:08+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-05-28T19:12:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d87d73895fcdbe6e45813efc473544433862364f'/>
<id>d87d73895fcdbe6e45813efc473544433862364f</id>
<content type='text'>
Pull ext4 updates from Ted Ts'o:
 "New ext4 features and performance improvements:

   - Fast commit performance improvements

   - Multi-fsblock atomic write support for bigalloc file systems

   - Large folio support for regular files

  This last can result in really stupendous performance for the right
  workloads. For example, see [1] where the Kernel Test Robot reported
  over 37% improvement on a large sequential I/O workload.

  There are also the usual bug fixes and cleanups. Of note are cleanups
  of the extent status tree to fix potential races that could result in
  the extent status tree getting corrupted under heavy simultaneous
  allocation and deallocation to a single file"

Link: https://lore.kernel.org/all/202505161418.ec0d753f-lkp@intel.com/ [1]

* tag 'ext4_for_linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (52 commits)
  ext4: Add a WARN_ON_ONCE for querying LAST_IN_LEAF instead
  ext4: Simplify flags in ext4_map_query_blocks()
  ext4: Rename and document EXT4_EX_FILTER to EXT4_EX_QUERY_FILTER
  ext4: Simplify last in leaf check in ext4_map_query_blocks
  ext4: Unwritten to written conversion requires EXT4_EX_NOCACHE
  ext4: only dirty folios when data journaling regular files
  ext4: Add atomic block write documentation
  ext4: Enable support for ext4 multi-fsblock atomic write using bigalloc
  ext4: Add multi-fsblock atomic write support with bigalloc
  ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS
  ext4: Make ext4_meta_trans_blocks() non-static for later use
  ext4: Check if inode uses extents in ext4_inode_can_atomic_write()
  ext4: Document an edge case for overwrites
  jbd2: remove journal_t argument from jbd2_superblock_csum()
  jbd2: remove journal_t argument from jbd2_chksum()
  ext4: remove sb argument from ext4_superblock_csum()
  ext4: remove sbi argument from ext4_chksum()
  ext4: enable large folio for regular file
  ext4: make online defragmentation support large folios
  ext4: make the writeback path support large folios
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull ext4 updates from Ted Ts'o:
 "New ext4 features and performance improvements:

   - Fast commit performance improvements

   - Multi-fsblock atomic write support for bigalloc file systems

   - Large folio support for regular files

  This last can result in really stupendous performance for the right
  workloads. For example, see [1] where the Kernel Test Robot reported
  over 37% improvement on a large sequential I/O workload.

  There are also the usual bug fixes and cleanups. Of note are cleanups
  of the extent status tree to fix potential races that could result in
  the extent status tree getting corrupted under heavy simultaneous
  allocation and deallocation to a single file"

Link: https://lore.kernel.org/all/202505161418.ec0d753f-lkp@intel.com/ [1]

* tag 'ext4_for_linus-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (52 commits)
  ext4: Add a WARN_ON_ONCE for querying LAST_IN_LEAF instead
  ext4: Simplify flags in ext4_map_query_blocks()
  ext4: Rename and document EXT4_EX_FILTER to EXT4_EX_QUERY_FILTER
  ext4: Simplify last in leaf check in ext4_map_query_blocks
  ext4: Unwritten to written conversion requires EXT4_EX_NOCACHE
  ext4: only dirty folios when data journaling regular files
  ext4: Add atomic block write documentation
  ext4: Enable support for ext4 multi-fsblock atomic write using bigalloc
  ext4: Add multi-fsblock atomic write support with bigalloc
  ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS
  ext4: Make ext4_meta_trans_blocks() non-static for later use
  ext4: Check if inode uses extents in ext4_inode_can_atomic_write()
  ext4: Document an edge case for overwrites
  jbd2: remove journal_t argument from jbd2_superblock_csum()
  jbd2: remove journal_t argument from jbd2_chksum()
  ext4: remove sb argument from ext4_superblock_csum()
  ext4: remove sbi argument from ext4_chksum()
  ext4: enable large folio for regular file
  ext4: make online defragmentation support large folios
  ext4: make the writeback path support large folios
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Simplify flags in ext4_map_query_blocks()</title>
<updated>2025-05-20T18:21:00+00:00</updated>
<author>
<name>Ritesh Harjani (IBM)</name>
<email>ritesh.list@gmail.com</email>
</author>
<published>2025-05-19T18:19:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=618320daa9673ce2b6adae5ad6fbf16e878ff6c9'/>
<id>618320daa9673ce2b6adae5ad6fbf16e878ff6c9</id>
<content type='text'>
Now that we have EXT4_EX_QUERY_FILTER mask, let's use that to simplify
the filtering of flags for passing to ext4_ext_map_blocks() in
ext4_map_query_blocks() function. This allows us to kill the query_flags
local variable which is not needed anymore.

Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/4ae735e83e6f43341e53e2d289e59156a8360134.1747677758.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that we have EXT4_EX_QUERY_FILTER mask, let's use that to simplify
the filtering of flags for passing to ext4_ext_map_blocks() in
ext4_map_query_blocks() function. This allows us to kill the query_flags
local variable which is not needed anymore.

Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/4ae735e83e6f43341e53e2d289e59156a8360134.1747677758.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Rename and document EXT4_EX_FILTER to EXT4_EX_QUERY_FILTER</title>
<updated>2025-05-20T18:21:00+00:00</updated>
<author>
<name>Ritesh Harjani (IBM)</name>
<email>ritesh.list@gmail.com</email>
</author>
<published>2025-05-19T18:19:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9597376bdb6e5830448ba40aacd3ebd705fe35cc'/>
<id>9597376bdb6e5830448ba40aacd3ebd705fe35cc</id>
<content type='text'>
Rename EXT4_EX_FILTER to EXT4_EX_QUERY_FILTER to better describe its
purpose as a filter mask used specifically in ext4_map_query_blocks().
Add a comment explaining that this macro is used to filter flags needed
when querying the on-disk extent tree.

We will later use EXT4_EX_QUERY_FILTER mask to add another
EXT4_GET_BLOCKS_QUERY needed to lookup in on-disk extent tree.

Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/51f05d0ba286372eb8693af95bd4b10194b53141.1747677758.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rename EXT4_EX_FILTER to EXT4_EX_QUERY_FILTER to better describe its
purpose as a filter mask used specifically in ext4_map_query_blocks().
Add a comment explaining that this macro is used to filter flags needed
when querying the on-disk extent tree.

We will later use EXT4_EX_QUERY_FILTER mask to add another
EXT4_GET_BLOCKS_QUERY needed to lookup in on-disk extent tree.

Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/51f05d0ba286372eb8693af95bd4b10194b53141.1747677758.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Simplify last in leaf check in ext4_map_query_blocks</title>
<updated>2025-05-20T18:21:00+00:00</updated>
<author>
<name>Ritesh Harjani (IBM)</name>
<email>ritesh.list@gmail.com</email>
</author>
<published>2025-05-19T18:19:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=797ac3ca24ca1af396e84d16c79c523cfefe1737'/>
<id>797ac3ca24ca1af396e84d16c79c523cfefe1737</id>
<content type='text'>
This simplifies the check for last in leaf in ext4_map_query_blocks()
and fixes this cocci warning.

cocci warnings: (new ones prefixed by &gt;&gt;)
&gt;&gt; fs/ext4/inode.c:573:49-51: WARNING !A || A &amp;&amp; B is equivalent to !A || B

Fixes: 5bb12b1837c0 ("ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202505191524.auftmOwK-lkp@intel.com/
Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Reviewed-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Link: https://patch.msgid.link/5fd5c806218c83f603c578c95997cf7f6da29d74.1747677758.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This simplifies the check for last in leaf in ext4_map_query_blocks()
and fixes this cocci warning.

cocci warnings: (new ones prefixed by &gt;&gt;)
&gt;&gt; fs/ext4/inode.c:573:49-51: WARNING !A || A &amp;&amp; B is equivalent to !A || B

Fixes: 5bb12b1837c0 ("ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS")
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202505191524.auftmOwK-lkp@intel.com/
Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Reviewed-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Link: https://patch.msgid.link/5fd5c806218c83f603c578c95997cf7f6da29d74.1747677758.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: only dirty folios when data journaling regular files</title>
<updated>2025-05-20T14:31:13+00:00</updated>
<author>
<name>Brian Foster</name>
<email>bfoster@redhat.com</email>
</author>
<published>2025-05-16T17:38:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e26268ff1dcae5662c1b96c35f18cfa6ab73d9de'/>
<id>e26268ff1dcae5662c1b96c35f18cfa6ab73d9de</id>
<content type='text'>
fstest generic/388 occasionally reproduces a crash that looks as
follows:

BUG: kernel NULL pointer dereference, address: 0000000000000000
...
Call Trace:
 &lt;TASK&gt;
 ext4_block_zero_page_range+0x30c/0x380 [ext4]
 ext4_truncate+0x436/0x440 [ext4]
 ext4_process_orphan+0x5d/0x110 [ext4]
 ext4_orphan_cleanup+0x124/0x4f0 [ext4]
 ext4_fill_super+0x262d/0x3110 [ext4]
 get_tree_bdev_flags+0x132/0x1d0
 vfs_get_tree+0x26/0xd0
 vfs_cmd_create+0x59/0xe0
 __do_sys_fsconfig+0x4ed/0x6b0
 do_syscall_64+0x82/0x170
 ...

This occurs when processing a symlink inode from the orphan list. The
partial block zeroing code in the truncate path calls
ext4_dirty_journalled_data() -&gt; folio_mark_dirty(). The latter calls
mapping-&gt;a_ops-&gt;dirty_folio(), but symlink inodes are not assigned an
a_ops vector in ext4, hence the crash.

To avoid this problem, update the ext4_dirty_journalled_data() helper to
only mark the folio dirty on regular files (for which a_ops is
assigned). This also matches the journaling logic in the ext4_symlink()
creation path, where ext4_handle_dirty_metadata() is called directly.

Fixes: d84c9ebdac1e ("ext4: Mark pages with journalled data dirty")
Signed-off-by: Brian Foster &lt;bfoster@redhat.com&gt;
Link: https://patch.msgid.link/20250516173800.175577-1-bfoster@redhat.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: stable@kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
fstest generic/388 occasionally reproduces a crash that looks as
follows:

BUG: kernel NULL pointer dereference, address: 0000000000000000
...
Call Trace:
 &lt;TASK&gt;
 ext4_block_zero_page_range+0x30c/0x380 [ext4]
 ext4_truncate+0x436/0x440 [ext4]
 ext4_process_orphan+0x5d/0x110 [ext4]
 ext4_orphan_cleanup+0x124/0x4f0 [ext4]
 ext4_fill_super+0x262d/0x3110 [ext4]
 get_tree_bdev_flags+0x132/0x1d0
 vfs_get_tree+0x26/0xd0
 vfs_cmd_create+0x59/0xe0
 __do_sys_fsconfig+0x4ed/0x6b0
 do_syscall_64+0x82/0x170
 ...

This occurs when processing a symlink inode from the orphan list. The
partial block zeroing code in the truncate path calls
ext4_dirty_journalled_data() -&gt; folio_mark_dirty(). The latter calls
mapping-&gt;a_ops-&gt;dirty_folio(), but symlink inodes are not assigned an
a_ops vector in ext4, hence the crash.

To avoid this problem, update the ext4_dirty_journalled_data() helper to
only mark the folio dirty on regular files (for which a_ops is
assigned). This also matches the journaling logic in the ext4_symlink()
creation path, where ext4_handle_dirty_metadata() is called directly.

Fixes: d84c9ebdac1e ("ext4: Mark pages with journalled data dirty")
Signed-off-by: Brian Foster &lt;bfoster@redhat.com&gt;
Link: https://patch.msgid.link/20250516173800.175577-1-bfoster@redhat.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Cc: stable@kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Add multi-fsblock atomic write support with bigalloc</title>
<updated>2025-05-20T14:31:12+00:00</updated>
<author>
<name>Ritesh Harjani (IBM)</name>
<email>ritesh.list@gmail.com</email>
</author>
<published>2025-05-15T19:50:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b86629c2b2998338b4a715058b402e47d0b36206'/>
<id>b86629c2b2998338b4a715058b402e47d0b36206</id>
<content type='text'>
EXT4 supports bigalloc feature which allows the FS to work in size of
clusters (group of blocks) rather than individual blocks. This patch
adds atomic write support for bigalloc so that systems with bs = ps can
also create FS using -
    mkfs.ext4 -F -O bigalloc -b 4096 -C 16384 &lt;dev&gt;

With bigalloc ext4 can support multi-fsblock atomic writes. We will have to
adjust ext4's atomic write unit max value to cluster size. This can then support
atomic write of size anywhere between [blocksize, clustersize]. This
patch adds the required changes to enable multi-fsblock atomic write
support using bigalloc in the next patch.

In this patch for block allocation:
we first query the underlying region of the requested range by calling
ext4_map_blocks() call. Here are the various cases which we then handle
depending upon the underlying mapping type:
1. If the underlying region for the entire requested range is a mapped extent,
   then we don't call ext4_map_blocks() to allocate anything. We don't need to
   even start the jbd2 txn in this case.
2. For an append write case, we create a mapped extent.
3. If the underlying region is entirely a hole, then we create an unwritten
   extent for the requested range.
4. If the underlying region is a large unwritten extent, then we split the
   extent into 2 unwritten extent of required size.
5. If the underlying region has any type of mixed mapping, then we call
   ext4_map_blocks() in a loop to zero out the unwritten and the hole regions
   within the requested range. This then provide a single mapped extent type
   mapping for the requested range.

Note: We invoke ext4_map_blocks() in a loop with the EXT4_GET_BLOCKS_ZERO
flag only when the underlying extent mapping of the requested range is
not entirely a hole, an unwritten extent, or a fully mapped extent. That
is, if the underlying region contains a mix of hole(s), unwritten
extent(s), and mapped extent(s), we use this loop to ensure that all the
short mappings are zeroed out. This guarantees that the entire requested
range becomes a single, uniformly mapped extent. It is ok to do so
because we know this is being done on a bigalloc enabled filesystem
where the block bitmap represents the entire cluster unit.

Note having a single contiguous underlying region of type mapped,
unwrittn or hole is not a problem. But the reason to avoid writing on
top of mixed mapping region is because, atomic writes requires all or
nothing should get written for the userspace pwritev2 request. So if at
any point in time during the write if a crash or a sudden poweroff
occurs, the region undergoing atomic write should read either complete
old data or complete new data. But it should never have a mix of both
old and new data.
So, we first convert any mixed mapping region to a single contiguous
mapped extent before any data gets written to it. This is because
normally FS will only convert unwritten extents to written at the end of
the write in -&gt;end_io() call. And if we allow the writes over a mixed
mapping and if a sudden power off happens in between, we will end up
reading mix of new data (over mapped extents) and old data (over
unwritten extents), because unwritten to written conversion never went
through.
So to avoid this and to avoid writes getting torned due to mixed
mapping, we first allocate a single contiguous block mapping and then
do the write.

Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Co-developed-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Signed-off-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/c4965ac3407cbc773f0bc954d0966d9696f5038a.1747337952.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
EXT4 supports bigalloc feature which allows the FS to work in size of
clusters (group of blocks) rather than individual blocks. This patch
adds atomic write support for bigalloc so that systems with bs = ps can
also create FS using -
    mkfs.ext4 -F -O bigalloc -b 4096 -C 16384 &lt;dev&gt;

With bigalloc ext4 can support multi-fsblock atomic writes. We will have to
adjust ext4's atomic write unit max value to cluster size. This can then support
atomic write of size anywhere between [blocksize, clustersize]. This
patch adds the required changes to enable multi-fsblock atomic write
support using bigalloc in the next patch.

In this patch for block allocation:
we first query the underlying region of the requested range by calling
ext4_map_blocks() call. Here are the various cases which we then handle
depending upon the underlying mapping type:
1. If the underlying region for the entire requested range is a mapped extent,
   then we don't call ext4_map_blocks() to allocate anything. We don't need to
   even start the jbd2 txn in this case.
2. For an append write case, we create a mapped extent.
3. If the underlying region is entirely a hole, then we create an unwritten
   extent for the requested range.
4. If the underlying region is a large unwritten extent, then we split the
   extent into 2 unwritten extent of required size.
5. If the underlying region has any type of mixed mapping, then we call
   ext4_map_blocks() in a loop to zero out the unwritten and the hole regions
   within the requested range. This then provide a single mapped extent type
   mapping for the requested range.

Note: We invoke ext4_map_blocks() in a loop with the EXT4_GET_BLOCKS_ZERO
flag only when the underlying extent mapping of the requested range is
not entirely a hole, an unwritten extent, or a fully mapped extent. That
is, if the underlying region contains a mix of hole(s), unwritten
extent(s), and mapped extent(s), we use this loop to ensure that all the
short mappings are zeroed out. This guarantees that the entire requested
range becomes a single, uniformly mapped extent. It is ok to do so
because we know this is being done on a bigalloc enabled filesystem
where the block bitmap represents the entire cluster unit.

Note having a single contiguous underlying region of type mapped,
unwrittn or hole is not a problem. But the reason to avoid writing on
top of mixed mapping region is because, atomic writes requires all or
nothing should get written for the userspace pwritev2 request. So if at
any point in time during the write if a crash or a sudden poweroff
occurs, the region undergoing atomic write should read either complete
old data or complete new data. But it should never have a mix of both
old and new data.
So, we first convert any mixed mapping region to a single contiguous
mapped extent before any data gets written to it. This is because
normally FS will only convert unwritten extents to written at the end of
the write in -&gt;end_io() call. And if we allow the writes over a mixed
mapping and if a sudden power off happens in between, we will end up
reading mix of new data (over mapped extents) and old data (over
unwritten extents), because unwritten to written conversion never went
through.
So to avoid this and to avoid writes getting torned due to mixed
mapping, we first allocate a single contiguous block mapping and then
do the write.

Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Co-developed-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Signed-off-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/c4965ac3407cbc773f0bc954d0966d9696f5038a.1747337952.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS</title>
<updated>2025-05-20T14:31:12+00:00</updated>
<author>
<name>Ritesh Harjani (IBM)</name>
<email>ritesh.list@gmail.com</email>
</author>
<published>2025-05-15T19:50:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5bb12b1837c0bf7ddc84e27812f1693a126fe27a'/>
<id>5bb12b1837c0bf7ddc84e27812f1693a126fe27a</id>
<content type='text'>
There can be a case where there are contiguous extents on the adjacent
leaf nodes of on-disk extent trees. So when someone tries to write to
this contiguous range, ext4_map_blocks() call will split by returning
1 extent at a time if this is not already cached in extent_status tree
cache (where if these extents when cached can get merged since they are
contiguous).

This is fine for a normal write however in case of atomic writes, it
can't afford to break the write into two. Now this is also something
that will only happen in the slow write case where we call
ext4_map_blocks() for each of these extents spread across different leaf
nodes. However, there is no guarantee that these extent status cache
cannot be reclaimed before the last call to ext4_map_blocks() in
ext4_map_blocks_atomic_write_slow().

Hence this patch adds support of EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS.
This flag checks if the requested range can be fully found in extent
status cache and return. If not, it looks up in on-disk extent
tree via ext4_map_query_blocks(). If the found extent is the last entry
in the leaf node, then it goes and queries the next lblk to see if there
is an adjacent contiguous extent in the adjacent leaf node of the
on-disk extent tree.

Even though there can be a case where there are multiple adjacent extent
entries spread across multiple leaf nodes. But we only read an adjacent
leaf block i.e. in total of 2 extent entries spread across 2 leaf nodes.
The reason for this is that we are mostly only going to support atomic
writes with upto 64KB or maybe max upto 1MB of atomic write support.

Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Co-developed-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Signed-off-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/6bb563e661f5fbd80e266a9e6ce6e29178f555f6.1747337952.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There can be a case where there are contiguous extents on the adjacent
leaf nodes of on-disk extent trees. So when someone tries to write to
this contiguous range, ext4_map_blocks() call will split by returning
1 extent at a time if this is not already cached in extent_status tree
cache (where if these extents when cached can get merged since they are
contiguous).

This is fine for a normal write however in case of atomic writes, it
can't afford to break the write into two. Now this is also something
that will only happen in the slow write case where we call
ext4_map_blocks() for each of these extents spread across different leaf
nodes. However, there is no guarantee that these extent status cache
cannot be reclaimed before the last call to ext4_map_blocks() in
ext4_map_blocks_atomic_write_slow().

Hence this patch adds support of EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS.
This flag checks if the requested range can be fully found in extent
status cache and return. If not, it looks up in on-disk extent
tree via ext4_map_query_blocks(). If the found extent is the last entry
in the leaf node, then it goes and queries the next lblk to see if there
is an adjacent contiguous extent in the adjacent leaf node of the
on-disk extent tree.

Even though there can be a case where there are multiple adjacent extent
entries spread across multiple leaf nodes. But we only read an adjacent
leaf block i.e. in total of 2 extent entries spread across 2 leaf nodes.
The reason for this is that we are mostly only going to support atomic
writes with upto 64KB or maybe max upto 1MB of atomic write support.

Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Co-developed-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Signed-off-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/6bb563e661f5fbd80e266a9e6ce6e29178f555f6.1747337952.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Make ext4_meta_trans_blocks() non-static for later use</title>
<updated>2025-05-20T14:31:12+00:00</updated>
<author>
<name>Ritesh Harjani (IBM)</name>
<email>ritesh.list@gmail.com</email>
</author>
<published>2025-05-15T19:50:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=255e7bc2127cbd3a718a55d2da5b2d3f015adcd7'/>
<id>255e7bc2127cbd3a718a55d2da5b2d3f015adcd7</id>
<content type='text'>
Let's make ext4_meta_trans_blocks() non-static for use in later
functions during -&gt;end_io conversion for atomic writes.
We will need this function to estimate journal credits for a special
case. Instead of adding another wrapper around it, let's make this
non-static.

Reviewed-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/23ce80d4286f792831ce99d13558182ee228fedb.1747337952.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Let's make ext4_meta_trans_blocks() non-static for use in later
functions during -&gt;end_io conversion for atomic writes.
We will need this function to estimate journal credits for a special
case. Instead of adding another wrapper around it, let's make this
non-static.

Reviewed-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/23ce80d4286f792831ce99d13558182ee228fedb.1747337952.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: Document an edge case for overwrites</title>
<updated>2025-05-20T14:31:12+00:00</updated>
<author>
<name>Ritesh Harjani (IBM)</name>
<email>ritesh.list@gmail.com</email>
</author>
<published>2025-05-15T19:50:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9fa6121684dad974c8c2b2aceb0df2b27f0627fe'/>
<id>9fa6121684dad974c8c2b2aceb0df2b27f0627fe</id>
<content type='text'>
ext4_iomap_overwrite_begin() clears the flag for IOMAP_WRITE before
calling ext4_iomap_begin(). Document this above ext4_map_blocks() call
as it is easy to miss it when focusing on write paths alone.

Reviewed-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/fd50ba05440042dff77d555e463a620a79f8d0e9.1747337952.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ext4_iomap_overwrite_begin() clears the flag for IOMAP_WRITE before
calling ext4_iomap_begin(). Document this above ext4_map_blocks() call
as it is easy to miss it when focusing on write paths alone.

Reviewed-by: Ojaswin Mujoo &lt;ojaswin@linux.ibm.com&gt;
Acked-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Signed-off-by: Ritesh Harjani (IBM) &lt;ritesh.list@gmail.com&gt;
Link: https://patch.msgid.link/fd50ba05440042dff77d555e463a620a79f8d0e9.1747337952.git.ritesh.list@gmail.com
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ext4: remove sbi argument from ext4_chksum()</title>
<updated>2025-05-20T14:31:12+00:00</updated>
<author>
<name>Eric Biggers</name>
<email>ebiggers@google.com</email>
</author>
<published>2025-05-13T05:38:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6cbab5f95e49ec8a9f21784fae3ff0ee09b2dfbc'/>
<id>6cbab5f95e49ec8a9f21784fae3ff0ee09b2dfbc</id>
<content type='text'>
Since ext4_chksum() no longer uses its sbi argument, remove it.

Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Reviewed-by: Baokun Li &lt;libaokun1@huawei.com&gt;
Link: https://patch.msgid.link/20250513053809.699974-2-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since ext4_chksum() no longer uses its sbi argument, remove it.

Signed-off-by: Eric Biggers &lt;ebiggers@google.com&gt;
Reviewed-by: Baokun Li &lt;libaokun1@huawei.com&gt;
Link: https://patch.msgid.link/20250513053809.699974-2-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o &lt;tytso@mit.edu&gt;
</pre>
</div>
</content>
</entry>
</feed>
