<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/include/linux/blkdev.h, branch v7.1-rc3</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux</title>
<updated>2026-04-13T22:51:31+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-13T22:51:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7fe6ac157b7e15c8976bd62ad7cb98e248884e83'/>
<id>7fe6ac157b7e15c8976bd62ad7cb98e248884e83</id>
<content type='text'>
Pull block updates from Jens Axboe:

 - Add shared memory zero-copy I/O support for ublk, bypassing per-I/O
   copies between kernel and userspace by matching registered buffer
   PFNs at I/O time. Includes selftests.

 - Refactor bio integrity to support filesystem initiated integrity
   operations and arbitrary buffer alignment.

 - Clean up bio allocation, splitting bio_alloc_bioset() into clear fast
   and slow paths. Add bio_await() and bio_submit_or_kill() helpers,
   unify synchronous bi_end_io callbacks.

 - Fix zone write plug refcount handling and plug removal races. Add
   support for serializing zone writes at QD=1 for rotational zoned
   devices, yielding significant throughput improvements.

 - Add SED-OPAL ioctls for Single User Mode management and a STACK_RESET
   command.

 - Add io_uring passthrough (uring_cmd) support to the BSG layer.

 - Replace pp_buf in partition scanning with struct seq_buf.

 - zloop improvements and cleanups.

 - drbd genl cleanup, switching to pre_doit/post_doit.

 - NVMe pull request via Keith:
      - Fabrics authentication updates
      - Enhanced block queue limits support
      - Workqueue usage updates
      - A new write zeroes device quirk
      - Tagset cleanup fix for loop device

 - MD pull requests via Yu Kuai:
      - Fix raid5 soft lockup in retry_aligned_read()
      - Fix raid10 deadlock with check operation and nowait requests
      - Fix raid1 overlapping writes on writemostly disks
      - Fix sysfs deadlock on array_state=clear
      - Proactive RAID-5 parity building with llbitmap, with
        write_zeroes_unmap optimization for initial sync
      - Fix llbitmap barrier ordering, rdev skipping, and bitmap_ops
        version mismatch fallback
      - Fix bcache use-after-free and uninitialized closure
      - Validate raid5 journal metadata payload size
      - Various cleanups

 - Various other fixes, improvements, and cleanups

* tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (146 commits)
  ublk: fix tautological comparison warning in ublk_ctrl_reg_buf
  scsi: bsg: fix buffer overflow in scsi_bsg_uring_cmd()
  block: refactor blkdev_zone_mgmt_ioctl
  MAINTAINERS: update ublk driver maintainer email
  Documentation: ublk: address review comments for SHMEM_ZC docs
  ublk: allow buffer registration before device is started
  ublk: replace xarray with IDA for shmem buffer index allocation
  ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
  ublk: verify all pages in multi-page bvec fall within registered range
  ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
  xfs: use bio_await in xfs_zone_gc_reset_sync
  block: add a bio_submit_or_kill helper
  block: factor out a bio_await helper
  block: unify the synchronous bi_end_io callbacks
  xfs: fix number of GC bvecs
  selftests/ublk: add read-only buffer registration test
  selftests/ublk: add filesystem fio verify test for shmem_zc
  selftests/ublk: add hugetlbfs shmem_zc test for loop target
  selftests/ublk: add shared memory zero-copy test
  selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull block updates from Jens Axboe:

 - Add shared memory zero-copy I/O support for ublk, bypassing per-I/O
   copies between kernel and userspace by matching registered buffer
   PFNs at I/O time. Includes selftests.

 - Refactor bio integrity to support filesystem initiated integrity
   operations and arbitrary buffer alignment.

 - Clean up bio allocation, splitting bio_alloc_bioset() into clear fast
   and slow paths. Add bio_await() and bio_submit_or_kill() helpers,
   unify synchronous bi_end_io callbacks.

 - Fix zone write plug refcount handling and plug removal races. Add
   support for serializing zone writes at QD=1 for rotational zoned
   devices, yielding significant throughput improvements.

 - Add SED-OPAL ioctls for Single User Mode management and a STACK_RESET
   command.

 - Add io_uring passthrough (uring_cmd) support to the BSG layer.

 - Replace pp_buf in partition scanning with struct seq_buf.

 - zloop improvements and cleanups.

 - drbd genl cleanup, switching to pre_doit/post_doit.

 - NVMe pull request via Keith:
      - Fabrics authentication updates
      - Enhanced block queue limits support
      - Workqueue usage updates
      - A new write zeroes device quirk
      - Tagset cleanup fix for loop device

 - MD pull requests via Yu Kuai:
      - Fix raid5 soft lockup in retry_aligned_read()
      - Fix raid10 deadlock with check operation and nowait requests
      - Fix raid1 overlapping writes on writemostly disks
      - Fix sysfs deadlock on array_state=clear
      - Proactive RAID-5 parity building with llbitmap, with
        write_zeroes_unmap optimization for initial sync
      - Fix llbitmap barrier ordering, rdev skipping, and bitmap_ops
        version mismatch fallback
      - Fix bcache use-after-free and uninitialized closure
      - Validate raid5 journal metadata payload size
      - Various cleanups

 - Various other fixes, improvements, and cleanups

* tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (146 commits)
  ublk: fix tautological comparison warning in ublk_ctrl_reg_buf
  scsi: bsg: fix buffer overflow in scsi_bsg_uring_cmd()
  block: refactor blkdev_zone_mgmt_ioctl
  MAINTAINERS: update ublk driver maintainer email
  Documentation: ublk: address review comments for SHMEM_ZC docs
  ublk: allow buffer registration before device is started
  ublk: replace xarray with IDA for shmem buffer index allocation
  ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
  ublk: verify all pages in multi-page bvec fall within registered range
  ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
  xfs: use bio_await in xfs_zone_gc_reset_sync
  block: add a bio_submit_or_kill helper
  block: factor out a bio_await helper
  block: unify the synchronous bi_end_io callbacks
  xfs: fix number of GC bvecs
  selftests/ublk: add read-only buffer registration test
  selftests/ublk: add filesystem fio verify test for shmem_zc
  selftests/ublk: add hugetlbfs shmem_zc test for loop target
  selftests/ublk: add shared memory zero-copy test
  selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs</title>
<updated>2026-04-13T21:20:11+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-13T21:20:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ef3da345ccb1fd70e2288b821301698483c6c35a'/>
<id>ef3da345ccb1fd70e2288b821301698483c6c35a</id>
<content type='text'>
Pull misc vfs updates from Christian Brauner:
 "Features:
   - coredump: add tracepoint for coredump events
   - fs: hide file and bfile caches behind runtime const machinery

  Fixes:
   - fix architecture-specific compat_ftruncate64 implementations
   - dcache: Limit the minimal number of bucket to two
   - fs/omfs: reject s_sys_blocksize smaller than OMFS_DIR_START
   - fs/mbcache: cancel shrink work before destroying the cache
   - dcache: permit dynamic_dname()s up to NAME_MAX

  Cleanups:
   - remove or unexport unused fs_context infrastructure
   - trivial -&gt;setattr cleanups
   - selftests/filesystems: Assume that TIOCGPTPEER is defined
   - writeback: fix kernel-doc function name mismatch for wb_put_many()
   - autofs: replace manual symlink buffer allocation in autofs_dir_symlink
   - init/initramfs.c: trivial fix: FSM -&gt; Finite-state machine
   - fs: remove stale and duplicate forward declarations
   - readdir: Introduce dirent_size()
   - fs: Replace user_access_{begin/end} by scoped user access
   - kernel: acct: fix duplicate word in comment
   - fs: write a better comment in step_into() concerning .mnt assignment
   - fs: attr: fix comment formatting and spelling issues"

* tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
  dcache: permit dynamic_dname()s up to NAME_MAX
  fs: attr: fix comment formatting and spelling issues
  fs: hide file and bfile caches behind runtime const machinery
  fs: write a better comment in step_into() concerning .mnt assignment
  proc: rename proc_notify_change to proc_setattr
  proc: rename proc_setattr to proc_nochmod_setattr
  affs: rename affs_notify_change to affs_setattr
  adfs: rename adfs_notify_change to adfs_setattr
  hfs: update comments on hfs_inode_setattr
  kernel: acct: fix duplicate word in comment
  fs: Replace user_access_{begin/end} by scoped user access
  readdir: Introduce dirent_size()
  coredump: add tracepoint for coredump events
  fs: remove do_sys_truncate
  fs: pass on FTRUNCATE_* flags to do_truncate
  fs: fix archiecture-specific compat_ftruncate64
  fs: remove stale and duplicate forward declarations
  init/initramfs.c: trivial fix: FSM -&gt; Finite-state machine
  autofs: replace manual symlink buffer allocation in autofs_dir_symlink
  fs/mbcache: cancel shrink work before destroying the cache
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull misc vfs updates from Christian Brauner:
 "Features:
   - coredump: add tracepoint for coredump events
   - fs: hide file and bfile caches behind runtime const machinery

  Fixes:
   - fix architecture-specific compat_ftruncate64 implementations
   - dcache: Limit the minimal number of bucket to two
   - fs/omfs: reject s_sys_blocksize smaller than OMFS_DIR_START
   - fs/mbcache: cancel shrink work before destroying the cache
   - dcache: permit dynamic_dname()s up to NAME_MAX

  Cleanups:
   - remove or unexport unused fs_context infrastructure
   - trivial -&gt;setattr cleanups
   - selftests/filesystems: Assume that TIOCGPTPEER is defined
   - writeback: fix kernel-doc function name mismatch for wb_put_many()
   - autofs: replace manual symlink buffer allocation in autofs_dir_symlink
   - init/initramfs.c: trivial fix: FSM -&gt; Finite-state machine
   - fs: remove stale and duplicate forward declarations
   - readdir: Introduce dirent_size()
   - fs: Replace user_access_{begin/end} by scoped user access
   - kernel: acct: fix duplicate word in comment
   - fs: write a better comment in step_into() concerning .mnt assignment
   - fs: attr: fix comment formatting and spelling issues"

* tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits)
  dcache: permit dynamic_dname()s up to NAME_MAX
  fs: attr: fix comment formatting and spelling issues
  fs: hide file and bfile caches behind runtime const machinery
  fs: write a better comment in step_into() concerning .mnt assignment
  proc: rename proc_notify_change to proc_setattr
  proc: rename proc_setattr to proc_nochmod_setattr
  affs: rename affs_notify_change to affs_setattr
  adfs: rename adfs_notify_change to adfs_setattr
  hfs: update comments on hfs_inode_setattr
  kernel: acct: fix duplicate word in comment
  fs: Replace user_access_{begin/end} by scoped user access
  readdir: Introduce dirent_size()
  coredump: add tracepoint for coredump events
  fs: remove do_sys_truncate
  fs: pass on FTRUNCATE_* flags to do_truncate
  fs: fix archiecture-specific compat_ftruncate64
  fs: remove stale and duplicate forward declarations
  init/initramfs.c: trivial fix: FSM -&gt; Finite-state machine
  autofs: replace manual symlink buffer allocation in autofs_dir_symlink
  fs/mbcache: cancel shrink work before destroying the cache
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>fs: remove stale and duplicate forward declarations</title>
<updated>2026-03-20T13:18:03+00:00</updated>
<author>
<name>Yuto Ohnuki</name>
<email>ytohnuki@amazon.com</email>
</author>
<published>2026-02-26T20:18:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3d6bb84f6bb3f4c05fc47fd02ce75ce3032a4ce1'/>
<id>3d6bb84f6bb3f4c05fc47fd02ce75ce3032a4ce1</id>
<content type='text'>
Remove the following unnecessary forward declarations from fs.h, which
improves maintainability.

- struct hd_geometry: became unused in fs.h when
  block_device_operations was moved to blkdev.h in commit 08f858512151
  ("[PATCH] move block_device_operations to blkdev.h"). The forward
  declaration is now added to blkdev.h where it is actually used.

- struct iovec: became unused when aio_read/aio_write were removed in
  commit 8436318205b9 ("-&gt;aio_read and -&gt;aio_write removed")

- struct iov_iter: duplicate forward declaration. This removes the
  redundant second declaration, added in commit 293bc9822fa9
  ("new methods: -&gt;read_iter() and -&gt;write_iter()")

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202512301303.s7YWTZHA-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302139.Wl0soAlz-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302105.pmzYfmcV-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302125.FNgHwu5z-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302108.nIV8r5ES-lkp@intel.com/
Signed-off-by: Yuto Ohnuki &lt;ytohnuki@amazon.com&gt;
Link: https://patch.msgid.link/20260226201857.27310-2-ytohnuki@amazon.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remove the following unnecessary forward declarations from fs.h, which
improves maintainability.

- struct hd_geometry: became unused in fs.h when
  block_device_operations was moved to blkdev.h in commit 08f858512151
  ("[PATCH] move block_device_operations to blkdev.h"). The forward
  declaration is now added to blkdev.h where it is actually used.

- struct iovec: became unused when aio_read/aio_write were removed in
  commit 8436318205b9 ("-&gt;aio_read and -&gt;aio_write removed")

- struct iov_iter: duplicate forward declaration. This removes the
  redundant second declaration, added in commit 293bc9822fa9
  ("new methods: -&gt;read_iter() and -&gt;write_iter()")

Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Closes: https://lore.kernel.org/oe-kbuild-all/202512301303.s7YWTZHA-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302139.Wl0soAlz-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302105.pmzYfmcV-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302125.FNgHwu5z-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202512302108.nIV8r5ES-lkp@intel.com/
Signed-off-by: Yuto Ohnuki &lt;ytohnuki@amazon.com&gt;
Link: https://patch.msgid.link/20260226201857.27310-2-ytohnuki@amazon.com
Reviewed-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Christian Brauner &lt;brauner@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-7.1/block-integrity' into for-7.1/block</title>
<updated>2026-03-09T20:30:14+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-03-09T20:30:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=89d10b7803a6ab7276850e54b766487666667153'/>
<id>89d10b7803a6ab7276850e54b766487666667153</id>
<content type='text'>
Merge in integrity changes which are also landing in the VFS tree as
dependencies for fs related changes.

* for-7.1/block-integrity:
  block: pass a maxlen argument to bio_iov_iter_bounce
  block: add fs_bio_integrity helpers
  block: make max_integrity_io_size public
  block: prepare generation / verification helpers for fs usage
  block: add a bdev_has_integrity_csum helper
  block: factor out a bio_integrity_setup_default helper
  block: factor out a bio_integrity_action helper
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge in integrity changes which are also landing in the VFS tree as
dependencies for fs related changes.

* for-7.1/block-integrity:
  block: pass a maxlen argument to bio_iov_iter_bounce
  block: add fs_bio_integrity helpers
  block: make max_integrity_io_size public
  block: prepare generation / verification helpers for fs usage
  block: add a bdev_has_integrity_csum helper
  block: factor out a bio_integrity_setup_default helper
  block: factor out a bio_integrity_action helper
</pre>
</div>
</content>
</entry>
<entry>
<title>block: remove bdev_nonrot()</title>
<updated>2026-03-09T20:30:00+00:00</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2026-02-26T07:54:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ecd92cfec5349876d6a80f8188ea98c5920094b6'/>
<id>ecd92cfec5349876d6a80f8188ea98c5920094b6</id>
<content type='text'>
bdev_nonrot() is simply the negative return value of bdev_rot().
So replace all call sites of bdev_nonrot() with calls to bdev_rot()
and remove bdev_nonrot().

Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Paul Menzel &lt;pmenzel@molgen.mpg.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bdev_nonrot() is simply the negative return value of bdev_rot().
So replace all call sites of bdev_nonrot() with calls to bdev_rot()
and remove bdev_nonrot().

Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Paul Menzel &lt;pmenzel@molgen.mpg.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: allow submitting all zone writes from a single context</title>
<updated>2026-03-09T20:30:00+00:00</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2026-02-27T13:19:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1365b6904fd050bf22ab9f3df375a396de5837a1'/>
<id>1365b6904fd050bf22ab9f3df375a396de5837a1</id>
<content type='text'>
In order to maintain sequential write patterns per zone with zoned block
devices, zone write plugging issues only a single write BIO per zone at
any time. This works well but has the side effect that when large
sequential write streams are issued by the user and these streams cross
zone boundaries, the device ends up receiving a discontiguous set of
write commands for different zones. The same also happens when a user
writes simultaneously at high queue depth multiple zones: the device
does not see all sequential writes per zone and receives discontiguous
writes to different zones. While this does not affect the performance of
solid state zoned block devices, when using an SMR HDD, this pattern
change from sequential writes to discontiguous writes to different zones
significantly increases head seek which results in degraded write
throughput.

In order to reduce this seek overhead for rotational media devices,
introduce a per disk zone write plugs kernel thread to issue all write
BIOs to zones. This single zone write issuing context is enabled for
any zoned block device that has a request queue flagged with the new
QUEUE_ZONED_QD1_WRITES flag.

The flag QUEUE_ZONED_QD1_WRITES is visible as the sysfs queue attribute
zoned_qd1_writes for zoned devices. For regular block devices, this
attribute is not visible. For zoned block devices, a user can override
the default value set to force the global write maximum queue depth of
1 for a zoned block device, or clear this attribute to fallback to the
default behavior of zone write plugging which limits writes to QD=1 per
sequential zone.

Writing to a zoned block device flagged with QUEUE_ZONED_QD1_WRITES is
implemented using a list of zone write plugs that have a non-empty BIO
list. Listed zone write plugs are processed by the disk zone write plugs
worker kthread in FIFO order, and all BIOs of a zone write plug are all
processed before switching to the next listed zone write plug. A newly
submitted BIO for a non-FULL zone write plug that is not yet listed
causes the addition of the zone write plug at the end of the disk list
of zone write plugs.

Since the write BIOs queued in a zone write plug BIO list are
necessarilly sequential, for rotational media, using the single zone
write plugs kthread to issue all BIOs maintains a sequential write
pattern and thus reduces seek overhead and improves write throughput.
This processing essentially result in always writing to HDDs at QD=1,
which is not an issue for HDDs operating with write caching enabled.
Performance with write cache disabled is also not degraded thanks to
the efficient write handling of modern SMR HDDs.

A disk list of zone write plugs is defined using the new struct gendisk
zone_wplugs_list, and accesses to this list is protected using the
zone_wplugs_list_lock spinlock.  The per disk kthread
(zone_wplugs_worker) code is implemented by the function
disk_zone_wplugs_worker(). A reference on listed zone write plugs is
always held until all BIOs of the zone write plug are processed by the
worker kthread. BIO issuing at QD=1 is driven using a completion
structure (zone_wplugs_worker_bio_done) and calls to blk_io_wait().

With this change, performance when sequentially writing the zones of a
30 TB SMR SATA HDD connected to an AHCI adapter changes as follows
(1MiB direct I/Os, results in MB/s unit):

                    +--------------------+
		    |   Write BW (MB/s)  |
 +------------------+----------+---------+
 | Sequential write | Baseline | Patched |
 |  Queue Depth     | 6.19-rc8 |         |
 +------------------+----------+---------+
 | 1                | 244      | 245     |
 | 2                | 244      | 245     |
 | 4                | 245      | 245     |
 | 8                | 242      | 245     |
 | 16               | 222      | 246     |
 | 32               | 211      | 245     |
 | 64               | 193      | 244     |
 | 128              | 112      | 246     |
 +------------------+----------+---------+

With the current code (baseline), as the sequential write stream crosses
a zone boundary, higher queue depth creates a gap between the
last IO to the previous zone and the first IOs to the following zones,
causing head seeks and degrading performance. Using the disk zone
write plugs worker thread, this pattern disappears and the maximum
throughput of the drive is maintained, leading to over 100%
improvements in throughput for high queue depth write.

Using 16 fio jobs all writing to randomly chosen zones at QD=32 with 1
MiB direct IOs, write throughput also increases significantly.

                    +--------------------+
		    |   Write BW (MB/s)  |
 +------------------+----------+---------+
 |   Random write   | Baseline | Patched |
 |  Number of zones | 6.19-rc7 |         |
 +------------------+----------+---------+
 | 1                | 191      | 192     |
 | 2                | 101      | 128     |
 | 4                | 115      | 123     |
 | 8                | 90       | 120     |
 | 16               | 64       | 115     |
 | 32               | 58       | 105     |
 | 64               | 56       | 101     |
 | 128              | 55       | 99      |
 +------------------+----------+---------+

Tests using XFS shows that buffered write speed with 8 jobs writing
files increases by 12% to 35% depending on the workload.

                    +--------------------+
		    |   Write BW (MB/s)  |
 +------------------+----------+---------+
 |     Workload     | Baseline | Patched |
 |                  | 6.19-rc7 |         |
 +------------------+----------+---------+
 | 256MiB file size | 212      | 238     |
 +------------------+----------+---------+
 | 4MiB .. 128 MiB  | 213      | 243     |
 | random file size |          |         |
 +------------------+----------+---------+
 | 2MiB .. 8 MiB    | 179      | 242     |
 | random file size |          |         |
 +------------------+----------+---------+

Performance gains are even more significant when using an HBA that
limits the maximum size of commands to a small value, e.g. HBAs
controlled with the mpi3mr driver limit commands to a maximum of 1 MiB.
In such case, the write throughput gains are over 40%.

                    +--------------------+
		    |   Write BW (MB/s)  |
 +------------------+----------+---------+
 |     Workload     | Baseline | Patched |
 |                  | 6.19-rc7 |         |
 +------------------+----------+---------+
 | 256MiB file size | 175      | 245     |
 +------------------+----------+---------+
 | 4MiB .. 128 MiB  | 174      | 244     |
 | random file size |          |         |
 +------------------+----------+---------+
 | 2MiB .. 8 MiB    | 171      | 243     |
 | random file size |          |         |
 +------------------+----------+---------+

Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In order to maintain sequential write patterns per zone with zoned block
devices, zone write plugging issues only a single write BIO per zone at
any time. This works well but has the side effect that when large
sequential write streams are issued by the user and these streams cross
zone boundaries, the device ends up receiving a discontiguous set of
write commands for different zones. The same also happens when a user
writes simultaneously at high queue depth multiple zones: the device
does not see all sequential writes per zone and receives discontiguous
writes to different zones. While this does not affect the performance of
solid state zoned block devices, when using an SMR HDD, this pattern
change from sequential writes to discontiguous writes to different zones
significantly increases head seek which results in degraded write
throughput.

In order to reduce this seek overhead for rotational media devices,
introduce a per disk zone write plugs kernel thread to issue all write
BIOs to zones. This single zone write issuing context is enabled for
any zoned block device that has a request queue flagged with the new
QUEUE_ZONED_QD1_WRITES flag.

The flag QUEUE_ZONED_QD1_WRITES is visible as the sysfs queue attribute
zoned_qd1_writes for zoned devices. For regular block devices, this
attribute is not visible. For zoned block devices, a user can override
the default value set to force the global write maximum queue depth of
1 for a zoned block device, or clear this attribute to fallback to the
default behavior of zone write plugging which limits writes to QD=1 per
sequential zone.

Writing to a zoned block device flagged with QUEUE_ZONED_QD1_WRITES is
implemented using a list of zone write plugs that have a non-empty BIO
list. Listed zone write plugs are processed by the disk zone write plugs
worker kthread in FIFO order, and all BIOs of a zone write plug are all
processed before switching to the next listed zone write plug. A newly
submitted BIO for a non-FULL zone write plug that is not yet listed
causes the addition of the zone write plug at the end of the disk list
of zone write plugs.

Since the write BIOs queued in a zone write plug BIO list are
necessarilly sequential, for rotational media, using the single zone
write plugs kthread to issue all BIOs maintains a sequential write
pattern and thus reduces seek overhead and improves write throughput.
This processing essentially result in always writing to HDDs at QD=1,
which is not an issue for HDDs operating with write caching enabled.
Performance with write cache disabled is also not degraded thanks to
the efficient write handling of modern SMR HDDs.

A disk list of zone write plugs is defined using the new struct gendisk
zone_wplugs_list, and accesses to this list is protected using the
zone_wplugs_list_lock spinlock.  The per disk kthread
(zone_wplugs_worker) code is implemented by the function
disk_zone_wplugs_worker(). A reference on listed zone write plugs is
always held until all BIOs of the zone write plug are processed by the
worker kthread. BIO issuing at QD=1 is driven using a completion
structure (zone_wplugs_worker_bio_done) and calls to blk_io_wait().

With this change, performance when sequentially writing the zones of a
30 TB SMR SATA HDD connected to an AHCI adapter changes as follows
(1MiB direct I/Os, results in MB/s unit):

                    +--------------------+
		    |   Write BW (MB/s)  |
 +------------------+----------+---------+
 | Sequential write | Baseline | Patched |
 |  Queue Depth     | 6.19-rc8 |         |
 +------------------+----------+---------+
 | 1                | 244      | 245     |
 | 2                | 244      | 245     |
 | 4                | 245      | 245     |
 | 8                | 242      | 245     |
 | 16               | 222      | 246     |
 | 32               | 211      | 245     |
 | 64               | 193      | 244     |
 | 128              | 112      | 246     |
 +------------------+----------+---------+

With the current code (baseline), as the sequential write stream crosses
a zone boundary, higher queue depth creates a gap between the
last IO to the previous zone and the first IOs to the following zones,
causing head seeks and degrading performance. Using the disk zone
write plugs worker thread, this pattern disappears and the maximum
throughput of the drive is maintained, leading to over 100%
improvements in throughput for high queue depth write.

Using 16 fio jobs all writing to randomly chosen zones at QD=32 with 1
MiB direct IOs, write throughput also increases significantly.

                    +--------------------+
		    |   Write BW (MB/s)  |
 +------------------+----------+---------+
 |   Random write   | Baseline | Patched |
 |  Number of zones | 6.19-rc7 |         |
 +------------------+----------+---------+
 | 1                | 191      | 192     |
 | 2                | 101      | 128     |
 | 4                | 115      | 123     |
 | 8                | 90       | 120     |
 | 16               | 64       | 115     |
 | 32               | 58       | 105     |
 | 64               | 56       | 101     |
 | 128              | 55       | 99      |
 +------------------+----------+---------+

Tests using XFS shows that buffered write speed with 8 jobs writing
files increases by 12% to 35% depending on the workload.

                    +--------------------+
		    |   Write BW (MB/s)  |
 +------------------+----------+---------+
 |     Workload     | Baseline | Patched |
 |                  | 6.19-rc7 |         |
 +------------------+----------+---------+
 | 256MiB file size | 212      | 238     |
 +------------------+----------+---------+
 | 4MiB .. 128 MiB  | 213      | 243     |
 | random file size |          |         |
 +------------------+----------+---------+
 | 2MiB .. 8 MiB    | 179      | 242     |
 | random file size |          |         |
 +------------------+----------+---------+

Performance gains are even more significant when using an HBA that
limits the maximum size of commands to a small value, e.g. HBAs
controlled with the mpi3mr driver limit commands to a maximum of 1 MiB.
In such case, the write throughput gains are over 40%.

                    +--------------------+
		    |   Write BW (MB/s)  |
 +------------------+----------+---------+
 |     Workload     | Baseline | Patched |
 |                  | 6.19-rc7 |         |
 +------------------+----------+---------+
 | 256MiB file size | 175      | 245     |
 +------------------+----------+---------+
 | 4MiB .. 128 MiB  | 174      | 244     |
 | random file size |          |         |
 +------------------+----------+---------+
 | 2MiB .. 8 MiB    | 171      | 243     |
 | random file size |          |         |
 +------------------+----------+---------+

Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: rename struct gendisk zone_wplugs_lock field</title>
<updated>2026-03-09T20:30:00+00:00</updated>
<author>
<name>Damien Le Moal</name>
<email>dlemoal@kernel.org</email>
</author>
<published>2026-02-27T13:19:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b7cbc30e93e3a64ea058230f6d0c764d6d80276f'/>
<id>b7cbc30e93e3a64ea058230f6d0c764d6d80276f</id>
<content type='text'>
Rename struct gendisk zone_wplugs_lock field to zone_wplugs_hash_lock to
clearly indicates that this is the spinlock used for manipulating the
hash table of zone write plugs.

Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rename struct gendisk zone_wplugs_lock field to zone_wplugs_hash_lock to
clearly indicates that this is the spinlock used for manipulating the
hash table of zone write plugs.

Signed-off-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Bart Van Assche &lt;bvanassche@acm.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: annotate struct request_queue with __counted_by_ptr</title>
<updated>2026-03-09T20:29:59+00:00</updated>
<author>
<name>Bill Wendling</name>
<email>morbo@google.com</email>
</author>
<published>2026-02-25T20:51:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0ee8ab5d4dc51704be1157470f3df8090629f9fc'/>
<id>0ee8ab5d4dc51704be1157470f3df8090629f9fc</id>
<content type='text'>
The queue_hw_ctx field in struct request_queue is an array of pointers
to struct blk_mq_hw_ctx. The number of elements in this array is tracked
by the nr_hw_queues field.

The array is allocated in __blk_mq_realloc_hw_ctxs() using
kcalloc_node() with set-&gt;nr_hw_queues elements. q-&gt;nr_hw_queues is
subsequently updated to set-&gt;nr_hw_queues.

When growing the array, the new array is assigned to queue_hw_ctx before
nr_hw_queues is updated. This is safe because nr_hw_queues (the old
smaller count) is used for bounds checking, which is within the new
larger allocation.

When shrinking the array, nr_hw_queues is updated to the smaller value,
while queue_hw_ctx retains the larger allocation. This is also safe as
the count is within the allocation bounds.

Annotating queue_hw_ctx with __counted_by_ptr(nr_hw_queues) allows the
compiler (with kSAN) to verify that accesses to queue_hw_ctx are within
the valid range defined by nr_hw_queues.

This patch was generated by CodeMender and reviewed by Bill Wendling.
Tested by running blktests.

Reviewed-by: Daniel Wagner &lt;dwagner@suse.de&gt;
Signed-off-by: Bill Wendling &lt;morbo@google.com&gt;
[axboe: massage commit message]
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The queue_hw_ctx field in struct request_queue is an array of pointers
to struct blk_mq_hw_ctx. The number of elements in this array is tracked
by the nr_hw_queues field.

The array is allocated in __blk_mq_realloc_hw_ctxs() using
kcalloc_node() with set-&gt;nr_hw_queues elements. q-&gt;nr_hw_queues is
subsequently updated to set-&gt;nr_hw_queues.

When growing the array, the new array is assigned to queue_hw_ctx before
nr_hw_queues is updated. This is safe because nr_hw_queues (the old
smaller count) is used for bounds checking, which is within the new
larger allocation.

When shrinking the array, nr_hw_queues is updated to the smaller value,
while queue_hw_ctx retains the larger allocation. This is also safe as
the count is within the allocation bounds.

Annotating queue_hw_ctx with __counted_by_ptr(nr_hw_queues) allows the
compiler (with kSAN) to verify that accesses to queue_hw_ctx are within
the valid range defined by nr_hw_queues.

This patch was generated by CodeMender and reviewed by Bill Wendling.
Tested by running blktests.

Reviewed-by: Daniel Wagner &lt;dwagner@suse.de&gt;
Signed-off-by: Bill Wendling &lt;morbo@google.com&gt;
[axboe: massage commit message]
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: make max_integrity_io_size public</title>
<updated>2026-03-09T13:47:02+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2026-02-23T13:20:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8c56ef10150ed7650cf4105539242c94c156148c'/>
<id>8c56ef10150ed7650cf4105539242c94c156148c</id>
<content type='text'>
File systems that generate integrity will need this, so move it out
of the block private or blk-mq specific headers.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Reviewed-by: Kanchan Joshi &lt;joshi.k@samsung.com&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Tested-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
File systems that generate integrity will need this, so move it out
of the block private or blk-mq specific headers.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Reviewed-by: Kanchan Joshi &lt;joshi.k@samsung.com&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Tested-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: add a bdev_has_integrity_csum helper</title>
<updated>2026-03-09T13:47:02+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2026-02-23T13:20:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7afe93946dff63aa57c6db81f5eb43ac8233364e'/>
<id>7afe93946dff63aa57c6db81f5eb43ac8233364e</id>
<content type='text'>
Factor out a helper to see if the block device has an integrity checksum
from bdev_stable_writes so that it can be reused for other checks.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Reviewed-by: Kanchan Joshi &lt;joshi.k@samsung.com&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Tested-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Factor out a helper to see if the block device has an integrity checksum
from bdev_stable_writes so that it can be reused for other checks.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Reviewed-by: Kanchan Joshi &lt;joshi.k@samsung.com&gt;
Reviewed-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Reviewed-by: Darrick J. Wong &lt;djwong@kernel.org&gt;
Tested-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
</feed>
