<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/block, branch v6.6</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'block-6.6-2023-10-06' of git://git.kernel.dk/linux</title>
<updated>2023-10-06T22:43:19+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-10-06T22:43:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fc5b94f1cb405c7129a337db6ae7db3b1e325c48'/>
<id>fc5b94f1cb405c7129a337db6ae7db3b1e325c48</id>
<content type='text'>
Pull block fixes from Jens Axboe:
 "Just two minor fixes, for nbd and md"

* tag 'block-6.6-2023-10-06' of git://git.kernel.dk/linux:
  nbd: don't call blk_mark_disk_dead nbd_clear_sock_ioctl
  md/raid5: release batch_last before waiting for another stripe_head
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull block fixes from Jens Axboe:
 "Just two minor fixes, for nbd and md"

* tag 'block-6.6-2023-10-06' of git://git.kernel.dk/linux:
  nbd: don't call blk_mark_disk_dead nbd_clear_sock_ioctl
  md/raid5: release batch_last before waiting for another stripe_head
</pre>
</div>
</content>
</entry>
<entry>
<title>nbd: don't call blk_mark_disk_dead nbd_clear_sock_ioctl</title>
<updated>2023-10-04T00:27:44+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2023-10-03T15:31:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=07a1141ff170ff5d4f9c4fbb0453727ab48096e5'/>
<id>07a1141ff170ff5d4f9c4fbb0453727ab48096e5</id>
<content type='text'>
blk_mark_disk_dead is the proper interface to shut down a block
device, but it also makes the disk unusable forever.

nbd_clear_sock_ioctl on the other hand wants to shut down the file
system, but allow the block device to be used again when when connected
to another socket.  Switch nbd to use disk_force_media_change and
nbd_bdev_reset to go back to a behavior of the old __invalidate_device
call, with the added benefit of incrementing the device generation
as there is no guarantee the old content comes back when the device
is reconnected.

Reported-by: Samuel Holland &lt;samuel.holland@sifive.com&gt;
Reported-by: Shinichiro Kawasaki &lt;shinichiro.kawasaki@wdc.com&gt;
Fixes: 0c1c9a27ce90 ("nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl")
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Samuel Holland &lt;samuel.holland@sifive.com&gt;
Link: https://lore.kernel.org/r/20231003153106.1331363-1-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
blk_mark_disk_dead is the proper interface to shut down a block
device, but it also makes the disk unusable forever.

nbd_clear_sock_ioctl on the other hand wants to shut down the file
system, but allow the block device to be used again when when connected
to another socket.  Switch nbd to use disk_force_media_change and
nbd_bdev_reset to go back to a behavior of the old __invalidate_device
call, with the added benefit of incrementing the device generation
as there is no guarantee the old content comes back when the device
is reconnected.

Reported-by: Samuel Holland &lt;samuel.holland@sifive.com&gt;
Reported-by: Shinichiro Kawasaki &lt;shinichiro.kawasaki@wdc.com&gt;
Fixes: 0c1c9a27ce90 ("nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl")
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Tested-by: Samuel Holland &lt;samuel.holland@sifive.com&gt;
Link: https://lore.kernel.org/r/20231003153106.1331363-1-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: take header_rwsem in rbd_dev_refresh() only when updating</title>
<updated>2023-09-26T08:33:19+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2023-09-20T17:01:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0b207d02bd9ab8dcc31b262ca9f60dbc1822500d'/>
<id>0b207d02bd9ab8dcc31b262ca9f60dbc1822500d</id>
<content type='text'>
rbd_dev_refresh() has been holding header_rwsem across header and
parent info read-in unnecessarily for ages.  With commit 870611e4877e
("rbd: get snapshot context after exclusive lock is ensured to be
held"), the potential for deadlocks became much more real owning to
a) header_rwsem now nesting inside lock_rwsem and b) rw_semaphores
not allowing new readers after a writer is registered.

For example, assuming that I/O request 1, I/O request 2 and header
read-in request all target the same OSD:

1. I/O request 1 comes in and gets submitted
2. watch error occurs
3. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and
   releases lock_rwsem
4. after reestablishing the watch, rbd_reregister_watch() calls
   rbd_dev_refresh() which takes header_rwsem for write and submits
   a header read-in request
5. I/O request 2 comes in: after taking lock_rwsem for read in
   __rbd_img_handle_request(), it blocks trying to take header_rwsem
   for read in rbd_img_object_requests()
6. another watch error occurs
7. rbd_watch_errcb() blocks trying to take lock_rwsem for write
8. I/O request 1 completion is received by the messenger but can't be
   processed because lock_rwsem won't be granted anymore
9. header read-in request completion can't be received, let alone
   processed, because the messenger is stranded

Change rbd_dev_refresh() to take header_rwsem only for actually
updating rbd_dev-&gt;header.  Header and parent info read-in don't need
any locking.

Cc: stable@vger.kernel.org # 0b035401c570: rbd: move rbd_dev_refresh() definition
Cc: stable@vger.kernel.org # 510a7330c82a: rbd: decouple header read-in from updating rbd_dev-&gt;header
Cc: stable@vger.kernel.org # c10311776f0a: rbd: decouple parent info read-in from updating rbd_dev
Cc: stable@vger.kernel.org
Fixes: 870611e4877e ("rbd: get snapshot context after exclusive lock is ensured to be held")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rbd_dev_refresh() has been holding header_rwsem across header and
parent info read-in unnecessarily for ages.  With commit 870611e4877e
("rbd: get snapshot context after exclusive lock is ensured to be
held"), the potential for deadlocks became much more real owning to
a) header_rwsem now nesting inside lock_rwsem and b) rw_semaphores
not allowing new readers after a writer is registered.

For example, assuming that I/O request 1, I/O request 2 and header
read-in request all target the same OSD:

1. I/O request 1 comes in and gets submitted
2. watch error occurs
3. rbd_watch_errcb() takes lock_rwsem for write, clears owner_cid and
   releases lock_rwsem
4. after reestablishing the watch, rbd_reregister_watch() calls
   rbd_dev_refresh() which takes header_rwsem for write and submits
   a header read-in request
5. I/O request 2 comes in: after taking lock_rwsem for read in
   __rbd_img_handle_request(), it blocks trying to take header_rwsem
   for read in rbd_img_object_requests()
6. another watch error occurs
7. rbd_watch_errcb() blocks trying to take lock_rwsem for write
8. I/O request 1 completion is received by the messenger but can't be
   processed because lock_rwsem won't be granted anymore
9. header read-in request completion can't be received, let alone
   processed, because the messenger is stranded

Change rbd_dev_refresh() to take header_rwsem only for actually
updating rbd_dev-&gt;header.  Header and parent info read-in don't need
any locking.

Cc: stable@vger.kernel.org # 0b035401c570: rbd: move rbd_dev_refresh() definition
Cc: stable@vger.kernel.org # 510a7330c82a: rbd: decouple header read-in from updating rbd_dev-&gt;header
Cc: stable@vger.kernel.org # c10311776f0a: rbd: decouple parent info read-in from updating rbd_dev
Cc: stable@vger.kernel.org
Fixes: 870611e4877e ("rbd: get snapshot context after exclusive lock is ensured to be held")
Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: decouple parent info read-in from updating rbd_dev</title>
<updated>2023-09-26T08:31:33+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2023-09-20T16:38:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c10311776f0a8ddea2276df96e255625b07045a8'/>
<id>c10311776f0a8ddea2276df96e255625b07045a8</id>
<content type='text'>
Unlike header read-in, parent info read-in is already decoupled in
get_parent_info(), but it's buried in rbd_dev_v2_parent_info() along
with the processing logic.

Separate the initial read-in and update read-in logic into
rbd_dev_setup_parent() and rbd_dev_update_parent() respectively and
have rbd_dev_v2_parent_info() just populate struct parent_image_info
(i.e. what get_parent_info() did).  Some existing QoI issues, like
flatten of a standalone clone being disregarded on refresh, remain.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Unlike header read-in, parent info read-in is already decoupled in
get_parent_info(), but it's buried in rbd_dev_v2_parent_info() along
with the processing logic.

Separate the initial read-in and update read-in logic into
rbd_dev_setup_parent() and rbd_dev_update_parent() respectively and
have rbd_dev_v2_parent_info() just populate struct parent_image_info
(i.e. what get_parent_info() did).  Some existing QoI issues, like
flatten of a standalone clone being disregarded on refresh, remain.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: decouple header read-in from updating rbd_dev-&gt;header</title>
<updated>2023-09-26T08:31:31+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2023-09-19T18:41:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=510a7330c82a7754d5df0117a8589e8a539067c7'/>
<id>510a7330c82a7754d5df0117a8589e8a539067c7</id>
<content type='text'>
Make rbd_dev_header_info() populate a passed struct rbd_image_header
instead of rbd_dev-&gt;header and introduce rbd_dev_update_header() for
updating mutable fields in rbd_dev-&gt;header upon refresh.  The initial
read-in of both mutable and immutable fields in rbd_dev_image_probe()
passes in rbd_dev-&gt;header so no update step is required there.

rbd_init_layout() is now called directly from rbd_dev_image_probe()
instead of individually in format 1 and format 2 implementations.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make rbd_dev_header_info() populate a passed struct rbd_image_header
instead of rbd_dev-&gt;header and introduce rbd_dev_update_header() for
updating mutable fields in rbd_dev-&gt;header upon refresh.  The initial
read-in of both mutable and immutable fields in rbd_dev_image_probe()
passes in rbd_dev-&gt;header so no update step is required there.

rbd_init_layout() is now called directly from rbd_dev_image_probe()
instead of individually in format 1 and format 2 implementations.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rbd: move rbd_dev_refresh() definition</title>
<updated>2023-09-26T08:31:15+00:00</updated>
<author>
<name>Ilya Dryomov</name>
<email>idryomov@gmail.com</email>
</author>
<published>2023-09-17T13:07:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0b035401c57021fc6c300272cbb1c5a889d4fe45'/>
<id>0b035401c57021fc6c300272cbb1c5a889d4fe45</id>
<content type='text'>
Move rbd_dev_refresh() definition further down to avoid having to
move struct parent_image_info definition in the next commit.  This
spares some forward declarations too.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move rbd_dev_refresh() definition further down to avoid having to
move struct parent_image_info definition in the next commit.  This
spares some forward declarations too.

Signed-off-by: Ilya Dryomov &lt;idryomov@gmail.com&gt;
Reviewed-by: Dongsheng Yang &lt;dongsheng.yang@easystack.cn&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'block-6.6-2023-09-08' of git://git.kernel.dk/linux</title>
<updated>2023-09-09T04:39:54+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-09-09T04:39:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7402e635edb9b430125017d2489974c9a10b069c'/>
<id>7402e635edb9b430125017d2489974c9a10b069c</id>
<content type='text'>
Pull block fixes from Jens Axboe:

 - Fix null_blk polled IO timeout handling (Chengming)

 - Regression fix for swapped arguments in drbd bvec_set_page()
   (Christoph)

 - String length handling fix for s390 dasd (Heiko)

 - Fixes for blk-throttle accounting (Yu)

 - Fix page pinning issue for same page segments (Christoph)

 - Remove redundant file_remove_privs() call (Christoph)

 - Fix a regression in partition handling for devices not supporting
   partitions (Li)

* tag 'block-6.6-2023-09-08' of git://git.kernel.dk/linux:
  drbd: swap bvec_set_page len and offset
  block: fix pin count management when merging same-page segments
  null_blk: fix poll request timeout handling
  s390/dasd: fix string length handling
  block: don't add or resize partition on the disk with GENHD_FL_NO_PART
  block: remove the call to file_remove_privs in blkdev_write_iter
  blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice()
  blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()
  blk-throttle: fix wrong comparation while 'carryover_ios/bytes' is negative
  blk-throttle: print signed value 'carryover_bytes/ios' for user
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull block fixes from Jens Axboe:

 - Fix null_blk polled IO timeout handling (Chengming)

 - Regression fix for swapped arguments in drbd bvec_set_page()
   (Christoph)

 - String length handling fix for s390 dasd (Heiko)

 - Fixes for blk-throttle accounting (Yu)

 - Fix page pinning issue for same page segments (Christoph)

 - Remove redundant file_remove_privs() call (Christoph)

 - Fix a regression in partition handling for devices not supporting
   partitions (Li)

* tag 'block-6.6-2023-09-08' of git://git.kernel.dk/linux:
  drbd: swap bvec_set_page len and offset
  block: fix pin count management when merging same-page segments
  null_blk: fix poll request timeout handling
  s390/dasd: fix string length handling
  block: don't add or resize partition on the disk with GENHD_FL_NO_PART
  block: remove the call to file_remove_privs in blkdev_write_iter
  blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice()
  blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()
  blk-throttle: fix wrong comparation while 'carryover_ios/bytes' is negative
  blk-throttle: print signed value 'carryover_bytes/ios' for user
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client</title>
<updated>2023-09-06T19:10:15+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-09-06T19:10:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7ba2090ca64ea1aa435744884124387db1fac70f'/>
<id>7ba2090ca64ea1aa435744884124387db1fac70f</id>
<content type='text'>
Pull ceph updates from Ilya Dryomov:
 "Mixed with some fixes and cleanups, this brings in reasonably complete
  fscrypt support to CephFS! The list of things which don't work with
  encryption should be fairly short, mostly around the edges: fallocate
  (not supported well in CephFS to begin with), copy_file_range
  (requires re-encryption), non-default striping patterns.

  This was a multi-year effort principally by Jeff Layton with
  assistance from Xiubo Li, Luís Henriques and others, including several
  dependant changes in the MDS, netfs helper library and fscrypt
  framework itself"

* tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client: (53 commits)
  ceph: make num_fwd and num_retry to __u32
  ceph: make members in struct ceph_mds_request_args_ext a union
  rbd: use list_for_each_entry() helper
  libceph: do not include crypto/algapi.h
  ceph: switch ceph_lookup/atomic_open() to use new fscrypt helper
  ceph: fix updating i_truncate_pagecache_size for fscrypt
  ceph: wait for OSD requests' callbacks to finish when unmounting
  ceph: drop messages from MDS when unmounting
  ceph: update documentation regarding snapshot naming limitations
  ceph: prevent snapshot creation in encrypted locked directories
  ceph: add support for encrypted snapshot names
  ceph: invalidate pages when doing direct/sync writes
  ceph: plumb in decryption during reads
  ceph: add encryption support to writepage and writepages
  ceph: add read/modify/write to ceph_sync_write
  ceph: align data in pages in ceph_sync_write
  ceph: don't use special DIO path for encrypted inodes
  ceph: add truncate size handling support for fscrypt
  ceph: add object version support for sync read
  libceph: allow ceph_osdc_new_request to accept a multi-op read
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull ceph updates from Ilya Dryomov:
 "Mixed with some fixes and cleanups, this brings in reasonably complete
  fscrypt support to CephFS! The list of things which don't work with
  encryption should be fairly short, mostly around the edges: fallocate
  (not supported well in CephFS to begin with), copy_file_range
  (requires re-encryption), non-default striping patterns.

  This was a multi-year effort principally by Jeff Layton with
  assistance from Xiubo Li, Luís Henriques and others, including several
  dependant changes in the MDS, netfs helper library and fscrypt
  framework itself"

* tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client: (53 commits)
  ceph: make num_fwd and num_retry to __u32
  ceph: make members in struct ceph_mds_request_args_ext a union
  rbd: use list_for_each_entry() helper
  libceph: do not include crypto/algapi.h
  ceph: switch ceph_lookup/atomic_open() to use new fscrypt helper
  ceph: fix updating i_truncate_pagecache_size for fscrypt
  ceph: wait for OSD requests' callbacks to finish when unmounting
  ceph: drop messages from MDS when unmounting
  ceph: update documentation regarding snapshot naming limitations
  ceph: prevent snapshot creation in encrypted locked directories
  ceph: add support for encrypted snapshot names
  ceph: invalidate pages when doing direct/sync writes
  ceph: plumb in decryption during reads
  ceph: add encryption support to writepage and writepages
  ceph: add read/modify/write to ceph_sync_write
  ceph: align data in pages in ceph_sync_write
  ceph: don't use special DIO path for encrypted inodes
  ceph: add truncate size handling support for fscrypt
  ceph: add object version support for sync read
  libceph: allow ceph_osdc_new_request to accept a multi-op read
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>drbd: swap bvec_set_page len and offset</title>
<updated>2023-09-06T13:33:03+00:00</updated>
<author>
<name>Christoph Böhmwalder</name>
<email>christoph.boehmwalder@linbit.com</email>
</author>
<published>2023-09-06T13:30:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4b9c2edaf7282d60e069551b4b28abc2932cd3e3'/>
<id>4b9c2edaf7282d60e069551b4b28abc2932cd3e3</id>
<content type='text'>
bvec_set_page has the following signature:

static inline void bvec_set_page(struct bio_vec *bv, struct page *page,
		unsigned int len, unsigned int offset)

However, the usage in DRBD swaps the len and offset parameters. This
leads to a bvec with length=0 instead of the intended length=4096, which
causes sock_sendmsg to return -EIO.

This leaves DRBD unable to transmit any pages and thus completely
broken.

Swapping the parameters fixes the regression.

Fixes: eeac7405c735 ("drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()")
Reported-by: Serguei Ivantsov &lt;manowar@gsc-game.com&gt;
Link: https://lore.kernel.org/regressions/CAKH+VT3YLmAn0Y8=q37UTDShqxDLsqPcQ4hBMzY7HPn7zNx+RQ@mail.gmail.com/
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Böhmwalder &lt;christoph.boehmwalder@linbit.com&gt;
Link: https://lore.kernel.org/r/20230906133034.948817-1-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bvec_set_page has the following signature:

static inline void bvec_set_page(struct bio_vec *bv, struct page *page,
		unsigned int len, unsigned int offset)

However, the usage in DRBD swaps the len and offset parameters. This
leads to a bvec with length=0 instead of the intended length=4096, which
causes sock_sendmsg to return -EIO.

This leaves DRBD unable to transmit any pages and thus completely
broken.

Swapping the parameters fixes the regression.

Fixes: eeac7405c735 ("drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()")
Reported-by: Serguei Ivantsov &lt;manowar@gsc-game.com&gt;
Link: https://lore.kernel.org/regressions/CAKH+VT3YLmAn0Y8=q37UTDShqxDLsqPcQ4hBMzY7HPn7zNx+RQ@mail.gmail.com/
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Böhmwalder &lt;christoph.boehmwalder@linbit.com&gt;
Link: https://lore.kernel.org/r/20230906133034.948817-1-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>null_blk: fix poll request timeout handling</title>
<updated>2023-09-01T14:18:25+00:00</updated>
<author>
<name>Chengming Zhou</name>
<email>zhouchengming@bytedance.com</email>
</author>
<published>2023-09-01T12:03:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5a26e45edb4690d58406178b5a9ea4c6dcf2c105'/>
<id>5a26e45edb4690d58406178b5a9ea4c6dcf2c105</id>
<content type='text'>
When doing io_uring benchmark on /dev/nullb0, it's easy to crash the
kernel if poll requests timeout triggered, as reported by David. [1]

BUG: kernel NULL pointer dereference, address: 0000000000000008
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:null_timeout_rq+0x4e/0x91
Call Trace:
 ? null_timeout_rq+0x4e/0x91
 blk_mq_handle_expired+0x31/0x4b
 bt_iter+0x68/0x84
 ? bt_tags_iter+0x81/0x81
 __sbitmap_for_each_set.constprop.0+0xb0/0xf2
 ? __blk_mq_complete_request_remote+0xf/0xf
 bt_for_each+0x46/0x64
 ? __blk_mq_complete_request_remote+0xf/0xf
 ? percpu_ref_get_many+0xc/0x2a
 blk_mq_queue_tag_busy_iter+0x14d/0x18e
 blk_mq_timeout_work+0x95/0x127
 process_one_work+0x185/0x263
 worker_thread+0x1b5/0x227

This is indeed a race problem between null_timeout_rq() and null_poll().

null_poll()				null_timeout_rq()
  spin_lock(&amp;nq-&gt;poll_lock)
  list_splice_init(&amp;nq-&gt;poll_list, &amp;list)
  spin_unlock(&amp;nq-&gt;poll_lock)

  while (!list_empty(&amp;list))
    req = list_first_entry()
    list_del_init()
    ...
    blk_mq_add_to_batch()
    // req-&gt;rq_next = NULL
					spin_lock(&amp;nq-&gt;poll_lock)

					// rq-&gt;queuelist-&gt;next == NULL
					list_del_init(&amp;rq-&gt;queuelist)

					spin_unlock(&amp;nq-&gt;poll_lock)

Fix these problems by setting requests state to MQ_RQ_COMPLETE under
nq-&gt;poll_lock protection, in which null_timeout_rq() can safely detect
this race and early return.

Note this patch just fix the kernel panic when request timeout happen.

[1] https://lore.kernel.org/all/3893581.1691785261@warthog.procyon.org.uk/

Fixes: 0a593fbbc245 ("null_blk: poll queue support")
Reported-by: David Howells &lt;dhowells@redhat.com&gt;
Tested-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Link: https://lore.kernel.org/r/20230901120306.170520-2-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When doing io_uring benchmark on /dev/nullb0, it's easy to crash the
kernel if poll requests timeout triggered, as reported by David. [1]

BUG: kernel NULL pointer dereference, address: 0000000000000008
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:null_timeout_rq+0x4e/0x91
Call Trace:
 ? null_timeout_rq+0x4e/0x91
 blk_mq_handle_expired+0x31/0x4b
 bt_iter+0x68/0x84
 ? bt_tags_iter+0x81/0x81
 __sbitmap_for_each_set.constprop.0+0xb0/0xf2
 ? __blk_mq_complete_request_remote+0xf/0xf
 bt_for_each+0x46/0x64
 ? __blk_mq_complete_request_remote+0xf/0xf
 ? percpu_ref_get_many+0xc/0x2a
 blk_mq_queue_tag_busy_iter+0x14d/0x18e
 blk_mq_timeout_work+0x95/0x127
 process_one_work+0x185/0x263
 worker_thread+0x1b5/0x227

This is indeed a race problem between null_timeout_rq() and null_poll().

null_poll()				null_timeout_rq()
  spin_lock(&amp;nq-&gt;poll_lock)
  list_splice_init(&amp;nq-&gt;poll_list, &amp;list)
  spin_unlock(&amp;nq-&gt;poll_lock)

  while (!list_empty(&amp;list))
    req = list_first_entry()
    list_del_init()
    ...
    blk_mq_add_to_batch()
    // req-&gt;rq_next = NULL
					spin_lock(&amp;nq-&gt;poll_lock)

					// rq-&gt;queuelist-&gt;next == NULL
					list_del_init(&amp;rq-&gt;queuelist)

					spin_unlock(&amp;nq-&gt;poll_lock)

Fix these problems by setting requests state to MQ_RQ_COMPLETE under
nq-&gt;poll_lock protection, in which null_timeout_rq() can safely detect
this race and early return.

Note this patch just fix the kernel panic when request timeout happen.

[1] https://lore.kernel.org/all/3893581.1691785261@warthog.procyon.org.uk/

Fixes: 0a593fbbc245 ("null_blk: poll queue support")
Reported-by: David Howells &lt;dhowells@redhat.com&gt;
Tested-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Link: https://lore.kernel.org/r/20230901120306.170520-2-chengming.zhou@linux.dev
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
</feed>
