<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/md/dm.c, branch v6.15</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>dm: fix unconditional IO throttle caused by REQ_PREFLUSH</title>
<updated>2025-02-24T11:09:44+00:00</updated>
<author>
<name>Jinliang Zheng</name>
<email>alexjlzheng@gmail.com</email>
</author>
<published>2025-02-20T11:20:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=88f7f56d16f568f19e1a695af34a7f4a6ce537a6'/>
<id>88f7f56d16f568f19e1a695af34a7f4a6ce537a6</id>
<content type='text'>
When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash&gt; bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     #8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    #18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def2845cc33 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng &lt;alexjlzheng@tencent.com&gt;
Reviewed-by: Tianxiang Peng &lt;txpeng@tencent.com&gt;
Reviewed-by: Hao Peng &lt;flyingpeng@tencent.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash&gt; bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     #8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    #18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def2845cc33 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng &lt;alexjlzheng@tencent.com&gt;
Reviewed-by: Tianxiang Peng &lt;txpeng@tencent.com&gt;
Reviewed-by: Hao Peng &lt;flyingpeng@tencent.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: Ensure cloned bio is same length for atomic write</title>
<updated>2025-01-17T21:24:01+00:00</updated>
<author>
<name>John Garry</name>
<email>john.g.garry@oracle.com</email>
</author>
<published>2025-01-16T17:02:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5f430e34087b4882545baa3d31962f1f7d735421'/>
<id>5f430e34087b4882545baa3d31962f1f7d735421</id>
<content type='text'>
For an atomic write, a cloned bio must be same length as the original bio,
i.e. no splitting.

Error in case it is not.

Per-dm device queue limits should be setup to ensure that this does not
happen, but error this case as an insurance policy.

Signed-off-by: John Garry &lt;john.g.garry@oracle.com&gt;
Reviewed-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For an atomic write, a cloned bio must be same length as the original bio,
i.e. no splitting.

Error in case it is not.

Per-dm device queue limits should be setup to ensure that this does not
happen, but error this case as an insurance policy.

Signed-off-by: John Garry &lt;john.g.garry@oracle.com&gt;
Reviewed-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: disable REQ_NOWAIT for flushes</title>
<updated>2025-01-17T21:05:39+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2025-01-09T13:49:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cd6521d03f6a722f18fc027f0d289c39d164dc4f'/>
<id>cd6521d03f6a722f18fc027f0d289c39d164dc4f</id>
<content type='text'>
REQ_NOWAIT for flushes cannot be easily supported by device mapper
because it may allocate multiple bios and its impossible to undo if one
of those allocations wants to wait. So, this patch disables REQ_NOWAIT
flushes in device mapper and we always return EAGAIN.

Previously, the code accepted REQ_NOWAIT flushes, but the non-blocking
execution was not guaranteed.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
REQ_NOWAIT for flushes cannot be easily supported by device mapper
because it may allocate multiple bios and its impossible to undo if one
of those allocations wants to wait. So, this patch disables REQ_NOWAIT
flushes in device mapper and we always return EAGAIN.

Previously, the code accepted REQ_NOWAIT flushes, but the non-blocking
execution was not guaranteed.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: remove useless test in alloc_multiple_bios</title>
<updated>2025-01-17T21:05:39+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2025-01-09T13:47:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6942348d1bbdb612bc14025cab467fd1f012abe8'/>
<id>6942348d1bbdb612bc14025cab467fd1f012abe8</id>
<content type='text'>
The test gfp_flag &amp; GFP_NOWAIT was always true, because both GFP_NOIO and
GFP_NOWAIT include the flag __GFP_KSWAPD_RECLAIM. Luckily, this oversight
didn't result in any harm; the loop always started with "try = 0".

This patch removes the faulty test and explicitly starts the loop with
"try = 0" (so that we don't hold the mutex in the first iteration).

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The test gfp_flag &amp; GFP_NOWAIT was always true, because both GFP_NOIO and
GFP_NOWAIT include the flag __GFP_KSWAPD_RECLAIM. Luckily, this oversight
didn't result in any harm; the loop always started with "try = 0".

This patch removes the faulty test and explicitly starts the loop with
"try = 0" (so that we don't hold the mutex in the first iteration).

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: add support for get_unique_id</title>
<updated>2024-11-20T10:38:04+00:00</updated>
<author>
<name>Benjamin Coddington</name>
<email>bcodding@redhat.com</email>
</author>
<published>2024-10-31T15:06:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d5f01ace542de62af857fa0bd405cc16f2c45bd6'/>
<id>d5f01ace542de62af857fa0bd405cc16f2c45bd6</id>
<content type='text'>
This adds support to obtain a device's unique id through dm, similar to the
existing ioctl and persistent resevation handling.  We limit this to
single-target devices.

This enables knfsd to export pNFS SCSI luns that have been exported from
multipath devices.

Signed-off-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This adds support to obtain a device's unique id through dm, similar to the
existing ioctl and persistent resevation handling.  We limit this to
single-target devices.

This enables knfsd to export pNFS SCSI luns that have been exported from
multipath devices.

Signed-off-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: Remove unused dm_set_md_type</title>
<updated>2024-11-20T10:38:04+00:00</updated>
<author>
<name>Dr. David Alan Gilbert</name>
<email>linux@treblig.org</email>
</author>
<published>2024-10-03T01:15:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=feb83afa4e074a67b24811c9c00b688ca5c64b90'/>
<id>feb83afa4e074a67b24811c9c00b688ca5c64b90</id>
<content type='text'>
dm_set_md_type() has been unused since commit
ba30585936b0 ("dm: move setting md-&gt;type into dm_setup_md_queue")

Remove it.

Signed-off-by: Dr. David Alan Gilbert &lt;linux@treblig.org&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dm_set_md_type() has been unused since commit
ba30585936b0 ("dm: move setting md-&gt;type into dm_setup_md_queue")

Remove it.

Signed-off-by: Dr. David Alan Gilbert &lt;linux@treblig.org&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: fix a crash if blk_alloc_disk fails</title>
<updated>2024-10-15T11:37:17+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2024-10-07T11:38:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fed13a5478680614ba97fc87e71f16e2e197912e'/>
<id>fed13a5478680614ba97fc87e71f16e2e197912e</id>
<content type='text'>
If blk_alloc_disk fails, the variable md-&gt;disk is set to an error value.
cleanup_mapped_device will see that md-&gt;disk is non-NULL and it will
attempt to access it, causing a crash on this statement
"md-&gt;disk-&gt;private_data = NULL;".

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Reported-by: Chenyuan Yang &lt;chenyuan0y@gmail.com&gt;
Closes: https://marc.info/?l=dm-devel&amp;m=172824125004329&amp;w=2
Cc: stable@vger.kernel.org
Reviewed-by: Nitesh Shetty &lt;nj.shetty@samsung.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If blk_alloc_disk fails, the variable md-&gt;disk is set to an error value.
cleanup_mapped_device will see that md-&gt;disk is non-NULL and it will
attempt to access it, causing a crash on this statement
"md-&gt;disk-&gt;private_data = NULL;".

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Reported-by: Chenyuan Yang &lt;chenyuan0y@gmail.com&gt;
Closes: https://marc.info/?l=dm-devel&amp;m=172824125004329&amp;w=2
Cc: stable@vger.kernel.org
Reviewed-by: Nitesh Shetty &lt;nj.shetty@samsung.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "dm: requeue IO if mapping table not yet available"</title>
<updated>2024-09-15T19:02:54+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2024-09-13T13:05:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c8691cd0fc11197515ed148de0780d927bfca38b'/>
<id>c8691cd0fc11197515ed148de0780d927bfca38b</id>
<content type='text'>
This reverts commit fa247089de9936a46e290d4724cb5f0b845600f5.

The following sequence of commands causes a livelock - there will be
workqueue process looping and consuming 100% CPU:

dmsetup create --notable test
truncate -s 1MiB testdata
losetup /dev/loop0 testdata
dmsetup load test --table '0 2048 linear /dev/loop0 0'
dd if=/dev/zero of=/dev/dm-0 bs=16k count=1 conv=fdatasync

The livelock is caused by the commit fa247089de99. The commit claims that
it fixes a race condition, however, it is unknown what the actual race
condition is and what program is involved in the race condition.

When the inactive table is loaded, the nodes /dev/dm-0 and
/sys/block/dm-0 are created. /dev/dm-0 has zero size at this point. When
the device is suspended and resumed, the nodes /dev/mapper/test and
/dev/disk/* are created.

If some program opens a block device before it is created by dmsetup or
lvm, the program is buggy, so dm could just report an error as it used to
do before.

Reported-by: Zdenek Kabelac &lt;zkabelac@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Fixes: fa247089de99 ("dm: requeue IO if mapping table not yet available")
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit fa247089de9936a46e290d4724cb5f0b845600f5.

The following sequence of commands causes a livelock - there will be
workqueue process looping and consuming 100% CPU:

dmsetup create --notable test
truncate -s 1MiB testdata
losetup /dev/loop0 testdata
dmsetup load test --table '0 2048 linear /dev/loop0 0'
dd if=/dev/zero of=/dev/dm-0 bs=16k count=1 conv=fdatasync

The livelock is caused by the commit fa247089de99. The commit claims that
it fixes a race condition, however, it is unknown what the actual race
condition is and what program is involved in the race condition.

When the inactive table is loaded, the nodes /dev/dm-0 and
/sys/block/dm-0 are created. /dev/dm-0 has zero size at this point. When
the device is suspended and resumed, the nodes /dev/mapper/test and
/dev/disk/* are created.

If some program opens a block device before it is created by dmsetup or
lvm, the program is buggy, so dm could just report an error as it used to
do before.

Reported-by: Zdenek Kabelac &lt;zkabelac@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Fixes: fa247089de99 ("dm: requeue IO if mapping table not yet available")
</pre>
</div>
</content>
</entry>
<entry>
<title>dm suspend: return -ERESTARTSYS instead of -EINTR</title>
<updated>2024-08-13T11:50:45+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2024-08-13T10:38:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23'/>
<id>1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23</id>
<content type='text'>
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).

The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Cc: stable@vger.kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit changes device mapper, so that it returns -ERESTARTSYS
instead of -EINTR when it is interrupted by a signal (so that the ioctl
can be restarted).

The manpage signal(7) says that the ioctl function should be restarted if
the signal was handled with SA_RESTART.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Cc: stable@vger.kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux</title>
<updated>2024-07-22T18:04:09+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-07-22T18:04:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0256994887d7c89c2a41d872aac67605bda8f115'/>
<id>0256994887d7c89c2a41d872aac67605bda8f115</id>
<content type='text'>
Pull block integrity mapping updates from Jens Axboe:
 "A set of cleanups and fixes for the block integrity support.

  Sent separately from the main block changes from last week, as they
  depended on later fixes in the 6.10-rc cycle"

* tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux:
  block: don't free the integrity payload in bio_integrity_unmap_free_user
  block: don't free submitter owned integrity payload on I/O completion
  block: call bio_integrity_unmap_free_user from blk_rq_unmap_user
  block: don't call bio_uninit from bio_endio
  block: also return bio_integrity_payload * from stubs
  block: split integrity support out of bio.h
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull block integrity mapping updates from Jens Axboe:
 "A set of cleanups and fixes for the block integrity support.

  Sent separately from the main block changes from last week, as they
  depended on later fixes in the 6.10-rc cycle"

* tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux:
  block: don't free the integrity payload in bio_integrity_unmap_free_user
  block: don't free submitter owned integrity payload on I/O completion
  block: call bio_integrity_unmap_free_user from blk_rq_unmap_user
  block: don't call bio_uninit from bio_endio
  block: also return bio_integrity_payload * from stubs
  block: split integrity support out of bio.h
</pre>
</div>
</content>
</entry>
</feed>
