<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/md/dm.c, branch v6.16</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>dm: Allow .prepare_ioctl to handle ioctls directly</title>
<updated>2025-05-04T09:35:05+00:00</updated>
<author>
<name>Kevin Wolf</name>
<email>kwolf@redhat.com</email>
</author>
<published>2025-04-29T16:50:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4862c8861d902d43645a493e441c4478be1c6c44'/>
<id>4862c8861d902d43645a493e441c4478be1c6c44</id>
<content type='text'>
This adds a 'bool *forward' parameter to .prepare_ioctl, which allows
device mapper targets to accept ioctls to themselves instead of the
underlying device. If the target already fully handled the ioctl, it
sets *forward to false and device mapper won't forward it to the
underlying device any more.

In order for targets to actually know what the ioctl is about and how to
handle it, pass also cmd and arg.

As long as targets restrict themselves to interpreting ioctls of type
DM_IOCTL, this is a backwards compatible change because previously, any
such ioctl would have been passed down through all device mapper layers
until it reached a device that can't understand the ioctl and would
return an error.

Signed-off-by: Kevin Wolf &lt;kwolf@redhat.com&gt;
Reviewed-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This adds a 'bool *forward' parameter to .prepare_ioctl, which allows
device mapper targets to accept ioctls to themselves instead of the
underlying device. If the target already fully handled the ioctl, it
sets *forward to false and device mapper won't forward it to the
underlying device any more.

In order for targets to actually know what the ioctl is about and how to
handle it, pass also cmd and arg.

As long as targets restrict themselves to interpreting ioctls of type
DM_IOCTL, this is a backwards compatible change because previously, any
such ioctl would have been passed down through all device mapper layers
until it reached a device that can't understand the ioctl and would
return an error.

Signed-off-by: Kevin Wolf &lt;kwolf@redhat.com&gt;
Reviewed-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: use generic functions instead of disable_discard and disable_write_zeroes</title>
<updated>2025-05-04T09:35:05+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2025-04-22T19:19:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f1e24048edb4c51678948cc6a08d20b57cde1471'/>
<id>f1e24048edb4c51678948cc6a08d20b57cde1471</id>
<content type='text'>
A small code cleanup: use blk_queue_disable_discard and
blk_queue_disable_write_zeroes instead of disable_discard and
disable_write_zeroes.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A small code cleanup: use blk_queue_disable_discard and
blk_queue_disable_write_zeroes instead of disable_discard and
disable_write_zeroes.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: limit swapping tables for devices with zone write plugs</title>
<updated>2025-05-04T09:35:05+00:00</updated>
<author>
<name>Benjamin Marzinski</name>
<email>bmarzins@redhat.com</email>
</author>
<published>2025-04-10T19:49:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=121218bef4c1df165181f5cd8fc3a2246bac817e'/>
<id>121218bef4c1df165181f5cd8fc3a2246bac817e</id>
<content type='text'>
dm_revalidate_zones() only allowed new or previously unzoned devices to
call blk_revalidate_disk_zones(). If the device was already zoned,
disk-&gt;nr_zones would always equal md-&gt;nr_zones, so dm_revalidate_zones()
returned without doing any work. This would make the zoned settings for
the device not match the new table. If the device had zone write plug
resources, it could run into errors like bdev_zone_is_seq() reading
invalid memory because disk-&gt;conv_zones_bitmap was the wrong size.

If the device doesn't have any zone write plug resources, calling
blk_revalidate_disk_zones() will always correctly update device.  If
blk_revalidate_disk_zones() fails, it can still overwrite or clear the
current disk-&gt;nr_zones value. In this case, DM must restore the previous
value of disk-&gt;nr_zones, so that the zoned settings will continue to
match the previous value that it fell back to.

If the device already has zone write plug resources,
blk_revalidate_disk_zones() will not correctly update them, if it is
called for arbitrary zoned device changes.  Since there is not much need
for this ability, the easiest solution is to disallow any table reloads
that change the zoned settings, for devices that already have zone plug
resources.  Specifically, if a device already has zone plug resources
allocated, it can only switch to another zoned table that also emulates
zone append.  Also, it cannot change the device size or the zone size. A
device can switch to an error target.

Fixes: bb37d77239af2 ("dm: introduce zone append emulation")
Reviewed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Tested-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Signed-off-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dm_revalidate_zones() only allowed new or previously unzoned devices to
call blk_revalidate_disk_zones(). If the device was already zoned,
disk-&gt;nr_zones would always equal md-&gt;nr_zones, so dm_revalidate_zones()
returned without doing any work. This would make the zoned settings for
the device not match the new table. If the device had zone write plug
resources, it could run into errors like bdev_zone_is_seq() reading
invalid memory because disk-&gt;conv_zones_bitmap was the wrong size.

If the device doesn't have any zone write plug resources, calling
blk_revalidate_disk_zones() will always correctly update device.  If
blk_revalidate_disk_zones() fails, it can still overwrite or clear the
current disk-&gt;nr_zones value. In this case, DM must restore the previous
value of disk-&gt;nr_zones, so that the zoned settings will continue to
match the previous value that it fell back to.

If the device already has zone write plug resources,
blk_revalidate_disk_zones() will not correctly update them, if it is
called for arbitrary zoned device changes.  Since there is not much need
for this ability, the easiest solution is to disallow any table reloads
that change the zoned settings, for devices that already have zone plug
resources.  Specifically, if a device already has zone plug resources
allocated, it can only switch to another zoned table that also emulates
zone append.  Also, it cannot change the device size or the zone size. A
device can switch to an error target.

Fixes: bb37d77239af2 ("dm: introduce zone append emulation")
Reviewed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Tested-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Signed-off-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: free table mempools if not used in __bind</title>
<updated>2025-04-11T11:38:50+00:00</updated>
<author>
<name>Benjamin Marzinski</name>
<email>bmarzins@redhat.com</email>
</author>
<published>2025-04-10T19:49:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e8819e7f03470c5b468720630d9e4e1d5b99159e'/>
<id>e8819e7f03470c5b468720630d9e4e1d5b99159e</id>
<content type='text'>
With request-based dm, the mempools don't need reloading when switching
tables, but the unused table mempools are not freed until the active
table is finally freed. Free them immediately if they are not needed.

Fixes: 29dec90a0f1d9 ("dm: fix bio_set allocation")
Reviewed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Tested-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Signed-off-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With request-based dm, the mempools don't need reloading when switching
tables, but the unused table mempools are not freed until the active
table is finally freed. Free them immediately if they are not needed.

Fixes: 29dec90a0f1d9 ("dm: fix bio_set allocation")
Reviewed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Tested-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Signed-off-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: don't change md if dm_table_set_restrictions() fails</title>
<updated>2025-04-11T11:38:39+00:00</updated>
<author>
<name>Benjamin Marzinski</name>
<email>bmarzins@redhat.com</email>
</author>
<published>2025-04-10T19:49:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9eb7109a5bfc5b8226e9517e9f3cc6d414391884'/>
<id>9eb7109a5bfc5b8226e9517e9f3cc6d414391884</id>
<content type='text'>
__bind was changing the disk capacity, geometry and mempools of the
mapped device before calling dm_table_set_restrictions() which could
fail, forcing dm to drop the new table. Failing here would leave the
device using the old table but with the wrong capacity and mempools.

Move dm_table_set_restrictions() earlier in __bind(). Since it needs the
capacity to be set, save the old version and restore it on failure.

Fixes: bb37d77239af2 ("dm: introduce zone append emulation")
Reviewed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Tested-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Signed-off-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
__bind was changing the disk capacity, geometry and mempools of the
mapped device before calling dm_table_set_restrictions() which could
fail, forcing dm to drop the new table. Failing here would leave the
device using the old table but with the wrong capacity and mempools.

Move dm_table_set_restrictions() earlier in __bind(). Since it needs the
capacity to be set, save the old version and restore it on failure.

Fixes: bb37d77239af2 ("dm: introduce zone append emulation")
Reviewed-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Tested-by: Damien Le Moal &lt;dlemoal@kernel.org&gt;
Signed-off-by: Benjamin Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: fix unconditional IO throttle caused by REQ_PREFLUSH</title>
<updated>2025-02-24T11:09:44+00:00</updated>
<author>
<name>Jinliang Zheng</name>
<email>alexjlzheng@gmail.com</email>
</author>
<published>2025-02-20T11:20:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=88f7f56d16f568f19e1a695af34a7f4a6ce537a6'/>
<id>88f7f56d16f568f19e1a695af34a7f4a6ce537a6</id>
<content type='text'>
When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash&gt; bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     #8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    #18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def2845cc33 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng &lt;alexjlzheng@tencent.com&gt;
Reviewed-by: Tianxiang Peng &lt;txpeng@tencent.com&gt;
Reviewed-by: Hao Peng &lt;flyingpeng@tencent.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().

An example from v5.4, similar problem also exists in upstream:

    crash&gt; bt 2091206
    PID: 2091206  TASK: ffff2050df92a300  CPU: 109  COMMAND: "kworker/u260:0"
     #0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
     #1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
     #2 [ffff800084a2f880] schedule at ffff800040bfa4b4
     #3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
     #4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
     #5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
     #6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
     #7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
     #8 [ffff800084a2fa60] generic_make_request at ffff800040570138
     #9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
    #10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
    #11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
    #12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
    #13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
    #14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
    #15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
    #16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
    #17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
    #18 [ffff800084a2fe70] kthread at ffff800040118de4

After commit 2def2845cc33 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.

Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait().

Signed-off-by: Jinliang Zheng &lt;alexjlzheng@tencent.com&gt;
Reviewed-by: Tianxiang Peng &lt;txpeng@tencent.com&gt;
Reviewed-by: Hao Peng &lt;flyingpeng@tencent.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: Ensure cloned bio is same length for atomic write</title>
<updated>2025-01-17T21:24:01+00:00</updated>
<author>
<name>John Garry</name>
<email>john.g.garry@oracle.com</email>
</author>
<published>2025-01-16T17:02:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5f430e34087b4882545baa3d31962f1f7d735421'/>
<id>5f430e34087b4882545baa3d31962f1f7d735421</id>
<content type='text'>
For an atomic write, a cloned bio must be same length as the original bio,
i.e. no splitting.

Error in case it is not.

Per-dm device queue limits should be setup to ensure that this does not
happen, but error this case as an insurance policy.

Signed-off-by: John Garry &lt;john.g.garry@oracle.com&gt;
Reviewed-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For an atomic write, a cloned bio must be same length as the original bio,
i.e. no splitting.

Error in case it is not.

Per-dm device queue limits should be setup to ensure that this does not
happen, but error this case as an insurance policy.

Signed-off-by: John Garry &lt;john.g.garry@oracle.com&gt;
Reviewed-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: disable REQ_NOWAIT for flushes</title>
<updated>2025-01-17T21:05:39+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2025-01-09T13:49:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cd6521d03f6a722f18fc027f0d289c39d164dc4f'/>
<id>cd6521d03f6a722f18fc027f0d289c39d164dc4f</id>
<content type='text'>
REQ_NOWAIT for flushes cannot be easily supported by device mapper
because it may allocate multiple bios and its impossible to undo if one
of those allocations wants to wait. So, this patch disables REQ_NOWAIT
flushes in device mapper and we always return EAGAIN.

Previously, the code accepted REQ_NOWAIT flushes, but the non-blocking
execution was not guaranteed.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
REQ_NOWAIT for flushes cannot be easily supported by device mapper
because it may allocate multiple bios and its impossible to undo if one
of those allocations wants to wait. So, this patch disables REQ_NOWAIT
flushes in device mapper and we always return EAGAIN.

Previously, the code accepted REQ_NOWAIT flushes, but the non-blocking
execution was not guaranteed.

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: remove useless test in alloc_multiple_bios</title>
<updated>2025-01-17T21:05:39+00:00</updated>
<author>
<name>Mikulas Patocka</name>
<email>mpatocka@redhat.com</email>
</author>
<published>2025-01-09T13:47:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6942348d1bbdb612bc14025cab467fd1f012abe8'/>
<id>6942348d1bbdb612bc14025cab467fd1f012abe8</id>
<content type='text'>
The test gfp_flag &amp; GFP_NOWAIT was always true, because both GFP_NOIO and
GFP_NOWAIT include the flag __GFP_KSWAPD_RECLAIM. Luckily, this oversight
didn't result in any harm; the loop always started with "try = 0".

This patch removes the faulty test and explicitly starts the loop with
"try = 0" (so that we don't hold the mutex in the first iteration).

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The test gfp_flag &amp; GFP_NOWAIT was always true, because both GFP_NOIO and
GFP_NOWAIT include the flag __GFP_KSWAPD_RECLAIM. Luckily, this oversight
didn't result in any harm; the loop always started with "try = 0".

This patch removes the faulty test and explicitly starts the loop with
"try = 0" (so that we don't hold the mutex in the first iteration).

Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: add support for get_unique_id</title>
<updated>2024-11-20T10:38:04+00:00</updated>
<author>
<name>Benjamin Coddington</name>
<email>bcodding@redhat.com</email>
</author>
<published>2024-10-31T15:06:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d5f01ace542de62af857fa0bd405cc16f2c45bd6'/>
<id>d5f01ace542de62af857fa0bd405cc16f2c45bd6</id>
<content type='text'>
This adds support to obtain a device's unique id through dm, similar to the
existing ioctl and persistent resevation handling.  We limit this to
single-target devices.

This enables knfsd to export pNFS SCSI luns that have been exported from
multipath devices.

Signed-off-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This adds support to obtain a device's unique id through dm, similar to the
existing ioctl and persistent resevation handling.  We limit this to
single-target devices.

This enables knfsd to export pNFS SCSI luns that have been exported from
multipath devices.

Signed-off-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Signed-off-by: Mikulas Patocka &lt;mpatocka@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
