<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/md/dm.c, branch v4.16</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>dm: fix dropped return code from dm_get_bdev_for_ioctl</title>
<updated>2018-03-30T03:31:32+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2018-03-30T03:31:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=da5dadb4f11660ca67580cd4a7420161266d6254'/>
<id>da5dadb4f11660ca67580cd4a7420161266d6254</id>
<content type='text'>
dm_get_bdev_for_ioctl()'s return of 0 or 1 must be the result from
prepare_ioctl (1 means the ioctl was issued to a partition, 0 means it
wasn't).  Unfortunately commit 519049afea ("dm: use blkdev_get rather
than bdgrab when issuing pass-through ioctl") reused the variable 'r'
to store the return from blkdev_get() that follows prepare_ioctl()
-- thereby dropping prepare_ioctl()'s result on the floor.

This can lead to an ioctl or persistent reservation being issued to a
partition going unnoticed, which implies the extra permission check for
CAP_SYS_RAWIO is skipped.

Fix this by using a different variable to store blkdev_get()'s return.

Fixes: 519049afea ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl")
Reported-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dm_get_bdev_for_ioctl()'s return of 0 or 1 must be the result from
prepare_ioctl (1 means the ioctl was issued to a partition, 0 means it
wasn't).  Unfortunately commit 519049afea ("dm: use blkdev_get rather
than bdgrab when issuing pass-through ioctl") reused the variable 'r'
to store the return from blkdev_get() that follows prepare_ioctl()
-- thereby dropping prepare_ioctl()'s result on the floor.

This can lead to an ioctl or persistent reservation being issued to a
partition going unnoticed, which implies the extra permission check for
CAP_SYS_RAWIO is skipped.

Fix this by using a different variable to store blkdev_get()'s return.

Fixes: 519049afea ("dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl")
Reported-by: Alasdair G Kergon &lt;agk@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: use blkdev_get rather than bdgrab when issuing pass-through ioctl</title>
<updated>2018-03-07T01:23:57+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2018-02-22T18:31:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=519049afead4f7c3e6446028c41e99fde958cc04'/>
<id>519049afead4f7c3e6446028c41e99fde958cc04</id>
<content type='text'>
Otherwise an underlying device's teardown (e.g. SCSI) may race with the
DM ioctl or persistent reservation and result in dereferencing driver
memory that gets freed when the underlying device's final blkdev_put()
occurs.

bdgrab() only increases the refcount for the block_device's inode to
ensure the block_device struct itself will not be freed, but does not
guarantee the block_device will remain associated with the gendisk or
its storage.

Cc: stable@vger.kernel.org # 4.8+
Reported-by: David Jeffery &lt;djeffery@redhat.com&gt;
Suggested-by: David Jeffery &lt;djeffery@redhat.com&gt;
Reviewed-by: Ben Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Otherwise an underlying device's teardown (e.g. SCSI) may race with the
DM ioctl or persistent reservation and result in dereferencing driver
memory that gets freed when the underlying device's final blkdev_put()
occurs.

bdgrab() only increases the refcount for the block_device's inode to
ensure the block_device struct itself will not be freed, but does not
guarantee the block_device will remain associated with the gendisk or
its storage.

Cc: stable@vger.kernel.org # 4.8+
Reported-by: David Jeffery &lt;djeffery@redhat.com&gt;
Suggested-by: David Jeffery &lt;djeffery@redhat.com&gt;
Reviewed-by: Ben Marzinski &lt;bmarzins@redhat.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: correctly handle chained bios in dec_pending()</title>
<updated>2018-02-16T15:46:35+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2018-02-15T09:00:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8dd601fa8317243be887458c49f6c29c2f3d719f'/>
<id>8dd601fa8317243be887458c49f6c29c2f3d719f</id>
<content type='text'>
dec_pending() is given an error status (possibly 0) to be recorded
against a bio.  It can be called several times on the one 'struct
dm_io', and it is careful to only assign a non-zero error to
io-&gt;status.  However, when it then assigns io-&gt;status to bio-&gt;bi_status,
it is not careful and can overwrite a genuine error status with 0.

This can happen when chained bios are in use.  If a bio is chained
beneath the bio that this dm_io is handling, the child bio might
complete and set bio-&gt;bi_status before the dm_io completes.

This has been possible since chained bios were introduced in 3.14, and
has become a lot easier to trigger with commit 18a25da84354 ("dm: ensure
bio submission follows a depth-first tree walk") as that commit caused
dm to start using chained bios itself.

A particular failure mode is that if a bio spans an 'error' target and a
working target, the 'error' fragment will complete instantly and set the
-&gt;bi_status, and the other fragment will normally complete a little
later, and will clear -&gt;bi_status.

The fix is simply to only assign io_error to bio-&gt;bi_status when
io_error is not zero.

Reported-and-tested-by: Milan Broz &lt;gmazyland@gmail.com&gt;
Cc: stable@vger.kernel.org (v3.14+)
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dec_pending() is given an error status (possibly 0) to be recorded
against a bio.  It can be called several times on the one 'struct
dm_io', and it is careful to only assign a non-zero error to
io-&gt;status.  However, when it then assigns io-&gt;status to bio-&gt;bi_status,
it is not careful and can overwrite a genuine error status with 0.

This can happen when chained bios are in use.  If a bio is chained
beneath the bio that this dm_io is handling, the child bio might
complete and set bio-&gt;bi_status before the dm_io completes.

This has been possible since chained bios were introduced in 3.14, and
has become a lot easier to trigger with commit 18a25da84354 ("dm: ensure
bio submission follows a depth-first tree walk") as that commit caused
dm to start using chained bios itself.

A particular failure mode is that if a bio spans an 'error' target and a
working target, the 'error' fragment will complete instantly and set the
-&gt;bi_status, and the other fragment will normally complete a little
later, and will clear -&gt;bi_status.

The fix is simply to only assign io_error to bio-&gt;bi_status when
io_error is not zero.

Reported-and-tested-by: Milan Broz &lt;gmazyland@gmail.com&gt;
Cc: stable@vger.kernel.org (v3.14+)
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm</title>
<updated>2018-01-31T19:05:47+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-01-31T19:05:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0be600a5add76e8e8b9e1119f2a7426ff849aca8'/>
<id>0be600a5add76e8e8b9e1119f2a7426ff849aca8</id>
<content type='text'>
Pull device mapper updates from Mike Snitzer:

 - DM core fixes to ensure that bio submission follows a depth-first
   tree walk; this is critical to allow forward progress without the
   need to use the bioset's BIOSET_NEED_RESCUER.

 - Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure.

 - DM core cleanups and improvements to make bio-based DM more efficient
   (e.g. reduced memory footprint as well as leveraging per-bio-data more).

 - Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages
   the more direct IO submission path in the block layer; this mode is
   used by DM multipath and also optimizes targets like DM thin-pool
   that stack directly on an NVMe data device.

 - DM multipath improvements to factor out legacy SCSI-only (e.g.
   scsi_dh) code paths to allow for more optimized support for NVMe
   multipath.

 - A fix for DM multipath path selectors (service-time and queue-length)
   to select paths in a more balanced way; largely academic but doesn't
   hurt.

 - Numerous DM raid target fixes and improvements.

 - Add a new DM "unstriped" target that enables Intel to work around
   firmware limitations in some NVMe drives that are striped internally
   (this target also works when stacked above the DM "striped" target).

 - Various Documentation fixes and improvements.

 - Misc cleanups and fixes across various DM infrastructure and targets
   (e.g. bufio, flakey, log-writes, snapshot).

* tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (69 commits)
  dm cache: Documentation: update default migration_throttling value
  dm mpath selector: more evenly distribute ties
  dm unstripe: fix target length versus number of stripes size check
  dm thin: fix trailing semicolon in __remap_and_issue_shared_cell
  dm table: fix NVMe bio-based dm_table_determine_type() validation
  dm: various cleanups to md-&gt;queue initialization code
  dm mpath: delay the retry of a request if the target responded as busy
  dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIRED
  dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failure
  dm log writes: fix max length used for kstrndup
  dm: backfill missing calls to mutex_destroy()
  dm snapshot: use mutex instead of rw_semaphore
  dm flakey: check for null arg_name in parse_features()
  dm thin: extend thinpool status format string with omitted fields
  dm thin: fixes in thin-provisioning.txt
  dm thin: document representation of &lt;highest mapped sector&gt; when there is none
  dm thin: fix documentation relative to low water mark threshold
  dm cache: be consistent in specifying sectors and SI units in cache.txt
  dm cache: delete obsoleted paragraph in cache.txt
  dm cache: fix grammar in cache-policies.txt
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull device mapper updates from Mike Snitzer:

 - DM core fixes to ensure that bio submission follows a depth-first
   tree walk; this is critical to allow forward progress without the
   need to use the bioset's BIOSET_NEED_RESCUER.

 - Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure.

 - DM core cleanups and improvements to make bio-based DM more efficient
   (e.g. reduced memory footprint as well as leveraging per-bio-data more).

 - Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages
   the more direct IO submission path in the block layer; this mode is
   used by DM multipath and also optimizes targets like DM thin-pool
   that stack directly on an NVMe data device.

 - DM multipath improvements to factor out legacy SCSI-only (e.g.
   scsi_dh) code paths to allow for more optimized support for NVMe
   multipath.

 - A fix for DM multipath path selectors (service-time and queue-length)
   to select paths in a more balanced way; largely academic but doesn't
   hurt.

 - Numerous DM raid target fixes and improvements.

 - Add a new DM "unstriped" target that enables Intel to work around
   firmware limitations in some NVMe drives that are striped internally
   (this target also works when stacked above the DM "striped" target).

 - Various Documentation fixes and improvements.

 - Misc cleanups and fixes across various DM infrastructure and targets
   (e.g. bufio, flakey, log-writes, snapshot).

* tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (69 commits)
  dm cache: Documentation: update default migration_throttling value
  dm mpath selector: more evenly distribute ties
  dm unstripe: fix target length versus number of stripes size check
  dm thin: fix trailing semicolon in __remap_and_issue_shared_cell
  dm table: fix NVMe bio-based dm_table_determine_type() validation
  dm: various cleanups to md-&gt;queue initialization code
  dm mpath: delay the retry of a request if the target responded as busy
  dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIRED
  dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failure
  dm log writes: fix max length used for kstrndup
  dm: backfill missing calls to mutex_destroy()
  dm snapshot: use mutex instead of rw_semaphore
  dm flakey: check for null arg_name in parse_features()
  dm thin: extend thinpool status format string with omitted fields
  dm thin: fixes in thin-provisioning.txt
  dm thin: document representation of &lt;highest mapped sector&gt; when there is none
  dm thin: fix documentation relative to low water mark threshold
  dm cache: be consistent in specifying sectors and SI units in cache.txt
  dm cache: delete obsoleted paragraph in cache.txt
  dm cache: fix grammar in cache-policies.txt
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: various cleanups to md-&gt;queue initialization code</title>
<updated>2018-01-29T18:44:55+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2018-01-12T14:32:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c12c9a3c3860c76ba273798c0c34c6f1294cc759'/>
<id>c12c9a3c3860c76ba273798c0c34c6f1294cc759</id>
<content type='text'>
Also, add dm_sysfs_init() error handling to dm_create().

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Also, add dm_sysfs_init() error handling to dm_create().

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: backfill missing calls to mutex_destroy()</title>
<updated>2018-01-17T14:16:15+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2018-01-06T02:17:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d5ffebdd797a7c1c89576267640f671db2a668fc'/>
<id>d5ffebdd797a7c1c89576267640f671db2a668fc</id>
<content type='text'>
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: fix incomplete request_queue initialization</title>
<updated>2018-01-15T15:54:32+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2018-01-09T01:03:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c100ec49fdd2222836ff8a17c7bfcc7611d2ee2b'/>
<id>c100ec49fdd2222836ff8a17c7bfcc7611d2ee2b</id>
<content type='text'>
DM is no longer prone to having its request_queue be improperly
initialized.

Summary of changes:

- defer DM's blk_register_queue() from add_disk()-time until
  dm_setup_md_queue() by using add_disk_no_queue_reg() in alloc_dev().

- dm_setup_md_queue() is updated to fully initialize DM's request_queue
  (_after_ all table loads have occurred and the request_queue's type,
  features and limits are known).

A very welcome side-effect of these changes is DM no longer needs to:
1) backfill the "mq" sysfs entry (because historically DM didn't
initialize the request_queue to use blk-mq until _after_
blk_register_queue() was called via add_disk()).
2) call elv_register_queue() to get .request_fn request-based DM
device's "iosched" exposed in sysfs.

In addition, blk-mq debugfs support is now made available because
request-based DM's blk-mq request_queue is now properly initialized
before dm_setup_md_queue() calls blk_register_queue().

These changes also stave off the need to introduce new DM-specific
workarounds in block core, e.g. this proposal:
https://patchwork.kernel.org/patch/10067961/

In the end DM devices should be less unicorn in nature (relative to
initialization and availability of block core infrastructure provided by
the request_queue).

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Tested-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
DM is no longer prone to having its request_queue be improperly
initialized.

Summary of changes:

- defer DM's blk_register_queue() from add_disk()-time until
  dm_setup_md_queue() by using add_disk_no_queue_reg() in alloc_dev().

- dm_setup_md_queue() is updated to fully initialize DM's request_queue
  (_after_ all table loads have occurred and the request_queue's type,
  features and limits are known).

A very welcome side-effect of these changes is DM no longer needs to:
1) backfill the "mq" sysfs entry (because historically DM didn't
initialize the request_queue to use blk-mq until _after_
blk_register_queue() was called via add_disk()).
2) call elv_register_queue() to get .request_fn request-based DM
device's "iosched" exposed in sysfs.

In addition, blk-mq debugfs support is now made available because
request-based DM's blk-mq request_queue is now properly initialized
before dm_setup_md_queue() calls blk_register_queue().

These changes also stave off the need to introduce new DM-specific
workarounds in block core, e.g. this proposal:
https://patchwork.kernel.org/patch/10067961/

In the end DM devices should be less unicorn in nature (relative to
initialization and availability of block core infrastructure provided by
the request_queue).

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Tested-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: limit the max bio size as BIO_MAX_PAGES * PAGE_SIZE</title>
<updated>2018-01-06T16:18:00+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2017-12-18T12:22:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8f50e358153dd68182c714626be4a90b64179cf4'/>
<id>8f50e358153dd68182c714626be4a90b64179cf4</id>
<content type='text'>
For bio-based DM, some targets, such as the crypt target, aren't ready
to deal with an incoming bio bigger than 1 Mbyte.

Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: dm-devel@redhat.com
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For bio-based DM, some targets, such as the crypt target, aren't ready
to deal with an incoming bio bigger than 1 Mbyte.

Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: dm-devel@redhat.com
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: optimize bio-based NVMe IO submission</title>
<updated>2017-12-20T15:51:11+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2017-12-09T20:16:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=978e51ba38e00e9da09b3ef9ed8c94af7b55a1eb'/>
<id>978e51ba38e00e9da09b3ef9ed8c94af7b55a1eb</id>
<content type='text'>
Upper level bio-based drivers that stack immediately on top of NVMe can
leverage direct_make_request().  In addition, DM's NVMe bio-based mode
will initially only ever have one NVMe device that it submits IO to at a
time.  There is no splitting needed.  Enhance DM core so that
DM_TYPE_NVME_BIO_BASED's IO submission takes advantage of both of these
characteristics.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Upper level bio-based drivers that stack immediately on top of NVMe can
leverage direct_make_request().  In addition, DM's NVMe bio-based mode
will initially only ever have one NVMe device that it submits IO to at a
time.  There is no splitting needed.  Enhance DM core so that
DM_TYPE_NVME_BIO_BASED's IO submission takes advantage of both of these
characteristics.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dm: introduce DM_TYPE_NVME_BIO_BASED</title>
<updated>2017-12-20T15:51:10+00:00</updated>
<author>
<name>Mike Snitzer</name>
<email>snitzer@redhat.com</email>
</author>
<published>2017-12-05T02:07:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=22c11858e8002592c59ebb762e4e42dc634bf84f'/>
<id>22c11858e8002592c59ebb762e4e42dc634bf84f</id>
<content type='text'>
If dm_table_determine_type() establishes DM_TYPE_NVME_BIO_BASED then
none of the devices in the DM table support partial completions.  Also,
the table has a single immutable target that doesn't require DM core to
split bios.

This will enable adding NVMe optimizations to bio-based DM.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If dm_table_determine_type() establishes DM_TYPE_NVME_BIO_BASED then
none of the devices in the DM table support partial completions.  Also,
the table has a single immutable target that doesn't require DM core to
split bios.

This will enable adding NVMe optimizations to bio-based DM.

Signed-off-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
