<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/md/md.h, branch v4.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>md: separate request handling</title>
<updated>2017-09-28T03:07:40+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-09-21T17:23:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=393debc23c7820211d1c8253dd6a8408a7628fe7'/>
<id>393debc23c7820211d1c8253dd6a8408a7628fe7</id>
<content type='text'>
With commit cc27b0c78c79, pers-&gt;make_request could bail out without handling
the bio. If that happens, we should retry.  The commit fixes md_make_request
but not other call sites. Separate the request handling part, so other call
sites can use it.

Reported-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Fix: cc27b0c78c79(md: fix deadlock between mddev_suspend() and md_write_start())
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With commit cc27b0c78c79, pers-&gt;make_request could bail out without handling
the bio. If that happens, we should retry.  The commit fixes md_make_request
but not other call sites. Separate the request handling part, so other call
sites can use it.

Reported-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Fix: cc27b0c78c79(md: fix deadlock between mddev_suspend() and md_write_start())
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md</title>
<updated>2017-09-07T19:41:48+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-09-07T19:41:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3645e6d0dc80be4376f87acc9ee527768387c909'/>
<id>3645e6d0dc80be4376f87acc9ee527768387c909</id>
<content type='text'>
Pull MD updates from Shaohua Li:
 "This update mainly fixes bugs:

   - Make raid5 ppl support several ppl from Pawel

   - Several raid5-cache bug fixes from Song

   - Bitmap fixes from Neil and Me

   - One raid1/10 regression fix since 4.12 from Me

   - Other small fixes and cleanup"

* tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md/bitmap: disable bitmap_resize for file-backed bitmaps.
  raid5-ppl: Recovery support for multiple partial parity logs
  md: Runtime support for multiple ppls
  md/raid0: attach correct cgroup info in bio
  lib/raid6: align AVX512 constants to 512 bits, not bytes
  raid5: remove raid5_build_block
  md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_show
  md: replace seq_release_private with seq_release
  md: notify about new spare disk in the container
  md/raid1/10: reset bio allocated from mempool
  md/raid5: release/flush io in raid5_do_work()
  md/bitmap: copy correct data for bitmap super
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull MD updates from Shaohua Li:
 "This update mainly fixes bugs:

   - Make raid5 ppl support several ppl from Pawel

   - Several raid5-cache bug fixes from Song

   - Bitmap fixes from Neil and Me

   - One raid1/10 regression fix since 4.12 from Me

   - Other small fixes and cleanup"

* tag 'md/4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  md/bitmap: disable bitmap_resize for file-backed bitmaps.
  raid5-ppl: Recovery support for multiple partial parity logs
  md: Runtime support for multiple ppls
  md/raid0: attach correct cgroup info in bio
  lib/raid6: align AVX512 constants to 512 bits, not bytes
  raid5: remove raid5_build_block
  md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_show
  md: replace seq_release_private with seq_release
  md: notify about new spare disk in the container
  md/raid1/10: reset bio allocated from mempool
  md/raid5: release/flush io in raid5_do_work()
  md/bitmap: copy correct data for bitmap super
</pre>
</div>
</content>
</entry>
<entry>
<title>md: Runtime support for multiple ppls</title>
<updated>2017-08-28T14:45:48+00:00</updated>
<author>
<name>Pawel Baldysiak</name>
<email>pawel.baldysiak@intel.com</email>
</author>
<published>2017-08-16T15:13:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ddc088238cd6988bb4ac3776f403d7ff9d3c7a63'/>
<id>ddc088238cd6988bb4ac3776f403d7ff9d3c7a63</id>
<content type='text'>
Increase PPL area to 1MB and use it as circular buffer to store PPL. The
entry with highest generation number is the latest one. If PPL to be
written is larger then space left in a buffer, rewind the buffer to the
start (don't wrap it).

Signed-off-by: Pawel Baldysiak &lt;pawel.baldysiak@intel.com&gt;
Signed-off-by: Artur Paszkiewicz &lt;artur.paszkiewicz@intel.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Increase PPL area to 1MB and use it as circular buffer to store PPL. The
entry with highest generation number is the latest one. If PPL to be
written is larger then space left in a buffer, rewind the buffer to the
start (don't wrap it).

Signed-off-by: Pawel Baldysiak &lt;pawel.baldysiak@intel.com&gt;
Signed-off-by: Artur Paszkiewicz &lt;artur.paszkiewicz@intel.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: replace bi_bdev with a gendisk pointer and partitions index</title>
<updated>2017-08-23T18:49:55+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2017-08-23T17:10:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=74d46992e0d9dee7f1f376de0d56d31614c8a17a'/>
<id>74d46992e0d9dee7f1f376de0d56d31614c8a17a</id>
<content type='text'>
This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: raid1-10: move raid1/raid10 common code into raid1-10.c</title>
<updated>2017-07-21T19:47:20+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2017-07-14T08:14:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=be453e7761d0e72d8a1b2fcfde6d1a7e53881190'/>
<id>be453e7761d0e72d8a1b2fcfde6d1a7e53881190</id>
<content type='text'>
No function change, just move 'struct resync_pages' and related
helpers into raid1-10.c

Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
No function change, just move 'struct resync_pages' and related
helpers into raid1-10.c

Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: remove 'idx' from 'struct resync_pages'</title>
<updated>2017-07-21T19:47:20+00:00</updated>
<author>
<name>Ming Lei</name>
<email>ming.lei@redhat.com</email>
</author>
<published>2017-07-14T08:14:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=022e510fcbda79183fd2cdc01abb01b4be80d03f'/>
<id>022e510fcbda79183fd2cdc01abb01b4be80d03f</id>
<content type='text'>
bio_add_page() won't fail for resync bio, and the page index for each
bio is same, so remove it.

More importantly the 'idx' of 'struct resync_pages' is initialized in
mempool allocator function, the current way is wrong since mempool is
only responsible for allocation, we can't use that for initialization.

Suggested-by: NeilBrown &lt;neilb@suse.com&gt;
Reported-by: NeilBrown &lt;neilb@suse.com&gt;
Reported-and-tested-by: Patrick &lt;dto@gmx.net&gt;
Fixes: f0250618361d(md: raid10: don't use bio's vec table to manage resync pages)
Fixes: 98d30c5812c3(md: raid1: don't use bio's vec table to manage resync pages)
Cc: stable@vger.kernel.org (4.12+)
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bio_add_page() won't fail for resync bio, and the page index for each
bio is same, so remove it.

More importantly the 'idx' of 'struct resync_pages' is initialized in
mempool allocator function, the current way is wrong since mempool is
only responsible for allocation, we can't use that for initialization.

Suggested-by: NeilBrown &lt;neilb@suse.com&gt;
Reported-by: NeilBrown &lt;neilb@suse.com&gt;
Reported-and-tested-by: Patrick &lt;dto@gmx.net&gt;
Fixes: f0250618361d(md: raid10: don't use bio's vec table to manage resync pages)
Fixes: 98d30c5812c3(md: raid1: don't use bio's vec table to manage resync pages)
Cc: stable@vger.kernel.org (4.12+)
Signed-off-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/bitmap: don't read page from device with Bitmap_sync</title>
<updated>2017-07-10T17:30:41+00:00</updated>
<author>
<name>Guoqing Jiang</name>
<email>gqjiang@suse.com</email>
</author>
<published>2017-07-04T03:20:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4aaf7694f841edc96fe0f72958aabe59204b3611'/>
<id>4aaf7694f841edc96fe0f72958aabe59204b3611</id>
<content type='text'>
The device owns Bitmap_sync flag needs recovery
to become in sync, and read page from this type
device could get stale status.

Also add comments for Bitmap_sync bit per the
suggestion from Shaohua and Neil.

Previous disscussion can be found here:
https://marc.info/?t=149760428900004&amp;r=1&amp;w=2

Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The device owns Bitmap_sync flag needs recovery
to become in sync, and read page from this type
device could get stale status.

Also add comments for Bitmap_sync bit per the
suggestion from Shaohua and Neil.

Previous disscussion can be found here:
https://marc.info/?t=149760428900004&amp;r=1&amp;w=2

Signed-off-by: Guoqing Jiang &lt;gqjiang@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: use a separate bio_set for synchronous IO.</title>
<updated>2017-06-21T17:52:36+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-06-20T23:12:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5a85071c2cbcc7d8d8f764b33bf64c76e47d268d'/>
<id>5a85071c2cbcc7d8d8f764b33bf64c76e47d268d</id>
<content type='text'>
md devices allocate a bio_set and use it for two
distinct purposes.
mddev-&gt;bio_set is used to clone bios as part of sending
upper level requests down to lower level devices,
and it is also use for synchronous IO such as superblock
and bitmap updates, and for correcting read errors.

This multiple usage can lead to deadlocks.  It is likely
that cloned bios might be queued for write and to be
waiting for a metadata update before the write can be permitted.
If the cloning exhausted mddev-&gt;bio_set, the metadata update
may not be able to proceed.

This scenario has been seen during heavy testing, with lots of IO and
lots of memory pressure.

Address this by adding a new bio_set specifically for synchronous IO.
All synchronous IO goes directly to the underlying device and is not
queued at the md level, so request using entries from the new
mddev-&gt;sync_set will complete in a timely fashion.
Requests that use mddev-&gt;bio_set will sometimes need to wait
for synchronous IO, but will no longer risk deadlocking that iO.

Also: small simplification in mddev_put(): there is no need to
wait until the spinlock is released before calling bioset_free().

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
md devices allocate a bio_set and use it for two
distinct purposes.
mddev-&gt;bio_set is used to clone bios as part of sending
upper level requests down to lower level devices,
and it is also use for synchronous IO such as superblock
and bitmap updates, and for correcting read errors.

This multiple usage can lead to deadlocks.  It is likely
that cloned bios might be queued for write and to be
waiting for a metadata update before the write can be permitted.
If the cloning exhausted mddev-&gt;bio_set, the metadata update
may not be able to proceed.

This scenario has been seen during heavy testing, with lots of IO and
lots of memory pressure.

Address this by adding a new bio_set specifically for synchronous IO.
All synchronous IO goes directly to the underlying device and is not
queued at the md level, so request using entries from the new
mddev-&gt;sync_set will complete in a timely fashion.
Requests that use mddev-&gt;bio_set will sometimes need to wait
for synchronous IO, but will no longer risk deadlocking that iO.

Also: small simplification in mddev_put(): there is no need to
wait until the spinlock is released before calling bioset_free().

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: fix deadlock between mddev_suspend() and md_write_start()</title>
<updated>2017-06-13T17:18:01+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-06-05T06:49:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cc27b0c78c79680d128dbac79de0d40556d041bb'/>
<id>cc27b0c78c79680d128dbac79de0d40556d041bb</id>
<content type='text'>
If mddev_suspend() races with md_write_start() we can deadlock
with mddev_suspend() waiting for the request that is currently
in md_write_start() to complete the -&gt;make_request() call,
and md_write_start() waiting for the metadata to be updated
to mark the array as 'dirty'.
As metadata updates done by md_check_recovery() only happen then
the mddev_lock() can be claimed, and as mddev_suspend() is often
called with the lock held, these threads wait indefinitely for each
other.

We fix this by having md_write_start() abort if mddev_suspend()
is happening, and -&gt;make_request() aborts if md_write_start()
aborted.
md_make_request() can detect this abort, decrease the -&gt;active_io
count, and wait for mddev_suspend().

Reported-by: Nix &lt;nix@esperi.org.uk&gt;
Fix: 68866e425be2(MD: no sync IO while suspended)
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If mddev_suspend() races with md_write_start() we can deadlock
with mddev_suspend() waiting for the request that is currently
in md_write_start() to complete the -&gt;make_request() call,
and md_write_start() waiting for the metadata to be updated
to mark the array as 'dirty'.
As metadata updates done by md_check_recovery() only happen then
the mddev_lock() can be claimed, and as mddev_suspend() is often
called with the lock held, these threads wait indefinitely for each
other.

We fix this by having md_write_start() abort if mddev_suspend()
is happening, and -&gt;make_request() aborts if md_write_start()
aborted.
md_make_request() can detect this abort, decrease the -&gt;active_io
count, and wait for mddev_suspend().

Reported-by: Nix &lt;nix@esperi.org.uk&gt;
Fix: 68866e425be2(MD: no sync IO while suspended)
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: initialise -&gt;writes_pending in personality modules.</title>
<updated>2017-06-05T23:04:35+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-06-05T06:05:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a415c0f10627913793709ddb75add09d2ea334dc'/>
<id>a415c0f10627913793709ddb75add09d2ea334dc</id>
<content type='text'>
The new per-cpu counter for writes_pending is initialised in
md_alloc(), which is not called by dm-raid.
So dm-raid fails when md_write_start() is called.

Move the initialization to the personality modules
that need it.  This way it is always initialised when needed,
but isn't unnecessarily initialized (requiring memory allocation)
when the personality doesn't use writes_pending.

Reported-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The new per-cpu counter for writes_pending is initialised in
md_alloc(), which is not called by dm-raid.
So dm-raid fails when md_write_start() is called.

Move the initialization to the personality modules
that need it.  This way it is always initialised when needed,
but isn't unnecessarily initialized (requiring memory allocation)
when the personality doesn't use writes_pending.

Reported-by: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
