<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/md/md.c, branch linux-4.13.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>md: separate request handling</title>
<updated>2017-10-05T07:47:34+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-09-21T17:23:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8d73e57d868eb8f4ce1f8ed1addd57fe8d0c080f'/>
<id>8d73e57d868eb8f4ce1f8ed1addd57fe8d0c080f</id>
<content type='text'>
commit 393debc23c7820211d1c8253dd6a8408a7628fe7 upstream.

With commit cc27b0c78c79, pers-&gt;make_request could bail out without handling
the bio. If that happens, we should retry.  The commit fixes md_make_request
but not other call sites. Separate the request handling part, so other call
sites can use it.

Reported-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Fix: cc27b0c78c79(md: fix deadlock between mddev_suspend() and md_write_start())
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 393debc23c7820211d1c8253dd6a8408a7628fe7 upstream.

With commit cc27b0c78c79, pers-&gt;make_request could bail out without handling
the bio. If that happens, we should retry.  The commit fixes md_make_request
but not other call sites. Separate the request handling part, so other call
sites can use it.

Reported-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Fix: cc27b0c78c79(md: fix deadlock between mddev_suspend() and md_write_start())
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: fix a race condition for flush request handling</title>
<updated>2017-10-05T07:47:34+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-09-21T16:55:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=93f1f1b25b32e591c9c3b03e3e338d494cf21068'/>
<id>93f1f1b25b32e591c9c3b03e3e338d494cf21068</id>
<content type='text'>
commit 79bf31a3b2a7ca467cfec8ff97d359a77065d01f upstream.

md_submit_flush_data calls pers-&gt;make_request, which missed the suspend check.
Fix it with the new md_handle_request API.

Reported-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Tested-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Fix: cc27b0c78c79(md: fix deadlock between mddev_suspend() and md_write_start())
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 79bf31a3b2a7ca467cfec8ff97d359a77065d01f upstream.

md_submit_flush_data calls pers-&gt;make_request, which missed the suspend check.
Fix it with the new md_handle_request API.

Reported-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Tested-by: Nate Dailey &lt;nate.dailey@stratus.com&gt;
Fix: cc27b0c78c79(md: fix deadlock between mddev_suspend() and md_write_start())
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>MD: not clear -&gt;safemode for external metadata array</title>
<updated>2017-08-12T03:42:06+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-08-12T03:34:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=afc1f55ca44e257f69da8f43e0714a76686ae8d1'/>
<id>afc1f55ca44e257f69da8f43e0714a76686ae8d1</id>
<content type='text'>
-&gt;safemode should be triggered by mdadm for external metadaa array, otherwise
array's state confuses mdadm.

Fixes: 33182d15c6bf(md: always clear -&gt;safemode when md_check_recovery gets the mddev lock.)
Cc: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
-&gt;safemode should be triggered by mdadm for external metadaa array, otherwise
array's state confuses mdadm.

Fixes: 33182d15c6bf(md: always clear -&gt;safemode when md_check_recovery gets the mddev lock.)
Cc: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: fix test in md_write_start()</title>
<updated>2017-08-08T14:42:36+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-08-08T06:56:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=81fe48e9aa00bdd509bd3c37a76d1132da6b9f09'/>
<id>81fe48e9aa00bdd509bd3c37a76d1132da6b9f09</id>
<content type='text'>
md_write_start() needs to clear the in_sync flag is it is set, or if
there might be a race with set_in_sync() such that the later will
set it very soon.  In the later case it is sufficient to take the
spinlock to synchronize with set_in_sync(), and then set the flag
if needed.

The current test is incorrect.
It should be:
  if "flag is set" or "race is possible"

"flag is set" is trivially "mddev-&gt;in_sync".
"race is possible" should be tested by "mddev-&gt;sync_checkers".

If sync_checkers is 0, then there can be no race.  set_in_sync() will
wait in percpu_ref_switch_to_atomic_sync() for an RCU grace period,
and as md_write_start() holds the rcu_read_lock(), set_in_sync() will
be sure ot see the update to writes_pending.

If sync_checkers is &gt; 0, there could be race.  If md_write_start()
happened entirely between
		if (!mddev-&gt;in_sync &amp;&amp;
		    percpu_ref_is_zero(&amp;mddev-&gt;writes_pending)) {
and
			mddev-&gt;in_sync = 1;
in set_in_sync(), then it would not see that is_sync had been set,
and set_in_sync() would not see that writes_pending had been
incremented.

This bug means that in_sync is sometimes not set when it should be.
Consequently there is a small chance that the array will be marked as
"clean" when in fact it is inconsistent.

Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending")
cc: stable@vger.kernel.org (v4.12+)
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
md_write_start() needs to clear the in_sync flag is it is set, or if
there might be a race with set_in_sync() such that the later will
set it very soon.  In the later case it is sufficient to take the
spinlock to synchronize with set_in_sync(), and then set the flag
if needed.

The current test is incorrect.
It should be:
  if "flag is set" or "race is possible"

"flag is set" is trivially "mddev-&gt;in_sync".
"race is possible" should be tested by "mddev-&gt;sync_checkers".

If sync_checkers is 0, then there can be no race.  set_in_sync() will
wait in percpu_ref_switch_to_atomic_sync() for an RCU grace period,
and as md_write_start() holds the rcu_read_lock(), set_in_sync() will
be sure ot see the update to writes_pending.

If sync_checkers is &gt; 0, there could be race.  If md_write_start()
happened entirely between
		if (!mddev-&gt;in_sync &amp;&amp;
		    percpu_ref_is_zero(&amp;mddev-&gt;writes_pending)) {
and
			mddev-&gt;in_sync = 1;
in set_in_sync(), then it would not see that is_sync had been set,
and set_in_sync() would not see that writes_pending had been
incremented.

This bug means that in_sync is sometimes not set when it should be.
Consequently there is a small chance that the array will be marked as
"clean" when in fact it is inconsistent.

Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending")
cc: stable@vger.kernel.org (v4.12+)
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: always clear -&gt;safemode when md_check_recovery gets the mddev lock.</title>
<updated>2017-08-08T14:42:35+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-08-08T06:56:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=33182d15c6bf182f7ae32a66ea4a547d979cd6d7'/>
<id>33182d15c6bf182f7ae32a66ea4a547d979cd6d7</id>
<content type='text'>
If -&gt;safemode == 1, md_check_recovery() will try to get the mddev lock
and perform various other checks.
If mddev-&gt;in_sync is zero, it will call set_in_sync, and clear
-&gt;safemode.  However if mddev-&gt;in_sync is not zero, -&gt;safemode will not
be cleared.

When md_check_recovery() drops the mddev lock, the thread is woken
up again.  Normally it would just check if there was anything else to
do, find nothing, and go to sleep.  However as -&gt;safemode was not
cleared, it will take the mddev lock again, then wake itself up
when unlocking.

This results in an infinite loop, repeatedly calling
md_check_recovery(), which RCU or the soft-lockup detector
will eventually complain about.

Prior to commit 4ad23a976413 ("MD: use per-cpu counter for
writes_pending"), safemode would only be set to one when the
writes_pending counter reached zero, and would be cleared again
when writes_pending is incremented.  Since that patch, safemode
is set more freely, but is not reliably cleared.

So in md_check_recovery() clear -&gt;safemode before checking -&gt;in_sync.

Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending")
Cc: stable@vger.kernel.org (4.12+)
Reported-by: Dominik Brodowski &lt;linux@dominikbrodowski.net&gt;
Reported-by: David R &lt;david@unsolicited.net&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If -&gt;safemode == 1, md_check_recovery() will try to get the mddev lock
and perform various other checks.
If mddev-&gt;in_sync is zero, it will call set_in_sync, and clear
-&gt;safemode.  However if mddev-&gt;in_sync is not zero, -&gt;safemode will not
be cleared.

When md_check_recovery() drops the mddev lock, the thread is woken
up again.  Normally it would just check if there was anything else to
do, find nothing, and go to sleep.  However as -&gt;safemode was not
cleared, it will take the mddev lock again, then wake itself up
when unlocking.

This results in an infinite loop, repeatedly calling
md_check_recovery(), which RCU or the soft-lockup detector
will eventually complain about.

Prior to commit 4ad23a976413 ("MD: use per-cpu counter for
writes_pending"), safemode would only be set to one when the
writes_pending counter reached zero, and would be cleared again
when writes_pending is incremented.  Since that patch, safemode
is set more freely, but is not reliably cleared.

So in md_check_recovery() clear -&gt;safemode before checking -&gt;in_sync.

Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending")
Cc: stable@vger.kernel.org (4.12+)
Reported-by: Dominik Brodowski &lt;linux@dominikbrodowski.net&gt;
Reported-by: David R &lt;david@unsolicited.net&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>MD: fix warnning for UP case</title>
<updated>2017-07-25T22:18:13+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-07-25T22:18:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ed9b66d21866ae3bf16557406258ebe6c00a9a84'/>
<id>ed9b66d21866ae3bf16557406258ebe6c00a9a84</id>
<content type='text'>
spin_is_locked always returns 0 for UP case, so ignores it

Reported-by: Joshua Kinard &lt;kumba@gentoo.org&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
spin_is_locked always returns 0 for UP case, so ignores it

Reported-by: Joshua Kinard &lt;kumba@gentoo.org&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md</title>
<updated>2017-07-08T19:50:18+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2017-07-08T19:50:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=026d15f6b9878794fae1f794cae881ccd65052e5'/>
<id>026d15f6b9878794fae1f794cae881ccd65052e5</id>
<content type='text'>
Pull MD update from Shaohua Li:

 - fixed deadlock in MD suspend and a potential bug in bio allocation
   (Neil Brown)

 - fixed signal issue (Mikulas Patocka)

 - fixed typo in FailFast test (Guoqing Jiang)

 - other trival fixes

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  MD: fix sleep in atomic
  MD: fix a null dereference
  md: use a separate bio_set for synchronous IO.
  md: change the initialization value for a spare device spot to MD_DISK_ROLE_SPARE
  md/raid1: remove unused bio in sync_request_write
  md/raid10: fix FailFast test for wrong device
  md: don't use flush_signals in userspace processes
  md: fix deadlock between mddev_suspend() and md_write_start()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull MD update from Shaohua Li:

 - fixed deadlock in MD suspend and a potential bug in bio allocation
   (Neil Brown)

 - fixed signal issue (Mikulas Patocka)

 - fixed typo in FailFast test (Guoqing Jiang)

 - other trival fixes

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  MD: fix sleep in atomic
  MD: fix a null dereference
  md: use a separate bio_set for synchronous IO.
  md: change the initialization value for a spare device spot to MD_DISK_ROLE_SPARE
  md/raid1: remove unused bio in sync_request_write
  md/raid10: fix FailFast test for wrong device
  md: don't use flush_signals in userspace processes
  md: fix deadlock between mddev_suspend() and md_write_start()
</pre>
</div>
</content>
</entry>
<entry>
<title>MD: fix sleep in atomic</title>
<updated>2017-07-03T21:38:59+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-07-03T21:34:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7184ef8bab0cb865c3cea9dd1a675771145df0af'/>
<id>7184ef8bab0cb865c3cea9dd1a675771145df0af</id>
<content type='text'>
bioset_free() will take a mutex, so can't get called with spinlock hold.

Fix: 5a85071c2cbc(md: use a separate bio_set for synchronous IO.)
Cc: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bioset_free() will take a mutex, so can't get called with spinlock hold.

Fix: 5a85071c2cbc(md: use a separate bio_set for synchronous IO.)
Cc: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>MD: fix a null dereference</title>
<updated>2017-06-23T16:19:49+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2017-06-23T16:19:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7f053a6a745557b3f3ad63e9d28ba85c3c0b1563'/>
<id>7f053a6a745557b3f3ad63e9d28ba85c3c0b1563</id>
<content type='text'>
rdev-&gt;mddev could be null in start time.

Reported-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Fix: 5a85071c2cbc(md: use a separate bio_set for synchronous IO.)
Cc: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
rdev-&gt;mddev could be null in start time.

Reported-by: Ming Lei &lt;ming.lei@redhat.com&gt;
Fix: 5a85071c2cbc(md: use a separate bio_set for synchronous IO.)
Cc: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: use a separate bio_set for synchronous IO.</title>
<updated>2017-06-21T17:52:36+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2017-06-20T23:12:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5a85071c2cbcc7d8d8f764b33bf64c76e47d268d'/>
<id>5a85071c2cbcc7d8d8f764b33bf64c76e47d268d</id>
<content type='text'>
md devices allocate a bio_set and use it for two
distinct purposes.
mddev-&gt;bio_set is used to clone bios as part of sending
upper level requests down to lower level devices,
and it is also use for synchronous IO such as superblock
and bitmap updates, and for correcting read errors.

This multiple usage can lead to deadlocks.  It is likely
that cloned bios might be queued for write and to be
waiting for a metadata update before the write can be permitted.
If the cloning exhausted mddev-&gt;bio_set, the metadata update
may not be able to proceed.

This scenario has been seen during heavy testing, with lots of IO and
lots of memory pressure.

Address this by adding a new bio_set specifically for synchronous IO.
All synchronous IO goes directly to the underlying device and is not
queued at the md level, so request using entries from the new
mddev-&gt;sync_set will complete in a timely fashion.
Requests that use mddev-&gt;bio_set will sometimes need to wait
for synchronous IO, but will no longer risk deadlocking that iO.

Also: small simplification in mddev_put(): there is no need to
wait until the spinlock is released before calling bioset_free().

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
md devices allocate a bio_set and use it for two
distinct purposes.
mddev-&gt;bio_set is used to clone bios as part of sending
upper level requests down to lower level devices,
and it is also use for synchronous IO such as superblock
and bitmap updates, and for correcting read errors.

This multiple usage can lead to deadlocks.  It is likely
that cloned bios might be queued for write and to be
waiting for a metadata update before the write can be permitted.
If the cloning exhausted mddev-&gt;bio_set, the metadata update
may not be able to proceed.

This scenario has been seen during heavy testing, with lots of IO and
lots of memory pressure.

Address this by adding a new bio_set specifically for synchronous IO.
All synchronous IO goes directly to the underlying device and is not
queued at the md level, so request using entries from the new
mddev-&gt;sync_set will complete in a timely fashion.
Requests that use mddev-&gt;bio_set will sometimes need to wait
for synchronous IO, but will no longer risk deadlocking that iO.

Also: small simplification in mddev_put(): there is no need to
wait until the spinlock is released before calling bioset_free().

Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
