linux-stable.git/drivers/md, branch v4.13.2

dm mpath: do not lock up a CPU with requeuing activity

2017-08-28T13:58:27+00:00

When using the block layer in single queue mode, get_request()
returns ERR_PTR(-EAGAIN) if the queue is dying and the REQ_NOWAIT
flag has been passed to get_request(). Avoid that the kernel
reports soft lockup complaints in this case due to continuous
requeuing activity.

Fixes: 7083abbbf ("dm mpath: avoid that path removal can trigger an infinite loop")
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche 
Tested-by: Laurence Oberman 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Mike Snitzer

dm: fix printk() rate limiting code

2017-08-28T13:58:27+00:00

Using the same rate limiting state for different kinds of messages
is wrong because this can cause a high frequency message to suppress
a report of a low frequency message. Hence use a unique rate limiting
state per message type.

Fixes: 71a16736a15e ("dm: use local printk ratelimit")
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche 
Signed-off-by: Mike Snitzer

dm mpath: retry BLK_STS_RESOURCE errors

2017-08-28T13:58:26+00:00

Retry requests instead of failing them if an out-of-memory error occurs
or the block driver below dm-mpath is busy.  This restores the v4.12
behavior of noretry_error(), namely that -ENOMEM results in a retry.

Fixes: 2a842acab109 ("block: introduce new block status code type")
Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Mike Snitzer

dm: fix the second dec_pending() argument in __split_and_process_bio()

2017-08-28T13:36:19+00:00

Detected by sparse.

Fixes: 4e4cbee93d56 ("block: switch bios to blk_status_t")
Signed-off-by: Bart Van Assche 
Reviewed-by: Christoph Hellwig 
Tested-by: Laurence Oberman 
Signed-off-by: Mike Snitzer

MD: not clear ->safemode for external metadata array

2017-08-12T03:42:06+00:00

->safemode should be triggered by mdadm for external metadaa array, otherwise
array's state confuses mdadm.

Fixes: 33182d15c6bf(md: always clear ->safemode when md_check_recovery gets the mddev lock.)
Cc: NeilBrown 
Signed-off-by: Shaohua Li

md/r5cache: fix io_unit handling in r5l_log_endio()

2017-08-08T14:42:37+00:00

In r5l_log_endio(), once log->io_list_lock is released, the io unit
may be accessed (or even freed) by other threads. Current code
doesn't handle the io_unit properly, which leads to potential race
conditions.

This patch solves this race condition by:

1. Add a pending_stripe count flush_payload. Multiple flush_payloads
   are counted as only one pending_stripe. Flag has_flush_payload is
   added to show whether the io unit has flush_payload;
2. In r5l_log_endio(), check flags has_null_flush and
   has_flush_payload with log->io_list_lock held. After the lock
   is released, this IO unit is only accessed when we know the
   pending_stripe counter cannot be zeroed by other threads.

Signed-off-by: Song Liu 
Signed-off-by: Shaohua Li

md/r5cache: call mddev_lock/unlock() in r5c_journal_mode_set

2017-08-08T14:42:36+00:00

In r5c_journal_mode_set(), it is necessary to call mddev_lock()
before accessing conf and conf->log. Otherwise, the conf->log
may change (and become NULL).

Shaohua: fix unlock in failure cases

Signed-off-by: Song Liu 
Signed-off-by: Shaohua Li

md: fix test in md_write_start()

2017-08-08T14:42:36+00:00

md_write_start() needs to clear the in_sync flag is it is set, or if
there might be a race with set_in_sync() such that the later will
set it very soon.  In the later case it is sufficient to take the
spinlock to synchronize with set_in_sync(), and then set the flag
if needed.

The current test is incorrect.
It should be:
  if "flag is set" or "race is possible"

"flag is set" is trivially "mddev->in_sync".
"race is possible" should be tested by "mddev->sync_checkers".

If sync_checkers is 0, then there can be no race.  set_in_sync() will
wait in percpu_ref_switch_to_atomic_sync() for an RCU grace period,
and as md_write_start() holds the rcu_read_lock(), set_in_sync() will
be sure ot see the update to writes_pending.

If sync_checkers is > 0, there could be race.  If md_write_start()
happened entirely between
		if (!mddev->in_sync &&
		    percpu_ref_is_zero(&mddev->writes_pending)) {
and
			mddev->in_sync = 1;
in set_in_sync(), then it would not see that is_sync had been set,
and set_in_sync() would not see that writes_pending had been
incremented.

This bug means that in_sync is sometimes not set when it should be.
Consequently there is a small chance that the array will be marked as
"clean" when in fact it is inconsistent.

Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending")
cc: stable@vger.kernel.org (v4.12+)
Signed-off-by: NeilBrown 
Signed-off-by: Shaohua Li

md: always clear ->safemode when md_check_recovery gets the mddev lock.

2017-08-08T14:42:35+00:00

If ->safemode == 1, md_check_recovery() will try to get the mddev lock
and perform various other checks.
If mddev->in_sync is zero, it will call set_in_sync, and clear
->safemode.  However if mddev->in_sync is not zero, ->safemode will not
be cleared.

When md_check_recovery() drops the mddev lock, the thread is woken
up again.  Normally it would just check if there was anything else to
do, find nothing, and go to sleep.  However as ->safemode was not
cleared, it will take the mddev lock again, then wake itself up
when unlocking.

This results in an infinite loop, repeatedly calling
md_check_recovery(), which RCU or the soft-lockup detector
will eventually complain about.

Prior to commit 4ad23a976413 ("MD: use per-cpu counter for
writes_pending"), safemode would only be set to one when the
writes_pending counter reached zero, and would be cleared again
when writes_pending is incremented.  Since that patch, safemode
is set more freely, but is not reliably cleared.

So in md_check_recovery() clear ->safemode before checking ->in_sync.

Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending")
Cc: stable@vger.kernel.org (4.12+)
Reported-by: Dominik Brodowski 
Reported-by: David R 
Signed-off-by: NeilBrown 
Signed-off-by: Shaohua Li

Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md

2017-07-28T19:24:21+00:00

Pull MD fixes from Shaohua Li:
 "This fixes several bugs, three of them are marked for stable:

   - an initialization issue fixed by Ming

   - a bio clone race issue fixed by me

   - an async tx flush issue fixed by Ofer

   - other cleanups"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  MD: fix warnning for UP case
  md/raid5: add thread_group worker async_tx_issue_pending_all
  md: simplify code with bio_io_error
  md/raid1: fix writebehind bio clone
  md: raid1-10: move raid1/raid10 common code into raid1-10.c
  md: raid1/raid10: initialize bvec table via bio_add_page()
  md: remove 'idx' from 'struct resync_pages'