linux-stable.git/drivers/md, branch v4.9.91

md/raid10: skip spare disk as 'first' disk

2018-03-24T10:00:20+00:00

[ Upstream commit b506335e5d2b4ec687dde392a3bdbf7601778f1d ]

Commit 6f287ca(md/raid10: reset the 'first' at the end of loop) ignores
a case in reshape, the first rdev could be a spare disk, which shouldn't
be accounted as the first disk since it doesn't include the offset info.

Fix: 6f287ca(md/raid10: reset the 'first' at the end of loop)
Cc: Guoqing Jiang 
Cc: NeilBrown 
Signed-off-by: Shaohua Li 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

md/raid10: wait up frozen array in handle_write_completed

2018-03-24T10:00:12+00:00

[ Upstream commit cf25ae78fc50010f66b9be945017796da34c434d ]

Since nr_queued is changed, we need to call wake_up here
if the array is already frozen and waiting for condition
"nr_pending == nr_queued + extra" to be true.

And commit 824e47daddbf ("RAID1: avoid unnecessary spin
locks in I/O barrier code") which has already added the
wake_up for raid1.

Signed-off-by: Guoqing Jiang 
Reviewed-by: NeilBrown 
Signed-off-by: Shaohua Li 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

md.c:didn't unlock the mddev before return EINVAL in array_size_store

2018-03-22T08:17:50+00:00

[ Upstream commit b670883bb9e55ba63a278d83e034faefc01ce2cf ]

md.c: it needs to release the mddev lock before
the array_size_store() returns.

Fixes: ab5a98b132fd ("md-cluster: change array_sectors and update size are not supported")

Signed-off-by: Zhilong Liu 
Reviewed-by: Guoqing Jiang 
Signed-off-by: Shaohua Li 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

md/raid6: Fix anomily when recovering a single device in RAID6.

2018-03-22T08:17:50+00:00

[ Upstream commit 7471fb77ce4dc4cb81291189947fcdf621a97987 ]

When recoverying a single missing/failed device in a RAID6,
those stripes where the Q block is on the missing device are
handled a bit differently.  In these cases it is easy to
check that the P block is correct, so we do.  This results
in the P block be destroy.  Consequently the P block needs
to be read a second time in order to compute Q.  This causes
lots of seeks and hurts performance.

It shouldn't be necessary to re-read P as it can be computed
from the DATA.  But we only compute blocks on missing
devices, since c337869d9501 ("md: do not compute parity
unless it is on a failed drive").

So relax the change made in that commit to allow computing
of the P block in a RAID6 which it is the only missing that
block.

This makes RAID6 recovery run much faster as the disk just
"before" the recovering device is no longer seeking
back-and-forth.

Reported-by-tested-by: Brad Campbell 
Reviewed-by: Dan Williams 
Signed-off-by: NeilBrown 
Signed-off-by: Shaohua Li 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bcache: don't attach backing with duplicate UUID

2018-03-18T10:18:50+00:00

commit 86755b7a96faed57f910f9e6b8061e019ac1ec08 upstream.

This can happen e.g. during disk cloning.

This is an incomplete fix: it does not catch duplicate UUIDs earlier
when things are still unattached.  It does not unregister the device.
Further changes to cope better with this are planned but conflict with
Coly's ongoing improvements to handling device errors.  In the meantime,
one can manually stop the device after this has happened.

Attempts to attach a duplicate device result in:

[  136.372404] loop: module loaded
[  136.424461] bcache: register_bdev() registered backing device loop0
[  136.424464] bcache: bch_cached_dev_attach() Tried to attach loop0 but duplicate UUID already attached

My test procedure is:

  dd if=/dev/sdb1 of=imgfile bs=1024 count=262144
  losetup -f imgfile

Signed-off-by: Michael Lyle 
Reviewed-by: Tang Junhui 
Cc: 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

bcache: fix crashes in duplicate cache device register

2018-03-18T10:18:50+00:00

commit cc40daf91bdddbba72a4a8cd0860640e06668309 upstream.

Kernel crashed when register a duplicate cache device, the call trace is
bellow:
[  417.643790] CPU: 1 PID: 16886 Comm: bcache-register Tainted: G
   W  OE    4.15.5-amd64-preempt-sysrq-20171018 #2
[  417.643861] Hardware name: LENOVO 20ERCTO1WW/20ERCTO1WW, BIOS
N1DET41W (1.15 ) 12/31/2015
[  417.643870] RIP: 0010:bdevname+0x13/0x1e
[  417.643876] RSP: 0018:ffffa3aa9138fd38 EFLAGS: 00010282
[  417.643884] RAX: 0000000000000000 RBX: ffff8c8f2f2f8000 RCX: ffffd6701f8
c7edf
[  417.643890] RDX: ffffa3aa9138fd88 RSI: ffffa3aa9138fd88 RDI: 00000000000
00000
[  417.643895] RBP: ffffa3aa9138fde0 R08: ffffa3aa9138fae8 R09: 00000000000
1850e
[  417.643901] R10: ffff8c8eed34b271 R11: ffff8c8eed34b250 R12: 00000000000
00000
[  417.643906] R13: ffffd6701f78f940 R14: ffff8c8f38f80000 R15: ffff8c8ea7d
90000
[  417.643913] FS:  00007fde7e66f500(0000) GS:ffff8c8f61440000(0000) knlGS:
0000000000000000
[  417.643919] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  417.643925] CR2: 0000000000000314 CR3: 00000007e6fa0001 CR4: 00000000003
606e0
[  417.643931] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 00000000000
00000
[  417.643938] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 00000000000
00400
[  417.643946] Call Trace:
[  417.643978]  register_bcache+0x1117/0x1270 [bcache]
[  417.643994]  ? slab_pre_alloc_hook+0x15/0x3c
[  417.644001]  ? slab_post_alloc_hook.isra.44+0xa/0x1a
[  417.644013]  ? kernfs_fop_write+0xf6/0x138
[  417.644020]  kernfs_fop_write+0xf6/0x138
[  417.644031]  __vfs_write+0x31/0xcc
[  417.644043]  ? current_kernel_time64+0x10/0x36
[  417.644115]  ? __audit_syscall_entry+0xbf/0xe3
[  417.644124]  vfs_write+0xa5/0xe2
[  417.644133]  SyS_write+0x5c/0x9f
[  417.644144]  do_syscall_64+0x72/0x81
[  417.644161]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  417.644169] RIP: 0033:0x7fde7e1c1974
[  417.644175] RSP: 002b:00007fff13009a38 EFLAGS: 00000246 ORIG_RAX: 0000000
000000001
[  417.644183] RAX: ffffffffffffffda RBX: 0000000001658280 RCX: 00007fde7e1c
1974
[  417.644188] RDX: 000000000000000a RSI: 0000000001658280 RDI: 000000000000
0001
[  417.644193] RBP: 000000000000000a R08: 0000000000000003 R09: 000000000000
0077
[  417.644198] R10: 000000000000089e R11: 0000000000000246 R12: 000000000000
0001
[  417.644203] R13: 000000000000000a R14: 7fffffffffffffff R15: 000000000000
0000
[  417.644213] Code: c7 c2 83 6f ee 98 be 20 00 00 00 48 89 df e8 6c 27 3b 0
0 48 89 d8 5b c3 0f 1f 44 00 00 48 8b 47 70 48 89 f2 48 8b bf 80 00 00 00 <8
b> b0 14 03 00 00 e9 73 ff ff ff 0f 1f 44 00 00 48 8b 47 40 39
[  417.644302] RIP: bdevname+0x13/0x1e RSP: ffffa3aa9138fd38
[  417.644306] CR2: 0000000000000314

When registering duplicate cache device in register_cache(), after failure
on calling register_cache_set(), bch_cache_release() will be called, then
bdev will be freed, so bdevname(bdev, name) caused kernel crash.

Since bch_cache_release() will free bdev, so in this patch we make sure
bdev being freed if register_cache() fail, and do not free bdev again in
register_bcache() when register_cache() fail.

Signed-off-by: Tang Junhui 
Reported-by: Marc MERLIN 
Tested-by: Michael Lyle 
Reviewed-by: Michael Lyle 
Cc: 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

md: only allow remove_and_add_spares when no sync_thread running.

2018-03-11T15:21:31+00:00

commit 39772f0a7be3b3dc26c74ea13fe7847fd1522c8b upstream.

The locking protocols in md assume that a device will
never be removed from an array during resync/recovery/reshape.
When that isn't happening, rcu or reconfig_mutex is needed
to protect an rdev pointer while taking a refcount.  When
it is happening, that protection isn't needed.

Unfortunately there are cases were remove_and_add_spares() is
called when recovery might be happening: is state_store(),
slot_store() and hot_remove_disk().
In each case, this is just an optimization, to try to expedite
removal from the personality so the device can be removed from
the array.  If resync etc is happening, we just have to wait
for md_check_recover to find a suitable time to call
remove_and_add_spares().

This optimization and not essential so it doesn't
matter if it fails.
So change remove_and_add_spares() to abort early if
resync/recovery/reshape is happening, unless it is called
from md_check_recovery() as part of a newly started recovery.
The parameter "this" is only NULL when called from
md_check_recovery() so when it is NULL, there is no need to abort.

As this can result in a NULL dereference, the fix is suitable
for -stable.

cc: yuyufen 
Cc: Tomasz Majchrzak 
Fixes: 8430e7e0af9a ("md: disconnect device from personality before trying to remove it.")
Cc: stable@ver.kernel.org (v4.8+)
Signed-off-by: NeilBrown 
Signed-off-by: Shaohua Li 
Signed-off-by: Greg Kroah-Hartman

dm io: fix duplicate bio completion due to missing ref count

2018-03-11T15:21:30+00:00

commit feb7695fe9fb83084aa29de0094774f4c9d4c9fc upstream.

If only a subset of the devices associated with multiple regions support
a given special operation (eg. DISCARD) then the dec_count() that is
used to set error for the region must increment the io->count.

Otherwise, when the dec_count() is called it can cause the dm-io
caller's bio to be completed multiple times.  As was reported against
the dm-mirror target that had mirror legs with a mix of discard
capabilities.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=196077
Reported-by: Zhang Yi 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

dm: correctly handle chained bios in dec_pending()

2018-02-22T14:43:51+00:00

commit 8dd601fa8317243be887458c49f6c29c2f3d719f upstream.

dec_pending() is given an error status (possibly 0) to be recorded
against a bio.  It can be called several times on the one 'struct
dm_io', and it is careful to only assign a non-zero error to
io->status.  However when it then assigned io->status to bio->bi_status,
it is not careful and could overwrite a genuine error status with 0.

This can happen when chained bios are in use.  If a bio is chained
beneath the bio that this dm_io is handling, the child bio might
complete and set bio->bi_status before the dm_io completes.

This has been possible since chained bios were introduced in 3.14, and
has become a lot easier to trigger with commit 18a25da84354 ("dm: ensure
bio submission follows a depth-first tree walk") as that commit caused
dm to start using chained bios itself.

A particular failure mode is that if a bio spans an 'error' target and a
working target, the 'error' fragment will complete instantly and set the
->bi_status, and the other fragment will normally complete a little
later, and will clear ->bi_status.

The fix is simply to only assign io_error to bio->bi_status when
io_error is not zero.

Reported-and-tested-by: Milan Broz 
Cc: stable@vger.kernel.org (v3.14+)
Signed-off-by: NeilBrown 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

bcache: check return value of register_shrinker

2018-02-03T16:05:37+00:00

[ Upstream commit 6c4ca1e36cdc1a0a7a84797804b87920ccbebf51 ]

register_shrinker is now __must_check, so check it to kill a warning.
Caller of bch_btree_cache_alloc in super.c appropriately checks return
value so this is fully plumbed through.

This V2 fixes checkpatch warnings and improves the commit description,
as I was too hasty getting the previous version out.

Signed-off-by: Michael Lyle 
Reviewed-by: Vojtech Pavlik 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman