linux-stable.git/drivers/md/raid10.c, branch linux-4.1.y

md/raid10: reset the 'first' at the end of loop

2018-05-23T01:33:58+00:00

[ Upstream commit 6f287ca6046edd34ed83aafb7f9033c9c2e809e2 ]

We need to set 'first = 0' at the end of the rdev_for_each
loop so that we get the array's min_offset_diff correctly;
otherwise min_offset_diff is just the last rdev's
offset diff.
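
As an illustration only, the effect of the reset can be sketched in
userspace C. The struct and function names are hypothetical stand-ins,
not the kernel's, and any further adjustment the kernel applies to the
difference is omitted:

```c
/* Hypothetical stand-in for struct md_rdev: only the two offsets matter. */
struct rdev_sketch {
    long long data_offset;
    long long new_data_offset;
};

/* Return the smallest offset difference across all devices. */
static long long min_offset_diff_sketch(const struct rdev_sketch *rdevs, int n)
{
    long long min_diff = 0;
    int first = 1;

    for (int i = 0; i < n; i++) {
        long long diff = rdevs[i].new_data_offset - rdevs[i].data_offset;

        if (first || diff < min_diff)
            min_diff = diff;
        /*
         * Without this reset 'first' stays set, every iteration
         * overwrites min_diff, and the result is the last device's diff.
         */
        first = 0;
    }
    return min_diff;
}
```

With diffs of 100, 10 and 50 this returns 10; if the reset were missing
it would return 50, the last device's value.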

Suggested-by: NeilBrown 
Signed-off-by: Guoqing Jiang 
Reviewed-by: NeilBrown 
Signed-off-by: Shaohua Li 
Signed-off-by: Sasha Levin

md/raid10: skip spare disk as 'first' disk

2018-05-23T01:33:50+00:00

[ Upstream commit b506335e5d2b4ec687dde392a3bdbf7601778f1d ]

Commit 6f287ca(md/raid10: reset the 'first' at the end of loop) ignores
a case in reshape: the first rdev could be a spare disk, which shouldn't
be accounted as the first disk since it doesn't include the offset info.
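
One way to express this, as a sketch under assumptions: a spare is
modeled here as a device with a negative raid_disk, and it is skipped
entirely. The names are hypothetical and the actual patch may differ in
detail:

```c
/* Hypothetical stand-in for struct md_rdev. */
struct member_sketch {
    int raid_disk;          /* < 0: spare, not an active array member */
    long long offset_diff;  /* only meaningful for array members */
};

static long long min_diff_skipping_spares(const struct member_sketch *m, int n)
{
    long long min_diff = 0;
    int first = 1;

    for (int i = 0; i < n; i++) {
        if (m[i].raid_disk < 0)
            continue;   /* a spare must not consume the 'first' slot */
        if (first || m[i].offset_diff < min_diff)
            min_diff = m[i].offset_diff;
        first = 0;
    }
    return min_diff;
}
```

If a leading spare with a meaningless diff of 0 were accepted as the
first disk, members with diffs of 40 and 25 could never lower the
minimum below 0; skipping the spare yields 25.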

Fix: 6f287ca(md/raid10: reset the 'first' at the end of loop)
Cc: Guoqing Jiang 
Cc: NeilBrown 
Signed-off-by: Shaohua Li 
Signed-off-by: Sasha Levin

raid10: increment write counter after bio is split

2018-01-17T17:30:35+00:00

[ Upstream commit 9b622e2bbcf049c82e2550d35fb54ac205965f50 ]

md pending write counter must be incremented after bio is split,
otherwise it gets decremented too many times in end bio callback and
becomes negative.
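
A toy model of the accounting, assuming each split piece completes
through its own end-of-IO callback. All names are hypothetical and no
real bio handling is involved:

```c
/* Hypothetical stand-in for the array's pending-write accounting. */
struct array_sketch {
    int pending_writes;
};

/*
 * Split a write of 'sectors' into pieces of at most 'max_sectors'
 * (which must be non-zero) and count each piece as pending only after
 * it has been split off. Returns the number of pieces submitted.
 */
static int submit_write_sketch(struct array_sketch *a,
                               unsigned int sectors,
                               unsigned int max_sectors)
{
    int pieces = 0;

    while (sectors > 0) {
        unsigned int chunk = sectors < max_sectors ? sectors : max_sectors;

        sectors -= chunk;
        a->pending_writes++;    /* one increment per split piece */
        pieces++;
    }
    return pieces;
}

/* Each piece decrements the counter once when it completes. */
static void end_write_sketch(struct array_sketch *a)
{
    a->pending_writes--;
}
```

Incrementing once before the loop instead would leave a single
increment against one decrement per piece, which is how the counter
goes negative.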

Signed-off-by: Tomasz Majchrzak 
Reviewed-by: Artur Paszkiewicz 
Signed-off-by: Shaohua Li 
Signed-off-by: Sasha Levin

md/raid10: submit bio directly to replacement disk

2017-11-06T04:54:17+00:00

[ Upstream commit 6d399783e9d4e9bd44931501948059d24ad96ff8 ]

Commit 57c67df(md/raid10: submit IO from originating thread instead of
md thread) submits bio directly for normal disks but not for replacement
disks. There is no reason not to do this for replacement disks as well.

Cc: NeilBrown 
Signed-off-by: Shaohua Li 
Signed-off-by: Sasha Levin

md/raid1/10: fix potential deadlock

2017-05-17T19:07:01+00:00

[ Upstream commit 61eb2b43b99ebdc9bc6bc83d9792257b243e7cb3 ]

Neil Brown pointed out a potential deadlock in raid10 code with
bio_split/chain. The raid1 code could have the same issue, but recent
barrier rework makes it less likely to happen. The deadlock happens in
the sequence below:

1. generic_make_request(bio), this will set current->bio_list
2. raid10_make_request will split bio into bio1 and bio2
3. __make_request(bio1), wait_barrier, add underlying disk bio to
current->bio_list
4. __make_request(bio2), wait_barrier

If raise_barrier happens between 3 & 4, since wait_barrier runs at 3,
raise_barrier waits for IO completion from 3. And since raise_barrier
sets barrier, 4 waits for raise_barrier. But IO from 3 can't be
dispatched because raid10_make_request() hasn't finished yet.

The solution is to adjust the IO ordering. Quotes from Neil:
"
It is much safer to:

    if (need to split) {
        split = bio_split(bio, ...)
        bio_chain(...)
        make_request_fn(split);
        generic_make_request(bio);
    } else
        make_request_fn(mddev, bio);

This way we first process the initial section of the bio (in 'split')
which will queue some requests to the underlying devices.  These
requests will be queued in generic_make_request.
Then we queue the remainder of the bio, which will be added to the end
of the generic_make_request queue.
Then we return.
generic_make_request() will pop the lower-level device requests off the
queue and handle them first.  Then it will process the remainder
of the original bio once the first section has been fully processed.
"

Note, this only happens in the read path. In the write path, the bio is flushed to
underlying disks either by blk flush (from schedule) or offloaded to raid1/10d.
It's queued in current->bio_list.
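
The ordering Neil describes can be modeled with a toy FIFO standing in
for current->bio_list. This is only a sketch: the queue type, labels
and function names are hypothetical, and no barrier logic is modeled:

```c
#include <stddef.h>

#define QUEUE_MAX 8

/* Toy stand-in for current->bio_list: a FIFO of request labels. */
struct fifo_sketch {
    const char *items[QUEUE_MAX];
    size_t head;
    size_t tail;
};

static void fifo_push(struct fifo_sketch *q, const char *label)
{
    if (q->tail < QUEUE_MAX)
        q->items[q->tail++] = label;
}

static const char *fifo_pop(struct fifo_sketch *q)
{
    return q->head < q->tail ? q->items[q->head++] : NULL;
}

/* Handle the first section immediately, then queue the remainder. */
static void make_request_sketch(struct fifo_sketch *q, int need_split)
{
    if (need_split) {
        /* Processing 'split' queues its lower-level device IO. */
        fifo_push(q, "disk-io(split)");
        /* The remainder goes to the end of the same queue. */
        fifo_push(q, "remainder");
    } else {
        fifo_push(q, "disk-io(whole)");
    }
}
```

Popping the queue after a split yields the lower-level IO before the
remainder, so the first section is dispatched before the second half
ever reaches wait_barrier.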

Cc: Coly Li 
Cc: stable@vger.kernel.org (v3.14+, only the raid10 part)
Suggested-by: NeilBrown 
Reviewed-by: Jack Wang 
Signed-off-by: Shaohua Li 
Signed-off-by: Sasha Levin

md/raid10: submit_bio_wait() returns 0 on success

2015-11-09T22:33:38+00:00

commit 681ab4696062f5aa939c9e04d058732306a97176 upstream.

This was introduced with 9e882242c6193ae6f416f2d8d8db0d9126bd996b
which changed the return value of submit_bio_wait() to return != 0 on
error, but didn't update the caller accordingly.
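
The convention the caller has to follow can be sketched as below. The
helper is a hypothetical stand-in that only mimics the "0 on success,
non-zero on error" return value; it is not the block layer function:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in: 0 on success, negative errno on failure. */
static int submit_wait_sketch(bool device_ok)
{
    return device_ok ? 0 : -EIO;
}

/* A caller must treat non-zero, not zero, as the failure case. */
static bool write_failed_sketch(bool device_ok)
{
    return submit_wait_sketch(device_ok) != 0;
}
```

A caller still written for the old convention would test for zero and
so report every successful write as a failure.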

Fixes: 9e882242c6 ("block: Add submit_bio_wait(), remove from md")
Reported-by: Bill Kuzeja 
Signed-off-by: Jes Sorensen 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid10: always set reshape_safe when initializing reshape_position.

2015-09-29T17:26:13+00:00

commit 299b0685e31c9f3dcc2d58ee3beca761a40b44b3 upstream.

'reshape_position' tracks where in the reshape we have reached.
'reshape_safe' tracks where in the reshape we have safely recorded
in the metadata.

These are compared to determine when to update the metadata.
So it is important that reshape_safe is initialised properly.
Currently it isn't.  When starting a reshape from the beginning
it usually has the correct value by luck.  But when reducing the
number of devices in a RAID10, it has the wrong value and this leads
to the metadata not being updated correctly.
This can lead to corruption if the reshape is not allowed to complete.

This patch is suitable for any -stable kernel which supports RAID10
reshape, which is 3.5 and later.

Fixes: 3ea7daa5d7fd ("md/raid10: add reshape support")
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: make sure MD_RECOVERY_DONE is clear before starting recovery/resync

2015-06-12T10:16:33+00:00

MD_RECOVERY_DONE is normally cleared by md_check_recovery after a
resync etc. has finished.  However it is possible for raid5_start_reshape
to race and start a reshape before MD_RECOVERY_DONE is cleared.  This
can lead to multiple reshapes running at the same time, which isn't
good.

So make sure it is cleared before starting a reshape, and also clear
it when reaping a thread, just to be safe.

Signed-off-by: NeilBrown

md: remove 'go_faster' option from ->sync_request()

2015-04-21T22:00:40+00:00

This option is not well justified and testing suggests that
it hardly ever makes any difference.

The comment suggests there might be a need to wait for non-resync
activity indicated by ->nr_waiting, however raise_barrier()
already waits for all of that.

So just remove it to simplify reasoning about speed limiting.

This allows us to remove a 'FIXME' comment from raid5.c as that
never used the flag.

Signed-off-by: NeilBrown

md/raid10: round up to bdev_logical_block_size in narrow_write_error.

2015-02-16T03:51:54+00:00

RAID10 version of earlier fix for RAID1.  We must never initiate
IO with sizes less than logical_block_size.
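
The rounding can be sketched as follows, assuming the retry granularity
is derived from a bad-block shift and that sectors are 512 bytes. The
function and parameter names are hypothetical:

```c
/* Round x up to the next multiple of y (y must be non-zero). */
static unsigned long roundup_sketch(unsigned long x, unsigned long y)
{
    return ((x + y - 1) / y) * y;
}

/*
 * Size, in 512-byte sectors, of each write attempted while narrowing
 * down an error: never smaller than one logical block of the device.
 */
static unsigned long block_sectors_sketch(unsigned int badblocks_shift,
                                          unsigned int logical_block_size)
{
    return roundup_sketch(1UL << badblocks_shift, logical_block_size >> 9);
}
```

For a device with 4096-byte logical blocks and a shift of 0, this gives
8 sectors rather than 1, so no write smaller than a logical block is
issued.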

Signed-off-by: NeilBrown