linux-stable.git/drivers/md/raid1.c, branch v3.4.8

md/raid1: close some possible races on write errors during resync

2012-07-29T15:04:17+00:00

commit 58e94ae18478c08229626daece2fc108a4a23261 upstream.

commit 4367af556133723d0f443e14ca8170d9447317cb
   md/raid1: clear bad-block record when write succeeds.

Added a 'reschedule_retry' call possibility at the end of
end_sync_write, but didn't add matching code at the end of
sync_request_write.  So if the writes complete very quickly, or
scheduling makes it seem that way, then we can miss rescheduling
the request and the resync could hang.

Also commit 73d5c38a9536142e062c35997b044e89166e063b
    md: avoid races when stopping resync.

Fix a race condition in this same code in end_sync_write but didn't
make the change in sync_request_write.

This patch updates sync_request_write to fix both of those.
Patch is suitable for 3.1 and later kernels.

Reported-by: Alexander Lyakas 
Original-version-by: Alexander Lyakas 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid1: fix use-after-free bug in RAID1 data-check code.

2012-07-19T15:58:55+00:00

commit 2d4f4f3384d4ef4f7c571448e803a1ce721113d5 upstream.

This bug has been present ever since data-check was introduce
in 2.6.16.  However it would only fire if a data-check were
done on a degraded array, which was only possible if the array
has 3 or more devices.  This is certainly possible, but is quite
uncommon.

Since hot-replace was added in 3.3 it can happen more often as
the same condition can arise if not all possible replacements are
present.

The problem is that as soon as we submit the last read request, the
'r1_bio' structure could be freed at any time, so we really should
stop looking at it.  If the last device is being read from we will
stop looking at it.  However if the last device is not due to be read
from, we will still check the bio pointer in the r1_bio, but the
r1_bio might already be free.

So use the read_targets counter to make sure we stop looking for bios
to submit as soon as we have submitted them all.

This fix is suitable for any -stable kernel since 2.6.16.

Reported-by: Arnold Schulz 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: raid1/raid10: fix problem with merge_bvec_fn

2012-06-09T15:36:18+00:00

commit aba336bd1d46d6b0404b06f6915ed76150739057 upstream.

The new merge_bvec_fn which calls the corresponding function
in subsidiary devices requires that mddev->merge_check_needed
be set if any child has a merge_bvec_fn.

However were were only setting that when a device was hot-added,
not when a device was present from the start.

This bug was introduced in 3.4 so patch is suitable for 3.4.y
kernels.  However that are conflicts in raid10.c so a separate
patch will be needed for 3.4.y.

Reported-by: Sebastian Riemer 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid1,raid10: Fix calculation of 'vcnt' when processing error recovery.

2012-04-12T06:04:47+00:00

If r1bio->sectors % 8 != 0,then the memcmp and a later
memcpy will omit the last bio_vec.

This is suitable for any stable kernel since 3.1 when bad-block
management was introduced.

Cc: stable@vger.kernel.org
Signed-off-by: majianpeng 
Signed-off-by: NeilBrown

md/raid1,raid10: don't compare excess byte during consistency check.

2012-04-03T05:39:23+00:00

When comparing two pages read from different legs of a mirror, only
compare the bytes that were read, not the whole page.

In most cases we read a whole page, but in some cases with
bad blocks or odd sizes devices we might read fewer than that.

This bug has been present "forever" but at worst it might cause
a report of two many mismatches and generate a little bit
extra resync IO, so there is no need to back-port to -stable
kernels.

Reported-by: majianpeng 
Signed-off-by: NeilBrown

md/raid1:Remove unnecessary rcu_dereference(conf->mirrors[i].rdev).

2012-04-03T05:37:33+00:00

Because rde->nr_pending > 0,so can not remove this disk.
And in any case, we aren't holding rcu_read_lock()

Signed-off-by: majianpeng 
Signed-off-by: NeilBrown

md/raid1: If md_integrity_register() failed,run() must free the mem

2012-04-01T23:48:38+00:00

Signed-off-by: majianpeng 
Signed-off-by: NeilBrown

md/raid1: handle merge_bvec_fn in member devices.

2012-03-19T01:46:39+00:00

Currently we don't honour merge_bvec_fn in member devices so if there
is one, we force all requests to be single-page at most.
This is not ideal.

So create a raid1 merge_bvec_fn to check that function in children
as well.

This introduces a small problem.  There is no locking around calls
the ->merge_bvec_fn and subsequent calls to ->make_request.  So a
device added between these could end up getting a request which
violates its merge_bvec_fn.

Currently the best we can do is synchronize_sched().  This will work
providing no preemption happens.  If there is is preemption, we just
have to hope that new devices are largely consistent with old devices.

Signed-off-by: NeilBrown

md: tidy up rdev_for_each usage.

2012-03-19T01:46:39+00:00

md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
mddev.  However it uses the 'safe' version of list_for_each_entry,
and so requires the extra variable, but doesn't include 'safe' in the
name, which is useful documentation.

Consequently some places use this safe version without needing it, and
many use an explicity list_for_each entry.

So:
 - rename rdev_for_each to rdev_for_each_safe
 - create a new rdev_for_each which uses the plain
   list_for_each_entry,
 - use the 'safe' version only where needed, and convert all other
   list_for_each_entry calls to use rdev_for_each.

Signed-off-by: NeilBrown

md/raid1,raid10: avoid deadlock during resync/recovery.

2012-03-19T01:46:38+00:00

If RAID1 or RAID10 is used under LVM or some other stacking
block device, it is possible to enter a deadlock during
resync or recovery.
This can happen if the upper level block device creates
two requests to the RAID1 or RAID10.  The first request gets
processed, blocks recovery and queue requests for underlying
requests in current->bio_list.  A resync request then starts
which will wait for those requests and block new IO.

But then the second request to the RAID1/10 will be attempted
and it cannot progress until the resync request completes,
which cannot progress until the underlying device requests complete,
which are on a queue behind that second request.

So allow that second request to proceed even though there is
a resync request about to start.

This is suitable for any -stable kernel.

Cc: stable@vger.kernel.org
Reported-by: Ray Morris 
Tested-by: Ray Morris 
Signed-off-by: NeilBrown