linux.git/drivers/md/raid10.c, branch v3.1-rc2

md/raid10: handle further errors during fix_read_error better.

2011-07-28T01:39:25+00:00

If we find more read/write errors we should record a bad block before
failing the device.
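
The C sketch below shows that order of operations. It is a self-contained model and not the raid10 code. struct dev, record_bad_block() and handle_io_error() are hypothetical names. The model also assumes the bad-block log is a small fixed-size table that can fill up, and a full log is what forces the fallback to failing the device.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BAD 4 /* hypothetical capacity of the per-device bad-block log */

struct bad_entry {
	unsigned long long start; /* first bad sector */
	int len;                  /* number of bad sectors */
};

/* Hypothetical model of a member device. */
struct dev {
	struct bad_entry bad[MAX_BAD];
	int nr_bad;
	bool faulty; /* set when the device is failed out of the array */
};

/* Try to log a bad range; returns false if the log is already full. */
static bool record_bad_block(struct dev *d, unsigned long long s, int len)
{
	if (d->nr_bad == MAX_BAD)
		return false;
	d->bad[d->nr_bad].start = s;
	d->bad[d->nr_bad].len = len;
	d->nr_bad++;
	return true;
}

/* On a further read/write error, prefer a bad-block entry and fail
 * the whole device only if the entry cannot be recorded. */
static void handle_io_error(struct dev *d, unsigned long long s, int len)
{
	assert(len > 0);
	if (!record_bad_block(d, s, len))
		d->faulty = true;
}
```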

Signed-off-by: NeilBrown

md/raid10: Handle read errors during recovery better.

2011-07-28T01:39:25+00:00

Currently when we get a read error during recovery, we simply abort
the recovery.

Instead, repeat the read in page-sized blocks.
On successful reads, write to the target.
On read errors, record a bad block on the destination,
and only if that fails do we abort the recovery.

As we now retry reads, we need to know where we read from.  This was
previously taken from bi_sector, but that can be changed during a read
attempt, so store the correct from_addr and to_addr in the r10_bio for
later access.
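
The following is a simplified model of the page-by-page retry, built on several assumptions. Sectors are 512 bytes and pages are 4 KiB, so each retry covers 8 sectors. Real I/O is replaced by a page_read_ok[] array that says whether re-reading each page succeeds. The destination's bad-block log is modelled as a count of free entries. struct recovery_req mirrors the idea of keeping from_addr and to_addr separately, but it and retry_recovery_read() are hypothetical names, not the r10_bio layout.

```c
#include <assert.h>
#include <stdbool.h>

#define SECTORS_PER_PAGE 8 /* assumes 4 KiB pages and 512-byte sectors */

typedef unsigned long long sector_t;

/* Hypothetical stand-in for the addresses saved in the r10_bio. */
struct recovery_req {
	sector_t from_addr; /* start of the data on the device being read */
	sector_t to_addr;   /* start of the copy on the device being rebuilt */
	int sectors;        /* total request length in sectors */
};

struct recovery_stats {
	int sectors_written;    /* copied to the target after a good re-read */
	int sectors_marked_bad; /* logged as bad on the target instead */
	sector_t last_bad_from; /* source sector of the last failed re-read */
	sector_t last_bad_to;   /* target sector of the last bad-block entry */
};

/*
 * Re-read a failed recovery request one page at a time.
 * page_read_ok[i] simulates whether re-reading page i succeeds;
 * bad_log_space is how many more bad-block entries the target accepts.
 * Returns false if recovery must be aborted.
 */
static bool retry_recovery_read(const struct recovery_req *r,
				const bool *page_read_ok, int bad_log_space,
				struct recovery_stats *st)
{
	assert(r->sectors > 0);
	for (int done = 0, page = 0; done < r->sectors; page++) {
		int n = r->sectors - done;

		if (n > SECTORS_PER_PAGE)
			n = SECTORS_PER_PAGE;
		if (page_read_ok[page]) {
			/* Good data: it would be written to to_addr + done. */
			st->sectors_written += n;
		} else if (bad_log_space > 0) {
			/* Unreadable: record a bad block on the target instead. */
			bad_log_space--;
			st->sectors_marked_bad += n;
			st->last_bad_from = r->from_addr + done;
			st->last_bad_to = r->to_addr + done;
		} else {
			return false; /* bad block could not be recorded: abort */
		}
		done += n;
	}
	return true;
}
```

Only a failure to record the bad block on the destination stops the recovery. A single unreadable page costs one bad-block entry instead of the whole rebuild.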


Signed-off-by: NeilBrown

md/raid10: simplify read error handling during recovery.

2011-07-28T01:39:25+00:00

If a read error is detected during recovery, the code currently
fails the device that was read from.
This isn't really necessary.  recovery_request_write will signal
a write error to end_sync_write, which will record a write
error on the destination device; that in turn will record a bad block
there or kick the device from the array.

So just remove this call to md_error.

Signed-off-by: NeilBrown

md/raid10: record bad blocks due to write errors during resync/recovery.

2011-07-28T01:39:25+00:00

If we get a write error during resync/recovery don't fail the device
but instead record a bad block.  If that fails we can then fail the
device.

Signed-off-by: NeilBrown

md/raid10: attempt to fix read errors during resync/check

2011-07-28T01:39:25+00:00

We already attempt to fix read errors found during normal IO
and during a 'repair' process.
It is best to try to repair them whenever they are found,
so move a test so that during sync and check a read error will
be corrected by over-writing with good data.

If both (all) devices have known bad blocks in the sync section, we
won't try to fix the error, even though the bad blocks might not overlap.
That should be considered later.

Also, if we hit a read error during recovery we don't try to fix it.
A fix would only be possible if there were at least three copies
of the data, which is not very common with RAID10.  But it should still
be considered later.
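
The following hypothetical model captures the repair decision described above for one sync section. It assumes each copy has already been classified as read successfully, read with an error, or skipped because it overlaps a known bad block. The enum and plan_read_error_fix() are illustrative names, not driver code. If every copy is known-bad, no good source exists and nothing is fixed, which matches the stated limitation.

```c
#include <assert.h>

enum copy_state {
	COPY_READ_OK,    /* read succeeded during sync/check */
	COPY_READ_ERROR, /* read failed: candidate for over-writing */
	COPY_KNOWN_BAD,  /* overlaps a recorded bad block, so not read */
};

/*
 * Plan the repair of one sync section.  Returns the index of the copy
 * to use as the good source and sets *nr_fix to the number of copies
 * to over-write from it.  Returns -1 (and *nr_fix = 0) if no copy was
 * read successfully, in which case nothing can be fixed.
 */
static int plan_read_error_fix(const enum copy_state *copies, int nr_copies,
			       int *nr_fix)
{
	int source = -1;

	*nr_fix = 0;
	for (int i = 0; i < nr_copies; i++) {
		if (copies[i] == COPY_READ_OK && source < 0)
			source = i;
		else if (copies[i] == COPY_READ_ERROR)
			(*nr_fix)++;
	}
	if (source < 0)
		*nr_fix = 0; /* no good data to copy from */
	return source;
}
```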

Signed-off-by: NeilBrown

md/raid10: Handle write errors by updating badblock log.

2011-07-28T01:39:24+00:00

When we get a write error (in the data area, not in metadata),
update the badblock log rather than failing the whole device.

As the write may well cover many blocks, we try writing each
block individually and only log the ones which fail.
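
Here is a sketch of the per-block retry. The block size is a parameter, and the outcome of rewriting each block is simulated with a write_ok[] array instead of real I/O. retry_write_per_block() is a hypothetical name. The point is that only the blocks that still fail end up in the log, not the whole original write.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long sector_t;

/*
 * Retry a failed multi-block write one block at a time.
 * write_ok[i] simulates whether rewriting block i succeeds.  The start
 * sector of every block that still fails is stored in bad[], which must
 * have room for one entry per block; returns how many were stored.
 */
static int retry_write_per_block(sector_t start, int sectors, int block_sectors,
				 const bool *write_ok, sector_t *bad)
{
	int nr_bad = 0;

	assert(block_sectors > 0);
	for (int done = 0, blk = 0; done < sectors;
	     done += block_sectors, blk++) {
		if (!write_ok[blk])
			bad[nr_bad++] = start + done; /* log just this block */
	}
	return nr_bad;
}
```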

Signed-off-by: NeilBrown

md/raid10: clear bad-block record when write succeeds.

2011-07-28T01:39:24+00:00

If we succeed in writing to a block that was recorded as
being bad, we clear the bad-block record.

This requires some delayed handling, as the bad-block-list update has
to happen in process context.
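
The following is a model of the delayed handling. The write-completion path may run in interrupt context, so it only flags the request. A later process-context step clears the bad-block record. The per-sector bitmap, struct dev_model, on_write_complete() and process_deferred_clear() are all hypothetical simplifications, not the driver's bad-block list or its functions.

```c
#include <assert.h>
#include <stdbool.h>

#define DEV_SECTORS 64 /* hypothetical tiny device */

struct dev_model {
	bool bad[DEV_SECTORS]; /* per-sector bad-block record */
};

struct write_req {
	int sector;
	int sectors;
	bool clear_bad_pending; /* set on completion, acted on later */
};

static bool range_has_bad(const struct dev_model *d, int s, int n)
{
	assert(s >= 0 && s + n <= DEV_SECTORS);
	for (int i = s; i < s + n; i++)
		if (d->bad[i])
			return true;
	return false;
}

/* Completion path: may not update the list here, so only flag it. */
static void on_write_complete(struct write_req *w, const struct dev_model *d,
			      bool ok)
{
	if (ok && range_has_bad(d, w->sector, w->sectors))
		w->clear_bad_pending = true;
}

/* Deferred step, run in process context: now the record can be cleared. */
static void process_deferred_clear(struct write_req *w, struct dev_model *d)
{
	if (!w->clear_bad_pending)
		return;
	for (int i = w->sector; i < w->sector + w->sectors; i++)
		d->bad[i] = false;
	w->clear_bad_pending = false;
}
```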

Signed-off-by: NeilBrown

md/raid10: avoid writing to known bad blocks on known bad drives.

2011-07-28T01:39:24+00:00

Writing to known bad blocks on drives that have seen a write error
is asking for trouble.  So try to avoid these blocks.
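
Below is a sketch of the per-device decision. It assumes one known bad range per device and a write_error_seen flag standing in for "this drive has seen a write error". struct bad_range and plan_device_write() are hypothetical. If the write starts inside the bad range, the device is skipped. If the bad range starts later, the write is cut short before it. In both cases *max_sectors shrinks, so the remainder of the request can be issued separately.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long sector_t;

struct bad_range {
	sector_t start; /* first bad sector */
	int len;        /* 0 means no known bad range */
};

/*
 * Decide whether to write [sector, sector + *max_sectors) to one device.
 * Returns false if this device should be skipped for the (possibly
 * shortened) range.
 */
static bool plan_device_write(bool write_error_seen, const struct bad_range *bad,
			      sector_t sector, int *max_sectors)
{
	assert(*max_sectors > 0);
	if (!write_error_seen || bad->len == 0 ||
	    bad->start >= sector + *max_sectors ||
	    bad->start + bad->len <= sector)
		return true; /* no known bad block in the way */
	if (bad->start <= sector) {
		/* Starts inside the bad block: skip this device until its end. */
		int left = (int)(bad->start + bad->len - sector);

		if (left < *max_sectors)
			*max_sectors = left;
		return false;
	}
	/* Bad block starts later: write only the good prefix. */
	*max_sectors = (int)(bad->start - sector);
	return true;
}
```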

Signed-off-by: NeilBrown

md/raid10: record bad blocks as needed during recovery.

2011-07-28T01:39:24+00:00

When recovering one or more devices, if all the good devices have
bad blocks at the position being recovered, we should record a bad
block on the device being rebuilt.

If this fails, we need to abort the recovery.

To ensure we don't think that we aborted later than we actually did,
we need to move the check for MD_RECOVERY_INTR earlier in md_do_sync,
in particular before mddev->curr_resync is updated.
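
The following hypothetical model covers both halves of the change. If no good copy is readable for a chunk, a bad block is recorded on the target. If that is impossible, recovery is flagged as interrupted. The interruption is checked before the progress marker advances, so the reported position never runs past the real abort point. struct recovery_state, its interrupted field (standing in for MD_RECOVERY_INTR) and recover_chunk() are illustrative, not md_do_sync() itself.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long sector_t;

struct recovery_state {
	sector_t curr_resync; /* last position reported as recovered */
	bool interrupted;     /* stands in for MD_RECOVERY_INTR */
	int target_bad_space; /* bad-block entries the target can still take */
};

/*
 * Recover one chunk of 'len' sectors at 'sector'.  source_bad[i] says
 * whether good copy i has a known bad block covering this chunk.
 */
static void recover_chunk(struct recovery_state *st, sector_t sector, int len,
			  const bool *source_bad, int nr_sources)
{
	bool any_readable = false;

	for (int i = 0; i < nr_sources; i++)
		if (!source_bad[i])
			any_readable = true; /* data would be copied from here */

	if (!any_readable) {
		/* No good copy: mark the chunk bad on the device being rebuilt. */
		if (st->target_bad_space > 0)
			st->target_bad_space--;
		else
			st->interrupted = true; /* cannot record it: abort */
	}
	/* Check before advancing, so progress is not overstated. */
	if (st->interrupted)
		return;
	st->curr_resync = sector + len;
}
```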

Signed-off-by: NeilBrown

md/raid10: avoid reading known bad blocks during resync/recovery.

2011-07-28T01:39:24+00:00

During resync/recovery, limit the size of the request to avoid
reading into a bad block that does not start at or before the current
read address.

Similarly if there is a bad block at this address, don't allow the
current request to extend beyond the end of that bad block.

Now that we don't ever read from known bad blocks, it is safe to allow
devices with those blocks into the array.
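
The following is a sketch of the request clamping across copies. It assumes one known bad range per copy and that every copy is considered, as in a resync. The model returns the first copy that can be read at the current sector. Whatever it returns, it shortens *max_sync so the request neither reads into a bad block that starts later nor extends past the end of one that covers the current sector. struct bad_range and choose_sync_source() are hypothetical names.

```c
#include <assert.h>

typedef unsigned long long sector_t;

struct bad_range {
	sector_t start; /* first bad sector */
	int len;        /* 0 means the copy has no known bad range */
};

/*
 * Pick a copy to read [sector, sector + *max_sync) from.  Returns the
 * chosen copy, or -1 if every copy is bad at 'sector'.
 */
static int choose_sync_source(const struct bad_range *bad, int nr_copies,
			      sector_t sector, int *max_sync)
{
	int chosen = -1;

	assert(*max_sync > 0);
	for (int i = 0; i < nr_copies; i++) {
		const struct bad_range *b = &bad[i];

		if (b->len == 0 || b->start >= sector + *max_sync ||
		    b->start + b->len <= sector) {
			if (chosen < 0)
				chosen = i; /* no bad block in the range */
			continue;
		}
		if (b->start > sector) {
			/* Bad block starts later: stop just before it. */
			*max_sync = (int)(b->start - sector);
			if (chosen < 0)
				chosen = i;
		} else {
			/* Bad at 'sector': not readable; end where the bad block ends. */
			int left = (int)(b->start + b->len - sector);

			if (left < *max_sync)
				*max_sync = left;
		}
	}
	return chosen;
}
```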

Signed-off-by: NeilBrown