linux.git/drivers/md/raid10.c, branch v2.6.33

md: add MODULE_DESCRIPTION for all md related modules.

2009-12-14T01:51:41+00:00

Suggested by  Oren Held 

Signed-off-by: NeilBrown

raid: improve MD/raid10 handling of correctable read errors.

2009-12-14T01:51:41+00:00

We've noticed severe lasting performance degradation of our raid
arrays when we have drives that yield large amounts of media errors.
The raid10 module will queue each failed read for retry, and also
will attempt call fix_read_error() to perform the read recovery.
Read recovery is performed while the array is frozen, so repeated
recovery attempts can degrade the performance of the array for
extended periods of time.

With this patch I propose adding a per md device max number of
corrected read attempts.  Each rdev will maintain a count of
read correction attempts in the rdev->read_errors field (not
used currently for raid10). When we enter fix_read_error()
we'll check to see when the last read error occurred, and
divide the read error count by 2 for every hour since the
last read error. If at that point our read error count
exceeds the read error threshold, we'll fail the raid device.

In addition in this patch I add sysfs nodes (get/set) for
the per md max_read_errors attribute, the rdev->read_errors
attribute, and added some printk's to indicate when
fix_read_error fails to repair an rdev.

For testing I used debugfs->fail_make_request to inject
IO errors to the rdev while doing IO to the raid array.

Signed-off-by: Robert Becker 
Signed-off-by: NeilBrown

md/raid10: print more useful messages on device failure.

2009-12-14T01:51:41+00:00

When we get a read error on a device in a RAID10, and attempting to
repair the error fails, print more useful messages about why it
failed.

Signed-off-by: Robert Becker 
Signed-off-by: NeilBrown

md: remove needless setting of thread->timeout in raid10_quiesce

2009-12-14T01:51:41+00:00

As bitmap_create and bitmap_destroy already set thread->timeout
as appropriate, there is no need to do it in raid10_quiesce.
There is a possible need to wake the thread after the timeout
has been set low, but it is better to do that where the timeout
is actually set low, in bitmap_create.

Signed-off-by: NeilBrown

md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.

2009-12-14T01:51:41+00:00

This removes a lot of multiplications by HZ.

Signed-off-by: NeilBrown

md: move offset, daemon_sleep and chunksize out of bitmap structure

2009-12-14T01:51:41+00:00

... and into bitmap_info.  These are all configuration parameters
that need to be set before the bitmap is created.

Signed-off-by: NeilBrown

md: support barrier requests on all personalities.

2009-12-14T01:49:49+00:00

Previously barriers were only supported on RAID1.  This is because
other levels requires synchronisation across all devices and so needed
a different approach.
Here is that approach.

When a barrier arrives, we send a zero-length barrier to every active
device.  When that completes - and if the original request was not
empty -  we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero length barriers.

The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.

The reason for clearing the barrier flag is that a barrier request is
allowed to fail.  If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail.  That would be way too hard to deal with.
So if the first run of zero length barriers succeed, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.

RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet.  So we flush the stripe cache before
proceeding with the barrier.

Note that the second set of zero-length barriers are submitted
immediately after the original request is submitted.  Thus when
a personality finds mddev->barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.

That will be done in later patches.

Signed-off-by: NeilBrown 
Reviewed-by: Andre Noll

md: raid1/raid10: handle allocation errors during array setup.

2009-10-16T04:55:44+00:00

Both raid1 and raid10 create a mempool during startup.
If the 'alloc' function for this mempool fails, unplug_slaves
is called.
If that happens when the pool is being initialised, unplug_slaves
will try to use the 'conf' structure that isn't filled in yet, and
badness will happen.

So ensure that unplug_slaves doesn't get called unless we know
that the conf structure if fully initialised.

Signed-off-by: NeilBrown

md/raid1/raid10: add a cond_resched

2009-10-16T04:55:32+00:00

During 'check' of a raid1 or raid10 it is possible for the management
thread to spend a lot of time running 'memcmp' on blocks from
different devices, so make sure the thread has a chance to schedule.
raid5d already has a cond_resched (in process_stripe).

Reported-By: Lee Howard 
Signed-off-by: NeilBrown

md: raid-1/10: fix RW bits manipulation

2009-09-23T08:20:15+00:00

Recently Jens has changed bio_rw_flagged() logic by following
commit 1f98a13f623e0ef666690a18c1250335fc6d7ef1. Now it returns
bool instead of int. This broke raid1/raid10 RW bits manipulation logic.
One of visible result is BUG_ON triggering due to empty barrier
here scsi_lib.c:1108 scsi_setup_fs_cmnd()

Signed-off-by: Dmitry Monakhov 
Signed-off-by: NeilBrown