linux-stable.git/drivers/md/raid10.c, branch linux-2.6.33.y

md: raid10: Fix null pointer dereference in fix_read_error()

2010-08-02T17:26:36+00:00

commit 0544a21db02c1d8883158fd6f323364f830a120a upstream.

A NULL pointer dereference can occur when the driver is fixing
read errors/bad blocks and the disk is physically removed,
causing a system crash. This patch checks whether
rcu_dereference() returns a valid rdev before accessing it in fix_read_error().
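
The check described above can be sketched in userspace C. This is only an
illustration of the "test the pointer before using it" pattern: the types
mock_rdev and mock_mirror and the function mirror_is_usable() are
hypothetical stand-ins, and a plain pointer load takes the place of the
kernel's rcu_dereference().

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for a component device. */
struct mock_rdev {
    int in_sync;
};

/* Hypothetical mirror slot; rdev becomes NULL once the disk is removed. */
struct mock_mirror {
    struct mock_rdev *rdev;
};

/* Returns 1 if the slot holds a usable device, 0 otherwise. */
static int mirror_is_usable(const struct mock_mirror *m)
{
    /* Kernel code would fetch this pointer with rcu_dereference(). */
    struct mock_rdev *rdev = m->rdev;

    /* Check the pointer before touching any field behind it. */
    if (rdev == NULL)
        return 0;

    return rdev->in_sync;
}
```

Calling mirror_is_usable() on a slot whose rdev is NULL returns 0 instead of
dereferencing the missing device.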

Signed-off-by: Prasanna S. Panchamukhi 
Signed-off-by: Rob Becker 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: Fix read balancing in RAID1 and RAID10 on drives > 2TB

2010-07-05T18:15:49+00:00

commit af3a2cd6b8a479345786e7fe5e199ad2f6240e56 upstream.

read_balance uses an "unsigned long" for a sector number, which
will get truncated beyond 2TB.
This will cause read-balancing to be non-optimal, and can cause
data to be read from the 'wrong' branch during a resync.  This has a
very small chance of returning wrong data.
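
The arithmetic behind the 2TB limit can be sketched as follows. The sketch
assumes 512-byte sectors and a build where "unsigned long" is 32 bits wide;
uint32_t stands in for that type, and both helper names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u   /* assumed sector size in bytes */

/* Mimic storing a 64-bit sector number in a 32-bit "unsigned long". */
static uint32_t truncate_sector_32(uint64_t sector)
{
    return (uint32_t)sector;   /* the high 32 bits are silently dropped */
}

/* First byte offset whose sector number no longer fits in 32 bits. */
static uint64_t first_unrepresentable_byte(void)
{
    return ((uint64_t)UINT32_MAX + 1) * SECTOR_SIZE;   /* 2^32 * 512 */
}
```

Under these assumptions the limit is 2^32 sectors of 512 bytes, which is
2^41 bytes, and a sector number just past that limit wraps around to a
small value near the start of the device.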

Reported-by: Jordan Russell 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: deal with merge_bvec_fn in component devices better.

2010-04-26T14:48:03+00:00

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_phys_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.
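
A userspace sketch of why a size limit alone is not enough. It assumes a
4096-byte page and, as a simplification, treats every iovec as its own
segment. The struct mock_iovec and both check functions are hypothetical and
are not block-layer APIs.

```c
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u   /* assumed page size */

/* Hypothetical description of one buffer in a request. */
struct mock_iovec {
    unsigned int page;     /* index of the page holding the buffer */
    unsigned int offset;   /* offset of the buffer within that page */
    unsigned int len;      /* length of the buffer in bytes */
};

/* Old-style limit: the total size must not exceed one page. */
static int fits_size_limit(const struct mock_iovec *v, size_t n)
{
    unsigned int total = 0;

    for (size_t i = 0; i < n; i++)
        total += v[i].len;
    return total <= PAGE_SIZE_BYTES;
}

/* New-style limit: at most one segment, contained in a single page. */
static int fits_single_page_segment(const struct mock_iovec *v, size_t n)
{
    if (n == 0)
        return 1;
    if (n > 1)
        return 0;
    return v[0].offset + v[0].len <= PAGE_SIZE_BYTES;
}
```

Four 1024-byte buffers taken from four different pages pass the size check
but fail the single-segment check, which is the case the commit message
describes.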

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: add MODULE_DESCRIPTION for all md related modules.

2009-12-14T01:51:41+00:00

Suggested by Oren Held

Signed-off-by: NeilBrown

raid: improve MD/raid10 handling of correctable read errors.

2009-12-14T01:51:41+00:00

We've noticed severe lasting performance degradation of our raid
arrays when we have drives that yield large amounts of media errors.
The raid10 module will queue each failed read for retry, and also
will attempt to call fix_read_error() to perform the read recovery.
Read recovery is performed while the array is frozen, so repeated
recovery attempts can degrade the performance of the array for
extended periods of time.

With this patch I propose adding a per md device max number of
corrected read attempts.  Each rdev will maintain a count of
read correction attempts in the rdev->read_errors field (not
used currently for raid10). When we enter fix_read_error()
we'll check to see when the last read error occurred, and
divide the read error count by 2 for every hour since the
last read error. If at that point our read error count
exceeds the read error threshold, we'll fail the raid device.
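
The decay rule described above can be sketched in userspace C. The function
names and the way elapsed time is passed in (whole hours) are assumptions
made for illustration and are not taken from the patch.

```c
#include <limits.h>

/* Halve the error count once for every full hour since the last error. */
static unsigned int decay_read_errors(unsigned int count,
                                      unsigned long hours_since_last)
{
    /* Shifting by the type width or more is undefined in C, so clamp. */
    if (hours_since_last >= sizeof(count) * CHAR_BIT)
        return 0;

    return count >> hours_since_last;
}

/* Returns 1 if the decayed count still exceeds the per-array threshold. */
static int should_fail_device(unsigned int count,
                              unsigned long hours_since_last,
                              unsigned int max_read_errors)
{
    return decay_read_errors(count, hours_since_last) > max_read_errors;
}
```

For example, a device with 20 recorded errors whose last error was two hours
ago is treated as having 5, so an old burst of errors does not by itself
push a device over the threshold.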

In addition, in this patch I add sysfs nodes (get/set) for
the per md max_read_errors attribute and the rdev->read_errors
attribute, and add some printk's to indicate when
fix_read_error fails to repair an rdev.

For testing I used debugfs->fail_make_request to inject
IO errors to the rdev while doing IO to the raid array.

Signed-off-by: Robert Becker 
Signed-off-by: NeilBrown

md/raid10: print more useful messages on device failure.

2009-12-14T01:51:41+00:00

When we get a read error on a device in a RAID10, and attempting to
repair the error fails, print more useful messages about why it
failed.

Signed-off-by: Robert Becker 
Signed-off-by: NeilBrown

md: remove needless setting of thread->timeout in raid10_quiesce

2009-12-14T01:51:41+00:00

As bitmap_create and bitmap_destroy already set thread->timeout
as appropriate, there is no need to do it in raid10_quiesce.
There is a possible need to wake the thread after the timeout
has been set low, but it is better to do that where the timeout
is actually set low, in bitmap_create.

Signed-off-by: NeilBrown

md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.

2009-12-14T01:51:41+00:00

This removes a lot of multiplications by HZ.

Signed-off-by: NeilBrown

md: move offset, daemon_sleep and chunksize out of bitmap structure

2009-12-14T01:51:41+00:00

... and into bitmap_info.  These are all configuration parameters
that need to be set before the bitmap is created.

Signed-off-by: NeilBrown

md: support barrier requests on all personalities.

2009-12-14T01:49:49+00:00

Previously barriers were only supported on RAID1.  This is because
other levels require synchronisation across all devices and so need
a different approach.
Here is that approach.

When a barrier arrives, we send a zero-length barrier to every active
device.  When that completes - and if the original request was not
empty -  we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero length barriers.
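
The ordering described above can be sketched as a small userspace trace. It
only records which step would be issued in which order; it performs no I/O
and no waiting, and every name in it (the step enum, struct trace,
handle_barrier()) is hypothetical.

```c
#include <stddef.h>

enum step {
    PRE_BARRIER,       /* zero-length barrier sent before the request */
    REQUEST_NO_FLAG,   /* original request with the barrier flag cleared */
    POST_BARRIER       /* zero-length barrier sent after the request */
};

#define MAX_STEPS 64

struct trace {
    enum step steps[MAX_STEPS];
    size_t n;
};

static void record(struct trace *t, enum step s)
{
    if (t->n < MAX_STEPS)
        t->steps[t->n++] = s;
}

/* Record the steps for one barrier request across n_devs active devices. */
static void handle_barrier(struct trace *t, size_t n_devs, int request_is_empty)
{
    for (size_t i = 0; i < n_devs; i++)
        record(t, PRE_BARRIER);

    /* Real code would wait here for the first run to complete. */
    if (request_is_empty)
        return;

    record(t, REQUEST_NO_FLAG);

    for (size_t i = 0; i < n_devs; i++)
        record(t, POST_BARRIER);
}
```

With two devices and a non-empty request the trace holds five steps: two
leading barriers, the request itself, and two trailing barriers. An empty
barrier request produces only the leading run.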

The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.

The reason for clearing the barrier flag is that a barrier request is
allowed to fail.  If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail.  That would be way too hard to deal with.
So if the first run of zero-length barriers succeeds, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.

RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet.  So we flush the stripe cache before
proceeding with the barrier.

Note that the second set of zero-length barriers is submitted
immediately after the original request is submitted.  Thus when
a personality finds mddev->barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.

That will be done in later patches.

Signed-off-by: NeilBrown 
Reviewed-by: Andre Noll