linux-stable.git/drivers/md/raid5.c, branch linux-3.1.y

md/raid5: STRIPE_ACTIVE has lock semantics, add barriers

2011-11-11T17:44:50+00:00

commit 257a4b42af7586fab4eaec7f04e6896b86551843 upstream.

All updates that occur under STRIPE_ACTIVE should be globally visible
when STRIPE_ACTIVE clears.  test_and_set_bit() implies a barrier, but
clear_bit() does not.
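
The lock semantics described above can be sketched in user space with C11 atomics. This is only an analogy, not the kernel patch: `struct stripe`, `stripe_trylock()`, `stripe_unlock()` and `STRIPE_ACTIVE_BIT` are hypothetical names, and the kernel uses its own bitops (helpers such as test_and_set_bit_lock() and clear_bit_unlock() exist for this pattern) rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define STRIPE_ACTIVE_BIT 0UL   /* hypothetical bit index, for illustration */

struct stripe {                 /* hypothetical stand-in, not struct stripe_head */
    atomic_ulong state;         /* flag word holding the ACTIVE bit */
    int payload;                /* data updated only while ACTIVE is held */
};

/* Take the bit lock. Acquire ordering: accesses made after a successful
 * trylock cannot be reordered before it. Returns true if we set the bit. */
static bool stripe_trylock(struct stripe *sh)
{
    unsigned long mask = 1UL << STRIPE_ACTIVE_BIT;
    unsigned long old = atomic_fetch_or_explicit(&sh->state, mask,
                                                 memory_order_acquire);
    return (old & mask) == 0;
}

/* Drop the bit lock. Release ordering: every write made while the bit was
 * held becomes visible before the bit is observed clear. A plain relaxed
 * clear would not give that guarantee. */
static void stripe_unlock(struct stripe *sh)
{
    unsigned long mask = 1UL << STRIPE_ACTIVE_BIT;
    atomic_fetch_and_explicit(&sh->state, ~mask, memory_order_release);
}
```

A second stripe_trylock() on the same stripe returns false until stripe_unlock() has run, which is the behaviour a caller would rely on when treating the flag as a lock.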

This is suitable for 3.1-stable.

Signed-off-by: Dan Williams 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid5: abort any pending parity operations when array fails.

2011-11-11T17:44:49+00:00

commit 9a3f530f39f4490eaa18b02719fb74ce5f4d2d86 upstream.

When the number of failed devices exceeds the allowed number
we must abort any active parity operations (checks or updates) as they
are no longer meaningful, and can lead to a BUG_ON in
handle_parity_checks6.

This bug was introduced by commit 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8
in 2.6.29.

Reported-by: Manish Katiyar 
Tested-by: Manish Katiyar 
Acked-by: Dan Williams 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid5: fix bug that could result in reads from a failed device.

2011-11-11T17:43:12+00:00

commit 355840e7a7e56bb2834fd3b0da64da5465f8aeaa upstream.

This bug was introduced in 415e72d034c50520ddb7ff79e7d1792c1306f0c9
which was in 2.6.36.

There is a small window of time between when a device fails and when
it is removed from the array.  During this time we might still read
from it, but we won't write to it - so it is possible that we could
read stale data.

We didn't need the test of 'Faulty' before because the test on
In_sync is sufficient.  Since we started allowing reads from the early
part of non-In_sync devices we need a test on Faulty too.
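
A minimal sketch of the combined test, assuming a simplified device record. `struct member_dev`, `may_read()` and the field names are hypothetical and only mirror the flags named above; this is not the raid5.c code.

```c
#include <stdbool.h>
#include <stdint.h>

struct member_dev {             /* hypothetical, not the kernel's rdev */
    bool faulty;                /* device has failed */
    bool in_sync;               /* device is fully in sync */
    uint64_t recovery_offset;   /* sectors below this have been recovered */
};

/* Decide whether [sector, sector + nr_sectors) may be read from the device. */
static bool may_read(const struct member_dev *d,
                     uint64_t sector, uint64_t nr_sectors)
{
    if (d->faulty)
        return false;           /* the additional test: never read a failed device */
    if (d->in_sync)
        return true;
    /* Not in sync: only the already-recovered early part is readable. */
    return sector + nr_sectors <= d->recovery_offset;
}
```

Without the first test, a failed device that is not In_sync but has a non-zero recovery offset would still look readable in its early part.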

This is suitable for any kernel from 2.6.36 onwards, though the patch
might need a bit of tweaking in 3.0 and earlier.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: Avoid waking up a thread after it has been freed.

2011-09-21T05:30:20+00:00

Two related problems:

1/ some error paths call "md_unregister_thread(mddev->thread)"
   without subsequently clearing ->thread.  A subsequent call
   to mddev_unlock will try to wake the thread, and crash.

2/ Most calls to md_wakeup_thread are protected against the thread
   disappearing, either by:
      - holding the ->mutex
      - having an active request, so something else must be keeping
        the array active.
   However mddev_unlock calls md_wakeup_thread after dropping the
   mutex and without any certainty of an active request, so the
   ->thread could theoretically disappear.
   So we need a spinlock to provide some protection.

So change md_unregister_thread to take a pointer to the thread
pointer, and ensure that it always does the required locking, and
clears the pointer properly.
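
The pointer-to-pointer idea can be sketched in user space as follows. This is an illustration under stated assumptions, not the md code: `struct worker`, `wakeup_worker()`, `unregister_worker()` and the atomic_flag spinlock are hypothetical stand-ins for the md thread, its wakeup path and the kernel spinlock.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct worker { int wakeups; };         /* hypothetical stand-in for the md thread */

static atomic_flag thread_lock = ATOMIC_FLAG_INIT;  /* toy spinlock */

static void lock_threads(void)
{
    while (atomic_flag_test_and_set_explicit(&thread_lock, memory_order_acquire))
        ;                               /* spin until the flag was clear */
}

static void unlock_threads(void)
{
    atomic_flag_clear_explicit(&thread_lock, memory_order_release);
}

/* Wake the worker only if the slot is still populated; the check and the
 * use both happen under the lock. Returns 1 if a worker was woken. */
static int wakeup_worker(struct worker **slot)
{
    int woke = 0;

    lock_threads();
    if (*slot) {
        (*slot)->wakeups++;
        woke = 1;
    }
    unlock_threads();
    return woke;
}

/* Take the slot, not the worker: the pointer is cleared under the lock
 * before the worker is freed, so no caller can forget to clear it. */
static void unregister_worker(struct worker **slot)
{
    struct worker *w;

    lock_threads();
    w = *slot;
    *slot = NULL;
    unlock_threads();
    free(w);                            /* free(NULL) is a no-op */
}
```

Because the unregister function owns the clearing step, an error path that calls it twice, or a late wakeup after it, sees a NULL slot instead of a dangling pointer.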

Reported-by: "Moshe Melnikov" 
Signed-off-by: NeilBrown 
cc: stable@kernel.org

md/raid5: fix a hang on device failure.

2011-08-31T02:49:14+00:00

Waiting for a 'blocked' rdev to become unblocked in the raid5d thread
cannot work with internal metadata as it is the raid5d thread which
will clear the blocked flag.
This wasn't a problem in 3.0 and earlier as we only set the blocked
flag when external metadata was used then.
However we now set it always, so we need to be more careful.

Signed-off-by: NeilBrown

md/raid5: Clear bad blocks on successful write.

2011-07-28T01:39:23+00:00

On a successful write to a known bad block, flag the stripe_head (sh)
so that raid5d can remove the known bad block from the list.

Signed-off-by: NeilBrown

md/raid5: Don't write to known bad blocks on doubtful devices.

2011-07-28T01:39:22+00:00

If a device has seen write errors, don't write to any known
bad blocks on that device.

Signed-off-by: NeilBrown

md/raid5: write errors should be recorded as bad blocks if possible.

2011-07-28T01:39:22+00:00

When a write error is detected, don't mark the device as failed
immediately but rather record the fact for handle_stripe to deal with.

handle_stripe then attempts to record a bad block.  Only if that fails
does the device get marked as faulty.
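
The "record first, fail only as a fallback" policy can be sketched like this. The fixed-size log, `struct bb_dev`, `record_bad_block()` and `handle_write_error()` are hypothetical simplifications; the real bad-block log is a different structure with its own limits.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BB_MAX 4                /* hypothetical tiny capacity, for illustration */

struct bb_dev {                 /* hypothetical device with a bad-block log */
    uint64_t bad[BB_MAX];       /* recorded bad sectors */
    size_t nr_bad;
    bool faulty;
};

/* Try to remember a bad sector. Returns false when the log is full. */
static bool record_bad_block(struct bb_dev *d, uint64_t sector)
{
    for (size_t i = 0; i < d->nr_bad; i++)
        if (d->bad[i] == sector)
            return true;        /* already recorded */
    if (d->nr_bad == BB_MAX)
        return false;           /* no room left */
    d->bad[d->nr_bad++] = sector;
    return true;
}

/* On a write error, prefer recording a bad block; mark the device
 * faulty only when recording is not possible. */
static void handle_write_error(struct bb_dev *d, uint64_t sector)
{
    if (!record_bad_block(d, sector))
        d->faulty = true;
}
```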

Signed-off-by: NeilBrown

md/raid5: use bad-block log to improve handling of uncorrectable read errors.

2011-07-28T01:39:22+00:00

If we get an uncorrectable read error - record a bad block rather than
failing the device.
And if these errors (which may be due to known bad blocks) cause
recovery to be impossible, record a bad block on the recovering
devices, or abort the recovery.

As we might abort a recovery without failing a device we need to teach
RAID5 about recovery_disabled handling.

Signed-off-by: NeilBrown

md/raid5: avoid reading from known bad blocks.

2011-07-28T01:39:22+00:00

There are two times that we might read in raid5:
1/ when a read request fits within a chunk on a single
   working device.
   In this case, if there is any bad block in the range of
   the read, we simply fail the cache-bypass read and
   perform the read through the stripe cache.

2/ when reading into the stripe cache.  In this case we
   mark as failed any device which has a bad block in that
   strip (1 page wide).
   Note that we will both avoid reading and avoid writing.
   This is correct (as we will never read from the block, there
   is no point writing), but not optimal (as writing could 'fix'
   the error) - that will be addressed later.
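
For case 1/, the "any bad block in the range of the read" test is an interval-overlap check. A minimal sketch, assuming bad blocks are kept as (start, length) ranges in sectors; `struct bad_range` and `range_has_bad_block()` are hypothetical names, not the md helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct bad_range {              /* hypothetical: one run of bad sectors */
    uint64_t start;             /* first bad sector */
    uint64_t len;               /* number of bad sectors */
};

/* Return true if the read [sector, sector + nr) touches any bad range.
 * A caller would then skip the cache-bypass read and fall back to
 * reading through the stripe cache. */
static bool range_has_bad_block(const struct bad_range *bad, size_t n,
                                uint64_t sector, uint64_t nr)
{
    for (size_t i = 0; i < n; i++) {
        /* Half-open intervals overlap when each starts before the other ends. */
        if (bad[i].start < sector + nr && sector < bad[i].start + bad[i].len)
            return true;
    }
    return false;
}
```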

If we have not seen any write errors on the device yet, we treat a bad
block like a recent read error.  This will encourage an attempt to fix
the read error which will either generate a write error, or will
ensure good data is stored there.  We don't yet forget the bad block
in that case.  That comes later.

Now that we honour bad blocks when reading we can allow devices with
bad blocks into the array.

Signed-off-by: NeilBrown