linux-stable.git/drivers/md/raid5.c, branch linux-3.19.y

md/raid5: Fix livelock when array is both resyncing and degraded.

2015-03-06T22:57:40+00:00

commit 26ac107378c4742978216be1005b7291b799c7b2 upstream.

Commit a7854487cd7128a30a7f4f5259de9f67d5efb95f:
  md: When RAID5 is dirty, force reconstruct-write instead of read-modify-write.

Causes an RCW cycle to be forced even when the array is degraded.
A degraded array cannot support RCW as that requires reading all data
blocks, and one may be missing.

Forcing an RCW when it is not possible causes a live-lock and the code
spins, repeatedly deciding to do something that cannot succeed.

So change the condition to only force RCW on non-degraded arrays.

Reported-by: Manibalan P 
Bisected-by: Jes Sorensen 
Tested-by: Jes Sorensen 
Signed-off-by: NeilBrown 
Fixes: a7854487cd7128a30a7f4f5259de9f67d5efb95f
Signed-off-by: Greg Kroah-Hartman

md/raid5: fix another livelock caused by non-aligned writes.

2015-02-02T05:57:17+00:00

If a non-page-aligned write is destined for a device which
is missing/faulty, we can deadlock.

As the target device is missing, a read-modify-write cycle
is not possible.
As the write is not for a full-page, a recontruct-write cycle
is not possible.

This should be handled by logic in fetch_block() which notices
there is a non-R5_OVERWRITE write to a missing device, and so
loads all blocks.

However since commit 67f455486d2ea2, that code requires
STRIPE_PREREAD_ACTIVE before it will active, and those circumstances
never set STRIPE_PREREAD_ACTIVE.

So: in handle_stripe_dirtying, if neither rmw or rcw was possible,
set STRIPE_DELAYED, which will cause STRIPE_PREREAD_ACTIVE be set
after a suitable delay.

Fixes: 67f455486d2ea20b2d94d6adf5b9b783d079e321
Cc: stable@vger.kernel.org (v3.16+)
Reported-by: Mikulas Patocka 
Tested-by: Heinz Mauelshagen 
Signed-off-by: NeilBrown

md/raid5: fetch_block must fetch all the blocks handle_stripe_dirtying wants.

2014-12-03T05:07:58+00:00

It is critical that fetch_block() and handle_stripe_dirtying()
are consistent in their analysis of what needs to be loaded.
Otherwise raid5 can wait forever for a block that won't be loaded.

Currently when writing to a RAID5 that is resyncing, to a location
beyond the resync offset, handle_stripe_dirtying chooses a
reconstruct-write cycle, but fetch_block() assumes a
read-modify-write, and a lockup can happen.

So treat that case just like RAID6, just as we do in
handle_stripe_dirtying.  RAID6 always does reconstruct-write.

This bug was introduced when the behaviour of handle_stripe_dirtying
was changed in 3.7, so the patch is suitable for any kernel since,
though it will need careful merging for some versions.

Cc: stable@vger.kernel.org (v3.7+)
Fixes: a7854487cd7128a30a7f4f5259de9f67d5efb95f
Reported-by: Henry Cai 
Signed-off-by: NeilBrown

md: remove unwanted white space from md.c

2014-10-14T02:08:29+00:00

My editor shows much of this is RED.

Signed-off-by: NeilBrown

md/raid5: fix init_stripe() inconsistencies

2014-10-14T02:08:28+00:00

raid5: fix init_stripe() inconsistencies

1) remove_hash() is not necessary. We will only be called right after
   get_free_stripe(). There we have already a call to remove_hash().

2) Tracing prints out the sector of the freed stripe and not the sector
   that we want to initialize.

Signed-off-by: NeilBrown

md: use set_bit/clear_bit instead of shift/mask for bi_flags changes.

2014-10-08T23:07:04+00:00

Using {set,clear}_bit is more consistent than shifting and masking.

No functional change.

Signed-off-by: NeilBrown

md/raid5: disable 'DISCARD' by default due to safety concerns.

2014-10-02T03:45:00+00:00

It has come to my attention (thanks Martin) that 'discard_zeroes_data'
is only a hint.  Some devices in some cases don't do what it
says on the label.

The use of DISCARD in RAID5 depends on reads from discarded regions
being predictably zero.  If a write to a previously discarded region
performs a read-modify-write cycle it assumes that the parity block
was consistent with the data blocks.  If all were zero, this would
be the case.  If some are and some aren't this would not be the case.
This could lead to data corruption after a device failure when
data needs to be reconstructed from the parity.

As we cannot trust 'discard_zeroes_data', ignore it by default
and so disallow DISCARD on all raid4/5/6 arrays.

As many devices are trustworthy, and as there are benefits to using
DISCARD, add a module parameter to over-ride this caution and cause
DISCARD to work if discard_zeroes_data is set.

If a site want to enable DISCARD on some arrays but not on others they
should select DISCARD support at the filesystem level, and set the
raid456 module parameter.
    raid456.devices_handle_discard_safely=Y

As this is a data-safety issue, I believe this patch is suitable for
-stable.
DISCARD support for RAID456 was added in 3.7

Cc: Shaohua Li 
Cc: "Martin K. Petersen" 
Cc: Mike Snitzer 
Cc: Heinz Mauelshagen 
Cc: stable@vger.kernel.org (3.7+)
Acked-by: Martin K. Petersen 
Acked-by: Mike Snitzer 
Fixes: 620125f2bf8ff0c4969b79653b54d7bcc9d40637
Signed-off-by: NeilBrown

md/raid6: avoid data corruption during recovery of double-degraded RAID6

2014-08-18T04:49:46+00:00

During recovery of a double-degraded RAID6 it is possible for
some blocks not to be recovered properly, leading to corruption.

If a write happens to one block in a stripe that would be written to a
missing device, and at the same time that stripe is recovering data
to the other missing device, then that recovered data may not be written.

This patch skips, in the double-degraded case, an optimisation that is
only safe for single-degraded arrays.

Bug was introduced in 2.6.32 and fix is suitable for any kernel since
then.  In an older kernel with separate handle_stripe5() and
handle_stripe6() functions the patch must change handle_stripe6().

Cc: stable@vger.kernel.org (2.6.32+)
Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8
Cc: Yuri Tikhonov 
Cc: Dan Williams 
Reported-by: "Manibalan P" 
Tested-by: "Manibalan P" 
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423
Signed-off-by: NeilBrown 
Acked-by: Dan Williams

md/raid5: avoid livelock caused by non-aligned writes.

2014-08-18T04:49:41+00:00

If a stripe in a raid6 array received a write to each data block while
the array is degraded, and if any of these writes to a missing device
are not page-aligned, then a live-lock happens.

In this case the P and Q blocks need to be read so that the part of
the missing block which is *not* being updated by the write can be
constructed.  Due to a logic error, these blocks are not loaded, so
the update cannot proceed and the stripe is 'handled' repeatedly in an
infinite loop.

This bug is unlikely as most writes are page aligned.  However as it
can lead to a livelock it is suitable for -stable.  It was introduced
in 3.16.

Cc: stable@vger.kernel.org (v3.16)
Fixed: 67f455486d2ea20b2d94d6adf5b9b783d079e321
Signed-off-by: NeilBrown

Merge tag 'md/3.16' of git://neil.brown.name/md

2014-06-11T15:33:41+00:00

Pull md updates from Neil Brown:
 "Assorted md fixes for 3.16

  Mostly performance improvements with a few corner-case bug fixes"

* tag 'md/3.16' of git://neil.brown.name/md:
  raid5: speedup sync_request processing
  md/raid5: deadlock between retry_aligned_read with barrier io
  raid5: add an option to avoid copy data from bio to stripe cache
  md/bitmap: remove confusing code from filemap_get_page.
  raid5: avoid release list until last reference of the stripe
  md: md_clear_badblocks should return an error code on failure.
  md/raid56: Don't perform reads to support writes until stripe is ready.
  md: refuse to change shape of array if it is active but read-only