linux-stable.git/drivers/md, branch linux-2.6.30.y

dm snapshot: fix on disk chunk size validation

2009-09-15T17:45:36+00:00

commit ae0b7448e91353ea5f821601a055aca6b58042cd upstream.

Fix some problems seen in the chunk size processing when activating a
pre-existing snapshot.

For a new snapshot, the chunk size can either be supplied by the creator
or a default value can be used.  For an existing snapshot, the
chunk size in the snapshot header on disk should always be used.

If someone attempts to load an existing snapshot and has the 'default
chunk size' option set, the kernel uses its default value even when it
is incorrect for the snapshot being loaded.  This patch ensures the
correct on-disk value is always used.

Secondly, when the code does use the chunk size stored on the disk it is
prudent to revalidate it, so the code can exit cleanly if it got
corrupted as happened in
https://bugzilla.redhat.com/show_bug.cgi?id=461506 .

Signed-off-by: Mikulas Patocka 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

dm exception store: split set_chunk_size

2009-09-15T17:45:35+00:00

commit 2defcc3fb4661e7351cb2ac48d843efc4c64db13 upstream.

Break the function set_chunk_size to two functions in preparation for
the fix in the following patch.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

dm snapshot: fix header corruption race on invalidation

2009-09-15T17:45:34+00:00

commit 61578dcd3fafe6babd72e8db32110cc0b630a432 upstream.

If a persistent snapshot fills up, a race can corrupt the on-disk header
which causes a crash on any future attempt to activate the snapshot
(typically while booting).  This patch fixes the race.

When the snapshot overflows, __invalidate_snapshot is called, which calls
snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that
calls write_header. write_header constructs the new header in the "area"
location.

Concurrently, an existing kcopyd job may finish, call copy_callback
and commit_exception method, that goes to persistent_commit_exception.
persistent_commit_exception doesn't do locking, relying on the fact that
callbacks are single-threaded, but it can race with snapshot invalidation and
overwrite the header that is just being written while the snapshot is being
invalidated.

The result of this race is a corrupted header being written that can
lead to a crash on further reactivation (if chunk_size is zero in the
corrupted header).

The fix is to use separate memory areas for each.

See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506

Signed-off-by: Mikulas Patocka 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

dm snapshot: refactor zero_disk_area to use chunk_io

2009-09-15T17:45:33+00:00

commit 02d2fd31defce6ff77146ad0fef4f19006055d86 upstream.

Refactor chunk_io to prepare for the fix in the following patch.

Pass an area pointer to chunk_io and simplify zero_disk_area to use
chunk_io.  No functional change.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

dm raid1: do not allow log_failure variable to unset after being set

2009-09-15T17:45:32+00:00

commit d2b698644c97cb033261536a4f2010924a00eac9 upstream.

This patch fixes a bug which was triggering a case where the primary leg
could not be changed on failure even when the mirror was in-sync.

The case involves the failure of the primary device along with
the transient failure of the log device.  The problem is that
bios can be put on the 'failures' list (due to log failure)
before 'fail_mirror' is called due to the primary device failure.
Normally, this is fine, but if the log device failure is transient,
a subsequent iteration of the work thread, 'do_mirror', will
reset 'log_failure'.  The 'do_failures' function then resets
the 'in_sync' variable when processing bios on the failures list.
The 'in_sync' variable is what is used to determine if the
primary device can be switched in the event of a failure.  Since
this has been reset, the primary device is incorrectly assumed
to be not switchable.

The case has been seen in the cluster mirror context, where one
machine realizes the log device is dead before the other machines.
As the responsibilities of the server migrate from one node to
another (because the mirror is being reconfigured due to the failure),
the new server may think for a moment that the log device is fine -
thus resetting the 'log_failure' variable.

In any case, it is inappropiate for us to reset the 'log_failure'
variable.  The above bug simply illustrates that it can actually
hurt us.

Signed-off-by: Jonathan Brassow 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

md: Handle growth of v1.x metadata correctly.

2009-08-16T21:18:57+00:00

commit 70471dafe3390243c598a3165dfb86b8b8b3f4fe upstream.

The v1.x metadata does not have a fixed size and can grow
when devices are added.
If it grows enough to require an extra sector of storage,
we need to update the 'sb_size' to match.

Without this, md can write out an incomplete superblock with a
bad checksum, which will be rejected when trying to re-assemble
the array.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: when a level change reduces the number of devices, remove the excess.

2009-08-16T21:18:46+00:00

commit 3a981b03f38dc3b8a69b77cbc679e66c1318a44a upstream.

When an array is changed from RAID6 to RAID5, fewer drives are
needed.  So any device that is made superfluous by the level
conversion must be marked as not-active.
For the RAID6->RAID5 conversion, this will be a drive which only
has 'Q' blocks on it.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md/raid6: release spare page at ->stop()

2009-08-16T21:18:46+00:00

commit 95fc17aac45300f45968aacd97a536ddd8db8101 upstream.

Add missing call to safe_put_page from stop() by unifying open coded
raid5_conf_t de-allocation under free_conf().

Signed-off-by: Dan Williams 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

dm raid1: wake kmirrord when requeueing delayed bios after remote recovery

2009-07-30T21:39:57+00:00

commit 69885683d22d8c05910fd808c01fdce1322739b4 upstream.

The recent commit 7513c2a761d69d2a93f17146b3563527d3618ba0 (dm raid1:
add is_remote_recovering hook for clusters) changed do_writes() to
update the ms->writes list but forgot to wake up kmirrord to process it.

The rule is that when anything is being added on ms->reads, ms->writes
or ms->failures and the list was empty before we must call
wakeup_mirrord (for immediate processing) or delayed_wake (for delayed
processing).  Otherwise the bios could sit on the list indefinitely.

Signed-off-by: Mikulas Patocka 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Greg Kroah-Hartman

md: avoid dereferencing NULL pointer when accessing suspend_* sysfs attributes.

2009-07-20T03:38:53+00:00

commit b8d966efd9a46a9a35beac50cbff6e30565125ef upstream.

If we try to modify one of the md/ sysfs files
  suspend_lo or suspend_hi
when the array is not active, we dereference a NULL.
Protect against that.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman