linux.git/drivers/md, branch v3.3-rc4

Merge tag 'md-3.3-fixes' of git://neil.brown.name/md

2012-02-09T03:06:30+00:00

Some simple md-related fixes.

1/ two small fixes to ensure we handle an interrupted resync properly.
2/ avoid loading the bitmap multiple times in dm-raid

* tag 'md-3.3-fixes' of git://neil.brown.name/md:
  md: two small fixes to handling interrupt resync.
  Prevent DM RAID from loading bitmap twice.

md: two small fixes to handling interrupt resync.

2012-02-07T01:01:51+00:00

1/ If a resync is aborted we should record how far we got
 (recovery_cp) as the last request that we know has completed
 (->curr_resync_completed) rather than the last request that was
 submitted (->curr_resync).

2/ When a resync aborts we still want to update the metadata with
 any changes, so set MD_CHANGE_DEVS even if we 'skip'.
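
The two fixes can be illustrated with a small user-space model. This is
only a sketch under stated assumptions, not the md.c code: struct
resync_state, finish_resync(), the 'end' parameter and the flag's bit
value are hypothetical. Only the names recovery_cp, curr_resync,
curr_resync_completed and MD_CHANGE_DEVS come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

#define MD_CHANGE_DEVS 0x1u	/* illustrative bit value, not the kernel's */

/* Hypothetical, simplified model of the resync bookkeeping. */
struct resync_state {
	sector_t curr_resync;		/* last request submitted */
	sector_t curr_resync_completed;	/* last request known to have completed */
	sector_t recovery_cp;		/* checkpoint recorded in the metadata */
	unsigned int flags;		/* pending metadata updates */
};

/* Called when a resync pass ends, normally or because it was aborted. */
static void finish_resync(struct resync_state *s, bool interrupted, sector_t end)
{
	assert(s->curr_resync_completed <= s->curr_resync);

	if (interrupted) {
		/*
		 * Fix 1: requests beyond curr_resync_completed may still be
		 * in flight, so checkpoint only what is known to be done.
		 */
		s->recovery_cp = s->curr_resync_completed;
	} else {
		s->recovery_cp = end;
	}

	/* Fix 2: ask for a metadata update in the aborted case as well. */
	s->flags |= MD_CHANGE_DEVS;
}
```

In this model, aborting with 1000 sectors submitted but only 800
completed leaves recovery_cp at 800, so the next resync repeats the
uncertain range instead of skipping it.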

Signed-off-by: NeilBrown

Prevent DM RAID from loading bitmap twice.

2012-01-30T22:43:41+00:00

The life cycle of a device-mapper target is:
1) create
2) resume
3) suspend
*) possibly repeat from 2
4) destroy

The dm-raid target is unconditionally calling MD's bitmap_load function upon
every resume.  If steps 2 & 3 above are repeated, bitmap_load is called
multiple times.  It is only written to be called once; otherwise, it allocates
new memory for the bitmap (without freeing the old) and increments the number
of pages it thinks it has without zeroing first.  This ultimately leads to
access beyond allocated memory and lost memory.

Simply avoiding the bitmap_load call upon resume is not sufficient.  If the
target was suspended while the initial recovery was only partially complete,
it needs to be restarted when the target is resumed.  This is why
'md_wakeup_thread' is called before issuing the 'mddev_resume'.
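
The approach described above can be sketched as a user-space model. All
names here (struct raid_target, model_bitmap_load(),
model_wakeup_thread(), the counters) are hypothetical stand-ins chosen
for illustration; the real target calls bitmap_load, md_wakeup_thread
and mddev_resume, whose signatures are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a dm-raid target's state. */
struct raid_target {
	bool suspended;
	bool bitmap_loaded;	/* set after the first load */
	int bitmap_load_calls;	/* counters exist only for demonstration */
	int thread_wakeups;
};

static void model_bitmap_load(struct raid_target *t)
{
	t->bitmap_load_calls++;
}

static void model_wakeup_thread(struct raid_target *t)
{
	t->thread_wakeups++;
}

static void raid_suspend(struct raid_target *t)
{
	assert(!t->suspended);
	t->suspended = true;
}

static void raid_resume(struct raid_target *t)
{
	assert(t->suspended);

	if (!t->bitmap_loaded) {
		/* First resume: load the bitmap exactly once. */
		model_bitmap_load(t);
		t->bitmap_loaded = true;
	} else {
		/*
		 * Later resumes: skip the load, but wake the thread so a
		 * partially completed recovery is restarted.
		 */
		model_wakeup_thread(t);
	}

	/* Stands in for the final resume of the underlying array. */
	t->suspended = false;
}
```

Repeating steps 2 and 3 any number of times in this model leaves
bitmap_load_calls at 1, while every resume after the first produces a
wakeup.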

Signed-off-by: Jonathan Brassow 
Signed-off-by: NeilBrown

Merge branch 'for-3.3/core' of git://git.kernel.dk/linux-block

2012-01-15T20:24:45+00:00

* 'for-3.3/core' of git://git.kernel.dk/linux-block: (37 commits)
  Revert "block: recursive merge requests"
  block: Stop using macro stubs for the bio data integrity calls
  blockdev: convert some macros to static inlines
  fs: remove unneeded plug in mpage_readpages()
  block: Add BLKROTATIONAL ioctl
  block: Introduce blk_set_stacking_limits function
  block: remove WARN_ON_ONCE() in exit_io_context()
  block: an exiting task should be allowed to create io_context
  block: ioc_cgroup_changed() needs to be exported
  block: recursive merge requests
  block, cfq: fix empty queue crash caused by request merge
  block, cfq: move icq creation and rq->elv.icq association to block core
  block, cfq: restructure io_cq creation path for io_context interface cleanup
  block, cfq: move io_cq exit/release to blk-ioc.c
  block, cfq: move icq cache management to block core
  block, cfq: move io_cq lookup to blk-ioc.c
  block, cfq: move cfqd->icq_list to request_queue and add request->elv.icq
  block, cfq: reorganize cfq_io_context into generic and cfq specific parts
  block: remove elevator_queue->ops
  block: reorder elevator switch sequence
  ...

Fix up conflicts in:
 - block/blk-cgroup.c
	Switch from can_attach_task to can_attach
 - block/cfq-iosched.c
	conflict with now removed cic index changes (we now use q->id instead)

dm: do not forward ioctls from logical volumes to the underlying device

2012-01-14T23:07:24+00:00

A logical volume can map to just part of the underlying physical volume.
In this case, it must be treated like a partition.
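
One way to picture the test is a check of whether the mapping covers
the whole underlying device. The sketch below is an assumption made for
illustration: struct mapping and both helpers are hypothetical, and how
the ioctl is restricted once a volume counts as a partition is outside
what the message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical description of a linear mapping onto a physical volume. */
struct mapping {
	sector_t start;		/* first sector used on the underlying device */
	sector_t len;		/* number of sectors mapped */
	sector_t dev_size;	/* size of the underlying device in sectors */
};

/* True only when the logical volume covers the entire underlying device. */
static bool maps_whole_device(const struct mapping *m)
{
	/* The mapping must lie inside the device. */
	assert(m->start <= m->dev_size);
	assert(m->len <= m->dev_size - m->start);

	return m->start == 0 && m->len == m->dev_size;
}

/* A partial mapping has to be handled like a partition. */
static bool treat_as_partition(const struct mapping *m)
{
	return !maps_whole_device(m);
}
```

A volume that starts at sector 0 but is shorter than the device still
counts as partial here, because the ioctl could otherwise reach sectors
the volume does not own.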

Based on a patch from Alasdair G Kergon.

Cc: Alasdair G Kergon 
Cc: dm-devel@redhat.com
Signed-off-by: Paolo Bonzini 
Signed-off-by: Linus Torvalds

Merge tag 'md-3.3-fixes' of git://neil.brown.name/md

2012-01-12T02:51:55+00:00

Two bugfixes for md.

One is a recently introduced regression that affects an unusual
configuration with a guaranteed BUG_ON.  Has been tagged for -stable.
The other is minor missing functionality.

* tag 'md-3.3-fixes' of git://neil.brown.name/md:
  md/raid1: perform bad-block tests for WriteMostly devices too.
  md: notify the 'degraded' sysfs attribute on failure.

block: Introduce blk_set_stacking_limits function

2012-01-11T15:27:11+00:00

Stacking driver queue limits are typically bounded exclusively by the
capabilities of the low level devices, not by the stacking driver
itself.

This patch introduces blk_set_stacking_limits() which has more liberal
metrics than the default queue limits function. This allows us to
inherit topology parameters from bottom devices without manually
tweaking the default limits in each driver prior to calling the stacking
function.

Since there is now a clear distinction between stacking and low-level
devices, blk_set_default_limits() has been modified to carry the more
conservative values that we used to manually set in
blk_queue_make_request().
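
The difference between the two starting points can be shown with a
reduced model. Everything below is a sketch: struct model_limits has
only two fields, the function names carry a model_ prefix to mark them
as hypothetical, and the numeric values are illustrative rather than
the kernel's.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, reduced set of queue limits. */
struct model_limits {
	unsigned int max_sectors;
	unsigned int max_segments;
};

/* Conservative defaults for a low-level device (illustrative values). */
static void model_set_default_limits(struct model_limits *l)
{
	l->max_sectors = 255;
	l->max_segments = 128;
}

/* Liberal starting point for a stacking driver: nothing is restricted yet. */
static void model_set_stacking_limits(struct model_limits *l)
{
	l->max_sectors = UINT_MAX;
	l->max_segments = UINT_MAX;
}

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Fold one bottom device's limits into the top-level limits. */
static void model_stack_limits(struct model_limits *top,
			       const struct model_limits *bottom)
{
	assert(bottom->max_sectors > 0 && bottom->max_segments > 0);

	top->max_sectors = min_uint(top->max_sectors, bottom->max_sectors);
	top->max_segments = min_uint(top->max_segments, bottom->max_segments);
}
```

Starting from the liberal values, stacking two bottom devices yields
exactly the tighter of their limits. Starting from the conservative
defaults would cap the result at the default even when both bottom
devices allow more, which is the manual tweaking the patch removes.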

Signed-off-by: Martin K. Petersen 
Acked-by: Mike Snitzer 
Signed-off-by: Jens Axboe

md/raid1: perform bad-block tests for WriteMostly devices too.

2012-01-10T21:35:17+00:00

We normally try to avoid reading from write-mostly devices, but when
we do we really have to check for bad blocks and be sure not to
try reading them.

With the current code, best_good_sectors might not get set and that
causes zero-length read requests to be sent down which is very
confusing.
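
The idea can be sketched with a simplified device picker. This is not
the raid1 read-balancing code: struct mirror, good_sectors() and
pick_read_device() are hypothetical, each device has at most one bad
range, and the first usable device wins instead of the best one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical mirror with at most one bad range (bad_len == 0: none). */
struct mirror {
	bool write_mostly;
	sector_t bad_start;
	sector_t bad_len;
};

/* Readable sectors starting at 'sector', capped at 'want'; 0 if it is bad. */
static sector_t good_sectors(const struct mirror *m, sector_t sector,
			     sector_t want)
{
	if (m->bad_len == 0 || m->bad_start >= sector + want ||
	    m->bad_start + m->bad_len <= sector)
		return want;
	if (m->bad_start <= sector)
		return 0;
	return m->bad_start - sector;
}

/* Returns the chosen device index, or -1 if nothing can be read. */
static int pick_read_device(const struct mirror *m, int n, sector_t sector,
			    sector_t want, sector_t *good_out)
{
	int fallback = -1;
	sector_t fallback_good = 0;

	assert(want > 0);
	for (int i = 0; i < n; i++) {
		sector_t good = good_sectors(&m[i], sector, want);

		if (good == 0)
			continue;	/* checked for write-mostly devices too */
		if (m[i].write_mostly) {
			/* Remember it, but keep looking for a preferred device. */
			if (fallback < 0) {
				fallback = i;
				fallback_good = good;
			}
			continue;
		}
		*good_out = good;
		return i;
	}
	if (fallback >= 0)
		*good_out = fallback_good;
	return fallback;
}
```

Because the bad-block test runs before the write-mostly branch, a
write-mostly fallback is only returned together with a non-zero sector
count, so no zero-length read can be issued in this model.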

This bug was introduced in commit d2eb35acfdccbe2 and so the patch
is suitable for 3.1.x and 3.2.x

Reported-and-tested-by: Michał Mirosław 
Reported-and-tested-by: Art -kwaak- van Breemen 
Signed-off-by: NeilBrown 
Cc: stable@vger.kernel.org

md: notify the 'degraded' sysfs attribute on failure.

2012-01-10T21:35:14+00:00

We currently only 'notify' changes to the 'degraded' attribute
when it decreases, not when it increases.

Notifying on failure is a little awkward as it happens in
interrupt context.
So instead, notify when we remove the failed device from the array,
which is very soon afterwards.
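
The deferral can be modelled as two steps. The sketch below is an
illustration only: struct model_array, its counters and both functions
are hypothetical, and the notification is reduced to a counter instead
of a real sysfs call.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SLOTS 4

/* Hypothetical model of an array and its 'degraded' attribute. */
struct model_array {
	bool present[MODEL_SLOTS];
	bool faulty[MODEL_SLOTS];
	int degraded;		/* number of missing or failed devices */
	int notifications;	/* times userspace was told about 'degraded' */
};

/* Models the error path: runs in interrupt context, so it must not notify. */
static void on_device_error(struct model_array *a, int slot)
{
	assert(slot >= 0 && slot < MODEL_SLOTS);

	if (a->present[slot] && !a->faulty[slot]) {
		a->faulty[slot] = true;
		a->degraded++;
	}
}

/* Models the removal path: runs in process context, so it may notify. */
static void remove_failed_devices(struct model_array *a)
{
	bool removed = false;

	for (int i = 0; i < MODEL_SLOTS; i++) {
		if (a->present[i] && a->faulty[i]) {
			a->present[i] = false;
			removed = true;
		}
	}
	if (removed)
		a->notifications++;	/* 'degraded' increased a moment ago */
}
```

The value changes in the error path, and the notification follows when
the failed device is removed; a removal pass that finds nothing to
remove does not notify.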

Reported-and-tested-by: Mikhail Balabin 
Signed-off-by: NeilBrown

Merge tag 'md-3.3' of git://neil.brown.name/md

2012-01-08T21:28:33+00:00

md update for 3.3

Big change is new hot-replacement.
A slot in an array can hold 2 devices - one that
wants-replacement and one that is the replacement.
Once the replacement is built - either from the
original or (in the case of errors) from elsewhere,
the wants-replacement device will be removed.
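
A slot holding two devices can be pictured with the following sketch.
It is a hypothetical model, not the md data structures: struct
model_slot, NO_DEVICE and both functions are invented for illustration,
and devices are reduced to integer ids.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_DEVICE (-1)

/* Hypothetical model of one array slot during hot replacement. */
struct model_slot {
	int original;		/* device currently serving the slot */
	int replacement;	/* device being built, or NO_DEVICE */
	bool want_replacement;	/* original has asked to be replaced */
};

/* Attach a spare as the replacement; returns false if not applicable. */
static bool start_replacement(struct model_slot *s, int spare)
{
	if (s->original == NO_DEVICE || !s->want_replacement ||
	    s->replacement != NO_DEVICE)
		return false;

	s->replacement = spare;	/* the slot now holds two devices */
	return true;
}

/* Called once the replacement holds a complete copy of the data. */
static void replacement_built(struct model_slot *s)
{
	assert(s->replacement != NO_DEVICE);

	/* The wants-replacement device is removed; the new one takes over. */
	s->original = s->replacement;
	s->replacement = NO_DEVICE;
	s->want_replacement = false;
}
```

Between the two calls the slot stays fully populated by the original,
which is what distinguishes this from failing a device first and
rebuilding onto a spare afterwards.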

* tag 'md-3.3' of git://neil.brown.name/md: (36 commits)
  md/raid1: Mark device want_replacement when we see a write error.
  md/raid1: If there is a spare and a want_replacement device, start replacement.
  md/raid1: recognise replacements when assembling arrays.
  md/raid1: handle activation of replacement device when recovery completes.
  md/raid1: Allow a failed replacement device to be removed.
  md/raid1: Allocate spare to store replacement devices and their bios.
  md/raid1:  Replace use of mddev->raid_disks with conf->raid_disks.
  md/raid10: If there is a spare and a want_replacement device, start replacement.
  md/raid10: recognise replacements when assembling array.
  md/raid10: Allow replacement device to be replace old drive.
  md/raid10: handle recovery of replacement devices.
  md/raid10:  Handle replacement devices during resync.
  md/raid10: writes should get directed to replacement as well as original.
  md/raid10: allow removal of failed replacement devices.
  md/raid10: preferentially read from replacement device if possible.
  md/raid10:  change read_balance to return an rdev
  md/raid10: prepare data structures for handling replacement.
  md/raid5: Mark device want_replacement when we see a write error.
  md/raid5: If there is a spare and a want_replacement device, start replacement.
  md/raid5: recognise replacements when assembling array.
  ...