linux-stable.git/drivers/md/multipath.c, branch linux-2.6.33.y

md: deal with merge_bvec_fn in component devices better.

2010-04-26T14:48:03+00:00

commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_phys_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman

md: add MODULE_DESCRIPTION for all md related modules.

2009-12-14T01:51:41+00:00

Suggested by  Oren Held 

Signed-off-by: NeilBrown

md: support barrier requests on all personalities.

2009-12-14T01:49:49+00:00

Previously barriers were only supported on RAID1.  This is because
other levels requires synchronisation across all devices and so needed
a different approach.
Here is that approach.

When a barrier arrives, we send a zero-length barrier to every active
device.  When that completes - and if the original request was not
empty -  we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero length barriers.

The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.

The reason for clearing the barrier flag is that a barrier request is
allowed to fail.  If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail.  That would be way too hard to deal with.
So if the first run of zero length barriers succeed, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.

RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet.  So we flush the stripe cache before
proceeding with the barrier.

Note that the second set of zero-length barriers are submitted
immediately after the original request is submitted.  Thus when
a personality finds mddev->barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.

That will be done in later patches.

Signed-off-by: NeilBrown 
Reviewed-by: Andre Noll

md: remove unnecessary memset from multipath.

2009-09-23T08:16:31+00:00

Recent commit bbba809e96539672f775a3d70102657d05816a5b
replaced mempool_create_kzalloc_pool with mempool_create_kmalloc_pool
plus a memset.
This memset is not needed (and we didn't need kzalloc in the first
place).
Ever field of the allocated structure (struct multipath_bh) is
initialised immediately except retry_list, and memset does not
initial a list_head anyway.

To remove the memset.

Signed-off-by: NeilBrown

md: report device as congested when suspended

2009-09-23T08:10:29+00:00

This should writeback from coming when the device is temporarily
suspended.

Signed-off-by: NeilBrown

md: Improve name of threads created by md_register_thread

2009-09-23T08:09:45+00:00

The management thread for raid4,5,6 arrays are all called
mdX_raid5, independent of the actual raid level, which is wrong and
can be confusion.

So change md_register_thread to use the name from the personality
unless no alternate name (like 'resync' or 'reshape') is given.

This is simpler and more correct.

Cc: Jinzc 
Signed-off-by: NeilBrown

md: avoid use of broken kzalloc mempool

2009-09-22T14:17:35+00:00

The kzalloc mempool does not re-zero items that have been used and then
returned to the pool.  Manually zero the allocated multipath_bh instead.

Acked-by: Neil Brown 
Signed-off-by: Sage Weil 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

bio: first step in sanitizing the bio->bi_rw flag testing

2009-09-11T12:33:31+00:00

Get rid of any functions that test for these bits and make callers
use bio_rw_flagged() directly. Then it is at least directly apparent
what variable and flag they check.

Signed-off-by: Jens Axboe

md: Push down data integrity code to personalities.

2009-08-03T00:59:47+00:00

This patch replaces md_integrity_check() by two new public functions:
md_integrity_register() and md_integrity_add_rdev() which are both
personality-independent.

md_integrity_register() is called from the ->run and ->hot_remove
methods of all personalities that support data integrity.  The
function iterates over the component devices of the array and
determines if all active devices are integrity capable and if their
profiles match. If this is the case, the common profile is registered
for the mddev via blk_integrity_register().

The second new function, md_integrity_add_rdev() is called from the
->hot_add_disk methods, i.e. whenever a new device is being added
to a raid array. If the new device does not support data integrity,
or has a profile different from the one already registered, data
integrity for the mddev is disabled.

For raid0 and linear, only the call to md_integrity_register() from
the ->run method is necessary.

Signed-off-by: Andre Noll 
Signed-off-by: NeilBrown

md: Use new topology calls to indicate alignment and I/O sizes

2009-07-01T01:13:45+00:00

Switch MD over to the new disk_stack_limits() function which checks for
aligment and adjusts preferred I/O sizes when stacking.

Also indicate preferred I/O sizes where applicable.

Signed-off-by: Martin K. Petersen 
Signed-off-by: Mike Snitzer 
Signed-off-by: NeilBrown