<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/md/raid10.c, branch v2.6.33</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>md: add MODULE_DESCRIPTION for all md related modules.</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-12-14T01:49:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0efb9e6191e1d3d34c1db90b829b742bc36d532e'/>
<id>0efb9e6191e1d3d34c1db90b829b742bc36d532e</id>
<content type='text'>
Suggested by  Oren Held &lt;orenhe@il.ibm.com&gt;

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Suggested by  Oren Held &lt;orenhe@il.ibm.com&gt;

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>raid: improve MD/raid10 handling of correctable read errors.</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>Robert Becker</name>
<email>Rob.Becker@riverbed.com</email>
</author>
<published>2009-12-14T01:49:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1e50915fe0bbf7a46db0fa7e1e604d3fc95f057d'/>
<id>1e50915fe0bbf7a46db0fa7e1e604d3fc95f057d</id>
<content type='text'>
We've noticed severe lasting performance degradation of our raid
arrays when we have drives that yield large amounts of media errors.
The raid10 module will queue each failed read for retry, and also
will attempt call fix_read_error() to perform the read recovery.
Read recovery is performed while the array is frozen, so repeated
recovery attempts can degrade the performance of the array for
extended periods of time.

With this patch I propose adding a per md device max number of
corrected read attempts.  Each rdev will maintain a count of
read correction attempts in the rdev-&gt;read_errors field (not
used currently for raid10). When we enter fix_read_error()
we'll check to see when the last read error occurred, and
divide the read error count by 2 for every hour since the
last read error. If at that point our read error count
exceeds the read error threshold, we'll fail the raid device.

In addition in this patch I add sysfs nodes (get/set) for
the per md max_read_errors attribute, the rdev-&gt;read_errors
attribute, and added some printk's to indicate when
fix_read_error fails to repair an rdev.

For testing I used debugfs-&gt;fail_make_request to inject
IO errors to the rdev while doing IO to the raid array.

Signed-off-by: Robert Becker &lt;Rob.Becker@riverbed.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We've noticed severe lasting performance degradation of our raid
arrays when we have drives that yield large amounts of media errors.
The raid10 module will queue each failed read for retry, and also
will attempt call fix_read_error() to perform the read recovery.
Read recovery is performed while the array is frozen, so repeated
recovery attempts can degrade the performance of the array for
extended periods of time.

With this patch I propose adding a per md device max number of
corrected read attempts.  Each rdev will maintain a count of
read correction attempts in the rdev-&gt;read_errors field (not
used currently for raid10). When we enter fix_read_error()
we'll check to see when the last read error occurred, and
divide the read error count by 2 for every hour since the
last read error. If at that point our read error count
exceeds the read error threshold, we'll fail the raid device.

In addition in this patch I add sysfs nodes (get/set) for
the per md max_read_errors attribute, the rdev-&gt;read_errors
attribute, and added some printk's to indicate when
fix_read_error fails to repair an rdev.

For testing I used debugfs-&gt;fail_make_request to inject
IO errors to the rdev while doing IO to the raid array.

Signed-off-by: Robert Becker &lt;Rob.Becker@riverbed.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid10: print more useful messages on device failure.</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>Robert Becker</name>
<email>Rob.Becker@riverbed.com</email>
</author>
<published>2009-12-14T01:49:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=67b8dc4b06b0e97df55fd76e209f34f9a52e820e'/>
<id>67b8dc4b06b0e97df55fd76e209f34f9a52e820e</id>
<content type='text'>
When we get a read error on a device in a RAID10, and attempting to
repair the error fails, print more useful messages about why it
failed.

Signed-off-by: Robert Becker &lt;Rob.Becker@riverbed.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we get a read error on a device in a RAID10, and attempting to
repair the error fails, print more useful messages about why it
failed.

Signed-off-by: Robert Becker &lt;Rob.Becker@riverbed.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: remove needless setting of thread-&gt;timeout in raid10_quiesce</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-12-14T01:49:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9cd30fdc33cde9ae4ac55a1ccbbb89f3f7b9b2f2'/>
<id>9cd30fdc33cde9ae4ac55a1ccbbb89f3f7b9b2f2</id>
<content type='text'>
As bitmap_create and bitmap_destroy already set thread-&gt;timeout
as appropriate, there is no need to do it in raid10_quiesce.
There is a possible need to wake the thread after the timeout
has been set low, but it is better to do that where the timeout
is actually set low, in bitmap_create.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As bitmap_create and bitmap_destroy already set thread-&gt;timeout
as appropriate, there is no need to do it in raid10_quiesce.
There is a possible need to wake the thread after the timeout
has been set low, but it is better to do that where the timeout
is actually set low, in bitmap_create.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-12-14T01:49:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1b04be96f6910ee415287bf0d5309c7d4c94bd2b'/>
<id>1b04be96f6910ee415287bf0d5309c7d4c94bd2b</id>
<content type='text'>
This removes a lot of multiplications by HZ.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This removes a lot of multiplications by HZ.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: move offset, daemon_sleep and chunksize out of bitmap structure</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-12-14T01:49:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=42a04b5078ce73a32f85762551d5703c5bd646a1'/>
<id>42a04b5078ce73a32f85762551d5703c5bd646a1</id>
<content type='text'>
... and into bitmap_info.  These are all configuration parameters
that need to be set before the bitmap is created.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
... and into bitmap_info.  These are all configuration parameters
that need to be set before the bitmap is created.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: support barrier requests on all personalities.</title>
<updated>2009-12-14T01:49:49+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-12-14T01:49:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a2826aa92e2e14db372eda01d333267258944033'/>
<id>a2826aa92e2e14db372eda01d333267258944033</id>
<content type='text'>
Previously barriers were only supported on RAID1.  This is because
other levels requires synchronisation across all devices and so needed
a different approach.
Here is that approach.

When a barrier arrives, we send a zero-length barrier to every active
device.  When that completes - and if the original request was not
empty -  we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero length barriers.

The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.

The reason for clearing the barrier flag is that a barrier request is
allowed to fail.  If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail.  That would be way too hard to deal with.
So if the first run of zero length barriers succeed, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.

RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet.  So we flush the stripe cache before
proceeding with the barrier.

Note that the second set of zero-length barriers are submitted
immediately after the original request is submitted.  Thus when
a personality finds mddev-&gt;barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.

That will be done in later patches.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Reviewed-by: Andre Noll &lt;maan@systemlinux.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously barriers were only supported on RAID1.  This is because
other levels requires synchronisation across all devices and so needed
a different approach.
Here is that approach.

When a barrier arrives, we send a zero-length barrier to every active
device.  When that completes - and if the original request was not
empty -  we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero length barriers.

The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.

The reason for clearing the barrier flag is that a barrier request is
allowed to fail.  If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail.  That would be way too hard to deal with.
So if the first run of zero length barriers succeed, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.

RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet.  So we flush the stripe cache before
proceeding with the barrier.

Note that the second set of zero-length barriers are submitted
immediately after the original request is submitted.  Thus when
a personality finds mddev-&gt;barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.

That will be done in later patches.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Reviewed-by: Andre Noll &lt;maan@systemlinux.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: raid1/raid10: handle allocation errors during array setup.</title>
<updated>2009-10-16T04:55:44+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-10-16T04:55:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ed9bfdf1a40952fd0f8094ec77f876b84ead69af'/>
<id>ed9bfdf1a40952fd0f8094ec77f876b84ead69af</id>
<content type='text'>
Both raid1 and raid10 create a mempool during startup.
If the 'alloc' function for this mempool fails, unplug_slaves
is called.
If that happens when the pool is being initialised, unplug_slaves
will try to use the 'conf' structure that isn't filled in yet, and
badness will happen.

So ensure that unplug_slaves doesn't get called unless we know
that the conf structure if fully initialised.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Both raid1 and raid10 create a mempool during startup.
If the 'alloc' function for this mempool fails, unplug_slaves
is called.
If that happens when the pool is being initialised, unplug_slaves
will try to use the 'conf' structure that isn't filled in yet, and
badness will happen.

So ensure that unplug_slaves doesn't get called unless we know
that the conf structure if fully initialised.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid1/raid10: add a cond_resched</title>
<updated>2009-10-16T04:55:32+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-10-16T04:55:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1d9d52416c0445019ccc1f0fddb9a227456eb61b'/>
<id>1d9d52416c0445019ccc1f0fddb9a227456eb61b</id>
<content type='text'>
During 'check' of a raid1 or raid10 it is possible for the management
thread to spend a lot of time running 'memcmp' on blocks from
different devices, so make sure the thread has a chance to schedule.
raid5d already has a cond_resched (in process_stripe).

Reported-By: Lee Howard &lt;faxguy@howardsilvan.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
During 'check' of a raid1 or raid10 it is possible for the management
thread to spend a lot of time running 'memcmp' on blocks from
different devices, so make sure the thread has a chance to schedule.
raid5d already has a cond_resched (in process_stripe).

Reported-By: Lee Howard &lt;faxguy@howardsilvan.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: raid-1/10: fix RW bits manipulation</title>
<updated>2009-09-23T08:20:15+00:00</updated>
<author>
<name>Dmitry Monakhov</name>
<email>rjevskiy@gmail.com</email>
</author>
<published>2009-09-20T01:52:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1ef04fefe2241087d9db7e9615c3f11b516e36cf'/>
<id>1ef04fefe2241087d9db7e9615c3f11b516e36cf</id>
<content type='text'>
Recently Jens has changed bio_rw_flagged() logic by following
commit 1f98a13f623e0ef666690a18c1250335fc6d7ef1. Now it returns
bool instead of int. This broke raid1/raid10 RW bits manipulation logic.
One of visible result is BUG_ON triggering due to empty barrier
here scsi_lib.c:1108 scsi_setup_fs_cmnd()

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Recently Jens has changed bio_rw_flagged() logic by following
commit 1f98a13f623e0ef666690a18c1250335fc6d7ef1. Now it returns
bool instead of int. This broke raid1/raid10 RW bits manipulation logic.
One of visible result is BUG_ON triggering due to empty barrier
here scsi_lib.c:1108 scsi_setup_fs_cmnd()

Signed-off-by: Dmitry Monakhov &lt;dmonakhov@openvz.org&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
