<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/md/raid10.c, branch linux-2.6.33.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>md: raid10: Fix null pointer dereference in fix_read_error()</title>
<updated>2010-08-02T17:26:36+00:00</updated>
<author>
<name>Prasanna S. Panchamukhi</name>
<email>prasanna.panchamukhi@riverbed.com</email>
</author>
<published>2010-06-24T03:31:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7788f4675013950f6e7f6b817f620722675c7c16'/>
<id>7788f4675013950f6e7f6b817f620722675c7c16</id>
<content type='text'>
commit 0544a21db02c1d8883158fd6f323364f830a120a upstream.

Such NULL pointer dereference can occur when the driver was fixing the
read errors/bad blocks and the disk was physically removed
causing a system crash. This patch check if the
rcu_dereference() returns valid rdev before accessing it in fix_read_error().

Signed-off-by: Prasanna S. Panchamukhi &lt;prasanna.panchamukhi@riverbed.com&gt;
Signed-off-by: Rob Becker &lt;rbecker@riverbed.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 0544a21db02c1d8883158fd6f323364f830a120a upstream.

Such NULL pointer dereference can occur when the driver was fixing the
read errors/bad blocks and the disk was physically removed
causing a system crash. This patch check if the
rcu_dereference() returns valid rdev before accessing it in fix_read_error().

Signed-off-by: Prasanna S. Panchamukhi &lt;prasanna.panchamukhi@riverbed.com&gt;
Signed-off-by: Rob Becker &lt;rbecker@riverbed.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: Fix read balancing in RAID1 and RAID10 on drives &gt; 2TB</title>
<updated>2010-07-05T18:15:49+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-05-07T22:20:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fcab0185eddfe15e4bd7f484db0d4441186a7b01'/>
<id>fcab0185eddfe15e4bd7f484db0d4441186a7b01</id>
<content type='text'>
commit af3a2cd6b8a479345786e7fe5e199ad2f6240e56 upstream.

read_balance uses a "unsigned long" for a sector number which
will get truncated beyond 2TB.
This will cause read-balancing to be non-optimal, and can cause
data to be read from the 'wrong' branch during a resync.  This has a
very small chance of returning wrong data.

Reported-by: Jordan Russell &lt;jr-list-2010@quo.to&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit af3a2cd6b8a479345786e7fe5e199ad2f6240e56 upstream.

read_balance uses a "unsigned long" for a sector number which
will get truncated beyond 2TB.
This will cause read-balancing to be non-optimal, and can cause
data to be read from the 'wrong' branch during a resync.  This has a
very small chance of returning wrong data.

Reported-by: Jordan Russell &lt;jr-list-2010@quo.to&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: deal with merge_bvec_fn in component devices better.</title>
<updated>2010-04-26T14:48:03+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2010-03-31T01:07:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=11d0dfa8ce268e40b625e26537f21cf5b74007af'/>
<id>11d0dfa8ce268e40b625e26537f21cf5b74007af</id>
<content type='text'>
commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_phys_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 627a2d3c29427637f4c5d31ccc7fcbd8d312cd71 upstream.

If a component device has a merge_bvec_fn then as we never call it
we must ensure we never need to.  Currently this is done by setting
max_sector to 1 PAGE, however this does not stop a bio being created
with several sub-page iovecs that would violate the merge_bvec_fn.

So instead set max_phys_segments to 1 and set the segment boundary to the
same as a page boundary to ensure there is only ever one single-page
segment of IO requested at a time.

This can particularly be an issue when 'xen' is used as it is
known to submit multiple small buffers in a single bio.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@suse.de&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md: add MODULE_DESCRIPTION for all md related modules.</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-12-14T01:49:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0efb9e6191e1d3d34c1db90b829b742bc36d532e'/>
<id>0efb9e6191e1d3d34c1db90b829b742bc36d532e</id>
<content type='text'>
Suggested by  Oren Held &lt;orenhe@il.ibm.com&gt;

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Suggested by  Oren Held &lt;orenhe@il.ibm.com&gt;

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>raid: improve MD/raid10 handling of correctable read errors.</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>Robert Becker</name>
<email>Rob.Becker@riverbed.com</email>
</author>
<published>2009-12-14T01:49:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1e50915fe0bbf7a46db0fa7e1e604d3fc95f057d'/>
<id>1e50915fe0bbf7a46db0fa7e1e604d3fc95f057d</id>
<content type='text'>
We've noticed severe lasting performance degradation of our raid
arrays when we have drives that yield large amounts of media errors.
The raid10 module will queue each failed read for retry, and also
will attempt call fix_read_error() to perform the read recovery.
Read recovery is performed while the array is frozen, so repeated
recovery attempts can degrade the performance of the array for
extended periods of time.

With this patch I propose adding a per md device max number of
corrected read attempts.  Each rdev will maintain a count of
read correction attempts in the rdev-&gt;read_errors field (not
used currently for raid10). When we enter fix_read_error()
we'll check to see when the last read error occurred, and
divide the read error count by 2 for every hour since the
last read error. If at that point our read error count
exceeds the read error threshold, we'll fail the raid device.

In addition in this patch I add sysfs nodes (get/set) for
the per md max_read_errors attribute, the rdev-&gt;read_errors
attribute, and added some printk's to indicate when
fix_read_error fails to repair an rdev.

For testing I used debugfs-&gt;fail_make_request to inject
IO errors to the rdev while doing IO to the raid array.

Signed-off-by: Robert Becker &lt;Rob.Becker@riverbed.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We've noticed severe lasting performance degradation of our raid
arrays when we have drives that yield large amounts of media errors.
The raid10 module will queue each failed read for retry, and also
will attempt call fix_read_error() to perform the read recovery.
Read recovery is performed while the array is frozen, so repeated
recovery attempts can degrade the performance of the array for
extended periods of time.

With this patch I propose adding a per md device max number of
corrected read attempts.  Each rdev will maintain a count of
read correction attempts in the rdev-&gt;read_errors field (not
used currently for raid10). When we enter fix_read_error()
we'll check to see when the last read error occurred, and
divide the read error count by 2 for every hour since the
last read error. If at that point our read error count
exceeds the read error threshold, we'll fail the raid device.

In addition in this patch I add sysfs nodes (get/set) for
the per md max_read_errors attribute, the rdev-&gt;read_errors
attribute, and added some printk's to indicate when
fix_read_error fails to repair an rdev.

For testing I used debugfs-&gt;fail_make_request to inject
IO errors to the rdev while doing IO to the raid array.

Signed-off-by: Robert Becker &lt;Rob.Becker@riverbed.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid10: print more useful messages on device failure.</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>Robert Becker</name>
<email>Rob.Becker@riverbed.com</email>
</author>
<published>2009-12-14T01:49:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=67b8dc4b06b0e97df55fd76e209f34f9a52e820e'/>
<id>67b8dc4b06b0e97df55fd76e209f34f9a52e820e</id>
<content type='text'>
When we get a read error on a device in a RAID10, and attempting to
repair the error fails, print more useful messages about why it
failed.

Signed-off-by: Robert Becker &lt;Rob.Becker@riverbed.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we get a read error on a device in a RAID10, and attempting to
repair the error fails, print more useful messages about why it
failed.

Signed-off-by: Robert Becker &lt;Rob.Becker@riverbed.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: remove needless setting of thread-&gt;timeout in raid10_quiesce</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-12-14T01:49:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9cd30fdc33cde9ae4ac55a1ccbbb89f3f7b9b2f2'/>
<id>9cd30fdc33cde9ae4ac55a1ccbbb89f3f7b9b2f2</id>
<content type='text'>
As bitmap_create and bitmap_destroy already set thread-&gt;timeout
as appropriate, there is no need to do it in raid10_quiesce.
There is a possible need to wake the thread after the timeout
has been set low, but it is better to do that where the timeout
is actually set low, in bitmap_create.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As bitmap_create and bitmap_destroy already set thread-&gt;timeout
as appropriate, there is no need to do it in raid10_quiesce.
There is a possible need to wake the thread after the timeout
has been set low, but it is better to do that where the timeout
is actually set low, in bitmap_create.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: change daemon_sleep to be in 'jiffies' rather than 'seconds'.</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-12-14T01:49:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1b04be96f6910ee415287bf0d5309c7d4c94bd2b'/>
<id>1b04be96f6910ee415287bf0d5309c7d4c94bd2b</id>
<content type='text'>
This removes a lot of multiplications by HZ.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This removes a lot of multiplications by HZ.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: move offset, daemon_sleep and chunksize out of bitmap structure</title>
<updated>2009-12-14T01:51:41+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-12-14T01:49:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=42a04b5078ce73a32f85762551d5703c5bd646a1'/>
<id>42a04b5078ce73a32f85762551d5703c5bd646a1</id>
<content type='text'>
... and into bitmap_info.  These are all configuration parameters
that need to be set before the bitmap is created.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
... and into bitmap_info.  These are all configuration parameters
that need to be set before the bitmap is created.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: support barrier requests on all personalities.</title>
<updated>2009-12-14T01:49:49+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2009-12-14T01:49:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a2826aa92e2e14db372eda01d333267258944033'/>
<id>a2826aa92e2e14db372eda01d333267258944033</id>
<content type='text'>
Previously barriers were only supported on RAID1.  This is because
other levels requires synchronisation across all devices and so needed
a different approach.
Here is that approach.

When a barrier arrives, we send a zero-length barrier to every active
device.  When that completes - and if the original request was not
empty -  we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero length barriers.

The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.

The reason for clearing the barrier flag is that a barrier request is
allowed to fail.  If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail.  That would be way too hard to deal with.
So if the first run of zero length barriers succeed, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.

RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet.  So we flush the stripe cache before
proceeding with the barrier.

Note that the second set of zero-length barriers are submitted
immediately after the original request is submitted.  Thus when
a personality finds mddev-&gt;barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.

That will be done in later patches.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Reviewed-by: Andre Noll &lt;maan@systemlinux.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously barriers were only supported on RAID1.  This is because
other levels requires synchronisation across all devices and so needed
a different approach.
Here is that approach.

When a barrier arrives, we send a zero-length barrier to every active
device.  When that completes - and if the original request was not
empty -  we submit the barrier request itself (with the barrier flag
cleared) and then submit a fresh load of zero length barriers.

The barrier request itself is asynchronous, but any subsequent
request will block until the barrier completes.

The reason for clearing the barrier flag is that a barrier request is
allowed to fail.  If we pass a non-empty barrier through a striping
raid level it is conceivable that part of it could succeed and part
could fail.  That would be way too hard to deal with.
So if the first run of zero length barriers succeed, we assume all is
sufficiently well that we send the request and ignore errors in the
second run of barriers.

RAID5 needs extra care as write requests may not have been submitted
to the underlying devices yet.  So we flush the stripe cache before
proceeding with the barrier.

Note that the second set of zero-length barriers are submitted
immediately after the original request is submitted.  Thus when
a personality finds mddev-&gt;barrier to be set during make_request,
it should not return from make_request until the corresponding
per-device request(s) have been queued.

That will be done in later patches.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Reviewed-by: Andre Noll &lt;maan@systemlinux.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
