<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/md/raid5.c, branch v3.18</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>md: remove unwanted white space from md.c</title>
<updated>2014-10-14T02:08:29+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2014-09-30T04:23:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f72ffdd68616e3697bc782b21c82197aeb480fd5'/>
<id>f72ffdd68616e3697bc782b21c82197aeb480fd5</id>
<content type='text'>
My editor shows much of this is RED.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
My editor shows much of this is RED.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: fix init_stripe() inconsistencies</title>
<updated>2014-10-14T02:08:28+00:00</updated>
<author>
<name>Markus Stockhausen</name>
<email>stockhausen@collogia.de</email>
</author>
<published>2014-08-23T10:19:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b8e6a15a1af9b1c203002e7768e60136c4e0e5c6'/>
<id>b8e6a15a1af9b1c203002e7768e60136c4e0e5c6</id>
<content type='text'>
raid5: fix init_stripe() inconsistencies

1) remove_hash() is not necessary. We will only be called right after
   get_free_stripe(). There we have already a call to remove_hash().

2) Tracing prints out the sector of the freed stripe and not the sector
   that we want to initialize.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
raid5: fix init_stripe() inconsistencies

1) remove_hash() is not necessary. We will only be called right after
   get_free_stripe(). There we have already a call to remove_hash().

2) Tracing prints out the sector of the freed stripe and not the sector
   that we want to initialize.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: use set_bit/clear_bit instead of shift/mask for bi_flags changes.</title>
<updated>2014-10-08T23:07:04+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2014-08-23T10:19:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3fd83717e47687817f5d3e45696bf22456d8b422'/>
<id>3fd83717e47687817f5d3e45696bf22456d8b422</id>
<content type='text'>
Using {set,clear}_bit is more consistent than shifting and masking.

No functional change.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Using {set,clear}_bit is more consistent than shifting and masking.

No functional change.

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: disable 'DISCARD' by default due to safety concerns.</title>
<updated>2014-10-02T03:45:00+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2014-10-02T03:45:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8e0e99ba64c7ba46133a7c8a3e3f7de01f23bd93'/>
<id>8e0e99ba64c7ba46133a7c8a3e3f7de01f23bd93</id>
<content type='text'>
It has come to my attention (thanks Martin) that 'discard_zeroes_data'
is only a hint.  Some devices in some cases don't do what it
says on the label.

The use of DISCARD in RAID5 depends on reads from discarded regions
being predictably zero.  If a write to a previously discarded region
performs a read-modify-write cycle it assumes that the parity block
was consistent with the data blocks.  If all were zero, this would
be the case.  If some are and some aren't this would not be the case.
This could lead to data corruption after a device failure when
data needs to be reconstructed from the parity.

As we cannot trust 'discard_zeroes_data', ignore it by default
and so disallow DISCARD on all raid4/5/6 arrays.

As many devices are trustworthy, and as there are benefits to using
DISCARD, add a module parameter to over-ride this caution and cause
DISCARD to work if discard_zeroes_data is set.

If a site want to enable DISCARD on some arrays but not on others they
should select DISCARD support at the filesystem level, and set the
raid456 module parameter.
    raid456.devices_handle_discard_safely=Y

As this is a data-safety issue, I believe this patch is suitable for
-stable.
DISCARD support for RAID456 was added in 3.7

Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Cc: stable@vger.kernel.org (3.7+)
Acked-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Acked-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Fixes: 620125f2bf8ff0c4969b79653b54d7bcc9d40637
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It has come to my attention (thanks Martin) that 'discard_zeroes_data'
is only a hint.  Some devices in some cases don't do what it
says on the label.

The use of DISCARD in RAID5 depends on reads from discarded regions
being predictably zero.  If a write to a previously discarded region
performs a read-modify-write cycle it assumes that the parity block
was consistent with the data blocks.  If all were zero, this would
be the case.  If some are and some aren't this would not be the case.
This could lead to data corruption after a device failure when
data needs to be reconstructed from the parity.

As we cannot trust 'discard_zeroes_data', ignore it by default
and so disallow DISCARD on all raid4/5/6 arrays.

As many devices are trustworthy, and as there are benefits to using
DISCARD, add a module parameter to over-ride this caution and cause
DISCARD to work if discard_zeroes_data is set.

If a site want to enable DISCARD on some arrays but not on others they
should select DISCARD support at the filesystem level, and set the
raid456 module parameter.
    raid456.devices_handle_discard_safely=Y

As this is a data-safety issue, I believe this patch is suitable for
-stable.
DISCARD support for RAID456 was added in 3.7

Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: "Martin K. Petersen" &lt;martin.petersen@oracle.com&gt;
Cc: Mike Snitzer &lt;snitzer@redhat.com&gt;
Cc: Heinz Mauelshagen &lt;heinzm@redhat.com&gt;
Cc: stable@vger.kernel.org (3.7+)
Acked-by: Martin K. Petersen &lt;martin.petersen@oracle.com&gt;
Acked-by: Mike Snitzer &lt;snitzer@redhat.com&gt;
Fixes: 620125f2bf8ff0c4969b79653b54d7bcc9d40637
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid6: avoid data corruption during recovery of double-degraded RAID6</title>
<updated>2014-08-18T04:49:46+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2014-08-12T23:57:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9c4bdf697c39805078392d5ddbbba5ae5680e0dd'/>
<id>9c4bdf697c39805078392d5ddbbba5ae5680e0dd</id>
<content type='text'>
During recovery of a double-degraded RAID6 it is possible for
some blocks not to be recovered properly, leading to corruption.

If a write happens to one block in a stripe that would be written to a
missing device, and at the same time that stripe is recovering data
to the other missing device, then that recovered data may not be written.

This patch skips, in the double-degraded case, an optimisation that is
only safe for single-degraded arrays.

Bug was introduced in 2.6.32 and fix is suitable for any kernel since
then.  In an older kernel with separate handle_stripe5() and
handle_stripe6() functions the patch must change handle_stripe6().

Cc: stable@vger.kernel.org (2.6.32+)
Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8
Cc: Yuri Tikhonov &lt;yur@emcraft.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reported-by: "Manibalan P" &lt;pmanibalan@amiindia.co.in&gt;
Tested-by: "Manibalan P" &lt;pmanibalan@amiindia.co.in&gt;
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
During recovery of a double-degraded RAID6 it is possible for
some blocks not to be recovered properly, leading to corruption.

If a write happens to one block in a stripe that would be written to a
missing device, and at the same time that stripe is recovering data
to the other missing device, then that recovered data may not be written.

This patch skips, in the double-degraded case, an optimisation that is
only safe for single-degraded arrays.

Bug was introduced in 2.6.32 and fix is suitable for any kernel since
then.  In an older kernel with separate handle_stripe5() and
handle_stripe6() functions the patch must change handle_stripe6().

Cc: stable@vger.kernel.org (2.6.32+)
Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8
Cc: Yuri Tikhonov &lt;yur@emcraft.com&gt;
Cc: Dan Williams &lt;dan.j.williams@intel.com&gt;
Reported-by: "Manibalan P" &lt;pmanibalan@amiindia.co.in&gt;
Tested-by: "Manibalan P" &lt;pmanibalan@amiindia.co.in&gt;
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Acked-by: Dan Williams &lt;dan.j.williams@intel.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: avoid livelock caused by non-aligned writes.</title>
<updated>2014-08-18T04:49:41+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2014-08-12T23:48:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a40687ff73a5b14909d6aa522f7d778b158911c5'/>
<id>a40687ff73a5b14909d6aa522f7d778b158911c5</id>
<content type='text'>
If a stripe in a raid6 array received a write to each data block while
the array is degraded, and if any of these writes to a missing device
are not page-aligned, then a live-lock happens.

In this case the P and Q blocks need to be read so that the part of
the missing block which is *not* being updated by the write can be
constructed.  Due to a logic error, these blocks are not loaded, so
the update cannot proceed and the stripe is 'handled' repeatedly in an
infinite loop.

This bug is unlikely as most writes are page aligned.  However as it
can lead to a livelock it is suitable for -stable.  It was introduced
in 3.16.

Cc: stable@vger.kernel.org (v3.16)
Fixed: 67f455486d2ea20b2d94d6adf5b9b783d079e321
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a stripe in a raid6 array received a write to each data block while
the array is degraded, and if any of these writes to a missing device
are not page-aligned, then a live-lock happens.

In this case the P and Q blocks need to be read so that the part of
the missing block which is *not* being updated by the write can be
constructed.  Due to a logic error, these blocks are not loaded, so
the update cannot proceed and the stripe is 'handled' repeatedly in an
infinite loop.

This bug is unlikely as most writes are page aligned.  However as it
can lead to a livelock it is suitable for -stable.  It was introduced
in 3.16.

Cc: stable@vger.kernel.org (v3.16)
Fixed: 67f455486d2ea20b2d94d6adf5b9b783d079e321
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'md/3.16' of git://neil.brown.name/md</title>
<updated>2014-06-11T15:33:41+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2014-06-11T15:33:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8d0304e69dc960ae7683943ac5b9c4c685d409d7'/>
<id>8d0304e69dc960ae7683943ac5b9c4c685d409d7</id>
<content type='text'>
Pull md updates from Neil Brown:
 "Assorted md fixes for 3.16

  Mostly performance improvements with a few corner-case bug fixes"

* tag 'md/3.16' of git://neil.brown.name/md:
  raid5: speedup sync_request processing
  md/raid5: deadlock between retry_aligned_read with barrier io
  raid5: add an option to avoid copy data from bio to stripe cache
  md/bitmap: remove confusing code from filemap_get_page.
  raid5: avoid release list until last reference of the stripe
  md: md_clear_badblocks should return an error code on failure.
  md/raid56: Don't perform reads to support writes until stripe is ready.
  md: refuse to change shape of array if it is active but read-only
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull md updates from Neil Brown:
 "Assorted md fixes for 3.16

  Mostly performance improvements with a few corner-case bug fixes"

* tag 'md/3.16' of git://neil.brown.name/md:
  raid5: speedup sync_request processing
  md/raid5: deadlock between retry_aligned_read with barrier io
  raid5: add an option to avoid copy data from bio to stripe cache
  md/bitmap: remove confusing code from filemap_get_page.
  raid5: avoid release list until last reference of the stripe
  md: md_clear_badblocks should return an error code on failure.
  md/raid56: Don't perform reads to support writes until stripe is ready.
  md: refuse to change shape of array if it is active but read-only
</pre>
</div>
</content>
</entry>
<entry>
<title>raid5: speedup sync_request processing</title>
<updated>2014-06-10T01:02:01+00:00</updated>
<author>
<name>Eivind Sarto</name>
<email>eivindsarto@gmail.com</email>
</author>
<published>2014-06-10T00:06:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=053f5b6525ae32da397e6c47721961f800d2c924'/>
<id>053f5b6525ae32da397e6c47721961f800d2c924</id>
<content type='text'>
The raid5 sync_request() processing calls handle_stripe() within the context of
the resync-thread.  The resync-thread issues the first set of read requests
and this adds execution latency and slows down the scheduling of the next
sync_request().
The current rebuild/resync speed of raid5 is not much faster than what
rotational HDDs can sustain.
Testing the following patch on a 6-drive array, I can increase the rebuild
speed from 100 MB/s to 175 MB/s.
The sync_request() now just sets STRIPE_HANDLE and releases the stripe.  This
creates some more parallelism between the resync-thread and raid5 kernel daemon.

Signed-off-by: Eivind Sarto &lt;esarto@fusionio.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The raid5 sync_request() processing calls handle_stripe() within the context of
the resync-thread.  The resync-thread issues the first set of read requests
and this adds execution latency and slows down the scheduling of the next
sync_request().
The current rebuild/resync speed of raid5 is not much faster than what
rotational HDDs can sustain.
Testing the following patch on a 6-drive array, I can increase the rebuild
speed from 100 MB/s to 175 MB/s.
The sync_request() now just sets STRIPE_HANDLE and releases the stripe.  This
creates some more parallelism between the resync-thread and raid5 kernel daemon.

Signed-off-by: Eivind Sarto &lt;esarto@fusionio.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: deadlock between retry_aligned_read with barrier io</title>
<updated>2014-06-05T07:18:19+00:00</updated>
<author>
<name>hui jiao</name>
<email>simonjiaoh@gmail.com</email>
</author>
<published>2014-06-05T03:34:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2844dc32ea67044b345221067207ce67ffe8da76'/>
<id>2844dc32ea67044b345221067207ce67ffe8da76</id>
<content type='text'>
A chunk aligned read increases counter active_aligned_reads and
decreases it after sub-device handle it successfully. But when a read
error occurs,  the read redispatched by raid5d, and the
active_aligned_reads will not be decreased until we can grab a stripe
head in retry_aligned_read. Now suppose, a barrier io comes, set
conf-&gt;quiesce to 2, and wait until both active_stripes and
active_aligned_reads are zero. The retried chunk aligned read gets
stuck at get_active_stripe waiting until conf-&gt;quiesce becomes 0.
Retry_aligned_read and barrier io are waiting each other now.
One possible solution is that we ignore conf-&gt;quiesce, let the retried
aligned read finish. I reproduced this deadlock and test this patch on
centos6.0

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A chunk aligned read increases counter active_aligned_reads and
decreases it after sub-device handle it successfully. But when a read
error occurs,  the read redispatched by raid5d, and the
active_aligned_reads will not be decreased until we can grab a stripe
head in retry_aligned_read. Now suppose, a barrier io comes, set
conf-&gt;quiesce to 2, and wait until both active_stripes and
active_aligned_reads are zero. The retried chunk aligned read gets
stuck at get_active_stripe waiting until conf-&gt;quiesce becomes 0.
Retry_aligned_read and barrier io are waiting each other now.
One possible solution is that we ignore conf-&gt;quiesce, let the retried
aligned read finish. I reproduced this deadlock and test this patch on
centos6.0

Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>raid5: add an option to avoid copy data from bio to stripe cache</title>
<updated>2014-05-29T06:59:47+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@kernel.org</email>
</author>
<published>2014-05-21T09:57:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d592a9969141e67a3874c808999a4db4bf82ed83'/>
<id>d592a9969141e67a3874c808999a4db4bf82ed83</id>
<content type='text'>
The stripe cache has two goals:
1. cache data, so next time if data can be found in stripe cache, disk access
can be avoided.
2. stable data. data is copied from bio to stripe cache and calculated parity.
data written to disk is from stripe cache, so if upper layer changes bio data,
data written to disk isn't impacted.

In my environment, I can guarantee 2 will not happen. And BDI_CAP_STABLE_WRITES
can guarantee 2 too. For 1, it's not common too. block plug mechanism will
dispatch a bunch of sequentail small requests together. And since I'm using
SSD, I'm using small chunk size. It's rare case stripe cache is really useful.

So I'd like to avoid the copy from bio to stripe cache and it's very helpful
for performance. In my 1M randwrite tests, avoid the copy can increase the
performance more than 30%.

Of course, this shouldn't be enabled by default. It's reported enabling
BDI_CAP_STABLE_WRITES can harm some workloads before, so I added an option to
control it.

Neilb:
  changed BUG_ON to WARN_ON
  Removed some assignments from raid5_build_block which are now not needed.

Signed-off-by: Shaohua Li &lt;shli@fusionio.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The stripe cache has two goals:
1. cache data, so next time if data can be found in stripe cache, disk access
can be avoided.
2. stable data. data is copied from bio to stripe cache and calculated parity.
data written to disk is from stripe cache, so if upper layer changes bio data,
data written to disk isn't impacted.

In my environment, I can guarantee 2 will not happen. And BDI_CAP_STABLE_WRITES
can guarantee 2 too. For 1, it's not common too. block plug mechanism will
dispatch a bunch of sequentail small requests together. And since I'm using
SSD, I'm using small chunk size. It's rare case stripe cache is really useful.

So I'd like to avoid the copy from bio to stripe cache and it's very helpful
for performance. In my 1M randwrite tests, avoid the copy can increase the
performance more than 30%.

Of course, this shouldn't be enabled by default. It's reported enabling
BDI_CAP_STABLE_WRITES can harm some workloads before, so I added an option to
control it.

Neilb:
  changed BUG_ON to WARN_ON
  Removed some assignments from raid5_build_block which are now not needed.

Signed-off-by: Shaohua Li &lt;shli@fusionio.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
