<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/md/raid5.c, branch linux-4.5.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list</title>
<updated>2016-04-12T14:33:39+00:00</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.com</email>
</author>
<published>2016-03-09T01:58:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ea43a45d9885879257d985ff139cbb999ff3e830'/>
<id>ea43a45d9885879257d985ff139cbb999ff3e830</id>
<content type='text'>
commit 550da24f8d62fe81f3c13e3ec27602d6e44d43dc upstream.

break_stripe_batch_list breaks up a batch and copies some flags from
the batch head to the members, preserving others.

It doesn't preserve or copy STRIPE_PREREAD_ACTIVE.  This is not
normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a
stripe_head is added to a batch, and is not set on stripe_heads
already in a batch.

However there is no locking to ensure one thread doesn't set the flag
after it has just been cleared in another.  This does occasionally happen.

md/raid5 maintains a count of the number of stripe_heads with
STRIPE_PREREAD_ACTIVE set: conf-&gt;preread_active_stripes.  When
break_stripe_batch_list clears STRIPE_PREREAD_ACTIVE inadvertently
this could becomes incorrect and will never again return to zero.

md/raid5 delays the handling of some stripe_heads until
preread_active_stripes becomes zero.  So when the above mention race
happens, those stripe_heads become blocked and never progress,
resulting is write to the array handing.

So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE
in the members of a batch.

URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
URL: http://thread.gmane.org/5649C0E9.2030204@zoner.cz
Reported-by: Martin Svec &lt;martin.svec@zoner.cz&gt; (and others)
Tested-by: Tom Weber &lt;linux@junkyard.4t2.com&gt;
Fixes: 1b956f7a8f9a ("md/raid5: be more selective about distributing flags across batch.")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 550da24f8d62fe81f3c13e3ec27602d6e44d43dc upstream.

break_stripe_batch_list breaks up a batch and copies some flags from
the batch head to the members, preserving others.

It doesn't preserve or copy STRIPE_PREREAD_ACTIVE.  This is not
normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a
stripe_head is added to a batch, and is not set on stripe_heads
already in a batch.

However there is no locking to ensure one thread doesn't set the flag
after it has just been cleared in another.  This does occasionally happen.

md/raid5 maintains a count of the number of stripe_heads with
STRIPE_PREREAD_ACTIVE set: conf-&gt;preread_active_stripes.  When
break_stripe_batch_list clears STRIPE_PREREAD_ACTIVE inadvertently
this could becomes incorrect and will never again return to zero.

md/raid5 delays the handling of some stripe_heads until
preread_active_stripes becomes zero.  So when the above mention race
happens, those stripe_heads become blocked and never progress,
resulting is write to the array handing.

So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE
in the members of a batch.

URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
URL: http://thread.gmane.org/5649C0E9.2030204@zoner.cz
Reported-by: Martin Svec &lt;martin.svec@zoner.cz&gt; (and others)
Tested-by: Tom Weber &lt;linux@junkyard.4t2.com&gt;
Fixes: 1b956f7a8f9a ("md/raid5: be more selective about distributing flags across batch.")
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>RAID5: revert e9e4c377e2f563 to fix a livelock</title>
<updated>2016-04-12T14:33:38+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2016-02-26T00:24:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6e3bd6cee9b6f9dab7d175dc19245c0ccbe2376b'/>
<id>6e3bd6cee9b6f9dab7d175dc19245c0ccbe2376b</id>
<content type='text'>
commit 6ab2a4b806ae21b6c3e47c5ff1285ec06d505325 upstream.

Revert commit
e9e4c377e2f563(md/raid5: per hash value and exclusive wait_for_stripe)

The problem is raid5_get_active_stripe waits on
conf-&gt;wait_for_stripe[hash]. Assume hash is 0. My test release stripes
in this order:
- release all stripes with hash 0
- raid5_get_active_stripe still sleeps since active_stripes &gt;
  max_nr_stripes * 3 / 4
- release all stripes with hash other than 0. active_stripes becomes 0
- raid5_get_active_stripe still sleeps, since nobody wakes up
  wait_for_stripe[0]
The system live locks. The problem is active_stripes isn't a per-hash
count. Revert the patch makes the live lock go away.

Cc: Yuanhan Liu &lt;yuanhan.liu@linux.intel.com&gt;
Cc: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6ab2a4b806ae21b6c3e47c5ff1285ec06d505325 upstream.

Revert commit
e9e4c377e2f563(md/raid5: per hash value and exclusive wait_for_stripe)

The problem is raid5_get_active_stripe waits on
conf-&gt;wait_for_stripe[hash]. Assume hash is 0. My test release stripes
in this order:
- release all stripes with hash 0
- raid5_get_active_stripe still sleeps since active_stripes &gt;
  max_nr_stripes * 3 / 4
- release all stripes with hash other than 0. active_stripes becomes 0
- raid5_get_active_stripe still sleeps, since nobody wakes up
  wait_for_stripe[0]
The system live locks. The problem is active_stripes isn't a per-hash
count. Revert the patch makes the live lock go away.

Cc: Yuanhan Liu &lt;yuanhan.liu@linux.intel.com&gt;
Cc: NeilBrown &lt;neilb@suse.de&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>RAID5: check_reshape() shouldn't call mddev_suspend</title>
<updated>2016-04-12T14:33:38+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2016-02-25T01:38:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=05961902a0a56dd7531671a13e0fef1a5b01d1e8'/>
<id>05961902a0a56dd7531671a13e0fef1a5b01d1e8</id>
<content type='text'>
commit 27a353c026a879a1001e5eac4bda75b16262c44a upstream.

check_reshape() is called from raid5d thread. raid5d thread shouldn't
call mddev_suspend(), because mddev_suspend() waits for all IO finish
but IO is handled in raid5d thread, we could easily deadlock here.

This issue is introduced by
738a273 ("md/raid5: fix allocation of 'scribble' array.")

Reported-and-tested-by: Artur Paszkiewicz &lt;artur.paszkiewicz@intel.com&gt;
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 27a353c026a879a1001e5eac4bda75b16262c44a upstream.

check_reshape() is called from raid5d thread. raid5d thread shouldn't
call mddev_suspend(), because mddev_suspend() waits for all IO finish
but IO is handled in raid5d thread, we could easily deadlock here.

This issue is introduced by
738a273 ("md/raid5: fix allocation of 'scribble' array.")

Reported-and-tested-by: Artur Paszkiewicz &lt;artur.paszkiewicz@intel.com&gt;
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: Compare apples to apples (or sectors to sectors)</title>
<updated>2016-04-12T14:33:38+00:00</updated>
<author>
<name>Jes Sorensen</name>
<email>Jes.Sorensen@redhat.com</email>
</author>
<published>2016-02-16T21:44:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e7ffd7dbb5bc025c6d63416b8acba2b162f29bcb'/>
<id>e7ffd7dbb5bc025c6d63416b8acba2b162f29bcb</id>
<content type='text'>
commit e7597e69dec59b65c5525db1626b9d34afdfa678 upstream.

'max_discard_sectors' is in sectors, while 'stripe' is in bytes.

This fixes the problem where DISCARD would get disabled on some larger
RAID5 configurations (6 or more drives in my testing), while it worked
as expected with smaller configurations.

Fixes: 620125f2bf8 ("MD: raid5 trim support")
Signed-off-by: Jes Sorensen &lt;Jes.Sorensen@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e7597e69dec59b65c5525db1626b9d34afdfa678 upstream.

'max_discard_sectors' is in sectors, while 'stripe' is in bytes.

This fixes the problem where DISCARD would get disabled on some larger
RAID5 configurations (6 or more drives in my testing), while it worked
as expected with smaller configurations.

Fixes: 620125f2bf8 ("MD: raid5 trim support")
Signed-off-by: Jes Sorensen &lt;Jes.Sorensen@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>MD: rename some functions</title>
<updated>2016-01-20T21:52:20+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2016-01-20T21:52:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=849674e4fb175e47b7504249f7367367b18fe6a1'/>
<id>849674e4fb175e47b7504249f7367367b18fe6a1</id>
<content type='text'>
These short function names are hard to search. Rename them to make vim happy.

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
These short function names are hard to search. Rename them to make vim happy.

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>raid5-cache: add journal hot add/remove support</title>
<updated>2016-01-06T00:39:57+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2015-12-20T23:51:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f6b6ec5cfac306c1eea66f074050864efcb11851'/>
<id>f6b6ec5cfac306c1eea66f074050864efcb11851</id>
<content type='text'>
Add support for journal disk hot add/remove. Mostly trival checks in md
part. The raid5 part is a little tricky. For hot-remove, we can't wait
pending write as it's called from raid5d. The wait will cause deadlock.
We simplily fail the hot-remove. A hot-remove retry can success
eventually since if journal disk is faulty all pending write will be
failed and finish. For hot-add, since an array supporting journal but
without journal disk will be marked read-only, we are safe to hot add
journal without stopping IO (should be read IO, while journal only
handles write IO).

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add support for journal disk hot add/remove. Mostly trival checks in md
part. The raid5 part is a little tricky. For hot-remove, we can't wait
pending write as it's called from raid5d. The wait will cause deadlock.
We simplily fail the hot-remove. A hot-remove retry can success
eventually since if journal disk is faulty all pending write will be
failed and finish. For hot-add, since an array supporting journal but
without journal disk will be marked read-only, we are safe to hot add
journal without stopping IO (should be read IO, while journal only
handles write IO).

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid5: remove redundant check in stripe_add_to_batch_list()</title>
<updated>2016-01-06T00:38:22+00:00</updated>
<author>
<name>Roman Gushchin</name>
<email>klamm@yandex-team.ru</email>
</author>
<published>2015-12-20T23:50:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b46020aa3a8a0f9c7324fe0af4aec4227f947a10'/>
<id>b46020aa3a8a0f9c7324fe0af4aec4227f947a10</id>
<content type='text'>
The stripe_add_to_batch_list() function is called only if
stripe_can_batch() returned true, so there is no need for double check.

Signed-off-by: Roman Gushchin &lt;klamm@yandex-team.ru&gt;
Cc: Neil Brown &lt;neilb@suse.com&gt;
Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The stripe_add_to_batch_list() function is called only if
stripe_can_batch() returned true, so there is no need for double check.

Signed-off-by: Roman Gushchin &lt;klamm@yandex-team.ru&gt;
Cc: Neil Brown &lt;neilb@suse.com&gt;
Cc: linux-raid@vger.kernel.org
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'md/4.4' of git://neil.brown.name/md</title>
<updated>2015-11-05T05:12:47+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-11-05T05:12:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ac322de6bf5416cb145b58599297b8be73cd86ac'/>
<id>ac322de6bf5416cb145b58599297b8be73cd86ac</id>
<content type='text'>
Pull md updates from Neil Brown:
 "Two major components to this update.

   1) The clustered-raid1 support from SUSE is nearly complete.  There
      are a few outstanding issues being worked on.  Maybe half a dozen
      patches will bring this to a usable state.

   2) The first stage of journalled-raid5 support from Facebook makes an
      appearance.  With a journal device configured (typically NVRAM or
      SSD), the "RAID5 write hole" should be closed - a crash during
      degraded operations cannot result in data corruption.

      The next stage will be to use the journal as a write-behind cache
      so that latency can be reduced and in some cases throughput
      increased by performing more full-stripe writes.

* tag 'md/4.4' of git://neil.brown.name/md: (66 commits)
  MD: when RAID journal is missing/faulty, block RESTART_ARRAY_RW
  MD: set journal disk -&gt;raid_disk
  MD: kick out journal disk if it's not fresh
  raid5-cache: start raid5 readonly if journal is missing
  MD: add new bit to indicate raid array with journal
  raid5-cache: IO error handling
  raid5: journal disk can't be removed
  raid5-cache: add trim support for log
  MD: fix info output for journal disk
  raid5-cache: use bio chaining
  raid5-cache: small log-&gt;seq cleanup
  raid5-cache: new helper: r5_reserve_log_entry
  raid5-cache: inline r5l_alloc_io_unit into r5l_new_meta
  raid5-cache: take rdev-&gt;data_offset into account early on
  raid5-cache: refactor bio allocation
  raid5-cache: clean up r5l_get_meta
  raid5-cache: simplify state machine when caches flushes are not needed
  raid5-cache: factor out a helper to run all stripes for an I/O unit
  raid5-cache: rename flushed_ios to finished_ios
  raid5-cache: free I/O units earlier
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull md updates from Neil Brown:
 "Two major components to this update.

   1) The clustered-raid1 support from SUSE is nearly complete.  There
      are a few outstanding issues being worked on.  Maybe half a dozen
      patches will bring this to a usable state.

   2) The first stage of journalled-raid5 support from Facebook makes an
      appearance.  With a journal device configured (typically NVRAM or
      SSD), the "RAID5 write hole" should be closed - a crash during
      degraded operations cannot result in data corruption.

      The next stage will be to use the journal as a write-behind cache
      so that latency can be reduced and in some cases throughput
      increased by performing more full-stripe writes.

* tag 'md/4.4' of git://neil.brown.name/md: (66 commits)
  MD: when RAID journal is missing/faulty, block RESTART_ARRAY_RW
  MD: set journal disk -&gt;raid_disk
  MD: kick out journal disk if it's not fresh
  raid5-cache: start raid5 readonly if journal is missing
  MD: add new bit to indicate raid array with journal
  raid5-cache: IO error handling
  raid5: journal disk can't be removed
  raid5-cache: add trim support for log
  MD: fix info output for journal disk
  raid5-cache: use bio chaining
  raid5-cache: small log-&gt;seq cleanup
  raid5-cache: new helper: r5_reserve_log_entry
  raid5-cache: inline r5l_alloc_io_unit into r5l_new_meta
  raid5-cache: take rdev-&gt;data_offset into account early on
  raid5-cache: refactor bio allocation
  raid5-cache: clean up r5l_get_meta
  raid5-cache: simplify state machine when caches flushes are not needed
  raid5-cache: factor out a helper to run all stripes for an I/O unit
  raid5-cache: rename flushed_ios to finished_ios
  raid5-cache: free I/O units earlier
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>MD: set journal disk -&gt;raid_disk</title>
<updated>2015-11-01T02:48:29+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2015-10-09T04:54:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f2076e7d0643d15b11db979acc7cffd2e8d69e77'/>
<id>f2076e7d0643d15b11db979acc7cffd2e8d69e77</id>
<content type='text'>
Set journal disk -&gt;raid_disk to &gt;=0, I choose raid_disks + 1 instead of
0, because we already have a disk with -&gt;raid_disk 0 and this causes
sysfs entry creation conflict. A lot of places assumes disk with
-&gt;raid_disk &gt;=0 is normal raid disk, so we add check for journal disk.

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Set journal disk -&gt;raid_disk to &gt;=0, I choose raid_disks + 1 instead of
0, because we already have a disk with -&gt;raid_disk 0 and this causes
sysfs entry creation conflict. A lot of places assumes disk with
-&gt;raid_disk &gt;=0 is normal raid disk, so we add check for journal disk.

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>raid5-cache: start raid5 readonly if journal is missing</title>
<updated>2015-11-01T02:48:29+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2015-10-09T04:54:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7dde2ad3c5b4afb4b2544b864fa34dd1f4897ab6'/>
<id>7dde2ad3c5b4afb4b2544b864fa34dd1f4897ab6</id>
<content type='text'>
If raid array is expected to have journal (eg, journal is set in MD
superblock feature map) and the array is started without journal disk,
start the array readonly.

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If raid array is expected to have journal (eg, journal is set in MD
superblock feature map) and the array is started without journal disk,
start the array readonly.

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
