<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/md/raid5.c, branch linux-4.8.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>md/raid5: limit request size according to implementation limits</title>
<updated>2017-01-09T07:21:56+00:00</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2016-11-27T16:32:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b202064f0b3f92d4ce3feec09ad55a20e771947b'/>
<id>b202064f0b3f92d4ce3feec09ad55a20e771947b</id>
<content type='text'>
commit e8d7c33232e5fdfa761c3416539bc5b4acd12db5 upstream.

Current implementation employ 16bit counter of active stripes in lower
bits of bio-&gt;bi_phys_segments. If request is big enough to overflow
this counter bio will be completed and freed too early.

Fortunately this not happens in default configuration because several
other limits prevent that: stripe_cache_size * nr_disks effectively
limits count of active stripes. And small max_sectors_kb at lower
disks prevent that during normal read/write operations.

Overflow easily happens in discard if it's enabled by module parameter
"devices_handle_discard_safely" and stripe_cache_size is set big enough.

This patch limits requests size with 256Mb - 8Kb to prevent overflows.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: Neil Brown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e8d7c33232e5fdfa761c3416539bc5b4acd12db5 upstream.

Current implementation employ 16bit counter of active stripes in lower
bits of bio-&gt;bi_phys_segments. If request is big enough to overflow
this counter bio will be completed and freed too early.

Fortunately this not happens in default configuration because several
other limits prevent that: stripe_cache_size * nr_disks effectively
limits count of active stripes. And small max_sectors_kb at lower
disks prevent that during normal read/write operations.

Overflow easily happens in discard if it's enabled by module parameter
"devices_handle_discard_safely" and stripe_cache_size is set big enough.

This patch limits requests size with 256Mb - 8Kb to prevent overflows.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Cc: Shaohua Li &lt;shli@kernel.org&gt;
Cc: Neil Brown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>raid5: fix a small race condition</title>
<updated>2016-09-09T18:09:19+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2016-09-08T17:43:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c94455558337eece474eebb6a16b905f98930418'/>
<id>c94455558337eece474eebb6a16b905f98930418</id>
<content type='text'>
commit 5f9d1fde7d54a5(raid5: fix memory leak of bio integrity data)
moves bio_reset to bio_endio. But it introduces a small race condition.
It does bio_reset after raid5_release_stripe, which could make the
stripe reusable and hence reuse the bio just before bio_reset. Moving
bio_reset before raid5_release_stripe is called should fix the race.

Reported-and-tested-by: Stefan Priebe - Profihost AG &lt;s.priebe@profihost.ag&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5f9d1fde7d54a5(raid5: fix memory leak of bio integrity data)
moves bio_reset to bio_endio. But it introduces a small race condition.
It does bio_reset after raid5_release_stripe, which could make the
stripe reusable and hence reuse the bio just before bio_reset. Moving
bio_reset before raid5_release_stripe is called should fix the race.

Reported-and-tested-by: Stefan Priebe - Profihost AG &lt;s.priebe@profihost.ag&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>raid5: guarantee enough stripes to avoid reshape hang</title>
<updated>2016-08-31T16:05:23+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2016-08-30T17:29:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ad5b0f7685dbfc4730987cd16af3c5ebe8133f10'/>
<id>ad5b0f7685dbfc4730987cd16af3c5ebe8133f10</id>
<content type='text'>
If there aren't enough stripes, reshape will hang. We have a check for
this in new reshape, but miss it for reshape resume, hence we could see
hang in reshape resume. This patch forces enough stripes existed if
reshape resumes.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If there aren't enough stripes, reshape will hang. We have a check for
this in new reshape, but miss it for reshape resume, hence we could see
hang in reshape resume. This patch forces enough stripes existed if
reshape resumes.

Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'md/4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md</title>
<updated>2016-08-30T18:24:04+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-08-30T18:24:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=86a1679860babbacd61fc1e8c0c0f43641d5860d'/>
<id>86a1679860babbacd61fc1e8c0c0f43641d5860d</id>
<content type='text'>
Pull MD fixes from Shaohua Li:
 "This includes several bug fixes:

   - Alexey Obitotskiy fixed a hang for faulty raid5 array with external
     management

   - Song Liu fixed two raid5 journal related bugs

   - Tomasz Majchrzak fixed a bad block recording issue and an
     accounting issue for raid10

   - ZhengYuan Liu fixed an accounting issue for raid5

   - I fixed a potential race condition and memory leak with DIF/DIX
     enabled

   - other trival fixes"

* tag 'md/4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  raid5: avoid unnecessary bio data set
  raid5: fix memory leak of bio integrity data
  raid10: record correct address of bad block
  md-cluster: fix error return code in join()
  r5cache: set MD_JOURNAL_CLEAN correctly
  md: don't print the same repeated messages about delayed sync operation
  md: remove obsolete ret in md_start_sync
  md: do not count journal as spare in GET_ARRAY_INFO
  md: Prevent IO hold during accessing to faulty raid5 array
  MD: hold mddev lock to change bitmap location
  raid5: fix incorrectly counter of conf-&gt;empty_inactive_list_nr
  raid10: increment write counter after bio is split
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull MD fixes from Shaohua Li:
 "This includes several bug fixes:

   - Alexey Obitotskiy fixed a hang for faulty raid5 array with external
     management

   - Song Liu fixed two raid5 journal related bugs

   - Tomasz Majchrzak fixed a bad block recording issue and an
     accounting issue for raid10

   - ZhengYuan Liu fixed an accounting issue for raid5

   - I fixed a potential race condition and memory leak with DIF/DIX
     enabled

   - other trival fixes"

* tag 'md/4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
  raid5: avoid unnecessary bio data set
  raid5: fix memory leak of bio integrity data
  raid10: record correct address of bad block
  md-cluster: fix error return code in join()
  r5cache: set MD_JOURNAL_CLEAN correctly
  md: don't print the same repeated messages about delayed sync operation
  md: remove obsolete ret in md_start_sync
  md: do not count journal as spare in GET_ARRAY_INFO
  md: Prevent IO hold during accessing to faulty raid5 array
  MD: hold mddev lock to change bitmap location
  raid5: fix incorrectly counter of conf-&gt;empty_inactive_list_nr
  raid10: increment write counter after bio is split
</pre>
</div>
</content>
</entry>
<entry>
<title>raid5: avoid unnecessary bio data set</title>
<updated>2016-08-24T17:21:53+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2016-08-23T04:14:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=45c91d808ff989d950e260dab9f89e8f4a3c9c2c'/>
<id>45c91d808ff989d950e260dab9f89e8f4a3c9c2c</id>
<content type='text'>
bio_reset doesn't change bi_io_vec and bi_max_vecs, so we don't need to
set them every time. bi_private will be set before the bio is
dispatched.

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bio_reset doesn't change bi_io_vec and bi_max_vecs, so we don't need to
set them every time. bi_private will be set before the bio is
dispatched.

Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>raid5: fix memory leak of bio integrity data</title>
<updated>2016-08-24T17:21:52+00:00</updated>
<author>
<name>Shaohua Li</name>
<email>shli@fb.com</email>
</author>
<published>2016-08-23T04:14:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5f9d1fde7d54a5d5fd8cccbee9c9c31474fcdcf2'/>
<id>5f9d1fde7d54a5d5fd8cccbee9c9c31474fcdcf2</id>
<content type='text'>
Yi reported a memory leak of raid5 with DIF/DIX enabled disks. raid5
doesn't alloc/free bio, instead it reuses bios. There are two issues in
current code:
1. the code calls bio_init (from
init_stripe-&gt;raid5_build_block-&gt;bio_init) then bio_reset (ops_run_io).
The bio is reused, so likely there is integrity data attached. bio_init
will clear a pointer to integrity data and makes bio_reset can't release
the data
2. bio_reset is called before dispatching bio. After bio is finished,
it's possible we don't free bio's integrity data (eg, we don't call
bio_reset again)
Both issues will cause memory leak. The patch moves bio_init to stripe
creation and bio_reset to bio end io. This will fix the two issues.

Reported-by: Yi Zhang &lt;yizhan@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Yi reported a memory leak of raid5 with DIF/DIX enabled disks. raid5
doesn't alloc/free bio, instead it reuses bios. There are two issues in
current code:
1. the code calls bio_init (from
init_stripe-&gt;raid5_build_block-&gt;bio_init) then bio_reset (ops_run_io).
The bio is reused, so likely there is integrity data attached. bio_init
will clear a pointer to integrity data and makes bio_reset can't release
the data
2. bio_reset is called before dispatching bio. After bio is finished,
it's possible we don't free bio's integrity data (eg, we don't call
bio_reset again)
Both issues will cause memory leak. The patch moves bio_init to stripe
creation and bio_reset to bio end io. This will fix the two issues.

Reported-by: Yi Zhang &lt;yizhan@redhat.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>r5cache: set MD_JOURNAL_CLEAN correctly</title>
<updated>2016-08-24T17:21:50+00:00</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2016-08-19T22:34:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=486b0f7bcd64be027535811ef44195bc1027fbd3'/>
<id>486b0f7bcd64be027535811ef44195bc1027fbd3</id>
<content type='text'>
Currently, the code sets MD_JOURNAL_CLEAN when the array has
MD_FEATURE_JOURNAL and the recovery_cp is MaxSector. The array
will be MD_JOURNAL_CLEAN even if the journal device is missing.

With this patch, the MD_JOURNAL_CLEAN is only set when the journal
device presents.

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, the code sets MD_JOURNAL_CLEAN when the array has
MD_FEATURE_JOURNAL and the recovery_cp is MaxSector. The array
will be MD_JOURNAL_CLEAN even if the journal device is missing.

With this patch, the MD_JOURNAL_CLEAN is only set when the journal
device presents.

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>block: rename bio bi_rw to bi_opf</title>
<updated>2016-08-07T20:41:02+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@fb.com</email>
</author>
<published>2016-08-05T21:35:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1eff9d322a444245c67515edb52bc0eb68374aa8'/>
<id>1eff9d322a444245c67515edb52bc0eb68374aa8</id>
<content type='text'>
Since commit 63a4cc24867d, bio-&gt;bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since commit 63a4cc24867d, bio-&gt;bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.

Signed-off-by: Jens Axboe &lt;axboe@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: Prevent IO hold during accessing to faulty raid5 array</title>
<updated>2016-08-06T05:03:10+00:00</updated>
<author>
<name>Alexey Obitotskiy</name>
<email>aleksey.obitotskiy@intel.com</email>
</author>
<published>2016-08-03T08:02:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=11367799f3d12a5074c4a3c0fa4ea8da2a21a2a4'/>
<id>11367799f3d12a5074c4a3c0fa4ea8da2a21a2a4</id>
<content type='text'>
After array enters in faulty state (e.g. number of failed drives
becomes more then accepted for raid5 level) it sets error flags
(one of this flags is MD_CHANGE_PENDING). For internal metadata
arrays MD_CHANGE_PENDING cleared into md_update_sb, but not for
external metadata arrays. MD_CHANGE_PENDING flag set prevents to
finish all new or non-finished IOs to array and hold them in
pending state. In some cases this can leads to deadlock situation.

For example, we have faulty array (2 of 4 drives failed) and
udev handle array state changes and blkid started (or other
userspace application that used array to read/write) but unable
to finish reads due to IO hold. At the same time we unable to get
exclusive access to array (to stop array in our case) because
another external application still use this array.

Fix makes possible to return IO with errors immediately.
So external application can finish working with array and
give exclusive access to other applications to perform
required management actions with array.

Signed-off-by: Alexey Obitotskiy &lt;aleksey.obitotskiy@intel.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After array enters in faulty state (e.g. number of failed drives
becomes more then accepted for raid5 level) it sets error flags
(one of this flags is MD_CHANGE_PENDING). For internal metadata
arrays MD_CHANGE_PENDING cleared into md_update_sb, but not for
external metadata arrays. MD_CHANGE_PENDING flag set prevents to
finish all new or non-finished IOs to array and hold them in
pending state. In some cases this can leads to deadlock situation.

For example, we have faulty array (2 of 4 drives failed) and
udev handle array state changes and blkid started (or other
userspace application that used array to read/write) but unable
to finish reads due to IO hold. At the same time we unable to get
exclusive access to array (to stop array in our case) because
another external application still use this array.

Fix makes possible to return IO with errors immediately.
So external application can finish working with array and
give exclusive access to other applications to perform
required management actions with array.

Signed-off-by: Alexey Obitotskiy &lt;aleksey.obitotskiy@intel.com&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>raid5: fix incorrectly counter of conf-&gt;empty_inactive_list_nr</title>
<updated>2016-08-02T03:18:21+00:00</updated>
<author>
<name>ZhengYuan Liu</name>
<email>liuzhengyuan@kylinos.cn</email>
</author>
<published>2016-07-28T06:22:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ff00d3b4e5e4395c825e8ec628f25932d812f31a'/>
<id>ff00d3b4e5e4395c825e8ec628f25932d812f31a</id>
<content type='text'>
The counter conf-&gt;empty_inactive_list_nr is only used for determine if the
raid5 is congested which is deal with in function raid5_congested().
It was increased in get_free_stripe() when conf-&gt;inactive_list got to be
empty and decreased in release_inactive_stripe_list() when splice
temp_inactive_list to conf-&gt;inactive_list. However, this may have a
problem when raid5_get_active_stripe or stripe_add_to_batch_list was called,
because these two functions may call list_del_init(&amp;sh-&gt;lru) to delete sh from
"conf-&gt;inactive_list + hash" which may cause "conf-&gt;inactive_list + hash" to
be empty when atomic_inc_not_zero(&amp;sh-&gt;count) got false. So a check should be
done at these two point and increase empty_inactive_list_nr accordingly.
Otherwise the counter may get to be negative number which would influence
async readahead from VFS.

Signed-off-by: ZhengYuan Liu &lt;liuzhengyuan@kylinos.cn&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The counter conf-&gt;empty_inactive_list_nr is only used for determine if the
raid5 is congested which is deal with in function raid5_congested().
It was increased in get_free_stripe() when conf-&gt;inactive_list got to be
empty and decreased in release_inactive_stripe_list() when splice
temp_inactive_list to conf-&gt;inactive_list. However, this may have a
problem when raid5_get_active_stripe or stripe_add_to_batch_list was called,
because these two functions may call list_del_init(&amp;sh-&gt;lru) to delete sh from
"conf-&gt;inactive_list + hash" which may cause "conf-&gt;inactive_list + hash" to
be empty when atomic_inc_not_zero(&amp;sh-&gt;count) got false. So a check should be
done at these two point and increase empty_inactive_list_nr accordingly.
Otherwise the counter may get to be negative number which would influence
async readahead from VFS.

Signed-off-by: ZhengYuan Liu &lt;liuzhengyuan@kylinos.cn&gt;
Signed-off-by: Shaohua Li &lt;shli@fb.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
