<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/md/raid10.c, branch v6.7</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>md: initialize 'writes_pending' while allocating mddev</title>
<updated>2023-09-22T17:28:26+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-08-25T03:09:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b8494823e236326500aa1004155e83f748dd10da'/>
<id>b8494823e236326500aa1004155e83f748dd10da</id>
<content type='text'>
Currently 'writes_pending' is initialized in pers-&gt;run for raid1/5/10,
and it's freed while deleing mddev, instead of pers-&gt;free. pers-&gt;run can
be called multiple times before mddev is deleted, and a helper
mddev_init_writes_pending() is used to prevent 'writes_pending' to be
initialized multiple times, this usage is safe but a litter weird.

On the other hand, 'writes_pending' is only initialized for raid1/5/10,
however, it's used in common layer, for example:

array_state_store
 set_in_sync
  if (!mddev-&gt;in_sync) -&gt; in_sync is used for all levels
   // access writes_pending

There might be some implicit dependency that I don't recognized to make
sure 'writes_pending' can only be accessed for raid1/5/10, but there are
no comments about that.

By the way, it make sense to initialize 'writes_pending' in common layer
because there are already three levels use it.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230825030956.1527023-3-yukuai1@huaweicloud.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently 'writes_pending' is initialized in pers-&gt;run for raid1/5/10,
and it's freed while deleing mddev, instead of pers-&gt;free. pers-&gt;run can
be called multiple times before mddev is deleted, and a helper
mddev_init_writes_pending() is used to prevent 'writes_pending' to be
initialized multiple times, this usage is safe but a litter weird.

On the other hand, 'writes_pending' is only initialized for raid1/5/10,
however, it's used in common layer, for example:

array_state_store
 set_in_sync
  if (!mddev-&gt;in_sync) -&gt; in_sync is used for all levels
   // access writes_pending

There might be some implicit dependency that I don't recognized to make
sure 'writes_pending' can only be accessed for raid1/5/10, but there are
no comments about that.

By the way, it make sense to initialize 'writes_pending' in common layer
because there are already three levels use it.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230825030956.1527023-3-yukuai1@huaweicloud.com
</pre>
</div>
</content>
</entry>
<entry>
<title>md: Hold mddev-&gt;reconfig_mutex when trying to get mddev-&gt;sync_thread</title>
<updated>2023-08-15T16:40:26+00:00</updated>
<author>
<name>Li Lingfeng</name>
<email>lilingfeng3@huawei.com</email>
</author>
<published>2023-08-03T07:17:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7eb8ff02c1df279bf7f7f29b866beb655a9eebe9'/>
<id>7eb8ff02c1df279bf7f7f29b866beb655a9eebe9</id>
<content type='text'>
Commit ba9d9f1a707f ("Revert "md: unlock mddev before reap sync_thread in
action_store"") removed the scenario of calling md_unregister_thread()
without holding mddev-&gt;reconfig_mutex, so add a lock holding check before
acquiring mddev-&gt;sync_thread by passing mdev to md_unregister_thread().

Signed-off-by: Li Lingfeng &lt;lilingfeng3@huawei.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230803071711.2546560-1-lilingfeng@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit ba9d9f1a707f ("Revert "md: unlock mddev before reap sync_thread in
action_store"") removed the scenario of calling md_unregister_thread()
without holding mddev-&gt;reconfig_mutex, so add a lock holding check before
acquiring mddev-&gt;sync_thread by passing mdev to md_unregister_thread().

Signed-off-by: Li Lingfeng &lt;lilingfeng3@huawei.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230803071711.2546560-1-lilingfeng@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid10: fix a 'conf-&gt;barrier' leakage in raid10_takeover()</title>
<updated>2023-08-15T16:39:48+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-07-31T02:28:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=892da88d1cd93426e9c6d7717876ca705fe2b9fa'/>
<id>892da88d1cd93426e9c6d7717876ca705fe2b9fa</id>
<content type='text'>
After commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()"),
'conf-&gt;barrier' will be leaked in the case that raid10 takeover raid0:

level_store
 pers-&gt;takeover -&gt; raid10_takeover
  raid10_takeover_raid0
   WRITE_ONCE(conf-&gt;barrier, 1)

mddev_suspend
// still raid0
mddev-&gt;pers = pers
// switch to raid10
mddev_resume
// resume without suspend

After the above commit, mddev_resume() will not decrease 'conf-&gt;barrier'
that is set in raid10_takeover_raid0().

Fix this problem by not setting 'conf-&gt;barrier' in raid10_takeover_raid0().

By the way, this problem is found while I'm trying to make
mddev_suspend/resume() to be independent from raid personalities. raid10
is the only personality to use reference count in the quiesce() callback
and this problem is only related to raid10.

Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Paul Menzel &lt;pmenzel@molgen.mpg.de&gt;
Link: https://lore.kernel.org/r/20230731022800.1424902-1-yukuai1@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()"),
'conf-&gt;barrier' will be leaked in the case that raid10 takeover raid0:

level_store
 pers-&gt;takeover -&gt; raid10_takeover
  raid10_takeover_raid0
   WRITE_ONCE(conf-&gt;barrier, 1)

mddev_suspend
// still raid0
mddev-&gt;pers = pers
// switch to raid10
mddev_resume
// resume without suspend

After the above commit, mddev_resume() will not decrease 'conf-&gt;barrier'
that is set in raid10_takeover_raid0().

Fix this problem by not setting 'conf-&gt;barrier' in raid10_takeover_raid0().

By the way, this problem is found while I'm trying to make
mddev_suspend/resume() to be independent from raid personalities. raid10
is the only personality to use reference count in the quiesce() callback
and this problem is only related to raid10.

Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Paul Menzel &lt;pmenzel@molgen.mpg.de&gt;
Link: https://lore.kernel.org/r/20230731022800.1424902-1-yukuai1@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid10: use dereference_rdev_and_rrdev() to get devices</title>
<updated>2023-07-27T07:13:30+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2023-07-01T08:05:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=673643490b9a0eb3b25633abe604f62b8f63dba1'/>
<id>673643490b9a0eb3b25633abe604f62b8f63dba1</id>
<content type='text'>
Commit 2ae6aaf76912 ("md/raid10: fix io loss while replacement replace
rdev") reads replacement first to prevent io loss. However, there are same
issue in wait_blocked_dev() and raid10_handle_discard(), too. Fix it by
using dereference_rdev_and_rrdev() to get devices.

Fixes: d30588b2731f ("md/raid10: improve raid10 discard request")
Fixes: f2e7e269a752 ("md/raid10: pull the code that wait for blocked dev into one function")
Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Link: https://lore.kernel.org/r/20230701080529.2684932-4-linan666@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 2ae6aaf76912 ("md/raid10: fix io loss while replacement replace
rdev") reads replacement first to prevent io loss. However, there are same
issue in wait_blocked_dev() and raid10_handle_discard(), too. Fix it by
using dereference_rdev_and_rrdev() to get devices.

Fixes: d30588b2731f ("md/raid10: improve raid10 discard request")
Fixes: f2e7e269a752 ("md/raid10: pull the code that wait for blocked dev into one function")
Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Link: https://lore.kernel.org/r/20230701080529.2684932-4-linan666@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid10: factor out dereference_rdev_and_rrdev()</title>
<updated>2023-07-27T07:13:30+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2023-07-01T08:05:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b99f8fd2d91eb734f13098aa1cf337edaca454b7'/>
<id>b99f8fd2d91eb734f13098aa1cf337edaca454b7</id>
<content type='text'>
Factor out a helper to get 'rdev' and 'replacement' from config-&gt;mirrors.
Just to make code cleaner and prepare to fix the bug of io loss while
'replacement' replace 'rdev'.

There is no functional change.

Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Link: https://lore.kernel.org/r/20230701080529.2684932-3-linan666@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Factor out a helper to get 'rdev' and 'replacement' from config-&gt;mirrors.
Just to make code cleaner and prepare to fix the bug of io loss while
'replacement' replace 'rdev'.

There is no functional change.

Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Link: https://lore.kernel.org/r/20230701080529.2684932-3-linan666@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid10: check replacement and rdev to prevent submit the same io twice</title>
<updated>2023-07-27T07:13:30+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2023-07-01T08:05:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7e85c41b9e1df9192c225afd7cfec8dcad137feb'/>
<id>7e85c41b9e1df9192c225afd7cfec8dcad137feb</id>
<content type='text'>
After commit 4ca40c2ce099 ("md/raid10: Allow replacement device to be
replace old drive."), 'rdev' and 'replacement' could appear to be
identical. There are already checks for that in wait_blocked_dev() and
raid10_write_request(). Add check for raid10_handle_discard() now.

Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Link: https://lore.kernel.org/r/20230701080529.2684932-2-linan666@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit 4ca40c2ce099 ("md/raid10: Allow replacement device to be
replace old drive."), 'rdev' and 'replacement' could appear to be
identical. There are already checks for that in wait_blocked_dev() and
raid10_write_request(). Add check for raid10_handle_discard() now.

Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Link: https://lore.kernel.org/r/20230701080529.2684932-2-linan666@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md: remove redundant check in fix_read_error()</title>
<updated>2023-07-27T07:13:30+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2023-06-23T17:32:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=02c67a3b72b13951c2ca134bd7065f03ec57946d'/>
<id>02c67a3b72b13951c2ca134bd7065f03ec57946d</id>
<content type='text'>
In fix_read_error(), 'success' will be checked immediately after assigning
it, if it is set to 1 then the loop will break. Checking it again in
condition of loop is redundant. Clean it up.

Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230623173236.2513554-3-linan666@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In fix_read_error(), 'success' will be checked immediately after assigning
it, if it is set to 1 then the loop will break. Checking it again in
condition of loop is redundant. Clean it up.

Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230623173236.2513554-3-linan666@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid10: optimize fix_read_error</title>
<updated>2023-07-27T07:13:30+00:00</updated>
<author>
<name>Li Nan</name>
<email>linan122@huawei.com</email>
</author>
<published>2023-06-23T17:32:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=605eeda6e70f692311b36180f217208d367476f6'/>
<id>605eeda6e70f692311b36180f217208d367476f6</id>
<content type='text'>
We dereference r10_bio-&gt;read_slot too many times in fix_read_error().
Optimize it by using a variable to store read_slot.

Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230623173236.2513554-2-linan666@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We dereference r10_bio-&gt;read_slot too many times in fix_read_error().
Optimize it by using a variable to store read_slot.

Signed-off-by: Li Nan &lt;linan122@huawei.com&gt;
Reviewed-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Link: https://lore.kernel.org/r/20230623173236.2513554-2-linan666@huaweicloud.com
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>md/raid10: switch to use md_account_bio() for io accounting</title>
<updated>2023-07-27T07:13:29+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-06-21T16:51:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=820455238366a78a44a85cc7d58a987e728464d9'/>
<id>820455238366a78a44a85cc7d58a987e728464d9</id>
<content type='text'>
Make sure that 'active_io' will represent inflight io instead of io that
is dispatching, and io accounting from all levels will be consistent.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230621165110.1498313-6-yukuai1@huaweicloud.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make sure that 'active_io' will represent inflight io instead of io that
is dispatching, and io accounting from all levels will be consistent.

Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Reviewed-by: Xiao Ni &lt;xni@redhat.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230621165110.1498313-6-yukuai1@huaweicloud.com
</pre>
</div>
</content>
</entry>
<entry>
<title>raid10: avoid spin_lock from fastpath from raid10_unplug()</title>
<updated>2023-06-23T16:41:50+00:00</updated>
<author>
<name>Yu Kuai</name>
<email>yukuai3@huawei.com</email>
</author>
<published>2023-06-21T10:57:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a8d5fdd4d2702d0c7ec125bd3bbce3fc589afa67'/>
<id>a8d5fdd4d2702d0c7ec125bd3bbce3fc589afa67</id>
<content type='text'>
Commit 0c0be98bbe67 ("md/raid10: prevent unnecessary calls to wake_up()
in fast path") missed one place, for example, with:

	fio -direct=1 -rw=write/randwrite -iodepth=1 ...

Plug and unplug are called for each io, then wake_up() from raid10_unplug()
will cause lock contention as well.

Avoid this contention by using wake_up_barrier() instead of wake_up(),
where spin_lock is not held if waitqueue is empty.

Fio test script:

[global]
name=random reads and writes
ioengine=libaio
direct=1
readwrite=randrw
rwmixread=70
iodepth=64
buffered=0
filename=/dev/md0
size=1G
runtime=30
time_based
randrepeat=0
norandommap
refill_buffers
ramp_time=10
bs=4k
numjobs=400
group_reporting=1
[job1]

Test result with ramdisk raid10(By Ali):

	Before this patch	With this patch
READ	IOPS=2033k		IOPS=3642k
WRITE	IOPS=871k		IOPS=1561K

By the way, in this scenario, blk_plug_cb() will be allocated and freed
for each io, this seems need to be optimized as well.

Reported-and-tested-by: Ali Gholami Rudi &lt;aligrudi@gmail.com&gt;
Closes: https://lore.kernel.org/all/20231606122233@laper.mirepesht/
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230621105728.1268542-1-yukuai1@huaweicloud.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit 0c0be98bbe67 ("md/raid10: prevent unnecessary calls to wake_up()
in fast path") missed one place, for example, with:

	fio -direct=1 -rw=write/randwrite -iodepth=1 ...

Plug and unplug are called for each io, then wake_up() from raid10_unplug()
will cause lock contention as well.

Avoid this contention by using wake_up_barrier() instead of wake_up(),
where spin_lock is not held if waitqueue is empty.

Fio test script:

[global]
name=random reads and writes
ioengine=libaio
direct=1
readwrite=randrw
rwmixread=70
iodepth=64
buffered=0
filename=/dev/md0
size=1G
runtime=30
time_based
randrepeat=0
norandommap
refill_buffers
ramp_time=10
bs=4k
numjobs=400
group_reporting=1
[job1]

Test result with ramdisk raid10(By Ali):

	Before this patch	With this patch
READ	IOPS=2033k		IOPS=3642k
WRITE	IOPS=871k		IOPS=1561K

By the way, in this scenario, blk_plug_cb() will be allocated and freed
for each io, this seems need to be optimized as well.

Reported-and-tested-by: Ali Gholami Rudi &lt;aligrudi@gmail.com&gt;
Closes: https://lore.kernel.org/all/20231606122233@laper.mirepesht/
Signed-off-by: Yu Kuai &lt;yukuai3@huawei.com&gt;
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Link: https://lore.kernel.org/r/20230621105728.1268542-1-yukuai1@huaweicloud.com
</pre>
</div>
</content>
</entry>
</feed>
