<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/btrfs/async-thread.c, branch v3.17</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Btrfs: fix task hang under heavy compressed write</title>
<updated>2014-08-24T14:17:02+00:00</updated>
<author>
<name>Liu Bo</name>
<email>bo.li.liu@oracle.com</email>
</author>
<published>2014-08-15T15:36:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9e0af23764344f7f1b68e4eefbe7dc865018b63d'/>
<id>9e0af23764344f7f1b68e4eefbe7dc865018b63d</id>
<content type='text'>
This has been reported and discussed for a long time, and this hang occurs in
both 3.15 and 3.16.

Btrfs now migrates to use kernel workqueue, but it introduces this hang problem.

Btrfs has a kind of work queued as an ordered way, which means that its
ordered_func() must be processed in the way of FIFO, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work-&gt;func() &lt;---- (we name it work X)
    for ordered_work in wq-&gt;ordered_list
            ordered_work-&gt;ordered_func()
            ordered_work-&gt;ordered_free()

The hang is a rare case, first when we find free space, we get an uncached block
group, then we go to read its free space cache inode for free space information,
so it will

file a readahead request
    btrfs_readpages()
         for page that is not in page cache
                __do_readpage()
                     submit_extent_page()
                           btrfs_submit_bio_hook()
                                 btrfs_bio_wq_end_io()
                                 submit_bio()
                                 end_workqueue_bio() &lt;--(ret by the 1st endio)
                                      queue a work(named work Y) for the 2nd
                                      also the real endio()

So the hang occurs when work Y's work_struct and work X's work_struct happens
to share the same address.

A bit more explanation,

A,B,C -- struct btrfs_work
arg   -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
	worker-&gt;current_work = arg;  &lt;-- arg is A-&gt;normal_work
	worker-&gt;current_func(arg)
		normal_work_helper(arg)
		     A = container_of(arg, struct btrfs_work, normal_work);

		     A-&gt;func()
		     A-&gt;ordered_func()
		     A-&gt;ordered_free()  &lt;-- A gets freed

		     B-&gt;ordered_func()
			  submit_compressed_extents()
			      find_free_extent()
				  load_free_space_inode()
				      ...   &lt;-- (the above readhead stack)
				      end_workqueue_bio()
					   btrfs_queue_work(work C)
		     B-&gt;ordered_free()

As if work A has a high priority in wq-&gt;ordered_list and there are more ordered
works queued after it, such as B-&gt;ordered_func(), its memory could have been
freed before normal_work_helper() returns, which means that kernel workqueue
code worker_thread() still has worker-&gt;current_work pointer to be work
A-&gt;normal_work's, ie. arg's address.

Meanwhile, work C is allocated after work A is freed, work C-&gt;normal_work
and work A-&gt;normal_work are likely to share the same address(I confirmed this
with ftrace output, so I'm not just guessing, it's rare though).

When another kthread picks up work C-&gt;normal_work to process, and finds our
kthread is processing it(see find_worker_executing_work()), it'll think
work C as a collision and skip then, which ends up nobody processing work C.

So the situation is that our kthread is waiting forever on work C.

Besides, there're other cases that can lead to deadlock, but the real problem
is that all btrfs workqueue shares one work-&gt;func, -- normal_work_helper,
so this makes each workqueue to have its own helper function, but only a
wraper pf normal_work_helper.

With this patch, I no long hit the above hang.

Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This has been reported and discussed for a long time, and this hang occurs in
both 3.15 and 3.16.

Btrfs now migrates to use kernel workqueue, but it introduces this hang problem.

Btrfs has a kind of work queued as an ordered way, which means that its
ordered_func() must be processed in the way of FIFO, so it usually looks like --

normal_work_helper(arg)
    work = container_of(arg, struct btrfs_work, normal_work);

    work-&gt;func() &lt;---- (we name it work X)
    for ordered_work in wq-&gt;ordered_list
            ordered_work-&gt;ordered_func()
            ordered_work-&gt;ordered_free()

The hang is a rare case, first when we find free space, we get an uncached block
group, then we go to read its free space cache inode for free space information,
so it will

file a readahead request
    btrfs_readpages()
         for page that is not in page cache
                __do_readpage()
                     submit_extent_page()
                           btrfs_submit_bio_hook()
                                 btrfs_bio_wq_end_io()
                                 submit_bio()
                                 end_workqueue_bio() &lt;--(ret by the 1st endio)
                                      queue a work(named work Y) for the 2nd
                                      also the real endio()

So the hang occurs when work Y's work_struct and work X's work_struct happens
to share the same address.

A bit more explanation,

A,B,C -- struct btrfs_work
arg   -- struct work_struct

kthread:
worker_thread()
    pick up a work_struct from @worklist
    process_one_work(arg)
	worker-&gt;current_work = arg;  &lt;-- arg is A-&gt;normal_work
	worker-&gt;current_func(arg)
		normal_work_helper(arg)
		     A = container_of(arg, struct btrfs_work, normal_work);

		     A-&gt;func()
		     A-&gt;ordered_func()
		     A-&gt;ordered_free()  &lt;-- A gets freed

		     B-&gt;ordered_func()
			  submit_compressed_extents()
			      find_free_extent()
				  load_free_space_inode()
				      ...   &lt;-- (the above readhead stack)
				      end_workqueue_bio()
					   btrfs_queue_work(work C)
		     B-&gt;ordered_free()

As if work A has a high priority in wq-&gt;ordered_list and there are more ordered
works queued after it, such as B-&gt;ordered_func(), its memory could have been
freed before normal_work_helper() returns, which means that kernel workqueue
code worker_thread() still has worker-&gt;current_work pointer to be work
A-&gt;normal_work's, ie. arg's address.

Meanwhile, work C is allocated after work A is freed, work C-&gt;normal_work
and work A-&gt;normal_work are likely to share the same address(I confirmed this
with ftrace output, so I'm not just guessing, it's rare though).

When another kthread picks up work C-&gt;normal_work to process, and finds our
kthread is processing it(see find_worker_executing_work()), it'll think
work C as a collision and skip then, which ends up nobody processing work C.

So the situation is that our kthread is waiting forever on work C.

Besides, there're other cases that can lead to deadlock, but the real problem
is that all btrfs workqueue shares one work-&gt;func, -- normal_work_helper,
so this makes each workqueue to have its own helper function, but only a
wraper pf normal_work_helper.

With this patch, I no long hit the above hang.

Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: fix crash in remount(thread_pool=) case</title>
<updated>2014-04-07T17:41:52+00:00</updated>
<author>
<name>Sergei Trofimovich</name>
<email>slyfox@gentoo.org</email>
</author>
<published>2014-04-07T07:55:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=800ee2247f483b6d05ed47ef3bbc90b56451746c'/>
<id>800ee2247f483b6d05ed47ef3bbc90b56451746c</id>
<content type='text'>
Reproducer:
    mount /dev/ubda /mnt
    mount -oremount,thread_pool=42 /mnt

Gives a crash:
    ? btrfs_workqueue_set_max+0x0/0x70
    btrfs_resize_thread_pool+0xe3/0xf0
    ? sync_filesystem+0x0/0xc0
    ? btrfs_resize_thread_pool+0x0/0xf0
    btrfs_remount+0x1d2/0x570
    ? kern_path+0x0/0x80
    do_remount_sb+0xd9/0x1c0
    do_mount+0x26a/0xbf0
    ? kfree+0x0/0x1b0
    SyS_mount+0xc4/0x110

It's a call
    btrfs_workqueue_set_max(fs_info-&gt;scrub_wr_completion_workers, new_pool_size);
with
    fs_info-&gt;scrub_wr_completion_workers = NULL;

as scrub wqs get created only on user's demand.

Patch skips not-created-yet workqueues.

Signed-off-by: Sergei Trofimovich &lt;slyfox@gentoo.org&gt;
CC: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
CC: Chris Mason &lt;clm@fb.com&gt;
CC: Josef Bacik &lt;jbacik@fb.com&gt;
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reproducer:
    mount /dev/ubda /mnt
    mount -oremount,thread_pool=42 /mnt

Gives a crash:
    ? btrfs_workqueue_set_max+0x0/0x70
    btrfs_resize_thread_pool+0xe3/0xf0
    ? sync_filesystem+0x0/0xc0
    ? btrfs_resize_thread_pool+0x0/0xf0
    btrfs_remount+0x1d2/0x570
    ? kern_path+0x0/0x80
    do_remount_sb+0xd9/0x1c0
    do_mount+0x26a/0xbf0
    ? kfree+0x0/0x1b0
    SyS_mount+0xc4/0x110

It's a call
    btrfs_workqueue_set_max(fs_info-&gt;scrub_wr_completion_workers, new_pool_size);
with
    fs_info-&gt;scrub_wr_completion_workers = NULL;

as scrub wqs get created only on user's demand.

Patch skips not-created-yet workqueues.

Signed-off-by: Sergei Trofimovich &lt;slyfox@gentoo.org&gt;
CC: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
CC: Chris Mason &lt;clm@fb.com&gt;
CC: Josef Bacik &lt;jbacik@fb.com&gt;
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Add trace for btrfs_workqueue alloc/destroy</title>
<updated>2014-03-21T00:15:28+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2014-03-12T08:05:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c3a468915a384c0015263edd9b7263775599a323'/>
<id>c3a468915a384c0015263edd9b7263775599a323</id>
<content type='text'>
Since most of the btrfs_workqueue is printed as pointer address,
for easier analysis, add trace for btrfs_workqueue alloc/destroy.
So it is possible to determine the workqueue that a given work belongs
to(by comparing the wq pointer address with alloc trace event).

Signed-off-by: Qu Wenruo &lt;quenruo@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since most of the btrfs_workqueue is printed as pointer address,
for easier analysis, add trace for btrfs_workqueue alloc/destroy.
So it is possible to determine the workqueue that a given work belongs
to(by comparing the wq pointer address with alloc trace event).

Signed-off-by: Qu Wenruo &lt;quenruo@cn.fujitsu.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: add missing kfree in btrfs_destroy_workqueue</title>
<updated>2014-03-21T00:15:27+00:00</updated>
<author>
<name>Filipe Manana</name>
<email>fdmanana@gmail.com</email>
</author>
<published>2014-03-11T14:31:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ef66af101a261f1c86ef9ec3859ebd9c28ee2e54'/>
<id>ef66af101a261f1c86ef9ec3859ebd9c28ee2e54</id>
<content type='text'>
Signed-off-by: Filipe David Borba Manana &lt;fdmanana@gmail.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Signed-off-by: Filipe David Borba Manana &lt;fdmanana@gmail.com&gt;
Signed-off-by: Chris Mason &lt;clm@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Add ftrace for btrfs_workqueue</title>
<updated>2014-03-10T19:17:21+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2014-03-06T04:19:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=52483bc26f0e95c91e8fd07f9def588bf89664f8'/>
<id>52483bc26f0e95c91e8fd07f9def588bf89664f8</id>
<content type='text'>
Add ftrace for btrfs_workqueue for further workqueue tunning.
This patch needs to applied after the workqueue replace patchset.

Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add ftrace for btrfs_workqueue for further workqueue tunning.
This patch needs to applied after the workqueue replace patchset.

Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Cleanup the btrfs_workqueue related function type</title>
<updated>2014-03-10T19:17:20+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2014-03-06T04:19:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6db8914f9763d3f0a7610b497d44f93a4c17e62e'/>
<id>6db8914f9763d3f0a7610b497d44f93a4c17e62e</id>
<content type='text'>
The new btrfs_workqueue still use open-coded function defition,
this patch will change them into btrfs_func_t type which is much the
same as kernel workqueue.

Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The new btrfs_workqueue still use open-coded function defition,
this patch will change them into btrfs_func_t type which is much the
same as kernel workqueue.

Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Cleanup the "_struct" suffix in btrfs_workequeue</title>
<updated>2014-03-10T19:17:16+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2014-02-28T02:46:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d458b0540ebd728b4d6ef47cc5ef0dbfd4dd361a'/>
<id>d458b0540ebd728b4d6ef47cc5ef0dbfd4dd361a</id>
<content type='text'>
Since the "_struct" suffix is mainly used for distinguish the differnt
btrfs_work between the original and the newly created one,
there is no need using the suffix since all btrfs_workers are changed
into btrfs_workqueue.

Also this patch fixed some codes whose code style is changed due to the
too long "_struct" suffix.

Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Tested-by: David Sterba &lt;dsterba@suse.cz&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since the "_struct" suffix is mainly used for distinguish the differnt
btrfs_work between the original and the newly created one,
there is no need using the suffix since all btrfs_workers are changed
into btrfs_workqueue.

Also this patch fixed some codes whose code style is changed due to the
too long "_struct" suffix.

Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Tested-by: David Sterba &lt;dsterba@suse.cz&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Cleanup the old btrfs_worker.</title>
<updated>2014-03-10T19:17:15+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2014-02-28T02:46:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a046e9c88b0f46677923864295eac7c92cd962cb'/>
<id>a046e9c88b0f46677923864295eac7c92cd962cb</id>
<content type='text'>
Since all the btrfs_worker is replaced with the newly created
btrfs_workqueue, the old codes can be easily remove.

Signed-off-by: Quwenruo &lt;quwenruo@cn.fujitsu.com&gt;
Tested-by: David Sterba &lt;dsterba@suse.cz&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since all the btrfs_worker is replaced with the newly created
btrfs_workqueue, the old codes can be easily remove.

Signed-off-by: Quwenruo &lt;quwenruo@cn.fujitsu.com&gt;
Tested-by: David Sterba &lt;dsterba@suse.cz&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Add threshold workqueue based on kernel workqueue</title>
<updated>2014-03-10T19:17:04+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2014-02-28T02:46:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0bd9289c28c3b6a38f5a05a812afae0274674fa2'/>
<id>0bd9289c28c3b6a38f5a05a812afae0274674fa2</id>
<content type='text'>
The original btrfs_workers has thresholding functions to dynamically
create or destroy kthreads.

Though there is no such function in kernel workqueue because the worker
is not created manually, we can still use the workqueue_set_max_active
to simulated the behavior, mainly to achieve a better HDD performance by
setting a high threshold on submit_workers.
(Sadly, no resource can be saved)

So in this patch, extra workqueue pending counters are introduced to
dynamically change the max active of each btrfs_workqueue_struct, hoping
to restore the behavior of the original thresholding function.

Also, workqueue_set_max_active use a mutex to protect workqueue_struct,
which is not meant to be called too frequently, so a new interval
mechanism is applied, that will only call workqueue_set_max_active after
a count of work is queued. Hoping to balance both the random and
sequence performance on HDD.

Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Tested-by: David Sterba &lt;dsterba@suse.cz&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The original btrfs_workers has thresholding functions to dynamically
create or destroy kthreads.

Though there is no such function in kernel workqueue because the worker
is not created manually, we can still use the workqueue_set_max_active
to simulated the behavior, mainly to achieve a better HDD performance by
setting a high threshold on submit_workers.
(Sadly, no resource can be saved)

So in this patch, extra workqueue pending counters are introduced to
dynamically change the max active of each btrfs_workqueue_struct, hoping
to restore the behavior of the original thresholding function.

Also, workqueue_set_max_active use a mutex to protect workqueue_struct,
which is not meant to be called too frequently, so a new interval
mechanism is applied, that will only call workqueue_set_max_active after
a count of work is queued. Hoping to balance both the random and
sequence performance on HDD.

Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Tested-by: David Sterba &lt;dsterba@suse.cz&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: Add high priority workqueue support for btrfs_workqueue_struct</title>
<updated>2014-03-10T19:17:03+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>quwenruo@cn.fujitsu.com</email>
</author>
<published>2014-02-28T02:46:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1ca08976ae94f3594dd7303584581cf8099ce47e'/>
<id>1ca08976ae94f3594dd7303584581cf8099ce47e</id>
<content type='text'>
Add high priority function to btrfs_workqueue.

This is implemented by embedding a btrfs_workqueue into a
btrfs_workqueue and use some helper functions to differ the normal
priority wq and high priority wq.
So the high priority wq is completely independent from the normal
workqueue.

Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Tested-by: David Sterba &lt;dsterba@suse.cz&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add high priority function to btrfs_workqueue.

This is implemented by embedding a btrfs_workqueue into a
btrfs_workqueue and use some helper functions to differ the normal
priority wq and high priority wq.
So the high priority wq is completely independent from the normal
workqueue.

Signed-off-by: Qu Wenruo &lt;quwenruo@cn.fujitsu.com&gt;
Tested-by: David Sterba &lt;dsterba@suse.cz&gt;
Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
