<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/f2fs/segment.c, branch v5.11</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>f2fs: add compress_mode mount option</title>
<updated>2020-12-03T08:11:57+00:00</updated>
<author>
<name>Daeho Jeong</name>
<email>daehojeong@google.com</email>
</author>
<published>2020-12-01T04:08:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=602a16d58e9aab3c423bcf051033ea6c9e8a6d37'/>
<id>602a16d58e9aab3c423bcf051033ea6c9e8a6d37</id>
<content type='text'>
We will add a new "compress_mode" mount option to control file
compression mode. This supports "fs" and "user". In "fs" mode (default),
f2fs does automatic compression on the compression enabled files.
In "user" mode, f2fs disables the automaic compression and gives the
user discretion of choosing the target file and the timing. It means
the user can do manual compression/decompression on the compression
enabled files using ioctls.

Signed-off-by: Daeho Jeong &lt;daehojeong@google.com&gt;
Reviewed-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We will add a new "compress_mode" mount option to control file
compression mode. This supports "fs" and "user". In "fs" mode (default),
f2fs does automatic compression on the compression enabled files.
In "user" mode, f2fs disables the automaic compression and gives the
user discretion of choosing the target file and the timing. It means
the user can do manual compression/decompression on the compression
enabled files using ioctls.

Signed-off-by: Daeho Jeong &lt;daehojeong@google.com&gt;
Reviewed-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>f2fs: init dirty_secmap incorrectly</title>
<updated>2020-12-03T06:00:23+00:00</updated>
<author>
<name>Jack Qiu</name>
<email>jack.qiu@huawei.com</email>
</author>
<published>2020-12-01T07:45:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5335bfc6eb688344bfcd4b4133c002c0ae0d0719'/>
<id>5335bfc6eb688344bfcd4b4133c002c0ae0d0719</id>
<content type='text'>
section is dirty, but dirty_secmap may not set

Reported-by: Jia Yang &lt;jiayang5@huawei.com&gt;
Fixes: da52f8ade40b ("f2fs: get the right gc victim section when section has several segments")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Jack Qiu &lt;jack.qiu@huawei.com&gt;
Reviewed-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
section is dirty, but dirty_secmap may not set

Reported-by: Jia Yang &lt;jiayang5@huawei.com&gt;
Fixes: da52f8ade40b ("f2fs: get the right gc victim section when section has several segments")
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Jack Qiu &lt;jack.qiu@huawei.com&gt;
Reviewed-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>f2fs: fix to avoid REQ_TIME and CP_TIME collision</title>
<updated>2020-12-03T06:00:21+00:00</updated>
<author>
<name>Chao Yu</name>
<email>yuchao0@huawei.com</email>
</author>
<published>2020-11-25T02:57:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=493720a4854343b7c3fe100cda6a3a2c3f8d4b5d'/>
<id>493720a4854343b7c3fe100cda6a3a2c3f8d4b5d</id>
<content type='text'>
Lei Li reported a issue: if foreground operations are frequent, background
checkpoint may be always skipped due to below check, result in losing more
data after sudden power-cut.

f2fs_balance_fs_bg()
...
	if (!is_idle(sbi, REQ_TIME) &amp;&amp;
		(!excess_dirty_nats(sbi) &amp;&amp; !excess_dirty_nodes(sbi)))
		return;

E.g:
cp_interval = 5 second
idle_interval = 2 second
foreground operation interval = 1 second (append 1 byte per second into file)

In such case, no matter when it calls f2fs_balance_fs_bg(), is_idle(, REQ_TIME)
returns false, result in skipping background checkpoint.

This patch changes as below to make trigger condition being more reasonable:
- trigger sync_fs() if dirty_{nats,nodes} and prefree segs exceeds threshold;
- skip triggering sync_fs() if there is any background inflight IO or there is
foreground operation recently and meanwhile cp_rwsem is being held by someone;

Reported-by: Lei Li &lt;noctis.akm@gmail.com&gt;
Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Lei Li reported a issue: if foreground operations are frequent, background
checkpoint may be always skipped due to below check, result in losing more
data after sudden power-cut.

f2fs_balance_fs_bg()
...
	if (!is_idle(sbi, REQ_TIME) &amp;&amp;
		(!excess_dirty_nats(sbi) &amp;&amp; !excess_dirty_nodes(sbi)))
		return;

E.g:
cp_interval = 5 second
idle_interval = 2 second
foreground operation interval = 1 second (append 1 byte per second into file)

In such case, no matter when it calls f2fs_balance_fs_bg(), is_idle(, REQ_TIME)
returns false, result in skipping background checkpoint.

This patch changes as below to make trigger condition being more reasonable:
- trigger sync_fs() if dirty_{nats,nodes} and prefree segs exceeds threshold;
- skip triggering sync_fs() if there is any background inflight IO or there is
foreground operation recently and meanwhile cp_rwsem is being held by someone;

Reported-by: Lei Li &lt;noctis.akm@gmail.com&gt;
Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>f2fs: don't issue flush in f2fs_flush_device_cache() for nobarrier case</title>
<updated>2020-10-14T06:23:34+00:00</updated>
<author>
<name>Chao Yu</name>
<email>yuchao0@huawei.com</email>
</author>
<published>2020-10-12T02:28:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6ed29fe1cac9745589b7db8de3b5089e3ff591d0'/>
<id>6ed29fe1cac9745589b7db8de3b5089e3ff591d0</id>
<content type='text'>
This patch changes f2fs_flush_device_cache() to skip issuing flush for
nobarrier case.

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch changes f2fs_flush_device_cache() to skip issuing flush for
nobarrier case.

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>f2fs: handle errors of f2fs_get_meta_page_nofail</title>
<updated>2020-10-14T06:23:29+00:00</updated>
<author>
<name>Jaegeuk Kim</name>
<email>jaegeuk@kernel.org</email>
</author>
<published>2020-10-02T21:17:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=86f33603f8c51537265ff7ac0320638fd2cbdb1b'/>
<id>86f33603f8c51537265ff7ac0320638fd2cbdb1b</id>
<content type='text'>
First problem is we hit BUG_ON() in f2fs_get_sum_page given EIO on
f2fs_get_meta_page_nofail().

Quick fix was not to give any error with infinite loop, but syzbot caught
a case where it goes to that loop from fuzzed image. In turned out we abused
f2fs_get_meta_page_nofail() like in the below call stack.

- f2fs_fill_super
 - f2fs_build_segment_manager
  - build_sit_entries
   - get_current_sit_page

INFO: task syz-executor178:6870 can't die for more than 143 seconds.
task:syz-executor178 state:R
 stack:26960 pid: 6870 ppid:  6869 flags:0x00004006
Call Trace:

Showing all locks held in the system:
1 lock held by khungtaskd/1179:
 #0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242
1 lock held by systemd-journal/3920:
1 lock held by in:imklog/6769:
 #0: ffff88809eebc130 (&amp;f-&gt;f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930
1 lock held by syz-executor178/6870:
 #0: ffff8880925120e0 (&amp;type-&gt;s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229

Actually, we didn't have to use _nofail in this case, since we could return
error to mount(2) already with the error handler.

As a result, this patch tries to 1) remove _nofail callers as much as possible,
2) deal with error case in last remaining caller, f2fs_get_sum_page().

Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com
Reviewed-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
First problem is we hit BUG_ON() in f2fs_get_sum_page given EIO on
f2fs_get_meta_page_nofail().

Quick fix was not to give any error with infinite loop, but syzbot caught
a case where it goes to that loop from fuzzed image. In turned out we abused
f2fs_get_meta_page_nofail() like in the below call stack.

- f2fs_fill_super
 - f2fs_build_segment_manager
  - build_sit_entries
   - get_current_sit_page

INFO: task syz-executor178:6870 can't die for more than 143 seconds.
task:syz-executor178 state:R
 stack:26960 pid: 6870 ppid:  6869 flags:0x00004006
Call Trace:

Showing all locks held in the system:
1 lock held by khungtaskd/1179:
 #0: ffffffff8a554da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6242
1 lock held by systemd-journal/3920:
1 lock held by in:imklog/6769:
 #0: ffff88809eebc130 (&amp;f-&gt;f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:930
1 lock held by syz-executor178/6870:
 #0: ffff8880925120e0 (&amp;type-&gt;s_umount_key#47/1){+.+.}-{3:3}, at: alloc_super+0x201/0xaf0 fs/super.c:229

Actually, we didn't have to use _nofail in this case, since we could return
error to mount(2) already with the error handler.

As a result, this patch tries to 1) remove _nofail callers as much as possible,
2) deal with error case in last remaining caller, f2fs_get_sum_page().

Reported-by: syzbot+ee250ac8137be41d7b13@syzkaller.appspotmail.com
Reviewed-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>f2fs: clean up kvfree</title>
<updated>2020-09-14T18:15:37+00:00</updated>
<author>
<name>Chao Yu</name>
<email>yuchao0@huawei.com</email>
</author>
<published>2020-09-14T08:47:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c8eb702484ed5badb2b654f2c9d086e43a00b954'/>
<id>c8eb702484ed5badb2b654f2c9d086e43a00b954</id>
<content type='text'>
After commit 0b6d4ca04a86 ("f2fs: don't return vmalloc() memory from
f2fs_kmalloc()"), f2fs_k{m,z}alloc() will not return vmalloc()'ed
memory, so clean up to use kfree() instead of kvfree() to free
vmalloc()'ed memory.

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After commit 0b6d4ca04a86 ("f2fs: don't return vmalloc() memory from
f2fs_kmalloc()"), f2fs_k{m,z}alloc() will not return vmalloc()'ed
memory, so clean up to use kfree() instead of kvfree() to free
vmalloc()'ed memory.

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>f2fs: support age threshold based garbage collection</title>
<updated>2020-09-11T18:11:15+00:00</updated>
<author>
<name>Chao Yu</name>
<email>yuchao0@huawei.com</email>
</author>
<published>2020-08-04T13:14:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=093749e296e29a4b0162eb925a6701a01e8c9a98'/>
<id>093749e296e29a4b0162eb925a6701a01e8c9a98</id>
<content type='text'>
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.

This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:

1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
 0 means youngest, 100 means oldest, if we set age threshold to 80
 then select dirty segments which has age in range of [80, 100] as
 candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;

2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.

3. merge valid blocks from source victim into target victim with
SSR alloctor.

Test steps:
- create 160 dirty segments:
 * half of them have 128 valid blocks per segment
 * left of them have 384 valid blocks per segment
- run background GC

Benefit: GC count and block movement count both decrease obviously:

- Before:
  - Valid: 86
  - Dirty: 1
  - Prefree: 11
  - Free: 6001 (6001)

GC calls: 162 (BG: 220)
  - data segments : 160 (160)
  - node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
  - data blocks : 40960 (40960)
  - node blocks : 494 (494)

IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments

- After:

  - Valid: 87
  - Dirty: 0
  - Prefree: 4
  - Free: 6008 (6008)

GC calls: 75 (BG: 76)
  - data segments : 74 (74)
  - node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
  - data blocks : 12544 (12544)
  - node blocks : 269 (269)

IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment &amp; clean up]
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There are several issues in current background GC algorithm:
- valid blocks is one of key factors during cost overhead calculation,
so if segment has less valid block, however even its age is young or
it locates hot segment, CB algorithm will still choose the segment as
victim, it's not appropriate.
- GCed data/node will go to existing logs, no matter in-there datas'
update frequency is the same or not, it may mix hot and cold data
again.
- GC alloctor mainly use LFS type segment, it will cost free segment
more quickly.

This patch introduces a new algorithm named age threshold based
garbage collection to solve above issues, there are three steps
mainly:

1. select a source victim:
- set an age threshold, and select candidates beased threshold:
e.g.
 0 means youngest, 100 means oldest, if we set age threshold to 80
 then select dirty segments which has age in range of [80, 100] as
 candiddates;
- set candidate_ratio threshold, and select candidates based the
ratio, so that we can shrink candidates to those oldest segments;
- select target segment with fewest valid blocks in order to
migrate blocks with minimum cost;

2. select a target victim:
- select candidates beased age threshold;
- set candidate_radius threshold, search candidates whose age is
around source victims, searching radius should less than the
radius threshold.
- select target segment with most valid blocks in order to avoid
migrating current target segment.

3. merge valid blocks from source victim into target victim with
SSR alloctor.

Test steps:
- create 160 dirty segments:
 * half of them have 128 valid blocks per segment
 * left of them have 384 valid blocks per segment
- run background GC

Benefit: GC count and block movement count both decrease obviously:

- Before:
  - Valid: 86
  - Dirty: 1
  - Prefree: 11
  - Free: 6001 (6001)

GC calls: 162 (BG: 220)
  - data segments : 160 (160)
  - node segments : 2 (2)
Try to move 41454 blocks (BG: 41454)
  - data blocks : 40960 (40960)
  - node blocks : 494 (494)

IPU: 0 blocks
SSR: 0 blocks in 0 segments
LFS: 41364 blocks in 81 segments

- After:

  - Valid: 87
  - Dirty: 0
  - Prefree: 4
  - Free: 6008 (6008)

GC calls: 75 (BG: 76)
  - data segments : 74 (74)
  - node segments : 1 (1)
Try to move 12813 blocks (BG: 12813)
  - data blocks : 12544 (12544)
  - node blocks : 269 (269)

IPU: 0 blocks
SSR: 12032 blocks in 77 segments
LFS: 855 blocks in 2 segments

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
[Jaegeuk Kim: fix a bug along with pinfile in-mem segment &amp; clean up]
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>f2fs: support 64-bits key in f2fs rb-tree node entry</title>
<updated>2020-09-10T21:03:30+00:00</updated>
<author>
<name>Chao Yu</name>
<email>yuchao0@huawei.com</email>
</author>
<published>2020-08-04T13:14:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2e9b2bb250d5d1493eaf36215fbfe2cd76ce4f7c'/>
<id>2e9b2bb250d5d1493eaf36215fbfe2cd76ce4f7c</id>
<content type='text'>
then, we can add specified entry into rb-tree with 64-bits segment time
as key.

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
then, we can add specified entry into rb-tree with 64-bits segment time
as key.

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>f2fs: inherit mtime of original block during GC</title>
<updated>2020-09-10T21:03:30+00:00</updated>
<author>
<name>Chao Yu</name>
<email>yuchao0@huawei.com</email>
</author>
<published>2020-08-04T13:14:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c5d02785c59dd8ee20031d742003daff8df73e2d'/>
<id>c5d02785c59dd8ee20031d742003daff8df73e2d</id>
<content type='text'>
Don't let f2fs inner GC ruins original aging degree of segment.

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Don't let f2fs inner GC ruins original aging degree of segment.

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>f2fs: record average update time of segment</title>
<updated>2020-09-10T21:03:30+00:00</updated>
<author>
<name>Chao Yu</name>
<email>yuchao0@huawei.com</email>
</author>
<published>2020-08-04T13:14:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6f3a01ae9b72d5b7c57632632b8c8088db11e7ab'/>
<id>6f3a01ae9b72d5b7c57632632b8c8088db11e7ab</id>
<content type='text'>
Previously, once we update one block in segment, we will update mtime of
segment to last time, making aged segment becoming freshest, result in
that GC with cost benefit algorithm missing such segment, So this patch
changes to record mtime as average block updating time instead of last
updating time.

It's not needed to reset mtime for prefree segment, as se-&gt;valid_blocks
is zero, then old se-&gt;mtime won't take any weight with below calculation:

	se-&gt;mtime = div_u64(se-&gt;mtime * se-&gt;valid_blocks + mtime,
					se-&gt;valid_blocks + 1);

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously, once we update one block in segment, we will update mtime of
segment to last time, making aged segment becoming freshest, result in
that GC with cost benefit algorithm missing such segment, So this patch
changes to record mtime as average block updating time instead of last
updating time.

It's not needed to reset mtime for prefree segment, as se-&gt;valid_blocks
is zero, then old se-&gt;mtime won't take any weight with below calculation:

	se-&gt;mtime = div_u64(se-&gt;mtime * se-&gt;valid_blocks + mtime,
					se-&gt;valid_blocks + 1);

Signed-off-by: Chao Yu &lt;yuchao0@huawei.com&gt;
Signed-off-by: Jaegeuk Kim &lt;jaegeuk@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
