<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/btrfs/compression.c, branch v4.15</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>btrfs: Fix wild memory access in compression level parser</title>
<updated>2017-11-27T16:01:11+00:00</updated>
<author>
<name>Qu Wenruo</name>
<email>wqu@suse.com</email>
</author>
<published>2017-11-06T02:43:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=eae8d82529dd9820e206ecba0047b806c4410e65'/>
<id>eae8d82529dd9820e206ecba0047b806c4410e65</id>
<content type='text'>
[BUG]
The kernel panics when mounting with the "-o compress" mount option.
KASAN reports:
------
==================================================================
BUG: KASAN: wild-memory-access in strncmp+0x31/0xc0
Read of size 1 at addr d86735fce994f800 by task mount/662
...
Call Trace:
 dump_stack+0xe3/0x175
 kasan_report+0x163/0x370
 __asan_load1+0x47/0x50
 strncmp+0x31/0xc0
 btrfs_compress_str2level+0x20/0x70 [btrfs]
 btrfs_parse_options+0xff4/0x1870 [btrfs]
 open_ctree+0x2679/0x49f0 [btrfs]
 btrfs_mount+0x1b7f/0x1d30 [btrfs]
 mount_fs+0x49/0x190
 vfs_kern_mount.part.29+0xba/0x280
 vfs_kern_mount+0x13/0x20
 btrfs_mount+0x31e/0x1d30 [btrfs]
 mount_fs+0x49/0x190
 vfs_kern_mount.part.29+0xba/0x280
 do_mount+0xaad/0x1a00
 SyS_mount+0x98/0xe0
 entry_SYSCALL_64_fastpath+0x1f/0xbe
------

[Cause]
For the 'compress' and 'compress_force' options, the token doesn't expect
any parameter, so args[0] contains uninitialized data.
Accessing args[0] causes the wild memory access above.

[Fix]
For Opt_compress and Opt_compress_force, set compression level to
the default.

Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ set the default in advance ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[BUG]
The kernel panics when mounting with the "-o compress" mount option.
KASAN reports:
------
==================================================================
BUG: KASAN: wild-memory-access in strncmp+0x31/0xc0
Read of size 1 at addr d86735fce994f800 by task mount/662
...
Call Trace:
 dump_stack+0xe3/0x175
 kasan_report+0x163/0x370
 __asan_load1+0x47/0x50
 strncmp+0x31/0xc0
 btrfs_compress_str2level+0x20/0x70 [btrfs]
 btrfs_parse_options+0xff4/0x1870 [btrfs]
 open_ctree+0x2679/0x49f0 [btrfs]
 btrfs_mount+0x1b7f/0x1d30 [btrfs]
 mount_fs+0x49/0x190
 vfs_kern_mount.part.29+0xba/0x280
 vfs_kern_mount+0x13/0x20
 btrfs_mount+0x31e/0x1d30 [btrfs]
 mount_fs+0x49/0x190
 vfs_kern_mount.part.29+0xba/0x280
 do_mount+0xaad/0x1a00
 SyS_mount+0x98/0xe0
 entry_SYSCALL_64_fastpath+0x1f/0xbe
------

[Cause]
For the 'compress' and 'compress_force' options, the token doesn't expect
any parameter, so args[0] contains uninitialized data.
Accessing args[0] causes the wild memory access above.

[Fix]
For Opt_compress and Opt_compress_force, set compression level to
the default.

Signed-off-by: Qu Wenruo &lt;wqu@suse.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ set the default in advance ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: add write_flags for compression bio</title>
<updated>2017-11-15T13:44:31+00:00</updated>
<author>
<name>Liu Bo</name>
<email>bo.li.liu@oracle.com</email>
</author>
<published>2017-10-24T05:18:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f82b735936ffd58b6711cf1f1054616517d8ffcd'/>
<id>f82b735936ffd58b6711cf1f1054616517d8ffcd</id>
<content type='text'>
The compression code path has only flagged bios with REQ_OP_WRITE no matter
where the bios come from, but it could be a sync write if fsync starts
this writeback or a normal writeback write if wb kthread starts a
periodic writeback.

It breaks the rule that sync writes and writeback writes need to be
differentiated from each other, because from the POV of block layer,
all bios need to be recognized by these flags in order to do some
management, e.g. throttling.

This passes writeback_control to compression write path so that it can
send bios with proper flags to block layer.

Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The compression code path has only flagged bios with REQ_OP_WRITE no matter
where the bios come from, but it could be a sync write if fsync starts
this writeback or a normal writeback write if wb kthread starts a
periodic writeback.

It breaks the rule that sync writes and writeback writes need to be
differentiated from each other, because from the POV of block layer,
all bios need to be recognized by these flags in order to do some
management, e.g. throttling.

This passes writeback_control to compression write path so that it can
send bios with proper flags to block layer.

Signed-off-by: Liu Bo &lt;bo.li.liu@oracle.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: heuristic: add Shannon entropy calculation</title>
<updated>2017-11-01T19:45:36+00:00</updated>
<author>
<name>Timofey Titovets</name>
<email>nefelim4ag@gmail.com</email>
</author>
<published>2017-10-08T13:11:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=19562430c6213925faba3baa9e8cb224ddd47ee6'/>
<id>19562430c6213925faba3baa9e8cb224ddd47ee6</id>
<content type='text'>
The byte distribution check in the heuristic filters edge data cases and
sometimes fails to classify input data.

Let's fix that by adding a Shannon entropy calculation, which will cover
classification of most other data types.

As Shannon entropy needs log2 with some precision to work, let's use
ilog2(N) and, for increased precision, ilog2(pow(N, 4)).

Shannon entropy has been slightly changed to avoid signed numbers and
division.

The calculation follows the formula directly, replacing the precalculated
table or chains of if-else.

The accuracy errors of ilog2 are compensated by

@ENTROPY_LVL_ACEPTABLE 70 -&gt; 65
@ENTROPY_LVL_HIGH      85 -&gt; 80
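
A minimal sketch of the idea in Python, not the kernel code. The names
ilog2_w and shannon_entropy_percent are chosen for illustration, and
reading "avoid signed numbers and division" as replacing log2(p/size)
with a difference of two integer logarithms is an assumption:

```python
from collections import Counter


def ilog2(n):
    # Integer (floor) log2 of a positive integer.
    return n.bit_length() - 1


def ilog2_w(n):
    # Wider logarithm for extra precision: ilog2(pow(n, 4)).
    return ilog2(n ** 4)


def shannon_entropy_percent(sample):
    # Entropy of a byte sample as a percentage of the 8 bits/byte maximum.
    size = len(sample)
    if size == 0:
        return 0
    entropy_max = 8 * ilog2_w(2)  # 8 bits per byte, scaled by the power of 4
    size_base = ilog2_w(size)
    total = 0
    for count in Counter(sample).values():
        # log(size) - log(count) cannot be negative, so no signed math.
        total += count * (size_base - ilog2_w(count))
    return (total // size) * 100 // entropy_max
```

In this sketch a sample of identical bytes yields 0 and a sample that
contains every byte value exactly once yields 100.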

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ update comments ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The byte distribution check in the heuristic filters edge data cases and
sometimes fails to classify input data.

Let's fix that by adding a Shannon entropy calculation, which will cover
classification of most other data types.

As Shannon entropy needs log2 with some precision to work, let's use
ilog2(N) and, for increased precision, ilog2(pow(N, 4)).

Shannon entropy has been slightly changed to avoid signed numbers and
division.

The calculation follows the formula directly, replacing the precalculated
table or chains of if-else.

The accuracy errors of ilog2 are compensated by

@ENTROPY_LVL_ACEPTABLE 70 -&gt; 65
@ENTROPY_LVL_HIGH      85 -&gt; 80

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ update comments ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: heuristic: add byte core set calculation</title>
<updated>2017-11-01T19:45:36+00:00</updated>
<author>
<name>Timofey Titovets</name>
<email>nefelim4ag@gmail.com</email>
</author>
<published>2017-09-28T14:33:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=858177d38d4681dad6efc015b99e4c786a34aca5'/>
<id>858177d38d4681dad6efc015b99e4c786a34aca5</id>
<content type='text'>
Calculate byte core set for data sample:
- sort buckets' numbers in decreasing order
- count how many values cover 90% of the sample

If the core set size is low (&lt;=25%), data are easily compressible.
If the core set size is high (&gt;=80%), data are not compressible.
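
The two steps above can be sketched in Python as follows. This is an
illustration under assumptions, not the kernel implementation: the
function name and the coverage_percent parameter are hypothetical, and
the exact rounding of the 90% threshold is a guess.

```python
from bisect import bisect_left
from collections import Counter
from itertools import accumulate


def byte_core_set_size(sample, coverage_percent=90):
    # Number of distinct byte values needed to cover the given share
    # of the sample, taking the most frequent values first.
    if not sample:
        return 0
    # Step 1: sort the per-byte counts in decreasing order.
    counts = sorted(Counter(sample).values(), reverse=True)
    # Step 2: find how many of them are needed to reach the threshold.
    threshold = len(sample) * coverage_percent // 100
    running = list(accumulate(counts))
    return bisect_left(running, threshold) + 1
```

A sample dominated by one byte value gives a core set of 1, while a
sample with all 256 byte values equally represented gives 230, which
falls in the "not compressible" range described above.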

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ update comments ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Calculate byte core set for data sample:
- sort buckets' numbers in decreasing order
- count how many values cover 90% of the sample

If the core set size is low (&lt;=25%), data are easily compressible.
If the core set size is high (&gt;=80%), data are not compressible.

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ update comments ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: heuristic: add byte set calculation</title>
<updated>2017-11-01T19:45:36+00:00</updated>
<author>
<name>Timofey Titovets</name>
<email>nefelim4ag@gmail.com</email>
</author>
<published>2017-09-28T14:33:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a288e92cacdc4729ad8f83d018fb0f3f5ded6c37'/>
<id>a288e92cacdc4729ad8f83d018fb0f3f5ded6c37</id>
<content type='text'>
Calculate byte set size for data sample:
- calculate how many unique bytes have been in the sample
- for all bytes count &gt; 0, check if we're still in the low count range
  (~25%), such data are easily compressible, otherwise further analysis
  is needed

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ update comments ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Calculate byte set size for data sample:
- calculate how many unique bytes have been in the sample
- for all bytes count &gt; 0, check if we're still in the low count range
  (~25%), such data are easily compressible, otherwise further analysis
  is needed

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ update comments ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: heuristic: add detection of repeated data patterns</title>
<updated>2017-11-01T19:45:36+00:00</updated>
<author>
<name>Timofey Titovets</name>
<email>nefelim4ag@gmail.com</email>
</author>
<published>2017-09-28T14:33:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1fe4f6fa5ae7dd1e63145e1ced7b9b38854da9f4'/>
<id>1fe4f6fa5ae7dd1e63145e1ced7b9b38854da9f4</id>
<content type='text'>
Walk over data sample and use memcmp to detect repeated patterns, like
zeros, but a bit more general.
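
One simple way to do such a check, sketched in Python. Comparing the
first half of the sample with the second half is an assumption made for
this illustration, and has_repeated_pattern is a hypothetical name; the
slice comparison stands in for memcmp:

```python
def has_repeated_pattern(sample):
    # True if the second half of the sample repeats the first half.
    half = len(sample) // 2
    if half == 0:
        return False
    return sample[:half] == sample[half:2 * half]
```

An all-zero buffer and a buffer made of a short repeating string both
pass this check, while a buffer of strictly increasing byte values does
not.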

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ minor coding style fixes ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Walk over data sample and use memcmp to detect repeated patterns, like
zeros, but a bit more general.

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ minor coding style fixes ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: heuristic: implement sampling logic</title>
<updated>2017-11-01T19:45:36+00:00</updated>
<author>
<name>Timofey Titovets</name>
<email>nefelim4ag@gmail.com</email>
</author>
<published>2017-09-28T14:33:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a440d48c7f93af5bae86af676cc6cd4e9fd6015f'/>
<id>a440d48c7f93af5bae86af676cc6cd4e9fd6015f</id>
<content type='text'>
Copy sample data from the input data range to sample buffer then
calculate byte value count for that sample into bucket.
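
A rough Python sketch of these two steps. The function names and the
read_size and interval values are assumptions for illustration and are
not taken from this commit message:

```python
def take_sample(data, read_size=16, interval=256):
    # Copy read_size bytes from the input at every interval bytes.
    sample = bytearray()
    for offset in range(0, len(data), interval):
        sample += data[offset:offset + read_size]
    return bytes(sample)


def count_byte_values(sample):
    # Bucket with one counter per possible byte value.
    bucket = [0] * 256
    for value in sample:
        bucket[value] += 1
    return bucket
```

With these assumed values, a 1024-byte input produces a 64-byte sample,
and the bucket counters always add up to the sample size.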

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
[ minor comment updates ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Copy sample data from the input data range to sample buffer then
calculate byte value count for that sample into bucket.

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
[ minor comment updates ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: heuristic: add bucket and sample counters and other defines</title>
<updated>2017-11-01T19:45:36+00:00</updated>
<author>
<name>Timofey Titovets</name>
<email>nefelim4ag@gmail.com</email>
</author>
<published>2017-09-28T14:33:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=17b5a6c17e265ca84fac9c3256ff86c691f04aab'/>
<id>17b5a6c17e265ca84fac9c3256ff86c691f04aab</id>
<content type='text'>
Add basic defines and structures for data sampling.

Added macros:
 - For future sampling algo
 - For bucket size

Heuristic workspace:
 - Add bucket for storing byte type counters
 - Add sample array for storing partial copy of input data range
 - Add counter for storing the current sample size in the workspace

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ minor coding style fixes, comments updated ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add basic defines and structures for data sampling.

Added macros:
 - For future sampling algo
 - For bucket size

Heuristic workspace:
 - Add bucket for storing byte type counters
 - Add sample array for storing partial copy of input data range
 - Add counter for storing the current sample size in the workspace

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ minor coding style fixes, comments updated ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Btrfs: compression: separate heuristic/compression workspaces</title>
<updated>2017-11-01T19:45:35+00:00</updated>
<author>
<name>Timofey Titovets</name>
<email>nefelim4ag@gmail.com</email>
</author>
<published>2017-09-28T14:33:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4e439a0b184f014a5833fa468af36cc9f59b8fb1'/>
<id>4e439a0b184f014a5833fa468af36cc9f59b8fb1</id>
<content type='text'>
The compression heuristic itself is not a compression type. As the current
infrastructure provides workspaces for several compression types, it's
difficult to just add a heuristic workspace.

Just refactor the code to support compression/heuristic workspaces with
maximum code sharing and minimum changes in it.

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ coding style fixes ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The compression heuristic itself is not a compression type. As the current
infrastructure provides workspaces for several compression types, it's
difficult to just add a heuristic workspace.

Just refactor the code to support compression/heuristic workspaces with
maximum code sharing and minimum changes in it.

Signed-off-by: Timofey Titovets &lt;nefelim4ag@gmail.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ coding style fixes ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: allow setting zlib compression level via :9</title>
<updated>2017-11-01T19:45:34+00:00</updated>
<author>
<name>Adam Borowski</name>
<email>kilobyte@angband.pl</email>
</author>
<published>2017-09-15T15:36:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fa4d885a482ef52ad3efa12a5799a3f6408b0718'/>
<id>fa4d885a482ef52ad3efa12a5799a3f6408b0718</id>
<content type='text'>
This is bikeshedding, but it seems people are drastically more likely to
understand "zlib:9" as compression level rather than an algorithm
version compared to "zlib9".

Based on feedback on the mailing list, the ":9" will be the only accepted
syntax. The level must be a single digit. Unrecognized format will
result in the default, for forward compatibility in a similar way the
compression algorithm specifier was relaxed in commit
a7164fa4e055daf6368c ("btrfs: prepare for extensions in compression
options").
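
The accepted format can be illustrated with a short Python sketch. This
is not the kernel parser: parse_zlib_level is a hypothetical name, the
default level of 3 is an assumption, and so is the choice to accept
only the digits 1 to 9.

```python
import re

DEFAULT_LEVEL = 3  # assumed default, not stated in this commit message


def parse_zlib_level(option):
    # Accept exactly "zlib:" followed by one digit and nothing else.
    match = re.fullmatch(r"zlib:([1-9])", option)
    if match is None:
        # Unrecognized formats fall back to the default level.
        return DEFAULT_LEVEL
    return int(match.group(1))
```

Here "zlib:9" gives level 9, while "zlib", "zlib9" and "zlib:12" all
fall back to the default.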

Signed-off-by: Adam Borowski &lt;kilobyte@angband.pl&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ tighten the accepted format ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is bikeshedding, but it seems people are drastically more likely to
understand "zlib:9" as compression level rather than an algorithm
version compared to "zlib9".

Based on feedback on the mailing list, the ":9" will be the only accepted
syntax. The level must be a single digit. Unrecognized format will
result in the default, for forward compatibility in a similar way the
compression algorithm specifier was relaxed in commit
a7164fa4e055daf6368c ("btrfs: prepare for extensions in compression
options").

Signed-off-by: Adam Borowski &lt;kilobyte@angband.pl&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
[ tighten the accepted format ]
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
