linux.git/fs/btrfs/compression.c, branch v4.15

btrfs: Fix wild memory access in compression level parser

2017-11-27T16:01:11+00:00

[BUG]
Kernel panic when mounting with the "-o compress" mount option.
KASAN reports:
------
==================================================================
BUG: KASAN: wild-memory-access in strncmp+0x31/0xc0
Read of size 1 at addr d86735fce994f800 by task mount/662
...
Call Trace:
 dump_stack+0xe3/0x175
 kasan_report+0x163/0x370
 __asan_load1+0x47/0x50
 strncmp+0x31/0xc0
 btrfs_compress_str2level+0x20/0x70 [btrfs]
 btrfs_parse_options+0xff4/0x1870 [btrfs]
 open_ctree+0x2679/0x49f0 [btrfs]
 btrfs_mount+0x1b7f/0x1d30 [btrfs]
 mount_fs+0x49/0x190
 vfs_kern_mount.part.29+0xba/0x280
 vfs_kern_mount+0x13/0x20
 btrfs_mount+0x31e/0x1d30 [btrfs]
 mount_fs+0x49/0x190
 vfs_kern_mount.part.29+0xba/0x280
 do_mount+0xaad/0x1a00
 SyS_mount+0x98/0xe0
 entry_SYSCALL_64_fastpath+0x1f/0xbe
------

[Cause]
The tokens for the 'compress' and 'compress_force' options don't expect
any parameter, so their args[0] contains uninitialized data.
Accessing args[0] causes the wild memory access above.

[Fix]
For Opt_compress and Opt_compress_force, set compression level to
the default.
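A minimal user-space sketch of the fix idea. It assumes hypothetical stand-ins (enum opt_token, struct opt_arg, level_from_param(), DEFAULT_LEVEL = 3) rather than the real btrfs_parse_options() code. The level is set to the default before looking at the token, and args[0] is only read for a token that actually carries a parameter.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the mount option tokens; not kernel code. */
enum opt_token {
	OPT_COMPRESS,      /* "compress": takes no parameter */
	OPT_COMPRESS_TYPE, /* "compress=zlib:9": takes a parameter */
};

#define DEFAULT_LEVEL 3u /* assumed default compression level */

/* Parsed argument; only valid for tokens that carry a parameter. */
struct opt_arg {
	const char *from;
};

/* Hypothetical level parser: "<alg>:<digit 1-9>", else the default. */
static unsigned int level_from_param(const char *param)
{
	const char *colon = strchr(param, ':');

	if (colon && colon[1] >= '1' && colon[1] <= '9' && colon[2] == '\0')
		return (unsigned int)(colon[1] - '0');
	return DEFAULT_LEVEL;
}

static unsigned int pick_compress_level(enum opt_token token,
					const struct opt_arg *args)
{
	/* Set the default in advance so every path has a valid level. */
	unsigned int level = DEFAULT_LEVEL;

	switch (token) {
	case OPT_COMPRESS:
		/* No parameter: args[0] is uninitialized, never read it. */
		break;
	case OPT_COMPRESS_TYPE:
		level = level_from_param(args[0].from);
		break;
	}
	return level;
}
```

With this shape, pick_compress_level(OPT_COMPRESS, NULL) never dereferences args at all.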

Signed-off-by: Qu Wenruo 
Reviewed-by: David Sterba 
[ set the default in advance ]
Signed-off-by: David Sterba

Btrfs: add write_flags for compression bio

2017-11-15T13:44:31+00:00

The compression code path has only flagged bios with REQ_OP_WRITE, no
matter where the bios come from, but a bio could be a sync write if
fsync starts this writeback, or a normal writeback write if the wb
kthread starts a periodic writeback.

This breaks the rule that sync writes and writeback writes need to be
differentiated from each other, because from the block layer's POV,
all bios need to be recognized by these flags in order to do some
management, e.g. throttling.

This passes writeback_control to the compression write path so that it
can send bios with proper flags to the block layer.
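A user-space sketch of how write flags could be derived from the writeback context, in the spirit of the kernel's wbc_to_write_flags() helper. The struct, enum and flag values below are illustrative assumptions (prefixed SKETCH_); the real REQ_* bits and struct writeback_control differ.

```c
#include <stdbool.h>

/* Illustrative flag values only; real REQ_* bits differ. */
#define SKETCH_REQ_SYNC       (1u << 0)
#define SKETCH_REQ_BACKGROUND (1u << 1)

enum sketch_sync_mode { SKETCH_WB_SYNC_NONE, SKETCH_WB_SYNC_ALL };

/* Cut-down, hypothetical stand-in for struct writeback_control. */
struct sketch_wbc {
	enum sketch_sync_mode sync_mode;
	bool for_kupdate;    /* periodic writeback by the flusher thread */
	bool for_background; /* writeback over the background threshold */
};

/* Pick extra bio flags from who started the writeback. */
static unsigned int sketch_write_flags(const struct sketch_wbc *wbc)
{
	if (wbc->sync_mode == SKETCH_WB_SYNC_ALL)
		return SKETCH_REQ_SYNC;       /* e.g. fsync-driven write */
	if (wbc->for_kupdate || wbc->for_background)
		return SKETCH_REQ_BACKGROUND; /* e.g. flusher thread */
	return 0;
}
```

The returned flags would then be OR-ed into the write operation of each compressed bio, so the block layer can tell sync writes from background writeback.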

Signed-off-by: Liu Bo 
Reviewed-by: David Sterba 
Signed-off-by: David Sterba

Btrfs: heuristic: add Shannon entropy calculation

2017-11-01T19:45:36+00:00

The byte distribution check in the heuristic filters edge data cases and
sometimes fails to classify input data.

Let's fix that by adding a Shannon entropy calculation, which covers the
classification of most other data types.

As Shannon entropy needs log2 with some precision to work, let's use
ilog2(N), and for increased precision, do ilog2(pow(N, 4)).

Shannon entropy has been slightly changed to avoid signed numbers and
division.

The calculation follows the formula directly; it supersedes the earlier
approaches based on a precalculated table or chains of if-else.

The accuracy errors of ilog2 are compensated by lowering the thresholds:

@ENTROPY_LVL_ACEPTABLE 70 -> 65
@ENTROPY_LVL_HIGH      85 -> 80
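A user-space sketch of the unsigned, division-free entropy estimate described above, assuming a plain uint32_t bucket[256] of byte counts and a sample of at most 8192 bytes (so n^4 fits in 64 bits). The GCC/Clang builtin __builtin_clzll stands in for the kernel's ilog2(); names are illustrative. Since ilog2(n^4) is roughly 4 * log2(n), the maximum for 8 bits per byte is 8 * 4 = 32, and the result is scaled to a percentage of that.

```c
#include <stdint.h>

#define BUCKET_SIZE 256

/* floor(log2(n)) for n > 0, stand-in for the kernel's ilog2(). */
static uint32_t ilog2_u64(uint64_t n)
{
	return 63u - (uint32_t)__builtin_clzll(n);
}

/* ilog2(n^4) ~= 4 * log2(n): two extra bits of precision. */
static uint32_t ilog2_w(uint64_t n)
{
	return ilog2_u64(n * n * n * n);
}

/*
 * H = sum(c_i * (log2(size) - log2(c_i))) / size, c_i = count of byte i.
 * Subtracting logs (c_i <= size) keeps every term unsigned, and the
 * single division at the end replaces a per-term division.
 * Precondition: 0 < sample_size <= 8192.
 */
static uint32_t shannon_entropy_pct(const uint32_t bucket[BUCKET_SIZE],
				    uint32_t sample_size)
{
	const uint32_t entropy_max = 8 * ilog2_w(2); /* 8 bits * 4 = 32 */
	const uint32_t sz_base = ilog2_w(sample_size);
	uint64_t entropy_sum = 0;

	for (int i = 0; i < BUCKET_SIZE; i++) {
		uint32_t p = bucket[i];

		if (p == 0)
			continue; /* absent bytes contribute nothing */
		entropy_sum += (uint64_t)p * (sz_base - ilog2_w(p));
	}
	entropy_sum /= sample_size;
	return (uint32_t)(entropy_sum * 100 / entropy_max);
}
```

A caller could then compare the percentage against ENTROPY_LVL_ACEPTABLE and ENTROPY_LVL_HIGH. Two equally frequent byte values (1 bit of entropy, 12.5%) come out as 12 because of integer truncation.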

Signed-off-by: Timofey Titovets 
Reviewed-by: David Sterba 
[ update comments ]
Signed-off-by: David Sterba

Btrfs: heuristic: add byte core set calculation

2017-11-01T19:45:36+00:00

Calculate byte core set for data sample:
- sort the bucket counts in decreasing order
- count how many values cover 90% of the sample

If the core set size is low (<=25%), data are easily compressible.
If the core set size is high (>=80%), data are not compressible.
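The two steps above can be sketched in user space as follows, assuming a plain uint32_t bucket[256] of byte counts (an illustrative layout) and using qsort() on a copy so the caller's counts stay intact.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET_SIZE 256

/* qsort() comparator: larger counts first. */
static int cmp_desc(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a;
	uint32_t y = *(const uint32_t *)b;

	return (x < y) - (x > y);
}

/*
 * Number of most frequent byte values whose counts together exceed
 * 90% of the sample.
 */
static uint32_t byte_core_set_size(const uint32_t bucket[BUCKET_SIZE],
				   uint32_t sample_size)
{
	uint32_t sorted[BUCKET_SIZE];
	const uint64_t threshold = (uint64_t)sample_size * 90 / 100;
	uint64_t sum = 0;

	memcpy(sorted, bucket, sizeof(sorted));
	qsort(sorted, BUCKET_SIZE, sizeof(sorted[0]), cmp_desc);

	for (uint32_t i = 0; i < BUCKET_SIZE; i++) {
		sum += sorted[i];
		if (sum > threshold)
			return i + 1;
	}
	return BUCKET_SIZE;
}
```

The result would then be compared with 25% and 80% of the 256 possible byte values: a sample made of one repeated byte has a core set of 1, while 256 equally frequent values need 231 of them.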

Signed-off-by: Timofey Titovets 
Reviewed-by: David Sterba 
[ update comments ]
Signed-off-by: David Sterba

Btrfs: heuristic: add byte set calculation

2017-11-01T19:45:36+00:00

Calculate byte set size for data sample:
- calculate how many unique bytes are in the sample
- for all bytes with count > 0, check if we're still in the low count
  range (~25%); such data are easily compressible, otherwise further
  analysis is needed
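A user-space sketch of the byte set check, assuming a plain uint32_t bucket[256] and a threshold of 64 (about 25% of 256 byte values). Once the count passes the threshold the exact value no longer matters, so the loop stops early.

```c
#include <stdint.h>

#define BUCKET_SIZE 256
/* ~25% of the 256 possible byte values (assumed threshold). */
#define BYTE_SET_THRESHOLD 64

/* Count distinct byte values, stopping once past the low range. */
static uint32_t byte_set_size(const uint32_t bucket[BUCKET_SIZE])
{
	uint32_t size = 0;

	for (uint32_t i = 0; i < BUCKET_SIZE; i++) {
		if (bucket[i] == 0)
			continue;
		size++;
		if (size > BYTE_SET_THRESHOLD)
			break; /* not in the easily compressible range */
	}
	return size;
}
```

A result at or below BYTE_SET_THRESHOLD marks the sample as easily compressible; anything above it means further analysis is needed.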

Signed-off-by: Timofey Titovets 
Reviewed-by: David Sterba 
[ update comments ]
Signed-off-by: David Sterba

Btrfs: heuristic: add detection of repeated data patterns

2017-11-01T19:45:36+00:00

Walk over the data sample and use memcmp to detect repeated patterns,
like zeros, but a bit more general.
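One simple way to use memcmp for this is sketched below: compare the first half of the sample with the second half. The function name and plain byte-buffer interface are illustrative assumptions. A run of zeros, or any pattern whose period divides half the sample size, matches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * True if the first half of the sample equals the second half.
 * For an odd size the last byte is ignored; sizes 0 and 1 compare
 * zero bytes and therefore report true.
 */
static bool sample_repeated_patterns(const unsigned char *sample, size_t size)
{
	const size_t half = size / 2;

	return memcmp(sample, sample + half, half) == 0;
}
```

This is deliberately cheap: it costs one memcmp over half the sample and catches repeating data that a plain "all zeros" check would miss.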

Signed-off-by: Timofey Titovets 
Reviewed-by: David Sterba 
[ minor coding style fixes ]
Signed-off-by: David Sterba

Btrfs: heuristic: implement sampling logic

2017-11-01T19:45:36+00:00

Copy sample data from the input data range to the sample buffer, then
count the occurrences of each byte value in that sample into the bucket.
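The sampling step can be sketched on a flat buffer as below. The sizes are assumptions: copy 16 bytes every 256 bytes, into a sample buffer of at most 8192 bytes. The struct and function names are illustrative, and the per-page mapping a kernel implementation would need is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAMPLING_READ_SIZE 16   /* bytes copied per step (assumed) */
#define SAMPLING_INTERVAL  256  /* stride between steps (assumed) */
#define BUCKET_SIZE        256
#define MAX_SAMPLE_SIZE    8192 /* assumed: 128 KiB * 16 / 256 */

struct sample_ws {
	uint8_t sample[MAX_SAMPLE_SIZE];
	size_t sample_size;
	uint32_t bucket[BUCKET_SIZE];
};

static void collect_sample(const uint8_t *data, size_t len,
			   struct sample_ws *ws)
{
	size_t pos = 0;

	/* Copy SAMPLING_READ_SIZE bytes every SAMPLING_INTERVAL bytes. */
	for (size_t i = 0; i + SAMPLING_READ_SIZE <= len &&
			   pos + SAMPLING_READ_SIZE <= MAX_SAMPLE_SIZE;
	     i += SAMPLING_INTERVAL) {
		memcpy(&ws->sample[pos], &data[i], SAMPLING_READ_SIZE);
		pos += SAMPLING_READ_SIZE;
	}
	ws->sample_size = pos;

	/* Count how often each byte value occurs in the sample. */
	memset(ws->bucket, 0, sizeof(ws->bucket));
	for (size_t i = 0; i < ws->sample_size; i++)
		ws->bucket[ws->sample[i]]++;
}
```

For a 1024-byte input this reads offsets 0, 256, 512 and 768, giving a 64-byte sample.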

Signed-off-by: Timofey Titovets 
[ minor comment updates ]
Signed-off-by: David Sterba

Btrfs: heuristic: add bucket and sample counters and other defines

2017-11-01T19:45:36+00:00

Add basic defines and structures for data sampling.

Added macros:
 - For future sampling algo
 - For bucket size

Heuristic workspace:
 - Add a bucket for storing byte value counters
 - Add a sample array for storing a partial copy of the input data range
 - Add a counter storing the current sample size
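A user-space sketch of what such defines and a heuristic workspace could look like. The constants (16-byte reads every 256 bytes, a 128 KiB maximum range as with BTRFS_MAX_UNCOMPRESSED) are assumptions here, and the alloc/free helpers are hypothetical additions for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define SAMPLING_READ_SIZE 16
#define SAMPLING_INTERVAL  256
#define BUCKET_SIZE        256
/* Largest range handled at once (assumed 128 KiB). */
#define MAX_UNCOMPRESSED   (128 * 1024)
#define MAX_SAMPLE_SIZE \
	(MAX_UNCOMPRESSED * SAMPLING_READ_SIZE / SAMPLING_INTERVAL)

struct bucket_item {
	uint32_t count; /* occurrences of one byte value */
};

struct heuristic_ws {
	uint8_t *sample;            /* partial copy of the input range */
	uint32_t sample_size;       /* bytes currently in sample */
	struct bucket_item *bucket; /* BUCKET_SIZE byte value counters */
};

static struct heuristic_ws *alloc_heuristic_ws(void)
{
	struct heuristic_ws *ws = calloc(1, sizeof(*ws));

	if (!ws)
		return NULL;
	ws->sample = malloc(MAX_SAMPLE_SIZE);
	ws->bucket = calloc(BUCKET_SIZE, sizeof(*ws->bucket));
	if (!ws->sample || !ws->bucket) {
		free(ws->sample);
		free(ws->bucket);
		free(ws);
		return NULL;
	}
	return ws;
}

static void free_heuristic_ws(struct heuristic_ws *ws)
{
	if (!ws)
		return;
	free(ws->sample);
	free(ws->bucket);
	free(ws);
}
```

With these values MAX_SAMPLE_SIZE works out to 8192 bytes, so the sample buffer stays small compared to the range it summarizes.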

Signed-off-by: Timofey Titovets 
Reviewed-by: David Sterba 
[ minor coding style fixes, comments updated ]
Signed-off-by: David Sterba

Btrfs: compression: separate heuristic/compression workspaces

2017-11-01T19:45:35+00:00

The compression heuristic is not itself a compression type. As the
current infrastructure provides workspaces per compression type, it's
difficult to just add a heuristic workspace.

Just refactor the code to support compression and heuristic workspaces
with maximum code sharing and minimal changes.
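A sketch of the code-sharing idea, under stated assumptions: one generic workspace list with a type-specific allocator, reused by a standalone heuristic list and by a per-type array of compression lists. All names (struct ws_node, find_workspace(), put_workspace(), the SKETCH_ enum) are hypothetical, and locking and limits on the number of workspaces are left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Embedded as the first member of every workspace kind. */
struct ws_node {
	struct ws_node *next;
};

/* Generic list: idle workspaces plus a type-specific constructor. */
struct workspace_list {
	struct ws_node *idle;
	struct ws_node *(*alloc_ws)(void);
};

/* Shared code: reuse an idle workspace or allocate a new one. */
static struct ws_node *find_workspace(struct workspace_list *wl)
{
	struct ws_node *ws = wl->idle;

	if (ws) {
		wl->idle = ws->next;
		return ws;
	}
	return wl->alloc_ws();
}

/* Shared code: return a workspace to its list for later reuse. */
static void put_workspace(struct workspace_list *wl, struct ws_node *ws)
{
	ws->next = wl->idle;
	wl->idle = ws;
}

struct heuristic_sketch_ws {
	struct ws_node node;
	unsigned int counts[256];
};

struct zlib_sketch_ws {
	struct ws_node node;
	unsigned char buf[4096];
};

static struct ws_node *alloc_heuristic_sketch(void)
{
	struct heuristic_sketch_ws *ws = calloc(1, sizeof(*ws));

	return ws ? &ws->node : NULL;
}

static struct ws_node *alloc_zlib_sketch(void)
{
	struct zlib_sketch_ws *ws = calloc(1, sizeof(*ws));

	return ws ? &ws->node : NULL;
}

enum { SKETCH_COMPRESS_ZLIB, SKETCH_COMPRESS_TYPES };

/* The heuristic gets its own list outside the per-type array. */
static struct workspace_list heuristic_wl = { NULL, alloc_heuristic_sketch };
static struct workspace_list comp_wl[SKETCH_COMPRESS_TYPES] = {
	[SKETCH_COMPRESS_ZLIB] = { NULL, alloc_zlib_sketch },
};
```

Only the allocators differ per kind. The find and put paths are written once and serve both the heuristic and every compression type.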

Signed-off-by: Timofey Titovets 
Reviewed-by: David Sterba 
[ coding style fixes ]
Signed-off-by: David Sterba

btrfs: allow setting zlib compression level via :9

2017-11-01T19:45:34+00:00

This is bikeshedding, but it seems people are drastically more likely
to understand "zlib:9" as a compression level, rather than an algorithm
version, than "zlib9".

Based on feedback on the mailing list, ":9" will be the only accepted
syntax. The level must be a single digit. An unrecognized format will
result in the default, for forward compatibility, in the same way the
compression algorithm specifier was relaxed in commit
a7164fa4e055daf6368c ("btrfs: prepare for extensions in compression
options").
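A user-space sketch of the tightened format: only "zlib:" followed by a single digit 1-9 and nothing else selects a level. Any other zlib spelling falls back to the default, and non-zlib algorithms get no level (0). The function name and the default of 3 are assumptions made for illustration.

```c
#include <string.h>

/* Assumed default level for illustration. */
#define ZLIB_DEFAULT_LEVEL 3u

/*
 * "zlib:1" .. "zlib:9" -> that level; other "zlib..." strings -> the
 * default (forward compatible); other algorithms -> 0 (no level).
 */
static unsigned int compress_str2level(const char *str)
{
	if (strncmp(str, "zlib", 4) != 0)
		return 0;
	/* Short-circuiting keeps every index within the string. */
	if (str[4] == ':' && str[5] >= '1' && str[5] <= '9' && str[6] == '\0')
		return (unsigned int)(str[5] - '0');
	return ZLIB_DEFAULT_LEVEL;
}
```

Under these rules "zlib9", "zlib:0" and "zlib:10" all fall back to the default rather than failing the mount.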

Signed-off-by: Adam Borowski 
Reviewed-by: David Sterba 
[ tighten the accepted format ]
Signed-off-by: David Sterba