linux-stable.git/drivers/block/zram, branch v6.17

zram: fix slot write race condition

2025-09-16T03:01:45+00:00

Parallel concurrent writes to the same zram index result in leaked
zsmalloc handles.  Schematically we can have something like this:

CPU0                              CPU1
zram_slot_lock()
zs_free(handle)
zram_slot_lock()
				zram_slot_lock()
				zs_free(handle)
				zram_slot_lock()

compress			compress
handle = zs_malloc()		handle = zs_malloc()
zram_slot_lock
zram_set_handle(handle)
zram_slot_lock
				zram_slot_lock
				zram_set_handle(handle)
				zram_slot_lock

Either CPU0 or CPU1 zsmalloc handle will leak because zs_free() is done
too early.  In fact, we need to reset zram entry right before we set its
new handle, all under the same slot lock scope.

Link: https://lkml.kernel.org/r/20250909045150.635345-1-senozhatsky@chromium.org
Fixes: 71268035f5d7 ("zram: free slot memory early during write")
Signed-off-by: Sergey Senozhatsky 
Reported-by: Changhui Zhong 
Closes: https://lore.kernel.org/all/CAGVVp+UtpGoW5WEdEU7uVTtsSCjPN=ksN6EcvyypAtFDOUf30A@mail.gmail.com/
Tested-by: Changhui Zhong 
Cc: Jens Axboe 
Cc: Minchan Kim 
Cc: 
Signed-off-by: Andrew Morton

zram: pass buffer offset to zcomp_available_show()

2025-07-04T01:56:51+00:00

In most cases zcomp_available_show() is the only emitting
function that is called from sysfs read() handler, so it
assumes that there is a whole PAGE_SIZE buffer to work with.
There is an exception, however: recomp_algorithm_show().

In recomp_algorithm_show() we prepend the buffer with
priority number before we pass it to zcomp_available_show(),
so it cannot assume PAGE_SIZE anymore and must take
recomp_algorithm_show() modifications into consideration.
Therefore we need to pass buffer offset to zcomp_available_show().

Also convert it to use sysfs_emit_at(), to stay aligned
with the rest of zram's sysfs read() handlers.

On practice we are never even close to using the whole PAGE_SIZE
buffer, so that's not a critical bug, but still.

Signed-off-by: Sergey Senozhatsky 
Link: https://lore.kernel.org/r/20250627071840.1394242-1-senozhatsky@chromium.org
Signed-off-by: Jens Axboe

block: zram: replace scnprintf() with sysfs_emit() in *_show() functions

2025-07-04T01:56:51+00:00

Replace scnprintf() with sysfs_emit() or sysfs_emit_at() in sysfs
*_show() functions in zram_drv.c to follow the kernel's guidelines
from Documentation/filesystems/sysfs.rst.

This improves consistency, safety, and makes the code easier to
maintain and update in the future.

Signed-off-by: Rahul Kumar 
Reviewed-by: Sergey Senozhatsky 
Link: https://lore.kernel.org/r/20250627035256.1120740-1-rk0006818@gmail.com
Signed-off-by: Jens Axboe

zram: support deflate-specific params

2025-06-01T05:46:07+00:00

Introduce support of algorithm specific parameters in algorithm_params
device attribute.  The expected format is algorithm.param=value.

For starters, add support for deflate.winbits parameter.

Link: https://lkml.kernel.org/r/20250514024825.1745489-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky 
Reviewed-by: Mikhail Zaslonko 
Cc: Minchan Kim 
Signed-off-by: Andrew Morton

zram: rename ZCOMP_PARAM_NO_LEVEL

2025-06-01T05:46:07+00:00

Patch series "zram: support algorithm-specific parameters".

This patchset adds support for algorithm-specific parameters.  For now,
only deflate-specific winbits can be configured, which fixes deflate
support on some s390 setups.


This patch (of 2):

Use more generic name because this will be default "un-set"
value for more params in the future.

Link: https://lkml.kernel.org/r/20250514024825.1745489-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20250514024825.1745489-2-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky 
Reviewed-by: Mikhail Zaslonko 
Cc: Minchan Kim 
Signed-off-by: Andrew Morton

zram: modernize writeback interface

2025-05-12T00:48:09+00:00

The writeback interface supports a page_index=N parameter which performs
writeback of the given page.  Since we rarely need to writeback just one
single page, the typical use case involves a number of writeback calls,
each performing writeback of one page:

  echo page_index=100 > zram0/writeback
  ...
  echo page_index=200 > zram0/writeback
  echo page_index=500 > zram0/writeback
  ...
  echo page_index=700 > zram0/writeback

One obvious downside of this is that it increases the number of syscalls. 
Less obvious, but a significantly more important downside, is that when
given only one page to post-process zram cannot perform an optimal target
selection.  This becomes a critical limitation when writeback_limit is
enabled, because under writeback_limit we want to guarantee the highest
memory savings hence we first need to writeback pages that release the
highest amount of zsmalloc pool memory.

This patch adds page_indexes=LOW-HIGH parameter to the writeback
interface:

  echo page_indexes=100-200 page_indexes=500-700 > zram0/writeback

This gives zram a chance to apply an optimal target selection strategy on
each iteration of the writeback loop.

We also now permit multiple page_index parameters per call (previously
zram would recognize only one page_index) and a mix or single pages and
page ranges:

  echo page_index=42 page_index=99 page_indexes=100-200 \
       page_indexes=500-700 > zram0/writeback

Apart from that the patch also unifies parameters passing and resembles
other "modern" zram device attributes (e.g.  recompression), while the old
interface used a mixed scheme: values-less parameters for mode and a
key=value format for page_index.  We still support the "old" value-less
format for compatibility reasons.

[senozhatsky@chromium.org: simplify parse_page_index() range checks, per Brian]
  nk: https://lkml.kernel.org/r/20250404015327.2427684-1-senozhatsky@chromium.org
[sozhatsky@chromium.org: fix uninitialized variable in zram_writeback_slots(), per Dan]
  nk: https://lkml.kernel.org/r/20250409112611.1154282-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20250327015818.4148660-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky 
Reviewed-by: Brian Geffon 
Cc: Minchan Kim 
Cc: Richard Chang 
Cc: Sergey Senozhatsky 
Cc: Dan Carpenter 
Signed-off-by: Andrew Morton

zsmalloc: prefer the the original page's node for compressed data

2025-05-12T00:48:06+00:00

Currently, zsmalloc, zswap's and zram's backend memory allocator, does not
enforce any policy for the allocation of memory for the compressed data,
instead just adopting the memory policy of the task entering reclaim, or
the default policy (prefer local node) if no such policy is specified. 
This can lead to several pathological behaviors in multi-node NUMA
systems:

1. Systems with CXL-based memory tiering can encounter the following
   inversion with zswap/zram: the coldest pages demoted to the CXL tier
   can return to the high tier when they are reclaimed to compressed swap,
   creating memory pressure on the high tier.

2. Consider a direct reclaimer scanning nodes in order of allocation
   preference.  If it ventures into remote nodes, the memory it compresses
   there should stay there.  Trying to shift those contents over to the
   reclaiming thread's preferred node further *increases* its local
   pressure, and provoking more spills.  The remote node is also the most
   likely to refault this data again.  This undesirable behavior was
   pointed out by Johannes Weiner in [1].

3. For zswap writeback, the zswap entries are organized in
   node-specific LRUs, based on the node placement of the original pages,
   allowing for targeted zswap writeback for specific nodes.

   However, the compressed data of a zswap entry can be placed on a
   different node from the LRU it is placed on.  This means that reclaim
   targeted at one node might not free up memory used for zswap entries in
   that node, but instead reclaiming memory in a different node.

All of these issues will be resolved if the compressed data go to the same
node as the original page.  This patch encourages this behavior by having
zswap and zram pass the node of the original page to zsmalloc, and have
zsmalloc prefer the specified node if we need to allocate new (zs)pages
for the compressed data.

Note that we are not strictly binding the allocation to the preferred
node.  We still allow the allocation to fall back to other nodes when the
preferred node is full, or if we have zspages with slots available on a
different node.  This is OK, and still a strict improvement over the
status quo:

1. On a system with demotion enabled, we will generally prefer
   demotions over compressed swapping, and only swap when pages have
   already gone to the lowest tier.  This patch should achieve the desired
   effect for the most part.

2. If the preferred node is out of memory, letting the compressed data
   going to other nodes can be better than the alternative (OOMs, keeping
   cold memory unreclaimed, disk swapping, etc.).

3. If the allocation go to a separate node because we have a zspage
   with slots available, at least we're not creating extra immediate
   memory pressure (since the space is already allocated).

3. While there can be mixings, we generally reclaim pages in same-node
   batches, which encourage zspage grouping that is more likely to go to
   the right node.

4. A strict binding would require partitioning zsmalloc by node, which
   is more complicated, and more prone to regression, since it reduces the
   storage density of zsmalloc.  We need to evaluate the tradeoff and
   benchmark carefully before adopting such an involved solution.

[1]: https://lore.kernel.org/linux-mm/20250331165306.GC2110528@cmpxchg.org/

[senozhatsky@chromium.org: coding-style fixes]
  Link: https://lkml.kernel.org/r/mnvexa7kseswglcqbhlot4zg3b3la2ypv2rimdl5mh5glbmhvz@wi6bgqn47hge
Link: https://lkml.kernel.org/r/20250402204416.3435994-1-nphamcs@gmail.com
Signed-off-by: Nhat Pham 
Suggested-by: Gregory Price 
Acked-by: Dan Williams 
Reviewed-by: Chengming Zhou 
Acked-by: Sergey Senozhatsky 	[zram, zsmalloc]
Acked-by: Johannes Weiner 
Acked-by: Yosry Ahmed 	[zswap/zsmalloc]
Cc: "Huang, Ying" 
Cc: Joanthan Cameron 
Cc: Minchan Kim 
Cc: SeongJae Park 
Signed-off-by: Andrew Morton

zram: add might_sleep to zcomp API

2025-03-17T05:06:37+00:00

Explicitly state that zcomp compress/decompress must be called from
non-atomic context.

Link: https://lkml.kernel.org/r/20250303022425.285971-20-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky 
Cc: Hillf Danton 
Cc: Kairui Song 
Cc: Minchan Kim 
Cc: Sebastian Andrzej Siewior 
Cc: Yosry Ahmed 
Signed-off-by: Andrew Morton

zram: do not leak page on writeback_store error path

2025-03-17T05:06:37+00:00

Ensure the page used for local object data is freed on error out path.

Link: https://lkml.kernel.org/r/20250303022425.285971-19-senozhatsky@chromium.org
Fixes: 330edc2bc059 (zram: rework writeback target selection strategy)
Signed-off-by: Sergey Senozhatsky 
Cc: Hillf Danton 
Cc: Kairui Song 
Cc: Minchan Kim 
Cc: Sebastian Andrzej Siewior 
Cc: Yosry Ahmed 
Signed-off-by: Andrew Morton

zram: do not leak page on recompress_store error path

2025-03-17T05:06:36+00:00

Ensure the page used for local object data is freed on error out path.

Link: https://lkml.kernel.org/r/20250303022425.285971-18-senozhatsky@chromium.org
Fixes: 3f909a60cec1 ("zram: rework recompress target selection strategy")
Signed-off-by: Sergey Senozhatsky 
Cc: Hillf Danton 
Cc: Kairui Song 
Cc: Minchan Kim 
Cc: Sebastian Andrzej Siewior 
Cc: Yosry Ahmed 
Signed-off-by: Andrew Morton