linux-stable.git/drivers/md, branch v6.2.9

dm crypt: avoid accessing uninitialized tasklet

2023-03-30T10:51:42+00:00

commit d9a02e016aaf5a57fb44e9a5e6da8ccd3b9e2e70 upstream.

When neither "no_read_workqueue" nor "no_write_workqueue" are enabled,
tasklet_trylock() in crypt_dec_pending() may still return false due to
an uninitialized state, and dm-crypt will unnecessarily do io completion
in io_queue workqueue instead of current context.

Fix this by adding an 'in_tasklet' flag to dm_crypt_io struct and
initialize it to false in crypt_io_init(). Set this flag to true in
kcryptd_queue_crypt() before calling tasklet_schedule(). If set
crypt_dec_pending() will punt io completion to a workqueue.

This also nicely avoids the tasklet_trylock/unlock hack when tasklets
aren't in use.

Fixes: 8e14f610159d ("dm crypt: do not call bio_endio() from the dm-crypt tasklet")
Cc: stable@vger.kernel.org
Reported-by: Hou Tao 
Suggested-by: Ignat Korchagin 
Reviewed-by: Ignat Korchagin 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

dm crypt: add cond_resched() to dmcrypt_write()

2023-03-30T10:51:42+00:00

commit fb294b1c0ba982144ca467a75e7d01ff26304e2b upstream.

The loop in dmcrypt_write may be running for unbounded amount of time,
thus we need cond_resched() in it.

This commit fixes the following warning:

[ 3391.153255][   C12] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [dmcrypt_write/2:2897]
...
[ 3391.387210][   C12] Call trace:
[ 3391.390338][   C12]  blk_attempt_bio_merge.part.6+0x38/0x158
[ 3391.395970][   C12]  blk_attempt_plug_merge+0xc0/0x1b0
[ 3391.401085][   C12]  blk_mq_submit_bio+0x398/0x550
[ 3391.405856][   C12]  submit_bio_noacct+0x308/0x380
[ 3391.410630][   C12]  dmcrypt_write+0x1e4/0x208 [dm_crypt]
[ 3391.416005][   C12]  kthread+0x130/0x138
[ 3391.419911][   C12]  ret_from_fork+0x10/0x18

Reported-by: yangerkun 
Fixes: dc2676210c42 ("dm crypt: offload writes to thread")
Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

dm stats: check for and propagate alloc_percpu failure

2023-03-30T10:51:42+00:00

commit d3aa3e060c4a80827eb801fc448debc9daa7c46b upstream.

Check alloc_precpu()'s return value and return an error from
dm_stats_init() if it fails. Update alloc_dev() to fail if
dm_stats_init() does.

Otherwise, a NULL pointer dereference will occur in dm_stats_cleanup()
even if dm-stats isn't being actively used.

Fixes: fd2ed4d25270 ("dm: add statistics support")
Cc: stable@vger.kernel.org
Signed-off-by: Jiasheng Jiang 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

dm thin: fix deadlock when swapping to thin device

2023-03-30T10:51:35+00:00

commit 9bbf5feecc7eab2c370496c1c161bbfe62084028 upstream.

This is an already known issue that dm-thin volume cannot be used as
swap, otherwise a deadlock may happen when dm-thin internal memory
demand triggers swap I/O on the dm-thin volume itself.

But thanks to commit a666e5c05e7c ("dm: fix deadlock when swapping to
encrypted device"), the limit_swap_bios target flag can also be used
for dm-thin to avoid the recursive I/O when it is used as swap.

Fix is to simply set ti->limit_swap_bios to true in both pool_ctr()
and thin_ctr().

In my test, I create a dm-thin volume /dev/vg/swap and use it as swap
device. Then I run fio on another dm-thin volume /dev/vg/main and use
large --blocksize to trigger swap I/O onto /dev/vg/swap.

The following fio command line is used in my test,
  fio --name recursive-swap-io --lockmem 1 --iodepth 128 \
     --ioengine libaio --filename /dev/vg/main --rw randrw \
    --blocksize 1M --numjobs 32 --time_based --runtime=12h

Without this fix, the whole system can be locked up within 15 seconds.

With this fix, there is no any deadlock or hung task observed after
2 hours of running fio.

Furthermore, if blocksize is changed from 1M to 128M, after around 30
seconds fio has no visible I/O, and the out-of-memory killer message
shows up in kernel message. After around 20 minutes all fio processes
are killed and the whole system is back to being alive.

This is exactly what is expected when recursive I/O happens on dm-thin
volume when it is used as swap.

Depends-on: a666e5c05e7c ("dm: fix deadlock when swapping to encrypted device")
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li 
Acked-by: Mikulas Patocka 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

md: select BLOCK_LEGACY_AUTOLOAD

2023-03-22T12:38:00+00:00

commit 6c0f5898836c05c6d850a750ed7940ba29e4e6c5 upstream.

When BLOCK_LEGACY_AUTOLOAD is not enable, mdadm is not able to
activate new arrays unless "CREATE names=yes" appears in
mdadm.conf

As this is a regression we need to always enable BLOCK_LEGACY_AUTOLOAD
for when MD is selected - at least until mdadm is updated and the
updates widely available.

Cc: stable@vger.kernel.org # v5.18+
Fixes: fbdee71bb5d8 ("block: deprecate autoloading based on dev_t")
Signed-off-by: NeilBrown 
Signed-off-by: Song Liu 
Signed-off-by: Greg Kroah-Hartman

block: count 'ios' and 'sectors' when io is done for bio-based device

2023-03-22T12:37:50+00:00

[ Upstream commit 5f27571382ca42daa3e3d40d1b252bf18c2b61d2 ]

While using iostat for raid, I observed very strange 'await'
occasionally, and turns out it's due to that 'ios' and 'sectors' is
counted in bdev_start_io_acct(), while 'nsecs' is counted in
bdev_end_io_acct(). I'm not sure why they are ccounted like that
but I think this behaviour is obviously wrong because user will get
wrong disk stats.

Fix the problem by counting 'ios' and 'sectors' when io is done, like
what rq-based device does.

Fixes: 394ffa503bc4 ("blk: introduce generic io stat accounting help function")
Signed-off-by: Yu Kuai 
Reviewed-by: Christoph Hellwig 
Link: https://lore.kernel.org/r/20230223091226.1135678-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

dm flakey: fix a bug with 32-bit highmem systems

2023-03-10T08:29:45+00:00

commit 8eb29c4fbf9661e6bd4dd86197a37ffe0ecc9d50 upstream.

The function page_address does not work with 32-bit systems with high
memory. Use bvec_kmap_local/kunmap_local instead.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka 
Reviewed-by: Sweet Tea Dorminy 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

dm flakey: don't corrupt the zero page

2023-03-10T08:29:44+00:00

commit f50714b57aecb6b3dc81d578e295f86d9c73f078 upstream.

When we need to zero some range on a block device, the function
__blkdev_issue_zero_pages submits a write bio with the bio vector pointing
to the zero page. If we use dm-flakey with corrupt bio writes option, it
will corrupt the content of the zero page which results in crashes of
various userspace programs. Glibc assumes that memory returned by mmap is
zeroed and it uses it for calloc implementation; if the newly mapped
memory is not zeroed, calloc will return non-zeroed memory.

Fix this bug by testing if the page is equal to ZERO_PAGE(0) and
avoiding the corruption in this case.

Cc: stable@vger.kernel.org
Fixes: a00f5276e266 ("dm flakey: Properly corrupt multi-page bios.")
Signed-off-by: Mikulas Patocka 
Reviewed-by: Sweet Tea Dorminy 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

dm cache: free background tracker's queued work in btracker_destroy

2023-03-10T08:29:44+00:00

commit 95ab80a8a0fef2ce0cc494a306dd283948066ce7 upstream.

Otherwise the kernel can BUG with:

[ 2245.426978] =============================================================================
[ 2245.435155] BUG bt_work (Tainted: G    B   W         ): Objects remaining in bt_work on __kmem_cache_shutdown()
[ 2245.445233] -----------------------------------------------------------------------------
[ 2245.445233]
[ 2245.454879] Slab 0x00000000b0ce2b30 objects=64 used=2 fp=0x000000000a3c6a4e flags=0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
[ 2245.467300] CPU: 7 PID: 10805 Comm: lvm Kdump: loaded Tainted: G    B   W          6.0.0-rc2 #19
[ 2245.476078] Hardware name: Dell Inc. PowerEdge R7525/0590KW, BIOS 2.5.6 10/06/2021
[ 2245.483646] Call Trace:
[ 2245.486100]  
[ 2245.488206]  dump_stack_lvl+0x34/0x48
[ 2245.491878]  slab_err+0x95/0xcd
[ 2245.495028]  __kmem_cache_shutdown.cold+0x31/0x136
[ 2245.499821]  kmem_cache_destroy+0x49/0x130
[ 2245.503928]  btracker_destroy+0x12/0x20 [dm_cache]
[ 2245.508728]  smq_destroy+0x15/0x60 [dm_cache_smq]
[ 2245.513435]  dm_cache_policy_destroy+0x12/0x20 [dm_cache]
[ 2245.518834]  destroy+0xc0/0x110 [dm_cache]
[ 2245.522933]  dm_table_destroy+0x5c/0x120 [dm_mod]
[ 2245.527649]  __dm_destroy+0x10e/0x1c0 [dm_mod]
[ 2245.532102]  dev_remove+0x117/0x190 [dm_mod]
[ 2245.536384]  ctl_ioctl+0x1a2/0x290 [dm_mod]
[ 2245.540579]  dm_ctl_ioctl+0xa/0x20 [dm_mod]
[ 2245.544773]  __x64_sys_ioctl+0x8a/0xc0
[ 2245.548524]  do_syscall_64+0x5c/0x90
[ 2245.552104]  ? syscall_exit_to_user_mode+0x12/0x30
[ 2245.556897]  ? do_syscall_64+0x69/0x90
[ 2245.560648]  ? do_syscall_64+0x69/0x90
[ 2245.564394]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 2245.569447] RIP: 0033:0x7fe52583ec6b
...
[ 2245.646771] ------------[ cut here ]------------
[ 2245.651395] kmem_cache_destroy bt_work: Slab cache still has objects when called from btracker_destroy+0x12/0x20 [dm_cache]
[ 2245.651408] WARNING: CPU: 7 PID: 10805 at mm/slab_common.c:478 kmem_cache_destroy+0x128/0x130

Found using: lvm2-testsuite --only "cache-single-split.sh"

Ben bisected and found that commit 0495e337b703 ("mm/slab_common:
Deleting kobject in kmem_cache_destroy() without holding
slab_mutex/cpu_hotplug_lock") first exposed dm-cache's incomplete
cleanup of its background tracker work objects.

Reported-by: Benjamin Marzinski 
Tested-by: Benjamin Marzinski 
Cc: stable@vger.kernel.org # 6.0+
Signed-off-by: Joe Thornber 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman

dm flakey: fix logic when corrupting a bio

2023-03-10T08:29:44+00:00

commit aa56b9b75996ff4c76a0a4181c2fa0206c3d91cc upstream.

If "corrupt_bio_byte" is set to corrupt reads and corrupt_bio_flags is
used, dm-flakey would erroneously return all writes as errors. Likewise,
if "corrupt_bio_byte" is set to corrupt writes, dm-flakey would return
errors for all reads.

Fix the logic so that if fc->corrupt_bio_byte is non-zero, dm-flakey
will not abort reads on writes with an error.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka 
Reviewed-by: Sweet Tea Dorminy 
Signed-off-by: Mike Snitzer 
Signed-off-by: Greg Kroah-Hartman