linux-stable.git/drivers/block, branch v6.18.27

rbd: fix null-ptr-deref when device_add_disk() fails

2026-05-07T04:11:41+00:00

commit d1fef92e414433ca7b89abf85cb0df42b8d475eb upstream.

do_rbd_add() publishes the device with device_add() before calling
device_add_disk(). If device_add_disk() fails after device_add()
succeeds, the error path calls rbd_free_disk() directly and then later
falls through to rbd_dev_device_release(), which calls rbd_free_disk()
again. This double teardown can leave blk-mq cleanup operating on
invalid state and trigger a null-ptr-deref in
__blk_mq_free_map_and_rqs(), reached from blk_mq_free_tag_set().

Fix this by following the normal remove ordering: call device_del()
before rbd_dev_device_release() when device_add_disk() fails after
device_add(). That keeps the teardown sequence consistent and avoids
re-entering disk cleanup through the wrong path.

The bug was first flagged by an experimental analysis tool we are
developing for kernel memory-management bugs while analyzing
v6.13-rc1. The tool is still under development and is not yet publicly
available.

We reproduced the bug on v7.0 with a real Ceph backend and a QEMU x86_64
guest booted with KASAN and CONFIG_FAILSLAB enabled. The reproducer
confines failslab injections to the __add_disk() range and injects
fail-nth while mapping an RBD image through
/sys/bus/rbd/add_single_major.

On the unpatched kernel, fail-nth=4 reliably triggered the fault:

	Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
	KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
	CPU: 0 UID: 0 PID: 273 Comm: bash Not tainted 7.0.0-01247-gd60bc1401583 #6 PREEMPT(lazy)
	Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
	RIP: 0010:__blk_mq_free_map_and_rqs+0x8c/0x240
	Code: 00 00 48 8b 6b 60 41 89 f4 49 c1 e4 03 4c 01 e5 45 85 ed 0f 85 0a 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 e9 48 c1 e9 03 <80> 3c 01 00 0f 85 31 01 00 00 4c 8b 6d 00 4d 85 ed 0f 84 e2 00 00
	RSP: 0018:ff1100000ab0fac8 EFLAGS: 00000246
	RAX: dffffc0000000000 RBX: ff1100000c4806a0 RCX: 0000000000000000
	RDX: 0000000000000002 RSI: 0000000000000000 RDI: ff1100000c4806f4
	RBP: 0000000000000000 R08: 0000000000000001 R09: ffe21c000189001b
	R10: ff1100000c4800df R11: ff1100006cf37be0 R12: 0000000000000000
	R13: 0000000000000000 R14: ff1100000c480700 R15: ff1100000c480004
	FS:  00007f0fbe8fe740(0000) GS:ff110000e5851000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00007fe53473b2e0 CR3: 0000000012eef000 CR4: 00000000007516f0
	PKRU: 55555554
	Call Trace:
	 
	 blk_mq_free_tag_set+0x77/0x460
	 do_rbd_add+0x1446/0x2b80
	 ? __pfx_do_rbd_add+0x10/0x10
	 ? lock_acquire+0x18c/0x300
	 ? find_held_lock+0x2b/0x80
	 ? sysfs_file_kobj+0xb6/0x1b0
	 ? __pfx_sysfs_kf_write+0x10/0x10
	 kernfs_fop_write_iter+0x2f4/0x4a0
	 vfs_write+0x98e/0x1000
	 ? expand_files+0x51f/0x850
	 ? __pfx_vfs_write+0x10/0x10
	 ksys_write+0xf2/0x1d0
	 ? __pfx_ksys_write+0x10/0x10
	 do_syscall_64+0x115/0x690
	 entry_SYSCALL_64_after_hwframe+0x77/0x7f
	RIP: 0033:0x7f0fbea15907
	Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
	RSP: 002b:00007ffe22346ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
	RAX: ffffffffffffffda RBX: 0000000000000058 RCX: 00007f0fbea15907
	RDX: 0000000000000058 RSI: 0000563ace6c0ef0 RDI: 0000000000000001
	RBP: 0000563ace6c0ef0 R08: 0000563ace6c0ef0 R09: 6b6435726d694141
	R10: 5250337279762f78 R11: 0000000000000246 R12: 0000000000000058
	R13: 00007f0fbeb1c780 R14: ff1100000c480700 R15: ff1100000c480004
	 

With this fix applied, rerunning the reproducer over fail-nth=1..256
yields no KASAN reports.

[ idryomov: rename err_out_device_del -> err_out_device ]

Cc: stable@vger.kernel.org
Fixes: 27c97abc30e2 ("rbd: add add_disk() error handling")
Signed-off-by: Zilin Guan 
Signed-off-by: Dawei Feng 
Reviewed-by: Ilya Dryomov 
Signed-off-by: Ilya Dryomov 
Signed-off-by: Greg Kroah-Hartman

zram: do not forget to endio for partial discard requests

2026-05-07T04:11:34+00:00

commit e3668b371329ea036ff022ce8ecc82f8befcf003 upstream.

As reported by Qu Wenruo and Avinesh Kumar, the following

 getconf PAGESIZE
 65536
 blkdiscard -p 4k /dev/zram0

takes literally forever to complete.  zram doesn't support partial
discards and just returns immediately w/o doing any discard work in such
cases.  The problem is that we forget to endio on our way out, so
blkdiscard sleeps forever in submit_bio_wait().  Fix this by jumping to
end_bio label, which does bio_endio().

Link: https://lore.kernel.org/20260331074255.777019-1-senozhatsky@chromium.org
Fixes: 0120dd6e4e20 ("zram: make zram_bio_discard more self-contained")
Signed-off-by: Sergey Senozhatsky 
Reported-by: Qu Wenruo 
Closes: https://lore.kernel.org/linux-block/92361cd3-fb8b-482e-bc89-15ff1acb9a59@suse.com
Tested-by: Qu Wenruo 
Reported-by: Avinesh Kumar 
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1256530
Reviewed-by: Christoph Hellwig 
Cc: Brian Geffon 
Cc: Jens Axboe 
Cc: Minchan Kim 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Greg Kroah-Hartman

ublk: fix NULL pointer dereference in ublk_ctrl_set_size()

2026-03-25T10:10:32+00:00

[ Upstream commit 25966fc097691e5c925ad080f64a2f19c5fd940a ]

ublk_ctrl_set_size() unconditionally dereferences ub->ub_disk via
set_capacity_and_notify() without checking if it is NULL.

ub->ub_disk is NULL before UBLK_CMD_START_DEV completes (it is only
assigned in ublk_ctrl_start_dev()) and after UBLK_CMD_STOP_DEV runs
(ublk_detach_disk() sets it to NULL). Since the UBLK_CMD_UPDATE_SIZE
handler performs no state validation, a user can trigger a NULL pointer
dereference by sending UPDATE_SIZE to a device that has been added but
not yet started, or one that has been stopped.

Fix this by checking ub->ub_disk under ub->mutex before dereferencing
it, and returning -ENODEV if the disk is not available.

Fixes: 98b995660bff ("ublk: Add UBLK_U_CMD_UPDATE_SIZE")
Cc: stable@vger.kernel.org
Signed-off-by: Mehul Rao 
Reviewed-by: Ming Lei 
Signed-off-by: Jens Axboe 
[ adapted `&header` to `header` ]
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

drbd: fix null-pointer dereference on local read error

2026-03-12T11:09:42+00:00

commit 0d195d3b205ca90db30d70d09d7bb6909aac178f upstream.

In drbd_request_endio(), READ_COMPLETED_WITH_ERROR is passed to
__req_mod() with a NULL peer_device:

  __req_mod(req, what, NULL, &m);

The READ_COMPLETED_WITH_ERROR handler then unconditionally passes this
NULL peer_device to drbd_set_out_of_sync(), which dereferences it,
causing a null-pointer dereference.

Fix this by obtaining the peer_device via first_peer_device(device),
matching how drbd_req_destroy() handles the same situation.

Cc: stable@vger.kernel.org
Reported-by: Tuo Li 
Link: https://lore.kernel.org/linux-block/20260104165355.151864-1-islituo@gmail.com
Signed-off-by: Christoph Böhmwalder 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

drbd: fix "LOGIC BUG" in drbd_al_begin_io_nonblock()

2026-03-12T11:09:42+00:00

commit ab140365fb62c0bdab22b2f516aff563b2559e3b upstream.

Even though we check that we "should" be able to do lc_get_cumulative()
while holding the device->al_lock spinlock, it may still fail,
if some other code path decided to do lc_try_lock() with bad timing.

If that happened, we logged "LOGIC BUG for enr=...",
but still did not return an error.

The rest of the code now assumed that this request has references
for the relevant activity log extents.

The implcations are that during an active resync, mutual exclusivity of
resync versus application IO is not guaranteed. And a potential crash
at this point may not realizs that these extents could have been target
of in-flight IO and would need to be resynced just in case.

Also, once the request completes, it will give up activity log references it
does not even hold, which will trigger a BUG_ON(refcnt == 0) in lc_put().

Fix:

Do not crash the kernel for a condition that is harmless during normal
operation: also catch "e->refcnt == 0", not only "e == NULL"
when being noisy about "al_complete_io() called on inactive extent %u\n".

And do not try to be smart and "guess" whether something will work, then
be surprised when it does not.
Deal with the fact that it may or may not work.  If it does not, remember a
possible "partially in activity log" state (only possible for requests that
cross extent boundaries), and return an error code from
drbd_al_begin_io_nonblock().

A latter call for the same request will then resume from where we left off.

Cc: stable@vger.kernel.org
Signed-off-by: Lars Ellenberg 
Signed-off-by: Christoph Böhmwalder 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

zloop: check for spurious options passed to remove

2026-03-12T11:09:16+00:00

[ Upstream commit 3c4617117a2b7682cf037be5e5533e379707f050 ]

Zloop uses a command option parser for all control commands,
but most options are only valid for adding a new device.  Check
for incorrectly specified options in the remove handler.

Fixes: eb0570c7df23 ("block: new zoned loop block device driver")
Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

zloop: advertise a volatile write cache

2026-03-12T11:09:16+00:00

[ Upstream commit 6acf7860dcc79ed045cc9e6a79c8a8bb6959dba7 ]

Zloop is file system backed and thus needs to sync the underlying file
system to persist data.  Set BLK_FEAT_WRITE_CACHE so that the block
layer actually send flush commands, and fix the flush implementation
as sync_filesystem requires s_umount to be held and the code currently
misses that.

Fixes: eb0570c7df23 ("block: new zoned loop block device driver")
Signed-off-by: Christoph Hellwig 
Reviewed-by: Damien Le Moal 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

rnbd-srv: Zero the rsp buffer before using it

2026-03-04T12:19:34+00:00

[ Upstream commit 69d26698e4fd44935510553809007151b2fe4db5 ]

Before using the data buffer to send back the response message, zero it
completely. This prevents any stray bytes to be picked up by the client
side when there the message is exchanged between different protocol
versions.

Signed-off-by: Md Haris Iqbal 
Signed-off-by: Jack Wang 
Signed-off-by: Grzegorz Prajsner 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

drbd: always set BLK_FEAT_STABLE_WRITES

2026-02-26T22:59:36+00:00

[ Upstream commit 2ebc8d600fb907fa6b1e7095c0b6d84fc47e91ea ]

DRBD requires stable pages because it may read the same bio data
multiple times for local disk I/O and network transmission, and in
some cases for calculating checksums.

The BLK_FEAT_STABLE_WRITES flag is set when the device is first
created, but blk_set_stacking_limits() clears it whenever a
backing device is attached. In some cases the flag may be
inherited from the backing device, but we want it to be enabled
at all times.

Unconditionally re-enable BLK_FEAT_STABLE_WRITES in
drbd_reconsider_queue_parameters() after the queue parameter
negotiations.

Also, document why we want this flag enabled in the first place.

Fixes: 1a02f3a73f8c ("block: move the stable_writes flag to queue_limits")
Signed-off-by: Christoph Böhmwalder 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

ublk: Validate SQE128 flag before accessing the cmd

2026-02-26T22:59:01+00:00

[ Upstream commit da7e4b75e50c087d2031a92f6646eb90f7045a67 ]

ublk_ctrl_cmd_dump() accesses (header *)sqe->cmd before
IO_URING_F_SQE128 flag check. This could cause out of boundary memory
access.

Move the SQE128 flag check earlier in ublk_ctrl_uring_cmd() to return
-EINVAL immediately if the flag is not set.

Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Signed-off-by: Govindarajulu Varadarajan 
Reviewed-by: Caleb Sander Mateos 
Reviewed-by: Ming Lei 
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin