| author | Achkinazi, Igor <Igor.Achkinazi@dell.com> | 2026-05-28 15:24:27 +0000 |
|---|---|---|
| committer | Keith Busch <kbusch@kernel.org> | 2026-06-02 03:23:05 -0700 |
| commit | 88bac2c1a72b8f4f71e9845699aa872df04e5850 (patch) | |
| tree | 5c9c581656c2a1aafd1a83838e4a07cc71ab57b2 | |
| parent | 4cf06977bdb6a037e2717b4117f3fd636f6e9641 (diff) | |
nvme-multipath: set BIO_REMAPPED on bios remapped to per-path namespace disks

When nvme_ns_head_submit_bio() remaps a bio from the multipath head to a
per-path namespace, bio_set_dev() clears BIO_REMAPPED. The remapped bio
is then resubmitted through submit_bio_noacct() which calls
bio_check_eod() because BIO_REMAPPED is not set.

This races with nvme_ns_remove() which zeroes the per-path capacity
before synchronize_srcu():

  CPU 0 (IO submission)                 CPU 1 (namespace removal)
  ---------------------                 -------------------------
  srcu_read_lock()
  nvme_find_path() -> ns
    [NVME_NS_READY is set]
                                        clear_bit(NVME_NS_READY)
                                        set_capacity(ns->disk, 0)
                                        synchronize_srcu() <- blocks
  bio_set_dev(bio, ns->disk->part0)
    [clears BIO_REMAPPED]
  submit_bio_noacct(bio)
    -> bio_check_eod() sees capacity=0
    -> bio fails with IO error

The SRCU read lock prevents synchronize_srcu() from completing, but does
not prevent set_capacity(0) from executing. The bio fails the EOD check
before it reaches the NVMe driver, so nvme_failover_req() never gets a
chance to redirect it to another path. IO errors are reported to the
application even though another path is available.
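
The failure mode above can be replayed in a small userspace model. This
is only a sketch, not kernel code: every model_* type and helper is a
hypothetical stand-in that borrows a kernel name for readability, the EOD
comparison is simplified, and the two CPUs are run sequentially in the
order shown in the diagram. model_submit_after_remap(0) returns -EIO,
while a non-zero capacity that covers the bio returns 0.

```c
/* Userspace model only. NOT kernel code; all names are illustrative. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum { MODEL_BIO_REMAPPED = 1u << 0 };

struct model_disk {
	uint64_t capacity;	/* sectors; 0 after set_capacity(disk, 0) */
};

struct model_bio {
	struct model_disk *disk;
	uint64_t sector;
	uint32_t nr_sectors;
	unsigned int flags;
};

/* Stand-in for bio_set_dev(): retarget the bio, drop the remapped flag. */
static void model_bio_set_dev(struct model_bio *bio, struct model_disk *disk)
{
	assert(bio && disk);
	bio->disk = disk;
	bio->flags &= ~MODEL_BIO_REMAPPED;
}

/* Stand-in for bio_check_eod(): true if the bio runs past end of device. */
static bool model_bio_check_eod(const struct model_bio *bio)
{
	uint64_t max = bio->disk->capacity;

	return bio->nr_sectors &&
	       (bio->nr_sectors > max || bio->sector > max - bio->nr_sectors);
}

/* Stand-in for submit_bio_noacct(): the EOD check is gated by the flag. */
static int model_submit_bio_noacct(const struct model_bio *bio)
{
	if (!(bio->flags & MODEL_BIO_REMAPPED) && model_bio_check_eod(bio))
		return -EIO;
	return 0;
}

/* Replay the interleaving without the fix. */
static int model_submit_after_remap(uint64_t capacity_at_submit)
{
	struct model_disk path = { .capacity = 1024 };
	struct model_bio bio = { .sector = 8, .nr_sectors = 8 };
	struct model_disk *ns;

	ns = &path;				/* CPU 0: nvme_find_path() */
	path.capacity = capacity_at_submit;	/* CPU 1: set_capacity() */
	model_bio_set_dev(&bio, ns);		/* CPU 0: clears the flag */
	return model_submit_bio_noacct(&bio);	/* CPU 0: EOD check runs */
}
```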
On older kernels (before commit 0b64682e78f7 "block: skip unnecessary
checks for split bio"), the same race was also reachable through split
remainders resubmitted via submit_bio_noacct().

Fix this by setting BIO_REMAPPED after bio_set_dev() in
nvme_ns_head_submit_bio(). This skips bio_check_eod() on the per-path
device; the EOD check already passed on the multipath head.

NVMe per-path namespace devices are always whole disks (bd_partno=0), so
skipping blk_partition_remap(), which BIO_REMAPPED also gates, is a no-op.

The flag does not persist across failover and cannot go stale if the
namespace geometry changes between attempts: nvme_failover_req() calls
bio_set_dev() to redirect the bio back to the multipath head, which
clears BIO_REMAPPED. When nvme_requeue_work() resubmits through
submit_bio_noacct(), bio_check_eod() runs normally against the current
capacity.
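
The fix and the flag's lifetime across failover can be sketched with a
similar userspace model. Again this is an illustration under assumptions,
not the patch itself: the model_* names are hypothetical, only the flag
handling is modelled, and the real nvme_failover_req() and
nvme_requeue_work() do considerably more than shown here.

```c
/* Userspace model only. NOT kernel code; all names are illustrative. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum { MODEL_BIO_REMAPPED = 1u << 0 };

struct model_disk {
	uint64_t capacity;	/* sectors */
};

struct model_bio {
	struct model_disk *disk;
	uint64_t sector;
	uint32_t nr_sectors;
	unsigned int flags;
};

/* Stand-in for bio_set_dev(): retarget the bio, drop the remapped flag. */
static void model_bio_set_dev(struct model_bio *bio, struct model_disk *disk)
{
	assert(bio && disk);
	bio->disk = disk;
	bio->flags &= ~MODEL_BIO_REMAPPED;
}

/* Stand-in for bio_check_eod(): true if the bio runs past end of device. */
static bool model_bio_check_eod(const struct model_bio *bio)
{
	uint64_t max = bio->disk->capacity;

	return bio->nr_sectors &&
	       (bio->nr_sectors > max || bio->sector > max - bio->nr_sectors);
}

/* Stand-in for submit_bio_noacct(): the EOD check is gated by the flag. */
static int model_submit_bio_noacct(const struct model_bio *bio)
{
	if (!(bio->flags & MODEL_BIO_REMAPPED) && model_bio_check_eod(bio))
		return -EIO;
	return 0;
}

/* Head submission with the fix: set the flag after retargeting. */
static int model_head_submit_bio(struct model_bio *bio,
				 struct model_disk *path)
{
	model_bio_set_dev(bio, path);
	bio->flags |= MODEL_BIO_REMAPPED;	/* the one-line fix */
	return model_submit_bio_noacct(bio);
}

/* Failover: pointing the bio back at the head clears the flag again. */
static void model_failover(struct model_bio *bio, struct model_disk *head)
{
	model_bio_set_dev(bio, head);
}
```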
Same approach as commit 3a905c37c351 ("block: skip bio_check_eod for
partition-remapped bios").

Fixes: a7c7f7b2b641 ("nvme: use bio_set_dev to assign ->bi_bdev")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Igor Achkinazi <igor.achkinazi@dell.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
