linux.git/drivers/nvme/target, branch master

Merge tag 'nvme-7.2-2026-06-23' of git://git.infradead.org/nvme into block-7.2

2026-06-23T15:05:44+00:00

Pull NVMe fixes from Keith:

"- Apple A11 quirk for sharing tags across admin and IO queues (Nick)
 - Target fix for short AUTH_RECEIVE buffers (Michael)
 - Target fix for SQ refcount leak (Wentao)
 - Target RDMA handling inline data with nonzero offset (Bryam)
 - Target TCP fix handling the TCP_CLOSING state (Maurizio)
 - FC abort fixes in early initialization (Mohamed)
 - Controller device teardown fixes (Maurizio, John)
 - Allocate the target ana_state with the port (Rosen)
 - Quieten sparse and sysfs symbol warnings (John)"

* tag 'nvme-7.2-2026-06-23' of git://git.infradead.org/nvme:
  nvmet-tcp: handle TCP_CLOSING state in nvmet_tcp_state_change
  nvmet-auth: reject short AUTH_RECEIVE buffers
  nvme-fc: Do not cancel requests in io target before it is initialized
  nvme: make nvme_add_ns{_head}_cdev return void
  nvme: make some sysfs diagnostic structures static
  nvmet-rdma: handle inline data with a nonzero offset
  nvme: target: allocate ana_state with port
  nvme: fix crash and memory leak during invalid cdev teardown
  nvmet: fix refcount leak in nvmet_sq_create()
  nvme: quieten sparse warning in valid LBA size check
  nvme-apple: Prevent shared tags across queues on Apple A11

Merge tag 'for-7.2/block-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

2026-06-16T07:32:47+00:00

Pull block updates from Jens Axboe:

 - NVMe pull request via Keith:
     - Per-controller admin and IO timeout sysfs attributes, and
       letting the block layer set request timeouts (Maurizio,
       Maximilian)
     - Multipath passthrough iostats, and PCI P2PDMA enablement for
       multipath devices (Keith, Kiran)
     - A new diag sysfs attribute group exporting per-controller
       counters (retries, multipath failover, error counters, requeue
       and failure counts, reset and reconnect events) (Nilay)
     - FDP configuration validation and bounds check fixes (liuxixin)
     - Various nvmet fixes, including a pre-auth out-of-bounds read in
       the Discovery Get Log Page handler, auth payload bounds
       validation, and tcp error-path leak fixes (Bryam, Tianchu,
       Geliang)
     - nvme-tcp lockdep and workqueue fixes (Shin'ichiro, Kuniyuki,
       Eric)
     - Assorted other fixes and cleanups (John, Yao, Chao, Mateusz,
       Achkinazi, Wentao)

 - MD pull request via Yu Kuai:
     - raid1/raid10 fixes for a deadlock in the read error recovery
       path, error-path detection and bio accounting with cloned bios,
       and an nr_pending leak in the REQ_ATOMIC bad-block error path
       (Abd-Alrhman)
     - PCI P2PDMA propagation from member devices to the RAID device
       (Kiran)
     - dm-raid bio requeue fix, and various smaller fixes and cleanups
       (Benjamin, Chen, Li, Thorsten)

 - Enable Clang lock context analysis for the block layer, with the
   accompanying annotations across queue limits, the blk_holder_ops
   callbacks, crypto, cgroup, iocost, kyber and mq-deadline (Bart)

 - Block status code infrastructure work: a tagged status table, a
   str_to_blk_op() helper, a bio_endio_status() helper, and on top of
   that a new configurable block-layer error injection facility
   (Christoph)

 - DRBD netlink rework, replacing the genl_magic machinery with explicit
   netlink serialization and moving the DRBD UAPI headers to
   include/uapi/linux/ (Christoph Böhmwalder)

 - bvec improvements: a bvec_folio() helper and making the bvec_iter
   helpers proper inline functions (Willy, Christoph)

 - ublk cleanups and a canceling-flag fix for the disk-not-allocated
   case (Caleb, Ming)

 - Partition handling fixes: bound the AIX pp_count scan, fix an of_node
   refcount leak, and replace __get_free_page() with kmalloc() (Bryam,
   Wentao, Mike)

 - Convert numa_node to int in blk_mq_hw_ctx and ->init_request, and add
   WQ_PERCPU to the block workqueue users (Mateusz, Marco)

 - Block statistics and tracing: propagate in-flight to the whole disk
   on partition IO, export passthrough stats, and a new
   block_rq_tag_wait tracepoint (Tang, Keith, Aaron)

 - A round of removals, unexports and cleanups across bio, direct-io and
   the bvec helpers (Christoph)

 - Various driver fixes (mtip32xx use-after-free, rbd snap_count
   validation and strscpy conversion, nbd socket lockdep reclassify,
   virtio-blk zone report clamp, floppy) and a batch of MAINTAINERS
   email/list updates (Coly, Li, Yu, Christoph Böhmwalder)

 - Other little fixes and cleanups all over

* tag 'for-7.2/block-20260615' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (117 commits)
  MAINTAINERS: Update Coly Li's email address
  block: check bio split for unaligned bvec
  nbd: Reclassify sockets to avoid lockdep circular dependency
  block: add configurable error injection
  block: add a str_to_blk_op helper
  block: add a "tag" for block status codes
  block: add a macro to initialize the status table
  floppy: Drop unused pnp driver data
  block: propagate in_flight to whole disk on partition I/O
  virtio-blk: clamp zone report to the report buffer capacity
  block: optimize I/O merge hot path with unlikely() hints
  drivers/block/rbd: Use strscpy() to copy strings into arrays
  partitions: aix: bound the pp_count scan to the ppe array
  block: Enable lock context analysis
  block/mq-deadline: Make the lock context annotations compatible with Clang
  block/Kyber: Make the lock context annotations compatible with Clang
  block/blk-mq-debugfs: Improve lock context annotations
  block/blk-iocost: Inline iocg_lock() and iocg_unlock()
  block/blk-iocost: Split ioc_rqos_throttle()
  block/crypto: Annotate the crypto functions
  ...

nvmet-tcp: handle TCP_CLOSING state in nvmet_tcp_state_change

2026-06-10T15:14:19+00:00

When an NVMe/TCP connection shuts down, the underlying TCP socket can
enter the TCP_CLOSING state (state 11).  Currently, the
nvmet_tcp_state_change() callback does not explicitly handle this state,
which results in harmless but noisy kernel warnings:

nvmet_tcp: queue 2 unhandled state 11

Add TCP_CLOSING to the switch statement alongside TCP_FIN_WAIT2 and
TCP_LAST_ACK to silently ignore the state transition.
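
A minimal user-space sketch of the resulting switch follows. The state
numbers mirror the kernel's TCP state enum (TCP_CLOSING is 11, matching
the warning above); ex_state_change() and the ex_action values are
hypothetical stand-ins for the real callback, and the set of states that
trigger a queue release is an assumption, since this log does not show it.

```c
/* Values mirror the kernel's TCP state numbering. */
enum {
	TCP_ESTABLISHED = 1,
	TCP_SYN_SENT,
	TCP_SYN_RECV,
	TCP_FIN_WAIT1,
	TCP_FIN_WAIT2,
	TCP_TIME_WAIT,
	TCP_CLOSE,
	TCP_CLOSE_WAIT,
	TCP_LAST_ACK,
	TCP_LISTEN,
	TCP_CLOSING,	/* 11 */
};

/* Hypothetical result codes standing in for the callback's side effects. */
enum ex_action { EX_IGNORE, EX_RELEASE_QUEUE, EX_WARN_UNHANDLED };

static enum ex_action ex_state_change(int sk_state)
{
	switch (sk_state) {
	case TCP_FIN_WAIT2:
	case TCP_LAST_ACK:
	case TCP_CLOSING:	/* newly listed: ignored without a warning */
		return EX_IGNORE;
	case TCP_FIN_WAIT1:	/* assumed release states, not taken from the log */
	case TCP_CLOSE_WAIT:
	case TCP_CLOSE:
		return EX_RELEASE_QUEUE;
	default:
		return EX_WARN_UNHANDLED;	/* "unhandled state %d" */
	}
}
```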

Signed-off-by: Maurizio Lombardi 
Signed-off-by: Keith Busch

nvmet-auth: reject short AUTH_RECEIVE buffers

2026-06-10T14:43:15+00:00

nvmet_execute_auth_receive() trusts the AUTH_RECEIVE allocation length
after checking only that it is nonzero and matches the transfer length.
In the SUCCESS1 and FAILURE1/default states, that lets a remote NVMe-oF
initiator reach the fixed-size DH-HMAC-CHAP response builders with a
kmalloc() buffer shorter than the response, so nvmet_auth_success1() and
nvmet_auth_failure1() write past the allocation; both only WARN_ON the
short length and then format the message anyway.

Impact: A remote NVMe-oF initiator with access to an auth-enabled target
can trigger a 16-byte heap out-of-bounds write via a one-byte
AUTH_RECEIVE allocation length.

Compute the minimum response length for the current DH-HMAC-CHAP step in
nvmet_auth_receive_data_len() and report a zero data length when the
host-supplied allocation length is shorter, so the existing zero-length
check in nvmet_execute_auth_receive() rejects the command before any
builder runs. The SUCCESS1 minimum is sizeof(struct
nvmf_auth_dhchap_success1_data) plus the HMAC hash length, because the
response hash is written into the rval[] flexible-array tail, so the
minimum is state dependent rather than a flat sizeof. CHALLENGE keeps its
existing variable-length guard in nvmet_auth_challenge().
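
The length check described above can be sketched in user-space C as
follows. The struct layouts and their sizes are placeholders, not the real
nvmf_auth_dhchap_* definitions, and ex_auth_receive_data_len() is a
hypothetical stand-in for nvmet_auth_receive_data_len(); only the shape of
the logic (a state-dependent minimum, with zero reported for short buffers)
is taken from the description.

```c
#include <stddef.h>
#include <stdint.h>

enum ex_step { EX_CHALLENGE, EX_SUCCESS1, EX_FAILURE1 };

/* Placeholder layouts; the real structures live in the kernel headers. */
struct ex_success1 {
	uint8_t hdr[16];	/* fixed part (size is an assumption) */
	uint8_t rval[];		/* response hash is written into this tail */
};

struct ex_failure1 {
	uint8_t hdr[8];		/* fixed part (size is an assumption) */
};

/*
 * Return the usable data length for an AUTH_RECEIVE command, or 0 when the
 * host-supplied allocation length cannot hold the response for this step.
 * A caller that already rejects a zero length then rejects the command.
 */
static uint32_t ex_auth_receive_data_len(enum ex_step step, uint32_t al,
					 uint32_t hash_len)
{
	size_t min_len;

	switch (step) {
	case EX_SUCCESS1:
		/* Fixed header plus the hash stored in rval[]. */
		min_len = sizeof(struct ex_success1) + hash_len;
		break;
	case EX_CHALLENGE:
		/* Variable-length; guarded separately by the builder. */
		min_len = 0;
		break;
	default:
		min_len = sizeof(struct ex_failure1);
		break;
	}

	return al < min_len ? 0 : al;
}
```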

This is reachable only when in-band DH-HMAC-CHAP authentication is
configured on the target.

Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication")
Cc: stable@vger.kernel.org
Assisted-by: Codex:gpt-5-5-xhigh
Assisted-by: Claude:claude-opus-4-8
Reviewed-by: Hannes Reinecke 
Signed-off-by: Michael Bommarito 
Signed-off-by: Keith Busch

nvmet-rdma: handle inline data with a nonzero offset

2026-06-09T21:53:00+00:00

nvmet_rdma_use_inline_sg() maps the host-controlled inline data offset
into the per-command inline scatterlist.  The bounds check admits any
offset with off + len <= inline_data_size, but the mapping still assumes
the data begins in the first inline page:

	sg->offset = off;
	sg->length = min_t(int, len, PAGE_SIZE - off);

When a port is configured with inline_data_size > PAGE_SIZE (settable up
to max(SZ_16K, PAGE_SIZE)), an offset in (PAGE_SIZE, inline_data_size]
makes "PAGE_SIZE - off" underflow, so sg->length is set to ~4 GiB and
the block backend reads far past the first inline page.  num_pages(len)
also ignores the offset, so an in-bounds offset whose [off, off+len)
span crosses a page boundary under-counts the scatterlist.

Map the offset properly: split it into a page index and an in-page
offset, start the scatterlist at that page, and size the page count from
page_off + len.  Because the request scatterlist may now start at
inline_sg[page_idx] rather than inline_sg[0], generalize the inline-SGL
identity test in nvmet_rdma_release_rsp() to a range test; otherwise the
persistent inline scatterlist is mistaken for an allocated one and
nvmet_req_free_sgls() frees an inline page (and warns in
free_large_kmalloc()).
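
The arithmetic behind the new mapping can be sketched in user-space C as
follows. EX_PAGE_SIZE, struct ex_inline_map, struct ex_sg and both helpers
are hypothetical names for illustration; the actual patch operates on the
kernel scatterlist and is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_PAGE_SIZE 4096u	/* assumed page size for the example */

struct ex_inline_map {
	uint32_t page_idx;	/* inline page where the data starts */
	uint32_t page_off;	/* offset inside that page */
	uint32_t first_len;	/* bytes mapped from the first page */
	uint32_t nr_pages;	/* pages covered by [off, off + len) */
};

struct ex_sg { int unused; };	/* stand-in for a scatterlist entry */

static bool ex_map_inline(uint64_t off, uint32_t len,
			  uint32_t inline_data_size,
			  struct ex_inline_map *m)
{
	uint32_t room;

	/* Same bound as before: off + len <= inline_data_size. */
	if (off > inline_data_size || len > inline_data_size - off)
		return false;

	m->page_idx = (uint32_t)(off / EX_PAGE_SIZE);
	m->page_off = (uint32_t)(off % EX_PAGE_SIZE);

	/* page_off < EX_PAGE_SIZE, so this subtraction cannot underflow. */
	room = EX_PAGE_SIZE - m->page_off;
	m->first_len = len < room ? len : room;

	/* Count pages from page_off + len, not from len alone. */
	m->nr_pages = (m->page_off + len + EX_PAGE_SIZE - 1) / EX_PAGE_SIZE;
	return true;
}

/* Range test: does sg point anywhere inside the persistent inline SGL? */
static bool ex_is_inline_sg(const struct ex_sg *sg,
			    const struct ex_sg *inline_sg, size_t nr)
{
	return sg >= inline_sg && sg < inline_sg + nr;
}
```

For example, with a 16 KiB inline size, off = 5000 and len = 100 map to
page index 1 at in-page offset 904, whereas the old "PAGE_SIZE - off"
expression would have gone negative for the same input.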

Fixes: 0d5ee2b2ab4f ("nvmet-rdma: support max(16KB, PAGE_SIZE) inline data")
Cc: stable@vger.kernel.org
Suggested-by: Keith Busch 
Reported-by: Bryam Vargas 
Signed-off-by: Bryam Vargas 
Signed-off-by: Keith Busch

nvme: target: allocate ana_state with port

2026-06-09T18:18:03+00:00

Make ana_state a flexible array member of the port structure so that it is
allocated together with the port. This removes one allocation and
simplifies the code slightly.
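
As a rough user-space illustration of the pattern, with struct ex_port and
ex_port_alloc() as hypothetical names rather than the real nvmet
definitions, a single allocation can cover both the structure and its
trailing array:

```c
#include <stdlib.h>

struct ex_port {
	unsigned int nr_groups;
	unsigned char ana_state[];	/* flexible array member, sized at allocation */
};

/* One zeroed allocation covers the struct and its ana_state[] tail. */
static struct ex_port *ex_port_alloc(unsigned int nr_groups)
{
	struct ex_port *port;

	port = calloc(1, sizeof(*port) +
			 nr_groups * sizeof(port->ana_state[0]));
	if (!port)
		return NULL;

	port->nr_groups = nr_groups;
	return port;	/* a single free(port) releases everything */
}
```

With the array embedded, there is no second pointer to allocate, check and
free on the error and teardown paths.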

Signed-off-by: Rosen Penev 
Signed-off-by: Keith Busch

nvmet: fix refcount leak in nvmet_sq_create()

2026-06-09T16:42:23+00:00

In nvmet_sq_create(), a reference on the ctrl is taken
via kref_get_unless_zero() before calling nvmet_check_sqid().
If nvmet_check_sqid() fails, the function returns the error
directly without releasing the reference, leading to a leak.

Fix this by jumping to the "ctrl_put" label, which already
performs the necessary nvmet_ctrl_put(ctrl). This ensures the
reference is properly released on this error path.
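
The error-path pattern can be sketched in user-space C as follows. All
ex_* names are hypothetical stand-ins (a plain counter replaces the kref),
and the body of the create function is reduced to the two steps mentioned
above.

```c
struct ex_ctrl {
	int refs;	/* stand-in for the controller's kref */
};

/* Mimics "get unless zero": fails if the object is already dying. */
static int ex_ctrl_get(struct ex_ctrl *ctrl)
{
	if (ctrl->refs == 0)
		return 0;
	ctrl->refs++;
	return 1;
}

static void ex_ctrl_put(struct ex_ctrl *ctrl)
{
	ctrl->refs--;
}

/* Hypothetical validity check: negative queue IDs are rejected. */
static int ex_check_sqid(int sqid)
{
	return sqid < 0 ? -1 : 0;
}

static int ex_sq_create(struct ex_ctrl *ctrl, int sqid)
{
	int ret;

	if (!ex_ctrl_get(ctrl))
		return -1;

	ret = ex_check_sqid(sqid);
	if (ret)
		goto ctrl_put;	/* a bare "return ret" here would leak the ref */

	return 0;	/* success: the queue keeps its reference */

ctrl_put:
	ex_ctrl_put(ctrl);
	return ret;
}
```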

Cc: stable@vger.kernel.org
Fixes: 1eb380caf527 ("nvmet: Introduce nvmet_sq_create() and nvmet_cq_create()")
Signed-off-by: Wentao Liang 
Signed-off-by: Keith Busch

Merge tag 'nvme-7.2-2026-06-04' of git://git.infradead.org/nvme into for-7.2/block

2026-06-05T11:18:58+00:00

Pull NVMe updates from Keith:

"- Per-controller timeouts
 - Multipath telemetry
 - Namespace format validation
 - Various other fixes"

* tag 'nvme-7.2-2026-06-04' of git://git.infradead.org/nvme: (34 commits)
  nvme: export controller reconnect event count via sysfs
  nvme: export controller reset event count via sysfs
  nvme: export I/O failure count when no path is available via sysfs
  nvme: export I/O requeue count when no path is usable via sysfs
  nvme: export command error counters via sysfs
  nvme: export multipath failover count via sysfs
  nvme: export command retry count via sysfs
  nvme: add diag attribute group under sysfs
  nvme-tcp: lockdep: use dynamic lockdep keys per socket instance
  nvme-tcp: move nvme_tcp_reclassify_socket()
  nvme: validate FDP configuration descriptor sizes
  nvmet-auth: validate reply message payload bounds against transfer length
  nvme: refresh multipath head zoned limits from path limits
  nvme: fix FDP fdpcidx bounds check
  nvme-tcp: Use WQ_PERCPU explicitly if wq_unbound is false.
  nvmet: fix pre-auth out-of-bounds heap read in Discovery Get Log Page
  nvme-multipath: set BIO_REMAPPED on bios remapped to per-path namespace disks
  nvme-multipath: require exact iopolicy names for module parameter
  nvme-multipath: pass NS head to nvme_mpath_revalidate_paths()
  nvme-pci: fix out-of-bounds access in nvme_setup_descriptor_pools
  ...

nvmet-auth: validate reply message payload bounds against transfer length

2026-06-03T09:40:33+00:00

nvmet_auth_reply() accesses the variable-length rval[] array using
attacker-controlled hl (hash length) and dhvlen (DH value length) fields
without verifying they fit within the allocated buffer of tl bytes.

A malicious NVMe-oF initiator can craft a DHCHAP_REPLY message with a
small transfer length but large hl/dhvlen values, causing out-of-bounds
heap reads when the target processes the DH public key (rval + 2*hl) or
performs the host response memcmp.

With DH authentication configured, the OOB pointer is passed directly to
sg_init_one() and read by crypto_kpp_compute_shared_secret(), reaching
up to 526 bytes past the buffer. This is exploitable pre-authentication.

Add bounds validation ensuring sizeof(*data) + 2*hl + dhvlen <= tl before
any access to the variable-length fields.
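
A minimal sketch of that check in user-space C is shown below.
EX_REPLY_HDR_LEN is a placeholder for sizeof(*data), and ex_reply_fits()
and the field widths are assumptions for illustration; only the inequality
itself comes from the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_REPLY_HDR_LEN 16u	/* placeholder for sizeof(*data) */

/*
 * True when the fixed header, two hash-length values and the DH value all
 * fit inside the tl bytes actually transferred. The sum is computed in
 * size_t so that large hl/dhvlen values cannot wrap.
 */
static bool ex_reply_fits(uint32_t tl, uint8_t hl, uint16_t dhvlen)
{
	size_t need = (size_t)EX_REPLY_HDR_LEN + 2 * (size_t)hl + dhvlen;

	return need <= tl;
}
```

A reply that claims hl = 64 and dhvlen = 512 but transfers only 32 bytes
fails the check, so neither rval + 2*hl nor the response comparison is
reached.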

Discovered by Atuin - Automated Vulnerability Discovery Engine.

Fixes: db1312dd9548 ("nvmet: implement basic In-Band Authentication")
Cc: stable@vger.kernel.org
Reviewed-by: Hannes Reinecke 
Signed-off-by: Tianchu Chen 
Signed-off-by: Keith Busch

nvmet: fix pre-auth out-of-bounds heap read in Discovery Get Log Page

2026-06-02T10:43:27+00:00

nvmet_execute_disc_get_log_page() validates only the dword alignment
of the host-supplied Log Page Offset (lpo).  The 64-bit offset is then
added to a small kzalloc'd buffer that holds the discovery log page
and the result is passed straight to nvmet_copy_to_sgl(), which
memcpy()s data_len bytes out to the host with no source-side bound
check:

    u64 offset      = nvmet_get_log_page_offset(req->cmd);  /* 64-bit host */
    size_t data_len = nvmet_get_log_page_len(req->cmd);     /* 32-bit host */
    ...
    if (offset & 0x3) { ... }                               /* only check */
    ...
    alloc_len = sizeof(*hdr) + entry_size * discovery_log_entries(req);
    buffer = kzalloc(alloc_len, GFP_KERNEL);
    ...
    status = nvmet_copy_to_sgl(req, 0, buffer + offset, data_len);

The Discovery controller is unauthenticated -- nvmet_host_allowed()
returns true unconditionally for the discovery subsystem -- so the call
is reachable pre-authentication by any TCP/RDMA/FC peer that can reach
the nvmet target.  With a discovery log page of ~1 KiB, an attacker
requesting up to 4 KiB starting at offset == alloc_len reads the slab
memory that follows the buffer and has its contents returned over the
fabric (an empirical run on a default nvmet-tcp loopback target leaked
81 canonical kernel pointers in one Get Log Page response).  Pointing
the offset at unmapped kernel memory instead faults the in-kernel
memcpy and crashes the target host (or panics it with
panic_on_oops=1).

The attacker-controlled source-side offset pattern
"nvmet_copy_to_sgl(req, 0, buffer + ATTACKER_OFFSET, ...)" is unique
to nvmet_execute_disc_get_log_page in the entire nvmet codebase: every
other Get Log Page handler in admin-cmd.c either ignores lpo (and
silently starts every response at offset 0) or tracks a local
destination offset with a fixed source pointer.

Validate the host-supplied offset against the log page size, cap the
copy length to what is actually available, and zero-fill any remainder
of the host transfer buffer.  The zero-fill matches the existing
short-response pattern in nvmet_execute_get_log_changed_ns()
(admin-cmd.c) and prevents leaking transport SGL contents when the
host asks for more bytes than the log page contains.
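
The three steps (validate, cap, zero-fill) can be sketched in user-space C
as follows. ex_copy_log_page() is a hypothetical helper working on flat
buffers instead of an SGL, and treating offset == alloc_len as a valid
request that yields only zeros is an assumption; the actual patch may
reject it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy data_len bytes of a log page into dst, starting at a host-supplied
 * offset. Returns 0 on success and -1 for an invalid offset.
 */
static int ex_copy_log_page(uint8_t *dst, size_t data_len,
			    const uint8_t *log, size_t alloc_len,
			    uint64_t offset)
{
	size_t avail, copy_len;

	if (offset & 0x3)		/* existing dword alignment check */
		return -1;
	if (offset > alloc_len)		/* new: bound against the log page size */
		return -1;

	/* Cap the copy to what the log page really holds past offset. */
	avail = alloc_len - (size_t)offset;
	copy_len = data_len < avail ? data_len : avail;
	memcpy(dst, log + offset, copy_len);

	/* Zero-fill the rest so no stale buffer contents reach the host. */
	memset(dst + copy_len, 0, data_len - copy_len);
	return 0;
}
```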

Fixes: a07b4970f464 ("nvmet: add a generic NVMe target")
Cc: stable@vger.kernel.org
Reviewed-by: Chaitanya Kulkarni 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Bryam Vargas 
Signed-off-by: Keith Busch