<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/nvme/target, branch master</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>nvmet-tcp: Fix potential UAF when ddgst mismatch</title>
<updated>2026-05-11T15:07:40+00:00</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2026-05-10T20:30:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dbbd07d0a7020b80f6a7028e561908f7b83b3d5a'/>
<id>dbbd07d0a7020b80f6a7028e561908f7b83b3d5a</id>
<content type='text'>
Shivam Kumar found via vulnerability testing:
When data digest is enabled on an NVMe/TCP connection and a digest
mismatch occurs on a non-final H2C_DATA PDU during an R2T-based
data transfer, the digest error handler in nvmet_tcp_try_recv_ddgst()
calls nvmet_req_uninit() — which performs percpu_ref_put() on the
submission queue — but does NOT mark the command as completed. It
does not set cqe-&gt;status, does not modify rbytes_done, and does not
clear any flag. When the subsequent fatal error triggers queue
teardown, nvmet_tcp_uninit_data_in_cmds() iterates all commands,
checks nvmet_tcp_need_data_in() for each one, and finds that the
already-uninited command still appears to need data (because
rbytes_done &lt; transfer_len and cqe-&gt;status == 0). It therefore calls
nvmet_req_uninit() a second time on the same command — a double
percpu_ref_put against a single percpu_ref_get.

Reported-by: Shivam Kumar &lt;kumar.shivam43666@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Shivam Kumar found via vulnerability testing:
When data digest is enabled on an NVMe/TCP connection and a digest
mismatch occurs on a non-final H2C_DATA PDU during an R2T-based
data transfer, the digest error handler in nvmet_tcp_try_recv_ddgst()
calls nvmet_req_uninit() — which performs percpu_ref_put() on the
submission queue — but does NOT mark the command as completed. It
does not set cqe-&gt;status, does not modify rbytes_done, and does not
clear any flag. When the subsequent fatal error triggers queue
teardown, nvmet_tcp_uninit_data_in_cmds() iterates all commands,
checks nvmet_tcp_need_data_in() for each one, and finds that the
already-uninited command still appears to need data (because
rbytes_done &lt; transfer_len and cqe-&gt;status == 0). It therefore calls
nvmet_req_uninit() a second time on the same command — a double
percpu_ref_put against a single percpu_ref_get.

Reported-by: Shivam Kumar &lt;kumar.shivam43666@gmail.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvmet-auth: Do not print DH-HMAC-CHAP secrets</title>
<updated>2026-05-11T15:07:39+00:00</updated>
<author>
<name>Hannes Reinecke</name>
<email>hare@kernel.org</email>
</author>
<published>2026-04-30T13:22:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a891962ae5e48fb5476c23631df759482e62ab16'/>
<id>a891962ae5e48fb5476c23631df759482e62ab16</id>
<content type='text'>
From a security standpoint we should not allow to print out the DH-HMAC-CHAP
secrets, but at the same time having them is useful for debugging
authentication failures.
So add a Kconfig option NVME_TARGET_AUTH_DEBUG to only enable debugging
if explictly requested at build time.

Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Hannes Reinecke &lt;hare@kernel.org&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
From a security standpoint we should not allow to print out the DH-HMAC-CHAP
secrets, but at the same time having them is useful for debugging
authentication failures.
So add a Kconfig option NVME_TARGET_AUTH_DEBUG to only enable debugging
if explictly requested at build time.

Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Hannes Reinecke &lt;hare@kernel.org&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'nvme-7.1-2026-04-24' of git://git.infradead.org/nvme into block-7.1</title>
<updated>2026-04-27T21:47:21+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-04-27T21:47:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=aa03cfe9dbf487f065d0b38b95edc25c386e3d40'/>
<id>aa03cfe9dbf487f065d0b38b95edc25c386e3d40</id>
<content type='text'>
Pull NVMe fixes from Keith:

"- Target data transfer size confiruation (Aurelien)
 - Enable P2P for RDMA (Shivaji Kant)
 - TCP target updates (Maurizio, Alistair, Chaitanya, Shivam Kumar)
 - TCP host updates (Alistair, Chaitanya)
 - Authentication updates (Alistair, Daniel, Chris Leech)
 - Multipath fixes (John Garry)
 - New quirks (Alan Cui, Tao Jiang)
 - Apple driver fix (Fedor Pchelkin)
 - PCI admin doorbell update fix (Keith)"

* tag 'nvme-7.1-2026-04-24' of git://git.infradead.org/nvme: (22 commits)
  nvme-auth: Hash DH shared secret to create session key
  nvme-pci: fix missed admin queue sq doorbell write
  nvme-auth: Include SC_C in RVAL controller hash
  nvme-tcp: teardown circular locking fixes
  nvmet-tcp: Don't clear tls_key when freeing sq
  Revert "nvmet-tcp: Don't free SQ on authentication success"
  nvme: skip trace completion for host path errors
  nvme-pci: add quirk for Memblaze Pblaze5 (0x1c5f:0x0555)
  nvme-multipath: put module reference when delayed removal work is canceled
  nvme: expose TLS mode
  nvme-apple: drop invalid put of admin queue reference count
  nvme-core: fix parameter name in comment
  nvmet: avoid recursive nvmet-wq flush in nvmet_ctrl_free
  nvme-multipath: drop head pointer check in nvme_mpath_clear_current_path()
  nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus )
  nvmet-tcp: fix race between ICReq handling and queue teardown
  nvmet-tcp: remove redundant calls to nvmet_tcp_fatal_error()
  nvmet-tcp: propagate nvmet_tcp_build_pdu_iovec() errors to its callers
  nvme: enable PCI P2PDMA support for RDMA transport
  nvmet: introduce new mdts configuration entry
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull NVMe fixes from Keith:

"- Target data transfer size confiruation (Aurelien)
 - Enable P2P for RDMA (Shivaji Kant)
 - TCP target updates (Maurizio, Alistair, Chaitanya, Shivam Kumar)
 - TCP host updates (Alistair, Chaitanya)
 - Authentication updates (Alistair, Daniel, Chris Leech)
 - Multipath fixes (John Garry)
 - New quirks (Alan Cui, Tao Jiang)
 - Apple driver fix (Fedor Pchelkin)
 - PCI admin doorbell update fix (Keith)"

* tag 'nvme-7.1-2026-04-24' of git://git.infradead.org/nvme: (22 commits)
  nvme-auth: Hash DH shared secret to create session key
  nvme-pci: fix missed admin queue sq doorbell write
  nvme-auth: Include SC_C in RVAL controller hash
  nvme-tcp: teardown circular locking fixes
  nvmet-tcp: Don't clear tls_key when freeing sq
  Revert "nvmet-tcp: Don't free SQ on authentication success"
  nvme: skip trace completion for host path errors
  nvme-pci: add quirk for Memblaze Pblaze5 (0x1c5f:0x0555)
  nvme-multipath: put module reference when delayed removal work is canceled
  nvme: expose TLS mode
  nvme-apple: drop invalid put of admin queue reference count
  nvme-core: fix parameter name in comment
  nvmet: avoid recursive nvmet-wq flush in nvmet_ctrl_free
  nvme-multipath: drop head pointer check in nvme_mpath_clear_current_path()
  nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus )
  nvmet-tcp: fix race between ICReq handling and queue teardown
  nvmet-tcp: remove redundant calls to nvmet_tcp_fatal_error()
  nvmet-tcp: propagate nvmet_tcp_build_pdu_iovec() errors to its callers
  nvme: enable PCI P2PDMA support for RDMA transport
  nvmet: introduce new mdts configuration entry
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-auth: Hash DH shared secret to create session key</title>
<updated>2026-04-22T20:02:16+00:00</updated>
<author>
<name>Chris Leech</name>
<email>cleech@redhat.com</email>
</author>
<published>2026-04-22T19:06:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bd7b7ce96db4487bb77692a85ee4489fd2c395df'/>
<id>bd7b7ce96db4487bb77692a85ee4489fd2c395df</id>
<content type='text'>
The NVMe Base Specification 8.3.5.5.9 states that the session key Ks
shall be computed from the ephemeral DH key by applying the hash
function selected by the HashID parameter.

The current implementation stores the raw DH shared secret as the
session key without hashing it. This causes redundant hash operations:

1. Augmented challenge computation (section 8.3.5.5.4) requires
   Ca = HMAC(H(g^xy mod p), C). The code compensates by hashing the
   unhashed session key in nvme_auth_augmented_challenge() to produce
   the correct result.

2. PSK generation (section 8.3.5.5.9) requires PSK = HMAC(Ks, C1 || C2)
   where Ks should already be H(g^xy mod p). As the DH shared secret
   is always larger than the HMAC block size, HMAC internally hashes
   it before use, accidentally producing the correct result.

When using secure channel concatenation with bidirectional
authentication, this results in hashing the DH value three times: twice
for augmented challenge calculations and once during PSK generation.

Fix this by:
- Modifying nvme_auth_gen_shared_secret() to hash the DH shared secret
  once after computation: Ks = H(g^xy mod p)
- Removing the hash operation from nvme_auth_augmented_challenge()
  as the session key is now already hashed
- Updating session key buffer size from DH key size to hash output size
- Adding specification references in comments

This avoid storing the raw DH shared secret and reduces the number of
hash operations from three to one when using secure channel
concatenation.

Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Signed-off-by: Chris Leech &lt;cleech@redhat.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The NVMe Base Specification 8.3.5.5.9 states that the session key Ks
shall be computed from the ephemeral DH key by applying the hash
function selected by the HashID parameter.

The current implementation stores the raw DH shared secret as the
session key without hashing it. This causes redundant hash operations:

1. Augmented challenge computation (section 8.3.5.5.4) requires
   Ca = HMAC(H(g^xy mod p), C). The code compensates by hashing the
   unhashed session key in nvme_auth_augmented_challenge() to produce
   the correct result.

2. PSK generation (section 8.3.5.5.9) requires PSK = HMAC(Ks, C1 || C2)
   where Ks should already be H(g^xy mod p). As the DH shared secret
   is always larger than the HMAC block size, HMAC internally hashes
   it before use, accidentally producing the correct result.

When using secure channel concatenation with bidirectional
authentication, this results in hashing the DH value three times: twice
for augmented challenge calculations and once during PSK generation.

Fix this by:
- Modifying nvme_auth_gen_shared_secret() to hash the DH shared secret
  once after computation: Ks = H(g^xy mod p)
- Removing the hash operation from nvme_auth_augmented_challenge()
  as the session key is now already hashed
- Updating session key buffer size from DH key size to hash output size
- Adding specification references in comments

This avoid storing the raw DH shared secret and reduces the number of
hash operations from three to one when using secure channel
concatenation.

Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Reviewed-by: Eric Biggers &lt;ebiggers@kernel.org&gt;
Signed-off-by: Chris Leech &lt;cleech@redhat.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-auth: Include SC_C in RVAL controller hash</title>
<updated>2026-04-22T17:07:30+00:00</updated>
<author>
<name>Alistair Francis</name>
<email>alistair.francis@wdc.com</email>
</author>
<published>2026-04-17T00:50:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5d10069e1a1691a0d8642e1fa65f4c1869210299'/>
<id>5d10069e1a1691a0d8642e1fa65f4c1869210299</id>
<content type='text'>
Section 8.3.4.5.5 of the NVMe Base Specification 2.1 describes what is
included in the Response Value (RVAL) hash and SC_C should be included.
Currently we are hardcoding 0 instead of using the correct SC_C value.

Update the host and target code to use the SC_C when calculating the
RVAL instead of using 0.

Fixes: e88a7595b57f2 ("nvme-tcp: request secure channel concatenation")
Reviewed-by: Chris Leech &lt;cleech@redhat.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Alistair Francis &lt;alistair.francis@wdc.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Section 8.3.4.5.5 of the NVMe Base Specification 2.1 describes what is
included in the Response Value (RVAL) hash and SC_C should be included.
Currently we are hardcoding 0 instead of using the correct SC_C value.

Update the host and target code to use the SC_C when calculating the
RVAL instead of using 0.

Fixes: e88a7595b57f2 ("nvme-tcp: request secure channel concatenation")
Reviewed-by: Chris Leech &lt;cleech@redhat.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Alistair Francis &lt;alistair.francis@wdc.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvmet-tcp: Don't clear tls_key when freeing sq</title>
<updated>2026-04-22T17:07:30+00:00</updated>
<author>
<name>Alistair Francis</name>
<email>alistair.francis@wdc.com</email>
</author>
<published>2026-04-17T00:48:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5fc422951c962cc01e654950fc043ebd8fadd865'/>
<id>5fc422951c962cc01e654950fc043ebd8fadd865</id>
<content type='text'>
Curently after the host sends a REPLACETLSPSK we free the TLS keys as
part of calling nvmet_auth_sq_free() on success. This means when the
host sends a follow up REPLACETLSPSK we return CONCAT_MISMATCH as the
check for !nvmet_queue_tls_keyid(req-&gt;sq) fails.

A previous attempt to fix this involed not calling nvmet_auth_sq_free()
on successful connections, but that results in memory leaks. Instead we
should not clear `tls_key` in nvmet_auth_sq_free(), as that was
incorrectly wiping the tls keys which are used for the session.

This patch ensures we correctly free the ephemeral session key on
connection, yet we don't free the TLS key unless closing the connection.

Reviewed-by: Chris Leech &lt;cleech@redhat.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Alistair Francis &lt;alistair.francis@wdc.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Curently after the host sends a REPLACETLSPSK we free the TLS keys as
part of calling nvmet_auth_sq_free() on success. This means when the
host sends a follow up REPLACETLSPSK we return CONCAT_MISMATCH as the
check for !nvmet_queue_tls_keyid(req-&gt;sq) fails.

A previous attempt to fix this involed not calling nvmet_auth_sq_free()
on successful connections, but that results in memory leaks. Instead we
should not clear `tls_key` in nvmet_auth_sq_free(), as that was
incorrectly wiping the tls keys which are used for the session.

This patch ensures we correctly free the ephemeral session key on
connection, yet we don't free the TLS key unless closing the connection.

Reviewed-by: Chris Leech &lt;cleech@redhat.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Alistair Francis &lt;alistair.francis@wdc.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "nvmet-tcp: Don't free SQ on authentication success"</title>
<updated>2026-04-22T17:07:30+00:00</updated>
<author>
<name>Alistair Francis</name>
<email>alistair.francis@wdc.com</email>
</author>
<published>2026-04-17T00:48:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f920ebd03cd13eb0976d18de77adf325b5461361'/>
<id>f920ebd03cd13eb0976d18de77adf325b5461361</id>
<content type='text'>
In an attempt to fix REPLACETLSPSK we stopped freeing the secrets on
successful connections. This resulted in memory leaks in the kernel, so
let's revert the commit. A improved fix is being developed to just avoid
clearing the tls_key variable.

This reverts commit 2e6eb6b277f593b98f151ea8eff1beb558bbea3b.

Closes: https://lore.kernel.org/linux-nvme/CAHj4cs-u3MWQR4idywptMfjEYi4YwObWFx4KVib35dZ5HMBDdw@mail.gmail.com
Reviewed-by: Chris Leech &lt;cleech@redhat.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Alistair Francis &lt;alistair.francis@wdc.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In an attempt to fix REPLACETLSPSK we stopped freeing the secrets on
successful connections. This resulted in memory leaks in the kernel, so
let's revert the commit. A improved fix is being developed to just avoid
clearing the tls_key variable.

This reverts commit 2e6eb6b277f593b98f151ea8eff1beb558bbea3b.

Closes: https://lore.kernel.org/linux-nvme/CAHj4cs-u3MWQR4idywptMfjEYi4YwObWFx4KVib35dZ5HMBDdw@mail.gmail.com
Reviewed-by: Chris Leech &lt;cleech@redhat.com&gt;
Reviewed-by: Hannes Reinecke &lt;hare@suse.de&gt;
Signed-off-by: Alistair Francis &lt;alistair.francis@wdc.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvmet: avoid recursive nvmet-wq flush in nvmet_ctrl_free</title>
<updated>2026-04-16T22:05:06+00:00</updated>
<author>
<name>Chaitanya Kulkarni</name>
<email>kch@nvidia.com</email>
</author>
<published>2026-04-09T00:56:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=aade8abd8b868b6ffa9697aadaea28ec7f65bee6'/>
<id>aade8abd8b868b6ffa9697aadaea28ec7f65bee6</id>
<content type='text'>
nvmet_tcp_release_queue_work() runs on nvmet-wq and can drop the
final controller reference through nvmet_cq_put(). If that triggers
nvmet_ctrl_free(), the teardown path flushes ctrl-&gt;async_event_work on
the same nvmet-wq.

Call chain:

 nvmet_tcp_schedule_release_queue()
   kref_put(&amp;queue-&gt;kref, nvmet_tcp_release_queue)
     nvmet_tcp_release_queue()
       queue_work(nvmet_wq, &amp;queue-&gt;release_work) &lt;--- nvmet_wq
         process_one_work()
           nvmet_tcp_release_queue_work()
             nvmet_cq_put(&amp;queue-&gt;nvme_cq)
               nvmet_cq_destroy()
                 nvmet_ctrl_put(cq-&gt;ctrl)
                   nvmet_ctrl_free()
                     flush_work(&amp;ctrl-&gt;async_event_work) &lt;--- nvmet_wq

                      Previously Scheduled by :-
		        nvmet_add_async_event
		          queue_work(nvmet_wq, &amp;ctrl-&gt;async_event_work);

This trips lockdep with a possible recursive locking warning.

[ 5223.015876] run blktests nvme/003 at 2026-04-07 20:53:55
[ 5223.061801] loop0: detected capacity change from 0 to 2097152
[ 5223.072206] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 5223.088368] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 5223.126086] nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 5223.128453] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[ 5233.199447] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"

[ 5233.227718] ============================================
[ 5233.231283] WARNING: possible recursive locking detected
[ 5233.234696] 7.0.0-rc3nvme+ #20 Tainted: G           O     N
[ 5233.238434] --------------------------------------------
[ 5233.241852] kworker/u192:6/2413 is trying to acquire lock:
[ 5233.245429] ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90
[ 5233.251438]
               but task is already holding lock:
[ 5233.255254] ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x5cc/0x6e0
[ 5233.261125]
               other info that might help us debug this:
[ 5233.265333]  Possible unsafe locking scenario:

[ 5233.269217]        CPU0
[ 5233.270795]        ----
[ 5233.272436]   lock((wq_completion)nvmet-wq);
[ 5233.275241]   lock((wq_completion)nvmet-wq);
[ 5233.278020]
                *** DEADLOCK ***

[ 5233.281793]  May be due to missing lock nesting notation

[ 5233.286195] 3 locks held by kworker/u192:6/2413:
[ 5233.289192]  #0: ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x5cc/0x6e0
[ 5233.294569]  #1: ffffc9000e2a7e40 ((work_completion)(&amp;queue-&gt;release_work)){+.+.}-{0:0}, at: process_one_work+0x1c5/0x6e0
[ 5233.300128]  #2: ffffffff82d7dc40 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x62/0x530
[ 5233.304290]
               stack backtrace:
[ 5233.306520] CPU: 4 UID: 0 PID: 2413 Comm: kworker/u192:6 Tainted: G           O     N  7.0.0-rc3nvme+ #20 PREEMPT(full)
[ 5233.306524] Tainted: [O]=OOT_MODULE, [N]=TEST
[ 5233.306525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 5233.306527] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
[ 5233.306532] Call Trace:
[ 5233.306534]  &lt;TASK&gt;
[ 5233.306536]  dump_stack_lvl+0x73/0xb0
[ 5233.306552]  print_deadlock_bug+0x225/0x2f0
[ 5233.306556]  __lock_acquire+0x13f0/0x2290
[ 5233.306563]  lock_acquire+0xd0/0x300
[ 5233.306565]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306571]  ? __flush_work+0x20b/0x530
[ 5233.306573]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306577]  touch_wq_lockdep_map+0x3b/0x90
[ 5233.306580]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306583]  ? __flush_work+0x20b/0x530
[ 5233.306585]  __flush_work+0x268/0x530
[ 5233.306588]  ? __pfx_wq_barrier_func+0x10/0x10
[ 5233.306594]  ? xen_error_entry+0x30/0x60
[ 5233.306600]  nvmet_ctrl_free+0x140/0x310 [nvmet]
[ 5233.306617]  nvmet_cq_put+0x74/0x90 [nvmet]
[ 5233.306629]  nvmet_tcp_release_queue_work+0x19f/0x360 [nvmet_tcp]
[ 5233.306634]  process_one_work+0x206/0x6e0
[ 5233.306640]  worker_thread+0x184/0x320
[ 5233.306643]  ? __pfx_worker_thread+0x10/0x10
[ 5233.306646]  kthread+0xf1/0x130
[ 5233.306648]  ? __pfx_kthread+0x10/0x10
[ 5233.306651]  ret_from_fork+0x355/0x450
[ 5233.306653]  ? __pfx_kthread+0x10/0x10
[ 5233.306656]  ret_from_fork_asm+0x1a/0x30
[ 5233.306664]  &lt;/TASK&gt;

There is also no need to flush async_event_work from controller
teardown. The admin queue teardown already fails outstanding AER
requests before the final controller put :-

 nvmet_sq_destroy(admin sq)
    nvmet_async_events_failall(ctrl)

The controller has already been removed from the subsystem list before
nvmet_ctrl_free() quiesces outstanding work.

Replace flush_work() with cancel_work_sync() so a pending
async_event_work item is canceled and a running instance is waited on
without recursing into the same workqueue.

Fixes: 06406d81a2d7 ("nvmet: cancel fatal error and flush async work before free controller")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
nvmet_tcp_release_queue_work() runs on nvmet-wq and can drop the
final controller reference through nvmet_cq_put(). If that triggers
nvmet_ctrl_free(), the teardown path flushes ctrl-&gt;async_event_work on
the same nvmet-wq.

Call chain:

 nvmet_tcp_schedule_release_queue()
   kref_put(&amp;queue-&gt;kref, nvmet_tcp_release_queue)
     nvmet_tcp_release_queue()
       queue_work(nvmet_wq, &amp;queue-&gt;release_work) &lt;--- nvmet_wq
         process_one_work()
           nvmet_tcp_release_queue_work()
             nvmet_cq_put(&amp;queue-&gt;nvme_cq)
               nvmet_cq_destroy()
                 nvmet_ctrl_put(cq-&gt;ctrl)
                   nvmet_ctrl_free()
                     flush_work(&amp;ctrl-&gt;async_event_work) &lt;--- nvmet_wq

                      Previously Scheduled by :-
		        nvmet_add_async_event
		          queue_work(nvmet_wq, &amp;ctrl-&gt;async_event_work);

This trips lockdep with a possible recursive locking warning.

[ 5223.015876] run blktests nvme/003 at 2026-04-07 20:53:55
[ 5223.061801] loop0: detected capacity change from 0 to 2097152
[ 5223.072206] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 5223.088368] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 5223.126086] nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 5223.128453] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[ 5233.199447] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"

[ 5233.227718] ============================================
[ 5233.231283] WARNING: possible recursive locking detected
[ 5233.234696] 7.0.0-rc3nvme+ #20 Tainted: G           O     N
[ 5233.238434] --------------------------------------------
[ 5233.241852] kworker/u192:6/2413 is trying to acquire lock:
[ 5233.245429] ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90
[ 5233.251438]
               but task is already holding lock:
[ 5233.255254] ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x5cc/0x6e0
[ 5233.261125]
               other info that might help us debug this:
[ 5233.265333]  Possible unsafe locking scenario:

[ 5233.269217]        CPU0
[ 5233.270795]        ----
[ 5233.272436]   lock((wq_completion)nvmet-wq);
[ 5233.275241]   lock((wq_completion)nvmet-wq);
[ 5233.278020]
                *** DEADLOCK ***

[ 5233.281793]  May be due to missing lock nesting notation

[ 5233.286195] 3 locks held by kworker/u192:6/2413:
[ 5233.289192]  #0: ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x5cc/0x6e0
[ 5233.294569]  #1: ffffc9000e2a7e40 ((work_completion)(&amp;queue-&gt;release_work)){+.+.}-{0:0}, at: process_one_work+0x1c5/0x6e0
[ 5233.300128]  #2: ffffffff82d7dc40 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x62/0x530
[ 5233.304290]
               stack backtrace:
[ 5233.306520] CPU: 4 UID: 0 PID: 2413 Comm: kworker/u192:6 Tainted: G           O     N  7.0.0-rc3nvme+ #20 PREEMPT(full)
[ 5233.306524] Tainted: [O]=OOT_MODULE, [N]=TEST
[ 5233.306525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 5233.306527] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
[ 5233.306532] Call Trace:
[ 5233.306534]  &lt;TASK&gt;
[ 5233.306536]  dump_stack_lvl+0x73/0xb0
[ 5233.306552]  print_deadlock_bug+0x225/0x2f0
[ 5233.306556]  __lock_acquire+0x13f0/0x2290
[ 5233.306563]  lock_acquire+0xd0/0x300
[ 5233.306565]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306571]  ? __flush_work+0x20b/0x530
[ 5233.306573]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306577]  touch_wq_lockdep_map+0x3b/0x90
[ 5233.306580]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306583]  ? __flush_work+0x20b/0x530
[ 5233.306585]  __flush_work+0x268/0x530
[ 5233.306588]  ? __pfx_wq_barrier_func+0x10/0x10
[ 5233.306594]  ? xen_error_entry+0x30/0x60
[ 5233.306600]  nvmet_ctrl_free+0x140/0x310 [nvmet]
[ 5233.306617]  nvmet_cq_put+0x74/0x90 [nvmet]
[ 5233.306629]  nvmet_tcp_release_queue_work+0x19f/0x360 [nvmet_tcp]
[ 5233.306634]  process_one_work+0x206/0x6e0
[ 5233.306640]  worker_thread+0x184/0x320
[ 5233.306643]  ? __pfx_worker_thread+0x10/0x10
[ 5233.306646]  kthread+0xf1/0x130
[ 5233.306648]  ? __pfx_kthread+0x10/0x10
[ 5233.306651]  ret_from_fork+0x355/0x450
[ 5233.306653]  ? __pfx_kthread+0x10/0x10
[ 5233.306656]  ret_from_fork_asm+0x1a/0x30
[ 5233.306664]  &lt;/TASK&gt;

There is also no need to flush async_event_work from controller
teardown. The admin queue teardown already fails outstanding AER
requests before the final controller put :-

 nvmet_sq_destroy(admin sq)
    nvmet_async_events_failall(ctrl)

The controller has already been removed from the subsystem list before
nvmet_ctrl_free() quiesces outstanding work.

Replace flush_work() with cancel_work_sync() so a pending
async_event_work item is canceled and a running instance is waited on
without recursing into the same workqueue.

Fixes: 06406d81a2d7 ("nvmet: cancel fatal error and flush async work before free controller")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux</title>
<updated>2026-04-13T22:51:31+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-04-13T22:51:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7fe6ac157b7e15c8976bd62ad7cb98e248884e83'/>
<id>7fe6ac157b7e15c8976bd62ad7cb98e248884e83</id>
<content type='text'>
Pull block updates from Jens Axboe:

 - Add shared memory zero-copy I/O support for ublk, bypassing per-I/O
   copies between kernel and userspace by matching registered buffer
   PFNs at I/O time. Includes selftests.

 - Refactor bio integrity to support filesystem initiated integrity
   operations and arbitrary buffer alignment.

 - Clean up bio allocation, splitting bio_alloc_bioset() into clear fast
   and slow paths. Add bio_await() and bio_submit_or_kill() helpers,
   unify synchronous bi_end_io callbacks.

 - Fix zone write plug refcount handling and plug removal races. Add
   support for serializing zone writes at QD=1 for rotational zoned
   devices, yielding significant throughput improvements.

 - Add SED-OPAL ioctls for Single User Mode management and a STACK_RESET
   command.

 - Add io_uring passthrough (uring_cmd) support to the BSG layer.

 - Replace pp_buf in partition scanning with struct seq_buf.

 - zloop improvements and cleanups.

 - drbd genl cleanup, switching to pre_doit/post_doit.

 - NVMe pull request via Keith:
      - Fabrics authentication updates
      - Enhanced block queue limits support
      - Workqueue usage updates
      - A new write zeroes device quirk
      - Tagset cleanup fix for loop device

 - MD pull requests via Yu Kuai:
      - Fix raid5 soft lockup in retry_aligned_read()
      - Fix raid10 deadlock with check operation and nowait requests
      - Fix raid1 overlapping writes on writemostly disks
      - Fix sysfs deadlock on array_state=clear
      - Proactive RAID-5 parity building with llbitmap, with
        write_zeroes_unmap optimization for initial sync
      - Fix llbitmap barrier ordering, rdev skipping, and bitmap_ops
        version mismatch fallback
      - Fix bcache use-after-free and uninitialized closure
      - Validate raid5 journal metadata payload size
      - Various cleanups

 - Various other fixes, improvements, and cleanups

* tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (146 commits)
  ublk: fix tautological comparison warning in ublk_ctrl_reg_buf
  scsi: bsg: fix buffer overflow in scsi_bsg_uring_cmd()
  block: refactor blkdev_zone_mgmt_ioctl
  MAINTAINERS: update ublk driver maintainer email
  Documentation: ublk: address review comments for SHMEM_ZC docs
  ublk: allow buffer registration before device is started
  ublk: replace xarray with IDA for shmem buffer index allocation
  ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
  ublk: verify all pages in multi-page bvec fall within registered range
  ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
  xfs: use bio_await in xfs_zone_gc_reset_sync
  block: add a bio_submit_or_kill helper
  block: factor out a bio_await helper
  block: unify the synchronous bi_end_io callbacks
  xfs: fix number of GC bvecs
  selftests/ublk: add read-only buffer registration test
  selftests/ublk: add filesystem fio verify test for shmem_zc
  selftests/ublk: add hugetlbfs shmem_zc test for loop target
  selftests/ublk: add shared memory zero-copy test
  selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull block updates from Jens Axboe:

 - Add shared memory zero-copy I/O support for ublk, bypassing per-I/O
   copies between kernel and userspace by matching registered buffer
   PFNs at I/O time. Includes selftests.

 - Refactor bio integrity to support filesystem initiated integrity
   operations and arbitrary buffer alignment.

 - Clean up bio allocation, splitting bio_alloc_bioset() into clear fast
   and slow paths. Add bio_await() and bio_submit_or_kill() helpers,
   unify synchronous bi_end_io callbacks.

 - Fix zone write plug refcount handling and plug removal races. Add
   support for serializing zone writes at QD=1 for rotational zoned
   devices, yielding significant throughput improvements.

 - Add SED-OPAL ioctls for Single User Mode management and a STACK_RESET
   command.

 - Add io_uring passthrough (uring_cmd) support to the BSG layer.

 - Replace pp_buf in partition scanning with struct seq_buf.

 - zloop improvements and cleanups.

 - drbd genl cleanup, switching to pre_doit/post_doit.

 - NVMe pull request via Keith:
      - Fabrics authentication updates
      - Enhanced block queue limits support
      - Workqueue usage updates
      - A new write zeroes device quirk
      - Tagset cleanup fix for loop device

 - MD pull requests via Yu Kuai:
      - Fix raid5 soft lockup in retry_aligned_read()
      - Fix raid10 deadlock with check operation and nowait requests
      - Fix raid1 overlapping writes on writemostly disks
      - Fix sysfs deadlock on array_state=clear
      - Proactive RAID-5 parity building with llbitmap, with
        write_zeroes_unmap optimization for initial sync
      - Fix llbitmap barrier ordering, rdev skipping, and bitmap_ops
        version mismatch fallback
      - Fix bcache use-after-free and uninitialized closure
      - Validate raid5 journal metadata payload size
      - Various cleanups

 - Various other fixes, improvements, and cleanups

* tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (146 commits)
  ublk: fix tautological comparison warning in ublk_ctrl_reg_buf
  scsi: bsg: fix buffer overflow in scsi_bsg_uring_cmd()
  block: refactor blkdev_zone_mgmt_ioctl
  MAINTAINERS: update ublk driver maintainer email
  Documentation: ublk: address review comments for SHMEM_ZC docs
  ublk: allow buffer registration before device is started
  ublk: replace xarray with IDA for shmem buffer index allocation
  ublk: simplify PFN range loop in __ublk_ctrl_reg_buf
  ublk: verify all pages in multi-page bvec fall within registered range
  ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support
  xfs: use bio_await in xfs_zone_gc_reset_sync
  block: add a bio_submit_or_kill helper
  block: factor out a bio_await helper
  block: unify the synchronous bi_end_io callbacks
  xfs: fix number of GC bvecs
  selftests/ublk: add read-only buffer registration test
  selftests/ublk: add filesystem fio verify test for shmem_zc
  selftests/ublk: add hugetlbfs shmem_zc test for loop target
  selftests/ublk: add shared memory zero-copy test
  selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>nvmet-tcp: fix race between ICReq handling and queue teardown</title>
<updated>2026-04-09T14:19:32+00:00</updated>
<author>
<name>Chaitanya Kulkarni</name>
<email>kch@nvidia.com</email>
</author>
<published>2026-04-08T07:51:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5293a8882c549fab4a878bc76b0b6c951f980a61'/>
<id>5293a8882c549fab4a878bc76b0b6c951f980a61</id>
<content type='text'>
nvmet_tcp_handle_icreq() updates queue-&gt;state after sending an
Initialization Connection Response (ICResp), but it does so without
serializing against target-side queue teardown.

If an NVMe/TCP host sends an Initialization Connection Request
(ICReq) and immediately closes the connection, target-side teardown
may start in softirq context before io_work drains the already
buffered ICReq. In that case, nvmet_tcp_schedule_release_queue()
sets queue-&gt;state to NVMET_TCP_Q_DISCONNECTING and drops the queue
reference under state_lock.

If io_work later processes that ICReq, nvmet_tcp_handle_icreq() can
still overwrite the state back to NVMET_TCP_Q_LIVE. That defeats the
DISCONNECTING-state guard in nvmet_tcp_schedule_release_queue() and
allows a later socket state change to re-enter teardown and issue a
second kref_put() on an already released queue.

The ICResp send failure path has the same problem. If teardown has
already moved the queue to DISCONNECTING, a send error can still
overwrite the state with NVMET_TCP_Q_FAILED, again reopening the
window for a second teardown path to drop the queue reference.

Fix this by serializing both post-send state transitions with
state_lock and bailing out if teardown has already started.

Use -ESHUTDOWN as an internal sentinel for that bail-out path rather
than propagating it as a transport error like -ECONNRESET. Keep
nvmet_tcp_socket_error() setting rcv_state to NVMET_TCP_RECV_ERR before
honoring that sentinel so receive-side parsing stays quiesced until the
existing release path completes.

Fixes: c46a6465bac2 ("nvmet-tcp: add NVMe over TCP target driver")
Cc: stable@vger.kernel.org
Reported-by: Shivam Kumar &lt;skumar47@syr.edu&gt;
Tested-by: Shivam Kumar &lt;kumar.shivam43666@gmail.com&gt;
Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
nvmet_tcp_handle_icreq() updates queue-&gt;state after sending an
Initialization Connection Response (ICResp), but it does so without
serializing against target-side queue teardown.

If an NVMe/TCP host sends an Initialization Connection Request
(ICReq) and immediately closes the connection, target-side teardown
may start in softirq context before io_work drains the already
buffered ICReq. In that case, nvmet_tcp_schedule_release_queue()
sets queue-&gt;state to NVMET_TCP_Q_DISCONNECTING and drops the queue
reference under state_lock.

If io_work later processes that ICReq, nvmet_tcp_handle_icreq() can
still overwrite the state back to NVMET_TCP_Q_LIVE. That defeats the
DISCONNECTING-state guard in nvmet_tcp_schedule_release_queue() and
allows a later socket state change to re-enter teardown and issue a
second kref_put() on an already released queue.

The ICResp send failure path has the same problem. If teardown has
already moved the queue to DISCONNECTING, a send error can still
overwrite the state with NVMET_TCP_Q_FAILED, again reopening the
window for a second teardown path to drop the queue reference.

Fix this by serializing both post-send state transitions with
state_lock and bailing out if teardown has already started.

Use -ESHUTDOWN as an internal sentinel for that bail-out path rather
than propagating it as a transport error like -ECONNRESET. Keep
nvmet_tcp_socket_error() setting rcv_state to NVMET_TCP_RECV_ERR before
honoring that sentinel so receive-side parsing stays quiesced until the
existing release path completes.

Fixes: c46a6465bac2 ("nvmet-tcp: add NVMe over TCP target driver")
Cc: stable@vger.kernel.org
Reported-by: Shivam Kumar &lt;skumar47@syr.edu&gt;
Tested-by: Shivam Kumar &lt;kumar.shivam43666@gmail.com&gt;
Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
