<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/nvme, branch v7.0.9</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>nvmet: avoid recursive nvmet-wq flush in nvmet_ctrl_free</title>
<updated>2026-05-14T13:31:14+00:00</updated>
<author>
<name>Chaitanya Kulkarni</name>
<email>kch@nvidia.com</email>
</author>
<published>2026-04-09T00:56:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=551f445a56a11a6457550cddcf39c9ebb8bcacc6'/>
<id>551f445a56a11a6457550cddcf39c9ebb8bcacc6</id>
<content type='text'>
commit aade8abd8b868b6ffa9697aadaea28ec7f65bee6 upstream.

nvmet_tcp_release_queue_work() runs on nvmet-wq and can drop the
final controller reference through nvmet_cq_put(). If that triggers
nvmet_ctrl_free(), the teardown path flushes ctrl-&gt;async_event_work on
the same nvmet-wq.

Call chain:

 nvmet_tcp_schedule_release_queue()
   kref_put(&amp;queue-&gt;kref, nvmet_tcp_release_queue)
     nvmet_tcp_release_queue()
       queue_work(nvmet_wq, &amp;queue-&gt;release_work) &lt;--- nvmet_wq
         process_one_work()
           nvmet_tcp_release_queue_work()
             nvmet_cq_put(&amp;queue-&gt;nvme_cq)
               nvmet_cq_destroy()
                 nvmet_ctrl_put(cq-&gt;ctrl)
                   nvmet_ctrl_free()
                     flush_work(&amp;ctrl-&gt;async_event_work) &lt;--- nvmet_wq

                      Previously Scheduled by :-
		        nvmet_add_async_event
		          queue_work(nvmet_wq, &amp;ctrl-&gt;async_event_work);

This trips lockdep with a possible recursive locking warning.

[ 5223.015876] run blktests nvme/003 at 2026-04-07 20:53:55
[ 5223.061801] loop0: detected capacity change from 0 to 2097152
[ 5223.072206] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 5223.088368] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 5223.126086] nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 5223.128453] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[ 5233.199447] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"

[ 5233.227718] ============================================
[ 5233.231283] WARNING: possible recursive locking detected
[ 5233.234696] 7.0.0-rc3nvme+ #20 Tainted: G           O     N
[ 5233.238434] --------------------------------------------
[ 5233.241852] kworker/u192:6/2413 is trying to acquire lock:
[ 5233.245429] ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90
[ 5233.251438]
               but task is already holding lock:
[ 5233.255254] ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x5cc/0x6e0
[ 5233.261125]
               other info that might help us debug this:
[ 5233.265333]  Possible unsafe locking scenario:

[ 5233.269217]        CPU0
[ 5233.270795]        ----
[ 5233.272436]   lock((wq_completion)nvmet-wq);
[ 5233.275241]   lock((wq_completion)nvmet-wq);
[ 5233.278020]
                *** DEADLOCK ***

[ 5233.281793]  May be due to missing lock nesting notation

[ 5233.286195] 3 locks held by kworker/u192:6/2413:
[ 5233.289192]  #0: ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x5cc/0x6e0
[ 5233.294569]  #1: ffffc9000e2a7e40 ((work_completion)(&amp;queue-&gt;release_work)){+.+.}-{0:0}, at: process_one_work+0x1c5/0x6e0
[ 5233.300128]  #2: ffffffff82d7dc40 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x62/0x530
[ 5233.304290]
               stack backtrace:
[ 5233.306520] CPU: 4 UID: 0 PID: 2413 Comm: kworker/u192:6 Tainted: G           O     N  7.0.0-rc3nvme+ #20 PREEMPT(full)
[ 5233.306524] Tainted: [O]=OOT_MODULE, [N]=TEST
[ 5233.306525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 5233.306527] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
[ 5233.306532] Call Trace:
[ 5233.306534]  &lt;TASK&gt;
[ 5233.306536]  dump_stack_lvl+0x73/0xb0
[ 5233.306552]  print_deadlock_bug+0x225/0x2f0
[ 5233.306556]  __lock_acquire+0x13f0/0x2290
[ 5233.306563]  lock_acquire+0xd0/0x300
[ 5233.306565]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306571]  ? __flush_work+0x20b/0x530
[ 5233.306573]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306577]  touch_wq_lockdep_map+0x3b/0x90
[ 5233.306580]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306583]  ? __flush_work+0x20b/0x530
[ 5233.306585]  __flush_work+0x268/0x530
[ 5233.306588]  ? __pfx_wq_barrier_func+0x10/0x10
[ 5233.306594]  ? xen_error_entry+0x30/0x60
[ 5233.306600]  nvmet_ctrl_free+0x140/0x310 [nvmet]
[ 5233.306617]  nvmet_cq_put+0x74/0x90 [nvmet]
[ 5233.306629]  nvmet_tcp_release_queue_work+0x19f/0x360 [nvmet_tcp]
[ 5233.306634]  process_one_work+0x206/0x6e0
[ 5233.306640]  worker_thread+0x184/0x320
[ 5233.306643]  ? __pfx_worker_thread+0x10/0x10
[ 5233.306646]  kthread+0xf1/0x130
[ 5233.306648]  ? __pfx_kthread+0x10/0x10
[ 5233.306651]  ret_from_fork+0x355/0x450
[ 5233.306653]  ? __pfx_kthread+0x10/0x10
[ 5233.306656]  ret_from_fork_asm+0x1a/0x30
[ 5233.306664]  &lt;/TASK&gt;

There is also no need to flush async_event_work from controller
teardown. The admin queue teardown already fails outstanding AER
requests before the final controller put :-

 nvmet_sq_destroy(admin sq)
    nvmet_async_events_failall(ctrl)

The controller has already been removed from the subsystem list before
nvmet_ctrl_free() quiesces outstanding work.

Replace flush_work() with cancel_work_sync() so a pending
async_event_work item is canceled and a running instance is waited on
without recursing into the same workqueue.

Fixes: 06406d81a2d7 ("nvmet: cancel fatal error and flush async work before free controller")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit aade8abd8b868b6ffa9697aadaea28ec7f65bee6 upstream.

nvmet_tcp_release_queue_work() runs on nvmet-wq and can drop the
final controller reference through nvmet_cq_put(). If that triggers
nvmet_ctrl_free(), the teardown path flushes ctrl-&gt;async_event_work on
the same nvmet-wq.

Call chain:

 nvmet_tcp_schedule_release_queue()
   kref_put(&amp;queue-&gt;kref, nvmet_tcp_release_queue)
     nvmet_tcp_release_queue()
       queue_work(nvmet_wq, &amp;queue-&gt;release_work) &lt;--- nvmet_wq
         process_one_work()
           nvmet_tcp_release_queue_work()
             nvmet_cq_put(&amp;queue-&gt;nvme_cq)
               nvmet_cq_destroy()
                 nvmet_ctrl_put(cq-&gt;ctrl)
                   nvmet_ctrl_free()
                     flush_work(&amp;ctrl-&gt;async_event_work) &lt;--- nvmet_wq

                      Previously Scheduled by :-
		        nvmet_add_async_event
		          queue_work(nvmet_wq, &amp;ctrl-&gt;async_event_work);

This trips lockdep with a possible recursive locking warning.

[ 5223.015876] run blktests nvme/003 at 2026-04-07 20:53:55
[ 5223.061801] loop0: detected capacity change from 0 to 2097152
[ 5223.072206] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 5223.088368] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[ 5223.126086] nvmet: Created discovery controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[ 5223.128453] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 127.0.0.1:4420, hostnqn: nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[ 5233.199447] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"

[ 5233.227718] ============================================
[ 5233.231283] WARNING: possible recursive locking detected
[ 5233.234696] 7.0.0-rc3nvme+ #20 Tainted: G           O     N
[ 5233.238434] --------------------------------------------
[ 5233.241852] kworker/u192:6/2413 is trying to acquire lock:
[ 5233.245429] ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90
[ 5233.251438]
               but task is already holding lock:
[ 5233.255254] ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x5cc/0x6e0
[ 5233.261125]
               other info that might help us debug this:
[ 5233.265333]  Possible unsafe locking scenario:

[ 5233.269217]        CPU0
[ 5233.270795]        ----
[ 5233.272436]   lock((wq_completion)nvmet-wq);
[ 5233.275241]   lock((wq_completion)nvmet-wq);
[ 5233.278020]
                *** DEADLOCK ***

[ 5233.281793]  May be due to missing lock nesting notation

[ 5233.286195] 3 locks held by kworker/u192:6/2413:
[ 5233.289192]  #0: ffff888111632548 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x5cc/0x6e0
[ 5233.294569]  #1: ffffc9000e2a7e40 ((work_completion)(&amp;queue-&gt;release_work)){+.+.}-{0:0}, at: process_one_work+0x1c5/0x6e0
[ 5233.300128]  #2: ffffffff82d7dc40 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x62/0x530
[ 5233.304290]
               stack backtrace:
[ 5233.306520] CPU: 4 UID: 0 PID: 2413 Comm: kworker/u192:6 Tainted: G           O     N  7.0.0-rc3nvme+ #20 PREEMPT(full)
[ 5233.306524] Tainted: [O]=OOT_MODULE, [N]=TEST
[ 5233.306525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 5233.306527] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
[ 5233.306532] Call Trace:
[ 5233.306534]  &lt;TASK&gt;
[ 5233.306536]  dump_stack_lvl+0x73/0xb0
[ 5233.306552]  print_deadlock_bug+0x225/0x2f0
[ 5233.306556]  __lock_acquire+0x13f0/0x2290
[ 5233.306563]  lock_acquire+0xd0/0x300
[ 5233.306565]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306571]  ? __flush_work+0x20b/0x530
[ 5233.306573]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306577]  touch_wq_lockdep_map+0x3b/0x90
[ 5233.306580]  ? touch_wq_lockdep_map+0x26/0x90
[ 5233.306583]  ? __flush_work+0x20b/0x530
[ 5233.306585]  __flush_work+0x268/0x530
[ 5233.306588]  ? __pfx_wq_barrier_func+0x10/0x10
[ 5233.306594]  ? xen_error_entry+0x30/0x60
[ 5233.306600]  nvmet_ctrl_free+0x140/0x310 [nvmet]
[ 5233.306617]  nvmet_cq_put+0x74/0x90 [nvmet]
[ 5233.306629]  nvmet_tcp_release_queue_work+0x19f/0x360 [nvmet_tcp]
[ 5233.306634]  process_one_work+0x206/0x6e0
[ 5233.306640]  worker_thread+0x184/0x320
[ 5233.306643]  ? __pfx_worker_thread+0x10/0x10
[ 5233.306646]  kthread+0xf1/0x130
[ 5233.306648]  ? __pfx_kthread+0x10/0x10
[ 5233.306651]  ret_from_fork+0x355/0x450
[ 5233.306653]  ? __pfx_kthread+0x10/0x10
[ 5233.306656]  ret_from_fork_asm+0x1a/0x30
[ 5233.306664]  &lt;/TASK&gt;

There is also no need to flush async_event_work from controller
teardown. The admin queue teardown already fails outstanding AER
requests before the final controller put :-

 nvmet_sq_destroy(admin sq)
    nvmet_async_events_failall(ctrl)

The controller has already been removed from the subsystem list before
nvmet_ctrl_free() quiesces outstanding work.

Replace flush_work() with cancel_work_sync() so a pending
async_event_work item is canceled and a running instance is waited on
without recursing into the same workqueue.

Fixes: 06406d81a2d7 ("nvmet: cancel fatal error and flush async work before free controller")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvmet-tcp: fix race between ICReq handling and queue teardown</title>
<updated>2026-05-14T13:31:14+00:00</updated>
<author>
<name>Chaitanya Kulkarni</name>
<email>kch@nvidia.com</email>
</author>
<published>2026-04-08T07:51:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=dcfe4d1f7960e7d1c01642318f3aae1a604f8508'/>
<id>dcfe4d1f7960e7d1c01642318f3aae1a604f8508</id>
<content type='text'>
commit 5293a8882c549fab4a878bc76b0b6c951f980a61 upstream.

nvmet_tcp_handle_icreq() updates queue-&gt;state after sending an
Initialization Connection Response (ICResp), but it does so without
serializing against target-side queue teardown.

If an NVMe/TCP host sends an Initialization Connection Request
(ICReq) and immediately closes the connection, target-side teardown
may start in softirq context before io_work drains the already
buffered ICReq. In that case, nvmet_tcp_schedule_release_queue()
sets queue-&gt;state to NVMET_TCP_Q_DISCONNECTING and drops the queue
reference under state_lock.

If io_work later processes that ICReq, nvmet_tcp_handle_icreq() can
still overwrite the state back to NVMET_TCP_Q_LIVE. That defeats the
DISCONNECTING-state guard in nvmet_tcp_schedule_release_queue() and
allows a later socket state change to re-enter teardown and issue a
second kref_put() on an already released queue.

The ICResp send failure path has the same problem. If teardown has
already moved the queue to DISCONNECTING, a send error can still
overwrite the state with NVMET_TCP_Q_FAILED, again reopening the
window for a second teardown path to drop the queue reference.

Fix this by serializing both post-send state transitions with
state_lock and bailing out if teardown has already started.

Use -ESHUTDOWN as an internal sentinel for that bail-out path rather
than propagating it as a transport error like -ECONNRESET. Keep
nvmet_tcp_socket_error() setting rcv_state to NVMET_TCP_RECV_ERR before
honoring that sentinel so receive-side parsing stays quiesced until the
existing release path completes.

Fixes: c46a6465bac2 ("nvmet-tcp: add NVMe over TCP target driver")
Cc: stable@vger.kernel.org
Reported-by: Shivam Kumar &lt;skumar47@syr.edu&gt;
Tested-by: Shivam Kumar &lt;kumar.shivam43666@gmail.com&gt;
Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5293a8882c549fab4a878bc76b0b6c951f980a61 upstream.

nvmet_tcp_handle_icreq() updates queue-&gt;state after sending an
Initialization Connection Response (ICResp), but it does so without
serializing against target-side queue teardown.

If an NVMe/TCP host sends an Initialization Connection Request
(ICReq) and immediately closes the connection, target-side teardown
may start in softirq context before io_work drains the already
buffered ICReq. In that case, nvmet_tcp_schedule_release_queue()
sets queue-&gt;state to NVMET_TCP_Q_DISCONNECTING and drops the queue
reference under state_lock.

If io_work later processes that ICReq, nvmet_tcp_handle_icreq() can
still overwrite the state back to NVMET_TCP_Q_LIVE. That defeats the
DISCONNECTING-state guard in nvmet_tcp_schedule_release_queue() and
allows a later socket state change to re-enter teardown and issue a
second kref_put() on an already released queue.

The ICResp send failure path has the same problem. If teardown has
already moved the queue to DISCONNECTING, a send error can still
overwrite the state with NVMET_TCP_Q_FAILED, again reopening the
window for a second teardown path to drop the queue reference.

Fix this by serializing both post-send state transitions with
state_lock and bailing out if teardown has already started.

Use -ESHUTDOWN as an internal sentinel for that bail-out path rather
than propagating it as a transport error like -ECONNRESET. Keep
nvmet_tcp_socket_error() setting rcv_state to NVMET_TCP_RECV_ERR before
honoring that sentinel so receive-side parsing stays quiesced until the
existing release path completes.

Fixes: c46a6465bac2 ("nvmet-tcp: add NVMe over TCP target driver")
Cc: stable@vger.kernel.org
Reported-by: Shivam Kumar &lt;skumar47@syr.edu&gt;
Tested-by: Shivam Kumar &lt;kumar.shivam43666@gmail.com&gt;
Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-apple: drop invalid put of admin queue reference count</title>
<updated>2026-05-14T13:31:14+00:00</updated>
<author>
<name>Fedor Pchelkin</name>
<email>pchelkin@ispras.ru</email>
</author>
<published>2026-04-08T14:18:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e8352496c9f5272d6b98fd2f03098edaa05a7fb0'/>
<id>e8352496c9f5272d6b98fd2f03098edaa05a7fb0</id>
<content type='text'>
commit ba9d308ccd6732dd97ed8080d834a4a89e758e14 upstream.

Commit 03b3bcd319b3 ("nvme: fix admin request_queue lifetime") moved the
admin queue reference -&gt;put call into nvme_free_ctrl() - a controller
device release callback performed for every nvme driver doing
nvme_init_ctrl().

nvme-apple sets refcount of the admin queue to 1 at allocation during the
probe function and then puts it twice now:

nvme_free_ctrl()
  blk_put_queue(ctrl-&gt;admin_q) // #1
  -&gt;free_ctrl()
    apple_nvme_free_ctrl()
      blk_put_queue(anv-&gt;ctrl.admin_q) // #2

Note that there is a commit 941f7298c70c ("nvme-apple: remove an extra
queue reference") which intended to drop taking an extra admin queue
reference.  Looks like at that moment it accidentally fixed a refcount
leak, which existed since the driver's introduction.  There were two -&gt;get
calls at driver's probe function and a single -&gt;put inside
apple_nvme_free_ctrl().

However now after commit 03b3bcd319b3 ("nvme: fix admin request_queue
lifetime") the refcount is imbalanced again.  Fix it by removing extra
-&gt;put call from apple_nvme_free_ctrl().  anv-&gt;dev and ctrl-&gt;dev point to
the same device, so use ctrl-&gt;dev directly for simplification.  Compile
tested only.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 03b3bcd319b3 ("nvme: fix admin request_queue lifetime")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Fedor Pchelkin &lt;pchelkin@ispras.ru&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ba9d308ccd6732dd97ed8080d834a4a89e758e14 upstream.

Commit 03b3bcd319b3 ("nvme: fix admin request_queue lifetime") moved the
admin queue reference -&gt;put call into nvme_free_ctrl() - a controller
device release callback performed for every nvme driver doing
nvme_init_ctrl().

nvme-apple sets refcount of the admin queue to 1 at allocation during the
probe function and then puts it twice now:

nvme_free_ctrl()
  blk_put_queue(ctrl-&gt;admin_q) // #1
  -&gt;free_ctrl()
    apple_nvme_free_ctrl()
      blk_put_queue(anv-&gt;ctrl.admin_q) // #2

Note that there is a commit 941f7298c70c ("nvme-apple: remove an extra
queue reference") which intended to drop taking an extra admin queue
reference.  Looks like at that moment it accidentally fixed a refcount
leak, which existed since the driver's introduction.  There were two -&gt;get
calls at driver's probe function and a single -&gt;put inside
apple_nvme_free_ctrl().

However now after commit 03b3bcd319b3 ("nvme: fix admin request_queue
lifetime") the refcount is imbalanced again.  Fix it by removing extra
-&gt;put call from apple_nvme_free_ctrl().  anv-&gt;dev and ctrl-&gt;dev point to
the same device, so use ctrl-&gt;dev directly for simplification.  Compile
tested only.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 03b3bcd319b3 ("nvme: fix admin request_queue lifetime")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Fedor Pchelkin &lt;pchelkin@ispras.ru&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme: respect NVME_QUIRK_DISABLE_WRITE_ZEROES when wzsl is set</title>
<updated>2026-05-07T04:13:51+00:00</updated>
<author>
<name>Robert Beckett</name>
<email>bob.beckett@collabora.com</email>
</author>
<published>2026-03-20T19:22:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f3bb4aecd80a9fa83aaa1afd39bcc024fdd6ac4f'/>
<id>f3bb4aecd80a9fa83aaa1afd39bcc024fdd6ac4f</id>
<content type='text'>
commit 40f0496b617b431f8d2dd94d7f785c1121f8a68a upstream.

The NVM Command Set Identify Controller data may report a non-zero
Write Zeroes Size Limit (wzsl). When present, nvme_init_non_mdts_limits()
unconditionally overrides max_zeroes_sectors from wzsl, even if
NVME_QUIRK_DISABLE_WRITE_ZEROES previously set it to zero.

This effectively re-enables write zeroes for devices that need it
disabled, defeating the quirk. Several Kingston OM* drives rely on
this quirk to avoid firmware issues with write zeroes commands.

Check for the quirk before applying the wzsl override.

Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits")
Cc: stable@vger.kernel.org
Signed-off-by: Robert Beckett &lt;bob.beckett@collabora.com&gt;
Assisted-by: claude-opus-4-6-v1
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 40f0496b617b431f8d2dd94d7f785c1121f8a68a upstream.

The NVM Command Set Identify Controller data may report a non-zero
Write Zeroes Size Limit (wzsl). When present, nvme_init_non_mdts_limits()
unconditionally overrides max_zeroes_sectors from wzsl, even if
NVME_QUIRK_DISABLE_WRITE_ZEROES previously set it to zero.

This effectively re-enables write zeroes for devices that need it
disabled, defeating the quirk. Several Kingston OM* drives rely on
this quirk to avoid firmware issues with write zeroes commands.

Check for the quirk before applying the wzsl override.

Fixes: 5befc7c26e5a ("nvme: implement non-mdts command limits")
Cc: stable@vger.kernel.org
Signed-off-by: Robert Beckett &lt;bob.beckett@collabora.com&gt;
Assisted-by: claude-opus-4-6-v1
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-pci: add NVME_QUIRK_DISABLE_WRITE_ZEROES for Kingston OM3SGP4</title>
<updated>2026-05-07T04:13:51+00:00</updated>
<author>
<name>Robert Beckett</name>
<email>bob.beckett@collabora.com</email>
</author>
<published>2026-03-20T19:22:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b94e2a8bea45ef2e1762ee5d40cdb5faeec69672'/>
<id>b94e2a8bea45ef2e1762ee5d40cdb5faeec69672</id>
<content type='text'>
commit a8eebf9699d69987cc49cec4e4fdb4111ab32423 upstream.

The Kingston OM3SGP42048K2-A00 (PCI ID 2646:502f) firmware has a race
condition when processing concurrent write zeroes and DSM (discard)
commands, causing spurious "LBA Out of Range" errors and IOMMU page
faults at address 0x0.

The issue is reliably triggered by running two concurrent mkfs commands
on different partitions of the same drive, which generates interleaved
write zeroes and discard operations.

Disable write zeroes for this device, matching the pattern used for
other Kingston OM* drives that have similar firmware issues.

Cc: stable@vger.kernel.org
Signed-off-by: Robert Beckett &lt;bob.beckett@collabora.com&gt;
Assisted-by: claude-opus-4-6-v1
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a8eebf9699d69987cc49cec4e4fdb4111ab32423 upstream.

The Kingston OM3SGP42048K2-A00 (PCI ID 2646:502f) firmware has a race
condition when processing concurrent write zeroes and DSM (discard)
commands, causing spurious "LBA Out of Range" errors and IOMMU page
faults at address 0x0.

The issue is reliably triggered by running two concurrent mkfs commands
on different partitions of the same drive, which generates interleaved
write zeroes and discard operations.

Disable write zeroes for this device, matching the pattern used for
other Kingston OM* drives that have similar firmware issues.

Cc: stable@vger.kernel.org
Signed-off-by: Robert Beckett &lt;bob.beckett@collabora.com&gt;
Assisted-by: claude-opus-4-6-v1
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'block-7.0-20260312' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux</title>
<updated>2026-03-13T17:13:06+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-03-13T17:13:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=73548503dca50d2c2aa8c8cbb6eb8c1bf5959b21'/>
<id>73548503dca50d2c2aa8c8cbb6eb8c1bf5959b21</id>
<content type='text'>
Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
      - Fix nvme-pci IRQ race and slab-out-of-bounds access
      - Fix recursive workqueue locking for target async events
      - Various cleanups

 - Fix a potential NULL pointer dereference in ublk on size setting

 - ublk automatic partition scanning fix

 - Two s390 dasd fixes

* tag 'block-7.0-20260312' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  nvme: Annotate struct nvme_dhchap_key with __counted_by
  nvme-core: do not pass empty queue_limits to blk_mq_alloc_queue()
  nvme-pci: Fix race bug in nvme_poll_irqdisable()
  nvmet: move async event work off nvmet-wq
  nvme-pci: Fix slab-out-of-bounds in nvme_dbbuf_set
  s390/dasd: Copy detected format information to secondary device
  s390/dasd: Move quiesce state with pprc swap
  ublk: don't clear GD_SUPPRESS_PART_SCAN for unprivileged daemons
  ublk: fix NULL pointer dereference in ublk_ctrl_set_size()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
      - Fix nvme-pci IRQ race and slab-out-of-bounds access
      - Fix recursive workqueue locking for target async events
      - Various cleanups

 - Fix a potential NULL pointer dereference in ublk on size setting

 - ublk automatic partition scanning fix

 - Two s390 dasd fixes

* tag 'block-7.0-20260312' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  nvme: Annotate struct nvme_dhchap_key with __counted_by
  nvme-core: do not pass empty queue_limits to blk_mq_alloc_queue()
  nvme-pci: Fix race bug in nvme_poll_irqdisable()
  nvmet: move async event work off nvmet-wq
  nvme-pci: Fix slab-out-of-bounds in nvme_dbbuf_set
  s390/dasd: Copy detected format information to secondary device
  s390/dasd: Move quiesce state with pprc swap
  ublk: don't clear GD_SUPPRESS_PART_SCAN for unprivileged daemons
  ublk: fix NULL pointer dereference in ublk_ctrl_set_size()
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-core: do not pass empty queue_limits to blk_mq_alloc_queue()</title>
<updated>2026-03-10T15:20:29+00:00</updated>
<author>
<name>Maurizio Lombardi</name>
<email>mlombard@redhat.com</email>
</author>
<published>2026-02-20T13:30:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0375c81eb2256366cd9602441d48b8b937e79635'/>
<id>0375c81eb2256366cd9602441d48b8b937e79635</id>
<content type='text'>
In nvme_alloc_admin_tag_set(), an empty queue_limits struct is
currently allocated on the stack and passed by reference to
blk_mq_alloc_queue().

This is redundant because blk_mq_alloc_queue() already handles
a NULL limits pointer by internally substituting it with a default
empty queue_limits struct.
Remove the unnecessary local variable and pass a NULL value.

Reviewed-by: Kanchan Joshi &lt;joshi.k@samsung.com&gt;
Signed-off-by: Maurizio Lombardi &lt;mlombard@redhat.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In nvme_alloc_admin_tag_set(), an empty queue_limits struct is
currently allocated on the stack and passed by reference to
blk_mq_alloc_queue().

This is redundant because blk_mq_alloc_queue() already handles
a NULL limits pointer by internally substituting it with a default
empty queue_limits struct.
Remove the unnecessary local variable and pass a NULL value.

Reviewed-by: Kanchan Joshi &lt;joshi.k@samsung.com&gt;
Signed-off-by: Maurizio Lombardi &lt;mlombard@redhat.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-pci: Fix race bug in nvme_poll_irqdisable()</title>
<updated>2026-03-10T15:20:29+00:00</updated>
<author>
<name>Sungwoo Kim</name>
<email>iam@sung-woo.kim</email>
</author>
<published>2026-03-07T19:46:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fc71f409b22ca831a9f87a2712eaa09ef2bb4a5e'/>
<id>fc71f409b22ca831a9f87a2712eaa09ef2bb4a5e</id>
<content type='text'>
In the following scenario, pdev can be disabled between (1) and (3) by
(2). This sets pdev-&gt;msix_enabled = 0. Then, pci_irq_vector() will
return MSI-X IRQ(&gt;15) for (1) whereas return INTx IRQ(&lt;=15) for (2).
This causes IRQ warning because it tries to enable INTx IRQ that has
never been disabled before.

To fix this, save IRQ number into a local variable and ensure
disable_irq() and enable_irq() operate on the same IRQ number.  Even if
pci_free_irq_vectors() frees the IRQ concurrently, disable_irq() and
enable_irq() on a stale IRQ number is still valid and safe, and the
depth accounting reamins balanced.

task 1:
nvme_poll_irqdisable()
  disable_irq(pci_irq_vector(pdev, nvmeq-&gt;cq_vector)) ...(1)
  enable_irq(pci_irq_vector(pdev, nvmeq-&gt;cq_vector))  ...(3)

task 2:
nvme_reset_work()
  nvme_dev_disable()
    pdev-&gt;msix_enable = 0;  ...(2)

crash log:

------------[ cut here ]------------
Unbalanced enable for IRQ 10
WARNING: kernel/irq/manage.c:753 at __enable_irq+0x102/0x190 kernel/irq/manage.c:753, CPU#1: kworker/1:0H/26
Modules linked in:
CPU: 1 UID: 0 PID: 26 Comm: kworker/1:0H Not tainted 6.19.0-dirty #9 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:__enable_irq+0x107/0x190 kernel/irq/manage.c:753
Code: ff df 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 79 48 8d 3d 2e 7a 3f 05 41 8b 74 24 2c &lt;67&gt; 48 0f b9 3a e8 ef b9 21 00 5b 41 5c 5d e9 46 54 66 03 e8 e1 b9
RSP: 0018:ffffc900001bf550 EFLAGS: 00010046
RAX: 0000000000000007 RBX: 0000000000000000 RCX: ffffffffb20c0e90
RDX: 0000000000000000 RSI: 000000000000000a RDI: ffffffffb74b88f0
RBP: ffffc900001bf560 R08: ffff88800197cf00 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000003 R12: ffff8880012a6000
R13: 1ffff92000037eae R14: 000000000000000a R15: 0000000000000293
FS:  0000000000000000(0000) GS:ffff8880b49f7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555da4a25fa8 CR3: 00000000208e8000 CR4: 00000000000006f0
Call Trace:
 &lt;TASK&gt;
 enable_irq+0x121/0x1e0 kernel/irq/manage.c:797
 nvme_poll_irqdisable+0x162/0x1c0 drivers/nvme/host/pci.c:1494
 nvme_timeout+0x965/0x14b0 drivers/nvme/host/pci.c:1744
 blk_mq_rq_timed_out block/blk-mq.c:1653 [inline]
 blk_mq_handle_expired+0x227/0x2d0 block/blk-mq.c:1721
 bt_iter+0x2fc/0x3a0 block/blk-mq-tag.c:292
 __sbitmap_for_each_set include/linux/sbitmap.h:269 [inline]
 sbitmap_for_each_set include/linux/sbitmap.h:290 [inline]
 bt_for_each block/blk-mq-tag.c:324 [inline]
 blk_mq_queue_tag_busy_iter+0x969/0x1e80 block/blk-mq-tag.c:536
 blk_mq_timeout_work+0x627/0x870 block/blk-mq.c:1763
 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x65c/0xe60 kernel/workqueue.c:3421
 kthread+0x41a/0x930 kernel/kthread.c:463
 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 &lt;/TASK&gt;
irq event stamp: 74478
hardirqs last  enabled at (74477): [&lt;ffffffffb5720a9c&gt;] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last  enabled at (74477): [&lt;ffffffffb5720a9c&gt;] _raw_spin_unlock_irq+0x2c/0x60 kernel/locking/spinlock.c:202
hardirqs last disabled at (74478): [&lt;ffffffffb57207b5&gt;] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (74478): [&lt;ffffffffb57207b5&gt;] _raw_spin_lock_irqsave+0x85/0xa0 kernel/locking/spinlock.c:162
softirqs last  enabled at (74304): [&lt;ffffffffb1e9466c&gt;] __do_softirq kernel/softirq.c:656 [inline]
softirqs last  enabled at (74304): [&lt;ffffffffb1e9466c&gt;] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last  enabled at (74304): [&lt;ffffffffb1e9466c&gt;] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723
softirqs last disabled at (74287): [&lt;ffffffffb1e9466c&gt;] __do_softirq kernel/softirq.c:656 [inline]
softirqs last disabled at (74287): [&lt;ffffffffb1e9466c&gt;] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last disabled at (74287): [&lt;ffffffffb1e9466c&gt;] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723
---[ end trace 0000000000000000 ]---

Fixes: fa059b856a59 (nvme-pci: Simplify nvme_poll_irqdisable)
Acked-by: Chao Shi &lt;cshi008@fiu.edu&gt;
Acked-by: Weidong Zhu &lt;weizhu@fiu.edu&gt;
Acked-by: Dave Tian &lt;daveti@purdue.edu&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sungwoo Kim &lt;iam@sung-woo.kim&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In the following scenario, pdev can be disabled between (1) and (3) by
(2). This sets pdev-&gt;msix_enabled = 0. Then, pci_irq_vector() will
return MSI-X IRQ(&gt;15) for (1) whereas return INTx IRQ(&lt;=15) for (2).
This causes IRQ warning because it tries to enable INTx IRQ that has
never been disabled before.

To fix this, save IRQ number into a local variable and ensure
disable_irq() and enable_irq() operate on the same IRQ number.  Even if
pci_free_irq_vectors() frees the IRQ concurrently, disable_irq() and
enable_irq() on a stale IRQ number is still valid and safe, and the
depth accounting reamins balanced.

task 1:
nvme_poll_irqdisable()
  disable_irq(pci_irq_vector(pdev, nvmeq-&gt;cq_vector)) ...(1)
  enable_irq(pci_irq_vector(pdev, nvmeq-&gt;cq_vector))  ...(3)

task 2:
nvme_reset_work()
  nvme_dev_disable()
    pdev-&gt;msix_enable = 0;  ...(2)

crash log:

------------[ cut here ]------------
Unbalanced enable for IRQ 10
WARNING: kernel/irq/manage.c:753 at __enable_irq+0x102/0x190 kernel/irq/manage.c:753, CPU#1: kworker/1:0H/26
Modules linked in:
CPU: 1 UID: 0 PID: 26 Comm: kworker/1:0H Not tainted 6.19.0-dirty #9 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:__enable_irq+0x107/0x190 kernel/irq/manage.c:753
Code: ff df 48 89 fa 48 c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 79 48 8d 3d 2e 7a 3f 05 41 8b 74 24 2c &lt;67&gt; 48 0f b9 3a e8 ef b9 21 00 5b 41 5c 5d e9 46 54 66 03 e8 e1 b9
RSP: 0018:ffffc900001bf550 EFLAGS: 00010046
RAX: 0000000000000007 RBX: 0000000000000000 RCX: ffffffffb20c0e90
RDX: 0000000000000000 RSI: 000000000000000a RDI: ffffffffb74b88f0
RBP: ffffc900001bf560 R08: ffff88800197cf00 R09: 0000000000000001
R10: 0000000000000003 R11: 0000000000000003 R12: ffff8880012a6000
R13: 1ffff92000037eae R14: 000000000000000a R15: 0000000000000293
FS:  0000000000000000(0000) GS:ffff8880b49f7000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555da4a25fa8 CR3: 00000000208e8000 CR4: 00000000000006f0
Call Trace:
 &lt;TASK&gt;
 enable_irq+0x121/0x1e0 kernel/irq/manage.c:797
 nvme_poll_irqdisable+0x162/0x1c0 drivers/nvme/host/pci.c:1494
 nvme_timeout+0x965/0x14b0 drivers/nvme/host/pci.c:1744
 blk_mq_rq_timed_out block/blk-mq.c:1653 [inline]
 blk_mq_handle_expired+0x227/0x2d0 block/blk-mq.c:1721
 bt_iter+0x2fc/0x3a0 block/blk-mq-tag.c:292
 __sbitmap_for_each_set include/linux/sbitmap.h:269 [inline]
 sbitmap_for_each_set include/linux/sbitmap.h:290 [inline]
 bt_for_each block/blk-mq-tag.c:324 [inline]
 blk_mq_queue_tag_busy_iter+0x969/0x1e80 block/blk-mq-tag.c:536
 blk_mq_timeout_work+0x627/0x870 block/blk-mq.c:1763
 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x65c/0xe60 kernel/workqueue.c:3421
 kthread+0x41a/0x930 kernel/kthread.c:463
 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 &lt;/TASK&gt;
irq event stamp: 74478
hardirqs last  enabled at (74477): [&lt;ffffffffb5720a9c&gt;] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:159 [inline]
hardirqs last  enabled at (74477): [&lt;ffffffffb5720a9c&gt;] _raw_spin_unlock_irq+0x2c/0x60 kernel/locking/spinlock.c:202
hardirqs last disabled at (74478): [&lt;ffffffffb57207b5&gt;] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (74478): [&lt;ffffffffb57207b5&gt;] _raw_spin_lock_irqsave+0x85/0xa0 kernel/locking/spinlock.c:162
softirqs last  enabled at (74304): [&lt;ffffffffb1e9466c&gt;] __do_softirq kernel/softirq.c:656 [inline]
softirqs last  enabled at (74304): [&lt;ffffffffb1e9466c&gt;] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last  enabled at (74304): [&lt;ffffffffb1e9466c&gt;] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723
softirqs last disabled at (74287): [&lt;ffffffffb1e9466c&gt;] __do_softirq kernel/softirq.c:656 [inline]
softirqs last disabled at (74287): [&lt;ffffffffb1e9466c&gt;] invoke_softirq kernel/softirq.c:496 [inline]
softirqs last disabled at (74287): [&lt;ffffffffb1e9466c&gt;] __irq_exit_rcu+0xdc/0x120 kernel/softirq.c:723
---[ end trace 0000000000000000 ]---

Fixes: fa059b856a59 (nvme-pci: Simplify nvme_poll_irqdisable)
Acked-by: Chao Shi &lt;cshi008@fiu.edu&gt;
Acked-by: Weidong Zhu &lt;weizhu@fiu.edu&gt;
Acked-by: Dave Tian &lt;daveti@purdue.edu&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Sungwoo Kim &lt;iam@sung-woo.kim&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvmet: move async event work off nvmet-wq</title>
<updated>2026-03-10T15:20:28+00:00</updated>
<author>
<name>Chaitanya Kulkarni</name>
<email>kch@nvidia.com</email>
</author>
<published>2026-02-26T04:30:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2922e3507f6d5caa7f1d07f145e186fc6f317a4e'/>
<id>2922e3507f6d5caa7f1d07f145e186fc6f317a4e</id>
<content type='text'>
For target nvmet_ctrl_free() flushes ctrl-&gt;async_event_work.
If nvmet_ctrl_free() runs on nvmet-wq, the flush re-enters workqueue
completion for the same worker:-

A. Async event work queued on nvmet-wq (prior to disconnect):
  nvmet_execute_async_event()
     queue_work(nvmet_wq, &amp;ctrl-&gt;async_event_work)

  nvmet_add_async_event()
     queue_work(nvmet_wq, &amp;ctrl-&gt;async_event_work)

B. Full pre-work chain (RDMA CM path):
  nvmet_rdma_cm_handler()
     nvmet_rdma_queue_disconnect()
       __nvmet_rdma_queue_disconnect()
         queue_work(nvmet_wq, &amp;queue-&gt;release_work)
           process_one_work()
             lock((wq_completion)nvmet-wq)  &lt;--------- 1st
             nvmet_rdma_release_queue_work()

C. Recursive path (same worker):
  nvmet_rdma_release_queue_work()
     nvmet_rdma_free_queue()
       nvmet_sq_destroy()
         nvmet_ctrl_put()
           nvmet_ctrl_free()
             flush_work(&amp;ctrl-&gt;async_event_work)
               __flush_work()
                 touch_wq_lockdep_map()
                 lock((wq_completion)nvmet-wq) &lt;--------- 2nd

Lockdep splat:

  ============================================
  WARNING: possible recursive locking detected
  6.19.0-rc3nvme+ #14 Tainted: G                 N
  --------------------------------------------
  kworker/u192:42/44933 is trying to acquire lock:
  ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90

  but task is already holding lock:
  ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660

  3 locks held by kworker/u192:42/44933:
   #0: ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660
   #1: ffffc9000e6cbe28 ((work_completion)(&amp;queue-&gt;release_work)){+.+.}-{0:0}, at: process_one_work+0x1c5/0x660
   #2: ffffffff82d4db60 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x62/0x530

  Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
  Call Trace:
   __flush_work+0x268/0x530
   nvmet_ctrl_free+0x140/0x310 [nvmet]
   nvmet_cq_put+0x74/0x90 [nvmet]
   nvmet_rdma_free_queue+0x23/0xe0 [nvmet_rdma]
   nvmet_rdma_release_queue_work+0x19/0x50 [nvmet_rdma]
   process_one_work+0x206/0x660
   worker_thread+0x184/0x320
   kthread+0x10c/0x240
   ret_from_fork+0x319/0x390

Move async event work to a dedicated nvmet-aen-wq to avoid reentrant
flush on nvmet-wq.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For target nvmet_ctrl_free() flushes ctrl-&gt;async_event_work.
If nvmet_ctrl_free() runs on nvmet-wq, the flush re-enters workqueue
completion for the same worker:-

A. Async event work queued on nvmet-wq (prior to disconnect):
  nvmet_execute_async_event()
     queue_work(nvmet_wq, &amp;ctrl-&gt;async_event_work)

  nvmet_add_async_event()
     queue_work(nvmet_wq, &amp;ctrl-&gt;async_event_work)

B. Full pre-work chain (RDMA CM path):
  nvmet_rdma_cm_handler()
     nvmet_rdma_queue_disconnect()
       __nvmet_rdma_queue_disconnect()
         queue_work(nvmet_wq, &amp;queue-&gt;release_work)
           process_one_work()
             lock((wq_completion)nvmet-wq)  &lt;--------- 1st
             nvmet_rdma_release_queue_work()

C. Recursive path (same worker):
  nvmet_rdma_release_queue_work()
     nvmet_rdma_free_queue()
       nvmet_sq_destroy()
         nvmet_ctrl_put()
           nvmet_ctrl_free()
             flush_work(&amp;ctrl-&gt;async_event_work)
               __flush_work()
                 touch_wq_lockdep_map()
                 lock((wq_completion)nvmet-wq) &lt;--------- 2nd

Lockdep splat:

  ============================================
  WARNING: possible recursive locking detected
  6.19.0-rc3nvme+ #14 Tainted: G                 N
  --------------------------------------------
  kworker/u192:42/44933 is trying to acquire lock:
  ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90

  but task is already holding lock:
  ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660

  3 locks held by kworker/u192:42/44933:
   #0: ffff888118a00948 ((wq_completion)nvmet-wq){+.+.}-{0:0}, at: process_one_work+0x53e/0x660
   #1: ffffc9000e6cbe28 ((work_completion)(&amp;queue-&gt;release_work)){+.+.}-{0:0}, at: process_one_work+0x1c5/0x660
   #2: ffffffff82d4db60 (rcu_read_lock){....}-{1:3}, at: __flush_work+0x62/0x530

  Workqueue: nvmet-wq nvmet_rdma_release_queue_work [nvmet_rdma]
  Call Trace:
   __flush_work+0x268/0x530
   nvmet_ctrl_free+0x140/0x310 [nvmet]
   nvmet_cq_put+0x74/0x90 [nvmet]
   nvmet_rdma_free_queue+0x23/0xe0 [nvmet_rdma]
   nvmet_rdma_release_queue_work+0x19/0x50 [nvmet_rdma]
   process_one_work+0x206/0x660
   worker_thread+0x184/0x320
   kthread+0x10c/0x240
   ret_from_fork+0x319/0x390

Move async event work to a dedicated nvmet-aen-wq to avoid reentrant
flush on nvmet-wq.

Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>nvme-pci: Fix slab-out-of-bounds in nvme_dbbuf_set</title>
<updated>2026-03-10T15:19:53+00:00</updated>
<author>
<name>Sungwoo Kim</name>
<email>iam@sung-woo.kim</email>
</author>
<published>2026-03-08T18:20:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b4e78f1427c7d6859229ae9616df54e1fc05a516'/>
<id>b4e78f1427c7d6859229ae9616df54e1fc05a516</id>
<content type='text'>
dev-&gt;online_queues is a count incremented in nvme_init_queue. Thus,
valid indices are 0 through dev-&gt;online_queues − 1.

This patch fixes the loop condition to ensure the index stays within the
valid range. Index 0 is excluded because it is the admin queue.

KASAN splat:

==================================================================
BUG: KASAN: slab-out-of-bounds in nvme_dbbuf_free drivers/nvme/host/pci.c:377 [inline]
BUG: KASAN: slab-out-of-bounds in nvme_dbbuf_set+0x39c/0x400 drivers/nvme/host/pci.c:404
Read of size 2 at addr ffff88800592a574 by task kworker/u8:5/74

CPU: 0 UID: 0 PID: 74 Comm: kworker/u8:5 Not tainted 6.19.0-dirty #10 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: nvme-reset-wq nvme_reset_work
Call Trace:
 &lt;TASK&gt;
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xea/0x150 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xce/0x5d0 mm/kasan/report.c:482
 kasan_report+0xdc/0x110 mm/kasan/report.c:595
 __asan_report_load2_noabort+0x18/0x20 mm/kasan/report_generic.c:379
 nvme_dbbuf_free drivers/nvme/host/pci.c:377 [inline]
 nvme_dbbuf_set+0x39c/0x400 drivers/nvme/host/pci.c:404
 nvme_reset_work+0x36b/0x8c0 drivers/nvme/host/pci.c:3252
 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x65c/0xe60 kernel/workqueue.c:3421
 kthread+0x41a/0x930 kernel/kthread.c:463
 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 &lt;/TASK&gt;

Allocated by task 34 on cpu 1 at 4.241550s:
 kasan_save_stack+0x2c/0x60 mm/kasan/common.c:57
 kasan_save_track+0x1c/0x70 mm/kasan/common.c:78
 kasan_save_alloc_info+0x3c/0x50 mm/kasan/generic.c:570
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0xb5/0xc0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __do_kmalloc_node mm/slub.c:5657 [inline]
 __kmalloc_node_noprof+0x2bf/0x8d0 mm/slub.c:5663
 kmalloc_array_node_noprof include/linux/slab.h:1075 [inline]
 nvme_pci_alloc_dev drivers/nvme/host/pci.c:3479 [inline]
 nvme_probe+0x2f1/0x1820 drivers/nvme/host/pci.c:3534
 local_pci_probe+0xef/0x1c0 drivers/pci/pci-driver.c:324
 pci_call_probe drivers/pci/pci-driver.c:392 [inline]
 __pci_device_probe drivers/pci/pci-driver.c:417 [inline]
 pci_device_probe+0x743/0x920 drivers/pci/pci-driver.c:451
 call_driver_probe drivers/base/dd.c:583 [inline]
 really_probe+0x29b/0xb70 drivers/base/dd.c:661
 __driver_probe_device+0x3b0/0x4a0 drivers/base/dd.c:803
 driver_probe_device+0x56/0x1f0 drivers/base/dd.c:833
 __driver_attach_async_helper+0x155/0x340 drivers/base/dd.c:1159
 async_run_entry_fn+0xa6/0x4b0 kernel/async.c:129
 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x65c/0xe60 kernel/workqueue.c:3421
 kthread+0x41a/0x930 kernel/kthread.c:463
 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

The buggy address belongs to the object at ffff88800592a000
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 244 bytes to the right of
 allocated 1152-byte region [ffff88800592a000, ffff88800592a480)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5928
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0xfffffc0000040(head|node=0|zone=1|lastcpupid=0x1fffff)
page_type: f5(slab)
raw: 000fffffc0000040 ffff888001042000 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000
head: 000fffffc0000040 ffff888001042000 0000000000000000 dead000000000001
head: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000
head: 000fffffc0000003 ffffea0000164a01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88800592a400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88800592a480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
&gt;ffff88800592a500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                             ^
 ffff88800592a580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88800592a600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fixes: 0f0d2c876c96 (nvme: free sq/cq dbbuf pointers when dbbuf set fails)
Acked-by: Chao Shi &lt;cshi008@fiu.edu&gt;
Acked-by: Weidong Zhu &lt;weizhu@fiu.edu&gt;
Acked-by: Dave Tian &lt;daveti@purdue.edu&gt;
Signed-off-by: Sungwoo Kim &lt;iam@sung-woo.kim&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dev-&gt;online_queues is a count incremented in nvme_init_queue. Thus,
valid indices are 0 through dev-&gt;online_queues − 1.

This patch fixes the loop condition to ensure the index stays within the
valid range. Index 0 is excluded because it is the admin queue.

KASAN splat:

==================================================================
BUG: KASAN: slab-out-of-bounds in nvme_dbbuf_free drivers/nvme/host/pci.c:377 [inline]
BUG: KASAN: slab-out-of-bounds in nvme_dbbuf_set+0x39c/0x400 drivers/nvme/host/pci.c:404
Read of size 2 at addr ffff88800592a574 by task kworker/u8:5/74

CPU: 0 UID: 0 PID: 74 Comm: kworker/u8:5 Not tainted 6.19.0-dirty #10 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Workqueue: nvme-reset-wq nvme_reset_work
Call Trace:
 &lt;TASK&gt;
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0xea/0x150 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xce/0x5d0 mm/kasan/report.c:482
 kasan_report+0xdc/0x110 mm/kasan/report.c:595
 __asan_report_load2_noabort+0x18/0x20 mm/kasan/report_generic.c:379
 nvme_dbbuf_free drivers/nvme/host/pci.c:377 [inline]
 nvme_dbbuf_set+0x39c/0x400 drivers/nvme/host/pci.c:404
 nvme_reset_work+0x36b/0x8c0 drivers/nvme/host/pci.c:3252
 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x65c/0xe60 kernel/workqueue.c:3421
 kthread+0x41a/0x930 kernel/kthread.c:463
 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
 &lt;/TASK&gt;

Allocated by task 34 on cpu 1 at 4.241550s:
 kasan_save_stack+0x2c/0x60 mm/kasan/common.c:57
 kasan_save_track+0x1c/0x70 mm/kasan/common.c:78
 kasan_save_alloc_info+0x3c/0x50 mm/kasan/generic.c:570
 poison_kmalloc_redzone mm/kasan/common.c:398 [inline]
 __kasan_kmalloc+0xb5/0xc0 mm/kasan/common.c:415
 kasan_kmalloc include/linux/kasan.h:263 [inline]
 __do_kmalloc_node mm/slub.c:5657 [inline]
 __kmalloc_node_noprof+0x2bf/0x8d0 mm/slub.c:5663
 kmalloc_array_node_noprof include/linux/slab.h:1075 [inline]
 nvme_pci_alloc_dev drivers/nvme/host/pci.c:3479 [inline]
 nvme_probe+0x2f1/0x1820 drivers/nvme/host/pci.c:3534
 local_pci_probe+0xef/0x1c0 drivers/pci/pci-driver.c:324
 pci_call_probe drivers/pci/pci-driver.c:392 [inline]
 __pci_device_probe drivers/pci/pci-driver.c:417 [inline]
 pci_device_probe+0x743/0x920 drivers/pci/pci-driver.c:451
 call_driver_probe drivers/base/dd.c:583 [inline]
 really_probe+0x29b/0xb70 drivers/base/dd.c:661
 __driver_probe_device+0x3b0/0x4a0 drivers/base/dd.c:803
 driver_probe_device+0x56/0x1f0 drivers/base/dd.c:833
 __driver_attach_async_helper+0x155/0x340 drivers/base/dd.c:1159
 async_run_entry_fn+0xa6/0x4b0 kernel/async.c:129
 process_one_work+0x956/0x1aa0 kernel/workqueue.c:3257
 process_scheduled_works kernel/workqueue.c:3340 [inline]
 worker_thread+0x65c/0xe60 kernel/workqueue.c:3421
 kthread+0x41a/0x930 kernel/kthread.c:463
 ret_from_fork+0x6f8/0x8c0 arch/x86/kernel/process.c:158
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246

The buggy address belongs to the object at ffff88800592a000
 which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 244 bytes to the right of
 allocated 1152-byte region [ffff88800592a000, ffff88800592a480)

The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5928
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0xfffffc0000040(head|node=0|zone=1|lastcpupid=0x1fffff)
page_type: f5(slab)
raw: 000fffffc0000040 ffff888001042000 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000
head: 000fffffc0000040 ffff888001042000 0000000000000000 dead000000000001
head: 0000000000000000 0000000000080008 00000000f5000000 0000000000000000
head: 000fffffc0000003 ffffea0000164a01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88800592a400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88800592a480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
&gt;ffff88800592a500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                             ^
 ffff88800592a580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff88800592a600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fixes: 0f0d2c876c96 (nvme: free sq/cq dbbuf pointers when dbbuf set fails)
Acked-by: Chao Shi &lt;cshi008@fiu.edu&gt;
Acked-by: Weidong Zhu &lt;weizhu@fiu.edu&gt;
Acked-by: Dave Tian &lt;daveti@purdue.edu&gt;
Signed-off-by: Sungwoo Kim &lt;iam@sung-woo.kim&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
