linux-stable.git/drivers/nvme, branch v6.14.2

nvme-pci: skip nvme_write_sq_db on empty rqlist

2025-04-10T12:44:39+00:00

[ Upstream commit 288ff0d10beb069355036355d5f7612579dc869c ]

nvme_submit_cmds() should check the rqlist before calling
nvme_write_sq_db(); if the list is empty, it must return immediately.

Fixes: beadf0088501 ("nvme-pci: reverse request order in nvme_queue_rqs")
Signed-off-by: Maurizio Lombardi 
Signed-off-by: Keith Busch 
Signed-off-by: Sasha Levin

nvme/ioctl: don't warn on vectorized uring_cmd with fixed buffer

2025-04-10T12:44:39+00:00

[ Upstream commit eada75467fca0b016b9b22212637c07216135c20 ]

The vectorized io_uring NVMe passthru opcodes don't yet support fixed
buffers. But since userspace can trigger this condition based on the
io_uring SQE parameters, it shouldn't cause a kernel warning.

Signed-off-by: Caleb Sander Mateos 
Reviewed-by: Jens Axboe 
Reviewed-by: Chaitanya Kulkarni 
Reviewed-by: Christoph Hellwig 
Fixes: 23fd22e55b76 ("nvme: wire up fixed buffer support for nvme passthrough")
Signed-off-by: Keith Busch 
Signed-off-by: Sasha Levin

objtool, nvmet: Fix out-of-bounds stack access in nvmet_ctrl_state_show()

2025-04-10T12:44:35+00:00

[ Upstream commit 107a23185d990e3df6638d9a84c835f963fe30a6 ]

The csts_state_names[] array only has six sparse entries, but the
iteration code in nvmet_ctrl_state_show() iterates seven, resulting in a
potential out-of-bounds stack read.  Fix that.

Fixes the following warning with an UBSAN kernel:

  vmlinux.o: warning: objtool: .text.nvmet_ctrl_state_show: unexpected end of section

Fixes: 649fd41420a8 ("nvmet: add debugfs support")
Reported-by: kernel test robot 
Signed-off-by: Josh Poimboeuf 
Signed-off-by: Ingo Molnar 
Cc: Christoph Hellwig 
Cc: Sagi Grimberg 
Cc: Chaitanya Kulkarni 
Cc: Linus Torvalds 
Link: https://lore.kernel.org/r/f1f60858ee7a941863dc7f5506c540cb9f97b5f6.1742852847.git.jpoimboe@kernel.org
Closes: https://lore.kernel.org/oe-kbuild-all/202503171547.LlCTJLQL-lkp@intel.com/
Signed-off-by: Sasha Levin

nvmet: pci-epf: Always configure BAR0 as 64-bit

2025-04-10T12:44:11+00:00

[ Upstream commit 1cf0184c0ac4f1e936bb3b089894bbeb0a9eb2bc ]

NVMe PCIe Transport Specification 1.1, section 2.1.10, claims that the
BAR0 type is Implementation Specific.

However, in NVMe 1.1, the type is required to be 64-bit.

Thus, to make our PCI EPF work on as many host systems as possible,
always configure the BAR0 type to be 64-bit.

In the rare case that the underlying PCI EPC does not support configuring
BAR0 as 64-bit, the call to pci_epc_set_bar() will fail, and we will
return a failure back to the user.

This should not be a problem, as most PCI EPCs support configuring a BAR
as 64-bit (and those EPCs with .only_64bit set to true in epc_features
only support configuring the BAR as 64-bit).

Tested-by: Damien Le Moal 
Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver")
Signed-off-by: Niklas Cassel 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Chaitanya Kulkarni 
Signed-off-by: Keith Busch 
Signed-off-by: Sasha Levin

Merge tag 'nvme-6.14-2025-03-13' of git://git.infradead.org/nvme into block-6.14

2025-03-13T15:41:57+00:00

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.14

 - Concurrent pci error and hotplug handling fix (Keith)
 - Endpoint function fixes (Damien)"

* tag 'nvme-6.14-2025-03-13' of git://git.infradead.org/nvme:
  nvmet: pci-epf: Do not add an IRQ vector if not needed
  nvmet: pci-epf: Set NVMET_PCI_EPF_Q_LIVE when a queue is fully created
  nvme-pci: fix stuck reset on concurrent DPC and HP

block: change blk_mq_add_to_batch() third argument type to bool

2025-03-12T14:26:36+00:00

Commit 1f47ed294a2b ("block: cleanup and fix batch completion adding
conditions") modified the evaluation criteria for the third argument,
'ioerror', in the blk_mq_add_to_batch() function. Initially, the
function had checked if 'ioerror' equals zero. Following the commit, it
started checking for negative error values, with the presumption that
such values, for instance -EIO, would be passed in.

However, blk_mq_add_to_batch() callers do not pass negative error
values. Instead, they pass status codes defined in various ways:

- NVMe PCI and Apple drivers pass NVMe status code
- virtio_blk driver passes the virtblk request header status byte
- null_blk driver passes blk_status_t

These codes are either zero or positive, therefore the revised check
fails to function as intended. Specifically, with the NVMe PCI driver,
this modification led to the failure of the blktests test case nvme/039.
In this test scenario, errors are artificially injected to the NVMe
driver, resulting in positive NVMe status codes passed to
blk_mq_add_to_batch(), which unexpectedly processes the failed I/O in a
batch. Hence the failure.

To correct the ioerror check within blk_mq_add_to_batch(), make all
callers to uniformly pass the argument as boolean. Modify the callers to
check their specific status codes and pass the boolean value 'is_error'.
Also describe the arguments of blK_mq_add_to_batch as kerneldoc.

Fixes: 1f47ed294a2b ("block: cleanup and fix batch completion adding conditions")
Signed-off-by: Shin'ichiro Kawasaki 
Link: https://lore.kernel.org/r/20250311104359.1767728-3-shinichiro.kawasaki@wdc.com
[axboe: fold in documentation update]
Signed-off-by: Jens Axboe

nvme: move error logging from nvme_end_req() to __nvme_end_req()

2025-03-11T13:49:35+00:00

Before the Commit 1f47ed294a2b ("block: cleanup and fix batch completion
adding conditions"), blk_mq_add_to_batch() did not add failed
passthrough requests to batch, and returned false. After the commit,
blk_mq_add_to_batch() always adds passthrough requests to batch
regardless of whether the request failed or not, and returns true. This
affected error logging feature in the NVME driver.

Before the commit, the call chain of failed passthrough request was as
follows:

nvme_handle_cqe()
 blk_mq_add_to_batch() .. false is returned, then call nvme_pci_complete_rq()
 nvme_pci_complete_rq()
  nvme_complete_rq()
   nvme_end_req()
    nvme_log_err_passthru() .. error logging
    __nvme_end_req()        .. end of the rqeuest

After the commit, the call chain is as follows:

nvme_handle_cqe()
 blk_mq_add_to_batch() .. true is returned, then set nvme_pci_complete_batch()
 ..
 nvme_pci_complete_batch()
  nvme_complete_batch()
   nvme_complete_batch_req()
    __nvme_end_req() .. end of the request, without error logging

To make the error logging feature work again for passthrough requests, move the
nvme_log_err_passthru() call from nvme_end_req() to __nvme_end_req().

While at it, move nvme_log_error() call for non-passthrough requests together
with nvme_log_err_passthru(). Even though the trigger commit does not affect
non-passthrough requests, move it together for code simplicity.

Fixes: 1f47ed294a2b ("block: cleanup and fix batch completion adding conditions")
Signed-off-by: Shin'ichiro Kawasaki 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Christoph Hellwig 
Link: https://lore.kernel.org/r/20250311104359.1767728-2-shinichiro.kawasaki@wdc.com
Signed-off-by: Jens Axboe

nvmet: pci-epf: Do not add an IRQ vector if not needed

2025-03-10T17:12:16+00:00

The function nvmet_pci_epf_create_cq() always unconditionally calls
nvmet_pci_epf_add_irq_vector() to add an IRQ vector for a completion
queue. But this is not correct if the host requested the creation of a
completion queue for polling, without an IRQ vector specified (i.e. the
flag NVME_CQ_IRQ_ENABLED is not set).

Fix this by calling nvmet_pci_epf_add_irq_vector() and setting the queue
flag NVMET_PCI_EPF_Q_IRQ_ENABLED for the cq only if NVME_CQ_IRQ_ENABLED
is set. While at it, also fix the error path to add the missing removal
of the added IRQ vector if nvmet_cq_create() fails.

Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver")
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Keith Busch

nvmet: pci-epf: Set NVMET_PCI_EPF_Q_LIVE when a queue is fully created

2025-03-10T17:12:13+00:00

The function nvmet_pci_epf_create_sq() use test_and_set_bit() to check
that a submission queue is not already live and if not, set the
NVMET_PCI_EPF_Q_LIVE queue flag to declare the sq live (ready to use).
However, this is done on entry to the function, before the submission
queue is actually fully initialized and ready to use. This creates a
race situation with the function nvmet_pci_epf_poll_sqs_work() which
looks at the NVMET_PCI_EPF_Q_LIVE queue flag to poll the submission
queue when it is live. This race can lead to invalid DMA transfers if
nvmet_pci_epf_poll_sqs_work() runs after the NVMET_PCI_EPF_Q_LIVE flag
is set but before setting the sq pci address and doorbell ofset.

Avoid this race by only testing the NVMET_PCI_EPF_Q_LIVE flag on entry
to nvmet_pci_epf_create_sq() and setting it after the submission queue
is fully setup before nvmet_pci_epf_create_sq() returns success.
Since the function nvmet_pci_epf_create_cq() also has the same racy flag
setting pattern, also make a similar change in that function.

Fixes: 0faa0fe6f90e ("nvmet: New NVMe PCI endpoint function target driver")
Signed-off-by: Damien Le Moal 
Reviewed-by: Christoph Hellwig 
Signed-off-by: Keith Busch

nvme-pci: fix stuck reset on concurrent DPC and HP

2025-03-10T16:15:48+00:00

The PCIe error handling has the nvme driver quiesce the device, attempt
to restart it, then wait for that restart to complete.

A PCIe DPC event also toggles the PCIe link. If the slot doesn't have
out-of-band presence detection, this will trigger a pciehp
re-enumeration.

The error handling that calls nvme_error_resume is holding the device
lock while this happens. This lock blocks pciehp's request to disconnect
the driver from proceeding.

Meanwhile the nvme's reset can't make forward progress because its
device isn't there anymore with outstanding IO, and the timeout handler
won't do anything to fix it because the device is undergoing error
handling.

End result: deadlocked.

Fix this by having the timeout handler short cut the disabling for a
disconnected PCIe device. The downside is that we're relying on an IO
timeout to clean up this mess, which could be a minute by default.

Tested-by: Nilay Shroff 
Reviewed-by: Nilay Shroff 
Signed-off-by: Keith Busch