linux.git - Linux kernel source tree

author	Ankit Agrawal <ankita@nvidia.com>	2026-06-02 06:30:15 +0000
committer	Alex Williamson <alex@shazbot.org>	2026-06-05 10:43:32 -0600
commit	682ecb14e83840e87ea36c6d7c16c5111ce18784 (patch)
tree	7ee29db3ff7ca90fada3e215b1e4943e61f171be /drivers/phy/eswin/git@git.tavy.me:linux.git
parent	40ef3edf151e184d021917a5c4c771cc0870844a (diff)

vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC

Add a CXL DVSEC-based readiness check for Blackwell-Next GPUs alongside
the existing legacy BAR0 polling path.

The CXL Device DVSEC offset is discovered at probe time. The probe, fault
and read/write paths then branch on that offset to use either the legacy
BAR0 polling or the CXL DVSEC polling.

The CXL path polls Memory_Active, requiring MEM_INFO_VALID within 1s and
MEM_ACTIVE within Memory_Active_Timeout (up to 256s) as per CXL spec r4.0
sec 8.1.3.8.2. Given the long worst-case wait, the CXL poll runs outside
memory_lock, with only a quick readiness check done under the lock.

The poll loops sleep with schedule_timeout_killable() and return -EINTR
on a fatal signal. This avoids hung-task panics during what would
otherwise be a long uninterruptible wait. Extend the same killable sleep
to the legacy BAR0 wait as well.

In the fault handler the wait runs locklessly before memory_lock is
taken. If a reset races in, the in-lock recheck returns -EAGAIN and the
wait is retried rather than returning a spurious VM_FAULT_SIGBUS.

Add PCI_DVSEC_CXL_MEM_ACTIVE_TIMEOUT to pci_regs.h for the timeout field.

Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260602063015.3915-1-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
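
The two-phase wait described in the commit message can be illustrated with a small userspace simulation. This is only a sketch and is not the driver code from this patch. The following are assumptions made for illustration: the bit positions (MEM_INFO_VALID in bit 0, MEM_ACTIVE in bit 1, Memory_Active_Timeout in bits 15:13 with encodings 0..4 meaning 1s, 4s, 16s, 64s, 256s), the 100ms poll period, the use of -ETIMEDOUT on expiry, restarting the clock for the second phase, and every name below (struct fake_dev, fake_read, fake_sleep, poll_bit, cxl_wait_mem_active). The fake clock stands in for schedule_timeout_killable(), whose fatal-signal case maps to -EINTR.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of the DVSEC CXL Range 1 Size Low register. */
#define MEM_INFO_VALID            (1u << 0)
#define MEM_ACTIVE                (1u << 1)
#define MEM_ACTIVE_TIMEOUT_SHIFT  13
#define MEM_ACTIVE_TIMEOUT_MASK   (0x7u << MEM_ACTIVE_TIMEOUT_SHIFT)

#define MEM_INFO_VALID_TIMEOUT_MS 1000u /* "within 1s" per the commit message */
#define POLL_INTERVAL_MS          100u  /* hypothetical poll period */

/* Simulated device and clock (hypothetical; stands in for config space). */
struct fake_dev {
	uint32_t now_ms;       /* simulated current time */
	uint32_t valid_at_ms;  /* time at which MEM_INFO_VALID becomes set */
	uint32_t active_at_ms; /* time at which MEM_ACTIVE becomes set */
	uint32_t timeout_enc;  /* Memory_Active_Timeout encoding, 0..4 */
	uint32_t signal_at_ms; /* fatal signal arrival time; 0 means never */
};

/* Return the simulated register value for the current simulated time. */
uint32_t fake_read(struct fake_dev *d)
{
	uint32_t val = 0;

	if (d->now_ms >= d->valid_at_ms)
		val |= MEM_INFO_VALID |
		       (d->timeout_enc << MEM_ACTIVE_TIMEOUT_SHIFT);
	if (d->now_ms >= d->active_at_ms)
		val |= MEM_ACTIVE;
	return val;
}

/* Advance the clock; return false if a fatal signal has arrived. */
bool fake_sleep(struct fake_dev *d, uint32_t ms)
{
	d->now_ms += ms;
	return !(d->signal_at_ms && d->now_ms >= d->signal_at_ms);
}

/* Decode the timeout field: 1s, 4s, 16s, 64s, 256s; reserved values clamp. */
uint32_t mem_active_timeout_ms(uint32_t size_lo)
{
	uint32_t enc = (size_lo & MEM_ACTIVE_TIMEOUT_MASK) >>
		       MEM_ACTIVE_TIMEOUT_SHIFT;

	if (enc > 4)
		enc = 4;
	return 1000u << (2 * enc);
}

/* Poll until 'bit' is set, the timeout expires, or a fatal signal arrives. */
int poll_bit(struct fake_dev *d, uint32_t bit, uint32_t timeout_ms,
	     uint32_t *val)
{
	uint32_t waited = 0;

	for (;;) {
		*val = fake_read(d);
		if (*val & bit)
			return 0;
		if (waited >= timeout_ms)
			return -ETIMEDOUT;
		if (!fake_sleep(d, POLL_INTERVAL_MS))
			return -EINTR;
		waited += POLL_INTERVAL_MS;
	}
}

/* Phase 1: MEM_INFO_VALID within 1s. Phase 2: MEM_ACTIVE within the
 * device-advertised timeout, read from the now-valid register value. */
int cxl_wait_mem_active(struct fake_dev *d)
{
	uint32_t val;
	int ret = poll_bit(d, MEM_INFO_VALID, MEM_INFO_VALID_TIMEOUT_MS, &val);

	if (ret)
		return ret;
	return poll_bit(d, MEM_ACTIVE, mem_active_timeout_ms(val), &val);
}
```

A caller would fill in a struct fake_dev and call cxl_wait_mem_active(). The timeout field is decoded only after MEM_INFO_VALID is observed, since the sketch treats the register contents as meaningless before that point.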

Diffstat (limited to 'drivers/phy/eswin/git@git.tavy.me:linux.git')

0 files changed, 0 insertions, 0 deletions