diff options
| author | Ankit Agrawal <ankita@nvidia.com> | 2026-06-02 06:30:15 +0000 |
|---|---|---|
| committer | Alex Williamson <alex@shazbot.org> | 2026-06-05 10:43:32 -0600 |
| commit | 682ecb14e83840e87ea36c6d7c16c5111ce18784 (patch) | |
| tree | 7ee29db3ff7ca90fada3e215b1e4943e61f171be /drivers/phy/eswin/git@git.tavy.me:linux.git | |
| parent | 40ef3edf151e184d021917a5c4c771cc0870844a (diff) | |
vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC
Add a CXL DVSEC-based readiness check for Blackwell-Next GPUs alongside
the existing legacy BAR0 polling path. The CXL Device DVSEC offset is
discovered at probe time. Probe, fault and read/write paths then branch
on that to use either the legacy BAR0 polling or the CXL DVSEC polling.
The CXL path polls Memory_Active, requiring MEM_INFO_VALID within 1s and
MEM_ACTIVE within Memory_Active_Timeout (up to 256s) as per CXL spec r4.0
sec 8.1.3.8.2. Given the long worst-case wait, the CXL poll runs outside
memory_lock with only a quick readiness check is done under the lock.
The poll loops sleep with schedule_timeout_killable() and return -EINTR
on a fatal signal. This avoids hung-task panics during the long
uninterruptible wait. Extend this to the legacy based wait as well for
improvement.
In the fault handler the wait runs locklessly before memory_lock. If a
reset races in, the in-lock recheck returns -EAGAIN and the wait is
retried rather than returning a spurious VM_FAULT_SIGBUS.
Add PCI_DVSEC_CXL_MEM_ACTIVE_TIMEOUT to pci_regs.h for the timeout field.
Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Alex Williamson <alex@shazbot.org>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260602063015.3915-1-ankita@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
Diffstat (limited to 'drivers/phy/eswin/git@git.tavy.me:linux.git')
0 files changed, 0 insertions, 0 deletions
