<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/vfio/pci, branch master</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>vfio/pci: Expose latched module parameter policy in debugfs</title>
<updated>2026-06-29T20:26:31+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@nvidia.com</email>
</author>
<published>2026-06-15T19:12:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=586a989d8b0db75d6f97c80a4f588d46e1df6881'/>
<id>586a989d8b0db75d6f97c80a4f588d46e1df6881</id>
<content type='text'>
The nointxmask and disable_idle_d3 module parameters remain writable,
but vfio-pci now latches their values into each device at init.  Once a
device is registered, changing the module parameter only affects future
devices, leaving no direct way to confirm the effective policy for an
existing device.

Add a pci debugfs directory under the VFIO device debugfs root and
report the per-device nointxmask and disable_idle_d3 values.  These are
read-only debugfs views and use the same Y/N bool output convention as
the module parameters.

Read-only vfio-pci parameters, such as disable_vga, are not exposed here
because they cannot drift from the latched device value, therefore the
existing module parameter exposure via sysfs is sufficient.

Note that while only vfio-pci currently provides these options, the
implementation is in vfio-pci-core and therefore properly reflects the
device policy in the core, regardless of driver.

Assisted-by: OpenAI Codex:gpt-5
Cc: Guixin Liu &lt;kanie@linux.alibaba.com&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260615191241.688297-7-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The nointxmask and disable_idle_d3 module parameters remain writable,
but vfio-pci now latches their values into each device at init.  Once a
device is registered, changing the module parameter only affects future
devices, leaving no direct way to confirm the effective policy for an
existing device.

Add a pci debugfs directory under the VFIO device debugfs root and
report the per-device nointxmask and disable_idle_d3 values.  These are
read-only debugfs views and use the same Y/N bool output convention as
the module parameters.

Read-only vfio-pci parameters, such as disable_vga, are not exposed here
because they cannot drift from the latched device value, therefore the
existing module parameter exposure via sysfs is sufficient.

Note that while only vfio-pci currently provides these options, the
implementation is in vfio-pci-core and therefore properly reflects the
device policy in the core, regardless of driver.

Assisted-by: OpenAI Codex:gpt-5
Cc: Guixin Liu &lt;kanie@linux.alibaba.com&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260615191241.688297-7-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/pci: Latch all module parameters per device</title>
<updated>2026-06-29T19:46:50+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@nvidia.com</email>
</author>
<published>2026-06-15T19:12:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3788cd493e444742bc2ba252eb5c09ffc8b345dc'/>
<id>3788cd493e444742bc2ba252eb5c09ffc8b345dc</id>
<content type='text'>
The vfio-pci module parameters of disable_idle_d3, nointxmask, and
disable_vga latch vfio-pci policy into vfio-pci-core globals each time
the vfio-pci module is initialized.  The disable_idle_d3 parameter has
already migrated to a per-device flag in order to provide consistency
for refcounted PM operations for the lifetime of the device
registration.

Pull the remaining vfio-pci module-parameter policy out of vfio-pci-core
into per-device flags set at device initialization.

This also restores the mutable aspect of the disable_idle_d3 and
nointxmask module parameters for vfio-pci, with the caveat that the
parameters are latched into the device at probe.

A notable change for variant drivers is that their devices are no longer
affected by vfio-pci module parameters and those drivers may need to
adopt similar module parameters if any devices have a hidden dependency
on vfio-pci setting non-default policy.

Assisted-by: Claude:claude-opus-4-8
Acked-by: Chengwen Feng &lt;fengchengwen@huawei.com&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260615191241.688297-6-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The vfio-pci module parameters of disable_idle_d3, nointxmask, and
disable_vga latch vfio-pci policy into vfio-pci-core globals each time
the vfio-pci module is initialized.  The disable_idle_d3 parameter has
already migrated to a per-device flag in order to provide consistency
for refcounted PM operations for the lifetime of the device
registration.

Pull the remaining vfio-pci module-parameter policy out of vfio-pci-core
into per-device flags set at device initialization.

This also restores the mutable aspect of the disable_idle_d3 and
nointxmask module parameters for vfio-pci, with the caveat that the
parameters are latched into the device at probe.

A notable change for variant drivers is that their devices are no longer
affected by vfio-pci module parameters and those drivers may need to
adopt similar module parameters if any devices have a hidden dependency
on vfio-pci setting non-default policy.

Assisted-by: Claude:claude-opus-4-8
Acked-by: Chengwen Feng &lt;fengchengwen@huawei.com&gt;
Signed-off-by: Alex Williamson &lt;alex.williamson@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260615191241.688297-6-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/mlx5: Fix racy bitfields and tighten struct layout</title>
<updated>2026-06-29T19:46:50+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@nvidia.com</email>
</author>
<published>2026-06-15T19:12:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f2365a63b02ddea32e7db78b742c2503ec7b81f1'/>
<id>f2365a63b02ddea32e7db78b742c2503ec7b81f1</id>
<content type='text'>
Bitfield operations are not atomic, they use a read-modify-write
pattern, therefore we should be careful not to pack bitfields that
can be concurrently updated into the same storage unit.

This split takes a binary approach: flags that are only modified
pre/post open/close remain bitfields, flags modified from user
action, including actions that reach across to another device (ex.
reset) use dedicated storage units.

Note mlx5_vhca_page_tracker.status is relocated to fill the alignment
hole this split exposes.

Bitfield justifications:

  migrate_cap: written only in mlx5vf_cmd_set_migratable() at probe
  chunk_mode: written only in mlx5vf_cmd_set_migratable() at probe
  mig_state_cap: written only in mlx5vf_cmd_set_migratable() at probe

Dedicated storage units:

  mdev_detach: written in the VF attach/detach event notifier
               mlx5fv_vf_event() at runtime
  log_active: written in mlx5vf_start_page_tracker()/
              mlx5vf_stop_page_tracker() during runtime dirty tracking
  deferred_reset: written in mlx5vf_state_mutex_unlock()/
                  mlx5vf_pci_aer_reset_done() during runtime reset handling
  is_err: set by tracker error handling and dirty-log polling at runtime
  object_changed: set by tracker event handling and cleared by dirty-log
                  polling at runtime

Fixes: 61a2f1460fd0 ("vfio/mlx5: Manage the VF attach/detach callback from the PF")
Fixes: 79c3cf279926 ("vfio/mlx5: Init QP based resources for dirty tracking")
Fixes: f886473071d6 ("vfio/mlx5: Add support for tracker object change event")
Cc: Yishai Hadas &lt;yishaih@nvidia.com&gt;
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Alex Williamson &lt;alex.williamson@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260615191241.688297-5-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Bitfield operations are not atomic, they use a read-modify-write
pattern, therefore we should be careful not to pack bitfields that
can be concurrently updated into the same storage unit.

This split takes a binary approach: flags that are only modified
pre/post open/close remain bitfields, flags modified from user
action, including actions that reach across to another device (ex.
reset) use dedicated storage units.

Note mlx5_vhca_page_tracker.status is relocated to fill the alignment
hole this split exposes.

Bitfield justifications:

  migrate_cap: written only in mlx5vf_cmd_set_migratable() at probe
  chunk_mode: written only in mlx5vf_cmd_set_migratable() at probe
  mig_state_cap: written only in mlx5vf_cmd_set_migratable() at probe

Dedicated storage units:

  mdev_detach: written in the VF attach/detach event notifier
               mlx5fv_vf_event() at runtime
  log_active: written in mlx5vf_start_page_tracker()/
              mlx5vf_stop_page_tracker() during runtime dirty tracking
  deferred_reset: written in mlx5vf_state_mutex_unlock()/
                  mlx5vf_pci_aer_reset_done() during runtime reset handling
  is_err: set by tracker error handling and dirty-log polling at runtime
  object_changed: set by tracker event handling and cleared by dirty-log
                  polling at runtime

Fixes: 61a2f1460fd0 ("vfio/mlx5: Manage the VF attach/detach callback from the PF")
Fixes: 79c3cf279926 ("vfio/mlx5: Init QP based resources for dirty tracking")
Fixes: f886473071d6 ("vfio/mlx5: Add support for tracker object change event")
Cc: Yishai Hadas &lt;yishaih@nvidia.com&gt;
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Alex Williamson &lt;alex.williamson@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260615191241.688297-5-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/pci: Release the VGA arbiter client on register_device() failure</title>
<updated>2026-06-29T19:46:49+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@nvidia.com</email>
</author>
<published>2026-06-15T19:12:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=daedde7f024ecf88bc8e832ed40cf2c795f0796a'/>
<id>daedde7f024ecf88bc8e832ed40cf2c795f0796a</id>
<content type='text'>
The re-order in the Fixes commit below displaced vfio_pci_vga_init() as
the last failure point of what is now vfio_pci_core_register_device()
without introducing an unwind for the VGA arbiter registration.

In current kernels this is mostly benign because vfio_pci_set_decode()
only uses pci_dev state, but the original failure path could leave a
callback with a freed vdev cookie.  The stale registration also becomes
unsafe again once the callback follows drvdata to the vfio device.

Add the required VGA unwind callout.

Fixes: 4aeec3984ddc ("vfio/pci: Re-order vfio_pci_probe()")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Alex Williamson &lt;alex.williamson@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260615191241.688297-3-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The re-order in the Fixes commit below displaced vfio_pci_vga_init() as
the last failure point of what is now vfio_pci_core_register_device()
without introducing an unwind for the VGA arbiter registration.

In current kernels this is mostly benign because vfio_pci_set_decode()
only uses pci_dev state, but the original failure path could leave a
callback with a freed vdev cookie.  The stale registration also becomes
unsafe again once the callback follows drvdata to the vfio device.

Add the required VGA unwind callout.

Fixes: 4aeec3984ddc ("vfio/pci: Re-order vfio_pci_probe()")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Alex Williamson &lt;alex.williamson@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260615191241.688297-3-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/pci: Latch disable_idle_d3 per device</title>
<updated>2026-06-29T19:46:49+00:00</updated>
<author>
<name>Alex Williamson</name>
<email>alex.williamson@nvidia.com</email>
</author>
<published>2026-06-15T19:12:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4575e9aac5336d1365138c0284773bf8da4b1fa3'/>
<id>4575e9aac5336d1365138c0284773bf8da4b1fa3</id>
<content type='text'>
When disable_idle_d3 was introduced in vfio-pci, it directly manipulated
the device power state with pci_set_power_state().  There were no
refcounts to maintain or balanced operations, we could unconditionally
bring the device to D0 and conditionally move it to D3hot.  Therefore
the module parameter was made writable.

Later, in commit c61302aa48f7 ("vfio/pci: Move module parameters to
vfio_pci.c"), as part of the vfio-pci-core split, the writable aspect
of the module parameter was nullified.  The parameter value could still
be changed through sysfs, but the vfio-pci driver latched the values
into vfio-pci-core globals at module init.  Loading the vfio-pci module,
or unloading and reloading, with non-default or different values could
change the globals relative to existing devices bound to vfio-pci
variant drivers.

Runtime PM was introduced in commit 7ab5e10eda02 ("vfio/pci: Move the
unused device into low power state with runtime PM"), which marks the
point where power states became refcounted.  PM get and put operations
need to be balanced, but the same module operations noted above can
change the global variables relative to those devices already bound to
vfio-pci variant drivers.  This introduces a window where PM operations
can now become unbalanced.

To resolve this with a narrow footprint for stable backports, the
disable_idle_d3 flag is latched into the vfio_pci_core_device at the
time of initialization, such that the device always operates with a
consistent value.

NB. vfio_pci_dev_set_try_reset() now unconditionally raises the
runtime PM usage count around bus reset to account for disable_idle_d3
becoming a per-device rather than global flag.  When this flag is set,
the additional get/put pair is harmless and allows continued use of the
shared vfio_pci_dev_set_pm_runtime_get() helper.

Fixes: 7ab5e10eda02 ("vfio/pci: Move the unused device into low power state with runtime PM")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Alex Williamson &lt;alex.williamson@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260615191241.688297-2-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When disable_idle_d3 was introduced in vfio-pci, it directly manipulated
the device power state with pci_set_power_state().  There were no
refcounts to maintain or balanced operations, we could unconditionally
bring the device to D0 and conditionally move it to D3hot.  Therefore
the module parameter was made writable.

Later, in commit c61302aa48f7 ("vfio/pci: Move module parameters to
vfio_pci.c"), as part of the vfio-pci-core split, the writable aspect
of the module parameter was nullified.  The parameter value could still
be changed through sysfs, but the vfio-pci driver latched the values
into vfio-pci-core globals at module init.  Loading the vfio-pci module,
or unloading and reloading, with non-default or different values could
change the globals relative to existing devices bound to vfio-pci
variant drivers.

Runtime PM was introduced in commit 7ab5e10eda02 ("vfio/pci: Move the
unused device into low power state with runtime PM"), which marks the
point where power states became refcounted.  PM get and put operations
need to be balanced, but the same module operations noted above can
change the global variables relative to those devices already bound to
vfio-pci variant drivers.  This introduces a window where PM operations
can now become unbalanced.

To resolve this with a narrow footprint for stable backports, the
disable_idle_d3 flag is latched into the vfio_pci_core_device at the
time of initialization, such that the device always operates with a
consistent value.

NB. vfio_pci_dev_set_try_reset() now unconditionally raises the
runtime PM usage count around bus reset to account for disable_idle_d3
becoming a per-device rather than global flag.  When this flag is set,
the additional get/put pair is harmless and allows continued use of the
shared vfio_pci_dev_set_pm_runtime_get() helper.

Fixes: 7ab5e10eda02 ("vfio/pci: Move the unused device into low power state with runtime PM")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Alex Williamson &lt;alex.williamson@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260615191241.688297-2-alex.williamson@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/qat: fix f_pos race in qat_vf_resume_write()</title>
<updated>2026-06-10T20:33:05+00:00</updated>
<author>
<name>Giovanni Cabiddu</name>
<email>giovanni.cabiddu@intel.com</email>
</author>
<published>2026-06-08T15:12:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4ec5e932e636896e97e4c6a8205b0ac76d52421a'/>
<id>4ec5e932e636896e97e4c6a8205b0ac76d52421a</id>
<content type='text'>
qat_vf_resume_write() checks filp-&gt;f_pos before taking migf-&gt;lock, but
copies into the migration-state buffer after taking the lock and
re-reading the shared file position.

Two concurrent writers could therefore pass the bounds check with the
old offset, then have the second writer copy after the first advanced
f_pos, writing past the end of the migration-state buffer.

Take migf-&gt;lock before doing the boundary checks.

Fixes: bb208810b1ab ("vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices")
Reviewed-by: Ahsan Atta &lt;ahsan.atta@intel.com&gt;
Signed-off-by: Giovanni Cabiddu &lt;giovanni.cabiddu@intel.com&gt;
Link: https://lore.kernel.org/r/20260608151317.136613-1-giovanni.cabiddu@intel.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
qat_vf_resume_write() checks filp-&gt;f_pos before taking migf-&gt;lock, but
copies into the migration-state buffer after taking the lock and
re-reading the shared file position.

Two concurrent writers could therefore pass the bounds check with the
old offset, then have the second writer copy after the first advanced
f_pos, writing past the end of the migration-state buffer.

Take migf-&gt;lock before doing the boundary checks.

Fixes: bb208810b1ab ("vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices")
Reviewed-by: Ahsan Atta &lt;ahsan.atta@intel.com&gt;
Signed-off-by: Giovanni Cabiddu &lt;giovanni.cabiddu@intel.com&gt;
Link: https://lore.kernel.org/r/20260608151317.136613-1-giovanni.cabiddu@intel.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/nvgrace-gpu: Add Blackwell-Next GPU readiness check via CXL DVSEC</title>
<updated>2026-06-05T16:43:32+00:00</updated>
<author>
<name>Ankit Agrawal</name>
<email>ankita@nvidia.com</email>
</author>
<published>2026-06-02T06:30:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=682ecb14e83840e87ea36c6d7c16c5111ce18784'/>
<id>682ecb14e83840e87ea36c6d7c16c5111ce18784</id>
<content type='text'>
Add a CXL DVSEC-based readiness check for Blackwell-Next GPUs alongside
the existing legacy BAR0 polling path. The CXL Device DVSEC offset is
discovered at probe time. Probe, fault and read/write paths then branch
on that to use either the legacy BAR0 polling or the CXL DVSEC polling.

The CXL path polls Memory_Active, requiring MEM_INFO_VALID within 1s and
MEM_ACTIVE within Memory_Active_Timeout (up to 256s) as per CXL spec r4.0
sec 8.1.3.8.2. Given the long worst-case wait, the CXL poll runs outside
memory_lock with only a quick readiness check is done under the lock.

The poll loops sleep with schedule_timeout_killable() and return -EINTR
on a fatal signal. This avoids hung-task panics during the long
uninterruptible wait. Extend this to the legacy based wait as well for
improvement.

In the fault handler the wait runs locklessly before memory_lock. If a
reset races in, the in-lock recheck returns -EAGAIN and the wait is
retried rather than returning a spurious VM_FAULT_SIGBUS.

Add PCI_DVSEC_CXL_MEM_ACTIVE_TIMEOUT to pci_regs.h for the timeout field.

Cc: Ilpo Järvinen &lt;ilpo.jarvinen@linux.intel.com&gt;
Cc: Kevin Tian &lt;kevin.tian@intel.com&gt;
Suggested-by: Alex Williamson &lt;alex@shazbot.org&gt;
Signed-off-by: Ankit Agrawal &lt;ankita@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260602063015.3915-1-ankita@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a CXL DVSEC-based readiness check for Blackwell-Next GPUs alongside
the existing legacy BAR0 polling path. The CXL Device DVSEC offset is
discovered at probe time. Probe, fault and read/write paths then branch
on that to use either the legacy BAR0 polling or the CXL DVSEC polling.

The CXL path polls Memory_Active, requiring MEM_INFO_VALID within 1s and
MEM_ACTIVE within Memory_Active_Timeout (up to 256s) as per CXL spec r4.0
sec 8.1.3.8.2. Given the long worst-case wait, the CXL poll runs outside
memory_lock with only a quick readiness check is done under the lock.

The poll loops sleep with schedule_timeout_killable() and return -EINTR
on a fatal signal. This avoids hung-task panics during the long
uninterruptible wait. Extend this to the legacy based wait as well for
improvement.

In the fault handler the wait runs locklessly before memory_lock. If a
reset races in, the in-lock recheck returns -EAGAIN and the wait is
retried rather than returning a spurious VM_FAULT_SIGBUS.

Add PCI_DVSEC_CXL_MEM_ACTIVE_TIMEOUT to pci_regs.h for the timeout field.

Cc: Ilpo Järvinen &lt;ilpo.jarvinen@linux.intel.com&gt;
Cc: Kevin Tian &lt;kevin.tian@intel.com&gt;
Suggested-by: Alex Williamson &lt;alex@shazbot.org&gt;
Signed-off-by: Ankit Agrawal &lt;ankita@nvidia.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Link: https://lore.kernel.org/r/20260602063015.3915-1-ankita@nvidia.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/pci: Use a private flag to prevent power state change with VFs</title>
<updated>2026-05-22T15:14:16+00:00</updated>
<author>
<name>Raghavendra Rao Ananta</name>
<email>rananta@google.com</email>
</author>
<published>2026-05-14T17:34:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=40ef3edf151e184d021917a5c4c771cc0870844a'/>
<id>40ef3edf151e184d021917a5c4c771cc0870844a</id>
<content type='text'>
The current implementation uses pci_num_vf() while holding the
memory_lock to prevent changing the power state of a PF when
VFs are enabled. This creates a lockdep circular dependency
warning because memory_lock is held during device probing.

[  286.997167] ======================================================
[  287.003363] WARNING: possible circular locking dependency detected
[  287.009562] 7.0.0-dbg-DEV #3 Tainted: G S
[  287.015074] ------------------------------------------------------
[  287.021270] vfio_pci_sriov_/18636 is trying to acquire lock:
[  287.026942] ff45bea2294d4968 (&amp;vdev-&gt;memory_lock){+.+.}-{4:4}, at:
vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.036530]
[  287.036530] but task is already holding lock:
[  287.042383] ff45bea3a96b8230 (&amp;new_dev_set-&gt;lock){+.+.}-{4:4}, at:
vfio_group_fops_unl_ioctl+0x44d/0x7b0
[  287.051879]
[  287.051879] which lock already depends on the new lock.
[  287.051879]
[  287.060070]
[  287.060070] the existing dependency chain (in reverse order) is:
[  287.067568]
[  287.067568] -&gt; #2 (&amp;new_dev_set-&gt;lock){+.+.}-{4:4}:
[  287.073941]        __mutex_lock+0x92/0xb80
[  287.078058]        vfio_assign_device_set+0x66/0x1b0
[  287.083042]        vfio_pci_core_register_device+0xd1/0x2a0
[  287.088638]        vfio_pci_probe+0xd2/0x100
[  287.092933]        local_pci_probe_callback+0x4d/0xa0
[  287.098001]        process_scheduled_works+0x2ca/0x680
[  287.103158]        worker_thread+0x1e8/0x2f0
[  287.107452]        kthread+0x10c/0x140
[  287.111230]        ret_from_fork+0x18e/0x360
[  287.115519]        ret_from_fork_asm+0x1a/0x30
[  287.119983]
[  287.119983] -&gt; #1 ((work_completion)(&amp;arg.work)){+.+.}-{0:0}:
[  287.127219]        __flush_work+0x345/0x490
[  287.131429]        pci_device_probe+0x2e3/0x490
[  287.135979]        really_probe+0x1f9/0x4e0
[  287.140180]        __driver_probe_device+0x77/0x100
[  287.145079]        driver_probe_device+0x1e/0x110
[  287.149803]        __device_attach_driver+0xe3/0x170
[  287.154789]        bus_for_each_drv+0x125/0x150
[  287.159346]        __device_attach+0xca/0x1a0
[  287.163720]        device_initial_probe+0x34/0x50
[  287.168445]        pci_bus_add_device+0x6e/0x90
[  287.172995]        pci_iov_add_virtfn+0x3c9/0x3e0
[  287.177719]        sriov_add_vfs+0x2c/0x60
[  287.181838]        sriov_enable+0x306/0x4a0
[  287.186038]        vfio_pci_core_sriov_configure+0x184/0x220
[  287.191715]        sriov_numvfs_store+0xd9/0x1c0
[  287.196351]        kernfs_fop_write_iter+0x13f/0x1d0
[  287.201338]        vfs_write+0x2be/0x3b0
[  287.205286]        ksys_write+0x73/0x100
[  287.209233]        do_syscall_64+0x14d/0x750
[  287.213529]        entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  287.219120]
[  287.219120] -&gt; #0 (&amp;vdev-&gt;memory_lock){+.+.}-{4:4}:
[  287.225491]        __lock_acquire+0x14c6/0x2800
[  287.230048]        lock_acquire+0xd3/0x2f0
[  287.234168]        down_write+0x3a/0xc0
[  287.238019]        vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.243436]        __rpm_callback+0x8c/0x310
[  287.247730]        rpm_resume+0x529/0x6f0
[  287.251765]        __pm_runtime_resume+0x68/0x90
[  287.256402]        vfio_pci_core_enable+0x44/0x310
[  287.261216]        vfio_pci_open_device+0x1c/0x80
[  287.265947]        vfio_df_open+0x10f/0x150
[  287.270148]        vfio_group_fops_unl_ioctl+0x4a4/0x7b0
[  287.275476]        __se_sys_ioctl+0x71/0xc0
[  287.279679]        do_syscall_64+0x14d/0x750
[  287.283975]        entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  287.289559]
[  287.289559] other info that might help us debug this:
[  287.289559]
[  287.297582] Chain exists of:
[  287.297582]   &amp;vdev-&gt;memory_lock --&gt; (work_completion)(&amp;arg.work)
--&gt; &amp;new_dev_set-&gt;lock
[  287.297582]
[  287.310023]  Possible unsafe locking scenario:
[  287.310023]
[  287.315961]        CPU0                    CPU1
[  287.320510]        ----                    ----
[  287.325059]   lock(&amp;new_dev_set-&gt;lock);
[  287.328917]
lock((work_completion)(&amp;arg.work));
[  287.336153]                                lock(&amp;new_dev_set-&gt;lock);
[  287.342523]   lock(&amp;vdev-&gt;memory_lock);
[  287.346382]
[  287.346382]  *** DEADLOCK ***
[  287.346382]
[  287.352315] 2 locks held by vfio_pci_sriov_/18636:
[  287.357125]  #0: ff45bea208ed3e18 (&amp;group-&gt;group_lock){+.+.}-{4:4},
at: vfio_group_fops_unl_ioctl+0x3e3/0x7b0
[  287.367048]  #1: ff45bea3a96b8230 (&amp;new_dev_set-&gt;lock){+.+.}-{4:4},
at: vfio_group_fops_unl_ioctl+0x44d/0x7b0
[  287.376976]
[  287.376976] stack backtrace:
[  287.381353] CPU: 191 UID: 0 PID: 18636 Comm: vfio_pci_sriov_
Tainted: G S                  7.0.0-dbg-DEV #3 PREEMPTLAZY
[  287.381355] Tainted: [S]=CPU_OUT_OF_SPEC
[  287.381356] Call Trace:
[  287.381357]  &lt;TASK&gt;
[  287.381358]  dump_stack_lvl+0x54/0x70
[  287.381361]  print_circular_bug+0x2e1/0x300
[  287.381363]  check_noncircular+0xf9/0x120
[  287.381364]  ? __lock_acquire+0x5b4/0x2800
[  287.381366]  __lock_acquire+0x14c6/0x2800
[  287.381368]  ? pci_mmcfg_read+0x4f/0x220
[  287.381370]  ? pci_mmcfg_write+0x57/0x220
[  287.381371]  ? lock_acquire+0xd3/0x2f0
[  287.381373]  ? pci_mmcfg_write+0x57/0x220
[  287.381374]  ? lock_release+0xef/0x360
[  287.381376]  ? vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.381377]  lock_acquire+0xd3/0x2f0
[  287.381378]  ? vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.381379]  ? lock_is_held_type+0x76/0x100
[  287.381382]  down_write+0x3a/0xc0
[  287.381382]  ? vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.381383]  vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.381384]  ? __pfx_pci_pm_runtime_resume+0x10/0x10
[  287.381385]  __rpm_callback+0x8c/0x310
[  287.381386]  ? ktime_get_mono_fast_ns+0x3d/0xb0
[  287.381389]  ? __pfx_pci_pm_runtime_resume+0x10/0x10
[  287.381390]  rpm_resume+0x529/0x6f0
[  287.381392]  ? lock_is_held_type+0x76/0x100
[  287.381394]  __pm_runtime_resume+0x68/0x90
[  287.381396]  vfio_pci_core_enable+0x44/0x310
[  287.381398]  vfio_pci_open_device+0x1c/0x80
[  287.381399]  vfio_df_open+0x10f/0x150
[  287.381401]  vfio_group_fops_unl_ioctl+0x4a4/0x7b0
[  287.381402]  __se_sys_ioctl+0x71/0xc0
[  287.381404]  do_syscall_64+0x14d/0x750
[  287.381405]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  287.381406]  ? trace_irq_disable+0x25/0xd0
[  287.381409]  entry_SYSCALL_64_after_hwframe+0x77/0x7f

Introduce a private flag 'sriov_active' in the vfio_pci_core_device
struct. This  allows the driver to track the SR-IOV power state requirement
without  relying on pci_num_vf() while holding the memory_lock. The lock is
now  only held to set the flag and ensure the device is in D0, after which
pci_enable_sriov() can be called without the lock.

Fixes: f4162eb1e2fc ("vfio/pci: Change the PF power state to D0 before enabling VFs")
Cc: stable@vger.kernel.org
Suggested-by: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Suggested-by: Alex Williamson &lt;alex@shazbot.org&gt;
Signed-off-by: Raghavendra Rao Ananta &lt;rananta@google.com&gt;
Link: https://lore.kernel.org/r/20260514173449.3282188-1-rananta@google.com
[promote bitfield to plain bool to avoid storage-unit races]
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current implementation uses pci_num_vf() while holding the
memory_lock to prevent changing the power state of a PF when
VFs are enabled. This creates a lockdep circular dependency
warning because memory_lock is held during device probing.

[  286.997167] ======================================================
[  287.003363] WARNING: possible circular locking dependency detected
[  287.009562] 7.0.0-dbg-DEV #3 Tainted: G S
[  287.015074] ------------------------------------------------------
[  287.021270] vfio_pci_sriov_/18636 is trying to acquire lock:
[  287.026942] ff45bea2294d4968 (&amp;vdev-&gt;memory_lock){+.+.}-{4:4}, at:
vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.036530]
[  287.036530] but task is already holding lock:
[  287.042383] ff45bea3a96b8230 (&amp;new_dev_set-&gt;lock){+.+.}-{4:4}, at:
vfio_group_fops_unl_ioctl+0x44d/0x7b0
[  287.051879]
[  287.051879] which lock already depends on the new lock.
[  287.051879]
[  287.060070]
[  287.060070] the existing dependency chain (in reverse order) is:
[  287.067568]
[  287.067568] -&gt; #2 (&amp;new_dev_set-&gt;lock){+.+.}-{4:4}:
[  287.073941]        __mutex_lock+0x92/0xb80
[  287.078058]        vfio_assign_device_set+0x66/0x1b0
[  287.083042]        vfio_pci_core_register_device+0xd1/0x2a0
[  287.088638]        vfio_pci_probe+0xd2/0x100
[  287.092933]        local_pci_probe_callback+0x4d/0xa0
[  287.098001]        process_scheduled_works+0x2ca/0x680
[  287.103158]        worker_thread+0x1e8/0x2f0
[  287.107452]        kthread+0x10c/0x140
[  287.111230]        ret_from_fork+0x18e/0x360
[  287.115519]        ret_from_fork_asm+0x1a/0x30
[  287.119983]
[  287.119983] -&gt; #1 ((work_completion)(&amp;arg.work)){+.+.}-{0:0}:
[  287.127219]        __flush_work+0x345/0x490
[  287.131429]        pci_device_probe+0x2e3/0x490
[  287.135979]        really_probe+0x1f9/0x4e0
[  287.140180]        __driver_probe_device+0x77/0x100
[  287.145079]        driver_probe_device+0x1e/0x110
[  287.149803]        __device_attach_driver+0xe3/0x170
[  287.154789]        bus_for_each_drv+0x125/0x150
[  287.159346]        __device_attach+0xca/0x1a0
[  287.163720]        device_initial_probe+0x34/0x50
[  287.168445]        pci_bus_add_device+0x6e/0x90
[  287.172995]        pci_iov_add_virtfn+0x3c9/0x3e0
[  287.177719]        sriov_add_vfs+0x2c/0x60
[  287.181838]        sriov_enable+0x306/0x4a0
[  287.186038]        vfio_pci_core_sriov_configure+0x184/0x220
[  287.191715]        sriov_numvfs_store+0xd9/0x1c0
[  287.196351]        kernfs_fop_write_iter+0x13f/0x1d0
[  287.201338]        vfs_write+0x2be/0x3b0
[  287.205286]        ksys_write+0x73/0x100
[  287.209233]        do_syscall_64+0x14d/0x750
[  287.213529]        entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  287.219120]
[  287.219120] -&gt; #0 (&amp;vdev-&gt;memory_lock){+.+.}-{4:4}:
[  287.225491]        __lock_acquire+0x14c6/0x2800
[  287.230048]        lock_acquire+0xd3/0x2f0
[  287.234168]        down_write+0x3a/0xc0
[  287.238019]        vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.243436]        __rpm_callback+0x8c/0x310
[  287.247730]        rpm_resume+0x529/0x6f0
[  287.251765]        __pm_runtime_resume+0x68/0x90
[  287.256402]        vfio_pci_core_enable+0x44/0x310
[  287.261216]        vfio_pci_open_device+0x1c/0x80
[  287.265947]        vfio_df_open+0x10f/0x150
[  287.270148]        vfio_group_fops_unl_ioctl+0x4a4/0x7b0
[  287.275476]        __se_sys_ioctl+0x71/0xc0
[  287.279679]        do_syscall_64+0x14d/0x750
[  287.283975]        entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  287.289559]
[  287.289559] other info that might help us debug this:
[  287.289559]
[  287.297582] Chain exists of:
[  287.297582]   &amp;vdev-&gt;memory_lock --&gt; (work_completion)(&amp;arg.work)
--&gt; &amp;new_dev_set-&gt;lock
[  287.297582]
[  287.310023]  Possible unsafe locking scenario:
[  287.310023]
[  287.315961]        CPU0                    CPU1
[  287.320510]        ----                    ----
[  287.325059]   lock(&amp;new_dev_set-&gt;lock);
[  287.328917]
lock((work_completion)(&amp;arg.work));
[  287.336153]                                lock(&amp;new_dev_set-&gt;lock);
[  287.342523]   lock(&amp;vdev-&gt;memory_lock);
[  287.346382]
[  287.346382]  *** DEADLOCK ***
[  287.346382]
[  287.352315] 2 locks held by vfio_pci_sriov_/18636:
[  287.357125]  #0: ff45bea208ed3e18 (&amp;group-&gt;group_lock){+.+.}-{4:4},
at: vfio_group_fops_unl_ioctl+0x3e3/0x7b0
[  287.367048]  #1: ff45bea3a96b8230 (&amp;new_dev_set-&gt;lock){+.+.}-{4:4},
at: vfio_group_fops_unl_ioctl+0x44d/0x7b0
[  287.376976]
[  287.376976] stack backtrace:
[  287.381353] CPU: 191 UID: 0 PID: 18636 Comm: vfio_pci_sriov_
Tainted: G S                  7.0.0-dbg-DEV #3 PREEMPTLAZY
[  287.381355] Tainted: [S]=CPU_OUT_OF_SPEC
[  287.381356] Call Trace:
[  287.381357]  &lt;TASK&gt;
[  287.381358]  dump_stack_lvl+0x54/0x70
[  287.381361]  print_circular_bug+0x2e1/0x300
[  287.381363]  check_noncircular+0xf9/0x120
[  287.381364]  ? __lock_acquire+0x5b4/0x2800
[  287.381366]  __lock_acquire+0x14c6/0x2800
[  287.381368]  ? pci_mmcfg_read+0x4f/0x220
[  287.381370]  ? pci_mmcfg_write+0x57/0x220
[  287.381371]  ? lock_acquire+0xd3/0x2f0
[  287.381373]  ? pci_mmcfg_write+0x57/0x220
[  287.381374]  ? lock_release+0xef/0x360
[  287.381376]  ? vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.381377]  lock_acquire+0xd3/0x2f0
[  287.381378]  ? vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.381379]  ? lock_is_held_type+0x76/0x100
[  287.381382]  down_write+0x3a/0xc0
[  287.381382]  ? vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.381383]  vfio_pci_core_runtime_resume+0x1f/0xa0
[  287.381384]  ? __pfx_pci_pm_runtime_resume+0x10/0x10
[  287.381385]  __rpm_callback+0x8c/0x310
[  287.381386]  ? ktime_get_mono_fast_ns+0x3d/0xb0
[  287.381389]  ? __pfx_pci_pm_runtime_resume+0x10/0x10
[  287.381390]  rpm_resume+0x529/0x6f0
[  287.381392]  ? lock_is_held_type+0x76/0x100
[  287.381394]  __pm_runtime_resume+0x68/0x90
[  287.381396]  vfio_pci_core_enable+0x44/0x310
[  287.381398]  vfio_pci_open_device+0x1c/0x80
[  287.381399]  vfio_df_open+0x10f/0x150
[  287.381401]  vfio_group_fops_unl_ioctl+0x4a4/0x7b0
[  287.381402]  __se_sys_ioctl+0x71/0xc0
[  287.381404]  do_syscall_64+0x14d/0x750
[  287.381405]  ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  287.381406]  ? trace_irq_disable+0x25/0xd0
[  287.381409]  entry_SYSCALL_64_after_hwframe+0x77/0x7f

Introduce a private flag 'sriov_active' in the vfio_pci_core_device
struct. This  allows the driver to track the SR-IOV power state requirement
without  relying on pci_num_vf() while holding the memory_lock. The lock is
now  only held to set the flag and ensure the device is in D0, after which
pci_enable_sriov() can be called without the lock.

Fixes: f4162eb1e2fc ("vfio/pci: Change the PF power state to D0 before enabling VFs")
Cc: stable@vger.kernel.org
Suggested-by: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Suggested-by: Alex Williamson &lt;alex@shazbot.org&gt;
Signed-off-by: Raghavendra Rao Ananta &lt;rananta@google.com&gt;
Link: https://lore.kernel.org/r/20260514173449.3282188-1-rananta@google.com
[promote bitfield to plain bool to avoid storage-unit races]
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfio/xe: avoid duplicate reset in xe_vfio_pci_reset_done</title>
<updated>2026-05-20T20:52:21+00:00</updated>
<author>
<name>GuoHan Zhao</name>
<email>zhaoguohan@kylinos.cn</email>
</author>
<published>2026-04-27T01:21:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b9285405c5f6144f4444f97bf3048d865e11cc1d'/>
<id>b9285405c5f6144f4444f97bf3048d865e11cc1d</id>
<content type='text'>
xe_vfio_pci_reset_done() sets deferred_reset and, when it manages to
acquire state_mutex itself, hands the cleanup off to
xe_vfio_pci_state_mutex_unlock().

That helper already clears deferred_reset and runs xe_vfio_pci_reset()
before dropping the mutex. Calling xe_vfio_pci_reset() again right
afterwards repeats the reset handling unnecessarily.

Fixes: 1f5556ec8b9e ("vfio/xe: Add device specific vfio_pci driver variant for Intel graphics")
Signed-off-by: GuoHan Zhao &lt;zhaoguohan@kylinos.cn&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Acked-by: Michał Winiarski &lt;michal.winiarski@intel.com&gt;
Link: https://lore.kernel.org/r/20260427012128.117051-1-zhaoguohan@kylinos.cn
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
xe_vfio_pci_reset_done() sets deferred_reset and, when it manages to
acquire state_mutex itself, hands the cleanup off to
xe_vfio_pci_state_mutex_unlock().

That helper already clears deferred_reset and runs xe_vfio_pci_reset()
before dropping the mutex. Calling xe_vfio_pci_reset() again right
afterwards repeats the reset handling unnecessarily.

Fixes: 1f5556ec8b9e ("vfio/xe: Add device specific vfio_pci driver variant for Intel graphics")
Signed-off-by: GuoHan Zhao &lt;zhaoguohan@kylinos.cn&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Acked-by: Michał Winiarski &lt;michal.winiarski@intel.com&gt;
Link: https://lore.kernel.org/r/20260427012128.117051-1-zhaoguohan@kylinos.cn
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>hisi_acc_vfio_pci: simplify the command for reading device information</title>
<updated>2026-05-20T20:52:09+00:00</updated>
<author>
<name>Weili Qian</name>
<email>qianweili@huawei.com</email>
</author>
<published>2026-05-14T09:20:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e1b3b4d6ff6dc7f615c2259961ce860ceb93e74e'/>
<id>e1b3b4d6ff6dc7f615c2259961ce860ceb93e74e</id>
<content type='text'>
The mailbox operation for the Hisi accelerator device now provides a
new read function that supports direct information retrieval by
specifying commands, thereby simplifying the related mailbox command
handling in the driver.

Signed-off-by: Weili Qian &lt;qianweili@huawei.com&gt;
Signed-off-by: Longfang Liu &lt;liulongfang@huawei.com&gt;
Link: https://lore.kernel.org/r/20260514092026.2018844-1-liulongfang@huawei.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The mailbox operation for the Hisi accelerator device now provides a
new read function that supports direct information retrieval by
specifying commands, thereby simplifying the related mailbox command
handling in the driver.

Signed-off-by: Weili Qian &lt;qianweili@huawei.com&gt;
Signed-off-by: Longfang Liu &lt;liulongfang@huawei.com&gt;
Link: https://lore.kernel.org/r/20260514092026.2018844-1-liulongfang@huawei.com
Signed-off-by: Alex Williamson &lt;alex@shazbot.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
