linux.git/include/uapi/linux/iommufd.h, branch v7.1-rc2

Merge branches 'fixes', 'arm/smmu/updates', 'arm/smmu/bindings', 'riscv', 'intel/vt-d', 'amd/amd-vi' and 'core' into next

2026-04-09T12:18:27+00:00

iommu/tegra241-cmdqv: Update uAPI to clarify HYP_OWN requirement

2026-03-24T14:04:35+00:00

>From hardware implementation perspective, a guest tegra241-cmdqv hardware
is different than the host hardware:
 - Host HW is backed by a VINTF (HYP_OWN=1)
 - Guest HW is backed by a VINTF (HYP_OWN=0)

The kernel driver has an implementation requirement of the HYP_OWN bit in
the VM. So, VMM must follow that to allow the same copy of Linux to work.

Add this requirement to the uAPI, which is currently missing.

Fixes: 4dc0d12474f9 ("iommu/tegra241-cmdqv: Add user-space use support")
Signed-off-by: Nicolin Chen 
Reviewed-by: Eric Auger 
Reviewed-by: Jason Gunthorpe 
Signed-off-by: Will Deacon

iommufd: Report ATS not supported status via IOMMU_GET_HW_INFO

2026-03-17T13:05:05+00:00

If the IOMMU driver reports that ATS is not supported for a device, set
the IOMMU_HW_CAP_PCI_ATS_NOT_SUPPORTED flag in the returned hardware
capabilities.

This uses a negative flag for UAPI compatibility. Existing userspace
assumes ATS is supported if no flag is present. This also ensures that
new userspace works correctly on both old and new kernels, where a
zero value implies ATS support.

When this flag is set, ATS cannot be used for the device. When it is
clear, ATS may be enabled when an appropriate HWPT is attached.

Reviewed-by: Samiullah Khawaja 
Reviewed-by: Jason Gunthorpe 
Signed-off-by: Shameer Kolothum 
Signed-off-by: Joerg Roedel

iommufd: Introduce data struct for AMD nested domain allocation

2026-01-18T09:56:12+00:00

Introduce IOMMU_HWPT_DATA_AMD_GUEST data type for IOMMU guest page table,
which is used for stage-1 in nested translation. The data structure
contains information necessary for setting up the AMD HW-vIOMMU support.

Reviewed-by: Jason Gunthorpe 
Reviewed-by: Nicolin Chen 
Reviewed-by: Vasant Hegde 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Joerg Roedel

iommu/amd: Add support for hw_info for iommu capability query

2026-01-18T09:56:09+00:00

AMD IOMMU Extended Feature (EFR) and Extended Feature 2 (EFR2) registers
specify features supported by each IOMMU hardware instance.
The IOMMU driver checks each feature-specific bits before enabling
each feature at run time.

For IOMMUFD, the hypervisor passes the raw value of amd_iommu_efr and
amd_iommu_efr2 to VMM via iommufd IOMMU_DEVICE_GET_HW_INFO ioctl.

Reviewed-by: Nicolin Chen 
Reviewed-by: Vasant Hegde 
Reviewed-by: Jason Gunthorpe 
Signed-off-by: Suravee Suthikulpanit 
Signed-off-by: Joerg Roedel

iommu/arm-smmu-v3-iommufd: Allow attaching nested domain for GBPA cases

2025-11-26T18:04:04+00:00

A vDEVICE has been a hard requirement for attaching a nested domain to the
device. This makes sense when installing a guest STE, since a vSID must be
present and given to the kernel during the vDEVICE allocation.

But, when CR0.SMMUEN is disabled, VM doesn't really need a vSID to program
the vSMMU behavior as GBPA will take effect, in which case the vSTE in the
nested domain could have carried the bypass or abort configuration in GBPA
register. Thus, having such a hard requirement doesn't work well for GBPA.

Skip vmaster allocation in arm_smmu_attach_prepare_vmaster() for an abort
or bypass vSTE. Note that device on this attachment won't report vevents.

Update the uAPI doc accordingly.

Link: https://patch.msgid.link/r/20251103172755.2026145-1-nicolinc@nvidia.com
Tested-by: Shameer Kolothum 
Signed-off-by: Nicolin Chen 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Pranjal Shrivastava 
Tested-by: Shuai Xue 
Signed-off-by: Jason Gunthorpe

iommufd: Destroy vdevice on idevice destroy

2025-07-18T20:33:08+00:00

Destroy iommufd_vdevice (vdev) on iommufd_idevice (idev) destruction so
that vdev can't outlive idev.

idev represents the physical device bound to iommufd, while the vdev
represents the virtual instance of the physical device in the VM. The
lifecycle of the vdev should not be longer than idev. This doesn't
cause real problem on existing use cases cause vdev doesn't impact the
physical device, only provides virtualization information. But to
extend vdev for Confidential Computing (CC), there are needs to do
secure configuration for the vdev, e.g. TSM Bind/Unbind. These
configurations should be rolled back on idev destroy, or the external
driver (VFIO) functionality may be impact.

The idev is created by external driver so its destruction can't fail.
The idev implements pre_destroy() op to actively remove its associated
vdev before destroying itself. There are 3 cases on idev pre_destroy():

  1. vdev is already destroyed by userspace. No extra handling needed.
  2. vdev is still alive. Use iommufd_object_tombstone_user() to
     destroy vdev and tombstone the vdev ID.
  3. vdev is being destroyed by userspace. The vdev ID is already
     freed, but vdev destroy handler is not completed. This requires
     multi-threads syncing - vdev holds idev's short term users
     reference until vdev destruction completes, idev leverages
     existing wait_shortterm mechanism for syncing.

idev should also block any new reference to it after pre_destroy(),
or the following wait shortterm would timeout. Introduce a 'destroying'
flag, set it to true on idev pre_destroy(). Any attempt to reference
idev should honor this flag under the protection of
idev->igroup->lock.

Link: https://patch.msgid.link/r/20250716070349.1807226-5-yilun.xu@linux.intel.com
Originally-by: Nicolin Chen 
Suggested-by: Jason Gunthorpe 
Reviewed-by: Kevin Tian 
Reviewed-by: Nicolin Chen 
Reviewed-by: Jason Gunthorpe 
Co-developed-by: "Aneesh Kumar K.V (Arm)" 
Signed-off-by: "Aneesh Kumar K.V (Arm)" 
Tested-by: Nicolin Chen 
Signed-off-by: Xu Yilun 
Signed-off-by: Jason Gunthorpe

iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support

2025-07-11T17:34:36+00:00

Add a new vEVENTQ type for VINTFs that are assigned to the user space.
Simply report the two 64-bit LVCMDQ_ERR_MAPs register values.

Link: https://patch.msgid.link/r/68161a980da41fa5022841209638aeff258557b5.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Alok Tiwari 
Reviewed-by: Pranjal Shrivastava 
Reviewed-by: Jason Gunthorpe 
Signed-off-by: Nicolin Chen 
Signed-off-by: Jason Gunthorpe

iommu/tegra241-cmdqv: Add user-space use support

2025-07-11T17:34:36+00:00

The CMDQV HW supports a user-space use for virtualization cases. It allows
the VM to issue guest-level TLBI or ATC_INV commands directly to the queue
and executes them without a VMEXIT, as HW will replace the VMID field in a
TLBI command and the SID field in an ATC_INV command with the preset VMID
and SID.

This is built upon the vIOMMU infrastructure by allowing VMM to allocate a
VINTF (as a vIOMMU object) and assign VCMDQs (HW QUEUE objs) to the VINTF.

So firstly, replace the standard vSMMU model with the VINTF implementation
but reuse the standard cache_invalidate op (for unsupported commands) and
the standard alloc_domain_nested op (for standard nested STE).

Each VINTF has two 64KB MMIO pages (128B per logical VCMDQ):
- Page0 (directly accessed by guest) has all the control and status bits.
- Page1 (trapped by VMM) has guest-owned queue memory location/size info.

VMM should trap the emulated VINTF0's page1 of the guest VM for the guest-
level VCMDQ location/size info and forward that to the kernel to translate
to a physical memory location to program the VCMDQ HW during an allocation
call. Then, it should mmap the assigned VINTF's page0 to the VINTF0 page0
of the guest VM. This allows the guest OS to read and write the guest-own
VINTF's page0 for direct control of the VCMDQ HW.

For ATC invalidation commands that hold an SID, it requires all devices to
register their virtual SIDs to the SID_MATCH registers and their physical
SIDs to the pairing SID_REPLACE registers, so that HW can use those as a
lookup table to replace those virtual SIDs with the correct physical SIDs.
Thus, implement the driver-allocated vDEVICE op with a tegra241_vintf_sid
structure to allocate SID_REPLACE and to program the SIDs accordingly.

This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to
the standard SMMUv3 operating in the nested translation mode trapping CMDQ
for TLBI and ATC_INV commands, this gives a huge performance improvement:
70% to 90% reductions of invalidation time were measured by various DMA
unmap tests running in a guest OS.

Link: https://patch.msgid.link/r/fb0eab83f529440b6aa181798912a6f0afa21eb0.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe
Reviewed-by: Pranjal Shrivastava
Signed-off-by: Nicolin Chen
Signed-off-by: Jason Gunthorpe

iommufd: Allow an input data_type via iommu_hw_info

2025-07-11T17:34:35+00:00

The iommu_hw_info can output via the out_data_type field the vendor data
type from a driver, but this only allows driver to report one data type.

Now, with SMMUv3 having a Tegra241 CMDQV implementation, it has two sets
of types and data structs to report.

One way to support that is to use the same type field bidirectionally.

Reuse the same field by adding an "in_data_type", allowing user space to
request for a specific type and to get the corresponding data.

For backward compatibility, since the ioctl handler has never checked an
input value, add an IOMMU_HW_INFO_FLAG_INPUT_TYPE to switch between the
old output-only field and the new bidirectional field.

Link: https://patch.msgid.link/r/887378a7167e1786d9d13cde0c36263ed61823d7.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Kevin Tian 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Lu Baolu 
Reviewed-by: Pranjal Shrivastava 
Signed-off-by: Nicolin Chen 
Signed-off-by: Jason Gunthorpe