summaryrefslogtreecommitdiff
path: root/drivers/iommu
AgeCommit message (Collapse)Author
6 daysiommupt: Fix the end_index calculation in __map_range_leaf()Jason Gunthorpe
Sashiko noticed a mismatch of units in this math: num_leaves is actually the number of leaf *entries* (so a 16-item contiguous leaf is one num_leaves), while index is in items. The mismatch in maths causes __map_range_leaf() to exit early instead of efficiently filling a larger range of contiguous PTEs. The early exit is caught by the functions above and then __map_range_leaf() is re-invoked, so there is no functional issue. Correct the misuse of units by adjusting num_leaves with the leaf size and avoid the performance cost of looping externally. There are also some mismatched types for num_leaves; simplify things to remove the duplicated calculations. Fixes: d6c65b0fd621 ("iommupt: Avoid rewalking during map") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewd-by: Pranjal Shrivastava <praan@google.com> Tested-by: Josua Mayer <josua@solid-run.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
6 daysiommupt: Check for missing PAGE_SIZE in the pgsize_bitmapJason Gunthorpe
Sashiko pointed out that the driver could drop PAGE_SIZE from the pgsize_bitmap. That is technically allowed but nothing does it, and such an iommu_domain would not be used with the DMA API today. Still, it is against the design and it is trivial to fix up. Lift the PT_WARN_ON to the if branch and just skip the fast path. Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Tested-by: Josua Mayer <josua@solid-run.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
6 daysiommu: Handle unmap error when iommu_debug is enabledJason Gunthorpe
Sashiko noticed a latent bug where the map error flow called iommu_unmap() which calls iommu_debug_unmap_begin()/iommu_debug_unmap_end() however since this is an error path the map flow never actually established the original iommu_debug_map() it will malfunction. Lift the unmap error handling into iommu_map_nosync() and reorder it so the trace_map()/iommu_debug_map() records the partial mapping and then immediately unmaps it. This avoid creating the unbalanced tracking and provides saner tracing instead of a unmap unmatched to any map. Fixes: ccc21213f013 ("iommu: Add calls for IOMMU_DEBUG_PAGEALLOC") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Josua Mayer <josua@solid-run.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
6 daysiommu: Fix up map/unmap debugging for iommupt domainsJason Gunthorpe
Sashiko noticed a few issues in this path, and a few more were found on review. Tidy them up further. These are intertwined because the debug code depends on some of the WARN_ONs to function right: Lift into iommu_map_nosync(): - The might_sleep_if() - 0 pgsize_bitmap WARN_ON - Promote the illegal domain->type to a WARN_ON - WARN_ON for illegal gfp flags Then remove the return 0 since it is now safe to call iommu_debug_map(). Lift into __iommu_unmap(): - 0 pgsize_bitmap WARN_ON - Promote the illegal domain->type to a WARN_ON - iommu_debug_unmap_begin() This now pairs with the unconditional iommu_debug_map() on the mapping side. Thus iommu debugging now works for iommupt along with some of the other debugging features. Fixes: 99fb8afa16ad ("iommupt: Directly call iommupt's unmap_range()") Fixes: d6c65b0fd621 ("iommupt: Avoid rewalking during map") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Josua Mayer <josua@solid-run.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
6 daysiommu: Fix loss of errno on map failure for classic opsJason Gunthorpe
A typo, likely from a rebase, inverted the condition and caused errors to be lost. Fix it to be "if (ret)". This was breaking iommu_create_device_direct_mappings() on drivers that don't use iommupt and don't fully set up their domain in alloc_pages() (i.e., SMMUv2). In this case the first call of iommu_create_device_direct_mappings() should fail due to the incompletely initialized domain. Since it wrongly returns success, the second call to iommu_create_device_direct_mappings() doesn't happen and IOMMU_RESV_DIRECT is never set up. Cc: stable@vger.kernel.org Fixes: d6c65b0fd621 ("iommupt: Avoid rewalking during map") Reported-by: Josua Mayer <josua@solid-run.com> Closes: https://lore.kernel.org/all/321c2e57-6a17-4aef-ba42-d2ebd577e472@solid-run.com/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Josua Mayer <josua@solid-run.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu/vt-d: Avoid NULL pointer dereference or refcount corruptionZhenzhong Duan
Commit 60f030f7418d ("iommu/vt-d: Avoid use of NULL after WARN_ON_ONCE") fixed a NULL pointer dereference in an unlikely situation partly. If dev_pasid is not found in the dev_pasids list, it remains NULL. However, the teardown operations are executed unconditionally, this lead to a NULL pointer dereference or refcount corruption. If the domain was never attached to this IOMMU, info will be NULL, which would cause an immediate dereference when checking --info->refcnt. Even if info is not NULL, decrementing the refcount without having removed a valid PASID might unbalance the count. This could lead to premature dropping of the refcount to 0, potentially causing a use-after-free for the remaining active devices sharing the domain. Fix it by returning early if dev_pasid is NULL, before executing the teardown operations. Issue found by AI review and suggested by Kevin Tian. https://sashiko.dev/#/patchset/20260421031347.1408890-1-zhenzhong.duan%40intel.com Fixes: 60f030f7418d ("iommu/vt-d: Avoid use of NULL after WARN_ON_ONCE") Cc: stable@vger.kernel.org Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260422033538.95000-1-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu/vt-d: Fix oops due to out of scope accessZhenzhong Duan
Below oops triggers when kill QEMU process: Oops: general protection fault, probably for non-canonical address 0x7fffffff844eaaa7: 0000 [#1] SMP NOPTI Call Trace: <TASK> do_raw_spin_lock+0xaa/0xc0 _raw_spin_lock_irqsave+0x21/0x40 domain_remove_dev_pasid+0x52/0x160 intel_nested_set_dev_pasid+0x1b9/0x1e0 __iommu_set_group_pasid+0x56/0x120 pci_dev_reset_iommu_done+0xe3/0x180 pcie_flr+0x65/0x160 __pci_reset_function_locked+0x5b/0x120 vfio_pci_core_close_device+0x63/0xe0 [vfio_pci_core] vfio_df_close+0x4f/0xa0 vfio_df_unbind_iommufd+0x2d/0x60 vfio_device_fops_release+0x3e/0x40 __fput+0xe5/0x2c0 task_work_run+0x58/0xa0 do_exit+0x2c8/0x600 do_group_exit+0x2f/0xa0 get_signal+0x863/0x8c0 arch_do_signal_or_restart+0x24/0x100 exit_to_user_mode_loop+0x87/0x380 do_syscall_64+0x2ff/0x11e0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The global static blocked domain is a dummy domain without corresponding dmar_domain structure, accessing beyond iommu_domain structure triggers oops easily. Fix it by return early in domain_remove_dev_pasid() like identity domain. Fixes: 7d0c9da6c150 ("iommu/vt-d: Add set_dev_pasid callback for dma domain") Cc: stable@vger.kernel.org Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260421031347.1408890-1-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu/vt-d: Disable DMAR for Intel Q35 IGFXNaval Alcalá
Intel Q35 integrated graphics (8086:29b2) exhibits broken DMAR behaviour similar to other G4x/GM45 devices for which DMAR is already disabled via quirks. When DMAR is enabled, the system may hard lock up during boot or early device initialization, requiring a reset. Add the missing PCI ID to the existing quirk list to disable DMAR for this device. Fixes: 1f76249cc3be ("iommu/vt-d: Declare Broadwell igfx dmar support snafu") Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=201185 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=216064 Signed-off-by: Naval Alcalá <ari@naval.cat> Link: https://lore.kernel.org/r/20260410161622.13549-1-ari@naval.cat Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu: Warn on premature unblock during DMA aliased sibling resetNicolin Chen
When two aliased siblings are in the same iommu_group, they might share the same RID. The reset functions don't support this case, though it is unclear whether there is a real case of having an ATS capable device on a PCI/PCI-X bus. Theoretically, however, if two aliased devices are resetting concurrently, one might be unblocked prematurely in the middle of the reset by the other sibling who completes the reset first. This isn't a regression from this series but it's better to spit a warning, so we can know if such use case is common enough for us to make subsequent patches for its coverage. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu: Fix WARN_ON in __iommu_group_set_domain_nofail() due to resetNicolin Chen
In __iommu_group_set_domain_internal(), concurrent domain attachments are rejected when any device in the group is recovering. This is necessary to fence concurrent attachments to a multi-device group where devices might share the same RID due to PCI DMA alias quirks, but triggers the WARN_ON in __iommu_group_set_domain_nofail(). Other IOMMU_SET_DOMAIN_MUST_SUCCEED callers in detach/teardown paths, such as __iommu_group_set_core_domain and __iommu_release_dma_ownership, should not be rejected, as the domain would be freed anyway in these nofail paths while group->domain is still pointing to it. So pci_dev_reset_iommu_done() could trigger a UAF when re-attaching group->domain. Honor the IOMMU_SET_DOMAIN_MUST_SUCCEED flag, allowing the callers through the group->recovery_cnt fence, so as to update the group->domain pointer. Instead add a gdev->blocked check in the device iteration loop, to prevent any concurrent per-device detachment. Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()") Cc: stable@vger.kernel.org Closes: https://sashiko.dev/#/patchset/20260407194644.171304-1-nicolinc%40nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu: Fix ATS invalidation timeouts during __iommu_remove_group_pasid()Nicolin Chen
If a device is blocked, its PASID domains are already detached. Repeating iommu_remove_dev_pasid() is unnecessary and might trigger ATS invalidation timeouts. Skip the iommu_remove_dev_pasid() call upon gdev->blocked. Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()") Cc: stable@vger.kernel.org Closes: https://sashiko.dev/#/patchset/20260407194644.171304-1-nicolinc%40nvidia.com Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu: Fix nested pci_dev_reset_iommu_prepare/done()Nicolin Chen
Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function() internally while both are calling pci_dev_reset_iommu_prepare/done(). As pci_dev_reset_iommu_prepare() doesn't support re-entry, the inner call will trigger a WARN_ON and return -EBUSY, resulting in failing the entire device reset. On the other hand, removing the outer calls in the PCI callers is unsafe. As pointed out by Kevin, device-specific quirks like reset_hinic_vf_dev() execute custom firmware waits after their inner pcie_flr() completes. If the IOMMU protection relies solely on the inner reset, the IOMMU will be unblocked prematurely while the device is still resetting. Instead, fix this by making pci_dev_reset_iommu_prepare/done() reentrant. Introduce gdev->reset_depth to handle the re-entries on the same device. Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()") Cc: stable@vger.kernel.org Reported-by: Shuai Xue <xueshuai@linux.alibaba.com> Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/ Suggested-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu: Fix pasid attach in pci_dev_reset_iommu_prepare/done()Nicolin Chen
Now the helpers handle per-gdev resets. Replace __iommu_set_group_pasid() with set_dev_pasid() accordingly, in the pci_dev_reset_iommu_done(). Also add max_pasids check as other callers. Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()") Cc: stable@vger.kernel.org Reported-by: Shuai Xue <xueshuai@linux.alibaba.com> Closes: https://lore.kernel.org/all/ad858513-09fc-455e-bbc5-fe38a225cc78@linux.alibaba.com/ Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu: Replace per-group resetting_domain with per-gdev blocked flagNicolin Chen
The core tracks device resetting states with a per-group resetting_domain, while a reset is actually per group-device. Such a mismatch might lead to confusion and even difficulty to untangle per-gdev handling requirement. Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function() internally while both are calling pci_dev_reset_iommu_prepare/done(). And the solution requires the core to track at the group_device level as well. Introduce a 'blocked' flag to struct group_device, to allow a multi-device group to isolate concurrent device resets independently. As the reset routine is per gdev, it cannot clear group->resetting_domain without iterating over the device list to ensure no other device is being reset. Simplify it by replacing the resetting_domain with a 'recovery_cnt' in the struct iommu_group. No functional change. But this is essential to apply following bug fixes. Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()") Cc: stable@vger.kernel.org Reported-by: Shuai Xue <xueshuai@linux.alibaba.com> Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/ Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu: Fix kdocs of pci_dev_reset_iommu_done()Nicolin Chen
Remove the duplicated word. No functional change. Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()") Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu: Fix NULL group->domain dereference in pci_dev_reset_iommu_done()Nicolin Chen
Local sashiko review pointed it out that group->domain could be NULL when a default domain fails to allocate during the first probe, which can crash at domain->ops->attach_dev dereference in __iommu_attach_device() invoked by pci_dev_reset_iommu_done(). pci_dev_reset_iommu_prepare() is fine as an old_domain pointer can be NULL. Skip the re-attach in pci_dev_reset_iommu_done() to fix the bug. Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()") Cc: stable@vger.kernel.org Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu/amd: Bounds-check devid in __rlookup_amd_iommu()Jose Fernandez (Anthropic)
iommu_device_register() walks every device on the PCI bus via bus_for_each_dev() and calls amd_iommu_probe_device() for each. The inlined check_device() path computes the device's sbdf, calls rlookup_amd_iommu() to find the owning IOMMU, and only afterwards verifies devid <= pci_seg->last_bdf. __rlookup_amd_iommu() indexes rlookup_table[devid] with no bounds check of its own, so for a PCI device whose BDF is not described by the IVRS, the lookup reads past the end of the allocation before the caller's bounds check can run. This was harmless before commit e874c666b15b ("iommu/amd: Change rlookup, irq_lookup, and alias to use kvalloc()"): the table was a zeroed page-order allocation, so the over-read returned NULL and the caller's NULL check skipped the device. After that commit the table is a tight kvcalloc() and the over-read returns adjacent slab contents, which check_device() then dereferences as a struct amd_iommu *, causing a boot-time GPF. Seen on Google Compute Engine ct6e VMs, where the virtualized IVRS describes only the four TPU endpoints 00:04.0-07.0; the gVNIC at 00:08.0 (devid 0x40) indexes 56 bytes past the 456-byte allocation, into the adjacent kmalloc-512 slab object: pci 0000:00:04.0: Adding to iommu group 0 pci 0000:00:05.0: Adding to iommu group 1 pci 0000:00:06.0: Adding to iommu group 2 pci 0000:00:07.0: Adding to iommu group 3 Oops: general protection fault, probably for non-canonical address 0x3a64695f78746382: 0000 [#1] SMP NOPTI CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.22 #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/06/2025 RIP: 0010:amd_iommu_probe_device+0x54/0x3a0 Call Trace: __iommu_probe_device+0x107/0x520 probe_iommu_group+0x29/0x50 bus_for_each_dev+0x7e/0xe0 iommu_device_register+0xc9/0x240 iommu_go_to_state+0x9c0/0x1c60 amd_iommu_init+0x14/0x40 pci_iommu_init+0x16/0x60 do_one_initcall+0x47/0x2f0 Guard the array access in __rlookup_amd_iommu(). With the fix applied on 6.18.22, the gVNIC at 00:08.0 is skipped cleanly and the VM boots. Fixes: e874c666b15b ("iommu/amd: Change rlookup, irq_lookup, and alias to use kvalloc()") Cc: stable@vger.kernel.org Reported-by: Ziyuan Chen <zc@anthropic.com> Tested-by: Ziyuan Chen <zc@anthropic.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Assisted-by: Claude:unspecified Signed-off-by: Jose Fernandez (Anthropic) <jose.fernandez@linux.dev> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
10 daysiommu/amd: Remove latent out-of-bounds access in IOMMU debugfsEder Zulian
In iommu_mmio_write() and iommu_capability_write(), the variables dbg_mmio_offset and dbg_cap_offset are declared as int. However, they are populated using kstrtou32_from_user(). If a user provides a sufficiently large value, it can become a negative integer. Prior to this patch, the AMD IOMMU debugfs implementation was already protected by different mechanisms. 1. #define OFS_IN_SZ 8 ensures the user string <= 8 bytes, so e.g. 0xffffffff isn't a valid input. if (cnt > OFS_IN_SZ) return -EINVAL; 2. Implicit type promotion in iommu_mmio_write(), dbg_mmio_offset is int and iommu->mmio_phys_end is u64 if (dbg_mmio_offset > iommu->mmio_phys_end - sizeof(u64)) return -EINVAL; 3. The show handlers would currently catch the negative number and refuse to perform the read. Replace kstrtou32_from_user() with kstrtos32_from_user() to parse the input, and check for negative values to explicitly prevent out-of-bounds memory accesses directly in iommu_mmio_write() and iommu_capability_write(). Signed-off-by: Eder Zulian <ezulian@redhat.com> Fixes: 7a4ee419e8c1 ("iommu/amd: Add debugfs support to dump IOMMU MMIO registers") Cc: stable@vger.kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-05-04iommu/amd: Fix precedence order in set_dte_passthrough()Weinan Liu
Bitwise OR | operator has a higher precedence than the ternary ?: operatior. It will be incorrectly evaluated as: new->data[1] |= (FIELD_PREP(...) | dev_data->ats_enabled) ? DTE_FLAG_IOTLB : 0; Wrap the conditional operation in parentheses to enforce the correct evaluation order. Fixes: 93eee2a49c1b ("iommu/amd: Refactor logic to program the host page table in DTE") Signed-off-by: Weinan Liu <wnliu@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-27iommu/pages: Fix iommu_pages_flush_incoherent() for non-x86Mostafa Saleh
The dma_sync_single_for_device() function expects a dma_addr_t, but iommu_pages_flush_incoherent() was incorrectly passing a virtual address. Since iommu_pages_start_incoherent() enforces a 1:1 mapping between DMA addresses and physical addresses (checked via WARN_ON), we can convert the virtual address to a physical address before passing it to the DMA API. This also matches the behaviour of the other non-x86 in iommu_pages_free_incoherent(), which uses virt_to_phys(virt); Fixes: 36ae67b13976 ("iommu/pages: Add support for incoherent IOMMU page table walkers") Signed-off-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-27iommu/amd: Use maximum PPR log buffer size when SNP is enabled on Family 0x19Vasant Hegde
Due to CVE-2023-20585, the PPR log buffer must use the maximum supported size (512K) on Genoa (Family 0x19, model >= 0x10) systems when SNP is enabled, to mitigate a potential security vulnerability. Note that Family 0x19 models below 0x10 (Milan) do not support PPR when SNP is enabled. Hence the PPR log size increase is only applied for model >= 0x10. All other systems continue to use the default PPR log buffer size (8K). Apply the errata fix by making the following changes: - Introduce global new variable (amd_iommu_pprlog_size) to have PPR log buffer size. Adjust variable size for Genoa family. - Extend 'amd_iommu_apply_erratum_snp()' to also set the PPR log buffer size to maximum for Family 0x19 model >= 0x10 when SNP is enabled. - Rename PPR_* macros to make it more readable. Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-3016.html Cc: Borislav Petkov <bp@alien8.de> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-27iommu/amd: Use maximum Event log buffer size when SNP is enabled on Family 0x19Vasant Hegde
Due to CVE-2023-20585, the Event log buffer must use the maximum supported size (512K) on Milan/Genoa (Family 0x19) systems when SNP is enabled, to mitigate a potential security vulnerability. All other systems continue to use the default Event log buffer size (8K). Apply the errata fix by making the following changes: * Introduce new global variable (amd_iommu_evtlog_size) to have event log buffer size. Adjust variable size for family 0x19. * Since 'iommu_snp_enable()' must be called after the core IOMMU subsystem is initialized, it cannot be moved to the early init stage. The SNP errata must also be applied after the 'iommu_snp_enable()' check. Therefore, 'alloc_event_buffer()' and 'iommu_enable_event_buffer()' are now called in the IOMMU_ENABLED state, after the errata is applied. * Adjust alloc_event_buffer() and iommu_enable_event_buffer() to handle all IOMMU instances. * Also rename EVT_* macros to make it more readable. Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-3016.html Cc: Borislav Petkov <bp@alien8.de> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Dheeraj Kumar Srivastava <dheerajkumar.srivastava@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-17Merge tag 'dma-mapping-7.1-2026-04-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping updates from Marek Szyprowski: - added support for batched cache sync, what improves performance of dma_map/unmap_sg() operations on ARM64 architecture (Barry Song) - introduced DMA_ATTR_CC_SHARED attribute for explicitly shared memory used in confidential computing (Jiri Pirko) - refactored spaghetti-like code in drivers/of/of_reserved_mem.c and its clients (Marek Szyprowski, shared branch with device-tree updates to avoid merge conflicts) - prepared Contiguous Memory Allocator related code for making dma-buf drivers modularized (Maxime Ripard) - added support for benchmarking dma_map_sg() calls to tools/dma utility (Qinxin Xia) * tag 'dma-mapping-7.1-2026-04-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: (24 commits) dma-buf: heaps: system: document system_cc_shared heap dma-buf: heaps: system: add system_cc_shared heap for explicitly shared memory dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory mm: cma: Export cma_alloc(), cma_release() and cma_get_name() dma: contiguous: Export dev_get_cma_area() dma: contiguous: Make dma_contiguous_default_area static dma: contiguous: Make dev_get_cma_area() a proper function dma: contiguous: Turn heap registration logic around of: reserved_mem: rework fdt_init_reserved_mem_node() of: reserved_mem: clarify fdt_scan_reserved_mem*() functions of: reserved_mem: rearrange code a bit of: reserved_mem: replace CMA quirks by generic methods of: reserved_mem: switch to ops based OF_DECLARE() of: reserved_mem: use -ENODEV instead of -ENOENT of: reserved_mem: remove fdt node from the structure dma-mapping: fix false kernel-doc comment marker dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg dma-mapping: Separate DMA sync issuing and completion waiting arm64: Provide dcache_inval_poc_nosync helper arm64: Provide dcache_clean_poc_nosync helper ...
2026-04-16Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "Several fixes: - Add missing static const - Correct type 1 emulation for VFIO_CHECK_EXTENSION when no-iommu is turned on - Fix selftest memory leak and syzkaller splat - Fix missed -EFAULT in fault reporting write() fops - Fix a race where map/unmap with the internal IOVA allocator can unmap things it should not" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Fix a race with concurrent allocation and unmap iommufd/selftest: Remove MOCK_IOMMUPT_AMDV1 format iommufd: Fix return value of iommufd_fault_fops_write() iommufd: update outdated comment for renamed iommufd_hw_pagetable_alloc() iommufd/selftest: Fix page leaks in mock_viommu_{init,destroy} iommufd: vfio compatibility extension check for noiommu mode iommufd: Constify struct dma_buf_attach_ops
2026-04-15Merge tag 'iommu-updates-v7.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core: - Support for RISC-V IO-page-table format in generic iommupt code ARM-SMMU Updates: - Introduction of an "invalidation array" for SMMUv3, which enables future scalability work and optimisations for devices with a large number of SMMUv3 instances - Update the conditions under which the SMMUv3 driver works around hardware errata for invalidation on MMU-700 implementations - Fix broken command filtering for the host view of NVIDIA's "cmdqv" SMMUv3 extension - MMU-500 device-tree binding additions for Qualcomm Eliza & Hawi SoCs Intel VT-d: - Support for dirty tracking on domains attached to PASID - Removal of unnecessary read*()/write*() wrappers - Improvements to the invalidation paths AMD Vi: - Race-condition fixed in debugfs code - Make log buffer allocation NUMA aware RISC-V: - IO-TLB flushing improvements - Minor fixes" * tag 'iommu-updates-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (48 commits) iommu/vt-d: Restore IOMMU_CAP_CACHE_COHERENCY dt-bindings: arm-smmu: qcom: Add compatible for Hawi SoC iommu/amd: Invalidate IRT cache for DMA aliases iommu/riscv: Remove overflows on the invalidation path iommu/amd: Fix clone_alias() to use the original device's devid iommu/vt-d: Remove the remaining pages along the invalidation path iommu/vt-d: Pass size_order to qi_desc_piotlb() not npages iommu/vt-d: Split piotlb invalidation into range and all iommu/vt-d: Remove dmar_writel() and dmar_writeq() iommu/vt-d: Remove dmar_readl() and dmar_readq() iommufd/selftest: Test dirty tracking on PASID iommu/vt-d: Support dirty tracking on PASID iommu/vt-d: Rename device_set_dirty_tracking() and pass dmar_domain pointer iommu/vt-d: Block PASID attachment to nested domain with dirty tracking iommu/dma: Always allow DMA-FQ when iommupt provides the iommu_domain iommu/riscv: Fix signedness bug iommu/amd: Fix illegal cap/mmio access in IOMMU debugfs iommu/amd: Fix illegal device-id access in IOMMU debugfs iommu/tegra241-cmdqv: Update uAPI to clarify HYP_OWN requirement iommu/tegra241-cmdqv: Set supports_cmd op in tegra241_vcmdq_hw_init() ...
2026-04-15Merge tag 'drm-next-2026-04-15' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "Highlights: - new DRM RAS infrastructure using netlink - amdgpu: enable DC on CIK APUs, and more IP enablement, and more user queue work - xe: purgeable BO support, and new hw enablement - dma-buf : add revocable operations Full summary: mm: - two-pass MMU interval notifiers - add gpu active/reclaim per-node stat counters math: - provide __KERNEL_DIV_ROUND_CLOSEST() in UAPI - implement DIV_ROUND_CLOSEST() with __KERNEL_DIV_ROUND_CLOSEST() rust: - shared tag with driver-core: register macro and io infra - core: rework DMA coherent API - core: add interop::list to interop with C linked lists - core: add more num::Bounded operations - core: enable generic_arg_infer and add EMSGSIZE - workqueue: add ARef<T> support for work and delayed work - add GPU buddy allocator abstraction - add DRM shmem GEM helper abstraction - allow drm:::Device to dispatch work and delayed work items to driver private data - add dma_resv_lock helper and raw accessors core: - introduce DRM RAS infrastructure over netlink - add connector panel_type property - fourcc: add ARM interleaved 64k modifier - colorop: add destroy helper - suballoc: split into alloc and init helpers - mode: provide DRM_ARGB_GET*() macros for reading color components edid: - provide drm_output_color_Format dma-buf: - provide revoke mechanism for shared buffers - rename move_notify to invalidate_mappings - always enable move_notify - protect dma_fence_ops with RCU and improve locking - clean pages with helpers atomic: - allocate drm_private_state via callback - helper: use system_percpu_wq buddy: - make buddy allocator available to gpu level - add kernel-doc for buddy allocator - improve aligned allocation ttm: - fix fence signalling - improve tests and docs - improve handling of gfp_retry_mayfail - use per-node stat counters to track memory allocations - port pool to use list_lru - drop NUMA specific pools - make pool shrinker numa aware - track allocated pages per numa node coreboot: - cleanup coreboot framebuffer support sched: - fix race condition in drm_sched_fini pagemap: - enable THP support - pass pagemap_addr by reference gem-shmem: - Track page accessed/dirty status across mmap/vmap gpusvm: - reenable device to device migration - fix unbalanced unclock bridge: - anx7625: Support USB-C plus DT bindings - connector: Fix EDID detection - dw-hdmi-qp: Support Vendor-Specfic and SDP Infoframes; improve others - fsl-ldb: Fix visual artifacts plus related DT property 'enable-termination-resistor' - imx8qxp-pixel-link: Improve bridge reference handling - lt9611: Support Port-B-only input plus DT bindings - tda998x: Support DRM_BRIDGE_ATTACH_NO_CONNECTOR; Clean up - Support TH1520 HDMI plus DT bindings - waveshare-dsi: Fix register and attach; Support 1..4 DSI lanes plus DT bindings - anx7625: Fix USB Type-C handling - cdns-mhdp8546-core: Handle HDCP state in bridge atomic_check - Support Lontium LT8713SX DP MST bridge plus DT bindings - analogix_dp: Use DP helpers for link training panel: - panel-jdi-lt070me05000: Use mipi-dsi multi functions - panel-edp: Support Add AUO B116XAT04.1 (HW: 1A); Support CMN N116BCL-EAK (C2); Support FriendlyELEC plus DT changes - panel-edp: Fix timings for BOE NV140WUM-N64 - ilitek-ili9882t: Allow GPIO calls to sleep - jadard: Support TAIGUAN XTI05101-01A - lxd: Support LXD M9189A plus DT bindings - mantix: Fix pixel clock; Clean up - motorola: Support Motorola Atrix 4G and Droid X2 plus DT bindings - novatek: Support Novatek/Tianma NT37700F plus DT bindings - simple: Support EDT ET057023UDBA plus DT bindings; Support Powertip PH800480T032-ZHC19 plus DT bindings; Support Waveshare 13.3" - novatek-nt36672a: Use mipi_dsi_*_multi() functions - panel-edp: Support BOE NV153WUM-N42, CMN N153JCA-ELK, CSW MNF307QS3-2 - support Himax HX83121A plus DT bindings - support JuTouch JT070TM041 plus DT bindings - support Samsung S6E8FC0 plus DT bindings - himax-hx83102c: support Samsung S6E8FC0 plus DT bindings; support backlight - ili9806e: support Rocktech RK050HR345-CT106A plus DT bindings - simple: support Tianma TM050RDH03 plus DT bindings amdgpu: - enable DC by default on CIK APUs - userq fence ioctl param size fixes - set panel_type to OLED for eDP - refactor DC i2c code - FAMS2 update - rework ttm handling to allow multiple engines - DC DCE 6.x cleanup - DC support for NUTMEG/TRAVIS DP bridge - DCN 4.2 support - GC12 idle power fix for compute - use struct drm_edid in non-DC code - enable NV12/P010 support on primary planes - support newer IP discovery tables - VCN/JPEG 5.0.2 support - GC/MES 12.1 updates - USERQ fixes - add DC idle state manager - eDP DSC seamless boot amdkfd: - GC 12.1 updates - non 4K page fixes xe: - basic Xe3p_LPG and NVL-P enabling patches - allow VM_BIND decompress support - add purgeable buffer object support - add xe_vm_get_property_ioctl - restrict multi-lrc to VCS/VECS engines - allow disabling VM overcommit in fault mode - dGPU memory optimizations - Workaround cleanups and simplification - Allow VFs VRAM quote changes using sysfs - convert GT stats to per-cpu counters - pagefault refactors - enable multi-queue on xe3p_xpc - disable DCC on PTL - make MMIO communication more robust - disable D3Cold for BMG on specific platforms - vfio: improve FLR sync for Xe VFIO i915/display: - C10/C20/LT PHY PLL divider verification - use trans push mechanism to generate PSR frame change on LNL+ - refactor DP DSC slice config - VGA decode refactoring - refactor DPT, gen2-4 overlay, masked field register macro helpers - refactor stolen memory allocation decisions - prepare for UHBR DP tunnels - refactor LT PHY PLL to use DPLL framework - implement register polling/waiting in display code - add shared stepping header between i915 and display i915: - fix potential overflow of shmem scatterlist length nouveau: - provide Z cull info to userspace - initial GA100 support - shutdown on PCI device shutdown nova-core: - harden GSP command queue - add support for large RPCs - simplify GSP sequencer and message handling - refactor falcon firmware handling - convert to new register macro - conver to new DMA coherent API - use checked arithmetic - add debugfs support for gsp-rm log buffers - fix aux device registration for multi-GPU msm: - CI: - Uprev mesa - Restore CI jobs for Qualcomm APQ8016 and APQ8096 devices - Core: - Switched to of_get_available_child_by_name() - DPU: - Fixes for DSC panels - Fixed brownout because of the frequency / OPP mismatch - Quad pipe preparation (not enabled yet) - Switched to virtual planes by default - Dropped VBIF_NRT support - Added support for Eliza platform - Reworked alpha handling - Switched to correct CWB definitions on Eliza - Dropped dummy INTF_0 on MSM8953 - Corrected INTFs related to DP-MST - DP: - Removed debug prints looking into PHY internals - DSI: - Fixes for DSC panels - RGB101010 support - Support for SC8280XP - Moved PHY bindings from display/ to phy/ - GPU: - Preemption support for x2-85 and a840 - IFPC support for a840 - SKU detection support for x2-85 and a840 - Expose AQE support (VK ray-pipeline) - Avoid locking in VM_BIND fence signaling path - Fix to avoid reclaim in GPU snapshot path - Disallow foreign mapping of _NO_SHARE BOs - HDMI: - Fixed infoframes programming - MDP5: - Dropped support for MSM8974v1 - Dropped now unused code for MSM8974 v1 and SDM660 / MSM8998 panthor: - add tracepoints for power and IRQs - fix fence handling - extend timestamp query with flags - support various sources for timestamp queries tyr: - fix names and model/versions rockchip: - vop2: use drm logging function - rk3576 displayport support - support CRTC background color atmel-hlcdc: - support sana5d65 LCD controller tilcdc: - use DT bindings schema - use managed DRM interfaces - support DRM_BRIDGE_ATTACH_NO_CONNECTOR verisilicon: - support DC8200 + DT bindings virtgpu: - support PRIME import with 3D enabled komeda: - fix integer overflow in AFBC checks mcde: - improve bridge handling gma500: - use drm client buffer for fbdev framebuffer amdxdna: - add sensors ioctls - provide NPU power estimate - support column utilization sensor - allow forcing DMA through IOMMU IOVA - support per-BO mem usage queries - refactor GEM implementation ivpu: - update boot API to v3.29.4 - limit per-user number of doorbells/contexts - perform engine reset on TDR error loongson: - replace custom code with drm_gem_ttm_dumb_map_offset() imx: - support planes behind the primary plane - fix bus-format selection vkms: - support CRTC background color v3d: - improve handling of struct v3d_stats komeda: - support Arm China Linlon D6 plus DT bindings imagination: - improve power-off sequence - support context-reset notification from firmware mediatek: - mtk_dsi: enable hs clock during pre-enable - Remove all conflicting aperture devices during probe - Add support for mt8167 display blocks" * tag 'drm-next-2026-04-15' of https://gitlab.freedesktop.org/drm/kernel: (1735 commits) drm/ttm/tests: Remove checks from ttm_pool_free_no_dma_alloc drm/ttm/tests: fix lru_count ASSERT drm/vram: remove DRM_VRAM_MM_FILE_OPERATIONS from docs drm/fb-helper: Fix a locking bug in an error path dma-fence: correct kernel-doc function parameter @flags ttm/pool: track allocated_pages per numa node. ttm/pool: make pool shrinker NUMA aware (v2) ttm/pool: drop numa specific pools ttm/pool: port to list_lru. (v2) drm/ttm: use gpu mm stats to track gpu memory allocations. (v4) mm: add gpu active/reclaim per-node stat counters (v2) gpu: nova-core: fix missing colon in SEC2 boot debug message gpu: nova-core: vbios: use from_le_bytes() for PCI ROM header parsing gpu: nova-core: bitfield: fix broken Default implementation gpu: nova-core: falcon: pad firmware DMA object size to required block alignment gpu: nova-core: gsp: fix undefined behavior in command queue code drm/shmem_helper: Make sure PMD entries get the writeable upgrade accel/ivpu: Trigger recovery on TDR with OS scheduling drm/msm: Use of_get_available_child_by_name() dt-bindings: display/msm: move DSI PHY bindings to phy/ subdir ...
2026-04-11iommufd: Fix a race with concurrent allocation and unmapSina Hassani
iopt_unmap_iova_range() releases the lock on iova_rwsem inside the loop body when getting to the more expensive unmap operations. This is fine on its own, except the loop condition is based on the first area that matches the unmap address range. If a concurrent call to map picks an area that was unmapped in previous iterations, the loop mistakenly tries to unmap it. This is reproducible by having one userspace thread map buffers and pass them to another thread that unmaps them. The problem manifests as EBUSY errors with single page mappings. Fix this by advancing the start pointer after unmapping an area. This ensures each iteration only examines the IOVA range that remains mapped, which is guaranteed not to have overlaps. Cc: stable@vger.kernel.org Fixes: 51fe6141f0f6 ("iommufd: Data structure to provide IOVA to PFN mapping") Link: https://patch.msgid.link/r/CAAJpGJSR4r_ds1JOjmkqHtsBPyxu8GntoeW08Sk5RNQPmgi+tg@mail.gmail.com Signed-off-by: Sina Hassani <sina@openai.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-09Merge tag 'iommu-fixes-v7.0-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull IOMMU fix from Will Deacon: - Fix regression introduced by the empty MMU gather fix in -rc7, where the ->iotlb_sync() callback can be elided incorrectly, resulting in boot failures (hangs), crashes and potential memory corruption. * tag 'iommu-fixes-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu: Ensure .iotlb_sync is called correctly
2026-04-09Merge branches 'fixes', 'arm/smmu/updates', 'arm/smmu/bindings', 'riscv', ↵Will Deacon
'intel/vt-d', 'amd/amd-vi' and 'core' into next
2026-04-09iommu: Ensure .iotlb_sync is called correctlyRobin Murphy
Many drivers have no reason to use the iotlb_gather mechanism, but do still depend on .iotlb_sync being called to properly complete an unmap. Since the core code is now relying on the gather to detect when there is legitimately something to sync, it should also take care of encoding a successful unmap when the driver does not touch the gather itself. Fixes: 90c5def10bea ("iommu: Do not call drivers for empty gathers") Reported-by: Jon Hunter <jonathanh@nvidia.com> Closes: https://lore.kernel.org/r/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Will Deacon <will@kernel.org>
2026-04-09iommu/vt-d: Restore IOMMU_CAP_CACHE_COHERENCYAlex Williamson
In removing IOMMU_CAP_DEFERRED_FLUSH, the below referenced commit was over-eager in removing the return, resulting in the test for IOMMU_CAP_CACHE_COHERENCY falling through to an irrelevant option. Restore dropped return. Fixes: 1c18a1212c77 ("iommu/dma: Always allow DMA-FQ when iommupt provides the iommu_domain") Signed-off-by: Alex Williamson <alex.williamson@nvidia.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-04-07Merge v7.0-rc7 into drm-nextSimona Vetter
Thomas Zimmermann needs 2f42c1a61616 ("drm/ast: dp501: Fix initialization of SCU2C") for drm-misc-next. Conflicts: - drivers/gpu/drm/amd/display/dc/hwss/dcn401/dcn401_hwseq.c Just between e927b36ae18b ("drm/amd/display: Fix NULL pointer dereference in dcn401_init_hw()") and it's cherry-pick that confused git. - drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c Deleted in 6b0a6116286e ("drm/amd/pm: Unify version check in SMUv11") but some cherry-picks confused git. Same for v12/v14. Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
2026-04-02Merge tag 'iommu-fixes-v7.0-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fixes from Joerg Roedel: - IOMMU-PT related compile breakage in for AMD driver - IOTLB flushing behavior when unmapped region is larger than requested due to page-sizes - Fix IOTLB flush behavior with empty gathers * tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inline iommupt: Fix short gather if the unmap goes into a large mapping iommu: Do not call drivers for empty gathers
2026-04-02iommu/amd: Invalidate IRT cache for DMA aliasesMagnus Kalland
DMA aliasing causes interrupt remapping table entries (IRTEs) to be shared between multiple device IDs. See commit 3c124435e8dd ("iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping") for more information on this. However, the AMD IOMMU driver currently invalidates IRTE cache entries on a per-device basis whenever an IRTE is updated, not for each alias. This approach leaves stale IRTE cache entries when an IRTE is cached under one DMA alias but later updated and invalidated through a different alias. In such cases, the original device ID is never invalidated, since it is programmed via aliasing. This incoherency bug has been observed when IRTEs are cached for one Non-Transparent Bridge (NTB) DMA alias, later updated via another. Fix this by invalidating the interrupt remapping table cache for all DMA aliases when updating an IRTE. Co-developed-by: Lars B. Kristiansen <larsk@dolphinics.com> Signed-off-by: Lars B. Kristiansen <larsk@dolphinics.com> Co-developed-by: Jonas Markussen <jonas@dolphinics.com> Signed-off-by: Jonas Markussen <jonas@dolphinics.com> Co-developed-by: Tore H. Larsen <torel@simula.no> Signed-off-by: Tore H. Larsen <torel@simula.no> Signed-off-by: Magnus Kalland <magnus@dolphinics.com> Link: https://lore.kernel.org/linux-iommu/9204da81-f821-4034-b8ad-501e43383b56@amd.com/ Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/riscv: Remove overflows on the invalidation pathJason Gunthorpe
Since RISC-V supports a sign extended page table it should support a gather->end of ULONG_MAX, but if this happens it will infinite loop because of the overflow. Also avoid overflow computing the length by moving the +1 to the other side of the < Fixes: 488ffbf18171 ("iommu/riscv: Paging domain support") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/amd: Fix clone_alias() to use the original device's devidVasant Hegde
Currently clone_alias() assumes first argument (pdev) is always the original device pointer. This function is called by pci_for_each_dma_alias() which based on topology decides to send original or alias device details in first argument. This meant that the source devid used to look up and copy the DTE may be incorrect, leading to wrong or stale DTE entries being propagated to alias device. Fix this by passing the original pdev as the opaque data argument to both the direct clone_alias() call and pci_for_each_dma_alias(). Inside clone_alias(), retrieve the original device from data and compute devid from it. Fixes: 3332364e4ebc ("iommu/amd: Support multiple PCI DMA aliases in device table") Signed-off-by: Vasant Hegde <vasant.hegde@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Remove the remaining pages along the invalidation pathJason Gunthorpe
This was only being used to signal that a flush all should be used. Use mask/size_order >= 52 to signal this instead. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/3-v1-f175e27af136+11647-iommupt_inv_vtd_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Pass size_order to qi_desc_piotlb() not npagesJason Gunthorpe
It doesn't make sense for the caller to compute mask, throw it away and then have qi_desc_piotlb() compute it again. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/2-v1-f175e27af136+11647-iommupt_inv_vtd_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Split piotlb invalidation into range and allJason Gunthorpe
Currently these call chains are muddled up by using npages=-1, but only one caller has the possibility to do both options. Simplify qi_flush_piotlb() to qi_flush_piotlb_all() since all callers pass npages=-1. Split qi_batch_add_piotlb() into qi_batch_add_piotlb_all() and related helpers. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1-v1-f175e27af136+11647-iommupt_inv_vtd_jgg@nvidia.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Remove dmar_writel() and dmar_writeq()Bjorn Helgaas
dmar_writel() and dmar_writeq() do nothing other than expand to the generic writel() and writeq(), and the dmar_write*() wrappers are used inconsistently. Remove the dmar_write*() wrappers and use writel() and writeq() directly. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Link: https://lore.kernel.org/r/20260217214438.3395039-3-bhelgaas@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Remove dmar_readl() and dmar_readq()Bjorn Helgaas
dmar_readl() and dmar_readq() do nothing other than expand to the generic readl() and readq(), and the dmar_read*() wrappers are used inconsistently. Remove the dmar_read*() wrappers and use readl() and readq() directly. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Link: https://lore.kernel.org/r/20260217214438.3395039-2-bhelgaas@google.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Support dirty tracking on PASIDZhenzhong Duan
In order to support passthrough device with PASID capability in QEMU, e.g., DSA device, kernel needs to support attaching PASID to a domain. But attaching is not allowed if the domain is a second stage domain or nested domain with dirty tracking. The reason is kernel lacking support for dirty tracking on such domain attached to PASID. By adding dirty tracking on PASID, the check can be removed. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260330101108.12594-4-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Rename device_set_dirty_tracking() and pass dmar_domain pointerZhenzhong Duan
device_set_dirty_tracking() sets dirty tracking on all devices attached to a domain, also on all PASIDs attached to same domain in subsequent patch. So rename it as domain_set_dirty_tracking() and pass dmar_domain pointer to better align to what it does. No functional changes intended. Suggested-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260330101108.12594-3-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-02iommu/vt-d: Block PASID attachment to nested domain with dirty trackingZhenzhong Duan
Kernel lacks dirty tracking support on nested domain attached to PASID, fails the attachment early if nesting parent domain is dirty tracking configured, otherwise dirty pages would be lost. Cc: stable@vger.kernel.org Fixes: 67f6f56b5912 ("iommu/vt-d: Add set_dev_pasid callback for nested domain") Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20260330101108.12594-2-zhenzhong.duan@intel.com Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Fixes: 67f6f56b5912 ("iommu/vt-d: Add set_dev_pasid callback for nested domain") Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-04-01iommu/dma: Always allow DMA-FQ when iommupt provides the iommu_domainJason Gunthorpe
iommupt always supports the semantics required for DMA-FQ, when drivers are converted to use it they automatically get support. Detect iommpt directly instead of using IOMMU_CAP_DEFERRED_FLUSH and remove IOMMU_CAP_DEFERRED_FLUSH from converted drivers. This will also enable DMA-FQ on RISC-V. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-31iommufd/selftest: Remove MOCK_IOMMUPT_AMDV1 formatPranjal Shrivastava
syzbot found that allocating a mock domain with AMDV1 format could cause a WARN_ON because the selftest enabled DYNAMIC_TOP without providing the required driver_ops. The AMDV1 format in the selftest was a placeholder and was not actually used by any of the existing selftests. Instead of adding dummy driver_ops to satisfy the requirements of a format we don't currently test, remove the AMDV1 format option from the selftest. The MOCK_IOMMUPT_DEFAULT and MOCK_IOMMUPT_HUGE formats are unaffected as they use the amdv1_mock variant which does not enable DYNAMIC_TOP. Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op") Link: https://patch.msgid.link/r/20260330092609.2659235-1-praan@google.com Reported-by: syzbot+453eb7add07c3767adab@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69c1d50b.a70a0220.3cae05.0001.GAE@google.com/ Signed-off-by: Pranjal Shrivastava <praan@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-31iommufd: Fix return value of iommufd_fault_fops_write()Zhenzhong Duan
copy_from_user() may return number of bytes failed to copy, we should not pass over this number to user space to cheat that write() succeed. Instead, -EFAULT should be returned. Link: https://patch.msgid.link/r/20260330030755.12856-1-zhenzhong.duan@intel.com Cc: stable@vger.kernel.org Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Pranjal Shrivastava <praan@google.com> Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-31BackMerge tag 'v7.0-rc6' into drm-nextDave Airlie
Linux 7.0-rc6 Requested by a few people on irc to resolve conflicts in other tress. Signed-off-by: Dave Airlie <airlied@redhat.com>
2026-03-27iommu/riscv: Fix signedness bugEthan Tidmore
The function platform_irq_count() returns negative error codes and iommu->irqs_count is an unsigned integer, so the check (iommu->irqs_count <= 0) is always impossible. Make the return value of platform_irq_count() be assigned to ret, check for error, and then assign iommu->irqs_count to ret. Detected by Smatch: drivers/iommu/riscv/iommu-platform.c:119 riscv_iommu_platform_probe() warn: 'iommu->irqs_count' unsigned <= 0 Signed-off-by: Ethan Tidmore <ethantidmore06@gmail.com> Fixes: 5c0ebbd3c6c6 ("iommu/riscv: Add RISC-V IOMMU platform device driver") Reviewed-by: Andrew Jones <andrew.jones@oss.qualcomm.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-03-27iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inlineSherry Yang
After enabling CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL, following build failure is observed under GCC 14.2.1: In function 'amdv1pt_install_leaf_entry', inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:650:3, inlined from '__map_single_page0' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:661:1, inlined from 'pt_descend' at drivers/iommu/generic_pt/fmt/../pt_iter.h:391:9, inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:657:10, inlined from '__map_single_page1.constprop' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:661:1: ././include/linux/compiler_types.h:706:45: error: call to '__compiletime_assert_71' declared with attribute error: FIELD_PREP: value too large for the field 706 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ...... drivers/iommu/generic_pt/fmt/amdv1.h:220:26: note: in expansion of macro 'FIELD_PREP' 220 | FIELD_PREP(AMDV1PT_FMT_OA, | ^~~~~~~~~~ In the path '__do_map_single_page()', level 0 always invokes 'pt_install_leaf_entry(&pts, map->oa, PAGE_SHIFT, …)'. At runtime that lands in the 'if (oasz_lg2 == isz_lg2)' arm of 'amdv1pt_install_leaf_entry()'; the contiguous-only 'else' block is unreachable for 4 KiB pages. With CONFIG_GCOV_KERNEL + CONFIG_GCOV_PROFILE_ALL, the extra instrumentation changes GCC's inlining so that the "dead" 'else' branch still gets instantiated. The compiler constant-folds the contiguous OA expression, runs the 'FIELD_PREP()' compile-time check, and produces: FIELD_PREP: value too large for the field gcov-enabled builds therefore fail even though the code path never executes. Fix this by marking amdv1pt_install_leaf_entry as __always_inline. Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op") Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sherry Yang <sherry.yang@oracle.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>