<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/iommu/generic_pt, branch linux-rolling-stable</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>iommupt: Fix the end_index calculation in __map_range_leaf()</title>
<updated>2026-06-01T15:54:45+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2026-05-12T16:46:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a121bda169146f1fefb5d84e9a073cede8cf7a9d'/>
<id>a121bda169146f1fefb5d84e9a073cede8cf7a9d</id>
<content type='text'>
[ Upstream commit 58829512ad461af8f35941069c209941e3a97b65 ]

Sashiko noticed a mismatch of units in this math: num_leaves is
actually the number of leaf *entries* (so a 16-item contiguous leaf
is one num_leaves), while index is in items. The mismatch in maths
causes __map_range_leaf() to exit early instead of efficiently
filling a larger range of contiguous PTEs.

The early exit is caught by the functions above and then
__map_range_leaf() is re-invoked, so there is no functional issue.

Correct the misuse of units by adjusting num_leaves with the leaf
size and avoid the performance cost of looping externally.

There are also some mismatched types for num_leaves; simplify
things to remove the duplicated calculations.

Fixes: d6c65b0fd621 ("iommupt: Avoid rewalking during map")
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Reviewd-by: Pranjal Shrivastava &lt;praan@google.com&gt;
Tested-by: Josua Mayer &lt;josua@solid-run.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 58829512ad461af8f35941069c209941e3a97b65 ]

Sashiko noticed a mismatch of units in this math: num_leaves is
actually the number of leaf *entries* (so a 16-item contiguous leaf
is one num_leaves), while index is in items. The mismatch in maths
causes __map_range_leaf() to exit early instead of efficiently
filling a larger range of contiguous PTEs.

The early exit is caught by the functions above and then
__map_range_leaf() is re-invoked, so there is no functional issue.

Correct the misuse of units by adjusting num_leaves with the leaf
size and avoid the performance cost of looping externally.

There are also some mismatched types for num_leaves; simplify
things to remove the duplicated calculations.

Fixes: d6c65b0fd621 ("iommupt: Avoid rewalking during map")
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Reviewd-by: Pranjal Shrivastava &lt;praan@google.com&gt;
Tested-by: Josua Mayer &lt;josua@solid-run.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommupt: Check for missing PAGE_SIZE in the pgsize_bitmap</title>
<updated>2026-06-01T15:54:45+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2026-05-12T16:46:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=00850f41da24423587abd6124a790dd4f12bcef3'/>
<id>00850f41da24423587abd6124a790dd4f12bcef3</id>
<content type='text'>
[ Upstream commit 8ef3f77c440005c7f04229a75976bfc078364247 ]

Sashiko pointed out that the driver could drop PAGE_SIZE from the
pgsize_bitmap. That is technically allowed but nothing does it, and
such an iommu_domain would not be used with the DMA API today.

Still, it is against the design and it is trivial to fix up. Lift
the PT_WARN_ON to the if branch and just skip the fast path.

Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op")
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Pranjal Shrivastava &lt;praan@google.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Tested-by: Josua Mayer &lt;josua@solid-run.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 8ef3f77c440005c7f04229a75976bfc078364247 ]

Sashiko pointed out that the driver could drop PAGE_SIZE from the
pgsize_bitmap. That is technically allowed but nothing does it, and
such an iommu_domain would not be used with the DMA API today.

Still, it is against the design and it is trivial to fix up. Lift
the PT_WARN_ON to the if branch and just skip the fast path.

Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op")
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Pranjal Shrivastava &lt;praan@google.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Tested-by: Josua Mayer &lt;josua@solid-run.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommupt: Avoid rewalking during map</title>
<updated>2026-06-01T15:54:44+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2026-02-27T19:30:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=92ec4e9185f374a0a9450835d7d761513e2e41b1'/>
<id>92ec4e9185f374a0a9450835d7d761513e2e41b1</id>
<content type='text'>
[ Upstream commit d6c65b0fd6218bd21ed0be7a8d3218e8f6dc91de ]

Currently the core code provides a simplified interface to drivers where
it fragments a requested multi-page map into single page size steps after
doing all the calculations to figure out what page size is
appropriate. Each step rewalks the page tables from the start.

Since iommupt has a single implementation of the mapping algorithm it can
internally compute each step as it goes while retaining its current
position in the walk.

Add a new function pt_pgsz_count() which computes the same page size
fragement of a large mapping operations.

Compute the next fragment when all the leaf entries of the current
fragement have been written, then continue walking from the current
point.

The function pointer is run through pt_iommu_ops instead of
iommu_domain_ops to discourage using it outside iommupt. All drivers with
their own page tables should continue to use the simplified map_pages()
style interfaces.

Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Stable-dep-of: 0735c54804c7 ("iommu: Handle unmap error when iommu_debug is enabled")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit d6c65b0fd6218bd21ed0be7a8d3218e8f6dc91de ]

Currently the core code provides a simplified interface to drivers where
it fragments a requested multi-page map into single page size steps after
doing all the calculations to figure out what page size is
appropriate. Each step rewalks the page tables from the start.

Since iommupt has a single implementation of the mapping algorithm it can
internally compute each step as it goes while retaining its current
position in the walk.

Add a new function pt_pgsz_count() which computes the same page size
fragement of a large mapping operations.

Compute the next fragment when all the leaf entries of the current
fragement have been written, then continue walking from the current
point.

The function pointer is run through pt_iommu_ops instead of
iommu_domain_ops to discourage using it outside iommupt. All drivers with
their own page tables should continue to use the simplified map_pages()
style interfaces.

Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Stable-dep-of: 0735c54804c7 ("iommu: Handle unmap error when iommu_debug is enabled")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommupt: Directly call iommupt's unmap_range()</title>
<updated>2026-06-01T15:54:44+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2026-02-27T19:30:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=edf82f7679d559e9fdd790ae0ad38c9ae4b4408a'/>
<id>edf82f7679d559e9fdd790ae0ad38c9ae4b4408a</id>
<content type='text'>
[ Upstream commit 99fb8afa16add85ed016baee9735231bca0c32b4 ]

The common algorithm in iommupt does not require the iommu_pgsize()
calculations, it can directly unmap any arbitrary range. Add a new function
pointer to directly call an iommupt unmap_range op and make
__iommu_unmap() call it directly.

Gives about a 5% gain on single page unmappings.

The function pointer is run through pt_iommu_ops instead of
iommu_domain_ops to discourage using it outside iommupt. All drivers with
their own page tables should continue to use the simplified
map/unmap_pages() style interfaces.

Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Stable-dep-of: 0735c54804c7 ("iommu: Handle unmap error when iommu_debug is enabled")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 99fb8afa16add85ed016baee9735231bca0c32b4 ]

The common algorithm in iommupt does not require the iommu_pgsize()
calculations, it can directly unmap any arbitrary range. Add a new function
pointer to directly call an iommupt unmap_range op and make
__iommu_unmap() call it directly.

Gives about a 5% gain on single page unmappings.

The function pointer is run through pt_iommu_ops instead of
iommu_domain_ops to discourage using it outside iommupt. All drivers with
their own page tables should continue to use the simplified
map/unmap_pages() style interfaces.

Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Stable-dep-of: 0735c54804c7 ("iommu: Handle unmap error when iommu_debug is enabled")
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommupt/amdv1: mark amdv1pt_install_leaf_entry as __always_inline</title>
<updated>2026-03-27T08:41:48+00:00</updated>
<author>
<name>Sherry Yang</name>
<email>sherry.yang@oracle.com</email>
</author>
<published>2026-03-26T16:17:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8b72aa5704c77380742346d4ac755b074b7f9eaa'/>
<id>8b72aa5704c77380742346d4ac755b074b7f9eaa</id>
<content type='text'>
After enabling CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL, following
build failure is observed under GCC 14.2.1:

In function 'amdv1pt_install_leaf_entry',
    inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:650:3,
    inlined from '__map_single_page0' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:661:1,
    inlined from 'pt_descend' at drivers/iommu/generic_pt/fmt/../pt_iter.h:391:9,
    inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:657:10,
    inlined from '__map_single_page1.constprop' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:661:1:
././include/linux/compiler_types.h:706:45: error: call to '__compiletime_assert_71' declared with attribute error: FIELD_PREP: value too large for the field
  706 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |

......

drivers/iommu/generic_pt/fmt/amdv1.h:220:26: note: in expansion of macro 'FIELD_PREP'
  220 |                          FIELD_PREP(AMDV1PT_FMT_OA,
      |                          ^~~~~~~~~~

In the path '__do_map_single_page()', level 0 always invokes
'pt_install_leaf_entry(&amp;pts, map-&gt;oa, PAGE_SHIFT, …)'. At runtime that
lands in the 'if (oasz_lg2 == isz_lg2)' arm of 'amdv1pt_install_leaf_entry()';
the contiguous-only 'else' block is unreachable for 4 KiB pages.

With CONFIG_GCOV_KERNEL + CONFIG_GCOV_PROFILE_ALL, the extra
instrumentation changes GCC's inlining so that the "dead" 'else' branch
still gets instantiated. The compiler constant-folds the contiguous OA
expression, runs the 'FIELD_PREP()' compile-time check, and produces:

    FIELD_PREP: value too large for the field

gcov-enabled builds therefore fail even though the code path never executes.

Fix this by marking amdv1pt_install_leaf_entry as __always_inline.

Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op")
Suggested-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Sherry Yang &lt;sherry.yang@oracle.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After enabling CONFIG_GCOV_KERNEL and CONFIG_GCOV_PROFILE_ALL, following
build failure is observed under GCC 14.2.1:

In function 'amdv1pt_install_leaf_entry',
    inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:650:3,
    inlined from '__map_single_page0' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:661:1,
    inlined from 'pt_descend' at drivers/iommu/generic_pt/fmt/../pt_iter.h:391:9,
    inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:657:10,
    inlined from '__map_single_page1.constprop' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:661:1:
././include/linux/compiler_types.h:706:45: error: call to '__compiletime_assert_71' declared with attribute error: FIELD_PREP: value too large for the field
  706 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |

......

drivers/iommu/generic_pt/fmt/amdv1.h:220:26: note: in expansion of macro 'FIELD_PREP'
  220 |                          FIELD_PREP(AMDV1PT_FMT_OA,
      |                          ^~~~~~~~~~

In the path '__do_map_single_page()', level 0 always invokes
'pt_install_leaf_entry(&amp;pts, map-&gt;oa, PAGE_SHIFT, …)'. At runtime that
lands in the 'if (oasz_lg2 == isz_lg2)' arm of 'amdv1pt_install_leaf_entry()';
the contiguous-only 'else' block is unreachable for 4 KiB pages.

With CONFIG_GCOV_KERNEL + CONFIG_GCOV_PROFILE_ALL, the extra
instrumentation changes GCC's inlining so that the "dead" 'else' branch
still gets instantiated. The compiler constant-folds the contiguous OA
expression, runs the 'FIELD_PREP()' compile-time check, and produces:

    FIELD_PREP: value too large for the field

gcov-enabled builds therefore fail even though the code path never executes.

Fix this by marking amdv1pt_install_leaf_entry as __always_inline.

Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op")
Suggested-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Sherry Yang &lt;sherry.yang@oracle.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommupt: Fix short gather if the unmap goes into a large mapping</title>
<updated>2026-03-27T08:07:13+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2026-03-02T22:22:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ee6e69d032550687a3422504bfca3f834c7b5061'/>
<id>ee6e69d032550687a3422504bfca3f834c7b5061</id>
<content type='text'>
unmap has the odd behavior that it can unmap more than requested if the
ending point lands within the middle of a large or contiguous IOPTE.

In this case the gather should flush everything unmapped which can be
larger than what was requested to be unmapped. The gather was only
flushing the range requested to be unmapped, not extending to the extra
range, resulting in a short invalidation if the caller hits this special
condition.

This was found by the new invalidation/gather test I am adding in
preparation for ARMv8. Claude deduced the root cause.

As far as I remember nothing relies on unmapping a large entry, so this is
likely not a triggerable bug.

Cc: stable@vger.kernel.org
Fixes: 7c53f4238aa8 ("iommupt: Add unmap_pages op")
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Reviewed-by: Vasant Hegde &lt;vasant.hegde@amd.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
unmap has the odd behavior that it can unmap more than requested if the
ending point lands within the middle of a large or contiguous IOPTE.

In this case the gather should flush everything unmapped which can be
larger than what was requested to be unmapped. The gather was only
flushing the range requested to be unmapped, not extending to the extra
range, resulting in a short invalidation if the caller hits this special
condition.

This was found by the new invalidation/gather test I am adding in
preparation for ARMv8. Claude deduced the root cause.

As far as I remember nothing relies on unmapping a large entry, so this is
likely not a triggerable bug.

Cc: stable@vger.kernel.org
Fixes: 7c53f4238aa8 ("iommupt: Add unmap_pages op")
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Reviewed-by: Vasant Hegde &lt;vasant.hegde@amd.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branches 'fixes', 'arm/smmu/updates', 'intel/vt-d', 'amd/amd-vi' and 'core' into next</title>
<updated>2026-02-06T10:10:40+00:00</updated>
<author>
<name>Joerg Roedel</name>
<email>joerg.roedel@amd.com</email>
</author>
<published>2026-02-06T10:10:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ad095636604604b3574c1920260b1360c25ced6f'/>
<id>ad095636604604b3574c1920260b1360c25ced6f</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>iommupt: Always add IOVA range to iotlb_gather in gather_range_pages()</title>
<updated>2026-02-03T13:36:21+00:00</updated>
<author>
<name>Yu Zhang</name>
<email>zhangyu1@linux.microsoft.com</email>
</author>
<published>2026-02-03T08:29:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b48ca920613858b477f75946907e72c74570af05'/>
<id>b48ca920613858b477f75946907e72c74570af05</id>
<content type='text'>
Add current (iova, len) to the iotlb gather, regardless of the setting
of PT_FEAT_FLUSH_RANGE or PT_FEAT_FLUSH_RANGE_NO_GAPS.

In gather_range_pages(), the current IOVA range is only added to
iotlb_gather when PT_FEAT_FLUSH_RANGE is set. Yet a virtual IOMMU with
NpCache uses only PT_FEAT_FLUSH_RANGE_NO_GAPS. In that case, iotlb_gather
will stay empty (start=ULONG_MAX, end=0) after initialization, and the
current (iova, len) will not be added to the iotlb_gather, causing
subsequent iommu_iotlb_sync() to perform IOTLB invalidation with wrong
parameters (e.g., amd_iommu_iotlb_sync() computes size from
gather-&gt;end - gather-&gt;start + 1, leading to an invalid range).

The disjoint check and sync for PT_FEAT_FLUSH_RANGE_NO_GAPS remain
unchanged: when the new range is disjoint from the existing gather,
we still sync first and then add the new range, so semantics for
NO_GAPS are preserved.

Fixes: 7c53f4238aa8 ("iommupt: Add unmap_pages op")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Yu Zhang &lt;zhangyu1@linux.microsoft.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add current (iova, len) to the iotlb gather, regardless of the setting
of PT_FEAT_FLUSH_RANGE or PT_FEAT_FLUSH_RANGE_NO_GAPS.

In gather_range_pages(), the current IOVA range is only added to
iotlb_gather when PT_FEAT_FLUSH_RANGE is set. Yet a virtual IOMMU with
NpCache uses only PT_FEAT_FLUSH_RANGE_NO_GAPS. In that case, iotlb_gather
will stay empty (start=ULONG_MAX, end=0) after initialization, and the
current (iova, len) will not be added to the iotlb_gather, causing
subsequent iommu_iotlb_sync() to perform IOTLB invalidation with wrong
parameters (e.g., amd_iommu_iotlb_sync() computes size from
gather-&gt;end - gather-&gt;start + 1, leading to an invalid range).

The disjoint check and sync for PT_FEAT_FLUSH_RANGE_NO_GAPS remain
unchanged: when the new range is disjoint from the existing gather,
we still sync first and then add the new range, so semantics for
NO_GAPS are preserved.

Fixes: 7c53f4238aa8 ("iommupt: Add unmap_pages op")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Yu Zhang &lt;zhangyu1@linux.microsoft.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommupt: Only cache flush memory changed by unmap</title>
<updated>2026-01-28T14:14:17+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2026-01-24T21:00:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5815d9303c67cef5f47cd01e73b671e6b9c40ef3'/>
<id>5815d9303c67cef5f47cd01e73b671e6b9c40ef3</id>
<content type='text'>
The cache flush was happening on every level across the whole range of
iteration, even if no leafs or tables were cleared. Instead flush only the
sub range that was actually written.

Overflushing isn't a correctness problem but it does impact the
performance of unmap.

After this series the performance compared to the original VT-d
implementation with cache flushing turned on is:

map_pages
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,    253,266   ,     213,227     ,   6.06
     2^21,    246,244   ,     221,219     ,   0.00
     2^30,    231,240   ,     209,217     ,   3.03
 256*2^12,   2604,2668  ,    2415,2540    ,   4.04
 256*2^21,   2495,2824  ,    2390,2734    ,  12.12
 256*2^30,   2542,2845  ,    2380,2718    ,  12.12

unmap_pages
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,    259,292   ,     222,251     ,  11.11
     2^21,    255,259   ,     227,236     ,   3.03
     2^30,    238,254   ,     217,230     ,   5.05
 256*2^12,   2751,2620  ,    2417,2437    ,   0.00
 256*2^21,   2461,2526  ,    2377,2423    ,   1.01
 256*2^30,   2498,2543  ,    2370,2404    ,   1.01

Fixes: efa03dab7ce4 ("iommupt: Flush the CPU cache after any writes to the page table")
Reported-by: Francois Dugast &lt;francois.dugast@intel.com&gt;
Closes: https://lore.kernel.org/all/20260121130233.257428-1-francois.dugast@intel.com/
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Tested-by: Francois Dugast &lt;francois.dugast@intel.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The cache flush was happening on every level across the whole range of
iteration, even if no leafs or tables were cleared. Instead flush only the
sub range that was actually written.

Overflushing isn't a correctness problem but it does impact the
performance of unmap.

After this series the performance compared to the original VT-d
implementation with cache flushing turned on is:

map_pages
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,    253,266   ,     213,227     ,   6.06
     2^21,    246,244   ,     221,219     ,   0.00
     2^30,    231,240   ,     209,217     ,   3.03
 256*2^12,   2604,2668  ,    2415,2540    ,   4.04
 256*2^21,   2495,2824  ,    2390,2734    ,  12.12
 256*2^30,   2542,2845  ,    2380,2718    ,  12.12

unmap_pages
   pgsz  ,avg new,old ns, min new,old ns  , min % (+ve is better)
     2^12,    259,292   ,     222,251     ,  11.11
     2^21,    255,259   ,     227,236     ,   3.03
     2^30,    238,254   ,     217,230     ,   5.05
 256*2^12,   2751,2620  ,    2417,2437    ,   0.00
 256*2^21,   2461,2526  ,    2377,2423    ,   1.01
 256*2^30,   2498,2543  ,    2370,2404    ,   1.01

Fixes: efa03dab7ce4 ("iommupt: Flush the CPU cache after any writes to the page table")
Reported-by: Francois Dugast &lt;francois.dugast@intel.com&gt;
Closes: https://lore.kernel.org/all/20260121130233.257428-1-francois.dugast@intel.com/
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Reviewed-by: Lu Baolu &lt;baolu.lu@linux.intel.com&gt;
Tested-by: Francois Dugast &lt;francois.dugast@intel.com&gt;
Reviewed-by: Kevin Tian &lt;kevin.tian@intel.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>iommupt: Make it clearer to the compiler that pts.level == 0 for single page</title>
<updated>2026-01-20T09:18:04+00:00</updated>
<author>
<name>Jason Gunthorpe</name>
<email>jgg@nvidia.com</email>
</author>
<published>2026-01-20T00:19:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=98d5110f90ae0dbc5f2f13f033e06f6d57009e0d'/>
<id>98d5110f90ae0dbc5f2f13f033e06f6d57009e0d</id>
<content type='text'>
Older versions of gcc and clang sometimes get tripped up by the build time
assertion in FIELD_PREP because they can see that the argument to
FIELD_PREP is constant but can't see that the if condition protecting it
is also a constant false.

   In file included from &lt;command-line&gt;:
   In function 'amdv1pt_install_leaf_entry',
       inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:651:3,
       inlined from '__map_single_page0' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:662:1,
       inlined from 'pt_descend' at drivers/iommu/generic_pt/fmt/../pt_iter.h:391:9,
       inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:658:10,
       inlined from '__map_single_page1.constprop' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:662:1:
   ././include/linux/compiler_types.h:631:45: error: call to '__compiletime_assert_251' declared with attribute error: FIELD_PREP: value too large for the field
     631 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
	 |                                             ^
   ././include/linux/compiler_types.h:612:25: note: in definition of macro '__compiletime_assert'
     612 |                         prefix ## suffix();                             \
	 |                         ^~~~~~
   ././include/linux/compiler_types.h:631:9: note: in expansion of macro '_compiletime_assert'
     631 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
	 |         ^~~~~~~~~~~~~~~~~~~
   ./include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
	 |                                     ^~~~~~~~~~~~~~~~~~
   ./include/linux/bitfield.h:69:17: note: in expansion of macro 'BUILD_BUG_ON_MSG'
      69 |                 BUILD_BUG_ON_MSG(__builtin_constant_p(_val) ?           \
	 |                 ^~~~~~~~~~~~~~~~
   ./include/linux/bitfield.h:90:17: note: in expansion of macro '__BF_FIELD_CHECK_MASK'
      90 |                 __BF_FIELD_CHECK_MASK(mask, val, pfx);                  \
	 |                 ^~~~~~~~~~~~~~~~~~~~~
   ./include/linux/bitfield.h:137:17: note: in expansion of macro '__FIELD_PREP'
     137 |                 __FIELD_PREP(_mask, _val, "FIELD_PREP: ");              \
	 |                 ^~~~~~~~~~~~
   drivers/iommu/generic_pt/fmt/amdv1.h:220:26: note: in expansion of macro 'FIELD_PREP'
     220 |                          FIELD_PREP(AMDV1PT_FMT_OA,
	 |                          ^~~~~~~~~~

Changing the caller to check pts.level == 0 avoids demanding a bit of
complex reasoning from the compiler that pts.level == level == 0. Instead
the compiler sees that pt_install_leaf_entry() is called with a constant
pts.level == 0 which makes it more reliable to see the constant false in
the if.

Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op")
Reported-by: Chunyu Hu &lt;chuhu@redhat.com&gt;
Closes: https://lore.kernel.org/all/aUn9uGPCooqB-RIF@gmail.com/
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Older versions of gcc and clang sometimes get tripped up by the build time
assertion in FIELD_PREP because they can see that the argument to
FIELD_PREP is constant but can't see that the if condition protecting it
is also a constant false.

   In file included from &lt;command-line&gt;:
   In function 'amdv1pt_install_leaf_entry',
       inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:651:3,
       inlined from '__map_single_page0' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:662:1,
       inlined from 'pt_descend' at drivers/iommu/generic_pt/fmt/../pt_iter.h:391:9,
       inlined from '__do_map_single_page' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:658:10,
       inlined from '__map_single_page1.constprop' at drivers/iommu/generic_pt/fmt/../iommu_pt.h:662:1:
   ././include/linux/compiler_types.h:631:45: error: call to '__compiletime_assert_251' declared with attribute error: FIELD_PREP: value too large for the field
     631 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
	 |                                             ^
   ././include/linux/compiler_types.h:612:25: note: in definition of macro '__compiletime_assert'
     612 |                         prefix ## suffix();                             \
	 |                         ^~~~~~
   ././include/linux/compiler_types.h:631:9: note: in expansion of macro '_compiletime_assert'
     631 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
	 |         ^~~~~~~~~~~~~~~~~~~
   ./include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
	 |                                     ^~~~~~~~~~~~~~~~~~
   ./include/linux/bitfield.h:69:17: note: in expansion of macro 'BUILD_BUG_ON_MSG'
      69 |                 BUILD_BUG_ON_MSG(__builtin_constant_p(_val) ?           \
	 |                 ^~~~~~~~~~~~~~~~
   ./include/linux/bitfield.h:90:17: note: in expansion of macro '__BF_FIELD_CHECK_MASK'
      90 |                 __BF_FIELD_CHECK_MASK(mask, val, pfx);                  \
	 |                 ^~~~~~~~~~~~~~~~~~~~~
   ./include/linux/bitfield.h:137:17: note: in expansion of macro '__FIELD_PREP'
     137 |                 __FIELD_PREP(_mask, _val, "FIELD_PREP: ");              \
	 |                 ^~~~~~~~~~~~
   drivers/iommu/generic_pt/fmt/amdv1.h:220:26: note: in expansion of macro 'FIELD_PREP'
     220 |                          FIELD_PREP(AMDV1PT_FMT_OA,
	 |                          ^~~~~~~~~~

Changing the caller to check pts.level == 0 avoids demanding a bit of
complex reasoning from the compiler that pts.level == level == 0. Instead
the compiler sees that pt_install_leaf_entry() is called with a constant
pts.level == 0 which makes it more reliable to see the constant false in
the if.

Fixes: dcd6a011a8d5 ("iommupt: Add map_pages op")
Reported-by: Chunyu Hu &lt;chuhu@redhat.com&gt;
Closes: https://lore.kernel.org/all/aUn9uGPCooqB-RIF@gmail.com/
Signed-off-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
