<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/dma/direct.h, branch v7.1-rc5</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>dma-mapping: introduce DMA_ATTR_CC_SHARED for shared memory</title>
<updated>2026-04-02T05:29:33+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@nvidia.com</email>
</author>
<published>2026-03-25T19:23:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f0548044a02630402d374df195ed3af4cc5e4711'/>
<id>f0548044a02630402d374df195ed3af4cc5e4711</id>
<content type='text'>
Current CC designs don't place a vIOMMU in front of untrusted devices.
Instead, the DMA API forces all untrusted device DMA through swiotlb
bounce buffers (is_swiotlb_force_bounce()) which copies data into
shared memory on behalf of the device.

When a caller has already arranged for the memory to be shared
via set_memory_decrypted(), the DMA API needs to know so it can map
directly using the unencrypted physical address rather than bounce
buffering. Following the pattern of DMA_ATTR_MMIO, add
DMA_ATTR_CC_SHARED for this purpose. Like the MMIO case, only the
caller knows what kind of memory it has and must inform the DMA API
for it to work correctly.

Signed-off-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Acked-by: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260325192352.437608-2-jiri@resnulli.us
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Current CC designs don't place a vIOMMU in front of untrusted devices.
Instead, the DMA API forces all untrusted device DMA through swiotlb
bounce buffers (is_swiotlb_force_bounce()) which copies data into
shared memory on behalf of the device.

When a caller has already arranged for the memory to be shared
via set_memory_decrypted(), the DMA API needs to know so it can map
directly using the unencrypted physical address rather than bounce
buffering. Following the pattern of DMA_ATTR_MMIO, add
DMA_ATTR_CC_SHARED for this purpose. Like the MMIO case, only the
caller knows what kind of memory it has and must inform the DMA API
for it to work correctly.

Signed-off-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Acked-by: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260325192352.437608-2-jiri@resnulli.us
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'dma-mapping-7.0-2026-03-25' into dma-mapping-for-next</title>
<updated>2026-03-25T17:25:43+00:00</updated>
<author>
<name>Marek Szyprowski</name>
<email>m.szyprowski@samsung.com</email>
</author>
<published>2026-03-25T17:24:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=15d6dd1ed3d58656bd7b86e9303ad839d6913b2e'/>
<id>15d6dd1ed3d58656bd7b86e9303ad839d6913b2e</id>
<content type='text'>
dma-mapping fixes for Linux 7.0

A set of fixes for DMA-mapping subsystem, which resolve false-positive
warnings from KMSAN and DMA-API debug (Shigeru Yoshida and Leon
Romanovsky) as well as a simple build fix (Miguel Ojeda).

Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dma-mapping fixes for Linux 7.0

A set of fixes for DMA-mapping subsystem, which resolve false-positive
warnings from KMSAN and DMA-API debug (Shigeru Yoshida and Leon
Romanovsky) as well as a simple build fix (Miguel Ojeda).

Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-direct: prevent SWIOTLB path when DMA_ATTR_REQUIRE_COHERENT is set</title>
<updated>2026-03-20T11:05:56+00:00</updated>
<author>
<name>Leon Romanovsky</name>
<email>leonro@nvidia.com</email>
</author>
<published>2026-03-16T19:06:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2536617f20ddc7c2f4cef59b549aa45d166b03b1'/>
<id>2536617f20ddc7c2f4cef59b549aa45d166b03b1</id>
<content type='text'>
DMA_ATTR_REQUIRE_COHERENT indicates that SWIOTLB must not be used.
Ensure the SWIOTLB path is declined whenever the DMA direct path is
selected.

Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260316-dma-debug-overlap-v3-5-1dde90a7f08b@nvidia.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
DMA_ATTR_REQUIRE_COHERENT indicates that SWIOTLB must not be used.
Ensure the SWIOTLB path is declined whenever the DMA direct path is
selected.

Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260316-dma-debug-overlap-v3-5-1dde90a7f08b@nvidia.com
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-mapping: Support batch mode for dma_direct_{map,unmap}_sg</title>
<updated>2026-03-13T22:47:48+00:00</updated>
<author>
<name>Barry Song</name>
<email>baohua@kernel.org</email>
</author>
<published>2026-02-28T22:13:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=661f8a193d48d123aedcbd401ace137333d02523'/>
<id>661f8a193d48d123aedcbd401ace137333d02523</id>
<content type='text'>
Extending these APIs with a flush argument:
dma_direct_unmap_phys(), dma_direct_map_phys(), and
dma_direct_sync_single_for_cpu(). For single-buffer cases, flush=true
would be used, while for SG cases flush=false would be used, followed
by a single flush after all cache operations are issued in
dma_direct_{map,unmap}_sg().

This ultimately benefits dma_map_sg() and dma_unmap_sg().

Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Ada Couprie Diaz &lt;ada.coupriediaz@arm.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Marc Zyngier &lt;maz@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Tangquan Zheng &lt;zhengtangquan@oppo.com&gt;
Reviewed-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Tested-by: Xueyuan Chen &lt;xueyuan.chen21@gmail.com&gt;
Signed-off-by: Barry Song &lt;baohua@kernel.org&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260228221337.59951-1-21cnbao@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Extending these APIs with a flush argument:
dma_direct_unmap_phys(), dma_direct_map_phys(), and
dma_direct_sync_single_for_cpu(). For single-buffer cases, flush=true
would be used, while for SG cases flush=false would be used, followed
by a single flush after all cache operations are issued in
dma_direct_{map,unmap}_sg().

This ultimately benefits dma_map_sg() and dma_unmap_sg().

Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Ada Couprie Diaz &lt;ada.coupriediaz@arm.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Marc Zyngier &lt;maz@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Tangquan Zheng &lt;zhengtangquan@oppo.com&gt;
Reviewed-by: Leon Romanovsky &lt;leon@kernel.org&gt;
Tested-by: Xueyuan Chen &lt;xueyuan.chen21@gmail.com&gt;
Signed-off-by: Barry Song &lt;baohua@kernel.org&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260228221337.59951-1-21cnbao@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-mapping: Separate DMA sync issuing and completion waiting</title>
<updated>2026-03-13T22:47:31+00:00</updated>
<author>
<name>Barry Song</name>
<email>baohua@kernel.org</email>
</author>
<published>2026-02-28T22:13:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d7eafe655b741dfc241d5b920f6d2cea45b568d9'/>
<id>d7eafe655b741dfc241d5b920f6d2cea45b568d9</id>
<content type='text'>
Currently, arch_sync_dma_for_cpu and arch_sync_dma_for_device
always wait for the completion of each DMA buffer. That is,
issuing the DMA sync and waiting for completion is done in a
single API call.

For scatter-gather lists with multiple entries, this means
issuing and waiting is repeated for each entry, which can hurt
performance. Architectures like ARM64 may be able to issue all
DMA sync operations for all entries first and then wait for
completion together.

To address this, arch_sync_dma_for_* now batches DMA operations
and performs a flush afterward. On ARM64, the flush is implemented
with a dsb instruction in arch_sync_dma_flush(). On other
architectures, arch_sync_dma_flush() is currently a nop.

Cc: Leon Romanovsky &lt;leon@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Ada Couprie Diaz &lt;ada.coupriediaz@arm.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Marc Zyngier &lt;maz@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Joerg Roedel &lt;joro@8bytes.org&gt;
Cc: Stefano Stabellini &lt;sstabellini@kernel.org&gt;
Cc: Oleksandr Tyshchenko &lt;oleksandr_tyshchenko@epam.com&gt;
Cc: Tangquan Zheng &lt;zhengtangquan@oppo.com&gt;
Reviewed-by: Juergen Gross &lt;jgross@suse.com&gt; # drivers/xen/swiotlb-xen.c
Tested-by: Xueyuan Chen &lt;xueyuan.chen21@gmail.com&gt;
Signed-off-by: Barry Song &lt;baohua@kernel.org&gt;
Reviewed-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260228221316.59934-1-21cnbao@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, arch_sync_dma_for_cpu and arch_sync_dma_for_device
always wait for the completion of each DMA buffer. That is,
issuing the DMA sync and waiting for completion is done in a
single API call.

For scatter-gather lists with multiple entries, this means
issuing and waiting is repeated for each entry, which can hurt
performance. Architectures like ARM64 may be able to issue all
DMA sync operations for all entries first and then wait for
completion together.

To address this, arch_sync_dma_for_* now batches DMA operations
and performs a flush afterward. On ARM64, the flush is implemented
with a dsb instruction in arch_sync_dma_flush(). On other
architectures, arch_sync_dma_flush() is currently a nop.

Cc: Leon Romanovsky &lt;leon@kernel.org&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Cc: Robin Murphy &lt;robin.murphy@arm.com&gt;
Cc: Ada Couprie Diaz &lt;ada.coupriediaz@arm.com&gt;
Cc: Ard Biesheuvel &lt;ardb@kernel.org&gt;
Cc: Marc Zyngier &lt;maz@kernel.org&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Ryan Roberts &lt;ryan.roberts@arm.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Joerg Roedel &lt;joro@8bytes.org&gt;
Cc: Stefano Stabellini &lt;sstabellini@kernel.org&gt;
Cc: Oleksandr Tyshchenko &lt;oleksandr_tyshchenko@epam.com&gt;
Cc: Tangquan Zheng &lt;zhengtangquan@oppo.com&gt;
Reviewed-by: Juergen Gross &lt;jgross@suse.com&gt; # drivers/xen/swiotlb-xen.c
Tested-by: Xueyuan Chen &lt;xueyuan.chen21@gmail.com&gt;
Signed-off-by: Barry Song &lt;baohua@kernel.org&gt;
Reviewed-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260228221316.59934-1-21cnbao@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-mapping: avoid random addr value print out on error path</title>
<updated>2026-02-23T07:26:54+00:00</updated>
<author>
<name>Jiri Pirko</name>
<email>jiri@nvidia.com</email>
</author>
<published>2026-02-09T15:38:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=47322c469d4a63ac45b705ca83680671ff71c975'/>
<id>47322c469d4a63ac45b705ca83680671ff71c975</id>
<content type='text'>
dma_addr is unitialized in dma_direct_map_phys() when swiotlb is forced
and DMA_ATTR_MMIO is set which leads to random value print out in
warning. Fix that by just returning DMA_MAPPING_ERROR.

Fixes: e53d29f957b3 ("dma-mapping: convert dma_direct_*map_page to be phys_addr_t based")
Signed-off-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260209153809.250835-2-jiri@resnulli.us
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dma_addr is unitialized in dma_direct_map_phys() when swiotlb is forced
and DMA_ATTR_MMIO is set which leads to random value print out in
warning. Fix that by just returning DMA_MAPPING_ERROR.

Fixes: e53d29f957b3 ("dma-mapping: convert dma_direct_*map_page to be phys_addr_t based")
Signed-off-by: Jiri Pirko &lt;jiri@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260209153809.250835-2-jiri@resnulli.us
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-mapping: Remove dma_mark_clean (again)</title>
<updated>2026-01-07T23:19:08+00:00</updated>
<author>
<name>Robin Murphy</name>
<email>robin.murphy@arm.com</email>
</author>
<published>2026-01-06T19:27:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8a840ab0567ff2b7d382694ba24a58a893d2c7af'/>
<id>8a840ab0567ff2b7d382694ba24a58a893d2c7af</id>
<content type='text'>
With IA-64 now gone, there are no users of the dma_mark_clean hook,
so we can retire it for good.

Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/c004927f01962726ff1dcf94d1b4efff84db805a.1767727673.git.robin.murphy@arm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With IA-64 now gone, there are no users of the dma_mark_clean hook,
so we can retire it for good.

Signed-off-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/c004927f01962726ff1dcf94d1b4efff84db805a.1767727673.git.robin.murphy@arm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-mapping: convert dma_direct_*map_page to be phys_addr_t based</title>
<updated>2025-09-11T22:18:20+00:00</updated>
<author>
<name>Leon Romanovsky</name>
<email>leonro@nvidia.com</email>
</author>
<published>2025-09-09T13:27:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e53d29f957b36ba1666331956c6ccb047bb157d2'/>
<id>e53d29f957b36ba1666331956c6ccb047bb157d2</id>
<content type='text'>
Convert the DMA direct mapping functions to accept physical addresses
directly instead of page+offset parameters. The functions were already
operating on physical addresses internally, so this change eliminates
the redundant page-to-physical conversion at the API boundary.

The functions dma_direct_map_page() and dma_direct_unmap_page() are
renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively,
with their calling convention changed from (struct page *page,
unsigned long offset) to (phys_addr_t phys).

Architecture-specific functions arch_dma_map_page_direct() and
arch_dma_unmap_page_direct() are similarly renamed to
arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct().

The is_pci_p2pdma_page() checks are replaced with DMA_ATTR_MMIO checks
to allow integration with dma_direct_map_resource and dma_direct_map_phys()
is extended to support MMIO path either.

Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/bb15a22f76dc2e26683333ff54e789606cfbfcf0.1757423202.git.leonro@nvidia.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Convert the DMA direct mapping functions to accept physical addresses
directly instead of page+offset parameters. The functions were already
operating on physical addresses internally, so this change eliminates
the redundant page-to-physical conversion at the API boundary.

The functions dma_direct_map_page() and dma_direct_unmap_page() are
renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively,
with their calling convention changed from (struct page *page,
unsigned long offset) to (phys_addr_t phys).

Architecture-specific functions arch_dma_map_page_direct() and
arch_dma_unmap_page_direct() are similarly renamed to
arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct().

The is_pci_p2pdma_page() checks are replaced with DMA_ATTR_MMIO checks
to allow integration with dma_direct_map_resource and dma_direct_map_phys()
is extended to support MMIO path either.

Reviewed-by: Jason Gunthorpe &lt;jgg@nvidia.com&gt;
Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/bb15a22f76dc2e26683333ff54e789606cfbfcf0.1757423202.git.leonro@nvidia.com
</pre>
</div>
</content>
</entry>
<entry>
<title>swiotlb: reduce swiotlb pool lookups</title>
<updated>2024-07-10T05:59:03+00:00</updated>
<author>
<name>Michael Kelley</name>
<email>mhklinux@outlook.com</email>
</author>
<published>2024-07-08T19:41:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7296f2301a057493e97b07739213c6e864f76891'/>
<id>7296f2301a057493e97b07739213c6e864f76891</id>
<content type='text'>
With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair
in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple
places, the pool is found and used in one function, and then must
be found again in the next function that is called because only the
tlb_addr is passed as an argument. These are the six call sites:

dma_direct_map_page:
 1. swiotlb_map -&gt; swiotlb_tbl_map_single -&gt; swiotlb_bounce

dma_direct_unmap_page:
 2. dma_direct_sync_single_for_cpu -&gt; is_swiotlb_buffer
 3. dma_direct_sync_single_for_cpu -&gt; swiotlb_sync_single_for_cpu -&gt;
	swiotlb_bounce
 4. is_swiotlb_buffer
 5. swiotlb_tbl_unmap_single -&gt; swiotlb_del_transient
 6. swiotlb_tbl_unmap_single -&gt; swiotlb_release_slots

Reduce the number of calls by finding the pool at a higher level, and
passing it as an argument instead of searching again. A key change is
for is_swiotlb_buffer() to return a pool pointer instead of a boolean,
and then pass this pool pointer to subsequent swiotlb functions.

There are 9 occurrences of is_swiotlb_buffer() used to test if a buffer
is a swiotlb buffer before calling a swiotlb function. To reduce code
duplication in getting the pool pointer and passing it as an argument,
introduce inline wrappers for this pattern. The generated code is
essentially unchanged.

Since is_swiotlb_buffer() no longer returns a boolean, rename some
functions to reflect the change:

 * swiotlb_find_pool() becomes __swiotlb_find_pool()
 * is_swiotlb_buffer() becomes swiotlb_find_pool()
 * is_xen_swiotlb_buffer() becomes xen_swiotlb_find_pool()

With these changes, a round-trip map/unmap pair requires only 2 pool
lookups (listed using the new names and wrappers):

dma_direct_unmap_page:
 1. dma_direct_sync_single_for_cpu -&gt; swiotlb_find_pool
 2. swiotlb_tbl_unmap_single -&gt; swiotlb_find_pool

These changes come from noticing the inefficiencies in a code review,
not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC,
__swiotlb_find_pool() is not trivial, and it uses an RCU read lock,
so avoiding the redundant calls helps performance in a hot path.
When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction
is minimal and the perf benefits are likely negligible, but no
harm is done.

No functional change is intended.

Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Reviewed-by: Petr Tesarik &lt;petr@tesarici.cz&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair
in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple
places, the pool is found and used in one function, and then must
be found again in the next function that is called because only the
tlb_addr is passed as an argument. These are the six call sites:

dma_direct_map_page:
 1. swiotlb_map -&gt; swiotlb_tbl_map_single -&gt; swiotlb_bounce

dma_direct_unmap_page:
 2. dma_direct_sync_single_for_cpu -&gt; is_swiotlb_buffer
 3. dma_direct_sync_single_for_cpu -&gt; swiotlb_sync_single_for_cpu -&gt;
	swiotlb_bounce
 4. is_swiotlb_buffer
 5. swiotlb_tbl_unmap_single -&gt; swiotlb_del_transient
 6. swiotlb_tbl_unmap_single -&gt; swiotlb_release_slots

Reduce the number of calls by finding the pool at a higher level, and
passing it as an argument instead of searching again. A key change is
for is_swiotlb_buffer() to return a pool pointer instead of a boolean,
and then pass this pool pointer to subsequent swiotlb functions.

There are 9 occurrences of is_swiotlb_buffer() used to test if a buffer
is a swiotlb buffer before calling a swiotlb function. To reduce code
duplication in getting the pool pointer and passing it as an argument,
introduce inline wrappers for this pattern. The generated code is
essentially unchanged.

Since is_swiotlb_buffer() no longer returns a boolean, rename some
functions to reflect the change:

 * swiotlb_find_pool() becomes __swiotlb_find_pool()
 * is_swiotlb_buffer() becomes swiotlb_find_pool()
 * is_xen_swiotlb_buffer() becomes xen_swiotlb_find_pool()

With these changes, a round-trip map/unmap pair requires only 2 pool
lookups (listed using the new names and wrappers):

dma_direct_unmap_page:
 1. dma_direct_sync_single_for_cpu -&gt; swiotlb_find_pool
 2. swiotlb_tbl_unmap_single -&gt; swiotlb_find_pool

These changes come from noticing the inefficiencies in a code review,
not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC,
__swiotlb_find_pool() is not trivial, and it uses an RCU read lock,
so avoiding the redundant calls helps performance in a hot path.
When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction
is minimal and the perf benefits are likely negligible, but no
harm is done.

No functional change is intended.

Signed-off-by: Michael Kelley &lt;mhklinux@outlook.com&gt;
Reviewed-by: Petr Tesarik &lt;petr@tesarici.cz&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-mapping: fix dma_addressing_limited() if dma_range_map can't cover all system RAM</title>
<updated>2023-11-06T07:38:16+00:00</updated>
<author>
<name>Jia He</name>
<email>justin.he@arm.com</email>
</author>
<published>2023-10-28T10:20:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a409d9600959f3c4b2a48946304c8e01b8d04072'/>
<id>a409d9600959f3c4b2a48946304c8e01b8d04072</id>
<content type='text'>
There is an unusual case that the range map covers right up to the top
of system RAM, but leaves a hole somewhere lower down. Then it prevents
the nvme device dma mapping in the checking path of phys_to_dma() and
causes the hangs at boot.

E.g. On an Armv8 Ampere server, the dsdt ACPI table is:
 Method (_DMA, 0, Serialized)  // _DMA: Direct Memory Access
            {
                Name (RBUF, ResourceTemplate ()
                {
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x0000000000000000, // Range Minimum
                        0x00000000FFFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x0000000100000000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x0000006010200000, // Range Minimum
                        0x000000602FFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x000000001FE00000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x00000060F0000000, // Range Minimum
                        0x00000060FFFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x0000000010000000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x0000007000000000, // Range Minimum
                        0x000003FFFFFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x0000039000000000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                })

But the System RAM ranges are:
cat /proc/iomem |grep -i ram
90000000-91ffffff : System RAM
92900000-fffbffff : System RAM
880000000-fffffffff : System RAM
8800000000-bff5990fff : System RAM
bff59d0000-bff5a4ffff : System RAM
bff8000000-bfffffffff : System RAM
So some RAM ranges are out of dma_range_map.

Fix it by checking whether each of the system RAM resources can be
properly encompassed within the dma_range_map.

Signed-off-by: Jia He &lt;justin.he@arm.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is an unusual case that the range map covers right up to the top
of system RAM, but leaves a hole somewhere lower down. Then it prevents
the nvme device dma mapping in the checking path of phys_to_dma() and
causes the hangs at boot.

E.g. On an Armv8 Ampere server, the dsdt ACPI table is:
 Method (_DMA, 0, Serialized)  // _DMA: Direct Memory Access
            {
                Name (RBUF, ResourceTemplate ()
                {
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x0000000000000000, // Range Minimum
                        0x00000000FFFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x0000000100000000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x0000006010200000, // Range Minimum
                        0x000000602FFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x000000001FE00000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x00000060F0000000, // Range Minimum
                        0x00000060FFFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x0000000010000000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                    QWordMemory (ResourceConsumer, PosDecode, MinFixed,
MaxFixed, Cacheable, ReadWrite,
                        0x0000000000000000, // Granularity
                        0x0000007000000000, // Range Minimum
                        0x000003FFFFFFFFFF, // Range Maximum
                        0x0000000000000000, // Translation Offset
                        0x0000039000000000, // Length
                        ,, , AddressRangeMemory, TypeStatic)
                })

But the System RAM ranges are:
cat /proc/iomem |grep -i ram
90000000-91ffffff : System RAM
92900000-fffbffff : System RAM
880000000-fffffffff : System RAM
8800000000-bff5990fff : System RAM
bff59d0000-bff5a4ffff : System RAM
bff8000000-bfffffffff : System RAM
So some RAM ranges are out of dma_range_map.

Fix it by checking whether each of the system RAM resources can be
properly encompassed within the dma_range_map.

Signed-off-by: Jia He &lt;justin.he@arm.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
