<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/dma, branch v7.2-rc1</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'dma-mapping-7.2-2026-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux</title>
<updated>2026-06-17T19:20:21+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-06-17T19:20:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4cc14386e35030d016478b4ab9b10a6a95727003'/>
<id>4cc14386e35030d016478b4ab9b10a6a95727003</id>
<content type='text'>
Pull dma-mapping updates from Marek Szyprowski:

 - added checks for DMA attributes in the debug code, especially to
   ensure that mappings are created and released with matching
   attributes (Leon Romanovsky)

 - better default configuration for CMA on NUMA machines (Feng Tang)

 - code cleanup in dma benchmark tool (Rosen Penev)

* tag 'dma-mapping-7.2-2026-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma: map_benchmark: turn dma_sg_map_param buf into a flexible array
  dma-contiguous: simplify numa cma area handling
  dma-contiguous: add kconfig option to setup numa cma area if not configured explicitly
  dma-debug: Ensure mappings are created and released with matching attributes
  dma-debug: Feed DMA attribute for unmapping flows too
  dma-debug: Record DMA attributes in debug entry
  dma-debug: Remove unused DMA attribute parameter
  ntb: Use consistent DMA attributes when freeing DMA mappings
  ntb: Store original DMA address for future release
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull dma-mapping updates from Marek Szyprowski:

 - added checks for DMA attributes in the debug code, especially to
   ensure that mappings are created and released with matching
   attributes (Leon Romanovsky)

 - better default configuration for CMA on NUMA machines (Feng Tang)

 - code cleanup in dma benchmark tool (Rosen Penev)

* tag 'dma-mapping-7.2-2026-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma: map_benchmark: turn dma_sg_map_param buf into a flexible array
  dma-contiguous: simplify numa cma area handling
  dma-contiguous: add kconfig option to setup numa cma area if not configured explicitly
  dma-debug: Ensure mappings are created and released with matching attributes
  dma-debug: Feed DMA attribute for unmapping flows too
  dma-debug: Record DMA attributes in debug entry
  dma-debug: Remove unused DMA attribute parameter
  ntb: Use consistent DMA attributes when freeing DMA mappings
  ntb: Store original DMA address for future release
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'driver-core-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/driver-core/driver-core</title>
<updated>2026-06-15T07:11:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-06-15T07:11:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=36808d5e983985bbda87e01059cccc071fe3ec8d'/>
<id>36808d5e983985bbda87e01059cccc071fe3ec8d</id>
<content type='text'>
Pull driver core updates from Danilo Krummrich:
 "Deferred probe:
   - Fix race where deferred probe timeout work could be permanently
     canceled by using mod_delayed_work()
   - Fix missing jiffies conversion in deferred_probe_extend_timeout()
   - Guard timeout extension with delayed_work_pending() to prevent
     premature firing
   - Use system_percpu_wq instead of the deprecated system_wq
   - Update deferred_probe_timeout documentation

  device:
   - Replace direct struct device bitfield access (can_match, dma_iommu,
     dma_skip_sync, dma_ops_bypass, state_synced, dma_coherent,
     of_node_reused, offline, offline_disabled) with flag-based
     accessors using bit operations
   - Reject devices with unregistered buses
   - Delete unused DEVICE_ATTR_PREALLOC()
   - Add low-level device attribute macros with const show/store
     callbacks, allowing device attributes to reside in read-only memory
   - Move core device attributes to read-only memory
   - Constify group array pointers in driver_add_groups() /
     driver_remove_groups(), struct bus_type, and struct device_driver

  device property:
   - Fix fwnode reference leak in fwnode_graph_get_endpoint_by_id()
   - Initialize all fields of fwnode_handle in fwnode_init()
   - Provide swnode_get()/swnode_put() wrappers around kobject_get/put()
   - Allow passing struct software_node_ref_args pointers directly to
     PROPERTY_ENTRY_REF()

  driver_override:
   - Migrate amba, cdx, vmbus, and rpmsg to the generic driver_override
     infrastructure, fixing a UAF from unsynchronized access to
     driver_override in bus match() callbacks
   - Remove the now-unused driver_set_override()

  firmware loader:
   - Fix recursive lock deadlock in device_cache_fw_images() when async
     work falls back to synchronous execution
   - Fix device reference leak in firmware_upload_register()

  platform:
   - Pass KBUILD_MODNAME through the platform driver registration macro
     to create module symlinks in sysfs for built-in drivers; move
     module_kset initialization to a pure_initcall and tegra cbb
     registration to core_initcall to ensure correct ordering
   - Pass THIS_MODULE implicitly through a coresight_init_driver() macro

  sysfs:
   - Upgrade OOB write detection in sysfs_kf_seq_show() from printk to
     WARN
   - Add return value clamping to sysfs_kf_read()

  Rust:
   - ACPI:

     Fix missing match data for PRP0001 by exporting
     acpi_of_match_device()

   - Auxiliary:

     Replace drvdata() with dedicated registration data on
     auxiliary_device. drvdata() exposed the driver's bus device private
     data beyond the driver's own scope, creating ordering constraints
     and forcing the data to outlive all registrations that access it.
     Registration data is instead scoped structurally to the
     Registration object, making lifecycle ordering enforced by
     construction rather than convention.

   - Rust-native device driver lifetimes (HRT):

     Allow Rust device drivers to carry a lifetime parameter on their
     bus device private data, tied to the device binding scope -- the
     interval during which a bus device is bound to a driver. Device
     resources like pci::Bar&lt;'a&gt; and IoMem&lt;'a&gt; can be stored directly in
     the driver's bus device private data with a lifetime bounded by the
     binding scope, so the compiler enforces at build time that they do
     not outlive the binding. This removes Devres indirection from every
     access site and eliminates try_access() failure paths in
     destructors.

     Bus driver traits use a Generic Associated Type (GAT) Data&lt;'bound&gt;
     to introduce the lifetime on the private data, rather than
     parameterizing the Driver trait itself. Auxiliary registration
     data, where the lifetime is not introduced by a trait callback but
     must be threaded through Registration, uses the ForLt trait (a
     type-level abstraction for types generic over a lifetime).

  Misc:
   - Fix DT overlayed devices not probing by reverting the broken
     treewide overlay fix and re-running fw_devlink consumer pickup when
     an overlay is applied to a bound device
   - Use root_device_register() for faux bus root device; add sanity
     check for failed bus init
   - Fix dev_has_sync_state() data race with READ_ONCE() and move it to
     base.h
   - Avoid spurious device_links warning when removing a device while
     its supplier is unbinding
   - Switch ISA bus to dynamic root device
   - Fix suspicious RCU usage in kernfs_put()
   - Remove devcoredump exit callback
   - Constify devfreq_event_class"

* tag 'driver-core-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/driver-core/driver-core: (81 commits)
  software node: allow passing reference args to PROPERTY_ENTRY_REF()
  driver core: platform: set mod_name in driver registration
  coresight: pass THIS_MODULE implicitly through a macro
  kernel: param: initialize module_kset in a pure_initcall
  soc/tegra: cbb: Move driver registration from pure_initcall to core_initcall
  firmware_loader: Fix recursive lock in device_cache_fw_images()
  driver core: Use system_percpu_wq instead of system_wq
  driver core: remove driver_set_override()
  rpmsg: use generic driver_override infrastructure
  Drivers: hv: vmbus: use generic driver_override infrastructure
  cdx: use generic driver_override infrastructure
  amba: use generic driver_override infrastructure
  rust: devres: add 'static bound to Devres&lt;T&gt;
  samples: rust: rust_driver_auxiliary: showcase lifetime-bound registration data
  rust: auxiliary: generalize Registration over ForLt
  rust: types: add `ForLt` trait for higher-ranked lifetime support
  gpu: nova-core: separate driver type from driver data
  samples: rust: rust_driver_pci: use HRT lifetime for Bar
  rust: io: make IoMem and ExclusiveIoMem lifetime-parameterized
  rust: pci: make Bar lifetime-parameterized
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull driver core updates from Danilo Krummrich:
 "Deferred probe:
   - Fix race where deferred probe timeout work could be permanently
     canceled by using mod_delayed_work()
   - Fix missing jiffies conversion in deferred_probe_extend_timeout()
   - Guard timeout extension with delayed_work_pending() to prevent
     premature firing
   - Use system_percpu_wq instead of the deprecated system_wq
   - Update deferred_probe_timeout documentation

  device:
   - Replace direct struct device bitfield access (can_match, dma_iommu,
     dma_skip_sync, dma_ops_bypass, state_synced, dma_coherent,
     of_node_reused, offline, offline_disabled) with flag-based
     accessors using bit operations
   - Reject devices with unregistered buses
   - Delete unused DEVICE_ATTR_PREALLOC()
   - Add low-level device attribute macros with const show/store
     callbacks, allowing device attributes to reside in read-only memory
   - Move core device attributes to read-only memory
   - Constify group array pointers in driver_add_groups() /
     driver_remove_groups(), struct bus_type, and struct device_driver

  device property:
   - Fix fwnode reference leak in fwnode_graph_get_endpoint_by_id()
   - Initialize all fields of fwnode_handle in fwnode_init()
   - Provide swnode_get()/swnode_put() wrappers around kobject_get/put()
   - Allow passing struct software_node_ref_args pointers directly to
     PROPERTY_ENTRY_REF()

  driver_override:
   - Migrate amba, cdx, vmbus, and rpmsg to the generic driver_override
     infrastructure, fixing a UAF from unsynchronized access to
     driver_override in bus match() callbacks
   - Remove the now-unused driver_set_override()

  firmware loader:
   - Fix recursive lock deadlock in device_cache_fw_images() when async
     work falls back to synchronous execution
   - Fix device reference leak in firmware_upload_register()

  platform:
   - Pass KBUILD_MODNAME through the platform driver registration macro
     to create module symlinks in sysfs for built-in drivers; move
     module_kset initialization to a pure_initcall and tegra cbb
     registration to core_initcall to ensure correct ordering
   - Pass THIS_MODULE implicitly through a coresight_init_driver() macro

  sysfs:
   - Upgrade OOB write detection in sysfs_kf_seq_show() from printk to
     WARN
   - Add return value clamping to sysfs_kf_read()

  Rust:
   - ACPI:

     Fix missing match data for PRP0001 by exporting
     acpi_of_match_device()

   - Auxiliary:

     Replace drvdata() with dedicated registration data on
     auxiliary_device. drvdata() exposed the driver's bus device private
     data beyond the driver's own scope, creating ordering constraints
     and forcing the data to outlive all registrations that access it.
     Registration data is instead scoped structurally to the
     Registration object, making lifecycle ordering enforced by
     construction rather than convention.

   - Rust-native device driver lifetimes (HRT):

     Allow Rust device drivers to carry a lifetime parameter on their
     bus device private data, tied to the device binding scope -- the
     interval during which a bus device is bound to a driver. Device
     resources like pci::Bar&lt;'a&gt; and IoMem&lt;'a&gt; can be stored directly in
     the driver's bus device private data with a lifetime bounded by the
     binding scope, so the compiler enforces at build time that they do
     not outlive the binding. This removes Devres indirection from every
     access site and eliminates try_access() failure paths in
     destructors.

     Bus driver traits use a Generic Associated Type (GAT) Data&lt;'bound&gt;
     to introduce the lifetime on the private data, rather than
     parameterizing the Driver trait itself. Auxiliary registration
     data, where the lifetime is not introduced by a trait callback but
     must be threaded through Registration, uses the ForLt trait (a
     type-level abstraction for types generic over a lifetime).

  Misc:
   - Fix DT overlayed devices not probing by reverting the broken
     treewide overlay fix and re-running fw_devlink consumer pickup when
     an overlay is applied to a bound device
   - Use root_device_register() for faux bus root device; add sanity
     check for failed bus init
   - Fix dev_has_sync_state() data race with READ_ONCE() and move it to
     base.h
   - Avoid spurious device_links warning when removing a device while
     its supplier is unbinding
   - Switch ISA bus to dynamic root device
   - Fix suspicious RCU usage in kernfs_put()
   - Remove devcoredump exit callback
   - Constify devfreq_event_class"

* tag 'driver-core-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/driver-core/driver-core: (81 commits)
  software node: allow passing reference args to PROPERTY_ENTRY_REF()
  driver core: platform: set mod_name in driver registration
  coresight: pass THIS_MODULE implicitly through a macro
  kernel: param: initialize module_kset in a pure_initcall
  soc/tegra: cbb: Move driver registration from pure_initcall to core_initcall
  firmware_loader: Fix recursive lock in device_cache_fw_images()
  driver core: Use system_percpu_wq instead of system_wq
  driver core: remove driver_set_override()
  rpmsg: use generic driver_override infrastructure
  Drivers: hv: vmbus: use generic driver_override infrastructure
  cdx: use generic driver_override infrastructure
  amba: use generic driver_override infrastructure
  rust: devres: add 'static bound to Devres&lt;T&gt;
  samples: rust: rust_driver_auxiliary: showcase lifetime-bound registration data
  rust: auxiliary: generalize Registration over ForLt
  rust: types: add `ForLt` trait for higher-ranked lifetime support
  gpu: nova-core: separate driver type from driver data
  samples: rust: rust_driver_pci: use HRT lifetime for Bar
  rust: io: make IoMem and ExclusiveIoMem lifetime-parameterized
  rust: pci: make Bar lifetime-parameterized
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-debug: fix physical address retrieval in debug_dma_sync_sg_for_device</title>
<updated>2026-06-03T14:29:53+00:00</updated>
<author>
<name>Li RongQing</name>
<email>lirongqing@baidu.com</email>
</author>
<published>2026-06-03T12:37:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9bfaa86b405381326c971984fd6da184c289713f'/>
<id>9bfaa86b405381326c971984fd6da184c289713f</id>
<content type='text'>
In debug_dma_sync_sg_for_device(), when iterating over a scatterlist,
the debug entry population mistakenly uses the head of the scatterlist
'sg' to fetch the physical address via sg_phys(), instead of using the
current iterator variable 's'.

This causes dma-debug to track the physical address of the very first
scatterlist entry for all subsequent entries in the list.

Fix this by passing the correct loop iterator 's' to sg_phys()

Fixes: 9d4f645a1fd49ee ("dma-debug: store a phys_addr_t in struct dma_debug_entry")
Signed-off-by: Li RongQing &lt;lirongqing@baidu.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260603123708.1665-1-lirongqing@baidu.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In debug_dma_sync_sg_for_device(), when iterating over a scatterlist,
the debug entry population mistakenly uses the head of the scatterlist
'sg' to fetch the physical address via sg_phys(), instead of using the
current iterator variable 's'.

This causes dma-debug to track the physical address of the very first
scatterlist entry for all subsequent entries in the list.

Fix this by passing the correct loop iterator 's' to sg_phys()

Fixes: 9d4f645a1fd49ee ("dma-debug: store a phys_addr_t in struct dma_debug_entry")
Signed-off-by: Li RongQing &lt;lirongqing@baidu.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260603123708.1665-1-lirongqing@baidu.com
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-mapping: direct: fix missing mapping for THRU_HOST_BRIDGE segments</title>
<updated>2026-06-03T06:52:40+00:00</updated>
<author>
<name>Li RongQing</name>
<email>lirongqing@baidu.com</email>
</author>
<published>2026-06-03T01:37:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=560000d619ef162568746ce287f0c725e24ea967'/>
<id>560000d619ef162568746ce287f0c725e24ea967</id>
<content type='text'>
In dma_direct_map_sg(), the case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE
incorrectly used 'break' instead of falling through to MAP_NONE.
As a result, segments traversing the host bridge skipped the required
dma_direct_map_phys() call entirely, leaving sg-&gt;dma_address
uninitialized and leading to DMA failures. Fix this by using
'fallthrough;'.

Fixes: a25e7962db0d79 ("PCI/P2PDMA: Refactor the p2pdma mapping helpers")
Reviewed-by: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Signed-off-by: Li RongQing &lt;lirongqing@baidu.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260603013723.2439-1-lirongqing@baidu.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In dma_direct_map_sg(), the case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE
incorrectly used 'break' instead of falling through to MAP_NONE.
As a result, segments traversing the host bridge skipped the required
dma_direct_map_phys() call entirely, leaving sg-&gt;dma_address
uninitialized and leading to DMA failures. Fix this by using
'fallthrough;'.

Fixes: a25e7962db0d79 ("PCI/P2PDMA: Refactor the p2pdma mapping helpers")
Reviewed-by: Logan Gunthorpe &lt;logang@deltatee.com&gt;
Signed-off-by: Li RongQing &lt;lirongqing@baidu.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260603013723.2439-1-lirongqing@baidu.com
</pre>
</div>
</content>
</entry>
<entry>
<title>dma: map_benchmark: turn dma_sg_map_param buf into a flexible array</title>
<updated>2026-06-03T06:20:02+00:00</updated>
<author>
<name>Rosen Penev</name>
<email>rosenp@gmail.com</email>
</author>
<published>2026-06-03T03:17:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7318301e060e3f1207a34f2fca6cd86770dd9160'/>
<id>7318301e060e3f1207a34f2fca6cd86770dd9160</id>
<content type='text'>
The buf pointer was kmalloc_array()'d immediately after the parent
struct allocation, with the count (granule, validated to 1..1024 by
the ioctl) trivially available beforehand.  Move buf to the struct
tail as a flexible array member and fold the two allocations into a
single kzalloc_flex(), dropping the kfree(params-&gt;buf) in both the
prepare error path and unprepare.

Add __counted_by for extra runtime analysis.

Assisted-by: Claude:Opus-4.7
Signed-off-by: Rosen Penev &lt;rosenp@gmail.com&gt;
Reviewed-by: Qinxin Xia &lt;xiaqinxin@huawei.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260603031758.290538-1-rosenp@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The buf pointer was kmalloc_array()'d immediately after the parent
struct allocation, with the count (granule, validated to 1..1024 by
the ioctl) trivially available beforehand.  Move buf to the struct
tail as a flexible array member and fold the two allocations into a
single kzalloc_flex(), dropping the kfree(params-&gt;buf) in both the
prepare error path and unprepare.

Add __counted_by for extra runtime analysis.

Assisted-by: Claude:Opus-4.7
Signed-off-by: Rosen Penev &lt;rosenp@gmail.com&gt;
Reviewed-by: Qinxin Xia &lt;xiaqinxin@huawei.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260603031758.290538-1-rosenp@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-contiguous: simplify numa cma area handling</title>
<updated>2026-05-28T08:11:45+00:00</updated>
<author>
<name>Feng Tang</name>
<email>feng.tang@linux.alibaba.com</email>
</author>
<published>2026-05-25T01:51:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d5cae2261b86913e602452ce4a07e6aefc0f603b'/>
<id>d5cae2261b86913e602452ce4a07e6aefc0f603b</id>
<content type='text'>
Currently, there are 2 kernel cmdline ways to setup numa cma area:
"cma_pernuma=" and "numa_cma=", and there are 2 cma arrays as well,
while they have no difference technically. Robin suggested to cleanup
the code and only use one array [1], as "the apparent intent that
users only want one _or_ the other".

Simplify the code by only using one array to save the numa cma area.
And in rare case that a user really setup the 2 cmdline parameters
at the same time,  let the per-node specific size setting 'numa_cma='
take priority over the global numa cma setting.

Link[1]: https://lore.kernel.org/lkml/43c5301c-fe6a-41e4-9482-ccfc7b62f2a7@arm.com/

Suggested-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Feng Tang &lt;feng.tang@linux.alibaba.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260525015111.6267-1-feng.tang@linux.alibaba.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, there are 2 kernel cmdline ways to setup numa cma area:
"cma_pernuma=" and "numa_cma=", and there are 2 cma arrays as well,
while they have no difference technically. Robin suggested to cleanup
the code and only use one array [1], as "the apparent intent that
users only want one _or_ the other".

Simplify the code by only using one array to save the numa cma area.
And in rare case that a user really setup the 2 cmdline parameters
at the same time,  let the per-node specific size setting 'numa_cma='
take priority over the global numa cma setting.

Link[1]: https://lore.kernel.org/lkml/43c5301c-fe6a-41e4-9482-ccfc7b62f2a7@arm.com/

Suggested-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Feng Tang &lt;feng.tang@linux.alibaba.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260525015111.6267-1-feng.tang@linux.alibaba.com
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'v7.1-rc5' into driver-core-next</title>
<updated>2026-05-25T00:40:57+00:00</updated>
<author>
<name>Danilo Krummrich</name>
<email>dakr@kernel.org</email>
</author>
<published>2026-05-25T00:40:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=56785dcb2ef6d3cff82ac33f2e34db94377416a3'/>
<id>56785dcb2ef6d3cff82ac33f2e34db94377416a3</id>
<content type='text'>
We need the driver-core fixes in here as well to build on top of.

Signed-off-by: Danilo Krummrich &lt;dakr@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We need the driver-core fixes in here as well to build on top of.

Signed-off-by: Danilo Krummrich &lt;dakr@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-contiguous: add kconfig option to setup numa cma area if not configured explicitly</title>
<updated>2026-05-22T05:57:40+00:00</updated>
<author>
<name>Feng Tang</name>
<email>feng.tang@linux.alibaba.com</email>
</author>
<published>2026-05-12T08:55:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7e6ace2535d032c908e4d8747d9a7952617c001a'/>
<id>7e6ace2535d032c908e4d8747d9a7952617c001a</id>
<content type='text'>
There was a report on a multi-numa-nodes arm64 server that when IOMMU
is disabled, the dma_alloc_coherent() function always returns memory
from node 0 even for devices attaching to other nodes, while they can
get local dma memory when IOMMU is on with the same API.

The reason is, when IOMMU is disabled, the dma_alloc_coherent() will
go the direct way and call dma_alloc_contiguous(). The system doesn't
have any explicit cma setting (like per-numa cma), and only has a
default 64MB cma reserved area (on node 0), where kernel will try
first to allocate memory from.

Robin Murphy suggested to setup pernuma cma or disable cma, which did
solve the issue. While there is still concern that for customers
which don't have much kernel knowledge, they could still suffer from
this silently as some architectures enable cma area by default (not
an issue for X86 though, which set CONFIG_CMA_SIZE_MBYTES to 0 by
default) for most Linux distributions.

One thought is to follow the current cma reserving policy for platform
with 'CONFIG_DMA_NUMA_CMA=y', that if the numa cma (either the 'numa cma'
or 'cma pernuma' method) is not explicitly configured, and the platform
really has multiple NUMA nodes, set it up according to size of default
'dma_contiguous_default_area'. This way, the default behavior of
platform with one NUMA node is kept unchanged (say embedded/small
devices don't need to allocate extra memory), while the general dma
locality is improved.

Add a new bool kernel config CONFIG_CMA_SIZE_PERNUMA to control whether
to enable it. Even when the config is enabled, user can still disable
it by kernel-cmdline setting like "numa_cma=0:0" or "cma_pernuma=0".

Reported-by: Changrong Chen &lt;chenchangrong.ccr@alibaba-inc.com&gt;
Suggested-by: Ying Huang &lt;ying.huang@linux.alibaba.com&gt;
Suggested-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Feng Tang &lt;feng.tang@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20260512085509.83002-1-feng.tang@linux.alibaba.com
Link: https://lore.kernel.org/all/20260520222742.GA1607511@ax162/
[mszyprow: squashed changes from both links, added __initdata attribute
           to the numa_cma_configured variable]
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There was a report on a multi-numa-nodes arm64 server that when IOMMU
is disabled, the dma_alloc_coherent() function always returns memory
from node 0 even for devices attaching to other nodes, while they can
get local dma memory when IOMMU is on with the same API.

The reason is, when IOMMU is disabled, the dma_alloc_coherent() will
go the direct way and call dma_alloc_contiguous(). The system doesn't
have any explicit cma setting (like per-numa cma), and only has a
default 64MB cma reserved area (on node 0), where kernel will try
first to allocate memory from.

Robin Murphy suggested to setup pernuma cma or disable cma, which did
solve the issue. While there is still concern that for customers
which don't have much kernel knowledge, they could still suffer from
this silently as some architectures enable cma area by default (not
an issue for X86 though, which set CONFIG_CMA_SIZE_MBYTES to 0 by
default) for most Linux distributions.

One thought is to follow the current cma reserving policy for platform
with 'CONFIG_DMA_NUMA_CMA=y', that if the numa cma (either the 'numa cma'
or 'cma pernuma' method) is not explicitly configured, and the platform
really has multiple NUMA nodes, set it up according to size of default
'dma_contiguous_default_area'. This way, the default behavior of
platform with one NUMA node is kept unchanged (say embedded/small
devices don't need to allocate extra memory), while the general dma
locality is improved.

Add a new bool kernel config CONFIG_CMA_SIZE_PERNUMA to control whether
to enable it. Even when the config is enabled, user can still disable
it by kernel-cmdline setting like "numa_cma=0:0" or "cma_pernuma=0".

Reported-by: Changrong Chen &lt;chenchangrong.ccr@alibaba-inc.com&gt;
Suggested-by: Ying Huang &lt;ying.huang@linux.alibaba.com&gt;
Suggested-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Feng Tang &lt;feng.tang@linux.alibaba.com&gt;
Link: https://lore.kernel.org/r/20260512085509.83002-1-feng.tang@linux.alibaba.com
Link: https://lore.kernel.org/all/20260520222742.GA1607511@ax162/
[mszyprow: squashed changes from both links, added __initdata attribute
           to the numa_cma_configured variable]
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-mapping: move dma_map_resource() sanity check into debug code</title>
<updated>2026-05-18T07:04:59+00:00</updated>
<author>
<name>Jianpeng Chang</name>
<email>jianpeng.chang.cn@windriver.com</email>
</author>
<published>2026-05-13T07:22:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=af0c3f05866237f7592219bfe05387bc3bfc99b5'/>
<id>af0c3f05866237f7592219bfe05387bc3bfc99b5</id>
<content type='text'>
dma_map_resource() uses pfn_valid() to ensure the range is not RAM.
However, pfn_valid() only checks for availability of the memory map for
a PFN but it does not ensure that the PFN is actually backed by RAM. On
ARM64 with SPARSEMEM (128MB section granularity), MMIO addresses that
share a section with RAM will falsely trigger the WARN_ON_ONCE and cause
dma_map_resource() to return DMA_MAPPING_ERROR.

This causes a WARNING on Raspberry Pi 4 during spi_bcm2835 probe because
the SPI FIFO register (0xfe204004) falls in the same sparsemem section
as the end of RAM (0xf8000000-0xfbffffff), both in section 31
(0xf8000000-0xffffffff).

Move the sanity check from dma_map_resource() into debug_dma_map_phys()
and replace the unreliable pfn_valid() with pfn_valid() &amp;&amp;
!PageReserved(), which correctly identifies actual usable RAM without
false positives for MMIO regions that happen to have struct pages.

Since dma_map_resource() is dma_map_phys(DMA_ATTR_MMIO), the check
applies equally to both APIs. Any non-reserved page represents kernel
memory to a sufficient degree that using DMA_ATTR_MMIO on it is almost
certainly wrong and risks breaking coherency on non-coherent platforms.
ZONE_DEVICE pages used for PCI P2P DMA (MEMORY_DEVICE_PCI_P2PDMA) have
PageReserved set, so they will not trigger a false positive.

The check no longer blocks the mapping and uses err_printk() to
integrate with dma-debug filtering.

Fixes: f7326196a781 ("dma-mapping: export new dma_*map_phys() interface")
Reviewed-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Jianpeng Chang &lt;jianpeng.chang.cn@windriver.com&gt;
Reviewed-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260513072209.1486986-1-jianpeng.chang.cn@windriver.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
dma_map_resource() uses pfn_valid() to ensure the range is not RAM.
However, pfn_valid() only checks for availability of the memory map for
a PFN but it does not ensure that the PFN is actually backed by RAM. On
ARM64 with SPARSEMEM (128MB section granularity), MMIO addresses that
share a section with RAM will falsely trigger the WARN_ON_ONCE and cause
dma_map_resource() to return DMA_MAPPING_ERROR.

This causes a WARNING on Raspberry Pi 4 during spi_bcm2835 probe because
the SPI FIFO register (0xfe204004) falls in the same sparsemem section
as the end of RAM (0xf8000000-0xfbffffff), both in section 31
(0xf8000000-0xffffffff).

Move the sanity check from dma_map_resource() into debug_dma_map_phys()
and replace the unreliable pfn_valid() with pfn_valid() &amp;&amp;
!PageReserved(), which correctly identifies actual usable RAM without
false positives for MMIO regions that happen to have struct pages.

Since dma_map_resource() is dma_map_phys(DMA_ATTR_MMIO), the check
applies equally to both APIs. Any non-reserved page represents kernel
memory to a sufficient degree that using DMA_ATTR_MMIO on it is almost
certainly wrong and risks breaking coherency on non-coherent platforms.
ZONE_DEVICE pages used for PCI P2P DMA (MEMORY_DEVICE_PCI_P2PDMA) have
PageReserved set, so they will not trigger a false positive.

The check no longer blocks the mapping and uses err_printk() to
integrate with dma-debug filtering.

Fixes: f7326196a781 ("dma-mapping: export new dma_*map_phys() interface")
Reviewed-by: Robin Murphy &lt;robin.murphy@arm.com&gt;
Signed-off-by: Jianpeng Chang &lt;jianpeng.chang.cn@windriver.com&gt;
Reviewed-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260513072209.1486986-1-jianpeng.chang.cn@windriver.com
</pre>
</div>
</content>
</entry>
<entry>
<title>dma-debug: Ensure mappings are created and released with matching attributes</title>
<updated>2026-05-08T20:28:19+00:00</updated>
<author>
<name>Leon Romanovsky</name>
<email>leonro@nvidia.com</email>
</author>
<published>2026-05-01T06:35:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a34bd60bdaebb1db4631cad45ff87dcabf4fc962'/>
<id>a34bd60bdaebb1db4631cad45ff87dcabf4fc962</id>
<content type='text'>
The DMA API expects that callers use the same attributes when mapping
and unmapping. Add tracking to verify this and catch mismatches.

Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-6-8dbac75cd501@nvidia.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The DMA API expects that callers use the same attributes when mapping
and unmapping. Add tracking to verify this and catch mismatches.

Signed-off-by: Leon Romanovsky &lt;leonro@nvidia.com&gt;
Reviewed-by: Samiullah Khawaja &lt;skhawaja@google.com&gt;
Signed-off-by: Marek Szyprowski &lt;m.szyprowski@samsung.com&gt;
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-6-8dbac75cd501@nvidia.com
</pre>
</div>
</content>
</entry>
</feed>
