linux.git/kernel/dma, branch v7.2-rc1

Merge tag 'dma-mapping-7.2-2026-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

2026-06-17T19:20:21+00:00

Pull dma-mapping updates from Marek Szyprowski:

 - added checks for DMA attributes in the debug code, especially to
   ensure that mappings are created and released with matching
   attributes (Leon Romanovsky)

 - better default configuration for CMA on NUMA machines (Feng Tang)

 - code cleanup in dma benchmark tool (Rosen Penev)

* tag 'dma-mapping-7.2-2026-06-16' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma: map_benchmark: turn dma_sg_map_param buf into a flexible array
  dma-contiguous: simplify numa cma area handling
  dma-contiguous: add kconfig option to setup numa cma area if not configured explicitly
  dma-debug: Ensure mappings are created and released with matching attributes
  dma-debug: Feed DMA attribute for unmapping flows too
  dma-debug: Record DMA attributes in debug entry
  dma-debug: Remove unused DMA attribute parameter
  ntb: Use consistent DMA attributes when freeing DMA mappings
  ntb: Store original DMA address for future release

Merge tag 'driver-core-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/driver-core/driver-core

2026-06-15T07:11:17+00:00

Pull driver core updates from Danilo Krummrich:
 "Deferred probe:
   - Fix race where deferred probe timeout work could be permanently
     canceled by using mod_delayed_work()
   - Fix missing jiffies conversion in deferred_probe_extend_timeout()
   - Guard timeout extension with delayed_work_pending() to prevent
     premature firing
   - Use system_percpu_wq instead of the deprecated system_wq
   - Update deferred_probe_timeout documentation

  device:
   - Replace direct struct device bitfield access (can_match, dma_iommu,
     dma_skip_sync, dma_ops_bypass, state_synced, dma_coherent,
     of_node_reused, offline, offline_disabled) with flag-based
     accessors using bit operations
   - Reject devices with unregistered buses
   - Delete unused DEVICE_ATTR_PREALLOC()
   - Add low-level device attribute macros with const show/store
     callbacks, allowing device attributes to reside in read-only memory
   - Move core device attributes to read-only memory
   - Constify group array pointers in driver_add_groups() /
     driver_remove_groups(), struct bus_type, and struct device_driver

  device property:
   - Fix fwnode reference leak in fwnode_graph_get_endpoint_by_id()
   - Initialize all fields of fwnode_handle in fwnode_init()
   - Provide swnode_get()/swnode_put() wrappers around kobject_get/put()
   - Allow passing struct software_node_ref_args pointers directly to
     PROPERTY_ENTRY_REF()

  driver_override:
   - Migrate amba, cdx, vmbus, and rpmsg to the generic driver_override
     infrastructure, fixing a UAF from unsynchronized access to
     driver_override in bus match() callbacks
   - Remove the now-unused driver_set_override()

  firmware loader:
   - Fix recursive lock deadlock in device_cache_fw_images() when async
     work falls back to synchronous execution
   - Fix device reference leak in firmware_upload_register()

  platform:
   - Pass KBUILD_MODNAME through the platform driver registration macro
     to create module symlinks in sysfs for built-in drivers; move
     module_kset initialization to a pure_initcall and tegra cbb
     registration to core_initcall to ensure correct ordering
   - Pass THIS_MODULE implicitly through a coresight_init_driver() macro

  sysfs:
   - Upgrade OOB write detection in sysfs_kf_seq_show() from printk to
     WARN
   - Add return value clamping to sysfs_kf_read()

  Rust:
   - ACPI:

     Fix missing match data for PRP0001 by exporting
     acpi_of_match_device()

   - Auxiliary:

     Replace drvdata() with dedicated registration data on
     auxiliary_device. drvdata() exposed the driver's bus device private
     data beyond the driver's own scope, creating ordering constraints
     and forcing the data to outlive all registrations that access it.
     Registration data is instead scoped structurally to the
     Registration object, making lifecycle ordering enforced by
     construction rather than convention.

   - Rust-native device driver lifetimes (HRT):

     Allow Rust device drivers to carry a lifetime parameter on their
     bus device private data, tied to the device binding scope -- the
     interval during which a bus device is bound to a driver. Device
     resources like pci::Bar<'a> and IoMem<'a> can be stored directly in
     the driver's bus device private data with a lifetime bounded by the
     binding scope, so the compiler enforces at build time that they do
     not outlive the binding. This removes Devres indirection from every
     access site and eliminates try_access() failure paths in
     destructors.

     Bus driver traits use a Generic Associated Type (GAT) Data<'bound>
     to introduce the lifetime on the private data, rather than
     parameterizing the Driver trait itself. Auxiliary registration
     data, where the lifetime is not introduced by a trait callback but
     must be threaded through Registration, uses the ForLt trait (a
     type-level abstraction for types generic over a lifetime).

  Misc:
   - Fix DT overlayed devices not probing by reverting the broken
     treewide overlay fix and re-running fw_devlink consumer pickup when
     an overlay is applied to a bound device
   - Use root_device_register() for faux bus root device; add sanity
     check for failed bus init
   - Fix dev_has_sync_state() data race with READ_ONCE() and move it to
     base.h
   - Avoid spurious device_links warning when removing a device while
     its supplier is unbinding
   - Switch ISA bus to dynamic root device
   - Fix suspicious RCU usage in kernfs_put()
   - Remove devcoredump exit callback
   - Constify devfreq_event_class"

* tag 'driver-core-7.2-rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/driver-core/driver-core: (81 commits)
  software node: allow passing reference args to PROPERTY_ENTRY_REF()
  driver core: platform: set mod_name in driver registration
  coresight: pass THIS_MODULE implicitly through a macro
  kernel: param: initialize module_kset in a pure_initcall
  soc/tegra: cbb: Move driver registration from pure_initcall to core_initcall
  firmware_loader: Fix recursive lock in device_cache_fw_images()
  driver core: Use system_percpu_wq instead of system_wq
  driver core: remove driver_set_override()
  rpmsg: use generic driver_override infrastructure
  Drivers: hv: vmbus: use generic driver_override infrastructure
  cdx: use generic driver_override infrastructure
  amba: use generic driver_override infrastructure
  rust: devres: add 'static bound to Devres
  samples: rust: rust_driver_auxiliary: showcase lifetime-bound registration data
  rust: auxiliary: generalize Registration over ForLt
  rust: types: add `ForLt` trait for higher-ranked lifetime support
  gpu: nova-core: separate driver type from driver data
  samples: rust: rust_driver_pci: use HRT lifetime for Bar
  rust: io: make IoMem and ExclusiveIoMem lifetime-parameterized
  rust: pci: make Bar lifetime-parameterized
  ...

dma-debug: fix physical address retrieval in debug_dma_sync_sg_for_device

2026-06-03T14:29:53+00:00

In debug_dma_sync_sg_for_device(), when iterating over a scatterlist,
the debug entry population mistakenly uses the head of the scatterlist
'sg' to fetch the physical address via sg_phys(), instead of using the
current iterator variable 's'.

This causes dma-debug to track the physical address of the very first
scatterlist entry for all subsequent entries in the list.

Fix this by passing the correct loop iterator 's' to sg_phys().
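
The bug pattern described above can be illustrated with a minimal
user-space sketch. It is not the kernel code: struct toy_sg and both
record_phys_*() helpers are hypothetical stand-ins for struct
scatterlist and the dma-debug bookkeeping, and a plain array index
replaces for_each_sg().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct scatterlist: only a physical address. */
struct toy_sg {
	uint64_t phys;
};

/* Buggy pattern: always reads the list head 'sg', never the iterator. */
void record_phys_buggy(const struct toy_sg *sg, size_t nents, uint64_t *out)
{
	assert(sg != NULL && out != NULL);
	for (size_t i = 0; i < nents; i++)
		out[i] = sg->phys;	/* head entry, every time */
}

/* Fixed pattern: reads the entry the loop is currently visiting. */
void record_phys_fixed(const struct toy_sg *sg, size_t nents, uint64_t *out)
{
	assert(sg != NULL && out != NULL);
	for (size_t i = 0; i < nents; i++) {
		const struct toy_sg *s = &sg[i];	/* current iterator */

		out[i] = s->phys;
	}
}
```

With a three-entry list, the buggy helper records the first entry's
address three times, while the fixed helper records each entry's own
address.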

Fixes: 9d4f645a1fd49ee ("dma-debug: store a phys_addr_t in struct dma_debug_entry")
Signed-off-by: Li RongQing 
Signed-off-by: Marek Szyprowski 
Link: https://lore.kernel.org/r/20260603123708.1665-1-lirongqing@baidu.com

dma-mapping: direct: fix missing mapping for THRU_HOST_BRIDGE segments

2026-06-03T06:52:40+00:00

In dma_direct_map_sg(), the case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE
incorrectly used 'break' instead of falling through to MAP_NONE.
As a result, segments traversing the host bridge skipped the required
dma_direct_map_phys() call entirely, leaving sg->dma_address
uninitialized and leading to DMA failures. Fix this by using
'fallthrough;'.
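
As a rough user-space illustration of the control flow described above
(not the kernel function itself), the sketch below models the switch
with hypothetical TOY_* names. The use_fallthrough flag exists only to
show the old and new behaviour side by side, and the boolean result
stands in for whether dma_direct_map_phys() would have been reached.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names that mirror, but are not, the kernel's enum. */
enum toy_p2p_map {
	TOY_MAP_NONE,
	TOY_MAP_THRU_HOST_BRIDGE,
	TOY_MAP_BUS_ADDR,
};

/* Returns true when the segment reaches the regular mapping step. */
bool toy_segment_mapped(enum toy_p2p_map type, bool use_fallthrough)
{
	bool mapped = false;

	assert(type <= TOY_MAP_BUS_ADDR);

	switch (type) {
	case TOY_MAP_THRU_HOST_BRIDGE:
		if (!use_fallthrough)
			break;		/* old behaviour: mapping step skipped */
		/* fall through */
	case TOY_MAP_NONE:
		mapped = true;		/* stands in for dma_direct_map_phys() */
		break;
	case TOY_MAP_BUS_ADDR:
		break;			/* not routed through this step in the sketch */
	}
	return mapped;
}
```

With use_fallthrough set to false, a THRU_HOST_BRIDGE segment leaves
the switch unmapped, which corresponds to the uninitialized
sg->dma_address described above.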

Fixes: a25e7962db0d79 ("PCI/P2PDMA: Refactor the p2pdma mapping helpers")
Reviewed-by: Logan Gunthorpe 
Signed-off-by: Li RongQing 
Signed-off-by: Marek Szyprowski 
Link: https://lore.kernel.org/r/20260603013723.2439-1-lirongqing@baidu.com

dma: map_benchmark: turn dma_sg_map_param buf into a flexible array

2026-06-03T06:20:02+00:00

The buf pointer was kmalloc_array()'d immediately after the parent
struct allocation, with the count (granule, validated to 1..1024 by
the ioctl) trivially available beforehand.  Move buf to the struct
tail as a flexible array member and fold the two allocations into a
single kzalloc_flex(), dropping the kfree(params->buf) in both the
prepare error path and unprepare.

Add __counted_by for extra runtime analysis.
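
For readers unfamiliar with the pattern, here is a minimal user-space
sketch of folding two allocations into one with a flexible array
member. struct toy_sg_param and toy_sg_param_alloc() are hypothetical,
calloc() stands in for the kernel allocator, and the real struct
carries more fields than shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space analogue of the benchmark parameter struct. */
struct toy_sg_param {
	size_t npages;		/* element count, known before allocation */
	void *buf[];		/* flexible array member at the struct tail */
};

/* One zero-filled allocation covers the header and all npages slots. */
struct toy_sg_param *toy_sg_param_alloc(size_t npages)
{
	struct toy_sg_param *p;

	assert(npages >= 1 && npages <= 1024);	/* range cited above */
	p = calloc(1, sizeof(*p) + npages * sizeof(p->buf[0]));
	if (p)
		p->npages = npages;
	return p;
}
```

A single free() of the returned pointer releases the header and the
array together, which is why the separate kfree() of the buffer can be
dropped in the real code.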

Assisted-by: Claude:Opus-4.7
Signed-off-by: Rosen Penev 
Reviewed-by: Qinxin Xia 
Signed-off-by: Marek Szyprowski 
Link: https://lore.kernel.org/r/20260603031758.290538-1-rosenp@gmail.com

dma-contiguous: simplify numa cma area handling

2026-05-28T08:11:45+00:00

Currently there are two kernel command-line ways to set up a NUMA CMA
area, "cma_pernuma=" and "numa_cma=", and two CMA arrays as well,
although the two do not differ technically. Robin suggested cleaning
up the code and using only one array [1], given "the apparent intent
that users only want one _or_ the other".

Simplify the code by using only one array to store the NUMA CMA areas.
In the rare case that a user sets both command-line parameters at the
same time, let the per-node size setting 'numa_cma=' take priority
over the global NUMA CMA setting.
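
The priority rule can be sketched in user space as follows. This is an
illustration under assumptions, not the kernel implementation: the
TOY_SIZE_UNSET sentinel and toy_numa_cma_size() are hypothetical, and
how the kernel actually represents "parameter not given" is not stated
here.

```c
#include <assert.h>

#define TOY_SIZE_UNSET (-1LL)	/* hypothetical sentinel: parameter not given */

/*
 * Pick the CMA size for one node. 'per_node' models a "numa_cma=" entry
 * for that node and 'global' models "cma_pernuma="; both hold
 * TOY_SIZE_UNSET when the parameter was not passed.
 */
long long toy_numa_cma_size(long long per_node, long long global)
{
	assert(per_node >= TOY_SIZE_UNSET && global >= TOY_SIZE_UNSET);

	if (per_node != TOY_SIZE_UNSET)
		return per_node;	/* per-node setting wins */
	if (global != TOY_SIZE_UNSET)
		return global;
	return 0;			/* nothing requested for this node */
}
```

In this sketch an explicit per-node size of 0 also overrides the global
value, which is an assumption of the sketch rather than something the
text above states.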

Link[1]: https://lore.kernel.org/lkml/43c5301c-fe6a-41e4-9482-ccfc7b62f2a7@arm.com/

Suggested-by: Robin Murphy 
Signed-off-by: Feng Tang 
Signed-off-by: Marek Szyprowski 
Link: https://lore.kernel.org/r/20260525015111.6267-1-feng.tang@linux.alibaba.com

Merge tag 'v7.1-rc5' into driver-core-next

2026-05-25T00:40:57+00:00

We need the driver-core fixes in here as well to build on top of.

Signed-off-by: Danilo Krummrich

dma-contiguous: add kconfig option to setup numa cma area if not configured explicitly

2026-05-22T05:57:40+00:00

A report from a multi-node NUMA arm64 server showed that, with the
IOMMU disabled, dma_alloc_coherent() always returns memory from node 0,
even for devices attached to other nodes, while the same API returns
node-local DMA memory when the IOMMU is enabled.

The reason is that, with the IOMMU disabled, dma_alloc_coherent() takes
the direct path and calls dma_alloc_contiguous(). The system has no
explicit CMA setting (such as per-NUMA CMA), only the default 64MB CMA
reserved area (on node 0), which the kernel tries first when allocating
memory.

Robin Murphy suggested setting up per-NUMA CMA or disabling CMA, which
did solve the issue. There is still a concern, though, that users
without much kernel knowledge could keep suffering from this silently,
because some architectures enable a CMA area by default in most Linux
distributions (not an issue for x86, which sets CONFIG_CMA_SIZE_MBYTES
to 0 by default).

One idea is to follow the current CMA reservation policy on platforms
with 'CONFIG_DMA_NUMA_CMA=y': if the NUMA CMA (either the 'numa cma' or
the 'cma pernuma' method) is not explicitly configured and the platform
really has multiple NUMA nodes, set it up according to the size of the
default 'dma_contiguous_default_area'. This keeps the default behavior
of single-node platforms unchanged (embedded and small devices need not
allocate extra memory) while improving general DMA locality.

Add a new bool Kconfig option, CONFIG_CMA_SIZE_PERNUMA, to control
whether this is enabled. Even with the option enabled, users can still
disable it with a kernel command-line setting such as "numa_cma=0:0" or
"cma_pernuma=0".
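
The decision described above might be sketched in user space like this.
All names are hypothetical, and the sketch assumes that "according to
size of default" means each node gets the same size as the default
area. The message does not spell out the exact sizing rule.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: size of the automatically created per-node CMA
 * area, or 0 when no automatic area should be set up.
 *
 * config_enabled          models CONFIG_CMA_SIZE_PERNUMA=y
 * explicitly_configured   models "numa_cma=" or "cma_pernuma=" being
 *                         passed, including the disabling forms
 * nr_nodes                number of NUMA nodes on the platform
 * default_area_size       size of the default CMA area, in bytes
 */
unsigned long long toy_auto_pernuma_size(bool config_enabled,
					 bool explicitly_configured,
					 int nr_nodes,
					 unsigned long long default_area_size)
{
	assert(nr_nodes >= 1);

	if (!config_enabled || explicitly_configured || nr_nodes <= 1)
		return 0;		/* keep the existing behaviour */
	return default_area_size;	/* assumption: same size per node */
}
```

Treating "numa_cma=0:0" or "cma_pernuma=0" as an explicit configuration
is what lets the command line switch the automatic setup off in this
sketch.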

Reported-by: Changrong Chen 
Suggested-by: Ying Huang 
Suggested-by: Robin Murphy 
Signed-off-by: Feng Tang 
Link: https://lore.kernel.org/r/20260512085509.83002-1-feng.tang@linux.alibaba.com
Link: https://lore.kernel.org/all/20260520222742.GA1607511@ax162/
[mszyprow: squashed changes from both links, added __initdata attribute
           to the numa_cma_configured variable]
Signed-off-by: Marek Szyprowski

dma-mapping: move dma_map_resource() sanity check into debug code

2026-05-18T07:04:59+00:00

dma_map_resource() uses pfn_valid() to ensure the range is not RAM.
However, pfn_valid() only checks for availability of the memory map for
a PFN but it does not ensure that the PFN is actually backed by RAM. On
ARM64 with SPARSEMEM (128MB section granularity), MMIO addresses that
share a section with RAM will falsely trigger the WARN_ON_ONCE and cause
dma_map_resource() to return DMA_MAPPING_ERROR.

This causes a WARNING on Raspberry Pi 4 during spi_bcm2835 probe because
the SPI FIFO register (0xfe204004) falls in the same sparsemem section
as the end of RAM (0xf8000000-0xfbffffff), both in section 31
(0xf8000000-0xffffffff).

Move the sanity check from dma_map_resource() into debug_dma_map_phys()
and replace the unreliable pfn_valid() with pfn_valid() &&
!PageReserved(), which correctly identifies actual usable RAM without
false positives for MMIO regions that happen to have struct pages.

Since dma_map_resource() is dma_map_phys(DMA_ATTR_MMIO), the check
applies equally to both APIs. Any non-reserved page represents kernel
memory to a sufficient degree that using DMA_ATTR_MMIO on it is almost
certainly wrong and risks breaking coherency on non-coherent platforms.
ZONE_DEVICE pages used for PCI P2P DMA (MEMORY_DEVICE_PCI_P2PDMA) have
PageReserved set, so they will not trigger a false positive.

The check no longer blocks the mapping and uses err_printk() to
integrate with dma-debug filtering.
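
The difference between the old and new predicates can be shown with a
small user-space model. struct toy_pfn_state and both helpers are
hypothetical; the two booleans merely stand in for the results of
pfn_valid() and PageReserved() on a given PFN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what the kernel knows about one PFN. */
struct toy_pfn_state {
	bool has_memmap;	/* models pfn_valid(): a struct page exists */
	bool reserved;		/* models PageReserved() on that page */
};

/* Old check: any PFN with a memory map entry is treated as RAM. */
bool toy_old_looks_like_ram(const struct toy_pfn_state *p)
{
	assert(p != NULL);
	return p->has_memmap;
}

/* New check: a memory map entry alone is not enough. */
bool toy_new_looks_like_ram(const struct toy_pfn_state *p)
{
	assert(p != NULL);
	return p->has_memmap && !p->reserved;
}
```

An MMIO register that shares a sparsemem section with RAM corresponds
to has_memmap and reserved both being true: the old check flags it,
the new one does not.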

Fixes: f7326196a781 ("dma-mapping: export new dma_*map_phys() interface")
Reviewed-by: Robin Murphy 
Signed-off-by: Jianpeng Chang 
Reviewed-by: Leon Romanovsky 
Signed-off-by: Marek Szyprowski 
Link: https://lore.kernel.org/r/20260513072209.1486986-1-jianpeng.chang.cn@windriver.com

dma-debug: Ensure mappings are created and released with matching attributes

2026-05-08T20:28:19+00:00

The DMA API expects that callers use the same attributes when mapping
and unmapping. Add tracking to verify this and catch mismatches.
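
A minimal user-space sketch of this kind of tracking follows. It is
not the dma-debug implementation: the fixed-size table, the toy_*
names and the return codes are all hypothetical, and the real code
reports through the dma-debug error path rather than return values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ENTRIES 16

/* Hypothetical debug entry: the address plus the attributes used to map. */
struct toy_entry {
	uint64_t dma_addr;
	unsigned long attrs;
	bool in_use;
};

static struct toy_entry toy_table[TOY_MAX_ENTRIES];

/* Record a mapping; returns false when the table is full. */
bool toy_debug_map(uint64_t dma_addr, unsigned long attrs)
{
	for (size_t i = 0; i < TOY_MAX_ENTRIES; i++) {
		if (toy_table[i].in_use)
			continue;
		toy_table[i].dma_addr = dma_addr;
		toy_table[i].attrs = attrs;
		toy_table[i].in_use = true;
		return true;
	}
	return false;
}

/* Returns 0 on a clean unmap, 1 on attribute mismatch, -1 if unknown. */
int toy_debug_unmap(uint64_t dma_addr, unsigned long attrs)
{
	for (size_t i = 0; i < TOY_MAX_ENTRIES; i++) {
		struct toy_entry *e = &toy_table[i];

		if (!e->in_use || e->dma_addr != dma_addr)
			continue;
		e->in_use = false;	/* entry is released either way */
		return e->attrs == attrs ? 0 : 1;
	}
	return -1;
}
```

Mapping an address with one attribute value and unmapping it with
another yields the mismatch code, which is the situation the new check
is meant to catch.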

Signed-off-by: Leon Romanovsky 
Reviewed-by: Samiullah Khawaja 
Signed-off-by: Marek Szyprowski 
Link: https://lore.kernel.org/r/20260501-dma-attrs-debug-v2-6-8dbac75cd501@nvidia.com