summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2025-11-05wifi: ath12k: Remove the wifi7 header inclusions in common codePavankumar Nandeshwar
As a precursor to the movement of wifi7 specific .c files to ath12k_wifi7.ko module, remove the wifi7 headers included in the common .c files except for dp_mon.c file, as the changes for moving the code from common to wifi7 directory for monitor will be coming incrementally. Since, dp_mon.c continues to be part of ath12k.ko module, add a few callbacks in hal_ops to facilitate calls from dp_mon.c to reach hal APIs present in ath12k_wifi7.ko module Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Signed-off-by: Pavankumar Nandeshwar <quic_pnandesh@quicinc.com> Signed-off-by: Ripan Deuri <quic_rdeuri@quicinc.com> Reviewed-by: Karthikeyan Periyasamy <karthikeyan.periyasamy@oss.qualcomm.com> Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Link: https://patch.msgid.link/20251103112111.2260639-7-quic_rdeuri@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-11-05wifi: ath12k: Move ieee80211_ops callback to the arch specific moduleRipan Deuri
Move the ieee80211_ops Tx callback to the architecture-specific module to avoid additional indirections caused by the common Tx function calling the architecture-specific Tx functions via ops. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Signed-off-by: Ripan Deuri <quic_rdeuri@quicinc.com> Reviewed-by: Karthikeyan Periyasamy <karthikeyan.periyasamy@oss.qualcomm.com> Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Link: https://patch.msgid.link/20251103112111.2260639-6-quic_rdeuri@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-11-05wifi: ath12k: Add helper to free DP link peerPavankumar Nandeshwar
Introduce ath12k_dp_link_peer_free() to wrap cleanup steps required before freeing a DP link peer. This avoids code duplication and ensures consistent teardown across multiple call flows. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Signed-off-by: Pavankumar Nandeshwar <quic_pnandesh@quicinc.com> Signed-off-by: Ripan Deuri <quic_rdeuri@quicinc.com> Reviewed-by: Karthikeyan Periyasamy <karthikeyan.periyasamy@oss.qualcomm.com> Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Link: https://patch.msgid.link/20251103112111.2260639-5-quic_rdeuri@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-11-05wifi: ath12k: Move DP specific link stats to DP link peerHarsh Kumar Bijlani
As part of peer modularization in the Driver Framework, the station view is as follows: Common Path Data Path ------------------------------------------------------------------- ath12k_sta ath12k_dp_peer | | |-> ath12k_link_sta <----> |-> ath12k_dp_link_peer | | |-> ath12k_link_sta <----> |-> ath12k_dp_link_peer | | |-> ath12k_link_sta <----> |-> ath12k_dp_link_peer Currently ath12k_link_sta has data path stats updated in tx_htt and rx monitor path. Move those stats from ath12_link_sta to ath12k_dp_link_peer to align with peer modularization model as shown above. This allows datapath to use only ath12k_dp_link_peer without having to reach out to other objects for updating stats, thereby improving the cache locality. Add following API to fetch rate info from DP link peer: ath12k_dp_link_peer_get_sta_rate_info_stats() This wrapper API populates link stats in 'struct ath12k_dp_link_peer_rate_info', which can be extended to support out-of-band retrieval of various rate stats. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Signed-off-by: Harsh Kumar Bijlani <quic_hbijlani@quicinc.com> Signed-off-by: Ripan Deuri <quic_rdeuri@quicinc.com> Reviewed-by: Karthikeyan Periyasamy <karthikeyan.periyasamy@oss.qualcomm.com> Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Link: https://patch.msgid.link/20251103112111.2260639-4-quic_rdeuri@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-11-05wifi: ath12k: Move DP device stats to ath12k_dpPavankumar Nandeshwar
As part of data path modularization of device object framework, the per packet tx/rx operations now use ath12k_dp. Move all the device stats 'device_stats' from ath12k_base to ath12k_dp, consolidating all device stats within ath12k_dp. This would improve the performance by allowing the datapath to reach to the stats counters without to having to reach out to ath12k_base object. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Signed-off-by: Pavankumar Nandeshwar <quic_pnandesh@quicinc.com> Signed-off-by: Ripan Deuri <quic_rdeuri@quicinc.com> Reviewed-by: Karthikeyan Periyasamy <karthikeyan.periyasamy@oss.qualcomm.com> Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Link: https://patch.msgid.link/20251103112111.2260639-3-quic_rdeuri@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-11-05wifi: ath12k: Add callbacks in arch_ops for rx APIsPavankumar Nandeshwar
Facilitate modular and architecture-agnostic access to Wi-Fi 7 Rx functionality by introducing wrapper APIs in common module for existing RX callbacks already defined in arch_ops. Also remove redundant ar usage in these APIs and registered callbacks definitions, and change the callback names of few of the APIs to remove the redundant usage of "dp". Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Signed-off-by: Pavankumar Nandeshwar <quic_pnandesh@quicinc.com> Signed-off-by: Ripan Deuri <quic_rdeuri@quicinc.com> Reviewed-by: Karthikeyan Periyasamy <karthikeyan.periyasamy@oss.qualcomm.com> Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Link: https://patch.msgid.link/20251103112111.2260639-2-quic_rdeuri@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-11-05reset: imx8mp-audiomix: Fix bad mask valuesLaurentiu Mihalcea
As per the i.MX8MP TRM, section 14.2 "AUDIO_BLK_CTRL", table 14.2.3.1.1 "memory map", the definition of the EARC control register shows that the EARC controller software reset is controlled via bit 0, while the EARC PHY software reset is controlled via bit 1. This means that the current definitions of IMX8MP_AUDIOMIX_EARC_RESET_MASK and IMX8MP_AUDIOMIX_EARC_PHY_RESET_MASK are wrong since their values would imply that the EARC controller software reset is controlled via bit 1 and the EARC PHY software reset is controlled via bit 2. Fix them. Fixes: a83bc87cd30a ("reset: imx8mp-audiomix: Prepare the code for more reset bits") Cc: stable@vger.kernel.org Reviewed-by: Shengjiu Wang <shengjiu.wang@gmail.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Laurentiu Mihalcea <laurentiu.mihalcea@nxp.com> Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
2025-11-05block: introduce disk_report_zone()Damien Le Moal
Commit b76b840fd933 ("dm: Fix dm-zoned-reclaim zone write pointer alignment") introduced an indirect call for the callback function of a report zones executed with blkdev_report_zones(). This is necessary so that the function disk_zone_wplug_sync_wp_offset() can be called to refresh a zone write plug zone write pointer offset after a write error. However, this solution makes following the path of a zone information harder to understand. Clean this up by introducing the new blk_report_zones_args structure to define a zone report callback and its private data and introduce the helper function disk_report_zone() which calls both disk_zone_wplug_sync_wp_offset() and the zone report user callback function for all zones of a zone report. This helper function must be called by all block device drivers that implement the report zones block operation in order to correctly report a zone information. All block device drivers supporting the report_zones block operation are updated to use this new scheme. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-05dma-buf: rework stub fence initialisation v2Christian König
Instead of doing this on the first call of the function just initialize the stub fence during kernel load. This has the clear advantage of lower overhead and also doesn't rely on the ops to not be NULL any more. v2: use correct signal function Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20251031134442.113648-3-christian.koenig@amd.com
2025-11-05regulator: Add support for MediaTek MT6363 SPMI PMIC RegulatorsAngeloGioacchino Del Regno
Add a driver for the regulators found on the MT6363 PMIC, fully controlled by SPMI interface. This PMIC regulates voltage with an input range of 2.6-5.0V, and features 10 buck converters and 26 LDOs. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patch.msgid.link/20251027110527.21002-5-angelogioacchino.delregno@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-05regulator: Add support for MediaTek MT6316 SPMI PMIC RegulatorsAngeloGioacchino Del Regno
Add a driver for the regulators found on all types of the MediaTek MT6316 SPMI PMIC, fully controlled by SPMI interface and featuring four step down DCDC (buck) converters. In particular, this includes support for: - MT6316(BP/VP): 2+2 Phase (Phase 1: buck1+2, Phase 2: buck3+4) - MT6316(CP/HP/KP): 3+1 Phase (Phase 1: buck1+2+4, Phase 2: buck3) - MT6316(DP/TP): 4+0 Phase (Single phase, buck1+2+3+4) Please note that the set/clear registers for the enable bits are not documented in the datasheet version that I used as reference, but those are used in the downstream driver and I verified that are actually working as expected. Besides, it's also worth clearly mentioning that the MT6316 PMICs voltage selector register uses a weird 9-bits Big Endian format, for which a driver-private helper is provided. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://patch.msgid.link/20251027110527.21002-3-angelogioacchino.delregno@collabora.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-05media: radio: si470x: Fix DRIVER_AUTHOR macro definitionKees Cook
The DRIVER_AUTHOR macro incorrectly included a semicolon in its string literal definition. Right now, this wasn't causing any real problem, but coming changes to the MODULE_INFO() macro make this more sensitive. Specifically, when used with MODULE_AUTHOR(), this created syntax errors during macro expansion: MODULE_AUTHOR(DRIVER_AUTHOR); expands to: MODULE_INFO(author, "Joonyoung Shim <jy0922.shim@samsung.com>";) ^ syntax error Remove the trailing semicolon from the DRIVER_AUTHOR definition. Semicolons should only appear at the point of use, not in the macro definition. Reviewed-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Tested-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
2025-11-05media: dvb-usb-v2: lmedm04: Fix firmware macro definitionsKees Cook
The firmware filename macros incorrectly included semicolons in their string literal definitions. Right now, this wasn't causing any real problem, but coming changes to the MODULE_INFO() macro make this more sensitive. Specifically, when used with MODULE_FIRMWARE(), this created syntax errors during macro expansion: MODULE_FIRMWARE(LME2510_C_S7395); expands to: MODULE_INFO(firmware, "dvb-usb-lme2510c-s7395.fw";) ^ syntax error Remove the trailing semicolons from all six firmware filename macro definitions. Semicolons should only appear at the point of use, not in the macro definition. Reviewed-by: Hans Verkuil <hverkuil+cisco@kernel.org> Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: Daniel Gomez <da.gomez@samsung.com> Tested-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
2025-11-05spi: Try to get ACPI GPIO IRQ earlierHans de Goede
Since commit d24cfee7f63d ("spi: Fix acpi deferred irq probe"), the acpi_dev_gpio_irq_get() call gets delayed till spi_probe() is called on the SPI device. If there is no driver for the SPI device then the move to spi_probe() results in acpi_dev_gpio_irq_get() never getting called. This may cause problems by leaving the GPIO pin floating because this call is responsible for setting up the GPIO pin direction and/or bias according to the values from the ACPI tables. Re-add the removed acpi_dev_gpio_irq_get() in acpi_register_spi_device() to ensure the GPIO pin is always correctly setup, while keeping the acpi_dev_gpio_irq_get() call added to spi_probe() to deal with -EPROBE_DEFER returns caused by the GPIO controller not having a driver yet. Link: https://bbs.archlinux.org/viewtopic.php?id=302348 Fixes: d24cfee7f63d ("spi: Fix acpi deferred irq probe") Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede <hansg@kernel.org> Link: https://patch.msgid.link/20251102190921.30068-1-hansg@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-05spi: aspeed: Use devm_iounmap() to unmap devm_ioremap() memoryChin-Ting Kuo
The AHB IO memory for each chip select is mapped using devm_ioremap(), so it should be unmapped using devm_iounmap() to ensure proper device-managed resource cleanup. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202510292356.JnTUBxCl-lkp@intel.com/ Signed-off-by: Chin-Ting Kuo <chin-ting_kuo@aspeedtech.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://patch.msgid.link/20251105084952.1063489-1-chin-ting_kuo@aspeedtech.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-05regulator: fixed: fix GPIO descriptor leak on register failureHaotian Zhang
In the commit referenced by the Fixes tag, devm_gpiod_get_optional() was replaced by manual GPIO management, relying on the regulator core to release the GPIO descriptor. However, this approach does not account for the error path: when regulator registration fails, the core never takes over the GPIO, resulting in a resource leak. Add gpiod_put() before returning on regulator registration failure. Fixes: 5e6f3ae5c13b ("regulator: fixed: Let core handle GPIO descriptor") Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Link: https://patch.msgid.link/20251028172828.625-1-vulab@iscas.ac.cn Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-05spi: tegra210-quad: Improve timeout handling underMark Brown
Merge series from Vishwaroop A <va@nvidia.com>: This patch series addresses timeout handling issues in the Tegra QSPI driver that occur under high system load conditions. We've observed that when CPUs are saturated (due to error injection, RAS firmware activity, or general CPU contention), QSPI interrupt handlers can be delayed, causing spurious transfer failures even though the hardware completed the operation successfully. These changes have been tested in production environments under various high load scenarios including RAS testing and CPU saturation workloads.
2025-11-05drm/sched: Fix deadlock in drm_sched_entity_kill_jobs_cbPierre-Eric Pelloux-Prayer
The Mesa issue referenced below pointed out a possible deadlock: [ 1231.611031] Possible interrupt unsafe locking scenario: [ 1231.611033] CPU0 CPU1 [ 1231.611034] ---- ---- [ 1231.611035] lock(&xa->xa_lock#17); [ 1231.611038] local_irq_disable(); [ 1231.611039] lock(&fence->lock); [ 1231.611041] lock(&xa->xa_lock#17); [ 1231.611044] <Interrupt> [ 1231.611045] lock(&fence->lock); [ 1231.611047] *** DEADLOCK *** In this example, CPU0 would be any function accessing job->dependencies through the xa_* functions that don't disable interrupts (eg: drm_sched_job_add_dependency(), drm_sched_entity_kill_jobs_cb()). CPU1 is executing drm_sched_entity_kill_jobs_cb() as a fence signalling callback so in an interrupt context. It will deadlock when trying to grab the xa_lock which is already held by CPU0. Replacing all xa_* usage by their xa_*_irq counterparts would fix this issue, but Christian pointed out another issue: dma_fence_signal takes fence.lock and so does dma_fence_add_callback. dma_fence_signal() // locks f1.lock -> drm_sched_entity_kill_jobs_cb() -> foreach dependencies -> dma_fence_add_callback() // locks f2.lock This will deadlock if f1 and f2 share the same spinlock. To fix both issues, the code iterating on dependencies and re-arming them is moved out to drm_sched_entity_kill_jobs_work(). Cc: stable@vger.kernel.org # v6.2+ Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13908 Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> [phasta: commit message nits] Signed-off-by: Philipp Stanner <phasta@kernel.org> Link: https://patch.msgid.link/20251104095358.15092-1-pierre-eric.pelloux-prayer@amd.com
2025-11-05gpu: nova-core: vbios: use FromBytes for NpdeStructAlexandre Courbot
Use `from_bytes_copy_prefix` to create `NpdeStruct` instead of building it ourselves from the bytes stream. This lets us remove a few array accesses and results in shorter code. Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Message-ID: <20251029-nova-vbios-frombytes-v1-5-ac441ebc1de3@nvidia.com>
2025-11-05gpu: nova-core: vbios: use FromBytes for BitHeaderAlexandre Courbot
Use `from_bytes_copy_prefix` to create `BitHeader` instead of building it ourselves from the bytes stream. This lets us remove a few array accesses and results in shorter code. Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Message-ID: <20251029-nova-vbios-frombytes-v1-4-ac441ebc1de3@nvidia.com>
2025-11-05gpu: nova-core: vbios: use FromBytes for PcirStructAlexandre Courbot
Use `from_bytes_copy_prefix` to create `PcirStruct` instead of building it ourselves from the bytes stream. This lets us remove a few array accesses and results in shorter code. Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Message-ID: <20251029-nova-vbios-frombytes-v1-3-ac441ebc1de3@nvidia.com>
2025-11-05gpu: nova-core: vbios: use FromBytes for PmuLookupTable headerAlexandre Courbot
Use `from_bytes_copy_prefix` to create the `PmuLookupTable` header instead of building it ourselves from the bytes stream. This lets us remove a few `as` conversions and array accesses. Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Message-ID: <20251029-nova-vbios-frombytes-v1-2-ac441ebc1de3@nvidia.com>
2025-11-05gpio: cdev: replace use of system_wq with system_percpu_wqMarco Crivellari
Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without refactoring the API. system_wq should be the per-cpu workqueue, yet in this name nothing makes that clear, so replace system_wq with system_percpu_wq. The old wq (system_wq) will be kept for a few release cycles. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://lore.kernel.org/r/20251031111628.143924-2-marco.crivellari@suse.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-11-05gpio: aggregator: restore the set_config operationThomas Richard
Restore the set_config operation, as it was lost during the refactoring of the gpio-aggregator driver while creating the gpio forwarder library. Fixes: b31c68fd851e7 ("gpio: aggregator: handle runtime registration of gpio_desc in gpiochip_fwd") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509281206.a7334ae8-lkp@intel.com Signed-off-by: Thomas Richard <thomas.richard@bootlin.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20250929-gpio-aggregator-fix-set-config-callback-v1-1-39046e1da609@bootlin.com Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-11-05Merge tag 'media/v6.18-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: - honour privacy led with pdx86/int3472 - fix invalid file access on cx18 and ivtv - forbid remove_bufs when legacy fileio is active on videbuf2 - add an heuristic to find stream entity on uvcvideo * tag 'media/v6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: videobuf2: forbid remove_bufs when legacy fileio is active media: uvcvideo: Use heuristic to find stream entity media: v4l2-subdev / pdx86: int3472: Use "privacy" as con_id for the privacy LED media: ivtv: Fix invalid access to file * media: cx18: Fix invalid access to file *
2025-11-05drm: rcar-du: fix incorrect return in rcar_du_crtc_cleanup()Alok Tiwari
The rcar_du_crtc_cleanup() function has a void return type, but incorrectly uses a return statement with a call to drm_crtc_cleanup(), which also returns void. Remove the return statement to ensure proper function semantics. No functional change intended. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com> Link: https://patch.msgid.link/20251017191634.1454201-1-alok.a.tiwari@oracle.com Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
2025-11-05iommupt: Add a kunit test for the SW bitsJason Gunthorpe
Add some basic checks that the sw_bit APIs work as expected. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommu/vt-d: Follow PT_FEAT_DMA_INCOHERENT into the PASID entryJason Gunthorpe
Currently a incoherent walk domain cannot be attached to a coherent capable iommu. Kevin says HW probably doesn't exist with such a mixture, but making the driver support it makes logical sense anyhow. When building the PASID entry the PWSNP (Page Walk Snoop) bit tells the HW if it should issue snoops. If the page table is cache flushed because of PT_FEAT_DMA_INCOHERENT then it is fine to set this bit to 0 even if the HW supports 1. Weaken the compatible check to permit a coherent instance to accept an incoherent table and fix the PASID table construction to set PWSNP from PT_FEAT_DMA_INCOHERENT. SVA always sets PWSNP. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommu/vt-d: Use the generic iommu page tableJason Gunthorpe
Replace the VT-d iommu_domain implementation of the VT-d second stage and first stage page tables with the iommupt VTDSS and x86_64 pagetables. x86_64 is shared with the AMD driver. There are a couple notable things in VT-d: - Like AMD the second stage format is not sign extended, unlike AMD it cannot decode a full 64 bits. The first stage format is a normal sign extended x86 page table - The HW caps can indicate how many levels, how many address bits and what leaf page sizes are supported in HW. As before the highest number of levels that can translate the entire supported address width is used. The supported page sizes are adjusted directly from the dedicated first/second stage cap bits. - VTD requires flushing 'write buffers'. This logic is left unchanged, the write buffer flushes on any gather flush or through iotlb_sync_map. - Like ARM, VTD has an optional non-coherent page table walker that requires cache flushing. This is supported through PT_FEAT_DMA_INCOHERENT the same as ARM, however x86 can't use the DMA API for flush, it must call the arch function clflush_cache_range() - The PT_FEAT_DYNAMIC_TOP can probably be supported on VT-d someday for the second stage when it uses 128 bit atomic stores for the HW context structures. - PT_FEAT_VTDSS_FORCE_WRITEABLE is used to work around ERRATA_772415_SPR17 - A kernel command line parameter "sp_off" disables all page sizes except 4k Remove all the unused iommu_domain page table code. The debugfs paths have their own independent page table walker that is left alone for now. This corrects a race with the non-coherent walker that the ARM implementations have fixed: CPU 0 CPU 1 pfn_to_dma_pte() pfn_to_dma_pte() pte = &parent[offset]; if (!dma_pte_present(pte)) { try_cmpxchg64(&pte->val) pte = &parent[offset]; .. dma_pte_present(pte) .. [...] // iommu_map() completes // Device does DMA domain_flush_cache(pte) The CPU 1 mapping operation shares a page table level with the CPU 0 mapping operation. CPU 0 installed a new page table level but has not flushed it yet. CPU1 returns from iommu_map() and the device does DMA. The non coherent walker fails to see the new table level installed by CPU 0 and fails the DMA with non-present. The iommupt PT_FEAT_DMA_INCOHERENT implementation uses the ARM design of storing a flag when CPU 0 completes the flush. If the flag is not set CPU 1 will also flush to ensure the HW can fully walk to the PTE being installed. Cc: Tina Zhang <tina.zhang@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt/x86: Support SW bits and permit PT_FEAT_DMA_INCOHERENTJason Gunthorpe
VT-d requires PT_FEAT_DMA_INCOHERENT for the x86 page table as well, implement the required SW bits and enable the feature. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt/x86: Set the dirty bit only for writable PTEsJason Gunthorpe
AMD and VTD are historically different here, adopt the VTD version of setting the D bit only on writable PTEs as it makes more sense. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Add the Intel VT-d second stage page table formatJason Gunthorpe
The VT-d second stage format is almost the same as the x86 PAE format, except the bit encodings in the PTE are different and a few new PTE features, like force coherency are present. Among all the formats it is unique in not having a designated present bit. Comparing the performance of several operations to the existing version: iommu_map() pgsz ,avg new,old ns, min new,old ns , min % (+ve is better) 2^12, 53,66 , 50,64 , 21.21 2^21, 59,70 , 56,67 , 16.16 2^30, 54,66 , 52,63 , 17.17 256*2^12, 384,524 , 337,516 , 34.34 256*2^21, 387,632 , 336,626 , 46.46 256*2^30, 376,629 , 323,623 , 48.48 iommu_unmap() pgsz ,avg new,old ns, min new,old ns , min % (+ve is better) 2^12, 67,86 , 63,84 , 25.25 2^21, 64,84 , 59,80 , 26.26 2^30, 59,78 , 56,74 , 24.24 256*2^12, 216,335 , 198,317 , 37.37 256*2^21, 245,350 , 232,344 , 32.32 256*2^30, 248,345 , 226,339 , 33.33 Cc: Tina Zhang <tina.zhang@intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05platform/x86: x86-android-tablets: Omit a variable reassignment in ↵Markus Elfring
lenovo_yoga_tab2_830_1050_init_codec() An error code was assigned to a variable and checked accordingly. This value was passed to a dev_err_probe() call in an if branch. This function is documented in the way that the same value is returned. Thus delete a redundant variable reassignment. The source code was transformed by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: https://patch.msgid.link/90a2385c-9d19-46f2-8d31-618d5c10aa91@web.de Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-11-05platform/surface: aggregator: Omit a variable reassignment in ↵Markus Elfring
ssam_serial_hub_probe() An error code was assigned to a variable and checked accordingly. This value was passed to a dev_err_probe() call in an if branch. This function is documented in the way that the same value is returned. Thus delete a redundant variable reassignment. The source code was transformed by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Link: https://patch.msgid.link/b25c9842-7ebc-43f0-a411-8098359f81a6@web.de Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-11-05iommupt: Flush the CPU cache after any writes to the page tableJason Gunthorpe
Flush the CPU cache for the page table memory after each set of writes to the page table. The iommu should have visibility to the updated entries as soon as the map/unmap/etc operations return, like normal coherent hardware does. The caches also have to be flushed before any gather can be submitted to the driver. Implement the same solution to the race as io-pgtable-arm by using a software PTE bit to track if a table entry has been flushed or not. If another thread is still flushing then another concurrent map operation could return without IOMMU visibility to a required table entry. The SW bit will tell the second thread to also flush the cache. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Use the incoherent start/stop functions for PT_FEAT_DMA_INCOHERENTJason Gunthorpe
This is the first step to supporting an incoherent walker, start and stop the incoherence around the allocation and frees of the page table memory. The iommu_pages API maps this to dma_map/unmap_single(), or arch cache flushing calls. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Add basic support for SW bits in the page tableJason Gunthorpe
SW bits can be placed on items, including table entries, single OA's and individual items within a contiguous OA. They are guaranteed to be ignored by the HW. The API is very basic since the only use case so far is a single bit. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommu/pages: Add support for incoherent IOMMU page table walkersJason Gunthorpe
Some IOMMU HW cannot snoop the CPU cache when it walks the IO page tables. The CPU is required to flush the cache to make changes visible to the HW. Provide some helpers from iommu-pages to manage this. The helpers combine both the ARM and x86 (used in Intel VT-d) versions of the cache flushing under a single API. The ARM version uses the DMA API to access the cache flush on the assumption that the iommu is using a direct mapping and is already marked incoherent. The helpers will do the DMA API calls to set things up and keep track of DMA mapped folios using a bit in the ioptdesc so that unmapping on error paths is cleaner. The Intel version just calls the arch cache flush call directly and has no need to cleanup prior to destruction. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Add a kunit test for the IOMMU implementationJason Gunthorpe
This intends to have high coverage of the page table format functions and the IOMMU implementation itself, exercising the various corner cases. The kunit tests can be run in the kunit framework, using commands like: tools/testing/kunit/kunit.py run --build_dir build_kunit_arm64 --arch arm64 --make_options LLVM=-19 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig tools/testing/kunit/kunit.py run --build_dir build_kunit_uml --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig tools/testing/kunit/kunit.py run --build_dir build_kunit_x86_64 --arch x86_64 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig tools/testing/kunit/kunit.py run --build_dir build_kunit_i386 --arch i386 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig tools/testing/kunit/kunit.py run --build_dir build_kunit_i386pae --arch i386 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig --kconfig_add CONFIG_X86_PAE=y There are several interesting corner cases on the 32 bit platforms that need checking. Like the generic tests, these are run on the format's configuration list using kunit "params". This also checks the core iommu parts of the page table code as it enters the logic through a mock iommu_domain. The following are checked: - PT_FEAT_DYNAMIC_TOP properly adds levels one by one - Every page size can be iommu_map()'d, and mapping creates that size - iommu_iova_to_phys() works with every page size - Test converting OA -> non present -> OA when the two OAs overlap and free table levels - Test that unmap stops at holes, unmap doesn't split, and unmap returns the right values for partial unmap requests - Randomly map/unmap. Checks map with random sizes, that map fails when hitting collisions doing nothing, unmap/map with random intersections and full unmap of random sizes. Also checks iommu_iova_to_phys() with random sizes - Check for memory leaks by monitoring NR_SECONDARY_PAGETABLE Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommu/amd: Remove AMD io_pgtable supportJason Gunthorpe
None of this is used anymore, delete it. Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommu/amd: Use the generic iommu page tableAlejandro Jimenez
Replace the io_pgtable versions with pt_iommu versions. The v2 page table uses the x86 implementation that will be eventually shared with VT-d. This supports the same special features as the original code: - increase_top for the v1 format to allow scaling from 3 to 6 levels - non-present flushing - Dirty tracking for v1 only - __sme_set() to adjust the PTEs for CC - Optimization for flushing with virtualization to minimize the range - amd_iommu_pgsize_bitmap override of the native page sizes - page tables allocate from the device's NUMA node Rework the domain ops so that v1/v2 get their own ops. Make dedicated allocation functions for v1 and v2. Hook up invalidation for a top change to struct pt_iommu_flush_ops. Delete some of the iopgtable related code that becomes unused in this patch. The next patch will delete the rest of it. This fixes a race bug in AMD's increase_address_space() implementation. It stores the top level and top pointer in different memory, which prevents other threads from reading a coherent version: increase_address_space() alloc_pte() level = pgtable->mode - 1; pgtable->root = pte; pgtable->mode += 1; pte = &pgtable->root[PM_LEVEL_INDEX(level, address)]; The iommupt version is careful to put mode and root under a single READ_ONCE and then is careful to only READ_ONCE a single time per walk. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Add the x86 64 bit page table formatJason Gunthorpe
This is used by x86 CPUs and can be used in AMD/VT-d x86 IOMMUs. When a x86 IOMMU is running SVA the MM will be using this format. This implementation follows the AMD v2 io-pgtable version. There is nothing remarkable here, the format can have 4 or 5 levels and limited support for different page sizes. No contiguous pages support. x86 uses a sign extension mechanism where the top bits of the VA must match the sign bit. The core code supports this through PT_FEAT_SIGN_EXTEND which creates and upper and lower VA range. All the new operations will work correctly in both spaces, however currently there is no way to report the upper space to other layers. Future patches can improve that. In principle this can support 3 page tables levels matching the 32 bit PAE table format, but no iommu driver needs this. The focus is on the modern 64 bit 4 and 5 level formats. Comparing the performance of several operations to the existing version: iommu_map() pgsz ,avg new,old ns, min new,old ns , min % (+ve is better) 2^12, 71,61 , 66,58 , -13.13 2^21, 66,60 , 61,55 , -10.10 2^30, 59,56 , 56,54 , -3.03 256*2^12, 392,1360 , 345,1289 , 73.73 256*2^21, 383,1159 , 335,1145 , 70.70 256*2^30, 378,965 , 331,892 , 62.62 iommu_unmap() pgsz ,avg new,old ns, min new,old ns , min % (+ve is better) 2^12, 77,71 , 73,68 , -7.07 2^21, 76,70 , 70,66 , -6.06 2^30, 69,66 , 66,63 , -4.04 256*2^12, 225,899 , 210,870 , 75.75 256*2^21, 262,722 , 248,710 , 65.65 256*2^30, 251,643 , 244,634 , 61.61 The small -ve values in the iommu_unmap() are due to the core code calling iommu_pgsize() before invoking the domain op. This is unncessary with this implementation. Future work optimizes this and gets to 2%, 4%, 3%. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommufd: Change the selftest to use iommupt instead of xarrayJason Gunthorpe
The iommufd self test uses an xarray to store the pfns and their orders to emulate a page table. Make it act more like a real iommu driver by replacing the xarray with an iommupt based page table. The new AMDv1 mock format behaves similarly to the xarray. Add set_dirty() as a iommu_pt operation to allow the test suite to simulate HW dirty. Userspace can select between several formats including the normal AMDv1 format and a special MOCK_IOMMUPT_HUGE variation for testing huge page dirty tracking. To make the dirty tracking test work the page table must only store exactly 2M huge pages otherwise the logic the test uses fails. They cannot be broken up or combined. Aside from aligning the selftest with a real page table implementation, this helps test the iommupt code itself. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Add a mock pagetable format for iommufd selftest to useJason Gunthorpe
The iommufd self test uses an xarray to store the pfns and their orders to emulate a page table. Slightly modify the amdv1 page table to create a real page table that has similar properties: - 2k base granule to simulate something like a 4k page table on a 64K PAGE_SIZE ARM system - Contiguous page support for every PFN order - Dirty tracking AMDv1 is the closest format, as it is the only one that already supports every page size. Tweak it to have only 5 levels and an 11 bit base granule and compile it separately as a format variant. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Add a kunit test for Generic Page TableJason Gunthorpe
This intends to have high coverage of the page table format functions, it uses the IOMMU implementation to create a tree which it then walks through and directly calls the generic page table functions to test them. It is a good starting point to test a new format header as it is often able to find typos and inconsistencies much more directly, rather than with an obscure failure in the iommu implementation. The tests can be run with commands like: tools/testing/kunit/kunit.py run --build_dir build_kunit_arm64 --arch arm64 --make_options LLVM=-19 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig tools/testing/kunit/kunit.py run --build_dir build_kunit_uml --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig --kconfig_add CONFIG_WERROR=n tools/testing/kunit/kunit.py run --build_dir build_kunit_x86_64 --arch x86_64 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig tools/testing/kunit/kunit.py run --build_dir build_kunit_i386 --arch i386 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig tools/testing/kunit/kunit.py run --build_dir build_kunit_i386pae --arch i386 --kunitconfig ./drivers/iommu/generic_pt/.kunitconfig --kconfig_add CONFIG_X86_PAE=y There are several interesting corner cases on the 32 bit platforms that need checking. The format can declare a list of configurations that generate different configurations the initialize the page table, for instance with different top levels or other parameters. The kunit will turn these into "params" which cause each test to run multiple times. The tests are repeated to run at every table level to check that all the item encoding formats work. The following are checked: - Basic init works for each configuration - The various log2 functions have the expected behavior at the limits - pt_compute_best_pgsize() works - pt_table_pa() reads back what pt_install_table() writes - range.max_vasz_lg2 works properly - pt_table_oa_lg2sz() and pt_table_item_lg2sz() use a contiguous non-overlapping set of bits from the VA up to the defined max_va - pt_possible_sizes() and pt_can_have_leaf() produces a sensible layout - pt_item_oa(), pt_entry_oa(), and pt_entry_num_contig_lg2() read back what pt_install_leaf_entry() writes - pt_clear_entry() works - pt_attr_from_entry() reads back what pt_iommu_set_prot() & pt_install_leaf_entry() writes - pt_entry_set_write_clean(), pt_entry_make_write_dirty(), and pt_entry_write_is_dirty() work Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Add read_and_clear_dirty opJason Gunthorpe
IOMMU HW now supports updating a dirty bit in an entry when a DMA writes to the entry's VA range. iommufd has a uAPI to read and clear the dirty bits from the tables. This is a trivial recursive descent algorithm to read and optionally clear the dirty bits. The format needs a function to tell if a contiguous entry is dirty, and a function to clear a contiguous entry back to clean. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Add map_pages opJason Gunthorpe
map is slightly complicated because it has to handle a number of special edge cases: - Overmapping a previously shared, but now empty, table level with an OA. Requries validating and freeing the possibly empty tables - Doing the above across an entire to-be-created contiguous entry - Installing a new shared table level concurrently with another thread - Expanding the table by adding more top levels Table expansion is a unique feature of AMDv1, this version is quite similar except we handle racing concurrent lockless map. The table top pointer and starting level are encoded in a single uintptr_t which ensures we can READ_ONCE() without tearing. Any op will do the READ_ONCE() and use that fixed point as its starting point. Concurrent expansion is handled with a table global spinlock. When inserting a new table entry map checks that the entire portion of the table is empty. This includes freeing any empty lower tables that will be overwritten by an OA. A separate free list is used while checking and collecting all the empty lower tables so that writing the new entry is uninterrupted, either the new entry fully writes or nothing changes. A special fast path for PAGE_SIZE is implemented that does a direct walk to the leaf level and installs a single entry. This gives ~15% improvement for iommu_map() when mapping lists of single pages. This version sits under the iommu_domain_ops as map_pages() but does not require the external page size calculation. The implementation is actually map_range() and can do arbitrary ranges, internally handling all the validation and supporting any arrangment of page sizes. A future series can optimize iommu_map() to take advantage of this. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Add unmap_pages opJason Gunthorpe
unmap_pages removes mappings and any fully contained interior tables from the given range. This follows the now-standard iommu_domain API definition where it does not split up larger page sizes into smaller. The caller must perform unmap only on ranges created by map or it must have somehow otherwise determined safe cut points (eg iommufd/vfio use iova_to_phys to scan for them) A future work will provide 'cut' which explicitly does the page size split if the HW can support it. unmap is implemented with a recursive descent of the tree. If the caller provides a VA range that spans an entire table item then the table memory can be freed as well. If an entire table item can be freed then this version will also check the leaf-only level of the tree to ensure that all entries are present to generate -EINVAL. Many of the existing drivers don't do this extra check. This version sits under the iommu_domain_ops as unmap_pages() but does not require the external page size calculation. The implementation is actually unmap_range() and can do arbitrary ranges, internally handling all the validation and supporting any arrangment of page sizes. A future series can optimize __iommu_unmap() to take advantage of this. Freed page table memory is batched up in the gather and will be freed in the driver's iotlb_sync() callback after the IOTLB flush completes. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Add iova_to_phys opJason Gunthorpe
iova_to_phys is a performance path for the DMA API and iommufd, implement it using an unrolled get_user_pages() like function waterfall scheme. The implementation itself is fairly trivial. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2025-11-05iommupt: Add the AMD IOMMU v1 page table formatJason Gunthorpe
AMD IOMMU v1 is unique in supporting contiguous pages with a variable size and it can decode the full 64 bit VA space. Unlike other x86 page tables this explicitly does not do sign extension as part of allowing the entire 64 bit VA space to be supported. The general design is quite similar to the x86 PAE format, except with a 6th level and quite different PTE encoding. This format is the only one that uses the PT_FEAT_DYNAMIC_TOP feature in the existing code as the existing AMDv1 code starts out with a 3 level table and adds levels on the fly if more IOVA is needed. Comparing the performance of several operations to the existing version: iommu_map() pgsz ,avg new,old ns, min new,old ns , min % (+ve is better) 2^12, 65,64 , 62,61 , -1.01 2^13, 70,66 , 67,62 , -8.08 2^14, 73,69 , 71,65 , -9.09 2^15, 78,75 , 75,71 , -5.05 2^16, 89,89 , 86,84 , -2.02 2^17, 128,121 , 124,112 , -10.10 2^18, 175,175 , 170,163 , -4.04 2^19, 264,306 , 261,279 , 6.06 2^20, 444,525 , 438,489 , 10.10 2^21, 60,62 , 58,59 , 1.01 256*2^12, 381,1833 , 367,1795 , 79.79 256*2^21, 375,1623 , 356,1555 , 77.77 256*2^30, 356,1338 , 349,1277 , 72.72 iommu_unmap() pgsz ,avg new,old ns, min new,old ns , min % (+ve is better) 2^12, 76,89 , 71,86 , 17.17 2^13, 79,89 , 75,86 , 12.12 2^14, 78,90 , 74,86 , 13.13 2^15, 82,89 , 74,86 , 13.13 2^16, 79,89 , 74,86 , 13.13 2^17, 81,89 , 77,87 , 11.11 2^18, 90,92 , 87,89 , 2.02 2^19, 91,93 , 88,90 , 2.02 2^20, 96,95 , 91,92 , 1.01 2^21, 72,88 , 68,85 , 20.20 256*2^12, 372,6583 , 364,6251 , 94.94 256*2^21, 398,6032 , 392,5758 , 93.93 256*2^30, 396,5665 , 389,5258 , 92.92 The ~5-17x speedup when working with mutli-PTE map/unmaps is because the AMD implementation rewalks the entire table on every new PTE while this version retains its position. The same speedup will be seen with dirtys as well. The old implementation triggers a compiler optimization that ends up generating a "rep stos" memset for contiguous PTEs. Since AMD can have contiguous PTEs that span 2Kbytes of table this is a huge win compared to a normal movq loop. It is why the unmap side has a fairly flat runtime as the contiguous PTE sides increases. This version makes it explicit with a memset64() call. Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Vasant Hegde <vasant.hegde@amd.com> Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Tested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>