linux.git - Linux kernel source tree

Age	Commit message (Collapse)	Author
2025-07-16	drm/amdgpu/vcn5: re-emit unprocessed state on ring reset	Alex Deucher
	Re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/vcn4.0.5: re-emit unprocessed state on ring reset	Alex Deucher
	Re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/vcn4.0.3: re-emit unprocessed state on ring reset	Alex Deucher
	Re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/vcn4: re-emit unprocessed state on ring reset	Alex Deucher
	Re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/jpeg5.0.1: re-emit unprocessed state on ring reset	Alex Deucher
	Re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/jpeg5: add queue reset	Alex Deucher
	Add queue reset support for jpeg 5.0.0. Use the new helpers to re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/jpeg4.0.5: add queue reset	Alex Deucher
	Add queue reset support for jpeg 4.0.5. Use the new helpers to re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/jpeg4.0.3: re-emit unprocessed state on ring reset	Alex Deucher
	Re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/jpeg4: re-emit unprocessed state on ring reset	Alex Deucher
	Re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/jpeg3: re-emit unprocessed state on ring reset	Alex Deucher
	Re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/jpeg2.5: re-emit unprocessed state on ring reset	Alex Deucher
	Re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/jpeg2: re-emit unprocessed state on ring reset	Alex Deucher
	Re-emit the unprocessed state after resetting the queue. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Tested-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amd/pm: Remove unnecessary variable	Asad Kamal
	Remove unnecessary variable ret from smu_v13_0_12_get_smu_metrics_data Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507150618.WOfvWsQF-lkp@intel.com Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu: Increase reset counter only on success	Lijo Lazar
	Increment the reset counter only if soft recovery succeeded. This is consistent with a ring hard reset behaviour where counter gets incremented only if hard reset succeeded. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amd/pm: Get max/min frequency on aldebaran VF	Lijo Lazar
	PMFW interface to get max/min frequencies is not available on aldebaran VFs. Use data, if available, in DPM tables to get the max/min frequencies. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu: make compute timeouts consistent	Alex Deucher
	For kernel compute queues, align the timeout with other kernel queues (10 sec). This had previously been set higher for OpenCL when it used kernel queues, but now OpenCL uses KFD user queues which don't have a timeout limitation. This also aligns with SR-IOV which already used a shorter timeout. Additionally the longer timeout negatively impacts the user experience with kernel queues for interactive applications. Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu: Check SQ_CONFIG register support on SRIOV	Tony Yi
	On SRIOV environments, check if RLCG supports SQ_CONFIG register programming. Signed-off-by: Tony Yi <Tony.Yi@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu: track ring state associated with a fence	Alex Deucher
	We need to know the wptr and sequence number associated with a fence so that we can re-emit the unprocessed state after a ring reset. Pre-allocate storage space for the ring buffer contents and add helpers to save off and re-emit the unprocessed state so that it can be re-emitted after the queue is reset. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	thermal/drivers/rockchip: Support reading trim values from OTP	Nicolas Frattaroli
	Many of the Rockchip SoCs support storing trim values for the sensors in factory programmable memory. These values specify a fixed offset from the sensor's returned temperature to get a more accurate picture of what temperature the silicon is actually at. The way this is implemented is with various OTP cells, which may be absent. There may both be whole-TSADC trim values, as well as per-sensor trim values. In the downstream driver, whole-chip trim values override the per-sensor trim values. This rewrite of the functionality changes the semantics to something I see as slightly more useful: allow the whole-chip trim values to serve as a fallback for lacking per-sensor trim values, instead of overriding already present sensor trim values. Additionally, the chip may specify an offset (trim_base, trim_base_frac) in degrees celsius and degrees decicelsius respectively which defines what the basis is from which the trim, if any, should be calculated from. By default, this is 30 degrees Celsius, but the chip can once again specify a different value through OTP cells. The implementation of these trim calculations have been tested extensively on an RK3576, where it was confirmed to get rid of pesky 1.8 degree Celsius offsets between certain sensors. Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Link: https://lore.kernel.org/r/20250610-rk3576-tsadc-upstream-v6-5-b6e9efbf1015@collabora.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2025-07-16	thermal/drivers/rockchip: Support RK3576 SoC in the thermal driver	Ye Zhang
	The RK3576 SoC has six TS-ADC channels: TOP, BIG_CORE, LITTLE_CORE, DDR, NPU and GPU. Signed-off-by: Ye Zhang <ye.zhang@rock-chips.com> [ported to mainline, reworded commit message] Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250610-rk3576-tsadc-upstream-v6-3-b6e9efbf1015@collabora.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2025-07-16	thermal/drivers/rockchip: Rename rk_tsadcv3_tshut_mode	Nicolas Frattaroli
	The "v" version specifier here refers to the hardware IP revision. Mainline deviated from downstream here by calling the v4 revision v3 as it didn't support the v3 hardware revision at all. This creates needless confusion, so rename it to rk_tsadcv4_tshut_mode to be consistent with what the hardware wants to be called. Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com> Reviewed-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250610-rk3576-tsadc-upstream-v6-1-b6e9efbf1015@collabora.com Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2025-07-16	drm/amdgpu: clean up GC reset functions	Alex Deucher
	Make them consistent and use the reset flags. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu: clean up jpeg reset functions	Alex Deucher
	Make them consistent and use the reset flags. Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/vcn: don't enable per queue resets on SR-IOV	Alex Deucher
	Power control is only available in bare metal. SR-IOV will need a different method. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu/jpeg4: add additional ring reset error checking	Alex Deucher
	Start and stop can fail, so add checks. Fixes: 74894ffc7d0c ("drm/amdgpu: Add ring reset callback for JPEG4_0_0") Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Sathishkumar S <sathishkumar.sundararaju@amd.com>
2025-07-16	drm/amdgpu/jpeg3: add additional ring reset error checking	Alex Deucher
	Start and stop can fail, so add checks. Fixes: 03399d0bff25 ("drm/amdgpu: Add ring reset callback for JPEG3_0_0") Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Sathishkumar S <sathishkumar.sundararaju@amd.com>
2025-07-16	drm/amdgpu/jpeg2: add additional ring reset error checking	Alex Deucher
	Start and stop can fail, so add checks. Fixes: 500c04d2a708 ("drm/amdgpu: Add ring reset callback for JPEG2_0_0") Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Sathishkumar S <sathishkumar.sundararaju@amd.com>
2025-07-16	drm/amdgpu: clean up sdma reset functions	Alex Deucher
	Make them consistent and drop unneeded extra variables. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	drm/amdgpu: Fix missing unlocking in an error path in amdgpu_userq_create()	Christophe JAILLET
	If kasprintf() fails, some mutex still need to be released to avoid locking issue, as already done in all other error handling path. Fixes: c03ea34cbf88 ("drm/amdgpu: add support of debugfs for mqd information") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/all/366557fa7ca8173fd78c58336986ca56953369b9.1752087753.git.christophe.jaillet@wanadoo.fr/ Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-07-16	Bluetooth: btusb: QCA: Fix downloading wrong NVM for WCN6855 GF variant ↵	Zijun Hu
	without board ID For GF variant of WCN6855 without board ID programmed btusb_generate_qca_nvm_name() will chose wrong NVM 'qca/nvm_usb_00130201.bin' to download. Fix by choosing right NVM 'qca/nvm_usb_00130201_gf.bin'. Also simplify NVM choice logic of btusb_generate_qca_nvm_name(). Fixes: d6cba4e6d0e2 ("Bluetooth: btusb: Add support using different nvm for variant WCN6855 controller") Signed-off-by: Zijun Hu <zijun.hu@oss.qualcomm.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16	Bluetooth: hci_dev: replace 'quirks' integer by 'quirk_flags' bitmap	Christian Eggers
	The 'quirks' member already ran out of bits on some platforms some time ago. Replace the integer member by a bitmap in order to have enough bits in future. Replace raw bit operations by accessor macros. Fixes: ff26b2dd6568 ("Bluetooth: Add quirk for broken READ_VOICE_SETTING") Fixes: 127881334eaa ("Bluetooth: Add quirk for broken READ_PAGE_SCAN_TYPE") Suggested-by: Pauli Virtanen <pav@iki.fi> Tested-by: Ivan Pravdin <ipravdin.official@gmail.com> Signed-off-by: Kiran K <kiran.k@intel.com> Signed-off-by: Christian Eggers <ceggers@arri.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2025-07-16	Bluetooth: btintel: Check if controller is ISO capable on ↵	Luiz Augusto von Dentz
	btintel_classify_pkt_type Due to what seem to be a bug with variant version returned by some firmwares the code may set hdev->classify_pkt_type with btintel_classify_pkt_type when in fact the controller doesn't even support ISO channels feature but may use the handle range expected from a controllers that does causing the packets to be reclassified as ISO causing several bugs. To fix the above btintel_classify_pkt_type will attempt to check if the controller really supports ISO channels and in case it doesn't don't reclassify even if the handle range is considered to be ISO, this is considered safer than trying to fix the specific controller/firmware version as that could change over time and causing similar problems in the future. Link: https://bugzilla.kernel.org/show_bug.cgi?id=219553 Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2100565 Link: https://github.com/StarLabsLtd/firmware/issues/180 Fixes: f25b7fd36cc3 ("Bluetooth: Add vendor-specific packet classification for ISO data") Cc: stable@vger.kernel.org Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Sean Rhodes <sean@starlabs.systems>
2025-07-16	drm/xe: Unify the initialization of VRAM regions	Piotr Piórkowski
	Currently in the drivers we have defined VRAM regions per device and per tile. Initialization of these regions is done in two completely different ways. To simplify the logic of the code and make it easier to add new regions in the future, let's unify the way we initialize VRAM regions. v2: - fix doc comments in struct xe_vram_region - remove unnecessary includes (Jani) v3: - move code from xe_vram_init_regions_managers to xe_tile_init_noalloc (Matthew) - replace ioremap_wc to devm_ioremap_wc for mapping VRAM BAR (Matthew) - Replace the tile id parameter with vram region in the xe_pf_begin function. v4: - remove tile back pointer from struct xe_vram_region - add new back pointers: xe and migarte to xe_vram_region Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> # rev3 Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-6-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16	ACPI: TAD: Replace sprintf() with sysfs_emit()	Sukrut Heroorkar
	Replace sprintf() in *_show() callbacks of sysfs attributes with sysfs_emit(). While the current implementation works, sysfs_emit() helps to prevent potential buffer overflows and aligns with kernel documentation Documentation/filesystems/sysfs.rst. Tested on an x86_64 system with acpi_tad built as a module: - Inserted patched acpi_tad.ko successfully - Verified /sys/devices/platform/ACPI000E:00/time and /caps are accessible - Confirmed correct output from 'cat' with no dmesg errors Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com> Link: https://patch.msgid.link/20250716123543.495628-1-hsukrut3@gmail.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-07-16	drm/xe: Split xe_migrate allocation from initialization	Piotr Piórkowski
	Currently, xe_migrate_init handled both allocation and initialization, Lets moves allocation to xe_tile_alloc via a new xe_migrate_alloc function, and keep initialization in xe_migrate_init. This will allow the migration pointers to be passed to other structures before full initialization. Also replaces devm_kzalloc with drmm_kzalloc for better DRM-managed memory. Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-5-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16	drm/xe: Move struct xe_vram_region to a dedicated header	Piotr Piórkowski
	Let's move the xe_vram_region structure to a new header dedicated to VRAM to improve modularity and avoid unnecessary dependencies when only VRAM-related structures are needed. v2: Fix build if CONFIG_DRM_XE_DEVMEM_MIRROR is enabled v3: Fix build if CONFIG_DRM_XE_DISPLAY is enabled v4: Move helper to get tile dpagemap to xe_svm.c Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Suggested-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> # rev3 Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-4-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16	drm/xe: Use dynamic allocation for tile and device VRAM region structures	Piotr Piórkowski
	In future platforms, we will need to represent the device and tile VRAM regions in a more dynamic way, so let's abandon the static allocation of these structures and start use a dynamic allocation. v2: - Add a helpers for accessing fields of the xe_vram_region structure v3: - Add missing EXPORT_SYMBOL_IF_KUNIT for xe_vram_region_actual_physical_size Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-3-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16	ACPI: APEI: handle synchronous exceptions in task work	Shuai Xue
	The memory uncorrected error could be signaled by asynchronous interrupt (specifically, SPI in arm64 platform), e.g. when an error is detected by a background scrubber, or signaled by synchronous exception (specifically, data abort exception in arm64 platform), e.g. when a CPU tries to access a poisoned cache line. Currently, both synchronous and asynchronous errors use memory_failure_queue() to schedule memory_failure() to exectute in a kworker context. As a result, when a user-space process is accessing a poisoned data, a data abort is taken and the memory_failure() is executed in the kworker context, which: - will send wrong si_code by SIGBUS signal in early_kill mode, and - can not kill the user-space in some cases resulting a synchronous error infinite loop Issue 1: send wrong si_code in early_kill mode Since commit a70297d22132 ("ACPI: APEI: set memory failure flags as MF_ACTION_REQUIRED on synchronous events")', the flag MF_ACTION_REQUIRED could be used to determine whether a synchronous exception occurs on ARM64 platform. When a synchronous exception is detected, the kernel is expected to terminate the current process which has accessed a poisoned page. This is done by sending a SIGBUS signal with error code BUS_MCEERR_AR, indicating an action-required machine check error on read. However, when kill_proc() is called to terminate the processes who has the poisoned page mapped, it sends the incorrect SIGBUS error code BUS_MCEERR_AO because the context in which it operates is not the one where the error was triggered. To reproduce this problem: #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1 # STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 5 addr 0xffffb0d75000 page not present Test passed The si_code (code 5) from einj_mem_uc indicates that it is BUS_MCEERR_AO error and it is not factually correct. After this change: # STEP1: enable early kill mode #sysctl -w vm.memory_failure_early_kill=1 vm.memory_failure_early_kill = 1 # STEP2: inject an UCE error and consume it to trigger a synchronous error #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed The si_code (code 4) from einj_mem_uc indicates that it is a BUS_MCEERR_AR error as expected. Issue 2: a synchronous error infinite loop If a user-space process, e.g. devmem, accesses a poisoned page for which the HWPoison flag is set, kill_accessing_process() is called to send SIGBUS to current processs with error info. Since the memory_failure() is executed in the kworker context, it will just do nothing but return EFAULT. So, devmem will access the posioned page and trigger an exception again, resulting in a synchronous error infinite loop. Such exception loop may cause platform firmware to exceed some threshold and reboot when Linux could have recovered from this error. To reproduce this problem: # STEP 1: inject an UCE error, and kernel will set HWPosion flag for related page #einj_mem_uc single 0: single vaddr = 0xffffb0d75400 paddr = 4092d55b400 injecting ... triggering ... signal 7 code 4 addr 0xffffb0d75000 page not present Test passed # STEP 2: access the same page and it will trigger a synchronous error infinite loop devmem 0x4092d55b400 To fix above two issues, queue memory_failure() as a task_work so that it runs in the context of the process that is actually consuming the poisoned data. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Tested-by: Ma Wupeng <mawupeng1@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Xiaofei Tan <tanxiaofei@huawei.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://patch.msgid.link/20250714114212.31660-3-xueshuai@linux.alibaba.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-07-16	ACPI: APEI: send SIGBUS to current task if synchronous memory error not ↵	Shuai Xue
	recovered If a synchronous error is detected as a result of user-space process triggering a 2-bit uncorrected error, the CPU will take a synchronous error exception such as Synchronous External Abort (SEA) on Arm64. The kernel will queue a memory_failure() work which poisons the related page, unmaps the page, and then sends a SIGBUS to the process, so that a system wide panic can be avoided. However, no memory_failure() work will be queued when abnormal synchronous errors occur. These errors can include situations like invalid PA, unexpected severity, no memory failure config support, invalid GUID section, etc. In such a case, the user-space process will trigger SEA again. This loop can potentially exceed the platform firmware threshold or even trigger a kernel hard lockup, leading to a system reboot. Fix it by performing a force kill if no memory_failure() work is queued for synchronous errors. Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Reviewed-by: Jane Chu <jane.chu@oracle.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Link: https://patch.msgid.link/20250714114212.31660-2-xueshuai@linux.alibaba.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-07-16	drm/xe: Use devm_ioremap_wc for VRAM mapping and drop manual unmap	Piotr Piórkowski
	Let's replace the manual call to ioremap_wc function with devm_ioremap_wc function, ensuring that VRAM mappings are automatically released when the driver is detached. Since devm_ioremap_wc registers the mapping with the device's managed resources, the explicit iounmap call in vram_fini is no longer needed, so let's remove it. Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250714184818.89201-2-piotr.piorkowski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16	remoteproc: imx_rproc: skip clock enable when M-core is managed by the SCU	Hiago De Franco
	For the i.MX8X and i.MX8 family SoCs, when the Cortex-M core is powered up and started by the Cortex-A core using the bootloader (e.g., via the U-Boot bootaux command), both M-core and Linux run within the same SCFW (System Controller Firmware) partition. With that, Linux has permission to control the M-core. But once the M-core is started by the bootloader, the SCFW automatically enables its clock and sets the clock rate. If Linux later attempts to enable the same clock via clk_prepare_enable(), the SCFW returns a 'LOCKED' error, as the clock is already configured by the SCFW. This causes the probe function in imx_rproc.c to fail, leading to the M-core power domain being shut down while the core is still running. This results in a fault from the SCU (System Controller Unit) and triggers a system reset. To address this issue, ignore handling the clk for i.MX8X and i.MX8 M-core, as SCFW already takes care of enabling and configuring the clock. Suggested-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Hiago De Franco <hiago.franco@toradex.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20250629172512.14857-3-hiagofranco@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-07-16	pmdomain: core: introduce dev_pm_genpd_is_on()	Hiago De Franco
	This helper function returns the current power status of a given generic power domain. As example, remoteproc/imx_rproc.c can now use this function to check the power status of the remote core to properly set "attached" or "offline" modes. Suggested-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Hiago De Franco <hiago.franco@toradex.com> Link: https://lore.kernel.org/r/20250629172512.14857-2-hiagofranco@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-07-16	cxl: Convert to ACQUIRE() for conditional rwsem locking	Dan Williams
	Use ACQUIRE() to cleanup conditional locking paths in the CXL driver The ACQUIRE() macro and its associated ACQUIRE_ERR() helpers, like scoped_cond_guard(), arrange for scoped-based conditional locking. Unlike scoped_cond_guard(), these macros arrange for an ERR_PTR() to be retrieved representing the state of the conditional lock. The goal of this conversion is to complete the removal of all explicit unlock calls in the subsystem. I.e. the methods to acquire a lock are solely via guard(), scoped_guard() (for limited cases), or ACQUIRE(). All unlock is implicit / scope-based. In order to make sure all lock sites are converted, the existing rwsem's are consolidated and renamed in 'struct cxl_rwsem'. While that makes the patch noisier it gives a clean cut-off between old-world (explicit unlock allowed), and new world (explicit unlock deleted). Cc: David Lechner <dlechner@baylibre.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Shiju Jose <shiju.jose@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250711234932.671292-9-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-07-16	cxl/region: Consolidate cxl_decoder_kill_region() and cxl_region_detach()	Dan Williams
	Both detach_target() and cxld_unregister() want to tear down a cxl_region when an endpoint decoder is either detached or destroyed. When a region is to be destroyed cxl_region_detach() releases cxl_region_rwsem unbinds the cxl_region driver and re-acquires the rwsem. This "reverse" locking pattern is difficult to reason about, not amenable to scope-based cleanup, and the minor differences in the calling context of detach_target() and cxld_unregister() currently results in the cxl_decoder_kill_region() wrapper. Introduce cxl_decoder_detach() to wrap a core __cxl_decoder_detach() that serves both cases. I.e. either detaching a known position in a region (interruptible), or detaching an endpoint decoder if it is found to be a member of a region (uninterruptible). Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://patch.msgid.link/20250711234932.671292-8-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-07-16	cxl/region: Move ready-to-probe state check to a helper	Dan Williams
	Rather than unlocking the region rwsem in the middle of cxl_region_probe() create a helper for determining when the region is ready-to-probe. Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Link: https://patch.msgid.link/20250711234932.671292-7-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-07-16	cxl/region: Split commit_store() into __commit() and queue_reset() helpers	Dan Williams
	The complexity of dropping the lock is removed in favor of splitting commit operations to a helper, and leaving all the complexities of "decommit" for commit_store() to coordinate the different locking contexts. The CPU cache-invalidation in the decommit path is solely handled now by cxl_region_decode_reset(). Previously the CPU caches were being needlessly flushed twice in the decommit path where the first flush had no guarantee that the memory would not be immediately re-dirtied. Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Link: https://patch.msgid.link/20250711234932.671292-6-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-07-16	cxl/decoder: Drop pointless locking	Dan Williams
	cxl_dpa_rwsem coordinates changes to dpa allocation settings for a given decoder. cxl_decoder_reset() has no need for a consistent snapshot of the dpa settings since it is merely clearing out whatever was there previously. Otherwise, cxl_region_rwsem protects against 'reset' racing 'setup'. In preparation for converting to rw_semaphore_acquire semantics, drop this locking. Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250711234932.671292-5-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-07-16	cxl/decoder: Move decoder register programming to a helper	Dan Williams
	In preparation for converting to rw_semaphore_acquire semantics move the contents of an open-coded {down,up}_read(&cxl_dpa_rwsem) section to a helper function. Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250711234932.671292-4-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-07-16	cxl/mbox: Convert poison list mutex to ACQUIRE()	Dan Williams
	Towards removing all explicit unlock calls in the CXL subsystem, convert the conditional poison list mutex to use a conditional lock guard. Rename the lock to have the compiler validate that all existing call sites are converted. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Ira Weiny <ira.weiny@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250711234932.671292-3-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2025-07-16	drm: Make passing of format info to drm_helper_mode_fill_fb_struct() mandatory	Ville Syrjälä
	Now that everyone passes along the format info to drm_helper_mode_fill_fb_struct() we can make this behaviour mandatory and drop the extra lookup. Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250701090722.13645-20-ville.syrjala@linux.intel.com