summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2025-01-15drm/i915: Include the scanline offset in the state dumpVille Syrjälä
When looking at raw hardware scanline numbers it's helpful to remember what the offset between the hardware values and our more human readable numbers should be. Include that in the state dump. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210211007.5976-9-ville.syrjala@linux.intel.com Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
2025-01-15drm/i915/vrr: Improve VRR state dumpVille Syrjälä
Dump the calculated vmin/vmax vtotal values in addition to the raw vmin/vmax/flipline values. Makes it much easier to see what kind of scanline values we should be expecting from the hardware. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210211007.5976-8-ville.syrjala@linux.intel.com Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
2025-01-15drm/i915: Include the vblank delay in the state dumpVille Syrjälä
While one can look at the crtc timings to determine the actual vblank dealy, it seems nicer to provide a more human readable dump of it to ease our lives. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210211007.5976-7-ville.syrjala@linux.intel.com Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
2025-01-15drm/i915: Move framestart/etc. state dump to a better spotVille Syrjälä
Try to dump all the important stuff relating to the display timings in one spot. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210211007.5976-6-ville.syrjala@linux.intel.com Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
2025-01-15drm/i915: Introduce intel_vrr_{vmin,vmax}_vtotal()Ville Syrjälä
On ICL/TGL vmin/vmax/flipline won't actually match the vtotal values (currently they do, but that is wrong and needs to be fixed). Add a few helpers that will compute the actual vtotal values for us. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210211007.5976-5-ville.syrjala@linux.intel.com Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
2025-01-15drm/i915: Fix include orderVille Syrjälä
Include the headers in the correct alphabetical order. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210211007.5976-4-ville.syrjala@linux.intel.com Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
2025-01-15drm/i915: Check vblank delay validityVille Syrjälä
Make sure we have enough vblank for the computed vblank delay. Supposedly we'd reject things anyway later if this gets violated, but it seems nicer to do some basic sanity checks early just so we can be sure the basic relationship vblank_end > vblank_start always holds. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210211007.5976-3-ville.syrjala@linux.intel.com Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
2025-01-15drm/i915: Extract intel_crtc_vblank_delay()Ville Syrjälä
Pull the vblank delay computation into a separate function. We'll need more logic here soon and we don't want to pollute intel_crtc_compute_config() with low level details. We'll use HAS_DSB() to determine if any delay might be required or not because delayed vblank only really exists for the purposes of the DSB. It also doesn't event exists on any pre-tgl platforms, which also don't have DSB. I was midly tempted to check for the enable_dsb modparam here actually, but as that can be changed dynamically via debugfs we'd need to either reconfigure it on the fly or force a modeset. Neither will happen currently, so we'll just assume DSB may be used of the platform supports it. Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210211007.5976-2-ville.syrjala@linux.intel.com Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
2025-01-15drm/tests/buddy: fix build with unused prngMatthew Auld
We no longer use the prng, which leads to the following warning: drivers/gpu/drm/tests/drm_buddy_test.c: In function ‘drm_test_buddy_alloc_clear’: drivers/gpu/drm/tests/drm_buddy_test.c:264:23: error: unused variable ‘prng’ [-Werror=unused-variable] 264 | DRM_RND_STATE(prng, random_seed); v2 (Thomas) - Add Closes links Reported-by: Jani Nikula <jani.nikula@intel.com> Closes: https://lore.kernel.org/dri-devel/875xmggvcs.fsf@intel.com/ Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Closes: https://lore.kernel.org/dri-devel/3f816555e6913fdbcb254523f65c98088d70581b.camel@intel.com/ Fixes: 8cb3a1e2b350 ("drm/buddy: Add a testcase to verify the multiroot fini") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250115124941.155990-2-matthew.auld@intel.com
2025-01-15drm/xe: Reject BO eviction if BO is bound to current VMOak Zeng
This is a follow up fix for https://patchwork.freedesktop.org/patch/msgid/20241203021929.1919730-1-oak.zeng@intel.com The overall goal is to fail vm_bind when there is memory pressure. See more details in the commit message of above patch. Abbove patch fixes the issue when user pass in a vm_id parameter during gem_create. If user doesn't pass in a vm_id during gem_create, above patch doesn't help. This patch further reject BO eviction (which could be triggered by bo validation) if BO is bound to the current VM. vm_bind could fail due to the eviction failure. The BO to VM reverse mapping structure is used to determine whether BO is bound to VM. v2: Move vm_bo definition from function scope to if(evict) clause (Thomas) Further constraint the condition by adding ctx->resv (Thomas) Add a short comment describe the change. Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110210137.3181576-1-oak.zeng@intel.com
2025-01-15drm/xe/oa: Add missing VISACTL mux registersAshutosh Dixit
Add missing VISACTL mux registers required for some OA config's (e.g. RenderPipeCtrl). Fixes: cdf02fe1a94a ("drm/xe/oa/uapi: Add/remove OA config perf ops") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250111021539.2920346-1-ashutosh.dixit@intel.com (cherry picked from commit c26f22dac3449d8a687237cdfc59a6445eb8f75a) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15drm/xe: make change ccs_mode a synchronous actionMaciej Patelczyk
If ccs_mode is being modified via /sys/class/drm/cardX/device/tileY/gtY/ccs_mode the asynchronous reset is triggered and the write returns immediately. With that some test receive false information about number of CCS engines or even fail if they proceed without delay after changing the ccs_mode. Changing the ccs_mode change from async to sync to prevent failures in tests. Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Fixes: f3bc5bb4d53d ("drm/xe: Allow userspace to configure CCS mode") Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241211111727.1481476-3-maciej.patelczyk@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> (cherry picked from commit 480fb9806e2e073532f7786166287114c696b340) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15drm/xe: introduce xe_gt_reset and xe_gt_wait_for_resetMaciej Patelczyk
Add synchronous version gt reset as there are few places where it is expected. Also add a wait helper to wait until gt reset is done. Signed-off-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Fixes: f3bc5bb4d53d ("drm/xe: Allow userspace to configure CCS mode") Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241211111727.1481476-2-maciej.patelczyk@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> (cherry picked from commit 155c77f45f63dd58a37eeb0896b0b140ab785836) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15drm/xe/guc: Adding steering info support for GuC register listsJesus Narvaez
The guc_mmio_reg interface supports steering, but it is currently not implemented. This will allow the GuC to control steering of MMIO registers after save-restore and avoid reading from fused off MCR register instances. Fixes: 9c57bc08652a ("drm/xe/lnl: Drop force_probe requirement") Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241212190100.3768068-1-jesus.narvaez@intel.com (cherry picked from commit ee5a1321df90891d59d83b7c9d5b6c5b755d059d) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-01-15drm/bridge: ite-it6263: Prevent error pointer dereference in probe()Dan Carpenter
If devm_i2c_new_dummy_device() fails then we were supposed to return an error code, but instead the function continues and will crash on the next line. Add the missing return statement. Fixes: 049723628716 ("drm/bridge: Add ITE IT6263 LVDS to HDMI converter") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Liu Ying <victor.liu@nxp.com> Signed-off-by: Liu Ying <victor.liu@nxp.com> Link: https://patchwork.freedesktop.org/patch/msgid/804a758b-f2e7-4116-b72d-29bc8905beed@stanley.mountain
2025-01-14drm/v3d: Ensure job pointer is set to NULL after job completionMaíra Canal
After a job completes, the corresponding pointer in the device must be set to NULL. Failing to do so triggers a warning when unloading the driver, as it appears the job is still active. To prevent this, assign the job pointer to NULL after completing the job, indicating the job has finished. Fixes: 14d1d1908696 ("drm/v3d: Remove the bad signaled() implementation.") Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250113154741.67520-1-mcanal@igalia.com
2025-01-14drm/xe: Remove unused "mmio_ext" codeMatt Roper
The "mmio_ext" and 'REG_EXT" code is currently unused on any existing platform. Going forward, this also isn't the design we want to use for any future platforms/features either, so we should just go ahead and remove the dead code to avoid confusion. mmio_ext was originally added in an attempt to hack around the early (mis)design of the Xe driver, which used xe_gt as the target for all register MMIO access, even those completely unrelated to the GT subunit of the hardware. With the introduction of commit 34953ee349dd ("drm/xe: Create dedicated xe_mmio structure") and its follow-up patches, that misdesign has been corrected and access to register MMIO regions specific to hardware units is now done through xe_mmio structures which encapsulate an iomap, region size, and some other metadata. Although all of the registers used by the driver today happen to fall within one specific PCI BAR region, and thus re-use a single device-wide iomap, there's no requirement that this stay true for future platforms or features. I.e., if a future platform adds a new 'foo' hardware unit that exists at a different area in the BAR, or even in a completely different BAR, then that would be handled by doing a separate iomap of that unit's register region and wrapping it in its own 'struct xe_mmio foo_regs' structure. The pointer to the new 'foo_regs' could be placed within the xe_device, xe_tile, xe_gt, etc., according to where the new hardware unit falls within the current hardware hierarchy. This effectively reverts the following commits, although parts of these commits had already vanished or changed with the earlier xe_mmio refactor work: - commit 399a13323f0d ("drm/xe: add 28-bit address support in struct xe_reg") - commit fdef72e02e20 ("drm/xe: add a flag to bypass multi-tile config from MTCFG reg") - commit 866b2b176434 ("drm/xe: add MMIO extension support flags") - commit ef29b390c734 ("drm/xe: map MMIO BAR according to the num of tiles in device desc") - commit a4e2f3a299ea ("drm/xe: refactor xe_mmio_probe_tiles to support MMIO extension") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Koby Elbaz <kelbaz@habana.ai> Acked-by: Maciej Patelczyk <maciej.patelczyk@intel.com> Reviewed-by: Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250106234312.2986065-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-01-14drm/v3d: Remove `v3d->cpu_job`Maíra Canal
CPU jobs, like Cache Clean jobs, execute synchronously once the DRM scheduler starts running them. Consequently, a global `v3d->cpu_job` variable is unnecessary, as everything is managed within the `v3d_cpu_job_run()` function. This commit removes the `v3d->cpu_job` pointer, as it is not needed. Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250113154741.67520-2-mcanal@igalia.com
2025-01-14drm/vmwgfx: Add new keep_resv BO paramIan Forbes
Adds a new BO param that keeps the reservation locked after creation. This removes the need to re-reserve the BO after creation which is a waste of cycles. This also fixes a bug in vmw_prime_import_sg_table where the imported reservation is unlocked twice. Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Fixes: b32233acceff ("drm/vmwgfx: Fix prime import/export") Reviewed-by: Zack Rusin <zack.rusin@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110185335.15301-1-ian.forbes@broadcom.com
2025-01-14drm/vmwgfx: Remove busy_placesIan Forbes
Unused since commit a78a8da51b36 ("drm/ttm: replace busy placement with flags v6") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Martin Krastev <martin.krastev@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250108201355.2521070-1-ian.forbes@broadcom.com
2025-01-14drm/vmwgfx: Unreserve BO on errorIan Forbes
Unlock BOs in reverse order. Add an acquire context so that lockdep doesn't complain. Fixes: d6667f0ddf46 ("drm/vmwgfx: Fix handling of dumb buffers") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241210195535.2074918-1-ian.forbes@broadcom.com
2025-01-14drm/display: hdmi: Do not read EDID on disconnected connectorsCristian Ciocaltea
The recently introduced hotplug event handler in the HDMI Connector framework attempts to unconditionally read the EDID data, leading to a bunch of non-harmful, yet quite annoying DDC/I2C related errors being reported. Ensure the operation is done only for connectors having the status connected or unknown. Additionally, perform an explicit reset of the connector information when dealing with a disconnected status. Fixes: ab716b74dc9d ("drm/display/hdmi: implement hotplug functions") Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250113-hdmi-conn-edid-read-fix-v2-1-d2a0438a44ab@collabora.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-14drm/tests: hdmi: Add connector disablement testLiu Ying
Atomic check should succeed when disabling a connector. Add a test case drm_test_check_disabling_connector() to make sure of this. Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Liu Ying <victor.liu@nxp.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110084821.3239518-3-victor.liu@nxp.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-14drm/connector: hdmi: Do atomic check when necessaryLiu Ying
It's ok to pass atomic check successfully if an atomic commit tries to disable the display pipeline which the connector belongs to. That is, when the crtc or the best_encoder pointers in struct drm_connector_state are NULL, drm_atomic_helper_connector_hdmi_check() should return 0. Without the check against the NULL pointers, drm_default_rgb_quant_range() called by drm_atomic_helper_connector_hdmi_check() would dereference the NULL pointer to_match in drm_match_cea_mode(). Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Call trace: drm_default_rgb_quant_range+0x0/0x4c (P) drm_bridge_connector_atomic_check+0x20/0x2c drm_atomic_helper_check_modeset+0x488/0xc78 drm_atomic_helper_check+0x20/0xa4 drm_atomic_check_only+0x4b8/0x984 drm_atomic_commit+0x48/0xc4 drm_framebuffer_remove+0x44c/0x530 drm_mode_rmfb_work_fn+0x7c/0xa0 process_one_work+0x150/0x294 worker_thread+0x2dc/0x3dc kthread+0x130/0x204 ret_from_fork+0x10/0x20 Fixes: 8ec116ff21a9 ("drm/display: bridge_connector: provide atomic_check for HDMI bridges") Fixes: 84e541b1e58e ("drm/sun4i: use drm_atomic_helper_connector_hdmi_check()") Fixes: 65548c8ff0ab ("drm/rockchip: inno_hdmi: Switch to HDMI connector") Signed-off-by: Liu Ying <victor.liu@nxp.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250110084821.3239518-2-victor.liu@nxp.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-01-14drm/amdgpu: fix fw attestation for MP0_14_0_{2/3}Gui Chengming
FW attestation was disabled on MP0_14_0_{2/3}. V2: Move check into is_fw_attestation_support func. (Frank) Remove DRM_WARN log info. (Alex) Fix format. (Christian) Signed-off-by: Gui Chengming <Jack.Gui@amd.com> Reviewed-by: Frank.Min <Frank.Min@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 62952a38d9bcf357d5ffc97615c48b12c9cd627c) Cc: stable@vger.kernel.org # 6.12.x
2025-01-14drm/amdgpu: always sync the GFX pipe on ctx switchChristian König
That is needed to enforce isolation between contexts. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit def59436fb0d3ca0f211d14873d0273d69ebb405) Cc: stable@vger.kernel.org
2025-01-14drm/amdgpu: disable gfxoff with the compute workload on gfx12Kenneth Feng
Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2affe2bbc997b3920045c2c434e480c81a5f9707) Cc: stable@vger.kernel.org # 6.12.x
2025-01-14drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX IsolationSrinivasan Shanmugam
This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` functions previously acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`, leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is called while `enforce_isolation_mutex` is held, and `amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is held, it can create a circular dependency. By ensuring consistent lock usage, this fix resolves the issue: [ 606.297333] ====================================================== [ 606.297343] WARNING: possible circular locking dependency detected [ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE [ 606.297365] ------------------------------------------------------ [ 606.297375] kworker/u96:3/3825 is trying to acquire lock: [ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610 [ 606.297413] but task is already holding lock: [ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.297725] which lock already depends on the new lock. [ 606.297738] the existing dependency chain (in reverse order) is: [ 606.297749] -> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}: [ 606.297765] __mutex_lock+0x85/0x930 [ 606.297776] mutex_lock_nested+0x1b/0x30 [ 606.297786] amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.298007] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.298225] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.298412] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.298603] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.298866] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.298880] process_one_work+0x21e/0x680 [ 606.298890] worker_thread+0x190/0x350 [ 606.298899] kthread+0xe7/0x120 [ 606.298908] ret_from_fork+0x3c/0x60 [ 606.298919] ret_from_fork_asm+0x1a/0x30 [ 606.298929] -> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}: [ 606.298947] __mutex_lock+0x85/0x930 [ 606.298956] mutex_lock_nested+0x1b/0x30 [ 606.298966] amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu] [ 606.299190] process_one_work+0x21e/0x680 [ 606.299199] worker_thread+0x190/0x350 [ 606.299208] kthread+0xe7/0x120 [ 606.299217] ret_from_fork+0x3c/0x60 [ 606.299227] ret_from_fork_asm+0x1a/0x30 [ 606.299236] -> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}: [ 606.299257] __lock_acquire+0x16f9/0x2810 [ 606.299267] lock_acquire+0xd1/0x300 [ 606.299276] __flush_work+0x250/0x610 [ 606.299286] cancel_delayed_work_sync+0x71/0x80 [ 606.299296] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.299509] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.299723] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.299909] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.300101] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.300355] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.300369] process_one_work+0x21e/0x680 [ 606.300378] worker_thread+0x190/0x350 [ 606.300387] kthread+0xe7/0x120 [ 606.300396] ret_from_fork+0x3c/0x60 [ 606.300406] ret_from_fork_asm+0x1a/0x30 [ 606.300416] other info that might help us debug this: [ 606.300428] Chain exists of: (work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex [ 606.300458] Possible unsafe locking scenario: [ 606.300468] CPU0 CPU1 [ 606.300476] ---- ---- [ 606.300484] lock(&adev->gfx.kfd_sch_mutex); [ 606.300494] lock(&adev->enforce_isolation_mutex); [ 606.300508] lock(&adev->gfx.kfd_sch_mutex); [ 606.300521] lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)); [ 606.300536] *** DEADLOCK *** [ 606.300546] 5 locks held by kworker/u96:3/3825: [ 606.300555] #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 606.300577] #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 606.300600] #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu] [ 606.300837] #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.301062] #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610 [ 606.301083] stack backtrace: [ 606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G OE 6.10.0-amd-mlkd-610-311224-lof #19 [ 606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024 [ 606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched] [ 606.301140] Call Trace: [ 606.301146] <TASK> [ 606.301154] dump_stack_lvl+0x9b/0xf0 [ 606.301166] dump_stack+0x10/0x20 [ 606.301175] print_circular_bug+0x26c/0x340 [ 606.301187] check_noncircular+0x157/0x170 [ 606.301197] ? register_lock_class+0x48/0x490 [ 606.301213] __lock_acquire+0x16f9/0x2810 [ 606.301230] lock_acquire+0xd1/0x300 [ 606.301239] ? __flush_work+0x232/0x610 [ 606.301250] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301261] ? mark_held_locks+0x54/0x90 [ 606.301274] ? __flush_work+0x232/0x610 [ 606.301284] __flush_work+0x250/0x610 [ 606.301293] ? __flush_work+0x232/0x610 [ 606.301305] ? __pfx_wq_barrier_func+0x10/0x10 [ 606.301318] ? mark_held_locks+0x54/0x90 [ 606.301331] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301345] cancel_delayed_work_sync+0x71/0x80 [ 606.301356] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.301661] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.302050] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.302069] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.302452] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.302862] ? drm_sched_entity_error+0x82/0x190 [gpu_sched] [ 606.302890] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.303366] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.303388] process_one_work+0x21e/0x680 [ 606.303409] worker_thread+0x190/0x350 [ 606.303424] ? __pfx_worker_thread+0x10/0x10 [ 606.303437] kthread+0xe7/0x120 [ 606.303449] ? __pfx_kthread+0x10/0x10 [ 606.303463] ret_from_fork+0x3c/0x60 [ 606.303476] ? __pfx_kthread+0x10/0x10 [ 606.303489] ret_from_fork_asm+0x1a/0x30 [ 606.303512] </TASK> v2: Refactor lock handling to resolve circular dependency (Alex) - Introduced a `sched_work` flag to defer the call to `amdgpu_gfx_kfd_sch_ctrl` until after releasing `enforce_isolation_mutex`. - This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside the critical section, preventing the circular dependency and deadlock. - The `sched_work` flag is set within the mutex-protected section if conditions are met, and the actual function call is made afterward. - This approach ensures consistent lock acquisition order. Fixes: afefd6f24502 ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0b6b2dd38336d5fd49214f0e4e6495e658e3ab44) Cc: stable@vger.kernel.org
2025-01-14drm/amdgpu/gfx12: Add Cleaner Shader Support for GFX12.0 GPUsSrinivasan Shanmugam
This commit enables the cleaner shader feature for GFX12.0 and GFX12.0.1 GPUs. The cleaner shader is important for clearing GPU resources such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs) between workloads. - This feature ensures that GPU resources are reset between workloads, preventing data leaks and ensuring accurate computation. By enabling the cleaner shader, this update enhances the security and reliability of GPU operations on GFX12.0 hardware. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-14drm/amdgpu: Mark debug KFD module params as unsafeKent Russell
Mark options only meant to be used for debugging as unsafe so that the kernel is tainted when they are used. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-14drm/amdgpu: fix fw attestation for MP0_14_0_{2/3}Gui Chengming
FW attestation was disabled on MP0_14_0_{2/3}. V2: Move check into is_fw_attestation_support func. (Frank) Remove DRM_WARN log info. (Alex) Fix format. (Christian) Signed-off-by: Gui Chengming <Jack.Gui@amd.com> Reviewed-by: Frank.Min <Frank.Min@amd.com> Reviewed-by: Christian König <Christian.Koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-14drm/amdgpu: always sync the GFX pipe on ctx switchChristian König
That is needed to enforce isolation between contexts. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-14drm/amdgpu: mark a bunch of module parameters unsafeChristian König
We sometimes have people trying to use debugging options in production environments. Mark options only meant to be used for debugging as unsafe so that the kernel is tainted when they are used. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-14drm/amdgpu: Use DRM scheduler API in amdgpu_xcp_release_schedTvrtko Ursulin
Lets use the existing helper instead of peeking into the structure directly. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Philipp Stanner <pstanner@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-14drm/amdgpu: disable gfxoff with the compute workload on gfx12Kenneth Feng
Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-14drm/i915/audio: rename function prefixes from i915 to intelJani Nikula
The intel prefix is more accurate for display stuff. Rename. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/5e67f6fc5a441a9512d7855d86ce7868cc992212.1736345025.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-01-14drm/i915/audio: convert LPE audio to struct intel_displayJani Nikula
Going forward, struct intel_display will be the main display device structure. Convert intel_lpe_audio.[ch] to it. Do some minor checkpatch fixes while at it. TODO: Not sure if irq_set_chip_data(irq, dev_priv); is used. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/f04dd028cd8869cdfb9ab9eb6aceed8ff8e7ddcd.1736345025.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-01-14drm/i915/audio: convert to struct intel_displayJani Nikula
Going forward, struct intel_display will be the main display device structure. Convert intel_audio.[ch] to it, as much as possible anyway. Do some minor checkpatch fixes while at it. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/4ddcc2e704fc6b1592a878c80e15fadd82c63550.1736345025.git.jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-01-14drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX IsolationSrinivasan Shanmugam
This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` functions previously acquired `enforce_isolation_mutex` and called `amdgpu_gfx_kfd_sch_ctrl`, leading to potential deadlocks. ie., If `amdgpu_gfx_kfd_sch_ctrl` is called while `enforce_isolation_mutex` is held, and `amdgpu_gfx_enforce_isolation_handler` is called while `kfd_sch_mutex` is held, it can create a circular dependency. By ensuring consistent lock usage, this fix resolves the issue: [ 606.297333] ====================================================== [ 606.297343] WARNING: possible circular locking dependency detected [ 606.297353] 6.10.0-amd-mlkd-610-311224-lof #19 Tainted: G OE [ 606.297365] ------------------------------------------------------ [ 606.297375] kworker/u96:3/3825 is trying to acquire lock: [ 606.297385] ffff9aa64e431cb8 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}, at: __flush_work+0x232/0x610 [ 606.297413] but task is already holding lock: [ 606.297423] ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.297725] which lock already depends on the new lock. [ 606.297738] the existing dependency chain (in reverse order) is: [ 606.297749] -> #2 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}: [ 606.297765] __mutex_lock+0x85/0x930 [ 606.297776] mutex_lock_nested+0x1b/0x30 [ 606.297786] amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.298007] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.298225] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.298412] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.298603] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.298866] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.298880] process_one_work+0x21e/0x680 [ 606.298890] worker_thread+0x190/0x350 [ 606.298899] kthread+0xe7/0x120 [ 606.298908] ret_from_fork+0x3c/0x60 [ 606.298919] ret_from_fork_asm+0x1a/0x30 [ 606.298929] -> #1 (&adev->enforce_isolation_mutex){+.+.}-{3:3}: [ 606.298947] __mutex_lock+0x85/0x930 [ 606.298956] mutex_lock_nested+0x1b/0x30 [ 606.298966] amdgpu_gfx_enforce_isolation_handler+0x87/0x370 [amdgpu] [ 606.299190] process_one_work+0x21e/0x680 [ 606.299199] worker_thread+0x190/0x350 [ 606.299208] kthread+0xe7/0x120 [ 606.299217] ret_from_fork+0x3c/0x60 [ 606.299227] ret_from_fork_asm+0x1a/0x30 [ 606.299236] -> #0 ((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)){+.+.}-{0:0}: [ 606.299257] __lock_acquire+0x16f9/0x2810 [ 606.299267] lock_acquire+0xd1/0x300 [ 606.299276] __flush_work+0x250/0x610 [ 606.299286] cancel_delayed_work_sync+0x71/0x80 [ 606.299296] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.299509] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.299723] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.299909] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.300101] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.300355] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.300369] process_one_work+0x21e/0x680 [ 606.300378] worker_thread+0x190/0x350 [ 606.300387] kthread+0xe7/0x120 [ 606.300396] ret_from_fork+0x3c/0x60 [ 606.300406] ret_from_fork_asm+0x1a/0x30 [ 606.300416] other info that might help us debug this: [ 606.300428] Chain exists of: (work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work) --> &adev->enforce_isolation_mutex --> &adev->gfx.kfd_sch_mutex [ 606.300458] Possible unsafe locking scenario: [ 606.300468] CPU0 CPU1 [ 606.300476] ---- ---- [ 606.300484] lock(&adev->gfx.kfd_sch_mutex); [ 606.300494] lock(&adev->enforce_isolation_mutex); [ 606.300508] lock(&adev->gfx.kfd_sch_mutex); [ 606.300521] lock((work_completion)(&(&adev->gfx.enforce_isolation[i].work)->work)); [ 606.300536] *** DEADLOCK *** [ 606.300546] 5 locks held by kworker/u96:3/3825: [ 606.300555] #0: ffff9aa5aa1f5d58 ((wq_completion)comp_1.1.0){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680 [ 606.300577] #1: ffffaa53c3c97e40 ((work_completion)(&sched->work_run_job)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680 [ 606.300600] #2: ffff9aa64e463c98 (&adev->enforce_isolation_mutex){+.+.}-{3:3}, at: amdgpu_gfx_enforce_isolation_ring_begin_use+0x1c3/0x5d0 [amdgpu] [ 606.300837] #3: ffff9aa64e432338 (&adev->gfx.kfd_sch_mutex){+.+.}-{3:3}, at: amdgpu_gfx_kfd_sch_ctrl+0x51/0x4d0 [amdgpu] [ 606.301062] #4: ffffffff8c1a5660 (rcu_read_lock){....}-{1:2}, at: __flush_work+0x70/0x610 [ 606.301083] stack backtrace: [ 606.301092] CPU: 14 PID: 3825 Comm: kworker/u96:3 Tainted: G OE 6.10.0-amd-mlkd-610-311224-lof #19 [ 606.301109] Hardware name: Gigabyte Technology Co., Ltd. X570S GAMING X/X570S GAMING X, BIOS F7 03/22/2024 [ 606.301124] Workqueue: comp_1.1.0 drm_sched_run_job_work [gpu_sched] [ 606.301140] Call Trace: [ 606.301146] <TASK> [ 606.301154] dump_stack_lvl+0x9b/0xf0 [ 606.301166] dump_stack+0x10/0x20 [ 606.301175] print_circular_bug+0x26c/0x340 [ 606.301187] check_noncircular+0x157/0x170 [ 606.301197] ? register_lock_class+0x48/0x490 [ 606.301213] __lock_acquire+0x16f9/0x2810 [ 606.301230] lock_acquire+0xd1/0x300 [ 606.301239] ? __flush_work+0x232/0x610 [ 606.301250] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301261] ? mark_held_locks+0x54/0x90 [ 606.301274] ? __flush_work+0x232/0x610 [ 606.301284] __flush_work+0x250/0x610 [ 606.301293] ? __flush_work+0x232/0x610 [ 606.301305] ? __pfx_wq_barrier_func+0x10/0x10 [ 606.301318] ? mark_held_locks+0x54/0x90 [ 606.301331] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.301345] cancel_delayed_work_sync+0x71/0x80 [ 606.301356] amdgpu_gfx_kfd_sch_ctrl+0x287/0x4d0 [amdgpu] [ 606.301661] amdgpu_gfx_enforce_isolation_ring_begin_use+0x2a4/0x5d0 [amdgpu] [ 606.302050] ? srso_alias_return_thunk+0x5/0xfbef5 [ 606.302069] amdgpu_ring_alloc+0x48/0x70 [amdgpu] [ 606.302452] amdgpu_ib_schedule+0x176/0x8a0 [amdgpu] [ 606.302862] ? drm_sched_entity_error+0x82/0x190 [gpu_sched] [ 606.302890] amdgpu_job_run+0xac/0x1e0 [amdgpu] [ 606.303366] drm_sched_run_job_work+0x24f/0x430 [gpu_sched] [ 606.303388] process_one_work+0x21e/0x680 [ 606.303409] worker_thread+0x190/0x350 [ 606.303424] ? __pfx_worker_thread+0x10/0x10 [ 606.303437] kthread+0xe7/0x120 [ 606.303449] ? __pfx_kthread+0x10/0x10 [ 606.303463] ret_from_fork+0x3c/0x60 [ 606.303476] ? __pfx_kthread+0x10/0x10 [ 606.303489] ret_from_fork_asm+0x1a/0x30 [ 606.303512] </TASK> v2: Refactor lock handling to resolve circular dependency (Alex) - Introduced a `sched_work` flag to defer the call to `amdgpu_gfx_kfd_sch_ctrl` until after releasing `enforce_isolation_mutex`. - This change ensures that `amdgpu_gfx_kfd_sch_ctrl` is called outside the critical section, preventing the circular dependency and deadlock. - The `sched_work` flag is set within the mutex-protected section if conditions are met, and the actual function call is made afterward. - This approach ensures consistent lock acquisition order. Fixes: afefd6f24502 ("drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-14drm/buddy: Add a testcase to verify the multiroot finiArunpravin Paneer Selvam
- Added a testcase to verify the multiroot force merge fini. - Added a new field in_use to track the mm freed status. v2:(Matthew) - Add kunit_fail_current_test() when WARN_ON is true. Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241226070116.309290-2-Arunpravin.PaneerSelvam@amd.com
2025-01-14drm/buddy: fix issue that force_merge cannot free all rootsLin.Cao
If buddy manager have more than one roots and each root have sub-block need to be free. When drm_buddy_fini called, the first loop of force_merge will merge and free all of the sub block of first root, which offset is 0x0 and size is biggest(more than have of the mm size). In subsequent force_merge rounds, if we use 0 as start and use remaining mm size as end, the block of other roots will be skipped in __force_merge function. It will cause the other roots can not be freed. Solution: use roots' offset as the start could fix this issue. Signed-off-by: Lin.Cao <lincao12@amd.com> Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241226070116.309290-1-Arunpravin.PaneerSelvam@amd.com
2025-01-14drm/xe: Add locks in gtidle codeVinay Belgaumkar
The update of the residency values needs to be protected by a lock to avoid multiple entrypoints, for example when multiple userspace clients read the sysfs file. Other in-kernel clients are going to be added to sample these values, making the problem worse. Protect those updates with a raw_spinlock so it can be called by future integration with perf pmu. Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250110173308.2412232-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-01-14drm/probe-helper: Call connector detect functions in single helperThomas Zimmermann
Move the logic to call the connector's detect helper into a single internal function. Reduces code dupliction. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240715093936.793552-2-tzimmermann@suse.de
2025-01-14drm/i915/fb: Relax clear color alignment to 64 bytesVille Syrjälä
Mesa changed its clear color alignment from 4k to 64 bytes without informing the kernel side about the change. This is now likely to cause framebuffer creation to fail. The only thing we do with the clear color buffer in i915 is: 1. map a single page 2. read out bytes 16-23 from said page 3. unmap the page So the only requirement we really have is that those 8 bytes are all contained within one page. Thus we can deal with the Mesa regression by reducing the alignment requiment from 4k to the same 64 bytes in the kernel. We could even go as low as 32 bytes, but IIRC 64 bytes is the hardware requirement on the 3D engine side so matching that seems sensible. Note that the Mesa alignment chages were partially undone so the regression itself was already fixed on userspace side. Cc: stable@vger.kernel.org Cc: Sagar Ghuge <sagar.ghuge@intel.com> Cc: Nanley Chery <nanley.g.chery@intel.com> Reported-by: Xi Ruoyao <xry111@xry111.site> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13057 Closes: https://lore.kernel.org/all/45a5bba8de009347262d86a4acb27169d9ae0d9f.camel@xry111.site/ Link: https://gitlab.freedesktop.org/mesa/mesa/-/commit/17f97a69c13832a6c1b0b3aad45b06f07d4b852f Link: https://gitlab.freedesktop.org/mesa/mesa/-/commit/888f63cf1baf34bc95e847a30a041dc7798edddb Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241129065014.8363-2-ville.syrjala@linux.intel.com Tested-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> (cherry picked from commit ed3a892e5e3d6b3f6eeb76db7c92a968aeb52f3d) Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
2025-01-13drm/xe/oa: Add missing VISACTL mux registersAshutosh Dixit
Add missing VISACTL mux registers required for some OA config's (e.g. RenderPipeCtrl). Fixes: cdf02fe1a94a ("drm/xe/oa/uapi: Add/remove OA config perf ops") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250111021539.2920346-1-ashutosh.dixit@intel.com
2025-01-13ACPI: video: Fix random crashes due to bad kfree()Chris Bainbridge
Commit c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP") added function dm_helpers_probe_acpi_edid(), which fetches the EDID from the BIOS by calling acpi_video_get_edid(). acpi_video_get_edid() returns a pointer to the EDID, but this pointer does not originate from kmalloc() - it is actually the internal "pointer" field from an acpi_buffer struct (which did come from kmalloc()). dm_helpers_probe_acpi_edid() then attempts to kfree() the EDID pointer, resulting in memory corruption which leads to random, intermittent crashes (e.g. 4% of boots will fail with some Oops). Fix this by allocating a new array (which can be safely freed) for the EDID data, and correctly freeing the acpi_buffer pointer. The only other caller of acpi_video_get_edid() is nouveau_acpi_edid(): remove the extraneous kmemdup() here as the EDID data is now copied in acpi_video_device_EDID(). Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> Fixes: c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP") Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Closes: https://lore.kernel.org/amd-gfx/20250110175252.GBZ4FedNKqmBRaY4T3@fat_crate.local/T/#m324a23eb4c4c32fa7e89e31f8ba96c781e496fb1 Link: https://patch.msgid.link/Z4K_oQL7eA9Owkbs@debian.local [ rjw: Changed function description comment into a kerneldoc one ] [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-13drm/amd/display: Disable replay and psr while VRR is enabledTom Chung
[Why] Replay and PSR will cause some video corruption while VRR is enabled. [How] 1. Disable the Replay and PSR while VRR is enabled. 2. Change the amdgpu_dm_crtc_vrr_active() parameter to const. Because the function will only read data from dm_crtc_state. Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d7879340e987b3056b8ae39db255b6c19c170a0d) Cc: stable@vger.kernel.org
2025-01-13drm/amd/display: Fix PSR-SU not support but still call the amdgpu_dm_psr_enableTom Chung
[Why] The enum DC_PSR_VERSION_SU_1 of psr_version is 1 and DC_PSR_VERSION_UNSUPPORTED is 0xFFFFFFFF. The original code may has chance trigger the amdgpu_dm_psr_enable() while psr version is set to DC_PSR_VERSION_UNSUPPORTED. [How] Modify the condition to psr->psr_version == DC_PSR_VERSION_SU_1 Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f765e7ce0417f8dc38479b4b495047c397c16902) Cc: stable@vger.kernel.org
2025-01-13drm/i915/fb: Check that the clear color fits within the BOVille Syrjälä
Make sure the user supplied offset[] for the clear color plane fits within the actual BO. Note that we use tile units to track the size here. All the other color/aux planes are already being checked correctly. Cc: Sagar Ghuge <sagar.ghuge@intel.com> Cc: Nanley Chery <nanley.g.chery@intel.com> Cc: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241129065014.8363-4-ville.syrjala@linux.intel.com Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
2025-01-13drm/i915/fb: Add debug spew for misaligned CC planeVille Syrjälä
We're currently failing to provide any debug output when the user passes in a misaligned offset for the clear color plane. Add some debugs prints to make debugging actually possible. Cc: Sagar Ghuge <sagar.ghuge@intel.com> Cc: Nanley Chery <nanley.g.chery@intel.com> Cc: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241129065014.8363-3-ville.syrjala@linux.intel.com Reviewed-by: José Roberto de Souza <jose.souza@intel.com>