| Age | Commit message (Collapse) | Author |
|
Huge input values in amdgpu_userq_signal_ioctl can lead to a OOM and
could be exploited.
So check these input value against AMDGPU_USERQ_MAX_HANDLES
which is big enough value for genuine use cases and could
potentially avoid OOM.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the existing helper instead of open coding it
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Sunil Khatri <sunil.khatrti@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the existing helper instead of open coding it.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Sunil Khatri <sunil.khatrti@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use dedicated memory as vf ras command buffer.
V2:
Add lock to ensure serialization of sending vf ras commands.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Jinzhou Su <jinzhou.su@amd.com>
Tested-by: Jinzhou Su <jinzhou.su@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If we gate the fence destruction with a check telling us whether there are
valid pointers in there we can eliminate the need for dual, basically
identical, exit paths.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Userspace can either deliberately pass in the too small num_fences, or the
required number can legitimately grow between the two calls to the userq
wait ioctl. In both cases we do not want the emit the kernel warning
backtrace since nothing is wrong with the kernel and userspace will simply
get an errno reported back. So lets simply drop the WARN_ONs.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL")
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Drop reference to syncobj and timeline fence when aborting the ioctl due
output array being too small.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL")
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the existing helper instead of multiplying the size.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use the existing helper instead of multiplying the size.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA 7.1 has increased transfer limits.
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA 7.0 has increased transfer limits.
v2: fix harder, use shifts to make it more obvious
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA 6.x has increased transfer limits.
v2: fix harder, use shifts to make it more obvious
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA 5.2.x has increased transfer limits.
v2: fix harder, use shifts to make it more obvious
v3: align const fill with PAL limits
v4: re-align with hw limits
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA 4.4.x has increased transfer limits.
v2: fix harder, use shifts to make it more obvious
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
SDMA 4.4.x has increased transfer limits.
v2: fix harder, use shifts to make it more obvious
Cc: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Xgmi link status is unavailable in guest. This patch returns
AMDGPU_XGMI_LINK_NA for VFs.
Signed-off-by: Simon Louis <simon.louis@amd.com>
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add build number, version and date to the existing part number print.
Example:
[drm] ATOM BIOS: 113-PN000108-103, build: 00159017, ver: 022.040.003.043.000001, 2025/07/27
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Store the start wptr and ib size in the IB fence. On queue
reset, save the ring contents of all IBs.
For reemit, reemit the entire IB state for non-guilty contexts.
For guilty contexts, replace the IB submission with nops, but reemit
the rest. Split the reemit per fence and when we reemit, update the
wptr with the new values from reemit. This allows us to reemit jobs
repeatedly as the wptrs get properly updated each time.
v2: further simplify the logic
v3: reemit vm state, not just vm fence
v4: just nop the IB and possibly the VM portion of the submission
v5: simplify the vm fence check
v6: split the vm and ib fences
v7: fix commit message
v8: use wptr rather than count_dw to calculate offsets
v9: fix missing documenation update spotted by the kernel test robot
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add DM ipblock for DCN 4.2.0
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Now that DC supports external DP bridge encoders,
it has reached feature parity with the legacy non-DC display
driver on CIK APUs: Kaveri, Kabini, Mullins.
Use the DC display driver by default on SI APUs, unless it is
explicitly disabled using the amdgpu.dc=0 module parameter.
DC brings proper support for DP/HDMI audio, DP MST, VRR,
10-bit colors, some HDR features, atomic modesetting, etc.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
amdgpu_vm_flush() is used during job submission and is not expected to
fail. Convert it to return void and simplify the caller.
Initialize the COND_EXEC patch location to 0 so it is safe to call
amdgpu_ring_patch_cond_exec() when init_cond_exec is not supported.
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
dma_fence_wait(old, false) is not interruptible and cannot return an
error. Drop the unreachable error handling in amdgpu_fence_emit().
Since the function can no longer fail, convert amdgpu_fence_emit() to
return void and remove return value handling from all callers.
v2:
- Add comment explaining why dma_fence_wait(..., false)
return value is ignored (Alex)
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This reorders the IB schedule sequence to cleanly
separate the vm operation from the IB submission.
This makes the two independent so we can cleanly
associate each one with its respective fence.
v2: fixes for VCN
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Struct amdgpu_ctx contains two copies of the pointer to the context
manager. Remove one.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a helper to calculate the distance in DWs between
two wptrs.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The mes and mes_kiq parameters we originally added for
mes bring up. However, mes is required for operation
on gfx11 and newer so these parameters aren't actually
used by the driver anymore. Remove them.
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We only want to stop the work queues, not mess with the
fences, etc.
v2: add the job back to the pending list.
v3: return the proper job status so scheduler adds the
job back to the pending list
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Re-order the struct members a bit to avoid some holes:
/* size: 408, cachelines: 7, members: 15 */
/* sum members: 393, holes: 4, sum holes: 15 */
/* last cacheline: 24 bytes */
/* size: 400, cachelines: 7, members: 15 */
/* sum members: 393, holes: 1, sum holes: 7 */
/* last cacheline: 16 bytes */
While doing so we notice a duplicate but will address than in the
following patch.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Extend the GFX12 compute MQD initialization to support
Compute Unit (CU) masking for fine-grained resource allocation.
This allows compute queues to be limited to specific CUs for
performance isolation and debugging purposes.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Extend the GFX11 compute MQD initialization to support
Compute Unit (CU) masking for fine-grained resource allocation.
This allows compute queues to be limited to specific CUs for
performance isolation and debugging purposes.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add new fields to the amdgpu_mqd_prop structure to track CU (Compute Unit)
mask information, including the mask itself, count, flags, and a flag to
indicate if user-specified CU masking is active.
v2: Create a generic function amdgpu_gfx_mqd_symmetrically_map_cu_mask()
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Extend the AMDGPU user queue function interface to support MQD
updates by adding an mqd_update callback.
v2: add the input paramter struct drm_amdgpu_userq_in in mqd_update
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It avoids duplicated code and allows to output a warning.
---
v4: move check inside the existing if (enable) test
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
All sdma versions used the same logic, so add a helper and move the
common code to a single place.
---
v2: pass amdgpu_vm_pte_funcs as well
v3: drop all the *_set_vm_pte_funcs one liners
v5: rebased
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Makes copies/evictions faster when gart windows are required.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
drm_sched_job_arm and drm_sched_entity_push_job must be called
under the same lock to guarantee the order of execution.
This commit adds a check in amdgpu_ttm_job_submit and fix the
places where the lock was missing.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Taking the entity lock is required to guarantee the ordering of
execution. The next commit will add a check that the lock is
held.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It's not needed anymore.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use amdgpu_gtt_mgr_alloc_entries for each entity instead
of reserving a fixed number of pages.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Instead of reserving a number of GTT pages for VCE 1.0 this
commit now uses amdgpu_gtt_mgr_alloc_entries to allocate
the pages when initializing vce 1.0.
While at it remove the "does the VCPU BO already have a
32-bit address" check as suggested by Timur.
This decouples vce init from gtt init.
---
v7: renamed variables (Christian)
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Having a helper avoids code duplication.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This allows to have init/fini functions to hold all the init and
teardown code for amdgpu_ttm_buffer_entity.
For now only drm_sched_entity init/destroy function calls are moved
here, but as entities gain new members it will make code simpler.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If multiple entities share the same window we must make sure
that jobs using them are executed sequentially.
This commit gives separate windows to each entity, so jobs
from multiple entities could execute in parallel if needed.
(for now they all use the first sdma engine, so it makes no
difference yet).
The entity stores the gart window offsets to centralize the
"window id" to "window offset" in a single place.
default_entity doesn't get any windows reserved since there is
no use for them.
---
v3:
- renamed gart_window_lock -> lock (Christian)
- added amdgpu_ttm_buffer_entity_init (Christian)
- fixed gart_addr in svm_migrate_gart_map (Felix)
- renamed gart_window_idX -> gart_window_offs[]
- added amdgpu_compute_gart_address
v4:
- u32 -> u64
- added kerneldoc
v5:
- removed gtt_window_lock
- simplified gart window creation and use: entities using a
single window now uses window #0 instead of #1
- fix dst_addr calculation in kfd_migrate.c
---
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Same as what was done in commit c79cf5a7d903
("drm/amdgpu: remove gart_window_lock usage from gmc v12") for v12.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add dma_fence_lock_irqsafe() and dma_fence_unlock_irqrestore() wrappers
and mechanically apply them everywhere.
Just a pre-requisite cleanup for a follow up patch.
v2: add some missing i915 bits, add abstraction for lockdep assertion as
well
v3: one more suggestion by Tvrtko
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20260219160822.1529-4-christian.koenig@amd.com
|
|
Let's merge 7.0-rc1 to start the new drm-misc-next window
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Conversion performed via this Coccinelle script:
// SPDX-License-Identifier: GPL-2.0-only
// Options: --include-headers-for-types --all-includes --include-headers --keep-comments
virtual patch
@gfp depends on patch && !(file in "tools") && !(file in "samples")@
identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
kzalloc_obj,kzalloc_objs,kzalloc_flex,
kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
@@
ALLOC(...
- , GFP_KERNEL
)
$ make coccicheck MODE=patch COCCI=gfp.cocci
Build and boot tested x86_64 with Fedora 42's GCC and Clang:
Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|