linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c, branch v7.1-rc2

drm/amdgpu: Reject impossible entities early

2026-03-04T16:50:56+00:00

Currently there are two different behaviour modes when userspace tries to
operate on not present HW IP blocks. On a machine without UVD, VCE and VPE
blocks, this can be observed for example like this:

$ sudo ./amd_fuzzing --r cs-wait-fuzzing
...
amd_cs_wait_fuzzing DRM_IOCTL_AMDGPU_CTX r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_GFX r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_COMPUTE r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_DMA r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD r -1
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCE r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_UVD_ENC r -1
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_DEC r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_ENC r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VCN_JPEG r 0
amd_cs_wait_fuzzing AMDGPU_WAIT_CS AMD_IP_VPE r 0

We can see that UVD returns an errno (-EINVAL) from the CS_WAIT ioctl,
while VCE and VPE return unexpected successes.

The difference stems from the fact the UVD is a load balancing engine
which retains the context, so with a workaround implemented in
amdgpu_ctx_init_entity(), but which does not account for the fact hardware
block may not be present.

This causes a single NULL scheduler to be passed to
drm_sched_entity_init(), which immediately rejects this with -EINVAL.

The not present VCE and VPE cases on the other hand pass zero schedulers
to drm_sched_entity_init(), which is explicitly allowed and results in
unusable entities.

As the UVD case however shows, call paths can handle the errors, so we can
consolidate this into a single path which will always return -EINVAL if
the HW IP block is not present.

We do this by rejecting it early and not calling drm_sched_entity_init()
when there is no backing hardware.

This also removes the need for the drm_sched_entity_init() to handle the
zero schedulers and NULL scheduler cases, which means that we can follow
up by removing the special casing from the DRM scheduler.

Signed-off-by: Tvrtko Ursulin 
References: f34e8bb7d6c6 ("drm/sched: fix null-ptr-deref in init entity")
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Remove duplicate struct member

2026-02-23T19:16:31+00:00

Struct amdgpu_ctx contains two copies of the pointer to the context
manager. Remove one.

Signed-off-by: Tvrtko Ursulin 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher

Convert 'alloc_flex' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.

As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.

Signed-off-by: Linus Torvalds

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

2026-02-22T01:09:51+00:00

This was done entirely with mindless brute force, using

    git grep -l '\

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

2026-02-21T09:02:28+00:00

This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook

drm/amd: Convert DRM_() to drm_()

2026-01-05T21:59:55+00:00

The drm_*() macros include the device which is helpful for debugging
issues in multi-GPU systems.

Signed-off-by: Mario Limonciello (AMD) 
Reviewed-by: Aurabindo Pillai 
Signed-off-by: Alex Deucher

drm/amdgpu: jump to the correct label on failure

2025-11-12T02:54:14+00:00

drm_sched_entity_init wasn't called yet, so the only thing to
do is to release allocated memory.
This doesn't fix any bug since entity is zero allocated and
drm_sched_entity_fini does nothing in this case.

Signed-off-by: Pierre-Eric Pelloux-Prayer 
Reviewed-by: Tvrtko Ursulin 
Acked-by: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Fix memory leak in amdgpu_ctx_mgr_entity_fini

2025-06-30T17:57:31+00:00

patch dd64956685fa ("drm/amdgpu: Remove duplicated "context still
alive" check") removed ctx put, which will cause amdgpu_ctx_fini()
cannot be called and then cause some finished fence that added by
amdgpu_ctx_add_fence() cannot be released and cause memleak.

Fixes: dd64956685fa ("drm/amdgpu: Remove duplicated "context still alive" check")
Signed-off-by: Lin.Cao 
Reviewed-by: Tvrtko Ursulin 
Acked-by: Christian König 
Signed-off-by: Alex Deucher 
(cherry picked from commit 8cf66089e28108dedd47e6156a48489303cf525c)
Cc: stable@vger.kernel.org

drm/amdgpu: Remove duplicated "context still alive" check

2025-05-22T16:02:25+00:00

When amdgpu_ctx_mgr_fini() calls amdgpu_ctx_mgr_entity_fini() it contains
the exact same "context still alive" check as it will do next. Remove the
duplicated copy.

Reviewed-by: Christian König 
Signed-off-by: Tvrtko Ursulin 
Cc: Alex Deucher 
Cc: Christian König 
Signed-off-by: Alex Deucher

drm/amdgpu: Make amdgpu_ctx_mgr_entity_fini static

2025-05-22T16:02:21+00:00

Function amdgpu_ctx_mgr_entity_fini() only has a single local caller so
lets make it local.

Reviewed-by: Christian König 
Signed-off-by: Tvrtko Ursulin 
Cc: Alex Deucher 
Cc: Christian König 
Signed-off-by: Alex Deucher