<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c, branch v6.18</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>drm/amdgpu: Fix NULL pointer dereference in VRAM logic for APU devices</title>
<updated>2025-10-13T18:14:15+00:00</updated>
<author>
<name>Jesse.Zhang</name>
<email>Jesse.Zhang@amd.com</email>
</author>
<published>2025-10-13T05:46:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=883f309add55060233bf11c1ea6947140372920f'/>
<id>883f309add55060233bf11c1ea6947140372920f</id>
<content type='text'>
Previously, APU platforms (and other scenarios with uninitialized VRAM managers)
triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root
cause is not that the `struct ttm_resource_manager *man` pointer itself is NULL,
but that `man-&gt;bdev` (the backing device pointer within the manager) remains
uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully
set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to
acquire `man-&gt;bdev-&gt;lru_lock`, it dereferences the NULL `man-&gt;bdev`, leading to
a kernel OOPS.

1. **amdgpu_cs.c**: Extend the existing bandwidth control check in
   `amdgpu_cs_get_threshold_for_moves()` to include a check for
   `ttm_resource_manager_used()`. If the manager is not used (uninitialized
   `bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific
   logic that would trigger the NULL dereference.

2. **amdgpu_kms.c**: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info
   reporting to use a conditional: if the manager is used, return the real VRAM
   usage; otherwise, return 0. This avoids accessing `man-&gt;bdev` when it is
   NULL.

3. **amdgpu_virt.c**: Modify the vf2pf (virtual function to physical function)
   data write path. Use `ttm_resource_manager_used()` to check validity: if the
   manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set
   `fb_usage` to 0 (APUs have no discrete framebuffer to report).

This approach is more robust than APU-specific checks because it:
- Works for all scenarios where the VRAM manager is uninitialized (not just APUs),
- Aligns with TTM's design by using its native helper function,
- Preserves correct behavior for discrete GPUs (which have fully initialized
  `man-&gt;bdev` and pass the `ttm_resource_manager_used()` check).

v4: use ttm_resource_manager_used(&amp;adev-&gt;mman.vram_mgr.manager) instead of checking the adev-&gt;gmc.is_app_apu flag (Christian)

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Suggested-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Previously, APU platforms (and other scenarios with uninitialized VRAM managers)
triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root
cause is not that the `struct ttm_resource_manager *man` pointer itself is NULL,
but that `man-&gt;bdev` (the backing device pointer within the manager) remains
uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully
set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to
acquire `man-&gt;bdev-&gt;lru_lock`, it dereferences the NULL `man-&gt;bdev`, leading to
a kernel OOPS.

1. **amdgpu_cs.c**: Extend the existing bandwidth control check in
   `amdgpu_cs_get_threshold_for_moves()` to include a check for
   `ttm_resource_manager_used()`. If the manager is not used (uninitialized
   `bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific
   logic that would trigger the NULL dereference.

2. **amdgpu_kms.c**: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info
   reporting to use a conditional: if the manager is used, return the real VRAM
   usage; otherwise, return 0. This avoids accessing `man-&gt;bdev` when it is
   NULL.

3. **amdgpu_virt.c**: Modify the vf2pf (virtual function to physical function)
   data write path. Use `ttm_resource_manager_used()` to check validity: if the
   manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set
   `fb_usage` to 0 (APUs have no discrete framebuffer to report).

This approach is more robust than APU-specific checks because it:
- Works for all scenarios where the VRAM manager is uninitialized (not just APUs),
- Aligns with TTM's design by using its native helper function,
- Preserves correct behavior for discrete GPUs (which have fully initialized
  `man-&gt;bdev` and pass the `ttm_resource_manager_used()` check).

v4: use ttm_resource_manager_used(&amp;adev-&gt;mman.vram_mgr.manager) instead of checking the adev-&gt;gmc.is_app_apu flag (Christian)

Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Suggested-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Merge amdgpu_vm_set_pasid into amdgpu_vm_init</title>
<updated>2025-10-07T18:09:07+00:00</updated>
<author>
<name>Jesse.Zhang</name>
<email>Jesse.Zhang@amd.com</email>
</author>
<published>2025-09-25T10:02:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b809ca91a5b7eccba065460200512f83cc87987a'/>
<id>b809ca91a5b7eccba065460200512f83cc87987a</id>
<content type='text'>
As KFD no longer uses a separate PASID, the global amdgpu_vm_set_pasid()function is no longer necessary.
Merge its functionality directly intoamdgpu_vm_init() to simplify code flow and eliminate redundant locking.

v2: remove superflous check
  adjust amdgpu_vm_fin and remove amdgpu_vm_set_pasid (Chritian)

v3: drop amdgpu_vm_assert_locked (Chritian)

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4614
Fixes: 59e4405e9ee2 ("drm/amdgpu: revert to old status lock handling v3")
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As KFD no longer uses a separate PASID, the global amdgpu_vm_set_pasid()function is no longer necessary.
Merge its functionality directly intoamdgpu_vm_init() to simplify code flow and eliminate redundant locking.

v2: remove superflous check
  adjust amdgpu_vm_fin and remove amdgpu_vm_set_pasid (Chritian)

v3: drop amdgpu_vm_assert_locked (Chritian)

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4614
Fixes: 59e4405e9ee2 ("drm/amdgpu: revert to old status lock handling v3")
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: add AMDGPU_IDS_FLAGS_GANG_SUBMIT</title>
<updated>2025-09-15T21:04:42+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2025-09-05T12:45:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a9273da04fa033667a4d0ccfe46c4ba55721d7d1'/>
<id>a9273da04fa033667a4d0ccfe46c4ba55721d7d1</id>
<content type='text'>
Add a UAPI flag indicating if gang submit is supported or not.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a UAPI flag indicating if gang submit is supported or not.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Replace HQD terminology with slots naming</title>
<updated>2025-07-16T20:17:36+00:00</updated>
<author>
<name>Jesse Zhang</name>
<email>jesse.zhang@amd.com</email>
</author>
<published>2025-07-04T07:17:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9ffab039bcb0bbfade0e659552d2fb912347a871'/>
<id>9ffab039bcb0bbfade0e659552d2fb912347a871</id>
<content type='text'>
The term "HQD" is CP-specific and doesn't
accurately describe the queue resources for other IP blocks like SDMA,
VCN, or VPE. This change:

1. Renames `num_hqds` to `num_slots` in amdgpu_kms.c to better reflect
   the generic nature of the resource counting
2. Updates the UAPI struct member from `userq_num_hqds` to `userq_num_slots`
3. Maintains the same functionality while using more appropriate terminology

Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Reviewed-by: Marek Olšák &lt;marek.olsak@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The term "HQD" is CP-specific and doesn't
accurately describe the queue resources for other IP blocks like SDMA,
VCN, or VPE. This change:

1. Renames `num_hqds` to `num_slots` in amdgpu_kms.c to better reflect
   the generic nature of the resource counting
2. Updates the UAPI struct member from `userq_num_hqds` to `userq_num_slots`
3. Maintains the same functionality while using more appropriate terminology

Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Reviewed-by: Marek Olšák &lt;marek.olsak@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Add user queue instance count in HW IP info</title>
<updated>2025-07-16T20:17:35+00:00</updated>
<author>
<name>Jesse Zhang</name>
<email>jesse.zhang@amd.com</email>
</author>
<published>2025-06-25T07:29:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=78d0a27ae0e2e70b22895f4b388cc0ab88e3c6ca'/>
<id>78d0a27ae0e2e70b22895f4b388cc0ab88e3c6ca</id>
<content type='text'>
This change exposes the number of available user queue instances
for each hardware IP type (GFX, COMPUTE, SDMA) through the
drm_amdgpu_info_hw_ip interface.

Key changes:
1. Added userq_num_instance field to drm_amdgpu_info_hw_ip structure
2. Implemented counting of available HQD slots using:
   - mes.gfx_hqd_mask for GFX queues
   - mes.compute_hqd_mask for COMPUTE queues
   - mes.sdma_hqd_mask for SDMA queues
3. Only counts available instances when user queues are enabled
   (!disable_uq)

v2: using the adev-&gt;mes.gfx_hqd_mask[]/compute_hqd_mask[]/sdma_hqd_mask[] masks
  to determine the number of queue slots available for each engine type (Alex)
v3: rename userq_num_instance to userq_num_hqds (Alex)

Suggested-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change exposes the number of available user queue instances
for each hardware IP type (GFX, COMPUTE, SDMA) through the
drm_amdgpu_info_hw_ip interface.

Key changes:
1. Added userq_num_instance field to drm_amdgpu_info_hw_ip structure
2. Implemented counting of available HQD slots using:
   - mes.gfx_hqd_mask for GFX queues
   - mes.compute_hqd_mask for COMPUTE queues
   - mes.sdma_hqd_mask for SDMA queues
3. Only counts available instances when user queues are enabled
   (!disable_uq)

v2: using the adev-&gt;mes.gfx_hqd_mask[]/compute_hqd_mask[]/sdma_hqd_mask[] masks
  to determine the number of queue slots available for each engine type (Alex)
v3: rename userq_num_instance to userq_num_hqds (Alex)

Suggested-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Jesse Zhang &lt;Jesse.Zhang@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'amd-drm-next-6.17-2025-07-11' of https://gitlab.freedesktop.org/agd5f/linux into drm-next</title>
<updated>2025-07-11T21:55:40+00:00</updated>
<author>
<name>Simona Vetter</name>
<email>simona.vetter@ffwll.ch</email>
</author>
<published>2025-07-11T21:55:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7e11e01d1f1d00cb308f9351511e9597a4f70678'/>
<id>7e11e01d1f1d00cb308f9351511e9597a4f70678</id>
<content type='text'>
amd-drm-next-6.17-2025-07-11:

amdgpu:
- Clean up function signatures
- GC 10 KGQ reset fix
- SDMA reset cleanups
- Misc fixes
- LVDS fixes
- UserQ fix

amdkfd:
- Reset fix

Signed-off-by: Simona Vetter &lt;simona.vetter@ffwll.ch&gt;
From: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250711205548.21052-1-alexander.deucher@amd.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
amd-drm-next-6.17-2025-07-11:

amdgpu:
- Clean up function signatures
- GC 10 KGQ reset fix
- SDMA reset cleanups
- Misc fixes
- LVDS fixes
- UserQ fix

amdkfd:
- Reset fix

Signed-off-by: Simona Vetter &lt;simona.vetter@ffwll.ch&gt;
From: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20250711205548.21052-1-alexander.deucher@amd.com
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge remote-tracking branch 'drm/drm-next' into drm-misc-next</title>
<updated>2025-07-08T14:49:07+00:00</updated>
<author>
<name>Maarten Lankhorst</name>
<email>dev@lankhorst.se</email>
</author>
<published>2025-07-08T14:49:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e21354aea4b4420b53c44e36828607a7c94a994c'/>
<id>e21354aea4b4420b53c44e36828607a7c94a994c</id>
<content type='text'>
Pull in drm-intel-next for the updates to drm panic handling.

Signed-off-by: Maarten Lankhorst &lt;dev@lankhorst.se&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull in drm-intel-next for the updates to drm panic handling.

Signed-off-by: Maarten Lankhorst &lt;dev@lankhorst.se&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Revert "drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini"</title>
<updated>2025-07-07T17:48:50+00:00</updated>
<author>
<name>Vitaly Prosyak</name>
<email>vitaly.prosyak@amd.com</email>
</author>
<published>2025-06-24T16:05:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a73345b866ff8bbd93135af667c973a8fb4b2c40'/>
<id>a73345b866ff8bbd93135af667c973a8fb4b2c40</id>
<content type='text'>
This reverts commit 5fb90421fa0fbe0a968274912101fe917bf1c47b.

The original patch moved `amdgpu_userq_mgr_fini()` to the driver's
`postclose` callback, which is called after `drm_gem_release()` in
the DRM file cleanup sequence.If a user application crashes or aborts
without cleaning up its user queues, 'drm_gem_release()` may free
GEM objects that are still referenced by active user queues, leading
to use-after-free. By reverting, we ensure that user queues are
disabled and cleaned up before any GEM objects are released,
preventing this class of bug. However, this reintroduces a race
during PCI hot-unplug, where device removal can race with per-file
cleanup, leading to use-after-free in suspend/unplug paths.
This will be fixed in the next patch.

Fixes: 5fb90421fa0f ("drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70c")
Signed-off-by: Vitaly Prosyak &lt;vitaly.prosyak@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This reverts commit 5fb90421fa0fbe0a968274912101fe917bf1c47b.

The original patch moved `amdgpu_userq_mgr_fini()` to the driver's
`postclose` callback, which is called after `drm_gem_release()` in
the DRM file cleanup sequence.If a user application crashes or aborts
without cleaning up its user queues, 'drm_gem_release()` may free
GEM objects that are still referenced by active user queues, leading
to use-after-free. By reverting, we ensure that user queues are
disabled and cleaned up before any GEM objects are released,
preventing this class of bug. However, this reintroduces a race
during PCI hot-unplug, where device removal can race with per-file
cleanup, leading to use-after-free in suspend/unplug paths.
This will be fixed in the next patch.

Fixes: 5fb90421fa0f ("drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70c")
Signed-off-by: Vitaly Prosyak &lt;vitaly.prosyak@amd.com&gt;
Acked-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Pass adev pointer to functions</title>
<updated>2025-07-07T17:41:37+00:00</updated>
<author>
<name>Lijo Lazar</name>
<email>lijo.lazar@amd.com</email>
</author>
<published>2025-06-30T13:41:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=127ed492ad2df0aa2351a1ad32a793ae7d91161b'/>
<id>127ed492ad2df0aa2351a1ad32a793ae7d91161b</id>
<content type='text'>
Pass amdgpu device context instead of drm device context to some
amdgpu_device_* functions. DRM device context is not required in those
functions. No functional change.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pass amdgpu device context instead of drm device context to some
amdgpu_device_* functions. DRM device context is not required in those
functions. No functional change.

Signed-off-by: Lijo Lazar &lt;lijo.lazar@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: add debugfs support for VM pagetable per client</title>
<updated>2025-07-04T14:00:06+00:00</updated>
<author>
<name>Sunil Khatri</name>
<email>sunil.khatri@amd.com</email>
</author>
<published>2025-07-04T07:55:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=719b378d371899c8bbaf0ce5f42bf7db4be819f9'/>
<id>719b378d371899c8bbaf0ce5f42bf7db4be819f9</id>
<content type='text'>
Add a debugfs file under the client directory which shares
the root page table base address of the VM.

This address could be used to dump the pagetable for debug
memory issues.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://lore.kernel.org/r/20250704075548.1549849-4-sunil.khatri@amd.com
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a debugfs file under the client directory which shares
the root page table base address of the VM.

This address could be used to dump the pagetable for debug
memory issues.

Signed-off-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://lore.kernel.org/r/20250704075548.1549849-4-sunil.khatri@amd.com
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
