<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c, branch v7.1-rc4</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>drm/amdgpu: fix syncobj leak for amdgpu_gem_va_ioctl()</title>
<updated>2026-03-24T17:32:22+00:00</updated>
<author>
<name>Prike Liang</name>
<email>Prike.Liang@amd.com</email>
</author>
<published>2026-03-19T06:47:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a0f0b6d31a53a7607ed44f7623faafc628333258'/>
<id>a0f0b6d31a53a7607ed44f7623faafc628333258</id>
<content type='text'>
It requires freeing the syncobj and chain
alloction resource.

Signed-off-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It requires freeing the syncobj and chain
alloction resource.

Signed-off-by: Prike Liang &lt;Prike.Liang@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: fix some more bug in amdgpu_gem_va_ioctl</title>
<updated>2026-03-23T18:13:11+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-02-03T16:30:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=68bd4f6b8310f309eb63b41e15088690c9cec0a9'/>
<id>68bd4f6b8310f309eb63b41e15088690c9cec0a9</id>
<content type='text'>
Some illegal combination of input flags were not checked and we need to
take the PDEs into account when returning the fence as well.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Some illegal combination of input flags were not checked and we need to
take the PDEs into account when returning the fence as well.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: fix adding eviction fence</title>
<updated>2026-03-17T21:46:26+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-01-28T15:07:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2c192b06f1e6bd9f770138a55b26c4ea04246e2f'/>
<id>2c192b06f1e6bd9f770138a55b26c4ea04246e2f</id>
<content type='text'>
We can't add the eviction fence without validating the BO.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We can't add the eviction fence without validating the BO.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: completely rework eviction fence handling v2</title>
<updated>2026-03-17T21:46:13+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-01-28T12:58:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2cd7284ba54b22869c301fba7aff9ec96a12f8c0'/>
<id>2cd7284ba54b22869c301fba7aff9ec96a12f8c0</id>
<content type='text'>
Well that was broken on multiple levels.

First of all a lot of checks were placed at incorrect locations, especially if
the resume worker should run or not.

Then a bunch of code was just mid-layering because of incorrect assignment who
should do what.

And finally comments explaining what happens instead of why.

Just re-write it from scratch, that should at least fix some of the hangs we
are seeing.

Use RCU for the eviction fence pointer in the manager, the spinlock usage was
mostly incorrect as well. Then finally remove all the nonsense checks and
actually add them in the correct locations.

v2: some typo fixes and cleanups suggested by Sunil

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Well that was broken on multiple levels.

First of all a lot of checks were placed at incorrect locations, especially if
the resume worker should run or not.

Then a bunch of code was just mid-layering because of incorrect assignment who
should do what.

And finally comments explaining what happens instead of why.

Just re-write it from scratch, that should at least fix some of the hangs we
are seeing.

Use RCU for the eviction fence pointer in the manager, the spinlock usage was
mostly incorrect as well. Then finally remove all the nonsense checks and
actually add them in the correct locations.

v2: some typo fixes and cleanups suggested by Sunil

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sunil Khatri &lt;sunil.khatri@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Convert 'alloc_obj' family to use the new default GFP_KERNEL argument</title>
<updated>2026-02-22T01:09:51+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2026-02-22T00:37:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43'/>
<id>bf4afc53b77aeaa48b5409da5c8da6bb4eff7f43</id>
<content type='text'>
This was done entirely with mindless brute force, using

    git grep -l '\&lt;k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This was done entirely with mindless brute force, using

    git grep -l '\&lt;k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Replace kmalloc with kmalloc_obj for non-scalar types</title>
<updated>2026-02-21T09:02:28+00:00</updated>
<author>
<name>Kees Cook</name>
<email>kees@kernel.org</email>
</author>
<published>2026-02-21T07:49:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=69050f8d6d075dc01af7a5f2f550a8067510366f'/>
<id>69050f8d6d075dc01af7a5f2f550a8067510366f</id>
<content type='text'>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook &lt;kees@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: lock both VM and BO in amdgpu_gem_object_open</title>
<updated>2026-02-12T20:24:59+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2026-01-20T11:57:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fd1fa48b935f22c7d99713bf33846e14a6bb6ab9'/>
<id>fd1fa48b935f22c7d99713bf33846e14a6bb6ab9</id>
<content type='text'>
The VM was not locked in the past since we initially only cleared the
linked list element and not added it to any VM state.

But this has changed quite some time ago, we just never realized this
problem because the VM state lock was masking it.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The VM was not locked in the past since we initially only cleared the
linked list element and not added it to any VM state.

But this has changed quite some time ago, we just never realized this
problem because the VM state lock was masking it.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and Timeline Management v7</title>
<updated>2026-01-10T19:21:52+00:00</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2026-01-09T12:31:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=efdc66fe12b07e7b7d28650bd8d4f7e3bb92c5d4'/>
<id>efdc66fe12b07e7b7d28650bd8d4f7e3bb92c5d4</id>
<content type='text'>
When GPU memory mappings are updated, the driver returns a fence so
userspace knows when the update is finished.

The previous refactor could pick the wrong fence or rely on checks that
are not safe for GPU mappings that stay valid even when memory is
missing. In some cases this could return an invalid fence or cause fence
reference counting problems.

Fix this by (v5,v6, per Christian):
- Starting from the VM’s existing last update fence, so a valid and
  meaningful fence is always returned even when no new work is required.
- Selecting the VM-level fence only for always-valid / PRT mappings using
  the required combined bo_va + bo guard.
- Using the per-BO page table update fence for normal MAP and REPLACE
  operations.
- For UNMAP and CLEAR, returning the fence provided by
  amdgpu_vm_clear_freed(), which may remain unchanged when nothing needs
  clearing.
- Keeping fence reference counting balanced.

v7: Drop the extra bo_va/bo NULL guard since
    amdgpu_vm_is_bo_always_valid() handles NULL BOs correctly (including
    PRT). (Christian)

This makes VM timeline fences correct and prevents crashes caused by
incorrect fence handling.

Fixes: bd8150a1b337 ("drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and Timeline Management v4")
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When GPU memory mappings are updated, the driver returns a fence so
userspace knows when the update is finished.

The previous refactor could pick the wrong fence or rely on checks that
are not safe for GPU mappings that stay valid even when memory is
missing. In some cases this could return an invalid fence or cause fence
reference counting problems.

Fix this by (v5,v6, per Christian):
- Starting from the VM’s existing last update fence, so a valid and
  meaningful fence is always returned even when no new work is required.
- Selecting the VM-level fence only for always-valid / PRT mappings using
  the required combined bo_va + bo guard.
- Using the per-BO page table update fence for normal MAP and REPLACE
  operations.
- For UNMAP and CLEAR, returning the fence provided by
  amdgpu_vm_clear_freed(), which may remain unchanged when nothing needs
  clearing.
- Keeping fence reference counting balanced.

v7: Drop the extra bo_va/bo NULL guard since
    amdgpu_vm_is_bo_always_valid() handles NULL BOs correctly (including
    PRT). (Christian)

This makes VM timeline fences correct and prevents crashes caused by
incorrect fence handling.

Fixes: bd8150a1b337 ("drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and Timeline Management v4")
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Drop MMIO_REMAP domain bit and keep it Internal</title>
<updated>2026-01-10T19:21:35+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2025-12-02T15:12:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=96e97a562d067a6d867862db79864cc66aae99c2'/>
<id>96e97a562d067a6d867862db79864cc66aae99c2</id>
<content type='text'>
"AMDGPU_GEM_DOMAIN_MMIO_REMAP" - Never activated as UAPI and it turned
out that this was to inflexible.

Allocate the MMIO_REMAP buffer object as a regular GEM BO and explicitly
move it into the fixed AMDGPU_PL_MMIO_REMAP placement at the TTM level.

This avoids relying on GEM domain bits for MMIO_REMAP, keeps the
placement purely internal, and makes the lifetime and pinning of the
global MMIO_REMAP BO explicit. The BO is pinned in TTM so it cannot be
migrated or evicted.

The corresponding free path relies on normal DRM teardown ordering,
where no further user ioctls can access the global BO once TTM teardown
begins.

v2 (Srini):
- Updated patch title.
- Drop use of AMDGPU_GEM_DOMAIN_MMIO_REMAP in amdgpu_ttm.c. The
  MMIO_REMAP domain bit is removed from UAPI, so keep the MMIO_REMAP BO
  allocation domain-less (bp.domain = 0) and rely on the TTM placement
  (AMDGPU_PL_MMIO_REMAP) for backing/pinning.
- Keep fdinfo/mem-stats visibility for MMIO_REMAP by classifying BOs
  based on bo-&gt;tbo.resource-&gt;mem_type == AMDGPU_PL_MMIO_REMAP, since the
  domain bit is removed.

v3: Squash patches #1 &amp; #3

Fixes: 056132483724 ("drm/amdgpu/uapi: Introduce AMDGPU_GEM_DOMAIN_MMIO_REMAP")
Fixes: 2a7a794eb82c ("drm/amdgpu/ttm: Allocate/Free 4K MMIO_REMAP Singleton")
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Leo Liu &lt;leo.liu@amd.com&gt;
Cc: Ruijing Dong &lt;ruijing.dong@amd.com&gt;
Cc: David (Ming Qiang) Wu &lt;David.Wu3@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
"AMDGPU_GEM_DOMAIN_MMIO_REMAP" - Never activated as UAPI and it turned
out that this was to inflexible.

Allocate the MMIO_REMAP buffer object as a regular GEM BO and explicitly
move it into the fixed AMDGPU_PL_MMIO_REMAP placement at the TTM level.

This avoids relying on GEM domain bits for MMIO_REMAP, keeps the
placement purely internal, and makes the lifetime and pinning of the
global MMIO_REMAP BO explicit. The BO is pinned in TTM so it cannot be
migrated or evicted.

The corresponding free path relies on normal DRM teardown ordering,
where no further user ioctls can access the global BO once TTM teardown
begins.

v2 (Srini):
- Updated patch title.
- Drop use of AMDGPU_GEM_DOMAIN_MMIO_REMAP in amdgpu_ttm.c. The
  MMIO_REMAP domain bit is removed from UAPI, so keep the MMIO_REMAP BO
  allocation domain-less (bp.domain = 0) and rely on the TTM placement
  (AMDGPU_PL_MMIO_REMAP) for backing/pinning.
- Keep fdinfo/mem-stats visibility for MMIO_REMAP by classifying BOs
  based on bo-&gt;tbo.resource-&gt;mem_type == AMDGPU_PL_MMIO_REMAP, since the
  domain bit is removed.

v3: Squash patches #1 &amp; #3

Fixes: 056132483724 ("drm/amdgpu/uapi: Introduce AMDGPU_GEM_DOMAIN_MMIO_REMAP")
Fixes: 2a7a794eb82c ("drm/amdgpu/ttm: Allocate/Free 4K MMIO_REMAP Singleton")
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Leo Liu &lt;leo.liu@amd.com&gt;
Cc: Ruijing Dong &lt;ruijing.dong@amd.com&gt;
Cc: David (Ming Qiang) Wu &lt;David.Wu3@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amdgpu: Refactor amdgpu_gem_va_ioctl for Handling Last Fence Update and Timeline Management v4</title>
<updated>2026-01-05T22:00:00+00:00</updated>
<author>
<name>Srinivasan Shanmugam</name>
<email>srinivasan.shanmugam@amd.com</email>
</author>
<published>2025-12-11T15:55:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bd8150a1b3370a9f7761c5814202a3fe5a79f44f'/>
<id>bd8150a1b3370a9f7761c5814202a3fe5a79f44f</id>
<content type='text'>
This commit simplifies the amdgpu_gem_va_ioctl function, key updates
include:
 - Moved the logic for managing the last update fence directly into
   amdgpu_gem_va_update_vm.
 - Introduced checks for the timeline point to enable conditional
   replacement or addition of fences.

v2: Addressed review comments from Christian.
v3: Updated comments (Christian).
v4: The previous version selected the fence too early and did not manage its
    reference correctly, which could lead to stale or freed fences being used.
    This resulted in refcount underflows and could crash when updating GPU
    timelines.
    The fence is now chosen only after the VA mapping work is completed, and its
    reference is taken safely. After exporting it to the VM timeline syncobj, the
    driver always drops its local fence reference, ensuring balanced refcounting
    and avoiding use-after-free on dma_fence.

	Crash signature:
	[  205.828135] refcount_t: underflow; use-after-free.
	[  205.832963] WARNING: CPU: 30 PID: 7274 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
	...
	[  206.074014] Call Trace:
	[  206.076488]  &lt;TASK&gt;
	[  206.078608]  amdgpu_gem_va_ioctl+0x6ea/0x740 [amdgpu]
	[  206.084040]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
	[  206.089994]  drm_ioctl_kernel+0x86/0xe0 [drm]
	[  206.094415]  drm_ioctl+0x26e/0x520 [drm]
	[  206.098424]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
	[  206.104402]  amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
	[  206.109387]  __x64_sys_ioctl+0x96/0xe0
	[  206.113156]  do_syscall_64+0x66/0x2d0
	...
	[  206.553351] BUG: unable to handle page fault for address: ffffffffc0dfde90
	...
	[  206.553378] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0
	...
	[  206.553405] Call Trace:
	[  206.553409]  &lt;IRQ&gt;
	[  206.553415]  ? __pfx_drm_sched_fence_free_rcu+0x10/0x10 [gpu_sched]
	[  206.553424]  dma_fence_signal+0x30/0x60
	[  206.553427]  drm_sched_job_done.isra.0+0x123/0x150 [gpu_sched]
	[  206.553434]  dma_fence_signal_timestamp_locked+0x6e/0xe0
	[  206.553437]  dma_fence_signal+0x30/0x60
	[  206.553441]  amdgpu_fence_process+0xd8/0x150 [amdgpu]
	[  206.553854]  sdma_v4_0_process_trap_irq+0x97/0xb0 [amdgpu]
	[  206.554353]  edac_mce_amd(E) ee1004(E)
	[  206.554270]  amdgpu_irq_dispatch+0x150/0x230 [amdgpu]
	[  206.554702]  amdgpu_ih_process+0x6a/0x180 [amdgpu]
	[  206.555101]  amdgpu_irq_handler+0x23/0x60 [amdgpu]
	[  206.555500]  __handle_irq_event_percpu+0x4a/0x1c0
	[  206.555506]  handle_irq_event+0x38/0x80
	[  206.555509]  handle_edge_irq+0x92/0x1e0
	[  206.555513]  __common_interrupt+0x3e/0xb0
	[  206.555519]  common_interrupt+0x80/0xa0
	[  206.555525]  &lt;/IRQ&gt;
	[  206.555527]  &lt;TASK&gt;
	...
	[  206.555650] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0
	...
	[  206.555667] Kernel panic - not syncing: Fatal exception in interrupt

Link: https://patchwork.freedesktop.org/patch/654669/
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This commit simplifies the amdgpu_gem_va_ioctl function, key updates
include:
 - Moved the logic for managing the last update fence directly into
   amdgpu_gem_va_update_vm.
 - Introduced checks for the timeline point to enable conditional
   replacement or addition of fences.

v2: Addressed review comments from Christian.
v3: Updated comments (Christian).
v4: The previous version selected the fence too early and did not manage its
    reference correctly, which could lead to stale or freed fences being used.
    This resulted in refcount underflows and could crash when updating GPU
    timelines.
    The fence is now chosen only after the VA mapping work is completed, and its
    reference is taken safely. After exporting it to the VM timeline syncobj, the
    driver always drops its local fence reference, ensuring balanced refcounting
    and avoiding use-after-free on dma_fence.

	Crash signature:
	[  205.828135] refcount_t: underflow; use-after-free.
	[  205.832963] WARNING: CPU: 30 PID: 7274 at lib/refcount.c:28 refcount_warn_saturate+0xbe/0x110
	...
	[  206.074014] Call Trace:
	[  206.076488]  &lt;TASK&gt;
	[  206.078608]  amdgpu_gem_va_ioctl+0x6ea/0x740 [amdgpu]
	[  206.084040]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
	[  206.089994]  drm_ioctl_kernel+0x86/0xe0 [drm]
	[  206.094415]  drm_ioctl+0x26e/0x520 [drm]
	[  206.098424]  ? __pfx_amdgpu_gem_va_ioctl+0x10/0x10 [amdgpu]
	[  206.104402]  amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
	[  206.109387]  __x64_sys_ioctl+0x96/0xe0
	[  206.113156]  do_syscall_64+0x66/0x2d0
	...
	[  206.553351] BUG: unable to handle page fault for address: ffffffffc0dfde90
	...
	[  206.553378] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0
	...
	[  206.553405] Call Trace:
	[  206.553409]  &lt;IRQ&gt;
	[  206.553415]  ? __pfx_drm_sched_fence_free_rcu+0x10/0x10 [gpu_sched]
	[  206.553424]  dma_fence_signal+0x30/0x60
	[  206.553427]  drm_sched_job_done.isra.0+0x123/0x150 [gpu_sched]
	[  206.553434]  dma_fence_signal_timestamp_locked+0x6e/0xe0
	[  206.553437]  dma_fence_signal+0x30/0x60
	[  206.553441]  amdgpu_fence_process+0xd8/0x150 [amdgpu]
	[  206.553854]  sdma_v4_0_process_trap_irq+0x97/0xb0 [amdgpu]
	[  206.554353]  edac_mce_amd(E) ee1004(E)
	[  206.554270]  amdgpu_irq_dispatch+0x150/0x230 [amdgpu]
	[  206.554702]  amdgpu_ih_process+0x6a/0x180 [amdgpu]
	[  206.555101]  amdgpu_irq_handler+0x23/0x60 [amdgpu]
	[  206.555500]  __handle_irq_event_percpu+0x4a/0x1c0
	[  206.555506]  handle_irq_event+0x38/0x80
	[  206.555509]  handle_edge_irq+0x92/0x1e0
	[  206.555513]  __common_interrupt+0x3e/0xb0
	[  206.555519]  common_interrupt+0x80/0xa0
	[  206.555525]  &lt;/IRQ&gt;
	[  206.555527]  &lt;TASK&gt;
	...
	[  206.555650] RIP: 0010:dma_fence_signal_timestamp_locked+0x39/0xe0
	...
	[  206.555667] Kernel panic - not syncing: Fatal exception in interrupt

Link: https://patchwork.freedesktop.org/patch/654669/
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Suggested-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Srinivasan Shanmugam &lt;srinivasan.shanmugam@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
