<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/gpu/drm/scheduler, branch linux-5.14.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>drm/sched: Avoid data corruptions</title>
<updated>2021-05-20T03:50:28+00:00</updated>
<author>
<name>Andrey Grodzovsky</name>
<email>andrey.grodzovsky@amd.com</email>
</author>
<published>2021-05-19T14:14:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0b10ab80695d61422337ede6ff496552d8ace99d'/>
<id>0b10ab80695d61422337ede6ff496552d8ace99d</id>
<content type='text'>
Wait for all dependencies of a job  to complete before
killing it to avoid data corruptions.

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210519141407.88444-1-andrey.grodzovsky@amd.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Wait for all dependencies of a job  to complete before
killing it to avoid data corruptions.

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210519141407.88444-1-andrey.grodzovsky@amd.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/scheduler: Fix hang when sched_entity released</title>
<updated>2021-05-20T03:50:28+00:00</updated>
<author>
<name>Andrey Grodzovsky</name>
<email>andrey.grodzovsky@amd.com</email>
</author>
<published>2021-05-12T14:26:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c61cdbdbffc169dc7f1e6fe94dfffaf574fe672a'/>
<id>c61cdbdbffc169dc7f1e6fe94dfffaf574fe672a</id>
<content type='text'>
Problem: If scheduler is already stopped by the time sched_entity
is released and entity's job_queue not empty I encountred
a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle
never becomes false.

Fix: In drm_sched_fini detach all sched_entities from the
scheduler's run queues. This will satisfy drm_sched_entity_is_idle.
Also wakeup all those processes stuck in sched_entity flushing
as the scheduler main thread which wakes them up is stopped by now.

v2:
Reverse order of drm_sched_rq_remove_entity and marking
s_entity as stopped to prevent reinserion back to rq due
to race.

v3:
Drop drm_sched_rq_remove_entity, only modify entity-&gt;stopped
and check for it in drm_sched_entity_is_idle

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-14-andrey.grodzovsky@amd.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Problem: If scheduler is already stopped by the time sched_entity
is released and entity's job_queue not empty I encountred
a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle
never becomes false.

Fix: In drm_sched_fini detach all sched_entities from the
scheduler's run queues. This will satisfy drm_sched_entity_is_idle.
Also wakeup all those processes stuck in sched_entity flushing
as the scheduler main thread which wakes them up is stopped by now.

v2:
Reverse order of drm_sched_rq_remove_entity and marking
s_entity as stopped to prevent reinserion back to rq due
to race.

v3:
Drop drm_sched_rq_remove_entity, only modify entity-&gt;stopped
and check for it in drm_sched_entity_is_idle

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-14-andrey.grodzovsky@amd.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: Make timeout timer rearm conditional.</title>
<updated>2021-05-20T03:50:28+00:00</updated>
<author>
<name>Andrey Grodzovsky</name>
<email>andrey.grodzovsky@amd.com</email>
</author>
<published>2021-05-12T14:26:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=75973e5802afece679d3936328e23a1891a9badc'/>
<id>75973e5802afece679d3936328e23a1891a9badc</id>
<content type='text'>
We don't want to rearm the timer if driver hook reports
that the device is gone.

v5: Update drm_gpu_sched_stat values in code.

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-11-andrey.grodzovsky@amd.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We don't want to rearm the timer if driver hook reports
that the device is gone.

v5: Update drm_gpu_sched_stat values in code.

Signed-off-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-11-andrey.grodzovsky@amd.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/scheduler: Change scheduled fence track v2</title>
<updated>2021-05-05T07:26:36+00:00</updated>
<author>
<name>Roy Sun</name>
<email>Roy.Sun@amd.com</email>
</author>
<published>2021-04-26T06:27:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5'/>
<id>1774baa64f9395fa884ea9ed494bcb043f3b83f5</id>
<content type='text'>
Update the timestamp of scheduled fence on HW
completion of the previous fences

This allow more accurate tracking of the fence
execution in HW

v2 (chk): drop the flag check and improve the comment

Signed-off-by: David M Nieto &lt;david.nieto@amd.com&gt;
Signed-off-by: Roy Sun &lt;Roy.Sun@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210426062701.39732-1-Roy.Sun@amd.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Update the timestamp of scheduled fence on HW
completion of the previous fences

This allow more accurate tracking of the fence
execution in HW

v2 (chk): drop the flag check and improve the comment

Signed-off-by: David M Nieto &lt;david.nieto@amd.com&gt;
Signed-off-by: Roy Sun &lt;Roy.Sun@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210426062701.39732-1-Roy.Sun@amd.com
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge drm/drm-next into drm-misc-next</title>
<updated>2021-04-26T12:03:09+00:00</updated>
<author>
<name>Maxime Ripard</name>
<email>maxime@cerno.tech</email>
</author>
<published>2021-04-26T12:03:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=355b60296143a090039211c5f0e1463f84aab65a'/>
<id>355b60296143a090039211c5f0e1463f84aab65a</id>
<content type='text'>
Christian needs some patches from drm/next

Signed-off-by: Maxime Ripard &lt;maxime@cerno.tech&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Christian needs some patches from drm/next

Signed-off-by: Maxime Ripard &lt;maxime@cerno.tech&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/scheduler/sched_entity: Fix some function name disparity</title>
<updated>2021-04-22T12:05:17+00:00</updated>
<author>
<name>Lee Jones</name>
<email>lee.jones@linaro.org</email>
</author>
<published>2021-04-16T14:37:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=04be0c5b40a3dce3e4790aa507d0029a7d31c0b2'/>
<id>04be0c5b40a3dce3e4790aa507d0029a7d31c0b2</id>
<content type='text'>
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/scheduler/sched_entity.c:204: warning: expecting prototype for drm_sched_entity_kill_jobs(). Prototype was for drm_sched_entity_kill_jobs_cb() instead
 drivers/gpu/drm/scheduler/sched_entity.c:262: warning: expecting prototype for drm_sched_entity_cleanup(). Prototype was for drm_sched_entity_fini() instead
 drivers/gpu/drm/scheduler/sched_entity.c:305: warning: expecting prototype for drm_sched_entity_fini(). Prototype was for drm_sched_entity_destroy() instead

Cc: David Airlie &lt;airlied@linux.ie&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Cc: "Christian König" &lt;christian.koenig@amd.com&gt;
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones &lt;lee.jones@linaro.org&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210416143725.2769053-25-lee.jones@linaro.org
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes the following W=1 kernel build warning(s):

 drivers/gpu/drm/scheduler/sched_entity.c:204: warning: expecting prototype for drm_sched_entity_kill_jobs(). Prototype was for drm_sched_entity_kill_jobs_cb() instead
 drivers/gpu/drm/scheduler/sched_entity.c:262: warning: expecting prototype for drm_sched_entity_cleanup(). Prototype was for drm_sched_entity_fini() instead
 drivers/gpu/drm/scheduler/sched_entity.c:305: warning: expecting prototype for drm_sched_entity_fini(). Prototype was for drm_sched_entity_destroy() instead

Cc: David Airlie &lt;airlied@linux.ie&gt;
Cc: Daniel Vetter &lt;daniel@ffwll.ch&gt;
Cc: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Cc: "Christian König" &lt;christian.koenig@amd.com&gt;
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones &lt;lee.jones@linaro.org&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210416143725.2769053-25-lee.jones@linaro.org
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/amd/amdgpu implement tdr advanced mode</title>
<updated>2021-04-09T20:45:45+00:00</updated>
<author>
<name>Jack Zhang</name>
<email>Jack.Zhang1@amd.com</email>
</author>
<published>2021-03-08T04:41:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e6c6338f393b74ac0b303d567bb918b44ae7ad75'/>
<id>e6c6338f393b74ac0b303d567bb918b44ae7ad75</id>
<content type='text'>
[Why]
Previous tdr design treats the first job in job_timeout as the bad job.
But sometimes a later bad compute job can block a good gfx job and
cause an unexpected gfx job timeout because gfx and compute ring share
internal GC HW mutually.

[How]
This patch implements an advanced tdr mode.It involves an additinal
synchronous pre-resubmit step(Step0 Resubmit) before normal resubmit
step in order to find the real bad job.

1. At Step0 Resubmit stage, it synchronously submits and pends for the
first job being signaled. If it gets timeout, we identify it as guilty
and do hw reset. After that, we would do the normal resubmit step to
resubmit left jobs.

2. For whole gpu reset(vram lost), do resubmit as the old way.

v2: squash in build fix (Alex)

Signed-off-by: Jack Zhang &lt;Jack.Zhang1@amd.com&gt;
Reviewed-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[Why]
Previous tdr design treats the first job in job_timeout as the bad job.
But sometimes a later bad compute job can block a good gfx job and
cause an unexpected gfx job timeout because gfx and compute ring share
internal GC HW mutually.

[How]
This patch implements an advanced tdr mode.It involves an additinal
synchronous pre-resubmit step(Step0 Resubmit) before normal resubmit
step in order to find the real bad job.

1. At Step0 Resubmit stage, it synchronously submits and pends for the
first job being signaled. If it gets timeout, we identify it as guilty
and do hw reset. After that, we would do the normal resubmit step to
resubmit left jobs.

2. For whole gpu reset(vram lost), do resubmit as the old way.

v2: squash in build fix (Alex)

Signed-off-by: Jack Zhang &lt;Jack.Zhang1@amd.com&gt;
Reviewed-by: Andrey Grodzovsky &lt;andrey.grodzovsky@amd.com&gt;
Signed-off-by: Alex Deucher &lt;alexander.deucher@amd.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: select new rq even if there is only one v3</title>
<updated>2021-03-08T13:06:17+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2021-02-20T08:50:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ac4eb83ab255de9c31184df51fd1534ba36fd212'/>
<id>ac4eb83ab255de9c31184df51fd1534ba36fd212</id>
<content type='text'>
This is necessary when changing priorities of an entity.

v2: test the sched_list instead of num_sched.
v3: set the sched_list to NULL when there is only one entry

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sonny Jiang &lt;sonny.jiang@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210305125155.2312-1-christian.koenig@amd.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is necessary when changing priorities of an entity.

v2: test the sched_list instead of num_sched.
v3: set the sched_list to NULL when there is only one entry

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-by: Sonny Jiang &lt;sonny.jiang@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210305125155.2312-1-christian.koenig@amd.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/scheduler: provide scheduler score externally</title>
<updated>2021-02-05T09:47:11+00:00</updated>
<author>
<name>Christian König</name>
<email>christian.koenig@amd.com</email>
</author>
<published>2021-02-02T11:40:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f2f12eb9c32bc7a714276d8862efac2e7c41bcbe'/>
<id>f2f12eb9c32bc7a714276d8862efac2e7c41bcbe</id>
<content type='text'>
Allow multiple schedulers to share the load balancing score.

This is useful when one engine has different hw rings.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-and-Tested-by: Leo Liu &lt;leo.liu@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210204144405.2737-1-christian.koenig@amd.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Allow multiple schedulers to share the load balancing score.

This is useful when one engine has different hw rings.

Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Reviewed-and-Tested-by: Leo Liu &lt;leo.liu@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20210204144405.2737-1-christian.koenig@amd.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/scheduler: Job timeout handler returns status (v3)</title>
<updated>2021-01-29T10:30:22+00:00</updated>
<author>
<name>Luben Tuikov</name>
<email>luben.tuikov@amd.com</email>
</author>
<published>2021-01-20T20:09:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a6a1f036c74e3d2a3a56b3140492c7c3ecb879f3'/>
<id>a6a1f036c74e3d2a3a56b3140492c7c3ecb879f3</id>
<content type='text'>
This patch does not change current behaviour.

The driver's job timeout handler now returns
status indicating back to the DRM layer whether
the device (GPU) is no longer available, such as
after it's been unplugged, or whether all is
normal, i.e. current behaviour.

All drivers which make use of the
drm_sched_backend_ops' .timedout_job() callback
have been accordingly renamed and return the
would've-been default value of
DRM_GPU_SCHED_STAT_NOMINAL to restart the task's
timeout timer--this is the old behaviour, and is
preserved by this patch.

v2: Use enum as the status of a driver's job
    timeout callback method.

v3: Return scheduler/device information, rather
    than task information.

Cc: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Cc: Andrey Grodzovsky &lt;Andrey.Grodzovsky@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Cc: Lucas Stach &lt;l.stach@pengutronix.de&gt;
Cc: Russell King &lt;linux+etnaviv@armlinux.org.uk&gt;
Cc: Christian Gmeiner &lt;christian.gmeiner@gmail.com&gt;
Cc: Qiang Yu &lt;yuq825@gmail.com&gt;
Cc: Rob Herring &lt;robh@kernel.org&gt;
Cc: Tomeu Vizoso &lt;tomeu.vizoso@collabora.com&gt;
Cc: Steven Price &lt;steven.price@arm.com&gt;
Cc: Alyssa Rosenzweig &lt;alyssa.rosenzweig@collabora.com&gt;
Cc: Eric Anholt &lt;eric@anholt.net&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Acked-by: Alyssa Rosenzweig &lt;alyssa.rosenzweig@collabora.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Steven Price &lt;steven.price@arm.com&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/415095/
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch does not change current behaviour.

The driver's job timeout handler now returns
status indicating back to the DRM layer whether
the device (GPU) is no longer available, such as
after it's been unplugged, or whether all is
normal, i.e. current behaviour.

All drivers which make use of the
drm_sched_backend_ops' .timedout_job() callback
have been accordingly renamed and return the
would've-been default value of
DRM_GPU_SCHED_STAT_NOMINAL to restart the task's
timeout timer--this is the old behaviour, and is
preserved by this patch.

v2: Use enum as the status of a driver's job
    timeout callback method.

v3: Return scheduler/device information, rather
    than task information.

Cc: Alexander Deucher &lt;Alexander.Deucher@amd.com&gt;
Cc: Andrey Grodzovsky &lt;Andrey.Grodzovsky@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: Daniel Vetter &lt;daniel.vetter@ffwll.ch&gt;
Cc: Lucas Stach &lt;l.stach@pengutronix.de&gt;
Cc: Russell King &lt;linux+etnaviv@armlinux.org.uk&gt;
Cc: Christian Gmeiner &lt;christian.gmeiner@gmail.com&gt;
Cc: Qiang Yu &lt;yuq825@gmail.com&gt;
Cc: Rob Herring &lt;robh@kernel.org&gt;
Cc: Tomeu Vizoso &lt;tomeu.vizoso@collabora.com&gt;
Cc: Steven Price &lt;steven.price@arm.com&gt;
Cc: Alyssa Rosenzweig &lt;alyssa.rosenzweig@collabora.com&gt;
Cc: Eric Anholt &lt;eric@anholt.net&gt;
Reported-by: kernel test robot &lt;lkp@intel.com&gt;
Signed-off-by: Luben Tuikov &lt;luben.tuikov@amd.com&gt;
Acked-by: Alyssa Rosenzweig &lt;alyssa.rosenzweig@collabora.com&gt;
Acked-by: Christian König &lt;christian.koenig@amd.com&gt;
Acked-by: Steven Price &lt;steven.price@arm.com&gt;
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/415095/
</pre>
</div>
</content>
</entry>
</feed>
