linux.git/drivers/gpu/drm/lima, branch v6.9

drm/lima: standardize debug messages by ip name

2024-02-12T08:27:48+00:00

Some debug messages carried the ip name, or included "lima", or
included both the ip name and then the numbered ip name again.
Make the messages more consistent by always looking up and showing
the ip name first.

Signed-off-by: Erico Nunes 
Signed-off-by: Qiang Yu 
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-9-nunes.erico@gmail.com
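
The convention described above (look up the ip name, print it first) can be
sketched in plain userspace C. This is only an illustration: the enum, the
name table, and both demo_* helpers are hypothetical stand-ins and are not
taken from the driver.

```c
#include <stdio.h>

/* Hypothetical ip identifiers; the real driver has its own table. */
enum demo_msg_ip { DEMO_MSG_GP, DEMO_MSG_PP0, DEMO_MSG_PP1, DEMO_MSG_NUM };

static const char *const demo_msg_names[DEMO_MSG_NUM] = {
	[DEMO_MSG_GP]  = "gp",
	[DEMO_MSG_PP0] = "pp0",
	[DEMO_MSG_PP1] = "pp1",
};

/* Look up the printable name for an ip, with a fallback for bad ids. */
static const char *demo_msg_ip_name(enum demo_msg_ip id)
{
	return (unsigned int)id < DEMO_MSG_NUM ? demo_msg_names[id] : "unknown";
}

/*
 * Build a message that always starts with the ip name, so every
 * message has the same shape: "<ip name> <what happened>".
 * Returns the snprintf() result.
 */
static int demo_msg_format(char *buf, size_t len, enum demo_msg_ip id,
			   const char *what)
{
	return snprintf(buf, len, "%s %s", demo_msg_ip_name(id), what);
}
```

Routing every message through one formatter is one way to keep the prefix
from being repeated or omitted at individual call sites.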

drm/lima: increase default job timeout to 10s

2024-02-12T08:27:39+00:00

The previous 500ms default timeout was fairly optimistic and could be
hit by real world applications. Many distributions targeting devices
with a Mali-4xx already bumped this timeout to a higher limit.
We can be generous here with a value as high as 10s, since this should
mostly catch buggy jobs such as infinite-loop shaders, and these don't
seem to happen very often in real applications.

Signed-off-by: Erico Nunes 
Signed-off-by: Qiang Yu 
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-8-nunes.erico@gmail.com
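
A minimal sketch of how such a default might be applied, assuming a
user-supplied override where zero or a negative value means "not set".
The macro and function names are hypothetical; only the 10s value comes
from the commit message.

```c
/* Assumed default, matching the 10s named in the commit message. */
#define DEMO_DEFAULT_JOB_TIMEOUT_MS 10000

/*
 * Pick the effective job timeout in milliseconds: a positive override
 * wins, anything else falls back to the default.
 */
static int demo_job_timeout_ms(int override_ms)
{
	return override_ms > 0 ? override_ms : DEMO_DEFAULT_JOB_TIMEOUT_MS;
}
```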

drm/lima: remove guilty drm_sched context handling

2024-02-12T08:27:28+00:00

Marking the context as guilty currently only causes an application that
hits a single timeout to lose its rendering context entirely: all jobs
submitted later from the guilty context are dropped.

Lima runs on fairly underpowered hardware by modern standards, and it is
not entirely unreasonable that a rendering job may time out occasionally
due to high system load or a too demanding application stack. In this
case it would generally be preferable to report the error but try to keep
the application going.

Other similar embedded GPU drivers don't make use of the guilty context
flag. Now that there are reliability improvements to the lima timeout
recovery handling, drop the guilty contexts to let the application keep
running in this case.

Signed-off-by: Erico Nunes 
Acked-by: Christian König 
Reviewed-by: Vasily Khoruzhick 
Signed-off-by: Qiang Yu 
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-7-nunes.erico@gmail.com

drm/lima: handle spurious timeouts due to high irq latency

2024-02-12T08:27:17+00:00

There are several unexplained and unreproduced cases of rendering
timeouts with lima, for which one theory is high IRQ latency coming from
somewhere else in the system.
This kind of occurrence may cause applications to trigger unnecessary
resets of the GPU, or may even cause applications to hang if they hit an
issue in the recovery path.
Panfrost already does some special handling to account for such
"spurious timeouts"; it makes sense to have this in lima too, to reduce
the chance that they hit users.

Signed-off-by: Erico Nunes 
Signed-off-by: Qiang Yu 
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-6-nunes.erico@gmail.com
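
The idea of a "spurious timeout" check can be sketched as a small
userspace simulation: before resetting, the timeout handler checks
whether the job has already completed, then waits for any in-flight
interrupt handler and checks again. All names below are hypothetical,
and demo_sync_irq() only imitates what waiting for a pending interrupt
handler would achieve; this is not the driver's code.

```c
#include <stdbool.h>

struct demo_job {
	bool fence_signaled; /* set by the completion interrupt handler */
	bool irq_pending;    /* completion irq raised but not yet serviced */
};

enum demo_timeout_action { DEMO_NO_RESET, DEMO_DO_HARD_RESET };

/* Stand-in for waiting until a pending interrupt handler has run. */
static void demo_sync_irq(struct demo_job *job)
{
	if (job->irq_pending) {
		job->irq_pending = false;
		job->fence_signaled = true;
	}
}

/* Decide whether a timeout is real or only caused by late interrupts. */
static enum demo_timeout_action demo_handle_timeout(struct demo_job *job)
{
	/* The job finished while the timeout was being delivered. */
	if (job->fence_signaled)
		return DEMO_NO_RESET;

	/* Let a delayed completion interrupt run, then look again. */
	demo_sync_irq(job);
	if (job->fence_signaled)
		return DEMO_NO_RESET;

	/* Still not done: treat it as a genuine timeout. */
	return DEMO_DO_HARD_RESET;
}
```

Only the last branch leads to a reset, so a job that merely had its
completion interrupt delayed is left alone.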

drm/lima: set gp bus_stop bit before hard reset

2024-02-12T08:27:07+00:00

This is required for reliable hard resets. Otherwise, doing a hard reset
while a task is still running (such as a task which is being stopped by
the drm_sched timeout handler) may result in random mmu write timeouts
or lockups which cause the entire gpu to hang.

Signed-off-by: Erico Nunes 
Signed-off-by: Qiang Yu 
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-5-nunes.erico@gmail.com

drm/lima: set pp bus_stop bit before hard reset

2024-02-12T08:26:57+00:00

This is required for reliable hard resets. Otherwise, doing a hard reset
while a task is still running (such as a task which is being stopped by
the drm_sched timeout handler) may result in random mmu write timeouts
or lockups which cause the entire gpu to hang.

Signed-off-by: Erico Nunes 
Reviewed-by: Vasily Khoruzhick 
Signed-off-by: Qiang Yu 
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-4-nunes.erico@gmail.com
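
The ordering described in this commit and in the gp one above (stop the
bus, wait until it has stopped, only then force the reset) can be
sketched against a simulated core. The register bits, the struct, and
the helpers are hypothetical; returning an error without resetting when
the bus never stops is a choice made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CTRL_STOP_BUS      (1u << 0)
#define DEMO_CTRL_FORCE_RESET   (1u << 1)
#define DEMO_STATUS_BUS_STOPPED (1u << 0)

/* Simulated core: not real hardware, just enough state to show ordering. */
struct demo_core {
	uint32_t status;
	int polls_until_stopped;     /* simulated latency; negative = never */
	bool stop_requested;
	bool reset_issued;
	bool reset_with_bus_running; /* the hazard the commits avoid */
};

static void demo_core_write_ctrl(struct demo_core *c, uint32_t cmd)
{
	if (cmd & DEMO_CTRL_STOP_BUS)
		c->stop_requested = true;
	if (cmd & DEMO_CTRL_FORCE_RESET) {
		c->reset_issued = true;
		c->reset_with_bus_running =
			!(c->status & DEMO_STATUS_BUS_STOPPED);
	}
}

static uint32_t demo_core_read_status(struct demo_core *c)
{
	if (c->stop_requested && c->polls_until_stopped >= 0) {
		if (c->polls_until_stopped == 0)
			c->status |= DEMO_STATUS_BUS_STOPPED;
		else
			c->polls_until_stopped--;
	}
	return c->status;
}

/* Returns 0 on success, -1 if the bus did not stop within max_polls. */
static int demo_core_hard_reset(struct demo_core *c, int max_polls)
{
	int i;

	/* 1. Ask the core to stop issuing bus transactions. */
	demo_core_write_ctrl(c, DEMO_CTRL_STOP_BUS);

	/* 2. Poll, with a bound, until the bus reports stopped. */
	for (i = 0; i < max_polls; i++)
		if (demo_core_read_status(c) & DEMO_STATUS_BUS_STOPPED)
			break;
	if (i == max_polls)
		return -1;

	/* 3. Only now force the reset. */
	demo_core_write_ctrl(c, DEMO_CTRL_FORCE_RESET);
	return 0;
}
```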

drm/lima: reset async_reset on gp hard reset

2024-02-12T08:26:47+00:00

Lima gp jobs use an async reset to avoid having to wait for the soft
reset right after a job. The soft reset is done at the end of a job and
a reset_complete flag is expected to be set at the next job.
However, if any application runs into a job timeout, a hard reset is
issued to the hardware. This hard reset clears the reset_complete flag,
which causes an error message to show up before the next job.
This is probably harmless for the execution but can be very confusing to
debug, as it blames a reset timeout on the next application to submit a
job.
Reset the async_reset flag when doing the hard reset so that we don't
get that message.

Signed-off-by: Erico Nunes 
Signed-off-by: Qiang Yu 
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-3-nunes.erico@gmail.com

drm/lima: reset async_reset on pp hard reset

2024-02-12T08:26:31+00:00

Lima pp jobs use an async reset to avoid having to wait for the soft
reset right after a job. The soft reset is done at the end of a job and
a reset_complete flag is expected to be set at the next job.
However, if any application runs into a job timeout, a hard reset is
issued to the hardware. This hard reset clears the reset_complete flag,
which causes an error message to show up before the next job.
This is probably harmless for the execution but can be very confusing to
debug, as it blames a reset timeout on the next application to submit a
job.
Reset the async_reset flag when doing the hard reset so that we don't
get that message.

Signed-off-by: Erico Nunes 
Reviewed-by: Vasily Khoruzhick 
Signed-off-by: Qiang Yu 
Link: https://patchwork.freedesktop.org/patch/msgid/20240124025947.2110659-2-nunes.erico@gmail.com
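
The flag interaction described in this commit and in the gp one above
can be modelled with two booleans. This is a simplified sketch: the
struct and function names are hypothetical, the simulated soft reset
completes immediately, and the clear_async parameter exists only to
contrast the behaviour before and after the fix.

```c
#include <stdbool.h>

struct demo_unit {
	bool async_reset;       /* driver: a soft reset is still to be awaited */
	bool hw_reset_complete; /* hardware: the soft reset has finished */
};

/* End of a job: start a soft reset and defer waiting for it. */
static void demo_unit_soft_reset_async(struct demo_unit *u)
{
	u->async_reset = true;
	u->hw_reset_complete = true; /* simulated: completes in the background */
}

/* Timeout recovery: the hard reset wipes the hardware completion flag. */
static void demo_unit_hard_reset(struct demo_unit *u, bool clear_async)
{
	u->hw_reset_complete = false;
	if (clear_async)
		u->async_reset = false; /* the fix: nothing left to await */
}

/*
 * Start of the next job: returns 0 if there is nothing to wait for or
 * the reset completed, -1 where a reset timeout would be reported.
 */
static int demo_unit_wait_async_reset(struct demo_unit *u)
{
	if (!u->async_reset)
		return 0;
	u->async_reset = false;
	return u->hw_reset_complete ? 0 : -1;
}
```

With clear_async set to false the next job sees a stale async_reset and
reports a timeout it did not cause; with it set to true the wait is
skipped.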

Merge drm/drm-next into drm-misc-next

2024-01-29T13:20:23+00:00

Kickstart 6.9 development cycle.

Signed-off-by: Maxime Ripard

drm/lima: fix a memleak in lima_heap_alloc

2024-01-19T02:12:01+00:00

When lima_vm_map_bo fails, the resources need to be deallocated, or
there will be memleaks.

Fixes: 6aebc51d7aef ("drm/lima: support heap buffer creation")
Signed-off-by: Zhipeng Lu 
Signed-off-by: Qiang Yu 
Link: https://patchwork.freedesktop.org/patch/msgid/20240117071328.3811480-1-alexious@zju.edu.cn
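
The general shape of such a fix is the usual goto-based unwind: whatever
was allocated before the failing step is released on the error path.
The sketch below is a userspace analogue with hypothetical names
(demo_heap_grow, demo_map, the allocation counter); it does not
reproduce the resources handled in lima_heap_alloc.

```c
#include <stdlib.h>

/* Counts live allocations so a leak is observable in this sketch. */
static int demo_live_allocs;

static void *demo_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		demo_live_allocs++;
	return p;
}

static void demo_free(void *p)
{
	if (p) {
		free(p);
		demo_live_allocs--;
	}
}

struct demo_bo {
	void *table; /* stands in for the buffer's mapping table */
};

/* Stand-in for the mapping step that may fail. */
static int demo_map(int fail)
{
	return fail ? -1 : 0;
}

static int demo_heap_grow(struct demo_bo *bo, int map_fails)
{
	void *table;
	int ret;

	table = demo_alloc(64);
	if (!table)
		return -1;

	ret = demo_map(map_fails);
	if (ret)
		goto err_free_table;

	/* Success: replace the old table with the new one. */
	demo_free(bo->table);
	bo->table = table;
	return 0;

err_free_table:
	/* Without this the table would leak when mapping fails. */
	demo_free(table);
	return ret;
}
```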