linux.git/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c, branch v5.8

drm/amdkfd: Use a systematic method to calculate queue mask bit

2020-05-01T19:19:08+00:00

The queue mask used for set_resources always assumes that the number
of queues per pipe is 8, so KFD needs to match that by using the
function amdgpu_queue_mask_bit_to_set_resource_bit().

Signed-off-by: Yong Zhao 
Signed-off-by: Alex Deucher
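
To illustrate the remapping this commit describes, here is a minimal
standalone sketch. It is not the kernel's implementation of
amdgpu_queue_mask_bit_to_set_resource_bit(); the struct queue_layout type,
both function names, and the choice to stride MECs by the layout's own
pipes_per_mec are assumptions made for illustration. Only the fixed
"8 queues per pipe" stride comes from the commit text.

```c
#include <stdint.h>

/* Fixed per-pipe stride assumed by the set_resources mask (from the commit text). */
#define SET_RESOURCE_QUEUES_PER_PIPE 8

/* Hypothetical description of the driver's own compute queue layout. */
struct queue_layout {
	unsigned int pipes_per_mec;
	unsigned int queues_per_pipe;
};

/*
 * Translate a bit index in the driver's linear queue bitmap into the bit
 * index used by a mask that always reserves 8 queue slots per pipe.
 */
static unsigned int to_set_resource_bit(const struct queue_layout *l,
					unsigned int queue_bit)
{
	unsigned int queue = queue_bit % l->queues_per_pipe;
	unsigned int pipe = (queue_bit / l->queues_per_pipe) % l->pipes_per_mec;
	unsigned int mec = queue_bit / l->queues_per_pipe / l->pipes_per_mec;

	return (mec * l->pipes_per_mec + pipe) * SET_RESOURCE_QUEUES_PER_PIPE
		+ queue;
}

/* Build a 64-bit mask in the 8-per-pipe layout from a driver-side bitmap. */
static uint64_t build_set_resource_mask(const struct queue_layout *l,
					uint64_t driver_bitmap,
					unsigned int total_queues)
{
	uint64_t mask = 0;
	unsigned int i;

	for (i = 0; i < total_queues && i < 64; i++) {
		unsigned int bit;

		if (!(driver_bitmap & (1ULL << i)))
			continue;

		bit = to_set_resource_bit(l, i);
		if (bit >= 64)
			continue;	/* does not fit in a 64-bit mask */

		mask |= 1ULL << bit;
	}

	return mask;
}
```

With a hypothetical layout of 4 pipes per MEC and 4 queues per pipe, driver
bit 5 (pipe 1, queue 1) lands on bit 9 of the mask. With 8 queues per pipe
the mapping is the identity, which is why a hard-coded shift only works on
hardware that happens to use that layout.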

drm/amdkfd: Enable over-subscription with >1 GWS queue

2020-04-28T20:20:30+00:00

The current GWS usage model only allows a single GWS-enabled
process to be active on the GPU at once. This ensures that a
barrier-using kernel gets a known amount of GPU hardware, to
prevent deadlock due to inability to go beyond the GWS barrier.

The HWS watches how many GWS entries are assigned to each process,
and goes into over-subscription mode when two processes need more
than the 64 that are available. The current KFD method for working
with this is to allocate all 64 GWS entries to each GWS-capable
process.

When more than one GWS-enabled process is in the runlist, we must
make sure the runlist is in over-subscription mode, so that the
HWS gets a chained RUN_LIST packet and continues scheduling
kernels.

Signed-off-by: Joseph Greathouse 
Reviewed-by: Felix Kuehling 
Signed-off-by: Felix Kuehling 
Signed-off-by: Alex Deucher
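
The rule described above can be sketched as a small predicate. This is an
illustration under assumptions, not the driver's code: the two struct types,
their field names, and the process and compute-queue limits are hypothetical.
The only condition taken from the commit text is that more than one
GWS-enabled queue in the runlist must force over-subscription mode.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what is about to be placed in the runlist. */
struct runlist_counts {
	unsigned int process_count;
	unsigned int compute_queue_count;
	unsigned int gws_queue_count;
};

/* Hypothetical limits; real values depend on the device and firmware. */
struct runlist_limits {
	unsigned int max_processes;
	unsigned int max_compute_queues;
};

/*
 * Return true when the runlist has to be built in over-subscription mode,
 * that is, when it needs a chained RUN_LIST packet.
 */
static bool runlist_is_over_subscribed(const struct runlist_counts *c,
				       const struct runlist_limits *lim)
{
	return c->process_count > lim->max_processes ||
	       c->compute_queue_count > lim->max_compute_queues ||
	       c->gws_queue_count > 1;	/* condition added by this commit */
}
```

Two processes with one GWS queue each are well inside the assumed process and
queue limits, yet the predicate still reports over-subscription, because each
of them is treated as holding all 64 GWS entries.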

drm/amdkfd: change SDMA MQD memory type

2020-02-28T21:59:20+00:00

The SDMA MQD memory type is NC, which lets MQD data be overwritten
accidentally by an old stale cache line. Changing it to UC, the
default for GART, fixes the issue.

The mqd_gfx9 parameter is meant for control stacks that are
allocated together with user mode queue MQDs. Setting
mqd_gfx9 to true maps the control stack pages as NC.
Here it was accidentally applied to SDMA MQDs,
which are allocated together with the HIQ MQD. Setting
the mqd_gfx9 to false avoids that.

Signed-off-by: Eric Huang 
Acked-by: Yong Zhao 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdkfd: Delete unnecessary unmap queue package submissions

2020-02-26T19:20:33+00:00

The previous way of using the SDMA queue count to infer whether we should
unmap SDMA engines has bugs. It did not cause issues only because the MEC
firmware unmaps all queues (CP + SDMA) when an unmap package for the compute
engine is received. Because of that, only one unmap queue package is needed,
instead of one unmap queue package for CP and one for each SDMA engine,
which results in much simpler driver code.

Signed-off-by: Yong Zhao 
Acked-by: Alex Deucher 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdkfd: Delete excessive printings

2020-02-26T19:20:26+00:00

Those printings are duplicated or useless.

Signed-off-by: Yong Zhao 
Acked-by: Alex Deucher 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdkfd: Count active CP queues directly

2020-02-26T19:20:13+00:00

The previous code for calculating active CP queues is problematic if
some SDMA queues are inactive. Fix that by counting CP queues directly.

Signed-off-by: Yong Zhao 
Acked-by: Alex Deucher 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher
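
A small sketch of why a derived count can go wrong. It assumes, for
illustration only, that the previous code subtracted the number of all SDMA
queues from the number of active queues of any kind; the commit text does not
spell out the exact expression. The enum, struct, and function names are
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

enum queue_kind { QUEUE_CP, QUEUE_SDMA };

/* Hypothetical, minimal view of a queue. */
struct queue {
	enum queue_kind kind;
	bool active;
};

/*
 * Derived count (assumed form of the old approach): active queues of any
 * kind minus all SDMA queues. Inactive SDMA queues are subtracted even
 * though they were never added, so the result comes out too low.
 */
static int derived_active_cp_count(const struct queue *q, size_t n)
{
	int active_total = 0;
	int sdma_total = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (q[i].active)
			active_total++;
		if (q[i].kind == QUEUE_SDMA)
			sdma_total++;
	}

	return active_total - sdma_total;
}

/* Direct count: only queues that are both CP queues and active. */
static int direct_active_cp_count(const struct queue *q, size_t n)
{
	int count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (q[i].kind == QUEUE_CP && q[i].active)
			count++;
	}

	return count;
}
```

For two active CP queues plus one inactive SDMA queue, the derived count
yields 1 while the direct count yields 2.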

drm/amdkfd: Avoid ambiguity by indicating it's cp queue

2020-02-26T19:20:05+00:00

The queues represented in queue_bitmap are only CP queues.

Signed-off-by: Yong Zhao 
Acked-by: Alex Deucher 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdkfd: Rename queue_count to active_queue_count

2020-02-26T19:19:38+00:00

The new name makes the code easier to understand.

Signed-off-by: Yong Zhao 
Acked-by: Alex Deucher 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher

drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode

2020-02-04T15:32:41+00:00

The sdma_queue_count increment should be done before
execute_queues_cpsch(), which calls pm_calc_rlib_size() where
sdma_queue_count is used to calculate whether over_subscription is
triggered.

With the previous code, when an SDMA queue is created,
compute_queue_count in pm_calc_rlib_size() is one more than the
actual compute queue number, because the queue_count has been
incremented while sdma_queue_count has not. This patch fixes that.

Signed-off-by: Yong Zhao 
Reviewed-by: Felix Kuehling 
Signed-off-by: Alex Deucher
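
The ordering problem can be reproduced with a toy model. Everything below is
a simplified stand-in rather than the driver's code: struct dqm_counts,
execute_queues(), and the two create functions are hypothetical, and the
subtraction follows the description in the commit text.

```c
/* Hypothetical subset of the counters kept by the queue manager. */
struct dqm_counts {
	int queue_count;	/* all queues, CP and SDMA */
	int sdma_queue_count;	/* SDMA queues only */
};

/* Compute queues as the runlist sizing code derives them. */
static int compute_queue_count(const struct dqm_counts *c)
{
	return c->queue_count - c->sdma_queue_count;
}

/*
 * Stand-in for execute_queues_cpsch(): returns the compute queue count
 * that the over-subscription check would see at this moment.
 */
static int execute_queues(const struct dqm_counts *c)
{
	return compute_queue_count(c);
}

/* Old ordering: the SDMA counter is bumped after the runlist is sized. */
static int create_sdma_queue_buggy(struct dqm_counts *c)
{
	int seen;

	c->queue_count++;
	seen = execute_queues(c);	/* SDMA queue counted as compute */
	c->sdma_queue_count++;

	return seen;
}

/* Fixed ordering: both counters are updated before the runlist is sized. */
static int create_sdma_queue_fixed(struct dqm_counts *c)
{
	c->queue_count++;
	c->sdma_queue_count++;

	return execute_queues(c);
}
```

Starting from three queues of which one is SDMA (two compute queues), the
buggy ordering reports three compute queues while the new SDMA queue is being
created, and the fixed ordering reports two.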

drm/amdkfd: Add a message when SW scheduler is used

2020-01-16T18:38:07+00:00

The SW scheduler was previously called the non-HW scheduler, or non-HWS.
This message is useful when triaging issues from dmesg.

Signed-off-by: Yong Zhao 
Acked-by: Huang Rui 
Signed-off-by: Alex Deucher