<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/drivers/gpu/drm/scheduler, branch linux-6.8.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>drm/sched: fix null-ptr-deref in init entity</title>
<updated>2024-04-03T13:32:51+00:00</updated>
<author>
<name>Vitaly Prosyak</name>
<email>vitaly.prosyak@amd.com</email>
</author>
<published>2024-03-15T02:39:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=54b5b7275dfdec35812ccce70930cd7c4ee612b2'/>
<id>54b5b7275dfdec35812ccce70930cd7c4ee612b2</id>
<content type='text'>
commit f34e8bb7d6c6626933fe993e03ed59ae85e16abb upstream.

The bug can be triggered by sending an amdgpu_cs_wait_ioctl
to the AMDGPU DRM driver on any ASICs with valid context.
The bug was reported by Joonkyo Jung &lt;joonkyoj@yonsei.ac.kr&gt;.
For example the following code:

    static void Syzkaller2(int fd)
    {
	union drm_amdgpu_ctx arg1;
	union drm_amdgpu_wait_cs arg2;

	arg1.in.op = AMDGPU_CTX_OP_ALLOC_CTX;
	ret = drmIoctl(fd, 0x140106442 /* amdgpu_ctx_ioctl */, &amp;arg1);

	arg2.in.handle = 0x0;
	arg2.in.timeout = 0x2000000000000;
	arg2.in.ip_type = AMD_IP_VPE /* 0x9 */;
	arg2-&gt;in.ip_instance = 0x0;
	arg2.in.ring = 0x0;
	arg2.in.ctx_id = arg1.out.alloc.ctx_id;

	drmIoctl(fd, 0xc0206449 /* AMDGPU_WAIT_CS * /, &amp;arg2);
    }

The ioctl AMDGPU_WAIT_CS without previously submitted job could be assumed that
the error should be returned, but the following commit 1decbf6bb0b4dc56c9da6c5e57b994ebfc2be3aa
modified the logic and allowed to have sched_rq equal to NULL.

As a result when there is no job the ioctl AMDGPU_WAIT_CS returns success.
The change fixes null-ptr-deref in init entity and the stack below demonstrates
the error condition:

[  +0.000007] BUG: kernel NULL pointer dereference, address: 0000000000000028
[  +0.007086] #PF: supervisor read access in kernel mode
[  +0.005234] #PF: error_code(0x0000) - not-present page
[  +0.005232] PGD 0 P4D 0
[  +0.002501] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[  +0.005034] CPU: 10 PID: 9229 Comm: amd_basic Tainted: G    B   W    L     6.7.0+ #4
[  +0.007797] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
[  +0.009798] RIP: 0010:drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.006426] Code: 80 00 00 00 00 00 00 00 e8 1a 81 82 e0 49 89 9c 24 c0 00 00 00 4c 89 ef e8 4a 80 82 e0 49 8b 5d 00 48 8d 7b 28 e8 3d 80 82 e0 &lt;48&gt; 83 7b 28 00 0f 84 28 01 00 00 4d 8d ac 24 98 00 00 00 49 8d 5c
[  +0.019094] RSP: 0018:ffffc90014c1fa40 EFLAGS: 00010282
[  +0.005237] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff8113f3fa
[  +0.007326] RDX: fffffbfff0a7889d RSI: 0000000000000008 RDI: ffffffff853c44e0
[  +0.007264] RBP: ffffc90014c1fa80 R08: 0000000000000001 R09: fffffbfff0a7889c
[  +0.007266] R10: ffffffff853c44e7 R11: 0000000000000001 R12: ffff8881a719b010
[  +0.007263] R13: ffff88810d412748 R14: 0000000000000002 R15: 0000000000000000
[  +0.007264] FS:  00007ffff7045540(0000) GS:ffff8883cc900000(0000) knlGS:0000000000000000
[  +0.008236] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.005851] CR2: 0000000000000028 CR3: 000000011912e000 CR4: 0000000000350ef0
[  +0.007175] Call Trace:
[  +0.002561]  &lt;TASK&gt;
[  +0.002141]  ? show_regs+0x6a/0x80
[  +0.003473]  ? __die+0x25/0x70
[  +0.003124]  ? page_fault_oops+0x214/0x720
[  +0.004179]  ? preempt_count_sub+0x18/0xc0
[  +0.004093]  ? __pfx_page_fault_oops+0x10/0x10
[  +0.004590]  ? srso_return_thunk+0x5/0x5f
[  +0.004000]  ? vprintk_default+0x1d/0x30
[  +0.004063]  ? srso_return_thunk+0x5/0x5f
[  +0.004087]  ? vprintk+0x5c/0x90
[  +0.003296]  ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.005807]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? _printk+0xb3/0xe0
[  +0.003293]  ? __pfx__printk+0x10/0x10
[  +0.003735]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  +0.005482]  ? do_user_addr_fault+0x345/0x770
[  +0.004361]  ? exc_page_fault+0x64/0xf0
[  +0.003972]  ? asm_exc_page_fault+0x27/0x30
[  +0.004271]  ? add_taint+0x2a/0xa0
[  +0.003476]  ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.005812]  amdgpu_ctx_get_entity+0x3f9/0x770 [amdgpu]
[  +0.009530]  ? finish_task_switch.isra.0+0x129/0x470
[  +0.005068]  ? __pfx_amdgpu_ctx_get_entity+0x10/0x10 [amdgpu]
[  +0.010063]  ? __kasan_check_write+0x14/0x20
[  +0.004356]  ? srso_return_thunk+0x5/0x5f
[  +0.004001]  ? mutex_unlock+0x81/0xd0
[  +0.003802]  ? srso_return_thunk+0x5/0x5f
[  +0.004096]  amdgpu_cs_wait_ioctl+0xf6/0x270 [amdgpu]
[  +0.009355]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009981]  ? srso_return_thunk+0x5/0x5f
[  +0.004089]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? __srcu_read_lock+0x20/0x50
[  +0.004096]  drm_ioctl_kernel+0x140/0x1f0 [drm]
[  +0.005080]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009974]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
[  +0.005618]  ? srso_return_thunk+0x5/0x5f
[  +0.004088]  ? __kasan_check_write+0x14/0x20
[  +0.004357]  drm_ioctl+0x3da/0x730 [drm]
[  +0.004461]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009979]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
[  +0.004993]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? __kasan_check_write+0x14/0x20
[  +0.004356]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? _raw_spin_lock_irqsave+0x99/0x100
[  +0.004712]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  +0.005063]  ? __pfx_arch_do_signal_or_restart+0x10/0x10
[  +0.005477]  ? srso_return_thunk+0x5/0x5f
[  +0.004000]  ? preempt_count_sub+0x18/0xc0
[  +0.004237]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? _raw_spin_unlock_irqrestore+0x27/0x50
[  +0.005069]  amdgpu_drm_ioctl+0x7e/0xe0 [amdgpu]
[  +0.008912]  __x64_sys_ioctl+0xcd/0x110
[  +0.003918]  do_syscall_64+0x5f/0xe0
[  +0.003649]  ? noist_exc_debug+0xe6/0x120
[  +0.004095]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  +0.005150] RIP: 0033:0x7ffff7b1a94f
[  +0.003647] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 &lt;41&gt; 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[  +0.019097] RSP: 002b:00007fffffffe0a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  +0.007708] RAX: ffffffffffffffda RBX: 000055555558b360 RCX: 00007ffff7b1a94f
[  +0.007176] RDX: 000055555558b360 RSI: 00000000c0206449 RDI: 0000000000000003
[  +0.007326] RBP: 00000000c0206449 R08: 000055555556ded0 R09: 000000007fffffff
[  +0.007176] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5d8
[  +0.007238] R13: 0000000000000003 R14: 000055555555cba8 R15: 00007ffff7ffd040
[  +0.007250]  &lt;/TASK&gt;

v2: Reworked check to guard against null ptr deref and added helpful comments
    (Christian)

Cc: Christian Koenig &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Cc: Bas Nieuwenhuizen &lt;bas@basnieuwenhuizen.nl&gt;
Cc: Joonkyo Jung &lt;joonkyoj@yonsei.ac.kr&gt;
Cc: Dokyung Song &lt;dokyungs@yonsei.ac.kr&gt;
Cc: &lt;jisoo.jang@yonsei.ac.kr&gt;
Cc: &lt;yw9865@yonsei.ac.kr&gt;
Signed-off-by: Vitaly Prosyak &lt;vitaly.prosyak@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Link: https://patchwork.freedesktop.org/patch/msgid/20240315023926.343164-1-vitaly.prosyak@amd.com
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f34e8bb7d6c6626933fe993e03ed59ae85e16abb upstream.

The bug can be triggered by sending an amdgpu_cs_wait_ioctl
to the AMDGPU DRM driver on any ASICs with valid context.
The bug was reported by Joonkyo Jung &lt;joonkyoj@yonsei.ac.kr&gt;.
For example the following code:

    static void Syzkaller2(int fd)
    {
	union drm_amdgpu_ctx arg1;
	union drm_amdgpu_wait_cs arg2;

	arg1.in.op = AMDGPU_CTX_OP_ALLOC_CTX;
	ret = drmIoctl(fd, 0x140106442 /* amdgpu_ctx_ioctl */, &amp;arg1);

	arg2.in.handle = 0x0;
	arg2.in.timeout = 0x2000000000000;
	arg2.in.ip_type = AMD_IP_VPE /* 0x9 */;
	arg2-&gt;in.ip_instance = 0x0;
	arg2.in.ring = 0x0;
	arg2.in.ctx_id = arg1.out.alloc.ctx_id;

	drmIoctl(fd, 0xc0206449 /* AMDGPU_WAIT_CS * /, &amp;arg2);
    }

The ioctl AMDGPU_WAIT_CS without previously submitted job could be assumed that
the error should be returned, but the following commit 1decbf6bb0b4dc56c9da6c5e57b994ebfc2be3aa
modified the logic and allowed to have sched_rq equal to NULL.

As a result when there is no job the ioctl AMDGPU_WAIT_CS returns success.
The change fixes null-ptr-deref in init entity and the stack below demonstrates
the error condition:

[  +0.000007] BUG: kernel NULL pointer dereference, address: 0000000000000028
[  +0.007086] #PF: supervisor read access in kernel mode
[  +0.005234] #PF: error_code(0x0000) - not-present page
[  +0.005232] PGD 0 P4D 0
[  +0.002501] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
[  +0.005034] CPU: 10 PID: 9229 Comm: amd_basic Tainted: G    B   W    L     6.7.0+ #4
[  +0.007797] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
[  +0.009798] RIP: 0010:drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.006426] Code: 80 00 00 00 00 00 00 00 e8 1a 81 82 e0 49 89 9c 24 c0 00 00 00 4c 89 ef e8 4a 80 82 e0 49 8b 5d 00 48 8d 7b 28 e8 3d 80 82 e0 &lt;48&gt; 83 7b 28 00 0f 84 28 01 00 00 4d 8d ac 24 98 00 00 00 49 8d 5c
[  +0.019094] RSP: 0018:ffffc90014c1fa40 EFLAGS: 00010282
[  +0.005237] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff8113f3fa
[  +0.007326] RDX: fffffbfff0a7889d RSI: 0000000000000008 RDI: ffffffff853c44e0
[  +0.007264] RBP: ffffc90014c1fa80 R08: 0000000000000001 R09: fffffbfff0a7889c
[  +0.007266] R10: ffffffff853c44e7 R11: 0000000000000001 R12: ffff8881a719b010
[  +0.007263] R13: ffff88810d412748 R14: 0000000000000002 R15: 0000000000000000
[  +0.007264] FS:  00007ffff7045540(0000) GS:ffff8883cc900000(0000) knlGS:0000000000000000
[  +0.008236] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.005851] CR2: 0000000000000028 CR3: 000000011912e000 CR4: 0000000000350ef0
[  +0.007175] Call Trace:
[  +0.002561]  &lt;TASK&gt;
[  +0.002141]  ? show_regs+0x6a/0x80
[  +0.003473]  ? __die+0x25/0x70
[  +0.003124]  ? page_fault_oops+0x214/0x720
[  +0.004179]  ? preempt_count_sub+0x18/0xc0
[  +0.004093]  ? __pfx_page_fault_oops+0x10/0x10
[  +0.004590]  ? srso_return_thunk+0x5/0x5f
[  +0.004000]  ? vprintk_default+0x1d/0x30
[  +0.004063]  ? srso_return_thunk+0x5/0x5f
[  +0.004087]  ? vprintk+0x5c/0x90
[  +0.003296]  ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.005807]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? _printk+0xb3/0xe0
[  +0.003293]  ? __pfx__printk+0x10/0x10
[  +0.003735]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  +0.005482]  ? do_user_addr_fault+0x345/0x770
[  +0.004361]  ? exc_page_fault+0x64/0xf0
[  +0.003972]  ? asm_exc_page_fault+0x27/0x30
[  +0.004271]  ? add_taint+0x2a/0xa0
[  +0.003476]  ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched]
[  +0.005812]  amdgpu_ctx_get_entity+0x3f9/0x770 [amdgpu]
[  +0.009530]  ? finish_task_switch.isra.0+0x129/0x470
[  +0.005068]  ? __pfx_amdgpu_ctx_get_entity+0x10/0x10 [amdgpu]
[  +0.010063]  ? __kasan_check_write+0x14/0x20
[  +0.004356]  ? srso_return_thunk+0x5/0x5f
[  +0.004001]  ? mutex_unlock+0x81/0xd0
[  +0.003802]  ? srso_return_thunk+0x5/0x5f
[  +0.004096]  amdgpu_cs_wait_ioctl+0xf6/0x270 [amdgpu]
[  +0.009355]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009981]  ? srso_return_thunk+0x5/0x5f
[  +0.004089]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? __srcu_read_lock+0x20/0x50
[  +0.004096]  drm_ioctl_kernel+0x140/0x1f0 [drm]
[  +0.005080]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009974]  ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm]
[  +0.005618]  ? srso_return_thunk+0x5/0x5f
[  +0.004088]  ? __kasan_check_write+0x14/0x20
[  +0.004357]  drm_ioctl+0x3da/0x730 [drm]
[  +0.004461]  ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu]
[  +0.009979]  ? __pfx_drm_ioctl+0x10/0x10 [drm]
[  +0.004993]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? __kasan_check_write+0x14/0x20
[  +0.004356]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? _raw_spin_lock_irqsave+0x99/0x100
[  +0.004712]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  +0.005063]  ? __pfx_arch_do_signal_or_restart+0x10/0x10
[  +0.005477]  ? srso_return_thunk+0x5/0x5f
[  +0.004000]  ? preempt_count_sub+0x18/0xc0
[  +0.004237]  ? srso_return_thunk+0x5/0x5f
[  +0.004090]  ? _raw_spin_unlock_irqrestore+0x27/0x50
[  +0.005069]  amdgpu_drm_ioctl+0x7e/0xe0 [amdgpu]
[  +0.008912]  __x64_sys_ioctl+0xcd/0x110
[  +0.003918]  do_syscall_64+0x5f/0xe0
[  +0.003649]  ? noist_exc_debug+0xe6/0x120
[  +0.004095]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[  +0.005150] RIP: 0033:0x7ffff7b1a94f
[  +0.003647] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 &lt;41&gt; 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00
[  +0.019097] RSP: 002b:00007fffffffe0a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  +0.007708] RAX: ffffffffffffffda RBX: 000055555558b360 RCX: 00007ffff7b1a94f
[  +0.007176] RDX: 000055555558b360 RSI: 00000000c0206449 RDI: 0000000000000003
[  +0.007326] RBP: 00000000c0206449 R08: 000055555556ded0 R09: 000000007fffffff
[  +0.007176] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5d8
[  +0.007238] R13: 0000000000000003 R14: 000055555555cba8 R15: 00007ffff7ffd040
[  +0.007250]  &lt;/TASK&gt;

v2: Reworked check to guard against null ptr deref and added helpful comments
    (Christian)

Cc: Christian Koenig &lt;christian.koenig@amd.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Cc: Bas Nieuwenhuizen &lt;bas@basnieuwenhuizen.nl&gt;
Cc: Joonkyo Jung &lt;joonkyoj@yonsei.ac.kr&gt;
Cc: Dokyung Song &lt;dokyungs@yonsei.ac.kr&gt;
Cc: &lt;jisoo.jang@yonsei.ac.kr&gt;
Cc: &lt;yw9865@yonsei.ac.kr&gt;
Signed-off-by: Vitaly Prosyak &lt;vitaly.prosyak@amd.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Link: https://patchwork.freedesktop.org/patch/msgid/20240315023926.343164-1-vitaly.prosyak@amd.com
Signed-off-by: Christian König &lt;christian.koenig@amd.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: Re-queue run job worker when drm_sched_entity_pop_job() returns NULL</title>
<updated>2024-02-06T02:47:43+00:00</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2024-01-30T03:04:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d0399da9fb5f8e3d897b9776bffee2d3bfe20210'/>
<id>d0399da9fb5f8e3d897b9776bffee2d3bfe20210</id>
<content type='text'>
Rather then loop over entities until one with a ready job is found,
re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.

Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Fixes: 66dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
Reviewed-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240130030413.2031009-1-matthew.brost@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rather then loop over entities until one with a ready job is found,
re-queue the run job worker when drm_sched_entity_pop_job() returns NULL.

Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Fixes: 66dbd9004a55 ("drm/sched: Drain all entities in DRM sched run job worker")
Reviewed-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240130030413.2031009-1-matthew.brost@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: Drain all entities in DRM sched run job worker</title>
<updated>2024-01-26T02:46:36+00:00</updated>
<author>
<name>Matthew Brost</name>
<email>matthew.brost@intel.com</email>
</author>
<published>2024-01-24T21:08:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=66dbd9004a55073c5931f5f65f5fe2bbd414bdaa'/>
<id>66dbd9004a55073c5931f5f65f5fe2bbd414bdaa</id>
<content type='text'>
All entities must be drained in the DRM scheduler run job worker to
avoid the following case. An entity found that is ready, no job found
ready on entity, and run job worker goes idle with other entities + jobs
ready. Draining all ready entities (i.e. loop over all ready entities)
in the run job worker ensures all job that are ready will be scheduled.

Cc: Thorsten Leemhuis &lt;regressions@leemhuis.info&gt;
Reported-by: Mikhail Gavrilov &lt;mikhail.v.gavrilov@gmail.com&gt;
Closes: https://lore.kernel.org/all/CABXGCsM2VLs489CH-vF-1539-s3in37=bwuOWtoeeE+q26zE+Q@mail.gmail.com/
Reported-and-tested-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3124
Link: https://lore.kernel.org/all/20240123021155.2775-1-mario.limonciello@amd.com/
Reported-and-tested-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Closes: https://lore.kernel.org/dri-devel/05ddb2da-b182-4791-8ef7-82179fd159a8@amd.com/T/#m0c31d4d1b9ae9995bb880974c4f1dbaddc33a48a
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240124210811.1639040-1-matthew.brost@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
All entities must be drained in the DRM scheduler run job worker to
avoid the following case. An entity found that is ready, no job found
ready on entity, and run job worker goes idle with other entities + jobs
ready. Draining all ready entities (i.e. loop over all ready entities)
in the run job worker ensures all job that are ready will be scheduled.

Cc: Thorsten Leemhuis &lt;regressions@leemhuis.info&gt;
Reported-by: Mikhail Gavrilov &lt;mikhail.v.gavrilov@gmail.com&gt;
Closes: https://lore.kernel.org/all/CABXGCsM2VLs489CH-vF-1539-s3in37=bwuOWtoeeE+q26zE+Q@mail.gmail.com/
Reported-and-tested-by: Mario Limonciello &lt;mario.limonciello@amd.com&gt;
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3124
Link: https://lore.kernel.org/all/20240123021155.2775-1-mario.limonciello@amd.com/
Reported-and-tested-by: Vlastimil Babka &lt;vbabka@suse.cz&gt;
Closes: https://lore.kernel.org/dri-devel/05ddb2da-b182-4791-8ef7-82179fd159a8@amd.com/T/#m0c31d4d1b9ae9995bb880974c4f1dbaddc33a48a
Signed-off-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Signed-off-by: Dave Airlie &lt;airlied@redhat.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20240124210811.1639040-1-matthew.brost@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: Partial revert of "Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()"</title>
<updated>2023-11-28T19:16:56+00:00</updated>
<author>
<name>Bert Karwatzki</name>
<email>spasswolf@web.de</email>
</author>
<published>2023-11-27T16:09:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f92a39ae47076ea123c7980fb85e6e33313f372e'/>
<id>f92a39ae47076ea123c7980fb85e6e33313f372e</id>
<content type='text'>
Commit f3123c2590005c, in combination with the use of work queues by the GPU
scheduler, leads to random lock-ups of the GUI.

This is a partial revert of of commit f3123c2590005c since drm_sched_wakeup() still
needs its entity argument to pass it to drm_sched_can_queue().

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2994
Link: https://lists.freedesktop.org/archives/dri-devel/2023-November/431606.html

Signed-off-by: Bert Karwatzki &lt;spasswolf@web.de&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231127160955.87879-1-spasswolf@web.de
Link: https://lore.kernel.org/r/36bece178ff5dc705065e53d1e5e41f6db6d87e4.camel@web.de
Fixes: f3123c2590005c ("drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()")
Reviewed-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit f3123c2590005c, in combination with the use of work queues by the GPU
scheduler, leads to random lock-ups of the GUI.

This is a partial revert of of commit f3123c2590005c since drm_sched_wakeup() still
needs its entity argument to pass it to drm_sched_can_queue().

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2994
Link: https://lists.freedesktop.org/archives/dri-devel/2023-November/431606.html

Signed-off-by: Bert Karwatzki &lt;spasswolf@web.de&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231127160955.87879-1-spasswolf@web.de
Link: https://lore.kernel.org/r/36bece178ff5dc705065e53d1e5e41f6db6d87e4.camel@web.de
Fixes: f3123c2590005c ("drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()")
Reviewed-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: Reverse run-queue priority enumeration</title>
<updated>2023-11-25T04:03:53+00:00</updated>
<author>
<name>Luben Tuikov</name>
<email>ltuikov89@gmail.com</email>
</author>
<published>2023-11-23T03:08:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=38f922a563aac3148ac73e73689805917f034cb5'/>
<id>38f922a563aac3148ac73e73689805917f034cb5</id>
<content type='text'>
Reverse run-queue priority enumeration such that the higest priority is now 0,
and for each consecutive integer the prioirty diminishes.

Run-queues correspond to priorities. To an external observer a scheduler
created with a single run-queue, and another created with
DRM_SCHED_PRIORITY_COUNT number of run-queues, should always schedule
sched-&gt;sched_rq[0] with the same "priority", as that index run-queue exists in
both schedulers, i.e. a scheduler with one run-queue or many. This patch makes
it so.

In other words, the "priority" of sched-&gt;sched_rq[n], n &gt;= 0, is the same for
any scheduler created with any allowable number of run-queues (priorities), 0
to DRM_SCHED_PRIORITY_COUNT.

Cc: Rob Clark &lt;robdclark@gmail.com&gt;
Cc: Abhinav Kumar &lt;quic_abhinavk@quicinc.com&gt;
Cc: Dmitry Baryshkov &lt;dmitry.baryshkov@linaro.org&gt;
Cc: Danilo Krummrich &lt;dakr@redhat.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-6-ltuikov89@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Reverse run-queue priority enumeration such that the higest priority is now 0,
and for each consecutive integer the prioirty diminishes.

Run-queues correspond to priorities. To an external observer a scheduler
created with a single run-queue, and another created with
DRM_SCHED_PRIORITY_COUNT number of run-queues, should always schedule
sched-&gt;sched_rq[0] with the same "priority", as that index run-queue exists in
both schedulers, i.e. a scheduler with one run-queue or many. This patch makes
it so.

In other words, the "priority" of sched-&gt;sched_rq[n], n &gt;= 0, is the same for
any scheduler created with any allowable number of run-queues (priorities), 0
to DRM_SCHED_PRIORITY_COUNT.

Cc: Rob Clark &lt;robdclark@gmail.com&gt;
Cc: Abhinav Kumar &lt;quic_abhinavk@quicinc.com&gt;
Cc: Dmitry Baryshkov &lt;dmitry.baryshkov@linaro.org&gt;
Cc: Danilo Krummrich &lt;dakr@redhat.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-6-ltuikov89@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: Rename priority MIN to LOW</title>
<updated>2023-11-25T04:03:53+00:00</updated>
<author>
<name>Luben Tuikov</name>
<email>ltuikov89@gmail.com</email>
</author>
<published>2023-11-15T01:40:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fe375c74806dbd30b00ec038a80a5b7bf4653ab7'/>
<id>fe375c74806dbd30b00ec038a80a5b7bf4653ab7</id>
<content type='text'>
Rename DRM_SCHED_PRIORITY_MIN to DRM_SCHED_PRIORITY_LOW.

This mirrors DRM_SCHED_PRIORITY_HIGH, for a list of DRM scheduler priorities
in ascending order,
  DRM_SCHED_PRIORITY_LOW,
  DRM_SCHED_PRIORITY_NORMAL,
  DRM_SCHED_PRIORITY_HIGH,
  DRM_SCHED_PRIORITY_KERNEL.

Cc: Rob Clark &lt;robdclark@gmail.com&gt;
Cc: Abhinav Kumar &lt;quic_abhinavk@quicinc.com&gt;
Cc: Dmitry Baryshkov &lt;dmitry.baryshkov@linaro.org&gt;
Cc: Danilo Krummrich &lt;dakr@redhat.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-5-ltuikov89@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rename DRM_SCHED_PRIORITY_MIN to DRM_SCHED_PRIORITY_LOW.

This mirrors DRM_SCHED_PRIORITY_HIGH, for a list of DRM scheduler priorities
in ascending order,
  DRM_SCHED_PRIORITY_LOW,
  DRM_SCHED_PRIORITY_NORMAL,
  DRM_SCHED_PRIORITY_HIGH,
  DRM_SCHED_PRIORITY_KERNEL.

Cc: Rob Clark &lt;robdclark@gmail.com&gt;
Cc: Abhinav Kumar &lt;quic_abhinavk@quicinc.com&gt;
Cc: Dmitry Baryshkov &lt;dmitry.baryshkov@linaro.org&gt;
Cc: Danilo Krummrich &lt;dakr@redhat.com&gt;
Cc: Alex Deucher &lt;alexander.deucher@amd.com&gt;
Cc: Christian König &lt;christian.koenig@amd.com&gt;
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231124052752.6915-5-ltuikov89@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: Fix bounds limiting when given a malformed entity</title>
<updated>2023-11-25T03:56:57+00:00</updated>
<author>
<name>Luben Tuikov</name>
<email>ltuikov89@gmail.com</email>
</author>
<published>2023-11-23T04:58:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2bbe6ab2be53858507f11f99f856846d04765ae3'/>
<id>2bbe6ab2be53858507f11f99f856846d04765ae3</id>
<content type='text'>
If we're given a malformed entity in drm_sched_entity_init()--shouldn't
happen, but we verify--with out-of-bounds priority value, we set it to an
allowed value. Fix the expression which sets this limit.

Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Fixes: 56e449603f0ac5 ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Link: https://patchwork.freedesktop.org/patch/msgid/20231123122422.167832-2-ltuikov89@gmail.com
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://lore.kernel.org/r/dbb91dbe-ef77-4d79-aaf9-2adb171c1d7a@amd.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If we're given a malformed entity in drm_sched_entity_init()--shouldn't
happen, but we verify--with out-of-bounds priority value, we set it to an
allowed value. Fix the expression which sets this limit.

Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Fixes: 56e449603f0ac5 ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Link: https://patchwork.freedesktop.org/patch/msgid/20231123122422.167832-2-ltuikov89@gmail.com
Reviewed-by: Christian König &lt;christian.koenig@amd.com&gt;
Link: https://lore.kernel.org/r/dbb91dbe-ef77-4d79-aaf9-2adb171c1d7a@amd.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: implement dynamic job-flow control</title>
<updated>2023-11-10T01:54:29+00:00</updated>
<author>
<name>Danilo Krummrich</name>
<email>dakr@redhat.com</email>
</author>
<published>2023-11-10T00:16:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a78422e9dff366b3a46ae44caf6ec8ded9c9fc2f'/>
<id>a78422e9dff366b3a46ae44caf6ec8ded9c9fc2f</id>
<content type='text'>
Currently, job flow control is implemented simply by limiting the number
of jobs in flight. Therefore, a scheduler is initialized with a credit
limit that corresponds to the number of jobs which can be sent to the
hardware.

This implies that for each job, drivers need to account for the maximum
job size possible in order to not overflow the ring buffer.

However, there are drivers, such as Nouveau, where the job size has a
rather large range. For such drivers it can easily happen that job
submissions not even filling the ring by 1% can block subsequent
submissions, which, in the worst case, can lead to the ring run dry.

In order to overcome this issue, allow for tracking the actual job size
instead of the number of jobs. Therefore, add a field to track a job's
credit count, which represents the number of credits a job contributes
to the scheduler's credit limit.

Signed-off-by: Danilo Krummrich &lt;dakr@redhat.com&gt;
Reviewed-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, job flow control is implemented simply by limiting the number
of jobs in flight. Therefore, a scheduler is initialized with a credit
limit that corresponds to the number of jobs which can be sent to the
hardware.

This implies that for each job, drivers need to account for the maximum
job size possible in order to not overflow the ring buffer.

However, there are drivers, such as Nouveau, where the job size has a
rather large range. For such drivers it can easily happen that job
submissions not even filling the ring by 1% can block subsequent
submissions, which, in the worst case, can lead to the ring run dry.

In order to overcome this issue, allow for tracking the actual job size
instead of the number of jobs. Therefore, add a field to track a job's
credit count, which represents the number of credits a job contributes
to the scheduler's credit limit.

Signed-off-by: Danilo Krummrich &lt;dakr@redhat.com&gt;
Reviewed-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()</title>
<updated>2023-11-10T00:05:35+00:00</updated>
<author>
<name>Luben Tuikov</name>
<email>ltuikov89@gmail.com</email>
</author>
<published>2023-11-09T23:53:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f3123c2590005c5ff631653d31428e40cd10c618'/>
<id>f3123c2590005c5ff631653d31428e40cd10c618</id>
<content type='text'>
Don't "wake up" the GPU scheduler unless the entity is ready, as well as we
can queue to the scheduler, i.e. there is no point in waking up the scheduler
for the entity unless the entity is ready.

Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Fixes: bc8d6a9df99038 ("drm/sched: Don't disturb the entity when in RR-mode scheduling")
Reviewed-by: Danilo Krummrich &lt;dakr@redhat.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231110000123.72565-2-ltuikov89@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Don't "wake up" the GPU scheduler unless the entity is ready, as well as we
can queue to the scheduler, i.e. there is no point in waking up the scheduler
for the entity unless the entity is ready.

Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Fixes: bc8d6a9df99038 ("drm/sched: Don't disturb the entity when in RR-mode scheduling")
Reviewed-by: Danilo Krummrich &lt;dakr@redhat.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231110000123.72565-2-ltuikov89@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>drm/sched: Don't disturb the entity when in RR-mode scheduling</title>
<updated>2023-11-08T04:16:39+00:00</updated>
<author>
<name>Luben Tuikov</name>
<email>ltuikov89@gmail.com</email>
</author>
<published>2023-11-02T22:17:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bc8d6a9df99038f61adf2881ad9f717abe414e06'/>
<id>bc8d6a9df99038f61adf2881ad9f717abe414e06</id>
<content type='text'>
Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
it do just that, schedule the work item for execution.

The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
to determine if the scheduler has an entity ready in one of its run-queues,
and in the case of the Round-Robin (RR) scheduling, the function
drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
which is ready, sets up the run-queue and completion and returns that
entity. The FIFO scheduling algorithm is unaffected.

Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
in the case of RR scheduling, that would result in drm_sched_select_entity()
having been called twice, which may result in skipping a ready entity if more
than one entity is ready. This commit fixes this by eliminating the call to
drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
in drm_sched_run_job_work().

v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
    Add fixes-tag. (Tvrtko)

Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Danilo Krummrich &lt;dakr@redhat.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231107041020.10035-2-ltuikov89@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Don't call drm_sched_select_entity() in drm_sched_run_job_queue().  In fact,
rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
it do just that, schedule the work item for execution.

The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
to determine if the scheduler has an entity ready in one of its run-queues,
and in the case of the Round-Robin (RR) scheduling, the function
drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
which is ready, sets up the run-queue and completion and returns that
entity. The FIFO scheduling algorithm is unaffected.

Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
in the case of RR scheduling, that would result in drm_sched_select_entity()
having been called twice, which may result in skipping a ready entity if more
than one entity is ready. This commit fixes this by eliminating the call to
drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
in drm_sched_run_job_work().

v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
    Add fixes-tag. (Tvrtko)

Signed-off-by: Luben Tuikov &lt;ltuikov89@gmail.com&gt;
Fixes: f7fe64ad0f22ff ("drm/sched: Split free_job into own work item")
Reviewed-by: Matthew Brost &lt;matthew.brost@intel.com&gt;
Reviewed-by: Danilo Krummrich &lt;dakr@redhat.com&gt;
Link: https://patchwork.freedesktop.org/patch/msgid/20231107041020.10035-2-ltuikov89@gmail.com
</pre>
</div>
</content>
</entry>
</feed>
