linux.git/arch/riscv/kernel/stacktrace.c, branch v6.9

riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode

2023-03-09T22:50:35+00:00

When CONFIG_FRAME_POINTER is unset, the stack unwinding function
walk_stackframe() reads the stack word by word without knowing where the
frames are. With KASAN enabled, those reads can trigger the following report:

[    0.000000] ==================================================================
[    0.000000] BUG: KASAN: stack-out-of-bounds in walk_stackframe+0xa6/0x11a
[    0.000000] Read of size 8 at addr ffffffff81807c40 by task swapper/0
[    0.000000]
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.2.0-12919-g24203e6db61f #43
[    0.000000] Hardware name: riscv-virtio,qemu (DT)
[    0.000000] Call Trace:
[    0.000000] [] walk_stackframe+0x0/0x11a
[    0.000000] [] init_param_lock+0x26/0x2a
[    0.000000] [] walk_stackframe+0xa2/0x11a
[    0.000000] [] dump_stack_lvl+0x22/0x36
[    0.000000] [] print_report+0x198/0x4a8
[    0.000000] [] init_param_lock+0x26/0x2a
[    0.000000] [] walk_stackframe+0xa2/0x11a
[    0.000000] [] kasan_report+0x9a/0xc8
[    0.000000] [] walk_stackframe+0xa2/0x11a
[    0.000000] [] walk_stackframe+0xa2/0x11a
[    0.000000] [] desc_make_final+0x80/0x84
[    0.000000] [] stack_trace_save+0x88/0xa6
[    0.000000] [] filter_irq_stacks+0x72/0x76
[    0.000000] [] devkmsg_read+0x32a/0x32e
[    0.000000] [] kasan_save_stack+0x28/0x52
[    0.000000] [] desc_make_final+0x7c/0x84
[    0.000000] [] stack_trace_save+0x84/0xa6
[    0.000000] [] kasan_set_track+0x12/0x20
[    0.000000] [] __kasan_slab_alloc+0x58/0x5e
[    0.000000] [] __kmem_cache_create+0x21e/0x39a
[    0.000000] [] create_boot_cache+0x70/0x9c
[    0.000000] [] kmem_cache_init+0x6c/0x11e
[    0.000000] [] mm_init+0xd8/0xfe
[    0.000000] [] start_kernel+0x190/0x3ca
[    0.000000]
[    0.000000] The buggy address belongs to stack of task swapper/0
[    0.000000]  and is located at offset 0 in frame:
[    0.000000]  stack_trace_save+0x0/0xa6
[    0.000000]
[    0.000000] This frame has 1 object:
[    0.000000]  [32, 56) 'c'
[    0.000000]
[    0.000000] The buggy address belongs to the physical page:
[    0.000000] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x81a07
[    0.000000] flags: 0x1000(reserved|zone=0)
[    0.000000] raw: 0000000000001000 ff600003f1e3d150 ff600003f1e3d150 0000000000000000
[    0.000000] raw: 0000000000000000 0000000000000000 00000001ffffffff
[    0.000000] page dumped because: kasan: bad access detected
[    0.000000]
[    0.000000] Memory state around the buggy address:
[    0.000000]  ffffffff81807b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000]  ffffffff81807b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000] >ffffffff81807c00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 f3
[    0.000000]                                            ^
[    0.000000]  ffffffff81807c80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000]  ffffffff81807d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000] ==================================================================

Fix that by using READ_ONCE_NOCHECK when reading the stack in imprecise
mode.
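
A minimal userspace sketch of imprecise unwinding, for illustration only.
The READ_ONCE_NOCHECK() below is a stand-in that only performs a volatile
load (the kernel macro also keeps KASAN from checking the access), and
TEXT_START, TEXT_END, looks_like_text() and scan_stack() are hypothetical
names, not the kernel's:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's READ_ONCE_NOCHECK(): a single
 * volatile load. The KASAN-exemption part cannot be modelled here.
 */
#define READ_ONCE_NOCHECK(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical text range standing in for __kernel_text_address(). */
#define TEXT_START 0x80000000UL
#define TEXT_END   0x80100000UL

static int looks_like_text(unsigned long pc)
{
    return pc >= TEXT_START && pc < TEXT_END;
}

/*
 * Imprecise unwinding: without frame pointers there is no frame layout
 * to follow, so every word between sp and end is read and reported if
 * it looks like a return address. Returns the number of candidates
 * stored in out[].
 */
static size_t scan_stack(const unsigned long *sp, const unsigned long *end,
                         unsigned long *out, size_t max)
{
    size_t n = 0;

    while (sp < end && n < max) {
        unsigned long pc = READ_ONCE_NOCHECK(*sp);

        sp++;
        if (looks_like_text(pc))
            out[n++] = pc;
    }
    return n;
}
```

Because the scan touches every word, it also reads the redzones KASAN
places around stack variables, which is why the reads have to be
exempted from checking.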

Fixes: 5d8544e2d007 ("RISC-V: Generic library routines and assembly")
Reported-by: Chathura Rajapaksha 
Link: https://lore.kernel.org/all/CAD7mqryDQCYyJ1gAmtMm8SASMWAQ4i103ptTb0f6Oda=tPY2=A@mail.gmail.com/
Suggested-by: Dmitry Vyukov 
Signed-off-by: Alexandre Ghiti 
Link: https://lore.kernel.org/r/20230308091639.602024-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt

riscv: stacktrace: Fix missing the first frame

2023-02-03T03:33:05+00:00

When running kfence_test, I found some testcases failed like this:

 # test_out_of_bounds_read: EXPECTATION FAILED at mm/kfence/kfence_test.c:346
 Expected report_matches(&expect) to be true, but is false
 not ok 1 - test_out_of_bounds_read

The corresponding call-trace is:

 BUG: KFENCE: out-of-bounds read in kunit_try_run_case+0x38/0x84

 Out-of-bounds read at 0x(____ptrval____) (32B right of kfence-#10):
  kunit_try_run_case+0x38/0x84
  kunit_generic_run_threadfn_adapter+0x12/0x1e
  kthread+0xc8/0xde
  ret_from_exception+0x0/0xc

kfence_test uses the first frame of the call trace to check whether a
testcase succeeded. Commit 6a00ef449370 ("riscv: eliminate
unreliable __builtin_frame_address(1)") skips the first frame in all
cases, which makes kfence_test fail. In fact, the first frame only needs
to be skipped when (task==NULL || task==current).
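
The skip rule can be sketched in userspace as follows. This is a
simplified model that ignores the pt_regs path; struct task,
report_frames() and the pcs[] array are hypothetical stand-ins:

```c
#include <assert.h>
#include <stddef.h>

struct task { int id; };    /* hypothetical stand-in for task_struct */

/*
 * Copy the frames in pcs[] to out[]. When the walker unwinds its own
 * context (task == NULL or task == current), pcs[0] is the walker
 * itself and is skipped. For any other task pcs[0] is a real frame
 * and is kept.
 */
static size_t report_frames(const struct task *task, const struct task *current,
                            const unsigned long *pcs, size_t npcs,
                            unsigned long *out)
{
    /* level starts below zero only when one frame has to be skipped */
    int level = (task == NULL || task == current) ? -1 : 0;
    size_t n = 0;

    for (size_t i = 0; i < npcs; i++) {
        if (level++ >= 0)
            out[n++] = pcs[i];
    }
    return n;
}
```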

With this patch, the call-trace will be:

 BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0x88/0x19e

 Out-of-bounds read at 0x(____ptrval____) (1B left of kfence-#7):
  test_out_of_bounds_read+0x88/0x19e
  kunit_try_run_case+0x38/0x84
  kunit_generic_run_threadfn_adapter+0x12/0x1e
  kthread+0xc8/0xde
  ret_from_exception+0x0/0xc

Fixes: 6a00ef449370 ("riscv: eliminate unreliable __builtin_frame_address(1)")
Signed-off-by: Liu Shixin 
Tested-by: Samuel Holland 
Link: https://lore.kernel.org/r/20221207025038.1022045-1-liushixin2@huawei.com
Signed-off-by: Palmer Dabbelt

riscv: stacktrace: Make walk_stackframe cross pt_regs frame

2022-12-06T02:13:34+00:00

The current walk_stackframe with FRAME_POINTER would stop unwinding at
ret_from_exception:
  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518
  in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init
  CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192
  Call Trace:
  [] walk_stackframe+0x0/0xee
  [] show_stack+0x32/0x4a
  [] dump_stack_lvl+0x72/0x8e
  [] dump_stack+0x14/0x1c
  [] ___might_sleep+0x12e/0x138
  [] __might_sleep+0x10/0x18
  [] down_read+0x22/0xa4
  [] do_page_fault+0xb0/0x2fe
  [] ret_from_exception+0x0/0xc

This change lets walk_stackframe cross the pt_regs frame and produce a
more complete backtrace:
  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518
  in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init
  CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192
  Call Trace:
  [] walk_stackframe+0x0/0xee
  [] show_stack+0x32/0x4a
  [] dump_stack_lvl+0x72/0x8e
  [] dump_stack+0x14/0x1c
  [] ___might_sleep+0x12e/0x138
  [] __might_sleep+0x10/0x18
  [] down_read+0x22/0xa4
  [] do_page_fault+0xb0/0x2fe
  [] ret_from_exception+0x0/0xc
  [] riscv_intc_irq+0x1a/0x72
  [] ret_from_exception+0x0/0xc
  [] vma_link+0x54/0x160
  [] mmap_region+0x2cc/0x4d0
  [] do_mmap+0x2d8/0x3ac
  [] vm_mmap_pgoff+0x70/0xb8
  [] vm_mmap+0x2a/0x36
  [] elf_map+0x72/0x84
  [] load_elf_binary+0x69a/0xec8
  [] bprm_execve+0x246/0x53a
  [] kernel_execve+0xe8/0x124
  [] run_init_process+0xfa/0x10c
  [] try_to_run_init_process+0x12/0x3c
  [] kernel_init+0xb4/0xf8
  [] ret_from_exception+0x0/0xc
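
The idea can be modelled in userspace roughly as below. This is a sketch
under simplifying assumptions, not the kernel code: frames are linked
structs rather than raw stack memory, and RET_FROM_EXCEPTION,
struct saved_regs, struct frame and walk() are hypothetical names. When
the return address is the exception return path and a register save area
is available, unwinding resumes from the saved pc and frame pointer:

```c
#include <assert.h>
#include <stddef.h>

#define RET_FROM_EXCEPTION 0xE000UL    /* hypothetical marker address */

struct frame;

/* Hypothetical, cut-down register save area (the kernel uses pt_regs). */
struct saved_regs {
    unsigned long epc;          /* pc that was interrupted */
    const struct frame *s0;     /* frame pointer live at that time */
};

struct frame {
    const struct frame *prev;       /* caller's frame */
    unsigned long ra;               /* return address saved in this frame */
    const struct saved_regs *regs;  /* save area above this frame, or NULL */
};

/* Collect pcs, continuing past an exception entry when regs are known. */
static size_t walk(unsigned long pc, const struct frame *fp,
                   unsigned long *out, size_t max)
{
    size_t n = 0;

    while (n < max) {
        out[n++] = pc;
        if (fp == NULL)
            break;

        const struct frame *cur = fp;

        pc = cur->ra;
        fp = cur->prev;

        if (pc == RET_FROM_EXCEPTION && cur->regs != NULL) {
            if (n == max)
                break;
            out[n++] = pc;          /* still report the marker itself */
            pc = cur->regs->epc;    /* resume in the interrupted code */
            fp = cur->regs->s0;
        }
    }
    return n;
}
```

With regs set to NULL the walk ends at the marker, which mirrors the
shorter trace shown first.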

Here is the error injection test code for the above output:
 drivers/irqchip/irq-riscv-intc.c:
 static asmlinkage void riscv_intc_irq(struct pt_regs *regs)
 {
        unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG;
+       u32 tmp; __get_user(tmp, (u32 *)0);

Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Link: https://lore.kernel.org/r/20221109064937.3643993-3-guoren@kernel.org
[Palmer: use SYM_CODE_*]
Signed-off-by: Palmer Dabbelt

riscv: stacktrace: Fixup ftrace_graph_ret_addr retp argument

2022-12-06T00:58:01+00:00

'retp' is a pointer to the return address on the stack, so we
must pass a pointer to the current return address as the 'retp'
argument to ftrace_push_return_trace(), not the parent function's
return address on the stack.
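
Why the pointer matters can be sketched in userspace. The model below is
an assumption-laden illustration: RETURN_TO_HANDLER, struct shadow_entry
and lookup_ret_addr() are hypothetical names, loosely based on matching
shadow-stack entries by the address of the patched stack slot:

```c
#include <assert.h>
#include <stddef.h>

#define RETURN_TO_HANDLER 0xF000UL    /* hypothetical trampoline address */

/* Hypothetical shadow-stack entry recorded when a slot is patched. */
struct shadow_entry {
    unsigned long ret;            /* original return address */
    const unsigned long *retp;    /* stack slot the address was taken from */
};

/*
 * Recover the original return address for a slot patched by the graph
 * tracer. Entries are matched on the slot's address (retp), so the
 * caller has to pass a pointer to the very slot it read ret from.
 */
static unsigned long lookup_ret_addr(const struct shadow_entry *stack,
                                     size_t depth, unsigned long ret,
                                     const unsigned long *retp)
{
    if (ret != RETURN_TO_HANDLER)
        return ret;               /* slot was never patched */

    for (size_t i = depth; i-- > 0; ) {
        if (stack[i].retp == retp)
            return stack[i].ret;
    }
    return ret;                   /* no match: trampoline address leaks out */
}
```

Passing a pointer to a different frame's slot either finds the wrong
entry or none at all, so the unwinder reports a wrong caller or the
trampoline itself.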

Fixes: b785ec129bd9 ("riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support")
Signed-off-by: Guo Ren 
Signed-off-by: Guo Ren 
Link: https://lore.kernel.org/r/20221109064937.3643993-2-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt

riscv: Rename "sp_in_global" to "current_stack_pointer"

2022-03-30T22:15:27+00:00

To follow the existing per-arch conventions, rename "sp_in_global" to
"current_stack_pointer". This will let it be used in non-arch places
(like HARDENED_USERCOPY).

Signed-off-by: Kees Cook 
Signed-off-by: Palmer Dabbelt

riscv: eliminate unreliable __builtin_frame_address(1)

2022-02-04T18:12:32+00:00

I tried different pieces of code that use __builtin_frame_address(1)
(with both gcc version 7.5.0 and 10.3.0) to verify whether it works as
expected on riscv64. It does not.

The compiler generated the following:
31                      fp = (unsigned long)__builtin_frame_address(1);
   0xffffffff80006024 <+200>:   ld      s1,0(s0)

It takes '0(s0)' as the address of frame 1 (caller), but the actual address
should be '-16(s0)'.

          |       ...       | <-+
          +-----------------+   |
          | return address  |   |
          | previous fp     |   |
          | saved registers |   |
          | local variables |   |
  $fp --> |       ...       |   |
          +-----------------+   |
          | return address  |   |
          | previous fp --------+
          | saved registers |
  $sp --> | local variables |
          +-----------------+
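
The layout above can be walked without the builtin by reading the
{previous fp, return address} pair stored just below the frame pointer.
A userspace sketch, assuming that pair sits at fp - 16 and fp - 8 as
described; frame_record() and unwind() are hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The pair saved just below the address held in s0/fp. */
struct stackframe {
    uintptr_t fp;    /* caller's frame pointer, at -16(s0) on riscv64 */
    uintptr_t ra;    /* return address, at -8(s0) on riscv64 */
};

/* Locate the record that belongs to frame pointer value fp. */
static const struct stackframe *frame_record(uintptr_t fp)
{
    return (const struct stackframe *)fp - 1;
}

/* Follow previous-fp links, collecting return addresses. */
static size_t unwind(uintptr_t fp, uintptr_t *out, size_t max)
{
    size_t n = 0;

    while (fp != 0 && n < max) {
        const struct stackframe *frame = frame_record(fp);

        out[n++] = frame->ra;
        fp = frame->fp;
    }
    return n;
}
```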

As a result, the kernel cannot dump the full stack trace on riscv.

[    7.222126][    T1] Call Trace:
[    7.222804][    T1] [] dump_backtrace+0x2c/0x3a

This problem is not exposed on most riscv builds only because '0(s0)'
happens to be the address of frame 2 (caller's caller) when only ra and
fp are stored in frame 1 (caller).

          |       ...       | <-+
          +-----------------+   |
          | return address  |   |
  $fp --> | previous fp     |   |
          +-----------------+   |
          | return address  |   |
          | previous fp --------+
          | saved registers |
  $sp --> | local variables |
          +-----------------+

This could be a gcc *bug* that should be fixed. But since the gcc
manual notes that "Calling this function with a nonzero argument can have
unpredictable effects, including crashing the calling program.", let's
remove the '__builtin_frame_address(1)' in backtrace code.

With this fix, the full stack trace is shown:
[   10.444838][    T1] Call Trace:
[   10.446199][    T1] [] dump_backtrace+0x2c/0x3a
[   10.447711][    T1] [] show_stack+0x32/0x3e
[   10.448710][    T1] [] dump_stack_lvl+0x58/0x7a
[   10.449941][    T1] [] dump_stack+0x14/0x1c
[   10.450929][    T1] [] ubsan_epilogue+0x10/0x5a
[   10.451869][    T1] [] __ubsan_handle_load_invalid_value+0x6c/0x78
[   10.453049][    T1] [] __pagevec_release+0x62/0x64
[   10.455476][    T1] [] truncate_inode_pages_range+0x132/0x5be
[   10.456798][    T1] [] truncate_inode_pages+0x24/0x30
[   10.457853][    T1] [] kill_bdev+0x32/0x3c
...

Signed-off-by: Changbin Du 
Fixes: eac2f3059e02 ("riscv: stacktrace: fix the riscv stacktrace when CONFIG_FRAME_POINTER enabled")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt

arch: Make ARCH_STACKWALK independent of STACKTRACE

2021-12-10T14:06:03+00:00

Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.

Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by: Peter Zijlstra (Intel) 
[Mark: rebase, drop unnecessary arm change]
Signed-off-by: Mark Rutland 
Cc: Albert Ou 
Cc: Borislav Petkov 
Cc: Christian Borntraeger 
Cc: Dave Hansen 
Cc: Heiko Carstens 
Cc: Ingo Molnar 
Cc: Michael Ellerman 
Cc: Palmer Dabbelt 
Cc: Paul Walmsley 
Cc: Thomas Gleixner 
Cc: Vasily Gorbik 
Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas

sched: Add wrapper for get_wchan() to keep task blocked

2021-10-15T09:25:14+00:00

A stable wchan requires the process to be blocked, and to stay blocked
while the stack is being unwound.

Suggested-by: Peter Zijlstra 
Signed-off-by: Kees Cook 
Signed-off-by: Peter Zijlstra (Intel) 
Acked-by: Geert Uytterhoeven 
Acked-by: Russell King (Oracle)  [arm]
Tested-by: Mark Rutland  [arm64]
Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org

riscv: stacktrace: Fix NULL pointer dereference

2021-07-24T19:58:51+00:00

When CONFIG_FRAME_POINTER=y, calling dump_stack() always triggers a
NULL pointer dereference panic similar to the one below:

[    0.396060] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5+ #47
[    0.396692] Hardware name: riscv-virtio,qemu (DT)
[    0.397176] Call Trace:
[    0.398191] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000960
[    0.399487] Oops [#1]
[    0.399739] Modules linked in:
[    0.400135] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5+ #47
[    0.400570] Hardware name: riscv-virtio,qemu (DT)
[    0.400926] epc : walk_stackframe+0xc4/0xdc
[    0.401291]  ra : dump_backtrace+0x30/0x38
[    0.401630] epc : ffffffff80004922 ra : ffffffff8000496a sp : ffffffe000f3bd00
[    0.402115]  gp : ffffffff80cfdcb8 tp : ffffffe000f30000 t0 : ffffffff80d0b0cf
[    0.402602]  t1 : ffffffff80d0b0c0 t2 : 0000000000000000 s0 : ffffffe000f3bd60
[    0.403071]  s1 : ffffffff808bc2e8 a0 : 0000000000001000 a1 : 0000000000000000
[    0.403448]  a2 : ffffffff803d7088 a3 : ffffffff808bc2e8 a4 : 6131725dbc24d400
[    0.403820]  a5 : 0000000000001000 a6 : 0000000000000002 a7 : ffffffffffffffff
[    0.404226]  s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
[    0.404634]  s5 : ffffffff803d7088 s6 : ffffffff808bc2e8 s7 : ffffffff80630650
[    0.405085]  s8 : ffffffff80912a80 s9 : 0000000000000008 s10: ffffffff804000fc
[    0.405388]  s11: 0000000000000000 t3 : 0000000000000043 t4 : ffffffffffffffff
[    0.405616]  t5 : 000000000000003d t6 : ffffffe000f3baa8
[    0.405793] status: 0000000000000100 badaddr: 0000000000000960 cause: 000000000000000d
[    0.406135] [] walk_stackframe+0xc4/0xdc
[    0.407032] [] dump_backtrace+0x30/0x38
[    0.407797] [] show_stack+0x40/0x4c
[    0.408234] [] dump_stack+0x90/0xb6
[    0.409019] [] ptdump_init+0x20/0xc4
[    0.409681] [] do_one_initcall+0x4c/0x226
[    0.410110] [] kernel_init_freeable+0x1f4/0x258
[    0.410562] [] kernel_init+0x22/0x148
[    0.410959] [] ret_from_exception+0x0/0x14
[    0.412241] ---[ end trace b2ab92c901b96251 ]---
[    0.413099] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

The reason is that task is NULL when walk_stackframe() is finally called;
the NULL is passed down from __dump_stack():

|static void __dump_stack(void)
|{
|        dump_stack_print_info(KERN_DEFAULT);
|        show_stack(NULL, NULL, KERN_DEFAULT);
|}

Fix this issue by checking for the "task == NULL" case in walk_stackframe().
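
A minimal sketch of such a check, as a userspace model. struct task,
its saved_fp field (standing in for the saved s0 of a sleeping task)
and start_fp() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

struct task {
    unsigned long saved_fp;    /* stand-in for the task's saved s0 */
};

/*
 * Pick the frame pointer to start unwinding from. A NULL task means
 * "the caller itself", so it has to be handled before task is
 * dereferenced.
 */
static unsigned long start_fp(const struct task *task,
                              const struct task *current,
                              unsigned long live_fp)
{
    if (task == NULL || task == current)
        return live_fp;        /* unwind the running context */
    return task->saved_fp;     /* unwind a sleeping task */
}
```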

Fixes: eac2f3059e02 ("riscv: stacktrace: fix the riscv stacktrace when CONFIG_FRAME_POINTER enabled")
Signed-off-by: Jisheng Zhang 
Reviewed-by: Atish Patra 
Tested-by: Wende Tan 
Signed-off-by: Palmer Dabbelt

riscv: stacktrace: pin the task's stack in get_wchan

2021-07-24T00:29:03+00:00

Pin the task's stack before calling walk_stackframe() in get_wchan().
This fixes the panic reported by Andreas when CONFIG_VMAP_STACK=y:

[   65.609696] Unable to handle kernel paging request at virtual address ffffffd0003bbde8
[   65.610460] Oops [#1]
[   65.610626] Modules linked in: virtio_blk virtio_mmio rtc_goldfish btrfs blake2b_generic libcrc32c xor raid6_pq sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
[   65.611670] CPU: 2 PID: 1 Comm: systemd Not tainted 5.14.0-rc1-1.g34fe32a-default #1 openSUSE Tumbleweed (unreleased) c62f7109153e5a0897ee58ba52393ad99b070fd2
[   65.612334] Hardware name: riscv-virtio,qemu (DT)
[   65.613008] epc : get_wchan+0x5c/0x88
[   65.613334]  ra : get_wchan+0x42/0x88
[   65.613625] epc : ffffffff800048a4 ra : ffffffff8000488a sp : ffffffd00021bb90
[   65.614008]  gp : ffffffff817709f8 tp : ffffffe07fe91b80 t0 : 00000000000001f8
[   65.614411]  t1 : 0000000000020000 t2 : 0000000000000000 s0 : ffffffd00021bbd0
[   65.614818]  s1 : ffffffd0003bbdf0 a0 : 0000000000000001 a1 : 0000000000000002
[   65.615237]  a2 : ffffffff81618008 a3 : 0000000000000000 a4 : 0000000000000000
[   65.615637]  a5 : ffffffd0003bc000 a6 : 0000000000000002 a7 : ffffffe27d370000
[   65.616022]  s2 : ffffffd0003bbd90 s3 : ffffffff8071a81e s4 : 0000000000003fff
[   65.616407]  s5 : ffffffffffffc000 s6 : 0000000000000000 s7 : ffffffff81618008
[   65.616845]  s8 : 0000000000000001 s9 : 0000000180000040 s10: 0000000000000000
[   65.617248]  s11: 000000000000016b t3 : 000000ff00000000 t4 : 0c6aec92de5e3fd7
[   65.617672]  t5 : fff78f60608fcfff t6 : 0000000000000078
[   65.618088] status: 0000000000000120 badaddr: ffffffd0003bbde8 cause: 000000000000000d
[   65.618621] [] get_wchan+0x5c/0x88
[   65.619008] [] do_task_stat+0x7a2/0xa46
[   65.619325] [] proc_tgid_stat+0xe/0x16
[   65.619637] [] proc_single_show+0x46/0x96
[   65.619979] [] seq_read_iter+0x190/0x31e
[   65.620341] [] seq_read+0xc4/0x104
[   65.620633] [] vfs_read+0x6a/0x112
[   65.620922] [] ksys_read+0x54/0xbe
[   65.621206] [] sys_read+0xe/0x16
[   65.621474] [] ret_from_syscall+0x0/0x2
[   65.622169] ---[ end trace f24856ed2b8789c5 ]---
[   65.622832] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
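
The pinning pattern can be sketched in userspace as below. This is a
single-threaded model with a plain counter, so it shows only the shape
of the pattern. try_get_stack() and put_stack() are hypothetical
helpers modelled on the kernel's try_get_task_stack() and
put_task_stack():

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a task whose stack can be freed. */
struct task {
    int stack_refcount;      /* 0 means the stack is already gone */
    unsigned long *stack;
};

/* Take a reference on the stack unless it is too late. */
static unsigned long *try_get_stack(struct task *t)
{
    if (t->stack_refcount == 0)
        return NULL;
    t->stack_refcount++;
    return t->stack;
}

/* Drop the reference taken by try_get_stack(). */
static void put_stack(struct task *t)
{
    assert(t->stack_refcount > 0);
    t->stack_refcount--;
}

/* Read one word from the task's stack, but only while it is pinned. */
static unsigned long peek_stack(struct task *t, size_t idx)
{
    unsigned long val;

    if (try_get_stack(t) == NULL)
        return 0;            /* stack already released: report nothing */
    val = t->stack[idx];
    put_stack(t);
    return val;
}
```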

Signed-off-by: Jisheng Zhang 
Signed-off-by: Palmer Dabbelt