summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2022-09-29tracing/user_events: Use bits vs bytes for enabled status page dataBeau Belgrave
User processes may require many events and when they do the cache performance of a byte index status check is less ideal than a bit index. The previous event limit per-page was 4096, the new limit is 32,768. This change adds a bitwise index to the user_reg struct. Programs check that the bit at status_bit has a bit set within the status page(s). Link: https://lkml.kernel.org/r/20220728233309.1896-6-beaub@linux.microsoft.com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing/user_events: Use refcount instead of atomic for ref trackingBeau Belgrave
User processes could open up enough event references to cause rollovers. These could cause use after free scenarios, which we do not want. Switching to refcount APIs prevent this, but will leak memory once saturated. Once saturated, user processes can still use the events. This prevents a bad user process from stopping existing telemetry from being emitted. Link: https://lkml.kernel.org/r/20220728233309.1896-5-beaub@linux.microsoft.com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing/user_events: Ensure user provided strings are safely formattedBeau Belgrave
User processes can provide bad strings that may cause issues or leak kernel details back out. Don't trust the content of these strings when formatting strings for matching. This also moves to a consistent dynamic length string creation model. Link: https://lkml.kernel.org/r/20220728233309.1896-4-beaub@linux.microsoft.com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing/user_events: Use WRITE instead of READ for io vector importBeau Belgrave
import_single_range expects the direction/rw to be where it came from, not the protection/limit. Since the import is in a write path use WRITE. Link: https://lkml.kernel.org/r/20220728233309.1896-3-beaub@linux.microsoft.com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing/user_events: Use NULL for strstr checksBeau Belgrave
Trivial fix to ensure strstr checks use NULL instead of 0. Link: https://lkml.kernel.org/r/20220728233309.1896-2-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing: Fix spelling mistake "preapre" -> "prepare"Colin Ian King
There is a spelling mistake in the trace text. Fix it. Link: https://lkml.kernel.org/r/20220928215828.66325-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing: Wake up waiters when tracing is disabledSteven Rostedt (Google)
When tracing is disabled, there's no reason that waiters should stay waiting, wake them up, otherwise tasks get stuck when they should be flushing the buffers. Cc: stable@vger.kernel.org Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29tracing: Add ioctl() to force ring buffer waiters to wake upSteven Rostedt (Google)
If a process is waiting on the ring buffer for data, there currently isn't a clean way to force it to wake up. Add an ioctl call that will force any tasks that are waiting on the trace_pipe_raw file to wake up. Link: https://lkml.kernel.org/r/20220929095029.117f913f@gandalf.local.home Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-29Merge branch 'v6.0-rc7'Peter Zijlstra
Merge upstream to get RAPTORLAKE_S Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2022-09-28tracing: Wake up ring buffer waiters on closing of the fileSteven Rostedt (Google)
When the file that represents the ring buffer is closed, there may be waiters waiting on more input from the ring buffer. Call ring_buffer_wake_waiters() to wake up any waiters when the file is closed. Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-28ring-buffer: Add ring_buffer_wake_waiters()Steven Rostedt (Google)
On closing of a file that represents a ring buffer or flushing the file, there may be waiters on the ring buffer that needs to be woken up and exit the ring_buffer_wait() function. Add ring_buffer_wake_waiters() to wake up the waiters on the ring buffer and allow them to exit the wait loop. Link: https://lkml.kernel.org/r/20220928133938.28dc2c27@gandalf.local.home Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27ring-buffer: Check pending waiters when doing wake ups as wellSteven Rostedt (Google)
The wake up waiters only checks the "wakeup_full" variable and not the "full_waiters_pending". The full_waiters_pending is set when a waiter is added to the wait queue. The wakeup_full is only set when an event is triggered, and it clears the full_waiters_pending to avoid multiple calls to irq_work_queue(). The irq_work callback really needs to check both wakeup_full as well as full_waiters_pending such that this code can be used to wake up waiters when a file is closed that represents the ring buffer and the waiters need to be woken up. Link: https://lkml.kernel.org/r/20220927231824.209460321@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27ring-buffer: Have the shortest_full queue be the shortest not longestSteven Rostedt (Google)
The logic to know when the shortest waiters on the ring buffer should be woken up or not has uses a less than instead of a greater than compare, which causes the shortest_full to actually be the longest. Link: https://lkml.kernel.org/r/20220927231823.718039222@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27bpf: Check flags for branch stack in bpf_read_branch_records helperJiri Olsa
Recent commit [1] changed branch stack data indication from br_stack pointer to sample_flags in perf_sample_data struct. We need to check sample_flags for PERF_SAMPLE_BRANCH_STACK bit for valid branch stack data. [1] a9a931e26668 ("perf: Use sample_flags for branch stack") Fixes: a9a931e26668 ("perf: Use sample_flags for branch stack") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20220927203259.590950-1-jolsa@kernel.org
2022-09-27ring-buffer: Allow splice to read previous partially read pagesSteven Rostedt (Google)
If a page is partially read, and then the splice system call is run against the ring buffer, it will always fail to read, no matter how much is in the ring buffer. That's because the code path for a partial read of the page does will fail if the "full" flag is set. The splice system call wants full pages, so if the read of the ring buffer is not yet full, it should return zero, and the splice will block. But if a previous read was done, where the beginning has been consumed, it should still be given to the splice caller if the rest of the page has been written to. This caused the splice command to never consume data in this scenario, and let the ring buffer just fill up and lose events. Link: https://lkml.kernel.org/r/20220927144317.46be6b80@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 8789a9e7df6bf ("ring-buffer: read page interface") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27ftrace: Fix recursive locking direct_mutex in ftrace_modify_direct_callerSong Liu
Naveen reported recursive locking of direct_mutex with sample ftrace-direct-modify.ko: [ 74.762406] WARNING: possible recursive locking detected [ 74.762887] 6.0.0-rc6+ #33 Not tainted [ 74.763216] -------------------------------------------- [ 74.763672] event-sample-fn/1084 is trying to acquire lock: [ 74.764152] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \ register_ftrace_function+0x1f/0x180 [ 74.764922] [ 74.764922] but task is already holding lock: [ 74.765421] ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \ modify_ftrace_direct+0x34/0x1f0 [ 74.766142] [ 74.766142] other info that might help us debug this: [ 74.766701] Possible unsafe locking scenario: [ 74.766701] [ 74.767216] CPU0 [ 74.767437] ---- [ 74.767656] lock(direct_mutex); [ 74.767952] lock(direct_mutex); [ 74.768245] [ 74.768245] *** DEADLOCK *** [ 74.768245] [ 74.768750] May be due to missing lock nesting notation [ 74.768750] [ 74.769332] 1 lock held by event-sample-fn/1084: [ 74.769731] #0: ffffffff86c9d6b0 (direct_mutex){+.+.}-{3:3}, at: \ modify_ftrace_direct+0x34/0x1f0 [ 74.770496] [ 74.770496] stack backtrace: [ 74.770884] CPU: 4 PID: 1084 Comm: event-sample-fn Not tainted ... [ 74.771498] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... [ 74.772474] Call Trace: [ 74.772696] <TASK> [ 74.772896] dump_stack_lvl+0x44/0x5b [ 74.773223] __lock_acquire.cold.74+0xac/0x2b7 [ 74.773616] lock_acquire+0xd2/0x310 [ 74.773936] ? register_ftrace_function+0x1f/0x180 [ 74.774357] ? lock_is_held_type+0xd8/0x130 [ 74.774744] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.775213] __mutex_lock+0x99/0x1010 [ 74.775536] ? register_ftrace_function+0x1f/0x180 [ 74.775954] ? slab_free_freelist_hook.isra.43+0x115/0x160 [ 74.776424] ? ftrace_set_hash+0x195/0x220 [ 74.776779] ? register_ftrace_function+0x1f/0x180 [ 74.777194] ? kfree+0x3e1/0x440 [ 74.777482] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.777941] ? __schedule+0xb40/0xb40 [ 74.778258] ? register_ftrace_function+0x1f/0x180 [ 74.778672] ? my_tramp1+0xf/0xf [ftrace_direct_modify] [ 74.779128] register_ftrace_function+0x1f/0x180 [ 74.779527] ? ftrace_set_filter_ip+0x33/0x70 [ 74.779910] ? __schedule+0xb40/0xb40 [ 74.780231] ? my_tramp1+0xf/0xf [ftrace_direct_modify] [ 74.780678] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.781147] ftrace_modify_direct_caller+0x5b/0x90 [ 74.781563] ? 0xffffffffa0201000 [ 74.781859] ? my_tramp1+0xf/0xf [ftrace_direct_modify] [ 74.782309] modify_ftrace_direct+0x1b2/0x1f0 [ 74.782690] ? __schedule+0xb40/0xb40 [ 74.783014] ? simple_thread+0x2a/0xb0 [ftrace_direct_modify] [ 74.783508] ? __schedule+0xb40/0xb40 [ 74.783832] ? my_tramp2+0x11/0x11 [ftrace_direct_modify] [ 74.784294] simple_thread+0x76/0xb0 [ftrace_direct_modify] [ 74.784766] kthread+0xf5/0x120 [ 74.785052] ? kthread_complete_and_exit+0x20/0x20 [ 74.785464] ret_from_fork+0x22/0x30 [ 74.785781] </TASK> Fix this by using register_ftrace_function_nolock in ftrace_modify_direct_caller. Link: https://lkml.kernel.org/r/20220927004146.1215303-1-song@kernel.org Fixes: 53cd885bc5c3 ("ftrace: Allow IPMODIFY and DIRECT ops on the same function") Reported-and-tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Song Liu <song@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27ftrace: Properly unset FTRACE_HASH_FL_MODZheng Yejian
When executing following commands like what document said, but the log "#### all functions enabled ####" was not shown as expect: 1. Set a 'mod' filter: $ echo 'write*:mod:ext3' > /sys/kernel/tracing/set_ftrace_filter 2. Invert above filter: $ echo '!write*:mod:ext3' >> /sys/kernel/tracing/set_ftrace_filter 3. Read the file: $ cat /sys/kernel/tracing/set_ftrace_filter By some debugging, I found that flag FTRACE_HASH_FL_MOD was not unset after inversion like above step 2 and then result of ftrace_hash_empty() is incorrect. Link: https://lkml.kernel.org/r/20220926152008.2239274-1-zhengyejian1@huawei.com Cc: <mingo@redhat.com> Cc: stable@vger.kernel.org Fixes: 8c08f0d5c6fb ("ftrace: Have cached module filters be an active filter") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27tracing/eprobe: Fix alloc event dir failed when event name no setTao Chen
The event dir will alloc failed when event name no set, using the command: "echo "e:esys/ syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events" It seems that dir name="syscalls/sys_enter_openat" is not allowed in debugfs. So just use the "sys_enter_openat" as the event name. Link: https://lkml.kernel.org/r/1664028814-45923-1-git-send-email-chentao.kernel@linux.alibaba.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Linyu Yuan <quic_linyyuan@quicinc.com> Cc: Tao Chen <chentao.kernel@linux.alibaba.com Cc: stable@vger.kernel.org Fixes: 95c104c378dc ("tracing: Auto generate event name when creating a group of events") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Tao Chen <chentao.kernel@linux.alibaba.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-27tracing: Disable interrupt or preemption before acquiring arch_spinlock_tWaiman Long
It was found that some tracing functions in kernel/trace/trace.c acquire an arch_spinlock_t with preemption and irqs enabled. An example is the tracing_saved_cmdlines_size_read() function which intermittently causes a "BUG: using smp_processor_id() in preemptible" warning when the LTP read_all_proc test is run. That can be problematic in case preemption happens after acquiring the lock. Add the necessary preemption or interrupt disabling code in the appropriate places before acquiring an arch_spinlock_t. The convention here is to disable preemption for trace_cmdline_lock and interupt for max_lock. Link: https://lkml.kernel.org/r/20220922145622.1744826-1-longman@redhat.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: stable@vger.kernel.org Fixes: a35873a0993b ("tracing: Add conditional snapshot") Fixes: 939c7a4f04fc ("tracing: Introduce saved_cmdlines_size file") Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26bpf: Return value in kprobe get_func_ip only for entry addressJiri Olsa
Changing return value of kprobe's version of bpf_get_func_ip to return zero if the attach address is not on the function's entry point. For kprobes attached in the middle of the function we can't easily get to the function address especially now with the CONFIG_X86_KERNEL_IBT support. If user cares about current IP for kprobes attached within the function body, they can get it with PT_REGS_IP(ctx). Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Martynas Pumputis <m@lambda.lt> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-6-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26bpf: Adjust kprobe_multi entry_ip for CONFIG_X86_KERNEL_IBTJiri Olsa
Martynas reported bpf_get_func_ip returning +4 address when CONFIG_X86_KERNEL_IBT option is enabled. When CONFIG_X86_KERNEL_IBT is enabled we'll have endbr instruction at the function entry, which screws return value of bpf_get_func_ip() helper that should return the function address. There's short term workaround for kprobe_multi bpf program made by Alexei [1], but we need this fixup also for bpf_get_attach_cookie, that returns cookie based on the entry_ip value. Moving the fixup in the fprobe handler, so both bpf_get_func_ip and bpf_get_attach_cookie get expected function address when CONFIG_X86_KERNEL_IBT option is enabled. Also renaming kprobe_multi_link_handler entry_ip argument to fentry_ip so it's clearer this is an ftrace __fentry__ ip. [1] commit 7f0059b58f02 ("selftests/bpf: Fix kprobe_multi test.") Cc: Peter Zijlstra <peterz@infradead.org> Reported-by: Martynas Pumputis <m@lambda.lt> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-5-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26ftrace: Keep the resolved addr in kallsyms_callbackJiri Olsa
Keeping the resolved 'addr' in kallsyms_callback, instead of taking ftrace_location value, because we depend on symbol address in the cookie related code. With CONFIG_X86_KERNEL_IBT option the ftrace_location value differs from symbol address, which screwes the symbol address cookies matching. There are 2 users of this function: - bpf_kprobe_multi_link_attach for which this fix is for - get_ftrace_locations which is used by register_fprobe_syms this function needs to get symbols resolved to addresses, but does not need 'ftrace location addresses' at this point there's another ftrace location translation in the path done by ftrace_set_filter_ips call: register_fprobe_syms addrs = get_ftrace_locations register_fprobe_ips(addrs) ... ftrace_set_filter_ips ... __ftrace_match_addr ip = ftrace_location(ip); ... Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220926153340.1621984-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-26rv/monitor: Add __init/__exit annotations to module init/exit funcsXiu Jianfeng
Add missing __init/__exit annotations to module init/exit funcs. Link: https://lkml.kernel.org/r/20220922103208.162869-1-xiujianfeng@huawei.com Fixes: 24bce201d798 ("tools/rv: Add dot2k") Fixes: 8812d21219b9 ("rv/monitor: Add the wip monitor skeleton created by dot2k") Fixes: ccc319dcb450 ("rv/monitor: Add the wwnr monitor") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing/osnoise: Fix possible recursive locking in stop_per_cpu_kthreadsNico Pache
There is a recursive lock on the cpu_hotplug_lock. In kernel/trace/trace_osnoise.c:<start/stop>_per_cpu_kthreads: - start_per_cpu_kthreads calls cpus_read_lock() and if start_kthreads returns a error it will call stop_per_cpu_kthreads. - stop_per_cpu_kthreads then calls cpus_read_lock() again causing deadlock. Fix this by calling cpus_read_unlock() before calling stop_per_cpu_kthreads. This behavior can also be seen in commit f46b16520a08 ("trace/hwlat: Implement the per-cpu mode"). This error was noticed during the LTP ftrace-stress-test: WARNING: possible recursive locking detected -------------------------------------------- sh/275006 is trying to acquire lock: ffffffffb02f5400 (cpu_hotplug_lock){++++}-{0:0}, at: stop_per_cpu_kthreads but task is already holding lock: ffffffffb02f5400 (cpu_hotplug_lock){++++}-{0:0}, at: start_per_cpu_kthreads other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(cpu_hotplug_lock); lock(cpu_hotplug_lock); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by sh/275006: #0: ffff8881023f0470 (sb_writers#24){.+.+}-{0:0}, at: ksys_write #1: ffffffffb084f430 (trace_types_lock){+.+.}-{3:3}, at: rb_simple_write #2: ffffffffb02f5400 (cpu_hotplug_lock){++++}-{0:0}, at: start_per_cpu_kthreads Link: https://lkml.kernel.org/r/20220919144932.3064014-1-npache@redhat.com Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations") Signed-off-by: Nico Pache <npache@redhat.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing: kprobe: Make gen test module work in arm and riscvYipeng Zou
For now, this selftest module can only work in x86 because of the kprobe cmd was fixed use of x86 registers. This patch adapted to register names under arm and riscv, So that this module can be worked on those platform. Link: https://lkml.kernel.org/r/20220919125629.238242-3-zouyipeng@huawei.com Cc: <linux-riscv@lists.infradead.org> Cc: <mingo@redhat.com> Cc: <paul.walmsley@sifive.com> Cc: <palmer@dabbelt.com> Cc: <aou@eecs.berkeley.edu> Cc: <zanussi@kernel.org> Cc: <liaochang1@huawei.com> Cc: <chris.zjh@huawei.com> Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing: kprobe: Fix kprobe event gen test module on exitYipeng Zou
Correct gen_kretprobe_test clr event para on module exit. This will make it can't to delete. Link: https://lkml.kernel.org/r/20220919125629.238242-2-zouyipeng@huawei.com Cc: <linux-riscv@lists.infradead.org> Cc: <mingo@redhat.com> Cc: <paul.walmsley@sifive.com> Cc: <palmer@dabbelt.com> Cc: <aou@eecs.berkeley.edu> Cc: <zanussi@kernel.org> Cc: <liaochang1@huawei.com> Cc: <chris.zjh@huawei.com> Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing/filter: Call filter predicate functions directly via a switch statementSteven Rostedt (Google)
Due to retpolines, indirect calls are much more expensive than direct calls. The filters have a select set of functions it uses for the predicates. Instead of using function pointers to call them, create a filter_pred_fn_call() function that uses a switch statement to call the predicate functions directly. This gives almost a 10% speedup to the filter logic. Using the histogram benchmark: Before: # event histogram # # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active] # { delta: 113 } hitcount: 272 { delta: 114 } hitcount: 840 { delta: 118 } hitcount: 344 { delta: 119 } hitcount: 25428 { delta: 120 } hitcount: 350590 { delta: 121 } hitcount: 1892484 { delta: 122 } hitcount: 6205004 { delta: 123 } hitcount: 11583521 { delta: 124 } hitcount: 37590979 { delta: 125 } hitcount: 108308504 { delta: 126 } hitcount: 131672461 { delta: 127 } hitcount: 88700598 { delta: 128 } hitcount: 65939870 { delta: 129 } hitcount: 45055004 { delta: 130 } hitcount: 33174464 { delta: 131 } hitcount: 31813493 { delta: 132 } hitcount: 29011676 { delta: 133 } hitcount: 22798782 { delta: 134 } hitcount: 22072486 { delta: 135 } hitcount: 17034113 { delta: 136 } hitcount: 8982490 { delta: 137 } hitcount: 2865908 { delta: 138 } hitcount: 980382 { delta: 139 } hitcount: 1651944 { delta: 140 } hitcount: 4112073 { delta: 141 } hitcount: 3963269 { delta: 142 } hitcount: 1712508 { delta: 143 } hitcount: 575941 After: # event histogram # # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active] # { delta: 103 } hitcount: 60 { delta: 104 } hitcount: 16966 { delta: 105 } hitcount: 396625 { delta: 106 } hitcount: 3223400 { delta: 107 } hitcount: 12053754 { delta: 108 } hitcount: 20241711 { delta: 109 } hitcount: 14850200 { delta: 110 } hitcount: 4946599 { delta: 111 } hitcount: 3479315 { delta: 112 } hitcount: 18698299 { delta: 113 } hitcount: 62388733 { delta: 114 } hitcount: 95803834 { delta: 115 } hitcount: 58278130 { delta: 116 } hitcount: 15364800 { delta: 117 } hitcount: 5586866 { delta: 118 } hitcount: 2346880 { delta: 119 } hitcount: 1131091 { delta: 120 } hitcount: 620896 { delta: 121 } hitcount: 236652 { delta: 122 } hitcount: 105957 { delta: 123 } hitcount: 119107 { delta: 124 } hitcount: 54494 { delta: 125 } hitcount: 63856 { delta: 126 } hitcount: 64454 { delta: 127 } hitcount: 34818 { delta: 128 } hitcount: 41446 { delta: 129 } hitcount: 51242 { delta: 130 } hitcount: 28361 { delta: 131 } hitcount: 23926 The peak before was 126ns per event, after the peak is 114ns, and the fastest time went from 113ns to 103ns. Link: https://lkml.kernel.org/r/20220906225529.781407172@goodmis.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing: Move struct filter_pred into trace_events_filter.cSteven Rostedt (Google)
The structure filter_pred and the typedef of the function used are only referenced by trace_events_filter.c. There's no reason to have it in an external header file. Move them into the only file they are used in. Link: https://lkml.kernel.org/r/20220906225529.598047132@goodmis.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing/hist: Call hist functions directly via a switch statementSteven Rostedt (Google)
Due to retpolines, indirect calls are much more expensive than direct calls. The histograms have a select set of functions it uses for the histograms, instead of using function pointers to call them, create a hist_fn_call() function that uses a switch statement to call the histogram functions directly. This gives a 13% speedup to the histogram logic. Using the histogram benchmark: Before: # event histogram # # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active] # { delta: 129 } hitcount: 2213 { delta: 130 } hitcount: 285965 { delta: 131 } hitcount: 1146545 { delta: 132 } hitcount: 5185432 { delta: 133 } hitcount: 19896215 { delta: 134 } hitcount: 53118616 { delta: 135 } hitcount: 83816709 { delta: 136 } hitcount: 68329562 { delta: 137 } hitcount: 41859349 { delta: 138 } hitcount: 46257797 { delta: 139 } hitcount: 54400831 { delta: 140 } hitcount: 72875007 { delta: 141 } hitcount: 76193272 { delta: 142 } hitcount: 49504263 { delta: 143 } hitcount: 38821072 { delta: 144 } hitcount: 47702679 { delta: 145 } hitcount: 41357297 { delta: 146 } hitcount: 22058238 { delta: 147 } hitcount: 9720002 { delta: 148 } hitcount: 3193542 { delta: 149 } hitcount: 927030 { delta: 150 } hitcount: 850772 { delta: 151 } hitcount: 1477380 { delta: 152 } hitcount: 2687977 { delta: 153 } hitcount: 2865985 { delta: 154 } hitcount: 1977492 { delta: 155 } hitcount: 2475607 { delta: 156 } hitcount: 3403612 After: # event histogram # # trigger info: hist:keys=delta:vals=hitcount:sort=delta:size=2048 if delta > 0 [active] # { delta: 113 } hitcount: 272 { delta: 114 } hitcount: 840 { delta: 118 } hitcount: 344 { delta: 119 } hitcount: 25428 { delta: 120 } hitcount: 350590 { delta: 121 } hitcount: 1892484 { delta: 122 } hitcount: 6205004 { delta: 123 } hitcount: 11583521 { delta: 124 } hitcount: 37590979 { delta: 125 } hitcount: 108308504 { delta: 126 } hitcount: 131672461 { delta: 127 } hitcount: 88700598 { delta: 128 } hitcount: 65939870 { delta: 129 } hitcount: 45055004 { delta: 130 } hitcount: 33174464 { delta: 131 } hitcount: 31813493 { delta: 132 } hitcount: 29011676 { delta: 133 } hitcount: 22798782 { delta: 134 } hitcount: 22072486 { delta: 135 } hitcount: 17034113 { delta: 136 } hitcount: 8982490 { delta: 137 } hitcount: 2865908 { delta: 138 } hitcount: 980382 { delta: 139 } hitcount: 1651944 { delta: 140 } hitcount: 4112073 { delta: 141 } hitcount: 3963269 { delta: 142 } hitcount: 1712508 { delta: 143 } hitcount: 575941 { delta: 144 } hitcount: 351427 { delta: 145 } hitcount: 218077 { delta: 146 } hitcount: 167297 { delta: 147 } hitcount: 146198 { delta: 148 } hitcount: 116122 { delta: 149 } hitcount: 58993 { delta: 150 } hitcount: 40228 The delta above is in nanoseconds. It brings the fastest time down from 129ns to 113ns, and the peak from 141ns to 126ns. Link: https://lkml.kernel.org/r/20220906225529.411545333@goodmis.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing: Add numeric delta time to the trace event benchmarkSteven Rostedt (Google)
In order to testing filtering and histograms via the trace event benchmark, record the delta time of the last event as a numeric value (currently, it just saves it within the string) so that filters and histograms can use it. Link: https://lkml.kernel.org/r/20220906225529.213677569@goodmis.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26rv/monitors: add 'static' qualifier for local symbolsZeng Heng
The sparse tool complains as follows: kernel/trace/rv/monitors/wwnr/wwnr.c:18:19: warning: symbol 'rv_wwnr' was not declared. Should it be static? The `rv_wwnr` symbol is not dereferenced by other extern files, so add static qualifier for it. So does wip module. Link: https://lkml.kernel.org/r/20220824034357.2014202-2-zengheng4@huawei.com Cc: <mingo@redhat.com> Fixes: ccc319dcb450 ("rv/monitor: Add the wwnr monitor") Fixes: 8812d21219b9 ("rv/monitor: Add the wip monitor skeleton created by dot2k") Signed-off-by: Zeng Heng <zengheng4@huawei.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-26tracing/eprobe: Add eprobe filter supportMasami Hiramatsu (Google)
Add the filter option to the event probe. This is useful if user wants to derive a new event based on the condition of the original event. E.g. echo 'e:egroup/stat_runtime_4core sched/sched_stat_runtime \ runtime=$runtime:u32 if cpu < 4' >> ../dynamic_events Then it can filter the events only on first 4 cores. Note that the fields used for 'if' must be the fields in the original events, not eprobe events. Link: https://lkml.kernel.org/r/165932114513.2850673.2592206685744598080.stgit@devnote2 Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-21bpf: Prevent bpf program recursion for raw tracepoint probesJiri Olsa
We got report from sysbot [1] about warnings that were caused by bpf program attached to contention_begin raw tracepoint triggering the same tracepoint by using bpf_trace_printk helper that takes trace_printk_lock lock. Call Trace: <TASK> ? trace_event_raw_event_bpf_trace_printk+0x5f/0x90 bpf_trace_printk+0x2b/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 __unfreeze_partials+0x5b/0x160 ... The can be reproduced by attaching bpf program as raw tracepoint on contention_begin tracepoint. The bpf prog calls bpf_trace_printk helper. Then by running perf bench the spin lock code is forced to take slow path and call contention_begin tracepoint. Fixing this by skipping execution of the bpf program if it's already running, Using bpf prog 'active' field, which is being currently used by trampoline programs for the same reason. Moving bpf_prog_inc_misses_counter to syscall.c because trampoline.c is compiled in just for CONFIG_BPF_JIT option. Reviewed-by: Stanislav Fomichev <sdf@google.com> Reported-by: syzbot+2251879aa068ad9c960d@syzkaller.appspotmail.com [1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#t Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220916071914.7156-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21bpf: Add bpf_verify_pkcs7_signature() kfuncRoberto Sassu
Add the bpf_verify_pkcs7_signature() kfunc, to give eBPF security modules the ability to check the validity of a signature against supplied data, by using user-provided or system-provided keys as trust anchor. The new kfunc makes it possible to enforce mandatory policies, as eBPF programs might be allowed to make security decisions only based on data sources the system administrator approves. The caller should provide the data to be verified and the signature as eBPF dynamic pointers (to minimize the number of parameters) and a bpf_key structure containing a reference to the keyring with keys trusted for signature verification, obtained from bpf_lookup_user_key() or bpf_lookup_system_key(). For bpf_key structures obtained from the former lookup function, bpf_verify_pkcs7_signature() completes the permission check deferred by that function by calling key_validate(). key_task_permission() is already called by the PKCS#7 code. Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Acked-by: KP Singh <kpsingh@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20220920075951.929132-9-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncsRoberto Sassu
Add the bpf_lookup_user_key(), bpf_lookup_system_key() and bpf_key_put() kfuncs, to respectively search a key with a given key handle serial number and flags, obtain a key from a pre-determined ID defined in include/linux/verification.h, and cleanup. Introduce system_keyring_id_check() to validate the keyring ID parameter of bpf_lookup_system_key(). Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20220920075951.929132-8-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-16ftrace: Add HAVE_DYNAMIC_FTRACE_NO_PATCHABLEPeter Zijlstra (Intel)
x86 will shortly start using -fpatchable-function-entry for purposes other than ftrace, make sure the __patchable_function_entry section isn't merged in the mcount_loc section. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220903131154.420467-2-jolsa@kernel.org
2022-09-06rv/reactor: add __init/__exit annotations to module init/exit funcsXiu Jianfeng
Add missing __init/__exit annotations to module init/exit funcs. Link: https://lkml.kernel.org/r/20220906141210.132607-1-xiujianfeng@huawei.com Fixes: 135b881ea885 ("rv/reactor: Add the printk reactor") Fixes: e88043c0ac16 ("rv/reactor: Add the panic reactor") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-06tracing: Fix to check event_mutex is held while accessing trigger listMasami Hiramatsu (Google)
Since the check_user_trigger() is called outside of RCU read lock, this list_for_each_entry_rcu() caused a suspicious RCU usage warning. # echo hist:keys=pid > events/sched/sched_stat_runtime/trigger # cat events/sched/sched_stat_runtime/trigger [ 43.167032] [ 43.167418] ============================= [ 43.167992] WARNING: suspicious RCU usage [ 43.168567] 5.19.0-rc5-00029-g19ebe4651abf #59 Not tainted [ 43.169283] ----------------------------- [ 43.169863] kernel/trace/trace_events_trigger.c:145 RCU-list traversed in non-reader section!! ... However, this file->triggers list is safe when it is accessed under event_mutex is held. To fix this warning, adds a lockdep_is_held check to the list_for_each_entry_rcu(). Link: https://lkml.kernel.org/r/166226474977.223837.1992182913048377113.stgit@devnote2 Cc: stable@vger.kernel.org Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-06tracing: hold caller_addr to hardirq_{enable,disable}_ipYipeng Zou
Currently, The arguments passing to lockdep_hardirqs_{on,off} was fixed in CALLER_ADDR0. The function trace_hardirqs_on_caller should have been intended to use caller_addr to represent the address that caller wants to be traced. For example, lockdep log in riscv showing the last {enabled,disabled} at __trace_hardirqs_{on,off} all the time(if called by): [ 57.853175] hardirqs last enabled at (2519): __trace_hardirqs_on+0xc/0x14 [ 57.853848] hardirqs last disabled at (2520): __trace_hardirqs_off+0xc/0x14 After use trace_hardirqs_xx_caller, we can get more effective information: [ 53.781428] hardirqs last enabled at (2595): restore_all+0xe/0x66 [ 53.782185] hardirqs last disabled at (2596): ret_from_exception+0xa/0x10 Link: https://lkml.kernel.org/r/20220901104515.135162-2-zouyipeng@huawei.com Cc: stable@vger.kernel.org Fixes: c3bc8fd637a96 ("tracing: Centralize preemptirq tracepoints and unify their usage") Signed-off-by: Yipeng Zou <zouyipeng@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-09-06rv/monitors: Make monitor's automata definition staticDaniel Bristot de Oliveira
Monitor's automata definition is only used locally, so make them static for all existing monitors. Link: https://lore.kernel.org/all/202208210332.gtHXje45-lkp@intel.com Link: https://lore.kernel.org/all/202208210358.6HH3OrVs-lkp@intel.com Link: https://lkml.kernel.org/r/a50e27c3738d6ef809f4201857229fed64799234.1661266564.git.bristot@kernel.org Fixes: ccc319dcb450 ("rv/monitor: Add the wwnr monitor") Fixes: 8812d21219b9 ("rv/monitor: Add the wip monitor skeleton created by dot2k") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-22ftrace: Fix build warning for ops_references_rec() not usedWang Jingjin
The change that made IPMODIFY and DIRECT ops work together needed access to the ops_references_ip() function, which it pulled out of the module only code. But now if both CONFIG_MODULES and CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS is not set, we get the below warning: ‘ops_references_rec’ defined but not used. Since ops_references_rec() only calls ops_references_ip() replace the usage of ops_references_rec() with ops_references_ip() and encompass the function with an #ifdef of DIRECT_CALLS || MODULES being defined. Link: https://lkml.kernel.org/r/20220801084745.1187987-1-wangjingjin1@huawei.com Fixes: 53cd885bc5c3 ("ftrace: Allow IPMODIFY and DIRECT ops on the same function") Signed-off-by: Wang Jingjin <wangjingjin1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21Merge tag 'trace-v6.0-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various fixes for tracing: - Fix a return value of traceprobe_parse_event_name() - Fix NULL pointer dereference from failed ftrace enabling - Fix NULL pointer dereference when asking for registers from eprobes - Make eprobes consistent with kprobes/uprobes, filters and histograms" * tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Have filter accept "common_cpu" to be consistent tracing/probes: Have kprobes and uprobes use $COMM too tracing/eprobes: Have event probes be consistent with kprobes and uprobes tracing/eprobes: Fix reading of string fields tracing/eprobes: Do not hardcode $comm as a string tracing/eprobes: Do not allow eprobes to use $stack, or % for regs ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead tracing/perf: Fix double put of trace event when init fails tracing: React to error return from traceprobe_parse_event_name()
2022-08-21tracing: Have filter accept "common_cpu" to be consistentSteven Rostedt (Google)
Make filtering consistent with histograms. As "cpu" can be a field of an event, allow for "common_cpu" to keep it from being confused with the "cpu" field of the event. Link: https://lkml.kernel.org/r/20220820134401.513062765@goodmis.org Link: https://lore.kernel.org/all/20220820220920.e42fa32b70505b1904f0a0ad@kernel.org/ Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Fixes: 1e3bac71c5053 ("tracing/histogram: Rename "cpu" to "common_cpu"") Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21tracing/probes: Have kprobes and uprobes use $COMM tooSteven Rostedt (Google)
Both $comm and $COMM can be used to get current->comm in eprobes and the filtering and histogram logic. Make kprobes and uprobes consistent in this regard and allow both $comm and $COMM as well. Currently kprobes and uprobes only handle $comm, which is inconsistent with the other utilities, and can be confusing to users. Link: https://lkml.kernel.org/r/20220820134401.317014913@goodmis.org Link: https://lore.kernel.org/all/20220820220442.776e1ddaf8836e82edb34d01@kernel.org/ Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Fixes: 533059281ee5 ("tracing: probeevent: Introduce new argument fetching code") Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21tracing/eprobes: Have event probes be consistent with kprobes and uprobesSteven Rostedt (Google)
Currently, if a symbol "@" is attempted to be used with an event probe (eprobes), it will cause a NULL pointer dereference crash. Both kprobes and uprobes can reference data other than the main registers. Such as immediate address, symbols and the current task name. Have eprobes do the same thing. For "comm", if "comm" is used and the event being attached to does not have the "comm" field, then make it the "$comm" that kprobes has. This is consistent to the way histograms and filters work. Link: https://lkml.kernel.org/r/20220820134401.136924220@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21tracing/eprobes: Fix reading of string fieldsSteven Rostedt (Google)
Currently when an event probe (eprobe) hooks to a string field, it does not display it as a string, but instead as a number. This makes the field rather useless. Handle the different kinds of strings, dynamic, static, relational/dynamic etc. Now when a string field is used, the ":string" type can be used to display it: echo "e:sw sched/sched_switch comm=$next_comm:string" > dynamic_events Link: https://lkml.kernel.org/r/20220820134400.959640191@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21tracing/eprobes: Do not hardcode $comm as a stringSteven Rostedt (Google)
The variable $comm is hard coded as a string, which is true for both kprobes and uprobes, but for event probes (eprobes) it is a field name. In most cases the "comm" field would be a string, but there's no guarantee of that fact. Do not assume that comm is a string. Not to mention, it currently forces comm fields to fault, as string processing for event probes is currently broken. Link: https://lkml.kernel.org/r/20220820134400.756152112@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21tracing/eprobes: Do not allow eprobes to use $stack, or % for regsSteven Rostedt (Google)
While playing with event probes (eprobes), I tried to see what would happen if I attempted to retrieve the instruction pointer (%rip) knowing that event probes do not use pt_regs. The result was: BUG: kernel NULL pointer dereference, address: 0000000000000024 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1847 Comm: trace-cmd Not tainted 5.19.0-rc5-test+ #309 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 RIP: 0010:get_event_field.isra.0+0x0/0x50 Code: ff 48 c7 c7 c0 8f 74 a1 e8 3d 8b f5 ff e8 88 09 f6 ff 4c 89 e7 e8 50 6a 13 00 48 89 ef 5b 5d 41 5c 41 5d e9 42 6a 13 00 66 90 <48> 63 47 24 8b 57 2c 48 01 c6 8b 47 28 83 f8 02 74 0e 83 f8 04 74 RSP: 0018:ffff916c394bbaf0 EFLAGS: 00010086 RAX: ffff916c854041d8 RBX: ffff916c8d9fbf50 RCX: ffff916c255d2000 RDX: 0000000000000000 RSI: ffff916c255d2008 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffff916c3a2a0c08 R09: ffff916c394bbda8 R10: 0000000000000000 R11: 0000000000000000 R12: ffff916c854041d8 R13: ffff916c854041b0 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff916c9ea40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000024 CR3: 000000011b60a002 CR4: 00000000001706e0 Call Trace: <TASK> get_eprobe_size+0xb4/0x640 ? __mod_node_page_state+0x72/0xc0 __eprobe_trace_func+0x59/0x1a0 ? __mod_lruvec_page_state+0xaa/0x1b0 ? page_remove_file_rmap+0x14/0x230 ? page_remove_rmap+0xda/0x170 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x18f/0x240 trace_event_raw_event_sched_wakeup_template+0x7a/0xb0 try_to_wake_up+0x260/0x4c0 __wake_up_common+0x80/0x180 __wake_up_common_lock+0x7c/0xc0 do_notify_parent+0x1c9/0x2a0 exit_notify+0x1a9/0x220 do_exit+0x2ba/0x450 do_group_exit+0x2d/0x90 __x64_sys_exit_group+0x14/0x20 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Obviously this is not the desired result. Move the testing for TPARG_FL_TPOINT which is only used for event probes to the top of the "$" variable check, as all the other variables are not used for event probes. Also add a check in the register parsing "%" to fail if an event probe is used. Link: https://lkml.kernel.org/r/20220820134400.564426983@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is deadYang Jihong
ftrace_startup does not remove ops from ftrace_ops_list when ftrace_startup_enable fails: register_ftrace_function ftrace_startup __register_ftrace_function ... add_ftrace_ops(&ftrace_ops_list, ops) ... ... ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1 ... return 0 // ops is in the ftrace_ops_list. When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything: unregister_ftrace_function ftrace_shutdown if (unlikely(ftrace_disabled)) return -ENODEV; // return here, __unregister_ftrace_function is not executed, // as a result, ops is still in the ftrace_ops_list __unregister_ftrace_function ... If ops is dynamically allocated, it will be free later, in this case, is_ftrace_trampoline accesses NULL pointer: is_ftrace_trampoline ftrace_ops_trampoline do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL! Syzkaller reports as follows: [ 1203.506103] BUG: kernel NULL pointer dereference, address: 000000000000010b [ 1203.508039] #PF: supervisor read access in kernel mode [ 1203.508798] #PF: error_code(0x0000) - not-present page [ 1203.509558] PGD 800000011660b067 P4D 800000011660b067 PUD 130fb8067 PMD 0 [ 1203.510560] Oops: 0000 [#1] SMP KASAN PTI [ 1203.511189] CPU: 6 PID: 29532 Comm: syz-executor.2 Tainted: G B W 5.10.0 #8 [ 1203.512324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 1203.513895] RIP: 0010:is_ftrace_trampoline+0x26/0xb0 [ 1203.514644] Code: ff eb d3 90 41 55 41 54 49 89 fc 55 53 e8 f2 00 fd ff 48 8b 1d 3b 35 5d 03 e8 e6 00 fd ff 48 8d bb 90 00 00 00 e8 2a 81 26 00 <48> 8b ab 90 00 00 00 48 85 ed 74 1d e8 c9 00 fd ff 48 8d bb 98 00 [ 1203.518838] RSP: 0018:ffffc900012cf960 EFLAGS: 00010246 [ 1203.520092] RAX: 0000000000000000 RBX: 000000000000007b RCX: ffffffff8a331866 [ 1203.521469] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000010b [ 1203.522583] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8df18b07 [ 1203.523550] R10: fffffbfff1be3160 R11: 0000000000000001 R12: 0000000000478399 [ 1203.524596] R13: 0000000000000000 R14: ffff888145088000 R15: 0000000000000008 [ 1203.525634] FS: 00007f429f5f4700(0000) GS:ffff8881daf00000(0000) knlGS:0000000000000000 [ 1203.526801] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1203.527626] CR2: 000000000000010b CR3: 0000000170e1e001 CR4: 00000000003706e0 [ 1203.528611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1203.529605] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Therefore, when ftrace_startup_enable fails, we need to rollback registration process and remove ops from ftrace_ops_list. Link: https://lkml.kernel.org/r/20220818032659.56209-1-yangjihong1@huawei.com Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-08-21tracing/perf: Fix double put of trace event when init failsSteven Rostedt (Google)
If in perf_trace_event_init(), the perf_trace_event_open() fails, then it will call perf_trace_event_unreg() which will not only unregister the perf trace event, but will also call the put() function of the tp_event. The problem here is that the trace_event_try_get_ref() is called by the caller of perf_trace_event_init() and if perf_trace_event_init() returns a failure, it will then call trace_event_put(). But since the perf_trace_event_unreg() already called the trace_event_put() function, it triggers a WARN_ON(). WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 trace_event_dyn_put_ref+0x15/0x20 If perf_trace_event_reg() does not call the trace_event_try_get_ref() then the perf_trace_event_unreg() should not be calling trace_event_put(). This breaks symmetry and causes bugs like these. Pull out the trace_event_put() from perf_trace_event_unreg() and call it in the locations that perf_trace_event_unreg() is called. This not only fixes this bug, but also brings back the proper symmetry of the reg/unreg vs get/put logic. Link: https://lore.kernel.org/all/cover.1660347763.git.kjlx@templeofstupid.com/ Link: https://lkml.kernel.org/r/20220816192817.43d5e17f@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 1d18538e6a092 ("tracing: Have dynamic events have a ref counter") Reported-by: Krister Johansen <kjlx@templeofstupid.com> Reviewed-by: Krister Johansen <kjlx@templeofstupid.com> Tested-by: Krister Johansen <kjlx@templeofstupid.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>