summaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)Author
2026-02-02ftrace: Fix direct_functions leak in update_ftrace_direct_delJiri Olsa
Alexei reported memory leak in update_ftrace_direct_del. We miss cleanup of the replaced direct_functions in the success path in update_ftrace_direct_del, adding that. Fixes: 8d2c1233f371 ("ftrace: Add update_ftrace_direct_del function") Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Closes: https://lore.kernel.org/bpf/aX_BxG5EJTJdCMT9@krava/T/#m7c13f5a95f862ed7ab78e905fbb678d635306a0c Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20260202075849.1684369-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-31tracing: remove size parameter in __trace_puts()Steven Rostedt
The __trace_puts() function takes a string pointer and the size of the string itself. All users currently simply pass in the strlen() of the string it is also passing in. There's no reason to pass in the size. Instead have the __trace_puts() function do the strlen() within the function itself. This fixes a header recursion issue where using strlen() in the macro calling __trace_puts() requires adding #include <linux/string.h> in order to use strlen(). Removing the use of strlen() from the header fixes the recursion issue. Link: https://lore.kernel.org/all/aUN8Hm377C5A0ILX@yury/ Link: https://lkml.kernel.org/r/20260116042510.241009-6-ynorov@nvidia.com Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Yury Norov <ynorov@nvidia.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Cc: Aaron Tomlin <atomlin@atomlin.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-30tracing: Add kerneldoc to trace_event_buffer_reserve()Steven Rostedt
Add a appropriate kerneldoc to trace_event_buffer_reserve() to make it easier to understand how that function is used. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260130103745.1126e4af@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-30bpf: Have __bpf_trace_run() use rcu_read_lock_dont_migrate()Steven Rostedt
In order to switch the protection of tracepoint callbacks from preempt_disable() to srcu_read_lock_fast() the BPF callback from tracepoints needs to have migration prevention as the BPF programs expect to stay on the same CPU as they execute. Put together the RCU protection with migration prevention and use rcu_read_lock_dont_migrate() in __bpf_trace_run(). This will allow tracepoints callbacks to be preemptible. Link: https://lore.kernel.org/all/CAADnVQKvY026HSFGOsavJppm3-Ajm-VsLzY-OeFUe+BaKMRnDg@mail.gmail.com/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Alexei Starovoitov <ast@kernel.org> Link: https://patch.msgid.link/20260126231256.335034877@kernel.org Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-28tracing: Remove duplicate ENABLE_EVENT_STR and DISABLE_EVENT_STR macrosSteven Rostedt
The macros ENABLE_EVENT_STR and DISABLE_EVENT_STR were added to trace.h so that more than one file can have access to them, but was never removed from their original location. Remove the duplicates. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20260126130037.4ba201f9@gandalf.local.home Fixes: d0bad49bb0a09 ("tracing: Add enable_hist/disable_hist triggers") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-28tracing: Up the hist stacktrace size from 16 to 31Steven Rostedt
Recording stacktraces is very useful, but the size of 16 deep is very restrictive. For example, in seeing where tasks schedule out in a non running state, the following can be used: ~# cd /sys/kernel/tracing ~# echo 'hist:keys=common_stacktrace:vals=hitcount if prev_state & 3' > events/sched/sched_switch/trigger ~# cat events/sched/sched_switch/hist [..] { common_stacktrace: __schedule+0xdc0/0x1860 schedule+0x27/0xd0 schedule_timeout+0xb5/0x100 wait_for_completion+0x8a/0x140 xfs_buf_iowait+0x20/0xd0 [xfs] xfs_buf_read_map+0x103/0x250 [xfs] xfs_trans_read_buf_map+0x161/0x310 [xfs] xfs_btree_read_buf_block+0xa0/0x120 [xfs] xfs_btree_lookup_get_block+0xa3/0x1e0 [xfs] xfs_btree_lookup+0xea/0x530 [xfs] xfs_alloc_fixup_trees+0x72/0x570 [xfs] xfs_alloc_ag_vextent_size+0x67f/0x800 [xfs] xfs_alloc_vextent_iterate_ags.constprop.0+0x52/0x230 [xfs] xfs_alloc_vextent_start_ag+0x9d/0x1b0 [xfs] xfs_bmap_btalloc+0x2af/0x680 [xfs] xfs_bmapi_allocate+0xdb/0x2c0 [xfs] } hitcount: 1 [..] The above stops at 16 functions where knowing more would be useful. As the allocated storage for stacks is the same for strings, and that size is 256 bytes, there is a lot of space not being used for stacktraces. 16 * 8 = 128 Up the size to 31 (it requires the last slot to be zero, so it can't be 32). Also change the BUILD_BUG_ON() to allow the size of the stacktrace storage to be equal to the max size. One slot is used to hold the number of elements in the stack. BUILD_BUG_ON((HIST_STACKTRACE_DEPTH + 1) * sizeof(long) >= STR_VAR_LEN_MAX); Change that from ">=" to just ">", as now they are equal. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260123105415.2be26bf4@gandalf.local.home Reviewed-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-28tracing: Remove notrace from trace_event_raw_event_synth()Steven Rostedt
When debugging the synthetic events, being able to function trace its functions is very useful (now that CONFIG_FUNCTION_SELF_TRACING is available). For some reason trace_event_raw_event_synth() was marked as "notrace", which was totally unnecessary as all of the tracing directory had function tracing disabled until the recent FUNCTION_SELF_TRACING was added. Remove the notrace annotation from trace_event_raw_event_synth() as there's no reason to not trace it when tracing synthetic event functions. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260122204526.068a98c9@gandalf.local.home Acked-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-28tracing: Have hist_debug show what function a field usesSteven Rostedt
When CONFIG_HIST_TRIGGERS_DEBUG is enabled, each trace event has a "hist_debug" file that explains the histogram internal data. This is very useful for debugging histograms. One bit of data that was missing from this file was what function a histogram field uses to process its data. The hist_field structure now has a fn_num that is used by a switch statement in hist_fn_call() to call a function directly (to avoid spectre mitigations). Instead of displaying that number, create a string array that maps to the histogram function enums so that the function for a field may be displayed: ~# cat /sys/kernel/tracing/events/sched/sched_switch/hist_debug [..] hist_data: 0000000043d62762 n_vals: 2 n_keys: 1 n_fields: 3 val fields: hist_data->fields[0]: flags: VAL: HIST_FIELD_FL_HITCOUNT type: u64 size: 8 is_signed: 0 function: hist_field_counter() hist_data->fields[1]: flags: HIST_FIELD_FL_VAR var.name: __arg_3921_2 var.idx (into tracing_map_elt.vars[]): 0 type: unsigned long[] size: 128 is_signed: 0 function: hist_field_nop() key fields: hist_data->fields[2]: flags: HIST_FIELD_FL_KEY ftrace_event_field name: prev_pid type: pid_t size: 8 is_signed: 1 function: hist_field_s32() The "function:" field above is added. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260122203822.58df4d80@gandalf.local.home Reviewed-by: Tom Zanussi <zanussi@kernel.org> Tested-by: Tom Zanussi <zanussi@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-29tracing: kprobe-event: Return directly when trace kprobes is emptysunliming
In enable_boot_kprobe_events(), it returns directly when trace kprobes is empty, thereby reducing the function's execution time. This function may otherwise wait for the event_mutex lock for tens of milliseconds on certain machines, which is unnecessary when trace kprobes is empty. Link: https://lore.kernel.org/all/20260127053848.108473-1-sunliming@linux.dev/ Signed-off-by: sunliming <sunliming@kylinos.cn> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2026-01-28bpf,x86: Use single ftrace_ops for direct callsJiri Olsa
Using single ftrace_ops for direct calls update instead of allocating ftrace_ops object for each trampoline. With single ftrace_ops object we can use update_ftrace_direct_* api that allows multiple ip sites updates on single ftrace_ops object. Adding HAVE_SINGLE_FTRACE_DIRECT_OPS config option to be enabled on each arch that supports this. At the moment we can enable this only on x86 arch, because arm relies on ftrace_ops object representing just single trampoline image (stored in ftrace_ops::direct_call). Archs that do not support this will continue to use *_ftrace_direct api. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-10-jolsa@kernel.org
2026-01-28ftrace: Factor ftrace_ops ops_func interfaceJiri Olsa
We are going to remove "ftrace_ops->private == bpf_trampoline" setup in following changes. Adding ip argument to ftrace_ops_func_t callback function, so we can use it to look up the trampoline. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-9-jolsa@kernel.org
2026-01-28ftrace: Add update_ftrace_direct_mod functionJiri Olsa
Adding update_ftrace_direct_mod function that modifies all entries (ip -> direct) provided in hash argument to direct ftrace ops and updates its attachments. The difference to current modify_ftrace_direct is: - hash argument that allows to modify multiple ip -> direct entries at once This change will allow us to have simple ftrace_ops for all bpf direct interface users in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-7-jolsa@kernel.org
2026-01-28ftrace: Add update_ftrace_direct_del functionJiri Olsa
Adding update_ftrace_direct_del function that removes all entries (ip -> addr) provided in hash argument to direct ftrace ops and updates its attachments. The difference to current unregister_ftrace_direct is - hash argument that allows to unregister multiple ip -> direct entries at once - we can call update_ftrace_direct_del multiple times on the same ftrace_ops object, becase we do not need to unregister all entries at once, we can do it gradualy with the help of ftrace_update_ops function This change will allow us to have simple ftrace_ops for all bpf direct interface users in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-6-jolsa@kernel.org
2026-01-28ftrace: Add update_ftrace_direct_add functionJiri Olsa
Adding update_ftrace_direct_add function that adds all entries (ip -> addr) provided in hash argument to direct ftrace ops and updates its attachments. The difference to current register_ftrace_direct is - hash argument that allows to register multiple ip -> direct entries at once - we can call update_ftrace_direct_add multiple times on the same ftrace_ops object, becase after first registration with register_ftrace_function_nolock, it uses ftrace_update_ops to update the ftrace_ops object This change will allow us to have simple ftrace_ops for all bpf direct interface users in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-5-jolsa@kernel.org
2026-01-28ftrace: Export some of hash related functionsJiri Olsa
We are going to use these functions in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-4-jolsa@kernel.org
2026-01-28ftrace: Make alloc_and_copy_ftrace_hash direct friendlyJiri Olsa
Make alloc_and_copy_ftrace_hash to copy also direct address for each hash entry. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-3-jolsa@kernel.org
2026-01-28ftrace,bpf: Remove FTRACE_OPS_FL_JMP ftrace_ops flagJiri Olsa
At the moment the we allow the jmp attach only for ftrace_ops that has FTRACE_OPS_FL_JMP set. This conflicts with following changes where we use single ftrace_ops object for all direct call sites, so all could be be attached via just call or jmp. We already limit the jmp attach support with config option and bit (LSB) set on the trampoline address. It turns out that's actually enough to limit the jmp attach for architecture and only for chosen addresses (with LSB bit set). Each user of register_ftrace_direct or modify_ftrace_direct can set the trampoline bit (LSB) to indicate it has to be attached by jmp. The bpf trampoline generation code uses trampoline flags to generate jmp-attach specific code and ftrace inner code uses the trampoline bit (LSB) to handle return from jmp attachment, so there's no harm to remove the FTRACE_OPS_FL_JMP bit. The fexit/fmodret performance stays the same (did not drop), current code: fentry : 77.904 ± 0.546M/s fexit : 62.430 ± 0.554M/s fmodret : 66.503 ± 0.902M/s with this change: fentry : 80.472 ± 0.061M/s fexit : 63.995 ± 0.127M/s fmodret : 67.362 ± 0.175M/s Fixes: 25e4e3565d45 ("ftrace: Introduce FTRACE_OPS_FL_JMP") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/bpf/20251230145010.103439-2-jolsa@kernel.org
2026-01-26tracing: Disable trace_printk buffer on warning tooSteven Rostedt
When /proc/sys/kernel/traceoff_on_warning is set to 1, the top level tracing buffer is disabled when a warning happens. This is very useful when debugging and want the tracing buffer to stop taking new data when a warning triggers keeping the events that lead up to the warning from being overwritten. Now that there is also a persistent ring buffer and an option to have trace_printk go to that buffer, the same holds true for that buffer. A warning could happen just before a crash but still write enough events to lose the events that lead up to the first warning that was the reason for the crash. When /proc/sys/kernel/traceoff_on_warning is set to 1 and a warning is triggered, not only disable the top level tracing buffer, but also disable the buffer that trace_printk()s are written to. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://patch.msgid.link/20260121093858.5c5d7e7b@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-26ftrace: Introduce and use ENTRIES_PER_PAGE_GROUP macroGuenter Roeck
ENTRIES_PER_PAGE_GROUP() returns the number of dyn_ftrace entries in a page group, identified by its order. No functional change. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260113152243.3557219-2-linux@roeck-us.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-26tracing: Have show_event_trigger/filter format a bit more in columnsSteven Rostedt
By doing: # trace-cmd sqlhist -e -n futex_wait select TIMESTAMP_DELTA_USECS as lat from sys_enter_futex as start join sys_exit_futex as end on start.common_pid = end.common_pid and # trace-cmd start -e futex_wait -f 'lat > 100' -e page_pool_state_release -f 'pfn == 1' The output of the show_event_trigger and show_event_filter files are well aligned because of the inconsistent 'tab' spacing: ~# cat /sys/kernel/tracing/show_event_triggers syscalls:sys_exit_futex hist:keys=common_pid:vals=hitcount:__lat_12046_2=common_timestamp.usecs-$__arg_12046_1:sort=hitcount:size=2048:clock=global:onmatch(syscalls.sys_enter_futex).trace(futex_wait,$__lat_12046_2) [active] syscalls:sys_enter_futex hist:keys=common_pid:vals=hitcount:__arg_12046_1=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] ~# cat /sys/kernel/tracing/show_event_filters synthetic:futex_wait (lat > 100) page_pool:page_pool_state_release (pfn == 1) This makes it not so easy to read. Instead, force the spacing to be at least 32 bytes from the beginning (one space if the system:event is longer than 30 bytes): ~# cat /sys/kernel/tracing/show_event_triggers syscalls:sys_exit_futex hist:keys=common_pid:vals=hitcount:__lat_8125_2=common_timestamp.usecs-$__arg_8125_1:sort=hitcount:size=2048:clock=global:onmatch(syscalls.sys_enter_futex).trace(futex_wait,$__lat_8125_2) [active] syscalls:sys_enter_futex hist:keys=common_pid:vals=hitcount:__arg_8125_1=common_timestamp.usecs:sort=hitcount:size=2048:clock=global [active] ~# cat /sys/kernel/tracing/show_event_filters synthetic:futex_wait (lat > 100) page_pool:page_pool_state_release (pfn == 1) Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260112153408.18373e73@gandalf.local.home Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-26ring-buffer: Use a housekeeping CPU to wake up waitersPetr Tesarik
Avoid running the wakeup irq_work on an isolated CPU. Since the wakeup can run on any CPU, let's pick a housekeeping CPU to do the job. This change reduces additional noise when tracing isolated CPUs. For example, the following ipi_send_cpu stack trace was captured with nohz_full=2 on the isolated CPU: <idle>-0 [002] d.h4. 1255.379293: ipi_send_cpu: cpu=2 callsite=irq_work_queue+0x2d/0x50 callback=rb_wake_up_waiters+0x0/0x80 <idle>-0 [002] d.h4. 1255.379329: <stack trace> => trace_event_raw_event_ipi_send_cpu => __irq_work_queue_local => irq_work_queue => ring_buffer_unlock_commit => trace_buffer_unlock_commit_regs => trace_event_buffer_commit => trace_event_raw_event_x86_irq_vector => __sysvec_apic_timer_interrupt => sysvec_apic_timer_interrupt => asm_sysvec_apic_timer_interrupt => pv_native_safe_halt => default_idle => default_idle_call => do_idle => cpu_startup_entry => start_secondary => common_startup_64 The IRQ work interrupt alone adds considerable noise, but the impact can get even worse with PREEMPT_RT, because the IRQ work interrupt is then handled by a separate kernel thread. This requires a task switch and makes tracing useless for analyzing latency on an isolated CPU. After applying the patch, the trace is similar, but ipi_send_cpu always targets a non-isolated CPU. Unfortunately, irq_work_queue_on() is not NMI-safe. When running in NMI context, fall back to queuing the irq work on the local CPU. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Clark Williams <clrkwllms@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Link: https://patch.msgid.link/20260108132132.2473515-1-ptesarik@suse.com Signed-off-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-26tracing: Check the return value of tracing_update_buffers()Steven Rostedt
In the very unlikely event that tracing_update_buffers() fails in trace_printk_init_buffers(), report the failure so that it is known. Link: https://lore.kernel.org/all/20220917020353.3836285-1-floridsleeves@gmail.com/ Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260107161510.4dc98b15@gandalf.local.home Suggested-by: Li Zhong <floridsleeves@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-26tracing: Add show_event_triggers to expose active event triggersAaron Tomlin
To audit active event triggers, userspace currently must traverse the events/ directory and read each individual trigger file. This is cumbersome for system-wide auditing or debugging. Introduce "show_event_triggers" at the trace root directory. This file displays all events that currently have one or more triggers applied, alongside the trigger configuration, in a consolidated system:event [tab] trigger format. The implementation leverages the existing trace_event_file iterators and uses the trigger's own print() operation to ensure output consistency with the per-event trigger files. Link: https://patch.msgid.link/20260105142939.2655342-3-atomlin@atomlin.com Signed-off-by: Aaron Tomlin <atomlin@atomlin.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-26tracing: Add show_event_filters to expose active event filtersAaron Tomlin
Currently, to audit active Ftrace event filters, userspace must recursively traverse the events/ directory and read each individual filter file. This is inefficient for monitoring tools and debugging. Introduce "show_event_filters" at the trace root directory. This file displays all events that currently have a filter applied, alongside the actual filter string, in a consolidated system:event [tab] filter format. The implementation reuses the existing trace_event_file iterators to ensure atomic traversal of the event list and utilises guard(rcu)() for automatic, scope-based protection when accessing volatile filter strings. Link: https://patch.msgid.link/20260105142939.2655342-2-atomlin@atomlin.com Signed-off-by: Aaron Tomlin <atomlin@atomlin.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-26tracing: Replace use of system_wq with system_dfl_wqMarco Crivellari
This patch continues the effort to refactor workqueue APIs, which has begun with the changes introducing new workqueues and a new alloc_workqueue flag: commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq") commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag") The point of the refactoring is to eventually alter the default behavior of workqueues to become unbound by default so that their workload placement is optimized by the scheduler. Before that to happen after a careful review and conversion of each individual case, workqueue users must be converted to the better named new workqueues with no intended behaviour changes: system_wq -> system_percpu_wq system_unbound_wq -> system_dfl_wq This specific workflow has no benefits being per-cpu, so instead of system_percpu_wq the new unbound workqueue has been used (system_dfl_wq). This way the old obsolete workqueues (system_wq, system_unbound_wq) can be removed in the future. Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251230142820.173712-1-marco.crivellari@suse.com Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-26tracing: Add bitmask-list option for human-readable bitmask displayAaron Tomlin
Add support for displaying bitmasks in human-readable list format (e.g., 0,2-5,7) in addition to the default hexadecimal bitmap representation. This is particularly useful when tracing CPU masks and other large bitmasks where individual bit positions are more meaningful than their hexadecimal encoding. When the "bitmask-list" option is enabled, the printk "%*pbl" format specifier is used to render bitmasks as comma-separated ranges, making trace output easier to interpret for complex CPU configurations and large bitmask values. Link: https://patch.msgid.link/20251226160724.2246493-2-atomlin@atomlin.com Signed-off-by: Aaron Tomlin <atomlin@atomlin.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-26tracing: Remove redundant call to event_trigger_reset_filter() in ↵Steven Rostedt
event_hist_trigger_parse() With the change to replace kfree() with trigger_data_free(), which starts out doing the exact same thing as event_trigger_reset_filter(), there's no reason to call event_trigger_reset_filter() before calling trigger_data_free(). Remove the call to it. Link: https://lore.kernel.org/linux-trace-kernel/20251211204520.0f3ba6d1@fedora/ Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Miaoqian Lin <linmq006@gmail.com> Link: https://patch.msgid.link/20260108174429.2d9ca51f@gandalf.local.home Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-26tracing: Properly process error handling in event_hist_trigger_parse()Miaoqian Lin
Memory allocated with trigger_data_alloc() requires trigger_data_free() for proper cleanup. Replace kfree() with trigger_data_free() to fix this. Found via static analysis and code review. This isn't a real bug due to the current code basically being an open coded version of trigger_data_free() without the synchronization. The synchronization isn't needed as this is the error path of creation and there's nothing to synchronize against yet. Replace the kfree() to be consistent with the allocation. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20251211100058.2381268-1-linmq006@gmail.com Fixes: e1f187d09e11 ("tracing: Have existing event_command.parse() implementations use helpers") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-24bpf: support fsession for bpf_session_is_returnMenglong Dong
If fsession exists, we will use the bit (1 << BPF_TRAMP_IS_RETURN_SHIFT) in ((u64 *)ctx)[-1] to store the "is_return" flag. The logic of bpf_session_is_return() for fsession is implemented in the verifier by inline following code: bool bpf_session_is_return(void *ctx) { return (((u64 *)ctx)[-1] >> BPF_TRAMP_IS_RETURN_SHIFT) & 1; } Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Co-developed-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20260124062008.8657-5-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24bpf: change prototype of bpf_session_{cookie,is_return}Menglong Dong
Add the function argument of "void *ctx" to bpf_session_cookie() and bpf_session_is_return(), which is a preparation of the next patch. The two kfunc is seldom used now, so it will not introduce much effect to change their function prototype. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20260124062008.8657-4-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-24bpf: use the least significant byte for the nr_args in trampolineMenglong Dong
For now, ((u64 *)ctx)[-1] is used to store the nr_args in the trampoline. However, 1 byte is enough to store such information. Therefore, we use only the least significant byte of ((u64 *)ctx)[-1] to store the nr_args, and reserve the rest for other usages. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Link: https://lore.kernel.org/r/20260124062008.8657-3-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-23function_graph: Fix args pointer mismatch in print_graph_retval()Donglin Peng
When funcgraph-args and funcgraph-retaddr are both enabled, many kernel functions display invalid parameters in trace logs. The issue occurs because print_graph_retval() passes a mismatched args pointer to print_function_args(). Fix this by retrieving the correct args pointer using the FGRAPH_ENTRY_ARGS() macro. Link: https://patch.msgid.link/20260112021601.1300479-1-dolinux.peng@gmail.com Fixes: f83ac7544fbf ("function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-23tracing: Avoid possible signed 64-bit truncationIan Rogers
64-bit truncation to 32-bit can result in the sign of the truncated value changing. The cmp_mod_entry is used in bsearch and so the truncation could result in an invalid search order. This would only happen were the addresses more than 2GB apart and so unlikely, but let's fix the potentially broken compare anyway. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260108002625.333331-1-irogers@google.com Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-23tracing: Fix crash on synthetic stacktrace field usageSteven Rostedt
When creating a synthetic event based on an existing synthetic event that had a stacktrace field and the new synthetic event used that field a kernel crash occurred: ~# cd /sys/kernel/tracing ~# echo 's:stack unsigned long stack[];' > dynamic_events ~# echo 'hist:keys=prev_pid:s0=common_stacktrace if prev_state & 3' >> events/sched/sched_switch/trigger ~# echo 'hist:keys=next_pid:s1=$s0:onmatch(sched.sched_switch).trace(stack,$s1)' >> events/sched/sched_switch/trigger The above creates a synthetic event that takes a stacktrace when a task schedules out in a non-running state and passes that stacktrace to the sched_switch event when that task schedules back in. It triggers the "stack" synthetic event that has a stacktrace as its field (called "stack"). ~# echo 's:syscall_stack s64 id; unsigned long stack[];' >> dynamic_events ~# echo 'hist:keys=common_pid:s2=stack' >> events/synthetic/stack/trigger ~# echo 'hist:keys=common_pid:s3=$s2,i0=id:onmatch(synthetic.stack).trace(syscall_stack,$i0,$s3)' >> events/raw_syscalls/sys_exit/trigger The above makes another synthetic event called "syscall_stack" that attaches the first synthetic event (stack) to the sys_exit trace event and records the stacktrace from the stack event with the id of the system call that is exiting. When enabling this event (or using it in a historgram): ~# echo 1 > events/synthetic/syscall_stack/enable Produces a kernel crash! BUG: unable to handle page fault for address: 0000000000400010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP PTI CPU: 6 UID: 0 PID: 1257 Comm: bash Not tainted 6.16.3+deb14-amd64 #1 PREEMPT(lazy) Debian 6.16.3-1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:trace_event_raw_event_synth+0x90/0x380 Code: c5 00 00 00 00 85 d2 0f 84 e1 00 00 00 31 db eb 34 0f 1f 00 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 <49> 8b 04 24 48 83 c3 01 8d 0c c5 08 00 00 00 01 cd 41 3b 5d 40 0f RSP: 0018:ffffd2670388f958 EFLAGS: 00010202 RAX: ffff8ba1065cc100 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: fffff266ffda7b90 RDI: ffffd2670388f9b0 RBP: 0000000000000010 R08: ffff8ba104e76000 R09: ffffd2670388fa50 R10: ffff8ba102dd42e0 R11: ffffffff9a908970 R12: 0000000000400010 R13: ffff8ba10a246400 R14: ffff8ba10a710220 R15: fffff266ffda7b90 FS: 00007fa3bc63f740(0000) GS:ffff8ba2e0f48000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000400010 CR3: 0000000107f9e003 CR4: 0000000000172ef0 Call Trace: <TASK> ? __tracing_map_insert+0x208/0x3a0 action_trace+0x67/0x70 event_hist_trigger+0x633/0x6d0 event_triggers_call+0x82/0x130 trace_event_buffer_commit+0x19d/0x250 trace_event_raw_event_sys_exit+0x62/0xb0 syscall_exit_work+0x9d/0x140 do_syscall_64+0x20a/0x2f0 ? trace_event_raw_event_sched_switch+0x12b/0x170 ? save_fpregs_to_fpstate+0x3e/0x90 ? _raw_spin_unlock+0xe/0x30 ? finish_task_switch.isra.0+0x97/0x2c0 ? __rseq_handle_notify_resume+0xad/0x4c0 ? __schedule+0x4b8/0xd00 ? restore_fpregs_from_fpstate+0x3c/0x90 ? switch_fpu_return+0x5b/0xe0 ? do_syscall_64+0x1ef/0x2f0 ? do_fault+0x2e9/0x540 ? __handle_mm_fault+0x7d1/0xf70 ? count_memcg_events+0x167/0x1d0 ? handle_mm_fault+0x1d7/0x2e0 ? do_user_addr_fault+0x2c3/0x7f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The reason is that the stacktrace field is not labeled as such, and is treated as a normal field and not as a dynamic event that it is. In trace_event_raw_event_synth() the event is field is still treated as a dynamic array, but the retrieval of the data is considered a normal field, and the reference is just the meta data: // Meta data is retrieved instead of a dynamic array str_val = (char *)(long)var_ref_vals[val_idx]; // Then when it tries to process it: len = *((unsigned long *)str_val) + 1; It triggers a kernel page fault. To fix this, first when defining the fields of the first synthetic event, set the filter type to FILTER_STACKTRACE. This is used later by the second synthetic event to know that this field is a stacktrace. When creating the field of the new synthetic event, have it use this FILTER_STACKTRACE to know to create a stacktrace field to copy the stacktrace into. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20260122194824.6905a38e@gandalf.local.home Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-21bpf: support bpf_get_func_arg() for BPF_TRACE_RAW_TPMenglong Dong
For now, bpf_get_func_arg() and bpf_get_func_arg_cnt() is not supported by the BPF_TRACE_RAW_TP, which is not convenient to get the argument of the tracepoint, especially for the case that the position of the arguments in a tracepoint can change. The target tracepoint BTF type id is specified during loading time, therefore we can get the function argument count from the function prototype instead of the stack. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20260121044348.113201-2-dongml2@chinatelecom.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-20kallsyms/ftrace: set module buildid in ftrace_mod_address_lookup()Petr Mladek
__sprint_symbol() might access an invalid pointer when kallsyms_lookup_buildid() returns a symbol found by ftrace_mod_address_lookup(). The ftrace lookup function must set both @modname and @modbuildid the same way as module_address_lookup(). Link: https://lkml.kernel.org/r/20251128135920.217303-7-pmladek@suse.com Fixes: 9294523e3768 ("module: add printk formats to add module build ID to stacktraces") Signed-off-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20bpf: Fix memory access flags in helper prototypesZesen Liu
After commit 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking"), the verifier started relying on the access type flags in helper function prototypes to perform memory access optimizations. Currently, several helper functions utilizing ARG_PTR_TO_MEM lack the corresponding MEM_RDONLY or MEM_WRITE flags. This omission causes the verifier to incorrectly assume that the buffer contents are unchanged across the helper call. Consequently, the verifier may optimize away subsequent reads based on this wrong assumption, leading to correctness issues. For bpf_get_stack_proto_raw_tp, the original MEM_RDONLY was incorrect since the helper writes to the buffer. Change it to ARG_PTR_TO_UNINIT_MEM which correctly indicates write access to potentially uninitialized memory. Similar issues were recently addressed for specific helpers in commit ac44dcc788b9 ("bpf: Fix verifier assumptions of bpf_d_path's output buffer") and commit 2eb7648558a7 ("bpf: Specify access type of bpf_sysctl_get_name args"). Fix these prototypes by adding the correct memory access flags. Fixes: 37cce22dbd51 ("bpf: verifier: Refactor helper access type tracking") Co-developed-by: Shuran Liu <electronlsr@gmail.com> Signed-off-by: Shuran Liu <electronlsr@gmail.com> Co-developed-by: Peili Gao <gplhust955@gmail.com> Signed-off-by: Peili Gao <gplhust955@gmail.com> Co-developed-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Haoran Ni <haoran.ni.cs@gmail.com> Signed-off-by: Zesen Liu <ftyghome@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260120-helper_proto-v3-1-27b0180b4e77@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-16bpf: Add __force annotations to silence sparse warningsMykyta Yatsenko
Add __force annotations to casts that convert between __user and kernel address spaces. These casts are intentional: - In bpf_send_signal_common(), the value is stored in si_value.sival_ptr which is typed as void __user *, but the value comes from a BPF program parameter. - In the bpf_*_dynptr() kfuncs, user pointers are cast to const void * before being passed to copy helper functions that correctly handle the user address space through copy_from_user variants. Without __force, sparse reports: warning: cast removes address space '__user' of expression Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260115184509.3585759-1-mykyta.yatsenko5@gmail.com Closes: https://lore.kernel.org/oe-kbuild-all/202601131740.6C3BdBaB-lkp@intel.com/
2026-01-15arm64/ftrace,bpf: Fix partial regs after bpf_prog_runJiri Olsa
Mahe reported issue with bpf_override_return helper not working when executed from kprobe.multi bpf program on arm. The problem is that on arm we use alternate storage for pt_regs object that is passed to bpf_prog_run and if any register is changed (which is the case of bpf_override_return) it's not propagated back to actual pt_regs object. Fixing this by introducing and calling ftrace_partial_regs_update function to propagate the values of changed registers (ip and stack). Reported-by: Mahe Tardy <mahe.tardy@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/bpf/20260112121157.854473-1-jolsa@kernel.org
2026-01-15ftrace: Do not over-allocate ftrace memoryGuenter Roeck
The pg_remaining calculation in ftrace_process_locs() assumes that ENTRIES_PER_PAGE multiplied by 2^order equals the actual capacity of the allocated page group. However, ENTRIES_PER_PAGE is PAGE_SIZE / ENTRY_SIZE (integer division). When PAGE_SIZE is not a multiple of ENTRY_SIZE (e.g. 4096 / 24 = 170 with remainder 16), high-order allocations (like 256 pages) have significantly more capacity than 256 * 170. This leads to pg_remaining being underestimated, which in turn makes skip (derived from skipped - pg_remaining) larger than expected, causing the WARN(skip != remaining) to trigger. Extra allocated pages for ftrace: 2 with 654 skipped WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7295 ftrace_process_locs+0x5bf/0x5e0 A similar problem in ftrace_allocate_records() can result in allocating too many pages. This can trigger the second warning in ftrace_process_locs(). Extra allocated pages for ftrace WARNING: CPU: 0 PID: 0 at kernel/trace/ftrace.c:7276 ftrace_process_locs+0x548/0x580 Use the actual capacity of a page group to determine the number of pages to allocate. Have ftrace_allocate_pages() return the number of allocated pages to avoid having to calculate it. Use the actual page group capacity when validating the number of unused pages due to skipped entries. Drop the definition of ENTRIES_PER_PAGE since it is no longer used. Cc: stable@vger.kernel.org Fixes: 4a3efc6baff93 ("ftrace: Update the mcount_loc check of skipped entries") Link: https://patch.msgid.link/20260113152243.3557219-1-linux@roeck-us.net Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc5Alexei Starovoitov
Cross-merge BPF and other fixes after downstream PR. No conflicts. Adjacent: Auto-merging MAINTAINERS Auto-merging Makefile Auto-merging kernel/bpf/verifier.c Auto-merging kernel/sched/ext.c Auto-merging mm/memcontrol.c Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-01-12verification/rvgen: Remove unused variable declaration from containersGabriele Monaco
The monitor container source files contained a declaration and a definition for the rv_monitor variable. The former is superfluous and can be removed. Remove the variable declaration from the template as well as the existing monitor containers. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20251126104241.291258-9-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-01-12verification/dot2c: Remove superfluous enum assignment and add last commaGabriele Monaco
The header files generated by dot2c currently create enums for states and events assigning the first element to 0. This is superfluous as it happens automatically if no value is specified. Also it doesn't add a comma to the last enum elements, which slightly complicates the diff if states or events are added. Remove the assignment to 0 and add a comma to last elements, this simplifies the logic for the code generator. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20251126104241.291258-8-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-01-12rv: Refactor da_monitor to minimise macrosGabriele Monaco
The da_monitor helper functions are generated from macros of the type: DECLARE_DA_FUNCTION(name, type) \ static void da_func_x_##name(type arg) {} \ static void da_func_y_##name(type arg) {} \ This is good to minimise code duplication but the long macros made of skipped end of lines is rather hard to parse. Since functions are static, the advantage of naming them differently for each monitor is minimal. Refactor the da_monitor.h file to minimise macros, instead of declaring functions from macros, we simply declare them with the same name for all monitors (e.g. da_func_x) and for any remaining reference to the monitor name (e.g. tracepoints, enums, global variables) we use the CONCATENATE macro. In this way the file is much easier to maintain while keeping the same generality. Functions depending on the monitor types are now conditionally compiled according to the value of RV_MON_TYPE, which must be defined in the monitor source. The monitor type can be specified as in the original implementation, although it's best to keep the default implementation (unsigned char) as not all parts of code support larger data types, and likely there's no need. We keep the empty macro definitions to ease review of this change with diff tools, but cleanup is required. Also adapt existing monitors to keep the build working. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20251126104241.291258-2-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-01-07trace: ftrace_dump_on_oops[] is not exported, make it staticBen Dooks
The ftrace_dump_on_oops string is not used outside of trace.c so make it static to avoid the export warning from sparse: kernel/trace/trace.c:141:6: warning: symbol 'ftrace_dump_on_oops' was not declared. Should it be static? Fixes: dd293df6395a2 ("tracing: Move trace sysctls into trace.c") Link: https://patch.msgid.link/20260106231054.84270-1-ben.dooks@codethink.co.uk Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-07tracing: Add recursion protection in kernel stack trace recordingSteven Rostedt
A bug was reported about an infinite recursion caused by tracing the rcu events with the kernel stack trace trigger enabled. The stack trace code called back into RCU which then called the stack trace again. Expand the ftrace recursion protection to add a set of bits to protect events from recursion. Each bit represents the context that the event is in (normal, softirq, interrupt and NMI). Have the stack trace code use the interrupt context to protect against recursion. Note, the bug showed an issue in both the RCU code as well as the tracing stacktrace code. This only handles the tracing stack trace side of the bug. The RCU fix will be handled separately. Link: https://lore.kernel.org/all/20260102122807.7025fc87@gandalf.local.home/ Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Boqun Feng <boqun.feng@gmail.com> Link: https://patch.msgid.link/20260105203141.515cd49f@gandalf.local.home Reported-by: Yao Kai <yaokai34@huawei.com> Tested-by: Yao Kai <yaokai34@huawei.com> Fixes: 5f5fa7ea89dc ("rcu: Don't use negative nesting depth in __rcu_read_unlock()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-07ring-buffer: Avoid softlockup in ring_buffer_resize() during memory freeWupeng Ma
When user resize all trace ring buffer through file 'buffer_size_kb', then in ring_buffer_resize(), kernel allocates buffer pages for each cpu in a loop. If the kernel preemption model is PREEMPT_NONE and there are many cpus and there are many buffer pages to be freed, it may not give up cpu for a long time and finally cause a softlockup. To avoid it, call cond_resched() after each cpu buffer free as Commit f6bd2c92488c ("ring-buffer: Avoid softlockup in ring_buffer_resize()") does. Detailed call trace as follow: rcu: INFO: rcu_sched self-detected stall on CPU rcu: 24-....: (14837 ticks this GP) idle=521c/1/0x4000000000000000 softirq=230597/230597 fqs=5329 rcu: (t=15004 jiffies g=26003221 q=211022 ncpus=96) CPU: 24 UID: 0 PID: 11253 Comm: bash Kdump: loaded Tainted: G EL 6.18.2+ #278 NONE pc : arch_local_irq_restore+0x8/0x20 arch_local_irq_restore+0x8/0x20 (P) free_frozen_page_commit+0x28c/0x3b0 __free_frozen_pages+0x1c0/0x678 ___free_pages+0xc0/0xe0 free_pages+0x3c/0x50 ring_buffer_resize.part.0+0x6a8/0x880 ring_buffer_resize+0x3c/0x58 __tracing_resize_ring_buffer.part.0+0x34/0xd8 tracing_resize_ring_buffer+0x8c/0xd0 tracing_entries_write+0x74/0xd8 vfs_write+0xcc/0x288 ksys_write+0x74/0x118 __arm64_sys_write+0x24/0x38 Cc: <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251228065008.2396573-1-mawupeng1@huawei.com Signed-off-by: Wupeng Ma <mawupeng1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-01-07tracing: Drop unneeded assignment to soft_modeJulia Lawall
soft_mode is not read in the enable case, so drop the assignment. Drop also the comment text that refers to the assignment and realign the comment. Cc: "Paul E . McKenney" <paulmck@kernel.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251226110531.4129794-1-Julia.Lawall@inria.fr Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-12-21bpf: move recursion detection logic to helpersPuranjay Mohan
BPF programs detect recursion by doing atomic inc/dec on a per-cpu active counter from the trampoline. Create two helpers for operations on this active counter, this makes it easy to changes the recursion detection logic in future. This commit makes no functional changes. Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20251219184422.2899902-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-19Merge tag 'trace-v6.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Add Documentation/core-api/tracepoint.rst to TRACING in MAINTAINERS file Updates to the tracepoint.rst document should be reviewed by the tracing maintainers. - Fix warning triggered by perf attaching to synthetic events The synthetic events do not add a function to be registered when perf attaches to them. This causes a warning when perf registers a synthetic event and passes a NULL pointer to the tracepoint register function. Ideally synthetic events should be updated to work with perf, but as that's a feature and not a bug fix, simply now return -ENODEV when perf tries to register an event that has a NULL pointer for its function. This no longer causes a kernel warning and simply causes the perf code to fail with an error message. - Fix 32bit overflow in option flag test The option's flags changed from 32 bits in size to 64 bits in size. Fix one of the places that shift 1 by the option bit number to to be 1ULL. - Fix the output of printing the direct jmp functions The enabled_functions that shows how functions are being attached by ftrace wasn't updated to accommodate the new direct jmp trampolines that set the LSB of the pointer, and outputs garbage. Update the output to handle the direct jmp trampolines. * tag 'trace-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ftrace: Fix address for jmp mode in t_show() tracing: Fix UBSAN warning in __remove_instance() tracing: Do not register unsupported perf events MAINTAINERS: add tracepoint core-api doc files to TRACING