| Age | Commit message | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer updates from Steven Rostedt
- Do not invalidate entire buffer for invalid sub-buffers
For the persistent ring buffer, if one sub-buffer is found to be
invalid, it invalidates the entire per CPU ring buffer. This can lose
a lot of valuable data if there's some corruption with the writes to
the buffer not syncing properly on a hard crash. Instead, if a
sub-buffer is found to be invalid, simply zero it out and mark it for
"missed events".
When the persistent ring buffer is read and a sub-buffer that was
cleared due to being invalid on boot up is discovered, the output
will show "[LOST EVENTS]" to let the user know that events were
missing at that location. Displaying the events from valid buffers
can still be useful.
- Add a test to be able to test corrupted sub-buffers
If a persistent ring buffer is created as "ptracingtest" and the new
config that adds the test is enabled, when a panic happens, the
kernel will randomly corrupt one of the per CPU ring buffers. On boot
up, the sub-buffers with the corruption should be cleared and
flagged. When reading this buffer, the missed events should show
[LOST EVENTS].
- Add commit number in the sub-buffer meta debug info
The commit is used to know the content of a meta page. Add it to the
buffer_meta file that is shown for each per CPU buffer.
- Clean up the persistent ring buffer validation code
Add some helper functions and make variable names more consistent.
* tag 'trace-ring-buffer-v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Better comment the use of RB_MISSED_EVENTS
ring-buffer: Show persistent buffer dropped events in trace_pipe file
ring-buffer: Show persistent buffer dropped events in trace file
ring-buffer: Have dropped subbuffers be persistent across reboots
ring-buffer: Cleanup buffer_data_page related code
ring-buffer: Cleanup persistent ring buffer validation
ring-buffer: Show commit numbers in buffer_meta file
ring-buffer: Add persistent ring buffer invalid-page inject test
ring-buffer: Skip invalid sub-buffers when rewinding persistent ring buffer
ring-buffer: Skip invalid sub-buffers when validating persistent ring buffer
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"SMP load-balancing updates:
- A large series to introduce infrastructure for cache-aware load
balancing, with the goal of co-locating tasks that share data
within the same Last Level Cache (LLC) domain. By improving cache
locality, the scheduler can reduce cache bouncing and cache misses,
ultimately improving data access efficiency.
Implemented by Chen Yu and Tim Chen, based on early prototype work
by Peter Zijlstra, with fixes by Jianyong Wu, Peter Zijlstra and
Shrikanth Hegde.
- A series to simplify CONFIG_SCHED_SMT ifdef usage (Shrikanth Hegde)
Fair scheduler updates:
- A series to improve SD_ASYM_CPUCAPACITY scheduling by introducing
SMT awareness (Andrea Righi, K Prateek Nayak)
- A series to optimize cfs_rq and sched_entity allocation for better
data locality (Zecheng Li)
- A preparatory series to change fair/cgroup scheduling to a single
runqueue, without the final change (Peter Zijlstra)
- Auto-manage ext/fair dl_server bandwidth (Andrea Righi)
- Fix cpu_util runnable_avg arithmetic (Hongyan Xia)
- Optimize update_tg_load_avg()'s rate-limiting code (Rik van Riel)
- Allow account_cfs_rq_runtime() to throttle current hierarchy
(K Prateek Nayak)
- Update util_est after updating util_avg during dequeue, to fix the
util signal update logic, which reduces signal noise (Vincent
Guittot)
Scheduler topology updates:
- Allow multiple domains to claim sched_domain_shared (K Prateek
Nayak)
- Add parameter to split LLC (Peter Zijlstra)
Core scheduler updates:
- Use trace_call__<tp>() to save a static branch (Gabriele Monaco)
Scheduler statistics updates:
- Drop now-stale mul_u64_u64_div_u64() cputime over-approximation
guard (Nicolas Pitre)
Deadline scheduler updates:
- Reject debugfs dl_server writes for offline CPUs (Andrea Righi)
- Fix replenishment logic for non-deferred servers (Yuri Andriaccio)
RT scheduling updates:
- Turn RT_PUSH_IPI default off for non PREEMPT_RT (Steven Rostedt)
- Update default bandwidth for real-time tasks to 1.0 (Yuri
Andriaccio)
Proxy scheduling updates:
- A series to implement Optimized Donor Migration for Proxy Execution
(John Stultz, Peter Zijlstra)
- Various proxy scheduling cleanups and fixes (Peter Zijlstra,
K Prateek Nayak)
Misc fixes, improvements and cleanups by Aaron Lu, Andrea Righi,
Zenghui Yu, Chen Yu, Guanyou.Chen, John Stultz, Shrikanth Hegde,
Peter Zijlstra, Liang Luo and Yiyang Chen"
* tag 'sched-core-2026-06-14' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (91 commits)
sched/fair: Fix newidle vs core-sched
sched/deadline: Use task_on_rq_migrating() helper
sched/core: Combine separate 'else' and 'if' statements
sched/fair: Fix cpu_util runnable_avg arithmetic
sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime()
sched/fair: Move the throttled tasks to a local list in tg_unthrottle_up()
sched/fair: Call update_curr() before unthrottling the hierarchy
sched/fair: Use throttled_csd_list for local unthrottle
sched/fair: Convert cfs bandwidth throttling to use guards
sched/fair: Allocate cfs_tg_state with percpu allocator
sched/fair: Remove task_group->se pointer array
sched/fair: Co-locate cfs_rq and sched_entity in cfs_tg_state
sched: restore timer_slack_ns when resetting RT policy on fork
MAINTAINERS: Fix spelling mistake in Peter's name
sched: Simplify ttwu_runnable()
sched/proxy: Remove superfluous clear_task_blocked_in()
sched/proxy: Remove PROXY_WAKING
sched/proxy: Switch proxy to use p->is_blocked
sched/proxy: Only return migrate when needed
sched: Be more strict about p->is_blocked
...
|
|
If the persistent ring buffer is detected on boot up to have a corrupted
sub-buffer, that sub-buffer is cleared to zero and its commit value has
the RB_MISSED_EVENTS bit set. That bit lets the "trace", "trace_pipe"
and "trace_pipe_raw" files know that events were dropped, so that
"[LOST EVENTS]" can be reported.
Only in this case does that bit get set in the writeable portion of the
ring buffer. When events are dropped in the normal ring buffer, that
information is stored in the cpu_buffer descriptor and the
RB_MISSED_EVENTS is set in the buffer page at the time the page is
consumed. It is never set in the writeable portion of the buffer.
Add comments to describe this better as it can be confusing to know when
the RB_MISSED_EVENTS are set in the commit portion of the buffer page.
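The flag handling described above can be sketched in user-space C. This is only an illustration: the bit position and every sketch_* name are assumptions made for the example, not the definitions used in ring_buffer.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the top bit of the commit word is the "missed events" flag. */
#define SKETCH_MISSED_EVENTS (UINT32_C(1) << 31)

/* Number of committed data bytes, with the flag bit stripped. */
static inline uint32_t sketch_commit_size(uint32_t commit)
{
	return commit & ~SKETCH_MISSED_EVENTS;
}

/* Non-zero if a reader should report "[LOST EVENTS]" for this sub-buffer. */
static inline int sketch_missed_events(uint32_t commit)
{
	return (commit & SKETCH_MISSED_EVENTS) != 0;
}

/* Commit value of a sub-buffer cleared on boot: no data, flag set. */
static inline uint32_t sketch_invalidated_commit(void)
{
	return SKETCH_MISSED_EVENTS;
}
```

Keeping the marker in the commit word itself means it lives in the persistent memory along with the sub-buffer, which is why it is the only case where the bit appears in the writeable portion.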
Link: https://lore.kernel.org/all/20260529001500.14178455a046a5cbc6180861@kernel.org/
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260528223738.41276c0e@fedora
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
When the persistent ring buffer is validated on boot up, if a subbuffer is
deemed invalid, it resets the buffer and continues. Have the code preserve
the RB_MISSED_EVENTS flag in the commit portion of the subbuffer header
and pass that back so that the trace_pipe file can show the missed events
like the trace file does.
For example:
<...>-1242 [005] d.... 4429.120116: page_fault_user: address=0x7ffaebb6e728 ip=0x7ffaeb9d4960 error_code=0x7
<...>-1242 [005] ..... 4429.120124: mm_page_alloc: page=00000000055254f3 pfn=0x1373bd order=0 migratetype=1 gfp_flags=GFP_HIGHUSER_MOVABLE|__GFP_COMP
<...>-1242 [005] d..2. 4429.120132: tlb_flush: pages:1 reason:local MM shootdown (3)
CPU:5 [LOST EVENTS]
<...>-1242 [005] d.... 4429.120661: page_fault_user: address=0x55ba7c2d0944 ip=0x55ba7c20cd02 error_code=0x7
<...>-1242 [005] ..... 4429.120669: mm_page_alloc: page=0000000005a02500 pfn=0x12b6e4 order=0 migratetype=1 gfp_flags=GFP_HIGHUSER_MOVABLE|__GFP_COMP
<...>-1242 [005] d..2. 4429.120680: tlb_flush: pages:1 reason:local MM shootdown (3)
Link: https://patch.msgid.link/20260522171052.156419479@kernel.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
When the persistent ring buffer is validated on boot up, if a subbuffer is
deemed invalid, it resets the buffer and continues. Currently, these lost
events are not shown in the trace file output.
Have the trace iterator look for subbuffers that have the RB_MISSED_EVENTS
set and set the iter->missed_events flag when it is detected. This will
then have the trace file show "[LOST EVENTS]" when it reads across a
subbuffer that was corrupted and invalidated.
For example:
<...>-1016 [005] ...1. 6230.660403: preempt_disable: caller=__mod_memcg_state+0x1c8/0x200 parent=__mod_memcg_state+0x1c8/0x200
CPU:5 [LOST EVENTS]
<...>-1016 [005] ..... 6230.660673: kmem_cache_alloc: call_site=__anon_vma_prepare+0x1ad/0x1e0 ptr=000000006e40294c name=anon_vma bytes_req=200 bytes_alloc=208 gfp_flags=GFP_KERNEL node=-1 accounted=true
Link: https://patch.msgid.link/20260522171052.006276604@kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
When the persistent ring buffer detects a corrupted subbuffer, it will
zero its size and report dropped pages in the dmesg, then it continues
normally.
But if a reboot happens without clearing or restarting tracing on the
persistent ring buffer, the next boot will show no pages are dropped.
If the persistent ring buffer is still the same, then it should still
report dropped pages so the user knows that the buffer has missing events.
Add the RB_MISSED_EVENTS flag to the commit value of the subbuffer so that
the next boot will still show that pages were dropped.
Link: https://patch.msgid.link/20260522171051.860780286@kernel.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Code cleanup related to buffer_data_page for readability,
which includes:
- Introduce rb_data_page_commit() and rb_data_page_size()
- Use 'dpage' for buffer_data_page, instead of 'bpage' because
'bpage' is used for buffer_page.
Link: https://patch.msgid.link/20260522171051.722645963@kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Cleanup rb_meta_validate_events() function to make it easier to read.
This includes the following cleanups:
- Introduce rb_validatation_state to hold working variables in
validation.
- Move repeated validation state updates into rb_validate_buffer().
- Move reader_page injection code outside of rb_meta_validate_events().
Link: https://patch.msgid.link/20260522171051.577231395@kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
In addition to the index number, show the commit numbers of
each data page in the per_cpu buffer_meta file.
This is useful for understanding the current status of the
persistent ring buffer. (Note that this file is shown
only for persistent ring buffer and its backup instance)
Link: https://patch.msgid.link/20260522171051.424411323@kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Add a self-corrupting test for the persistent ring buffer.
This will inject an erroneous value to some sub-buffer pages (where
the index is even or multiples of 5) in the persistent ring buffer
when the kernel panics, and checks whether the number of detected
invalid pages and the total entry_bytes are the same as the recorded
values after reboot.
This ensures that the kernel can correctly recover a partially
corrupted persistent ring buffer after a reboot or panic.
The test only runs on the persistent ring buffer whose name is
"ptracingtest". The user has to fill it with events before a
kernel panic.
To run the test, enable CONFIG_RING_BUFFER_PERSISTENT_INJECT
and add the following kernel cmdline:
reserve_mem=20M:2M:trace trace_instance=ptracingtest^traceoff@trace
panic=1
Run the following commands after the 1st boot:
cd /sys/kernel/tracing/instances/ptracingtest
echo 1 > tracing_on
echo 1 > events/enable
sleep 3
echo c > /proc/sysrq-trigger
After panic message, the kernel will reboot and run the verification
on the persistent ring buffer, e.g.
Ring buffer meta [2] invalid buffer page detected
Ring buffer meta [2] is from previous boot! (318 pages discarded)
Ring buffer testing [2] invalid pages: PASSED (318/318)
Ring buffer testing [2] entry_bytes: PASSED (1300476/1300476)
Link: https://patch.msgid.link/20260522171051.260140328@kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Skip invalid sub-buffers when rewinding the persistent ring buffer
instead of stopping the rewind. The skipped
buffers are cleared.
To ensure the rewinding stops at the unused page, this also clears
buffer_data_page::time_stamp when tracing resets the buffer. This
allows us to identify unused pages and empty pages.
Link: https://patch.msgid.link/20260522171051.091265852@kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
[ SDR: Have reader_page still get evaluated if header_page fails ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Skip invalid sub-buffers when validating the persistent ring buffer
instead of discarding the entire ring buffer. Only skipped buffers
are invalidated (cleared).
If the cache data in memory fails to be synchronized during a reboot,
the persistent ring buffer may become partially corrupted, but other
sub-buffers may still contain readable event data. Only discard the
subbuffers that are found to be corrupted.
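A minimal user-space sketch of the "discard only what is corrupted" idea follows. The structure layout, the validity check and all sketch_* names are hypothetical stand-ins; the real validation walks the events of each sub-buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_SUBBUF_DATA 64

struct sketch_subbuf {
	uint64_t time_stamp;
	uint32_t commit;                         /* bytes of data in use */
	unsigned char data[SKETCH_SUBBUF_DATA];
};

/* Stand-in integrity check: a commit larger than the data area is corrupt. */
static int sketch_subbuf_valid(const struct sketch_subbuf *sb)
{
	return sb->commit <= SKETCH_SUBBUF_DATA;
}

/*
 * Validate every sub-buffer. Invalid ones are cleared, valid ones are
 * left untouched. Returns the number of sub-buffers discarded.
 */
int sketch_validate(struct sketch_subbuf *sb, int nr)
{
	int discarded = 0;

	for (int i = 0; i < nr; i++) {
		if (sketch_subbuf_valid(&sb[i]))
			continue;
		memset(&sb[i], 0, sizeof(sb[i]));
		discarded++;
	}
	return discarded;
}
```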
Link: https://lore.kernel.org/all/20260520185018.051228084@kernel.org/
Link: https://patch.msgid.link/20260522171050.914418536@kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
[SDR: Fixed max_loops in rb_iter_peek() as well ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
On real hardware, panic and machine reboot may not flush hardware cache
to memory. This means the persistent ring buffer, which relies on a
coherent state of memory, may not have its events written to the buffer
and they may be lost. Moreover, there may be inconsistency with the
counters which are used for validation of the integrity of the
persistent ring buffer which may cause all data to be discarded.
To avoid this issue, stop recording of the ring buffer on panic and
flush the cache of the ring buffer's memory.
Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance")
Cc: stable@vger.kernel.org
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
When tracing is active while reading the trace file, if the iterator
reading the buffer detects that the writer has passed the iterator head,
it will reset and set a "missed events" flag. This flag is passed to the
output processing to show the user that events were missed:
CPU:4 [LOST EVENTS]
The problem is that the flag is reset after it is checked in
ring_buffer_iter_dropped(). But the "trace" file iterates over all the CPU
ring buffers and it will check if they are dropped when figuring out which
buffer to print next. This prematurely clears the missed_events flag if
the CPU buffer with the missed events is not the one that is printed next.
On the iteration where the CPU buffer with the missed events is printed,
the check if it had missed events would return false and the output does
not show that events were missed.
Do not reset the missed_events flag when checking if there were missed
events, but instead clear it when moving the iterator head to the next
event.
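The change in behavior can be sketched with a toy iterator. The structure and the sketch_* functions are hypothetical; only the rule "checking does not clear, advancing does" comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_iter {
	int  head;           /* index of the next event to print */
	bool missed_events;  /* set when the writer overran this iterator */
};

/* Checking has no side effects, so every CPU buffer can be peeked at. */
bool sketch_iter_dropped(const struct sketch_iter *iter)
{
	return iter->missed_events;
}

/* The flag is consumed only when the iterator actually moves on. */
void sketch_iter_advance(struct sketch_iter *iter)
{
	iter->missed_events = false;
	iter->head++;
}
```

With this split, a caller that checks all CPU buffers before deciding which one to print still sees the flag when that buffer's turn comes.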
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260520220801.4fd09d13@fedora
Fixes: c9b7a4a72ff64 ("ring-buffer/tracing: Have iterator acknowledge dropped events")
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is disabled, sched_clock() is
already assumed to provide stable semantics, but the public header
doesn't provide a sched_clock_stable() stub for that case.
Add a header stub that always returns true and clean up the duplicate
local stub in ring_buffer.c, so callers can use sched_clock_stable()
unconditionally.
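The shape of such a header stub might look like the following sketch. The bool return type and the sketch_can_trust_timestamps() caller are assumptions for illustration; the actual header may differ.

```c
#include <assert.h>
#include <stdbool.h>

#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
/* Real implementation lives elsewhere and may report an unstable clock. */
extern bool sched_clock_stable(void);
#else
/* Without the option, the clock is assumed stable by definition. */
static inline bool sched_clock_stable(void)
{
	return true;
}
#endif

/* Hypothetical caller: no local #ifdef is needed around the check. */
bool sketch_can_trust_timestamps(void)
{
	return sched_clock_stable();
}
```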
Signed-off-by: Yiyang Chen <cyyzero16@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://patch.msgid.link/56e45338858946cd9581b75c8bd45dd37dba52c5.1778773587.git.cyyzero16@gmail.com
|
|
The cpu_buffer->reader_page is updated if there are unwound
pages. After that update, skip the page if it is the
original reader_page, because the original reader_page has already
been checked.
Cc: stable@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ian Rogers <irogers@google.com>
Link: https://patch.msgid.link/177701353063.2223789.1471163147644103306.stgit@mhiramat.tok.corp.google.com
Fixes: ca296d32ece3 ("tracing: ring_buffer: Rewind persistent ring buffer on reboot")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
As pointed out by Smatch, the ring-buffer descriptor array page_va is
counted by nr_page_va, but the accessor ring_buffer_desc_page() allows
access off by one.
Currently, this does not cause problems, as the page ID always comes
from a trusted source. Nonetheless, ensure robustness and fix the
accessor. While at it, make the page_id unsigned.
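The off-by-one can be illustrated with a simplified accessor. The sketch_desc structure (with a fixed-size array) and sketch_desc_page() are hypothetical stand-ins for the real descriptor and ring_buffer_desc_page().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor: page_va[] holds nr_page_va valid entries. */
struct sketch_desc {
	unsigned int  nr_page_va;
	unsigned long page_va[4];
};

/* Returns the slot for page_id, or NULL when page_id is out of range. */
unsigned long *sketch_desc_page(struct sketch_desc *desc, unsigned int page_id)
{
	/* '>=' rather than '>': index nr_page_va is one past the end. */
	if (page_id >= desc->nr_page_va)
		return NULL;
	return &desc->page_va[page_id];
}
```

Making page_id unsigned also removes the need for a separate negative-index check.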
Link: https://patch.msgid.link/20260410124527.3563970-1-vdonnefort@google.com
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The header_page tracefs metadata currently reports overwrite as an
int field with size 1. That makes parsers warn about a type and
size mismatch even though the field is only used as a one-byte flag
within commit.
Keep the shared offset with commit as-is, but report overwrite as
char so the declared type matches the hardcoded size. The signedness
is already carried separately by the emitted signed field.
Link: https://patch.msgid.link/20260406165333.46052-1-create0818@163.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216999
Signed-off-by: Cao Ruichuang <create0818@163.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
On CPU hotplug, if it is the first time a trace_buffer sees a CPU, a
ring_buffer_per_cpu will be allocated and its corresponding bit toggled
in the cpumask. Many readers check this cpumask to know if they can
safely read the ring_buffer_per_cpu but they are doing so without memory
ordering and may observe the cpumask bit set while having NULL buffer
pointer.
Enforce the memory read ordering by sending an IPI to all online CPUs.
The hotplug path is a slow-path anyway and it saves us from adding read
barriers in numerous call sites.
Link: https://patch.msgid.link/20260401053659.3458961-1-vdonnefort@google.com
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The testing for tracing was triggering a timestamp count issue that was
always off by one. This has been happening for some time but has never
been reported by anyone else. It was finally discovered to be an issue
with the "uptime" (jiffies) clock that happened to be traced and the
internal recursion caused the discrepancy. This would have been much
easier to solve if the clock function being used was displayed when the
error was detected.
Add the clock function to the error output.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260323202212.479bb288@gandalf.local.home
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
trace/ring-buffer/core
The commit f35dbac69421 ("ring-buffer: Fix to update per-subbuf entries of
persistent ring buffer") was a fix and merged upstream. It is needed for
some other work in the ring buffer. The current branch has the remote
buffer code that is shared with the Arm64 subsystem and can't be rebased.
Merge in the upstream commit to allow continuing of the ring buffer work.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since the validation loop in rb_meta_validate_events() updates the same
cpu_buffer->head_page->entries, the other subbuf entries are not updated.
Fix to use head_page to update the entries field, since it is the cursor
in this loop.
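A reduced sketch of the cursor bug, using hypothetical sketch_* types in place of the real buffer structures:

```c
#include <assert.h>

struct sketch_page {
	unsigned long entries;
	struct sketch_page *next;
};

struct sketch_cpu_buffer {
	struct sketch_page *head_page;
};

/* Store one entry count per sub-buffer while walking the page list. */
void sketch_update_entries(struct sketch_cpu_buffer *cpu_buffer,
			   const unsigned long *counts, int nr_subbufs)
{
	struct sketch_page *head_page = cpu_buffer->head_page;

	for (int i = 0; i < nr_subbufs; i++) {
		/*
		 * The buggy form wrote cpu_buffer->head_page->entries here,
		 * which overwrites the first page on every iteration.
		 */
		head_page->entries = counts[i];
		head_page = head_page->next;
	}
}
```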
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ian Rogers <irogers@google.com>
Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
Link: https://patch.msgid.link/177391153882.193994.17158784065013676533.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In preparation for allowing the writing of ring-buffer compliant pages
outside of ring_buffer.c, move buffer_data_page and timestamps encoding
macros into the publicly available ring_buffer_types.h.
Link: https://patch.msgid.link/20260309162516.2623589-13-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Just like for the kernel events directory, add 'enable', 'header_page'
and 'header_event' at the root of the trace remote events/ directory.
Link: https://patch.msgid.link/20260309162516.2623589-11-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Hopefully, the remote will only swap pages on the kernel instruction (via
the swap_reader_page() callback). This means we know at what point the
ring-buffer geometry has changed. It is therefore possible to rearrange
the kernel view of that ring-buffer to allow non-consuming read.
Link: https://patch.msgid.link/20260309162516.2623589-5-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add ring-buffer remotes to support entities outside of the kernel (such
as firmware or a hypervisor) that write events into a ring-buffer using
the tracefs format.
Require a description of the ring-buffer pages (struct
trace_buffer_desc) and callbacks (swap_reader_page and reset) to set up
the ring-buffer on the kernel side.
Expect the remote entity to maintain and update the meta-page.
Link: https://patch.msgid.link/20260309162516.2623589-4-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The subbuf_ids field allows pointing to a specific page of the
ring-buffer based on its ID. In preparation for the upcoming
ring-buffer remote support, point this array to the buffer_page instead
of the buffer_data_page.
Link: https://patch.msgid.link/20260309162516.2623589-3-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add two fields pages_touched and pages_lost to the ring-buffer
meta-page. Those fields are useful to get the number of used pages in
the ring-buffer.
Link: https://patch.msgid.link/20260309162516.2623589-2-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When a process forks, the child process copies the parent's VMAs but the
user_mapped reference count is not incremented. As a result, when both the
parent and child processes exit, tracing_buffers_mmap_close() is called
twice. On the second call, user_mapped is already 0, causing the function to
return -ENODEV and triggering a WARN_ON.
Normally, this isn't an issue as the memory is mapped with VM_DONTCOPY set.
But this is only a hint, and the application can call
madvise(MADV_DOFORK) which resets the VM_DONTCOPY flag. When the
application does that, it can trigger this issue on fork.
Fix it by incrementing the user_mapped reference count without re-mapping
the pages in the VMA's open callback.
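The reference-count imbalance and its fix can be modeled in a few lines. This is a toy model with hypothetical sketch_* names, not the tracing mmap code; the -1 return stands in for the error path that triggered the warning.

```c
#include <assert.h>

struct sketch_buffer {
	int user_mapped;   /* number of live user-space mappings */
};

/* First mmap(): map the pages and take the first reference. */
void sketch_mmap(struct sketch_buffer *b)
{
	b->user_mapped++;
}

/* VMA open callback (fork copied the VMA): only take a reference. */
void sketch_vma_open(struct sketch_buffer *b)
{
	b->user_mapped++;
}

/* VMA close callback: returns -1 on an unbalanced close. */
int sketch_vma_close(struct sketch_buffer *b)
{
	if (b->user_mapped == 0)
		return -1;
	b->user_mapped--;
	return 0;
}
```

Without the open callback, the parent's and child's closes would both run against a count of one, and the second would hit the error path.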
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://patch.msgid.link/20260227025842.1085206-1-wangqing7171@gmail.com
Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer")
Reported-by: syzbot+3b5dd2030fe08afdf65d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=3b5dd2030fe08afdf65d
Tested-by: syzbot+3b5dd2030fe08afdf65d@syzkaller.appspotmail.com
Signed-off-by: Qing Wang <wangqing7171@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
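To illustrate why the converted calls return "TYPE *", here is a user-space analog. The sketch_alloc_obj() and sketch_alloc_objs() macros are hypothetical and built on calloc(), so unlike kmalloc they zero the memory and take no GFP flags; __typeof__ is a GCC/Clang extension, assumed here so that TYPE may also be an expression such as *VAR.

```c
#include <assert.h>
#include <stdlib.h>

/* One object of TYPE (a type name or an expression like *ptr). */
#define sketch_alloc_obj(TYPE) \
	((__typeof__(TYPE) *)calloc(1, sizeof(TYPE)))

/* COUNT objects of TYPE; calloc() checks the multiplication for overflow. */
#define sketch_alloc_objs(TYPE, COUNT) \
	((__typeof__(TYPE) *)calloc((COUNT), sizeof(TYPE)))

struct sketch_point {
	int x;
	int y;
};
```

Because the macro result is already typed, assigning it to a pointer of a different type produces a compiler diagnostic instead of silently converting from "void *".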
|
|
Check the event length before adding it to compute the next index in
rb_read_data_buffer(). Since this function is used for validating
possibly broken ring buffers, the length of the event could be broken.
In that case, the next event (e + len) can point to a wrong address.
To avoid invalid memory access at boot, check whether the length of
each event is in the possible range before using it.
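The range check can be sketched with a toy event format. The one-byte length header and sketch_read_events() are assumptions for the example; the real event encoding is different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy event format: one length byte followed by that many payload bytes.
 * Returns the number of events, or -1 if a length runs past 'commit'.
 */
int sketch_read_events(const uint8_t *data, size_t commit)
{
	size_t e = 0;
	int events = 0;

	while (e < commit) {
		size_t len = (size_t)data[e] + 1;   /* header + payload */

		/* Check the length before using it to step to e + len. */
		if (len > commit - e)
			return -1;
		e += len;
		events++;
	}
	return events;
}
```

Comparing against commit - e rather than computing e + len first keeps the check itself from overflowing.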
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
Link: https://patch.msgid.link/177123421541.142205.9414352170164678966.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
There is a pointer head_page in rb_meta_validate_events() which is not
initialized at the beginning of a function. This pointer can be dereferenced
if there is a failure during reader page validation. In this case the control
is passed to "invalid" label where the pointer is dereferenced in a loop.
To fix the issue, initialize orig_head and head_page before calling
rb_validate_buffer.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260213100130.2013839-1-d.dulov@aladdin.ru
Closes: https://lore.kernel.org/r/202406130130.JtTGRf7W-lkp@intel.com/
Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Avoid running the wakeup irq_work on an isolated CPU. Since the wakeup can
run on any CPU, let's pick a housekeeping CPU to do the job.
This change reduces additional noise when tracing isolated CPUs. For
example, the following ipi_send_cpu stack trace was captured with
nohz_full=2 on the isolated CPU:
<idle>-0 [002] d.h4. 1255.379293: ipi_send_cpu: cpu=2 callsite=irq_work_queue+0x2d/0x50 callback=rb_wake_up_waiters+0x0/0x80
<idle>-0 [002] d.h4. 1255.379329: <stack trace>
=> trace_event_raw_event_ipi_send_cpu
=> __irq_work_queue_local
=> irq_work_queue
=> ring_buffer_unlock_commit
=> trace_buffer_unlock_commit_regs
=> trace_event_buffer_commit
=> trace_event_raw_event_x86_irq_vector
=> __sysvec_apic_timer_interrupt
=> sysvec_apic_timer_interrupt
=> asm_sysvec_apic_timer_interrupt
=> pv_native_safe_halt
=> default_idle
=> default_idle_call
=> do_idle
=> cpu_startup_entry
=> start_secondary
=> common_startup_64
The IRQ work interrupt alone adds considerable noise, but the impact can
get even worse with PREEMPT_RT, because the IRQ work interrupt is then
handled by a separate kernel thread. This requires a task switch and makes
tracing useless for analyzing latency on an isolated CPU.
After applying the patch, the trace is similar, but ipi_send_cpu always
targets a non-isolated CPU.
Unfortunately, irq_work_queue_on() is not NMI-safe. When running in NMI
context, fall back to queuing the irq work on the local CPU.
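The CPU selection described above might be modeled as follows. The sketch_ctx structure and sketch_wakeup_cpu() are hypothetical; in particular, keeping the work local when the current CPU is not isolated is an assumption of this sketch, not something stated in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs standing in for kernel state. */
struct sketch_ctx {
	int  this_cpu;
	bool in_nmi;
	bool this_cpu_isolated;
	int  housekeeping_cpu;
};

/* Pick the CPU that should run the wakeup irq_work. */
int sketch_wakeup_cpu(const struct sketch_ctx *c)
{
	if (c->in_nmi)                 /* remote queuing is not NMI-safe */
		return c->this_cpu;
	if (c->this_cpu_isolated)      /* keep noise off isolated CPUs */
		return c->housekeeping_cpu;
	return c->this_cpu;
}
```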
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Clark Williams <clrkwllms@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20260108132132.2473515-1-ptesarik@suse.com
Signed-off-by: Petr Tesarik <ptesarik@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When a user resizes all trace ring buffers through the 'buffer_size_kb'
file, ring_buffer_resize() allocates buffer pages for each CPU in a
loop.
If the kernel preemption model is PREEMPT_NONE and there are many cpus
and there are many buffer pages to be freed, it may not give up cpu
for a long time and finally cause a softlockup.
To avoid it, call cond_resched() after each cpu buffer free as Commit
f6bd2c92488c ("ring-buffer: Avoid softlockup in ring_buffer_resize()")
does.
The detailed call trace follows:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 24-....: (14837 ticks this GP) idle=521c/1/0x4000000000000000 softirq=230597/230597 fqs=5329
rcu: (t=15004 jiffies g=26003221 q=211022 ncpus=96)
CPU: 24 UID: 0 PID: 11253 Comm: bash Kdump: loaded Tainted: G EL 6.18.2+ #278 NONE
pc : arch_local_irq_restore+0x8/0x20
arch_local_irq_restore+0x8/0x20 (P)
free_frozen_page_commit+0x28c/0x3b0
__free_frozen_pages+0x1c0/0x678
___free_pages+0xc0/0xe0
free_pages+0x3c/0x50
ring_buffer_resize.part.0+0x6a8/0x880
ring_buffer_resize+0x3c/0x58
__tracing_resize_ring_buffer.part.0+0x34/0xd8
tracing_resize_ring_buffer+0x8c/0xd0
tracing_entries_write+0x74/0xd8
vfs_write+0xcc/0x288
ksys_write+0x74/0x118
__arm64_sys_write+0x24/0x38
Cc: <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251228065008.2396573-1-mawupeng1@huawei.com
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Fix multiple typos in comments:
"ording" -> "ordering"
"scatch" -> "scratch"
"wont" -> "won't"
Link: https://patch.msgid.link/20251121221835.28032-5-mhi@mailbox.org
Signed-off-by: Maurice Hieronymus <mhi@mailbox.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull trace ring-buffer cleanup from Steven Rostedt:
- Add helper functions for allocations
The allocation of the per CPU buffer descriptor, the buffer page
descriptors and the buffer page data itself can be pretty ugly.
Add some helper macros and a function to have the code that allocates
buffer pages and such look a little cleaner.
* tag 'trace-ringbuffer-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Add helper functions for allocations
|
|
The allocation of the per CPU buffer descriptor, the buffer page
descriptors and the buffer page data itself can be pretty ugly:
    kzalloc_node(ALIGN(sizeof(struct buffer_page), cache_line_size()),
                 GFP_KERNEL, cpu_to_node(cpu));
And the data pages:
    page = alloc_pages_node(cpu_to_node(cpu),
                            GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_COMP | __GFP_ZERO, order);
    if (!page)
            return NULL;
    bpage->page = page_address(page);
    rb_init_page(bpage->page);
Add helper functions to make the code easier to read.
This does make all allocations of the data page (bpage->page) use the
__GFP_RETRY_MAYFAIL flag (and not just the bulk allocator), which is
actually better, as allocating the data page for the ring buffer tracing
should try hard but not trigger the OOM killer.
Link: https://lore.kernel.org/all/CAHk-=wjMMSAaqTjBSfYenfuzE1bMjLj+2DLtLWJuGt07UGCH_Q@mail.gmail.com/
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251125121153.35c07461@gandalf.local.home
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The function ring_buffer_map_get_reader() is a bit more strict than the
other get reader functions, and except for certain situations the
rb_get_reader_page() should not return NULL. If it does, it triggers a
warning.
This warning was triggering, and investigation showed that another
acceptable situation was happening that wasn't checked for.
If the reader catches up to the writer and there's still data to be read
on the reader page, then the rb_get_reader_page() will return NULL as
there's no new page to get.
In this situation, the reader page should not be updated and no warning
should trigger.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Reported-by: syzbot+92a3745cea5ec6360309@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/690babec.050a0220.baf87.0064.GAE@google.com/
Link: https://lore.kernel.org/20251016132848.1b11bb37@gandalf.local.home
Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The return value from `__rb_map_vma()`, which rejects writable or
executable mappings (VM_WRITE, VM_EXEC, or !VM_MAYSHARE), was being
ignored. As a result the caller of `__rb_map_vma` always returned 0
even when the mapping had actually failed, allowing it to proceed
with an invalid VMA.
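A minimal user-space sketch of the bug and the fix follows. The DEMO_VM_* bits, the demo_* function names and the -EPERM error code are assumptions made for illustration; they are not the kernel's values or functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bits; the kernel's VM_* values differ. */
#define DEMO_VM_WRITE		0x1u
#define DEMO_VM_EXEC		0x2u
#define DEMO_VM_MAYSHARE	0x4u

/* Reject writable, executable, or non-shareable mappings. */
int demo_map_vma(unsigned int vm_flags)
{
	if ((vm_flags & (DEMO_VM_WRITE | DEMO_VM_EXEC)) ||
	    !(vm_flags & DEMO_VM_MAYSHARE))
		return -EPERM;	/* assumed error code */
	return 0;
}

/* Buggy caller: the result is dropped, so failure looks like success. */
int demo_map_buggy(unsigned int vm_flags)
{
	demo_map_vma(vm_flags);
	return 0;
}

/* Fixed caller: the error is propagated before any further setup. */
int demo_map_fixed(unsigned int vm_flags)
{
	int err = demo_map_vma(vm_flags);

	if (err)
		return err;
	/* ... continue only with a mapping that passed the checks ... */
	return 0;
}
```

With a writable mapping, demo_map_buggy() reports success while demo_map_fixed() returns the error to its caller.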
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20251008172516.20697-1-ankitkhushwaha.linux@gmail.com
Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions")
Reported-by: syzbot+ddc001b92c083dbf2b97@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=194151be8eaebd826005329b2e123aecae714bdb
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Remove unnecessary semicolons.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250813095114.559530-1-liaoyuanhong@vivo.com
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull more tracing updates from Steven Rostedt:
- Remove unneeded goto out statements
Over time, the logic was restructured but left a "goto out" where the
out label simply did a "return ret;". Instead of jumping to this out
label, simply return immediately and remove the out label.
- Add guard(ring_buffer_nest)
Some calls to the tracing ring buffer can happen when the ring buffer
is already being written to at the same context (for example, a
trace_printk() in between a ring_buffer_lock_reserve() and a
ring_buffer_unlock_commit()).
In order to not trigger the recursion detection, these functions use
ring_buffer_nest_start() and ring_buffer_nest_end(). Create a guard()
for these functions so that their use cases can be simplified and not
need to use goto for the release.
- Clean up the tracing code with guard() and __free() logic
There were several locations that were prime candidates for using
guard() and __free() helpers. Switch them over to use them.
- Fix output of function argument traces for unsigned int values
The function tracer with "func-args" option set will record up to 6
argument registers and then use BTF to format them for human
consumption when the trace file is read. There are several arguments
that are "unsigned long" and even "unsigned int" that are either an
address or a mask. It is easier to understand if they are printed
using hexadecimal instead of decimal. The old method just printed all
non-pointer values as signed integers, which made it even worse for
unsigned integers.
For instance, instead of:
    __local_bh_disable_ip(ip=-2127311112, cnt=256) <-handle_softirqs
show:
    __local_bh_disable_ip(ip=0xffffffff8133cef8, cnt=0x100) <-handle_softirqs
* tag 'trace-v6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Have unsigned int function args displayed as hexadecimal
ring-buffer: Convert ring_buffer_write() to use guard(preempt_notrace)
tracing: Use __free(kfree) in trace.c to remove gotos
tracing: Add guard() around locks and mutexes in trace.c
tracing: Add guard(ring_buffer_nest)
tracing: Remove unneeded goto out logic
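The difference between the two func-args output styles described in the pull message above can be reproduced with plain printf formatting. This is a sketch only: the demo_fmt_* helpers are hypothetical and do not reflect how the tracer formats arguments with BTF.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Old style: every non-pointer argument printed as a signed int. */
int demo_fmt_signed(char *buf, size_t len, const char *name, uint64_t val)
{
	/* Keep the low 32 bits and reinterpret them as signed. */
	return snprintf(buf, len, "%s=%d", name, (int)(int32_t)(uint32_t)val);
}

/* New style: unsigned arguments printed as hexadecimal. */
int demo_fmt_hex(char *buf, size_t len, const char *name, uint64_t val)
{
	return snprintf(buf, len, "%s=0x%" PRIx64, name, val);
}
```

Passing the address 0xffffffff8133cef8 through the signed variant truncates it to the negative decimal value shown in the "instead of" line, while the hexadecimal variant keeps the address recognizable.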
|
|
The function ring_buffer_write() has a goto out to only do a
preempt_enable_notrace(). This can be replaced by a guard.
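The kernel's guard() helpers are built on the compiler's cleanup attribute. A rough user-space sketch of the same pattern is shown here; the counter, the DEMO_GUARD_PREEMPT() macro and demo_write() are hypothetical stand-ins, not the ring buffer code.

```c
#include <assert.h>

/* Hypothetical stand-in for the preemption-disable depth. */
int demo_preempt_count;
/* Depth observed inside demo_write(), recorded for illustration. */
int demo_seen_depth;

void demo_preempt_disable(void)
{
	demo_preempt_count++;
}

/* Cleanup handler: called with a pointer to the guard variable. */
void demo_preempt_enable(int *unused)
{
	(void)unused;
	demo_preempt_count--;
}

/* Disable now, re-enable automatically when the scope is left. */
#define DEMO_GUARD_PREEMPT()						\
	int demo_guard							\
	__attribute__((cleanup(demo_preempt_enable), unused)) =	\
		(demo_preempt_disable(), 0)

/* Returns 0 on success, -1 on error; every path re-enables. */
int demo_write(int len)
{
	DEMO_GUARD_PREEMPT();

	demo_seen_depth = demo_preempt_count;
	if (len <= 0)
		return -1;	/* no "goto out" needed */
	/* ... reserve, copy and commit would happen here ... */
	return 0;
}
```

Both the early return and the normal return run the cleanup handler, which is what allows the "out" label to be removed.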
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lore.kernel.org/20250801203858.205479143@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Don't populate the read-only 'type' on the stack at run time,
instead make it static.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250714160858.1234719-1-colin.i.king@gmail.com
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When the ring buffer was first introduced, reading the non-consuming
"trace" file required disabling the writing of the ring buffer. To make
sure the writing was fully disabled before iterating the buffer with a
non-consuming read, it would set the disable flag of the buffer and then
call an RCU synchronization to make sure all the buffers were
synchronized.
The function ring_buffer_read_start() originally would initialize the
iterator and call an RCU synchronization, but this was for each individual
per CPU buffer where this would get called many times on a machine with
many CPUs before the trace file could be read. The commit 72c9ddfd4c5bf
("ring-buffer: Make non-consuming read less expensive with lots of cpus.")
separated ring_buffer_read_start into ring_buffer_read_prepare(),
ring_buffer_read_sync() and then ring_buffer_read_start() to allow each of
the per CPU buffers to be prepared, call the ring_buffer_read_sync() once,
and then the ring_buffer_read_start() for each of the CPUs which made
things much faster.
The commit 1039221cc278 ("ring-buffer: Do not disable recording when there
is an iterator") removed the requirement of disabling the recording of the
ring buffer in order to iterate it, but it did not remove the
synchronization that was happening that was required to wait for all the
buffers to have no more writers. It's now OK for the buffers to have
writers and no synchronization is needed.
Remove the synchronization and restore the ring buffer iterator interface
to what it was before commit 72c9ddfd4c5bf was applied.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250630180440.3eabb514@batman.local.home
Reported-by: David Howells <dhowells@redhat.com>
Fixes: 1039221cc278 ("ring-buffer: Do not disable recording when there is an iterator")
Tested-by: David Howells <dhowells@redhat.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Rewind persistent ring buffer pages that were read during the previous
boot. If the previous kernel crashed, the data read from those pages is
likely to have been lost before it was written to disk. In this case, the
trace data is still in the persistent ring buffer, but it cannot be read
because its commit size was reset after the read. This skips clearing the
commit size of each sub-buffer so that the data can be recovered after reboot.
Note: Previous boot data that has been read via trace_pipe is not
accessible again during the same boot. But if the system reboots without
the read data being cleared (or reused), the read data is recovered again
on the next boot.
Thus, after reading the previous boot data, clear it with `echo > trace`.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/174899582116.955054.773265393511190051.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Enlarge the critical section in ring_buffer_subbuf_order_set() to
ensure that error handling takes place with per-buffer mutex held,
thus preventing list corruption and other concurrency-related issues.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
Link: https://lore.kernel.org/20250606112242.1510605-1-dmantipov@yandex.ru
Reported-by: syzbot+05d673e83ec640f0ced9@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=05d673e83ec640f0ced9
Fixes: f9b94daa542a8 ("ring-buffer: Set new size of the ring buffer sub page")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer updates from Steven Rostedt:
- Allow the persistent ring buffer to be memory mapped
In the last merge window there were issues with the implementation of
mapping the persistent ring buffer because it was assumed that the
persistent memory was just physical memory without being part of the
kernel virtual address space. But this was incorrect and the
persistent ring buffer can be mapped the same way as the allocated
ring buffer is mapped.
The metadata for the persistent ring buffer is different than the
normal ring buffer and the organization of mapping it to user space
is a little different. Make the updates needed to the meta data to
allow the persistent ring buffer to be mapped to user space.
- Fix cpus_read_lock() with buffer->mutex and cpu_buffer->mapping_lock
Mapping the ring buffer to user space uses the
cpu_buffer->mapping_lock. The buffer->mutex can be taken when the
mapping_lock is held, giving the locking order of:
cpu_buffer->mapping_lock -->> buffer->mutex. But there also exists
the ordering:
buffer->mutex -->> cpus_read_lock()
mm->mmap_lock -->> cpu_buffer->mapping_lock
cpus_read_lock() -->> mm->mmap_lock
causing a circular chain of:
cpu_buffer->mapping_lock -->> buffer->mutex -->> cpus_read_lock() -->>
mm->mmap_lock -->> cpu_buffer->mapping_lock
Moving the cpus_read_lock() outside the buffer->mutex, so that the order
is cpus_read_lock() -->> buffer->mutex, breaks the deadlock chain.
- Do not trigger WARN_ON() for commit overrun
When the ring buffer is user space mapped and there's a "commit
overrun" (where an interrupt preempted an event, and then added so
many events it filled the buffer having to drop events when it hit
the preempted event) a WARN_ON() was triggered if this was read via a
memory mapped buffer.
This is due to "missed events" being non zero when the reader page
ended up with the commit page. The idea was, if the writer is on the
reader page, there's only one page that has been written to and there
should be no missed events.
But if a commit overrun is done where the writer is off the commit
page and looped around to the commit page causing missed events, it
is possible that the reader page is the commit page with missed
events.
Instead of triggering a WARN_ON() when the reader page is the commit
page with missed events, trigger it when the reader page is the
tail_page with missed events. That's because the writer is always on
the tail_page if an event was interrupted (which holds the commit
event) and continues off the commit page.
- Reset the persistent buffer if it is fully consumed
On boot up, if the user fully consumes the last boot buffer of the
persistent buffer and then reboots without enabling it, the events will
still be in the buffer, which can cause confusion. Instead,
reset the buffer when it is fully consumed, so that the data is not
read again.
- Clean up some goto out jumps
There are a few cases where the code jumps to the "out:" label that
simply returns a value. There used to be more work done at those
labels, but now that they simply return a value, use a return instead
of jumping to a label.
- Use guard() to simplify some of the code
Add guard() around some locking instead of jumping to a label to do
the unlocking.
- Use __free() to simplify some of the code
Use __free(kfree) on variables that will get freed on error and use
return_ptr() to return the variable when it's not freed. There's one
instance where __free(kfree) simplifies the code on a temp variable
that was allocated just for the function use.
* tag 'trace-ringbuffer-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Simplify functions with __free(kfree) to free allocations
ring-buffer: Make ring_buffer_{un}map() simpler with guard(mutex)
ring-buffer: Simplify ring_buffer_read_page() with guard()
ring-buffer: Simplify reset_disabled_cpu_buffer() with use of guard()
ring-buffer: Remove jump to out label in ring_buffer_swap_cpu()
ring-buffer: Removed unnecessary if() goto out where out is the next line
tracing: Reset last-boot buffers when reading out all cpu buffers
ring-buffer: Allow reserve_mem persistent ring buffers to be mmapped
ring-buffer: Do not trigger WARN_ON() due to a commit_overrun
ring-buffer: Move cpus_read_lock() outside of buffer->mutex
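The lock-ordering item in the pull message above can be checked mechanically by treating each "A -->> B" as a directed edge and looking for a cycle. The sketch below does that with a small depth-first search; it is an illustration of the reasoning, not lockdep, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_lock {
	DEMO_MAPPING_LOCK,	/* cpu_buffer->mapping_lock */
	DEMO_BUFFER_MUTEX,	/* buffer->mutex */
	DEMO_CPUS_READ_LOCK,	/* cpus_read_lock() */
	DEMO_MMAP_LOCK,		/* mm->mmap_lock */
	DEMO_NR_LOCKS
};

/* edge[a][b] is true when lock b is taken while lock a is held. */
struct demo_graph {
	bool edge[DEMO_NR_LOCKS][DEMO_NR_LOCKS];
};

static bool demo_visit(const struct demo_graph *g, int node, int *state)
{
	state[node] = 1;		/* on the current search path */
	for (int next = 0; next < DEMO_NR_LOCKS; next++) {
		if (!g->edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: circular ordering */
		if (state[next] == 0 && demo_visit(g, next, state))
			return true;
	}
	state[node] = 2;		/* fully explored */
	return false;
}

bool demo_has_cycle(const struct demo_graph *g)
{
	int state[DEMO_NR_LOCKS] = { 0 };

	for (int n = 0; n < DEMO_NR_LOCKS; n++)
		if (state[n] == 0 && demo_visit(g, n, state))
			return true;
	return false;
}

/* Orderings listed in the pull message before the fix. */
struct demo_graph demo_before(void)
{
	struct demo_graph g = { 0 };

	g.edge[DEMO_MAPPING_LOCK][DEMO_BUFFER_MUTEX] = true;
	g.edge[DEMO_BUFFER_MUTEX][DEMO_CPUS_READ_LOCK] = true;
	g.edge[DEMO_CPUS_READ_LOCK][DEMO_MMAP_LOCK] = true;
	g.edge[DEMO_MMAP_LOCK][DEMO_MAPPING_LOCK] = true;
	return g;
}

/* After the fix: cpus_read_lock() is taken before buffer->mutex. */
struct demo_graph demo_after(void)
{
	struct demo_graph g = demo_before();

	g.edge[DEMO_BUFFER_MUTEX][DEMO_CPUS_READ_LOCK] = false;
	g.edge[DEMO_CPUS_READ_LOCK][DEMO_BUFFER_MUTEX] = true;
	return g;
}
```

Reversing the single edge between buffer->mutex and cpus_read_lock() is enough to leave the graph without a cycle.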
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Have module addresses get updated in the persistent ring buffer
The addresses of the modules from the previous boot are saved in the
persistent ring buffer. If the same modules are loaded and an address
in the old buffer points to an address that was both saved in the
persistent ring buffer and is loaded in memory, shift the address to
point to the address that is loaded in memory in the trace event.
- Print function names for irqs off and preempt off callsites
When ignoring the print fmt of a trace event and just printing the
fields directly, have the fields for preempt off and irqs off events
still show the function name (via kallsyms) instead of just showing
the raw address.
- Clean ups of the histogram code
The histogram functions saved over 800 bytes on the stack to process
events as they come in. Instead, create per-cpu buffers that can hold
this information and have a separate location for each context level
(thread, softirq, IRQ and NMI).
Also add some more comments to the code.
- Add "common_comm" field for histograms
Add "common_comm" that uses the current->comm as a field in an event
histogram and acts like any of the other fields of the event.
- Show "subops" in the enabled_functions file
When the function graph infrastructure is used, a subsystem has a
"subops" that it attaches its callback function to. Instead of the
enabled_functions just showing a function calling the function that
calls the subops functions, also show the subops functions that will
get called for that function too.
- Add "copy_trace_marker" option to instances
There are cases where an instance is created for tooling to write
into, but the old tooling has the top level instance hardcoded into
the application. New tools want to consume the data from an instance
and not the top level buffer. By adding a copy_trace_marker option,
whenever the top instance trace_marker is written into, a copy of it
is also written into the instance with this option set. This allows
new tools to read what old tools are writing into the top buffer.
If this option is cleared by the top instance, then what is written
into the trace_marker is not written into the top instance. This is a
way to redirect the trace_marker writes into another instance.
- Have tracepoints created by DECLARE_TRACE() use trace_<name>_tp()
If a tracepoint is created by DECLARE_TRACE() instead of
TRACE_EVENT(), then it will not be exposed via tracefs. Currently
there's no way to differentiate in the kernel the tracepoint
functions between those that are exposed via tracefs or not. A
calling convention has been made manually to append a "_tp" suffix
for events created by DECLARE_TRACE(). Instead of doing this
manually, force it so that all DECLARE_TRACE() events have this
notation.
- Use __string() for task->comm in some sched events
Instead of hardcoding the comm to be TASK_COMM_LEN in some of the
scheduler events use __string() which makes it dynamic. Note, if
these events are parsed by user space they may break, and the
event may have to be converted back to the hardcoded size.
- Have function graph "depth" be unsigned to the user
Internally to the kernel, the "depth" field of the function graph
event is signed due to -1 being used for end of boundary. What
actually gets recorded in the event itself is zero or positive.
Reflect this to user space by showing "depth" as unsigned int and be
consistent across all events.
- Allow an arbitrarily long CPU string to osnoise_cpus_write()
The filtering of which CPUs to write to can exceed 256 bytes. If a
machine has 256 CPUs, and the filter is to filter every other CPU,
the write would take a string larger than 256 bytes. Instead of using
a fixed size buffer on the stack that is 256 bytes, allocate it to
handle what is passed in.
- Stop having ftrace check the per-cpu data "disabled" flag
The "disabled" flag in the data structure passed to most ftrace
functions is checked to know if tracing has been disabled or not.
This flag was added back in 2008 before the ring buffer had its own
way to disable tracing. The "disabled" flag is now not always set when
needed, and the ring buffer flag should be used in all locations
where the disabled state is needed. Since the "disabled" flag is redundant
and incorrect, stop using it. Fix up some locations that use the
"disabled" flag to use the ring buffer info.
- Use a new tracer_tracing_disable/enable() instead of data->disabled
flag
There are a few cases that set the data->disabled flag to stop tracing,
but this flag is not consistently used. It is also an on/off switch
where if a function sets it and calls another function that sets it,
the called function may incorrectly enable it.
Use a new tracer_tracing_disable() and tracer_tracing_enable() that
use a counter and can be nested. These use the ring buffer flags
which are always checked, making the disabling more consistent.
- Save the trace clock in the persistent ring buffer
Save what clock was used for tracing in the persistent ring buffer
and set it back to that clock after a reboot.
- Remove unused reference to a per CPU data pointer in mmiotrace
functions
- Remove unused buffer_page field from trace_array_cpu structure
- Remove more strncpy() instances
- Other minor clean ups and fixes
* tag 'trace-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (36 commits)
tracing: Fix compilation warning on arm32
tracing: Record trace_clock and recover when reboot
tracing/sched: Use __string() instead of fixed lengths for task->comm
tracepoint: Have tracepoints created with DECLARE_TRACE() have _tp suffix
tracing: Cleanup upper_empty() in pid_list
tracing: Allow the top level trace_marker to write into another instances
tracing: Add a helper function to handle the dereference arg in verifier
tracing: Remove unnecessary "goto out" that simply returns ret is trigger code
tracing: Fix error handling in event_trigger_parse()
tracing: Rename event_trigger_alloc() to trigger_data_alloc()
tracing: Replace deprecated strncpy() with strscpy() for stack_trace_filter_buf
tracing: Remove unused buffer_page field from trace_array_cpu structure
tracing: Use atomic_inc_return() for updating "disabled" counter in irqsoff tracer
tracing: Convert the per CPU "disabled" counter to local from atomic
tracing: branch: Use trace_tracing_is_on_cpu() instead of "disabled" field
ring-buffer: Add ring_buffer_record_is_on_cpu()
tracing: Do not use per CPU array_buffer.data->disabled for cpumask
ftrace: Do not disabled function graph based on "disabled" field
tracing: kdb: Use tracer_tracing_on/off() instead of setting per CPU disabled
tracing: Use tracer_tracing_disable() instead of "disabled" field for ftrace_dump_one()
...
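The pull message above contrasts an on/off "disabled" flag with a nestable counter. A small user-space sketch of why the counter nests correctly follows; the demo_* names are hypothetical and this is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* On/off flag: a nested enable clobbers the outer caller's disable. */
bool demo_flag_disabled;

void demo_flag_disable(void)
{
	demo_flag_disabled = true;
}

void demo_flag_enable(void)
{
	demo_flag_disabled = false;
}

/* Counter: stays disabled until every disable has a matching enable. */
int demo_disable_depth;

void demo_count_disable(void)
{
	demo_disable_depth++;
}

void demo_count_enable(void)
{
	assert(demo_disable_depth > 0);	/* an unbalanced enable is a bug */
	demo_disable_depth--;
}

bool demo_count_is_disabled(void)
{
	return demo_disable_depth > 0;
}
```

With the flag, an inner disable/enable pair leaves tracing enabled while the outer caller still expects it to be off. With the counter, the inner pair only returns the depth to what the outer caller set.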
|
|
The function rb_allocate_cpu_buffer() allocates cpu_buffer and on error needs
to free it. It has a single return. Use __free(kfree) and return directly
on errors and have the return use return_ptr(cpu_buffer).
The function alloc_buffer() allocates buffer and on error needs to free
it. It has a single return. Use __free(kfree) and return directly on
errors and have the return use return_ptr(buffer).
The function __rb_map_vma() allocates a temporary array "pages". Have it
use __free() and not worry about freeing it when returning.
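The __free(kfree) and return_ptr() pattern can be approximated in user space with the compiler's cleanup attribute, which is what the kernel helpers are built on. In this sketch, struct demo_buffer, DEMO_FREE_BUFFER and demo_alloc_buffer() are hypothetical, and free() stands in for kfree().

```c
#include <assert.h>
#include <stdlib.h>

struct demo_buffer {
	int cpus;
};

/* Cleanup handler: receives a pointer to the annotated variable. */
void demo_free_buffer(struct demo_buffer **p)
{
	free(*p);	/* free(NULL) is a no-op */
}

#define DEMO_FREE_BUFFER __attribute__((cleanup(demo_free_buffer)))

struct demo_buffer *demo_alloc_buffer(int cpus)
{
	DEMO_FREE_BUFFER struct demo_buffer *buffer = calloc(1, sizeof(*buffer));
	struct demo_buffer *ret;

	if (!buffer)
		return NULL;
	if (cpus <= 0)
		return NULL;	/* buffer is freed automatically */

	buffer->cpus = cpus;

	/*
	 * Hand ownership to the caller, as return_ptr() does: clear the
	 * local pointer so the cleanup handler frees nothing.
	 */
	ret = buffer;
	buffer = NULL;
	return ret;
}
```

Error paths return directly and rely on the cleanup handler, while the success path clears the local pointer before returning so the caller owns the allocation.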
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20250527143144.6edc4625@gandalf.local.home
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|