| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
"perf report:
- Add 'comm_nodigit' sort key to combine similar threads that only
have different numbers in the comm. In the following example, the
'comm_nodigit' will have samples from all threads starting with
"bpfrb/" into an entry "bpfrb/<N>".
$ perf report -s comm_nodigit,comm -H
...
#
# Overhead CommandNoDigit / Command
# ........... ........................
#
20.30% swapper
20.30% swapper
13.37% chrome
13.37% chrome
10.07% bpfrb/<N>
7.47% bpfrb/0
0.70% bpfrb/1
0.47% bpfrb/3
0.46% bpfrb/2
0.25% bpfrb/4
0.23% bpfrb/5
0.20% bpfrb/6
0.14% bpfrb/10
0.07% bpfrb/7
- Support flat layout for symfs. The --symfs option is to specify the
location of debugging symbol files. The default 'hierarchy' layout
would search the symbol file using the same path of the original
file under the symfs root. The new 'flat' layout would search only
in the root directory.
- Update 'simd' sort key for ARM SIMD flags to cover ASE/SME and more
predicate flags.
perf stat:
- Add --pmu-filter option to select specific PMUs. This would be
useful when you measure metrics from multiple instance of uncore
PMUs with similar names.
# perf stat -M cpa_p0_avg_bw
Performance counter stats for 'system wide':
19,417,779,115 hisi_sicl0_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw
0 hisi_sicl0_cpa0/cpa_p0_wr_dat/
0 hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/
19,417,751,103 hisi_sicl10_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw
0 hisi_sicl10_cpa0/cpa_p0_wr_dat/
0 hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/
19,417,730,679 hisi_sicl2_cpa0/cpa_cycles/ # 0.31 cpa_p0_avg_bw
75,635,749 hisi_sicl2_cpa0/cpa_p0_wr_dat/
18,520,640 hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/
19,417,674,227 hisi_sicl8_cpa0/cpa_cycles/ # 0.00 cpa_p0_avg_bw
0 hisi_sicl8_cpa0/cpa_p0_wr_dat/
0 hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/
0 hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/
19.417734480 seconds time elapsed
With --pmu-filter, users can select only hisi_sicl2_cpa0 PMU.
# perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw
Performance counter stats for 'system wide':
6,234,093,559 cpa_cycles # 0.60 cpa_p0_avg_bw
50,548,465 cpa_p0_wr_dat
7,552,182 cpa_p0_rd_dat_64b
0 cpa_p0_rd_dat_32b
6.234139320 seconds time elapsed
Data type profiling:
- Quality improvements by tracking register state more precisely
- Ensure array members to get the type
- Handle more cases for global variables
Vendor event/metric updates:
- Update various Intel events and metrics
- Add NVIDIA Tegra 410 Olympus events
Internal changes:
- Verify perf.data header for maliciously crafted files
- Update perf test to cover more usages and make them robust
- Move a couple of copied kernel headers not to annoy objtool build
- Fix a bug in map sorting in name order
- Remove some unused codes
Misc:
- Fix module symbol resolution with non-zero text address
- Add -t/--threads option to `perf bench mem mmap`
- Track duration of exit*() syscall by `perf trace -s`
- Add core.addr2line-timeout and core.addr2line-disable-warn config
items"
* tag 'perf-tools-for-v7.1-2026-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (131 commits)
perf loongarch: Fix build failure with CONFIG_LIBDW_DWARF_UNWIND
perf annotate: Use jump__delete when freeing LoongArch jumps
perf test: Fixes for check branch stack sampling
perf test: Fix inet_pton probe failure and unroll call graph
perf build: fix "argument list too long" in second location
perf header: Add sanity checks to HEADER_BPF_BTF processing
perf header: Sanity check HEADER_BPF_PROG_INFO
perf header: Sanity check HEADER_PMU_CAPS
perf header: Sanity check HEADER_HYBRID_TOPOLOGY
perf header: Sanity check HEADER_CACHE
perf header: Sanity check HEADER_GROUP_DESC
perf header: Sanity check HEADER_PMU_MAPPINGS
perf header: Sanity check HEADER_MEM_TOPOLOGY
perf header: Sanity check HEADER_NUMA_TOPOLOGY
perf header: Sanity check HEADER_CPU_TOPOLOGY
perf header: Sanity check HEADER_NRCPUS and HEADER_CPU_DOMAIN_INFO
perf header: Bump up the max number of command line args allowed
perf header: Validate nr_domains when reading HEADER_CPU_DOMAIN_INFO
perf sample: Fix documentation typo
perf arm_spe: Improve SIMD flags setting
...
|
|
Building perf for LoongArch fails when CONFIG_LIBDW_DWARF_UNWIND is
enabled because unwind-libdw.o is still referenced in
arch/loongarch/util/Build.
Fixes: e62fae9d9e8 ("perf unwind-libdw: Fix a cross-arch unwinding bug")
Signed-off-by: WANG Rui <r@hev.cc>
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Currently, the initialization of loongarch_jump_ops does not contain an
assignment to its .free field. This causes disasm_line__free() to fall
through to ins_ops__delete() for LoongArch jump instructions.
ins_ops__delete() will free ins_operands.source.raw and
ins_operands.source.name, and these fields overlaps with
ins_operands.jump.raw_comment and ins_operands.jump.raw_func_start.
Since in loongarch_jump__parse(), these two fields are populated by
strchr()-ing the same buffer, trying to free them will lead to undefined
behavior.
This invalid free usually leads to crashes:
Process 1712902 (perf) of user 1000 dumped core.
Stack trace of thread 1712902:
#0 0x00007fffef155c58 n/a (libc.so.6 + 0x95c58)
#1 0x00007fffef0f7a94 raise (libc.so.6 + 0x37a94)
#2 0x00007fffef0dd6a8 abort (libc.so.6 + 0x1d6a8)
#3 0x00007fffef145490 n/a (libc.so.6 + 0x85490)
#4 0x00007fffef1646f4 n/a (libc.so.6 + 0xa46f4)
#5 0x00007fffef164718 n/a (libc.so.6 + 0xa4718)
#6 0x00005555583a6764 __zfree (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x106764)
#7 0x000055555854fb70 disasm_line__free (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x2afb70)
#8 0x000055555853d618 annotated_source__purge (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x29d618)
#9 0x000055555852300c __hist_entry__tui_annotate (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x28300c)
#10 0x0000555558526718 do_annotate (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x286718)
#11 0x000055555852ed94 evsel__hists_browse (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x28ed94)
#12 0x000055555831fdd0 cmd_report (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x7fdd0)
#13 0x000055555839b644 handle_internal_command (/home/csmantle/dist/linux-arch/tools/perf/perf + 0xfb644)
#14 0x00005555582fe6ac main (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x5e6ac)
#15 0x00007fffef0ddd90 n/a (libc.so.6 + 0x1dd90)
#16 0x00007fffef0ddf0c __libc_start_main (libc.so.6 + 0x1df0c)
#17 0x00005555582fed10 _start (/home/csmantle/dist/linux-arch/tools/perf/perf + 0x5ed10)
ELF object binary architecture: LoongArch
... and it can be confirmed with Valgrind:
==1721834== Invalid free() / delete / delete[] / realloc()
==1721834== at 0x4EA9014: free (in /usr/lib/valgrind/vgpreload_memcheck-loongarch64-linux.so)
==1721834== by 0x4106287: __zfree (zalloc.c:13)
==1721834== by 0x42ADC8F: disasm_line__free (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== by 0x429B737: annotated_source__purge (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== by 0x42811EB: __hist_entry__tui_annotate (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== by 0x42848D7: do_annotate (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== by 0x428CF33: evsel__hists_browse (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== Address 0x7d34303 is 35 bytes inside a block of size 62 alloc'd
==1721834== at 0x4EA59B8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-loongarch64-linux.so)
==1721834== by 0x6B80B6F: strdup (strdup.c:42)
==1721834== by 0x42AD917: disasm_line__new (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== by 0x42AE5A3: symbol__disassemble_objdump (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== by 0x42AF0A7: symbol__disassemble (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== by 0x429B3CF: symbol__annotate (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== by 0x429C233: symbol__annotate2 (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== by 0x42804D3: __hist_entry__tui_annotate (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== by 0x42848D7: do_annotate (in /home/csmantle/dist/linux-arch/tools/perf/perf)
==1721834== by 0x428CF33: evsel__hists_browse (in /home/csmantle/dist/linux-arch/tools/perf/perf)
This patch adds the missing free() specialization in loongarch_jump_ops,
which prevents disasm_line__free() from invoking the default cleanup
function.
Fixes: fb7fd2a14a503b9a ("perf annotate: Move raw_comment and raw_func_start fields out of 'struct ins_operands'")
Cc: stable@vger.kernel.org
Cc: WANG Rui <wangrui@loongson.cn>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: loongarch@lists.linux.dev
Signed-off-by: Rong Bao <rong.bao@csmantle.top>
Tested-by: WANG Rui <wangrui@loongson.cn>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When filtering branch stack samples on user events they sample in user
land but may have come from the kernel. Aarch64 avoids leaking the
kernel address for kaslr reasons but other platforms, for now,
don't. Be more permissive in allowing kernel addresses in the source
of user branch stacks.
When filtering branch stack samples on kernel events they sample in
kernel land but may have come from user land. Avoid the target being a
user address but allow the source to be in user land. Aarch64 may not
leak the user land addresses (making them 0) but other platforms
do. As the kernel address sampling implies privelege, just allow this.
Increase the duration of the system call sampling test to make the
likelihood of sampling a system call higher (increased from 1000 to
8000 loops - a number found through experimentation on an Intel
Tigerlake laptop), also make the period of the event a prime number.
Put unneeded perf record output into a temporary file so that the test
output isn't cluttered. More clearly state which test is running and
the pass, fail or skipped result of the test.
These changes make the test on an Intel tigerlake laptop reliably pass
rather than reliably fail.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When adding a probe for libc's inet_pton, perf probe may create multiple
probe points (e.g., due to inlining or multiple symbol resolutions),
resulting in multiple identical event names being output (e.g.,
`probe_libc:inet_pton_1`).
The script previously used a brittle pipeline (`tail -n +2 | head -n -5`)
and an awk script to extract the event name. When multiple probes were
added, awk would output the event name multiple times, which expanded
to multiple words in bash. This broke the subsequent `perf record` and
`perf probe -d` commands, causing the test to fail with:
`Error: another command except --add is set.`
Fix this by removing the brittle `tail/head` commands and appending
`| head -n 1` to the awk extraction. This ensures that only a single,
unique event name is captured, regardless of how many probe points
are created.
Additionally, the test artificially limited the backtrace size via
`max-stack=4` and did not specify dwarf call graphs for non-s390x
architectures. In newer libc versions where `inet_pton` is nested
deeper or compiled without frame pointers, `perf script` failed to resolve
the backtrace up to `/bin/ping`. Fix this by explicitly collecting
dwarf call-graphs for all architectures and increasing `max-stack` to 8.
Assisted-by: Gemini:gemini-3.1-pro-preview
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Turns out that displaying "RM $^" via quiet_cmd_rm can also upset the
shell and cause it to display "argument list too long".
Trying to quote $^ doesn't help.
In the end, *not* displaying the (potentially long) list of files is
probably the right thing to do for a "quiet" message, anyway. Instead,
let's display a count of how many files were removed. There is always
V=1 if more detail is required.
TEST linux/tools/perf/pmu-events/metric_test.log
RM ...634 orphan file(s)...
LD linux/tools/perf/util/perf-util-in.o
Also move the comment regarding xargs before the rule, so it doesn't
show up in the build output.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Validate the BTF entry count and individual data sizes when reading
HEADER_BPF_BTF from perf.data files to prevent excessive memory
allocation from malformed files.
Reuses the MAX_BPF_PROGS (131072) and MAX_BPF_DATA_LEN (256 MB)
limits from HEADER_BPF_PROG_INFO processing.
Cc: Song Liu <song@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add validation to process_bpf_prog_info() to harden against malformed
perf.data files:
- Upper bound on BPF program count (max 131072)
- Upper bound on per-program data_len (max 256MB)
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add upper bound checks in PMU capabilities processing to harden against
malformed perf.data files:
- nr_pmu bounded to MAX_PMU_MAPPINGS (4096) in process_pmu_caps()
- nr_pmu_caps bounded to MAX_PMU_CAPS (512) in __process_pmu_caps()
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add upper bound check on nr_nodes in process_hybrid_topology() to
harden against malformed perf.data files (reuses MAX_PMU_MAPPINGS,
4096).
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add upper bound check on cache entry count in process_cache() to harden
against malformed perf.data files (max 32768).
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add upper bound check on nr_groups in process_group_desc() to harden
against malformed perf.data files (max 32768), and move the env
assignment after validation.
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add upper bound check on pmu_num in process_pmu_mappings() to harden
against malformed perf.data files (max 4096).
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add validation to process_mem_topology() to harden against malformed
perf.data files:
- Upper bound check on nr_nodes (reuses MAX_NUMA_NODES, 4096)
- Minimum section size check before allocating
This is particularly important here since nr is u64, making unbounded
values especially dangerous.
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add validation to process_numa_topology() to harden against malformed
perf.data files:
- Upper bound check on nr_nodes (max 4096)
- Minimum section size check before allocating
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add validation to process_cpu_topology() to harden against malformed
perf.data files:
- Verify nr_cpus_avail was initialized (HEADER_NRCPUS processed first)
- Bounds check sibling counts (cores, threads, dies) against nr_cpus_avail
- Fix two bare 'return -1' that leaked env->cpu by using 'goto free_cpu'
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
While working on some cleanups sashiko questioned about pre-existing
issues, namely lacking sanity checks for perf.data headers, add some
with the help of Claude.
Cc: Ian Rogers <irogers@google.com>
Cc: Swapnil Sapkal <swapnil.sapkal@amd.com>
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
We need to do some upper limit validation, bump up the arbitrary limit
as per suggestion of Sashiko about command line wildcard expansion
ending up with more than 32768 args.
Link: https://sashiko.dev/#/patchset/20260408172846.96360-1-acme%40kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Further validate the HEADER_CPU_DOMAIN_INFO fields, this time checking
the nr_domains field.
Assisted-by: Claude Code:claude-opus-4-6
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
s/PEF/PERF/
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
"Before v7.0 is released, fix a few issues with the CFI patchset,
merged earlier in v7.0-rc, that primarily affect interfaces to
non-kernel code:
- Improve the prctl() interface for per-task indirect branch landing
pad control to expand abbreviations and to resemble the speculation
control prctl() interface
- Expand the "LP" and "SS" abbreviations in the ptrace uapi header
file to "branch landing pad" and "shadow stack", to improve
readability
- Fix a typo in a CFI-related macro name in the ptrace uapi header
file
- Ensure that the indirect branch tracking state and shadow stack
state are unlocked immediately after an exec() on the new task so
that libc subsequently can control it
- While working in this area, clean up the kernel-internal,
cross-architecture prctl() function names by expanding the
abbreviations mentioned above"
* tag 'riscv-for-linus-v7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
prctl: cfi: change the branch landing pad prctl()s to be more descriptive
riscv: ptrace: cfi: expand "SS" references to "shadow stack" in uapi headers
prctl: rename branch landing pad implementation functions to be more explicit
riscv: ptrace: expand "LP" references to "branch landing pads" in uapi headers
riscv: cfi: clear CFI lock status in start_thread()
riscv: ptrace: cfi: fix "PRACE" typo in uapi header
|
|
Fill in ASE and SME operations for the SIMD arch field.
Also set the predicate flags for SVE and SME, but differences between
them: SME does not have a predicate flag, so the setting is based on
events. SVE provides a predicate flag to indicate whether the predicate
is disabled, which allows it to be distinguished into four cases: full
predicates, empty predicates, fully predicated, and disabled predicates.
After:
perf report -s +simd
...
0.06% 0.06% sve-test sve-test [.] setz [p] SVE
0.06% 0.06% sve-test [kernel.kallsyms] [k] do_raw_spin_lock
0.06% 0.06% sve-test sve-test [.] getz [p] SVE
0.06% 0.06% sve-test [kernel.kallsyms] [k] timekeeping_advance
0.06% 0.06% sve-test sve-test [.] getz [d] SVE
0.06% 0.06% sve-test [kernel.kallsyms] [k] update_load_avg
0.06% 0.06% sve-test sve-test [.] getz [e] SVE
0.05% 0.05% sve-test sve-test [.] setz [e] SVE
0.05% 0.05% sve-test [kernel.kallsyms] [k] update_curr
0.05% 0.05% sve-test sve-test [.] setz [d] SVE
0.05% 0.05% sve-test [kernel.kallsyms] [k] do_raw_spin_unlock
0.05% 0.05% sve-test [kernel.kallsyms] [k] timekeeping_update_from_shadow.constprop.0
0.05% 0.05% sve-test sve-test [.] getz [f] SVE
0.05% 0.05% sve-test sve-test [.] setz [f] SVE
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Update SIMD architecture and predicate flags.
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
According to the Arm ARM (ARM DDI 0487, L.a), section D18.2.6
"Events packet", apart from the empty predicate and partial
predicates, an SVE or SME operation can be predicate-disabled
or full predicated.
To provide complete results, introduce two predicate types for
these cases.
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Support sort Advance SIMD extension (ASE) and SME.
Reviewed-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Running both tests cases 126 128 together causes the first test case
126 to fail:
# for i in $(seq 3); do ./perf test 'perf trace BTF general tests' \
'perf trace record and replay'; done
126: perf trace BTF general tests : FAILED!
128: perf trace record and replay : Ok
126: perf trace BTF general tests : FAILED!
128: perf trace record and replay : Ok
126: perf trace BTF general tests : FAILED!
128: perf trace record and replay : Ok
#
Test case 126 fails because test case 128 runs concurrently as can
be observed using a ps -ef | grep perf output list on a different
window. Both do a perf trace command concurrently.
Make test case 'perf trace BTF general tests' exclusive.
Output after:
# for i in $(seq 3); do ./perf test 'perf trace BTF general tests' \
'perf trace record and replay'; done
127: perf trace BTF general tests : Ok
155: perf trace record and replay : Ok
127: perf trace BTF general tests : Ok
155: perf trace record and replay : Ok
127: perf trace BTF general tests : Ok
155: perf trace record and replay : Ok
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
use_stdio was associated with struct perf_data and not perf_data_file
meaning there was implicit use of fd rather than fptr that may not be
safe. For example, in perf_data_file__write. Reorganize perf_data_file
to better abstract use_stdio, add kernel-doc and more consistently use
perf_data__ accessors so that use_stdio is better respected.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
As noticed in a sashiko review for a patch adding a missing libgen.h
in a file using basename():
https://sashiko.dev/#/patchset/20260402001740.2220481-1-acme%40kernel.org
So avoid these subtleties and instead reuse the gnu_basename() function
we had in srcline.c, renaming it to perf_basename() and replace
basename() calls with it, simplifying several cases by removing now
needless strdups.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Instead of using zalloc(nr_entries * sizeof_entry) that is what calloc()
does.
In some places where linux/zalloc.h isn't needed, remove it, add when
needed and was getting it indirectly.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
As suggested in an unrelated sashiko review:
https://sashiko.dev/#/patchset/20260407195145.2372104-1-acme%40kernel.org
"
Could a malformed perf.data file provide out-of-bounds values for cpu and
domain?
These variables are read directly from the file and used as indices for
cd_map and cd_map[cpu]->domains without any validation against
env->nr_cpus_avail or max_sched_domains.
Similar to the issue above, this is an existing lack of validation that
becomes apparent when looking at the allocation boundaries.
"
Validate it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Sashiko suggests we use some reasonable max number of args to avoid
overflows when reading perf.data files, do it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Those tables and variables don't change, better capture this by
explicitely using 'const'.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
`make check` will run sparse on the perf code base. A frequent warning
is "warning: symbol '...' was not declared. Should it be static?" Go
through and make global definitions without declarations static.
In some cases it is deliberate due to dlsym accessing the symbol, this
change doesn't clean up the missing declarations for perf test suites.
Sometimes things can opportunistically be made const.
Making somethings static exposed unused functions warnings, so
restructuring of ifdefs was necessary for that.
These changes reduce the size of the perf binary by 568 bytes.
Committer notes:
Refreshed the patch, the original one fell thru the cracks, updated the
size reduction.
Remove the trace-event-scripting.c changes, break the build, noticed
with container builds and with sashiko:
https://sashiko.dev/#/patchset/20260401215306.2152898-1-acme%40kernel.org
Also make two variables static to address another sashiko review
comment:
https://sashiko.dev/#/patchset/20260402001740.2220481-1-acme%40kernel.org
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Ankur Arora <ankur.a.arora@oracle.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <pjw@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In fef2a735167a827a ("perf tools: Kill die()") the die() function was
removed, but not the prototype in util.h, now when building with
LIBPERL=1, during a 'make -C tools/perf build-test' routine test, it is
failing as perl likes die() calls and then this clashes with this
remnant, remove it.
Fixes: fef2a735167a827a ("perf tools: Kill die()")
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Fixing:
util/symbol.c: In function ‘symbol__config_symfs’:
util/symbol.c:2499:20: error: assignment discards ‘const’ qualifier from pointer target type [-Werror=discarded-qualifiers]
2499 | layout_str = strrchr(dir, ',');
|
With recent gcc/glibc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When an parent is copied into a child the name array is populated in
address not name order. Make sure the name array isn't flagged as sorted.
Fixes: 659ad3492b91 ("perf maps: Switch from rbtree to lazily sorted array for addresses")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When an entry in the address array is replaced, the corresponding name
entry is replaced. The entries names may sort differently and so it is
important that the sorted by name property be cleared on the maps.
Fixes: 0d11fab32714 ("perf maps: Fixup maps_by_name when modifying maps_by_address")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Getting debug_file can trigger warnings if not set. Avoid getting
these warnings by pushing the use under the controlling if.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Remove global variable addr2line_timeout_ms and add it as a member
to symbol_conf structure.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
[namhyung: move the initialization to util/symbol.c]
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Make symbol_conf::addr2line_disable_warn configurable by reading
the perfconfig file.
Use section core and addr2line-disable-warn = value.
Update documentation.
Example:
# perf config -l
core.addr2line-timeout=5000
core.addr2line-disable-warn=1
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rename member symbol_conf::disable_add2line_warn to
symbol_conf::addr2line_disable_warn to make it consistent with other
addr2line_xxx constants.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Running perf sched stats requires root and it fails to open the
schedstat file for regular users. Let's skip the test.
$ perf sched stats true
Failed to open /proc/sys/kernel/sched_schedstats
Reviewed-by: Ian Rogers <irogers@google.com>
Tested-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When the evlist is expanded the metric leader wasn't being updated. As
the original evsel is deleted this creates a use-after-free in
stat-shadow's prepare_metric. This was detected running the "perf stat
--bpf-counters --for-each-cgroup test" with sanitizers.
The change itself puts the copied evsel into the priv field (known
unused because of evsel__clone use) and then in a second pass over the
list updates the copied values using the priv pointer.
Fixes: d1c5a0e86a4e ("perf stat: Add --for-each-cgroup option")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Sun Jian <sun.jian.kdev@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add the evsel from evsel__parse_sample into the struct
perf_sample. Sometimes we want to alter the evsel associated with a
sample, such as with off-cpu bpf-output events. In general the evsel
and perf_sample are passed as a pair, but this makes an altered evsel
something of a chore to keep checking for and setting up. Later
patches will remove passing an evsel with the perf_sample and switch
to just using the perf_sample's value.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The deferred stack trace code wasn't using perf_sample__init/exit. Add
the deferred stack trace clean up to perf_sample__exit which requires
proper NULL initialization in perf_sample__init. Make the
perf_sample__exit robust to being called more than once by using
zfree. Make the error paths in evsel__parse_sample exit the
sample. Add a merged_callchain boolean to capture that callchain is
allocated, deferred_callchain doen't suffice for this. Pack the struct
variables to avoid padding bytes for this.
Similiarly powerpc_vpadtl_sample wasn't using perf_sample__init/exit,
use it for consistency and potential issues with uninitialized
variables.
Similarly guest_session__inject_events in builtin-inject wasn't using
perf_sample_init/exit. The lifetime management for fetched events is
somewhat complex there, but when an event is fetched the sample should
be initialized and needs exiting on error. The sample may be left in
place so that future injects have access to it.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add kernel-doc for struct perf_sample capturing the somewhat unusual
population of fields and lifetime relationships.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Store cacheline size during perf record in header, so that cacheline
size can be used for other features, like sort keys for perf report.
Testing example with feat enabled:
$ perf record ./Example
$ perf report --header-only | grep -C 3 cacheline
CPU_DOMAIN_INFO info available, use -I to display
e_machine : 62
e_flags : 0
cacheline size: 64
missing features: TRACING_DATA BUILD_ID BRANCH_STACK GROUP_DESC AUXTRACE \
STAT CLOCKID DIR_FORMAT COMPRESSED CLOCK_DATA
========
[namhyung: Update the commit message and remove blank lines]
Signed-off-by: Ricky Ringler <ricky.ringler@proton.me>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Writing to the perf.data file can fail in various contexts such as
continual test. Other tests write to a mktemp-ed file, make the "perf
sched stats tests" follow this convention.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Swapnil Sapkal <swapnil.sapkal@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Doing a `perf sched record` then `perf sched stats report` crashes as
the tp_handler isn't set. Add a dummy tp_handler for it rather than
adding an extra check.
Reported-by: Ian Rogers <irogers@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Per Linus' comments requesting the replacement of "INDIR_BR_LP" in the
indirect branch tracking prctl()s with something more readable, and
suggesting the use of the speculation control prctl()s as an exemplar,
reimplement the prctl()s and related constants that control per-task
forward-edge control flow integrity.
This primarily involves two changes. First, the prctls are
restructured to resemble the style of the speculative execution
workaround control prctls PR_{GET,SET}_SPECULATION_CTRL, to make them
easier to extend in the future. Second, the "indir_br_lp" abbrevation
is expanded to "branch_landing_pads" to be less telegraphic. The
kselftest and documentation is adjusted accordingly.
Link: https://lore.kernel.org/linux-riscv/CAHk-=whhSLGZAx3N5jJpb4GLFDqH_QvS07D+6BnkPWmCEzTAgw@mail.gmail.com/
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Signed-off-by: Paul Walmsley <pjw@kernel.org>
|