| Age | Commit message (Collapse) | Author |
|
The line_len is only set on success. Check the return value instead.
util/hwmon_pmu.c: In function ‘perf_pmus__read_hwmon_pmus’:
util/hwmon_pmu.c:742:20: warning: ‘line_len’ may be used uninitialized [-Wmaybe-uninitialized]
742 | if (line_len > 0 && line[line_len - 1] == '\n')
| ^
util/hwmon_pmu.c:719:24: note: ‘line_len’ was declared here
719 | size_t line_len;
Fixes: 53cc0b351ec9 ("perf hwmon_pmu: Add a tool PMU exposing events from hwmon in sysfs")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
To avoid hardcoding the offset value for synthetic event IDs
in multiple auxtrace modules (arm-spe, cs-etm, intel-pt, etc.),
and to improve code reusability, this patch unifies
the handling of the ID offset via a dedicated helper function.
Signed-off-by: tanze <tanze@kylinos.cn>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Commit b8308511f6e0 bumped the max events to 1024 but this results in
BPF verifier issues if the number of command line events is too
large. Workaround this by:
1) moving the constants to a header file to share between BPF and perf
C code,
2) testing that the maximum number of events doesn't cause BPF
verifier issues in debug builds,
3) lower the max events from 1024 to 128,
4) in perf stat, if there are more events than the BPF counters can
support then disable BPF counter usage.
The rodata setup is factored into its own function to avoid
duplicating it in the testing code.
Signed-off-by: Ian Rogers <irogers@google.com>
Fixes: b8308511f6e0 ("perf stat bperf cgroup: Increase MAX_EVENTS from 32 to 1024")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When the OpenCSD library introduces a new enumeration value (for example,
in the v1.7.1 release), the perf build fails with an error:
util/cs-etm-decoder/cs-etm-decoder.c:600:10: error: enumeration value 'OCSD_GEN_TRC_ELEM_ITMTRACE' not explicitly handled in switch [-Werror, -Wswitch-enum]
600 | switch (elem->elem_type) {
| ^~~~~~~~~~~~~~~
1 error generated.
Convert to if-else sentences to mute the enumeration value warning,
which can avoid build failures whenever the lib is updated.
No functional change.
Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add cputype definitions for Cortex-A720AE. These will be used for errata
detection in subsequent patches.
These values can be found in the Cortex-A720AE TRM:
https://developer.arm.com/documentation/102828/0001/
... in Table A-187
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Clang and GCC disagree with what constitutes a "declaration after
statement". GCC allows declarations in switch cases without an extra
block, as long as it's immediately after the label. Clang does not.
Unfortunately this is the case even in the latest versions of both
compilers. The only option that makes them behave in the same way is
-Wpedantic, which can't be enabled in Perf because of the number of
warnings it generates.
Add a block to fix the Clang build, which is the only thing we can do.
Fixes the build error:
ui/browsers/annotate.c:999:4: error: expected expression
struct annotation_line *al = NULL;
ui/browsers/annotate.c:1008:4: error: use of undeclared identifier 'al'
al = annotated_source__get_line(notes->src, offset);
ui/browsers/annotate.c:1009:24: error: use of undeclared identifier 'al'
browser->curr_hot = al ? &al->rb_node : NULL;
ui/browsers/annotate.c:1009:30: error: use of undeclared identifier 'al'
browser->curr_hot = al ? &al->rb_node : NULL;
ui/browsers/annotate.c:1000:8: error: mixing declarations and code is incompatible with standards before C99 [-Werror,-Wdeclaration-after-statement]
s64 offset = annotate_browser__curr_hot_offset(browser);
Fixes: ad83f3b7155d ("perf c2c annotate: Start from the contention line")
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Commit a808a2b35f66 ("tools build: Fix fixdep dependencies") broke the
perf build ("make -C tools/perf") by introducing two inadvertent
conflicts:
1) tools/build/Makefile includes tools/build/Makefile.include, which
defines a phony 'fixdep' target. This conflicts with the $(FIXDEP)
file target in tools/build/Makefile when OUTPUT is empty, causing
make to report duplicate recipes for the same target.
2) The FIXDEP variable in tools/build/Makefile conflicts with the
previously existing one in tools/perf/Makefile.perf.
Remove the unnecessary include of tools/build/Makefile.include from
tools/build/Makefile, and rename the FIXDEP variable in
tools/perf/Makefile.perf to FIXDEP_BUILT.
Fixes: a808a2b35f66 ("tools build: Fix fixdep dependencies")
Reported-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
Link: https://patch.msgid.link/8881bc3321bd9fa58802e4f36286eefe3667806b.1760992391.git.jpoimboe@kernel.org
|
|
When tracking variable types, instructions that modify a pointer value
in an untracked way can lead to incorrect type propagation. To prevent
this, invalidate the register state when encountering such instructions.
This change invalidates pointer types for various arithmetic and bitwise
operations that current pointer offset tracking doesn't support, like
imul, shl, and, inc, etc.
A special case is added for 'xor reg, reg', which is a common idiom for
zeroing a register. For this, the register state is updated to be a
constant with a value of 0.
This could introduce slight regressions if a variable is zeroed and then
reused. This can be addressed in the future by using all DWARF locations
for instruction tracking instead of only the first one.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The tracked pointer offset was not being preserved in the stack state,
which could lead to incorrect type analysis. This change adds a
ptr_offset field to the type_state_stack struct and passes it to
set_stack_state and findnew_stack_state to ensure the offset is
preserved after the pointer is loaded from a stack location. It improves
the type annotation coverage and quality.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Track the arithmetic operations on registers with pointer types. We
handle only add, sub and lea instructions. The original pointer
information needs to be preserved for getting outermost struct types.
For example, reg0 points to a struct cfs_rq, when we add 0x10 to reg0,
it should preserve the information of struct cfs_rq + 0x10 in the
register instead of a pointer type to the child field at 0x10.
Details:
1. struct type_state_reg now includes an offset, indicating if the
register points to the start or an internal part of its associated
type. This offset is used in mem to reg and reg to stack mem
transfers, and also applied to the final type offset.
2. lea offset(%sp/%fp), reg is now treated as taking the address of a
stack variable. It worked fine in most cases, but an issue with this
approach is the pointer type may not exist.
3. lea offset(%base), reg is handled by moving the type from %base and
adding an offset, similar to an add operation followed by a mov reg
to reg.
4. Non-stack variables from DWARF with non-zero offsets in their
location expressions are now accepted with register offset tracking.
Multi-register addressing modes in LEA are not supported.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Introduce TSR_KIND_POINTER to improve the data type profiler's ability
to track pointer-based memory accesses and address register variables.
TSR_KIND_POINTER represents that the location holds a pointer type to
the type in the type state. The semantics match the `breg` registers
that describe a memory location.
This change implements handling for this new kind in mov instructions
and in the check_matching_type() function. When a TSR_KIND_POINTER is
moved to the stack, the stack state size is set to the architecture's
pointer size.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Introduce a helper function is_address_gen_insn() to check
arch-dependent address generation instructions like lea in x86. Remove
type annotation on these instructions since they are not accessing
memory. It should be counted as `no_mem_ops`.
Signed-off-by: Zecheng Li <zecheng@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Check the error code of evsel__get_arch() in the symbol__annotate().
Previously it checked non-zero value but after the refactoring it does
only for negative values.
Fixes: 0669729eb0afb0cf ("perf annotate: Factor out evsel__get_arch()")
Suggested-by: James Clark <james.clark@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When perf report with annotation for a symbol, press 's' and 'T', then exit
the annotate browser. Once annotate the same symbol, the annotate browser
will crash.
The browser.arch was required to be correctly updated when data type
feature was enabled by 'T'. Usually it was initialized by symbol__annotate2
function. If a symbol has already been correctly annotated at the first
time, it should not call the symbol__annotate2 function again, thus the
browser.arch will not get initialized. Then at the second time to show the
annotate browser, the data type needs to be displayed but the browser.arch
is empty.
Stack trace as below:
Perf: Segmentation fault
-------- backtrace --------
#0 0x55d365 in ui__signal_backtrace setup.c:0
#1 0x7f5ff1a3e930 in __restore_rt libc.so.6[3e930]
#2 0x570f08 in arch__is perf[570f08]
#3 0x562186 in annotate_get_insn_location perf[562186]
#4 0x562626 in __hist_entry__get_data_type annotate.c:0
#5 0x56476d in annotation_line__write perf[56476d]
#6 0x54e2db in annotate_browser__write annotate.c:0
#7 0x54d061 in ui_browser__list_head_refresh perf[54d061]
#8 0x54dc9e in annotate_browser__refresh annotate.c:0
#9 0x54c03d in __ui_browser__refresh browser.c:0
#10 0x54ccf8 in ui_browser__run perf[54ccf8]
#11 0x54eb92 in __hist_entry__tui_annotate perf[54eb92]
#12 0x552293 in do_annotate hists.c:0
#13 0x55941c in evsel__hists_browse hists.c:0
#14 0x55b00f in evlist__tui_browse_hists perf[55b00f]
#15 0x42ff02 in cmd_report perf[42ff02]
#16 0x494008 in run_builtin perf.c:0
#17 0x494305 in handle_internal_command perf.c:0
#18 0x410547 in main perf[410547]
#19 0x7f5ff1a295d0 in __libc_start_call_main libc.so.6[295d0]
#20 0x7f5ff1a29680 in __libc_start_main@@GLIBC_2.34 libc.so.6[29680]
#21 0x410b75 in _start perf[410b75]
Fixes: 1d4374afd000 ("perf annotate: Add 'T' hot key to toggle data type display")
Reviewed-by: James Clark <james.clark@linaro.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The recent change for perf c2c annotate broke build without slang
support like below.
builtin-annotate.c: In function 'hists__find_annotations':
builtin-annotate.c:522:73: error: 'NO_ADDR' undeclared (first use in this function); did you mean 'NR_ADDR'?
522 | key = hist_entry__tui_annotate(he, evsel, NULL, NO_ADDR);
| ^~~~~~~
| NR_ADDR
builtin-annotate.c:522:73: note: each undeclared identifier is reported only once for each function it appears in
builtin-annotate.c:522:31: error: too many arguments to function 'hist_entry__tui_annotate'
522 | key = hist_entry__tui_annotate(he, evsel, NULL, NO_ADDR);
| ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from util/sort.h:6,
from builtin-annotate.c:28:
util/hist.h:756:19: note: declared here
756 | static inline int hist_entry__tui_annotate(struct hist_entry *he __maybe_unused,
| ^~~~~~~~~~~~~~~~~~~~~~~~
And I noticed that it missed to update the other side of #ifdef
HAVE_SLANG_SUPPORT. Let's fix it.
Cc: Tianyou Li <tianyou.li@intel.com>
Fixes: cd3466cd2639783d ("perf c2c: Add annotation support to perf c2c report")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When doing an in source build, $(OUTPUT) is empty so the rule has the
same input and output file. Suppress the warning by only adding the rule
when doing an out of source build. The same condition already exists for
the clean rule for json files.
This fixes the following warnings:
make[3]: Circular pmu-events/arch/nds32/mapfile.csv <- pmu-events/arch/nds32/mapfile.csv dependency dropped.
make[3]: Circular pmu-events/arch/powerpc/mapfile.csv <- pmu-events/arch/powerpc/mapfile.csv dependency dropped.
...
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
JDIR is unused since commit 4bb55de4ff03 ("perf jevents: Support copying
the source json files to OUTPUT"), remove it.
Signed-off-by: James Clark <james.clark@linaro.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The unquoted glob *.json will expand to a real file if, for example,
there is any file in the Perf source ending in .json. This can happen
when using tools like Bear and clangd which generate a
compile_commands.json file. With the glob already expanded by the shell,
the find command will fail to wildcard any real json events files.
Fix it by wrapping the star in quotes so it's passed to find rather than
the shell.
This fixes the following build error (most of the diff output omitted):
$ make V=1 -C tools/perf O=/tmp/perf_build_with_json
TEST /tmp/perf_build_with_json/pmu-events/empty-pmu-events.log
...
/* offset=121053 */ "node-access\000legacy cache\000Local memory read accesses\000legacy-cache-config=6\000\00010\000\000\000\000\000"
/* offset=121135 */ "node-misses\000legacy cache\000Local memory read misses\000legacy-cache-config=0x10006\000\00010\000\000\000\000\000"
/* offset=121221 */ "node-miss\000legacy cache\000Local memory read misses\000legacy-cache-config=0x10006\000\00010\000\000\000\000\000"
...
- {
.event_table = { 0, 0 },
.metric_table = { 0, 0 },
},
make[3]: *** [pmu-events/Build:54: /tmp/perf_build_with_json/pmu-events/empty-pmu-events.log] Error 1
Fixes: 4bb55de4ff03 ("perf jevents: Support copying the source json files to OUTPUT")
Signed-off-by: James Clark <james.clark@linaro.org>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Events with an X modifier were reordered within a group, for example
slots was made the leader in:
```
$ perf record -e '{cpu/mem-stores/ppu,cpu/slots/uX}' -- sleep 1
```
Fix by making `dont_regroup` evsels always use their index for
sorting. Make the cur_leader, when fixing the groups, be that of
`dont_regroup` evsel so that the `dont_regroup` evsel doesn't become a
leader.
On a tigerlake this patch corrects this and meets expectations in:
```
$ perf stat -e '{cpu/mem-stores/,cpu/slots/uX}' -a -- sleep 0.1
Performance counter stats for 'system wide':
83,458,652 cpu/mem-stores/
2,720,854,880 cpu/slots/uX
0.103780587 seconds time elapsed
$ perf stat -e 'slots,slots:X' -a -- sleep 0.1
Performance counter stats for 'system wide':
732,042,247 slots (48.96%)
643,288,155 slots:X (51.04%)
0.102731018 seconds time elapsed
```
Closes: https://lore.kernel.org/lkml/18f20d38-070c-4e17-bc90-cf7102e1e53d@linux.intel.com/
Fixes: 035c17893082 ("perf parse-events: Add 'X' modifier to exclude an event from being regrouped")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add support to highlight the contention line in the annotate browser,
use 'TAB'/'UNTAB' to refocus to the contention line.
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Thomas Falcon <thomas.falcon@intel.com>
Reviewed-by: Jiebin Sun <jiebin.sun@intel.com>
Reviewed-by: Pan Deng <pan.deng@intel.com>
Reviewed-by: Zhiguo Zhou <zhiguo.zhou@intel.com>
Reviewed-by: Wangyang Guo <wangyang.guo@intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Perf c2c report currently specified the code address and source:line
information in the cacheline browser, while it is lack of annotation
support like perf report to directly show the disassembly code for
the particular symbol shared that same cacheline. This patches add
a key 'a' binding to the cacheline browser which reuse the annotation
browser to show the disassembly view for easier analysis of cacheline
contentions.
Signed-off-by: Tianyou Li <tianyou.li@intel.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Reviewed-by: Thomas Falcon <thomas.falcon@intel.com>
Reviewed-by: Jiebin Sun <jiebin.sun@intel.com>
Reviewed-by: Pan Deng <pan.deng@intel.com>
Reviewed-by: Zhiguo Zhou <zhiguo.zhou@intel.com>
Reviewed-by: Wangyang Guo <wangyang.guo@intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The MAX_EVENTS value ensured a counted loop presumably to satisfy the
BPF verifier. It is possible to go past 32 events when gathering
uncore events. Increase the amount to 1024 as that should provide some
amount of headroom.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Duplicate metrics may exist on hybrid platforms, with the metric's PMU
being used to select the metric to use. Incorporate the metric PMU
into the ilist display and support opening it just for a given PMU.
Before:
```
⭘ Interactive Perf List
├── ▼ TopdownL1 tma_backend_bound
│ ├── tma_backend_bound Counts the total number of issue slots that were
│ ├── ▶ tma_backend_bound_group not consumed by the backend due to backend stalls
│ ├── tma_backend_bound Counts the total number of issue slots that were
│ ├── ▶ tma_backend_bound_group not consumed by the backend due to backend stalls.
│ ├── tma_bad_speculation Note that uops must be available for consumption
│ ├── ▶ tma_bad_speculation_group in order for this event to count. If a uop is not
│ ├── tma_bad_speculation available (IQ is empty), this event will not count
│ ├── ▶ tma_bad_speculation_group cpu_atom@TOPDOWN_BE_BOUND.ALL@ / (5 *
│ ├── tma_frontend_bound cpu_atom@CPU_CLK_UNHALTED.CORE@)
│ ├── ▶ tma_frontend_bound_group tma_backend_bound > 0.1
│ ├── tma_frontend_bound ▆▆
│ ├── ▶ tma_frontend_bound_group
│ ├── tma_retiring
│ ├── ▶ tma_retiring_group
│ ├── tma_retiring
│ └── ▶ tma_retiring_group
├── ▶ TopdownL2
total▄▄▅▅▆▅▅▂▁▁▁▁▂▃▂▂▃▄▄▇▇█▆▆▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▄▅▅▅▄▆▆▆▅▅▅▅▅▅▇▇▇▇▆▅▆▆▆▆▅▅▅▄▃▃▃▃▃▃▃▃▃▃▄▄▄▅▅▅▅▅▆▆▆▆▆▆
cpu0▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
cpu1▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▃▃▃▃▃▃▃▄▄▄▄▄▄▄▄▄▄▅▅▅▅▅▅▅▅▅▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
cpu2▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
cpu3▁▁▁▁▁▁▁▁▁▄▄▄▄▄▄▄▄▄▄█████▆▆▆▆▆▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅
cpu4████▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂
cpu5▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▃▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▇▇▇▇▇▇▇▇▇▇▆▆
cpu6▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█████▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
cpu7▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▃▃▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
cpu8▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
cpu9▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂█████▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅
cpu10▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
cpu11▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█████▁▁▁▁▁▁▁▁▁
cpu12▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
cpu13▁▁▁▁▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
cpu14▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇█████▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
cpu15▁▁▁▁▁▁▁▁▁▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
cpu16▃▃▃▃▃▃▃▃▃▄▄▃▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▃▃▃▃▃▃▃▃▄▄▄▄▄▄▁▁▃▃▃▃▃▃▃▃▃▃▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
cpu17▁▁▁▁▁▄▄▅▅▅▅▅▅▅▅▄▄▄▄▄▄▄▄▃▃▃▃▂▂▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▄▄▄▄▃▃▃▃▂▂▂▂▄▄▄▄▄▄▆▆▆▆▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
s Search n Next p Previous c Collapse ^q Quit ▏^p palette
```
After:
```
⭘ Interactive Perf List
├── ▼ TopdownL1 tma_backend_bound
│ ├── tma_backend_bound (cpu_atom) Counts the total number of issue slots that were
│ ├── ▼ tma_backend_bound_group (cpu_atom) not consumed by the backend due to backend stalls
│ │ ├── tma_core_bound (cpu_atom) Counts the total number of issue slots that were
│ │ ├── ▶ tma_core_bound_group (cpu_atom not consumed by the backend due to backend stalls.
│ │ ├── tma_resource_bound (cpu_atom) Note that uops must be available for consumption
│ │ └── ▶ tma_resource_bound_group (cpu_ in order for this event to count. If a uop is not
│ ├── tma_backend_bound (cpu_core) available (IQ is empty), this event will not count
│ ├── ▶ tma_backend_bound_group (cpu_core) cpu_atom@TOPDOWN_BE_BOUND.ALL@ / (5 *
│ ├── tma_bad_speculation (cpu_atom) cpu_atom@CPU_CLK_UNHALTED.CORE@)
│ ├── ▶ tma_bad_speculation_group (cpu_ato▆▆tma_backend_bound > 0.1
│ ├── tma_bad_speculation (cpu_core)
│ ├── ▶ tma_bad_speculation_group (cpu_cor▃▃
│ ├── tma_frontend_bound (cpu_atom)
│ ├── ▶ tma_frontend_bound_group (cpu_atom
│ ├── tma_frontend_bound (cpu_core)
│ ├── ▶ tma_frontend_bound_group (cpu_core
▌
total▁▁▁▁▂▂▂▂▂▂▂▂▂▃▃▃▃▃▃▃▃▃▃▂▂▂▂▃▃▃▄▄▄▄▄▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▄▄▄▄▄▄▅▅▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▇▆▇▇
cpu16▇▇▇▇▇▇▇▇▇▇▇▆▆▁▁▁▁▁▁▁▁▁▁▁▁▂▂▄▄▅▅▅▅▅▅▆▆▆▆▆▆▆▆▇▇▇▇▆▆▆▆▆▆▆▆▅▅▅▅▅▅▅▅▅▅▅▅▅▅▆▆▆▆▄▄▄▄▃▃▄▄▄▄▇▇▇▇▇▇▇▇▇▇
cpu17█▇▇▇▇▇▇▇▇▆▆▆▆▆▆▅▅▅▅▃▃▃▃▂▂▁▁▅▅▅▅▅▅▅▅▄▄▄▄▄▄▄▄▄▄▄▄▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▅▅▄▄▂▂▇▇▇▇▆▆▅▅▆▆
cpu18▇▇▇▇▇██▇▇▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▃▃▃▃▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▄▄▄▄▄▅▅▅▅▅▅▅▅
cpu19▇▃▃▄▄▄▄▄▄▄▄▄▄▅▅▅▅▅▅▅▅▁▁▂▂▃▃▃▃▅▅▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇██▇▇▇▇▇▇▆▆▅▅▅▅▆▆▄▄▄▄▅▅
cpu20▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▄▄▄▄▅▅▅▅▅▅▆▆▇▇
cpu21▇▇▇▇▇▇▇██▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▅▅▄▄▂▂▂▂▂▂▁▁▁▁
cpu22█▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▆▆▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▂▂▁▁▁▁▂▂▂▂▂▂▂▂
cpu23▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▄▄▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▅▇▇▇▇▆▆██▇▇▇▇▇▇
cpu24▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▃▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▅▅▇▇▆▆▆▆▆▆▇▇▇▇
cpu25▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▆▆▆▆▇▇▇▇▇▇██
cpu26▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▄▄▇▇▇▇▇▇▇▇▇▇██▇▇▇▇▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▆▆▆▇▇▇▇▇▇▇▇▆▆▆▆▆▆▆▆▆▆▂▂▁▁▁▁▂▂▂▂▂▂▂▂
cpu27▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▃▄▄▄▄▅▅▅▅▅▅▇▇
total 7.4923074548462605
cpu16 0.2961618003253457
cpu17 0.3065719718925585
cpu18 0.27800656881051855
cpu19 0.28564742078353406
cpu20 0.2764790653117084
s Search n Next p Previous c Collapse ^q Quit ▏^p palette
```
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add an optional PMU argument to parse_metrics to allow restriction of
the particular metrics to be opened. If no argument is provided then
all metrics with the given name/group are opened
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Gautam Menghani <gautam@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Unsupported legacy events are flagged as deprecated. Don't display
these events in ilist as they won't open and there are over 1,000
legacy cache events.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Synthesizing mmaps in perf trace is unnecessary unless call chains are
being generated.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Howard Chu <howardchu95@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add TEST_ASSERT_EVSEL to dump the failing evsel in the event of a
failure.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add TEST_ASSERT_EVLIST to dump the failing evlist in the event of a
failure.
Add the macro to a number of tests not currently checking the evlist
length.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Just have a single test_hw_config helper that strips extended type
information in the case of hardware and hardware cache events.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Without a PMU perf matches an event against any PMU with the
event. Unfortunately some PMU drivers advertise a "cycles" event which
is typically just a core event. As tests assume a core event, switch
to use "cpu-cycles" that avoids the overloaded "cycles" event on
troublesome PMUs and is so far not overloaded. Note, on x86 this
changes a legacy event into a sysfs one.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In the event parse string, switch "cpu" to "default_core" and then
rewrite this to the first core PMU name prior to parsing. This enables
testing with a PMU on hybrid x86 and other systems that don't use
"cpu" for the core PMU name. The name "default_core" is already used
by jevents. Update test expectations to match.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Without a PMU perf matches an event against any PMU with the
event. Unfortunately some PMU drivers advertise a "cycles" event which
is typically just a core event. Switch to using "cpu-cycles" which is
an indentical legacy event but avoids the multiple PMU confusion
introduced by the PMU drivers. Note, on x86 cpu-cycles is also a sysfs
event but cycles isn't.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Switch from the test's assert_hw/test_config to the common
evsel__match code that appropriately handles events with both legacy
and sysfs/json encoding.
For tests asserting that a config value matches that placed in the
perf_event_attr just directly compare the config values.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Ensure both the perf_event_attr and alternate_hw_config are checked in
the match. Don't mask the config if the perf_event_attr isn't a
HARDWARE or HW_CACHE event. Add common early exit cases.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rather than wildcard matching the cycles event specify only the core
PMUs. This avoids potentially loading unnecessary uncore PMUs.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rather than distributing the code doing similar things to
evlist__new_default, use the one implementation so that paranoia and
wildcard scanning can be optimized.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rather than distributing the code doing similar things to
evlist__new_default, use the one implementation so that paranoia and
wildcard scanning can be optimized.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Now that legacy hardware and cache events are in json, having the
lexer match the specific event is no longer necessary and generic PMU
parsing can take place. Because of this remove the specific term
parsing, event adding, and passing of alternate_hw_config which was
now always PERF_COUNT_HW_MAX.
This mirrors a similar change for software events in commit 6e9fa4131abb
("perf parse-events: Remove non-json software events").
With no hard coded legacy hardware or cache events the wild card, case
insensitivity, etc. is consistent for events. This does, however, mean
events like cycles will wild card against all PMUs. A change does the
same was originally posted and merged from:
https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
and reverted by Linus in commit 4f1b067359ac ("Revert "perf
parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
his dislike for the cycles behavior on ARM. Earlier patches in this
series make perf record event opening failures non-fatal and hide the
cycles event's failure to open on ARM in perf record, so it is
expected the behavior will now be transparent in perf record. perf
stat with a cycles event will wildcard open the event on all PMUs. As
cycles is a "default event", the perf stat behavior for default events
was updated to only open them on core/software PMUs.
The change to support legacy events with PMUs was done to clean up
Intel's hybrid PMU implementation. Having sysfs/json events with
increased priority to legacy was requested by Mark Rutland
<mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
events on that PMU. It was requested that RISC-V be able to add events
to the perf tool json so the PMU driver didn't need to map legacy
events to config encodings:
https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
A previous series of patches decreasing legacy hardware event
priorities was posted in:
https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
software events can be implemented similarly:
https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Now legacy hardware events are in json there's no need for a specific
printing routine that previously served for both hardware and software
events. The associated event_symbols_hw is also removed. To support
the previous filtered version use an event glob of "legacy hardware"
which matches the topic of the json events.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Now legacy cache events are in json there's no need for a specific
printing routine. To support the previous filtered version use an
event glob of "legacy cache" which matches the topic of the json
events.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The legacy-hardware.json is added containing hardware events similarly
to the software.json file. A difference is that for the software PMU
the name is known and matches sysfs. In the legacy-hardware.json no
Unit/PMU is specified for the events meaning default_core is used and
the events will appear for all core PMUs.
There are potentially 1216 legacy cache events, rather than list them
in a json file add a make_legacy_cache.py helper to generate them.
By using json for legacy hardware and cache events: descriptions of
the events can be added; events can be marked as deprecated, such as
those misleadingly named l2 (deprecated is also used to mark all
events that weren't previously displayed in perf list); and the name
lookup becomes case insensitive.
The C string encoding all the perf events and metrics is increased in
size by 123,499 bytes which will increase the perf binary size. Later
changes will remove hard coded event parsing for legacy hardware and
cache events, turning parsing overhead into a binary search during
event lookup.
That event descriptions are based off of those in perf_event_open man
page, credit to Vince Weaver <vincent.weaver@maine.edu>.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add support to finding/adding events from the default_core event
table. If an event already exists from sysfs/json then the
default_core configuration is saved in the legacy_terms string. Lazily
use the legacy_terms string to set a legacy hardware or cache event as
deprecated if the core PMU doesn't support it. Use the legacy terms
string to set the alternate_hw_config, avoiding the value needing to
be passed from the parse_events parser.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add json LegacyConfigCode and LegacyCacheCode values that translate to
legacy-hardware-config and legacy-cache-config event terms
respectively.
Add perf_pmu__default_core_events_table as a means to find a
default_core event table that will later contain legacy events.
In situations like hypervisors it is more likely that tables will be
NULL. Rather than testing in the calling PMU code, early exit in the
pmu-event.c routines.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add the PMU terms legacy-hardware-config and
legacy-cache-config. These terms are similar to the config term in
that their values are assigned to the perf_event_attr config
value. They differ in that the PMU type is switched to be either
PERF_TYPE_HARDWARE or PERF_TYPE_HW_CACHE, and the PMU type is moved
into the extended type information of the config value. This will
allow later patches to add legacy events to json.
An example use of the terms is in the following:
```
$ perf stat -vv -e 'cpu/legacy-hardware-config=1/,cpu/legacy-cache-config=0x10001/' true
Using CPUID GenuineIntel-6-8D-1
Attempt to add: cpu/legacy-hardware-config=0x1/
..after resolving event: cpu/legacy-hardware-config=0x1/
Attempt to add: cpu/legacy-cache-config=0x10001/
..after resolving event: cpu/legacy-cache-config=0x10001/
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0x1 (PERF_COUNT_HW_INSTRUCTIONS)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 994937 cpu -1 group_fd -1 flags 0x8 = 3
------------------------------------------------------------
perf_event_attr:
type 3 (PERF_TYPE_HW_CACHE)
size 136
config 0x10001 (PERF_COUNT_HW_CACHE_RESULT_MISS | PERF_COUNT_HW_CACHE_OP_READ | PERF_COUNT_HW_CACHE_L1I)
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
------------------------------------------------------------
sys_perf_event_open: pid 994937 cpu -1 group_fd -1 flags 0x8 = 4
cpu/legacy-hardware-config=1/: -1: 1364046 414756 414756
cpu/legacy-cache-config=0x10001/: -1: 57453 414756 414756
cpu/legacy-hardware-config=1/: 1364046 414756 414756
cpu/legacy-cache-config=0x10001/: 57453 414756 414756
Performance counter stats for 'true':
1,364,046 cpu/legacy-hardware-config=1/
57,453 cpu/legacy-cache-config=0x10001/
0.001988593 seconds time elapsed
0.002194000 seconds user
0.000000000 seconds sys
```
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Factor existing functionality in perf_pmu__name_from_config into a
helper that will be used in later patches.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The FILE argument was necessary for the scanner but now that
functionality is not being used we can switch to just using
io__getline which should cut down on stdio buffer usage.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Now the events file isn't directly parsed from a FILE but stored in a
string prior to parsing, remove the FILE argument to the associated
scanner functions as they only ever pass NULL.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
When an event/alias is created for a PMU the terms are eagerly parsed
using parse_events_terms. For a command like perf stat or perf record,
the particular event/alias will be found, the terms parsed, the
terms cloned for use in the event parsing, and then the terms used to
configure the perf_event_attr. Events/aliases may be eagerly loaded,
such as from sysfs or in perf list, in which case the aliases terms
will be little or never used. To avoid redundant work, to avoid
cloning, and to reduce memory overhead, hold the terms for an event as
a string until they need handling as a term list. This may introduce
duplicate parsing if an event is repeated in a list, but this
situation is expected to be uncommon.
Measuring the number of instructions before and after with a sysfs
event and perf stat, there is a minor reduction in the number of
instructions executed by 0.3%.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The jevents command expects all json files to be organized under a
single directory. When generating json files from scripts (to reduce
laborious copy and paste in the json) we don't want to generate the
json into the source directory if there is an OUTPUT directory
specified. This change adds a GEN_JSON for this case where the
GEN_JSON copies the JSON files to OUTPUT, only when OUTPUT is
specified. The Makefile.perf clean code is updated to clean up this
directory when present.
This patch is part of:
https://lore.kernel.org/lkml/20240926173554.404411-12-irogers@google.com/
which was similarly adding support for generating json in scripts for
the consumption of jevents.py.
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Whilst for many tools it is an expected behavior that failure to open
a perf event is a failure, ARM decided to name PMU events the same as
legacy events and then failed to rename such events on a server uncore
SLC PMU. As perf's default behavior when no PMU is specified is to
open the event on all PMUs that advertise/"have" the event, this
yielded failures when trying to make the priority of legacy and
sysfs/json events uniform - something requested by RISC-V and ARM. A
legacy event user on ARM hardware may find their event opened on an
uncore PMU which for perf record will fail. Arnaldo suggested skipping
such events which this patch implements. Rather than have the skipping
conditional on running on ARM, the skipping is done on all
architectures as such a fundamental behavioral difference could lead
to problems with tools built/depending on perf.
An example of perf record failing to open events on x86 is:
```
$ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.
Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.
Error:
Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
The LLC-prefetch-read event is not supported.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
$ perf report --stats
Aggregated stats:
TOTAL events: 17255
MMAP events: 284 ( 1.6%)
COMM events: 1961 (11.4%)
EXIT events: 1 ( 0.0%)
FORK events: 1960 (11.4%)
SAMPLE events: 87 ( 0.5%)
MMAP2 events: 12836 (74.4%)
KSYMBOL events: 83 ( 0.5%)
BPF_EVENT events: 36 ( 0.2%)
FINISHED_ROUND events: 2 ( 0.0%)
ID_INDEX events: 1 ( 0.0%)
THREAD_MAP events: 1 ( 0.0%)
CPU_MAP events: 1 ( 0.0%)
TIME_CONV events: 1 ( 0.0%)
FINISHED_INIT events: 1 ( 0.0%)
cycles stats:
SAMPLE events: 87
```
If all events fail to open then the perf record will fail:
```
$ perf record -e LLC-prefetch-read true
Error:
Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
The LLC-prefetch-read event is not supported.
Error:
Failure to open any events for recording
```
As an evlist may have dummy events that open when all command line
events fail we ignore dummy events when detecting if at least some
events open. This still permits the dummy event on its own to be used
as a permission check:
```
$ perf record -e dummy true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.046 MB perf.data ]
```
but allows failure when a dummy event is implicilty inserted or when
there are insufficient permissions to open it:
```
$ perf record -e LLC-prefetch-read -a true
Error:
Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
The LLC-prefetch-read event is not supported.
Error:
Failure to open any events for recording
```
As the first parsed event in an evlist is marked as tracking, removing
this event can remove tracking from the evlist, removing mmap events
and breaking symbolization. To avoid this, if a tracking event is
removed then the next event has tracking added.
The issue with legacy events is that on RISC-V they want the driver to
not have mappings from legacy to non-legacy config encodings for each
vendor/model due to size, complexity and difficulty to update. It was
reported that on ARM Apple-M? CPUs the legacy mapping in the driver
was broken and the sysfs/json events should always take precedent,
however, it isn't clear this is still the case. It is the case that
without working around this issue a legacy event like cycles without a
PMU can encode differently than when specified with a PMU - the
non-PMU version favoring legacy encodings, the PMU one avoiding legacy
encodings. Legacy events are also case sensitive while sysfs/json
events are not.
The patch removes events and then adjusts the idx value for each
evsel. This is done so that the dense xyarrays used for file
descriptors, etc. don't contain broken entries.
On ARM it could be common following this change to see a lot of
warnings for the cycles event due to many ARM PMUs advertising the
cycles event (ARM inconsistently have events bus_cycles and then
cycles implying CPU cycles, they also sometimes have a cpu_cycles
event). As cycles is a popular event, avoid potentially spamming users
with error messages on ARM when there are multiple cycles events in
the evlist, the error is still shown when verbose is enabled.
Prior versions without adding the tracking data and not warning for
cycles on ARM was:
Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: James Clark <james.clark@linaro.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|