linux.git/tools/perf, branch v5.18-rc2

perf annotate: Drop objdump stderr to avoid getting stuck waiting for stdout output

2022-04-09T17:21:00+00:00

If objdump writes to stderr it can block waiting for that output to be
read. Since perf doesn't read objdump's stderr, progress stops with perf
waiting for stdout output that never arrives.
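
A minimal sketch of the general technique, assuming a POSIX system: the
child's stdout goes to a pipe that the parent drains, and its stderr goes
to /dev/null so it can never fill an unread pipe. The helper name
run_discarding_stderr() and the use of /bin/sh are assumptions made for
illustration; this is not the code perf uses to spawn objdump.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not perf code): run `cmd` via /bin/sh, capture its
 * stdout into `out` and discard its stderr, so the child cannot block on
 * a stderr pipe that nobody reads.
 * Returns the number of bytes captured, or -1 on error.
 */
static ssize_t run_discarding_stderr(const char *cmd, char *out, size_t outlen)
{
	int fds[2];
	pid_t pid;
	ssize_t n;
	size_t total = 0;

	assert(outlen > 0);

	if (pipe(fds) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: stdout goes to the pipe, stderr to /dev/null. */
		int devnull = open("/dev/null", O_WRONLY);

		if (devnull < 0 || dup2(fds[1], STDOUT_FILENO) < 0 ||
		    dup2(devnull, STDERR_FILENO) < 0)
			_exit(127);
		close(fds[0]);
		close(fds[1]);
		close(devnull);
		execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
		_exit(127);
	}

	/* Parent: only stdout needs draining. */
	close(fds[1]);
	while (total < outlen - 1 &&
	       (n = read(fds[0], out + total, outlen - 1 - total)) > 0)
		total += (size_t)n;
	out[total] = '\0';

	/* Close before waiting so a child with more output cannot stall us. */
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return (ssize_t)total;
}
```

Calling run_discarding_stderr("echo out; echo err 1>&2", buf, sizeof(buf))
leaves only the stdout text in buf.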

Signed-off-by: Ian Rogers 
Cc: Alexander Shishkin 
Cc: Alexandre Truong 
Cc: Dave Marchevsky 
Cc: Denis Nikitin 
Cc: German Gomez 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: John Garry 
Cc: Leo Yan 
Cc: Lexi Shao 
Cc: Li Huafei 
Cc: Mark Rutland 
Cc: Martin Liška 
Cc: Masami Hiramatsu 
Cc: Mathieu Poirier 
Cc: Michael Petlan 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Ravi Bangoria 
Cc: Remi Bernon 
Cc: Riccardo Mancini 
Cc: Song Liu 
Cc: Stephane Eranian 
Cc: Thomas Richter 
Cc: Will Deacon 
Cc: William Cohen 
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20220407230503.1265036-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

perf tools: Add external commands to list-cmds

2022-04-09T17:21:00+00:00

The `perf --list-cmds` output prints only internal commands, although
there is no reason for that from users' perspective.

Adding the external commands to the commands array with a NULL function
pointer allows printing all perf commands while not changing the logic
of command handler selection.
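
The idea can be sketched as follows. The struct layout, the command
names and the helpers list_cmds() and find_handler() are hypothetical and
much simpler than perf's real command table; the point is only that an
entry with a NULL handler is listed but never dispatched internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int (*cmd_fn_t)(int argc, const char **argv);

/* Hypothetical, simplified command table entry. */
struct cmd_entry {
	const char *name;
	cmd_fn_t fn;		/* NULL marks an external command */
};

static int cmd_demo(int argc, const char **argv)
{
	(void)argv;
	return argc;
}

/* Illustrative names only, not perf's actual command list. */
static const struct cmd_entry cmds[] = {
	{ "internal-one", cmd_demo },
	{ "internal-two", cmd_demo },
	{ "external-one", NULL },
};

#define NR_CMDS (sizeof(cmds) / sizeof(cmds[0]))

/* Listing walks every entry, whether or not it has a handler. */
static size_t list_cmds(char *out, size_t outlen)
{
	size_t i, used = 0;

	assert(outlen > 0);
	out[0] = '\0';
	for (i = 0; i < NR_CMDS; i++) {
		int n = snprintf(out + used, outlen - used, "%s%s",
				 i ? " " : "", cmds[i].name);

		if (n < 0 || (size_t)n >= outlen - used) {
			out[used] = '\0';	/* drop the truncated name */
			break;
		}
		used += (size_t)n;
	}
	return used;
}

/* Handler selection skips entries without a function pointer. */
static cmd_fn_t find_handler(const char *name)
{
	size_t i;

	for (i = 0; i < NR_CMDS; i++) {
		if (cmds[i].fn && !strcmp(cmds[i].name, name))
			return cmds[i].fn;
	}
	return NULL;
}
```

In this sketch a caller that gets NULL from find_handler() would fall
back to whatever path it already uses to run external commands.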

Signed-off-by: Michael Petlan 
Acked-by: Ian Rogers 
Cc: Jiri Olsa 
Link: https://lore.kernel.org/r/20220404221541.30312-2-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo

perf docs: Add perf-iostat link to manpages

2022-04-09T17:20:59+00:00

Signed-off-by: Michael Petlan 
Acked-by: Ian Rogers 
Cc: Jiri Olsa 
Link: https://lore.kernel.org/r/20220404221541.30312-1-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo

perf session: Remap buf if there is no space for event

2022-04-09T17:20:59+00:00

If a perf event doesn't fit into the remaining buffer space, return NULL
to remap buf and fetch the event again.

Keep the logic to error out on inadequate input from fuzzing.
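
A rough sketch of the decision described above, written as a pure
function so it can be checked in isolation. The enum, the function name
classify_fetch() and the page-alignment term are assumptions made for
illustration; the real code works on the mapped buffer and may differ in
detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum fetch_result { FETCH_OK, FETCH_REMAP, FETCH_INVALID };

/* struct perf_event_header is u32 type + u16 misc + u16 size = 8 bytes. */
#define HDR_SIZE 8u

/*
 * Hypothetical classifier: decide what to do with an event of
 * `event_size` bytes starting at offset `head` in a window of
 * `mmap_size` bytes.
 */
static enum fetch_result classify_fetch(uint64_t head, uint16_t event_size,
					uint64_t mmap_size, uint64_t page_size)
{
	assert(page_size > 0 && head <= mmap_size);

	/* Not even the header is fully mapped: remap and retry. */
	if (head + HDR_SIZE > mmap_size)
		return FETCH_REMAP;

	/* The whole event is inside the current window. */
	if (head + event_size <= mmap_size)
		return FETCH_OK;

	/*
	 * Assumption: after a remap the window starts at the page that
	 * contains `head`, so the event fits if it is no larger than the
	 * window minus the offset within that page.
	 */
	if (event_size <= mmap_size - head % page_size)
		return FETCH_REMAP;

	/* It can never fit: treat as fuzzed or otherwise invalid input. */
	return FETCH_INVALID;
}
```

With the values from the failure below (head=0x1fffff8, size 0x30,
mmap_size=0x2000000) and an assumed 4 KiB page size, the sketch asks for
a remap instead of reporting an error.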

This fixes perf failing on ChromeOS (with 32b userspace):

  $ perf report -v -i perf.data
  ...
  prefetch_event: head=0x1fffff8 event->header_size=0x30, mmap_size=0x2000000: fuzzed or compressed perf.data?
  Error:
  failed to process sample

Fixes: 57fc032ad643ffd0 ("perf session: Avoid infinite loop when seeing invalid header.size")
Reviewed-by: James Clark 
Signed-off-by: Denis Nikitin 
Acked-by: Jiri Olsa 
Cc: Alexander Shishkin 
Cc: Alexey Budankov 
Cc: Namhyung Kim 
Link: https://lore.kernel.org/r/20220330031130.2152327-1-denik@chromium.org
Signed-off-by: Arnaldo Carvalho de Melo

perf bench: Fix epoll bench to correct usage of affinity for machines with #CPUs > 1K

2022-04-09T15:34:29+00:00

The 'perf bench epoll' testcase fails on systems with more than 1K CPUs.

Testcase: perf bench epoll all

Result snippet:
<<>>
Run summary [PID 106497]: 1399 threads monitoring on 64 file-descriptors for 8 secs.

perf: pthread_create: No such file or directory
<<>>

In the epoll benchmarks (ctl, wait), pthread_create is invoked in
do_threads from the respective bench_epoll_* function. Though the logs
show a direct failure from pthread_create, the actual failure is from
"sched_setaffinity" returning EINVAL (invalid argument).

This happens because the default mask size in glibc is 1024. To overcome
this 1024 CPUs mask size limitation of cpu_set_t, change the mask size
using the CPU_*_S macros.

Patch addresses this by fixing all the epoll benchmarks to use CPU_ALLOC
to allocate cpumask, CPU_ALLOC_SIZE for size, and CPU_SET_S to set the
mask.
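
A minimal sketch of the allocation pattern, assuming glibc's <sched.h>
with _GNU_SOURCE. The helper name alloc_single_cpu_mask() is hypothetical
and the benchmark code itself is not reproduced here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical helper: allocate a mask large enough for `nr_cpus` CPUs
 * with only `cpu` set. The byte size is returned through `sizep` because
 * the affinity calls need it alongside the mask.
 * The caller releases the mask with CPU_FREE().
 */
static cpu_set_t *alloc_single_cpu_mask(int nr_cpus, int cpu, size_t *sizep)
{
	cpu_set_t *mask;
	size_t size;

	assert(nr_cpus > 0 && cpu >= 0 && cpu < nr_cpus);

	mask = CPU_ALLOC(nr_cpus);
	if (!mask)
		return NULL;

	size = CPU_ALLOC_SIZE(nr_cpus);
	CPU_ZERO_S(size, mask);		/* CPU_ALLOC does not zero the mask */
	CPU_SET_S(cpu, size, mask);

	*sizep = size;
	return mask;
}
```

The returned size and mask would then be passed together to
sched_setaffinity() or pthread_attr_setaffinity_np() in place of
sizeof(cpu_set_t) and a stack-allocated cpu_set_t.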

Reported-by: Disha Goel 
Signed-off-by: Athira Jajeev 
Tested-by: Disha Goel 
Acked-by: Ian Rogers 
Cc: Jiri Olsa 
Cc: Kajol Jain 
Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Nageswara R Sastry 
Cc: Srikar Dronamraju 
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220406175113.87881-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo

perf bench: Fix futex bench to correct usage of affinity for machines with #CPUs > 1K

2022-04-09T15:34:29+00:00

The 'perf bench futex' testcase fails on systems with more than 1K CPUs.

Testcase: perf bench futex all

Failure snippet:
<<>>Running futex/hash benchmark...

perf: pthread_create: No such file or directory
<<>>

In all the futex benchmarks (i.e. hash, lock-api, requeue, wake,
wake-parallel), pthread_create is invoked in the respective bench_futex_*
function. Though the logs show a direct failure from pthread_create,
strace logs showed that the actual failure is from "sched_setaffinity"
returning EINVAL (invalid argument).

This happens because the default mask size in glibc is 1024. To overcome
this 1024 CPUs mask size limitation of cpu_set_t, change the mask size
using the CPU_*_S macros.

Patch addresses this by fixing all the futex benchmarks to use CPU_ALLOC
to allocate cpumask, CPU_ALLOC_SIZE for size, and CPU_SET_S to set the
mask.

Reported-by: Disha Goel 
Reviewed-by: Srikar Dronamraju 
Signed-off-by: Athira Jajeev 
Tested-by: Disha Goel 
Acked-by: Ian Rogers 
Cc: Jiri Olsa 
Cc: Kajol Jain 
Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Nageswara R Sastry 
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220406175113.87881-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo

perf tools: Fix perf's libperf_print callback

2022-04-09T15:34:29+00:00

eprintf() does not expect va_list as the type of the 4th parameter.

Use veprintf() because it does.
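
The distinction can be illustrated with standard C only. The names
demo_vformat(), demo_print_cb() and demo_emit() are hypothetical
stand-ins, not perf's eprintf()/veprintf() or the libperf callback type:
a callback that receives a va_list has to forward it to a v-style
function, because a variadic ("...") function would treat the va_list
object itself as its first variadic argument.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char last_msg[128];

/* v-style formatter: accepts an already-started va_list. */
static int demo_vformat(char *out, size_t len, const char *fmt, va_list ap)
{
	return vsnprintf(out, len, fmt, ap);
}

/*
 * Hypothetical print callback. It is handed a va_list, so it forwards it
 * to the v-style function. Passing `ap` to a "..." function here would
 * be the kind of bug this commit describes.
 */
static int demo_print_cb(const char *fmt, va_list ap)
{
	return demo_vformat(last_msg, sizeof(last_msg), fmt, ap);
}

/* Variadic entry point, as a library might use to invoke the callback. */
static int demo_emit(const char *fmt, ...)
{
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = demo_print_cb(fmt, ap);
	va_end(ap);
	return ret;
}
```

After demo_emit("cpu %d: %s", 3, "ok"), last_msg holds the fully
formatted string.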

Signed-off-by: Adrian Hunter 
Fixes: 428dab813a56ce94 ("libperf: Merge libperf_set_print() into libperf_init()")
Cc: Jiri Olsa 
Link: https://lore.kernel.org/r/20220408132625.2451452-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo

perf: arm-spe: Fix perf report --mem-mode

2022-04-09T15:34:29+00:00

Since commit bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem
info is not available") "perf mem report" and "perf report --mem-mode"
don't allow opening the file unless one of the events has
PERF_SAMPLE_DATA_SRC set.

SPE doesn't have this set even though synthetic memory data is generated
after it is decoded. Fix this issue by setting DATA_SRC on SPE events.
This has no effect on the data collected because the SPE driver doesn't
do anything with that flag and doesn't generate samples.

Fixes: bb30acae4c4dacfa ("perf report: Bail out --mem-mode if mem info is not available")
Signed-off-by: James Clark 
Tested-by: Leo Yan 
Acked-by: Namhyung Kim 
Cc: Alexander Shishkin 
Cc: German Gomez 
Cc: Jiri Olsa 
Cc: John Garry 
Cc: Leo Yan 
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland 
Cc: Mathieu Poirier 
Cc: Ravi Bangoria 
Cc: Will Deacon 
Link: https://lore.kernel.org/r/20220408144056.1955535-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo

perf unwind: Don't show unwind error messages when augmenting frame pointer stack

2022-04-09T15:34:29+00:00

Commit b9f6fbb3b2c29736 ("perf arm64: Inject missing frames when using
'perf record --call-graph=fp'") intended to add a 'best effort' DWARF
unwind that improved the frame pointer stack in most scenarios.

It's expected that the unwind will fail sometimes, but this shouldn't be
reported as an error. The unwind only works when the return address can
be determined from the contents of the link register alone.

Fix the error shown when the unwinder requires extra registers by adding
a new flag that suppresses error messages. This flag is not set in the
normal --call-graph=dwarf unwind mode so that behavior is not changed.

Fixes: b9f6fbb3b2c29736 ("perf arm64: Inject missing frames when using 'perf record --call-graph=fp'")
Reported-by: John Garry 
Signed-off-by: James Clark 
Tested-by: John Garry 
Cc: Alexander Shishkin 
Cc: Alexandre Truong 
Cc: German Gomez 
Cc: Jiri Olsa 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Link: https://lore.kernel.org/r/20220406145651.1392529-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo

perf test tsc: Fix error message when not supported

2022-04-09T15:34:29+00:00

By default `perf test tsc` does not report an error message when the
child process detects that the kernel does not support it. Instead, the
child process prints an error message to stderr, but stderr is
redirected to /dev/null when verbose <= 0.

This patch does the following:

- Return TEST_SKIP to the parent process instead of TEST_OK when
  perf_read_tsc_conversion() is not supported.

- Add a new subtest that checks whether TSC is supported on the current
  architecture by moving existing code to a separate function.
  This avoids having two places in test__perf_time_to_tsc() that return
  TEST_SKIP.

- Extend the test suite definition to contain the above two subtests.
  The current test_suite and test_case structs do not support printing a
  skip reason when the number of subtests is less than 1. To print the
  skip reason, it is necessary to extend the current test suite
  definition.

Reviewed-by: Adrian Hunter 
Signed-off-by: Chengdong Li 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Mark Rutland 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: likexu@tencent.com
Link: https://lore.kernel.org/r/20220408084748.43707-1-chengdongli@tencent.com
Signed-off-by: Arnaldo Carvalho de Melo