linux-stable.git/tools/perf/ui, branch v6.15

perf hist stdio: Do bounds check when printing callchains to avoid UB with new gcc versions

2025-03-13T07:30:14+00:00

Do a simple bounds check to avoid this on new gcc versions:

  31    15.81 fedora:rawhide                : FAIL gcc version 15.0.1 20250225 (Red Hat 15.0.1-0) (GCC)
    In function 'callchain__fprintf_left_margin',
        inlined from 'callchain__fprintf_graph.constprop' at ui/stdio/hist.c:246:12:
    ui/stdio/hist.c:27:39: error: iteration 2147483647 invokes undefined behavior [-Werror=aggressive-loop-optimizations]
       27 |         for (i = 0; i < left_margin; i++)
          |                                      ~^~
    ui/stdio/hist.c:27:23: note: within this loop
       27 |         for (i = 0; i < left_margin; i++)
          |                     ~~^~~~~~~~~~~~~
    cc1: all warnings being treated as errors

Signed-off-by: Arnaldo Carvalho de Melo 
Link: https://lore.kernel.org/r/20250310194534.265487-4-acme@kernel.org
Signed-off-by: Namhyung Kim

perf report: Add --latency flag

2025-02-18T22:04:32+00:00

Add record/report --latency flag that allows to capture and show
latency-centric profiles rather than the default CPU-consumption-centric
profiles. For latency profiles record captures context switch events,
and report shows Latency as the first column.

Signed-off-by: Dmitry Vyukov 
Reviewed-by: Andi Kleen 
Link: https://lore.kernel.org/r/e9640464bcbc47dde2cb557003f421052ebc9eec.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim

perf report: Add latency output field

2025-02-18T22:04:32+00:00

Latency output field is similar to overhead, but represents overhead for
latency rather than CPU consumption. It's re-scaled from overhead by dividing
weight by the current parallelism level at the time of the sample.
It effectively models profiling with 1 sample taken per unit of wall-clock
time rather than unit of CPU time.

Signed-off-by: Dmitry Vyukov 
Reviewed-by: Andi Kleen 
Link: https://lore.kernel.org/r/b6269518758c2166e6ffdc2f0e24cfdecc8ef9c1.1739437531.git.dvyukov@google.com
Signed-off-by: Namhyung Kim

perf annotate: Prefer passing evsel to evsel->core.idx

2025-01-18T18:02:10+00:00

An evsel idx may not be stable due to sorting, evlist removal,
etc. Try to reduce it being part of APIs by explicitly passing the
evsel in annotate code. Internally the code just reads evsel->core.idx
so behavior is unchanged.

Signed-off-by: Ian Rogers 
Cc: Chen Ni 
Cc: Athira Rajeev 
Link: https://lore.kernel.org/r/20250117181848.690474-1-irogers@google.com
Signed-off-by: Namhyung Kim

perf hist: Fix width calculation in hpp__fmt()

2025-01-17T17:51:18+00:00

hpp__width_fn() round up width to length of the field name,
hpp__fmt() should do it too. Otherwise, the numbers may
end up unaligned if the field name is long.

Signed-off-by: Dmitry Vyukov 
Reviewed-by: James Clark 
Link: https://lore.kernel.org/r/20250108065949.235718-1-dvyukov@google.com
Signed-off-by: Namhyung Kim

perf script: Move find_scripts to browser/scripts.c

2024-12-18T19:24:32+00:00

The only use of find_scripts is in browser/scripts.c but the
definition in builtin causes linking problems requiring a stub in
python.c. Move the function to allow the stub to be removed.

Signed-off-by: Ian Rogers 
Tested-by: Arnaldo Carvalho de Melo 
Acked-by: Namhyung Kim 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Andi Kleen 
Cc: Athira Rajeev 
Cc: Colin Ian King 
Cc: Dapeng Mi 
Cc: Howard Chu 
Cc: Ilya Leoshkevich 
Cc: Ingo Molnar 
Cc: James Clark 
Cc: Jiri Olsa 
Cc: Josh Poimboeuf 
Cc: Kan Liang 
Cc: Mark Rutland 
Cc: Michael Petlan 
Cc: Peter Zijlstra 
Cc: Thomas Richter 
Cc: Veronika Molnarova 
Cc: Weilin Wang 
Link: https://lore.kernel.org/r/20241119011644.971342-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo

perf tools: Print lost samples due to BPF filter

2024-08-28T21:07:20+00:00

Print the actual dropped sample count in the event stat.

  $ sudo perf record -o- -e cycles --filter 'period < 10000' \
      -e instructions --filter 'ip > 0x8000000000000000' perf test -w noploop | \
      perf report --stat -i-
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.058 MB - ]

  Aggregated stats:
                 TOTAL events:        469
                  MMAP events:        268  (57.1%)
                  COMM events:          2  ( 0.4%)
                  EXIT events:          1  ( 0.2%)
                SAMPLE events:         16  ( 3.4%)
                 MMAP2 events:         22  ( 4.7%)
          LOST_SAMPLES events:          2  ( 0.4%)
               KSYMBOL events:         89  (19.0%)
             BPF_EVENT events:         39  ( 8.3%)
                  ATTR events:          2  ( 0.4%)
        FINISHED_ROUND events:          1  ( 0.2%)
              ID_INDEX events:          1  ( 0.2%)
            THREAD_MAP events:          1  ( 0.2%)
               CPU_MAP events:          1  ( 0.2%)
          EVENT_UPDATE events:          2  ( 0.4%)
             TIME_CONV events:          1  ( 0.2%)
               FEATURE events:         20  ( 4.3%)
         FINISHED_INIT events:          1  ( 0.2%)
  cycles stats:
                SAMPLE events:          2
    LOST_SAMPLES (BPF) events:       4010
  instructions stats:
                SAMPLE events:         14
    LOST_SAMPLES (BPF) events:       3990

Signed-off-by: Namhyung Kim 
Cc: Adrian Hunter 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: KP Singh 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Cc: Song Liu 
Link: https://lore.kernel.org/r/20240820154504.128923-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf hist: Don't set hpp_fmt_value for members in --no-group

2024-08-28T21:07:20+00:00

Perf crashes as below when applying --no-group

  # perf record -e "{cache-misses,branches"} -b sleep 1
  # perf report --stdio --no-group
  free(): invalid next size (fast)
  Aborted (core dumped)
  #

In the __hpp__fmt(), only 1 hpp_fmt_value is allocated for the current
event when --no-group is applied.

However, the current implementation tries to assign the hists from all
members to the hpp_fmt_value, which exceeds the allocated memory.

Fixes: 8f6071a3dce40e69 ("perf hist: Simplify __hpp_fmt() using hpp_fmt_data")
Signed-off-by: Kan Liang 
Acked-by: Namhyung Kim 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20240820183202.3174323-1-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo

perf annotate-data: Show offset and size in hex

2024-08-21T14:48:39+00:00

It'd be better to have them in hex to check cacheline alignment.

 Percent     offset       size  field
  100.00          0      0x1c0  struct cfs_rq    {
    0.00          0       0x10      struct load_weight  load {
    0.00          0        0x8          long unsigned int       weight;
    0.00        0x8        0x4          u32     inv_weight;
                                    };
    0.00       0x10        0x4      unsigned int        nr_running;
   14.56       0x14        0x4      unsigned int        h_nr_running;
    0.00       0x18        0x4      unsigned int        idle_nr_running;
    0.00       0x1c        0x4      unsigned int        idle_h_nr_running;
  ...

Committer notes:

Justification from Namhyung when asked about why it would be "better":

Cache line sizes are power of 2 so it'd be natural to use hex and
check whether an offset is in the same boundary.  Also 'perf annotate'
shows instruction offsets in hex.

>
> Maybe this should be selectable?

I can add an option and/or a config if you want.

Signed-off-by: Namhyung Kim 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Athira Rajeev 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20240819233603.54941-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf annotate: Display the branch counter histogram

2024-08-14T13:20:40+00:00

Display the branch counter histogram in the annotation view.

Press 'B' to display the branch counter's abbreviation list as well.

  Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
  4000 Hz, Event count (approx.):
  f3  /home/sdp/test/tchain_edit [Percent: local period]
  Percent       │ IPC Cycle       Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%)
                │                                     0000000000401755 :
    0.00   0.00 │                                       endbr64
                │                                       push    %rbp
                │                                       mov     %rsp,%rbp
                │                                       movl    $0x0,-0x4(%rbp)
    0.00   0.00 │1.33     3          |A   |-   |      ↓ jmp     25
   11.03  11.03 │                                 11:   mov     -0x4(%rbp),%eax
                │                                       and     $0x1,%eax
                │                                       test    %eax,%eax
   17.13  17.13 │2.41     1          |A   |-   |      ↓ je      21
                │                                       addl    $0x1,-0x4(%rbp)
   21.84  21.84 │2.22     2          |AA  |-   |      ↓ jmp     25
   17.13  17.13 │                                 21:   addl    $0x1,-0x4(%rbp)
   21.84  21.84 │                                 25:   cmpl    $0x270f,-0x4(%rbp)
   11.03  11.03 │0.61     3          |A   |-   |      ↑ jle     11
                │                                       nop
                │                                       pop     %rbp
    0.00   0.00 │0.24    20          |AA  |B   |      ← ret

Originally-by: Tinghao Zhang 
Reviewed-by: Andi Kleen 
Signed-off-by: Kan Liang 
Acked-by: Namhyung Kim 
Cc: Adrian Hunter 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: https://lore.kernel.org/r/20240813160208.2493643-8-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo