<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/tools/perf/util/annotate.h, branch v5.5</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>perf annotate: Pass a 'map_symbol' in places receiving a pair of 'map' and 'symbol' pointers</title>
<updated>2019-11-12T11:20:53+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@redhat.com</email>
</author>
<published>2019-11-04T14:10:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2975489458c59ce2e348b1b3aef5d8d2acb5cc8d'/>
<id>2975489458c59ce2e348b1b3aef5d8d2acb5cc8d</id>
<content type='text'>
We are already passing things like:

  symbol__annotate(ms-&gt;sym, ms-&gt;map, ...)

So shorten the signature of such functions to receive the 'map_symbol'
pointer.

This also paves the way to having the 'struct map_groups' pointer in the
'struct map_symbol' so that we can get rid of 'struct map'-&gt;groups.

Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-23yx8v1t41nzpkpi7rdrozww@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We are already passing things like:

  symbol__annotate(ms-&gt;sym, ms-&gt;map, ...)

So shorten the signature of such functions to receive the 'map_symbol'
pointer.

This also paves the way to having the 'struct map_groups' pointer in the
'struct map_symbol' so that we can get rid of 'struct map'-&gt;groups.

Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-23yx8v1t41nzpkpi7rdrozww@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf diff: Report noisy for cycles diff</title>
<updated>2019-10-11T13:57:00+00:00</updated>
<author>
<name>Jin Yao</name>
<email>yao.jin@linux.intel.com</email>
</author>
<published>2019-09-25T01:14:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cebf7d51a6c3babc4d0589da7aec0de1af0a5691'/>
<id>cebf7d51a6c3babc4d0589da7aec0de1af0a5691</id>
<content type='text'>
This patch prints the stddev and hist for the cycles diff of program
block. It can help us to understand if the cycles is noisy or not.

This patch is inspired by Andi Kleen's patch:

  https://lwn.net/Articles/600471/

We create new option '--cycles-hist'.

Example:

  perf record -b ./div
  perf record -b ./div
  perf diff -c cycles

  # Baseline                                [Program Block Range] Cycles Diff  Shared Object      Symbol
  # ........  .......................................................... ....  .................  ............................
  #
      46.72%                                      [div.c:40 -&gt; div.c:40]    0  div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:44]    0  div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:39]    0  div                [.] main
      20.54%                          [random_r.c:357 -&gt; random_r.c:394]    1  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:357 -&gt; random_r.c:380]    0  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:388]    0  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:391]    0  libc-2.27.so       [.] __random_r
      17.04%                              [random.c:288 -&gt; random.c:291]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:291 -&gt; random.c:291]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:293 -&gt; random.c:293]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:298 -&gt; random.c:298]    0  libc-2.27.so       [.] __random
       8.40%                                      [div.c:22 -&gt; div.c:25]    0  div                [.] compute_flag
       8.40%                                      [div.c:27 -&gt; div.c:28]    0  div                [.] compute_flag
       5.14%                                    [rand.c:26 -&gt; rand.c:27]    0  libc-2.27.so       [.] rand
       5.14%                                    [rand.c:28 -&gt; rand.c:28]    0  libc-2.27.so       [.] rand
       2.15%                                  [rand@plt+0 -&gt; rand@plt+0]    0  div                [.] rand@plt
       0.00%                                                                   [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
       0.00%                                [do_mmap+714 -&gt; do_mmap+732]  -10  [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+737 -&gt; do_mmap+765]    1  [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+262 -&gt; do_mmap+299]    0  [kernel.kallsyms]  [k] do_mmap
       0.00%  [__x86_indirect_thunk_r15+0 -&gt; __x86_indirect_thunk_r15+0]    7  [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
       0.00%            [native_sched_clock+0 -&gt; native_sched_clock+119]   -1  [kernel.kallsyms]  [k] native_sched_clock
       0.00%                 [native_write_msr+0 -&gt; native_write_msr+16]  -13  [kernel.kallsyms]  [k] native_write_msr

When we enable the option '--cycles-hist', the output is

  perf diff -c cycles --cycles-hist

  # Baseline                                [Program Block Range] Cycles Diff        stddev/Hist  Shared Object      Symbol
  # ........  .......................................................... ....  .................  .................  ............................
  #
      46.72%                                      [div.c:40 -&gt; div.c:40]    0  ± 37.8% ▁█▁▁██▁█   div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:44]    0  ± 49.4% ▁▁▂█▂▂▂▂   div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:39]    0  ± 24.1% ▃█▂▄▁▃▂▁   div                [.] main
      20.54%                          [random_r.c:357 -&gt; random_r.c:394]    1  ± 33.5% ▅▂▁█▃▁▂▁   libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:357 -&gt; random_r.c:380]    0  ± 39.4% ▁▁█▁██▅▁   libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:388]    0                     libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:391]    0  ± 41.2% ▁▃▁▂█▄▃▁   libc-2.27.so       [.] __random_r
      17.04%                              [random.c:288 -&gt; random.c:291]    0  ± 48.8% ▁▁▁▁███▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:291 -&gt; random.c:291]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:293 -&gt; random.c:293]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0                     libc-2.27.so       [.] __random
      17.04%                              [random.c:298 -&gt; random.c:298]    0  ± 75.6% ▃█▁▁▁▁▁▁   libc-2.27.so       [.] __random
       8.40%                                      [div.c:22 -&gt; div.c:25]    0  ± 42.1% ▁▃▁▁███▁   div                [.] compute_flag
       8.40%                                      [div.c:27 -&gt; div.c:28]    0  ± 41.8% ██▁▁▄▁▁▄   div                [.] compute_flag
       5.14%                                    [rand.c:26 -&gt; rand.c:27]    0  ± 37.8% ▁▁▁████▁   libc-2.27.so       [.] rand
       5.14%                                    [rand.c:28 -&gt; rand.c:28]    0                     libc-2.27.so       [.] rand
       2.15%                                  [rand@plt+0 -&gt; rand@plt+0]    0                     div                [.] rand@plt
       0.00%                                                                                      [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
       0.00%                                [do_mmap+714 -&gt; do_mmap+732]  -10                     [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+737 -&gt; do_mmap+765]    1                     [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+262 -&gt; do_mmap+299]    0                     [kernel.kallsyms]  [k] do_mmap
       0.00%  [__x86_indirect_thunk_r15+0 -&gt; __x86_indirect_thunk_r15+0]    7                     [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
       0.00%            [native_sched_clock+0 -&gt; native_sched_clock+119]   -1  ± 38.5% ▄█▁        [kernel.kallsyms]  [k] native_sched_clock
       0.00%                 [native_write_msr+0 -&gt; native_write_msr+16]  -13  ± 47.1% ▁█▇▃▁▁     [kernel.kallsyms]  [k] native_write_msr

 v8:
 ---
 Rebase to perf/core branch

 v7:
 ---
 1. v6 got Jiri's ACK.
 2. Rebase to latest perf/core branch.

 v6:
 ---
 1. Jiri provides better code for using data__hpp_register() in ui_init().
    Use this code in v6.

 v5:
 ---
 1. Refine the use of data__hpp_register() in ui_init() according to
    Jiri's suggestion.

 v4:
 ---
 1. Rename the new option from '--noisy' to '--cycles-hist'
 2. Remove the option '-n'.
 3. Only update the spark value and stats when '--cycles-hist' is enabled.
 4. Remove the code of printing '..'.

 v3:
 ---
 1. Move the histogram to a separate column
 2. Move the svals[] out of struct stats

 v2:
 ---
 Jiri got a compile error,

  CC       builtin-diff.o
  builtin-diff.c: In function ‘compute_cycles_diff’:
  builtin-diff.c:712:10: error: taking the absolute value of unsigned type ‘u64’ {aka ‘long unsigned int’} has no effect [-Werror=absolute-value]
  712 |          labs(pair-&gt;block_info-&gt;cycles_spark[i] -
      |          ^~~~

 Because the result of u64 - u64 is still u64. Now we change the type of
 cycles_spark[] to s64.

Signed-off-by: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lore.kernel.org/lkml/20190925011446.30678-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch prints the stddev and hist for the cycles diff of program
block. It can help us to understand if the cycles is noisy or not.

This patch is inspired by Andi Kleen's patch:

  https://lwn.net/Articles/600471/

We create new option '--cycles-hist'.

Example:

  perf record -b ./div
  perf record -b ./div
  perf diff -c cycles

  # Baseline                                [Program Block Range] Cycles Diff  Shared Object      Symbol
  # ........  .......................................................... ....  .................  ............................
  #
      46.72%                                      [div.c:40 -&gt; div.c:40]    0  div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:44]    0  div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:39]    0  div                [.] main
      20.54%                          [random_r.c:357 -&gt; random_r.c:394]    1  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:357 -&gt; random_r.c:380]    0  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:388]    0  libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:391]    0  libc-2.27.so       [.] __random_r
      17.04%                              [random.c:288 -&gt; random.c:291]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:291 -&gt; random.c:291]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:293 -&gt; random.c:293]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0  libc-2.27.so       [.] __random
      17.04%                              [random.c:298 -&gt; random.c:298]    0  libc-2.27.so       [.] __random
       8.40%                                      [div.c:22 -&gt; div.c:25]    0  div                [.] compute_flag
       8.40%                                      [div.c:27 -&gt; div.c:28]    0  div                [.] compute_flag
       5.14%                                    [rand.c:26 -&gt; rand.c:27]    0  libc-2.27.so       [.] rand
       5.14%                                    [rand.c:28 -&gt; rand.c:28]    0  libc-2.27.so       [.] rand
       2.15%                                  [rand@plt+0 -&gt; rand@plt+0]    0  div                [.] rand@plt
       0.00%                                                                   [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
       0.00%                                [do_mmap+714 -&gt; do_mmap+732]  -10  [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+737 -&gt; do_mmap+765]    1  [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+262 -&gt; do_mmap+299]    0  [kernel.kallsyms]  [k] do_mmap
       0.00%  [__x86_indirect_thunk_r15+0 -&gt; __x86_indirect_thunk_r15+0]    7  [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
       0.00%            [native_sched_clock+0 -&gt; native_sched_clock+119]   -1  [kernel.kallsyms]  [k] native_sched_clock
       0.00%                 [native_write_msr+0 -&gt; native_write_msr+16]  -13  [kernel.kallsyms]  [k] native_write_msr

When we enable the option '--cycles-hist', the output is

  perf diff -c cycles --cycles-hist

  # Baseline                                [Program Block Range] Cycles Diff        stddev/Hist  Shared Object      Symbol
  # ........  .......................................................... ....  .................  .................  ............................
  #
      46.72%                                      [div.c:40 -&gt; div.c:40]    0  ± 37.8% ▁█▁▁██▁█   div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:44]    0  ± 49.4% ▁▁▂█▂▂▂▂   div                [.] main
      46.72%                                      [div.c:42 -&gt; div.c:39]    0  ± 24.1% ▃█▂▄▁▃▂▁   div                [.] main
      20.54%                          [random_r.c:357 -&gt; random_r.c:394]    1  ± 33.5% ▅▂▁█▃▁▂▁   libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:357 -&gt; random_r.c:380]    0  ± 39.4% ▁▁█▁██▅▁   libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:388]    0                     libc-2.27.so       [.] __random_r
      20.54%                          [random_r.c:388 -&gt; random_r.c:391]    0  ± 41.2% ▁▃▁▂█▄▃▁   libc-2.27.so       [.] __random_r
      17.04%                              [random.c:288 -&gt; random.c:291]    0  ± 48.8% ▁▁▁▁███▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:291 -&gt; random.c:291]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:293 -&gt; random.c:293]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0  ±100.0% ▁█▁▁▁▁▁▁   libc-2.27.so       [.] __random
      17.04%                              [random.c:295 -&gt; random.c:295]    0                     libc-2.27.so       [.] __random
      17.04%                              [random.c:298 -&gt; random.c:298]    0  ± 75.6% ▃█▁▁▁▁▁▁   libc-2.27.so       [.] __random
       8.40%                                      [div.c:22 -&gt; div.c:25]    0  ± 42.1% ▁▃▁▁███▁   div                [.] compute_flag
       8.40%                                      [div.c:27 -&gt; div.c:28]    0  ± 41.8% ██▁▁▄▁▁▄   div                [.] compute_flag
       5.14%                                    [rand.c:26 -&gt; rand.c:27]    0  ± 37.8% ▁▁▁████▁   libc-2.27.so       [.] rand
       5.14%                                    [rand.c:28 -&gt; rand.c:28]    0                     libc-2.27.so       [.] rand
       2.15%                                  [rand@plt+0 -&gt; rand@plt+0]    0                     div                [.] rand@plt
       0.00%                                                                                      [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
       0.00%                                [do_mmap+714 -&gt; do_mmap+732]  -10                     [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+737 -&gt; do_mmap+765]    1                     [kernel.kallsyms]  [k] do_mmap
       0.00%                                [do_mmap+262 -&gt; do_mmap+299]    0                     [kernel.kallsyms]  [k] do_mmap
       0.00%  [__x86_indirect_thunk_r15+0 -&gt; __x86_indirect_thunk_r15+0]    7                     [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
       0.00%            [native_sched_clock+0 -&gt; native_sched_clock+119]   -1  ± 38.5% ▄█▁        [kernel.kallsyms]  [k] native_sched_clock
       0.00%                 [native_write_msr+0 -&gt; native_write_msr+16]  -13  ± 47.1% ▁█▇▃▁▁     [kernel.kallsyms]  [k] native_write_msr

 v8:
 ---
 Rebase to perf/core branch

 v7:
 ---
 1. v6 got Jiri's ACK.
 2. Rebase to latest perf/core branch.

 v6:
 ---
 1. Jiri provides better code for using data__hpp_register() in ui_init().
    Use this code in v6.

 v5:
 ---
 1. Refine the use of data__hpp_register() in ui_init() according to
    Jiri's suggestion.

 v4:
 ---
 1. Rename the new option from '--noisy' to '--cycles-hist'
 2. Remove the option '-n'.
 3. Only update the spark value and stats when '--cycles-hist' is enabled.
 4. Remove the code of printing '..'.

 v3:
 ---
 1. Move the histogram to a separate column
 2. Move the svals[] out of struct stats

 v2:
 ---
 Jiri got a compile error,

  CC       builtin-diff.o
  builtin-diff.c: In function ‘compute_cycles_diff’:
  builtin-diff.c:712:10: error: taking the absolute value of unsigned type ‘u64’ {aka ‘long unsigned int’} has no effect [-Werror=absolute-value]
  712 |          labs(pair-&gt;block_info-&gt;cycles_spark[i] -
      |          ^~~~

 Because the result of u64 - u64 is still u64. Now we change the type of
 cycles_spark[] to s64.

Signed-off-by: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lore.kernel.org/lkml/20190925011446.30678-1-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf annotate: Don't return -1 for error when doing BPF disassembly</title>
<updated>2019-09-30T20:30:06+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@redhat.com</email>
</author>
<published>2019-09-30T19:04:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=11aad897f6d1a28eae3b7e5b293647c522d65819'/>
<id>11aad897f6d1a28eae3b7e5b293647c522d65819</id>
<content type='text'>
Return errno when open_memstream() fails and add two new speciall error
codes for when an invalid, non BPF file or one without BTF is passed to
symbol__disassemble_bpf(), so that its callers can rely on
symbol__strerror_disassemble() to convert that to a human readable error
message that can help figure out what is wrong, with hints even.

Cc: Russell King - ARM Linux admin &lt;linux@armlinux.org.uk&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;,
Cc: Will Deacon &lt;will@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-usevw9r2gcipfcrbpaueurw0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Return errno when open_memstream() fails and add two new speciall error
codes for when an invalid, non BPF file or one without BTF is passed to
symbol__disassemble_bpf(), so that its callers can rely on
symbol__strerror_disassemble() to convert that to a human readable error
message that can help figure out what is wrong, with hints even.

Cc: Russell King - ARM Linux admin &lt;linux@armlinux.org.uk&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;,
Cc: Will Deacon &lt;will@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-usevw9r2gcipfcrbpaueurw0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf annotate: Fix arch specific -&gt;init() failure errors</title>
<updated>2019-09-30T20:30:03+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@redhat.com</email>
</author>
<published>2019-09-30T18:48:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=42d7a9107d83223a5fcecc6732d626a6c074cbc2'/>
<id>42d7a9107d83223a5fcecc6732d626a6c074cbc2</id>
<content type='text'>
They are called from symbol__annotate() and to propagate errors that can
help understand the problem make them return what
symbol__strerror_disassemble() known, i.e. errno codes and other
annotation specific errors in a special, out of errnos, range.

Reported-by: Russell King - ARM Linux admin &lt;linux@armlinux.org.uk&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;,
Cc: Will Deacon &lt;will@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-pqx7srcv7tixgid251aeboj6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
They are called from symbol__annotate() and to propagate errors that can
help understand the problem make them return what
symbol__strerror_disassemble() known, i.e. errno codes and other
annotation specific errors in a special, out of errnos, range.

Reported-by: Russell King - ARM Linux admin &lt;linux@armlinux.org.uk&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;,
Cc: Will Deacon &lt;will@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-pqx7srcv7tixgid251aeboj6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>libperf: Add nr_entries to struct perf_evlist</title>
<updated>2019-07-29T21:34:45+00:00</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@kernel.org</email>
</author>
<published>2019-07-21T11:24:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6484d2f9dc3ecbf13f07100f7f771d1d779eda04'/>
<id>6484d2f9dc3ecbf13f07100f7f771d1d779eda04</id>
<content type='text'>
Move nr_entries count from 'struct perf' to into perf_evlist struct.

Committer notes:

Fix tools/perf/arch/s390/util/auxtrace.c case. And also the comment in
tools/perf/util/annotate.h.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Alexey Budankov &lt;alexey.budankov@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Michael Petlan &lt;mpetlan@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20190721112506.12306-42-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Move nr_entries count from 'struct perf' to into perf_evlist struct.

Committer notes:

Fix tools/perf/arch/s390/util/auxtrace.c case. And also the comment in
tools/perf/util/annotate.h.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Alexey Budankov &lt;alexey.budankov@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Michael Petlan &lt;mpetlan@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20190721112506.12306-42-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf evsel: Rename struct perf_evsel to struct evsel</title>
<updated>2019-07-29T21:34:42+00:00</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@kernel.org</email>
</author>
<published>2019-07-21T11:23:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=32dcd021d004038ca12ac17319da5aa4756e9312'/>
<id>32dcd021d004038ca12ac17319da5aa4756e9312</id>
<content type='text'>
Rename struct perf_evsel to struct evsel, so we don't have a name clash
when we add struct perf_evsel in libperf.

Committer notes:

Added fixes for arm64, provided by Jiri.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Alexey Budankov &lt;alexey.budankov@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Michael Petlan &lt;mpetlan@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Rename struct perf_evsel to struct evsel, so we don't have a name clash
when we add struct perf_evsel in libperf.

Committer notes:

Added fixes for arm64, provided by Jiri.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Alexey Budankov &lt;alexey.budankov@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Michael Petlan &lt;mpetlan@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf annotate: Enable annotation of BPF programs</title>
<updated>2019-03-20T19:43:15+00:00</updated>
<author>
<name>Song Liu</name>
<email>songliubraving@fb.com</email>
</author>
<published>2019-03-12T05:30:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6987561c9e86eace45f2dbb0c564964a63f4150a'/>
<id>6987561c9e86eace45f2dbb0c564964a63f4150a</id>
<content type='text'>
In symbol__disassemble(), DSO_BINARY_TYPE__BPF_PROG_INFO dso calls into
a new function symbol__disassemble_bpf(), where annotation line
information is filled based on the bpf_prog_info and btf data saved in
given perf_env.

symbol__disassemble_bpf() uses binutils's libopcodes to disassemble bpf
programs.

Committer testing:

After fixing this:

  -               u64 *addrs = (u64 *)(info_linear-&gt;info.jited_ksyms);
  +               u64 *addrs = (u64 *)(uintptr_t)(info_linear-&gt;info.jited_ksyms);

Detected when crossbuilding to a 32-bit arch.

And making all this dependent on HAVE_LIBBFD_SUPPORT and
HAVE_LIBBPF_SUPPORT:

1) Have a BPF program running, one that has BTF info, etc, I used
   the tools/perf/examples/bpf/augmented_raw_syscalls.c put in place
   by 'perf trace'.

  # grep -B1 augmented_raw ~/.perfconfig
  [trace]
	add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
  #
  # perf trace -e *mmsg
  dnf/6245 sendmmsg(20, 0x7f5485a88030, 2, MSG_NOSIGNAL) = 2
  NetworkManager/10055 sendmmsg(22&lt;socket:[1056822]&gt;, 0x7f8126ad1bb0, 2, MSG_NOSIGNAL) = 2

2) Then do a 'perf record' system wide for a while:

  # perf record -a
  ^C[ perf record: Woken up 68 times to write data ]
  [ perf record: Captured and wrote 19.427 MB perf.data (366891 samples) ]
  #

3) Check that we captured BPF and BTF info in the perf.data file:

  # perf report --header-only | grep 'b[pt]f'
  # event : name = cycles:ppp, , id = { 294789, 294790, 294791, 294792, 294793, 294794, 294795, 294796 }, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, read_format = ID, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
  # bpf_prog_info of id 13
  # bpf_prog_info of id 14
  # bpf_prog_info of id 15
  # bpf_prog_info of id 16
  # bpf_prog_info of id 17
  # bpf_prog_info of id 18
  # bpf_prog_info of id 21
  # bpf_prog_info of id 22
  # bpf_prog_info of id 41
  # bpf_prog_info of id 42
  # btf info of id 2
  #

4) Check which programs got recorded:

   # perf report | grep bpf_prog | head
     0.16%  exe              bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.14%  exe              bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.08%  fuse-overlayfs   bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.07%  fuse-overlayfs   bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.01%  clang-4.0        bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.01%  clang-4.0        bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.00%  clang            bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.00%  runc             bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.00%  clang            bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.00%  sh               bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
  #

  This was with the default --sort order for 'perf report', which is:

    --sort comm,dso,symbol

  If we just look for the symbol, for instance:

   # perf report --sort symbol | grep bpf_prog | head
     0.26%  [k] bpf_prog_819967866022f1e1_sys_enter                -      -
     0.24%  [k] bpf_prog_c1bd85c092d6e4aa_sys_exit                 -      -
   #

  or the DSO:

   # perf report --sort dso | grep bpf_prog | head
     0.26%  bpf_prog_819967866022f1e1_sys_enter
     0.24%  bpf_prog_c1bd85c092d6e4aa_sys_exit
  #

We'll see the two BPF programs that augmented_raw_syscalls.o puts in
place,  one attached to the raw_syscalls:sys_enter and another to the
raw_syscalls:sys_exit tracepoints, as expected.

Now we can finally do, from the command line, annotation for one of
those two symbols, with the original BPF program source coude intermixed
with the disassembled JITed code:

  # perf annotate --stdio2 bpf_prog_819967866022f1e1_sys_enter

  Samples: 950  of event 'cycles:ppp', 4000 Hz, Event count (approx.): 553756947, [percent: local period]
  bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
  Percent      int sys_enter(struct syscall_enter_args *args)
   53.41         push   %rbp

    0.63         mov    %rsp,%rbp
    0.31         sub    $0x170,%rsp
    1.93         sub    $0x28,%rbp
    7.02         mov    %rbx,0x0(%rbp)
    3.20         mov    %r13,0x8(%rbp)
    1.07         mov    %r14,0x10(%rbp)
    0.61         mov    %r15,0x18(%rbp)
    0.11         xor    %eax,%eax
    1.29         mov    %rax,0x20(%rbp)
    0.11         mov    %rdi,%rbx
               	return bpf_get_current_pid_tgid();
    2.02       → callq  *ffffffffda6776d9
    2.76         mov    %eax,-0x148(%rbp)
                 mov    %rbp,%rsi
               int sys_enter(struct syscall_enter_args *args)
                 add    $0xfffffffffffffeb8,%rsi
               	return bpf_map_lookup_elem(pids, &amp;pid) != NULL;
                 movabs $0xffff975ac2607800,%rdi

    1.26       → callq  *ffffffffda6789e9
                 cmp    $0x0,%rax
    2.43       → je     0
                 add    $0x38,%rax
    0.21         xor    %r13d,%r13d
               	if (pid_filter__has(&amp;pids_filtered, getpid()))
    0.81         cmp    $0x0,%rax
               → jne    0
                 mov    %rbp,%rdi
               	probe_read(&amp;augmented_args.args, sizeof(augmented_args.args), args);
    2.22         add    $0xfffffffffffffeb8,%rdi
    0.11         mov    $0x40,%esi
    0.32         mov    %rbx,%rdx
    2.74       → callq  *ffffffffda658409
               	syscall = bpf_map_lookup_elem(&amp;syscalls, &amp;augmented_args.args.syscall_nr);
    0.22         mov    %rbp,%rsi
    1.69         add    $0xfffffffffffffec0,%rsi
               	syscall = bpf_map_lookup_elem(&amp;syscalls, &amp;augmented_args.args.syscall_nr);
                 movabs $0xffff975bfcd36000,%rdi

                 add    $0xd0,%rdi
    0.21         mov    0x0(%rsi),%eax
    0.93         cmp    $0x200,%rax
               → jae    0
    0.10         shl    $0x3,%rax

    0.11         add    %rdi,%rax
    0.11       → jmp    0
                 xor    %eax,%eax
               	if (syscall == NULL || !syscall-&gt;enabled)
    1.07         cmp    $0x0,%rax
               → je     0
               	if (syscall == NULL || !syscall-&gt;enabled)
    6.57         movzbq 0x0(%rax),%rdi

               	if (syscall == NULL || !syscall-&gt;enabled)
                 cmp    $0x0,%rdi
    0.95       → je     0
                 mov    $0x40,%r8d
               	switch (augmented_args.args.syscall_nr) {
                 mov    -0x140(%rbp),%rdi
               	switch (augmented_args.args.syscall_nr) {
                 cmp    $0x2,%rdi
               → je     0
                 cmp    $0x101,%rdi
               → je     0
                 cmp    $0x15,%rdi
               → jne    0
               	case SYS_OPEN:	 filename_arg = (const void *)args-&gt;args[0];
                 mov    0x10(%rbx),%rdx
               → jmp    0
               	case SYS_OPENAT: filename_arg = (const void *)args-&gt;args[1];
                 mov    0x18(%rbx),%rdx
               	if (filename_arg != NULL) {
                 cmp    $0x0,%rdx
               → je     0
                 xor    %edi,%edi
               		augmented_args.filename.reserved = 0;
                 mov    %edi,-0x104(%rbp)
               		augmented_args.filename.size = probe_read_str(&amp;augmented_args.filename.value,
                 mov    %rbp,%rdi
                 add    $0xffffffffffffff00,%rdi
               		augmented_args.filename.size = probe_read_str(&amp;augmented_args.filename.value,
                 mov    $0x100,%esi
               → callq  *ffffffffda658499
                 mov    $0x148,%r8d
               		augmented_args.filename.size = probe_read_str(&amp;augmented_args.filename.value,
                 mov    %eax,-0x108(%rbp)
               		augmented_args.filename.size = probe_read_str(&amp;augmented_args.filename.value,
                 mov    %rax,%rdi
                 shl    $0x20,%rdi

                 shr    $0x20,%rdi

               		if (augmented_args.filename.size &lt; sizeof(augmented_args.filename.value)) {
                 cmp    $0xff,%rdi
               → ja     0
               			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
                 add    $0x48,%rax
               			len &amp;= sizeof(augmented_args.filename.value) - 1;
                 and    $0xff,%rax
                 mov    %rax,%r8
                 mov    %rbp,%rcx
               	return perf_event_output(args, &amp;__augmented_syscalls__, BPF_F_CURRENT_CPU, &amp;augmented_args, len);
                 add    $0xfffffffffffffeb8,%rcx
                 mov    %rbx,%rdi
                 movabs $0xffff975fbd72d800,%rsi

                 mov    $0xffffffff,%edx
               → callq  *ffffffffda658ad9
                 mov    %rax,%r13
               }
                 mov    %r13,%rax
    0.72         mov    0x0(%rbp),%rbx
                 mov    0x8(%rbp),%r13
    1.16         mov    0x10(%rbp),%r14
    0.10         mov    0x18(%rbp),%r15
    0.42         add    $0x28,%rbp
    0.54         leaveq
    0.54       ← retq
  #

Please see 'man perf-config' to see how to control what should be seen,
via ~/.perfconfig [annotate] section, for instance, one can suppress the
source code and see just the disassembly, etc.

Alternatively, use the TUI bu just using 'perf annotate', press
'/bpf_prog' to see the bpf symbols, press enter and do the interactive
annotation, which allows for dumping to a file after selecting the
the various output tunables, for instance, the above without source code
intermixed, plus showing all the instruction offsets:

  # perf annotate bpf_prog_819967866022f1e1_sys_enter

Then press: 's' to hide the source code + 'O' twice to show all
instruction offsets, then 'P' to print to the
bpf_prog_819967866022f1e1_sys_enter.annotation file, which will have:

  # cat bpf_prog_819967866022f1e1_sys_enter.annotation
  bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
  Event: cycles:ppp

   53.41    0:   push   %rbp

    0.63    1:   mov    %rsp,%rbp
    0.31    4:   sub    $0x170,%rsp
    1.93    b:   sub    $0x28,%rbp
    7.02    f:   mov    %rbx,0x0(%rbp)
    3.20   13:   mov    %r13,0x8(%rbp)
    1.07   17:   mov    %r14,0x10(%rbp)
    0.61   1b:   mov    %r15,0x18(%rbp)
    0.11   1f:   xor    %eax,%eax
    1.29   21:   mov    %rax,0x20(%rbp)
    0.11   25:   mov    %rdi,%rbx
    2.02   28: → callq  *ffffffffda6776d9
    2.76   2d:   mov    %eax,-0x148(%rbp)
           33:   mov    %rbp,%rsi
           36:   add    $0xfffffffffffffeb8,%rsi
           3d:   movabs $0xffff975ac2607800,%rdi

    1.26   47: → callq  *ffffffffda6789e9
           4c:   cmp    $0x0,%rax
    2.43   50: → je     0
           52:   add    $0x38,%rax
    0.21   56:   xor    %r13d,%r13d
    0.81   59:   cmp    $0x0,%rax
           5d: → jne    0
           63:   mov    %rbp,%rdi
    2.22   66:   add    $0xfffffffffffffeb8,%rdi
    0.11   6d:   mov    $0x40,%esi
    0.32   72:   mov    %rbx,%rdx
    2.74   75: → callq  *ffffffffda658409
    0.22   7a:   mov    %rbp,%rsi
    1.69   7d:   add    $0xfffffffffffffec0,%rsi
           84:   movabs $0xffff975bfcd36000,%rdi

           8e:   add    $0xd0,%rdi
    0.21   95:   mov    0x0(%rsi),%eax
    0.93   98:   cmp    $0x200,%rax
           9f: → jae    0
    0.10   a1:   shl    $0x3,%rax

    0.11   a5:   add    %rdi,%rax
    0.11   a8: → jmp    0
           aa:   xor    %eax,%eax
    1.07   ac:   cmp    $0x0,%rax
           b0: → je     0
    6.57   b6:   movzbq 0x0(%rax),%rdi

           bb:   cmp    $0x0,%rdi
    0.95   bf: → je     0
           c5:   mov    $0x40,%r8d
           cb:   mov    -0x140(%rbp),%rdi
           d2:   cmp    $0x2,%rdi
           d6: → je     0
           d8:   cmp    $0x101,%rdi
           df: → je     0
           e1:   cmp    $0x15,%rdi
           e5: → jne    0
           e7:   mov    0x10(%rbx),%rdx
           eb: → jmp    0
           ed:   mov    0x18(%rbx),%rdx
           f1:   cmp    $0x0,%rdx
           f5: → je     0
           f7:   xor    %edi,%edi
           f9:   mov    %edi,-0x104(%rbp)
           ff:   mov    %rbp,%rdi
          102:   add    $0xffffffffffffff00,%rdi
          109:   mov    $0x100,%esi
          10e: → callq  *ffffffffda658499
          113:   mov    $0x148,%r8d
          119:   mov    %eax,-0x108(%rbp)
          11f:   mov    %rax,%rdi
          122:   shl    $0x20,%rdi

          126:   shr    $0x20,%rdi

          12a:   cmp    $0xff,%rdi
          131: → ja     0
          133:   add    $0x48,%rax
          137:   and    $0xff,%rax
          13d:   mov    %rax,%r8
          140:   mov    %rbp,%rcx
          143:   add    $0xfffffffffffffeb8,%rcx
          14a:   mov    %rbx,%rdi
          14d:   movabs $0xffff975fbd72d800,%rsi

          157:   mov    $0xffffffff,%edx
          15c: → callq  *ffffffffda658ad9
          161:   mov    %rax,%r13
          164:   mov    %r13,%rax
    0.72  167:   mov    0x0(%rbp),%rbx
          16b:   mov    0x8(%rbp),%r13
    1.16  16f:   mov    0x10(%rbp),%r14
    0.10  173:   mov    0x18(%rbp),%r15
    0.42  177:   add    $0x28,%rbp
    0.54  17b:   leaveq
    0.54  17c: ← retq

Another cool way to test all this is to symple use 'perf top' look for
those symbols, go there and press enter, annotate it live :-)

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Reviewed-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: http://lkml.kernel.org/r/20190312053051.2690567-13-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In symbol__disassemble(), DSO_BINARY_TYPE__BPF_PROG_INFO dso calls into
a new function symbol__disassemble_bpf(), where annotation line
information is filled based on the bpf_prog_info and btf data saved in
given perf_env.

symbol__disassemble_bpf() uses binutils's libopcodes to disassemble bpf
programs.

Committer testing:

After fixing this:

  -               u64 *addrs = (u64 *)(info_linear-&gt;info.jited_ksyms);
  +               u64 *addrs = (u64 *)(uintptr_t)(info_linear-&gt;info.jited_ksyms);

Detected when crossbuilding to a 32-bit arch.

And making all this dependent on HAVE_LIBBFD_SUPPORT and
HAVE_LIBBPF_SUPPORT:

1) Have a BPF program running, one that has BTF info, etc, I used
   the tools/perf/examples/bpf/augmented_raw_syscalls.c put in place
   by 'perf trace'.

  # grep -B1 augmented_raw ~/.perfconfig
  [trace]
	add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
  #
  # perf trace -e *mmsg
  dnf/6245 sendmmsg(20, 0x7f5485a88030, 2, MSG_NOSIGNAL) = 2
  NetworkManager/10055 sendmmsg(22&lt;socket:[1056822]&gt;, 0x7f8126ad1bb0, 2, MSG_NOSIGNAL) = 2

2) Then do a 'perf record' system wide for a while:

  # perf record -a
  ^C[ perf record: Woken up 68 times to write data ]
  [ perf record: Captured and wrote 19.427 MB perf.data (366891 samples) ]
  #

3) Check that we captured BPF and BTF info in the perf.data file:

  # perf report --header-only | grep 'b[pt]f'
  # event : name = cycles:ppp, , id = { 294789, 294790, 294791, 294792, 294793, 294794, 294795, 294796 }, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, read_format = ID, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
  # bpf_prog_info of id 13
  # bpf_prog_info of id 14
  # bpf_prog_info of id 15
  # bpf_prog_info of id 16
  # bpf_prog_info of id 17
  # bpf_prog_info of id 18
  # bpf_prog_info of id 21
  # bpf_prog_info of id 22
  # bpf_prog_info of id 41
  # bpf_prog_info of id 42
  # btf info of id 2
  #

4) Check which programs got recorded:

   # perf report | grep bpf_prog | head
     0.16%  exe              bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.14%  exe              bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.08%  fuse-overlayfs   bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.07%  fuse-overlayfs   bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.01%  clang-4.0        bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.01%  clang-4.0        bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.00%  clang            bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
     0.00%  runc             bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.00%  clang            bpf_prog_819967866022f1e1_sys_enter      [k] bpf_prog_819967866022f1e1_sys_enter
     0.00%  sh               bpf_prog_c1bd85c092d6e4aa_sys_exit       [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
  #

  This was with the default --sort order for 'perf report', which is:

    --sort comm,dso,symbol

  If we just look for the symbol, for instance:

   # perf report --sort symbol | grep bpf_prog | head
     0.26%  [k] bpf_prog_819967866022f1e1_sys_enter                -      -
     0.24%  [k] bpf_prog_c1bd85c092d6e4aa_sys_exit                 -      -
   #

  or the DSO:

   # perf report --sort dso | grep bpf_prog | head
     0.26%  bpf_prog_819967866022f1e1_sys_enter
     0.24%  bpf_prog_c1bd85c092d6e4aa_sys_exit
  #

We'll see the two BPF programs that augmented_raw_syscalls.o puts in
place,  one attached to the raw_syscalls:sys_enter and another to the
raw_syscalls:sys_exit tracepoints, as expected.

Now we can finally do, from the command line, annotation for one of
those two symbols, with the original BPF program source coude intermixed
with the disassembled JITed code:

  # perf annotate --stdio2 bpf_prog_819967866022f1e1_sys_enter

  Samples: 950  of event 'cycles:ppp', 4000 Hz, Event count (approx.): 553756947, [percent: local period]
  bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
  Percent      int sys_enter(struct syscall_enter_args *args)
   53.41         push   %rbp

    0.63         mov    %rsp,%rbp
    0.31         sub    $0x170,%rsp
    1.93         sub    $0x28,%rbp
    7.02         mov    %rbx,0x0(%rbp)
    3.20         mov    %r13,0x8(%rbp)
    1.07         mov    %r14,0x10(%rbp)
    0.61         mov    %r15,0x18(%rbp)
    0.11         xor    %eax,%eax
    1.29         mov    %rax,0x20(%rbp)
    0.11         mov    %rdi,%rbx
               	return bpf_get_current_pid_tgid();
    2.02       → callq  *ffffffffda6776d9
    2.76         mov    %eax,-0x148(%rbp)
                 mov    %rbp,%rsi
               int sys_enter(struct syscall_enter_args *args)
                 add    $0xfffffffffffffeb8,%rsi
               	return bpf_map_lookup_elem(pids, &amp;pid) != NULL;
                 movabs $0xffff975ac2607800,%rdi

    1.26       → callq  *ffffffffda6789e9
                 cmp    $0x0,%rax
    2.43       → je     0
                 add    $0x38,%rax
    0.21         xor    %r13d,%r13d
               	if (pid_filter__has(&amp;pids_filtered, getpid()))
    0.81         cmp    $0x0,%rax
               → jne    0
                 mov    %rbp,%rdi
               	probe_read(&amp;augmented_args.args, sizeof(augmented_args.args), args);
    2.22         add    $0xfffffffffffffeb8,%rdi
    0.11         mov    $0x40,%esi
    0.32         mov    %rbx,%rdx
    2.74       → callq  *ffffffffda658409
               	syscall = bpf_map_lookup_elem(&amp;syscalls, &amp;augmented_args.args.syscall_nr);
    0.22         mov    %rbp,%rsi
    1.69         add    $0xfffffffffffffec0,%rsi
               	syscall = bpf_map_lookup_elem(&amp;syscalls, &amp;augmented_args.args.syscall_nr);
                 movabs $0xffff975bfcd36000,%rdi

                 add    $0xd0,%rdi
    0.21         mov    0x0(%rsi),%eax
    0.93         cmp    $0x200,%rax
               → jae    0
    0.10         shl    $0x3,%rax

    0.11         add    %rdi,%rax
    0.11       → jmp    0
                 xor    %eax,%eax
               	if (syscall == NULL || !syscall-&gt;enabled)
    1.07         cmp    $0x0,%rax
               → je     0
               	if (syscall == NULL || !syscall-&gt;enabled)
    6.57         movzbq 0x0(%rax),%rdi

               	if (syscall == NULL || !syscall-&gt;enabled)
                 cmp    $0x0,%rdi
    0.95       → je     0
                 mov    $0x40,%r8d
               	switch (augmented_args.args.syscall_nr) {
                 mov    -0x140(%rbp),%rdi
               	switch (augmented_args.args.syscall_nr) {
                 cmp    $0x2,%rdi
               → je     0
                 cmp    $0x101,%rdi
               → je     0
                 cmp    $0x15,%rdi
               → jne    0
               	case SYS_OPEN:	 filename_arg = (const void *)args-&gt;args[0];
                 mov    0x10(%rbx),%rdx
               → jmp    0
               	case SYS_OPENAT: filename_arg = (const void *)args-&gt;args[1];
                 mov    0x18(%rbx),%rdx
               	if (filename_arg != NULL) {
                 cmp    $0x0,%rdx
               → je     0
                 xor    %edi,%edi
               		augmented_args.filename.reserved = 0;
                 mov    %edi,-0x104(%rbp)
               		augmented_args.filename.size = probe_read_str(&amp;augmented_args.filename.value,
                 mov    %rbp,%rdi
                 add    $0xffffffffffffff00,%rdi
               		augmented_args.filename.size = probe_read_str(&amp;augmented_args.filename.value,
                 mov    $0x100,%esi
               → callq  *ffffffffda658499
                 mov    $0x148,%r8d
               		augmented_args.filename.size = probe_read_str(&amp;augmented_args.filename.value,
                 mov    %eax,-0x108(%rbp)
               		augmented_args.filename.size = probe_read_str(&amp;augmented_args.filename.value,
                 mov    %rax,%rdi
                 shl    $0x20,%rdi

                 shr    $0x20,%rdi

               		if (augmented_args.filename.size &lt; sizeof(augmented_args.filename.value)) {
                 cmp    $0xff,%rdi
               → ja     0
               			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
                 add    $0x48,%rax
               			len &amp;= sizeof(augmented_args.filename.value) - 1;
                 and    $0xff,%rax
                 mov    %rax,%r8
                 mov    %rbp,%rcx
               	return perf_event_output(args, &amp;__augmented_syscalls__, BPF_F_CURRENT_CPU, &amp;augmented_args, len);
                 add    $0xfffffffffffffeb8,%rcx
                 mov    %rbx,%rdi
                 movabs $0xffff975fbd72d800,%rsi

                 mov    $0xffffffff,%edx
               → callq  *ffffffffda658ad9
                 mov    %rax,%r13
               }
                 mov    %r13,%rax
    0.72         mov    0x0(%rbp),%rbx
                 mov    0x8(%rbp),%r13
    1.16         mov    0x10(%rbp),%r14
    0.10         mov    0x18(%rbp),%r15
    0.42         add    $0x28,%rbp
    0.54         leaveq
    0.54       ← retq
  #

Please see 'man perf-config' to see how to control what should be seen,
via ~/.perfconfig [annotate] section, for instance, one can suppress the
source code and see just the disassembly, etc.

Alternatively, use the TUI bu just using 'perf annotate', press
'/bpf_prog' to see the bpf symbols, press enter and do the interactive
annotation, which allows for dumping to a file after selecting the
the various output tunables, for instance, the above without source code
intermixed, plus showing all the instruction offsets:

  # perf annotate bpf_prog_819967866022f1e1_sys_enter

Then press: 's' to hide the source code + 'O' twice to show all
instruction offsets, then 'P' to print to the
bpf_prog_819967866022f1e1_sys_enter.annotation file, which will have:

  # cat bpf_prog_819967866022f1e1_sys_enter.annotation
  bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
  Event: cycles:ppp

   53.41    0:   push   %rbp

    0.63    1:   mov    %rsp,%rbp
    0.31    4:   sub    $0x170,%rsp
    1.93    b:   sub    $0x28,%rbp
    7.02    f:   mov    %rbx,0x0(%rbp)
    3.20   13:   mov    %r13,0x8(%rbp)
    1.07   17:   mov    %r14,0x10(%rbp)
    0.61   1b:   mov    %r15,0x18(%rbp)
    0.11   1f:   xor    %eax,%eax
    1.29   21:   mov    %rax,0x20(%rbp)
    0.11   25:   mov    %rdi,%rbx
    2.02   28: → callq  *ffffffffda6776d9
    2.76   2d:   mov    %eax,-0x148(%rbp)
           33:   mov    %rbp,%rsi
           36:   add    $0xfffffffffffffeb8,%rsi
           3d:   movabs $0xffff975ac2607800,%rdi

    1.26   47: → callq  *ffffffffda6789e9
           4c:   cmp    $0x0,%rax
    2.43   50: → je     0
           52:   add    $0x38,%rax
    0.21   56:   xor    %r13d,%r13d
    0.81   59:   cmp    $0x0,%rax
           5d: → jne    0
           63:   mov    %rbp,%rdi
    2.22   66:   add    $0xfffffffffffffeb8,%rdi
    0.11   6d:   mov    $0x40,%esi
    0.32   72:   mov    %rbx,%rdx
    2.74   75: → callq  *ffffffffda658409
    0.22   7a:   mov    %rbp,%rsi
    1.69   7d:   add    $0xfffffffffffffec0,%rsi
           84:   movabs $0xffff975bfcd36000,%rdi

           8e:   add    $0xd0,%rdi
    0.21   95:   mov    0x0(%rsi),%eax
    0.93   98:   cmp    $0x200,%rax
           9f: → jae    0
    0.10   a1:   shl    $0x3,%rax

    0.11   a5:   add    %rdi,%rax
    0.11   a8: → jmp    0
           aa:   xor    %eax,%eax
    1.07   ac:   cmp    $0x0,%rax
           b0: → je     0
    6.57   b6:   movzbq 0x0(%rax),%rdi

           bb:   cmp    $0x0,%rdi
    0.95   bf: → je     0
           c5:   mov    $0x40,%r8d
           cb:   mov    -0x140(%rbp),%rdi
           d2:   cmp    $0x2,%rdi
           d6: → je     0
           d8:   cmp    $0x101,%rdi
           df: → je     0
           e1:   cmp    $0x15,%rdi
           e5: → jne    0
           e7:   mov    0x10(%rbx),%rdx
           eb: → jmp    0
           ed:   mov    0x18(%rbx),%rdx
           f1:   cmp    $0x0,%rdx
           f5: → je     0
           f7:   xor    %edi,%edi
           f9:   mov    %edi,-0x104(%rbp)
           ff:   mov    %rbp,%rdi
          102:   add    $0xffffffffffffff00,%rdi
          109:   mov    $0x100,%esi
          10e: → callq  *ffffffffda658499
          113:   mov    $0x148,%r8d
          119:   mov    %eax,-0x108(%rbp)
          11f:   mov    %rax,%rdi
          122:   shl    $0x20,%rdi

          126:   shr    $0x20,%rdi

          12a:   cmp    $0xff,%rdi
          131: → ja     0
          133:   add    $0x48,%rax
          137:   and    $0xff,%rax
          13d:   mov    %rax,%r8
          140:   mov    %rbp,%rcx
          143:   add    $0xfffffffffffffeb8,%rcx
          14a:   mov    %rbx,%rdi
          14d:   movabs $0xffff975fbd72d800,%rsi

          157:   mov    $0xffffffff,%edx
          15c: → callq  *ffffffffda658ad9
          161:   mov    %rax,%r13
          164:   mov    %r13,%rax
    0.72  167:   mov    0x0(%rbp),%rbx
          16b:   mov    0x8(%rbp),%r13
    1.16  16f:   mov    0x10(%rbp),%r14
    0.10  173:   mov    0x18(%rbp),%r15
    0.42  177:   add    $0x28,%rbp
    0.54  17b:   leaveq
    0.54  17c: ← retq

Another cool way to test all this is to symple use 'perf top' look for
those symbols, go there and press enter, annotate it live :-)

Signed-off-by: Song Liu &lt;songliubraving@fb.com&gt;
Reviewed-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: http://lkml.kernel.org/r/20190312053051.2690567-13-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf annotate: Calculate the max instruction name, align column to that</title>
<updated>2019-03-06T19:40:15+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@redhat.com</email>
</author>
<published>2019-03-06T19:40:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bc3bb795345891509b4a3cbff824cbef8c130f20'/>
<id>bc3bb795345891509b4a3cbff824cbef8c130f20</id>
<content type='text'>
We were hardcoding '6' as the max instruction name, and we have lots
that are longer than that, see the diff from two 'P' printed TUI
annotations for a libc function that uses instructions with long names,
such as 'vpmovmskb' with its 9 chars:

  --- __strcmp_avx2.annotation.before	2019-03-06 16:31:39.368020425 -0300
  +++ __strcmp_avx2.annotation	2019-03-06 16:32:12.079450508 -0300
  @@ -2,284 +2,284 @@
   Event: cycles:ppp

   Percent        endbr64
  -  0.10         mov    %edi,%eax
  +  0.10         mov        %edi,%eax
  -               xor    %edx,%edx
  +               xor        %edx,%edx
  -  3.54         vpxor  %ymm7,%ymm7,%ymm7
  +  3.54         vpxor      %ymm7,%ymm7,%ymm7
  -               or     %esi,%eax
  +               or         %esi,%eax
  -               and    $0xfff,%eax
  +               and        $0xfff,%eax
  -               cmp    $0xf80,%eax
  +               cmp        $0xf80,%eax
  -             ↓ jg     370
  +             ↓ jg         370
  - 27.07         vmovdqu (%rdi),%ymm1
  + 27.07         vmovdqu    (%rdi),%ymm1
  -  7.97         vpcmpeqb (%rsi),%ymm1,%ymm0
  +  7.97         vpcmpeqb   (%rsi),%ymm1,%ymm0
  -  2.15         vpminub %ymm1,%ymm0,%ymm0
  +  2.15         vpminub    %ymm1,%ymm0,%ymm0
  -  4.09         vpcmpeqb %ymm7,%ymm0,%ymm0
  +  4.09         vpcmpeqb   %ymm7,%ymm0,%ymm0
  -  0.43         vpmovmskb %ymm0,%ecx
  +  0.43         vpmovmskb  %ymm0,%ecx
  -  1.53         test   %ecx,%ecx
  +  1.53         test       %ecx,%ecx
  -             ↓ je     b0
  +             ↓ je         b0
  -  5.26         tzcnt  %ecx,%edx
  +  5.26         tzcnt      %ecx,%edx
  - 18.40         movzbl (%rdi,%rdx,1),%eax
  + 18.40         movzbl     (%rdi,%rdx,1),%eax
  -  7.09         movzbl (%rsi,%rdx,1),%edx
  +  7.09         movzbl     (%rsi,%rdx,1),%edx
  -  3.34         sub    %edx,%eax
  +  3.34         sub        %edx,%eax
     2.37         vzeroupper
                ← retq
                  nop
  -         50:   tzcnt  %ecx,%edx
  +         50:   tzcnt      %ecx,%edx
  -               movzbl 0x20(%rdi,%rdx,1),%eax
  +               movzbl     0x20(%rdi,%rdx,1),%eax
  -               movzbl 0x20(%rsi,%rdx,1),%edx
  +               movzbl     0x20(%rsi,%rdx,1),%edx
  -               sub    %edx,%eax
  +               sub        %edx,%eax
                  vzeroupper
                ← retq
  -               data16 nopw %cs:0x0(%rax,%rax,1)
  +               data16     nopw %cs:0x0(%rax,%rax,1)

Reported-by: Travis Downs &lt;travis.downs@gmail.com&gt;
LPU-Reference: CAOBGo4z1KfmWeOm6Et0cnX5Z6DWsG2PQbAvRn1MhVPJmXHrc5g@mail.gmail.com
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-89wsdd9h9g6bvq52sgp6d0u4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We were hardcoding '6' as the max instruction name, and we have lots
that are longer than that, see the diff from two 'P' printed TUI
annotations for a libc function that uses instructions with long names,
such as 'vpmovmskb' with its 9 chars:

  --- __strcmp_avx2.annotation.before	2019-03-06 16:31:39.368020425 -0300
  +++ __strcmp_avx2.annotation	2019-03-06 16:32:12.079450508 -0300
  @@ -2,284 +2,284 @@
   Event: cycles:ppp

   Percent        endbr64
  -  0.10         mov    %edi,%eax
  +  0.10         mov        %edi,%eax
  -               xor    %edx,%edx
  +               xor        %edx,%edx
  -  3.54         vpxor  %ymm7,%ymm7,%ymm7
  +  3.54         vpxor      %ymm7,%ymm7,%ymm7
  -               or     %esi,%eax
  +               or         %esi,%eax
  -               and    $0xfff,%eax
  +               and        $0xfff,%eax
  -               cmp    $0xf80,%eax
  +               cmp        $0xf80,%eax
  -             ↓ jg     370
  +             ↓ jg         370
  - 27.07         vmovdqu (%rdi),%ymm1
  + 27.07         vmovdqu    (%rdi),%ymm1
  -  7.97         vpcmpeqb (%rsi),%ymm1,%ymm0
  +  7.97         vpcmpeqb   (%rsi),%ymm1,%ymm0
  -  2.15         vpminub %ymm1,%ymm0,%ymm0
  +  2.15         vpminub    %ymm1,%ymm0,%ymm0
  -  4.09         vpcmpeqb %ymm7,%ymm0,%ymm0
  +  4.09         vpcmpeqb   %ymm7,%ymm0,%ymm0
  -  0.43         vpmovmskb %ymm0,%ecx
  +  0.43         vpmovmskb  %ymm0,%ecx
  -  1.53         test   %ecx,%ecx
  +  1.53         test       %ecx,%ecx
  -             ↓ je     b0
  +             ↓ je         b0
  -  5.26         tzcnt  %ecx,%edx
  +  5.26         tzcnt      %ecx,%edx
  - 18.40         movzbl (%rdi,%rdx,1),%eax
  + 18.40         movzbl     (%rdi,%rdx,1),%eax
  -  7.09         movzbl (%rsi,%rdx,1),%edx
  +  7.09         movzbl     (%rsi,%rdx,1),%edx
  -  3.34         sub    %edx,%eax
  +  3.34         sub        %edx,%eax
     2.37         vzeroupper
                ← retq
                  nop
  -         50:   tzcnt  %ecx,%edx
  +         50:   tzcnt      %ecx,%edx
  -               movzbl 0x20(%rdi,%rdx,1),%eax
  +               movzbl     0x20(%rdi,%rdx,1),%eax
  -               movzbl 0x20(%rsi,%rdx,1),%edx
  +               movzbl     0x20(%rsi,%rdx,1),%edx
  -               sub    %edx,%eax
  +               sub        %edx,%eax
                  vzeroupper
                ← retq
  -               data16 nopw %cs:0x0(%rax,%rax,1)
  +               data16     nopw %cs:0x0(%rax,%rax,1)

Reported-by: Travis Downs &lt;travis.downs@gmail.com&gt;
LPU-Reference: CAOBGo4z1KfmWeOm6Et0cnX5Z6DWsG2PQbAvRn1MhVPJmXHrc5g@mail.gmail.com
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-89wsdd9h9g6bvq52sgp6d0u4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf annotate: Remove lots of headers from annotate.h</title>
<updated>2019-01-25T14:12:08+00:00</updated>
<author>
<name>Arnaldo Carvalho de Melo</name>
<email>acme@redhat.com</email>
</author>
<published>2019-01-22T12:47:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8a249c73a5cc0f578ef2c5a392e20336203ebe3a'/>
<id>8a249c73a5cc0f578ef2c5a392e20336203ebe3a</id>
<content type='text'>
To reduce the chances changes trigger tons of rebuilds, more to come.

Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-ytbykaku63862guk7muflcy4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
To reduce the chances changes trigger tons of rebuilds, more to come.

Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lkml.kernel.org/n/tip-ytbykaku63862guk7muflcy4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf annotate: Compute average IPC and IPC coverage per symbol</title>
<updated>2018-12-17T17:55:32+00:00</updated>
<author>
<name>Jin Yao</name>
<email>yao.jin@linux.intel.com</email>
</author>
<published>2018-11-30T13:54:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ace4f8faea54f62521e349f70b49797e48873e1f'/>
<id>ace4f8faea54f62521e349f70b49797e48873e1f</id>
<content type='text'>
Add support to 'perf report' annotate view or 'perf annotate --stdio2'
to aggregate the IPC derived from timed LBRs per symbol. We compute the
average IPC and the IPC coverage percentage.

For example:

  $ perf annotate --stdio2

  Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

                          Disassembly of section .text:

                          000000000003aac0 &lt;random@@GLIBC_2.2.5&gt;:
    8.32  3.28              sub    $0x18,%rsp
          3.28              mov    $0x1,%esi
          3.28              xor    %eax,%eax
          3.28              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
   11.57  3.28     1      ↓ je     20
                            lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
                          ↓ jne    29
                          ↓ jmp    43
   11.57  1.10        20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
    0.00  1.10     1      ↓ je     43
                      29:   lea    __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
                            sub    $0x80,%rsp
                          → callq  __lll_lock_wait_private
                            add    $0x80,%rsp
    0.00  3.00        43:   lea    __ctype_b@GLIBC_2.2.5+0x38,%rdi
          3.00              lea    0xc(%rsp),%rsi
    8.49  3.00     1      → callq  __random_r
    7.91  1.94              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
    0.00  1.94     1      ↓ je     68
                            lock   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
                          ↓ jne    70
                          ↓ jmp    8a
    0.00  2.00        68:   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
   21.56  2.00     1      ↓ je     8a
                      70:   lea    __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
                            sub    $0x80,%rsp
                          → callq  __lll_unlock_wake_private
                            add    $0x80,%rsp
   21.56  2.90        8a:   movslq 0xc(%rsp),%rax
          2.90              add    $0x18,%rsp
    9.03  2.90     1      ← retq

It shows for this symbol the average IPC is 2.30 and the IPC coverage is
54.8%.

Signed-off-by: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1543586097-27632-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add support to 'perf report' annotate view or 'perf annotate --stdio2'
to aggregate the IPC derived from timed LBRs per symbol. We compute the
average IPC and the IPC coverage percentage.

For example:

  $ perf annotate --stdio2

  Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

                          Disassembly of section .text:

                          000000000003aac0 &lt;random@@GLIBC_2.2.5&gt;:
    8.32  3.28              sub    $0x18,%rsp
          3.28              mov    $0x1,%esi
          3.28              xor    %eax,%eax
          3.28              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
   11.57  3.28     1      ↓ je     20
                            lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
                          ↓ jne    29
                          ↓ jmp    43
   11.57  1.10        20:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
    0.00  1.10     1      ↓ je     43
                      29:   lea    __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
                            sub    $0x80,%rsp
                          → callq  __lll_lock_wait_private
                            add    $0x80,%rsp
    0.00  3.00        43:   lea    __ctype_b@GLIBC_2.2.5+0x38,%rdi
          3.00              lea    0xc(%rsp),%rsi
    8.49  3.00     1      → callq  __random_r
    7.91  1.94              cmpl   $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
    0.00  1.94     1      ↓ je     68
                            lock   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
                          ↓ jne    70
                          ↓ jmp    8a
    0.00  2.00        68:   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
   21.56  2.00     1      ↓ je     8a
                      70:   lea    __abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
                            sub    $0x80,%rsp
                          → callq  __lll_unlock_wake_private
                            add    $0x80,%rsp
   21.56  2.90        8a:   movslq 0xc(%rsp),%rax
          2.90              add    $0x18,%rsp
    9.03  2.90     1      ← retq

It shows for this symbol the average IPC is 2.30 and the IPC coverage is
54.8%.

Signed-off-by: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Reviewed-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/1543586097-27632-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
