linux-stable.git/tools/perf/util/annotate-data.h, branch v6.10

perf annotate-data: Add hist_entry__annotate_data_tui()

2024-04-12T15:02:06+00:00

Support data type profiling output on TUI.

Testing from Arnaldo:

First make sure that the debug information for your workload binaries
in embedded in them by building it with '-g' or install the debuginfo
packages, since our workload is 'find':

  root@number:~# type find
  find is hashed (/usr/bin/find)
  root@number:~# rpm -qf /usr/bin/find
  findutils-4.9.0-5.fc39.x86_64
  root@number:~# dnf debuginfo-install findutils
  
  root@number:~#

Then collect some data:

  root@number:~# echo 1 > /proc/sys/vm/drop_caches
  root@number:~# perf mem record find / > /dev/null
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.331 MB perf.data (3982 samples) ]
  root@number:~#

Finally do data-type annotation with the following command, that will
default, as 'perf report' to the --tui mode, with lines colored to
highlight the hotspots, etc.

  root@number:~# perf annotate --data-type
  Annotate type: 'struct predicate' (58 samples)
      Percent     Offset       Size  Field
       100.00          0        312  struct predicate {
         0.00          0          8      PRED_FUNC        pred_func;
         0.00          8          8      char*    p_name;
         0.00         16          4      enum predicate_type      p_type;
         0.00         20          4      enum predicate_precedence        p_prec;
         0.00         24          1      _Bool    side_effects;
         0.00         25          1      _Bool    no_default_print;
         0.00         26          1      _Bool    need_stat;
         0.00         27          1      _Bool    need_type;
         0.00         28          1      _Bool    need_inum;
         0.00         32          4      enum EvaluationCost      p_cost;
         0.00         36          4      float    est_success_rate;
         0.00         40          1      _Bool    literal_control_chars;
         0.00         41          1      _Bool    artificial;
         0.00         48          8      char*    arg_text;
  

Reviewed-by: Ian Rogers 
Signed-off-by: Namhyung Kim 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Adrian Hunter 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20240411033256.2099646-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf annotate-data: Add hist_entry__annotate_data_tty()

2024-04-12T15:02:06+00:00

And move the related code into util/annotate-data.c file.

Reviewed-by: Ian Rogers 
Signed-off-by: Namhyung Kim 
Cc: Adrian Hunter 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Kan Liang 
Cc: Peter Zijlstra 
Link: https://lore.kernel.org/r/20240411033256.2099646-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf annotate-data: Add a cache for global variable types

2024-03-21T13:41:29+00:00

They are often searched by many different places.  Let's add a cache
for them to reduce the duplicate DWARF access.

Signed-off-by: Namhyung Kim 
Cc: Adrian Hunter 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: https://lore.kernel.org/r/20240319055115.4063940-23-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf annotate-data: Add stack canary type

2024-03-21T13:41:29+00:00

When the stack protector is enabled, compiler would generate code to
check stack overflow with a special value called 'stack carary' at
runtime.  On x86_64, GCC hard-codes the stack canary as %gs:40.

While there's a definition of fixed_percpu_data in asm/processor.h,
it seems that the header is not included everywhere and many places
it cannot find the type info.  As it's in the well-known location (at
%gs:40), let's add a pseudo stack canary type to handle it specially.

Signed-off-by: Namhyung Kim 
Cc: Adrian Hunter 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: https://lore.kernel.org/r/20240319055115.4063940-22-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf annotate-data: Implement instruction tracking

2024-03-21T13:41:29+00:00

If it failed to find a variable for the location directly, it might be
due to a missing variable in the source code.  For example, accessing
pointer variables in a chain can result in the case like below:

  struct foo *foo = ...;

  int i = foo->bar->baz;

The DWARF debug information is created for each variable so it'd have
one for 'foo'.  But there's no variable for 'foo->bar' and then it
cannot know the type of 'bar' and 'baz'.

The above source code can be compiled to the follow x86 instructions:

  mov  0x8(%rax), %rcx
  mov  0x4(%rcx), %rdx   <=== PMU sample
  mov  %rdx, -4(%rbp)

Let's say 'foo' is located in the %rax and it has a pointer to struct
foo.  But perf sample is captured in the second instruction and there
is no variable or type info for the %rcx.

It'd be great if compiler could generate debug info for %rcx, but we
should handle it on our side.  So this patch implements the logic to
iterate instructions and update the type table for each location.

As it already collected a list of scopes including the target
instruction, we can use it to construct the type table smartly.

  +----------------  scope[0] subprogram
  |
  | +--------------  scope[1] lexical_block
  | |
  | | +------------  scope[2] inlined_subroutine
  | | |
  | | | +----------  scope[3] inlined_subroutine
  | | | |
  | | | | +--------  scope[4] lexical_block
  | | | | |
  | | | | |     ***  target instruction
  ...

Image the target instruction has 5 scopes, each scope will have its own
variables and parameters.  Then it can start with the innermost scope
(4).  So it'd search the shortest path from the start of scope[4] to
the target address and build a list of basic blocks.  Then it iterates
the basic blocks with the variables in the scope and update the table.
If it finds a type at the target instruction, then returns it.

Otherwise, it moves to the upper scope[3].  Now it'd search the shortest
path from the start of scope[3] to the start of scope[4].  Then connect
it to the existing basic block list.  Then it'd iterate the blocks with
variables for both scopes.  It can repeat this until it finds a type at
the target instruction or reaches to the top scope[0].

As the basic blocks contain the shortest path, it won't worry about
branches and can update the table simply.

The final check will be done by find_matching_type() in the next patch.

Signed-off-by: Namhyung Kim 
Cc: Adrian Hunter 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: https://lore.kernel.org/r/20240319055115.4063940-15-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf annotate-data: Add get_global_var_type()

2024-03-21T13:41:28+00:00

Accessing global variable is common when it tracks execution later.
Factor out the common code into a function for later use.

It adds thread and cpumode to struct data_loc_info to find (global)
symbols if needed.  Also remove var_name as it's retrieved in the
helper function.

Signed-off-by: Namhyung Kim 
Cc: Adrian Hunter 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: https://lore.kernel.org/r/20240319055115.4063940-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf annotate-data: Add update_insn_state()

2024-03-21T13:41:28+00:00

The update_insn_state() function is to update the type state table after
processing each instruction.  For now, it handles MOV (on x86) insn
to transfer type info from the source location to the target.

The location can be a register or a stack slot.  Check carefully when
memory reference happens and fetch the type correctly.  It basically
ignores write to a memory since it doesn't change the type info.  One
exception is writes to (new) stack slots for register spilling.

Signed-off-by: Namhyung Kim 
Cc: Adrian Hunter 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: https://lore.kernel.org/r/20240319055115.4063940-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf annotate-data: Introduce 'struct data_loc_info'

2024-03-21T13:41:28+00:00

The find_data_type() needs many information to describe the location of
the data.  Add the new 'struct data_loc_info' to pass those information at
once.

No functional changes intended.

Signed-off-by: Namhyung Kim 
Cc: Adrian Hunter 
Cc: Ian Rogers 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Link: https://lore.kernel.org/r/20240319055115.4063940-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

perf annotate-data: Support global variables

2024-01-22T20:08:20+00:00

Global variables are accessed using PC-relative address so it needs to
be handled separately.  The PC-rel addressing is detected by using
DWARF_REG_PC.  On x86, %rip register would be used.

The address can be calculated using the ip and offset in the
instruction.  But it should start from the next instruction so add
calculate_pcrel_addr() to do it properly.

But global variables defined in a different file would only have a
declaration which doesn't include a location list.  So it first tries
to get the type info using the address, and then looks up the variable
declarations using name.  The name of global variables should be get
from the symbol table.  The declaration would have the type info.

So extend find_var_type() to take both address and name for global
variables.

The stat is now looks like:

  Annotate data type stats:
  total 294, ok 153 (52.0%), bad 141 (48.0%)
  -----------------------------------------------------------
          30 : no_sym
          32 : no_mem_ops
          61 : no_var
          10 : no_typeinfo
           8 : bad_offset

Reviewed-by: Ian Rogers 
Cc: Stephane Eranian 
Cc: Masami Hiramatsu 
Link: https://lore.kernel.org/r/20240117062657.985479-7-namhyung@kernel.org
Signed-off-by: Namhyung Kim

perf annotate-data: Add stack operation pseudo type

2024-01-22T20:08:20+00:00

A typical function prologue and epilogue include multiple stack
operations to save and restore the current value of registers.
On x86, it looks like below:

  push  r15
  push  r14
  push  r13
  push  r12

  ...

  pop   r12
  pop   r13
  pop   r14
  pop   r15
  ret

As these all touches the stack memory region, chances are high that they
appear in a memory profile data.  But these are not used for any real
purpose yet so it'd return no types.

One of my profile type shows that non neglible portion of data came from
the stack operations.  It also seems GCC generates more stack operations
than clang.

Annotate Instruction stats
total 264, ok 169 (64.0%), bad 95 (36.0%)

    Name      :  Good   Bad
  -----------------------------------------------------------
    movq      :    49    27
    movl      :    24     9
    popq      :     0    19   <-- here
    cmpl      :    17     2
    addq      :    14     1
    cmpq      :    12     2
    cmpxchgl  :     3     7

Instead of dealing them as unknown, let's create a seperate pseudo type
to represent those stack operations separately.

Reviewed-by: Ian Rogers 
Cc: Stephane Eranian 
Cc: Masami Hiramatsu 
Link: https://lore.kernel.org/r/20240117062657.985479-5-namhyung@kernel.org
Signed-off-by: Namhyung Kim