<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/tools/perf/util/thread-stack.c, branch linux-5.2.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>perf thread-stack: Fix thread stack return from kernel for kernel-only case</title>
<updated>2019-07-14T06:01:08+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-06-19T06:44:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=af31a530e760736df39e1758446b0d2fc394e684'/>
<id>af31a530e760736df39e1758446b0d2fc394e684</id>
<content type='text'>
commit 97860b483c5597663a174ff7405be957b4838391 upstream.

Commit f08046cb3082 ("perf thread-stack: Represent jmps to the start of a
different symbol") had the side-effect of introducing more stack entries
before return from kernel space.

When user space is also traced, those entries are popped before entry to
user space, but when user space is not traced, they get stuck at the
bottom of the stack, making the stack grow progressively larger.

Fix by detecting a return-from-kernel branch type, and popping kernel
addresses from the stack then.

Note, the problem and fix affect the exported Call Graph / Tree but not
the callindent option used by "perf script --call-trace".

Example:

  perf-with-kcore record example -e intel_pt//k -- ls
  perf-with-kcore script example --itrace=bep -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py example.db branches calls
  ~/libexec/perf-core/scripts/python/exported-sql-viewer.py example.db

  Menu option: Reports -&gt; Context-Sensitive Call Graph

  Before: (showing Call Path column only)

    Call Path
    ▶ perf
    ▼ ls
      ▼ 12111:12111
        ▶ setup_new_exec
        ▶ __task_pid_nr_ns
        ▶ perf_event_pid_type
        ▶ perf_event_comm_output
        ▶ perf_iterate_ctx
        ▶ perf_iterate_sb
        ▶ perf_event_comm
        ▶ __set_task_comm
        ▶ load_elf_binary
        ▶ search_binary_handler
        ▶ __do_execve_file.isra.41
        ▶ __x64_sys_execve
        ▶ do_syscall_64
        ▼ entry_SYSCALL_64_after_hwframe
          ▼ swapgs_restore_regs_and_return_to_usermode
            ▼ native_iret
              ▶ error_entry
              ▶ do_page_fault
              ▼ error_exit
                ▼ retint_user
                  ▶ prepare_exit_to_usermode
                  ▼ native_iret
                    ▶ error_entry
                    ▶ do_page_fault
                    ▼ error_exit
                      ▼ retint_user
                        ▶ prepare_exit_to_usermode
                        ▼ native_iret
                          ▶ error_entry
                          ▶ do_page_fault
                          ▼ error_exit
                            ▼ retint_user
                              ▶ prepare_exit_to_usermode
                              ▶ native_iret

  After: (showing Call Path column only)

    Call Path
    ▶ perf
    ▼ ls
      ▼ 12111:12111
        ▶ setup_new_exec
        ▶ __task_pid_nr_ns
        ▶ perf_event_pid_type
        ▶ perf_event_comm_output
        ▶ perf_iterate_ctx
        ▶ perf_iterate_sb
        ▶ perf_event_comm
        ▶ __set_task_comm
        ▶ load_elf_binary
        ▶ search_binary_handler
        ▶ __do_execve_file.isra.41
        ▶ __x64_sys_execve
        ▶ do_syscall_64
        ▶ entry_SYSCALL_64_after_hwframe
        ▶ page_fault
        ▼ entry_SYSCALL_64
          ▼ do_syscall_64
            ▶ __x64_sys_brk
            ▶ __x64_sys_access
            ▶ __x64_sys_openat
            ▶ __x64_sys_newfstat
            ▶ __x64_sys_mmap
            ▶ __x64_sys_close
            ▶ __x64_sys_read
            ▶ __x64_sys_mprotect
            ▶ __x64_sys_arch_prctl
            ▶ __x64_sys_munmap
            ▶ exit_to_usermode_loop
            ▶ __x64_sys_set_tid_address
            ▶ __x64_sys_set_robust_list
            ▶ __x64_sys_rt_sigaction
            ▶ __x64_sys_rt_sigprocmask
            ▶ __x64_sys_prlimit64
            ▶ __x64_sys_statfs
            ▶ __x64_sys_ioctl
            ▶ __x64_sys_getdents64
            ▶ __x64_sys_write
            ▶ __x64_sys_exit_group

Committer notes:

The first arg to the perf-with-kcore needs to be the same for the
'record' and 'script' lines, otherwise we'll record the perf.data file
and kcore_dir/ files in one directory ('example') to then try to use it
from the 'bep' directory, fix the instructions above it so that both use
'example'.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: stable@vger.kernel.org
Fixes: f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol")
Link: http://lkml.kernel.org/r/20190619064429.14940-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 97860b483c5597663a174ff7405be957b4838391 upstream.

Commit f08046cb3082 ("perf thread-stack: Represent jmps to the start of a
different symbol") had the side-effect of introducing more stack entries
before return from kernel space.

When user space is also traced, those entries are popped before entry to
user space, but when user space is not traced, they get stuck at the
bottom of the stack, making the stack grow progressively larger.

Fix by detecting a return-from-kernel branch type, and popping kernel
addresses from the stack then.

Note, the problem and fix affect the exported Call Graph / Tree but not
the callindent option used by "perf script --call-trace".

Example:

  perf-with-kcore record example -e intel_pt//k -- ls
  perf-with-kcore script example --itrace=bep -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py example.db branches calls
  ~/libexec/perf-core/scripts/python/exported-sql-viewer.py example.db

  Menu option: Reports -&gt; Context-Sensitive Call Graph

  Before: (showing Call Path column only)

    Call Path
    ▶ perf
    ▼ ls
      ▼ 12111:12111
        ▶ setup_new_exec
        ▶ __task_pid_nr_ns
        ▶ perf_event_pid_type
        ▶ perf_event_comm_output
        ▶ perf_iterate_ctx
        ▶ perf_iterate_sb
        ▶ perf_event_comm
        ▶ __set_task_comm
        ▶ load_elf_binary
        ▶ search_binary_handler
        ▶ __do_execve_file.isra.41
        ▶ __x64_sys_execve
        ▶ do_syscall_64
        ▼ entry_SYSCALL_64_after_hwframe
          ▼ swapgs_restore_regs_and_return_to_usermode
            ▼ native_iret
              ▶ error_entry
              ▶ do_page_fault
              ▼ error_exit
                ▼ retint_user
                  ▶ prepare_exit_to_usermode
                  ▼ native_iret
                    ▶ error_entry
                    ▶ do_page_fault
                    ▼ error_exit
                      ▼ retint_user
                        ▶ prepare_exit_to_usermode
                        ▼ native_iret
                          ▶ error_entry
                          ▶ do_page_fault
                          ▼ error_exit
                            ▼ retint_user
                              ▶ prepare_exit_to_usermode
                              ▶ native_iret

  After: (showing Call Path column only)

    Call Path
    ▶ perf
    ▼ ls
      ▼ 12111:12111
        ▶ setup_new_exec
        ▶ __task_pid_nr_ns
        ▶ perf_event_pid_type
        ▶ perf_event_comm_output
        ▶ perf_iterate_ctx
        ▶ perf_iterate_sb
        ▶ perf_event_comm
        ▶ __set_task_comm
        ▶ load_elf_binary
        ▶ search_binary_handler
        ▶ __do_execve_file.isra.41
        ▶ __x64_sys_execve
        ▶ do_syscall_64
        ▶ entry_SYSCALL_64_after_hwframe
        ▶ page_fault
        ▼ entry_SYSCALL_64
          ▼ do_syscall_64
            ▶ __x64_sys_brk
            ▶ __x64_sys_access
            ▶ __x64_sys_openat
            ▶ __x64_sys_newfstat
            ▶ __x64_sys_mmap
            ▶ __x64_sys_close
            ▶ __x64_sys_read
            ▶ __x64_sys_mprotect
            ▶ __x64_sys_arch_prctl
            ▶ __x64_sys_munmap
            ▶ exit_to_usermode_loop
            ▶ __x64_sys_set_tid_address
            ▶ __x64_sys_set_robust_list
            ▶ __x64_sys_rt_sigaction
            ▶ __x64_sys_rt_sigprocmask
            ▶ __x64_sys_prlimit64
            ▶ __x64_sys_statfs
            ▶ __x64_sys_ioctl
            ▶ __x64_sys_getdents64
            ▶ __x64_sys_write
            ▶ __x64_sys_exit_group

Committer notes:

The first arg to the perf-with-kcore needs to be the same for the
'record' and 'script' lines, otherwise we'll record the perf.data file
and kcore_dir/ files in one directory ('example') to then try to use it
from the 'bep' directory, fix the instructions above it so that both use
'example'.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: stable@vger.kernel.org
Fixes: f08046cb3082 ("perf thread-stack: Represent jmps to the start of a different symbol")
Link: http://lkml.kernel.org/r/20190619064429.14940-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288</title>
<updated>2019-06-05T15:36:37+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2019-05-29T14:18:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2025cf9e193de05b0654570dd639acb49ebd3adf'/>
<id>2025cf9e193de05b0654570dd639acb49ebd3adf</id>
<content type='text'>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms and conditions of the gnu general public license
  version 2 as published by the free software foundation this program
  is distributed in the hope it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 263 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Reviewed-by: Alexios Zavras &lt;alexios.zavras@intel.com&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms and conditions of the gnu general public license
  version 2 as published by the free software foundation this program
  is distributed in the hope it will be useful but without any
  warranty without even the implied warranty of merchantability or
  fitness for a particular purpose see the gnu general public license
  for more details

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 263 file(s).

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Allison Randal &lt;allison@lohutok.net&gt;
Reviewed-by: Alexios Zavras &lt;alexios.zavras@intel.com&gt;
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf db-export: Add calls parent_id to enable creation of call trees</title>
<updated>2019-03-01T17:50:47+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-02-28T13:00:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f435887ec0c941b97301bd6ed1f3e4b5200df409'/>
<id>f435887ec0c941b97301bd6ed1f3e4b5200df409</id>
<content type='text'>
The call_path can be used to find the parent symbol for a call but not
the exact parent call. To do that add parent_id to the call_return
export. This enables the creation of a call tree from the exported data.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: https://lkml.kernel.org/n/tip-6j7tzdxo67cox6kan7k22oo6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The call_path can be used to find the parent symbol for a call but not
the exact parent call. To do that add parent_id to the call_return
export. This enables the creation of a call tree from the exported data.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Link: https://lkml.kernel.org/n/tip-6j7tzdxo67cox6kan7k22oo6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf thread-stack: Hide x86 retpolines</title>
<updated>2019-02-22T19:49:49+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-01-09T09:18:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3c0cd952cf051903929cd57b89034cc5f71f451d'/>
<id>3c0cd952cf051903929cd57b89034cc5f71f451d</id>
<content type='text'>
x86 retpoline functions pollute the call graph by showing up everywhere
there is an indirect branch, but they do not really mean anything. Make
changes so that the default retpoline functions will no longer appear in
the call graph. Note this only affects the call graph, since all the
original branches are left unchanged.

This does not handle function return thunks, nor is there any
improvement for the handling of inline thunks or extern thunks.

Example:

  $ cat simple-retpoline.c
  __attribute__((noinline)) int bar(void)
  {
          return -1;
  }

  int foo(void)
  {
          return bar() + 1;
  }

  __attribute__((indirect_branch("thunk"))) int main()
  {
          int (*volatile fn)(void) = foo;

          fn();
          return fn();
  }
  $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
  $ objdump -d simple-retpoline
  &lt;SNIP&gt;
  0000000000001040 &lt;main&gt;:
      1040:       48 83 ec 18             sub    $0x18,%rsp
      1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 &lt;foo&gt;
      104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
      1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
      1055:       e8 1f 01 00 00          callq  1179 &lt;__x86_indirect_thunk_rax&gt;
      105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
      105f:       48 83 c4 18             add    $0x18,%rsp
      1063:       e9 11 01 00 00          jmpq   1179 &lt;__x86_indirect_thunk_rax&gt;
  &lt;SNIP&gt;
  0000000000001160 &lt;bar&gt;:
      1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
      1165:       c3                      retq
  &lt;SNIP&gt;
  0000000000001170 &lt;foo&gt;:
      1170:       e8 eb ff ff ff          callq  1160 &lt;bar&gt;
      1175:       83 c0 01                add    $0x1,%eax
      1178:       c3                      retq
  0000000000001179 &lt;__x86_indirect_thunk_rax&gt;:
      1179:       e8 07 00 00 00          callq  1185 &lt;__x86_indirect_thunk_rax+0xc&gt;
      117e:       f3 90                   pause
      1180:       0f ae e8                lfence
      1183:       eb f9                   jmp    117e &lt;__x86_indirect_thunk_rax+0x5&gt;
      1185:       48 89 04 24             mov    %rax,(%rsp)
      1189:       c3                      retq
  &lt;SNIP&gt;
  $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
  $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
  2019-01-08 14:03:37.851655 Creating database...
  2019-01-08 14:03:37.863256 Writing records...
  2019-01-08 14:03:38.069750 Adding indexes
  2019-01-08 14:03:38.078799 Done
  $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db

Before:

    main
        -&gt; __x86_indirect_thunk_rax
            -&gt; __x86_indirect_thunk_rax
                -&gt; foo
                    -&gt; bar

After:

    main
        -&gt; foo
            -&gt; bar

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20190109091835.5570-7-adrian.hunter@intel.com
[ Remove (sym-&gt;name != NULL) test, this is not a pointer and breaks the build with clang version 7.0.1 (Fedora 7.0.1-2.fc30) ]
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
x86 retpoline functions pollute the call graph by showing up everywhere
there is an indirect branch, but they do not really mean anything. Make
changes so that the default retpoline functions will no longer appear in
the call graph. Note this only affects the call graph, since all the
original branches are left unchanged.

This does not handle function return thunks, nor is there any
improvement for the handling of inline thunks or extern thunks.

Example:

  $ cat simple-retpoline.c
  __attribute__((noinline)) int bar(void)
  {
          return -1;
  }

  int foo(void)
  {
          return bar() + 1;
  }

  __attribute__((indirect_branch("thunk"))) int main()
  {
          int (*volatile fn)(void) = foo;

          fn();
          return fn();
  }
  $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
  $ objdump -d simple-retpoline
  &lt;SNIP&gt;
  0000000000001040 &lt;main&gt;:
      1040:       48 83 ec 18             sub    $0x18,%rsp
      1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 &lt;foo&gt;
      104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
      1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
      1055:       e8 1f 01 00 00          callq  1179 &lt;__x86_indirect_thunk_rax&gt;
      105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
      105f:       48 83 c4 18             add    $0x18,%rsp
      1063:       e9 11 01 00 00          jmpq   1179 &lt;__x86_indirect_thunk_rax&gt;
  &lt;SNIP&gt;
  0000000000001160 &lt;bar&gt;:
      1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
      1165:       c3                      retq
  &lt;SNIP&gt;
  0000000000001170 &lt;foo&gt;:
      1170:       e8 eb ff ff ff          callq  1160 &lt;bar&gt;
      1175:       83 c0 01                add    $0x1,%eax
      1178:       c3                      retq
  0000000000001179 &lt;__x86_indirect_thunk_rax&gt;:
      1179:       e8 07 00 00 00          callq  1185 &lt;__x86_indirect_thunk_rax+0xc&gt;
      117e:       f3 90                   pause
      1180:       0f ae e8                lfence
      1183:       eb f9                   jmp    117e &lt;__x86_indirect_thunk_rax+0x5&gt;
      1185:       48 89 04 24             mov    %rax,(%rsp)
      1189:       c3                      retq
  &lt;SNIP&gt;
  $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
  $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
  2019-01-08 14:03:37.851655 Creating database...
  2019-01-08 14:03:37.863256 Writing records...
  2019-01-08 14:03:38.069750 Adding indexes
  2019-01-08 14:03:38.078799 Done
  $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db

Before:

    main
        -&gt; __x86_indirect_thunk_rax
            -&gt; __x86_indirect_thunk_rax
                -&gt; foo
                    -&gt; bar

After:

    main
        -&gt; foo
            -&gt; bar

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20190109091835.5570-7-adrian.hunter@intel.com
[ Remove (sym-&gt;name != NULL) test, this is not a pointer and breaks the build with clang version 7.0.1 (Fedora 7.0.1-2.fc30) ]
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf thread-stack: Improve thread_stack__no_call_return()</title>
<updated>2019-02-22T14:42:34+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-01-09T09:18:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1f35cd65386e02430546727bd3cc2508d052e3ec'/>
<id>1f35cd65386e02430546727bd3cc2508d052e3ec</id>
<content type='text'>
Improve thread_stack__no_call_return() to better handle 'returns' that
do not match the stack i.e. 'no call'. See code comments for details.
The example below shows how retpolines are affected:

Example:

  $ cat simple-retpoline.c
  __attribute__((noinline)) int bar(void)
  {
          return -1;
  }

  int foo(void)
  {
          return bar() + 1;
  }

  __attribute__((indirect_branch("thunk"))) int main()
  {
          int (*volatile fn)(void) = foo;

          fn();
          return fn();
  }
  $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
  $ objdump -d simple-retpoline
  &lt;SNIP&gt;
  0000000000001040 &lt;main&gt;:
      1040:       48 83 ec 18             sub    $0x18,%rsp
      1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 &lt;foo&gt;
      104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
      1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
      1055:       e8 1f 01 00 00          callq  1179 &lt;__x86_indirect_thunk_rax&gt;
      105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
      105f:       48 83 c4 18             add    $0x18,%rsp
      1063:       e9 11 01 00 00          jmpq   1179 &lt;__x86_indirect_thunk_rax&gt;
  &lt;SNIP&gt;
  0000000000001160 &lt;bar&gt;:
      1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
      1165:       c3                      retq
  &lt;SNIP&gt;
  0000000000001170 &lt;foo&gt;:
      1170:       e8 eb ff ff ff          callq  1160 &lt;bar&gt;
      1175:       83 c0 01                add    $0x1,%eax
      1178:       c3                      retq
  0000000000001179 &lt;__x86_indirect_thunk_rax&gt;:
      1179:       e8 07 00 00 00          callq  1185 &lt;__x86_indirect_thunk_rax+0xc&gt;
      117e:       f3 90                   pause
      1180:       0f ae e8                lfence
      1183:       eb f9                   jmp    117e &lt;__x86_indirect_thunk_rax+0x5&gt;
      1185:       48 89 04 24             mov    %rax,(%rsp)
      1189:       c3                      retq
  &lt;SNIP&gt;
  $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
  $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
  2019-01-08 14:03:37.851655 Creating database...
  2019-01-08 14:03:37.863256 Writing records...
  2019-01-08 14:03:38.069750 Adding indexes
  2019-01-08 14:03:38.078799 Done
  $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db

Before:

    main
        -&gt; __x86_indirect_thunk_rax
            -&gt; __x86_indirect_thunk_rax
                -&gt; __x86_indirect_thunk_rax
                    -&gt; bar

After:

    main
        -&gt; __x86_indirect_thunk_rax
            -&gt; __x86_indirect_thunk_rax
                -&gt; foo
                    -&gt; bar

Committer testing:

Chose "Reports", Then "Context-Sensitive Call Graph" and then go on
expanding:

Before:

simple-retpolin
   PID:PID
      _start
         _start
            __libc_start_main
               main
                   __x86_indirect_thunk_rax
                      __x86_indirect_thunk_rax
                      bar

After:

Remove the "simple.retpoline.db" file, run again the 'perf script' line
to regenerate the .db file and run the exported-sql-viewer.py again to
get the same all the way to 'main', then, from there, including 'main':

               main
                   __x86_indirect_thunk_rax
                       __x86_indirect_thunk_rax
                           foo
                               bar

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Improve thread_stack__no_call_return() to better handle 'returns' that
do not match the stack i.e. 'no call'. See code comments for details.
The example below shows how retpolines are affected:

Example:

  $ cat simple-retpoline.c
  __attribute__((noinline)) int bar(void)
  {
          return -1;
  }

  int foo(void)
  {
          return bar() + 1;
  }

  __attribute__((indirect_branch("thunk"))) int main()
  {
          int (*volatile fn)(void) = foo;

          fn();
          return fn();
  }
  $ gcc -ggdb3 -Wall -Wextra -O2 -o simple-retpoline simple-retpoline.c
  $ objdump -d simple-retpoline
  &lt;SNIP&gt;
  0000000000001040 &lt;main&gt;:
      1040:       48 83 ec 18             sub    $0x18,%rsp
      1044:       48 8d 05 25 01 00 00    lea    0x125(%rip),%rax        # 1170 &lt;foo&gt;
      104b:       48 89 44 24 08          mov    %rax,0x8(%rsp)
      1050:       48 8b 44 24 08          mov    0x8(%rsp),%rax
      1055:       e8 1f 01 00 00          callq  1179 &lt;__x86_indirect_thunk_rax&gt;
      105a:       48 8b 44 24 08          mov    0x8(%rsp),%rax
      105f:       48 83 c4 18             add    $0x18,%rsp
      1063:       e9 11 01 00 00          jmpq   1179 &lt;__x86_indirect_thunk_rax&gt;
  &lt;SNIP&gt;
  0000000000001160 &lt;bar&gt;:
      1160:       b8 ff ff ff ff          mov    $0xffffffff,%eax
      1165:       c3                      retq
  &lt;SNIP&gt;
  0000000000001170 &lt;foo&gt;:
      1170:       e8 eb ff ff ff          callq  1160 &lt;bar&gt;
      1175:       83 c0 01                add    $0x1,%eax
      1178:       c3                      retq
  0000000000001179 &lt;__x86_indirect_thunk_rax&gt;:
      1179:       e8 07 00 00 00          callq  1185 &lt;__x86_indirect_thunk_rax+0xc&gt;
      117e:       f3 90                   pause
      1180:       0f ae e8                lfence
      1183:       eb f9                   jmp    117e &lt;__x86_indirect_thunk_rax+0x5&gt;
      1185:       48 89 04 24             mov    %rax,(%rsp)
      1189:       c3                      retq
  &lt;SNIP&gt;
  $ perf record -o simple-retpoline.perf.data -e intel_pt/cyc/u ./simple-retpoline
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0,017 MB simple-retpoline.perf.data ]
  $ perf script -i simple-retpoline.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py simple-retpoline.db branches calls
  2019-01-08 14:03:37.851655 Creating database...
  2019-01-08 14:03:37.863256 Writing records...
  2019-01-08 14:03:38.069750 Adding indexes
  2019-01-08 14:03:38.078799 Done
  $ ~/libexec/perf-core/scripts/python/exported-sql-viewer.py simple-retpoline.db

Before:

    main
        -&gt; __x86_indirect_thunk_rax
            -&gt; __x86_indirect_thunk_rax
                -&gt; __x86_indirect_thunk_rax
                    -&gt; bar

After:

    main
        -&gt; __x86_indirect_thunk_rax
            -&gt; __x86_indirect_thunk_rax
                -&gt; foo
                    -&gt; bar

Committer testing:

Chose "Reports", Then "Context-Sensitive Call Graph" and then go on
expanding:

Before:

simple-retpolin
   PID:PID
      _start
         _start
            __libc_start_main
               main
                   __x86_indirect_thunk_rax
                      __x86_indirect_thunk_rax
                      bar

After:

Remove the "simple.retpoline.db" file, run again the 'perf script' line
to regenerate the .db file and run the exported-sql-viewer.py again to
get the same all the way to 'main', then, from there, including 'main':

               main
                   __x86_indirect_thunk_rax
                       __x86_indirect_thunk_rax
                           foo
                               bar

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20190109091835.5570-6-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf thread-stack: Represent jmps to the start of a different symbol</title>
<updated>2019-02-06T13:00:40+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-01-09T09:18:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f08046cb3082b313e7b08dc35838cf8bd902c36b'/>
<id>f08046cb3082b313e7b08dc35838cf8bd902c36b</id>
<content type='text'>
The compiler might optimize a call/ret combination by making it a jmp.
However the thread-stack does not presently cater for that, so that such
control flow is not visible in the call graph. Make it visible by
recording on the stack a branch to the start of a different symbol.
Note, that means when a ret pops the stack, all jmps must be popped off
first.

Example:

  $ cat jmp-to-fn.c
  __attribute__((noinline)) int bar(void)
  {
          return -1;
  }

  __attribute__((noinline)) int foo(void)
  {
          return bar() + 1;
  }

  int main()
  {
          return foo();
  }
  $ gcc -ggdb3 -Wall -Wextra -O2 -o jmp-to-fn jmp-to-fn.c
  $ objdump -d jmp-to-fn
  &lt;SNIP&gt;
  0000000000001040 &lt;main&gt;:
      1040:       31 c0                   xor    %eax,%eax
      1042:       e9 09 01 00 00          jmpq   1150 &lt;foo&gt;
  &lt;SNIP&gt;
  0000000000001140 &lt;bar&gt;:
      1140:       b8 ff ff ff ff          mov    $0xffffffff,%eax
      1145:       c3                      retq
  &lt;SNIP&gt;
  0000000000001150 &lt;foo&gt;:
      1150:       31 c0                   xor    %eax,%eax
      1152:       e8 e9 ff ff ff          callq  1140 &lt;bar&gt;
      1157:       83 c0 01                add    $0x1,%eax
      115a:       c3                      retq
  &lt;SNIP&gt;
  $ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0,017 MB jmp-to-fn.perf.data ]
  $ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls
  2019-01-08 13:24:58.783069 Creating database...
  2019-01-08 13:24:58.794650 Writing records...
  2019-01-08 13:24:59.008050 Adding indexes
  2019-01-08 13:24:59.015802 Done
  $  ~/libexec/perf-core/scripts/python/exported-sql-viewer.py jmp-to-fn.db

Before:

    main
        -&gt; bar

After:

    main
        -&gt; foo
            -&gt; bar

Committer testing:

Install the python2-pyside package, then select these menu options
on the GUI:

   "Reports"
      "Context sensitive callgraphs"

Then go on expanding the symbols, to get, full picture when doing this
on a fedora:29 with gcc version 8.2.1 20181215 (Red Hat 8.2.1-6) (GCC):

jmp-to-fn
  PID:TID
    _start                (ld-2.28.so)
      __libc_start_main
        main
          foo
            bar

To verify that indeed, this fixes the problem.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20190109091835.5570-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The compiler might optimize a call/ret combination by making it a jmp.
However the thread-stack does not presently cater for that, so that such
control flow is not visible in the call graph. Make it visible by
recording on the stack a branch to the start of a different symbol.
Note, that means when a ret pops the stack, all jmps must be popped off
first.

Example:

  $ cat jmp-to-fn.c
  __attribute__((noinline)) int bar(void)
  {
          return -1;
  }

  __attribute__((noinline)) int foo(void)
  {
          return bar() + 1;
  }

  int main()
  {
          return foo();
  }
  $ gcc -ggdb3 -Wall -Wextra -O2 -o jmp-to-fn jmp-to-fn.c
  $ objdump -d jmp-to-fn
  &lt;SNIP&gt;
  0000000000001040 &lt;main&gt;:
      1040:       31 c0                   xor    %eax,%eax
      1042:       e9 09 01 00 00          jmpq   1150 &lt;foo&gt;
  &lt;SNIP&gt;
  0000000000001140 &lt;bar&gt;:
      1140:       b8 ff ff ff ff          mov    $0xffffffff,%eax
      1145:       c3                      retq
  &lt;SNIP&gt;
  0000000000001150 &lt;foo&gt;:
      1150:       31 c0                   xor    %eax,%eax
      1152:       e8 e9 ff ff ff          callq  1140 &lt;bar&gt;
      1157:       83 c0 01                add    $0x1,%eax
      115a:       c3                      retq
  &lt;SNIP&gt;
  $ perf record -o jmp-to-fn.perf.data -e intel_pt/cyc/u ./jmp-to-fn
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0,017 MB jmp-to-fn.perf.data ]
  $ perf script -i jmp-to-fn.perf.data --itrace=be -s ~/libexec/perf-core/scripts/python/export-to-sqlite.py jmp-to-fn.db branches calls
  2019-01-08 13:24:58.783069 Creating database...
  2019-01-08 13:24:58.794650 Writing records...
  2019-01-08 13:24:59.008050 Adding indexes
  2019-01-08 13:24:59.015802 Done
  $  ~/libexec/perf-core/scripts/python/exported-sql-viewer.py jmp-to-fn.db

Before:

    main
        -&gt; bar

After:

    main
        -&gt; foo
            -&gt; bar

Committer testing:

Install the python2-pyside package, then select these menu options
on the GUI:

   "Reports"
      "Context sensitive callgraphs"

Then go on expanding the symbols, to get, full picture when doing this
on a fedora:29 with gcc version 8.2.1 20181215 (Red Hat 8.2.1-6) (GCC):

jmp-to-fn
  PID:TID
    _start                (ld-2.28.so)
      __libc_start_main
        main
          foo
            bar

To verify that indeed, this fixes the problem.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20190109091835.5570-5-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf thread-stack: Tidy thread_stack__no_call_return() by adding more local variables</title>
<updated>2019-02-06T13:00:40+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-01-09T09:18:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=90c2cda7056e3a7555d874a27aae12fd46ca802e'/>
<id>90c2cda7056e3a7555d874a27aae12fd46ca802e</id>
<content type='text'>
Make thread_stack__no_call_return() more readable by adding more local
variables.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20190109091835.5570-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make thread_stack__no_call_return() more readable by adding more local
variables.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20190109091835.5570-4-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf thread-stack: Tidy thread_stack__push_cp() usage</title>
<updated>2019-02-06T13:00:40+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2019-01-09T09:18:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e7a3a055f2b88ebd0bdae8b0aade1e7d80c8a81e'/>
<id>e7a3a055f2b88ebd0bdae8b0aade1e7d80c8a81e</id>
<content type='text'>
If 'cp' is checked in thread_stack__push_cp() a number of error checks
can be removed, reducing code size and improving readability.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20190109091835.5570-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If 'cp' is checked in thread_stack__push_cp() a number of error checks
can be removed, reducing code size and improving readability.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20190109091835.5570-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf thread-stack: Fix thread stack processing for the idle task</title>
<updated>2019-01-02T14:03:17+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2018-12-21T12:06:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=256d92bc93fd40411a02be5cdba74a7bf91e6e09'/>
<id>256d92bc93fd40411a02be5cdba74a7bf91e6e09</id>
<content type='text'>
perf creates a single 'struct thread' to represent the idle task. That
is because threads are identified by PID and TID, and the idle task
always has PID == TID == 0.

However, there are actually separate idle tasks for each CPU. That
creates a problem for thread stack processing which assumes that each
thread has a single stack, not one stack per CPU.

Fix that by passing through the CPU number, and in the case of the idle
"thread", pick the thread stack from an array based on the CPU number.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20181221120620.9659-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
perf creates a single 'struct thread' to represent the idle task. That
is because threads are identified by PID and TID, and the idle task
always has PID == TID == 0.

However, there are actually separate idle tasks for each CPU. That
creates a problem for thread stack processing which assumes that each
thread has a single stack, not one stack per CPU.

Fix that by passing through the CPU number, and in the case of the idle
"thread", pick the thread stack from an array based on the CPU number.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20181221120620.9659-8-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf thread-stack: Allocate an array of thread stacks</title>
<updated>2019-01-02T13:55:55+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2018-12-21T12:06:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=139f42f3b3b495e61bb2cfef40e1dd5e845e3052'/>
<id>139f42f3b3b495e61bb2cfef40e1dd5e845e3052</id>
<content type='text'>
In preparation for fixing thread stack processing for the idle task,
allocate an array of thread stacks.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20181221120620.9659-7-adrian.hunter@intel.com
[ No need to check for NULL when calling zfree(), noticed by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In preparation for fixing thread stack processing for the idle task,
allocate an array of thread stacks.

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Link: http://lkml.kernel.org/r/20181221120620.9659-7-adrian.hunter@intel.com
[ No need to check for NULL when calling zfree(), noticed by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
