<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/tools/perf/util/machine.c, branch v4.20.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>perf thread: Add fallback functions for cases where cpumode is insufficient</title>
<updated>2019-01-09T16:45:56+00:00</updated>
<author>
<name>Adrian Hunter</name>
<email>adrian.hunter@intel.com</email>
</author>
<published>2018-11-06T21:07:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=76514a7511955c25f0adb5f42f403a8e71d1334e'/>
<id>76514a7511955c25f0adb5f42f403a8e71d1334e</id>
<content type='text'>
commit 8e80ad9983caeee09c3a0a1a37e05bff93becce4 upstream.

For branch stacks or branch samples, the sample cpumode might not be
correct because it applies only to the sample 'ip' and not necessary to
'addr' or branch stack addresses. Add fallback functions that can be
used to deal with those cases

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Leo Yan &lt;leo.yan@linaro.org&gt;
Cc: Mathieu Poirier &lt;mathieu.poirier@linaro.org&gt;
Cc: stable@vger.kernel.org # 4.19
Link: http://lkml.kernel.org/r/20181106210712.12098-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8e80ad9983caeee09c3a0a1a37e05bff93becce4 upstream.

For branch stacks or branch samples, the sample cpumode might not be
correct because it applies only to the sample 'ip' and not necessary to
'addr' or branch stack addresses. Add fallback functions that can be
used to deal with those cases

Signed-off-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Leo Yan &lt;leo.yan@linaro.org&gt;
Cc: Mathieu Poirier &lt;mathieu.poirier@linaro.org&gt;
Cc: stable@vger.kernel.org # 4.19
Link: http://lkml.kernel.org/r/20181106210712.12098-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>perf tools: Don't clone maps from parent when synthesizing forks</title>
<updated>2018-10-31T13:18:01+00:00</updated>
<author>
<name>David Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2018-10-31T05:24:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4f8f382e635707ddaddf8269a116e4f8cc8835c0'/>
<id>4f8f382e635707ddaddf8269a116e4f8cc8835c0</id>
<content type='text'>
When synthesizing FORK events, we are trying to create thread objects
for the already running tasks on the machine.

Normally, for a kernel FORK event, we want to clone the parent's maps
because that is what the kernel just did.

But when synthesizing, this should not be done.  If we do, we end up
with overlapping maps as we process the sythesized MMAP2 events that
get delivered shortly thereafter.

Use the FORK event misc flags in an internal way to signal this
situation, so we can elide the map clone when appropriate.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Joe Mario &lt;jmario@redhat.com&gt;
Link: http://lkml.kernel.org/r/20181030.222404.2085088822877051075.davem@davemloft.net
[ Added comment about flag use in machine__process_fork_event(),
  use ternary op in thread__clone_map_groups() as suggested by Jiri ]
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When synthesizing FORK events, we are trying to create thread objects
for the already running tasks on the machine.

Normally, for a kernel FORK event, we want to clone the parent's maps
because that is what the kernel just did.

But when synthesizing, this should not be done.  If we do, we end up
with overlapping maps as we process the sythesized MMAP2 events that
get delivered shortly thereafter.

Use the FORK event misc flags in an internal way to signal this
situation, so we can elide the map clone when appropriate.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Cc: Don Zickus &lt;dzickus@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@redhat.com&gt;
Cc: Joe Mario &lt;jmario@redhat.com&gt;
Link: http://lkml.kernel.org/r/20181030.222404.2085088822877051075.davem@davemloft.net
[ Added comment about flag use in machine__process_fork_event(),
  use ternary op in thread__clone_map_groups() as suggested by Jiri ]
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}</title>
<updated>2018-10-31T12:57:51+00:00</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2018-10-30T15:12:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e9024d519d892b38176cafd46f68a7cdddd77412'/>
<id>e9024d519d892b38176cafd46f68a7cdddd77412</id>
<content type='text'>
When processing using 'perf report -g caller', which is the default, we
ended up reverting the callchain entries received from the kernel, but
simply reverting throws away the information that tells that from a
point onwards the addresses are for userspace, kernel, guest kernel,
guest user, hypervisor.

The idea is that if we are walking backwards, for each cluster of
non-cpumode entries we have to first scan backwards for the next one and
use that for the cluster.

This seems silly and more expensive than it needs to be but it is enough
for a initial fix.

The code here is really complicated because it is intimately intertwined
with the lbr and branch handling, as well as this callchain order,
further fixes will be needed to properly take into account the cpumode
in those cases.

Another problem with ORDER_CALLER is that the NULL "0" IP that is at the
end of most callchains shows up at the top of the histogram because
every callchain contains it and with ORDER_CALLER it is the first entry.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Souvik Banerjee &lt;souvik1997@gmail.com&gt;
Cc: Wang Nan &lt;wangnan0@huawei.com&gt;
Cc: stable@vger.kernel.org # 4.19
Link: https://lkml.kernel.org/n/tip-2wt3ayp6j2y2f2xowixa8y6y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When processing using 'perf report -g caller', which is the default, we
ended up reverting the callchain entries received from the kernel, but
simply reverting throws away the information that tells that from a
point onwards the addresses are for userspace, kernel, guest kernel,
guest user, hypervisor.

The idea is that if we are walking backwards, for each cluster of
non-cpumode entries we have to first scan backwards for the next one and
use that for the cluster.

This seems silly and more expensive than it needs to be but it is enough
for a initial fix.

The code here is really complicated because it is intimately intertwined
with the lbr and branch handling, as well as this callchain order,
further fixes will be needed to properly take into account the cpumode
in those cases.

Another problem with ORDER_CALLER is that the NULL "0" IP that is at the
end of most callchains shows up at the top of the histogram because
every callchain contains it and with ORDER_CALLER it is the first entry.

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Souvik Banerjee &lt;souvik1997@gmail.com&gt;
Cc: Wang Nan &lt;wangnan0@huawei.com&gt;
Cc: stable@vger.kernel.org # 4.19
Link: https://lkml.kernel.org/n/tip-2wt3ayp6j2y2f2xowixa8y6y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf record: Use unmapped IP for inline callchain cursors</title>
<updated>2018-10-05T14:18:09+00:00</updated>
<author>
<name>Milian Wolff</name>
<email>milian.wolff@kdab.com</email>
</author>
<published>2018-09-26T13:52:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7a8a8fcf7b860e4b2d4edc787c844d41cad9dfcf'/>
<id>7a8a8fcf7b860e4b2d4edc787c844d41cad9dfcf</id>
<content type='text'>
Only use the mapped IP to find inline frames, but keep using the
unmapped IP for the callchain cursor. This ensures we properly show the
unmapped IP when displaying a frame we received via the
dso__parse_addr_inlines API for a module which does not contain
sufficient debug symbols to show the srcline.

This is another follow-up to commit 19610184693c ("perf script: Show
virtual addresses instead of offsets").

Signed-off-by: Milian Wolff &lt;milian.wolff@kdab.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Tested-by: Ravi Bangoria &lt;ravi.bangoria@linux.ibm.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Sandipan Das &lt;sandipan@linux.ibm.com&gt;
Fixes: 19610184693c ("perf script: Show virtual addresses instead of offsets")
Link: http://lkml.kernel.org/r/20180926135207.30263-2-milian.wolff@kdab.com
Link: http://lkml.kernel.org/r/20181002073949.3297-1-milian.wolff@kdab.com
[ Squashed a fix from Milian for a problem reported by Ravi, fixed up space damage ]
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Only use the mapped IP to find inline frames, but keep using the
unmapped IP for the callchain cursor. This ensures we properly show the
unmapped IP when displaying a frame we received via the
dso__parse_addr_inlines API for a module which does not contain
sufficient debug symbols to show the srcline.

This is another follow-up to commit 19610184693c ("perf script: Show
virtual addresses instead of offsets").

Signed-off-by: Milian Wolff &lt;milian.wolff@kdab.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Tested-by: Ravi Bangoria &lt;ravi.bangoria@linux.ibm.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Sandipan Das &lt;sandipan@linux.ibm.com&gt;
Fixes: 19610184693c ("perf script: Show virtual addresses instead of offsets")
Link: http://lkml.kernel.org/r/20180926135207.30263-2-milian.wolff@kdab.com
Link: http://lkml.kernel.org/r/20181002073949.3297-1-milian.wolff@kdab.com
[ Squashed a fix from Milian for a problem reported by Ravi, fixed up space damage ]
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf report: Don't try to map ip to invalid map</title>
<updated>2018-09-27T19:05:43+00:00</updated>
<author>
<name>Milian Wolff</name>
<email>milian.wolff@kdab.com</email>
</author>
<published>2018-09-26T13:52:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ff4ce2885af8f9e8e99864d78dbeb4673f089c76'/>
<id>ff4ce2885af8f9e8e99864d78dbeb4673f089c76</id>
<content type='text'>
Fixes a crash when the report encounters an address that could not be
associated with an mmaped region:

  #0  0x00005555557bdc4a in callchain_srcline (ip=&lt;error reading variable: Cannot access memory at address 0x38&gt;, sym=0x0, map=0x0) at util/machine.c:2329
  #1  unwind_entry (entry=entry@entry=0x7fffffff9180, arg=arg@entry=0x7ffff5642498) at util/machine.c:2329
  #2  0x00005555558370af in entry (arg=0x7ffff5642498, cb=0x5555557bdb50 &lt;unwind_entry&gt;, thread=&lt;optimized out&gt;, ip=18446744073709551615) at util/unwind-libunwind-local.c:586
  #3  get_entries (ui=ui@entry=0x7fffffff9620, cb=0x5555557bdb50 &lt;unwind_entry&gt;, arg=0x7ffff5642498, max_stack=&lt;optimized out&gt;) at util/unwind-libunwind-local.c:703
  #4  0x0000555555837192 in _unwind__get_entries (cb=&lt;optimized out&gt;, arg=&lt;optimized out&gt;, thread=&lt;optimized out&gt;, data=&lt;optimized out&gt;, max_stack=&lt;optimized out&gt;) at util/unwind-libunwind-local.c:725
  #5  0x00005555557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fffffff9830, evsel=0x555555c7b3b0, cursor=0x7ffff5642498, thread=0x555555c7f6f0) at util/machine.c:2351
  #6  thread__resolve_callchain (thread=0x555555c7f6f0, cursor=0x7ffff5642498, evsel=0x555555c7b3b0, sample=0x7fffffff9830, parent=0x7fffffff97b8, root_al=0x7fffffff9750, max_stack=127) at util/machine.c:2378
  #7  0x00005555557ba4ee in sample__resolve_callchain (sample=&lt;optimized out&gt;, cursor=&lt;optimized out&gt;, parent=parent@entry=0x7fffffff97b8, evsel=&lt;optimized out&gt;, al=al@entry=0x7fffffff9750,
      max_stack=&lt;optimized out&gt;) at util/callchain.c:1085

Signed-off-by: Milian Wolff &lt;milian.wolff@kdab.com&gt;
Tested-by: Sandipan Das &lt;sandipan@linux.ibm.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Fixes: 2a9d5050dc84 ("perf script: Show correct offsets for DWARF-based unwinding")
Link: http://lkml.kernel.org/r/20180926135207.30263-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fixes a crash when the report encounters an address that could not be
associated with an mmaped region:

  #0  0x00005555557bdc4a in callchain_srcline (ip=&lt;error reading variable: Cannot access memory at address 0x38&gt;, sym=0x0, map=0x0) at util/machine.c:2329
  #1  unwind_entry (entry=entry@entry=0x7fffffff9180, arg=arg@entry=0x7ffff5642498) at util/machine.c:2329
  #2  0x00005555558370af in entry (arg=0x7ffff5642498, cb=0x5555557bdb50 &lt;unwind_entry&gt;, thread=&lt;optimized out&gt;, ip=18446744073709551615) at util/unwind-libunwind-local.c:586
  #3  get_entries (ui=ui@entry=0x7fffffff9620, cb=0x5555557bdb50 &lt;unwind_entry&gt;, arg=0x7ffff5642498, max_stack=&lt;optimized out&gt;) at util/unwind-libunwind-local.c:703
  #4  0x0000555555837192 in _unwind__get_entries (cb=&lt;optimized out&gt;, arg=&lt;optimized out&gt;, thread=&lt;optimized out&gt;, data=&lt;optimized out&gt;, max_stack=&lt;optimized out&gt;) at util/unwind-libunwind-local.c:725
  #5  0x00005555557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fffffff9830, evsel=0x555555c7b3b0, cursor=0x7ffff5642498, thread=0x555555c7f6f0) at util/machine.c:2351
  #6  thread__resolve_callchain (thread=0x555555c7f6f0, cursor=0x7ffff5642498, evsel=0x555555c7b3b0, sample=0x7fffffff9830, parent=0x7fffffff97b8, root_al=0x7fffffff9750, max_stack=127) at util/machine.c:2378
  #7  0x00005555557ba4ee in sample__resolve_callchain (sample=&lt;optimized out&gt;, cursor=&lt;optimized out&gt;, parent=parent@entry=0x7fffffff97b8, evsel=&lt;optimized out&gt;, al=al@entry=0x7fffffff9750,
      max_stack=&lt;optimized out&gt;) at util/callchain.c:1085

Signed-off-by: Milian Wolff &lt;milian.wolff@kdab.com&gt;
Tested-by: Sandipan Das &lt;sandipan@linux.ibm.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Jin Yao &lt;yao.jin@linux.intel.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Fixes: 2a9d5050dc84 ("perf script: Show correct offsets for DWARF-based unwinding")
Link: http://lkml.kernel.org/r/20180926135207.30263-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf tools: Store compression id into struct dso</title>
<updated>2018-08-20T11:54:59+00:00</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@kernel.org</email>
</author>
<published>2018-08-17T09:48:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2af5247530e073f4146d74ecd96cf64c953c001c'/>
<id>2af5247530e073f4146d74ecd96cf64c953c001c</id>
<content type='text'>
Add comp to 'struct dso' to hold the compression index.  It will be used
in the following patches.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Michael Petlan &lt;mpetlan@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20180817094813.15086-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add comp to 'struct dso' to hold the compression index.  It will be used
in the following patches.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Michael Petlan &lt;mpetlan@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: http://lkml.kernel.org/r/20180817094813.15086-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf machine: Use last_match threads cache only in single thread mode</title>
<updated>2018-07-24T17:53:52+00:00</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@kernel.org</email>
</author>
<published>2018-07-19T14:33:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b57334b9453949bf81281321d14d86d60aee6fde'/>
<id>b57334b9453949bf81281321d14d86d60aee6fde</id>
<content type='text'>
There's an issue with using threads::last_match in multithread mode
which is enabled during the perf top synthesize. It might crash with
following assertion:

  perf: ...include/linux/refcount.h:109: refcount_inc:
        Assertion `!(!refcount_inc_not_zero(r))' failed.

The gdb backtrace looks like this:

  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
  #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
  #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
  #4  0x0000000000535ff9 in refcount_inc (r=0x7fffe8009a70)
      at ...include/linux/refcount.h:109
  #5  0x0000000000536771 in thread__get (thread=0x7fffe8009a40)
      at util/thread.c:115
  #6  0x0000000000523cd0 in ____machine__findnew_thread (machine=0xbfde38,
      threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:432
  #7  0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
      pid=2, tid=2) at util/machine.c:489
  #8  0x0000000000523f24 in machine__findnew_thread (machine=0xbfde38,
      pid=2, tid=2) at util/machine.c:499
  #9  0x0000000000526fbe in machine__process_fork_event (machine=0xbfde38,
  ...

The failing assertion is this one:

  REFCOUNT_WARN(!refcount_inc_not_zero(r), ...

the problem is that we don't serialize access to threads::last_match.
We serialize the access to the threads tree, but we don't care how's
threads::last_match being accessed. Both locked/unlocked paths use
that data and can set it. In multithreaded mode we can end up with
invalid object in thread__get call, like in following paths race:

  thread 1
    ...
    machine__findnew_thread
      down_write(&amp;threads-&gt;lock);
      __machine__findnew_thread
        ____machine__findnew_thread
          th = threads-&gt;last_match;
          if (th-&gt;tid == tid) {
            thread__get

  thread 2
    ...
    machine__find_thread
      down_read(&amp;threads-&gt;lock);
      __machine__findnew_thread
        ____machine__findnew_thread
          th = threads-&gt;last_match;
          if (th-&gt;tid == tid) {
            thread__get

  thread 3
    ...
    machine__process_fork_event
      machine__remove_thread
        __machine__remove_thread
          threads-&gt;last_match = NULL
          thread__put
      thread__put

Thread 1 and 2 might got stale last_match, before thread 3 clears
it. Thread 1 and 2 then race with thread 3's thread__put and they
might trigger the refcnt == 0 assertion above.

The patch is disabling the last_match cache for multiple thread
mode. It was originally meant for single thread scenarios, where
it's common to have multiple sequential searches of the same
thread.

In multithread mode this does not make sense, because top's threads
processes different /proc entries and so the 'struct threads' object
is queried for various threads. Moreover we'd need to add more locks
to make it work.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Lukasz Odzioba &lt;lukasz.odzioba@intel.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wang Nan &lt;wangnan0@huawei.com&gt;
Link: http://lkml.kernel.org/r/20180719143345.12963-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There's an issue with using threads::last_match in multithread mode
which is enabled during the perf top synthesize. It might crash with
following assertion:

  perf: ...include/linux/refcount.h:109: refcount_inc:
        Assertion `!(!refcount_inc_not_zero(r))' failed.

The gdb backtrace looks like this:

  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  (gdb)
  #0  0x00007ffff50839fb in raise () from /lib64/libc.so.6
  #1  0x00007ffff5085800 in abort () from /lib64/libc.so.6
  #2  0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
  #3  0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
  #4  0x0000000000535ff9 in refcount_inc (r=0x7fffe8009a70)
      at ...include/linux/refcount.h:109
  #5  0x0000000000536771 in thread__get (thread=0x7fffe8009a40)
      at util/thread.c:115
  #6  0x0000000000523cd0 in ____machine__findnew_thread (machine=0xbfde38,
      threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:432
  #7  0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
      pid=2, tid=2) at util/machine.c:489
  #8  0x0000000000523f24 in machine__findnew_thread (machine=0xbfde38,
      pid=2, tid=2) at util/machine.c:499
  #9  0x0000000000526fbe in machine__process_fork_event (machine=0xbfde38,
  ...

The failing assertion is this one:

  REFCOUNT_WARN(!refcount_inc_not_zero(r), ...

the problem is that we don't serialize access to threads::last_match.
We serialize the access to the threads tree, but we don't care how's
threads::last_match being accessed. Both locked/unlocked paths use
that data and can set it. In multithreaded mode we can end up with
invalid object in thread__get call, like in following paths race:

  thread 1
    ...
    machine__findnew_thread
      down_write(&amp;threads-&gt;lock);
      __machine__findnew_thread
        ____machine__findnew_thread
          th = threads-&gt;last_match;
          if (th-&gt;tid == tid) {
            thread__get

  thread 2
    ...
    machine__find_thread
      down_read(&amp;threads-&gt;lock);
      __machine__findnew_thread
        ____machine__findnew_thread
          th = threads-&gt;last_match;
          if (th-&gt;tid == tid) {
            thread__get

  thread 3
    ...
    machine__process_fork_event
      machine__remove_thread
        __machine__remove_thread
          threads-&gt;last_match = NULL
          thread__put
      thread__put

Thread 1 and 2 might got stale last_match, before thread 3 clears
it. Thread 1 and 2 then race with thread 3's thread__put and they
might trigger the refcnt == 0 assertion above.

The patch is disabling the last_match cache for multiple thread
mode. It was originally meant for single thread scenarios, where
it's common to have multiple sequential searches of the same
thread.

In multithread mode this does not make sense, because top's threads
processes different /proc entries and so the 'struct threads' object
is queried for various threads. Moreover we'd need to add more locks
to make it work.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Lukasz Odzioba &lt;lukasz.odzioba@intel.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wang Nan &lt;wangnan0@huawei.com&gt;
Link: http://lkml.kernel.org/r/20180719143345.12963-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf machine: Add threads__set_last_match function</title>
<updated>2018-07-24T17:53:42+00:00</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@kernel.org</email>
</author>
<published>2018-07-19T14:33:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=67fda0f32cd9428cb9a3166796162097d7fcbcbf'/>
<id>67fda0f32cd9428cb9a3166796162097d7fcbcbf</id>
<content type='text'>
Separating threads::last_match cache set into separate
threads__set_last_match function.  This will be useful in following
patch.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Lukasz Odzioba &lt;lukasz.odzioba@intel.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wang Nan &lt;wangnan0@huawei.com&gt;
Link: http://lkml.kernel.org/r/20180719143345.12963-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Separating threads::last_match cache set into separate
threads__set_last_match function.  This will be useful in following
patch.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Lukasz Odzioba &lt;lukasz.odzioba@intel.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wang Nan &lt;wangnan0@huawei.com&gt;
Link: http://lkml.kernel.org/r/20180719143345.12963-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf machine: Add threads__get_last_match function</title>
<updated>2018-07-24T17:53:31+00:00</updated>
<author>
<name>Jiri Olsa</name>
<email>jolsa@kernel.org</email>
</author>
<published>2018-07-19T14:33:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f8b2ebb532e0553b60ae5ad1b84d1d4f0c285752'/>
<id>f8b2ebb532e0553b60ae5ad1b84d1d4f0c285752</id>
<content type='text'>
Separating threads::last_match cache read/check into separate
threads__get_last_match function. This will be useful in following
patch.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Lukasz Odzioba &lt;lukasz.odzioba@intel.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wang Nan &lt;wangnan0@huawei.com&gt;
Link: http://lkml.kernel.org/r/20180719143345.12963-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Separating threads::last_match cache read/check into separate
threads__get_last_match function. This will be useful in following
patch.

Signed-off-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: David Ahern &lt;dsahern@gmail.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Lukasz Odzioba &lt;lukasz.odzioba@intel.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Wang Nan &lt;wangnan0@huawei.com&gt;
Link: http://lkml.kernel.org/r/20180719143345.12963-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf script: Show correct offsets for DWARF-based unwinding</title>
<updated>2018-07-24T17:53:11+00:00</updated>
<author>
<name>Sandipan Das</name>
<email>sandipan@linux.ibm.com</email>
</author>
<published>2018-07-03T12:05:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2a9d5050dc84fa2060f08a52f632976923e0fa7e'/>
<id>2a9d5050dc84fa2060f08a52f632976923e0fa7e</id>
<content type='text'>
When perf/data is recorded with the dwarf call-graph option, the
callchain shown by 'perf script' still shows the binary offsets of the
userspace symbols instead of their virtual addresses. Since the symbol
offset calculation is based on using virtual address as the ip, we see
incorrect offsets as well.

The use of virtual addresses affects the ability to find out the
line number in the corresponding source file to which an address
maps to as described in commit 67540759151a ("perf unwind: Use
addr_location::addr instead of ip for entries").

This has also been addressed by temporarily converting the virtual
address to the correponding binary offset so that it can be mapped
to the source line number correctly.

This is a follow-up for commit 19610184693c ("perf script: Show
virtual addresses instead of offsets").

This can be verified on a powerpc64le system running Fedora 27 as
shown below:

  # perf probe -x /usr/lib64/libc-2.26.so -a inet_pton
  # perf record -e probe_libc:inet_pton --call-graph=dwarf ping -6 -c 1 ::1

Before:

  # perf report --stdio --no-children -s sym,srcline -g address

  # Samples: 1  of event 'probe_libc:inet_pton'
  # Event count (approx.): 1
  #
  # Overhead  Symbol                Source:Line
  # ........  ....................  ...........
  #
     100.00%  [.] __GI___inet_pton  inet_pton.c
              |
              ---gaih_inet getaddrinfo.c:537 (inlined)
                 __GI_getaddrinfo getaddrinfo.c:2304 (inlined)
                 main ping.c:519
                 generic_start_main libc-start.c:308 (inlined)
                 __libc_start_main libc-start.c:102
  ...

  # perf script -F comm,ip,sym,symoff,srcline,dso

  ping
                    15af28 __GI___inet_pton+0xffff000099160008 (/usr/lib64/libc-2.26.so)
    libc-2.26.so[ffff80004ca0af28]
                    10fa53 gaih_inet+0xffff000099160f43
    libc-2.26.so[ffff80004c9bfa53] (inlined)
                    1105b3 __GI_getaddrinfo+0xffff000099160163
    libc-2.26.so[ffff80004c9c05b3] (inlined)
                      2d6f main+0xfffffffd9f1003df (/usr/bin/ping)
    ping[fffffffecf882d6f]
                     2369f generic_start_main+0xffff00009916013f
    libc-2.26.so[ffff80004c8d369f] (inlined)
                     23897 __libc_start_main+0xffff0000991600b7 (/usr/lib64/libc-2.26.so)
    libc-2.26.so[ffff80004c8d3897]

After:

  # perf report --stdio --no-children -s sym,srcline -g address

  # Samples: 1  of event 'probe_libc:inet_pton'
  # Event count (approx.): 1
  #
  # Overhead  Symbol                Source:Line
  # ........  ....................  ...........
  #
     100.00%  [.] __GI___inet_pton  inet_pton.c
              |
              ---gaih_inet.constprop.7 getaddrinfo.c:537
                 getaddrinfo getaddrinfo.c:2304
                 main ping.c:519
                 generic_start_main.isra.0 libc-start.c:308
                 __libc_start_main libc-start.c:102
  ...

  # perf script -F comm,ip,sym,symoff,srcline,dso

  ping
              7fffb38aaf28 __GI___inet_pton+0x8 (/usr/lib64/libc-2.26.so)
    inet_pton.c:68
              7fffb385fa53 gaih_inet.constprop.7+0xf43 (/usr/lib64/libc-2.26.so)
    getaddrinfo.c:537
              7fffb38605b3 getaddrinfo+0x163 (/usr/lib64/libc-2.26.so)
    getaddrinfo.c:2304
                 130782d6f main+0x3df (/usr/bin/ping)
    ping.c:519
              7fffb377369f generic_start_main.isra.0+0x13f (/usr/lib64/libc-2.26.so)
    libc-start.c:308
              7fffb3773897 __libc_start_main+0xb7 (/usr/lib64/libc-2.26.so)
    libc-start.c:102

Signed-off-by: Sandipan Das &lt;sandipan@linux.ibm.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Milian Wolff &lt;milian.wolff@kdab.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@linux.ibm.com&gt;
Fixes: 67540759151a ("perf unwind: Use addr_location::addr instead of ip for entries")
Link: http://lkml.kernel.org/r/20180703120555.32971-1-sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When perf/data is recorded with the dwarf call-graph option, the
callchain shown by 'perf script' still shows the binary offsets of the
userspace symbols instead of their virtual addresses. Since the symbol
offset calculation is based on using virtual address as the ip, we see
incorrect offsets as well.

The use of virtual addresses affects the ability to find out the
line number in the corresponding source file to which an address
maps to as described in commit 67540759151a ("perf unwind: Use
addr_location::addr instead of ip for entries").

This has also been addressed by temporarily converting the virtual
address to the correponding binary offset so that it can be mapped
to the source line number correctly.

This is a follow-up for commit 19610184693c ("perf script: Show
virtual addresses instead of offsets").

This can be verified on a powerpc64le system running Fedora 27 as
shown below:

  # perf probe -x /usr/lib64/libc-2.26.so -a inet_pton
  # perf record -e probe_libc:inet_pton --call-graph=dwarf ping -6 -c 1 ::1

Before:

  # perf report --stdio --no-children -s sym,srcline -g address

  # Samples: 1  of event 'probe_libc:inet_pton'
  # Event count (approx.): 1
  #
  # Overhead  Symbol                Source:Line
  # ........  ....................  ...........
  #
     100.00%  [.] __GI___inet_pton  inet_pton.c
              |
              ---gaih_inet getaddrinfo.c:537 (inlined)
                 __GI_getaddrinfo getaddrinfo.c:2304 (inlined)
                 main ping.c:519
                 generic_start_main libc-start.c:308 (inlined)
                 __libc_start_main libc-start.c:102
  ...

  # perf script -F comm,ip,sym,symoff,srcline,dso

  ping
                    15af28 __GI___inet_pton+0xffff000099160008 (/usr/lib64/libc-2.26.so)
    libc-2.26.so[ffff80004ca0af28]
                    10fa53 gaih_inet+0xffff000099160f43
    libc-2.26.so[ffff80004c9bfa53] (inlined)
                    1105b3 __GI_getaddrinfo+0xffff000099160163
    libc-2.26.so[ffff80004c9c05b3] (inlined)
                      2d6f main+0xfffffffd9f1003df (/usr/bin/ping)
    ping[fffffffecf882d6f]
                     2369f generic_start_main+0xffff00009916013f
    libc-2.26.so[ffff80004c8d369f] (inlined)
                     23897 __libc_start_main+0xffff0000991600b7 (/usr/lib64/libc-2.26.so)
    libc-2.26.so[ffff80004c8d3897]

After:

  # perf report --stdio --no-children -s sym,srcline -g address

  # Samples: 1  of event 'probe_libc:inet_pton'
  # Event count (approx.): 1
  #
  # Overhead  Symbol                Source:Line
  # ........  ....................  ...........
  #
     100.00%  [.] __GI___inet_pton  inet_pton.c
              |
              ---gaih_inet.constprop.7 getaddrinfo.c:537
                 getaddrinfo getaddrinfo.c:2304
                 main ping.c:519
                 generic_start_main.isra.0 libc-start.c:308
                 __libc_start_main libc-start.c:102
  ...

  # perf script -F comm,ip,sym,symoff,srcline,dso

  ping
              7fffb38aaf28 __GI___inet_pton+0x8 (/usr/lib64/libc-2.26.so)
    inet_pton.c:68
              7fffb385fa53 gaih_inet.constprop.7+0xf43 (/usr/lib64/libc-2.26.so)
    getaddrinfo.c:537
              7fffb38605b3 getaddrinfo+0x163 (/usr/lib64/libc-2.26.so)
    getaddrinfo.c:2304
                 130782d6f main+0x3df (/usr/bin/ping)
    ping.c:519
              7fffb377369f generic_start_main.isra.0+0x13f (/usr/lib64/libc-2.26.so)
    libc-start.c:308
              7fffb3773897 __libc_start_main+0xb7 (/usr/lib64/libc-2.26.so)
    libc-start.c:102

Signed-off-by: Sandipan Das &lt;sandipan@linux.ibm.com&gt;
Acked-by: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Milian Wolff &lt;milian.wolff@kdab.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Naveen N. Rao &lt;naveen.n.rao@linux.vnet.ibm.com&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@linux.ibm.com&gt;
Fixes: 67540759151a ("perf unwind: Use addr_location::addr instead of ip for entries")
Link: http://lkml.kernel.org/r/20180703120555.32971-1-sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
