<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/linux/perf_event.h, branch v3.2.54</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>perf: Fix perf ring buffer memory ordering</title>
<updated>2013-11-28T14:01:59+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-10-28T12:55:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d08b0a5594dde8b0fbda5d38cb01a81954a9829e'/>
<id>d08b0a5594dde8b0fbda5d38cb01a81954a9829e</id>
<content type='text'>
commit bf378d341e4873ed928dc3c636252e6895a21f50 upstream.

The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky &lt;victork@il.ibm.com&gt;
Tested-by: Victor Kaplansky &lt;victork@il.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@polymtl.ca&gt;
Cc: michael@ellerman.id.au
Cc: Paul McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bf378d341e4873ed928dc3c636252e6895a21f50 upstream.

The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky &lt;victork@il.ibm.com&gt;
Tested-by: Victor Kaplansky &lt;victork@il.ibm.com&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@polymtl.ca&gt;
Cc: michael@ellerman.id.au
Cc: Paul McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Michael Neuling &lt;mikey@neuling.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[bwh: Backported to 3.2: adjust filename]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf: Clarify perf_cpu_context::active_pmu usage by renaming it to ::unique_pmu</title>
<updated>2013-10-26T20:06:11+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2012-10-02T13:38:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6272221247a80adf0c36bf4f6ac8a4d7dcfb7e5b'/>
<id>6272221247a80adf0c36bf4f6ac8a4d7dcfb7e5b</id>
<content type='text'>
commit 3f1f33206c16c7b3839d71372bc2ac3f305aa802 upstream.

Stephane thought the perf_cpu_context::active_pmu name confusing and
suggested using 'unique_pmu' instead.

This pointer is a pointer to a 'random' pmu sharing the cpuctx
instance, therefore limiting a for_each_pmu loop to those where
cpuctx-&gt;unique_pmu matches the pmu we get a loop over unique cpuctx
instances.

Suggested-by: Stephane Eranian &lt;eranian@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/n/tip-kxyjqpfj2fn9gt7kwu5ag9ks@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 3f1f33206c16c7b3839d71372bc2ac3f305aa802 upstream.

Stephane thought the perf_cpu_context::active_pmu name confusing and
suggested using 'unique_pmu' instead.

This pointer is a pointer to a 'random' pmu sharing the cpuctx
instance, therefore limiting a for_each_pmu loop to those where
cpuctx-&gt;unique_pmu matches the pmu we get a loop over unique cpuctx
instances.

Suggested-by: Stephane Eranian &lt;eranian@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/n/tip-kxyjqpfj2fn9gt7kwu5ag9ks@git.kernel.org
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf: Fix perf mmap bugs</title>
<updated>2013-07-27T04:34:32+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2013-05-28T08:55:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7edcd18138461f7ded2ee55c0662f49b29a3fc8d'/>
<id>7edcd18138461f7ded2ee55c0662f49b29a3fc8d</id>
<content type='text'>
commit 26cb63ad11e04047a64309362674bcbbd6a6f246 upstream.

Vince reported a problem found by his perf specific trinity
fuzzer.

Al noticed 2 problems with perf's mmap():

 - it has issues against fork() since we use vma-&gt;vm_mm for accounting.
 - it has an rb refcount leak on double mmap().

We fix the issues against fork() by using VM_DONTCOPY; I don't
think there's code out there that uses this; we didn't hear
about weird accounting problems/crashes. If we do need this to
work, the previously proposed VM_PINNED could make this work.

Aside from the rb reference leak spotted by Al, Vince's example
prog was indeed doing a double mmap() through the use of
perf_event_set_output().

This exposes another problem, since we now have 2 events with
one buffer, the accounting gets screwy because we account per
event. Fix this by making the buffer responsible for its own
accounting.

[Backporting for 3.4-stable.
VM_RESERVED flag was replaced with pair 'VM_DONTEXPAND | VM_DONTDUMP' in
314e51b9 since 3.7.0-rc1, and 314e51b9 comes from a big patchset, we didn't
backport the patchset, so I restored 'VM_DNOTEXPAND | VM_DONTDUMP' as before:
-       vma-&gt;vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
+       vma-&gt;vm_flags |= VM_DONTCOPY | VM_RESERVED;
 -- zliu]

Reported-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Zhouping Liu &lt;zliu@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 26cb63ad11e04047a64309362674bcbbd6a6f246 upstream.

Vince reported a problem found by his perf specific trinity
fuzzer.

Al noticed 2 problems with perf's mmap():

 - it has issues against fork() since we use vma-&gt;vm_mm for accounting.
 - it has an rb refcount leak on double mmap().

We fix the issues against fork() by using VM_DONTCOPY; I don't
think there's code out there that uses this; we didn't hear
about weird accounting problems/crashes. If we do need this to
work, the previously proposed VM_PINNED could make this work.

Aside from the rb reference leak spotted by Al, Vince's example
prog was indeed doing a double mmap() through the use of
perf_event_set_output().

This exposes another problem, since we now have 2 events with
one buffer, the accounting gets screwy because we account per
event. Fix this by making the buffer responsible for its own
accounting.

[Backporting for 3.4-stable.
VM_RESERVED flag was replaced with pair 'VM_DONTEXPAND | VM_DONTDUMP' in
314e51b9 since 3.7.0-rc1, and 314e51b9 comes from a big patchset, we didn't
backport the patchset, so I restored 'VM_DNOTEXPAND | VM_DONTDUMP' as before:
-       vma-&gt;vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
+       vma-&gt;vm_flags |= VM_DONTCOPY | VM_RESERVED;
 -- zliu]

Reported-by: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Signed-off-by: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Paul Mackerras &lt;paulus@samba.org&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@ghostprotocols.net&gt;
Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Zhouping Liu &lt;zliu@redhat.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf,x86: fix link failure for non-Intel configs</title>
<updated>2013-03-27T02:40:55+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2013-03-17T22:49:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9deb93cfc455f4b825edf096a06d82c78b67ea3b'/>
<id>9deb93cfc455f4b825edf096a06d82c78b67ea3b</id>
<content type='text'>
commit 6c4d3bc99b3341067775efd4d9d13cc8e655fd7c upstream.

Commit 1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") introduces a link failure since
perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL:

	arch/x86/power/built-in.o: In function `restore_processor_state':
	(.text+0x45c): undefined reference to `perf_restore_debug_store'

Fix it by defining the dummy function appropriately.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6c4d3bc99b3341067775efd4d9d13cc8e655fd7c upstream.

Commit 1d9d8639c063 ("perf,x86: fix kernel crash with PEBS/BTS after
suspend/resume") introduces a link failure since
perf_restore_debug_store() is only defined for CONFIG_CPU_SUP_INTEL:

	arch/x86/power/built-in.o: In function `restore_processor_state':
	(.text+0x45c): undefined reference to `perf_restore_debug_store'

Fix it by defining the dummy function appropriately.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf,x86: fix kernel crash with PEBS/BTS after suspend/resume</title>
<updated>2013-03-27T02:40:54+00:00</updated>
<author>
<name>Stephane Eranian</name>
<email>eranian@google.com</email>
</author>
<published>2013-03-15T13:26:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8da1a618f37700141f1cf6e3a9c16dc1b1e261c9'/>
<id>8da1a618f37700141f1cf6e3a9c16dc1b1e261c9</id>
<content type='text'>
commit 1d9d8639c063caf6efc2447f5f26aa637f844ff6 upstream.

This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. Turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps it power-on/resume value of 0 causing any PEBS
measurement to crash when running on CPU0.

The workaround is to add a hook in the actual resume code to restore
the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
the DS_AREA will be restored twice but this is harmless.

Reported-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Stephane Eranian &lt;eranian@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 1d9d8639c063caf6efc2447f5f26aa637f844ff6 upstream.

This patch fixes a kernel crash when using precise sampling (PEBS)
after a suspend/resume. Turns out the CPU notifier code is not invoked
on CPU0 (BP). Therefore, the DS_AREA (used by PEBS) is not restored properly
by the kernel and keeps it power-on/resume value of 0 causing any PEBS
measurement to crash when running on CPU0.

The workaround is to add a hook in the actual resume code to restore
the DS Area MSR value. It is invoked for all CPUS. So for all but CPU0,
the DS_AREA will be restored twice but this is harmless.

Reported-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Stephane Eranian &lt;eranian@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf_event: Switch to internal refcount, fix race with close()</title>
<updated>2012-09-19T14:05:06+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@ZenIV.linux.org.uk</email>
</author>
<published>2012-08-20T13:59:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f8112ffff6bacaad4fcdf144fee23d1b1be74f9a'/>
<id>f8112ffff6bacaad4fcdf144fee23d1b1be74f9a</id>
<content type='text'>
commit a6fa941d94b411bbd2b6421ffbde6db3c93e65ab upstream.

Don't mess with file refcounts (or keep a reference to file, for
that matter) in perf_event.  Use explicit refcount of its own
instead.  Deal with the race between the final reference to event
going away and new children getting created for it by use of
atomic_long_inc_not_zero() in inherit_event(); just have the
latter free what it had allocated and return NULL, that works
out just fine (children of siblings of something doomed are
created as singletons, same as if the child of leader had been
created and immediately killed).

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/20120820135925.GG23464@ZenIV.linux.org.uk
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit a6fa941d94b411bbd2b6421ffbde6db3c93e65ab upstream.

Don't mess with file refcounts (or keep a reference to file, for
that matter) in perf_event.  Use explicit refcount of its own
instead.  Deal with the race between the final reference to event
going away and new children getting created for it by use of
atomic_long_inc_not_zero() in inherit_event(); just have the
latter free what it had allocated and return NULL, that works
out just fine (children of siblings of something doomed are
created as singletons, same as if the child of leader had been
created and immediately killed).

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/20120820135925.GG23464@ZenIV.linux.org.uk
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf: Fix loss of notification with multi-event</title>
<updated>2011-12-05T08:33:03+00:00</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-11-26T01:47:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=10c6db110d0eb4466b59812c49088ab56218fc2e'/>
<id>10c6db110d0eb4466b59812c49088ab56218fc2e</id>
<content type='text'>
When you do:
        $ perf record -e cycles,cycles,cycles noploop 10

You expect about 10,000 samples for each event, i.e., 10s at
1000samples/sec. However, this is not what's happening. You
get much fewer samples, maybe 3700 samples/event:

$ perf report -D | tail -15
Aggregated stats:
           TOTAL events:      10998
            MMAP events:         66
            COMM events:          2
          SAMPLE events:      10930
cycles stats:
           TOTAL events:       3644
          SAMPLE events:       3644
cycles stats:
           TOTAL events:       3642
          SAMPLE events:       3642
cycles stats:
           TOTAL events:       3644
          SAMPLE events:       3644

On a Intel Nehalem or even AMD64, there are 4 counters capable
of measuring cycles, so there is plenty of space to measure those
events without multiplexing (even with the NMI watchdog active).
And even with multiplexing, we'd expect roughly the same number
of samples per event.

The root of the problem was that when the event that caused the buffer
to become full was not the first event passed on the cmdline, the user
notification would get lost. The notification was sent to the file
descriptor of the overflowed event but the perf tool was not polling
on it.  The perf tool aggregates all samples into a single buffer,
i.e., the buffer of the first event. Consequently, it assumes
notifications for any event will come via that descriptor.

The seemingly straight forward solution of moving the waitq into the
ringbuffer object doesn't work because of life-time issues. One could
perf_event_set_output() on a fd that you're also blocking on and cause
the old rb object to be freed while its waitq would still be
referenced by the blocked thread -&gt; FAIL.

Therefore link all events to the ringbuffer and broadcast the wakeup
from the ringbuffer object to all possible events that could be waited
upon. This is rather ugly, and we're open to better solutions but it
works for now.

Reported-by: Stephane Eranian &lt;eranian@google.com&gt;
Finished-by: Stephane Eranian &lt;eranian@google.com&gt;
Reviewed-by: Stephane Eranian &lt;eranian@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When you do:
        $ perf record -e cycles,cycles,cycles noploop 10

You expect about 10,000 samples for each event, i.e., 10s at
1000samples/sec. However, this is not what's happening. You
get much fewer samples, maybe 3700 samples/event:

$ perf report -D | tail -15
Aggregated stats:
           TOTAL events:      10998
            MMAP events:         66
            COMM events:          2
          SAMPLE events:      10930
cycles stats:
           TOTAL events:       3644
          SAMPLE events:       3644
cycles stats:
           TOTAL events:       3642
          SAMPLE events:       3642
cycles stats:
           TOTAL events:       3644
          SAMPLE events:       3644

On a Intel Nehalem or even AMD64, there are 4 counters capable
of measuring cycles, so there is plenty of space to measure those
events without multiplexing (even with the NMI watchdog active).
And even with multiplexing, we'd expect roughly the same number
of samples per event.

The root of the problem was that when the event that caused the buffer
to become full was not the first event passed on the cmdline, the user
notification would get lost. The notification was sent to the file
descriptor of the overflowed event but the perf tool was not polling
on it.  The perf tool aggregates all samples into a single buffer,
i.e., the buffer of the first event. Consequently, it assumes
notifications for any event will come via that descriptor.

The seemingly straight forward solution of moving the waitq into the
ringbuffer object doesn't work because of life-time issues. One could
perf_event_set_output() on a fd that you're also blocking on and cause
the old rb object to be freed while its waitq would still be
referenced by the blocked thread -&gt; FAIL.

Therefore link all events to the ringbuffer and broadcast the wakeup
from the ringbuffer object to all possible events that could be waited
upon. This is rather ugly, and we're open to better solutions but it
works for now.

Reported-by: Stephane Eranian &lt;eranian@google.com&gt;
Finished-by: Stephane Eranian &lt;eranian@google.com&gt;
Reviewed-by: Stephane Eranian &lt;eranian@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf, core: Introduce attrs to count in either host or guest mode</title>
<updated>2011-10-06T11:00:28+00:00</updated>
<author>
<name>Joerg Roedel</name>
<email>joerg.roedel@amd.com</email>
</author>
<published>2011-10-05T12:01:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a240f76165e6255384d4bdb8139895fac7988799'/>
<id>a240f76165e6255384d4bdb8139895fac7988799</id>
<content type='text'>
The two new attributes exclude_guest and exclude_host can
bes used by user-space to tell the kernel to setup
performance counter to either only count while the CPU is in
guest or in host mode.

An additional check is also introduced to make sure
user-space does not try to exclude guest and host mode from
counting.

Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Signed-off-by: Gleb Natapov &lt;gleb@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1317816084-18026-2-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The two new attributes exclude_guest and exclude_host can
bes used by user-space to tell the kernel to setup
performance counter to either only count while the CPU is in
guest or in host mode.

An additional check is also introduced to make sure
user-space does not try to exclude guest and host mode from
counting.

Signed-off-by: Joerg Roedel &lt;joerg.roedel@amd.com&gt;
Signed-off-by: Gleb Natapov &lt;gleb@redhat.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/1317816084-18026-2-git-send-email-gleb@redhat.com
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf events: Fix slow and broken cgroup context switch code</title>
<updated>2011-08-29T10:28:33+00:00</updated>
<author>
<name>Stephane Eranian</name>
<email>eranian@google.com</email>
</author>
<published>2011-08-25T13:58:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a8d757ef076f0f95f13a918808824058de25b3eb'/>
<id>a8d757ef076f0f95f13a918808824058de25b3eb</id>
<content type='text'>
The current cgroup context switch code was incorrect leading
to bogus counts. Furthermore, as soon as there was an active
cgroup event on a CPU, the context switch cost on that CPU
would increase by a significant amount as demonstrated by a
simple ping/pong example:

 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10684.51 ctxsw/s

Now start a cgroup perf stat:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

$ ./pong
 Both processes pinned to CPU1, running for 10s
 6674.61 ctxsw/s

That's a 37% penalty.

Note that pong is not even in the monitored cgroup.

The results shown by perf stat are bogus:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

 Performance counter stats for 'sleep 100':

 CPU1 &lt;not counted&gt; cycles   test
 CPU1 16,984,189,138 cycles  #    0.000 GHz

The second 'cycles' event should report a count @ CPU clock
(here 2.4GHz) as it is counting across all cgroups.

The patch below fixes the bogus accounting and bypasses any
cgroup switches in case the outgoing and incoming tasks are
in the same cgroup.

With this patch the same test now yields:
 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10775.30 ctxsw/s

Start perf stat with cgroup:

 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

Run pong outside the cgroup:
 $ /pong
 Both processes pinned to CPU1, running for 10s
 10687.80 ctxsw/s

The penalty is now less than 2%.

And the results for perf stat are correct:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 &lt;not counted&gt; cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

Now perf stat reports the correct counts for
for the non cgroup event.

If we run pong inside the cgroup, then we also get the
correct counts:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 22,297,726,205 cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

      10.001457237 seconds time elapsed

Signed-off-by: Stephane Eranian &lt;eranian@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current cgroup context switch code was incorrect leading
to bogus counts. Furthermore, as soon as there was an active
cgroup event on a CPU, the context switch cost on that CPU
would increase by a significant amount as demonstrated by a
simple ping/pong example:

 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10684.51 ctxsw/s

Now start a cgroup perf stat:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

$ ./pong
 Both processes pinned to CPU1, running for 10s
 6674.61 ctxsw/s

That's a 37% penalty.

Note that pong is not even in the monitored cgroup.

The results shown by perf stat are bogus:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

 Performance counter stats for 'sleep 100':

 CPU1 &lt;not counted&gt; cycles   test
 CPU1 16,984,189,138 cycles  #    0.000 GHz

The second 'cycles' event should report a count @ CPU clock
(here 2.4GHz) as it is counting across all cgroups.

The patch below fixes the bogus accounting and bypasses any
cgroup switches in case the outgoing and incoming tasks are
in the same cgroup.

With this patch the same test now yields:
 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10775.30 ctxsw/s

Start perf stat with cgroup:

 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

Run pong outside the cgroup:
 $ /pong
 Both processes pinned to CPU1, running for 10s
 10687.80 ctxsw/s

The penalty is now less than 2%.

And the results for perf stat are correct:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 &lt;not counted&gt; cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

Now perf stat reports the correct counts for
for the non cgroup event.

If we run pong inside the cgroup, then we also get the
correct counts:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 22,297,726,205 cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

      10.001457237 seconds time elapsed

Signed-off-by: Stephane Eranian &lt;eranian@google.com&gt;
Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>atomic: use &lt;linux/atomic.h&gt;</title>
<updated>2011-07-26T23:49:47+00:00</updated>
<author>
<name>Arun Sharma</name>
<email>asharma@fb.com</email>
</author>
<published>2011-07-26T23:09:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=60063497a95e716c9a689af3be2687d261f115b4'/>
<id>60063497a95e716c9a689af3be2687d261f115b4</id>
<content type='text'>
This allows us to move duplicated code in &lt;asm/atomic.h&gt;
(atomic_inc_not_zero() for now) to &lt;linux/atomic.h&gt;

Signed-off-by: Arun Sharma &lt;asharma@fb.com&gt;
Reviewed-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Mike Frysinger &lt;vapier@gentoo.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This allows us to move duplicated code in &lt;asm/atomic.h&gt;
(atomic_inc_not_zero() for now) to &lt;linux/atomic.h&gt;

Signed-off-by: Arun Sharma &lt;asharma@fb.com&gt;
Reviewed-by: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Eric Dumazet &lt;eric.dumazet@gmail.com&gt;
Acked-by: Mike Frysinger &lt;vapier@gentoo.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
