linux-stable.git/include/linux/perf_event.h, branch v3.16.67

perf/x86: Add check_period PMU callback

2019-05-02T20:41:51+00:00

commit 81ec3f3c4c4d78f2d3b6689c9816bfbdf7417dbb upstream.

Vince (and later on Ravi) reported crashes in the BTS code during
fuzzing with the following backtrace:

  general protection fault: 0000 [#1] SMP PTI
  ...
  RIP: 0010:perf_prepare_sample+0x8f/0x510
  ...
  Call Trace:
   
   ? intel_pmu_drain_bts_buffer+0x194/0x230
   intel_pmu_drain_bts_buffer+0x160/0x230
   ? tick_nohz_irq_exit+0x31/0x40
   ? smp_call_function_single_interrupt+0x48/0xe0
   ? call_function_single_interrupt+0xf/0x20
   ? call_function_single_interrupt+0xa/0x20
   ? x86_schedule_events+0x1a0/0x2f0
   ? x86_pmu_commit_txn+0xb4/0x100
   ? find_busiest_group+0x47/0x5d0
   ? perf_event_set_state.part.42+0x12/0x50
   ? perf_mux_hrtimer_restart+0x40/0xb0
   intel_pmu_disable_event+0xae/0x100
   ? intel_pmu_disable_event+0xae/0x100
   x86_pmu_stop+0x7a/0xb0
   x86_pmu_del+0x57/0x120
   event_sched_out.isra.101+0x83/0x180
   group_sched_out.part.103+0x57/0xe0
   ctx_sched_out+0x188/0x240
   ctx_resched+0xa8/0xd0
   __perf_event_enable+0x193/0x1e0
   event_function+0x8e/0xc0
   remote_function+0x41/0x50
   flush_smp_call_function_queue+0x68/0x100
   generic_smp_call_function_single_interrupt+0x13/0x30
   smp_call_function_single_interrupt+0x3e/0xe0
   call_function_single_interrupt+0xf/0x20
   

The reason is that while event init code does several checks
for BTS events and prevents several unwanted config bits for
BTS event (like precise_ip), the PERF_EVENT_IOC_PERIOD allows
to create BTS event without those checks being done.

Following sequence will cause the crash:

If we create an 'almost' BTS event with precise_ip and callchains,
and it into a BTS event it will crash the perf_prepare_sample()
function because precise_ip events are expected to come
in with callchain data initialized, but that's not the
case for intel_pmu_drain_bts_buffer() caller.

Adding a check_period callback to be called before the period
is changed via PERF_EVENT_IOC_PERIOD. It will deny the change
if the event would become BTS. Plus adding also the limit_period
check as well.

Reported-by: Vince Weaver 
Signed-off-by: Jiri Olsa 
Acked-by: Peter Zijlstra 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Naveen N. Rao 
Cc: Ravi Bangoria 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.16:
 - Don't call limit_period operation, which doesn't exist and isn't needed here
 - Add the intel_pmu_has_bts() function, which didn't previously exist here
 - Adjust filenames, context]
Signed-off-by: Ben Hutchings

perf/core: Fix group scheduling with mixed hw and sw events

2018-11-20T18:05:03+00:00

commit a1150c202207cc8501bebc45b63c264f91959260 upstream.

When hw and sw events are mixed in the same group, they are all attached
to the hw perf_event_context. This sometimes requires moving group of
perf_event to a different context.

We found a bug in how the kernel handles this, for example if we do:

   perf stat -e '{faults,ref-cycles,faults}'  -I 1000

     1.005591180              1,297      faults
     1.005591180        457,476,576      ref-cycles
     1.005591180          faults

First, sw event "faults" is attached to the sw context, and becomes the
group leader. Then, hw event "ref-cycles" is attached, so both events
are moved to the hw context. Last, another sw "faults" tries to attach,
but it fails because of mismatch between the new target ctx (from sw
pmu) and the group_leader's ctx (hw context, same as ref-cycles).

The broken condition is:
   group_leader is sw event;
   group_leader is on hw context;
   add a sw event to the group.

Fix this scenario by checking group_leader's context (instead of just
event type). If group_leader is on hw context, use the ->pmu of this
context to look up context for the new event.

Signed-off-by: Song Liu 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Fixes: b04243ef7006 ("perf: Complete software pmu grouping")
Link: http://lkml.kernel.org/r/20180503194716.162815-1-songliubraving@fb.com
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings

perf: Avoid horrible stack usage

2017-11-11T13:34:02+00:00

commit 86038c5ea81b519a8a1fcfcd5e4599aab0cdd119 upstream.

Both Linus (most recent) and Steve (a while ago) reported that perf
related callbacks have massive stack bloat.

The problem is that software events need a pt_regs in order to
properly report the event location and unwind stack. And because we
could not assume one was present we allocated one on stack and filled
it with minimal bits required for operation.

Now, pt_regs is quite large, so this is undesirable. Furthermore it
turns out that most sites actually have a pt_regs pointer available,
making this even more onerous, as the stack space is pointless waste.

This patch addresses the problem by observing that software events
have well defined nesting semantics, therefore we can use static
per-cpu storage instead of on-stack.

Linus made the further observation that all but the scheduler callers
of perf_sw_event() have a pt_regs available, so we change the regular
perf_sw_event() to require a valid pt_regs (where it used to be
optional) and add perf_sw_event_sched() for the scheduler.

We have a scheduler specific call instead of a more generic _noregs()
like construct because we can assume non-recursion from the scheduler
and thereby simplify the code further (_noregs would have to put the
recursion context call inline in order to assertain which __perf_regs
element to use).

One last note on the implementation of perf_trace_buf_prepare(); we
allow .regs = NULL for those cases where we already have a pt_regs
pointer available and do not need another.

Reported-by: Linus Torvalds 
Reported-by: Steven Rostedt 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho de Melo 
Cc: Javi Merino 
Cc: Linus Torvalds 
Cc: Mathieu Desnoyers 
Cc: Oleg Nesterov 
Cc: Paul Mackerras 
Cc: Petr Mladek 
Cc: Steven Rostedt 
Cc: Tom Zanussi 
Cc: Vaibhav Nagarnaik 
Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Arnd Bergmann 
Signed-off-by: Ben Hutchings

perf: Tighten (and fix) the grouping condition

2015-02-04T10:58:30+00:00

commit c3c87e770458aa004bd7ed3f29945ff436fd6511 upstream.

The fix from 9fc81d87420d ("perf: Fix events installation during
moving group") was incomplete in that it failed to recognise that
creating a group with events for different CPUs is semantically
broken -- they cannot be co-scheduled.

Furthermore, it leads to real breakage where, when we create an event
for CPU Y and then migrate it to form a group on CPU X, the code gets
confused where the counter is programmed -- triggered in practice
as well by me via the perf fuzzer.

Fix this by tightening the rules for creating groups. Only allow
grouping of counters that can be co-scheduled in the same context.
This means for the same task and/or the same cpu.

Fixes: 9fc81d87420d ("perf: Fix events installation during moving group")
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Luis Henriques

perf: Differentiate exec() and non-exec() comm events

2014-06-06T05:56:22+00:00

perf tools like 'perf report' can aggregate samples by comm strings,
which generally works.  However, there are other potential use-cases.
For example, to pair up 'calls' with 'returns' accurately (from branch
events like Intel BTS) it is necessary to identify whether the process
has exec'd.  Although a comm event is generated when an 'exec' happens
it is also generated whenever the comm string is changed on a whim
(e.g. by prctl PR_SET_NAME).  This patch adds a flag to the comm event
to differentiate one case from the other.

In order to determine whether the kernel supports the new flag, a
selection bit named 'exec' is added to struct perf_event_attr.  The
bit does nothing but will cause perf_event_open() to fail if the bit
is set on kernels that do not have it defined.

Signed-off-by: Adrian Hunter 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/537D9EBE.7030806@intel.com
Cc: Paul Mackerras 
Cc: Dave Jones 
Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Jiri Olsa 
Cc: Alexander Viro 
Cc: Linus Torvalds 
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Merge branch 'perf/urgent' into perf/core, to resolve conflict and to prepare for new patches

2014-06-06T05:55:06+00:00

Conflicts:
	arch/x86/kernel/traps.c

Signed-off-by: Ingo Molnar

perf: Fix perf_event_comm() vs. exec() assumption

2014-06-06T05:54:02+00:00

perf_event_comm() assumes that set_task_comm() is only called on
exec(), and in particular that its only called on current.

Neither are true, as Dave reported a WARN triggered by set_task_comm()
being called on !current.

Separate the exec() hook from the comm hook.

Reported-by: Dave Jones 
Signed-off-by: Peter Zijlstra 
Cc: Alexander Viro 
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20140521153219.GH5226@laptop.programming.kicks-ass.net
[ Build fix. ]
Signed-off-by: Ingo Molnar

perf: Disable sampled events if no PMU interrupt

2014-06-05T10:29:55+00:00

Add common code to generate -ENOTSUPP at event creation time if an
architecture attempts to create a sampled event and
PERF_PMU_NO_INTERRUPT is set.

This adds a new pmu->capabilities flag.  Initially we only support
PERF_PMU_NO_INTERRUPT (to indicate a PMU has no support for generating
hardware interrupts) but there are other capabilities that can be
added later.

Signed-off-by: Vince Weaver 
Acked-by: Will Deacon 
[peterz: rename to PERF_PMU_CAP_* and moved the pmu::capabilities word into a hole]
Signed-off-by: Peter Zijlstra 
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1405161708060.11099@vincent-weaver-1.umelst.maine.edu
Signed-off-by: Ingo Molnar 

Signed-off-by: Ingo Molnar

perf: Fix a race between ring_buffer_detach() and ring_buffer_attach()

2014-05-19T12:44:56+00:00

Alexander noticed that we use RCU iteration on rb->event_list but do
not use list_{add,del}_rcu() to add,remove entries to that list, nor
do we observe proper grace periods when re-using the entries.

Merge ring_buffer_detach() into ring_buffer_attach() such that
attaching to the NULL buffer is detaching.

Furthermore, ensure that between any 'detach' and 'attach' of the same
event we observe the required grace period, but only when strictly
required. In effect this means that only ioctl(.request =
PERF_EVENT_IOC_SET_OUTPUT) will wait for a grace period, while the
normal initial attach and final detach will not be delayed.

This patch should, I think, do the right thing under all
circumstances, the 'normal' cases all should never see the extra grace
period, but the two cases:

 1) PERF_EVENT_IOC_SET_OUTPUT on an event which already has a
    ring_buffer set, will now observe the required grace period between
    removing itself from the old and attaching itself to the new buffer.

    This case is 'simple' in that both buffers are present in
    perf_event_set_output() one could think an unconditional
    synchronize_rcu() would be sufficient; however...

 2) an event that has a buffer attached, the buffer is destroyed
    (munmap) and then the event is attached to a new/different buffer
    using PERF_EVENT_IOC_SET_OUTPUT.

    This case is more complex because the buffer destruction does:
      ring_buffer_attach(.rb = NULL)
    followed by the ioctl() doing:
      ring_buffer_attach(.rb = foo);

    and we still need to observe the grace period between these two
    calls due to us reusing the event->rb_entry list_head.

In order to make 2 happen we use Paul's latest cond_synchronize_rcu()
call.

Cc: Paul Mackerras 
Cc: Stephane Eranian 
Cc: Andi Kleen 
Cc: "Paul E. McKenney" 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Mike Galbraith 
Reported-by: Alexander Shishkin 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20140507123526.GD13658@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner

perf: Allow building PMU drivers as modules

2014-04-18T10:54:45+00:00

This patch adds support for building PMU driver as module. It exports
the functions perf_pmu_{register,unregister}() and adds reference tracking
for the PMU driver module.

When the PMU driver is built as a module, each active event of the PMU
holds a reference to the driver module.

Signed-off-by: Yan, Zheng 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1395133004-23205-1-git-send-email-zheng.z.yan@intel.com
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar