linux-stable.git/kernel/events/ring_buffer.c, branch v3.16.78

perf/ring_buffer: Add ordering to rb->nest increment

2019-10-05T15:19:45+00:00

commit 3f9fbe9bd86c534eba2faf5d840fd44c6049f50e upstream.

Similar to how decrementing rb->nest too early can cause data_head to
(temporarily) be observed to go backward, so too can this happen when
we increment too late.

This barrier() ensures the rb->head load happens after the increment,
both the one in the 'goto again' path and the one from
perf_output_get_handle() -- albeit very unlikely to matter for the
latter.
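
As an illustration only, the ordering described above can be sketched in
userspace C. The struct and function names are hypothetical, and the
barrier() macro is a compiler-only barrier written in the style of the
kernel's definition for GCC/Clang; this is not the code of the patch.

```c
#include <assert.h>

/* Compiler-only barrier in the style of the kernel's barrier() (GCC/Clang). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the two ring-buffer fields discussed above. */
struct nest_rb {
	int nest;            /* writer nesting depth */
	unsigned long head;  /* current write position */
};

/*
 * Mark the buffer as "writer active", then sample head. The barrier keeps
 * the compiler from moving the head load above the nest increment.
 */
unsigned long enter_and_load_head(struct nest_rb *rb)
{
	rb->nest++;
	barrier();
	return rb->head;
}
```

The barrier only constrains the compiler, which is sufficient here because
the nested writer is an IRQ/NMI on the same CPU.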

Suggested-by: Yabin Cui 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: acme@kernel.org
Cc: mark.rutland@arm.com
Cc: namhyung@kernel.org
Fixes: ef60777c9abd ("perf: Optimize the perf_output() path by removing IRQ-disables")
Link: http://lkml.kernel.org/r/20190517115418.309516009@infradead.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf/ring_buffer: Fix exposing a temporarily decreased data_head

2019-10-05T15:19:45+00:00

commit 1b038c6e05ff70a1e66e3e571c2e6106bdb75f53 upstream.

In perf_output_put_handle(), an IRQ/NMI can occur at the location marked
below and write records to the same ring buffer:

	...
	local_dec_and_test(&rb->nest)
	...                          <-- an IRQ/NMI can happen here
	rb->user_page->data_head = head;
	...

In this case, a value A is written to data_head in the IRQ, then a value
B is written to data_head after the IRQ, with A > B. As a result,
data_head temporarily decreases from A to B, and a reader may see
data_head < data_tail if it reads the buffer frequently enough, which
causes unexpected behavior.

This can be fixed by moving dec(&rb->nest) to after updating data_head,
which prevents the IRQ/NMI above from updating data_head.
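
The race and the fix can be replayed in a single-threaded userspace
simulation. This is a sketch under stated assumptions: the IRQ/NMI is
modeled as a function call injected at the racy point, and all names
(pub_rb, write_record, data_head_went_backward) are hypothetical, not
kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the fields involved in the race. */
struct pub_rb {
	int nest;                 /* writer nesting depth, like rb->nest */
	unsigned long head;       /* private write position, like rb->head */
	unsigned long data_head;  /* position published to the reader */
	bool went_backward;       /* set if data_head ever decreased */
};

static void put_handle(struct pub_rb *rb, bool fixed, bool irq);

/* Publish a head value and record whether it moved backward. */
static void publish(struct pub_rb *rb, unsigned long head)
{
	if (head < rb->data_head)
		rb->went_backward = true;
	rb->data_head = head;
}

/* One writer: enter, reserve len bytes, then finish. */
static void write_record(struct pub_rb *rb, unsigned long len,
			 bool fixed, bool irq)
{
	rb->nest++;
	rb->head += len;
	put_handle(rb, fixed, irq);
}

static void put_handle(struct pub_rb *rb, bool fixed, bool irq)
{
	unsigned long head;

again:
	head = rb->head;
	if (rb->nest > 1) {       /* nested: the outermost writer publishes */
		rb->nest--;
		return;
	}
	if (!fixed)
		rb->nest--;       /* old order: drop nest before publishing */
	if (irq) {                /* simulated IRQ/NMI writing 8 bytes */
		irq = false;
		write_record(rb, 8, fixed, false);
	}
	publish(rb, head);
	if (fixed)
		rb->nest = 0;     /* new order: drop nest after publishing */
	if (head != rb->head) {   /* head moved under us: publish again */
		rb->nest++;
		goto again;
	}
}

/* Run one outer write of 16 bytes with an interrupting nested write. */
bool data_head_went_backward(bool fixed)
{
	struct pub_rb rb = { 0, 0, 0, false };

	write_record(&rb, 16, fixed, true);
	assert(rb.data_head == 24 && rb.nest == 0);
	return rb.went_backward;
}
```

With the old order the nested writer sees nest == 0, publishes 24, and the
outer writer then publishes its stale 16. With the new order the nested
writer sees nest > 1 and leaves publishing to the outer writer.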

[ Split up by peterz. ]

Signed-off-by: Yabin Cui 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: mark.rutland@arm.com
Fixes: ef60777c9abd ("perf: Optimize the perf_output() path by removing IRQ-disables")
Link: http://lkml.kernel.org/r/20190517115418.224478157@infradead.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf/core: Fix impossible ring-buffer sizes warning

2019-05-02T20:41:53+00:00

commit 528871b456026e6127d95b1b2bd8e3a003dc1614 upstream.

The following commit:

  9dff0aa95a32 ("perf/core: Don't WARN() for impossible ring-buffer sizes")

results in perf recording failures with larger mmap areas:

  root@skl:/tmp# perf record -g -a
  failed to mmap with 12 (Cannot allocate memory)

The root cause is that the following condition is buggy:

	if (order_base_2(size) >= MAX_ORDER)
		goto fail;

The problem is that @size is in bytes and MAX_ORDER is in pages,
so the right test is:

	if (order_base_2(size) >= PAGE_SHIFT+MAX_ORDER)
		goto fail;
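
A worked example of the unit mismatch, as a userspace sketch. The
constants below are assumptions chosen for illustration (4 KiB pages and a
MAX_ORDER of 11), and order_base_2_sketch() is a hypothetical
reimplementation of the rounded-up log2 that order_base_2() computes.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12u  /* assumption: 4 KiB pages */
#define SKETCH_MAX_ORDER  11u  /* assumption: common default */

/* Smallest order such that (1 << order) >= n, i.e. log2 rounded up. */
unsigned int order_base_2_sketch(unsigned long n)
{
	unsigned int order = 0;

	while (order < sizeof(n) * 8 - 1 && (1UL << order) < n)
		order++;
	return order;
}

/* Old test: compares a byte order against a page order. */
bool rejected_by_old_test(unsigned long size)
{
	return order_base_2_sketch(size) >= SKETCH_MAX_ORDER;
}

/* New test: both sides are expressed as byte orders. */
bool rejected_by_new_test(unsigned long size)
{
	return order_base_2_sketch(size) >= SKETCH_PAGE_SHIFT + SKETCH_MAX_ORDER;
}
```

Under these assumptions the old test already rejects a 2 KiB request
(order 11), while the new test only rejects requests of 8 MiB (order 23)
and above.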

Fix it.

Reported-by: "Jin, Yao" 
Bisected-by: Borislav Petkov 
Analyzed-by: Peter Zijlstra 
Cc: Julien Thierry 
Cc: Mark Rutland 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Greg Kroah-Hartman 
Fixes: 9dff0aa95a32 ("perf/core: Don't WARN() for impossible ring-buffer sizes")
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf/core: Don't WARN() for impossible ring-buffer sizes

2019-05-02T20:41:36+00:00

commit 9dff0aa95a324e262ffb03f425d00e4751f3294e upstream.

The perf tool uses /proc/sys/kernel/perf_event_mlock_kb to determine how
large its ringbuffer mmap should be. This can be configured to arbitrary
values, which can be larger than the maximum possible allocation from
kmalloc.

When this is configured to a suitably large value (e.g. thanks to the
perf fuzzer), attempting to use perf record triggers a WARN_ON_ONCE() in
__alloc_pages_nodemask():

   WARNING: CPU: 2 PID: 5666 at mm/page_alloc.c:4511 __alloc_pages_nodemask+0x3f8/0xbc8

Let's avoid this by checking that the requested allocation is possible
before calling kzalloc.

Reported-by: Julien Thierry 
Signed-off-by: Mark Rutland 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Julien Thierry 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: https://lkml.kernel.org/r/20190110142745.25495-1-mark.rutland@arm.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf: Optimize ring-buffer write by depending on control dependencies

2013-12-11T14:53:22+00:00

Remove a full barrier from the ring-buffer write path by relying on
a control dependency to order a LOAD -> STORE scenario.
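
A minimal userspace sketch of the LOAD -> STORE pattern, with hypothetical
names and an 8-byte buffer. The volatile accesses stand in for the
kernel's once-only accessors and the release fence stands in for
smp_wmb(). Note the assumption: ISO C does not itself guarantee ordering
through control dependencies, so this only illustrates the shape of the
code, not a portable guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CD_SIZE 8UL  /* hypothetical buffer size, a power of two */

struct cd_rb {
	volatile unsigned long data_head;  /* written by the producer */
	volatile unsigned long data_tail;  /* written by the consumer */
	unsigned char data[CD_SIZE];
};

/* Producer side: write one byte if the consumer has left room. */
bool try_write_byte(struct cd_rb *rb, unsigned char byte)
{
	unsigned long tail = rb->data_tail;  /* LOAD ->data_tail */
	unsigned long head = rb->data_head;

	if (head - tail >= CD_SIZE)          /* branch depends on the load */
		return false;

	/*
	 * STORE $data: reached only through the branch above, so the store
	 * cannot be made visible before the data_tail load it depends on.
	 * No full barrier is placed between the load and this store.
	 */
	rb->data[head & (CD_SIZE - 1)] = byte;

	/* Order the data store before publishing the new head. */
	atomic_thread_fence(memory_order_release);
	rb->data_head = head + 1;
	return true;
}
```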

Cc: "Paul E. McKenney" 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/n/tip-8alv40z6ikk57jzbaobnxrjl@git.kernel.org
Signed-off-by: Ingo Molnar

perf: Update a stale comment

2013-11-06T11:34:23+00:00

Signed-off-by: Peter Zijlstra 
Cc: Benjamin Herrenschmidt 
Cc: Frederic Weisbecker 
Cc: Mathieu Desnoyers 
Cc: Michael Ellerman 
Cc: Michael Neuling 
Cc: "Paul E. McKenney" 
Cc: james.hogan@imgtec.com
Cc: Vince Weaver 
Cc: Victor Kaplansky 
Cc: Oleg Nesterov 
Cc: Anton Blanchard 
Link: http://lkml.kernel.org/n/tip-9s5mze78gmlz19agt39i8rii@git.kernel.org
Signed-off-by: Ingo Molnar

perf: Optimize perf_output_begin() -- address calculation

2013-11-06T11:34:22+00:00

Rewrite the handle address calculation code to be clearer.
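
For readers unfamiliar with this kind of calculation, one common way to
turn a linear buffer offset into a page index, an offset within that page,
and the bytes remaining in it is sketched below. The struct, the function
and its parameters are hypothetical, and the sketch assumes a power-of-two
page count; it is not claimed to match the code of this commit.

```c
#include <assert.h>

/* Hypothetical result of locating an offset inside a paged buffer. */
struct addr_sketch {
	unsigned long page;    /* index of the data page */
	unsigned long offset;  /* byte offset within that page */
	unsigned long size;    /* bytes left in that page */
};

/* nr_pages must be a power of two; each page is (1 << page_shift) bytes. */
void locate_offset(unsigned long offset, unsigned int page_shift,
		   unsigned long nr_pages, struct addr_sketch *out)
{
	unsigned long page_size = 1UL << page_shift;

	out->page = (offset >> page_shift) & (nr_pages - 1);
	out->offset = offset & (page_size - 1);
	out->size = page_size - out->offset;
}
```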

Saves 8 bytes on x86_64-defconfig.

Signed-off-by: Peter Zijlstra 
Cc: Benjamin Herrenschmidt 
Cc: Frederic Weisbecker 
Cc: Mathieu Desnoyers 
Cc: Michael Ellerman 
Cc: Michael Neuling 
Cc: "Paul E. McKenney" 
Cc: james.hogan@imgtec.com
Cc: Vince Weaver 
Cc: Victor Kaplansky 
Cc: Oleg Nesterov 
Cc: Anton Blanchard 
Link: http://lkml.kernel.org/n/tip-3trb2n2henb9m27tncef3ag7@git.kernel.org
Signed-off-by: Ingo Molnar

perf: Optimize perf_output_begin() -- lost_event case

2013-11-06T11:34:21+00:00

Avoid touching the lost_event and sample_data cachelines twice. It's
not like we end up doing less work, but it might help to keep all
accesses to these cachelines in one place.

Due to code shuffle, this loses 4 bytes on x86_64-defconfig.

Signed-off-by: Peter Zijlstra 
Cc: Benjamin Herrenschmidt 
Cc: Frederic Weisbecker 
Cc: Mathieu Desnoyers 
Cc: Michael Ellerman 
Cc: Michael Neuling 
Cc: "Paul E. McKenney" 
Cc: james.hogan@imgtec.com
Cc: Vince Weaver 
Cc: Victor Kaplansky 
Cc: Oleg Nesterov 
Cc: Anton Blanchard 
Link: http://lkml.kernel.org/n/tip-zfxnc58qxj0eawdoj31hhupv@git.kernel.org
Signed-off-by: Ingo Molnar

perf: Optimize perf_output_begin()

2013-11-06T11:34:20+00:00

There's no point in re-doing the memory-barrier when we fail the
cmpxchg(). Also placing it after the space reservation loop makes it
clearer it only separates the userpage->tail read from the data
stores.
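
The placement can be sketched with C11 atomics: the reservation loop uses
relaxed operations only, and a single fence follows the loop. All names
are hypothetical, and the sequentially consistent fence merely stands in
for the kernel's memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reservation state for the sketch. */
struct res_rb {
	_Atomic unsigned long head;       /* reserved-up-to position */
	_Atomic unsigned long data_tail;  /* consumer position */
	unsigned long capacity;           /* buffer size in bytes */
};

/* Reserve size bytes; on success store the start offset in *start. */
bool reserve_space(struct res_rb *rb, unsigned long size,
		   unsigned long *start)
{
	unsigned long offset, tail;

	offset = atomic_load_explicit(&rb->head, memory_order_relaxed);
	do {
		tail = atomic_load_explicit(&rb->data_tail,
					    memory_order_relaxed);
		if (offset + size - tail > rb->capacity)
			return false;  /* not enough room */
		/* On failure the exchange reloads offset and we retry. */
	} while (!atomic_compare_exchange_weak_explicit(&rb->head, &offset,
							offset + size,
							memory_order_relaxed,
							memory_order_relaxed));

	/*
	 * One fence after the loop: it separates the data_tail read above
	 * from the data stores the caller performs next. Retrying the
	 * exchange does not repeat it.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	*start = offset;
	return true;
}
```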

Signed-off-by: Peter Zijlstra 
Cc: Benjamin Herrenschmidt 
Cc: Frederic Weisbecker 
Cc: Mathieu Desnoyers 
Cc: Michael Ellerman 
Cc: Michael Neuling 
Cc: "Paul E. McKenney" 
Cc: james.hogan@imgtec.com
Cc: Vince Weaver 
Cc: Victor Kaplansky 
Cc: Oleg Nesterov 
Cc: Anton Blanchard 
Link: http://lkml.kernel.org/n/tip-c19u6egfldyx86tpyc3zgkw9@git.kernel.org
Signed-off-by: Ingo Molnar

perf: Add unlikely() to the ring-buffer code

2013-11-06T11:34:19+00:00

Add unlikely() annotations to 'slow' paths:

When you have a sampling event but no output buffer, you have bigger
issues -- also the bail is still faster than actually doing the work.

When you have a sampling event but a control-page-only buffer, you have
bigger issues -- again the bail is still faster than actually doing
work.

Optimize for the case where you're not losing events -- again, not
doing the work is still faster, but make sure that when you have to
actually do work it's as fast as possible.

The typical watermark is 1/2 the buffer size, so most events will not
take this path.
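
The annotation pattern can be sketched as follows. The unlikely() macro is
written the way the kernel defines it on top of GCC's __builtin_expect();
the struct, the function and its return values are hypothetical and only
mirror the cases listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Branch hint: tell the compiler the condition is expected to be false. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical buffer state for the sketch. */
struct ul_rb {
	unsigned long nr_pages;  /* 0 means a control-page-only buffer */
	unsigned long lost;      /* events dropped since the last record */
};

/*
 * Returns 0 when the event can be written and -1 when we bail out early.
 * *emit_lost_record is set when a lost-events record should be emitted.
 */
int output_begin_sketch(struct ul_rb *rb, int *emit_lost_record)
{
	*emit_lost_record = 0;

	if (unlikely(!rb))            /* no output buffer */
		return -1;
	if (unlikely(!rb->nr_pages))  /* control page only */
		return -1;
	if (unlikely(rb->lost)) {     /* rare: events were lost */
		*emit_lost_record = 1;
		rb->lost = 0;
	}
	return 0;
}
```

The hints do not change behavior; they only let the compiler lay out the
common path as the fall-through case.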

Shrinks perf_output_begin() by 16 bytes on x86_64-defconfig.

Signed-off-by: Peter Zijlstra 
Cc: Benjamin Herrenschmidt 
Cc: Frederic Weisbecker 
Cc: Mathieu Desnoyers 
Cc: Michael Ellerman 
Cc: Michael Neuling 
Cc: "Paul E. McKenney" 
Cc: james.hogan@imgtec.com
Cc: Vince Weaver 
Cc: Victor Kaplansky 
Cc: Oleg Nesterov 
Cc: Anton Blanchard 
Link: http://lkml.kernel.org/n/tip-wlg3jew3qnutm8opd0hyeuwn@git.kernel.org
Signed-off-by: Ingo Molnar