linux-stable.git/kernel/events/core.c, branch v3.2.93

ptrace: use fsuid, fsgid, effective creds for fs access checks

2017-09-15T17:30:57+00:00

commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream.

By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.

To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.

The problem was that when a privileged task had temporarily dropped its
privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.

While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.

In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:

 /proc/$pid/stat - uses the check for determining whether pointers
     should be visible, useful for bypassing ASLR
 /proc/$pid/maps - also useful for bypassing ASLR
 /proc/$pid/cwd - useful for gaining access to restricted
     directories that contain files with lax permissions, e.g. in
     this scenario:
     lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
     drwx------ root root /root
     drwxr-xr-x root root /root/foobar
     -rw-r--r-- root root /root/foobar/secret

Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn 
Acked-by: Kees Cook 
Cc: Casey Schaufler 
Cc: Oleg Nesterov 
Cc: Ingo Molnar 
Cc: James Morris 
Cc: "Serge E. Hallyn" 
Cc: Andy Shevchenko 
Cc: Andy Lutomirski 
Cc: Al Viro 
Cc: "Eric W. Biederman" 
Cc: Willy Tarreau 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman 
[bwh: Backported to 3.2:
 - Drop changes to kcmp, procfs map_files, procfs has_pid_permissions()
 - Keep using uid_t, gid_t and == operator for IDs
 - Adjust context]
Signed-off-by: Ben Hutchings

perf/core: Fix event inheritance on fork()

2017-07-18T17:38:33+00:00

commit e7cc4865f0f31698ef2f7aac01a50e78968985b7 upstream.

While hunting for clues to a use-after-free, Oleg spotted that
perf_event_init_context() can loose an error value with the result
that fork() can succeed even though we did not fully inherit the perf
event context.

Spotted-by: Oleg Nesterov 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Arnaldo Carvalho de Melo 
Cc: Dmitry Vyukov 
Cc: Frederic Weisbecker 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Mathieu Desnoyers 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: oleg@redhat.com
Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists")
Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race

2017-02-23T03:51:04+00:00

commit 321027c1fe77f892f4ea07846aeae08cefbbb290 upstream.

Di Shen reported a race between two concurrent sys_perf_event_open()
calls where both try and move the same pre-existing software group
into a hardware context.

The problem is exactly that described in commit:

  f63a8daa5812 ("perf: Fix event->ctx locking")

... where, while we wait for a ctx->mutex acquisition, the event->ctx
relation can have changed under us.

That very same commit failed to recognise sys_perf_event_context() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.

So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about. So by the time the former sys_perf_event_open() acquires the
locks, the context we've acquired is stale (and possibly dead).

Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the 'move_group' case. This obviously means
we need to validate state after we acquire the locks.

Reported-by: Di Shen (Keen Lab)
Tested-by: John Dias 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Kees Cook 
Cc: Linus Torvalds 
Cc: Min Chong 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Fixes: f63a8daa5812 ("perf: Fix event->ctx locking")
Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2:
 - Use ACCESS_ONCE() instead of READ_ONCE()
 - Test perf_event::group_flags instead of group_caps
 - Add the err_locked cleanup block, which we didn't need before
 - Adjust context]
Signed-off-by: Ben Hutchings

perf: Do not double free

2017-02-23T03:51:04+00:00

commit 130056275ade730e7a79c110212c8815202773ee upstream.

In case of: err_file: fput(event_file), we'll end up calling
perf_release() which in turn will free the event.

Do not then free the event _again_.

Tested-by: Alexander Shishkin 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: dvyukov@google.com
Cc: eranian@google.com
Cc: oleg@redhat.com
Cc: panand@redhat.com
Cc: sasha.levin@oracle.com
Cc: vince@deater.net
Link: http://lkml.kernel.org/r/20160224174947.697350349@infradead.org
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings

perf: Fix event->ctx locking

2017-02-23T03:51:04+00:00

commit f63a8daa5812afef4f06c962351687e1ff9ccb2b upstream.

There have been a few reported issues wrt. the lack of locking around
changing event->ctx. This patch tries to address those.

It avoids the whole rwsem thing; and while it appears to work, please
give it some thought in review.

What I did fail at is sensible runtime checks on the use of
event->ctx, the RCU use makes it very hard.

Signed-off-by: Peter Zijlstra (Intel) 
Cc: Paul E. McKenney 
Cc: Jiri Olsa 
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Link: http://lkml.kernel.org/r/20150123125834.209535886@infradead.org
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2:
 - We don't have perf_pmu_migrate_context()
 - Adjust context]
Signed-off-by: Ben Hutchings

perf: Fix perf_event_for_each() to use sibling

2017-02-23T03:51:04+00:00

commit 724b6daa13e100067c30cfc4d1ad06629609dc4e upstream.

In perf_event_for_each() we call a function on an event, and then
iterate over the siblings of the event.

However we don't call the function on the siblings, we call it
repeatedly on the original event - it seems "obvious" that we should
be calling it with sibling as the argument.

It looks like this broke in commit 75f937f24bd9 ("Fix ctx->mutex
vs counter->mutex inversion").

The only effect of the bug is that the PERF_IOC_FLAG_GROUP parameter
to the ioctls doesn't work.

Signed-off-by: Michael Ellerman 
Signed-off-by: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1334109253-31329-1-git-send-email-michael@ellerman.id.au
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf: Fix race in swevent hash

2017-02-23T03:51:03+00:00

commit 12ca6ad2e3a896256f086497a7c7406a547ee373 upstream.

There's a race on CPU unplug where we free the swevent hash array
while it can still have events on. This will result in a
use-after-free which is BAD.

Simply do not free the hash array on unplug. This leaves the thing
around and no use-after-free takes place.

When the last swevent dies, we do a for_each_possible_cpu() iteration
anyway to clean these up, at which time we'll free it, so no leakage
will occur.

Reported-by: Sasha Levin 
Tested-by: Sasha Levin 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho de Melo 
Cc: Frederic Weisbecker 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf: Fix inherited events vs. tracepoint filters

2015-11-27T12:48:23+00:00

commit b71b437eedaed985062492565d9d421d975ae845 upstream.

Arnaldo reported that tracepoint filters seem to misbehave (ie. not
apply) on inherited events.

The fix is obvious; filters are only set on the actual (parent)
event, use the normal pattern of using this parent event for filters.
This is safe because each child event has a reference to it.

Reported-by: Arnaldo Carvalho de Melo 
Tested-by: Arnaldo Carvalho de Melo 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Adrian Hunter 
Cc: Arnaldo Carvalho de Melo 
Cc: David Ahern 
Cc: Frédéric Weisbecker 
Cc: Jiri Olsa 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Cc: Wang Nan 
Link: http://lkml.kernel.org/r/20151102095051.GN17308@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf: Fix fasync handling on inherited events

2015-10-13T02:46:02+00:00

commit fed66e2cdd4f127a43fd11b8d92a99bdd429528c upstream.

Vince reported that the fasync signal stuff doesn't work proper for
inherited events. So fix that.

Installing fasync allocates memory and sets filp->f_flags |= FASYNC,
which upon the demise of the file descriptor ensures the allocation is
freed and state is updated.

Now for perf, we can have the events stick around for a while after the
original FD is dead because of references from child events. So we
cannot copy the fasync pointer around. We can however consistently use
the parent's fasync, as that will be updated.

Reported-and-Tested-by: Vince Weaver 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho deMelo 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1434011521.1495.71.camel@twins
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings

perf: Fix irq_work 'tail' recursion

2015-05-09T22:16:31+00:00

commit d525211f9d1be8b523ec7633f080f2116f5ea536 upstream.

Vince reported a watchdog lockup like:

	[] perf_tp_event+0xc4/0x210
	[] perf_trace_lock+0x12a/0x160
	[] lock_release+0x130/0x260
	[] _raw_spin_unlock_irqrestore+0x24/0x40
	[] do_send_sig_info+0x5d/0x80
	[] send_sigio_to_task+0x12f/0x1a0
	[] send_sigio+0xae/0x100
	[] kill_fasync+0x97/0xf0
	[] perf_event_wakeup+0xd4/0xf0
	[] perf_pending_event+0x33/0x60
	[] irq_work_run_list+0x4c/0x80
	[] irq_work_run+0x18/0x40
	[] smp_trace_irq_work_interrupt+0x3f/0xc0
	[] trace_irq_work_interrupt+0x6d/0x80

Which is caused by an irq_work generating new irq_work and therefore
not allowing forward progress.

This happens because processing the perf irq_work triggers another
perf event (tracepoint stuff) which in turn generates an irq_work ad
infinitum.

Avoid this by raising the recursion counter in the irq_work -- which
effectively disables all software events (including tracepoints) from
actually triggering again.

Reported-by: Vince Weaver 
Tested-by: Vince Weaver 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Paul Mackerras 
Cc: Steven Rostedt 
Link: http://lkml.kernel.org/r/20150219170311.GH21418@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar 
Signed-off-by: Ben Hutchings