linux-stable.git/fs/proc, branch v3.12.71

sysctl: Drop reference added by grab_header in proc_sys_readdir

2017-01-26T16:40:44+00:00

commit 93362fa47fe98b62e4a34ab408c4a418432e7939 upstream.

Fixes CVE-2016-9191: proc_sys_readdir doesn't drop the reference
added by grab_header when it returns via the !dir_emit_dots path.
As a result, any path that calls unregister_sysctl_table can
wait forever.

The calltrace of CVE-2016-9191:

[ 5535.960522] Call Trace:
[ 5535.963265]  schedule+0x3f/0xa0
[ 5535.968817]  schedule_timeout+0x3db/0x6f0
[ 5535.975346]  ? wait_for_completion+0x45/0x130
[ 5535.982256]  wait_for_completion+0xc3/0x130
[ 5535.988972]  ? wake_up_q+0x80/0x80
[ 5535.994804]  drop_sysctl_table+0xc4/0xe0
[ 5536.001227]  drop_sysctl_table+0x77/0xe0
[ 5536.007648]  unregister_sysctl_table+0x4d/0xa0
[ 5536.014654]  unregister_sysctl_table+0x7f/0xa0
[ 5536.021657]  unregister_sched_domain_sysctl+0x15/0x40
[ 5536.029344]  partition_sched_domains+0x44/0x450
[ 5536.036447]  ? __mutex_unlock_slowpath+0x111/0x1f0
[ 5536.043844]  rebuild_sched_domains_locked+0x64/0xb0
[ 5536.051336]  update_flag+0x11d/0x210
[ 5536.057373]  ? mutex_lock_nested+0x2df/0x450
[ 5536.064186]  ? cpuset_css_offline+0x1b/0x60
[ 5536.070899]  ? trace_hardirqs_on+0xd/0x10
[ 5536.077420]  ? mutex_lock_nested+0x2df/0x450
[ 5536.084234]  ? css_killed_work_fn+0x25/0x220
[ 5536.091049]  cpuset_css_offline+0x35/0x60
[ 5536.097571]  css_killed_work_fn+0x5c/0x220
[ 5536.104207]  process_one_work+0x1df/0x710
[ 5536.110736]  ? process_one_work+0x160/0x710
[ 5536.117461]  worker_thread+0x12b/0x4a0
[ 5536.123697]  ? process_one_work+0x710/0x710
[ 5536.130426]  kthread+0xfe/0x120
[ 5536.135991]  ret_from_fork+0x1f/0x40
[ 5536.142041]  ? kthread_create_on_node+0x230/0x230

One cgroup maintainer mentioned that "cgroup is trying to offline
a cpuset css, which takes place under cgroup_mutex.  The offlining
ends up trying to drain active usages of a sysctl table which apparently
is not happening."
The real reason is that proc_sys_readdir doesn't drop the reference added
by grab_header when it returns via the !dir_emit_dots path, so the cpuset
offline path waits here forever.
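
The leak can be illustrated with a minimal user-space sketch. The names
below (struct hdr, grab, drop, readdir_buggy, readdir_fixed) are
hypothetical stand-ins for the sysctl header use count, not the actual
fs/proc/proc_sysctl.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a sysctl header with an active-use count. */
struct hdr {
	int used;
};

static void grab(struct hdr *h)
{
	h->used++;
}

static void drop(struct hdr *h)
{
	assert(h->used > 0);
	h->used--;
}

/* Buggy shape: the early return skips the matching drop(). */
static int readdir_buggy(struct hdr *h, bool emit_dots_ok)
{
	grab(h);
	if (!emit_dots_ok)
		return 0;	/* reference leaked here */
	/* ... emit directory entries ... */
	drop(h);
	return 0;
}

/* Fixed shape: every exit path goes through drop(). */
static int readdir_fixed(struct hdr *h, bool emit_dots_ok)
{
	grab(h);
	if (!emit_dots_ok)
		goto out;
	/* ... emit directory entries ... */
out:
	drop(h);
	return 0;
}
```

In this model, readdir_buggy() leaves used at 1 after a failed
dir_emit_dots, so a waiter that needs the count to reach zero never
proceeds; readdir_fixed() always returns it to zero.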

See here for details: http://www.openwall.com/lists/oss-security/2016/11/04/13

Fixes: f0c3b5093add ("[readdir] convert procfs")
Reported-by: CAI Qian 
Tested-by: Yang Shukui 
Signed-off-by: Zhou Chengming 
Acked-by: Al Viro 
Signed-off-by: Eric W. Biederman 
Signed-off-by: Jiri Slaby

proc: prevent accessing /proc/<PID>/environ until it's ready

2016-05-11T09:37:33+00:00

commit 8148a73c9901a8794a50f950083c00ccf97d43b3 upstream.

If /proc/<PID>/environ gets read before the envp[] array is fully set up
in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
read more bytes than are actually written, as env_start will already be
set but env_end will still be zero. The range calculation then
underflows, which allows reading beyond the end of what has been written.

Fix this as it is done for /proc/<PID>/cmdline by testing env_end for
zero.  It is, apparently, intentionally set last in create_*_tables().

This bug was found by the PaX size_overflow plugin that detected the
arithmetic underflow of 'this_len = env_end - (env_start + src)' when
env_end is still zero.
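
The underflow can be reproduced in user space with a minimal sketch.
The helper names (len_unchecked, len_checked, len_demo) are hypothetical,
and the extra bounds check in len_checked is an assumption added for
illustration; the fix described above only tests env_end for zero.

```c
#include <assert.h>
#include <stdint.h>

/* Length computed without a guard; wraps around when env_end == 0. */
static uint64_t len_unchecked(uint64_t env_start, uint64_t env_end,
			      uint64_t src)
{
	return env_end - (env_start + src);
}

/* Guarded variant: env_end == 0 means "not ready yet", so read nothing. */
static uint64_t len_checked(uint64_t env_start, uint64_t env_end,
			    uint64_t src)
{
	if (env_end == 0)
		return 0;
	/* Assumed extra bounds check, not part of the described fix. */
	if (env_start + src >= env_end)
		return 0;
	return env_end - (env_start + src);
}

/* Example values: env_start is set, env_end has not been written yet. */
static void len_demo(void)
{
	assert(len_unchecked(0x1000, 0, 0) > 0x1000);	/* huge, wrapped */
	assert(len_checked(0x1000, 0, 0) == 0);
}
```

With unsigned arithmetic the unchecked subtraction yields a value close
to 2^64 rather than a negative number, which is why the read runs past
the written data.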

The expected consequence is that userland trying to access
/proc/<PID>/environ of a not yet fully set up process may get
inconsistent data as we're in the middle of copying in the environment
variables.

Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461
Signed-off-by: Mathias Krause 
Cc: Emese Revfy 
Cc: Pax Team 
Cc: Al Viro 
Cc: Mateusz Guzik 
Cc: Alexey Dobriyan 
Cc: Cyrill Gorcunov 
Cc: Jarod Wilson 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

proc: Fix ptrace-based permission checks for accessing task maps

2016-03-02T09:29:36+00:00

Modify mm_access() calls in fs/proc/task_mmu.c and fs/proc/task_nommu.c to
have the mode include PTRACE_MODE_FSCREDS so accessing /proc/pid/maps and
/proc/pid/pagemap is not denied to all users.

In backporting upstream commit caaee623 to pre-3.18 kernel versions it was
overlooked that mm_access() is used in fs/proc/task_*mmu.c as those calls
were removed in 3.18 (by upstream commit 29a40ace) and did not exist at the
time of the original commit.

Fixes: caaee6234d ("ptrace: use fsuid, fsgid, effective creds for fs access checks")
Signed-off-by: Corey Wright 
Acked-by: Jann Horn 
Signed-off-by: Jiri Slaby

ptrace: use fsuid, fsgid, effective creds for fs access checks

2016-02-24T09:23:22+00:00

commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream.

By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.

To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.

The problem was that when a privileged task had temporarily dropped its
privileges, e.g. by calling setreuid(0, user_uid), with the intent to
perform the following syscalls with the credentials of a user, it still
passed ptrace access checks that the user would not be able to pass.

While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.

In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:

 /proc/$pid/stat - uses the check for determining whether pointers
     should be visible, useful for bypassing ASLR
 /proc/$pid/maps - also useful for bypassing ASLR
 /proc/$pid/cwd - useful for gaining access to restricted
     directories that contain files with lax permissions, e.g. in
     this scenario:
     lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
     drwx------ root root /root
     drwxr-xr-x root root /root/foobar
     -rw-r--r-- root root /root/foobar/secret

Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).
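
The effect of choosing the credential set can be sketched in user space.
Everything below is a simplified model: the SKETCH_MODE_* values, struct
creds and may_access() are hypothetical, and the real check also
considers capabilities and the target's other IDs.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the kernel's definitions may differ. */
#define SKETCH_MODE_FSCREDS	0x08
#define SKETCH_MODE_REALCREDS	0x10

/* Hypothetical, reduced credential set of the calling task. */
struct creds {
	unsigned int uid;	/* real UID */
	unsigned int euid;	/* effective UID */
	unsigned int fsuid;	/* filesystem UID, follows euid */
};

/* Simplified model: root passes, otherwise the UIDs must match. */
static bool may_access(const struct creds *caller, unsigned int target_uid,
		       unsigned int mode)
{
	unsigned int caller_uid;

	/* Exactly one of the two flags must be set. */
	if (!(mode & SKETCH_MODE_FSCREDS) == !(mode & SKETCH_MODE_REALCREDS))
		return false;

	caller_uid = (mode & SKETCH_MODE_FSCREDS) ? caller->fsuid
						  : caller->uid;
	return caller_uid == 0 || caller_uid == target_uid;
}

/* A task after setreuid(0, 1000): real UID 0, effective/fs UID 1000. */
static void creds_demo(void)
{
	struct creds c = { .uid = 0, .euid = 1000, .fsuid = 1000 };

	assert(may_access(&c, 0, SKETCH_MODE_REALCREDS));  /* old behaviour */
	assert(!may_access(&c, 0, SKETCH_MODE_FSCREDS));   /* procfs check */
}
```

In this model a check based on the real UID still lets the task inspect
root's processes, while a check based on the filesystem UID denies it,
which matches the intent described above.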

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn 
Acked-by: Kees Cook 
Cc: Casey Schaufler 
Cc: Oleg Nesterov 
Cc: Ingo Molnar 
Cc: James Morris 
Cc: "Serge E. Hallyn" 
Cc: Andy Shevchenko 
Cc: Andy Lutomirski 
Cc: Al Viro 
Cc: "Eric W. Biederman" 
Cc: Willy Tarreau 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

proc: actually make proc_fd_permission() thread-friendly

2016-02-15T16:07:47+00:00

commit 54708d2858e79a2bdda10bf8a20c80eb96c20613 upstream.

The commit 96d0df79f264 ("proc: make proc_fd_permission() thread-friendly")
fixed the access to /proc/self/fd from sub-threads, but introduced another
problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd
if generic_permission() fails.

Change proc_fd_permission() to check same_thread_group(pid_task(), current).

Fixes: 96d0df79f264 ("proc: make proc_fd_permission() thread-friendly")
Reported-by: "Jin, Yihua" 
Signed-off-by: Oleg Nesterov 
Cc: "Eric W. Biederman" 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

/proc/stat: convert to single_open_size()

2015-05-15T07:10:56+00:00

commit f74373a5cc7a0155d232c4e999648c7a95435bb2 upstream.

These two patches are supposed to "fix" failed order-4 memory
allocations which have been observed when reading /proc/stat.  The
problem has been observed on s390 as well as on x86.

To address the problem, change the seq_file memory allocations to
fall back to vmalloc, so that allocations also work if memory is
fragmented.

This approach seems to be simpler and less intrusive than changing
/proc/stat to use an iterator.  It also "fixes" other users
which use seq_file's single_open() interface.

This patch (of 2):

Use seq_file's single_open_size() to preallocate a buffer that is large
enough to hold the whole output, instead of open coding it.  Also
calculate the requested size using the number of online cpus instead of
possible cpus, since the size of the output only depends on the number
of online cpus.
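
A rough sketch of such a size estimate follows. The function name
stat_buf_size and the byte budgets (a fixed base, a per-CPU amount, and
a per-interrupt amount) are assumptions chosen for illustration, not
values taken from this patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimate the buffer needed for the whole /proc/stat output.
 * All constants are illustrative assumptions.
 */
static size_t stat_buf_size(unsigned int online_cpus, unsigned int nr_irqs)
{
	size_t size = 1024;			/* fixed header lines */

	size += 128 * (size_t)online_cpus;	/* one "cpuN ..." line each */
	size += 2 * (size_t)nr_irqs;		/* at least "0 " per interrupt */
	return size;
}

/* Sizing by online rather than possible CPUs gives a smaller request. */
static void stat_demo(void)
{
	assert(stat_buf_size(4, 256) < stat_buf_size(64, 256));
}
```

On a machine with many possible but few online CPUs, the estimate based
on online CPUs is much smaller, which makes the preallocation less
likely to need a high-order allocation.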

Signed-off-by: Heiko Carstens 
Acked-by: David Rientjes 
Cc: Ian Kent 
Cc: Hendrik Brueckner 
Cc: Thorsten Diehl 
Cc: Andrea Righi 
Cc: Christoph Hellwig 
Cc: Al Viro 
Cc: Stefan Bader 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

proc/pagemap: walk page tables under pte lock

2015-04-27T18:38:16+00:00

commit 05fbf357d94152171bc50f8a369390f1f16efd89 upstream.

Lockless access to pte in pagemap_pte_range() might race with page
migration and trigger BUG_ON(!PageLocked()) in migration_entry_to_page():

CPU A (pagemap)                           CPU B (migration)
                                          lock_page()
                                          try_to_unmap(page, TTU_MIGRATION...)
                                               make_migration_entry()
                                               set_pte_at()

pte_to_pagemap_entry()
                                          remove_migration_ptes()
                                          unlock_page()
    if(is_migration_entry())
        migration_entry_to_page()
            BUG_ON(!PageLocked(page))

Also, a lockless read might be non-atomic if the pte is larger than the
word size. Other pte walkers (smaps, numa_maps, clear_refs) already lock
ptes.

Fixes: 052fb0d635df ("proc: report file/anon bit in /proc/pid/pagemap")
Signed-off-by: Konstantin Khlebnikov 
Reported-by: Andrey Ryabinin 
Reviewed-by: Cyrill Gorcunov 
Acked-by: Naoya Horiguchi 
Acked-by: Kirill A. Shutemov 
Cc: 	[3.5+]
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

mm: softdirty: unmapped addresses between VMAs are clean

2015-04-27T18:25:29+00:00

commit 81d0fa623c5b8dbd5279d9713094b0f9b0a00fb4 upstream.

If a /proc/pid/pagemap read spans a [VMA, an unmapped region, then a
VM_SOFTDIRTY VMA], the virtual pages in the unmapped region are reported
as softdirty.  Here's a program to demonstrate the bug:

#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
	const uint64_t PAGEMAP_SOFTDIRTY = 1ul << 55;
	uint64_t pme[3];
	int fd = open("/proc/self/pagemap", O_RDONLY);
	/* Map three pages, then punch a hole in the middle one. */
	char *m = mmap(NULL, 3 * getpagesize(), PROT_READ,
	               MAP_ANONYMOUS | MAP_SHARED, -1, 0);
	munmap(m + getpagesize(), getpagesize());
	/* Read the three 8-byte pagemap entries covering the range. */
	pread(fd, pme, 24, (unsigned long) m / getpagesize() * 8);
	assert(pme[0] & PAGEMAP_SOFTDIRTY);    /* passes */
	assert(!(pme[1] & PAGEMAP_SOFTDIRTY)); /* fails */
	assert(pme[2] & PAGEMAP_SOFTDIRTY);    /* passes */
	return 0;
}

(Note that all pages in new VMAs are softdirty until cleared).

Tested:
	Used the program given above. I'm going to include this code in
	a selftest in the future.

[n-horiguchi@ah.jp.nec.com: prevent pagemap_pte_range() from overrunning]
Signed-off-by: Peter Feiner 
Cc: "Kirill A. Shutemov" 
Cc: Cyrill Gorcunov 
Cc: Pavel Emelyanov 
Cc: Jamie Liu 
Cc: Hugh Dickins 
Cc: Naoya Horiguchi 
Signed-off-by: Naoya Horiguchi 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

pagemap: do not leak physical addresses to non-privileged userspace

2015-04-09T11:14:15+00:00

commit ab676b7d6fbf4b294bf198fb27ade5b0e865c7ce upstream.

As pointed out by a recent post[1] on exploiting DRAM physical
imperfections, /proc/PID/pagemap exposes sensitive information which can
be used to mount attacks.

This prevents anybody without CAP_SYS_ADMIN from reading the pagemap.

[1] http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html

[ Eventually we might want to do something more fine-grained, but for now
  this is the simple model.   - Linus ]

Signed-off-by: Kirill A. Shutemov 
Acked-by: Konstantin Khlebnikov 
Acked-by: Andy Lutomirski 
Cc: Pavel Emelyanov 
Cc: Andrew Morton 
Cc: Mark Seaborn 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

procfs: fix race between symlink removals and traversals

2015-03-12T16:31:20+00:00

commit 7e0e953bb0cf649f93277ac8fb67ecbb7f7b04a9 upstream.

Call use_pde()/unuse_pde() in ->follow_link()/->put_link(), respectively.

Signed-off-by: Al Viro 
Signed-off-by: Jiri Slaby