linux-stable.git/fs/exec.c, branch linux-2.6.32.y

fs: take i_mutex during prepare_binprm for set[ug]id executables

2015-05-24T08:10:44+00:00

commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream.

This prevents a race between chown() and execve(), where chowning a
setuid-user binary to root would momentarily make the binary setuid
root.

This patch was mostly written by Linus Torvalds.

Signed-off-by: Jann Horn 
Signed-off-by: Linus Torvalds 
[bwh: Backported to 3.2:
 - Drop the task_no_new_privs() and user namespace checks
 - Open-code file_inode()
 - s/READ_ONCE/ACCESS_ONCE/
 - Adjust context]
Signed-off-by: Ben Hutchings 
(cherry picked from commit 470e517be17dd6ef8670bec7bd7831ea0d3ad8a6)

Signed-off-by: Willy Tarreau

exec/ptrace: fix get_dumpable() incorrect tests

2014-05-19T05:54:30+00:00

commit d049f74f2dbe71354d43d393ac3a188947811348 upstream

The get_dumpable() return value is not boolean.  Most users of the
function actually want to be testing for non-SUID_DUMP_USER(1) rather than
SUID_DUMP_DISABLE(0).  The SUID_DUMP_ROOT(2) is also considered a
protected state.  Almost all places did this correctly, excepting the two
places fixed in this patch.

Wrong logic:
    if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ }
        or
    if (dumpable == 0) { /* be protective */ }
        or
    if (!dumpable) { /* be protective */ }

Correct logic:
    if (dumpable != SUID_DUMP_USER) { /* be protective */ }
        or
    if (dumpable != 1) { /* be protective */ }

Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a
user was able to ptrace attach to processes that had dropped privileges to
that user.  (This may have been partially mitigated if Yama was enabled.)

The macros have been moved into the file that declares get/set_dumpable(),
which means things like the ia64 code can see them too.

CVE-2013-2929

Reported-by: Vasily Kulikov 
Signed-off-by: Kees Cook 
Cc: Tony Luck 
Cc: Oleg Nesterov 
Cc: "Eric W. Biederman" 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau

exec: use -ELOOP for max recursion depth

2013-06-10T09:42:18+00:00

commit d740269867021faf4ce38a449353d2b986c34a67 upstream

To avoid an explosion of request_module calls on a chain of abusive
scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
as maximum recursion depth is hit, the error will fail all the way back
up the chain, aborting immediately.

This also has the side-effect of stopping the user's shell from attempting
to reexecute the top-level file as a shell script. As seen in the
dash source:

        if (cmd != path_bshell && errno == ENOEXEC) {
                *argv-- = cmd;
                *argv = cmd = path_bshell;
                goto repeat;
        }

The above logic was designed for running scripts automatically that lacked
the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
things continue to behave as the shell expects.

Additionally, when tracking recursion, the binfmt handlers should not be
involved. The recursion being tracked is the depth of calls through
search_binary_handler(), so that function should be exclusively responsible
for tracking the depth.

Signed-off-by: Kees Cook 
Cc: halfdog 
Cc: P J P 
Cc: Alexander Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau

exec: do not leave bprm->interp on stack

2013-06-10T09:42:17+00:00

commit b66c5984017533316fd1951770302649baf1aa33 upstream

If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.

Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules.  Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted.  They leave bprm->interp
pointing to their local stack.  This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.

After additional study, it seems that both recursion and restart remains
the desirable way to handle exec with scripts, misc, and modules.  As
such, we need to protect the changes to interp.

This changes the logic to require allocation for any changes to the
bprm->interp.  To avoid adding a new kmalloc to every exec, the default
value is left as-is.  Only when passing through binfmt_script or
binfmt_misc does an allocation take place.

For a proof of concept, see DoTest.sh from:

   http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

Signed-off-by: Kees Cook 
Cc: halfdog 
Cc: P J P 
Cc: Alexander Viro 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[dannf: backported to Debian's 2.6.32]
Signed-off-by: Willy Tarreau

exec: delay address limit change until point of no return

2011-06-23T22:24:09+00:00

commit dac853ae89043f1b7752875300faf614de43c74b upstream.

Unconditionally changing the address limit to USER_DS and not restoring
it to its old value in the error path is wrong because it prevents us
using kernel memory on repeated calls to this function.  This, in fact,
breaks the fallback of hard coded paths to the init program from being
ever successful if the first candidate fails to load.

With this patch applied switching to USER_DS is delayed until the point
of no return is reached which makes it possible to have a multi-arch
rootfs with one arch specific init binary for each of the (hard coded)
probed paths.

Since the address limit is already set to USER_DS when start_thread()
will be invoked, this redundancy can be safely removed.

Signed-off-by: Mathias Krause 
Cc: Al Viro 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

exec: copy-and-paste the fixes into compat_do_execve() paths

2011-04-14T23:53:55+00:00

commit 114279be2120a916e8a04feeb2ac976a10016f2f upstream.

Note: this patch targets 2.6.37 and tries to be as simple as possible.
That is why it adds more copy-and-paste horror into fs/compat.c and
uglifies fs/exec.c, this will be cleanuped later.

compat_copy_strings() plays with bprm->vma/mm directly and thus has
two problems: it lacks the RLIMIT_STACK check and argv/envp memory
is not visible to oom killer.

Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
as do_execve() does.

Add the fatal_signal_pending/cond_resched checks into compat_count() and
compat_copy_strings(), this matches the code in fs/exec.c and certainly
makes sense.

Signed-off-by: Oleg Nesterov 
Cc: KOSAKI Motohiro 
Signed-off-by: Linus Torvalds 
Signed-off-by: Andi Kleen 
Cc: Moritz Muehlenhoff 
Signed-off-by: Greg Kroah-Hartman

exec: make argv/envp memory visible to oom-killer

2011-04-14T23:53:55+00:00

commit 3c77f845722158206a7209c45ccddc264d19319c upstream.

Brad Spengler published a local memory-allocation DoS that
evades the OOM-killer (though not the virtual memory RLIMIT):
http://www.grsecurity.net/~spender/64bit_dos.c

execve()->copy_strings() can allocate a lot of memory, but
this is not visible to oom-killer, nobody can see the nascent
bprm->mm and take it into account.

With this patch get_arg_page() increments current's MM_ANONPAGES
counter every time we allocate the new page for argv/envp. When
do_execve() succeds or fails, we change this counter back.

Technically this is not 100% correct, we can't know if the new
page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but
I don't think this really matters and everything becomes correct
once exec changes ->mm or fails.

Reported-by: Brad Spengler 
Reviewed-and-discussed-by: KOSAKI Motohiro 
Signed-off-by: Oleg Nesterov 
Signed-off-by: Linus Torvalds 
Cc: Moritz Muehlenhoff 
Signed-off-by: Greg Kroah-Hartman

install_special_mapping skips security_file_mmap check.

2011-01-07T22:43:14+00:00

commit 462e635e5b73ba9a4c03913b77138cd57ce4b050 upstream.

The install_special_mapping routine (used, for example, to setup the
vdso) skips the security check before insert_vm_struct, allowing a local
attacker to bypass the mmap_min_addr security restriction by limiting
the available pages for special mappings.

bprm_mm_init() also skips the check, and although I don't think this can
be used to bypass any restrictions, I don't see any reason not to have
the security check.

  $ uname -m
  x86_64
  $ cat /proc/sys/vm/mmap_min_addr
  65536
  $ cat install_special_mapping.s
  section .bss
      resb BSS_SIZE
  section .text
      global _start
      _start:
          mov     eax, __NR_pause
          int     0x80
  $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
  $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
  $ ./install_special_mapping &
  [1] 14303
  $ cat /proc/14303/maps
  0000f000-00010000 r-xp 00000000 00:00 0                                  [vdso]
  00010000-00011000 r-xp 00001000 00:19 2453665                            /home/taviso/install_special_mapping
  00011000-ffffe000 rwxp 00000000 00:00 0                                  [stack]

It's worth noting that Red Hat are shipping with mmap_min_addr set to
4096.

Signed-off-by: Tavis Ormandy 
Acked-by: Kees Cook 
Acked-by: Robert Swiecki 
[ Changed to not drop the error code - akpm ]
Reviewed-by: James Morris 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

execve: make responsive to SIGKILL with large arguments

2010-10-29T04:44:17+00:00

commit 9aea5a65aa7a1af9a4236dfaeb0088f1624f9919 upstream.

An execve with a very large total of argument/environment strings
can take a really long time in the execve system call.  It runs
uninterruptibly to count and copy all the strings.  This change
makes it abort the exec quickly if sent a SIGKILL.

Note that this is the conservative change, to interrupt only for
SIGKILL, by using fatal_signal_pending().  It would be perfectly
correct semantics to let any signal interrupt the string-copying in
execve, i.e. use signal_pending() instead of fatal_signal_pending().
We'll save that change for later, since it could have user-visible
consequences, such as having a timer set too quickly make it so that
an execve can never complete, though it always happened to work before.

Signed-off-by: Roland McGrath 
Reviewed-by: KOSAKI Motohiro 
Cc: Chuck Ebbert 
Signed-off-by: Linus Torvalds

execve: improve interactivity with large arguments

2010-10-29T04:44:17+00:00

commit 7993bc1f4663c0db67bb8f0d98e6678145b387cd upstream.

This adds a preemption point during the copying of the argument and
environment strings for execve, in copy_strings().  There is already
a preemption point in the count() loop, so this doesn't add any new
points in the abstract sense.

When the total argument+environment strings are very large, the time
spent copying them can be much more than a normal user time slice.
So this change improves the interactivity of the rest of the system
when one process is doing an execve with very large arguments.

Signed-off-by: Roland McGrath 
Reviewed-by: KOSAKI Motohiro 
Signed-off-by: Linus Torvalds 
Cc: Chuck Ebbert 
Signed-off-by: Greg Kroah-Hartman