linux.git/fs/exec.c, branch v2.6.39

Small typo fix...

2011-03-21T04:16:09+00:00

Hi,

I was backporting the coredump over pipe feature and noticed this small typo,
I wish I would have something bigger to contribute...

>From 15d6080e0ed4267da103c706917a33b1015e8804 Mon Sep 17 00:00:00 2001
From: Holger Hans Peter Freyther 
Date: Thu, 24 Feb 2011 17:42:50 +0100
Subject: [PATCH] fs: Fix a small typo in the comment

The function is called umh_pipe_setup not uhm_pipe_setup.

Signed-off-by: Holger Hans Peter Freyther 
Signed-off-by: Al Viro

switch do_filp_open() to struct open_flags

2011-03-14T13:15:25+00:00

take calculation of open_flags by open(2) arguments into new helper
in fs/open.c, move filp_open() over there, have it and do_sys_open()
use that helper, switch exec.c callers of do_filp_open() to explicit
(and constant) struct open_flags.

Signed-off-by: Al Viro

vfs: sparse: add __FMODE_EXEC

2011-02-03T00:03:19+00:00

FMODE_EXEC is a constant type of fmode_t but was used with normal integer
constants.  This results in following warnings from sparse.  Fix it using
new macro __FMODE_EXEC.

 fs/exec.c:116:58: warning: restricted fmode_t degrades to integer
 fs/exec.c:689:58: warning: restricted fmode_t degrades to integer
 fs/fcntl.c:777:9: warning: restricted fmode_t degrades to integer

Signed-off-by: Namhyung Kim 
Cc: Al Viro 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

install_special_mapping skips security_file_mmap check.

2010-12-15T20:30:36+00:00

The install_special_mapping routine (used, for example, to setup the
vdso) skips the security check before insert_vm_struct, allowing a local
attacker to bypass the mmap_min_addr security restriction by limiting
the available pages for special mappings.

bprm_mm_init() also skips the check, and although I don't think this can
be used to bypass any restrictions, I don't see any reason not to have
the security check.

  $ uname -m
  x86_64
  $ cat /proc/sys/vm/mmap_min_addr
  65536
  $ cat install_special_mapping.s
  section .bss
      resb BSS_SIZE
  section .text
      global _start
      _start:
          mov     eax, __NR_pause
          int     0x80
  $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
  $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
  $ ./install_special_mapping &
  [1] 14303
  $ cat /proc/14303/maps
  0000f000-00010000 r-xp 00000000 00:00 0                                  [vdso]
  00010000-00011000 r-xp 00001000 00:19 2453665                            /home/taviso/install_special_mapping
  00011000-ffffe000 rwxp 00000000 00:00 0                                  [stack]

It's worth noting that Red Hat are shipping with mmap_min_addr set to
4096.

Signed-off-by: Tavis Ormandy 
Acked-by: Kees Cook 
Acked-by: Robert Swiecki 
[ Changed to not drop the error code - akpm ]
Reviewed-by: James Morris 
Signed-off-by: Linus Torvalds

exec: copy-and-paste the fixes into compat_do_execve() paths

2010-12-01T01:56:38+00:00

Note: this patch targets 2.6.37 and tries to be as simple as possible.
That is why it adds more copy-and-paste horror into fs/compat.c and
uglifies fs/exec.c, this will be cleanuped later.

compat_copy_strings() plays with bprm->vma/mm directly and thus has
two problems: it lacks the RLIMIT_STACK check and argv/envp memory
is not visible to oom killer.

Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
as do_execve() does.

Add the fatal_signal_pending/cond_resched checks into compat_count() and
compat_copy_strings(), this matches the code in fs/exec.c and certainly
makes sense.

Signed-off-by: Oleg Nesterov 
Cc: KOSAKI Motohiro 
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

exec: make argv/envp memory visible to oom-killer

2010-12-01T01:56:37+00:00

Brad Spengler published a local memory-allocation DoS that
evades the OOM-killer (though not the virtual memory RLIMIT):
http://www.grsecurity.net/~spender/64bit_dos.c

execve()->copy_strings() can allocate a lot of memory, but
this is not visible to oom-killer, nobody can see the nascent
bprm->mm and take it into account.

With this patch get_arg_page() increments current's MM_ANONPAGES
counter every time we allocate the new page for argv/envp. When
do_execve() succeds or fails, we change this counter back.

Technically this is not 100% correct, we can't know if the new
page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but
I don't think this really matters and everything becomes correct
once exec changes ->mm or fails.

Reported-by: Brad Spengler 
Reviewed-and-discussed-by: KOSAKI Motohiro 
Signed-off-by: Oleg Nesterov 
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

exec: don't turn PF_KTHREAD off when a target command was not found

2010-10-28T01:03:13+00:00

Presently do_execve() turns PF_KTHREAD off before search_binary_handler().
 THis has a theorical risk of PF_KTHREAD getting lost.  We don't have to
turn PF_KTHREAD off in the ENOEXEC case.

This patch moves this flag modification to after the finding of the
executable file.

This is only a theorical issue because kthreads do not call do_execve()
directly.  But fixing would be better.

Signed-off-by: KOSAKI Motohiro 
Acked-by: Roland McGrath 
Acked-by: Oleg Nesterov 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

core_pattern: fix truncation by core_pattern handler with long parameters

2010-10-28T01:03:12+00:00

We met a parameter truncated issue, consider following:
> echo "|/root/core_pattern_pipe_test %p /usr/libexec/blah-blah-blah \
%s %c %p %u %g 11 12345678901234567890123456789012345678 %t" > \
/proc/sys/kernel/core_pattern

This is okay because the strings is less than CORENAME_MAX_SIZE.  "cat
/proc/sys/kernel/core_pattern" shows the whole string.  but after we run
core_pattern_pipe_test in man page, we found last parameter was truncated
like below:

        argc[10]=<12807486>

The root cause is core_pattern allows % specifiers, which need to be
replaced during parse time, but the replace may expand the strings to
larger than CORENAME_MAX_SIZE.  So if the last parameter is % specifiers,
the replace code is using snprintf(out_ptr, out_end - out_ptr, ...), this
will write out of corename array.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Xiaotian Feng 
Cc: Alexander Viro 
Cc: Oleg Nesterov 
Cc: KOSAKI Motohiro 
Reviewed-by: Neil Horman 
Cc: Roland McGrath 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

signals: move cred_guard_mutex from task_struct to signal_struct

2010-10-28T01:03:12+00:00

Oleg Nesterov pointed out we have to prevent multiple-threads-inside-exec
itself and we can reuse ->cred_guard_mutex for it.  Yes, concurrent
execve() has no worth.

Let's move ->cred_guard_mutex from task_struct to signal_struct.  It
naturally prevent multiple-threads-inside-exec.

Signed-off-by: KOSAKI Motohiro 
Reviewed-by: Oleg Nesterov 
Acked-by: Roland McGrath 
Acked-by: David Howells 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

oom: add per-mm oom disable count

2010-10-26T23:52:05+00:00

It's pointless to kill a task if another thread sharing its mm cannot be
killed to allow future memory freeing.  A subsequent patch will prevent
kills in such cases, but first it's necessary to have a way to flag a task
that shares memory with an OOM_DISABLE task that doesn't incur an
additional tasklist scan, which would make select_bad_process() an O(n^2)
function.

This patch adds an atomic counter to struct mm_struct that follows how
many threads attached to it have an oom_score_adj of OOM_SCORE_ADJ_MIN.
They cannot be killed by the kernel, so their memory cannot be freed in
oom conditions.

This only requires task_lock() on the task that we're operating on, it
does not require mm->mmap_sem since task_lock() pins the mm and the
operation is atomic.

[rientjes@google.com: changelog and sys_unshare() code]
[rientjes@google.com: protect oom_disable_count with task_lock in fork]
[rientjes@google.com: use old_mm for oom_disable_count in exec]
Signed-off-by: Ying Han 
Signed-off-by: David Rientjes 
Cc: KAMEZAWA Hiroyuki 
Cc: KOSAKI Motohiro 
Cc: Rik van Riel 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds