<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/exec.c, branch v2.6.39</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Small typo fix...</title>
<updated>2011-03-21T04:16:09+00:00</updated>
<author>
<name>Holger Hans Peter Freyther</name>
<email>holger@freyther.de</email>
</author>
<published>2011-02-24T16:46:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1bef82917c74249ff21982127e57defd6ca2bb1b'/>
<id>1bef82917c74249ff21982127e57defd6ca2bb1b</id>
<content type='text'>
Hi,

I was backporting the coredump over pipe feature and noticed this small typo,
I wish I would have something bigger to contribute...

&gt;From 15d6080e0ed4267da103c706917a33b1015e8804 Mon Sep 17 00:00:00 2001
From: Holger Hans Peter Freyther &lt;holger@moiji-mobile.com&gt;
Date: Thu, 24 Feb 2011 17:42:50 +0100
Subject: [PATCH] fs: Fix a small typo in the comment

The function is called umh_pipe_setup not uhm_pipe_setup.

Signed-off-by: Holger Hans Peter Freyther &lt;holger@moiji-mobile.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hi,

I was backporting the coredump over pipe feature and noticed this small typo,
I wish I would have something bigger to contribute...

&gt;From 15d6080e0ed4267da103c706917a33b1015e8804 Mon Sep 17 00:00:00 2001
From: Holger Hans Peter Freyther &lt;holger@moiji-mobile.com&gt;
Date: Thu, 24 Feb 2011 17:42:50 +0100
Subject: [PATCH] fs: Fix a small typo in the comment

The function is called umh_pipe_setup not uhm_pipe_setup.

Signed-off-by: Holger Hans Peter Freyther &lt;holger@moiji-mobile.com&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>switch do_filp_open() to struct open_flags</title>
<updated>2011-03-14T13:15:25+00:00</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2011-02-23T22:44:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=47c805dc2d2dff686962f5f0baa6bac2d703ba19'/>
<id>47c805dc2d2dff686962f5f0baa6bac2d703ba19</id>
<content type='text'>
take calculation of open_flags by open(2) arguments into new helper
in fs/open.c, move filp_open() over there, have it and do_sys_open()
use that helper, switch exec.c callers of do_filp_open() to explicit
(and constant) struct open_flags.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
take calculation of open_flags by open(2) arguments into new helper
in fs/open.c, move filp_open() over there, have it and do_sys_open()
use that helper, switch exec.c callers of do_filp_open() to explicit
(and constant) struct open_flags.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>vfs: sparse: add __FMODE_EXEC</title>
<updated>2011-02-03T00:03:19+00:00</updated>
<author>
<name>Namhyung Kim</name>
<email>namhyung@gmail.com</email>
</author>
<published>2011-02-01T23:52:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3cd90ea42f2c15f928b70ed66f6d8ed0a8e7aadd'/>
<id>3cd90ea42f2c15f928b70ed66f6d8ed0a8e7aadd</id>
<content type='text'>
FMODE_EXEC is a constant type of fmode_t but was used with normal integer
constants.  This results in following warnings from sparse.  Fix it using
new macro __FMODE_EXEC.

 fs/exec.c:116:58: warning: restricted fmode_t degrades to integer
 fs/exec.c:689:58: warning: restricted fmode_t degrades to integer
 fs/fcntl.c:777:9: warning: restricted fmode_t degrades to integer

Signed-off-by: Namhyung Kim &lt;namhyung@gmail.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
FMODE_EXEC is a constant type of fmode_t but was used with normal integer
constants.  This results in following warnings from sparse.  Fix it using
new macro __FMODE_EXEC.

 fs/exec.c:116:58: warning: restricted fmode_t degrades to integer
 fs/exec.c:689:58: warning: restricted fmode_t degrades to integer
 fs/fcntl.c:777:9: warning: restricted fmode_t degrades to integer

Signed-off-by: Namhyung Kim &lt;namhyung@gmail.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>install_special_mapping skips security_file_mmap check.</title>
<updated>2010-12-15T20:30:36+00:00</updated>
<author>
<name>Tavis Ormandy</name>
<email>taviso@cmpxchg8b.com</email>
</author>
<published>2010-12-09T14:29:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=462e635e5b73ba9a4c03913b77138cd57ce4b050'/>
<id>462e635e5b73ba9a4c03913b77138cd57ce4b050</id>
<content type='text'>
The install_special_mapping routine (used, for example, to setup the
vdso) skips the security check before insert_vm_struct, allowing a local
attacker to bypass the mmap_min_addr security restriction by limiting
the available pages for special mappings.

bprm_mm_init() also skips the check, and although I don't think this can
be used to bypass any restrictions, I don't see any reason not to have
the security check.

  $ uname -m
  x86_64
  $ cat /proc/sys/vm/mmap_min_addr
  65536
  $ cat install_special_mapping.s
  section .bss
      resb BSS_SIZE
  section .text
      global _start
      _start:
          mov     eax, __NR_pause
          int     0x80
  $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
  $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
  $ ./install_special_mapping &amp;
  [1] 14303
  $ cat /proc/14303/maps
  0000f000-00010000 r-xp 00000000 00:00 0                                  [vdso]
  00010000-00011000 r-xp 00001000 00:19 2453665                            /home/taviso/install_special_mapping
  00011000-ffffe000 rwxp 00000000 00:00 0                                  [stack]

It's worth noting that Red Hat are shipping with mmap_min_addr set to
4096.

Signed-off-by: Tavis Ormandy &lt;taviso@google.com&gt;
Acked-by: Kees Cook &lt;kees@ubuntu.com&gt;
Acked-by: Robert Swiecki &lt;swiecki@google.com&gt;
[ Changed to not drop the error code - akpm ]
Reviewed-by: James Morris &lt;jmorris@namei.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The install_special_mapping routine (used, for example, to setup the
vdso) skips the security check before insert_vm_struct, allowing a local
attacker to bypass the mmap_min_addr security restriction by limiting
the available pages for special mappings.

bprm_mm_init() also skips the check, and although I don't think this can
be used to bypass any restrictions, I don't see any reason not to have
the security check.

  $ uname -m
  x86_64
  $ cat /proc/sys/vm/mmap_min_addr
  65536
  $ cat install_special_mapping.s
  section .bss
      resb BSS_SIZE
  section .text
      global _start
      _start:
          mov     eax, __NR_pause
          int     0x80
  $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
  $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
  $ ./install_special_mapping &amp;
  [1] 14303
  $ cat /proc/14303/maps
  0000f000-00010000 r-xp 00000000 00:00 0                                  [vdso]
  00010000-00011000 r-xp 00001000 00:19 2453665                            /home/taviso/install_special_mapping
  00011000-ffffe000 rwxp 00000000 00:00 0                                  [stack]

It's worth noting that Red Hat are shipping with mmap_min_addr set to
4096.

Signed-off-by: Tavis Ormandy &lt;taviso@google.com&gt;
Acked-by: Kees Cook &lt;kees@ubuntu.com&gt;
Acked-by: Robert Swiecki &lt;swiecki@google.com&gt;
[ Changed to not drop the error code - akpm ]
Reviewed-by: James Morris &lt;jmorris@namei.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>exec: copy-and-paste the fixes into compat_do_execve() paths</title>
<updated>2010-12-01T01:56:38+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-11-30T19:56:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=114279be2120a916e8a04feeb2ac976a10016f2f'/>
<id>114279be2120a916e8a04feeb2ac976a10016f2f</id>
<content type='text'>
Note: this patch targets 2.6.37 and tries to be as simple as possible.
That is why it adds more copy-and-paste horror into fs/compat.c and
uglifies fs/exec.c, this will be cleanuped later.

compat_copy_strings() plays with bprm-&gt;vma/mm directly and thus has
two problems: it lacks the RLIMIT_STACK check and argv/envp memory
is not visible to oom killer.

Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
as do_execve() does.

Add the fatal_signal_pending/cond_resched checks into compat_count() and
compat_copy_strings(), this matches the code in fs/exec.c and certainly
makes sense.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Note: this patch targets 2.6.37 and tries to be as simple as possible.
That is why it adds more copy-and-paste horror into fs/compat.c and
uglifies fs/exec.c, this will be cleanuped later.

compat_copy_strings() plays with bprm-&gt;vma/mm directly and thus has
two problems: it lacks the RLIMIT_STACK check and argv/envp memory
is not visible to oom killer.

Export acct_arg_size() and get_arg_page(), change compat_copy_strings()
to use get_arg_page(), change compat_do_execve() to do acct_arg_size(0)
as do_execve() does.

Add the fatal_signal_pending/cond_resched checks into compat_count() and
compat_copy_strings(), this matches the code in fs/exec.c and certainly
makes sense.

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>exec: make argv/envp memory visible to oom-killer</title>
<updated>2010-12-01T01:56:37+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2010-11-30T19:55:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3c77f845722158206a7209c45ccddc264d19319c'/>
<id>3c77f845722158206a7209c45ccddc264d19319c</id>
<content type='text'>
Brad Spengler published a local memory-allocation DoS that
evades the OOM-killer (though not the virtual memory RLIMIT):
http://www.grsecurity.net/~spender/64bit_dos.c

execve()-&gt;copy_strings() can allocate a lot of memory, but
this is not visible to oom-killer, nobody can see the nascent
bprm-&gt;mm and take it into account.

With this patch get_arg_page() increments current's MM_ANONPAGES
counter every time we allocate the new page for argv/envp. When
do_execve() succeds or fails, we change this counter back.

Technically this is not 100% correct, we can't know if the new
page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but
I don't think this really matters and everything becomes correct
once exec changes -&gt;mm or fails.

Reported-by: Brad Spengler &lt;spender@grsecurity.net&gt;
Reviewed-and-discussed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Brad Spengler published a local memory-allocation DoS that
evades the OOM-killer (though not the virtual memory RLIMIT):
http://www.grsecurity.net/~spender/64bit_dos.c

execve()-&gt;copy_strings() can allocate a lot of memory, but
this is not visible to oom-killer, nobody can see the nascent
bprm-&gt;mm and take it into account.

With this patch get_arg_page() increments current's MM_ANONPAGES
counter every time we allocate the new page for argv/envp. When
do_execve() succeds or fails, we change this counter back.

Technically this is not 100% correct, we can't know if the new
page is swapped out and turn MM_ANONPAGES into MM_SWAPENTS, but
I don't think this really matters and everything becomes correct
once exec changes -&gt;mm or fails.

Reported-by: Brad Spengler &lt;spender@grsecurity.net&gt;
Reviewed-and-discussed-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>exec: don't turn PF_KTHREAD off when a target command was not found</title>
<updated>2010-10-28T01:03:13+00:00</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2010-10-27T22:34:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=98391cf4dcf893e9e74e1c14189851dbc9c5ad0d'/>
<id>98391cf4dcf893e9e74e1c14189851dbc9c5ad0d</id>
<content type='text'>
Presently do_execve() turns PF_KTHREAD off before search_binary_handler().
 THis has a theorical risk of PF_KTHREAD getting lost.  We don't have to
turn PF_KTHREAD off in the ENOEXEC case.

This patch moves this flag modification to after the finding of the
executable file.

This is only a theorical issue because kthreads do not call do_execve()
directly.  But fixing would be better.

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Presently do_execve() turns PF_KTHREAD off before search_binary_handler().
 THis has a theorical risk of PF_KTHREAD getting lost.  We don't have to
turn PF_KTHREAD off in the ENOEXEC case.

This patch moves this flag modification to after the finding of the
executable file.

This is only a theorical issue because kthreads do not call do_execve()
directly.  But fixing would be better.

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>core_pattern: fix truncation by core_pattern handler with long parameters</title>
<updated>2010-10-28T01:03:12+00:00</updated>
<author>
<name>Xiaotian Feng</name>
<email>dfeng@redhat.com</email>
</author>
<published>2010-10-27T22:34:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1b0d300bd0f047e2edaf9d4b6784189e6c67c3d1'/>
<id>1b0d300bd0f047e2edaf9d4b6784189e6c67c3d1</id>
<content type='text'>
We met a parameter truncated issue, consider following:
&gt; echo "|/root/core_pattern_pipe_test %p /usr/libexec/blah-blah-blah \
%s %c %p %u %g 11 12345678901234567890123456789012345678 %t" &gt; \
/proc/sys/kernel/core_pattern

This is okay because the strings is less than CORENAME_MAX_SIZE.  "cat
/proc/sys/kernel/core_pattern" shows the whole string.  but after we run
core_pattern_pipe_test in man page, we found last parameter was truncated
like below:

        argc[10]=&lt;12807486&gt;

The root cause is core_pattern allows % specifiers, which need to be
replaced during parse time, but the replace may expand the strings to
larger than CORENAME_MAX_SIZE.  So if the last parameter is % specifiers,
the replace code is using snprintf(out_ptr, out_end - out_ptr, ...), this
will write out of corename array.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Xiaotian Feng &lt;dfeng@redhat.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We met a parameter truncated issue, consider following:
&gt; echo "|/root/core_pattern_pipe_test %p /usr/libexec/blah-blah-blah \
%s %c %p %u %g 11 12345678901234567890123456789012345678 %t" &gt; \
/proc/sys/kernel/core_pattern

This is okay because the strings is less than CORENAME_MAX_SIZE.  "cat
/proc/sys/kernel/core_pattern" shows the whole string.  but after we run
core_pattern_pipe_test in man page, we found last parameter was truncated
like below:

        argc[10]=&lt;12807486&gt;

The root cause is core_pattern allows % specifiers, which need to be
replaced during parse time, but the replace may expand the strings to
larger than CORENAME_MAX_SIZE.  So if the last parameter is % specifiers,
the replace code is using snprintf(out_ptr, out_end - out_ptr, ...), this
will write out of corename array.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Xiaotian Feng &lt;dfeng@redhat.com&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Neil Horman &lt;nhorman@tuxdriver.com&gt;
Cc: Roland McGrath &lt;roland@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>signals: move cred_guard_mutex from task_struct to signal_struct</title>
<updated>2010-10-28T01:03:12+00:00</updated>
<author>
<name>KOSAKI Motohiro</name>
<email>kosaki.motohiro@jp.fujitsu.com</email>
</author>
<published>2010-10-27T22:34:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9b1bf12d5d51bca178dea21b04a0805e29d60cf1'/>
<id>9b1bf12d5d51bca178dea21b04a0805e29d60cf1</id>
<content type='text'>
Oleg Nesterov pointed out we have to prevent multiple-threads-inside-exec
itself and we can reuse -&gt;cred_guard_mutex for it.  Yes, concurrent
execve() has no worth.

Let's move -&gt;cred_guard_mutex from task_struct to signal_struct.  It
naturally prevent multiple-threads-inside-exec.

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Oleg Nesterov pointed out we have to prevent multiple-threads-inside-exec
itself and we can reuse -&gt;cred_guard_mutex for it.  Yes, concurrent
execve() has no worth.

Let's move -&gt;cred_guard_mutex from task_struct to signal_struct.  It
naturally prevent multiple-threads-inside-exec.

Signed-off-by: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Reviewed-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Roland McGrath &lt;roland@redhat.com&gt;
Acked-by: David Howells &lt;dhowells@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>oom: add per-mm oom disable count</title>
<updated>2010-10-26T23:52:05+00:00</updated>
<author>
<name>Ying Han</name>
<email>yinghan@google.com</email>
</author>
<published>2010-10-26T21:21:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3d5992d2ac7dc09aed8ab537cba074589f0f0a52'/>
<id>3d5992d2ac7dc09aed8ab537cba074589f0f0a52</id>
<content type='text'>
It's pointless to kill a task if another thread sharing its mm cannot be
killed to allow future memory freeing.  A subsequent patch will prevent
kills in such cases, but first it's necessary to have a way to flag a task
that shares memory with an OOM_DISABLE task that doesn't incur an
additional tasklist scan, which would make select_bad_process() an O(n^2)
function.

This patch adds an atomic counter to struct mm_struct that follows how
many threads attached to it have an oom_score_adj of OOM_SCORE_ADJ_MIN.
They cannot be killed by the kernel, so their memory cannot be freed in
oom conditions.

This only requires task_lock() on the task that we're operating on, it
does not require mm-&gt;mmap_sem since task_lock() pins the mm and the
operation is atomic.

[rientjes@google.com: changelog and sys_unshare() code]
[rientjes@google.com: protect oom_disable_count with task_lock in fork]
[rientjes@google.com: use old_mm for oom_disable_count in exec]
Signed-off-by: Ying Han &lt;yinghan@google.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It's pointless to kill a task if another thread sharing its mm cannot be
killed to allow future memory freeing.  A subsequent patch will prevent
kills in such cases, but first it's necessary to have a way to flag a task
that shares memory with an OOM_DISABLE task that doesn't incur an
additional tasklist scan, which would make select_bad_process() an O(n^2)
function.

This patch adds an atomic counter to struct mm_struct that follows how
many threads attached to it have an oom_score_adj of OOM_SCORE_ADJ_MIN.
They cannot be killed by the kernel, so their memory cannot be freed in
oom conditions.

This only requires task_lock() on the task that we're operating on, it
does not require mm-&gt;mmap_sem since task_lock() pins the mm and the
operation is atomic.

[rientjes@google.com: changelog and sys_unshare() code]
[rientjes@google.com: protect oom_disable_count with task_lock in fork]
[rientjes@google.com: use old_mm for oom_disable_count in exec]
Signed-off-by: Ying Han &lt;yinghan@google.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
