<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/fork.c, branch v4.4.166</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>kthread: Fix use-after-free if kthread fork fails</title>
<updated>2018-09-19T20:48:55+00:00</updated>
<author>
<name>Vegard Nossum</name>
<email>vegard.nossum@oracle.com</email>
</author>
<published>2017-05-09T07:39:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=798ef283a8dd73dea2ae8f817abe75255fde772c'/>
<id>798ef283a8dd73dea2ae8f817abe75255fde772c</id>
<content type='text'>
commit 4d6501dce079c1eb6bf0b1d8f528a5e81770109e upstream.

If a kthread forks (e.g. usermodehelper since commit 1da5c46fa965) but
fails in copy_process() between calling dup_task_struct() and setting
p-&gt;set_child_tid, then the value of p-&gt;set_child_tid will be inherited
from the parent and get prematurely freed by free_kthread_struct().

    kthread()
     - worker_thread()
        - process_one_work()
        |  - call_usermodehelper_exec_work()
        |     - kernel_thread()
        |        - _do_fork()
        |           - copy_process()
        |              - dup_task_struct()
        |                 - arch_dup_task_struct()
        |                    - tsk-&gt;set_child_tid = current-&gt;set_child_tid // implied
        |              - ...
        |              - goto bad_fork_*
        |              - ...
        |              - free_task(tsk)
        |                 - free_kthread_struct(tsk)
        |                    - kfree(tsk-&gt;set_child_tid)
        - ...
        - schedule()
           - __schedule()
              - wq_worker_sleeping()
                 - kthread_data(task)-&gt;flags // UAF

The problem started showing up with commit 1da5c46fa965 since it reused
-&gt;set_child_tid for the kthread worker data.

A better long-term solution might be to get rid of the -&gt;set_child_tid
abuse. The comment in set_kthread_struct() also looks slightly wrong.

Debugged-by: Jamie Iles &lt;jamie.iles@oracle.com&gt;
Fixes: 1da5c46fa965 ("kthread: Make struct kthread kmalloc'ed")
Signed-off-by: Vegard Nossum &lt;vegard.nossum@oracle.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Jamie Iles &lt;jamie.iles@oracle.com&gt;
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170509073959.17858-1-vegard.nossum@oracle.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Amit Pundir &lt;amit.pundir@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4d6501dce079c1eb6bf0b1d8f528a5e81770109e upstream.

If a kthread forks (e.g. usermodehelper since commit 1da5c46fa965) but
fails in copy_process() between calling dup_task_struct() and setting
p-&gt;set_child_tid, then the value of p-&gt;set_child_tid will be inherited
from the parent and get prematurely freed by free_kthread_struct().

    kthread()
     - worker_thread()
        - process_one_work()
        |  - call_usermodehelper_exec_work()
        |     - kernel_thread()
        |        - _do_fork()
        |           - copy_process()
        |              - dup_task_struct()
        |                 - arch_dup_task_struct()
        |                    - tsk-&gt;set_child_tid = current-&gt;set_child_tid // implied
        |              - ...
        |              - goto bad_fork_*
        |              - ...
        |              - free_task(tsk)
        |                 - free_kthread_struct(tsk)
        |                    - kfree(tsk-&gt;set_child_tid)
        - ...
        - schedule()
           - __schedule()
              - wq_worker_sleeping()
                 - kthread_data(task)-&gt;flags // UAF

The problem started showing up with commit 1da5c46fa965 since it reused
-&gt;set_child_tid for the kthread worker data.

A better long-term solution might be to get rid of the -&gt;set_child_tid
abuse. The comment in set_kthread_struct() also looks slightly wrong.

Debugged-by: Jamie Iles &lt;jamie.iles@oracle.com&gt;
Fixes: 1da5c46fa965 ("kthread: Make struct kthread kmalloc'ed")
Signed-off-by: Vegard Nossum &lt;vegard.nossum@oracle.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Cc: Andy Lutomirski &lt;luto@kernel.org&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Jamie Iles &lt;jamie.iles@oracle.com&gt;
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170509073959.17858-1-vegard.nossum@oracle.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Amit Pundir &lt;amit.pundir@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>fork: don't copy inconsistent signal handler state to child</title>
<updated>2018-09-15T07:40:38+00:00</updated>
<author>
<name>Jann Horn</name>
<email>jannh@google.com</email>
</author>
<published>2018-08-22T05:00:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b7befd11e0b259699ed1ee69dd3ee66da25b2d5e'/>
<id>b7befd11e0b259699ed1ee69dd3ee66da25b2d5e</id>
<content type='text'>
[ Upstream commit 06e62a46bbba20aa5286102016a04214bb446141 ]

Before this change, if a multithreaded process forks while one of its
threads is changing a signal handler using sigaction(), the memcpy() in
copy_sighand() can race with the struct assignment in do_sigaction().  It
isn't clear whether this can cause corruption of the userspace signal
handler pointer, but it definitely can cause inconsistency between
different fields of struct sigaction.

Take the appropriate spinlock to avoid this.

I have tested that this patch prevents inconsistency between sa_sigaction
and sa_flags, which is possible before this patch.

Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com
Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: "Peter Zijlstra (Intel)" &lt;peterz@infradead.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 06e62a46bbba20aa5286102016a04214bb446141 ]

Before this change, if a multithreaded process forks while one of its
threads is changing a signal handler using sigaction(), the memcpy() in
copy_sighand() can race with the struct assignment in do_sigaction().  It
isn't clear whether this can cause corruption of the userspace signal
handler pointer, but it definitely can cause inconsistency between
different fields of struct sigaction.

Take the appropriate spinlock to avoid this.

I have tested that this patch prevents inconsistency between sa_sigaction
and sa_flags, which is possible before this patch.

Link: http://lkml.kernel.org/r/20180702145108.73189-1-jannh@google.com
Signed-off-by: Jann Horn &lt;jannh@google.com&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Reviewed-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: "Peter Zijlstra (Intel)" &lt;peterz@infradead.org&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Sasha Levin &lt;alexander.levin@microsoft.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE</title>
<updated>2018-01-05T14:44:23+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2017-09-04T01:57:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=003e476716906afa135faf605ae0a5c3598c0293'/>
<id>003e476716906afa135faf605ae0a5c3598c0293</id>
<content type='text'>
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping().  And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping().  And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kaiser: merged update</title>
<updated>2018-01-05T14:44:23+00:00</updated>
<author>
<name>Dave Hansen</name>
<email>dave.hansen@linux.intel.com</email>
</author>
<published>2017-08-30T23:23:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bed9bb7f3e6d4045013d2bb9e4004896de57f02b'/>
<id>bed9bb7f3e6d4045013d2bb9e4004896de57f02b</id>
<content type='text'>
Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging).

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging).

Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Acked-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>KAISER: Kernel Address Isolation</title>
<updated>2018-01-05T14:44:23+00:00</updated>
<author>
<name>Richard Fellner</name>
<email>richard.fellner@student.tugraz.at</email>
</author>
<published>2017-05-04T12:26:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8a43ddfb93a0c6ae1a6e1f5c25705ec5d1843c40'/>
<id>8a43ddfb93a0c6ae1a6e1f5c25705ec5d1843c40</id>
<content type='text'>
This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER

From: Richard Fellner &lt;richard.fellner@student.tugraz.at&gt;
From: Daniel Gruss &lt;daniel.gruss@iaik.tugraz.at&gt;
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&amp;m=149390087310405&amp;w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: &lt;linux-kernel@vger.kernel.org&gt;
To: &lt;kernel-hardening@lists.openwall.com&gt;
Cc: &lt;clementine.maurice@iaik.tugraz.at&gt;
Cc: &lt;moritz.lipp@iaik.tugraz.at&gt;
Cc: Michael Schwarz &lt;michael.schwarz@iaik.tugraz.at&gt;
Cc: Richard Fellner &lt;richard.fellner@student.tugraz.at&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: &lt;kirill.shutemov@linux.intel.com&gt;
Cc: &lt;anders.fogh@gdata-adan.de&gt;

After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner &lt;richard.fellner@student.tugraz.at&gt;
Signed-off-by: Moritz Lipp &lt;moritz.lipp@iaik.tugraz.at&gt;
Signed-off-by: Daniel Gruss &lt;daniel.gruss@iaik.tugraz.at&gt;
Signed-off-by: Michael Schwarz &lt;michael.schwarz@iaik.tugraz.at&gt;
Acked-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER

From: Richard Fellner &lt;richard.fellner@student.tugraz.at&gt;
From: Daniel Gruss &lt;daniel.gruss@iaik.tugraz.at&gt;
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&amp;m=149390087310405&amp;w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: &lt;linux-kernel@vger.kernel.org&gt;
To: &lt;kernel-hardening@lists.openwall.com&gt;
Cc: &lt;clementine.maurice@iaik.tugraz.at&gt;
Cc: &lt;moritz.lipp@iaik.tugraz.at&gt;
Cc: Michael Schwarz &lt;michael.schwarz@iaik.tugraz.at&gt;
Cc: Richard Fellner &lt;richard.fellner@student.tugraz.at&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: &lt;kirill.shutemov@linux.intel.com&gt;
Cc: &lt;anders.fogh@gdata-adan.de&gt;

After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner &lt;richard.fellner@student.tugraz.at&gt;
Signed-off-by: Moritz Lipp &lt;moritz.lipp@iaik.tugraz.at&gt;
Signed-off-by: Daniel Gruss &lt;daniel.gruss@iaik.tugraz.at&gt;
Signed-off-by: Michael Schwarz &lt;michael.schwarz@iaik.tugraz.at&gt;
Acked-by: Jiri Kosina &lt;jkosina@suse.cz&gt;
Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms</title>
<updated>2017-06-14T11:16:23+00:00</updated>
<author>
<name>Daniel Micay</name>
<email>danielmicay@gmail.com</email>
</author>
<published>2017-05-04T13:32:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2ff1edbbb29b11ca0cce7704c680ae88c3d78568'/>
<id>2ff1edbbb29b11ca0cce7704c680ae88c3d78568</id>
<content type='text'>
commit 5ea30e4e58040cfd6434c2f33dc3ea76e2c15b05 upstream.

The stack canary is an 'unsigned long' and should be fully initialized to
random data rather than only 32 bits of random data.

Signed-off-by: Daniel Micay &lt;danielmicay@gmail.com&gt;
Acked-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Arjan van Ven &lt;arjan@linux.intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: kernel-hardening@lists.openwall.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170504133209.3053-1-danielmicay@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5ea30e4e58040cfd6434c2f33dc3ea76e2c15b05 upstream.

The stack canary is an 'unsigned long' and should be fully initialized to
random data rather than only 32 bits of random data.

Signed-off-by: Daniel Micay &lt;danielmicay@gmail.com&gt;
Acked-by: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Arjan van Ven &lt;arjan@linux.intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: kernel-hardening@lists.openwall.com
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170504133209.3053-1-danielmicay@gmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()</title>
<updated>2017-05-25T12:30:11+00:00</updated>
<author>
<name>Kirill Tkhai</name>
<email>ktkhai@virtuozzo.com</email>
</author>
<published>2017-05-12T16:11:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6a70a5833ecc9147d8257e80f39e11d582810082'/>
<id>6a70a5833ecc9147d8257e80f39e11d582810082</id>
<content type='text'>
commit 3fd37226216620c1a468afa999739d5016fbc349 upstream.

Imagine we have a pid namespace and a task from its parent's pid_ns,
which made setns() to the pid namespace. The task is doing fork(),
while the pid namespace's child reaper is dying. We have the race
between them:

Task from parent pid_ns             Child reaper
copy_process()                      ..
  alloc_pid()                       ..
  ..                                zap_pid_ns_processes()
  ..                                  disable_pid_allocation()
  ..                                  read_lock(&amp;tasklist_lock)
  ..                                  iterate over pids in pid_ns
  ..                                    kill tasks linked to pids
  ..                                  read_unlock(&amp;tasklist_lock)
  write_lock_irq(&amp;tasklist_lock);   ..
  attach_pid(p, PIDTYPE_PID);       ..
  ..                                ..

So, just created task p won't receive SIGKILL signal,
and the pid namespace will be in contradictory state.
Only manual kill will help there, but does the userspace
care about this? I suppose, the most users just inject
a task into a pid namespace and wait a SIGCHLD from it.

The patch fixes the problem. It simply checks for
(pid_ns-&gt;nr_hashed &amp; PIDNS_HASH_ADDING) in copy_process().
We do it under the tasklist_lock, and can't skip
PIDNS_HASH_ADDING as noted by Oleg:

"zap_pid_ns_processes() does disable_pid_allocation()
and then takes tasklist_lock to kill the whole namespace.
Given that copy_process() checks PIDNS_HASH_ADDING
under write_lock(tasklist) they can't race;
if copy_process() takes this lock first, the new child will
be killed, otherwise copy_process() can't miss
the change in -&gt;nr_hashed."

If allocation is disabled, we just return -ENOMEM
like it's made for such cases in alloc_pid().

v2: Do not move disable_pid_allocation(), do not
introduce a new variable in copy_process() and simplify
the patch as suggested by Oleg Nesterov.
Account the problem with double irq enabling
found by Eric W. Biederman.

Fixes: c876ad768215 ("pidns: Stop pid allocation when init dies")
Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
CC: Andrew Morton &lt;akpm@linux-foundation.org&gt;
CC: Ingo Molnar &lt;mingo@kernel.org&gt;
CC: Peter Zijlstra &lt;peterz@infradead.org&gt;
CC: Oleg Nesterov &lt;oleg@redhat.com&gt;
CC: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
CC: Michal Hocko &lt;mhocko@suse.com&gt;
CC: Andy Lutomirski &lt;luto@kernel.org&gt;
CC: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
CC: Andrei Vagin &lt;avagin@openvz.org&gt;
CC: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
CC: Serge Hallyn &lt;serge@hallyn.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 3fd37226216620c1a468afa999739d5016fbc349 upstream.

Imagine we have a pid namespace and a task from its parent's pid_ns,
which made setns() to the pid namespace. The task is doing fork(),
while the pid namespace's child reaper is dying. We have the race
between them:

Task from parent pid_ns             Child reaper
copy_process()                      ..
  alloc_pid()                       ..
  ..                                zap_pid_ns_processes()
  ..                                  disable_pid_allocation()
  ..                                  read_lock(&amp;tasklist_lock)
  ..                                  iterate over pids in pid_ns
  ..                                    kill tasks linked to pids
  ..                                  read_unlock(&amp;tasklist_lock)
  write_lock_irq(&amp;tasklist_lock);   ..
  attach_pid(p, PIDTYPE_PID);       ..
  ..                                ..

So, just created task p won't receive SIGKILL signal,
and the pid namespace will be in contradictory state.
Only manual kill will help there, but does the userspace
care about this? I suppose, the most users just inject
a task into a pid namespace and wait a SIGCHLD from it.

The patch fixes the problem. It simply checks for
(pid_ns-&gt;nr_hashed &amp; PIDNS_HASH_ADDING) in copy_process().
We do it under the tasklist_lock, and can't skip
PIDNS_HASH_ADDING as noted by Oleg:

"zap_pid_ns_processes() does disable_pid_allocation()
and then takes tasklist_lock to kill the whole namespace.
Given that copy_process() checks PIDNS_HASH_ADDING
under write_lock(tasklist) they can't race;
if copy_process() takes this lock first, the new child will
be killed, otherwise copy_process() can't miss
the change in -&gt;nr_hashed."

If allocation is disabled, we just return -ENOMEM
like it's made for such cases in alloc_pid().

v2: Do not move disable_pid_allocation(), do not
introduce a new variable in copy_process() and simplify
the patch as suggested by Oleg Nesterov.
Account the problem with double irq enabling
found by Eric W. Biederman.

Fixes: c876ad768215 ("pidns: Stop pid allocation when init dies")
Signed-off-by: Kirill Tkhai &lt;ktkhai@virtuozzo.com&gt;
CC: Andrew Morton &lt;akpm@linux-foundation.org&gt;
CC: Ingo Molnar &lt;mingo@kernel.org&gt;
CC: Peter Zijlstra &lt;peterz@infradead.org&gt;
CC: Oleg Nesterov &lt;oleg@redhat.com&gt;
CC: Mike Rapoport &lt;rppt@linux.vnet.ibm.com&gt;
CC: Michal Hocko &lt;mhocko@suse.com&gt;
CC: Andy Lutomirski &lt;luto@kernel.org&gt;
CC: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
CC: Andrei Vagin &lt;avagin@openvz.org&gt;
CC: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
CC: Serge Hallyn &lt;serge@hallyn.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>kernek/fork.c: allocate idle task for a CPU always on its local node</title>
<updated>2017-03-26T10:13:18+00:00</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2016-05-23T23:24:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6052eb871217c0679ac63779fc5e43eb49c83b0c'/>
<id>6052eb871217c0679ac63779fc5e43eb49c83b0c</id>
<content type='text'>
commit 725fc629ff2545b061407305ae51016c9f928fce upstream.

Linux preallocates the task structs of the idle tasks for all possible
CPUs.  This currently means they all end up on node 0.  This also
implies that the cache line of MWAIT, which is around the flags field in
the task struct, are all located in node 0.

We see a noticeable performance improvement on Knights Landing CPUs when
the cache lines used for MWAIT are located in the local nodes of the
CPUs using them.  I would expect this to give a (likely slight)
improvement on other systems too.

The patch implements placing the idle task in the node of its CPUs, by
passing the right target node to copy_process()

[akpm@linux-foundation.org: use NUMA_NO_NODE, not a bare -1]
Link: http://lkml.kernel.org/r/1463492694-15833-1-git-send-email-andi@firstfloor.org
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 725fc629ff2545b061407305ae51016c9f928fce upstream.

Linux preallocates the task structs of the idle tasks for all possible
CPUs.  This currently means they all end up on node 0.  This also
implies that the cache line of MWAIT, which is around the flags field in
the task struct, are all located in node 0.

We see a noticeable performance improvement on Knights Landing CPUs when
the cache lines used for MWAIT are located in the local nodes of the
CPUs using them.  I would expect this to give a (likely slight)
improvement on other systems too.

The patch implements placing the idle task in the node of its CPUs, by
passing the right target node to copy_process()

[akpm@linux-foundation.org: use NUMA_NO_NODE, not a bare -1]
Link: http://lkml.kernel.org/r/1463492694-15833-1-git-send-email-andi@firstfloor.org
Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Sumit Semwal &lt;sumit.semwal@linaro.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>mm: Add a user_ns owner to mm_struct and fix ptrace permission checks</title>
<updated>2017-01-06T10:16:11+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2016-10-14T02:23:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=03eed7afbc09e061f66b448daf7863174c3dc3f3'/>
<id>03eed7afbc09e061f66b448daf7863174c3dc3f3</id>
<content type='text'>
commit bfedb589252c01fa505ac9f6f2a3d5d68d707ef4 upstream.

During exec dumpable is cleared if the file that is being executed is
not readable by the user executing the file.  A bug in
ptrace_may_access allows reading the file if the executable happens to
enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).

This problem is fixed with only necessary userspace breakage by adding
a user namespace owner to mm_struct, captured at the time of exec, so
it is clear in which user namespace CAP_SYS_PTRACE must be present in
to be able to safely give read permission to the executable.

The function ptrace_may_access is modified to verify that the ptracer
has CAP_SYS_ADMIN in task-&gt;mm-&gt;user_ns instead of task-&gt;cred-&gt;user_ns.
This ensures that if the task changes it's cred into a subordinate
user namespace it does not become ptraceable.

The function ptrace_attach is modified to only set PT_PTRACE_CAP when
CAP_SYS_PTRACE is held over task-&gt;mm-&gt;user_ns.  The intent of
PT_PTRACE_CAP is to be a flag to note that whatever permission changes
the task might go through the tracer has sufficient permissions for
it not to be an issue.  task-&gt;cred-&gt;user_ns is always the same
as or descendent of mm-&gt;user_ns.  Which guarantees that having
CAP_SYS_PTRACE over mm-&gt;user_ns is the worst case for the tasks
credentials.

To prevent regressions mm-&gt;dumpable and mm-&gt;user_ns are not considered
when a task has no mm.  As simply failing ptrace_may_attach causes
regressions in privileged applications attempting to read things
such as /proc/&lt;pid&gt;/stat

Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Tested-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bfedb589252c01fa505ac9f6f2a3d5d68d707ef4 upstream.

During exec dumpable is cleared if the file that is being executed is
not readable by the user executing the file.  A bug in
ptrace_may_access allows reading the file if the executable happens to
enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).

This problem is fixed with only necessary userspace breakage by adding
a user namespace owner to mm_struct, captured at the time of exec, so
it is clear in which user namespace CAP_SYS_PTRACE must be present in
to be able to safely give read permission to the executable.

The function ptrace_may_access is modified to verify that the ptracer
has CAP_SYS_ADMIN in task-&gt;mm-&gt;user_ns instead of task-&gt;cred-&gt;user_ns.
This ensures that if the task changes it's cred into a subordinate
user namespace it does not become ptraceable.

The function ptrace_attach is modified to only set PT_PTRACE_CAP when
CAP_SYS_PTRACE is held over task-&gt;mm-&gt;user_ns.  The intent of
PT_PTRACE_CAP is to be a flag to note that whatever permission changes
the task might go through the tracer has sufficient permissions for
it not to be an issue.  task-&gt;cred-&gt;user_ns is always the same
as or descendent of mm-&gt;user_ns.  Which guarantees that having
CAP_SYS_PTRACE over mm-&gt;user_ns is the worst case for the tasks
credentials.

To prevent regressions mm-&gt;dumpable and mm-&gt;user_ns are not considered
when a task has no mm.  As simply failing ptrace_may_attach causes
regressions in privileged applications attempting to read things
such as /proc/&lt;pid&gt;/stat

Acked-by: Kees Cook &lt;keescook@chromium.org&gt;
Tested-by: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Fixes: 8409cca70561 ("userns: allow ptrace from non-init user namespaces")
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd</title>
<updated>2016-10-07T13:23:46+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-09-01T23:15:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=82b7839a4063855b666f5bd2d309871cbbe21eff'/>
<id>82b7839a4063855b666f5bd2d309871cbbe21eff</id>
<content type='text'>
commit 735f2770a770156100f534646158cb58cb8b2939 upstream.

Commit fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
exit") has caused a subtle regression in nscd which uses
CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
shared databases, so that the clients are notified when nscd is
restarted.  Now, when nscd uses a non-persistent database, clients that
have it mapped keep thinking the database is being updated by nscd, when
in fact nscd has created a new (anonymous) one (for non-persistent
databases it uses an unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):

: The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
: on behalf of pthread_create() library calls.  This feature is used to
: request that the kernel clear the thread-id in user space (at an address
: provided in the syscall) when the thread disassociates itself from the
: address space, which is done in mm_release().
:
: Unfortunately, when a multi-threaded process incurs a core dump (such as
: from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
: the other threads, which then proceed to clear their user-space tids
: before synchronizing in exit_mm() with the start of core dumping.  This
: misrepresents the state of process's address space at the time of the
: SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
: problems (misleading him/her to conclude that the threads had gone away
: before the fault).
:
: The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
: core dump has been initiated.

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for.  It
seems that limitting the scope of the check to core dumping should work
for SIGSEGV issue describe above.

[Changelog partly based on Andreas' description]
Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Tested-by: William Preston &lt;wpreston@suse.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Andreas Schwab &lt;schwab@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 735f2770a770156100f534646158cb58cb8b2939 upstream.

Commit fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
exit") has caused a subtle regression in nscd which uses
CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
shared databases, so that the clients are notified when nscd is
restarted.  Now, when nscd uses a non-persistent database, clients that
have it mapped keep thinking the database is being updated by nscd, when
in fact nscd has created a new (anonymous) one (for non-persistent
databases it uses an unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):

: The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
: on behalf of pthread_create() library calls.  This feature is used to
: request that the kernel clear the thread-id in user space (at an address
: provided in the syscall) when the thread disassociates itself from the
: address space, which is done in mm_release().
:
: Unfortunately, when a multi-threaded process incurs a core dump (such as
: from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
: the other threads, which then proceed to clear their user-space tids
: before synchronizing in exit_mm() with the start of core dumping.  This
: misrepresents the state of process's address space at the time of the
: SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
: problems (misleading him/her to conclude that the threads had gone away
: before the fault).
:
: The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
: core dump has been initiated.

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for.  It
seems that limitting the scope of the check to core dumping should work
for SIGSEGV issue describe above.

[Changelog partly based on Andreas' description]
Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Tested-by: William Preston &lt;wpreston@suse.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Andreas Schwab &lt;schwab@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
