linux-stable.git/kernel/fork.c, branch v4.12.7

kernel/fork.c: virtually mapped stacks: do not disable interrupts

2017-07-27T22:10:23+00:00

commit 112166f88cf83dd11486cf1818672d42b540865b upstream.

The reason to disable interrupts seems to be to avoid switching to a
different processor while handling per cpu data using individual loads and
stores.  If we use per cpu RMV primitives we will not have to disable
interrupts.

Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1705171055130.5898@east.gentwo.org
Signed-off-by: Christoph Lameter 
Cc: Andy Lutomirski 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Mel Gorman 
Signed-off-by: Greg Kroah-Hartman

sched/cputime: Move the vtime task fields to their own struct

2017-07-27T22:10:23+00:00

commit bac5b6b6b11560f323e71d0ebac4061cfe5f56c0 upstream.

We are about to add vtime accumulation fields to the task struct. Let's
avoid more bloatification and gather vtime information to their own
struct.

Tested-by: Luiz Capitulino 
Signed-off-by: Frederic Weisbecker 
Reviewed-by: Thomas Gleixner 
Acked-by: Rik van Riel 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Wanpeng Li 
Link: http://lkml.kernel.org/r/1498756511-11714-5-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Mel Gorman 
Signed-off-by: Greg Kroah-Hartman

sched/cputime: Rename vtime fields

2017-07-27T22:10:22+00:00

commit 60a9ce57e7c5ac1df3a39fb941022bbfa40c0862 upstream.

The current "snapshot" based naming on vtime fields suggests we record
some past event but that's a low level picture of their actual purpose
which comes out blurry. The real point of these fields is to run a basic
state machine that tracks down cputime entry while switching between
contexts.

So lets reflect that with more meaningful names.

Tested-by: Luiz Capitulino 
Signed-off-by: Frederic Weisbecker 
Reviewed-by: Thomas Gleixner 
Acked-by: Rik van Riel 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Wanpeng Li 
Link: http://lkml.kernel.org/r/1498756511-11714-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Mel Gorman 
Signed-off-by: Greg Kroah-Hartman

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2017-05-27T15:52:27+00:00

Pull kthread fix from Thomas Gleixner:
 "A single fix which prevents a use after free when kthread fork fails"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kthread: Fix use-after-free if kthread fork fails

kthread: Fix use-after-free if kthread fork fails

2017-05-22T20:21:16+00:00

If a kthread forks (e.g. usermodehelper since commit 1da5c46fa965) but
fails in copy_process() between calling dup_task_struct() and setting
p->set_child_tid, then the value of p->set_child_tid will be inherited
from the parent and get prematurely freed by free_kthread_struct().

    kthread()
     - worker_thread()
        - process_one_work()
        |  - call_usermodehelper_exec_work()
        |     - kernel_thread()
        |        - _do_fork()
        |           - copy_process()
        |              - dup_task_struct()
        |                 - arch_dup_task_struct()
        |                    - tsk->set_child_tid = current->set_child_tid // implied
        |              - ...
        |              - goto bad_fork_*
        |              - ...
        |              - free_task(tsk)
        |                 - free_kthread_struct(tsk)
        |                    - kfree(tsk->set_child_tid)
        - ...
        - schedule()
           - __schedule()
              - wq_worker_sleeping()
                 - kthread_data(task)->flags // UAF

The problem started showing up with commit 1da5c46fa965 since it reused
->set_child_tid for the kthread worker data.

A better long-term solution might be to get rid of the ->set_child_tid
abuse. The comment in set_kthread_struct() also looks slightly wrong.

Debugged-by: Jamie Iles 
Fixes: 1da5c46fa965 ("kthread: Make struct kthread kmalloc'ed")
Signed-off-by: Vegard Nossum 
Acked-by: Oleg Nesterov 
Cc: Peter Zijlstra 
Cc: Greg Kroah-Hartman 
Cc: Andy Lutomirski 
Cc: Frederic Weisbecker 
Cc: Jamie Iles 
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170509073959.17858-1-vegard.nossum@oracle.com
Signed-off-by: Thomas Gleixner

pid_ns: Fix race between setns'ed fork() and zap_pid_ns_processes()

2017-05-13T22:26:02+00:00

Imagine we have a pid namespace and a task from its parent's pid_ns,
which made setns() to the pid namespace. The task is doing fork(),
while the pid namespace's child reaper is dying. We have the race
between them:

Task from parent pid_ns             Child reaper
copy_process()                      ..
  alloc_pid()                       ..
  ..                                zap_pid_ns_processes()
  ..                                  disable_pid_allocation()
  ..                                  read_lock(&tasklist_lock)
  ..                                  iterate over pids in pid_ns
  ..                                    kill tasks linked to pids
  ..                                  read_unlock(&tasklist_lock)
  write_lock_irq(&tasklist_lock);   ..
  attach_pid(p, PIDTYPE_PID);       ..
  ..                                ..

So, just created task p won't receive SIGKILL signal,
and the pid namespace will be in contradictory state.
Only manual kill will help there, but does the userspace
care about this? I suppose, the most users just inject
a task into a pid namespace and wait a SIGCHLD from it.

The patch fixes the problem. It simply checks for
(pid_ns->nr_hashed & PIDNS_HASH_ADDING) in copy_process().
We do it under the tasklist_lock, and can't skip
PIDNS_HASH_ADDING as noted by Oleg:

"zap_pid_ns_processes() does disable_pid_allocation()
and then takes tasklist_lock to kill the whole namespace.
Given that copy_process() checks PIDNS_HASH_ADDING
under write_lock(tasklist) they can't race;
if copy_process() takes this lock first, the new child will
be killed, otherwise copy_process() can't miss
the change in ->nr_hashed."

If allocation is disabled, we just return -ENOMEM
like it's made for such cases in alloc_pid().

v2: Do not move disable_pid_allocation(), do not
introduce a new variable in copy_process() and simplify
the patch as suggested by Oleg Nesterov.
Account the problem with double irq enabling
found by Eric W. Biederman.

Fixes: c876ad768215 ("pidns: Stop pid allocation when init dies")
Signed-off-by: Kirill Tkhai 
CC: Andrew Morton 
CC: Ingo Molnar 
CC: Peter Zijlstra 
CC: Oleg Nesterov 
CC: Mike Rapoport 
CC: Michal Hocko 
CC: Andy Lutomirski 
CC: "Eric W. Biederman" 
CC: Andrei Vagin 
CC: Cyrill Gorcunov 
CC: Serge Hallyn 
Cc: stable@vger.kernel.org
Acked-by: Oleg Nesterov 
Signed-off-by: Eric W. Biederman

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2017-05-12T17:41:45+00:00

Pull stackprotector fixlet from Ingo Molnar:
 "A single fix/enhancement to increase stackprotector canary randomness
  on 64-bit kernels with very little cost"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms

Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2017-05-10T17:30:46+00:00

Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Debloat RCU headers

   - Parallelize SRCU callback handling (plus overlapping patches)

   - Improve the performance of Tree SRCU on a CPU-hotplug stress test

   - Documentation updates

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Open-code the rcu_cblist_n_lazy_cbs() function
  rcu: Open-code the rcu_cblist_n_cbs() function
  rcu: Open-code the rcu_cblist_empty() function
  rcu: Separately compile large rcu_segcblist functions
  srcu: Debloat the  header
  srcu: Adjust default auto-expediting holdoff
  srcu: Specify auto-expedite holdoff time
  srcu: Expedite first synchronize_srcu() when idle
  srcu: Expedited grace periods with reduced memory contention
  srcu: Make rcutorture writer stalls print SRCU GP state
  srcu: Exact tracking of srcu_data structures containing callbacks
  srcu: Make SRCU be built by default
  srcu: Fix Kconfig botch when SRCU not selected
  rcu: Make non-preemptive schedule be Tasks RCU quiescent state
  srcu: Expedite srcu_schedule_cbs_snp() callback invocation
  srcu: Parallelize callback handling
  kvm: Move srcu_struct fields to end of struct kvm
  rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
  rcu: Use true/false in assignment to bool
  rcu: Use bool value directly
  ...

mm, vmalloc: use __GFP_HIGHMEM implicitly

2017-05-09T00:15:13+00:00

__vmalloc* allows users to provide gfp flags for the underlying
allocation.  This API is quite popular

  $ git grep "=[[:space:]]__vmalloc\|return[[:space:]]*__vmalloc" | wc -l
  77

The only problem is that many people are not aware that they really want
to give __GFP_HIGHMEM along with other flags because there is really no
reason to consume precious lowmemory on CONFIG_HIGHMEM systems for pages
which are mapped to the kernel vmalloc space.  About half of users don't
use this flag, though.  This signals that we make the API unnecessarily
too complex.

This patch simply uses __GFP_HIGHMEM implicitly when allocating pages to
be mapped to the vmalloc space.  Current users which add __GFP_HIGHMEM
are simplified and drop the flag.

Link: http://lkml.kernel.org/r/20170307141020.29107-1-mhocko@kernel.org
Signed-off-by: Michal Hocko 
Reviewed-by: Matthew Wilcox 
Cc: Al Viro 
Cc: Vlastimil Babka 
Cc: David Rientjes 
Cc: Cristopher Lameter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

fork: free vmapped stacks in cache when cpus are offline

2017-05-09T00:15:11+00:00

Using virtually mapped stack, kernel stacks are allocated via vmalloc.

In the current implementation, two stacks per cpu can be cached when
tasks are freed and the cached stacks are used again in task
duplications.  But the cached stacks may remain unfreed even when cpu
are offline.  By adding a cpu hotplug callback to free the cached stacks
when a cpu goes offline, the pages of the cached stacks are not wasted.

Link: http://lkml.kernel.org/r/1487076043-17802-1-git-send-email-hoeun.ryu@gmail.com
Signed-off-by: Hoeun Ryu 
Reviewed-by: Thomas Gleixner 
Acked-by: Michal Hocko 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Kees Cook 
Cc: "Eric W. Biederman" 
Cc: Oleg Nesterov 
Cc: Mateusz Guzik 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds