linux.git/kernel/cgroup, branch v4.12

cpuset: consider dying css as offline

2017-05-24T16:43:30+00:00

In most cases, a cgroup controller doesn't care about the lifetimes of
cgroups.  For the controller, a css becomes online when ->css_online()
is called on it and offline when ->css_offline() is called.

However, cpuset is special in that the user interface it exposes cares
whether certain cgroups exist or not.  Combined with the RCU delay
between cgroup removal and css offlining, this can lead to user
visible behavior oddities where operations which should succeed after
cgroup removals fail for some time period.  The effects of cgroup
removals are delayed when seen from userland.

This patch adds css_is_dying() which tests whether offline is pending
and updates is_cpuset_online() so that the function returns false also
while offline is pending.  This gets rid of the userland visible
delays.
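
The change in the check can be sketched in user space as follows.  This
is a minimal model, not the kernel code: struct css_model and its
fields are hypothetical stand-ins, and is_cpuset_online_old() exists
only to show the previous behavior.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-cgroup controller state. */
struct css_model {
	bool online;      /* set at ->css_online(), cleared at ->css_offline() */
	bool ref_killed;  /* assumption: removal has started, offline is pending */
};

/* Models css_is_dying(): true once offline is pending. */
static bool css_is_dying(const struct css_model *css)
{
	assert(css);
	return css->ref_killed;
}

/* Previous check: looks at the online state only. */
static bool is_cpuset_online_old(const struct css_model *css)
{
	return css->online;
}

/* Updated check: a dying css is already treated as offline. */
static bool is_cpuset_online(const struct css_model *css)
{
	return css->online && !css_is_dying(css);
}
```

In the window between removal and ->css_offline(), the old check still
reports the cpuset as online while the updated one does not.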

Signed-off-by: Tejun Heo 
Reported-by: Daniel Jordan 
Link: http://lkml.kernel.org/r/327ca1f5-7957-fbb9-9e5f-9ba149d40ba2@oracle.com
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo

cgroup: Prevent kill_css() from being called more than once

2017-05-17T20:58:32+00:00

The kill_css() function may be called more than once under the condition
that the css was killed but not physically removed yet followed by the
removal of the cgroup that is hosting the css. This patch prevents any
harm from being done when that happens.
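
One way to make a second call harmless is a "first caller wins" guard.
The sketch below is a user-space model under that assumption; the
dying marker and the kill_count field are illustrative and are not
taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a css that must be torn down exactly once. */
struct css_model {
	bool dying;      /* set by the first kill_css() call */
	int  kill_count; /* number of times teardown was actually started */
};

/* Starts teardown on the first call; later calls return early. */
static void kill_css(struct css_model *css)
{
	assert(css);
	if (css->dying)
		return;

	css->dying = true;
	css->kill_count++;
}
```

Calling kill_css() on the same object twice starts the teardown only
once in this model.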

Signed-off-by: Waiman Long 
Signed-off-by: Tejun Heo 
Cc: stable@vger.kernel.org # v4.5+

Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

2017-05-01T20:52:24+00:00

Pull cgroup updates from Tejun Heo:
 "Nothing major. Two notable fixes are Li's second stab at fixing the
  long-standing race condition in the mount path and suppression of
  spurious warning from cgroup_get(). All other changes are trivial"

* 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: mark cgroup_get() with __maybe_unused
  cgroup: avoid attaching a cgroup root to two different superblocks, take 2
  cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc()
  cgroup: move cgroup_subsys_state parent field for cache locality
  cpuset: Remove cpuset_update_active_cpus()'s parameter.
  cgroup: switch to BUG_ON()
  cgroup: drop duplicate header nsproxy.h
  kernel: convert css_set.refcount from atomic_t to refcount_t
  kernel: convert cgroup_namespace.count from atomic_t to refcount_t

cgroup: mark cgroup_get() with __maybe_unused

2017-05-01T19:24:14+00:00

a590b90d472f ("cgroup: fix spurious warnings on cgroup_is_dead() from
cgroup_sk_alloc()") converted most cgroup_get() usages to
cgroup_get_live() leaving cgroup_sk_alloc() the sole user of
cgroup_get().  When !CONFIG_SOCK_CGROUP_DATA, this ends up triggering
an unused-function warning for cgroup_get().

Silence the warning by adding __maybe_unused to cgroup_get().
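
The pattern looks roughly like this outside the kernel.  In this
sketch, obj_get(), sock_alloc() and HAVE_SOCK_DATA are hypothetical
names, and the macro is defined locally on the assumption that the
compiler is GCC or Clang.

```c
#include <assert.h>

/* Local definition for user space; assumes GCC/Clang attribute syntax. */
#ifndef __maybe_unused
#define __maybe_unused __attribute__((__unused__))
#endif

struct obj {
	int refs;
};

/* May have no caller in some configurations; the attribute keeps the
 * compiler from warning about an unused static function. */
static __maybe_unused void obj_get(struct obj *o)
{
	assert(o);
	o->refs++;
}

#ifdef HAVE_SOCK_DATA	/* hypothetical config switch */
/* The only caller, compiled in only when the option is enabled. */
static __maybe_unused void sock_alloc(struct obj *o)
{
	obj_get(o);
}
#endif
```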

Reported-by: Stephen Rothwell 
Link: http://lkml.kernel.org/r/20170501145340.17e8ef86@canb.auug.org.au
Signed-off-by: Tejun Heo

cgroup: avoid attaching a cgroup root to two different superblocks, take 2

2017-04-28T22:04:54+00:00

Commit bfb0b80db5f9 ("cgroup: avoid attaching a cgroup root to two
different superblocks") is broken.  Now we try to fix the race by
delaying the initialization of cgroup root refcnt until a superblock
has been allocated.

Reported-by: Dmitry Vyukov 
Reported-by: Andrei Vagin 
Tested-by: Andrei Vagin 
Signed-off-by: Zefan Li 
Signed-off-by: Tejun Heo

cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc()

2017-04-28T19:28:20+00:00

cgroup_get() is expected to be called only on live cgroups and triggers
a warning on a dead cgroup; however, cgroup_sk_alloc() may be called
while cloning a socket which is left in an empty and removed cgroup
and thus may legitimately duplicate its reference on a dead cgroup.
This currently triggers the following warning spuriously.

 WARNING: CPU: 14 PID: 0 at kernel/cgroup.c:490 cgroup_get+0x55/0x60
 ...
  [] __warn+0xd3/0xf0
  [] warn_slowpath_null+0x1e/0x20
  [] cgroup_get+0x55/0x60
  [] cgroup_sk_alloc+0x51/0xe0
  [] sk_clone_lock+0x2db/0x390
  [] inet_csk_clone_lock+0x16/0xc0
  [] tcp_create_openreq_child+0x23/0x4b0
  [] tcp_v6_syn_recv_sock+0x91/0x670
  [] tcp_check_req+0x3a6/0x4e0
  [] tcp_v6_rcv+0x693/0xa00
  [] ip6_input_finish+0x59/0x3e0
  [] ip6_input+0x32/0xb0
  [] ip6_rcv_finish+0x57/0xa0
  [] ipv6_rcv+0x318/0x4d0
  [] __netif_receive_skb_core+0x2d7/0x9a0
  [] __netif_receive_skb+0x16/0x70
  [] netif_receive_skb_internal+0x23/0x80
  [] napi_gro_frags+0x208/0x270
  [] mlx4_en_process_rx_cq+0x74c/0xf40
  [] mlx4_en_poll_rx_cq+0x30/0x90
  [] net_rx_action+0x210/0x350
  [] __do_softirq+0x106/0x2c7
  [] irq_exit+0x9d/0xa0
  [] do_IRQ+0x54/0xd0
  [] common_interrupt+0x7f/0x7f 
  [] cpuidle_enter+0x17/0x20
  [] cpu_startup_entry+0x2a9/0x2f0
  [] start_secondary+0xf1/0x100

This patch renames the existing cgroup_get() with the dead cgroup
warning to cgroup_get_live() after cgroup_kn_lock_live() and
introduces the new cgroup_get() which doesn't check whether the cgroup
is live or dead.

All existing cgroup_get() users except for cgroup_sk_alloc() are
converted to use cgroup_get_live().
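
The split between the two helpers can be modeled in user space as
below.  This is a sketch only: struct cgrp_model and its warnings
counter are hypothetical, and the model warns on every call where the
kernel may warn differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a cgroup with a reference count. */
struct cgrp_model {
	int  refs;
	bool dead;     /* cgroup has been removed */
	int  warnings; /* how many times the liveness check fired */
};

/* Takes a reference without checking whether the cgroup is live. */
static void cgroup_get(struct cgrp_model *cgrp)
{
	assert(cgrp);
	cgrp->refs++;
}

/* Takes a reference and warns if the cgroup is already dead. */
static void cgroup_get_live(struct cgrp_model *cgrp)
{
	assert(cgrp);
	if (cgrp->dead) {
		cgrp->warnings++;
		fprintf(stderr, "warning: reference taken on a dead cgroup\n");
	}
	cgroup_get(cgrp);
}
```

A socket clone path would use the unchecked cgroup_get() in this
model, so duplicating a reference on a dead cgroup stays silent.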

Fixes: d979a39d7242 ("cgroup: duplicate cgroup reference when cloning sockets")
Cc: stable@vger.kernel.org # v4.5+
Cc: Johannes Weiner 
Reported-by: Chris Mason 
Signed-off-by: Tejun Heo

Merge branch 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

2017-04-16T18:48:10+00:00

Pull cgroup fix from Tejun Heo:
 "Unfortunately, the commit to fix the cgroup mount race in the previous
  pull request can lead to hangs.

  The original bug has been around for a while and isn't too likely to
  be triggered in usual use cases. Revert the commit for now"

* 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  Revert "cgroup: avoid attaching a cgroup root to two different superblocks"

Revert "cgroup: avoid attaching a cgroup root to two different superblocks"

2017-04-16T14:17:37+00:00

This reverts commit bfb0b80db5f9dca5ac0a5fd0edb765ee555e5a8e.

Andrei reports CRIU test hangs with the patch applied.  The bug fixed
by the patch isn't too likely to trigger in actual uses.  Revert the
patch for now.

Signed-off-by: Tejun Heo 
Reported-by: Andrei Vagin 
Link: http://lkml.kernel.org/r/20170414232737.GC20350@outlook.office365.com

Merge branch 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

2017-04-12T06:38:16+00:00

Pull cgroup fixes from Tejun Heo:
 "This contains fixes for two long standing subtle bugs:

   - kthread_bind() on a new kthread binds it to specific CPUs and
     prevents userland from messing with the affinity or cgroup
     membership. Unfortunately, for cgroup membership, there's a window
     between kthread creation and kthread_bind*() invocation where the
     kthread can be moved into a non-root cgroup by userland.

     Depending on what controllers are in effect, this can assign the
     kthread unexpected attributes. For example, in the reported case,
     workqueue workers ended up in non-root cpuset cgroups and had
     their CPU affinities overridden. This broke workqueue invariants
     and led to workqueue stalls.

     Fixed by closing the window between kthread creation and
     kthread_bind() as suggested by Oleg.

   - There was a bug in cgroup mount path which could allow two
     competing mount attempts to attach the same cgroup_root to two
     different superblocks.

     This was caused by mishandling return value from kernfs_pin_sb().

     Fixed"

* 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: avoid attaching a cgroup root to two different superblocks
  cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups

cgroup: avoid attaching a cgroup root to two different superblocks

2017-04-11T00:00:57+00:00

Run this:

    touch file0
    for ((; ;))
    {
        mount -t cpuset xxx file0
    }

And this concurrently:

    touch file1
    for ((; ;))
    {
        mount -t cpuset xxx file1
    }

We'll trigger a warning like this:

 ------------[ cut here ]------------
 WARNING: CPU: 1 PID: 4675 at lib/percpu-refcount.c:317 percpu_ref_kill_and_confirm+0x92/0xb0
 percpu_ref_kill_and_confirm called more than once on css_release!
 CPU: 1 PID: 4675 Comm: mount Not tainted 4.11.0-rc5+ #5
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
 Call Trace:
  dump_stack+0x63/0x84
  __warn+0xd1/0xf0
  warn_slowpath_fmt+0x5f/0x80
  percpu_ref_kill_and_confirm+0x92/0xb0
  cgroup_kill_sb+0x95/0xb0
  deactivate_locked_super+0x43/0x70
  deactivate_super+0x46/0x60
 ...
 ---[ end trace a79f61c2a2633700 ]---

Here's a race:

  Thread A				Thread B

  cgroup1_mount()
    # alloc a new cgroup root
    cgroup_setup_root()
					cgroup1_mount()
					  # no sb yet, returns NULL
					  kernfs_pin_sb()

					  # but succeeds in getting the refcnt,
					  # so re-use cgroup root
					  percpu_ref_tryget_live()
    # alloc sb with cgroup root
    cgroup_do_mount()

  cgroup_kill_sb()
					  # alloc another sb with same root
					  cgroup_do_mount()

					cgroup_kill_sb()

We end up using the same cgroup root for two different superblocks,
so percpu_ref_kill() will be called twice on the same root when the
two superblocks are destroyed.

Fix this by making sure that the superblock pinning really succeeded.
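
The decision Thread B makes can be modeled as follows.  This is a
simplified user-space sketch, not the kernel code: the structs,
pin_sb() and both can_reuse_root_*() helpers are hypothetical, and the
only behavior taken from the description above is that pinning
returns NULL when no superblock exists yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sb_model {
	int id;
};

/* Hypothetical model of a cgroup root. */
struct root_model {
	struct sb_model *sb; /* NULL until a superblock is allocated */
	bool ref_live;       /* models a successful tryget on the root */
};

/* Stand-in for pinning the superblock: NULL means "no sb yet". */
static struct sb_model *pin_sb(struct root_model *root)
{
	assert(root);
	return root->sb;
}

/* Racy variant: a NULL pin result is ignored, so the root is reused
 * as long as a reference can be taken. */
static bool can_reuse_root_old(struct root_model *root)
{
	(void)pin_sb(root);
	return root->ref_live;
}

/* Checked variant: reuse only if a superblock was actually pinned. */
static bool can_reuse_root_fixed(struct root_model *root)
{
	if (!pin_sb(root))
		return false;
	return root->ref_live;
}
```

With a root that has no superblock yet, the racy variant allows reuse
and the checked variant refuses, which is the window shown in the
diagram.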

Cc: stable@vger.kernel.org # 3.16+
Reported-by: Dmitry Vyukov 
Signed-off-by: Zefan Li 
Signed-off-by: Tejun Heo