linux-stable.git/kernel, branch v3.4.76

sched: Avoid throttle_cfs_rq() racing with period_timer stopping

2014-01-08T17:42:12+00:00

commit f9f9ffc237dd924f048204e8799da74f9ecf40cf upstream.

throttle_cfs_rq() doesn't check to make sure that period_timer is running,
and while update_curr/assign_cfs_runtime does, a concurrently running
period_timer on another cpu could cancel itself between this cpu's
update_curr and throttle_cfs_rq(). If there are no other cfs_rqs running
in the tg to restart the timer, this causes the cfs_rq to be stranded
forever.

Fix this by calling __start_cfs_bandwidth() in throttle if the timer is
inactive.

(Also add some sched_debug lines for cfs_bandwidth.)

Tested: make a run/sleep task in a cgroup, loop switching the cgroup
between 1ms/100ms quota and unlimited, checking for timer_active=0 and
throttled=1 as a failure. With the throttle_cfs_rq() change commented out
this fails, with the full patch it passes.
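
The race and its fix can be illustrated with a minimal userspace sketch. This
is not the kernel code: struct bw_model and every model_* function are
hypothetical stand-ins, and the model only captures the single decision the
patch adds (restart the timer from the throttle path when it is inactive).

```c
#include <stdbool.h>

/* Simplified stand-in for the bandwidth state; not the kernel structure. */
struct bw_model {
    bool timer_active;   /* is the period timer currently armed? */
    int  throttled_rqs;  /* number of throttled run queues */
};

/* Hypothetical stand-in for __start_cfs_bandwidth(): re-arm the timer. */
void model_start_bandwidth(struct bw_model *bw)
{
    bw->timer_active = true;
}

/* Throttle path without the fix: never looks at the timer state. */
void model_throttle_unfixed(struct bw_model *bw)
{
    bw->throttled_rqs++;
}

/* Throttle path with the fix: restart the timer if it stopped meanwhile. */
void model_throttle_fixed(struct bw_model *bw)
{
    bw->throttled_rqs++;
    if (!bw->timer_active)
        model_start_bandwidth(bw);
}

/* Failure condition from the test description: throttled=1, timer_active=0. */
bool model_is_stranded(const struct bw_model *bw)
{
    return bw->throttled_rqs > 0 && !bw->timer_active;
}
```

Starting from a state where the timer has already cancelled itself, the
unfixed path leaves the run queue stranded, while the fixed path re-arms the
timer so that a later period can unthrottle it.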

Signed-off-by: Ben Segall 
Signed-off-by: Peter Zijlstra 
Cc: pjt@google.com
Link: http://lkml.kernel.org/r/20131016181632.22647.84174.stgit@sword-of-the-dawn.mtv.corp.google.com
Signed-off-by: Ingo Molnar 
Signed-off-by: Chris J Arges 
Signed-off-by: Greg Kroah-Hartman

sched/rt: Fix rq's cpupri leak while enqueue/dequeue child RT entities

2014-01-08T17:42:11+00:00

commit 757dfcaa41844595964f1220f1d33182dae49976 upstream.

This patch touches the RT group scheduling case.

The functions inc_rt_prio_smp() and dec_rt_prio_smp() change the (global)
rq's priority, while the rt_rq passed to them may not be the top-level rt_rq.
This is wrong, because a priority change on a child level does not
guarantee that the priority is the highest across the whole rq. So, this
leak makes RT balancing unusable.

A short example: the task with the highest priority among all of the rq's
RT tasks (no other task has the same priority) wakes up on a throttled
rt_rq.  The rq's cpupri is set to the equivalent of the task's priority,
but the real rq->rt.highest_prio.curr is lower.

The patch below fixes the problem.
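
The idea can be sketched in plain userspace C. This is an illustrative model
under stated assumptions, not the patch itself: struct rq_model, struct
rt_rq_model and the model_* functions are hypothetical, and the sketch assumes
the fix amounts to ignoring priority updates that do not come from the
top-level rt_rq.

```c
#include <stdbool.h>

/* Simplified stand-ins; a lower number means a higher RT priority. */
struct rt_rq_model {
    int highest_prio;
};

struct rq_model {
    struct rt_rq_model rt;   /* top-level RT run queue of this CPU */
    int cpupri;              /* priority advertised for this CPU */
};

/* Unfixed: any rt_rq, including a child group's, overwrites the CPU priority. */
void model_inc_prio_unfixed(struct rq_model *rq, struct rt_rq_model *rt_rq,
                            int prio)
{
    (void)rt_rq;
    rq->cpupri = prio;
}

/* Fixed (assumed shape): only the top-level rt_rq may change the CPU priority. */
void model_inc_prio_fixed(struct rq_model *rq, struct rt_rq_model *rt_rq,
                          int prio)
{
    if (rt_rq != &rq->rt)
        return;
    rq->cpupri = prio;
}

/* The advertised priority is consistent if it matches the top-level queue. */
bool model_cpupri_consistent(const struct rq_model *rq)
{
    return rq->cpupri == rq->rt.highest_prio;
}
```

Enqueueing a high-priority task on a throttled child rt_rq shows the leak: the
unfixed variant advertises a priority that the top-level queue does not
actually have, while the fixed variant leaves cpupri untouched.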

Signed-off-by: Kirill Tkhai 
Signed-off-by: Peter Zijlstra 
CC: Steven Rostedt 
Link: http://lkml.kernel.org/r/49231385567953@web4m.yandex.ru
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

ftrace: Initialize the ftrace profiler for each possible cpu

2014-01-08T17:42:10+00:00

commit c4602c1c818bd6626178d6d3fcc152d9f2f48ac0 upstream.

Ftrace currently initializes only the online CPUs. This implementation has
two problems:
- If we online a CPU after we enable the function profile, and then run the
  test, we will lose the trace information on that CPU.
  Steps to reproduce:
  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # cd /tracing/
  # echo  >> set_ftrace_filter
  # echo 1 > function_profile_enabled
  # echo 1 > /sys/devices/system/cpu/cpu1/online
  # run test
- If we offline a CPU before we enable the function profile, we will not clear
  the trace information when we enable the function profile. The stale data
  can confuse users.
  Steps to reproduce:
  # cd /tracing/
  # echo  >> set_ftrace_filter
  # echo 1 > function_profile_enabled
  # run test
  # cat trace_stat/function*
  # echo 0 > /sys/devices/system/cpu/cpu1/online
  # echo 0 > function_profile_enabled
  # echo 1 > function_profile_enabled
  # cat trace_stat/function*
  # run test
  # cat trace_stat/function*

So it is better that we initialize the ftrace profiler for each possible cpu
every time we enable the function profile instead of just the online ones.
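
A small userspace model of the difference between the two iteration strategies
follows. It is only a sketch: struct cpu_model, MODEL_NR_CPUS and the model_*
functions are hypothetical and do not correspond to real ftrace symbols.

```c
#include <stdbool.h>

#define MODEL_NR_CPUS 4

struct cpu_model {
    bool possible;         /* CPU could ever come online */
    bool online;           /* CPU is online right now */
    bool profiler_ready;   /* per-CPU profiler state has been initialized */
    unsigned long hits;    /* profile counter, possibly stale */
};

/* Reset one CPU's profiler state. */
static void model_reset_cpu(struct cpu_model *cpu)
{
    cpu->hits = 0;
    cpu->profiler_ready = true;
}

/* Before the change: only CPUs that are online right now are (re)initialized. */
void model_init_online(struct cpu_model *cpus)
{
    for (int i = 0; i < MODEL_NR_CPUS; i++)
        if (cpus[i].online)
            model_reset_cpu(&cpus[i]);
}

/* After the change: every possible CPU is (re)initialized on each enable. */
void model_init_possible(struct cpu_model *cpus)
{
    for (int i = 0; i < MODEL_NR_CPUS; i++)
        if (cpus[i].possible)
            model_reset_cpu(&cpus[i]);
}
```

With CPU 1 offline and still holding old counters, model_init_online() leaves
both problems in place (no state for a later online, stale hits), whereas
model_init_possible() resets it.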

Link: http://lkml.kernel.org/r/1387178401-10619-1-git-send-email-miaox@cn.fujitsu.com

Signed-off-by: Miao Xie 
Signed-off-by: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

futex: fix handling of read-only-mapped hugepages

2013-12-20T15:34:19+00:00

commit f12d5bfceb7e1f9051563381ec047f7f13956c3c upstream.

The hugepage code had the exact same bug that regular pages had in
commit 7485d0d3758e ("futexes: Remove rw parameter from
get_futex_key()").

The regular page case was fixed by commit 9ea71503a8ed ("futex: Fix
regression with read only mappings"), but the transparent hugepage case
(added in a5b338f2b0b1: "thp: update futex compound knowledge")
remained broken.

Found by Dave Jones and his trinity tool.

Reported-and-tested-by: Dave Jones 
Acked-by: Thomas Gleixner 
Cc: Mel Gorman 
Cc: Darren Hart 
Cc: Andrea Arcangeli 
Cc: Oleg Nesterov 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman

irq: Enable all irqs unconditionally in irq_resume

2013-12-12T06:34:11+00:00

commit ac01810c9d2814238f08a227062e66a35a0e1ea2 upstream.

When the system enters suspend, it disables all interrupts in
suspend_device_irqs(), including the interrupts marked EARLY_RESUME.

On the resume side things are different. The EARLY_RESUME interrupts
are reenabled in sys_core_ops->resume and the non EARLY_RESUME
interrupts are reenabled in the normal system resume path.

When suspend_noirq() fails or suspend is aborted for any other
reason, we might omit the resume side call to sys_core_ops->resume()
and therefore the interrupts marked EARLY_RESUME are not reenabled and
stay disabled forever.

To solve this, enable all irqs unconditionally in irq_resume()
regardless of whether interrupts marked EARLY_RESUME have already been
enabled or not.

This might try to reenable already enabled interrupts in the non
failure case, but the only affected platform is XEN and it has been
confirmed that it does not cause any side effects.
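
The failure path can be modelled in a few lines of userspace C. This is a
sketch only: struct irq_model, MODEL_NR_IRQS and the model_* functions are
hypothetical names, and the model reduces each interrupt to an enabled flag
plus an EARLY_RESUME marking.

```c
#include <stdbool.h>

#define MODEL_NR_IRQS 3

struct irq_model {
    bool early_resume;  /* stand-in for the EARLY_RESUME marking */
    bool enabled;
};

/* Suspend disables every interrupt, including the EARLY_RESUME ones. */
void model_suspend_irqs(struct irq_model *irqs)
{
    for (int i = 0; i < MODEL_NR_IRQS; i++)
        irqs[i].enabled = false;
}

/* Early pass, run from the syscore resume hook: EARLY_RESUME interrupts only. */
void model_resume_early(struct irq_model *irqs)
{
    for (int i = 0; i < MODEL_NR_IRQS; i++)
        if (irqs[i].early_resume)
            irqs[i].enabled = true;
}

/* Normal pass before the change: skips interrupts marked EARLY_RESUME. */
void model_resume_normal_unfixed(struct irq_model *irqs)
{
    for (int i = 0; i < MODEL_NR_IRQS; i++)
        if (!irqs[i].early_resume)
            irqs[i].enabled = true;
}

/* Normal pass after the change: enables all interrupts unconditionally. */
void model_resume_normal_fixed(struct irq_model *irqs)
{
    for (int i = 0; i < MODEL_NR_IRQS; i++)
        irqs[i].enabled = true;
}
```

If the early pass is skipped because suspend was aborted, the unfixed normal
pass leaves the EARLY_RESUME interrupt disabled; the fixed pass enables it, at
the cost of a redundant enable in the non-failure case.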

[ tglx: Massaged changelog. ]

Signed-off-by: Laxman Dewangan 
Acked-by-and-tested-by: Konrad Rzeszutek Wilk 
Acked-by: Heiko Stuebner 
Reviewed-by: Pavel Machek 
Link: http://lkml.kernel.org/r/1385388587-16442-1-git-send-email-ldewangan@nvidia.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

ftrace: Fix function graph with loading of modules

2013-12-04T18:50:34+00:00

commit 8a56d7761d2d041ae5e8215d20b4167d8aa93f51 upstream.

Commit 8c4f3c3fa9681 "ftrace: Check module functions being traced on reload"
fixed module loading and unloading with respect to function tracing, but
it missed the function graph tracer. If you perform the following:

 # cd /sys/kernel/debug/tracing
 # echo function_graph > current_tracer
 # modprobe nfsd
 # echo nop > current_tracer

You'll get the following oops message:

 ------------[ cut here ]------------
 WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
 Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
 CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
  0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
  0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
  ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
 Call Trace:
  [] dump_stack+0x4f/0x7c
  [] warn_slowpath_common+0x81/0x9b
  [] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
  [] warn_slowpath_null+0x1a/0x1c
  [] __ftrace_hash_rec_update.part.35+0x168/0x1b9
  [] ? __mutex_lock_slowpath+0x364/0x364
  [] ftrace_shutdown+0xd7/0x12b
  [] unregister_ftrace_graph+0x49/0x78
  [] graph_trace_reset+0xe/0x10
  [] tracing_set_tracer+0xa7/0x26a
  [] tracing_set_trace_write+0x8b/0xbd
  [] ? ftrace_return_to_handler+0xb2/0xde
  [] ? __sb_end_write+0x5e/0x5e
  [] vfs_write+0xab/0xf6
  [] ftrace_graph_caller+0x85/0x85
  [] SyS_write+0x59/0x82
  [] ftrace_graph_caller+0x85/0x85
  [] system_call_fastpath+0x16/0x1b
 ---[ end trace 940358030751eafb ]---

The above mentioned commit didn't go far enough. Well, it covered the
function tracer by adding checks in __register_ftrace_function(). The
problem is that the function graph tracer circumvents that (for a slight
efficiency gain when the function graph tracer is running together with a
function tracer; the gain was not worth it).

The problem came with ftrace_startup() which should always be called after
__register_ftrace_function(), if you want this bug to be completely fixed.

Anyway, this solution moves __register_ftrace_function() inside of
ftrace_startup() and removes the need to call them both.

Reported-by: Dave Wysochanski 
Fixes: ed926f9b35cd ("ftrace: Use counters to enable functions to trace")
Signed-off-by: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

cpuset: Fix memory allocator deadlock

2013-12-04T18:50:34+00:00

commit 0fc0287c9ed1ffd3706f8b4d9b314aa102ef1245 upstream.

Juri hit the below lockdep report:

[    4.303391] ======================================================
[    4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
[    4.303394] 3.12.0-dl-peterz+ #144 Not tainted
[    4.303395] ------------------------------------------------------
[    4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[    4.303399]  (&p->mems_allowed_seq){+.+...}, at: [] new_slab+0x6c/0x290
[    4.303417]
[    4.303417] and this task is already holding:
[    4.303418]  (&(&q->__queue_lock)->rlock){..-...}, at: [] blk_execute_rq_nowait+0x5b/0x100
[    4.303431] which would create a new lock dependency:
[    4.303432]  (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
[    4.303436]

[    4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
[    4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 {
[    4.303922]    HARDIRQ-ON-W at:
[    4.303923]                     [] __lock_acquire+0x65a/0x1ff0
[    4.303926]                     [] lock_acquire+0x93/0x140
[    4.303929]                     [] kthreadd+0x86/0x180
[    4.303931]                     [] ret_from_fork+0x7c/0xb0
[    4.303933]    SOFTIRQ-ON-W at:
[    4.303933]                     [] __lock_acquire+0x68c/0x1ff0
[    4.303935]                     [] lock_acquire+0x93/0x140
[    4.303940]                     [] kthreadd+0x86/0x180
[    4.303955]                     [] ret_from_fork+0x7c/0xb0
[    4.303959]    INITIAL USE at:
[    4.303960]                    [] __lock_acquire+0x344/0x1ff0
[    4.303963]                    [] lock_acquire+0x93/0x140
[    4.303966]                    [] kthreadd+0x86/0x180
[    4.303969]                    [] ret_from_fork+0x7c/0xb0
[    4.303972]  }

Which reports that we take mems_allowed_seq with interrupts enabled. A
little digging found that this can only be from
cpuset_change_task_nodemask().

This is an actual deadlock because an interrupt doing an allocation will
hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin
forever waiting for the write side to complete.

Cc: John Stultz 
Cc: Mel Gorman 
Reported-by: Juri Lelli 
Signed-off-by: Peter Zijlstra 
Tested-by: Juri Lelli 
Acked-by: Li Zefan 
Acked-by: Mel Gorman 
Signed-off-by: Tejun Heo 
Signed-off-by: Greg Kroah-Hartman

audit: fix info leak in AUDIT_GET requests

2013-12-04T18:50:32+00:00

commit 64fbff9ae0a0a843365d922e0057fc785f23f0e3 upstream.

We leak 4 bytes of kernel stack in response to an AUDIT_GET request
because we fail to initialize the mask member of status_set. Fix that.
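
The class of bug is easy to reproduce in userspace. The sketch below is not
the audit code: struct status_model and model_fill_status() are hypothetical,
the real structure has more members, and zeroing the whole structure is just
one possible way to make sure no member is left uninitialized.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reply layout; only 'mask' mirrors a name from the commit. */
struct status_model {
    uint32_t mask;
    uint32_t enabled;
    uint32_t failure;
};

/*
 * Fill a reply so that no member keeps whatever happened to be in memory
 * before. Without the memset(), 'mask' would carry 4 stale bytes to the
 * receiver, because the code below never assigns it.
 */
void model_fill_status(struct status_model *out, uint32_t enabled,
                       uint32_t failure)
{
    memset(out, 0, sizeof(*out));   /* clears mask and any padding */
    out->enabled = enabled;
    out->failure = failure;
}
```

Pre-filling the buffer with a poison pattern before the call makes the effect
visible: after model_fill_status() the mask member reads as zero rather than
as the poison value.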

Cc: Al Viro 
Cc: Eric Paris 
Signed-off-by: Mathias Krause 
Signed-off-by: Richard Guy Briggs 
Signed-off-by: Eric Paris 
Signed-off-by: Greg Kroah-Hartman

audit: use nlmsg_len() to get message payload length

2013-12-04T18:50:31+00:00

commit 4d8fe7376a12bf4524783dd95cbc00f1fece6232 upstream.

Using the nlmsg_len member of the netlink header to test if the message
is valid is wrong as it includes the size of the netlink header itself.
This allows short netlink messages to be sent that pass those checks.

Use nlmsg_len() instead to test for the right message length. The result
of nlmsg_len() is guaranteed to be non-negative as the netlink message
already passed the checks of nlmsg_ok().

Also switch to min_t() to please checkpatch.pl.
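
The difference between the two length checks can be shown with a small
userspace model. The names here are hypothetical: struct hdr_model only mimics
the shape of a netlink header, and model_payload_len() plays the role that
nlmsg_len() plays in the commit text (total length minus header length).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header with the same shape as a netlink header. */
struct hdr_model {
    uint32_t total_len;  /* header plus payload, like the nlmsg_len member */
    uint16_t type;
    uint16_t flags;
    uint32_t seq;
    uint32_t pid;
};

#define MODEL_HDRLEN ((uint32_t)sizeof(struct hdr_model))

/*
 * Payload length only. Assumes total_len >= MODEL_HDRLEN, which is what
 * the earlier nlmsg_ok() validation guarantees in the scenario described.
 */
uint32_t model_payload_len(const struct hdr_model *h)
{
    return h->total_len - MODEL_HDRLEN;
}

/* Wrong check: compares header plus payload against the payload needed. */
bool model_check_unfixed(const struct hdr_model *h, size_t need)
{
    return h->total_len >= need;
}

/* Right check: compares only the payload against the payload needed. */
bool model_check_fixed(const struct hdr_model *h, size_t need)
{
    return model_payload_len(h) >= need;
}
```

A message carrying a 4-byte payload where 20 bytes are required passes the
unfixed check, because the header length is counted toward the requirement,
and is rejected by the fixed one.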

Cc: Al Viro 
Cc: Eric Paris 
Signed-off-by: Mathias Krause 
Signed-off-by: Richard Guy Briggs 
Signed-off-by: Eric Paris 
Signed-off-by: Greg Kroah-Hartman

audit: printk USER_AVC messages when audit isn't enabled

2013-12-04T18:50:31+00:00

commit 0868a5e150bc4c47e7a003367cd755811eb41e0b upstream.

When the audit=1 kernel parameter is absent and auditd is not running,
AUDIT_USER_AVC messages are being silently discarded.

AUDIT_USER_AVC messages should be sent to userspace using printk(), as
mentioned in the commit message of 4a4cd633 ("AUDIT: Optimise the
audit-disabled case for discarding user messages").

When audit_enabled is 0, audit_receive_msg() discards all user messages
except for AUDIT_USER_AVC messages. However, audit_log_common_recv_msg()
refuses to allocate an audit_buffer if audit_enabled is 0. The fix is to
special case AUDIT_USER_AVC messages in both functions.
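
The interaction of the two checks can be sketched as a pair of predicates.
This is a userspace illustration, not the audit code: enum msg_model and the
model_* functions are hypothetical, and each function reduces the
corresponding kernel function to the single decision discussed above.

```c
#include <stdbool.h>

enum msg_model { MODEL_USER_MSG, MODEL_USER_AVC };

/* Receive side: with auditing disabled, only USER_AVC messages are kept. */
bool model_receive_keeps(bool audit_enabled, enum msg_model type)
{
    return audit_enabled || type == MODEL_USER_AVC;
}

/* Logging side before the change: no buffer whenever auditing is disabled. */
bool model_log_allocates_unfixed(bool audit_enabled, enum msg_model type)
{
    (void)type;
    return audit_enabled;
}

/* Logging side after the change: USER_AVC is special-cased here as well. */
bool model_log_allocates_fixed(bool audit_enabled, enum msg_model type)
{
    return audit_enabled || type == MODEL_USER_AVC;
}

/* A message is emitted only if both stages let it through. */
bool model_delivered_unfixed(bool audit_enabled, enum msg_model type)
{
    return model_receive_keeps(audit_enabled, type) &&
           model_log_allocates_unfixed(audit_enabled, type);
}

bool model_delivered_fixed(bool audit_enabled, enum msg_model type)
{
    return model_receive_keeps(audit_enabled, type) &&
           model_log_allocates_fixed(audit_enabled, type);
}
```

With auditing disabled, a USER_AVC message survives the receive check but is
dropped at the logging stage in the unfixed model; once both stages apply the
same exception, it is delivered.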

It looks like commit 50397bd1 ("[AUDIT] clean up audit_receive_msg()")
introduced this bug.

Signed-off-by: Tyler Hicks 
Cc: Al Viro 
Cc: Eric Paris 
Cc: linux-audit@redhat.com
Acked-by: Kees Cook 
Signed-off-by: Richard Guy Briggs 
Signed-off-by: Eric Paris 
Signed-off-by: Greg Kroah-Hartman