linux-stable.git/kernel, branch linux-3.10.y

tracing: Erase irqsoff trace with empty write

2017-11-02T09:46:01+00:00

commit 8dd33bcb7050dd6f8c1432732f930932c9d3a33e upstream.

One convenient way to erase trace is "echo > trace". However, this
is currently broken if the current tracer is irqsoff tracer. This
is because irqsoff tracer use max_buffer as the default trace
buffer.

Set the max_buffer as the one to be cleared when it's the trace
buffer currently in use.

Link: http://lkml.kernel.org/r/1505754215-29411-1-git-send-email-byan@nvidia.com

Cc: 
Cc: stable@vger.kernel.org
Fixes: 4acd4d00f ("tracing: give easy way to clear trace buffer")
Signed-off-by: Bo Yan 
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Willy Tarreau

tracing: Apply trace_clock changes to instance max buffer

2017-11-02T09:46:01+00:00

commit 170b3b1050e28d1ba0700e262f0899ffa4fccc52 upstream.

Currently trace_clock timestamps are applied to both regular and max
buffers only for global trace. For instance trace, trace_clock
timestamps are applied only to regular buffer. But, regular and max
buffers can be swapped, for example, following a snapshot. So, for
instance trace, bad timestamps can be seen following a snapshot.
Let's apply trace_clock timestamps to instance max buffer as well.

Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com

Cc: stable@vger.kernel.org
Fixes: 277ba0446 ("tracing: Add interface to allow multiple trace buffers")
Signed-off-by: Baohong Liu 
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Willy Tarreau

workqueue: implicit ordered attribute should be overridable

2017-11-02T09:45:58+00:00

commit 0a94efb5acbb6980d7c9ab604372d93cd507e4d8 upstream.

5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered") automatically enabled ordered attribute for unbound
workqueues w/ max_active == 1.  Because ordered workqueues reject
max_active and some attribute changes, this implicit ordered mode
broke cases where the user creates an unbound workqueue w/ max_active
== 1 and later explicitly changes the related attributes.

This patch distinguishes explicit and implicit ordered setting and
overrides from attribute changes if implict.

Signed-off-by: Tejun Heo 
Fixes: 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
Cc: Holger Hoffstätte 
Signed-off-by: Greg Kroah-Hartman 

Signed-off-by: Willy Tarreau

kernel/extable.c: mark core_kernel_text notrace

2017-11-02T06:16:20+00:00

commit c0d80ddab89916273cb97114889d3f337bc370ae upstream.

core_kernel_text is used by MIPS in its function graph trace processing,
so having this method traced leads to an infinite set of recursive calls
such as:

  Call Trace:
     ftrace_return_to_handler+0x50/0x128
     core_kernel_text+0x10/0x1b8
     prepare_ftrace_return+0x6c/0x114
     ftrace_graph_caller+0x20/0x44
     return_to_handler+0x10/0x30
     return_to_handler+0x0/0x30
     return_to_handler+0x0/0x30
     ftrace_ops_no_ops+0x114/0x1bc
     core_kernel_text+0x10/0x1b8
     core_kernel_text+0x10/0x1b8
     core_kernel_text+0x10/0x1b8
     ftrace_ops_no_ops+0x114/0x1bc
     core_kernel_text+0x10/0x1b8
     prepare_ftrace_return+0x6c/0x114
     ftrace_graph_caller+0x20/0x44
     (...)

Mark the function notrace to avoid it being traced.

Link: http://lkml.kernel.org/r/1498028607-6765-1-git-send-email-marcin.nowakowski@imgtec.com
Signed-off-by: Marcin Nowakowski 
Reviewed-by: Masami Hiramatsu 
Cc: Peter Zijlstra 
Cc: Thomas Meyer 
Cc: Ingo Molnar 
Cc: Steven Rostedt 
Cc: Daniel Borkmann 
Cc: Paul Gortmaker 
Cc: Thomas Gleixner 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Willy Tarreau

workqueue: restore WQ_UNBOUND/max_active==1 to be ordered

2017-11-01T21:12:44+00:00

commit 5c0338c68706be53b3dc472e4308961c36e4ece1 upstream.

The combination of WQ_UNBOUND and max_active == 1 used to imply
ordered execution.  After NUMA affinity 4c16bd327c74 ("workqueue:
implement NUMA affinity for unbound workqueues"), this is no longer
true due to per-node worker pools.

While the right way to create an ordered workqueue is
alloc_ordered_workqueue(), the documentation has been misleading for a
long time and people do use WQ_UNBOUND and max_active == 1 for ordered
workqueues which can lead to subtle bugs which are very difficult to
trigger.

It's unlikely that we'd see noticeable performance impact by enforcing
ordering on WQ_UNBOUND / max_active == 1 workqueues.  Let's
automatically set __WQ_ORDERED for those workqueues.

Signed-off-by: Tejun Heo 
Reported-by: Christoph Hellwig 
Reported-by: Alexei Potashnik 
Fixes: 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues")
Cc: stable@vger.kernel.org # v3.10+
Signed-off-by: Willy Tarreau

printk: use rcuidle console tracepoint

2017-06-20T12:04:54+00:00

commit fc98c3c8c9dcafd67adcce69e6ce3191d5306c9c upstream.

Use rcuidle console tracepoint because, apparently, it may be issued
from an idle CPU:

  hw-breakpoint: Failed to enable monitor mode on CPU 0.
  hw-breakpoint: CPU 0 failed to disable vector catch

  ===============================
  [ ERR: suspicious RCU usage.  ]
  4.10.0-rc8-next-20170215+ #119 Not tainted
  -------------------------------
  ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  RCU used illegally from idle CPU!
  rcu_scheduler_active = 2, debug_locks = 0
  RCU used illegally from extended quiescent state!
  2 locks held by swapper/0/0:
   #0:  (cpu_pm_notifier_lock){......}, at: [] cpu_pm_exit+0x10/0x54
   #1:  (console_lock){+.+.+.}, at: [] vprintk_emit+0x264/0x474

  stack backtrace:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
  Hardware name: Generic OMAP4 (Flattened Device Tree)
    console_unlock
    vprintk_emit
    vprintk_default
    printk
    reset_ctrl_regs
    dbg_cpu_pm_notify
    notifier_call_chain
    cpu_pm_exit
    omap_enter_idle_coupled
    cpuidle_enter_state
    cpuidle_enter_state_coupled
    do_idle
    cpu_startup_entry
    start_kernel

This RCU warning, however, is suppressed by lockdep_off() in printk().
lockdep_off() increments the ->lockdep_recursion counter and thus
disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want
lockdep to be enabled "current->lockdep_recursion == 0".

Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky 
Reported-by: Tony Lindgren 
Tested-by: Tony Lindgren 
Acked-by: Paul E. McKenney 
Acked-by: Steven Rostedt (VMware) 
Cc: Petr Mladek 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Lindgren 
Cc: Russell King 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
[wt: changes are in kernel/printk.c in 3.10]
Signed-off-by: Willy Tarreau

padata: avoid race in reordering

2017-06-20T12:04:40+00:00

commit de5540d088fe97ad583cc7d396586437b32149a5 upstream.

Under extremely heavy uses of padata, crashes occur, and with list
debugging turned on, this happens instead:

[87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33
__list_add+0xae/0x130
[87487.301868] list_add corruption. prev->next should be next
(ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00).
[87487.339011]  [] dump_stack+0x68/0xa3
[87487.342198]  [] ? console_unlock+0x281/0x6d0
[87487.345364]  [] __warn+0xff/0x140
[87487.348513]  [] warn_slowpath_fmt+0x4a/0x50
[87487.351659]  [] __list_add+0xae/0x130
[87487.354772]  [] ? _raw_spin_lock+0x64/0x70
[87487.357915]  [] padata_reorder+0x1e6/0x420
[87487.361084]  [] padata_do_serial+0xa5/0x120

padata_reorder calls list_add_tail with the list to which its adding
locked, which seems correct:

spin_lock(&squeue->serial.lock);
list_add_tail(&padata->list, &squeue->serial.list);
spin_unlock(&squeue->serial.lock);

This therefore leaves only place where such inconsistency could occur:
if padata->list is added at the same time on two different threads.
This pdata pointer comes from the function call to
padata_get_next(pd), which has in it the following block:

next_queue = per_cpu_ptr(pd->pqueue, cpu);
padata = NULL;
reorder = &next_queue->reorder;
if (!list_empty(&reorder->list)) {
       padata = list_entry(reorder->list.next,
                           struct padata_priv, list);
       spin_lock(&reorder->lock);
       list_del_init(&padata->list);
       atomic_dec(&pd->reorder_objects);
       spin_unlock(&reorder->lock);

       pd->processed++;

       goto out;
}
out:
return padata;

I strongly suspect that the problem here is that two threads can race
on reorder list. Even though the deletion is locked, call to
list_entry is not locked, which means it's feasible that two threads
pick up the same padata object and subsequently call list_add_tail on
them at the same time. The fix is thus be hoist that lock outside of
that block.

Signed-off-by: Jason A. Donenfeld 
Acked-by: Steffen Klassert 
Signed-off-by: Herbert Xu 
Signed-off-by: Willy Tarreau

futex: Add missing error handling to FUTEX_REQUEUE_PI

2017-06-20T12:04:31+00:00

commit 9bbb25afeb182502ca4f2c4f3f88af0681b34cae upstream.

Thomas spotted that fixup_pi_state_owner() can return errors and we
fail to unlock the rt_mutex in that case.

Reported-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Darren Hart 
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Willy Tarreau

futex: Fix potential use-after-free in FUTEX_REQUEUE_PI

2017-06-20T12:04:31+00:00

commit c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream.

While working on the futex code, I stumbled over this potential
use-after-free scenario. Dmitry triggered it later with syzkaller.

pi_mutex is a pointer into pi_state, which we drop the reference on in
unqueue_me_pi(). So any access to that pointer after that is bad.

Since other sites already do rt_mutex_unlock() with hb->lock held, see
for example futex_lock_pi(), simply move the unlock before
unqueue_me_pi().

Reported-by: Dmitry Vyukov 
Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Darren Hart 
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Willy Tarreau

futex: Move futex_init() to core_initcall

2017-06-20T12:04:20+00:00

commit 25f71d1c3e98ef0e52371746220d66458eac75bc upstream.

The UEVENT user mode helper is enabled before the initcalls are executed
and is available when the root filesystem has been mounted.

The user mode helper is triggered by device init calls and the executable
might use the futex syscall.

futex_init() is marked __initcall which maps to device_initcall, but there
is no guarantee that futex_init() is invoked _before_ the first device init
call which triggers the UEVENT user mode helper.

If the user mode helper uses the futex syscall before futex_init() then the
syscall crashes with a NULL pointer dereference because the futex subsystem
has not been initialized yet.

Move futex_init() to core_initcall so futexes are initialized before the
root filesystem is mounted and the usermode helper becomes available.

[ tglx: Rewrote changelog ]

Signed-off-by: Yang Yang 
Cc: jiang.biao2@zte.com.cn
Cc: jiang.zhengxiong@zte.com.cn
Cc: zhong.weidong@zte.com.cn
Cc: deng.huali@zte.com.cn
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
Signed-off-by: Thomas Gleixner 
Signed-off-by: Willy Tarreau