linux-stable.git/kernel, branch v3.12.71

printk: use rcuidle console tracepoint

2017-03-01T09:38:14+00:00

commit fc98c3c8c9dcafd67adcce69e6ce3191d5306c9c upstream.

Use rcuidle console tracepoint because, apparently, it may be issued
from an idle CPU:

  hw-breakpoint: Failed to enable monitor mode on CPU 0.
  hw-breakpoint: CPU 0 failed to disable vector catch

  ===============================
  [ ERR: suspicious RCU usage.  ]
  4.10.0-rc8-next-20170215+ #119 Not tainted
  -------------------------------
  ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  RCU used illegally from idle CPU!
  rcu_scheduler_active = 2, debug_locks = 0
  RCU used illegally from extended quiescent state!
  2 locks held by swapper/0/0:
   #0:  (cpu_pm_notifier_lock){......}, at: [] cpu_pm_exit+0x10/0x54
   #1:  (console_lock){+.+.+.}, at: [] vprintk_emit+0x264/0x474

  stack backtrace:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
  Hardware name: Generic OMAP4 (Flattened Device Tree)
    console_unlock
    vprintk_emit
    vprintk_default
    printk
    reset_ctrl_regs
    dbg_cpu_pm_notify
    notifier_call_chain
    cpu_pm_exit
    omap_enter_idle_coupled
    cpuidle_enter_state
    cpuidle_enter_state_coupled
    do_idle
    cpu_startup_entry
    start_kernel

This RCU warning, however, is suppressed by lockdep_off() in printk().
lockdep_off() increments the ->lockdep_recursion counter and thus
disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), both of which
require lockdep to be enabled, i.e. current->lockdep_recursion == 0.
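
The suppression described above can be sketched in user space. This is a
minimal model, not the kernel code: all demo_* names are hypothetical, and
only the lockdep_recursion term of debug_lockdep_rcu_enabled() is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one task_struct field that matters here. */
struct demo_task {
	int lockdep_recursion;
};

static int demo_rcu_warnings;	/* splats that would have been printed */

static void demo_lockdep_off(struct demo_task *t) { t->lockdep_recursion++; }
static void demo_lockdep_on(struct demo_task *t)  { t->lockdep_recursion--; }

/* Models only the lockdep_recursion term of debug_lockdep_rcu_enabled(). */
static bool demo_rcu_debug_enabled(const struct demo_task *t)
{
	return t->lockdep_recursion == 0;
}

/* Rough analogue of RCU_LOCKDEP_WARN(): complain only while enabled. */
static void demo_rcu_lockdep_warn(const struct demo_task *t, bool illegal)
{
	if (demo_rcu_debug_enabled(t) && illegal)
		demo_rcu_warnings++;
}

static void demo_lockdep_selfcheck(void)
{
	struct demo_task idle = { 0 };

	demo_lockdep_off(&idle);		/* as printk() does */
	demo_rcu_lockdep_warn(&idle, true);	/* illegal use, lockdep off */
	demo_lockdep_on(&idle);
	assert(demo_rcu_warnings == 0);		/* the splat was suppressed */

	demo_rcu_lockdep_warn(&idle, true);	/* same check, lockdep on */
	assert(demo_rcu_warnings == 1);
}
```

In this model the illegal use is still there while lockdep is off; only the
report is hidden.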

Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky 
Reported-by: Tony Lindgren 
Tested-by: Tony Lindgren 
Acked-by: Paul E. McKenney 
Acked-by: Steven Rostedt (VMware) 
Cc: Petr Mladek 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Tony Lindgren 
Cc: Russell King 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

futex: Move futex_init() to core_initcall

2017-03-01T09:38:14+00:00

commit 25f71d1c3e98ef0e52371746220d66458eac75bc upstream.

The UEVENT user mode helper is enabled before the initcalls are executed
and is available when the root filesystem has been mounted.

The user mode helper is triggered by device init calls and the executable
might use the futex syscall.

futex_init() is marked __initcall which maps to device_initcall, but there
is no guarantee that futex_init() is invoked _before_ the first device init
call which triggers the UEVENT user mode helper.

If the user mode helper uses the futex syscall before futex_init() then the
syscall crashes with a NULL pointer dereference because the futex subsystem
has not been initialized yet.

Move futex_init() to core_initcall so futexes are initialized before the
root filesystem is mounted and the usermode helper becomes available.
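
The ordering argument can be sketched with a small user-space model of
initcall levels. The demo_* names and the table-driven runner are
assumptions made for illustration; only the level numbers for core (1) and
device (6) follow the kernel's initcall ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Two of the kernel's initcall levels; lower numbers run earlier. */
enum { DEMO_LEVEL_CORE = 1, DEMO_LEVEL_DEVICE = 6, DEMO_LEVEL_MAX = 7 };

struct demo_initcall {
	int level;
	void (*fn)(void);
};

static int demo_futex_ready;
static int demo_helper_saw_futex;

static void demo_futex_init(void)  { demo_futex_ready = 1; }
/* A device initcall that triggers a helper which needs futexes. */
static void demo_device_init(void) { demo_helper_saw_futex = demo_futex_ready; }

/* Run level by level; within a level, table order stands in for link order. */
static void demo_run_initcalls(const struct demo_initcall *c, size_t n)
{
	demo_futex_ready = 0;
	for (int level = 0; level <= DEMO_LEVEL_MAX; level++)
		for (size_t i = 0; i < n; i++)
			if (c[i].level == level)
				c[i].fn();
}

static void demo_initcall_selfcheck(void)
{
	const struct demo_initcall same_level[] = {
		{ DEMO_LEVEL_DEVICE, demo_device_init },
		{ DEMO_LEVEL_DEVICE, demo_futex_init },
	};
	const struct demo_initcall core_level[] = {
		{ DEMO_LEVEL_DEVICE, demo_device_init },
		{ DEMO_LEVEL_CORE, demo_futex_init },
	};

	demo_run_initcalls(same_level, 2);
	assert(demo_helper_saw_futex == 0);	/* helper ran too early */

	demo_run_initcalls(core_level, 2);
	assert(demo_helper_saw_futex == 1);	/* futexes were ready */
}
```

With both calls at device level the outcome depends on their relative
order; at core level the futex setup always runs first.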

[ tglx: Rewrote changelog ]

Signed-off-by: Yang Yang 
Cc: jiang.biao2@zte.com.cn
Cc: jiang.zhengxiong@zte.com.cn
Cc: zhong.weidong@zte.com.cn
Cc: deng.huali@zte.com.cn
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
Signed-off-by: Thomas Gleixner 
Signed-off-by: Jiri Slaby

sched/debug: Don't dump sched debug info in SysRq-W

2017-02-16T10:44:47+00:00

commit fb90a6e93c0684ab2629a42462400603aa829b9c upstream.

sysrq_sched_debug_show() can dump a lot of information.  Don't print out
all that if we're just trying to get a list of blocked tasks (SysRq-W).
The information is still accessible with SysRq-T.

Signed-off-by: Rabin Vincent 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vincent@axis.com
Signed-off-by: Ingo Molnar 
Cc: Nikolay Borisov 
Signed-off-by: Jiri Slaby

sysctl: fix proc_doulongvec_ms_jiffies_minmax()

2017-02-15T10:56:07+00:00

commit ff9f8a7cf935468a94d9927c68b00daae701667e upstream.

The conversion between kernel jiffies and ms is performed only when the
kernel value is exported to user space.

The opposite conversion is needed when a value is written by the user.

This only matters when HZ != 1000.
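
The two directions can be sketched as follows. This is a minimal user-space
illustration; the demo_* helpers and the HZ value of 250 are assumptions,
not the sysctl code itself.

```c
#include <assert.h>

/* Hypothetical tick rate for illustration; real kernels fix HZ at build time. */
#define DEMO_HZ 250UL

/* Read path: the stored jiffies value is shown to user space in ms. */
static unsigned long demo_jiffies_to_ms(unsigned long j)
{
	return 1000UL * j / DEMO_HZ;
}

/* Write path: the ms value from user space is stored as jiffies. */
static unsigned long demo_ms_to_jiffies(unsigned long ms)
{
	return DEMO_HZ * ms / 1000UL;
}

/* Writing 4000 ms must store 1000 jiffies and read back as 4000 ms. */
static void demo_sysctl_selfcheck(void)
{
	unsigned long stored = demo_ms_to_jiffies(4000);

	assert(stored == 1000);
	assert(demo_jiffies_to_ms(stored) == 4000);
}
```

If the write-side conversion is skipped, the 4000 written by the user is
stored as 4000 jiffies and, with this assumed HZ of 250, reads back as
16000 ms.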

Signed-off-by: Eric Dumazet 
Signed-off-by: Linus Torvalds 
Signed-off-by: Jiri Slaby

jump_labels: API for flushing deferred jump label updates

2017-01-26T16:40:35+00:00

commit b6416e61012429e0277bd15a229222fd17afc1c1 upstream.

Modules that use static_key_deferred need a way to synchronize with
any delayed work that is still pending when the module is unloaded.
Introduce static_key_deferred_flush() which flushes any pending
jump label updates.

[js] no STATIC_KEY_CHECK_USE in 3.12 -> remove it

Signed-off-by: David Matlack 
Acked-by: Peter Zijlstra (Intel) 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Jiri Slaby

tick/broadcast: Prevent NULL pointer dereference

2017-01-26T16:40:21+00:00

commit c1a9eeb938b5433947e5ea22f89baff3182e7075 upstream.

When a dysfunctional timer, e.g. dummy timer, is installed, the tick core
tries to set up the broadcast timer.

If no broadcast device is installed, the kernel crashes with a NULL pointer
dereference in tick_broadcast_setup_oneshot() because the function has no
sanity check.
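
The kind of sanity check that is missing can be sketched like this. The
struct and function below are hypothetical, reduced stand-ins and not the
tick broadcast code; the sketch only shows a NULL guard placed before the
first dereference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct clock_event_device. */
struct demo_clock_event_device {
	bool oneshot;
};

/* Returns true if a broadcast device was switched to oneshot mode. */
static bool demo_broadcast_setup_oneshot(struct demo_clock_event_device *bc)
{
	/* Sanity check: no broadcast device is installed, nothing to do. */
	if (!bc)
		return false;

	bc->oneshot = true;
	return true;
}

static void demo_broadcast_selfcheck(void)
{
	struct demo_clock_event_device dev = { false };

	assert(!demo_broadcast_setup_oneshot(NULL));	/* no crash */
	assert(demo_broadcast_setup_oneshot(&dev));
	assert(dev.oneshot);
}
```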

Reported-by: Mason 
Signed-off-by: Thomas Gleixner 
Cc: Mark Rutland 
Cc: Anna-Maria Gleixner 
Cc: Richard Cochran 
Cc: Sebastian Andrzej Siewior 
Cc: Daniel Lezcano 
Cc: Peter Zijlstra
Cc: Sebastian Frias 
Cc: Thibaud Cornic 
Cc: Robin Murphy 
Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
Signed-off-by: Jiri Slaby

hotplug: Make register and unregister notifier API symmetric

2017-01-26T16:22:20+00:00

commit 777c6e0daebb3fcefbbd6f620410a946b07ef6d0 upstream.

Yu Zhao noticed that __unregister_cpu_notifier() only unregisters its
notifiers when HOTPLUG_CPU=y, while the registration might succeed even
when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
might leave a stale notifier on the list during the manual cleanup in
the pool teardown and thus corrupt the list, resulting in the following:
[  144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
[  144.971337] IP: [] raw_notifier_chain_register+0x1b/0x40

[  145.122628] Call Trace:
[  145.125086]  [] __register_cpu_notifier+0x18/0x20
[  145.131350]  [] zswap_pool_create+0x273/0x400
[  145.137268]  [] __zswap_param_set+0x1fc/0x300
[  145.143188]  [] ? trace_hardirqs_on+0xd/0x10
[  145.149018]  [] ? kernel_param_lock+0x28/0x30
[  145.154940]  [] ? __might_fault+0x4f/0xa0
[  145.160511]  [] zswap_compressor_param_set+0x17/0x20
[  145.167035]  [] param_attr_store+0x5c/0xb0
[  145.172694]  [] module_attr_store+0x1d/0x30
[  145.178443]  [] sysfs_kf_write+0x4f/0x70
[  145.183925]  [] kernfs_fop_write+0x149/0x180
[  145.189761]  [] __vfs_write+0x18/0x40
[  145.194982]  [] vfs_write+0xb2/0x1a0
[  145.200122]  [] SyS_write+0x52/0xa0
[  145.205177]  [] entry_SYSCALL_64_fastpath+0x12/0x17

This can even be triggered manually by changing
/sys/module/zswap/parameters/compressor multiple times.

Fix this issue by making unregister APIs symmetric to the register so
there are no surprises.

[js] backport to 3.12

Fixes: 47e627bc8c9a ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
Reported-and-tested-by: Yu Zhao 
Signed-off-by: Michal Hocko 
Cc: linux-mm@kvack.org
Cc: Andrew Morton 
Cc: Dan Streetman 
Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Jiri Slaby

locking/rtmutex: Use READ_ONCE() in rt_mutex_owner()

2017-01-26T16:22:16+00:00

commit 1be5d4fa0af34fb7bafa205aeb59f5c7cc7a089d upstream.

While debugging the rtmutex unlock vs. dequeue race, Will suggested using
READ_ONCE() in rt_mutex_owner() as it might race against the
cmpxchg_release() in unlock_rt_mutex_safe().
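
What a single-load accessor looks like can be sketched in user space. The
macro below is a simplified stand-in for the kernel's READ_ONCE(), and the
demo_* names and the one-bit mask are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified READ_ONCE(): force exactly one load through a volatile
 * access. __typeof__ is a GNU extension accepted by GCC and Clang.
 */
#define DEMO_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define DEMO_OWNER_MASKALL ((uintptr_t)1)	/* low bit carries lock state */

struct demo_lock {
	uintptr_t owner;
};

/* Snapshot the owner word once, then strip the state bit from the copy. */
static uintptr_t demo_rt_mutex_owner(struct demo_lock *l)
{
	uintptr_t owner = DEMO_READ_ONCE(l->owner);

	return owner & ~DEMO_OWNER_MASKALL;
}

static void demo_owner_selfcheck(void)
{
	struct demo_lock l = { 0x1000 | DEMO_OWNER_MASKALL };

	assert(demo_rt_mutex_owner(&l) == 0x1000);
}
```

The point of the single load is that the compiler cannot re-read the owner
word between the read and the masking, so the result always derives from
one consistent value.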

Will: "It's a minor thing which will most likely not matter in practice"

A careful search did not unearth an actual problem in today's code, but it's
better to be safe than surprised.

Suggested-by: Will Deacon 
Signed-off-by: Thomas Gleixner 
Acked-by: Peter Zijlstra (Intel) 
Cc: David Daney 
Cc: Linus Torvalds 
Cc: Mark Rutland 
Cc: Peter Zijlstra 
Cc: Sebastian Siewior 
Cc: Steven Rostedt 
Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
Signed-off-by: Ingo Molnar 
Signed-off-by: Jiri Slaby

locking/rtmutex: Prevent dequeue vs. unlock race

2017-01-26T16:22:16+00:00

commit dbb26055defd03d59f678cb5f2c992abe05b064a upstream.

David reported a futex/rtmutex state corruption. It's caused by the
following problem:

CPU0		CPU1		CPU2

l->owner=T1
		rt_mutex_lock(l)
		lock(l->wait_lock)
		l->owner = T1 | HAS_WAITERS;
		enqueue(T2)
		boost()
		  unlock(l->wait_lock)
		schedule()

				rt_mutex_lock(l)
				lock(l->wait_lock)
				l->owner = T1 | HAS_WAITERS;
				enqueue(T3)
				boost()
				  unlock(l->wait_lock)
				schedule()
		signal(->T2)	signal(->T3)
		lock(l->wait_lock)
		dequeue(T2)
		deboost()
		  unlock(l->wait_lock)
				lock(l->wait_lock)
				dequeue(T3)
				  ===> wait list is now empty
				deboost()
				 unlock(l->wait_lock)
		lock(l->wait_lock)
		fixup_rt_mutex_waiters()
		  if (wait_list_empty(l)) {
		    owner = l->owner & ~HAS_WAITERS;
		    l->owner = owner
		     ==> l->owner = T1
		  }

				lock(l->wait_lock)
rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
				  if (wait_list_empty(l)) {
				    owner = l->owner & ~HAS_WAITERS;
cmpxchg(l->owner, T1, NULL)
 ===> Success (l->owner = NULL)
				    l->owner = owner
				     ==> l->owner = T1
				  }

That means the problem is caused by fixup_rt_mutex_waiters(), which does the
RMW to clear the waiters bit unconditionally when there are no waiters in
the rtmutex's rbtree.

This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation, then the previous owner, which just unlocked
the rtmutex, is set as the owner again when the write takes place after the
successful cmpxchg().

The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.
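
The check described above can be sketched as follows. This is a reduced,
single-threaded user-space model: the struct, the demo_* names and the
boolean standing in for the waiter tree are assumptions, and the wait_lock
is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_HAS_WAITERS ((uintptr_t)1)

/* Hypothetical, heavily reduced stand-in for struct rt_mutex. */
struct demo_rt_mutex {
	uintptr_t owner;	/* owner task pointer, waiters flag in bit 0 */
	bool has_waiters;	/* stands in for a non-empty waiter tree */
};

/* The caller is assumed to hold the wait_lock, which is not modelled. */
static void demo_fixup_waiters(struct demo_rt_mutex *l)
{
	uintptr_t owner;

	if (l->has_waiters)
		return;

	owner = l->owner;
	/* Write back only if the bit is set; never rewrite a clean owner. */
	if (owner & DEMO_HAS_WAITERS)
		l->owner = owner & ~DEMO_HAS_WAITERS;
}

static void demo_fixup_selfcheck(void)
{
	struct demo_rt_mutex l = { .owner = 0x1000 | DEMO_HAS_WAITERS };

	demo_fixup_waiters(&l);		/* bit set, no waiters: cleared */
	assert(l.owner == 0x1000);

	l.owner = 0;			/* a fast path unlock already ran */
	demo_fixup_waiters(&l);		/* bit clear: owner is not touched */
	assert(l.owner == 0);
}
```

A sequential model cannot reproduce the race itself. It only shows that
once the bit is clear no store is issued, so a stale owner value can no
longer overwrite the result of a fast path unlock.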

It's remarkable that the test program provided by David triggers the bug
really quickly on ARM64 and MIPS64, but refuses to reproduce it on x86-64,
although the problem exists there as well. That refusal might explain why
this was not discovered earlier, despite the bug existing from day one of
the rtmutex implementation more than 10 years ago.

Thanks to David for meticulously instrumenting the code and providing the
information which made it possible to decode this subtle problem.

Reported-by: David Daney 
Tested-by: David Daney 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Steven Rostedt 
Acked-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Mark Rutland 
Cc: Peter Zijlstra 
Cc: Sebastian Siewior 
Cc: Will Deacon 
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar 
Signed-off-by: Jiri Slaby

rcu: Fix soft lockup for rcu_nocb_kthread

2016-12-12T14:25:24+00:00

commit bedc1969150d480c462cdac320fa944b694a7162 upstream.

Carrying out the following steps results in a softlockup in the
RCU callback-offload (rcuo) kthreads:

1. Connect to ixgbevf, and set the speed to 10Gb/s.
2. Use ifconfig to bring the nic up and down repeatedly.

[  317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[  368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
[  368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
[  368.106005] RIP: 0010:[]  [] fib_table_lookup+0x14/0x390
[  368.106005] RSP: 0018:ffff88061fc83ce8  EFLAGS: 00000286
[  368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
[  368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
[  368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
[  368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
[  368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
[  368.106005] FS:  0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
[  368.106005] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
[  368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  368.106005] Stack:
[  368.106005]  00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
[  368.106005]  ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
[  368.106005]  ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
[  368.106005] Call Trace:
[  368.106005]  
[  368.106005]
[  368.106005]  [] ip_route_input_noref+0x516/0xbd0
[  368.106005]  [] ? skb_release_data+0xd6/0x110
[  368.106005]  [] ? kfree_skb+0x3a/0xa0
[  368.106005]  [] ip_rcv_finish+0x29f/0x350
[  368.106005]  [] ip_rcv+0x234/0x380
[  368.106005]  [] __netif_receive_skb_core+0x676/0x870
[  368.106005]  [] __netif_receive_skb+0x18/0x60
[  368.106005]  [] process_backlog+0xae/0x180
[  368.106005]  [] net_rx_action+0x152/0x240
[  368.106005]  [] __do_softirq+0xef/0x280
[  368.106005]  [] call_softirq+0x1c/0x30
[  368.106005]  
[  368.106005]
[  368.106005]  [] do_softirq+0x65/0xa0
[  368.106005]  [] local_bh_enable+0x94/0xa0
[  368.106005]  [] rcu_nocb_kthread+0x232/0x370
[  368.106005]  [] ? wake_up_bit+0x30/0x30
[  368.106005]  [] ? rcu_start_gp+0x40/0x40
[  368.106005]  [] kthread+0xcf/0xe0
[  368.106005]  [] ? kthread_create_on_node+0x140/0x140
[  368.106005]  [] ret_from_fork+0x58/0x90
[  368.106005]  [] ? kthread_create_on_node+0x140/0x140

==================================cut here==============================

It turns out that the rcuos callback-offload kthread is busy processing
a very large quantity of RCU callbacks, and it is not relinquishing the
CPU while doing so.  This commit therefore adds a cond_resched_rcu_qs()
within the loop to allow other tasks to run.
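
The shape of such a loop can be sketched in user space. This is an
illustration only: the demo_* names are hypothetical, and sched_yield()
merely stands in for the kernel's rescheduling point.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued RCU callback. */
struct demo_cb {
	struct demo_cb *next;
	void (*func)(struct demo_cb *cb);
};

static unsigned long demo_invoked;

static void demo_count_cb(struct demo_cb *cb)
{
	(void)cb;
	demo_invoked++;
}

/* Invoke every queued callback, offering the CPU after each one. */
static unsigned long demo_invoke_callbacks(struct demo_cb *list)
{
	unsigned long yields = 0;

	while (list) {
		struct demo_cb *next = list->next;	/* func may free the node */

		list->func(list);
		sched_yield();	/* user-space stand-in for cond_resched() */
		yields++;
		list = next;
	}
	return yields;
}

static void demo_nocb_selfcheck(void)
{
	struct demo_cb c = { NULL, demo_count_cb };
	struct demo_cb b = { &c, demo_count_cb };
	struct demo_cb a = { &b, demo_count_cb };

	assert(demo_invoke_callbacks(&a) == 3);
	assert(demo_invoked == 3);
}
```

However long the list grows, the loop now reaches a scheduling point once
per callback instead of holding the CPU until the list is empty.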

[js] use only cond_resched() in 3.12

Signed-off-by: Ding Tianhong 
[ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
Signed-off-by: Paul E. McKenney 
Cc: Dhaval Giani 
Signed-off-by: Jiri Slaby