linux-stable.git/kernel, branch v3.18.7

hrtimer: Fix incorrect tai offset calculation for non high-res timer systems

2015-02-11T07:00:56+00:00

commit 2d926c15d629a13914ce3e5f26354f6a0ac99e70 upstream.

I noticed some CLOCK_TAI timer test failures on one of my
less-frequently used configurations. And after digging in I
found in 76f4108892d9 (Cleanup hrtimer accessors to the
timekepeing state), the hrtimer_get_softirq_time tai offset
calucation was incorrectly rewritten, as the tai offset we
return shold be from CLOCK_MONOTONIC, and not CLOCK_REALTIME.

This results in CLOCK_TAI timers expiring early on non-highres
capable machines.

This patch fixes the issue, calculating the tai time properly
from the monotonic base.

Signed-off-by: John Stultz 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/1423097126-10236-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Greg Kroah-Hartman

smpboot: Add missing get_online_cpus() in smpboot_register_percpu_thread()

2015-02-11T07:00:56+00:00

commit 4bee96860a65c3a62d332edac331b3cf936ba3ad upstream.

The following race exists in the smpboot percpu threads management:

CPU0	      	   	     CPU1
cpu_up(2)
  get_online_cpus();
  smpboot_create_threads(2);
			     smpboot_register_percpu_thread();
			     for_each_online_cpu();
			       __smpboot_create_thread();
  __cpu_up(2);

This results in a missing per cpu thread for the newly onlined cpu2 and
in a NULL pointer dereference on a consecutive offline of that cpu.

Proctect smpboot_register_percpu_thread() with get_online_cpus() to
prevent that.

[ tglx: Massaged changelog and removed the change in
        smpboot_unregister_percpu_thread() because that's an
        optimization and therefor not stable material. ]

Signed-off-by: Lai Jiangshan 
Cc: Thomas Gleixner 
Cc: Rusty Russell 
Cc: Peter Zijlstra 
Cc: Srivatsa S. Bhat 
Cc: David Rientjes 
Link: http://lkml.kernel.org/r/1406777421-12830-1-git-send-email-laijs@cn.fujitsu.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

time: adjtimex: Validate the ADJ_FREQUENCY values

2015-01-30T01:40:44+00:00

commit 5e5aeb4367b450a28f447f6d5ab57d8f2ab16a5f upstream.

Verify that the frequency value from userspace is valid and makes sense.

Unverified values can cause overflows later on.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Signed-off-by: Sasha Levin 
[jstultz: Fix up bug for negative values and drop redunent cap check]
Signed-off-by: John Stultz 
Signed-off-by: Greg Kroah-Hartman

time: settimeofday: Validate the values of tv from user

2015-01-30T01:40:44+00:00

commit 6ada1fc0e1c4775de0e043e1bd3ae9d065491aa5 upstream.

An unvalidated user input is multiplied by a constant, which can result in
an undefined behaviour for large values. While this is validated later,
we should avoid triggering undefined behaviour.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Signed-off-by: Sasha Levin 
[jstultz: include trivial milisecond->microsecond correction noticed
by Andy]
Signed-off-by: John Stultz 
Signed-off-by: Greg Kroah-Hartman

workqueue: fix subtle pool management issue which can stall whole worker_pool

2015-01-30T01:40:43+00:00

commit 29187a9eeaf362d8422e62e17a22a6e115277a49 upstream.

A worker_pool's forward progress is guaranteed by the fact that the
last idle worker assumes the manager role to create more workers and
summon the rescuers if creating workers doesn't succeed in timely
manner before proceeding to execute work items.

This manager role is implemented in manage_workers(), which indicates
whether the worker may proceed to work item execution with its return
value.  This is necessary because multiple workers may contend for the
manager role, and, if there already is a manager, others should
proceed to work item execution.

Unfortunately, the function also indicates that the worker may proceed
to work item execution if need_to_create_worker() is false at the head
of the function.  need_to_create_worker() tests the following
conditions.

	pending work items && !nr_running && !nr_idle

The first and third conditions are protected by pool->lock and thus
won't change while holding pool->lock; however, nr_running can change
asynchronously as other workers block and resume and while it's likely
to be zero, as someone woke this worker up in the first place, some
other workers could have become runnable inbetween making it non-zero.

If this happens, manage_worker() could return false even with zero
nr_idle making the worker, the last idle one, proceed to execute work
items.  If then all workers of the pool end up blocking on a resource
which can only be released by a work item which is pending on that
pool, the whole pool can deadlock as there's no one to create more
workers or summon the rescuers.

This patch fixes the problem by removing the early exit condition from
maybe_create_worker() and making manage_workers() return false iff
there's already another manager, which ensures that the last worker
doesn't start executing work items.

We can leave the early exit condition alone and just ignore the return
value but the only reason it was put there is because the
manage_workers() used to perform both creations and destructions of
workers and thus the function may be invoked while the pool is trying
to reduce the number of workers.  Now that manage_workers() is called
only when more workers are needed, the only case this early exit
condition is triggered is rare race conditions rendering it pointless.

Tested with simulated workload and modified workqueue code which
trigger the pool deadlock reliably without this patch.

Signed-off-by: Tejun Heo 
Reported-by: Eric Sandeen 
Link: http://lkml.kernel.org/g/54B019F4.8030009@sandeen.net
Cc: Dave Chinner 
Cc: Lai Jiangshan 
Signed-off-by: Greg Kroah-Hartman

ftrace: Check both notrace and filter for old hash

2015-01-27T16:29:41+00:00

commit 7485058eea40783ac142a60c3e799fc66ce72583 upstream.

Using just the filter for checking for trampolines or regs is not enough
when updating the code against the records that represent all functions.
Both the filter hash and the notrace hash need to be checked.

To trigger this bug (using trace-cmd and perf):

 # perf probe -a do_fork
 # trace-cmd start -B foo -e probe
 # trace-cmd record -p function_graph -n do_fork sleep 1

The trace-cmd record at the end clears the filter before it disables
function_graph tracing and then that causes the accounting of the
ftrace function records to become incorrect and causes ftrace to bug.

Link: http://lkml.kernel.org/r/20150114154329.358378039@goodmis.org

[ still need to switch old_hash_ops to old_ops_hash ]
Reviewed-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

ftrace: Fix updating of filters for shared global_ops filters

2015-01-27T16:29:41+00:00

commit 8f86f83709c585742dea5dd7f0d2b79c43f992ec upstream.

As the set_ftrace_filter affects both the function tracer as well as the
function graph tracer, the ops that represent each have a shared
ftrace_ops_hash structure. This allows both to be updated when the filter
files are updated.

But if function graph is enabled and the global_ops (function tracing) ops
is not, then it is possible that the filter could be changed without the
update happening for the function graph ops. This will cause the changes
to not take place and may even cause a ftrace_bug to occur as it could mess
with the trampoline accounting.

The solution is to check if the ops uses the shared global_ops filter and
if the ops itself is not enabled, to check if there's another ops that is
enabled and also shares the global_ops filter. In that case, the
modification still needs to be executed.

Link: http://lkml.kernel.org/r/20150114154329.055980438@goodmis.org

Reviewed-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt 
Signed-off-by: Greg Kroah-Hartman

genirq: Prevent proc race against freeing of irq descriptors

2015-01-27T16:29:37+00:00

commit c291ee622165cb2c8d4e7af63fffd499354a23be upstream.

Since the rework of the sparse interrupt code to actually free the
unused interrupt descriptors there exists a race between the /proc
interfaces to the irq subsystem and the code which frees the interrupt
descriptor.

CPU0				CPU1
				show_interrupts()
				  desc = irq_to_desc(X);
free_desc(desc)
  remove_from_radix_tree();
  kfree(desc);
				  raw_spinlock_irq(&desc->lock);

/proc/interrupts is the only interface which can actively corrupt
kernel memory via the lock access. /proc/stat can only read from freed
memory. Extremly hard to trigger, but possible.

The interfaces in /proc/irq/N/ are not affected by this because the
removal of the proc file is serialized in procfs against concurrent
readers/writers. The removal happens before the descriptor is freed.

For architectures which have CONFIG_SPARSE_IRQ=n this is a non issue
as the descriptor is never freed. It's merely cleared out with the irq
descriptor lock held. So any concurrent proc access will either see
the old correct value or the cleared out ones.

Protect the lookup and access to the irq descriptor in
show_interrupts() with the sparse_irq_lock.

Provide kstat_irqs_usr() which is protecting the lookup and access
with sparse_irq_lock and switch /proc/stat to use it.

Document the existing kstat_irqs interfaces so it's clear that the
caller needs to take care about protection. The users of these
interfaces are either not affected due to SPARSE_IRQ=n or already
protected against removal.

Fixes: 1f5a5b87f78f "genirq: Implement a sane sparse_irq allocator"
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

tick/powerclamp: Remove tick_nohz_idle abuse

2015-01-27T16:29:37+00:00

commit a5fd9733a30d18d7ac23f17080e7e07bb3205b69 upstream.

commit 4dbd27711cd9 "tick: export nohz tick idle symbols for module
use" was merged via the thermal tree without an explicit ack from the
relevant maintainers.

The exports are abused by the intel powerclamp driver which implements
a fake idle state from a sched FIFO task. This causes all kinds of
wreckage in the NOHZ core code which rightfully assumes that
tick_nohz_idle_enter/exit() are only called from the idle task itself.

Recent changes in the NOHZ core lead to a failure of the powerclamp
driver and now people try to hack completely broken and backwards
workarounds into the NOHZ core code. This is completely unacceptable
and just papers over the real problem. There are way more subtle
issues lurking around the corner.

The real solution is to fix the powerclamp driver by rewriting it with
a sane concept, but that's beyond the scope of this.

So the only solution for now is to remove the calls into the core NOHZ
code from the powerclamp trainwreck along with the exports.

Fixes: d6d71ee4a14a "PM: Introduce Intel PowerClamp Driver"
Signed-off-by: Thomas Gleixner 
Cc: Preeti U Murthy 
Cc: Viresh Kumar 
Cc: Frederic Weisbecker 
Cc: Fengguang Wu 
Cc: Frederic Weisbecker 
Cc: Pan Jacob jun 
Cc: LKP 
Cc: Peter Zijlstra 
Cc: Zhang Rui 
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412181110110.17382@nanos
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

exit: fix race between wait_consider_task() and wait_task_zombie()

2015-01-16T14:59:57+00:00

commit 3245d6acab981a2388ffb877c7ecc97e763c59d4 upstream.

wait_consider_task() checks EXIT_ZOMBIE after EXIT_DEAD/EXIT_TRACE and
both checks can fail if we race with EXIT_ZOMBIE -> EXIT_DEAD/EXIT_TRACE
change in between, gcc needs to reload p->exit_state after
security_task_wait().  In this case ->notask_error will be wrongly
cleared and do_wait() can hang forever if it was the last eligible
child.

Many thanks to Arne who carefully investigated the problem.

Note: this bug is very old but it was pure theoretical until commit
b3ab03160dfa ("wait: completely ignore the EXIT_DEAD tasks").  Before
this commit "-O2" was probably enough to guarantee that compiler won't
read ->exit_state twice.

Signed-off-by: Oleg Nesterov 
Reported-by: Arne Goedeke 
Tested-by: Arne Goedeke 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman