linux-stable.git/kernel/time, branch v4.4.136

time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting

2018-05-26T06:49:00+00:00

commit 3d88d56c5873f6eebe23e05c3da701960146b801 upstream.

Due to how the MONOTONIC_RAW accumulation logic was handled,
there is the potential for a 1ns discontinuity when we do
accumulations. This small discontinuity has for the most part
gone unnoticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
in their vDSO clock_gettime implementation, we've seen failures
with the inconsistency-check test in kselftest.

This patch addresses the issue by using the same sub-ns
accumulation handling that CLOCK_MONOTONIC uses, which avoids
the issue for in-kernel users.

Since the ARM64 vDSO implementation has its own clock_gettime
calculation logic, this patch reduces the frequency of errors,
but failures are still seen. The ARM64 vDSO will need to be
updated to include the sub-nanosecond xtime_nsec values in its
calculation for this issue to be completely fixed.
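
For illustration only, a simplified user-space model (not the kernel
code) of how sub-nanosecond remainders get lost when each interval is
truncated before accumulation, compared with keeping them in a shifted
accumulator. SHIFT and both function names are assumptions made for
this sketch.

```c
#include <stdint.h>

/* Hypothetical fixed-point shift: values are nanoseconds << SHIFT. */
#define SHIFT 8

/* Truncating model: each interval is rounded down to whole nanoseconds
 * before it is added, so the fractional part is lost every time. */
uint64_t accumulate_truncating(uint64_t interval_shifted, unsigned int n)
{
    uint64_t ns = 0;

    for (unsigned int i = 0; i < n; i++)
        ns += interval_shifted >> SHIFT;
    return ns;
}

/* Shifted model: fractions stay in the accumulator and are dropped
 * only when the final value is converted to nanoseconds. */
uint64_t accumulate_shifted(uint64_t interval_shifted, unsigned int n)
{
    uint64_t acc = 0;

    for (unsigned int i = 0; i < n; i++)
        acc += interval_shifted;
    return acc >> SHIFT;
}
```

With an interval of 1000.5 ns, i.e. (1000 << 8) + 128, four
accumulations give 4000 ns in the truncating model and 4002 ns in the
shifted model.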

Signed-off-by: John Stultz 
Tested-by: Daniel Mentz 
Cc: Prarit Bhargava 
Cc: Kevin Brodsky 
Cc: Richard Cochran 
Cc: Stephen Boyd 
Cc: Will Deacon 
Cc: "stable #4 . 8+" 
Cc: Miroslav Lichvar 
Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner 
[fabrizio: cherry-pick to 4.4. Kept cycle_t type for function
logarithmic_accumulation local variable "interval". Dropped
casting of "interval" variable]
Signed-off-by: Fabrizio Castro 
Signed-off-by: Biju Das 
Signed-off-by: Greg Kroah-Hartman

tick/broadcast: Use for_each_cpu() specially on UP kernels

2018-05-26T06:48:56+00:00

commit 5596fe34495cf0f645f417eb928ef224df3e3cb4 upstream.

for_each_cpu() unintuitively reports CPU0 as set independent of the actual
cpumask content on UP kernels. This causes an unexpected PIT interrupt
storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as
a result, the virtual machine can suffer from a strange random delay of 1~20
minutes during boot-up, and sometimes it can hang forever.

Protect against this by checking whether the cpumask is empty before
entering the for_each_cpu() loop.

[ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ]
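
A minimal user-space sketch of the behaviour described above, assuming
a UP-style iterator that never consults the mask. The macro
up_for_each_cpu() and both counting functions are hypothetical names
used only for this model; the mask is modelled as a plain unsigned long.

```c
/* Model of a UP iterator: the body runs once for CPU 0 and the mask
 * is evaluated but never tested. */
#define up_for_each_cpu(cpu, mask) \
    for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

/* Without a guard, CPU 0 is visited even when the mask is empty. */
int count_unguarded(unsigned long mask)
{
    int cpu, visited = 0;

    up_for_each_cpu(cpu, mask)
        visited++;
    return visited;
}

/* With the guard, an empty mask skips the loop entirely. */
int count_guarded(unsigned long mask)
{
    int cpu, visited = 0;

    if (mask == 0)
        return 0;
    up_for_each_cpu(cpu, mask)
        visited++;
    return visited;
}
```

In this model count_unguarded(0) visits one CPU while count_guarded(0)
visits none.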

Signed-off-by: Dexuan Cui 
Signed-off-by: Thomas Gleixner 
Cc: Josh Poulson 
Cc: "Michael Kelley (EOSG)" 
Cc: Peter Zijlstra 
Cc: Frederic Weisbecker 
Cc: stable@vger.kernel.org
Cc: Rakib Mullick 
Cc: Jork Loeser 
Cc: Greg Kroah-Hartman 
Cc: Andrew Morton 
Cc: KY Srinivasan 
Cc: Linus Torvalds 
Cc: Alexey Dobriyan 
Cc: Dmitry Vyukov 
Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM
Signed-off-by: Greg Kroah-Hartman

time: Change posix clocks ops interfaces to use timespec64

2018-03-24T09:58:40+00:00

[ Upstream commit d340266e19ddb70dbd608f9deedcfb35fdb9d419 ]

struct timespec is not y2038 safe on 32 bit machines.

The posix clocks apis use struct timespec directly and through struct
itimerspec.

Replace the posix clock interfaces to use struct timespec64 and struct
itimerspec64 instead.  Also fix up their implementations accordingly.

Note that the clock_getres() interface has also been changed to use
timespec64 even though this particular interface is not affected by the
y2038 problem. This helps verification for internal kernel code for y2038
readiness by getting rid of time_t/timeval/timespec.
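
To illustrate why a 32-bit seconds field is not y2038 safe, here is a
small sketch. struct ts32, struct ts64 and both functions are
hypothetical stand-ins, not the kernel's struct timespec or struct
timespec64.

```c
#include <stdint.h>

/* Hypothetical layouts: 32-bit seconds versus 64-bit seconds. */
struct ts32 { int32_t tv_sec; long tv_nsec; };
struct ts64 { int64_t tv_sec; long tv_nsec; };

/* Convert to the 32-bit layout; returns -1 if the seconds do not fit. */
int ts64_to_ts32(const struct ts64 *in, struct ts32 *out)
{
    if (in->tv_sec < INT32_MIN || in->tv_sec > INT32_MAX)
        return -1;
    out->tv_sec = (int32_t)in->tv_sec;
    out->tv_nsec = in->tv_nsec;
    return 0;
}

/* Convenience wrapper: 1 if sec is representable in 32 bits. */
int seconds_fit_in_32_bits(int64_t sec)
{
    struct ts64 in = { sec, 0 };
    struct ts32 out;

    return ts64_to_ts32(&in, &out) == 0;
}
```

The last representable second is 2147483647 (2038-01-19 03:14:07 UTC);
one second later the conversion fails.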

Signed-off-by: Deepa Dinamani 
Cc: arnd@arndb.de
Cc: y2038@lists.linaro.org
Cc: netdev@vger.kernel.org
Cc: Richard Cochran 
Cc: john.stultz@linaro.org
Link: http://lkml.kernel.org/r/1490555058-4603-3-git-send-email-deepa.kernel@gmail.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

sysrq: Reset the watchdog timers while displaying high-resolution timers

2018-03-22T08:23:21+00:00

[ Upstream commit 0107042768658fea9f5f5a9c00b1c90f5dab6a06 ]

On systems with a large number of CPUs, running sysrq-q can cause
watchdog timeouts.  There are two slow sections of code in the sysrq-q
path in timer_list.c.

1. print_active_timers() - This function is called by print_cpu() and
   contains a slow goto loop.  On a machine with hundreds of CPUs, this
   loop took approximately 100ms for the first CPU in a NUMA node.
   (Subsequent CPUs in the same node ran much quicker.)  The total time
   to print all of the CPUs is ultimately long enough to trigger the
   soft lockup watchdog.

2. print_tickdevice() - This function outputs a large amount of textual
   information.  This function also took approximately 100ms per CPU.

Since sysrq-q is not a performance critical path, there should be no
harm in touching the nmi watchdog in both slow sections above.  Touching
it in just one location was insufficient on systems with hundreds of
CPUs as occasional timeouts were still observed during testing.
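
A rough model of the reasoning above: time since the last watchdog
touch accumulates across CPUs unless each iteration resets it. The
function name, the threshold and the per-CPU cost are assumptions for
this sketch; the reset only stands in for touch_nmi_watchdog().

```c
/* Returns 1 if the modelled watchdog would fire during the dump. */
int dump_trips_watchdog(unsigned int ncpus, unsigned int ms_per_cpu,
                        unsigned int threshold_ms, int touch_each_cpu)
{
    unsigned int since_touch_ms = 0;

    for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
        if (touch_each_cpu)
            since_touch_ms = 0;       /* models touching the watchdog */
        since_touch_ms += ms_per_cpu; /* time spent printing this CPU */
        if (since_touch_ms > threshold_ms)
            return 1;
    }
    return 0;
}
```

With 128 CPUs at roughly 200ms each (two slow sections of about 100ms)
and an assumed 10 second threshold, the untouched dump trips the
modelled watchdog and the touched one does not.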

This issue was observed on an Oracle T7 machine with 128 CPUs, but I
anticipate it may affect other systems with similarly large numbers of
CPUs.

Signed-off-by: Tom Hromatka 
Reviewed-by: Rob Gardner 
Signed-off-by: John Stultz 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

timers, sched_clock: Update timeout for clock wrap

2018-03-22T08:23:21+00:00

[ Upstream commit 1b8955bc5ac575009835e371ae55e7f3af2197a9 ]

The scheduler clock framework may not use the correct timeout for the clock
wrap. This happens when a new clock driver calls sched_clock_register()
after the kernel called sched_clock_postinit(). In this case the clock wrap
timeout is too long, so sched_clock_poll() is called too late, after the
clock has already wrapped.

On my ARM system the scheduler no longer scheduled any task other than
the idle task because sched_clock() had wrapped.
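
The ordering problem can be sketched with a small user-space model.
struct poll_timer and every function below are hypothetical; the model
ignores any safety margin and only compares the poll period with the
wrap time of the registered counter.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a counter of the given width wraps at rate_hz. */
uint64_t wrap_seconds(unsigned int bits, uint64_t rate_hz)
{
    assert(bits > 0 && bits < 64 && rate_hz > 0);
    return (UINT64_C(1) << bits) / rate_hz;
}

struct poll_timer {
    int      running;  /* set once the poll timer has been started */
    uint64_t period_s; /* period the running timer actually uses */
    uint64_t wrap_s;   /* wrap time of the currently registered clock */
};

/* Register a clock; with rearm set, a running timer gets the new period. */
void register_clock(struct poll_timer *t, unsigned int bits,
                    uint64_t rate_hz, int rearm)
{
    t->wrap_s = wrap_seconds(bits, rate_hz);
    if (t->running && rearm)
        t->period_s = t->wrap_s;
}

/* Start the poll timer with whatever wrap time is known right now. */
void start_poll_timer(struct poll_timer *t)
{
    t->running = 1;
    t->period_s = t->wrap_s;
}

/* A slow-wrapping clock first, then a 32-bit 1 MHz clock after start. */
int late_registration_polls_in_time(int rearm)
{
    struct poll_timer t = { 0, 0, 0 };

    register_clock(&t, 56, 1000000, rearm);
    start_poll_timer(&t);
    register_clock(&t, 32, 1000000, rearm);
    return t.period_s <= t.wrap_s;
}
```

Without re-arming, the timer keeps the period of the first clock and
fires long after the 32-bit counter has wrapped.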

Signed-off-by: David Engraf 
Signed-off-by: John Stultz 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers)

2018-03-03T09:19:41+00:00

commit 48d0c9becc7f3c66874c100c126459a9da0fdced upstream.

The POSIX specification defines that relative CLOCK_REALTIME timers are not
affected by clock modifications. Those timers have to use CLOCK_MONOTONIC
to ensure POSIX compliance.

The introduction of the additional HRTIMER_MODE_PINNED mode broke this
requirement for pinned timers.

There is no user space visible impact because user space timers are not
using pinned mode, but for consistency reasons this needs to be fixed.

Check whether the mode has the HRTIMER_MODE_REL bit set instead of
comparing with HRTIMER_MODE_ABS.
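
The difference between the two checks can be shown with a small
sketch. The SK_MODE_* values are assumptions that mirror the
bit layout described above (REL in bit 0, PINNED in bit 1); the
function names are hypothetical.

```c
/* Assumed mode encoding: bit 0 = relative, bit 1 = pinned. */
enum sk_mode {
    SK_MODE_ABS        = 0x0,
    SK_MODE_REL        = 0x1,
    SK_MODE_PINNED     = 0x2,
    SK_MODE_ABS_PINNED = 0x2,
    SK_MODE_REL_PINNED = 0x3,
};

/* Comparison against ABS: any pinned mode looks "relative". */
int is_relative_by_compare(int mode)
{
    return mode != SK_MODE_ABS;
}

/* Bit test: only modes with the REL bit set are relative. */
int is_relative_by_bit(int mode)
{
    return (mode & SK_MODE_REL) != 0;
}
```

In this model an absolute pinned timer is misclassified as relative by
the comparison, while the bit test classifies all four combinations
correctly.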

Signed-off-by: Anna-Maria Gleixner 
Cc: Christoph Hellwig 
Cc: John Stultz 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: keescook@chromium.org
Fixes: 597d0275736d ("timers: Framework for identifying pinned timers")
Link: http://lkml.kernel.org/r/20171221104205.7269-7-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar 
Cc: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman

posix-timer: Properly check sigevent->sigev_notify

2018-02-16T19:09:40+00:00

commit cef31d9af908243421258f1df35a4a644604efbe upstream.

timer_create() specifies via sigevent->sigev_notify the signal delivery for
the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD
and (SIGEV_SIGNAL | SIGEV_THREAD_ID).

The sanity check in good_sigevent() is only checking the valid combination
for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is
not set it accepts any random value.

This has no real effects on the posix timer and signal delivery code, but
it affects show_timer() which handles the output of /proc/$PID/timers. That
function uses a string array to pretty print sigev_notify. The access to
that array has no bounds check, so random sigev_notify values cause accesses
beyond the array bounds.

Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID
masking from various code paths as SIGEV_NONE can never be set in
combination with SIGEV_THREAD_ID.
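
A minimal sketch of the two validation strategies, restricted to the
sigev_notify value itself. The SK_SIGEV_* constants are assumptions
chosen to mirror the usual Linux values, and both functions are
hypothetical models rather than the kernel's good_sigevent().

```c
/* Assumed values, mirroring the usual Linux definitions. */
#define SK_SIGEV_SIGNAL    0
#define SK_SIGEV_NONE      1
#define SK_SIGEV_THREAD    2
#define SK_SIGEV_THREAD_ID 4

/* Model of the old check: only the THREAD_ID combination is verified. */
int notify_passes_old_check(int notify)
{
    if ((notify & SK_SIGEV_THREAD_ID) &&
        (notify & ~SK_SIGEV_THREAD_ID) != SK_SIGEV_SIGNAL)
        return 0;
    return 1; /* anything without THREAD_ID is accepted */
}

/* Strict check: accept only the four valid modes listed above. */
int notify_is_valid(int notify)
{
    switch (notify) {
    case SK_SIGEV_SIGNAL | SK_SIGEV_THREAD_ID:
    case SK_SIGEV_SIGNAL:
    case SK_SIGEV_THREAD:
    case SK_SIGEV_NONE:
        return 1;
    default:
        return 0;
    }
}
```

A value such as 3 passes the old model but is rejected by the strict
check, so it can no longer be used as an index into a three-entry
string table.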

Reported-by: Eric Biggers 
Reported-by: Dmitry Vyukov 
Reported-by: Alexey Dobriyan 
Signed-off-by: Thomas Gleixner 
Cc: John Stultz 
Signed-off-by: Greg Kroah-Hartman

hrtimer: Reset hrtimer cpu base proper on CPU hotplug

2018-01-31T11:06:12+00:00

commit d5421ea43d30701e03cadc56a38854c36a8b4433 upstream.

The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents a long-delayed hrtimer interrupt from causing a
continuous retriggering of interrupts that keeps the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.

If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.

Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.
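
The failure sequence can be modelled in a few lines. struct sk_cpu_base
and all functions below are hypothetical; the model tracks only the
hang_detected flag and whether the timer hardware is armed.

```c
struct sk_cpu_base {
    int hang_detected; /* blocks reprogramming while set */
    int hw_armed;      /* 1 if the timer hardware is programmed */
};

/* Enqueueing a timer reprograms the hardware unless a hang is flagged. */
void sk_enqueue_timer(struct sk_cpu_base *b)
{
    if (b->hang_detected)
        return;
    b->hw_armed = 1;
}

/* Unplugging the CPU leaves the hardware disarmed. */
void sk_cpu_unplug(struct sk_cpu_base *b)
{
    b->hw_armed = 0;
}

/* Plugging the CPU back in; reset_state models the fix. */
void sk_cpu_plug(struct sk_cpu_base *b, int reset_state)
{
    if (reset_state)
        b->hang_detected = 0;
}

/* Hang flagged in the last interrupt, then unplug, replug, enqueue. */
int armed_after_replug(int reset_state)
{
    struct sk_cpu_base b = { 1, 1 };

    sk_cpu_unplug(&b);
    sk_cpu_plug(&b, reset_state);
    sk_enqueue_timer(&b);
    return b.hw_armed;
}
```

Without the reset the stale flag keeps the hardware disarmed after the
CPU comes back; with the reset the first enqueued timer arms it again.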

Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.

Fixes: 41d2e4949377 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney 
Signed-off-by: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Sebastian Sewior 
Cc: Anna-Maria Gleixner 
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
Signed-off-by: Greg Kroah-Hartman

timers: Plug locking race vs. timer migration

2018-01-31T11:06:08+00:00

commit b831275a3553c32091222ac619cfddd73a5553fb upstream.

Linus noticed that lock_timer_base() lacks a READ_ONCE() for accessing the
timer flags. As a consequence the compiler is allowed to reload the flags
between the initial check for TIMER_MIGRATING and the following timer base
computation and the spin lock of the base.

While this has not been observed (yet), we need to make sure that it never
happens.
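
A sketch of the single-snapshot pattern, under stated assumptions: the
SK_* bit values, READ_ONCE_U32() and both functions are hypothetical
user-space stand-ins, and the volatile cast only approximates what
READ_ONCE() does in the kernel.

```c
#include <stdint.h>

/* Hypothetical flag layout: low bits hold the CPU index. */
#define SK_CPUMASK   0x0003FFFFu
#define SK_MIGRATING 0x00040000u

/* Force exactly one load of x; the compiler may not reload it. */
#define READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

/* Returns the CPU index, or -1 while the migration flag is set.
 * Both the check and the index use the same snapshot of the flags. */
int sk_pick_base(const uint32_t *flags)
{
    uint32_t tf = READ_ONCE_U32(*flags);

    if (tf & SK_MIGRATING)
        return -1;
    return (int)(tf & SK_CPUMASK);
}

/* Convenience wrapper for trying out a single flags value. */
int sk_pick_base_for(uint32_t value)
{
    uint32_t flags = value;

    return sk_pick_base(&flags);
}
```

Because tf is a local copy, a concurrent update of the flags cannot
make the migration check and the base computation disagree.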

Fixes: 0eeda71bc30d ("timer: Replace timer base by a cpu index")
Reported-by: Linus Torvalds 
Signed-off-by: Thomas Gleixner 
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos
Cc: Andrew Morton 
Cc: Peter Zijlstra 
Signed-off-by: Mike Galbraith 
Signed-off-by: Greg Kroah-Hartman

time: Avoid undefined behaviour in ktime_add_safe()

2018-01-31T11:06:08+00:00

commit 979515c5645830465739254abc1b1648ada41518 upstream.

I ran into this:

    ================================================================================
    UBSAN: Undefined behaviour in kernel/time/hrtimer.c:310:16
    signed integer overflow:
    9223372036854775807 + 50000 cannot be represented in type 'long long int'
    CPU: 2 PID: 4798 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #91
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
     0000000000000000 ffff88010ce6fb88 ffffffff82344740 0000000041b58ab3
     ffffffff84f97a20 ffffffff82344694 ffff88010ce6fbb0 ffff88010ce6fb60
     000000000000c350 ffff88010ce6f968 dffffc0000000000 ffffffff857bc320
    Call Trace:
     dump_stack+0xac/0xfc
     ? _atomic_dec_and_lock+0xc4/0xc4
     ubsan_epilogue+0xd/0x8a
     handle_overflow+0x202/0x23d
     ? val_to_string.constprop.6+0x11e/0x11e
     ? timerqueue_add+0x151/0x410
     ? hrtimer_start_range_ns+0x3b8/0x1380
     ? memset+0x31/0x40
     __ubsan_handle_add_overflow+0xe/0x10
     hrtimer_nanosleep+0x5d9/0x790
     ? hrtimer_init_sleeper+0x80/0x80
     ? __might_sleep+0x5b/0x260
     common_nsleep+0x20/0x30
     SyS_clock_nanosleep+0x197/0x210
     ? SyS_clock_getres+0x150/0x150
     ? __this_cpu_preempt_check+0x13/0x20
     ? __context_tracking_exit.part.3+0x30/0x1b0
     ? SyS_clock_getres+0x150/0x150
     do_syscall_64+0x1b3/0x4b0
     entry_SYSCALL64_slow_path+0x25/0x25
    ================================================================================

Add a new ktime_add_unsafe() helper which doesn't check for overflow, but
doesn't throw a UBSAN warning when it does overflow either.
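
One way to express the idea in user space, as a sketch only:
sk_add_unsafe() and sk_add_safe() are hypothetical names operating on
plain int64_t nanosecond values, and the clamp to INT64_MAX is a
simplification chosen for this example.

```c
#include <stdint.h>

/* Add via unsigned arithmetic, which wraps instead of overflowing.
 * Converting the result back to int64_t is implementation-defined
 * rather than undefined; common compilers wrap to two's complement. */
int64_t sk_add_unsafe(int64_t lhs, int64_t rhs)
{
    return (int64_t)((uint64_t)lhs + (uint64_t)rhs);
}

/* Saturating add for non-negative inputs: detect the wrap afterwards
 * and clamp to the maximum instead of returning a negative value. */
int64_t sk_add_safe(int64_t lhs, int64_t rhs)
{
    int64_t res = sk_add_unsafe(lhs, rhs);

    if (res < 0 || res < lhs || res < rhs)
        res = INT64_MAX;
    return res;
}
```

With the operands from the report above, sk_add_safe(INT64_MAX, 50000)
clamps to INT64_MAX without performing a signed addition that could
overflow.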

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Signed-off-by: Vegard Nossum 
Signed-off-by: John Stultz 
Signed-off-by: Jiri Slaby 
Signed-off-by: Greg Kroah-Hartman