linux-stable.git/kernel/time, branch linux-2.6.33.y

clocksource: Make watchdog robust vs. interruption

2011-07-13T03:31:23+00:00

commit b5199515c25cca622495eb9c6a8a1d275e775088 upstream.

The clocksource watchdog code is interruptible and it has been
observed that this can trigger false positives which disable the TSC.

The reason is that an interrupt storm or a long running interrupt
handler between the read of the watchdog source and the read of the
TSC brings the two far enough apart that the delta is larger than the
unstable treshold. Move both reads into a short interrupt disabled
region to avoid that.

Reported-and-tested-by: Vernon Mauery 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

time: Compensate for rounding on odd-frequency clocksources

2011-06-23T22:28:35+00:00

commit a386b5af8edda1c742ce9f77891e112eefffc005 upstream.

When the clocksource is not a multiple of HZ, the clock will be off.  For
acpi_pm, HZ=1000 the error is 127.111 ppm:

The rounding of cycle_interval ends up generating a false error term in
ntp_error accumulation since xtime_interval is not exactly 1/HZ.  So, we
subtract out the error caused by the rounding.

This has been visible since 2.6.32-rc2
	commit a092ff0f90cae22b2ac8028ecd2c6f6c1a9e4601
	time: Implement logarithmic time accumulation
That commit raised NTP_INTERVAL_FREQ and exposed the rounding error.

testing tool: http://n1.taur.dk/permanent/testpmt.c
Also tested with ntpd and a frequency counter.

Signed-off-by: Kasper Pedersen 
Acked-by: john stultz 
Cc: John Kacur 
Cc: Clark Williams 
Cc: Martin Schwidefsky 
Signed-off-by: Andrew Morton 
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

tick: Clear broadcast active bit when switching to oneshot

2011-05-23T18:23:01+00:00

commit 07f4beb0b5bbfaf36a64aa00d59e670ec578a95a upstream.

The first cpu which switches from periodic to oneshot mode switches
also the broadcast device into oneshot mode. The broadcast device
serves as a backup for per cpu timers which stop in deeper
C-states. To avoid starvation of the cpus which might be in idle and
depend on broadcast mode it marks the other cpus as broadcast active
and sets the brodcast expiry value of those cpus to the next tick.

The oneshot mode broadcast bit for the other cpus is sticky and gets
only cleared when those cpus exit idle. If a cpu was not idle while
the bit got set in consequence the bit prevents that the broadcast
device is armed on behalf of that cpu when it enters idle for the
first time after it switched to oneshot mode.

In most cases that goes unnoticed as one of the other cpus has usually
a timer pending which keeps the broadcast device armed with a short
timeout. Now if the only cpu which has a short timer active has the
bit set then the broadcast device will not be armed on behalf of that
cpu and will fire way after the expected timer expiry. In the case of
Christians bug report it took ~145 seconds which is about half of the
wrap around time of HPET (the limit for that device) due to the fact
that all other cpus had no timers armed which expired before the 145
seconds timeframe.

The solution is simply to clear the broadcast active bit
unconditionally when a cpu switches to oneshot mode after the first
cpu switched the broadcast device over. It's not idle at that point
otherwise it would not be executing that code.

[ I fundamentally hate that broadcast crap. Why the heck thought some
  folks that when going into deep idle it's a brilliant concept to
  switch off the last device which brings the cpu back from that
  state? ]

Thanks to Christian for providing all the valuable debug information!

Reported-and-tested-by: Christian Hoffmann 
Cc: John Stultz 
Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1105161105170.3078%40ionos%3E
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

clocksource: Install completely before selecting

2011-05-23T18:23:01+00:00

commit e05b2efb82596905ebfe88e8612ee81dec9b6592 upstream.

Christian Hoffmann reported that the command line clocksource override
with acpi_pm timer fails:

 Kernel command line:  clocksource=acpi_pm
 hpet clockevent registered
 Switching to clocksource hpet
 Override clocksource acpi_pm is not HRT compatible.
 Cannot switch while in HRT/NOHZ mode.

The watchdog code is what enables CLOCK_SOURCE_VALID_FOR_HRES, but we
actually end up selecting the clocksource before we enqueue it into
the watchdog list, so that's why we see the warning and fail to switch
to acpi_pm timer as requested. That's particularly bad when we want to
debug timekeeping related problems in early boot.

Put the selection call last.

Reported-by: Christian Hoffmann 
Signed-off-by: John Stultz 
Link: http://lkml.kernel.org/r/%3C1304558210.2943.24.camel%40work-vm%3E
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

clockevents: Prevent oneshot mode when broadcast device is periodic

2011-03-21T19:45:24+00:00

commit 3a142a0672b48a853f00af61f184c7341ac9c99d upstream.

When the per cpu timer is marked CLOCK_EVT_FEAT_C3STOP, then we only
can switch into oneshot mode, when the backup broadcast device
supports oneshot mode as well. Otherwise we would try to switch the
broadcast device into an unsupported mode unconditionally. This went
unnoticed so far as the current available broadcast devices support
oneshot mode. Seth unearthed this problem while debugging and working
around an hpet related BIOS wreckage.

Add the necessary check to tick_is_oneshot_available().

Reported-and-tested-by: Seth Forshee 
Signed-off-by: Thomas Gleixner 
LKML-Reference: 
Signed-off-by: Greg Kroah-Hartman

timekeeping: Prevent oops when GENERIC_TIME=n

2010-04-01T23:01:11+00:00

commit ad6759fbf35d104dbf573cd6f4c6784ad6823f7e upstream.

Aaro Koskinen reported an issue in kernel.org bugzilla #15366, where
on non-GENERIC_TIME systems, accessing
/sys/devices/system/clocksource/clocksource0/current_clocksource
results in an oops.

It seems the timekeeper/clocksource rework missed initializing the
curr_clocksource value in the !GENERIC_TIME case.

Thanks to Aaro for reporting and diagnosing the issue as well as
testing the fix!

Reported-by: Aaro Koskinen 
Signed-off-by: John Stultz 
Cc: Martin Schwidefsky 
LKML-Reference: <1267475683.4216.61.camel@localhost.localdomain>
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman

Export the symbol of getboottime and mmonotonic_to_bootbased

2010-02-09T17:20:15+00:00

Export getboottime and monotonic_to_bootbased in order to let them
could be used by following patch.

Cc: stable@kernel.org
Signed-off-by: Jason Wang 
Signed-off-by: Marcelo Tosatti

clocksource: Prevent potential kgdb dead lock

2010-01-26T13:53:16+00:00

commit 0f8e8ef7 (clocksource: Simplify clocksource watchdog resume
logic) introduced a potential kgdb dead lock. When the kernel is
stopped by kgdb inside code which holds watchdog_lock then kgdb dead
locks in clocksource_resume_watchdog().

clocksource_resume_watchdog() is called from kbdg via
clocksource_touch_watchdog() to avoid that the clock source watchdog
marks TSC unstable after the kernel has been stopped.

Solve this by replacing spin_lock with a spin_trylock and just return
in case the lock is held. Not resetting the watchdog might result in
TSC becoming marked unstable, but that's an acceptable penalty for
using kgdb.

The timekeeping is anyway easily screwed up by kgdb when the system
uses either jiffies or a clock source which wraps in short intervals
(e.g. pm_timer wraps about every 4.6s), so we really do not have to
worry about that occasional TSC marked unstable side effect.

The second caller of clocksource_resume_watchdog() is
clocksource_resume(). The trylock is safe here as well because the
system is UP at this point, interrupts are disabled and nothing else
can hold watchdog_lock().

Reported-by: Jason Wessel 
LKML-Reference: <1264480000-6997-4-git-send-email-jason.wessel@windriver.com>
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: Martin Schwidefsky 
Cc: John Stultz 
Cc: Andrew Morton 
Signed-off-by: Thomas Gleixner

clockevent: Don't remove broadcast device when cpu is dead

2010-01-18T13:44:50+00:00

Marc reported that the BUG_ON in clockevents_notify() triggers on his
system. This happens because the kernel tries to remove an active
clock event device (used for broadcasting) from the device list.

The handling of devices which can be used as per cpu device and as a
global broadcast device is suboptimal.

The simplest solution for now (and for stable) is to check whether the
device is used as global broadcast device, but this needs to be
revisited.

[ tglx: restored the cpuweight check and massaged the changelog ]

Reported-by: Marc Dionne 
Tested-by: Marc Dionne 
Signed-off-by: Xiaotian Feng 
LKML-Reference: <1262834564-13033-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Thomas Gleixner 
Cc: stable@kernel.org

Revert "time: Remove xtime_cache"

2009-12-22T22:10:37+00:00

This reverts commit 7bc7d637452383d56ba4368d4336b0dde1bb476d, as
requested by John Stultz. Quoting John:

 "Petr Titěra reported an issue where he saw odd atime regressions with
  2.6.33 where there were a full second worth of nanoseconds in the
  nanoseconds field.

  He also reviewed the time code and narrowed down the problem: unhandled
  overflow of the nanosecond field caused by rounding up the
  sub-nanosecond accumulated time.

  Details:

   * At the end of update_wall_time(), we currently round up the
  sub-nanosecond portion of accumulated time when storing it into xtime.
  This was added to avoid time inconsistencies caused when the
  sub-nanosecond portion was truncated when storing into xtime.
  Unfortunately we don't handle the possible second overflow caused by
  that rounding.

   * Previously the xtime_cache code hid this overflow by normalizing the
  xtime value when storing into the xtime_cache.

   * We could try to handle the second overflow after the rounding up, but
  since this affects the timekeeping's internal state, this would further
  complicate the next accumulation cycle, causing small errors in ntp
  steering. As much as I'd like to get rid of it, the xtime_cache code is
  known to work.

   * The correct fix is really to include the sub-nanosecond portion in the
  timekeeping accessor function, so we don't need to round up at during
  accumulation. This would greatly simplify the accumulation code.
  Unfortunately, we can't do this safely until the last three
  non-GENERIC_TIME arches (sparc32, arm, cris) are converted  (those
  patches are in -mm) and we kill off the spots where arches set xtime
  directly. This is all 2.6.34 material, so I think reverting the
  xtime_cache change is the best approach for now.

  Many thanks to Petr for both reporting and finding the issue!"

Reported-by: Petr Titěra 
Requested-by: john stultz 
Cc: Ingo Molnar 
Signed-off-by: Linus Torvalds