linux.git/kernel/time/clocksource.c, branch v7.2-rc1

Merge tag 'timers-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip

2026-06-15T08:09:12+00:00

Pull timer core updates from Thomas Gleixner:
 "Updates for the time/timer core subsystem:

   - Harden the user space controllable hrtimer interfaces further to
     protect against unpriviledged DoS attempts by arming timers in the
     past.

   - Add per-capacity hierarchies to the timer migration code to prevent
     timer migration accross different capacity domains. This code has
     been disabled last minute as there is a pathological problem with
     SoCs which advertise a larger number of capacity domains. The
     problem is under investigation and the code won't be active before
     v7.3, but that turned out to be less intrusive than a full revert
     as it preserves the preparatory steps and allows people to work on
     the final resolution

   - Export time namespace functionality as a recent user can be built
     as a module.

   - Initialize the jiffies clocksource before using it. The recent
     hardening against time moving backward requires that the related
     members of struct clocksource have been initialized, otherwise it
     clamps the readout to 0, which makes time stand sill and causes
     boot delays.

   - Fix a more than twenty year old PID reference count leak in an
     error path of the POSIX CPU timer code.

   - The usual small fixes, improvements and cleanups all over the
     place"

* tag 'timers-core-2026-06-13' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: (31 commits)
  posix-cpu-timers: Fix pid refcount leak in do_cpu_nanosleep() error path
  time/jiffies: Register jiffies clocksource before usage
  timers/migration: Temporarily disable per capacity hierarchies
  timers/migration: Turn tmigr_hierarchy level_list into a flexible array
  timers/migration: Deactivate per-capacity hierarchies under nohz_full
  timers/migration: Fix hotplug migrator selection target on asymetric capacity machines
  ntsync: Honour caller's time namespace for absolute MONOTONIC timeouts
  time/namespace: Export init_time_ns and do_timens_ktime_to_host()
  timers/migration: Update stale @online doc to @available
  timers: Fix flseep() typo in kernel-doc comment
  hrtimer: Fix the bogus return type of __hrtimer_start_range_ns()
  hrtimer: Return ktime_t from hrtimer_get_next_event()/hrtimer_next_event_without()
  clocksource: Clean up clocksource_update_freq() functions
  alarmtimer: Remove stale return description from alarm_handle_timer()
  selftests/posix_timers: Use CLOCK_THREAD_CPUTIME_ID for ITIMER_PROF measurements
  scripts/timers: Add timer_migration_tree.py
  timers/migration: Handle capacity in connect tracepoints
  timers/migration: Split per-capacity hierarchies
  timers/migration: Track CPUs in a hierarchy
  timers/migration: Abstract out hierarchy to prepare for CPU capacity awareness
  ...

clocksource: Add devm_clocksource_register_*() helpers

2026-05-18T09:02:51+00:00

Introduce device-managed helpers for clocksource registration.

The clocksource framework currently provides __clocksource_register_scale()
along with convenience wrappers for Hz and kHz registration. However,
drivers must handle error paths and cleanup manually, typically by pairing
registration with an explicit clocksource_unregister() call.

Add a devm-based variant, __devm_clocksource_register_scale(), along with
devm_clocksource_register_hz() and devm_clocksource_register_khz() helpers.

These helpers register the clocksource and attach a devres action to
automatically unregister it on driver detach or probe failure.

This simplifies driver code by:

  * removing explicit cleanup paths
  * ensuring correct teardown ordering
  * aligning with the devm-based resource management model widely used
    across the kernel

While drivers can open-code devm_add_action_or_reset(), providing a
dedicated helper avoids duplication, reduces boilerplate, and ensures
consistent usage across drivers, following patterns used in other
subsystems.

This is also particularly useful for drivers built as modules, where
device-managed resource handling avoids manual cleanup in remove paths and
ensures correct teardown on module unload.

This helper is self-contained and can be adopted progressively by drivers.

No functional change.

Signed-off-by: Daniel Lezcano 
Signed-off-by: Thomas Gleixner 
Link: https://patch.msgid.link/20260506153831.605159-1-daniel.lezcano@oss.qualcomm.com

clocksource: Clean up clocksource_update_freq() functions

2026-05-06T06:33:08+00:00

Remove the unused functions __clocksource_update_freq_hz() and
__clocksource_update_freq_khz().

Then make __clocksource_update_freq_scale() static as it is not used
from external callers anymore. Also clean up the comment accordingly.

Signed-off-by: Thomas Weißschuh 
Signed-off-by: Thomas Gleixner 
Acked-by: John Stultz 
Link: https://patch.msgid.link/20260504-clocksource-update_freq-v2-1-3e696fb01776@linutronix.de

clocksource: Rewrite watchdog code completely

2026-03-20T12:36:32+00:00

The clocksource watchdog code has over time reached the state of an
impenetrable maze of duct tape and staples. The original design, which was
made in the context of systems far smaller than today, is based on the
assumption that the to be monitored clocksource (TSC) can be trivially
compared against a known to be stable clocksource (HPET/ACPI-PM timer).

Over the years it turned out that this approach has major flaws:

  - Long delays between watchdog invocations can result in wrap arounds
    of the reference clocksource

  - Scalability of the reference clocksource readout can degrade on large
    multi-socket systems due to interconnect congestion

This was addressed with various heuristics which degraded the accuracy of
the watchdog to the point that it fails to detect actual TSC problems on
older hardware which exposes slow inter CPU drifts due to firmware
manipulating the TSC to hide SMI time.

To address this and bring back sanity to the watchdog, rewrite the code
completely with a different approach:

  1) Restrict the validation against a reference clocksource to the boot
     CPU, which is usually the CPU/Socket closest to the legacy block which
     contains the reference source (HPET/ACPI-PM timer). Validate that the
     reference readout is within a bound latency so that the actual
     comparison against the TSC stays within 500ppm as long as the clocks
     are stable.

  2) Compare the TSCs of the other CPUs in a round robin fashion against
     the boot CPU in the same way the TSC synchronization on CPU hotplug
     works. This still can suffer from delayed reaction of the remote CPU
     to the SMP function call and the latency of the control variable cache
     line. But this latency is not affecting correctness. It only affects
     the accuracy. With low contention the readout latency is in the low
     nanoseconds range, which detects even slight skews between CPUs. Under
     high contention this becomes obviously less accurate, but still
     detects slow skews reliably as it solely relies on subsequent readouts
     being monotonically increasing. It just can take slightly longer to
     detect the issue.

  3) Rewrite the watchdog test so it tests the various mechanisms one by
     one and validating the result against the expectation.

Signed-off-by: Thomas Gleixner 
Tested-by: Borislav Petkov (AMD) 
Tested-by: Daniel J Blueman 
Reviewed-by: Jiri Wiesner 
Reviewed-by: Daniel J Blueman 
Link: https://patch.msgid.link/20260123231521.926490888@kernel.org
Link: https://patch.msgid.link/87h5qeomm5.ffs@tglx

clocksource: Don't use non-continuous clocksources as watchdog

2026-03-12T11:23:27+00:00

Using a non-continuous aka untrusted clocksource as a watchdog for another
untrusted clocksource is equivalent to putting the fox in charge of the
henhouse.

That's especially true with the jiffies clocksource which depends on
interrupt delivery based on a periodic timer. Neither the frequency of that
timer is trustworthy nor the kernel's ability to react on it in a timely
manner and rearm it if it is not self rearming.

Just don't bother to deal with this. It's not worth the trouble and only
relevant to museum piece hardware.

Signed-off-by: Thomas Gleixner 
Link: https://patch.msgid.link/20260123231521.858743259@kernel.org

clocksource: Update clocksource::freq_khz on registration

2026-03-05T16:41:06+00:00

Borislav reported a division by zero in the timekeeping code and random
hangs with the new coupled clocksource/clockevent functionality.

It turned out that the TSC clocksource is not always updating the
freq_khz field of the clocksource on registration. The coupled mode
conversion calculation requires the frequency and as it's not
initialized the resulting factor is zero or a random value. As a
consequence this causes a division by zero or random boot hangs.

Instead of chasing down all clocksources which fail to update that
member, fill it in at registration time where the caller has to supply
the frequency anyway. Except for special clocksources like jiffies which
never can have coupled mode.

To make this more robust put a check into the registration function to
validate that the caller supplied a frequency if the coupled mode
feature bit is set. If not, emit a warning and clear the feature bit.

Fixes: cd38bdb8e696 ("timekeeping: Provide infrastructure for coupled clockevents")
Reported-by: Borislav Petkov 
Reported-by: Nathan Chancellor 
Signed-off-by: Thomas Gleixner 
Tested-by: Borislav Petkov 
Tested-by: Nathan Chancellor 
Link: https://patch.msgid.link/87cy1jsa4m.ffs@tglx
Closes: https://lore.kernel.org/20260303213027.GA2168957@ax162

clocksource: Reduce watchdog readout delay limit to prevent false positives

2026-01-21T10:33:11+00:00

The "valid" readout delay between the two reads of the watchdog is larger
than the valid delta between the resulting watchdog and clocksource
intervals, which results in false positive watchdog results.

Assume TSC is the clocksource and HPET is the watchdog and both have a
uncertainty margin of 250us (default). The watchdog readout does:

  1) wdnow = read(HPET);
  2) csnow = read(TSC);
  3) wdend = read(HPET);

The valid window for the delta between #1 and #3 is calculated by the
uncertainty margins of the watchdog and the clocksource:

   m = 2 * watchdog.uncertainty_margin + cs.uncertainty margin;

which results in 750us for the TSC/HPET case.

The actual interval comparison uses a smaller margin:

   m = watchdog.uncertainty_margin + cs.uncertainty margin;

which results in 500us for the TSC/HPET case.

That means the following scenario will trigger the watchdog:

 Watchdog cycle N:

 1)       wdnow[N] = read(HPET);
 2)       csnow[N] = read(TSC);
 3)       wdend[N] = read(HPET);

Assume the delay between #1 and #2 is 100us and the delay between #1 and

 Watchdog cycle N + 1:

 4)       wdnow[N + 1] = read(HPET);
 5)       csnow[N + 1] = read(TSC);
 6)       wdend[N + 1] = read(HPET);

If the delay between #4 and #6 is within the 750us margin then any delay
between #4 and #5 which is larger than 600us will fail the interval check
and mark the TSC unstable because the intervals are calculated against the
previous value:

    wd_int = wdnow[N + 1] - wdnow[N];
    cs_int = csnow[N + 1] - csnow[N];

Putting the above delays in place this results in:

    cs_int = (wdnow[N + 1] + 610us) - (wdnow[N] + 100us);
 -> cs_int = wd_int + 510us;

which is obviously larger than the allowed 500us margin and results in
marking TSC unstable.

Fix this by using the same margin as the interval comparison. If the delay
between two watchdog reads is larger than that, then the readout was either
disturbed by interconnect congestion, NMIs or SMIs.

Fixes: 4ac1dd3245b9 ("clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin")
Reported-by: Daniel J Blueman 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Paul E. McKenney 
Tested-by: Paul E. McKenney 
Link: https://lore.kernel.org/lkml/20250602223251.496591-1-daniel@quora.org/
Link: https://patch.msgid.link/87bjjxc9dq.ffs@tglx

time: Fix spelling mistakes in comments

2025-09-21T08:02:02+00:00

Correct several typos found in comments across various files in the
kernel/time directory.

No functional changes are introduced by these corrections.

Signed-off-by: Haofeng Li 
Signed-off-by: Thomas Gleixner

clocksource: Print durations for sync check unconditionally

2025-09-09T12:08:19+00:00

A typical set of messages that gets printed as a result of the clocksource
watchdog finding the TSC unstable usually does not contain messages
indicating CPUs being ahead of or behind the CPU from which the check is
carried out. That fact suggests that the TSC does not experience time skew
between CPUs (if the clocksource.verify_n_cpus parameter is set to a
negative value) but quantitative information is missing.

The cs_nsec_max value printed by the "CPU %d check durations" message
actually provides a worst case estimate of the time skew. If all CPUs have
been checked, the cs_nsec_max value multiplied by 2 is the maximum
possible time skew between the TSCs of any two CPUs on the system. The
worst case estimate is derived from two boundary cases:

1. No time is consumed to execute instructions between csnow_begin and
csnow_mid while all the cs_nsec_max time is consumed by the code between
csnow_mid and csnow_end. In this case, the maximum undetectable time skew
of a CPU being ahead would be cs_nsec_max.

2. All the cs_nsec_max time is consumed to execute instructions between
csnow_begin and csnow_mid while no time is consumed by the code between
csnow_mid and csnow_end. In this case, the maximum undetectable time skew
of a CPU being behind would be cs_nsec_max.

The worst case estimate assumes a system experiencing a corner case
consisting of the two boundary cases.

Always print the "CPU %d check durations" message so that the maximum
possible time skew measured by the TSC sync check can be compared to the
time skew measured by the clocksource watchdog.

Signed-off-by: Jiri Wiesner 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Paul E. McKenney 
Link: https://lore.kernel.org/all/aIuXXfdITXdI0lLp@incl

Merge tag 'bitmap-for-6.17' of https://github.com/norov/linux

2025-07-31T23:52:32+00:00

Pull bitmap updates from Yury Norov:

 - find_random_bit() series (Yury)

 - GENMASK() consolidation (Vincent)

 - random cleanups (Shaopeng, Ben, Yury)

* tag 'bitmap-for-6.17' of https://github.com/norov/linux:
  bitfield: Ensure the return values of helper functions are checked
  test_bits: add tests for __GENMASK() and __GENMASK_ULL()
  bits: unify the non-asm GENMASK*()
  bits: split the definition of the asm and non-asm GENMASK*()
  cpumask: Remove unnecessary cpumask_nth_andnot()
  watchdog: fix opencoded cpumask_next_wrap() in watchdog_next_cpu()
  clocksource: Improve randomness in clocksource_verify_choose_cpus()
  cpumask: introduce cpumask_random()
  bitmap: generalize node_random()