<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/time/clocksource.c, branch v7.1-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>clocksource: Rewrite watchdog code completely</title>
<updated>2026-03-20T12:36:32+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@kernel.org</email>
</author>
<published>2026-03-17T09:01:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=763aacf86f1baefb134c70813aa8c72d1675d738'/>
<id>763aacf86f1baefb134c70813aa8c72d1675d738</id>
<content type='text'>
The clocksource watchdog code has over time reached the state of an
impenetrable maze of duct tape and staples. The original design, which was
made in the context of systems far smaller than today, is based on the
assumption that the to be monitored clocksource (TSC) can be trivially
compared against a known to be stable clocksource (HPET/ACPI-PM timer).

Over the years it turned out that this approach has major flaws:

  - Long delays between watchdog invocations can result in wrap arounds
    of the reference clocksource

  - Scalability of the reference clocksource readout can degrade on large
    multi-socket systems due to interconnect congestion

This was addressed with various heuristics which degraded the accuracy of
the watchdog to the point that it fails to detect actual TSC problems on
older hardware which exposes slow inter CPU drifts due to firmware
manipulating the TSC to hide SMI time.

To address this and bring back sanity to the watchdog, rewrite the code
completely with a different approach:

  1) Restrict the validation against a reference clocksource to the boot
     CPU, which is usually the CPU/Socket closest to the legacy block which
     contains the reference source (HPET/ACPI-PM timer). Validate that the
     reference readout is within a bounded latency so that the actual
     comparison against the TSC stays within 500ppm as long as the clocks
     are stable.

  2) Compare the TSCs of the other CPUs in a round robin fashion against
     the boot CPU in the same way the TSC synchronization on CPU hotplug
     works. This still can suffer from delayed reaction of the remote CPU
     to the SMP function call and the latency of the control variable cache
     line. But this latency is not affecting correctness. It only affects
     the accuracy. With low contention the readout latency is in the low
     nanoseconds range, which detects even slight skews between CPUs. Under
     high contention this becomes obviously less accurate, but still
     detects slow skews reliably as it solely relies on subsequent readouts
     being monotonically increasing. It just can take slightly longer to
     detect the issue.

  3) Rewrite the watchdog test so it tests the various mechanisms one by
     one and validates the result against the expectation.
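
As an illustration of step 2 only, the following Python sketch models the
monotonicity rule: readouts taken alternately on the boot CPU and a remote
CPU are serialized by the control variable handoff, so a readout which is
smaller than its predecessor indicates skew. The function name and the
sample values are hypothetical. This is not the kernel implementation.

```python
def find_skew(readouts):
    """Return the index of the first readout that goes backwards, or None.

    `readouts` is a list of (cpu, tsc_value) tuples in the order in which
    the handoff through the shared control variable serialized them.
    """
    for i in range(1, len(readouts)):
        prev_value = readouts[i - 1][1]
        value = readouts[i][1]
        # A later readout must never be smaller than the one before it.
        # If it is, the two TSCs disagree by more than the handoff latency.
        if prev_value > value:
            return i
    return None


# Hypothetical samples: in `skewed`, CPU 3 lags the boot CPU (CPU 0).
synced = [(0, 1000), (3, 1020), (0, 1040), (3, 1060)]
skewed = [(0, 1000), (3, 970), (0, 1040), (3, 1010)]
```

A larger handoff latency only widens the gap between consecutive values,
which hides small skews for longer but never produces a false report.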

Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Tested-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Tested-by: Daniel J Blueman &lt;daniel@quora.org&gt;
Reviewed-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Reviewed-by: Daniel J Blueman &lt;daniel@quora.org&gt;
Link: https://patch.msgid.link/20260123231521.926490888@kernel.org
Link: https://patch.msgid.link/87h5qeomm5.ffs@tglx
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The clocksource watchdog code has over time reached the state of an
impenetrable maze of duct tape and staples. The original design, which was
made in the context of systems far smaller than today, is based on the
assumption that the to be monitored clocksource (TSC) can be trivially
compared against a known to be stable clocksource (HPET/ACPI-PM timer).

Over the years it turned out that this approach has major flaws:

  - Long delays between watchdog invocations can result in wrap arounds
    of the reference clocksource

  - Scalability of the reference clocksource readout can degrade on large
    multi-socket systems due to interconnect congestion

This was addressed with various heuristics which degraded the accuracy of
the watchdog to the point that it fails to detect actual TSC problems on
older hardware which exposes slow inter CPU drifts due to firmware
manipulating the TSC to hide SMI time.

To address this and bring back sanity to the watchdog, rewrite the code
completely with a different approach:

  1) Restrict the validation against a reference clocksource to the boot
     CPU, which is usually the CPU/Socket closest to the legacy block which
     contains the reference source (HPET/ACPI-PM timer). Validate that the
     reference readout is within a bounded latency so that the actual
     comparison against the TSC stays within 500ppm as long as the clocks
     are stable.

  2) Compare the TSCs of the other CPUs in a round robin fashion against
     the boot CPU in the same way the TSC synchronization on CPU hotplug
     works. This still can suffer from delayed reaction of the remote CPU
     to the SMP function call and the latency of the control variable cache
     line. But this latency is not affecting correctness. It only affects
     the accuracy. With low contention the readout latency is in the low
     nanoseconds range, which detects even slight skews between CPUs. Under
     high contention this becomes obviously less accurate, but still
     detects slow skews reliably as it solely relies on subsequent readouts
     being monotonically increasing. It just can take slightly longer to
     detect the issue.

  3) Rewrite the watchdog test so it tests the various mechanisms one by
     one and validates the result against the expectation.

Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Tested-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Tested-by: Daniel J Blueman &lt;daniel@quora.org&gt;
Reviewed-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Reviewed-by: Daniel J Blueman &lt;daniel@quora.org&gt;
Link: https://patch.msgid.link/20260123231521.926490888@kernel.org
Link: https://patch.msgid.link/87h5qeomm5.ffs@tglx
</pre>
</div>
</content>
</entry>
<entry>
<title>clocksource: Don't use non-continuous clocksources as watchdog</title>
<updated>2026-03-12T11:23:27+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@kernel.org</email>
</author>
<published>2026-01-23T23:17:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=1432f9d4e8aa2d7585b678bdd0b740597af00d6e'/>
<id>1432f9d4e8aa2d7585b678bdd0b740597af00d6e</id>
<content type='text'>
Using a non-continuous aka untrusted clocksource as a watchdog for another
untrusted clocksource is equivalent to putting the fox in charge of the
henhouse.

That's especially true with the jiffies clocksource which depends on
interrupt delivery based on a periodic timer. Neither the frequency of that
timer nor the kernel's ability to react to it in a timely manner, and to
rearm it if it is not self rearming, is trustworthy.

Just don't bother to deal with this. It's not worth the trouble and only
relevant to museum piece hardware.

Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Link: https://patch.msgid.link/20260123231521.858743259@kernel.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Using a non-continuous aka untrusted clocksource as a watchdog for another
untrusted clocksource is equivalent to putting the fox in charge of the
henhouse.

That's especially true with the jiffies clocksource which depends on
interrupt delivery based on a periodic timer. Neither the frequency of that
timer nor the kernel's ability to react to it in a timely manner, and to
rearm it if it is not self rearming, is trustworthy.

Just don't bother to deal with this. It's not worth the trouble and only
relevant to museum piece hardware.

Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Link: https://patch.msgid.link/20260123231521.858743259@kernel.org
</pre>
</div>
</content>
</entry>
<entry>
<title>clocksource: Update clocksource::freq_khz on registration</title>
<updated>2026-03-05T16:41:06+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@kernel.org</email>
</author>
<published>2026-03-04T18:49:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=53007d526e17d29f0e5b81c07eb594a93bc4d29c'/>
<id>53007d526e17d29f0e5b81c07eb594a93bc4d29c</id>
<content type='text'>
Borislav reported a division by zero in the timekeeping code and random
hangs with the new coupled clocksource/clockevent functionality.

It turned out that the TSC clocksource is not always updating the
freq_khz field of the clocksource on registration. The coupled mode
conversion calculation requires the frequency and as it's not
initialized the resulting factor is zero or a random value. As a
consequence this causes a division by zero or random boot hangs.

Instead of chasing down all clocksources which fail to update that
member, fill it in at registration time where the caller has to supply
the frequency anyway, except for special clocksources like jiffies which
can never have coupled mode.

To make this more robust put a check into the registration function to
validate that the caller supplied a frequency if the coupled mode
feature bit is set. If not, emit a warning and clear the feature bit.
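
The registration check described above can be modelled roughly as follows.
This is a Python sketch under assumptions: the Clocksource class, the
"coupled" feature name and the register() helper are hypothetical stand-ins
and not the kernel's structures or function names.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class Clocksource:
    """Hypothetical stand-in for the relevant clocksource members."""
    name: str
    freq_khz: int = 0
    features: set = field(default_factory=set)


def register(cs, freq_khz):
    """Record the caller supplied frequency and validate coupled mode."""
    cs.freq_khz = freq_khz
    # Coupled mode needs a frequency for its conversion factor. Without
    # one, warn and drop the feature instead of dividing by zero later.
    if "coupled" in cs.features and cs.freq_khz == 0:
        warnings.warn(f"{cs.name}: coupled mode requested without frequency")
        cs.features.discard("coupled")
    return cs
```

For example, register(Clocksource("tsc", features={"coupled"}), 2400000)
keeps the feature, while the same call with a frequency of 0 warns and
clears it.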

Fixes: cd38bdb8e696 ("timekeeping: Provide infrastructure for coupled clockevents")
Reported-by: Borislav Petkov &lt;bp@alien8.de&gt;
Reported-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Tested-by: Borislav Petkov &lt;bp@alien8.de&gt;
Tested-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Link: https://patch.msgid.link/87cy1jsa4m.ffs@tglx
Closes: https://lore.kernel.org/20260303213027.GA2168957@ax162
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Borislav reported a division by zero in the timekeeping code and random
hangs with the new coupled clocksource/clockevent functionality.

It turned out that the TSC clocksource is not always updating the
freq_khz field of the clocksource on registration. The coupled mode
conversion calculation requires the frequency and as it's not
initialized the resulting factor is zero or a random value. As a
consequence this causes a division by zero or random boot hangs.

Instead of chasing down all clocksources which fail to update that
member, fill it in at registration time where the caller has to supply
the frequency anyway, except for special clocksources like jiffies which
can never have coupled mode.

To make this more robust put a check into the registration function to
validate that the caller supplied a frequency if the coupled mode
feature bit is set. If not, emit a warning and clear the feature bit.

Fixes: cd38bdb8e696 ("timekeeping: Provide infrastructure for coupled clockevents")
Reported-by: Borislav Petkov &lt;bp@alien8.de&gt;
Reported-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@kernel.org&gt;
Tested-by: Borislav Petkov &lt;bp@alien8.de&gt;
Tested-by: Nathan Chancellor &lt;nathan@kernel.org&gt;
Link: https://patch.msgid.link/87cy1jsa4m.ffs@tglx
Closes: https://lore.kernel.org/20260303213027.GA2168957@ax162
</pre>
</div>
</content>
</entry>
<entry>
<title>clocksource: Reduce watchdog readout delay limit to prevent false positives</title>
<updated>2026-01-21T10:33:11+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2025-12-17T17:21:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c06343be0b4e03fe319910dd7a5d5b9929e1c0cb'/>
<id>c06343be0b4e03fe319910dd7a5d5b9929e1c0cb</id>
<content type='text'>
The "valid" readout delay between the two reads of the watchdog is larger
than the valid delta between the resulting watchdog and clocksource
intervals, which results in false positive watchdog results.

Assume TSC is the clocksource and HPET is the watchdog and both have a
an uncertainty margin of 250us (default). The watchdog readout does:

  1) wdnow = read(HPET);
  2) csnow = read(TSC);
  3) wdend = read(HPET);

The valid window for the delta between #1 and #3 is calculated by the
uncertainty margins of the watchdog and the clocksource:

   m = 2 * watchdog.uncertainty_margin + cs.uncertainty_margin;

which results in 750us for the TSC/HPET case.

The actual interval comparison uses a smaller margin:

   m = watchdog.uncertainty_margin + cs.uncertainty_margin;

which results in 500us for the TSC/HPET case.

That means the following scenario will trigger the watchdog:

 Watchdog cycle N:

 1)       wdnow[N] = read(HPET);
 2)       csnow[N] = read(TSC);
 3)       wdend[N] = read(HPET);

Assume the delay between #1 and #2 is 100us and the delay between #1 and

 Watchdog cycle N + 1:

 4)       wdnow[N + 1] = read(HPET);
 5)       csnow[N + 1] = read(TSC);
 6)       wdend[N + 1] = read(HPET);

If the delay between #4 and #6 is within the 750us margin then any delay
between #4 and #5 which is larger than 600us will fail the interval check
and mark the TSC unstable because the intervals are calculated against the
previous value:

    wd_int = wdnow[N + 1] - wdnow[N];
    cs_int = csnow[N + 1] - csnow[N];

Putting the above delays in place this results in:

    cs_int = (wdnow[N + 1] + 610us) - (wdnow[N] + 100us);
 -&gt; cs_int = wd_int + 510us;

which is obviously larger than the allowed 500us margin and results in
marking TSC unstable.

Fix this by using the same margin as the interval comparison. If the delay
between two watchdog reads is larger than that, then the readout was either
disturbed by interconnect congestion, NMIs or SMIs.
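
The arithmetic above can be replayed with a small Python sketch. It is a
simplified model, not the kernel code: the helper names are hypothetical
and the whole readout delay is assumed to sit between the wdnow and csnow
reads, which is the worst case for the interval comparison.

```python
WD_MARGIN_US = 250   # watchdog (HPET) uncertainty margin
CS_MARGIN_US = 250   # clocksource (TSC) uncertainty margin

old_readout_limit = 2 * WD_MARGIN_US + CS_MARGIN_US   # 750us
interval_limit = WD_MARGIN_US + CS_MARGIN_US          # 500us


def interval_error_us(delay_prev_us, delay_now_us):
    """cs_int - wd_int when csnow trails wdnow by the given delays."""
    return delay_now_us - delay_prev_us


def false_positive(delay_prev_us, delay_now_us, readout_limit_us):
    """True if a readout is accepted but then fails the interval check."""
    # The readout is accepted when the wdnow to wdend delay is within the
    # readout limit. delay_now_us stands in for that delay (worst case).
    accepted = readout_limit_us >= delay_now_us
    exceeds = interval_error_us(delay_prev_us, delay_now_us) > interval_limit
    return accepted and exceeds
```

With delays of 100us in cycle N and 610us in cycle N + 1 the model reports
a false positive for the 750us readout limit and none once the readout
limit equals the 500us interval margin.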

Fixes: 4ac1dd3245b9 ("clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin")
Reported-by: Daniel J Blueman &lt;daniel@quora.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Tested-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Link: https://lore.kernel.org/lkml/20250602223251.496591-1-daniel@quora.org/
Link: https://patch.msgid.link/87bjjxc9dq.ffs@tglx
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The "valid" readout delay between the two reads of the watchdog is larger
than the valid delta between the resulting watchdog and clocksource
intervals, which results in false positive watchdog results.

Assume TSC is the clocksource and HPET is the watchdog and both have a
an uncertainty margin of 250us (default). The watchdog readout does:

  1) wdnow = read(HPET);
  2) csnow = read(TSC);
  3) wdend = read(HPET);

The valid window for the delta between #1 and #3 is calculated by the
uncertainty margins of the watchdog and the clocksource:

   m = 2 * watchdog.uncertainty_margin + cs.uncertainty_margin;

which results in 750us for the TSC/HPET case.

The actual interval comparison uses a smaller margin:

   m = watchdog.uncertainty_margin + cs.uncertainty_margin;

which results in 500us for the TSC/HPET case.

That means the following scenario will trigger the watchdog:

 Watchdog cycle N:

 1)       wdnow[N] = read(HPET);
 2)       csnow[N] = read(TSC);
 3)       wdend[N] = read(HPET);

Assume the delay between #1 and #2 is 100us and the delay between #1 and

 Watchdog cycle N + 1:

 4)       wdnow[N + 1] = read(HPET);
 5)       csnow[N + 1] = read(TSC);
 6)       wdend[N + 1] = read(HPET);

If the delay between #4 and #6 is within the 750us margin then any delay
between #4 and #5 which is larger than 600us will fail the interval check
and mark the TSC unstable because the intervals are calculated against the
previous value:

    wd_int = wdnow[N + 1] - wdnow[N];
    cs_int = csnow[N + 1] - csnow[N];

Putting the above delays in place this results in:

    cs_int = (wdnow[N + 1] + 610us) - (wdnow[N] + 100us);
 -&gt; cs_int = wd_int + 510us;

which is obviously larger than the allowed 500us margin and results in
marking TSC unstable.

Fix this by using the same margin as the interval comparison. If the delay
between two watchdog reads is larger than that, then the readout was either
disturbed by interconnect congestion, NMIs or SMIs.

Fixes: 4ac1dd3245b9 ("clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin")
Reported-by: Daniel J Blueman &lt;daniel@quora.org&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Tested-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Link: https://lore.kernel.org/lkml/20250602223251.496591-1-daniel@quora.org/
Link: https://patch.msgid.link/87bjjxc9dq.ffs@tglx
</pre>
</div>
</content>
</entry>
<entry>
<title>time: Fix spelling mistakes in comments</title>
<updated>2025-09-21T08:02:02+00:00</updated>
<author>
<name>Haofeng Li</name>
<email>lihaofeng@kylinos.cn</email>
</author>
<published>2025-09-10T09:37:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=391253b25f078d2fe5657a1dedd360396d186407'/>
<id>391253b25f078d2fe5657a1dedd360396d186407</id>
<content type='text'>
Correct several typos found in comments across various files in the
kernel/time directory.

No functional changes are introduced by these corrections.

Signed-off-by: Haofeng Li &lt;lihaofeng@kylinos.cn&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Correct several typos found in comments across various files in the
kernel/time directory.

No functional changes are introduced by these corrections.

Signed-off-by: Haofeng Li &lt;lihaofeng@kylinos.cn&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>clocksource: Print durations for sync check unconditionally</title>
<updated>2025-09-09T12:08:19+00:00</updated>
<author>
<name>Jiri Wiesner</name>
<email>jwiesner@suse.de</email>
</author>
<published>2025-07-31T16:18:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b9aa93aa5185aee76c4c7a5ba4432b4d0d15f797'/>
<id>b9aa93aa5185aee76c4c7a5ba4432b4d0d15f797</id>
<content type='text'>
A typical set of messages that gets printed as a result of the clocksource
watchdog finding the TSC unstable usually does not contain messages
indicating CPUs being ahead of or behind the CPU from which the check is
carried out. That fact suggests that the TSC does not experience time skew
between CPUs (if the clocksource.verify_n_cpus parameter is set to a
negative value) but quantitative information is missing.

The cs_nsec_max value printed by the "CPU %d check durations" message
actually provides a worst case estimate of the time skew. If all CPUs have
been checked, the cs_nsec_max value multiplied by 2 is the maximum
possible time skew between the TSCs of any two CPUs on the system. The
worst case estimate is derived from two boundary cases:

1. No time is consumed to execute instructions between csnow_begin and
csnow_mid while all the cs_nsec_max time is consumed by the code between
csnow_mid and csnow_end. In this case, the maximum undetectable time skew
of a CPU being ahead would be cs_nsec_max.

2. All the cs_nsec_max time is consumed to execute instructions between
csnow_begin and csnow_mid while no time is consumed by the code between
csnow_mid and csnow_end. In this case, the maximum undetectable time skew
of a CPU being behind would be cs_nsec_max.

The worst case estimate assumes a system experiencing a corner case
consisting of the two boundary cases.

Always print the "CPU %d check durations" message so that the maximum
possible time skew measured by the TSC sync check can be compared to the
time skew measured by the clocksource watchdog.

Signed-off-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Link: https://lore.kernel.org/all/aIuXXfdITXdI0lLp@incl

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A typical set of messages that gets printed as a result of the clocksource
watchdog finding the TSC unstable usually does not contain messages
indicating CPUs being ahead of or behind the CPU from which the check is
carried out. That fact suggests that the TSC does not experience time skew
between CPUs (if the clocksource.verify_n_cpus parameter is set to a
negative value) but quantitative information is missing.

The cs_nsec_max value printed by the "CPU %d check durations" message
actually provides a worst case estimate of the time skew. If all CPUs have
been checked, the cs_nsec_max value multiplied by 2 is the maximum
possible time skew between the TSCs of any two CPUs on the system. The
worst case estimate is derived from two boundary cases:

1. No time is consumed to execute instructions between csnow_begin and
csnow_mid while all the cs_nsec_max time is consumed by the code between
csnow_mid and csnow_end. In this case, the maximum undetectable time skew
of a CPU being ahead would be cs_nsec_max.

2. All the cs_nsec_max time is consumed to execute instructions between
csnow_begin and csnow_mid while no time is consumed by the code between
csnow_mid and csnow_end. In this case, the maximum undetectable time skew
of a CPU being behind would be cs_nsec_max.

The worst case estimate assumes a system experiencing a corner case
consisting of the two boundary cases.

Always print the "CPU %d check durations" message so that the maximum
possible time skew measured by the TSC sync check can be compared to the
time skew measured by the clocksource watchdog.

Signed-off-by: Jiri Wiesner &lt;jwiesner@suse.de&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Paul E. McKenney &lt;paulmck@kernel.org&gt;
Link: https://lore.kernel.org/all/aIuXXfdITXdI0lLp@incl

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'bitmap-for-6.17' of https://github.com/norov/linux</title>
<updated>2025-07-31T23:52:32+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-07-31T23:52:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f2d282e1dfb3d8cb95b5ccdea43f2411f27201db'/>
<id>f2d282e1dfb3d8cb95b5ccdea43f2411f27201db</id>
<content type='text'>
Pull bitmap updates from Yury Norov:

 - find_random_bit() series (Yury)

 - GENMASK() consolidation (Vincent)

 - random cleanups (Shaopeng, Ben, Yury)

* tag 'bitmap-for-6.17' of https://github.com/norov/linux:
  bitfield: Ensure the return values of helper functions are checked
  test_bits: add tests for __GENMASK() and __GENMASK_ULL()
  bits: unify the non-asm GENMASK*()
  bits: split the definition of the asm and non-asm GENMASK*()
  cpumask: Remove unnecessary cpumask_nth_andnot()
  watchdog: fix opencoded cpumask_next_wrap() in watchdog_next_cpu()
  clocksource: Improve randomness in clocksource_verify_choose_cpus()
  cpumask: introduce cpumask_random()
  bitmap: generalize node_random()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull bitmap updates from Yury Norov:

 - find_random_bit() series (Yury)

 - GENMASK() consolidation (Vincent)

 - random cleanups (Shaopeng, Ben, Yury)

* tag 'bitmap-for-6.17' of https://github.com/norov/linux:
  bitfield: Ensure the return values of helper functions are checked
  test_bits: add tests for __GENMASK() and __GENMASK_ULL()
  bits: unify the non-asm GENMASK*()
  bits: split the definition of the asm and non-asm GENMASK*()
  cpumask: Remove unnecessary cpumask_nth_andnot()
  watchdog: fix opencoded cpumask_next_wrap() in watchdog_next_cpu()
  clocksource: Improve randomness in clocksource_verify_choose_cpus()
  cpumask: introduce cpumask_random()
  bitmap: generalize node_random()
</pre>
</div>
</content>
</entry>
<entry>
<title>clocksource: Improve randomness in clocksource_verify_choose_cpus()</title>
<updated>2025-07-31T15:27:48+00:00</updated>
<author>
<name>Yury Norov [NVIDIA]</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2025-06-19T18:26:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8557c8628cf3cf8ebd3b32601ccdde550bbf6c54'/>
<id>8557c8628cf3cf8ebd3b32601ccdde550bbf6c54</id>
<content type='text'>
The current algorithm of picking a random CPU works OK for dense online
cpumask, but if cpumask is non-dense, the distribution of picked CPUs
is skewed.

For example, on an 8-CPU board with CPUs 4-7 offlined, the probability of
selecting CPU 0 is 5/8. Accordingly, CPUs 1, 2 and 3 are chosen with
probability 1/8 each. The proper algorithm should pick each online CPU
with probability 1/4.

Switch it to cpumask_random(), which has better statistical
characteristics.
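
The numbers in the example can be reproduced with a short Python model. It
assumes that the replaced code picked a random index below the number of
possible CPUs and then took the next online CPU at or above it, wrapping to
the first online CPU. That pattern is an assumption which matches the
probabilities quoted above. The function names are hypothetical.

```python
from fractions import Fraction


def old_pick_distribution(nr_cpu_ids, online):
    """Exact distribution of 'random index, then next online CPU, wrapping'."""
    online = sorted(online)
    counts = {cpu: 0 for cpu in online}
    for start in range(nr_cpu_ids):
        # First online CPU at or above the random start index, otherwise
        # wrap around to the first online CPU.
        candidates = [cpu for cpu in online if cpu >= start]
        picked = candidates[0] if candidates else online[0]
        counts[picked] += 1
    return {cpu: Fraction(n, nr_cpu_ids) for cpu, n in counts.items()}


def uniform_distribution(online):
    """What an unbiased pick over the online CPUs should yield."""
    return {cpu: Fraction(1, len(online)) for cpu in sorted(online)}
```

Calling old_pick_distribution(8, {0, 1, 2, 3}) yields 5/8 for CPU 0 and 1/8
for each of CPUs 1-3, whereas uniform_distribution({0, 1, 2, 3}) yields 1/4
for every online CPU.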

CC: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: John Stultz &lt;jstultz@google.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: "Yury Norov [NVIDIA]" &lt;yury.norov@gmail.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current algorithm of picking a random CPU works OK for dense online
cpumask, but if cpumask is non-dense, the distribution of picked CPUs
is skewed.

For example, on an 8-CPU board with CPUs 4-7 offlined, the probability of
selecting CPU 0 is 5/8. Accordingly, CPUs 1, 2 and 3 are chosen with
probability 1/8 each. The proper algorithm should pick each online CPU
with probability 1/4.

Switch it to cpumask_random(), which has better statistical
characteristics.

CC: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Acked-by: John Stultz &lt;jstultz@google.com&gt;
Reviewed-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: "Yury Norov [NVIDIA]" &lt;yury.norov@gmail.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>clocksource: Use cpumask_next_wrap() in clocksource_watchdog()</title>
<updated>2025-06-14T18:09:44+00:00</updated>
<author>
<name>Yury Norov [NVIDIA]</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2025-06-14T15:50:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bfa788dc2ddaea7d7930f63a5c7c8f3668a3f2c5'/>
<id>bfa788dc2ddaea7d7930f63a5c7c8f3668a3f2c5</id>
<content type='text'>
cpumask_next_wrap() is more expressive and more efficient than
cpumask_next() followed by cpumask_first().
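
For readers unfamiliar with the helper, the wrap-around lookup behaves
roughly like the following Python model. It is a sketch of the semantics
only: next_wrap() is a hypothetical name, the mask is modelled as a set of
CPU numbers, and an empty mask returns None here, which is a choice of this
model and not the kernel's return convention.

```python
def next_wrap(cpu, mask):
    """Next CPU after `cpu` in `mask`, wrapping to the first one."""
    ordered = sorted(mask)
    if not ordered:
        return None
    # Equivalent to "next CPU above cpu, else first CPU" in a single step.
    later = [c for c in ordered if c > cpu]
    return later[0] if later else ordered[0]
```

With mask {0, 1, 5}, next_wrap(1, mask) gives 5 and next_wrap(5, mask) wraps
around to 0.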

Signed-off-by: Yury Norov [NVIDIA] &lt;yury.norov@gmail.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: John Stultz &lt;jstultz@google.com&gt;
Link: https://lore.kernel.org/all/20250614155031.340988-3-yury.norov@gmail.com

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cpumask_next_wrap() is more expressive and more efficient than
cpumask_next() followed by cpumask_first().

Signed-off-by: Yury Norov [NVIDIA] &lt;yury.norov@gmail.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: John Stultz &lt;jstultz@google.com&gt;
Link: https://lore.kernel.org/all/20250614155031.340988-3-yury.norov@gmail.com

</pre>
</div>
</content>
</entry>
<entry>
<title>clocksource: Use cpumask_any_but() in clocksource_verify_choose_cpus()</title>
<updated>2025-06-14T18:09:44+00:00</updated>
<author>
<name>Yury Norov [NVIDIA]</name>
<email>yury.norov@gmail.com</email>
</author>
<published>2025-06-14T15:50:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4fa7d61d5a02ad57a05c69365db293afddf678fc'/>
<id>4fa7d61d5a02ad57a05c69365db293afddf678fc</id>
<content type='text'>
cpumask_any_but() is more expressive than cpumask_first() followed by
cpumask_next(). Use it in clocksource_verify_choose_cpus().

Signed-off-by: Yury Norov [NVIDIA] &lt;yury.norov@gmail.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: John Stultz &lt;jstultz@google.com&gt;
Link: https://lore.kernel.org/all/20250614155031.340988-2-yury.norov@gmail.com

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
cpumask_any_but() is more expressive than cpumask_first() followed by
cpumask_next(). Use it in clocksource_verify_choose_cpus().

Signed-off-by: Yury Norov [NVIDIA] &lt;yury.norov@gmail.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Acked-by: John Stultz &lt;jstultz@google.com&gt;
Link: https://lore.kernel.org/all/20250614155031.340988-2-yury.norov@gmail.com

</pre>
</div>
</content>
</entry>
</feed>
