linux.git/kernel/watchdog.c, branch v7.1-rc4

watchdog/hardlockup: improve buddy system detection timeliness

2026-03-28T04:19:47+00:00

Currently, the buddy system only performs checks every 3rd sample.  With a
4-second interval, if a check window is missed, the next check occurs 12
seconds later, potentially delaying hard lockup detection for up to 24
seconds.

Modify the buddy system to perform checks at every interval (4s). 
Introduce a missed-interrupt threshold to maintain the existing grace
period while reducing the detection window to 8-12 seconds.

Best and worst case detection scenarios:

Before (12s check window):
- Best case: Lockup occurs after first check but just before heartbeat
  interval. Detected in ~8s (8s till next check).
- Worst case: Lockup occurs just after a check.
  Detected in ~24s (missed check + 12s till next check + 12s logic).

After (4s check window with threshold of 3):
- Best case: Lockup occurs just before a check.
  Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
- Worst case: Lockup occurs just after a check.
  Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
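
The "after" timing can be sketched with a small userspace model.  This is
only an illustration of the threshold idea, not the kernel code: every
name below (demo_buddy, DEMO_MISS_THRESH, and so on) is hypothetical, and
the latency helper assumes that every check made after the lockup sees an
unchanged counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative userspace model; all names here are hypothetical. */
#define DEMO_PERIOD_S     4  /* check interval from the text above */
#define DEMO_MISS_THRESH  3  /* stale checks required before reporting */

struct demo_buddy {
	unsigned long saved;  /* counter value seen at the previous check */
	int missed;           /* consecutive checks without progress */
};

/* One check by the buddy CPU; returns true once a lockup is reported. */
static bool demo_buddy_check(struct demo_buddy *b, unsigned long interrupts)
{
	if (interrupts != b->saved) {
		/* The watched CPU made progress: restart the grace period. */
		b->saved = interrupts;
		b->missed = 0;
		return false;
	}
	return ++b->missed >= DEMO_MISS_THRESH;
}

/*
 * Seconds from lockup to report, given the time until the first check
 * after the lockup.  Assumes each of those checks counts as a miss.
 */
static int demo_latency_s(int until_first_check_s)
{
	assert(until_first_check_s >= 0 &&
	       until_first_check_s <= DEMO_PERIOD_S);
	return until_first_check_s + (DEMO_MISS_THRESH - 1) * DEMO_PERIOD_S;
}
```

Under these assumptions, demo_latency_s(0) gives the 8s best case and
demo_latency_s(4) gives the 12s worst case listed above.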

Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-4-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta 
Reviewed-by: Douglas Anderson 
Reviewed-by: Petr Mladek 
Cc: Ian Rogers 
Cc: Jonathan Corbet 
Cc: Li Huafei 
Cc: Max Kellermann 
Cc: Shuah Khan 
Cc: Stephane Eranian 
Cc: Wang Jinchao 
Cc: Yunhui Cui 
Signed-off-by: Andrew Morton

watchdog: update saved interrupts during check

2026-03-28T04:19:46+00:00

Currently, arch_touch_nmi_watchdog() causes an early return that skips
updating hrtimer_interrupts_saved.  This leads to stale comparisons and
delayed lockup detection.

I found this issue because in our system the serial console is fairly
chatty.  For example, the 8250 console driver frequently calls
touch_nmi_watchdog() via console_write().  If a CPU locks up after a timer
interrupt but before the next watchdog check, we see the following sequence:

  * watchdog_hardlockup_check() saves counter (e.g., 1000)
  * Timer runs and updates the counter (1001)
  * touch_nmi_watchdog() is called
  * CPU locks up
  * 10s pass: check() notices touch, returns early, skips update
  * 10s pass: check() saves counter (1001)
  * 10s pass: check() finally detects lockup

This delays detection to 30 seconds.  With this fix, we detect the lockup
in 20 seconds.
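
The effect of the ordering can be replayed in a small userspace model.
This is a sketch under stated assumptions rather than the kernel
implementation: the struct and function names are hypothetical, and the
model only tracks the saved counter and the touched flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only; names do not match the kernel source. */
struct demo_hl {
	unsigned long saved;  /* counter value recorded by the last check */
	bool touched;         /* set by a touch_nmi_watchdog()-style call */
};

typedef bool (*demo_check_fn)(struct demo_hl *, unsigned long);

/* Old ordering: a pending touch returns before the counter is saved. */
static bool demo_check_old(struct demo_hl *s, unsigned long interrupts)
{
	if (s->touched) {
		s->touched = false;
		return false;
	}
	if (interrupts == s->saved)
		return true;
	s->saved = interrupts;
	return false;
}

/* New ordering: always refresh the saved counter, then honor the touch. */
static bool demo_check_new(struct demo_hl *s, unsigned long interrupts)
{
	bool stalled = (interrupts == s->saved);

	s->saved = interrupts;
	if (s->touched) {
		s->touched = false;
		return false;
	}
	return stalled;
}

/* Replay the sequence above: saved 1000, counter frozen at 1001, touched. */
static int demo_checks_until_report(demo_check_fn check)
{
	struct demo_hl s = { .saved = 1000, .touched = true };
	int checks = 1;

	assert(check != 0);
	while (!check(&s, 1001))
		checks++;
	return checks;
}
```

In this model the old ordering needs three checks and the new ordering
needs two, which at 10s per check corresponds to the 30s and 20s figures
above.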

Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-2-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta 
Reviewed-by: Douglas Anderson 
Reviewed-by: Petr Mladek 
Cc: Ian Rogers 
Cc: Jonathan Corbet 
Cc: Li Huafei 
Cc: Max Kellermann 
Cc: Shuah Khan 
Cc: Stephane Eranian 
Cc: Wang Jinchao 
Cc: Yunhui Cui 
Signed-off-by: Andrew Morton

watchdog: return early in watchdog_hardlockup_check()

2026-03-28T04:19:46+00:00

Patch series "watchdog/hardlockup: Improvements to hardlockup", v2.

This series addresses limitations in the hardlockup detector
implementations and updates the documentation to reflect actual behavior
and recent changes.

The changes are structured as follows:

Refactoring (Patch 1)
=====================
Patch 1 refactors watchdog_hardlockup_check() to return early if no
lockup is detected. This reduces the indentation level of the main
logic block, serving as a clean base for the subsequent changes.

Hardlockup Detection Improvements (Patches 2 & 4)
=================================================
The hardlockup detector logic relies on updating saved interrupt counts to
determine if the CPU is making progress.

Patch 2 ensures that the saved interrupt count is updated unconditionally
before checking the "touched" flag.  This prevents stale comparisons which
can delay detection.  This is a logic fix that ensures the detector
remains accurate even when the watchdog is frequently touched.

Patch 4 improves the Buddy detector's timeliness.  The current checking
interval (every 3rd sample) causes high variability in detection time (up
to 24s).  This patch changes the Buddy detector to check at every hrtimer
interval (4s) with a missed-interrupt threshold of 3, narrowing the
detection window to a consistent 8-12 second range.

Documentation Updates (Patches 3 & 5)
=====================================
The current documentation does not fully capture the variable nature of
detection latency or the details of the Buddy system.

Patch 3 removes the strict "10 seconds" definition of a hardlockup, which
was misleading given the periodic nature of the detector.  It adds a
"Detection Overhead" section to the admin guide, using "Best Case" and
"Worst Case" scenarios to illustrate that detection time can vary
significantly (e.g., ~6s to ~20s).

Patch 5 adds a dedicated section for the Buddy detector, which was
previously undocumented.  It details the mechanism, the new timing logic,
and known limitations.


This patch (of 5):

Invert the `is_hardlockup(cpu)` check in `watchdog_hardlockup_check()` to
return early when a hardlockup is not detected.  This flattens the main
logic block, reducing the indentation level and making the code easier to
read and maintain.

This refactoring serves as a preparation patch for future hardlockup
changes.
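
The shape of the refactoring can be illustrated with a reduced model.
The state and function names below are hypothetical and the body is
simplified to a warned flag and a report counter; the point is only that
the inverted check keeps behavior identical while removing one level of
nesting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state, reduced to what the example needs. */
struct demo_cpu {
	bool warned;   /* a report was already issued for this lockup */
	int reports;   /* number of reports issued so far */
};

/* Nested form: the main logic lives inside the "locked" branch. */
static void demo_check_nested(struct demo_cpu *s, bool locked)
{
	if (locked) {
		if (!s->warned) {
			s->reports++;
			s->warned = true;
		}
	} else {
		s->warned = false;
	}
}

/* Flat form: return early when no lockup is seen. */
static void demo_check_flat(struct demo_cpu *s, bool locked)
{
	if (!locked) {
		s->warned = false;
		return;
	}
	if (s->warned)
		return;

	s->reports++;
	s->warned = true;
}
```

Both variants produce the same state for any sequence of inputs; only the
indentation of the main logic differs.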

Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-0-45bd8a0cc7ed@google.com
Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-1-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta 
Reviewed-by: Douglas Anderson 
Reviewed-by: Petr Mladek 
Cc: Ian Rogers 
Cc: Jonathan Corbet 
Cc: Li Huafei 
Cc: Max Kellermann 
Cc: Shuah Khan 
Cc: Stephane Eranian 
Cc: Wang Jinchao 
Cc: Yunhui Cui 
Signed-off-by: Andrew Morton

watchdog/softlockup: fix sample ring index wrap in need_counting_irqs()

2026-02-08T08:13:34+00:00

cpustat_tail indexes cpustat_util[], which is a NUM_SAMPLE_PERIODS-sized
ring buffer. need_counting_irqs() currently wraps the index using
NUM_HARDIRQ_REPORT, which only happens to match NUM_SAMPLE_PERIODS.

Use NUM_SAMPLE_PERIODS for the wrap to keep the ring math correct even if
NUM_HARDIRQ_REPORT or NUM_SAMPLE_PERIODS changes.
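
A minimal sketch of why the modulus matters, using deliberately different
stand-in values (DEMO_RING_SIZE and DEMO_REPORT_MAX are hypothetical and
are not the kernel's constants): stepping back one slot from the tail is
only correct when the wrap uses the ring's own size.

```c
#include <assert.h>

/* Stand-in values, chosen to differ so the mismatch is visible. */
#define DEMO_RING_SIZE   5  /* plays the role of NUM_SAMPLE_PERIODS */
#define DEMO_REPORT_MAX  3  /* plays the role of NUM_HARDIRQ_REPORT */

/* Index of the slot just before 'tail' in a ring wrapped at 'modulus'. */
static unsigned int demo_prev_slot(unsigned int tail, unsigned int modulus)
{
	assert(modulus > 0);
	return (tail + modulus - 1) % modulus;
}
```

With a 5-slot ring and tail 0, wrapping by DEMO_RING_SIZE yields slot 4
(the most recent sample), while wrapping by DEMO_REPORT_MAX yields slot 2,
which is an unrelated sample.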

Link: https://lkml.kernel.org/r/tencent_7068189CB6D6689EB353F3D17BF5A5311A07@qq.com
Fixes: e9a9292e2368 ("watchdog/softlockup: Report the most frequent interrupts")
Signed-off-by: Shengming Hu 
Reviewed-by: Petr Mladek 
Cc: Ingo Molnar 
Cc: Mark Brown 
Cc: Thomas Gleixner 
Cc: Zhang Run 
Signed-off-by: Andrew Morton

watchdog: softlockup: panic when lockup duration exceeds N thresholds

2026-01-21T03:44:20+00:00

The softlockup_panic sysctl is currently a binary option: panic
immediately or never panic on soft lockups.

Panicking on any soft lockup, regardless of duration, can be overly
aggressive for brief stalls that may be caused by legitimate operations. 
Conversely, never panicking may allow severe system hangs to persist
undetected.

Extend softlockup_panic to accept an integer threshold, allowing the
kernel to panic only when the normalized lockup duration exceeds N
watchdog threshold periods.  This provides finer-grained control to
distinguish between transient delays and persistent system failures.

The accepted values are:
- 0: Don't panic (unchanged)
- 1: Panic when duration >= 1 * threshold (20s default, original behavior)
- N > 1: Panic when duration >= N * threshold (e.g., 2 = 40s, 3 = 60s)

The original behavior is preserved for values 0 and 1, maintaining full
backward compatibility.  Larger values let systems tolerate brief lockups
while still catching severe, persistent hangs.
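
The decision described by the list above can be sketched as a single
predicate.  This is an illustration only: demo_should_panic() and its
parameter names are hypothetical, and durations are taken in whole
seconds.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether to panic for a soft lockup that has
 * lasted duration_s seconds, with a watchdog threshold of thresh_s seconds
 * and softlockup_panic set to n.
 */
static bool demo_should_panic(unsigned int n, unsigned int duration_s,
			      unsigned int thresh_s)
{
	assert(thresh_s > 0);
	if (n == 0)
		return false;			/* 0: never panic */
	return duration_s >= n * thresh_s;	/* N: panic at N thresholds */
}
```

With the 20s default threshold, a value of 2 leaves a 39s lockup alone and
panics at 40s, while a value of 1 panics at 20s as before.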

[lirongqing@baidu.com: v2]
  Link: https://lkml.kernel.org/r/20251218074300.4080-1-lirongqing@baidu.com
Link: https://lkml.kernel.org/r/20251216074521.2796-1-lirongqing@baidu.com
Signed-off-by: Li RongQing 
Cc: Eduard Zingerman 
Cc: Hao Luo 
Cc: Jiri Olsa 
Cc: John Fastabend 
Cc: KP Singh 
Cc: Lance Yang 
Cc: Martin KaFai Lau 
Cc: Nicholas Piggin 
Cc: Song Liu 
Cc: Stanislav Fomichev 
Cc: Yonghong Song 
Signed-off-by: Andrew Morton

powerpc/watchdog: add support for hardlockup_sys_info sysctl

2026-01-15T06:16:22+00:00

Commit a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on
system lockup") adds the 'hardlockup_sys_info' sysctl knob to the general
kernel watchdog to control what kinds of system debug info are dumped on
hardlockup.

Add similar support in powerpc watchdog code to make the sysctl knob more
general, which also fixes a compile warning in general watchdog code
reported by 0day bot.

Link: https://lkml.kernel.org/r/20251231080309.39642-1-feng.tang@linux.alibaba.com
Fixes: a9af76a78760 ("watchdog: add sys_info sysctls to dump sys info on system lockup")
Signed-off-by: Feng Tang 
Reported-by: kernel test robot 
Closes: https://lore.kernel.org/oe-kbuild-all/202512030920.NFKtekA7-lkp@intel.com/
Suggested-by: Petr Mladek 
Reviewed-by: Petr Mladek 
Cc: Madhavan Srinivasan 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Signed-off-by: Andrew Morton

Merge tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2025-12-06T22:01:20+00:00

Pull non-MM updates from Andrew Morton:

 - "panic: sys_info: Refactor and fix a potential issue" (Andy Shevchenko)
   fixes a build issue and does some cleanup in lib/sys_info.c

 - "Implement mul_u64_u64_div_u64_roundup()" (David Laight)
   enhances the 64-bit math code on behalf of a PWM driver and beefs up
   the test module for these library functions

 - "scripts/gdb/symbols: make BPF debug info available to GDB" (Ilya Leoshkevich)
   makes BPF symbol names, sizes, and line numbers available to the GDB
   debugger

 - "Enable hung_task and lockup cases to dump system info on demand" (Feng Tang)
   adds a sysctl which can be used to cause additional info dumping when
   the hung-task and lockup detectors fire

 - "lib/base64: add generic encoder/decoder, migrate users" (Kuan-Wei Chiu)
   adds a general base64 encoder/decoder to lib/ and migrates several
   users away from their private implementations

 - "rbtree: inline rb_first() and rb_last()" (Eric Dumazet)
   makes TCP a little faster

 - "liveupdate: Rework KHO for in-kernel users" (Pasha Tatashin)
   reworks the KEXEC Handover interfaces in preparation for Live Update
   Orchestrator (LUO), and possibly for other future clients

 - "kho: simplify state machine and enable dynamic updates" (Pasha Tatashin)
   increases the flexibility of KEXEC Handover. Also preparation for LUO

 - "Live Update Orchestrator" (Pasha Tatashin)
   is a major new feature targeted at cloud environments. Quoting the
   cover letter:

      This series introduces the Live Update Orchestrator, a kernel
      subsystem designed to facilitate live kernel updates using a
      kexec-based reboot. This capability is critical for cloud
      environments, allowing hypervisors to be updated with minimal
      downtime for running virtual machines. LUO achieves this by
      preserving the state of selected resources, such as memory,
      devices and their dependencies, across the kernel transition.

      As a key feature, this series includes support for preserving
      memfd file descriptors, which allows critical in-memory data, such
      as guest RAM or any other large memory region, to be maintained in
      RAM across the kexec reboot.

   Mike Rapoport merits a mention here, for his extensive review and
   testing work.

 - "kexec: reorganize kexec and kdump sysfs" (Sourabh Jain)
   moves the kexec and kdump sysfs entries from /sys/kernel/ to
   /sys/kernel/kexec/ and adds back-compatibility symlinks which can
   hopefully be removed one day

 - "kho: fixes for vmalloc restoration" (Mike Rapoport)
   fixes a BUG which was being hit during KHO restoration of vmalloc()
   regions

* tag 'mm-nonmm-stable-2025-12-06-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (139 commits)
  calibrate: update header inclusion
  Reinstate "resource: avoid unnecessary lookups in find_next_iomem_res()"
  vmcoreinfo: track and log recoverable hardware errors
  kho: fix restoring of contiguous ranges of order-0 pages
  kho: kho_restore_vmalloc: fix initialization of pages array
  MAINTAINERS: TPM DEVICE DRIVER: update the W-tag
  init: replace simple_strtoul with kstrtoul to improve lpj_setup
  KHO: fix boot failure due to kmemleak access to non-PRESENT pages
  Documentation/ABI: new kexec and kdump sysfs interface
  Documentation/ABI: mark old kexec sysfs deprecated
  kexec: move sysfs entries to /sys/kernel/kexec
  test_kho: always print restore status
  kho: free chunks using free_page() instead of kfree()
  selftests/liveupdate: add kexec test for multiple and empty sessions
  selftests/liveupdate: add simple kexec-based selftest for LUO
  selftests/liveupdate: add userspace API selftests
  docs: add documentation for memfd preservation via LUO
  mm: memfd_luo: allow preserving memfd
  liveupdate: luo_file: add private argument to store runtime state
  mm: shmem: export some functions to internal.h
  ...

Merge tag 'sysctl-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl

2025-12-05T19:15:37+00:00

Pull sysctl updates from Joel Granados:

 - Move jiffies converters out of kernel/sysctl.c

   Move the jiffies converters into kernel/time/jiffies.c and replace
   the pipe-max-size proc_handler converter with a macro based version.
   This is all part of the effort to relocate non-sysctl logic out of
   kernel/sysctl.c into more relevant subsystems. No functional changes.

 - Generalize proc handler converter creation

   Remove duplicated sysctl converter logic by consolidating it in
   macros. These are used inside sysctl core as well as in pipe.c and
   jiffies.c. Converter kernel and user space pointer args are now
   automatically const qualified for the convenience of the caller. No
   functional changes.

 - Miscellaneous

   Fix kernel-doc format warnings, remove unnecessary __user
   qualifiers, and move the nmi_watchdog sysctl into .rodata.

 - Testing

   This series was run through sysctl selftests/kunit test suite in
   x86_64. It went into linux-next after rc2, giving it a good 4/5 weeks
   of testing.

* tag 'sysctl-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (21 commits)
  sysctl: Wrap do_proc_douintvec with the public function proc_douintvec_conv
  sysctl: Create pipe-max-size converter using sysctl UINT macros
  sysctl: Move proc_doulongvec_ms_jiffies_minmax to kernel/time/jiffies.c
  sysctl: Move jiffies converters to kernel/time/jiffies.c
  sysctl: Move UINT converter macros to sysctl header
  sysctl: Move INT converter macros to sysctl header
  sysctl: Allow custom converters from outside sysctl
  sysctl: remove __user qualifier from stack_erasing_sysctl buffer argument
  sysctl: Create macro for user-to-kernel uint converter
  sysctl: Add optional range checking to SYSCTL_UINT_CONV_CUSTOM
  sysctl: Create unsigned int converter using new macro
  sysctl: Add optional range checking to SYSCTL_INT_CONV_CUSTOM
  sysctl: Create integer converters with one macro
  sysctl: Create converter functions with two new macros
  sysctl: Discriminate between kernel and user converter params
  sysctl: Indicate the direction of operation with macro names
  sysctl: Remove superfluous __do_proc_* indirection
  sysctl: Remove superfluous tbl_data param from "dovec" functions
  sysctl: Replace void pointer with const pointer to ctl_table
  sysctl: fix kernel-doc format warning
  ...

watchdog: add sys_info sysctls to dump sys info on system lockup

2025-11-20T22:03:43+00:00

When soft/hard lockup happens, developers may need different kinds of
system information (call-stacks, memory info, locks, etc.) to help
debugging.

Add 'softlockup_sys_info' and 'hardlockup_sys_info' sysctl knobs that take
a human-readable string such as "tasks,mem,timers,locks,ftrace,...".  When
a system lockup happens, all requested information is printed out (refer
to lib/sys_info.c for more details).
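
As a rough illustration of how such a comma-separated string could map to
a set of requested dumps, the sketch below parses it into a bitmask.  The
keyword names come from the example string above; the DEMO_* bit values,
the table, and demo_parse_sys_info() are hypothetical and do not reflect
the kernel's actual parser.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit assignments, for illustration only. */
enum {
	DEMO_TASKS  = 1u << 0,
	DEMO_MEM    = 1u << 1,
	DEMO_TIMERS = 1u << 2,
	DEMO_LOCKS  = 1u << 3,
	DEMO_FTRACE = 1u << 4,
};

static const struct {
	const char *name;
	unsigned int bit;
} demo_names[] = {
	{ "tasks", DEMO_TASKS },   { "mem", DEMO_MEM },
	{ "timers", DEMO_TIMERS }, { "locks", DEMO_LOCKS },
	{ "ftrace", DEMO_FTRACE },
};

/* Turn "tasks,mem,..." into a mask; unknown keywords are ignored. */
static unsigned int demo_parse_sys_info(const char *s)
{
	unsigned int mask = 0;

	assert(s != 0);
	while (*s) {
		size_t len = strcspn(s, ",");  /* length of this keyword */
		size_t i;

		for (i = 0; i < sizeof(demo_names) / sizeof(demo_names[0]); i++) {
			if (strlen(demo_names[i].name) == len &&
			    strncmp(s, demo_names[i].name, len) == 0)
				mask |= demo_names[i].bit;
		}
		s += len;
		if (*s == ',')
			s++;
	}
	return mask;
}
```

For example, demo_parse_sys_info("tasks,mem") sets only the DEMO_TASKS and
DEMO_MEM bits, and an empty string yields an empty mask.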

Link: https://lkml.kernel.org/r/20251113111039.22701-4-feng.tang@linux.alibaba.com
Signed-off-by: Feng Tang 
Reviewed-by: Petr Mladek 
Cc: Jonathan Corbet 
Cc: Lance Yang 
Cc: "Paul E . McKenney" 
Cc: Petr Mladek 
Cc: Steven Rostedt 
Signed-off-by: Andrew Morton

sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs

2025-11-14T21:11:08+00:00

With the buddy lockup detector, smp_processor_id() returns the detecting CPU,
not the locked CPU, making scx_hardlockup()'s printouts confusing. Pass the
locked CPU number from watchdog_hardlockup_check() as a parameter instead.

Also add kerneldoc comments to handle_lockup(), scx_hardlockup(), and
scx_rcu_cpu_stall() documenting their return value semantics.

Suggested-by: Doug Anderson 
Reviewed-by: Douglas Anderson 
Acked-by: Andrea Righi 
Reviewed-by: Emil Tsalapatis 
Signed-off-by: Tejun Heo