<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/watchdog_perf.c, branch v7.0</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency</title>
<updated>2026-02-08T08:13:35+00:00</updated>
<author>
<name>Qiliang Yuan</name>
<email>realwujing@gmail.com</email>
</author>
<published>2026-01-29T02:26:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0dddf20b4fd4afd59767acc144ad4da60259f21f'/>
<id>0dddf20b4fd4afd59767acc144ad4da60259f21f</id>
<content type='text'>
Simplify the hardlockup detector's probe path and remove its implicit
dependency on pinned per-cpu execution.

Refactor hardlockup_detector_event_create() to be stateless.  Return the
created perf_event pointer to the caller instead of directly modifying the
per-cpu 'watchdog_ev' variable.  This allows the probe path to safely
manage a temporary event without the risk of leaving stale pointers should
task migration occur.

Link: https://lkml.kernel.org/r/20260129022629.2201331-1-realwujing@gmail.com
Signed-off-by: Shouxin Sun &lt;sunshx@chinatelecom.cn&gt;
Signed-off-by: Junnan Zhang &lt;zhangjn11@chinatelecom.cn&gt;
Signed-off-by: Qiliang Yuan &lt;yuanql9@chinatelecom.cn&gt;
Signed-off-by: Qiliang Yuan &lt;realwujing@gmail.com&gt;
Reviewed-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Jinchao Wang &lt;wangjinchao600@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Li Huafei &lt;lihuafei1@huawei.com&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Thorsten Blum &lt;thorsten.blum@linux.dev&gt;
Cc: Wang Jinchao &lt;wangjinchao600@gmail.com&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Simplify the hardlockup detector's probe path and remove its implicit
dependency on pinned per-cpu execution.

Refactor hardlockup_detector_event_create() to be stateless.  Return the
created perf_event pointer to the caller instead of directly modifying the
per-cpu 'watchdog_ev' variable.  This allows the probe path to safely
manage a temporary event without the risk of leaving stale pointers should
task migration occur.

Link: https://lkml.kernel.org/r/20260129022629.2201331-1-realwujing@gmail.com
Signed-off-by: Shouxin Sun &lt;sunshx@chinatelecom.cn&gt;
Signed-off-by: Junnan Zhang &lt;zhangjn11@chinatelecom.cn&gt;
Signed-off-by: Qiliang Yuan &lt;yuanql9@chinatelecom.cn&gt;
Signed-off-by: Qiliang Yuan &lt;realwujing@gmail.com&gt;
Reviewed-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Cc: Jinchao Wang &lt;wangjinchao600@gmail.com&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Li Huafei &lt;lihuafei1@huawei.com&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Thorsten Blum &lt;thorsten.blum@linux.dev&gt;
Cc: Wang Jinchao &lt;wangjinchao600@gmail.com&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog: skip checks when panic is in progress</title>
<updated>2025-09-14T00:32:53+00:00</updated>
<author>
<name>Jinchao Wang</name>
<email>wangjinchao600@gmail.com</email>
</author>
<published>2025-08-25T02:29:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3d5f4f15b778d6da9760d54455cc256ecf924c0a'/>
<id>3d5f4f15b778d6da9760d54455cc256ecf924c0a</id>
<content type='text'>
This issue was found when an EFI pstore was configured for kdump logging
with the NMI hard lockup detector enabled.  The efi-pstore write operation
was slow, and with a large number of logs, the pstore dump callback within
kmsg_dump() took a long time.

This delay triggered the NMI watchdog, leading to a nested panic.  The
call flow demonstrates how the secondary panic caused an
emergency_restart() to be triggered before the initial pstore operation
could finish, leading to a failure to dump the logs:

  real panic() {
	kmsg_dump() {
		...
		pstore_dump() {
			start_dump();
			... // long time operation triggers NMI watchdog
			nmi panic() {
				...
				emergency_restart(); // pstore unfinished
			}
			...
			finish_dump(); // never reached
		}
	}
  }

Both watchdog_buddy_check_hardlockup() and watchdog_overflow_callback()
may trigger during a panic.  This can lead to recursive panic handling.

Add panic_in_progress() checks so watchdog activity is skipped once a
panic has begun.

This prevents recursive panic and keeps the panic path more reliable.

Link: https://lkml.kernel.org/r/20250825022947.1596226-10-wangjinchao600@gmail.com
Signed-off-by: Jinchao Wang &lt;wangjinchao600@gmail.com&gt;
Reviewed-by: Yury Norov (NVIDIA) &lt;yury.norov@gmail.com&gt;
Cc: Anna Schumaker &lt;anna.schumaker@oracle.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Cc: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Doug Anderson &lt;dianders@chromium.org&gt;
Cc: "Guilherme G. Piccoli" &lt;gpiccoli@igalia.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Joanthan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Joel Granados &lt;joel.granados@kernel.org&gt;
Cc: John Ogness &lt;john.ogness@linutronix.de&gt;
Cc: Kees Cook &lt;kees@kernel.org&gt;
Cc: Li Huafei &lt;lihuafei1@huawei.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Luo Gengkun &lt;luogengkun@huaweicloud.com&gt;
Cc: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Cc: Nam Cao &lt;namcao@linutronix.de&gt;
Cc: oushixiong &lt;oushixiong@kylinos.cn&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Qianqiang Liu &lt;qianqiang.liu@163.com&gt;
Cc: Sergey Senozhatsky &lt;senozhatsky@chromium.org&gt;
Cc: Sohil Mehta &lt;sohil.mehta@intel.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Cc: Thomas Zimemrmann &lt;tzimmermann@suse.de&gt;
Cc: Thorsten Blum &lt;thorsten.blum@linux.dev&gt;
Cc: Ville Syrjala &lt;ville.syrjala@linux.intel.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: Yunhui Cui &lt;cuiyunhui@bytedance.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This issue was found when an EFI pstore was configured for kdump logging
with the NMI hard lockup detector enabled.  The efi-pstore write operation
was slow, and with a large number of logs, the pstore dump callback within
kmsg_dump() took a long time.

This delay triggered the NMI watchdog, leading to a nested panic.  The
call flow demonstrates how the secondary panic caused an
emergency_restart() to be triggered before the initial pstore operation
could finish, leading to a failure to dump the logs:

  real panic() {
	kmsg_dump() {
		...
		pstore_dump() {
			start_dump();
			... // long time operation triggers NMI watchdog
			nmi panic() {
				...
				emergency_restart(); // pstore unfinished
			}
			...
			finish_dump(); // never reached
		}
	}
  }

Both watchdog_buddy_check_hardlockup() and watchdog_overflow_callback()
may trigger during a panic.  This can lead to recursive panic handling.

Add panic_in_progress() checks so watchdog activity is skipped once a
panic has begun.

This prevents recursive panic and keeps the panic path more reliable.

Link: https://lkml.kernel.org/r/20250825022947.1596226-10-wangjinchao600@gmail.com
Signed-off-by: Jinchao Wang &lt;wangjinchao600@gmail.com&gt;
Reviewed-by: Yury Norov (NVIDIA) &lt;yury.norov@gmail.com&gt;
Cc: Anna Schumaker &lt;anna.schumaker@oracle.com&gt;
Cc: Baoquan He &lt;bhe@redhat.com&gt;
Cc: "Darrick J. Wong" &lt;djwong@kernel.org&gt;
Cc: Dave Young &lt;dyoung@redhat.com&gt;
Cc: Doug Anderson &lt;dianders@chromium.org&gt;
Cc: "Guilherme G. Piccoli" &lt;gpiccoli@igalia.com&gt;
Cc: Helge Deller &lt;deller@gmx.de&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Jason Gunthorpe &lt;jgg@ziepe.ca&gt;
Cc: Joanthan Cameron &lt;Jonathan.Cameron@huawei.com&gt;
Cc: Joel Granados &lt;joel.granados@kernel.org&gt;
Cc: John Ogness &lt;john.ogness@linutronix.de&gt;
Cc: Kees Cook &lt;kees@kernel.org&gt;
Cc: Li Huafei &lt;lihuafei1@huawei.com&gt;
Cc: "Luck, Tony" &lt;tony.luck@intel.com&gt;
Cc: Luo Gengkun &lt;luogengkun@huaweicloud.com&gt;
Cc: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Cc: Nam Cao &lt;namcao@linutronix.de&gt;
Cc: oushixiong &lt;oushixiong@kylinos.cn&gt;
Cc: Petr Mladek &lt;pmladek@suse.com&gt;
Cc: Qianqiang Liu &lt;qianqiang.liu@163.com&gt;
Cc: Sergey Senozhatsky &lt;senozhatsky@chromium.org&gt;
Cc: Sohil Mehta &lt;sohil.mehta@intel.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Cc: Thomas Zimemrmann &lt;tzimmermann@suse.de&gt;
Cc: Thorsten Blum &lt;thorsten.blum@linux.dev&gt;
Cc: Ville Syrjala &lt;ville.syrjala@linux.intel.com&gt;
Cc: Vivek Goyal &lt;vgoyal@redhat.com&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: Yunhui Cui &lt;cuiyunhui@bytedance.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/perf: Provide function for adjusting the event period</title>
<updated>2025-07-04T12:17:30+00:00</updated>
<author>
<name>Yicong Yang</name>
<email>yangyicong@hisilicon.com</email>
</author>
<published>2025-07-01T11:02:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=60bc47b5a0b164082d448815d7db3066266aa3ed'/>
<id>60bc47b5a0b164082d448815d7db3066266aa3ed</id>
<content type='text'>
Architecture's using perf events for hard lockup detection needs to
convert the watchdog_thresh to the event's period, some architecture
for example arm64 perform this conversion using the CPU's maximum
frequency which will be acquired by cpufreq. However by the time
the lockup detector's initialized the cpufreq driver may not be
initialized, thus launch a watchdog with inaccurate period. Provide
a function hardlockup_detector_perf_adjust_period() to allowing
adjust the event period. Then architecture can update with more
accurate period if cpufreq is initialized.

Reviewed-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Signed-off-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Link: https://lore.kernel.org/r/20250701110214.27242-2-yangyicong@huawei.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Architecture's using perf events for hard lockup detection needs to
convert the watchdog_thresh to the event's period, some architecture
for example arm64 perform this conversion using the CPU's maximum
frequency which will be acquired by cpufreq. However by the time
the lockup detector's initialized the cpufreq driver may not be
initialized, thus launch a watchdog with inaccurate period. Provide
a function hardlockup_detector_perf_adjust_period() to allowing
adjust the event period. Then architecture can update with more
accurate period if cpufreq is initialized.

Reviewed-by: Douglas Anderson &lt;dianders@chromium.org&gt;
Signed-off-by: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Link: https://lore.kernel.org/r/20250701110214.27242-2-yangyicong@huawei.com
Signed-off-by: Will Deacon &lt;will@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2025-04-01T17:06:52+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-04-01T17:06:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d6b02199cde4b9cb99b311eeab1cdbe23165082c'/>
<id>d6b02199cde4b9cb99b311eeab1cdbe23165082c</id>
<content type='text'>
Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull non-MM updates from Andrew Morton:

 - The series "powerpc/crash: use generic crashkernel reservation" from
   Sourabh Jain changes powerpc's kexec code to use more of the generic
   layers.

 - The series "get_maintainer: report subsystem status separately" from
   Vlastimil Babka makes some long-requested improvements to the
   get_maintainer output.

 - The series "ucount: Simplify refcounting with rcuref_t" from
   Sebastian Siewior cleans up and optimizing the refcounting in the
   ucount code.

 - The series "reboot: support runtime configuration of emergency
   hw_protection action" from Ahmad Fatoum improves the ability for a
   driver to perform an emergency system shutdown or reboot.

 - The series "Converge on using secs_to_jiffies() part two" from Easwar
   Hariharan performs further migrations from msecs_to_jiffies() to
   secs_to_jiffies().

 - The series "lib/interval_tree: add some test cases and cleanup" from
   Wei Yang permits more userspace testing of kernel library code, adds
   some more tests and performs some cleanups.

 - The series "hung_task: Dump the blocking task stacktrace" from Masami
   Hiramatsu arranges for the hung_task detector to dump the stack of
   the blocking task and not just that of the blocked task.

 - The series "resource: Split and use DEFINE_RES*() macros" from Andy
   Shevchenko provides some cleanups to the resource definition macros.

 - Plus the usual shower of singleton patches - please see the
   individual changelogs for details.

* tag 'mm-nonmm-stable-2025-03-30-18-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
  mailmap: consolidate email addresses of Alexander Sverdlin
  fs/procfs: fix the comment above proc_pid_wchan()
  relay: use kasprintf() instead of fixed buffer formatting
  resource: replace open coded variant of DEFINE_RES()
  resource: replace open coded variants of DEFINE_RES_*_NAMED()
  resource: replace open coded variant of DEFINE_RES_NAMED_DESC()
  resource: split DEFINE_RES_NAMED_DESC() out of DEFINE_RES_NAMED()
  samples: add hung_task detector mutex blocking sample
  hung_task: show the blocker task if the task is hung on mutex
  kexec_core: accept unaccepted kexec segments' destination addresses
  watchdog/perf: optimize bytes copied and remove manual NUL-termination
  lib/interval_tree: fix the comment of interval_tree_span_iter_next_gap()
  lib/interval_tree: skip the check before go to the right subtree
  lib/interval_tree: add test case for span iteration
  lib/interval_tree: add test case for interval_tree_iter_xxx() helpers
  lib/rbtree: add random seed
  lib/rbtree: split tests
  lib/rbtree: enable userland test suite for rbtree related data structure
  checkpatch: describe --min-conf-desc-length
  scripts/gdb/symbols: determine KASLR offset on s390
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/perf: optimize bytes copied and remove manual NUL-termination</title>
<updated>2025-03-17T19:17:01+00:00</updated>
<author>
<name>Thorsten Blum</name>
<email>thorsten.blum@linux.dev</email>
</author>
<published>2025-03-13T13:30:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6164be01f1799458b5c6f59a547d6241e4a4b4d6'/>
<id>6164be01f1799458b5c6f59a547d6241e4a4b4d6</id>
<content type='text'>
Currently, up to 23 bytes of the source string are copied to the
destination buffer (including the comma and anything after it), only to
then manually NUL-terminate the destination buffer again at index 'len'
(where the comma was found).

Fix this by calling strscpy() with 'len' instead of the destination buffer
size to copy only as many bytes from the source string as needed.

Change the length check to allow 'len' to be less than or equal to the
destination buffer size to fill the whole buffer if needed.

Remove the if-check for the return value of strscpy(), because calling
strscpy() with 'len' always truncates the source string at the comma as
expected and NUL-terminates the destination buffer at the corresponding
index instead.  Remove the manual NUL-termination.

No functional changes intended.

Link: https://lkml.kernel.org/r/20250313133004.36406-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum &lt;thorsten.blum@linux.dev&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, up to 23 bytes of the source string are copied to the
destination buffer (including the comma and anything after it), only to
then manually NUL-terminate the destination buffer again at index 'len'
(where the comma was found).

Fix this by calling strscpy() with 'len' instead of the destination buffer
size to copy only as many bytes from the source string as needed.

Change the length check to allow 'len' to be less than or equal to the
destination buffer size to fill the whole buffer if needed.

Remove the if-check for the return value of strscpy(), because calling
strscpy() with 'len' always truncates the source string at the comma as
expected and NUL-terminates the destination buffer at the corresponding
index instead.  Remove the manual NUL-termination.

No functional changes intended.

Link: https://lkml.kernel.org/r/20250313133004.36406-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum &lt;thorsten.blum@linux.dev&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Thomas Gleinxer &lt;tglx@linutronix.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/hardlockup/perf: Warn if watchdog_ev is leaked</title>
<updated>2025-03-06T11:07:39+00:00</updated>
<author>
<name>Li Huafei</name>
<email>lihuafei1@huawei.com</email>
</author>
<published>2024-10-21T19:30:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=05763885e327f0e257ee8b96b30ac1b95f7dd532'/>
<id>05763885e327f0e257ee8b96b30ac1b95f7dd532</id>
<content type='text'>
When creating a new perf_event for the hardlockup watchdog, it should not
happen that the old perf_event is not released.

Introduce a WARN_ONCE() that should never trigger.

[ mingo: Changed the type of the warning to WARN_ONCE(). ]

Signed-off-by: Li Huafei &lt;lihuafei1@huawei.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20241021193004.308303-2-lihuafei1@huawei.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When creating a new perf_event for the hardlockup watchdog, it should not
happen that the old perf_event is not released.

Introduce a WARN_ONCE() that should never trigger.

[ mingo: Changed the type of the warning to WARN_ONCE(). ]

Signed-off-by: Li Huafei &lt;lihuafei1@huawei.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20241021193004.308303-2-lihuafei1@huawei.com
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/hardlockup/perf: Fix perf_event memory leak</title>
<updated>2025-03-06T11:05:33+00:00</updated>
<author>
<name>Li Huafei</name>
<email>lihuafei1@huawei.com</email>
</author>
<published>2024-10-21T19:30:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d6834d9c990333bfa433bc1816e2417f268eebbe'/>
<id>d6834d9c990333bfa433bc1816e2417f268eebbe</id>
<content type='text'>
During stress-testing, we found a kmemleak report for perf_event:

  unreferenced object 0xff110001410a33e0 (size 1328):
    comm "kworker/4:11", pid 288, jiffies 4294916004
    hex dump (first 32 bytes):
      b8 be c2 3b 02 00 11 ff 22 01 00 00 00 00 ad de  ...;....".......
      f0 33 0a 41 01 00 11 ff f0 33 0a 41 01 00 11 ff  .3.A.....3.A....
    backtrace (crc 24eb7b3a):
      [&lt;00000000e211b653&gt;] kmem_cache_alloc_node_noprof+0x269/0x2e0
      [&lt;000000009d0985fa&gt;] perf_event_alloc+0x5f/0xcf0
      [&lt;00000000084ad4a2&gt;] perf_event_create_kernel_counter+0x38/0x1b0
      [&lt;00000000fde96401&gt;] hardlockup_detector_event_create+0x50/0xe0
      [&lt;0000000051183158&gt;] watchdog_hardlockup_enable+0x17/0x70
      [&lt;00000000ac89727f&gt;] softlockup_start_fn+0x15/0x40
      ...

Our stress test includes CPU online and offline cycles, and updating the
watchdog configuration.

After reading the code, I found that there may be a race between cleaning up
perf_event after updating watchdog and disabling event when the CPU goes offline:

  CPU0                          CPU1                           CPU2
  (update watchdog)                                            (hotplug offline CPU1)

  ...                                                          _cpu_down(CPU1)
  cpus_read_lock()                                             // waiting for cpu lock
    softlockup_start_all
      smp_call_on_cpu(CPU1)
                                softlockup_start_fn
                                ...
                                  watchdog_hardlockup_enable(CPU1)
                                    perf create E1
                                    watchdog_ev[CPU1] = E1
  cpus_read_unlock()
                                                               cpus_write_lock()
                                                               cpuhp_kick_ap_work(CPU1)
                                cpuhp_thread_fun
                                ...
                                  watchdog_hardlockup_disable(CPU1)
                                    watchdog_ev[CPU1] = NULL
                                    dead_event[CPU1] = E1
  __lockup_detector_cleanup
    for each dead_events_mask
      release each dead_event
      /*
       * CPU1 has not been added to
       * dead_events_mask, then E1
       * will not be released
       */
                                    CPU1 -&gt; dead_events_mask
    cpumask_clear(&amp;dead_events_mask)
    // dead_events_mask is cleared, E1 is leaked

In this case, the leaked perf_event E1 matches the perf_event leak
reported by kmemleak. Due to the low probability of problem recurrence
(only reported once), I added some hack delays in the code:

  static void __lockup_detector_reconfigure(void)
  {
    ...
          watchdog_hardlockup_start();
          cpus_read_unlock();
  +       mdelay(100);
          /*
           * Must be called outside the cpus locked section to prevent
           * recursive locking in the perf code.
    ...
  }

  void watchdog_hardlockup_disable(unsigned int cpu)
  {
    ...
                  perf_event_disable(event);
                  this_cpu_write(watchdog_ev, NULL);
                  this_cpu_write(dead_event, event);
  +               mdelay(100);
                  cpumask_set_cpu(smp_processor_id(), &amp;dead_events_mask);
                  atomic_dec(&amp;watchdog_cpus);
    ...
  }

  void hardlockup_detector_perf_cleanup(void)
  {
    ...
                          perf_event_release_kernel(event);
                  per_cpu(dead_event, cpu) = NULL;
          }
  +       mdelay(100);
          cpumask_clear(&amp;dead_events_mask);
  }

Then, simultaneously performing CPU on/off and switching watchdog, it is
almost certain to reproduce this leak.

The problem here is that releasing perf_event is not within the CPU
hotplug read-write lock. Commit:

  941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock")

introduced deferred release to solve the deadlock caused by calling
get_online_cpus() when releasing perf_event. Later, commit:

  efe951d3de91 ("perf/x86: Fix perf,x86,cpuhp deadlock")

removed the get_online_cpus() call on the perf_event release path to solve
another deadlock problem.

Therefore, it is now possible to move the release of perf_event back
into the CPU hotplug read-write lock, and release the event immediately
after disabling it.

Fixes: 941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock")
Signed-off-by: Li Huafei &lt;lihuafei1@huawei.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20241021193004.308303-1-lihuafei1@huawei.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
During stress-testing, we found a kmemleak report for perf_event:

  unreferenced object 0xff110001410a33e0 (size 1328):
    comm "kworker/4:11", pid 288, jiffies 4294916004
    hex dump (first 32 bytes):
      b8 be c2 3b 02 00 11 ff 22 01 00 00 00 00 ad de  ...;....".......
      f0 33 0a 41 01 00 11 ff f0 33 0a 41 01 00 11 ff  .3.A.....3.A....
    backtrace (crc 24eb7b3a):
      [&lt;00000000e211b653&gt;] kmem_cache_alloc_node_noprof+0x269/0x2e0
      [&lt;000000009d0985fa&gt;] perf_event_alloc+0x5f/0xcf0
      [&lt;00000000084ad4a2&gt;] perf_event_create_kernel_counter+0x38/0x1b0
      [&lt;00000000fde96401&gt;] hardlockup_detector_event_create+0x50/0xe0
      [&lt;0000000051183158&gt;] watchdog_hardlockup_enable+0x17/0x70
      [&lt;00000000ac89727f&gt;] softlockup_start_fn+0x15/0x40
      ...

Our stress test includes CPU online and offline cycles, and updating the
watchdog configuration.

After reading the code, I found that there may be a race between cleaning up
perf_event after updating watchdog and disabling event when the CPU goes offline:

  CPU0                          CPU1                           CPU2
  (update watchdog)                                            (hotplug offline CPU1)

  ...                                                          _cpu_down(CPU1)
  cpus_read_lock()                                             // waiting for cpu lock
    softlockup_start_all
      smp_call_on_cpu(CPU1)
                                softlockup_start_fn
                                ...
                                  watchdog_hardlockup_enable(CPU1)
                                    perf create E1
                                    watchdog_ev[CPU1] = E1
  cpus_read_unlock()
                                                               cpus_write_lock()
                                                               cpuhp_kick_ap_work(CPU1)
                                cpuhp_thread_fun
                                ...
                                  watchdog_hardlockup_disable(CPU1)
                                    watchdog_ev[CPU1] = NULL
                                    dead_event[CPU1] = E1
  __lockup_detector_cleanup
    for each dead_events_mask
      release each dead_event
      /*
       * CPU1 has not been added to
       * dead_events_mask, then E1
       * will not be released
       */
                                    CPU1 -&gt; dead_events_mask
    cpumask_clear(&amp;dead_events_mask)
    // dead_events_mask is cleared, E1 is leaked

In this case, the leaked perf_event E1 matches the perf_event leak
reported by kmemleak. Due to the low probability of problem recurrence
(only reported once), I added some hack delays in the code:

  static void __lockup_detector_reconfigure(void)
  {
    ...
          watchdog_hardlockup_start();
          cpus_read_unlock();
  +       mdelay(100);
          /*
           * Must be called outside the cpus locked section to prevent
           * recursive locking in the perf code.
    ...
  }

  void watchdog_hardlockup_disable(unsigned int cpu)
  {
    ...
                  perf_event_disable(event);
                  this_cpu_write(watchdog_ev, NULL);
                  this_cpu_write(dead_event, event);
  +               mdelay(100);
                  cpumask_set_cpu(smp_processor_id(), &amp;dead_events_mask);
                  atomic_dec(&amp;watchdog_cpus);
    ...
  }

  void hardlockup_detector_perf_cleanup(void)
  {
    ...
                          perf_event_release_kernel(event);
                  per_cpu(dead_event, cpu) = NULL;
          }
  +       mdelay(100);
          cpumask_clear(&amp;dead_events_mask);
  }

Then, simultaneously performing CPU on/off and switching watchdog, it is
almost certain to reproduce this leak.

The problem here is that releasing perf_event is not within the CPU
hotplug read-write lock. Commit:

  941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock")

introduced deferred release to solve the deadlock caused by calling
get_online_cpus() when releasing perf_event. Later, commit:

  efe951d3de91 ("perf/x86: Fix perf,x86,cpuhp deadlock")

removed the get_online_cpus() call on the perf_event release path to solve
another deadlock problem.

Therefore, it is now possible to move the release of perf_event back
into the CPU hotplug read-write lock, and release the event immediately
after disabling it.

Fixes: 941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock")
Signed-off-by: Li Huafei &lt;lihuafei1@huawei.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20241021193004.308303-1-lihuafei1@huawei.com
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog/perf: properly initialize the turbo mode timestamp and rearm counter</title>
<updated>2024-07-18T04:11:34+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2024-07-11T20:25:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f944ffcbc2e1c759764850261670586ddf3bdabb'/>
<id>f944ffcbc2e1c759764850261670586ddf3bdabb</id>
<content type='text'>
For systems on which the performance counter can expire early due to turbo
modes the watchdog handler has a safety net in place which validates that
since the last watchdog event there has at least 4/5th of the watchdog
period elapsed.

This works reliably only after the first watchdog event because the per
CPU variable which holds the timestamp of the last event is never
initialized.

So a first spurious event will validate against a timestamp of 0 which
results in a delta which is likely to be way over the 4/5 threshold of the
period.  As this might happen before the first watchdog hrtimer event
increments the watchdog counter, this can lead to false positives.

Fix this by initializing the timestamp before enabling the hardware event.
Reset the rearm counter as well, as that might be non zero after the
watchdog was disabled and reenabled.

Link: https://lkml.kernel.org/r/87frsfu15a.ffs@tglx
Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For systems on which the performance counter can expire early due to turbo
modes the watchdog handler has a safety net in place which validates that
since the last watchdog event there has at least 4/5th of the watchdog
period elapsed.

This works reliably only after the first watchdog event because the per
CPU variable which holds the timestamp of the last event is never
initialized.

So a first spurious event will validate against a timestamp of 0 which
results in a delta which is likely to be way over the 4/5 threshold of the
period.  As this might happen before the first watchdog hrtimer event
increments the watchdog counter, this can lead to false positives.

Fix this by initializing the timestamp before enabling the hardware event.
Reset the rearm counter as well, as that might be non zero after the
watchdog was disabled and reenabled.

Link: https://lkml.kernel.org/r/87frsfu15a.ffs@tglx
Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Arjan van de Ven &lt;arjan@linux.intel.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel/watchdog_perf.c: tidy up kerneldoc</title>
<updated>2024-05-08T15:41:29+00:00</updated>
<author>
<name>Andrew Morton</name>
<email>akpm@linux-foundation.org</email>
</author>
<published>2024-05-04T23:41:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8fcb916cac8985188a3f825966ffa99507a90cb1'/>
<id>8fcb916cac8985188a3f825966ffa99507a90cb1</id>
<content type='text'>
It is unconventional to have a blank line between name-of-function and
description-of-args.

Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It is unconventional to have a blank line between name-of-function and
description-of-args.

Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>watchdog: allow nmi watchdog to use raw perf event</title>
<updated>2024-05-08T15:41:29+00:00</updated>
<author>
<name>Song Liu</name>
<email>song@kernel.org</email>
</author>
<published>2024-04-30T06:02:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=393fb313a2e150b768e4850658679e2afff431e9'/>
<id>393fb313a2e150b768e4850658679e2afff431e9</id>
<content type='text'>
NMI watchdog permanently consumes one hardware counters per CPU on the
system.  For systems that use many hardware counters, this causes more
aggressive time multiplexing of perf events.

OTOH, some CPUs (mostly Intel) support "ref-cycles" event, which is rarely
used.  Add kernel cmdline arg nmi_watchdog=rNNN to configure the watchdog
to use raw event.  For example, on Intel CPUs, we can use "r300" to
configure the watchdog to use ref-cycles event.

If the raw event does not work, fall back to use "cycles".

[akpm@linux-foundation.org: fix kerneldoc]
Link: https://lkml.kernel.org/r/20240430060236.1878002-2-song@kernel.org
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
NMI watchdog permanently consumes one hardware counters per CPU on the
system.  For systems that use many hardware counters, this causes more
aggressive time multiplexing of perf events.

OTOH, some CPUs (mostly Intel) support "ref-cycles" event, which is rarely
used.  Add kernel cmdline arg nmi_watchdog=rNNN to configure the watchdog
to use raw event.  For example, on Intel CPUs, we can use "r300" to
configure the watchdog to use ref-cycles event.

If the raw event does not work, fall back to use "cycles".

[akpm@linux-foundation.org: fix kerneldoc]
Link: https://lkml.kernel.org/r/20240430060236.1878002-2-song@kernel.org
Signed-off-by: Song Liu &lt;song@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: "Matthew Wilcox (Oracle)" &lt;willy@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
