summaryrefslogtreecommitdiff
path: root/tools/perf/scripts/python
diff options
context:
space:
mode:
authorAaron Tomlin <atomlin@atomlin.com>2026-03-03 15:30:30 -0500
committerAndrew Morton <akpm@linux-foundation.org>2026-03-27 21:19:40 -0700
commit49085e1b70f898695b63594ff559f5a243589b83 (patch)
tree4f24d7ac6f984c13d516e72d850b59bde9186748 /tools/perf/scripts/python
parent00b5cdeb9fe761654d5a76a411c79b8ff04a81e5 (diff)
hung_task: enable runtime reset of hung_task_detect_count
Currently, the hung_task_detect_count sysctl provides a cumulative count of hung tasks since boot. In long-running, high-availability environments, this counter may lose its utility if it cannot be reset once an incident has been resolved. Furthermore, the previous implementation relied upon implicit ordering, which could not strictly guarantee that diagnostic metadata published by one CPU was visible to the panic logic on another. This patch introduces the capability to reset the detection count by writing "0" to the hung_task_detect_count sysctl. The proc_handler logic has been updated to validate this input and atomically reset the counter. The synchronisation of sysctl_hung_task_detect_count relies upon a transactional model to ensure the integrity of the detection counter against concurrent resets from userspace. The application of atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct and provides the following guarantees: 1. Prevention of Load-Store Reordering via Acquire Semantics By utilising atomic_long_read_acquire() to snapshot the counter before initiating the task traversal, we establish a strict memory barrier. This prevents the compiler or hardware from reordering the initial load to a point later in the scan. Without this "acquire" barrier, a delayed load could potentially read a "0" value resulting from a userspace reset that occurred mid-scan. This would lead to the subsequent cmpxchg succeeding erroneously, thereby overwriting the user's reset with stale increment data. 2. Atomicity of the "Commit" Phase via Release Semantics The atomic_long_cmpxchg_release() serves as the transaction's commit point. The "release" barrier ensures that all diagnostic recordings and task-state observations made during the scan are globally visible before the counter is incremented. 3. Race Condition Resolution This pairing effectively detects any "out-of-band" reset of the counter. If sysctl_hung_task_detect_count is modified via the procfs interface during the scan, the final cmpxchg will detect the discrepancy between the current value and the "acquire" snapshot. Consequently, the update will fail, ensuring that a reset command from the administrator is prioritised over a scan that may have been invalidated by that very reset. Link: https://lkml.kernel.org/r/20260303203031.4097316-3-atomlin@atomlin.com Signed-off-by: Aaron Tomlin <atomlin@atomlin.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Joel Granados <joel.granados@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Lance Yang <lance.yang@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'tools/perf/scripts/python')
0 files changed, 0 insertions, 0 deletions