| Age | Commit message | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex fixes from Ingo Molnar:
- Tighten up the sys_futex_requeue() ABI a bit, to disallow dissimilar
futex flags and potential UaF access (Peter Zijlstra)
- Fix UaF between futex_key_to_node_opt() and vma_replace_policy()
(Hao-Yu Yang)
- Clear stale exiting pointer in futex_lock_pi() retry path, which
triggered a warning (and potential misbehavior) in stress-testing
(Davidlohr Bueso)
* tag 'locking-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Clear stale exiting pointer in futex_lock_pi() retry path
futex: Fix UaF between futex_key_to_node_opt() and vma_replace_policy()
futex: Require sys_futex_requeue() to have identical flags
|
|
The following kfuncs currently accept void *meta__ign argument:
* bpf_obj_new_impl
* bpf_obj_drop_impl
* bpf_percpu_obj_new_impl
* bpf_percpu_obj_drop_impl
* bpf_refcount_acquire_impl
* bpf_list_push_back_impl
* bpf_list_push_front_impl
* bpf_rbtree_add_impl
The __ign suffix is an indicator for the verifier to skip the argument
in check_kfunc_args(). Then, in fixup_kfunc_call() the verifier may
set the value of this argument to struct btf_struct_meta *
kptr_struct_meta from insn_aux_data.
BPF programs must pass a dummy NULL value when calling these kfuncs.
Additionally, the list and rbtree _impl kfuncs also accept an implicit
u64 argument, which doesn't require __ign suffix because it's a
scalar, and BPF programs explicitly pass 0.
Add new kfuncs with KF_IMPLICIT_ARGS [1], that correspond to each
_impl kfunc accepting meta__ign. The existing _impl kfuncs remain
unchanged for backwards compatibility.
To support this, add "btf_struct_meta" to the list of recognized
implicit argument types in resolve_btfids.
Implement is_kfunc_arg_implicit() in the verifier, which determines
implicit args by inspecting the non-_impl BTF prototype of the
kfunc.
Update the special_kfunc_list in the verifier and relevant checks to
support both the old _impl and the new KF_IMPLICIT_ARGS variants of
btf_struct_meta users.
[1] https://lore.kernel.org/bpf/20260120222638.3976562-1-ihor.solodrai@linux.dev/
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260327203241.3365046-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
formats
Change 2d8b7f9bf8e6e ("tracing: Have show_event_trigger/filter format a bit more in columns")
added space padding to align the output.
However it used ("%*.s", len, "") which requests the default precision.
It doesn't matter here whether the userspace default (0) or kernel
default (no precision) is used, but the format should be "%*s".
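As a rough userspace illustration of the padding idiom (pad_to() is a hypothetical helper, not code from the patch), "%*s" with an empty string argument emits exactly `len` spaces, since the field width pads the empty string:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: write `len` spaces into
 * buf using the "%*s" idiom (field width passed as an int argument,
 * empty string as the value). Returns what snprintf() reports.
 */
int pad_to(char *buf, size_t size, int len)
{
	assert(len >= 0);
	return snprintf(buf, size, "%*s", len, "");
}
```

Because the string argument is empty, the precision has nothing to truncate, which is why the stray '.' made no visible difference here.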
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20260326201824.3919-1-david.laight.linux@gmail.com
Signed-off-by: David Laight <david.laight.linux@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix potential deadlock in osnoise and hotplug
The interface_lock can be taken by an osnoise thread and the CPU
shutdown logic of osnoise can wait for this thread to finish. But
cpus_read_lock() can also be taken while holding the interface_lock.
This produces a circular lock dependency and can cause a deadlock.
Swap the ordering of cpus_read_lock() and the interface_lock to have
interface_lock taken within the cpus_read_lock() context to prevent
this circular dependency.
- Fix freeing of event triggers in early boot up
If the same trigger is added on the kernel command line, the second
one will fail to be applied and the trigger created will be freed.
This calls into the deferred logic and creates a kernel thread to do
the freeing. But the command line logic is called before kernel
threads can be created and this leads to a NULL pointer dereference.
Delay freeing event triggers until late init.
* tag 'trace-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Drain deferred trigger frees if kthread creation fails
tracing: Fix potential deadlock in cpu hotplug with osnoise
|
|
The function tracing_alloc_snapshot() is only used between trace.c and
trace_snapshot.c. When snapshot isn't configured, it's not used at all.
The stub function was defined as a global with no users and no prototype
causing build issues.
Remove the function when snapshot isn't configured as nothing is calling
it.
Also remove the EXPORT_SYMBOL_GPL() that was associated with it as it's
not used outside of the tracing subsystem which also includes any modules.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260328101946.2c4ef4a5@robin
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/acb-IuZ4vDkwwQLW@sirena.co.uk/
Fixes: bade44fe546212 ("tracing: Move snapshot code out of trace.c and into trace_snapshot.c")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The comment in exit_itimers() still refers to itimer_delete(),
which was replaced by posix_timer_delete(). Update the comment
accordingly.
Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260326142210.98632-1-zhanxusheng@xiaomi.com
|
|
Fuzzing/stressing futexes triggered:
WARNING: kernel/futex/core.c:825 at wait_for_owner_exiting+0x7a/0x80, CPU#11: futex_lock_pi_s/524
When futex_lock_pi_atomic() sees the owner is exiting, it returns -EBUSY
and stores a refcounted task pointer in 'exiting'.
After wait_for_owner_exiting() consumes that reference, the local pointer
is never reset to NULL. Upon a retry, if futex_lock_pi_atomic() returns a
different error, the bogus pointer is passed to wait_for_owner_exiting().
  CPU0                       CPU1                                 CPU2
  futex_lock_pi(uaddr)
  // acquires the PI futex
  exit()
    futex_cleanup_begin()
      futex_state = EXITING;
                             futex_lock_pi(uaddr)
                               futex_lock_pi_atomic()
                                 attach_to_pi_owner()
                                   // observes EXITING
                                   *exiting = owner;  // takes ref
                                   return -EBUSY
                               wait_for_owner_exiting(-EBUSY, owner)
                                 put_task_struct();   // drops ref
                               // exiting still points to owner
                               goto retry;
                               futex_lock_pi_atomic()
                                 lock_pi_update_atomic()
                                   cmpxchg(uaddr)
                                                                  *uaddr ^= WAITERS // whatever
                                   // value changed
                                   return -EAGAIN;
                               wait_for_owner_exiting(-EAGAIN, exiting) // stale
                                 WARN_ON_ONCE(exiting)
Fix this by resetting upon retry, essentially aligning it with requeue_pi.
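The retry pattern can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: struct task, lock_attempt(), wait_for_owner() and lock_with_retry() are hypothetical stand-ins, not the kernel functions, and the attempt sequence (-EBUSY, then -EAGAIN, then success) is hard-coded to mirror the trace above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted task. */
struct task { int refs; };

/* Model: 1st attempt sees an exiting owner, 2nd hits -EAGAIN, 3rd succeeds. */
static int lock_attempt(int attempt, struct task *owner, struct task **exiting)
{
	if (attempt == 0) {
		owner->refs++;		/* takes a reference */
		*exiting = owner;
		return -EBUSY;
	}
	return attempt == 1 ? -EAGAIN : 0;
}

/* Returns 1 if a stale pointer was passed in (the WARN case), else 0. */
static int wait_for_owner(int ret, struct task *exiting)
{
	if (ret != -EBUSY)
		return exiting != NULL;
	exiting->refs--;		/* drops the reference */
	return 0;
}

/* Returns how many times a stale pointer was observed. */
int lock_with_retry(struct task *owner, int reset_on_retry)
{
	struct task *exiting = NULL;
	int stale_seen = 0;

	for (int attempt = 0; ; attempt++) {
		if (reset_on_retry)
			exiting = NULL;	/* the fix: clear on every retry */
		int ret = lock_attempt(attempt, owner, &exiting);
		if (ret == 0)
			return stale_seen;
		stale_seen += wait_for_owner(ret, exiting);
	}
}
```

In this model, lock_with_retry(&t, 0) observes the stale pointer once, while lock_with_retry(&t, 1) never does.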
Fixes: 3ef240eaff36 ("futex: Prevent exit livelock")
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260326001759.4129680-1-dave@stgolabs.net
|
|
Boot-time trigger registration can fail before the trigger-data cleanup
kthread exists. Deferring those frees until late init is fine, but the
post-boot fallback must still drain the deferred list if kthread
creation never succeeds.
Otherwise, boot-deferred nodes can accumulate on
trigger_data_free_list, later frees fall back to synchronously freeing
only the current object, and the older queued entries are leaked
forever.
To trigger this, add the following to the kernel command line:
trace_event=sched_switch trace_trigger=sched_switch.traceon,sched_switch.traceon
The second traceon trigger will fail and be freed. This triggers a NULL
pointer dereference and crashes the kernel.
Keep the deferred boot-time behavior, but when kthread creation fails,
drain the whole queued list synchronously. Do the same in the late-init
drain path so queued entries are not stranded there either.
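A minimal userspace sketch of the "drain the whole list" fallback, assuming a plain singly linked list. The names pending, drain_pending() and release_object() are hypothetical and do not correspond to the tracing code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical deferred-free list; not the kernel's trigger_data_free_list. */
struct node { struct node *next; };
static struct node *pending;

/* Free every queued entry synchronously; returns how many were freed. */
int drain_pending(void)
{
	struct node *cur = pending;
	int freed = 0;

	pending = NULL;
	while (cur) {
		struct node *next = cur->next;
		free(cur);
		freed++;
		cur = next;
	}
	return freed;
}

/*
 * Queue one object for freeing. If no worker is available (modelling a
 * failed kthread creation), drain the whole list, not just this object.
 * Returns the number of entries freed synchronously, or -1 on allocation
 * failure.
 */
int release_object(int worker_available)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->next = pending;
	pending = n;
	if (worker_available)
		return 0;	/* a worker would free it later */
	return drain_pending();
}
```

Freeing only the current object in the fallback path would leave the two earlier entries queued forever, which is the leak described above.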
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260324221326.1395799-3-atwellwea@gmail.com
Fixes: 61d445af0a7c ("tracing: Add bulk garbage collection of freeing event_trigger_data")
Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
After the introduction of clear_pages() we exploit the fact that the
process vm_area is allocated in contiguous pages to just clear them all in
one swift operation.
Link: https://lkml.kernel.org/r/20260224-mm-fork-clear-pages-v1-1-184c65a72d49@kernel.org
Signed-off-by: Linus Walleij <linusw@kernel.org>
Suggested-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://lore.kernel.org/linux-mm/dpnwsp7dl4535rd7qmszanw6u5an2p74uxfex4dh53frpb7pu3@2bnjjavjrepe/
Suggested-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/20240311164638.2015063-7-pasha.tatashin@soleen.com
Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now that kernel_clone() checks valid_signal(args->exit_signal), the "sig"
argument of do_notify_parent() must always be valid or we have a bug.
However, do_notify_parent() only checks that sig != -1 at the start, then
it does another valid_signal() check before __send_signal_locked().
This is confusing. Change do_notify_parent() to WARN and return early if
valid_signal(sig) is false.
Link: https://lkml.kernel.org/r/abld-ilvMEZ7VgMw@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Deepanshu Kartikey <Kartikey406@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, the buddy system only performs checks every 3rd sample. With a
4-second interval, if a check window is missed, the next check occurs 12
seconds later, potentially delaying hard lockup detection for up to 24
seconds.
Modify the buddy system to perform checks at every interval (4s).
Introduce a missed-interrupt threshold to maintain the existing grace
period while reducing the detection window to 8-12 seconds.
Best and worst case detection scenarios:
Before (12s check window):
- Best case: Lockup occurs after first check but just before heartbeat
interval. Detected in ~8s (8s till next check).
- Worst case: Lockup occurs just after a check.
Detected in ~24s (missed check + 12s till next check + 12s logic).
After (4s check window with threshold of 3):
- Best case: Lockup occurs just before a check.
Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
- Worst case: Lockup occurs just after a check.
Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
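The "After" arithmetic can be written down as a tiny sketch. It assumes the simplified model used in the scenarios above (the lockup is reported at the third consecutive check that sees no progress); detection_latency_s() and the constant names are hypothetical:

```c
#include <assert.h>

enum {
	CHECK_INTERVAL_S = 4,	/* one check per interval */
	MISSED_THRESHOLD = 3,	/* checks without progress before reporting */
};

/*
 * Simplified model: time from lockup to report, given how many seconds
 * remain until the next check when the CPU locks up.
 */
int detection_latency_s(int s_until_next_check)
{
	assert(s_until_next_check >= 0 &&
	       s_until_next_check <= CHECK_INTERVAL_S);
	return s_until_next_check + (MISSED_THRESHOLD - 1) * CHECK_INTERVAL_S;
}
```

With 0s until the next check this gives the ~8s best case, and with a full 4s it gives the ~12s worst case.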
Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-4-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta <mrungta@google.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, arch_touch_nmi_watchdog() causes an early return that skips
updating hrtimer_interrupts_saved. This leads to stale comparisons and
delayed lockup detection.
I found this issue because in our system the serial console is fairly
chatty. For example, the 8250 console driver frequently calls
touch_nmi_watchdog() via console_write(). If a CPU locks up after a timer
interrupt but before next watchdog check, we see the following sequence:
* watchdog_hardlockup_check() saves counter (e.g., 1000)
* Timer runs and updates the counter (1001)
* touch_nmi_watchdog() is called
* CPU locks up
* 10s pass: check() notices touch, returns early, skips update
* 10s pass: check() saves counter (1001)
* 10s pass: check() finally detects lockup
This delays detection to 30 seconds. With this fix, we detect the lockup
in 20 seconds.
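The sequence above can be replayed with a small userspace model. This is a sketch only: struct wd, check() and checks_until_detect() are hypothetical names, and the model keeps just the saved counter and the touched flag.

```c
#include <assert.h>

/* Hypothetical model state: saved interrupt count and the "touched" flag. */
struct wd { unsigned long saved; int touched; };

/* One watchdog check; returns 1 when a lockup is reported. */
static int check(struct wd *w, unsigned long counter, int update_first)
{
	int stalled = (counter == w->saved);

	if (update_first)
		w->saved = counter;	/* fixed: always refresh the snapshot */
	if (w->touched) {
		w->touched = 0;
		return 0;		/* touched: skip reporting this round */
	}
	if (!update_first)
		w->saved = counter;	/* old: refresh only when not touched */
	return stalled;
}

/* Number of checks needed to report the lockup from the example. */
int checks_until_detect(int update_first)
{
	struct wd w = { .saved = 1000, .touched = 1 };
	unsigned long counter = 1001;	/* CPU locked up, counter stuck */

	for (int n = 1; n <= 10; n++)
		if (check(&w, counter, update_first))
			return n;
	return -1;
}
```

In this model the old ordering needs three checks (30 seconds at 10s per check) and the fixed ordering needs two (20 seconds).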
Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-2-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta <mrungta@google.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "watchdog/hardlockup: Improvements to hardlockup", v2.
This series addresses limitations in the hardlockup detector
implementations and updates the documentation to reflect actual behavior
and recent changes.
The changes are structured as follows:
Refactoring (Patch 1)
=====================
Patch 1 refactors watchdog_hardlockup_check() to return early if no
lockup is detected. This reduces the indentation level of the main
logic block, serving as a clean base for the subsequent changes.
Hardlockup Detection Improvements (Patches 2 & 4)
=================================================
The hardlockup detector logic relies on updating saved interrupt counts to
determine if the CPU is making progress.
Patch 2 ensures that the saved interrupt count is updated unconditionally
before checking the "touched" flag. This prevents stale comparisons which
can delay detection. This is a logic fix that ensures the detector
remains accurate even when the watchdog is frequently touched.
Patch 4 improves the Buddy detector's timeliness. The current checking
interval (every 3rd sample) causes high variability in detection time (up
to 24s). This patch changes the Buddy detector to check at every hrtimer
interval (4s) with a missed-interrupt threshold of 3, narrowing the
detection window to a consistent 8-12 second range.
Documentation Updates (Patches 3 & 5)
=====================================
The current documentation does not fully capture the variable nature of
detection latency or the details of the Buddy system.
Patch 3 removes the strict "10 seconds" definition of a hardlockup, which
was misleading given the periodic nature of the detector. It adds a
"Detection Overhead" section to the admin guide, using "Best Case" and
"Worst Case" scenarios to illustrate that detection time can vary
significantly (e.g., ~6s to ~20s).
Patch 5 adds a dedicated section for the Buddy detector, which was
previously undocumented. It details the mechanism, the new timing logic,
and known limitations.
This patch (of 5):
Invert the `is_hardlockup(cpu)` check in `watchdog_hardlockup_check()` to
return early when a hardlockup is not detected. This flattens the main
logic block, reducing the indentation level and making the code easier to
read and maintain.
This refactoring serves as a preparation patch for future hardlockup
changes.
Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-0-45bd8a0cc7ed@google.com
Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-1-45bd8a0cc7ed@google.com
Signed-off-by: Mayank Rungta <mrungta@google.com>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Max Kellermann <max.kellermann@ionos.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Cc: Yunhui Cui <cuiyunhui@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
kexec_core.c does not do any cryptographic hashing, so the header
crypto/hash.h is not needed at all.
Link: https://lkml.kernel.org/r/20260314204144.44884-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Several files related to kernel crash dumps include crypto/sha1.h but
never use any of its functionality. Remove these includes so that these
files don't unnecessarily come up in searches for which kernel code is
still using the obsolete SHA-1 algorithm.
Link: https://lkml.kernel.org/r/20260314204243.45001-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, the hung task reporting mechanism indiscriminately labels all
TASK_UNINTERRUPTIBLE (D) tasks as "blocked", irrespective of whether they
are awaiting I/O completion or kernel locking primitives. This ambiguity
compels system administrators to manually inspect stack traces to discern
whether the delay stems from an I/O wait (typically indicative of hardware
or filesystem anomalies) or software contention. Such detailed analysis
is not always immediately accessible to system administrators or support
engineers.
To address this, this patch utilises the existing in_iowait field within
struct task_struct to augment the failure report. If the task is blocked
due to I/O (e.g., via io_schedule_prepare()), the log message is updated
to explicitly state "blocked in I/O wait".
Examples:
- Standard Block: "INFO: task bash:123 blocked for more than 120
seconds".
- I/O Block: "INFO: task dd:456 blocked in I/O wait for more than
120 seconds".
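A userspace sketch of how the two message variants could be produced from a single flag. format_hung_msg() is a hypothetical helper and the exact format string used by the kernel is an assumption based on the examples above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical formatter: append " in I/O wait" only when the task's
 * in_iowait flag was set at the time of the report.
 */
int format_hung_msg(char *buf, size_t size, const char *comm, int pid,
		    bool in_iowait, unsigned long secs)
{
	assert(buf && comm);
	return snprintf(buf, size,
			"INFO: task %s:%d blocked%s for more than %lu seconds",
			comm, pid, in_iowait ? " in I/O wait" : "", secs);
}
```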
Theoretically, concurrent executions of io_schedule_finish() could result
in a race condition where the read flag does not precisely correlate with
the subsequently printed backtrace. However, this limitation is deemed
acceptable in practice. The entire reporting mechanism is inherently racy
by design; nevertheless, it remains highly reliable in the vast majority
of cases, particularly because it primarily captures protracted stalls.
Consequently, introducing additional synchronisation to mitigate this
minor inaccuracy would be entirely disproportionate to the situation.
Link: https://lkml.kernel.org/r/20260303221324.4106917-1-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A recent change made it possible to reset the global counter of hung tasks using
the sysctl interface. A potential race with the regular check has been
solved by updating the global counter only once at the end of the check.
However, the hung task check can take a significant amount of time,
particularly when task information is being dumped to slow serial
consoles. Some users monitor this global counter to trigger immediate
migration of critical containers. Delaying the increment until the full
check completes postpones these high-priority rescue operations.
Update the global counter as soon as a hung task is detected. Since the
value is read asynchronously, a relaxed atomic operation is sufficient.
Link: https://lkml.kernel.org/r/20260303203031.4097316-4-atomlin@atomlin.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reported-by: Lance Yang <lance.yang@linux.dev>
Closes: https://lore.kernel.org/r/f239e00f-4282-408d-b172-0f9885f4b01b@linux.dev
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, the hung_task_detect_count sysctl provides a cumulative count
of hung tasks since boot. In long-running, high-availability
environments, this counter may lose its utility if it cannot be reset once
an incident has been resolved. Furthermore, the previous implementation
relied upon implicit ordering, which could not strictly guarantee that
diagnostic metadata published by one CPU was visible to the panic logic on
another.
This patch introduces the capability to reset the detection count by
writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
has been updated to validate this input and atomically reset the counter.
The synchronisation of sysctl_hung_task_detect_count relies upon a
transactional model to ensure the integrity of the detection counter
against concurrent resets from userspace. The application of
atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
and provides the following guarantees:
1. Prevention of Load-Store Reordering via Acquire Semantics
   By utilising atomic_long_read_acquire() to snapshot the counter
   before initiating the task traversal, we establish a strict
   memory barrier. This prevents the compiler or hardware from
   reordering the initial load to a point later in the scan. Without
   this "acquire" barrier, a delayed load could potentially read a
   "0" value resulting from a userspace reset that occurred
   mid-scan. This would lead to the subsequent cmpxchg succeeding
   erroneously, thereby overwriting the user's reset with stale
   increment data.
2. Atomicity of the "Commit" Phase via Release Semantics
   The atomic_long_cmpxchg_release() serves as the transaction's commit
   point. The "release" barrier ensures that all diagnostic
   recordings and task-state observations made during the scan are
   globally visible before the counter is incremented.
3. Race Condition Resolution
   This pairing effectively detects any "out-of-band" reset of the
   counter. If sysctl_hung_task_detect_count is modified via the procfs
   interface during the scan, the final cmpxchg will detect the
   discrepancy between the current value and the "acquire" snapshot.
   Consequently, the update will fail, ensuring that a reset command
   from the administrator is prioritised over a scan that may have
   been invalidated by that very reset.
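The snapshot/commit scheme can be approximated in userspace with C11 atomics. This is a single-threaded sketch, not the kernel code: detect_count and commit_scan() are hypothetical names, and the concurrent reset is simulated by a flag.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for sysctl_hung_task_detect_count. */
static atomic_long detect_count;

/*
 * Snapshot the counter with acquire semantics, "scan", then try to commit
 * snapshot + found with a release cmpxchg. reset_mid_scan simulates an
 * administrator writing 0 while the scan is running.
 * Returns nonzero if the commit succeeded.
 */
int commit_scan(long found, int reset_mid_scan)
{
	long snap = atomic_load_explicit(&detect_count, memory_order_acquire);

	assert(found >= 0);
	if (reset_mid_scan)
		atomic_store_explicit(&detect_count, 0, memory_order_relaxed);

	/* Fails if the counter no longer matches the snapshot. */
	return atomic_compare_exchange_strong_explicit(&detect_count, &snap,
						       snap + found,
						       memory_order_release,
						       memory_order_relaxed);
}
```

When the counter was reset during the scan, the compare-exchange fails and the reset value is left in place, which matches the priority described in point 3.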
Link: https://lkml.kernel.org/r/20260303203031.4097316-3-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Joel Granados <joel.granados@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "hung_task: Provide runtime reset interface for hung task
detector", v9.
This series introduces the ability to reset
/proc/sys/kernel/hung_task_detect_count.
Writing a "0" value to this file atomically resets the counter of detected
hung tasks. This functionality provides system administrators with the
means to clear the cumulative diagnostic history following incident
resolution, thereby simplifying subsequent monitoring without
necessitating a system restart.
This patch (of 3):
The check_hung_task() function currently conflates two distinct
responsibilities: validating whether a task is hung and handling the
subsequent reporting (printing warnings, triggering panics, or
tracepoints).
This patch refactors the logic by introducing hung_task_info(), a function
dedicated solely to reporting. The actual detection check,
task_is_hung(), is hoisted into the primary loop within
check_hung_uninterruptible_tasks(). This separation clearly decouples the
mechanism of detection from the policy of reporting.
Furthermore, to facilitate future support for concurrent hung task
detection, the global sysctl_hung_task_detect_count variable is converted
from unsigned long to atomic_long_t. Consequently, the counting logic is
updated to accumulate the number of hung tasks locally (this_round_count)
during the iteration. The global counter is then updated atomically via
atomic_long_cmpxchg_relaxed() once the loop concludes, rather than
incrementally during the scan.
These changes are strictly preparatory and introduce no functional change
to the system's runtime behaviour.
Link: https://lkml.kernel.org/r/20260303203031.4097316-1-atomlin@atomlin.com
Link: https://lkml.kernel.org/r/20260303203031.4097316-2-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Joel Granados <joel.granados@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Replace sprintf() with sysfs_emit() in sysfs show functions. sysfs_emit()
is preferred for formatting sysfs output because it provides safer bounds
checking. No functional changes.
Link: https://lkml.kernel.org/r/20260301125106.911980-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Both copy_process() and alloc_pid() do the same PIDNS_ADDING check. The
reasons for these checks, and the fact that both are necessary, are not
immediately obvious. Add the comments.
Link: https://lkml.kernel.org/r/aaGIRElc78U4Er42@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Adrian Reber <areber@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "pid: make sub-init creation retryable".
This patch (of 2):
Currently we allow only one attempt to create init in a new namespace. If
the first fork() fails after alloc_pid() succeeds, free_pid() clears
PIDNS_ADDING and thus disables further PID allocations.
Nowadays this looks like an unnecessary limitation. The original reason
to handle "case PIDNS_ADDING" in free_pid() is gone, most probably after
commit 69879c01a0c3 ("proc: Remove the now unnecessary internal mount of
proc").
Change free_pid() to keep ns->pid_allocated == PIDNS_ADDING, and change
alloc_pid() to reset the cursor early, right after taking pidmap_lock.
Test-case:
#define _GNU_SOURCE
#include <linux/sched.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <assert.h>
#include <signal.h>
#include <unistd.h>
#include <sched.h>
#include <errno.h>

int main(void)
{
	struct clone_args args = {
		.exit_signal = SIGCHLD,
		.flags = CLONE_PIDFD,
		.pidfd = 0,
	};
	unsigned long pidfd;
	int pid;

	assert(unshare(CLONE_NEWPID) == 0);

	/* First attempt: a NULL pidfd pointer makes clone3() fail with EFAULT */
	pid = syscall(__NR_clone3, &args, sizeof(args));
	assert(pid == -1 && errno == EFAULT);

	/* Second attempt must now succeed and create the sub-init */
	args.pidfd = (unsigned long)&pidfd;
	pid = syscall(__NR_clone3, &args, sizeof(args));
	if (pid)
		assert(pid > 0 && wait(NULL) == pid);
	else
		assert(getpid() == 1);

	return 0;
}
Link: https://lkml.kernel.org/r/aaGHu3ixbw9Y7kFj@redhat.com
Link: https://lkml.kernel.org/r/aaGIHa7vGdwhEc_D@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrei Vagin <avagin@gmail.com>
Cc: Adrian Reber <areber@redhat.com>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Kirill Tkhai <tkhai@ya.ru>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The function read_key_from_user_keying() is missing an 'r' in its name.
Fix the typo by renaming it to read_key_from_user_keyring().
Link: https://lkml.kernel.org/r/20260227230422.859423-1-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
'key_count' is an 'unsigned int' and cannot be less than zero. Remove
the redundant condition.
Link: https://lkml.kernel.org/r/20260228085136.861971-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Cc: Baoquan He <bhe@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Replace simple_strtoul() with the recommended kstrtoul() for parsing the
'coredump_filter=' boot parameter.
Check the return value of kstrtoul() and reject invalid values. This adds
error handling while preserving behavior for existing values, and removes
use of the deprecated simple_strtoul() helper. The current code silently
sets 'default_dump_filter = 0' if parsing fails, instead of leaving the
default value (MMF_DUMP_FILTER_DEFAULT) unchanged.
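The difference in error handling can be sketched in userspace with strtoul(). This only approximates kstrtoul() (which is stricter about signs and whitespace, for example), and parse_filter() is a hypothetical helper: on invalid input it returns an error and leaves the existing value untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical strict parser: accept the string only if it is entirely a
 * number (base auto-detected), otherwise return -EINVAL and keep *out.
 */
int parse_filter(const char *s, unsigned long *out)
{
	char *end;
	unsigned long val;

	assert(s && out);
	errno = 0;
	val = strtoul(s, &end, 0);
	if (end == s || *end != '\0' || errno == ERANGE)
		return -EINVAL;
	*out = val;
	return 0;
}
```

A caller that starts with the default value therefore keeps it when the boot parameter is malformed, instead of silently ending up with 0.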
Rename the static variable 'default_dump_filter' to 'coredump_filter'
since it does not necessarily contain the default value and the current
name can be misleading.
Link: https://lkml.kernel.org/r/20251215142152.4082-2-thorsten.blum@linux.dev
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The "(signal->core_state || !(signal->flags & SIGNAL_GROUP_EXIT))" check
in complete_signal() is not obvious at all, and in fact it only adds
unnecessary confusion: this condition is always true.
prepare_signal() does:
	if (signal->flags & SIGNAL_GROUP_EXIT) {
		if (signal->core_state)
			return sig == SIGKILL;
		/*
		 * The process is in the middle of dying, drop the signal.
		 */
		return false;
	}
This means that "!signal->core_state && (signal->flags &
SIGNAL_GROUP_EXIT)" in complete_signal() is never possible.
If SIGNAL_GROUP_EXIT is set, prepare_signal() can only return true if
signal->core_state is not NULL.
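The argument can be checked exhaustively with a small model. This is a sketch: prepare_signal_model() reproduces only the quoted branch and assumes every other path returns true, and count_violations() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the quoted branch; other paths are assumed to return true. */
static bool prepare_signal_model(bool group_exit, bool core_state,
				 bool is_sigkill)
{
	if (group_exit) {
		if (core_state)
			return is_sigkill;
		return false;	/* process is dying, drop the signal */
	}
	return true;
}

/*
 * Count states where prepare_signal_model() lets the signal through but
 * "core_state || !group_exit" is false.
 */
int count_violations(void)
{
	int violations = 0;

	for (int ge = 0; ge <= 1; ge++)
		for (int cs = 0; cs <= 1; cs++)
			for (int sk = 0; sk <= 1; sk++)
				if (prepare_signal_model(ge, cs, sk) &&
				    !(cs || !ge))
					violations++;
	return violations;
}
```

Under this model no combination of flags violates the condition, so the check in complete_signal() is redundant.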
Link: https://lkml.kernel.org/r/aZsfkDhnqJ4s1oTs@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
do_notify_parent()
thread_group_empty(tsk) is only possible if tsk is a group leader, and
thread_group_empty() already does the thread_group_leader() check.
So it makes no sense to check "thread_group_leader() &&
thread_group_empty()"; thread_group_empty() alone is enough.
Link: https://lkml.kernel.org/r/aZsfeegKZPZZszJh@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
in alloc_taint_buf()
However there's a convention of assuming that __init-time allocations
cannot fail. Because if a kmalloc() were to fail at this time, the kernel
is hopelessly messed up anyway. So simply panic() if that kmalloc failed,
then make that 350-byte buffer __initdata.
Link: https://lkml.kernel.org/r/20260223035914.4033-1-rioo.tsukatsukii@gmail.com
Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The buffer used to hold the taint string is statically allocated, which
requires updating whenever a new taint flag is added.
Instead, allocate the exact required length at boot once the allocator is
available in an init function. The allocation sums the string lengths in
taint_flags[], along with space for separators and formatting.
print_tainted() is switched to use this dynamically allocated buffer.
If allocation fails, print_tainted() warns about the failure and continues
to use the original static buffer as a fallback.
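A minimal sketch of the length computation, assuming a "[c]=desc" entry
format joined by ", " after a "Tainted: " prefix. The demo_* table and its
fields are hypothetical; the kernel's taint_flags[] layout and exact
formatting may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag table; the kernel's taint_flags[] has different fields. */
struct demo_taint_flag {
	char c_true;
	const char *desc;
};

static const struct demo_taint_flag demo_flags[] = {
	{ 'P', "PROPRIETARY_MODULE" },
	{ 'F', "FORCED_MODULE" },
	{ 'W', "WARN" },
};

#define DEMO_NFLAGS (sizeof(demo_flags) / sizeof(demo_flags[0]))

/*
 * Worst-case length of "Tainted: " followed by one "[c]=desc" entry per
 * flag, entries joined by ", ", plus the terminating NUL.
 */
static size_t demo_taint_buf_len(void)
{
	size_t len = strlen("Tainted: ") + 1;	/* prefix + NUL */

	for (size_t i = 0; i < DEMO_NFLAGS; i++) {
		assert(demo_flags[i].desc != NULL);
		len += strlen("[x]=") + strlen(demo_flags[i].desc);
		if (i + 1 < DEMO_NFLAGS)
			len += strlen(", ");
	}
	return len;
}
```

Because the length is derived from the table, adding a flag to the table
grows the buffer without touching a hard-coded size.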
Link: https://lkml.kernel.org/r/20260222140804.22225-1-rioo.tsukatsukii@gmail.com
Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The verbose 'Tainted: ...' string in print_tainted_seq can total 327
characters while the buffer defined in _print_tainted is 320 bytes.
Increase its size to 350 characters to hold all flags, along with some
headroom.
[akpm@linux-foundation.org: fix spello, add comment]
Link: https://lkml.kernel.org/r/20260220151500.13585-1-rioo.tsukatsukii@gmail.com
Signed-off-by: Rio <rioo.tsukatsukii@gmail.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Wang Jinchao <wangjinchao600@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When set_cred_ucounts() fails in ksys_unshare() new_nsproxy is leaked.
Let's call put_nsproxy() if that happens.
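The shape of the fix, sketched with a hypothetical refcounted object in
place of struct nsproxy. The demo_* names and the way the set_cred_ucounts()
result is passed in are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct nsproxy. */
struct demo_nsproxy {
	int refs;
};

static void demo_put_nsproxy(struct demo_nsproxy *ns)
{
	assert(ns != NULL && ns->refs > 0);
	ns->refs--;
}

/*
 * @ucounts_ret models the return value of set_cred_ucounts(). Once the
 * reference on @new_ns has been taken, every failure path has to drop it.
 */
static int demo_unshare(struct demo_nsproxy *new_ns, int ucounts_ret)
{
	assert(new_ns != NULL);
	new_ns->refs++;				/* new_nsproxy created */

	if (ucounts_ret) {
		demo_put_nsproxy(new_ns);	/* the added cleanup */
		return ucounts_ret;
	}

	/* Success: the reference is handed over to the task (not modelled). */
	return 0;
}
```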
Link: https://lkml.kernel.org/r/20260213193959.2556730-1-mge@meta.com
Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred")
Signed-off-by: Michal Grzedzicki <mge@meta.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexey Gladkov (Intel) <legion@kernel.org>
Cc: Ben Segall <bsegall@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Kees Cook <kees@kernel.org>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vlastimil Babka <vbabka@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl fix from Joel Granados:
"Fix uninitialized variable error when writing to a sysctl bitmap
Removed the possibility of returning an unjustified -EINVAL when
writing to a sysctl bitmap"
* tag 'sysctl-7.00-fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: fix uninitialized variable in proc_do_large_bitmap
|
|
The following sequence may lead to a deadlock in CPU hotplug:
    task1                          task2                      task3
    -----                          -----                      -----
    mutex_lock(&interface_lock)
                                   [CPU GOING OFFLINE]
                                   cpus_write_lock();
                                   osnoise_cpu_die();
                                     kthread_stop(task3);
                                       wait_for_completion();
                                                              osnoise_sleep();
                                                                mutex_lock(&interface_lock);
    cpus_read_lock();
    [DEAD LOCK]
Fix by swapping the order of cpus_read_lock() and mutex_lock(&interface_lock).
Cc: stable@vger.kernel.org
Cc: <mathieu.desnoyers@efficios.com>
Cc: <zhang.run@zte.com.cn>
Cc: <yang.tao172@zte.com.cn>
Cc: <ran.xiaokai@zte.com.cn>
Fixes: bce29ac9ce0bb ("trace: Add osnoise tracer")
Link: https://patch.msgid.link/20260326141953414bVSj33dAYktqp9Oiyizq8@zte.com.cn
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Luo Haiyang <luo.haiyang@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
scx_bpf_dsq_move_to_local()
Add a comment explaining the design intent behind rejecting built-in DSQs
(%SCX_DSQ_GLOBAL and %SCX_DSQ_LOCAL*) as sources. Local DSQs support
reenqueueing but the BPF scheduler cannot directly iterate or move tasks
from them. %SCX_DSQ_GLOBAL is similar but also doesn't support
reenqueueing because it maps to multiple per-node DSQs, making the scope
difficult to define.
Also annotate @dsq_id to make clear it must be a user-created DSQ.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
get_data() has a sanity check for regular data blocks to ensure at
least space for the ID exists. But a regular block should also have
at least 1 byte of data (otherwise it would be data-less instead of
regular).
Expand the get_data() block size sanity check to additionally expect
at least 1 byte of data.
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20260326133809.8045-2-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|
Commit cc3bad11de6e ("printk_ringbuffer: Fix check of valid data
size when blk_lpos overflows") added sanity checking to get_data()
to avoid returning data of illegal sizes (too large or too small).
It uses the helper function data_check_size() for the check.
However, data_check_size() expects the size of the data, not the
size of the data block. get_data() is providing the size of the
data block. This means that if the data size (text_buf_size) is
at or near the maximum legal size:
    sizeof(prb_data_block) + text_buf_size == DATA_SIZE(data_ring) / 2
data_check_size() will report failure because it adds
sizeof(prb_data_block) to the provided size. The sanity check in
get_data() is counting the data block header twice. The result is
that the reader fails to read the legal record.
Since get_data() subtracts the data block header size before returning,
move the sanity check to after the subtraction.
Luckily printk() is not vulnerable to this problem because
truncate_msg() limits printk-messages to 1/4 of the ringbuffer.
Indeed, by adjusting the printk_ringbuffer KUnit test, which does not
use printk() and its truncate_msg() check, it is easy to see that the
reader fails and the WARN_ON is triggered.
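The double counting can be reproduced with a simplified userspace model.
The demo_* names and sizes are hypothetical, and block alignment is ignored;
only the order of "subtract header" and "check size" is being illustrated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_RING_SIZE	1024u			/* models DATA_SIZE(data_ring) */
#define DEMO_HDR_SIZE	sizeof(unsigned long)	/* models sizeof(prb_data_block) */

/* Takes the size of the data and accounts for the block header itself. */
static bool demo_data_check_size(size_t data_size)
{
	return data_size + DEMO_HDR_SIZE <= DEMO_RING_SIZE / 2;
}

/* Old order: check the block size, so the header is counted twice. */
static bool demo_get_data_old(size_t blk_size, size_t *data_size)
{
	assert(data_size != NULL);
	if (blk_size < DEMO_HDR_SIZE || !demo_data_check_size(blk_size))
		return false;
	*data_size = blk_size - DEMO_HDR_SIZE;
	return true;
}

/* New order: subtract the header first, then check the data size. */
static bool demo_get_data_new(size_t blk_size, size_t *data_size)
{
	assert(data_size != NULL);
	if (blk_size < DEMO_HDR_SIZE)
		return false;
	if (!demo_data_check_size(blk_size - DEMO_HDR_SIZE))
		return false;
	*data_size = blk_size - DEMO_HDR_SIZE;
	return true;
}
```

With a block of exactly DEMO_RING_SIZE / 2 bytes, the old order rejects a
legal record while the new order accepts it.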
Fixes: cc3bad11de6e ("printk_ringbuffer: Fix check of valid data size when blk_lpos overflows")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://patch.msgid.link/20260326133809.8045-1-john.ogness@linutronix.de
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|
A bunch of new hooks for managing block devices were added a while ago
but they weren't actually appropriately classified.
* bpf_lsm_bdev_alloc() is called when the inode for the block
device is allocated. This happens from a sleepable context so mark the
function as sleepable. When this function is called the memory for the
block device storage embedded into the inode is zeroed. That block
device cannot be meaningfully referenced or interacted with at this
point. So mark it as untrusted for now.
* bpf_lsm_bdev_free() is called when the inode for the block
device is freed. A bunch of memory associated with the block device
has already been freed and there are dangling pointers in there. So mark
it as untrusted. It cannot be meaningfully referenced or interacted
with anymore. It is also called from sb->s_op->free_inode(), which
means it runs in RCU context (most of the time). So leave it as
non-sleepable.
* bpf_lsm_bdev_setintegrity() is called when a dm-verity device
is instantiated (glossing over details for simplicity of the commit
message). The block device is very much alive so it remains a trusted
hook. It's also called with device mapper's suspend lock held and so
the hook is able to sleep so mark it sleepable.
Signed-off-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20260326-work-bpf-bdev-v2-1-5e3c58963987@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
When a bridge window contains big and small resource(s), the small
resource(s) may not amount to half the size of the big resource, which
is what would allow calculate_head_align() to shrink the head alignment.
As a result, the small resource(s) are always placed after the big
resource.
In general, it would be good to be able to place the small resource(s)
before the big resource to achieve better utilization of the address space.
In the cases where the large resource can only fit at the end of the
window, it is even required.
However, carrying the information over from pbus_size_mem() and
calculate_head_align() to __pci_assign_resource() and
pcibios_align_resource() is not easy with the current data structures.
A somewhat hacky way to move the non-aligning tail part to the head is
possible within pcibios_align_resource(). The free space between the start
of the free space span and the aligned start address can be compared with
the non-aligning remainder of the size. If the free space is larger than
the remainder, placing the remainder before the start address is possible.
This relocation should generally work, because PCI resources consist only
of power-of-2 atoms.
Various arch requirements may still need to override the relocation, so the
relocation is only applied selectively in such cases.
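The comparison described above can be sketched as a standalone helper. This
is an assumption-laden illustration, not the pcibios_align_resource() code:
demo_relocate_tail() and its parameters are hypothetical, and the remainder
is taken as the size modulo the (power-of-2) alignment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * @avail_start:   first free address of the free space span
 * @aligned_start: first address >= @avail_start that satisfies @align
 * @size:          total size to place
 * @align:         power-of-2 alignment
 *
 * Returns the start address to use: moved down by the non-aligning
 * remainder when the gap in front of @aligned_start is larger than it,
 * otherwise @aligned_start unchanged.
 */
static uint64_t demo_relocate_tail(uint64_t avail_start, uint64_t aligned_start,
				   uint64_t size, uint64_t align)
{
	uint64_t gap, remainder;

	assert(align != 0 && (align & (align - 1)) == 0);
	assert(aligned_start >= avail_start);

	gap = aligned_start - avail_start;
	remainder = size & (align - 1);

	if (remainder != 0 && gap > remainder)
		return aligned_start - remainder;
	return aligned_start;
}
```

For example, with 128M free in front of a 256M-aligned address and a 272M
request (256M + 16M), the start moves down by 16M so the small part sits in
front of the big one, which stays on its 256M boundary.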
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221205
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xifer <xiferdev@gmail.com>
Link: https://patch.msgid.link/20260324165633.4583-10-ilpo.jarvinen@linux.intel.com
|
|
__find_resource_space() has a variable called 'tmp'. Rename it to
'full_avail' to better indicate its purpose.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xifer <xiferdev@gmail.com>
Link: https://patch.msgid.link/20260324165633.4583-4-ilpo.jarvinen@linux.intel.com
|
|
__find_resource_space() calculates the full extent of empty space but only
passes the aligned space to the resource_alignf callback. In some situations,
the callback may choose to take advantage of the free space before the
requested alignment.
Pass the full extent of the calculated empty space to the resource_alignf
callback as an additional parameter.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xifer <xiferdev@gmail.com>
Link: https://patch.msgid.link/20260324165633.4583-3-ilpo.jarvinen@linux.intel.com
|
|
|
|
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
|
|
Validate layout if present, but because the kernel must be
strict in what it accepts, reject BTF with unsupported kinds,
even if they are in the layout information.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260326145444.2076244-8-alan.maguire@oracle.com
|
|
__find_resource_space() currently uses resource_contains() on tentative
resources that have not yet been inserted into the resource tree. As
resource_contains() checks that IORESOURCE_UNSET is not set for either of
the resources, the caller has to hack around this problem by clearing the
IORESOURCE_UNSET flag (essentially lying to resource_contains()).
Instead of the hack, introduce __resource_contains_unbound() for cases like
this.
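The difference between the two checks can be sketched as follows. The
demo_* types and flag value are hypothetical, and the model is simplified
(for instance, it ignores resource types); it only shows why a pure range
check avoids having to clear the unset flag first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_UNSET 0x1u		/* stands in for IORESOURCE_UNSET */

struct demo_resource {
	uint64_t start;
	uint64_t end;		/* inclusive */
	unsigned int flags;
};

/* Models resource_contains(): refuses resources flagged as unset. */
static bool demo_contains(const struct demo_resource *r1,
			  const struct demo_resource *r2)
{
	assert(r1 != NULL && r2 != NULL);
	if ((r1->flags | r2->flags) & DEMO_UNSET)
		return false;
	return r1->start <= r2->start && r1->end >= r2->end;
}

/* Models the unbound variant: a pure range check, flags are not consulted. */
static bool demo_contains_unbound(const struct demo_resource *r1,
				  const struct demo_resource *r2)
{
	assert(r1 != NULL && r2 != NULL);
	return r1->start <= r2->start && r1->end >= r2->end;
}
```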
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Xifer <xiferdev@gmail.com>
Link: https://patch.msgid.link/20260324165633.4583-2-ilpo.jarvinen@linux.intel.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two cpufreq issues, one in the core and one in the
conservative governor, and two issues related to system sleep:
- Restore the cpufreq core behavior changed inadvertently during the
6.19 development cycle to call cpufreq_frequency_table_cpuinfo()
for cpufreq policies getting re-initialized which ensures that
policy->max and policy->cpuinfo_max_freq will be valid going
forward (Viresh Kumar)
- Adjust the cached requested frequency in the conservative cpufreq
governor on policy limits changes to prevent it from becoming stale
in some cases (Viresh Kumar)
- Prevent pm_restore_gfp_mask() from triggering a WARN_ON() in some
code paths in which it is legitimately called without invoking
pm_restrict_gfp_mask() previously (Youngjun Park)
- Update snapshot_write_finalize() to take trailing zero pages into
account properly which prevents user space restore from failing
subsequently in some cases (Alberto Garcia)"
* tag 'pm-7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: Drop spurious WARN_ON() from pm_restore_gfp_mask()
PM: hibernate: Drain trailing zero pages on userspace restore
cpufreq: conservative: Reset requested_freq on limits change
cpufreq: Don't skip cpufreq_frequency_table_cpuinfo()
|
|
Add optional reserved memory callbacks to perform region verification and
early fixup, then move all CMA related code in of_reserved_mem.c to them.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20260325090023.3175348-5-m.szyprowski@samsung.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
Move the init function from the OF_DECLARE() argument to the given reserved
memory region ops structure and then pass that structure to the
OF_DECLARE() initializer. This node_init callback is mandatory for the
reserved mem driver. This change makes it possible in the future to add
more functions called by the generic code before the given memory region is
initialized and the rmem object is created.
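A rough userspace sketch of the resulting shape: a table entry carries an
ops structure instead of a bare init function. All demo_* names, the
node_init signature and the table layout are hypothetical; only the idea of
a mandatory node_init callback inside an ops structure comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ops structure; only node_init is named in the text above. */
struct demo_rmem_ops {
	int (*node_init)(unsigned long node, const char *name);	/* mandatory */
};

/* Hypothetical table entry: a compatible string bound to an ops structure. */
struct demo_rmem_entry {
	const char *compatible;
	const struct demo_rmem_ops *ops;
};

static int demo_pool_node_init(unsigned long node, const char *name)
{
	(void)node;
	(void)name;
	return 0;	/* pretend the region was set up */
}

static const struct demo_rmem_ops demo_pool_ops = {
	.node_init = demo_pool_node_init,
};

static const struct demo_rmem_entry demo_table[] = {
	{ "demo,pool", &demo_pool_ops },
};

/* Generic code: find the ops for @compatible, or NULL if nothing matches. */
static const struct demo_rmem_ops *demo_find_ops(const char *compatible)
{
	assert(compatible != NULL);
	for (size_t i = 0; i < sizeof(demo_table) / sizeof(demo_table[0]); i++) {
		if (strcmp(demo_table[i].compatible, compatible) == 0)
			return demo_table[i].ops;
	}
	return NULL;
}
```

Further optional callbacks could later be added to the ops structure
without changing the table entries.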
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20260325090023.3175348-4-m.szyprowski@samsung.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
When the given reserved memory region doesn't really support the given node,
return -ENODEV instead of -ENOENT. Then fix the __reserved_mem_init_node()
function to properly propagate any error code other than -ENODEV instead
of silently ignoring it.
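The propagation rule can be sketched like this. The demo_* handlers and the
loop are hypothetical, and returning -ENODEV when no handler accepts the
node is an assumption of the sketch, not something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*demo_init_fn)(unsigned long node);

static int demo_init_unsupported(unsigned long node) { (void)node; return -ENODEV; }
static int demo_init_broken(unsigned long node) { (void)node; return -ENOMEM; }
static int demo_init_ok(unsigned long node) { (void)node; return 0; }

/*
 * Try each handler in turn. -ENODEV means "not mine, try the next one";
 * any other result, success or a real error, is returned to the caller.
 * Returning -ENODEV when no handler accepts the node is an assumption.
 */
static int demo_init_node(const demo_init_fn *fns, size_t n, unsigned long node)
{
	assert(fns != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int ret = fns[i](node);

		if (ret != -ENODEV)
			return ret;	/* success (0) or a real error */
	}
	return -ENODEV;
}
```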
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20260325090023.3175348-3-m.szyprowski@samsung.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
The FDT node is not needed for anything besides the initialization, so it can
be simply passed as an argument to the reserved memory region init
function.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Link: https://patch.msgid.link/20260325090023.3175348-2-m.szyprowski@samsung.com
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
When a caller enqueues a work item using schedule_delayed_work(), the
workqueue used is "system_wq" (a per-CPU wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when no target CPU is specified). The same applies
to schedule_work(), which uses system_wq, and queue_work(), which again
makes use of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
Continue the effort to refactor workqueue APIs, which began with the
introduction of new workqueues and a new alloc_workqueue() flag in:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
and switch smp_call_on_cpu() to use system_percpu_wq because system_wq is
going away once the ongoing workqueue restructuring is done.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://patch.msgid.link/20251110170332.319314-1-marco.crivellari@suse.com
|