| Age | Commit message (Collapse) | Author |
|
As discussed in [1], removing __cond_lock() will improve the readability
of trylock code. Now that Sparse context tracking support has been
removed, we can also remove __cond_lock().
Change existing APIs to either drop __cond_lock() completely, or make
use of the __cond_acquires() function attribute instead.
In particular, spinlock and rwlock implementations required switching
over to inline helpers rather than statement-expressions for their
trylock_* variants.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250207082832.GU7145@noisy.programming.kicks-ass.net/ [1]
Link: https://patch.msgid.link/20251219154418.3592607-25-elver@google.com
|
|
Remove Sparse support as discussed at [1].
The kernel codebase is still scattered with numerous places that try to
appease Sparse's context tracking ("annotation for sparse", "fake out
sparse", "work around sparse", etc.). Eventually, as more subsystems
enable Clang's context analysis, these places will show up and need
adjustment or removal of the workarounds altogether.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250207083335.GW7145@noisy.programming.kicks-ass.net/ [1]
Link: https://lore.kernel.org/all/Z6XTKTo_LMj9KmbY@elver.google.com/ [2]
Link: https://patch.msgid.link/20251219154418.3592607-24-elver@google.com
|
|
With Clang's context analysis, the compiler is a bit more strict about
what goes into the __acquires/__releases annotations and can't refer to
non-existent variables.
On an UM build, mm_id.h is transitively included into mm_types.h, and we
can observe the following error (if context analysis is enabled in e.g.
stackdepot.c):
In file included from lib/stackdepot.c:17:
In file included from include/linux/debugfs.h:15:
In file included from include/linux/fs.h:5:
In file included from include/linux/fs/super.h:5:
In file included from include/linux/fs/super_types.h:7:
In file included from include/linux/list_lru.h:14:
In file included from include/linux/xarray.h:16:
In file included from include/linux/gfp.h:7:
In file included from include/linux/mmzone.h:22:
In file included from include/linux/mm_types.h:26:
In file included from arch/um/include/asm/mmu.h:12:
>> arch/um/include/shared/skas/mm_id.h:24:54: error: use of undeclared identifier 'turnstile'
24 | void enter_turnstile(struct mm_id *mm_id) __acquires(turnstile);
| ^~~~~~~~~
arch/um/include/shared/skas/mm_id.h:25:53: error: use of undeclared identifier 'turnstile'
25 | void exit_turnstile(struct mm_id *mm_id) __releases(turnstile);
| ^~~~~~~~~
One (discarded) option was to use token_context_lock(turnstile) to just
define a token with the already used name, but that would not allow the
compiler to distinguish between different mm_id-dependent instances.
Another constraint is that struct mm_id is only declared and incomplete
in the header, so even if we tried to construct an expression to get to
the mutex instance, this would fail (including more headers transitively
everywhere should also be avoided).
Instead, just declare an mm_id-dependent helper to return the mutex, and
use the mm_id-dependent call expression in the __acquires/__releases
attributes; the compiler will consider the identity of the mutex to be
the call expression. Then using __get_turnstile() in the lock/unlock
wrappers (with context analysis enabled for mmu.c) the compiler will be
able to verify the implementation of the wrappers as-is.
We leave context analysis disabled in arch/um/kernel/skas/ for now. This
change is a preparatory change to allow enabling context analysis in
subsystems that include any of the above headers.
No functional change intended.
Closes: https://lore.kernel.org/oe-kbuild-all/202512171220.vHlvhpCr-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-23-elver@google.com
|
|
When compiling include/linux/debugfs.h with CONTEXT_ANALYSIS enabled, we
can see this error:
./include/linux/debugfs.h:239:17: error: use of undeclared identifier 'cancellation'
239 | void __acquires(cancellation)
Move the __acquires(..) attribute after the declaration, so that the
compiler can see the cancellation function argument, as well as making
struct debugfs_cancellation a real context lock to benefit from Clang's
context analysis.
This change is a preparatory change to allow enabling context analysis
in subsystems that include the above header.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251219154418.3592607-22-elver@google.com
|
|
Add support for Clang's context analysis for ww_mutex.
The programming model for ww_mutex is subtly more complex than other
locking primitives when using ww_acquire_ctx. Encoding the respective
pre-conditions for ww_mutex lock/unlock based on ww_acquire_ctx state
using Clang's context analysis makes incorrect use of the API harder.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-21-elver@google.com
|
|
Add support for Clang's context analysis for local_lock_t and
local_trylock_t.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-20-elver@google.com
|
|
Including <linux/local_lock.h> into an empty TU will result in the
compiler complaining:
./include/linux/local_lock.h: In function ‘class_local_lock_irqsave_constructor’:
./include/linux/local_lock_internal.h:95:17: error: implicit declaration of function ‘local_irq_save’; <...>
95 | local_irq_save(flags); \
| ^~~~~~~~~~~~~~
As well as (some architectures only, such as 'sh'):
./include/linux/local_lock_internal.h: In function ‘local_lock_acquire’:
./include/linux/local_lock_internal.h:33:20: error: ‘current’ undeclared (first use in this function)
33 | l->owner = current;
Include missing headers to allow including local_lock.h where the
required headers are not otherwise included.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251219154418.3592607-19-elver@google.com
|
|
Add support for Clang's context analysis for rw_semaphore.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-18-elver@google.com
|
|
Mark functions that conditionally acquire the passed lock.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251219154418.3592607-17-elver@google.com
|
|
Add support for Clang's context analysis for SRCU.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://patch.msgid.link/20251219154418.3592607-16-elver@google.com
|
|
Improve the existing annotations to properly support Clang's context
analysis.
The old annotations distinguished between RCU, RCU_BH, and RCU_SCHED;
however, to more easily be able to express that "hold the RCU read lock"
without caring if the normal, _bh(), or _sched() variant was used we'd
have to remove the distinction of the latter variants: change the _bh()
and _sched() variants to also acquire "RCU".
When (and if) we introduce context locks to denote more generally that
"IRQ", "BH", "PREEMPT" contexts are disabled, it would make sense to
acquire these instead of RCU_BH and RCU_SCHED respectively.
The above change also simplified introducing __guarded_by support, where
only the "RCU" context lock needs to be held: introduce __rcu_guarded,
where Clang's context analysis warns if a pointer is dereferenced
without any of the RCU locks held, or updated without the appropriate
helpers.
The primitives rcu_assign_pointer() and friends are wrapped with
context_unsafe(), which enforces using them to update RCU-protected
pointers marked with __rcu_guarded.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://patch.msgid.link/20251219154418.3592607-15-elver@google.com
|
|
The annotations for bit_spinlock.h have simply been using "bitlock" as
the token. For Sparse, that was likely sufficient in most cases. But
Clang's context analysis is more precise, and we need to ensure we
can distinguish different bitlocks.
To do so, add a token context, and a macro __bitlock(bitnum, addr)
that is used to construct unique per-bitlock tokens.
Add the appropriate test.
<linux/list_bl.h> is implicitly included through other includes, and
requires 2 annotations to indicate that acquisition (without release)
and release (without prior acquisition) of its bitlock is intended.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-14-elver@google.com
|
|
Including <linux/bit_spinlock.h> into an empty TU will result in the
compiler complaining:
./include/linux/bit_spinlock.h:34:4: error: call to undeclared function 'cpu_relax'; <...>
34 | cpu_relax();
| ^
1 error generated.
Include <asm/processor.h> to allow including bit_spinlock.h where
<asm/processor.h> is not otherwise included.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251219154418.3592607-13-elver@google.com
|
|
Add support for Clang's context analysis for seqlock_t.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-12-elver@google.com
|
|
Add support for Clang's context analysis for mutex.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-11-elver@google.com
|
|
While Sparse is oblivious to the return value of conditional acquire
functions, Clang's context analysis needs to know the return value
which indicates successful acquisition.
Add the additional argument, and convert existing uses.
Notably, Clang's interpretation of the value merely relates to the use
in a later conditional branch, i.e. 1 ==> context lock acquired in
branch taken if condition non-zero, and 0 ==> context lock acquired in
branch taken if condition is zero. Given the precise value does not
matter, introduce symbolic variants to use instead of either 0 or 1,
which should be more intuitive.
No functional change intended.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-10-elver@google.com
|
|
Add support for Clang's context analysis for raw_spinlock_t,
spinlock_t, and rwlock. This wholesale conversion is required because
all three of them are interdependent.
To avoid warnings in constructors, the initialization functions mark a
lock as acquired when initialized before guarded variables.
The test verifies that common patterns do not generate false positives.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-9-elver@google.com
|
|
Clang's context analysis can be made aware of functions that assert that
locks are held.
Presence of these annotations causes the analysis to assume the context
lock is held after calls to the annotated function, and avoid false
positives with complex control-flow; for example, where not all
control-flow paths in a function require a held lock, and therefore
marking the function with __must_hold(..) is inappropriate.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-8-elver@google.com
|
|
Introduce basic compatibility with cleanup.h infrastructure.
We need to allow the compiler to see the acquisition and release of the
context lock at the start and end of a scope. However, the current
"cleanup" helpers wrap the lock in a struct passed through separate
helper functions, which hides the lock alias from the compiler (no
inter-procedural analysis).
While Clang supports scoped guards in C++, it's not possible to apply in
C code: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html#scoped-context
However, together with recent improvements to Clang's alias analysis
abilities, idioms such as this work correctly now:
void spin_unlock_cleanup(spinlock_t **l) __releases(*l) { .. }
...
{
spinlock_t *lock_scope __cleanup(spin_unlock_cleanup) = &lock;
spin_lock(&lock); // lock through &lock
... critical section ...
} // unlock through lock_scope -[alias]-> &lock (no warnings)
To generalize this pattern and make it work with existing lock guards,
introduce DECLARE_LOCK_GUARD_1_ATTRS() and WITH_LOCK_GUARD_1_ATTRS().
These allow creating an explicit alias to the context lock instance that
is "cleaned" up with a separate cleanup helper. This helper is a dummy
function that does nothing at runtime, but has the release attributes to
tell the compiler what happens at the end of the scope.
Example usage:
DECLARE_LOCK_GUARD_1_ATTRS(mutex, __acquires(_T), __releases(*(struct mutex **)_T))
#define class_mutex_constructor(_T) WITH_LOCK_GUARD_1_ATTRS(mutex, _T)
Note: To support the for-loop based scoped helpers, the auxiliary
variable must be a pointer to the "class" type because it is defined in
the same statement as the guard variable. However, we initialize it with
the lock pointer (despite the type mismatch, the compiler's alias
analysis still works as expected). The "_unlock" attribute receives a
pointer to the auxiliary variable (a double pointer to the class type),
and must be cast and dereferenced appropriately.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-7-elver@google.com
|
|
Warn about applications of context_unsafe() without a comment, to
encourage documenting the reasoning behind why it was deemed safe.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-6-elver@google.com
|
|
Adds documentation in Documentation/dev-tools/context-analysis.rst, and
adds it to the index.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-5-elver@google.com
|
|
Add a simple test stub where we will add common supported patterns that
should not generate false positives for each new supported context lock.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-4-elver@google.com
|
|
Context Analysis is a language extension, which enables statically
checking that required contexts are active (or inactive), by acquiring
and releasing user-definable "context locks". An obvious application is
lock-safety checking for the kernel's various synchronization primitives
(each of which represents a "context lock"), and checking that locking
rules are not violated.
Clang originally called the feature "Thread Safety Analysis" [1]. This
was later changed and the feature became more flexible, gaining the
ability to define custom "capabilities". Its foundations can be found in
"Capability Systems" [2], used to specify the permissibility of
operations to depend on some "capability" being held (or not held).
Because the feature is not just able to express "capabilities" related
to synchronization primitives, and "capability" is already overloaded in
the kernel, the naming chosen for the kernel departs from Clang's
"Thread Safety" and "capability" nomenclature; we refer to the feature
as "Context Analysis" to avoid confusion. The internal implementation
still makes references to Clang's terminology in a few places, such as
`-Wthread-safety` being the warning option that also still appears in
diagnostic messages.
[1] https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
[2] https://www.cs.cornell.edu/talc/papers/capabilities.pdf
See more details in the kernel-doc documentation added in this and
subsequent changes.
Clang version 22+ is required.
[peterz: disable the thing for __CHECKER__ builds]
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251219154418.3592607-3-elver@google.com
|
|
The conditional definition of lock checking macros and attributes is
about to become more complex. Factor them out into their own header for
better readability, and to make it obvious which features are supported
by which mode (currently only Sparse). This is the first step towards
generalizing towards "context analysis".
No functional change intended.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251219154418.3592607-2-elver@google.com
|
|
SGUnit interrupts are not initialized in survivability. Force I2C
controller to polling mode while in survivability.
v2: Use helper function instead of manual check (Riana)
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20260105080750.16605-1-raag.jadav@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
|
|
resctrl assumes that all monitor events can be displayed as unsigned decimal
integers.
Hardware architecture counters may provide some telemetry events with greater
precision where the event is not a simple count, but is a measurement of some
sort (e.g. Joules for energy consumed).
Add a new argument to resctrl_enable_mon_event() for architecture code to
inform the file system that the value for a counter is a fixed-point value
with a specific number of binary places.
Only allow architecture to use floating point format on events that the file
system has marked with mon_evt::is_floating_point which reflects the contract
with user space on how the event values are displayed.
Display fixed point values with values rounded to ceil(binary_bits * log10(2))
decimal places. Special case for zero binary bits to print "{value}.0".
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
|
|
Commit 5d0bfe448481 ("drm/meson: Add HDMI 1.4 4k modes") added support
for HDMI 1.4 4k modes, which is what TVs need. For computer monitors
the code is using the DMT code-path, which ends up in
meson_venc_hdmi_supported_mode(), which does not allow the 4k modes yet.
The datasheet for all supported SoCs mentions "4Kx2K@60". It's not
clear whether "4K" here means 3840 or 4096 pixels.
Allow resolutions up to 3840x2160 pixels (including middle steps, such
as WQHD at 2560x1440 pixels) so they can be used with computer monitors
(using the DMT code-path in the driver).
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20251108134236.1299630-1-martin.blumenstingl@googlemail.com
|
|
Add a node for the TC9563 PCIe switch, which has three downstream ports.
Two embedded Ethernet devices are present on one of the downstream ports.
As all these ports are present in the node represent the downstream
ports and embedded endpoints.
Power to the TC9563 is supplied through two LDO regulators, controlled by
two GPIOs, which are added as fixed regulators. Configure the TC9563
through I2C.
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260105-tc9563-v1-1-642fd1fe7893@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
The duty-cycle calculation in clk_rcg2_set_duty_cycle() currently
derives an intermediate percentage `duty_per = (num * 100) / den` and
then computes:
d = DIV_ROUND_CLOSEST(n * duty_per * 2, 100);
This introduces integer truncation at the percentage step (division by
`den`) and a redundant scaling by 100, which can reduce precision for
large `den` and skew the final rounding.
Compute `2d` directly from the duty fraction to preserve precision and
avoid the unnecessary scaling:
d = DIV_ROUND_CLOSEST(n * duty->num * 2, duty->den);
This keeps the intended formula `d ≈ n * 2 * (num/den)` while performing
a single, final rounded division, improving accuracy especially for small
duty cycles or large denominators. It also removes the unused `duty_per`
variable, simplifying the code.
There is no functional changes beyond improved numerical accuracy.
Fixes: 7f891faf596ed ("clk: qcom: clk-rcg2: Add support for duty-cycle for RCG")
Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260105-duty_cycle_precision-v2-1-d1d466a6330a@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
strcpy() has been deprecated¹ because it performs no bounds checking on the
destination buffer, which can lead to buffer overflows. Use the safer
strscpy() instead.
¹ https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20251124125455.5495-2-thorsten.blum@linux.dev
|
|
Drop redundant llcc7_base entry from Glymur LLCC reg-items
Fixes: bd0b8028ce5f ("dt-bindings: cache: qcom,llcc: Document Glymur LLCC block")
Signed-off-by: Pankaj Patil <pankaj.patil@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260105130050.1062903-1-pankaj.patil@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
After kthread creation during probe sequence, a handful of other
failures could occur. If this were to happen, the kthread is never
explicitly deleted which results in a resource leak. Add explicit cleanup
of this resource.
Signed-off-by: Brandon Brnich <b-brnich@ti.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
Currently if a user enqueues a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
AV1 specification describes 3 possibles tx modes: 4x4 only, largest and
select. The hardware allows 5 possibles tx modes: 4x4 only, 8x8, 16x16,
32x32 and select. Since the both aren't exactly matching we need to add
a mapping function to set the correct mode on hardware.
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Fixes: 727a400686a2c ("media: verisilicon: Add Rockchip AV1 decoder")
Cc: stable@vger.kernel.org
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
If all the fields of the CDEF parameters are zero (which is the default),
then av1_enable_cdef register needs to be unset
(despite the V4L2_AV1_SEQUENCE_FLAG_ENABLE_CDEF possibly being set).
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Fixes: 727a400686a2c ("media: verisilicon: Add Rockchip AV1 decoder")
Cc: stable@vger.kernel.org
Reported-by: Jianfeng Liu <liujianfeng1994@gmail.com>
Closes: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4786
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
[hverkuil: dropped Link tag since it just duplicated the Closes: URL]
|
|
For the i.MX8MQ platform, there is a hardware limitation: the g1 VPU and
g2 VPU cannot decode simultaneously; otherwise, it will cause below bus
error and produce corrupted pictures, even potentially lead to system hang.
[ 110.527986] hantro-vpu 38310000.video-codec: frame decode timed out.
[ 110.583517] hantro-vpu 38310000.video-codec: bus error detected.
Therefore, it is necessary to ensure that g1 and g2 operate alternately.
This allows for successful multi-instance decoding of H.264 and HEVC.
To achieve this, g1 and g2 share the same v4l2_m2m_dev, and then the
v4l2_m2m_dev can handle the scheduling.
Fixes: cb5dd5a0fa518 ("media: hantro: Introduce G2/HEVC decoder")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Qian <ming.qian@oss.nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Co-developed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
Adding a reference count to the v4l2_m2m_dev structure allow safely
sharing it across multiple hardware nodes. This can be used to prevent
running jobs concurrently on m2m cores that have some internal resource
sharing.
Signed-off-by: Ming Qian <ming.qian@oss.nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
[hverkuil: fix typos in v4l2_m2m_put documentation]
|
|
This is not supported by the hardware and trying to decode
these leads to LAT timeout errors.
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
Rework how requests are completed in the MediaTek VCodec driver, by
implementing the new manual request completion feature, which allows to
keep a request open while allowing to add new bitstream data.
This is useful in this case, because the hardware has a LAT and a core
decode work, after the LAT decode the bitstream isn't required anymore
so the source buffer can be set done and the request stays open until
the core decode work finishes.
Signed-off-by: Sebastian Fricke <sebastian.fricke@collabora.com>
Co-developed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
Keep track of the number of requests and request objects of a media
device. Helps to verify that all request-related memory is freed.
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
|
|
Manually complete the requests: this tests the manual completion
code.
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
|
|
By default when the last request object is completed, the whole
request completes as well.
But sometimes you want to delay this completion to an arbitrary point in
time so add a manual complete mode for this.
In req_queue the driver marks the request for manual completion by
calling media_request_mark_manual_completion, and when the driver
wants to manually complete the request it calls
media_request_manual_complete().
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
|
|
In wave5_vpu_open_enc() and wave5_vpu_open_dec(), a vpu instance is
allocated via kzalloc(). If the subsequent allocation for inst->codec_info
fails, the functions return -ENOMEM without freeing the previously
allocated instance, causing a memory leak.
Fix this by calling kfree() on the instance in this error path to ensure
it is properly released.
Fixes: 9707a6254a8a6 ("media: chips-media: wave5: Add the v4l2 layer")
Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
The current decoding method was to wait until each frame was
decoded after feeding a bitstream. As a result, performance was low
and Wave5 could not achieve max pixel processing rate.
Update driver to use an asynchronous approach for decoding and feeding a
bitstream in order to achieve full capabilities of the device.
WAVE5 supports command-queueing to maximize performance by pipelining
internal commands and by hiding wait cycle taken to receive a command
from Host processor.
Instead of waiting for each command to be executed before sending the
next command, Host processor just places all the commands in the
command-queue and goes on doing other things while the commands in the
queue are processed by VPU.
While Host processor handles its own tasks, it can receive VPU interrupt
request (IRQ).
In this case, host processor can simply exit interrupt service routine
(ISR) without accessing to host interface to read the result of the
command reported by VPU.
After host processor completed its tasks, host processor can read the
command result when host processor needs the reports and does
response processing.
To achieve this goal, the device_run() calls v4l2_m2m_job_finish
so that next command can be sent to VPU continuously, if there is
any result, then irq is triggered and gets decoded frames and returns
them to upper layer.
Theses processes work independently each other without waiting
a decoded frame.
Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
Tested-by: Brandon Brnich <b-brnich@ti.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
The dec_output_info should not be a null pointer, WARN_ON around it to
indicates a driver issue.
Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
Tested-by: Brandon Brnich <b-brnich@ti.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
When multi instances are created/destroyed, many interrupts happens
and structures for decoder are removed.
"struct vpu_instance" this structure is shared for all flow in the decoder,
so if the structure is not protected by lock, Null dereference
could happens sometimes.
IRQ Handler was spilt to two phases and Lock was added as well.
Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer")
Cc: stable@vger.kernel.org
Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
Tested-by: Brandon Brnich <b-brnich@ti.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
SError of kernel panic rarely happened while testing fluster.
The root cause was to enter suspend mode because timeout of autosuspend
delay happened.
[ 48.834439] SError Interrupt on CPU0, code 0x00000000bf000000 -- SError
[ 48.834455] CPU: 0 UID: 0 PID: 1067 Comm: v4l2h265dec0:sr Not tainted 6.12.9-gc9e21a1ebd75-dirty #7
[ 48.834461] Hardware name: ti Texas Instruments J721S2 EVM/Texas Instruments J721S2 EVM, BIOS 2025.01-00345-gbaf3aaa8ecfa 01/01/2025
[ 48.834464] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 48.834468] pc : wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
[ 48.834488] lr : wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
[ 48.834495] sp : ffff8000856e3a30
[ 48.834497] x29: ffff8000856e3a30 x28: ffff0008093f6010 x27: ffff000809158130
[ 48.834504] x26: 0000000000000000 x25: ffff00080b625000 x24: ffff000804a9ba80
[ 48.834509] x23: ffff000802343028 x22: ffff000809158150 x21: ffff000802218000
[ 48.834513] x20: ffff0008093f6000 x19: ffff0008093f6000 x18: 0000000000000000
[ 48.834518] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffff74009618
[ 48.834523] x14: 000000010000000c x13: 0000000000000000 x12: 0000000000000000
[ 48.834527] x11: ffffffffffffffff x10: ffffffffffffffff x9 : ffff000802343028
[ 48.834532] x8 : ffff00080b6252a0 x7 : 0000000000000038 x6 : 0000000000000000
[ 48.834536] x5 : ffff00080b625060 x4 : 0000000000000000 x3 : 0000000000000000
[ 48.834541] x2 : 0000000000000000 x1 : ffff800084bf0118 x0 : ffff800084bf0000
[ 48.834547] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 48.834549] CPU: 0 UID: 0 PID: 1067 Comm: v4l2h265dec0:sr Not tainted 6.12.9-gc9e21a1ebd75-dirty #7
[ 48.834554] Hardware name: ti Texas Instruments J721S2 EVM/Texas Instruments J721S2 EVM, BIOS 2025.01-00345-gbaf3aaa8ecfa 01/01/2025
[ 48.834556] Call trace:
[ 48.834559] dump_backtrace+0x94/0xec
[ 48.834574] show_stack+0x18/0x24
[ 48.834579] dump_stack_lvl+0x38/0x90
[ 48.834585] dump_stack+0x18/0x24
[ 48.834588] panic+0x35c/0x3e0
[ 48.834592] nmi_panic+0x40/0x8c
[ 48.834595] arm64_serror_panic+0x64/0x70
[ 48.834598] do_serror+0x3c/0x78
[ 48.834601] el1h_64_error_handler+0x34/0x4c
[ 48.834605] el1h_64_error+0x64/0x68
[ 48.834608] wave5_dec_clr_disp_flag+0x40/0x80 [wave5]
[ 48.834615] wave5_vpu_dec_clr_disp_flag+0x54/0x80 [wave5]
[ 48.834622] wave5_vpu_dec_buf_queue+0x19c/0x1a0 [wave5]
[ 48.834628] __enqueue_in_driver+0x3c/0x74 [videobuf2_common]
[ 48.834639] vb2_core_qbuf+0x508/0x61c [videobuf2_common]
[ 48.834646] vb2_qbuf+0xa4/0x168 [videobuf2_v4l2]
[ 48.834656] v4l2_m2m_qbuf+0x80/0x238 [v4l2_mem2mem]
[ 48.834666] v4l2_m2m_ioctl_qbuf+0x18/0x24 [v4l2_mem2mem]
[ 48.834673] v4l_qbuf+0x48/0x5c [videodev]
[ 48.834704] __video_do_ioctl+0x180/0x3f0 [videodev]
[ 48.834725] video_usercopy+0x2ec/0x68c [videodev]
[ 48.834745] video_ioctl2+0x18/0x24 [videodev]
[ 48.834766] v4l2_ioctl+0x40/0x60 [videodev]
[ 48.834786] __arm64_sys_ioctl+0xa8/0xec
[ 48.834793] invoke_syscall+0x44/0x100
[ 48.834800] el0_svc_common.constprop.0+0xc0/0xe0
[ 48.834804] do_el0_svc+0x1c/0x28
[ 48.834809] el0_svc+0x30/0xd0
[ 48.834813] el0t_64_sync_handler+0xc0/0xc4
[ 48.834816] el0t_64_sync+0x190/0x194
[ 48.834820] SMP: stopping secondary CPUs
[ 48.834831] Kernel Offset: disabled
[ 48.834833] CPU features: 0x08,00002002,80200000,4200421b
[ 48.834837] Memory Limit: none
[ 49.161404] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
Fixes: 2092b3833487 ("media: chips-media: wave5: Support runtime suspend/resume")
Cc: stable@vger.kernel.org
Signed-off-by: Jackson Lee <jackson.lee@chipsnmedia.com>
Signed-off-by: Nas Chung <nas.chung@chipsnmedia.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Tested-by: Brandon Brnich <b-brnich@ti.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
Move video device unregistration to the beginning of the remove function
to ensure all video operations are stopped before cleaning up the worker
thread and disabling PM runtime. This prevents hardware register access
after the device has been powered down.
In polling mode, the hrtimer periodically triggers
wave5_vpu_timer_callback() which queues work to the kthread worker.
The worker executes wave5_vpu_irq_work_fn() which reads hardware
registers via wave5_vdi_read_register().
The original cleanup order disabled PM runtime and powered down hardware
before unregistering video devices. When autosuspend triggers and powers
off the hardware, the video devices are still registered and the worker
thread can still be triggered by the hrtimer, causing it to attempt
reading registers from powered-off hardware. This results in a bus error
(synchronous external abort) and kernel panic.
This causes random kernel panics during encoding operations:
Internal error: synchronous external abort: 0000000096000010
[#1] PREEMPT SMP
Modules linked in: wave5 rpmsg_ctrl rpmsg_char ...
CPU: 0 UID: 0 PID: 1520 Comm: vpu_irq_thread
Tainted: G M W
pc : wave5_vdi_read_register+0x10/0x38 [wave5]
lr : wave5_vpu_irq_work_fn+0x28/0x60 [wave5]
Call trace:
wave5_vdi_read_register+0x10/0x38 [wave5]
kthread_worker_fn+0xd8/0x238
kthread+0x104/0x120
ret_from_fork+0x10/0x20
Code: aa1e03e9 d503201f f9416800 8b214000 (b9400000)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: synchronous external abort:
Fatal exception
Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer")
Cc: stable@vger.kernel.org
Signed-off-by: Xulin Sun <xulin.sun@windriver.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
Fix the cleanup order in polling mode (irq < 0) to prevent kernel warnings
during module removal. Cancel the hrtimer before destroying the kthread
worker to ensure work queues are empty.
In polling mode, the driver uses hrtimer to periodically trigger
wave5_vpu_timer_callback() which queues work via kthread_queue_work().
The kthread_destroy_worker() function validates that both work queues
are empty with WARN_ON(!list_empty(&worker->work_list)) and
WARN_ON(!list_empty(&worker->delayed_work_list)).
The original code called kthread_destroy_worker() before hrtimer_cancel(),
creating a race condition where the timer could fire during worker
destruction and queue new work, triggering the WARN_ON.
This causes the following warning on every module unload in polling mode:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1034 at kernel/kthread.c:1430
kthread_destroy_worker+0x84/0x98
Modules linked in: wave5(-) rpmsg_ctrl rpmsg_char ...
Call trace:
kthread_destroy_worker+0x84/0x98
wave5_vpu_remove+0xc8/0xe0 [wave5]
platform_remove+0x30/0x58
...
---[ end trace 0000000000000000 ]---
Fixes: ed7276ed2fd0 ("media: chips-media: wave5: Add hrtimer based polling support")
Cc: stable@vger.kernel.org
Signed-off-by: Xulin Sun <xulin.sun@windriver.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|
|
Replace pm_runtime_put_sync() with pm_runtime_dont_use_autosuspend() in
the remove path to properly pair with pm_runtime_use_autosuspend() from
probe. This allows pm_runtime_disable() to handle reference count cleanup
correctly regardless of current suspend state.
The driver calls pm_runtime_put_sync() unconditionally in remove, but the
device may already be suspended due to autosuspend configured in probe.
When autosuspend has already suspended the device, the usage count is 0,
and pm_runtime_put_sync() decrements it to -1.
This causes the following warning on module unload:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 963 at kernel/kthread.c:1430
kthread_destroy_worker+0x84/0x98
...
vdec 30210000.video-codec: Runtime PM usage count underflow!
Fixes: 9707a6254a8a ("media: chips-media: wave5: Add the v4l2 layer")
Cc: stable@vger.kernel.org
Signed-off-by: Xulin Sun <xulin.sun@windriver.com>
Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
|