| Age | Commit message (Collapse) | Author |
|
Add a new VM_BIND flag, DRM_XE_VM_BIND_FLAG_DECOMPRESS, that lets userspace
express intent for the driver to perform on-device in-place decompression
for the GPU mapping created by a MAP bind operation.
This flag is used by subsequent driver changes to trigger scheduling of
GPU work that resolves compressed VRAM pages into an uncompressed PAT
VM mapping.
Behavior and semantics:
- Valid only for DRM_XE_VM_BIND_OP_MAP. IOCTLs using this flag on other ops
are rejected (-EINVAL).
- The bind's pat_index must select the device "no-compression" PAT entry;
otherwise the ioctl is rejected (-EINVAL).
- Only meaningful for VRAM-backed BOs on devices that support Flat CCS and
the required hardware generation (driver will return -EOPNOTSUPP if not).
- On success the driver schedules a migrate/resolve and installs the
returned dma_fence into the BO's kernel reservation
(DMA_RESV_USAGE_KERNEL).
Compute PR: https://github.com/intel/compute-runtime/pull/898
v3: Rebase on latest drm-tip and add compute pr info
v2: Add kernel doc (Matt)
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mrozek, Michal <michal.mrozek@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Nitin Gote <nitin.r.gote@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20260304123758.3050386-6-nitin.r.gote@intel.com
|
|
Biju Das needs a patch for rz-du merged in 7.0-rc3
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Use the correct function name and function parameter name to avoid
these kernel-doc warnings:
Warning: include/linux/wait_bit.h:424 expecting prototype for
wait_var_event_killable(). Prototype was for
wait_var_event_interruptible() instead
Warning: include/linux/wait_bit.h:508 function parameter 'lock' not
described in 'wait_var_event_mutex'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260302005237.3473095-1-rdunlap@infradead.org
|
|
net->xfrm.nlsk is used in 2 types of contexts:
- fully under RCU, with rcu_read_lock + rcu_dereference and a NULL check
- in the netlink handlers, with requests coming from a userspace socket
In the 2nd case, net->xfrm.nlsk is guaranteed to stay non-NULL and the
object is alive, since we can't enter the netns destruction path while
the user socket holds a reference on the netns.
After adding the __rcu annotation to netns_xfrm.nlsk (which silences
sparse warnings in the RCU users and __net_init code), we need to tell
sparse that the 2nd case is safe. Add a helper for that.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
From: Jakub Kicinski <kuba@kernel.org>
Make sure disable_ipv6_mod itself is not part of the IPv6 module,
in case core code wants to refer to it. We will remove support
for IPv6=m soon, this change helps make fixes we commit before
that less messy.
Link: https://patch.msgid.link/20260307-net-nd_tbl_fixes-v4-1-e2677e85628c@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Replace the global cgroup_file_kn_lock with a per-cgroup_file spinlock
to eliminate cross-cgroup contention as it is not really protecting
data shared between different cgroups.
The lock is initialized in cgroup_add_file() alongside timer_setup().
No lock acquisition is needed during initialization since the cgroup
directory is being populated under cgroup_mutex and no concurrent
accessors exist at that point.
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Add a new clone3() flag CLONE_PIDFD_AUTOKILL that ties a child's
lifetime to the pidfd returned from clone3(). When the last reference to
the struct file created by clone3() is closed the kernel sends SIGKILL
to the child. A pidfd obtained via pidfd_open() for the same process
does not keep the child alive and does not trigger autokill - only the
specific struct file from clone3() has this property.
This is useful for container runtimes, service managers, and sandboxed
subprocess execution - any scenario where the child must die if the
parent crashes or abandons the pidfd.
CLONE_PIDFD_AUTOKILL requires both CLONE_PIDFD (the whole point is tying
lifetime to the pidfd file) and CLONE_AUTOREAP (a killed child with no
one to reap it would become a zombie). CLONE_THREAD is rejected because
autokill targets a process not a thread.
The clone3 pidfd is identified by the PIDFD_AUTOKILL file flag set on
the struct file at clone3() time. The pidfs .release handler checks this
flag and sends SIGKILL via do_send_sig_info(SIGKILL, SEND_SIG_PRIV, ...)
only when it is set. Files from pidfd_open() or open_by_handle_at() are
distinct struct files that do not carry this flag. dup()/fork() share the
same struct file so they extend the child's lifetime until the last
reference drops.
CLONE_PIDFD_AUTOKILL uses a privilege model based on CLONE_NNP: without
CLONE_NNP the child could escalate privileges via setuid/setgid exec
after being spawned, so the caller must have CAP_SYS_ADMIN in its user
namespace. With CLONE_NNP the child can never gain new privileges so
unprivileged usage is allowed. This is a deliberate departure from the
pdeath_signal model which is reset during secureexec and commit_creds()
rendering it useless for container runtimes that need to deprivilege
themselves.
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-3-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a new clone3() flag CLONE_NNP that sets no_new_privs on the child
process at clone time. This is analogous to prctl(PR_SET_NO_NEW_PRIVS)
but applied at process creation rather than requiring a separate step
after the child starts running.
CLONE_NNP is rejected with CLONE_THREAD. It's conceptually a lot simpler
if the whole thread-group is forced into NNP and not have single threads
running around with NNP.
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-2-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add a new clone3() flag CLONE_AUTOREAP that makes a child process
auto-reap on exit without ever becoming a zombie. This is a per-process
property in contrast to the existing auto-reap mechanism via
SA_NOCLDWAIT or SIG_IGN for SIGCHLD which applies to all children of a
given parent.
Currently the only way to automatically reap children is to set
SA_NOCLDWAIT or SIG_IGN on SIGCHLD. This is a parent-scoped property
affecting all children which makes it unsuitable for libraries or
applications that need selective auto-reaping of specific children while
still being able to wait() on others.
CLONE_AUTOREAP stores an autoreap flag in the child's signal_struct.
When the child exits do_notify_parent() checks this flag and causes
exit_notify() to transition the task directly to EXIT_DEAD. Since the
flag lives on the child it survives reparenting: if the original parent
exits and the child is reparented to a subreaper or init the child still
auto-reaps when it eventually exits.
CLONE_AUTOREAP can be combined with CLONE_PIDFD to allow the parent to
monitor the child's exit via poll() and retrieve exit status via
PIDFD_GET_INFO. Without CLONE_PIDFD it provides a fire-and-forget
pattern where the parent simply doesn't care about the child's exit
status. No exit signal is delivered so exit_signal must be zero.
CLONE_AUTOREAP is rejected in combination with CLONE_PARENT. If a
CLONE_AUTOREAP child were to clone(CLONE_PARENT) the new grandchild
would inherit exit_signal == 0 from the autoreap parent's group leader
but without signal->autoreap. This grandchild would become a zombie that
never sends a signal and is never autoreaped - confusing and arguably
broken behavior.
The flag is not inherited by the autoreap process's own children. Each
child that should be autoreaped must be explicitly created with
CLONE_AUTOREAP.
Link: https://github.com/uapi-group/kernel-features/issues/45
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-1-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Merge the definition of MDSS resets for SM6115 and SM6125 to allow them
to be made available in the DeviceTree branch.
|
|
Add the missing defines for MDSS resets, which are necessary to reset
the display subsystem in order to avoid issues caused by state left over
from the bootloader.
While here, align comment style with other SoCs.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Val Packett <val@packett.cool>
Link: https://lore.kernel.org/r/20260303034847.13870-3-val@packett.cool
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Add the missing defines for MDSS resets, which are necessary to reset
the display subsystem in order to avoid issues caused by state left over
from the bootloader.
While here, align comment style with other SoCs.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Val Packett <val@packett.cool>
Link: https://lore.kernel.org/r/20260303034847.13870-2-val@packett.cool
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
If DEFER_TASKRUN | SETUP_TASKRUN is used and task work is added while
the ring is being resized, it's possible for the OR'ing of
IORING_SQ_TASKRUN to happen in the small window of swapping into the
new rings and the old rings being freed.
Prevent this by adding a 2nd ->rings pointer, ->rings_rcu, which is
protected by RCU. The task work flags manipulation is inside RCU
already, and if the resize ring freeing is done post an RCU synchronize,
then there's no need to add locking to the fast path of task work
additions.
Note: this is only done for DEFER_TASKRUN, as that's the only setup mode
that supports ring resizing. If this ever changes, then they too need to
use the io_ctx_mark_taskrun() helper.
Link: https://lore.kernel.org/io-uring/20260309062759.482210-1-naup96721@gmail.com/
Cc: stable@vger.kernel.org
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reported-by: Hao-Yu Yang <naup96721@gmail.com>
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
clk-for-7.1
Merge DeviceTree bindings for Eliza global, rpmh, and tcsr clock
controllers through a topic branch, in case we need them in the
DeviceTree branch as well.
|
|
Add bindings documentation for TCSR Clock Controller for Eliza SoC.
Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260311-eliza-clocks-v6-2-453c4cf657a2@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Add bindings documentation for the Global Clock Controller on Qualcomm
Eliza SoC. Reuse the Milos bindings schema since the controller resources
are exactly the same, even though the controllers are incompatible between
them.
Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Abel Vesa <abel.vesa@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260311-eliza-clocks-v6-1-453c4cf657a2@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Pick up the hrtick related hrtimer changes so other unrelated changes can
be queued on top.
|
|
Add a reserved range and a driver callback to allow the driver to
have custom mmaps.
Generated mmap offsets are cookies and are not related to the size of
the mmap. Advance the mmap offset by the minimum, PAGE_SIZE, rather
than the size of the mmap.
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177308909972.1279894.15543003811821875042.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add a private data pointer to the ucontext structure and add
per-client pass-throughs.
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177325008318.52243.7367786996925601681.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Update the list of available link speeds. Fix comments.
Signed-off-by: Dean Luick <dean.luick@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Link: https://patch.msgid.link/177308908456.1279894.16723781060261360236.stgit@awdrv-04.cornelisnetworks.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
efi_range_is_wc() has no callers, so remove it.
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The AMD PMF driver provides realtime column utilization (npu_busy)
metrics for the NPU. Extend the DRM_IOCTL_AMDXDNA_GET_INFO sensor
query to expose these metrics to userspace.
Add AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION to the sensor type enum
and update aie2_get_sensors() to return both the total power and up
to 8 column utilization sensors if the user buffer permits.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
[lizhi: support legacy tool which uses small buffer. checkpatch cleanup]
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260311171842.473453-1-lizhi.hou@amd.com
|
|
HEAD
KVM generic changes for 7.0
- Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
unnecessary and confusing, triggered compiler warnings due to
-Wflex-array-member-not-at-end.
- Document that vcpu->mutex is take outside of kvm->slots_lock and
kvm->slots_arch_lock, which is intentional and desirable despite being
rather unintuitive.
|
|
This reverts commit 36d6cbb62133fc6eea28f380409e0fb190f3dfbe.
Calling this as a passthrough hypercall leaves the VM in an inconsistent
state. Revert before it is released.
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
A flex array can be used to reduce the allocation to 1.
And actually mtdconcat was using the pointer + 1 trick to point to the
overallocated area. Better alternatives exist.
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
Some USB devices incorrectly report bNumConfigurations as 0 in their
device descriptor, which causes the USB core to reject them during
enumeration.
logs:
usb 1-2: device descriptor read/64, error -71
usb 1-2: no configurations
usb 1-2: can't read configurations, error -22
However, these devices actually work correctly when
treated as having a single configuration.
Add a new quirk USB_QUIRK_FORCE_ONE_CONFIG to handle such devices.
When this quirk is set, assume the device has 1 configuration instead
of failing with -EINVAL.
This quirk is applied to the device with VID:PID 5131:2007 which
exhibits this behavior.
Signed-off-by: Jie Deng <dengjie03@kylinos.cn>
Link: https://patch.msgid.link/20260227084931.1527461-1-dengjie03@kylinos.cn
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The usb_control_msg(), usb_bulk_msg(), and usb_interrupt_msg() APIs in
usbcore allow unlimited timeout durations. And since they use
uninterruptible waits, this leaves open the possibility of hanging a
task for an indefinitely long time, with no way to kill it short of
unplugging the target device.
To prevent this sort of problem, enforce a maximum limit on the length
of these unkillable timeouts. The limit chosen here, somewhat
arbitrarily, is 60 seconds. On many systems (although not all) this
is short enough to avoid triggering the kernel's hung-task detector.
In addition, clear up the ambiguity of negative timeout values by
treating them the same as 0, i.e., using the maximum allowed timeout.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/linux-usb/3acfe838-6334-4f6d-be7c-4bb01704b33d@rowland.harvard.edu/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
CC: stable@vger.kernel.org
Link: https://patch.msgid.link/15fc9773-a007-47b0-a703-df89a8cf83dd@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The synchronous message API in usbcore (usb_control_msg(),
usb_bulk_msg(), and so on) uses uninterruptible waits. However,
drivers may call these routines in the context of a user thread, which
means it ought to be possible to at least kill them.
For this reason, introduce a new usb_bulk_msg_killable() function
which behaves the same as usb_bulk_msg() except for using
wait_for_completion_killable_timeout() instead of
wait_for_completion_timeout(). The same can be done later for
usb_control_msg() later on, if it turns out to be needed.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Suggested-by: Oliver Neukum <oneukum@suse.com>
Link: https://lore.kernel.org/linux-usb/3acfe838-6334-4f6d-be7c-4bb01704b33d@rowland.harvard.edu/
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
CC: stable@vger.kernel.org
Link: https://patch.msgid.link/248628b4-cc83-4e81-a620-3ce4e0376d41@rowland.harvard.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add support for responding to Sink Cap Extended msg request. To achieve
this, include parsing support for DT properties related to Sink Cap
Extended. The request for Sink Cap Ext is a control message while the
response is an extended message (chunked). As the Sink Caps Extended
Data Block size (24 Byte) is less than MaxExtendedMsgChunkLen (26 Byte),
a single chunk is sufficient to complete this AMS.
Supporting sink cap extended messages while responding to a
Get_Sink_Caps_Extended request when port is in Sink role is required in
order to be compliant with at least USB PD Rev3.1 Ver1.8.
Signed-off-by: Amit Sunil Dhamne <amitsd@google.com>
Reviewed-by: Badhri Jagan Sridharan <badhri@google.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Link: https://patch.msgid.link/20260223-skedb-v2-2-60675765bc7e@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add additional properties for ports supporting sink mode. The properties
define certain hardware and electrical properties such as sink load
step, sink load characteristics, sink compliance and charging adapter
Power Delivery Profile (PDP) for the connector. These properties need to
be defined for a Type-C port in compliance with the PD 3.1 spec.
Signed-off-by: Amit Sunil Dhamne <amitsd@google.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://patch.msgid.link/20260223-skedb-v2-1-60675765bc7e@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Elan touchscreens have a HID-battery device for the stylus which is always
there even if there is no stylus.
This is causing upower to report an empty battery for the stylus and some
desktop-environments will show a notification about this, which is quite
annoying.
Because of this the HID-battery is being ignored on all Elan I2c and USB
touchscreens, but this causes there to be no battery reporting for
the stylus at all.
This adds a new HID_BATTERY_QUIRK_DYNAMIC and uses these for the Elan
touchscreens.
This new quirks causes the present value of the battery to start at 0,
which will make userspace ignore it and only sets present to 1 after
receiving a battery input report which only happens when the stylus
gets in range.
Reported-by: ggrundik@gmail.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221118
Signed-off-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Reviewed-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
vdso/datapage.h includes a lot of headers which are not strictly necessary.
Some of those headers include architecture-specific vDSO headers which
prevent the usage of vdso/datapage.h in kernel code on architectures
without an vDSO. This would be useful however to write generic code using
IS_ENABLED(), for example in drivers/char/random.c.
Remove the unnecessary includes.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260227-vdso-header-cleanups-v2-13-35d60acf7410@linutronix.de
|
|
vdso/datapage.h is useful without pulling in the architecture-specific
gettimeofday() helpers.
Move the include to the only users which needs it.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260227-vdso-header-cleanups-v2-12-35d60acf7410@linutronix.de
|
|
The usage of cpu_relax() requires vdso/processor.h. Currently
this header is included transitively, but that transitive inclusion is
about to go away.
Explicitly include the header.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260227-vdso-header-cleanups-v2-11-35d60acf7410@linutronix.de
|
|
All callers of vdso_read_retry() test its return value with unlikely().
Move the unlikely into the helper to make the code easier to read.
This is equivalent to the retry function of non-vDSO seqlocks.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260227-vdso-cleanups-v1-4-c848b4bc4850@linutronix.de
|
|
Currently this logic is duplicate multiple times.
Add a helper for it to make the code more readable.
[ bp: Add a missing clocksource.h include, see
https://lore.kernel.org/r/20260311113435-f72f81d8-33a6-4a0f-bd80-4997aad068cc@linutronix.de ]
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://patch.msgid.link/20260227-vdso-cleanups-v1-3-c848b4bc4850@linutronix.de
|
|
Flags are boolean values, hence they should be typed
as bool, not as u8.
Signed-off-by: Oliver Neukum <oneukum@suse.com>
Link: https://patch.msgid.link/20260210122208.29244-1-oneukum@suse.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Chasing vfork()'ed tasks on a CID ownership mode switch requires a full
task list walk, which is obviously expensive on large systems.
Avoid that by keeping a list of tasks using a mm MMCID entity in mm::mm_cid
and walk this list instead. This removes the proven to be flaky counting
logic and avoids a full task list walk in the case of vfork()'ed tasks.
Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260310202526.183824481@kernel.org
|
|
A newly forked task is accounted as MMCID user before the task is visible
in the process' thread list and the global task list. This creates the
following problem:
CPU1 CPU2
fork()
sched_mm_cid_fork(tnew1)
tnew1->mm.mm_cid_users++;
tnew1->mm_cid.cid = getcid()
-> preemption
fork()
sched_mm_cid_fork(tnew2)
tnew2->mm.mm_cid_users++;
// Reaches the per CPU threshold
mm_cid_fixup_tasks_to_cpus()
for_each_other(current, p)
....
As tnew1 is not visible yet, this fails to fix up the already allocated CID
of tnew1. As a consequence a subsequent schedule in might fail to acquire a
(transitional) CID and the machine stalls.
Move the invocation of sched_mm_cid_fork() after the new task becomes
visible in the thread and the task list to prevent this.
This also makes it symmetrical vs. exit() where the task is removed as CID
user before the task is removed from the thread and task lists.
Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260310202525.969061974@kernel.org
|
|
Requested by Maxime Ripard for drm-misc-next because renesas people need
fb797a70108f ("drm: renesas: rz-du: mipi_dsi: Set DSI divider").
Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
|
|
Move the get/put/ref/flush_for_display calls to the display parent
interface.
For i915, move the hooks next to the other i915 core frontbuffer code in
i915_gem_object_frontbuffer.c. For xe, add new file xe_frontbuffer.c for
the same.
Note: The intel_frontbuffer_flush() calls from
i915_gem_object_frontbuffer.c will partially route back to i915 core via
the parent interface. This is less than stellar.
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Link: https://patch.msgid.link/f69b967ed82bbcfd60ffa77ba197b26a1399f09f.1772475391.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Fix the copy-paste fail.
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Link: https://patch.msgid.link/0209e128312520ca1c6a0c39f9dfb0184125322a.1772475391.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
namespace aware clock
Currently there are three different open-coded variants of a time
namespace aware variant of vdso_read_begin(). They make the code hard to
read and introduce an inconsistency, as only the first copy uses
unlikely().
Split the code into a shared helper function.
Move that next to the definition of the regular vdso_read_begin(), so
that any future changes can be kept in sync easily.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260227-vdso-cleanups-v1-2-c848b4bc4850@linutronix.de
|
|
After sparc64, there are no remaining users of ARCH_CLOCKSOURCE_DATA
and it can just be removed.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Andreas Larsson <andreas@gaisler.com>
Reviewed-by: Andreas Larsson <andreas@gaisler.com>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260304-vdso-sparc64-generic-2-v6-14-d8eb3b0e1410@linutronix.de
[Thomas: drop sparc64 bits from the patch]
|
|
Allocating the data pages as part of the kernel image does not work on
SPARC. The MMU will raise a fault when userspace tries to access them.
Allocate the data pages through the page allocator instead.
Unused pages in the vDSO VMA are still allocated to keep the virtual
addresses aligned. Switch the mapping from PFNs to 'struct page' as that is
required for dynamically allocated pages. This also aligns the allocation
of the datapages with the code pages and is a prerequisite for mlockall()
support.
VM_MIXEDMAP is necessary for the call to vmf_insert_page() in the timens
prefault path to work.
The data pages need to be order-0, non-compound pages so that the mapping
to userspace and the different orderings work.
These pages are also used by the timekeeping, random pool and architecture
initialization code. Some of these are running before the page allocator is
available. To keep these subsytems working without changes, introduce
early, statically data storage which will then replaced by the real one as
soon as that is available.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org>
Link: https://patch.msgid.link/20260304-vdso-sparc64-generic-2-v6-3-d8eb3b0e1410@linutronix.de
|
|
The value of __BITS_PER_LONG from architecture-specific logic should
always match the generic one if that is available. It should also match
the actual C type 'long'.
Mismatches can happen for example when building the compat vDSO. Either
during the compilation, see commit 9a6d3ff10f7f ("arm64: uapi: Provide
correct __BITS_PER_LONG for the compat vDSO"), or when running sparse
when mismatched CHECKFLAGS are inherited from the kernel build.
Add some consistency checks which detect such issues early and clearly.
The kernel-internal BITS_PER_LONG is not checked as it is derived from
CONFIG_64BIT and therefore breaks for the compat vDSO. See the similar,
deactivated check above.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260302-vdso-compat-checkflags-v2-5-78e55baa58ba@linutronix.de
|
|
Remove the "[]" array indicators from the struct member descriptions
to avoid kernel-doc warnings.
Warning: include/vdso/datapage.h:107 struct member 'basetime' not
described in 'vdso_clock'
Warning: include/vdso/datapage.h:107 struct member 'offset' not described
in 'vdso_clock'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260228071711.2663851-1-rdunlap@infradead.org
|
|
GPU use-cases for mmu_interval_notifiers with hmm often involve
starting a gpu operation and then waiting for it to complete.
These operations are typically context preemption or TLB flushing.
With single-pass notifiers per GPU this doesn't scale in
multi-gpu scenarios. In those scenarios we'd want to first start
preemption- or TLB flushing on all GPUs and as a second pass wait
for them to complete.
One can do this on per-driver basis multiplexing per-driver
notifiers but that would mean sharing the notifier "user" lock
across all GPUs and that doesn't scale well either, so adding support
for multi-pass in the core appears to be the right choice.
Implement two-pass capability in the mmu_interval_notifier. Use a
linked list for the final passes to minimize the impact for
use-cases that don't need the multi-pass functionality by avoiding
a second interval tree walk, and to be able to easily pass data
between the two passes.
v1:
- Restrict to two passes (Jason Gunthorpe)
- Improve on documentation (Jason Gunthorpe)
- Improve on function naming (Alistair Popple)
v2:
- Include the invalidate_finish() callback in the
struct mmu_interval_notifier_ops.
- Update documentation (GitHub Copilot:claude-sonnet-4.6)
- Use lockless list for list management.
v3:
- Update kerneldoc for the struct mmu_interval_notifier_finish::list member
(Matthew Brost)
- Add a WARN_ON_ONCE() checking for NULL invalidate_finish() op if
if invalidate_start() is non-NULL. (Matthew Brost)
v4:
- Addressed documentation review comments by David Hildenbrand.
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Simona Vetter <simona.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: <dri-devel@lists.freedesktop.org>
Cc: <linux-mm@kvack.org>
Cc: <linux-kernel@vger.kernel.org>
Assisted-by: GitHub Copilot:claude-sonnet-4.6 # Documentation only.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/20260305093909.43623-2-thomas.hellstrom@linux.intel.com
|
|
Merge branch adding support for device-managed workqueue allocation,
which allows cleaning up a couple of power-supply drivers.
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
Add documentation covering stmmac_clk, pclk, clk_ptp_ref and clk_tx_i
in the hope that this will help understand what each of these clocks
are for.
There is confusion around stmmac_clk and pclk which can't be easily
resolved today as the Imagination Technologies Pistachio board that
pclk was introduced for has no public documentation and is likely now
obsolete. So the origins of pclk are lost to the winds of time.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vzX5Z-0000000CVsb-1XTm@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|