summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2026-01-22net: bonding: move bond_should_notify_peers, e.g. into rtnl lock blockTonghao Zhang
This patch tries to avoid the possible peer notify event loss. In bond_mii_monitor()/bond_activebackup_arp_mon(), when we hold the rtnl lock: - check send_peer_notif again to avoid unconditionally reducing this value. - send_peer_notif may have been reset. Therefore, it is necessary to check whether to send peer notify via bond_should_notify_peers() to avoid the loss of notification events. Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Hangbin Liu <liuhangbin@gmail.com> Cc: Jason Xing <kerneljasonxing@gmail.com> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/78cef328822b94638c97638b89011c507b8bf19e.1768709239.git.tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-22net: bonding: use workqueue to make sure peer notify updated in lacp modeTonghao Zhang
The rtnl lock might be locked, preventing ad_cond_set_peer_notif() from acquiring the lock and updating send_peer_notif. This patch addresses the issue by using a workqueue. Since updating send_peer_notif does not require high real-time performance, such delayed updates are entirely acceptable. In fact, checking this value and using it in multiple places, all operations are protected at the same time by rtnl lock, such as - read send_peer_notif - send_peer_notif-- - bond_should_notify_peers By the way, rtnl lock is still required, when accessing bond.params.* for updating send_peer_notif. In lacp mode, resetting send_peer_notif in workqueue is safe, simple and effective way. Additionally, this patch introduces bond_peer_notify_may_events(), which is used to check whether an event should be sent. This function will be used in both patch 1 and 2. Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Simon Horman <horms@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Hangbin Liu <liuhangbin@gmail.com> Cc: Jason Xing <kerneljasonxing@gmail.com> Suggested-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Tonghao Zhang <tonghao@bamaicloud.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/f95accb5db0b10ce3ed2f834fc70f716c9abbb9c.1768709239.git.tonghao@bamaicloud.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-22arm64: Unconditionally enable PAN supportMarc Zyngier
FEAT_PAN has been around since ARMv8.1 (over 11 years ago), has no compiler dependency (we have our own accessors), and is a great security benefit. Drop CONFIG_ARM64_PAN, and make the support unconditionnal. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22arm64: Unconditionally enable LSE supportMarc Zyngier
LSE atomics have been in the architecture since ARMv8.1 (released in 2014), and are hopefully supported by all modern toolchains. Drop the optional nature of LSE support in the kernel, and always compile the support in, as this really is very little code. LL/SC still is the default, and the switch to LSE is done dynamically. Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22sched: remove task_struct->faults_disabled_mappingChristoph Hellwig
This reverts commit 2b69987be575 ("sched: Add task_struct->faults_disabled_mapping"), which added this field without review or maintainer signoff. With bcachefs removed from the tree it is also unused now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260122085223.487092-1-hch@lst.de
2026-01-22sched: Update rq->avg_idle when a task is moved to an idle CPUShubhang Kaushik
Currently, rq->idle_stamp is only used to calculate avg_idle during wakeups. This means other paths that move a task to an idle CPU such as fork/clone, execve, or migrations, do not end the CPU's idle status in the scheduler's eyes, leading to an inaccurate avg_idle. This patch introduces update_rq_avg_idle() to provide a more accurate measurement of CPU idle duration. By invoking this helper in put_prev_task_idle(), we ensure avg_idle is updated whenever a CPU stops being idle, regardless of how the new task arrived. Testing on an 80-core Ampere Altra (ARMv8) with 6.19-rc5 baseline: - Hackbench : +7.2% performance gain at 16 threads. - Schbench: Reduced p99.9 tail latencies at high concurrency. Signed-off-by: Shubhang Kaushik <shubhang@os.amperecomputing.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Shubhang Kaushik <shubhang@os.amperecomputing.com> Link: https://patch.msgid.link/20260121-v8-patch-series-v8-1-b7f1cbee5055@os.amperecomputing.com
2026-01-22selftests/rseq: Add rseq slice histogram scriptPeter Zijlstra
A script that processes trace-cmd data and generates a histogram of rseq slice_ext durations for the recorded workload. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260121143208.340549136@infradead.org
2026-01-22hrtimer: Fix trace oddityThomas Gleixner
It turns out that __run_hrtimer() will trace like: <idle>-0 [032] d.h2. 20705.474563: hrtimer_cancel: hrtimer=0xff2db8f77f8226e8 <idle>-0 [032] d.h1. 20705.474563: hrtimer_expire_entry: hrtimer=0xff2db8f77f8226e8 now=20699452001850 function=tick_nohz_handler/0x0 Which is a bit nonsensical, the timer doesn't get canceled on expiration. The cause is the use of the incorrect debug helper. Fixes: c6a2a1770245 ("hrtimer: Add tracepoint for hrtimers") Reported-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260121143208.219595606@infradead.org
2026-01-22rseq: Lower default slice extensionPeter Zijlstra
Change the minimum slice extension to 5 usec. Since slice_test selftest reaches a staggering ~350 nsec extension: Task: slice_test Mean: 350.266 ns Latency (us) | Count ------------------------------ EXPIRED | 238 0 us | 143189 1 us | 167 2 us | 26 3 us | 11 4 us | 28 5 us | 31 6 us | 22 7 us | 23 8 us | 32 9 us | 16 10 us | 35 Lower the minimal (and default) value to 5 usecs -- which is still massive. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260121143208.073200729@infradead.org
2026-01-22rseq: Move slice_ext_nsec to debugfsPeter Zijlstra
Move changing the slice ext duration to debugfs, a sliglty less permanent interface. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260121143207.923520192@infradead.org
2026-01-22rseq: Allow registering RSEQ with slice extensionPeter Zijlstra
Since glibc cares about the number of syscalls required to initialize a new thread, allow initializing rseq with slice extension on. This avoids having to do another prctl(). Requested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260121143207.814193010@infradead.org
2026-01-22selftests/rseq: Implement time slice extension testThomas Gleixner
Provide an initial test case to evaluate the functionality. This needs to be extended to cover the ABI violations and expose the race condition between observing granted and arriving in rseq_slice_yield(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251215155709.320325431@linutronix.de
2026-01-22entry: Hook up rseq time slice extensionThomas Gleixner
Wire the grant decision function up in exit_to_user_mode_loop() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251215155709.258157362@linutronix.de
2026-01-22rseq: Implement rseq_grant_slice_extension()Thomas Gleixner
Provide the actual decision function, which decides whether a time slice extension is granted in the exit to user mode path when NEED_RESCHED is evaluated. The decision is made in two stages. First an inline quick check to avoid going into the actual decision function. This checks whether: #1 the functionality is enabled #2 the exit is a return from interrupt to user mode #3 any TIF bit, which causes extra work is set. That includes TIF_RSEQ, which means the task was already scheduled out. The slow path, which implements the actual user space ABI, is invoked when: A) #1 is true, #2 is true and #3 is false It checks whether user space requested a slice extension by setting the request bit in the rseq slice_ctrl field. If so, it grants the extension and stores the slice expiry time, so that the actual exit code can double check whether the slice is already exhausted before going back. B) #1 - #3 are true _and_ a slice extension was granted in a previous loop iteration In this case the grant is revoked. In case that the user space access faults or invalid state is detected, the task is terminated with SIGSEGV. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251215155709.195303303@linutronix.de
2026-01-22rseq: Reset slice extension when scheduledThomas Gleixner
When a time slice extension was granted in the need_resched() check on exit to user space, the task can still be scheduled out in one of the other pending work items. When it gets scheduled back in, and need_resched() is not set, then the stale grant would be preserved, which is just wrong. RSEQ already keeps track of that and sets TIF_RSEQ, which invokes the critical section and ID update mechanisms. Utilize them and clear the user space slice control member of struct rseq unconditionally within the existing user access sections. That's just an unconditional store more in that path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251215155709.131081527@linutronix.de
2026-01-22rseq: Implement time slice extension enforcement timerThomas Gleixner
If a time slice extension is granted and the reschedule delayed, the kernel has to ensure that user space cannot abuse the extension and exceed the maximum granted time. It was suggested to implement this via the existing hrtick() timer in the scheduler, but that turned out to be problematic for several reasons: 1) It creates a dependency on CONFIG_SCHED_HRTICK, which can be disabled independently of CONFIG_HIGHRES_TIMERS 2) HRTICK usage in the scheduler can be runtime disabled or is only used for certain aspects of scheduling. 3) The function is calling into the scheduler code and that might have unexpected consequences when this is invoked due to a time slice enforcement expiry. Especially when the task managed to clear the grant via sched_yield(0). It would be possible to address #2 and #3 by storing state in the scheduler, but that is extra complexity and fragility for no value. Implement a dedicated per CPU hrtimer instead, which is solely used for the purpose of time slice enforcement. The timer is armed when an extension was granted right before actually returning to user mode in rseq_exit_to_user_mode_restart(). It is disarmed, when the task relinquishes the CPU. This is expensive as the timer is probably the first expiring timer on the CPU, which means it has to reprogram the hardware. But that's less expensive than going through a full hrtimer interrupt cycle for nothing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251215155709.068329497@linutronix.de
2026-01-22rseq: Implement syscall entry work for time slice extensionsThomas Gleixner
The kernel sets SYSCALL_WORK_RSEQ_SLICE when it grants a time slice extension. This allows to handle the rseq_slice_yield() syscall, which is used by user space to relinquish the CPU after finishing the critical section for which it requested an extension. In case the kernel state is still GRANTED, the kernel resets both kernel and user space state with a set of sanity checks. If the kernel state is already cleared, then this raced against the timer or some other interrupt and just clears the work bit. Doing it in syscall entry work allows to catch misbehaving user space, which issues an arbitrary syscall, i.e. not rseq_slice_yield(), from the critical section. Contrary to the initial strict requirement to use rseq_slice_yield() arbitrary syscalls are not considered a violation of the ABI contract anymore to allow onion architecture applications, which cannot control the code inside a critical section, to utilize this as well. If the code detects inconsistent user space that result in a SIGSEGV for the application. If the grant was still active and the task was not preempted yet, the work code reschedules immediately before continuing through the syscall. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251215155709.005777059@linutronix.de
2026-01-22rseq: Implement sys_rseq_slice_yield()Thomas Gleixner
Provide a new syscall which has the only purpose to yield the CPU after the kernel granted a time slice extension. sched_yield() is not suitable for that because it unconditionally schedules, but the end of the time slice extension is not required to schedule when the task was already preempted. This also allows to have a strict check for termination to catch user space invoking random syscalls including sched_yield() from a time slice extension region. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20251215155708.929634896@linutronix.de
2026-01-22rseq: Add prctl() to enable time slice extensionsThomas Gleixner
Implement a prctl() so that tasks can enable the time slice extension mechanism. This fails, when time slice extensions are disabled at compile time or on the kernel command line and when no rseq pointer is registered in the kernel. That allows to implement a single trivial check in the exit to user mode hotpath, to decide whether the whole mechanism needs to be invoked. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251215155708.858717691@linutronix.de
2026-01-22rseq: Add statistics for time slice extensionsThomas Gleixner
Extend the quick statistics with time slice specific fields. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251215155708.795202254@linutronix.de
2026-01-22rseq: Provide static branch for time slice extensionsThomas Gleixner
Guard the time slice extension functionality with a static key, which can be disabled on the kernel command line. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251215155708.733429292@linutronix.de
2026-01-22rseq: Add fields and constants for time slice extensionThomas Gleixner
Aside of a Kconfig knob add the following items: - Two flag bits for the rseq user space ABI, which allow user space to query the availability and enablement without a syscall. - A new member to the user space ABI struct rseq, which is going to be used to communicate request and grant between kernel and user space. - A rseq state struct to hold the kernel state of this - Documentation of the new mechanism Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251215155708.669472597@linutronix.de
2026-01-22sched/debug: Convert copy_from_user() + kstrtouint() to kstrtouint_from_user()Fushuai Wang
Using kstrtouint_from_user() instead of copy_from_user() + kstrtouint() makes the code simpler and less error-prone. Suggested-by: Yury Norov <ynorov@nvidia.com> Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Yury Norov <ynorov@nvidia.com> Link: https://patch.msgid.link/20260117145615.53455-2-fushuai.wang@linux.dev
2026-01-22arm64/fpsimd: signal: Fix restoration of SVE contextMark Rutland
When SME is supported, Restoring SVE signal context can go wrong in a few ways, including placing the task into an invalid state where the kernel may read from out-of-bounds memory (and may potentially take a fatal fault) and/or may kill the task with a SIGKILL. (1) Restoring a context with SVE_SIG_FLAG_SM set can place the task into an invalid state where SVCR.SM is set (and sve_state is non-NULL) but TIF_SME is clear, consequently resuting in out-of-bounds memory reads and/or killing the task with SIGKILL. This can only occur in unusual (but legitimate) cases where the SVE signal context has either been modified by userspace or was saved in the context of another task (e.g. as with CRIU), as otherwise the presence of an SVE signal context with SVE_SIG_FLAG_SM implies that TIF_SME is already set. While in this state, task_fpsimd_load() will NOT configure SMCR_ELx (leaving some arbitrary value configured in hardware) before restoring SVCR and attempting to restore the streaming mode SVE registers from memory via sve_load_state(). As the value of SMCR_ELx.LEN may be larger than the task's streaming SVE vector length, this may read memory outside of the task's allocated sve_state, reading unrelated data and/or triggering a fault. While this can result in secrets being loaded into streaming SVE registers, these values are never exposed. As TIF_SME is clear, fpsimd_bind_task_to_cpu() will configure CPACR_ELx.SMEN to trap EL0 accesses to streaming mode SVE registers, so these cannot be accessed directly at EL0. As fpsimd_save_user_state() verifies the live vector length before saving (S)SVE state to memory, no secret values can be saved back to memory (and hence cannot be observed via ptrace, signals, etc). When the live vector length doesn't match the expected vector length for the task, fpsimd_save_user_state() will send a fatal SIGKILL signal to the task. Hence the task may be killed after executing userspace for some period of time. (2) Restoring a context with SVE_SIG_FLAG_SM clear does not clear the task's SVCR.SM. If SVCR.SM was set prior to restoring the context, then the task will be left in streaming mode unexpectedly, and some register state will be combined inconsistently, though the task will be left in legitimate state from the kernel's PoV. This can only occur in unusual (but legitimate) cases where ptrace has been used to set SVCR.SM after entry to the sigreturn syscall, as syscall entry clears SVCR.SM. In these cases, the the provided SVE register data will be loaded into the task's sve_state using the non-streaming SVE vector length and the FPSIMD registers will be merged into this using the streaming SVE vector length. Fix (1) by setting TIF_SME when setting SVCR.SM. This also requires ensuring that the task's sme_state has been allocated, but as this could contain live ZA state, it should not be zeroed. Fix (2) by clearing SVCR.SM when restoring a SVE signal context with SVE_SIG_FLAG_SM clear. For consistency, I've pulled the manipulation of SVCR, TIF_SVE, TIF_SME, and fp_type earlier, immediately after the allocation of sve_state/sme_state, before the restore of the actual register state. This makes it easier to ensure that these are always modified consistently, even if a fault is taken while reading the register data from the signal context. I do not expect any software to depend on the exact state restored when a fault is taken while reading the context. Fixes: 85ed24dad290 ("arm64/sme: Implement streaming SVE signal handling") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-01-22arm64/fpsimd: signal: Allocate SSVE storage when restoring ZAMark Rutland
The code to restore a ZA context doesn't attempt to allocate the task's sve_state before setting TIF_SME. Consequently, restoring a ZA context can place a task into an invalid state where TIF_SME is set but the task's sve_state is NULL. In legitimate but uncommon cases where the ZA signal context was NOT created by the kernel in the context of the same task (e.g. if the task is saved/restored with something like CRIU), we have no guarantee that sve_state had been allocated previously. In these cases, userspace can enter streaming mode without trapping while sve_state is NULL, causing a later NULL pointer dereference when the kernel attempts to store the register state: | # ./sigreturn-za | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 | Mem abort info: | ESR = 0x0000000096000046 | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x06: level 2 translation fault | Data abort info: | ISV = 0, ISS = 0x00000046, ISS2 = 0x00000000 | CM = 0, WnR = 1, TnD = 0, TagAccess = 0 | GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 | user pgtable: 4k pages, 52-bit VAs, pgdp=0000000101f47c00 | [0000000000000000] pgd=08000001021d8403, p4d=0800000102274403, pud=0800000102275403, pmd=0000000000000000 | Internal error: Oops: 0000000096000046 [#1] SMP | Modules linked in: | CPU: 0 UID: 0 PID: 153 Comm: sigreturn-za Not tainted 6.19.0-rc1 #1 PREEMPT | Hardware name: linux,dummy-virt (DT) | pstate: 214000c9 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--) | pc : sve_save_state+0x4/0xf0 | lr : fpsimd_save_user_state+0xb0/0x1c0 | sp : ffff80008070bcc0 | x29: ffff80008070bcc0 x28: fff00000c1ca4c40 x27: 63cfa172fb5cf658 | x26: fff00000c1ca5228 x25: 0000000000000000 x24: 0000000000000000 | x23: 0000000000000000 x22: fff00000c1ca4c40 x21: fff00000c1ca4c40 | x20: 0000000000000020 x19: fff00000ff6900f0 x18: 0000000000000000 | x17: fff05e8e0311f000 x16: 0000000000000000 x15: 028fca8f3bdaf21c | x14: 0000000000000212 x13: fff00000c0209f10 x12: 0000000000000020 | x11: 0000000000200b20 x10: 0000000000000000 x9 : fff00000ff69dcc0 | x8 : 00000000000003f2 x7 : 0000000000000001 x6 : fff00000c1ca5b48 | x5 : fff05e8e0311f000 x4 : 0000000008000000 x3 : 0000000000000000 | x2 : 0000000000000001 x1 : fff00000c1ca5970 x0 : 0000000000000440 | Call trace: | sve_save_state+0x4/0xf0 (P) | fpsimd_thread_switch+0x48/0x198 | __switch_to+0x20/0x1c0 | __schedule+0x36c/0xce0 | schedule+0x34/0x11c | exit_to_user_mode_loop+0x124/0x188 | el0_interrupt+0xc8/0xd8 | __el0_irq_handler_common+0x18/0x24 | el0t_64_irq_handler+0x10/0x1c | el0t_64_irq+0x198/0x19c | Code: 54000040 d51b4408 d65f03c0 d503245f (e5bb5800) | ---[ end trace 0000000000000000 ]--- Fix this by having restore_za_context() ensure that the task's sve_state is allocated, matching what we do when taking an SME trap. Any live SVE/SSVE state (which is restored earlier from a separate signal context) must be preserved, and hence this is not zeroed. Fixes: 39782210eb7e ("arm64/sme: Implement ZA signal handling") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-01-22arm64/fpsimd: ptrace: Fix SVE writes on !SME systemsMark Rutland
When SVE is supported but SME is not supported, a ptrace write to the NT_ARM_SVE regset can place the tracee into an invalid state where (non-streaming) SVE register data is stored in FP_STATE_SVE format but TIF_SVE is clear. This can result in a later warning from fpsimd_restore_current_state(), e.g. WARNING: CPU: 0 PID: 7214 at arch/arm64/kernel/fpsimd.c:383 fpsimd_restore_current_state+0x50c/0x748 When this happens, fpsimd_restore_current_state() will set TIF_SVE, placing the task into the correct state. This occurs before any other check of TIF_SVE can possibly occur, as other checks of TIF_SVE only happen while the FPSIMD/SVE/SME state is live. Thus, aside from the warning, there is no functional issue. This bug was introduced during rework to error handling in commit: 9f8bf718f2923 ("arm64/fpsimd: ptrace: Gracefully handle errors") ... where the setting of TIF_SVE was moved into a block which is only executed when system_supports_sme() is true. Fix this by removing the system_supports_sme() check. This ensures that TIF_SVE is set for (SVE-formatted) writes to NT_ARM_SVE, at the cost of unconditionally manipulating the tracee's saved svcr value. The manipulation of svcr is benign and inexpensive, and we already do similar elsewhere (e.g. during signal handling), so I don't think it's worth guarding this with system_supports_sme() checks. Aside from the above, there is no functional change. The 'type' argument to sve_set_common() is only set to ARM64_VEC_SME (in ssve_set())) when system_supports_sme(), so the ARM64_VEC_SME case in the switch statement is still unreachable when !system_supports_sme(). When CONFIG_ARM64_SME=n, the only caller of sve_set_common() is sve_set(), and the compiler can constant-fold for the case where type is ARM64_VEC_SVE, removing the logic for other cases. Reported-by: syzbot+d4ab35af21e99d07ce67@syzkaller.appspotmail.com Fixes: 9f8bf718f292 ("arm64/fpsimd: ptrace: Gracefully handle errors") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-01-22Merge tag 'apple-soc-dt-6.20' of ↵Krzysztof Kozlowski
https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux into soc/dt Apple SoC DT update for 6.20 - Add all required nodes and connections for USB3 support. This is responsible for the majority of the diffstat. The dt-bindings for the Type-C PHY are scheduled to be sent via the PHY tree and are already in next. - Add RTC subnodes to the System Management Controller - Add chassis-type property for all M1 and M2 machines - Fix some minor power management issues - Add backlight nodes for the A9X-based iPad Pro * tag 'apple-soc-dt-6.20' of https://git.kernel.org/pub/scm/linux/kernel/git/sven/linux: arm64: dts: apple: t60xx: Add nodes for integrated USB Type-C ports arm64: dts: apple: t8112: Add nodes for integrated USB Type-C ports arm64: dts: apple: t8103: Add nodes for integrated USB Type-C ports arm64: dts: apple: t8103: Add ps_pmp dependency to ps_gfx arm64: dts: apple: t8103: Mark ATC USB AON domains as always-on arm64: dts: apple: t8112-j473: Keep the HDMI port powered on arm64: dts: apple: Add chassis-type property for Apple iMacs arm64: dts: apple: Add chassis-type property for Mac Pro arm64: dts: apple: Add chassis-type property for Apple desktop devices arm64: dts: apple: Add chassis-type property for all Macbooks arm64: dts: apple: s8001: Add DWI backlight for J98a, J99a arm64: dts: apple: t8103,t60xx,t8112: Add SMC RTC node Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-01-22Merge tag 'lpc32xx-dt-for-6.20' of ↵Krzysztof Kozlowski
https://github.com/vzapolskiy/linux-lpc32xx into soc/dt ARM: nxp: lpc: device tree updates for v6.20 This pull request contains device tree changes for ARM NXP LPC32xx intended for v6.20, please pull the following: - Frank fixes device tree checker warnings reported for NXP LPC32xx boards, - Piotr addes a DMA mux block under SCB, DMA properties to controllers and I2S support for NXP LPC32xx, - Kuldeep corrects values of PrimeCell PL022 'clocks' and 'clock-names' properties, this is the change from a waiting queue, recently it was repeatedly done by Frank, the hesitation was about a probable ABI break, but here in particular the risk is practically negligible due to the kept backwards compatibale 'clocks' property, - Vladimir adds a few missing properties to a number of LPC32xx controllers. * tag 'lpc32xx-dt-for-6.20' of https://github.com/vzapolskiy/linux-lpc32xx: arm: dts: lpc32xx: add interrupts property to Motor Control PWM arm: dts: lpc32xx: add clocks property to Motor Control PWM device tree node ARM: dts: lpc32xx: Add missing properties to I2S device tree nodes ARM: dts: lpc32xx: Declare the second AHB master support on PL080 DMA controller ARM: dts: lpc32xx: Add missing DMA properties ARM: dts: lpc32xx: Use syscon for system control block ARM: dts: lpc32xx: describe FLASH_INT of SLC NAND controller ARM: dts: lpc32xx: change NAND controllers node names ARM: dts: lpc32xx: Update spi clock properties ARM: dts: lpc3250-phy3250: replace deprecated at25 properties with new ones ARM: dts: lpc3250-phy3250: rename nodename at@0 to eeprom@0 ARM: dts: lpc3250-ea3250: add key- prefix for gpio-keys ARM: dts: lpc32xx: remove usb bus and elevate all children nodes Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-01-22Merge tag 'aspeed-6.20-devicetree-1' of ↵Krzysztof Kozlowski
https://git.kernel.org/pub/scm/linux/kernel/git/bmc/linux into soc/dt aspeed: second batch of arm devicetree changes for 6.20 New platforms: - Facebook Anacapa The Meta Anacapa BMC is the DC-SCM (Data Center Secure Control Module) controller for the Meta OCP Open Rack Wide (ORW) compute tray. This platform is a key component of the AMD Helios AI rack reference design system, designed for next-generation AI workloads. The BMC utilizes the Aspeed AST2600 SoC to manage the compute tray, which contains up to 4 AMD Instinct MI450 Series GPUs (connected via a Broadcom OCP NIC) and host CPUs. Its primary role is to provide essential system control, power sequencing, and telemetry reporting for the compute complex via the OpenBMC software stack. For more detail on the AMD Helios reference design: https://www.amd.com/en/blogs/2025/amd-helios-ai-rack-built-on-metas-2025-ocp-design.html - ASRock Rack ALTRAD8 The ALTRAD8 BMC is an Aspeed AST2500-based BMC for the ASRock Rack ALTRAD8UD-1L2T and ALTRAD8UD2-1L2Q boards. Significant changes: - Switch IBM FSI CFAM nodes to use non-deprecated AT25 properties Updated platforms: - bletchley (Facebook): USB-C tweaks * tag 'aspeed-6.20-devicetree-1' of https://git.kernel.org/pub/scm/linux/kernel/git/bmc/linux: ARM: dts: aspeed: ibm: Use non-deprecated AT25 properties ARM: dts: aspeed: add device tree for ASRock Rack ALTRAD8 BMC dt-bindings: arm: aspeed: add ASRock Rack ALTRAD8 board ARM: dts: aspeed: bletchley: Remove try-power-role from connectors ARM: dts: aspeed: Add Facebook Anacapa platform dt-bindings: arm: aspeed: Add compatible for Facebook Anacapa BMC Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-01-22Merge tag 'nuvoton-arm64-6.20-devicetree-0' of ↵Krzysztof Kozlowski
https://git.kernel.org/pub/scm/linux/kernel/git/bmc/linux into soc/dt Nuvoton arm64 devicetree changes for 6.20 Just the one patch from Rob adding the device_type property to the memory node of the NPCM845 EVB DTS. * tag 'nuvoton-arm64-6.20-devicetree-0' of https://git.kernel.org/pub/scm/linux/kernel/git/bmc/linux: arm64: dts: nuvoton: Add missing "device_type" property on memory node Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-01-22drm/xe: Use DRM_BUDDY_CONTIGUOUS_ALLOCATION for contiguous allocationsSanjay Yadav
The VRAM/stolen memory managers do not currently set DRM_BUDDY_CONTIGUOUS_ALLOCATION for contiguous allocations. Enabling this flag activates the buddy allocator's try_harder path, which helps handle fragmented memory scenarios. This enables the __alloc_contig_try_harder fallback in the buddy allocator, allowing contiguous allocation requests to succeed even when memory is fragmented by combining allocations from both(RHS and LHS) sides of a large free block. v2: (Matt B) - Remove redundant logic for rounding allocation size and trimming when TTM_PL_FLAG_CONTIGUOUS is set, since drm_buddy now handles this when DRM_BUDDY_CONTIGUOUS_ALLOCATION is enabled Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6713 Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260121111416.3104399-2-sanjay.kumar.yadav@intel.com
2026-01-22drm, drm/xe: Fix xe userptr in the absence of CONFIG_DEVICE_PRIVATEThomas Hellström
CONFIG_DEVICE_PRIVATE is not selected by default by some distros, for example Fedora, and that leads to a regression in the xe driver since userptr support gets compiled out. It turns out that DRM_GPUSVM, which is needed for xe userptr support compiles also without CONFIG_DEVICE_PRIVATE, but doesn't compile without CONFIG_ZONE_DEVICE. Exclude the drm_pagemap files from compilation with !CONFIG_ZONE_DEVICE, and remove the CONFIG_DEVICE_PRIVATE dependency from CONFIG_DRM_GPUSVM and the xe driver's selection of it, re-enabling xe userptr for those configs. v2: - Don't compile the drm_pagemap files unless CONFIG_ZONE_DEVICE is set. - Adjust the drm_pagemap.h header accordingly. Fixes: 9e9787414882 ("drm/xe/userptr: replace xe_hmm with gpusvm") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.18+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/20260121091048.41371-2-thomas.hellstrom@linux.intel.com (cherry picked from commit 1e372b246199ca7a35f930177fea91b557dac16e) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2026-01-22PCI: dwc: Use multiple iATU windows for mapping large bridge windows and DMA ↵Samuel Holland
ranges The DWC driver tries to use a single iATU region for mapping the individual entries of the bridge window and DMA range. If a bridge window/DMA range is larger than the iATU inbound/outbound window size, then the mapping will fail. Hence, avoid this failure by using multiple iATU windows to map the whole region. If the region runs out of iATU windows, then return failure. Signed-off-by: Charles Mirabile <cmirabil@redhat.com> Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Co-developed-by: Randolph Lin <randolph@andestech.com> Signed-off-by: Randolph Lin <randolph@andestech.com> [mani: reworded description, minor code cleanup] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Frank Li <Frank.Li@nxp.com> Acked-by: Charles Mirabile <cmirabil@redhat.com> Link: https://patch.msgid.link/20260109113430.2767264-1-randolph@andestech.com
2026-01-22arm64: dts: marvell: Add SoC specific compatibles to SafeXcel cryptoAngeloGioacchino Del Regno
Following the changes in the binding for the SafeXcel crypto engine, add SoC specific compatibles to the existing nodes in Armada 37xx and CP11x. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
2026-01-22drm/i915/display: Fix color pipeline enum name leakChaitanya Kumar Borah
intel_color_pipeline_plane_init() allocates enum names for color pipelines, which are copied by drm_property_create_enum(). The temporary strings were not freed, resulting in a memory leak. Allocate enum names only after successful pipeline construction and free them on all exit paths. Fixes: ef105316819d ("drm/i915/color: Create a transfer function color pipeline") Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patch.msgid.link/20260113102303.724205-5-chaitanya.kumar.borah@intel.com
2026-01-22drm/vkms: Fix color pipeline enum name leakChaitanya Kumar Borah
vkms_initialize_colorops() allocates enum names for color pipelines, which are copied by drm_property_create_enum(). The temporary strings were not freed, resulting in a memory leak. Allocate enum names only after successful pipeline construction and free them on all exit paths Fixes: c1e578bd08da ("drm/vkms: Add enumerated 1D curve colorop") Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Link: https://patch.msgid.link/20260113102303.724205-4-chaitanya.kumar.borah@intel.com
2026-01-22drm/amd/display: Fix color pipeline enum name leakChaitanya Kumar Borah
dm_plane_init_colorops() allocates enum names for color pipelines. These are eventually passed to drm_property_create_enum() which create its own copies of the string. Free the strings after initialization is done. Also, allocate color pipeline enum names only after successfully creating color pipeline. Fixes: 9ba25915efba ("drm/amd/display: Add support for sRGB EOTF in DEGAM block") Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Acked-by: Alex Deucher <alexander.deucher@amd.com> #irc Link: https://patch.msgid.link/20260113102303.724205-3-chaitanya.kumar.borah@intel.com
2026-01-22drm/i915/color: Place 3D LUT after CSC in plane color pipelineChaitanya Kumar Borah
Move the 3D LUT block to its correct position in the plane color pipeline: [Pre-CSC] -> [CSC] -> [3DLUT] -> [Post-CSC] Fixes: 65db7a1f9cf7 ("drm/i915/color: Add 3D LUT to color pipeline") Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Acked-by: Jani Nikula <jani.nikula@intel.com> Link: https://patch.msgid.link/20260113102303.724205-2-chaitanya.kumar.borah@intel.com
2026-01-22KVM: arm64: nv: Return correct RES0 bits for FGT registersZenghui Yu (Huawei)
We had extended the sysreg masking infrastructure to more general registers, instead of restricting it to VNCR-backed registers, since commit a0162020095e ("KVM: arm64: Extend masking facility to arbitrary registers"). Fix kvm_get_sysreg_res0() to reflect this fact. Note that we're sure that we only deal with FGT registers in kvm_get_sysreg_res0(), the if (sr < __VNCR_START__) is actually a never false, which should probably be removed later. Fixes: 69c19e047dfe ("KVM: arm64: Add TCR2_EL2 to the sysreg arrays") Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev> Link: https://patch.msgid.link/20260121101631.41037-1-zenghui.yu@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
2026-01-22fpga: dfl: fix typo in header fileMukesh Kumar Sisodiya
fix typo "definitons" ==> "definitions" in DFHv1 Register Offset comment. Signed-off-by: Mukesh Kumar Sisodiya <mukeshkumarsisodiya1981@gmail.com> Link: https://lore.kernel.org/r/20251226150408.2802-1-mukeshkumarsisodiya1981@gmail.com Reviewed-by: Xu Yilun <yilun.xu@intel.com> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
2026-01-22net: dsa: yt921x: Add LAG offloading supportDavid Yang
Add offloading for a link aggregation group supported by the YT921x switches. Signed-off-by: David Yang <mmyangfl@gmail.com> Link: https://patch.msgid.link/20260117162116.1063043-1-mmyangfl@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-01-22Merge tag 'v6.20-rockchip-dts64-1' of ↵Krzysztof Kozlowski
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt New boards: Orange Pi CM5 module + Baseboard, Radxa CM5 module + IO-board. PCIe-slot-overlay for rk3576-evb1 New peripherals: some of the video decoders on rk3576 and rk3588 Enabled peripherals: many RK3588-NPUs and a lot of other peripherals on a plethora of boards. * tag 'v6.20-rockchip-dts64-1' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: (40 commits) arm64: dts: rockchip: Add the vdpu383 Video Decoder on rk3576 arm64: dts: rockchip: Add the vdpu381 Video Decoders on RK3588 arm64: dts: rockchip: Add rk3588s-orangepi-cm5-base device tree dt-bindings: arm: rockchip: Add Orange Pi CM5 Base arm64: dts: rockchip: Enable second HDMI output on CM3588 arm64: dts: rockchip: Add HDMI to Gameforce Ace arm64: dts: rockchip: Enable analog sound on RK3576 EVB1 arm64: dts: rockchip: Enable HDMI sound on RK3576 EVB1 arm64: dts: rockchip: Enable HDMI sound on Luckfox Core3576 arm64: dts: rockchip: Enable HDMI sound on FriendlyElec NanoPi M5 arm64: dts: rockchip: Use a readable audio card name on NanoPi M5 arm64: dts: rockchip: enable NPU on rk3588-jaguar arm64: dts: rockchip: enable NPU on rk3588-tiger dt-bindings: arm: rockchip: fix description for Radxa CM5 dt-bindings: arm: rockchip: fix description for Radxa CM3I arm64: dts: rockchip: Add missing everest,es8388 supplies to rk3399-roc-pc-plus arm64: dts: rockchip: Enable PCIe for ArmSoM Sige1 arm64: dts: rockchip: Enable the NPU on Turing RK1 arm64: dts: rockchip: Enable the NPU on FriendlyElec CM3588 arm64: dts: rockchip: Enable the NPU on NanoPC T6/T6-LTS ... Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-01-22KVM: arm64: Always populate FGT masks at boot timeMarc Zyngier
We currently only populate the FGT masks if the underlying HW does support FEAT_FGT. However, with the addition of the RES1 support for system registers, this results in a lot of noise at boot time, as reported by Nathan. That's because even if FGT isn't supported, we still check for the attribution of the bits to particular features, and not keeping the masks up-to-date leads to (fairly harmess) warnings. Given that we want these checks to be enforced even if the HW doesn't support FGT, enable the generation of FGT masks unconditionally (this is rather cheap anyway). Only the storage of the FGT configuration is avoided, which will save a tiny bit of memory on these machines. Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Fixes: c259d763e6b09 ("KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co") Link: https://lore.kernel.org/r/20260120211558.GA834868@ax162 Link: https://patch.msgid.link/20260122085153.535538-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-22Merge tag 'v6.20-rockchip-dts32-1' of ↵Krzysztof Kozlowski
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt HEVC decoder node for RK3288. * tag 'v6.20-rockchip-dts32-1' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: ARM: dts: rockchip: Add vdec node for RK3288 Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-01-22drm/xe: Select CONFIG_DEVICE_PRIVATE when DRM_XE_GPUSVM is selectedThomas Hellström
CONFIG_DEVICE_PRIVATE is a prerequisite for DRM_XE_GPUSVM. Explicitly select it so that DRM_XE_GPUSVM is not unintentionally left out from distro configs not explicitly enabling CONFIG_DEVICE_PRIVATE. v2: - Select also CONFIG_ZONE_DEVICE since it's needed by CONFIG_DEVICE_PRIVATE. v3: - Depend on CONFIG_ZONE_DEVICE rather than selecting it. Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <dri-devel@lists.freedesktop.org> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260121091048.41371-3-thomas.hellstrom@linux.intel.com
2026-01-22drm, drm/xe: Fix xe userptr in the absence of CONFIG_DEVICE_PRIVATEThomas Hellström
CONFIG_DEVICE_PRIVATE is not selected by default by some distros, for example Fedora, and that leads to a regression in the xe driver since userptr support gets compiled out. It turns out that DRM_GPUSVM, which is needed for xe userptr support compiles also without CONFIG_DEVICE_PRIVATE, but doesn't compile without CONFIG_ZONE_DEVICE. Exclude the drm_pagemap files from compilation with !CONFIG_ZONE_DEVICE, and remove the CONFIG_DEVICE_PRIVATE dependency from CONFIG_DRM_GPUSVM and the xe driver's selection of it, re-enabling xe userptr for those configs. v2: - Don't compile the drm_pagemap files unless CONFIG_ZONE_DEVICE is set. - Adjust the drm_pagemap.h header accordingly. Fixes: 9e9787414882 ("drm/xe/userptr: replace xe_hmm with gpusvm") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: dri-devel@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.18+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/20260121091048.41371-2-thomas.hellstrom@linux.intel.com
2026-01-22Merge tag 'juno-updates-7.0' of ↵Krzysztof Kozlowski
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/dt Armv8 Juno/Vexpress updates for v7.0 This contains a small set of DT updates: 1. Align DTS node naming with established coding style by replacing underscores with hyphens in node names. This is a safe change and does not affect ABI. 2. Add support for the CMN PMU on the Arm Morello platform, exposing the CMN-Skeena (CMN-600 r3p1–compatible) PMU via the standard CMN-600 binding. This enables PMU access on real Morello SDP hardware, where the registers are functional. * tag 'juno-updates-7.0' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: arm64: dts: arm: Use hyphen in node names arm64: dts: morello: Add CMN PMU Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-01-22iommu/vt-d: Fix race condition during PASID entry replacementLu Baolu
The Intel VT-d PASID table entry is 512 bits (64 bytes). When replacing an active PASID entry (e.g., during domain replacement), the current implementation calculates a new entry on the stack and copies it to the table using a single structure assignment. struct pasid_entry *pte, new_pte; pte = intel_pasid_get_entry(dev, pasid); pasid_pte_config_first_level(iommu, &new_pte, ...); *pte = new_pte; Because the hardware may fetch the 512-bit PASID entry in multiple 128-bit chunks, updating the entire entry while it is active (Present bit set) risks a "torn" read. In this scenario, the IOMMU hardware could observe an inconsistent state — partially new data and partially old data — leading to unpredictable behavior or spurious faults. Fix this by removing the unsafe "replace" helpers and following the "clear-then-update" flow, which ensures the Present bit is cleared and the required invalidation handshake is completed before the new configuration is applied. Fixes: 7543ee63e811 ("iommu/vt-d: Add pasid replace helpers") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260120061816.2132558-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-22iommu/vt-d: Clear Present bit before tearing down context entryLu Baolu
When tearing down a context entry, the current implementation zeros the entire 128-bit entry using multiple 64-bit writes. This creates a window where the hardware can fetch a "torn" entry — where some fields are already zeroed while the 'Present' bit is still set — leading to unpredictable behavior or spurious faults. While x86 provides strong write ordering, the compiler may reorder writes to the two 64-bit halves of the context entry. Even without compiler reordering, the hardware fetch is not guaranteed to be atomic with respect to multiple CPU writes. Align with the "Guidance to Software for Invalidations" in the VT-d spec (Section 6.5.3.3) by implementing the recommended ownership handshake: 1. Clear only the 'Present' (P) bit of the context entry first to signal the transition of ownership from hardware to software. 2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU. 3. Perform the required cache and context-cache invalidation to ensure hardware no longer has cached references to the entry. 4. Fully zero out the entry only after the invalidation is complete. Also, add a dma_wmb() to context_set_present() to ensure the entry is fully initialized before the 'Present' bit becomes visible. Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver") Reported-by: Dmytro Maluka <dmaluka@chromium.org> Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/ Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Dmytro Maluka <dmaluka@chromium.org> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260120061816.2132558-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-01-22iommu/vt-d: Clear Present bit before tearing down PASID entryLu Baolu
The Intel VT-d Scalable Mode PASID table entry consists of 512 bits (64 bytes). When tearing down an entry, the current implementation zeros the entire 64-byte structure immediately using multiple 64-bit writes. Since the IOMMU hardware may fetch these 64 bytes using multiple internal transactions (e.g., four 128-bit bursts), updating or zeroing the entire entry while it is active (P=1) risks a "torn" read. If a hardware fetch occurs simultaneously with the CPU zeroing the entry, the hardware could observe an inconsistent state, leading to unpredictable behavior or spurious faults. Follow the "Guidance to Software for Invalidations" in the VT-d spec (Section 6.5.3.3) by implementing the recommended ownership handshake: 1. Clear only the 'Present' (P) bit of the PASID entry. 2. Use a dma_wmb() to ensure the cleared bit is visible to hardware before proceeding. 3. Execute the required invalidation sequence (PASID cache, IOTLB, and Device-TLB flush) to ensure the hardware has released all cached references. 4. Only after the flushes are complete, zero out the remaining fields of the PASID entry. Also, add a dma_wmb() in pasid_set_present() to ensure that all other fields of the PASID entry are visible to the hardware before the Present bit is set. Fixes: 0bbeb01a4faf ("iommu/vt-d: Manage scalalble mode PASID tables") Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Dmytro Maluka <dmaluka@chromium.org> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20260120061816.2132558-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>