summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2026-01-13KVM: SVM: Limit incorrect check on SVM_EXIT_ERR to running as a VMSean Christopherson
Limit KVM's incorrect check for VMXEXIT_INVALID, a.k.a. SVM_EXIT_ERR, to running as a VM, as detected by X86_FEATURE_HYPERVISOR. The exit_code and all failure codes, e.g. VMXEXIT_INVALID, are 64-bit values, and so checking only bits 31:0 could result in false positives when running on non-broken hardware, e.g. in the extremely unlikely scenario exit code 0xffffffffull is ever generated by hardware. Keep the 32-bit check to play nice with running on broken KVM (for years, KVM has not set bits 63:32 when synthesizing nested SVM VM-Exits). Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230211347.4099600-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: SVM: Treat exit_code as an unsigned 64-bit value through all of KVMSean Christopherson
Fix KVM's long-standing buggy handling of SVM's exit_code as a 32-bit value. Per the APM and Xen commit d1bd157fbc ("Big merge the HVM full-virtualisation abstractions.") (which is arguably more trustworthy than KVM), offset 0x70 is a single 64-bit value: 070h 63:0 EXITCODE Track exit_code as a single u64 to prevent reintroducing bugs where KVM neglects to correctly set bits 63:32. Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface") Cc: Jim Mattson <jmattson@google.com> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230211347.4099600-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: SVM: Filter out 64-bit exit codes when invoking exit handlers on bare metalSean Christopherson
Explicitly filter out 64-bit exit codes when invoking exit handlers, as svm_exit_handlers[] will never be sized with entries that use bits 63:32. Processing the non-failing exit code as a 32-bit value will allow tracking exit_code as a single 64-bit value (which it is, architecturally). This will also allow hardening KVM against Spectre-like attacks without needing to do silly things to avoid build failures on 32-bit kernels (array_index_nospec() rightly asserts that the index fits in an "unsigned long"). Omit the check when running as a VM, as KVM has historically failed to set bits 63:32 appropriately when synthesizing VM-Exits, i.e. KVM could get false positives when running as a VM on an older, broken KVM/kernel. From a functional perspective, omitting the check is "fine", as any unwanted collision between e.g. VMEXIT_INVALID and a 32-bit exit code will be fatal to KVM-on-KVM regardless of what KVM-as-L1 does. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230211347.4099600-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: SVM: Check for an unexpected VM-Exit after RETPOLINE "fast" handlingSean Christopherson
Check for an unexpected/unhandled VM-Exit after the manual RETPOLINE=y handling. The entire point of the RETPOLINE checks is to optimize for common VM-Exits, i.e. checking for the rare case of an unsupported VM-Exit is counter-productive. This also aligns SVM and VMX exit handling. No functional change intended. Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230211347.4099600-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: SVM: Open code handling of unexpected exits in svm_invoke_exit_handler()Sean Christopherson
Fold svm_check_exit_valid() and svm_handle_invalid_exit() into their sole caller, svm_invoke_exit_handler(), as having tiny single-use helpers makes the code unncessarily difficult to follow. This will also allow for additional cleanups in svm_invoke_exit_handler(). No functional change intended. Suggested-by: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Link: https://patch.msgid.link/20251230211347.4099600-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: SVM: Add a helper to detect VMRUN failuresSean Christopherson
Add a helper to detect VMRUN failures so that KVM can guard against its own long-standing bug, where KVM neglects to set exitcode[63:32] when synthesizing a nested VMFAIL_INVALID VM-Exit. This will allow fixing KVM's mess of treating exitcode as two separate 32-bit values without breaking KVM-on-KVM when running on an older, unfixed KVM. Cc: Jim Mattson <jmattson@google.com> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev> Link: https://patch.msgid.link/20251230211347.4099600-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: x86: align the code with kvm_x86_call()Jun Miao
The use of static_call_cond() is essentially the same as static_call() on x86 (e.g. static_call() now handles a NULL pointer as a NOP), and then the kvm_x86_call() is added to improve code readability and maintainability for keeping consistent code style. Fixes 8d032b683c29 ("KVM: TDX: create/destroy VM structure") Link: https://lore.kernel.org/all/3916caa1dcd114301a49beafa5030eca396745c1.1679456900.git.jpoimboe@kernel.org/ Link: https://lore.kernel.org/r/20240507133103.15052-3-wei.w.wang@intel.com Signed-off-by: Jun Miao <jun.miao@intel.com> Link: https://patch.msgid.link/20260105065423.1870622-1-jun.miao@intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: x86: Ignore -EBUSY when checking nested events from vcpu_block()Sean Christopherson
Ignore -EBUSY when checking nested events after exiting a blocking state while L2 is active, as exiting to userspace will generate a spurious userspace exit, usually with KVM_EXIT_UNKNOWN, and likely lead to the VM's demise. Continuing with the wakeup isn't perfect either, as *something* has gone sideways if a vCPU is awakened in L2 with an injected event (or worse, a nested run pending), but continuing on gives the VM a decent chance of surviving without any major side effects. As explained in the Fixes commits, it _should_ be impossible for a vCPU to be put into a blocking state with an already-injected event (exception, IRQ, or NMI). Unfortunately, userspace can stuff MP_STATE and/or injected events, and thus put the vCPU into what should be an impossible state. Don't bother trying to preserve the WARN, e.g. with an anti-syzkaller Kconfig, as WARNs can (hopefully) be added in paths where _KVM_ would be violating x86 architecture, e.g. by WARNing if KVM attempts to inject an exception or interrupt while the vCPU isn't running. Cc: Alessandro Ratti <alessandro@0x65c.net> Cc: stable@vger.kernel.org Fixes: 26844fee6ade ("KVM: x86: never write to memory from kvm_vcpu_check_block()") Fixes: 45405155d876 ("KVM: x86: WARN if a vCPU gets a valid wakeup that KVM can't yet inject") Link: https://syzkaller.appspot.com/text?tag=ReproC&x=10d4261a580000 Reported-by: syzbot+1522459a74d26b0ac33a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/671bc7a7.050a0220.455e8.022a.GAE@google.com Link: https://patch.msgid.link/20260109030657.994759-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: SVM: Tag sev_supported_vmsa_features as read-only after initSean Christopherson
Tag sev_supported_vmsa_features with __ro_after_init as it's configured by sev_hardware_setup() and never written after initial configuration (and if it were, that'd be a blatant bug). Opportunistically relocate the variable out of the module params area now that sev_es_debug_swap_enabled is gone (which largely motivated its original location). Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://patch.msgid.link/20260109033101.1005769-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: SVM: Drop the module param to control SEV-ES DebugSwapSean Christopherson
Rip out the DebugSwap module param, as the sequence of events that led to its inclusion was one big mistake, the param no longer serves any purpose. Commit d1f85fbe836e ("KVM: SEV: Enable data breakpoints in SEV-ES") goofed by not adding a way for the userspace VMM to control the feature. Functionally, that was fine, but it broke attestation signatures because SEV_FEATURES are included in the signature. Commit 5abf6dceb066 ("SEV: disable SEV-ES DebugSwap by default") fixed that issue, but the underlying flaw of userspace not having a way to control SEV_FEATURES was still there. That flaw was addressed by commit 4f5defae7089 ("KVM: SEV: introduce KVM_SEV_INIT2 operation"), and so then 4dd5ecacb9a4 ("KVM: SEV: allow SEV-ES DebugSwap again") re-enabled DebugSwap by default. Now that the dust has settled, the module param doesn't serve any meaningful purpose. Cc: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://patch.msgid.link/20260109033101.1005769-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: x86: Update APICv ISR (a.k.a. SVI) as part of kvm_apic_update_apicv()Sean Christopherson
Fold the calls to .hwapic_isr_update() in kvm_apic_set_state(), kvm_lapic_reset(), and __kvm_vcpu_update_apicv() into kvm_apic_update_apicv(), as updating SVI is directly related to updating KVM's own cache of ISR information, e.g. SVI is more or less the APICv equivalent of highest_isr_cache. Note, calling .hwapic_isr_update() during kvm_apic_update_apicv() has benign side effects, as doing so changes the orders of the calls in kvm_lapic_reset() and kvm_apic_set_state(), specifically with respect to to the order between .hwapic_isr_update() and .apicv_post_state_restore(). However, the changes in ordering are glorified nops as the former hook is VMX-only and the latter is SVM-only. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: nVMX: Switch to vmcs01 to set virtual APICv mode on-demand if L2 is activeSean Christopherson
If L1's virtual APIC mode changes while L2 is active, e.g. because L1 doesn't intercept writes to the APIC_BASE MSR and L2 changes the mode, temporarily load vmcs01 and do all of the necessary actions instead of deferring the update until the next nested VM-Exit. This will help in fixing yet more issues related to updates while L2 is active, e.g. KVM neglects to update vmcs02 MSR intercepts if vmcs01's MSR intercepts are modified while L2 is active. Not updating x2APIC MSRs is benign because vmcs01's settings are not factored into vmcs02's bitmap, but deferring the x2APIC MSR updates would create a weird, inconsistent state. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: nVMX: Switch to vmcs01 to update APIC page on-demand if L2 is activeSean Christopherson
If the KVM-owned APIC-access page is migrated while L2 is running, temporarily load vmcs01 and immediately update APIC_ACCESS_ADDR instead of deferring the update until the next nested VM-Exit. Once changing the virtual APIC mode is converted to always do on-demand updates, all of the "defer until vmcs01 is active" logic will be gone. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: nVMX: Switch to vmcs01 to refresh APICv controls on-demand if L2 is activeSean Christopherson
If APICv is (un)inhibited while L2 is running, temporarily load vmcs01 and immediately refresh the APICv controls in vmcs01 instead of deferring the update until the next nested VM-Exit. This all but eliminates potential ordering issues due to vmcs01 not being synchronized with kvm_lapic.apicv_active, e.g. where KVM _thinks_ it refreshed APICv, but vmcs01 still contains stale state. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-6-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: nVMX: Switch to vmcs01 to update SVI on-demand if L2 is activeSean Christopherson
If APICv is activated while L2 is running and triggers an SVI update, temporarily load vmcs01 and immediately update SVI instead of deferring the update until the next nested VM-Exit. This will eventually allow killing off kvm_apic_update_hwapic_isr(), and all of nVMX's deferred APICv updates. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: nVMX: Switch to vmcs01 to update TPR threshold on-demand if L2 is activeSean Christopherson
If KVM updates L1's TPR Threshold while L2 is active, temporarily load vmcs01 and immediately update TPR_THRESHOLD instead of deferring the update until the next nested VM-Exit. Deferring the TPR Threshold update is relatively straightforward, but for several APICv related updates, deferring updates creates ordering and state consistency problems, e.g. KVM at-large thinks APICv is enabled, but vmcs01 is still running with stale (and effectively unknown) state. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: nVMX: Switch to vmcs01 to update PML controls on-demand if L2 is activeSean Christopherson
If KVM toggles "CPU dirty logging", a.k.a. Page-Modification Logging (PML), while L2 is active, temporarily load vmcs01 and immediately update the relevant controls instead of deferring the update until the next nested VM-Exit. For PML, deferring the update is relatively straightforward, but for several APICv related updates, deferring updates creates ordering and state consistency problems, e.g. KVM at-large thinks APICv is enabled, but vmcs01 is still running with stale (and effectively unknown) state. Convert PML first precisely because it's the simplest case to handle: if something is broken with the vmcs01 <=> vmcs02 dance, then hopefully bugs will bisect here. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13KVM: selftests: Add a test to verify APICv updates (while L2 is active)Sean Christopherson
Add a test to verify KVM correctly handles a variety of edge cases related to APICv updates, and in particular updates that are triggered while L2 is actively running. Reviewed-by: Chao Gao <chao.gao@intel.com> Link: https://patch.msgid.link/20260109034532.1012993-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-01-13x86/entry/vdso32: When using int $0x80, use it directlyH. Peter Anvin
When neither sysenter32 nor syscall32 is available (on either FRED-capable 64-bit hardware or old 32-bit hardware), there is no reason to do a bunch of stack shuffling in __kernel_vsyscall. Unfortunately, just overwriting the initial "push" instructions will mess up the CFI annotations, so suffer the 3-byte NOP if not applicable. Similarly, inline the int $0x80 when doing inline system calls in the vdso instead of calling __kernel_vsyscall. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-11-hpa@zytor.com
2026-01-13x86/cpufeature: Replace X86_FEATURE_SYSENTER32 with X86_FEATURE_SYSFAST32H. Peter Anvin
In most cases, the use of "fast 32-bit system call" depends either on X86_FEATURE_SEP or X86_FEATURE_SYSENTER32 || X86_FEATURE_SYSCALL32. However, nearly all the logic for both is identical. Define X86_FEATURE_SYSFAST32 which indicates that *either* SYSENTER32 or SYSCALL32 should be used, for either 32- or 64-bit kernels. This defaults to SYSENTER; use SYSCALL if the SYSCALL32 bit is also set. As this removes ALL existing uses of X86_FEATURE_SYSENTER32, which is a kernel-only synthetic feature bit, simply remove it and replace it with X86_FEATURE_SYSFAST32. This leaves an unused alternative for a true 32-bit kernel, but that should really not matter in any way. The clearing of X86_FEATURE_SYSCALL32 can be removed once the patches for automatically clearing disabled features has been merged. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-10-hpa@zytor.com
2026-01-13x86/vdso: Abstract out vdso system call internalsH. Peter Anvin
Abstract out the calling of true system calls from the vdso into macros. It has been a very long time since gcc did not allow %ebx or %ebp in inline asm in 32-bit PIC mode; remove the corresponding hacks. Remove the use of memory output constraints in gettimeofday.h in favor of "memory" clobbers. The resulting code is identical for the current use cases, as the system call is usually a terminal fallback anyway, and it merely complicates the macroization. This patch adds only a handful of more lines of code than it removes, and in fact could be made substantially smaller by removing the macros for the argument counts that aren't currently used, however, it seems better to be general from the start. [ v3: remove stray comment from prototyping; remove VDSO_SYSCALL6() since it would require special handling on 32 bits and is currently unused. (Uros Biszjak) Indent nested preprocessor directives. ] Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Uros Bizjak <ubizjak@gmail.com> Link: https://patch.msgid.link/20251216212606.1325678-9-hpa@zytor.com
2026-01-13x86/entry/vdso: Include GNU_PROPERTY and GNU_STACK PHDRsH. Peter Anvin
Currently the vdso doesn't include .note.gnu.property or a GNU noexec stack annotation (the -z noexecstack in the linker script is ineffective because we specify PHDRs explicitly.) The motivation is that the dynamic linker currently do not check these. However, this is a weak excuse: the vdso*.so are also supposed to be usable at link libraries, and there is no reason why the dynamic linker might not want or need to check these in the future, so add them back in -- it is trivial enough. Use symbolic constants for the PHDR permission flags. [ v4: drop unrelated formatting changes ] Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-8-hpa@zytor.com
2026-01-13x86/entry/vdso32: Remove open-coded DWARF in sigreturn.SH. Peter Anvin
The vdso32 sigreturn.S contains open-coded DWARF bytecode, which includes a hack for gdb to not try to step back to a previous call instruction when backtracing from a signal handler. Neither of those are necessary anymore: the backtracing issue is handled by ".cfi_entry simple" and ".cfi_signal_frame", both of which have been supported for a very long time now, which allows the remaining frame to be built using regular .cfi annotations. Add a few more register offsets to the signal frame just for good measure. Replace the nop on fallthrough of the system call (which should never, ever happen) with a ud2a trap. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-7-hpa@zytor.com
2026-01-13x86/entry/vdso32: Remove SYSCALL_ENTER_KERNEL macro in sigreturn.SH. Peter Anvin
A macro SYSCALL_ENTER_KERNEL was defined in sigreturn.S, with the ability of overriding it. The override capability, however, is not used anywhere, and the macro name is potentially confusing because it seems to imply that sysenter/syscall could be used here, which is NOT true: the sigreturn system calls MUST use int $0x80. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-6-hpa@zytor.com
2026-01-13x86/entry/vdso32: Don't rely on int80_landing_pad for adjusting ipH. Peter Anvin
There is no fundamental reason to use the int80_landing_pad symbol to adjust ip when moving the vdso. If ip falls within the vdso, and the vdso is moved, we should change the ip accordingly, regardless of mode or location within the vdso. This *currently* can only happen on 32 bits, but there isn't any reason not to do so generically. Note that if this is ever possible from a vdso-internal call, then the user space stack will also needed to be adjusted (as well as the shadow stack, if enabled.) Fortunately this is not currently the case. At the moment, we don't even consider other threads when moving the vdso. The assumption is that it is only used by process freeze/thaw for migration, where this is not an issue. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-5-hpa@zytor.com
2026-01-13Merge branch 'improve-the-performance-of-btf-type-lookups-with-binary-search'Andrii Nakryiko
Donglin Peng says: ==================== Improve the performance of BTF type lookups with binary search From: Donglin Peng <pengdonglin@xiaomi.com> The series addresses the performance limitations of linear search in large BTFs by: 1. Adding BTF permutation support 2. Using resolve_btfids to sort BTF during the build phase 3. Checking BTF sorting 4. Using binary search when looking up types Patch #1 introduces an interface for btf__permute in libbpf to relay out BTF. Patch #2 adds test cases to validate the functionality of btf__permute in base and split BTF scenarios. Patch #3 introduces a new phase in the resolve_btfids tool to sort BTF by name in ascending order. Patches #4-#7 implement the sorting check and binary search. Patches #8-#10 optimize type lookup performance of some functions by skipping anonymous types or invoking btf_find_by_name_kind. Patch #11 refactors the code by calling str_is_empty. Here is a simple performance test result [1] for lookups to find 87,584 named types in vmlinux BTF: ./vmtest.sh -- ./test_progs -t btf_permute/perf -v Results: | Condition | Lookup Time | Improvement | |--------------------|-------------|--------------| | Unsorted (Linear) | 36,534 ms | Baseline | | Sorted (Binary) | 15 ms | 2437x faster | The binary search implementation reduces lookup time from 36.5 seconds to 15 milliseconds, achieving a **2437x** speedup for large-scale type queries. Changelog: v12: - Set the start_id to 1 instead of btf->start_id in the btf__find_by_name (AI) v11: - Link: https://lore.kernel.org/bpf/20260108031645.1350069-1-dolinux.peng@gmail.com/ - PATCH #1: Modify implementation of btf__permute: id_map[0] must be 0 for base BTF (Andrii) - PATCH #3: Refactor the code (Andrii) - PATCH #4~8: - Revert to using the binary search in v7 to simplify the code (Andrii) - Refactor the code of btf_check_sorted (Andrii, Eduard) - Rename sorted_start_id to named_start_id - Rename btf_sorted_start_id to btf_named_start_id, and add comments (Andrii, Eduard) v10: - Link: https://lore.kernel.org/all/20251218113051.455293-1-dolinux.peng@gmail.com/ - Improve btf__permute() documentation (Eduard) - Fall back to linear search when locating anonymous types (Eduard) - Remove redundant NULL name check in libbpf's linear search path (Eduard) - Simplify btf_check_sorted() implementation (Eduard) - Treat kernel modules as unsorted by default - Introduce btf_is_sorted and btf_sorted_start_id for clarity (Eduard) - Fix optimizations in btf_find_decl_tag_value() and btf_prepare_func_args() to support split BTF - Remove linear search branch in determine_ptr_size() - Rebase onto Ihor's v4 patch series [4] v9: - Link: https://lore.kernel.org/bpf/20251208062353.1702672-1-dolinux.peng@gmail.com/ - Optimize the performance of the function determine_ptr_size by invoking btf__find_by_name_kind - Optimize the performance of btf_find_decl_tag_value/btf_prepare_func_args/ bpf_core_add_cands by skipping anonymous types - Rebase the patch series onto Ihor's v3 patch series [3] v8 - Link: https://lore.kernel.org/bpf/20251126085025.784288-1-dolinux.peng@gmail.com/ - Remove the type dropping feature of btf__permute (Andrii) - Refactor the code of btf__permute (Andrii, Eduard) - Make the self-test code cleaner (Eduard) - Reconstruct the BTF sorting patch based on Ihor's patch series [2] - Simplify the sorting logic and place anonymous types before named types (Andrii, Eduard) - Optimize type lookup performance of two kernel functions - Refactoring the binary search and type lookup logic achieves a 4.2% performance gain, reducing the average lookup time (via the perf test code in [1] for 60,995 named types in vmlinux BTF) from 10,217 us (v7) to 9,783 us (v8). v7: - Link: https://lore.kernel.org/all/20251119031531.1817099-1-dolinux.peng@gmail.com/ - btf__permute API refinement: Adjusted id_map and id_map_cnt parameter usage so that for base BTF, id_map[0] now contains the new id of original type id 1 (instead of VOID type id 0), improving logical consistency - Selftest updates: Modified test cases to align with the API usage changes - Refactor the code of resolve_btfids v6: - Link: https://lore.kernel.org/all/20251117132623.3807094-1-dolinux.peng@gmail.com/ - ID Map-based reimplementation of btf__permute (Andrii) - Build-time BTF sorting using resolve_btfids (Alexei, Eduard) - Binary search method refactoring (Andrii) - Enhanced selftest coverage v5: - Link: https://lore.kernel.org/all/20251106131956.1222864-1-dolinux.peng@gmail.com/ - Refactor binary search implementation for improved efficiency (Thanks to Andrii and Eduard) - Extend btf__permute interface with 'ids_sz' parameter to support type dropping feature (suggested by Andrii). Plan subsequent reimplementation of id_map version for comparative analysis with current sequence interface - Add comprehensive test coverage for type dropping functionality - Enhance function comment clarity and accuracy v4: - Link: https://lore.kernel.org/all/20251104134033.344807-1-dolinux.peng@gmail.com/ - Abstracted btf_dedup_remap_types logic into a helper function (suggested by Eduard). - Removed btf_sort.c and implemented sorting separately for libbpf and kernel (suggested by Andrii). - Added test cases for both base BTF and split BTF scenarios (suggested by Eduard). - Added validation for name-only sorting of types (suggested by Andrii) - Refactored btf__permute implementation to reduce complexity (suggested by Andrii) - Add doc comments for btf__permute (suggested by Andrii) v3: - Link: https://lore.kernel.org/all/20251027135423.3098490-1-dolinux.peng@gmail.com/ - Remove sorting logic from libbpf and provide a generic btf__permute() interface (suggested by Andrii) - Omitted the search direction patch to avoid conflicts with base BTF (suggested by Eduard). - Include btf_sort.c directly in btf.c to reduce function call overhead v2: - Link: https://lore.kernel.org/all/20251020093941.548058-1-dolinux.peng@gmail.com/ - Moved sorting to the build phase to reduce overhead (suggested by Alexei). - Integrated sorting into btf_dedup_compact_and_sort_types (suggested by Eduard). - Added sorting checks during BTF parsing. - Consolidated common logic into btf_sort.c for sharing (suggested by Alan). v1: - Link: https://lore.kernel.org/all/20251013131537.1927035-1-dolinux.peng@gmail.com/ [1] https://github.com/pengdonglin137/btf_sort_test [2] https://lore.kernel.org/bpf/20251126012656.3546071-1-ihor.solodrai@linux.dev/ [3] https://lore.kernel.org/bpf/20251205223046.4155870-1-ihor.solodrai@linux.dev/ [4] https://lore.kernel.org/bpf/20251218003314.260269-1-ihor.solodrai@linux.dev/ ==================== Link: https://patch.msgid.link/20260109130003.3313716-1-dolinux.peng@gmail.com Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2026-01-13btf: Refactor the code by calling str_is_emptyDonglin Peng
Calling the str_is_empty function to clarify the code and no functional changes are introduced. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260109130003.3313716-12-dolinux.peng@gmail.com
2026-01-13bpf: Optimize the performance of find_bpffs_btf_enumsDonglin Peng
Currently, vmlinux BTF is unconditionally sorted during the build phase. The function btf_find_by_name_kind executes the binary search branch, so find_bpffs_btf_enums can be optimized by using btf_find_by_name_kind. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260109130003.3313716-10-dolinux.peng@gmail.com
2026-01-13bpf: Skip anonymous types in type lookup for performanceDonglin Peng
Currently, vmlinux and kernel module BTFs are unconditionally sorted during the build phase, with named types placed at the end. Thus, anonymous types should be skipped when starting the search. In my vmlinux BTF, the number of anonymous types is 61,747, which means the loop count can be reduced by 61,747. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260109130003.3313716-9-dolinux.peng@gmail.com
2026-01-13btf: Verify BTF sortingDonglin Peng
This patch checks whether the BTF is sorted by name in ascending order. If sorted, binary search will be used when looking up types. Specifically, vmlinux and kernel module BTFs are always sorted during the build phase with anonymous types placed before named types, so we only need to identify the starting ID of named types. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260109130003.3313716-8-dolinux.peng@gmail.com
2026-01-13btf: Optimize type lookup with binary searchDonglin Peng
Improve btf_find_by_name_kind() performance by adding binary search support for sorted types. Falls back to linear search for compatibility. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260109130003.3313716-7-dolinux.peng@gmail.com
2026-01-14drm/atomic: verify that gamma/degamma LUTs are not too bigDmitry Baryshkov
The kernel specifies LUT table sizes in a separate property, however it doesn't enforce it as a maximum. Some drivers implement max size check on their own in the atomic_check path. Other drivers simply ignore the issue. Perform LUT size validation in the generic place. Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260106-drm-fix-lut-checks-v3-3-f7f979eb73c8@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2026-01-14drm/atomic: add max_size check to drm_property_replace_blob_from_id()Dmitry Baryshkov
The function drm_property_replace_blob_from_id() allows checking whether the blob size is equal to a predefined value. In case of variable-size properties (like the gamma / degamma LUTs) we might want to check for the blob size against the maximum, allowing properties of the size lesser than the max supported by the hardware. Extend the function in order to support such checks. Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260106-drm-fix-lut-checks-v3-2-f7f979eb73c8@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2026-01-14drm/mode_object: add drm_object_immutable_property_get_value()Dmitry Baryshkov
We have a helper to get property values for non-atomic drivers and another one default property values for atomic drivers. In some cases we need the ability to get value of immutable property, no matter what kind of driver it is. Implement new property-related helper, drm_object_immutable_property_get_value(), which lets the caller to get the value of the immutable property. Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260106-drm-fix-lut-checks-v3-1-f7f979eb73c8@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2026-01-13libbpf: Verify BTF sortingDonglin Peng
This patch checks whether the BTF is sorted by name in ascending order. If sorted, binary search will be used when looking up types. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260109130003.3313716-6-dolinux.peng@gmail.com
2026-01-13libbpf: Optimize type lookup with binary search for sorted BTFDonglin Peng
This patch introduces binary search optimization for BTF type lookups when the BTF instance contains sorted types. The optimization significantly improves performance when searching for types in large BTF instances with sorted types. For unsorted BTF, the implementation falls back to the original linear search. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20260109130003.3313716-5-dolinux.peng@gmail.com
2026-01-13tools/resolve_btfids: Support BTF sorting featureDonglin Peng
This introduces a new BTF sorting phase that specifically sorts BTF types by name in ascending order, so that the binary search can be used to look up types. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260109130003.3313716-4-dolinux.peng@gmail.com
2026-01-13selftests/bpf: Add test cases for btf__permute functionalityDonglin Peng
This patch introduces test cases for the btf__permute function to ensure it works correctly with both base BTF and split BTF scenarios. The test suite includes: - test_permute_base: Validates permutation on base BTF - test_permute_split: Tests permutation on split BTF Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260109130003.3313716-3-dolinux.peng@gmail.com
2026-01-13libbpf: Add BTF permutation support for type reorderingDonglin Peng
Introduce btf__permute() API to allow in-place rearrangement of BTF types. This function reorganizes BTF type order according to a provided array of type IDs, updating all type references to maintain consistency. Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260109130003.3313716-2-dolinux.peng@gmail.com
2026-01-13x86/entry/vdso: Refactor the vdso buildH. Peter Anvin
- Separate out the vdso sources into common, vdso32, and vdso64 directories. - Build the 32- and 64-bit vdsos in their respective subdirectories; this greatly simplifies the build flags handling. - Unify the mangling of Makefile flags between the 32- and 64-bit vdso code as much as possible; all common rules are put in arch/x86/entry/vdso/common/Makefile.include. The remaining is very simple for 32 bits; the 64-bit one is only slightly more complicated because it contains the x32 generation rule. - Define __DISABLE_EXPORTS when building the vdso. This need seems to have been masked by different ordering compile flags before. - Change CONFIG_X86_64 to BUILD_VDSO32_64 in vdso32/system_call.S, to make it compatible with including fake_32bit_build.h. - The -fcf-protection= option was "leaking" from the kernel build, for reasons that was not clear to me. Furthermore, several distributions ship with it set to a default value other than "-fcf-protection=none". Make it match the configuration options for *user space*. Note that this patch may seem large, but the vast majority of it is simply code movement. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-4-hpa@zytor.com
2026-01-13x86/entry/vdso: Move vdso2c to arch/x86/toolsH. Peter Anvin
It is generally better to build tools in arch/x86/tools to keep host cflags proliferation down, and to reduce makefile sequencing issues. Move the vdso build tool vdso2c into arch/x86/tools in preparation for refactoring the vdso makefiles. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-3-hpa@zytor.com
2026-01-13x86/entry/vdso: Rename vdso_image_* to vdso*_imageH. Peter Anvin
The vdso .so files are named vdso*.so. These structures are binary images and descriptions of these files, so it is more consistent for them to have a naming that more directly mirrors the filenames. It is also very slightly more compact (by one character...) and simplifies the Makefile just a little bit. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://patch.msgid.link/20251216212606.1325678-2-hpa@zytor.com
2026-01-13drm/nouveau/kms/nv50-: Assert we hold nv50_disp->lock in nv50_head_flush_*Lyude Paul
Now that we've had one bug that occurred in nouveau as the result of nv50_head_flush_* being called without the appropriate locks, let's add some lockdep asserts to make sure this doesn't happen in the future. Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patch.msgid.link/20251219215344.170852-3-lyude@redhat.com
2026-01-13drm/nouveau/disp/nv50-: Set lock_core in curs507a_prepareLyude Paul
For a while, I've been seeing a strange issue where some (usually not all) of the display DMA channels will suddenly hang, particularly when there is a visible cursor on the screen that is being frequently updated, and especially when said cursor happens to go between two screens. While this brings back lovely memories of fixing Intel Skylake bugs, I would quite like to fix it :). It turns out the problem that's happening here is that we're managing to reach nv50_head_flush_set() in our atomic commit path without actually holding nv50_disp->mutex. This means that cursor updates happening in parallel (along with any other atomic updates that need to use the core channel) will race with eachother, which eventually causes us to corrupt the pushbuffer - leading to a plethora of various GSP errors, usually: nouveau 0000:c1:00.0: gsp: Xid:56 CMDre 00000000 00000218 00102680 00000004 00800003 nouveau 0000:c1:00.0: gsp: Xid:56 CMDre 00000000 0000021c 00040509 00000004 00000001 nouveau 0000:c1:00.0: gsp: Xid:56 CMDre 00000000 00000000 00000000 00000001 00000001 The reason this is happening is because generally we check whether we need to set nv50_atom->lock_core at the end of nv50_head_atomic_check(). However, curs507a_prepare is called from the fb_prepare callback, which happens after the atomic check phase. As a result, this can lead to commits that both touch the core channel but also don't grab nv50_disp->mutex. So, fix this by making sure that we set nv50_atom->lock_core in cus507a_prepare(). Reviewed-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: 1590700d94ac ("drm/nouveau/kms/nv50-: split each resource type into their own source files") Cc: <stable@vger.kernel.org> # v4.18+ Link: https://patch.msgid.link/20251219215344.170852-2-lyude@redhat.com
2026-01-13dt-bindings: riscv: extensions: Drop unnecessary select schemaRob Herring (Arm)
The "select" schema is not necessary because this schema is referenced by riscv/cpus.yaml schema. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2026-01-13dt-bindings: riscv: Add Sha and its comprised extensionsGuodong Xu
Add descriptions for the Sha extension and the seven extensions it comprises: Shcounterenw, Shgatpa, Shtvala, Shvsatpa, Shvstvala, Shvstvecd, and Ssstateen. Sha is ratified in the RVA23 Profiles Version 1.0 (commit 0273f3c921b6 "rva23/rvb23 ratified") as a new profile-defined extension that captures the full set of features that are mandated to be supported along with the H extension. Extensions Shcounterenw, Shgatpa, Shtvala, Shvsatpa, Shvstvala, Shvstvecd, and Ssstateen are ratified in the RISC-V Profiles Version 1.0 (commit b1d806605f87 "Updated to ratified state"). The requirement status for Sha and its comprised extension in RISC-V Profiles are: - Sha: Mandatory in RVA23S64 - H: Optional in RVA22S64; Mandatory in RVA23S64 - Shcounterenw: Optional in RVA22S64; Mandatory in RVA23S64 - Shgatpa: Optional in RVA22S64; Mandatory in RVA23S64 - Shtvala: Optional in RVA22S64; Mandatory in RVA23S64 - Shvsatpa: Optional in RVA22S64; Mandatory in RVA23S64 - Shvstvala: Optional in RVA22S64; Mandatory in RVA23S64 - Shvstvecd: Optional in RVA22S64; Mandatory in RVA23S64 - Ssstateen: Optional in RVA22S64; Mandatory in RVA23S64 Signed-off-by: Guodong Xu <guodong@riscstar.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2026-01-13dt-bindings: riscv: Add Ssccptr, Sscounterenw, Sstvala, Sstvecd, Ssu64xlGuodong Xu
Add descriptions for five new extensions: Ssccptr, Sscounterenw, Sstvala, Sstvecd, and Ssu64xl. These extensions are ratified in RISC-V Profiles Version 1.0 (commit b1d806605f87 "Updated to ratified state."). They are introduced as new extension names for existing features and regulate implementation details for RISC-V Profile compliance. According to RISC-V Profiles Version 1.0 and RVA23 Profiles Version 1.0, their requirement status are: - Ssccptr: Mandatory in RVA20S64, RVA22S64, RVA23S64 - Sscounterenw: Mandatory in RVA22S64, RVA23S64 - Sstvala: Mandatory in RVA20S64, RVA22S64, RVA23S64 - Sstvecd: Mandatory in RVA20S64, RVA22S64, RVA23S64 - Ssu64xl: Optional in RVA20S64, RVA22S64; Mandatory in RVA23S64 Signed-off-by: Guodong Xu <guodong@riscstar.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2026-01-13dt-bindings: riscv: Add descriptions for Za64rs, Ziccamoa, Ziccif, and ZicclsmGuodong Xu
Add descriptions for four extensions: Za64rs, Ziccamoa, Ziccif, and Zicclsm. These extensions are ratified in RISC-V Profiles Version 1.0 (commit b1d806605f87 "Updated to ratified state."). They are introduced as new extension names for existing features and regulate implementation details for RISC-V Profile compliance. According to RISC-V Profiles Version 1.0 and RVA23 Profiles Version 1.0, they are mandatory for the following profiles: - za64rs: Mandatory in RVA22U64, RVA23U64 - ziccamoa: Mandatory in RVA20U64, RVA22U64, RVA23U64 - ziccif: Mandatory in RVA20U64, RVA22U64, RVA23U64 - zicclsm: Mandatory in RVA20U64, RVA22U64, RVA23U64 Ziccrse specifies the main memory must support "RsrvEventual", which is one (totally there are four) of the support level for Load-Reserved/ Store-Conditional (LR/SC) atomic instructions. Thus it depends on Zalrsc. Ziccamoa specifies the main memory must support AMOArithmetic, among the four levels of PMA support defined for AMOs in the A extension. Thus it depends on Zaamo. Za64rs defines reservation sets are contiguous, naturally aligned, and a maximum of 64 bytes. Za64rs is consumed by two extensions: Zalrsc and Zawrs. Zawrs itself depends on Zalrsc too. Based on the relationship that "A" = Zaamo + Zalrsc, add the following dependencies checks: Za64rs -> Zalrsc or A Ziccrse -> Zalrsc or A Ziccamoa -> Zaamo or A Signed-off-by: Guodong Xu <guodong@riscstar.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2026-01-13dt-bindings: riscv: Add B ISA extension descriptionGuodong Xu
Add description of the single-letter B extension for Bit Manipulation. B is mandatory for RVA23U64. The B extension is ratified in the 20240411 version of the unprivileged ISA specification. According to the ratified spec, the B standard extension comprises instructions provided by the Zba, Zbb, and Zbs extensions. Add two-way dependency check to enforce that B implies Zba/Zbb/Zbs; and when Zba/Zbb/Zbs (all of them) are specified, then B must be added too. The reason why B/Zba/Zbb/Zbs must coexist at the same time is that unlike other single-letter extensions, B was ratified (Apr/2024) much later than its component extensions Zba/Zbb/Zbs (Jun/2021). When "b" is specified, zba/zbb/zbs must be present to ensure backward compatibility with existing software and kernels that only look for the explicit component strings. When all three components zba/zbb/zbs are specified, "b" should also be present. Making "b" mandatory when all three components are present. Existing devicetrees with zba/zbb/zbs but without "b" will generate warnings that can be fixed in follow-up patches. Signed-off-by: Guodong Xu <guodong@riscstar.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
2026-01-13dt-bindings: riscv: update ratified version of h, svinval, svnapot, svpbmtGuodong Xu
The descriptions for h, svinval, svnapot, and svpbmt extensions currently reference the "20191213 version of the privileged ISA specification". While an Unprivileged ISA document exists with that date, there is no corresponding ratified Privileged ISA specification. These extensions were ratified in the RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Version 20211203. Update the descriptions to reference the correct specification version. RISC-V International hosts a website [1] for ratified specifications. Following the "Ratified ISA Specifications", historical versions of Volume II Privileged ISA can be found. Link: https://riscv.org/specifications/ratified/ [1] Fixes: aeb71e42caae ("dt-bindings: riscv: deprecate riscv,isa") Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Guodong Xu <guodong@riscstar.com> Signed-off-by: Conor Dooley <conor.dooley@microchip.com>