summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
4 daystracing: fprobe: use rhltable for fprobe_ip_tableMenglong Dong
[ Upstream commit 0de4c70d04a46a3c266547dd4275ce25f623796a ] For now, all the kernel functions who are hooked by the fprobe will be added to the hash table "fprobe_ip_table". The key of it is the function address, and the value of it is "struct fprobe_hlist_node". The budget of the hash table is FPROBE_IP_TABLE_SIZE, which is 256. And this means the overhead of the hash table lookup will grow linearly if the count of the functions in the fprobe more than 256. When we try to hook all the kernel functions, the overhead will be huge. Therefore, replace the hash table with rhltable to reduce the overhead. Link: https://lore.kernel.org/all/20250819031825.55653-1-dongml2@chinatelecom.cn/ Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Stable-dep-of: 845947aca681 ("tracing/fprobe: Remove fprobe from hash in failure path") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 dayssched/ext: Implement cgroup_set_idle() callbackzhidao su
[ Upstream commit 347ed2d566dabb06c7970fff01129c4f59995ed6 ] Implement the missing cgroup_set_idle() callback that was marked as a TODO. This allows BPF schedulers to be notified when a cgroup's idle state changes, enabling them to adjust their scheduling behavior accordingly. The implementation follows the same pattern as other cgroup callbacks like cgroup_set_weight() and cgroup_set_bandwidth(). It checks if the BPF scheduler has implemented the callback and invokes it with the appropriate parameters. Fixes a spelling error in the cgroup_set_bandwidth() documentation. tj: s/scx_cgroup_rwsem/scx_cgroup_ops_rwsem/ to fix build breakage. Signed-off-by: zhidao su <soolaugust@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Stable-dep-of: 80afd4c84bc8 ("sched_ext: Read scx_root under scx_cgroup_ops_rwsem in cgroup setters") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
4 daysmm/damon/core: implement damon_kdamond_pid()SeongJae Park
commit 4262c53236977de3ceaa3bf2aefdf772c9b874dd upstream. Patch series "mm/damon: hide kdamond and kdamond_lock from API callers". 'kdamond' and 'kdamond_lock' fields initially exposed to DAMON API callers for flexible synchronization and use cases. As DAMON API became somewhat complicated compared to the early days, Keeping those exposed could only encourage the API callers to invent more creative but complicated and difficult-to-debug use cases. Fortunately DAMON API callers didn't invent that many creative use cases. There exist only two use cases of 'kdamond' and 'kdamond_lock'. Finding whether the kdamond is actively running, and getting the pid of the kdamond. For the first use case, a dedicated API function, namely 'damon_is_running()' is provided, and all DAMON API callers are using the function for the use case. Hence only the second use case is where the fields are directly being used by DAMON API callers. To prevent future invention of complicated and erroneous use cases of the fields, hide the fields from the API callers. For that, provide new dedicated DAMON API functions for the remaining use case, namely damon_kdamond_pid(), migrate DAMON API callers to use the new function, and mark the fields as private fields. This patch (of 5): 'kdamond' and 'kdamond_lock' are directly being used by DAMON API callers for getting the pid of the corresponding kdamond. To discourage invention of creative but complicated and erroneous new usages of the fields that require careful synchronization, implement a new API function that can simply be used without the manual synchronizations. Link: https://lkml.kernel.org/r/20260115152047.68415-1-sj@kernel.org Link: https://lkml.kernel.org/r/20260115152047.68415-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 daysptrace: slightly saner 'get_dumpable()' logicLinus Torvalds
commit 31e62c2ebbfdc3fe3dbdf5e02c92a9dc67087a3a upstream. The 'dumpability' of a task is fundamentally about the memory image of the task - the concept comes from whether it can core dump or not - and makes no sense when you don't have an associated mm. And almost all users do in fact use it only for the case where the task has a mm pointer. But we have one odd special case: ptrace_may_access() uses 'dumpable' to check various other things entirely independently of the MM (typically explicitly using flags like PTRACE_MODE_READ_FSCREDS). Including for threads that no longer have a VM (and maybe never did, like most kernel threads). It's not what this flag was designed for, but it is what it is. The ptrace code does check that the uid/gid matches, so you do have to be uid-0 to see kernel thread details, but this means that the traditional "drop capabilities" model doesn't make any difference for this all. Make it all make a *bit* more sense by saying that if you don't have a MM pointer, we'll use a cached "last dumpability" flag if the thread ever had a MM (it will be zero for kernel threads since it is never set), and require a proper CAP_SYS_PTRACE capability to override. Reported-by: Qualys Security Advisory <qsa@qualys.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 daysprintk: add print_hex_dump_devel()Thorsten Blum
[ Upstream commit d134feeb5df33fbf77f482f52a366a44642dba09 ] Add print_hex_dump_devel() as the hex dump equivalent of pr_devel(), which emits output only when DEBUG is enabled, but keeps call sites compiled otherwise. Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Stable-dep-of: 177730a273b1 ("crypto: caam - guard HMAC key hex dumps in hash_digest_key") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 daysfirmware: exynos-acpm: Drop fake 'const' on handle pointerKrzysztof Kozlowski
[ Upstream commit a2be37eedb52ea26938fa4cc9de1ff84963c57ad ] All the functions operating on the 'handle' pointer are claiming it is a pointer to const thus they should not modify the handle. In fact that's a false statement, because first thing these functions do is drop the cast to const with container_of: struct acpm_info *acpm = handle_to_acpm_info(handle); And with such cast the handle is easily writable with simple: acpm->handle.ops.pmic_ops.read_reg = NULL; The code is not correct logically, either, because functions like acpm_get_by_node() and acpm_handle_put() are meant to modify the handle reference counting, thus they must modify the handle. Modification here happens anyway, even if the reference counting is stored in the container which the handle is part of. The code does not have actual visible bug, but incorrect 'const' annotations could lead to incorrect compiler decisions. Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://patch.msgid.link/20260224104203.42950-2-krzysztof.kozlowski@oss.qualcomm.com Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> [ dropped hunks for DVFS/clk-acpm files and `acpm_dvfs_ops` struct that don't exist in 6.18 ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 daysmmc: core: Optimize time for secure erase/trim for some Kingston eMMCsLuke Wang
[ Upstream commit d6bf2e64dec87322f2b11565ddb59c0e967f96e3 ] Kingston eMMC IY2964 and IB2932 takes a fixed ~2 seconds for each secure erase/trim operation regardless of size - that is, a single secure erase/trim operation of 1MB takes the same time as 1GB. With default calculated 3.5MB max discard size, secure erase 1GB requires ~300 separate operations taking ~10 minutes total. Add a card quirk, MMC_QUIRK_FIXED_SECURE_ERASE_TRIM_TIME, to set maximum secure erase size for those devices. This allows 1GB secure erase to complete in a single operation, reducing time from 10 minutes to just 2 seconds. Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 daysmmc: core: Add quirk for incorrect manufacturing dateAvri Altman
[ Upstream commit 263ff314cc5602599d481b0912a381555fcbad28 ] Some eMMC vendors need to report manufacturing dates beyond 2025 but are reluctant to update the EXT_CSD revision from 8 to 9. Changing the Updating the EXT_CSD revision may involve additional testing or qualification steps with customers. To ease this transition and avoid a full re-qualification process, a workaround is needed. This patch introduces a temporary quirk that re-purposes the year codes corresponding to 2010, 2011, and 2012 to represent the years 2026, 2027, and 2028, respectively. This solution is only valid for this three-year period. After 2028, vendors must update their firmware to set EXT_CSD_REV=9 to continue reporting the correct manufacturing date in compliance with the JEDEC standard. The `MMC_QUIRK_BROKEN_MDT` is introduced and enabled for all Sandisk devices to handle this behavior. Signed-off-by: Avri Altman <avri.altman@sandisk.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Stable-dep-of: d6bf2e64dec8 ("mmc: core: Optimize time for secure erase/trim for some Kingston eMMCs") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 daysdma-mapping: add __dma_from_device_group_begin()/end()Michael S. Tsirkin
[ Upstream commit ca085faabb42c31ee204235facc5a430cb9e78a9 ] When a structure contains a buffer that DMA writes to alongside fields that the CPU writes to, cache line sharing between the DMA buffer and CPU-written fields can cause data corruption on non-cache-coherent platforms. Add __dma_from_device_group_begin()/end() annotations to ensure proper alignment to prevent this: struct my_device { spinlock_t lock1; __dma_from_device_group_begin(); char dma_buffer1[16]; char dma_buffer2[16]; __dma_from_device_group_end(); spinlock_t lock2; }; Message-ID: <19163086d5e4704c316f18f6da06bc1c72968904.1767601130.git.mst@redhat.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Stable-dep-of: 3023c050af36 ("hwmon: (powerz) Avoid cacheline sharing for DMA buffer") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 daysfbdev: defio: Disconnect deferred I/O from the lifetime of struct fb_infoThomas Zimmermann
[ Upstream commit 9ded47ad003f09a94b6a710b5c47f4aa5ceb7429 ] Hold state of deferred I/O in struct fb_deferred_io_state. Allocate an instance as part of initializing deferred I/O and remove it only after the final mapping has been closed. If the fb_info and the contained deferred I/O meanwhile goes away, clear struct fb_deferred_io_state.info to invalidate the mapping. Any access will then result in a SIGBUS signal. Fixes a long-standing problem, where a device hot-unplug happens while user space still has an active mapping of the graphics memory. The hot- unplug frees the instance of struct fb_info. Accessing the memory will operate on undefined state. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 60b59beafba8 ("fbdev: mm: Deferred IO support") Cc: Helge Deller <deller@gmx.de> Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v2.6.22+ Signed-off-by: Helge Deller <deller@gmx.de> [ replaced kzalloc_obj(*fbdefio_state) with kzalloc(sizeof(*fbdefio_state), GFP_KERNEL) ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 daysfanotify: fix false positive on permission eventsMiklos Szeredi
commit 7746e3bd4cc19b5092e00d32d676e329bfcb6900 upstream. fsnotify_get_mark_safe() may return false for a mark on an unrelated group, which results in bypassing the permission check. Fix by skipping over detached marks that are not in the current group. CC: stable@vger.kernel.org Fixes: abc77577a669 ("fsnotify: Provide framework for dropping SRCU lock in ->handle_event") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://patch.msgid.link/20260410144950.156160-1-mszeredi@redhat.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-07driver core: Add kernel-doc for DEV_FLAG_COUNT enum valueDouglas Anderson
commit 5b484311507b5d403c1f7a45f6aa3778549e268b upstream. Even though nobody should use this value (except when declaring the "flags" bitmap), kernel-doc still gets upset that it's not documented. It reports: WARNING: ../include/linux/device.h:519 Enum value 'DEV_FLAG_COUNT' not described in enum 'struct_device_flags' Add the description of DEV_FLAG_COUNT. Fixes: a2225b6e834a ("driver core: Don't let a device probe until it's ready") Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/f318cd43-81fd-48b9-abf7-92af85f12f91@infradead.org Signed-off-by: Douglas Anderson <dianders@chromium.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260413195910.1.I23aca74fe2d3636a47df196a80920fecb2643220@changeid Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-07mm: prevent droppable mappings from being lockedAnthony Yznaga
[ Upstream commit d239462787b072c78eb19fc1f155c3d411256282 ] Droppable mappings must not be lockable. There is a check for VMAs with VM_DROPPABLE set in mlock_fixup() along with checks for other types of unlockable VMAs which ensures this when calling mlock()/mlock2(). For mlockall(MCL_FUTURE), the check for unlockable VMAs is different. In apply_mlockall_flags(), if the flags parameter has MCL_FUTURE set, the current task's mm's default VMA flag field mm->def_flags has VM_LOCKED applied to it. VM_LOCKONFAULT is also applied if MCL_ONFAULT is also set. When these flags are set as default in this manner they are cleared in __mmap_complete() for new mappings that do not support mlock. A check for VM_DROPPABLE in __mmap_complete() is missing resulting in droppable mappings created with VM_LOCKED set. To fix this and reduce that chance of similar bugs in the future, introduce and use vma_supports_mlock(). Link: https://lkml.kernel.org/r/20260310155821.17869-1-anthony.yznaga@oracle.com Fixes: 9651fcedf7b9 ("mm: add MAP_DROPPABLE for designating always lazily freeable mappings") Signed-off-by: Anthony Yznaga <anthony.yznaga@oracle.com> Suggested-by: David Hildenbrand <david@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Tested-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Jason A. Donenfeld <jason@zx2c4.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ added const to is_vm_hugetlb_page and stubbed vma_supports_mlock in vma_internal.h instead of the split-out stubs.h ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-07randomize_kstack: Maintain kstack_offset per taskRyan Roberts
commit 37beb42560165869838e7d91724f3e629db64129 upstream. kstack_offset was previously maintained per-cpu, but this caused a couple of issues. So let's instead make it per-task. Issue 1: add_random_kstack_offset() and choose_random_kstack_offset() expected and required to be called with interrupts and preemption disabled so that it could manipulate per-cpu state. But arm64, loongarch and risc-v are calling them with interrupts and preemption enabled. I don't _think_ this causes any functional issues, but it's certainly unexpected and could lead to manipulating the wrong cpu's state, which could cause a minor performance degradation due to bouncing the cache lines. By maintaining the state per-task those functions can safely be called in preemptible context. Issue 2: add_random_kstack_offset() is called before executing the syscall and expands the stack using a previously chosen random offset. choose_random_kstack_offset() is called after executing the syscall and chooses and stores a new random offset for the next syscall. With per-cpu storage for this offset, an attacker could force cpu migration during the execution of the syscall and prevent the offset from being updated for the original cpu such that it is predictable for the next syscall on that cpu. By maintaining the state per-task, this problem goes away because the per-task random offset is updated after the syscall regardless of which cpu it is executing on. Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall") Closes: https://lore.kernel.org/all/dd8c37bc-795f-4c7a-9086-69e584d8ab24@arm.com/ Cc: stable@vger.kernel.org Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://patch.msgid.link/20260303150840.3789438-2-ryan.roberts@arm.com Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-07tpm: avoid -Wunused-but-set-variableArnd Bergmann
commit 6f1d4d2ecfcd1b577dc87350ea965fe81f272e83 upstream. Outside of the EFI tpm code, the TPM_MEMREMAP()/TPM_MEMUNMAP functions are defined as trivial macros, leading to the mapping_size variable ending up unused: In file included from drivers/char/tpm/tpm-sysfs.c:16: In file included from drivers/char/tpm/tpm.h:28: include/linux/tpm_eventlog.h:167:6: error: variable 'mapping_size' set but not used [-Werror,-Wunused-but-set-variable] 167 | int mapping_size; Turn the stubs into inline functions to avoid this warning. Cc: stable@vger.kernel.org # v5.3+ Fixes: c46f3405692d ("tpm: Reserve the TPM final events table") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-07mm/damon/core: fix damon_call() vs kdamond_fn() exit raceSeongJae Park
commit 55da81663b9642dd046b26dd6f1baddbcf337c1e upstream. Patch series "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race". damon_call() and damos_walk() can leak memory and/or deadlock when they race with kdamond terminations. Fix those. This patch (of 2); When kdamond_fn() main loop is finished, the function cancels all remaining damon_call() requests and unset the damon_ctx->kdamond so that API callers and API functions themselves can know the context is terminated. damon_call() adds the caller's request to the queue first. After that, it shows if the kdamond of the damon_ctx is still running (damon_ctx->kdamond is set). Only if the kdamond is running, damon_call() starts waiting for the kdamond's handling of the newly added request. The damon_call() requests registration and damon_ctx->kdamond unset are protected by different mutexes, though. Hence, damon_call() could race with damon_ctx->kdamond unset, and result in deadlocks. For example, let's suppose kdamond successfully finished the damon_call() requests cancelling. Right after that, damon_call() is called for the context. It registers the new request, and shows the context is still running, because damon_ctx->kdamond unset is not yet done. Hence the damon_call() caller starts waiting for the handling of the request. However, the kdamond is already on the termination steps, so it never handles the new request. As a result, the damon_call() caller threads infinitely waits. Fix this by introducing another damon_ctx field, namely call_controls_obsolete. It is protected by the damon_ctx->call_controls_lock, which protects damon_call() requests registration. Initialize (unset) it in kdamond_fn() before letting damon_start() returns and set it just before the cancelling of remaining damon_call() requests is executed. damon_call() reads the obsolete field under the lock and avoids adding a new request. After this change, only requests that are guaranteed to be handled or cancelled are registered. Hence the after-registration DAMON context termination check is no longer needed. Remove it together. Note that the deadlock will not happen when damon_call() is called for repeat mode request. In tis case, damon_call() returns instead of waiting for the handling when the request registration succeeds and it shows the kdamond is running. However, if the request also has dealloc_on_cancel, the request memory would be leaked. The issue is found by sashiko [1]. Link: https://lore.kernel.org/20260327233319.3528-1-sj@kernel.org Link: https://lore.kernel.org/20260327233319.3528-2-sj@kernel.org Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org [1] Fixes: 42b7491af14c ("mm/damon/core: introduce damon_call()") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> # 6.14.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-07mm/alloc_tag: clear codetag for pages allocated before page_ext initializationHao Ge
commit 6b1842775a460245e97d36d3a67d0cfba7c4ff79 upstream. Due to initialization ordering, page_ext is allocated and initialized relatively late during boot. Some pages have already been allocated and freed before page_ext becomes available, leaving their codetag uninitialized. A clear example is in init_section_page_ext(): alloc_page_ext() calls kmemleak_alloc(). If the slab cache has no free objects, it falls back to the buddy allocator to allocate memory. However, at this point page_ext is not yet fully initialized, so these newly allocated pages have no codetag set. These pages may later be reclaimed by KASAN, which causes the warning to trigger when they are freed because their codetag ref is still empty. Use a global array to track pages allocated before page_ext is fully initialized. The array size is fixed at 8192 entries, and will emit a warning if this limit is exceeded. When page_ext initialization completes, set their codetag to empty to avoid warnings when they are freed later. This warning is only observed with CONFIG_MEM_ALLOC_PROFILING_DEBUG=Y and mem_profiling_compressed disabled: [ 9.582133] ------------[ cut here ]------------ [ 9.582137] alloc_tag was not set [ 9.582139] WARNING: ./include/linux/alloc_tag.h:164 at __pgalloc_tag_sub+0x40f/0x550, CPU#5: systemd/1 [ 9.582190] CPU: 5 UID: 0 PID: 1 Comm: systemd Not tainted 7.0.0-rc4 #1 PREEMPT(lazy) [ 9.582192] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 9.582194] RIP: 0010:__pgalloc_tag_sub+0x40f/0x550 [ 9.582196] Code: 00 00 4c 29 e5 48 8b 05 1f 88 56 05 48 8d 4c ad 00 48 8d 2c c8 e9 87 fd ff ff 0f 0b 0f 0b e9 f3 fe ff ff 48 8d 3d 61 2f ed 03 <67> 48 0f b9 3a e9 b3 fd ff ff 0f 0b eb e4 e8 5e cd 14 02 4c 89 c7 [ 9.582197] RSP: 0018:ffffc9000001f940 EFLAGS: 00010246 [ 9.582200] RAX: dffffc0000000000 RBX: 1ffff92000003f2b RCX: 1ffff110200d806c [ 9.582201] RDX: ffff8881006c0360 RSI: 0000000000000004 RDI: ffffffff9bc7b460 [ 9.582202] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff3a62324 [ 9.582203] R10: ffffffff9d311923 R11: 0000000000000000 R12: ffffea0004001b00 [ 9.582204] R13: 0000000000002000 R14: ffffea0000000000 R15: ffff8881006c0360 [ 9.582206] FS: 00007ffbbcf2d940(0000) GS:ffff888450479000(0000) knlGS:0000000000000000 [ 9.582208] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9.582210] CR2: 000055ee3aa260d0 CR3: 0000000148b67005 CR4: 0000000000770ef0 [ 9.582211] PKRU: 55555554 [ 9.582212] Call Trace: [ 9.582213] <TASK> [ 9.582214] ? __pfx___pgalloc_tag_sub+0x10/0x10 [ 9.582216] ? check_bytes_and_report+0x68/0x140 [ 9.582219] __free_frozen_pages+0x2e4/0x1150 [ 9.582221] ? __free_slab+0xc2/0x2b0 [ 9.582224] qlist_free_all+0x4c/0xf0 [ 9.582227] kasan_quarantine_reduce+0x15d/0x180 [ 9.582229] __kasan_slab_alloc+0x69/0x90 [ 9.582232] kmem_cache_alloc_noprof+0x14a/0x500 [ 9.582234] do_getname+0x96/0x310 [ 9.582237] do_readlinkat+0x91/0x2f0 [ 9.582239] ? __pfx_do_readlinkat+0x10/0x10 [ 9.582240] ? get_random_bytes_user+0x1df/0x2c0 [ 9.582244] __x64_sys_readlinkat+0x96/0x100 [ 9.582246] do_syscall_64+0xce/0x650 [ 9.582250] ? __x64_sys_getrandom+0x13a/0x1e0 [ 9.582252] ? __pfx___x64_sys_getrandom+0x10/0x10 [ 9.582254] ? do_syscall_64+0x114/0x650 [ 9.582255] ? ksys_read+0xfc/0x1d0 [ 9.582258] ? __pfx_ksys_read+0x10/0x10 [ 9.582260] ? do_syscall_64+0x114/0x650 [ 9.582262] ? do_syscall_64+0x114/0x650 [ 9.582264] ? __pfx_fput_close_sync+0x10/0x10 [ 9.582266] ? file_close_fd_locked+0x178/0x2a0 [ 9.582268] ? __x64_sys_faccessat2+0x96/0x100 [ 9.582269] ? __x64_sys_close+0x7d/0xd0 [ 9.582271] ? do_syscall_64+0x114/0x650 [ 9.582273] ? do_syscall_64+0x114/0x650 [ 9.582275] ? clear_bhb_loop+0x50/0xa0 [ 9.582277] ? clear_bhb_loop+0x50/0xa0 [ 9.582279] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 9.582280] RIP: 0033:0x7ffbbda345ee [ 9.582282] Code: 0f 1f 40 00 48 8b 15 29 38 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 0b 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fa 37 0d 00 f7 d8 64 89 01 48 [ 9.582284] RSP: 002b:00007ffe2ad8de58 EFLAGS: 00000202 ORIG_RAX: 000000000000010b [ 9.582286] RAX: ffffffffffffffda RBX: 000055ee3aa25570 RCX: 00007ffbbda345ee [ 9.582287] RDX: 000055ee3aa25570 RSI: 00007ffe2ad8dee0 RDI: 00000000ffffff9c [ 9.582288] RBP: 0000000000001000 R08: 0000000000000003 R09: 0000000000001001 [ 9.582289] R10: 0000000000001000 R11: 0000000000000202 R12: 0000000000000033 [ 9.582290] R13: 00007ffe2ad8dee0 R14: 00000000ffffff9c R15: 00007ffe2ad8deb0 [ 9.582292] </TASK> [ 9.582293] ---[ end trace 0000000000000000 ]--- Link: https://lore.kernel.org/20260331081312.123719-1-hao.ge@linux.dev Fixes: dcfe378c81f72 ("lib: introduce support for page allocation tagging") Signed-off-by: Hao Ge <hao.ge@linux.dev> Suggested-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Suren Baghdasaryan <surenb@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-07device property: Make modifications of fwnode "flags" thread safeDouglas Anderson
commit f72e77c33e4b5657af35125e75bab249256030f3 upstream. In various places in the kernel, we modify the fwnode "flags" member by doing either: fwnode->flags |= SOME_FLAG; fwnode->flags &= ~SOME_FLAG; This type of modification is not thread-safe. If two threads are both mucking with the flags at the same time then one can clobber the other. While flags are often modified while under the "fwnode_link_lock", this is not universally true. Create some accessor functions for setting, clearing, and testing the FWNODE flags and move all users to these accessor functions. New accessor functions use set_bit() and clear_bit(), which are thread-safe. Cc: stable@vger.kernel.org Fixes: c2c724c868c4 ("driver core: Add fw_devlink_parse_fwtree()") Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mark Brown <broonie@kernel.org> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Saravana Kannan <saravanak@kernel.org> Link: https://patch.msgid.link/20260317090112.v2.1.I0a4d03104ecd5103df3d76f66c8d21b1d15a2e38@changeid [ Fix fwnode_clear_flag() argument alignment, restore dropped blank line in fwnode_dev_initialized(), and remove unnecessary parentheses around fwnode_test_flag() calls. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-07driver core: Don't let a device probe until it's readyDouglas Anderson
commit a2225b6e834a838ae3c93709760edc0a169eb2f2 upstream. The moment we link a "struct device" into the list of devices for the bus, it's possible probe can happen. This is because another thread can load the driver at any time and that can cause the device to probe. This has been seen in practice with a stack crawl that looks like this [1]: really_probe() __driver_probe_device() driver_probe_device() __driver_attach() bus_for_each_dev() driver_attach() bus_add_driver() driver_register() __platform_driver_register() init_module() [some module] do_one_initcall() do_init_module() load_module() __arm64_sys_finit_module() invoke_syscall() As a result of the above, it was seen that device_links_driver_bound() could be called for the device before "dev->fwnode->dev" was assigned. This prevented __fw_devlink_pickup_dangling_consumers() from being called which meant that other devices waiting on our driver's sub-nodes were stuck deferring forever. It's believed that this problem is showing up suddenly for two reasons: 1. Android has recently (last ~1 year) implemented an optimization to the order it loads modules [2]. When devices opt-in to this faster loading, modules are loaded one-after-the-other very quickly. This is unlike how other distributions do it. The reproduction of this problem has only been seen on devices that opt-in to Android's "parallel module loading". 2. Android devices typically opt-in to fw_devlink, and the most noticeable issue is the NULL "dev->fwnode->dev" in device_links_driver_bound(). fw_devlink is somewhat new code and also not in use by all Linux devices. Even though the specific symptom where "dev->fwnode->dev" wasn't assigned could be fixed by moving that assignment higher in device_add(), other parts of device_add() (like the call to device_pm_add()) are also important to run before probe. Only moving the "dev->fwnode->dev" assignment would likely fix the current symptoms but lead to difficult-to-debug problems in the future. Fix the problem by preventing probe until device_add() has run far enough that the device is ready to probe. If somehow we end up trying to probe before we're allowed, __driver_probe_device() will return -EPROBE_DEFER which will make certain the device is noticed. In the race condition that was seen with Android's faster module loading, we will temporarily add the device to the deferred list and then take it off immediately when device_add() probes the device. Instead of adding another flag to the bitfields already in "struct device", instead add a new "flags" field and use that. This allows us to freely change the bit from different thread without worrying about corrupting nearby bits (and means threads changing other bit won't corrupt us). [1] Captured on a machine running a downstream 6.6 kernel [2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel Cc: stable@vger.kernel.org Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing") Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patch.msgid.link/20260406162231.v5.1.Id750b0fbcc94f23ed04b7aecabcead688d0d8c17@changeid Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-07usb: xhci: Make usb_host_endpoint.hcpriv survive endpoint_disable()Michal Pecio
commit 25e531b422dc2ac90cdae3b6e74b5cdeb081440d upstream. xHCI hardware maintains its endpoint state between add_endpoint() and drop_endpoint() calls followed by successful check_bandwidth(). So does the driver. Core may call endpoint_disable() during xHCI endpoint life, so don't clear host_ep->hcpriv then, because this breaks endpoint_reset(). If a driver calls usb_set_interface(), submits URBs which make host sequence state non-zero and calls usb_clear_halt(), the device clears its sequence state but xhci_endpoint_reset() bails out. The next URB malfunctions: USB2 loses one packet, USB3 gets Transaction Error or may not complete at all on some (buggy?) HCs from ASMedia and AMD. This is triggered by uvcvideo on bulk video devices. The code was copied from ehci_endpoint_disable() but it isn't needed here - hcpriv should only be NULL on emulated root hub endpoints. It might prevent resetting and inadvertently enabling a disabled and dropped endpoint, but core shouldn't try to reset dropped endpoints. Document xhci requirements regarding hcpriv. They are currently met. Fixes: 18b74067ac78 ("xhci: Fix use-after-free regression in xhci clear hub TT implementation") Cc: stable@vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://patch.msgid.link/20260402131342.2628648-26-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22mm/userfaultfd: fix hugetlb fault mutex hash calculationJianhui Zhou
commit 0217c7fb4de4a40cee667eb21901f3204effe5ac upstream. In mfill_atomic_hugetlb(), linear_page_index() is used to calculate the page index for hugetlb_fault_mutex_hash(). However, linear_page_index() returns the index in PAGE_SIZE units, while hugetlb_fault_mutex_hash() expects the index in huge page units. This mismatch means that different addresses within the same huge page can produce different hash values, leading to the use of different mutexes for the same huge page. This can cause races between faulting threads, which can corrupt the reservation map and trigger the BUG_ON in resv_map_release(). Fix this by introducing hugetlb_linear_page_index(), which returns the page index in huge page granularity, and using it in place of linear_page_index(). Link: https://lkml.kernel.org/r/20260310110526.335749-1-jianhuizzzzz@gmail.com Fixes: a08c7193e4f1 ("mm/filemap: remove hugetlb special casing in filemap.c") Signed-off-by: Jianhui Zhou <jianhuizzzzz@gmail.com> Reported-by: syzbot+f525fd79634858f478e7@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f525fd79634858f478e7 Acked-by: SeongJae Park <sj@kernel.org> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Jane Chu <jane.chu@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: JonasZhou <JonasZhou@zhaoxin.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Oscar Salvador <osalvador@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: SeongJae Park <sj@kernel.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22KVM: x86: Use scratch field in MMIO fragment to hold small write valuesSean Christopherson
commit 0b16e69d17d8c35c5c9d5918bf596c75a44655d3 upstream. When exiting to userspace to service an emulated MMIO write, copy the to-be-written value to a scratch field in the MMIO fragment if the size of the data payload is 8 bytes or less, i.e. can fit in a single chunk, instead of pointing the fragment directly at the source value. This fixes a class of use-after-free bugs that occur when the emulator initiates a write using an on-stack, local variable as the source, the write splits a page boundary, *and* both pages are MMIO pages. Because KVM's ABI only allows for physically contiguous MMIO requests, accesses that split MMIO pages are separated into two fragments, and are sent to userspace one at a time. When KVM attempts to complete userspace MMIO in response to KVM_RUN after the first fragment, KVM will detect the second fragment and generate a second userspace exit, and reference the on-stack variable. The issue is most visible if the second KVM_RUN is performed by a separate task, in which case the stack of the initiating task can show up as truly freed data. ================================================================== BUG: KASAN: use-after-free in complete_emulated_mmio+0x305/0x420 Read of size 1 at addr ffff888009c378d1 by task syz-executor417/984 CPU: 1 PID: 984 Comm: syz-executor417 Not tainted 5.10.0-182.0.0.95.h2627.eulerosv2r13.x86_64 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0xbe/0xfd print_address_description.constprop.0+0x19/0x170 __kasan_report.cold+0x6c/0x84 kasan_report+0x3a/0x50 check_memory_region+0xfd/0x1f0 memcpy+0x20/0x60 complete_emulated_mmio+0x305/0x420 kvm_arch_vcpu_ioctl_run+0x63f/0x6d0 kvm_vcpu_ioctl+0x413/0xb20 __se_sys_ioctl+0x111/0x160 do_syscall_64+0x30/0x40 entry_SYSCALL_64_after_hwframe+0x67/0xd1 RIP: 0033:0x42477d Code: <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007faa8e6890e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00000000004d7338 RCX: 000000000042477d RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005 RBP: 00000000004d7330 R08: 00007fff28d546df R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d733c R13: 0000000000000000 R14: 000000000040a200 R15: 00007fff28d54720 The buggy address belongs to the page: page:0000000029f6a428 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9c37 flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc0000000 0000000000000000 ffffea0000270dc8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888009c37780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888009c37800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff888009c37880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff888009c37900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff888009c37980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== The bug can also be reproduced with a targeted KVM-Unit-Test by hacking KVM to fill a large on-stack variable in complete_emulated_mmio(), i.e. by overwrite the data value with garbage. Limit the use of the scratch fields to 8-byte or smaller accesses, and to just writes, as larger accesses and reads are not affected thanks to implementation details in the emulator, but add a sanity check to ensure those details don't change in the future. Specifically, KVM never uses on-stack variables for accesses larger that 8 bytes, e.g. uses an operand in the emulator context, and *all* reads are buffered through the mem_read cache. Note! Using the scratch field for reads is not only unnecessary, it's also extremely difficult to handle correctly. As above, KVM buffers all reads through the mem_read cache, and heavily relies on that behavior when re-emulating the instruction after a userspace MMIO read exit. If a read splits a page, the first page is NOT an MMIO page, and the second page IS an MMIO page, then the MMIO fragment needs to point at _just_ the second chunk of the destination, i.e. its position in the mem_read cache. Taking the "obvious" approach of copying the fragment value into the destination when re-emulating the instruction would clobber the first chunk of the destination, i.e. would clobber the data that was read from guest memory. Fixes: f78146b0f923 ("KVM: Fix page-crossing MMIO") Suggested-by: Yashu Zhang <zhangjiaji1@huawei.com> Reported-by: Yashu Zhang <zhangjiaji1@huawei.com> Closes: https://lore.kernel.org/all/369eaaa2b3c1425c85e8477066391bc7@huawei.com Cc: stable@vger.kernel.org Tested-by: Tom Lendacky <thomas.lendacky@gmail.com> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://patch.msgid.link/20260225012049.920665-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22x86: rename and clean up __copy_from_user_inatomic_nocache()Linus Torvalds
commit 5de7bcaadf160c1716b20a263cf8f5b06f658959 upstream. Similarly to the previous commit, this renames the somewhat confusingly named function. But in this case, it was at least less confusing: the __copy_from_user_inatomic_nocache is indeed copying from user memory, and it is indeed ok to be used in an atomic context, so it will not warn about it. But the previous commit also removed the NTB mis-use of the __copy_from_user_inatomic_nocache() function, and as a result every call-site is now _actually_ doing a real user copy. That means that we can now do the proper user pointer verification too. End result: add proper address checking, remove the double underscores, and change the "nocache" to "nontemporal" to more accurately describe what this x86-only function actually does. It might be worth noting that only the target is non-temporal: the actual user accesses are normal memory accesses. Also worth noting is that non-x86 targets (and on older 32-bit x86 CPU's before XMM2 in the Pentium III) we end up just falling back on a regular user copy, so nothing can actually depend on the non-temporal semantics, but that has always been true. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22KVM: Remove subtle "struct kvm_stats_desc" pseudo-overlaySean Christopherson
[ Upstream commit da142f3d373a6ddaca0119615a8db2175ddc4121 ] Remove KVM's internal pseudo-overlay of kvm_stats_desc, which subtly aliases the flexible name[] in the uAPI definition with a fixed-size array of the same name. The unusual embedded structure results in compiler warnings due to -Wflex-array-member-not-at-end, and also necessitates an extra level of dereferencing in KVM. To avoid the "overlay", define the uAPI structure to have a fixed-size name when building for the kernel. Opportunistically clean up the indentation for the stats macros, and replace spaces with tabs. No functional change intended. Reported-by: Gustavo A. R. Silva <gustavoars@kernel.org> Closes: https://lore.kernel.org/all/aPfNKRpLfhmhYqfP@kspp Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> [..] Acked-by: Anup Patel <anup@brainfault.org> Reviewed-by: Bibo Mao <maobibo@loongson.cn> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/20251205232655.445294-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Stable-dep-of: 2619da73bb2f ("KVM: x86: Use __DECLARE_FLEX_ARRAY() for UAPI structures with VLAs") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22KVM: SEV: Disallow LAUNCH_FINISH if vCPUs are actively being createdSean Christopherson
commit 624bf3440d7214b62c22d698a0a294323f331d5d upstream. Reject LAUNCH_FINISH for SEV-ES and SNP VMs if KVM is actively creating one or more vCPUs, as KVM needs to process and encrypt each vCPU's VMSA. Letting userspace create vCPUs while LAUNCH_FINISH is in-progress is "fine", at least in the current code base, as kvm_for_each_vcpu() operates on online_vcpus, LAUNCH_FINISH (all SEV+ sub-ioctls) holds kvm->mutex, and fully onlining a vCPU in kvm_vm_ioctl_create_vcpu() is done under kvm->mutex. I.e. there's no difference between an in-progress vCPU and a vCPU that is created entirely after LAUNCH_FINISH. However, given that concurrent LAUNCH_FINISH and vCPU creation can't possibly work (for any reasonable definition of "work"), since userspace can't guarantee whether a particular vCPU will be encrypted or not, disallow the combination as a hardening measure, to reduce the probability of introducing bugs in the future, and to avoid having to reason about the safety of future changes related to LAUNCH_FINISH. Cc: Jethro Beekman <jethro@fortanix.com> Closes: https://lore.kernel.org/all/b31f7c6e-2807-4662-bcdd-eea2c1e132fa@fortanix.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260310234829.2608037-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-22dma-mapping: add DMA_ATTR_CPU_CACHE_CLEANMichael S. Tsirkin
[ Upstream commit 61868dc55a119a5e4b912d458fc2c48ba80a35fe ] When multiple small DMA_FROM_DEVICE or DMA_BIDIRECTIONAL buffers share a cacheline, and DMA_API_DEBUG is enabled, we get this warning: cacheline tracking EEXIST, overlapping mappings aren't supported. This is because when one of the mappings is removed, while another one is active, CPU might write into the buffer. Add an attribute for the driver to promise not to do this, making the overlapping safe, and suppressing the warning. Message-ID: <2d5d091f9d84b68ea96abd545b365dd1d00bbf48.1767601130.git.mst@redhat.com> Reviewed-by: Petr Tesarik <ptesarik@suse.com> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Stable-dep-of: 3d48c9fd78dd ("dma-debug: suppress cacheline overlap warning when arch has no DMA alignment requirement") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-22soc: qcom: pd-mapper: Fix element length in servreg_loc_pfr_req_eiMukesh Ojha
[ Upstream commit 641f6fda143b879da1515f821ee475073678cf2a ] It looks element length declared in servreg_loc_pfr_req_ei for reason not matching servreg_loc_pfr_req's reason field due which we could observe decoding error on PD crash. qmi_decode_string_elem: String len 81 >= Max Len 65 Fix this by matching with servreg_loc_pfr_req's reason field. Fixes: 1ebcde047c54 ("soc: qcom: add pd-mapper implementation") Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Tested-by: Nikita Travkin <nikita@trvn.ru> Link: https://lore.kernel.org/r/20260129152320.3658053-2-mukesh.ojha@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-22x86: shadow stacks: proper error handling for mmap lockLinus Torvalds
[ Upstream commit 52f657e34d7b21b47434d9d8b26fa7f6778b63a0 ] 김영민 reports that shstk_pop_sigframe() doesn't check for errors from mmap_read_lock_killable(), which is a silly oversight, and also shows that we haven't marked those functions with "__must_check", which would have immediately caught it. So let's fix both issues. Reported-by: 김영민 <osori@hspace.io> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-18firmware: thead: Fix buffer overflow and use standard endian macrosMichal Wilczynski
commit 88c4bd90725557796c15878b7cb70066e9e6b5ab upstream. Addresses two issues in the TH1520 AON firmware protocol driver: 1. Fix a potential buffer overflow where the code used unsafe pointer arithmetic to access the 'mode' field through the 'resource' pointer with an offset. This was flagged by Smatch static checker as: "buffer overflow 'data' 2 <= 3" 2. Replace custom RPC_SET_BE* and RPC_GET_BE* macros with standard kernel endianness conversion macros (cpu_to_be16, etc.) for better portability and maintainability. The functionality was re-tested with the GPU power-up sequence, confirming the GPU powers up correctly and the driver probes successfully. [ 12.702370] powervr ffef400000.gpu: [drm] loaded firmware powervr/rogue_36.52.104.182_v1.fw [ 12.711043] powervr ffef400000.gpu: [drm] FW version v1.0 (build 6645434 OS) [ 12.719787] [drm] Initialized powervr 1.0.0 for ffef400000.gpu on minor 0 Fixes: e4b3cbd840e5 ("firmware: thead: Add AON firmware protocol driver") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/all/17a0ccce-060b-4b9d-a3c4-8d5d5823b1c9@stanley.mountain/ Signed-off-by: Michal Wilczynski <m.wilczynski@samsung.com> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Drew Fustini <fustini@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11usb: core: use dedicated spinlock for offload stateGuan-Yu Lin
commit bd3d245b0fef571f93504904df62b8865b1c0d34 upstream. Replace the coarse USB device lock with a dedicated offload_lock spinlock to reduce contention during offload operations. Use offload_pm_locked to synchronize with PM transitions and replace the legacy offload_at_suspend flag. Optimize usb_offload_get/put by switching from auto-resume/suspend to pm_runtime_get_if_active(). This ensures offload state is only modified when the device is already active, avoiding unnecessary power transitions. Cc: stable <stable@kernel.org> Fixes: ef82a4803aab ("xhci: sideband: add api to trace sideband usage") Signed-off-by: Guan-Yu Lin <guanyulin@google.com> Tested-by: Hailong Liu <hailong.liu@oppo.com> Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://patch.msgid.link/20260401123238.3790062-2-guanyulin@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11iio: add IIO_DECLARE_QUATERNION() macroDavid Lechner
commit 56bd57e7b161f75535df91b229b0b2c64c6e5581 upstream. Add a new IIO_DECLARE_QUATERNION() macro that is used to declare the field in an IIO buffer struct that contains a quaternion vector. Quaternions are currently the only IIO data type that uses the .repeat feature of struct iio_scan_type. This has an implicit rule that the element in the buffer must be aligned to the entire size of the repeated element. This macro will make that requirement explicit. Since this is the only user, we just call the macro IIO_DECLARE_QUATERNION() instead of something more generic. Signed-off-by: David Lechner <dlechner@baylibre.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: <Stable@vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-11netfilter: ipset: use nla_strcmp for IPSET_ATTR_NAME attrFlorian Westphal
[ Upstream commit b7e8590987aa94c9dc51518fad0e58cb887b1db5 ] IPSET_ATTR_NAME and IPSET_ATTR_NAMEREF are of NLA_STRING type, they cannot be treated like a c-string. They either have to be switched to NLA_NUL_STRING, or the compare operations need to use the nla functions. Fixes: f830837f0eed ("netfilter: ipset: list:set set type support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-11net: introduce mangleid_featuresPaolo Abeni
[ Upstream commit 31c5a71d982b57df75858974634c2f0a338f2fc6 ] Some/most devices implementing gso_partial need to disable the GSO partial features when the IP ID can't be mangled; to that extend each of them implements something alike the following[1]: if (skb->encapsulation && !(features & NETIF_F_TSO_MANGLEID)) features &= ~NETIF_F_TSO; in the ndo_features_check() op, which leads to a bit of duplicate code. Later patch in the series will implement GSO partial support for virtual devices, and the current status quo will require more duplicate code and a new indirect call in the TX path for them. Introduce the mangleid_features mask, allowing the core to disable NIC features based on/requiring MANGLEID, without any further intervention from the driver. The same functionality could be alternatively implemented adding a single boolean flag to the struct net_device, but would require an additional checks in ndo_features_check(). Also note that [1] is incorrect if the NIC additionally implements NETIF_F_GSO_UDP_L4, mangleid_features transparently handle even such a case. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/5a7cdaeea40b0a29b88e525b6c942d73ed3b8ce7.1769011015.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: ddc748a391dd ("net: use skb_header_pointer() for TCPv4 GSO frag_off check") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-11netdevsim: fix build if SKB_EXTENSIONS=nQingfang Deng
[ Upstream commit 57a04a13aac1f247d171c3f3aef93efc69e6979e ] __skb_ext_put() is not declared if SKB_EXTENSIONS is not enabled, which causes a build error: drivers/net/netdevsim/netdev.c: In function 'nsim_forward_skb': drivers/net/netdevsim/netdev.c:114:25: error: implicit declaration of function '__skb_ext_put'; did you mean 'skb_ext_put'? [-Werror=implicit-function-declaration] 114 | __skb_ext_put(psp_ext); | ^~~~~~~~~~~~~ | skb_ext_put cc1: some warnings being treated as errors Add a stub to fix the build. Fixes: 7d9351435ebb ("netdevsim: drop PSP ext ref on forward failure") Signed-off-by: Qingfang Deng <dqfext@gmail.com> Link: https://patch.msgid.link/20260324140857.783-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02futex: Fix UaF between futex_key_to_node_opt() and vma_replace_policy()Hao-Yu Yang
[ Upstream commit 190a8c48ff623c3d67cb295b4536a660db2012aa ] During futex_key_to_node_opt() execution, vma->vm_policy is read under speculative mmap lock and RCU. Concurrently, mbind() may call vma_replace_policy() which frees the old mempolicy immediately via kmem_cache_free(). This creates a race where __futex_key_to_node() dereferences a freed mempolicy pointer, causing a use-after-free read of mpol->mode. [ 151.412631] BUG: KASAN: slab-use-after-free in __futex_key_to_node (kernel/futex/core.c:349) [ 151.414046] Read of size 2 at addr ffff888001c49634 by task e/87 [ 151.415969] Call Trace: [ 151.416732] __asan_load2 (mm/kasan/generic.c:271) [ 151.416777] __futex_key_to_node (kernel/futex/core.c:349) [ 151.416822] get_futex_key (kernel/futex/core.c:374 kernel/futex/core.c:386 kernel/futex/core.c:593) Fix by adding rcu to __mpol_put(). Fixes: c042c505210d ("futex: Implement FUTEX2_MPOL") Reported-by: Hao-Yu Yang <naup96721@gmail.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Hao-Yu Yang <naup96721@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Link: https://patch.msgid.link/20260324174418.GB1850007@noisy.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02netfs: Fix the handling of stream->front by removing itDavid Howells
[ Upstream commit 0e764b9d46071668969410ec5429be0e2f38c6d3 ] The netfs_io_stream::front member is meant to point to the subrequest currently being collected on a stream, but it isn't actually used this way by direct write (which mostly ignores it). However, there's a tracepoint which looks at it. Further, stream->front is actually redundant with stream->subrequests.next. Fix the potential problem in the direct code by just removing the member and using stream->subrequests.next instead, thereby also simplifying the code. Fixes: a0b4c7a49137 ("netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequence") Reported-by: Paulo Alcantara <pc@manguebit.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/4158599.1774426817@warthog.procyon.org.uk Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02mm/huge_memory: fix folio isn't locked in softleaf_to_folio()Jinjiang Tu
[ Upstream commit 4c5e7f0fcd592801c9cc18f29f80fbee84eb8669 ] On arm64 server, we found folio that get from migration entry isn't locked in softleaf_to_folio(). This issue triggers when mTHP splitting and zap_nonpresent_ptes() races, and the root cause is lack of memory barrier in softleaf_to_folio(). The race is as follows: CPU0 CPU1 deferred_split_scan() zap_nonpresent_ptes() lock folio split_folio() unmap_folio() change ptes to migration entries __split_folio_to_order() softleaf_to_folio() set flags(including PG_locked) for tail pages folio = pfn_folio(softleaf_to_pfn(entry)) smp_wmb() VM_WARN_ON_ONCE(!folio_test_locked(folio)) prep_compound_page() for tail pages In __split_folio_to_order(), smp_wmb() guarantees page flags of tail pages are visible before the tail page becomes non-compound. smp_wmb() should be paired with smp_rmb() in softleaf_to_folio(), which is missed. As a result, if zap_nonpresent_ptes() accesses migration entry that stores tail pfn, softleaf_to_folio() may see the updated compound_head of tail page before page->flags. This issue will trigger VM_WARN_ON_ONCE() in pfn_swap_entry_folio() because of the race between folio split and zap_nonpresent_ptes() leading to a folio incorrectly undergoing modification without a folio lock being held. This is a BUG_ON() before commit 93976a20345b ("mm: eliminate further swapops predicates"), which in merged in v6.19-rc1. To fix it, add missing smp_rmb() if the softleaf entry is migration entry in softleaf_to_folio() and softleaf_to_page(). [tujinjiang@huawei.com: update function name and comments] Link: https://lkml.kernel.org/r/20260321075214.3305564-1-tujinjiang@huawei.com Link: https://lkml.kernel.org/r/20260319012541.4158561-1-tujinjiang@huawei.com Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()") Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Barry Song <baohua@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ applied fix to swapops.h using old pfn_swap_entry/swp_entry_t naming ] Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-02mm/damon/core: avoid use of half-online-committed contextSeongJae Park
commit 26f775a054c3cda86ad465a64141894a90a9e145 upstream. One major usage of damon_call() is online DAMON parameters update. It is done by calling damon_commit_ctx() inside the damon_call() callback function. damon_commit_ctx() can fail for two reasons: 1) invalid parameters and 2) internal memory allocation failures. In case of failures, the damon_ctx that attempted to be updated (commit destination) can be partially updated (or, corrupted from a perspective), and therefore shouldn't be used anymore. The function only ensures the damon_ctx object can safely deallocated using damon_destroy_ctx(). The API callers are, however, calling damon_commit_ctx() only after asserting the parameters are valid, to avoid damon_commit_ctx() fails due to invalid input parameters. But it can still theoretically fail if the internal memory allocation fails. In the case, DAMON may run with the partially updated damon_ctx. This can result in unexpected behaviors including even NULL pointer dereference in case of damos_commit_dests() failure [1]. Such allocation failure is arguably too small to fail, so the real world impact would be rare. But, given the bad consequence, this needs to be fixed. Avoid such partially-committed (maybe-corrupted) damon_ctx use by saving the damon_commit_ctx() failure on the damon_ctx object. For this, introduce damon_ctx->maybe_corrupted field. damon_commit_ctx() sets it when it is failed. kdamond_call() checks if the field is set after each damon_call_control->fn() is executed. If it is set, ignore remaining callback requests and return. All kdamond_call() callers including kdamond_fn() also check the maybe_corrupted field right after kdamond_call() invocations. If the field is set, break the kdamond_fn() main loop so that DAMON sill doesn't use the context that might be corrupted. [sj@kernel.org: let kdamond_call() with cancel regardless of maybe_corrupted] Link: https://lkml.kernel.org/r/20260320031553.2479-1-sj@kernel.org Link: https://sashiko.dev/#/patchset/20260319145218.86197-1-sj%40kernel.org Link: https://lkml.kernel.org/r/20260319145218.86197-1-sj@kernel.org Link: https://lore.kernel.org/20260319043309.97966-1-sj@kernel.org [1] Fixes: 3301f1861d34 ("mm/damon/sysfs: handle commit command using damon_call()") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-02writeback: don't block sync for filesystems with no data integrity guaranteesJoanne Koong
commit 76f9377cd2ab7a9220c25d33940d9ca20d368172 upstream. Add a SB_I_NO_DATA_INTEGRITY superblock flag for filesystems that cannot guarantee data persistence on sync (eg fuse). For superblocks with this flag set, sync kicks off writeback of dirty inodes but does not wait for the flusher threads to complete the writeback. This replaces the per-inode AS_NO_DATA_INTEGRITY mapping flag added in commit f9a49aa302a0 ("fs/writeback: skip AS_NO_DATA_INTEGRITY mappings in wait_sb_inodes()"). The flag belongs at the superblock level because data integrity is a filesystem-wide property, not a per-inode one. Having this flag at the superblock level also allows us to skip having to iterate every dirty inode in wait_sb_inodes() only to skip each inode individually. Prior to this commit, mappings with no data integrity guarantees skipped waiting on writeback completion but still waited on the flusher threads to finish initiating the writeback. Waiting on the flusher threads is unnecessary. This commit kicks off writeback but does not wait on the flusher threads. This change properly addresses a recent report [1] for a suspend-to-RAM hang seen on fuse-overlayfs that was caused by waiting on the flusher threads to finish: Workqueue: pm_fs_sync pm_fs_sync_work_fn Call Trace: <TASK> __schedule+0x457/0x1720 schedule+0x27/0xd0 wb_wait_for_completion+0x97/0xe0 sync_inodes_sb+0xf8/0x2e0 __iterate_supers+0xdc/0x160 ksys_sync+0x43/0xb0 pm_fs_sync_work_fn+0x17/0xa0 process_one_work+0x193/0x350 worker_thread+0x1a1/0x310 kthread+0xfc/0x240 ret_from_fork+0x243/0x280 ret_from_fork_asm+0x1a/0x30 </TASK> On fuse this is problematic because there are paths that may cause the flusher thread to block (eg if systemd freezes the user session cgroups first, which freezes the fuse daemon, before invoking the kernel suspend. The kernel suspend triggers ->write_node() which on fuse issues a synchronous setattr request, which cannot be processed since the daemon is frozen. Or if the daemon is buggy and cannot properly complete writeback, initiating writeback on a dirty folio already under writeback leads to writeback_get_folio() -> folio_prepare_writeback() -> unconditional wait on writeback to finish, which will cause a hang). This commit restores fuse to its prior behavior before tmp folios were removed, where sync was essentially a no-op. [1] https://lore.kernel.org/linux-fsdevel/CAJnrk1a-asuvfrbKXbEwwDSctvemF+6zfhdnuzO65Pt8HsFSRw@mail.gmail.com/T/#m632c4648e9cafc4239299887109ebd880ac6c5c1 Fixes: 0c58a97f919c ("fuse: remove tmp folio for writebacks and internal rb tree") Reported-by: John <therealgraysky@proton.me> Cc: stable@vger.kernel.org Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260320005145.2483161-2-joannelkoong@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-04-02spi: use generic driver_override infrastructureDanilo Krummrich
[ Upstream commit cc34d77dd48708d810c12bfd6f5bf03304f6c824 ] When a driver is probed through __driver_attach(), the bus' match() callback is called without the device lock held, thus accessing the driver_override field without a lock, which can cause a UAF. Fix this by using the driver-core driver_override infrastructure taking care of proper locking internally. Note that calling match() from __driver_attach() without the device lock held is intentional. [1] Also note that we do not enable the driver_override feature of struct bus_type, as SPI - in contrast to most other buses - passes "" to sysfs_emit() when the driver_override pointer is NULL. Thus, printing "\n" instead of "(null)\n". Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1] Reported-by: Gui-Dong Han <hanguidong02@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789 Fixes: 5039563e7c25 ("spi: Add driver_override SPI device attribute") Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patch.msgid.link/20260324005919.2408620-12-dakr@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02dma-mapping: add missing `inline` for `dma_free_attrs`Miguel Ojeda
[ Upstream commit 2cdaff22ed26f1e619aa2b43f27bb84f2c6ef8f8 ] Under an UML build for an upcoming series [1], I got `-Wstatic-in-inline` for `dma_free_attrs`: BINDGEN rust/bindings/bindings_generated.rs - due to target missing In file included from rust/helpers/helpers.c:59: rust/helpers/dma.c:17:2: warning: static function 'dma_free_attrs' is used in an inline function with external linkage [-Wstatic-in-inline] 17 | dma_free_attrs(dev, size, cpu_addr, dma_handle, attrs); | ^ rust/helpers/dma.c:12:1: note: use 'static' to give inline function 'rust_helper_dma_free_attrs' internal linkage 12 | __rust_helper void rust_helper_dma_free_attrs(struct device *dev, size_t size, | ^ | static The issue is that `dma_free_attrs` was not marked `inline` when it was introduced alongside the rest of the stubs. Thus mark it. Fixes: ed6ccf10f24b ("dma-mapping: properly stub out the DMA API for !CONFIG_HAS_DMA") Closes: https://lore.kernel.org/rust-for-linux/20260322194616.89847-1-ojeda@kernel.org/ [1] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260325015548.70912-1-ojeda@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02virtio-net: correct hdr_len handling for tunnel gsoXuan Zhuo
[ Upstream commit 6c860dc02a8e60b438e26940227dfa641fcdb66a ] The commit a2fb4bc4e2a6a03 ("net: implement virtio helpers to handle UDP GSO tunneling.") introduces support for the UDP GSO tunnel feature in virtio-net. The virtio spec says: If the \field{gso_type} has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit set, \field{hdr_len} accounts for all the headers up to and including the inner transport. The commit did not update the hdr_len to include the inner transport. I observed that the "hdr_len" is 116 for this packet: 17:36:18.241105 52:55:00:d1:27:0a > 2e:2c:df:46:a9:e1, ethertype IPv4 (0x0800), length 2912: (tos 0x0, ttl 64, id 45197, offset 0, flags [none], proto UDP (17), length 2898) 192.168.122.100.50613 > 192.168.122.1.4789: [bad udp cksum 0x8106 -> 0x26a0!] VXLAN, flags [I] (0x08), vni 1 fa:c3:ba:82:05:ee > ce:85:0c:31:77:e5, ethertype IPv4 (0x0800), length 2862: (tos 0x0, ttl 64, id 14678, offset 0, flags [DF], proto TCP (6), length 2848) 192.168.3.1.49880 > 192.168.3.2.9898: Flags [P.], cksum 0x9266 (incorrect -> 0xaa20), seq 515667:518463, ack 1, win 64, options [nop,nop,TS val 2990048824 ecr 2798801412], length 2796 116 = 14(mac) + 20(ip) + 8(udp) + 8(vxlan) + 14(inner mac) + 20(inner ip) + 32(innner tcp) Fixes: a2fb4bc4e2a6a03 ("net: implement virtio helpers to handle UDP GSO tunneling.") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20260320021818.111741-3-xuanzhuo@linux.alibaba.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02virtio-net: correct hdr_len handling for VIRTIO_NET_F_GUEST_HDRLENXuan Zhuo
[ Upstream commit 38ec410b99a5ee6566f75650ce3d4fd632940fd0 ] The commit be50da3e9d4a ("net: virtio_net: implement exact header length guest feature") introduces support for the VIRTIO_NET_F_GUEST_HDRLEN feature in virtio-net. This feature requires virtio-net to set hdr_len to the actual header length of the packet when transmitting, the number of bytes from the start of the packet to the beginning of the transport-layer payload. However, in practice, hdr_len was being set using skb_headlen(skb), which is clearly incorrect. This commit fixes that issue. Fixes: be50da3e9d4a ("net: virtio_net: implement exact header length guest feature") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20260320021818.111741-2-xuanzhuo@linux.alibaba.com Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02usb: core: new quirk to handle devices with zero configurationsJie Deng
[ Upstream commit 9f6a983cfa22ac662c86e60816d3a357d4b551e9 ] Some USB devices incorrectly report bNumConfigurations as 0 in their device descriptor, which causes the USB core to reject them during enumeration. logs: usb 1-2: device descriptor read/64, error -71 usb 1-2: no configurations usb 1-2: can't read configurations, error -22 However, these devices actually work correctly when treated as having a single configuration. Add a new quirk USB_QUIRK_FORCE_ONE_CONFIG to handle such devices. When this quirk is set, assume the device has 1 configuration instead of failing with -EINVAL. This quirk is applied to the device with VID:PID 5131:2007 which exhibits this behavior. Signed-off-by: Jie Deng <dengjie03@kylinos.cn> Link: https://patch.msgid.link/20260227084931.1527461-1-dengjie03@kylinos.cn Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02net: usb: r8152: add TRENDnet TUC-ET2GValentin Spreckels
[ Upstream commit 15fba71533bcdfaa8eeba69a5a5a2927afdf664a ] The TRENDnet TUC-ET2G is a RTL8156 based usb ethernet adapter. Add its vendor and product IDs. Signed-off-by: Valentin Spreckels <valentin@spreckels.dev> Link: https://patch.msgid.link/20260226195409.7891-2-valentin@spreckels.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02driver core: platform: use generic driver_override infrastructureDanilo Krummrich
[ Upstream commit 2b38efc05bf7a8568ec74bfffea0f5cfa62bc01d ] When a driver is probed through __driver_attach(), the bus' match() callback is called without the device lock held, thus accessing the driver_override field without a lock, which can cause a UAF. Fix this by using the driver-core driver_override infrastructure taking care of proper locking internally. Note that calling match() from __driver_attach() without the device lock held is intentional. [1] Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1] Reported-by: Gui-Dong Han <hanguidong02@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789 Fixes: 3d713e0e382e ("driver core: platform: add device binding path 'driver_override'") Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260303115720.48783-5-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-04-02driver core: generalize driver_override in struct deviceDanilo Krummrich
[ Upstream commit cb3d1049f4ea77d5ad93f17d8ac1f2ed4da70501 ] Currently, there are 12 busses (including platform and PCI) that duplicate the driver_override logic for their individual devices. All of them seem to be prone to the bug described in [1]. While this could be solved for every bus individually using a separate lock, solving this in the driver-core generically results in less (and cleaner) changes overall. Thus, move driver_override to struct device, provide corresponding accessors for busses and handle locking with a separate lock internally. In particular, add device_set_driver_override(), device_has_driver_override(), device_match_driver_override() and generalize the sysfs store() and show() callbacks via a driver_override feature flag in struct bus_type. Until all busses have migrated, keep driver_set_override() in place. Note that we can't use the device lock for the reasons described in [2]. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220789 [1] Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [2] Tested-by: Gui-Dong Han <hanguidong02@gmail.com> Co-developed-by: Gui-Dong Han <hanguidong02@gmail.com> Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260303115720.48783-2-dakr@kernel.org [ Use dev->bus instead of sp->bus for consistency; fix commit message to refer to the struct bus_type's driver_override feature flag. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org> Stable-dep-of: 2b38efc05bf7 ("driver core: platform: use generic driver_override infrastructure") Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25xen/privcmd: add boot control for restricted usage in domUJuergen Gross
commit 1613462be621ad5103ec338a7b0ca0746ec4e5f1 upstream. When running in an unprivileged domU under Xen, the privcmd driver is restricted to allow only hypercalls against a target domain, for which the current domU is acting as a device model. Add a boot parameter "unrestricted" to allow all hypercalls (the hypervisor will still refuse destructive hypercalls affecting other guests). Make this new parameter effective only in case the domU wasn't started using secure boot, as otherwise hypercalls targeting the domU itself might result in violating the secure boot functionality. This is achieved by adding another lockdown reason, which can be tested to not being set when applying the "unrestricted" option. This is part of XSA-482 Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-03-25binfmt_elf_fdpic: fix AUXV size calculation for ELF_HWCAP3 and ELF_HWCAP4Andrei Vagin
[ Upstream commit 4ced4cf5c9d172d91f181df3accdf949d3761aab ] Commit 4e6e8c2b757f ("binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4") added support for AT_HWCAP3 and AT_HWCAP4, but it missed updating the AUX vector size calculation in create_elf_fdpic_tables() and AT_VECTOR_SIZE_BASE in include/linux/auxvec.h. Similar to the fix for AT_HWCAP2 in commit c6a09e342f8e ("binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined"), this omission leads to a mismatch between the reserved space and the actual number of AUX entries, eventually triggering a kernel BUG_ON(csp != sp). Fix this by incrementing nitems when ELF_HWCAP3 or ELF_HWCAP4 are defined and updating AT_VECTOR_SIZE_BASE. Cc: Mark Brown <broonie@kernel.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io> Fixes: 4e6e8c2b757f ("binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4") Signed-off-by: Andrei Vagin <avagin@google.com> Link: https://patch.msgid.link/20260217180108.1420024-2-avagin@google.com Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2026-03-25bonding: prevent potential infinite loop in bond_header_parse()Eric Dumazet
[ Upstream commit b7405dcf7385445e10821777143f18c3ce20fa04 ] bond_header_parse() can loop if a stack of two bonding devices is setup, because skb->dev always points to the hierarchy top. Add new "const struct net_device *dev" parameter to (struct header_ops)->parse() method to make sure the recursion is bounded, and that the final leaf parse method is called. Fixes: 950803f72547 ("bonding: fix type confusion in bond_setup_by_slave()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@shopee.com> Tested-by: Jiayuan Chen <jiayuan.chen@shopee.com> Cc: Jay Vosburgh <jv@jvosburgh.net> Cc: Andrew Lunn <andrew+netdev@lunn.ch> Link: https://patch.msgid.link/20260315104152.1436867-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>