From 49340680d41fd540c1e67b98cf4d5310c7d6a35d Mon Sep 17 00:00:00 2001 From: Michal Pecio Date: Thu, 2 Apr 2026 16:13:42 +0300 Subject: usb: xhci: Make usb_host_endpoint.hcpriv survive endpoint_disable() commit 25e531b422dc2ac90cdae3b6e74b5cdeb081440d upstream. xHCI hardware maintains its endpoint state between add_endpoint() and drop_endpoint() calls followed by successful check_bandwidth(). So does the driver. Core may call endpoint_disable() during xHCI endpoint life, so don't clear host_ep->hcpriv then, because this breaks endpoint_reset(). If a driver calls usb_set_interface(), submits URBs which make host sequence state non-zero and calls usb_clear_halt(), the device clears its sequence state but xhci_endpoint_reset() bails out. The next URB malfunctions: USB2 loses one packet, USB3 gets Transaction Error or may not complete at all on some (buggy?) HCs from ASMedia and AMD. This is triggered by uvcvideo on bulk video devices. The code was copied from ehci_endpoint_disable() but it isn't needed here - hcpriv should only be NULL on emulated root hub endpoints. It might prevent resetting and inadvertently enabling a disabled and dropped endpoint, but core shouldn't try to reset dropped endpoints. Document xhci requirements regarding hcpriv. They are currently met. Fixes: 18b74067ac78 ("xhci: Fix use-after-free regression in xhci clear hub TT implementation") Cc: stable@vger.kernel.org Signed-off-by: Michal Pecio Signed-off-by: Mathias Nyman Link: https://patch.msgid.link/20260402131342.2628648-26-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/usb.h b/include/linux/usb.h index 2511f1e5b114d..375ee6e202dbe 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -55,7 +55,8 @@ struct ep_device; * @eusb2_isoc_ep_comp: eUSB2 isoc companion descriptor for this endpoint * @urb_list: urbs queued to this endpoint; maintained by usbcore * @hcpriv: for use by HCD; typically holds hardware dma queue head (QH) - * with one or more transfer descriptors (TDs) per urb + * with one or more transfer descriptors (TDs) per urb; must be preserved + * by core while BW is allocated for the endpoint * @ep_dev: ep_device for sysfs info * @extra: descriptors following this endpoint in the configuration * @extralen: how many bytes of "extra" are valid -- cgit v1.2.3 From 3e8fefd2997c85e98de81f35380616c0a51430a4 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Mon, 6 Apr 2026 16:22:54 -0700 Subject: driver core: Don't let a device probe until it's ready commit a2225b6e834a838ae3c93709760edc0a169eb2f2 upstream. The moment we link a "struct device" into the list of devices for the bus, it's possible probe can happen. This is because another thread can load the driver at any time and that can cause the device to probe. This has been seen in practice with a stack crawl that looks like this [1]: really_probe() __driver_probe_device() driver_probe_device() __driver_attach() bus_for_each_dev() driver_attach() bus_add_driver() driver_register() __platform_driver_register() init_module() [some module] do_one_initcall() do_init_module() load_module() __arm64_sys_finit_module() invoke_syscall() As a result of the above, it was seen that device_links_driver_bound() could be called for the device before "dev->fwnode->dev" was assigned. This prevented __fw_devlink_pickup_dangling_consumers() from being called which meant that other devices waiting on our driver's sub-nodes were stuck deferring forever. It's believed that this problem is showing up suddenly for two reasons: 1. Android has recently (last ~1 year) implemented an optimization to the order it loads modules [2]. When devices opt-in to this faster loading, modules are loaded one-after-the-other very quickly. This is unlike how other distributions do it. The reproduction of this problem has only been seen on devices that opt-in to Android's "parallel module loading". 2. Android devices typically opt-in to fw_devlink, and the most noticeable issue is the NULL "dev->fwnode->dev" in device_links_driver_bound(). fw_devlink is somewhat new code and also not in use by all Linux devices. Even though the specific symptom where "dev->fwnode->dev" wasn't assigned could be fixed by moving that assignment higher in device_add(), other parts of device_add() (like the call to device_pm_add()) are also important to run before probe. Only moving the "dev->fwnode->dev" assignment would likely fix the current symptoms but lead to difficult-to-debug problems in the future. Fix the problem by preventing probe until device_add() has run far enough that the device is ready to probe. If somehow we end up trying to probe before we're allowed, __driver_probe_device() will return -EPROBE_DEFER which will make certain the device is noticed. In the race condition that was seen with Android's faster module loading, we will temporarily add the device to the deferred list and then take it off immediately when device_add() probes the device. Instead of adding another flag to the bitfields already in "struct device", instead add a new "flags" field and use that. This allows us to freely change the bit from different thread without worrying about corrupting nearby bits (and means threads changing other bit won't corrupt us). [1] Captured on a machine running a downstream 6.6 kernel [2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel Cc: stable@vger.kernel.org Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing") Reviewed-by: Alan Stern Reviewed-by: Rafael J. Wysocki (Intel) Reviewed-by: Danilo Krummrich Acked-by: Greg Kroah-Hartman Acked-by: Marek Szyprowski Signed-off-by: Douglas Anderson Link: https://patch.msgid.link/20260406162231.v5.1.Id750b0fbcc94f23ed04b7aecabcead688d0d8c17@changeid Signed-off-by: Danilo Krummrich Signed-off-by: Greg Kroah-Hartman --- include/linux/device.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) (limited to 'include/linux') diff --git a/include/linux/device.h b/include/linux/device.h index 8733a4edf3ccd..f5d5da4dfab1d 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -477,6 +477,21 @@ struct device_physical_location { bool lid; }; +/** + * enum struct_device_flags - Flags in struct device + * + * Each flag should have a set of accessor functions created via + * __create_dev_flag_accessors() for each access. + * + * @DEV_FLAG_READY_TO_PROBE: If set then device_add() has finished enough + * initialization that probe could be called. + */ +enum struct_device_flags { + DEV_FLAG_READY_TO_PROBE = 0, + + DEV_FLAG_COUNT +}; + /** * struct device - The basic device structure * @parent: The device's "parent" device, the device to which it is attached. @@ -572,6 +587,7 @@ struct device_physical_location { * @dma_skip_sync: DMA sync operations can be skipped for coherent buffers. * @dma_iommu: Device is using default IOMMU implementation for DMA and * doesn't rely on dma_ops structure. + * @flags: DEV_FLAG_XXX flags. Use atomic bitfield operations to modify. * * At the lowest level, every device in a Linux system is represented by an * instance of struct device. The device structure contains the information @@ -694,8 +710,36 @@ struct device { #ifdef CONFIG_IOMMU_DMA bool dma_iommu:1; #endif + + DECLARE_BITMAP(flags, DEV_FLAG_COUNT); }; +#define __create_dev_flag_accessors(accessor_name, flag_name) \ +static inline bool dev_##accessor_name(const struct device *dev) \ +{ \ + return test_bit(flag_name, dev->flags); \ +} \ +static inline void dev_set_##accessor_name(struct device *dev) \ +{ \ + set_bit(flag_name, dev->flags); \ +} \ +static inline void dev_clear_##accessor_name(struct device *dev) \ +{ \ + clear_bit(flag_name, dev->flags); \ +} \ +static inline void dev_assign_##accessor_name(struct device *dev, bool value) \ +{ \ + assign_bit(flag_name, dev->flags, value); \ +} \ +static inline bool dev_test_and_set_##accessor_name(struct device *dev) \ +{ \ + return test_and_set_bit(flag_name, dev->flags); \ +} + +__create_dev_flag_accessors(ready_to_probe, DEV_FLAG_READY_TO_PROBE); + +#undef __create_dev_flag_accessors + /** * struct device_link - Device link representation. * @supplier: The device on the supplier end of the link. -- cgit v1.2.3 From fa9a4c5e69aaae47df95328fa96b3f2931e3180a Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Tue, 17 Mar 2026 09:01:20 -0700 Subject: device property: Make modifications of fwnode "flags" thread safe commit f72e77c33e4b5657af35125e75bab249256030f3 upstream. In various places in the kernel, we modify the fwnode "flags" member by doing either: fwnode->flags |= SOME_FLAG; fwnode->flags &= ~SOME_FLAG; This type of modification is not thread-safe. If two threads are both mucking with the flags at the same time then one can clobber the other. While flags are often modified while under the "fwnode_link_lock", this is not universally true. Create some accessor functions for setting, clearing, and testing the FWNODE flags and move all users to these accessor functions. New accessor functions use set_bit() and clear_bit(), which are thread-safe. Cc: stable@vger.kernel.org Fixes: c2c724c868c4 ("driver core: Add fw_devlink_parse_fwtree()") Reviewed-by: Andy Shevchenko Acked-by: Mark Brown Reviewed-by: Wolfram Sang Signed-off-by: Douglas Anderson Reviewed-by: Rafael J. Wysocki (Intel) Reviewed-by: Saravana Kannan Link: https://patch.msgid.link/20260317090112.v2.1.I0a4d03104ecd5103df3d76f66c8d21b1d15a2e38@changeid [ Fix fwnode_clear_flag() argument alignment, restore dropped blank line in fwnode_dev_initialized(), and remove unnecessary parentheses around fwnode_test_flag() calls. - Danilo ] Signed-off-by: Danilo Krummrich Signed-off-by: Greg Kroah-Hartman --- include/linux/fwnode.h | 44 +++++++++++++++++++++++++++++++++----------- 1 file changed, 33 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h index 097be89487bf5..80b38fbf2121c 100644 --- a/include/linux/fwnode.h +++ b/include/linux/fwnode.h @@ -15,6 +15,7 @@ #define _LINUX_FWNODE_H_ #include +#include #include #include #include @@ -42,12 +43,12 @@ struct device; * suppliers. Only enforce ordering with suppliers that have * drivers. */ -#define FWNODE_FLAG_LINKS_ADDED BIT(0) -#define FWNODE_FLAG_NOT_DEVICE BIT(1) -#define FWNODE_FLAG_INITIALIZED BIT(2) -#define FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD BIT(3) -#define FWNODE_FLAG_BEST_EFFORT BIT(4) -#define FWNODE_FLAG_VISITED BIT(5) +#define FWNODE_FLAG_LINKS_ADDED 0 +#define FWNODE_FLAG_NOT_DEVICE 1 +#define FWNODE_FLAG_INITIALIZED 2 +#define FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD 3 +#define FWNODE_FLAG_BEST_EFFORT 4 +#define FWNODE_FLAG_VISITED 5 struct fwnode_handle { struct fwnode_handle *secondary; @@ -57,7 +58,7 @@ struct fwnode_handle { struct device *dev; struct list_head suppliers; struct list_head consumers; - u8 flags; + unsigned long flags; }; /* @@ -212,16 +213,37 @@ static inline void fwnode_init(struct fwnode_handle *fwnode, INIT_LIST_HEAD(&fwnode->suppliers); } +static inline void fwnode_set_flag(struct fwnode_handle *fwnode, + unsigned int bit) +{ + set_bit(bit, &fwnode->flags); +} + +static inline void fwnode_clear_flag(struct fwnode_handle *fwnode, + unsigned int bit) +{ + clear_bit(bit, &fwnode->flags); +} + +static inline void fwnode_assign_flag(struct fwnode_handle *fwnode, + unsigned int bit, bool value) +{ + assign_bit(bit, &fwnode->flags, value); +} + +static inline bool fwnode_test_flag(struct fwnode_handle *fwnode, + unsigned int bit) +{ + return test_bit(bit, &fwnode->flags); +} + static inline void fwnode_dev_initialized(struct fwnode_handle *fwnode, bool initialized) { if (IS_ERR_OR_NULL(fwnode)) return; - if (initialized) - fwnode->flags |= FWNODE_FLAG_INITIALIZED; - else - fwnode->flags &= ~FWNODE_FLAG_INITIALIZED; + fwnode_assign_flag(fwnode, FWNODE_FLAG_INITIALIZED, initialized); } int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup, -- cgit v1.2.3 From d5b495ba9de0423ef39f8bd86729a885870c7efe Mon Sep 17 00:00:00 2001 From: Hao Ge Date: Tue, 31 Mar 2026 16:13:12 +0800 Subject: mm/alloc_tag: clear codetag for pages allocated before page_ext initialization commit 6b1842775a460245e97d36d3a67d0cfba7c4ff79 upstream. Due to initialization ordering, page_ext is allocated and initialized relatively late during boot. Some pages have already been allocated and freed before page_ext becomes available, leaving their codetag uninitialized. A clear example is in init_section_page_ext(): alloc_page_ext() calls kmemleak_alloc(). If the slab cache has no free objects, it falls back to the buddy allocator to allocate memory. However, at this point page_ext is not yet fully initialized, so these newly allocated pages have no codetag set. These pages may later be reclaimed by KASAN, which causes the warning to trigger when they are freed because their codetag ref is still empty. Use a global array to track pages allocated before page_ext is fully initialized. The array size is fixed at 8192 entries, and will emit a warning if this limit is exceeded. When page_ext initialization completes, set their codetag to empty to avoid warnings when they are freed later. This warning is only observed with CONFIG_MEM_ALLOC_PROFILING_DEBUG=Y and mem_profiling_compressed disabled: [ 9.582133] ------------[ cut here ]------------ [ 9.582137] alloc_tag was not set [ 9.582139] WARNING: ./include/linux/alloc_tag.h:164 at __pgalloc_tag_sub+0x40f/0x550, CPU#5: systemd/1 [ 9.582190] CPU: 5 UID: 0 PID: 1 Comm: systemd Not tainted 7.0.0-rc4 #1 PREEMPT(lazy) [ 9.582192] Hardware name: Red Hat KVM, BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 9.582194] RIP: 0010:__pgalloc_tag_sub+0x40f/0x550 [ 9.582196] Code: 00 00 4c 29 e5 48 8b 05 1f 88 56 05 48 8d 4c ad 00 48 8d 2c c8 e9 87 fd ff ff 0f 0b 0f 0b e9 f3 fe ff ff 48 8d 3d 61 2f ed 03 <67> 48 0f b9 3a e9 b3 fd ff ff 0f 0b eb e4 e8 5e cd 14 02 4c 89 c7 [ 9.582197] RSP: 0018:ffffc9000001f940 EFLAGS: 00010246 [ 9.582200] RAX: dffffc0000000000 RBX: 1ffff92000003f2b RCX: 1ffff110200d806c [ 9.582201] RDX: ffff8881006c0360 RSI: 0000000000000004 RDI: ffffffff9bc7b460 [ 9.582202] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff3a62324 [ 9.582203] R10: ffffffff9d311923 R11: 0000000000000000 R12: ffffea0004001b00 [ 9.582204] R13: 0000000000002000 R14: ffffea0000000000 R15: ffff8881006c0360 [ 9.582206] FS: 00007ffbbcf2d940(0000) GS:ffff888450479000(0000) knlGS:0000000000000000 [ 9.582208] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9.582210] CR2: 000055ee3aa260d0 CR3: 0000000148b67005 CR4: 0000000000770ef0 [ 9.582211] PKRU: 55555554 [ 9.582212] Call Trace: [ 9.582213] [ 9.582214] ? __pfx___pgalloc_tag_sub+0x10/0x10 [ 9.582216] ? check_bytes_and_report+0x68/0x140 [ 9.582219] __free_frozen_pages+0x2e4/0x1150 [ 9.582221] ? __free_slab+0xc2/0x2b0 [ 9.582224] qlist_free_all+0x4c/0xf0 [ 9.582227] kasan_quarantine_reduce+0x15d/0x180 [ 9.582229] __kasan_slab_alloc+0x69/0x90 [ 9.582232] kmem_cache_alloc_noprof+0x14a/0x500 [ 9.582234] do_getname+0x96/0x310 [ 9.582237] do_readlinkat+0x91/0x2f0 [ 9.582239] ? __pfx_do_readlinkat+0x10/0x10 [ 9.582240] ? get_random_bytes_user+0x1df/0x2c0 [ 9.582244] __x64_sys_readlinkat+0x96/0x100 [ 9.582246] do_syscall_64+0xce/0x650 [ 9.582250] ? __x64_sys_getrandom+0x13a/0x1e0 [ 9.582252] ? __pfx___x64_sys_getrandom+0x10/0x10 [ 9.582254] ? do_syscall_64+0x114/0x650 [ 9.582255] ? ksys_read+0xfc/0x1d0 [ 9.582258] ? __pfx_ksys_read+0x10/0x10 [ 9.582260] ? do_syscall_64+0x114/0x650 [ 9.582262] ? do_syscall_64+0x114/0x650 [ 9.582264] ? __pfx_fput_close_sync+0x10/0x10 [ 9.582266] ? file_close_fd_locked+0x178/0x2a0 [ 9.582268] ? __x64_sys_faccessat2+0x96/0x100 [ 9.582269] ? __x64_sys_close+0x7d/0xd0 [ 9.582271] ? do_syscall_64+0x114/0x650 [ 9.582273] ? do_syscall_64+0x114/0x650 [ 9.582275] ? clear_bhb_loop+0x50/0xa0 [ 9.582277] ? clear_bhb_loop+0x50/0xa0 [ 9.582279] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 9.582280] RIP: 0033:0x7ffbbda345ee [ 9.582282] Code: 0f 1f 40 00 48 8b 15 29 38 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 0b 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d fa 37 0d 00 f7 d8 64 89 01 48 [ 9.582284] RSP: 002b:00007ffe2ad8de58 EFLAGS: 00000202 ORIG_RAX: 000000000000010b [ 9.582286] RAX: ffffffffffffffda RBX: 000055ee3aa25570 RCX: 00007ffbbda345ee [ 9.582287] RDX: 000055ee3aa25570 RSI: 00007ffe2ad8dee0 RDI: 00000000ffffff9c [ 9.582288] RBP: 0000000000001000 R08: 0000000000000003 R09: 0000000000001001 [ 9.582289] R10: 0000000000001000 R11: 0000000000000202 R12: 0000000000000033 [ 9.582290] R13: 00007ffe2ad8dee0 R14: 00000000ffffff9c R15: 00007ffe2ad8deb0 [ 9.582292] [ 9.582293] ---[ end trace 0000000000000000 ]--- Link: https://lore.kernel.org/20260331081312.123719-1-hao.ge@linux.dev Fixes: dcfe378c81f72 ("lib: introduce support for page allocation tagging") Signed-off-by: Hao Ge Suggested-by: Suren Baghdasaryan Acked-by: Suren Baghdasaryan Cc: Kent Overstreet Cc: Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- include/linux/alloc_tag.h | 2 ++ include/linux/pgalloc_tag.h | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h index d40ac39bfbe8d..02de2ede560f3 100644 --- a/include/linux/alloc_tag.h +++ b/include/linux/alloc_tag.h @@ -163,9 +163,11 @@ static inline void alloc_tag_sub_check(union codetag_ref *ref) { WARN_ONCE(ref && !ref->ct, "alloc_tag was not set\n"); } +void alloc_tag_add_early_pfn(unsigned long pfn); #else static inline void alloc_tag_add_check(union codetag_ref *ref, struct alloc_tag *tag) {} static inline void alloc_tag_sub_check(union codetag_ref *ref) {} +static inline void alloc_tag_add_early_pfn(unsigned long pfn) {} #endif /* Caller should verify both ref and tag to be valid */ diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h index 38a82d65e58e9..951d333622685 100644 --- a/include/linux/pgalloc_tag.h +++ b/include/linux/pgalloc_tag.h @@ -181,7 +181,7 @@ static inline struct alloc_tag *__pgalloc_tag_get(struct page *page) if (get_page_tag_ref(page, &ref, &handle)) { alloc_tag_sub_check(&ref); - if (ref.ct) + if (ref.ct && !is_codetag_empty(&ref)) tag = ct_to_alloc_tag(ref.ct); put_page_tag_ref(handle); } -- cgit v1.2.3 From 2691332ad88b57179c38653e2cd613d5820a52cf Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Fri, 27 Mar 2026 16:33:14 -0700 Subject: mm/damon/core: fix damon_call() vs kdamond_fn() exit race commit 55da81663b9642dd046b26dd6f1baddbcf337c1e upstream. Patch series "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race". damon_call() and damos_walk() can leak memory and/or deadlock when they race with kdamond terminations. Fix those. This patch (of 2); When kdamond_fn() main loop is finished, the function cancels all remaining damon_call() requests and unset the damon_ctx->kdamond so that API callers and API functions themselves can know the context is terminated. damon_call() adds the caller's request to the queue first. After that, it shows if the kdamond of the damon_ctx is still running (damon_ctx->kdamond is set). Only if the kdamond is running, damon_call() starts waiting for the kdamond's handling of the newly added request. The damon_call() requests registration and damon_ctx->kdamond unset are protected by different mutexes, though. Hence, damon_call() could race with damon_ctx->kdamond unset, and result in deadlocks. For example, let's suppose kdamond successfully finished the damon_call() requests cancelling. Right after that, damon_call() is called for the context. It registers the new request, and shows the context is still running, because damon_ctx->kdamond unset is not yet done. Hence the damon_call() caller starts waiting for the handling of the request. However, the kdamond is already on the termination steps, so it never handles the new request. As a result, the damon_call() caller threads infinitely waits. Fix this by introducing another damon_ctx field, namely call_controls_obsolete. It is protected by the damon_ctx->call_controls_lock, which protects damon_call() requests registration. Initialize (unset) it in kdamond_fn() before letting damon_start() returns and set it just before the cancelling of remaining damon_call() requests is executed. damon_call() reads the obsolete field under the lock and avoids adding a new request. After this change, only requests that are guaranteed to be handled or cancelled are registered. Hence the after-registration DAMON context termination check is no longer needed. Remove it together. Note that the deadlock will not happen when damon_call() is called for repeat mode request. In tis case, damon_call() returns instead of waiting for the handling when the request registration succeeds and it shows the kdamond is running. However, if the request also has dealloc_on_cancel, the request memory would be leaked. The issue is found by sashiko [1]. Link: https://lore.kernel.org/20260327233319.3528-1-sj@kernel.org Link: https://lore.kernel.org/20260327233319.3528-2-sj@kernel.org Link: https://lore.kernel.org/20260325141956.87144-1-sj@kernel.org [1] Fixes: 42b7491af14c ("mm/damon/core: introduce damon_call()") Signed-off-by: SeongJae Park Cc: # 6.14.x Signed-off-by: Andrew Morton Signed-off-by: Greg Kroah-Hartman --- include/linux/damon.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 1a8a79d7e4e8d..6fe6f7fcf83d8 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -781,6 +781,7 @@ struct damon_ctx { /* lists of &struct damon_call_control */ struct list_head call_controls; + bool call_controls_obsolete; struct mutex call_controls_lock; struct damos_walk_control *walk_control; -- cgit v1.2.3 From c5dfddc57f1b6b4e4df6610e917baceb114b8f6e Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 22 Mar 2024 14:22:48 +0100 Subject: tpm: avoid -Wunused-but-set-variable commit 6f1d4d2ecfcd1b577dc87350ea965fe81f272e83 upstream. Outside of the EFI tpm code, the TPM_MEMREMAP()/TPM_MEMUNMAP functions are defined as trivial macros, leading to the mapping_size variable ending up unused: In file included from drivers/char/tpm/tpm-sysfs.c:16: In file included from drivers/char/tpm/tpm.h:28: include/linux/tpm_eventlog.h:167:6: error: variable 'mapping_size' set but not used [-Werror,-Wunused-but-set-variable] 167 | int mapping_size; Turn the stubs into inline functions to avoid this warning. Cc: stable@vger.kernel.org # v5.3+ Fixes: c46f3405692d ("tpm: Reserve the TPM final events table") Signed-off-by: Arnd Bergmann Reviewed-by: Thorsten Blum Reviewed-by: Jarkko Sakkinen Signed-off-by: Jarkko Sakkinen Signed-off-by: Greg Kroah-Hartman --- include/linux/tpm_eventlog.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h index 891368e82558e..aff8ea2fa98e5 100644 --- a/include/linux/tpm_eventlog.h +++ b/include/linux/tpm_eventlog.h @@ -131,11 +131,16 @@ struct tcg_algorithm_info { }; #ifndef TPM_MEMREMAP -#define TPM_MEMREMAP(start, size) NULL +static inline void *TPM_MEMREMAP(unsigned long start, size_t size) +{ + return NULL; +} #endif #ifndef TPM_MEMUNMAP -#define TPM_MEMUNMAP(start, size) do{} while(0) +static inline void TPM_MEMUNMAP(void *mapping, size_t size) +{ +} #endif /** -- cgit v1.2.3 From d780f24a4939862e1df9def415fe7de59280fc7c Mon Sep 17 00:00:00 2001 From: Ryan Roberts Date: Tue, 3 Mar 2026 15:08:38 +0000 Subject: randomize_kstack: Maintain kstack_offset per task commit 37beb42560165869838e7d91724f3e629db64129 upstream. kstack_offset was previously maintained per-cpu, but this caused a couple of issues. So let's instead make it per-task. Issue 1: add_random_kstack_offset() and choose_random_kstack_offset() expected and required to be called with interrupts and preemption disabled so that it could manipulate per-cpu state. But arm64, loongarch and risc-v are calling them with interrupts and preemption enabled. I don't _think_ this causes any functional issues, but it's certainly unexpected and could lead to manipulating the wrong cpu's state, which could cause a minor performance degradation due to bouncing the cache lines. By maintaining the state per-task those functions can safely be called in preemptible context. Issue 2: add_random_kstack_offset() is called before executing the syscall and expands the stack using a previously chosen random offset. choose_random_kstack_offset() is called after executing the syscall and chooses and stores a new random offset for the next syscall. With per-cpu storage for this offset, an attacker could force cpu migration during the execution of the syscall and prevent the offset from being updated for the original cpu such that it is predictable for the next syscall on that cpu. By maintaining the state per-task, this problem goes away because the per-task random offset is updated after the syscall regardless of which cpu it is executing on. Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall") Closes: https://lore.kernel.org/all/dd8c37bc-795f-4c7a-9086-69e584d8ab24@arm.com/ Cc: stable@vger.kernel.org Acked-by: Mark Rutland Signed-off-by: Ryan Roberts Link: https://patch.msgid.link/20260303150840.3789438-2-ryan.roberts@arm.com Signed-off-by: Kees Cook Signed-off-by: Greg Kroah-Hartman --- include/linux/randomize_kstack.h | 26 +++++++++++++++----------- include/linux/sched.h | 4 ++++ 2 files changed, 19 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h index 1d982dbdd0d0b..5d3916ca747cc 100644 --- a/include/linux/randomize_kstack.h +++ b/include/linux/randomize_kstack.h @@ -9,7 +9,6 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, randomize_kstack_offset); -DECLARE_PER_CPU(u32, kstack_offset); /* * Do not use this anywhere else in the kernel. This is used here because @@ -50,15 +49,14 @@ DECLARE_PER_CPU(u32, kstack_offset); * add_random_kstack_offset - Increase stack utilization by previously * chosen random offset * - * This should be used in the syscall entry path when interrupts and - * preempt are disabled, and after user registers have been stored to - * the stack. For testing the resulting entropy, please see: - * tools/testing/selftests/lkdtm/stack-entropy.sh + * This should be used in the syscall entry path after user registers have been + * stored to the stack. Preemption may be enabled. For testing the resulting + * entropy, please see: tools/testing/selftests/lkdtm/stack-entropy.sh */ #define add_random_kstack_offset() do { \ if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ &randomize_kstack_offset)) { \ - u32 offset = raw_cpu_read(kstack_offset); \ + u32 offset = current->kstack_offset; \ u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset)); \ /* Keep allocation even after "ptr" loses scope. */ \ asm volatile("" :: "r"(ptr) : "memory"); \ @@ -69,9 +67,9 @@ DECLARE_PER_CPU(u32, kstack_offset); * choose_random_kstack_offset - Choose the random offset for the next * add_random_kstack_offset() * - * This should only be used during syscall exit when interrupts and - * preempt are disabled. This position in the syscall flow is done to - * frustrate attacks from userspace attempting to learn the next offset: + * This should only be used during syscall exit. Preemption may be enabled. This + * position in the syscall flow is done to frustrate attacks from userspace + * attempting to learn the next offset: * - Maximize the timing uncertainty visible from userspace: if the * offset is chosen at syscall entry, userspace has much more control * over the timing between choosing offsets. "How long will we be in @@ -85,14 +83,20 @@ DECLARE_PER_CPU(u32, kstack_offset); #define choose_random_kstack_offset(rand) do { \ if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ &randomize_kstack_offset)) { \ - u32 offset = raw_cpu_read(kstack_offset); \ + u32 offset = current->kstack_offset; \ offset = ror32(offset, 5) ^ (rand); \ - raw_cpu_write(kstack_offset, offset); \ + current->kstack_offset = offset; \ } \ } while (0) + +static inline void random_kstack_task_init(struct task_struct *tsk) +{ + tsk->kstack_offset = 0; +} #else /* CONFIG_RANDOMIZE_KSTACK_OFFSET */ #define add_random_kstack_offset() do { } while (0) #define choose_random_kstack_offset(rand) do { } while (0) +#define random_kstack_task_init(tsk) do { } while (0) #endif /* CONFIG_RANDOMIZE_KSTACK_OFFSET */ #endif diff --git a/include/linux/sched.h b/include/linux/sched.h index 3e2005e9e2f0b..2a540a9065dea 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1614,6 +1614,10 @@ struct task_struct { unsigned long prev_lowest_stack; #endif +#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET + u32 kstack_offset; +#endif + #ifdef CONFIG_X86_MCE void __user *mce_vaddr; __u64 mce_kflags; -- cgit v1.2.3 From 6f977b0472f700e4bcb981342ceacf2f1b1a3671 Mon Sep 17 00:00:00 2001 From: Anthony Yznaga Date: Tue, 28 Apr 2026 16:05:33 -0400 Subject: mm: prevent droppable mappings from being locked [ Upstream commit d239462787b072c78eb19fc1f155c3d411256282 ] Droppable mappings must not be lockable. There is a check for VMAs with VM_DROPPABLE set in mlock_fixup() along with checks for other types of unlockable VMAs which ensures this when calling mlock()/mlock2(). For mlockall(MCL_FUTURE), the check for unlockable VMAs is different. In apply_mlockall_flags(), if the flags parameter has MCL_FUTURE set, the current task's mm's default VMA flag field mm->def_flags has VM_LOCKED applied to it. VM_LOCKONFAULT is also applied if MCL_ONFAULT is also set. When these flags are set as default in this manner they are cleared in __mmap_complete() for new mappings that do not support mlock. A check for VM_DROPPABLE in __mmap_complete() is missing resulting in droppable mappings created with VM_LOCKED set. To fix this and reduce that chance of similar bugs in the future, introduce and use vma_supports_mlock(). Link: https://lkml.kernel.org/r/20260310155821.17869-1-anthony.yznaga@oracle.com Fixes: 9651fcedf7b9 ("mm: add MAP_DROPPABLE for designating always lazily freeable mappings") Signed-off-by: Anthony Yznaga Suggested-by: David Hildenbrand Acked-by: David Hildenbrand (Arm) Reviewed-by: Pedro Falcato Reviewed-by: Lorenzo Stoakes (Oracle) Tested-by: Lorenzo Stoakes (Oracle) Cc: Jann Horn Cc: Jason A. Donenfeld Cc: Liam Howlett Cc: Michal Hocko Cc: Mike Rapoport Cc: Shuah Khan Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Signed-off-by: Andrew Morton [ added const to is_vm_hugetlb_page and stubbed vma_supports_mlock in vma_internal.h instead of the split-out stubs.h ] Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/linux/hugetlb_inline.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h index 0660a03d37d98..846185ea626c7 100644 --- a/include/linux/hugetlb_inline.h +++ b/include/linux/hugetlb_inline.h @@ -6,14 +6,14 @@ #include -static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma) +static inline bool is_vm_hugetlb_page(const struct vm_area_struct *vma) { return !!(vma->vm_flags & VM_HUGETLB); } #else -static inline bool is_vm_hugetlb_page(struct vm_area_struct *vma) +static inline bool is_vm_hugetlb_page(const struct vm_area_struct *vma) { return false; } -- cgit v1.2.3 From b8c5acce56e07eb6654d0ff0427945875d9179f8 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Mon, 13 Apr 2026 19:59:11 -0700 Subject: driver core: Add kernel-doc for DEV_FLAG_COUNT enum value commit 5b484311507b5d403c1f7a45f6aa3778549e268b upstream. Even though nobody should use this value (except when declaring the "flags" bitmap), kernel-doc still gets upset that it's not documented. It reports: WARNING: ../include/linux/device.h:519 Enum value 'DEV_FLAG_COUNT' not described in enum 'struct_device_flags' Add the description of DEV_FLAG_COUNT. Fixes: a2225b6e834a ("driver core: Don't let a device probe until it's ready") Reported-by: Randy Dunlap Closes: https://lore.kernel.org/f318cd43-81fd-48b9-abf7-92af85f12f91@infradead.org Signed-off-by: Douglas Anderson Tested-by: Randy Dunlap Reviewed-by: Randy Dunlap Link: https://patch.msgid.link/20260413195910.1.I23aca74fe2d3636a47df196a80920fecb2643220@changeid Signed-off-by: Danilo Krummrich Signed-off-by: Greg Kroah-Hartman --- include/linux/device.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/device.h b/include/linux/device.h index f5d5da4dfab1d..84910bffe7953 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -485,6 +485,7 @@ struct device_physical_location { * * @DEV_FLAG_READY_TO_PROBE: If set then device_add() has finished enough * initialization that probe could be called. + * @DEV_FLAG_COUNT: Number of defined struct_device_flags. */ enum struct_device_flags { DEV_FLAG_READY_TO_PROBE = 0, -- cgit v1.2.3