linux.git - Linux kernel source tree

Age	Commit message (Collapse)	Author
2025-12-23	drm/gpusvm: Introduce a function to scan the current migration state	Thomas Hellström
	With multi-device we are much more likely to have multiple drm-gpusvm ranges pointing to the same struct mm range. To avoid calling into drm_pagemap_populate_mm(), which is always very costly, introduce a much less costly drm_gpusvm function, drm_gpusvm_scan_mm() to scan the current migration state. The device fault-handler and prefetcher can use this function to determine whether migration is really necessary. There are a couple of performance improvements that can be done for this function if it turns out to be too costly. Those are documented in the code. v3: - New patch. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-21-thomas.hellstrom@linux.intel.com
2025-12-23	drm/pagemap, drm/xe: Clean up the use of the device-private page owner	Thomas Hellström
	Use the dev_pagemap->owner field wherever possible, simplifying the code slightly. v3: New patch Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-20-thomas.hellstrom@linux.intel.com
2025-12-23	drm/xe/svm: Document how xe keeps drm_pagemap references	Thomas Hellström
	As an aid to understanding the lifetime of the drm_pagemaps used by the xe driver, document how the xe driver keeps the drm_pagemap references. v3: - Fix formatting (Matt Brost) Suggested-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251219113320.183860-19-thomas.hellstrom@linux.intel.com
2025-12-23	drm/xe/vm: Add a couple of VM debug printouts	Thomas Hellström
	Add debug printouts that are valueable for pagemap prefetch, migration and page collection. v2: - Add additional debug prinouts around migration and page collection. - Require CONFIG_DRM_XE_DEBUG_VM. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1 Link: https://patch.msgid.link/20251219113320.183860-18-thomas.hellstrom@linux.intel.com
2025-12-23	drm/xe: Support pcie p2p dma as a fast interconnect	Thomas Hellström
	Mimic the dma-buf method using dma_[map\|unmap]_resource to map for pcie-p2p dma. There's an ongoing area of work upstream to sort out how this best should be done. One method proposed is to add an additional pci_p2p_dma_pagemap aliasing the device_private pagemap and use the corresponding pci_p2p_dma_pagemap page as input for dma_map_page(). However, that would incur double the amount of memory and latency to set up the drm_pagemap and given the huge amount of memory present on modern GPUs, that would really not work. Hence the simple approach used in this patch. v2: - Simplify xe_page_to_pcie(). (Matt Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251219113320.183860-17-thomas.hellstrom@linux.intel.com
2025-12-23	drm/xe/uapi: Extend the madvise functionality to support foreign pagemap ↵	Thomas Hellström
	placement for svm Use device file descriptors and regions to represent pagemaps on foreign or local devices. The underlying files are type-checked at madvise time, and references are kept on the drm_pagemap as long as there is are madvises pointing to it. Extend the madvise preferred_location UAPI to support the region instance to identify the foreign placement. v2: - Improve UAPI documentation. (Matt Brost) - Sanitize preferred_mem_loc.region_instance madvise. (Matt Brost) - Clarify madvise drm_pagemap vs xe_pagemap refcounting. (Matt Brost) - Don't allow a foreign drm_pagemap madvise without a fast interconnect. v3: - Add a comment about reference-counting in xe_devmem_open() and remove the reference-count get-and-put. (Matt Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251219113320.183860-16-thomas.hellstrom@linux.intel.com
2025-12-23	drm/xe: Simplify madvise_preferred_mem_loc()	Thomas Hellström
	Simplify madvise_preferred_mem_loc by removing repetitive patterns in favour of local variables. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251219113320.183860-15-thomas.hellstrom@linux.intel.com
2025-12-23	drm/xe: Use the vma attibute drm_pagemap to select where to migrate	Thomas Hellström
	Honor the drm_pagemap vma attribute when migrating SVM pages. Ensure that when the desired placement is validated as device memory, that we also check that the requested drm_pagemap is consistent with the current. v2: - Initialize a struct drm_pagemap pointer to NULL that could otherwise be dereferenced uninitialized. (CI) - Remove a redundant assignment (Matt Brost) - Slightly improved commit message (Matt Brost) - Extended drm_pagemap validation. v3: - Fix a compilation error if CONFIG_DRM_GPUSVM is not enabled. (kernel test robot <lkp@intel.com>) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patch.msgid.link/20251219113320.183860-14-thomas.hellstrom@linux.intel.com
2025-12-23	drm/xe: Pass a drm_pagemap pointer around with the memory advise attributes	Thomas Hellström
	As a consequence, struct xe_vma_mem_attr() can't simply be assigned or freed without taking the reference count of individual members into account. Also add helpers to do that. v2: - Move some calls to xe_vma_mem_attr_fini() to xe_vma_free(). (Matt Brost) v3: - Rebase. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v2 Link: https://patch.msgid.link/20251219113320.183860-13-thomas.hellstrom@linux.intel.com
2025-12-23	drm/xe: Use the drm_pagemap_util helper to get a svm pagemap owner	Thomas Hellström
	Register a driver-wide owner list, provide a callback to identify fast interconnects and use the drm_pagemap_util helper to allocate or reuse a suitable owner struct. For now we consider pagemaps on different tiles on the same device as having fast interconnect and thus the same owner. v2: - Fix up the error onion unwind in xe_pagemap_create(). (Matt Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251219113320.183860-12-thomas.hellstrom@linux.intel.com
2025-12-23	drm/pagemap_util: Add a utility to assign an owner to a set of ↵	Thomas Hellström
	interconnected gpus The hmm_range_fault() and the migration helpers currently need a common "owner" to identify pagemaps and clients with fast interconnect. Add a drm_pagemap utility to setup such owners by registering drm_pagemaps, in a registry, and for each new drm_pagemap, query which existing drm_pagemaps have fast interconnects with the new drm_pagemap. The "owner" scheme is limited in that it is static at drm_pagemap creation. Ideally one would want the owner to be adjusted at run-time, but that requires changes to hmm. If the proposed scheme becomes too limited, we need to revisit. v2: - Improve documentation of DRM_PAGEMAP_OWNER_LIST_DEFINE(). (Matt Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-11-thomas.hellstrom@linux.intel.com
2025-12-23	drm/pagemap: Remove the drm_pagemap_create() interface	Thomas Hellström
	With the drm_pagemap_init() interface, drm_pagemap_create() is not used anymore. v2: - Slightly more verbose commit message. (Matt Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-10-thomas.hellstrom@linux.intel.com
2025-12-23	drm/xe: Use the drm_pagemap cache and shrinker	Thomas Hellström
	Define a struct xe_pagemap that embeds all pagemap-related data used by xekmd, and use the drm_pagemap cache- and shrinker to manage lifetime. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251219113320.183860-9-thomas.hellstrom@linux.intel.com
2025-12-23	drm/pagemap: Add a drm_pagemap cache and shrinker	Thomas Hellström
	Pagemaps are costly to set up and tear down, and they consume a lot of system memory for the struct pages. Ideally they should be created only when needed. Add a caching mechanism to allow doing just that: Create the drm_pagemaps when needed for migration. Keep them around to avoid destruction and re-creation latencies and destroy inactive/unused drm_pagemaps on memory pressure using a shrinker. Only add the helper functions. They will be hooked up to the xe driver in the upcoming patch. v2: - Add lockdep checking for drm_pagemap_put(). (Matt Brost) - Add a copyright notice. (Matt Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-8-thomas.hellstrom@linux.intel.com
2025-12-23	drm/pagemap, drm/xe: Manage drm_pagemap provider lifetimes	Thomas Hellström
	If a device holds a reference on a foregin device's drm_pagemap, and a device unbind is executed on the foreign device, Typically that foreign device would evict its device-private pages and then continue its device-managed cleanup eventually releasing its drm device and possibly allow for module unload. However, since we're still holding a reference on a drm_pagemap, when that reference is released and the provider module is unloaded we'd execute out of undefined memory. Therefore keep a reference on the provider device and module until the last drm_pagemap reference is gone. Note that in theory, the drm_gpusvm_helper module may be unloaded as soon as the final module_put() of the provider driver module is executed, so we need to add a module_exit() function that waits for the work item executing the module_put() has completed. v2: - Better commit message (Matt Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-7-thomas.hellstrom@linux.intel.com
2025-12-23	drm/pagemap: Add a refcounted drm_pagemap backpointer to struct drm_pagemap_zdd	Thomas Hellström
	To be able to keep track of drm_pagemap usage, add a refcounted backpointer to struct drm_pagemap_zdd. This will keep the drm_pagemap reference count from dropping to zero as long as there are drm_pagemap pages present in a CPU address space. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-6-thomas.hellstrom@linux.intel.com
2025-12-23	drm/pagemap, drm/xe: Add refcounting to struct drm_pagemap	Thomas Hellström
	With the end goal of being able to free unused pagemaps and allocate them on demand, add a refcount to struct drm_pagemap, remove the xe embedded drm_pagemap, allocating and freeing it explicitly. v2: - Make the drm_pagemap pointer in drm_gpusvm_pages reference-counted. v3: - Call drm_pagemap_get() before drm_pagemap_put() in drm_gpusvm_pages (Himal Prasad Ghimiray) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1 Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-5-thomas.hellstrom@linux.intel.com
2025-12-23	drm/pagemap, drm/xe: Ensure that the devmem allocation is idle before use	Thomas Hellström
	In situations where no system memory is migrated to devmem, and in upcoming patches where another GPU is performing the migration to the newly allocated devmem buffer, there is nothing to ensure any ongoing clear to the devmem allocation or async eviction from the devmem allocation is complete. Address that by passing a struct dma_fence down to the copy functions, and ensure it is waited for before migration is marked complete. v3: - New patch. v4: - Update the logic used for determining when to wait for the pre_migrate_fence. - Update the logic used for determining when to warn for the pre_migrate_fence since the scheduler fences apparently can signal out-of-order. v5: - Fix a UAF (CI) - Remove references to source P2P migration (Himal) - Put the pre_migrate_fence after migration. v6: - Pipeline the pre_migrate_fence dependency (Matt Brost) Fixes: c5b3eb5a906c ("drm/xe: Add GPUSVM device memory copy vfunc functions") Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.15+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-4-thomas.hellstrom@linux.intel.com
2025-12-23	drm/pagemap: Remove some dead code	Thomas Hellström
	The page pointer can't be NULL. v5: - New patch. (Matt Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe. Link: https://patch.msgid.link/20251219113320.183860-3-thomas.hellstrom@linux.intel.com
2025-12-23	net: airoha: Move net_devs registration in a dedicated routine	Lorenzo Bianconi
	Since airoha_probe() is not executed under rtnl lock, there is small race where a given device is configured by user-space while the remaining ones are not completely loaded from the dts yet. This condition will allow a hw device misconfiguration since there are some conditions (e.g. GDM2 check in airoha_dev_init()) that require all device are properly loaded from the device tree. Fix the issue moving net_devices registration at the end of the airoha_probe routine. Fixes: 9cd451d414f6e ("net: airoha: Add loopback support for GDM2") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251214-airoha-fix-dev-registration-v1-1-860e027ad4c6@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-23	clk: renesas: r9a09g057: Add entries for RSCIs	Lad Prabhakar
	Add clock and reset entries for the RSCI IPs. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20251203094147.6429-3-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-12-23	clk: renesas: r9a09g056: Add entries for RSCIs	Lad Prabhakar
	Add clock and reset entries for the RSCI IPs. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20251203094147.6429-2-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-12-23	soc: renesas: Enable ICU support on RZ/N2H	Cosmin Tanislav
	The Renesas RZ/N2H (R9A09G087) SoC has the same Interrupt Controller (ICU) as the Renesas RZ/T2H (R9A09G077) SoC. Enable support for it by selecting the RENESAS_RZT2H_ICU config. Signed-off-by: Cosmin Tanislav <cosmin-gabriel.tanislav.xa@renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20251216123421.124401-1-cosmin-gabriel.tanislav.xa@renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2025-12-23	erspan: Initialize options_len before referencing options.	Frode Nordahl
	The struct ip_tunnel_info has a flexible array member named options that is protected by a counted_by(options_len) attribute. The compiler will use this information to enforce runtime bounds checking deployed by FORTIFY_SOURCE string helpers. As laid out in the GCC documentation, the counter must be initialized before the first reference to the flexible array member. After scanning through the files that use struct ip_tunnel_info and also refer to options or options_len, it appears the normal case is to use the ip_tunnel_info_opts_set() helper. Said helper would initialize options_len properly before copying data into options, however in the GRE ERSPAN code a partial update is done, preventing the use of the helper function. Before this change the handling of ERSPAN traffic in GRE tunnels would cause a kernel panic when the kernel is compiled with GCC 15+ and having FORTIFY_SOURCE configured: memcpy: detected buffer overflow: 4 byte write of buffer size 0 Call Trace: <IRQ> __fortify_panic+0xd/0xf erspan_rcv.cold+0x68/0x83 ? ip_route_input_slow+0x816/0x9d0 gre_rcv+0x1b2/0x1c0 gre_rcv+0x8e/0x100 ? raw_v4_input+0x2a0/0x2b0 ip_protocol_deliver_rcu+0x1ea/0x210 ip_local_deliver_finish+0x86/0x110 ip_local_deliver+0x65/0x110 ? ip_rcv_finish_core+0xd6/0x360 ip_rcv+0x186/0x1a0 Cc: stable@vger.kernel.org Link: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-counted_005fby-variable-attribute Reported-at: https://launchpad.net/bugs/2129580 Fixes: bb5e62f2d547 ("net: Add options as a flexible array to struct ip_tunnel_info") Signed-off-by: Frode Nordahl <fnordahl@ubuntu.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251213101338.4693-1-fnordahl@ubuntu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-23	Merge branch 'mptcp-fix-warn-on-bad-status'	Paolo Abeni
	Matthieu Baerts says: ==================== mptcp: fix warn on bad status Two somewhat related fixes addressing different issues found by syzkaller, and producing the exact same splat: a WARNING in subflow_data_ready(). - Patch 1: fallback earlier on simultaneous connections to avoid a warning. A fix for v5.19. - Patch 2: ensure context reset on disconnect, also to avoid a similar warning. A fix for v6.2. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> ==================== Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-0-d1f9fd1c36c8@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-23	mptcp: ensure context reset on disconnect()	Paolo Abeni
	After the blamed commit below, if the MPC subflow is already in TCP_CLOSE status or has fallback to TCP at mptcp_disconnect() time, mptcp_do_fastclose() skips setting the `send_fastclose flag` and the later __mptcp_close_ssk() does not reset anymore the related subflow context. Any later connection will be created with both the `request_mptcp` flag and the msk-level fallback status off (it is unconditionally cleared at MPTCP disconnect time), leading to a warning in subflow_data_ready(): WARNING: CPU: 26 PID: 8996 at net/mptcp/subflow.c:1519 subflow_data_ready (net/mptcp/subflow.c:1519 (discriminator 13)) Modules linked in: CPU: 26 UID: 0 PID: 8996 Comm: syz.22.39 Not tainted 6.18.0-rc7-05427-g11fc074f6c36 #1 PREEMPT(voluntary) Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 RIP: 0010:subflow_data_ready (net/mptcp/subflow.c:1519 (discriminator 13)) Code: 90 0f 0b 90 90 e9 04 fe ff ff e8 b7 1e f5 fe 89 ee bf 07 00 00 00 e8 db 19 f5 fe 83 fd 07 0f 84 35 ff ff ff e8 9d 1e f5 fe 90 <0f> 0b 90 e9 27 ff ff ff e8 8f 1e f5 fe 4c 89 e7 48 89 de e8 14 09 RSP: 0018:ffffc9002646fb30 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff88813b218000 RCX: ffffffff825c8435 RDX: ffff8881300b3580 RSI: ffffffff825c8443 RDI: 0000000000000005 RBP: 000000000000000b R08: ffffffff825c8435 R09: 000000000000000b R10: 0000000000000005 R11: 0000000000000007 R12: ffff888131ac0000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f88330af6c0(0000) GS:ffff888a93dd2000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f88330aefe8 CR3: 000000010ff59000 CR4: 0000000000350ef0 Call Trace: <TASK> tcp_data_ready (net/ipv4/tcp_input.c:5356) tcp_data_queue (net/ipv4/tcp_input.c:5445) tcp_rcv_state_process (net/ipv4/tcp_input.c:7165) tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1955) __release_sock (include/net/sock.h:1158 (discriminator 6) net/core/sock.c:3180 (discriminator 6)) release_sock (net/core/sock.c:3737) mptcp_sendmsg (net/mptcp/protocol.c:1763 net/mptcp/protocol.c:1857) inet_sendmsg (net/ipv4/af_inet.c:853 (discriminator 7)) __sys_sendto (net/socket.c:727 (discriminator 15) net/socket.c:742 (discriminator 15) net/socket.c:2244 (discriminator 15)) __x64_sys_sendto (net/socket.c:2247) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) RIP: 0033:0x7f883326702d Address the issue setting an explicit `fastclosing` flag at fastclose time, and checking such flag after mptcp_do_fastclose(). Fixes: ae155060247b ("mptcp: fix duplicate reset on fastclose") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-2-d1f9fd1c36c8@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-23	mptcp: fallback earlier on simult connection	Paolo Abeni
	Syzkaller reports a simult-connect race leading to inconsistent fallback status: WARNING: CPU: 3 PID: 33 at net/mptcp/subflow.c:1515 subflow_data_ready+0x40b/0x7c0 net/mptcp/subflow.c:1515 Modules linked in: CPU: 3 UID: 0 PID: 33 Comm: ksoftirqd/3 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:subflow_data_ready+0x40b/0x7c0 net/mptcp/subflow.c:1515 Code: 89 ee e8 78 61 3c f6 40 84 ed 75 21 e8 8e 66 3c f6 44 89 fe bf 07 00 00 00 e8 c1 61 3c f6 41 83 ff 07 74 09 e8 76 66 3c f6 90 <0f> 0b 90 e8 6d 66 3c f6 48 89 df e8 e5 ad ff ff 31 ff 89 c5 89 c6 RSP: 0018:ffffc900006cf338 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888031acd100 RCX: ffffffff8b7f2abf RDX: ffff88801e6ea440 RSI: ffffffff8b7f2aca RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000007 R10: 0000000000000004 R11: 0000000000002c10 R12: ffff88802ba69900 R13: 1ffff920000d9e67 R14: ffff888046f81800 R15: 0000000000000004 FS: 0000000000000000(0000) GS:ffff8880d69bc000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000560fc0ca1670 CR3: 0000000032c3a000 CR4: 0000000000352ef0 Call Trace: <TASK> tcp_data_queue+0x13b0/0x4f90 net/ipv4/tcp_input.c:5197 tcp_rcv_state_process+0xfdf/0x4ec0 net/ipv4/tcp_input.c:6922 tcp_v6_do_rcv+0x492/0x1740 net/ipv6/tcp_ipv6.c:1672 tcp_v6_rcv+0x2976/0x41e0 net/ipv6/tcp_ipv6.c:1918 ip6_protocol_deliver_rcu+0x188/0x1520 net/ipv6/ip6_input.c:438 ip6_input_finish+0x1e4/0x4b0 net/ipv6/ip6_input.c:489 NF_HOOK include/linux/netfilter.h:318 [inline] NF_HOOK include/linux/netfilter.h:312 [inline] ip6_input+0x105/0x2f0 net/ipv6/ip6_input.c:500 dst_input include/net/dst.h:471 [inline] ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline] NF_HOOK include/linux/netfilter.h:318 [inline] NF_HOOK include/linux/netfilter.h:312 [inline] ipv6_rcv+0x264/0x650 net/ipv6/ip6_input.c:311 __netif_receive_skb_one_core+0x12d/0x1e0 net/core/dev.c:5979 __netif_receive_skb+0x1d/0x160 net/core/dev.c:6092 process_backlog+0x442/0x15e0 net/core/dev.c:6444 __napi_poll.constprop.0+0xba/0x550 net/core/dev.c:7494 napi_poll net/core/dev.c:7557 [inline] net_rx_action+0xa9f/0xfe0 net/core/dev.c:7684 handle_softirqs+0x216/0x8e0 kernel/softirq.c:579 run_ksoftirqd kernel/softirq.c:968 [inline] run_ksoftirqd+0x3a/0x60 kernel/softirq.c:960 smpboot_thread_fn+0x3f7/0xae0 kernel/smpboot.c:160 kthread+0x3c2/0x780 kernel/kthread.c:463 ret_from_fork+0x5d7/0x6f0 arch/x86/kernel/process.c:148 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> The TCP subflow can process the simult-connect syn-ack packet after transitioning to TCP_FIN1 state, bypassing the MPTCP fallback check, as the sk_state_change() callback is not invoked for * -> FIN_WAIT1 transitions. That will move the msk socket to an inconsistent status and the next incoming data will hit the reported splat. Close the race moving the simult-fallback check at the earliest possible stage - that is at syn-ack generation time. About the fixes tags: [2] was supposed to also fix this issue introduced by [3]. [1] is required as a dependence: it was not explicitly marked as a fix, but it is one and it has already been backported before [3]. In other words, this commit should be backported up to [3], including [2] and [1] if that's not already there. Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().") [1] Fixes: 4fd19a307016 ("mptcp: fix inconsistent state on fastopen race") [2] Fixes: 1e777f39b4d7 ("mptcp: add MSG_FASTOPEN sendmsg flag support") [3] Cc: stable@vger.kernel.org Reported-by: syzbot+0ff6b771b4f7a5bce83b@syzkaller.appspotmail.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/586 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251212-net-mptcp-subflow_data_ready-warn-v1-1-d1f9fd1c36c8@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-23	drm/xe/svm: Fix a debug printout	Thomas Hellström
	Avoid spamming the log with drm_info(). Use drm_dbg() instead. Fixes: cc795e041034 ("drm/xe/svm: Make xe_svm_range_needs_migrate_to_vram() public") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: <stable@vger.kernel.org> # v6.17+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patch.msgid.link/20251219113320.183860-2-thomas.hellstrom@linux.intel.com
2025-12-23	team: fix check for port enabled in team_queue_override_port_prio_changed()	Jiri Pirko
	There has been a syzkaller bug reported recently with the following trace: list_del corruption, ffff888058bea080->prev is LIST_POISON2 (dead000000000122) ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:59! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI CPU: 3 UID: 0 PID: 21246 Comm: syz.0.2928 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:__list_del_entry_valid_or_report+0x13e/0x200 lib/list_debug.c:59 Code: 48 c7 c7 e0 71 f0 8b e8 30 08 ef fc 90 0f 0b 48 89 ef e8 a5 02 55 fd 48 89 ea 48 89 de 48 c7 c7 40 72 f0 8b e8 13 08 ef fc 90 <0f> 0b 48 89 ef e8 88 02 55 fd 48 89 ea 48 b8 00 00 00 00 00 fc ff RSP: 0018:ffffc9000d49f370 EFLAGS: 00010286 RAX: 000000000000004e RBX: ffff888058bea080 RCX: ffffc9002817d000 RDX: 0000000000000000 RSI: ffffffff819becc6 RDI: 0000000000000005 RBP: dead000000000122 R08: 0000000000000005 R09: 0000000000000000 R10: 0000000080000000 R11: 0000000000000001 R12: ffff888039e9c230 R13: ffff888058bea088 R14: ffff888058bea080 R15: ffff888055461480 FS: 00007fbbcfe6f6c0(0000) GS:ffff8880d6d0a000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000110c3afcb0 CR3: 00000000382c7000 CR4: 0000000000352ef0 Call Trace: <TASK> __list_del_entry_valid include/linux/list.h:132 [inline] __list_del_entry include/linux/list.h:223 [inline] list_del_rcu include/linux/rculist.h:178 [inline] __team_queue_override_port_del drivers/net/team/team_core.c:826 [inline] __team_queue_override_port_del drivers/net/team/team_core.c:821 [inline] team_queue_override_port_prio_changed drivers/net/team/team_core.c:883 [inline] team_priority_option_set+0x171/0x2f0 drivers/net/team/team_core.c:1534 team_option_set drivers/net/team/team_core.c:376 [inline] team_nl_options_set_doit+0x8ae/0xe60 drivers/net/team/team_core.c:2653 genl_family_rcv_msg_doit+0x209/0x2f0 net/netlink/genetlink.c:1115 genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline] genl_rcv_msg+0x55c/0x800 net/netlink/genetlink.c:1210 netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2552 genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219 netlink_unicast_kernel net/netlink/af_netlink.c:1320 [inline] netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1346 netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1896 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] ____sys_sendmsg+0xa98/0xc70 net/socket.c:2630 ___sys_sendmsg+0x134/0x1d0 net/socket.c:2684 __sys_sendmsg+0x16d/0x220 net/socket.c:2716 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xcd/0xfa0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f The problem is in this flow: 1) Port is enabled, queue_id != 0, in qom_list 2) Port gets disabled -> team_port_disable() -> team_queue_override_port_del() -> del (removed from list) 3) Port is disabled, queue_id != 0, not in any list 4) Priority changes -> team_queue_override_port_prio_changed() -> checks: port disabled && queue_id != 0 -> calls del - hits the BUG as it is removed already To fix this, change the check in team_queue_override_port_prio_changed() so it returns early if port is not enabled. Reported-by: syzbot+422806e5f4cce722a71f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=422806e5f4cce722a71f Fixes: 6c31ff366c11 ("team: remove synchronize_rcu() called during queue override change") Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251212102953.167287-1-jiri@resnulli.us Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-23	dmaengine: sun6i: Add debug messages for cyclic DMA prepare	Chen-Yu Tsai
	The driver already has debug messages for memcpy and linear transfers, but is missing them for cyclic transfers. Cyclic transfers are one of the main uses of the DMA controller, used for audio data transfers. And since these are likely the first DMA peripherals to be enabled, it helps to have these debug messages. Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Signed-off-by: Chen-Yu Tsai <wens@kernel.org> Link: https://patch.msgid.link/20251221064754.1783369-1-wens@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-12-23	dmaengine: sun6i: Choose appropriate burst length under maxburst	Chen-Yu Tsai
	maxburst, as provided by the client, specifies the largest amount of data that is allowed to be transferred in one burst. This limit is normally provided to avoid a data burst overflowing the target FIFO. It does not mean that the DMA engine can only do bursts in that size. Let the driver pick the largest supported burst length within the given limit. This lets the driver work correctly with some clients that give a large maxburst value. In particular, the 8250_dw driver will give a quarter of the UART's FIFO size as maxburst. On some systems the FIFO size is 256 bytes, giving a maxburst of 64 bytes, while the hardware only supports bursts of up to 16 bytes. Signed-off-by: Chen-Yu Tsai <wens@kernel.org> Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com> Link: https://patch.msgid.link/20251221080450.1813479-1-wens@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-12-23	dmaengine: idxd: uapi: use UAPI types	Thomas Weißschuh
	Using libc types and headers from the UAPI headers is problematic as it introduces a dependency on a full C toolchain. Use the fixed-width integer types provided by the UAPI headers instead. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20251222-uapi-idxd-v1-1-baa183adb20d@linutronix.de Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-12-23	dmaengine: sh: Discard pm_runtime_put() return value	Rafael J. Wysocki
	Clobbering an error value to be returned from shdma_tx_submit() with a pm_runtime_put() return value is not particularly useful, especially if the latter is 0, so stop doing that. This will facilitate a planned change of the pm_runtime_put() return type to void in the future. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/9626129.rMLUfLXkoz@rafael.j.wysocki Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-12-23	dt-bindings: soundwire: qcom: Add SoundWire v2.2.0 compatible	Prasad Kumpatla
	Add qcom,soundwire-v2.2.0 to the list of supported Qualcomm SoundWire controller versions. This version falls back to qcom,soundwire-v2.0.0 if not explicitly handled by the driver. Signed-off-by: Jingyi Wang <jingyi.wang@oss.qualcomm.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Prasad Kumpatla <prasad.kumpatla@oss.qualcomm.com> Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com> Link: https://patch.msgid.link/20251105-knp-audio-v2-v4-1-ae0953f02b44@oss.qualcomm.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-12-23	soundwire: Use bus methods for .probe(), .remove() and .shutdown()	Uwe Kleine-König
	These are nearly identical to the respective driver callbacks. The only differences are that .remove() returns void instead of int and .shutdown() has to cope for unbound devices. The objective is to get rid of users of struct device_driver callbacks .probe(), .remove() and .shutdown() to eventually remove these. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/20251215174925.1327021-6-u.kleine-koenig@baylibre.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-12-23	soundwire: Make remove function return no value	Uwe Kleine-König
	All remove functions return zero and the driver core ignores any other returned value (just emits a warning about it being ignored). So make all remove callbacks return void instead of an ignored int. This is in line with most other subsystems. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/20251215174925.1327021-5-u.kleine-koenig@baylibre.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2025-12-22	bpf: crypto: replace -EEXIST with -EBUSY	Daniel Gomez
	The -EEXIST error code is reserved by the module loading infrastructure to indicate that a module is already loaded. When a module's init function returns -EEXIST, userspace tools like kmod interpret this as "module already loaded" and treat the operation as successful, returning 0 to the user even though the module initialization actually failed. This follows the precedent set by commit 54416fd76770 ("netfilter: conntrack: helper: Replace -EEXIST by -EBUSY") which fixed the same issue in nf_conntrack_helper_register(). This affects bpf_crypto_skcipher module. While the configuration required to build it as a module is unlikely in practice, it is technically possible, so fix it for correctness. Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Link: https://lore.kernel.org/r/20251220-dev-module-init-eexists-bpf-v1-1-7f186663dbe7@samsung.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22	Merge branch 'allow-calling-kfuncs-from-raw_tp-programs'	Alexei Starovoitov
	Puranjay Mohan says: ==================== Allow calling kfuncs from raw_tp programs V1: https://lore.kernel.org/all/20251218145514.339819-1-puranjay@kernel.org/ Changes in V1->V2: - Update selftests to allow success for raw_tp programs calling kfuncs. This set enables calling kfuncs from raw_tp programs. ==================== Link: https://patch.msgid.link/20251222133250.1890587-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22	selftests: bpf: fix tests with raw_tp calling kfuncs	Puranjay Mohan
	As the previous commit allowed raw_tp programs to call kfuncs, so of the selftests that were expected to fail will now succeed. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20251222133250.1890587-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22	bpf: allow calling kfuncs from raw_tp programs	Puranjay Mohan
	Associate raw tracepoint program type with the kfunc tracing hook. This allows calling kfuncs from raw_tp programs. Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20251222133250.1890587-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22	Merge branch 'mm-bpf-kfuncs-to-access-memcg-data'	Alexei Starovoitov
	Roman Gushchin says: ==================== mm: bpf kfuncs to access memcg data Introduce kfuncs to simplify the access to the memcg data. These kfuncs can be used to accelerate monitoring use cases and for implementing custom OOM policies once BPF OOM is landed. This patchset was separated out from the BPF OOM patchset to simplify the logistics and accelerate the landing of the part which is useful by itself. No functional changes since BPF OOM v2. v4: - refactored memcg vm event and stat item idx checks (by Alexei) v3: - dropped redundant kfuncs flags (by Alexei) - fixed kdocs warnings (by Alexei) - merged memcg stats access patches into one (by Alexei) - restored root memcg usage reporting, added a comment - added checks for enum boundaries - added Shakeel and JP as co-maintainers (by Shakeel) v2: - added mem_cgroup_disabled() checks (by Shakeel B.) - added special handling of the root memcg in bpf_mem_cgroup_usage() (by Shakeel B.) - minor fixes in the kselftest (by Shakeel B.) - added a MAINTAINERS entry (by Shakeel B.) v1: https://lore.kernel.org/bpf/87ike29s5r.fsf@linux.dev/T/#t ==================== Link: https://patch.msgid.link/20251223044156.208250-1-roman.gushchin@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22	MAINTAINERS: add an entry for MM BPF extensions	Roman Gushchin
	Let's create a separate entry for MM BPF extensions: these patches often require an attention from both bpf and mm communities. Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20251223044156.208250-7-roman.gushchin@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22	bpf: selftests: selftests for memcg stat kfuncs	JP Kobryn
	Add test coverage for the kfuncs that fetch memcg stats. Using some common stats, test scenarios ensuring that the given stat increases by some arbitrary amount. The stats selected cover the three categories represented by the enums: node_stat_item, memcg_stat_item, vm_event_item. Since only a subset of all stats are queried, use a static struct made up of fields for each stat. Write to the struct with the fetched values when the bpf program is invoked and read the fields in the user mode program for verification. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20251223044156.208250-6-roman.gushchin@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22	mm: introduce BPF kfuncs to access memcg statistics and events	Roman Gushchin
	Introduce BPF kfuncs to conveniently access memcg data: - bpf_mem_cgroup_vm_events(), - bpf_mem_cgroup_memory_events(), - bpf_mem_cgroup_usage(), - bpf_mem_cgroup_page_state(), - bpf_mem_cgroup_flush_stats(). These functions are useful for implementing BPF OOM policies, but also can be used to accelerate access to the memcg data. Reading it through cgroupfs is much more expensive, roughly 5x, mostly because of the need to convert the data into the text and back. JP Kobryn: An experiment was setup to compare the performance of a program that uses the traditional method of reading memory.stat vs a program using the new kfuncs. The control program opens up the root memory.stat file and for 1M iterations reads, converts the string values to numeric data, then seeks back to the beginning. The experimental program sets up the requisite libbpf objects and for 1M iterations invokes a bpf program which uses the kfuncs to fetch all available stats for node_stat_item, memcg_stat_item, and vm_event_item types. The results showed a significant perf benefit on the experimental side, outperforming the control side by a margin of 93%. In kernel mode, elapsed time was reduced by 80%, while in user mode, over 99% of time was saved. control: elapsed time real 0m38.318s user 0m25.131s sys 0m13.070s experiment: elapsed time real 0m2.789s user 0m0.187s sys 0m2.512s control: perf data 33.43% a.out libc.so.6 [.] __vfscanf_internal 6.88% a.out [kernel.kallsyms] [k] vsnprintf 6.33% a.out libc.so.6 [.] _IO_fgets 5.51% a.out [kernel.kallsyms] [k] format_decode 4.31% a.out libc.so.6 [.] __GI_____strtoull_l_internal 3.78% a.out [kernel.kallsyms] [k] string 3.53% a.out [kernel.kallsyms] [k] number 2.71% a.out libc.so.6 [.] _IO_sputbackc 2.41% a.out [kernel.kallsyms] [k] strlen 1.98% a.out a.out [.] main 1.70% a.out libc.so.6 [.] _IO_getline_info 1.51% a.out libc.so.6 [.] __isoc99_sscanf 1.47% a.out [kernel.kallsyms] [k] memory_stat_format 1.47% a.out [kernel.kallsyms] [k] memcpy_orig 1.41% a.out [kernel.kallsyms] [k] seq_buf_printf experiment: perf data 10.55% memcgstat bpf_prog_..._query [k] bpf_prog_16aab2f19fa982a7_query 6.90% memcgstat [kernel.kallsyms] [k] memcg_page_state_output 3.55% memcgstat [kernel.kallsyms] [k] _raw_spin_lock 3.12% memcgstat [kernel.kallsyms] [k] memcg_events 2.87% memcgstat [kernel.kallsyms] [k] __memcg_slab_post_alloc_hook 2.73% memcgstat [kernel.kallsyms] [k] kmem_cache_free 2.70% memcgstat [kernel.kallsyms] [k] entry_SYSRETQ_unsafe_stack 2.25% memcgstat [kernel.kallsyms] [k] __memcg_slab_free_hook 2.06% memcgstat [kernel.kallsyms] [k] get_page_from_freelist Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Co-developed-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Link: https://lore.kernel.org/r/20251223044156.208250-5-roman.gushchin@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22	mm: introduce bpf_get_root_mem_cgroup() BPF kfunc	Roman Gushchin
	Introduce a BPF kfunc to get a trusted pointer to the root memory cgroup. It's very handy to traverse the full memcg tree, e.g. for handling a system-wide OOM. It's possible to obtain this pointer by traversing the memcg tree up from any known memcg, but it's sub-optimal and makes BPF programs more complex and less efficient. bpf_get_root_mem_cgroup() has a KF_ACQUIRE \| KF_RET_NULL semantics, however in reality it's not necessary to bump the corresponding reference counter - root memory cgroup is immortal, reference counting is skipped, see css_get(). Once set, root_mem_cgroup is always a valid memcg pointer. It's safe to call bpf_put_mem_cgroup() for the pointer obtained with bpf_get_root_mem_cgroup(), it's effectively a no-op. Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20251223044156.208250-4-roman.gushchin@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22	mm: introduce BPF kfuncs to deal with memcg pointers	Roman Gushchin
	To effectively operate with memory cgroups in BPF there is a need to convert css pointers to memcg pointers. A simple container_of cast which is used in the kernel code can't be used in BPF because from the verifier's point of view that's a out-of-bounds memory access. Introduce helper get/put kfuncs which can be used to get a refcounted memcg pointer from the css pointer: - bpf_get_mem_cgroup, - bpf_put_mem_cgroup. bpf_get_mem_cgroup() can take both memcg's css and the corresponding cgroup's "self" css. It allows it to be used with the existing cgroup iterator which iterates over cgroup tree, not memcg tree. Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://lore.kernel.org/r/20251223044156.208250-3-roman.gushchin@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-22	mm: declare memcg_page_state_output() in memcontrol.h	Roman Gushchin
	To use memcg_page_state_output() in bpf_memcontrol.c move the declaration from v1-specific memcontrol-v1.h to memcontrol.h. Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Link: https://lore.kernel.org/r/20251223044156.208250-2-roman.gushchin@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-12-23	wifi: rtlwifi: fix typo 'received' in comment	Rohit Chourasia
	Fix typo received -> received by review. Signed-off-by: Rohit Chourasia <chourasiarohit27@gmail.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Signed-off-by: Ping-Ke Shih <pkshih@realtek.com> Link: https://patch.msgid.link/20251211161911.30611-1-chourasiarohit27@gmail.com
2025-12-22	sched_ext: Avoid multiple irq_work_queue() calls in destroy_dsq()	Zqiang
	llist_add() returns true only when adding to an empty list, which indicates that no IRQ work is currently queued or running. Therefore, we only need to call irq_work_queue() when llist_add() returns true, to avoid unnecessarily re-queueing IRQ work that is already pending or executing. Signed-off-by: Zqiang <qiang.zhang@linux.dev> Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-12-22	sched_ext: Use the resched_cpu() to replace resched_curr() in the ↵	Zqiang
	bypass_lb_node() For the PREEMPT_RT kernels, the scx_bypass_lb_timerfn() running in the preemptible per-CPU ktimer kthread context, this means that the following scenarios will occur(for x86 platform): cpu1 cpu2 ktimer kthread: ->scx_bypass_lb_timerfn ->bypass_lb_node ->for_each_cpu(cpu, resched_mask) migration/1: by preempt by migration/2: multi_cpu_stop() multi_cpu_stop() ->take_cpu_down() ->__cpu_disable() ->set cpu1 offline ->rq1 = cpu_rq(cpu1) ->resched_curr(rq1) ->smp_send_reschedule(cpu1) ->native_smp_send_reschedule(cpu1) ->if(unlikely(cpu_is_offline(cpu))) { WARN(1, "sched: Unexpected reschedule of offline CPU#%d!\n", cpu); return; } This commit therefore use the resched_cpu() to replace resched_curr() in the bypass_lb_node() to avoid send-ipi to offline CPUs. Signed-off-by: Zqiang <qiang.zhang@linux.dev> Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>