linux.git - Linux kernel source tree

Age	Commit message (Collapse)	Author
2026-05-06	ASoC: pxa2xx: push gpio usage into arch code	Arnd Bergmann
	There are no remaining static platform_device users of pxa2xx ac97, so the rest of that code path can go away as well. Since nothing in the driver uses the gpio number now, constrain the use of the legacy gpio interface to the architecture specific code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260505202426.3605262-2-arnd@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2026-05-06	ASoC: arm: pxa2xx: remove platform_data processing	Arnd Bergmann
	Nothing ever sets pxa2xx_audio_ops_t since the last users were removed in ce79f3a1ad5f ("ARM: pxa: prune unused device support") , so stop passing it around through the sound, ac97 code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260505202426.3605262-1-arnd@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org>
2026-05-05	Merge tag 'wq-for-7.1-rc2-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: - Fix devm_alloc_workqueue() passing a va_list as a positional arg to the variadic alloc_workqueue() macro, which garbled wq->name and skipped lockdep init on the devm path. Fold both noprof entry points onto a va_list helper. Also, annotate it using __printf(1, 0) * tag 'wq-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Annotate alloc_workqueue_va() with __printf(1, 0) workqueue: fix devm_alloc_workqueue() va_list misuse
2026-05-05	Merge tag 'cgroup-for-7.1-rc2-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - During v6.19, cgroup task unlink was moved from do_exit() to after the final task switch to satisfy a controller invariant. That left the kernel seeing tasks past exit_signals() longer than userspace expected, and several v7.0 follow-ups tried to bridge the gap by making rmdir wait for the kernel side. None held up. The latest is an A-A deadlock when rmdir is invoked by the reaper of zombies whose pidns teardown the rmdir itself is waiting on, which points at the synchronizing approach being fundamentally wrong. Take a different approach: drop the wait, leave rmdir's user-visible side returning as soon as cgroup.procs is empty, and defer the css percpu_ref kill that drives ->css_offline() until the cgroup is fully depopulated. Tagged for stable. Somewhat invasive but contained. The hope is that fixing forward sticks. If not, the fallback is to revert the entire chain and rework on the development branch. Note that this doesn't plug a pre-existing analogous race in cgroup_apply_control_disable() (controller disable via subtree_control). Not a regression. The development branch will do the more invasive restructuring needed for that. - Documentation update for cgroup-v1 charge-commit section that still referenced functions removed when the memcg hugetlb try-commit-cancel protocol was retired. * tag 'cgroup-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: docs: cgroup-v1: Update charge-commit section cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated
2026-05-05	Merge tag 'sched_ext-for-7.1-rc2-fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Fix idle CPU selection returning prev_cpu outside the task's cpus_ptr when the BPF caller's allowed mask was wider. Stable backport. - Two opposite-direction gaps in scx_task_iter's cgroup-scoped mode versus the global mode: - Tasks past exit_signals() are filtered by the cgroup walk but kept by global. Sub-scheduler enable abort leaked __scx_init_task() state. Add a CSS_TASK_ITER_WITH_DEAD flag to cgroup's task iterator (scx_task_iter is its only user) and use it. - Tasks past sched_ext_dead() are still returned, tripping WARN_ON_ONCE() in callers or making them touch torn-down state. Mark and skip under the per-task rq lock. * tag 'sched_ext-for-7.1-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: idle: Recheck prev_cpu after narrowing allowed mask sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked() cgroup, sched_ext: Include exiting tasks in cgroup iter
2026-05-05	PCI: switchtec: Add Gen6 Device IDs	Ben Reed
	Add device IDs for the next generation of switchtec products. No changes to the driver were required with the new version of the hardware. [logang: rewrote commit message] Signed-off-by: Ben Reed <Ben.Reed@microchip.com> Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260505161633.67454-1-logang@deltatee.com
2026-05-05	firmware: arm_scmi: Rename struct scmi_revision_info to scmi_base_info	Marek Vasut
	Rename struct scmi_revision_info to struct scmi_base_info , to accurately represent its content. The scmi_revision_info is no longer accurate, because the structure now contains more than only SCMI base protocol revision, it now also contains number of protocols, agents, vendor and subvendor strings. All those are fetched from the base protocol, so rename the structure to scmi_base_info, to match the other scmi_*_info structure names. No functional change. Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org> Link: https://patch.msgid.link/20260406155343.72087-1-marek.vasut+renesas@mailbox.org Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
2026-05-05	firmware: arm_scmi: imx: Support getting reset reason of MISC protocol	Peng Fan
	MISC protocol supports getting reset reason per Logical Machine or System. Add the API for user to retrieve the information from System Manager. Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Link: https://patch.msgid.link/20260305-scmi-imx-reset-v1-1-18de78978ba9@nxp.com Signed-off-by: Sudeep Holla <sudeep.holla@kernel.org>
2026-05-05	rseq: Revert to historical performance killing behaviour	Thomas Gleixner
	The recent RSEQ optimization work broke the TCMalloc abuse of the RSEQ ABI as it not longer unconditionally updates the CPU, node, mm_cid fields, which are documented as read only for user space. Due to the observed behavior of the kernel it was possible for TCMalloc to overwrite the cpu_id_start field for their own purposes and rely on the kernel to update it unconditionally after each context switch and before signal delivery. The RSEQ ABI only guarantees that these fields are updated when the data changes, i.e. the task is migrated or the MMCID of the task changes due to switching from or to per CPU ownership mode. The optimization work eliminated the unconditional updates and reduced them to the documented ABI guarantees, which results in a massive performance win for syscall, scheduling heavy work loads, which in turn breaks the TCMalloc expectations. There have been several options discussed to restore the TCMalloc functionality while preserving the optimization benefits. They all end up in a series of hard to maintain workarounds, which in the worst case introduce overhead for everyone, e.g. in the scheduler. The requirements of TCMalloc and the optimization work are diametral and the required work arounds are a maintainence burden. They end up as fragile constructs, which are blocking further optimization work and are pretty much guaranteed to cause more subtle issues down the road. The optimization work heavily depends on the generic entry code, which is not used by all architectures yet. So the rework preserved the original mechanism moslty unmodified to keep the support for architectures, which handle rseq in their own exit to user space loop. That code is currently optimized out by the compiler on architectures which use the generic entry code. This allows to revert back to the original behaviour by replacing the compile time constant conditions with a runtime condition where required, which disables the optimization and the dependend time slice extension feature until the run-time condition can be enabled in the RSEQ registration code on a per task basis again. The following changes are required to restore the original behavior, which makes TCMalloc work again: 1) Replace the compile time constant conditionals with runtime conditionals where appropriate to prevent the compiler from optimizing the legacy mode out 2) Enforce unconditional update of IDs on context switch for the non-optimized v1 mode 3) Enforce update of IDs in the pre signal delivery path for the non-optimized v1 mode 4) Enforce update of IDs in the membarrier(RSEQ) IPI for the non-optimized v1 mode 5) Make time slice and future extensions depend on optimized v2 mode This brings back the full performance problems, but preserves the v2 optimization code and for generic entry code using architectures also the TIF_RSEQ optimization which avoids a full evaluation of the exit to user mode loop in many cases. Fixes: 566d8015f7ee ("rseq: Avoid CPU/MM CID updates when no event pending") Reported-by: Mathias Stearn <mathias@mongodb.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Closes: https://lore.kernel.org/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com Link: https://patch.msgid.link/20260428224427.517051752%40kernel.org Cc: stable@vger.kernel.org
2026-05-05	wifi: ieee80211: define UHR ML-PM extended MLD capability	Johannes Berg
	UHR defines bit 8 to mean multi-link power management, add a definition for it. Also reindent the other definitions to use tabs, not spaces. Link: https://patch.msgid.link/20260428110915.c6b6a06016cf.I7ebd97397507d320124547017e21191b55c5d34d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-05	wifi: mac80211: update UHR capabilities field order	Johannes Berg
	Since 802.11bn D1.4 the DBE capabilities are after the PHY capabilities, not between MAC and PHY, adjust the code accordingly. Also add a struct for DBE capabilities and use it for checking the correct length instead of hard-coding the lengths. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Miriam Rachel Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260428103657.b40af50f182d.I75306a092dc2c8a9eb7276160f0b7144b4846d18@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-05	pinctrl: add optional .release_mux() callback	Frank Li
	Add an optional .release_mux() callback to struct pinmux_ops. Some drivers acquire additional resources in .set_mux(), such as software locks. These resources may need to be released when the mux function is no longer active. Introducing a dedicated .release_mux() callback allows drivers to clean up such resources. The callback is optional and does not affect existing drivers. Commit 2243a87d90b42 ("pinctrl: avoid duplicated calling enable_pinmux_setting for a pin") removed the .disable() callback to resolve two issues: 1. desc->mux_usecount increasing monotonically 2. Hardware glitches caused by repeated .disable()/.enable() calls Adding .release_mux() does not reintroduce those problems. The callback is intended only for releasing driver-side resources (e.g. locks) and must not modify hardware registers. Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Linus Walleij <linusw@kernel.org>
2026-05-05	mux: add devm_mux_state_get_from_np() to get mux from child node	Frank Li
	Add new API devm_mux_state_get_from_np() to retrieve a mux control from a specified child device node. Make devm_mux_state_get() call devm_mux_state_get_from_np() with a NULL node parameter, which defaults to using the device's own of_node. Support the following DT schema: pinctrl@0 { uart-func { mux-state = <&mux_chip 0>; }; spi-func { mux-state = <&mux_chip 1>; }; }; Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Linus Walleij <linusw@kernel.org>
2026-05-05	thunderbolt: Don't create multiple DMA tunnels on firmware connection manager	Alan Borzeszkowski
	Firmware connection manager supports only one DMA tunnel per XDomain connection. Firmware prior Intel Titan Ridge failed the operation directly but the same does not happen anymore on Titan Ridge and forward. For this reason add an explicit check, and fail the operation accordingly in the driver. Signed-off-by: Alan Borzeszkowski <alan.borzeszkowski@linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2026-05-05	thunderbolt: Wait for tb_domain_release() to complete when driver is removed	Mika Westerberg
	We should not call nhi_shutdown() before the domain structure and the control channel rings are completely released. Otherwise we might release resources like the nhi->msix_ida that are still referenced in tb_domain_release(). For this reason wait for the tb_domain_release() to be completed before continuing to nhi_shutdown() and eventually releasing of the rest of the data structures. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
2026-05-05	wifi: cfg80211: add LTF keyseed support for secure ranging	Peddolla Harshavardhan Reddy
	Currently there is no way to install an LTF key seed that can be used in non-trigger-based (NTB) and trigger-based (TB) FTM ranging to protect NDP frames. Without this, drivers cannot enable PHY-layer security for peer measurement sessions, leaving ranging measurements vulnerable to eavesdropping and manipulation. Introduce NL80211_KEY_LTF_SEED attribute and the dedicated extended feature flag NL80211_EXT_FEATURE_SET_KEY_LTF_SEED to allow drivers to advertise and install LTF key seeds via nl80211. The key seed must be configured beforehand to ensure the peer measurement session is secure. The driver must advertise both NL80211_EXT_FEATURE_SECURE_LTF and NL80211_EXT_FEATURE_SET_KEY_LTF_SEED for the key seed installation to be permitted. The LTF key seed is pairwise key material and must only be used with pairwise key type. Reject attempts to use it with other key types. Signed-off-by: Peddolla Harshavardhan Reddy <peddolla.reddy@oss.qualcomm.com> Link: https://patch.msgid.link/20260420090856.2152905-13-peddolla.reddy@oss.qualcomm.com [fix policy coding style] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-05	soc: ti: knav_dma: fix all kernel-doc warnings in knav_dma.h	Randy Dunlap
	Use correct struct member names and formats to avoid kernel-doc warnings: Warning: include/linux/soc/ti/knav_dma.h:83 struct member 'priority' not described in 'knav_dma_tx_cfg' Warning: include/linux/soc/ti/knav_dma.h:113 struct member 'err_mode' not described in 'knav_dma_rx_cfg' Warning: include/linux/soc/ti/knav_dma.h:113 struct member 'desc_type' not described in 'knav_dma_rx_cfg' Warning: include/linux/soc/ti/knav_dma.h:113 struct member 'fdq' not described in 'knav_dma_rx_cfg' Warning: include/linux/soc/ti/knav_dma.h:127 struct member 'direction' not described in 'knav_dma_cfg' Warning: include/linux/soc/ti/knav_dma.h:127 struct member 'u' not described in 'knav_dma_cfg' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260301011228.3064940-1-rdunlap@infradead.org Signed-off-by: Nishanth Menon <nm@ti.com>
2026-05-05	mm/slab: Add kvfree_atomic() helper	Uladzislau Rezki (Sony)
	kvmalloc() now supports non-sleeping GFP flags, including the vmalloc fallback path. This means it may return vmalloc memory even for GFP_ATOMIC and GFP_NOWAIT allocations. Freeing such memory with kvfree() may then end up calling vfree(), which is not safe for non-sleeping contexts. Introduce kvfree_atomic() helper for such cases. It mirrors kvfree(), but uses vfree_atomic() for vmalloced memory. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2026-05-04	ipv4: igmp: encode multicast exponential fields	Ujjal Roy
	In IGMP, MRC and QQIC fields are not correctly encoded when generating query packets. Since the receiver of the query interprets these fields using the IGMPv3 floating- point decoding logic, any value that exceeds the linear threshold is incorrectly parsed as an exponential value, leading to an incorrect interval calculation. Encode and assign the corresponding protocol fields during query generation. Introduce the logic to dynamically calculate the exponent and mantissa using bit-scan (fls). This ensures MRC and QQIC fields (8-bit) are properly encoded when transmitting query packets with intervals that exceed their respective linear threshold value of 128 (for MRT/QQI). RFC3376: for both MRC and QQIC, values >= 128 represent the same floating-point encoding as follows: 0 1 2 3 4 5 6 7 +-+-+-+-+-+-+-+-+ \|1\| exp \| mant \| +-+-+-+-+-+-+-+-+ Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ujjal Roy <royujjal@gmail.com> Link: https://patch.msgid.link/20260502131907.987-4-royujjal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-04	ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify calculation	Ujjal Roy
	Get rid of the IGMPV3_MRC macro and use the igmpv3_mrt() API to calculate the Max Resp Time from the Maximum Response Code. Similarly, for IGMPV3_QQIC, use the igmpv3_qqi() API to calculate the Querier's Query Interval from the QQIC field. Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Ujjal Roy <royujjal@gmail.com> Link: https://patch.msgid.link/20260502131907.987-2-royujjal@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05	sched/isolation: Make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN	Waiman Long
	Since commit 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management"), kthreads default to use the HK_TYPE_DOMAIN cpumask. IOW, it is no longer affected by the setting of the nohz_full boot kernel parameter. That means HK_TYPE_KTHREAD should now be an alias of HK_TYPE_DOMAIN instead of HK_TYPE_KERNEL_NOISE to correctly reflect the current kthread behavior. Make the change as HK_TYPE_KTHREAD is still being used in some networking code. Fixes: 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-04	PCI: endpoint: pci-ep-msi: Add embedded doorbell fallback	Koichiro Den
	Some endpoint platforms cannot use platform MSI / GIC ITS to implement EP-side doorbells. In those cases, EPF drivers cannot provide an interrupt-driven doorbell and often fall back to polling. Add an "embedded" doorbell backend that uses a controller-integrated doorbell target (e.g. DesignWare integrated eDMA interrupt-emulation doorbell). The backend locates the doorbell register and a corresponding Linux IRQ via the EPC aux-resource API. If the doorbell register is already exposed via a fixed BAR mapping, provide BAR+offset. Otherwise provide the DMA address returned by dma_map_resource() (which may be an IOVA when an IOMMU is enabled) so EPF drivers can map it into BAR space. When MSI doorbell allocation fails with -ENODEV, pci_epf_alloc_doorbell() falls back to this embedded backend. Suggested-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Koichiro Den <den@valinux.co.jp> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260414141514.1341429-8-den@valinux.co.jp
2026-05-04	driver core: class: fix typo in struct class documentation	Prabhudasu Vatala
	Fix a spelling error in the comment for the ns_type member of struct class. Change "detemine" to "determine". Signed-off-by: Prabhudasu Vatala <prabhudasuvatala@gmail.com> Link: https://patch.msgid.link/20260503141826.27462-1-prabhudasuvatala@gmail.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-05-04	sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked()	Tejun Heo
	scx_task_iter's cgroup-scoped mode can return tasks whose sched_ext_dead() has already completed: cgroup_task_dead() removes from cset->tasks after sched_ext_dead() in finish_task_switch() and is irq-work deferred on PREEMPT_RT. The global mode is fine - sched_ext_dead() removes from scx_tasks via list_del_init() first. Callers (sub-sched enable prep/abort/apply, scx_sub_disable(), scx_fail_parent()) assume returned tasks are still on @sch and trip WARN_ON_ONCE() or operate on torn-down state otherwise. Set %SCX_TASK_OFF_TASKS in sched_ext_dead() under @p's rq lock and have scx_task_iter_next_locked() skip flagged tasks under the same lock. Setter and reader serialize on the per-task rq lock - no race. Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-04	cgroup, sched_ext: Include exiting tasks in cgroup iter	Tejun Heo
	a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") made css_task_iter_advance() skip exiting tasks so cgroup.procs stays consistent with waitpid() visibility. Unfortunately, this broke scx_task_iter. scx_task_iter walks either scx_tasks (global) or a cgroup subtree via css_task_iter() and the two modes are expected to cover the same set of tasks. After the above change the cgroup-scoped mode silently skips tasks past exit_signals() that are still on scx_tasks. scx_sub_enable_workfn()'s abort path is one of the symptoms: an exiting SCX_TASK_SUB_INIT task can race past the cgroup iter leaking __scx_init_task() state. Other iterations share the same gap. Add CSS_TASK_ITER_WITH_DEAD to opt out of the skip and use it from scx_task_iter(). Fixes: b0e4c2f8a0f0 ("sched_ext: Implement cgroup subtree iteration for scx_task_iter") Reported-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-04	cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated	Tejun Heo
	A chain of commits going back to v7.0 reworked rmdir to satisfy the controller invariant that a subsystem's ->css_offline() must not run while tasks are still doing kernel-side work in the cgroup. [1] d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out") [2] a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") [3] 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir") [4] 4c56a8ac6869 ("cgroup: Fix cgroup_drain_dying() testing the wrong condition") [5] 13e786b64bd3 ("cgroup: Increment nr_dying_subsys_* from rmdir context") [1] moved task cset unlink from do_exit() to finish_task_switch() so a task's cset link drops only after the task has fully stopped scheduling. That made tasks past exit_signals() linger on cset->tasks until their final context switch, which led to a series of problems as what userspace expected to see after rmdir diverged from what the kernel needs to wait for. [2]-[5] tried to bridge that divergence: [2] filtered the exiting tasks from cgroup.procs; [3] had rmdir(2) sleep in TASK_UNINTERRUPTIBLE for them; [4] fixed the wait's condition; [5] made nr_dying_subsys_* visible synchronously. The cgroup_drain_dying() wait in [3] turned out to be a dead end. When the rmdir caller is also the reaper of a zombie that pins a pidns teardown (e.g. host PID 1 systemd reaping orphan pids that were re-parented to it during the same teardown), rmdir blocks in TASK_UNINTERRUPTIBLE waiting for those pids to free, the pids can't free because PID 1 is the reaper and it's stuck in rmdir, and the system A-A deadlocks. No internal lock ordering breaks this; the wait itself is the bug. The css killing side that drove the original reorder, however, can be made cleanly asynchronous: ->css_offline() is already async, run from css_killed_work_fn() driven by percpu_ref_kill_and_confirm(). The fix is to make that chain start only after all tasks have left the cgroup. rmdir's user-visible side then returns as soon as cgroup.procs and friends are empty, while ->css_offline() still runs only after the cgroup is fully drained. Verified by the original reproducer (pidns teardown + zombie reaper, runs under vng) which hangs vanilla and succeeds here, and by per-commit deterministic repros for [2], [3], [4], [5] with a boot parameter that widens the post-exit_signals() window so each state is reliably reachable. Some stress tests on top of that. cgroup_apply_control_disable() has the same shape of pre-existing race: when a controller is disabled via subtree_control, kill_css() ran synchronously while tasks past exit_signals() could still be linked to the cgroup's csets, and ->css_offline() could fire before they drained. This patch preserves the existing synchronous behavior at that call site (kill_css_sync() + kill_css_finish() back-to-back) and a follow-up patch will defer kill_css_finish() there using a per-css trigger. This seems like the right approach and I don't see problems with it. The changes are somewhat invasive but not excessively so, so backporting to -stable should be okay. If something does turn out to be wrong, the fallback is to revert the entire chain ([1]-[5]) and rework in the development branch instead. v2: Pin cgrp across the deferred destroy work with explicit cgroup_get()/cgroup_put() around queue_work() and the work_fn. v1 wasn't actually broken (ordered cgroup_offline_wq + queue_work order in cgroup_task_dead() saved it) but the explicit ref removes the dependency on those non-obvious invariants. Also note the pre-existing cgroup_apply_control_disable() race in the description; a follow-up will defer kill_css_finish() there. Fixes: 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir") Cc: stable@vger.kernel.org # v7.0+ Reported-and-tested-by: Martin Pitt <martin@piware.de> Link: https://lore.kernel.org/all/afHNg2VX2jy9bW7y@piware.de/ Link: https://lore.kernel.org/all/35e0670adb4abeab13da2c321582af9f@kernel.org/ Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2026-05-04	dma-buf/dma_fence_array: remove unused functionality v4	Christian König
	Amdgpu was the only user of the signal on any feature and we dropped that use case recently, so we can remove that functionality. v2: update num_pending only after the fence is signaled v3: separate out simplifying dma_fence_array implementation v4: fix XE patch split fallout Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20260422103012.1647-1-christian.koenig@amd.com
2026-05-04	mtd: spinand: Use secondary ops for continuous reads	Miquel Raynal
	In case a chip supports continuous reads, but uses a slightly different cache operation for these, it may provide a secondary operation template which will be used only during continuous cache read operations. From a vendor driver point of view, enabling this feature implies providing a new set of templates for these continuous read operations. The core will automatically pick the fastest variant, depending on the hardware capabilities. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-05-04	Merge tag 'mtd/spi-mem-cont-read-for-7.2' into nand/next	Miquel Raynal
	Aside from preparation changes in the SPI NAND core, the changes carried here focus on the shared spi-mem layer which is enhanced in order to bring two new features: - The possibility to fill a primary and a secondary operation template in the direct mapping structure in order to support continuous reads in SPI NAND, which may require two different read operations. - SPI controllers may indicate possible CS instabilities over long transfers by setting a boolean. This capability is related to the previous one, the need for it has arised while testing SPI NAND continuous reads with the Cadence QSPI controller which cannot, under certain conditions, keep the CS asserted for the length of an eraseblock-large transfer.
2026-05-04	spi: spi-mem: Add a no_cs_assertion capability	Miquel Raynal
	Some controllers are 'smart', and that's a problem. For instance, the Cadence quadspi controller is capable of deasserting the CS automatically whenever a too long period of time without any data to transfer elapses. This 'feature' combined with a loaded interconnect with arbitration, a "long" transfer may be split into smaller DMA transfers. In this case the controller may allow itself to deassert the CS between chunks. Deasserting the CS stops any ongoing continuous read. Reasserting it later to continue the reading will only result in the host getting garbage. In this case, the host controller driver has no control over the CS state, so we cannot reliably enable continuous reads. Flag this limitation through a spi-mem controller capability. The inversion in the flag name (starting with 'no_') is voluntary, in order to avoid the need to set this flag in all controller drivers. Only the broken controllers shall set this bit, the default being that the controller masters its CS fully. Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-05-04	spi: spi-mem: Create a secondary read operation	Miquel Raynal
	In some situations, direct mappings may need to use different operation templates. For instance, when enabling continuous reads, Winbond SPI NANDs no longer expect address cycles because they would be ignoring them otherwise. Hence, right after the command opcode, they start counting dummy cycles, followed by the data cycles as usual. This breaks the assumptions of "reads from cache" always being done identically once the best variant has been picked up, across the lifetime of the system. In order to support this feature, we must give direct mapping more than a single operation template to use, in order to switch to using secondary operations upon request by the upper layer. Create the concept of optional secondary operation template, which may or may not be fulfilled by the SPI NAND and SPI NOR cores. If the underlying SPI controller does not leverage any kind of direct mapping acceleration, the feature has no impact and can be freely used. Otherwise, the controller driver needs to opt-in for using this feature, if supported. The condition checked to know whether a secondary operation has been provided or not is to look for a non zero opcode to limit the creation of extra variables. In practice, the opcode 0x00 exist, but is not related to any cache related operation. Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-05-04	spi: spi-mem: Transform the read operation template	Miquel Raynal
	As of now, we only use a single operation template when creating SPI memory direct mappings. With the idea to extend this possibility to 2, rename the template to reflect that we are currently setting the "primary" operation, and create a pointer in the same structure to point to it. From a user point of view, the op_tmpl name remains but becomes a pointer, leading to minor changes in both the SPI NAND and SPI NOR cores. There is no functional change. Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-05-04	mtd: spinand: Drop ECC dirmaps	Miquel Raynal
	Direct mappings are very static concepts, which allow us to reuse a template to perform reads or writes in a very efficient manner after a single initialization. With the introduction of pipelined ECC engines for SPI controllers, the need to differentiate between an operation with and without correction has arised. The chosen solution at that time has been to create new direct mappings for these operations, jumping from 2 to 4 dirmaps per target. Enabling ECC was done by choosing the correct dirmap. Today, we need to further parametrize dirmaps. With the goal to enable continuous reads on a wider range of devices, we will need more flexibility regarding the read from cache operation template to pick at run time, for instance to use shorter "continuous read from cache" variants. We could create other direct mappings, but it would increase the matrix by a power of two, bringing the theoretical number of dirmaps to 8 (read/write, ecc, shorter read variants) per target. This grow is not sustainable, so let's change how dirmaps work - a little bit. Operations already carry an ECC parameter, use it to indicate whether error correction is required or not. In practice this change happens only at the core level, SPI controller drivers do not care about the direct mapping structure in this case, they just pick whatever is in the template as a base. As a result, we allow the core to dynamically change the content of the templates. He who can do more can do less, so during the checking steps, make sure to enable the ECC requirement just for the time of the checks. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-05-04	mtd: spinand: Expose spinand_op_is_odtr()	Miquel Raynal
	This helper is going to be needed in a vendor driver, so expose it. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-05-04	clk: renesas: r9a09g047: Add CLK_PLLDSI{0,1} clocks	Tommaso Merciai
	Add support for the PLLDSI{0,1} clocks in the r9a09g047 CPG driver. Introduce CLK_PLLDSI{0,1} also, introduce the rzg3e_cpg_pll_dsi{0,1}_limits structures to describe the frequency constraints specific to the RZ/G3E SoC. On Renesas RZ/G3E: - PLLDSI0 maximum output frequency: 1218 MHz - PLLDSI1 maximum output frequency: 609 MHz These limits are enforced through the newly added RZG3E_CPG_PLL_DSI{0,1}_LIMITS(). Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com> Link: https://patch.msgid.link/d26ec5349b0eb7ddb7d244fc53d1111a8530328f.1775636898.git.tommaso.merciai.xr@bp.renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
2026-05-04	gpiolib: move legacy interface into linux/gpio/legacy.h	Arnd Bergmann
	Split the old contents from gpio.h for clarity. Ideally any driver that still includes linux/gpio.h can now be ported over to use either linux/gpio/legacy.h or linux/gpio/consumer.h, with the original file getting removed once that is complete. No functional changes intended for now. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260428154522.2861492-1-arnd@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
2026-05-02	ixgbe: E610: add discovering EEE capability	Jedrzej Jagielski
	Add detecting and parsing EEE device capability. Recently EEE functionality support has been introduced to E610 FW. Currently ixgbe driver has no possibility to detect whether NVM loaded on given adapter supports EEE. There's dedicated device capability element reflecting FW support for given EEE link speed. Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com> Tested-by: Rinitha S <sx.rinitha@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260430-jk-iwl-net-next-2026-04-30-v1-1-6f27ae1cd073@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01	net/mlx5: use internal dma pools for frag buf alloc	Nimrod Oren
	Add mlx5_dma_pool alloc/free paths, and wire mlx5_frag_buf allocation and free paths to use them. mlx5_frag_buf_alloc_node() now selects an mlx5_dma_pool to allocate fragments from, instead of directly allocating full coherent pages. mlx5_frag_buf_free() frees from the respective pool. mlx5_dma_pool_alloc() keeps allocation fast by maintaining pages with available indexes at the head of the list, so the common allocation path can take a free index immediately. New backing pages are allocated only when no free index is available. mlx5_dma_pool_free() returns released indexes to the pool and frees a backing page once all of its indexes become free. This avoids keeping fully free pages for the lifetime of the pool and reduces coherent DMA memory footprint. Signed-off-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260429201429.223809-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01	net/mlx5: add frag buf pools create/destroy paths	Nimrod Oren
	Introduce mlx5 DMA pool and pool-page data structures, and add the creation and teardown paths. Each NUMA node owns a set of mlx5_dma_pool instances, each one with a different block size. The sizes are defined as all powers of two starting from MLX5_ADAPTER_PAGE_SHIFT and up to PAGE_SHIFT. Since mlx5_frag_bufs are used to back objects whose sizes are encoded relative to MLX5_ADAPTER_PAGE_SHIFT, a smaller block_shift value cannot be used. Requests larger than PAGE_SIZE continue to be handled as page-sized fragments, as in the existing frag-buf allocation model. Signed-off-by: Nimrod Oren <noren@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260429201429.223809-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01	tcp: move max_packets_out, cwnd_usage_seq, rate_delivered and ↵	Eric Dumazet
	rate_interval_us to tcp_sock_write_tx group These fields are used in TX path. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430100021.211139-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01	tcp: move tp->bytes_acked to tcp_sock_write_tx group	Eric Dumazet
	tp->bytes_acked is touched in TX path only. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430100021.211139-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01	tcp: move tp->first_tx_mstamp and tp->delivered_mstamp to tcp_sock_write_tx	Eric Dumazet
	These fields are touched in when payload is sent. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430100021.211139-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01	tcp: move tp->segs_in and tp->segs_out to tcp_sock_write_txrx group	Eric Dumazet
	segs_in is changed for each incoming packet, including ACK packets. segs_out is changed for each outgoing packet, including ACK packets. They belong to tcp_sock_write_txrx group. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430100021.211139-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01	tcp: move tp->delivered and tp->delivered_ce to tcp_sock_write_tx group	Eric Dumazet
	These counters are changed whenever sent data is acknowleged. They do not belong to tcp_sock_write_txrx group, because TCP receivers do not touch them. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260430100021.211139-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01	Merge tag 'drm-fixes-2026-05-02' of https://gitlab.freedesktop.org/drm/kernel	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Fixes for rc2, the usual amdgpu/xe double header, I think xe had a couple of weeks combined due to some maintainer access issues, otherwise there's just a few misc fixes and documentation fixups. core and helpers: - calculate framebuffer geometry with format helpers - fix docs amdgpu: - GFX12 fix for CONFIG_DRM_DEBUG_MM configs - Fix DC analog support - Userq fixes - GART placement fix - Aldebaran SMU fixes - AMDGPU_INFO_READ_MMR_REG fix - UVD 3.1 fix - GC 6 TCC fix - Fix root reservation in amdgpu_vm_handle_fault() - RAS fix - Module reload fix for APUs - Fix build for CONFIG_DRM_FBDEV_EMULATION=n - IGT DWB regression fix - GC 11.5.4 fix - VCN user fence fixes - JPEG user fence fixes - SMU 13.0.6 fix - VCN 3/4 IB parser fixes - NV3x+ dGPU vblank fix - DCE6/8 fixes for LVDS/eDP panels without an EDID amdkfd: - Fix for when CONFIG_HSA_AMD is not set - SVM fixes xe: - uapi: Add missing pad and extensions check - uapi: Reject unsafe PAT indices for CPU cached memory - Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge - Xe3p tuning and workaround fixes - USE drm mm instead of drm SA for CCS read/write - Fix leaks and null derefs - Fix Wa_18022495364 appletbdrm: - allocate protocol buffers with kvzalloc() dma-buf: - fix docs imagination: - avoid segfault in debugfs ofdrm: - put PCI device reference on errors udl: - increase USB timeout" * tag 'drm-fixes-2026-05-02' of https://gitlab.freedesktop.org/drm/kernel: (77 commits) drm/xe/uapi: Reject coh_none PAT index for CPU_ADDR_MIRROR drm/xe/uapi: Reject coh_none PAT index for CPU cached memory in madvise drm/xe/xelp: Fix Wa_18022495364 drm/xe/gsc: Fix BO leak on error in query_compatibility_version() drm/xe/eustall: Fix drm_dev_put called before stream disable in close drm/xe: Fix error cleanup in xe_exec_queue_create_ioctl() drm/xe: Fix dma-buf attachment leak in xe_gem_prime_import() drm/xe: Fix bo leak in xe_dma_buf_init_obj() on allocation failure drm/xe/bo: Fix bo leak on GGTT flag validation in xe_bo_init_locked() drm/xe/bo: Fix bo leak on unaligned size validation in xe_bo_init_locked() drm/xe: Fix potential NULL deref in xe_exec_queue_tlb_inval_last_fence_put_unlocked drm/xe/vf: Use drm mm instead of drm sa for CCS read/write drm/xe: Add memory pool with shadow support drm/xe/debugfs: Correct printing of register whitelist ranges drm/xe: Mark ROW_CHICKEN5 as a masked register drm/xe/tuning: Use proper register offset for GAMSTLB_CTRL drm/xe/xe3p_lpg: Add missing indirect ring state feature flag drm/xe: Drop redundant rtp entries for Wa_14019988906 & Wa_14019877138 drm/xe/vm: Add missing pad and extensions check drm/xe: Drop registration of guc_submit_wedged_fini from xe_guc_submit_wedge() ...
2026-05-01	ne2k: fold drivers/net/Space.c into ne.c	Arnd Bergmann
	drivers/net/Space.c is the last remnant of the linux-2.4.x driver model that required each subsystem and device driver init function to be called from init/main.c explicitly, before the introduction of initcall levels. In linux-7.0, this was only used for a handful of ISA network drivers, with the ne2000 driver being the last one. Fold the code into ne.c directly, with minimal changes to preserve the existing command line parsing. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260429145624.2948432-2-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01	Merge tag 'nf-26-05-01' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains Netfilter fixes for net: 1) Replace skb_try_make_writable() by skb_ensure_writable() in nft_fwd_netdev and the flowtable to deal with uncloned packets having their network header in paged fragments. 2) Drop packet if output device does not exist and ensure sufficient headroom in nft_fwd_netdev before transmitting the skb. 3) Use the existing dup recursion counter in nft_fwd_netdev for the neigh_xmit variant, from Weiming Shi. 4) Add .check_hooks interface to x_tables to detach the control plane hook check based on the match/target configuration. Then, update nft_compat to use .check_hooks from .validate path, this fixes a lack of hook validation for several match/targets. 5) Fix incorrect .usersize in xt_CT, from Florian Westphal. 6) Fix a memleak with netdev tables in dormant state, from Florian Westphal. 7) Several patches to check if the packet is a fragment, then skip layer 4 inspection, for x_tables and nf_tables; as well as common nf_socket infrastructure. The xt_hashlimit match drops fragments to stay consistent with the existing approach when failing to parse the layer 4 protocol header. 8) Ensure sufficient headroom in the flowtable before transmitting the skb. 9) Fix the flowtable inline vlan approach for double-tagged vlan: Reverse the iteration over .encap[] since it represents the encapsulation as seen from the ingress path. Postpone pushing layer 2 header so output device is available to calculate needed headroom. Finally, add and use nf_flow_vlan_push() to fix it. 10) Fix flowtable inline pppoe with GSO packets. Moreover, use FLOW_OFFLOAD_XMIT_DIRECT to fill up destination hardware address since neighbour cache does not exist in pppoe. 11) Use skb_pull_rcsum() to decapsulate vlan and pppoe headers, for double-tagged vlan in particular this should provide some benefits in certain scenarios. More notes regarding 9-11): - sashiko is also signalling to use it for IPIP headers, but that needs more adjustments such setting skb->protocol after removing the IPIP header, will follow up in a separated patch. - I plan to submit selftests to cover double-tagged-vlan. As for pppoe, it should be possible but that would mandate a few userspace dependencies. This has been semi-automatically tested by me and reporters describing broken double-vlan-tagged and pppoe currently in the flowtable. * tag 'nf-26-05-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header netfilter: flowtable: fix inline pppoe encapsulation in xmit path netfilter: flowtable: fix inline vlan encapsulation in xmit path netfilter: flowtable: ensure sufficient headroom in xmit path netfilter: xtables: fix L4 header parsing for non-first fragments netfilter: nf_tables: skip L4 header parsing for non-first fragments netfilter: nf_socket: skip socket lookup for non-first fragments netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables netfilter: xt_CT: fix usersize for v1 and v2 revision netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate netfilter: x_tables: add .check_hooks to matches and targets netfilter: nft_fwd_netdev: use recursion counter in neigh egress path netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding netfilter: replace skb_try_make_writable() by skb_ensure_writable() ==================== Link: https://patch.msgid.link/20260501122237.296262-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01	smb: smbdirect: introduce and use include/linux/smbdirect.h	Stefan Metzmacher
	This makes it easier to rebuild cifs.ko and ksmbd.ko against a running kernel. Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/linux-cifs/aehrPuY60VMcYGU8@infradead.org/ Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-01	alarmtimer: Remove unused interfaces	Thomas Gleixner
	All alarmtimer users are converted to alarm_start_timer(). Remove the now unused interfaces. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://patch.msgid.link/20260408114952.670899355@kernel.org
2026-05-01	alarmtimer: Provide alarm_start_timer()	Thomas Gleixner
	Alarm timers utilize hrtimers for normal operation and only switch to the RTC on suspend. In order to catch already expired timers early and without going through a timer interrupt cycle, provide a new start function which internally uses hrtimer_start_range_ns_user(). If hrtimer_start_range_ns_user() detects an already expired timer, it does not queue it. In that case remove the timer from the alarm base as well. Return the status queued or not back to the caller to handle the early expiry. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: John Stultz <jstultz@google.com> Link: https://patch.msgid.link/20260408114952.332822525@kernel.org