summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
7 hoursirqchip/gic-v5: Move LPI allocation into the LPI domainSascha Bischoff
commit dec85d2fbd20de3711a71e65397dfdb40c3fa953 upstream. The IPI and ITS MSI domains currently allocate and release LPIs directly, then pass the selected LPI ID to the parent LPI domain. This leaks the LPI domain's allocation policy into its child domains and forces each child to duplicate part of the parent domain's teardown. Make the LPI domain allocate LPIs in its .alloc() callback and release them in a matching .free() callback. Child domains can then request a parent interrupt without passing an implementation-specific LPI ID, and the LPI lifetime is tied to the domain that owns the LPI namespace. Remove the gicv5_alloc_lpi() and gicv5_free_lpi() wrappers now that no external caller needs to manage LPIs directly. This is a preparatory change for an actual leakage problem in the allocation code and therefore tagged with the same Fixes tag. Fixes: 0f0101325876 ("irqchip/gic-v5: Add GICv5 LPI/IPI support") Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260506093634.382062-2-sascha.bischoff@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 hoursrseq: Implement read only ABI enforcement for optimized RSEQ V2 modeThomas Gleixner
commit 82f572449cfe75f12ea985986da60e11f308f77d upstream. The optimized RSEQ V2 mode requires that user space adheres to the ABI specification and does not modify the read-only fields cpu_id_start, cpu_id, node_id and mm_cid behind the kernel's back. While the kernel does not rely on these fields, the adherence to this is a fundamental prerequisite to allow multiple entities, e.g. libraries, in an application to utilize the full potential of RSEQ without stepping on each other toes. Validate this adherence on every update of these fields. If the kernel detects that user space modified the fields, the application is force terminated. Fixes: d6200245c75e ("rseq: Allow registering RSEQ with slice extension") Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://patch.msgid.link/20260428224427.845230956%40kernel.org Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursrseq: Revert to historical performance killing behaviourThomas Gleixner
commit b9eac6a9d93c952c4b7775a24d5c7a1bbf4c3c00 upstream. The recent RSEQ optimization work broke the TCMalloc abuse of the RSEQ ABI as it not longer unconditionally updates the CPU, node, mm_cid fields, which are documented as read only for user space. Due to the observed behavior of the kernel it was possible for TCMalloc to overwrite the cpu_id_start field for their own purposes and rely on the kernel to update it unconditionally after each context switch and before signal delivery. The RSEQ ABI only guarantees that these fields are updated when the data changes, i.e. the task is migrated or the MMCID of the task changes due to switching from or to per CPU ownership mode. The optimization work eliminated the unconditional updates and reduced them to the documented ABI guarantees, which results in a massive performance win for syscall, scheduling heavy work loads, which in turn breaks the TCMalloc expectations. There have been several options discussed to restore the TCMalloc functionality while preserving the optimization benefits. They all end up in a series of hard to maintain workarounds, which in the worst case introduce overhead for everyone, e.g. in the scheduler. The requirements of TCMalloc and the optimization work are diametral and the required work arounds are a maintainence burden. They end up as fragile constructs, which are blocking further optimization work and are pretty much guaranteed to cause more subtle issues down the road. The optimization work heavily depends on the generic entry code, which is not used by all architectures yet. So the rework preserved the original mechanism moslty unmodified to keep the support for architectures, which handle rseq in their own exit to user space loop. That code is currently optimized out by the compiler on architectures which use the generic entry code. This allows to revert back to the original behaviour by replacing the compile time constant conditions with a runtime condition where required, which disables the optimization and the dependend time slice extension feature until the run-time condition can be enabled in the RSEQ registration code on a per task basis again. The following changes are required to restore the original behavior, which makes TCMalloc work again: 1) Replace the compile time constant conditionals with runtime conditionals where appropriate to prevent the compiler from optimizing the legacy mode out 2) Enforce unconditional update of IDs on context switch for the non-optimized v1 mode 3) Enforce update of IDs in the pre signal delivery path for the non-optimized v1 mode 4) Enforce update of IDs in the membarrier(RSEQ) IPI for the non-optimized v1 mode 5) Make time slice and future extensions depend on optimized v2 mode This brings back the full performance problems, but preserves the v2 optimization code and for generic entry code using architectures also the TIF_RSEQ optimization which avoids a full evaluation of the exit to user mode loop in many cases. Fixes: 566d8015f7ee ("rseq: Avoid CPU/MM CID updates when no event pending") Reported-by: Mathias Stearn <mathias@mongodb.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Closes: https://lore.kernel.org/CAHnCjA25b+nO2n5CeifknSKHssJpPrjnf+dtr7UgzRw4Zgu=oA@mail.gmail.com Link: https://patch.msgid.link/20260428224427.517051752%40kernel.org Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursHID: core: introduce hid_safe_input_report()Benjamin Tissoires
[ Upstream commit 206342541fc887ae919774a43942dc883161fece ] hid_input_report() is used in too many places to have a commit that doesn't cross subsystem borders. Instead of changing the API, introduce a new one when things matters in the transport layers: - usbhid - i2chid This effectively revert to the old behavior for those two transport layers. Fixes: 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()") Cc: stable@vger.kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursHID: pass the buffer size to hid_report_raw_eventBenjamin Tissoires
[ Upstream commit 2c85c61d1332e1e16f020d76951baf167dcb6f7a ] commit 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()") enforced the provided data to be at least the size of the declared buffer in the report descriptor to prevent a buffer overflow. However, we can try to be smarter by providing both the buffer size and the data size, meaning that hid_report_raw_event() can make better decision whether we should plaining reject the buffer (buffer overflow attempt) or if we can safely memset it to 0 and pass it to the rest of the stack. Fixes: 0a3fe972a7cb ("HID: core: Mitigate potential OOB by removing bogus memset()") Cc: stable@vger.kernel.org Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Acked-by: Johan Hovold <johan@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jiri Kosina <jkosina@suse.com> Stable-dep-of: 206342541fc8 ("HID: core: introduce hid_safe_input_report()") Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourscgroup/cpuset: Reserve DL bandwidth only for root-domain movesGuopeng Zhang
commit 5dd74441cbf42c22e874450eb6a6bbb19390a216 upstream. cpuset_can_attach() currently adds the bandwidth of all migrating SCHED_DEADLINE tasks to sum_migrate_dl_bw. If the source and destination cpuset effective CPU masks do not overlap, the whole sum is then reserved in the destination root domain. set_cpus_allowed_dl(), however, subtracts bandwidth from the source root domain only when the affinity change really moves the task between root domains. A DL task can move between cpusets that are still in the same root domain, so including that task in sum_migrate_dl_bw can reserve destination bandwidth without a matching source-side subtraction. Share the root-domain move test with set_cpus_allowed_dl(). Keep nr_migrate_dl_tasks counting all migrating deadline tasks for cpuset DL task accounting, but add to sum_migrate_dl_bw only for tasks that need a root-domain bandwidth move. Keep using the destination cpuset effective CPU mask and leave the broader can_attach()/attach() transaction model unchanged. Fixes: 2ef269ef1ac0 ("cgroup/cpuset: Free DL BW in case can_attach() fails") Cc: stable@vger.kernel.org # v6.10+ Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn> Reviewed-by: Waiman Long <longman@redhat.com> Acked-by: Juri Lelli <juri.lelli@redhat.com> Tested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 hoursworkqueue: fix devm_alloc_workqueue() va_list misuseBreno Leitao
[ Upstream commit 0de4cb473aed57ee4ba7e0551ad27bddc19fc519 ] devm_alloc_workqueue() built a va_list and passed it as a single positional argument to the variadic alloc_workqueue() macro: va_start(args, max_active); wq = alloc_workqueue(fmt, flags, max_active, args); va_end(args); C does not allow forwarding a va_list through a ... parameter. alloc_workqueue() expands to alloc_workqueue_noprof(), which runs its own va_start() over its ... params, so the inner vsnprintf(wq->name, sizeof(wq->name), fmt, args) in __alloc_workqueue() received the outer va_list object as the first variadic slot rather than the caller's actual format arguments. Add a new static helper alloc_workqueue_va() that wraps __alloc_workqueue() and runs wq_init_lockdep() on success, and fold both alloc_workqueue_noprof() and devm_alloc_workqueue_noprof() onto it as suggested by Tejun. The wq_init_lockdep() step is required on the devm path too, otherwise __flush_workqueue()'s on-stack COMPLETION_INITIALIZER_ONSTACK_MAP would NULL-deref wq->lockdep_map. No caller changes are required. devm_alloc_ordered_workqueue() is a macro forwarding to devm_alloc_workqueue() and inherits the fix. Two in-tree callers actively trigger the broken path on every probe: drivers/power/supply/mt6370-charger.c:889 drivers/power/supply/max77705_charger.c:649 both of which use devm_alloc_ordered_workqueue(dev, "%s", 0, dev_name(dev)). A standalone reproducer module is available at[1]. Link: https://github.com/leitao/debug/blob/main/workqueue/valist/wq_va_test.c [1] Fixes: 1dfc9d60a69e ("workqueue: devres: Add device-managed allocate workqueue") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursdpll: export __dpll_pin_change_ntf() for use under dpll_lockIvan Vecera
[ Upstream commit 620055cb1036a6125fd912e7a14b47a6572b809b ] Export __dpll_pin_change_ntf() so that drivers can send pin change notifications from within pin callbacks, which are already called under dpll_lock. Using dpll_pin_change_ntf() in that context would deadlock. Add lockdep_assert_held() to catch misuse without the lock held. Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Tested-by: Alexander Nowlin <alexander.nowlin@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-9-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: 9e5dead140af ("ice: add dpll peer notification for paired SMA and U.FL pins") Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourscdrom, scsi: sr: propagate read-only status to block layer via set_disk_ro()Daan De Meyer
[ Upstream commit 0898a817621a2f0cddca8122d9b974003fe5036d ] The cdrom core never calls set_disk_ro() for a registered device, so BLKROGET on a CD-ROM device always returns 0 (writable), even when the drive has no write capabilities and writes will inevitably fail. This causes problems for userspace that relies on BLKROGET to determine whether a block device is read-only. For example, systemd's loop device setup uses BLKROGET to decide whether to create a loop device with LO_FLAGS_READ_ONLY. Without the read-only flag, writes pass through the loop device to the CD-ROM and fail with I/O errors. systemd-fsck similarly checks BLKROGET to decide whether to run fsck in no-repair mode (-n). The write-capability bits in cdi->mask come from two different sources: CDC_DVD_RAM and CDC_CD_RW are populated by the driver from the MODE SENSE capabilities page (page 0x2A) before register_cdrom() is called, while CDC_MRW_W and CDC_RAM require the MMC GET CONFIGURATION command and were only probed by cdrom_open_write() at device open time. This meant that any attempt to compute the writable state from the full mask at probe time was incorrect, because the GET CONFIGURATION bits were still unset (and cdi->mask is initialized such that capabilities are assumed present). Fix this by factoring the GET CONFIGURATION probing out of cdrom_open_write() into a new exported helper, cdrom_probe_write_features(), and having sr call it from sr_probe() right after get_capabilities() has populated the MODE SENSE bits. register_cdrom() then calls set_disk_ro() based on the full write-capability mask (CDC_DVD_RAM | CDC_MRW_W | CDC_RAM | CDC_CD_RW) so the block layer reflects the drive's actual write support. The feature queries used (CDF_MRW and CDF_RWRT via GET CONFIGURATION with RT=00) report drive-level capabilities that are persistent across media, so a single probe before register_cdrom() is sufficient and the redundant probe at open time is dropped. With set_disk_ro() now accurate, the long-vestigial cd->writeable flag in sr can go: get_capabilities() used to set cd->writeable based on the same four mask bits, but because CDC_MRW_W and CDC_RAM default to "capability present" in cdi->mask and aren't touched by MODE SENSE, the condition that gated cd->writeable was always true, making it unconditionally 1. Replace the corresponding gate in sr_init_command() with get_disk_ro(cd->disk), which turns a previously no-op check into a real one and also catches kernel-internal bio writers that bypass blkdev_write_iter()'s bdev_read_only() check. The sd driver (SCSI disks) does not have this problem because it checks the MODE SENSE Write Protect bit and calls set_disk_ro() accordingly. The sr driver cannot use the same approach because the MMC specification does not define the WP bit in the MODE SENSE device-specific parameter byte for CD-ROM devices. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daan De Meyer <daan@amutable.com> Reviewed-by: Phillip Potter <phil@philpotter.co.uk> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Link: https://patch.msgid.link/20260427210139.1400-2-phil@philpotter.co.uk Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursmtd: spinand: Add support for packed read data ODTR commandsMiquel Raynal
[ Upstream commit 5e25407b68f460142539536e31fa20338db6146f ] Some devices stuff address bits in the double byte opcode (in place of the repeated byte) in order to be able to increase the size of the devices, without adding extra address bytes. Create a flag to identify those devices. When the flag is set, use the "packed" variant for the read data operation. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Stable-dep-of: 8d655748aba1 ("mtd: spinand: winbond: Set the packed page read flag to W35N02/04JW") Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursspi: spi-mem: Add a packed command operationMiquel Raynal
[ Upstream commit f79ee9e4b23244e77b28d176ce99a2d84d813ac5 ] Instead of repeating the command opcode twice, some flash devices try to pack command and address bits. In this case, the second opcode byte being sent (LSB) is free to be used. The input data must be ANDed to only provide the relevant bits. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Link: https://patch.msgid.link/20260410-winbond-6-19-rc1-oddr-v1-2-2ac4827a3868@bootlin.com Signed-off-by: Mark Brown <broonie@kernel.org> Stable-dep-of: 8d655748aba1 ("mtd: spinand: winbond: Set the packed page read flag to W35N02/04JW") Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursrculist: add list_splice_rcu() for private listsPablo Neira Ayuso
[ Upstream commit f902877b635551513729bdf9a8d1422c4aab7741 ] This patch adds a helper function, list_splice_rcu(), to safely splice a private (non-RCU-protected) list into an RCU-protected list. The function ensures that only the pointer visible to RCU readers (prev->next) is updated using rcu_assign_pointer(), while the rest of the list manipulations are performed with regular assignments, as the source list is private and not visible to concurrent RCU readers. This is useful for moving elements from a private list into a global RCU-protected list, ensuring safe publication for RCU readers. Subsystems with some sort of batching mechanism from userspace can benefit from this new function. The function __list_splice_rcu() has been added for clarity and to follow the same pattern as in the existing list_splice*() interfaces, where there is a check to ensure that the list to splice is not empty. Note that __list_splice_rcu() has no documentation for this reason. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Stable-dep-of: a6134e62dba2 ("netfilter: nf_tables: join hook list via splice_list_rcu() in commit phase") Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursnstree: fix func. parameter kernel-doc warningsRandy Dunlap
[ Upstream commit 43eb354ecb471426e97b0ce6a0c922ec20f82027 ] Use the correct parameter name ("__ns") for function parameter kernel-doc to avoid 3 warnings: Warning: include/linux/nstree.h:68 function parameter '__ns' not described in 'ns_tree_add_raw' Warning: include/linux/nstree.h:77 function parameter '__ns' not described in 'ns_tree_add' Warning: include/linux/nstree.h:88 function parameter '__ns' not described in 'ns_tree_remove' Fixes: 885fc8ac0a4d ("nstree: make iterator generic") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260416215429.948898-1-rdunlap@infradead.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourspppoe: drop PFC framesQingfang Deng
[ Upstream commit cc1ff87bce1ccd38410ab10960f576dcd17db679 ] RFC 2516 Section 7 states that Protocol Field Compression (PFC) is NOT RECOMMENDED for PPPoE. In practice, pppd does not support negotiating PFC for PPPoE sessions, and the current PPPoE driver assumes an uncompressed (2-byte) protocol field. However, the generic PPP layer function ppp_input() is not aware of the negotiation result, and still accepts PFC frames. If a peer with a broken implementation or an attacker sends a frame with a compressed (1-byte) protocol field, the subsequent PPP payload is shifted by one byte. This causes the network header to be 4-byte misaligned, which may trigger unaligned access exceptions on some architectures. To reduce the attack surface, drop PPPoE PFC frames. Introduce ppp_skb_is_compressed_proto() helper function to be used in both ppp_generic.c and pppoe.c to avoid open-coding. Fixes: 7fb1b8ca8fa1 ("ppp: Move PFC decompression to PPP generic layer") Signed-off-by: Qingfang Deng <qingfang.deng@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260415022456.141758-2-qingfang.deng@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourstcp: move tp->chrono_type next tp->chrono_stat[]Eric Dumazet
[ Upstream commit 4b78c9cbd8f1fbb9517aee48b372646f4cf05442 ] chrono_type is currently in tcp_sock_read_txrx group, which is supposed to hold read-mostly fields. But chrono_type is mostly written in tx path, it should be moved to tcp_sock_write_tx group, close to other chrono fields (chrono_stat[], chrono_start). Note this adds holes, but data locality is far more important. Use a full u8 for the time being, compiler can generate more efficient code. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20260308122302.2895067-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 267bf3cf9a6f ("tcp: annotate data-races in tcp_get_info_chrono_stats()") Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursnet: enetc: fix NTMP DMA use-after-free issueWei Fang
[ Upstream commit 3cade698881eb238f88cbbfec82acc2110440a3f ] The AI-generated review reported a potential DMA use-after-free issue [1]. If netc_xmit_ntmp_cmd() times out and returns an error, the pending command is not explicitly aborted, while ntmp_free_data_mem() unconditionally frees the DMA buffer. If the buffer has already been reallocated elsewhere, this may lead to silent memory corruption. Because the hardware eventually processes the pending command and perform a DMA write of the response to the physical address of the freed buffer. To resolve this issue, this patch does the following modifications: 1. Convert cbdr->ring_lock from a spinlock to a mutex The lock was originally a spinlock in case NTMP operations might be invoked from atomic context. After downstream support for all NTMP tables, no such usage has materialized. A mutex lock is now required because the driver now needs to reclaim used BDs and release associated DMA memory within the lock's context, while dma_free_coherent() might sleep. 2. Introduce software command BD (struct netc_swcbd) The hardware write-back overwrites the addr and len fields of the BD, so the driver cannot rely on the hardware BD to free the associated DMA memory. The driver now maintains a software shadow BD storing the DMA buffer pointer, DMA address, and size. And netc_xmit_ntmp_cmd() only reclaims older BDs when the number of used BDs reaches NETC_CBDR_CLEAN_WORK (16). The software BD enables correct DMA memory release. With this, struct ntmp_dma_buf and ntmp_free_data_mem() are no longer needed and are removed. 3. Require callers to hold ring_lock across netc_xmit_ntmp_cmd() netc_xmit_ntmp_cmd() releases the ring_lock before the caller finishes consuming the response. At this point, if a concurrent thread submits a new command, it may trigger ntmp_clean_cbdr() and free the DMA buffer while it is still in use. Move ring_lock ownership to the caller to ensure the response buffer cannot be reclaimed prematurely. So the helpers ntmp_select_and_lock_cbdr() and ntmp_unlock_cbdr() are added. These changes eliminate the DMA use-after-free condition and ensure safe and consistent BD reclamation and DMA buffer lifecycle management. Fixes: 4701073c3deb ("net: enetc: add initial netc-lib driver to support NTMP") Link: https://lore.kernel.org/netdev/20260403011729.1795413-1-kuba@kernel.org/ # [1] Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260415060833.2303846-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourslib/hexdump: print_hex_dump_bytes() calls print_hex_dump_debug()Geert Uytterhoeven
[ Upstream commit 36776b7f8a8955b4e75b5d490a75fee0c7a2a7ef ] print_hex_dump_bytes() claims to be a simple wrapper around print_hex_dump(), but it actally calls print_hex_dump_debug(), which means no output is printed if (dynamic) DEBUG is disabled. Update the documentation to match the implementation. Fixes: 091cb0994edd20d6 ("lib/hexdump: make print_hex_dump_bytes() a nop on !DEBUG builds") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://patch.msgid.link/3d5c3069fd9102ecaf81d044b750cd613eb72a08.1774970392.git.geert+renesas@glider.be Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourssunrpc: Fix compilation error (`make W=1`) when dprintk() is no-opAndy Shevchenko
[ Upstream commit 6f57293abb8d087de830dd3f02e66d94b3e59973 ] Clang compiler is not happy about set but unused variables: .../flexfilelayout/flexfilelayoutdev.c:56:9: error: variable 'ret' set but not used [-Werror,-Wunused-but-set-variable] .../flexfilelayout/flexfilelayout.c:1505:6: error: variable 'err' set but not used [-Werror,-Wunused-but-set-variable] .../nfs4proc.c:9244:12: error: variable 'ptr' set but not used [-Werror,-Wunused-but-set-variable] Fix these by forwarding parameters of dprintk() to no_printk(). The positive side-effect is a format-string checker enabled even for the cases when dprintk() is no-op. Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver") Fixes: fc931582c260 ("nfs41: create_session operation") Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourssunrpc: Kill RPC_IFDEBUG()Andy Shevchenko
[ Upstream commit adcc59114ccd402259c089b0fea24da5e4974563 ] RPC_IFDEBUG() is used in only two places. In one the user of the definition is guarded by ifdeffery, in the second one it's implied due to dprintk() usage. Kill the macro and move the ifdeffery to the regular condition with the variable defined inside, while in the second case add the same conditional and move the respective code there. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Stable-dep-of: 6f57293abb8d ("sunrpc: Fix compilation error (`make W=1`) when dprintk() is no-op") Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursworkqueue: devres: Add device-managed allocate workqueueKrzysztof Kozlowski
[ Upstream commit 1dfc9d60a69ec148e1cb709256617d86e5f0e8f8 ] Add a Resource-managed version of alloc_workqueue() to fix common problem of drivers mixing devm() calls with destroy_workqueue. Such naive and discouraged driver approach leads to difficult to debug bugs when the driver: 1. Allocates workqueue in standard way and destroys it in driver remove() callback, 2. Sets work struct with devm_work_autocancel(), 3. Registers interrupt handler with devm_request_threaded_irq(). Which leads to following unbind/removal path: 1. destroy_workqueue() via driver remove(), Any interrupt coming now would still execute the interrupt handler, which queues work on destroyed workqueue. 2. devm_irq_release(), 3. devm_work_drop() -> cancel_work_sync() on destroyed workqueue. devm_alloc_workqueue() has two benefits: 1. Solves above problem of mix-and-match devres and non-devres code in driver, 2. Simplify any sane drivers which were correctly using alloc_workqueue() + devm_add_action_or_reset(). Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Tejun Heo <tj@kernel.org> Stable-dep-of: 1e668baadefb ("power: supply: max77705: Free allocated workqueue and fix removal order") Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursstop_machine: Fix the documentation for a NULL cpus argumentThomas Weißschuh
[ Upstream commit 48f7a50c027dd2abb9e7b8a6ecc8e531d87f2c21 ] A recent refactoring of the kernel-docs for stop machine changed the description of the cpus parameter from "NULL = any online cpu" to "NULL = run on each online CPU". However the callback is only executed on a single CPU, not all of them. The old wording was a bit ambiguous and could have been read both ways. Reword the documentation to be correct again and hopefully also clearer. Fixes: fc6f89dc7078 ("stop_machine: Improve kernel-doc function-header comments") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourstracing: move __printf() attribute on __ftrace_vbprintk()Arnd Bergmann
[ Upstream commit 473e470f16f98569d59adc11c4a318780fb68fe9 ] The sunrpc change to use trace_printk() for debugging caused a new warning for every instance of dprintk() in some configurations, when -Wformat-security is enabled: fs/nfs/getroot.c: In function 'nfs_get_root': fs/nfs/getroot.c:90:17: error: format not a string literal and no format arguments [-Werror=format-security] 90 | nfs_errorf(fc, "NFS: Couldn't getattr on root"); I've been slowly chipping away at those warnings over time with the intention of enabling them by default in the future. While I could not figure out why this only happens for this one instance, I see that the __trace_bprintk() function is always called with a local variable as the format string, rather than a literal. Move the __printf(2,3) annotation on this function from the declaration to the caller. As this is can only be validated for literals, the attribute on the declaration causes the warnings every time, but removing it entirely introduces a new warning on the __ftrace_vbprintk() definition. The format strings still get checked because the underlying literal keeps getting passed into __trace_printk() in the "else" branch, which is not taken but still evaluated for compile-time warnings. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Simon Horman <horms@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yury Norov <ynorov@nvidia.com> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260203164545.3174910-1-arnd@kernel.org Fixes: ec7d8e68ef0e ("sunrpc: add a Kconfig option to redirect dfprintk() output to trace buffer") Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursvfio: unhide vdev->debug_rootArnd Bergmann
[ Upstream commit 555aa178f8d22261d71da74df6267e6e6e97f95a ] When debugfs is disabled, the hisilicon driver now fails to build: drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c: In function 'hisi_acc_vfio_debug_init': drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c:1671:62: error: 'struct vfio_device' has no member named 'debug_root' 1671 | vfio_dev_migration = debugfs_lookup("migration", vdev->debug_root); | ^~ The driver otherwise relies on dead-code elimination, but this reference fails. The single struct member is not going to make much of a difference for memory consumption, so just keep this visible unconditionally. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: b398f91779b8 ("hisi_acc_vfio_pci: register debugfs for hisilicon migration driver") Link: https://lore.kernel.org/r/20260327165521.3779707-1-arnd@kernel.org Signed-off-by: Alex Williamson <alex@shazbot.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursquota: Fix race of dquot_scan_active() with quota deactivationJan Kara
[ Upstream commit e93ab401da4b2e2c1b8ef2424de2f238d51c8b2d ] dquot_scan_active() can race with quota deactivation in quota_release_workfn() like: CPU0 (quota_release_workfn) CPU1 (dquot_scan_active) ============================== ============================== spin_lock(&dq_list_lock); list_replace_init( &releasing_dquots, &rls_head); /* dquot X on rls_head, dq_count == 0, DQ_ACTIVE_B still set */ spin_unlock(&dq_list_lock); synchronize_srcu(&dquot_srcu); spin_lock(&dq_list_lock); list_for_each_entry(dquot, &inuse_list, dq_inuse) { /* finds dquot X */ dquot_active(X) -> true atomic_inc(&X->dq_count); } spin_unlock(&dq_list_lock); spin_lock(&dq_list_lock); dquot = list_first_entry(&rls_head); WARN_ON_ONCE(atomic_read(&dquot->dq_count)); The problem is not only a cosmetic one as under memory pressure the caller of dquot_scan_active() can end up working on freed dquot. Fix the problem by making sure the dquot is removed from releasing list when we acquire a reference to it. Fixes: 869b6ea1609f ("quota: Fix slow quotaoff") Reported-by: Sam Sun <samsun1006219@gmail.com> Link: https://lore.kernel.org/all/CAEkJfYPTt3uP1vAYnQ5V2ZWn5O9PLhhGi5HbOcAzyP9vbXyjeg@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursPM: domains: De-constify fields in struct dev_pm_domain_attach_dataDmitry Baryshkov
[ Upstream commit 1877d3f258cbb57d64e275754fb9b18b089ce72d ] It doesn't really make sense to keep u32 fields to be marked as const. Having the const fields prevents their modification in the driver. Instead the whole struct can be defined as const, if it is constant. Fixes: 161e16a5e50a ("PM: domains: Add helper functions to attach/detach multiple PM domains") Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourspadata: Put CPU offline callback in ONLINE section to allow failureDaniel Jordan
[ Upstream commit c8c4a2972f83c8b68ff03b43cecdb898939ff851 ] syzbot reported the following warning: DEAD callback error for CPU1 WARNING: kernel/cpu.c:1463 at _cpu_down+0x759/0x1020 kernel/cpu.c:1463, CPU#0: syz.0.1960/14614 at commit 4ae12d8bd9a8 ("Merge tag 'kbuild-fixes-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux") which tglx traced to padata_cpu_dead() given it's the only sub-CPUHP_TEARDOWN_CPU callback that returns an error. Failure isn't allowed in hotplug states before CPUHP_TEARDOWN_CPU so move the CPU offline callback to the ONLINE section where failure is possible. Fixes: 894c9ef9780c ("padata: validate cpumask without removed CPU during offline") Reported-by: syzbot+123e1b70473ce213f3af@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69af0a05.050a0220.310d8.002f.GAE@google.com/ Debugged-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursiopoll: fix function parameter names in read_poll_timeout_atomic()Randy Dunlap
[ Upstream commit 878004e2852bc22ce0687c5597d6fe3909fb59f3 ] Correct the function parameter names to avoid kernel-doc warnings and to emphasize this function is atomic (non-sleeping). Warning: include/linux/iopoll.h:169 function parameter 'sleep_us' not described in 'read_poll_timeout_atomic' Warning: ../include/linux/iopoll.h:169 function parameter 'sleep_before_read' not described in 'read_poll_timeout_atomic' Fixes: 9df8043a546d ("iopoll: Generalize read_poll_timeout() into poll_timeout_us()") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patch.msgid.link/20260306221033.2357305-1-rdunlap@infradead.org Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursmodule: Fix freeing of charp module parameters when CONFIG_SYSFS=nPetr Pavlu
[ Upstream commit deffe1edba626d474fef38007c03646ca5876a0e ] When setting a charp module parameter, the param_set_charp() function allocates memory to store a copy of the input value. Later, when the module is potentially unloaded, the destroy_params() function is called to free this allocated memory. However, destroy_params() is available only when CONFIG_SYSFS=y, otherwise only a dummy variant is present. In the unlikely case that the kernel is configured with CONFIG_MODULES=y and CONFIG_SYSFS=n, this results in a memory leak of charp values when a module is unloaded. Fix this issue by making destroy_params() always available when CONFIG_MODULES=y. Rename the function to module_destroy_params() to clarify that it is intended for use by the module loader. Fixes: e180a6b7759a ("param: fix charp parameters set via sysfs") Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourswifi: ieee80211: fix definition of EHT-MCS 15 in MRUShayne Chen
[ Upstream commit cb0caadb64ca0894c4a24e1a34841f260d462f90 ] According to the definition in IEEE Std 802.11be-2024, Table 9-417r, each bit indicates support for the transmission and reception of EHT-MCS 15 in: - B0: 52+26-tone and 106+26-tone MRUs. - B1: a 484+242-tone MRU if 80 MHz is supported. - B2: a 996+484-tone MRU and a 996+484+242-tone MRU if 160 MHz is supported. - B3: a 3×996-tone MRU if 320 MHz is supported. Fixes: 6239da18d2f9 ("wifi: mac80211: adjust EHT capa when lowering bandwidth") Signed-off-by: Shayne Chen <shayne.chen@mediatek.com> Link: https://patch.msgid.link/20260313062150.3165433-1-shayne.chen@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursfirmware: dmi: Correct an indexing error in dmi.hMario Limonciello (AMD)
[ Upstream commit c064abc68e009d2cc18416e7132d9c25e03125b6 ] The entries later in enum dmi_entry_type don't match the SMBIOS specification¹. The entry for type 33: `64-Bit Memory Error Information` is not present and thus the index for all later entries is incorrect. Add it. Also, add missing entry types 43-46, while at it. ¹ Search for "System Management BIOS (SMBIOS) Reference Specification" [ bp: Drop the flaky SMBIOS spec URL. ] Fixes: 93c890dbe5287 ("firmware: Add DMI entry types to the headers") Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Link: https://patch.msgid.link/20260307141024.819807-2-superm1@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourssched/topology: Fix sched_domain_span()Peter Zijlstra
[ Upstream commit e379dce8af11d8d6040b4348316a499bfd174bfb ] Commit 8e8e23dea43e ("sched/topology: Compute sd_weight considering cpuset partitions") ends up relying on the fact that structure initialization should not touch the flexible array. However, the official GCC specification for "Arrays of Length Zero" [*] says: Although the size of a zero-length array is zero, an array member of this kind may increase the size of the enclosing type as a result of tail padding. Additionally, structure initialization will zero tail padding. With the end result that since offsetof(*type, member) < sizeof(*type), array initialization will clobber the flex array. Luckily, the way flexible array sizes are calculated is: sizeof(*type) + count * sizeof(*type->member) This means we have the complete size of the flex array *outside* of sizeof(*type), so use that instead of relying on the broken flex array definition. [*] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html Fixes: 8e8e23dea43e ("sched/topology: Compute sd_weight considering cpuset partitions") Reported-by: Nathan Chancellor <nathan@kernel.org> Debugged-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Chen Yu <yu.c.chen@intel.com> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20260323093627.GY3738010@noisy.programming.kicks-ass.net Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourslocking: Fix rwlock support in <linux/spinlock_up.h>Bart Van Assche
[ Upstream commit 756a0e011cfca0b45a48464aa25b05d9a9c2fb0b ] Architecture support for rwlocks must be available whether or not CONFIG_DEBUG_SPINLOCK has been defined. Move the definitions of the arch_{read,write}_{lock,trylock,unlock}() macros such that these become visbile if CONFIG_DEBUG_SPINLOCK=n. This patch prepares for converting do_raw_{read,write}_trylock() into inline functions. Without this patch that conversion triggers a build failure for UP architectures, e.g. arm-ep93xx. I used the following kernel configuration to build the kernel for that architecture: CONFIG_ARCH_MULTIPLATFORM=y CONFIG_ARCH_MULTI_V7=n CONFIG_ATAGS=y CONFIG_MMU=y CONFIG_ARCH_MULTI_V4T=y CONFIG_CPU_LITTLE_ENDIAN=y CONFIG_ARCH_EP93XX=y Fixes: fb1c8f93d869 ("[PATCH] spinlock consolidation") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260313171510.230998-2-bvanassche@acm.org Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursww-mutex: Fix the ww_acquire_ctx function annotationsBart Van Assche
[ Upstream commit 3dcef70e41ab13483803c536ddea8d5f1803ee25 ] The ww_acquire_done() call is optional. Reflect this in the annotations of ww_acquire_done(). Fixes: 47907461e4f6 ("locking/ww_mutex: Support Clang's context analysis") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Marco Elver <elver@google.com> Link: https://patch.msgid.link/20260225183244.4035378-4-bvanassche@acm.org Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourssignal: Fix the lock_task_sighand() annotationBart Van Assche
[ Upstream commit 39be7b21af24d1d2ed3b18caac57dd219fef226e ] lock_task_sighand() may return NULL. Make this clear in its lock context annotation. Fixes: 04e49d926f43 ("sched: Enable context analysis for core.c and fair.c") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Marco Elver <elver@google.com> Link: https://patch.msgid.link/20260225183244.4035378-3-bvanassche@acm.org Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourslocking: Fix rwlock and spinlock lock context annotationsBart Van Assche
[ Upstream commit 38e18d825f7281fdc16d3241df5115ce6eaeaf79 ] Fix two incorrect rwlock_t lock context annotations. Add the raw_spinlock_t lock context annotations that are missing. Fixes: f16a802d402d ("locking/rwlock, spinlock: Support Clang's context analysis") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Marco Elver <elver@google.com> Link: https://patch.msgid.link/20260225183244.4035378-2-bvanassche@acm.org Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourslocking/mutex: Fix wrong comment for CONFIG_DEBUG_LOCK_ALLOCDavidlohr Bueso
[ Upstream commit babcde3be8c9148aa60a14b17831e8f249854963 ] ... that endif block should be CONFIG_DEBUG_LOCK_ALLOC, not CONFIG_LOCKDEP. Fixes: 51d7a054521d ("locking/mutex: Redo __mutex_init() to reduce generated code size") Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260217191512.1180151-3-dave@stgolabs.net Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourslocking/mutex: Rename mutex_init_lockep()Davidlohr Bueso
[ Upstream commit 8b65eb52d93e4e496bd26e6867152344554eb39e ] Typo, this wants to be _lockdep(). Fixes: 51d7a054521d ("locking/mutex: Redo __mutex_init() to reduce generated code size") Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260217191512.1180151-2-dave@stgolabs.net Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursbus: fsl-mc: use generic driver_override infrastructureDanilo Krummrich
[ Upstream commit 6c8dfb0362732bf1e4829867a2a5239fedc592d0 ] When a driver is probed through __driver_attach(), the bus' match() callback is called without the device lock held, thus accessing the driver_override field without a lock, which can cause a UAF. Fix this by using the driver-core driver_override infrastructure taking care of proper locking internally. Note that calling match() from __driver_attach() without the device lock held is intentional. [1] Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1] Reported-by: Gui-Dong Han <hanguidong02@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789 Fixes: 1f86a00c1159 ("bus/fsl-mc: add support for 'driver_override' in the mc-bus") Link: https://patch.msgid.link/20260324005919.2408620-3-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursvdpa: use generic driver_override infrastructureDanilo Krummrich
[ Upstream commit 85bb534ff12aab6916058897b39c748940a7a4c6 ] When a driver is probed through __driver_attach(), the bus' match() callback is called without the device lock held, thus accessing the driver_override field without a lock, which can cause a UAF. Fix this by using the driver-core driver_override infrastructure taking care of proper locking internally. Note that calling match() from __driver_attach() without the device lock held is intentional. [1] Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1] Reported-by: Gui-Dong Han <hanguidong02@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789 Fixes: 539fec78edb4 ("vdpa: add driver_override support") Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20260324005919.2408620-9-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursplatform/wmi: use generic driver_override infrastructureDanilo Krummrich
[ Upstream commit 8a700b1fc94df4d847a04f14ebc7f8532592b367 ] When a driver is probed through __driver_attach(), the bus' match() callback is called without the device lock held, thus accessing the driver_override field without a lock, which can cause a UAF. Fix this by using the driver-core driver_override infrastructure taking care of proper locking internally. Note that calling match() from __driver_attach() without the device lock held is intentional. [1] Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1] Reported-by: Gui-Dong Han <hanguidong02@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789 Fixes: 12046f8c77e0 ("platform/x86: wmi: Add driver_override support") Reviewed-by: Armin Wolf <W_Armin@gmx.de> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20260324005919.2408620-7-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursPCI: use generic driver_override infrastructureDanilo Krummrich
[ Upstream commit 10a4206a24013be4d558d476010cbf2eb4c9fa64 ] When a driver is probed through __driver_attach(), the bus' match() callback is called without the device lock held, thus accessing the driver_override field without a lock, which can cause a UAF. Fix this by using the driver-core driver_override infrastructure taking care of proper locking internally. Note that calling match() from __driver_attach() without the device lock held is intentional. [1] Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1] Reported-by: Gui-Dong Han <hanguidong02@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789 Fixes: 782a985d7af2 ("PCI: Introduce new device binding path using pci_dev.driver_override") Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Alex Williamson <alex@shazbot.org> Tested-by: Gui-Dong Han <hanguidong02@gmail.com> Reviewed-by: Gui-Dong Han <hanguidong02@gmail.com> Link: https://patch.msgid.link/20260324005919.2408620-6-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hourscpufreq: Pass the policy to cpufreq_driver->adjust_perf()K Prateek Nayak
[ Upstream commit c03791085adcd61fa9b766ab303c7d0941d7378d ] cpufreq_cpu_get() can sleep on PREEMPT_RT in presence of concurrent writer(s), however amd-pstate depends on fetching the cpudata via the policy's driver data which necessitates grabbing the reference. Since schedutil governor can call "cpufreq_driver->update_perf()" during sched_tick/enqueue/dequeue with rq_lock held and IRQs disabled, fetching the policy object using the cpufreq_cpu_get() helper in the scheduler fast-path leads to "BUG: scheduling while atomic" on PREEMPT_RT [1]. Pass the cached cpufreq policy object in sg_policy to the update_perf() instead of just the CPU. The CPU can be inferred using "policy->cpu". The lifetime of cpufreq_policy object outlasts that of the governor and the cpufreq driver (allocated when the CPU is onlined and only reclaimed when the CPU is offlined / the CPU device is removed) which makes it safe to be referenced throughout the governor's lifetime. Closes:https://lore.kernel.org/all/20250731092316.3191-1-spasswolf@web.de/ [1] Fixes: 1d215f0319c2 ("cpufreq: amd-pstate: Add fast switch function for AMD P-State") Reported-by: Bert Karwatzki <spasswolf@web.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Acked-by: Gary Guo <gary@garyguo.net> # Rust Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Reviewed-by: Zhongqiu Han <zhongqiu.han@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260316081849.19368-3-kprateek.nayak@amd.com Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
7 hoursfs: fix archiecture-specific compat_ftruncate64Christoph Hellwig
[ Upstream commit e43dce8a0bc09083ea1145a1a0c61d83cbe72d97 ] The "small" argument to do_sys_ftruncate indicates if > 32-bit size should be reject, but all the arch-specific compat ftruncate64 implementations get this wrong. Merge do_sys_ftruncate and ksys_ftruncate, replace the integer as boolean small flag with a descriptive one about LFS semantics, and use it correctly in the architecture-specific ftruncate64 implementations. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Fixes: 3dd681d944f6 ("arm64: 32-bit (compat) applications support") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260323070205.2939118-2-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
6 daysmm/vma: do not try to unmap a VMA if mmap_prepare() invoked from mmap()Lorenzo Stoakes
[ Upstream commit 619eab23e1ce7c97e54bfc5a417306d94b3f6f13 ] The mmap_prepare hook functionality includes the ability to invoke mmap_prepare() from the mmap() hook of existing 'stacked' drivers, that is ones which are capable of calling the mmap hooks of other drivers/file systems (e.g. overlayfs, shm). As part of the mmap_prepare action functionality, we deal with errors by unmapping the VMA should one arise. This works in the usual mmap_prepare case, as we invoke this action at the last moment, when the VMA is established in the maple tree. However, the mmap() hook passes a not-fully-established VMA pointer to the caller (which is the motivation behind the mmap_prepare() work), which is detached. So attempting to unmap a VMA in this state will be problematic, with the most obvious symptom being a warning in vma_mark_detached(), because the VMA is already detached. It's also unncessary - the mmap() handler will clean up the VMA on error. So to fix this issue, this patch propagates whether or not an mmap action is being completed via the compatibility layer or directly. If the former, then we do not attempt VMA cleanup, if the latter, then we do. This patch also updates the userland VMA tests to reflect the change. Link: https://lore.kernel.org/20260421102150.189982-1-ljs@kernel.org Fixes: ac0a3fc9c07d ("mm: add ability to take further action in vm_area_desc") Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Reported-by: syzbot+db390288d141a1dccf96@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69e69734.050a0220.24bfd3.0027.GAE@google.com/ Cc: David Hildenbrand <david@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
6 dayscgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulatedTejun Heo
[ Upstream commit 93618edf753838a727dbff63c7c291dee22d656b ] A chain of commits going back to v7.0 reworked rmdir to satisfy the controller invariant that a subsystem's ->css_offline() must not run while tasks are still doing kernel-side work in the cgroup. [1] d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out") [2] a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") [3] 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir") [4] 4c56a8ac6869 ("cgroup: Fix cgroup_drain_dying() testing the wrong condition") [5] 13e786b64bd3 ("cgroup: Increment nr_dying_subsys_* from rmdir context") [1] moved task cset unlink from do_exit() to finish_task_switch() so a task's cset link drops only after the task has fully stopped scheduling. That made tasks past exit_signals() linger on cset->tasks until their final context switch, which led to a series of problems as what userspace expected to see after rmdir diverged from what the kernel needs to wait for. [2]-[5] tried to bridge that divergence: [2] filtered the exiting tasks from cgroup.procs; [3] had rmdir(2) sleep in TASK_UNINTERRUPTIBLE for them; [4] fixed the wait's condition; [5] made nr_dying_subsys_* visible synchronously. The cgroup_drain_dying() wait in [3] turned out to be a dead end. When the rmdir caller is also the reaper of a zombie that pins a pidns teardown (e.g. host PID 1 systemd reaping orphan pids that were re-parented to it during the same teardown), rmdir blocks in TASK_UNINTERRUPTIBLE waiting for those pids to free, the pids can't free because PID 1 is the reaper and it's stuck in rmdir, and the system A-A deadlocks. No internal lock ordering breaks this; the wait itself is the bug. The css killing side that drove the original reorder, however, can be made cleanly asynchronous: ->css_offline() is already async, run from css_killed_work_fn() driven by percpu_ref_kill_and_confirm(). The fix is to make that chain start only after all tasks have left the cgroup. rmdir's user-visible side then returns as soon as cgroup.procs and friends are empty, while ->css_offline() still runs only after the cgroup is fully drained. Verified by the original reproducer (pidns teardown + zombie reaper, runs under vng) which hangs vanilla and succeeds here, and by per-commit deterministic repros for [2], [3], [4], [5] with a boot parameter that widens the post-exit_signals() window so each state is reliably reachable. Some stress tests on top of that. cgroup_apply_control_disable() has the same shape of pre-existing race: when a controller is disabled via subtree_control, kill_css() ran synchronously while tasks past exit_signals() could still be linked to the cgroup's csets, and ->css_offline() could fire before they drained. This patch preserves the existing synchronous behavior at that call site (kill_css_sync() + kill_css_finish() back-to-back) and a follow-up patch will defer kill_css_finish() there using a per-css trigger. This seems like the right approach and I don't see problems with it. The changes are somewhat invasive but not excessively so, so backporting to -stable should be okay. If something does turn out to be wrong, the fallback is to revert the entire chain ([1]-[5]) and rework in the development branch instead. v2: Pin cgrp across the deferred destroy work with explicit cgroup_get()/cgroup_put() around queue_work() and the work_fn. v1 wasn't actually broken (ordered cgroup_offline_wq + queue_work order in cgroup_task_dead() saved it) but the explicit ref removes the dependency on those non-obvious invariants. Also note the pre-existing cgroup_apply_control_disable() race in the description; a follow-up will defer kill_css_finish() there. Fixes: 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir") Cc: stable@vger.kernel.org # v7.0+ Reported-and-tested-by: Martin Pitt <martin@piware.de> Link: https://lore.kernel.org/all/afHNg2VX2jy9bW7y@piware.de/ Link: https://lore.kernel.org/all/35e0670adb4abeab13da2c321582af9f@kernel.org/ Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
8 daysptrace: slightly saner 'get_dumpable()' logicLinus Torvalds
commit 31e62c2ebbfdc3fe3dbdf5e02c92a9dc67087a3a upstream. The 'dumpability' of a task is fundamentally about the memory image of the task - the concept comes from whether it can core dump or not - and makes no sense when you don't have an associated mm. And almost all users do in fact use it only for the case where the task has a mm pointer. But we have one odd special case: ptrace_may_access() uses 'dumpable' to check various other things entirely independently of the MM (typically explicitly using flags like PTRACE_MODE_READ_FSCREDS). Including for threads that no longer have a VM (and maybe never did, like most kernel threads). It's not what this flag was designed for, but it is what it is. The ptrace code does check that the uid/gid matches, so you do have to be uid-0 to see kernel thread details, but this means that the traditional "drop capabilities" model doesn't make any difference for this all. Make it all make a *bit* more sense by saying that if you don't have a MM pointer, we'll use a cached "last dumpability" flag if the thread ever had a MM (it will be zero for kernel threads since it is never set), and require a proper CAP_SYS_PTRACE capability to override. Reported-by: Qualys Security Advisory <qsa@qualys.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysprintk: add print_hex_dump_devel()Thorsten Blum
[ Upstream commit d134feeb5df33fbf77f482f52a366a44642dba09 ] Add print_hex_dump_devel() as the hex dump equivalent of pr_devel(), which emits output only when DEBUG is enabled, but keeps call sites compiled otherwise. Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Stable-dep-of: 177730a273b1 ("crypto: caam - guard HMAC key hex dumps in hash_digest_key") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 days8021q: use RCU for egress QoS mappingsLongxuan Yu
[ Upstream commit fc69decc811b155a0ed8eef17ee940f28c4f6dbc ] The TX fast path and reporting paths walk egress QoS mappings without RTNL. Convert the mapping lists to RCU-protected pointers, use RCU reader annotations in readers, and defer freeing mapping nodes with an embedded rcu_head. This prepares the egress QoS mapping code for safe removal of mapping nodes in a follow-up change while preserving the current behavior. Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Longxuan Yu <ylong030@ucr.edu> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Link: https://patch.msgid.link/9136768189f8c6d3f824f476c62d2fa1111688e8.1776647968.git.yuantan098@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Stable-dep-of: 7dddc74af369 ("8021q: delete cleared egress QoS mappings") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysmmc: core: Optimize time for secure erase/trim for some Kingston eMMCsLuke Wang
[ Upstream commit d6bf2e64dec87322f2b11565ddb59c0e967f96e3 ] Kingston eMMC IY2964 and IB2932 takes a fixed ~2 seconds for each secure erase/trim operation regardless of size - that is, a single secure erase/trim operation of 1MB takes the same time as 1GB. With default calculated 3.5MB max discard size, secure erase 1GB requires ~300 separate operations taking ~10 minutes total. Add a card quirk, MMC_QUIRK_FIXED_SECURE_ERASE_TRIM_TIME, to set maximum secure erase size for those devices. This allows 1GB secure erase to complete in a single operation, reducing time from 10 minutes to just 2 seconds. Signed-off-by: Luke Wang <ziniu.wang_1@nxp.com> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
9 daysmmc: core: Add quirk for incorrect manufacturing dateAvri Altman
[ Upstream commit 263ff314cc5602599d481b0912a381555fcbad28 ] Some eMMC vendors need to report manufacturing dates beyond 2025 but are reluctant to update the EXT_CSD revision from 8 to 9. Changing the Updating the EXT_CSD revision may involve additional testing or qualification steps with customers. To ease this transition and avoid a full re-qualification process, a workaround is needed. This patch introduces a temporary quirk that re-purposes the year codes corresponding to 2010, 2011, and 2012 to represent the years 2026, 2027, and 2028, respectively. This solution is only valid for this three-year period. After 2028, vendors must update their firmware to set EXT_CSD_REV=9 to continue reporting the correct manufacturing date in compliance with the JEDEC standard. The `MMC_QUIRK_BROKEN_MDT` is introduced and enabled for all Sandisk devices to handle this behavior. Signed-off-by: Avri Altman <avri.altman@sandisk.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Stable-dep-of: d6bf2e64dec8 ("mmc: core: Optimize time for secure erase/trim for some Kingston eMMCs") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>