summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2026-02-26Merge tag 'mm-hotfixes-stable-2026-02-26-14-14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "12 hotfixes. 7 are cc:stable. 8 are for MM. All are singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-02-26-14-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: update Yosry Ahmed's email address mailmap: add entry for Daniele Alessandrelli mm: fix NULL NODE_DATA dereference for memoryless nodes on boot mm/tracing: rss_stat: ensure curr is false from kthread context mm/kfence: fix KASAN hardware tag faults during late enablement mm/damon/core: disallow non-power of two min_region_sz Squashfs: check metadata block offset is within range MAINTAINERS, mailmap: update e-mail address for Vlastimil Babka liveupdate: luo_file: remember retrieve() status mm: thp: deny THP for files on anonymous inodes mm: change vma_alloc_folio_noprof() macro to inline function mm/kfence: disable KFENCE upon KASAN HW tags enablement
2026-02-26Merge tag 'pm-7.0-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix two intel_pstate driver issues causing it to crash on sysfs attribute accesses when some CPUs in the system are offline, finalize changes related to turning pm_runtime_put() into a void function, and update Daniel Lezcano's contact information: - Fix two issues in the intel_pstate driver causing it to crash when its sysfs interface is used on a system with some offline CPUs (David Arcari, Srinivas Pandruvada) - Update the last user of the pm_runtime_put() return value to discard it and turn pm_runtime_put() into a void function (Rafael Wysocki) - Update Daniel Lezcano's contact information in MAINTAINERS and .mailmap (Daniel Lezcano)" * tag 'pm-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: MAINTAINERS: Update contact with the kernel.org address cpufreq: intel_pstate: Fix crash during turbo disable cpufreq: intel_pstate: Fix NULL pointer dereference in update_cpu_qos_request() PM: runtime: Change pm_runtime_put() return type to void pmdomain: imx: gpcv2: Discard pm_runtime_put() return value
2026-02-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc2). Conflicts: tools/testing/selftests/drivers/net/hw/rss_ctx.py 19c3a2a81d2b ("selftests: drv-net: rss: Generate unique ports for RSS context tests") ce5a0f4612db ("selftests: drv-net: rss_ctx: test RSS contexts persist after ifdown/up") include/net/inet_connection_sock.h 858d2a4f67ff6 ("tcp: fix potential race in tcp_v6_syn_recv_sock()") fcd3d039fab69 ("tcp: make tcp_v{4,6}_send_check() static") https://lore.kernel.org/aZ8PSFLzBrEU3I89@sirena.org.uk drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c 69050f8d6d075 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") bf4afc53b77ae ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") 8a96b9144f18a ("net/mlx5e: Alloc xsk channel param out of mlx5e_open_xsk()") Adjacent changes: net/netfilter/ipvs/ip_vs_ctl.c c59bd9e62e06 ("ipvs: use more counters to avoid service lookups") bf4afc53b77a ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26Merge tag 'kmalloc_obj-v7.0-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kmalloc_obj fixes from Kees Cook: - Fix pointer-to-array allocation types for ubd and kcsan - Force size overflow helpers to __always_inline - Bump __builtin_counted_by_ref to Clang 22.1 from 22.0 (Nathan Chancellor) * tag 'kmalloc_obj-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: kcsan: test: Adjust "expect" allocation type for kmalloc_obj overflow: Make sure size helpers are always inlined init/Kconfig: Adjust fixed clang version for __builtin_counted_by_ref ubd: Use pointer-to-pointers for io_thread_req arrays
2026-02-26ACPI: x86/rtc-cmos: Use platform device for driver bindingRafael J. Wysocki
Modify the rtc-cmos driver to bind to a platform device on systems with ACPI via acpi_match_table and advertise the CMOST RTC ACPI device IDs for driver auto-loading. Note that adding the requisite device IDs to it and exposing them via MODULE_DEVICE_TABLE() is sufficient for this purpose. Since the ACPI device IDs in question are the same as for the CMOS RTC ACPI scan handler, put them into a common header file and use the definition from there in both places. Additionally, to prevent a PNP device from being created for the CMOS RTC if a platform one is present already, make is_cmos_rtc_device() check cmos_rtc_platform_device_present introduced previously. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://patch.msgid.link/13969123.uLZWGnKmhe@rafael.j.wysocki
2026-02-26ACPI: x86: cmos_rtc: Create a CMOS RTC platform deviceRafael J. Wysocki
Make the CMOS RTC ACPI scan handler create a platform device that will be used subsequently by rtc-cmos for driver binding on x86 systems with ACPI and update add_rtc_cmos() to skip registering a fallback platform device for the CMOS RTC when the above one has been registered. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # x86 Link: https://patch.msgid.link/1962427.tdWV9SEqCh@rafael.j.wysocki
2026-02-26mm/slub: drop duplicate kernel-doc for ksize()Sanjay Chitroda
The implementation of ksize() was updated with kernel-doc by commit fab0694646d7 ("mm/slab: move [__]ksize and slab_ksize() to mm/slub.c") However, the public header still contains a kernel-doc comment attached to the ksize() prototype. Having documentation both in the header and next to the implementation causes Sphinx to treat the function as being documented twice, resulting in the warning: WARNING: Duplicate C declaration, also defined at core-api/mm-api:521 Declaration is '.. c:function:: size_t ksize(const void *objp)' Kernel-doc guidelines recommend keeping the documentation with the function implementation. Therefore remove the redundant kernel-doc block from include/linux/slab.h so that the implementation in slub.c remains the canonical source for documentation. No functional change. Fixes: fab0694646d7 ("mm/slab: move [__]ksize and slab_ksize() to mm/slub.c") Signed-off-by: Sanjay Chitroda <sanjayembeddedse@gmail.com> Link: https://patch.msgid.link/20260226054712.3610744-1-sanjayembedded@gmail.com Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
2026-02-26mm/slab: mark alloc tags empty for sheaves allocated with __GFP_NO_OBJ_EXTSuren Baghdasaryan
alloc_empty_sheaf() allocates sheaves from SLAB_KMALLOC caches using __GFP_NO_OBJ_EXT to avoid recursion, however it does not mark their allocation tags empty before freeing, which results in a warning when CONFIG_MEM_ALLOC_PROFILING_DEBUG is set. Fix this by marking allocation tags for such sheaves as empty. The problem was technically introduced in commit 4c0a17e28340 but only becomes possible to hit with commit 913ffd3a1bf5. Fixes: 4c0a17e28340 ("slab: prevent recursive kmalloc() in alloc_empty_sheaf()") Fixes: 913ffd3a1bf5 ("slab: handle kmalloc sheaves bootstrap") Reported-by: David Wang <00107082@163.com> Closes: https://lore.kernel.org/all/20260223155128.3849-1-00107082@163.com/ Analyzed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Tested-by: Harry Yoo <harry.yoo@oracle.com> Tested-by: David Wang <00107082@163.com> Link: https://patch.msgid.link/20260225163407.2218712-1-surenb@google.com Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
2026-02-26pppoe: remove kernel-mode relay supportQingfang Deng
The kernel-mode PPPoE relay feature and its two associated ioctls (PPPOEIOCSFWD and PPPOEIOCDFWD) are not used by any existing userspace PPPoE implementations. The most commonly-used package, RP-PPPoE [1], handles the relaying entirely in userspace. This legacy code has remained in the driver since its introduction in kernel 2.3.99-pre7 for over two decades, but has served no practical purpose. Remove the unused relay code. [1] https://dianne.skoll.ca/projects/rp-pppoe/ Signed-off-by: Qingfang Deng <dqfext@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20260224015053.42472-1-dqfext@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-26kthread: consolidate kthread exit paths to prevent use-after-freeChristian Brauner
Guillaume reported crashes via corrupted RCU callback function pointers during KUnit testing. The crash was traced back to the pidfs rhashtable conversion which replaced the 24-byte rb_node with an 8-byte rhash_head in struct pid, shrinking it from 160 to 144 bytes. struct kthread (without CONFIG_BLK_CGROUP) is also 144 bytes. With CONFIG_SLAB_MERGE_DEFAULT and SLAB_HWCACHE_ALIGN both round up to 192 bytes and share the same slab cache. struct pid.rcu.func and struct kthread.affinity_node both sit at offset 0x78. When a kthread exits via make_task_dead() it bypasses kthread_exit() and misses the affinity_node cleanup. free_kthread_struct() frees the memory while the node is still linked into the global kthread_affinity_list. A subsequent list_del() by another kthread writes through dangling list pointers into the freed and reused memory, corrupting the pid's rcu.func pointer. Instead of patching free_kthread_struct() to handle the missed cleanup, consolidate all kthread exit paths. Turn kthread_exit() into a macro that calls do_exit() and add kthread_do_exit() which is called from do_exit() for any task with PF_KTHREAD set. This guarantees that kthread-specific cleanup always happens regardless of the exit path - make_task_dead(), direct do_exit(), or kthread_exit(). Replace __to_kthread() with a new tsk_is_kthread() accessor in the public header. Export do_exit() since module code using the kthread_exit() macro now needs it directly. Reported-by: Guillaume Tucker <gtucker@gtucker.io> Tested-by: Guillaume Tucker <gtucker@gtucker.io> Tested-by: Mark Brown <broonie@kernel.org> Tested-by: David Gow <davidgow@google.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/all/20260224-mittlerweile-besessen-2738831ae7f6@brauner Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: 4d13f4304fa4 ("kthread: Implement preferred affinity") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-25Merge tag 'vfs-7.0-rc2.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix an uninitialized variable in file_getattr(). The flags_valid field wasn't initialized before calling vfs_fileattr_get(), triggering KMSAN uninit-value reports in fuse - Fix writeback wakeup and logging timeouts when DETECT_HUNG_TASK is not enabled. sysctl_hung_task_timeout_secs is 0 in that case causing spurious "waiting for writeback completion for more than 1 seconds" warnings - Fix a null-ptr-deref in do_statmount() when the mount is internal - Add missing kernel-doc description for the @private parameter in iomap_readahead() - Fix mount namespace creation to hold namespace_sem across the mount copy in create_new_namespace(). The previous drop-and-reacquire pattern was fragile and failed to clean up mount propagation links if the real rootfs was a shared or dependent mount - Fix /proc mount iteration where m->index wasn't updated when m->show() overflows, causing a restart to repeatedly show the same mount entry in a rapidly expanding mount table - Return EFSCORRUPTED instead of ENOSPC in minix_new_inode() when the inode number is out of range - Fix unshare(2) when CLONE_NEWNS is set and current->fs isn't shared. copy_mnt_ns() received the live fs_struct so if a subsequent namespace creation failed the rollback would leave pwd and root pointing to detached mounts. Always allocate a new fs_struct when CLONE_NEWNS is requested - fserror bug fixes: - Remove the unused fsnotify_sb_error() helper now that all callers have been converted to fserror_report_metadata - Fix a lockdep splat in fserror_report() where igrab() takes inode::i_lock which can be held in IRQ context. Replace igrab() with a direct i_count bump since filesystems should not report inodes that are about to be freed or not yet exposed - Handle error pointer in procfs for try_lookup_noperm() - Fix an integer overflow in ep_loop_check_proc() where recursive calls returning INT_MAX would overflow when +1 is added, breaking the recursion depth check - Fix a misleading break in pidfs * tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: pidfs: avoid misleading break eventpoll: Fix integer overflow in ep_loop_check_proc() proc: Fix pointer error dereference fserror: fix lockdep complaint when igrabbing inode fsnotify: drop unused helper unshare: fix unshare_fs() handling minix: Correct errno in minix_new_inode namespace: fix proc mount iteration mount: hold namespace_sem across copy in create_new_namespace() iomap: Describe @private in iomap_readahead() statmount: Fix the null-ptr-deref in do_statmount() writeback: Fix wakeup and logging timeouts for !DETECT_HUNG_TASK fs: init flags_valid before calling vfs_fileattr_get
2026-02-25mtd: spinand: Clean the flags sectionMiquel Raynal
Mention that we are declaring the main SPI NAND flags with a comment. Align the values with tabs. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-02-25mtd: Add driver for concatenating devicesAmit Kumar Mahapatra
Introducing CONFIG_MTD_VIRT_CONCAT to separate the legacy flow from the new approach, where only the concatenated partition is registered as an MTD device, while the individual partitions that form it are not registered independently, as they are typically not required by the user. CONFIG_MTD_VIRT_CONCAT is a boolean configuration option that depends on CONFIG_MTD_PARTITIONED_MASTER. When enabled, it allows flash nodes to be exposed as individual MTD devices along with the other partitions. The solution focuses on fixed-partitions description only as it depends on device boundaries. It supports multiple sets of concatenated devices, each comprising two or more partitions. flash@0 { reg = <0>; partitions { compatible = "fixed-partitions"; part0@0 { part-concat-next = <&flash0_part1>; label = "part0_0"; reg = <0x0 0x800000>; }; flash0_part1: part1@800000 { label = "part0_1"; reg = <800000 0x800000>; }; part2@1000000 { part-concat-next = <&flash1_part0>; label = "part0_2"; reg = <0x800000 0x800000>; }; }; }; flash@1 { reg = <1>; partitions { compatible = "fixed-partitions"; flash1_part0: part1@0 { label = "part1_0"; reg = <0x0 0x800000>; }; part1@800000 { label = "part1_1"; reg = <0x800000 0x800000>; }; }; }; The partitions that gets created are flash@0 part0_0-part0_1-concat flash@1 part1_1 part0_2-part1_0-concat Suggested-by: Bernhard Frauendienst <kernel@nospam.obeliks.de> Suggested-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-02-25mtd: Move struct mtd_concat definition to header fileAmit Kumar Mahapatra
To enable a more generic approach for concatenating MTD devices, struct mtd_concat should be accessible beyond the mtdconcat driver. Therefore, the definition is being moved to a header file. Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com> Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-02-25coresight: Fix memory leak in coresight_alloc_device_name()Leo Yan
The memory leak detector reports: echo clear > /sys/kernel/debug/kmemleak modprobe coresight_funnel rmmod coresight_funnel # Scan memory leak and report it echo scan > /sys/kernel/debug/kmemleak cat /sys/kernel/debug/kmemleak unreferenced object 0xffff0008020c7200 (size 64): comm "modprobe", pid 410, jiffies 4295333721 hex dump (first 32 bytes): d8 da fe 7e 09 00 ff ff e8 2e ff 7e 09 00 ff ff ...~.......~.... b0 6c ff 7e 09 00 ff ff 30 83 00 7f 09 00 ff ff .l.~....0....... backtrace (crc 4116a690): kmemleak_alloc+0xd8/0xf8 __kmalloc_node_track_caller_noprof+0x2c8/0x6f0 krealloc_node_align_noprof+0x13c/0x2c8 coresight_alloc_device_name+0xe4/0x158 [coresight] 0xffffd327ecef8394 0xffffd327ecef85ec amba_probe+0x118/0x1c8 really_probe+0xc8/0x3f0 __driver_probe_device+0x88/0x190 driver_probe_device+0x44/0x120 __driver_attach+0x100/0x238 bus_for_each_dev+0x84/0xf0 driver_attach+0x2c/0x40 bus_add_driver+0x128/0x258 driver_register+0x64/0x138 __amba_driver_register+0x2c/0x48 The memory leak is caused by not freeing the device list that maintains device indices. This device list preserves stable device indices across unbind and rebind device operations, so it does not share the same lifetime as a device instances and must only be freed when the module is unloaded. Some modules do not implement a module exit callback because they are registered using module_platform_driver(). As a result, the device list cannot be released during module exit for those modules. Fix this by moving the device list into the core layer. As a general solution, instead of maintaining a static list in each driver, drivers now allocate device lists via coresight_allocate_device_list() and device indices via coresight_allocate_device_idx(). The list is released only when the core module is unloaded by calling coresight_release_device_list(), avoiding the leak. Fixes: 0f5f9b6ba9e1 ("coresight: Use platform agnostic names") Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Leo Yan <leo.yan@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20260209-arm_coresight_refactor_dev_register-v4-1-62d6042f76f7@arm.com
2026-02-25dmaengine: dw-edma: Add virtual IRQ for interrupt-emulation doorbellsKoichiro Den
Interrupt emulation can assert the dw-edma IRQ line without updating the DONE/ABORT bits. With the shared read/write/common IRQ handlers, the driver cannot reliably distinguish such an emulated interrupt from a real one and leaving a level IRQ asserted may wedge the line. Allocate a dedicated, requestable Linux virtual IRQ (db_irq) for interrupt emulation and attach an irq_chip whose .irq_ack runs the core-specific deassert sequence (.ack_emulated_irq()). The physical dw-edma interrupt handlers raise this virtual IRQ via generic_handle_irq(), ensuring emulated IRQs are always deasserted. Export the virtual IRQ number (db_irq) and the doorbell register offset (db_offset) via struct dw_edma_chip so platform users can expose interrupt emulation as a doorbell. Without this, a single interrupt-emulation write can leave the level IRQ line asserted and cause the generic IRQ layer to disable it. Signed-off-by: Koichiro Den <den@valinux.co.jp> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://patch.msgid.link/20260215152216.3393561-3-den@valinux.co.jp Signed-off-by: Vinod Koul <vkoul@kernel.org>
2026-02-25ata: libata-eh: avoid unnecessary calls to ata_scsi_port_error_handler()Damien Le Moal
When handling SCSI command timeouts, if we had no actual command timeouts (either because the command was a deferred qc or the completion path won the race with ata_scsi_cmd_error_handler()), we do not need to go through a port error handling, as there was in fact no errors at all. Modify ata_scsi_cmd_error_handler() to return the number of commands that timed out and use this return value in ata_scsi_error() to call ata_scsi_port_error_handler() only if we had command timeouts, or if the port EH has already been scheduled due to failed commands. Otherwise, simply call scsi_eh_flush_done_q() to finish the completed commands without running the full port error handling. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
2026-02-24net: stmmac: pass interface mode into fix_mac_speed() methodRussell King (Oracle)
Pass the current interface mode reported by phylink into the fix_mac_speed() method. This will be used by qcom-ethqos for its "SGMII" configuration. Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vuSKv-0000000AScG-1zv6@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-24overflow: Make sure size helpers are always inlinedKees Cook
With kmalloc_obj() performing implicit size calculations, the embedded size_mul() calls, while marked inline, were not always being inlined. I noticed a couple places where allocations were making a call out for things that would otherwise be compile-time calculated. Force the compilers to always inline these calculations. Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/20260224232451.work.614-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-24liveupdate: luo_file: remember retrieve() statusPratyush Yadav (Google)
LUO keeps track of successful retrieve attempts on a LUO file. It does so to avoid multiple retrievals of the same file. Multiple retrievals cause problems because once the file is retrieved, the serialized data structures are likely freed and the file is likely in a very different state from what the code expects. The retrieve boolean in struct luo_file keeps track of this, and is passed to the finish callback so it knows what work was already done and what it has left to do. All this works well when retrieve succeeds. When it fails, luo_retrieve_file() returns the error immediately, without ever storing anywhere that a retrieve was attempted or what its error code was. This results in an errored LIVEUPDATE_SESSION_RETRIEVE_FD ioctl to userspace, but nothing prevents it from trying this again. The retry is problematic for much of the same reasons listed above. The file is likely in a very different state than what the retrieve logic normally expects, and it might even have freed some serialization data structures. Attempting to access them or free them again is going to break things. For example, if memfd managed to restore 8 of its 10 folios, but fails on the 9th, a subsequent retrieve attempt will try to call kho_restore_folio() on the first folio again, and that will fail with a warning since it is an invalid operation. Apart from the retry, finish() also breaks. Since on failure the retrieved bool in luo_file is never touched, the finish() call on session close will tell the file handler that retrieve was never attempted, and it will try to access or free the data structures that might not exist, much in the same way as the retry attempt. There is no sane way of attempting the retrieve again. Remember the error retrieve returned and directly return it on a retry. Also pass this status code to finish() so it can make the right decision on the work it needs to do. This is done by changing the bool to an integer. A value of 0 means retrieve was never attempted, a positive value means it succeeded, and a negative value means it failed and the error code is the value. Link: https://lkml.kernel.org/r/20260216132221.987987-1-pratyush@kernel.org Fixes: 7c722a7f44e0 ("liveupdate: luo_file: implement file systems callbacks") Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-24mm: change vma_alloc_folio_noprof() macro to inline functionArnd Bergmann
In a few rare configurations with extra warnings eanbled, the new drm_pagemap_migrate_populate_ram_pfn() calls vma_alloc_folio_noprof() but that does not use all the arguments, leading to a harmless warning: drivers/gpu/drm/drm_pagemap.c: In function 'drm_pagemap_migrate_populate_ram_pfn': drivers/gpu/drm/drm_pagemap.c:701:63: error: parameter 'addr' set but not used [-Werror=unused-but-set-parameter=] 701 | unsigned long addr) | ~~~~~~~~~~~~~~^~~~ Replace the macro with an inline function so the compiler can see how the argument would be used, but is still able to optimize out the assignments. Link: https://lkml.kernel.org/r/20260216121751.2378374-1-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-24sched_ext: Optimize sched_ext_entity layout for cache localityDavid Carlier
Reorder struct sched_ext_entity to place ops_state, ddsp_dsq_id, and ddsp_enq_flags immediately after dsq. These fields are accessed together in the do_enqueue_task() and finish_dispatch() hot paths but were previously spread across three different cache lines. Grouping them on the same cache line reduces cache misses on every enqueue and dispatch operation. Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-24spi: add devm_spi_new_ancillary_device()Antoniu Miclaus
Add a devres-managed version of spi_new_ancillary_device() that automatically unregisters the ancillary SPI device when the parent device is removed. This follows the same devm_add_action_or_reset() pattern used by the other managed SPI functions (devm_spi_optimize_message, devm_spi_register_controller, etc.) and eliminates the need for drivers to open-code their own devm cleanup callbacks for ancillary devices. Acked-by: Nuno Sá <nuno.sa@analog.com> Signed-off-by: Antoniu Miclaus <antoniu.miclaus@analog.com> Link: https://patch.msgid.link/20260223162110.156746-3-antoniu.miclaus@analog.com Signed-off-by: Mark Brown <broonie@kernel.org>
2026-02-24PCI/PTM: Drop pci_enable_ptm() granularity parameterMika Westerberg
No pci_enable_ptm() callers supply the "granularity" pointer where the clock granularity would be returned. Drop the unused pci_enable_ptm() parameter. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> [bhelgaas: commit log] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Link: https://patch.msgid.link/20260224111044.3487873-5-mika.westerberg@linux.intel.com
2026-02-24regmap: define cleanup helper for regmap_fieldSander Vanheule
For temporary field allocation, the user has to perform manual cleanup, or rely on devm_regmap_field_alloc() to (eventually) clean up the allocated resources when an error occurs. Add a cleanup helper that takes care of freeing the allocated regmap_field whenever it goes out of scope. This can simplify this example: struct regmap_field *field = regmap_field_alloc(...); if (IS_ERR(field)) return PTR_ERR(field); int err = regmap_field_read(...); if (err) goto out; /* some logic that may also error */ err = regmap_field_write(...); out: regmap_field_free(field); return err; into the shorter: struct regmap_field *field __free(regmap_field) = regmap_field_alloc(...); if (IS_ERR(field)) return PTR_ERR(field); int err = regmap_field_read(...); if (err) return err; /* some logic that may also error */ return regmap_field_write(...); Signed-off-by: Sander Vanheule <sander@svanheule.net> Link: https://patch.msgid.link/20260220160112.543391-2-sander@svanheule.net Signed-off-by: Mark Brown <broonie@kernel.org>
2026-02-24regmap: sort header includesSander Vanheule
Sort the included headers to make spotting duplicates easier and avoid discussions on where to add new includes. Signed-off-by: Sander Vanheule <sander@svanheule.net> Link: https://patch.msgid.link/20260220160112.543391-1-sander@svanheule.net Signed-off-by: Mark Brown <broonie@kernel.org>
2026-02-24mmc: Merge branch fixes into nextUlf Hansson
Merge the mmc fixes for v7.0-rc[n] into the next branch, to allow them to get tested together with the mmc changes that are targeted for the next release. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2026-02-24jiffies: Remove unused __jiffy_arch_dataPetr Pavlu
The __jiffy_arch_data definition was added in 2017 by commit 60b0a8c3d248 ("frv: declare jiffies to be located in the .data section") for the needs of the frv port. The frv support was removed in 2018 by commit fd8773f9f544 ("arch: remove frv port") and no other architecture has required __jiffy_arch_data. Therefore, remove this unused definition. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260217112638.1525094-1-petr.pavlu@suse.com
2026-02-24ata: libata-scsi: make ata_scsi_simulate() staticDamien Le Moal
ata_scsi_simulate() is called only from libata-scsi.c. Move this function definition as a static function before its call in __ata_scsi_queuecmd() and remove its declaration from include/linux/libata.h. No functional changes. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de>
2026-02-23binfmt_elf_fdpic: fix AUXV size calculation for ELF_HWCAP3 and ELF_HWCAP4Andrei Vagin
Commit 4e6e8c2b757f ("binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4") added support for AT_HWCAP3 and AT_HWCAP4, but it missed updating the AUX vector size calculation in create_elf_fdpic_tables() and AT_VECTOR_SIZE_BASE in include/linux/auxvec.h. Similar to the fix for AT_HWCAP2 in commit c6a09e342f8e ("binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined"), this omission leads to a mismatch between the reserved space and the actual number of AUX entries, eventually triggering a kernel BUG_ON(csp != sp). Fix this by incrementing nitems when ELF_HWCAP3 or ELF_HWCAP4 are defined and updating AT_VECTOR_SIZE_BASE. Cc: Mark Brown <broonie@kernel.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io> Fixes: 4e6e8c2b757f ("binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4") Signed-off-by: Andrei Vagin <avagin@google.com> Link: https://patch.msgid.link/20260217180108.1420024-2-avagin@google.com Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-23Input: export input_default_setkeycodeFabio Baltieri
Export input_default_setkeycode so that a driver can set a custom setkeycode handler to take some driver specific action but still call the default handler at some point. Signed-off-by: Fabio Baltieri <fabiobaltieri@chromium.org> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Link: https://patch.msgid.link/20260222003717.471977-1-dmitry.torokhov@gmail.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2026-02-23Merge branch 'ib-iio-thermal-qcom-pmic5' into togregJonathan Cameron
Immutable branch to allow this base work to be merged into thermal.
2026-02-23iio: adc: Add support for QCOM PMIC5 Gen3 ADCJishnu Prakash
The ADC architecture on PMIC5 Gen3 is similar to that on PMIC5 Gen2, with all SW communication to ADC going through PMK8550 which communicates with other PMICs through PBS. One major difference is that the register interface used here is that of an SDAM (Shared Direct Access Memory) peripheral present on PMK8550. There may be more than one SDAM used for ADC5 Gen3 and each has eight channels, which may be used for either immediate reads (same functionality as previous PMIC5 and PMIC5 Gen2 ADC peripherals) or recurring measurements (same as ADC_TM functionality). By convention, we reserve the first channel of the first SDAM for all immediate reads and use the remaining channels across all SDAMs for ADC_TM monitoring functionality. Add support for PMIC5 Gen3 ADC driver for immediate read functionality. ADC_TM is implemented as an auxiliary thermal driver under this ADC driver. Signed-off-by: Jishnu Prakash <jishnu.prakash@oss.qualcomm.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2026-02-23sched_ext: Fix ops.dequeue() semanticsAndrea Righi
Currently, ops.dequeue() is only invoked when the sched_ext core knows that a task resides in BPF-managed data structures, which causes it to miss scheduling property change events. In addition, ops.dequeue() callbacks are completely skipped when tasks are dispatched to non-local DSQs from ops.select_cpu(). As a result, BPF schedulers cannot reliably track task state. Fix this by guaranteeing that each task entering the BPF scheduler's custody triggers exactly one ops.dequeue() call when it leaves that custody, whether the exit is due to a dispatch (regular or via a core scheduling pick) or to a scheduling property change (e.g. sched_setaffinity(), sched_setscheduler(), set_user_nice(), NUMA balancing, etc.). BPF scheduler custody concept: a task is considered to be in the BPF scheduler's custody when the scheduler is responsible for managing its lifecycle. This includes tasks dispatched to user-created DSQs or stored in the BPF scheduler's internal data structures from ops.enqueue(). Custody ends when the task is dispatched to a terminal DSQ (such as the local DSQ or %SCX_DSQ_GLOBAL), selected by core scheduling, or removed due to a property change. Tasks directly dispatched to terminal DSQs bypass the BPF scheduler entirely and are never in its custody. Terminal DSQs include: - Local DSQs (%SCX_DSQ_LOCAL or %SCX_DSQ_LOCAL_ON): per-CPU queues where tasks go directly to execution. - Global DSQ (%SCX_DSQ_GLOBAL): the built-in fallback queue where the BPF scheduler is considered "done" with the task. As a result, ops.dequeue() is not invoked for tasks directly dispatched to terminal DSQs. To identify dequeues triggered by scheduling property changes, introduce the new ops.dequeue() flag %SCX_DEQ_SCHED_CHANGE: when this flag is set, the dequeue was caused by a scheduling property change. New ops.dequeue() semantics: - ops.dequeue() is invoked exactly once when the task leaves the BPF scheduler's custody, in one of the following cases: a) regular dispatch: a task dispatched to a user DSQ or stored in internal BPF data structures is moved to a terminal DSQ (ops.dequeue() called without any special flags set), b) core scheduling dispatch: core-sched picks task before dispatch (ops.dequeue() called with %SCX_DEQ_CORE_SCHED_EXEC flag set), c) property change: task properties modified before dispatch, (ops.dequeue() called with %SCX_DEQ_SCHED_CHANGE flag set). This allows BPF schedulers to: - reliably track task ownership and lifecycle, - maintain accurate accounting of managed tasks, - update internal state when tasks change properties. Cc: Tejun Heo <tj@kernel.org> Cc: Emil Tsalapatis <emil@etsalapatis.com> Cc: Kuba Piecuch <jpiecuch@google.com> Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-23dma-buf: Add dma_buf_attach_revocable()Leon Romanovsky
Some exporters need a flow to synchronously revoke access to the DMA-buf by importers. Once revoke is completed the importer is not permitted to touch the memory otherwise they may get IOMMU faults, AERs, or worse. DMA-buf today defines a revoke flow, for both pinned and dynamic importers, which is broadly: dma_resv_lock(dmabuf->resv, NULL); // Prevent new mappings from being established priv->revoked = true; // Tell all importers to eventually unmap dma_buf_invalidate_mappings(dmabuf); // Wait for any inprogress fences on the old mapping dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_BOOKKEEP, false, MAX_SCHEDULE_TIMEOUT); dma_resv_unlock(dmabuf->resv, NULL); // Wait for all importers to complete unmap wait_for_completion(&priv->unmapped_comp); This works well, and an importer that continues to access the DMA-buf after unmapping it is very buggy. However, the final wait for unmap is effectively unbounded. Several importers do not support invalidate_mappings() at all and won't unmap until userspace triggers it. This unbounded wait is not suitable for exporters like VFIO and RDMA tha need to issue revoke as part of their normal operations. Add dma_buf_attach_revocable() to allow exporters to determine the difference between importers that can complete the above in bounded time, and those that can't. It can be called inside the exporter's attach op to reject incompatible importers. Document these details about how dma_buf_invalidate_mappings() works and what the required sequence is to achieve a full revocation. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20260131-dmabuf-revoke-v7-6-463d956bd527@nvidia.com
2026-02-23default_gfp(): avoid using the "newfangled" __VA_OPT__ trickLinus Torvalds
The default_gfp() helper that I added is not wrong, but it turns out that it causes unnecessary headaches for 'sparse' which doesn't support the use of __VA_OPT__ (introduced in C++20 and C23, and supported by gcc and clang for a long time). We do already use __VA_OPT__ in some other cases in the kernel (drm/xe and btrfs), but it has been fairly limited. Now it triggers for pretty much everything, and sparse ends up not working at all. We can use the traditional gcc ',##__VA_ARGS__' syntax instead: it may not be the "C standard" way and is slightly less natural in this context, but it is the traditional model for this and avoids the sparse problem. Reported-and-tested-by: Ricardo Ribalda <ribalda@chromium.org> Reported-and-tested-by: Richard Fitzgerald <rf@opensource.cirrus.com> Reported-by: Ben Dooks <ben.dooks@codethink.co.uk> Fixes: e19e1b480ac7 ("add default_gfp() helper macro and use it in the new *alloc_obj() helpers") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-23sched/fair: More complex proportional newidle balancePeter Zijlstra
It turns out that a few workloads (easyWave, fio) have a fairly low success rate on newidle balance, but still benefit greatly from having it anyway. Luckliky these workloads have a faily low newidle rate, so the cost if doing the newidle is relatively low, even if unsuccessfull. Add a simple rate based part to the newidle ratio compute, such that low rate newidle will still have a high newidle ratio. This cures the easyWave and fio workloads while not affecting the schbench numbers either (which have a very high newidle rate). Reported-by: Mario Roy <marioeroy@gmail.com> Reported-by: "Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Mario Roy <marioeroy@gmail.com> Tested-by: "Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com> Link: https://patch.msgid.link/20260127151748.GA1079264@noisy.programming.kicks-ass.net
2026-02-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after 7.0-rc1Alexei Starovoitov
Cross-merge trees after 7.0-rc1. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-23platform/x86: int3472: Handle GPIO type 0x10 (DOVDD)Leif Skunberg
The Lenovo ThinkPad X1 Fold 16 Gen 1 has an OV5675 sensor (ACPI HID OVTI5675) behind an INT3472 discrete PMIC controller. The INT3472 _DSM returns GPIO type 0x10 for one of the pins, which controls the DOVDD (digital I/O power) regulator enable. Type 0x10 is not currently handled by the driver, causing the GPIO to be ignored with a warning. Add INT3472_GPIO_TYPE_DOVDD (0x10) and handle it as a regulator with con_id "dovdd" to match the supply name used by sensor drivers (e.g. ov5675). Also increase GPIO_SUPPLY_NAME_LENGTH from 5 to 6 to accommodate the "dovdd" name (5 chars + null terminator). Signed-off-by: Leif Skunberg <diamondback@cohunt.app> Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com> Link: https://patch.msgid.link/20260210132129.17943-1-diamondback@cohunt.app Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2026-02-23dma-buf: use inline lock for the dma-fence-chainChristian König
Using the inline lock is now the recommended way for dma_fence implementations. So use this approach for the framework's internal fences as well. Also saves about 4 bytes for the external spinlock. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://lore.kernel.org/r/20260219160822.1529-9-christian.koenig@amd.com
2026-02-23dma-buf: use inline lock for the dma-fence-arrayChristian König
Using the inline lock is now the recommended way for dma_fence implementations. So use this approach for the framework's internal fences as well. Also saves about 4 bytes for the external spinlock. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://lore.kernel.org/r/20260219160822.1529-8-christian.koenig@amd.com
2026-02-23dma-buf: inline spinlock for fence protection v5Christian König
Implement per-fence spinlocks, allowing implementations to not give an external spinlock to protect the fence internal state. Instead a spinlock embedded into the fence structure itself is used in this case. Shared spinlocks have the problem that implementations need to guarantee that the lock lives at least as long all fences referencing them. Using a per-fence spinlock allows completely decoupling spinlock producer and consumer life times, simplifying the handling in most use cases. v2: improve naming, coverage and function documentation v3: fix one additional locking in the selftests v4: separate out some changes to make the patch smaller, fix one amdgpu crash found by CI systems v5: improve comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://lore.kernel.org/r/20260219160822.1529-5-christian.koenig@amd.com
2026-02-23dma-buf: abstract fence locking v2Christian König
Add dma_fence_lock_irqsafe() and dma_fence_unlock_irqrestore() wrappers and mechanically apply them everywhere. Just a pre-requisite cleanup for a follow up patch. v2: add some missing i915 bits, add abstraction for lockdep assertion as well v3: one more suggestion by Tvrtko Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Link: https://lore.kernel.org/r/20260219160822.1529-4-christian.koenig@amd.com
2026-02-23dma-buf: detach fence ops on signal v3Christian König
When neither a release nor a wait backend ops is specified it is possible to let the dma_fence live on independently of the module who issued it. This makes it possible to unload drivers and only wait for all their fences to signal. v2: fix typo in comment v3: fix sparse rcu warnings Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Philipp Stanner <phasta@kernel.org> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://lore.kernel.org/r/20260219160822.1529-3-christian.koenig@amd.com
2026-02-23dma-buf: protected fence ops by RCU v8Christian König
The fence ops of a dma_fence currently need to life as long as the dma_fence is alive. This means that the module which originally issued a dma_fence can't unload unless all fences are freed up. As first step to solve this issue protect the fence ops by RCU. While it is counter intuitive to protect a constant function pointer table by RCU it allows modules to wait for an RCU grace period before they unload, to make sure that nobody is executing their functions any more. This patch has not much functional change, but only adds the RCU handling for the static checker to test. v2: make one the now duplicated lockdep warnings a comment instead. v3: Add more documentation to ->wait and ->release callback. v4: fix typo in documentation v5: rebased on drm-tip v6: improve code comments v7: improve commit message and code comments v8: fix sparse rcu warnings Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://lore.kernel.org/r/20260219160822.1529-2-christian.koenig@amd.com
2026-02-23PM: runtime: Change pm_runtime_put() return type to voidRafael J. Wysocki
The primary role of pm_runtime_put() is to decrement the runtime PM usage counter of the given device. It always does that regardless of the value returned by it later. In addition, if the runtime PM usage counter after decrementation turns out to be zero, a work item is queued up to check whether or not the device can be suspended. This is not guaranteed to succeed though and even if it is successful, the device may still not be suspended going forward. There are multiple valid reasons why pm_runtime_put() may not decide to queue up the work item mentioned above, including, but not limited to, the case when user space has written "on" to the device's runtime PM "control" file in sysfs. In all of those cases, pm_runtime_put() returns a negative error code (even though the device's runtime PM usage counter has been successfully decremented by it) which is very confusing. In fact, its return value should only be used for debug purposes and care should be taken when doing it even in that case. Accordingly, to avoid the confusion mentioned above, change the return type of pm_runtime_put() to void. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Brian Norris <briannorris@chromium.org> Link: https://patch.msgid.link/14387202.RDIVbhacDa@rafael.j.wysocki
2026-02-23mmc: core: Avoid bitfield RMW for claim/retune flagsPenghe Geng
Move claimed and retune control flags out of the bitfield word to avoid unrelated RMW side effects in asynchronous contexts. The host->claimed bit shared a word with retune flags. Writes to claimed in __mmc_claim_host() or retune_now in mmc_mq_queue_rq() can overwrite other bits when concurrent updates happen in other contexts, triggering spurious WARN_ON(!host->claimed). Convert claimed, can_retune, retune_now and retune_paused to bool to remove shared-word coupling. Fixes: 6c0cedd1ef952 ("mmc: core: Introduce host claiming by context") Fixes: 1e8e55b67030c ("mmc: block: Add CQE support") Cc: stable@vger.kernel.org Suggested-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Penghe Geng <pgeng@nvidia.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2026-02-23shmem: adapt to rhashtable-based simple_xattrs with lazy allocationChristian Brauner
Adapt tmpfs/shmem to use the rhashtable-based xattr path and switch from an embedded struct to pointer-based lazy allocation. Change shmem_inode_info.xattrs from embedded 'struct simple_xattrs' to a pointer 'struct simple_xattrs *', initialized to NULL. This avoids the rhashtable overhead for every tmpfs inode, which helps when a lot of inodes exist. The xattr store is allocated on first use: - shmem_initxattrs(): Allocates via simple_xattrs_alloc() when security modules set initial xattrs during inode creation. - shmem_xattr_handler_set(): Allocates on first setxattr, with a short-circuit for removal when no xattrs are stored yet. All read paths (shmem_xattr_handler_get, shmem_listxattr) check for NULL xattrs pointer and return -ENODATA or 0 respectively. Replaced xattr entries are freed via simple_xattr_free_rcu() to allow concurrent RCU readers to finish. shmem_evict_inode() conditionally frees the xattr store only when allocated. Also change simple_xattr_add() from void to int to propagate rhashtable insertion failures. shmem_initxattrs() is the only caller. Link: https://patch.msgid.link/20260216-work-xattr-socket-v1-3-c2efa4f74cb7@kernel.org Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-23xattr: add rhashtable-based simple_xattr infrastructureChristian Brauner
Add rhashtable support to the simple_xattr subsystem while keeping the existing rbtree code fully functional. This allows consumers to be migrated one at a time without breaking any intermediate build. struct simple_xattrs gains a dispatch flag and a union holding either the rbtree (rb_root + rwlock) or rhashtable state: struct simple_xattrs { bool use_rhashtable; union { struct { struct rb_root rb_root; rwlock_t lock; }; struct rhashtable ht; }; }; simple_xattrs_init() continues to set up the rbtree path for existing embedded-struct callers. Add simple_xattrs_alloc() which dynamically allocates a simple_xattrs and initializes the rhashtable path. This is the entry point for consumers switching to pointer-based lazy allocation. The five core functions (get, set, list, add, free) dispatch based on the use_rhashtable flag. Existing callers continue to use the rbtree path unchanged. As each consumer is converted it will switch to simple_xattrs_alloc() and the rhashtable path. Once all consumers are converted a follow-up patch will remove the rbtree code. Link: https://patch.msgid.link/20260216-work-xattr-socket-v1-2-c2efa4f74cb7@kernel.org Acked-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-23xattr: add rcu_head and rhash_head to struct simple_xattrChristian Brauner
In preparation for converting simple_xattrs from rbtree to rhashtable, add rhash_head and rcu_head members to struct simple_xattr. The rhashtable implementation will use rhash_head for hash table linkage and RCU-based lockless reads, requiring that replaced or removed xattr entries be freed via call_rcu() rather than immediately. Add simple_xattr_free_rcu() which schedules RCU-deferred freeing of an xattr entry. This will be used by callers of simple_xattr_set() once they switch to the rhashtable-based xattr store. No functional changes. Link: https://patch.msgid.link/20260216-work-xattr-socket-v1-1-c2efa4f74cb7@kernel.org Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>