| Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"12 hotfixes. 7 are cc:stable. 8 are for MM.
All are singletons - please see the changelogs for details"
* tag 'mm-hotfixes-stable-2026-02-26-14-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
MAINTAINERS: update Yosry Ahmed's email address
mailmap: add entry for Daniele Alessandrelli
mm: fix NULL NODE_DATA dereference for memoryless nodes on boot
mm/tracing: rss_stat: ensure curr is false from kthread context
mm/kfence: fix KASAN hardware tag faults during late enablement
mm/damon/core: disallow non-power of two min_region_sz
Squashfs: check metadata block offset is within range
MAINTAINERS, mailmap: update e-mail address for Vlastimil Babka
liveupdate: luo_file: remember retrieve() status
mm: thp: deny THP for files on anonymous inodes
mm: change vma_alloc_folio_noprof() macro to inline function
mm/kfence: disable KFENCE upon KASAN HW tags enablement
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two intel_pstate driver issues causing it to crash on sysfs
attribute accesses when some CPUs in the system are offline, finalize
changes related to turning pm_runtime_put() into a void function, and
update Daniel Lezcano's contact information:
- Fix two issues in the intel_pstate driver causing it to crash when
its sysfs interface is used on a system with some offline CPUs
(David Arcari, Srinivas Pandruvada)
- Update the last user of the pm_runtime_put() return value to
discard it and turn pm_runtime_put() into a void function (Rafael
Wysocki)
- Update Daniel Lezcano's contact information in MAINTAINERS and
.mailmap (Daniel Lezcano)"
* tag 'pm-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
MAINTAINERS: Update contact with the kernel.org address
cpufreq: intel_pstate: Fix crash during turbo disable
cpufreq: intel_pstate: Fix NULL pointer dereference in update_cpu_qos_request()
PM: runtime: Change pm_runtime_put() return type to void
pmdomain: imx: gpcv2: Discard pm_runtime_put() return value
|
|
Cross-merge networking fixes after downstream PR (net-7.0-rc2).
Conflicts:
tools/testing/selftests/drivers/net/hw/rss_ctx.py
19c3a2a81d2b ("selftests: drv-net: rss: Generate unique ports for RSS context tests")
ce5a0f4612db ("selftests: drv-net: rss_ctx: test RSS contexts persist after ifdown/up")
include/net/inet_connection_sock.h
858d2a4f67ff6 ("tcp: fix potential race in tcp_v6_syn_recv_sock()")
fcd3d039fab69 ("tcp: make tcp_v{4,6}_send_check() static")
https://lore.kernel.org/aZ8PSFLzBrEU3I89@sirena.org.uk
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c
69050f8d6d075 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types")
bf4afc53b77ae ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument")
8a96b9144f18a ("net/mlx5e: Alloc xsk channel param out of mlx5e_open_xsk()")
Adjacent changes:
net/netfilter/ipvs/ip_vs_ctl.c
c59bd9e62e06 ("ipvs: use more counters to avoid service lookups")
bf4afc53b77a ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kmalloc_obj fixes from Kees Cook:
- Fix pointer-to-array allocation types for ubd and kcsan
- Force size overflow helpers to __always_inline
- Bump __builtin_counted_by_ref to Clang 22.1 from 22.0 (Nathan Chancellor)
* tag 'kmalloc_obj-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kcsan: test: Adjust "expect" allocation type for kmalloc_obj
overflow: Make sure size helpers are always inlined
init/Kconfig: Adjust fixed clang version for __builtin_counted_by_ref
ubd: Use pointer-to-pointers for io_thread_req arrays
|
|
Modify the rtc-cmos driver to bind to a platform device on systems with
ACPI via acpi_match_table and advertise the CMOST RTC ACPI device IDs
for driver auto-loading. Note that adding the requisite device IDs to
it and exposing them via MODULE_DEVICE_TABLE() is sufficient for this
purpose.
Since the ACPI device IDs in question are the same as for the CMOS RTC
ACPI scan handler, put them into a common header file and use the
definition from there in both places.
Additionally, to prevent a PNP device from being created for the CMOS
RTC if a platform one is present already, make is_cmos_rtc_device()
check cmos_rtc_platform_device_present introduced previously.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Link: https://patch.msgid.link/13969123.uLZWGnKmhe@rafael.j.wysocki
|
|
Make the CMOS RTC ACPI scan handler create a platform device that will
be used subsequently by rtc-cmos for driver binding on x86 systems with
ACPI and update add_rtc_cmos() to skip registering a fallback platform
device for the CMOS RTC when the above one has been registered.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com> # x86
Link: https://patch.msgid.link/1962427.tdWV9SEqCh@rafael.j.wysocki
|
|
The implementation of ksize() was updated with kernel-doc by commit
fab0694646d7 ("mm/slab: move [__]ksize and slab_ksize() to mm/slub.c")
However, the public header still contains a kernel-doc comment
attached to the ksize() prototype.
Having documentation both in the header and next to the implementation
causes Sphinx to treat the function as being documented twice,
resulting in the warning:
WARNING: Duplicate C declaration, also defined at core-api/mm-api:521
Declaration is '.. c:function:: size_t ksize(const void *objp)'
Kernel-doc guidelines recommend keeping the documentation with the
function implementation. Therefore remove the redundant kernel-doc
block from include/linux/slab.h so that the implementation in slub.c
remains the canonical source for documentation.
No functional change.
Fixes: fab0694646d7 ("mm/slab: move [__]ksize and slab_ksize() to mm/slub.c")
Signed-off-by: Sanjay Chitroda <sanjayembeddedse@gmail.com>
Link: https://patch.msgid.link/20260226054712.3610744-1-sanjayembedded@gmail.com
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
|
|
alloc_empty_sheaf() allocates sheaves from SLAB_KMALLOC caches using
__GFP_NO_OBJ_EXT to avoid recursion, however it does not mark their
allocation tags empty before freeing, which results in a warning when
CONFIG_MEM_ALLOC_PROFILING_DEBUG is set. Fix this by marking allocation
tags for such sheaves as empty.
The problem was technically introduced in commit 4c0a17e28340 but only
becomes possible to hit with commit 913ffd3a1bf5.
Fixes: 4c0a17e28340 ("slab: prevent recursive kmalloc() in alloc_empty_sheaf()")
Fixes: 913ffd3a1bf5 ("slab: handle kmalloc sheaves bootstrap")
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/all/20260223155128.3849-1-00107082@163.com/
Analyzed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Tested-by: Harry Yoo <harry.yoo@oracle.com>
Tested-by: David Wang <00107082@163.com>
Link: https://patch.msgid.link/20260225163407.2218712-1-surenb@google.com
Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
|
|
The kernel-mode PPPoE relay feature and its two associated ioctls
(PPPOEIOCSFWD and PPPOEIOCDFWD) are not used by any existing userspace
PPPoE implementations. The most commonly-used package, RP-PPPoE [1],
handles the relaying entirely in userspace.
This legacy code has remained in the driver since its introduction in
kernel 2.3.99-pre7 for over two decades, but has served no practical
purpose.
Remove the unused relay code.
[1] https://dianne.skoll.ca/projects/rp-pppoe/
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Link: https://patch.msgid.link/20260224015053.42472-1-dqfext@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Guillaume reported crashes via corrupted RCU callback function pointers
during KUnit testing. The crash was traced back to the pidfs rhashtable
conversion which replaced the 24-byte rb_node with an 8-byte rhash_head
in struct pid, shrinking it from 160 to 144 bytes.
struct kthread (without CONFIG_BLK_CGROUP) is also 144 bytes. With
CONFIG_SLAB_MERGE_DEFAULT and SLAB_HWCACHE_ALIGN both round up to
192 bytes and share the same slab cache. struct pid.rcu.func and
struct kthread.affinity_node both sit at offset 0x78.
When a kthread exits via make_task_dead() it bypasses kthread_exit() and
misses the affinity_node cleanup. free_kthread_struct() frees the memory
while the node is still linked into the global kthread_affinity_list. A
subsequent list_del() by another kthread writes through dangling list
pointers into the freed and reused memory, corrupting the pid's
rcu.func pointer.
Instead of patching free_kthread_struct() to handle the missed cleanup,
consolidate all kthread exit paths. Turn kthread_exit() into a macro
that calls do_exit() and add kthread_do_exit() which is called from
do_exit() for any task with PF_KTHREAD set. This guarantees that
kthread-specific cleanup always happens regardless of the exit path -
make_task_dead(), direct do_exit(), or kthread_exit().
Replace __to_kthread() with a new tsk_is_kthread() accessor in the
public header. Export do_exit() since module code using the
kthread_exit() macro now needs it directly.
Reported-by: Guillaume Tucker <gtucker@gtucker.io>
Tested-by: Guillaume Tucker <gtucker@gtucker.io>
Tested-by: Mark Brown <broonie@kernel.org>
Tested-by: David Gow <davidgow@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/all/20260224-mittlerweile-besessen-2738831ae7f6@brauner
Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 4d13f4304fa4 ("kthread: Implement preferred affinity")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
- Fix an uninitialized variable in file_getattr().
The flags_valid field wasn't initialized before calling
vfs_fileattr_get(), triggering KMSAN uninit-value reports in fuse
- Fix writeback wakeup and logging timeouts when DETECT_HUNG_TASK is
not enabled.
sysctl_hung_task_timeout_secs is 0 in that case causing spurious
"waiting for writeback completion for more than 1 seconds" warnings
- Fix a null-ptr-deref in do_statmount() when the mount is internal
- Add missing kernel-doc description for the @private parameter in
iomap_readahead()
- Fix mount namespace creation to hold namespace_sem across the mount
copy in create_new_namespace().
The previous drop-and-reacquire pattern was fragile and failed to
clean up mount propagation links if the real rootfs was a shared or
dependent mount
- Fix /proc mount iteration where m->index wasn't updated when
m->show() overflows, causing a restart to repeatedly show the same
mount entry in a rapidly expanding mount table
- Return EFSCORRUPTED instead of ENOSPC in minix_new_inode() when the
inode number is out of range
- Fix unshare(2) when CLONE_NEWNS is set and current->fs isn't shared.
copy_mnt_ns() received the live fs_struct so if a subsequent
namespace creation failed the rollback would leave pwd and root
pointing to detached mounts. Always allocate a new fs_struct when
CLONE_NEWNS is requested
- fserror bug fixes:
- Remove the unused fsnotify_sb_error() helper now that all callers
have been converted to fserror_report_metadata
- Fix a lockdep splat in fserror_report() where igrab() takes
inode::i_lock which can be held in IRQ context.
Replace igrab() with a direct i_count bump since filesystems
should not report inodes that are about to be freed or not yet
exposed
- Handle error pointer in procfs for try_lookup_noperm()
- Fix an integer overflow in ep_loop_check_proc() where recursive calls
returning INT_MAX would overflow when +1 is added, breaking the
recursion depth check
- Fix a misleading break in pidfs
* tag 'vfs-7.0-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
pidfs: avoid misleading break
eventpoll: Fix integer overflow in ep_loop_check_proc()
proc: Fix pointer error dereference
fserror: fix lockdep complaint when igrabbing inode
fsnotify: drop unused helper
unshare: fix unshare_fs() handling
minix: Correct errno in minix_new_inode
namespace: fix proc mount iteration
mount: hold namespace_sem across copy in create_new_namespace()
iomap: Describe @private in iomap_readahead()
statmount: Fix the null-ptr-deref in do_statmount()
writeback: Fix wakeup and logging timeouts for !DETECT_HUNG_TASK
fs: init flags_valid before calling vfs_fileattr_get
|
|
Mention that we are declaring the main SPI NAND flags with a comment.
Align the values with tabs.
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
Introducing CONFIG_MTD_VIRT_CONCAT to separate the legacy flow from the new
approach, where only the concatenated partition is registered as an MTD
device, while the individual partitions that form it are not registered
independently, as they are typically not required by the user.
CONFIG_MTD_VIRT_CONCAT is a boolean configuration option that depends on
CONFIG_MTD_PARTITIONED_MASTER. When enabled, it allows flash nodes to be
exposed as individual MTD devices along with the other partitions.
The solution focuses on fixed-partitions description only as it depends on
device boundaries. It supports multiple sets of concatenated devices, each
comprising two or more partitions.
flash@0 {
reg = <0>;
partitions {
compatible = "fixed-partitions";
part0@0 {
part-concat-next = <&flash0_part1>;
label = "part0_0";
reg = <0x0 0x800000>;
};
flash0_part1: part1@800000 {
label = "part0_1";
reg = <800000 0x800000>;
};
part2@1000000 {
part-concat-next = <&flash1_part0>;
label = "part0_2";
reg = <0x800000 0x800000>;
};
};
};
flash@1 {
reg = <1>;
partitions {
compatible = "fixed-partitions";
flash1_part0: part1@0 {
label = "part1_0";
reg = <0x0 0x800000>;
};
part1@800000 {
label = "part1_1";
reg = <0x800000 0x800000>;
};
};
};
The partitions that gets created are
flash@0
part0_0-part0_1-concat
flash@1
part1_1
part0_2-part1_0-concat
Suggested-by: Bernhard Frauendienst <kernel@nospam.obeliks.de>
Suggested-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com>
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
To enable a more generic approach for concatenating MTD devices,
struct mtd_concat should be accessible beyond the mtdconcat driver.
Therefore, the definition is being moved to a header file.
Signed-off-by: Amit Kumar Mahapatra <amit.kumar-mahapatra@amd.com>
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
The memory leak detector reports:
echo clear > /sys/kernel/debug/kmemleak
modprobe coresight_funnel
rmmod coresight_funnel
# Scan memory leak and report it
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff0008020c7200 (size 64):
comm "modprobe", pid 410, jiffies 4295333721
hex dump (first 32 bytes):
d8 da fe 7e 09 00 ff ff e8 2e ff 7e 09 00 ff ff ...~.......~....
b0 6c ff 7e 09 00 ff ff 30 83 00 7f 09 00 ff ff .l.~....0.......
backtrace (crc 4116a690):
kmemleak_alloc+0xd8/0xf8
__kmalloc_node_track_caller_noprof+0x2c8/0x6f0
krealloc_node_align_noprof+0x13c/0x2c8
coresight_alloc_device_name+0xe4/0x158 [coresight]
0xffffd327ecef8394
0xffffd327ecef85ec
amba_probe+0x118/0x1c8
really_probe+0xc8/0x3f0
__driver_probe_device+0x88/0x190
driver_probe_device+0x44/0x120
__driver_attach+0x100/0x238
bus_for_each_dev+0x84/0xf0
driver_attach+0x2c/0x40
bus_add_driver+0x128/0x258
driver_register+0x64/0x138
__amba_driver_register+0x2c/0x48
The memory leak is caused by not freeing the device list that maintains
device indices.
This device list preserves stable device indices across unbind and
rebind device operations, so it does not share the same lifetime as a
device instances and must only be freed when the module is unloaded.
Some modules do not implement a module exit callback because they are
registered using module_platform_driver(). As a result, the device
list cannot be released during module exit for those modules.
Fix this by moving the device list into the core layer. As a general
solution, instead of maintaining a static list in each driver, drivers
now allocate device lists via coresight_allocate_device_list() and
device indices via coresight_allocate_device_idx().
The list is released only when the core module is unloaded by calling
coresight_release_device_list(), avoiding the leak.
Fixes: 0f5f9b6ba9e1 ("coresight: Use platform agnostic names")
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20260209-arm_coresight_refactor_dev_register-v4-1-62d6042f76f7@arm.com
|
|
Interrupt emulation can assert the dw-edma IRQ line without updating the
DONE/ABORT bits. With the shared read/write/common IRQ handlers, the
driver cannot reliably distinguish such an emulated interrupt from a
real one and leaving a level IRQ asserted may wedge the line.
Allocate a dedicated, requestable Linux virtual IRQ (db_irq) for
interrupt emulation and attach an irq_chip whose .irq_ack runs the
core-specific deassert sequence (.ack_emulated_irq()). The physical
dw-edma interrupt handlers raise this virtual IRQ via
generic_handle_irq(), ensuring emulated IRQs are always deasserted.
Export the virtual IRQ number (db_irq) and the doorbell register offset
(db_offset) via struct dw_edma_chip so platform users can expose
interrupt emulation as a doorbell.
Without this, a single interrupt-emulation write can leave the level IRQ
line asserted and cause the generic IRQ layer to disable it.
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260215152216.3393561-3-den@valinux.co.jp
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
When handling SCSI command timeouts, if we had no actual command
timeouts (either because the command was a deferred qc or the completion
path won the race with ata_scsi_cmd_error_handler()), we do not need to
go through a port error handling, as there was in fact no errors at all.
Modify ata_scsi_cmd_error_handler() to return the number of commands
that timed out and use this return value in ata_scsi_error() to call
ata_scsi_port_error_handler() only if we had command timeouts, or if
the port EH has already been scheduled due to failed commands.
Otherwise, simply call scsi_eh_flush_done_q() to finish the completed
commands without running the full port error handling.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
Pass the current interface mode reported by phylink into the
fix_mac_speed() method. This will be used by qcom-ethqos for its
"SGMII" configuration.
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vuSKv-0000000AScG-1zv6@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With kmalloc_obj() performing implicit size calculations, the embedded
size_mul() calls, while marked inline, were not always being inlined.
I noticed a couple places where allocations were making a call out for
things that would otherwise be compile-time calculated. Force the
compilers to always inline these calculations.
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://patch.msgid.link/20260224232451.work.614-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
LUO keeps track of successful retrieve attempts on a LUO file. It does so
to avoid multiple retrievals of the same file. Multiple retrievals cause
problems because once the file is retrieved, the serialized data
structures are likely freed and the file is likely in a very different
state from what the code expects.
The retrieve boolean in struct luo_file keeps track of this, and is passed
to the finish callback so it knows what work was already done and what it
has left to do.
All this works well when retrieve succeeds. When it fails,
luo_retrieve_file() returns the error immediately, without ever storing
anywhere that a retrieve was attempted or what its error code was. This
results in an errored LIVEUPDATE_SESSION_RETRIEVE_FD ioctl to userspace,
but nothing prevents it from trying this again.
The retry is problematic for much of the same reasons listed above. The
file is likely in a very different state than what the retrieve logic
normally expects, and it might even have freed some serialization data
structures. Attempting to access them or free them again is going to
break things.
For example, if memfd managed to restore 8 of its 10 folios, but fails on
the 9th, a subsequent retrieve attempt will try to call
kho_restore_folio() on the first folio again, and that will fail with a
warning since it is an invalid operation.
Apart from the retry, finish() also breaks. Since on failure the
retrieved bool in luo_file is never touched, the finish() call on session
close will tell the file handler that retrieve was never attempted, and it
will try to access or free the data structures that might not exist, much
in the same way as the retry attempt.
There is no sane way of attempting the retrieve again. Remember the error
retrieve returned and directly return it on a retry. Also pass this
status code to finish() so it can make the right decision on the work it
needs to do.
This is done by changing the bool to an integer. A value of 0 means
retrieve was never attempted, a positive value means it succeeded, and a
negative value means it failed and the error code is the value.
Link: https://lkml.kernel.org/r/20260216132221.987987-1-pratyush@kernel.org
Fixes: 7c722a7f44e0 ("liveupdate: luo_file: implement file systems callbacks")
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In a few rare configurations with extra warnings eanbled, the new
drm_pagemap_migrate_populate_ram_pfn() calls vma_alloc_folio_noprof() but
that does not use all the arguments, leading to a harmless warning:
drivers/gpu/drm/drm_pagemap.c: In function 'drm_pagemap_migrate_populate_ram_pfn':
drivers/gpu/drm/drm_pagemap.c:701:63: error: parameter 'addr' set but not used [-Werror=unused-but-set-parameter=]
701 | unsigned long addr)
| ~~~~~~~~~~~~~~^~~~
Replace the macro with an inline function so the compiler can see how the
argument would be used, but is still able to optimize out the assignments.
Link: https://lkml.kernel.org/r/20260216121751.2378374-1-arnd@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Jackman <jackmanb@google.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Reorder struct sched_ext_entity to place ops_state, ddsp_dsq_id, and
ddsp_enq_flags immediately after dsq. These fields are accessed together
in the do_enqueue_task() and finish_dispatch() hot paths but were
previously spread across three different cache lines. Grouping them on
the same cache line reduces cache misses on every enqueue and dispatch
operation.
Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Add a devres-managed version of spi_new_ancillary_device() that
automatically unregisters the ancillary SPI device when the parent
device is removed.
This follows the same devm_add_action_or_reset() pattern used by the
other managed SPI functions (devm_spi_optimize_message,
devm_spi_register_controller, etc.) and eliminates the need for drivers
to open-code their own devm cleanup callbacks for ancillary devices.
Acked-by: Nuno Sá <nuno.sa@analog.com>
Signed-off-by: Antoniu Miclaus <antoniu.miclaus@analog.com>
Link: https://patch.msgid.link/20260223162110.156746-3-antoniu.miclaus@analog.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
No pci_enable_ptm() callers supply the "granularity" pointer where the
clock granularity would be returned.
Drop the unused pci_enable_ptm() parameter.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Link: https://patch.msgid.link/20260224111044.3487873-5-mika.westerberg@linux.intel.com
|
|
For temporary field allocation, the user has to perform manual cleanup,
or rely on devm_regmap_field_alloc() to (eventually) clean up the
allocated resources when an error occurs.
Add a cleanup helper that takes care of freeing the allocated
regmap_field whenever it goes out of scope.
This can simplify this example:
struct regmap_field *field = regmap_field_alloc(...);
if (IS_ERR(field))
return PTR_ERR(field);
int err = regmap_field_read(...);
if (err)
goto out;
/* some logic that may also error */
err = regmap_field_write(...);
out:
regmap_field_free(field);
return err;
into the shorter:
struct regmap_field *field __free(regmap_field) = regmap_field_alloc(...);
if (IS_ERR(field))
return PTR_ERR(field);
int err = regmap_field_read(...);
if (err)
return err;
/* some logic that may also error */
return regmap_field_write(...);
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Link: https://patch.msgid.link/20260220160112.543391-2-sander@svanheule.net
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Sort the included headers to make spotting duplicates easier and avoid
discussions on where to add new includes.
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Link: https://patch.msgid.link/20260220160112.543391-1-sander@svanheule.net
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Merge the mmc fixes for v7.0-rc[n] into the next branch, to allow them to
get tested together with the mmc changes that are targeted for the next
release.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
The __jiffy_arch_data definition was added in 2017 by commit 60b0a8c3d248
("frv: declare jiffies to be located in the .data section") for the needs
of the frv port. The frv support was removed in 2018 by commit fd8773f9f544
("arch: remove frv port") and no other architecture has required
__jiffy_arch_data. Therefore, remove this unused definition.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260217112638.1525094-1-petr.pavlu@suse.com
|
|
ata_scsi_simulate() is called only from libata-scsi.c. Move this
function definition as a static function before its call in
__ata_scsi_queuecmd() and remove its declaration from
include/linux/libata.h.
No functional changes.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
|
|
Commit 4e6e8c2b757f ("binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4") added
support for AT_HWCAP3 and AT_HWCAP4, but it missed updating the AUX
vector size calculation in create_elf_fdpic_tables() and
AT_VECTOR_SIZE_BASE in include/linux/auxvec.h.
Similar to the fix for AT_HWCAP2 in commit c6a09e342f8e ("binfmt_elf_fdpic:
fix AUXV size calculation when ELF_HWCAP2 is defined"), this omission
leads to a mismatch between the reserved space and the actual number of
AUX entries, eventually triggering a kernel BUG_ON(csp != sp).
Fix this by incrementing nitems when ELF_HWCAP3 or ELF_HWCAP4 are
defined and updating AT_VECTOR_SIZE_BASE.
Cc: Mark Brown <broonie@kernel.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
Fixes: 4e6e8c2b757f ("binfmt_elf: Wire up AT_HWCAP3 at AT_HWCAP4")
Signed-off-by: Andrei Vagin <avagin@google.com>
Link: https://patch.msgid.link/20260217180108.1420024-2-avagin@google.com
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Export input_default_setkeycode so that a driver can set a custom
setkeycode handler to take some driver specific action but still call
the default handler at some point.
Signed-off-by: Fabio Baltieri <fabiobaltieri@chromium.org>
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
Link: https://patch.msgid.link/20260222003717.471977-1-dmitry.torokhov@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Immutable branch to allow this base work to be merged into thermal.
|
|
The ADC architecture on PMIC5 Gen3 is similar to that on PMIC5 Gen2,
with all SW communication to ADC going through PMK8550 which
communicates with other PMICs through PBS.
One major difference is that the register interface used here is that
of an SDAM (Shared Direct Access Memory) peripheral present on PMK8550.
There may be more than one SDAM used for ADC5 Gen3 and each has eight
channels, which may be used for either immediate reads (same functionality
as previous PMIC5 and PMIC5 Gen2 ADC peripherals) or recurring measurements
(same as ADC_TM functionality).
By convention, we reserve the first channel of the first SDAM for all
immediate reads and use the remaining channels across all SDAMs for
ADC_TM monitoring functionality.
Add support for PMIC5 Gen3 ADC driver for immediate read functionality.
ADC_TM is implemented as an auxiliary thermal driver under this ADC
driver.
Signed-off-by: Jishnu Prakash <jishnu.prakash@oss.qualcomm.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
Currently, ops.dequeue() is only invoked when the sched_ext core knows
that a task resides in BPF-managed data structures, which causes it to
miss scheduling property change events. In addition, ops.dequeue()
callbacks are completely skipped when tasks are dispatched to non-local
DSQs from ops.select_cpu(). As a result, BPF schedulers cannot reliably
track task state.
Fix this by guaranteeing that each task entering the BPF scheduler's
custody triggers exactly one ops.dequeue() call when it leaves that
custody, whether the exit is due to a dispatch (regular or via a core
scheduling pick) or to a scheduling property change (e.g.
sched_setaffinity(), sched_setscheduler(), set_user_nice(), NUMA
balancing, etc.).
BPF scheduler custody concept: a task is considered to be in the BPF
scheduler's custody when the scheduler is responsible for managing its
lifecycle. This includes tasks dispatched to user-created DSQs or stored
in the BPF scheduler's internal data structures from ops.enqueue().
Custody ends when the task is dispatched to a terminal DSQ (such as the
local DSQ or %SCX_DSQ_GLOBAL), selected by core scheduling, or removed
due to a property change.
Tasks directly dispatched to terminal DSQs bypass the BPF scheduler
entirely and are never in its custody. Terminal DSQs include:
- Local DSQs (%SCX_DSQ_LOCAL or %SCX_DSQ_LOCAL_ON): per-CPU queues
where tasks go directly to execution.
- Global DSQ (%SCX_DSQ_GLOBAL): the built-in fallback queue where the
BPF scheduler is considered "done" with the task.
As a result, ops.dequeue() is not invoked for tasks directly dispatched
to terminal DSQs.
To identify dequeues triggered by scheduling property changes, introduce
the new ops.dequeue() flag %SCX_DEQ_SCHED_CHANGE: when this flag is set,
the dequeue was caused by a scheduling property change.
New ops.dequeue() semantics:
- ops.dequeue() is invoked exactly once when the task leaves the BPF
scheduler's custody, in one of the following cases:
a) regular dispatch: a task dispatched to a user DSQ or stored in
internal BPF data structures is moved to a terminal DSQ
(ops.dequeue() called without any special flags set),
b) core scheduling dispatch: core-sched picks task before dispatch
(ops.dequeue() called with %SCX_DEQ_CORE_SCHED_EXEC flag set),
c) property change: task properties modified before dispatch,
(ops.dequeue() called with %SCX_DEQ_SCHED_CHANGE flag set).
This allows BPF schedulers to:
- reliably track task ownership and lifecycle,
- maintain accurate accounting of managed tasks,
- update internal state when tasks change properties.
Cc: Tejun Heo <tj@kernel.org>
Cc: Emil Tsalapatis <emil@etsalapatis.com>
Cc: Kuba Piecuch <jpiecuch@google.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Some exporters need a flow to synchronously revoke access to the DMA-buf
by importers. Once revoke is completed the importer is not permitted to
touch the memory otherwise they may get IOMMU faults, AERs, or worse.
DMA-buf today defines a revoke flow, for both pinned and dynamic
importers, which is broadly:
dma_resv_lock(dmabuf->resv, NULL);
// Prevent new mappings from being established
priv->revoked = true;
// Tell all importers to eventually unmap
dma_buf_invalidate_mappings(dmabuf);
// Wait for any inprogress fences on the old mapping
dma_resv_wait_timeout(dmabuf->resv,
DMA_RESV_USAGE_BOOKKEEP, false,
MAX_SCHEDULE_TIMEOUT);
dma_resv_unlock(dmabuf->resv, NULL);
// Wait for all importers to complete unmap
wait_for_completion(&priv->unmapped_comp);
This works well, and an importer that continues to access the DMA-buf
after unmapping it is very buggy.
However, the final wait for unmap is effectively unbounded. Several
importers do not support invalidate_mappings() at all and won't unmap
until userspace triggers it.
This unbounded wait is not suitable for exporters like VFIO and RDMA tha
need to issue revoke as part of their normal operations.
Add dma_buf_attach_revocable() to allow exporters to determine the
difference between importers that can complete the above in bounded time,
and those that can't. It can be called inside the exporter's attach op to
reject incompatible importers.
Document these details about how dma_buf_invalidate_mappings() works and
what the required sequence is to achieve a full revocation.
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20260131-dmabuf-revoke-v7-6-463d956bd527@nvidia.com
|
|
The default_gfp() helper that I added is not wrong, but it turns out
that it causes unnecessary headaches for 'sparse' which doesn't support
the use of __VA_OPT__ (introduced in C++20 and C23, and supported by gcc
and clang for a long time).
We do already use __VA_OPT__ in some other cases in the kernel (drm/xe
and btrfs), but it has been fairly limited. Now it triggers for pretty
much everything, and sparse ends up not working at all.
We can use the traditional gcc ',##__VA_ARGS__' syntax instead: it may
not be the "C standard" way and is slightly less natural in this
context, but it is the traditional model for this and avoids the sparse
problem.
Reported-and-tested-by: Ricardo Ribalda <ribalda@chromium.org>
Reported-and-tested-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reported-by: Ben Dooks <ben.dooks@codethink.co.uk>
Fixes: e19e1b480ac7 ("add default_gfp() helper macro and use it in the new *alloc_obj() helpers")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It turns out that a few workloads (easyWave, fio) have a fairly low
success rate on newidle balance, but still benefit greatly from having
it anyway.
Luckliky these workloads have a faily low newidle rate, so the cost if
doing the newidle is relatively low, even if unsuccessfull.
Add a simple rate based part to the newidle ratio compute, such that
low rate newidle will still have a high newidle ratio.
This cures the easyWave and fio workloads while not affecting the
schbench numbers either (which have a very high newidle rate).
Reported-by: Mario Roy <marioeroy@gmail.com>
Reported-by: "Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Mario Roy <marioeroy@gmail.com>
Tested-by: "Mohamed Abuelfotoh, Hazem" <abuehaze@amazon.com>
Link: https://patch.msgid.link/20260127151748.GA1079264@noisy.programming.kicks-ass.net
|
|
Cross-merge trees after 7.0-rc1.
No conflicts.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The Lenovo ThinkPad X1 Fold 16 Gen 1 has an OV5675 sensor (ACPI HID
OVTI5675) behind an INT3472 discrete PMIC controller. The INT3472
_DSM returns GPIO type 0x10 for one of the pins, which controls the
DOVDD (digital I/O power) regulator enable.
Type 0x10 is not currently handled by the driver, causing the GPIO to
be ignored with a warning. Add INT3472_GPIO_TYPE_DOVDD (0x10) and
handle it as a regulator with con_id "dovdd" to match the supply name
used by sensor drivers (e.g. ov5675).
Also increase GPIO_SUPPLY_NAME_LENGTH from 5 to 6 to accommodate
the "dovdd" name (5 chars + null terminator).
Signed-off-by: Leif Skunberg <diamondback@cohunt.app>
Reviewed-by: Hans de Goede <johannes.goede@oss.qualcomm.com>
Link: https://patch.msgid.link/20260210132129.17943-1-diamondback@cohunt.app
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Using the inline lock is now the recommended way for dma_fence
implementations.
So use this approach for the framework's internal fences as well.
Also saves about 4 bytes for the external spinlock.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://lore.kernel.org/r/20260219160822.1529-9-christian.koenig@amd.com
|
|
Using the inline lock is now the recommended way for dma_fence
implementations.
So use this approach for the framework's internal fences as well.
Also saves about 4 bytes for the external spinlock.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://lore.kernel.org/r/20260219160822.1529-8-christian.koenig@amd.com
|
|
Implement per-fence spinlocks, allowing implementations to not give an
external spinlock to protect the fence internal state. Instead a spinlock
embedded into the fence structure itself is used in this case.
Shared spinlocks have the problem that implementations need to guarantee
that the lock lives at least as long all fences referencing them.
Using a per-fence spinlock allows completely decoupling spinlock producer
and consumer life times, simplifying the handling in most use cases.
v2: improve naming, coverage and function documentation
v3: fix one additional locking in the selftests
v4: separate out some changes to make the patch smaller,
fix one amdgpu crash found by CI systems
v5: improve comments
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://lore.kernel.org/r/20260219160822.1529-5-christian.koenig@amd.com
|
|
Add dma_fence_lock_irqsafe() and dma_fence_unlock_irqrestore() wrappers
and mechanically apply them everywhere.
Just a pre-requisite cleanup for a follow up patch.
v2: add some missing i915 bits, add abstraction for lockdep assertion as
well
v3: one more suggestion by Tvrtko
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20260219160822.1529-4-christian.koenig@amd.com
|
|
When neither a release nor a wait backend ops is specified it is possible
to let the dma_fence live on independently of the module who issued it.
This makes it possible to unload drivers and only wait for all their
fences to signal.
v2: fix typo in comment
v3: fix sparse rcu warnings
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Philipp Stanner <phasta@kernel.org>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://lore.kernel.org/r/20260219160822.1529-3-christian.koenig@amd.com
|
|
The fence ops of a dma_fence currently need to life as long as the
dma_fence is alive. This means that the module which originally issued
a dma_fence can't unload unless all fences are freed up.
As first step to solve this issue protect the fence ops by RCU.
While it is counter intuitive to protect a constant function pointer table
by RCU it allows modules to wait for an RCU grace period before they
unload, to make sure that nobody is executing their functions any more.
This patch has not much functional change, but only adds the RCU
handling for the static checker to test.
v2: make one the now duplicated lockdep warnings a comment instead.
v3: Add more documentation to ->wait and ->release callback.
v4: fix typo in documentation
v5: rebased on drm-tip
v6: improve code comments
v7: improve commit message and code comments
v8: fix sparse rcu warnings
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Link: https://lore.kernel.org/r/20260219160822.1529-2-christian.koenig@amd.com
|
|
The primary role of pm_runtime_put() is to decrement the runtime PM
usage counter of the given device. It always does that regardless of
the value returned by it later.
In addition, if the runtime PM usage counter after decrementation turns
out to be zero, a work item is queued up to check whether or not the
device can be suspended. This is not guaranteed to succeed though and
even if it is successful, the device may still not be suspended going
forward.
There are multiple valid reasons why pm_runtime_put() may not decide to
queue up the work item mentioned above, including, but not limited to,
the case when user space has written "on" to the device's runtime PM
"control" file in sysfs. In all of those cases, pm_runtime_put()
returns a negative error code (even though the device's runtime PM
usage counter has been successfully decremented by it) which is very
confusing. In fact, its return value should only be used for debug
purposes and care should be taken when doing it even in that case.
Accordingly, to avoid the confusion mentioned above, change the return
type of pm_runtime_put() to void.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Link: https://patch.msgid.link/14387202.RDIVbhacDa@rafael.j.wysocki
|
|
Move claimed and retune control flags out of the bitfield word to
avoid unrelated RMW side effects in asynchronous contexts.
The host->claimed bit shared a word with retune flags. Writes to claimed
in __mmc_claim_host() or retune_now in mmc_mq_queue_rq() can overwrite
other bits when concurrent updates happen in other contexts, triggering
spurious WARN_ON(!host->claimed). Convert claimed, can_retune,
retune_now and retune_paused to bool to remove shared-word coupling.
Fixes: 6c0cedd1ef952 ("mmc: core: Introduce host claiming by context")
Fixes: 1e8e55b67030c ("mmc: block: Add CQE support")
Cc: stable@vger.kernel.org
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Penghe Geng <pgeng@nvidia.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Adapt tmpfs/shmem to use the rhashtable-based xattr path and switch
from an embedded struct to pointer-based lazy allocation.
Change shmem_inode_info.xattrs from embedded 'struct simple_xattrs' to
a pointer 'struct simple_xattrs *', initialized to NULL. This avoids
the rhashtable overhead for every tmpfs inode, which helps when a lot of
inodes exist.
The xattr store is allocated on first use:
- shmem_initxattrs(): Allocates via simple_xattrs_alloc() when
security modules set initial xattrs during inode creation.
- shmem_xattr_handler_set(): Allocates on first setxattr, with a
short-circuit for removal when no xattrs are stored yet.
All read paths (shmem_xattr_handler_get, shmem_listxattr) check for
NULL xattrs pointer and return -ENODATA or 0 respectively.
Replaced xattr entries are freed via simple_xattr_free_rcu() to allow
concurrent RCU readers to finish.
shmem_evict_inode() conditionally frees the xattr store only when
allocated.
Also change simple_xattr_add() from void to int to propagate
rhashtable insertion failures. shmem_initxattrs() is the only caller.
Link: https://patch.msgid.link/20260216-work-xattr-socket-v1-3-c2efa4f74cb7@kernel.org
Acked-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Add rhashtable support to the simple_xattr subsystem while keeping the
existing rbtree code fully functional. This allows consumers to be
migrated one at a time without breaking any intermediate build.
struct simple_xattrs gains a dispatch flag and a union holding either
the rbtree (rb_root + rwlock) or rhashtable state:
struct simple_xattrs {
bool use_rhashtable;
union {
struct { struct rb_root rb_root; rwlock_t lock; };
struct rhashtable ht;
};
};
simple_xattrs_init() continues to set up the rbtree path for existing
embedded-struct callers.
Add simple_xattrs_alloc() which dynamically allocates a simple_xattrs
and initializes the rhashtable path. This is the entry point for
consumers switching to pointer-based lazy allocation.
The five core functions (get, set, list, add, free) dispatch based on
the use_rhashtable flag.
Existing callers continue to use the rbtree path unchanged. As each
consumer is converted it will switch to simple_xattrs_alloc() and the
rhashtable path. Once all consumers are converted a follow-up patch
will remove the rbtree code.
Link: https://patch.msgid.link/20260216-work-xattr-socket-v1-2-c2efa4f74cb7@kernel.org
Acked-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In preparation for converting simple_xattrs from rbtree to rhashtable,
add rhash_head and rcu_head members to struct simple_xattr. The
rhashtable implementation will use rhash_head for hash table linkage
and RCU-based lockless reads, requiring that replaced or removed xattr
entries be freed via call_rcu() rather than immediately.
Add simple_xattr_free_rcu() which schedules RCU-deferred freeing of an
xattr entry. This will be used by callers of simple_xattr_set() once
they switch to the rhashtable-based xattr store.
No functional changes.
Link: https://patch.msgid.link/20260216-work-xattr-socket-v1-1-c2efa4f74cb7@kernel.org
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|