summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2026-06-02KVM: s390: Fix _gmap_crstep_xchg_atomic()Claudio Imbrenda
The previous incorrect behaviour cleared the vsie_notif bit without returning false, which allowed shadow crstes to be installed without the vsie_notif bit. Return false and do not perform the operation if an unshadow event has been triggered, but still attempt to clear the vsie_notif bit from the existing crste. This will prevent the installation of shadow crstes without vsie_notif bit and will also prevent the caller from looping forever if it was not checking for the sg->invalidated flag. Fixes: b827ef02f409 ("KVM: s390: Remove non-atomic dat_crstep_xchg()") Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260602142356.169458-3-imbrenda@linux.ibm.com>
2026-06-02KVM: s390: Fix _gmap_unmap_crste()Claudio Imbrenda
In _gmap_unmap_crste(), the crste to be unmapped is zapped calling gmap_crstep_xchg_atomic() exactly once, and expecting it to succeed. This is a reasonable sanity check, since kvm->mmu_lock is being held in write mode, and thus no races should be possible. An upcoming patch will change the behaviour of gmap_crstep_xchg_atomic() to return false and clear the vsie_notif bit if the operation triggers an unshadow operation. With the new behaviour, an unmap operation that triggers an unshadow would cause the VM to be killed. Prepare for the change by checking if the vsie_notif bit was set in the old crste if gmap_crstep_xchg_atomic() fails the first time, and try a second time. The second time no failures are allowed. Fixes: b827ef02f409 ("KVM: s390: Remove non-atomic dat_crstep_xchg()") Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20260602142356.169458-2-imbrenda@linux.ibm.com>
2026-06-02tracing/eprobes: Allow use of BTF names to dereference pointersSteven Rostedt
Add syntax to the parsing of eprobes to be able to typecast a trace event field that is a pointer to a structure. Currently, a dereference must be a number, where the user has to figure out manually the offset of a member of a structure that they want to dereference. But for event probes that records a field that happens to be a pointer to a structure, it cannot dereference these values with BTF naming, but must use numerical offsets. For example, to find out what device a sk_buff is pointing to in the net_dev_xmit trace event, one must first use gdb to find the offsets of the members of the structures: (gdb) p &((struct sk_buff *)0)->dev $1 = (struct net_device **) 0x10 (gdb) p &((struct net_device *)0)->name $2 = (char (*)[16]) 0x118 And then use the raw numbers to dereference: # echo 'e:xmit net.net_dev_xmit +0x118(+0x10($skbaddr)):string' >> dynamic_events If BTF is in the kernel, then instead, the skbaddr can be typecast to sk_buff and use the normal dereference logic. # echo 'e:xmit net.net_dev_xmit (sk_buff)skbaddr->dev->name:string' >> dynamic_events # echo 1 > events/eprobes/xmit/enable # cat trace [..] sshd-session-1022 [000] b..2. 860.249343: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.250061: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.250142: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.263553: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.283820: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.302716: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.322905: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.342828: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.362268: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.382335: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.400856: xmit: (net.net_dev_xmit) arg1="enp7s0" sshd-session-1022 [000] b..2. 860.419893: xmit: (net.net_dev_xmit) arg1="enp7s0" The syntax is simply: (STRUCT)(FIELD)->MEMBER[->MEMBER..] Also add comments around the #else and #endif of #ifdef CONFIG_PROBE_EVENTS_BTF_ARGS to know what they are for. Link: https://lore.kernel.org/all/20260601130746.2139d926@gandalf.local.home/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2026-06-02iio: dac: ad5686: create bus ops structRodrigo Alencar
Create struct with bus operations, which will be used to extend bus implementation features. Auxiliary functions ad5686_write() and ad5686_read() are created and ad5686_probe() now receives an ops struct pointer rather than individual read and write functions. Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2026-06-02iio: dac: ad5686: cleanup doc header of local structsRodrigo Alencar
Review documentation comment header for ad5686_chip_info and ad5686_state. Update variable names and description and remove unnecessary blank line between comment and struct declaration. Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2026-06-02iio: dac: ad5686: add control_sync() for single-channel devicesRodrigo Alencar
Create ad5310_control_sync() and ad5683_control_sync() functions that properly consume the mask definitions with FIELD_PREP(). This allows to reuse a function that updates the control register with cached values, without relying on confusing logic that depends on st->use_internal_vref, which is initialized earlier in ad5686_probe() because it is also applicable to the AD5686_REGMAP case, removing the need for the has_external_vref. Powerdown masks initialization is simplified as *_control_sync() masks outs any unused bits for the single-channel case. The change cleans up ad5686_write_dac_powerdown() and ad5686_probe(), organizing the code for feature extension, e.g. gain control support for single-channel devices. Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2026-06-02iio: dac: ad5686: add helpers to handle powerdown masksRodrigo Alencar
Add ad5686_pd_field_set() and ad5686_pd_field_get() helpers to cleanup powerdown mask control. Define AD5686_PD_* constants, e.g. AD5686_PD_MSK to hold powerdown mask value for a single channel. AD5686_LDAC_PWRDN_* macros are replaced by AD5686_PD_MODE_*, because they are unused and the LDAC feature for async load of DAC channel values is not related to power down control. Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2026-06-02iio: dac: ad5686: add of_match table to the spi driverRodrigo Alencar
Add of_match table for the SPI device variants to be consistent with the AD5696 I2C driver. Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2026-06-02iio: dac: ad5686: drop enum idRodrigo Alencar
Split chip info table into separate structs and expose them to the spi i2c drivers. That is the preferrable approach and allows for the drivers to have knowledge of the device info before the common probe function gets called. Those chip info structs may be shared by SPI and I2C driver variants. Channel declaration definitions are grouped according to channel count and DECLARE_AD5693_CHANNELS() macro is renamed to DECLARE_AD5683_CHANNELS() to match the regmap_type enum. Use spi_get_device_match_data() and i2c_get_match_data() to get chip info struct reference, passing it as parameter to the core probe function. Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2026-06-02iio: dac: ad5686: remove redundant register definitionRodrigo Alencar
AD5683_REGMAP and AD5693_REGMAP behave the same way in the common code, and that is because they target single channel devices from the same sub-family. There is no reason to separate them and it will make things simpler when refactoring the chip info table. Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2026-06-02iio: dac: ad5686: refactor include headersRodrigo Alencar
Apply IWYU principle, replacing unused/generic headers for specific/missing headers. The resulting include directive lists are sorted accordingly. Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Rodrigo Alencar <rodrigo.alencar@analog.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2026-06-02Merge tag 'v7.1-rc6' into workJonathan Cameron
Linux 7.1-rc6
2026-06-02iio: adc: ad4080: fix AD4880 chip IDAntoniu Miclaus
The AD4880 chip ID was incorrectly set to 0x0750. According to the datasheet, the product ID registers read 0x00 (PRODUCT_ID_H) and 0x59 (PRODUCT_ID_L), giving a combined chip ID of 0x0059. Fix the value to match the actual hardware. Signed-off-by: Antoniu Miclaus <antoniu.miclaus@analog.com> Reviewed-by: Joshua Crofts <joshua.crofts1@gmail.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2026-06-02printk: fix typos in commentsNaveen Kumar Chaudhary
Fix spelling/grammatical errors in printk.c and nbcon.c: - "precation" -> "precautionary" - "othrewise" -> "otherwise" - "An usable" -> "A usable" - "made a progress" -> "made progress" - "preemtible" -> "preemptible" - "mechasism" -> "mechanism" - "ownerhip" -> "ownership" Signed-off-by: Naveen Kumar Chaudhary <naveen.osdev@gmail.com> Link: https://patch.msgid.link/pakfewagyzb7da3yuxnaxdaoma5w4j2c7i3xebmcld3xy4mqs5@zxsx2idpxrdq Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2026-06-02gpu: nova-core: Hopper/Blackwell: add FMC signature extractionJohn Hubbard
Extract the SHA-384 hash, RSA public key, and RSA signature from the FMC ELF32 firmware sections. FSP Chain of Trust verification needs these to validate the FMC image during boot. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-14-jhubbard@nvidia.com [acourbot: derive `Zeroable` on `FmcSignature` for in-place initialization] Co-developed-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: Hopper/Blackwell: add FSP secure boot completion waitingJohn Hubbard
Hopper and Blackwell use FSP instead of SEC2 for secure boot. The driver must wait for FSP secure boot to complete before continuing with GSP bring-up. Poll for boot success with a 5-second timeout, and return the FSP interface only on success so that later Chain of Trust operations cannot run before FSP is ready. The interface owns the FSP falcon and the FMC firmware. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-13-jhubbard@nvidia.com [acourbot: use `inspect_err` instead of `map_err` and display actual error] [acourbot: limit visibility of `fsp_hal` to `super``] Co-developed-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: Hopper/Blackwell: add FMC firmware imageJohn Hubbard
FSP is the Falcon that runs FMC firmware on Hopper and Blackwell. Load the FMC ELF in two forms: the image section that FSP boots from, and the full Firmware object for later signature extraction during Chain of Trust verification. Declare the FMC image in the module's firmware table so it is bundled for FSP-based chipsets. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-12-jhubbard@nvidia.com Co-developed-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: Hopper/Blackwell: add FSP falcon engine stubJohn Hubbard
Add the FSP (Foundation Security Processor) falcon engine type that will handle secure boot and Chain of Trust operations on Hopper and Blackwell architectures. The FSP falcon replaces SEC2's role in the boot sequence for these newer architectures. This initial stub just defines the falcon type and its base address. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-11-jhubbard@nvidia.com Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: add auto-detection of 32-bit, 64-bit firmware imagesJohn Hubbard
A firmware image may be either a 32-bit or a 64-bit ELF, and callers should not have to know which. Detect the ELF class from the image header at parse time and dispatch to the matching parser, so a single entry point handles both layouts. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-10-jhubbard@nvidia.com Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: add support for 32-bit firmware imagesJohn Hubbard
Some GPU firmware images are packaged as 32-bit ELF rather than 64-bit. Add a 32-bit implementation of the shared ELF section-parsing abstraction so those images can be parsed alongside the existing 64-bit path. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-9-jhubbard@nvidia.com Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: don't assume 64-bit firmware imagesJohn Hubbard
Introduce a single ELF format abstraction that ties each ELF header type to its matching section-header type. This keeps the shared section parser ready for upcoming ELF32 support and avoids mixing 32-bit and 64-bit ELF layouts by mistake. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-8-jhubbard@nvidia.com Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: Blackwell: use correct sysmem flush registersJohn Hubbard
Blackwell GPUs moved the sysmem flush page registers away from the Ampere/Ada location. GB10x routes the flush through a pair of HSHUB0 register sets (primary and egress) that must both be programmed to the same address. GB20x routes it through FBHUB0. Define these registers relative to their HSHUB0 and FBHUB0 bases, as Open RM does, and implement the flush paths in the GB10x and GB20x framebuffer HALs. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-7-jhubbard@nvidia.com Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: Hopper/Blackwell: larger WPR2 (GSP) heapJohn Hubbard
The GSP-RM boot working memory portion of the WPR2 heap must be larger on Hopper and later GPUs than on Turing, Ampere, and Ada. Select the larger value for those generations. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-6-jhubbard@nvidia.com Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: Hopper/Blackwell: larger non-WPR heapJohn Hubbard
Hopper and Blackwell need a larger non-WPR heap than the 1 MiB that earlier architectures use. Hopper and Blackwell GB10x need 2 MiB, while Blackwell GB20x needs 2 MiB + 128 KiB. These sizes diverge by family, so give Hopper and each Blackwell family its own framebuffer HAL and select the non-WPR heap size per chipset family. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-5-jhubbard@nvidia.com Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: Blackwell: compute PMU-reserved framebuffer sizeJohn Hubbard
GSP boot needs to know how much framebuffer memory is reserved for the PMU. Compute it per architecture: Blackwell dGPUs reserve a non-zero amount, earlier architectures leave it at zero, matching Open RM behavior. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-4-jhubbard@nvidia.com Co-developed-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: Hopper/Blackwell: new location for PCI config mirrorJohn Hubbard
Hopper and Blackwell GPUs moved the PCI config space mirror from 0x088000 to 0x092000. Select the correct address per architecture when building the GSP system info command. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Link: https://patch.msgid.link/20260602032111.224790-3-jhubbard@nvidia.com Co-developed-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02gpu: nova-core: set DMA mask width based on GPU architectureJohn Hubbard
Replace the hardcoded 47-bit DMA mask with a GPU HAL method that provides the correct value for the architecture. Set the DMA mask in Gpu::new(). Gpu owns all DMA allocations for the device, so no concurrent allocations can exist while the constructor is still running. Signed-off-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: Gary Guo <gary@garyguo.net> Reviewed-by: Eliot Courtney <ecourtney@nvidia.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://patch.msgid.link/20260602032111.224790-2-jhubbard@nvidia.com Co-developed-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
2026-06-02block, bfq: release cgroup stats with bfq_groupYu Kuai
BFQ cgroup stats contain percpu counters embedded in struct bfq_group, but the old free path destroys them from bfq_pd_free(), which is tied to blkg policy-data teardown. That is not the same lifetime as struct bfq_group. BFQ pins bfq_group while bfq_queue entities refer to it, so bfq_pd_free() can drop the policy-data reference while other bfq_group references still exist. The following blkcg change also defers policy-data release through RCU and leaves BFQ to run the final bfqg_put() from an RCU callback. For that conversion, stats teardown must belong to the last bfq_group put, not to policy-data teardown. Move stats teardown to bfqg_put() so the embedded counters are destroyed exactly when the last bfq_group reference is released, before kfree(bfqg). Without this preparatory change, the RCU-delayed policy-data free conversion reproduced the following KASAN report: BUG: KASAN: slab-use-after-free in percpu_counter_destroy_many+0xf1/0x2e0 Write of size 8 at addr ffff88811d9409e0 by task test_blkcg/535 CPU: 0 UID: 0 PID: 535 Comm: test_blkcg Not tainted 7.1.0-rc2-g1e14adca0199 #1 PREEMPT ea13f83d4b74a12510d20db4a7d9a0fe8275f05c Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x54/0x70 print_address_description+0x77/0x200 ? percpu_counter_destroy_many+0xf1/0x2e0 print_report+0x64/0x70 kasan_report+0x118/0x150 ? percpu_counter_destroy_many+0xf1/0x2e0 percpu_counter_destroy_many+0xf1/0x2e0 __mmdrop+0x1d8/0x350 finish_task_switch+0x3f5/0x570 __schedule+0xe8e/0x18a0 schedule+0xfe/0x1c0 schedule_timeout+0x7f/0x1d0 __wait_for_common+0x26c/0x3f0 wait_for_completion_state+0x21/0x40 call_usermodehelper_exec+0x271/0x2c0 __request_module+0x296/0x410 elv_iosched_store+0x1bc/0x2c0 queue_attr_store+0x152/0x1c0 kernfs_fop_write_iter+0x1d7/0x280 vfs_write+0x580/0x630 ksys_write+0xec/0x190 do_syscall_64+0x156/0x490 entry_SYSCALL_64_after_hwframe+0x77/0x7f Allocated by task 535: kasan_save_track+0x3e/0x80 __kasan_kmalloc+0x72/0x90 bfq_pd_alloc+0x60/0x100 [bfq] blkg_create+0x3bb/0xbe0 blkg_lookup_create+0x3a2/0x460 blkg_conf_start+0x24a/0x2d0 bfq_io_set_weight+0x17f/0x430 [bfq] cgroup_file_write+0x1c5/0x4b0 kernfs_fop_write_iter+0x1d7/0x280 vfs_write+0x580/0x630 ksys_write+0xec/0x190 do_syscall_64+0x156/0x490 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 0: kasan_save_track+0x3e/0x80 kasan_save_free_info+0x46/0x50 __kasan_slab_free+0x3a/0x60 kfree+0x14e/0x4f0 rcu_core+0x6f3/0xcd0 handle_softirqs+0x1a0/0x550 __irq_exit_rcu+0x8c/0x150 irq_exit_rcu+0xe/0x20 sysvec_apic_timer_interrupt+0x6e/0x80 asm_sysvec_apic_timer_interrupt+0x1a/0x20 Last potentially related work creation: kasan_save_stack+0x3e/0x60 kasan_record_aux_stack+0x99/0xb0 call_rcu+0x55/0x5c0 blkg_free_workfn+0x130/0x220 process_scheduled_works+0x655/0xb60 worker_thread+0x446/0x600 kthread+0x1f4/0x230 ret_from_fork+0x259/0x420 ret_from_fork_asm+0x1a/0x30 Signed-off-by: Yu Kuai <yukuai@fygo.io> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260601061502.899552-1-yukuai@fygo.io Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-06-02mm: mm_init: use div64_ul() instead of do_div()Giorgi Tchankvetadze
Fixes Coccinelle/coccicheck warning reported by do_div.cocci. Compared to do_div(), div64_ul() does not implicitly cast the divisor and does not unnecessarily calculate the remainder. There are no functional changes. The benefit is purely a semantic cleanup that better communicates the intent of the division and resolves the static analysis warning. Signed-off-by: Giorgi Tchankvetadze <giorgitchankvetadze1997@gmail.com> Link: https://patch.msgid.link/20260602-mm-div64-cleanup-v1-1-bf5d67d89d93@gmail.com Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-06-02driver core: Use system_percpu_wq instead of system_wqNathan Chancellor
Commit 1137838865bf ("driver core: Use mod_delayed_work to prevent lost deferred probe work") added a use of system_wq, which is deprecated in favor of system_percpu_wq added by commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq"). An upcoming warning in the workqueue tree flags this with: workqueue: work func deferred_probe_timeout_work_func enqueued on deprecated workqueue. Use system_{percpu|dfl}_wq instead. Switch to system_percpu_wq to clear up the warning. Fixes: 1137838865bf ("driver core: Use mod_delayed_work to prevent lost deferred probe work") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Link: https://patch.msgid.link/20260601-driver-core-fix-system_wq-warning-v1-1-f9001a70ee25@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-06-02iio: light: veml3328: add support for new deviceJoshua Crofts
Add support for the Vishay VEML3328 RGB/IR light sensor communicating via I2C (SMBus compatible). Also add a new entry for said driver into Kconfig and Makefile. Assisted-by: Gemini:3.1-Pro Signed-off-by: Joshua Crofts <joshua.crofts1@gmail.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
2026-06-02nvme: refresh multipath head zoned limits from path limitsYao Sang
queue_limits_stack_bdev() updates the multipath head limits from the path queue, but it does not propagate max_open_zones or max_active_zones. As a result, a zoned multipath namespace head can keep stale 0/0 values even after a ready path reports finite zoned resource limits. When refreshing the head limits in nvme_update_ns_info(), stack the zoned resource limits directly after stacking the path queue limits. Use min_not_zero() so the block layer's 0 value keeps its "no limit" meaning while finite limits are combined conservatively. This avoids advertising "no limit" on the multipath head while keeping the zoned-limit handling local to the NVMe multipath update path. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Yao Sang <sangyao@kylinos.cn> Signed-off-by: Keith Busch <kbusch@kernel.org>
2026-06-02nvme: fix FDP fdpcidx bounds checkliuxixin
The fdpcidx bounds check sets n = NUMFDPC + 1 but used > instead of >=, incorrectly accepting fdp_idx when it equals n (i.e. NUMFDPC + 1). Fixes: 30b5f20bb2dd ("nvme: register fdp parameters with the block layer") Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: liuxixin <gliuxen@gmail.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2026-06-02arm64: dts: rockchip: enable adc button for Radxa E25Chukun Pan
The Radxa E25 board has an ADC button. Enable it. Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Link: https://patch.msgid.link/20260601101000.2076721-1-amadeus@jmu.edu.cn Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2026-06-02Merge tag 'v7.2-rockchip-dts64-1' of ↵Linus Walleij
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt New peripherals: Watchdog on RK3528, MIPI CSI-2 receiver on RK3588. Adding frl-enable-gpios to a number of boards for HDMI 2.0 support. And a bunch of fixes and new peripherals for a number of boards. * tag 'v7.2-rockchip-dts64-1' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: (30 commits) arm64: dts: rockchip: Add watchdog node for RK3528 arm64: dts: rockchip: add mipi csi-2 receiver nodes to rk3588 arm64: dts: rockchip: fix rk809 interrupt pin on rk3566-roc-pc arm64: dts: rockchip: Add missing pinctrl-names to rk3588s boards arm64: dts: rockchip: Add missing pinctrl-names to rk3588 boards arm64: dts: rockchip: Add missing pinctrl-names to rk3576 boards arm64: dts: rockchip: Drop unnecessary #{address,size}-cells from rk3588-jaguar arm64: dts: rockchip: Add frl-enable-gpios to rk3588s-roc-pc arm64: dts: rockchip: Add frl-enable-gpios to rk3588s-orangepi-cm5-base arm64: dts: rockchip: Add frl-enable-gpios to rk3588s-khadas-edge2 arm64: dts: rockchip: Add frl-enable-gpios to rk3588s-gameforce-ace arm64: dts: rockchip: Add frl-enable-gpios to rk3588s boards arm64: dts: rockchip: Add frl-enable-gpios to rk3588 boards arm64: dts: rockchip: Add frl-enable-gpios to rk3576-nanopi-r76s arm64: dts: rockchip: Add frl-enable-gpios to rk3576-luckfox-core3576 arm64: dts: rockchip: Add frl-enable-gpios to rk3576 boards arm64: dts: rockchip: Add AP6275P wireless support for Khadas Edge 2L arm64: dts: rockchip: Add HYM8563 RTC for Khadas Edge 2L arm64: dts: rockchip: Add #{address,size}-cells to Chromium-based /firmware arm64: dts: rockchip: Add HDMI and VOP support for Khadas Edge 2L ... Signed-off-by: Linus Walleij <linusw@kernel.org>
2026-06-02Merge tag 'v7.2-rockchip-dts32' of ↵Linus Walleij
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt Cleanups for RK3288-based ChromeOS platform. * tag 'v7.2-rockchip-dts32' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: ARM: dts: rockchip: Add #{address,size}-cells to Chromium-based /firmware ARM: dts: rockchip: Remove invalid properies from rk3288-veyron-analog-audio Signed-off-by: Linus Walleij <linusw@kernel.org>
2026-06-02wifi: mac80211: limit injected antenna index in ieee80211_parse_tx_radiotapDeepanshu Kartikey
When parsing the radiotap header of an injected frame, ieee80211_parse_tx_radiotap() uses the IEEE80211_RADIOTAP_ANTENNA value directly as a shift count: info->control.antennas |= BIT(*iterator.this_arg); *iterator.this_arg is an 8-bit value taken straight from the frame supplied by userspace, so BIT() can be asked to shift by up to 255. That is undefined behaviour on the unsigned long and is reported by UBSAN: UBSAN: shift-out-of-bounds in net/mac80211/tx.c:2174:30 shift exponent 235 is too large for 64-bit type 'unsigned long' Call Trace: ieee80211_parse_tx_radiotap+0xadb/0x1950 net/mac80211/tx.c:2174 ieee80211_monitor_start_xmit+0xb1f/0x1250 net/mac80211/tx.c:2451 ... packet_sendmsg+0x3eb6/0x50f0 net/packet/af_packet.c:3109 info->control.antennas is a 2-bit bitmap (u8 antennas:2), so only antenna indices 0 and 1 can ever be represented. Ignore any larger value instead of shifting out of bounds. Reported-by: syzbot+8e0622f6d9446420271f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8e0622f6d9446420271f Fixes: ef246a1480cc ("wifi: mac80211: support antenna control in injection") Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Link: https://patch.msgid.link/20260531011721.102941-1-kartikey406@gmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-06-02wifi: nl80211: reject oversized EMA RNR listsYuqi Xu
nl80211_parse_rnr_elems() stores the parsed element count in a u8-backed cfg80211_rnr_elems::cnt field and uses that count to size the flexible array allocation. Reject nested NL80211_ATTR_EMA_RNR_ELEMS input once the count reaches 255, before incrementing it again. This keeps the parser aligned with the data structure it fills and matches the existing bound check used by nl80211_parse_mbssid_elems(). Fixes: dbbb27e183b1 ("cfg80211: support RNR for EMA AP") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Assisted-by: Codex:gpt-5.4 Signed-off-by: Yuqi Xu <xuyuqiabc@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Link: https://patch.msgid.link/20260529152542.1412734-1-n05ec@lzu.edu.cn Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-06-02ARM64: remove unnecessary architecture-specific <asm/device.h>Ethan Nelson-Moore
arch/arm64/include/asm/device.h is identical to include/asm-generic/device.h, and therefore the ARM64-specific version is unnecessary. Remove it. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-06-02nvme-tcp: Use WQ_PERCPU explicitly if wq_unbound is false.Kuniyuki Iwashima
Since commit 21c05ca88a54 ("workqueue: Add warnings and ensure one among WQ_PERCPU or WQ_UNBOUND is present"), we must explicitly set WQ_PERCPU or WQ_UNBOUND when creating workqueue. nvme_tcp_init_module() sets WQ_UNBOUND when the module param wq_unbound is set, but otherwise, WQ_PERCPU is missing, triggering the warning below: workqueue: nvme_tcp_wq is using neither WQ_PERCPU or WQ_UNBOUND. Setting WQ_PERCPU. WARNING: kernel/workqueue.c:5856 at __alloc_workqueue+0x1d02/0x2070 kernel/workqueue.c:5855, CPU#0: swapper/0/1 Let's set WQ_PERCPU if wq_unbound is false. Reported-by: syzbot+d078cba4418e65f61984@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6a1a9a86.323e8352.141b09.0001.GAE@google.com/ Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2026-06-02nvmet: fix pre-auth out-of-bounds heap read in Discovery Get Log PageBryam Vargas
nvmet_execute_disc_get_log_page() validates only the dword alignment of the host-supplied Log Page Offset (lpo). The 64-bit offset is then added to a small kzalloc'd buffer that holds the discovery log page and the result is passed straight to nvmet_copy_to_sgl(), which memcpy()s data_len bytes out to the host with no source-side bound check: u64 offset = nvmet_get_log_page_offset(req->cmd); /* 64-bit host */ size_t data_len = nvmet_get_log_page_len(req->cmd); /* 32-bit host */ ... if (offset & 0x3) { ... } /* only check */ ... alloc_len = sizeof(*hdr) + entry_size * discovery_log_entries(req); buffer = kzalloc(alloc_len, GFP_KERNEL); ... status = nvmet_copy_to_sgl(req, 0, buffer + offset, data_len); The Discovery controller is unauthenticated -- nvmet_host_allowed() returns true unconditionally for the discovery subsystem -- so the call is reachable pre-authentication by any TCP/RDMA/FC peer that can reach the nvmet target. With a discovery log page of ~1 KiB, an attacker requesting up to 4 KiB starting at offset == alloc_len reads the next slab page out and gets its content returned over the fabric (an empirical run on a default nvmet-tcp loopback target leaked 81 canonical kernel pointers in one Get Log Page response). Pointing the offset at unmapped kernel memory faults the in-kernel memcpy and crashes (or panics, on panic_on_oops=1) the target host instead. The attacker-controlled source-side offset pattern "nvmet_copy_to_sgl(req, 0, buffer + ATTACKER_OFFSET, ...)" is unique to nvmet_execute_disc_get_log_page in the entire nvmet codebase: every other Get Log Page handler in admin-cmd.c either ignores lpo (and silently starts every response at offset 0) or tracks a local destination offset with a fixed source pointer. Validate the host-supplied offset against the log page size, cap the copy length to what is actually available, and zero-fill any remainder of the host transfer buffer. The zero-fill matches the existing short-response pattern in nvmet_execute_get_log_changed_ns() (admin-cmd.c) and prevents leaking transport SGL contents when the host asks for more bytes than the log page contains. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Cc: stable@vger.kernel.org Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2026-06-02rseq: Fix using an uninitialized stack variable in rseq_exit_user_update()Qing Wang
There is an bug in which an uninitialized stack variable is used in rseq_exit_user_update() as reported by syzbot: BUG: KMSAN: kernel-infoleak in rseq_set_ids_get_csaddr include/linux/rseq_entry.h:502 [inline] The local variable: struct rseq_ids ids = { .cpu_id = task_cpu(t), .mm_cid = task_mm_cid(t), .node_id = cpu_to_node(ids.cpu_id), }; According to the C standard, the evaluation order of expressions in an initializer list is indeterminately sequenced. The compiler (Clang, in this KMSAN build) evaluates `cpu_to_node(ids.cpu_id)` *before* `ids.cpu_id` is initialized with `task_cpu(t)`. This is fixed by moving the assignment of ids.node_id outside the structure initialization. Fixes: 82f572449cfe ("rseq: Implement read only ABI enforcement for optimized RSEQ V2 mode") Closes: https://syzkaller.appspot.com/bug?extid=185a631927096f9da2fc Reported-by: syzbot+185a631927096f9da2fc@syzkaller.appspotmail.com Signed-off-by: Qing Wang <wangqing7171@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://patch.msgid.link/20260602030854.574038-1-wangqing7171@gmail.com
2026-06-02sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime()Peter Zijlstra
assign_cfs_rq_runtime() during update_curr() sets the resched indicator and relies on check_cfs_rq_runtime() during pick_next_task() / put_prev_entity() to throttle the hierarchy once current task is preempted / blocks. Per-task throttle, on the other hand, uses throttle_cfs_rq() to simply propagate the throttle signals, and then relies on task work to individually throttle the runnable tasks on their way out to the userspace. Remove check_cfs_rq_runtime() and unify throttling into account_cfs_rq_runtime() which only sets the cfs_rq->throttled, cfs_rq->throttle_count indicators via throttle_cfs_rq() and optionally adds the task work to the current task (donor) it is on the throttled hierarchy. throttle_cfs_rq() requests for sched_cfs_bandwidth_slice() worth of bandwidth for the current hierarchy that enable it to continue running uninterrupted when selected. For the rest, it requests a bare minimum of "1" to ensure some bandwidth is available and pass the "runtime_remaining > 0" checks once selected. For SCHED_PROXY_EXEC, a mutex holder cannot exit to userspace without dropping it first and the mutex_unlock() ensures proxy is stopped before the mutex handoff which preserves the current semantics for running a throttled task until it exits to the userspace even if it acts as a donor. [ prateek: rebased on tip, comments, commit message. ] Reviewed-By: Benjamin Segall <bsegall@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Aaron Lu <ziqianlu@bytedance.com> Link: https://patch.msgid.link/20260602071005.11942-1-kprateek.nayak@amd.com
2026-06-02sched/fair: Move the throttled tasks to a local list in tg_unthrottle_up()K Prateek Nayak
An update_curr() during the enqueue of throttled task will start throttling the hierarchy from subsequent commit. This can lead to tg_throttle_down() seeing non-empty throttled_limbo_list for the cfs_rq attaching the task from throttled_limbo_list one by one. For example: R | A / \ *B C | rq->curr *B is throttled with tasks on hte limbo list. When the tasks are unthrottled via tg_unthrottle_up() and entity of group B is placed onto A, update_curr() is called to catch up the vruntime and it may throttle group A causing the subsequent tg_throttle_down() to see the pending task's on B's limbo list. tg_unthrottle_up() /* --cfs_rq->throttle_count == 0 */ list_for_each_entry_safe(p, cfs_rq->throttled_limbo_list) enqueue_task_fair() enqueue_entity(se /* B->se */) update_curr(cfs_rq /* A->gcfs_rq */) account_cfs_rq_runtime(cfs_rq) throttle_cfs_rq(cfs_rq /* A->gcfs_rq */ ) tg_throttle_down() /* Reaches B->cfs_rq with throttle_count == 0 */ !!! !list_empty(&cfs_rq->throttled_limbo_list)) !!! Move the tasks from throttled_limbo_list onto a local list before starting the unthrottle to prevent the splat described above. If the hierarchy is throttled again in middle of an unthrottle, put the pending tasks back onto the limbo list to prevent running them unnecessarily. Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Benjamin Segall <bsegall@google.com> Tested-by: Aaron Lu <ziqianlu@bytedance.com> Link: https://patch.msgid.link/20260602052531.11450-2-kprateek.nayak@amd.com
2026-06-02sched/fair: Call update_curr() before unthrottling the hierarchyK Prateek Nayak
Subsequent commits will allow update_curr() to throttle the hierarchy when the runtime accounting exceeds allocated quota. Call update_curr() before the unthrottle event, and in tg_unthrottle_up() to catch up on any remaining runtime and stabilize the "runtime_remaining" and "throttle_count" for that cfs_rq. Doing an update_curr() early ensures the cfs_rq is not throttled right back up again when the unthrottle is in progress. Since all callers of unthrottle_cfs_rq(), except two, already update the rq_clock and call rq_clock_start_loop_update(), move the update_rq_clock() from unthrottle_cfs_rq() to the callers that don't update the rq_clock. Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Benjamin Segall <bsegall@google.com> Tested-by: Aaron Lu <ziqianlu@bytedance.com> Link: https://patch.msgid.link/20260602052531.11450-1-kprateek.nayak@amd.com
2026-06-02sched/fair: Use throttled_csd_list for local unthrottleK Prateek Nayak
When distribute_cfs_runtime() encounters a local cfs_rq, it adds it to a local list and unthrottles it at the end, when it is done unthrottling other cfs_rq(s) on cfs_b->throttled_cfs_rq until the bandwidth runs out. Instead of using a local list, reuse the local CPU's rq->throttled_csd_list and the __cfsb_csd_unthrottle() path for unthrottle. If this is the first cfs_rq to be queued on the "throttled_csd_list", it prevents the need for a remote CPUs to interrupt this local CPU if they themselves are performing async unthrottle. If this is not the first cfs_rq on the list, there is an async unthrottle operation pending on this local CPU and the unthrottle can be batched together. No functional changes intended. Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Benjamin Segall <bsegall@google.com> Tested-by: Aaron Lu <ziqianlu@bytedance.com> Link: https://patch.msgid.link/20260602050005.11160-3-kprateek.nayak@amd.com
2026-06-02sched/fair: Convert cfs bandwidth throttling to use guardsK Prateek Nayak
Routine conversion of rcu_read_lock(), spin_lock*, and rq_lock usage within the cfs bandwidth controller to use class guards. Only notable changes are: - Checking for "cfs_rq->runtime_remaining <= 0" instead of the inverse to spot a throttle and break early. This also saves the need for extra indentation in the unthrottle case. - Reordering of list_del_rcu() against throttled_clock indicator update in unthrottle_cfs_rq(). Both are done with "cfs_b->lock" held after the "cfs_rq->throttled" is cleared which make the reordering safe against concurrent list modifications. No functional changes intended. Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ben Segall <bsegall@google.com> Tested-by: Aaron Lu <ziqianlu@bytedance.com> Link: https://patch.msgid.link/20260602050005.11160-2-kprateek.nayak@amd.com
2026-06-02sched/fair: Allocate cfs_tg_state with percpu allocatorZecheng Li
To remove the cfs_rq pointer array in task_group, allocate the combined cfs_rq and sched_entity using the per-cpu allocator. This patch implements the following: - Changes task_group->cfs_rq from 'struct cfs_rq **' to 'struct cfs_rq __percpu *'. - Updates memory allocation in alloc_fair_sched_group() and free_fair_sched_group() to use alloc_percpu() and free_percpu() respectively. - Uses the inline accessor tg_cfs_rq(tg, cpu) with per_cpu_ptr() to retrieve the pointer to cfs_rq for the given task group and CPU. - Replaces direct accesses tg->cfs_rq[cpu] with calls to the new tg_cfs_rq(tg, cpu) helper. - Handles the root_task_group: since struct rq is already a per-cpu variable (runqueues), its embedded cfs_rq (rq->cfs) is also per-cpu. Therefore, we assign root_task_group.cfs_rq = &runqueues.cfs. - Cleanup the code in initializing the root task group. This change places each CPU's cfs_rq and sched_entity in its local per-cpu memory area to remove the per-task_group pointer arrays. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com> Reviewed-by: Josh Don <joshdon@google.com> Link: https://patch.msgid.link/20260522141623.600235-4-zli94@ncsu.edu
2026-06-02sched/fair: Remove task_group->se pointer arrayZecheng Li
Now that struct sched_entity is co-located with struct cfs_rq for non-root task groups, the task_group->se pointer array is redundant. The associated sched_entity can be loaded directly from the cfs_rq. This patch performs the access conversion with the helpers: - is_root_task_group(tg): checks if a task group is the root task group. It compares the task group's address with the global root_task_group variable. - tg_se(tg, cpu): retrieves the cfs_rq and returns the address of the co-located se. This function checks if tg is the root task group to ensure behaving the same of previous tg->se[cpu]. Replaces all accesses that use the tg->se[cpu] pointer array with calls to the new tg_se(tg, cpu) accessor. - cfs_rq_se(cfs_rq): simplifies access paths like cfs_rq->tg->se[...] to use the co-located sched_entity. This function also checks if tg is the root task group to ensure same behavior. Since tg_se is not in very hot code paths, and the branch is a register comparison with an immediate value (`&root_task_group`), the performance impact is expected to be negligible. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com> Reviewed-by: Josh Don <joshdon@google.com> Link: https://patch.msgid.link/20260522141623.600235-3-zli94@ncsu.edu
2026-06-02sched/fair: Co-locate cfs_rq and sched_entity in cfs_tg_stateZecheng Li
Improve data locality and reduce pointer chasing by allocating struct cfs_rq and struct sched_entity together for non-root task groups. This is achieved by introducing a new combined struct cfs_tg_state that holds both objects in a single allocation. This patch: - Introduces struct cfs_tg_state that embeds cfs_rq, sched_entity, and sched_statistics together in a single structure. - Updates __schedstats_from_se() in stats.h to use cfs_tg_state for accessing sched_statistics from a group sched_entity. - Modifies alloc_fair_sched_group() and free_fair_sched_group() to allocate and free the new struct as a single unit. - Modifies the per-CPU pointers in task_group->se and task_group->cfs_rq to point to the members in the new combined structure. Signed-off-by: Zecheng Li <zecheng@google.com> Signed-off-by: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com> Reviewed-by: Josh Don <joshdon@google.com> Link: https://patch.msgid.link/20260522141623.600235-2-zli94@ncsu.edu