linux.git - Linux kernel source tree

Age	Commit message (Collapse)	Author
2026-03-01	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf before 7.0-rc2	Alexei Starovoitov
	Cross-merge BPF and other fixes after downstream PR. No conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-01	spi: spi-mem: clean up kernel-doc in spi-mem.h	Randy Dunlap
	Eliminate all kernel-doc warnings in spi-mem.h: - add missing struct member descriptions - don't use "struct" for function descrptions Warning: include/linux/spi/spi-mem.h:202 struct member 'cmd' not described in 'spi_mem_op' Warning: include/linux/spi/spi-mem.h:202 struct member 'addr' not described in 'spi_mem_op' Warning: include/linux/spi/spi-mem.h:202 struct member 'dummy' not described in 'spi_mem_op' Warning: include/linux/spi/spi-mem.h:202 struct member 'data' not described in 'spi_mem_op' Warning: include/linux/spi/spi-mem.h:286 Incorrect use of kernel-doc format: * struct spi_mem_get_drvdata() - get driver private data attached to a SPI mem Warning: include/linux/spi/spi-mem.h:298 Incorrect use of kernel-doc format: * struct spi_controller_mem_ops - SPI memory operations Warning: include/linux/spi/spi-mem.h:362 expecting prototype for struct spi_mem_set_drvdata. Prototype was for struct spi_controller_mem_ops instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260301014743.3133167-1-rdunlap@infradead.org Signed-off-by: Mark Brown <broonie@kernel.org>
2026-03-01	iio: core: Add IIO_EV_INFO_SCALE to event info	Taha Ed-Dafili
	Implement support for IIO_EV_INFO_SCALE in the internal enum iio_event_info to allow proper ABI compliance. This allows drivers (like the ADXL345) to expose event scale attributes using the standard IIO ABI rather than manual device attributes. Signed-off-by: Taha Ed-Dafili <0rayn.dev@gmail.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2026-02-28	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Linus Torvalds
	Pull bpf fixes from Alexei Starovoitov: - Fix alignment of arm64 JIT buffer to prevent atomic tearing (Fuad Tabba) - Fix invariant violation for single value tnums in the verifier (Harishankar Vishwanathan, Paul Chaignon) - Fix a bunch of issues found by ASAN in selftests/bpf (Ihor Solodrai) - Fix race in devmpa and cpumap on PREEMPT_RT (Jiayuan Chen) - Fix show_fdinfo of kprobe_multi when cookies are not present (Jiri Olsa) - Fix race in freeing special fields in BPF maps to prevent memory leaks (Kumar Kartikeya Dwivedi) - Fix OOB read in dmabuf_collector (T.J. Mercier) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (36 commits) selftests/bpf: Avoid simplification of crafted bounds test selftests/bpf: Test refinement of single-value tnum bpf: Improve bounds when tnum has a single possible value bpf: Introduce tnum_step to step through tnum's members bpf: Fix race in devmap on PREEMPT_RT bpf: Fix race in cpumap on PREEMPT_RT selftests/bpf: Add tests for special fields races bpf: Retire rcu_trace_implies_rcu_gp() from local storage bpf: Delay freeing fields in local storage bpf: Lose const-ness of map in map_check_btf() bpf: Register dtor for freeing special fields selftests/bpf: Fix OOB read in dmabuf_collector selftests/bpf: Fix a memory leak in xdp_flowtable test bpf: Fix stack-out-of-bounds write in devmap bpf: Fix kprobe_multi cookies access in show_fdinfo callback bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearing selftests/bpf: Don't override SIGSEGV handler with ASAN selftests/bpf: Check BPFTOOL env var in detect_bpftool_path() selftests/bpf: Fix out-of-bounds array access bugs reported by ASAN selftests/bpf: Fix array bounds warning in jit_disasm_helpers ...
2026-02-28	Merge patch series "scsi: target: Add support for completing commands from ↵	Martin K. Petersen
	backend context" The following patches made over Linus's current tree allow users to tell the target layer to perform direct completions instead of always deferring to LIO's completion workqueue. When the frontend driver already has multiple worker threads (or you are doing a LUN per target with a single thread per target) then bypassing the LIO workqueue can increase performance 20-30%. Link: https://patch.msgid.link/20260222232946.7637-1-michael.christie@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-02-28	scsi: target: Add support for completing commands from backend context	Mike Christie
	To complete a command several drivers just drop their reference and add it to list to be processed by a driver specific thread. So there's no need to go from backend context to the LIO thread then to the driver's thread. When avoiding the LIO thread, IOPS can increase from 20-30% for workloads like: fio --filename=/dev/sdb --direct=1 --rw=randrw --bs=8K \ --ioengine=libaio --iodepth=128 --numjobs=$jobs where increasing jobs increases the performance improvement (this is using NVMe drives with LIO's submit_type=1 to directly submit). Add the infrastructure so drivers and userspace can control how to complete a command like is done for the submission path. In this commit there is no behavior change and we continue to defer to the LIO workqueue thread. In the subsequent commits we will allow drivers to report what they support and allow userspace to control the behavior. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://patch.msgid.link/20260222232946.7637-2-michael.christie@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-02-28	scsi: core: Add 'serial' sysfs attribute for SCSI/SATA	Igor Pylypiv
	Add a 'serial' sysfs attribute for SCSI and SATA devices. This attribute exposes the Unit Serial Number, which is derived from the Device Identification Vital Product Data (VPD) page 0x80. Whitespace is stripped from the retrieved serial number to handle the different alignment (right-aligned for SCSI, potentially left-aligned for SATA). As noted in SAT-5 10.5.3, "Although SPC-5 defines the PRODUCT SERIAL NUMBER field as right-aligned, ACS-5 does not require its SERIAL NUMBER field to be right-aligned. Therefore, right-alignment of the PRODUCT SERIAL NUMBER field for the translation is not assured." This attribute is used by tools such as lsblk to display the serial number of block devices. [mkp: length adjustment] Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20260209212151.342151-1-ipylypiv@google.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2026-02-28	net: sched: sch_dualpi2: use qdisc_dequeue_drop() for dequeue drops	Jesper Dangaard Brouer
	DualPI2 drops packets during dequeue but was using kfree_skb_reason() directly, bypassing trace_qdisc_drop. Convert to qdisc_dequeue_drop() and add QDISC_DROP_L4S_STEP_NON_ECN to the qdisc drop reason enum. - Set TCQ_F_DEQUEUE_DROPS flag in dualpi2_init() - Use enum qdisc_drop_reason in drop_and_retry() - Replace kfree_skb_reason() with qdisc_dequeue_drop() Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/177211351978.3011628.11267023360997620069.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28	net: sched: rename QDISC_DROP_CAKE_FLOOD to QDISC_DROP_FLOOD_PROTECTION	Jesper Dangaard Brouer
	Rename QDISC_DROP_CAKE_FLOOD to QDISC_DROP_FLOOD_PROTECTION to use a generic name without embedding the qdisc name. This follows the principle that drop reasons should describe the drop mechanism rather than being tied to a specific qdisc implementation. The flood protection drop reason is used by qdiscs implementing probabilistic drop algorithms (like BLUE) that detect unresponsive flows indicating potential DoS or flood attacks. CAKE uses this via its Cobalt AQM component. Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/177211347537.3011628.13759059534638729639.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28	net: sched: rename QDISC_DROP_FQ_* to generic names	Jesper Dangaard Brouer
	Rename FQ-specific drop reasons to generic names: - QDISC_DROP_FQ_BAND_LIMIT -> QDISC_DROP_BAND_LIMIT - QDISC_DROP_FQ_HORIZON_LIMIT -> QDISC_DROP_HORIZON_LIMIT This follows the principle that drop reasons should describe the drop mechanism rather than being tied to a specific qdisc implementation. These concepts (priority band limits, timestamp horizon) could apply to other qdiscs as well. Remove the local macro define FQDR() and instead use the full QDISC_DROP_* name to make it easier to navigate code. Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/177211346902.3011628.12523261489552097455.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28	net: sched: sfq: convert to qdisc drop reasons	Jesper Dangaard Brouer
	Convert SFQ to use the new qdisc-specific drop reason infrastructure. This patch demonstrates how to convert a flow-based qdisc to use the new enum qdisc_drop_reason. As part of this conversion: - Add QDISC_DROP_MAXFLOWS for flow table exhaustion - Rename FQ_FLOW_LIMIT to generic FLOW_LIMIT, now shared by FQ and SFQ - Use QDISC_DROP_OVERLIMIT for sfq_drop() when overall limit exceeded - Use QDISC_DROP_FLOW_LIMIT for per-flow depth limit exceeded The FLOW_LIMIT reason is now a common drop reason for per-flow limits, applicable to both FQ and SFQ qdiscs. Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/177211345946.3011628.12770616071857185664.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28	net: sched: introduce qdisc-specific drop reason tracing	Jesper Dangaard Brouer
	Create new enum qdisc_drop_reason and trace_qdisc_drop tracepoint for qdisc layer drop diagnostics with direct qdisc context visibility. The new tracepoint includes qdisc handle, parent, kind (name), and device information. Existing SKB_DROP_REASON_QDISC_DROP is retained for backwards compatibility via kfree_skb_reason(). Convert qdiscs with drop reasons to use the new infrastructure. Change CAKE's cobalt_should_drop() return type from enum skb_drop_reason to enum qdisc_drop_reason to fix implicit enum conversion warnings. Use QDISC_DROP_UNSPEC as the 'not dropped' sentinel instead of SKB_NOT_DROPPED_YET. Both have the same compiled value (0), so the comparison logic remains semantically equivalent. Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/177211345275.3011628.1974310302645218067.stgit@firesoul Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28	xsk: Fix fragment node deletion to prevent buffer leak	Nikhil P. Rao
	After commit b692bf9a7543 ("xsk: Get rid of xdp_buff_xsk::xskb_list_node"), the list_node field is reused for both the xskb pool list and the buffer free list, this causes a buffer leak as described below. xp_free() checks if a buffer is already on the free list using list_empty(&xskb->list_node). When list_del() is used to remove a node from the xskb pool list, it doesn't reinitialize the node pointers. This means list_empty() will return false even after the node has been removed, causing xp_free() to incorrectly skip adding the buffer to the free list. Fix this by using list_del_init() instead of list_del() in all fragment handling paths, this ensures the list node is reinitialized after removal, allowing the list_empty() to work correctly. Fixes: b692bf9a7543 ("xsk: Get rid of xdp_buff_xsk::xskb_list_node") Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Nikhil P. Rao <nikhil.rao@amd.com> Link: https://patch.msgid.link/20260225000456.107806-2-nikhil.rao@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28	firmware: exynos-acpm: Drop fake 'const' on handle pointer	Krzysztof Kozlowski
	All the functions operating on the 'handle' pointer are claiming it is a pointer to const thus they should not modify the handle. In fact that's a false statement, because first thing these functions do is drop the cast to const with container_of: struct acpm_info *acpm = handle_to_acpm_info(handle); And with such cast the handle is easily writable with simple: acpm->handle.ops.pmic_ops.read_reg = NULL; The code is not correct logically, either, because functions like acpm_get_by_node() and acpm_handle_put() are meant to modify the handle reference counting, thus they must modify the handle. Modification here happens anyway, even if the reference counting is stored in the container which the handle is part of. The code does not have actual visible bug, but incorrect 'const' annotations could lead to incorrect compiler decisions. Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://patch.msgid.link/20260224104203.42950-2-krzysztof.kozlowski@oss.qualcomm.com Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-02-28	dt-bindings: clock: exynosautov920: add G3D clock definitions	Raghav Sharma
	Add device tree clock binding definitions for CMU_G3D Signed-off-by: Raghav Sharma <raghav.s@samsung.com> Link: https://patch.msgid.link/20260202103555.2089376-2-raghav.s@samsung.com Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-02-28	KVM: remove CONFIG_KVM_GENERIC_MMU_NOTIFIER	Paolo Bonzini
	All architectures now use MMU notifier for KVM page table management. Remove the Kconfig symbol and the code that is used when it is disabled. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-02-28	iio: tsl2772: fix all kernel-doc warnings	Randy Dunlap
	Use the correct kernel-doc notation for struct members to eliminate kernel-doc warnings: Warning: include/linux/platform_data/tsl2772.h:88 struct member 'prox_diode' not described in 'tsl2772_settings' Warning: include/linux/platform_data/tsl2772.h:88 struct member 'prox_power' not described in 'tsl2772_settings' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2026-02-28	io_uring: correct comment for IORING_SETUP_TASKRUN_FLAG	Jens Axboe
	Sync with a recent liburing fix, which corrects the comment explaining when the IORING_SETUP_TASKRUN_FLAG setup flag is valid to use. May be use with COOP_TASKRUN or DEFER_TASKRUN, not useful without either of this task_work mechanisms being used. Link: https://github.com/axboe/liburing/pull/1543 Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-28	ALSA: control: Verify put() result when in debug mode	Cezary Rojewski
	The put() operation is expected to return: 1) 0 on success if no changes were made 2) 1 on success if changes were made 3) error code otherwise Currently 2) is usually ignored when writing control-operations. While forcing compliance is not an option right now, make it easier for developers to adhere to the expectations and notice problems by logging them when CONFIG_SND_CTL_DEBUG is enabled. Due to large size of struct snd_ctl_elem_value, 'value_buf' is provided as a reusable buffer for kctl->put() verification. This prevents exhausting the stack when verifying the operation. >From user perspective, patch introduces a new trace/events category 'snd_ctl' containing a single 'snd_ctl_put' event type. Log sample: amixer-1086 [003] ..... 8.035939: snd_ctl_put: success: expected=0, actual=0 for ctl numid=1, iface=MIXER, name='Master Playback Volume', index=0, device=0, subdevice=0, card=0 amixer-1087 [003] ..... 8.938721: snd_ctl_put: success: expected=1, actual=1 for ctl numid=1, iface=MIXER, name='Master Playback Volume', index=0, device=0, subdevice=0, card=0 amixer-1088 [003] ..... 9.631470: snd_ctl_put: success: expected=1, actual=1 for ctl numid=1, iface=MIXER, name='Master Playback Volume', index=0, device=0, subdevice=0, card=0 amixer-1089 [000] ..... 9.636786: snd_ctl_put: fail: expected=1, actual=0 for ctl numid=5, iface=MIXER, name='Loopback Mute', index=0, device=0, subdevice=0, card=0 Signed-off-by: Cezary Rojewski <cezary.rojewski@intel.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Jaroslav Kysela <perex@perex.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20260224205619.584795-1-cezary.rojewski@intel.com
2026-02-28	ALSA: hda/tas2781: A workaround solution to lower-vol issue among lower ↵	Shenghao Ding
	calibrated-impedance micro-speaker on TAS2781 On TAS2781, if the Speaker calibrated impedance is lower than default value hard-coded inside the TAS2781, it will cuase vol lower than normal. In order to fix this issue, the parameter of SineGainI need updating. Signed-off-by: Shenghao Ding <shenghao-ding@ti.com> Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev> Link: https://patch.msgid.link/20260227144641.1243-1-shenghao-ding@ti.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2026-02-27	net: usb: r8152: add TRENDnet TUC-ET2G	Valentin Spreckels
	The TRENDnet TUC-ET2G is a RTL8156 based usb ethernet adapter. Add its vendor and product IDs. Signed-off-by: Valentin Spreckels <valentin@spreckels.dev> Link: https://patch.msgid.link/20260226195409.7891-2-valentin@spreckels.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27	NFC: fix header file kernel-doc warnings	Randy Dunlap
	Repair some of the comments: - use the correct enum names - don't use "/**" for a non-kernel-doc comment to fix these warnings: Warning: include/uapi/linux/nfc.h:127 Excess enum value '@NFC_EVENT_DEVICE_DEACTIVATED' description in 'nfc_commands' Warning: include/uapi/linux/nfc.h:204 Excess enum value '@NFC_ATTR_APDU' description in 'nfc_attrs' Warning: include/uapi/linux/nfc.h:302 expecting prototype for Pseudo(). Prototype was for NFC_RAW_HEADER_SIZE() instead Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260226221004.1037909-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27	net: inline skb_add_rx_frag_netmem()	Eric Dumazet
	This critical helper (via skb_add_rx_frag()) is mostly used from drivers rx fast path. It is time to inline it, this actually saves space in vmlinux: size vmlinux.old vmlinux text data bss dec hex filename 37350766 23092977 4846992 65290735 3e441ef vmlinux.old 37350600 23092977 4846992 65290569 3e44149 vmlinux Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260226041213.1892561-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27	net/sched: Only allow act_ct to bind to clsact/ingress qdiscs and shared blocks	Victor Nogueira
	As Paolo said earlier [1]: "Since the blamed commit below, classify can return TC_ACT_CONSUMED while the current skb being held by the defragmentation engine. As reported by GangMin Kim, if such packet is that may cause a UaF when the defrag engine later on tries to tuch again such packet." act_ct was never meant to be used in the egress path, however some users are attaching it to egress today [2]. Attempting to reach a middle ground, we noticed that, while most qdiscs are not handling TC_ACT_CONSUMED, clsact/ingress qdiscs are. With that in mind, we address the issue by only allowing act_ct to bind to clsact/ingress qdiscs and shared blocks. That way it's still possible to attach act_ct to egress (albeit only with clsact). [1] https://lore.kernel.org/netdev/674b8cbfc385c6f37fb29a1de08d8fe5c2b0fbee.1771321118.git.pabeni@redhat.com/ [2] https://lore.kernel.org/netdev/cc6bfb4a-4a2b-42d8-b9ce-7ef6644fb22b@ovn.org/ Reported-by: GangMin Kim <km.kim1503@gmail.com> Fixes: 3f14b377d01d ("net/sched: act_ct: fix skb leak and crash on ooo frags") CC: stable@vger.kernel.org Signed-off-by: Victor Nogueira <victor@mojatatu.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260225134349.1287037-1-victor@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27	net: use try_cmpxchg() in lock_sock_nested()	Eric Dumazet
	Add a fast path in lock_sock_nested(), to avoid acquiring the socket spinlock only to set @owned to one: spin_lock_bh(&sk->sk_lock.slock); if (unlikely(sock_owned_by_user_nocheck(sk))) __lock_sock(sk); sk->sk_lock.owned = 1; spin_unlock_bh(&sk->sk_lock.slock); On x86_64 compiler generates something quite efficient: 00000000000077c0 <lock_sock_nested>: 77c0: f3 0f 1e fa endbr64 77c4: e8 00 00 00 00 call __fentry__ 77c9: b9 01 00 00 00 mov $0x1,%ecx 77ce: 31 c0 xor %eax,%eax 77d0: f0 48 0f b1 8f 48 01 00 00 lock cmpxchg %rcx,0x148(%rdi) 77d9: 75 06 jne slow_path 77db: 2e e9 00 00 00 00 cs jmp __x86_return_thunk-0x4 slow_path: ... Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20260226021215.1764237-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27	net: phy: micrel: Add support for lan9645x internal phy	Jens Emil Schulz Østergaard
	LAN9645X is a family of switch chips with 5 internal copper phys. The internal PHY is based on parts of LAN8832. This is a low-power, single port triple-speed (10BASE-T/100BASE-TX/1000BASE-T) ethernet physical layer transceiver (PHY) that supports transmission and reception of data on standard CAT-5, as well as CAT-5e and CAT-6 Unshielded Twisted Pair (UTP) cables. Add support for the internal PHY of the lan9645x chip family. Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Signed-off-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260226-phy_micrel_add_support_for_lan9645x_internal_phy-v3-1-1fe82379962b@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27	inet: annotate data-races around isk->inet_num	Eric Dumazet
	UDP/TCP lookups are using RCU, thus isk->inet_num accesses should use READ_ONCE() and WRITE_ONCE() where needed. Fixes: 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260225203545.1512417-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27	bpf: Introduce tnum_step to step through tnum's members	Harishankar Vishwanathan
	This commit introduces tnum_step(), a function that, when given t, and a number z returns the smallest member of t larger than z. The number z must be greater or equal to the smallest member of t and less than the largest member of t. The first step is to compute j, a number that keeps all of t's known bits, and matches all unknown bits to z's bits. Since j is a member of the t, it is already a candidate for result. However, we want our result to be (minimally) greater than z. There are only two possible cases: (1) Case j <= z. In this case, we want to increase the value of j and make it > z. (2) Case j > z. In this case, we want to decrease the value of j while keeping it > z. (Case 1) j <= z t = xx11x0x0 z = 10111101 (189) j = 10111000 (184) ^ k (Case 1.1) Let's first consider the case where j < z. We will address j == z later. Since z > j, there had to be a bit position that was 1 in z and a 0 in j, beyond which all positions of higher significance are equal in j and z. Further, this position could not have been unknown in a, because the unknown positions of a match z. This position had to be a 1 in z and known 0 in t. Let k be position of the most significant 1-to-0 flip. In our example, k = 3 (starting the count at 1 at the least significant bit). Setting (to 1) the unknown bits of t in positions of significance smaller than k will not produce a result > z. Hence, we must set/unset the unknown bits at positions of significance higher than k. Specifically, we look for the next larger combination of 1s and 0s to place in those positions, relative to the combination that exists in z. We can achieve this by concatenating bits at unknown positions of t into an integer, adding 1, and writing the bits of that result back into the corresponding bit positions previously extracted from z. >From our example, considering only positions of significance greater than k: t = xx..x z = 10..1 + 1 ----- 11..0 This is the exact combination 1s and 0s we need at the unknown bits of t in positions of significance greater than k. Further, our result must only increase the value minimally above z. Hence, unknown bits in positions of significance smaller than k should remain 0. We finally have, result = 11110000 (240) (Case 1.2) Now consider the case when j = z, for example t = 1x1x0xxx z = 10110100 (180) j = 10110100 (180) Matching the unknown bits of the t to the bits of z yielded exactly z. To produce a number greater than z, we must set/unset the unknown bits in t, and all the unknown bits of t candidates for being set/unset. We can do this similar to Case 1.1, by adding 1 to the bits extracted from the masked bit positions of z. Essentially, this case is equivalent to Case 1.1, with k = 0. t = 1x1x0xxx z = .0.1.100 + 1 --------- .0.1.101 This is the exact combination of bits needed in the unknown positions of t. After recalling the known positions of t, we get result = 10110101 (181) (Case 2) j > z t = x00010x1 z = 10000010 (130) j = 10001011 (139) ^ k Since j > z, there had to be a bit position which was 0 in z, and a 1 in j, beyond which all positions of higher significance are equal in j and z. This position had to be a 0 in z and known 1 in t. Let k be the position of the most significant 0-to-1 flip. In our example, k = 4. Because of the 0-to-1 flip at position k, a member of t can become greater than z if the bits in positions greater than k are themselves >= to z. To make that member minimally greater than z, the bits in positions greater than k must be exactly = z. Hence, we simply match all of t's unknown bits in positions more significant than k to z's bits. In positions less significant than k, we set all t's unknown bits to 0 to retain minimality. In our example, in positions of greater significance than k (=4), t=x000. These positions are matched with z (1000) to produce 1000. In positions of lower significance than k, t=10x1. All unknown bits are set to 0 to produce 1001. The final result is: result = 10001001 (137) This concludes the computation for a result > z that is a member of t. The procedure for tnum_step() in this commit implements the idea described above. As a proof of correctness, we verified the algorithm against a logical specification of tnum_step. The specification asserts the following about the inputs t, z and output res that: 1. res is a member of t, and 2. res is strictly greater than z, and 3. there does not exist another value res2 such that 3a. res2 is also a member of t, and 3b. res2 is greater than z 3c. res2 is smaller than res We checked the implementation against this logical specification using an SMT solver. The verification formula in SMTLIB format is available at [1]. The verification returned an "unsat": indicating that no input assignment exists for which the implementation and the specification produce different outputs. In addition, we also automatically generated the logical encoding of the C implementation using Agni [2] and verified it against the same specification. This verification also returned an "unsat", confirming that the implementation is equivalent to the specification. The formula for this check is also available at [3]. Link: https://pastebin.com/raw/2eRWbiit [1] Link: https://github.com/bpfverif/agni [2] Link: https://pastebin.com/raw/EztVbBJ2 [3] Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu> Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu> Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu> Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu> Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com> Link: https://lore.kernel.org/r/93fdf71910411c0f19e282ba6d03b4c65f9c5d73.1772225741.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-27	net/sched: act_gate: snapshot parameters with RCU on replace	Paul Moses
	The gate action can be replaced while the hrtimer callback or dump path is walking the schedule list. Convert the parameters to an RCU-protected snapshot and swap updates under tcf_lock, freeing the previous snapshot via call_rcu(). When REPLACE omits the entry list, preserve the existing schedule so the effective state is unchanged. Fixes: a51c328df310 ("net: qos: introduce a gate control flow action") Cc: stable@vger.kernel.org Signed-off-by: Paul Moses <p@1g4.org> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Victor Nogueira <victor@mojatatu.com> Link: https://patch.msgid.link/20260223150512.2251594-2-p@1g4.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27	bpf: Lose const-ness of map in map_check_btf()	Kumar Kartikeya Dwivedi
	BPF hash map may now use the map_check_btf() callback to decide whether to set a dtor on its bpf_mem_alloc or not. Unlike C++ where members can opt out of const-ness using mutable, we must lose the const qualifier on the callback such that we can avoid the ugly cast. Make the change and adjust all existing users, and lose the comment in hashtab.c. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260227224806.646888-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-27	bpf: Register dtor for freeing special fields	Kumar Kartikeya Dwivedi
	There is a race window where BPF hash map elements can leak special fields if the program with access to the map value recreates these special fields between the check_and_free_fields done on the map value and its eventual return to the memory allocator. Several ways were explored prior to this patch, most notably [0] tried to use a poison value to reject attempts to recreate special fields for map values that have been logically deleted but still accessible to BPF programs (either while sitting in the free list or when reused). While this approach works well for task work, timers, wq, etc., it is harder to apply the idea to kptrs, which have a similar race and failure mode. Instead, we change bpf_mem_alloc to allow registering destructor for allocated elements, such that when they are returned to the allocator, any special fields created while they were accessible to programs in the mean time will be freed. If these values get reused, we do not free the fields again before handing the element back. The special fields thus may remain initialized while the map value sits in a free list. When bpf_mem_alloc is retired in the future, a similar concept can be introduced to kmalloc_nolock-backed kmem_cache, paired with the existing idea of a constructor. Note that the destructor registration happens in map_check_btf, after the BTF record is populated and (at that point) avaiable for inspection and duplication. Duplication is necessary since the freeing of embedded bpf_mem_alloc can be decoupled from actual map lifetime due to logic introduced to reduce the cost of rcu_barrier()s in mem alloc free path in 9f2c6e96c65e ("bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc."). As such, once all callbacks are done, we must also free the duplicated record. To remove dependency on the bpf_map itself, also stash the key size of the map to obtain value from htab_elem long after the map is gone. [0]: https://lore.kernel.org/bpf/20260216131341.1285427-1-mykyta.yatsenko5@gmail.com Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr") Fixes: 1bfbc267ec91 ("bpf: Enable bpf_timer and bpf_wq in any context") Reported-by: Alexei Starovoitov <ast@kernel.org> Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260227224806.646888-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-27	Merge tag 'pci-v7.0-fixes-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fixes from Bjorn Helgaas: - Update MAINTAINERS email address (Shawn Guo) - Refresh cached Endpoint driver MSI Message Address to fix a v7.0 regression when kernel changes the address after firmware has configured it (Niklas Cassel) - Flush Endpoint MSI-X writes so they complete before the outbound ATU entry is unmapped (Niklas Cassel) - Correct the PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value, which broke VMM use of PCI capabilities (Bjorn Helgaas) * tag 'pci-v7.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: Correct PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value PCI: dwc: ep: Flush MSI-X write before unmapping its ATU entry PCI: dwc: ep: Refresh MSI Message Address cache on change MAINTAINERS: Update Shawn Guo's address for HiSilicon PCIe controller driver
2026-02-27	nsfs: tighten permission checks for ns iteration ioctls	Christian Brauner
	Even privileged services should not necessarily be able to see other privileged service's namespaces so they can't leak information to each other. Use may_see_all_namespaces() helper that centralizes this policy until the nstree adapts. Link: https://patch.msgid.link/20260226-work-visibility-fixes-v1-1-d2c2853313bd@kernel.org Fixes: a1d220d9dafa ("nsfs: iterate through mount namespaces") Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@kernel.org # v6.12+ Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-27	ACPI: CPPC: add APIs and sysfs interface for perf_limited	Sumit Gupta
	Add sysfs interface to read/write the Performance Limited register. The Performance Limited register indicates to the OS that an unpredictable event (like thermal throttling) has limited processor performance. It contains two sticky bits set by the platform: - Bit 0 (Desired_Excursion): Set when delivered performance is constrained below desired performance. Not used when Autonomous Selection is enabled. - Bit 1 (Minimum_Excursion): Set when delivered performance is constrained below minimum performance. These bits remain set until OSPM explicitly clears them. The write operation accepts a bitmask of bits to clear: - Write 0x1 to clear bit 0 - Write 0x2 to clear bit 1 - Write 0x3 to clear both bits This enables users to detect if platform throttling impacted a workload. Users clear the register before execution, run the workload, then check afterward - if set, hardware throttling occurred during that time window. The interface is exposed as: /sys/devices/system/cpu/cpuX/cpufreq/perf_limited Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20260206142658.72583-7-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-02-27	ACPI: CPPC: Add cppc_get_perf() API to read performance controls	Sumit Gupta
	Add cppc_get_perf() function to read values of performance control registers including desired_perf, min_perf, max_perf, energy_perf, and auto_sel. This provides a read interface to complement the existing cppc_set_perf() write interface for performance control registers. Note that auto_sel is read by cppc_get_perf() but not written by cppc_set_perf() to avoid unintended mode changes during performance updates. It can be updated with existing dedicated cppc_set_auto_sel() API. Use cppc_get_perf() in cppc_cpufreq_get_cpu_data() to initialize perf_ctrls with current hardware register values during cpufreq policy initialization. Signed-off-by: Sumit Gupta <sumitg@nvidia.com> Reviewed-by: Pierre Gondois <pierre.gondois@arm.com> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20260206142658.72583-2-sumitg@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-02-27	Merge tag 'mmc-v7.0-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - Avoid bitfield RMW for claim/retune flags MMC host: - dw_mmc-rockchip: Fix runtime PM support for internal phase support - mmci: Fix device_node reference leak in of_get_dml_pipe_index() - sdhci-brcmstb: Use correct register offset for V1 pin_sel restore" * tag 'mmc-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: core: Avoid bitfield RMW for claim/retune flags mmc: sdhci-brcmstb: use correct register offset for V1 pin_sel restore mmc: dw_mmc-rockchip: Fix runtime PM support for internal phase support mmc: mmci: Fix device_node reference leak in of_get_dml_pipe_index()
2026-02-27	Merge tag 'slab-for-7.0-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fixes from Vlastimil Babka: - Fix for spurious page allocation warnings on sheaf refill (Harry Yoo) - Fix for CONFIG_MEM_ALLOC_PROFILING_DEBUG warnings (Suren Baghdasaryan) - Fix for kernel-doc warning on ksize() (Sanjay Chitroda) - Fix to avoid setting slab->stride later than on slab allocation. Doesn't yet fix the reports from powerpc; debugging is making progress (Harry Yoo) * tag 'slab-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab: initialize slab->stride early to avoid memory ordering issues mm/slub: drop duplicate kernel-doc for ksize() mm/slab: mark alloc tags empty for sheaves allocated with __GFP_NO_OBJ_EXT mm/slab: pass __GFP_NOWARN to refill_sheaf() if fallback is available
2026-02-27	PCI/PTM: Do not enable PTM automatically for Root and Switch Upstream Ports	Mika Westerberg
	Currently we enable PTM automatically for Root and Switch Upstream Ports if the advertised capabilities support the relevant role. However, there are a few issues with this. First of all, if there is no Endpoint that actually needs the PTM functionality, this is just wasting link bandwidth. There are just a couple of drivers calling pci_ptm_enable() in the tree. Secondly, we do the enablement in pci_ptm_init() that is called pretty early for the Switch Upstream Port before Downstream Ports are even enumerated. Since the Upstream Port configuration affects the whole Switch, enabling it this early might cause PTM requests to be sent. We actually do see effects of this: pcieport 0000:00:07.1: pciehp: Slot(6-1): Card present pcieport 0000:00:07.1: pciehp: Slot(6-1): Link Up pci 0000:2c:00.0: [8086:5786] type 01 class 0x060400 PCIe Switch Upstream Port ... pci 0000:2c:00.0: PTM enabled, 4ns granularity At this point we have only enumerated the Switch Upstream Port and now PTM got enabled which immediately triggers a flood of errors: pcieport 0000:00:07.1: AER: Multiple Uncorrectable (Non-Fatal) error message received from 0000:00:07.1 pcieport 0000:00:07.1: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Receiver ID) pcieport 0000:00:07.1: device [8086:d44f] error status/mask=00200000/00000000 pcieport 0000:00:07.1: [21] ACSViol (First) pcieport 0000:00:07.1: AER: TLP Header: 0x34000000 0x00000052 0x00000000 0x00000000 pcieport 0000:00:07.1: AER: device recovery successful pcieport 0000:00:07.1: AER: Uncorrectable (Non-Fatal) error message received from 0000:00:07.1 In the above TLP Header the Requester ID is 0 which causes an error as we have ACS Source Validation enabled. Change the PTM enablement to happen at the time pci_enable_ptm() is called. It will try to enable PTM first for upstream devices before enabling for the Endpoint itself. For disable path we need to keep count of how many times PTM has been enabled and disable it only on the last, so change the dev->ptm_enabled to a counter (and rename it to dev->ptm_enable_cnt analogous to dev->pci_enable_cnt). Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20260224111044.3487873-6-mika.westerberg@linux.intel.com
2026-02-27	workqueue: Update documentation as per system_percpu_wq naming	Mallesh Koujalagi
	Update documentation to use "per-CPU workqueue" instead of "global workqueue" to match the system_wq to system_percpu_wq rename. The workqueue behavior remains unchanged; this just aligns terminology with the clearer naming. Fixes: a2be943b46b4 ("workqueue: replace use of system_wq with system_percpu_wq") Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-02-27	Merge tag 'drm-fixes-2026-02-27' of https://gitlab.freedesktop.org/drm/kernel	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Regular fixes pull, amdxdna and amdgpu are the main ones, with a couple of intel fixes, then a scattering of fixes across drivers, nothing too major. i915/display: - Fix Panel Replay stuck with X during mode transitions on Panther Lake xe: - W/a fix for multi-cast registers - Fix xe_sync initialization issues amdgpu: - UserQ fixes - DC fix - RAS fixes - VCN 5 fix - Slot reset fix - Remove MES workaround that's no longer needed amdxdna: - deadlock fix - NULL ptr deref fix - suspend failure fix - OOB access fix - buffer overflow fix - input sanitiation fix - firmware loading fix dw-dp: - An error handling fix ethosu: - A binary shift overflow fix imx: - An error handling fix logicvc: - A dt node reference leak fix nouveau: - A WARN_ON removal samsung-dsim: - A memory leak fix tiny: - sharp-memory: NULL pointer deref fix vmwgfx: - A reference count and error handling fix" * tag 'drm-fixes-2026-02-27' of https://gitlab.freedesktop.org/drm/kernel: (39 commits) drm/amd: Disable MES LR compute W/A drm/amdgpu: Fix error handling in slot reset drm/amdgpu/vcn5: Add SMU dpm interface type drm/amdgpu: Fix locking bugs in error paths drm/amdgpu: Unlock a mutex before destroying it drm/amd/display: Use GFP_ATOMIC in dc_create_stream_for_sink drm/amdgpu: add upper bound check on user inputs in wait ioctl drm/amdgpu: add upper bound check on user inputs in signal ioctl drm/amdgpu/userq: Do not allow userspace to trivially triger kernel warnings drm/amdgpu/userq: Fix reference leak in amdgpu_userq_wait_ioctl accel/amdxdna: Use a different name for latest firmware drm/client: Do not destroy NULL modes drm/gpusvm: Fix drm_gpusvm_pages_valid_unlocked() kernel-doc drm/xe/sync: Fix user fence leak on alloc failure drm/xe/sync: Cleanup partially initialized sync on parse failure drm/xe/wa: Steer RMW of MCR registers while building default LRC accel/amdxdna: Validate command buffer payload count accel/amdxdna: Prevent ubuf size overflow accel/amdxdna: Fix out-of-bounds memset in command slot handling accel/amdxdna: Fix command hang on suspended hardware context ...
2026-02-27	PCI: Correct PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value	Bjorn Helgaas
	fb82437fdd8c ("PCI: Change capability register offsets to hex") incorrectly converted the PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value from decimal 52 to hex 0x32: -#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52 /* v2 endpoints with link end here / +#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 0x32 / end of v2 EPs w/ link */ This broke PCI capabilities in a VMM because subsequent ones weren't DWORD-aligned. Change PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 to the correct value of 0x34. fb82437fdd8c was from Baruch Siach <baruch@tkos.co.il>, but this was not Baruch's fault; it's a mistake I made when applying the patch. Fixes: fb82437fdd8c ("PCI: Change capability register offsets to hex") Reported-by: David Woodhouse <dwmw2@infradead.org> Closes: https://lore.kernel.org/all/3ae392a0158e9d9ab09a1d42150429dd8ca42791.camel@infradead.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2026-02-27	ww-mutex: Fix the ww_acquire_ctx function annotations	Bart Van Assche
	The ww_acquire_done() call is optional. Reflect this in the annotations of ww_acquire_done(). Fixes: 47907461e4f6 ("locking/ww_mutex: Support Clang's context analysis") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Marco Elver <elver@google.com> Link: https://patch.msgid.link/20260225183244.4035378-4-bvanassche@acm.org
2026-02-27	signal: Fix the lock_task_sighand() annotation	Bart Van Assche
	lock_task_sighand() may return NULL. Make this clear in its lock context annotation. Fixes: 04e49d926f43 ("sched: Enable context analysis for core.c and fair.c") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Marco Elver <elver@google.com> Link: https://patch.msgid.link/20260225183244.4035378-3-bvanassche@acm.org
2026-02-27	locking: Fix rwlock and spinlock lock context annotations	Bart Van Assche
	Fix two incorrect rwlock_t lock context annotations. Add the raw_spinlock_t lock context annotations that are missing. Fixes: f16a802d402d ("locking/rwlock, spinlock: Support Clang's context analysis") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Marco Elver <elver@google.com> Link: https://patch.msgid.link/20260225183244.4035378-2-bvanassche@acm.org
2026-02-27	hrtimer: Use linked timerqueue	Thomas Gleixner
	To prepare for optimizing the rearming of enqueued timers, switch to the linked timerqueue. That allows to check whether the new expiry time changes the position of the timer in the RB tree or not, by checking the new expiry time against the previous and the next timers expiry. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163431.806643179@kernel.org
2026-02-27	timerqueue: Provide linked timerqueue	Thomas Gleixner
	The hrtimer subsystem wants to peak ahead to the next and previous timer to evaluated whether a to be rearmed timer can stay at the same position in the RB tree with the new expiry time. The linked RB tree provides the infrastructure for this as it maintains links to the previous and next nodes for each entry in the tree. Provide timerqueue wrappers around that. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163431.734827095@kernel.org
2026-02-27	rbtree: Provide rbtree with links	Thomas Gleixner
	Some RB tree users require quick access to the next and the previous node, e.g. to check whether a modification of the node results in a change of the nodes position in the tree. If the node position does not change, then the modification can happen in place without going through a full enqueue requeue cycle. A upcoming use case for this are the timer queues of the hrtimer subsystem as they can optimize for timers which are frequently rearmed while enqueued. This can be obviously achieved with rb_next() and rb_prev(), but those turned out to be quite expensive for hotpath operations depending on the tree depth. Add a linked RB tree variant where add() and erase() maintain the links between the nodes. Like the cached variant it provides a pointer to the left most node in the root. It intentionally does not use a [h]list head as there is no real need for true list operations as the list is strictly coupled to the tree and and cannot be manipulated independently. It sets the nodes previous pointer to NULL for the left most node and the next pointer to NULL for the right most node. This allows a quick check especially for the left most node without consulting the list head address, which creates better code. Aside of the rb_leftmost cached pointer this could trivially provide a rb_rightmost pointer as well, but there is no usage for that (yet). Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163431.668401024@kernel.org
2026-02-27	hrtimer: Keep track of first expiring timer per clock base	Thomas Gleixner
	Evaluating the next expiry time of all clock bases is cache line expensive as the expiry time of the first expiring timer is not cached in the base and requires to access the timer itself, which is definitely in a different cache line. It's way more efficient to keep track of the expiry time on enqueue and dequeue operations as the relevant data is already in the cache at that point. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163431.404839710@kernel.org
2026-02-27	hrtimer: Avoid re-evaluation when nothing changed	Thomas Gleixner
	Most times there is no change between hrtimer_interrupt() deferring the rearm and the invocation of hrtimer_rearm_deferred(). In those cases it's a pointless exercise to re-evaluate the next expiring timer. Cache the required data and use it if nothing changed. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163431.338569372@kernel.org
2026-02-27	hrtimer: Push reprogramming timers into the interrupt return path	Peter Zijlstra
	Currently hrtimer_interrupt() runs expired timers, which can re-arm themselves, after which it computes the next expiration time and re-programs the hardware. However, things like HRTICK, a highres timer driving preemption, cannot re-arm itself at the point of running, since the next task has not been determined yet. The schedule() in the interrupt return path will switch to the next task, which then causes a new hrtimer to be programmed. This then results in reprogramming the hardware at least twice, once after running the timers, and once upon selecting the new task. Notably, both events happen in the interrupt. By pushing the hrtimer reprogram all the way into the interrupt return path, it runs after schedule() picks the new task and the double reprogram can be avoided. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163431.273488269@kernel.org