summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2025-10-16net: Introduce net.core.bypass_prot_mem sysctl.Kuniyuki Iwashima
If a socket has sk->sk_bypass_prot_mem flagged, the socket opts out of the global protocol memory accounting. Let's control the flag by a new sysctl knob. The flag is written once during socket(2) and is inherited to child sockets. Tested with a script that creates local socket pairs and send()s a bunch of data without recv()ing. Setup: # mkdir /sys/fs/cgroup/test # echo $$ >> /sys/fs/cgroup/test/cgroup.procs # sysctl -q net.ipv4.tcp_mem="1000 1000 1000" # ulimit -n 524288 Without net.core.bypass_prot_mem, charged to tcp_mem & memcg # python3 pressure.py & # cat /sys/fs/cgroup/test/memory.stat | grep sock sock 22642688 <-------------------------------------- charged to memcg # cat /proc/net/sockstat| grep TCP TCP: inuse 2006 orphan 0 tw 0 alloc 2008 mem 5376 <-- charged to tcp_mem # ss -tn | head -n 5 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 2000 0 127.0.0.1:34479 127.0.0.1:53188 ESTAB 2000 0 127.0.0.1:34479 127.0.0.1:49972 ESTAB 2000 0 127.0.0.1:34479 127.0.0.1:53868 ESTAB 2000 0 127.0.0.1:34479 127.0.0.1:53554 # nstat | grep Pressure || echo no pressure TcpExtTCPMemoryPressures 1 0.0 With net.core.bypass_prot_mem=1, charged to memcg only: # sysctl -q net.core.bypass_prot_mem=1 # python3 pressure.py & # cat /sys/fs/cgroup/test/memory.stat | grep sock sock 2757468160 <------------------------------------ charged to memcg # cat /proc/net/sockstat | grep TCP TCP: inuse 2006 orphan 0 tw 0 alloc 2008 mem 0 <- NOT charged to tcp_mem # ss -tn | head -n 5 State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 111000 0 127.0.0.1:36019 127.0.0.1:49026 ESTAB 110000 0 127.0.0.1:36019 127.0.0.1:45630 ESTAB 110000 0 127.0.0.1:36019 127.0.0.1:44870 ESTAB 111000 0 127.0.0.1:36019 127.0.0.1:45274 # nstat | grep Pressure || echo no pressure no pressure Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://patch.msgid.link/20251014235604.3057003-4-kuniyu@google.com
2025-10-16net: Allow opt-out from global protocol memory accounting.Kuniyuki Iwashima
Some protocols (e.g., TCP, UDP) implement memory accounting for socket buffers and charge memory to per-protocol global counters pointed to by sk->sk_proto->memory_allocated. Sometimes, system processes do not want that limitation. For a similar purpose, there is SO_RESERVE_MEM for sockets under memcg. Also, by opting out of the per-protocol accounting, sockets under memcg can avoid paying costs for two orthogonal memory accounting mechanisms. A microbenchmark result is in the subsequent bpf patch. Let's allow opt-out from the per-protocol memory accounting if sk->sk_bypass_prot_mem is true. sk->sk_bypass_prot_mem and sk->sk_prot are placed in the same cache line, and sk_has_account() always fetches sk->sk_prot before accessing sk->sk_bypass_prot_mem, so there is no extra cache miss for this patch. The following patches will set sk->sk_bypass_prot_mem to true, and then, the per-protocol memory accounting will be skipped. Note that this does NOT disable memcg, but rather the per-protocol one. Another option not to use the hole in struct sock_common is create sk_prot variants like tcp_prot_bypass, but this would complicate SOCKMAP logic, tcp_bpf_prots etc. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Link: https://patch.msgid.link/20251014235604.3057003-3-kuniyu@google.com
2025-10-16sched_ext: Merge branch 'sched/core' of ↵Tejun Heo
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-6.19 Pull in tip/sched/core to receive: 50653216e4ff ("sched: Add support to pick functions to take rf") 4c95380701f5 ("sched/ext: Fold balance_scx() into pick_task_scx()") which will enable clean integration of DL server support among other things. This conflicts with the following from sched_ext/for-6.18-fixes: a8ad873113d3 ("sched_ext: defer queue_balance_callback() until after ops.dispatch") which adds maybe_queue_balance_callback() to balance_scx() which is removed by 50653216e4ff. Resolve by moving the invocation to pick_task_scx() in the equivalent location. Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.18-rc2). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-16Merge tag 'net-6.18-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from CAN Current release - regressions: - udp: do not use skb_release_head_state() before skb_attempt_defer_free() - gro_cells: use nested-BH locking for gro_cell - dpll: zl3073x: increase maximum size of flash utility Previous releases - regressions: - core: fix lockdep splat on device unregister - tcp: fix tcp_tso_should_defer() vs large RTT - tls: - don't rely on tx_work during send() - wait for pending async decryptions if tls_strp_msg_hold fails - can: j1939: add missing calls in NETDEV_UNREGISTER notification handler - eth: lan78xx: fix lost EEPROM write timeout in lan78xx_write_raw_eeprom Previous releases - always broken: - ip6_tunnel: prevent perpetual tunnel growth - dpll: zl3073x: handle missing or corrupted flash configuration - can: m_can: fix pm_runtime and CAN state handling - eth: - ixgbe: fix too early devlink_free() in ixgbe_remove() - ixgbevf: fix mailbox API compatibility - gve: Check valid ts bit on RX descriptor before hw timestamping - idpf: cleanup remaining SKBs in PTP flows - r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H" * tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits) udp: do not use skb_release_head_state() before skb_attempt_defer_free() net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset netdevsim: set the carrier when the device goes up selftests: tls: add test for short splice due to full skmsg selftests: net: tls: add tests for cmsg vs MSG_MORE tls: don't rely on tx_work during send() tls: wait for pending async decryptions if tls_strp_msg_hold fails tls: always set record_type in tls_process_cmsg tls: wait for async encrypt in case of error during latter iterations of sendmsg tls: trim encrypted message to match the plaintext on short splice tg3: prevent use of uninitialized remote_adv and local_adv variables MAINTAINERS: new entry for IPv6 IOAM gve: Check valid ts bit on RX descriptor before hw timestamping net: core: fix lockdep splat on device unregister MAINTAINERS: add myself as maintainer for b53 selftests: net: check jq command is supported net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit() tcp: fix tcp_tso_should_defer() vs large RTT r8152: add error handling in rtl8152_driver_init usbnet: Fix using smp_processor_id() in preemptible code warnings ...
2025-10-16accel/amdxdna: Support getting last hardware errorLizhi Hou
Add new parameter DRM_AMDXDNA_HW_LAST_ASYNC_ERR to get array IOCTL. When hardware reports an error, the driver save the error information and timestamp. This new get array parameter retrieves the last error. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com> Link: https://lore.kernel.org/r/20251014234119.628453-1-lizhi.hou@amd.com
2025-10-16irqchip: Pass platform device to platform driversJohan Hovold
The IRQCHIP_PLATFORM_DRIVER macros can be used to convert OF irqchip drivers to platform drivers but currently reuse the OF init callback prototype that only takes OF nodes as arguments. This forces drivers to do reverse lookups of their struct devices during probe if they need them for things like dev_printk() and device managed resources. Half of the drivers doing reverse lookups also currently fail to release the additional reference taken during the lookup, while other drivers have had the reference leak plugged in various ways (e.g. using non-intuitive cleanup constructs which still confuse static checkers). Switch to using a probe callback that takes a platform device as its first argument to simplify drivers and plug the remaining (mostly benign) reference leaks. Fixes: 32c6c054661a ("irqchip: Add Broadcom BCM2712 MSI-X interrupt controller") Fixes: 70afdab904d2 ("irqchip: Add IMX MU MSI controller driver") Fixes: a6199bb514d8 ("irqchip: Add Qualcomm MPM controller driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Changhuang Liang <changhuang.liang@starfivetech.com>
2025-10-16gpiolib: of: Get rid of <linux/gpio/legacy-of-mm-gpiochip.h>Christophe Leroy
Last user of linux/gpio/legacy-of-mm-gpiochip.h is gone. Remove linux/gpio/legacy-of-mm-gpiochip.h and CONFIG_OF_GPIO_MM_GPIOCHIP Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-10-16gpio: regmap: add the .fixed_direction_output configuration parameterIoana Ciornei
There are GPIO controllers such as the one present in the LX2160ARDB QIXIS FPGA which have fixed-direction input and output GPIO lines mixed together in a single register. This cannot be modeled using the gpio-regmap as-is since there is no way to present the true direction of a GPIO line. In order to make this use case possible, add a new configuration parameter - fixed_direction_output - into the gpio_regmap_config structure. This will enable user drivers to provide a bitmap that represents the fixed direction of the GPIO lines. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Michael Walle <mwalle@kernel.org> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
2025-10-16dt-bindings: power: Add power domain IDs for Tegra264Thierry Reding
Add the set of power domain IDs available on the Tegra264 SoC so that they can be used in device tree files. Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-10-16sched: Add support to pick functions to take rfJoel Fernandes
Some pick functions like the internal pick_next_task_fair() already take rf but some others dont. We need this for scx's server pick function. Prepare for this by having pick functions accept it. [peterz: - added RETRY_TASK handling - removed pick_next_task_fair indirection] Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org>
2025-10-16sched: Rename do_set_cpus_allowed()Peter Zijlstra
Hopefully saner naming. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
2025-10-16sched: Employ sched_change guardsPeter Zijlstra
As proposed a long while ago -- and half done by scx -- wrap the scheduler's 'change' pattern in a guard helper. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
2025-10-15net: remove obsolete WARN_ON(refcount_read(&sk->sk_refcnt) == 1)Eric Dumazet
sk->sk_refcnt has been converted to refcount_t in 2017. __sock_put(sk) being refcount_dec(&sk->sk_refcnt), it will complain loudly if the current refcnt is 1 (or less) in a non racy way. We can remove four WARN_ON() in favor of the generic refcount_dec() check. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Xuanqiang Luo<luoxuanqiang@kylinos.cn> Link: https://patch.msgid.link/20251014140605.2982703-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15drm/bridge: dw-hdmi-qp: Fixup timer base setupCristian Ciocaltea
Currently the TIMER_BASE_CONFIG0 register gets initialized to a fixed value as initially found in vendor driver code supporting the RK3588 SoC. As a matter of fact the value matches the rate of the HDMI TX reference clock, which is roughly 428.57 MHz. However, on RK3576 SoC that rate is slightly lower, i.e. 396.00 MHz, and the incorrect register configuration breaks CEC functionality. Set the timer base according to the actual reference clock rate that shall be provided by the platform driver. Otherwise fallback to the vendor default. While at it, also drop the unnecessary empty lines in dw_hdmi_qp_init_hw(). Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250903-rk3588-hdmi-cec-v4-2-fa25163c4b08@collabora.com
2025-10-15drm/bridge: dw-hdmi-qp: Add CEC supportCristian Ciocaltea
Add support for the CEC interface of the Synopsys DesignWare HDMI QP TX controller. This is based on the downstream implementation, but rewritten on top of the CEC helpers added recently to the DRM HDMI connector framework. Also note struct dw_hdmi_qp_plat_data has been extended to include the CEC IRQ number to be provided by the platform driver. Co-developed-by: Algea Cao <algea.cao@rock-chips.com> Signed-off-by: Algea Cao <algea.cao@rock-chips.com> Co-developed-by: Derek Foreman <derek.foreman@collabora.com> Signed-off-by: Derek Foreman <derek.foreman@collabora.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Signed-off-by: Heiko Stuebner <heiko@sntech.de> Link: https://lore.kernel.org/r/20250903-rk3588-hdmi-cec-v4-1-fa25163c4b08@collabora.com
2025-10-15hung_task: fix warnings caused by unaligned lock pointersLance Yang
The blocker tracking mechanism assumes that lock pointers are at least 4-byte aligned to use their lower bits for type encoding. However, as reported by Eero Tamminen, some architectures like m68k only guarantee 2-byte alignment of 32-bit values. This breaks the assumption and causes two related WARN_ON_ONCE checks to trigger. To fix this, the runtime checks are adjusted to silently ignore any lock that is not 4-byte aligned, effectively disabling the feature in such cases and avoiding the related warnings. Thanks to Geert Uytterhoeven for bisecting! Link: https://lkml.kernel.org/r/20250909145243.17119-1-lance.yang@linux.dev Fixes: e711faaafbe5 ("hung_task: replace blocker_mutex with encoded blocker") Signed-off-by: Lance Yang <lance.yang@linux.dev> Reported-by: Eero Tamminen <oak@helsinkinet.fi> Closes: https://lore.kernel.org/lkml/CAMuHMdW7Ab13DdGs2acMQcix5ObJK0O2dG_Fxzr8_g58Rc1_0g@mail.gmail.com Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Finn Thain <fthain@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Lance Yang <lance.yang@linux.dev> Cc: Mingzhe Yang <mingzhe.yang@ly.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Yongliang Gao <leonylgao@tencent.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-15sched_ext: Add lockless peek operation for DSQsRyan Newton
The builtin DSQ queue data structures are meant to be used by a wide range of different sched_ext schedulers with different demands on these data structures. They might be per-cpu with low-contention, or high-contention shared queues. Unfortunately, DSQs have a coarse-grained lock around the whole data structure. Without going all the way to a lock-free, more scalable implementation, a small step we can take to reduce lock contention is to allow a lockless, small-fixed-cost peek at the head of the queue. This change allows certain custom SCX schedulers to cheaply peek at queues, e.g. during load balancing, before locking them. But it represents a few extra memory operations to update the pointer each time the DSQ is modified, including a memory barrier on ARM so the write appears correctly ordered. This commit adds a first_task pointer field which is updated atomically when the DSQ is modified, and allows any thread to peek at the head of the queue without holding the lock. Signed-off-by: Ryan Newton <newton@meta.com> Reviewed-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-15drm/gpuvm: Fix kernel-doc warning for drm_gpuvm_map_req.mapAnkan Biswas
The kernel-doc for struct drm_gpuvm_map_req.map was added as '@op_map' instead of '@map', leading to this warning during htmldocs build: WARNING: include/drm/drm_gpuvm.h:1083 struct member 'map' not described in 'drm_gpuvm_map_req' Fixes: 000a45dce7ad ("drm/gpuvm: Pass map arguments through a struct") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/all/20250821133539.03aa298e@canb.auug.org.au/ Signed-off-by: Ankan Biswas <spyjetfayed@gmail.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2025-10-15net: bcmgenet: remove unused platform codeHeiner Kallweit
This effectively reverts b0ba512e25d7 ("net: bcmgenet: enable driver to work without a device tree"). There has never been an in-tree user of struct bcmgenet_platform_data, all devices use OF or ACPI. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/108b4e64-55d4-4b4e-9a11-3c810c319d66@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15net: allow busy connected flows to switch tx queuesEric Dumazet
This is a followup of commit 726e9e8b94b9 ("tcp: refine skb->ooo_okay setting") and of prior commit in this series ("net: control skb->ooo_okay from skb_set_owner_w()") skb->ooo_okay might never be set for bulk flows that always have at least one skb in a qdisc queue of NIC queue, especially if TX completion is delayed because of a stressed cpu. The so-called "strange attractors" has caused many performance issues (see for instance 9b462d02d6dd ("tcp: TCP Small Queues and strange attractors")), we need to do better. We have tried very hard to avoid reorders because TCP was not dealing with them nicely a decade ago. Use the new net.core.txq_reselection_ms sysctl to let flows follow XPS and select a more efficient queue. After this patch, we no longer have to make sure threads are pinned to cpus, they now can be migrated without adding too much spinlock/qdisc/TX completion pressure anymore. TX completion part was problematic, because it added false sharing on various socket fields, but also added false sharing and spinlock contention in mm layers. Calling skb_orphan() from ndo_start_xmit() is not an option unfortunately. Note for later: 1) move sk->sk_tx_queue_mapping closer to sk_tx_queue_mapping_jiffies for better cache locality. 2) Study if 9b462d02d6dd ("tcp: TCP Small Queues and strange attractors") could be revised. Tested: Used a host with 32 TX queues, shared by groups of 8 cores. XPS setup : echo ff >/sys/class/net/eth1/queue/tx-0/xps_cpus echo ff00 >/sys/class/net/eth1/queue/tx-1/xps_cpus echo ff0000 >/sys/class/net/eth1/queue/tx-2/xps_cpus echo ff000000 >/sys/class/net/eth1/queue/tx-3/xps_cpus echo ff,00000000 >/sys/class/net/eth1/queue/tx-4/xps_cpus echo ff00,00000000 >/sys/class/net/eth1/queue/tx-5/xps_cpus echo ff0000,00000000 >/sys/class/net/eth1/queue/tx-6/xps_cpus echo ff000000,00000000 >/sys/class/net/eth1/queue/tx-7/xps_cpus ... Launched a tcp_stream with 15 threads and 1000 flows, initially affined to core 0-15 taskset -c 0-15 tcp_stream -T15 -F1000 -l1000 -c -H target_host Checked that only queues 0 and 1 are used as instructed by XPS : tc -s qdisc show dev eth1|grep backlog|grep -v "backlog 0b 0p" backlog 123489410b 1890p backlog 69809026b 1064p backlog 52401054b 805p Then force each thread to run on cpu 1,9,17,25,33,41,49,57,65,73,81,89,97,105,113,121 C=1;PID=`pidof tcp_stream`;for P in `ls /proc/$PID/task`; do taskset -pc $C $P; C=$(($C + 8));done Set txq_reselection_ms to 1000 echo 1000 > /proc/sys/net/core/txq_reselection_ms Check that the flows have migrated nicely: tc -s qdisc show dev eth1|grep backlog|grep -v "backlog 0b 0p" backlog 130508314b 1916p backlog 8584380b 126p backlog 8584380b 126p backlog 8379990b 123p backlog 8584380b 126p backlog 8487484b 125p backlog 8584380b 126p backlog 8448120b 124p backlog 8584380b 126p backlog 8720640b 128p backlog 8856900b 130p backlog 8584380b 126p backlog 8652510b 127p backlog 8448120b 124p backlog 8516250b 125p backlog 7834950b 115p Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251013152234.842065-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15net: add /proc/sys/net/core/txq_reselection_ms controlEric Dumazet
Add a new sysctl to control how often a queue reselection can happen even if a flow has a persistent queue of skbs in a Qdisc or NIC queue. A value of zero means the feature is disabled. Default is 1000 (1 second). This sysctl is used in the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251013152234.842065-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15net: add SK_WMEM_ALLOC_BIAS constantEric Dumazet
sk->sk_wmem_alloc is initialized to 1, and sk_wmem_alloc_get() takes care of this initial value. Add SK_WMEM_ALLOC_BIAS define to not spread this magic value. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251013152234.842065-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15tcp: better handle TCP_TX_DELAY on established flowsEric Dumazet
Some applications uses TCP_TX_DELAY socket option after TCP flow is established. Some metrics need to be updated, otherwise TCP might take time to adapt to the new (emulated) RTT. This patch adjusts tp->srtt_us, tp->rtt_min, icsk_rto and sk->sk_pacing_rate. This is best effort, and for instance icsk_rto is reset without taking backoff into account. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251013145926.833198-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-15drm/xe/uapi: Add documentation for DRM_XE_GEM_CREATE_FLAG_SCANOUTSanjay Yadav
Add documentation for drm_xe_gem_create structure flag DRM_XE_GEM_CREATE_FLAG_SCANOUT. Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20251014142823.3701228-2-sanjay.kumar.yadav@intel.com
2025-10-15ASoC: use sof_sdw as default Intel SOF SDW machineMark Brown
Merge series from Bard Liao <yung-chuan.liao@linux.intel.com>: Currently, we create a ACPI mach table for every new audio configuration. And all Intel SOF SoundWire configurations point to the same sof_sdw machine driver. Also, we don't need a specific topology for a coufguration, we can use the function topology instead. That give us a change to generate an ACPI mach table based on the SoundWire codec information reported by the ACPI table and use the sof_sdw machine driver as the default machine driver. This will reduce the effort to support a new Intel SOF SoundWire audio configuration.
2025-10-15bpf: Consistently use bpf_rcu_lock_held() everywhereAndrii Nakryiko
We have many places which open-code what's now is bpf_rcu_lock_held() macro, so replace all those places with a clean and short macro invocation. For that, move bpf_rcu_lock_held() macro into include/linux/bpf.h. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20251014201403.4104511-1-andrii@kernel.org
2025-10-15bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate ↵Alexei Starovoitov
bpf_async_cb structures. The following kmemleak splat: [ 8.105530] kmemleak: Trying to color unknown object at 0xff11000100e918c0 as Black [ 8.106521] Call Trace: [ 8.106521] <TASK> [ 8.106521] dump_stack_lvl+0x4b/0x70 [ 8.106521] kvfree_call_rcu+0xcb/0x3b0 [ 8.106521] ? hrtimer_cancel+0x21/0x40 [ 8.106521] bpf_obj_free_fields+0x193/0x200 [ 8.106521] htab_map_update_elem+0x29c/0x410 [ 8.106521] bpf_prog_cfc8cd0f42c04044_overwrite_cb+0x47/0x4b [ 8.106521] bpf_prog_8c30cd7c4db2e963_overwrite_timer+0x65/0x86 [ 8.106521] bpf_prog_test_run_syscall+0xe1/0x2a0 happens due to the combination of features and fixes, but mainly due to commit 6d78b4473cdb ("bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()") It's using __GFP_HIGH, which instructs slub/kmemleak internals to skip kmemleak_alloc_recursive() on allocation, so subsequent kfree_rcu()-> kvfree_call_rcu()->kmemleak_ignore() complains with the above splat. To fix this imbalance, replace bpf_map_kmalloc_node() with kmalloc_nolock() and kfree_rcu() with call_rcu() + kfree_nolock() to make sure that the objects allocated with kmalloc_nolock() are freed with kfree_nolock() rather than the implicit kfree() that kfree_rcu() uses internally. Note, the kmalloc_nolock() happens under bpf_spin_lock_irqsave(), so it will always fail in PREEMPT_RT. This is not an issue at the moment, since bpf_timers are disabled in PREEMPT_RT. In the future bpf_spin_lock will be replaced with state machine similar to bpf_task_work. Fixes: 6d78b4473cdb ("bpf: Tell memcg to use allow_spinning=false path in bpf_timer_init()") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: linux-mm@kvack.org Link: https://lore.kernel.org/bpf/20251015000700.28988-1-alexei.starovoitov@gmail.com
2025-10-15regulator: core: forward undervoltage events downstream by defaultOleksij Rempel
Forward critical supply events downstream so consumers can react in time. An under-voltage event on an upstream rail may otherwise never reach end devices (e.g. eMMC). Register a notifier on a regulator's supply when the supply is resolved, and forward only REGULATOR_EVENT_UNDER_VOLTAGE to the consumer's notifier chain. Event handling is deferred to process context via a workqueue; the consumer rdev is lifetime-pinned and the rdev lock is held while calling the notifier chain. The notifier is unregistered on regulator teardown. No DT/UAPI changes. Behavior applies to all regulators with a supply. Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://patch.msgid.link/20251001105650.2391477-1-o.rempel@pengutronix.de Signed-off-by: Mark Brown <broonie@kernel.org>
2025-10-14net: mdio: use macro module_driver to avoid boilerplate codeHeiner Kallweit
Use macro module_driver to avoid boilerplate code. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/e5c37417-4984-4b57-8154-264deef61e0d@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-14livepatch: Introduce source code helpers for livepatch modulesJosh Poimboeuf
Add some helper macros which can be used by livepatch source .patch files to register callbacks, convert static calls to regular calls where needed, and patch syscalls. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14objtool/klp: Introduce klp diff subcommand for diffing object filesJosh Poimboeuf
Add a new klp diff subcommand which performs a binary diff between two object files and extracts changed functions into a new object which can then be linked into a livepatch module. This builds on concepts from the longstanding out-of-tree kpatch [1] project which began in 2012 and has been used for many years to generate livepatch modules for production kernels. However, this is a complete rewrite which incorporates hard-earned lessons from 12+ years of maintaining kpatch. Key improvements compared to kpatch-build: - Integrated with objtool: Leverages objtool's existing control-flow graph analysis to help detect changed functions. - Works on vmlinux.o: Supports late-linked objects, making it compatible with LTO, IBT, and similar. - Simplified code base: ~3k fewer lines of code. - Upstream: No more out-of-tree #ifdef hacks, far less cruft. - Cleaner internals: Vastly simplified logic for symbol/section/reloc inclusion and special section extraction. - Robust __LINE__ macro handling: Avoids false positive binary diffs caused by the __LINE__ macro by introducing a fix-patch-lines script (coming in a later patch) which injects #line directives into the source .patch to preserve the original line numbers at compile time. Note the end result of this subcommand is not yet functionally complete. Livepatch needs some ELF magic which linkers don't like: - Two relocation sections (.rela*, .klp.rela*) for the same text section. - Use of SHN_LIVEPATCH to mark livepatch symbols. Unfortunately linkers tend to mangle such things. To work around that, klp diff generates a linker-compliant intermediate binary which encodes the relevant KLP section/reloc/symbol metadata. After module linking, a klp post-link step (coming soon) will clean up the mess and convert the linked .ko into a fully compliant livepatch module. Note this subcommand requires the diffed binaries to have been compiled with -ffunction-sections and -fdata-sections, and processed with 'objtool --checksum'. Those constraints will be handled by a klp-build script introduced in a later patch. Without '-ffunction-sections -fdata-sections', reliable object diffing would be infeasible due to toolchain limitations: - For intra-file+intra-section references, the compiler might occasionally generated hard-coded instruction offsets instead of relocations. - Section-symbol-based references can be ambiguous: - Overlapping or zero-length symbols create ambiguity as to which symbol is being referenced. - A reference to the end of a symbol (e.g., checking array bounds) can be misinterpreted as a reference to the next symbol, or vice versa. A potential future alternative to '-ffunction-sections -fdata-sections' would be to introduce a toolchain option that forces symbol-based (non-section) relocations. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14objtool: Unify STACK_FRAME_NON_STANDARD entry sizesJosh Poimboeuf
The C implementation of STACK_FRAME_NON_STANDARD emits 8-byte entries, whereas the asm version's entries are only 4 bytes. Make them consistent by converting the asm version to 8-byte entries. This is much easier than converting the C version to 4-bytes, which would require awkwardly putting inline asm in a dummy function in order to pass the 'func' pointer to the asm. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14x86/asm: Annotate special section entriesJosh Poimboeuf
In preparation for the objtool klp diff subcommand, add annotations for special section entries. This will enable objtool to determine the size and location of the entries and to extract them when needed. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14objtool: Add ANNOTATE_DATA_SPECIALJosh Poimboeuf
In preparation for the objtool klp diff subcommand, add an ANNOTATE_DATA_SPECIAL macro which annotates special section entries so that objtool can determine their size and location and extract them when needed. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14objtool: Move ANNOTATE* macros to annotate.hJosh Poimboeuf
In preparation for using the objtool annotation macros in higher-level objtool.h macros like UNWIND_HINT, move them to their own file. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14interval_tree: Fix ITSTATIC usage for *_subtree_search()Josh Poimboeuf
For consistency with the other function templates, change _subtree_search_*() to use the user-supplied ITSTATIC rather than the hard-coded 'static'. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14kbuild: Remove 'kmod_' prefix from __KBUILD_MODNAMEJosh Poimboeuf
In preparation for the objtool klp diff subcommand, remove the arbitrary 'kmod_' prefix from __KBUILD_MODNAME and instead add it explicitly in the __initcall_id() macro. This change supports the standardization of "unique" symbol naming by ensuring the non-unique portion of the name comes before the unique part. That will enable objtool to properly correlate symbols across builds. Cc: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14elfnote: Change ELFNOTE() to use __UNIQUE_ID()Josh Poimboeuf
In preparation for the objtool klp diff subcommand, replace the custom unique symbol name generation in ELFNOTE() with __UNIQUE_ID(). This standardizes the naming format for all "unique" symbols, which will allow objtool to properly correlate them. Note this also removes the "one ELF note per line" limitation. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14compiler.h: Make addressable symbols less of an eyesoreJosh Poimboeuf
Avoid underscore overload by changing: __UNIQUE_ID___addressable_loops_per_jiffy_868 to the following: __UNIQUE_ID_addressable_loops_per_jiffy_868 This matches the format used by other __UNIQUE_ID()-generated symbols and improves readability for those who stare at ELF symbol table dumps. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14compiler: Tweak __UNIQUE_ID() namingJosh Poimboeuf
In preparation for the objtool klp diff subcommand, add an underscore between the name and the counter. This will make it possible for objtool to distinguish between the non-unique and unique parts of the symbol name so it can properly correlate the symbols. Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14vmlinux.lds: Unify TEXT_MAIN, DATA_MAIN, and related macrosJosh Poimboeuf
TEXT_MAIN, DATA_MAIN and friends are defined differently depending on whether certain config options enable -ffunction-sections and/or -fdata-sections. There's no technical reason for that beyond voodoo coding. Keeping the separate implementations adds unnecessary complexity, fragments the logic, and increases the risk of subtle bugs. Unify the macros by using the same input section patterns across all configs. This is a prerequisite for the upcoming livepatch klp-build tooling which will manually enable -ffunction-sections and -fdata-sections via KCFLAGS. Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Petr Mladek <pmladek@suse.com> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2025-10-14sched/ext: Implement cgroup_set_idle() callbackzhidao su
Implement the missing cgroup_set_idle() callback that was marked as a TODO. This allows BPF schedulers to be notified when a cgroup's idle state changes, enabling them to adjust their scheduling behavior accordingly. The implementation follows the same pattern as other cgroup callbacks like cgroup_set_weight() and cgroup_set_bandwidth(). It checks if the BPF scheduler has implemented the callback and invokes it with the appropriate parameters. Fixes a spelling error in the cgroup_set_bandwidth() documentation. tj: s/scx_cgroup_rwsem/scx_cgroup_ops_rwsem/ to fix build breakage. Signed-off-by: zhidao su <soolaugust@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-10-14drm/xe: Handle Wa_22010954014 and Wa_14022085890 as device workaroundsMatt Roper
When Wa_22010954014 and Wa_14022085890 were first implemented, we didn't have a device workaround infrastructure so we hacked them into the GT workaround list. Now that we have proper device workaround support, move them to the proper place. Note that Wa_14022085890 specifically applies to BMG-G21 platforms, so this requires defining a BMG subplatform to capture the correct subset of device IDs. Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20251013200944.2499947-40-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-10-14media: v4l2-mem2mem: Fix outdated documentationLaurent Pinchart
Commit cbd9463da1b1 ("media: v4l2-mem2mem: Avoid calling .device_run in v4l2_m2m_job_finish") deferred calls to .device_run() to a work queue to avoid recursive calls when a job is finished right away from .device_run(). It failed to update the v4l2_m2m_job_finish() documentation that still states the function must not be called from .device_run(). Fix it. Fixes: cbd9463da1b1 ("media: v4l2-mem2mem: Avoid calling .device_run in v4l2_m2m_job_finish") Cc: stable@vger.kernel.org Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
2025-10-14media: include: remove c8sectpfe headerRaphael Gallais-Pou
Driver is not used anymore. Remove header file. Signed-off-by: Raphael Gallais-Pou <rgallaispou@gmail.com> Reviewed-by: Patrice Chotard <patrice.chotard@foss.st.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
2025-10-14mm/memory_hotplug: Remove MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiersSumanth Korikkar
MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers were introduced to prepare the transition of memory to and from a physically accessible state. This enhancement was crucial for implementing the "memmap on memory" feature for s390. With introduction of dynamic (de)configuration of hotpluggable memory, memory can be brought to accessible state before add_memory(). Memory can be brought to inaccessible state before remove_memory(). Hence, there is no need of MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers anymore. This basically reverts commit c5f1e2d18909 ("mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers") Additionally, apply minor adjustments to the function parameters of move_pfn_range_to_zone() and mhp_supports_memmap_on_memory() to ensure compatibility with the latest branch. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-14netmem: replace __netmem_clear_lsb() with netmem_to_nmdesc()Byungchul Park
Now that we have struct netmem_desc, it'd better access the pp fields via struct netmem_desc rather than struct net_iov. Introduce netmem_to_nmdesc() for safely converting netmem_ref to netmem_desc regardless of the type underneath e.i. netmem_desc, net_iov. While at it, remove __netmem_clear_lsb() and make netmem_to_nmdesc() used instead. Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/20251013044133.69472-1-byungchul@sk.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-14HID: core: Add printk_ratelimited variants to hid_warn() etcVicki Pfau
hid_warn_ratelimited() is needed. Add the others as part of the block. Signed-off-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Jiri Kosina <jkosina@suse.com>
2025-10-14dt-bindings: clock: renesas,r9a09g047-cpg: Add USB2 PHY core clocksTommaso Merciai
Add definitions for USB2 PHY core clocks in the R9A09G047 CPG DT bindings header file. Signed-off-by: Tommaso Merciai <tommaso.merciai.xr@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20251001212709.579080-9-tommaso.merciai.xr@bp.renesas.com Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>