linux.git - Linux kernel source tree

Age	Commit message	Author
2026-05-21	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-7.1-rc5). No conflicts, adjacent changes: drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c cc199cd1b912 ("net/mlx5e: Reduce branches in napi poll") c326f9c68921 ("net/mlx5e: xsk: Fix unlocked writing to ICOSQ") drivers/net/ethernet/mellanox/mlx5/core/eswitch.c c6df9a65cbb0 ("net/mlx5: Skip disabled vports when setting max TX speed") 1fba57c91416 ("net/mlx5: Add VHCA_ID page management mode support") net/mac80211/mlme.c a6e6ccd5bd07 ("wifi: mac80211: consume only present negotiated TTLM maps") 49e62ec6eb06 ("wifi: mac80211: move frame RX handling to type files") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-21	net: dsa: add NETC switch tag support	Wei Fang
	The NXP NETC switch tag is a proprietary header added to frames after the source MAC address. The switch tag has 3 types, and each type has 1 to 4 subtypes, as follows.
	Forward NXP switch tag (Type=0): Represents forwarded frames.
	- SubType = 0. Normal frame processing.
	To_Port NXP switch tag (Type=1): Represents frames that are to be sent to a specific switch port.
	- SubType = 0. No request to perform timestamping.
	- SubType = 1. Request to perform one-step timestamping.
	- SubType = 2. Request to perform two-step timestamping.
	- SubType = 3. Request to perform both one-step timestamping and two-step timestamping.
	To_Host NXP switch tag (Type=2): Represents frames redirected or copied to the switch management port.
	- SubType = 0. Received frames redirected or copied to the switch management port.
	- SubType = 1. Received frames redirected or copied to the switch management port with captured timestamp at the switch port where the frame was received.
	- SubType = 2. Transmit timestamp response (two-step timestamping).
	In addition, the tag length depends on the tag type: the minimum length is 6 bytes and the maximum length is 14 bytes. Currently, the Forward tag, SubType 0 of the To_Port tag and SubType 0 of the To_Host tag are supported. More tags will be supported in the future.
	Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260518082506.1318236-10-wei.fang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
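The type/subtype matrix described in this message can be sketched as a small lookup. This is only an illustration: the enum and function names are hypothetical and are not taken from the patch, and the on-wire bit layout of the tag is deliberately left out because the message does not give it.

```c
#include <assert.h>
#include <stdbool.h>

/* Tag types as listed in the commit message; names are hypothetical. */
enum netc_tag_type {
	NETC_TAG_FORWARD = 0,
	NETC_TAG_TO_PORT = 1,
	NETC_TAG_TO_HOST = 2,
};

/* Number of subtypes the message lists for each tag type. */
int netc_tag_num_subtypes(int type)
{
	switch (type) {
	case NETC_TAG_FORWARD:
		return 1;
	case NETC_TAG_TO_PORT:
		return 4;
	case NETC_TAG_TO_HOST:
		return 3;
	default:
		return 0;
	}
}

/* True if the (type, subtype) pair is one the message describes. */
bool netc_tag_is_defined(int type, int subtype)
{
	return subtype >= 0 && subtype < netc_tag_num_subtypes(type);
}

/* Per the message, only SubType 0 of each type is supported so far. */
bool netc_tag_is_supported(int type, int subtype)
{
	return netc_tag_is_defined(type, subtype) && subtype == 0;
}
```

A receive path could use a check like netc_tag_is_supported() to reject timestamping subtypes until support for them is added.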
2026-05-20	tcp: fix stale per-CPU tcp_tw_isn leak enabling ISN prediction	Eric Dumazet
	Blamed commit moved the TIME_WAIT-derived ISN from the skb control block to a per-CPU variable, assuming the value would always be consumed by tcp_conn_request() for the same packet that wrote it. That assumption is violated by multiple drop paths between the producer (__this_cpu_write(tcp_tw_isn, isn) in tcp_v{4,6}_rcv()) and the consumer (tcp_conn_request()):
	- min_ttl / min_hopcount check
	- xfrm policy check
	- tcp_inbound_hash() MD5/AO mismatch
	- tcp_filter() eBPF/SO_ATTACH_FILTER drop
	- th->syn && th->fin discard in tcp_rcv_state_process() TCP_LISTEN
	- psp_sk_rx_policy_check() in tcp_v{4,6}_do_rcv()
	- tcp_checksum_complete() in tcp_v{4,6}_do_rcv()
	- tcp_v{4,6}_cookie_check() returning NULL
	When a packet is dropped on any of these paths, tcp_tw_isn is left set. The next SYN processed on the same CPU then consumes the nonzero value in tcp_conn_request(), receiving a potentially predictable ISN. This patch moves tcp_tw_isn back to skb->cb[], getting rid of the per-CPU variable. Note that tcp_v{4,6}_fill_cb() do not set it. Very little impact on overall code size/complexity:
	$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new
	add/remove: 0/0 grow/shrink: 2/1 up/down: 8/-15 (-7)
	Function old new delta
	tcp_v6_rcv 3038 3042 +4
	tcp_v4_rcv 3035 3039 +4
	tcp_conn_request 2938 2923 -15
	Total: Before=24436060, After=24436053, chg -0.00%
	Fixes: 41eecbd712b7 ("tcp: replace TCP_SKB_CB(skb)->tcp_tw_isn with a per-cpu field") Reported-by: Chris Mason <clm@meta.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260519084611.2485277-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
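The leak described in this message can be reproduced in a toy userspace model. The sketch below is not kernel code: it models a single CPU, all names are hypothetical, and it assumes that the producer only writes when a TIME_WAIT-derived ISN exists and that the consumer clears the per-CPU slot after reading it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the per-CPU tcp_tw_isn variable (one CPU modeled). */
static uint32_t percpu_tw_isn;

/* Minimal packet; cb_tw_isn models the slot in skb->cb[]. */
struct pkt {
	uint32_t cb_tw_isn;
};

/* Per-CPU scheme. Returns the TIME_WAIT ISN seen by the consumer,
 * or 0 if none was seen or the packet was dropped on the way. */
uint32_t rcv_percpu(uint32_t tw_isn, bool dropped)
{
	uint32_t isn;

	if (tw_isn)
		percpu_tw_isn = tw_isn;	/* producer side */
	if (dropped)
		return 0;		/* drop path: slot stays set */
	isn = percpu_tw_isn;		/* consumer side */
	percpu_tw_isn = 0;
	return isn;
}

/* Per-packet scheme: the value lives and dies with the packet. */
uint32_t rcv_cb(struct pkt *p, uint32_t tw_isn, bool dropped)
{
	p->cb_tw_isn = tw_isn;
	if (dropped)
		return 0;
	return p->cb_tw_isn;
}
```

In this model, a dropped packet followed by an unrelated SYN makes rcv_percpu() hand the stale value to the second packet, while rcv_cb() cannot, because each packet carries its own slot.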
2026-05-20	smc: Use flexible array for SMCD connections	Rosen Penev
	Store the per-DMB connection pointers in the SMCD device allocation instead of allocating a separate connection array. This keeps the connection table tied to the SMCD device lifetime and simplifies the allocation and cleanup paths. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Sidraya Jayagond <sidraya@linux.ibm.com> Link: https://patch.msgid.link/20260519005206.628071-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-20	net: shaper: rework the VALID marking (again)	Jakub Kicinski
	Recent commit changed the semantics from NOT_VALID to VALID. I didn't realize that the flags are not stored atomically with the entry in XArray. There's still a race of reader observing a VALID mark for a slot, getting interrupted, writer replacing the entry with a different one, reader continuing, fetching the entry which is now a different pointer than the pointer for which VALID was meant. The biggest consequence of this is that we may see a UAF since net_shaper_rollback() assumed that entries without VALID can be freed without observing RCU. Looks like the XArray marks are buying us nothing at this point. Let's convert the code to an explicit valid field. The smp_load_acquire() / smp_store_release() barriers are marginally cleaner. Reported-by: Sashiko <sashiko-bot@kernel.org> Fixes: 93954b40f6a4 ("net-shapers: implement NL set and delete operations") Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260515221325.1685455-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
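The explicit valid field with acquire/release ordering can be approximated in userspace with C11 atomics. This is a sketch under stated assumptions: the struct and function names are hypothetical, and C11 memory_order_release / memory_order_acquire stand in for the kernel's smp_store_release() / smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical entry: the flag lives inside the object it describes,
 * so it can never be observed for a different pointer. */
struct shaper_entry {
	int rate;		/* payload, written before publication */
	atomic_bool valid;	/* explicit validity flag */
};

/* Writer: fill in the payload, then publish with release ordering. */
void entry_publish(struct shaper_entry *e, int rate)
{
	e->rate = rate;
	atomic_store_explicit(&e->valid, true, memory_order_release);
}

/* Reader: the payload is only read after an acquire load saw valid. */
bool entry_read(struct shaper_entry *e, int *rate)
{
	if (!atomic_load_explicit(&e->valid, memory_order_acquire))
		return false;
	*rate = e->rate;
	return true;
}
```

Because the flag and the payload belong to the same object, a reader that sees valid set is guaranteed to read the payload of that same entry, which is the property the XArray marks could not provide.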
2026-05-20	wifi: cfg80211: add a function to parse UHR DBE	Johannes Berg
	Add a function that takes the DBE information and parses it into an existing chandef that should hold the BSS channel. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260515141209.4eb1490f5cc6.I3ca9421f1fe4c31073846b1b62017f12c75889de@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-18	Merge tag 'nf-26-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf	Jakub Kicinski
	Pablo Neira Ayuso says:
	====================
	Netfilter/IPVS fixes for net
	The following patchset contains Netfilter/IPVS fixes for net:
	1) Fix small race windows in nf_ct_helper_log() when accessing helper, from Florian Westphal.
	2) Fix potential infinite loop and race conditions in IPVS caused by frequent user-triggered service table changes, from Julian Anastasov.
	3) Fix a race condition when dumping ipsets for restore, from Jozsef Kadlecsik.
	4) Fix inner transport offset in IPv6 in nft_inner when extension headers come before the layer 4 transport header, from Yizhou Zhao.
	5) Fix incorrect iteration over IPv4 ranges in several hash set types, from Nan Li.
	6) Fix incorrect order when restoring BH in nft_inner_restore_tun_ctx(), from Florian Westphal.
	7) Validate option array from ip6t_hbh checkpath() to fix an off-by-one access, from Zhengchuan Liang.
	8) Fix race condition between ipset list -terse and concurrent updates, from Jozsef Kadlecsik.
	9) Fix race condition when inserting elements into a hash bucket, also from Jozsef.
	10) Annotate access to first free slot in hashtable, from Jozsef Kadlecsik.
	11) Ensure sufficient headroom in br_netfilter neigh transmission, from Lorenzo Bianconi.
	12) Hold reference on skb->dev in nfqueue exit path. Bridge local input is special since skb->dev != state->indev, allowing the net_device to go away while the packet is sitting in nfqueue. From Haoze Xie.
	* tag 'nf-26-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
	netfilter: nf_queue: hold bridge skb->dev while queued
	netfilter: br_netfilter: Reallocate headroom if necessary in neigh_hh_bridge()
	netfilter: ipset: annotate "pos" for concurrent readers/writers
	netfilter: ipset: Fix data race between add and dump in all hash types
	netfilter: ipset: Fix data race between add and list header in all hash types
	netfilter: ip6t_hbh: reject oversized option lists
	netfilter: nft_inner: release local_lock before re-enabling softirqs
	netfilter: ipset: stop hash:* range iteration at end
	netfilter: nft_inner: Fix IPv6 inner_thoff desync
	netfilter: ipset: fix a potential dump-destroy race
	ipvs: avoid possible loop in ip_vs_dst_event on resizing
	netfilter: nf_conntrack_helper: fix possible null deref during error log
	====================
	Link: https://patch.msgid.link/20260516115627.967773-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-16	netfilter: nf_queue: hold bridge skb->dev while queued	Haoze Xie
	br_pass_frame_up() rewrites skb->dev from the ingress port to the bridge master before queueing bridge LOCAL_IN packets. NFQUEUE only holds references on state.in/out and bridge physdevs, so a queued bridge packet can retain a freed bridge master in skb->dev until reinjection. When the verdict is reinjected later, br_netif_receive_skb() re-enters the receive path with skb->dev still pointing at the freed bridge master, triggering a use-after-free. Store skb->dev in the queue entry, hold a reference on it for the queue lifetime, and use the saved device when dropping queued packets during NETDEV_DOWN handling. Fixes: ac2863445686 ("netfilter: bridge: add nf_afinfo to enable queuing to userspace") Cc: stable@kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Haoze Xie <royenheart@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-16	netfilter: br_netfilter: Reallocate headroom if necessary in neigh_hh_bridge()	Lorenzo Bianconi
	neigh_hh_bridge() assumes the skb always has sufficient headroom to copy the aligned L2 header. This assumption can trigger the crash reported below using the following netfilter setup: $modprobe br_netfilter $sysctl -w net.bridge.bridge-nf-call-iptables=1 $root@OpenWrt:~# nft list ruleset table ip nat { chain prerouting { type nat hook prerouting priority dstnat; policy accept; ip daddr 192.168.83.123 dnat to 192.168.83.120 } } - iperf3 client (192.168.83.119) --> bridge (192.168.83.118) --> iperf3 server (192.168.83.120) the iperf3 client is sending packet for 192.168.83.123 to the bridge device. [ 1579.036575] Unable to handle kernel write to read-only memory at virtual address ffffff8004d76ffe [ 1579.045482] Mem abort info: [ 1579.048273] ESR = 0x000000009600004f [ 1579.052024] EC = 0x25: DABT (current EL), IL = 32 bits [ 1579.057363] SET = 0, FnV = 0 [ 1579.060417] EA = 0, S1PTW = 0 [ 1579.063550] FSC = 0x0f: level 3 permission fault [ 1579.068345] Data abort info: [ 1579.071224] ISV = 0, ISS = 0x0000004f, ISS2 = 0x00000000 [ 1579.076720] CM = 0, WnR = 1, TnD = 0, TagAccess = 0 [ 1579.081770] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 1579.087092] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080dc4000 [ 1579.093794] [ffffff8004d76ffe] pgd=180000009ffff003, p4d=180000009ffff003, pud=180000009ffff003, pmd=180000009ffe3003, pte=0060000084d76787 [ 1579.106343] Internal error: Oops: 000000009600004f [#1] SMP [ 1579.193824] CPU: 0 UID: 0 PID: 235 Comm: napi/qdma_eth-3 Tainted: G O 6.12.57 #0 [ 1579.202614] Tainted: [O]=OOT_MODULE [ 1579.206102] Hardware name: Airoha AN7581 Evaluation Board (DT) [ 1579.211929] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1579.218889] pc : br_nf_pre_routing_finish_bridge+0x1ac/0xcc8 [br_netfilter] [ 1579.225859] lr : br_nf_pre_routing_finish_bridge+0x18c/0xcc8 [br_netfilter] [ 1579.232822] sp : ffffffc0817cba20 [ 1579.236128] x29: ffffffc0817cba20 x28: 0000000000000000 x27: ffffff8002b89000 [ 1579.243273] 
x26: ffffff8004d7700e x25: 0000000000000008 x24: 0000000000000000 [ 1579.250416] x23: ffffffc08179d4c0 x22: 0000000000000000 x21: ffffffc08179d4c0 [ 1579.257561] x20: ffffff8004d9b800 x19: ffffff8015010000 x18: 0000000000000014 [ 1579.264704] x17: ffffffbf9e930000 x16: ffffffc0817c8000 x15: 0000000000000070 [ 1579.271848] x14: 0000000000000080 x13: 0000000000000001 x12: 0000000000000000 [ 1579.278993] x11: ffffffc0798caae0 x10: ffffff8014db6fd8 x9 : 0000000000000000 [ 1579.286136] x8 : 0000000000000003 x7 : ffffffc08171f628 x6 : 000000001a3b83d3 [ 1579.293281] x5 : 0000000000000000 x4 : 1beb76f22fee0000 x3 : ffffff8004d7700e [ 1579.300425] x2 : 0000000000000000 x1 : ffffff8004d9b8bc x0 : ffffff80026ed000 [ 1579.307570] Call trace: [ 1579.310018] br_nf_pre_routing_finish_bridge+0x1ac/0xcc8 [br_netfilter] [ 1579.316632] br_nf_hook_thresh+0xd4/0x14bc [br_netfilter] [ 1579.322032] br_nf_hook_thresh+0x250/0x14bc [br_netfilter] [ 1579.327517] br_nf_hook_thresh+0x76c/0x14bc [br_netfilter] [ 1579.333003] br_handle_frame+0x180/0x480 [ 1579.336935] __netif_receive_skb_core.constprop.0+0x540/0xf40 [ 1579.342682] __netif_receive_skb_one_core+0x28/0x50 [ 1579.347561] process_backlog+0x98/0x1e0 [ 1579.351398] __napi_poll+0x34/0x1c4 [ 1579.354887] net_rx_action+0x178/0x330 [ 1579.358638] handle_softirqs+0x108/0x2d4 [ 1579.362560] __do_softirq+0x10/0x18 [ 1579.366051] ____do_softirq+0xc/0x20 [ 1579.369627] call_on_irq_stack+0x30/0x4c [ 1579.373550] do_softirq_own_stack+0x18/0x20 [ 1579.377734] do_softirq+0x4c/0x60 [ 1579.381050] __local_bh_enable_ip+0x88/0x98 [ 1579.385234] napi_threaded_poll_loop+0x188/0x21c [ 1579.389853] napi_threaded_poll+0x70/0x80 [ 1579.393863] kthread+0xd8/0xdc [ 1579.396918] ret_from_fork+0x10/0x20 [ 1579.400499] Code: 88dffc22 3707ffc2 f9406663 f9406684 (f81f0064) [ 1579.406589] ---[ end trace 0000000000000000 ]--- [ 1579.411209] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 1579.418083] SMP: stopping secondary CPUs [ 1579.422012] 
Kernel Offset: disabled Fix the issue by reallocating the skb headroom if necessary in the neigh_hh_bridge() routine. Fixes: e179e6322ac33 ("netfilter: bridge-netfilter: Fix MAC header handling with IP DNAT") Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-16	ipvs: avoid possible loop in ip_vs_dst_event on resizing	Julian Anastasov
	Sashiko points out that an unprivileged user can frequently call ip_vs_flush() or ip_vs_del_service() to trigger svc_table_changes updates that can lead to an infinite loop in ip_vs_dst_event(). This can also happen if the user triggers frequent table resizing without deleting all services. We should also consider the possible effects if the user triggers many NETDEV_DOWN events. One way to solve it is to hold svc_resize_sem in ip_vs_dst_event(), but this can block the dev notifier during the whole resizing process. Instead, use a new rw_semaphore, svc_replace_sem, to protect just the svc_table replacement, which is a short code section. Then hold svc_replace_sem in ip_vs_dst_event() to serialize with replacing the svc_table. As a result, the loop is avoided, as there is no need to repeat the table walk from the start. This way, changes in svc_table_changes can happen only when all services are removed and all dev references are dropped, which allows us to abort the table walk. As IP_VS_WORK_SVC_NORESIZE is the flag used to stop the svc_resize_work under service_mutex, we should check only this flag often, but not while under service_mutex. To remove the mutex_trylock() for service_mutex in the second phase, where the resizer installs the new table after rehashing, we will avoid holding the service_mutex there. As a result, the code in configuration context which is under service_mutex should access ipvs->svc_table under RCU, because it can be replaced at any time and released after an RCU grace period. As for ip_vs_zero_all(), it needs a different solution, as it is a table walker which can escape a single RCU read-side critical section: hold svc_replace_sem to prevent the table from being replaced. In ip_vs_status_show(), prefer to hold svc_replace_sem to avoid many loops, and just detect if the svc_table is removed. Prefer the newly attached table for the u_thresh/l_thresh checks to know when to grow/shrink while adding or deleting services, because the new table size is based on the latest parameters. 
Link: https://sashiko.dev/#/patchset/20260505001648.360569-1-pablo%40netfilter.org Fixes: 840aac3d900d ("ipvs: use resizable hash table for services") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-15	net: always declare __sock_wfree() and tcp_wfree()	Eric Dumazet
	Even if guarded by IS_ENABLED(CONFIG_INET), compilers need to know what __sock_wfree() and tcp_wfree() are: include/net/sock.h:1861:63: note: each undeclared identifier is reported only once for each function it appears in include/net/sock.h:1862:63: error: 'tcp_wfree' undeclared (first use in this function); did you mean 'sock_wfree'? 1862 | (IS_ENABLED(CONFIG_INET) && skb->destructor == tcp_wfree); Fixes: f0de88303d5e ("net: make is_skb_wmem() available to modules") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202605141607.mDXnYFKY-lkp@intel.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260514095506.3919094-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-14	net/sched: qdisc_qstats_qlen_backlog() runs locklessly	Eric Dumazet
	qdisc_qstats_qlen_backlog() can be called without qdisc spinlock being held. Use qdisc_qlen_lockless() instead of qdisc_qlen(). Add a const qualifier to its first parameter (struct Qdisc *sch). Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20260513080853.1383975-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-14	netlink: add one debug check in nla_nest_end()	Eric Dumazet
	Add a DEBUG_NET_WARN_ON_ONCE(diff > U16_MAX) to warn if the kernel sends corrupted nested attribute to user space. Offenders can be converted to nla_nest_end_safe(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260512155244.4137851-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
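The reason for the U16_MAX check is that a netlink attribute length is stored in a 16-bit field, so a larger nested span silently wraps. The helpers below are a hypothetical userspace illustration of that arithmetic, not the kernel's nla_nest_end() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if a nested attribute spanning diff bytes still fits in the
 * 16-bit length field. */
bool nest_len_fits(size_t diff)
{
	return diff <= UINT16_MAX;
}

/* What an unchecked store into a 16-bit length field would record:
 * the value is reduced modulo 65536. */
uint16_t nest_len_stored(size_t diff)
{
	return (uint16_t)diff;
}
```

For example, a nest of 65536 bytes would be recorded as length 0, which is the kind of corrupted attribute the new warning is meant to expose.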
2026-05-14	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-7.1-rc4). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-14	Bluetooth: serialize accept_q access	Jiexun Wang
	bt_sock_poll() walks the accept queue without synchronization, while child teardown can unlink the same socket and drop its last reference. The unsynchronized accept queue walk has existed since the initial Bluetooth import. Protect accept_q with a dedicated lock for queue updates and polling. Also rework bt_accept_dequeue() to take temporary child references under the queue lock before dropping it and locking the child socket. Fixes: 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Reported-by: Jann Horn <jannh@google.com> Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Reviewed-by: Jann Horn <jannh@google.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-13	net: make is_skb_wmem() available to modules	Eric Dumazet
	The following patch will use is_skb_wmem() from fq_codel. Provide __sock_wfree() only if CONFIG_INET=y. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20260512094859.3673997-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-13	macsec: use rcu_work to defer TX SA crypto cleanup out of softirq	Jinliang Zheng
	free_txsa() is an RCU callback running in softirq context, but calls crypto_free_aead() which can invoke vunmap() internally on hardware crypto drivers (e.g. hisi_sec2), triggering a kernel crash. Use rcu_work to defer the cleanup to a workqueue, for the same reasons as the analogous fix to free_rxsa() in the previous patch. Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20260511153102.2640368-4-alexjlzheng@tencent.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-13	macsec: use rcu_work to defer RX SA crypto cleanup out of softirq	Jinliang Zheng
	crypto_free_aead() can internally invoke vunmap() (e.g. via dma_free_attrs() in hardware crypto drivers such as hisi_sec2). vunmap() must not be called from softirq context, but free_rxsa() is an RCU callback that runs in softirq, leading to a kernel crash: vunmap+0x4c/0x70 __iommu_dma_free+0xd0/0x138 dma_free_attrs+0xf4/0x100 sec_aead_exit+0x64/0xb8 [hisi_sec2] crypto_destroy_tfm+0x98/0x110 free_rxsa+0x28/0x50 [macsec] rcu_do_batch+0x184/0x460 rcu_core+0xf4/0x1f8 handle_softirqs+0x118/0x330 Use rcu_work to defer the cleanup to a workqueue. rcu_work dispatches the worker asynchronously after the RCU grace period, so no thread blocks waiting, and concurrent releases of multiple SAs naturally share the same grace period. Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/20260511153102.2640368-3-alexjlzheng@tencent.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-11	net/sched: mq: no longer acquire qdisc spinlocks in dump operations	Eric Dumazet
	Prepare mq_dump_common() for RTNL avoidance. Use RCU instead of RTNL, and no longer acquire each child's spinlock. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20260510091455.4039245-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-11	net/sched: add const qualifiers to gnet_stats helpers	Eric Dumazet
	In preparation of lockless qdisc dumps, add const qualifiers to: - gnet_stats_add_basic() - gnet_stats_copy_basic() - gnet_stats_copy_basic_hw() - gnet_stats_copy_queue() - gnet_stats_read_basic() - ___gnet_stats_copy_basic() - qdisc_qstats_copy() Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20260510091455.4039245-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-11	net/sched: add qdisc_qlen_lockless() helper	Eric Dumazet
	Used in contexts where the qdisc spinlock is not held. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20260510091455.4039245-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-11	net/sched: annotate data-races around sch->qstats.backlog	Eric Dumazet
	Add qstats_backlog_sub() and qstats_backlog_add() helpers and use them instead of open-coding them. These helpers use WRITE_ONCE() to prevent store-tearing. Also use WRITE_ONCE() in fq_reset() and qdisc_reset() when sch->qstats.backlog is cleared. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20260510091455.4039245-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-11	net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()	Eric Dumazet
	Helpers to increment or decrement sch->q.qlen, with appropriate WRITE_ONCE() to prevent store tearing. Add other WRITE_ONCE() when sch->q.qlen is changed. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20260510091455.4039245-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
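The pattern of writer-side helpers paired with a lockless reader can be sketched in userspace as follows. This is an approximation only: the WRITE_ONCE() and READ_ONCE() macros below are simplified stand-ins built on volatile accesses (they rely on the GCC/Clang __typeof__ extension), and the struct and function names are hypothetical rather than the kernel's.

```c
#include <assert.h>

/* Simplified userspace approximations of the kernel macros. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the queue length kept in a qdisc. */
struct qdisc_model {
	unsigned int qlen;
};

/* Writers still run under the qdisc lock; WRITE_ONCE() only ensures
 * the store is done as one access, so readers never see a torn value. */
void qlen_inc(struct qdisc_model *q)
{
	WRITE_ONCE(q->qlen, q->qlen + 1);
}

void qlen_dec(struct qdisc_model *q)
{
	WRITE_ONCE(q->qlen, q->qlen - 1);
}

/* Lockless reader, e.g. a stats dump that does not take the lock. */
unsigned int qlen_lockless(const struct qdisc_model *q)
{
	return READ_ONCE(q->qlen);
}
```

The value a lockless reader gets may be slightly out of date, which is acceptable for statistics, but it is always a value that was actually stored.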
2026-05-08	Merge tag 'nf-26-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf	Jakub Kicinski
	Pablo Neira Ayuso says:
	====================
	Netfilter fixes for net
	The following batch contains Netfilter fixes for net:
	1) Allow initial x_tables table replacement without emitting an audit log message. Delay the register message until after hooks are wired up to avoid unnecessary unregister logs during error unwinding.
	2) Fix a NULL dereference by allocating hook ops before adding the table to the per-netns list. Use `synchronize_rcu()` during error unwinding to ensure the table stops processing packets before teardown. Defer audit log register message until all operations succeed.
	3) Refactor xtables to use a single `xt_unregister_table_pre_exit` function. Eliminate code duplication by centralizing table unregistration logic within the xtables core. ebtables cannot be changed due to incompatibility.
	4) Unregister xtables templates before module removal. This prevents a race condition where userspace instantiates a new table after the pernet unreg removed the current table.
	5) Add `xtables_unregister_table_exit` to fully unregister netfilter tables during module removal. Unlink the table from dying lists, then free hook operations.
	6) Implement a two-stage removal scheme for ebtables following the x_tables pattern. Assign table->ops while holding the ebt mutex to prevent exposing partially-filled structures.
	7) Fix ebtables module initialization race. Register the template last in table initialization functions. Prevent table instantiation before pernet operations are available.
	8) Fix a race condition in x_tables module initialization. Ensure pernet ops are fully set up before exposing the table to userspace.
	9) Fix a race condition in ebtables module initialization, similar to previous patch.
	10) Restore propagation of helper to expected connection, this is a fix-for-recent-fix.
	11) Validate that the expectation tuple and mask netlink attributes are present when adding expectation via nfqueue, this fixes a possible null-ptr-deref.
	12) Fix possible rare memleak in the SIP helper in case helper has been detached from conntrack entry, from Li Xiasong.
	13) Fix refcount leak in nft_ct when creating custom expectation, also from Li Xiasong.
	Patches 1-9 from Florian Westphal.
	* tag 'nf-26-05-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
	netfilter: nft_ct: fix missing expect put in obj eval
	netfilter: nf_conntrack_sip: get helper before allocating expectation
	netfilter: ctnetlink: check tuple and mask in expectations created via nfqueue
	netfilter: nf_conntrack_expect: restore helper propagation via expectation
	netfilter: bridge: eb_tables: close module init race
	netfilter: x_tables: close dangling table module init race
	netfilter: ebtables: close dangling table module init race
	netfilter: ebtables: move to two-stage removal scheme
	netfilter: x_tables: add and use xtables_unregister_table_exit
	netfilter: x_tables: unregister the templates first
	netfilter: x_tables: add and use xt_unregister_table_pre_exit
	netfilter: x_tables: allocate hook ops while under mutex
	netfilter: x_tables: allow initial table replace without emitting audit log message
	====================
	Link: https://patch.msgid.link/20260507234509.603182-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-08	genetlink: free the skb on 'group >= family->n_mcgrps'	Alice Ryhl
	These methods generally consume ownership of the provided skb, so even if an error path is encountered, the skb is freed. This is because the very first thing they do after some initial setup is to unconditionally consume the skb via consume_skb(skb). Any subsequent errors lead to the core netlink layer freeing the skb. However, there is one check that occurs before ownership is passed, which is the check for the group index. So if this error condition is encountered, then the skb is leaked. This error condition is generally considered a violation of the netlink API, so it's not expected to occur under normal circumstances. For the same reason, no callers check for this error condition, and no callers need to be adjusted. However, we should still follow the same ownership semantics of the rest of the function. Thus, free the skb in this codepath. Suggested-by: Andrew Lunn <andrew@lunn.ch> Suggested-by: Matthew Maurer <mmaurer@google.com> Fixes: 2a94fe48f32c ("genetlink: make multicast groups const, prevent abuse") Link: https://lore.kernel.org/r/845b36ba-7b3a-41f2-acb2-b284f253e2ca@lunn.ch Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20260506-genlmsg-return-v2-1-a63ee2a055d6@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-08	net: nsh: fix incorrect header length macros	Ilya Maximets
	NSH header length is a 6-bit field that encodes the total length of the header in 4-byte words. So the maximum length is 0b111111 * 4, which is 252 and not 256. The maximum context length is the same number minus the length of the base header (8), so 244. These macros are used to validate push_nsh() action in openvswitch. Miscalculation here doesn't cause any real issues. In the worst case the oversized context is truncated while building the header, so we'll construct and send a broken packet, which is not a big problem, as any receiver should validate the fields. No invalid memory accesses will happen during the header push. But we should fix the macros to reject the incorrect actions in the first place. Using previously defined values and calculating the length instead of defining numbers directly, so it's easier to understand where they come from and harder to make a mistake. Fixes: 1f0b7744c505 ("net: add NSH header structures and helpers") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://patch.msgid.link/20260507120434.2962505-1-i.maximets@ovn.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
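The arithmetic in this message can be written out by deriving the limits from the field width instead of hard-coding them. The macro and function names below are hypothetical and chosen for illustration; they are not the names used in the kernel headers.

```c
#include <assert.h>

#define NSH_LEN_FIELD_BITS 6	/* width of the length field */
#define NSH_LEN_UNIT_BYTES 4	/* the field counts 4-byte words */
#define NSH_BASE_HDR_BYTES 8	/* size of the base header */

/* Largest encodable header length: 0b111111 * 4 = 252 bytes. */
#define NSH_MAX_HDR_BYTES \
	(((1 << NSH_LEN_FIELD_BITS) - 1) * NSH_LEN_UNIT_BYTES)

/* Largest context: the maximum header minus the base header = 244. */
#define NSH_MAX_CTX_BYTES (NSH_MAX_HDR_BYTES - NSH_BASE_HDR_BYTES)

int nsh_max_hdr_bytes(void)
{
	return NSH_MAX_HDR_BYTES;
}

int nsh_max_ctx_bytes(void)
{
	return NSH_MAX_CTX_BYTES;
}
```

The off-by-one in the old value comes from treating the 6-bit field as if it could hold 64 rather than 63, which gives 256 instead of 252.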
2026-05-08	ipv6: flowlabel: enforce per-netns limit for unprivileged callers	Maoyi Xie
	fl_size, fl_ht and ip6_fl_lock in net/ipv6/ip6_flowlabel.c are file scope and shared across netns. mem_check() reads fl_size to decide whether to deny non-CAP_NET_ADMIN callers. capable() runs against init_user_ns, so an unprivileged user in any non-init userns can push fl_size past FL_MAX_SIZE - FL_MAX_SIZE / 4 and starve every other unprivileged userns on the host. Add struct netns_ipv6::flowlabel_count, bumped and decremented next to fl_size in fl_intern, ip6_fl_gc and ip6_fl_purge. The new field fills the existing 4-byte hole after ipmr_seq, so struct netns_ipv6 stays the same size on 64-bit builds. Bump FL_MAX_SIZE from 4096 to 8192. It has been 4096 since the file was added. Machines and connection counts have grown. mem_check() folds an extra per-netns ceiling into the existing non-CAP_NET_ADMIN conditional. The ceiling is half of the total budget that unprivileged callers have ever been able to use, i.e. (FL_MAX_SIZE - FL_MAX_SIZE / 4) / 2 = 3072 entries. With FL_MAX_SIZE doubled, this preserves the original per-user reach of 3K (what an unprivileged caller could already obtain before this change), while forcing an attacker to spread allocations across at least two netns to exhaust the global non-CAP_NET_ADMIN budget. CAP_NET_ADMIN against init_user_ns still bypasses both caps. The previous patch took ip6_fl_lock across mem_check and fl_intern, so the new flowlabel_count read in mem_check and the new flowlabel_count++ in fl_intern run under the same critical section. flowlabel_count is therefore plain int, like fl_size. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Suggested-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Link: https://patch.msgid.link/20260506082416.2259567-3-maoyixie.tju@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
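The budget arithmetic in this message can be checked with a short sketch. The function names are hypothetical, the real mem_check() has additional conditions that are omitted here, and the exact comparison operators used below are an assumption; only the two ceilings come from the message.

```c
#include <assert.h>
#include <stdbool.h>

#define FL_MAX_SIZE 8192	/* new value; it was 4096 before */

/* Global budget for callers without CAP_NET_ADMIN: 8192 - 2048 = 6144. */
int fl_unpriv_global_budget(void)
{
	return FL_MAX_SIZE - FL_MAX_SIZE / 4;
}

/* Per-netns ceiling: half of the global unprivileged budget = 3072. */
int fl_unpriv_netns_ceiling(void)
{
	return fl_unpriv_global_budget() / 2;
}

/* Simplified model of the two caps (comparison operators assumed). */
bool fl_unpriv_denied(int fl_size, int netns_count, bool cap_net_admin)
{
	if (cap_net_admin)
		return false;	/* CAP_NET_ADMIN bypasses both caps */
	return fl_size >= fl_unpriv_global_budget() ||
	       netns_count >= fl_unpriv_netns_ceiling();
}
```

With the old FL_MAX_SIZE of 4096, the global unprivileged budget was 4096 - 1024 = 3072, so the new per-netns ceiling matches what a single unprivileged caller could already reach, while exhausting the global budget now takes at least two netns.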
2026-05-08	netfilter: nf_conntrack_expect: restore helper propagation via expectation	Pablo Neira Ayuso
	A recent series to fix expectations broke helper propagation via expectation; this mechanism is used by the sip and h323 helpers. This also propagates the conntrack helper to expected connections. I changed the semantics of exp->helper, which now tells us the actual helper that created the expectation. Add an explicit assign_helper field to expectations for this purpose and update helpers to use it. Restore this feature for userspace conntrack helpers via the ctnetlink nfqueue integration so it is again possible to attach a helper to an expectation, where it makes sense. This is not restored via ctnetlink expectation creation as there is no client for such a feature. Use the expectation layer 4 protocol number for the helper lookup for consistency. Make sure the expectations using this helper propagation mechanism also go away when the helper is unregistered. Fixes: 9c42bc9db90a ("netfilter: nf_conntrack_expect: honor expectation helper field") Fixes: 917b61fa2042 ("netfilter: ctnetlink: ignore explicit helper on new expectations") Reported-by: Ilya Maximets <i.maximets@ovn.org> Tested-by: Ilya Maximets <i.maximets@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-05-07	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski
	Cross-merge networking fixes after downstream PR (net-7.1-rc3). Conflicts: net/ipv4/igmp.c 726fa7da2d8c ("ipv4: igmp: get rid of IGMPV3_{QQIC,MRC} and simplify calculation") c6bebaa744f7 ("ipv4: igmp: annotate data-races in igmp_heard_query()") https://lore.kernel.org/a7365e4873340f7a5e30411207de3bf9@kernel.org Adjacent changes: net/psp/psp_main.c 30cb24f97d44 ("psp: strip variable-length PSP header in psp_dev_rcv()") c2b22277ad89 ("psp: validate IPv4 header fields in psp_dev_rcv()") net/sched/sch_fq_codel.c f83e07b29246 ("net/sched: sch_fq_codel: annotate data-races from fq_codel_dump_class_stats()") 3f3aa77ff1c8 ("net/sched: add qstats_cpu_drop_inc() helper") net/wireless/pmsr.c 0f3c0a197309 ("wifi: nl80211: fix NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST usage") 410aa47fd9d3 ("wifi: cfg80211: allow suppressing FTM result reporting for PD requests") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-07	xfrm: route MIGRATE notifications to caller's netns	Maoyi Xie
	xfrm_send_migrate() in net/xfrm/xfrm_user.c and pfkey_send_migrate() in net/key/af_key.c both hardcode &init_net for the multicast that announces a successful XFRM_MSG_MIGRATE / SADB_X_MIGRATE. XFRM_MSG_MIGRATE arrives on a per-netns NETLINK_XFRM socket, and the rest of the xfrm/af_key netlink path was made netns-aware in 2008. The other 14 multicast paths in xfrm_user.c route their event using xs_net(x), xp_net(xp) or sock_net(skb->sk); only the migrate path was missed. Two consequences of the init_net hardcoding: 1. The notification (selector, old/new endpoint addresses, and the km_address) is delivered to listeners on init_net's XFRMNLGRP_MIGRATE / pfkey BROADCAST_ALL groups rather than on the issuing netns. An IKE daemon running in init_net therefore receives migration notifications originating from any other netns on the host. 2. An IKE daemon running inside a non-init netns and subscribed to its own XFRMNLGRP_MIGRATE / pfkey groups never receives the notification of its own migration. IKEv2 MOBIKE / address-update handling inside a netns is silently broken. Thread struct net through km_migrate() and the xfrm_mgr.migrate function pointer, drop the &init_net override in xfrm_send_migrate() and pfkey_send_migrate(), and pass the caller's net (already in scope in xfrm_migrate() via sock_net(skb->sk)) all the way down. struct xfrm_mgr is in-tree only and not exported as a stable API, so the function-pointer signature change is internal. pfkey_broadcast() is already netns-aware via net_generic(net, pfkey_net_id) since the pernet conversion. The five other pfkey_broadcast() callers in af_key.c already pass xs_net(x), sock_net(sk) or a per-netns net, so this only removes the &init_net outlier. Fixes: 5c79de6e79cd ("[XFRM]: User interface for handling XFRM_MSG_MIGRATE") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by: Maoyi Xie <maoyi.xie@ntu.edu.sg> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2026-05-06	Bluetooth: l2cap: defer conn param update to avoid conn->lock/hdev->lock inversion	Mikhail Gavrilov
	When a BLE peripheral sends an L2CAP Connection Parameter Update Request the processing path is: process_pending_rx() [takes conn->lock] l2cap_le_sig_channel() l2cap_conn_param_update_req() hci_le_conn_update() [takes hdev->lock] Meanwhile other code paths take the locks in the opposite order: l2cap_chan_connect() [takes hdev->lock] ... mutex_lock(&conn->lock) l2cap_conn_ready() [hdev->lock via hci_cb_list_lock] ... mutex_lock(&conn->lock) This is a classic AB/BA deadlock which lockdep reports as a circular locking dependency when connecting a BLE MIDI keyboard (Carry-On FC-49). Fix this by making hci_le_conn_update() defer the HCI command through hci_cmd_sync_queue() so it no longer needs to take hdev->lock in the caller context. The sync callback uses __hci_cmd_sync_status_sk() to wait for the HCI_EV_LE_CONN_UPDATE_COMPLETE event, then updates the stored connection parameters (hci_conn_params) and notifies userspace (mgmt_new_conn_param) only after the controller has confirmed the update. A reference on hci_conn is held via hci_conn_get()/hci_conn_put() for the lifetime of the queued work to prevent use-after-free, and hci_conn_valid() is checked before proceeding in case the connection was removed while the work was pending. The hci_dev_lock is held across hci_conn_valid() and all conn field accesses to prevent a concurrent disconnect from invalidating the connection mid-use. Fixes: f044eb0524a0 ("Bluetooth: Store latency and supervision timeout in connection params") Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2026-05-06	Merge tag 'wireless-next-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next	Jakub Kicinski
	Johannes Berg says: ==================== Lots of new content in cfg80211/mac80211, notably - more NAN work, mostly complete now (also hwsim) - more UHR work (e.g. non-primary channel access), this will continue for a while - FTM ranging APIs * tag 'wireless-next-2026-05-06' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (70 commits) wifi: mac80211: explicitly disable FTM responder on AP stop wifi: iwlwifi: don't blindly start the responder upon BSS_CHANGED_FTM_RESPONDER wifi: mac80211_hwsim: claim HT STBC capability wifi: mac80211_hwsim: enable NAN_DATA interface simulation support wifi: mac80211_hwsim: Support Tx of multicast data on NAN wifi: mac80211_hwsim: Do not declare support for NDPE wifi: mac80211_hwsim: Declare support for secure NAN wifi: mac80211_hwsim: add NAN data path TX/RX support wifi: mac80211_hwsim: set HAS_RATE_CONTROL when using NAN wifi: mac80211_hwsim: implement NAN schedule callbacks wifi: mac80211_hwsim: add NAN PHY capabilities wifi: mac80211_hwsim: add NAN_DATA interface limits wifi: mac80211_hwsim: implement NAN synchronization wifi: mac80211_hwsim: protect tsf_offset using a spinlock wifi: mac80211_hwsim: only RX on NAN when active on a slot wifi: mac80211_hwsim: select NAN TX channel based on current TSF wifi: mac80211_hwsim: limit TX of frames to the NAN DW wifi: cfg80211: don't allow NAN DATA on multi radio devices wifi: mac80211: check AP using NPCA has NPCA capability wifi: mac80211: don't parse full UHR operation from beacons ... ==================== Link: https://patch.msgid.link/20260506111147.224296-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05	net: mana: Use per-queue allocation for tx_qp to reduce allocation size	Aditya Garg
	Convert tx_qp from a single contiguous array allocation to per-queue individual allocations. Each mana_tx_qp struct is approximately 35KB. With many queues (e.g., 32/64), the flat array requires a single contiguous allocation that can fail under memory fragmentation. Change mana_tx_qp tx_qp to mana_tx_qp *tx_qp (array of pointers), allocating each queue's mana_tx_qp individually via kvzalloc. This reduces each allocation to ~35KB and provides vmalloc fallback, avoiding allocation failure due to fragmentation. Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/20260502074552.23857-2-gargaditya@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
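	The layout change can be illustrated in userspace. This is a rough analogue, not the driver code: calloc()/free() stand in for kvzalloc()/kvfree(), struct ex_tx_qp is a dummy of roughly the size quoted above, and both helper names are hypothetical.

```c
#include <stdlib.h>

/* Dummy stand-in for a ~35KB per-queue structure. */
struct ex_tx_qp {
	char payload[35 * 1024];
};

/* Free the first n queue structures and then the pointer array itself. */
static void ex_free_tx_qps(struct ex_tx_qp **qps, unsigned int n)
{
	if (!qps)
		return;
	for (unsigned int i = 0; i < n; i++)
		free(qps[i]);
	free(qps);
}

/*
 * Allocate an array of n pointers plus n individually allocated, zeroed
 * queue structures. No single request is larger than one queue struct.
 * On failure, everything allocated so far is released.
 */
static struct ex_tx_qp **ex_alloc_tx_qps(unsigned int n)
{
	struct ex_tx_qp **qps = calloc(n, sizeof(*qps));

	if (!qps)
		return NULL;
	for (unsigned int i = 0; i < n; i++) {
		qps[i] = calloc(1, sizeof(**qps));
		if (!qps[i]) {
			ex_free_tx_qps(qps, i); /* unwind the i queues done so far */
			return NULL;
		}
	}
	return qps;
}
```

	The cost is one extra pointer dereference per queue access; in exchange, the largest single allocation no longer grows with the queue count.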
2026-05-05	Merge tag 'nf-26-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf	Jakub Kicinski
	Pablo Neira Ayuso says: ==================== IPVS fixes for net The following batch contains IPVS fixes for net to address issues from the latest net-next pull request. Julian Anastasov made the following summary: 1-3) Fixes for the recently added resizable hash tables 4) dest from trash can be leaked if ip_vs_start_estimator() fails 5) fixed races and locking for the estimation kthreads 6) fix for wrong roundup_pow_of_two() usage in the resizable hash tables 7-8) v2 of the changes from Waiman Long to properly guard against the housekeeping_cpumask() updates: https://lore.kernel.org/netfilter-devel/20260331165015.2777765-1-longman@redhat.com/ I added missing Fixes tag. The original description: Since commit 041ee6f3727a ("kthread: Rely on HK_TYPE_DOMAIN for preferred affinity management"), the HK_TYPE_KTHREAD housekeeping cpumask may no longer be correct in showing the actual CPU affinity of kthreads that have no predefined CPU affinity. As the ipvs networking code is still using HK_TYPE_KTHREAD, we need to make HK_TYPE_KTHREAD reflect the reality. This patch series makes HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN and uses RCU to protect access to the HK_TYPE_KTHREAD housekeeping cpumask. Julian plans to post a nf-next patch to limit the connections by using "conn_max" sysctl. With Simon Horman, they agreed that this is an old problem that we do not have a limit of connections and it is not a stopper for this patchset.
* tag 'nf-26-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: sched/isolation: Make HK_TYPE_KTHREAD an alias of HK_TYPE_DOMAIN ipvs: Guard access of HK_TYPE_KTHREAD cpumask with RCU ipvs: fix shift-out-of-bounds in ip_vs_rht_desired_size ipvs: fix races around est_mutex and est_cpulist ipvs: do not leak dest after get from dest trash ipvs: fix the spin_lock usage for RT build ipvs: fix races around the conn_lfactor and svc_lfactor sysctl vars ipvs: fixes for the new ip_vs_status info ==================== Link: https://patch.msgid.link/20260505001648.360569-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05	amt: Store struct sock in struct amt_dev.	Kuniyuki Iwashima
	amt does not need to access struct socket itself in the fast path; it only reads struct sock, and struct socket is only used for tunnel setup and teardown. Let's store struct sock directly in struct amt. amt_dev_stop() is called as dev->netdev_ops->ndo_stop(). synchronize_net() in unregister_netdevice_many_notify() ensures that inflight amt RX fast paths finish before amt_dev is freed. amt no longer needs synchronize_rcu() in udp_tunnel_sock_release(). Note that amt_dev_stop() looks buggy; cancel_delayed_work_sync() should be called after udp_tunnel_sock_release(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260502031401.3557229-13-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05	vxlan: Free vxlan_sock with kfree_rcu().	Kuniyuki Iwashima
	We will remove synchronize_rcu() in udp_tunnel_sock_release(). We must ensure that vxlan_sock is freed after inflight RX fast path. Let's free vxlan_sock with kfree_rcu(). Note that vxlan_sock.vni_list[] is 8K and struct rcu_head must be placed before it. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260502031401.3557229-9-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
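	The placement constraint can be checked at compile time. The sketch below rests on an assumption not stated in the commit message: that kfree_rcu() identifies the object by the byte offset of its rcu_head and only accepts offsets below 4096. The struct layouts and EX_* names are simplified stand-ins, not the real vxlan definitions.

```c
#include <stddef.h>

/* Minimal stand-in for struct rcu_head. */
struct ex_rcu_head {
	struct ex_rcu_head *next;
	void (*func)(struct ex_rcu_head *head);
};

/* Simplified stand-in for vxlan_sock: small fields first, big array last. */
struct ex_vxlan_sock {
	void *sk;                 /* struct sock pointer in the real code */
	struct ex_rcu_head rcu;   /* must sit before the large array */
	void *vni_list[1024];     /* 8K on 64-bit builds */
};

/* Assumed offset limit for kfree_rcu()-style freeing. */
#define EX_KFREE_RCU_MAX_OFFSET 4096

_Static_assert(offsetof(struct ex_vxlan_sock, rcu) < EX_KFREE_RCU_MAX_OFFSET,
	       "rcu_head must be within the first 4096 bytes of the object");
```

	Moving rcu after vni_list[] would push its offset past 8K on 64-bit builds, so the assertion would fail at build time rather than misbehaving at runtime.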
2026-05-05	vxlan: Store struct sock in struct vxlan_sock.	Kuniyuki Iwashima
	Commit 3cf7203ca620 ("net/tunnel: wait until all sk_user_data reader finish before releasing the sock") added synchronize_rcu() in udp_tunnel_sock_release(). This was intended to protect the fast path of a dying vxlan device from dereferencing vxlan_sock->sock->sk after sock_orphan() has set sock->sk to NULL. However, vxlan does not need to access struct socket itself in the fast path; it only reads struct sock, and struct socket is only used for tunnel setup and teardown. Let's store struct sock directly in struct vxlan_sock. In the next patch, we will free vxlan_sock with kfree_rcu(), then vxlan no longer needs synchronize_rcu() in udp_tunnel_sock_release(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260502031401.3557229-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05	udp_tunnel: Pass struct sock to udp_tunnel_notify_{add,del}_rx_port().	Kuniyuki Iwashima
	None of the udp_tunnel users need struct socket in their fast paths; it is only used for tunnel setup / teardown. Even udp_tunnel_notify_{add,del}_rx_port() do not need struct socket. Let's change udp_tunnel_notify_{add,del}_rx_port() to take struct sock instead of struct socket. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260502031401.3557229-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05	udp_tunnel: Pass struct sock to udp_tunnel_{push,drop}_rx_port().	Kuniyuki Iwashima
	None of the udp_tunnel users need struct socket in their fast paths; it is only used for tunnel setup / teardown. Even udp_tunnel_{push,drop}_rx_port() do not need struct socket. Let's change udp_tunnel_{push,drop}_rx_port() to take struct sock instead of struct socket. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260502031401.3557229-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05	udp_tunnel: Pass struct sock to udp_tunnel6_dst_lookup().	Kuniyuki Iwashima
	None of the udp_tunnel users need struct socket in their fast paths; it is only used for tunnel setup / teardown. Even udp_tunnel6_dst_lookup() does not need struct socket. Let's change udp_tunnel6_dst_lookup() to take struct sock instead of struct socket. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260502031401.3557229-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05	udp_tunnel: Pass struct sock to setup_udp_tunnel_sock().	Kuniyuki Iwashima
	None of the udp_tunnel users need struct socket in their fast paths; it is only used for tunnel setup / teardown. Even setup_udp_tunnel_sock() does not need struct socket. Let's change setup_udp_tunnel_sock() to take struct sock instead of struct socket. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260502031401.3557229-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05	udp_tunnel: Pass struct sock to udp_tunnel_sock_release().	Kuniyuki Iwashima
	None of the udp_tunnel users need struct socket in their fast paths; it is only used for tunnel setup / teardown. While the UDP tunnel interface accepts struct socket, this encourages users to store the pointer unnecessarily. This leads to extra dereferences when accessing struct sock fields (e.g., sk->sk_user_data instead of sock->sk->sk_user_data). Furthermore, these dereferences necessitate synchronize_rcu() in udp_tunnel_sock_release() to protect the fast paths from sock_orphan() setting sk->sk_socket to NULL. This overhead can be avoided if users store the struct sock pointer directly in their private structures. As a prep, let's change udp_tunnel_sock_release() to take struct sock instead of struct socket. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260502031401.3557229-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-05	net: mana: Fix crash from unvalidated SHM offset read from BAR0 during FLR	Dipayaan Roy
	During Function Level Reset recovery, the MANA driver reads hardware BAR0 registers that may temporarily contain garbage values. The SHM (Shared Memory) offset read from GDMA_REG_SHM_OFFSET is used to compute gc->shm_base, which is later dereferenced via readl() in mana_smc_poll_register(). If the hardware returns an unaligned or out-of-range value, the driver must not blindly use it, as this would propagate the hardware error into a kernel crash. The following crash was observed on an arm64 Hyper-V guest running kernel 6.17.0-3013-azure during VF reset recovery triggered by HWC timeout. [13291.785274] Unable to handle kernel paging request at virtual address ffff8000a200001b [13291.785311] Mem abort info: [13291.785332] ESR = 0x0000000096000021 [13291.785343] EC = 0x25: DABT (current EL), IL = 32 bits [13291.785355] SET = 0, FnV = 0 [13291.785363] EA = 0, S1PTW = 0 [13291.785372] FSC = 0x21: alignment fault [13291.785382] Data abort info: [13291.785391] ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000 [13291.785404] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [13291.785412] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [13291.785421] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000014df3a1000 [13291.785432] [ffff8000a200001b] pgd=1000000100438403, p4d=1000000100438403, pud=1000000100439403, pmd=0068000fc2000711 [13291.785703] Internal error: Oops: 0000000096000021 [#1] SMP [13291.830975] Modules linked in: tls qrtr mana_ib ib_uverbs ib_core xt_owner xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables cfg80211 8021q garp mrp stp llc binfmt_misc joydev serio_raw nls_iso8859_1 hid_generic aes_ce_blk aes_ce_cipher polyval_ce ghash_ce sm4_ce_gcm sm4_ce_ccm sm4_ce sm4_ce_cipher hid_hyperv sm4 sm3_ce sha3_ce hv_netvsc hid vmgenid hyperv_keyboard hyperv_drm sch_fq_codel nvme_fabrics efi_pstore dm_multipath nfnetlink vsock_loopback vmw_vsock_virtio_transport_common hv_sock vmw_vsock_vmci_transport vmw_vmci vsock dmi_sysfs ip_tables x_tables 
autofs4 [13291.862630] CPU: 122 UID: 0 PID: 61796 Comm: kworker/122:2 Tainted: G W 6.17.0-3013-azure #13-Ubuntu VOLUNTARY [13291.869902] Tainted: [W]=WARN [13291.871901] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 01/08/2026 [13291.878086] Workqueue: events mana_serv_func [13291.880718] pstate: 62400005 (nZCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--) [13291.884835] pc : mana_smc_poll_register+0x48/0xb0 [13291.887902] lr : mana_smc_setup_hwc+0x70/0x1c0 [13291.890493] sp : ffff8000ab79bbb0 [13291.892364] x29: ffff8000ab79bbb0 x28: ffff00410c8b5900 x27: ffff00410d630680 [13291.896252] x26: ffff004171f9fd80 x25: 000000016ed55000 x24: 000000017f37e000 [13291.899990] x23: 0000000000000000 x22: 000000016ed55000 x21: 0000000000000000 [13291.904497] x20: ffff8000a200001b x19: 0000000000004e20 x18: ffff8000a6183050 [13291.908308] x17: 0000000000000000 x16: 0000000000000000 x15: 000000000000000a [13291.912542] x14: 0000000000000004 x13: 0000000000000000 x12: 0000000000000000 [13291.916298] x11: 0000000000000000 x10: 0000000000000001 x9 : ffffc45006af1bd8 [13291.920945] x8 : ffff000151129000 x7 : 0000000000000000 x6 : 0000000000000000 [13291.925293] x5 : 000000015f214000 x4 : 000000017217a000 x3 : 000000016ed50000 [13291.930436] x2 : 000000016ed55000 x1 : 0000000000000000 x0 : ffff8000a1ffffff [13291.934342] Call trace: [13291.935736] mana_smc_poll_register+0x48/0xb0 (P) [13291.938611] mana_smc_setup_hwc+0x70/0x1c0 [13291.941113] mana_hwc_create_channel+0x1a0/0x3a0 [13291.944283] mana_gd_setup+0x16c/0x398 [13291.946584] mana_gd_resume+0x24/0x70 [13291.948917] mana_do_service+0x13c/0x1d0 [13291.951583] mana_serv_func+0x34/0x68 [13291.953732] process_one_work+0x168/0x3d0 [13291.956745] worker_thread+0x2ac/0x480 [13291.959104] kthread+0xf8/0x110 [13291.961026] ret_from_fork+0x10/0x20 [13291.963560] Code: d2807d00 9417c551 71000673 54000220 (b9400281) [13291.967299] ---[ end trace 0000000000000000 ]--- Disassembly of 
mana_smc_poll_register() around the crash site: Disassembly of section .text: 00000000000047c8 <mana_smc_poll_register>: 47c8: d503201f nop 47cc: d503201f nop 47d0: d503233f paciasp 47d4: f800865e str x30, [x18], #8 47d8: a9bd7bfd stp x29, x30, [sp, #-48]! 47dc: 910003fd mov x29, sp 47e0: a90153f3 stp x19, x20, [sp, #16] 47e4: 91007014 add x20, x0, #0x1c 47e8: 5289c413 mov w19, #0x4e20 47ec: f90013f5 str x21, [sp, #32] 47f0: 12001c35 and w21, w1, #0xff 47f4: 14000008 b 4814 <mana_smc_poll_register+0x4c> 47f8: 36f801e1 tbz w1, #31, 4834 <mana_smc_poll_register+0x6c> 47fc: 52800042 mov w2, #0x2 4800: d280fa01 mov x1, #0x7d0 4804: d2807d00 mov x0, #0x3e8 4808: 94000000 bl 0 <usleep_range_state> 480c: 71000673 subs w19, w19, #0x1 4810: 54000200 b.eq 4850 <mana_smc_poll_register+0x88> 4814: b9400281 ldr w1, [x20] <-- ** CRASHED HERE *** 4818: d50331bf dmb oshld 481c: 2a0103e2 mov w2, w1 ... From the crash signature x20 = ffff8000a200001b, this address ends in 0x1b which is not 4-byte aligned, so the 'ldr w1, [x20]' instruction (readl) triggers the arm64 alignment fault (FSC = 0x21). The root cause is in mana_gd_init_vf_regs(), which computes: gc->shm_base = gc->bar0_va + mana_gd_r64(gc, GDMA_REG_SHM_OFFSET); The offset is used without any validation. The same problem exists in mana_gd_init_pf_regs() for sriov_base_off and sriov_shm_off. Fix this by validating all offsets before use: - VF: check shm_off is within BAR0, properly aligned to 4 bytes (readl requirement), and leaves room for the full 256-bit (32-byte) SMC aperture. - PF: check sriov_base_off is within BAR0, aligned to 8 bytes (readq requirement), and leaves room to safely read the sriov_shm_off register at sriov_base_off + GDMA_PF_REG_SHM_OFF. Then check sriov_shm_off leaves room for the full SMC aperture. All arithmetic uses subtraction rather than addition to avoid integer overflow on garbage values. 
Define SMC_APERTURE_SIZE (32 bytes, derived from the 256-bit aperture width). Return -EPROTO on invalid values. The existing recovery path in mana_serv_reset() already handles -EPROTO by falling through to PCI device rescan, giving the hardware another chance to present valid register values after reset. Fixes: 9bf66036d686 ("net: mana: Handle hardware recovery events when probing the device") Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com> Link: https://patch.msgid.link/afQUMClyjmBVfD+u@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
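	The validation rules above (alignment, range, and room for the aperture, all without additive overflow) can be sketched as one helper. This is an illustration, not the driver's code: ex_reg_off_ok() and EX_SMC_APERTURE_SIZE are hypothetical names, and the BAR size is passed in as a plain integer.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_SMC_APERTURE_SIZE 32u /* 256 bits, per the description above */

/*
 * Check an offset read from hardware before using it.
 *   off      - untrusted offset read from a BAR0 register
 *   bar_size - size of the mapped BAR in bytes
 *   align    - required alignment, a power of two (4 for readl, 8 for readq)
 *   need     - bytes that must fit between off and the end of the BAR
 *
 * The range test is "off > bar_size - need" rather than
 * "off + need > bar_size", so a garbage off near UINT64_MAX cannot wrap.
 */
static bool ex_reg_off_ok(uint64_t off, uint64_t bar_size,
			  uint64_t align, uint64_t need)
{
	if (align == 0 || (align & (align - 1)))
		return false;   /* not a power of two */
	if (off & (align - 1))
		return false;   /* misaligned, e.g. an offset ending in 0x1b */
	if (bar_size < need)
		return false;   /* BAR too small, subtraction would wrap */
	return off <= bar_size - need;
}
```

	A caller would reject the device state (returning -EPROTO in the commit's terms) whenever the helper returns false, instead of computing a base pointer from the raw value.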
2026-05-05	wifi: cfg80211: separate NPCA validity from chandef validity	Johannes Berg
	When considering both NPCA and DBE, it can appear that the NPCA configuration is invalid, e.g. for an 80 MHz BSS channel with DBE to 160 MHz:

	  | primary channel
	  |       | NPCA primary channel
	  |       |
	  V       V
	| p |   | n |   |   |   |   |   |
	|  BSS channel  |
	|          DBE channel          |

	Now the NPCA primary channel is in the same half as the primary channel, and the NPCA puncturing bitmap could be completely invalid as a puncturing bitmap when considering the overall channel. Split out the validity checks from cfg80211_chandef_valid() to a new cfg80211_chandef_npca_valid() function that just checks the NPCA configuration against the BSS chandef. Link: https://patch.msgid.link/20260428112708.1225df131557.If3a6afadcce05d215b72fd82175f72373a0f6d24@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-05	wifi: mac80211: mlme: use NPCA chandef if capable	Johannes Berg
	If the device is capable, parse the AP chandef with NPCA. Also advertise the other NPCA operational parameters to the underlying driver and track if they change (though not with BSS critical update etc. yet). Since NPCA can only be enabled when the chanctx isn't shared, the channel context code needs to clear/set npca.enabled in the per-link configuration, except during association since we can't enable NPCA before having completed association. In this case, set npca.enabled during the association process. Link: https://patch.msgid.link/20260428112708.eb1e42c0b6d7.I0acd8445d4600363afb8430922531450399d0fab@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-05	wifi: mac80211: allow only AP chanctx sharing with NPCA	Johannes Berg
	When two interfaces share a channel context, disable NPCA unless both are AP interfaces that require NPCA. This way, two AP interfaces can have identical chandefs set up and share the channel context, but any non-APs cannot share a chanctx with NPCA (they'd almost certainly have different BSS color.) This doesn't mean the chanctx cannot be shared but rather that NPCA will be disabled on the shared channel context. Link: https://patch.msgid.link/20260428112708.3832e15f4e78.I08a7c7f47d796f4d5d8f9a682c1fba37db2e4cf5@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-05	wifi: cfg80211: add helper for parsing NPCA to chandef	Johannes Berg
	Add a cfg80211_chandef_add_npca() helper function that takes an existing chandef without NPCA and sets the NPCA information from the format used in UHR operation and UHR Parameters Update. Link: https://patch.msgid.link/20260428112708.5cdc4e69a306.I95d396ac671da438f340b1afb735ebfe33164894@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-05	wifi: cfg80211: allow representing NPCA in chandef	Johannes Berg
	Add the necessary fields to the chandef data structure to represent NPCA (the NPCA primary channel and NPCA punctured/disabled subchannels bitmap), and the code to check these for validity, compatibility, as well as allowing it to be passed for AP mode for capable devices. Compatibility is assumed to only be the case when it's actually identical, enabling later management of this in channel contexts in mac80211 for multiple APs, but requiring userspace to set up the identical chandef on all AP interfaces that share a channel (and BSS color.) Link: https://patch.msgid.link/20260428112708.46f3872aeb35.I85888dab88a6659ba52db4b3318979ca5bcfc0c8@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-05	wifi: cfg80211: allow devices to advertise extended MLD capa/ops	Johannes Berg
	For UHR, multi-link power-management capability lives there, and so it's needed that hostapd knows what to advertise, and clients should have it shown to userspace for information. Repurpose the existing NL80211_ATTR_ASSOC_MLD_EXT_CAPA_OPS by renaming it to NL80211_ATTR_EXT_MLD_CAPA_AND_OPS (with a define for compatibility) and advertise the capabilities. We can also later use the value, if needed, to set per-station capabilities on STAs added to AP interfaces. Link: https://patch.msgid.link/20260428110915.e808e70feed6.I378a7c017bfc1ebb072fa8d5d1db2ac9b45596c9@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2026-05-05	wifi: mac80211: track AP's extended MLD capa/ops	Johannes Berg
	For UHR multi-link power management, the driver/device needs to know if the AP supports it, to be able to use it. Track the AP's extended MLD capabilities and operations so it does. Link: https://patch.msgid.link/20260428110915.e4038a00e4b2.I323686be5d4a73e8b962019a30d51309496b86a6@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>