summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2026-03-05drm/ras: Introduce the DRM RAS infrastructure over generic netlinkRodrigo Vivi
Introduces the DRM RAS infrastructure over generic netlink. The new interface allows drivers to expose RAS nodes and their associated error counters to userspace in a structured and extensible way. Each drm_ras node can register its own set of error counters, which are then discoverable and queryable through netlink operations. This lays the groundwork for reporting and managing hardware error states in a unified manner across different DRM drivers. Currently it only supports error-counter nodes. But it can be extended later. The registration is also not tied to any drm node, so it can be used by accel devices as well. It uses the new and mandatory YAML description format stored in Documentation/netlink/specs/. This forces a single generic netlink family namespace for the entire drm: "drm-ras". But multiple-endpoints are supported within the single family. Any modification to this API needs to be applied to Documentation/netlink/specs/drm_ras.yaml before regenerating the code: $ tools/net/ynl/pyynl/ynl_gen_c.py --spec \ Documentation/netlink/specs/drm_ras.yaml --mode uapi --header \ -o include/uapi/drm/drm_ras.h $ tools/net/ynl/pyynl/ynl_gen_c.py --spec \ Documentation/netlink/specs/drm_ras.yaml --mode kernel \ --header -o drivers/gpu/drm/drm_ras_nl.h $ tools/net/ynl/pyynl/ynl_gen_c.py --spec \ Documentation/netlink/specs/drm_ras.yaml \ --mode kernel --source -o drivers/gpu/drm/drm_ras_nl.c Cc: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: netdev@vger.kernel.org Co-developed-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Zack McKevitt <zachary.mckevitt@oss.qualcomm.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/20260304074412.464435-8-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-03-05x86/cpu: Add platform ID to CPU matching structureDave Hansen
The existing x86_match_cpu() infrastructure can be used to match a bunch of attributes of a CPU: vendor, family, model, steppings and CPU features. But, there's one more attribute that's missing and unable to be matched against: the platform ID, enumerated on Intel CPUs in MSR_IA32_PLATFORM_ID. It is a little more obscure and is only queried during microcode loading. This is because Intel sometimes has CPUs with identical family/model/stepping but which need different microcode. These CPUs are differentiated with the platform ID. Add a field in 'struct x86_cpu_id' for the platform ID. Similar to the stepping field, make the new field a mask of platform IDs. Some examples: 0x01: matches only platform ID 0x0 0x02: matches only platform ID 0x1 0x03: matches platform IDs 0x0 or 0x1 0x80: matches only platform ID 0x7 0xff: matches all 8 possible platform IDs Since the mask is only a byte wide, it nestles in next to another u8 and does not even increase the size of 'struct x86_cpu_id'. Reserve the all 0's value as the wildcard (X86_PLATFORM_ANY). This avoids forcing changes to existing 'struct x86_cpu_id' users. They can just continue to fill the field with 0's and their matching will work exactly as before. Note: If someone is ever looking for space in 'struct x86_cpu_id', this new field could probably get stuck over in ->driver_data for the one user that there is. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://patch.msgid.link/20260304181022.058DF07C@davehans-spike.ostc.intel.com
2026-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-7.0-rc3). No conflicts. Adjacent changes: net/netfilter/nft_set_rbtree.c fb7fb4016300 ("netfilter: nf_tables: clone set on flush only") 3aea466a4399 ("netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-05Merge tag 'libcrypto-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library fixes from Eric Biggers: - Several test fixes: - Fix flakiness in the interrupt context tests in certain VMs - Make the lib/crypto/ KUnit tests depend on the corresponding library options rather than selecting them. This follows the standard KUnit convention, and it fixes an issue where enabling CONFIG_KUNIT_ALL_TESTS pulled in all the crypto library code - Add a kunitconfig file for lib/crypto/ - Fix a couple stale references to "aes-generic" that made it in concurrently with the rename to "aes-lib" - Update the help text for several CRYPTO kconfig options to remove outdated information about users that now use the library instead * tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: crypto: testmgr - Fix stale references to aes-generic crypto: Clean up help text for CRYPTO_CRC32 crypto: Clean up help text for CRYPTO_CRC32C crypto: Clean up help text for CRYPTO_XXHASH crypto: Clean up help text for CRYPTO_SHA256 crypto: Clean up help text for CRYPTO_BLAKE2B lib/crypto: tests: Add a .kunitconfig file lib/crypto: tests: Depend on library options rather than selecting them kunit: irq: Ensure timer doesn't fire too frequently
2026-03-05Merge tag 'net-7.0-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from CAN, netfilter and wireless. Current release - new code bugs: - sched: cake: fixup cake_mq rate adjustment for diffserv config - wifi: fix missing ieee80211_eml_params member initialization Previous releases - regressions: - tcp: give up on stronger sk_rcvbuf checks (for now) Previous releases - always broken: - net: fix rcu_tasks stall in threaded busypoll - sched: - fq: clear q->band_pkt_count[] in fq_reset() - only allow act_ct to bind to clsact/ingress qdiscs and shared blocks - bridge: check relevant per-VLAN options in VLAN range grouping - xsk: fix fragment node deletion to prevent buffer leak Misc: - spring cleanup of inactive maintainers" * tag 'net-7.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits) xdp: produce a warning when calculated tailroom is negative net: enetc: use truesize as XDP RxQ info frag_size libeth, idpf: use truesize as XDP RxQ info frag_size i40e: use xdp.frame_sz as XDP RxQ info frag_size i40e: fix registering XDP RxQ info ice: change XDP RxQ frag_size from DMA write length to xdp.frame_sz ice: fix rxq info registering in mbuf packets xsk: introduce helper to determine rxq->frag_size xdp: use modulo operation to calculate XDP frag tailroom selftests/tc-testing: Add tests exercising act_ife metalist replace behaviour net/sched: act_ife: Fix metalist update behavior selftests: net: add test for IPv4 route with loopback IPv6 nexthop net: ipv6: fix panic when IPv4 route references loopback IPv6 nexthop net: vxlan: fix nd_tbl NULL dereference when IPv6 is disabled net: bridge: fix nd_tbl NULL dereference when IPv6 is disabled MAINTAINERS: remove Thomas Falcon from IBM ibmvnic MAINTAINERS: remove Claudiu Manoil and Alexandre Belloni from Ocelot switch MAINTAINERS: replace Taras Chornyi with Elad Nachman for Marvell Prestera MAINTAINERS: remove Jonathan Lemon from OpenCompute PTP MAINTAINERS: replace Clark Wang with Frank Li for Freescale FEC ...
2026-03-05ACPI: CPPC: Move reference performance to capabilitiesPengjie Zhang
Currently, the `Reference Performance` register is read every time the CPU frequency is sampled in `cppc_get_perf_ctrs()`. This function is on the hot path of the cppc_cpufreq driver. Reference Performance indicates the performance level that corresponds to the Reference Counter incrementing and is not expected to change dynamically during runtime (unlike the Delivered and Reference counters). Reading this register in the hot path incurs unnecessary overhead, particularly on platforms where CPC registers are located in the PCC (Platform Communication Channel) subspace. This patch moves `reference_perf` from the dynamic feedback counters structure (`cppc_perf_fb_ctrs`) to the static capabilities structure (`cppc_perf_caps`). Signed-off-by: Pengjie Zhang <zhangpengjie2@huawei.com> [ rjw: Changelog adjustment ] Link: https://patch.msgid.link/20260213100935.19111-1-zhangpengjie2@huawei.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-03-05evm: fix security.evm for a file with IMA signatureCoiby Xu
When both IMA and EVM fix modes are enabled, accessing a file with IMA signature but missing EVM HMAC won't cause security.evm to be fixed. Add a function evm_fix_hmac which will be explicitly called to fix EVM HMAC for this case. Suggested-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2026-03-05Merge tag 'asoc-fix-v7.0-rc2' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v7.0 A moderately large pile of fixes, though none of them are super major, plus a few new quirks and device IDs.
2026-03-05integrity: Make arch_ima_get_secureboot integrity-wideCoiby Xu
EVM and other LSMs need the ability to query the secure boot status of the system, without directly calling the IMA arch_ima_get_secureboot function. Refactor the secure boot status check into a general function named arch_get_secureboot. Reported-and-suggested-by: Mimi Zohar <zohar@linux.ibm.com> Suggested-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2026-03-05Merge tag 'trace-v7.0-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix thresh_return of function graph tracer The update to store data on the shadow stack removed the abuse of using the task recursion word as a way to keep track of what functions to ignore. The trace_graph_return() was updated to handle this, but when function_graph tracer is using a threshold (only trace functions that took longer than a specified time), it uses trace_graph_thresh_return() instead. This function was still incorrectly using the task struct recursion word causing the function graph tracer to permanently set all functions to "notrace" - Fix thresh_return nosleep accounting When the calltime was moved to the shadow stack storage instead of being on the fgraph descriptor, the calculations for the amount of sleep time was updated. The calculation was done in the trace_graph_thresh_return() function, which also called the trace_graph_return(), which did the calculation again, causing the time to be doubled. Remove the call to trace_graph_return() as what it needed to do wasn't that much, and just do the work in trace_graph_thresh_return(). - Fix syscall trace event activation on boot up The syscall trace events are pseudo events attached to the raw_syscall tracepoints. When the first syscall event is enabled, it enables the raw_syscall tracepoint and doesn't need to do anything when a second syscall event is also enabled. When events are enabled via the kernel command line, syscall events are partially enabled as the enabling is called before rcu_init. This is due to allow early events to be enabled immediately. Because kernel command line events do not distinguish between different types of events, the syscall events are enabled here but are not fully functioning. After rcu_init, they are disabled and re-enabled so that they can be fully enabled. The problem happened is that this "disable-enable" is done one at a time. If more than one syscall event is specified on the command line, by disabling them one at a time, the counter never gets to zero, and the raw_syscall is not disabled and enabled, keeping the syscall events in their non-fully functional state. Instead, disable all events and re-enabled them all, as that will ensure the raw_syscall event is also disabled and re-enabled. - Disable preemption in ftrace pid filtering The ftrace pid filtering attaches to the fork and exit tracepoints to add or remove pids that should be traced. They access variables protected by RCU (preemption disabled). Now that tracepoint callbacks are called with preemption enabled, this protection needs to be added explicitly, and not depend on the functions being called with preemption disabled. - Disable preemption in event pid filtering The event pid filtering needs the same preemption disabling guards as ftrace pid filtering. - Fix accounting of the memory mapped ring buffer on fork Memory mapping the ftrace ring buffer sets the vm_flags to DONTCOPY. But this does not prevent the application from calling madvise(MADVISE_DOFORK). This causes the mapping to be copied on fork. After the first tasks exits, the mapping is considered unmapped by everyone. But when he second task exits, the counter goes below zero and triggers a WARN_ON. Since nothing prevents two separate tasks from mmapping the ftrace ring buffer (although two mappings may mess each other up), there's no reason to stop the memory from being copied on fork. Update the vm_operations to have an ".open" handler to update the accounting and let the ring buffer know someone else has it mapped. - Add all ftrace headers in MAINTAINERS file The MAINTAINERS file only specifies include/linux/ftrace.h But misses ftrace_irq.h and ftrace_regs.h. Make the file use wildcards to get all *ftrace* files. * tag 'trace-v7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ftrace: Add MAINTAINERS entries for all ftrace headers tracing: Fix WARN_ON in tracing_buffers_mmap_close tracing: Disable preemption in the tracepoint callbacks handling filtered pids ftrace: Disable preemption in the tracepoint callbacks handling filtered pids tracing: Fix syscall events activation by ensuring refcount hits zero fgraph: Fix thresh_return nosleeptime double-adjust fgraph: Fix thresh_return clear per-task notrace
2026-03-05libeth, idpf: use truesize as XDP RxQ info frag_sizeLarysa Zaremba
The only user of frag_size field in XDP RxQ info is bpf_xdp_frags_increase_tail(). It clearly expects whole buffer size instead of DMA write size. Different assumptions in idpf driver configuration lead to negative tailroom. To make it worse, buffer sizes are not actually uniform in idpf when splitq is enabled, as there are several buffer queues, so rxq->rx_buf_size is meaningless in this case. Use truesize of the first bufq in AF_XDP ZC, as there is only one. Disable growing tail for regular splitq. Fixes: ac8a861f632e ("idpf: prepare structures to support XDP") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20260305111253.2317394-8-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-05xsk: introduce helper to determine rxq->frag_sizeLarysa Zaremba
rxq->frag_size is basically a step between consecutive strictly aligned frames. In ZC mode, chunk size fits exactly, but if chunks are unaligned, there is no safe way to determine accessible space to grow tailroom. Report frag_size to be zero, if chunks are unaligned, chunk_size otherwise. Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://patch.msgid.link/20260305111253.2317394-3-larysa.zaremba@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-05net/sched: act_ife: Fix metalist update behaviorJamal Hadi Salim
Whenever an ife action replace changes the metalist, instead of replacing the old data on the metalist, the current ife code is appending the new metadata. Aside from being innapropriate behavior, this may lead to an unbounded addition of metadata to the metalist which might cause an out of bounds error when running the encode op: [ 138.423369][ C1] ================================================================== [ 138.424317][ C1] BUG: KASAN: slab-out-of-bounds in ife_tlv_meta_encode (net/ife/ife.c:168) [ 138.424906][ C1] Write of size 4 at addr ffff8880077f4ffe by task ife_out_out_bou/255 [ 138.425778][ C1] CPU: 1 UID: 0 PID: 255 Comm: ife_out_out_bou Not tainted 7.0.0-rc1-00169-gfbdfa8da05b6 #624 PREEMPT(full) [ 138.425795][ C1] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 138.425800][ C1] Call Trace: [ 138.425804][ C1] <IRQ> [ 138.425808][ C1] dump_stack_lvl (lib/dump_stack.c:122) [ 138.425828][ C1] print_report (mm/kasan/report.c:379 mm/kasan/report.c:482) [ 138.425839][ C1] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 138.425844][ C1] ? __virt_addr_valid (./arch/x86/include/asm/preempt.h:95 (discriminator 1) ./include/linux/rcupdate.h:975 (discriminator 1) ./include/linux/mmzone.h:2207 (discriminator 1) arch/x86/mm/physaddr.c:54 (discriminator 1)) [ 138.425853][ C1] ? ife_tlv_meta_encode (net/ife/ife.c:168) [ 138.425859][ C1] kasan_report (mm/kasan/report.c:221 mm/kasan/report.c:597) [ 138.425868][ C1] ? ife_tlv_meta_encode (net/ife/ife.c:168) [ 138.425878][ C1] kasan_check_range (mm/kasan/generic.c:186 (discriminator 1) mm/kasan/generic.c:200 (discriminator 1)) [ 138.425884][ C1] __asan_memset (mm/kasan/shadow.c:84 (discriminator 2)) [ 138.425889][ C1] ife_tlv_meta_encode (net/ife/ife.c:168) [ 138.425893][ C1] ? ife_tlv_meta_encode (net/ife/ife.c:171) [ 138.425898][ C1] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 138.425903][ C1] ife_encode_meta_u16 (net/sched/act_ife.c:57) [ 138.425910][ C1] ? __pfx_do_raw_spin_lock (kernel/locking/spinlock_debug.c:114) [ 138.425916][ C1] ? __asan_memcpy (mm/kasan/shadow.c:105 (discriminator 3)) [ 138.425921][ C1] ? __pfx_ife_encode_meta_u16 (net/sched/act_ife.c:45) [ 138.425927][ C1] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 138.425931][ C1] tcf_ife_act (net/sched/act_ife.c:847 net/sched/act_ife.c:879) To solve this issue, fix the replace behavior by adding the metalist to the ife rcu data structure. Fixes: aa9fd9a325d51 ("sched: act: ife: update parameters via rcu handling") Reported-by: Ruitong Liu <cnitlrt@gmail.com> Tested-by: Ruitong Liu <cnitlrt@gmail.com> Co-developed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260304140603.76500-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-05net: ethtool: Update doc for tunableMohsin Bashir
ETHTOOL_PFC_PREVENTION_TOUT enables the configuration of timeout value for PFC storm prevention. This can also be used to configure storm detection timeout for global pause settings. In fact some existing drivers are already using it for the said purpose. Highlight that the knob can formally be used to configure timeout value for pause storm prevention mechanism. The update to the ethtool man page will follow afterwards. Link: https://lore.kernel.org/aa5f189a-ac62-4633-97b5-ebf939e9c535@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260302230149.1580195-3-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-05net: ethtool: Track pause storm eventsMohsin Bashir
With TX pause enabled, if a device is unable to pass packets up to the stack (e.g., CPU is hanged), the device can cause pause storm. Given that devices can have native support to protect the neighbor from such flooding, such events need some tracking. This support is to track TX pause storm events for better observability. Reviewed-by: Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260302230149.1580195-2-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-05powercap: intel_rapl: Allow interface drivers to configure rapl_defaultsKuppuswamy Sathyanarayanan
RAPL default settings vary across different RAPL interfaces (MSR, TPMI, MMIO). Currently, these defaults are stored in the common RAPL driver, which requires interface-specific handling logic and makes the common layer unnecessarily complex. There is no strong reason for the common code to own these defaults, since they are inherently interface-specific. To prepare for moving default configuration into the individual interface drivers, 1. Move struct rapl_defaults into a shared header so that interface drivers can directly populate their own default settings. 2. Change the @defaults field in struct rapl_if_priv from void * to const struct rapl_defaults * to improve type safety and readability and update the common driver to use the typed defaults structure. 3. Update all internal getter functions and local pointers to use const struct rapl_defaults * to maintain const-correctness. 4. Rename and export the common helper functions (check_unit, set_floor_freq, compute_time_window) so interface drivers may reuse or override them as appropriate. No functional changes. This is a preparatory refactoring to allow interface drivers to supply their own RAPL default settings. Co-developed-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20260212233044.329790-9-sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-03-05powercap: intel_rapl: Use unit conversion macros from units.hKuppuswamy Sathyanarayanan
Replace hardcoded numeric constants with standard unit conversion macros from linux/units.h for better code clarity and self-documentation. Add MICROJOULE_PER_JOULE and NANOJOULE_PER_JOULE to units.h to support energy unit conversions, following the existing pattern for power units. No functional changes. Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Link: https://patch.msgid.link/20260212233044.329790-8-sathyanarayanan.kuppuswamy@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-03-05netfilter: nft_set_pipapo: split gc into unlink and reclaim phaseFlorian Westphal
Yiming Qian reports Use-after-free in the pipapo set type: Under a large number of expired elements, commit-time GC can run for a very long time in a non-preemptible context, triggering soft lockup warnings and RCU stall reports (local denial of service). We must split GC in an unlink and a reclaim phase. We cannot queue elements for freeing until pointers have been swapped. Expired elements are still exposed to both the packet path and userspace dumpers via the live copy of the data structure. call_rcu() does not protect us: dump operations or element lookups starting after call_rcu has fired can still observe the free'd element, unless the commit phase has made enough progress to swap the clone and live pointers before any new reader has picked up the old version. This a similar approach as done recently for the rbtree backend in commit 35f83a75529a ("netfilter: nft_set_rbtree: don't gc elements on insert"). Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Reported-by: Yiming Qian <yimingqian591@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-03-05netfilter: nf_tables: clone set on flush onlyPablo Neira Ayuso
Syzbot with fault injection triggered a failing memory allocation with GFP_KERNEL which results in a WARN splat: iter.err WARNING: net/netfilter/nf_tables_api.c:845 at nft_map_deactivate+0x34e/0x3c0 net/netfilter/nf_tables_api.c:845, CPU#0: syz.0.17/5992 Modules linked in: CPU: 0 UID: 0 PID: 5992 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2026 RIP: 0010:nft_map_deactivate+0x34e/0x3c0 net/netfilter/nf_tables_api.c:845 Code: 8b 05 86 5a 4e 09 48 3b 84 24 a0 00 00 00 75 62 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc e8 63 6d fa f7 90 <0f> 0b 90 43 +80 7c 35 00 00 0f 85 23 fe ff ff e9 26 fe ff ff 89 d9 RSP: 0018:ffffc900045af780 EFLAGS: 00010293 RAX: ffffffff89ca45bd RBX: 00000000fffffff4 RCX: ffff888028111e40 RDX: 0000000000000000 RSI: 00000000fffffff4 RDI: 0000000000000000 RBP: ffffc900045af870 R08: 0000000000400dc0 R09: 00000000ffffffff R10: dffffc0000000000 R11: fffffbfff1d141db R12: ffffc900045af7e0 R13: 1ffff920008b5f24 R14: dffffc0000000000 R15: ffffc900045af920 FS: 000055557a6a5500(0000) GS:ffff888125496000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb5ea271fc0 CR3: 000000003269e000 CR4: 00000000003526f0 Call Trace: <TASK> __nft_release_table+0xceb/0x11f0 net/netfilter/nf_tables_api.c:12115 nft_rcv_nl_event+0xc25/0xdb0 net/netfilter/nf_tables_api.c:12187 notifier_call_chain+0x19d/0x3a0 kernel/notifier.c:85 blocking_notifier_call_chain+0x6a/0x90 kernel/notifier.c:380 netlink_release+0x123b/0x1ad0 net/netlink/af_netlink.c:761 __sock_release net/socket.c:662 [inline] sock_close+0xc3/0x240 net/socket.c:1455 Restrict set clone to the flush set command in the preparation phase. Add NFT_ITER_UPDATE_CLONE and use it for this purpose, update the rbtree and pipapo backends to only clone the set when this iteration type is used. As for the existing NFT_ITER_UPDATE type, update the pipapo backend to use the existing set clone if available, otherwise use the existing set representation. After this update, there is no need to clone a set that is being deleted, this includes bound anonymous set. An alternative approach to NFT_ITER_UPDATE_CLONE is to add a .clone interface and call it from the flush set path. Reported-by: syzbot+4924a0edc148e8b4b342@syzkaller.appspotmail.com Fixes: 3f1d886cc7c3 ("netfilter: nft_set_pipapo: move cloning of match info to insert/removal path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2026-03-05net: Provide a PREEMPT_RT specific check for netdev_queue::_xmit_lockSebastian Andrzej Siewior
After acquiring netdev_queue::_xmit_lock the number of the CPU owning the lock is recorded in netdev_queue::xmit_lock_owner. This works as long as the BH context is not preemptible. On PREEMPT_RT the softirq context is preemptible and without the softirq-lock it is possible to have multiple user in __dev_queue_xmit() submitting a skb on the same CPU. This is fine in general but this means also that the current CPU is recorded as netdev_queue::xmit_lock_owner. This in turn leads to the recursion alert and the skb is dropped. Instead checking the for CPU number, that owns the lock, PREEMPT_RT can check if the lockowner matches the current task. Add netif_tx_owned() which returns true if the current context owns the lock by comparing the provided CPU number with the recorded number. This resembles the current check by negating the condition (the current check returns true if the lock is not owned). On PREEMPT_RT use rt_mutex_owner() to return the lock owner and compare the current task against it. Use the new helper in __dev_queue_xmit() and netif_local_xmit_active() which provides a similar check. Update comments regarding pairing READ_ONCE(). Reported-by: Bert Karwatzki <spasswolf@web.de> Closes: https://lore.kernel.org/all/20260216134333.412332-1-spasswolf@web.de Fixes: 3253cb49cbad4 ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: Bert Karwatzki <spasswolf@web.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20260302162631.uGUyIqDT@linutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-05RDMA/mlx5: Add support for TLP VAR allocationMaher Sanalla
Extend the VAR allocation UAPI to accept an optional flags attribute, allowing userspace to request TLP VAR allocation via the MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP flag. When the TLP flag "MLX5_IB_UAPI_VAR_ALLOC_FLAG_TLP" is specified, the driver selects the TLP VAR region for allocation instead of the regular VirtIO VAR region. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05Add support for TLP emulationLeon Romanovsky
This series adds support for Transaction Layer Packet (TLP) emulation response gateway regions, enabling userspace device emulation software to write TLP responses directly to lower layers without kernel driver involvement. Currently, the mlx5 driver exposes VirtIO emulation access regions via the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that ioctl to also support allocating TLP response gateway channels for PCI device emulation use cases. Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05RDMA/core: Delete not-implemented get_vector_affinityLeon Romanovsky
No drivers implement .get_vector_affinity(), and no callers invoke ib_get_vector_affinity(), so remove it. Link: https://patch.msgid.link/20260226-get_vector_affinity-v1-1-910a899c4e5d@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05RDMA/nldev: Expose kernel-internal FRMR pools in netlinkMichael Guralnik
Allow netlink users, through the usage of driver-details netlink attribute, to get information about internal FRMR pools that use the kernel_vendor_key FRMR key member. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-11-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05RDMA/nldev: Add command to set pinned FRMR handlesMichael Guralnik
Allow users to set through netlink, for a specific FRMR pool, the amount of handles that are not aged, and fill the pool to this amount. This allows users to warm-up the FRMR pools to an expected amount of handles with specific attributes that fits their expected usage. Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260226-frmr_pools-v4-10-95360b54f15e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05net/mlx5: Expose TLP emulation capabilitiesMaher Sanalla
Expose and query TLP device emulation caps on driver load. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05net/mlx5: Add TLP emulation device capabilitiesMaher Sanalla
Introduce the hardware structures and definitions needed for the driver support of TLP emulation in mlx5_ifc. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-05Merge tag 'nf-next-26-03-04' of ↵Paolo Abeni
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: updates for net-next The following patchset contains Netfilter updates for *net-next*, including changes to IPv6 stack and updates to IPVS from Julian Anastasov. 1) ipv6: export fib6_lookup for nft_fib_ipv6 module 2) factor out ipv6_anycast_destination logic so its usable without dst_entry. These are dependencies for patch 3. 3) switch nft_fib_ipv6 module to no longer need temporary dst_entry object allocations by using fib6_lookup() + RCU. This gets us ~13% higher packet rate in my tests. Patches 4 to 8, from Eric Dumazet, zap sk_callback_lock usage in netfilter. Patch 9 removes another sk_callback_lock instance. Remaining patches, from Julian Anastasov, improve IPVS, Quoting Julian: * Add infrastructure for resizable hash tables based on hlist_bl. * Change the 256-bucket service hash table to be resizable. * Change the global connection table to be per-net and resizable. * Make connection hashing more secure for setups with multiple services. netfilter pull request nf-next-26-03-04 * tag 'nf-next-26-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: ipvs: use more keys for connection hashing ipvs: switch to per-net connection table ipvs: use resizable hash table for services ipvs: add resizable hash tables rculist_bl: add hlist_bl_for_each_entry_continue_rcu netfilter: nfnetlink_queue: remove locking in nfqnl_get_sk_secctx netfilter: nfnetlink_queue: no longer acquire sk_callback_lock netfilter: nfnetlink_log: no longer acquire sk_callback_lock netfilter: nft_meta: no longer acquire sk_callback_lock in nft_meta_get_eval_skugid() netfilter: xt_owner: no longer acquire sk_callback_lock in mt_owner() netfilter: nf_log_syslog: no longer acquire sk_callback_lock in nf_log_dump_sk_uid_gid() netfilter: nft_fib_ipv6: switch to fib6_lookup ipv6: make ipv6_anycast_destination logic usable without dst_entry ipv6: export fib6_lookup for nft_fib_ipv6 ==================== Link: https://patch.msgid.link/20260304114921.31042-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-04net: use ktime_t in struct scm_timestamping_internalEric Dumazet
Instead of using struct timespec64 in scm_timestamping_internal, use ktime_t, saving 24 bytes in kernel stack. This makes tcp_update_recv_tstamps() small enough to be inlined. The ktime_t -> timespec64 conversions happen after socket lock has been released in tcp_recvmsg(), and only if the application requested them. $ scripts/bloat-o-meter -t vmlinux.0 vmlinux add/remove: 0/2 grow/shrink: 5/4 up/down: 146/-277 (-131) Function old new delta tcp_zerocopy_receive 2383 2425 +42 mptcp_recvmsg 1565 1607 +42 tcp_recvmsg_locked 3797 3823 +26 put_cmsg_scm_timestamping64 131 149 +18 put_cmsg_scm_timestamping 131 149 +18 __pfx_tcp_update_recv_tstamps 16 - -16 do_tcp_getsockopt 4024 4006 -18 tcp_recv_timestamp 474 430 -44 tcp_zc_handle_leftover 417 371 -46 __sock_recv_timestamp 1087 1031 -56 tcp_update_recv_tstamps 97 - -97 Total: Before=25223788, After=25223657, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20260304012747.881644-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04net: openvswitch: clean up some kernel-doc warningsRandy Dunlap
Fix some kernel-doc warnings in openvswitch.h: Mark enum placeholders that are not used as "private" so that kernel-doc comments are not needed for them. Correct names for 2 enum values: Warning: include/uapi/linux/openvswitch.h:300 Excess enum value '@OVS_VPORT_UPCALL_SUCCESS' description in 'ovs_vport_upcall_attr' Warning: include/uapi/linux/openvswitch.h:300 Excess enum value '@OVS_VPORT_UPCALL_FAIL' description in 'ovs_vport_upcall_attr' Convert one comment from "/**" kernel-doc to a plain C "/*" comment: Warning: include/uapi/linux/openvswitch.h:638 This comment starts with '/**', but isn't a kernel-doc comment. * Omit attributes for notifications. Add more kernel-doc: - add kernel-doc for kernel-only enums; - add missing kernel-doc for enum ovs_datapath_attr; - add missing kernel-doc for enum ovs_flow_attr; - add missing kernel-doc for enum ovs_sample_attr; - add kernel-doc for enum ovs_check_pkt_len_attr; - add kernel-doc for enum ovs_action_attr; - add kernel-doc for enum ovs_action_push_eth; - add kernel-doc for enum ovs_vport_attr; Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Ilya Maximets <i.maximets@ovn.org> Link: https://patch.msgid.link/20260304012437.469151-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04tcp: secure_seq: add back ports to TS offsetEric Dumazet
This reverts 28ee1b746f49 ("secure_seq: downgrade to per-host timestamp offsets") tcp_tw_recycle went away in 2017. Zhouyan Deng reported off-path TCP source port leakage via SYN cookie side-channel that can be fixed in multiple ways. One of them is to bring back TCP ports in TS offset randomization. As a bonus, we perform a single siphash() computation to provide both an ISN and a TS offset. Fixes: 28ee1b746f49 ("secure_seq: downgrade to per-host timestamp offsets") Reported-by: Zhouyan Deng <dengzhouyan_nwpu@163.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260302205527.1982836-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04net: sched: avoid qdisc_reset_all_tx_gt() vs dequeue race for lockless qdiscsKoichiro Den
When shrinking the number of real tx queues, netif_set_real_num_tx_queues() calls qdisc_reset_all_tx_gt() to flush qdiscs for queues which will no longer be used. qdisc_reset_all_tx_gt() currently serializes qdisc_reset() with qdisc_lock(). However, for lockless qdiscs, the dequeue path is serialized by qdisc_run_begin/end() using qdisc->seqlock instead, so qdisc_reset() can run concurrently with __qdisc_run() and free skbs while they are still being dequeued, leading to UAF. This can easily be reproduced on e.g. virtio-net by imposing heavy traffic while frequently changing the number of queue pairs: iperf3 -ub0 -c $peer -t 0 & while :; do ethtool -L eth0 combined 1 ethtool -L eth0 combined 2 done With KASAN enabled, this leads to reports like: BUG: KASAN: slab-use-after-free in __qdisc_run+0x133f/0x1760 ... Call Trace: <TASK> ... __qdisc_run+0x133f/0x1760 __dev_queue_xmit+0x248f/0x3550 ip_finish_output2+0xa42/0x2110 ip_output+0x1a7/0x410 ip_send_skb+0x2e6/0x480 udp_send_skb+0xb0a/0x1590 udp_sendmsg+0x13c9/0x1fc0 ... </TASK> Allocated by task 1270 on cpu 5 at 44.558414s: ... alloc_skb_with_frags+0x84/0x7c0 sock_alloc_send_pskb+0x69a/0x830 __ip_append_data+0x1b86/0x48c0 ip_make_skb+0x1e8/0x2b0 udp_sendmsg+0x13a6/0x1fc0 ... Freed by task 1306 on cpu 3 at 44.558445s: ... kmem_cache_free+0x117/0x5e0 pfifo_fast_reset+0x14d/0x580 qdisc_reset+0x9e/0x5f0 netif_set_real_num_tx_queues+0x303/0x840 virtnet_set_channels+0x1bf/0x260 [virtio_net] ethnl_set_channels+0x684/0xae0 ethnl_default_set_doit+0x31a/0x890 ... Serialize qdisc_reset_all_tx_gt() against the lockless dequeue path by taking qdisc->seqlock for TCQ_F_NOLOCK qdiscs, matching the serialization model already used by dev_reset_queue(). Additionally clear QDISC_STATE_NON_EMPTY after reset so the qdisc state reflects an empty queue, avoiding needless re-scheduling. Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking") Signed-off-by: Koichiro Den <den@valinux.co.jp> Link: https://patch.msgid.link/20260228145307.3955532-1-den@valinux.co.jp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04net-sysfs: use rps_tag_ptr and remove metadata from rps_dev_flow_tableEric Dumazet
Instead of storing the @log at the beginning of rps_dev_flow_table use 5 low order bits of the rps_tag_ptr to store the log of the size. This removes a potential cache line miss (for light traffic). This allows us to switch to one high-order allocation instead of vmalloc() when CONFIG_RFS_ACCEL is not set. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260302181432.1836150-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04net-sysfs: remove rcu field from 'struct rps_dev_flow_table'Eric Dumazet
Remove rps_dev_flow_table_release() in favor of kvfree_rcu_mightsleep(). In the following pach, we will remove "u8 @log" field and 'struct rps_dev_flow_table' size will be a power-of-two. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260302181432.1836150-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04net-sysfs: use rps_tag_ptr and remove metadata from rps_sock_flow_tableEric Dumazet
Instead of storing the @mask at the beginning of rps_sock_flow_table, use 5 low order bits of the rps_tag_ptr to store the log of the size. This removes a potential cache line miss to fetch @mask. More importantly, we can switch to vmalloc_huge() without wasting memory. Tested with: numactl --interleave=all bash -c "echo 4194304 >/proc/sys/net/core/rps_sock_flow_entries" Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260302181432.1836150-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04net-sysfs: add rps_sock_flow_table_mask() helperEric Dumazet
In preparation of the following patch, abstract access to the @mask field in 'struct rps_sock_flow_table'. Also cleanup rps_sock_flow_sysctl() a bit : - Rename orig_sock_table to o_sock_table. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260302181432.1836150-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04net-sysfs: remove rcu field from 'struct rps_sock_flow_table'Eric Dumazet
Removing rcu_head (and @mask in a following patch) will allow a power-of-two allocation and thus high-order allocation for better performance. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260302181432.1836150-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04net: add rps_tag_ptr type and helpersEric Dumazet
Add a new rps_tag_ptr type to encode a pointer and a size to a power-of-two table. Three helpers are added converting an rps_tag_ptr to: 1) A log of the size. 2) A mask : (size - 1). 3) A pointer to the array. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260302181432.1836150-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04net: fix off-by-one in udp_flow_src_port() / psp_write_headers()Eric Dumazet
udp_flow_src_port() and psp_write_headers() use ip_local_port_range. ip_local_port_range is inclusive : all ports between min and max can be used. Before this patch, if ip_local_port_range was set to 40000-40001 40001 would not be used as a source port. Use reciprocal_scale() to help code readability. Not tagged for stable trees, as this change could break user expectations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260302163933.1754393-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04Merge tag 'wireless-next-2026-03-04' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Notable features this time: - cfg80211/mac80211 - finished assoc frame encryption/EPPKE/802.1X-over-auth (also hwsim) - radar detection improvements - 6 GHz incumbent signal detection APIs - multi-link support for FILS, probe response templates and client probling - ath12k: - monitor mode support on IPQ5332 - basic hwmon temperature reporting * tag 'wireless-next-2026-03-04' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (38 commits) wifi: UHR: define DPS/DBE/P-EDCA elements and fix size parsing wifi: mac80211_hwsim: change hwsim_class to a const struct wifi: mac80211: give the AP more time for EPPKE as well wifi: ath12k: Remove the unused argument from the Rx data path wifi: ath12k: Enable monitor mode support on IPQ5332 wifi: ath12k: Set up MLO after SSR wifi: ath11k: Silence remoteproc probe deferral prints wifi: cfg80211: support key installation on non-netdev wdevs wifi: cfg80211: make cluster id an array wifi: mac80211: update outdated comment wifi: mac80211: Advertise IEEE 802.1X authentication support wifi: mac80211: Add support for IEEE 802.1X authentication protocol in non-AP STA mode wifi: cfg80211: add support for IEEE 802.1X Authentication Protocol wifi: mac80211: Advertise EPPKE support based on driver capabilities wifi: mac80211_hwsim: Advertise support for (Re)Association frame encryption wifi: mac80211: Fix AAD/Nonce computation for management frames with MLO wifi: rt2x00: use generic nvmem_cell_get wifi: mac80211: fetch unsolicited probe response template by link ID wifi: mac80211: fetch FILS discovery template by link ID wifi: nl80211: don't allow DFS channels for NAN ... ==================== Link: https://patch.msgid.link/20260304113707.175181-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04Merge tag 'vfs-7.0-rc3.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - kthread: consolidate kthread exit paths to prevent use-after-free - iomap: - don't mark folio uptodate if read IO has bytes pending - don't report direct-io retries to fserror - reject delalloc mappings during writeback - ns: tighten visibility checks - netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequence * tag 'vfs-7.0-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: reject delalloc mappings during writeback iomap: don't mark folio uptodate if read IO has bytes pending selftests: fix mntns iteration selftests nstree: tighten permission checks for listing nsfs: tighten permission checks for handle opening nsfs: tighten permission checks for ns iteration ioctls netfs: Fix unbuffered/DIO writes to dispatch subrequests in strict sequence kthread: consolidate kthread exit paths to prevent use-after-free iomap: don't report direct-io retries to fserror
2026-03-04dt-bindings: arm: qcom,ids: Add SoC ID for CQ7790Krzysztof Kozlowski
Document the IDs used by Eliza SoC IoT variant: CQ7790S (without modem) and CQ7790M, present for example on MTP7790 IoT and evalkit boards. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260120164706.501119-3-krzysztof.kozlowski@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-03-04mm/mmu_notifier: clean up mmu_notifier.h kernel-docRandy Dunlap
Eliminate kernel-doc warnings in mmu_notifier.h: - add a missing struct short description - use the correct format for function parameters - add missing function return comment sections Warning: include/linux/mmu_notifier.h:236 missing initial short description on line: * struct mmu_interval_notifier_ops Warning: include/linux/mmu_notifier.h:325 function parameter 'interval_sub' not described in 'mmu_interval_set_seq' Warning: include/linux/mmu_notifier.h:325 function parameter 'cur_seq' not described in 'mmu_interval_set_seq' Warning: include/linux/mmu_notifier.h:346 function parameter 'interval_sub' not described in 'mmu_interval_read_retry' Warning: include/linux/mmu_notifier.h:346 function parameter 'seq' not described in 'mmu_interval_read_retry' Warning: include/linux/mmu_notifier.h:346 No description found for return value of 'mmu_interval_read_retry' Warning: include/linux/mmu_notifier.h:370 function parameter 'interval_sub' not described in 'mmu_interval_check_retry' Warning: include/linux/mmu_notifier.h:370 function parameter 'seq' not described in 'mmu_interval_check_retry' Warning: include/linux/mmu_notifier.h:370 No description found for return value of 'mmu_interval_check_retry' Link: https://lkml.kernel.org/r/20260302005222.3470783-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Cc: David Hildenbrand <david@kernel.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-04uaccess: correct kernel-doc parameter formatRandy Dunlap
Use the correct kernel-doc function parameter format to avoid kernel-doc warnings: Warning: include/linux/uaccess.h:814 function parameter 'uptr' not described in 'scoped_user_rw_access_size' Warning: include/linux/uaccess.h:826 function parameter 'uptr' not described in 'scoped_user_rw_access' Link: https://lkml.kernel.org/r/20260302005229.3471955-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-04Revert "ptdesc: remove references to folios from __pagetable_ctor() and ↵Axel Rasmussen
pagetable_dtor()" This change swapped out mod_node_page_state for lruvec_stat_add_folio. But, these two APIs are not interchangeable: the lruvec version also increments memcg stats, in addition to "global" pgdat stats. So after this change, the "pagetables" memcg stat in memory.stat always yields "0", which is a userspace visible regression. I tried to look for a refactor where we add a variant of lruvec_stat_mod_folio which takes a pgdat and a memcg instead of a folio, to try to adhere to the spirit of the original patch. But at the end of the day this just means we have to call folio_memcg(ptdesc_folio(ptdesc)) anyway, which doesn't really accomplish much. This regression is visible in master as well as 6.18 stable, so CC stable too. Link: https://lkml.kernel.org/r/20260225002434.2953895-1-axelrasmussen@google.com Fixes: f0c92726e89f ("ptdesc: remove references to folios from __pagetable_ctor() and pagetable_dtor()") Signed-off-by: Axel Rasmussen <axelrasmussen@google.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: David Hildenbrand <david@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-04KVM: Bury kvm_{en,dis}able_virtualization() in kvm_main.c once moreSean Christopherson
Now that TDX handles doing VMXON without KVM's involvement, bury the top-level APIs to enable and disable virtualization back in kvm_main.c. No functional change intended. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-16-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04KVM: x86: Move kvm_rebooting to x86Sean Christopherson
Move kvm_rebooting, which is only read by x86, to KVM x86 so that it can be moved again to core x86 code. Add a "shutdown" arch hook to facilate setting the flag in KVM x86, along with a pile of comments to provide more context around what KVM x86 is doing and why. Reviewed-by: Chao Gao <chao.gao@intel.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Sagi Shahar <sagis@google.com> Link: https://patch.msgid.link/20260214012702.2368778-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-03-04xen/xenbus: better handle backend crashJuergen Gross
When the backend domain crashes, coordinated device cleanup is not possible (as it involves waiting for the backend state change). In that case, toolstack forcefully removes frontend xenstore entries. xenbus_dev_changed() handles this case, and triggers device cleanup. It's possible that toolstack manages to connect new device in that place, before xenbus_dev_changed() notices the old one is missing. If that happens, new one won't be probed and will forever remain in XenbusStateInitialising. Fix this by checking the frontend's state in Xenstore. In case it has been reset to XenbusStateInitialising by Xen tools, consider this being the result of an unplug+plug operation. It's important that cleanup on such unplug doesn't modify Xenstore entries (especially the "state" key) as it belong to the new device to be probed - changing it would derail establishing connection to the new backend (most likely, closing the device before it was even connected). Handle this case by setting new xenbus_device->vanished flag to true, and check it before changing state entry. And even if xenbus_dev_changed() correctly detects the device was forcefully removed, the cleanup handling is still racy. Since this whole handling doesn't happened in a single Xenstore transaction, it's possible that toolstack might put a new device there already. Avoid re-creating the state key (which in the case of loosing the race would actually close newly attached device). The problem does not apply to frontend domain crash, as this case involves coordinated cleanup. Problem originally reported at https://lore.kernel.org/xen-devel/aOZvivyZ9YhVWDLN@mail-itl/T/#t, including reproduction steps. Based-on-patch-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260218095205.453657-3-jgross@suse.com>
2026-03-04xenbus: add xenbus_device parameter to xenbus_read_driver_state()Juergen Gross
In order to prepare checking the xenbus device status in xenbus_read_driver_state(), add the pointer to struct xenbus_device as a parameter. Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: "Martin K. Petersen" <martin.petersen@oracle.com> # SCSI Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/xen-pcifront.c Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <20260218095205.453657-2-jgross@suse.com>
2026-03-04drm/dp: Add definition for Panel Replay full-line granularityJouni Högander
DP specification is saying value 0xff 0xff in PANEL REPLAY SELECTIVE UPDATE X GRANULARITY CAPABILITY registers (0xb2 and 0xb3) means full-line granularity. Add definition for this. Cc: dri-devel@lists.freedesktop.org Signed-off-by: Jouni Högander <jouni.hogander@intel.com> Reviewed-by: Uma Shankar <uma.shankar@intel.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/20260225074221.1744330-1-jouni.hogander@intel.com (cherry picked from commit b93311673263bb98a200ab1cb6304f969bdada5c) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>