linux.git - Linux kernel source tree

Age	Commit message (Collapse)	Author
2 days	Merge tag 'char-misc-7.2-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull Android/IIO fixes from Greg KH: "Here is a set of bugfixes for 7.2-rc3 that resolve a bunch of reported issues in just the binder and iio codebases. Included in here are: - binder driver bugfixes for both the rust and c versions for reported problems - lots and lots of iio driver bugfixes for lots of reported issues (including a hid sensor driver bugfix) Full details are in the shortlog, all of these have been in linux-next with no reported issues" * tag 'char-misc-7.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (36 commits) iio: event: Fix event FIFO reset race iio: imu: inv_icm42600: fix timestamp clock period by using lower value iio: light: al3010: fix incorrect scale for the highest gain range iio: adc: nxp-sar-adc: Fix the delay calculation in nxp_sar_adc_wait_for() iio: light: tsl2591: return actual error from probe IRQ failure iio: imu: inv_icm42600: fix timestamping by limiting FIFO reading iio: imu: st_lsm6dsx: deselect shub page before reading whoami rust_binder: clear freeze listener on node removal rust_binder: reject context manager self-transaction rust_binder: use a u64 stride when cleaning up the offsets array binder: fix UAF in binder_free_transaction() binder: fix UAF in binder_thread_release() rust_binder: synchronize Rust Binder stats with freeze commands binder: cache secctx size before release zeroes it rust_binder: fix BINDER_GET_EXTENDED_ERROR iio: adc: ad7779: add missing 'select IIO_TRIGGERED_BUFFER' to Kconfig iio: adc: ad4130: add missing `select IIO_TRIGGERED_BUFFER` to Kconfig iio: adc: ti-ads124s08: Return reset GPIO lookup errors iio: temperature: Build mlx90635 with CONFIG_MLX90635 iio: light: al3320a: add missing REGMAP_I2C to Kconfig ...
2 days	Merge tag 'trace-v7.2-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Free field in error path of synthetic event parse In __create_synth_event() the field was allocated but was not freed in the error path - Fix ring_buffer_event_length() on 8 byte aligned architectures On architectures with CONFIG_HAVE_64BIT_ALIGNED_ACCESS set to y, the ring_buffer_event_length() may return the wrong size. This is because archs with that config set will always use the "big event meta header" as that is 8 bytes keeping the payload 8 bytes aligned, even when a 4 byte header could hold the size of the event But ring_buffer_event_length() doesn't take this into account and only subtracts 4 bytes for the meta header in the length when it should have subtracted 8 bytes - Have osnoise wait for a full rcu synchronization on unregister osnoise_unregister_instance() used to call synchronize_rcu() before freeing its copy of the instance but was switched to kfree_rcu(). The osniose tracer has code that traverses the instances that it uses, and inst is just a pointer to that instance. By using kfree_rcu() instead of synchronize_rcu(), the instance that the inst pointer is pointing to can be freed while the osnoise code is still referencing it That is, a rmdir on an instance first unregisters the tracer. When the unregister finishes, the rmdir expects that the tracer is finished with the instance that it is using. By putting back the synchronize_rcu() in osnoise_unregister_instance() the unregistering of osnoise will now return when all the users of the instance have finished - Remove an unused setting of "ret" in tracing_set_tracer() - Fix ring_buffer_read_page() copying events The commit that changed ring_buffer_read_page() to show dropped events from the buffer itself, split the "commit" variable between the commit value (with flags) and "size" that holds the size of the sub-buffer. A cut and paste error changed the test of the reading from checking the size of the buffer to the size of the event causing reads to only read one event at a time - Make tracepoint_printk a static variable When the tracing sysctl knobs were move from sysctl.c to trace.c, the variable tracepoint_printk no longer needed to be global. Make it static - Fix some typos - Fix NULL pointer dereference in func_set_flag() The flags update of the function tracer first checks if the value of the flag is the same and exits if they are, and then it checks if the current tracer is the function tracer and exits if it isn't. The problem is that these checks need to be in a reversed order, as if the tracer isn't the function tracer, then the flag being checked may not exist. Reverse the order of these checks - Fix ufs core trace events to not dereference a pointer in TP_printk() The TP_printk() part of the TRACE_EVENT() macro is called when the user reads the "trace" file. This can be seconds, minutes, hours, days, weeks, and even months after the data was recorded into the ring buffer. Thus, saving a pointer to an object into the ring buffer and then dereferencing it from TP_printk() can cause harm as the object the pointer is pointing to may no longer exist Fix all the trace events in ufs core to save the device name in the ring buffer instead of dereferencing the device descriptor from TP_printk() - Prevent out-of-bound reads in glob matching of trace events The filter logic of events allows simple glob logic to add wild cards to filter on strings. But some events have fields that may not have a terminating 'nul' character. This may cause the glob matching to go beyond the string. Change the logic to always pass in the length of the field that is being matched - Add no-rcu-check version of trace_##event##_enabled() The trace_##event##_enabled() usually wraps trace events to do extra work that is only needed when the trace event is enabled. But this can hide events that are placed in locations where RCU is not watching, and can make lockdep not see these bugs when the event is not enabled The trace_##event##_enabled() was updated to always test to make sure RCU is watching to catch locations that may call events without RCU being active This caused a false positive for the irq_disabled() and related events. As that use trace_irq_disabled_enabled() to force RCU to be watching when the event is enabled via the ct_irq_enter() function, calls the event, and then calls ct_irq_exit() to put RCU back to its original state The trace_irq_disabled_enabled() should not trigger a warning when RCU is not watching because the code within its block handles the case properly. Make a __trace_##event##_enabled() version for this event to use that doesn't check RCU is watching as it handles the case when it isn't - Fix use-after-free in user_event_mm_dup() When the enabler is removed from the link list, it is freed immediately. But it is protected via RCU and needs to be freed after an RCU grace period. Use queue_rcu_work() so that the event_mutex can also be taken as user_event_put() takes the mutex on the last reference is released - Free type string in error path of parse_synth_field() There's an error path in parse_synth_field() where the allocated type string is not freed - Add selftest that tests deferred event teardown - Fix leak in error path of trace_remote_alloc_buffer() If page allocation fails, the desc->nr_cpus is not incremented for the current CPU and the allocations done for it are not freed - Fix allocation length in trace_remote_alloc_buffer() The logic to calculate the struct_len was doing a double count and setting the value too large. Calculate the size upfront to fix the error and simplify the logic - Fix sparse CPU masks in ring_buffer_desc() If there are sparse CPUs (gaps in the numbering), the ring_buffer_desc() will fail as it tests the CPU number against the number of CPUs that are used * tag 'trace-v7.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Allow sparse CPU masks in ring_buffer_desc() tracing/remotes: Fix struct_len in trace_remote_alloc_buffer() tracing/remotes: Fix leak in trace_remote_alloc_buffer() error path selftests/user_events: Wait for deferred event teardown after unregister tracing/synthetic: Free type string on error path tracing/user_events: Fix use-after-free in user_event_mm_dup() tracing: Add a no-rcu-check version of trace_##event##_enabled() tracing: Prevent out-of-bounds read in glob matching ufs: core: tracing: Do not dereference pointers in TP_printk() tracing: Fix NULL pointer dereference in func_set_flag() samples: ftrace: Fix typos in benchmark comment tracing: Make tracepoint_printk static as not exported ring-buffer: Fix ring_buffer_read_page() copying only one event per page tracing: Remove unused ret assignment in tracing_set_tracer() tracing/osnoise: Call synchronize_rcu() when unregistering ring-buffer: Fix event length with forced 8-byte alignment tracing/synthetic: Free pending field on error path
4 days	Merge tag 'nfs-for-7.2-2' of git://git.linux-nfs.org/projects/anna/linux-nfs	Linus Torvalds
	Pull NFS client fixes from Anna Schumaker: - SUNRPC: - Release lower rpc_clnt if killed waiting for XPRT_LOCKED - Pin upper rpc_clnt across the TLS connect_worker - NFS: - Include MAY_WRITE in open permission mask for O_TRUNC - Charge unstable writes by request size, not folio size * tag 'nfs-for-7.2-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFS: Charge unstable writes by request size, not folio size NFSv4: include MAY_WRITE in open permission mask for O_TRUNC SUNRPC: pin upper rpc_clnt across the TLS connect_worker SUNRPC: release lower rpc_clnt if killed waiting for XPRT_LOCKED
4 days	Merge tag 'drm-fixes-2026-07-10' of https://gitlab.freedesktop.org/drm/kernel	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Weekly fixes pull for drm, amdgpu, amdxdna, xe leading the way, some small core fixes and a nouveau stability fix along with some minor changes in other drivers. Seems to be a bit quiter than last week at least. fb-helper: - Sync on first active crtc in fb_dirty, rather than first crtc drm_exec: - Use direct label in drm_exec buddy: - Rework try_harder in the buddy allocator i915: - fix underrun on panthor lake - LT PHY SSC programming fix - fix some NULL derefs and leaks nouveau: - fix a vmm large/small page table update race xe: - Fix PTE index in xe_vm_populate_pgtable for chunked binds - Wait on external BO kernel fences in exec IOCTL - Remove duplicate include - Free madvise VMA array on L2 flush failure - Stub notifier_lock helpers when DRM_GPUSVM=n amdgpu: - PSP 15.0.9 update - SMU 15.0.9 update - VCN 5.3 fix - VI ASPM fix - Userq fix - lifetime fix for amdgpu_vm_get_task_info_pasid() - Gfx10 fix - SMU 14 fix amdkfd: - CRIU bounds checking fixes - secondary context id fix - Event bounds checking fix amdxdna: - Fix uaf in mmap failure path - A lot of deadlocks, access races and return value fixes analogix_dp: - Fix analogix_dp bitshifts during link training v3d: - Fix absent indirect bo handling imagination: - Make function static to solve compiler warning - Fix error checking" * tag 'drm-fixes-2026-07-10' of https://gitlab.freedesktop.org/drm/kernel: (44 commits) nouveau/vmm: fix another SPT/LPT race drm/imagination: fix error checking of pvr_vm_context_lookup() drm/imagination: make pvr_fw_trace_init_mask_ops static gpu/buddy: bail out of try_harder when alignment cannot be honoured drm/xe/userptr: Stub notifier_lock helpers when DRM_GPUSVM=n drm/xe: free madvise VMA array on L2 flush failure drm/xe: remove duplicate <kunit/test-bug.h> include drm/xe: Wait on external BO kernel fences in exec IOCTL drm/xe: Fix PTE index in xe_vm_populate_pgtable() for chunked binds drm/fb-helper: Only consider active CRTCs for vblank sync drm/amdkfd: Check bounds on CRIU restore queue type and mqd size drm/amd/pm: fix smu14 power limit range calculation drm/amdkfd: Check bounds in allocate_event_notification_slot amdkfd: properly free secondary context id drm/amdkfd: Don't acquire buffers during CRIU queue restore drm/amdkfd: Check bounds on CRIU restore event id drm/gfx10: Program DB_RING_CONTROL drm/amdgpu: fix lifetime issue of amdgpu_vm_get_task_info_pasid() drm/amdgpu: trigger GPU recovery when userq destroy fails to unmap a hung queue drm/amd/amdgpu: disable ASPM on VI if pcie dpm is disabled ...
5 days	Merge tag 'drm-misc-fixes-2026-07-09' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes drm-misc-fixes for v7.2-rc3: - Fix uaf in amdxdna mmap failure path. - A lot of deadlocks, access races and return value fixes in amdxdna. - Fix analogix_dp bitshifts during link training. - Use direct label in drm_exec. - Fix absent indirect bo handling in v3d. - Sync on first active crtc in fb_dirty, rather than first crtc. - Rework try_harder in the buddy allocator. - Make imagination function static to solve compiler warning. - Fix imagination error checking. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patch.msgid.link/71e5b48b-307f-47f5-8fd5-b60ea43e4196@linux.intel.com
5 days	Merge tag 'net-7.2-rc3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter, Bluetooth and batman-adv. Current release - regressions: - bluetooth: fix using chan->conn as indication to no remote netdev Current release - new code bugs: - netfilter: cap to maximum number of expectation per master on updates Previous releases - regressions: - bluetooth: - fix UAF of hci_conn_params in add_device_complete - fix null ptr deref in hci_abort_conn() - igmp: remove multicast group from hash table on device destruction - batman-adv: prevent TVLV OOB check overflow - eth: mlx5/mlx5e: - fix off-by-one in single-FDB error rollback - skip peer flow cleanup when LAG seq is unavailable - fix crashes in dynamic per-channel stats and HV VHCA agent - eth: mana: Sync page pool RX frags for CPU Previous releases - always broken: - netfilter: - mark malformed IPv6 extension headers for hotdrop - terminate table name before find_table_lock() - ipvs: use parsed transport offset in TCP state lookup - sched: act_pedit: fix TOCTOU heap OOB write in tc offload - ethtool: rss: fix hfunc and input_xfrm parsing on big endian - ipv4/ipv6: fix UAF and memory leak in IGMP/MLD - tls: consume empty data records in tls_sw_read_sock() - eth: - octeontx2-af: fix VF bringup affecting PF promiscuous state - gue: validate REMCSUM private option length" * tag 'net-7.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits) macsec: don't read an unset MAC header in macsec_encrypt() dibs: loopback: validate offset and size in move_data() octeontx2-af: fix VF bringup affecting PF promiscuous state ethtool: rss: Fix hfunc and input_xfrm parsing on big endian net/mlx5: Fix L3 tunnel entropy refcount leak net: macb: drop in-flight Tx SKBs on close net: mana: Sync page pool RX frags for CPU net: mana: Validate the packet length reported by the NIC selftests/net: fix EVP_MD_CTX leak in tcp_mmap ipvs: ensure inner headers in ICMP errors are in headroom ipvs: use parsed transport offset in SCTP state lookup ipvs: use parsed transport offset in TCP state lookup ipvs: pass parsed transport offset to state handlers netfilter: handle unreadable frags netfilter: flowtable: support IPIP tunnel with direct xmit netfilter: flowtable: IPIP tunnel hardware offload is not yet support netfilter: flowtable: use dst in this direction when pushing IPIP header netfilter: ipset: allocate the proper memory for the generic hash structure netfilter: ipset: cleanup the add/del backlog when resize failed netfilter: ipset: exclude gc when resize is in progress ...
5 days	Merge tag 'nf-26-07-08' of ↵	Paolo Abeni
	https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Florian Westphal says: ==================== netfilter: updates for net The following patchset contains Netfilter fixes for net. Most of these are LLM fixes for old issues flagged by sashiko/LLMs. Many of these trigger drive-by-findings in sashiko. In particular: - many load/store tearing and missing memory barriers, races etc. in ipset, esp. with GC and resizing. Keeping the proposed patches spinning for yet-another-iteration keeps legit fixes back, so I prefer to add these now and follow up with other reports later. - flowtable work queue still has possible races with teardown, but same rationale as with ipset: drive-by findings, not problems coming with the flowtable IPIP changeset in this PR. - ever since unreadable frag skb support was added in 6.12, we can no longer do: BUG_ON(skb_copy_bits( ...): it will fire with such skbs. Mina Almasry is looking at similar patterns elsewhere in the stack. 1) Guard skb->mac_header adjustment after IPv6 defragmentation in nf_conntrack_reasm. From Xiang Mei. 2) NUL-terminate ebtables table names before calling find_table_lock() to prevent stack-out-of-bounds reads. Also from Xiang Mei. 3) Zero the ebtables chainstack array, else error unwind may free bogus pointer when CPU mask is sparse. All three issues date from 2.6 days. 4) Ensure ebtables module names are c-strings, same bug pattern as 2). Bug added in 4.6. 5) Fix catchall element handling for inverted lookups in nft_lookup. Fold the catchall lookup into ext before computing the match status. Was like this ever since catchall elements got introduced in 5.13. From Tamaki Yanagawa. 6-9) ipset updates from Jozsef Kadlecsik: - mark rcu protected areas correctly - address gc and resize clash in the comment extension - add/del backlog cleanup in the error path - allocate right size for the generic hash structure 10-12): IPIP flowtable updates from Pablo Neira Ayuso: - Use the current direction's route when pushing IPIP headers Fix incorrect headroom and fragmentation offset calculations. - Avoid hardware offload for IPIP tunnels due to lack of driver support. - Support IPIP tunnels with direct xmit in netfilter flowtable. dst_cache and dst_cookie are moved outside the union to share route state across flows. This is a followup to work done in 6.19 cycle. 13) Don't BUG() on skb_copy_bits error. Handle unreadable fragments by either returning an error or restricting the copy operations to linear area, This became an issue when unreable frag support was merged in 6.12. 14-16): IPVS updates from Yizhou Zhao: - Pass parsed transport offset to IPVS state handlers. update callback signatures. - use correct transport header offset on state lookp in TCP. As-is it was possible for ipv6 extension header data to be treated as L4 header. - same for SCTP. This was also broken since 2.6 days. 17) Ensure inner IP headers in ICMP errors are in the skb headroom after stripping outer headers. Add more checks for the length of inner headers. This was broken since 3.7 days. From Julian Anastasov. netfilter pull request nf-26-07-08 * tag 'nf-26-07-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: ipvs: ensure inner headers in ICMP errors are in headroom ipvs: use parsed transport offset in SCTP state lookup ipvs: use parsed transport offset in TCP state lookup ipvs: pass parsed transport offset to state handlers netfilter: handle unreadable frags netfilter: flowtable: support IPIP tunnel with direct xmit netfilter: flowtable: IPIP tunnel hardware offload is not yet support netfilter: flowtable: use dst in this direction when pushing IPIP header netfilter: ipset: allocate the proper memory for the generic hash structure netfilter: ipset: cleanup the add/del backlog when resize failed netfilter: ipset: exclude gc when resize is in progress netfilter: ipset: mark the rcu locked areas properly netfilter: nft_lookup: fix catchall element handling with inverted lookups netfilter: ebtables: module names must be null-terminated netfilter: ebtables: zero chainstack array netfilter: ebtables: terminate table name before find_table_lock() netfilter: nf_conntrack_reasm: guard mac_header adjustment after IPv6 defrag ==================== Link: https://patch.msgid.link/20260708140309.19633-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
5 days	net: mana: Sync page pool RX frags for CPU	Dexuan Cui
	MANA allocates RX buffers from page pool fragments when frag_count is greater than 1. In that case the buffers remain DMA mapped by page pool and the RX completion path does not call dma_unmap_single(). As a result, the implicit sync-for-CPU normally performed by dma_unmap_single() is missing before the packet data is passed to the networking stack. This breaks RX on configurations which require explicit DMA syncing, for example when booted with swiotlb=force. Fix this by recording the page pool page and DMA sync offset when the RX buffer is allocated, and syncing the received packet range for CPU access before handing the RX buffer to the stack. Fixes: 730ff06d3f5c ("net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency.") Cc: stable@vger.kernel.org Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Link: https://patch.msgid.link/20260702041237.617719-3-decui@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 days	SUNRPC: pin upper rpc_clnt across the TLS connect_worker	Chuck Lever
	The TLS connect path has a use-after-free: nothing pins the upper rpc_clnt across the delayed connect_worker. xs_connect() stores task->tk_client in sock_xprt::clnt as a raw pointer and queues the worker; for TLS-secured transports that worker is xs_tcp_tls_setup_socket(), which reads several fields out of the saved pointer (cl_timeout, cl_program, cl_prog, cl_vers, cl_cred, cl_stats) to construct the args for the inner handshake rpc_clnt. The xprt does not reference the rpc_clnt; the rpc_clnt references the xprt. xs_destroy() does cancel the connect_worker, but it runs only when the xprt's refcount drops to zero, which cannot happen until the rpc_clnt releases its cl_xprt reference in rpc_free_client_work(). When a TLS handshake fails fatally (for example, an mTLS mount whose client cert does not match the server), the connecting task is woken with -EACCES and exits, the mount caller invokes rpc_shutdown_client(), and the upper rpc_clnt is freed before the queued connect_worker fires. xs_tcp_tls_setup_socket() then dereferences the freed clnt, producing the refcount_t underflow Michael Nemanov reported. Take a reference on the upper rpc_clnt in xs_connect() for TLS transports via a new rpc_hold_client() helper, and drop it in the connect_worker's exit path with rpc_release_client(). The xprt_lock_connect() / xprt_unlock_connect() pairing already serialises xs_connect() with xs_tcp_tls_setup_socket(), so the take and release are balanced one-for-one. The non-TLS connect worker (xs_tcp_setup_socket) never reads sock_xprt::clnt, so leave that path alone and avoid the clnt-holds-xprt-holds-clnt cycle that would otherwise prevent xprt destruction. Reported-by: Michael Nemanov <michael.nemanov@vastdata.com> Closes: https://lore.kernel.org/linux-nfs/40e3d522-dfcf-4fc1-9c55-b5e81f1536d5@vastdata.com/ Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Michael Nemanov <michael.nemanov@vastdata.com> Reviewed-by: Michael Nemanov <michael.nemanov@vastdata.com> Signed-off-by: Anna Schumaker <anna.schumaker@hammerspace.com>
6 days	ipvs: pass parsed transport offset to state handlers	Yizhou Zhao
	IPVS callers already parse the packet into struct ip_vs_iphdr before updating connection state. For IPv6 this records the real transport-header offset after extension headers in iph.len. Pass this parsed transport offset through ip_vs_set_state() and the protocol state_transition() callback so protocol handlers can use the same packet context as scheduling and NAT handling. This patch only changes the common callback plumbing and adapts the protocol callback signatures; TCP and SCTP start using the value in follow-up patches. Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
6 days	netfilter: flowtable: support IPIP tunnel with direct xmit	Pablo Neira Ayuso
	The combination of IPIP tunnel with direct xmit, eg. bridge device, breaks because no dst_entry is provided to check the skb headroom and to set the iph->frag_off field. This leads to invalid dst usage and can trigger a crash in the tunnel transmit path. Fix this by moving dst_cache and dst_cookie out of the runtime union so that they can be shared by neighbour, xfrm, and direct tunnel flows. For FLOW_OFFLOAD_XMIT_DIRECT tuples carrying tunnel metadata, preserve route state in these shared fields and release it through the common dst release path. Since dst_entry is now available to the three supported xmit modes and dst_release() already deals with NULL dst, remove the xmit type check in nft_flow_dst_release(). Moreover, skip the check if the dst entry is NULL in nf_flow_dst_check() which is now the case for the direct xmit case. Based on patch from Rein Wei <n05ec@lzu.edu.cn>. Fixes: d30301ba4b07 ("netfilter: flowtable: Add IPIP tx sw acceleration") Cc: stable@vger.kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Reported-by: Zhengyang Chen <chzhengyang2023@lzu.edu.cn> Reported-by: Ren Wei <n05ec@lzu.edu.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
6 days	netfilter: flowtable: IPIP tunnel hardware offload is not yet support	Pablo Neira Ayuso
	No driver supports for IPIP tunnels yet, give up early on setting up the hardware offload for this scenario. This patch adds a stub that can be enhanced to add more configuration that are currently not supported. As of now, the offload work is enqueued to the worker, then ignored if the hardware offload configuration is not supported. Check the NF_FLOW_HW flag to know if this entry was already tried once to be offloaded so this is not retried on refresh when unsupported. Move NF_FLOW_HW flag check to nf_flow_offload_add(). If this NF_FLOW_HW flag is unset the _del and _stats variants are never called. This can be updated later on to skip hardware offload work to be queued in case hardware offload does not support it. Fixes: d98103575dcd ("netfilter: flowtable: Add IP6IP6 rx sw acceleration") Fixes: ab427db17885 ("netfilter: flowtable: Add IPIP rx sw acceleration") Cc: stable@vger.kernel.org Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Reported-by: Zhengyang Chen <chzhengyang2023@lzu.edu.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
6 days	ipv6: mcast: Fix potential UAF in MLD delayed work	Eric Dumazet
	A race condition exists between device teardown and incoming MLD query processing, leading to a Use-After-Free in the MLD delayed work. During device destruction, the primary reference to inet6_dev is dropped, which can drop its refcount to 0. The actual freeing of inet6_dev memory is deferred via RCU. Concurrently, the packet receive path runs under RCU read lock and obtains the inet6_dev pointer. Because the memory is RCU-protected, CPU-0 can safely dereference inet6_dev even if its refcount has hit 0. However, if CPU-0 calls igmp6_event_query() and schedules delayed work, it attempts to acquire a reference using in6_dev_hold(). This increments the refcount from 0 to 1, triggering a "refcount_t: addition on 0" warning. Since the inet6_dev memory is still scheduled to be freed after the RCU grace period, the device is freed while the work is still scheduled. When the work runs, it accesses the freed memory, causing a kernel panic. Fix this by using refcount_inc_not_zero() (via a new helper in6_dev_hold_safe()) to prevent acquiring a reference if the device is already being destroyed. If the refcount is 0, we do not schedule the work. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260705181756.963063-3-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
6 days	ipv4: igmp: Fix potential UAF in igmp_gq_start_timer()	Eric Dumazet
	A race condition exists between device teardown (inetdev_destroy) and incoming IGMP query processing (igmp_rcv), leading to a Use-After-Free in the IGMP timer callback. During device destruction, inetdev_destroy() drops the primary reference to in_device, which can drop its refcount to 0. The actual freeing of in_device memory is deferred via RCU (using call_rcu()). Concurrently, igmp_rcv() runs under RCU read lock and obtains the in_device pointer. Because the memory is RCU-protected, CPU-0 can safely dereference in_device even if its refcount has hit 0. However, if CPU-0 calls igmp_gq_start_timer() and re-arms the timer, it attempts to acquire a reference using in_dev_hold(). This increments the refcount from 0 to 1, triggering a "refcount_t: addition on 0" warning. Since the in_device memory is still scheduled to be freed after the RCU grace period (as the free callback does not check the refcount again), the device is freed while the timer is still armed. When the timer expires, it accesses the freed memory, causing a kernel panic. Fix this by using refcount_inc_not_zero() (via a new helper in_dev_hold_safe()) to prevent acquiring a reference if the device is already being destroyed. If the refcount is 0, we do not arm the timer. A similar issue in IPv6 MLD is fixed in a subsequent patch. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260705181756.963063-2-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
7 days	tracing: Add a no-rcu-check version of trace_##event##_enabled()	Steven Rostedt
	Tracepoints require that RCU is watching. To prevent them from being used in places that RCU is not watching, the trace_##event() macro always calls rcu_is_watching() even when the event is not enabled and warns if RCU is not watching. This is to make sure a warning is triggered even if the tracepoint is never enabled (as it is only a bug when it is). It was noticed that tracepoints could be hidden within trace_#event#_enabled() calls, which are used to do extra work for the tracepoint only if the tracepoint is enabled. But this also can hide the fact that a tracepoint is placed in a location that can be called when RCU is not watching. Commit 9764e731ef6ab ("tracepoint: Add lockdep rcu_is_watching() check to trace_##name##_enabled()") added a check to the trace_##event##_enabled() macro to make sure RCU is watching when it is called to make sure not to hide the bug of a tracepoint being called when RCU is not watching. There is one case in the irq_disable tracepoint where it is within a trace_irq_disable_enabled() block, but it checks if RCU is watching, and if it isn't, it makes a call to ct_irq_enter() that makes RCU watch again. But because trace_irq_disable_enabled() now checks if RCU is watching and will trigger if it isn't. This is a false warning as the code within the block handles this case. Add a new internal macro __trace_##event##_enabled() that doesn't check if RCU is watching, and convert the irq_enable/disable tracepoints over to it. Link: https://patch.msgid.link/20260701132744.6a7fc68b@robin Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/all/CAMuHMdXud_RpWag_hFqa2ByBGRxg6KnxGL1ObCWZrpTsk3TfAw@mail.gmail.com/ Fixes: 9764e731ef6ab ("tracepoint: Add lockdep rcu_is_watching() check to trace_##name##_enabled()") Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7 days	tracing: Prevent out-of-bounds read in glob matching	Huihui Huang
	String event fields are not necessarily NUL-terminated, so the filter predicate functions (filter_pred_string(), filter_pred_strloc() and filter_pred_strrelloc()) pass the field length to the regex match callbacks, and the length-aware matchers honour it. regex_match_glob() was the exception: it ignored the length and called glob_match(), which scans the string until it hits a NUL byte. Some string fields are not NUL-terminated. One example is the dynamic char array of the xfs_* namespace tracepoints, which is copied without a trailing NUL. For such a field, glob matching reads past the end of the event field, causing a KASAN slab-out-of-bounds read in glob_match(), reached via regex_match_glob() and filter_match_preds() from the xfs_lookup tracepoint. Add a length-bounded glob_match_len() and use it from regex_match_glob() so glob matching always stops at the field boundary. The matching loop is factored into a shared helper so glob_match() keeps its behaviour. Fixes: 60f1d5e3bac4 ("ftrace: Support full glob matching") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/da1aaf125fc3b63320b0c540fd6afa7c3d5b4f1a.1782836943.git.hhhuang@smu.edu.sg Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Assisted-by: Codex:GPT-5.4 Signed-off-by: Huihui Huang <hhhuang@smu.edu.sg> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7 days	drm/drm_exec: avoid indirect goto	Christian König
	The drm_exec component uses a variable with scope limited to the for() and an indirect goto to allow instantiating multiple macros in the same function. This unfortunately doesn't work well with certain compilers when the indirect goto can't be lowered to a direct jump. Switch the indirect goto to a direct goto, the drawback is that we now can't use the dma_exec_until_all_locked() macro in the same function multiple times. The is currently only one user of this and only as a hacky workaround which is about to be removed. So document that the __label__ statement should be used when the macro is used multiple times and fix the tests and the only use case where that is necessary. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 9920249a5288 ("drm/amdgpu: convert amdgpu_vm_lock_by_pasid() to drm_exec") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202606231854.7LeCtlLe-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202606232356.gwHMAJAW-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202606240753.kYjobJVl-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202606241110.iUga5vVw-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202607031446.1PWG18mN-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202607031837.HSmBj8pr-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202607040159.GopyEswS-lkp@intel.com/ Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Link: https://lore.kernel.org/r/20260704084133.122053-1-christian.koenig@amd.com
8 days	Merge tag 'mm-hotfixes-stable-2026-07-06-17-49' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "20 hotfixes. 17 are for MM. 12 are cc:stable and the remaining 8 address post-7.1 issues or aren't considered suitable for backporting. Two patches from SJ addresses a couple of quite old DAMON issues. And two patches from Yichong Chen fixes tools/virtio build issues. The remaining patches are singletons" * tag 'mm-hotfixes-stable-2026-07-06-17-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: tools/include: include stdint.h for SIZE_MAX in overflow.h tools/virtio: add missing compat definitions for vhost_net_test mm: do file ownership checks with the proper mount idmap samples/damon/mtier: fail early if address range parameters are invalid mm: a second pagecache maintainer mm/damon: add a kernel-doc comment for damon_ctx->rnd_state mm/damon: add a kernel-doc comment for damon_ctx->probes mailmap: add entries for Radu Rendec selftests/mm: hmm-tests: include linux/mman.h to access MADV_COLLAPSE selftests/mm: pagemap_ioctl: use the correct page size for transact_test() fs/proc: fix KPF_KSM reported for all anonymous pages mm: page_ext: add count limit to page_ext_iter_next to prevent invalid PFN access mm/damon/ops-common: handle extreme intervals in damon_hot_score() MAINTAINERS: add Lance as an rmap reviewer mm/compaction: handle free_pages_prepare() properly in compaction_free() mm/damon/sysfs-schemes: put stats for scheme_add_dirs() internal error mm/damon/sysfs-schemes: fix dir put orders in access_pattern_add_dirs() mm: shrinker: fix NULL pointer dereference in debugfs mm: shrinker: fix shrinker_info teardown race with expansion selftests/mm: fix ksft_process_madv.sh test category
8 days	Bluetooth: ISO: exclude RFU bits from ISO_SDU_Length	Pauli Virtanen
	slen contains ISO_SDU_Length (12 bits), RFU (2 bits), Packet_Status_Flags (2 bits). Exclude the RFU bits from hci_iso_data_len. Also add masks to the pack macro. Fixes: 4de0fc599eb9 ("Bluetooth: Add definitions for CIS connections") Signed-off-by: Pauli Virtanen <pav@iki.fi> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
8 days	Bluetooth: L2CAP: Fix use-after-free in l2cap_sock_new_connection_cb()	Siwei Zhang
	l2cap_sock_new_connection_cb() returned l2cap_pi(sk)->chan after release_sock(parent). Once the parent lock is dropped the newly enqueued child socket sk is reachable via the accept queue, so another task can accept and free it before the callback dereferences sk, resulting in a use-after-free. Rework the ->new_connection() op so the core, rather than the callback, owns the child channel's lifetime. The op now receives a pre-allocated new_chan and returns an errno instead of allocating and returning a channel. l2cap_new_connection() allocates the child channel and links it into the conn list via __l2cap_chan_add() before invoking the callback, so the conn-list reference keeps the channel alive once release_sock(parent) exposes the socket to other tasks. Channel configuration that was duplicated in l2cap_sock_init() and the various new_connection callbacks is consolidated into l2cap_chan_set_defaults(), which now inherits from the parent channel when one is supplied. Fixes: 8ffb929098a5 ("Bluetooth: Remove parent socket usage from l2cap_core.c") Cc: stable@kernel.org Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Siwei Zhang <oss@fourdim.xyz> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
8 days	Bluetooth: hci_conn: Fix null ptr deref in hci_abort_conn()	Siwei Zhang
	hci_abort_conn() read hci_skb_event(hdev->sent_cmd) when a connection was pending, but hdev->sent_cmd can be NULL while req_status is still HCI_REQ_PEND, leading to a NULL pointer dereference and a general protection fault from the hci_rx_work() receive path. Instead of inspecting hdev->sent_cmd, track the in-flight create connection command with a new per-connection HCI_CONN_CREATE flag and route all cancellation through hci_cancel_connect_sync(), which dispatches to a dedicated per-type cancel function. The create command is in exactly one of two states: still queued, or in flight. The cancel function holds cmd_sync_work_lock across the whole decision: the worker takes this lock to dequeue every entry, so while it is held a queued command cannot start running and an in-flight command cannot complete and let the next command become pending. This keeps the flag test and hci_cmd_sync_cancel() atomic with respect to the worker, so a queued command is simply dequeued, and an in-flight command owned by this connection is cancelled without the risk of cancelling an unrelated command that became pending in the meantime. CIS uses the same flag mechanism via HCI_CONN_CREATE_CIS but cannot be dequeued per-connection. hci_acl_create_conn_sync() and hci_le_create_conn_sync() clear HCI_CONN_CREATE after the create command completes, but the command status handler can free conn via hci_conn_del() (for example when the controller rejects the connection) while the worker is still blocked on the connection complete event. Hold a reference on conn across the create command so the flag can be cleared without a use-after-free. Fixes: a13f316e90fd ("Bluetooth: hci_conn: Consolidate code for aborting connections") Cc: stable@vger.kernel.org Suggested-by: XIAO WU <xiaowu.417@qq.com> Assisted-by: Claude:claude-opus-4-8 Signed-off-by: Siwei Zhang <oss@fourdim.xyz> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
8 days	net/sched: act_pedit: fix TOCTOU heap OOB write in tc offload	Jamal Hadi Salim
	There is a TOCTOU race condition in flower lockless approach between sizing a flow_rule buffer and filling it. zdi-disclosures@trendmicro.com reports: The cls_flower classifier operates with TCF_PROTO_OPS_DOIT_UNLOCKED (fl_change runs without RTNL), while RTM_NEWACTION holds RTNL, so the independent locking domains make the race reachable in practice. KASAN confirms: BUG: KASAN: slab-out-of-bounds in tcf_pedit_offload_act_setup+0x81b/0x930 Write of size 4 at addr ffff888001f27520 by task poc-toctou/312 The buggy address is located 0 bytes to the right of allocated 288-byte region [ffff888001f27400, ffff888001f27520) (cache kmalloc-512) Note: The result is a heap OOB write attacker-controlled content into the adjacent slab object (requires CAP_NET_ADMIN). The fix introduces reading tcfp_nkeys under act->tcfa_lock in all places using a new tcf_pedit_nkeys_locked() which replaces the old tcf_pedit_nkeys(). Additionally we close the remaining TOCTOU window between the sizing read and the fill reads by more careful accounting. Rather than silently truncating the key count, which leads to incorrect action semantics offloaded to hardware and secondary OOB writes if the remaining capacity is zero or consumed by prior actions, we enforce remaining capacity checks and return -ENOSPC if the required space exceeds the remaining capacity. Fixes: 71d0ed7079df ("net/act_pedit: Support using offset relative to the conventional network headers") Reported-by: zdi-disclosures@trendmicro.com Tested-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260701161912.125355-1-jhs@mojatatu.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
11 days	Merge tag 'drm-fixes-2026-07-04' of https://gitlab.freedesktop.org/drm/kernel	Linus Torvalds
	Pull drm fixes from Dave Airlie: "Weekly fixes for drm. This is large for rc2 but it's just a lot of small fixes across a bunch of drivers, xe, amdgpu as usual, plus some sashiko-inspired fixes for panthor, and some dma-fence updates. core: - kernel doc fix - include types.h in drm_ras.h dma-fence: - fix NULL ptr dereference - use correct callback - make dma_fence_dedup_array more robust dp: - handle torn down topology gracefully - fix kernel doc i915: - Input validation fixes for BIOS and EDID - Fix HDCP code buffer overflow and seq_num_v monotonic increase check - Fix near-NULL deref in i915_active during GFP_ATOMIC exhaustion xe: - Wedge from the timeout handler only after releasing the queue - Fix a NULL pointer dereference - Remove redundant exec_queue_suspended - RTP / OA whitelist fixes - Return error on non-migratable faults requiring devmem - Skip FORCE_WC and vm_bound check for external dma-bufs - Hold notifier lock for write on inject test path - Drop bogus static from finish in force_invalidate - Fix double-free of managed BO in error path - Don't attempt to process FAST_REQ or EVENT relays - Fix NPD in bo_meminfo - Prevent invalid cursor access for purged BOs - Fix offset alignment for MERT WHITELST_OA_MERT_MMIO_TRG amdgpu: - Soc24 aborted suspend fix - Drop unecessary BUG() and BUG_ON() from error paths - SCPM fix - Power reporting fix - DCE HDR fix - UVD boundary checks - VCN boundary checks - VCE boundary checks - DCN 4.2 fixes - Large stack allocation fixes - Fix aperture mapping leak - UserQ fixes - Ignore_damage_clips fix - ACP fixes - DC boundary checks - GPUVM fixes - JPEG idle check fixes - Userptr fix - GC 11.7 updates - Non-4K page fix - SMU 13 fixes - DP alt mode fix amdkfd: - Boundary checks - CRIU fixes amdxdna: - fix device removal issues - fix use after free in debug BO imagination: - fix double call to scheduler fini - fix ioctl return values - fix user array stride virtio: - handle EDIDs better panthor: - irq safe fence lock fix - reset work fix - fix invalid pointer - fix iomem access in suspended state - sched resume fix - unplug suspend fix - drop needless check - eviction leak fix - bail on group start/resume fix - keep irqs masked malidp: - use clock bulk API komeda: - clock prepare fixes" * tag 'drm-fixes-2026-07-04' of https://gitlab.freedesktop.org/drm/kernel: (105 commits) drm/xe/oa: Fix offset alignment for MERT WHITELIST_OA_MERT_MMIO_TRG drm/xe/pt: prevent invalid cursor access for purged BOs drm/xe: fix NPD in bo_meminfo() drm/xe/pf: Don't attempt to process FAST_REQ or EVENT relays drm/xe/hw_engine: Fix double-free of managed BO in error path drm/xe/userptr: Drop bogus static from finish in force_invalidate drm/xe/userptr: Hold notifier_lock for write on inject test path drm/xe/display: skip FORCE_WC and vm_bound check for external dma-bufs drm/xe: Return error on non-migratable faults requiring devmem drm/xe/rtp: Ensure locking/ref counting for OA whitelists drm/xe/oa: (De-)whitelist OA registers on OA stream open/release drm/xe/rtp: (De-)whitelist OA registers for all hwe's for a gt drm/xe/rtp: Toggle 'deny' bit to (de-)whitelist OA regs drm/xe/rtp: Save OA nonpriv registers to register save/restore lists drm/xe/rtp: Generalize whitelist_apply_to_hwe drm/xe/rtp: Keep track of non-OA nonpriv slots drm/xe/rtp: Maintain OA whitelists separately drm/xe/rtp: Fix build error with clang < 21 and non-const initializers drm/imagination: Fix user array stride in pvr_set_uobj_array() drm/imagination: Fix returned size for DRM_IOCTL_PVR_DEV_QUERY ...
11 days	Merge tag 'acpi-7.2-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI support fixes from Rafael Wysocki: "These fix a coding mistake in the ACPI TAD (Time and Alarm Device) driver introduced by one of its previous updates and get rid of the ugly #ifdef __KERNEL__ conditional compilation in acpi_ut_safe_strncpy() by redefining that function as an alias for strscpy_pad(): - Add a missing ACPI_TAD_AC_WAKE capability check omitted by mistake to the ACPI TAD driver (Xu Rao) - Define acpi_ut_safe_strncpy() as an alias for strscpy_pad() which is viable because that function is only called from kernel code (Rafael Wysocki)" * tag 'acpi-7.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPICA: Define acpi_ut_safe_strncpy() as strscpy_pad() alias ACPI: TAD: Check AC wake capability before enabling wakeup
11 days	Merge tag 'vfs-7.2-rc2.fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - netfs: - fix the decision when to disallow write-streaming with fscache in use, handling of asynchronous cache object creation, a double fput in cachefiles, clearing S_KERNEL_FILE without the inode lock held, page extraction bugs in the iov_iter helpers (a potential underflow, a missing allocation failure check, a memory leak, and a folio offset miscalculation), writeback error and ENOMEM handling, DIO write retry for filesystems without a ->prepare_write() method, and the replacement of the wb_lock mutex with a bit lock plus writethrough collection offload so that multiple asynchronous writebacks don't interfere with each other. - Fix the barriering when walking the netfs subrequest list during retries as it was possible to see a subrequest that was just added by the application thread. - iomap: - Change iomap to submit read bios after each extent instead of building them up across extents. The old behavior was considered problematic for a while and now caused an actual erofs bug. - Guard the ioend io_size EOF trim in iomap against underflow when a concurrent truncate moves EOF below the start of the ioend, wrapping io_size to a huge value. - overlayfs - Fix a stale overlayfs comment about the locking order. - Store the linked-in upper dentry instead of the disconnected O_TMPFILE dentry during overlayfs tmpfile copy-up. With a FUSE or virtiofs upper layer ->d_revalidate() would try to look up "/" in the workdir and fail, causing persistent ESTALE errors that broke dpkg and apt. - vfs-bpf: Have the bpf_real_data_inode() kfunc take a struct file instead of a dentry so it is usable from the bprm_check_security, mmap_file, and file_mprotect hooks, and rename it from bpf_real_inode() to make the data-inode semantics explicit. The kfunc landed this cycle so the change is safe. - afs: NULL pointer dereferences in the callback service and in afs_get_tree(), several memory and refcount leaks, missing locking around the dynamic root inode numbers and premature cell exposure through /afs, a netns destruction hang caused by a misplaced increment of net->cells_outstanding, a bulk lookup malfunction caused by the dir_emit() API change, inode (re)initialisation issues, and assorted smaller fixes to error codes, seqlock handling, and debug output. - vfs: Refuse O_TMPFILE creation with an unmapped fsuid or fsgid and add a selftest for it. - vboxsf: Add Jori Koolstra as vboxsf maintainer, taking over from Hans de Goede. - dio: Release the pages attached to a short atomic dio bio; the REQ_ATOMIC size check error path leaked them. - procfs: Only bump the parent directory link count when registering directories in procfs. Registering regular files inflated the count and leaked a link on every create and remove cycle. - minix: Avoid an unsigned overflow in the minix bitmap block count calculation that let crafted images with huge inode or zone counts pass superblock validation and crash the kernel during mount. - cachefiles: Fix a double unlock in the cachefiles nomem_d_alloc error path left over from the start_creating() conversion. - fat: Stop fat from reading directory entries past the 0x00 end-of-directory marker. If the trailing on-disk slots aren't zero-filled the driver surfaced arbitrary garbage as directory entries. - freexvfs: Don't BUG() on unknown typed-extent types in freevxfs, reachable via ioctl(FIBMAP) on a crafted image; fail with an I/O error instead. - orangefs: Keep the readdir entry size 64-bit in orangefs fill_from_part(). Truncating it to __u32 bypassed the bounds check and led to out-of-bounds reads triggerable by the userspace client. - xfs: Fix the error unwind in xfs_open_devices() which released the rt device file twice and left dangling buftarg pointers behind that were freed again when the failed mount was torn down. - exec: Fix an off-by-one in the comment documenting the maximum binfmt rewrite depth in exec_binprm(). The code allows five rewrites, not four; restricting the code would break userspace so the comment is fixed instead. - file handles: Reject detached mounts in capable_wrt_mount(). A detached mount can be dissolved concurrently, leaving a NULL mount namespace that open_by_handle_at() would dereference. * tag 'vfs-7.2-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (57 commits) netfs: Fix barriering when walking subrequest list iomap: submit read bio after each extent fuse: call fuse_send_readpages explicitly from fuse_readahead iomap: consolidate bio submission fhandle: reject detached mounts in capable_wrt_mount() netfs: Fix DIO write retry for filesystems without a ->prepare_write() netfs: Fix folio state after ENOMEM whilst under writeback iteration netfs: Fix writeback error handling netfs: Fix writethrough to use collection offload netfs: Replace wb_lock with a bit lock for asynchronicity netfs: Fix kdoc warning scatterlist: Fix offset in folio calc in extract_xarray_to_sg() iov_iter: Remove unused variable in kunit_iov_iter.c iov_iter: Fix a memory leak in iov_iter_extract_user_pages() iov_iter: Fix missing alloc fail check in iov_iter_extract_bvec_pages() iov_iter: Fix potential underflow in iov_iter_extract_xarray_pages() cachefiles: Fix file burial to take lock when unsetting S_KERNEL_FILE cachefiles: Fix double fput netfs: Fix netfs_create_write_req() to handle async cache object creation netfs: Fix decision whether to disallow write-streaming due to fscache use ...
11 days	Merge tag 'for-linus-7.2a-rc2-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - rename function parameters and a comment related to xen_exchange_memory() (Jan Beulich) - replace __ASSEMBLY__ with __ASSEMBLER__ (Thomas Huth) - add some sanity checking to the Xen pvcalls frontend driver (Michael Bommarito) - fix error handling in the Xen gntdev driver (Wentao Liang) - fix several minor bugs in Xen related drivers (Yousef Alhouseen) * tag 'for-linus-7.2a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/Xen: correct commentary and parameter naming of xen_exchange_memory() xenbus: reject unterminated directory replies xen/gntalloc: validate grant count before allocation xen/gntalloc: make grant counters unsigned xen/front-pgdir-shbuf: free grant reference head on errors xen/gntdev: fix error handling in ioctl xen: Replace __ASSEMBLY__ with __ASSEMBLER__ in header files xen/pvcalls: bound backend response req_id before indexing rsp[]
11 days	gue: validate REMCSUM private option length	Qihang
	GUE private flags can indicate that remote checksum offload metadata is present. The private flags field itself is accounted for by guehdr_flags_len(), but guehdr_priv_flags_len() currently returns 0 even when GUE_PFLAG_REMCSUM is set. This lets a packet with only the private flags field pass validate_gue_flags(), after which gue_remcsum() and gue_gro_remcsum() read the missing REMCSUM start/offset fields from the following bytes. Account for GUE_PLEN_REMCSUM when GUE_PFLAG_REMCSUM is present so that malformed packets are rejected during option validation. Fixes: c1aa8347e73e ("gue: Protocol constants for remote checksum offload") Signed-off-by: Qihang <q.h.hack.winter@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
12 days	Merge tag 'device-id-rework' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux Pull mod_devicetable.h header split from Uwe Kleine-König: "Split <linux/mod_devicetable.h> in per subsystem headers <linux/mod_devicetable.h> is included transitively in nearly every driver in an x86_64 allmodconfig build of v7.1: $ find drivers -name \.o -not -name \.mod.o \| wc -l 21330 $ find drivers -name \.o.cmd -not -name \.mod.o.cmd \| xargs grep -l mod_devicetable.h \| wc -l 17038 The result of this mixture of different and unrelated subsystem details is that even when touching an obscure device id struct most of the kernel needs to be recompiled. Given that each driver typically only needs one or two of these structures, splitting into per subsystem headers and only including what is really needed reduces the amount of needed recompilation. This split is implemented in the first commit and then after some preparatory work in the following commits, the last two replace includes of <linux/mod_devicetable.h> by the actually needed more specific headers. There are still a few instances left, but the ones with high impact (that is in headers that are used a lot) and the easy ones (.c files) are handled. These remaining includes will be addressed during the next merge window" * tag 'device-id-rework' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux: Replace <linux/mod_devicetable.h> by more specific <linux/device-id/.h> (c files) Replace <linux/mod_devicetable.h> by more specific <linux/device-id/.h> (headers) parisc: #include <linux/compiler.h> for unlikely() in <asm/ptrace.h> media: em28xx: Add include for struct usb_device_id LoongArch: KVM: Add include defining struct cpu_feature ALSA: hda/core: Add include defining struct hda_device_id usb: dwc2: Add include defining struct pci_device_id platform/x86: int3472: Add include defining struct dmi_system_id platform/x86: x86-android-tablets: Add include defining struct dmi_system_id i2c: Let i2c-core.h include <linux/i2c.h> of: Explicitly include <linux/types.h> and <linux/err.h> platform/x86: msi-ec: Ensure dmi_system_id is defined usb: serial: Include <linux/usb.h> in <linux/usb/serial.h> driver core: platform: Include header for struct platform_device_id driver: core: Include headers for acpi_device_id and of_device_id for struct device_driver media: ti: vpe: #include <linux/platform_device.h> explicitly mod_devicetable.h: Split into per subsystem headers
12 days	Replace <linux/mod_devicetable.h> by more specific <linux/device-id/*.h> ↵	Uwe Kleine-König (The Capable Hub)
	(headers) <linux/mod_devicetable.h> is included in a many files: $ git grep '<linux/mod_devicetable.h>' ef0c9f75a195 \| wc -l 1598 ; some of them are widely used headers. To stop mixing up different and unrelated driver( type)s let the subsystem headers only use the subset of the recently split <linux/mod_devicetable.h> that are relevant for them. The fallout (I hope) is addressed in the previous commits that handle sources relying on e.g. <linux/i2c.h> pulling in the full legacy header and thus providing pci_device_id. Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://patch.msgid.link/199fe46b624ba07fb9bd3e0cd6ff13757932cb5f.1782808461.git.u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
12 days	ALSA: hda/core: Add include defining struct hda_device_id	Uwe Kleine-König (The Capable Hub)
	Traditionally all *_device_id were defined in a single header <linux/mod_devicetable.h>. This was split now with the objective that only the relevant bits are included. So including <linux/pci.h> won't be enough to get a definition of (the unrelated to pci) struct hda_device_id. Add an explicit include for the header defining struct hda_device_id to keep working when <linux/pci.h> stops providing this defintion. Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/376883bc5889d5cca01efb6f8d4e07a20158f2b8.1782808461.git.u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
12 days	platform/x86: int3472: Add include defining struct dmi_system_id	Uwe Kleine-König (The Capable Hub)
	Currently <linux/mod_devicetable.h> is included transitively in int3472.h via <linux/clk-provider.h> -> <linux/of.h> -> <linux/mod_devicetable.h> However these includes will be tightend such that only the bits relevant for of will be provided by <linux/of.h>. To ensure that dmi_system_id stays around, include the respective header explicitly. Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Link: https://patch.msgid.link/0ba52730f67dc995d9d896b81fa6a7320bf8cb4b.1782808461.git.u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
12 days	of: Explicitly include <linux/types.h> and <linux/err.h>	Uwe Kleine-König (The Capable Hub)
	<linux/of_platform.h> uses resource_size_t and relies on the transitive include <linux/mod_devicetable.h> -> <linux/types.h>. It also uses error constants and thus relying on the include chain <linux/mod_devicetable.h> -> <linux/uuid.h> -> <linux/string.h> -> <linux/err.h>. With the plan to split <linux/mod_devicetable.h> per subsystem and then only letting of_platform.h include the of-specific bits (which don't require these two headers), add the needed includes explicitly to keep the header self-contained. Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://patch.msgid.link/a730991bc8813cf70c2445064ea425291538f709.1782808461.git.u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
12 days	usb: serial: Include <linux/usb.h> in <linux/usb/serial.h>	Uwe Kleine-König (The Capable Hub)
	All consumers of the latter also include the former, but without that struct usb_driver and struct usb_device_id (and maybe more) are not defined. Add an include for <linux/usb.h> to make the header self-contained. Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://patch.msgid.link/82219ab65d16ee5bfe5a35d11bc938baac3fd3bc.1782808461.git.u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
12 days	driver core: platform: Include header for struct platform_device_id	Uwe Kleine-König (The Capable Hub)
	Platform drivers can define an array containing the supported device variants to be assigned to the struct platform_driver's .id_table. While a forward declaration of struct platform_device_id is technically enough to make the driver self-contained, it's reasonable to provide the (very lightweight) data type definition for that array in <linux/platform_device.h> to not add that burden to all platform drivers with an id-table. Note that currently <linux/device.h> transitively includes <linux/mod_devicetable.h> that provides struct platform_device_id. But that include is planned to be replaced by a tighter set of includes that only define the structures relevant for the stuff in <linux/device.h>. Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://patch.msgid.link/4ca29592c9d1c6d528a65e05b80af7355f3c79c5.1782808461.git.u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
12 days	driver: core: Include headers for acpi_device_id and of_device_id for struct ↵	Uwe Kleine-König (The Capable Hub)
	device_driver struct device_driver contains pointers of type struct of_device_id* and struct acpi_device_id* but doesn't ensure these are defined. To make the header self-contained add the (very lightweight) includes that contain the respective definitions. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://patch.msgid.link/199ba71b4ac73f4b4d9f5d2be635c96eec73c70e.1782808461.git.u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
12 days	mod_devicetable.h: Split into per subsystem headers	Uwe Kleine-König (The Capable Hub)
	<linux/mod_devicetable.h> is included transitively in nearly every driver in an x86_64 allmodconfig build of v7.1: $ find drivers -name \.o -not -name \.mod.o \| wc -l 21330 $ find drivers -name \.o.cmd -not -name \.mod.o.cmd \| xargs grep -l mod_devicetable.h \| wc -l 17038 The result is that even when touching an obscure device id struct most of the kernel needs to be recompiled. Given that each driver typically only needs one or two of these structures, splitting into per subsystem headers and only including what is really needed reduces the amount of needed recompilation. Implement the first step and define each device id struct in a separate header (together with its associated #defines). <linux/mod_devicetable.h> is modified to include all the new headers to continue to provide the same symbols. Several headers currently include <linux/mod_devicetable.h>, those that are most lukrative to include only their subsystem headers only are: $ git -C source grep -l mod_devicetable.h include/linux \| while read h; do echo -n "$h:"; find drivers -name \.o.cmd -not -name \.mod.o.cmd \| xargs grep -l $h \| wc -l; done \| sort -t: -k2 -n -r \| head include/linux/of.h:10897 include/linux/pci.h:7920 include/linux/acpi.h:7097 include/linux/i2c.h:5402 include/linux/spi/spi.h:1897 include/linux/dmi.h:1643 include/linux/usb.h:1222 include/linux/input.h:1205 include/linux/mdio.h:835 include/linux/phy.h:733 struct cpu_feature isn't really a device_id struct. That is kept in <linux/mod_devicetable.h> for now. Acked-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # zorro Link: https://patch.msgid.link/41400e323be8640702b906d04327e833c5bdaf4a.1782808461.git.u.kleine-koenig@baylibre.com [Drop "MOD" from the header guards] Signed-off-by: Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com>
12 days	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Linus Torvalds
	Pull BPF fixes from Daniel Borkmann: - Initialize task local storage before fork bails out to free the task (Jann Horn) - Fix insn_aux_data leak on verifier error path (KaFai Wan) - Reject BPF inode storage map creation when BPF LSM is uninitialized (Matt Bobrowski) - Mask pseudo pointer values in verifier logs when pointer leaks are not allowed (Nuoqi Gui) - Harden BPF JIT against spraying via IBPB flush (Pawan Gupta) - Reject a skb-modifying SK_SKB stream parser since the latter is only meant to measure the next message (Sechang Lim) - Fix bpf_refcount_acquire to reject refcounted allocation arguments with a non-zero fixed offset (Yiyang Chen) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Prefer dirty packs for eBPF allocations bpf: Prefer packs that won't trigger an IBPB flush on allocation bpf: Skip redundant IBPB in pack allocator bpf: Restrict JIT predictor flush to cBPF x86/bugs: Enable IBPB flush on BPF JIT allocation bpf: Support for hardening against JIT spraying bpf: Reject BPF_MAP_TYPE_INODE_STORAGE creation if BPF LSM is uninitialized bpf,fork: wipe ->bpf_storage before bailouts that access it bpf: Fix insn_aux_data leak on verifier err_free_env path selftests/bpf: Cover pseudo-BTF ksym log masking bpf: Mask pseudo pointer values in verifier logs selftests/bpf: Cover refcount acquire node offsets bpf: Reject offset refcount acquire arguments selftests/bpf: test rejection of a packet-modifying SK_SKB stream parser bpf, sockmap: reject a packet-modifying SK_SKB stream parser selftests/bpf: don't modify the skb in the strparser parser prog
12 days	Merge tag 'vfio-v7.2-rc2' of https://github.com/awilliam/linux-vfio	Linus Torvalds
	Pull VFIO fixes from Alex Williamson: "Mostly straightforward fixes here, inconsistent runtime PM handling due to global device policies, bitfield races, unwind path gaps, teardown ordering, and a misplaced library flag. - Fix racy bitfield updates in vfio-pci-core and the mlx5 vfio-pci variant driver with a binary split between setup/release and runtime modified flags. These were noted across several Sashiko reviews as pre-existing issues (Alex Williamson) - Fix runtime PM inconsistency where the vfio-pci driver module_init could modify the idle PM policy of existing devices through globals managed in vfio-pci-core, leading to unbalanced runtime PM operations (Alex Williamson) - Restore mutability of writable vfio-pci module options by further pulling policy globals out of vfio-pci-core, to instead be latched per device at device init. Provide visibility of the per device latched values through debugfs (Alex Williamson) - Fix missing VGA arbiter uninit callback in unwind path (Alex Williamson) - Reorder device debugfs removal before device_del() to avoid gap where debugfs is available with stale devres pointers (Alex Williamson) - Move UUID library linking flag from vfio selftest Makefile into libvfio.mk to avoid exposing such dependencies when linking with KVM selftests (Sean Christopherson)" * tag 'vfio-v7.2-rc2' of https://github.com/awilliam/linux-vfio: vfio: selftests: Add luuid to libvfio.mk's list of libraries, not to the Makefile vfio/pci: Expose latched module parameter policy in debugfs vfio: Remove device debugfs before releasing devres vfio/pci: Latch all module parameters per device vfio/mlx5: Fix racy bitfields and tighten struct layout vfio/pci: Fix racy bitfields and tighten struct layout vfio/pci: Release the VGA arbiter client on register_device() failure vfio/pci: Latch disable_idle_d3 per device
12 days	Merge tag 'net-7.2-rc2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and batman-adv. Current release - new code bugs: - netfilter: cthelper: cap to maximum number of expectation per master Previous releases - regressions: - netpoll: fix a use-after-free on shutdown path - tcp: restore RCU grace period in tcp_ao_destroy_sock - ipv6: fix NULL deref in fib6_walk_continiue() on multi-batch dump - batman-adv: dat: ensure accessible eth_hdr proto field - eth: - virtio_net: disable cb when NAPI is busy-polled - lan743x: Initialize eth_syslock spinlock before use Previous releases - always broken: - netfilter: - nft_set_pipapo: don't leak bad clone into future transaction - sched: - sch_teql: Introduce slaves_lock to avoid race condition and UAF - replace direct dequeue call with peek and qdisc_dequeue_peeked - sctp: add INIT verification after cookie unpacking - tipc: fix out-of-bounds read in broadcast Gap ACK blocks - seg6: validate SRH length before reading fixed fields - eth: - mlx5e: fix use-after-free of metadata_dst on RX SC delete - enetc: check the number of BDs needed for xdp_frame - fbnic: don't cache shinfo across skb realloc" * tag 'net-7.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits) net/mlx5: HWS, fix matcher leak on resize target setup failure net/sched: hhf: clear heavy-hitter state on reset net/sched: dualpi2: clear stale classification on filter miss net/sched: act_bpf: use rcu_dereference_bh() to read the filter selftests: drv-net: tso: don't touch dangerous feature bits cxgb4: Fix decode strings dump for T6 adapters virtio_net: disable cb when NAPI is busy-polled sctp: fix addr_wq_timer race in sctp_free_addr_wq() selftests: net: bump default cmd() timeout to 20 seconds bridge: stp: Fix a potential use-after-free when deleting a bridge net/sched: sch_teql: Introduce slaves_lock to avoid race condition and UAF net: gianfar: dispose irq mappings on probe failure and device removal net: lan743x: Initialize eth_syslock spinlock before use net: libwx: fix VMDQ mask for 1-queue mode net: airoha: fix max receive size configuration fsl/fman: Free init resources on KeyGen failure in fman_init() netfilter: nftables: restrict checkum update offset netfilter: nftables: restrict linklayer and network header writes netfilter: nfnetlink_queue: restrict writes to network header netfilter: nft_fib: reject fib expression on the netdev egress hook ...
13 days	mm: do file ownership checks with the proper mount idmap	Pedro Falcato
	Ever since idmapped mounts were introduced, inode ownership checks (for side-channel protection) in mincore() and madvise(MADV_PAGEOUT) were done against the nop_mnt_idmap, which completely ignores the file's mount's idmap. This results in odd edgecases like: 1) mount/bind-mount with an idmap userA:userB:1 2) userB runs an owner_or_capable() check on file that is owned by userA on-disk/in-memory, but owned by userB after idmap translation 3) owner_or_capable() mysteriously fails as the correct idmap wasn't supplied In the case of mincore/madvise MADV_PAGEOUT, this is usually benign, because file_permission(file, MAY_WRITE) will probably succeed, as it uses the proper idmap internally, but it does not need to be the case on e.g a 0444 file where even the owner itself doesn't have permissions to write to it. Since this is clearly not trivial to get right, introduce a file_owner_or_capable() that can carry the correct semantics, and switch the various users in mm to it. The issue was found by manual code inspection & an off-list discussion with Jan Kara. Link: https://lore.kernel.org/20260625153853.913949-1-pfalcato@suse.de Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP") Signed-off-by: Pedro Falcato <pfalcato@suse.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner (Amutable) <brauner@kernel.org> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 days	mm/damon: add a kernel-doc comment for damon_ctx->rnd_state	SJ Park
	Fix below kernel document build warning: WARNING: ../include/linux/damon.h:909 struct member 'rnd_state' not described in 'damon_ctx' Link: https://lore.kernel.org/20260628220808.98931-3-sj@kernel.org Fixes: 9012c4e647df ("mm/damon: replace damon_rand() with a per-ctx lockless PRNG") Signed-off-by: SJ Park <sj@kernel.org> Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/4df95955-b255-4e5a-90c4-35db02f3111f@infradead.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 days	mm/damon: add a kernel-doc comment for damon_ctx->probes	SJ Park
	The two fields of damon_ctx struct dont have their kernel-doc comments. That causes kernel document builds to warn. Fix those. This patch (of 2): Fix below document build warning: WARNING: ../include/linux/damon.h:909 struct member 'probes' not described in 'damon_ctx' Link: https://lore.kernel.org/20260628220808.98931-1-sj@kernel.org Link: https://lore.kernel.org/20260628220808.98931-2-sj@kernel.org Fixes: 18c777859f28 ("mm/damon/core: embed damon_probe objects in damon_ctx") Signed-off-by: SJ Park <sj@kernel.org> Reported-by: Randy Dunlap <rdunlap@infradead.org> Closes: https://lore.kernel.org/4df95955-b255-4e5a-90c4-35db02f3111f@infradead.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 days	mm: page_ext: add count limit to page_ext_iter_next to prevent invalid PFN ↵	Ketan
	access The page_ext iteration API does not validate if the PFN still belongs to a valid section while advancing the iterator. When dynamically adding memory in the hotplug path, it can lead to a NULL pointer dereference during page_ext_lookup at the boundary of the last valid section when iterator count equals __pgcount. The for_each_page_ext() macro calls page_ext_iter_next() as its loop increment. for_each_page_ext() does a "__page_ext = page_ext_iter_next(&__iter)" at the end. This causes page_ext_iter_next() to increment iter->index past __pgcount and call page_ext_lookup(start_pfn + __pgcount). During memory hotplug (online), the PFN at start_pfn + __pgcount may belong to a section that has not yet been initialized, causing page_ext_lookup() to trigger a NULL pointer dereference. [ 14.555124][ T846] Call trace: [ 14.555125][ T846] lookup_page_ext+0x6c/0x108 (P) [ 14.555127][ T846] page_ext_lookup+0x30/0x3c [ 14.555129][ T846] __reset_page_owner+0x11c/0x260 [ 14.571201][ T846] __free_pages_ok+0x5e8/0x8e0 [ 14.571204][ T846] __free_pages_core+0x78/0xf0 [ 14.571206][ T846] generic_online_page+0x14/0x24 [ 14.597782][ T846] online_pages+0x178/0x30c [ 14.597784][ T846] memory_block_change_state+0x284/0x32c [ 14.597787][ T846] memory_subsys_online+0x4c/0x64 [ 14.597789][ T846] device_online+0x88/0xb0 [ 14.597791][ T846] online_memory_block+0x30/0x40 [ 14.597793][ T846] walk_memory_blocks+0xac/0xe8 [ 14.597794][ T846] add_memory_resource+0x280/0x298 [ 14.656161][ T846] add_memory+0x60/0x98 Move the iteration boundary enforcement inside the iterator functions, so callers cannot inadvertently access beyond the requested range. Link: https://lore.kernel.org/20260623-page_ext-v3-1-a89799a5367c@oss.qualcomm.com Fixes: 9039b9096ea2 ("mm: page_ext: add an iteration API for page extensions") Signed-off-by: Ketan Kishore <ketan.kishore@oss.qualcomm.com> Suggested-by: David Hildenbrand <david@redhat.com> Suggested-by: Matthew Wilcox <willy@infradead.org> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Luiz Capitulino <luizcap@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 days	mm/damon/ops-common: handle extreme intervals in damon_hot_score()	SeongJae Park
	Fix three issues in damon_hot_score() that comes from wrong handling of extreme (zero or too high) monitoring intervals user setup. When the user sets sampling interval zero, damon_max_nr_accesses(), which is called from damon_hot_score(), causes a divide-by-zero. Needless to say, it is a problem. When the user sets the aggregation interval zero, the function returns zero. It is wrong, since the real maximum nr_acceses in the setup should be one. Worse yet, it can cause another divide-by-zero from its caller, damon_hot_score(), since it uses damon_max_nr_accesses() return value as a denominator. When the user sets the aggregation interval very high, damon_hot_score() could return a value out of [0, DAMOS_MAX_SCORE] range. Since the return value is used as an index to the regions_score_histogram array, which is DAMOS_MAX_SCORE+1 size, it causes out of bounds array access. The issues can be relatively easily reproduced like below. The sysfs write permission is required, though. # ./damo start --damos_action lru_prio --damos_quota_space 100M \ --damos_quota_interval 1s # cd /sys/kernel/mm/damon/admin/kdamonds/0 # echo 0 > contexts/0/monitoring_attrs/intervals/sample_us # echo 0 > contexts/0/monitoring_attrs/intervals/aggr_us # echo commit > state # dmesg [...] [ 131.329762] Oops: divide error: 0000 [#1] SMP NOPTI [...] [ 131.336089] RIP: 0010:damon_hot_score+0x27/0xd0 [...] Fix the divide-by-zero intervals problems by explicitly handling the zero intervals in damon_max_nr_accesses(). Fix the out-of-bound array access by applying [0, DAMOS_MAX_SCORE] bounds before returning from damon_hot_score(). The issue was discovered [1] by Sashiko. Link: https://lore.kernel.org/20260623135834.67189-1-sj@kernel.org Link: https://lore.kernel.org/20260619202459.145010-1-sj@kernel.org [1] Fixes: 198f0f4c58b9 ("mm/damon/vaddr,paddr: support pageout prioritization") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> # 5.16.x Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 days	iomap: consolidate bio submission	Christoph Hellwig
	Add a iomap_bio_submit_read_endio helper factored out of iomap_bio_submit_read to that all ->submit_read implementations for iomap_read_ops that use iomap_bio_read_folio_range can shared the logic. Right now that logic is mostly trivial, but already has a bug for XFS because the XFS version is too trivial: file system integrity validation needs a workqueue context and thus can't happen from the default iomap bi_end_io I/O handler. Unfortunately the iomap refactoring just before fs integrity landed moved code around here and the call go misplaced, meaning it never got called. The PI information still is verified by the block layer, but the offloading is less efficient (and the future userspace interface can't get at it). Fixes: 0b10a370529c ("iomap: support T10 protection information") Cc: stable@vger.kernel.org # v7.1 Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260629121750.3392300-2-hch@lst.de Acked-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
13 days	netfs: Replace wb_lock with a bit lock for asynchronicity	David Howells
	The netfs_inode::wb_lock mutex is used to prevent multiple simultaneous writebacks from fighting each other (a writeback thread will write multiple discontiguous regions within the same request). The mutex, however, only serialises the issuing of subrequests; it doesn't serialise the collection of results, and, in particular, the updating of file size information and fscache populatedness data. Unfortunately, the mutex cannot be held around the entire process as it has to be unlocked in the same thread in which it is locked - and we don't want to hold up the allocator whilst we complete the writeback. Fix this by replacing the mutex with a bit flag and a list of lock waiters so that the lock can be dropped in the collector thread after collection is complete. Link: https://sashiko.dev/#/patchset/20260608145432.681865-1-dhowells%40redhat.com Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260625140640.3116900-12-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
13 days	netfs: Fix kdoc warning	David Howells
	Fix a kdoc warning due to a misnamed parameter in the description. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://patch.msgid.link/20260625140640.3116900-11-dhowells@redhat.com cc: Paulo Alcantara <pc@manguebit.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
13 days	ACPICA: Define acpi_ut_safe_strncpy() as strscpy_pad() alias	Rafael J. Wysocki
	Commit 292db66afd20 ("ACPICA: Unbreak tools build after switching over to strscpy_pad()") added an #ifdef based on a __KERNEL__ check which is sort of nasty to the acpi_ut_safe_strncpy() definition to unbreak ACPICA tools builds broken by commit 97f7d3f9c9ac ("ACPICA: Replace strncpy() with strscpy_pad() in acpi_ut_safe_strncpy()"). However, that #ifdef effectively produces dead code when tools are built because they don't call acpi_ut_safe_strncpy(). Accordingly, drop the existing definition of acpi_ut_safe_strncpy() and define it as a strscpy_pad() alias. Fixes: 292db66afd20 ("ACPICA: Unbreak tools build after switching over to strscpy_pad()") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [ rjw: Tweak the changelog ] Link: https://patch.msgid.link/12941764.O9o76ZdvQC@rafael.j.wysocki Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
13 days	bpf: Restrict JIT predictor flush to cBPF	Pawan Gupta
	Currently predictor flush on memory reuse is done for all BPF JIT allocations, but only cBPF programs can be loaded by an unprivileged user. eBPF is privileged by default, and flushing predictors for all CPUs on every eBPF reuse penalizes the common case for no security benefit. eBPF allocations can be frequent on busy systems, only flush predictors for cBPF programs. Trampoline and dispatcher allocations also skip the flush as they are eBPF-only. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
13 days	bpf: Support for hardening against JIT spraying	Pawan Gupta
	The BPF JIT allocator packs many small programs into larger executable allocations and reuses space within those allocations as programs are loaded and freed. When fresh code is written into space that a previous program occupied, an indirect jump into the new program can reuse a branch prediction left behind by the old one. Flush the indirect branch predictors before reusing JIT memory so that indirect jumps into a newly written program don't reuse predictions from an old program that occupied the same space. Introduce bpf_arch_pred_flush_enabled static key and bpf_arch_pred_flush static call for flushing the branch predictors on JIT memory reuse. Architectures that need a flush, can update it to a predictor flush function. By default, its a NOP and does not emit any CALL. Allocations larger than a pack are not covered by this flush. That is safe because cBPF programs (the unprivileged attack surface) are bounded well below a pack size. Issue a warning if this assumption is ever violated while the flush is active. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>