linux-stable.git/kernel/bpf/task_iter.c, branch v6.12.91

bpf: return VMA snapshot from task_vma iterator

2026-05-23T11:04:24+00:00

[ Upstream commit 4cbee026db54cad39c39db4d356100cb133412b3 ]

Holding the per-VMA lock across the BPF program body creates a lock
ordering problem when helpers acquire locks that depend on mmap_lock:

  vm_lock -> i_rwsem -> mmap_lock -> vm_lock

Snapshot the VMA under the per-VMA lock in _next() via memcpy(), then
drop the lock before returning. The BPF program accesses only the
snapshot.

The verifier only trusts vm_mm and vm_file pointers (see
BTF_TYPE_SAFE_TRUSTED_OR_NULL in verifier.c). vm_file is reference-
counted with get_file() under the lock and released via fput() on the
next iteration or in _destroy(). vm_mm is already correct because
lock_vma_under_rcu() verifies vma->vm_mm == mm. All other pointers
are left as-is by memcpy() since the verifier treats them as untrusted.

Fixes: 4ac454682158 ("bpf: Introduce task_vma open-coded iterator kfuncs")
Signed-off-by: Puranjay Mohan 
Acked-by: Andrii Nakryiko 
Acked-by: Mykyta Yatsenko 
Link: https://lore.kernel.org/r/20260408154539.3832150-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov 
Signed-off-by: Sasha Levin

bpf: switch task_vma iterator from mmap_lock to per-VMA locks

2026-05-23T11:04:24+00:00

[ Upstream commit bee9ef4a40a277bf401be43d39ba7f7f063cf39c ]

The open-coded task_vma iterator holds mmap_lock for the entire duration
of iteration, increasing contention on this highly contended lock.

Switch to per-VMA locking. Find the next VMA via an RCU-protected maple
tree walk and lock it with lock_vma_under_rcu(). lock_next_vma() is not
used because its fallback takes mmap_read_lock(), and the iterator must
work in non-sleepable contexts.

lock_vma_under_rcu() is a point lookup (mas_walk) that finds the VMA
containing a given address but cannot iterate across gaps. An
RCU-protected vma_next() walk (mas_find) first locates the next VMA's
vm_start to pass to lock_vma_under_rcu().

Between the RCU walk and the lock, the VMA may be removed, shrunk, or
write-locked. On failure, advance past it using vm_end from the RCU
walk. Because the VMA slab is SLAB_TYPESAFE_BY_RCU, vm_end may be
stale; fall back to PAGE_SIZE advancement when it does not make forward
progress. Concurrent VMA insertions at addresses already passed by the
iterator are not detected.

CONFIG_PER_VMA_LOCK is required; return -EOPNOTSUPP without it.

Signed-off-by: Puranjay Mohan 
Link: https://lore.kernel.org/r/20260408154539.3832150-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov 
Stable-dep-of: 4cbee026db54 ("bpf: return VMA snapshot from task_vma iterator")
Signed-off-by: Sasha Levin

bpf: fix mm lifecycle in open-coded task_vma iterator

2026-05-23T11:04:24+00:00

[ Upstream commit d8e27d2d22b6e2df3a0125b8c08e9aace38c954c ]

The open-coded task_vma iterator reads task->mm locklessly and acquires
mmap_read_trylock() but never calls mmget(). If the task exits
concurrently, the mm_struct can be freed as it is not
SLAB_TYPESAFE_BY_RCU, resulting in a use-after-free.

Safely read task->mm with a trylock on alloc_lock and acquire an mm
reference. Drop the reference via bpf_iter_mmput_async() in _destroy()
and error paths. bpf_iter_mmput_async() is a local wrapper around
mmput_async() with a fallback to mmput() on !CONFIG_MMU.

Reject irqs-disabled contexts (including NMI) up front. Operations used
by _next() and _destroy() (mmap_read_unlock, bpf_iter_mmput_async)
take spinlocks with IRQs disabled (pool->lock, pi_lock). Running from
NMI or from a tracepoint that fires with those locks held could
deadlock.

A trylock on alloc_lock is used instead of the blocking task_lock()
(get_task_mm) to avoid a deadlock when a softirq BPF program iterates
a task that already holds its alloc_lock on the same CPU.

Fixes: 4ac454682158 ("bpf: Introduce task_vma open-coded iterator kfuncs")
Signed-off-by: Puranjay Mohan 
Link: https://lore.kernel.org/r/20260408154539.3832150-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov 
Signed-off-by: Sasha Levin

bpf: Fix iter/task tid filtering

2024-10-17T17:52:18+00:00

In userspace, you can add a tid filter by setting
the "task.tid" field for "bpf_iter_link_info".
However, `get_pid_task` when called for the
`BPF_TASK_ITER_TID` type should have been using
`PIDTYPE_PID` (tid) instead of `PIDTYPE_TGID` (pid).

Fixes: f0d74c4da1f0 ("bpf: Parameterize task iterators.")
Signed-off-by: Jordan Rome 
Signed-off-by: Andrii Nakryiko 
Link: https://lore.kernel.org/bpf/20241016210048.1213935-1-linux@jordanrome.com

bpf: Remove unnecessary loop in task_file_seq_get_next()

2024-07-08T14:23:19+00:00

After commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") this
loop always iterates exactly one time.  Delete the for statement and pull
the code in a tab.

Signed-off-by: Dan Carpenter 
Signed-off-by: Daniel Borkmann 
Reviewed-by: Christian Brauner 
Acked-by: Jiri Olsa 
Acked-by: Yonghong Song 
Link: https://lore.kernel.org/bpf/ZoWJF51D4zWb6f5t@stanley.mountain

bpf: Fix an issue due to uninitialized bpf_iter_task

2024-02-19T11:28:15+00:00

Failure to initialize it->pos, coupled with the presence of an invalid
value in the flags variable, can lead to it->pos referencing an invalid
task, potentially resulting in a kernel panic. To mitigate this risk, it's
crucial to ensure proper initialization of it->pos to NULL.

Fixes: ac8148d957f5 ("bpf: bpf_iter_task_next: use next_task(kit->task) rather than next_task(kit->pos)")
Signed-off-by: Yafang Shao 
Signed-off-by: Daniel Borkmann 
Acked-by: Yonghong Song 
Acked-by: Oleg Nesterov 
Link: https://lore.kernel.org/bpf/20240217114152.1623-2-laoar.shao@gmail.com

bpf: bpf_iter_task_next: use next_task(kit->task) rather than next_task(kit->pos)

2023-11-19T19:43:44+00:00

This looks more clear and simplifies the code. While at it, remove the
unnecessary initialization of pos/task at the start of bpf_iter_task_new().

Note that we can even kill kit->task, we can just use pos->group_leader,
but I don't understand the BUILD_BUG_ON() checks in bpf_iter_task_new().

Signed-off-by: Oleg Nesterov 
Acked-by: Yonghong Song 
Link: https://lore.kernel.org/r/20231114163239.GA903@redhat.com
Signed-off-by: Alexei Starovoitov

bpf: bpf_iter_task_next: use __next_thread() rather than next_thread()

2023-11-19T19:43:44+00:00

Lockless use of next_thread() should be avoided, kernel/bpf/task_iter.c
is the last user and the usage is wrong.

bpf_iter_task_next() can loop forever, "kit->pos == kit->task" can never
happen if kit->pos execs. Change this code to use __next_thread().

With or without this change the usage of kit->pos/task and next_task()
doesn't look nice, see the next patch.

Signed-off-by: Oleg Nesterov 
Acked-by: Yonghong Song 
Link: https://lore.kernel.org/r/20231114163237.GA897@redhat.com
Signed-off-by: Alexei Starovoitov

bpf: task_group_seq_get_next: use __next_thread() rather than next_thread()

2023-11-19T19:43:44+00:00

Lockless use of next_thread() should be avoided, kernel/bpf/task_iter.c
is the last user and the usage is wrong.

task_group_seq_get_next() can return the group leader twice if it races
with mt-thread exec which changes the group->leader's pid.

Change the main loop to use __next_thread(), kill "next_tid == common->pid"
check.

__next_thread() can't loop forever, we can also change this code to retry
if next_tid == 0.

Signed-off-by: Oleg Nesterov 
Acked-by: Yonghong Song 
Link: https://lore.kernel.org/r/20231114163234.GA890@redhat.com
Signed-off-by: Alexei Starovoitov

bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg

2023-11-07T23:24:25+00:00

BTF_TYPE_SAFE_TRUSTED(struct bpf_iter__task) in verifier.c wanted to
teach BPF verifier that bpf_iter__task -> task is a trusted ptr. But it
doesn't work well.

The reason is, bpf_iter__task -> task would go through btf_ctx_access()
which enforces the reg_type of 'task' is ctx_arg_info->reg_type, and in
task_iter.c, we actually explicitly declare that the
ctx_arg_info->reg_type is PTR_TO_BTF_ID_OR_NULL.

Actually we have a previous case like this[1] where PTR_TRUSTED is added to
the arg flag for map_iter.

This patch sets ctx_arg_info->reg_type is PTR_TO_BTF_ID_OR_NULL |
PTR_TRUSTED in task_reg_info.

Similarly, bpf_cgroup_reg_info -> cgroup is also PTR_TRUSTED since we are
under the protection of cgroup_mutex and we would check cgroup_is_dead()
in __cgroup_iter_seq_show().

This patch is to improve the user experience of the newly introduced
bpf_iter_css_task kfunc before hitting the mainline. The Fixes tag is
pointing to the commit introduced the bpf_iter_css_task kfunc.

Link[1]:https://lore.kernel.org/all/20230706133932.45883-3-aspsk@isovalent.com/

Fixes: 9c66dc94b62a ("bpf: Introduce css_task open-coded iterator kfuncs")
Signed-off-by: Chuyi Zhou 
Acked-by: Yonghong Song 
Link: https://lore.kernel.org/r/20231107132204.912120-2-zhouchuyi@bytedance.com
Signed-off-by: Martin KaFai Lau