linux-stable.git/kernel/bpf/hashtab.c, branch v6.1.2

treewide: use get_random_u32() when possible

2022-10-11T23:42:58+00:00

The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32(). This was done as a basic find
and replace.

Reviewed-by: Greg Kroah-Hartman 
Reviewed-by: Kees Cook 
Reviewed-by: Yury Norov 
Reviewed-by: Jan Kara  # for ext4
Acked-by: Toke Høiland-Jørgensen  # for sch_cake
Acked-by: Chuck Lever  # for nfsd
Acked-by: Jakub Kicinski 
Acked-by: Mika Westerberg  # for thunderbolt
Acked-by: Darrick J. Wong  # for xfs
Acked-by: Helge Deller  # for parisc
Acked-by: Heiko Carstens  # for s390
Signed-off-by: Jason A. Donenfeld

bpf: Always use raw spinlock for hash bucket lock

2022-09-22T01:08:54+00:00

For a non-preallocated hash map on RT kernel, regular spinlock instead
of raw spinlock is used for bucket lock. The reason is that on RT kernel
memory allocation is forbidden under atomic context and regular spinlock
is sleepable under RT.

Now hash map has been fully converted to use bpf_map_alloc, and there
will be no synchronous memory allocation for non-preallocated hash map,
so it is safe to always use raw spinlock for bucket lock on RT. So
removing the usage of htab_use_raw_lock() and updating the comments
accordingly.

Signed-off-by: Hou Tao 
Link: https://lore.kernel.org/r/20220921073826.2365800-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov

bpf: add missing percpu_counter_destroy() in htab_map_alloc()

2022-09-10T23:04:07+00:00

syzbot is reporting ODEBUG bug in htab_map_alloc() [1], for
commit 86fe28f7692d96d2 ("bpf: Optimize element count in non-preallocated
hash map.") added percpu_counter_init() to htab_map_alloc() but forgot to
add percpu_counter_destroy() to the error path.

Link: https://syzkaller.appspot.com/bug?extid=5d1da78b375c3b5e6c2b [1]
Reported-by: syzbot 
Signed-off-by: Tetsuo Handa 
Fixes: 86fe28f7692d96d2 ("bpf: Optimize element count in non-preallocated hash map.")
Reviewed-by: Stanislav Fomichev 
Link: https://lore.kernel.org/r/e2e4cc0e-9d36-4ca1-9bfa-ce23e6f8310b@I-love.SAKURA.ne.jp
Signed-off-by: Alexei Starovoitov

bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.

2022-09-05T13:33:07+00:00

User space might be creating and destroying a lot of hash maps. Synchronous
rcu_barrier-s in a destruction path of hash map delay freeing of hash buckets
and other map memory and may cause artificial OOM situation under stress.
Optimize rcu_barrier usage between bpf hash map and bpf_mem_alloc:
- remove rcu_barrier from hash map, since htab doesn't use call_rcu
  directly and there are no callback to wait for.
- bpf_mem_alloc has call_rcu_in_progress flag that indicates pending callbacks.
  Use it to avoid barriers in fast path.
- When barriers are needed copy bpf_mem_alloc into temp structure
  and wait for rcu barrier-s in the worker to let the rest of
  hash map freeing to proceed.

Signed-off-by: Alexei Starovoitov 
Signed-off-by: Daniel Borkmann 
Link: https://lore.kernel.org/bpf/20220902211058.60789-17-alexei.starovoitov@gmail.com

bpf: Convert percpu hash map to per-cpu bpf_mem_alloc.

2022-09-05T13:33:06+00:00

Convert dynamic allocations in percpu hash map from alloc_percpu() to
bpf_mem_cache_alloc() from per-cpu bpf_mem_alloc. Since bpf_mem_alloc frees
objects after RCU gp the call_rcu() is removed. pcpu_init_value() now needs to
zero-fill per-cpu allocations, since dynamically allocated map elements are now
similar to full prealloc, since alloc_percpu() is not called inline and the
elements are reused in the freelist.

Signed-off-by: Alexei Starovoitov 
Signed-off-by: Daniel Borkmann 
Acked-by: Kumar Kartikeya Dwivedi 
Acked-by: Andrii Nakryiko 
Link: https://lore.kernel.org/bpf/20220902211058.60789-12-alexei.starovoitov@gmail.com

bpf: Add percpu allocation support to bpf_mem_alloc.

2022-09-05T13:33:06+00:00

Extend bpf_mem_alloc to cache free list of fixed size per-cpu allocations.
Once such cache is created bpf_mem_cache_alloc() will return per-cpu objects.
bpf_mem_cache_free() will free them back into global per-cpu pool after
observing RCU grace period.
per-cpu flavor of bpf_mem_alloc is going to be used by per-cpu hash maps.

The free list cache consists of tuples { llist_node, per-cpu pointer }
Unlike alloc_percpu() that returns per-cpu pointer
the bpf_mem_cache_alloc() returns a pointer to per-cpu pointer and
bpf_mem_cache_free() expects to receive it back.

Signed-off-by: Alexei Starovoitov 
Signed-off-by: Daniel Borkmann 
Acked-by: Kumar Kartikeya Dwivedi 
Acked-by: Andrii Nakryiko 
Link: https://lore.kernel.org/bpf/20220902211058.60789-11-alexei.starovoitov@gmail.com

bpf: Optimize call_rcu in non-preallocated hash map.

2022-09-05T13:33:06+00:00

Doing call_rcu() million times a second becomes a bottle neck.
Convert non-preallocated hash map from call_rcu to SLAB_TYPESAFE_BY_RCU.
The rcu critical section is no longer observed for one htab element
which makes non-preallocated hash map behave just like preallocated hash map.
The map elements are released back to kernel memory after observing
rcu critical section.
This improves 'map_perf_test 4' performance from 100k events per second
to 250k events per second.

bpf_mem_alloc + percpu_counter + typesafe_by_rcu provide 10x performance
boost to non-preallocated hash map and make it within few % of preallocated map
while consuming fraction of memory.

Signed-off-by: Alexei Starovoitov 
Signed-off-by: Daniel Borkmann 
Acked-by: Kumar Kartikeya Dwivedi 
Acked-by: Andrii Nakryiko 
Link: https://lore.kernel.org/bpf/20220902211058.60789-8-alexei.starovoitov@gmail.com

bpf: Optimize element count in non-preallocated hash map.

2022-09-05T13:33:06+00:00

The atomic_inc/dec might cause extreme cache line bouncing when multiple cpus
access the same bpf map. Based on specified max_entries for the hash map
calculate when percpu_counter becomes faster than atomic_t and use it for such
maps. For example samples/bpf/map_perf_test is using hash map with max_entries
1000. On a system with 16 cpus the 'map_perf_test 4' shows 14k events per
second using atomic_t. On a system with 15 cpus it shows 100k events per second
using percpu. map_perf_test is an extreme case where all cpus colliding on
atomic_t which causes extreme cache bouncing. Note that the slow path of
percpu_counter is 5k events per secound vs 14k for atomic, so the heuristic is
necessary. See comment in the code why the heuristic is based on
num_online_cpus().

Signed-off-by: Alexei Starovoitov 
Signed-off-by: Daniel Borkmann 
Acked-by: Kumar Kartikeya Dwivedi 
Acked-by: Andrii Nakryiko 
Link: https://lore.kernel.org/bpf/20220902211058.60789-7-alexei.starovoitov@gmail.com

bpf: Convert hash map to bpf_mem_alloc.

2022-09-05T13:33:05+00:00

Convert bpf hash map to use bpf memory allocator.

Signed-off-by: Alexei Starovoitov 
Signed-off-by: Daniel Borkmann 
Acked-by: Kumar Kartikeya Dwivedi 
Acked-by: Andrii Nakryiko 
Link: https://lore.kernel.org/bpf/20220902211058.60789-3-alexei.starovoitov@gmail.com

bpf: Propagate error from htab_lock_bucket() to userspace

2022-08-31T21:10:01+00:00

In __htab_map_lookup_and_delete_batch() if htab_lock_bucket() returns
-EBUSY, it will go to next bucket. Going to next bucket may not only
skip the elements in current bucket silently, but also incur
out-of-bound memory access or expose kernel memory to userspace if
current bucket_cnt is greater than bucket_size or zero.

Fixing it by stopping batch operation and returning -EBUSY when
htab_lock_bucket() fails, and the application can retry or skip the busy
batch as needed.

Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked")
Reported-by: Hao Sun 
Signed-off-by: Hou Tao 
Link: https://lore.kernel.org/r/20220831042629.130006-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau