| Age | Commit message (Collapse) | Author |
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200507185755.GA15014@embeddedor
|
|
It's not very nice to zero trap for this, because then system calls no
longer have trap_is_syscall(regs) invariant, and we can't distinguish
between sc and scv system calls (in a later patch).
Take one last unused bit from the low bits of the pt_regs.trap word
for this instead. There is not a really good reason why it should be
in trap as opposed to another field, but trap has some concept of
flags and it exists. Ideally I think we would move trap to 2-byte
field and have 2 more bytes available independently.
Add a selftests case for this, which can be seen to fail if
trap_norestart() is changed to return false.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Make them static inlines]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200507121332.2233629-4-mpe@ellerman.id.au
|
|
Test bpf_sk_lookup_tcp, bpf_sk_release, bpf_sk_cgroup_id and
bpf_sk_ancestor_cgroup_id helpers from cgroup skb program.
The test creates a testing cgroup, starts a TCPv6 server inside the
cgroup and creates two client sockets: one inside testing cgroup and one
outside.
Then it attaches cgroup skb program to the cgroup that checks all TCP
segments coming to the server and allows only those coming from the
cgroup of the server. If a segment comes from a peer outside of the
cgroup, it'll be dropped.
Finally the test checks that client from inside testing cgroup can
successfully connect to the server, but client outside the cgroup fails
to connect by timeout.
The main goal of the test is to check newly introduced
bpf_sk_{,ancestor_}cgroup_id helpers.
It also checks a couple of socket lookup helpers (tcp & release), but
lookup helpers were introduced much earlier and covered by other tests.
Here it's mostly checked that they can be called from cgroup skb.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/171f4c5d75e8ff4fe1c4e8c1c12288b5240a4549.1589486450.git.rdna@fb.com
|
|
Add two new network helpers.
connect_fd_to_fd connects an already created client socket fd to address
of server fd. Sometimes it's useful to separate client socket creation
and connecting this socket to a server, e.g. if client socket has to be
created in a cgroup different from that of server cgroup.
Additionally connect_to_fd is now implemented using connect_fd_to_fd,
both helpers don't treat EINPROGRESS as an error and let caller decide
how to proceed with it.
connect_wait is a helper to work with non-blocking client sockets so
that if connect_to_fd or connect_fd_to_fd returned -1 with errno ==
EINPROGRESS, caller can wait for connect to finish or for connection
timeout. The helper returns -1 on error, 0 on timeout (1sec,
hard-coded), and positive number on success.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/1403fab72300f379ca97ead4820ae43eac4414ef.1589486450.git.rdna@fb.com
|
|
There is a spelling mistake in an error message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200514121529.259668-1-colin.king@canonical.com
|
|
Test 1,2,4-byte loads from bpf_sock_addr.user_port in sock_addr
programs.
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/e5c734a58cca4041ab30cb5471e644246f8cdb5a.1589420814.git.rdna@fb.com
|
|
Clean up after recent fixes, move address calculations
around and change the variable init, so that we can have
just one start_offset == end_offset check.
Make the check a little stricter to preserve the -EINVAL
error if requested start offset is larger than the region
itself.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There are a few fentry/fexit programs returning non-0.
The tests with these programs will break with the previous
patch which enfoced return-0 rules. Fix them properly.
Fixes: ac065870d928 ("selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200514053207.1298479-1-yhs@fb.com
|
|
Flower tests used to create ingress filter with specified parent qdisc
"parent ffff:" but dump them on "ingress". With recent commit that fixed
tcm_parent handling in dump those are not considered same parent anymore,
which causes iproute2 tc to emit additional "parent ffff:" in first line of
filter dump output. The change in output causes filter match in tests to
fail.
Prevent parent qdisc output when dumping filters in flower tests by always
correctly specifying "ingress" parent both when creating and dumping
filters.
Fixes: a7df4870d79b ("net_sched: fix tcm_parent in tc filter dump")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mmap() subsystem allows user-space application to memory-map region with
initial page offset. This wasn't taken into account in initial implementation
of BPF array memory-mapping. This would result in wrong pages, not taking into
account requested page shift, being memory-mmaped into user-space. This patch
fixes this gap and adds a test for such scenario.
Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512235925.3817805-1-andriin@fb.com
|
|
Commit 6879c042e105 ("tools/bpf: selftests: Add bpf_iter selftests")
added self tests for bpf_iter feature. But two subtests
ipv6_route and netlink needs llvm latest 10.x release branch
or trunk due to a bug in llvm BPF backend. This patch added
the file README.rst to document these two failures
so people using llvm 10.0.0 can be aware of them.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200513180215.2949237-1-yhs@fb.com
|
|
It is sometimes desirable to be able to trigger BPF program from user-space
with minimal overhead. sys_enter would seem to be a good candidate, yet in
a lot of cases there will be a lot of noise from syscalls triggered by other
processes on the system. So while searching for low-overhead alternative, I've
stumbled upon getpgid() syscall, which seems to be specific enough to not
suffer from accidental syscall by other apps.
This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe,
fentry and fmod_ret with returning error (so that syscall would not be
executed), to determine the lowest-overhead way. Here are results on my
machine (using benchs/run_bench_trigger.sh script):
base : 9.200 ± 0.319M/s
tp : 6.690 ± 0.125M/s
rawtp : 8.571 ± 0.214M/s
kprobe : 6.431 ± 0.048M/s
fentry : 8.955 ± 0.241M/s
fmodret : 8.903 ± 0.135M/s
So it seems like fmodret doesn't give much benefit for such lightweight
syscall. Raw tracepoint is pretty decent despite additional filtering logic,
but it will be called for any other syscall in the system, which rules it out.
Fentry, though, seems to be adding the least amoung of overhead and achieves
97.3% of performance of baseline no-BPF-attached syscall.
Using getpgid() seems to be preferable to set_task_comm() approach from
test_overhead, as it's about 2.35x faster in a baseline performance.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
|
|
Add fmod_ret BPF program to existing test_overhead selftest. Also re-implement
user-space benchmarking part into benchmark runner to compare results. Results
with ./bench are consistently somewhat lower than test_overhead's, but relative
performance of various types of BPF programs stay consisten (e.g., kretprobe is
noticeably slower). This slowdown seems to be coming from the fact that
test_overhead is single-threaded, while benchmark always spins off at least
one thread for producer. This has been confirmed by hacking multi-threaded
test_overhead variant and also single-threaded bench variant. Resutls are
below. run_bench_rename.sh script from benchs/ subdirectory was used to
produce results for ./bench.
Single-threaded implementations
===============================
/* bench: single-threaded, atomics */
base : 4.622 ± 0.049M/s
kprobe : 3.673 ± 0.052M/s
kretprobe : 2.625 ± 0.052M/s
rawtp : 4.369 ± 0.089M/s
fentry : 4.201 ± 0.558M/s
fexit : 4.309 ± 0.148M/s
fmodret : 4.314 ± 0.203M/s
/* selftest: single-threaded, no atomics */
task_rename base 4555K events per sec
task_rename kprobe 3643K events per sec
task_rename kretprobe 2506K events per sec
task_rename raw_tp 4303K events per sec
task_rename fentry 4307K events per sec
task_rename fexit 4010K events per sec
task_rename fmod_ret 3984K events per sec
Multi-threaded implementations
==============================
/* bench: multi-threaded w/ atomics */
base : 3.910 ± 0.023M/s
kprobe : 3.048 ± 0.037M/s
kretprobe : 2.300 ± 0.015M/s
rawtp : 3.687 ± 0.034M/s
fentry : 3.740 ± 0.087M/s
fexit : 3.510 ± 0.009M/s
fmodret : 3.485 ± 0.050M/s
/* selftest: multi-threaded w/ atomics */
task_rename base 3872K events per sec
task_rename kprobe 3068K events per sec
task_rename kretprobe 2350K events per sec
task_rename raw_tp 3731K events per sec
task_rename fentry 3639K events per sec
task_rename fexit 3558K events per sec
task_rename fmod_ret 3511K events per sec
/* selftest: multi-threaded, no atomics */
task_rename base 3945K events per sec
task_rename kprobe 3298K events per sec
task_rename kretprobe 2451K events per sec
task_rename raw_tp 3718K events per sec
task_rename fentry 3782K events per sec
task_rename fexit 3543K events per sec
task_rename fmod_ret 3526K events per sec
Note that the fact that ./bench benchmark always uses atomic increments for
counting, while test_overhead doesn't, doesn't influence test results all that
much.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-4-andriin@fb.com
|
|
While working on BPF ringbuf implementation, testing, and benchmarking, I've
developed a pretty generic and modular benchmark runner, which seems to be
generically useful, as I've already used it for one more purpose (testing
fastest way to trigger BPF program, to minimize overhead of in-kernel code).
This patch adds generic part of benchmark runner and sets up Makefile for
extending it with more sets of benchmarks.
Benchmarker itself operates by spinning up specified number of producer and
consumer threads, setting up interval timer sending SIGALARM signal to
application once a second. Every second, current snapshot with hits/drops
counters are collected and stored in an array. Drops are useful for
producer/consumer benchmarks in which producer might overwhelm consumers.
Once test finishes after given amount of warm-up and testing seconds, mean and
stddev are calculated (ignoring warm-up results) and is printed out to stdout.
This setup seems to give consistent and accurate results.
To validate behavior, I added two atomic counting tests: global and local.
For global one, all the producer threads are atomically incrementing same
counter as fast as possible. This, of course, leads to huge drop of
performance once there is more than one producer thread due to CPUs fighting
for the same memory location.
Local counting, on the other hand, maintains one counter per each producer
thread, incremented independently. Once per second, all counters are read and
added together to form final "counting throughput" measurement. As expected,
such setup demonstrates linear scalability with number of producers (as long
as there are enough physical CPU cores, of course). See example output below.
Also, this setup can nicely demonstrate disastrous effects of false sharing,
if care is not taken to take those per-producer counters apart into
independent cache lines.
Demo output shows global counter first with 1 producer, then with 4. Both
total and per-producer performance significantly drop. The last run is local
counter with 4 producers, demonstrating near-perfect scalability.
$ ./bench -a -w1 -d2 -p1 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter 0 ( 24.822us): hits 148.179M/s (148.179M/prod), drops 0.000M/s
Iter 1 ( 37.939us): hits 149.308M/s (149.308M/prod), drops 0.000M/s
Iter 2 (-10.774us): hits 150.717M/s (150.717M/prod), drops 0.000M/s
Iter 3 ( 3.807us): hits 151.435M/s (151.435M/prod), drops 0.000M/s
Summary: hits 150.488 ± 1.079M/s (150.488M/prod), drops 0.000 ± 0.000M/s
$ ./bench -a -w1 -d2 -p4 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter 0 ( 60.659us): hits 53.910M/s ( 13.477M/prod), drops 0.000M/s
Iter 1 (-17.658us): hits 53.722M/s ( 13.431M/prod), drops 0.000M/s
Iter 2 ( 5.865us): hits 53.495M/s ( 13.374M/prod), drops 0.000M/s
Iter 3 ( 0.104us): hits 53.606M/s ( 13.402M/prod), drops 0.000M/s
Summary: hits 53.608 ± 0.113M/s ( 13.402M/prod), drops 0.000 ± 0.000M/s
$ ./bench -a -w1 -d2 -p4 count-local
Setting up benchmark 'count-local'...
Benchmark 'count-local' started.
Iter 0 ( 23.388us): hits 640.450M/s (160.113M/prod), drops 0.000M/s
Iter 1 ( 2.291us): hits 605.661M/s (151.415M/prod), drops 0.000M/s
Iter 2 ( -6.415us): hits 607.092M/s (151.773M/prod), drops 0.000M/s
Iter 3 ( -1.361us): hits 601.796M/s (150.449M/prod), drops 0.000M/s
Summary: hits 604.849 ± 2.739M/s (151.212M/prod), drops 0.000 ± 0.000M/s
Benchmark runner supports setting thread affinity for producer and consumer
threads. You can use -a flag for default CPU selection scheme, where first
consumer gets CPU #0, next one gets CPU #1, and so on. Then producer threads
pick up next CPU and increment one-by-one as well. But user can also specify
a set of CPUs independently for producers and consumers with --prod-affinity
1,2-10,15 and --cons-affinity <set-of-cpus>. The latter allows to force
producers and consumers to share same set of CPUs, if necessary.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-3-andriin@fb.com
|
|
Add testing_helpers.c, which will contain generic helpers for test runners and
tests needing some common generic functionality, like parsing a set of
numbers.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-2-andriin@fb.com
|
|
|
|
This is basically a test-suite for setns() and as of now contains:
- test that we can't pass garbage flags
- test that we can't attach to the namespaces of task that has already exited
- test that we can incrementally setns into all namespaces of a target task
using a pidfd
- test that we can setns atomically into all namespaces of a target task
- test that we can't cross setns into a user namespace outside of our user
namespace hierarchy
- test that we can't setns into namespaces owned by user namespaces over which
we are not privileged
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Link: https://lore.kernel.org/r/20200505140432.181565-4-christian.brauner@ubuntu.com
|
|
Before commit 74b5a5968fe8 ("selftests/bpf: Replace test_progs and
test_maps w/ general rule") selftests/bpf used generic install
target from selftests/lib.mk to install generated bpf test progs
by mentioning them in TEST_GEN_FILES variable.
Take that functionality back.
Fixes: 74b5a5968fe8 ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200513021722.7787-1-yauheni.kaliuta@redhat.com
|
|
Add new subcommands to kunit.py to allow stages of the existing 'run'
subcommand to be run independently:
- 'config': Verifies that .config is a subset of .kunitconfig
- 'build': Compiles a UML kernel for KUnit
- 'exec': Runs the kernel, and outputs the test results.
- 'parse': Parses test results from a file or stdin
The 'run' command continues to behave as before.
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200507185057.GA13981@embeddedor
|
|
The added test includes the following subtests:
- test verifier change for btf_id_or_null
- test load/create_iter/read for
ipv6_route/netlink/bpf_map/task/task_file
- test anon bpf iterator
- test anon bpf iterator reading one char at a time
- test file bpf iterator
- test overflow (single bpf program output not overflow)
- test overflow (single bpf program output overflows)
- test bpf prog returning 1
The ipv6_route tests the following verifier change
- access fields in the variable length array of the structure.
The netlink load tests the following verifier change
- put a btf_id ptr value in a stack and accessible to
tracing/iter programs.
The anon bpf iterator also tests link auto attach through skeleton.
$ test_progs -n 2
#2/1 btf_id_or_null:OK
#2/2 ipv6_route:OK
#2/3 netlink:OK
#2/4 bpf_map:OK
#2/5 task:OK
#2/6 task_file:OK
#2/7 anon:OK
#2/8 anon-read-one-char:OK
#2/9 file:OK
#2/10 overflow:OK
#2/11 overflow-e2big:OK
#2/12 prog-ret-1:OK
#2 bpf_iter:OK
Summary: 1/12 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175923.2477637-1-yhs@fb.com
|
|
The implementation is arbitrary, just to show how the bpf programs
can be written for bpf_map/task/task_file. They can be costomized
for specific needs.
For example, for bpf_map, the iterator prints out:
$ cat /sys/fs/bpf/my_bpf_map
id refcnt usercnt locked_vm
3 2 0 20
6 2 0 20
9 2 0 20
12 2 0 20
13 2 0 20
16 2 0 20
19 2 0 20
%%% END %%%
For task, the iterator prints out:
$ cat /sys/fs/bpf/my_task
tgid gid
1 1
2 2
....
1944 1944
1948 1948
1949 1949
1953 1953
=== END ===
For task/file, the iterator prints out:
$ cat /sys/fs/bpf/my_task_file
tgid gid fd file
1 1 0 ffffffff95c97600
1 1 1 ffffffff95c97600
1 1 2 ffffffff95c97600
....
1895 1895 255 ffffffff95c8fe00
1932 1932 0 ffffffff95c8fe00
1932 1932 1 ffffffff95c8fe00
1932 1932 2 ffffffff95c8fe00
1932 1932 3 ffffffff95c185c0
This is able to print out all open files (fd and file->f_op), so user can compare
f_op against a particular kernel file operations to find what it is.
For example, from /proc/kallsyms, we can find
ffffffff95c185c0 r eventfd_fops
so we will know tgid 1932 fd 3 is an eventfd file descriptor.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175922.2477576-1-yhs@fb.com
|
|
Two bpf programs are added in this patch for netlink and ipv6_route
target. On my VM, I am able to achieve identical
results compared to /proc/net/netlink and /proc/net/ipv6_route.
$ cat /proc/net/netlink
sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode
000000002c42d58b 0 0 00000000 0 0 0 2 0 7
00000000a4e8b5e1 0 1 00000551 0 0 0 2 0 18719
00000000e1b1c195 4 0 00000000 0 0 0 2 0 16422
000000007e6b29f9 6 0 00000000 0 0 0 2 0 16424
....
00000000159a170d 15 1862 00000002 0 0 0 2 0 1886
000000009aca4bc9 15 3918224839 00000002 0 0 0 2 0 19076
00000000d0ab31d2 15 1 00000002 0 0 0 2 0 18683
000000008398fb08 16 0 00000000 0 0 0 2 0 27
$ cat /sys/fs/bpf/my_netlink
sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode
000000002c42d58b 0 0 00000000 0 0 0 2 0 7
00000000a4e8b5e1 0 1 00000551 0 0 0 2 0 18719
00000000e1b1c195 4 0 00000000 0 0 0 2 0 16422
000000007e6b29f9 6 0 00000000 0 0 0 2 0 16424
....
00000000159a170d 15 1862 00000002 0 0 0 2 0 1886
000000009aca4bc9 15 3918224839 00000002 0 0 0 2 0 19076
00000000d0ab31d2 15 1 00000002 0 0 0 2 0 18683
000000008398fb08 16 0 00000000 0 0 0 2 0 27
$ cat /proc/net/ipv6_route
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
$ cat /sys/fs/bpf/my_ipv6_route
fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0
ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001 eth0
00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200509175921.2477493-1-yhs@fb.com
|
|
matchall-flower ordering
Make sure that the drive restricts incorrect order of inserted matchall
vs. flower rules.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Check that matchall rules with sample actions are not possible to be
inserted to egress.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The file is about to contain matchall restrictions too, so change the
name to make it more generic.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
We want to have a tighter control on what ports we bind to in
the BPF_CGROUP_INET{4,6}_CONNECT hooks even if it means
connect() becomes slightly more expensive. The expensive part
comes from the fact that we now need to call inet_csk_get_port()
that verifies that the port is not used and allocates an entry
in the hash table for it.
Since we can't rely on "snum || !bind_address_no_port" to prevent
us from calling POST_BIND hook anymore, let's add another bind flag
to indicate that the call site is BPF program.
v5:
* fix wrong AF_INET (should be AF_INET6) in the bpf program for v6
v3:
* More bpf_bind documentation refinements (Martin KaFai Lau)
* Add UDP tests as well (Martin KaFai Lau)
* Don't start the thread, just do socket+bind+listen (Martin KaFai Lau)
v2:
* Update documentation (Andrey Ignatov)
* Pass BIND_FORCE_ADDRESS_NO_PORT conditionally (Andrey Ignatov)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200508174611.228805-5-sdf@google.com
|
|
1. Move pkt_v4 and pkt_v6 into network_helpers and adjust the users.
2. Copy-paste spin_lock_thread into two tests that use it.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Link: https://lore.kernel.org/bpf/20200508174611.228805-3-sdf@google.com
|
|
Move the following routines that let us start a background listener
thread and connect to a server by fd to the test_prog:
* start_server - socket+bind+listen
* connect_to_fd - connect to the server identified by fd
These will be used in the next commit.
Also, extend these helpers to support AF_INET6 and accept the family
as an argument.
v5:
* drop pthread.h (Martin KaFai Lau)
* add SO_SNDTIMEO (Martin KaFai Lau)
v4:
* export extra helper to start server without a thread (Martin KaFai Lau)
* tcp_rtt is no longer starting background thread (Martin KaFai Lau)
v2:
* put helpers into network_helpers.c (Andrii Nakryiko)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200508174611.228805-2-sdf@google.com
|
|
shellcheck complains that egrep is deprecated, and the grep man page
agrees. Use grep -E instead.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
It is Very Rude to clear dmesg in test scripts. That's because the
script may be part of a larger test run, and clearing dmesg
potentially destroys the output of other tests.
We can avoid using dmesg -c by saving the content of dmesg before the
test, and then using diff to compare that to the dmesg afterward,
producing a log with just the added lines.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
module does not exist
The UNRESOLVED state is much more apporiate than the UNSUPPORTED state
for the absence of the test module, as it matches "test was set up
incorrectly" situation in the README file.
A possible scenario is that the function was enabled (supported by the
kernel) but the module was not installed properly, in this case we
cannot call this as UNSUPPORTED.
This change also make it consistent with other module-related tests
in ftrace.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Merge misc fixes from Andrew Morton:
"14 fixes and one selftest to verify the ipc fixes herein"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: limit boost_watermark on small zones
ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST
mm/vmscan: remove unnecessary argument description of isolate_lru_pages()
epoll: atomically remove wait entry on wake up
kselftests: introduce new epoll60 testcase for catching lost wakeups
percpu: make pcpu_alloc() aware of current gfp context
mm/slub: fix incorrect interpretation of s->offset
scripts/gdb: repair rb_first() and rb_last()
eventpoll: fix missing wakeup for ovflist in ep_poll_callback
arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()
scripts/decodecode: fix trapping instruction formatting
kernel/kcov.c: fix typos in kcov_remote_start documentation
mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
mm, memcg: fix error return value of mem_cgroup_css_alloc()
ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
|
|
When I added the expected error testing, I forgot I need to set
the return to zero when we successfully see an error.
Without this change we only end up testing a single heap
before the test quits.
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Cc: Brian Starkey <brian.starkey@arm.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: "Andrew F. Davis" <afd@ti.com>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD),
DR6 was incorrectly copied from the value in the VM. Instead,
DR6.BD should be set in order to catch this case.
On AMD this does not need any special code because the processor triggers
a #DB exception that is intercepted. However, the testcase would fail
without the previous patch because both DR6.BS and DR6.BD would be set.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The documentation for UBSAN_ALIGNMENT already mentions that it should
not be used on all*config builds (and for efficient-unaligned-access
architectures), so just refactor the Kconfig to correctly implement this
so randconfigs will stop creating insane images that freak out objtool
under CONFIG_UBSAN_TRAP (due to the false positives producing functions
that never return, etc).
Link: http://lkml.kernel.org/r/202005011433.C42EA3E2D@keescook
Fixes: 0887a7ebc977 ("ubsan: add trap instrumentation option")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/linux-next/202004231224.D6B3B650@keescook/
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This test case catches lost wake up introduced by commit 339ddb53d373
("fs/epoll: remove unnecessary wakeups of nested epoll")
The test is simple: we have 10 threads and 10 event fds. Each thread
can harvest only 1 event. 1 producer fires all 10 events at once and
waits that all 10 events will be observed by 10 threads.
In case of lost wakeup epoll_wait() will timeout and 0 will be returned.
Test case catches two sort of problems: forgotten wakeup on event, which
hits the ->ovflist list, this problem was fixed by:
5a2513239750 ("eventpoll: fix missing wakeup for ovflist in ep_poll_callback")
the other problem is when several sequential events hit the same waiting
thread, thus other waiters get no wakeups. Problem is fixed in the
following patch.
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Khazhismel Kumykov <khazhy@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Heiher <r@hev.cc>
Cc: Jason Baron <jbaron@akamai.com>
Link: http://lkml.kernel.org/r/20200430130326.1368509-1-rpenyaev@suse.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"ftrace test fixes and a fix to kvm Makefile for relocatable
native/cross builds and installs"
* tag 'linux-kselftest-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: fix kvm relocatable native/cross builds and installs
selftests/ftrace: Make XFAIL green color
ftrace/selftest: make unresolved cases cause failure if --fail-unresolved set
ftrace/selftests: workaround cgroup RT scheduling issues
|
|
'rcu-tasks.2020.04.27a', 'stall.2020.04.27a' and 'torture.2020.05.07a' into HEAD
fixes.2020.04.27a: Miscellaneous fixes.
kfree_rcu.2020.04.27a: Changes related to kfree_rcu().
rcu-tasks.2020.04.27a: Addition of new RCU-tasks flavors.
stall.2020.04.27a: RCU CPU stall-warning updates.
torture.2020.05.07a: Torture-test updates.
|
|
Make it a bit easier to apply KASAN to rcutorture runs with a new --kasan
argument, again leveraging the config_override_param() bash function.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit saves a few lines of code by also using the bash
config_override_param() to set the initial list of Kconfig options from
the CFcommon file. While in the area, it makes this function capable of
update-in-place on the file containing the cumulative Kconfig options,
thus avoiding annoying changes when adding another source of options.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit applies config_override_param() to allow scenario-specific
Kconfig options to override those in CFcommon. This in turn will allow
additional Kconfig options to be placed in CFcommon, for example, an
option common to all but a few scenario can be placed in CFcommon and
then overridden in those few scenarios. Plus this change saves one
whole line of code.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Currently, attempting to override a --kcsan default with a --kconfig
option might or might not work. However, it would be good to allow the
user to adjust the --kcsan defaults, for example, to specify a different
time for CONFIG_KCSAN_REPORT_ONCE_IN_MS. This commit therefore uses the
new config_override_param() bash function to apply the --kcsan defaults
and then apply the --kconfig options, which allows this overriding
to occur.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
This commit introduces a config_override_param() bash function that
folds in an additional set of Kconfig options. This is initially applied
to fold in the --kconfig kvm.sh parameter, but later commits will also
apply it to the Kconfig options added by the --kcsan kvm.sh parameter.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The #CHECK# directives that can be present in CFcommon and in the
rcutorture scenario Kconfig files are both copied to ConfigFragment
and grepped out of the two directive files and added to ConfigFragment.
This commit therefore removes the redundant "grep" commands and takes
advantage of the consequent opportunity to simplify redirection.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The KCSAN tool emits a great many warnings for current kernels, for
example, a one-hour run of the full set of rcutorture scenarios results
in no fewer than 3252 such warnings, many of which are duplicates
or are otherwise closely related. This commit therefore introduces
a kcsan-collapse.sh script that maps these warnings down to a set of
function pairs (22 of them given the 3252 individual warnings), placing
the resulting list in decreasing order of frequency of occurrence into
a kcsan.sum file. If any KCSAN warnings were produced, the pathname of
this file is emitted at the end of the summary of the rcutorture runs.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Although the existing --kconfig argument can be used to run KCSAN for
an rcutorture test, it is not as straightforward as one might like:
--kconfig "CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y \
CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n \
CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n \
CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 \
CONFIG_KCSAN_VERBOSE=y CONFIG_KCSAN_INTERRUPT_WATCHER=y"
This commit therefore adds a "--kcsan" argument that emulates the above
--kconfig command. Note that if you specify a Kconfig option using
-kconfig that conflicts with one that --kcsan adds, you get whatever
the script and the build system decide to give you.
Cc: Marco Elver <elver@google.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
The number of CPUs is tuned to allow "4*CFLIST TREE10" on a large system,
up from "3*CFLIST TREE10" previously.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"Bugfixes, mostly for ARM and AMD, and more documentation.
Slightly bigger than usual because I couldn't send out what was
pending for rc4, but there is nothing worrisome going on. I have more
fixes pending for guest debugging support (gdbstub) but I will send
them next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
KVM: selftests: Fix build for evmcs.h
kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
docs/virt/kvm: Document configuring and running nested guests
KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
KVM: x86: Fixes posted interrupt check for IRQs delivery modes
KVM: SVM: fill in kvm_run->debug.arch.dr[67]
KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
KVM: arm64: Fix 32bit PC wrap-around
KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
KVM: arm64: Save/restore sp_el0 as part of __guest_enter
KVM: arm64: Delete duplicated label in invalid_vector
KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
...
|