linux-stable.git/kernel/bpf, branch v4.4.166

bpf: generally move prog destruction to RCU deferral

2018-11-10T15:41:37+00:00

[ Upstream commit 1aacde3d22c42281236155c1ef6d7a5aa32a826b ]

Jann Horn reported following analysis that could potentially result
in a very hard to trigger (if not impossible) UAF race, to quote his
event timeline:

 - Set up a process with threads T1, T2 and T3
 - Let T1 set up a socket filter F1 that invokes another filter F2
   through a BPF map [tail call]
 - Let T1 trigger the socket filter via a unix domain socket write,
   don't wait for completion
 - Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
 - Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
 - Let T3 close the file descriptor for F2, dropping the reference
   count of F2 to 2
 - At this point, T1 should have looked up F2 from the map, but not
   finished executing it
 - Let T3 remove F2 from the BPF map, dropping the reference count of
   F2 to 1
 - Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
   the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
   via schedule_work()
 - At this point, the BPF program could be freed
 - BPF execution is still running in a freed BPF program

While at PERF_EVENT_IOC_SET_BPF time it's only guaranteed that the perf
event fd we're doing the syscall on doesn't disappear from underneath us
for whole syscall time, it may not be the case for the bpf fd used as
an argument only after we did the put. It needs to be a valid fd pointing
to a BPF program at the time of the call to make the bpf_prog_get() and
while T2 gets preempted, F2 must have dropped reference to 1 on the other
CPU. The fput() from the close() in T3 should also add additionally delay
to the reference drop via exit_task_work() when bpf_prog_release() gets
called as well as scheduling bpf_prog_free_deferred().

That said, it makes nevertheless sense to move the BPF prog destruction
generally after RCU grace period to guarantee that such scenario above,
but also others as recently fixed in ceb56070359b ("bpf, perf: delay release
of BPF prog after grace period") with regards to tail calls won't happen.
Integrating bpf_prog_free_deferred() directly into the RCU callback is
not allowed since the invocation might happen from either softirq or
process context, so we're not permitted to block. Reviewing all bpf_prog_put()
invocations from eBPF side (note, cBPF -> eBPF progs don't use this for
their destruction) with call_rcu() look good to me.

Since we don't know whether at the time of attaching the program, we're
already part of a tail call map, we need to use RCU variant. However, due
to this, there won't be severely more stress on the RCU callback queue:
situations with above bpf_prog_get() and bpf_prog_put() combo in practice
normally won't lead to releases, but even if they would, enough effort/
cycles have to be put into loading a BPF program into the kernel already.

Reported-by: Jann Horn 
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin

bpf: fix references to free_bpf_prog_info() in comments

2018-08-06T14:24:37+00:00

[ Upstream commit ab7f5bf0928be2f148d000a6eaa6c0a36e74750e ]

Comments in the verifier refer to free_bpf_prog_info() which
seems to have never existed in tree.  Replace it with
free_used_maps().

Signed-off-by: Jakub Kicinski 
Reviewed-by: Quentin Monnet 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Sasha Levin 
Signed-off-by: Greg Kroah-Hartman

bpf: map_get_next_key to return first key on NULL

2018-05-16T08:06:46+00:00

commit 8fe45924387be6b5c1be59a7eb330790c61d5d10 upstream.

When iterating through a map, we need to find a key that does not exist
in the map so map_get_next_key will give us the first key of the map.
This often requires a lot of guessing in production systems.

This patch makes map_get_next_key return the first key when the key
pointer in the parameter is NULL.

Signed-off-by: Teng Qin 
Signed-off-by: Alexei Starovoitov 
Acked-by: Daniel Borkmann 
Signed-off-by: David S. Miller 
Signed-off-by: Chenbo Feng 
Cc: Lorenzo Colitti 
Signed-off-by: Greg Kroah-Hartman

bpf: skip unnecessary capability check

2018-03-28T16:40:17+00:00

commit 0fa4fe85f4724fff89b09741c437cbee9cf8b008 upstream.

The current check statement in BPF syscall will do a capability check
for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This
code path will trigger unnecessary security hooks on capability checking
and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN
access. This can be resolved by simply switch the order of the statement
and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is
allowed.

Signed-off-by: Chenbo Feng 
Acked-by: Lorenzo Colitti 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Greg Kroah-Hartman

bpf: fix incorrect sign extension in check_alu_op()

2018-03-22T08:23:32+00:00

commit 95a762e2c8c942780948091f8f2a4f32fce1ac6f upstream.

Distinguish between
BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit)
and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit);
only perform sign extension in the first case.

This patch differs from the mainline one because the verifier's internals
have changed in the meantime. Mainline tracks register values as 64-bit
values; however, 4.4 still stores tracked register values as 32-bit
values with sign extension. Therefore, in the case of a 32-bit op with
negative immediate, the value can't be tracked; leave the register as
UNKNOWN_VALUE (set by the preceding check_reg_arg() call).


I have manually tested this patch on top of 4.4.122. For the following BPF
bytecode:

        BPF_MOV64_IMM(BPF_REG_1, 1),
        BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 1, 1),
        BPF_EXIT_INSN(),

        BPF_MOV32_IMM(BPF_REG_1, 1),
        BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 1, 1),
        BPF_EXIT_INSN(),

        BPF_MOV64_IMM(BPF_REG_1, -1),
        BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, -1, 1),
        BPF_EXIT_INSN(),

        BPF_MOV32_IMM(BPF_REG_1, -1),
        BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, -1, 2),
        BPF_MOV32_IMM(BPF_REG_0, 42),
        BPF_EXIT_INSN(),

        BPF_MOV32_IMM(BPF_REG_0, 43),
        BPF_EXIT_INSN()

Verifier output on 4.4.122 without this patch:

0: (b7) r1 = 1
1: (15) if r1 == 0x1 goto pc+1
3: (b4) (u32) r1 = (u32) 1
4: (15) if r1 == 0x1 goto pc+1
6: (b7) r1 = -1
7: (15) if r1 == 0xffffffff goto pc+1
9: (b4) (u32) r1 = (u32) -1
10: (15) if r1 == 0xffffffff goto pc+2
13: (b4) (u32) r0 = (u32) 43
14: (95) exit

Verifier output on 4.4.122+ with this patch:

0: (b7) r1 = 1
1: (15) if r1 == 0x1 goto pc+1
3: (b4) (u32) r1 = (u32) 1
4: (15) if r1 == 0x1 goto pc+1
6: (b7) r1 = -1
7: (15) if r1 == 0xffffffff goto pc+1
9: (b4) (u32) r1 = (u32) -1
10: (15) if r1 == 0xffffffff goto pc+2
 R1=inv R10=fp
11: (b4) (u32) r0 = (u32) 42
12: (95) exit

from 10 to 13: R1=imm-1 R10=fp
13: (b4) (u32) r0 = (u32) 43
14: (95) exit


Signed-off-by: Jann Horn 
Acked-by: Daniel Borkmann 
Signed-off-by: Greg Kroah-Hartman

bpf: reject stores into ctx via st and xadd

2018-02-03T16:04:25+00:00

[ upstream commit f37a8cb84cce18762e8f86a70bd6a49a66ab964c ]

Alexei found that verifier does not reject stores into context
via BPF_ST instead of BPF_STX. And while looking at it, we
also should not allow XADD variant of BPF_STX.

The context rewriter is only assuming either BPF_LDX_MEM- or
BPF_STX_MEM-type operations, thus reject anything other than
that so that assumptions in the rewriter properly hold. Add
test cases as well for BPF selftests.

Fixes: d691f9e8d440 ("bpf: allow programs to write to certain skb fields")
Reported-by: Alexei Starovoitov 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Alexei Starovoitov 
Signed-off-by: Greg Kroah-Hartman

bpf: fix 32-bit divide by zero

2018-02-03T16:04:25+00:00

[ upstream commit 68fda450a7df51cff9e5a4d4a4d9d0d5f2589153 ]

due to some JITs doing if (src_reg == 0) check in 64-bit mode
for div/mod operations mask upper 32-bits of src register
before doing the check

Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT")
Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.")
Reported-by: syzbot+48340bb518e88849e2e3@syzkaller.appspotmail.com
Signed-off-by: Alexei Starovoitov 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Greg Kroah-Hartman

bpf: fix divides by zero

2018-02-03T16:04:24+00:00

[ upstream commit c366287ebd698ef5e3de300d90cd62ee9ee7373e ]

Divides by zero are not nice, lets avoid them if possible.

Also do_div() seems not needed when dealing with 32bit operands,
but this seems a minor detail.

Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: Eric Dumazet 
Reported-by: syzbot 
Signed-off-by: Alexei Starovoitov 
Signed-off-by: Greg Kroah-Hartman

bpf: arsh is not supported in 32 bit alu thus reject it

2018-02-03T16:04:24+00:00

[ upstream commit 7891a87efc7116590eaba57acc3c422487802c6f ]

The following snippet was throwing an 'unknown opcode cc' warning
in BPF interpreter:

  0: (18) r0 = 0x0
  2: (7b) *(u64 *)(r10 -16) = r0
  3: (cc) (u32) r0 s>>= (u32) r0
  4: (95) exit

Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
generation, not all of them do and interpreter does neither. We can
leave existing ones and implement it later in bpf-next for the
remaining ones, but reject this properly in verifier for the time
being.

Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann 
Signed-off-by: Alexei Starovoitov 
Signed-off-by: Greg Kroah-Hartman

bpf: introduce BPF_JIT_ALWAYS_ON config

2018-02-03T16:04:24+00:00

[ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]

The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.

A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."

To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64

The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden

v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)

v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
  It will be sent when the trees are merged back to net-next

Considered doing:
  int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.

Signed-off-by: Alexei Starovoitov 
Signed-off-by: Daniel Borkmann 
Signed-off-by: Greg Kroah-Hartman