<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/samples/bpf, branch v4.2.6</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>bpf: BPF based latency tracing</title>
<updated>2015-06-23T13:09:58+00:00</updated>
<author>
<name>Daniel Wagner</name>
<email>daniel.wagner@bmw-carit.de</email>
</author>
<published>2015-06-19T14:00:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0fb1170ee68a6aa14eca0666e02c4b62cbf1251d'/>
<id>0fb1170ee68a6aa14eca0666e02c4b62cbf1251d</id>
<content type='text'>
BPF offers another way to generate latency histograms. We attach
kprobes at trace_preempt_off and trace_preempt_on and calculate the
time elapsed between the off and on transitions.

The first array stores the start time stamp; the key is the CPU id.
The second array stores the histogram of log2(time diff). We need to
use static allocation here (arrays, not hash tables): the kprobes
hooking into trace_preempt_on|off must not call any dynamic memory
allocation or free path, since we need to avoid being called
recursively. Besides that, static allocation reduces jitter in the
measurement.
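
The two-array bookkeeping can be sketched in plain C as below. This is a
userspace model, not the BPF program from the sample: MAX_CPUS, MAX_SLOTS,
on_preempt_off() and on_preempt_on() are hypothetical names, timestamps are
assumed to be in nanoseconds, and a start time of 0 is assumed to mean "no
off transition recorded". Both arrays are statically sized, so the hot path
never allocates.

```c
#include "stdint.h"

/* Hypothetical sizes, chosen only for this sketch. */
#define MAX_CPUS  4
#define MAX_SLOTS 64

/* First array: start time stamp (ns), keyed by CPU id; 0 means unset. */
static uint64_t start_ts[MAX_CPUS];

/* Second array: per-CPU counts, indexed by log2(time diff). */
static uint64_t hist[MAX_CPUS][MAX_SLOTS];

/* floor(log2(v)) for v >= 1; slot k then covers [2^k, 2^(k+1) - 1]. */
static unsigned int log2_floor(uint64_t v)
{
	unsigned int r = 0;

	while (v > 1) {
		v >>= 1;
		r++;
	}
	return r;
}

/* Preemption disabled on this CPU: remember when. */
static void on_preempt_off(unsigned int cpu, uint64_t now_ns)
{
	if (cpu >= MAX_CPUS)
		return;
	start_ts[cpu] = now_ns;
}

/* Preemption enabled again: account the elapsed time. */
static void on_preempt_on(unsigned int cpu, uint64_t now_ns)
{
	uint64_t start;

	if (cpu >= MAX_CPUS)
		return;
	start = start_ts[cpu];
	if (start == 0 || start > now_ns)
		return;		/* no matching off transition seen */
	hist[cpu][log2_floor(now_ns - start)]++;
}
```

Calling on_preempt_off(0, 1000) and then on_preempt_on(0, 4000) records a
3000 ns diff in slot 11, i.e. the 2048 -&gt; 4095 row of the histograms below.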

CPU 0
      latency        : count     distribution
       1 -&gt; 1        : 0        |                                        |
       2 -&gt; 3        : 0        |                                        |
       4 -&gt; 7        : 0        |                                        |
       8 -&gt; 15       : 0        |                                        |
      16 -&gt; 31       : 0        |                                        |
      32 -&gt; 63       : 0        |                                        |
      64 -&gt; 127      : 0        |                                        |
     128 -&gt; 255      : 0        |                                        |
     256 -&gt; 511      : 0        |                                        |
     512 -&gt; 1023     : 0        |                                        |
    1024 -&gt; 2047     : 0        |                                        |
    2048 -&gt; 4095     : 166723   |*************************************** |
    4096 -&gt; 8191     : 19870    |***                                     |
    8192 -&gt; 16383    : 6324     |                                        |
   16384 -&gt; 32767    : 1098     |                                        |
   32768 -&gt; 65535    : 190      |                                        |
   65536 -&gt; 131071   : 179      |                                        |
  131072 -&gt; 262143   : 18       |                                        |
  262144 -&gt; 524287   : 4        |                                        |
  524288 -&gt; 1048575  : 1363     |                                        |
CPU 1
      latency        : count     distribution
       1 -&gt; 1        : 0        |                                        |
       2 -&gt; 3        : 0        |                                        |
       4 -&gt; 7        : 0        |                                        |
       8 -&gt; 15       : 0        |                                        |
      16 -&gt; 31       : 0        |                                        |
      32 -&gt; 63       : 0        |                                        |
      64 -&gt; 127      : 0        |                                        |
     128 -&gt; 255      : 0        |                                        |
     256 -&gt; 511      : 0        |                                        |
     512 -&gt; 1023     : 0        |                                        |
    1024 -&gt; 2047     : 0        |                                        |
    2048 -&gt; 4095     : 114042   |*************************************** |
    4096 -&gt; 8191     : 9587     |**                                      |
    8192 -&gt; 16383    : 4140     |                                        |
   16384 -&gt; 32767    : 673      |                                        |
   32768 -&gt; 65535    : 179      |                                        |
   65536 -&gt; 131071   : 29       |                                        |
  131072 -&gt; 262143   : 4        |                                        |
  262144 -&gt; 524287   : 1        |                                        |
  524288 -&gt; 1048575  : 364      |                                        |
CPU 2
      latency        : count     distribution
       1 -&gt; 1        : 0        |                                        |
       2 -&gt; 3        : 0        |                                        |
       4 -&gt; 7        : 0        |                                        |
       8 -&gt; 15       : 0        |                                        |
      16 -&gt; 31       : 0        |                                        |
      32 -&gt; 63       : 0        |                                        |
      64 -&gt; 127      : 0        |                                        |
     128 -&gt; 255      : 0        |                                        |
     256 -&gt; 511      : 0        |                                        |
     512 -&gt; 1023     : 0        |                                        |
    1024 -&gt; 2047     : 0        |                                        |
    2048 -&gt; 4095     : 40147    |*************************************** |
    4096 -&gt; 8191     : 2300     |*                                       |
    8192 -&gt; 16383    : 828      |                                        |
   16384 -&gt; 32767    : 178      |                                        |
   32768 -&gt; 65535    : 59       |                                        |
   65536 -&gt; 131071   : 2        |                                        |
  131072 -&gt; 262143   : 0        |                                        |
  262144 -&gt; 524287   : 1        |                                        |
  524288 -&gt; 1048575  : 174      |                                        |
CPU 3
      latency        : count     distribution
       1 -&gt; 1        : 0        |                                        |
       2 -&gt; 3        : 0        |                                        |
       4 -&gt; 7        : 0        |                                        |
       8 -&gt; 15       : 0        |                                        |
      16 -&gt; 31       : 0        |                                        |
      32 -&gt; 63       : 0        |                                        |
      64 -&gt; 127      : 0        |                                        |
     128 -&gt; 255      : 0        |                                        |
     256 -&gt; 511      : 0        |                                        |
     512 -&gt; 1023     : 0        |                                        |
    1024 -&gt; 2047     : 0        |                                        |
    2048 -&gt; 4095     : 29626    |*************************************** |
    4096 -&gt; 8191     : 2704     |**                                      |
    8192 -&gt; 16383    : 1090     |                                        |
   16384 -&gt; 32767    : 160      |                                        |
   32768 -&gt; 65535    : 72       |                                        |
   65536 -&gt; 131071   : 32       |                                        |
  131072 -&gt; 262143   : 26       |                                        |
  262144 -&gt; 524287   : 12       |                                        |
  524288 -&gt; 1048575  : 298      |                                        |

All this is based on the tracex3 example written by
Alexei Starovoitov &lt;ast@plumgrid.com&gt;.

Signed-off-by: Daniel Wagner &lt;daniel.wagner@bmw-carit.de&gt;
Cc: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: introduce current-&gt;pid, tgid, uid, gid, comm accessors</title>
<updated>2015-06-15T22:53:50+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-06-13T02:39:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=ffeedafbf0236f03aeb2e8db273b3e5ae5f5bc89'/>
<id>ffeedafbf0236f03aeb2e8db273b3e5ae5f5bc89</id>
<content type='text'>
eBPF programs attached to kprobes need to filter based on
current-&gt;pid, uid and other fields, so introduce helper functions:

u64 bpf_get_current_pid_tgid(void)
Return: current-&gt;tgid &lt;&lt; 32 | current-&gt;pid

u64 bpf_get_current_uid_gid(void)
Return: current_gid &lt;&lt; 32 | current_uid

bpf_get_current_comm(char *buf, int size_of_buf)
stores current-&gt;comm into buf
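
Since both u64 helpers pack two 32-bit values into one u64, a caller splits
them with a shift and a truncating cast. A minimal sketch; the function names
are hypothetical and not part of any kernel API.

```c
#include "stdint.h"

/* Upper 32 bits of bpf_get_current_pid_tgid(): the tgid. */
static uint32_t pid_tgid_to_tgid(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Lower 32 bits of bpf_get_current_pid_tgid(): the pid (thread id). */
static uint32_t pid_tgid_to_pid(uint64_t v)
{
	return (uint32_t)v;
}

/* Same layout for bpf_get_current_uid_gid(): gid high, uid low. */
static uint32_t uid_gid_to_gid(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

static uint32_t uid_gid_to_uid(uint64_t v)
{
	return (uint32_t)v;
}
```

For example, 0x0000053900000172 decodes to tgid 1337 and pid 370.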

They can also be used by programs attached to TC to classify packets
based on fields of the current task.

Update the tracex2 example to print a histogram of write syscalls for
each process instead of one aggregated over all processes.

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: allow programs to write to certain skb fields</title>
<updated>2015-06-07T09:01:33+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-06-04T17:11:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d691f9e8d4405c334aa10d556e73c8bf44cb0e01'/>
<id>d691f9e8d4405c334aa10d556e73c8bf44cb0e01</id>
<content type='text'>
Allow programs to read/write the skb-&gt;mark and tc_index fields and
((struct qdisc_skb_cb *)cb)-&gt;data.

mark and tc_index are generically useful in TC.
cb[0]-cb[4] are primarily used to pass arguments from one
program to another called via bpf_tail_call(), as can
be seen in the sockex3_kern.c example.

All fields of 'struct __sk_buff' are readable by socket and tc_cls_act progs.
mark and tc_index are writable from tc_cls_act only.
cb[0]-cb[4] are writable by both sockets and tc_cls_act.
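
The access rules above amount to a small permission table. The sketch below
models only the write side in plain C; the enum names are hypothetical, and
the kernel enforces these rules in the verifier rather than with code like
this.

```c
#include "stdbool.h"

/* Hypothetical names for the two program types and a few
 * struct __sk_buff fields; not kernel identifiers. */
enum prog_kind { PROG_SOCKET_FILTER, PROG_TC_CLS_ACT };
enum skb_field {
	FIELD_LEN,		/* stands in for any read-only field */
	FIELD_MARK,
	FIELD_TC_INDEX,
	FIELD_CB0, FIELD_CB1, FIELD_CB2, FIELD_CB3, FIELD_CB4,
};

/* Every field is readable by both kinds; this only decides writes. */
static bool can_write(enum prog_kind kind, enum skb_field f)
{
	if (f >= FIELD_CB0)			/* cb[0]-cb[4]: both kinds */
		return true;
	if (f == FIELD_MARK || f == FIELD_TC_INDEX)
		return kind == PROG_TC_CLS_ACT;	/* tc_cls_act only */
	return false;				/* everything else: read-only */
}
```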

Add verifier tests and improve sample code.

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: make programs see skb-&gt;data == L2 for ingress and egress</title>
<updated>2015-06-07T09:01:33+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-06-04T17:11:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3431205e03977aaf32bce6d4b16fb8244b510056'/>
<id>3431205e03977aaf32bce6d4b16fb8244b510056</id>
<content type='text'>
eBPF programs attached to ingress and egress qdiscs see inconsistent skb-&gt;data.
For ingress the L2 header is already pulled, whereas for egress it is present.
Program writers know this and are currently forced to use the
BPF_LL_OFF workaround.
Since programs don't change skb internal pointers, it is safe to do the
pull/push right around the invocation of the program; earlier taps and
later pt-&gt;func() will not be affected.
Multiple taps via packet_rcv() and tpacket_rcv() already do the same trick
around run_filter/BPF_PROG_RUN, even when skb_shared() is true.
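
A toy model of the idea: on ingress, skb-&gt;data is moved back by the L2
header length only for the duration of the program run and restored
afterwards, so the program always sees offset 0 at the L2 header.
struct model_skb, run_at_l2() and prog_report_offset() are hypothetical; the
comments only note which kernel helpers the two adjustments roughly mirror.

```c
#include "stdbool.h"

/* Hypothetical, simplified skb: offsets are relative to the L2 header. */
struct model_skb {
	unsigned int data_off;	/* where skb->data currently points */
	unsigned int mac_len;	/* length of the L2 header */
};

typedef unsigned int (*model_prog_t)(const struct model_skb *skb);

/* Stand-in program: reports the data offset it observes. */
static unsigned int prog_report_offset(const struct model_skb *skb)
{
	return skb->data_off;
}

/* Run prog so that it always sees skb->data at the L2 header. */
static unsigned int run_at_l2(struct model_skb *skb, bool ingress,
			      model_prog_t prog)
{
	unsigned int ret;

	if (ingress)
		skb->data_off -= skb->mac_len;	/* roughly __skb_push() */
	ret = prog(skb);
	if (ingress)
		skb->data_off += skb->mac_len;	/* roughly __skb_pull() */
	return ret;
}
```

Because the offset is restored right after the run, anything that looks at
the skb before or after the program sees it unchanged.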

This fix finally allows programs to use optimized LD_ABS/IND instructions
without BPF_LL_OFF for higher performance.
tc ingress + cls_bpf + samples/bpf/tcbpf1_kern.o, in Mpps:
        w/o JIT   w/ JIT
before     20.5     23.6
after      21.8     26.6

Old programs with BPF_LL_OFF will still work as-is.

We can now undo most of the earlier workaround commit:
a166151cbe33 ("bpf: fix bpf helpers to use skb-&gt;mac_header relative offsets")

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Acked-by: Jamal Hadi Salim &lt;jhs@mojatatu.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>samples/bpf: bpf_tail_call example for networking</title>
<updated>2015-05-21T21:07:59+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-05-19T23:59:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=530b2c8619f25f9c332c85510579943aa46df515'/>
<id>530b2c8619f25f9c332c85510579943aa46df515</id>
<content type='text'>
Usage:
$ sudo ./sockex3
IP     src.port -&gt; dst.port               bytes      packets
127.0.0.1.42010 -&gt; 127.0.0.1.12865         1568            8
127.0.0.1.59526 -&gt; 127.0.0.1.33778     11422636       173070
127.0.0.1.33778 -&gt; 127.0.0.1.59526  11260224828       341974
127.0.0.1.12865 -&gt; 127.0.0.1.42010         1832           12
IP     src.port -&gt; dst.port               bytes      packets
127.0.0.1.42010 -&gt; 127.0.0.1.12865         1568            8
127.0.0.1.59526 -&gt; 127.0.0.1.33778     23198092       351486
127.0.0.1.33778 -&gt; 127.0.0.1.59526  22972698518       698616
127.0.0.1.12865 -&gt; 127.0.0.1.42010         1832           12

This example is similar to sockex2 in that it accumulates per-flow
statistics, but it parses packets differently.
sockex2 inlines the full packet parser routine into a single bpf program.
This sockex3 example has 4 independent programs that parse vlan, mpls, ip
and ipv6, and one main program that starts the process.
The bpf_tail_call() mechanism allows each program to be small and to be called
on demand, potentially multiple times, so that many vlan, mpls, ip-in-ip and
gre encapsulations can be parsed. These and other protocol parsers can
be added or removed at runtime. TLVs can be parsed in a similar manner.
Note that the dynamic tail_call_cnt check limits the number of tail calls to 32.
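
The dispatch pattern can be modelled in userspace as a table of small parser
functions indexed by the next header kind, with a counter standing in for
tail_call_cnt. Everything here is hypothetical and simplified (each parser
just consumes one header); real programs call bpf_tail_call() on a
BPF_MAP_TYPE_PROG_ARRAY map. In this model an empty slot or an exhausted
budget simply ends parsing with -1.

```c
#include "stddef.h"

/* Limit quoted in the commit message; the model stops at this many calls. */
#define MAX_TAIL_CALLS 32

/* Hypothetical program-array slots, one per parser. */
enum { PARSE_VLAN, PARSE_MPLS, PARSE_IP, PARSE_IPV6, PARSE_DONE, NR_SLOTS };

struct pkt {
	const int *hdrs;	/* header kinds in order, ending in PARSE_DONE */
	size_t pos;		/* index of the header being parsed */
	unsigned int parsed;	/* headers consumed so far */
};

typedef int (*prog_fn)(struct pkt *p);

/* Each parser consumes one header and returns the slot of the next one;
 * real parsers would decode vlan/mpls/ip/ipv6 headers here. */
static int parse_one(struct pkt *p)
{
	p->parsed++;
	return p->hdrs[++p->pos];
}

/* Stand-in for the program array; slots could be swapped at runtime. */
static prog_fn jmp_table[NR_SLOTS] = {
	[PARSE_VLAN] = parse_one,
	[PARSE_MPLS] = parse_one,
	[PARSE_IP]   = parse_one,
	[PARSE_IPV6] = parse_one,
};

/* Main program: keep "tail-calling" the next parser until done.
 * Returns 0 on completion, -1 on an empty slot or when the limit hits. */
static int run_main_prog(struct pkt *p)
{
	int slot = p->hdrs[p->pos];
	unsigned int calls = 0;

	while (slot != PARSE_DONE) {
		if (calls == MAX_TAIL_CALLS)
			return -1;	/* tail call budget exhausted */
		if (0 > slot || slot >= NR_SLOTS || jmp_table[slot] == NULL)
			return -1;	/* nothing loaded in this slot */
		calls++;
		slot = jmp_table[slot](p);
	}
	return 0;
}
```

With the header sequence vlan, mpls, ip the model makes three calls and
returns 0; with 40 stacked vlan headers it stops after 32 calls and returns -1.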

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>samples/bpf: bpf_tail_call example for tracing</title>
<updated>2015-05-21T21:07:59+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-05-19T23:59:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=5bacd7805ab4f07a69c7ef4b1d45ce553d2b1c3a'/>
<id>5bacd7805ab4f07a69c7ef4b1d45ce553d2b1c3a</id>
<content type='text'>
A kprobe example that demonstrates what future seccomp programs may look like.
It attaches to seccomp_phase1() function and tail-calls other BPF programs
depending on syscall number.

Existing optimized classic BPF seccomp programs generated by Chrome look like this:
if (sd.nr &lt; 121) {
  if (sd.nr &lt; 57) {
    if (sd.nr &lt; 22) {
      if (sd.nr &lt; 7) {
        if (sd.nr &lt; 4) {
          if (sd.nr &lt; 1) {
            check sys_read
          } else {
            if (sd.nr &lt; 3) {
              check sys_write and sys_open
            } else {
              check sys_close
            }
          }
        } else {
          ...
        }
      } else {
        ...
      }
    } else {
      ...
    }
  } else {
    ...
  }
} else {
  ...
}

The future seccomp, using native eBPF, may look like this:
  bpf_tail_call(&amp;sd, &amp;syscall_jmp_table, sd.nr);
which is simpler, faster and leaves more room for per-syscall checks.
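
The difference can be shown with a plain C model of the jump table: one
bounds check plus one indexed lookup replaces the comparison tree.
struct seccomp_data_model, the verdict values and the per-syscall policies
are hypothetical; the syscall numbers follow the x86_64 numbering that the
Chrome tree above appears to use (read = 0, close = 3).

```c
#include "stddef.h"

/* Hypothetical verdicts, for illustration only. */
enum verdict { V_FALLTHROUGH = -1, V_ALLOW = 0, V_DENY = 1 };

/* Hypothetical, trimmed-down view of the syscall being checked. */
struct seccomp_data_model {
	int nr;				/* syscall number */
	unsigned long long args[6];	/* syscall arguments */
};

typedef enum verdict (*check_fn)(const struct seccomp_data_model *sd);

/* Illustrative policy: only allow reads from fd 0. */
static enum verdict check_read(const struct seccomp_data_model *sd)
{
	return sd->args[0] == 0 ? V_ALLOW : V_DENY;
}

static enum verdict check_close(const struct seccomp_data_model *sd)
{
	(void)sd;
	return V_ALLOW;
}

#define NR_SYSCALLS_MODEL 512

/* Index = syscall number; empty slots have no per-syscall check. */
static const check_fn syscall_jmp_table[NR_SYSCALLS_MODEL] = {
	[0] = check_read,
	[3] = check_close,
};

/* One indexed lookup replaces the nested comparison tree. */
static enum verdict dispatch(const struct seccomp_data_model *sd)
{
	if (0 > sd->nr || sd->nr >= NR_SYSCALLS_MODEL ||
	    syscall_jmp_table[sd->nr] == NULL)
		return V_FALLTHROUGH;	/* like a tail call into an empty slot */
	return syscall_jmp_table[sd->nr](sd);
}
```

Here nr 0 and nr 3 reach their checks, while nr 110 (getppid on x86_64) has
no entry and falls through, much like a bpf_tail_call() into an empty slot.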

Usage:
$ sudo ./tracex5
&lt;...&gt;-366   [001] d...     4.870033: : read(fd=1, buf=00007f6d5bebf000, size=771)
&lt;...&gt;-369   [003] d...     4.870066: : mmap
&lt;...&gt;-369   [003] d...     4.870077: : syscall=110 (one of get/set uid/pid/gid)
&lt;...&gt;-369   [003] d...     4.870089: : syscall=107 (one of get/set uid/pid/gid)
   sh-369   [000] d...     4.891740: : read(fd=0, buf=00000000023d1000, size=512)
   sh-369   [000] d...     4.891747: : write(fd=1, buf=00000000023d3000, size=512)
   sh-369   [000] d...     4.891747: : read(fd=1, buf=00000000023d3000, size=512)

Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>samples/bpf: fix in-source build of samples with clang</title>
<updated>2015-05-13T03:15:25+00:00</updated>
<author>
<name>Brenden Blanco</name>
<email>bblanco@plumgrid.com</email>
</author>
<published>2015-05-12T04:25:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b88c06e36dcb9b4ae285f7821f62d68dc34b25d3'/>
<id>b88c06e36dcb9b4ae285f7821f62d68dc34b25d3</id>
<content type='text'>
The in-source build of 'make samples/bpf/' was incorrectly
using the default compiler instead of invoking clang/llvm.
The out-of-source build was OK.

Fixes: a80857822b0c ("samples: bpf: trivial eBPF program in C")
Signed-off-by: Brenden Blanco &lt;bblanco@plumgrid.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>bpf: fix two bugs in verification logic when accessing 'ctx' pointer</title>
<updated>2015-04-16T18:08:49+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-04-15T23:19:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=725f9dcd58dedfea49ef958babf6c0bf6b7594a9'/>
<id>725f9dcd58dedfea49ef958babf6c0bf6b7594a9</id>
<content type='text'>
1.
The first bug is a silly mistake. It broke the tracing examples and prevented
simple bpf programs from loading.

In the following code:
if (insn-&gt;imm == 0 &amp;&amp; BPF_SIZE(insn-&gt;code) == BPF_W) {
} else if (...) {
  // this part should have been executed when
  // insn-&gt;code == BPF_W and insn-&gt;imm != 0
}

Obviously it's not doing that. So simple instructions like:
r2 = *(u64 *)(r1 + 8)
will be rejected. Note that the comments in the code around these branches
were, and still are, valid and indicate the true intent.

Replace it with:
if (BPF_SIZE(insn-&gt;code) != BPF_W)
  continue;

if (insn-&gt;imm == 0) {
} else if (...) {
  // now this code will be executed when
  // insn-&gt;code == BPF_W and insn-&gt;imm != 0
}
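
<![CDATA[The control flow of the first bug can be modelled in isolation. The sketch
below is hypothetical. old_path(), new_path() and the enums are illustrative
names, not verifier code. The elided "(...)" condition is treated as always
true, so the sketch only shows which instructions can reach the else branch.
It also assumes that an ordinary load such as r2 = *(u64 *)(r1 + 8) carries
imm == 0.

```c
#include <assert.h>

/* Hypothetical stand-ins for BPF_SIZE(insn->code); only W vs. non-W matters. */
enum size { SIZE_B, SIZE_H, SIZE_W, SIZE_DW };

/* Which part of the code an instruction ends up in. */
enum path { PATH_SKIP, PATH_IMM_ZERO, PATH_ELSE };

/* Old flow: everything except (imm == 0 && W) falls into the else branch.
 * The elided else-if condition is assumed to be true here. */
static enum path old_path(enum size sz, int imm)
{
	if (imm == 0 && sz == SIZE_W)
		return PATH_IMM_ZERO;
	return PATH_ELSE;
}

/* Fixed flow: non-W accesses are skipped before imm is looked at. */
static enum path new_path(enum size sz, int imm)
{
	if (sz != SIZE_W)
		return PATH_SKIP; /* corresponds to 'continue' */
	if (imm == 0)
		return PATH_IMM_ZERO;
	return PATH_ELSE;
}

/* The cases discussed above, expressed as checks. */
static void check_examples(void)
{
	/* 64-bit load with imm == 0: wrongly reached else before the fix */
	assert(old_path(SIZE_DW, 0) == PATH_ELSE);
	assert(new_path(SIZE_DW, 0) == PATH_SKIP);
	/* W access with imm != 0 reaches else in both versions */
	assert(old_path(SIZE_W, 1) == PATH_ELSE);
	assert(new_path(SIZE_W, 1) == PATH_ELSE);
}
```

Calling check_examples() from a test main() exercises both versions. After
the fix, only BPF_W accesses with a non-zero imm can reach the else branch.]]>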

2.
second bug is more subtle.
If malicious code uses the same register as both source and destination,
the checks designed to prevent the same instruction from being used with
different pointer types will fail to trigger, since we were reading
src_reg_type after check_mem_access() had already overwritten it.
The fix is trivial. Just move line:
src_reg_type = regs[insn-&gt;src_reg].type;
before check_mem_access().
Add new 'access skb fields bad4' test to check this case.
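
<![CDATA[The following hypothetical model shows why reusing one register hides the
pointer type. model_check_mem_access(), src_type_buggy() and
src_type_fixed() are illustrative names, not kernel functions. The model
assumes that a load marks its destination register as UNKNOWN_VALUE. When
dst == src, reading regs[src].type after the load returns the type of the
loaded value rather than the pointer type the instruction was checked
against.

```c
#include <assert.h>

/* Hypothetical subset of verifier register types. */
enum reg_type { NOT_INIT, UNKNOWN_VALUE, PTR_TO_CTX };

struct reg_state {
	enum reg_type type;
};

/* Hypothetical stand-in for check_mem_access() on a load: the
 * destination register now holds an unknown scalar value. */
static void model_check_mem_access(struct reg_state *regs, int dst)
{
	regs[dst].type = UNKNOWN_VALUE;
}

/* Buggy order: the source type is read after the load updated dst. */
static enum reg_type src_type_buggy(struct reg_state *regs, int dst, int src)
{
	model_check_mem_access(regs, dst);
	return regs[src].type;
}

/* Fixed order: the source type is captured before the access check. */
static enum reg_type src_type_fixed(struct reg_state *regs, int dst, int src)
{
	enum reg_type src_reg_type = regs[src].type;

	model_check_mem_access(regs, dst);
	return src_reg_type;
}
```

Suppose r1 holds PTR_TO_CTX. Then src_type_buggy(regs, 1, 1) yields
UNKNOWN_VALUE, while src_type_fixed(regs, 1, 1) yields PTR_TO_CTX. With
distinct source and destination registers, both orders agree.]]>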

Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields")
Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
1.
first bug is a silly mistake. It broke tracing examples and prevented
simple bpf programs from loading.

In the following code:
if (insn-&gt;imm == 0 &amp;&amp; BPF_SIZE(insn-&gt;code) == BPF_W) {
} else if (...) {
  // this part should have been executed when
  // insn-&gt;code == BPF_W and insn-&gt;imm != 0
}

Obviously it's not doing that. So simple instructions like:
r2 = *(u64 *)(r1 + 8)
will be rejected. Note that the comments in the code around these branches
were, and still are, valid and indicate the true intent.

Replace it with:
if (BPF_SIZE(insn-&gt;code) != BPF_W)
  continue;

if (insn-&gt;imm == 0) {
} else if (...) {
  // now this code will be executed when
  // insn-&gt;code == BPF_W and insn-&gt;imm != 0
}

2.
second bug is more subtle.
If malicious code uses the same register as both source and destination,
the checks designed to prevent the same instruction from being used with
different pointer types will fail to trigger, since we were reading
src_reg_type after check_mem_access() had already overwritten it.
The fix is trivial. Just move line:
src_reg_type = regs[insn-&gt;src_reg].type;
before check_mem_access().
Add new 'access skb fields bad4' test to check this case.

Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields")
Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: fix bpf helpers to use skb-&gt;mac_header relative offsets</title>
<updated>2015-04-16T18:08:49+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@plumgrid.com</email>
</author>
<published>2015-04-15T19:55:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a166151cbe33b53221c24259e4a7201064b3ba79'/>
<id>a166151cbe33b53221c24259e4a7201064b3ba79</id>
<content type='text'>
As a short-term solution, let's fix the bpf helper functions to use
skb-&gt;mac_header relative offsets instead of skb-&gt;data, so that the
same eBPF programs work with cls_bpf and act_bpf on both the ingress
and egress qdisc paths. We need to ensure that mac_header is set
before calling into programs. This is effectively the first option
from the discussion referenced below.

A longer-term solution for LD_ABS|LD_IND instructions will be more
intrusive but also more beneficial than this. It will be implemented
later, as it's too risky at this point in time.

I.e., we plan to look into the option of moving skb_pull() out of
eth_type_trans() and into netif_receive_skb(), as suggested in the
second option. Meanwhile, this solution ensures ingress can be
used with eBPF, too, and that we won't run into ABI troubles later.
For dealing with negative offsets inside eBPF helper functions,
we've implemented bpf_skb_clone_unwritable() to test for unwritable
headers.
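
<![CDATA[The hypothetical model below shows why anchoring offsets at the MAC header
makes them path-independent. struct model_skb, at_data_offset() and
at_mac_offset() are illustrative stand-ins, not struct sk_buff or the real
helpers. ETH_HLEN is defined locally as 14 bytes. On egress, skb->data
points at the MAC header. On ingress, eth_type_trans() has already pulled
the Ethernet header, so skb->data sits ETH_HLEN bytes further into the
packet.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14 /* Ethernet header length, defined locally here */

/* Hypothetical, heavily simplified skb: just enough fields to show
 * where helper offsets are anchored. Not the real struct sk_buff. */
struct model_skb {
	unsigned char *head;
	size_t mac_header;   /* offset of the MAC header from head */
	unsigned char *data; /* current data pointer */
};

/* Old behaviour: helper offsets counted from skb->data. */
static unsigned char *at_data_offset(const struct model_skb *skb, int off)
{
	return skb->data + off;
}

/* New behaviour: helper offsets counted from the MAC header. */
static unsigned char *at_mac_offset(const struct model_skb *skb, int off)
{
	return skb->head + skb->mac_header + off;
}
```

In this model, at_mac_offset(skb, 12) (the EtherType field) resolves to the
same byte on both paths. By contrast, at_data_offset(skb, 12) lands
ETH_HLEN bytes further into the packet on ingress.]]>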

Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694
Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers")
Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
As a short-term solution, let's fix the bpf helper functions to use
skb-&gt;mac_header relative offsets instead of skb-&gt;data, so that the
same eBPF programs work with cls_bpf and act_bpf on both the ingress
and egress qdisc paths. We need to ensure that mac_header is set
before calling into programs. This is effectively the first option
from the discussion referenced below.

A longer-term solution for LD_ABS|LD_IND instructions will be more
intrusive but also more beneficial than this. It will be implemented
later, as it's too risky at this point in time.

I.e., we plan to look into the option of moving skb_pull() out of
eth_type_trans() and into netif_receive_skb(), as suggested in the
second option. Meanwhile, this solution ensures ingress can be
used with eBPF, too, and that we won't run into ABI troubles later.
For dealing with negative offsets inside eBPF helper functions,
we've implemented bpf_skb_clone_unwritable() to test for unwritable
headers.

Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694
Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers")
Signed-off-by: Alexei Starovoitov &lt;ast@plumgrid.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next</title>
<updated>2015-04-15T16:00:47+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2015-04-15T16:00:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6c373ca89399c5a3f7ef210ad8f63dc3437da345'/>
<id>6c373ca89399c5a3f7ef210ad8f63dc3437da345</id>
<content type='text'>
Pull networking updates from David Miller:

 1) Add BQL support to via-rhine, from Tino Reichardt.

 2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers
    can support hw switch offloading.  From Florian Fainelli.

 3) Allow 'ip address' commands to initiate multicast group join/leave,
    from Madhu Challa.

 4) Many ipv4 FIB lookup optimizations from Alexander Duyck.

 5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel
    Borkmann.

 6) Remove the ugly compat support in ARP for ugly layers like ax25,
    rose, etc.  And use this to clean up the neigh layer, then use it to
    implement MPLS support.  All from Eric Biederman.

 7) Support L3 forwarding offloading in switches, from Scott Feldman.

 8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed
    up route lookups even further.  From Alexander Duyck.

 9) Many improvements and bug fixes to the rhashtable implementation,
    from Herbert Xu and Thomas Graf.  In particular, in the case where
    an rhashtable user bulk adds a large number of items into an empty
    table, we expand the table much more sanely.

10) Don't make the tcp_metrics hash table per-namespace, from Eric
    Biederman.

11) Extend EBPF to access SKB fields, from Alexei Starovoitov.

12) Split out new connection request sockets so that they can be
    established in the main hash table.  Much less false sharing since
    hash lookups go direct to the request sockets instead of having to
    go first to the listener then to the request socks hashed
    underneath.  From Eric Dumazet.

13) Add async I/O support for crypto AF_ALG sockets, from Tadeusz Struk.

14) Support stable privacy address generation for RFC7217 in IPV6.  From
    Hannes Frederic Sowa.

15) Hash network namespace into IP frag IDs, also from Hannes Frederic
    Sowa.

16) Convert PTP get/set methods to use 64-bit time, from Richard
    Cochran.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits)
  fm10k: Bump driver version to 0.15.2
  fm10k: corrected VF multicast update
  fm10k: mbx_update_max_size does not drop all oversized messages
  fm10k: reset head instead of calling update_max_size
  fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
  fm10k: update xcast mode before synchronizing multicast addresses
  fm10k: start service timer on probe
  fm10k: fix function header comment
  fm10k: comment next_vf_mbx flow
  fm10k: don't handle mailbox events in iov_event path and always process mailbox
  fm10k: use separate workqueue for fm10k driver
  fm10k: Set PF queues to unlimited bandwidth during virtualization
  fm10k: expose tx_timeout_count as an ethtool stat
  fm10k: only increment tx_timeout_count in Tx hang path
  fm10k: remove extraneous "Reset interface" message
  fm10k: separate PF only stats so that VF does not display them
  fm10k: use hw-&gt;mac.max_queues for stats
  fm10k: only show actual queues, not the maximum in hardware
  fm10k: allow creation of VLAN on default vid
  fm10k: fix unused warnings
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull networking updates from David Miller:

 1) Add BQL support to via-rhine, from Tino Reichardt.

 2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers
    can support hw switch offloading.  From Florian Fainelli.

 3) Allow 'ip address' commands to initiate multicast group join/leave,
    from Madhu Challa.

 4) Many ipv4 FIB lookup optimizations from Alexander Duyck.

 5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel
    Borkmann.

 6) Remove the ugly compat support in ARP for ugly layers like ax25,
    rose, etc.  And use this to clean up the neigh layer, then use it to
    implement MPLS support.  All from Eric Biederman.

 7) Support L3 forwarding offloading in switches, from Scott Feldman.

 8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed
    up route lookups even further.  From Alexander Duyck.

 9) Many improvements and bug fixes to the rhashtable implementation,
    from Herbert Xu and Thomas Graf.  In particular, in the case where
    an rhashtable user bulk adds a large number of items into an empty
    table, we expand the table much more sanely.

10) Don't make the tcp_metrics hash table per-namespace, from Eric
    Biederman.

11) Extend EBPF to access SKB fields, from Alexei Starovoitov.

12) Split out new connection request sockets so that they can be
    established in the main hash table.  Much less false sharing since
    hash lookups go direct to the request sockets instead of having to
    go first to the listener then to the request socks hashed
    underneath.  From Eric Dumazet.

13) Add async I/O support for crypto AF_ALG sockets, from Tadeusz Struk.

14) Support stable privacy address generation for RFC7217 in IPV6.  From
    Hannes Frederic Sowa.

15) Hash network namespace into IP frag IDs, also from Hannes Frederic
    Sowa.

16) Convert PTP get/set methods to use 64-bit time, from Richard
    Cochran.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits)
  fm10k: Bump driver version to 0.15.2
  fm10k: corrected VF multicast update
  fm10k: mbx_update_max_size does not drop all oversized messages
  fm10k: reset head instead of calling update_max_size
  fm10k: renamed mbx_tx_dropped to mbx_tx_oversized
  fm10k: update xcast mode before synchronizing multicast addresses
  fm10k: start service timer on probe
  fm10k: fix function header comment
  fm10k: comment next_vf_mbx flow
  fm10k: don't handle mailbox events in iov_event path and always process mailbox
  fm10k: use separate workqueue for fm10k driver
  fm10k: Set PF queues to unlimited bandwidth during virtualization
  fm10k: expose tx_timeout_count as an ethtool stat
  fm10k: only increment tx_timeout_count in Tx hang path
  fm10k: remove extraneous "Reset interface" message
  fm10k: separate PF only stats so that VF does not display them
  fm10k: use hw-&gt;mac.max_queues for stats
  fm10k: only show actual queues, not the maximum in hardware
  fm10k: allow creation of VLAN on default vid
  fm10k: fix unused warnings
  ...
</pre>
</div>
</content>
</entry>
</feed>
