<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/trace/bpf_trace.c, branch v5.4.239</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>bpf: Skip task with pid=1 in send_signal_common()</title>
<updated>2023-02-06T06:52:48+00:00</updated>
<author>
<name>Hao Sun</name>
<email>sunhao.th@gmail.com</email>
</author>
<published>2023-01-06T08:48:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=4923160393b06a34759a11b17930d71e06f396f2'/>
<id>4923160393b06a34759a11b17930d71e06f396f2</id>
<content type='text'>
[ Upstream commit a3d81bc1eaef48e34dd0b9b48eefed9e02a06451 ]

The following kernel panic can be triggered when a task with pid=1 attaches
a prog that attempts to send killing signal to itself, also see [1] for more
details:

  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  CPU: 3 PID: 1 Comm: systemd Not tainted 6.1.0-09652-g59fe41b5255f #148
  Call Trace:
  &lt;TASK&gt;
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x100/0x178 lib/dump_stack.c:106
  panic+0x2c4/0x60f kernel/panic.c:275
  do_exit.cold+0x63/0xe4 kernel/exit.c:789
  do_group_exit+0xd4/0x2a0 kernel/exit.c:950
  get_signal+0x2460/0x2600 kernel/signal.c:2858
  arch_do_signal_or_restart+0x78/0x5d0 arch/x86/kernel/signal.c:306
  exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
  exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
  __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
  syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
  do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

So skip task with pid=1 in bpf_send_signal_common() to avoid the panic.

  [1] https://lore.kernel.org/bpf/20221222043507.33037-1-sunhao.th@gmail.com

Signed-off-by: Hao Sun &lt;sunhao.th@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: https://lore.kernel.org/bpf/20230106084838.12690-1-sunhao.th@gmail.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit a3d81bc1eaef48e34dd0b9b48eefed9e02a06451 ]

The following kernel panic can be triggered when a task with pid=1 attaches
a prog that attempts to send killing signal to itself, also see [1] for more
details:

  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  CPU: 3 PID: 1 Comm: systemd Not tainted 6.1.0-09652-g59fe41b5255f #148
  Call Trace:
  &lt;TASK&gt;
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x100/0x178 lib/dump_stack.c:106
  panic+0x2c4/0x60f kernel/panic.c:275
  do_exit.cold+0x63/0xe4 kernel/exit.c:789
  do_group_exit+0xd4/0x2a0 kernel/exit.c:950
  get_signal+0x2460/0x2600 kernel/signal.c:2858
  arch_do_signal_or_restart+0x78/0x5d0 arch/x86/kernel/signal.c:306
  exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
  exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
  __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
  syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
  do_syscall_64+0x44/0xb0 arch/x86/entry/common.c:86
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

So skip task with pid=1 in bpf_send_signal_common() to avoid the panic.

  [1] https://lore.kernel.org/bpf/20221222043507.33037-1-sunhao.th@gmail.com

Signed-off-by: Hao Sun &lt;sunhao.th@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Acked-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Link: https://lore.kernel.org/bpf/20230106084838.12690-1-sunhao.th@gmail.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracepoint: Add tracepoint_probe_register_may_exist() for BPF tracing</title>
<updated>2021-07-14T14:53:08+00:00</updated>
<author>
<name>Steven Rostedt (VMware)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2021-06-29T13:40:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c65755f595cd9f21da9569224c11c8a43a670ace'/>
<id>c65755f595cd9f21da9569224c11c8a43a670ace</id>
<content type='text'>
commit 9913d5745bd720c4266805c8d29952a3702e4eca upstream.

All internal use cases for tracepoint_probe_register() is set to not ever
be called with the same function and data. If it is, it is considered a
bug, as that means the accounting of handling tracepoints is corrupted.
If the function and data for a tracepoint is already registered when
tracepoint_probe_register() is called, it will call WARN_ON_ONCE() and
return with EEXISTS.

The BPF system call can end up calling tracepoint_probe_register() with
the same data, which now means that this can trigger the warning because
of a user space process. As WARN_ON_ONCE() should not be called because
user space called a system call with bad data, there needs to be a way to
register a tracepoint without triggering a warning.

Enter tracepoint_probe_register_may_exist(), which can be called, but will
not cause a WARN_ON() if the probe already exists. It will still error out
with EEXIST, which will then be sent to the user space that performed the
BPF system call.

This keeps the previous testing for issues with other users of the
tracepoint code, while letting BPF call it with duplicated data and not
warn about it.

Link: https://lore.kernel.org/lkml/20210626135845.4080-1-penguin-kernel@I-love.SAKURA.ne.jp/
Link: https://syzkaller.appspot.com/bug?id=41f4318cf01762389f4d1c1c459da4f542fe5153

Cc: stable@vger.kernel.org
Fixes: c4f6699dfcb85 ("bpf: introduce BPF_RAW_TRACEPOINT")
Reported-by: syzbot &lt;syzbot+721aa903751db87aa244@syzkaller.appspotmail.com&gt;
Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Tested-by: syzbot+721aa903751db87aa244@syzkaller.appspotmail.com
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 9913d5745bd720c4266805c8d29952a3702e4eca upstream.

All internal use cases for tracepoint_probe_register() is set to not ever
be called with the same function and data. If it is, it is considered a
bug, as that means the accounting of handling tracepoints is corrupted.
If the function and data for a tracepoint is already registered when
tracepoint_probe_register() is called, it will call WARN_ON_ONCE() and
return with EEXISTS.

The BPF system call can end up calling tracepoint_probe_register() with
the same data, which now means that this can trigger the warning because
of a user space process. As WARN_ON_ONCE() should not be called because
user space called a system call with bad data, there needs to be a way to
register a tracepoint without triggering a warning.

Enter tracepoint_probe_register_may_exist(), which can be called, but will
not cause a WARN_ON() if the probe already exists. It will still error out
with EEXIST, which will then be sent to the user space that performed the
BPF system call.

This keeps the previous testing for issues with other users of the
tracepoint code, while letting BPF call it with duplicated data and not
warn about it.

Link: https://lore.kernel.org/lkml/20210626135845.4080-1-penguin-kernel@I-love.SAKURA.ne.jp/
Link: https://syzkaller.appspot.com/bug?id=41f4318cf01762389f4d1c1c459da4f542fe5153

Cc: stable@vger.kernel.org
Fixes: c4f6699dfcb85 ("bpf: introduce BPF_RAW_TRACEPOINT")
Reported-by: syzbot &lt;syzbot+721aa903751db87aa244@syzkaller.appspotmail.com&gt;
Reported-by: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Tested-by: syzbot+721aa903751db87aa244@syzkaller.appspotmail.com
Signed-off-by: Steven Rostedt (VMware) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address()</title>
<updated>2020-12-30T10:51:18+00:00</updated>
<author>
<name>Andrii Nakryiko</name>
<email>andrii@kernel.org</email>
</author>
<published>2020-12-03T20:46:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=08e22710601a33464cbec3d2d371cb992738256c'/>
<id>08e22710601a33464cbec3d2d371cb992738256c</id>
<content type='text'>
[ Upstream commit 12cc126df82c96c89706aa207ad27c56f219047c ]

__module_address() needs to be called with preemption disabled or with
module_mutex taken. preempt_disable() is enough for read-only uses, which is
what this fix does. Also, module_put() does internal check for NULL, so drop
it as well.

Fixes: a38d1107f937 ("bpf: support raw tracepoints in modules")
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/20201203204634.1325171-2-andrii@kernel.org
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 12cc126df82c96c89706aa207ad27c56f219047c ]

__module_address() needs to be called with preemption disabled or with
module_mutex taken. preempt_disable() is enough for read-only uses, which is
what this fix does. Also, module_put() does internal check for NULL, so drop
it as well.

Fixes: a38d1107f937 ("bpf: support raw tracepoints in modules")
Signed-off-by: Andrii Nakryiko &lt;andrii@kernel.org&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Link: https://lore.kernel.org/bpf/20201203204634.1325171-2-andrii@kernel.org
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Fix deadlock with rq_lock in bpf_send_signal()</title>
<updated>2020-04-17T08:49:57+00:00</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2020-03-04T19:11:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fd29a0242f86b2d95ad666aa9f92a3d0f7bfdab6'/>
<id>fd29a0242f86b2d95ad666aa9f92a3d0f7bfdab6</id>
<content type='text'>
[ Upstream commit 1bc7896e9ef44fd77858b3ef0b8a6840be3a4494 ]

When experimenting with bpf_send_signal() helper in our production
environment (5.2 based), we experienced a deadlock in NMI mode:
   #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
   #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
   #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
   #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
   #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
  #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
  #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
  #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
  #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
  #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
  #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
  #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
  #17 [ffffc9002219fc90] schedule at ffffffff81a3e219
  #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
  #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
  #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
  #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
  #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
  #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068

The above call stack is actually very similar to an issue
reported by Commit eac9153f2b58 ("bpf/stackmap: Fix deadlock with
rq_lock in bpf_get_stack()") by Song Liu. The only difference is
bpf_send_signal() helper instead of bpf_get_stack() helper.

The above deadlock is triggered with a perf_sw_event.
Similar to Commit eac9153f2b58, the below almost identical reproducer
used tracepoint point sched/sched_switch so the issue can be easily caught.
  /* stress_test.c */
  #include &lt;stdio.h&gt;
  #include &lt;stdlib.h&gt;
  #include &lt;sys/mman.h&gt;
  #include &lt;pthread.h&gt;
  #include &lt;sys/types.h&gt;
  #include &lt;sys/stat.h&gt;
  #include &lt;fcntl.h&gt;

  #define THREAD_COUNT 1000
  char *filename;
  void *worker(void *p)
  {
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd &lt; 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&amp;ts, NULL);
        }
        close(fd);
        return NULL;
  }

  int main(int argc, char *argv[])
  {
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc &lt; 2)
                return 0;

        filename = argv[1];

        for (i = 0; i &lt; THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i &lt; THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
  }
and the following command:
  1. run `stress_test /bin/ls` in one windown
  2. hack bcc trace.py with the following change:
#     --- a/tools/trace.py
#     +++ b/tools/trace.py
     @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s);
              __data.tgid = __tgid;
              __data.pid = __pid;
              bpf_get_current_comm(&amp;__data.comm, sizeof(__data.comm));
     +        bpf_send_signal(10);
      %s
      %s
              %s.perf_submit(%s, &amp;__data, sizeof(__data));
  3. in a different window run
     ./trace.py -p $(pidof stress_test) t:sched:sched_switch

The deadlock can be reproduced in our production system.

Similar to Song's fix, the fix is to delay sending signal if
irqs is disabled to avoid deadlocks involving with rq_lock.
With this change, my above stress-test in our production system
won't cause deadlock any more.

I also implemented a scale-down version of reproducer in the
selftest (a subsequent commit). With latest bpf-next,
it complains for the following potential deadlock.
  [   32.832450] -&gt; #1 (&amp;p-&gt;pi_lock){-.-.}:
  [   32.833100]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.833696]        task_rq_lock+0x2c/0xa0
  [   32.834182]        task_sched_runtime+0x59/0xd0
  [   32.834721]        thread_group_cputime+0x250/0x270
  [   32.835304]        thread_group_cputime_adjusted+0x2e/0x70
  [   32.835959]        do_task_stat+0x8a7/0xb80
  [   32.836461]        proc_single_show+0x51/0xb0
  ...
  [   32.839512] -&gt; #0 (&amp;(&amp;sighand-&gt;siglock)-&gt;rlock){....}:
  [   32.840275]        __lock_acquire+0x1358/0x1a20
  [   32.840826]        lock_acquire+0xc7/0x1d0
  [   32.841309]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.841916]        __lock_task_sighand+0x79/0x160
  [   32.842465]        do_send_sig_info+0x35/0x90
  [   32.842977]        bpf_send_signal+0xa/0x10
  [   32.843464]        bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000
  [   32.844301]        trace_call_bpf+0x115/0x270
  [   32.844809]        perf_trace_run_bpf_submit+0x4a/0xc0
  [   32.845411]        perf_trace_sched_switch+0x10f/0x180
  [   32.846014]        __schedule+0x45d/0x880
  [   32.846483]        schedule+0x5f/0xd0
  ...

  [   32.853148] Chain exists of:
  [   32.853148]   &amp;(&amp;sighand-&gt;siglock)-&gt;rlock --&gt; &amp;p-&gt;pi_lock --&gt; &amp;rq-&gt;lock
  [   32.853148]
  [   32.854451]  Possible unsafe locking scenario:
  [   32.854451]
  [   32.855173]        CPU0                    CPU1
  [   32.855745]        ----                    ----
  [   32.856278]   lock(&amp;rq-&gt;lock);
  [   32.856671]                                lock(&amp;p-&gt;pi_lock);
  [   32.857332]                                lock(&amp;rq-&gt;lock);
  [   32.857999]   lock(&amp;(&amp;sighand-&gt;siglock)-&gt;rlock);

  Deadlock happens on CPU0 when it tries to acquire &amp;sighand-&gt;siglock
  but it has been held by CPU1 and CPU1 tries to grab &amp;rq-&gt;lock
  and cannot get it.

  This is not exactly the callstack in our production environment,
  but sympotom is similar and both locks are using spin_lock_irqsave()
  to acquire the lock, and both involves rq_lock. The fix to delay
  sending signal when irq is disabled also fixed this issue.

Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 1bc7896e9ef44fd77858b3ef0b8a6840be3a4494 ]

When experimenting with bpf_send_signal() helper in our production
environment (5.2 based), we experienced a deadlock in NMI mode:
   #5 [ffffc9002219f770] queued_spin_lock_slowpath at ffffffff8110be24
   #6 [ffffc9002219f770] _raw_spin_lock_irqsave at ffffffff81a43012
   #7 [ffffc9002219f780] try_to_wake_up at ffffffff810e7ecd
   #8 [ffffc9002219f7e0] signal_wake_up_state at ffffffff810c7b55
   #9 [ffffc9002219f7f0] __send_signal at ffffffff810c8602
  #10 [ffffc9002219f830] do_send_sig_info at ffffffff810ca31a
  #11 [ffffc9002219f868] bpf_send_signal at ffffffff8119d227
  #12 [ffffc9002219f988] bpf_overflow_handler at ffffffff811d4140
  #13 [ffffc9002219f9e0] __perf_event_overflow at ffffffff811d68cf
  #14 [ffffc9002219fa10] perf_swevent_overflow at ffffffff811d6a09
  #15 [ffffc9002219fa38] ___perf_sw_event at ffffffff811e0f47
  #16 [ffffc9002219fc30] __schedule at ffffffff81a3e04d
  #17 [ffffc9002219fc90] schedule at ffffffff81a3e219
  #18 [ffffc9002219fca0] futex_wait_queue_me at ffffffff8113d1b9
  #19 [ffffc9002219fcd8] futex_wait at ffffffff8113e529
  #20 [ffffc9002219fdf0] do_futex at ffffffff8113ffbc
  #21 [ffffc9002219fec0] __x64_sys_futex at ffffffff81140d1c
  #22 [ffffc9002219ff38] do_syscall_64 at ffffffff81002602
  #23 [ffffc9002219ff50] entry_SYSCALL_64_after_hwframe at ffffffff81c00068

The above call stack is actually very similar to an issue
reported by Commit eac9153f2b58 ("bpf/stackmap: Fix deadlock with
rq_lock in bpf_get_stack()") by Song Liu. The only difference is
bpf_send_signal() helper instead of bpf_get_stack() helper.

The above deadlock is triggered with a perf_sw_event.
Similar to Commit eac9153f2b58, the below almost identical reproducer
used tracepoint point sched/sched_switch so the issue can be easily caught.
  /* stress_test.c */
  #include &lt;stdio.h&gt;
  #include &lt;stdlib.h&gt;
  #include &lt;sys/mman.h&gt;
  #include &lt;pthread.h&gt;
  #include &lt;sys/types.h&gt;
  #include &lt;sys/stat.h&gt;
  #include &lt;fcntl.h&gt;

  #define THREAD_COUNT 1000
  char *filename;
  void *worker(void *p)
  {
        void *ptr;
        int fd;
        char *pptr;

        fd = open(filename, O_RDONLY);
        if (fd &lt; 0)
                return NULL;
        while (1) {
                struct timespec ts = {0, 1000 + rand() % 2000};

                ptr = mmap(NULL, 4096 * 64, PROT_READ, MAP_PRIVATE, fd, 0);
                usleep(1);
                if (ptr == MAP_FAILED) {
                        printf("failed to mmap\n");
                        break;
                }
                munmap(ptr, 4096 * 64);
                usleep(1);
                pptr = malloc(1);
                usleep(1);
                pptr[0] = 1;
                usleep(1);
                free(pptr);
                usleep(1);
                nanosleep(&amp;ts, NULL);
        }
        close(fd);
        return NULL;
  }

  int main(int argc, char *argv[])
  {
        void *ptr;
        int i;
        pthread_t threads[THREAD_COUNT];

        if (argc &lt; 2)
                return 0;

        filename = argv[1];

        for (i = 0; i &lt; THREAD_COUNT; i++) {
                if (pthread_create(threads + i, NULL, worker, NULL)) {
                        fprintf(stderr, "Error creating thread\n");
                        return 0;
                }
        }

        for (i = 0; i &lt; THREAD_COUNT; i++)
                pthread_join(threads[i], NULL);
        return 0;
  }
and the following command:
  1. run `stress_test /bin/ls` in one windown
  2. hack bcc trace.py with the following change:
#     --- a/tools/trace.py
#     +++ b/tools/trace.py
     @@ -513,6 +513,7 @@ BPF_PERF_OUTPUT(%s);
              __data.tgid = __tgid;
              __data.pid = __pid;
              bpf_get_current_comm(&amp;__data.comm, sizeof(__data.comm));
     +        bpf_send_signal(10);
      %s
      %s
              %s.perf_submit(%s, &amp;__data, sizeof(__data));
  3. in a different window run
     ./trace.py -p $(pidof stress_test) t:sched:sched_switch

The deadlock can be reproduced in our production system.

Similar to Song's fix, the fix is to delay sending signal if
irqs is disabled to avoid deadlocks involving with rq_lock.
With this change, my above stress-test in our production system
won't cause deadlock any more.

I also implemented a scale-down version of reproducer in the
selftest (a subsequent commit). With latest bpf-next,
it complains for the following potential deadlock.
  [   32.832450] -&gt; #1 (&amp;p-&gt;pi_lock){-.-.}:
  [   32.833100]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.833696]        task_rq_lock+0x2c/0xa0
  [   32.834182]        task_sched_runtime+0x59/0xd0
  [   32.834721]        thread_group_cputime+0x250/0x270
  [   32.835304]        thread_group_cputime_adjusted+0x2e/0x70
  [   32.835959]        do_task_stat+0x8a7/0xb80
  [   32.836461]        proc_single_show+0x51/0xb0
  ...
  [   32.839512] -&gt; #0 (&amp;(&amp;sighand-&gt;siglock)-&gt;rlock){....}:
  [   32.840275]        __lock_acquire+0x1358/0x1a20
  [   32.840826]        lock_acquire+0xc7/0x1d0
  [   32.841309]        _raw_spin_lock_irqsave+0x44/0x80
  [   32.841916]        __lock_task_sighand+0x79/0x160
  [   32.842465]        do_send_sig_info+0x35/0x90
  [   32.842977]        bpf_send_signal+0xa/0x10
  [   32.843464]        bpf_prog_bc13ed9e4d3163e3_send_signal_tp_sched+0x465/0x1000
  [   32.844301]        trace_call_bpf+0x115/0x270
  [   32.844809]        perf_trace_run_bpf_submit+0x4a/0xc0
  [   32.845411]        perf_trace_sched_switch+0x10f/0x180
  [   32.846014]        __schedule+0x45d/0x880
  [   32.846483]        schedule+0x5f/0xd0
  ...

  [   32.853148] Chain exists of:
  [   32.853148]   &amp;(&amp;sighand-&gt;siglock)-&gt;rlock --&gt; &amp;p-&gt;pi_lock --&gt; &amp;rq-&gt;lock
  [   32.853148]
  [   32.854451]  Possible unsafe locking scenario:
  [   32.854451]
  [   32.855173]        CPU0                    CPU1
  [   32.855745]        ----                    ----
  [   32.856278]   lock(&amp;rq-&gt;lock);
  [   32.856671]                                lock(&amp;p-&gt;pi_lock);
  [   32.857332]                                lock(&amp;rq-&gt;lock);
  [   32.857999]   lock(&amp;(&amp;sighand-&gt;siglock)-&gt;rlock);

  Deadlock happens on CPU0 when it tries to acquire &amp;sighand-&gt;siglock
  but it has been held by CPU1 and CPU1 tries to grab &amp;rq-&gt;lock
  and cannot get it.

  This is not exactly the callstack in our production environment,
  but sympotom is similar and both locks are using spin_lock_irqsave()
  to acquire the lock, and both involves rq_lock. The fix to delay
  sending signal when irq is disabled also fixed this issue.

Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Cc: Song Liu &lt;songliubraving@fb.com&gt;
Link: https://lore.kernel.org/bpf/20200304191104.2796501-1-yhs@fb.com
Signed-off-by: Sasha Levin &lt;sashal@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Make use of probe_user_write in probe write helper</title>
<updated>2020-01-17T18:48:40+00:00</updated>
<author>
<name>Daniel Borkmann</name>
<email>daniel@iogearbox.net</email>
</author>
<published>2019-11-01T23:17:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=600e9099642bb7584410b1f45eab8371f4c66bc7'/>
<id>600e9099642bb7584410b1f45eab8371f4c66bc7</id>
<content type='text'>
commit eb1b66887472eaa7342305b7890ae510dd9d1a79 upstream.

Convert the bpf_probe_write_user() helper to probe_user_write() such that
writes are not attempted under KERNEL_DS anymore which is buggy as kernel
and user space pointers can have overlapping addresses. Also, given we have
the access_ok() check inside probe_user_write(), the helper doesn't need
to do it twice.

Fixes: 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Link: https://lore.kernel.org/bpf/841c461781874c07a0ee404a454c3bc0459eed30.1572649915.git.daniel@iogearbox.net
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit eb1b66887472eaa7342305b7890ae510dd9d1a79 upstream.

Convert the bpf_probe_write_user() helper to probe_user_write() such that
writes are not attempted under KERNEL_DS anymore which is buggy as kernel
and user space pointers can have overlapping addresses. Also, given we have
the access_ok() check inside probe_user_write(), the helper doesn't need
to do it twice.

Fixes: 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers")
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: Andrii Nakryiko &lt;andriin@fb.com&gt;
Link: https://lore.kernel.org/bpf/841c461781874c07a0ee404a454c3bc0459eed30.1572649915.git.daniel@iogearbox.net
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net</title>
<updated>2019-09-29T00:47:33+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-09-29T00:47:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=02dc96ef6c25f990452c114c59d75c368a1f4c8f'/>
<id>02dc96ef6c25f990452c114c59d75c368a1f4c8f</id>
<content type='text'>
Pull networking fixes from David Miller:

 1) Sanity check URB networking device parameters to avoid divide by
    zero, from Oliver Neukum.

 2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
    don't work properly. Longer term this needs a better fix tho. From
    Vijay Khemka.

 3) Small fixes to selftests (use ping when ping6 is not present, etc.)
    from David Ahern.

 4) Bring back rt_uses_gateway member of struct rtable, it's semantics
    were not well understood and trying to remove it broke things. From
    David Ahern.

 5) Move usbnet snaity checking, ignore endpoints with invalid
    wMaxPacketSize. From Bjørn Mork.

 6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.

 7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
    Alex Vesker, and Yevgeny Kliteynik

 8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.

 9) Fix crash when removing sch_cbs entry while offloading is enabled,
    from Vinicius Costa Gomes.

10) Signedness bug fixes, generally in looking at the result given by
    of_get_phy_mode() and friends. From Dan Crapenter.

11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.

12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.

13) Fix quantization code in tcp_bbr, from Kevin Yang.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
  net: tap: clean up an indentation issue
  nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
  tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
  sk_buff: drop all skb extensions on free and skb scrubbing
  tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
  mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
  Documentation: Clarify trap's description
  mlxsw: spectrum: Clear VLAN filters during port initialization
  net: ena: clean up indentation issue
  NFC: st95hf: clean up indentation issue
  net: phy: micrel: add Asym Pause workaround for KSZ9021
  net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
  ptp: correctly disable flags on old ioctls
  lib: dimlib: fix help text typos
  net: dsa: microchip: Always set regmap stride to 1
  nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
  nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
  net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
  vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
  net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull networking fixes from David Miller:

 1) Sanity check URB networking device parameters to avoid divide by
    zero, from Oliver Neukum.

 2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
    don't work properly. Longer term this needs a better fix tho. From
    Vijay Khemka.

 3) Small fixes to selftests (use ping when ping6 is not present, etc.)
    from David Ahern.

 4) Bring back rt_uses_gateway member of struct rtable, it's semantics
    were not well understood and trying to remove it broke things. From
    David Ahern.

 5) Move usbnet snaity checking, ignore endpoints with invalid
    wMaxPacketSize. From Bjørn Mork.

 6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.

 7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
    Alex Vesker, and Yevgeny Kliteynik

 8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.

 9) Fix crash when removing sch_cbs entry while offloading is enabled,
    from Vinicius Costa Gomes.

10) Signedness bug fixes, generally in looking at the result given by
    of_get_phy_mode() and friends. From Dan Crapenter.

11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.

12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.

13) Fix quantization code in tcp_bbr, from Kevin Yang.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
  net: tap: clean up an indentation issue
  nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
  tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
  sk_buff: drop all skb extensions on free and skb scrubbing
  tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
  mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
  Documentation: Clarify trap's description
  mlxsw: spectrum: Clear VLAN filters during port initialization
  net: ena: clean up indentation issue
  NFC: st95hf: clean up indentation issue
  net: phy: micrel: add Asym Pause workaround for KSZ9021
  net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
  ptp: correctly disable flags on old ioctls
  lib: dimlib: fix help text typos
  net: dsa: microchip: Always set regmap stride to 1
  nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
  nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
  net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
  vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
  net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security</title>
<updated>2019-09-28T15:14:15+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2019-09-28T15:14:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=aefcf2f4b58155d27340ba5f9ddbe9513da8286d'/>
<id>aefcf2f4b58155d27340ba5f9ddbe9513da8286d</id>
<content type='text'>
Pull kernel lockdown mode from James Morris:
 "This is the latest iteration of the kernel lockdown patchset, from
  Matthew Garrett, David Howells and others.

  From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.

  There are two major changes since this was last proposed for mainline:

   - Separating lockdown from EFI secure boot. Background discussion is
     covered here: https://lwn.net/Articles/751061/

   -  Implementation as an LSM, with a default stackable lockdown LSM
      module. This allows the lockdown feature to be policy-driven,
      rather than encoding an implicit policy within the mechanism.

  The new locked_down LSM hook is provided to allow LSMs to make a
  policy decision around whether kernel functionality that would allow
  tampering with or examining the runtime state of the kernel should be
  permitted.

  The included lockdown LSM provides an implementation with a simple
  policy intended for general purpose use. This policy provides a coarse
  level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

  Enable the kernel lockdown feature. If set to integrity, kernel features
  that allow userland to modify the running kernel are disabled. If set to
  confidentiality, kernel features that allow userland to extract
  confidential information from the kernel are also disabled.

  This may also be controlled via /sys/kernel/security/lockdown and
  overriden by kernel configuration.

  New or existing LSMs may implement finer-grained controls of the
  lockdown features. Refer to the lockdown_reason documentation in
  include/linux/security.h for details.

  The lockdown feature has had signficant design feedback and review
  across many subsystems. This code has been in linux-next for some
  weeks, with a few fixes applied along the way.

  Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf
  when kernel lockdown is in confidentiality mode") is missing a
  Signed-off-by from its author. Matthew responded that he is providing
  this under category (c) of the DCO"

* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
  kexec: Fix file verification on S390
  security: constify some arrays in lockdown LSM
  lockdown: Print current-&gt;comm in restriction messages
  efi: Restrict efivar_ssdt_load when the kernel is locked down
  tracefs: Restrict tracefs when the kernel is locked down
  debugfs: Restrict debugfs when the kernel is locked down
  kexec: Allow kexec_file() with appropriate IMA policy when locked down
  lockdown: Lock down perf when in confidentiality mode
  bpf: Restrict bpf when kernel lockdown is in confidentiality mode
  lockdown: Lock down tracing and perf kprobes when in confidentiality mode
  lockdown: Lock down /proc/kcore
  x86/mmiotrace: Lock down the testmmiotrace module
  lockdown: Lock down module params that specify hardware parameters (eg. ioport)
  lockdown: Lock down TIOCSSERIAL
  lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
  acpi: Disable ACPI table override if the kernel is locked down
  acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
  ACPI: Limit access to custom_method when the kernel is locked down
  x86/msr: Restrict MSR access when the kernel is locked down
  x86: Lock down IO port access when the kernel is locked down
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull kernel lockdown mode from James Morris:
 "This is the latest iteration of the kernel lockdown patchset, from
  Matthew Garrett, David Howells and others.

  From the original description:

    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.

    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.

  There are two major changes since this was last proposed for mainline:

   - Separating lockdown from EFI secure boot. Background discussion is
     covered here: https://lwn.net/Articles/751061/

   -  Implementation as an LSM, with a default stackable lockdown LSM
      module. This allows the lockdown feature to be policy-driven,
      rather than encoding an implicit policy within the mechanism.

  The new locked_down LSM hook is provided to allow LSMs to make a
  policy decision around whether kernel functionality that would allow
  tampering with or examining the runtime state of the kernel should be
  permitted.

  The included lockdown LSM provides an implementation with a simple
  policy intended for general purpose use. This policy provides a coarse
  level of granularity, controllable via the kernel command line:

    lockdown={integrity|confidentiality}

  Enable the kernel lockdown feature. If set to integrity, kernel features
  that allow userland to modify the running kernel are disabled. If set to
  confidentiality, kernel features that allow userland to extract
  confidential information from the kernel are also disabled.

  This may also be controlled via /sys/kernel/security/lockdown and
  overriden by kernel configuration.

  New or existing LSMs may implement finer-grained controls of the
  lockdown features. Refer to the lockdown_reason documentation in
  include/linux/security.h for details.

  The lockdown feature has had signficant design feedback and review
  across many subsystems. This code has been in linux-next for some
  weeks, with a few fixes applied along the way.

  Stephen Rothwell noted that commit 9d1f8be5cf42 ("bpf: Restrict bpf
  when kernel lockdown is in confidentiality mode") is missing a
  Signed-off-by from its author. Matthew responded that he is providing
  this under category (c) of the DCO"

* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
  kexec: Fix file verification on S390
  security: constify some arrays in lockdown LSM
  lockdown: Print current-&gt;comm in restriction messages
  efi: Restrict efivar_ssdt_load when the kernel is locked down
  tracefs: Restrict tracefs when the kernel is locked down
  debugfs: Restrict debugfs when the kernel is locked down
  kexec: Allow kexec_file() with appropriate IMA policy when locked down
  lockdown: Lock down perf when in confidentiality mode
  bpf: Restrict bpf when kernel lockdown is in confidentiality mode
  lockdown: Lock down tracing and perf kprobes when in confidentiality mode
  lockdown: Lock down /proc/kcore
  x86/mmiotrace: Lock down the testmmiotrace module
  lockdown: Lock down module params that specify hardware parameters (eg. ioport)
  lockdown: Lock down TIOCSSERIAL
  lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
  acpi: Disable ACPI table override if the kernel is locked down
  acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
  ACPI: Limit access to custom_method when the kernel is locked down
  x86/msr: Restrict MSR access when the kernel is locked down
  x86: Lock down IO port access when the kernel is locked down
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Fix bpf_event_output re-entry issue</title>
<updated>2019-09-27T09:24:29+00:00</updated>
<author>
<name>Allan Zhang</name>
<email>allanzhang@google.com</email>
</author>
<published>2019-09-25T23:43:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=768fb61fcc13b2acaca758275d54c09a65e2968b'/>
<id>768fb61fcc13b2acaca758275d54c09a65e2968b</id>
<content type='text'>
BPF_PROG_TYPE_SOCK_OPS program can reenter bpf_event_output because it
can be called from atomic and non-atomic contexts since we don't have
bpf_prog_active to prevent it happen.

This patch enables 3 levels of nesting to support normal, irq and nmi
context.

We can easily reproduce the issue by running netperf crr mode with 100
flows and 10 threads from netperf client side.

Here is the whole stack dump:

[  515.228898] WARNING: CPU: 20 PID: 14686 at kernel/trace/bpf_trace.c:549 bpf_event_output+0x1f9/0x220
[  515.228903] CPU: 20 PID: 14686 Comm: tcp_crr Tainted: G        W        4.15.0-smp-fixpanic #44
[  515.228904] Hardware name: Intel TBG,ICH10/Ikaria_QC_1b, BIOS 1.22.0 06/04/2018
[  515.228905] RIP: 0010:bpf_event_output+0x1f9/0x220
[  515.228906] RSP: 0018:ffff9a57ffc03938 EFLAGS: 00010246
[  515.228907] RAX: 0000000000000012 RBX: 0000000000000001 RCX: 0000000000000000
[  515.228907] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff836b0f80
[  515.228908] RBP: ffff9a57ffc039c8 R08: 0000000000000004 R09: 0000000000000012
[  515.228908] R10: ffff9a57ffc1de40 R11: 0000000000000000 R12: 0000000000000002
[  515.228909] R13: ffff9a57e13bae00 R14: 00000000ffffffff R15: ffff9a57ffc1e2c0
[  515.228910] FS:  00007f5a3e6ec700(0000) GS:ffff9a57ffc00000(0000) knlGS:0000000000000000
[  515.228910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  515.228911] CR2: 0000537082664fff CR3: 000000061fed6002 CR4: 00000000000226f0
[  515.228911] Call Trace:
[  515.228913]  &lt;IRQ&gt;
[  515.228919]  [&lt;ffffffff82c6c6cb&gt;] bpf_sockopt_event_output+0x3b/0x50
[  515.228923]  [&lt;ffffffff8265daee&gt;] ? bpf_ktime_get_ns+0xe/0x10
[  515.228927]  [&lt;ffffffff8266fda5&gt;] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
[  515.228930]  [&lt;ffffffff82cf90a5&gt;] ? tcp_init_transfer+0x125/0x150
[  515.228933]  [&lt;ffffffff82cf9159&gt;] ? tcp_finish_connect+0x89/0x110
[  515.228936]  [&lt;ffffffff82cf98e4&gt;] ? tcp_rcv_state_process+0x704/0x1010
[  515.228939]  [&lt;ffffffff82c6e263&gt;] ? sk_filter_trim_cap+0x53/0x2a0
[  515.228942]  [&lt;ffffffff82d90d1f&gt;] ? tcp_v6_inbound_md5_hash+0x6f/0x1d0
[  515.228945]  [&lt;ffffffff82d92160&gt;] ? tcp_v6_do_rcv+0x1c0/0x460
[  515.228947]  [&lt;ffffffff82d93558&gt;] ? tcp_v6_rcv+0x9f8/0xb30
[  515.228951]  [&lt;ffffffff82d737c0&gt;] ? ip6_route_input+0x190/0x220
[  515.228955]  [&lt;ffffffff82d5f7ad&gt;] ? ip6_protocol_deliver_rcu+0x6d/0x450
[  515.228958]  [&lt;ffffffff82d60246&gt;] ? ip6_rcv_finish+0xb6/0x170
[  515.228961]  [&lt;ffffffff82d5fb90&gt;] ? ip6_protocol_deliver_rcu+0x450/0x450
[  515.228963]  [&lt;ffffffff82d60361&gt;] ? ipv6_rcv+0x61/0xe0
[  515.228966]  [&lt;ffffffff82d60190&gt;] ? ipv6_list_rcv+0x330/0x330
[  515.228969]  [&lt;ffffffff82c4976b&gt;] ? __netif_receive_skb_one_core+0x5b/0xa0
[  515.228972]  [&lt;ffffffff82c497d1&gt;] ? __netif_receive_skb+0x21/0x70
[  515.228975]  [&lt;ffffffff82c4a8d2&gt;] ? process_backlog+0xb2/0x150
[  515.228978]  [&lt;ffffffff82c4aadf&gt;] ? net_rx_action+0x16f/0x410
[  515.228982]  [&lt;ffffffff830000dd&gt;] ? __do_softirq+0xdd/0x305
[  515.228986]  [&lt;ffffffff8252cfdc&gt;] ? irq_exit+0x9c/0xb0
[  515.228989]  [&lt;ffffffff82e02de5&gt;] ? smp_call_function_single_interrupt+0x65/0x120
[  515.228991]  [&lt;ffffffff82e020e1&gt;] ? call_function_single_interrupt+0x81/0x90
[  515.228992]  &lt;/IRQ&gt;
[  515.228996]  [&lt;ffffffff82a11ff0&gt;] ? io_serial_in+0x20/0x20
[  515.229000]  [&lt;ffffffff8259c040&gt;] ? console_unlock+0x230/0x490
[  515.229003]  [&lt;ffffffff8259cbaa&gt;] ? vprintk_emit+0x26a/0x2a0
[  515.229006]  [&lt;ffffffff8259cbff&gt;] ? vprintk_default+0x1f/0x30
[  515.229008]  [&lt;ffffffff8259d9f5&gt;] ? vprintk_func+0x35/0x70
[  515.229011]  [&lt;ffffffff8259d4bb&gt;] ? printk+0x50/0x66
[  515.229013]  [&lt;ffffffff82637637&gt;] ? bpf_event_output+0xb7/0x220
[  515.229016]  [&lt;ffffffff82c6c6cb&gt;] ? bpf_sockopt_event_output+0x3b/0x50
[  515.229019]  [&lt;ffffffff8265daee&gt;] ? bpf_ktime_get_ns+0xe/0x10
[  515.229023]  [&lt;ffffffff82c29e87&gt;] ? release_sock+0x97/0xb0
[  515.229026]  [&lt;ffffffff82ce9d6a&gt;] ? tcp_recvmsg+0x31a/0xda0
[  515.229029]  [&lt;ffffffff8266fda5&gt;] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
[  515.229032]  [&lt;ffffffff82ce77c1&gt;] ? tcp_set_state+0x191/0x1b0
[  515.229035]  [&lt;ffffffff82ced10e&gt;] ? tcp_disconnect+0x2e/0x600
[  515.229038]  [&lt;ffffffff82cecbbb&gt;] ? tcp_close+0x3eb/0x460
[  515.229040]  [&lt;ffffffff82d21082&gt;] ? inet_release+0x42/0x70
[  515.229043]  [&lt;ffffffff82d58809&gt;] ? inet6_release+0x39/0x50
[  515.229046]  [&lt;ffffffff82c1f32d&gt;] ? __sock_release+0x4d/0xd0
[  515.229049]  [&lt;ffffffff82c1f3e5&gt;] ? sock_close+0x15/0x20
[  515.229052]  [&lt;ffffffff8273b517&gt;] ? __fput+0xe7/0x1f0
[  515.229055]  [&lt;ffffffff8273b66e&gt;] ? ____fput+0xe/0x10
[  515.229058]  [&lt;ffffffff82547bf2&gt;] ? task_work_run+0x82/0xb0
[  515.229061]  [&lt;ffffffff824086df&gt;] ? exit_to_usermode_loop+0x7e/0x11f
[  515.229064]  [&lt;ffffffff82408171&gt;] ? do_syscall_64+0x111/0x130
[  515.229067]  [&lt;ffffffff82e0007c&gt;] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Fixes: a5a3a828cd00 ("bpf: add perf event notificaton support for sock_ops")
Signed-off-by: Allan Zhang &lt;allanzhang@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20190925234312.94063-2-allanzhang@google.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BPF_PROG_TYPE_SOCK_OPS program can reenter bpf_event_output because it
can be called from atomic and non-atomic contexts since we don't have
bpf_prog_active to prevent it happen.

This patch enables 3 levels of nesting to support normal, irq and nmi
context.

We can easily reproduce the issue by running netperf crr mode with 100
flows and 10 threads from netperf client side.

Here is the whole stack dump:

[  515.228898] WARNING: CPU: 20 PID: 14686 at kernel/trace/bpf_trace.c:549 bpf_event_output+0x1f9/0x220
[  515.228903] CPU: 20 PID: 14686 Comm: tcp_crr Tainted: G        W        4.15.0-smp-fixpanic #44
[  515.228904] Hardware name: Intel TBG,ICH10/Ikaria_QC_1b, BIOS 1.22.0 06/04/2018
[  515.228905] RIP: 0010:bpf_event_output+0x1f9/0x220
[  515.228906] RSP: 0018:ffff9a57ffc03938 EFLAGS: 00010246
[  515.228907] RAX: 0000000000000012 RBX: 0000000000000001 RCX: 0000000000000000
[  515.228907] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff836b0f80
[  515.228908] RBP: ffff9a57ffc039c8 R08: 0000000000000004 R09: 0000000000000012
[  515.228908] R10: ffff9a57ffc1de40 R11: 0000000000000000 R12: 0000000000000002
[  515.228909] R13: ffff9a57e13bae00 R14: 00000000ffffffff R15: ffff9a57ffc1e2c0
[  515.228910] FS:  00007f5a3e6ec700(0000) GS:ffff9a57ffc00000(0000) knlGS:0000000000000000
[  515.228910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  515.228911] CR2: 0000537082664fff CR3: 000000061fed6002 CR4: 00000000000226f0
[  515.228911] Call Trace:
[  515.228913]  &lt;IRQ&gt;
[  515.228919]  [&lt;ffffffff82c6c6cb&gt;] bpf_sockopt_event_output+0x3b/0x50
[  515.228923]  [&lt;ffffffff8265daee&gt;] ? bpf_ktime_get_ns+0xe/0x10
[  515.228927]  [&lt;ffffffff8266fda5&gt;] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
[  515.228930]  [&lt;ffffffff82cf90a5&gt;] ? tcp_init_transfer+0x125/0x150
[  515.228933]  [&lt;ffffffff82cf9159&gt;] ? tcp_finish_connect+0x89/0x110
[  515.228936]  [&lt;ffffffff82cf98e4&gt;] ? tcp_rcv_state_process+0x704/0x1010
[  515.228939]  [&lt;ffffffff82c6e263&gt;] ? sk_filter_trim_cap+0x53/0x2a0
[  515.228942]  [&lt;ffffffff82d90d1f&gt;] ? tcp_v6_inbound_md5_hash+0x6f/0x1d0
[  515.228945]  [&lt;ffffffff82d92160&gt;] ? tcp_v6_do_rcv+0x1c0/0x460
[  515.228947]  [&lt;ffffffff82d93558&gt;] ? tcp_v6_rcv+0x9f8/0xb30
[  515.228951]  [&lt;ffffffff82d737c0&gt;] ? ip6_route_input+0x190/0x220
[  515.228955]  [&lt;ffffffff82d5f7ad&gt;] ? ip6_protocol_deliver_rcu+0x6d/0x450
[  515.228958]  [&lt;ffffffff82d60246&gt;] ? ip6_rcv_finish+0xb6/0x170
[  515.228961]  [&lt;ffffffff82d5fb90&gt;] ? ip6_protocol_deliver_rcu+0x450/0x450
[  515.228963]  [&lt;ffffffff82d60361&gt;] ? ipv6_rcv+0x61/0xe0
[  515.228966]  [&lt;ffffffff82d60190&gt;] ? ipv6_list_rcv+0x330/0x330
[  515.228969]  [&lt;ffffffff82c4976b&gt;] ? __netif_receive_skb_one_core+0x5b/0xa0
[  515.228972]  [&lt;ffffffff82c497d1&gt;] ? __netif_receive_skb+0x21/0x70
[  515.228975]  [&lt;ffffffff82c4a8d2&gt;] ? process_backlog+0xb2/0x150
[  515.228978]  [&lt;ffffffff82c4aadf&gt;] ? net_rx_action+0x16f/0x410
[  515.228982]  [&lt;ffffffff830000dd&gt;] ? __do_softirq+0xdd/0x305
[  515.228986]  [&lt;ffffffff8252cfdc&gt;] ? irq_exit+0x9c/0xb0
[  515.228989]  [&lt;ffffffff82e02de5&gt;] ? smp_call_function_single_interrupt+0x65/0x120
[  515.228991]  [&lt;ffffffff82e020e1&gt;] ? call_function_single_interrupt+0x81/0x90
[  515.228992]  &lt;/IRQ&gt;
[  515.228996]  [&lt;ffffffff82a11ff0&gt;] ? io_serial_in+0x20/0x20
[  515.229000]  [&lt;ffffffff8259c040&gt;] ? console_unlock+0x230/0x490
[  515.229003]  [&lt;ffffffff8259cbaa&gt;] ? vprintk_emit+0x26a/0x2a0
[  515.229006]  [&lt;ffffffff8259cbff&gt;] ? vprintk_default+0x1f/0x30
[  515.229008]  [&lt;ffffffff8259d9f5&gt;] ? vprintk_func+0x35/0x70
[  515.229011]  [&lt;ffffffff8259d4bb&gt;] ? printk+0x50/0x66
[  515.229013]  [&lt;ffffffff82637637&gt;] ? bpf_event_output+0xb7/0x220
[  515.229016]  [&lt;ffffffff82c6c6cb&gt;] ? bpf_sockopt_event_output+0x3b/0x50
[  515.229019]  [&lt;ffffffff8265daee&gt;] ? bpf_ktime_get_ns+0xe/0x10
[  515.229023]  [&lt;ffffffff82c29e87&gt;] ? release_sock+0x97/0xb0
[  515.229026]  [&lt;ffffffff82ce9d6a&gt;] ? tcp_recvmsg+0x31a/0xda0
[  515.229029]  [&lt;ffffffff8266fda5&gt;] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100
[  515.229032]  [&lt;ffffffff82ce77c1&gt;] ? tcp_set_state+0x191/0x1b0
[  515.229035]  [&lt;ffffffff82ced10e&gt;] ? tcp_disconnect+0x2e/0x600
[  515.229038]  [&lt;ffffffff82cecbbb&gt;] ? tcp_close+0x3eb/0x460
[  515.229040]  [&lt;ffffffff82d21082&gt;] ? inet_release+0x42/0x70
[  515.229043]  [&lt;ffffffff82d58809&gt;] ? inet6_release+0x39/0x50
[  515.229046]  [&lt;ffffffff82c1f32d&gt;] ? __sock_release+0x4d/0xd0
[  515.229049]  [&lt;ffffffff82c1f3e5&gt;] ? sock_close+0x15/0x20
[  515.229052]  [&lt;ffffffff8273b517&gt;] ? __fput+0xe7/0x1f0
[  515.229055]  [&lt;ffffffff8273b66e&gt;] ? ____fput+0xe/0x10
[  515.229058]  [&lt;ffffffff82547bf2&gt;] ? task_work_run+0x82/0xb0
[  515.229061]  [&lt;ffffffff824086df&gt;] ? exit_to_usermode_loop+0x7e/0x11f
[  515.229064]  [&lt;ffffffff82408171&gt;] ? do_syscall_64+0x111/0x130
[  515.229067]  [&lt;ffffffff82e0007c&gt;] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Fixes: a5a3a828cd00 ("bpf: add perf event notificaton support for sock_ops")
Signed-off-by: Allan Zhang &lt;allanzhang@google.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Reviewed-by: Stanislav Fomichev &lt;sdf@google.com&gt;
Reviewed-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Link: https://lore.kernel.org/bpf/20190925234312.94063-2-allanzhang@google.com
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: Restrict bpf when kernel lockdown is in confidentiality mode</title>
<updated>2019-08-20T04:54:16+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2019-08-20T00:17:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9d1f8be5cf42b497a3bddf1d523f2bb142e9318c'/>
<id>9d1f8be5cf42b497a3bddf1d523f2bb142e9318c</id>
<content type='text'>
bpf_read() and bpf_read_str() could potentially be abused to (eg) allow
private keys in kernel memory to be leaked. Disable them if the kernel
has been locked down in confidentiality mode.

Suggested-by: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Signed-off-by: Matthew Garrett &lt;mjg59@google.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
cc: netdev@vger.kernel.org
cc: Chun-Yi Lee &lt;jlee@suse.com&gt;
cc: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: James Morris &lt;jmorris@namei.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
bpf_read() and bpf_read_str() could potentially be abused to (eg) allow
private keys in kernel memory to be leaked. Disable them if the kernel
has been locked down in confidentiality mode.

Suggested-by: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Signed-off-by: Matthew Garrett &lt;mjg59@google.com&gt;
Reviewed-by: Kees Cook &lt;keescook@chromium.org&gt;
cc: netdev@vger.kernel.org
cc: Chun-Yi Lee &lt;jlee@suse.com&gt;
cc: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Cc: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: James Morris &lt;jmorris@namei.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: fix compiler warning with CONFIG_MODULES=n</title>
<updated>2019-06-26T12:44:07+00:00</updated>
<author>
<name>Yonghong Song</name>
<email>yhs@fb.com</email>
</author>
<published>2019-06-26T00:35:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9db1ff0a415c7de8eb67df5b2c56ac409ccefc37'/>
<id>9db1ff0a415c7de8eb67df5b2c56ac409ccefc37</id>
<content type='text'>
With CONFIG_MODULES=n, the following compiler warning occurs:
  /data/users/yhs/work/net-next/kernel/trace/bpf_trace.c:605:13: warning:
      ‘do_bpf_send_signal’ defined but not used [-Wunused-function]
  static void do_bpf_send_signal(struct irq_work *entry)

The __init function send_signal_irq_work_init(), which calls
do_bpf_send_signal(), is defined under CONFIG_MODULES. Hence,
when CONFIG_MODULES=n, nobody calls static function do_bpf_send_signal(),
hence the warning.

The init function send_signal_irq_work_init() should work without
CONFIG_MODULES. Moving it out of CONFIG_MODULES
code section fixed the compiler warning, and also make bpf_send_signal()
helper work without CONFIG_MODULES.

Fixes: 8b401f9ed244 ("bpf: implement bpf_send_signal() helper")
Reported-By: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
With CONFIG_MODULES=n, the following compiler warning occurs:
  /data/users/yhs/work/net-next/kernel/trace/bpf_trace.c:605:13: warning:
      ‘do_bpf_send_signal’ defined but not used [-Wunused-function]
  static void do_bpf_send_signal(struct irq_work *entry)

The __init function send_signal_irq_work_init(), which calls
do_bpf_send_signal(), is defined under CONFIG_MODULES. Hence,
when CONFIG_MODULES=n, nobody calls static function do_bpf_send_signal(),
hence the warning.

The init function send_signal_irq_work_init() should work without
CONFIG_MODULES. Moving it out of CONFIG_MODULES
code section fixed the compiler warning, and also make bpf_send_signal()
helper work without CONFIG_MODULES.

Fixes: 8b401f9ed244 ("bpf: implement bpf_send_signal() helper")
Reported-By: Arnd Bergmann &lt;arnd@arndb.de&gt;
Signed-off-by: Yonghong Song &lt;yhs@fb.com&gt;
Acked-by: Song Liu &lt;songliubraving@fb.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
</pre>
</div>
</content>
</entry>
</feed>
