<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/signal.c, branch v5.8.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>task_work: only grab task signal lock when needed</title>
<updated>2020-08-19T06:27:10+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2020-08-13T15:01:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7e430e763be2a0c7eefe2675fc574d8ed6b00874'/>
<id>7e430e763be2a0c7eefe2675fc574d8ed6b00874</id>
<content type='text'>
commit ebf0d100df0731901c16632f78d78d35f4123bc4 upstream.

If JOBCTL_TASK_WORK is already set on the targeted task, then we need
not go through {lock,unlock}_task_sighand() to set it again and queue
a signal wakeup. This is safe as we're checking it _after_ adding the
new task_work with cmpxchg().

The ordering is as follows:

task_work_add()				get_signal()
--------------------------------------------------------------
STORE(task-&gt;task_works, new_work);	STORE(task-&gt;jobctl);
mb();					mb();
LOAD(task-&gt;jobctl);			LOAD(task-&gt;task_works);

This speeds up TWA_SIGNAL handling quite a bit, which is important now
that io_uring is relying on it for all task_work deliveries.

Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ebf0d100df0731901c16632f78d78d35f4123bc4 upstream.

If JOBCTL_TASK_WORK is already set on the targeted task, then we need
not go through {lock,unlock}_task_sighand() to set it again and queue
a signal wakeup. This is safe as we're checking it _after_ adding the
new task_work with cmpxchg().

The ordering is as follows:

task_work_add()				get_signal()
--------------------------------------------------------------
STORE(task-&gt;task_works, new_work);	STORE(task-&gt;jobctl);
mb();					mb();
LOAD(task-&gt;jobctl);			LOAD(task-&gt;task_works);

This speeds up TWA_SIGNAL handling quite a bit, which is important now
that io_uring is relying on it for all task_work deliveries.

Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Jann Horn &lt;jannh@google.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>signal: fix typo in dequeue_synchronous_signal()</title>
<updated>2020-07-26T21:57:52+00:00</updated>
<author>
<name>Pavel Machek</name>
<email>pavel@ucw.cz</email>
</author>
<published>2020-07-24T09:05:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7665a47f70b3f64bf09c233cc7df73fde9e506f1'/>
<id>7665a47f70b3f64bf09c233cc7df73fde9e506f1</id>
<content type='text'>
s/postive/positive/

Signed-off-by: Pavel Machek (CIP) &lt;pavel@denx.de&gt;
Link: https://lore.kernel.org/r/20200724090531.GA14409@amd
[christian.brauner@ubuntu.com: tweak commit message]
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
s/postive/positive/

Signed-off-by: Pavel Machek (CIP) &lt;pavel@denx.de&gt;
Link: https://lore.kernel.org/r/20200724090531.GA14409@amd
[christian.brauner@ubuntu.com: tweak commit message]
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>task_work: teach task_work_add() to do signal_wake_up()</title>
<updated>2020-06-30T18:18:08+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2020-06-30T15:32:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e91b48162332480f5840902268108bb7fb7a44c7'/>
<id>e91b48162332480f5840902268108bb7fb7a44c7</id>
<content type='text'>
So that the target task will exit the wait_event_interruptible-like
loop and call task_work_run() asap.

The patch turns "bool notify" into a 0,TWA_RESUME,TWA_SIGNAL enum; the
new TWA_SIGNAL flag implies signal_wake_up().  However, it needs to
avoid the race with recalc_sigpending(), so the patch also adds the
new JOBCTL_TASK_WORK bit included in JOBCTL_PENDING_MASK.

TODO: once this patch is merged we need to change all current users
of task_work_add(notify = true) to use TWA_RESUME.

Cc: stable@vger.kernel.org # v5.7
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
So that the target task will exit the wait_event_interruptible-like
loop and call task_work_run() asap.

The patch turns "bool notify" into a 0,TWA_RESUME,TWA_SIGNAL enum; the
new TWA_SIGNAL flag implies signal_wake_up().  However, it needs to
avoid the race with recalc_sigpending(), so the patch also adds the
new JOBCTL_TASK_WORK bit included in JOBCTL_PENDING_MASK.

TODO: once this patch is merged we need to change all current users
of task_work_add(notify = true) to use TWA_RESUME.

Cc: stable@vger.kernel.org # v5.7
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs</title>
<updated>2020-06-01T23:21:46+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-06-01T23:21:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=8b39a57e965403c12a27d0859901a8a7d1d7318f'/>
<id>8b39a57e965403c12a27d0859901a8a7d1d7318f</id>
<content type='text'>
Pull uaccess/coredump updates from Al Viro:
 "set_fs() removal in coredump-related area - mostly Christoph's
  stuff..."

* 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  binfmt_elf_fdpic: remove the set_fs(KERNEL_DS) in elf_fdpic_core_dump
  binfmt_elf: remove the set_fs(KERNEL_DS) in elf_core_dump
  binfmt_elf: remove the set_fs in fill_siginfo_note
  signal: refactor copy_siginfo_to_user32
  powerpc/spufs: simplify spufs core dumping
  powerpc/spufs: stop using access_ok
  powerpc/spufs: fix copy_to_user while atomic
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull uaccess/coredump updates from Al Viro:
 "set_fs() removal in coredump-related area - mostly Christoph's
  stuff..."

* 'work.set_fs-exec' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  binfmt_elf_fdpic: remove the set_fs(KERNEL_DS) in elf_fdpic_core_dump
  binfmt_elf: remove the set_fs(KERNEL_DS) in elf_core_dump
  binfmt_elf: remove the set_fs in fill_siginfo_note
  signal: refactor copy_siginfo_to_user32
  powerpc/spufs: simplify spufs core dumping
  powerpc/spufs: stop using access_ok
  powerpc/spufs: fix copy_to_user while atomic
</pre>
</div>
</content>
</entry>
<entry>
<title>signal: refactor copy_siginfo_to_user32</title>
<updated>2020-05-05T20:46:09+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2020-05-05T10:12:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c3b3f52476412a3899f2c65b220075aceb18dd2c'/>
<id>c3b3f52476412a3899f2c65b220075aceb18dd2c</id>
<content type='text'>
Factor out a copy_siginfo_to_external32 helper from
copy_siginfo_to_user32 that fills out the compat_siginfo, but does so
on a kernel space data structure.  With that we can let architectures
override copy_siginfo_to_user32 with their own implementations using
copy_siginfo_to_external32.  That allows moving the x32 SIGCHLD purely
to x86 architecture code.

As a nice side effect copy_siginfo_to_external32 also comes in handy
for avoiding a set_fs() call in the coredump code later on.

Contains improvements from Eric W. Biederman &lt;ebiederm@xmission.com&gt;
and Arnd Bergmann &lt;arnd@arndb.de&gt;.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Factor out a copy_siginfo_to_external32 helper from
copy_siginfo_to_user32 that fills out the compat_siginfo, but does so
on a kernel space data structure.  With that we can let architectures
override copy_siginfo_to_user32 with their own implementations using
copy_siginfo_to_external32.  That allows moving the x32 SIGCHLD purely
to x86 architecture code.

As a nice side effect copy_siginfo_to_external32 also comes in handy
for avoiding a set_fs() call in the coredump code later on.

Contains improvements from Eric W. Biederman &lt;ebiederm@xmission.com&gt;
and Arnd Bergmann &lt;arnd@arndb.de&gt;.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2020-04-23T20:30:18+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-04-23T20:30:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b4f633221f0aeac102e463a4be46a643b2e3b819'/>
<id>b4f633221f0aeac102e463a4be46a643b2e3b819</id>
<content type='text'>
Pull SIGCHLD fix from Eric Biederman:
 "Christof Meerwald reported that do_notify_parent has not been
  successfully populating si_pid and si_uid for multi-threaded
  processes.

  This is the one-liner fix. Strictly speaking a one-liner plus
  comment"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Avoid corrupting si_pid and si_uid in do_notify_parent
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull SIGCHLD fix from Eric Biederman:
 "Christof Meerwald reported that do_notify_parent has not been
  successfully populating si_pid and si_uid for multi-threaded
  processes.

  This is the one-liner fix. Strictly speaking a one-liner plus
  comment"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  signal: Avoid corrupting si_pid and si_uid in do_notify_parent
</pre>
</div>
</content>
</entry>
<entry>
<title>signal: Avoid corrupting si_pid and si_uid in do_notify_parent</title>
<updated>2020-04-21T14:55:30+00:00</updated>
<author>
<name>Eric W. Biederman</name>
<email>ebiederm@xmission.com</email>
</author>
<published>2020-04-20T16:41:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=61e713bdca3678e84815f2427f7a063fc353a1fc'/>
<id>61e713bdca3678e84815f2427f7a063fc353a1fc</id>
<content type='text'>
Christof Meerwald &lt;cmeerw@cmeerw.org&gt; writes:
&gt; Hi,
&gt;
&gt; this is probably related to commit
&gt; 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
&gt; fixups of si_pid and si_uid).
&gt;
&gt; With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
&gt; properly set si_pid field - this seems to happen for multi-threaded
&gt; child processes.
&gt;
&gt; A simple test program (based on the sample from the signalfd man page):
&gt;
&gt; #include &lt;sys/signalfd.h&gt;
&gt; #include &lt;signal.h&gt;
&gt; #include &lt;unistd.h&gt;
&gt; #include &lt;spawn.h&gt;
&gt; #include &lt;stdlib.h&gt;
&gt; #include &lt;stdio.h&gt;
&gt;
&gt; #define handle_error(msg) \
&gt;     do { perror(msg); exit(EXIT_FAILURE); } while (0)
&gt;
&gt; int main(int argc, char *argv[])
&gt; {
&gt;   sigset_t mask;
&gt;   int sfd;
&gt;   struct signalfd_siginfo fdsi;
&gt;   ssize_t s;
&gt;
&gt;   sigemptyset(&amp;mask);
&gt;   sigaddset(&amp;mask, SIGCHLD);
&gt;
&gt;   if (sigprocmask(SIG_BLOCK, &amp;mask, NULL) == -1)
&gt;     handle_error("sigprocmask");
&gt;
&gt;   pid_t chldpid;
&gt;   char *chldargv[] = { "./sfdclient", NULL };
&gt;   posix_spawn(&amp;chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
&gt;
&gt;   sfd = signalfd(-1, &amp;mask, 0);
&gt;   if (sfd == -1)
&gt;     handle_error("signalfd");
&gt;
&gt;   for (;;) {
&gt;     s = read(sfd, &amp;fdsi, sizeof(struct signalfd_siginfo));
&gt;     if (s != sizeof(struct signalfd_siginfo))
&gt;       handle_error("read");
&gt;
&gt;     if (fdsi.ssi_signo == SIGCHLD) {
&gt;       printf("Got SIGCHLD %d %d %d %d\n",
&gt;           fdsi.ssi_status, fdsi.ssi_code,
&gt;           fdsi.ssi_uid, fdsi.ssi_pid);
&gt;       return 0;
&gt;     } else {
&gt;       printf("Read unexpected signal\n");
&gt;     }
&gt;   }
&gt; }
&gt;
&gt;
&gt; and a multi-threaded client to test with:
&gt;
&gt; #include &lt;unistd.h&gt;
&gt; #include &lt;pthread.h&gt;
&gt;
&gt; void *f(void *arg)
&gt; {
&gt;   sleep(100);
&gt; }
&gt;
&gt; int main()
&gt; {
&gt;   pthread_t t[8];
&gt;
&gt;   for (int i = 0; i != 8; ++i)
&gt;   {
&gt;     pthread_create(&amp;t[i], NULL, f, NULL);
&gt;   }
&gt; }
&gt;
&gt; I tried to do a bit of debugging and what seems to be happening is
&gt; that
&gt;
&gt;   /* From an ancestor pid namespace? */
&gt;   if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
&gt;
&gt; fails inside task_pid_nr_ns because the check for "pid_alive" fails.
&gt;
&gt; This code seems to be called from do_notify_parent and there we
&gt; actually have "tsk != current" (I am assuming both are threads of the
&gt; current process?)

I instrumented the code with a warning and received the following backtrace:
&gt; WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15
&gt; Modules linked in:
&gt; CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924
&gt; Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
&gt; RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15
&gt; Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 &lt;0f&gt; 0b e9 3fd
&gt; RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046
&gt; RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4
&gt; RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29
&gt; RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001
&gt; R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140
&gt; R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140
&gt; FS:  0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
&gt; CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&gt; CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0
&gt; Call Trace:
&gt;  send_signal+0x1c8/0x310
&gt;  do_notify_parent+0x50f/0x550
&gt;  release_task.part.21+0x4fd/0x620
&gt;  do_exit+0x6f6/0xaf0
&gt;  do_group_exit+0x42/0xb0
&gt;  get_signal+0x13b/0xbb0
&gt;  do_signal+0x2b/0x670
&gt;  ? __audit_syscall_exit+0x24d/0x2b0
&gt;  ? rcu_read_lock_sched_held+0x4d/0x60
&gt;  ? kfree+0x24c/0x2b0
&gt;  do_syscall_64+0x176/0x640
&gt;  ? trace_hardirqs_off_thunk+0x1a/0x1c
&gt;  entry_SYSCALL_64_after_hwframe+0x49/0xb3

The immediate problem is, as Christof noticed, that "pid_alive(current) == false".
This happens because do_notify_parent is called from the last thread to exit
in a process after that thread has been reaped.

The bigger issue is that do_notify_parent can be called from any
process that manages to wait on a thread of a multi-threaded process
from wait_task_zombie.  So any logic based upon current for
do_notify_parent is just nonsense, as current can be pretty much
anything.

So change do_notify_parent to call __send_signal directly.

Inspecting the code it appears this problem has existed since the pid
namespace support started handling this case in 2.6.30.  This fix only
backports to 7a0cf094944e ("signal: Correct namespace fixups of si_pid and si_uid")
where the problem logic was moved out of __send_signal and into send_signal.

Cc: stable@vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Ref: 921cf9f63089 ("signals: protect cinit from unblocked SIG_DFL signals")
Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/
Reported-by: Christof Meerwald &lt;cmeerw@cmeerw.org&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Christof Meerwald &lt;cmeerw@cmeerw.org&gt; writes:
&gt; Hi,
&gt;
&gt; this is probably related to commit
&gt; 7a0cf094944e2540758b7f957eb6846d5126f535 (signal: Correct namespace
&gt; fixups of si_pid and si_uid).
&gt;
&gt; With a 5.6.5 kernel I am seeing SIGCHLD signals that don't include a
&gt; properly set si_pid field - this seems to happen for multi-threaded
&gt; child processes.
&gt;
&gt; A simple test program (based on the sample from the signalfd man page):
&gt;
&gt; #include &lt;sys/signalfd.h&gt;
&gt; #include &lt;signal.h&gt;
&gt; #include &lt;unistd.h&gt;
&gt; #include &lt;spawn.h&gt;
&gt; #include &lt;stdlib.h&gt;
&gt; #include &lt;stdio.h&gt;
&gt;
&gt; #define handle_error(msg) \
&gt;     do { perror(msg); exit(EXIT_FAILURE); } while (0)
&gt;
&gt; int main(int argc, char *argv[])
&gt; {
&gt;   sigset_t mask;
&gt;   int sfd;
&gt;   struct signalfd_siginfo fdsi;
&gt;   ssize_t s;
&gt;
&gt;   sigemptyset(&amp;mask);
&gt;   sigaddset(&amp;mask, SIGCHLD);
&gt;
&gt;   if (sigprocmask(SIG_BLOCK, &amp;mask, NULL) == -1)
&gt;     handle_error("sigprocmask");
&gt;
&gt;   pid_t chldpid;
&gt;   char *chldargv[] = { "./sfdclient", NULL };
&gt;   posix_spawn(&amp;chldpid, "./sfdclient", NULL, NULL, chldargv, NULL);
&gt;
&gt;   sfd = signalfd(-1, &amp;mask, 0);
&gt;   if (sfd == -1)
&gt;     handle_error("signalfd");
&gt;
&gt;   for (;;) {
&gt;     s = read(sfd, &amp;fdsi, sizeof(struct signalfd_siginfo));
&gt;     if (s != sizeof(struct signalfd_siginfo))
&gt;       handle_error("read");
&gt;
&gt;     if (fdsi.ssi_signo == SIGCHLD) {
&gt;       printf("Got SIGCHLD %d %d %d %d\n",
&gt;           fdsi.ssi_status, fdsi.ssi_code,
&gt;           fdsi.ssi_uid, fdsi.ssi_pid);
&gt;       return 0;
&gt;     } else {
&gt;       printf("Read unexpected signal\n");
&gt;     }
&gt;   }
&gt; }
&gt;
&gt;
&gt; and a multi-threaded client to test with:
&gt;
&gt; #include &lt;unistd.h&gt;
&gt; #include &lt;pthread.h&gt;
&gt;
&gt; void *f(void *arg)
&gt; {
&gt;   sleep(100);
&gt; }
&gt;
&gt; int main()
&gt; {
&gt;   pthread_t t[8];
&gt;
&gt;   for (int i = 0; i != 8; ++i)
&gt;   {
&gt;     pthread_create(&amp;t[i], NULL, f, NULL);
&gt;   }
&gt; }
&gt;
&gt; I tried to do a bit of debugging and what seems to be happening is
&gt; that
&gt;
&gt;   /* From an ancestor pid namespace? */
&gt;   if (!task_pid_nr_ns(current, task_active_pid_ns(t))) {
&gt;
&gt; fails inside task_pid_nr_ns because the check for "pid_alive" fails.
&gt;
&gt; This code seems to be called from do_notify_parent and there we
&gt; actually have "tsk != current" (I am assuming both are threads of the
&gt; current process?)

I instrumented the code with a warning and received the following backtrace:
&gt; WARNING: CPU: 0 PID: 777 at kernel/pid.c:501 __task_pid_nr_ns.cold.6+0xc/0x15
&gt; Modules linked in:
&gt; CPU: 0 PID: 777 Comm: sfdclient Not tainted 5.7.0-rc1userns+ #2924
&gt; Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
&gt; RIP: 0010:__task_pid_nr_ns.cold.6+0xc/0x15
&gt; Code: ff 66 90 48 83 ec 08 89 7c 24 04 48 8d 7e 08 48 8d 74 24 04 e8 9a b6 44 00 48 83 c4 08 c3 48 c7 c7 59 9f ac 82 e8 c2 c4 04 00 &lt;0f&gt; 0b e9 3fd
&gt; RSP: 0018:ffffc9000042fbf8 EFLAGS: 00010046
&gt; RAX: 000000000000000c RBX: 0000000000000000 RCX: ffffc9000042faf4
&gt; RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81193d29
&gt; RBP: ffffc9000042fc18 R08: 0000000000000000 R09: 0000000000000001
&gt; R10: 000000100f938416 R11: 0000000000000309 R12: ffff8880b941c140
&gt; R13: 0000000000000000 R14: 0000000000000000 R15: ffff8880b941c140
&gt; FS:  0000000000000000(0000) GS:ffff8880bca00000(0000) knlGS:0000000000000000
&gt; CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
&gt; CR2: 00007f2e8c0a32e0 CR3: 0000000002e10000 CR4: 00000000000006f0
&gt; Call Trace:
&gt;  send_signal+0x1c8/0x310
&gt;  do_notify_parent+0x50f/0x550
&gt;  release_task.part.21+0x4fd/0x620
&gt;  do_exit+0x6f6/0xaf0
&gt;  do_group_exit+0x42/0xb0
&gt;  get_signal+0x13b/0xbb0
&gt;  do_signal+0x2b/0x670
&gt;  ? __audit_syscall_exit+0x24d/0x2b0
&gt;  ? rcu_read_lock_sched_held+0x4d/0x60
&gt;  ? kfree+0x24c/0x2b0
&gt;  do_syscall_64+0x176/0x640
&gt;  ? trace_hardirqs_off_thunk+0x1a/0x1c
&gt;  entry_SYSCALL_64_after_hwframe+0x49/0xb3

The immediate problem is, as Christof noticed, that "pid_alive(current) == false".
This happens because do_notify_parent is called from the last thread to exit
in a process after that thread has been reaped.

The bigger issue is that do_notify_parent can be called from any
process that manages to wait on a thread of a multi-threaded process
from wait_task_zombie.  So any logic based upon current for
do_notify_parent is just nonsense, as current can be pretty much
anything.

So change do_notify_parent to call __send_signal directly.

Inspecting the code it appears this problem has existed since the pid
namespace support started handling this case in 2.6.30.  This fix only
backports to 7a0cf094944e ("signal: Correct namespace fixups of si_pid and si_uid")
where the problem logic was moved out of __send_signal and into send_signal.

Cc: stable@vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Ref: 921cf9f63089 ("signals: protect cinit from unblocked SIG_DFL signals")
Link: https://lore.kernel.org/lkml/20200419201336.GI22017@edge.cmeerw.net/
Reported-by: Christof Meerwald &lt;cmeerw@cmeerw.org&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Signed-off-by: "Eric W. Biederman" &lt;ebiederm@xmission.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>signal: use kill_proc_info instead of kill_pid_info in kill_something_info</title>
<updated>2020-04-12T20:46:34+00:00</updated>
<author>
<name>Zhiqiang Liu</name>
<email>liuzhiqiang26@huawei.com</email>
</author>
<published>2020-03-30T02:44:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3075afdf15b89a063f8d31c0db08a50472bb7faf'/>
<id>3075afdf15b89a063f8d31c0db08a50472bb7faf</id>
<content type='text'>
signal.c already provides kill_proc_info(), so use it in
kill_something_info() instead of calling kill_pid_info() directly.

Signed-off-by: Zhiqiang Liu &lt;liuzhiqiang26@huawei.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Link: https://lore.kernel.org/r/80236965-f0b5-c888-95ff-855bdec75bb3@huawei.com
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
signal.c already provides kill_proc_info(), so use it in
kill_something_info() instead of calling kill_pid_info() directly.

Signed-off-by: Zhiqiang Liu &lt;liuzhiqiang26@huawei.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Link: https://lore.kernel.org/r/80236965-f0b5-c888-95ff-855bdec75bb3@huawei.com
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>signal: check sig before setting info in kill_pid_usb_asyncio</title>
<updated>2020-04-12T20:46:34+00:00</updated>
<author>
<name>Zhiqiang Liu</name>
<email>liuzhiqiang26@huawei.com</email>
</author>
<published>2020-03-30T02:18:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=eaec2b0bd30690575c581eebffae64bfb7f684ac'/>
<id>eaec2b0bd30690575c581eebffae64bfb7f684ac</id>
<content type='text'>
In kill_pid_usb_asyncio(), if the signal is not valid, we do not need to
set up the info struct.

Signed-off-by: Zhiqiang Liu &lt;liuzhiqiang26@huawei.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Link: https://lore.kernel.org/r/f525fd08-1cf7-fb09-d20c-4359145eb940@huawei.com
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In kill_pid_usb_asyncio(), if the signal is not valid, we do not need to
set up the info struct.

Signed-off-by: Zhiqiang Liu &lt;liuzhiqiang26@huawei.com&gt;
Acked-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
Link: https://lore.kernel.org/r/f525fd08-1cf7-fb09-d20c-4359145eb940@huawei.com
Signed-off-by: Christian Brauner &lt;christian.brauner@ubuntu.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace</title>
<updated>2020-04-02T18:22:17+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2020-04-02T18:22:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d987ca1c6b7e22fbd30664111e85cec7aa66000d'/>
<id>d987ca1c6b7e22fbd30664111e85cec7aa66000d</id>
<content type='text'>
Pull exec/proc updates from Eric Biederman:
 "This contains two significant pieces of work: the work to sort out
  proc_flush_task, and the work to solve a deadlock between strace and
  exec.

  Fixing proc_flush_task so that it no longer requires a persistent
  mount makes improvements to proc possible. The removal of the
  persistent mount solves an old regression that caused the hidepid
  mount option to only work on remount not on mount. The regression was
  found and reported by the Android folks. This further allows Alexey
  Gladkov's work making proc mount options specific to an individual
  mount of proc to move forward.

  The work on exec starts solving a long standing issue with exec that
  it takes mutexes of blocking userspace applications, which makes exec
  extremely deadlock prone. For the moment this adds a second mutex with
  a narrower scope that handles all of the easy cases. Which makes the
  tricky cases easy to spot. With a little luck the code to solve those
  deadlocks will be ready by next merge window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (25 commits)
  signal: Extend exec_id to 64bits
  pidfd: Use new infrastructure to fix deadlocks in execve
  perf: Use new infrastructure to fix deadlocks in execve
  proc: io_accounting: Use new infrastructure to fix deadlocks in execve
  proc: Use new infrastructure to fix deadlocks in execve
  kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve
  kernel: doc: remove outdated comment cred.c
  mm: docs: Fix a comment in process_vm_rw_core
  selftests/ptrace: add test cases for dead-locks
  exec: Fix a deadlock in strace
  exec: Add exec_update_mutex to replace cred_guard_mutex
  exec: Move exec_mmap right after de_thread in flush_old_exec
  exec: Move cleanup of posix timers on exec out of de_thread
  exec: Factor unshare_sighand out of de_thread and call it separately
  exec: Only compute current once in flush_old_exec
  pid: Improve the comment about waiting in zap_pid_ns_processes
  proc: Remove the now unnecessary internal mount of proc
  uml: Create a private mount of proc for mconsole
  uml: Don't consult current to find the proc_mnt in mconsole_proc
  proc: Use a list of inodes to flush from proc
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull exec/proc updates from Eric Biederman:
 "This contains two significant pieces of work: the work to sort out
  proc_flush_task, and the work to solve a deadlock between strace and
  exec.

  Fixing proc_flush_task so that it no longer requires a persistent
  mount makes improvements to proc possible. The removal of the
  persistent mount solves an old regression that caused the hidepid
  mount option to only work on remount not on mount. The regression was
  found and reported by the Android folks. This further allows Alexey
  Gladkov's work making proc mount options specific to an individual
  mount of proc to move forward.

  The work on exec starts solving a long standing issue with exec that
  it takes mutexes of blocking userspace applications, which makes exec
  extremely deadlock prone. For the moment this adds a second mutex with
  a narrower scope that handles all of the easy cases. Which makes the
  tricky cases easy to spot. With a little luck the code to solve those
  deadlocks will be ready by next merge window"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (25 commits)
  signal: Extend exec_id to 64bits
  pidfd: Use new infrastructure to fix deadlocks in execve
  perf: Use new infrastructure to fix deadlocks in execve
  proc: io_accounting: Use new infrastructure to fix deadlocks in execve
  proc: Use new infrastructure to fix deadlocks in execve
  kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve
  kernel: doc: remove outdated comment cred.c
  mm: docs: Fix a comment in process_vm_rw_core
  selftests/ptrace: add test cases for dead-locks
  exec: Fix a deadlock in strace
  exec: Add exec_update_mutex to replace cred_guard_mutex
  exec: Move exec_mmap right after de_thread in flush_old_exec
  exec: Move cleanup of posix timers on exec out of de_thread
  exec: Factor unshare_sighand out of de_thread and call it separately
  exec: Only compute current once in flush_old_exec
  pid: Improve the comment about waiting in zap_pid_ns_processes
  proc: Remove the now unnecessary internal mount of proc
  uml: Create a private mount of proc for mconsole
  uml: Don't consult current to find the proc_mnt in mconsole_proc
  proc: Use a list of inodes to flush from proc
  ...
</pre>
</div>
</content>
</entry>
</feed>
