linux.git/kernel/signal.c, branch v7.1

signal: clear JOBCTL_PENDING_MASK for caller in zap_other_threads()

2026-05-22T13:19:31+00:00

When a multi-threaded process receives a stop signal (e.g., SIGSTOP),
do_signal_stop() sets JOBCTL_STOP_PENDING and JOBCTL_STOP_CONSUME on all
threads and sets signal->group_stop_count to the number of threads. If
one of the threads concurrently calls execve(), de_thread() invokes
zap_other_threads() to kill all other threads. zap_other_threads()
aborts the pending group stop by resetting signal->group_stop_count to 0
and clears the JOBCTL_PENDING_MASK for all other threads. However, it
fails to clear the job control flags for the calling thread.

When execve() completes, the calling thread returns to user mode and
checks for pending signals. Seeing the stale JOBCTL_STOP_PENDING flag,
it calls do_signal_stop(), which invokes task_participate_group_stop().
Since JOBCTL_STOP_CONSUME is still set, it attempts to decrement the
already-zero signal->group_stop_count, triggering a warning:

sig->group_stop_count == 0
WARNING: CPU: 1 PID: 6475 at kernel/signal.c:373
task_participate_group_stop+0x215/0x2d0
Call Trace:
 
 do_signal_stop+0x3be/0x5c0 kernel/signal.c:2619
 get_signal+0xa8c/0x1330 kernel/signal.c:2884
 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop+0x8c/0x4d0 kernel/entry/common.c:98
 do_syscall_64+0x33e/0xf80 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
 

Fix this race condition by clearing the JOBCTL_PENDING_MASK for the
calling thread in zap_other_threads(), ensuring it does not retain any
stale job control state after the thread group is destroyed. This aligns
with other functions that tear down a thread group and abort group
stops, such as zap_process() and complete_signal(), which correctly
clear these flags for all threads including the current one.

Fixes: 39efa3ef3a37 ("signal: Use GROUP_STOP_PENDING to stop once for a single group stop")
Assisted-by: Gemini:gemini-3.1-pro-preview Gemini:gemini-3-flash-preview syzbot
Reported-by: syzbot+b109633ea805cac54a61@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b109633ea805cac54a61
Link: https://syzkaller.appspot.com/ai_job?id=d70208cc-862b-4fe3-bf02-3031e10cd0b3
Signed-off-by: Aleksandr Nogikh 
Link: https://patch.msgid.link/20260521142240.2973022-1-nogikh@google.com
Signed-off-by: Christian Brauner (Amutable)

Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2026-04-17T03:11:56+00:00

Pull non-MM updates from Andrew Morton:

 - "pid: make sub-init creation retryable" (Oleg Nesterov)

   Make creation of init in a new namespace more robust by clearing away
   some historical cruft which is no longer needed. Also some
   documentation fixups

 - "selftests/fchmodat2: Error handling and general" (Mark Brown)

   Fix and a cleanup for the fchmodat2() syscall selftest

 - "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko)

 - "hung_task: Provide runtime reset interface for hung task detector"
   (Aaron Tomlin)

   Give administrators the ability to zero out
   /proc/sys/kernel/hung_task_detect_count

 - "tools/getdelays: use the static UAPI headers from
   tools/include/uapi" (Thomas Weißschuh)

   Teach getdelays to use the in-kernel UAPI headers rather than the
   system-provided ones

 - "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta)

   Several cleanups and fixups to the hardlockup detector code and its
   documentation

 - "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law)

   A couple of small/theoretical fixes in the bch code

 - "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo)

 - "cleanup the RAID5 XOR library" (Christoph Hellwig)

   A quite far-reaching cleanup to this code. I can't do better than to
   quote Christoph:

     "The XOR library used for the RAID5 parity is a bit of a mess right
      now. The main file sits in crypto/ despite not being cryptography
      and not using the crypto API, with the generic implementations
      sitting in include/asm-generic and the arch implementations
      sitting in an asm/ header in theory. The latter doesn't work for
      many cases, so architectures often build the code directly into
      the core kernel, or create another module for the architecture
      code.

      Change this to a single module in lib/ that also contains the
      architecture optimizations, similar to the library work Eric
      Biggers has done for the CRC and crypto libraries later. After
      that it changes to better calling conventions that allow for
      smarter architecture implementations (although none is contained
      here yet), and uses static_call to avoid indirection function call
      overhead"

 - "lib/list_sort: Clean up list_sort() scheduling workarounds"
   (Kuan-Wei Chiu)

   Clean up this library code by removing a hacky thing which was added
   for UBIFS, which UBIFS doesn't actually need

 - "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt)

   Fix a few bugs in the scatterlist code, add in-kernel tests for the
   now-fixed bugs and fix a leak in the test itself

 - "kdump: Enable LUKS-encrypted dump target support in ARM64 and
   PowerPC" (Coiby Xu)

   Enable support of the LUKS-encrypted device dump target on arm64 and
   powerpc

 - "ocfs2: consolidate extent list validation into block read callbacks"
   (Joseph Qi)

   Cleanup, simplify, and make more robust ocfs2's validation of extent
   list fields (Kernel test robot loves mounting corrupted fs images!)

* tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits)
  ocfs2: validate group add input before caching
  ocfs2: validate bg_bits during freefrag scan
  ocfs2: fix listxattr handling when the buffer is full
  doc: watchdog: fix typos etc
  update Sean's email address
  ocfs2: use get_random_u32() where appropriate
  ocfs2: split transactions in dio completion to avoid credit exhaustion
  ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path()
  ocfs2: validate extent block list fields during block read
  ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec()
  ocfs2: validate dx_root extent list fields during block read
  ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY
  ocfs2: handle invalid dinode in ocfs2_group_extend
  .get_maintainer.ignore: add Askar
  ocfs2: validate bg_list extent bounds in discontig groups
  checkpatch: exclude forward declarations of const structs
  tools/accounting: handle truncated taskstats netlink messages
  taskstats: set version in TGID exit notifications
  ocfs2/heartbeat: fix slot mapping rollback leaks on error paths
  arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel
  ...

Merge tag 'kernel-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

2026-04-15T03:28:40+00:00

Pull pid_namespace updates from Christian Brauner:

 - pid_namespace: make init creation more flexible

   Annotate ->child_reaper accesses with {READ,WRITE}_ONCE() to protect
   the unlocked readers from cpu/compiler reordering, and enforce that
   pid 1 in a pid namespace is always the first allocated pid (the
   set_tid path already required this).

   On top of that, allow opening pid_for_children before the pid
   namespace init has been created. This lets one process create the pid
   namespace and a different process create the init via setns(), which
   makes clone3(set_tid) usable in all cases evenly and is particularly
   useful to CRIU when restoring nested containers.

   A new selftest covers both the basic create-pidns-then-init flow and
   the cross-process variant, and a MAINTAINERS entry for the pid
   namespace code is added.

 - unrelated signal cleanup: update outdated comment for the removed
   freezable_schedule()

* tag 'kernel-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  signal: update outdated comment for removed freezable_schedule()
  MAINTAINERS: add a pid namespace entry
  selftests: Add tests for creating pidns init via setns
  pid_namespace: allow opening pid_for_children before init was created
  pid: check init is created first after idr alloc
  pid_namespace: avoid optimization of accesses to ->child_reaper

do_notify_parent: sanitize the valid_signal() checks

2026-03-28T04:19:48+00:00

Now that kernel_clone() checks valid_signal(args->exit_signal), the "sig"
argument of do_notify_parent() must always be valid or we have a bug.

However, do_notify_parent() only checks that sig != -1 at the start, then
it does another valid_signal() check before __send_signal_locked().

This is confusing.  Change do_notify_parent() to WARN and return early if
valid_signal(sig) is false.

Link: https://lkml.kernel.org/r/abld-ilvMEZ7VgMw@redhat.com
Signed-off-by: Oleg Nesterov 
Acked-by: Deepanshu Kartikey 
Signed-off-by: Andrew Morton

complete_signal: kill always-true "core_state || !SIGNAL_GROUP_EXIT" check

2026-03-28T04:19:35+00:00

The "(signal->core_state || !(signal->flags & SIGNAL_GROUP_EXIT))" check
in complete_signal() is not obvious at all, and in fact it only adds
unnecessary confusion: this condition is always true.

prepare_signal() does:

	if (signal->flags & SIGNAL_GROUP_EXIT) {
		if (signal->core_state)
			return sig == SIGKILL;
		/*
		 * The process is in the middle of dying, drop the signal.
		 */
		return false;
	}

This means that "!signal->core_state && (signal->flags &
SIGNAL_GROUP_EXIT)" in complete_signal() is never possible.

If SIGNAL_GROUP_EXIT is set, prepare_signal() can only return true if
signal->core_state is not NULL.

Link: https://lkml.kernel.org/r/aZsfkDhnqJ4s1oTs@redhat.com
Signed-off-by: Oleg Nesterov 
Cc: Christian Brauner 
Cc: Kees Cook 
Cc: Mateusz Guzik 
Cc; Deepanshu Kartikey 
Signed-off-by: Andrew Morton

exit: kill unnecessary thread_group_leader() checks in exit_notify() and do_notify_parent()

2026-03-28T04:19:34+00:00

thread_group_empty(tsk) is only possible if tsk is a group leader, and
thread_group_empty() already does the thread_group_leader() check.

So it makes no sense to check "thread_group_leader() &&
thread_group_empty()"; thread_group_empty() alone is enough.

Link: https://lkml.kernel.org/r/aZsfeegKZPZZszJh@redhat.com
Signed-off-by: Oleg Nesterov 
Cc: Christian Brauner 
Cc: Mateusz Guzik 
Cc: Kees Cook 
Cc; Deepanshu Kartikey 
Signed-off-by: Andrew Morton

signal: update outdated comment for removed freezable_schedule()

2026-03-23T15:38:31+00:00

The function freezable_schedule() was removed in commit
f5d39b020809 ("freezer,sched: Rewrite core freezer logic"), which
rewrote the freezer to use a dedicated TASK_FROZEN state instead.

do_signal_stop() and ptrace_stop() no longer call
freezable_schedule(); they now set TASK_STOPPED/TASK_TRACED and the
freezer handles those states directly via TASK_FROZEN.  Update the
comment to reflect the current mechanism.

Assisted-by: unnamed:deepseek-v3.2 coccinelle
Signed-off-by: Kexin Sun 
Link: https://patch.msgid.link/20260321105927.7979-1-kexinsun@smail.nju.edu.cn
Signed-off-by: Christian Brauner

clone: add CLONE_AUTOREAP

2026-03-11T22:14:02+00:00

Add a new clone3() flag CLONE_AUTOREAP that makes a child process
auto-reap on exit without ever becoming a zombie. This is a per-process
property in contrast to the existing auto-reap mechanism via
SA_NOCLDWAIT or SIG_IGN for SIGCHLD which applies to all children of a
given parent.

Currently the only way to automatically reap children is to set
SA_NOCLDWAIT or SIG_IGN on SIGCHLD. This is a parent-scoped property
affecting all children which makes it unsuitable for libraries or
applications that need selective auto-reaping of specific children while
still being able to wait() on others.

CLONE_AUTOREAP stores an autoreap flag in the child's signal_struct.
When the child exits do_notify_parent() checks this flag and causes
exit_notify() to transition the task directly to EXIT_DEAD. Since the
flag lives on the child it survives reparenting: if the original parent
exits and the child is reparented to a subreaper or init the child still
auto-reaps when it eventually exits.

CLONE_AUTOREAP can be combined with CLONE_PIDFD to allow the parent to
monitor the child's exit via poll() and retrieve exit status via
PIDFD_GET_INFO. Without CLONE_PIDFD it provides a fire-and-forget
pattern where the parent simply doesn't care about the child's exit
status. No exit signal is delivered so exit_signal must be zero.

CLONE_AUTOREAP is rejected in combination with CLONE_PARENT. If a
CLONE_AUTOREAP child were to clone(CLONE_PARENT) the new grandchild
would inherit exit_signal == 0 from the autoreap parent's group leader
but without signal->autoreap. This grandchild would become a zombie that
never sends a signal and is never autoreaped - confusing and arguably
broken behavior.

The flag is not inherited by the autoreap process's own children. Each
child that should be autoreaped must be explicitly created with
CLONE_AUTOREAP.

Link: https://github.com/uapi-group/kernel-features/issues/45
Link: https://patch.msgid.link/20260226-work-pidfs-autoreap-v5-1-d148b984a989@kernel.org
Reviewed-by: Oleg Nesterov 
Signed-off-by: Christian Brauner

compiler-context-analysis: Remove __cond_lock() function-like helper

2026-01-05T15:43:33+00:00

As discussed in [1], removing __cond_lock() will improve the readability
of trylock code. Now that Sparse context tracking support has been
removed, we can also remove __cond_lock().

Change existing APIs to either drop __cond_lock() completely, or make
use of the __cond_acquires() function attribute instead.

In particular, spinlock and rwlock implementations required switching
over to inline helpers rather than statement-expressions for their
trylock_* variants.

Suggested-by: Peter Zijlstra 
Signed-off-by: Marco Elver 
Signed-off-by: Peter Zijlstra (Intel) 
Link: https://lore.kernel.org/all/20250207082832.GU7145@noisy.programming.kicks-ass.net/ [1]
Link: https://patch.msgid.link/20251219154418.3592607-25-elver@google.com

signal: Move MMCID exit out of sighand lock

2025-11-25T18:45:40+00:00

There is no need anymore to keep this under sighand lock as the current
code and the upcoming replacement are not depending on the exit state of a
task anymore.

That allows to use a mutex in the exit path.

Signed-off-by: Thomas Gleixner 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Thomas Gleixner 
Reviewed-by: Mathieu Desnoyers 
Link: https://patch.msgid.link/20251119172549.706439391@linutronix.de