linux.git - Linux kernel source tree

author	Christian Brauner <brauner@kernel.org>	2026-05-22 16:03:30 +0200
committer	Christian Brauner <brauner@kernel.org>	2026-05-26 11:02:02 +0200
commit	38205ecbe6b6dc47968ad4e9c978e2117720969e
tree	c41998df5cb1af4d6965558bcbc732bd582f721c
parent	4425cd76b5e73ce92bea9dc61a0027ef3d55c9f0

exec: free the old mm outside the exec locks

exec_mmap() installs the new mm and then tears the old one down while
still holding exec_update_lock for writing -- and with cred_guard_mutex
held all the way to setup_new_exec():

	setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm);
	mm_update_next_owner(old_mm);
	mmput(old_mm);

Neither lock is needed for this. exec_update_lock only exists to make
the mm swap atomic with the later commit_creds(), so that
permission-checking readers (proc, ptrace, the futex robust list, perf,
kcmp, mm_access()) never observe the new mm together with the old
credentials. Those readers all operate on task->mm, i.e. the new mm
after the swap; none looks at the detached old mm, its ->owner or
signal->maxrss. cred_guard_mutex guards credential calculation and is
equally irrelevant here.

The cost is real: __mmput() runs exit_mmap() over the entire old address
space and can block in exit_aio() waiting for in-flight AIO, all while
holding exec_update_lock for writing and cred_guard_mutex. For execve()
of a large process this blocks ptrace_attach() and every
exec_update_lock reader for the duration of the teardown.

Stash the old mm in bprm->old_mm and release it from setup_new_exec()
after both locks are dropped. setup_new_exec() still runs before
setup_arg_pages() and the segment mappings, so the old address space is
freed before the new one is populated and peak memory is unchanged.

The ordering constraints are kept: old_mm's mmap_lock is still dropped
in exec_mmap() before mm_update_next_owner() (required since commit
31a78f23bac0 ("mm owner: fix race between swapoff and exit")), and
mm_update_next_owner() still precedes mmput(); both run in the execing
task's context, as mm_update_next_owner() requires.

If exec swaps the mm but fails before setup_new_exec() runs the old mm
would leak, so add a backstop in free_bprm(). The lazy-tlb case
(old_mm == NULL, e.g. kernel_execve()) has no address space to free and
is left in exec_mmap().
Link: https://patch.msgid.link/20260522-work-exit_mm-v1-1-bd32d5a560bb@kernel.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

