linux.git/arch/um/os-Linux, branch v6.16

um: fix SECCOMP 32bit xstate register restore

2025-06-04T09:40:36+00:00

There was a typo that caused the extended FP state to be copied into the
wrong location on 32 bit. On 32 bit we only store the xstate internally
as that already contains everything. However, for compatibility, the
mcontext on 32 bit first contains the legacy FP state and then the
xstate.

The code copied the xstate on top of the legacy FP state instead of
using the correct offset. This offset was already calculated in the
xstate_* variables, so simply switch to those to fix the problem.

With this SECCOMP mode works on 32 bit, so lift the restriction.

Fixes: b1e1bd2e6943 ("um: Add helper functions to get/set state for SECCOMP")
Signed-off-by: Benjamin Berg 
Link: https://patch.msgid.link/20250604081705.934112-1-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg

um: pass FD for memory operations when needed

2025-06-02T14:20:10+00:00

Instead of always sharing the FDs with the userspace process, only hand
over the FDs needed for mmap when required. The idea is that userspace
might be able to force the stub into executing an mmap syscall, however,
it will not be able to manipulate the control flow sufficiently to have
access to an FD that would allow mapping arbitrary memory.

Security wise, we need to be sure that only the expected syscalls are
executed after the kernel sends FDs through the socket. This is
currently not the case, as userspace can trivially jump to the
rt_sigreturn syscall instruction to execute any syscall that the stub is
permitted to do. With this, it can trick the kernel to send the FD,
which in turn allows userspace to freely map any physical memory.

As such, this is currently *not* secure. However, in principle the
approach should be fine with a more strict SECCOMP filter and a careful
review of the stub control flow (as userspace can prepare a stack). With
some care, it is likely possible to extend the security model to SMP if
desired.

Signed-off-by: Benjamin Berg 
Link: https://patch.msgid.link/20250602130052.545733-8-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg

um: Add SECCOMP support detection and initialization

2025-06-02T14:20:01+00:00

This detects seccomp support, sets the global using_seccomp variable and
initilizes the exec registers. The support is only enabled if the
seccomp= kernel parameter is set to either "on" or "auto". With "auto" a
fallback to ptrace mode will happen if initialization failed.

Signed-off-by: Benjamin Berg 
Signed-off-by: Benjamin Berg 
Link: https://patch.msgid.link/20250602130052.545733-7-benjamin@sipsolutions.net
[extend help with Kconfig text from v2, use exit syscall instead of libc,
 remove unneeded mctx_offset assignment, disable on 32-bit for now]
Signed-off-by: Johannes Berg

um: Implement kernel side of SECCOMP based process handling

2025-06-02T13:17:19+00:00

This adds the kernel side of the seccomp based process handling.

Co-authored-by: Johannes Berg 
Signed-off-by: Benjamin Berg 
Signed-off-by: Benjamin Berg 
Link: https://patch.msgid.link/20250602130052.545733-6-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg

um: Track userspace children dying in SECCOMP mode

2025-06-02T13:17:19+00:00

When in seccomp mode, we would hang forever on the futex if a child has
died unexpectedly. In contrast, ptrace mode will notice it and kill the
corresponding thread when it fails to run it.

Fix this issue using a new IRQ that is fired after a SIGCHLD and keeping
an (internal) list of all MMs. In the IRQ handler, find the affected MM
and set its PID to -1 as well as the futex variable to FUTEX_IN_KERN.

This, together with futex returning -EINTR after the signal is
sufficient to implement a race-free detection of a child dying.

Note that this also enables IRQ handling while starting a userspace
process. This should be safe and SECCOMP requires the IRQ in case the
process does not come up properly.

Signed-off-by: Benjamin Berg 
Signed-off-by: Benjamin Berg 
Link: https://patch.msgid.link/20250602130052.545733-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg

um: Move faultinfo extraction into userspace routine

2025-06-02T13:17:19+00:00

The segv handler is called slightly differently depending on whether
PTRACE_FULL_FAULTINFO is set or not (32bit vs. 64bit). The only
difference is that we don't try to pass the registers and instruction
pointer to the segv handler.

It would be good to either document or remove the difference, but I do
not know why this difference exists. And, passing NULL can even result
in a crash.

Signed-off-by: Benjamin Berg 
Link: https://patch.msgid.link/20250602130052.545733-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg

um: Fix tgkill compile error on old host OSes

2025-06-02T09:22:55+00:00

tgkill is a quite old syscall since kernel 2.5.75, but unfortunately glibc
doesn't support it before 2.30. Thus some systems fail to compile the
latest UserMode Linux.

Here is the compile error I encountered when I tried to compile UML in
my system shipped with glibc-2.28.

    CALL    scripts/checksyscalls.sh
    CC      arch/um/os-Linux/sigio.o
  In file included from arch/um/os-Linux/sigio.c:17:
  arch/um/os-Linux/sigio.c: In function ‘write_sigio_thread’:
  arch/um/os-Linux/sigio.c:49:19: error: implicit declaration of function ‘tgkill’; did you mean ‘kill’? [-Werror=implicit-function-declaration]
     CATCH_EINTR(r = tgkill(pid, pid, SIGIO));
                     ^~~~~~
  ./arch/um/include/shared/os.h:21:48: note: in definition of macro ‘CATCH_EINTR’
  #define CATCH_EINTR(expr) while ((errno = 0, ((expr) < 0)) && (errno == EINTR))
                                                ^~~~
  cc1: some warnings being treated as errors

Fix it by Replacing glibc call with raw syscall.

Fixes: 33c9da5dfb18 ("um: Rewrite the sigio workaround based on epoll and tgkill")
Signed-off-by: Yongting Lin 
Link: https://patch.msgid.link/20250527151222.40371-1-linyongting@gmail.com
Signed-off-by: Johannes Berg

um: Remove obsolete legacy network transports

2025-05-05T08:26:59+00:00

These legacy network transports were marked as obsolete in commit
40814b98a570 ("um: Mark non-vector net transports as obsolete").
More than five years have passed since then. Remove these network
transports to reduce the maintenance burden.

Suggested-by: Anton Ivanov 
Signed-off-by: Tiwei Bie 
Acked-By: Anton Ivanov 
Link: https://patch.msgid.link/20250503051710.3286595-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg

um: Rewrite the sigio workaround based on epoll and tgkill

2025-03-20T08:28:44+00:00

The existing sigio workaround implementation removes FDs from the
poll when events are triggered, requiring users to re-add them via
add_sigio_fd() after processing. This introduces a potential race
condition between FD removal in write_sigio_thread() and next_poll
update in __add_sigio_fd(), and is inefficient due to frequent FD
removal and re-addition. Rewrite the implementation based on epoll
and tgkill for improved efficiency and reliability.

Signed-off-by: Tiwei Bie 
Link: https://patch.msgid.link/20250315161910.4082396-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg

um: Prohibit the VM_CLONE flag in run_helper_thread()

2025-03-20T08:26:38+00:00

Directly creating helper threads with VM_CLONE using clone can
compromise the thread safety of errno. Since all these helper
threads have been converted to use os_run_helper_thread(), let's
prevent using this flag in run_helper_thread().

Signed-off-by: Tiwei Bie 
Link: https://patch.msgid.link/20250319135523.97050-5-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg