| author | Christian Brauner <brauner@kernel.org> | 2026-03-02 10:53:15 +0100 |
|---|---|---|
| committer | Christian Brauner <brauner@kernel.org> | 2026-03-12 13:34:59 +0100 |
| commit | 43dcce3f2a6209898f31d1ef99e0a4a1335ebb67 (patch) | |
| tree | 6dc999bf0cf144862b77d1a7734dcc465d7f1efd /rust/kernel/ptr/git@git.tavy.me:linux.git | |
| parent | 4e9f7592b6f5fe4929b2d755785788acba123db5 (diff) | |
| parent | bb5c17bc863d1ac9ee0d51d300d5399d632fe69f (diff) | |
Merge patch series "move_mount: expand MOVE_MOUNT_BENEATH"
Christian Brauner <brauner@kernel.org> says:
I'm too tired now to keep refining this but I think it's in good enough
shape for review.
Allow MOVE_MOUNT_BENEATH to target the caller's rootfs, making it possible to
switch out the rootfs without pivot_root(2).
The traditional approach to switching the rootfs involves pivot_root(2)
or a chroot_fs_refs()-based mechanism that atomically updates fs->root
for all tasks sharing the same fs_struct. This global update can race with
fork(), unshare(CLONE_FS), and setns().
This series instead decomposes root-switching into individually atomic,
locally-scoped steps:
    fd_tree = open_tree(-EBADF, "/newroot",
                        OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);
    fchdir(fd_tree);
    move_mount(fd_tree, "", AT_FDCWD, "/",
               MOVE_MOUNT_BENEATH | MOVE_MOUNT_F_EMPTY_PATH);
    chroot(".");
    umount2(".", MNT_DETACH);
Since each step only modifies the caller's own state, the
fork/unshare/setns races are eliminated by design.
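The steps above can be written out as a compilable C sketch with error handling. This is a minimal sketch under several assumptions. The helper name switch_rootfs_beneath() is hypothetical. The syscalls are invoked through syscall(2) because the installed libc may not provide open_tree()/move_mount() wrappers. The fallback syscall numbers (the generic ones used by most architectures) and flag values are copied from the kernel UAPI headers and only apply when the installed headers lack them. Finally, the caller needs CAP_SYS_ADMIN over its mount namespace and a kernel that includes this series.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* AT_FDCWD, O_CLOEXEC */
#include <sys/mount.h>   /* umount2(), MNT_DETACH */
#include <sys/syscall.h> /* __NR_* syscall numbers */
#include <unistd.h>      /* syscall(), fchdir(), chroot(), close() */

/* Fallbacks for older headers (values from the kernel UAPI). */
#ifndef __NR_open_tree
#define __NR_open_tree 428
#endif
#ifndef __NR_move_mount
#define __NR_move_mount 429
#endif
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#endif
#ifndef MOVE_MOUNT_BENEATH
#define MOVE_MOUNT_BENEATH 0x00000200
#endif

/* Hypothetical helper: switch the caller's rootfs to `newroot`.
 * Returns 0 on success, or -1 with errno set. If a step after fchdir()
 * fails, the working directory is left changed (not handled here). */
static int switch_rootfs_beneath(const char *newroot)
{
	int fd_tree, saved;

	/* Clone the mount tree at newroot into a detached tree. */
	fd_tree = (int)syscall(__NR_open_tree, -EBADF, newroot,
			       OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);
	if (fd_tree < 0)
		return -1;

	/* Make the detached tree the cwd so that "." refers to it. */
	if (fchdir(fd_tree) < 0)
		goto err;

	/* Attach the tree beneath the mount currently at "/". */
	if (syscall(__NR_move_mount, fd_tree, "", AT_FDCWD, "/",
		    MOVE_MOUNT_BENEATH | MOVE_MOUNT_F_EMPTY_PATH) < 0)
		goto err;

	/* Change only this task's root to the new tree. */
	if (chroot(".") < 0)
		goto err;

	/* Lazily detach the old rootfs, now stacked on top of the new one. */
	if (umount2(".", MNT_DETACH) < 0)
		goto err;

	close(fd_tree);
	return 0;
err:
	saved = errno;
	close(fd_tree);
	errno = saved;
	return -1;
}
```

Every call touches only the calling task's state (its cwd, its root, and the mount table it can see), which is what the cover letter relies on to avoid the fs_struct-wide update.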
A key step to making this possible is removing the locked mount
restriction. Originally, MOVE_MOUNT_BENEATH did not support mounting
beneath a locked mount. A locked mount protects the mount underneath it
from being revealed, and this is a core mechanism of
unshare(CLONE_NEWUSER | CLONE_NEWNS): the mounts in the new mount
namespace become locked. That effectively makes the new mount table
useless, as the caller can never get rid of any of the inherited mounts,
no matter how useless they are.
We can lift this restriction, though, by transferring the locked
property from the top mount to the mount beneath it. This works because
what we care about is protecting the underlying mount, i.e. the parent.
The mount placed between the parent and the top mount takes over from
the top mount the job of protecting the parent. This leaves us free to
remove the locked property from the top mount, which can consequently
be unmounted. For example, if we call:
    unshare(CLONE_NEWUSER | CLONE_NEWNS)
and inherit a clone of procfs on /proc, then we currently cannot
unmount it, as:
    umount -l /proc
fails with EINVAL because the procfs mount is locked.
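The locked-mount transfer described above can be illustrated with a toy model. This is not kernel code. The structs, the helpers toy_umount_top() and toy_mount_beneath(), and the single-mountpoint stack are hypothetical simplifications, and MNT_LOCKED is reduced to a boolean. Unmounting a locked top mount is refused, while mounting beneath it moves the lock onto the new mount.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: the stack of mounts on a single mountpoint.
 * m[0] is the parent mount, m[n - 1] is the topmost mount, and
 * `locked` stands in for MNT_LOCKED. */
#define TOY_MAX_STACK 8

struct toy_mount {
	const char *name;
	bool locked;
};

struct toy_stack {
	struct toy_mount m[TOY_MAX_STACK];
	size_t n;
};

/* Models "umount -l": a locked top mount is refused with EINVAL because
 * removing it would reveal the mount underneath. */
static int toy_umount_top(struct toy_stack *s)
{
	if (s->n < 2 || s->m[s->n - 1].locked)
		return -EINVAL;
	s->n--;
	return 0;
}

/* Models MOVE_MOUNT_BENEATH with the lock transfer: the new mount is
 * slotted in between the parent and the top mount, inherits the top
 * mount's locked state, and the top mount becomes unlocked. */
static int toy_mount_beneath(struct toy_stack *s, const char *name)
{
	assert(s->n <= TOY_MAX_STACK);
	if (s->n < 2)
		return -EINVAL; /* no top mount to slide beneath */
	if (s->n == TOY_MAX_STACK)
		return -ENOSPC;

	struct toy_mount top = s->m[s->n - 1];

	s->m[s->n - 1] = (struct toy_mount){ .name = name, .locked = top.locked };
	top.locked = false;
	s->m[s->n++] = top;
	return 0;
}
```

Starting from a parent mount with a locked procfs mount on top, toy_umount_top() returns -EINVAL. After toy_mount_beneath(&s, "tmpfs"), the tmpfs entry is locked and procfs is not, so toy_umount_top() succeeds and leaves the locked tmpfs on top. The parent is protected throughout.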
After this series, we can do:
    mount --beneath -t tmpfs tmpfs /proc
    umount -l /proc
The first command places a tmpfs mount beneath the procfs mount. The
tmpfs mount becomes locked and the procfs mount becomes unlocked, so the
second command can detach procfs and leave the locked tmpfs on /proc.
This means you can safely modify an inherited mount table after
unprivileged namespace creation.
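The two commands can also be issued from C through the new mount API. This is a minimal sketch under several assumptions. The helper name replace_with_tmpfs_beneath() is hypothetical. The kernel must include this series, and the caller needs CAP_SYS_ADMIN over its mount namespace. The syscalls are invoked through syscall(2) in case libc lacks wrappers, and the fallback syscall numbers and flag values are copied from the kernel UAPI headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* AT_FDCWD */
#include <stddef.h>      /* NULL */
#include <sys/mount.h>   /* umount2(), MNT_DETACH */
#include <sys/syscall.h> /* __NR_* syscall numbers */
#include <unistd.h>      /* syscall(), close() */

/* Fallbacks for older headers (values from the kernel UAPI). */
#ifndef __NR_move_mount
#define __NR_move_mount 429
#endif
#ifndef __NR_fsopen
#define __NR_fsopen 430
#endif
#ifndef __NR_fsconfig
#define __NR_fsconfig 431
#endif
#ifndef __NR_fsmount
#define __NR_fsmount 432
#endif
#ifndef FSOPEN_CLOEXEC
#define FSOPEN_CLOEXEC 0x00000001
#endif
#ifndef FSMOUNT_CLOEXEC
#define FSMOUNT_CLOEXEC 0x00000001
#endif
#ifndef FSCONFIG_CMD_CREATE
#define FSCONFIG_CMD_CREATE 6
#endif
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#endif
#ifndef MOVE_MOUNT_BENEATH
#define MOVE_MOUNT_BENEATH 0x00000200
#endif

/* Hypothetical helper: put a fresh tmpfs beneath the mount on `target`,
 * then lazily unmount the (now unlocked) top mount. Returns 0 or -1. */
static int replace_with_tmpfs_beneath(const char *target)
{
	int fs_fd, mnt_fd = -1, ret = -1, saved;

	/* Create a detached tmpfs mount. */
	fs_fd = (int)syscall(__NR_fsopen, "tmpfs", FSOPEN_CLOEXEC);
	if (fs_fd < 0)
		return -1;
	if (syscall(__NR_fsconfig, fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
		goto out;
	mnt_fd = (int)syscall(__NR_fsmount, fs_fd, FSMOUNT_CLOEXEC, 0);
	if (mnt_fd < 0)
		goto out;

	/* Roughly: mount --beneath -t tmpfs tmpfs <target> */
	if (syscall(__NR_move_mount, mnt_fd, "", AT_FDCWD, target,
		    MOVE_MOUNT_F_EMPTY_PATH | MOVE_MOUNT_BENEATH) < 0)
		goto out;

	/* Roughly: umount -l <target> (detaches the topmost mount) */
	if (umount2(target, MNT_DETACH) < 0)
		goto out;
	ret = 0;
out:
	saved = errno;
	if (mnt_fd >= 0)
		close(mnt_fd);
	close(fs_fd);
	errno = saved;
	return ret;
}
```

With target "/proc" inside a namespace created by unshare(CLONE_NEWUSER | CLONE_NEWNS), this mirrors the two commands. On kernels without this series, the move_mount() step would be expected to fail because the procfs mount is locked.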
With that restriction gone, the series then makes it possible to move a
mount beneath the rootfs, allowing the rootfs to be upgraded.
Removing the locked restriction lets containers created with
unshare(CLONE_NEWUSER | CLONE_NEWNS) safely reshuffle an inherited mount
table, and MOVE_MOUNT_BENEATH on the rootfs makes it possible to switch
out the rootfs instead of using the costly pivot_root(2).
* patches from https://patch.msgid.link/20260224-work-mount-beneath-rootfs-v1-0-8c58bf08488f@kernel.org:
selftests/filesystems: add MOVE_MOUNT_BENEATH rootfs tests
move_mount: allow MOVE_MOUNT_BENEATH on the rootfs
move_mount: transfer MNT_LOCKED
Link: https://patch.msgid.link/20260224-work-mount-beneath-rootfs-v1-0-8c58bf08488f@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
Diffstat (limited to 'rust/kernel/ptr/git@git.tavy.me:linux.git')
0 files changed, 0 insertions, 0 deletions
