linux.git/fs/mount.h, branch v6.17

copy_tree(): don't link the mounts via mnt_list

2025-06-29T23:03:37+00:00

The only place that really needs to be adjusted is commit_tree() -
there we need to iterate through the copy and we might as well
use next_mnt() for that.  However, in case when our tree has been
slid under something already mounted (propagation to a mountpoint
that already has something mounted on it or a 'beneath' move_mount)
we need to take care not to walk into the overmounting tree.

Signed-off-by: Al Viro

mnt_slave_list/mnt_slave: turn into hlist_head/hlist_node

2025-06-29T23:03:30+00:00

Signed-off-by: Al Viro

mount: separate the flags accessed only under namespace_sem

2025-06-29T23:03:29+00:00

Several flags are updated and checked only under namespace_sem; we are
already making use of that when we are checking them without mount_lock,
but we have to hold mount_lock for all updates, which makes things
clumsier than they have to be.

Take MNT_SHARED, MNT_UNBINDABLE, MNT_MARKED and MNT_UMOUNT_CANDIDATE
into a separate field (->mnt_t_flags), renaming them to T_SHARED,
etc. to avoid confusion.  All accesses must be under namespace_sem.

That changes locking requirements for mnt_change_propagation() and
set_mnt_shared() - only namespace_sem is needed now.  The same goes
for SET_MNT_MARKED et.al.

There might be more flags moved from ->mnt_flags to that field;
this is just the initial set.

Signed-off-by: Al Viro

get rid of mountpoint->m_count

2025-06-29T22:13:42+00:00

struct mountpoint has an odd kinda-sorta refcount in it.  It's always
either equal to or one above the number of mounts attached to that
mountpoint.

"One above" happens when a function takes a temporary reference to
mountpoint.  Things get simpler if we express that as inserting
a local object into ->m_list and removing it to drop the reference.

New calling conventions:

1) lock_mount(), do_lock_mount(), get_mountpoint() and lookup_mountpoint()
take an extra struct pinned_mountpoint * argument and returns 0/-E...
(or true/false in case of lookup_mountpoint()) instead of returning
struct mountpoint pointers.  In case of success, the struct mountpoint *
we used to get can be found as pinned_mountpoint.mp

2) unlock_mount() (always paired with lock_mount()/do_lock_mount()) takes
an address of struct pinned_mountpoint - the same that had been passed to
lock_mount()/do_lock_mount().

3) put_mountpoint() for a temporary reference (paired with get_mountpoint()
or lookup_mountpoint()) is replaced with unpin_mountpoint(), which takes
the address of pinned_mountpoint we passed to matching {get,lookup}_mountpoint().

4) all instances of pinned_mountpoint are local variables; they always live on
stack.  {} is used for initializer, after successful {get,lookup}_mountpoint()
we must make sure to call unpin_mountpoint() before leaving the scope and
after successful {do_,}lock_mount() we must make sure to call unlock_mount()
before leaving the scope.

5) all manipulations of ->m_count are gone, along with ->m_count itself.
struct mountpoint lives while its ->m_list is non-empty.

Signed-off-by: Al Viro

Rewrite of propagate_umount()

2025-06-29T22:13:41+00:00

The variant currently in the tree has problems; trying to prove
correctness has caught at least one class of bugs (reparenting
that ends up moving the visible location of reparented mount, due
to not excluding some of the counterparts on propagation that
should've been included).

I tried to prove that it's the only bug there; I'm still not sure
whether it is.  If anyone can reconstruct and write down an analysis
of the mainline implementation, I'll gladly review it; as it is,
I ended up doing a different implementation.  Candidate collection
phase is similar, but trimming the set down until it satisfies the
constraints turned out pretty different.

I hoped to do transformation as a massage series, but that turns out
to be too convoluted.  So it's a single patch replacing propagate_umount()
and friends in one go, with notes and analysis in D/f/propagate_umount.txt
(in addition to inline comments).

As far I can tell, it is provably correct and provably linear by the number
of mounts we need to look at in order to decide what should be unmounted.
It even builds and seems to survive testing...

Another nice thing that fell out of that is that ->mnt_umounting is no longer
needed.

Compared to the first version:
	* explicit MNT_UMOUNT_CANDIDATE flag for is_candidate()
	* trim_ancestors() only clears that flag, leaving the suckers on list
	* trim_one() and handle_locked() take the stuff with flag cleared off
the list.  That allows to iterate with list_for_each_entry_safe() when calling
trim_one() - it removes at most one element from the list now.
	* no globals - I didn't bother with any kind of context, not worth it.

	* Notes updated accordingly; I have not touch the terms yet.

Reviewed-by: Christian Brauner 
Signed-off-by: Al Viro

new predicate: anon_ns_root(mount)

2025-06-29T22:13:41+00:00

checks if mount is the root of an anonymouns namespace.
Switch open-coded equivalents to using it.

For mounts that belong to anon namespace !mnt_has_parent(mount)
is the same as mount == ns->root, and intent is more obvious in
the latter form.

Reviewed-by: Christian Brauner 
Signed-off-by: Al Viro

constify is_local_mountpoint()

2025-06-29T22:13:41+00:00

Reviewed-by: Christian Brauner 
Signed-off-by: Al Viro

constify mnt_has_parent()

2025-06-29T22:13:41+00:00

Reviewed-by: Christian Brauner 
Signed-off-by: Al Viro

prevent mount hash conflicts

2025-06-29T22:13:41+00:00

Currently it's still possible to run into a pathological situation when
two hashed mounts share both parent and mountpoint.  That does not work
well, for obvious reasons.

We are not far from getting rid of that; the only remaining gap is
attach_recursive_mnt() not being careful enough when sliding a tree
under existing mount (for propagated copies or in 'beneath' case for
the original one).

To deal with that cleanly we need to be able to find overmounts
(i.e. mounts on top of parent's root); we could do hash lookups or scan
the list of children but either would be costly.  Since one of the results
we get from that will be prevention of multiple parallel overmounts, let's
just bite the bullet and store a (non-counting) reference to overmount
in struct mount.

With that done, closing the hole in attach_recursive_mnt() becomes easy
- we just need to follow the chain of overmounts before we change the
mountpoint of the mount we are sliding things under.

Signed-off-by: Al Viro

Don't propagate mounts into detached trees

2025-05-26T21:35:32+00:00

All versions up to 6.14 did not propagate mount events into detached
tree.  Shortly after 6.14 a merge of vfs-6.15-rc1.mount.namespace
(130e696aa68b) has changed that.

Unfortunately, that has caused userland regressions (reported in
https://lore.kernel.org/all/CAOYeF9WQhFDe+BGW=Dp5fK8oRy5AgZ6zokVyTj1Wp4EUiYgt4w@mail.gmail.com/)

Straight revert wouldn't be an option - in particular, the variant in 6.14
had a bug that got fixed in d1ddc6f1d9f0 ("fix IS_MNT_PROPAGATING uses")
and we don't want to bring the bug back.

This is a modification of manual revert posted by Christian, with changes
needed to avoid reintroducing the breakage in scenario described in
d1ddc6f1d9f0.

Cc: stable@vger.kernel.org
Reported-by: Allison Karlitskaya 
Tested-by: Allison Karlitskaya 
Acked-by: Christian Brauner 
Co-developed-by: Christian Brauner 
Signed-off-by: Al Viro