linux-stable.git/fs/namespace.c, branch linux-3.8.y

userns: Restrict when proc and sysfs can be mounted

2013-04-05T16:26:02+00:00

commit 87a8ebd637dafc255070f503909a053cf0d98d3f upstream.

Only allow unprivileged mounts of proc and sysfs if they are already
mounted when the user namespace is created.

proc and sysfs are interesting because they have content that is
per namespace, so fresh mounts are needed when new namespaces are
created, while at the same time they have content that is shared
between every instance.

Respect the policy of who may see the shared content of proc and sysfs
by only allowing new mounts if there was an existing mount at the time
the user namespace was created.

In practice there are only two interesting cases: either proc and
sysfs are mounted at their usual places, or they are not mounted at
all (some form of mount namespace jail).
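Below is a minimal userspace model of this policy. It assumes the visibility of proc and sysfs can be reduced to two booleans recorded when the user namespace is created. The names `userns_model`, `create_userns_model`, and `may_mount_special_fs` are hypothetical and are not the kernel's. The sketch only illustrates the rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the per-user-namespace mount policy. */
struct userns_model {
	bool may_mount_proc;   /* proc was mounted when this ns was created */
	bool may_mount_sysfs;  /* sysfs was mounted when this ns was created */
};

/* Record, at creation time, which shared filesystems the creator could see. */
static struct userns_model create_userns_model(bool proc_visible, bool sysfs_visible)
{
	struct userns_model ns = { proc_visible, sysfs_visible };
	return ns;
}

/* Unprivileged mount request for proc or sysfs: allowed only if that
 * filesystem was already mounted when the user namespace was created.
 * Other filesystem types are outside this model and are refused here. */
static bool may_mount_special_fs(const struct userns_model *ns, const char *fstype)
{
	assert(ns != NULL && fstype != NULL);
	if (strcmp(fstype, "proc") == 0)
		return ns->may_mount_proc;
	if (strcmp(fstype, "sysfs") == 0)
		return ns->may_mount_sysfs;
	return false; /* not modeled */
}
```

The two interesting cases map onto `create_userns_model(true, true)` (mounted at the usual places) and `create_userns_model(false, false)` (a mount namespace jail).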

Acked-by: Serge Hallyn 
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Greg Kroah-Hartman

vfs: Carefully propogate mounts across user namespaces

2013-04-05T16:26:02+00:00

commit 132c94e31b8bca8ea921f9f96a57d684fa4ae0a9 upstream.

As a matter of policy MNT_READONLY should not be changeable if the
original mounter had more privileges than the creator of the mount
namespace.

Add the flag CL_UNPRIVILEGED to note when we are copying a mount from
a mount namespace that requires more privileges to a mount namespace
that requires fewer privileges.

When the CL_UNPRIVILEGED flag is set, cause clone_mnt to set MNT_NO_REMOUNT
if any of the mnt flags that should never be changed are set.

This protects both mount propagation and the initial creation of a less
privileged mount namespace.
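The clone-time rule can be sketched as a userspace model. The flag names and values (`M_READONLY`, `M_LOCK_READONLY`, `CL_UNPRIV`) and the `mount_model` type are hypothetical stand-ins. The message names MNT_NO_REMOUNT as the flag clone_mnt sets, and this model simply calls its lock bit `M_LOCK_READONLY`. Only the read-only bit is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values for this model; not the kernel's values. */
#define M_READONLY       0x01u
#define M_LOCK_READONLY  0x02u
#define CL_UNPRIV        0x01u  /* copying toward a less privileged ns */

struct mount_model { unsigned int flags; };

/* Copy a mount. When the copy crosses into a less privileged mount
 * namespace, pin the read-only bit so the new owner cannot clear it. */
static struct mount_model clone_mnt_model(const struct mount_model *old,
                                          unsigned int cl_flags)
{
	struct mount_model copy;
	assert(old != NULL);
	copy.flags = old->flags;
	if ((cl_flags & CL_UNPRIV) && (copy.flags & M_READONLY))
		copy.flags |= M_LOCK_READONLY;
	return copy;
}
```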

Acked-by: Serge Hallyn 
Reported-by: Andy Lutomirski 
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Greg Kroah-Hartman

vfs: Add a mount flag to lock read only bind mounts

2013-04-05T16:26:01+00:00

commit 90563b198e4c6674c63672fae1923da467215f45 upstream.

When a read-only bind mount is copied from a mount namespace in a more
privileged user namespace to a mount namespace in a less privileged
user namespace, it should not be possible to remove the read-only
restriction.

Add a MNT_LOCK_READONLY mount flag to indicate that a mount must
remain read-only.

Acked-by: Serge Hallyn 
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Greg Kroah-Hartman

userns: Don't allow creation if the user is chrooted

2013-04-05T16:26:01+00:00

commit 3151527ee007b73a0ebd296010f1c0454a919c7d upstream.

Guarantee that the policy of which files may be accessed, established
by setting the root directory, will not be violated by user namespaces
by verifying that the root directory points to the root of the mount
namespace at the time of user namespace creation.

Changing the root is a privileged operation, and as a matter of policy
it serves to limit unprivileged processes to files below the current
root directory.

For reasons of simplicity and comprehensibility the privilege to
change the root directory is gated solely on the CAP_SYS_CHROOT
capability in the user namespace.  Therefore, when creating a user
namespace we must ensure that the policy of which files may be accessed
cannot be violated by changing the root directory.

Anyone who runs processes in a chroot and would like to use user
namespaces can set up the same view of filesystems with a mount
namespace instead, so this is not a practical limitation for using
user namespaces.

Acked-by: Serge Hallyn 
Reported-by: Andy Lutomirski 
Signed-off-by: "Eric W. Biederman" 
Signed-off-by: Greg Kroah-Hartman

get rid of unprotected dereferencing of mnt->mnt_ns

2013-02-28T13:38:35+00:00

commit 9b40bc90abd126bcc5da5658059b8e72e285e559 upstream.

Dereferencing mnt->mnt_ns is safe only under namespace_sem or
vfsmount_lock; all places in fs/namespace.c that want
mnt->mnt_ns->user_ns actually want to use
current->nsproxy->mnt_ns->user_ns (note the calls of check_mnt() in
there).
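Below is a simplified model of the pattern, with hypothetical structures standing in for the kernel's. A check_mnt()-style comparison first shows that the mount belongs to the caller's mount namespace. After that, the caller's own mount namespace pointer names the same namespace, so it can be used instead of dereferencing mnt->mnt_ns. Locking is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct user_ns { int id; };
struct mnt_ns  { struct user_ns *user_ns; };
struct mount   { struct mnt_ns *mnt_ns; };  /* may change concurrently */
struct task    { struct mnt_ns *mnt_ns; };  /* pinned by the task itself */

/* check_mnt()-style test: compares pointers without dereferencing. */
static int check_mnt(const struct task *cur, const struct mount *mnt)
{
	return mnt->mnt_ns == cur->mnt_ns;
}

/* User namespace governing permission checks on mnt, or NULL if mnt is
 * not in the caller's mount namespace. Uses the caller's own pointer
 * rather than mnt->mnt_ns->user_ns. */
static struct user_ns *mount_owner_ns(const struct task *cur,
                                      const struct mount *mnt)
{
	assert(cur != NULL && mnt != NULL);
	if (!check_mnt(cur, mnt))
		return NULL;
	return cur->mnt_ns->user_ns;
}
```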

Signed-off-by: Al Viro 
Signed-off-by: Greg Kroah-Hartman

vfs, freeze: use ACCESS_ONCE() to guard access to ->mnt_flags

2012-12-20T18:36:18+00:00

The compiler may optimize the while loop so that the check is done only
once, so we should use ACCESS_ONCE() to guard access to ->mnt_flags.
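Below is a sketch of the idea in userspace C. ACCESS_ONCE() is shown with its classic kernel definition, a volatile cast using the GNU `__typeof__` extension. `HOLD_BIT` and `wait_for_hold_clear` are hypothetical and only mimic the shape of a flag-polling loop. The volatile access forces a fresh load on each iteration, so the compiler cannot hoist the test out of the loop.

```c
#include <assert.h>

/* Classic kernel-style definition: access x through a volatile lvalue
 * so each use performs a real load. __typeof__ is a GCC/Clang extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

#define HOLD_BIT 0x1u  /* hypothetical "hold" flag for this sketch */

/* Poll *flags until HOLD_BIT clears, giving up after max_spins polls.
 * Returns the number of polls that saw the bit set, or -1 on timeout. */
static int wait_for_hold_clear(unsigned int *flags, int max_spins)
{
	int spins = 0;
	assert(flags != 0);
	while (ACCESS_ONCE(*flags) & HOLD_BIT) {
		if (++spins > max_spins)
			return -1;
	}
	return spins;
}
```

In portable userspace code C11 atomics are the better tool. Later kernels replaced ACCESS_ONCE() with READ_ONCE() and WRITE_ONCE().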

Signed-off-by: Miao Xie 
Signed-off-by: Al Viro

userns: Require CAP_SYS_ADMIN for most uses of setns.

2012-12-15T00:12:03+00:00

Andy Lutomirski found a nasty little bug in the permissions of
setns.  With unprivileged user namespaces it became possible to
create new namespaces without privilege.

However, the setns calls were relaxed to require only CAP_SYS_ADMIN
in the user namespace of the targeted namespace.

This made the following nasty sequence possible:

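pid_t pid = clone(CLONE_NEWUSER | CLONE_NEWNS);	/* fork-style raw clone */
if (pid == 0) { /* child: new user and mount namespaces */
	system("mount --bind /home/me/passwd /etc/passwd");
}
else { /* parent */
	char path[PATH_MAX];
	int fd;
	snprintf(path, sizeof(path), "/proc/%d/ns/mnt", (int)pid);
	fd = open(path, O_RDONLY);
	setns(fd, 0);	/* join the child's mount namespace */
	system("su -");	/* su now reads the bind-mounted /etc/passwd */
}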

Prevent this possibility by requiring CAP_SYS_ADMIN in the current
user namespace when joining any namespace other than a user namespace.

Acked-by: Serge Hallyn 
Signed-off-by: "Eric W. Biederman"

proc: Usable inode numbers for the namespace file descriptors.

2012-11-20T12:19:49+00:00

Assign a unique proc inode to each namespace, and use that
inode number to ensure we allocate at most one proc inode
for every namespace in proc.

A single proc inode per namespace allows userspace to test
whether two processes are in the same namespace.
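With one inode per namespace, userspace can compare two namespace files with stat(). This sketch compares both st_dev and st_ino, which is the conservative way to identify an inode. The helper name `same_namespace` is hypothetical. A typical call would pass paths such as /proc/self/ns/mnt and /proc/<pid>/ns/mnt.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>

/* Return 1 if the two files refer to the same inode (for namespace
 * files: the same namespace), 0 if they differ, and -1 if either path
 * cannot be stat()ed. stat() follows the /proc/<pid>/ns/ links. */
static int same_namespace(const char *a, const char *b)
{
	struct stat sa, sb;
	assert(a != 0 && b != 0);
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```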

This has been a long-requested feature, blocked only because
a naive implementation would put the id in a global space and
would ultimately require a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.

We still don't have per-superblock inode numbers for proc, which
appear necessary for application-unaware checkpoint/restart and
migration (if the application is using namespace file descriptors),
but the design now allows them if they become important.

I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.

Signed-off-by: Eric W. Biederman

userns: fix return value on mntns_install() failure

2012-11-19T13:59:22+00:00

Change return value from -EINVAL to -EPERM when the permission check fails.

Signed-off-by: Zhao Hongjiang 
Signed-off-by: Eric W. Biederman

vfs: Allow unprivileged manipulation of the mount namespace.

2012-11-19T13:59:21+00:00

- Add a filesystem flag to mark filesystems that are safe to mount as
  an unprivileged user.

- Add a filesystem flag to mark filesystems that don't need MNT_NODEV
  when mounted by an unprivileged user.

- Relax the permission checks so that unprivileged users who have
  CAP_SYS_ADMIN in the user namespace referred to by the current
  mount namespace may mount, unmount, and move filesystems.
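Below is a hypothetical model of the three rules above. The flag names `FT_USERNS_MOUNT` and `FT_USERNS_DEV_MOUNT`, the `decide_mount` function, and the treatment of a globally privileged caller as always allowed are all assumptions of this sketch, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical filesystem-type flags mirroring the two flag bullets. */
#define FT_USERNS_MOUNT      0x1u  /* safe to mount as an unprivileged user */
#define FT_USERNS_DEV_MOUNT  0x2u  /* no forced MNT_NODEV in that case */

struct mount_decision {
	bool allowed;
	bool force_nodev;
};

/* global_admin: caller is privileged in the initial user namespace.
 * admin_in_mntns_owner: caller has CAP_SYS_ADMIN in the user namespace
 * that owns the current mount namespace. */
static struct mount_decision decide_mount(unsigned int fs_flags,
                                          bool global_admin,
                                          bool admin_in_mntns_owner)
{
	struct mount_decision d = { false, false };
	if (global_admin) {
		d.allowed = true;
		return d;
	}
	if (!admin_in_mntns_owner || !(fs_flags & FT_USERNS_MOUNT))
		return d;
	d.allowed = true;
	d.force_nodev = !(fs_flags & FT_USERNS_DEV_MOUNT);
	return d;
}
```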

Acked-by: "Serge E. Hallyn" 
Signed-off-by: "Eric W. Biederman"