mount_namespaces(7) Miscellaneous Information Manual mount_namespaces(7)
NAME
mount_namespaces - overview of Linux mount namespaces
DESCRIPTION
For an overview of namespaces, see namespaces(7).
Mount namespaces provide isolation of the list of mounts seen by the
processes in each namespace instance. Thus, the processes in each of
the mount namespace instances will see distinct single-directory hier-
archies.
The views provided by the /proc/pid/mounts, /proc/pid/mountinfo, and
/proc/pid/mountstats files (all described in proc(5)) correspond to the
mount namespace in which the process with the PID pid resides. (All of
the processes that reside in the same mount namespace will see the same
view in these files.)
A new mount namespace is created using either clone(2) or unshare(2)
with the CLONE_NEWNS flag. When a new mount namespace is created, its
mount list is initialized as follows:
o If the namespace is created using clone(2), the mount list of the
child's namespace is a copy of the mount list in the parent
process's mount namespace.
o If the namespace is created using unshare(2), the mount list of the
new namespace is a copy of the mount list in the caller's previous
mount namespace.
Subsequent modifications to the mount list (mount(2) and umount(2)) in
either mount namespace will not (by default) affect the mount list seen
in the other namespace (but see the following discussion of shared sub-
trees).
SHARED SUBTREES
After the implementation of mount namespaces was completed, experience
showed that the isolation that they provided was, in some cases, too
great. For example, in order to make a newly loaded optical disk
available in all mount namespaces, a mount operation was required in
each namespace. For this use case, and others, the shared subtree fea-
ture was introduced in Linux 2.6.15. This feature allows for auto-
matic, controlled propagation of mount(2) and umount(2) events between
namespaces (or, more precisely, between the mounts that are members of
a peer group that are propagating events to one another).
Each mount is marked (via mount(2)) as having one of the following
propagation types:
MS_SHARED
This mount shares events with members of a peer group. mount(2)
and umount(2) events immediately under this mount will propagate
to the other mounts that are members of the peer group. Propa-
gation here means that the same mount(2) or umount(2) will auto-
matically occur under all of the other mounts in the peer group.
Conversely, mount(2) and umount(2) events that take place under
peer mounts will propagate to this mount.
MS_PRIVATE
This mount is private; it does not have a peer group. mount(2)
and umount(2) events do not propagate into or out of this mount.
MS_SLAVE
mount(2) and umount(2) events propagate into this mount from a
(master) shared peer group. mount(2) and umount(2) events under
this mount do not propagate to any peer.
Note that a mount can be the slave of another peer group while
at the same time sharing mount(2) and umount(2) events with a
peer group of which it is a member. (More precisely, one peer
group can be the slave of another peer group.)
MS_UNBINDABLE
This is like a private mount, and in addition this mount can't
be bind mounted. Attempts to bind mount this mount (mount(2)
with the MS_BIND flag) will fail.
When a recursive bind mount (mount(2) with the MS_BIND and
MS_REC flags) is performed on a directory subtree, any unbindable
mounts within the subtree are automatically pruned (i.e., not
replicated) when replicating that subtree to produce the target
subtree.
For a discussion of the propagation type assigned to a new mount, see
NOTES.
The propagation type is a per-mount-point setting; some mounts may be
marked as shared (with each shared mount being a member of a distinct
peer group), while others are private (or slaved or unbindable).
Note that a mount's propagation type determines whether mount(2) and
umount(2) of mounts immediately under the mount are propagated. Thus,
the propagation type does not affect propagation of events for grand-
children and further removed descendant mounts. What happens if the
mount itself is unmounted is determined by the propagation type that is
in effect for the parent of the mount.
Members are added to a peer group when a mount is marked as shared and
either:
(a) the mount is replicated during the creation of a new mount name-
space; or
(b) a new bind mount is created from the mount.
In both of these cases, the new mount joins the peer group of which the
existing mount is a member.
A new peer group is also created when a child mount is created under an
existing mount that is marked as shared. In this case, the new child
mount is also marked as shared and the resulting peer group consists of
all the mounts that are replicated under the peers of parent mounts.
A mount ceases to be a member of a peer group when either the mount is
explicitly unmounted, or when the mount is implicitly unmounted because
a mount namespace is removed (because it has no more member processes).
The propagation type of the mounts in a mount namespace can be discov-
ered via the "optional fields" exposed in /proc/pid/mountinfo. (See
proc(5) for details of this file.) The following tags can appear in
the optional fields for a record in that file:
shared:X
This mount is shared in peer group X. Each peer group has a
unique ID that is automatically generated by the kernel, and all
mounts in the same peer group will show the same ID. (These IDs
are assigned starting from the value 1, and may be recycled when
a peer group ceases to have any members.)
master:X
This mount is a slave to shared peer group X.
propagate_from:X (since Linux 2.6.26)
This mount is a slave and receives propagation from shared peer
group X. This tag will always appear in conjunction with a mas-
ter:X tag. Here, X is the closest dominant peer group under the
process's root directory. If X is the immediate master of the
mount, or if there is no dominant peer group under the same
root, then only the master:X field is present and not the propa-
gate_from:X field. For further details, see below.
unbindable
This is an unbindable mount.
If none of the above tags is present, then this is a private mount.
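As a minimal sketch of how these tags map to propagation types, the
following shell function reads mountinfo records on standard input and
prints the type of each mount. The function name propagation_types is
hypothetical (it is not part of any standard tool), and the sketch
assumes the field layout described in proc(5), in which the optional
fields start at field 7 and end at a "-" separator.

```shell
# propagation_types: read mountinfo records on stdin and print
# "<mount point> <propagation type>" for each record.
# (Hypothetical helper, shown for illustration only.)
propagation_types() {
    awk '{
        shared = 0; slave = 0; unbindable = 0
        # Optional fields start at field 7 and end at the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)    shared = 1
            if ($i ~ /^master:/)    slave = 1
            if ($i == "unbindable") unbindable = 1
        }
        if (unbindable)           type = "unbindable"
        else if (shared && slave) type = "slave+shared"
        else if (shared)          type = "shared"
        else if (slave)           type = "slave"
        else                      type = "private"   # no tags at all
        print $5, type
    }'
}
```

For example, running propagation_types < /proc/self/mountinfo would
list every mount visible in the caller's mount namespace.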
MS_SHARED and MS_PRIVATE example
Suppose that on a terminal in the initial mount namespace, we mark one
mount as shared and another as private, and then view the mounts in
/proc/self/mountinfo:
sh1# mount --make-shared /mntS
sh1# mount --make-private /mntP
sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
77 61 8:17 / /mntS rw,relatime shared:1
83 61 8:15 / /mntP rw,relatime
From the /proc/self/mountinfo output, we see that /mntS is a shared
mount in peer group 1, and that /mntP has no optional tags, indicating
that it is a private mount. The first two fields in each record in
this file are the unique ID for this mount, and the mount ID of the
parent mount. We can further inspect this file to see that the parent
mount of /mntS and /mntP is the root directory, /, which is mounted as
private:
sh1# cat /proc/self/mountinfo | awk '$1 == 61' | sed 's/ - .*//'
61 0 8:2 / / rw,relatime
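The lookup performed by hand above (take the parent mount ID from the
second field, then find the record with that mount ID) can be sketched
as a small shell function. The name parent_mount is hypothetical. If
several mounts are stacked at the same mount point, this sketch uses
the last matching record.

```shell
# parent_mount MOUNTPOINT: read mountinfo records on stdin and print
# the mount point of the parent of MOUNTPOINT.
# (Hypothetical helper, shown for illustration only.)
parent_mount() {
    awk -v target="$1" '
        { point[$1] = $5 }             # map mount ID -> mount point
        $5 == target { parent = $2 }   # remember the parent mount ID
        END { if (parent in point) print point[parent] }
    '
}
```

A possible invocation is parent_mount /mntS < /proc/self/mountinfo.
Nothing is printed if the mount point is not found or if its parent is
not visible in the input.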
On a second terminal, we create a new mount namespace where we run a
second shell and inspect the mounts:
$ PS1='sh2# ' sudo unshare -m --propagation unchanged sh
sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
222 145 8:17 / /mntS rw,relatime shared:1
225 145 8:15 / /mntP rw,relatime
The new mount namespace received a copy of the initial mount name-
space's mounts. These new mounts maintain the same propagation types,
but have unique mount IDs. (The --propagation unchanged option pre-
vents unshare(1) from marking all mounts as private when creating a new
mount namespace, which it does by default.)
In the second terminal, we then create submounts under each of /mntS
and /mntP and inspect the set-up:
sh2# mkdir /mntS/a
sh2# mount /dev/sdb6 /mntS/a
sh2# mkdir /mntP/b
sh2# mount /dev/sdb7 /mntP/b
sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
222 145 8:17 / /mntS rw,relatime shared:1
225 145 8:15 / /mntP rw,relatime
178 222 8:22 / /mntS/a rw,relatime shared:2
230 225 8:23 / /mntP/b rw,relatime
From the above, it can be seen that /mntS/a was created as shared (in-
heriting this setting from its parent mount) and /mntP/b was created as
a private mount.
Returning to the first terminal and inspecting the set-up, we see that
the new mount created under the shared mount /mntS propagated to its
peer mount (in the initial mount namespace), but the new mount created
under the private mount /mntP did not propagate:
sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
77 61 8:17 / /mntS rw,relatime shared:1
83 61 8:15 / /mntP rw,relatime
179 77 8:22 / /mntS/a rw,relatime shared:2
MS_SLAVE example
Making a mount a slave allows it to receive propagated mount(2) and
umount(2) events from a master shared peer group, while preventing it
from propagating events to that master. This is useful if we want to
(say) receive a mount event when an optical disk is mounted in the mas-
ter shared peer group (in another mount namespace), but want to prevent
mount(2) and umount(2) events under the slave mount from having side
effects in other namespaces.
We can demonstrate the effect of slaving by first marking two mounts as
shared in the initial mount namespace:
sh1# mount --make-shared /mntX
sh1# mount --make-shared /mntY
sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
132 83 8:23 / /mntX rw,relatime shared:1
133 83 8:22 / /mntY rw,relatime shared:2
On a second terminal, we create a new mount namespace and inspect the
mounts:
sh2# unshare -m --propagation unchanged sh
sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
168 167 8:23 / /mntX rw,relatime shared:1
169 167 8:22 / /mntY rw,relatime shared:2
In the new mount namespace, we then mark one of the mounts as a slave:
sh2# mount --make-slave /mntY
sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
168 167 8:23 / /mntX rw,relatime shared:1
169 167 8:22 / /mntY rw,relatime master:2
From the above output, we see that /mntY is now a slave mount that is
receiving propagation events from the shared peer group with the ID 2.
Continuing in the new namespace, we create submounts under each of
/mntX and /mntY:
sh2# mkdir /mntX/a
sh2# mount /dev/sda3 /mntX/a
sh2# mkdir /mntY/b
sh2# mount /dev/sda5 /mntY/b
When we inspect the state of the mounts in the new mount namespace, we
see that /mntX/a was created as a new shared mount (inheriting the
"shared" setting from its parent mount) and /mntY/b was created as a
private mount:
sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
168 167 8:23 / /mntX rw,relatime shared:1
169 167 8:22 / /mntY rw,relatime master:2
173 168 8:3 / /mntX/a rw,relatime shared:3
175 169 8:5 / /mntY/b rw,relatime
Returning to the first terminal (in the initial mount namespace), we
see that the mount /mntX/a propagated to the peer (the shared /mntX),
but the mount /mntY/b was not propagated:
sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
132 83 8:23 / /mntX rw,relatime shared:1
133 83 8:22 / /mntY rw,relatime shared:2
174 132 8:3 / /mntX/a rw,relatime shared:3
Now we create a new mount under /mntY in the first shell:
sh1# mkdir /mntY/c
sh1# mount /dev/sda1 /mntY/c
sh1# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
132 83 8:23 / /mntX rw,relatime shared:1
133 83 8:22 / /mntY rw,relatime shared:2
174 132 8:3 / /mntX/a rw,relatime shared:3
178 133 8:1 / /mntY/c rw,relatime shared:4
When we examine the mounts in the second mount namespace, we see that
in this case the new mount has been propagated to the slave mount, and
that the new mount is itself a slave mount (to peer group 4):
sh2# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
168 167 8:23 / /mntX rw,relatime shared:1
169 167 8:22 / /mntY rw,relatime master:2
173 168 8:3 / /mntX/a rw,relatime shared:3
175 169 8:5 / /mntY/b rw,relatime
179 169 8:1 / /mntY/c rw,relatime master:4
MS_UNBINDABLE example
One of the primary purposes of unbindable mounts is to avoid the "mount
explosion" problem when repeatedly performing bind mounts of a higher-
level subtree at a lower-level mount. The problem is illustrated by
the following shell session.
Suppose we have a system with the following mounts:
# mount | awk '{print $1, $2, $3}'
/dev/sda1 on /
/dev/sdb6 on /mntX
/dev/sdb7 on /mntY
Suppose furthermore that we wish to recursively bind mount the root di-
rectory under several users' home directories. We do this for the
first user, and inspect the mounts:
# mount --rbind / /home/cecilia/
# mount | awk '{print $1, $2, $3}'
/dev/sda1 on /
/dev/sdb6 on /mntX
/dev/sdb7 on /mntY
/dev/sda1 on /home/cecilia
/dev/sdb6 on /home/cecilia/mntX
/dev/sdb7 on /home/cecilia/mntY
When we repeat this operation for the second user, we start to see the
explosion problem:
# mount --rbind / /home/henry
# mount | awk '{print $1, $2, $3}'
/dev/sda1 on /
/dev/sdb6 on /mntX
/dev/sdb7 on /mntY
/dev/sda1 on /home/cecilia
/dev/sdb6 on /home/cecilia/mntX
/dev/sdb7 on /home/cecilia/mntY
/dev/sda1 on /home/henry
/dev/sdb6 on /home/henry/mntX
/dev/sdb7 on /home/henry/mntY
/dev/sda1 on /home/henry/home/cecilia
/dev/sdb6 on /home/henry/home/cecilia/mntX
/dev/sdb7 on /home/henry/home/cecilia/mntY
Under /home/henry, we have not only recursively added the /mntX and
/mntY mounts, but also the recursive mounts of those directories under
/home/cecilia that were created in the previous step. Upon repeating
the step for a third user, it becomes obvious that the explosion is ex-
ponential in nature:
# mount --rbind / /home/otto
# mount | awk '{print $1, $2, $3}'
/dev/sda1 on /
/dev/sdb6 on /mntX
/dev/sdb7 on /mntY
/dev/sda1 on /home/cecilia
/dev/sdb6 on /home/cecilia/mntX
/dev/sdb7 on /home/cecilia/mntY
/dev/sda1 on /home/henry
/dev/sdb6 on /home/henry/mntX
/dev/sdb7 on /home/henry/mntY
/dev/sda1 on /home/henry/home/cecilia
/dev/sdb6 on /home/henry/home/cecilia/mntX
/dev/sdb7 on /home/henry/home/cecilia/mntY
/dev/sda1 on /home/otto
/dev/sdb6 on /home/otto/mntX
/dev/sdb7 on /home/otto/mntY
/dev/sda1 on /home/otto/home/cecilia
/dev/sdb6 on /home/otto/home/cecilia/mntX
/dev/sdb7 on /home/otto/home/cecilia/mntY
/dev/sda1 on /home/otto/home/henry
/dev/sdb6 on /home/otto/home/henry/mntX
/dev/sdb7 on /home/otto/home/henry/mntY
/dev/sda1 on /home/otto/home/henry/home/cecilia
/dev/sdb6 on /home/otto/home/henry/home/cecilia/mntX
/dev/sdb7 on /home/otto/home/henry/home/cecilia/mntY
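The growth seen in the listings above can be checked with a little
arithmetic: in this scenario, each recursive bind mount of the root
directory replicates every mount that already exists, so the number of
mounts doubles at each step (3, 6, 12, 24). A minimal sketch, using a
hypothetical function name:

```shell
# mounts_after_rbinds BASE N: number of mounts after N successive
# "mount --rbind / <dir>" operations, starting from BASE mounts and
# assuming that every existing mount is replicated each time.
mounts_after_rbinds() {
    count=$1
    i=0
    while [ "$i" -lt "$2" ]; do
        count=$((count * 2))   # each rbind of / copies all existing mounts
        i=$((i + 1))
    done
    echo "$count"
}
```

With the three mounts of this example, mounts_after_rbinds 3 3 gives
the 24 mounts listed after the third user's bind mount.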
The mount explosion problem in the above scenario can be avoided by
making each of the new mounts unbindable. The effect of doing this is
that recursive mounts of the root directory will not replicate the un-
bindable mounts. We make such a mount for the first user:
# mount --rbind --make-unbindable / /home/cecilia
Before going further, we show that unbindable mounts are indeed unbind-
able:
# mkdir /mntZ
# mount --bind /home/cecilia /mntZ
mount: wrong fs type, bad option, bad superblock on /home/cecilia,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
Now we create unbindable recursive bind mounts for the other two users:
# mount --rbind --make-unbindable / /home/henry
# mount --rbind --make-unbindable / /home/otto
Upon examining the list of mounts, we see there has been no explosion
of mounts, because the unbindable mounts were not replicated under each
user's directory:
# mount | awk '{print $1, $2, $3}'
/dev/sda1 on /
/dev/sdb6 on /mntX
/dev/sdb7 on /mntY
/dev/sda1 on /home/cecilia
/dev/sdb6 on /home/cecilia/mntX
/dev/sdb7 on /home/cecilia/mntY
/dev/sda1 on /home/henry
/dev/sdb6 on /home/henry/mntX
/dev/sdb7 on /home/henry/mntY
/dev/sda1 on /home/otto
/dev/sdb6 on /home/otto/mntX
/dev/sdb7 on /home/otto/mntY
Propagation type transitions
The following table shows the effect that applying a new propagation
type (i.e., mount --make-xxxx) has on the existing propagation type of
a mount. The rows correspond to existing propagation types, and the
columns are the new propagation settings. For reasons of space, "pri-
vate" is abbreviated as "priv" and "unbindable" as "unbind".
              make-shared  make-slave      make-priv  make-unbind
-------------+---------------------------------------------------
shared       |shared       slave/priv [1]  priv       unbind
slave        |slave+shared slave [2]       priv       unbind
slave+shared |slave+shared slave           priv       unbind
private      |shared       priv [2]        priv       unbind
unbindable   |shared       unbind [2]      priv       unbind
Note the following details to the table:
[1] If a shared mount is the only mount in its peer group, making it a
slave automatically makes it private.
[2] Slaving a nonshared mount has no effect on the mount.
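The table and its two notes can be transcribed into a small shell
function, shown here only as a sketch. The name transition and its
argument convention are hypothetical. The optional third argument is
the number of other mounts in the peer group; it matters only for
note [1].

```shell
# transition CURRENT NEW [PEERS]
#   CURRENT: shared | slave | slave+shared | private | unbindable
#   NEW:     shared | slave | private | unbindable
#   PEERS:   number of other mounts in the peer group (default 1)
# Prints the resulting propagation type.
transition() {
    current=$1 new=$2 peers=${3:-1}
    case "$new" in
        private|unbindable)
            echo "$new" ;;
        shared)
            case "$current" in
                slave|slave+shared) echo "slave+shared" ;;
                *)                  echo "shared" ;;
            esac ;;
        slave)
            case "$current" in
                shared)
                    # Note [1]: a lone member of a peer group becomes private.
                    if [ "$peers" -eq 0 ]; then echo "private"; else echo "slave"; fi ;;
                slave|slave+shared)
                    echo "slave" ;;
                *)
                    # Note [2]: slaving a nonshared mount has no effect.
                    echo "$current" ;;
            esac ;;
        *)
            echo "unknown operation: $new" >&2
            return 1 ;;
    esac
}
```

For instance, transition slave shared prints slave+shared, matching
the second row of the table.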
Bind (MS_BIND) semantics
Suppose that the following command is performed:
mount --bind A/a B/b
Here, A is the source mount, B is the destination mount, a is a subdi-
rectory path under the mount point A, and b is a subdirectory path un-
der the mount point B. The propagation type of the resulting mount,
B/b, depends on the propagation types of the mounts A and B, and is
summarized in the following table.
                               source(A)
                    shared  private  slave         unbind
-------------------+--------------------------------------
dest(B)  shared    |shared  shared   slave+shared  invalid
         nonshared |shared  private  slave         invalid
Note that a recursive bind of a subtree follows the same semantics as
for a bind operation on each mount in the subtree. (Unbindable mounts
are automatically pruned at the target mount point.)
For further details, see Documentation/filesystems/sharedsubtree.rst in
the kernel source tree.
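The bind mount table can be written as a lookup function. This is a
sketch only; the name bind_result and the argument keywords are
hypothetical.

```shell
# bind_result DEST SRC
#   DEST: shared | nonshared     (propagation of the destination mount B)
#   SRC:  shared | private | slave | unbindable   (source mount A)
# Prints the propagation type of the new mount B/b, or "invalid".
bind_result() {
    dest=$1 src=$2
    case "$dest,$src" in
        shared,unbindable|nonshared,unbindable)
            echo "invalid" ;;          # unbindable mounts cannot be bind mounted
        shared,slave)
            echo "slave+shared" ;;
        nonshared,slave)
            echo "slave" ;;
        nonshared,private)
            echo "private" ;;
        shared,shared|shared,private|nonshared,shared)
            echo "shared" ;;
        *)
            echo "unknown combination: $dest,$src" >&2
            return 1 ;;
    esac
}
```

For example, bind_result shared slave prints slave+shared.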
Move (MS_MOVE) semantics
Suppose that the following command is performed:
mount --move A B/b
Here, A is the source mount, B is the destination mount, and b is a
subdirectory path under the mount point B. The propagation type of the
resulting mount, B/b, depends on the propagation types of the mounts A
and B, and is summarized in the following table.
                               source(A)
                    shared  private  slave         unbind
-------------------+-----------------------------------------
dest(B)  shared    |shared  shared   slave+shared  invalid
         nonshared |shared  private  slave         unbindable
Note: moving a mount that resides under a shared mount is invalid.
For further details, see Documentation/filesystems/sharedsubtree.rst in
the kernel source tree.
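The move table differs from the bind table only in the unbindable
column. A sketch of the same kind of lookup, with the hypothetical
name move_result:

```shell
# move_result DEST SRC
#   DEST: shared | nonshared     (propagation of the destination mount B)
#   SRC:  shared | private | slave | unbindable   (source mount A)
# Prints the propagation type of the moved mount B/b, or "invalid".
move_result() {
    dest=$1 src=$2
    case "$dest,$src" in
        shared,unbindable)
            echo "invalid" ;;
        nonshared,unbindable)
            echo "unbindable" ;;       # an unbindable mount stays unbindable
        shared,slave)
            echo "slave+shared" ;;
        nonshared,slave)
            echo "slave" ;;
        nonshared,private)
            echo "private" ;;
        shared,shared|shared,private|nonshared,shared)
            echo "shared" ;;
        *)
            echo "unknown combination: $dest,$src" >&2
            return 1 ;;
    esac
}
```

The sketch covers only the table; it does not model the additional
restriction that a mount residing under a shared mount cannot be
moved.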
Mount semantics
Suppose that we use the following command to create a mount:
mount device B/b
Here, B is the destination mount, and b is a subdirectory path under
the mount point B. The propagation type of the resulting mount, B/b,
follows the same rules as for a bind mount, where the propagation type
of the source mount is considered always to be private.
Unmount semantics
Suppose that we use the following command to tear down a mount:
umount A
Here, A is a mount on B/b, where B is the parent mount and b is a sub-
directory path under the mount point B. If B is shared, then all most-
recently-mounted mounts at b on mounts that receive propagation from
mount B and do not have submounts under them are unmounted.
The /proc/pid/mountinfo propagate_from tag
The propagate_from:X tag is shown in the optional fields of a
/proc/pid/mountinfo record in cases where a process can't see a slave's
immediate master (i.e., the pathname of the master is not reachable
from the filesystem root directory) and so cannot determine the chain
of propagation between the mounts it can see.
In the following example, we first create a two-link master-slave chain
between the mounts /mnt, /tmp/etc, and /mnt/tmp/etc. Then the ch-
root(1) command is used to make the /tmp/etc mount point unreachable
from the root directory, creating a situation where the master of
/mnt/tmp/etc is not reachable from the (new) root directory of the
process.
First, we bind mount the root directory onto /mnt and then bind mount
/proc at /mnt/proc so that after the later chroot(1) the proc(5)
filesystem remains visible at the correct location in the chroot-ed en-
vironment.
# mkdir -p /mnt/proc
# mount --bind / /mnt
# mount --bind /proc /mnt/proc
Next, we ensure that the /mnt mount is a shared mount in a new peer
group (with no peers):
# mount --make-private /mnt # Isolate from any previous peer group
# mount --make-shared /mnt
# cat /proc/self/mountinfo | grep '/mnt' | sed 's/ - .*//'
239 61 8:2 / /mnt ... shared:102
248 239 0:4 / /mnt/proc ... shared:5
Next, we bind mount /mnt/etc onto /tmp/etc:
# mkdir -p /tmp/etc
# mount --bind /mnt/etc /tmp/etc
# cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
239 61 8:2 / /mnt ... shared:102
248 239 0:4 / /mnt/proc ... shared:5
267 40 8:2 /etc /tmp/etc ... shared:102
Initially, these two mounts are in the same peer group, but we then
make /tmp/etc a slave of /mnt/etc, and then make /tmp/etc shared as
well, so that it can propagate events to the next slave in the chain:
# mount --make-slave /tmp/etc
# mount --make-shared /tmp/etc
# cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
239 61 8:2 / /mnt ... shared:102
248 239 0:4 / /mnt/proc ... shared:5
267 40 8:2 /etc /tmp/etc ... shared:105 master:102
Then we bind mount /tmp/etc onto /mnt/tmp/etc. Again, the two mounts
are initially in the same peer group, but we then make /mnt/tmp/etc a
slave of /tmp/etc:
# mkdir -p /mnt/tmp/etc
# mount --bind /tmp/etc /mnt/tmp/etc
# mount --make-slave /mnt/tmp/etc
# cat /proc/self/mountinfo | egrep '/mnt|/tmp/' | sed 's/ - .*//'
239 61 8:2 / /mnt ... shared:102
248 239 0:4 / /mnt/proc ... shared:5
267 40 8:2 /etc /tmp/etc ... shared:105 master:102
273 239 8:2 /etc /mnt/tmp/etc ... master:105
From the above, we see that /mnt is the master of the slave /tmp/etc,
which in turn is the master of the slave /mnt/tmp/etc.
We then chroot(1) to the /mnt directory, which renders the mount with
ID 267 unreachable from the (new) root directory:
# chroot /mnt
When we examine the state of the mounts inside the chroot-ed environ-
ment, we see the following:
# cat /proc/self/mountinfo | sed 's/ - .*//'
239 61 8:2 / / ... shared:102
248 239 0:4 / /proc ... shared:5
273 239 8:2 /etc /tmp/etc ... master:105 propagate_from:102
Above, we see that the mount with ID 273 is a slave whose master is the
peer group 105. The mount point for that master is unreachable, and so
a propagate_from tag is displayed, indicating that the closest dominant
peer group (i.e., the nearest reachable mount in the slave chain) is
the peer group with the ID 102 (corresponding to the /mnt mount point
before the chroot(1) was performed).
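To pick out the slave mounts and their master and propagate_from peer
group IDs from mountinfo records, something like the following sketch
could be used. The function name slave_info and its output format are
hypothetical; a "-" is printed where no propagate_from tag is present.

```shell
# slave_info: read mountinfo records on stdin and, for each slave
# mount, print "<mount point> master=<ID> propagate_from=<ID or ->".
# (Hypothetical helper, shown for illustration only.)
slave_info() {
    awk '{
        master = ""; pfrom = ""
        # Optional fields start at field 7 and end at the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^master:/)         master = substr($i, 8)
            if ($i ~ /^propagate_from:/) pfrom = substr($i, 16)
        }
        if (master != "")
            print $5, "master=" master, "propagate_from=" (pfrom != "" ? pfrom : "-")
    }'
}
```

Run inside the chroot-ed environment as slave_info <
/proc/self/mountinfo, this would report the mount at /tmp/etc with
master 105 and propagate_from 102.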
STANDARDS
Linux.
HISTORY
Linux 2.4.19.
NOTES
The propagation type assigned to a new mount depends on the propagation
type of the parent mount. If the mount has a parent (i.e., it is a
non-root mount point) and the propagation type of the parent is
MS_SHARED, then the propagation type of the new mount is also
MS_SHARED. Otherwise, the propagation type of the new mount is MS_PRI-
VATE.
Notwithstanding the fact that the default propagation type for a new
mount is in many cases MS_PRIVATE, MS_SHARED is typically more useful.
For this reason, systemd(1) automatically remounts all mounts as
MS_SHARED on system startup. Thus, on most modern systems, the default
propagation type is in practice MS_SHARED.
Since, when one uses unshare(1) to create a mount namespace, the goal
is commonly to provide full isolation of the mounts in the new
namespace, unshare(1) (since util-linux 2.27) in turn reverses the step
performed by systemd(1), by making all mounts private in the new
namespace. That is, unshare(1) performs the equivalent of the following
in the new mount namespace:
mount --make-rprivate /
To prevent this, one can use the --propagation unchanged option to un-
share(1).
An application that creates a new mount namespace directly using
clone(2) or unshare(2) may desire to prevent propagation of mount
events to other mount namespaces (as is done by unshare(1)). This can
be done by changing the propagation type of mounts in the new namespace
to either MS_SLAVE or MS_PRIVATE, using a call such as the following:
mount(NULL, "/", MS_SLAVE | MS_REC, NULL);
For a discussion of propagation types when moving mounts (MS_MOVE) and
creating bind mounts (MS_BIND), see Documentation/filesystems/shared-
subtree.rst.
Restrictions on mount namespaces
Note the following points with respect to mount namespaces:
[1] Each mount namespace has an owner user namespace. As explained
above, when a new mount namespace is created, its mount list is
initialized as a copy of the mount list of another mount namespace.
If the new namespace and the namespace from which the mount list
was copied are owned by different user namespaces, then the new
mount namespace is considered less privileged.
[2] When creating a less privileged mount namespace, shared mounts are
reduced to slave mounts. This ensures that mappings performed in
less privileged mount namespaces will not propagate to more privi-
leged mount namespaces.
[3] Mounts that come as a single unit from a more privileged mount
namespace are locked together and may not be separated in a less
privileged mount namespace. (The unshare(2) CLONE_NEWNS operation
brings across all of the mounts from the original mount namespace
as a single unit, and recursive mounts that propagate between mount
namespaces propagate as a single unit.)
In this context, "may not be separated" means that the mounts are
locked so that they may not be individually unmounted. Consider
the following example:
$ sudo sh
# mount --bind /dev/null /etc/shadow
# cat /etc/shadow # Produces no output
The above steps, performed in a more privileged mount namespace,
have created a bind mount that obscures the contents of the shadow
password file, /etc/shadow. For security reasons, it should not be
possible to umount(2) that mount in a less privileged mount name-
space, since that would reveal the contents of /etc/shadow.
Suppose we now create a new mount namespace owned by a new user
namespace. The new mount namespace will inherit copies of all of
the mounts from the previous mount namespace. However, those
mounts will be locked because the new mount namespace is less
privileged. Consequently, an attempt to umount(2) the mount fails, as
shown in the following step:
# unshare --user --map-root-user --mount \
strace -o /tmp/log \
umount /etc/shadow
umount: /etc/shadow: not mounted.
# grep '^umount' /tmp/log
umount2("/etc/shadow", 0) = -1 EINVAL (Invalid argument)
The error message from umount(8) is a little confusing, but the
strace(1) output reveals that the underlying umount2(2) system call
failed with the error EINVAL, which is the error that the kernel
returns to indicate that the mount is locked.
Note, however, that it is possible to stack (and unstack) a mount
on top of one of the inherited locked mounts in a less privileged
mount namespace:
# echo 'aaaaa' > /tmp/a # File to mount onto /etc/shadow
# unshare --user --map-root-user --mount \
sh -c 'mount --bind /tmp/a /etc/shadow; cat /etc/shadow'
aaaaa
# umount /etc/shadow
The final umount(8) command above, which is performed in the ini-
tial mount namespace, makes the original /etc/shadow file once more
visible in that namespace.
[4] Following on from point [3], note that it is possible to umount(2)
an entire subtree of mounts that propagated as a unit into a less
privileged mount namespace, as illustrated in the following exam-
ple.
First, we create new user and mount namespaces using unshare(1).
In the new mount namespace, the propagation type of all mounts is
set to private. We then create a shared bind mount at /mnt, and a
small hierarchy of mounts underneath that mount.
$ PS1='ns1# ' sudo unshare --user --map-root-user \
--mount --propagation private bash
ns1# echo $$ # We need the PID of this shell later
778501
ns1# mount --make-shared --bind /mnt /mnt
ns1# mkdir /mnt/x
ns1# mount --make-private -t tmpfs none /mnt/x
ns1# mkdir /mnt/x/y
ns1# mount --make-private -t tmpfs none /mnt/x/y
ns1# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
986 83 8:5 /mnt /mnt rw,relatime shared:344
989 986 0:56 / /mnt/x rw,relatime
990 989 0:57 / /mnt/x/y rw,relatime
Continuing in the same shell session, we then create a second shell
in a new user namespace and a new (less privileged) mount namespace
and check the state of the propagated mounts rooted at /mnt.
ns1# PS1='ns2# ' unshare --user --map-root-user \
--mount --propagation unchanged bash
ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
1239 1204 8:5 /mnt /mnt rw,relatime master:344
1240 1239 0:56 / /mnt/x rw,relatime
1241 1240 0:57 / /mnt/x/y rw,relatime
Of note in the above output is that the propagation type of the
mount /mnt has been reduced to slave, as explained in point [2].
This means that submount events will propagate from the master /mnt
in "ns1", but propagation will not occur in the opposite direction.
From a separate terminal window, we then use nsenter(1) to enter
the mount and user namespaces corresponding to "ns1". In that ter-
minal window, we then recursively bind mount /mnt/x at the location
/mnt/ppp.
$ PS1='ns3# ' sudo nsenter -t 778501 --user --mount
ns3# mount --rbind --make-private /mnt/x /mnt/ppp
ns3# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
986 83 8:5 /mnt /mnt rw,relatime shared:344
989 986 0:56 / /mnt/x rw,relatime
990 989 0:57 / /mnt/x/y rw,relatime
1242 986 0:56 / /mnt/ppp rw,relatime
1243 1242 0:57 / /mnt/ppp/y rw,relatime shared:518
Because the propagation type of the parent mount, /mnt, was shared,
the recursive bind mount propagated a small subtree of mounts under
the slave mount /mnt into "ns2", as can be verified by executing
the following command in that shell session:
ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
1239 1204 8:5 /mnt /mnt rw,relatime master:344
1240 1239 0:56 / /mnt/x rw,relatime
1241 1240 0:57 / /mnt/x/y rw,relatime
1244 1239 0:56 / /mnt/ppp rw,relatime
1245 1244 0:57 / /mnt/ppp/y rw,relatime master:518
While it is not possible to umount(2) a part of the propagated sub-
tree (/mnt/ppp/y) in "ns2", it is possible to umount(2) the entire
subtree, as shown by the following commands:
ns2# umount /mnt/ppp/y
umount: /mnt/ppp/y: not mounted.
ns2# umount -l /mnt/ppp # Succeeds...
ns2# grep /mnt /proc/self/mountinfo | sed 's/ - .*//'
1239 1204 8:5 /mnt /mnt rw,relatime master:344
1240 1239 0:56 / /mnt/x rw,relatime
1241 1240 0:57 / /mnt/x/y rw,relatime
[5] The mount(2) flags MS_RDONLY, MS_NOSUID, and MS_NOEXEC, and the
"atime" flags (MS_NOATIME, MS_NODIRATIME, MS_RELATIME), become locked
when propagated from a more privileged to a less privileged mount
namespace, and may not be changed in the less privileged mount
namespace.
This point is illustrated in the following example where, in a more
privileged mount namespace, we create a bind mount that is marked
as read-only. For security reasons, it should not be possible to
make the mount writable in a less privileged mount namespace, and
indeed the kernel prevents this:
$ sudo mkdir /mnt/dir
$ sudo mount --bind -o ro /some/path /mnt/dir
$ sudo unshare --user --map-root-user --mount \
mount -o remount,rw /mnt/dir
mount: /mnt/dir: permission denied.
[6] A file or directory that is a mount point in one namespace but is
not a mount point in another namespace may be renamed, unlinked,
or removed (rmdir(2)) in the mount namespace in which it is not a
mount point (subject to the usual permission checks). As a
consequence, the mount point is removed in the mount namespace where it
was a mount point.
Previously (before Linux 3.18), attempting to unlink, rename, or
remove a file or directory that was a mount point in another mount
namespace would result in the error EBUSY. That behavior had tech-
nical problems of enforcement (e.g., for NFS) and permitted denial-
of-service attacks against more privileged users (i.e., preventing
individual files from being updated by bind mounting on top of
them).
EXAMPLES
See pivot_root(2).
SEE ALSO
unshare(1), clone(2), mount(2), mount_setattr(2), pivot_root(2),
setns(2), umount(2), unshare(2), proc(5), namespaces(7), user_name-
spaces(7), findmnt(8), mount(8), pam_namespace(8), pivot_root(8),
umount(8)
Documentation/filesystems/sharedsubtree.rst in the kernel source tree.
Linux man-pages 6.04 2023-03-30 mount_namespaces(7)