From a1d220d9dafa8d76ba60a784a1016c3134e6a1e8 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Fri, 19 Jul 2024 13:41:52 +0200 Subject: nsfs: iterate through mount namespaces It is already possible to list mounts in other mount namespaces and to retrieve namespace file descriptors without having to go through procfs by deriving them from pidfds. Augment these abilities by adding the ability to retrieve information about a mount namespace via NS_MNT_GET_INFO. This will return the mount namespace id and the number of mounts currently in the mount namespace. The number of mounts can be used to size the buffer that needs to be used for listmount() and is in general useful without having to actually iterate through all the mounts. The structure is extensible. And add the ability to iterate through all mount namespaces over which the caller holds privilege returning the file descriptor for the next or previous mount namespace. To retrieve a mount namespace the caller must be privileged wrt to it's owning user namespace. This means that PID 1 on the host can list all mounts in all mount namespaces or that a container can list all mounts of its nested containers. Optionally pass a structure for NS_MNT_GET_INFO with NS_MNT_GET_{PREV,NEXT} to retrieve information about the mount namespace in one go. Both ioctls can be implemented for other namespace types easily. Together with recent api additions this means one can iterate through all mounts in all mount namespaces without ever touching procfs. Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-5-834113cab0d2@kernel.org Reviewed-by: Josef Bacik Reviewed-by: Jeff Layton Signed-off-by: Christian Brauner --- fs/namespace.c | 35 +++++++++++++++++++++++++++++++---- 1 file changed, 31 insertions(+), 4 deletions(-) (limited to 'fs/namespace.c') diff --git a/fs/namespace.c b/fs/namespace.c index 1a0b9e3089a2..3dabe49f7376 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -2060,14 +2060,41 @@ static bool is_mnt_ns_file(struct dentry *dentry) dentry->d_fsdata == &mntns_operations; } -static struct mnt_namespace *to_mnt_ns(struct ns_common *ns) +struct ns_common *from_mnt_ns(struct mnt_namespace *mnt) { - return container_of(ns, struct mnt_namespace, ns); + return &mnt->ns; } -struct ns_common *from_mnt_ns(struct mnt_namespace *mnt) +struct mnt_namespace *__lookup_next_mnt_ns(struct mnt_namespace *mntns, bool previous) { - return &mnt->ns; + guard(read_lock)(&mnt_ns_tree_lock); + for (;;) { + struct rb_node *node; + + if (previous) + node = rb_prev(&mntns->mnt_ns_tree_node); + else + node = rb_next(&mntns->mnt_ns_tree_node); + if (!node) + return ERR_PTR(-ENOENT); + + mntns = node_to_mnt_ns(node); + node = &mntns->mnt_ns_tree_node; + + if (!ns_capable_noaudit(mntns->user_ns, CAP_SYS_ADMIN)) + continue; + + /* + * Holding mnt_ns_tree_lock prevents the mount namespace from + * being freed but it may well be on it's deathbed. We want an + * active reference, not just a passive one here as we're + * persisting the mount namespace. + */ + if (!refcount_inc_not_zero(&mntns->ns.count)) + continue; + + return mntns; + } } static bool mnt_ns_loop(struct dentry *dentry) -- cgit v1.2.3