From d7eeac1913ff86a17f891cb4b73f03d4b94907d0 Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Tue, 12 Mar 2013 15:35:59 -0700 Subject: cgroup: hold cgroup_mutex before calling css_offline() cpuset no longer nests cgroup_mutex inside cpu_hotplug lock, so we don't have to release cgroup_mutex before calling css_offline(). Signed-off-by: Li Zefan Signed-off-by: Tejun Heo --- Documentation/cgroups/cgroups.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index bcf1a00b06a1..0028e888828c 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -580,6 +580,7 @@ propagation along the hierarchy. See the comment on cgroup_for_each_descendant_pre() for details. void css_offline(struct cgroup *cgrp); +(cgroup_mutex held by caller) This is the counterpart of css_online() and called iff css_online() has succeeded on @cgrp. This signifies the beginning of the end of -- cgit v1.2.3 From bd2953ebbb533aeda9b86c82a53d5197a9a38f1b Mon Sep 17 00:00:00 2001 From: Aristeu Rozanski Date: Fri, 15 Feb 2013 11:55:47 -0500 Subject: devcg: propagate local changes down the hierarchy This patch makes exception changes to propagate down in hierarchy respecting when possible local exceptions. New exceptions allowing additional access to devices won't be propagated, but it'll be possible to add an exception to access all of part of the newly allowed device(s). New exceptions disallowing access to devices will be propagated down and the local group's exceptions will be revalidated for the new situation. Example: A / \ B group behavior exceptions A allow "b 8:* rwm", "c 116:1 rw" B deny "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm" If a new exception is added to group A: # echo "c 116:* r" > A/devices.deny it'll propagate down and after revalidating B's local exceptions, the exception "c 116:2 rwm" will be removed. In case parent's exceptions change and local exceptions are not allowed anymore, they'll be deleted. v7: - do not allow behavior change when the cgroup has children - update documentation v6: fixed issues pointed by Serge Hallyn - only copy parent's exceptions while propagating behavior if the local behavior is different - while propagating exceptions, do not clear and copy parent's: it'd be against the premise we don't propagate access to more devices v5: fixed issues pointed by Serge Hallyn - updated documentation - not propagating when an exception is written to devices.allow - when propagating a new behavior, clean the local exceptions list if they're for a different behavior v4: fixed issues pointed by Tejun Heo - separated function to walk the tree and collect valid propagation targets v3: fixed issues pointed by Tejun Heo - update documentation - move css_online/css_offline changes to a new patch - use cgroup_for_each_descendant_pre() instead of own descendant walk - move exception_copy rework to a separared patch - move exception_clean rework to a separated patch v2: fixed issues pointed by Tejun Heo - instead of keeping the local settings that won't apply anymore, remove them Cc: Tejun Heo Cc: Serge Hallyn Signed-off-by: Aristeu Rozanski Signed-off-by: Tejun Heo --- Documentation/cgroups/devices.txt | 70 ++++++++++++++++++- security/device_cgroup.c | 139 ++++++++++++++++++++++++++++++++++++-- 2 files changed, 199 insertions(+), 10 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/devices.txt b/Documentation/cgroups/devices.txt index 16624a7f8222..3c1095ca02ea 100644 --- a/Documentation/cgroups/devices.txt +++ b/Documentation/cgroups/devices.txt @@ -13,9 +13,7 @@ either an integer or * for all. Access is a composition of r The root device cgroup starts with rwm to 'all'. A child device cgroup gets a copy of the parent. Administrators can then remove devices from the whitelist or add new entries. A child cgroup can -never receive a device access which is denied by its parent. However -when a device access is removed from a parent it will not also be -removed from the child(ren). +never receive a device access which is denied by its parent. 2. User Interface @@ -50,3 +48,69 @@ task to a new cgroup. (Again we'll probably want to change that). A cgroup may not be granted more permissions than the cgroup's parent has. + +4. Hierarchy + +device cgroups maintain hierarchy by making sure a cgroup never has more +access permissions than its parent. Every time an entry is written to +a cgroup's devices.deny file, all its children will have that entry removed +from their whitelist and all the locally set whitelist entries will be +re-evaluated. In case one of the locally set whitelist entries would provide +more access than the cgroup's parent, it'll be removed from the whitelist. + +Example: + A + / \ + B + + group behavior exceptions + A allow "b 8:* rwm", "c 116:1 rw" + B deny "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm" + +If a device is denied in group A: + # echo "c 116:* r" > A/devices.deny +it'll propagate down and after revalidating B's entries, the whitelist entry +"c 116:2 rwm" will be removed: + + group whitelist entries denied devices + A all "b 8:* rwm", "c 116:* rw" + B "c 1:3 rwm", "b 3:* rwm" all the rest + +In case parent's exceptions change and local exceptions are not allowed +anymore, they'll be deleted. + +Notice that new whitelist entries will not be propagated: + A + / \ + B + + group whitelist entries denied devices + A "c 1:3 rwm", "c 1:5 r" all the rest + B "c 1:3 rwm", "c 1:5 r" all the rest + +when adding "c *:3 rwm": + # echo "c *:3 rwm" >A/devices.allow + +the result: + group whitelist entries denied devices + A "c *:3 rwm", "c 1:5 r" all the rest + B "c 1:3 rwm", "c 1:5 r" all the rest + +but now it'll be possible to add new entries to B: + # echo "c 2:3 rwm" >B/devices.allow + # echo "c 50:3 r" >B/devices.allow +or even + # echo "c *:3 rwm" >B/devices.allow + +Allowing or denying all by writing 'a' to devices.allow or devices.deny will +not be possible once the device cgroups has children. + +4.1 Hierarchy (internal implementation) + +device cgroups is implemented internally using a behavior (ALLOW, DENY) and a +list of exceptions. The internal state is controlled using the same user +interface to preserve compatibility with the previous whitelist-only +implementation. Removal or addition of exceptions that will reduce the access +to devices will be propagated down the hierarchy. +For every propagated exception, the effective rules will be re-evaluated based +on current parent's access rules. diff --git a/security/device_cgroup.c b/security/device_cgroup.c index 16c9e1069be6..221967d4690c 100644 --- a/security/device_cgroup.c +++ b/security/device_cgroup.c @@ -49,6 +49,8 @@ struct dev_cgroup { struct cgroup_subsys_state css; struct list_head exceptions; enum devcg_behavior behavior; + /* temporary list for pending propagation operations */ + struct list_head propagate_pending; }; static inline struct dev_cgroup *css_to_devcgroup(struct cgroup_subsys_state *s) @@ -185,6 +187,11 @@ static void dev_exception_clean(struct dev_cgroup *dev_cgroup) __dev_exception_clean(dev_cgroup); } +static inline bool is_devcg_online(const struct dev_cgroup *devcg) +{ + return (devcg->behavior != DEVCG_DEFAULT_NONE); +} + /** * devcgroup_online - initializes devcgroup's behavior and exceptions based on * parent's @@ -235,6 +242,7 @@ static struct cgroup_subsys_state *devcgroup_css_alloc(struct cgroup *cgroup) if (!dev_cgroup) return ERR_PTR(-ENOMEM); INIT_LIST_HEAD(&dev_cgroup->exceptions); + INIT_LIST_HEAD(&dev_cgroup->propagate_pending); dev_cgroup->behavior = DEVCG_DEFAULT_NONE; parent_cgroup = cgroup->parent; @@ -413,6 +421,111 @@ static inline int may_allow_all(struct dev_cgroup *parent) return parent->behavior == DEVCG_DEFAULT_ALLOW; } +/** + * revalidate_active_exceptions - walks through the active exception list and + * revalidates the exceptions based on parent's + * behavior and exceptions. The exceptions that + * are no longer valid will be removed. + * Called with devcgroup_mutex held. + * @devcg: cgroup which exceptions will be checked + * + * This is one of the three key functions for hierarchy implementation. + * This function is responsible for re-evaluating all the cgroup's active + * exceptions due to a parent's exception change. + * Refer to Documentation/cgroups/devices.txt for more details. + */ +static void revalidate_active_exceptions(struct dev_cgroup *devcg) +{ + struct dev_exception_item *ex; + struct list_head *this, *tmp; + + list_for_each_safe(this, tmp, &devcg->exceptions) { + ex = container_of(this, struct dev_exception_item, list); + if (!parent_has_perm(devcg, ex)) + dev_exception_rm(devcg, ex); + } +} + +/** + * get_online_devcg - walks the cgroup tree and fills a list with the online + * groups + * @root: cgroup used as starting point + * @online: list that will be filled with online groups + * + * Must be called with devcgroup_mutex held. Grabs RCU lock. + * Because devcgroup_mutex is held, no devcg will become online or offline + * during the tree walk (see devcgroup_online, devcgroup_offline) + * A separated list is needed because propagate_behavior() and + * propagate_exception() need to allocate memory and can block. + */ +static void get_online_devcg(struct cgroup *root, struct list_head *online) +{ + struct cgroup *pos; + struct dev_cgroup *devcg; + + lockdep_assert_held(&devcgroup_mutex); + + rcu_read_lock(); + cgroup_for_each_descendant_pre(pos, root) { + devcg = cgroup_to_devcgroup(pos); + if (is_devcg_online(devcg)) + list_add_tail(&devcg->propagate_pending, online); + } + rcu_read_unlock(); +} + +/** + * propagate_exception - propagates a new exception to the children + * @devcg_root: device cgroup that added a new exception + * @ex: new exception to be propagated + * + * returns: 0 in case of success, != 0 in case of error + */ +static int propagate_exception(struct dev_cgroup *devcg_root, + struct dev_exception_item *ex) +{ + struct cgroup *root = devcg_root->css.cgroup; + struct dev_cgroup *devcg, *parent, *tmp; + int rc = 0; + LIST_HEAD(pending); + + get_online_devcg(root, &pending); + + list_for_each_entry_safe(devcg, tmp, &pending, propagate_pending) { + parent = cgroup_to_devcgroup(devcg->css.cgroup->parent); + + /* + * in case both root's behavior and devcg is allow, a new + * restriction means adding to the exception list + */ + if (devcg_root->behavior == DEVCG_DEFAULT_ALLOW && + devcg->behavior == DEVCG_DEFAULT_ALLOW) { + rc = dev_exception_add(devcg, ex); + if (rc) + break; + } else { + /* + * in the other possible cases: + * root's behavior: allow, devcg's: deny + * root's behavior: deny, devcg's: deny + * the exception will be removed + */ + dev_exception_rm(devcg, ex); + } + revalidate_active_exceptions(devcg); + + list_del_init(&devcg->propagate_pending); + } + return rc; +} + +static inline bool has_children(struct dev_cgroup *devcgroup) +{ + struct cgroup *cgrp = devcgroup->css.cgroup; + + return !list_empty(&cgrp->children); +} + /* * Modify the exception list using allow/deny rules. * CAP_SYS_ADMIN is needed for this. It's at least separate from CAP_MKNOD @@ -449,6 +562,9 @@ static int devcgroup_update_access(struct dev_cgroup *devcgroup, case 'a': switch (filetype) { case DEVCG_ALLOW: + if (has_children(devcgroup)) + return -EINVAL; + if (!may_allow_all(parent)) return -EPERM; dev_exception_clean(devcgroup); @@ -462,6 +578,9 @@ static int devcgroup_update_access(struct dev_cgroup *devcgroup, return rc; break; case DEVCG_DENY: + if (has_children(devcgroup)) + return -EINVAL; + dev_exception_clean(devcgroup); devcgroup->behavior = DEVCG_DEFAULT_DENY; break; @@ -556,22 +675,28 @@ static int devcgroup_update_access(struct dev_cgroup *devcgroup, dev_exception_rm(devcgroup, &ex); return 0; } - return dev_exception_add(devcgroup, &ex); + rc = dev_exception_add(devcgroup, &ex); + break; case DEVCG_DENY: /* * If the default policy is to deny by default, try to remove * an matching exception instead. And be silent about it: we * don't want to break compatibility */ - if (devcgroup->behavior == DEVCG_DEFAULT_DENY) { + if (devcgroup->behavior == DEVCG_DEFAULT_DENY) dev_exception_rm(devcgroup, &ex); - return 0; - } - return dev_exception_add(devcgroup, &ex); + else + rc = dev_exception_add(devcgroup, &ex); + + if (rc) + break; + /* we only propagate new restrictions */ + rc = propagate_exception(devcgroup, &ex); + break; default: - return -EINVAL; + rc = -EINVAL; } - return 0; + return rc; } static int devcgroup_access_write(struct cgroup *cgrp, struct cftype *cft, -- cgit v1.2.3 From 1ae65ae92d77542f81c68269bc2f15bcf1ee4d6a Mon Sep 17 00:00:00 2001 From: Rami Rosen Date: Wed, 3 Apr 2013 20:32:04 +0300 Subject: cgroups: Documentation/cgroup/cgroup.txt - a trivial fix. This trivial patch removes a word which appears twice in Documentation/cgroup/cgroup.txt. Signed-off-by: Rami Rosen Signed-off-by: Tejun Heo --- Documentation/cgroups/cgroups.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 0028e888828c..638bf17ff869 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -442,7 +442,7 @@ You can attach the current shell task by echoing 0: You can use the cgroup.procs file instead of the tasks file to move all threads in a threadgroup at once. Echoing the PID of any task in a threadgroup to cgroup.procs causes all tasks in that threadgroup to be -be attached to the cgroup. Writing 0 to cgroup.procs moves all tasks +attached to the cgroup. Writing 0 to cgroup.procs moves all tasks in the writing task's threadgroup. Note: Since every task is always a member of exactly one cgroup in each -- cgit v1.2.3 From 84cfb6ab484b442d5115eb3baf9db7d74a3ea626 Mon Sep 17 00:00:00 2001 From: Rami Rosen Date: Wed, 10 Apr 2013 14:41:17 +0300 Subject: cgroup: remove bind() method from cgroup_subsys. The bind() method of cgroup_subsys is not used in any of the controllers (cpuset, freezer, blkio, net_cls, memcg, net_prio, devices, perf, hugetlb, cpu and cpuacct) tj: Removed the entry on ->bind() from Documentation/cgroups/cgroups.txt. Also updated a couple paragraphs which were suggesting that dynamic re-binding may be implemented. It's not gonna. Signed-off-by: Rami Rosen Signed-off-by: Tejun Heo --- Documentation/cgroups/cgroups.txt | 20 +++++--------------- include/linux/cgroup.h | 2 -- kernel/cgroup.c | 4 ---- 3 files changed, 5 insertions(+), 21 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 638bf17ff869..2b51e12ce178 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -211,10 +211,9 @@ matches, and any of the requested subsystems are in use in an existing hierarchy, the mount will fail with -EBUSY. Otherwise, a new hierarchy is activated, associated with the requested subsystems. -It's not currently possible to bind a new subsystem to an active -cgroup hierarchy, or to unbind a subsystem from an active cgroup -hierarchy. This may be possible in future, but is fraught with nasty -error-recovery issues. +It's not possible to bind a new subsystem to an active cgroup +hierarchy, or to unbind a subsystem from an active cgroup +hierarchy. When a cgroup filesystem is unmounted, if there are any child cgroups created below the top-level cgroup, that hierarchy @@ -382,10 +381,8 @@ To Specify a hierarchy's release_agent: Note that specifying 'release_agent' more than once will return failure. -Note that changing the set of subsystems is currently only supported -when the hierarchy consists of a single (root) cgroup. Supporting -the ability to arbitrarily bind/unbind subsystems from an existing -cgroup hierarchy is intended to be implemented in the future. +Note that changing the set of subsystems is only supported when the +hierarchy consists of a single (root) cgroup. Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1 @@ -643,13 +640,6 @@ void exit(struct task_struct *task) Called during task exit. -void bind(struct cgroup *root) -(cgroup_mutex held by caller) - -Called when a cgroup subsystem is rebound to a different hierarchy -and root cgroup. Currently this will only involve movement between -the default hierarchy (which never has sub-cgroups) and a hierarchy -that is being created/destroyed (and hence has no sub-cgroups). 4. Extended attribute usage =========================== diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 515927eebb37..92acf8601ae0 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -483,8 +483,6 @@ struct cgroup_subsys { void (*fork)(struct task_struct *task); void (*exit)(struct cgroup *cgrp, struct cgroup *old_cgrp, struct task_struct *task); - void (*bind)(struct cgroup *root); - int subsys_id; int active; int disabled; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index ba3e24a76dae..fd38e1cfacca 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1064,16 +1064,12 @@ static int rebind_subsystems(struct cgroupfs_root *root, cgrp->subsys[i]->cgroup = cgrp; list_move(&ss->sibling, &root->subsys_list); ss->root = root; - if (ss->bind) - ss->bind(cgrp); /* refcount was already taken, and we're keeping it */ } else if (bit & removed_mask) { /* We're removing this subsystem */ BUG_ON(ss == NULL); BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]); BUG_ON(cgrp->subsys[i]->cgroup != cgrp); - if (ss->bind) - ss->bind(dummytop); dummytop->subsys[i]->cgroup = dummytop; cgrp->subsys[i] = NULL; subsys[i]->root = &rootnode; -- cgit v1.2.3 From 26d5bbe5ba2073fc7ef9e69a55543b2376f5bad0 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 12 Apr 2013 10:29:04 -0700 Subject: Revert "cgroup: remove bind() method from cgroup_subsys." This reverts commit 84cfb6ab484b442d5115eb3baf9db7d74a3ea626. There are scheduled changes which make use of the removed callback. Signed-off-by: Tejun Heo Cc: Rami Rosen Cc: Li Zefan --- Documentation/cgroups/cgroups.txt | 20 +++++++++++++++----- include/linux/cgroup.h | 2 ++ kernel/cgroup.c | 4 ++++ 3 files changed, 21 insertions(+), 5 deletions(-) (limited to 'Documentation') diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 2b51e12ce178..638bf17ff869 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -211,9 +211,10 @@ matches, and any of the requested subsystems are in use in an existing hierarchy, the mount will fail with -EBUSY. Otherwise, a new hierarchy is activated, associated with the requested subsystems. -It's not possible to bind a new subsystem to an active cgroup -hierarchy, or to unbind a subsystem from an active cgroup -hierarchy. +It's not currently possible to bind a new subsystem to an active +cgroup hierarchy, or to unbind a subsystem from an active cgroup +hierarchy. This may be possible in future, but is fraught with nasty +error-recovery issues. When a cgroup filesystem is unmounted, if there are any child cgroups created below the top-level cgroup, that hierarchy @@ -381,8 +382,10 @@ To Specify a hierarchy's release_agent: Note that specifying 'release_agent' more than once will return failure. -Note that changing the set of subsystems is only supported when the -hierarchy consists of a single (root) cgroup. +Note that changing the set of subsystems is currently only supported +when the hierarchy consists of a single (root) cgroup. Supporting +the ability to arbitrarily bind/unbind subsystems from an existing +cgroup hierarchy is intended to be implemented in the future. Then under /sys/fs/cgroup/rg1 you can find a tree that corresponds to the tree of the cgroups in the system. For instance, /sys/fs/cgroup/rg1 @@ -640,6 +643,13 @@ void exit(struct task_struct *task) Called during task exit. +void bind(struct cgroup *root) +(cgroup_mutex held by caller) + +Called when a cgroup subsystem is rebound to a different hierarchy +and root cgroup. Currently this will only involve movement between +the default hierarchy (which never has sub-cgroups) and a hierarchy +that is being created/destroyed (and hence has no sub-cgroups). 4. Extended attribute usage =========================== diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index a2b9d4b13369..45aee0fc6b98 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -484,6 +484,8 @@ struct cgroup_subsys { void (*fork)(struct task_struct *task); void (*exit)(struct cgroup *cgrp, struct cgroup *old_cgrp, struct task_struct *task); + void (*bind)(struct cgroup *root); + int subsys_id; int active; int disabled; diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 7bf3ce09c50c..678a22c75fdb 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1091,12 +1091,16 @@ static int rebind_subsystems(struct cgroupfs_root *root, cgrp->subsys[i]->cgroup = cgrp; list_move(&ss->sibling, &root->subsys_list); ss->root = root; + if (ss->bind) + ss->bind(cgrp); /* refcount was already taken, and we're keeping it */ } else if (bit & removed_mask) { /* We're removing this subsystem */ BUG_ON(ss == NULL); BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]); BUG_ON(cgrp->subsys[i]->cgroup != cgrp); + if (ss->bind) + ss->bind(dummytop); dummytop->subsys[i]->cgroup = dummytop; cgrp->subsys[i] = NULL; subsys[i]->root = &rootnode; -- cgit v1.2.3