summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* cgroup: remove subsystem files when remounting cgroupGao feng2012-12-031-2/+9
| | | | | | | | | | | | | | | | | | cgroup_clear_directroy is called by cgroup_d_remove_dir and cgroup_remount. when we call cgroup_remount to remount the cgroup,the subsystem may be unlinked from cgroupfs_root->subsys_list in rebind_subsystem,this subsystem's files will not be removed in cgroup_clear_directroy. And the system will panic when we try to access these files. this patch removes subsystems's files before rebind_subsystems, if rebind_subsystems failed, repopulate these removed files. With help from Tejun. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: use cgroup_addrm_files() in cgroup_clear_directory()Gao feng2012-11-301-1/+3
| | | | | | | | | | | | | | | cgroup_clear_directory() incorrectly invokes cgroup_rm_file() on each cftset of the target subsystems, which only removes the first file of each set. This leaves dangling files after subsystems are removed from a cgroup root via remount. Use cgroup_addrm_files() to remove all files of target subsystems. tj: Move cgroup_addrm_files() prototype decl upwards next to other global declarations. Commit message updated. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: warn about broken hierarchies only after css_onlineGlauber Costa2012-11-301-9/+9
| | | | | | | | | | | | | If everything goes right, it shouldn't really matter if we are spitting this warning after css_alloc or css_online. If we fail between then, there are some ill cases where we would previously see the message and now we won't (like if the files fail to be created). I believe it really shouldn't matter: this message is intended in spirit to be shown when creation succeeds, but with insane settings. Signed-off-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: list_del_init() on removed eventsGreg Thelen2012-11-281-2/+2
| | | | | | | | | Use list_del_init() rather than list_del() to remove events from cgrp->event_list. No functional change. This is just defensive coding. Signed-off-by: Greg Thelen <gthelen@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: fix lockdep warning for event_controlGreg Thelen2012-11-281-3/+8
| | | | | | | | | | | | The cgroup_event_wake() function is called with the wait queue head locked and it takes cgrp->event_list_lock. However, in cgroup_rmdir() remove_wait_queue() was being called after taking cgrp->event_list_lock. Correct the lock ordering by using a temporary list to obtain the event list to remove from the wait queue. Signed-off-by: Greg Thelen <gthelen@google.com> Signed-off-by: Aaron Durbin <adurbin@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: move list add after list head initilizationLi Zhong2012-11-281-1/+1
| | | | | | | | | | | 2243076ad1 ("cgroup: initialize cgrp->allcg_node in init_cgroup_housekeeping()") initializes cgrp->allcg_node in init_cgroup_housekeeping(). Then in init_cgroup_root(), we should call init_cgroup_housekeeping() before adding it to &root->allcg_list; otherwise, we are initializing an entry already in a list. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* netprio_cgroup: allow nesting and inherit config on cgroup creationTejun Heo2012-11-222-18/+26
| | | | | | | | | | | | | | | | Inherit netprio configuration from ->css_online(), allow nesting and remove .broken_hierarchy marking. This makes netprio_cgroup's behavior match netcls_cgroup's. Note that this patch changes userland-visible behavior. Nesting is allowed and the first level cgroups below the root cgroup behave differently - they inherit priorities from the root cgroup on creation instead of starting with 0. This is unfortunate but not doing so is much crazier. Signed-off-by: Tejun Heo <tj@kernel.org> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
* netprio_cgroup: implement netprio[_set]_prio() helpersTejun Heo2012-11-221-22/+50
| | | | | | | | | | | Introduce two helpers - netprio_prio() and netprio_set_prio() - which hide the details of priomap access and expansion. This will help implementing hierarchy support. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
* netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidxTejun Heo2012-11-222-61/+15
| | | | | | | | | | | | | | | | With priomap expansion no longer depending on knowing max id allocated, netprio_cgroup can use cgroup->id insted of cs->prioidx. Drop prioidx alloc/free logic and convert all uses to cgroup->id. * In cgrp_css_alloc(), parent->id test is moved above @cs allocation to simplify error path. * In cgrp_css_free(), @cs assignment is made initialization. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
* netprio_cgroup: reimplement priomap expansionTejun Heo2012-11-221-23/+33
| | | | | | | | | | | | | | | | | | | | netprio kept track of the highest prioidx allocated and resized priomaps accordingly when necessary. This makes it necessary to keep track of prioidx allocation and may end up resizing on every new prioidx. Update extend_netdev_table() such that it takes @target_idx which the priomap should be able to accomodate. If the priomap is large enough, nothing happens; otherwise, the size is doubled until @target_idx can be accomodated. This makes max_prioidx and write_update_netdev_table() unnecessary. write_priomap() now calls extend_netdev_table() directly. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
* netprio_cgroup: shorten variable names in extend_netdev_table()Tejun Heo2012-11-221-12/+11
| | | | | | | | | | | | The function is about to go through a rewrite. In preparation, shorten the variable names so that we don't repeat "priomap" so often. This patch is cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
* netprio_cgroup: simplify write_priomap()Tejun Heo2012-11-221-45/+11
| | | | | | | | | sscanf() doesn't bite. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
* netcls_cgroup: move config inheritance to ->css_online() and remove ↵Tejun Heo2012-11-221-12/+8
| | | | | | | | | | | | | | | | | | | | .broken_hierarchy marking It turns out that we'll have to live with attributes which are inherited at cgroup creation time but not affected by further updates to the parent afterwards - such attributes are already in wide use e.g. for cpuset. So, there's nothing to do for netcls_cgroup for hierarchy support. Its current behavior - inherit only during creation - is good enough. Move config inheriting from ->css_alloc() to ->css_online() for consistency, which doesn't change behavior at all, and remove .broken_hierarchy marking. Signed-off-by: Tejun Heo <tj@kernel.org> Tested-and-Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: David S. Miller <davem@davemloft.net>
* cgroup: remove obsolete guarantee from cgroup_task_migrate.Tao Ma2012-11-202-7/+5
| | | | | | | | | | | 'guarantee' is already removed from cgroup_task_migrate, so remove the corresponding comments. Some other typos in cgroup are also changed. Cc: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: add cgroup->idTejun Heo2012-11-192-1/+16
| | | | | | | | | | | | | | | With the introduction of generic cgroup hierarchy iterators, css_id is being phased out. It was unnecessarily complex, id'ing the wrong thing (cgroups need IDs, not CSSes) and has other oddities like not being available at ->css_alloc(). This patch adds cgroup->id, which is a simple per-hierarchy ida-allocated ID which is assigned before ->css_alloc() and released after ->css_free(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com>
* cgroup, cpuset: remove cgroup_subsys->post_clone()Tejun Heo2012-11-194-57/+36
| | | | | | | | | | | | | | | | | | | | | Currently CGRP_CPUSET_CLONE_CHILDREN triggers ->post_clone(). Now that clone_children is cpuset specific, there's no reason to have this rather odd option activation mechanism in cgroup core. cpuset can check the flag from its ->css_allocate() and take the necessary action. Move cpuset_post_clone() logic to the end of cpuset_css_alloc() and remove cgroup_subsys->post_clone(). Loosely based on Glauber's "generalize post_clone into post_create" patch. Signed-off-by: Tejun Heo <tj@kernel.org> Original-patch-by: Glauber Costa <glommer@parallels.com> Original-patch: <1351686554-22592-2-git-send-email-glommer@parallels.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Glauber Costa <glommer@parallels.com>
* cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/Tejun Heo2012-11-193-23/+19
| | | | | | | | | | | | clone_children is only meaningful for cpuset and will stay that way. Rename the flag to reflect that and update documentation. Also, drop clone_children() wrapper in cgroup.c. The thin wrapper is used only a few times and one of them will go away soon. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Glauber Costa <glommer@parallels.com>
* cgroup: rename ->create/post_create/pre_destroy/destroy() to ↵Tejun Heo2012-11-1913-121/+132
| | | | | | | | | | ->css_alloc/online/offline/free() Rename cgroup_subsys css lifetime related callbacks to better describe what their roles are. Also, update documentation. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: allow ->post_create() to failTejun Heo2012-11-193-12/+23
| | | | | | | | | | | There could be cases where controllers want to do initialization operations which may fail from ->post_create(). This patch makes ->post_create() return -errno to indicate failure and online_css() relay such failures. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Glauber Costa <glommer@parallels.com>
* cgroup: update cgroup_create() failure pathTejun Heo2012-11-191-7/+14
| | | | | | | | | | | | cgroup_create() was ignoring failure of cgroupfs files. Update it such that, if file creation fails, it rolls back by calling cgroup_destroy_locked() and returns failure. Note that error out goto labels are renamed. The labels are a bit confusing but will become better w/ later cgroup operation renames. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: use mutex_trylock() when grabbing i_mutex of a new cgroup directoryTejun Heo2012-11-191-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | All cgroup directory i_mutexes nest outside cgroup_mutex; however, new directory creation is a special case. A new cgroup directory is created while holding cgroup_mutex. Populating the new directory requires both the new directory's i_mutex and cgroup_mutex. Because all directory i_mutexes nest outside cgroup_mutex, grabbing both requires releasing cgroup_mutex first, which isn't a good idea as the new cgroup isn't yet ready to be manipulated by other cgroup opreations. This is worked around by grabbing the new directory's i_mutex while holding cgroup_mutex before making it visible. As there's no other user at that point, grabbing the i_mutex under cgroup_mutex can't lead to deadlock. cgroup_create_file() was using I_MUTEX_CHILD to tell lockdep not to worry about the reverse locking order; however, this creates pseudo locking dependency cgroup_mutex -> I_MUTEX_CHILD, which isn't true - all directory i_mutexes are still nested outside cgroup_mutex. This pseudo locking dependency can lead to spurious lockdep warnings. Use mutex_trylock() instead. This will always succeed and lockdep doesn't create any locking dependency for it. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: simplify cgroup_load_subsys() failure pathTejun Heo2012-11-191-11/+10
| | | | | | | | | | | | | | | | Now that cgroup_unload_subsys() can tell whether the root css is online or not, we can safely call cgroup_unload_subsys() after idr init failure in cgroup_load_subsys(). Replace the manual unrolling and invoke cgroup_unload_subsys() on failure. This drops cgroup_mutex inbetween but should be safe as the subsystem will fail try_module_get() and thus can't be mounted inbetween. As this means that cgroup_unload_subsys() can be called before css_sets are rehashed, remove BUG_ON() on %NULL css_set->subsys[] from cgroup_unload_subsys(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: introduce CSS_ONLINE flag and on/offline_css() helpersTejun Heo2012-11-192-23/+43
| | | | | | | | | | | | | | | New helpers on/offline_css() respectively wrap ->post_create() and ->pre_destroy() invocations. online_css() sets CSS_ONLINE after ->post_create() is complete and offline_css() invokes ->pre_destroy() iff CSS_ONLINE is set and clears it while also handling the temporary dropping of cgroup_mutex. This patch doesn't introduce any behavior change at the moment but will be used to improve cgroup_create() failure path and allow ->post_create() to fail. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: separate out cgroup_destroy_locked()Tejun Heo2012-11-191-15/+25
| | | | | | | | | | | | | Separate out cgroup_destroy_locked() from cgroup_destroy(). This will be later used in cgroup_create() failure path. While at it, add lockdep asserts on i_mutex and cgroup_mutex, and move @d and @parent assignments to their declarations. This patch doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: fix harmless bugs in cgroup_load_subsys() fail path and ↵Tejun Heo2012-11-191-1/+14
| | | | | | | | | | | | | | | | | cgroup_unload_subsys() * If idr init fails, cgroup_load_subsys() cleared dummytop->subsys[] before calilng ->destroy() making CSS inaccessible to the callback, and didn't unlink ss->sibling. As no modular controller uses ->use_id, this doesn't cause any actual problems. * cgroup_unload_subsys() was forgetting to free idr, call ->pre_destroy() and clear ->active. As there currently is no modular controller which uses ->use_id, ->pre_destroy() or ->active, this doesn't cause any actual problems. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: lock cgroup_mutex in cgroup_init_subsys()Tejun Heo2012-11-191-0/+4
| | | | | | | | | | | Make cgroup_init_subsys() grab cgroup_mutex while initializing a subsystem so that all helpers and callbacks are called under the context they expect. This isn't strictly necessary as cgroup_init_subsys() doesn't race with anybody but will allow adding lockdep assertions. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: trivial cleanup for cgroup_init/load_subsys()Tejun Heo2012-11-191-3/+3
| | | | | | | | Consistently use @css and @dummytop in these two functions instead of referring to them indirectly. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: make CSS_* flags bit masks instead of bit positionsTejun Heo2012-11-192-5/+5
| | | | | | | | | | | Currently, CSS_* flags are defined as bit positions and manipulated using atomic bitops. There's no reason to use atomic bitops for them and bit positions are clunkier to deal with than bit masks. Make CSS_* bit masks instead and use the usual C bitwise operators to access them. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: cgroup->dentry isn't a RCU pointerTejun Heo2012-11-192-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | cgroup->dentry is marked and used as a RCU pointer; however, it isn't one - the final dentry put doesn't go through call_rcu(). cgroup and dentry share the same RCU freeing rule via synchronize_rcu() in cgroup_diput() (kfree_rcu() used on cgrp is unnecessary). If cgrp is accessible under RCU read lock, so is its dentry and dereferencing cgrp->dentry doesn't need any further RCU protection or annotation. While not being accurate, before the previous patch, the RCU accessors served a purpose as memory barriers - cgroup->dentry used to be assigned after the cgroup was made visible to cgroup_path(), so the assignment and dereferencing in cgroup_path() needed the memory barrier pair. Now that list_add_tail_rcu() happens after cgroup->dentry is assigned, this no longer is necessary. Remove the now unnecessary and misleading RCU annotations from cgroup->dentry. To make up for the removal of rcu_dereference_check() in cgroup_path(), add an explicit rcu_lockdep_assert(), which asserts the dereference rule of @cgrp, not cgrp->dentry. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: create directory before linking while creating a new cgroupTejun Heo2012-11-191-18/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | While creating a new cgroup, cgroup_create() links the newly allocated cgroup into various places before trying to create its directory. Because cgroup life-cycle is tied to the vfs objects, this makes it impossible to use cgroup_rmdir() for rolling back creation - the removal logic depends on having full vfs objects. This patch moves directory creation above linking and collect linking operations to one place. This allows directory creation failure to share error exit path with css allocation failures and any failure sites afterwards (to be added later) can use cgroup_rmdir() logic to undo creation. Note that this also makes the memory barriers around cgroup->dentry, which currently is misleadingly using RCU operations, unnecessary. This will be handled in the next patch. While at it, locking BUG_ON() on i_mutex is converted to lockdep_assert_held(). v2: Patch originally removed %NULL dentry check in cgroup_path(); however, Li pointed out that this patch doesn't make it unnecessary as ->create() may call cgroup_path(). Drop the change for now. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: open-code cgroup_create_dir()Tejun Heo2012-11-191-25/+5
| | | | | | | | | | | | The operation order of cgroup creation is about to change and cgroup_create_dir() is more of a hindrance than a proper abstraction. Open-code it by moving the parent nlink adjustment next to self nlink adjustment in cgroup_create_file() and the rest to cgroup_create(). This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: initialize cgrp->allcg_node in init_cgroup_housekeeping()Tejun Heo2012-11-191-0/+1
| | | | | | | | Not strictly necessary but it's annoying to have uninitialized list_head around. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: remove incorrect dget/dput() pair in cgroup_create_dir()Tejun Heo2012-11-191-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgroup_create_dir() does weird dancing with dentry refcnt. On success, it gets and then puts it achieving nothing. On failure, it puts but there isn't no matching get anywhere leading to the following oops if cgroup_create_file() fails for whatever reason. ------------[ cut here ]------------ kernel BUG at /work/os/work/fs/dcache.c:552! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU 2 Pid: 697, comm: mkdir Not tainted 3.7.0-rc4-work+ #3 Bochs Bochs RIP: 0010:[<ffffffff811d9c0c>] [<ffffffff811d9c0c>] dput+0x1dc/0x1e0 RSP: 0018:ffff88001a3ebef8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88000e5b1ef8 RCX: 0000000000000403 RDX: 0000000000000303 RSI: 2000000000000000 RDI: ffff88000e5b1f58 RBP: ffff88001a3ebf18 R08: ffffffff82c76960 R09: 0000000000000001 R10: ffff880015022080 R11: ffd9bed70f48a041 R12: 00000000ffffffea R13: 0000000000000001 R14: ffff88000e5b1f58 R15: 00007fff57656d60 FS: 00007ff05fcb3800(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004046f0 CR3: 000000001315f000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process mkdir (pid: 697, threadinfo ffff88001a3ea000, task ffff880015022080) Stack: ffff88001a3ebf48 00000000ffffffea 0000000000000001 0000000000000000 ffff88001a3ebf38 ffffffff811cc889 0000000000000001 ffff88000e5b1ef8 ffff88001a3ebf68 ffffffff811d1fc9 ffff8800198d7f18 ffff880019106ef8 Call Trace: [<ffffffff811cc889>] done_path_create+0x19/0x50 [<ffffffff811d1fc9>] sys_mkdirat+0x59/0x80 [<ffffffff811d2009>] sys_mkdir+0x19/0x20 [<ffffffff81be1e02>] system_call_fastpath+0x16/0x1b Code: 00 48 8d 90 18 01 00 00 48 89 93 c0 00 00 00 4c 89 a0 18 01 00 00 48 8b 83 a0 00 00 00 83 80 28 01 00 00 01 e8 e6 6f a0 00 eb 92 <0f> 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 RIP [<ffffffff811d9c0c>] dput+0x1dc/0x1e0 RSP <ffff88001a3ebef8> ---[ end trace 1277bcfd9561ddb0 ]--- Fix it by dropping the unnecessary dget/dput() pair. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com> Cc: stable@vger.kernel.org
* cgroup_freezer: implement proper hierarchy supportTejun Heo2012-11-092-59/+165
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up until now, cgroup_freezer didn't implement hierarchy properly. cgroups could be arranged in hierarchy but it didn't make any difference in how each cgroup_freezer behaved. They all operated separately. This patch implements proper hierarchy support. If a cgroup is frozen, all its descendants are frozen. A cgroup is thawed iff it and all its ancestors are THAWED. freezer.self_freezing shows the current freezing state for the cgroup itself. freezer.parent_freezing shows whether the cgroup is freezing because any of its ancestors is freezing. freezer_post_create() locks the parent and new cgroup and inherits the parent's state and freezer_change_state() applies new state top-down using cgroup_for_each_descendant_pre() which guarantees that no child can escape its parent's state. update_if_frozen() uses cgroup_for_each_descendant_post() to propagate frozen states bottom-up. Synchronization could be coarser and easier by using a single mutex to protect all hierarchy operations. Finer grained approach was used because it wasn't too difficult for cgroup_freezer and I think it's beneficial to have an example implementation and cgroup_freezer is rather simple and can serve a good one. As this makes cgroup_freezer properly hierarchical, freezer_subsys.broken_hierarchy marking is removed. Note that this patch changes userland visible behavior - freezing a cgroup now freezes all its descendants too. This behavior change is intended and has been warned via .broken_hierarchy. v2: Michal spotted a bug in freezer_change_state() - descendants were inheriting from the wrong ancestor. Fixed. v3: Documentation/cgroups/freezer-subsystem.txt updated. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup_freezer: add ->post_create() and ->pre_destroy() and track online stateTejun Heo2012-11-091-2/+40
| | | | | | | | | | | | | | | | | | | | | | A cgroup is online and visible to iteration between ->post_create() and ->pre_destroy(). This patch introduces CGROUP_FREEZER_ONLINE and toggles it from the newly added freezer_post_create() and freezer_pre_destroy() while holding freezer->lock such that a cgroup_freezer can be reilably distinguished to be online. This will be used by full hierarchy support. ONLINE test is added to freezer_apply_state() but it currently doesn't make any difference as freezer_write() can only be called for an online cgroup. Adjusting system_freezing_cnt on destruction is moved from freezer_destroy() to the new freezer_pre_destroy() for consistency. This patch doesn't introduce any noticeable behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup_freezer: introduce CGROUP_FREEZING_[SELF|PARENT]Tejun Heo2012-11-091-8/+47
| | | | | | | | | | | | | | | | Introduce FREEZING_SELF and FREEZING_PARENT and make FREEZING OR of the two flags. This is to prepare for full hierarchy support. freezer_apply_date() is updated such that it can handle setting and clearing of both flags. The two flags are also exposed to userland via read-only files self_freezing and parent_freezing. Other than the added cgroupfs files, this patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup_freezer: make freezer->state mask of flagsTejun Heo2012-11-091-33/+27
| | | | | | | | | | | | | | | | freezer->state was an enum value - one of THAWED, FREEZING and FROZEN. As the scheduled full hierarchy support requires more than one freezing condition, switch it to mask of flags. If FREEZING is not set, it's thawed. FREEZING is set if freezing or frozen. If frozen, both FREEZING and FROZEN are set. Now that tasks can be attached to an already frozen cgroup, this also makes freezing condition checks more natural. This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup_freezer: prepare freezer_change_state() for full hierarchy supportTejun Heo2012-11-091-18/+30
| | | | | | | | | | | | | | | * Make freezer_change_state() take bool @freeze instead of enum freezer_state. * Separate out freezer_apply_state() out of freezer_change_state(). This makes freezer_change_state() a rather silly thin wrapper. It will be filled with hierarchy handling later on. This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup_freezer: trivial cleanupsTejun Heo2012-11-091-22/+19
| | | | | | | | | | | | * Clean-up indentation and line-breaks. Drop the invalid comment about freezer->lock. * Make all internal functions take @freezer instead of both @cgroup and @freezer. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz>
* cgroup: implement generic child / descendant walk macrosTejun Heo2012-11-092-0/+180
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, cgroup doesn't provide any generic helper for walking a given cgroup's children or descendants. This patch adds the following three macros. * cgroup_for_each_child() - walk immediate children of a cgroup. * cgroup_for_each_descendant_pre() - visit all descendants of a cgroup in pre-order tree traversal. * cgroup_for_each_descendant_post() - visit all descendants of a cgroup in post-order tree traversal. All three only require the user to hold RCU read lock during traversal. Verifying that each iterated cgroup is online is the responsibility of the user. When used with proper synchronization, cgroup_for_each_descendant_pre() can be used to propagate state updates to descendants in reliable way. See comments for details. v2: s/config/state/ in commit message and comments per Michal. More documentation on synchronization rules. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujisu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: use rculist ops for cgroup->childrenTejun Heo2012-11-092-5/+4
| | | | | | | | | | | | | | | Use RCU safe list operations for cgroup->children. This will be used to implement cgroup children / descendant walking which can be used by controllers. Note that cgroup_create() now puts a new cgroup at the end of the ->children list instead of head. This isn't strictly necessary but is done so that the iteration order is more conventional. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: add cgroup_subsys->post_create()Tejun Heo2012-11-092-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there's no way for a controller to find out whether a new cgroup finished all ->create() allocatinos successfully and is considered "live" by cgroup. This becomes a problem later when we add generic descendants walking to cgroup which can be used by controllers as controllers don't have a synchronization point where it can synchronize against new cgroups appearing in such walks. This patch adds ->post_create(). It's called after all ->create() succeeded and the cgroup is linked into the generic cgroup hierarchy. This plays the counterpart of ->pre_destroy(). When used in combination with the to-be-added generic descendant iterators, ->post_create() can be used to implement reliable state inheritance. It will be explained with the descendant iterators. v2: Added a paragraph about its future use w/ descendant iterators per Michal. v3: Forgot to add ->post_create() invocation to cgroup_load_subsys(). Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Glauber Costa <glommer@parallels.com>
* cgroup: set 'start' with the right value in cgroup_path.Tao Ma2012-11-081-2/+2
| | | | | | | | | 'start' is set to buf + buflen and do the '--' immediately. Just set it to 'buf + buflen - 1' directly. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Li Zefan <lizefan@huawei.com>
* device_cgroup: add lockdep assertsTejun Heo2012-11-061-0/+12
| | | | | | | | | | | | device_cgroup uses RCU safe ->exceptions list which is write-protected by devcgroup_mutex and has had some issues using locking correctly. Add lockdep asserts to utility functions so that future errors can be easily detected. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Li Zefan <lizefan@huawei.com>
* Merge branch 'cgroup/for-3.7-fixes' into cgroup/for-3.8Tejun Heo2012-11-061187-12717/+16301
|\ | | | | | | | | | | | | This is to receive device_cgroup fixes so that further device_cgroup changes can be made in cgroup/for-3.8. Signed-off-by: Tejun Heo <tj@kernel.org>
| * device_cgroup: fix RCU usageTejun Heo2012-11-061-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dev_cgroup->exceptions is protected with devcgroup_mutex for writes and RCU for reads; however, RCU usage isn't correct. * dev_exception_clean() doesn't use RCU variant of list_del() and kfree(). The function can race with may_access() and may_access() may end up dereferencing already freed memory. Use list_del_rcu() and kfree_rcu() instead. * may_access() may be called only with RCU read locked but doesn't use RCU safe traversal over ->exceptions. Use list_for_each_entry_rcu(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Cc: stable@vger.kernel.org Cc: Aristeu Rozanski <aris@redhat.com> Cc: Li Zefan <lizefan@huawei.com>
| * device_cgroup: fix unchecked cgroup parent usageAristeu Rozanski2012-11-061-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 4cef7299b478687 ("device_cgroup: add proper checking when changing default behavior") the cgroup parent usage is unchecked. root will not have a parent and trying to use device.{allow,deny} will cause problems. For some reason my stressing scripts didn't test the root directory so I didn't catch it on my regular tests. Signed-off-by: Aristeu Rozanski <aris@redhat.com> Cc: Li Zefan <lizefan@huawei.com> Cc: James Morris <jmorris@namei.org> Cc: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
| * Linux 3.7-rc4v3.7-rc4Linus Torvalds2012-11-041-1/+1
| |
| * Merge tag 'nfs-for-3.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2012-11-0311-35/+110
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: - Fix a bunch of deadlock situations: * State recovery can deadlock if we fail to release sequence ids before scheduling the recovery thread. * Calling deactivate_super() from an RPC workqueue thread can deadlock because of the call to rpc_shutdown_client. - Display the device name correctly in /proc/*/mounts - Fix a number of incorrect error return values: * When NFSv3 mounts fail due to a timeout. * On NFSv4.1 backchannel setup failure * On NFSv4 open access checks - pnfs_find_alloc_layout() must check the layout pointer for NULL - Fix a regression in the legacy DNS resolved * tag 'nfs-for-3.7-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS4: nfs4_opendata_access should return errno NFSv4: Initialise the NFSv4.1 slot table highest_used_slotid correctly SUNRPC: return proper errno from backchannel_rqst NFS: add nfs_sb_deactive_async to avoid deadlock nfs: Show original device name verbatim in /proc/*/mount{s,info} nfsv3: Make v3 mounts fail with ETIMEDOUTs instead EIO on mountd timeouts nfs: Check whether a layout pointer is NULL before free it NFS: fix bug in legacy DNS resolver. NFSv4: nfs4_locku_done must release the sequence id NFSv4.1: We must release the sequence id when we fail to get a session slot NFS: Wait for session recovery to finish before returning
| | * NFS4: nfs4_opendata_access should return errnoWeston Andros Adamson2012-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | Return errno - not an NFS4ERR_. This worked because NFS4ERR_ACCESS == EACCES. Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>