diff options
Diffstat (limited to 'Documentation/cgroups')
-rw-r--r-- | Documentation/cgroups/00-INDEX | 18 | ||||
-rw-r--r-- | Documentation/cgroups/cgroups.txt | 36 | ||||
-rw-r--r-- | Documentation/cgroups/cpusets.txt | 12 | ||||
-rw-r--r-- | Documentation/cgroups/devices.txt | 2 | ||||
-rw-r--r-- | Documentation/cgroups/memcg_test.txt | 22 | ||||
-rw-r--r-- | Documentation/cgroups/memory.txt | 2 |
6 files changed, 73 insertions, 19 deletions
diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX new file mode 100644 index 000000000000..3f58fa3d6d00 --- /dev/null +++ b/Documentation/cgroups/00-INDEX @@ -0,0 +1,18 @@ +00-INDEX + - this file +cgroups.txt + - Control Groups definition, implementation details, examples and API. +cpuacct.txt + - CPU Accounting Controller; account CPU usage for groups of tasks. +cpusets.txt + - documents the cpusets feature; assign CPUs and Mem to a set of tasks. +devices.txt + - Device Whitelist Controller; description, interface and security. +freezer-subsystem.txt + - checkpointing; rationale to not use signals, interface. +memcg_test.txt + - Memory Resource Controller; implementation details. +memory.txt + - Memory Resource Controller; design, accounting, interface, testing. +resource_counter.txt + - Resource Counter API. diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 93feb8444489..6eb1a97e88ce 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt @@ -56,7 +56,7 @@ hierarchy, and a set of subsystems; each subsystem has system-specific state attached to each cgroup in the hierarchy. Each hierarchy has an instance of the cgroup virtual filesystem associated with it. -At any one time there may be multiple active hierachies of task +At any one time there may be multiple active hierarchies of task cgroups. Each hierarchy is a partition of all tasks in the system. User level code may create and destroy cgroups by name in an @@ -124,10 +124,10 @@ following lines: / \ Prof (15%) students (5%) -Browsers like firefox/lynx go into the WWW network class, while (k)nfsd go +Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd go into NFS network class. -At the same time firefox/lynx will share an appropriate CPU/Memory class +At the same time Firefox/Lynx will share an appropriate CPU/Memory class depending on who launched it (prof/student). With the ability to classify tasks differently for different resources @@ -325,7 +325,7 @@ and then start a subshell 'sh' in that cgroup: Creating, modifying, using the cgroups can be done through the cgroup virtual filesystem. -To mount a cgroup hierarchy will all available subsystems, type: +To mount a cgroup hierarchy with all available subsystems, type: # mount -t cgroup xxx /dev/cgroup The "xxx" is not interpreted by the cgroup code, but will appear in @@ -333,12 +333,23 @@ The "xxx" is not interpreted by the cgroup code, but will appear in To mount a cgroup hierarchy with just the cpuset and numtasks subsystems, type: -# mount -t cgroup -o cpuset,numtasks hier1 /dev/cgroup +# mount -t cgroup -o cpuset,memory hier1 /dev/cgroup To change the set of subsystems bound to a mounted hierarchy, just remount with different options: +# mount -o remount,cpuset,ns hier1 /dev/cgroup -# mount -o remount,cpuset,ns /dev/cgroup +Now memory is removed from the hierarchy and ns is added. + +Note this will add ns to the hierarchy but won't remove memory or +cpuset, because the new options are appended to the old ones: +# mount -o remount,ns /dev/cgroup + +To Specify a hierarchy's release_agent: +# mount -t cgroup -o cpuset,release_agent="/sbin/cpuset_release_agent" \ + xxx /dev/cgroup + +Note that specifying 'release_agent' more than once will return failure. Note that changing the set of subsystems is currently only supported when the hierarchy consists of a single (root) cgroup. Supporting @@ -349,6 +360,11 @@ Then under /dev/cgroup you can find a tree that corresponds to the tree of the cgroups in the system. For instance, /dev/cgroup is the cgroup that holds the whole system. +If you want to change the value of release_agent: +# echo "/sbin/new_release_agent" > /dev/cgroup/release_agent + +It can also be changed via remount. + If you want to create a new cgroup under /dev/cgroup: # cd /dev/cgroup # mkdir my_cgroup @@ -476,11 +492,13 @@ cgroup->parent is still valid. (Note - can also be called for a newly-created cgroup if an error occurs after this subsystem's create() method has been called for the new cgroup). -void pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp); +int pre_destroy(struct cgroup_subsys *ss, struct cgroup *cgrp); Called before checking the reference count on each subsystem. This may be useful for subsystems which have some extra references even if -there are not tasks in the cgroup. +there are not tasks in the cgroup. If pre_destroy() returns error code, +rmdir() will fail with it. From this behavior, pre_destroy() can be +called multiple times against a cgroup. int can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, struct task_struct *task) @@ -521,7 +539,7 @@ always handled well. void post_clone(struct cgroup_subsys *ss, struct cgroup *cgrp) (cgroup_mutex held by caller) -Called at the end of cgroup_clone() to do any paramater +Called at the end of cgroup_clone() to do any parameter initialization which might be required before a task could attach. For example in cpusets, no task may attach before 'cpus' and 'mems' are set up. diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt index 0611e9528c7c..f9ca389dddf4 100644 --- a/Documentation/cgroups/cpusets.txt +++ b/Documentation/cgroups/cpusets.txt @@ -131,7 +131,7 @@ Cpusets extends these two mechanisms as follows: - The hierarchy of cpusets can be mounted at /dev/cpuset, for browsing and manipulation from user space. - A cpuset may be marked exclusive, which ensures that no other - cpuset (except direct ancestors and descendents) may contain + cpuset (except direct ancestors and descendants) may contain any overlapping CPUs or Memory Nodes. - You can list all the tasks (by pid) attached to any cpuset. @@ -226,7 +226,7 @@ nodes with memory--using the cpuset_track_online_nodes() hook. -------------------------------- If a cpuset is cpu or mem exclusive, no other cpuset, other than -a direct ancestor or descendent, may share any of the same CPUs or +a direct ancestor or descendant, may share any of the same CPUs or Memory Nodes. A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled", @@ -427,7 +427,7 @@ child cpusets have this flag enabled. When doing this, you don't usually want to leave any unpinned tasks in the top cpuset that might use non-trivial amounts of CPU, as such tasks may be artificially constrained to some subset of CPUs, depending on -the particulars of this flag setting in descendent cpusets. Even if +the particulars of this flag setting in descendant cpusets. Even if such a task could use spare CPU cycles in some other CPUs, the kernel scheduler might not consider the possibility of load balancing that task to that underused CPU. @@ -531,9 +531,9 @@ be idle. Of course it takes some searching cost to find movable tasks and/or idle CPUs, the scheduler might not search all CPUs in the domain -everytime. In fact, in some architectures, the searching ranges on +every time. In fact, in some architectures, the searching ranges on events are limited in the same socket or node where the CPU locates, -while the load balance on tick searchs all. +while the load balance on tick searches all. For example, assume CPU Z is relatively far from CPU X. Even if CPU Z is idle while CPU X and the siblings are busy, scheduler can't migrate @@ -601,7 +601,7 @@ its new cpuset, then the task will continue to use whatever subset of MPOL_BIND nodes are still allowed in the new cpuset. If the task was using MPOL_BIND and now none of its MPOL_BIND nodes are allowed in the new cpuset, then the task will be essentially treated as if it -was MPOL_BIND bound to the new cpuset (even though its numa placement, +was MPOL_BIND bound to the new cpuset (even though its NUMA placement, as queried by get_mempolicy(), doesn't change). If a task is moved from one cpuset to another, then the kernel will adjust the tasks memory placement, as above, the next time that the kernel attempts diff --git a/Documentation/cgroups/devices.txt b/Documentation/cgroups/devices.txt index 7cc6e6a60672..57ca4c89fe5c 100644 --- a/Documentation/cgroups/devices.txt +++ b/Documentation/cgroups/devices.txt @@ -42,7 +42,7 @@ suffice, but we can decide the best way to adequately restrict movement as people get some experience with this. We may just want to require CAP_SYS_ADMIN, which at least is a separate bit from CAP_MKNOD. We may want to just refuse moving to a cgroup which -isn't a descendent of the current one. Or we may want to use +isn't a descendant of the current one. Or we may want to use CAP_MAC_ADMIN, since we really are trying to lock down root. CAP_SYS_ADMIN is needed to modify the whitelist or move another diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt index 523a9c16c400..72db89ed0609 100644 --- a/Documentation/cgroups/memcg_test.txt +++ b/Documentation/cgroups/memcg_test.txt @@ -1,5 +1,5 @@ Memory Resource Controller(Memcg) Implementation Memo. -Last Updated: 2009/1/19 +Last Updated: 2009/1/20 Base Kernel Version: based on 2.6.29-rc2. Because VM is getting complex (one of reasons is memcg...), memcg's behavior @@ -356,7 +356,25 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. (Shell-B) # move all tasks in /cgroup/test to /cgroup # /sbin/swapoff -a - # rmdir /test/cgroup + # rmdir /cgroup/test # kill malloc task. Of course, tmpfs v.s. swapoff test should be tested, too. + + 9.8 OOM-Killer + Out-of-memory caused by memcg's limit will kill tasks under + the memcg. When hierarchy is used, a task under hierarchy + will be killed by the kernel. + In this case, panic_on_oom shouldn't be invoked and tasks + in other groups shouldn't be killed. + + It's not difficult to cause OOM under memcg as following. + Case A) when you can swapoff + #swapoff -a + #echo 50M > /memory.limit_in_bytes + run 51M of malloc + + Case B) when you use mem+swap limitation. + #echo 50M > memory.limit_in_bytes + #echo 50M > memory.memsw.limit_in_bytes + run 51M of malloc diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index e1501964df1e..a98a7fe7aabb 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt @@ -302,7 +302,7 @@ will be charged as a new owner of it. unevictable - # of pages cannot be reclaimed.(mlocked etc) Below is depend on CONFIG_DEBUG_VM. - inactive_ratio - VM inernal parameter. (see mm/page_alloc.c) + inactive_ratio - VM internal parameter. (see mm/page_alloc.c) recent_rotated_anon - VM internal parameter. (see mm/vmscan.c) recent_rotated_file - VM internal parameter. (see mm/vmscan.c) recent_scanned_anon - VM internal parameter. (see mm/vmscan.c) |