sched: fair group: fix overflow(was: fix divide by zero)

I found a bug which can be reproduced this way (linux-2.6.26-rc5, x86-64; use 2^32, 2^33, ..., 2^63 as the shares value):

    # mkdir /dev/cpuctl
    # mount -t cgroup -o cpu cpuctl /dev/cpuctl
    # cd /dev/cpuctl
    # mkdir sub
    # echo 0x8000000000000000 > sub/cpu.shares
    # echo $$ > sub/tasks
    oops here! divide by zero.

This is because do_div() expects its 2nd parameter to be 32 bits, but unsigned long is 64 bits on x86_64.

Peter Zijlstra pointed out that the sane thing to do is to limit the shares value to something smaller instead of using an even more expensive divide.

I also found another bug caused by a shares value that is too large:

    pid1 and pid2 have their affinity set to cpu#0
    pid1 is attached to cg1 and pid2 is attached to cg2

    if cg1/cpu.shares = 1024 and cg2/cpu.shares = 2000000000
    then pid2 gets 100% of the cpu, and pid1 0%

    if cg1/cpu.shares = 1024 and cg2/cpu.shares = 20000000000
    then pid2 gets 0% of the cpu, and pid1 100%

The weight of a cfs_rq is the sum of the weights of the entities queued on that cfs_rq, so the shares value should be limited to a smaller value.

I think that (1UL << 18) is a good limit:

1) it's not too large: we can create a lot of groups before overflow
2) it's several times the weight value for nice=-19 (not too small)

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
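
The divide-by-zero described above can be modelled in userspace. This is only a sketch: the real do_div() is a kernel macro, and the helpers divisor_as_u32() and model_div() are hypothetical names introduced here. The one property taken from the commit message is that the divisor is treated as 32 bits wide, so every shares value from 2^32 up to 2^63 that is a power of two narrows to 0.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model only, not kernel code. It keeps just the property the
 * commit message relies on: the divisor is narrowed to 32 bits.
 */
static uint32_t divisor_as_u32(uint64_t shares)
{
	return (uint32_t)shares;	/* the upper 32 bits are dropped */
}

/*
 * Hypothetical helper: returns 1 and stores the quotient when the narrowed
 * divisor is non-zero, and returns 0 when the division would trap.
 */
static int model_div(uint64_t dividend, uint64_t shares, uint64_t *quotient)
{
	uint32_t base = divisor_as_u32(shares);

	if (base == 0)
		return 0;	/* in the kernel this is the divide-by-zero oops */

	*quotient = dividend / base;
	return 1;
}
```

With shares = 0x8000000000000000 (2^63), divisor_as_u32() yields 0 and model_div() reports failure, while a value such as 1024 or (1UL << 18) passes through unchanged. That is why capping the shares value below 2^32 avoids the problem without a wider divide.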
author: Lai Jiangshan <laijs@cn.fujitsu.com> 2008-06-12 10:42:58 +0200
committer: Ingo Molnar <mingo@elte.hu> 2008-06-12 14:23:55 +0200
commit: 2e084786f6fe052274f1dfa7c675fe4a02cacd6e (patch)
tree: 66190825236b1f3f59e88aaa83d2683bd121143d /kernel
parent: sched: fix TASK_WAKEKILL vs SIGKILL race (diff)
download: linux-2e084786f6fe052274f1dfa7c675fe4a02cacd6e.tar.xz
linux-2e084786f6fe052274f1dfa7c675fe4a02cacd6e.zip
1 files changed, 5 insertions, 2 deletions
diff --git a/kernel/sched.c b/kernel/sched.c
index 2c65bf29d133..6c1ecbdc0db9 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -312,12 +312,15 @@ static DEFINE_SPINLOCK(task_group_lock);
 #endif
 
 /*
- * A weight of 0, 1 or ULONG_MAX can cause arithmetics problems.
+ * A weight of 0 or 1 can cause arithmetics problems.
+ * A weight of a cfs_rq is the sum of weights of which entities
+ * are queued on this cfs_rq, so a weight of a entity should not be
+ * too large, so as the shares value of a task group.
  * (The default weight is 1024 - so there's no practical
  *  limitation from this.)
  */
 #define MIN_SHARES	2
-#define MAX_SHARES	(ULONG_MAX - 1)
+#define MAX_SHARES	(1UL << 18)
 
 static int init_task_group_load = INIT_TASK_GROUP_LOAD;
 #endif
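
The effect of the new limits can be sketched in standalone C. The two macros mirror the patch; clamp_shares() and max_weight_entities_in_u32() are hypothetical helpers written for this illustration, not functions from kernel/sched.c, and NICE_MINUS_19_WEIGHT is assumed to be 71755, the value the kernel's prio_to_weight table gives for nice=-19.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_SHARES	2UL
#define MAX_SHARES	(1UL << 18)

/* Assumption: weight for nice=-19 as listed in the prio_to_weight table. */
#define NICE_MINUS_19_WEIGHT	71755UL

/* Hypothetical helper: force a requested shares value into the allowed range. */
static unsigned long clamp_shares(unsigned long shares)
{
	if (shares < MIN_SHARES)
		return MIN_SHARES;
	if (shares > MAX_SHARES)
		return MAX_SHARES;
	return shares;
}

/*
 * Hypothetical helper: how many entities, each at MAX_SHARES, can be summed
 * before the total reaches 2^32 and no longer fits in 32 bits.
 */
static uint64_t max_weight_entities_in_u32(void)
{
	return (UINT64_C(1) << 32) / MAX_SHARES;
}
```

Under these assumptions a request of 2000000000 is reduced to 262144, 2^14 = 16384 maximum-weight entities can be queued before their summed weight reaches 2^32, and MAX_SHARES is a little over three times the nice=-19 weight, which matches the two reasons the commit message gives for choosing (1UL << 18).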