sched/cfs: Make util/load_avg more stable

In the current implementation of load/util_avg, we assume that the ongoing time segment has fully elapsed, and util/load_sum is divided by LOAD_AVG_MAX, even if part of the time segment still remains to run. As a consequence, this remaining part is treated as idle time and generates unexpected variations of util_avg of a busy CPU in the range [1002..1024[ whereas util_avg should stay at 1023.

To keep the metric stable, we should not consider the ongoing time segment when computing load/util_avg, but only the segments that have already fully elapsed. However, ignoring the current time segment adds unwanted latency to the responsiveness of load/util_avg, especially when the time is scaled instead of the contribution.

Instead of waiting for the current time segment to have fully elapsed before accounting it in load/util_avg, we can already account the elapsed part but change the range used to compute load/util_avg accordingly.

At the very beginning of a new time segment, the past segments have been decayed and the max value is:

  LOAD_AVG_MAX*y

At the very end of the current time segment, the max value becomes:

  LOAD_AVG_MAX*y + 1024(us)   (== LOAD_AVG_MAX)

In fact, the max value is:

  LOAD_AVG_MAX*y + sa->period_contrib

at any time in the time segment.

Taking advantage of the fact that:

  LOAD_AVG_MAX*y == LOAD_AVG_MAX-1024

the range becomes [0..LOAD_AVG_MAX-1024+sa->period_contrib].

As the elapsed part is already accounted in load/util_sum, we update the max value according to the current position in the time segment instead of removing its contribution.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten.Rasmussen@arm.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bsegall@google.com
Cc: dietmar.eggemann@arm.com
Cc: pjt@google.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1493188076-2767-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
author: Vincent Guittot <vincent.guittot@linaro.org> 2017-04-26 08:27:56 +0200
committer: Ingo Molnar <mingo@kernel.org> 2017-05-15 10:15:13 +0200
commit: 625ed2bf049d5a352c1bcca962d6e133454eaaff (patch)
tree: bb56ad5ff1b52808de762a6c6c0c754de6679970 /kernel/sched
parent: sched/core: Call __schedule() from do_idle() without enabling preemption (diff)
download: linux-625ed2bf049d5a352c1bcca962d6e133454eaaff.tar.xz
linux-625ed2bf049d5a352c1bcca962d6e133454eaaff.zip
1 files changed, 3 insertions, 3 deletions
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d71109321841..4f1825d60937 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2916,12 +2916,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	/*
 	 * Step 2: update *_avg.
 	 */
-	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX);
+	sa->load_avg = div_u64(sa->load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	if (cfs_rq) {
 		cfs_rq->runnable_load_avg =
-			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
+			div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX - 1024 + sa->period_contrib);
 	}
-	sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
+	sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
 
 	return 1;
 }
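
The effect of the new divider can be illustrated with a small user-space sketch. It is not kernel code: it assumes an always-running entity in steady state whose sum sits exactly at the maximum described in the commit message (LOAD_AVG_MAX - 1024 + period_contrib, scaled by a capacity of 1024), and the helper names busy_util_sum(), util_avg_old() and util_avg_new() are hypothetical. Because this model is idealized, the new divider yields exactly 1024 here, whereas the commit message reports 1023 for the real integer accounting.

```c
#include <stdint.h>

#define LOAD_AVG_MAX 47742u /* maximum of the decayed geometric series */
#define SEGMENT_US   1024u  /* length of one time segment, in us */
#define CAPACITY     1024u  /* assumption: CPU capacity scale factor */

/*
 * Hypothetical helper: idealized util_sum of an always-busy entity,
 * period_contrib microseconds into the current time segment.
 * Uses LOAD_AVG_MAX*y == LOAD_AVG_MAX - 1024 from the commit message.
 */
static uint64_t busy_util_sum(uint32_t period_contrib)
{
	return (uint64_t)(LOAD_AVG_MAX - SEGMENT_US + period_contrib) * CAPACITY;
}

/* Old behaviour: always divide by LOAD_AVG_MAX. */
static uint32_t util_avg_old(uint64_t util_sum)
{
	return (uint32_t)(util_sum / LOAD_AVG_MAX);
}

/* New behaviour: the divider follows the position in the segment. */
static uint32_t util_avg_new(uint64_t util_sum, uint32_t period_contrib)
{
	return (uint32_t)(util_sum /
			  (LOAD_AVG_MAX - SEGMENT_US + period_contrib));
}
```

Under these assumptions, util_avg_old() moves from 1002 at the start of a segment (period_contrib == 0) to 1023 at its end (period_contrib == 1023), while util_avg_new() returns the same value at every position in the segment.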