Diffstat (limited to 'Documentation/cpu-freq/governors.txt')
-rw-r--r-- | Documentation/cpu-freq/governors.txt | 117 |
1 files changed, 65 insertions, 52 deletions
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
index 63eef4cca1b7..61b3184b6c24 100644
--- a/Documentation/cpu-freq/governors.txt
+++ b/Documentation/cpu-freq/governors.txt
@@ -10,6 +10,8 @@
 
                     Dominik Brodowski <linux@brodo.de>
             some additions and corrections by Nico Golde <nico@ngolde.de>
+                Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+                   Viresh Kumar <viresh.kumar@linaro.org>
 
 
 
@@ -28,32 +30,27 @@ Contents:
 2.3 Userspace
 2.4 Ondemand
 2.5 Conservative
+2.6 Schedutil
 
 3. The Governor Interface in the CPUfreq Core
 
+4. References
 
 
 1. What Is A CPUFreq Governor?
 ==============================
 
 Most cpufreq drivers (except the intel_pstate and longrun) or even most
-cpu frequency scaling algorithms only offer the CPU to be set to one
-frequency. In order to offer dynamic frequency scaling, the cpufreq
-core must be able to tell these drivers of a "target frequency". So
-these specific drivers will be transformed to offer a "->target/target_index"
-call instead of the existing "->setpolicy" call. For "longrun", all
-stays the same, though.
+cpu frequency scaling algorithms only allow the CPU frequency to be set
+to predefined fixed values. In order to offer dynamic frequency
+scaling, the cpufreq core must be able to tell these drivers of a
+"target frequency". So these specific drivers will be transformed to
+offer a "->target/target_index/fast_switch()" call instead of the
+"->setpolicy()" call. For set_policy drivers, all stays the same,
+though.
 
 How to decide what frequency within the CPUfreq policy should be used?
-That's done using "cpufreq governors". Two are already in this patch
--- they're the already existing "powersave" and "performance" which
-set the frequency statically to the lowest or highest frequency,
-respectively. At least two more such governors will be ready for
-addition in the near future, but likely many more as there are various
-different theories and models about dynamic frequency scaling
-around. Using such a generic interface as cpufreq offers to scaling
-governors, these can be tested extensively, and the best one can be
-selected for each specific use.
+That's done using "cpufreq governors".
 
 Basically, it's the following flow graph:
 
@@ -71,7 +68,7 @@ CPU can be set to switch independently | CPU can only be set
                    /                          the limits of policy->{min,max}
                   /                                \
                  /                                  \
-        Using the ->setpolicy call,              Using the ->target/target_index call,
+        Using the ->setpolicy call,              Using the ->target/target_index/fast_switch call,
             the limits and the                   the frequency closest
              "policy" is set.                    to target_freq is set.
                                                  It is assured that it
@@ -109,9 +106,12 @@ directory.
 2.4 Ondemand
 ------------
 
-The CPUfreq governor "ondemand" sets the CPU depending on the
-current usage. To do this the CPU must have the capability to
-switch the frequency very quickly.
+The CPUfreq governor "ondemand" sets the CPU frequency depending on the
+current system load. Load estimation is triggered by the scheduler
+through the update_util_data->func hook; when triggered, cpufreq checks
+the CPU-usage statistics over the last period and the governor sets the
+CPU accordingly. The CPU must have the capability to switch the
+frequency very quickly.
 
 Sysfs files:
 
@@ -207,12 +207,12 @@ Sysfs files:
 ----------------
 
 The CPUfreq governor "conservative", much like the "ondemand"
-governor, sets the CPU depending on the current usage. It differs in
-behaviour in that it gracefully increases and decreases the CPU speed
-rather than jumping to max speed the moment there is any load on the
-CPU. This behaviour more suitable in a battery powered environment.
-The governor is tweaked in the same manner as the "ondemand" governor
-through sysfs with the addition of:
+governor, sets the CPU frequency depending on the current usage. It
+differs in behaviour in that it gracefully increases and decreases the
+CPU speed rather than jumping to max speed the moment there is any load
+on the CPU. This behaviour is more suitable in a battery powered
+environment. The governor is tweaked in the same manner as the
+"ondemand" governor through sysfs with the addition of:
 
 * freq_step:
 
@@ -237,6 +237,29 @@ through sysfs with the addition of:
 	decision on when to decrease the frequency while running in any speed.
 	Load for frequency increase is still evaluated every sampling rate.
 
+
+2.6 Schedutil
+-------------
+
+The "schedutil" governor aims at better integration with the Linux
+kernel scheduler. Load estimation is achieved through the scheduler's
+Per-Entity Load Tracking (PELT) mechanism, which also provides
+information about the recent load [1]. This governor currently does
+load based DVFS only for tasks managed by CFS. RT and DL scheduler tasks
+are always run at the highest frequency. Unlike all the other
+governors, the code is located under the kernel/sched/ directory.
+
+Sysfs files:
+
+* rate_limit_us:
+
+	This contains a value in microseconds. The governor waits for
+	rate_limit_us time before reevaluating the load again, after it has
+	evaluated the load once.
+
+For an in-depth comparison with the other governors refer to [2].
+
+
 3. The Governor Interface in the CPUfreq Core
 =============================================
 
@@ -244,26 +267,10 @@ A new governor must register itself with the CPUfreq core using
 "cpufreq_register_governor". The struct cpufreq_governor, which has to
 be passed to that function, must contain the following values:
 
-governor->name - A unique name for this governor
-governor->governor - The governor callback function
-governor->owner - .THIS_MODULE for the governor module (if
-                  appropriate)
-
-The governor->governor callback is called with the current (or to-be-set)
-cpufreq_policy struct for that CPU, and an unsigned int event. The
-following events are currently defined:
-
-CPUFREQ_GOV_START: This governor shall start its duty for the CPU
-                   policy->cpu
-CPUFREQ_GOV_STOP: This governor shall end its duty for the CPU
-                  policy->cpu
-CPUFREQ_GOV_LIMITS: The limits for CPU policy->cpu have changed to
-                    policy->min and policy->max.
-
-If you need other "events" externally of your driver, _only_ use the
-cpufreq_governor_l(unsigned int cpu, unsigned int event) call to the
-CPUfreq core to ensure proper locking.
+governor->name - A unique name for this governor.
+governor->owner - .THIS_MODULE for the governor module (if appropriate).
 
+plus a set of hooks to the functions implementing the governor's logic.
 
 The CPUfreq governor may call the CPU processor driver using one of
 these two functions:
@@ -277,12 +284,18 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
                                    unsigned int relation);
 
 target_freq must be within policy->min and policy->max, of course.
-What's the difference between these two functions? When your governor
-still is in a direct code path of a call to governor->governor, the
-per-CPU cpufreq lock is still held in the cpufreq core, and there's
-no need to lock it again (in fact, this would cause a deadlock). So
-use __cpufreq_driver_target only in these cases. In all other cases
-(for example, when there's a "daemonized" function that wakes up
-every second), use cpufreq_driver_target to lock the cpufreq per-CPU
-lock before the command is passed to the cpufreq processor driver.
+What's the difference between these two functions? When your governor is
+in a direct code path of a call to governor callbacks, like
+governor->start(), the policy->rwsem is still held in the cpufreq core,
+and there's no need to lock it again (in fact, this would cause a
+deadlock). So use __cpufreq_driver_target only in these cases. In all
+other cases (for example, when there's a "daemonized" function that
+wakes up every second), use cpufreq_driver_target to take policy->rwsem
+before the command is passed to the cpufreq driver.
 
+
+4. References
+=============
+
+[1] Per-entity load tracking: https://lwn.net/Articles/531853/
+[2] Improvements in CPU frequency management: https://lwn.net/Articles/682391/
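
Note on rate_limit_us (illustration only, not part of the patch): the
2.6 Schedutil text says the governor waits for rate_limit_us before
reevaluating the load after it has evaluated it once. The userspace toy
model below sketches that gating rule. The class and method names
(RateLimitedGovernor, should_evaluate) are hypothetical, and the
inclusive ">=" comparison is an assumption; this is not the kernel
implementation.

```python
class RateLimitedGovernor:
    """Toy model of a rate_limit_us gate (hypothetical, not kernel code)."""

    def __init__(self, rate_limit_us):
        self.rate_limit_us = rate_limit_us
        self.last_eval_us = None  # timestamp of the last load evaluation

    def should_evaluate(self, now_us):
        """Return True when the load may be reevaluated at time now_us."""
        # Assumption: an evaluation is allowed once at least rate_limit_us
        # microseconds have elapsed since the previous one.
        if self.last_eval_us is None or now_us - self.last_eval_us >= self.rate_limit_us:
            self.last_eval_us = now_us
            return True
        return False


gov = RateLimitedGovernor(rate_limit_us=500)
decisions = [gov.should_evaluate(t) for t in (0, 100, 499, 500, 900, 1000)]
print(decisions)
```

With a 500 us limit, the requests at 100, 499 and 900 us are skipped
because fewer than 500 us have passed since the previous evaluation.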