diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2013-11-14 05:41:48 +0100 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2013-11-14 05:41:48 +0100 |
commit | f9300eaaac1ca300083ad41937923a90cc3a2394 (patch) | |
tree | 724b72ad729a8b85c09d2d54f8ca7d8ba22d774f /Documentation | |
parent | Merge tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/... (diff) | |
parent | Merge branch 'pm-cpufreq' (diff) | |
download | linux-f9300eaaac1ca300083ad41937923a90cc3a2394.tar.xz linux-f9300eaaac1ca300083ad41937923a90cc3a2394.zip |
Merge tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael J Wysocki:
- New power capping framework and the the Intel Running Average Power
Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan.
- Addition of the in-kernel switching feature to the arm_big_little
cpufreq driver from Viresh Kumar and Nicolas Pitre.
- cpufreq support for iMac G5 from Aaro Koskinen.
- Baytrail processors support for intel_pstate from Dirk Brandewie.
- cpufreq support for Midway/ECX-2000 from Mark Langsdorf.
- ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha.
- ACPI power management support for the I2C and SPI bus types from Mika
Westerberg and Lv Zheng.
- cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat,
Stratos Karafotis, Xiaoguang Chen, Lan Tianyu.
- cpufreq drivers updates (mostly fixes and cleanups) from Viresh
Kumar, Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz
Majewski, Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev.
- intel_pstate updates from Dirk Brandewie and Adrian Huang.
- ACPICA update to version 20130927 includig fixes and cleanups and
some reduction of divergences between the ACPICA code in the kernel
and ACPICA upstream in order to improve the automatic ACPICA patch
generation process. From Bob Moore, Lv Zheng, Tomasz Nowicki, Naresh
Bhat, Bjorn Helgaas, David E Box.
- ACPI IPMI driver fixes and cleanups from Lv Zheng.
- ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani, Zhang
Yanfei, Rafael J Wysocki.
- Conversion of the ACPI AC driver to the platform bus type and
multiple driver fixes and cleanups related to ACPI from Zhang Rui.
- ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu,
Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki.
- Fixes and cleanups and new blacklist entries related to the ACPI
video support from Aaron Lu, Felipe Contreras, Lennart Poettering,
Kirill Tkhai.
- cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi.
- cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han,
Bartlomiej Zolnierkiewicz, Prarit Bhargava.
- devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe.
- Operation Performance Points (OPP) core updates from Nishanth Menon.
- Runtime power management core fix from Rafael J Wysocki and update
from Ulf Hansson.
- Hibernation fixes from Aaron Lu and Rafael J Wysocki.
- Device suspend/resume lockup detection mechanism from Benoit Goby.
- Removal of unused proc directories created for various ACPI drivers
from Lan Tianyu.
- ACPI LPSS driver fix and new device IDs for the ACPI platform scan
handler from Heikki Krogerus and Jarkko Nikula.
- New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa.
- Assorted fixes and cleanups related to ACPI from Andy Shevchenko, Al
Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter,
Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause,
Liu Chuansheng.
- Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding,
Jean-Christophe Plagniol-Villard.
* tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (386 commits)
cpufreq: conservative: fix requested_freq reduction issue
ACPI / hotplug: Consolidate deferred execution of ACPI hotplug routines
PM / runtime: Use pm_runtime_put_sync() in __device_release_driver()
ACPI / event: remove unneeded NULL pointer check
Revert "ACPI / video: Ignore BIOS initial backlight value for HP 250 G1"
ACPI / video: Quirk initial backlight level 0
ACPI / video: Fix initial level validity test
intel_pstate: skip the driver if ACPI has power mgmt option
PM / hibernate: Avoid overflow in hibernate_preallocate_memory()
ACPI / hotplug: Do not execute "insert in progress" _OST
ACPI / hotplug: Carry out PCI root eject directly
ACPI / hotplug: Merge device hot-removal routines
ACPI / hotplug: Make acpi_bus_hot_remove_device() internal
ACPI / hotplug: Simplify device ejection routines
ACPI / hotplug: Fix handle_root_bridge_removal()
ACPI / hotplug: Refuse to hot-remove all objects with disabled hotplug
ACPI / scan: Start matching drivers after trying scan handlers
ACPI: Remove acpi_pci_slot_init() headers from internal.h
ACPI / blacklist: fix name of ThinkPad Edge E530
PowerCap: Fix build error with option -Werror=format-security
...
Conflicts:
arch/arm/mach-omap2/opp.c
drivers/Kconfig
drivers/spi/spi.c
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/sysfs-class-powercap | 152 | ||||
-rw-r--r-- | Documentation/cpu-freq/cpu-drivers.txt | 27 | ||||
-rw-r--r-- | Documentation/cpu-freq/governors.txt | 4 | ||||
-rw-r--r-- | Documentation/cpuidle/governor.txt | 1 | ||||
-rw-r--r-- | Documentation/power/opp.txt | 108 | ||||
-rw-r--r-- | Documentation/power/powercap/powercap.txt | 236 | ||||
-rw-r--r-- | Documentation/power/runtime_pm.txt | 14 |
7 files changed, 470 insertions, 72 deletions
diff --git a/Documentation/ABI/testing/sysfs-class-powercap b/Documentation/ABI/testing/sysfs-class-powercap new file mode 100644 index 000000000000..db3b3ff70d84 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-class-powercap @@ -0,0 +1,152 @@ +What: /sys/class/powercap/ +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + The powercap/ class sub directory belongs to the power cap + subsystem. Refer to + Documentation/power/powercap/powercap.txt for details. + +What: /sys/class/powercap/<control type> +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + A <control type> is a unique name under /sys/class/powercap. + Here <control type> determines how the power is going to be + controlled. A <control type> can contain multiple power zones. + +What: /sys/class/powercap/<control type>/enabled +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + This allows to enable/disable power capping for a "control type". + This status affects every power zone using this "control_type. + +What: /sys/class/powercap/<control type>/<power zone> +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + A power zone is a single or a collection of devices, which can + be independently monitored and controlled. A power zone sysfs + entry is qualified with the name of the <control type>. + E.g. intel-rapl:0:1:1. + +What: /sys/class/powercap/<control type>/<power zone>/<child power zone> +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Power zones may be organized in a hierarchy in which child + power zones provide monitoring and control for a subset of + devices under the parent. For example, if there is a parent + power zone for a whole CPU package, each CPU core in it can + be a child power zone. + +What: /sys/class/powercap/.../<power zone>/name +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Specifies the name of this power zone. + +What: /sys/class/powercap/.../<power zone>/energy_uj +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Current energy counter in micro-joules. Write "0" to reset. + If the counter can not be reset, then this attribute is + read-only. + +What: /sys/class/powercap/.../<power zone>/max_energy_range_uj +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Range of the above energy counter in micro-joules. + + +What: /sys/class/powercap/.../<power zone>/power_uw +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Current power in micro-watts. + +What: /sys/class/powercap/.../<power zone>/max_power_range_uw +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Range of the above power value in micro-watts. + +What: /sys/class/powercap/.../<power zone>/constraint_X_name +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Each power zone can define one or more constraints. Each + constraint can have an optional name. Here "X" can have values + from 0 to max integer. + +What: /sys/class/powercap/.../<power zone>/constraint_X_power_limit_uw +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Power limit in micro-watts should be applicable for + the time window specified by "constraint_X_time_window_us". + Here "X" can have values from 0 to max integer. + +What: /sys/class/powercap/.../<power zone>/constraint_X_time_window_us +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Time window in micro seconds. This is used along with + constraint_X_power_limit_uw to define a power constraint. + Here "X" can have values from 0 to max integer. + + +What: /sys/class/powercap/<control type>/.../constraint_X_max_power_uw +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Maximum allowed power in micro watts for this constraint. + Here "X" can have values from 0 to max integer. + +What: /sys/class/powercap/<control type>/.../constraint_X_min_power_uw +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Minimum allowed power in micro watts for this constraint. + Here "X" can have values from 0 to max integer. + +What: /sys/class/powercap/.../<power zone>/constraint_X_max_time_window_us +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Maximum allowed time window in micro seconds for this + constraint. Here "X" can have values from 0 to max integer. + +What: /sys/class/powercap/.../<power zone>/constraint_X_min_time_window_us +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description: + Minimum allowed time window in micro seconds for this + constraint. Here "X" can have values from 0 to max integer. + +What: /sys/class/powercap/.../<power zone>/enabled +Date: September 2013 +KernelVersion: 3.13 +Contact: linux-pm@vger.kernel.org +Description + This allows to enable/disable power capping at power zone level. + This applies to current power zone and its children. diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt index 40282e617913..8b1a4451422e 100644 --- a/Documentation/cpu-freq/cpu-drivers.txt +++ b/Documentation/cpu-freq/cpu-drivers.txt @@ -23,8 +23,8 @@ Contents: 1.1 Initialization 1.2 Per-CPU Initialization 1.3 verify -1.4 target or setpolicy? -1.5 target +1.4 target/target_index or setpolicy? +1.5 target/target_index 1.6 setpolicy 2. Frequency Table Helpers @@ -56,7 +56,8 @@ cpufreq_driver.init - A pointer to the per-CPU initialization cpufreq_driver.verify - A pointer to a "verification" function. cpufreq_driver.setpolicy _or_ -cpufreq_driver.target - See below on the differences. +cpufreq_driver.target/ +target_index - See below on the differences. And optionally @@ -66,7 +67,7 @@ cpufreq_driver.resume - A pointer to a per-CPU resume function which is called with interrupts disabled and _before_ the pre-suspend frequency and/or policy is restored by a call to - ->target or ->setpolicy. + ->target/target_index or ->setpolicy. cpufreq_driver.attr - A pointer to a NULL-terminated list of "struct freq_attr" which allow to @@ -103,8 +104,8 @@ policy->governor must contain the "default policy" for this CPU. A few moments later, cpufreq_driver.verify and either cpufreq_driver.setpolicy or - cpufreq_driver.target is called with - these values. + cpufreq_driver.target/target_index is called + with these values. For setting some of these values (cpuinfo.min[max]_freq, policy->min[max]), the frequency table helpers might be helpful. See the section 2 for more information @@ -133,20 +134,28 @@ range) is within policy->min and policy->max. If necessary, increase policy->max first, and only if this is no solution, decrease policy->min. -1.4 target or setpolicy? +1.4 target/target_index or setpolicy? ---------------------------- Most cpufreq drivers or even most cpu frequency scaling algorithms only allow the CPU to be set to one frequency. For these, you use the -->target call. +->target/target_index call. Some cpufreq-capable processors switch the frequency between certain limits on their own. These shall use the ->setpolicy call -1.4. target +1.4. target/target_index ------------- +The target_index call has two arguments: struct cpufreq_policy *policy, +and unsigned int index (into the exposed frequency table). + +The CPUfreq driver must set the new frequency when called here. The +actual frequency must be determined by freq_table[index].frequency. + +Deprecated: +---------- The target call has three arguments: struct cpufreq_policy *policy, unsigned int target_frequency, unsigned int relation. diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt index 219970ba54b7..77ec21574fb1 100644 --- a/Documentation/cpu-freq/governors.txt +++ b/Documentation/cpu-freq/governors.txt @@ -40,7 +40,7 @@ Most cpufreq drivers (in fact, all except one, longrun) or even most cpu frequency scaling algorithms only offer the CPU to be set to one frequency. In order to offer dynamic frequency scaling, the cpufreq core must be able to tell these drivers of a "target frequency". So -these specific drivers will be transformed to offer a "->target" +these specific drivers will be transformed to offer a "->target/target_index" call instead of the existing "->setpolicy" call. For "longrun", all stays the same, though. @@ -71,7 +71,7 @@ CPU can be set to switch independently | CPU can only be set / the limits of policy->{min,max} / \ / \ - Using the ->setpolicy call, Using the ->target call, + Using the ->setpolicy call, Using the ->target/target_index call, the limits and the the frequency closest "policy" is set. to target_freq is set. It is assured that it diff --git a/Documentation/cpuidle/governor.txt b/Documentation/cpuidle/governor.txt index 12c6bd50c9f6..d9020f5e847b 100644 --- a/Documentation/cpuidle/governor.txt +++ b/Documentation/cpuidle/governor.txt @@ -25,5 +25,4 @@ kernel configuration and platform will be selected by cpuidle. Interfaces: extern int cpuidle_register_governor(struct cpuidle_governor *gov); -extern void cpuidle_unregister_governor(struct cpuidle_governor *gov); struct cpuidle_governor diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt index 425c51d56aef..b8a907dc0169 100644 --- a/Documentation/power/opp.txt +++ b/Documentation/power/opp.txt @@ -42,7 +42,7 @@ We can represent these as three OPPs as the following {Hz, uV} tuples: OPP library provides a set of helper functions to organize and query the OPP information. The library is located in drivers/base/power/opp.c and the header -is located in include/linux/opp.h. OPP library can be enabled by enabling +is located in include/linux/pm_opp.h. OPP library can be enabled by enabling CONFIG_PM_OPP from power management menuconfig menu. OPP library depends on CONFIG_PM as certain SoCs such as Texas Instrument's OMAP framework allows to optionally boot at a certain OPP without needing cpufreq. @@ -71,14 +71,14 @@ operations until that OPP could be re-enabled if possible. OPP library facilitates this concept in it's implementation. The following operational functions operate only on available opps: -opp_find_freq_{ceil, floor}, opp_get_voltage, opp_get_freq, opp_get_opp_count -and opp_init_cpufreq_table +opp_find_freq_{ceil, floor}, dev_pm_opp_get_voltage, dev_pm_opp_get_freq, dev_pm_opp_get_opp_count +and dev_pm_opp_init_cpufreq_table -opp_find_freq_exact is meant to be used to find the opp pointer which can then -be used for opp_enable/disable functions to make an opp available as required. +dev_pm_opp_find_freq_exact is meant to be used to find the opp pointer which can then +be used for dev_pm_opp_enable/disable functions to make an opp available as required. WARNING: Users of OPP library should refresh their availability count using -get_opp_count if opp_enable/disable functions are invoked for a device, the +get_opp_count if dev_pm_opp_enable/disable functions are invoked for a device, the exact mechanism to trigger these or the notification mechanism to other dependent subsystems such as cpufreq are left to the discretion of the SoC specific framework which uses the OPP library. Similar care needs to be taken @@ -96,24 +96,24 @@ using RCU read locks. The opp_find_freq_{exact,ceil,floor}, opp_get_{voltage, freq, opp_count} fall into this category. opp_{add,enable,disable} are updaters which use mutex and implement it's own -RCU locking mechanisms. opp_init_cpufreq_table acts as an updater and uses +RCU locking mechanisms. dev_pm_opp_init_cpufreq_table acts as an updater and uses mutex to implment RCU updater strategy. These functions should *NOT* be called under RCU locks and other contexts that prevent blocking functions in RCU or mutex operations from working. 2. Initial OPP List Registration ================================ -The SoC implementation calls opp_add function iteratively to add OPPs per +The SoC implementation calls dev_pm_opp_add function iteratively to add OPPs per device. It is expected that the SoC framework will register the OPP entries optimally- typical numbers range to be less than 5. The list generated by registering the OPPs is maintained by OPP library throughout the device operation. The SoC framework can subsequently control the availability of the -OPPs dynamically using the opp_enable / disable functions. +OPPs dynamically using the dev_pm_opp_enable / disable functions. -opp_add - Add a new OPP for a specific domain represented by the device pointer. +dev_pm_opp_add - Add a new OPP for a specific domain represented by the device pointer. The OPP is defined using the frequency and voltage. Once added, the OPP is assumed to be available and control of it's availability can be done - with the opp_enable/disable functions. OPP library internally stores + with the dev_pm_opp_enable/disable functions. OPP library internally stores and manages this information in the opp struct. This function may be used by SoC framework to define a optimal list as per the demands of SoC usage environment. @@ -124,7 +124,7 @@ opp_add - Add a new OPP for a specific domain represented by the device pointer. soc_pm_init() { /* Do things */ - r = opp_add(mpu_dev, 1000000, 900000); + r = dev_pm_opp_add(mpu_dev, 1000000, 900000); if (!r) { pr_err("%s: unable to register mpu opp(%d)\n", r); goto no_cpufreq; @@ -143,44 +143,44 @@ functions return the matching pointer representing the opp if a match is found, else returns error. These errors are expected to be handled by standard error checks such as IS_ERR() and appropriate actions taken by the caller. -opp_find_freq_exact - Search for an OPP based on an *exact* frequency and +dev_pm_opp_find_freq_exact - Search for an OPP based on an *exact* frequency and availability. This function is especially useful to enable an OPP which is not available by default. Example: In a case when SoC framework detects a situation where a higher frequency could be made available, it can use this function to - find the OPP prior to call the opp_enable to actually make it available. + find the OPP prior to call the dev_pm_opp_enable to actually make it available. rcu_read_lock(); - opp = opp_find_freq_exact(dev, 1000000000, false); + opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false); rcu_read_unlock(); /* dont operate on the pointer.. just do a sanity check.. */ if (IS_ERR(opp)) { pr_err("frequency not disabled!\n"); /* trigger appropriate actions.. */ } else { - opp_enable(dev,1000000000); + dev_pm_opp_enable(dev,1000000000); } NOTE: This is the only search function that operates on OPPs which are not available. -opp_find_freq_floor - Search for an available OPP which is *at most* the +dev_pm_opp_find_freq_floor - Search for an available OPP which is *at most* the provided frequency. This function is useful while searching for a lesser match OR operating on OPP information in the order of decreasing frequency. Example: To find the highest opp for a device: freq = ULONG_MAX; rcu_read_lock(); - opp_find_freq_floor(dev, &freq); + dev_pm_opp_find_freq_floor(dev, &freq); rcu_read_unlock(); -opp_find_freq_ceil - Search for an available OPP which is *at least* the +dev_pm_opp_find_freq_ceil - Search for an available OPP which is *at least* the provided frequency. This function is useful while searching for a higher match OR operating on OPP information in the order of increasing frequency. Example 1: To find the lowest opp for a device: freq = 0; rcu_read_lock(); - opp_find_freq_ceil(dev, &freq); + dev_pm_opp_find_freq_ceil(dev, &freq); rcu_read_unlock(); Example 2: A simplified implementation of a SoC cpufreq_driver->target: soc_cpufreq_target(..) @@ -188,7 +188,7 @@ opp_find_freq_ceil - Search for an available OPP which is *at least* the /* Do stuff like policy checks etc. */ /* Find the best frequency match for the req */ rcu_read_lock(); - opp = opp_find_freq_ceil(dev, &freq); + opp = dev_pm_opp_find_freq_ceil(dev, &freq); rcu_read_unlock(); if (!IS_ERR(opp)) soc_switch_to_freq_voltage(freq); @@ -208,34 +208,34 @@ as thermal considerations (e.g. don't use OPPx until the temperature drops). WARNING: Do not use these functions in interrupt context. -opp_enable - Make a OPP available for operation. +dev_pm_opp_enable - Make a OPP available for operation. Example: Lets say that 1GHz OPP is to be made available only if the SoC temperature is lower than a certain threshold. The SoC framework implementation might choose to do something as follows: if (cur_temp < temp_low_thresh) { /* Enable 1GHz if it was disabled */ rcu_read_lock(); - opp = opp_find_freq_exact(dev, 1000000000, false); + opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false); rcu_read_unlock(); /* just error check */ if (!IS_ERR(opp)) - ret = opp_enable(dev, 1000000000); + ret = dev_pm_opp_enable(dev, 1000000000); else goto try_something_else; } -opp_disable - Make an OPP to be not available for operation +dev_pm_opp_disable - Make an OPP to be not available for operation Example: Lets say that 1GHz OPP is to be disabled if the temperature exceeds a threshold value. The SoC framework implementation might choose to do something as follows: if (cur_temp > temp_high_thresh) { /* Disable 1GHz if it was enabled */ rcu_read_lock(); - opp = opp_find_freq_exact(dev, 1000000000, true); + opp = dev_pm_opp_find_freq_exact(dev, 1000000000, true); rcu_read_unlock(); /* just error check */ if (!IS_ERR(opp)) - ret = opp_disable(dev, 1000000000); + ret = dev_pm_opp_disable(dev, 1000000000); else goto try_something_else; } @@ -247,7 +247,7 @@ information from the OPP structure is necessary. Once an OPP pointer is retrieved using the search functions, the following functions can be used by SoC framework to retrieve the information represented inside the OPP layer. -opp_get_voltage - Retrieve the voltage represented by the opp pointer. +dev_pm_opp_get_voltage - Retrieve the voltage represented by the opp pointer. Example: At a cpufreq transition to a different frequency, SoC framework requires to set the voltage represented by the OPP using the regulator framework to the Power Management chip providing the @@ -256,15 +256,15 @@ opp_get_voltage - Retrieve the voltage represented by the opp pointer. { /* do things */ rcu_read_lock(); - opp = opp_find_freq_ceil(dev, &freq); - v = opp_get_voltage(opp); + opp = dev_pm_opp_find_freq_ceil(dev, &freq); + v = dev_pm_opp_get_voltage(opp); rcu_read_unlock(); if (v) regulator_set_voltage(.., v); /* do other things */ } -opp_get_freq - Retrieve the freq represented by the opp pointer. +dev_pm_opp_get_freq - Retrieve the freq represented by the opp pointer. Example: Lets say the SoC framework uses a couple of helper functions we could pass opp pointers instead of doing additional parameters to handle quiet a bit of data parameters. @@ -273,8 +273,8 @@ opp_get_freq - Retrieve the freq represented by the opp pointer. /* do things.. */ max_freq = ULONG_MAX; rcu_read_lock(); - max_opp = opp_find_freq_floor(dev,&max_freq); - requested_opp = opp_find_freq_ceil(dev,&freq); + max_opp = dev_pm_opp_find_freq_floor(dev,&max_freq); + requested_opp = dev_pm_opp_find_freq_ceil(dev,&freq); if (!IS_ERR(max_opp) && !IS_ERR(requested_opp)) r = soc_test_validity(max_opp, requested_opp); rcu_read_unlock(); @@ -282,25 +282,25 @@ opp_get_freq - Retrieve the freq represented by the opp pointer. } soc_test_validity(..) { - if(opp_get_voltage(max_opp) < opp_get_voltage(requested_opp)) + if(dev_pm_opp_get_voltage(max_opp) < dev_pm_opp_get_voltage(requested_opp)) return -EINVAL; - if(opp_get_freq(max_opp) < opp_get_freq(requested_opp)) + if(dev_pm_opp_get_freq(max_opp) < dev_pm_opp_get_freq(requested_opp)) return -EINVAL; /* do things.. */ } -opp_get_opp_count - Retrieve the number of available opps for a device +dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device Example: Lets say a co-processor in the SoC needs to know the available frequencies in a table, the main processor can notify as following: soc_notify_coproc_available_frequencies() { /* Do things */ rcu_read_lock(); - num_available = opp_get_opp_count(dev); + num_available = dev_pm_opp_get_opp_count(dev); speeds = kzalloc(sizeof(u32) * num_available, GFP_KERNEL); /* populate the table in increasing order */ freq = 0; - while (!IS_ERR(opp = opp_find_freq_ceil(dev, &freq))) { + while (!IS_ERR(opp = dev_pm_opp_find_freq_ceil(dev, &freq))) { speeds[i] = freq; freq++; i++; @@ -313,7 +313,7 @@ opp_get_opp_count - Retrieve the number of available opps for a device 6. Cpufreq Table Generation =========================== -opp_init_cpufreq_table - cpufreq framework typically is initialized with +dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with cpufreq_frequency_table_cpuinfo which is provided with the list of frequencies that are available for operation. This function provides a ready to use conversion routine to translate the OPP layer's internal @@ -326,7 +326,7 @@ opp_init_cpufreq_table - cpufreq framework typically is initialized with soc_pm_init() { /* Do things */ - r = opp_init_cpufreq_table(dev, &freq_table); + r = dev_pm_opp_init_cpufreq_table(dev, &freq_table); if (!r) cpufreq_frequency_table_cpuinfo(policy, freq_table); /* Do other things */ @@ -336,7 +336,7 @@ opp_init_cpufreq_table - cpufreq framework typically is initialized with addition to CONFIG_PM as power management feature is required to dynamically scale voltage and frequency in a system. -opp_free_cpufreq_table - Free up the table allocated by opp_init_cpufreq_table +dev_pm_opp_free_cpufreq_table - Free up the table allocated by dev_pm_opp_init_cpufreq_table 7. Data Structures ================== @@ -358,16 +358,16 @@ accessed by various functions as described above. However, the structures representing the actual OPPs and domains are internal to the OPP library itself to allow for suitable abstraction reusable across systems. -struct opp - The internal data structure of OPP library which is used to +struct dev_pm_opp - The internal data structure of OPP library which is used to represent an OPP. In addition to the freq, voltage, availability information, it also contains internal book keeping information required for the OPP library to operate on. Pointer to this structure is provided back to the users such as SoC framework to be used as a identifier for OPP in the interactions with OPP layer. - WARNING: The struct opp pointer should not be parsed or modified by the - users. The defaults of for an instance is populated by opp_add, but the - availability of the OPP can be modified by opp_enable/disable functions. + WARNING: The struct dev_pm_opp pointer should not be parsed or modified by the + users. The defaults of for an instance is populated by dev_pm_opp_add, but the + availability of the OPP can be modified by dev_pm_opp_enable/disable functions. struct device - This is used to identify a domain to the OPP layer. The nature of the device and it's implementation is left to the user of @@ -377,19 +377,19 @@ Overall, in a simplistic view, the data structure operations is represented as following: Initialization / modification: - +-----+ /- opp_enable -opp_add --> | opp | <------- - | +-----+ \- opp_disable + +-----+ /- dev_pm_opp_enable +dev_pm_opp_add --> | opp | <------- + | +-----+ \- dev_pm_opp_disable \-------> domain_info(device) Search functions: - /-- opp_find_freq_ceil ---\ +-----+ -domain_info<---- opp_find_freq_exact -----> | opp | - \-- opp_find_freq_floor ---/ +-----+ + /-- dev_pm_opp_find_freq_ceil ---\ +-----+ +domain_info<---- dev_pm_opp_find_freq_exact -----> | opp | + \-- dev_pm_opp_find_freq_floor ---/ +-----+ Retrieval functions: -+-----+ /- opp_get_voltage ++-----+ /- dev_pm_opp_get_voltage | opp | <--- -+-----+ \- opp_get_freq ++-----+ \- dev_pm_opp_get_freq -domain_info <- opp_get_opp_count +domain_info <- dev_pm_opp_get_opp_count diff --git a/Documentation/power/powercap/powercap.txt b/Documentation/power/powercap/powercap.txt new file mode 100644 index 000000000000..1e6ef164e07a --- /dev/null +++ b/Documentation/power/powercap/powercap.txt @@ -0,0 +1,236 @@ +Power Capping Framework +================================== + +The power capping framework provides a consistent interface between the kernel +and the user space that allows power capping drivers to expose the settings to +user space in a uniform way. + +Terminology +========================= +The framework exposes power capping devices to user space via sysfs in the +form of a tree of objects. The objects at the root level of the tree represent +'control types', which correspond to different methods of power capping. For +example, the intel-rapl control type represents the Intel "Running Average +Power Limit" (RAPL) technology, whereas the 'idle-injection' control type +corresponds to the use of idle injection for controlling power. + +Power zones represent different parts of the system, which can be controlled and +monitored using the power capping method determined by the control type the +given zone belongs to. They each contain attributes for monitoring power, as +well as controls represented in the form of power constraints. If the parts of +the system represented by different power zones are hierarchical (that is, one +bigger part consists of multiple smaller parts that each have their own power +controls), those power zones may also be organized in a hierarchy with one +parent power zone containing multiple subzones and so on to reflect the power +control topology of the system. In that case, it is possible to apply power +capping to a set of devices together using the parent power zone and if more +fine grained control is required, it can be applied through the subzones. + + +Example sysfs interface tree: + +/sys/devices/virtual/powercap +??? intel-rapl + ??? intel-rapl:0 + ? ??? constraint_0_name + ? ??? constraint_0_power_limit_uw + ? ??? constraint_0_time_window_us + ? ??? constraint_1_name + ? ??? constraint_1_power_limit_uw + ? ??? constraint_1_time_window_us + ? ??? device -> ../../intel-rapl + ? ??? energy_uj + ? ??? intel-rapl:0:0 + ? ? ??? constraint_0_name + ? ? ??? constraint_0_power_limit_uw + ? ? ??? constraint_0_time_window_us + ? ? ??? constraint_1_name + ? ? ??? constraint_1_power_limit_uw + ? ? ??? constraint_1_time_window_us + ? ? ??? device -> ../../intel-rapl:0 + ? ? ??? energy_uj + ? ? ??? max_energy_range_uj + ? ? ??? name + ? ? ??? enabled + ? ? ??? power + ? ? ? ??? async + ? ? ? [] + ? ? ??? subsystem -> ../../../../../../class/power_cap + ? ? ??? uevent + ? ??? intel-rapl:0:1 + ? ? ??? constraint_0_name + ? ? ??? constraint_0_power_limit_uw + ? ? ??? constraint_0_time_window_us + ? ? ??? constraint_1_name + ? ? ??? constraint_1_power_limit_uw + ? ? ??? constraint_1_time_window_us + ? ? ??? device -> ../../intel-rapl:0 + ? ? ??? energy_uj + ? ? ??? max_energy_range_uj + ? ? ??? name + ? ? ??? enabled + ? ? ??? power + ? ? ? ??? async + ? ? ? [] + ? ? ??? subsystem -> ../../../../../../class/power_cap + ? ? ??? uevent + ? ??? max_energy_range_uj + ? ??? max_power_range_uw + ? ??? name + ? ??? enabled + ? ??? power + ? ? ??? async + ? ? [] + ? ??? subsystem -> ../../../../../class/power_cap + ? ??? enabled + ? ??? uevent + ??? intel-rapl:1 + ? ??? constraint_0_name + ? ??? constraint_0_power_limit_uw + ? ??? constraint_0_time_window_us + ? ??? constraint_1_name + ? ??? constraint_1_power_limit_uw + ? ??? constraint_1_time_window_us + ? ??? device -> ../../intel-rapl + ? ??? energy_uj + ? ??? intel-rapl:1:0 + ? ? ??? constraint_0_name + ? ? ??? constraint_0_power_limit_uw + ? ? ??? constraint_0_time_window_us + ? ? ??? constraint_1_name + ? ? ??? constraint_1_power_limit_uw + ? ? ??? constraint_1_time_window_us + ? ? ??? device -> ../../intel-rapl:1 + ? ? ??? energy_uj + ? ? ??? max_energy_range_uj + ? ? ??? name + ? ? ??? enabled + ? ? ??? power + ? ? ? ??? async + ? ? ? [] + ? ? ??? subsystem -> ../../../../../../class/power_cap + ? ? ??? uevent + ? ??? intel-rapl:1:1 + ? ? ??? constraint_0_name + ? ? ??? constraint_0_power_limit_uw + ? ? ??? constraint_0_time_window_us + ? ? ??? constraint_1_name + ? ? ??? constraint_1_power_limit_uw + ? ? ??? constraint_1_time_window_us + ? ? ??? device -> ../../intel-rapl:1 + ? ? ??? energy_uj + ? ? ??? max_energy_range_uj + ? ? ??? name + ? ? ??? enabled + ? ? ??? power + ? ? ? ??? async + ? ? ? [] + ? ? ??? subsystem -> ../../../../../../class/power_cap + ? ? ??? uevent + ? ??? max_energy_range_uj + ? ??? max_power_range_uw + ? ??? name + ? ??? enabled + ? ??? power + ? ? ??? async + ? ? [] + ? ??? subsystem -> ../../../../../class/power_cap + ? ??? uevent + ??? power + ? ??? async + ? [] + ??? subsystem -> ../../../../class/power_cap + ??? enabled + ??? uevent + +The above example illustrates a case in which the Intel RAPL technology, +available in Intel® IA-64 and IA-32 Processor Architectures, is used. There is one +control type called intel-rapl which contains two power zones, intel-rapl:0 and +intel-rapl:1, representing CPU packages. Each of these power zones contains +two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the +"core" and the "uncore" parts of the given CPU package, respectively. All of +the zones and subzones contain energy monitoring attributes (energy_uj, +max_energy_range_uj) and constraint attributes (constraint_*) allowing controls +to be applied (the constraints in the 'package' power zones apply to the whole +CPU packages and the subzone constraints only apply to the respective parts of +the given package individually). Since Intel RAPL doesn't provide instantaneous +power value, there is no power_uw attribute. + +In addition to that, each power zone contains a name attribute, allowing the +part of the system represented by that zone to be identified. +For example: + +cat /sys/class/power_cap/intel-rapl/intel-rapl:0/name +package-0 + +The Intel RAPL technology allows two constraints, short term and long term, +with two different time windows to be applied to each power zone. Thus for +each zone there are 2 attributes representing the constraint names, 2 power +limits and 2 attributes representing the sizes of the time windows. Such that, +constraint_j_* attributes correspond to the jth constraint (j = 0,1). + +For example: + constraint_0_name + constraint_0_power_limit_uw + constraint_0_time_window_us + constraint_1_name + constraint_1_power_limit_uw + constraint_1_time_window_us + +Power Zone Attributes +================================= +Monitoring attributes +---------------------- + +energy_uj (rw): Current energy counter in micro joules. Write "0" to reset. +If the counter can not be reset, then this attribute is read only. + +max_energy_range_uj (ro): Range of the above energy counter in micro-joules. + +power_uw (ro): Current power in micro watts. + +max_power_range_uw (ro): Range of the above power value in micro-watts. + +name (ro): Name of this power zone. + +It is possible that some domains have both power ranges and energy counter ranges; +however, only one is mandatory. + +Constraints +---------------- +constraint_X_power_limit_uw (rw): Power limit in micro watts, which should be +applicable for the time window specified by "constraint_X_time_window_us". + +constraint_X_time_window_us (rw): Time window in micro seconds. + +constraint_X_name (ro): An optional name of the constraint + +constraint_X_max_power_uw(ro): Maximum allowed power in micro watts. + +constraint_X_min_power_uw(ro): Minimum allowed power in micro watts. + +constraint_X_max_time_window_us(ro): Maximum allowed time window in micro seconds. + +constraint_X_min_time_window_us(ro): Minimum allowed time window in micro seconds. + +Except power_limit_uw and time_window_us other fields are optional. + +Common zone and control type attributes +---------------------------------------- +enabled (rw): Enable/Disable controls at zone level or for all zones using +a control type. + +Power Cap Client Driver Interface +================================== +The API summary: + +Call powercap_register_control_type() to register control type object. +Call powercap_register_zone() to register a power zone (under a given +control type), either as a top-level power zone or as a subzone of another +power zone registered earlier. +The number of constraints in a power zone and the corresponding callbacks have +to be defined prior to calling powercap_register_zone() to register that zone. + +To Free a power zone call powercap_unregister_zone(). +To free a control type object call powercap_unregister_control_type(). +Detailed API can be generated using kernel-doc on include/linux/powercap.h. diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index 71d8fe4e75d3..0f54333b0ff2 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt @@ -145,11 +145,13 @@ The action performed by the idle callback is totally dependent on the subsystem if the device can be suspended (i.e. if all of the conditions necessary for suspending the device are satisfied) and to queue up a suspend request for the device in that case. If there is no idle callback, or if the callback returns -0, then the PM core will attempt to carry out a runtime suspend of the device; -in essence, it will call pm_runtime_suspend() directly. To prevent this (for -example, if the callback routine has started a delayed suspend), the routine -should return a non-zero value. Negative error return codes are ignored by the -PM core. +0, then the PM core will attempt to carry out a runtime suspend of the device, +also respecting devices configured for autosuspend. In essence this means a +call to pm_runtime_autosuspend() (do note that drivers needs to update the +device last busy mark, pm_runtime_mark_last_busy(), to control the delay under +this circumstance). To prevent this (for example, if the callback routine has +started a delayed suspend), the routine must return a non-zero value. Negative +error return codes are ignored by the PM core. The helper functions provided by the PM core, described in Section 4, guarantee that the following constraints are met with respect to runtime PM callbacks for @@ -308,7 +310,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h: - execute the subsystem-level idle callback for the device; returns an error code on failure, where -EINPROGRESS means that ->runtime_idle() is already being executed; if there is no callback or the callback returns 0 - then run pm_runtime_suspend(dev) and return its result + then run pm_runtime_autosuspend(dev) and return its result int pm_runtime_suspend(struct device *dev); - execute the subsystem-level suspend callback for the device; returns 0 on |