Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/ABI/testing/sysfs-class-powercap  152
-rw-r--r--  Documentation/cpu-freq/cpu-drivers.txt  27
-rw-r--r--  Documentation/cpu-freq/governors.txt  4
-rw-r--r--  Documentation/cpuidle/governor.txt  1
-rw-r--r--  Documentation/power/opp.txt  108
-rw-r--r--  Documentation/power/powercap/powercap.txt  236
-rw-r--r--  Documentation/power/runtime_pm.txt  14
7 files changed, 470 insertions, 72 deletions
diff --git a/Documentation/ABI/testing/sysfs-class-powercap b/Documentation/ABI/testing/sysfs-class-powercap
new file mode 100644
index 000000000000..db3b3ff70d84
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-powercap
@@ -0,0 +1,152 @@
+What: /sys/class/powercap/
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ The powercap/ class sub directory belongs to the power cap
+ subsystem. Refer to
+ Documentation/power/powercap/powercap.txt for details.
+
+What: /sys/class/powercap/<control type>
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ A <control type> is a unique name under /sys/class/powercap.
+ Here <control type> determines how the power is going to be
+ controlled. A <control type> can contain multiple power zones.
+
+What: /sys/class/powercap/<control type>/enabled
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Allows power capping to be enabled or disabled for a "control type".
+ This status affects every power zone using this "control type".
+
+What: /sys/class/powercap/<control type>/<power zone>
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ A power zone is a single device or a collection of devices that
+ can be independently monitored and controlled. A power zone sysfs
+ entry is qualified with the name of the <control type>.
+ E.g. intel-rapl:0:1:1.
+
+What: /sys/class/powercap/<control type>/<power zone>/<child power zone>
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Power zones may be organized in a hierarchy in which child
+ power zones provide monitoring and control for a subset of
+ devices under the parent. For example, if there is a parent
+ power zone for a whole CPU package, each CPU core in it can
+ be a child power zone.
+
+What: /sys/class/powercap/.../<power zone>/name
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Specifies the name of this power zone.
+
+What: /sys/class/powercap/.../<power zone>/energy_uj
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Current energy counter in micro-joules. Write "0" to reset.
+ If the counter can not be reset, then this attribute is
+ read-only.
+
+What: /sys/class/powercap/.../<power zone>/max_energy_range_uj
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Range of the above energy counter in micro-joules.
+
+
+What: /sys/class/powercap/.../<power zone>/power_uw
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Current power in micro-watts.
+
+What: /sys/class/powercap/.../<power zone>/max_power_range_uw
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Range of the above power value in micro-watts.
+
+What: /sys/class/powercap/.../<power zone>/constraint_X_name
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Each power zone can define one or more constraints. Each
+ constraint can have an optional name. Here "X" can have values
+ from 0 to max integer.
+
+What: /sys/class/powercap/.../<power zone>/constraint_X_power_limit_uw
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Power limit in micro-watts, applicable over the time window
+ specified by "constraint_X_time_window_us".
+ Here "X" can have values from 0 to max integer.
+
+What: /sys/class/powercap/.../<power zone>/constraint_X_time_window_us
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Time window in micro seconds. This is used along with
+ constraint_X_power_limit_uw to define a power constraint.
+ Here "X" can have values from 0 to max integer.
+
+
+What: /sys/class/powercap/<control type>/.../constraint_X_max_power_uw
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Maximum allowed power in micro watts for this constraint.
+ Here "X" can have values from 0 to max integer.
+
+What: /sys/class/powercap/<control type>/.../constraint_X_min_power_uw
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Minimum allowed power in micro watts for this constraint.
+ Here "X" can have values from 0 to max integer.
+
+What: /sys/class/powercap/.../<power zone>/constraint_X_max_time_window_us
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Maximum allowed time window in micro seconds for this
+ constraint. Here "X" can have values from 0 to max integer.
+
+What: /sys/class/powercap/.../<power zone>/constraint_X_min_time_window_us
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Minimum allowed time window in micro seconds for this
+ constraint. Here "X" can have values from 0 to max integer.
+
+What: /sys/class/powercap/.../<power zone>/enabled
+Date: September 2013
+KernelVersion: 3.13
+Contact: linux-pm@vger.kernel.org
+Description:
+ Allows power capping to be enabled or disabled at power zone level.
+ This applies to the current power zone and its children.
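The per-constraint attributes above all follow the constraint_X_<suffix> naming pattern. As a minimal userspace sketch of how a tool might build such a path, assuming a hypothetical helper named example_constraint_path() (it is not part of the kernel or of this patch) and a zone directory supplied by the caller:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 * Build "<zone_dir>/constraint_<index>_<suffix>" into buf.
 * Returns 0 on success, -1 if suffix is empty or the result does not fit.
 */
static int example_constraint_path(char *buf, size_t len, const char *zone_dir,
				   unsigned int index, const char *suffix)
{
	int n;

	assert(buf && zone_dir && suffix);
	if (strlen(suffix) == 0)
		return -1;

	n = snprintf(buf, len, "%s/constraint_%u_%s", zone_dir, index, suffix);

	/* snprintf() reports the length it wanted; treat truncation as failure. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass a zone directory such as /sys/class/powercap/<control type>/<power zone> and a suffix such as "power_limit_uw", then open the resulting path for reading or writing.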
diff --git a/Documentation/cpu-freq/cpu-drivers.txt b/Documentation/cpu-freq/cpu-drivers.txt
index 40282e617913..8b1a4451422e 100644
--- a/Documentation/cpu-freq/cpu-drivers.txt
+++ b/Documentation/cpu-freq/cpu-drivers.txt
@@ -23,8 +23,8 @@ Contents:
1.1 Initialization
1.2 Per-CPU Initialization
1.3 verify
-1.4 target or setpolicy?
-1.5 target
+1.4 target/target_index or setpolicy?
+1.5 target/target_index
1.6 setpolicy
2. Frequency Table Helpers
@@ -56,7 +56,8 @@ cpufreq_driver.init - A pointer to the per-CPU initialization
cpufreq_driver.verify - A pointer to a "verification" function.
cpufreq_driver.setpolicy _or_
-cpufreq_driver.target - See below on the differences.
+cpufreq_driver.target/
+target_index - See below on the differences.
And optionally
@@ -66,7 +67,7 @@ cpufreq_driver.resume - A pointer to a per-CPU resume function
which is called with interrupts disabled
and _before_ the pre-suspend frequency
and/or policy is restored by a call to
- ->target or ->setpolicy.
+ ->target/target_index or ->setpolicy.
cpufreq_driver.attr - A pointer to a NULL-terminated list of
"struct freq_attr" which allow to
@@ -103,8 +104,8 @@ policy->governor must contain the "default policy" for
this CPU. A few moments later,
cpufreq_driver.verify and either
cpufreq_driver.setpolicy or
- cpufreq_driver.target is called with
- these values.
+ cpufreq_driver.target/target_index is called
+ with these values.
For setting some of these values (cpuinfo.min[max]_freq, policy->min[max]), the
frequency table helpers might be helpful. See the section 2 for more information
@@ -133,20 +134,28 @@ range) is within policy->min and policy->max. If necessary, increase
policy->max first, and only if this is no solution, decrease policy->min.
-1.4 target or setpolicy?
+1.4 target/target_index or setpolicy?
----------------------------
Most cpufreq drivers or even most cpu frequency scaling algorithms
only allow the CPU to be set to one frequency. For these, you use the
-->target call.
+->target/target_index call.
Some cpufreq-capable processors switch the frequency between certain
limits on their own. These shall use the ->setpolicy call
-1.4. target
+1.5 target/target_index
-------------
+The target_index call has two arguments: struct cpufreq_policy *policy,
+and unsigned int index (into the exposed frequency table).
+
+The CPUfreq driver must set the new frequency when called here. The
+actual frequency must be determined by freq_table[index].frequency.
+
+Deprecated:
+----------
The target call has three arguments: struct cpufreq_policy *policy,
unsigned int target_frequency, unsigned int relation.
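The contract described for target_index (the caller picks a table index, the driver programs freq_table[index].frequency) can be illustrated with a small userspace model. This is only a sketch: struct model_freq_entry, struct model_policy and model_target_index() are hypothetical stand-ins invented here, not the kernel's cpufreq structures or callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's cpufreq types. */
struct model_freq_entry {
	unsigned int frequency;		/* kHz, as in cpufreq tables */
};

struct model_policy {
	const struct model_freq_entry *freq_table;
	size_t nr_entries;
	unsigned int cur;		/* last programmed frequency, kHz */
};

/*
 * Model of a target_index-style callback: the index has already been
 * chosen by the caller, so the "driver" only looks up
 * freq_table[index].frequency and programs it.
 * Returns 0 on success, -1 for an out-of-range index.
 */
static int model_target_index(struct model_policy *policy, unsigned int index)
{
	assert(policy && policy->freq_table);
	if (index >= policy->nr_entries)
		return -1;

	/* Stand-in for the hardware register write. */
	policy->cur = policy->freq_table[index].frequency;
	return 0;
}
```

Compared with the deprecated three-argument target call, the index-based form leaves no frequency/relation matching for the driver to do, which is the point the text above makes.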
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
index 219970ba54b7..77ec21574fb1 100644
--- a/Documentation/cpu-freq/governors.txt
+++ b/Documentation/cpu-freq/governors.txt
@@ -40,7 +40,7 @@ Most cpufreq drivers (in fact, all except one, longrun) or even most
cpu frequency scaling algorithms only offer the CPU to be set to one
frequency. In order to offer dynamic frequency scaling, the cpufreq
core must be able to tell these drivers of a "target frequency". So
-these specific drivers will be transformed to offer a "->target"
+these specific drivers will be transformed to offer a "->target/target_index"
call instead of the existing "->setpolicy" call. For "longrun", all
stays the same, though.
@@ -71,7 +71,7 @@ CPU can be set to switch independently | CPU can only be set
/ the limits of policy->{min,max}
/ \
/ \
- Using the ->setpolicy call, Using the ->target call,
+ Using the ->setpolicy call, Using the ->target/target_index call,
the limits and the the frequency closest
"policy" is set. to target_freq is set.
It is assured that it
diff --git a/Documentation/cpuidle/governor.txt b/Documentation/cpuidle/governor.txt
index 12c6bd50c9f6..d9020f5e847b 100644
--- a/Documentation/cpuidle/governor.txt
+++ b/Documentation/cpuidle/governor.txt
@@ -25,5 +25,4 @@ kernel configuration and platform will be selected by cpuidle.
Interfaces:
extern int cpuidle_register_governor(struct cpuidle_governor *gov);
-extern void cpuidle_unregister_governor(struct cpuidle_governor *gov);
struct cpuidle_governor
diff --git a/Documentation/power/opp.txt b/Documentation/power/opp.txt
index 425c51d56aef..b8a907dc0169 100644
--- a/Documentation/power/opp.txt
+++ b/Documentation/power/opp.txt
@@ -42,7 +42,7 @@ We can represent these as three OPPs as the following {Hz, uV} tuples:
OPP library provides a set of helper functions to organize and query the OPP
information. The library is located in drivers/base/power/opp.c and the header
-is located in include/linux/opp.h. OPP library can be enabled by enabling
+is located in include/linux/pm_opp.h. OPP library can be enabled by enabling
CONFIG_PM_OPP from power management menuconfig menu. OPP library depends on
CONFIG_PM as certain SoCs such as Texas Instrument's OMAP framework allows to
optionally boot at a certain OPP without needing cpufreq.
@@ -71,14 +71,14 @@ operations until that OPP could be re-enabled if possible.
OPP library facilitates this concept in it's implementation. The following
operational functions operate only on available opps:
-opp_find_freq_{ceil, floor}, opp_get_voltage, opp_get_freq, opp_get_opp_count
-and opp_init_cpufreq_table
+dev_pm_opp_find_freq_{ceil, floor}, dev_pm_opp_get_voltage, dev_pm_opp_get_freq, dev_pm_opp_get_opp_count
+and dev_pm_opp_init_cpufreq_table
-opp_find_freq_exact is meant to be used to find the opp pointer which can then
-be used for opp_enable/disable functions to make an opp available as required.
+dev_pm_opp_find_freq_exact is meant to be used to find the opp pointer which can then
+be used for dev_pm_opp_enable/disable functions to make an opp available as required.
WARNING: Users of OPP library should refresh their availability count using
-get_opp_count if opp_enable/disable functions are invoked for a device, the
+dev_pm_opp_get_opp_count if dev_pm_opp_enable/disable functions are invoked for a device, the
exact mechanism to trigger these or the notification mechanism to other
dependent subsystems such as cpufreq are left to the discretion of the SoC
specific framework which uses the OPP library. Similar care needs to be taken
@@ -96,24 +96,24 @@ using RCU read locks. The opp_find_freq_{exact,ceil,floor},
opp_get_{voltage, freq, opp_count} fall into this category.
opp_{add,enable,disable} are updaters which use mutex and implement it's own
-RCU locking mechanisms. opp_init_cpufreq_table acts as an updater and uses
+RCU locking mechanisms. dev_pm_opp_init_cpufreq_table acts as an updater and uses
mutex to implment RCU updater strategy. These functions should *NOT* be called
under RCU locks and other contexts that prevent blocking functions in RCU or
mutex operations from working.
2. Initial OPP List Registration
================================
-The SoC implementation calls opp_add function iteratively to add OPPs per
+The SoC implementation calls dev_pm_opp_add function iteratively to add OPPs per
device. It is expected that the SoC framework will register the OPP entries
optimally- typical numbers range to be less than 5. The list generated by
registering the OPPs is maintained by OPP library throughout the device
operation. The SoC framework can subsequently control the availability of the
-OPPs dynamically using the opp_enable / disable functions.
+OPPs dynamically using the dev_pm_opp_enable / disable functions.
-opp_add - Add a new OPP for a specific domain represented by the device pointer.
+dev_pm_opp_add - Add a new OPP for a specific domain represented by the device pointer.
The OPP is defined using the frequency and voltage. Once added, the OPP
is assumed to be available and control of it's availability can be done
- with the opp_enable/disable functions. OPP library internally stores
+ with the dev_pm_opp_enable/disable functions. OPP library internally stores
and manages this information in the opp struct. This function may be
used by SoC framework to define a optimal list as per the demands of
SoC usage environment.
@@ -124,7 +124,7 @@ opp_add - Add a new OPP for a specific domain represented by the device pointer.
soc_pm_init()
{
/* Do things */
- r = opp_add(mpu_dev, 1000000, 900000);
+ r = dev_pm_opp_add(mpu_dev, 1000000, 900000);
if (!r) {
pr_err("%s: unable to register mpu opp(%d)\n", r);
goto no_cpufreq;
@@ -143,44 +143,44 @@ functions return the matching pointer representing the opp if a match is
found, else returns error. These errors are expected to be handled by standard
error checks such as IS_ERR() and appropriate actions taken by the caller.
-opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
+dev_pm_opp_find_freq_exact - Search for an OPP based on an *exact* frequency and
availability. This function is especially useful to enable an OPP which
is not available by default.
Example: In a case when SoC framework detects a situation where a
higher frequency could be made available, it can use this function to
- find the OPP prior to call the opp_enable to actually make it available.
+ find the OPP prior to calling dev_pm_opp_enable to actually make it available.
rcu_read_lock();
- opp = opp_find_freq_exact(dev, 1000000000, false);
+ opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
rcu_read_unlock();
/* dont operate on the pointer.. just do a sanity check.. */
if (IS_ERR(opp)) {
pr_err("frequency not disabled!\n");
/* trigger appropriate actions.. */
} else {
- opp_enable(dev,1000000000);
+ dev_pm_opp_enable(dev,1000000000);
}
NOTE: This is the only search function that operates on OPPs which are
not available.
-opp_find_freq_floor - Search for an available OPP which is *at most* the
+dev_pm_opp_find_freq_floor - Search for an available OPP which is *at most* the
provided frequency. This function is useful while searching for a lesser
match OR operating on OPP information in the order of decreasing
frequency.
Example: To find the highest opp for a device:
freq = ULONG_MAX;
rcu_read_lock();
- opp_find_freq_floor(dev, &freq);
+ dev_pm_opp_find_freq_floor(dev, &freq);
rcu_read_unlock();
-opp_find_freq_ceil - Search for an available OPP which is *at least* the
+dev_pm_opp_find_freq_ceil - Search for an available OPP which is *at least* the
provided frequency. This function is useful while searching for a
higher match OR operating on OPP information in the order of increasing
frequency.
Example 1: To find the lowest opp for a device:
freq = 0;
rcu_read_lock();
- opp_find_freq_ceil(dev, &freq);
+ dev_pm_opp_find_freq_ceil(dev, &freq);
rcu_read_unlock();
Example 2: A simplified implementation of a SoC cpufreq_driver->target:
soc_cpufreq_target(..)
@@ -188,7 +188,7 @@ opp_find_freq_ceil - Search for an available OPP which is *at least* the
/* Do stuff like policy checks etc. */
/* Find the best frequency match for the req */
rcu_read_lock();
- opp = opp_find_freq_ceil(dev, &freq);
+ opp = dev_pm_opp_find_freq_ceil(dev, &freq);
rcu_read_unlock();
if (!IS_ERR(opp))
soc_switch_to_freq_voltage(freq);
@@ -208,34 +208,34 @@ as thermal considerations (e.g. don't use OPPx until the temperature drops).
WARNING: Do not use these functions in interrupt context.
-opp_enable - Make a OPP available for operation.
+dev_pm_opp_enable - Make an OPP available for operation.
Example: Lets say that 1GHz OPP is to be made available only if the
SoC temperature is lower than a certain threshold. The SoC framework
implementation might choose to do something as follows:
if (cur_temp < temp_low_thresh) {
/* Enable 1GHz if it was disabled */
rcu_read_lock();
- opp = opp_find_freq_exact(dev, 1000000000, false);
+ opp = dev_pm_opp_find_freq_exact(dev, 1000000000, false);
rcu_read_unlock();
/* just error check */
if (!IS_ERR(opp))
- ret = opp_enable(dev, 1000000000);
+ ret = dev_pm_opp_enable(dev, 1000000000);
else
goto try_something_else;
}
-opp_disable - Make an OPP to be not available for operation
+dev_pm_opp_disable - Make an OPP unavailable for operation
Example: Lets say that 1GHz OPP is to be disabled if the temperature
exceeds a threshold value. The SoC framework implementation might
choose to do something as follows:
if (cur_temp > temp_high_thresh) {
/* Disable 1GHz if it was enabled */
rcu_read_lock();
- opp = opp_find_freq_exact(dev, 1000000000, true);
+ opp = dev_pm_opp_find_freq_exact(dev, 1000000000, true);
rcu_read_unlock();
/* just error check */
if (!IS_ERR(opp))
- ret = opp_disable(dev, 1000000000);
+ ret = dev_pm_opp_disable(dev, 1000000000);
else
goto try_something_else;
}
@@ -247,7 +247,7 @@ information from the OPP structure is necessary. Once an OPP pointer is
retrieved using the search functions, the following functions can be used by SoC
framework to retrieve the information represented inside the OPP layer.
-opp_get_voltage - Retrieve the voltage represented by the opp pointer.
+dev_pm_opp_get_voltage - Retrieve the voltage represented by the opp pointer.
Example: At a cpufreq transition to a different frequency, SoC
framework requires to set the voltage represented by the OPP using
the regulator framework to the Power Management chip providing the
@@ -256,15 +256,15 @@ opp_get_voltage - Retrieve the voltage represented by the opp pointer.
{
/* do things */
rcu_read_lock();
- opp = opp_find_freq_ceil(dev, &freq);
- v = opp_get_voltage(opp);
+ opp = dev_pm_opp_find_freq_ceil(dev, &freq);
+ v = dev_pm_opp_get_voltage(opp);
rcu_read_unlock();
if (v)
regulator_set_voltage(.., v);
/* do other things */
}
-opp_get_freq - Retrieve the freq represented by the opp pointer.
+dev_pm_opp_get_freq - Retrieve the freq represented by the opp pointer.
Example: Lets say the SoC framework uses a couple of helper functions
we could pass opp pointers instead of doing additional parameters to
handle quiet a bit of data parameters.
@@ -273,8 +273,8 @@ opp_get_freq - Retrieve the freq represented by the opp pointer.
/* do things.. */
max_freq = ULONG_MAX;
rcu_read_lock();
- max_opp = opp_find_freq_floor(dev,&max_freq);
- requested_opp = opp_find_freq_ceil(dev,&freq);
+ max_opp = dev_pm_opp_find_freq_floor(dev,&max_freq);
+ requested_opp = dev_pm_opp_find_freq_ceil(dev,&freq);
if (!IS_ERR(max_opp) && !IS_ERR(requested_opp))
r = soc_test_validity(max_opp, requested_opp);
rcu_read_unlock();
@@ -282,25 +282,25 @@ opp_get_freq - Retrieve the freq represented by the opp pointer.
}
soc_test_validity(..)
{
- if(opp_get_voltage(max_opp) < opp_get_voltage(requested_opp))
+ if(dev_pm_opp_get_voltage(max_opp) < dev_pm_opp_get_voltage(requested_opp))
return -EINVAL;
- if(opp_get_freq(max_opp) < opp_get_freq(requested_opp))
+ if(dev_pm_opp_get_freq(max_opp) < dev_pm_opp_get_freq(requested_opp))
return -EINVAL;
/* do things.. */
}
-opp_get_opp_count - Retrieve the number of available opps for a device
+dev_pm_opp_get_opp_count - Retrieve the number of available opps for a device
Example: Lets say a co-processor in the SoC needs to know the available
frequencies in a table, the main processor can notify as following:
soc_notify_coproc_available_frequencies()
{
/* Do things */
rcu_read_lock();
- num_available = opp_get_opp_count(dev);
+ num_available = dev_pm_opp_get_opp_count(dev);
speeds = kzalloc(sizeof(u32) * num_available, GFP_KERNEL);
/* populate the table in increasing order */
freq = 0;
- while (!IS_ERR(opp = opp_find_freq_ceil(dev, &freq))) {
+ while (!IS_ERR(opp = dev_pm_opp_find_freq_ceil(dev, &freq))) {
speeds[i] = freq;
freq++;
i++;
@@ -313,7 +313,7 @@ opp_get_opp_count - Retrieve the number of available opps for a device
6. Cpufreq Table Generation
===========================
-opp_init_cpufreq_table - cpufreq framework typically is initialized with
+dev_pm_opp_init_cpufreq_table - cpufreq framework typically is initialized with
cpufreq_frequency_table_cpuinfo which is provided with the list of
frequencies that are available for operation. This function provides
a ready to use conversion routine to translate the OPP layer's internal
@@ -326,7 +326,7 @@ opp_init_cpufreq_table - cpufreq framework typically is initialized with
soc_pm_init()
{
/* Do things */
- r = opp_init_cpufreq_table(dev, &freq_table);
+ r = dev_pm_opp_init_cpufreq_table(dev, &freq_table);
if (!r)
cpufreq_frequency_table_cpuinfo(policy, freq_table);
/* Do other things */
@@ -336,7 +336,7 @@ opp_init_cpufreq_table - cpufreq framework typically is initialized with
addition to CONFIG_PM as power management feature is required to
dynamically scale voltage and frequency in a system.
-opp_free_cpufreq_table - Free up the table allocated by opp_init_cpufreq_table
+dev_pm_opp_free_cpufreq_table - Free up the table allocated by dev_pm_opp_init_cpufreq_table
7. Data Structures
==================
@@ -358,16 +358,16 @@ accessed by various functions as described above. However, the structures
representing the actual OPPs and domains are internal to the OPP library itself
to allow for suitable abstraction reusable across systems.
-struct opp - The internal data structure of OPP library which is used to
+struct dev_pm_opp - The internal data structure of OPP library which is used to
represent an OPP. In addition to the freq, voltage, availability
information, it also contains internal book keeping information required
for the OPP library to operate on. Pointer to this structure is
provided back to the users such as SoC framework to be used as a
identifier for OPP in the interactions with OPP layer.
- WARNING: The struct opp pointer should not be parsed or modified by the
- users. The defaults of for an instance is populated by opp_add, but the
- availability of the OPP can be modified by opp_enable/disable functions.
+ WARNING: The struct dev_pm_opp pointer should not be parsed or modified by the
+ users. The defaults for an instance are populated by dev_pm_opp_add, but the
+ availability of the OPP can be modified by dev_pm_opp_enable/disable functions.
struct device - This is used to identify a domain to the OPP layer. The
nature of the device and it's implementation is left to the user of
@@ -377,19 +377,19 @@ Overall, in a simplistic view, the data structure operations is represented as
following:
Initialization / modification:
- +-----+ /- opp_enable
-opp_add --> | opp | <-------
- | +-----+ \- opp_disable
+ +-----+ /- dev_pm_opp_enable
+dev_pm_opp_add --> | opp | <-------
+ | +-----+ \- dev_pm_opp_disable
\-------> domain_info(device)
Search functions:
- /-- opp_find_freq_ceil ---\ +-----+
-domain_info<---- opp_find_freq_exact -----> | opp |
- \-- opp_find_freq_floor ---/ +-----+
+ /-- dev_pm_opp_find_freq_ceil ---\ +-----+
+domain_info<---- dev_pm_opp_find_freq_exact -----> | opp |
+ \-- dev_pm_opp_find_freq_floor ---/ +-----+
Retrieval functions:
-+-----+ /- opp_get_voltage
++-----+ /- dev_pm_opp_get_voltage
| opp | <---
-+-----+ \- opp_get_freq
++-----+ \- dev_pm_opp_get_freq
-domain_info <- opp_get_opp_count
+domain_info <- dev_pm_opp_get_opp_count
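The floor/ceil search semantics used throughout the examples above (only available OPPs are considered, and the frequency argument is updated to the matched value) can be modelled in plain userspace C. This is a sketch under stated assumptions: the model_* names and struct model_opp are hypothetical, the table is assumed to be sorted by ascending frequency, and a miss is reported as NULL here rather than through an error pointer checked with IS_ERR().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one OPP entry; not the kernel's struct dev_pm_opp. */
struct model_opp {
	unsigned long freq;	/* Hz */
	unsigned long u_volt;	/* uV */
	bool available;
};

/* Lowest available OPP with freq >= *freq; updates *freq on a match. */
static const struct model_opp *
model_find_freq_ceil(const struct model_opp *t, size_t n, unsigned long *freq)
{
	assert(t && freq);
	for (size_t i = 0; i < n; i++) {
		if (t[i].available && t[i].freq >= *freq) {
			*freq = t[i].freq;
			return &t[i];
		}
	}
	return NULL;
}

/* Highest available OPP with freq <= *freq; updates *freq on a match. */
static const struct model_opp *
model_find_freq_floor(const struct model_opp *t, size_t n, unsigned long *freq)
{
	assert(t && freq);
	for (size_t i = n; i-- > 0; ) {
		if (t[i].available && t[i].freq <= *freq) {
			*freq = t[i].freq;
			return &t[i];
		}
	}
	return NULL;
}

/*
 * Enumerate available frequencies in increasing order, using the same
 * "search ceil, then freq++" idiom as the co-processor example above.
 * Returns the number of entries written to out (at most cap).
 */
static size_t model_list_available(const struct model_opp *t, size_t n,
				   unsigned long *out, size_t cap)
{
	unsigned long freq = 0;
	size_t count = 0;

	while (count < cap && model_find_freq_ceil(t, n, &freq)) {
		out[count++] = freq;
		freq++;		/* step past the match to find the next one */
	}
	return count;
}
```

Starting the floor search at the largest unsigned long value yields the highest available OPP, and starting the ceil search at 0 yields the lowest, mirroring the two "find the highest/lowest opp" examples in the text.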
diff --git a/Documentation/power/powercap/powercap.txt b/Documentation/power/powercap/powercap.txt
new file mode 100644
index 000000000000..1e6ef164e07a
--- /dev/null
+++ b/Documentation/power/powercap/powercap.txt
@@ -0,0 +1,236 @@
+Power Capping Framework
+==================================
+
+The power capping framework provides a consistent interface between the kernel
+and the user space that allows power capping drivers to expose the settings to
+user space in a uniform way.
+
+Terminology
+=========================
+The framework exposes power capping devices to user space via sysfs in the
+form of a tree of objects. The objects at the root level of the tree represent
+'control types', which correspond to different methods of power capping. For
+example, the intel-rapl control type represents the Intel "Running Average
+Power Limit" (RAPL) technology, whereas the 'idle-injection' control type
+corresponds to the use of idle injection for controlling power.
+
+Power zones represent different parts of the system, which can be controlled and
+monitored using the power capping method determined by the control type the
+given zone belongs to. They each contain attributes for monitoring power, as
+well as controls represented in the form of power constraints. If the parts of
+the system represented by different power zones are hierarchical (that is, one
+bigger part consists of multiple smaller parts that each have their own power
+controls), those power zones may also be organized in a hierarchy with one
+parent power zone containing multiple subzones and so on to reflect the power
+control topology of the system. In that case, it is possible to apply power
+capping to a set of devices together using the parent power zone and if more
+fine grained control is required, it can be applied through the subzones.
+
+
+Example sysfs interface tree:
+
+/sys/devices/virtual/powercap
+└── intel-rapl
+    ├── intel-rapl:0
+    │   ├── constraint_0_name
+    │   ├── constraint_0_power_limit_uw
+    │   ├── constraint_0_time_window_us
+    │   ├── constraint_1_name
+    │   ├── constraint_1_power_limit_uw
+    │   ├── constraint_1_time_window_us
+    │   ├── device -> ../../intel-rapl
+    │   ├── energy_uj
+    │   ├── intel-rapl:0:0
+    │   │   ├── constraint_0_name
+    │   │   ├── constraint_0_power_limit_uw
+    │   │   ├── constraint_0_time_window_us
+    │   │   ├── constraint_1_name
+    │   │   ├── constraint_1_power_limit_uw
+    │   │   ├── constraint_1_time_window_us
+    │   │   ├── device -> ../../intel-rapl:0
+    │   │   ├── energy_uj
+    │   │   ├── max_energy_range_uj
+    │   │   ├── name
+    │   │   ├── enabled
+    │   │   ├── power
+    │   │   │   ├── async
+    │   │   │   []
+    │   │   ├── subsystem -> ../../../../../../class/power_cap
+    │   │   └── uevent
+    │   ├── intel-rapl:0:1
+    │   │   ├── constraint_0_name
+    │   │   ├── constraint_0_power_limit_uw
+    │   │   ├── constraint_0_time_window_us
+    │   │   ├── constraint_1_name
+    │   │   ├── constraint_1_power_limit_uw
+    │   │   ├── constraint_1_time_window_us
+    │   │   ├── device -> ../../intel-rapl:0
+    │   │   ├── energy_uj
+    │   │   ├── max_energy_range_uj
+    │   │   ├── name
+    │   │   ├── enabled
+    │   │   ├── power
+    │   │   │   ├── async
+    │   │   │   []
+    │   │   ├── subsystem -> ../../../../../../class/power_cap
+    │   │   └── uevent
+    │   ├── max_energy_range_uj
+    │   ├── max_power_range_uw
+    │   ├── name
+    │   ├── enabled
+    │   ├── power
+    │   │   ├── async
+    │   │   []
+    │   ├── subsystem -> ../../../../../class/power_cap
+    │   ├── enabled
+    │   └── uevent
+    ├── intel-rapl:1
+    │   ├── constraint_0_name
+    │   ├── constraint_0_power_limit_uw
+    │   ├── constraint_0_time_window_us
+    │   ├── constraint_1_name
+    │   ├── constraint_1_power_limit_uw
+    │   ├── constraint_1_time_window_us
+    │   ├── device -> ../../intel-rapl
+    │   ├── energy_uj
+    │   ├── intel-rapl:1:0
+    │   │   ├── constraint_0_name
+    │   │   ├── constraint_0_power_limit_uw
+    │   │   ├── constraint_0_time_window_us
+    │   │   ├── constraint_1_name
+    │   │   ├── constraint_1_power_limit_uw
+    │   │   ├── constraint_1_time_window_us
+    │   │   ├── device -> ../../intel-rapl:1
+    │   │   ├── energy_uj
+    │   │   ├── max_energy_range_uj
+    │   │   ├── name
+    │   │   ├── enabled
+    │   │   ├── power
+    │   │   │   ├── async
+    │   │   │   []
+    │   │   ├── subsystem -> ../../../../../../class/power_cap
+    │   │   └── uevent
+    │   ├── intel-rapl:1:1
+    │   │   ├── constraint_0_name
+    │   │   ├── constraint_0_power_limit_uw
+    │   │   ├── constraint_0_time_window_us
+    │   │   ├── constraint_1_name
+    │   │   ├── constraint_1_power_limit_uw
+    │   │   ├── constraint_1_time_window_us
+    │   │   ├── device -> ../../intel-rapl:1
+    │   │   ├── energy_uj
+    │   │   ├── max_energy_range_uj
+    │   │   ├── name
+    │   │   ├── enabled
+    │   │   ├── power
+    │   │   │   ├── async
+    │   │   │   []
+    │   │   ├── subsystem -> ../../../../../../class/power_cap
+    │   │   └── uevent
+    │   ├── max_energy_range_uj
+    │   ├── max_power_range_uw
+    │   ├── name
+    │   ├── enabled
+    │   ├── power
+    │   │   ├── async
+    │   │   []
+    │   ├── subsystem -> ../../../../../class/power_cap
+    │   └── uevent
+    ├── power
+    │   ├── async
+    │   []
+    ├── subsystem -> ../../../../class/power_cap
+    ├── enabled
+    └── uevent
+
+The above example illustrates a case in which the Intel RAPL technology,
+available in Intel® 64 and IA-32 Processor Architectures, is used. There is one
+control type called intel-rapl which contains two power zones, intel-rapl:0 and
+intel-rapl:1, representing CPU packages. Each of these power zones contains
+two subzones, intel-rapl:j:0 and intel-rapl:j:1 (j = 0, 1), representing the
+"core" and the "uncore" parts of the given CPU package, respectively. All of
+the zones and subzones contain energy monitoring attributes (energy_uj,
+max_energy_range_uj) and constraint attributes (constraint_*) allowing controls
+to be applied (the constraints in the 'package' power zones apply to the whole
+CPU packages and the subzone constraints only apply to the respective parts of
+the given package individually). Since Intel RAPL does not provide an
+instantaneous power value, there is no power_uw attribute.
+
+In addition to that, each power zone contains a name attribute, allowing the
+part of the system represented by that zone to be identified.
+For example:
+
+cat /sys/class/powercap/intel-rapl/intel-rapl:0/name
+package-0
+
+The Intel RAPL technology allows two constraints, short term and long term,
+with two different time windows to be applied to each power zone. Thus for
+each zone there are two attributes representing the constraint names, two
+power limits and two attributes representing the sizes of the time windows.
+The constraint_j_* attributes correspond to the jth constraint (j = 0, 1).
+
+For example:
+ constraint_0_name
+ constraint_0_power_limit_uw
+ constraint_0_time_window_us
+ constraint_1_name
+ constraint_1_power_limit_uw
+ constraint_1_time_window_us
+
+Power Zone Attributes
+=================================
+Monitoring attributes
+----------------------
+
+energy_uj (rw): Current energy counter in micro joules. Write "0" to reset.
+If the counter can not be reset, then this attribute is read only.
+
+max_energy_range_uj (ro): Range of the above energy counter in micro-joules.
+
+power_uw (ro): Current power in micro watts.
+
+max_power_range_uw (ro): Range of the above power value in micro-watts.
+
+name (ro): Name of this power zone.
+
+It is possible that some domains have both power ranges and energy counter ranges;
+however, only one is mandatory.
+
+Constraints
+----------------
+constraint_X_power_limit_uw (rw): Power limit in micro watts, which should be
+applicable for the time window specified by "constraint_X_time_window_us".
+
+constraint_X_time_window_us (rw): Time window in micro seconds.
+
+constraint_X_name (ro): An optional name of the constraint.
+
+constraint_X_max_power_uw (ro): Maximum allowed power in micro watts.
+
+constraint_X_min_power_uw (ro): Minimum allowed power in micro watts.
+
+constraint_X_max_time_window_us (ro): Maximum allowed time window in micro seconds.
+
+constraint_X_min_time_window_us (ro): Minimum allowed time window in micro seconds.
+
+All fields other than power_limit_uw and time_window_us are optional.
+
+Common zone and control type attributes
+----------------------------------------
+enabled (rw): Enable/Disable controls at zone level or for all zones using
+a control type.
+
+Power Cap Client Driver Interface
+==================================
+The API summary:
+
+Call powercap_register_control_type() to register a control type object.
+Call powercap_register_zone() to register a power zone (under a given
+control type), either as a top-level power zone or as a subzone of another
+power zone registered earlier.
+The number of constraints in a power zone and the corresponding callbacks have
+to be defined prior to calling powercap_register_zone() to register that zone.
+
+To free a power zone, call powercap_unregister_zone().
+To free a control type object, call powercap_unregister_control_type().
+Detailed API can be generated using kernel-doc on include/linux/powercap.h.
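Because a zone may expose only the energy counter (energy_uj, max_energy_range_uj) and no power_uw, user space can derive an average power from two counter samples. The following is a minimal sketch, not part of the framework: the function names are hypothetical, at most one counter wrap between samples is assumed, and the counter is assumed to wrap to 0 after reaching max_energy_range_uj (the exact wrap point is an assumption, not something the text above specifies).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: energy consumed between two energy_uj samples.
 * Assumes at most one wrap, and that the counter restarts at 0 after
 * reaching max_range_uj.
 */
static uint64_t example_energy_delta_uj(uint64_t before, uint64_t after,
					uint64_t max_range_uj)
{
	if (after >= before)
		return after - before;

	/* The counter wrapped: energy up to the range limit, plus the rest. */
	return (max_range_uj - before) + after;
}

/*
 * Hypothetical helper: average power in micro-watts over interval_us.
 * uJ / us equals W, so scale by 1000000 to get uW. The multiplication
 * can overflow for very large deltas; a sketch, not hardened code.
 */
static uint64_t example_average_power_uw(uint64_t delta_uj, uint64_t interval_us)
{
	assert(interval_us != 0);
	return delta_uj * 1000000ULL / interval_us;
}
```

A monitoring tool would read energy_uj twice, note the elapsed time between the reads, and feed both values plus max_energy_range_uj into these helpers.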
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 71d8fe4e75d3..0f54333b0ff2 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -145,11 +145,13 @@ The action performed by the idle callback is totally dependent on the subsystem
if the device can be suspended (i.e. if all of the conditions necessary for
suspending the device are satisfied) and to queue up a suspend request for the
device in that case. If there is no idle callback, or if the callback returns
-0, then the PM core will attempt to carry out a runtime suspend of the device;
-in essence, it will call pm_runtime_suspend() directly. To prevent this (for
-example, if the callback routine has started a delayed suspend), the routine
-should return a non-zero value. Negative error return codes are ignored by the
-PM core.
+0, then the PM core will attempt to carry out a runtime suspend of the device,
+also respecting devices configured for autosuspend. In essence this means a
+call to pm_runtime_autosuspend() (note that drivers need to update the
+device last busy mark, pm_runtime_mark_last_busy(), to control the delay under
+this circumstance). To prevent this (for example, if the callback routine has
+started a delayed suspend), the routine must return a non-zero value. Negative
+error return codes are ignored by the PM core.
The helper functions provided by the PM core, described in Section 4, guarantee
that the following constraints are met with respect to runtime PM callbacks for
@@ -308,7 +310,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
- execute the subsystem-level idle callback for the device; returns an
error code on failure, where -EINPROGRESS means that ->runtime_idle() is
already being executed; if there is no callback or the callback returns 0
- then run pm_runtime_suspend(dev) and return its result
+ then run pm_runtime_autosuspend(dev) and return its result
int pm_runtime_suspend(struct device *dev);
- execute the subsystem-level suspend callback for the device; returns 0 on