diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/cpu-freq/intel-pstate.txt | 241 | ||||
-rw-r--r-- | Documentation/cpu-freq/pcc-cpufreq.txt | 4 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/arm/cpus.txt | 17 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt | 91 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/opp/opp.txt | 132 | ||||
-rw-r--r-- | Documentation/power/pci.txt | 2 | ||||
-rw-r--r-- | Documentation/power/runtime_pm.txt | 6 |
7 files changed, 409 insertions, 84 deletions
diff --git a/Documentation/cpu-freq/intel-pstate.txt b/Documentation/cpu-freq/intel-pstate.txt index be8d4006bf76..f7b12c071d53 100644 --- a/Documentation/cpu-freq/intel-pstate.txt +++ b/Documentation/cpu-freq/intel-pstate.txt @@ -1,61 +1,131 @@ -Intel P-state driver +Intel P-State driver -------------------- -This driver provides an interface to control the P state selection for -SandyBridge+ Intel processors. The driver can operate two different -modes based on the processor model, legacy mode and Hardware P state (HWP) -mode. - -In legacy mode, the Intel P-state implements two internal governors, -performance and powersave, that differ from the general cpufreq governors of -the same name (the general cpufreq governors implement target(), whereas the -internal Intel P-state governors implement setpolicy()). The internal -performance governor sets the max_perf_pct and min_perf_pct to 100; that is, -the governor selects the highest available P state to maximize the performance -of the core. The internal powersave governor selects the appropriate P state -based on the current load on the CPU. - -In HWP mode P state selection is implemented in the processor -itself. The driver provides the interfaces between the cpufreq core and -the processor to control P state selection based on user preferences -and reporting frequency to the cpufreq core. In this mode the -internal Intel P-state governor code is disabled. - -In addition to the interfaces provided by the cpufreq core for -controlling frequency the driver provides sysfs files for -controlling P state selection. These files have been added to -/sys/devices/system/cpu/intel_pstate/ - - max_perf_pct: limits the maximum P state that will be requested by - the driver stated as a percentage of the available performance. The - available (P states) performance may be reduced by the no_turbo +This driver provides an interface to control the P-State selection for the +SandyBridge+ Intel processors. + +The following document explains P-States: +http://events.linuxfoundation.org/sites/events/files/slides/LinuxConEurope_2015.pdf +As stated in the document, P-State doesn’t exactly mean a frequency. However, for +the sake of the relationship with cpufreq, P-State and frequency are used +interchangeably. + +Understanding the cpufreq core governors and policies are important before +discussing more details about the Intel P-State driver. Based on what callbacks +a cpufreq driver provides to the cpufreq core, it can support two types of +drivers: +- with target_index() callback: In this mode, the drivers using cpufreq core +simply provide the minimum and maximum frequency limits and an additional +interface target_index() to set the current frequency. The cpufreq subsystem +has a number of scaling governors ("performance", "powersave", "ondemand", +etc.). Depending on which governor is in use, cpufreq core will call for +transitions to a specific frequency using target_index() callback. +- setpolicy() callback: In this mode, drivers do not provide target_index() +callback, so cpufreq core can't request a transition to a specific frequency. +The driver provides minimum and maximum frequency limits and callbacks to set a +policy. The policy in cpufreq sysfs is referred to as the "scaling governor". +The cpufreq core can request the driver to operate in any of the two policies: +"performance: and "powersave". The driver decides which frequency to use based +on the above policy selection considering minimum and maximum frequency limits. + +The Intel P-State driver falls under the latter category, which implements the +setpolicy() callback. This driver decides what P-State to use based on the +requested policy from the cpufreq core. If the processor is capable of +selecting its next P-State internally, then the driver will offload this +responsibility to the processor (aka HWP: Hardware P-States). If not, the +driver implements algorithms to select the next P-State. + +Since these policies are implemented in the driver, they are not same as the +cpufreq scaling governors implementation, even if they have the same name in +the cpufreq sysfs (scaling_governors). For example the "performance" policy is +similar to cpufreq’s "performance" governor, but "powersave" is completely +different than the cpufreq "powersave" governor. The strategy here is similar +to cpufreq "ondemand", where the requested P-State is related to the system load. + +Sysfs Interface + +In addition to the frequency-controlling interfaces provided by the cpufreq +core, the driver provides its own sysfs files to control the P-State selection. +These files have been added to /sys/devices/system/cpu/intel_pstate/. +Any changes made to these files are applicable to all CPUs (even in a +multi-package system). + + max_perf_pct: Limits the maximum P-State that will be requested by + the driver. It states it as a percentage of the available performance. The + available (P-State) performance may be reduced by the no_turbo setting described below. - min_perf_pct: limits the minimum P state that will be requested by - the driver stated as a percentage of the max (non-turbo) + min_perf_pct: Limits the minimum P-State that will be requested by + the driver. It states it as a percentage of the max (non-turbo) performance level. - no_turbo: limits the driver to selecting P states below the turbo + no_turbo: Limits the driver to selecting P-State below the turbo frequency range. - turbo_pct: displays the percentage of the total performance that - is supported by hardware that is in the turbo range. This number + turbo_pct: Displays the percentage of the total performance that + is supported by hardware that is in the turbo range. This number is independent of whether turbo has been disabled or not. - num_pstates: displays the number of pstates that are supported - by hardware. This number is independent of whether turbo has + num_pstates: Displays the number of P-States that are supported + by hardware. This number is independent of whether turbo has been disabled or not. +For example, if a system has these parameters: + Max 1 core turbo ratio: 0x21 (Max 1 core ratio is the maximum P-State) + Max non turbo ratio: 0x17 + Minimum ratio : 0x08 (Here the ratio is called max efficiency ratio) + +Sysfs will show : + max_perf_pct:100, which corresponds to 1 core ratio + min_perf_pct:24, max_efficiency_ratio / max 1 Core ratio + no_turbo:0, turbo is not disabled + num_pstates:26 = (max 1 Core ratio - Max Efficiency Ratio + 1) + turbo_pct:39 = (max 1 core ratio - max non turbo ratio) / num_pstates + +Refer to "Intel® 64 and IA-32 Architectures Software Developer’s Manual +Volume 3: System Programming Guide" to understand ratios. + +cpufreq sysfs for Intel P-State + +Since this driver registers with cpufreq, cpufreq sysfs is also presented. +There are some important differences, which need to be considered. + +scaling_cur_freq: This displays the real frequency which was used during +the last sample period instead of what is requested. Some other cpufreq driver, +like acpi-cpufreq, displays what is requested (Some changes are on the +way to fix this for acpi-cpufreq driver). The same is true for frequencies +displayed at /proc/cpuinfo. + +scaling_governor: This displays current active policy. Since each CPU has a +cpufreq sysfs, it is possible to set a scaling governor to each CPU. But this +is not possible with Intel P-States, as there is one common policy for all +CPUs. Here, the last requested policy will be applicable to all CPUs. It is +suggested that one use the cpupower utility to change policy to all CPUs at the +same time. + +scaling_setspeed: This attribute can never be used with Intel P-State. + +scaling_max_freq/scaling_min_freq: This interface can be used similarly to +the max_perf_pct/min_perf_pct of Intel P-State sysfs. However since frequencies +are converted to nearest possible P-State, this is prone to rounding errors. +This method is not preferred to limit performance. + +affected_cpus: Not used +related_cpus: Not used + For contemporary Intel processors, the frequency is controlled by the -processor itself and the P-states exposed to software are related to +processor itself and the P-State exposed to software is related to performance levels. The idea that frequency can be set to a single -frequency is fiction for Intel Core processors. Even if the scaling -driver selects a single P state the actual frequency the processor +frequency is fictional for Intel Core processors. Even if the scaling +driver selects a single P-State, the actual frequency the processor will run at is selected by the processor itself. -For legacy mode debugfs files have also been added to allow tuning of -the internal governor algorythm. These files are located at -/sys/kernel/debug/pstate_snb/ These files are NOT present in HWP mode. +Tuning Intel P-State driver + +When HWP mode is not used, debugfs files have also been added to allow the +tuning of the internal governor algorithm. These files are located at +/sys/kernel/debug/pstate_snb/. The algorithm uses a PID (Proportional +Integral Derivative) controller. The PID tunable parameters are: deadband d_gain_pct @@ -63,3 +133,90 @@ the internal governor algorythm. These files are located at p_gain_pct sample_rate_ms setpoint + +To adjust these parameters, some understanding of driver implementation is +necessary. There are some tweeks described here, but be very careful. Adjusting +them requires expert level understanding of power and performance relationship. +These limits are only useful when the "powersave" policy is active. + +-To make the system more responsive to load changes, sample_rate_ms can +be adjusted (current default is 10ms). +-To make the system use higher performance, even if the load is lower, setpoint +can be adjusted to a lower number. This will also lead to faster ramp up time +to reach the maximum P-State. +If there are no derivative and integral coefficients, The next P-State will be +equal to: + current P-State - ((setpoint - current cpu load) * p_gain_pct) + +For example, if the current PID parameters are (Which are defaults for the core +processors like SandyBridge): + deadband = 0 + d_gain_pct = 0 + i_gain_pct = 0 + p_gain_pct = 20 + sample_rate_ms = 10 + setpoint = 97 + +If the current P-State = 0x08 and current load = 100, this will result in the +next P-State = 0x08 - ((97 - 100) * 0.2) = 8.6 (rounded to 9). Here the P-State +goes up by only 1. If during next sample interval the current load doesn't +change and still 100, then P-State goes up by one again. This process will +continue as long as the load is more than the setpoint until the maximum P-State +is reached. + +For the same load at setpoint = 60, this will result in the next P-State += 0x08 - ((60 - 100) * 0.2) = 16 +So by changing the setpoint from 97 to 60, there is an increase of the +next P-State from 9 to 16. So this will make processor execute at higher +P-State for the same CPU load. If the load continues to be more than the +setpoint during next sample intervals, then P-State will go up again till the +maximum P-State is reached. But the ramp up time to reach the maximum P-State +will be much faster when the setpoint is 60 compared to 97. + +Debugging Intel P-State driver + +Event tracing +To debug P-State transition, the Linux event tracing interface can be used. +There are two specific events, which can be enabled (Provided the kernel +configs related to event tracing are enabled). + +# cd /sys/kernel/debug/tracing/ +# echo 1 > events/power/pstate_sample/enable +# echo 1 > events/power/cpu_frequency/enable +# cat trace +gnome-terminal--4510 [001] ..s. 1177.680733: pstate_sample: core_busy=107 + scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 + freq=2474476 +cat-5235 [002] ..s. 1177.681723: cpu_frequency: state=2900000 cpu_id=2 + + +Using ftrace + +If function level tracing is required, the Linux ftrace interface can be used. +For example if we want to check how often a function to set a P-State is +called, we can set ftrace filter to intel_pstate_set_pstate. + +# cd /sys/kernel/debug/tracing/ +# cat available_filter_functions | grep -i pstate +intel_pstate_set_pstate +intel_pstate_cpu_init +... + +# echo intel_pstate_set_pstate > set_ftrace_filter +# echo function > current_tracer +# cat trace | head -15 +# tracer: function +# +# entries-in-buffer/entries-written: 80/80 #P:4 +# +# _-----=> irqs-off +# / _----=> need-resched +# | / _---=> hardirq/softirq +# || / _--=> preempt-depth +# ||| / delay +# TASK-PID CPU# |||| TIMESTAMP FUNCTION +# | | | |||| | | + Xorg-3129 [000] ..s. 2537.644844: intel_pstate_set_pstate <-intel_pstate_timer_func + gnome-terminal--4510 [002] ..s. 2537.649844: intel_pstate_set_pstate <-intel_pstate_timer_func + gnome-shell-3409 [001] ..s. 2537.650850: intel_pstate_set_pstate <-intel_pstate_timer_func + <idle>-0 [000] ..s. 2537.654843: intel_pstate_set_pstate <-intel_pstate_timer_func diff --git a/Documentation/cpu-freq/pcc-cpufreq.txt b/Documentation/cpu-freq/pcc-cpufreq.txt index 9e3c3b33514c..0a94224ad296 100644 --- a/Documentation/cpu-freq/pcc-cpufreq.txt +++ b/Documentation/cpu-freq/pcc-cpufreq.txt @@ -159,8 +159,8 @@ to be strictly associated with a P-state. 2.2 cpuinfo_transition_latency: ------------------------------- -The cpuinfo_transition_latency field is 0. The PCC specification does -not include a field to expose this value currently. +The cpuinfo_transition_latency field is CPUFREQ_ETERNAL. The PCC specification +does not include a field to expose this value currently. 2.3 cpuinfo_cur_freq: --------------------- diff --git a/Documentation/devicetree/bindings/arm/cpus.txt b/Documentation/devicetree/bindings/arm/cpus.txt index 3a07a87fef20..6aca64f289b6 100644 --- a/Documentation/devicetree/bindings/arm/cpus.txt +++ b/Documentation/devicetree/bindings/arm/cpus.txt @@ -242,6 +242,23 @@ nodes to be present and contain the properties described below. Definition: Specifies the syscon node controlling the cpu core power domains. + - dynamic-power-coefficient + Usage: optional + Value type: <prop-encoded-array> + Definition: A u32 value that represents the running time dynamic + power coefficient in units of mW/MHz/uVolt^2. The + coefficient can either be calculated from power + measurements or derived by analysis. + + The dynamic power consumption of the CPU is + proportional to the square of the Voltage (V) and + the clock frequency (f). The coefficient is used to + calculate the dynamic power as below - + + Pdyn = dynamic-power-coefficient * V^2 * f + + where voltage is in uV, frequency is in MHz. + Example 1 (dual-cluster big.LITTLE system 32-bit): cpus { diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt b/Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt new file mode 100644 index 000000000000..d91a02a3b6b0 --- /dev/null +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt @@ -0,0 +1,91 @@ +Binding for ST's CPUFreq driver +=============================== + +ST's CPUFreq driver attempts to read 'process' and 'version' attributes +from the SoC, then supplies the OPP framework with 'prop' and 'supported +hardware' information respectively. The framework is then able to read +the DT and operate in the usual way. + +For more information about the expected DT format [See: ../opp/opp.txt]. + +Frequency Scaling only +---------------------- + +No vendor specific driver required for this. + +Located in CPU's node: + +- operating-points : [See: ../power/opp.txt] + +Example [safe] +-------------- + +cpus { + cpu@0 { + /* kHz uV */ + operating-points = <1500000 0 + 1200000 0 + 800000 0 + 500000 0>; + }; +}; + +Dynamic Voltage and Frequency Scaling (DVFS) +-------------------------------------------- + +This requires the ST CPUFreq driver to supply 'process' and 'version' info. + +Located in CPU's node: + +- operating-points-v2 : [See ../power/opp.txt] + +Example [unsafe] +---------------- + +cpus { + cpu@0 { + operating-points-v2 = <&cpu0_opp_table>; + }; +}; + +cpu0_opp_table: opp_table { + compatible = "operating-points-v2"; + + /* ############################################################### */ + /* # WARNING: Do not attempt to copy/replicate these nodes, # */ + /* # they are only to be supplied by the bootloader !!! # */ + /* ############################################################### */ + opp0 { + /* Major Minor Substrate */ + /* 2 all all */ + opp-supported-hw = <0x00000004 0xffffffff 0xffffffff>; + opp-hz = /bits/ 64 <1500000000>; + clock-latency-ns = <10000000>; + + opp-microvolt-pcode0 = <1200000>; + opp-microvolt-pcode1 = <1200000>; + opp-microvolt-pcode2 = <1200000>; + opp-microvolt-pcode3 = <1200000>; + opp-microvolt-pcode4 = <1170000>; + opp-microvolt-pcode5 = <1140000>; + opp-microvolt-pcode6 = <1100000>; + opp-microvolt-pcode7 = <1070000>; + }; + + opp1 { + /* Major Minor Substrate */ + /* all all all */ + opp-supported-hw = <0xffffffff 0xffffffff 0xffffffff>; + opp-hz = /bits/ 64 <1200000000>; + clock-latency-ns = <10000000>; + + opp-microvolt-pcode0 = <1110000>; + opp-microvolt-pcode1 = <1150000>; + opp-microvolt-pcode2 = <1100000>; + opp-microvolt-pcode3 = <1080000>; + opp-microvolt-pcode4 = <1040000>; + opp-microvolt-pcode5 = <1020000>; + opp-microvolt-pcode6 = <980000>; + opp-microvolt-pcode7 = <930000>; + }; +}; diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt index 0cb44dc21f97..601256fe8c0d 100644 --- a/Documentation/devicetree/bindings/opp/opp.txt +++ b/Documentation/devicetree/bindings/opp/opp.txt @@ -45,21 +45,10 @@ Devices supporting OPPs must set their "operating-points-v2" property with phandle to a OPP table in their DT node. The OPP core will use this phandle to find the operating points for the device. -Devices may want to choose OPP tables at runtime and so can provide a list of -phandles here. But only *one* of them should be chosen at runtime. This must be -accompanied by a corresponding "operating-points-names" property, to uniquely -identify the OPP tables. - If required, this can be extended for SoC vendor specfic bindings. Such bindings should be documented as Documentation/devicetree/bindings/power/<vendor>-opp.txt and should have a compatible description like: "operating-points-v2-<vendor>". -Optional properties: -- operating-points-names: Names of OPP tables (required if multiple OPP - tables are present), to uniquely identify them. The same list must be present - for all the CPUs which are sharing clock/voltage rails and hence the OPP - tables. - * OPP Table Node This describes the OPPs belonging to a device. This node can have following @@ -100,6 +89,14 @@ Optional properties: Entries for multiple regulators must be present in the same order as regulators are specified in device's DT node. +- opp-microvolt-<name>: Named opp-microvolt property. This is exactly similar to + the above opp-microvolt property, but allows multiple voltage ranges to be + provided for the same OPP. At runtime, the platform can pick a <name> and + matching opp-microvolt-<name> property will be enabled for all OPPs. If the + platform doesn't pick a specific <name> or the <name> doesn't match with any + opp-microvolt-<name> properties, then opp-microvolt property shall be used, if + present. + - opp-microamp: The maximum current drawn by the device in microamperes considering system specific parameters (such as transients, process, aging, maximum operating temperature range etc.) as necessary. This may be used to @@ -112,6 +109,9 @@ Optional properties: for few regulators, then this should be marked as zero for them. If it isn't required for any regulator, then this property need not be present. +- opp-microamp-<name>: Named opp-microamp property. Similar to + opp-microvolt-<name> property, but for microamp instead. + - clock-latency-ns: Specifies the maximum possible transition latency (in nanoseconds) for switching to this OPP from any other OPP. @@ -123,6 +123,26 @@ Optional properties: - opp-suspend: Marks the OPP to be used during device suspend. Only one OPP in the table should have this. +- opp-supported-hw: This enables us to select only a subset of OPPs from the + larger OPP table, based on what version of the hardware we are running on. We + still can't have multiple nodes with the same opp-hz value in OPP table. + + It's an user defined array containing a hierarchy of hardware version numbers, + supported by the OPP. For example: a platform with hierarchy of three levels + of versions (A, B and C), this field should be like <X Y Z>, where X + corresponds to Version hierarchy A, Y corresponds to version hierarchy B and Z + corresponds to version hierarchy C. + + Each level of hierarchy is represented by a 32 bit value, and so there can be + only 32 different supported version per hierarchy. i.e. 1 bit per version. A + value of 0xFFFFFFFF will enable the OPP for all versions for that hierarchy + level. And a value of 0x00000000 will disable the OPP completely, and so we + never want that to happen. + + If 32 values aren't sufficient for a version hierarchy, than that version + hierarchy can be contained in multiple 32 bit values. i.e. <X Y Z1 Z2> in the + above example, Z1 & Z2 refer to the version hierarchy Z. + - status: Marks the node enabled/disabled. Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together. @@ -157,20 +177,20 @@ Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together. compatible = "operating-points-v2"; opp-shared; - opp00 { + opp@1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <970000 975000 985000>; opp-microamp = <70000>; clock-latency-ns = <300000>; opp-suspend; }; - opp01 { + opp@1100000000 { opp-hz = /bits/ 64 <1100000000>; opp-microvolt = <980000 1000000 1010000>; opp-microamp = <80000>; clock-latency-ns = <310000>; }; - opp02 { + opp@1200000000 { opp-hz = /bits/ 64 <1200000000>; opp-microvolt = <1025000>; clock-latency-ns = <290000>; @@ -236,20 +256,20 @@ independently. * independently. */ - opp00 { + opp@1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <970000 975000 985000>; opp-microamp = <70000>; clock-latency-ns = <300000>; opp-suspend; }; - opp01 { + opp@1100000000 { opp-hz = /bits/ 64 <1100000000>; opp-microvolt = <980000 1000000 1010000>; opp-microamp = <80000>; clock-latency-ns = <310000>; }; - opp02 { + opp@1200000000 { opp-hz = /bits/ 64 <1200000000>; opp-microvolt = <1025000>; opp-microamp = <90000; @@ -312,20 +332,20 @@ DVFS state together. compatible = "operating-points-v2"; opp-shared; - opp00 { + opp@1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <970000 975000 985000>; opp-microamp = <70000>; clock-latency-ns = <300000>; opp-suspend; }; - opp01 { + opp@1100000000 { opp-hz = /bits/ 64 <1100000000>; opp-microvolt = <980000 1000000 1010000>; opp-microamp = <80000>; clock-latency-ns = <310000>; }; - opp02 { + opp@1200000000 { opp-hz = /bits/ 64 <1200000000>; opp-microvolt = <1025000>; opp-microamp = <90000>; @@ -338,20 +358,20 @@ DVFS state together. compatible = "operating-points-v2"; opp-shared; - opp10 { + opp@1300000000 { opp-hz = /bits/ 64 <1300000000>; opp-microvolt = <1045000 1050000 1055000>; opp-microamp = <95000>; clock-latency-ns = <400000>; opp-suspend; }; - opp11 { + opp@1400000000 { opp-hz = /bits/ 64 <1400000000>; opp-microvolt = <1075000>; opp-microamp = <100000>; clock-latency-ns = <400000>; }; - opp12 { + opp@1500000000 { opp-hz = /bits/ 64 <1500000000>; opp-microvolt = <1010000 1100000 1110000>; opp-microamp = <95000>; @@ -378,7 +398,7 @@ Example 4: Handling multiple regulators compatible = "operating-points-v2"; opp-shared; - opp00 { + opp@1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <970000>, /* Supply 0 */ <960000>, /* Supply 1 */ @@ -391,7 +411,7 @@ Example 4: Handling multiple regulators /* OR */ - opp00 { + opp@1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <970000 975000 985000>, /* Supply 0 */ <960000 965000 975000>, /* Supply 1 */ @@ -404,7 +424,7 @@ Example 4: Handling multiple regulators /* OR */ - opp00 { + opp@1000000000 { opp-hz = /bits/ 64 <1000000000>; opp-microvolt = <970000 975000 985000>, /* Supply 0 */ <960000 965000 975000>, /* Supply 1 */ @@ -417,7 +437,8 @@ Example 4: Handling multiple regulators }; }; -Example 5: Multiple OPP tables +Example 5: opp-supported-hw +(example: three level hierarchy of versions: cuts, substrate and process) / { cpus { @@ -426,40 +447,73 @@ Example 5: Multiple OPP tables ... cpu-supply = <&cpu_supply> - operating-points-v2 = <&cpu0_opp_table_slow>, <&cpu0_opp_table_fast>; - operating-points-names = "slow", "fast"; + operating-points-v2 = <&cpu0_opp_table_slow>; }; }; - cpu0_opp_table_slow: opp_table_slow { + opp_table { compatible = "operating-points-v2"; status = "okay"; opp-shared; - opp00 { + opp@600000000 { + /* + * Supports all substrate and process versions for 0xF + * cuts, i.e. only first four cuts. + */ + opp-supported-hw = <0xF 0xFFFFFFFF 0xFFFFFFFF> opp-hz = /bits/ 64 <600000000>; + opp-microvolt = <900000 915000 925000>; ... }; - opp01 { + opp@800000000 { + /* + * Supports: + * - cuts: only one, 6th cut (represented by 6th bit). + * - substrate: supports 16 different substrate versions + * - process: supports 9 different process versions + */ + opp-supported-hw = <0x20 0xff0000ff 0x0000f4f0> opp-hz = /bits/ 64 <800000000>; + opp-microvolt = <900000 915000 925000>; ... }; }; +}; + +Example 6: opp-microvolt-<name>, opp-microamp-<name>: +(example: device with two possible microvolt ranges: slow and fast) - cpu0_opp_table_fast: opp_table_fast { +/ { + cpus { + cpu@0 { + compatible = "arm,cortex-a7"; + ... + + operating-points-v2 = <&cpu0_opp_table>; + }; + }; + + cpu0_opp_table: opp_table0 { compatible = "operating-points-v2"; - status = "okay"; opp-shared; - opp10 { + opp@1000000000 { opp-hz = /bits/ 64 <1000000000>; - ... + opp-microvolt-slow = <900000 915000 925000>; + opp-microvolt-fast = <970000 975000 985000>; + opp-microamp-slow = <70000>; + opp-microamp-fast = <71000>; }; - opp11 { - opp-hz = /bits/ 64 <1100000000>; - ... + opp@1200000000 { + opp-hz = /bits/ 64 <1200000000>; + opp-microvolt-slow = <900000 915000 925000>, /* Supply vcc0 */ + <910000 925000 935000>; /* Supply vcc1 */ + opp-microvolt-fast = <970000 975000 985000>, /* Supply vcc0 */ + <960000 965000 975000>; /* Supply vcc1 */ + opp-microamp = <70000>; /* Will be used for both slow/fast */ }; }; }; diff --git a/Documentation/power/pci.txt b/Documentation/power/pci.txt index b0e911e0e8f5..44558882aa60 100644 --- a/Documentation/power/pci.txt +++ b/Documentation/power/pci.txt @@ -999,7 +999,7 @@ from its probe routine to make runtime PM work for the device. It is important to remember that the driver's runtime_suspend() callback may be executed right after the usage counter has been decremented, because -user space may already have cuased the pm_runtime_allow() helper function +user space may already have caused the pm_runtime_allow() helper function unblocking the runtime PM of the device to run via sysfs, so the driver must be prepared to cope with that. diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index 0784bc3a2ab5..7328cf85236c 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt @@ -371,6 +371,12 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h: - increment the device's usage counter, run pm_runtime_resume(dev) and return its result + int pm_runtime_get_if_in_use(struct device *dev); + - return -EINVAL if 'power.disable_depth' is nonzero; otherwise, if the + runtime PM status is RPM_ACTIVE and the runtime PM usage counter is + nonzero, increment the counter and return 1; otherwise return 0 without + changing the counter + void pm_runtime_put_noidle(struct device *dev); - decrement the device's usage counter |