summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* cpufreq: cpufreq-dt: avoid uninitialized variable warnings:Arnd Bergmann2016-01-271-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | gcc warns quite a bit about values returned from allocate_resources() in cpufreq-dt.c: cpufreq-dt.c: In function 'cpufreq_init': cpufreq-dt.c:327:6: error: 'cpu_dev' may be used uninitialized in this function [-Werror=maybe-uninitialized] cpufreq-dt.c:197:17: note: 'cpu_dev' was declared here cpufreq-dt.c:376:2: error: 'cpu_clk' may be used uninitialized in this function [-Werror=maybe-uninitialized] cpufreq-dt.c:199:14: note: 'cpu_clk' was declared here cpufreq-dt.c: In function 'dt_cpufreq_probe': cpufreq-dt.c:461:2: error: 'cpu_clk' may be used uninitialized in this function [-Werror=maybe-uninitialized] cpufreq-dt.c:447:14: note: 'cpu_clk' was declared here The problem is that it's slightly hard for gcc to follow return codes across PTR_ERR() calls. This patch uses explicit assignments to the "ret" variable to make it easier for gcc to verify that the code is actually correct, without the need to add a bogus initialization. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: pxa2xx: fix pxa_cpufreq_change_voltage prototypeArnd Bergmann2016-01-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | There are two definitions of pxa_cpufreq_change_voltage, with slightly different prototypes after one of them had its argument marked 'const'. Now the other one (for !CONFIG_REGULATOR) produces a harmless warning: drivers/cpufreq/pxa2xx-cpufreq.c: In function 'pxa_set_target': drivers/cpufreq/pxa2xx-cpufreq.c:291:36: warning: passing argument 1 of 'pxa_cpufreq_change_voltage' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers] ret = pxa_cpufreq_change_voltage(&pxa_freq_settings[idx]); ^ drivers/cpufreq/pxa2xx-cpufreq.c:205:12: note: expected 'struct pxa_freqs *' but argument is of type 'const struct pxa_freqs *' static int pxa_cpufreq_change_voltage(struct pxa_freqs *pxa_freq) ^ This changes the prototype in the same way as the other, which avoids the warning. Fixes: 03c229906311 (cpufreq: pxa: make pxa_freqs arrays const) Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.2+ <stable@vger.kernel.org> # 4.2+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: Use list_is_last() to check last entry of the policy listGautham R Shenoy2016-01-271-3/+3
| | | | | | | | | | Currently next_policy() explicitly checks if a policy is the last policy in the cpufreq_policy_list. Use the standard list_is_last primitive instead. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: Fix NULL reference crash while accessing policy->governor_dataViresh Kumar2016-01-271-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race discovered by Juri, where we are able to: - create and read a sysfs file before policy->governor_data is being set to a non NULL value. OR - set policy->governor_data to NULL, and reading a file before being destroyed. And so such a crash is reported: Unable to handle kernel NULL pointer dereference at virtual address 0000000c pgd = edfc8000 [0000000c] *pgd=bfc8c835 Internal error: Oops: 17 [#1] SMP ARM Modules linked in: CPU: 4 PID: 1730 Comm: cat Not tainted 4.5.0-rc1+ #463 Hardware name: ARM-Versatile Express task: ee8e8480 ti: ee930000 task.ti: ee930000 PC is at show_ignore_nice_load_gov_pol+0x24/0x34 LR is at show+0x4c/0x60 pc : [<c058f1bc>] lr : [<c058ae88>] psr: a0070013 sp : ee931dd0 ip : ee931de0 fp : ee931ddc r10: ee4bc290 r9 : 00001000 r8 : ef2cb000 r7 : ee4bc200 r6 : ef2cb000 r5 : c0af57b0 r4 : ee4bc2e0 r3 : 00000000 r2 : 00000000 r1 : c0928df4 r0 : ef2cb000 Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: adfc806a DAC: 00000051 Process cat (pid: 1730, stack limit = 0xee930210) Stack: (0xee931dd0 to 0xee932000) 1dc0: ee931dfc ee931de0 c058ae88 c058f1a4 1de0: edce3bc0 c07bfca4 edce3ac0 00001000 ee931e24 ee931e00 c01fcb90 c058ae48 1e00: 00000001 edce3bc0 00000000 00000001 ee931e50 ee8ff480 ee931e34 ee931e28 1e20: c01fb33c c01fcb0c ee931e8c ee931e38 c01a5210 c01fb314 ee931e9c ee931e48 1e40: 00000000 edce3bf0 befe4a00 ee931f78 00000000 00000000 000001e4 00000000 1e60: c00545a8 edce3ac0 00001000 00001000 befe4a00 ee931f78 00000000 00001000 1e80: ee931ed4 ee931e90 c01fbed8 c01a5038 ed085a58 00020000 00000000 00000000 1ea0: c0ad72e4 ee931f78 ee8ff488 ee8ff480 c077f3fc 00001000 befe4a00 ee931f78 1ec0: 00000000 00001000 ee931f44 ee931ed8 c017c328 c01fbdc4 00001000 00000000 1ee0: ee8ff480 00001000 ee931f44 ee931ef8 c017c65c c03deb10 ee931fac ee931f08 1f00: c0009270 c001f290 c0a8d968 ef2cb000 ef2cb000 ee8ff480 00000020 ee8ff480 1f20: ee8ff480 befe4a00 00001000 ee931f78 00000000 00000000 ee931f74 ee931f48 1f40: c017d1ec c017c2f8 c019c724 c019c684 ee8ff480 ee8ff480 00001000 befe4a00 1f60: 00000000 00000000 ee931fa4 ee931f78 c017d2a8 c017d160 00000000 00000000 1f80: 000a9f20 00001000 befe4a00 00000003 c000ffe4 ee930000 00000000 ee931fa8 1fa0: c000fe40 c017d264 000a9f20 00001000 00000003 befe4a00 00001000 00000000 Unable to handle kernel NULL pointer dereference at virtual address 0000000c 1fc0: 000a9f20 00001000 befe4a00 00000003 00000000 00000000 00000003 00000001 pgd = edfc4000 [0000000c] *pgd=bfcac835 1fe0: 00000000 befe49dc 000197f8 b6e35dfc 60070010 00000003 3065b49d 134ac2c9 [<c058f1bc>] (show_ignore_nice_load_gov_pol) from [<c058ae88>] (show+0x4c/0x60) [<c058ae88>] (show) from [<c01fcb90>] (sysfs_kf_seq_show+0x90/0xfc) [<c01fcb90>] (sysfs_kf_seq_show) from [<c01fb33c>] (kernfs_seq_show+0x34/0x38) [<c01fb33c>] (kernfs_seq_show) from [<c01a5210>] (seq_read+0x1e4/0x4e4) [<c01a5210>] (seq_read) from [<c01fbed8>] (kernfs_fop_read+0x120/0x1a0) [<c01fbed8>] (kernfs_fop_read) from [<c017c328>] (__vfs_read+0x3c/0xe0) [<c017c328>] (__vfs_read) from [<c017d1ec>] (vfs_read+0x98/0x104) [<c017d1ec>] (vfs_read) from [<c017d2a8>] (SyS_read+0x50/0x90) [<c017d2a8>] (SyS_read) from [<c000fe40>] (ret_fast_syscall+0x0/0x1c) Code: e5903044 e1a00001 e3081df4 e34c1092 (e593300c) ---[ end trace 5994b9a5111f35ee ]--- Fix that by making sure, policy->governor_data is updated at the right places only. Cc: 4.2+ <stable@vger.kernel.org> # 4.2+ Reported-and-tested-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* Documentation: cpufreq: intel_pstate: enhance documentationSrinivas Pandruvada2016-01-051-42/+199
| | | | | | | | | This is an attempt to make documentation more user friendly. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Doug Smythies <dsmythies@telus.net> Reviewed-by: Chen, Yu C <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq-dt: fix handling regulator_get_voltage() resultAndrzej Hajda2016-01-051-2/+3
| | | | | | | | | | | | | The function can return negative values so it should be assigned to signed type. The problem has been detected using proposed semantic patch scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci. Link: http://permalink.gmane.org/gmane.linux.kernel/2038576 Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: governor: Fix negative idle_time when configured with ↵Chen Yu2016-01-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_HZ_PERIODIC It is reported that, with CONFIG_HZ_PERIODIC=y cpu stays at the lowest frequency even if the usage goes to 100%, neither ondemand nor conservative governor works, however performance and userspace work as expected. If set with CONFIG_NO_HZ_FULL=y, everything goes well. This problem is caused by improper calculation of the idle_time when the load is extremely high(near 100%). Firstly, cpufreq_governor uses get_cpu_idle_time to get the total idle time for specific cpu, then: 1.If the system is configured with CONFIG_NO_HZ_FULL, the idle time is returned by ktime_get, which is always increasing, it's OK. 2.However, if the system is configured with CONFIG_HZ_PERIODIC, get_cpu_idle_time might not guarantee to be always increasing, because it will leverage get_cpu_idle_time_jiffy to calculate the idle_time, consider the following scenario: At T1: idle_tick_1 = total_tick_1 - user_tick_1 sample period(80ms)... At T2: ( T2 = T1 + 80ms): idle_tick_2 = total_tick_2 - user_tick_2 Currently the algorithm is using (idle_tick_2 - idle_tick_1) to get the delta idle_time during the past sample period, however it CAN NOT guarantee that idle_tick_2 >= idle_tick_1, especially when cpu load is high. (Yes, total_tick_2 >= total_tick_1, and user_tick_2 >= user_tick_1, but how about idle_tick_2 and idle_tick_1? No guarantee.) So governor might get a negative value of idle_time during the past sample period, which might mislead the system that the idle time is very big(converted to unsigned int), and the busy time is nearly zero, which causes the governor to always choose the lowest cpufreq, then cause this problem. In theory there are two solutions: 1.The logic should not rely on the idle tick during every sample period, but be based on the busy tick directly, as this is how 'top' is implemented. 2.Or the logic must make sure that the idle_time is strictly increasing during each sample period, then there would be no negative idle_time anymore. This solution requires minimum modification to current code and this patch uses method 2. Link: https://bugzilla.kernel.org/show_bug.cgi?id=69821 Reported-by: Jan Fikar <j.fikar@gmail.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpufreq: mt8173: migrate to use operating-points-v2 bindingsPi-Cheng Chen2016-01-021-6/+11
| | | | | | | | | Modify mt8173-cpufreq driver to get OPP-sharing information and set up OPP table provided by operating-points-v2 bindings. Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* Merge branch 'pm-opp' into pm-cpufreqRafael J. Wysocki2016-01-022-3/+4
|\
| * PM / OPP: Set cpu_dev->id in cpumask firstPi-Cheng Chen2016-01-021-1/+2
| | | | | | | | | | | | | | | | | | | | Set cpu_dev->id in cpumask first when setting up cpumask for CPUs that share the same OPP table. This might be helpful when handling cpumask without the original CPU bitfield set. Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * PM / OPP: Fix parsing of opp-microvolt and opp-microamp propertiesBartlomiej Zolnierkiewicz2015-12-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Commit 01fb4d3c39d3 ("PM / OPP: Parse 'opp-<prop>-<name>' bindings") broke support for parsing standard opp-microvolt and opp-microamp properties. Fix it by setting 'name' string to proper value for !prop cases. Fixes: 01fb4d3c39d3 ("PM / OPP: Parse 'opp-<prop>-<name> 'bindings") Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | cpufreq: Simplify core code related to boost supportRafael J. Wysocki2016-01-013-19/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Notice that the boost_supported field in struct cpufreq_driver is redundant, because the driver's ->set_boost callback may be left unset if "boost" is not supported. Moreover, the only driver populating the ->set_boost callback is acpi_cpufreq, so make it avoid populating that callback if "boost" is not supported, rework the core to check ->set_boost instead of boost_supported to verify "boost" support and drop boost_supported which isn't used any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* | cpufreq: acpi-cpufreq: Simplify boost-related codeRafael J. Wysocki2016-01-011-13/+8
| | | | | | | | | | | | | | | | | | | | The store_boost() routine is only used by store_cpb(), so move the code from it directly to that function and rename _store_boost() to set_boost() to make its name reflect the name of the driver callback pointing to it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* | cpufreq: Make cpufreq_boost_supported() staticRafael J. Wysocki2016-01-012-11/+2
| | | | | | | | | | | | | | | | | | | | cpufreq_boost_supported() is not used outside of cpufreq.c, so make it static. While at it, refactor it as a one-liner (which it really is). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
* | blackfin-cpufreq: Mark cpu_set_cclk() as staticMarkus Elfring2015-12-281-1/+1
| | | | | | | | | | | | | | | | | | The cpu_set_cclk() function was only used in a single source file so far. Indicate this setting also by the corresponding linkage specifier. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | blackfin-cpufreq: Change return type of cpu_set_cclk() to that of clk_set_rate()Markus Elfring2015-12-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The return type "unsigned long" was used by the cpu_set_cclk() function while the type "int" is provided by the clk_set_rate() function. Let us make this usage consistent. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | Merge back earlier cpufreq material for v4.5.Rafael J. Wysocki2015-12-2823-208/+1501
|\ \
| * \ Merge back earlier cpufreq material for v4.5.Rafael J. Wysocki2015-12-2123-208/+1501
| |\ \
| | * | dt: cpufreq: st: Provide bindings for ST's CPUFreq implementationLee Jones2015-12-121-0/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Acked-by: Rob Herring <robh@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: st: Provide runtime initialised driver for ST's platformsLee Jones2015-12-123-0/+305
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bootloader is charged with the responsibility to provide platform specific Dynamic Voltage and Frequency Scaling (DVFS) information via Device Tree. This driver takes the supplied configuration and registers it with the new generic OPP framework, to then be used with CPUFreq. Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | Merge branch 'pm-opp' into pm-cpufreqRafael J. Wysocki2015-12-127-71/+718
| | |\|
| | | * PM / OPP: Parse 'opp-<prop>-<name>' bindingsViresh Kumar2015-12-103-15/+161
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OPP bindings (for few properties) allow a platform to choose a value/range among a set of available options. The options are present as opp-<prop>-<name>, where the platform needs to supply the <name> string. The OPP properties which allow such an option are: opp-microvolt and opp-microamp. Add support to the OPP-core to parse these bindings, by introducing dev_pm_opp_{set|put}_prop_name() APIs. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * PM / OPP: Parse 'opp-supported-hw' bindingViresh Kumar2015-12-103-0/+166
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OPP bindings allow a platform to enable OPPs based on the version of the hardware they are used for. Add support to the OPP-core to parse these bindings, by introducing dev_pm_opp_{set|put}_supported_hw() APIs. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * PM / OPP: Add missing doc commentsViresh Kumar2015-11-231-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Few doc-style comments were missing, add them. Rearrange another one to match the sequence within the structure. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Pavel Machek <pavel@ucw.cz> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * ARM: dts: exynos4412: Rename OPP nodes as opp@<opp-hz>Viresh Kumar2015-11-231-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OPP bindings got updated to name OPP nodes this way, make changes according to that. Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * PM / OPP: Rename OPP nodes as opp@<opp-hz>Viresh Kumar2015-11-231-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It would be better to name OPP nodes as opp@<opp-hz> as that will ensure that multiple DT nodes don't contain the same frequency. Of course we expect the writer to name the node with its opp-hz frequency and not any other frequency. And that will let the compile error out if multiple nodes are using the same opp-hz frequency. Suggested-by: Stephen Boyd <sboyd@codeaurora.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * PM / OPP: Remove 'operating-points-names' bindingViresh Kumar2015-11-231-60/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These aren't used until now by any DT files and wouldn't be used now as we have a better scheme in place now, i.e. opp-property-<name> properties. Remove the (useless) binding without breaking ABI. Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * PM / OPP: Add {opp-microvolt|opp-microamp}-<name> bindingViresh Kumar2015-11-231-0/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Depending on the version of hardware or its properties, which are only known at runtime, various properties of the OPP can change. For example, an OPP with frequency 1.2 GHz, may have different voltage/current requirements based on the version of the hardware it is running on. In order to not replicate the same OPP tables for varying values of all such fields, this commit introduces the concept of opp-property-<name>. The <name> can be chosen by the platform at runtime, and OPPs will be initialized depending on that name string. Currently support is extended for the following properties: - opp-microvolt-<name> - opp-microamp-<name> If the name string isn't provided by the platform, or if it is provided but doesn't match the properties present in the OPP node, we will fall back to the original properties without the -<name> string, if they are available. Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * PM / OPP: Add "opp-supported-hw" bindingViresh Kumar2015-11-231-0/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We may want to enable only a subset of OPPs, from the bigger list of OPPs, based on what version of the hardware we are running on. This would enable us to not duplicate OPP tables for every version of the hardware we support. To enable that, this patch defines a new property 'opp-supported-hw'. It can support any number of hierarchy levels of the versions the hardware follows. And based on the selected hardware versions, we can pick only the relevant OPPs at runtime. Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * PM / OPP: Add debugfs supportViresh Kumar2015-11-234-2/+281
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds debugfs support to OPP layer to export OPPs and their properties for all the devices. This creates a top level directory: /sys/kernel/debug/opp and then device specific directories (based on device names) inside it. For example: 'cpu0', 'cpu1', etc.. If multiple devices share the OPP table, then the real directory is created only for the first device. For all others, links are created to the real directory. Inside the device specific directory, a separate directory is created for each OPP. And within that files per opp property. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: mt8173: Move resources allocation into ->probe()Pi-Cheng Chen2015-12-121-24/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the return value of ->init() of cpufreq driver is not propagated to the device driver model now, move resources allocation into ->probe() to handle -EPROBE_DEFER properly. Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: intel_pstate: Account for IO wait timePhilippe Longepe2015-12-101-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In cases where we have many IOs, the global load becomes low and the load algorithm will decrease the requested P-State. Because of that, the IOs overheads will increase and impact the IO performances. To improve IO bound work, we can count the io-wait time as busy time in calculating CPU busy. This change uses get_cpu_iowait_time_us() to obtain the IO wait time value and converts time into number of cycles spent waiting on IO at the TSC rate. At the moment, this trick is only used for Atom. Signed-off-by: Philippe Longepe <philippe.longepe@intel.com> Signed-off-by: Stephane Gasparini <stephane.gasparini@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: intel_pstate: Account for non C0 timePhilippe Longepe2015-12-101-5/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current function to calculate cpu utilization uses the average P-state ratio (APerf/Mperf) scaled by the ratio of the current P-state to the max available non-turbo one. This leads to an overestimation of utilization which causes higher-performance P-states to be selected more often and that leads to increased energy consumption. This is a problem for low-power systems, so it is better to use a different utilization calculation algorithm for them. Namely, the Percent Busy value (or load) can be estimated as the ratio of the MPERF counter that runs at a constant rate only during active periods (C0) to the time stamp counter (TSC) that also runs (at the same rate) during idle. That is: Percent Busy = 100 * (delta_mperf / delta_tsc) Use this algorithm for platforms with SoCs based on the Airmont and Silvermont Atom cores. Signed-off-by: Philippe Longepe <philippe.longepe@intel.com> Signed-off-by: Stephane Gasparini <stephane.gasparini@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: intel_pstate: Configurable algorithm to get target pstatePhilippe Longepe2015-12-101-13/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Target systems using different cpus have different power and performance requirements. They may use different algorithms to get the next P-state based on their power or performance preference. For example, power-constrained systems may not want to use high-performance P-states as aggressively as a full-size desktop or a server platform. A server platform may want to run close to the max to achieve better performance, while laptop-like systems may prefer sacrificing performance for longer battery lifes. For the above reasons, modify intel_pstate to allow the target P-state selection algorithm to be depend on the CPU ID. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Philippe Longepe <philippe.longepe@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: mt8173: check return value of regulator_get_voltage() callPi-Cheng Chen2015-12-101-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sometimes regulator_get_voltage() call returns negative values for reasons(e.g. underlying I2C bus timeout). Add check for the return values and fail out early. Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: mt8173: remove redundant regulator_get_voltage() callPi-Cheng Chen2015-12-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove redundant regulator_get_voltage() call to get Vsram value since it will be obtained later at the beginning of voltage tracking loop. Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: mt8173: add CPUFREQ_HAVE_GOVERNOR_PER_POLICY flagPi-Cheng Chen2015-12-101-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add CPUFREQ_HAVE_GOVERNOR_PER_POLICY to have individual set of tunables for each cluster of MT8173. Signed-off-by: Pi-Cheng Chen <pi-cheng.chen@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: qoriq: Register cooling device based on device treeHongtao Jia2015-12-101-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Register the qoriq cpufreq driver as a cooling device, based on the thermal device tree framework. When temperature crosses the passive trip point cpufreq is used to throttle CPUs. Signed-off-by: Jia Hongtao <hongtao.jia@freescale.com> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: pcc-cpufreq: update default value of cpuinfo_transition_latencyJacob Tanenbaum2015-12-102-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cpufreq documentation specifies policy->cpuinfo.transition_latency the time it takes on this CPU to switch between two frequencies in nanoseconds (if appropriate, else specify CPUFREQ_ETERNAL) currently pcc-cpufreq does not expose the value and sets it to zero. I changed the pcc-cpufreq driver and it's documentation to conform to the default value specified in Documentation/cpu-freq/cpu-drivers.txt Signed-off-by: Jacob Tanenbaum <jtanenba@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: arm_big_little: Add support to register a cpufreq cooling devicePunit Agrawal2015-12-102-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Register passive cooling devices when initialising cpufreq on big.LITTLE systems. If the device tree provides a dynamic power coefficient for the CPUs then the bound cooling device will support the extensions that allow it to be used with all the existing thermal governors including the power allocator governor. A cooling device will be created per individual frequency domain and can be bound to thermal zones via the thermal DT bindings. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq-dt: Supply power coefficient when registering cooling devicesPunit Agrawal2015-12-101-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support registering cooling devices with dynamic power coefficient where provided by the device tree. This allows OF registered cooling devices driver to be used with the power_allocator thermal governor. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | devicetree: bindings: Add optional dynamic-power-coefficient propertyPunit Agrawal2015-12-101-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dynamic power consumption of a device is proportional to the square of voltage (V) and the clock frequency (f). It can be expressed as Pdyn = dynamic-power-coefficient * V^2 * f. The coefficient represents the running time dynamic power consumption in units of mw/MHz/uVolt^2 and can be used in the above formula to calculate the dynamic power in mW. Signed-off-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: governor: Use lockless timer functionRafael J. Wysocki2015-12-092-34/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible to get rid of the timer_lock spinlock used by the governor timer function for synchronization, but a couple of races need to be avoided. The first race is between multiple dbs_timer_handler() instances that may be running in parallel with each other on different CPUs. Namely, one of them has to queue up the work item, but it cannot be queued up more than once. To achieve that, atomic_inc_return() can be used on the skip_work field of struct cpu_common_dbs_info. The second race is between an already running dbs_timer_handler() and gov_cancel_work(). In that case the dbs_timer_handler() might not notice the skip_work incrementation in gov_cancel_work() and it might queue up its work item after gov_cancel_work() had returned (and that work item would corrupt skip_work going forward). To prevent that from happening, gov_cancel_work() can be made wait for the timer function to complete (on all CPUs) right after skip_work has been incremented. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
| | * | cpufreq: ondemand: update update_sampling_rate() to make it more efficientViresh Kumar2015-12-091-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently update_sampling_rate() runs over each online CPU and cancels/queues timers on all policy->cpus every time. This should be done just once for any cpu belonging to a policy. Create a cpumask and keep on clearing it as and when we process policies, so that we don't have to traverse through all CPUs of the same policy. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: governor: replace per-CPU delayed work with timersViresh Kumar2015-12-093-70/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpufreq governors evaluate load at sampling rate and based on that they update frequency for a group of CPUs belonging to the same cpufreq policy. This is required to be done in a single thread for all policy->cpus, but because we don't want to wakeup idle CPUs to do just that, we use deferrable work for this. If we would have used a single delayed deferrable work for the entire policy, there were chances that the CPU required to run the handler can be in idle and we might end up not changing the frequency for the entire group with load variations. And so we were forced to keep per-cpu works, and only the one that expires first need to do the real work and others are rescheduled for next sampling time. We have been using the more complex solution until now, where we used a delayed deferrable work for this, which is a combination of a timer and a work. This could be made lightweight by keeping per-cpu deferred timers with a single work item, which is scheduled by the first timer that expires. This patch does just that and here are important changes: - The timer handler will run in irq context and so we need to use a spin_lock instead of the timer_mutex. And so a separate timer_lock is created. This also makes the use of the mutex and lock quite clear, as we know what exactly they are protecting. - A new field 'skip_work' is added to track when the timer handlers can queue a work. More comments present in code. Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: governor: initialize/destroy timer_mutex with 'shared'Viresh Kumar2015-12-071-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | timer_mutex is required to be initialized only while memory for 'shared' is allocated and in a similar way it is required to be destroyed only when memory for 'shared' is freed. There is no need to do the same every time we start/stop the governor. Move code to initialize/destroy timer_mutex to the relevant places. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: governor: Pass policy as argument to ->gov_dbs_timer()Viresh Kumar2015-12-074-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass 'policy' as argument to ->gov_dbs_timer() instead of cdbs and dbs_data. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: ondemand: Work is guaranteed to be pendingViresh Kumar2015-12-071-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are guaranteed to have works scheduled for policy->cpus, as the policy isn't stopped yet. And so there is no need to check that again. Drop it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | cpufreq: ondemand: Update sampling rate only for concerned policiesViresh Kumar2015-12-071-7/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are comparing policy->governor against cpufreq_gov_ondemand to make sure that we update sampling rate only for the concerned CPUs. But that isn't enough. In case of governor_per_policy, there can be multiple instances of ondemand governor and we will always end up updating all of them with current code. What we rather need to do, is to compare dbs_data with poilcy->governor_data, which will match only for the policies governed by dbs_data. This code is also racy as the governor might be getting stopped at that time and we may end up scheduling work for a policy, which we have just disabled. Fix that by protecting the entire function with &od_dbs_cdata.mutex, which will prevent against races with policy START/STOP/etc. After these locks are in place, we can safely get the policy via per-cpu dbs_info. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | cpufreq: scpi-cpufreq: signedness bug in scpi_get_dvfs_info()Dan Carpenter2015-12-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "domain" variable needs to be signed for the error handling to work. Fixes: 8def31034d03 (cpufreq: arm_big_little: add SCPI interface driver) Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>