path: root/drivers
Commit message  Author  Age  Files  Lines
* powercap: intel_rapl: Use topology interface in rapl_init_domains()  Yunfeng Ye  2021-02-12  1  -1/+1

    It's not a good idea to access the phys_proc_id of cpuinfo directly.
    Use topology_physical_package_id(cpu) instead.

    Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
    [ rjw: Changelog edits ]
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* powercap: intel_rapl: Use topology interface in rapl_add_package()  Yunfeng Ye  2021-02-12  1  -3/+3

    It's not a good idea to access phys_proc_id and cpu_die_id directly.
    Use topology_physical_package_id(cpu) and topology_die_id(cpu) instead.

    Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
    [ rjw: Changelog edits ]
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* powercap/intel_rapl: add support for AlderLake Mobile  Zhang Rui  2021-01-27  1  -0/+1

    Add intel_rapl support for the AlderLake Mobile platform.

    Signed-off-by: Zhang Rui <rui.zhang@intel.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* powercap/drivers/dtpm: Fix size of object being allocated  Colin Ian King  2021-01-07  1  -1/+1

    The kzalloc allocation for dtpm_cpu is currently allocating the size of
    the pointer and not the size of the structure. Fix this by using the
    correct sizeof argument.

    Addresses-Coverity: ("Wrong sizeof argument")
    Fixes: 0e8f68d7f048 ("powercap/drivers/dtpm: Add CPU energy model based support")
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
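The pointer-versus-structure sizeof mistake described in this commit can be illustrated with a small userspace sketch. The structure `dtpm_cpu_example` and all helper names are hypothetical stand-ins (the real driver structure is not shown in this log), and `calloc()` stands in for `kzalloc()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-CPU structure. */
struct dtpm_cpu_example {
	unsigned long long power_min;
	unsigned long long power_max;
	int cpu;
};

/* Buggy form: sizeof(p) is the size of a pointer, not of the structure. */
static size_t buggy_alloc_size(void)
{
	struct dtpm_cpu_example *p = NULL;

	return sizeof(p);
}

/* Fixed form: sizeof(*p) always tracks the pointed-to type. */
static size_t fixed_alloc_size(void)
{
	struct dtpm_cpu_example *p = NULL;

	return sizeof(*p);
}

/* Zeroed allocation using the safe idiom (calloc stands in for kzalloc). */
static struct dtpm_cpu_example *alloc_example(void)
{
	struct dtpm_cpu_example *p = calloc(1, sizeof(*p));

	return p;
}
```

Writing `sizeof(*p)` rather than naming the type keeps the allocation size correct even if the type of `p` changes later.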
* powercap/drivers/dtpm: Fix an IS_ERR() vs NULL check  Dan Carpenter  2021-01-07  1  -2/+2

    The powercap_register_control_type() function never returns NULL, it
    returns error pointers on error so update this check.

    Fixes: a20d0ef97abf ("powercap/drivers/dtpm: Add API for dynamic thermal power management")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
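Why a NULL check misses an error pointer can be shown with a userspace sketch of the error-pointer convention. The `ex_*` helpers below are simplified re-implementations written for illustration, not the kernel's own `ERR_PTR()`/`IS_ERR()` definitions, and `ex_register()` is a hypothetical function that, like the one in the commit, never returns NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ex_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno value carried by an error pointer. */
static inline long ex_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True when the pointer lies in the top EX_MAX_ERRNO addresses. */
static inline int ex_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-EX_MAX_ERRNO;
}

/* Hypothetical registration: reports failure as an error pointer, never NULL. */
static void *ex_register(int fail)
{
	static int object;

	return fail ? ex_err_ptr(-ENOMEM) : (void *)&object;
}

/* Caller using the correct check; an "if (!ct)" test would never fire. */
static int ex_init(int fail)
{
	void *ct = ex_register(fail);

	if (ex_is_err(ct))
		return (int)ex_ptr_err(ct);
	return 0;
}
```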
* powercap/drivers/dtpm: Fix some missing unlock bugs  Dan Carpenter  2021-01-07  1  -5/+12

    We need to unlock on these paths before returning.

    Fixes: a20d0ef97abf ("powercap/drivers/dtpm: Add API for dynamic thermal power management")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
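A common way to avoid this class of bug is to route every exit through a single unlock label. The sketch below assumes a toy lock (a depth counter) in place of a real mutex; all names are hypothetical and the actual patch is not reproduced here.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock: a depth counter that must be back at 0 after every call. */
static int ex_lock_depth;

static void ex_lock(void)
{
	ex_lock_depth++;
}

static void ex_unlock(void)
{
	ex_lock_depth--;
}

/* Every return path, including the error path, goes through out_unlock. */
static int ex_update(int value)
{
	int ret = 0;

	ex_lock();

	if (value < 0) {
		ret = -EINVAL;
		goto out_unlock;	/* a bare "return -EINVAL" here would leak the lock */
	}

	/* ... work done while holding the lock ... */

out_unlock:
	ex_unlock();
	return ret;
}
```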
* powercap/drivers/dtpm: Fix a double shift bug  Dan Carpenter  2021-01-07  1  -1/+1

    The DTPM_POWER_LIMIT_FLAG is used for test_bit() etc., which take a bit
    number, so it should be bit 0. But currently it's set to BIT(0), which
    is then shifted a second time, making it equivalent to BIT(BIT(0)).
    This doesn't cause a run time problem because it's done consistently.

    Fixes: a20d0ef97abf ("powercap/drivers/dtpm: Add API for dynamic thermal power management")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
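The double shift can be reproduced in userspace with simplified stand-ins for the bit helpers. The `ex_*` functions below operate on a plain word rather than on a bitmap pointer as the kernel helpers do, and the two flag macros are hypothetical names for the old and fixed definitions.

```c
#include <assert.h>

#define EX_BIT(nr) (1UL << (nr))

/* Simplified stand-in for test_bit(): takes a bit NUMBER, not a mask. */
static int ex_test_bit(unsigned int nr, unsigned long word)
{
	return (word & EX_BIT(nr)) != 0;
}

/* Simplified stand-in for set_bit(): also takes a bit number. */
static unsigned long ex_set_bit(unsigned int nr, unsigned long word)
{
	return word | EX_BIT(nr);
}

#define EX_FLAG_AS_MASK   EX_BIT(0)	/* old style: value 1, shifted again by the helpers */
#define EX_FLAG_AS_NUMBER 0		/* fixed style: bit number 0 */
```

With the mask definition the helpers end up touching bit 1 instead of bit 0. Because setting and testing use the same wrong bit, the flag still round-trips, which matches the commit's remark that there is no run time problem.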
* powercap/drivers/dtpm: Fix __udivdi3 and __aeabi_uldivmod unresolved symbols  Daniel Lezcano  2020-12-30  1  -3/+3

    32-bit architectures do not support u64 divisions, so the macro
    DIV_ROUND_CLOSEST is not adequate: the compiler replaces the division
    with a call to a function that does not exist on the platform, leading
    to unresolved references to symbols.

    Fix this by using the compatible macros: DIV64_U64_ROUND_CLOSEST and
    DIV_ROUND_CLOSEST_ULL.

    Fixes: a20d0ef97abf ("powercap/drivers/dtpm: Add API for dynamic thermal power management")
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
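The arithmetic behind a round-to-closest unsigned division can be sketched in userspace as follows. The function name is hypothetical and this is not the kernel macro. In userspace the plain `/` operator on 64-bit values works on 32-bit targets because the compiler's runtime library supplies helpers such as `__udivdi3`, which is exactly what the kernel build lacks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round-to-closest division for unsigned 64-bit values: adding half of the
 * divisor before dividing rounds halves upwards. The sketch assumes that
 * dividend + divisor / 2 does not overflow.
 */
static uint64_t ex_div_round_closest_u64(uint64_t dividend, uint64_t divisor)
{
	assert(divisor != 0);
	return (dividend + divisor / 2) / divisor;
}
```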
* powercap/drivers/dtpm: Add CPU energy model based support  Daniel Lezcano  2020-12-22  3  -0/+265

    With the powercap dtpm controller, we are able to plug devices with
    power limitation features in the tree.

    The following patch introduces the CPU power limitation based on the
    energy model and the performance states.

    The power limitation is done at the performance domain level. If some
    CPUs are unplugged, the corresponding power will be subtracted from
    the performance domain total power.

    It is up to the platform to initialize the dtpm tree and add the CPU.

    Here is an example to create a simple tree with one root node called
    "pkg" and the CPU's performance domains.

        static int dtpm_register_pkg(struct dtpm_descr *descr)
        {
        	struct dtpm *pkg;
        	int ret;

        	pkg = dtpm_alloc(NULL);
        	if (!pkg)
        		return -ENOMEM;

        	ret = dtpm_register(descr->name, pkg, descr->parent);
        	if (ret)
        		return ret;

        	return dtpm_register_cpu(pkg);
        }

        static struct dtpm_descr descr = {
        	.name = "pkg",
        	.init = dtpm_register_pkg,
        };
        DTPM_DECLARE(descr);

    Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
    Tested-by: Lukasz Luba <lukasz.luba@arm.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* powercap/drivers/dtpm: Add API for dynamic thermal power management  Daniel Lezcano  2020-12-22  3  -0/+480

    On the embedded world, the complexity of the SoC leads to an
    increasing number of hotspots which need to be monitored and mitigated
    as a whole in order to prevent the temperature from going above the
    normative and legally stated 'skin temperature'.

    Another aspect is to sustain the performance for a given power budget,
    for example virtual reality where the user can feel dizziness if the
    GPU performance is capped while a big CPU is processing something
    else. Or reduce the battery charging because the dissipated power is
    too high compared with the power consumed by other devices.

    The userspace is the most adequate place to dynamically act on the
    different devices by limiting their power given an application
    profile: it has the knowledge of the platform.

    These userspace daemons are in charge of the Dynamic Thermal Power
    Management (DTPM).

    Nowadays, the dtpm daemons are abusing the thermal framework as they
    act on the cooling device state to force a specific and arbitrary
    state without taking care of the governor decisions. Given the closed
    loop of some governors, that can confuse the logic or directly lead to
    a decision conflict.

    As the number of cooling device support is limited today to the CPU
    and the GPU, the dtpm daemons have little control on the power
    dissipation of the system. The out of tree solutions are hacking
    around here and there in the drivers, in the frameworks to have
    control on the devices. The common solution is to declare them as
    cooling devices.

    There is no unification of the power limitation unit, opaque states
    are used.

    This patch provides a way to create a hierarchy of constraints using
    the powercap framework. The devices which are registered as power
    limit-able devices are represented in this hierarchy as a tree. They
    are linked together with intermediate nodes which are just there to
    propagate the constraint to the children.

    The leaves of the tree are the real devices, the intermediate nodes
    are virtual, aggregating the children constraints and power
    characteristics.

    Each node has a weight on a 2^10 basis, in order to reflect the
    percentage of power distribution of the children's node. This
    percentage is used to dispatch the power limit to the children.

    The weight is computed against the max power of the siblings.

    This simple approach allows to do a fair distribution of the power
    limit.

    Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
    Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
    Tested-by: Lukasz Luba <lukasz.luba@arm.com>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
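The weight-based dispatch described in this commit message can be sketched as below. This is only an illustration under stated assumptions: the function names, the rounding choice and the example power values are hypothetical, and the patch itself may compute these quantities differently.

```c
#include <assert.h>
#include <stdint.h>

#define EX_WEIGHT_SCALE 1024u	/* weights are expressed on a 2^10 basis */

/* Weight of one child: its max power relative to the sum over all siblings. */
static uint64_t ex_weight(uint64_t child_max, uint64_t siblings_max_sum)
{
	assert(siblings_max_sum != 0);
	return (child_max * EX_WEIGHT_SCALE + siblings_max_sum / 2) /
		siblings_max_sum;
}

/* Share of the parent's power limit handed down to a child. */
static uint64_t ex_child_limit(uint64_t parent_limit, uint64_t weight)
{
	return parent_limit * weight / EX_WEIGHT_SCALE;
}
```

For two children with assumed max powers of 3000 and 1000 units, the weights come out as 768 and 256, so a parent limit of 2000 units is split into 1500 and 500.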
* Merge tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm  Linus Torvalds  2020-12-16  51  -969/+1078
|\
    Pull power management updates from Rafael Wysocki:
     "These update cpufreq (core and drivers), cpuidle (polling state
      implementation and the PSCI driver), the OPP (operating performance
      points) framework, devfreq (core and drivers), the power capping RAPL
      (Running Average Power Limit) driver, the Energy Model support, the
      generic power domains (genpd) framework, the ACPI device power
      management, the core system-wide suspend code and power management
      utilities.

      Specifics:

      - Use local_clock() instead of jiffies in the cpufreq statistics to improve accuracy (Viresh Kumar).
      - Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq drivers (Viresh Kumar).
      - Clean up the cpufreq core, the intel_pstate driver and the schedutil cpufreq governor (Rafael Wysocki).
      - Fix up error code paths in the sti-cpufreq and mediatek cpufreq drivers (Yangtao Li, Qinglang Miao).
      - Fix cpufreq_online() to return error codes instead of success (0) in all cases when it fails (Wang ShaoBo).
      - Add mt8167 support to the mediatek cpufreq driver and blacklist mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).
      - Modify the tegra194 cpufreq driver to always return values from the frequency table as the current frequency and clean up that driver (Sumit Gupta, Jon Hunter).
      - Modify the arm_scmi cpufreq driver to allow it to discover the power scale present in the performance protocol and provide this information to the Energy Model (Lukasz Luba).
      - Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali Rohár).
      - Clean up the CPPC cpufreq driver (Ionela Voinescu).
      - Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd Bergmann).
      - Rework the polling interval selection for the polling state in cpuidle (Mel Gorman).
      - Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver (Ulf Hansson).
      - Modify the OPP framework to support empty (node-less) OPP tables in DT for passing dependency information (Nicola Mazzucato).
      - Fix potential lockdep issue in the OPP core and clean up the OPP core (Viresh Kumar).
      - Modify dev_pm_opp_put_regulators() to accept a NULL argument and update its users accordingly (Viresh Kumar).
      - Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).
      - Add support for governor feature flags to devfreq, make devfreq sysfs file permissions depend on the governor and clean up the devfreq core (Chanwoo Choi).
      - Clean up the tegra20 devfreq driver and deprecate it to allow another driver based on EMC_STAT to be used instead of it (Dmitry Osipenko).
      - Add interconnect support to the tegra30 devfreq driver, allow it to take the interconnect and OPP information from DT and clean it up (Dmitry Osipenko).
      - Add interconnect support to the exynos-bus devfreq driver along with interconnect properties documentation (Sylwester Nawrocki).
      - Add support for AMD Fam17h and Fam19h processors to the RAPL power capping driver (Victor Ding, Kim Phillips).
      - Fix handling of overly long constraint names in the powercap framework (Lukasz Luba).
      - Fix the wakeup configuration handling for bridges in the ACPI device power management core (Rafael Wysocki).
      - Add support for using an abstract scale for power units in the Energy Model (EM) and document it (Lukasz Luba).
      - Add em_cpu_energy() micro-optimization to the EM (Pavankumar Kondeti).
      - Modify the generic power domains (genpd) framework to support suspend-to-idle (Ulf Hansson).
      - Fix creation of debugfs nodes in genpd (Thierry Strudel).
      - Clean up genpd (Lina Iyer).
      - Clean up the core system-wide suspend code and make it print driver flags for devices with debug enabled (Alex Shi, Patrice Chotard, Chen Yu).
      - Modify the ACPI system reboot code to make it prepare for system power off to avoid confusing the platform firmware (Kai-Heng Feng).
      - Update the pm-graph (multiple changes, mostly usability-related) and cpupower (online and offline CPU information support) PM utilities (Todd Brandt, Brahadambal Srinivasan)"

    * tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
      cpufreq: Fix cpufreq_online() return value on errors
      cpufreq: Fix up several kerneldoc comments
      cpufreq: stats: Use local_clock() instead of jiffies
      cpufreq: schedutil: Simplify sugov_update_next_freq()
      cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
      PM: domains: create debugfs nodes when adding power domains
      opp: of: Allow empty opp-table with opp-shared
      dt-bindings: opp: Allow empty OPP tables
      media: venus: dev_pm_opp_put_*() accepts NULL argument
      drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
      drm/lima: dev_pm_opp_put_*() accepts NULL argument
      PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
      cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
      cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
      opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
      opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
      cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
      opp: Reduce the size of critical section in _opp_kref_release()
      PM / EM: Micro optimization in em_cpu_energy
      cpufreq: arm_scmi: Discover the power scale in performance protocol
      ...
| *-. Merge branches 'pm-devfreq' and 'pm-tools'Rafael J. Wysocki2020-12-1511-403/+322
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-devfreq: PM / devfreq: tegra30: Separate configurations per-SoC generation PM / devfreq: tegra30: Support interconnect and OPPs from device-tree PM / devfreq: tegra20: Deprecate in a favor of emc-stat based driver PM / devfreq: exynos-bus: Add registration of interconnect child device dt-bindings: devfreq: Add documentation for the interconnect properties soc/tegra: fuse: Add stub for tegra_sku_info soc/tegra: fuse: Export tegra_read_ram_code() clk: tegra: Export Tegra20 EMC kernel symbols PM / devfreq: tegra30: Silence deferred probe error PM / devfreq: tegra20: Relax Kconfig dependency PM / devfreq: tegra20: Silence deferred probe error PM / devfreq: Remove redundant governor_name from struct devfreq PM / devfreq: Add governor attribute flag for specifc sysfs nodes PM / devfreq: Add governor feature flag PM / devfreq: Add tracepoint for frequency changes PM / devfreq: Unify frequency change to devfreq_update_target func trace: events: devfreq: Use fixed indentation size to improve readability * pm-tools: pm-graph v5.8 cpupower: Provide online and offline CPU information
| | * \ Merge tag 'devfreq-next-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux  Rafael J. Wysocki  2020-12-11  11  -403/+322
| | |\ \
    Pull devfreq updates for 5.11 from Chanwoo Choi:

    1. Update devfreq core
    - Add new devfreq_frequency tracepoint to show the frequency change information.
    - Add governor feature flag. The devfreq governor is able to set the specific flag in order to support a non-common feature. For example, if the governor supports the 'immutable' feature, don't allow user space to change the governor via sysfs.
    - Add governor sysfs attribute flag for each sysfs file. Prior to that the devfreq subsystem allowed all of the sysfs files to be accessed regardless of the governor type. But some sysfs files are not supported by specific devfreq governors. In order to only allow the sysfs files supported by the governor to be accessed, clarify the access permissions of sysfs attributes according to the governor. When adding the devfreq governor, specify the available attribute information by using DEVFREQ_GOV_ATTR_* symbols. The user can read or write the sysfs attributes in accordance to the specified access permissions.
    - Clean-up the code to reduce duplication for the devfreq tracepoint and to remove redundant governor_name field from struct devfreq.

    2. Update exynos-bus.c devfreq driver
    - Add interconnect API support to the Samsung Exynos Bus Frequency driver, exynos-bus.c. Complementing the devfreq driver with interconnect functionality allows to ensure that the QoS requirements regarding devices accessing the system memory (e.g. video processing devices) will be met and allows to avoid issues like DMA underrun.

    3. Update tegra devfreq driver
    - Add interconnect support and OPP interface to tegra30-devfreq.c. Also, it is to guarantee the QoS requirement of some devices like the display controller.
    - Move tegra20-devfreq.c from drivers/devfreq/ into drivers/memory/tegra/ in order to use the more proper monitoring feature such as EMC_STAT which is located in drivers/memory/tegra/.
    - Separate the configuration information for different SoCs in tegra30-devfreq.c. The tegra30-devfreq.c had been supporting both tegra30-actmon and tegra124-actmon devices. In order to use the more correct configuration data, separate them.
    - Use dev_err_probe() to handle the deferred probe error in tegra30-devfreq.c.

    4. Pull the request of 'Tegra SoC and clock controller changes for v5.11' sent by Krzysztof Kozlowski <krzk@kernel.org> in order to avoid a build error.
* tag 'devfreq-next-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux: PM / devfreq: tegra30: Separate configurations per-SoC generation PM / devfreq: tegra30: Support interconnect and OPPs from device-tree PM / devfreq: tegra20: Deprecate in a favor of emc-stat based driver PM / devfreq: exynos-bus: Add registration of interconnect child device dt-bindings: devfreq: Add documentation for the interconnect properties soc/tegra: fuse: Add stub for tegra_sku_info soc/tegra: fuse: Export tegra_read_ram_code() clk: tegra: Export Tegra20 EMC kernel symbols PM / devfreq: tegra30: Silence deferred probe error PM / devfreq: tegra20: Relax Kconfig dependency PM / devfreq: tegra20: Silence deferred probe error PM / devfreq: Remove redundant governor_name from struct devfreq PM / devfreq: Add governor attribute flag for specifc sysfs nodes PM / devfreq: Add governor feature flag PM / devfreq: Add tracepoint for frequency changes PM / devfreq: Unify frequency change to devfreq_update_target func trace: events: devfreq: Use fixed indentation size to improve readability
| | | * | PM / devfreq: tegra30: Separate configurations per-SoC generationDmitry Osipenko2020-12-071-14/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we were using count-weight of the T124 for T30 in order to get EMC clock rate that was reasonable for T30. In fact the count-weight should be x2 times smaller on T30, but then devfreq was producing a bit too low EMC clock rate for ISO memory clients, like display controller for example. Now both Tegra ACTMON and Tegra DRM display drivers support interconnect framework and display driver tells to ICC what a minimum memory bandwidth is needed, preventing FIFO underflows. Thus, now we can use a proper count-weight value for Tegra30 and MC_ALL device config needs a bit more aggressive boosting. Add a separate ACTMON driver configuration that is specific to Tegra30. Tested-by: Peter Geis <pgwipeout@gmail.com> Tested-by: Nicolas Chauvet <kwizart@gmail.com> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
| | | * | PM / devfreq: tegra30: Support interconnect and OPPs from device-treeDmitry Osipenko2020-12-071-42/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves ACTMON driver away from generating OPP table by itself, transitioning it to use the table which comes from device-tree. This change breaks compatibility with older device-trees and brings support for the interconnect framework to the driver. This is a mandatory change which needs to be done in order to implement interconnect-based memory DVFS, i.e. device-trees need to be updated. Now ACTMON issues a memory bandwidth requests using dev_pm_opp_set_bw() instead of driving EMC clock rate directly. Tested-by: Peter Geis <pgwipeout@gmail.com> Tested-by: Nicolas Chauvet <kwizart@gmail.com> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
| | | * | Merge tag 'tegra-soc-clk-drivers-5.11' of ↵Chanwoo Choi2020-12-072-0/+5
| | | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl into devfreq-next Tegra SoC and clock controller changes for v5.11 Export symbols and add stubs necessary for upcoming modified Tegra memory controller drivers (touching also devfreq and interconnect).
| | | | * | soc/tegra: fuse: Export tegra_read_ram_code()Dmitry Osipenko2020-11-061-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tegra_read_ram_code() is used by EMC drivers and we're going to make these driver modular, hence this function needs to be exported. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20201104164923.21238-3-digetx@gmail.com Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
| | | | * | clk: tegra: Export Tegra20 EMC kernel symbolsDmitry Osipenko2020-11-061-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're going to modularize Tegra EMC drivers and some of the EMC-clock driver symbols need to be exported. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20201104164923.21238-2-digetx@gmail.com Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
| | | * | | PM / devfreq: tegra20: Deprecate in a favor of emc-stat based driverDmitry Osipenko2020-11-233-221/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove tegra20-devfreq in order to replace it with a EMC_STAT based devfreq driver. Previously we were going to use MC_STAT based tegra20-devfreq driver because EMC_STAT wasn't working properly, but now that problem is resolved. This resolves complications imposed by the removed driver since it was depending on both EMC and MC drivers simultaneously. Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
| | | * | | PM / devfreq: exynos-bus: Add registration of interconnect child deviceSylwester Nawrocki2020-11-131-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds registration of a child platform device for the exynos interconnect driver. It is assumed that the interconnect provider will only be needed when #interconnect-cells property is present in the bus DT node, hence the child device will be created only when such a property is present. Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
| | | * | | PM / devfreq: tegra30: Silence deferred probe errorDmitry Osipenko2020-10-261-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tegra EMC driver was turned into a regular kernel driver, meaning that it could be compiled as a loadable kernel module now. Hence EMC clock isn't guaranteed to be available and clk_get("emc") may return -EPROBE_DEFER. Let's silence the deferred probe error. Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
| | | * | | PM / devfreq: tegra20: Relax Kconfig dependencyDmitry Osipenko2020-10-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Tegra EMC driver now could be compiled as a loadable kernel module. Currently devfreq driver depends on the EMC/MC drivers in Kconfig, and thus, devfreq is forced to be a kernel module if EMC is compiled as a module. This build dependency could be relaxed since devfreq driver checks MC/EMC presence on probe, allowing kernel configuration where devfreq is a built-in driver and EMC driver is a loadable module. This change puts Tegra20 devfreq Kconfig entry on a par with the Tegra30 devfreq entry. Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
| | | * | | PM / devfreq: tegra20: Silence deferred probe errorDmitry Osipenko2020-10-261-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tegra EMC driver was turned into a regular kernel driver, meaning that it could be compiled as a loadable kernel module now. Hence EMC clock isn't guaranteed to be available and clk_get("emc") may return -EPROBE_DEFER. Let's silence the deferred probe error. Acked-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
| | | * | | PM / devfreq: Remove redundant governor_name from struct devfreqChanwoo Choi2020-10-262-11/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The devfreq structure instance contains the governor_name and a governor instance. When need to show the governor name, better to use the name of devfreq_governor structure. So, governor_name variable in struct devfreq is a redundant and unneeded variable. Remove the redundant governor_name of struct devfreq and then use the name of devfreq_governor instance. Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
| | | * | | PM / devfreq: Add governor attribute flag for specifc sysfs nodes  Chanwoo Choi  2020-10-26  4  -50/+127

    DEVFREQ supports the default governors like performance,
    simple_ondemand and also allows the devfreq driver to add their own
    governor like tegra30-devfreq.c according to their requirement. In
    result, some sysfs attributes are useful or not useful. Prior to that
    the user can access all sysfs attributes regardless of the available
    attributes.

    So, clarify the access permission of sysfs attributes according to
    governor. When adding the devfreq governor, the available attribute
    information can be specified by using the DEVFREQ_GOV_ATTR_* constants.
    The user can read or write the sysfs attributes in accordance to the
    specified attributes.

    When adding the governor, the following attributes can be added
    according to the governor feature.

    [Definition for specific sysfs attributes]
    - DEVFREQ_GOV_ATTR_POLLING_INTERVAL to update polling interval for timer.
      : /sys/class/devfreq/[devfreq dev name]/polling_interval
    - DEVFREQ_GOV_ATTR_TIMER to change the type of timer on either
      deferrable or delayed timer.
      : /sys/class/devfreq/[devfreq dev name]/timer

    And all devfreq governors have to support the following common
    attributes. The common attributes are added to devfreq class by default.
    - governor
    - available_governors
    - available_frequencies
    - cur_freq
    - target_freq
    - min_freq
    - max_freq
    - trans_stat

    [Table of governor attribute flags for devfreq governors]
    ------------------------------------------------------------------------------
                          | simple    | perfor | power | user  | passive | tegra30
                          | ondemand  | mance  | save  | space |         |
    ------------------------------------------------------------------------------
    governor              | O         | O      | O     | O     | O       | O
    available_governors   | O         | O      | O     | O     | O       | O
    available_frequencies | O         | O      | O     | O     | O       | O
    cur_freq              | O         | O      | O     | O     | O       | O
    target_freq           | O         | O      | O     | O     | O       | O
    min_freq              | O         | O      | O     | O     | O       | O
    max_freq              | O         | O      | O     | O     | O       | O
    trans_stat            | O         | O      | O     | O     | O       | O
    ------------------------------------------------------------------------------
    polling_interval      | O         | X      | X     | X     | X       | O
    timer                 | O         | X      | X     | X     | X       | X
    ------------------------------------------------------------------------------

    Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
    Tested-by: Dmitry Osipenko <digetx@gmail.com>
    Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
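The per-governor attribute gating described above can be sketched with a small bitmask check. The `EX_*` constants, their bit values, the structure and the helper are hypothetical illustrations, not the devfreq definitions; the three example governors follow the polling_interval and timer rows of the table in the commit message.

```c
#include <assert.h>

/* Hypothetical attribute flags; the bit values are assumptions. */
#define EX_GOV_ATTR_POLLING_INTERVAL (1u << 0)
#define EX_GOV_ATTR_TIMER            (1u << 1)

struct ex_governor {
	const char *name;
	unsigned int attrs;	/* optional sysfs attributes this governor supports */
};

/* Attribute sets taken from the table in the commit message. */
static const struct ex_governor ex_simple_ondemand = {
	"simple_ondemand", EX_GOV_ATTR_POLLING_INTERVAL | EX_GOV_ATTR_TIMER
};
static const struct ex_governor ex_performance = { "performance", 0 };
static const struct ex_governor ex_tegra30 = {
	"tegra30", EX_GOV_ATTR_POLLING_INTERVAL
};

/* An optional attribute is accessible only if the governor declares it. */
static int ex_attr_allowed(const struct ex_governor *gov, unsigned int attr)
{
	return (gov->attrs & attr) != 0;
}
```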
| | | * | | PM / devfreq: Add governor feature flag  Chanwoo Choi  2020-10-26  4  -20/+29

    The devfreq governor is able to have the specific flag as follows in
    order to implement the specific feature. For example, devfreq allows
    user to change the governors on runtime via sysfs interface. But, if
    devfreq device uses 'passive' governor, don't allow user to change the
    governor. For this case, define the DEVFREQ_GOV_FLAG_IMMUTABLE and set
    it to flag of passive governor.

    [Definition for governor flag]
    - DEVFREQ_GOV_FLAG_IMMUTABLE
      : If immutable flag is set, governor is never changeable to other
        governors.
    - DEVFREQ_GOV_FLAG_IRQ_DRIVEN
      : Devfreq core won't schedule polling work for this governor if
        value is set.

    [Table of governor flag for devfreq governors]
    ------------------------------------------------------------------------------
                      | simple     | perfor | power | user  | passive | tegra30
                      | ondemand   | mance  | save  | space |         |
    ------------------------------------------------------------------------------
    immutable         | X          | X      | X     | X     | O       | O
    interrupt_driven  | X(polling) | X      | X     | X     | X       | O (irq)
    ------------------------------------------------------------------------------

    Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
    Tested-by: Dmitry Osipenko <digetx@gmail.com>
    Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
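How the two feature flags might be consulted can be sketched as follows. The `EX_*` constants, their bit values and the helper functions are hypothetical and only mirror the behaviour described in the commit message: an immutable governor cannot be switched via sysfs, and an interrupt-driven governor gets no polling work scheduled.

```c
#include <assert.h>

/* Hypothetical feature flags; the bit values are assumptions. */
#define EX_GOV_FLAG_IMMUTABLE  (1u << 0)
#define EX_GOV_FLAG_IRQ_DRIVEN (1u << 1)

/* User space may switch governors only when the current one is not immutable. */
static int ex_can_change_governor(unsigned int flags)
{
	return !(flags & EX_GOV_FLAG_IMMUTABLE);
}

/* Polling work is scheduled only for governors that are not interrupt driven. */
static int ex_needs_polling(unsigned int flags)
{
	return !(flags & EX_GOV_FLAG_IRQ_DRIVEN);
}
```

Following the table in the commit message, a passive governor would carry only the immutable flag, tegra30 would carry both flags, and simple_ondemand would carry neither.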
| | | * | | PM / devfreq: Add tracepoint for frequency changes (Matthias Kaehlcke, 2020-10-26, 1 file, -0/+8)
| | | | | |
| | | | | | Add a tracepoint for frequency changes of devfreq devices and use it.
| | | | | |
| | | | | | Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
| | | | | | [cw00.choi: Move print position of tracepoint and add more information]
| | | | | | Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
| | | * | | PM / devfreq: Unify frequency change to devfreq_update_target func (Chanwoo Choi, 2020-10-26, 3 files, -39/+33)
| | | |/ /
| | | | | The update_devfreq() and update_passive_devfreq() functions contain
| | | | | duplicate code in the final stage of changing the target frequency.
| | | | | Unify the frequency change code into devfreq_update_target() to remove
| | | | | the duplication and to centralize the frequency change code.
| | | | |
| | | | | Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com>
| | | | |
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| *-----. \ \ \ Merge branches 'pm-sleep', 'pm-acpi', 'pm-domains' and 'powercap' (Rafael J. Wysocki, 2020-12-15, 8 files, -89/+129)
| |\ \ \ \ \ \ \
| | | | | | | | | * pm-sleep:
| | | | | | | | |   PM: sleep: Add dev_wakeup_path() helper
| | | | | | | | |   PM / suspend: fix kernel-doc markup
| | | | | | | | |   PM: sleep: Print driver flags for all devices during suspend/resume
| | | | | | | | |
| | | | | | | | | * pm-acpi:
| | | | | | | | |   PM: ACPI: Refresh wakeup device power configuration every time
| | | | | | | | |   PM: ACPI: PCI: Drop acpi_pm_set_bridge_wakeup()
| | | | | | | | |   PM: ACPI: reboot: Use S5 for reboot
| | | | | | | | |
| | | | | | | | | * pm-domains:
| | | | | | | | |   PM: domains: create debugfs nodes when adding power domains
| | | | | | | | |   PM: domains: replace -ENOTSUPP with -EOPNOTSUPP
| | | | | | | | |
| | | | | | | | | * powercap:
| | | | | | | | |   powercap: Adjust printing the constraint name with new line
| | | | | | | | |   powercap: RAPL: Add AMD Fam19h RAPL support
| | | | | | | | |   powercap: Add AMD Fam17h RAPL support
| | | | | | | | |   powercap/intel_rapl_msr: Convert rapl_msr_priv into pointer
| | | | | | | | |   x86/msr-index: sort AMD RAPL MSRs by address
| | | | | * | | | powercap: Adjust printing the constraint name with new line (Lukasz Luba, 2020-11-23, 1 file, -3/+2)
| | | | | | | | |
| | | | | | | | | The constraint name has a size limit of 30, which might sometimes be
| | | | | | | | | hit. When this happens, the trailing newline might get lost. Prevent
| | | | | | | | | this by setting the maximum length of the name string to 29. This
| | | | | | | | | results in proper string clamping (when needed), with '\n' stored at
| | | | | | | | | index 29 and '\0' at index 30, as originally intended.
| | | | | | | | |
| | | | | | | | | Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
| | | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
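The clamping described in this commit can be illustrated with a small userspace sketch. It assumes the name is formatted with a precision-limited `%s`; `CONSTRAINT_NAME_LEN` and `format_constraint_name()` are hypothetical names chosen for this example, and only the limit of 30 and the index positions come from the commit message.

```c
#include <stdio.h>
#include <string.h>

/* Size limit quoted in the commit message. */
#define CONSTRAINT_NAME_LEN 30

/*
 * Copy at most CONSTRAINT_NAME_LEN - 1 characters of name into buf and
 * always append '\n'. buf must hold at least CONSTRAINT_NAME_LEN + 1
 * bytes. Returns the number of characters written, not counting the
 * terminating '\0'.
 */
static int format_constraint_name(char *buf, const char *name)
{
	return sprintf(buf, "%.*s\n", CONSTRAINT_NAME_LEN - 1, name);
}
```

With a name longer than 29 characters, the first 29 characters land at indices 0 to 28, the newline at index 29, and the terminator at index 30, so the newline can no longer be cut off.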
| | | | | * | | | powercap: RAPL: Add AMD Fam19h RAPL support (Kim Phillips, 2020-11-10, 1 file, -0/+1)
| | | | | | | | |
| | | | | | | | | AMD Family 19h's RAPL MSRs are identical to Family 17h's. Extend
| | | | | | | | | Family 17h's support to Family 19h.
| | | | | | | | |
| | | | | | | | | Signed-off-by: Kim Phillips <kim.phillips@amd.com>
| | | | | | | | | Signed-off-by: Victor Ding <victording@google.com>
| | | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | | * | | | powercap: Add AMD Fam17h RAPL support (Victor Ding, 2020-11-10, 2 files, -1/+25)
| | | | | | | | |
| | | | | | | | | Enable AMD Fam17h RAPL support for the power capping framework.
| | | | | | | | | The support is as per AMD Fam17h Model31h (Zen2) and model 00-ffh
| | | | | | | | | (Zen1) PPR.
| | | | | | | | |
| | | | | | | | | Tested by comparing the results of the following two sysfs entries
| | | | | | | | | and the values directly read from the corresponding MSRs via
| | | | | | | | | /dev/cpu/[x]/msr:
| | | | | | | | |   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
| | | | | | | | |   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj
| | | | | | | | |
| | | | | | | | | Signed-off-by: Victor Ding <victording@google.com>
| | | | | | | | | Acked-by: Kim Phillips <kim.phillips@amd.com>
| | | | | | | | | [ rjw: Changelog edits ]
| | | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | | * | | | powercap/intel_rapl_msr: Convert rapl_msr_priv into pointer (Victor Ding, 2020-11-10, 1 file, -15/+18)
| | | | | | | | |
| | | | | | | | | Change the static struct rapl_msr_priv to a pointer to allow using a
| | | | | | | | | different RAPL MSR interface, in preparation for supporting AMD's
| | | | | | | | | RAPL MSR interface.
| | | | | | | | |
| | | | | | | | | No functional changes.
| | | | | | | | |
| | | | | | | | | Signed-off-by: Victor Ding <victording@google.com>
| | | | | | | | | Acked-by: Kim Phillips <kim.phillips@amd.com>
| | | | | | | | | [ rjw: Changelog edits ]
| | | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | * | | | | PM: domains: create debugfs nodes when adding power domains (Thierry Strudel, 2020-12-11, 1 file, -28/+45)
| | | | | |_|_|/
| | | | |/| | |
| | | | | | | | debugfs nodes were created in genpd_debug_init(), called from a
| | | | | | | | late_initcall, which prevented power domains registered through
| | | | | | | | loadable modules from having a debugfs entry.
| | | | | | | |
| | | | | | | | Create/remove debugfs nodes when the power domain is added/removed
| | | | | | | | to/from the internal gpd_list.
| | | | | | | |
| | | | | | | | Signed-off-by: Thierry Strudel <tstrudel@google.com>
| | | | | | | | Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| | | | | | | | Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
| | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | | | | PM: ACPI: Refresh wakeup device power configuration every time (Rafael J. Wysocki, 2020-12-07, 1 file, -7/+20)
| | | | | | | |
| | | | | | | | When wakeup signaling is enabled for a bridge for the second (or every
| | | | | | | | next) time in a row, its existing device wakeup power configuration
| | | | | | | | may not match the new conditions. For example, some devices below it
| | | | | | | | may have been put into low-power states and that changes the device
| | | | | | | | wakeup power conditions or similar. This causes functional problems
| | | | | | | | to appear on some systems (for example, because of it the Thunderbolt
| | | | | | | | port on Dell Precision 5550 cannot detect devices plugged in after it
| | | | | | | | has been suspended).
| | | | | | | |
| | | | | | | | For this reason, modify __acpi_device_wakeup_enable() to refresh the
| | | | | | | | device wakeup power configuration of the target device on every
| | | | | | | | invocation, not just when it is called for that device for the first
| | | | | | | | time in a row.
| | | | | | | |
| | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | | | | | Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
| | | | | | | | Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
| | | | | | | | Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
| | | * | | | | PM: ACPI: PCI: Drop acpi_pm_set_bridge_wakeup() (Rafael J. Wysocki, 2020-12-07, 2 files, -31/+14)
| | | | |/ / /
| | | |/| | |
| | | | | | | The idea behind acpi_pm_set_bridge_wakeup() was to allow bridges to
| | | | | | | be reference counted for wakeup enabling, because they may be enabled
| | | | | | | to signal wakeup on behalf of their subordinate devices and that
| | | | | | | may happen multiple times in a row, whereas for the other devices
| | | | | | | it only makes sense to enable wakeup signaling once.
| | | | | | |
| | | | | | | However, this becomes problematic if the bridge itself is suspended,
| | | | | | | because it is treated as a "regular" device in that case and the
| | | | | | | reference counting doesn't work.
| | | | | | |
| | | | | | | For instance, suppose that there are two devices below a bridge and
| | | | | | | they both can signal wakeup. Every time one of them is suspended,
| | | | | | | wakeup signaling is enabled for the bridge, so when they both have
| | | | | | | been suspended, the bridge's wakeup reference counter value is 2.
| | | | | | |
| | | | | | | Say that the bridge is suspended subsequently and acpi_pci_wakeup()
| | | | | | | is called for it. Because the bridge can signal wakeup, that
| | | | | | | function will invoke acpi_pm_set_device_wakeup() to configure it
| | | | | | | and __acpi_pm_set_device_wakeup() will be called with the last
| | | | | | | argument equal to 1. This causes __acpi_device_wakeup_enable()
| | | | | | | invoked by it to omit the reference counting, because the reference
| | | | | | | counter of the target device (the bridge) is 2 at that time.
| | | | | | |
| | | | | | | Now say that the bridge resumes and one of the devices below it
| | | | | | | resumes too, so the bridge's reference counter becomes 0 and
| | | | | | | wakeup signaling is disabled for it, but there is still the other
| | | | | | | suspended device which may need the bridge to signal wakeup on its
| | | | | | | behalf and that is not going to work.
| | | | | | |
| | | | | | | To address this scenario, use wakeup enable reference counting for
| | | | | | | all devices, not just for bridges, so drop the last argument from
| | | | | | | __acpi_device_wakeup_enable() and __acpi_pm_set_device_wakeup(),
| | | | | | | which causes acpi_pm_set_device_wakeup() and
| | | | | | | acpi_pm_set_bridge_wakeup() to become identical, so drop the latter
| | | | | | | and use the former instead of it everywhere.
| | | | | | |
| | | | | | | Fixes: 1ba51a7c1496 ("ACPI / PCI / PM: Rework acpi_pci_propagate_wakeup()")
| | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | | | | Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
| | | | | | | Acked-by: Bjorn Helgaas <bhelgaas@google.com>
| | | | | | | Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
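The scenario in this commit message can be replayed with a toy model of uniform reference counting. This is a sketch only, not the ACPI code: `struct wakeup_dev` and the three helpers are hypothetical names, and the model merely counts enable requests the same way for every device, which is the behavior the commit says it switches to.

```c
#include <stdbool.h>

/* Hypothetical model of a device's wakeup state. */
struct wakeup_dev {
	unsigned int enable_count; /* outstanding wakeup enable requests */
};

/* Every enable request takes a reference, for bridges and other devices. */
static void wakeup_enable(struct wakeup_dev *dev)
{
	dev->enable_count++;
}

/* Drop one reference; never go below zero. */
static void wakeup_disable(struct wakeup_dev *dev)
{
	if (dev->enable_count > 0)
		dev->enable_count--;
}

/* Wakeup signaling stays on while at least one reference is held. */
static bool wakeup_signaling_enabled(const struct wakeup_dev *dev)
{
	return dev->enable_count > 0;
}
```

Replaying the example: two children suspend (count 2), the bridge suspends (count 3), then the bridge and one child resume (count 1). The remaining suspended child still holds a reference, so signaling stays enabled for the bridge.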
| | * | | | | PM: sleep: Add dev_wakeup_path() helper (Patrice Chotard, 2020-11-23, 3 files, -6/+6)
| | | | | | |
| | | | | | | Add a dev_wakeup_path() helper to avoid spreading
| | | | | | | dev->power.wakeup_path tests across drivers.
| | | | | | |
| | | | | | | Signed-off-by: Patrice Chotard <patrice.chotard@st.com>
| | | | | | | Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
| | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * | | | | PM: sleep: Print driver flags for all devices during suspend/resume (Chen Yu, 2020-11-10, 1 file, -2/+2)
| | |/ / / /
| | | | | | Currently there are 4 driver flags to control system suspend/resume
| | | | | | behavior: DPM_FLAG_NO_DIRECT_COMPLETE, DPM_FLAG_SMART_PREPARE,
| | | | | | DPM_FLAG_SMART_SUSPEND and DPM_FLAG_MAY_SKIP_RESUME. Print these flags
| | | | | | during suspend/resume so as to get a brief understanding of the
| | | | | | expected behavior of each device, and to facilitate suspend/resume
| | | | | | debugging/tuning.
| | | | | |
| | | | | | To enable this tracing:
| | | | | |   echo 'file drivers/base/power/main.c +p' > /sys/kernel/debug/dynamic_debug/control
| | | | | |
| | | | | | Signed-off-by: Chen Yu <yu.c.chen@intel.com>
| | | | | | [ rjw: Subject and changelog edits ]
| | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | | |
| | \ \ \ \
| *-. \ \ \ \ Merge branches 'pm-cpuidle' and 'pm-em' (Rafael J. Wysocki, 2020-12-15, 7 files, -32/+100)
| |\ \ \ \ \ \
| | | | | | | | * pm-cpuidle:
| | | | | | | |   cpuidle: Select polling interval based on a c-state with a longer target residency
| | | | | | | |   cpuidle: psci: Enable suspend-to-idle for PSCI OSI mode
| | | | | | | |   PM: domains: Enable dev_pm_genpd_suspend|resume() for suspend-to-idle
| | | | | | | |   PM: domains: Rename pm_genpd_syscore_poweroff|poweron()
| | | | | | | |
| | | | | | | | * pm-em:
| | | | | | | |   PM / EM: Micro optimization in em_cpu_energy
| | | | | | | |   PM: EM: Update Energy Model with new flag indicating power scale
| | | | | | | |   PM: EM: update the comments related to power scale
| | | | | | | |   PM: EM: Clarify abstract scale usage for power values in Energy Model
| | * | | | | | cpuidle: Select polling interval based on a c-state with a longer target residency (Mel Gorman, 2020-12-01, 1 file, -2/+23)
| | | | | | | |
| | | | | | | | It was noted that a few workloads that idle rapidly regressed when
| | | | | | | | commit 36fcb4292473 ("cpuidle: use first valid target residency as
| | | | | | | | poll time") was merged. The workloads in question were heavy
| | | | | | | | communicators that idle rapidly and were impacted by the c-state exit
| | | | | | | | latency as the active CPUs were not polling at the time of wakeup. As
| | | | | | | | they were not particularly realistic workloads, it was not considered
| | | | | | | | to be a major problem.
| | | | | | | |
| | | | | | | | Unfortunately, a bug was reported for a real workload in a production
| | | | | | | | environment that relied on large numbers of threads operating in a
| | | | | | | | worker pool pattern. These threads would idle for periods of time
| | | | | | | | longer than the C1 target residency and so incurred the c-state exit
| | | | | | | | latency penalty. The application is very sensitive to wakeup latency
| | | | | | | | and was indirectly relying on behaviour prior to commit a37b969a61c1
| | | | | | | | ("cpuidle: poll_state: Add time limit to poll_idle()") to poll for
| | | | | | | | long enough to avoid the exit latency cost.
| | | | | | | |
| | | | | | | | The target residency of C1 is typically very short. On some x86
| | | | | | | | machines, it can be as low as 2 microseconds. In poll_idle(), the
| | | | | | | | clock is checked every POLL_IDLE_RELAX_COUNT iterations of cpu_relax()
| | | | | | | | and even one iteration of that loop can be over 1 microsecond so the
| | | | | | | | polling interval is very close to the granularity of what poll_idle()
| | | | | | | | can detect. Furthermore, a basic ping pong workload like perf bench
| | | | | | | | pipe has a longer round-trip time than the 2 microseconds meaning that
| | | | | | | | the CPU will almost certainly not be polling when the ping-pong
| | | | | | | | completes.
| | | | | | | |
| | | | | | | | This patch selects a polling interval based on an enabled c-state that
| | | | | | | | has a target residency longer than 10usec. If there is no such enabled
| | | | | | | | c-state then polling will be up to TICK_NSEC/16, similar to what it
| | | | | | | | was up until kernel 4.20. Polling for a full tick is unlikely
| | | | | | | | (rescheduling event) and is much longer than the existing target
| | | | | | | | residencies for a deep c-state.
| | | | | | | |
| | | | | | | | As an example, consider a CPU with the following c-state information
| | | | | | | | from an Intel CPU;
| | | | | | | |
| | | | | | | |         residency   exit_latency
| | | | | | | | C1      2           2
| | | | | | | | C1E     20          10
| | | | | | | | C3      100         33
| | | | | | | | C6      400         133
| | | | | | | |
| | | | | | | | The polling interval selected is 20usec. If booted with
| | | | | | | | intel_idle.max_cstate=1 then the polling interval is 250usec as the
| | | | | | | | deeper c-states were not available.
| | | | | | | |
| | | | | | | | On an AMD EPYC machine, the c-state information is more limited and
| | | | | | | | looks like
| | | | | | | |
| | | | | | | |         residency   exit_latency
| | | | | | | | C1      2           1
| | | | | | | | C2      800         400
| | | | | | | |
| | | | | | | | The polling interval selected is 250usec. While C2 was considered, the
| | | | | | | | polling interval was clamped by CPUIDLE_POLL_MAX.
| | | | | | | |
| | | | | | | | Note that it is not expected that polling will be a universal win. As
| | | | | | | | well as potentially trading power for performance, the performance is
| | | | | | | | not guaranteed if the extra polling prevented a turbo state being
| | | | | | | | reached. Making it a tunable was considered but it's driver-specific,
| | | | | | | | may be overridden by a governor and is not a guaranteed polling
| | | | | | | | interval making it difficult to describe without knowledge of the
| | | | | | | | implementation.
| | | | | | | |
| | | | | | | | tbench4
| | | | | | | |                      vanilla                polling
| | | | | | | | Hmean     1       497.89 (   0.00%)      543.15 *   9.09%*
| | | | | | | | Hmean     2       975.88 (   0.00%)     1059.73 *   8.59%*
| | | | | | | | Hmean     4      1953.97 (   0.00%)     2081.37 *   6.52%*
| | | | | | | | Hmean     8      3645.76 (   0.00%)     4052.95 *  11.17%*
| | | | | | | | Hmean     16     6882.21 (   0.00%)     6995.93 *   1.65%*
| | | | | | | | Hmean     32    10752.20 (   0.00%)    10731.53 *  -0.19%*
| | | | | | | | Hmean     64    12875.08 (   0.00%)    12478.13 *  -3.08%*
| | | | | | | | Hmean     128   21500.54 (   0.00%)    21098.60 *  -1.87%*
| | | | | | | | Hmean     256   21253.70 (   0.00%)    21027.18 *  -1.07%*
| | | | | | | | Hmean     320   20813.50 (   0.00%)    20580.64 *  -1.12%*
| | | | | | | |
| | | | | | | | Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
| | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
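The selection rule from this commit message can be sketched as a small userspace function. This is an illustration under stated assumptions rather than the cpuidle implementation: `struct cstate`, `poll_limit_us()`, and the two constants are hypothetical names, units are microseconds for readability, and the 250 usec cap assumes TICK_NSEC/16 with a 4 ms tick, which is what the figures in the message imply.

```c
#include <stdbool.h>
#include <stddef.h>

#define POLL_MIN_US 10  /* threshold quoted in the commit message */
#define POLL_MAX_US 250 /* assumed: TICK_NSEC/16 with a 4 ms tick */

/* Hypothetical description of one idle state. */
struct cstate {
	unsigned int target_residency_us;
	bool enabled;
};

/*
 * Pick the target residency of the first enabled state that is longer
 * than POLL_MIN_US, clamped to POLL_MAX_US. Fall back to POLL_MAX_US
 * when no such state exists.
 */
static unsigned int poll_limit_us(const struct cstate *states, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		unsigned int residency = states[i].target_residency_us;

		if (!states[i].enabled || residency <= POLL_MIN_US)
			continue;
		return residency < POLL_MAX_US ? residency : POLL_MAX_US;
	}
	return POLL_MAX_US;
}
```

Fed the residencies from the two tables in the message, this sketch picks 20 usec for the Intel example, 250 usec when only C1 is enabled, and 250 usec for the AMD EPYC example, where the 800 usec C2 residency is clamped.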
| | * | | | | | Merge back cpuidle changes for v5.11. (Rafael J. Wysocki, 2020-11-23, 6 files, -30/+77)
| | |\ \ \ \ \ \
| | | * | | | | | cpuidle: psci: Enable suspend-to-idle for PSCI OSI mode (Ulf Hansson, 2020-11-10, 2 files, -4/+32)
| | | | | | | | |
| | | | | | | | | To select domain idlestates for cpuidle-psci when OSI mode has been
| | | | | | | | | enabled, the PM domains via genpd are being managed through runtime
| | | | | | | | | PM. This works fine for the regular idlepath, but it doesn't during
| | | | | | | | | system wide suspend.
| | | | | | | | |
| | | | | | | | | More precisely, the domain idlestates become temporarily disabled,
| | | | | | | | | which is because the PM core disables runtime PM for devices during
| | | | | | | | | system wide suspend. Later in the system suspend phase, genpd intends
| | | | | | | | | to deal with this from its ->suspend_noirq() callback, but this
| | | | | | | | | doesn't work as expected for a device corresponding to a CPU, because
| | | | | | | | | the domain idlestates need to be selected on a per CPU basis (the PM
| | | | | | | | | core doesn't invoke the callbacks like that).
| | | | | | | | |
| | | | | | | | | To address this problem, let's enable the syscore flag for the
| | | | | | | | | corresponding CPU device that becomes successfully attached to its PM
| | | | | | | | | domain (applicable only in OSI mode). This informs the PM core to skip
| | | | | | | | | invoking the system wide suspend/resume callbacks for the device, which
| | | | | | | | | also prevents genpd from messing up its internal state for it.
| | | | | | | | |
| | | | | | | | | Moreover, to properly select a domain idlestate for the CPUs during
| | | | | | | | | suspend-to-idle, let's assign a specific ->enter_s2idle() callback for
| | | | | | | | | the corresponding domain idlestate (applicable only in OSI mode). From
| | | | | | | | | that callback, let's invoke dev_pm_genpd_suspend|resume(), as this
| | | | | | | | | allows a domain idlestate to be selected for the current CPU by genpd.
| | | | | | | | |
| | | | | | | | | Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| | | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | | | | | PM: domains: Enable dev_pm_genpd_suspend|resume() for suspend-to-idle (Ulf Hansson, 2020-11-10, 1 file, -4/+16)
| | | | | | | | |
| | | | | | | | | The dev_pm_genpd_suspend|resume() functions have so far only been
| | | | | | | | | used during the syscore suspend/resume phases. However, during
| | | | | | | | | suspend-to-idle, where the syscore phases don't exist, similar
| | | | | | | | | operations are sometimes needed.
| | | | | | | | |
| | | | | | | | | An existing example is the timekeeping_suspend|resume() functions,
| | | | | | | | | which are called both through registered syscore ops during the
| | | | | | | | | syscore phases and as regular function calls from cpuidle (via
| | | | | | | | | tick_freeze()) during suspend-to-idle.
| | | | | | | | |
| | | | | | | | | For similar reasons, let's enable the dev_pm_genpd_suspend|resume()
| | | | | | | | | APIs to be re-used for corresponding CPU devices that are attached to
| | | | | | | | | a genpd, during suspend-to-idle.
| | | | | | | | |
| | | | | | | | | Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| | | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | | | | | PM: domains: Rename pm_genpd_syscore_poweroff|poweron() (Ulf Hansson, 2020-11-10, 4 files, -24/+31)
| | | | |/ / / /
| | | |/| | | |
| | | | | | | | To better describe what the pm_genpd_syscore_poweroff|poweron()
| | | | | | | | functions actually do, let's rename them to
| | | | | | | | dev_pm_genpd_suspend|resume() and update the rather few callers of
| | | | | | | | them accordingly (a couple of clocksource drivers).
| | | | | | | |
| | | | | | | | Moreover, let's take the opportunity to add some documentation of
| | | | | | | | these exported functions, as that is currently missing.
| | | | | | | |
| | | | | | | | Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | | Merge branch 'pm-cpufreq' (Rafael J. Wysocki, 2020-12-15, 20 files, -224/+297)
| |\ \ \ \ \ \ \
| | | | | | | | | * pm-cpufreq: (31 commits)
| | | | | | | | |   cpufreq: Fix cpufreq_online() return value on errors
| | | | | | | | |   cpufreq: Fix up several kerneldoc comments
| | | | | | | | |   cpufreq: stats: Use local_clock() instead of jiffies
| | | | | | | | |   cpufreq: schedutil: Simplify sugov_update_next_freq()
| | | | | | | | |   cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
| | | | | | | | |   cpufreq: arm_scmi: Discover the power scale in performance protocol
| | | | | | | | |   firmware: arm_scmi: Add power_scale_mw_get() interface
| | | | | | | | |   cpufreq: tegra194: Rename tegra194_get_speed_common function
| | | | | | | | |   cpufreq: tegra194: Remove unnecessary frequency calculation
| | | | | | | | |   cpufreq: tegra186: Simplify cluster information lookup
| | | | | | | | |   cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning
| | | | | | | | |   cpufreq: imx: fix NVMEM_IMX_OCOTP dependency
| | | | | | | | |   cpufreq: vexpress-spc: Add missing MODULE_ALIAS
| | | | | | | | |   cpufreq: scpi: Add missing MODULE_ALIAS
| | | | | | | | |   cpufreq: loongson1: Add missing MODULE_ALIAS
| | | | | | | | |   cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE
| | | | | | | | |   cpufreq: st: Add missing MODULE_DEVICE_TABLE
| | | | | | | | |   cpufreq: qcom: Add missing MODULE_DEVICE_TABLE
| | | | | | | | |   cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE
| | | | | | | | |   cpufreq: highbank: Add missing MODULE_DEVICE_TABLE
| | | | | | | | |   ...
| | * \ \ \ \ \ \ Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm (Rafael J. Wysocki, 2020-12-14, 16 files, -89/+159)
| | |\ \ \ \ \ \ \
| | | | | | | | | | Pull ARM cpufreq updates for 5.11-rc1 from Viresh Kumar:
| | | | | | | | | |
| | | | | | | | | | "This contains the following updates:
| | | | | | | | | |
| | | | | | | | | |  - Fix imx's NVMEM_IMX_OCOTP dependency (Arnd Bergmann).
| | | | | | | | | |
| | | | | | | | | |  - Add support for mt8167 and blacklist mt8516 (Fabien Parent).
| | | | | | | | | |
| | | | | | | | | |  - Some ->get() callback related cleanups to the tegra194 driver and
| | | | | | | | | |    some optimizations in tegra186 driver (Jon Hunter and Sumit Gupta).
| | | | | | | | | |
| | | | | | | | | |  - Power scale improvements to arm_scmi driver (Lukasz Luba).
| | | | | | | | | |
| | | | | | | | | |  - Add missing MODULE_DEVICE_TABLE and MODULE_ALIAS to several drivers
| | | | | | | | | |    (Pali Rohár).
| | | | | | | | | |
| | | | | | | | | |  - Fix error path in mediatek driver (Qinglang Miao).
| | | | | | | | | |
| | | | | | | | | |  - Fix memleak in ST's cpufreq driver (Yangtao Li)."
| | | | | | | | | |
| | | | | | | | | | * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (22 commits)
| | | | | | | | | |   cpufreq: arm_scmi: Discover the power scale in performance protocol
| | | | | | | | | |   firmware: arm_scmi: Add power_scale_mw_get() interface
| | | | | | | | | |   cpufreq: tegra194: Rename tegra194_get_speed_common function
| | | | | | | | | |   cpufreq: tegra194: Remove unnecessary frequency calculation
| | | | | | | | | |   cpufreq: tegra186: Simplify cluster information lookup
| | | | | | | | | |   cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning
| | | | | | | | | |   cpufreq: imx: fix NVMEM_IMX_OCOTP dependency
| | | | | | | | | |   cpufreq: vexpress-spc: Add missing MODULE_ALIAS
| | | | | | | | | |   cpufreq: scpi: Add missing MODULE_ALIAS
| | | | | | | | | |   cpufreq: loongson1: Add missing MODULE_ALIAS
| | | | | | | | | |   cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE
| | | | | | | | | |   cpufreq: st: Add missing MODULE_DEVICE_TABLE
| | | | | | | | | |   cpufreq: qcom: Add missing MODULE_DEVICE_TABLE
| | | | | | | | | |   cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE
| | | | | | | | | |   cpufreq: highbank: Add missing MODULE_DEVICE_TABLE
| | | | | | | | | |   cpufreq: ap806: Add missing MODULE_DEVICE_TABLE
| | | | | | | | | |   cpufreq: mediatek: add missing platform_driver_unregister() on error in mtk_cpufreq_driver_init
| | | | | | | | | |   cpufreq: tegra194: get consistent cpuinfo_cur_freq
| | | | | | | | | |   cpufreq: blacklist mt8516 in cpufreq-dt-platdev
| | | | | | | | | |   cpufreq: mediatek: Add support for mt8167
| | | | | | | | | |   ...
| | | * \ \ \ \ \ \ Merge branch 'cpufreq/scmi' into cpufreq/arm/linux-next (Viresh Kumar, 2020-12-08, 396 files, -24407/+2926)
| | | |\ \ \ \ \ \ \
| | | | * | | | | | | cpufreq: arm_scmi: Discover the power scale in performance protocol (Lukasz Luba, 2020-12-08, 1 file, -1/+3)
| | | | | | | | | | |
| | | | | | | | | | | Add a mechanism to discover the power scale present in the
| | | | | | | | | | | performance protocol for all domains. Provide this information to
| | | | | | | | | | | the Energy Model, where it can then be checked by other frameworks,
| | | | | | | | | | | e.g. thermal.
| | | | | | | | | | |
| | | | | | | | | | | Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
| | | | | | | | | | | Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
| | | | | | | | | | | Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
| | | | * | | | | | | firmware: arm_scmi: Add power_scale_mw_get() interface (Lukasz Luba, 2020-12-08, 1 file, -0/+8)
| | | | | |_|/ / / /
| | | | |/| | | | |
| | | | | | | | | | Add a new interface to the existing perf_ops and export the
| | | | | | | | | | information about the scale of the power values. This will be used
| | | | | | | | | | by the cpufreq driver and the Energy Model framework to set the scale
| | | | | | | | | | of the performance domains: milli-Watts or an abstract scale.
| | | | | | | | | |
| | | | | | | | | | Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
| | | | | | | | | | Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
| | | | | | | | | | Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
| | | | | | | | | | Acked-by: Sudeep Holla <sudeep.holla@arm.com>
| | | | | | | | | | Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
| | | | * | | | | | PM: EM: Add a flag indicating units of power values in Energy Model (Lukasz Luba, 2020-11-10, 2 files, -2/+3)
| | | | | |/ / / /
| | | | |/| | | |
| | | | | | | | | Different platforms and devices might use different scales for their
| | | | | | | | | power values. Kernel sub-systems might need to check whether all
| | | | | | | | | Energy Model (EM) devices are using the same scale. Address that
| | | | | | | | | issue by storing the information inside the EM for each device, so
| | | | | | | | | that the scales can be easily compared and the proper action
| | | | | | | | | triggered.
| | | | | | | | |
| | | | | | | | | Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
| | | | | | | | | Reviewed-by: Quentin Perret <qperret@google.com>
| | | | | | | | | Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
| | | | | | | | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
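The kind of check this commit enables can be sketched in userspace C. This is a hypothetical illustration, not the Energy Model API: `struct em_device`, its `milliwatts` field, and `em_same_power_scale()` are names invented for this example, and the two scales (milli-Watts versus an abstract scale) are the ones named in the neighbouring arm_scmi commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device Energy Model record. */
struct em_device {
	const char *name;
	bool milliwatts; /* true: milli-Watts, false: abstract scale */
};

/* Return true when every device in the array uses the same power scale. */
static bool em_same_power_scale(const struct em_device *devs, size_t count)
{
	size_t i;

	for (i = 1; i < count; i++) {
		if (devs[i].milliwatts != devs[0].milliwatts)
			return false;
	}
	return true;
}
```

A consumer that sums power values across devices could run such a check first and refuse to combine numbers when the scales differ.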