summaryrefslogtreecommitdiffstats
path: root/drivers/cpuidle (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'pmdomain-v6.12' of ↵Linus Torvalds2024-09-183-27/+30
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull pmdomain updates from Ulf Hansson: "pmdomain core: - Add support for s2idle for CPU PM domains on PREEMPT_RT - Add device managed version of dev_pm_domain_attach|detach_list() - Improve layout of the debugfs summary table pmdomain providers: - amlogic: Remove obsolete vpu domain driver - bcm: raspberrypi: Add support for devices used as wakeup-sources - imx: Fixup clock handling for imx93 at driver remove - rockchip: Add gating support for RK3576 - rockchip: Add support for RK3576 SoC - Some OF parsing simplifications - Some simplifications by using dev_err_probe() and guard() pmdomain consumers: - qcom/media/venus: Convert to the device managed APIs for PM domains cpuidle-psci: - Add support for s2idle/s2ram for the hierarchical topology on PREEMPT_RT - Some OF parsing simplifications" * tag 'pmdomain-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: (39 commits) pmdomain: core: Reduce debug summary table width pmdomain: core: Move mode_status_str() pmdomain: core: Fix "managed by" alignment in debug summary pmdomain: core: Harden inter-column space in debug summary pmdomain: rockchip: Add gating masks for rk3576 pmdomain: rockchip: Add gating support pmdomain: rockchip: Simplify dropping OF node reference pmdomain: mediatek: make use of dev_err_cast_probe() pmdomain: imx93-pd: drop the context variable "init_off" pmdomain: imx93-pd: don't unprepare clocks on driver remove pmdomain: imx93-pd: replace dev_err() with dev_err_probe() pmdomain: qcom: rpmpd: Simplify locking with guard() pmdomain: qcom: rpmhpd: Simplify locking with guard() pmdomain: qcom: cpr: Simplify locking with guard() pmdomain: qcom: cpr: Simplify with dev_err_probe() pmdomain: imx: gpcv2: Simplify with scoped for each OF child loop pmdomain: imx: gpc: Simplify with scoped for each OF child loop pmdomain: rockchip: SimplUlf Hanssonify locking with guard() pmdomain: rockchip: Simplify with scoped for each OF child loop pmdomain: qcom-cpr: Use scope based of_node_put() to simplify code. ...
| * cpuidle: dt_idle_genpd: Simplify with scoped for each OF child loopKrzysztof Kozlowski2024-08-201-10/+4
| | | | | | | | | | | | | | | | | | | | Use scoped for_each_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240816150931.142208-4-krzysztof.kozlowski@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * cpuidle: psci: Simplify with scoped for each OF child loopKrzysztof Kozlowski2024-08-201-5/+2
| | | | | | | | | | | | | | | | | | | | Use scoped for_each_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240816150931.142208-1-krzysztof.kozlowski@linaro.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * cpuidle: psci: Enable the hierarchical topology for s2idle on PREEMPT_RTUlf Hansson2024-08-051-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To enable the domain-idle-states to be used during s2idle on a PREEMPT_RT based configuration, let's allow the re-assignment of the ->enter_s2idle() callback to psci_enter_s2idle_domain_idle_state(). Similar to s2ram, let's leave the support for CPU hotplug outside PREEMPT_RT, as it's depending on using runtime PM. For s2idle, this means that an offline CPU's PM domain will remain powered-on. In practise this may lead to that a shallower idle-state than necessary gets selected, which shouldn't be an issue (besides wasting power). Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Raghavendra Kakarla <quic_rkakarla@quicinc.com> # qcm6490 with PREEMPT_RT set Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20240527142557.321610-8-ulf.hansson@linaro.org
| * cpuidle: psci: Enable the hierarchical topology for s2ram on PREEMPT_RTUlf Hansson2024-08-051-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hierarchical PM domain topology are currently disabled on a PREEMPT_RT based configuration. As a first step to enable it to be used, let's try to attach the CPU devices to their PM domains on PREEMPT_RT. In this way the syscore ops becomes available, allowing the PM domain topology to be managed during s2ram. For the moment let's leave the support for CPU hotplug outside PREEMPT_RT, as it's depending on using runtime PM. For s2ram, this isn't a problem as all CPUs are managed via the syscore ops. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Raghavendra Kakarla <quic_rkakarla@quicinc.com> # qcm6490 with PREEMPT_RT set Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20240527142557.321610-7-ulf.hansson@linaro.org
| * cpuidle: psci: Drop redundant assignment of CPUIDLE_FLAG_RCU_IDLEUlf Hansson2024-08-051-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | When using the hierarchical topology and PSCI OSI-mode we may end up overriding the deepest idle-state's ->enter|enter_s2idle() callbacks, but there is no point to also re-assign the CPUIDLE_FLAG_RCU_IDLE for the idle-state in question, as that has already been set when parsing the states from DT. See init_state_node(). Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Raghavendra Kakarla <quic_rkakarla@quicinc.com> # qcm6490 with PREEMPT_RT set Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20240527142557.321610-6-ulf.hansson@linaro.org
| * cpuidle: psci-domain: Enable system-wide suspend on PREEMPT_RTUlf Hansson2024-08-051-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The domain-idle-states are currently disabled on a PREEMPT_RT based configuration for the cpuidle-psci-domain. To enable them to be used for system-wide suspend and in particular during s2idle, let's set the GENPD_FLAG_RPM_ALWAYS_ON instead of GENPD_FLAG_ALWAYS_ON for the corresponding genpd provider. In this way, the runtime PM path remains disabled in genpd for its attached devices, while powering-on/off the PM domain during system-wide suspend becomes allowed. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Tested-by: Raghavendra Kakarla <quic_rkakarla@quicinc.com> # qcm6490 with PREEMPT_RT set Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20240527142557.321610-5-ulf.hansson@linaro.org
* | cpuidle: remove dead code from cpuidle_enter_state()Dhruva Gole2024-08-221-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Checking for index < 0 is useless because the find_deepest_state() function never really returns a negative value. Since this hasn't been reported in over 9 years it's dead code, so remove it. Signed-off-by: Dhruva Gole <d-gole@ti.com> Link: https://patch.msgid.link/20240821114250.1416421-1-d-gole@ti.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | cpuidle: riscv-sbi: Simplify with scoped for each OF child loopKrzysztof Kozlowski2024-08-201-5/+2
| | | | | | | | | | | | | | | | | | | | | | Use scoped for_each_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://patch.msgid.link/20240820094023.61155-2-krzysztof.kozlowski@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | cpuidle: riscv-sbi: Use scoped device node handling to fix missing of_node_putKrzysztof Kozlowski2024-08-201-14/+7
|/ | | | | | | | | | | | | Two return statements in sbi_cpuidle_dt_init_states() did not drop the OF node reference count. Solve the issue and simplify entire error handling with scoped/cleanup.h. Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver") Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Anup Patel <anup@brainfault.org> Link: https://patch.msgid.link/20240820094023.61155-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpuidle: teo: Don't count non-existent interceptsChristian Loehle2024-07-011-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When bailing out early, teo will not query the sleep length anymore since commit 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases") with an expected sleep_length_ns value of KTIME_MAX. This lead to state0 accumulating lots of 'intercepts' because the actually measured sleep length was < KTIME_MAX, so query the sleep length instead for teo to recognize if it still is in an intercept-likely scenario without alternating between the two modes. Fundamentally we can only do one of the two: 1. Skip sleep_length_ns query when we think intercept is likely. 2. Have accurate data if sleep_length_ns is actually intercepted when we believe it is currently intercepted. Previously teo did the former while this patch chooses the latter as the additional time it takes to query the sleep length was found to be negligible and the variants of option 1 (count all unknowns as misses or count all unknown as hits) had significant regressions (as misses had lots of too shallow idle state selections and as hits had terrible performance in intercept-heavy workloads). Fixes: 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases") Link: https://patch.msgid.link/c40acf72-010f-4a8b-80e4-33f133ba266b@arm.com Signed-off-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpuidle: teo: Remove recent intercepts metricChristian Loehle2024-06-281-63/+13
| | | | | | | | | | | | | | | | | | | | | | | | The logic for recent intercepts didn't work, there is an underflow of the 'recent' value that can be observed during boot already, which teo usually doesn't recover from, making the entire logic pointless. Furthermore the recent intercepts also were never reset, thus not actually being very 'recent'. Having underflowed 'recent' values lead to teo always acting as if we were in a scenario were expected sleep length based on timers is too high and it therefore unnecessarily selecting shallower states. Experiments show that the remaining 'intercept' logic is enough to quickly react to scenarios in which teo cannot rely on the timer expected sleep length. See also here: https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@arm.com/ Fixes: 77577558f25d ("cpuidle: teo: Rework most recent idle duration values treatment") Link: https://patch.msgid.link/20240628095955.34096-3-christian.loehle@arm.com Signed-off-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* Revert: "cpuidle: teo: Introduce util-awareness"Christian Loehle2024-06-281-105/+0
| | | | | | | | | | | | | | | | | | This reverts commit 9ce0f7c4bc64d820b02a1c53f7e8dba9539f942b. Util-awareness was reported to be too aggressive in selecting shallower states. Additionally a single threshold was found to not be suitable for reasoning about sleep length as, for all practical purposes, almost arbitrary sleep lengths are still possible for any load value. Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness") Link: https://patch.msgid.link/20240628095955.34096-2-christian.loehle@arm.com Reported-by: Qais Yousef <qyousef@layalina.io> Reported-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Qais Yousef <qyousef@layalina.io> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpuidle: governors: teo: Fix a typo in a commentAtul Kumar Pant2024-06-211-1/+1
| | | | | | | | "terget" -> "target" Signed-off-by: Atul Kumar Pant <atulpant.linux@gmail.com> Link: https://patch.msgid.link/20240616124025.16477-1-atulpant.linux@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpuidle: haltpoll: add missing MODULE_DESCRIPTION() macroJeff Johnson2024-06-141-0/+1
| | | | | | | | | | make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/cpuidle/cpuidle-haltpoll.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpuidle: menu: Cleanup after loadavg removalChristian Loehle2024-06-071-12/+5
| | | | | | | | | | The performance impact of loadavg was removed with commit a7fe5190c03f ("cpuidle: menu: Remove get_loadavg() from the performance multiplier") With only iowait remaining the description can be simplified, remove also the no longer needed includes. Signed-off-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* Merge tag 'pmdomain-v6.10' of ↵Linus Torvalds2024-05-163-23/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull pmdomain updates from Ulf Hansson: "pmdomain core: - Don't clear suspended_count at genpd_prepare() - Update the rejected/usage counters at system suspend too pmdomain providers: - ti-sci: Fix duplicate PD referrals - mediatek: Add MT8188 buck isolation setting - renesas: Add R-Car M3-W power-off delay quirk - renesas: Split R-Car M3-W and M3-W+ sub-drivers cpuidle-psci: - Update MAINTAINERS to set a git for DT IDLE PM DOMAIN/ARM PSCI PM DOMAIN - Update init level to core_initcall() - Drop superfluous wrappers psci_dt_attach|detach_cpu()" * tag 'pmdomain-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: ti-sci: Fix duplicate PD referrals pmdomain: core: Don't clear suspended_count at genpd_prepare() pmdomain: core: Update the rejected/usage counters at system suspend too pmdomain: renesas: rcar-sysc: Add R-Car M3-W power-off delay quirk pmdomain: renesas: rcar-sysc: Remove rcar_sysc_nullify() helper pmdomain: renesas: rcar-sysc: Split R-Car M3-W and M3-W+ sub-drivers pmdomain: renesas: rcar-sysc: Absorb rcar_sysc_ch into rcar_sysc_pd MAINTAINERS: Add a git for the DT IDLE PM DOMAIN MAINTAINERS: Add a git for the ARM PSCI PM DOMAIN cpuidle: psci: Update init level to core_initcall() cpuidle: psci: Drop superfluous wrappers psci_dt_attach|detach_cpu() pmdomain: mediatek: Add MT8188 buck isolation setting pmdomain: mediatek: scpsys: drop driver owner assignment
| * cpuidle: psci: Update init level to core_initcall()Maulik Shah2024-04-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Clients like regulators, interconnects and clocks depend on rpmh-rsc to vote on resources and rpmh-rsc depends on psci power-domains to complete probe. All of them are in core_initcall(). Change psci domain init level to core_initcall() to avoid probe defer from all of the above. Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com> Link: https://lore.kernel.org/r/20240217-init_level-v1-2-bde9e11f8317@quicinc.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
| * cpuidle: psci: Drop superfluous wrappers psci_dt_attach|detach_cpu()Ulf Hansson2024-04-043-22/+4
| | | | | | | | | | | | | | | | | | To simplify the code, let's drop psci_dt_attach|detach_cpu() and use the common dt_idle_attach|detach_cpu() directly instead. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Link: https://lore.kernel.org/r/20240228151139.2650258-1-ulf.hansson@linaro.org
* | Merge tag 'pm-6.10-rc1' of ↵Linus Torvalds2024-05-142-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These are mostly cpufreq updates, including a significant intel-pstate driver update and several amd-pstate improvements plus some updates of ARM cpufreq drivers, general fixes and cleanups. Also included are changes related to system sleep, power capping updates adding support for a new platform and a new hardware feature (among other things), a Samsung exynos-asv driver update allowing it to change its Energy Model after adjusting voltage, minor cpuidle and devfreq updates and a small documentation cleanup. Specifics: - Rework the handling of disabled turbo in the intel_pstate driver and make it update the maximum CPU frequency consistently regardless of the reason on top of a number of cleanups (Rafael Wysocki) - Add missing checks for NULL .exit() cpufreq driver callback to the cpufreq core (Viresh Kumar) - Prevent pulicy->max from going above the frequency QoS maximum value when cpufreq_frequency_table_verify() is used (Xuewen Yan) - Prevent a negative CPU number or frequency value from being printed if they are really large (Joshua Yeong) - Update MAINTAINERS entry for amd-pstate to add two new submaintainers and a designated reviewer (Huang Rui) - Clean up the amd-pstate driver and update its documentation (Gautham Shenoy) - Fix the highest frequency issue in the amd-pstate driver which limits performance (Perry Yuan) - Enable CPPC v2 for certain processors in the family 17H, as requested by TR40 processor users who expect improved performance and lower system temperature (Perry Yuan) - Change latency and delay values to be read from platform firmware firstly for more accurate timing (Perry Yuan) - A new quirk is introduced for supporting amd-pstate on legacy processors which either lack CPPC capability, or only only have CPPC v2 capability (Perry Yuan) - Sun50i cpufreq: Add support for opp_supported_hw, H616 platform and general cleanups (Andre Przywara, Martin Botka, Brandon Cheo Fusi, Dan Carpenter, Viresh Kumar) - CPPC cpufreq: Fix possible null pointer dereference (Aleksandr Mishin) - Eliminate uses of of_node_put() from cpufreq (Javier Carrasco, Shivani Gupta) - brcmstb-avs: ISO C90 forbids mixed declarations (Portia Stephens) - mediatek cpufreq: Add support for MT7988A (Sam Shih) - cpufreq-qcom-hw: Add SM4450 compatibles in DT bindings (Tengfei Fan) - Fix struct cpudata::epp_cached kernel-doc in the intel_pstate cpufreq driver (Jeff Johnson) - Fix kerneldoc description of ladder_do_selection() (Jeff Johnson) - Convert the cpuidle kirkwood driver to platform remove callback returning void (Yangtao Li) - Replace deprecated strncpy() with strscpy() in the hibernation core code (Justin Stitt) - Use %ps to simplify debug output in the core system-wide suspend and resume code (Len Brown) - Remove unnecessary else from device_init_wakeup() and make device_wakeup_disable() return void (Dhruva Gole) - Enable PMU support in the Intel TPMI RAPL driver (Zhang Rui) - Add support for ArrowLake-H platform to the Intel RAPL driver (Zhang Rui) - Avoid explicit cpumask allocation on stack in DTPM (Dawei Li) - Make the Samsung exynos-asv driver update the Energy Model after adjusting voltage on top of some preliminary changes of the OPP and Enery Model generic code (Lukasz Luba) - Remove a reference to a function that has been dropped from the power management documentation (Bjorn Helgaas) - Convert the platfrom remove callback to .remove_new for the exyno-nocp, exynos-ppmu, mtk-cci-devfreq, sun8i-a33-mbus, and rk3399_dmc devfreq drivers (Uwe Kleine-König) - Use DEFINE_SIMPLE_PM_OPS for exyno-bus.c driver (Anand Moon)" * tag 'pm-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (68 commits) PM / devfreq: exynos: Use DEFINE_SIMPLE_DEV_PM_OPS for PM functions PM / devfreq: rk3399_dmc: Convert to platform remove callback returning void PM / devfreq: sun8i-a33-mbus: Convert to platform remove callback returning void PM / devfreq: mtk-cci: Convert to platform remove callback returning void PM / devfreq: exynos-ppmu: Convert to platform remove callback returning void PM / devfreq: exynos-nocp: Convert to platform remove callback returning void cpufreq: amd-pstate: fix the highest frequency issue which limits performance cpufreq: intel_pstate: fix struct cpudata::epp_cached kernel-doc cpuidle: ladder: fix ladder_do_selection() kernel-doc powercap: intel_rapl_tpmi: Enable PMU support powercap: intel_rapl: Introduce APIs for PMU support PM: hibernate: replace deprecated strncpy() with strscpy() cpufreq: Fix up printing large CPU numbers and frequency values MAINTAINERS: cpufreq: amd-pstate: Add co-maintainers and reviewer cpufreq: amd-pstate: remove unused variable lowest_nonlinear_freq cpufreq: amd-pstate: fix code format problems cpufreq: amd-pstate: Add quirk for the pstate CPPC capabilities missing cppc_acpi: print error message if CPPC is unsupported cpufreq: amd-pstate: get transition delay and latency value from ACPI tables cpufreq: amd-pstate: Bail out if min/max/nominal_freq is 0 ...
| * | cpuidle: ladder: fix ladder_do_selection() kernel-docJeff Johnson2024-05-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | make C=1 reports: warning: Function parameter or struct member 'dev' not described in 'ladder_do_selection' Document 'dev' for this function. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | cpuidle: kirkwood: Convert to platform remove callback returning voidYangtao Li2024-04-231-3/+2
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is (mostly) ignored and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Yangtao Li <frank.li@vivo.com> Reviewed-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20230712094014.41787-1-frank.li@vivo.com
* / cpuidle: Avoid explicit cpumask allocation on stackDawei Li2024-04-241-10/+3
|/ | | | | | | | | | | | | | In general it's preferable to avoid placing cpumasks on the stack, as for large values of NR_CPUS these can consume significant amounts of stack space and make stack overflows more likely. Use cpumask_first_and_and() and cpumask_weight_and() to avoid the need for a temporary cpumask on the stack. Signed-off-by: Dawei Li <dawei.li@shingroup.cn> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240416085454.3547175-8-dawei.li@shingroup.cn
* Merge tag 'riscv-for-linus-6.9-mw2' of ↵Linus Torvalds2024-03-221-44/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for various vector-accelerated crypto routines - Hibernation is now enabled for portable kernel builds - mmap_rnd_bits_max is larger on systems with larger VAs - Support for fast GUP - Support for membarrier-based instruction cache synchronization - Support for the Andes hart-level interrupt controller and PMU - Some cleanups around unaligned access speed probing and Kconfig settings - Support for ACPI LPI and CPPC - Various cleanus related to barriers - A handful of fixes * tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits) riscv: Fix syscall wrapper for >word-size arguments crypto: riscv - add vector crypto accelerated AES-CBC-CTS crypto: riscv - parallelize AES-CBC decryption riscv: Only flush the mm icache when setting an exec pte riscv: Use kcalloc() instead of kzalloc() riscv/barrier: Add missing space after ',' riscv/barrier: Consolidate fence definitions riscv/barrier: Define RISCV_FULL_BARRIER riscv/barrier: Define __{mb,rmb,wmb} RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ cpufreq: Move CPPC configs to common Kconfig and add RISC-V ACPI: RISC-V: Add CPPC driver ACPI: Enable ACPI_PROCESSOR for RISC-V ACPI: RISC-V: Add LPI driver cpuidle: RISC-V: Move few functions to arch/riscv riscv: Introduce set_compat_task() in asm/compat.h riscv: Introduce is_compat_thread() into compat.h riscv: add compile-time test into is_compat_task() riscv: Replace direct thread flag check with is_compat_task() riscv: Improve arch_get_mmap_end() macro ...
| * cpuidle: RISC-V: Move few functions to arch/riscvSunil V L2024-03-201-44/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | To support ACPI Low Power Idle (LPI), few functions are required which are currently static functions in the DT based cpuidle driver. Hence, move them under arch/riscv so that ACPI driver also can use them. Since they are no longer static functions, append "riscv_" prefix to the function name. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20240118062930.245937-2-sunilvl@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* | Merge tag 'mm-stable-2024-03-13-20-04' of ↵Linus Torvalds2024-03-151-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...
| * | x86/mm: delete unused cpu argument to leave_mm()Yosry Ahmed2024-02-221-1/+1
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | The argument is unused since commit 3d28ebceaffa ("x86/mm: Rework lazy TLB to track the actual loaded mm"), delete it. Link: https://lkml.kernel.org/r/20240126080644.1714297-1-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | cpuidle: Avoid potential overflow in integer multiplicationC Cheng2024-02-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In detail: In C language, when you perform a multiplication operation, if both operands are of int type, the multiplication operation is performed on the int type, and then the result is converted to the target type. This means that if the product of int type multiplication exceeds the range that int type can represent, an overflow will occur even if you store the result in a variable of int64_t type. For a multiplication of two int values, it is better to use mul_u32_u32() rather than s->exit_latency_ns = s->exit_latency * NSEC_PER_USEC to avoid potential overflow happenning. Signed-off-by: C Cheng <C.Cheng@mediatek.com> Signed-off-by: Bo Ye <bo.ye@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> [ rjw: New subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | cpuidle: haltpoll: do not shrink guest poll_limit_ns below grow_startParshuram Sangle2024-02-121-2/+7
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While adjusting guest halt poll limit, grow block starts at guest_halt_poll_grow_start without taking intermediate values. Similar behavior is expected while shrinking the value. This avoids short interval values which are really not required. VCPU1 trace (guest_halt_poll_shrink equals 2): VCPU1 grow 10000 VCPU1 shrink 5000 VCPU1 shrink 2500 VCPU1 shrink 1250 VCPU1 shrink 625 VCPU1 shrink 312 VCPU1 shrink 156 VCPU1 shrink 78 VCPU1 shrink 39 VCPU1 shrink 19 VCPU1 shrink 9 VCPU1 shrink 4 Similar change is done in KVM halt poll flow with below patch: Link: https://lore.kernel.org/kvm/20211006133021.271905-3-sashal@kernel.org/ Co-developed-by: Rajendran Jaishankar <jaishankar.rajendran@intel.com> Signed-off-by: Rajendran Jaishankar <jaishankar.rajendran@intel.com> Signed-off-by: Parshuram Sangle <parshuram.sangle@intel.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpuidle: haltpoll: Do not enable interrupts when entering idleBorislav Petkov (AMD)2023-12-291-5/+4
| | | | | | | | | | | | | | | | | The cpuidle drivers' ->enter() methods are supposed to be IRQ invariant: 5e26aa933911 ("cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() calls") bb7b11258561 ("cpuidle: Move IRQ state validation") Do that in the haltpoll driver too. Fixes: 5e26aa933911 ("cpuidle/poll: Ensure IRQs stay disabled after cpuidle_state::enter() calls") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218245 Reported-by: <forza@tnonline.net> Tested-by: <forza@tnonline.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* cpuidle: dt: Replace deprecated strncpy() with strscpy()Justin Stitt2023-09-291-2/+2
| | | | | | | | | | | | | | | | | | | `strncpy` is deprecated for use on NUL-terminated destination strings [1]. We should prefer more robust and less ambiguous string interfaces. A suitable replacement is `strscpy` [2] due to the fact that it guarantees NUL-termination on the destination buffer. With this, we can also drop the now unnecessary `CPUIDLE_(NAME|DESC)_LEN - 1` pieces. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20230913-strncpy-drivers-cpuidle-dt_idle_states-c-v1-1-d16a0dbe5658@google.com Signed-off-by: Kees Cook <keescook@chromium.org>
* Merge tag 'powerpc-6.6-1' of ↵Linus Torvalds2023-08-311-7/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the configured SMT state when hotplugging CPUs into the system - Combine final TLB flush and lazy TLB mm shootdown IPIs when using the Radix MMU to avoid a broadcast TLBIE flush on exit - Drop the exclusion between ptrace/perf watchpoints, and drop the now unused associated arch hooks - Add support for the "nohlt" command line option to disable CPU idle - Add support for -fpatchable-function-entry for ftrace, with GCC >= 13.1 - Rework memory block size determination, and support 256MB size on systems with GPUs that have hotpluggable memory - Various other small features and fixes Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar, Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain, Xiongfeng Wang, Yuan Tan, Zhang Rui, and Zheng Zengkai. * tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (135 commits) macintosh/ams: linux/platform_device.h is needed powerpc/xmon: Reapply "Relax frame size for clang" powerpc/mm/book3s64: Use 256M as the upper limit with coherent device memory attached powerpc/mm/book3s64: Fix build error with SPARSEMEM disabled powerpc/iommu: Fix notifiers being shared by PCI and VIO buses powerpc/mpc5xxx: Add missing fwnode_handle_put() powerpc/config: Disable SLAB_DEBUG_ON in skiroot powerpc/pseries: Remove unused hcall tracing instruction powerpc/pseries: Fix hcall tracepoints with JUMP_LABEL=n powerpc: dts: add missing space before { powerpc/eeh: Use pci_dev_id() to simplify the code powerpc/64s: Move CPU -mtune options into Kconfig powerpc/powermac: Fix unused function warning powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT powerpc: Don't include lppaca.h in paca.h powerpc/pseries: Move hcall_vphn() prototype into vphn.h powerpc/pseries: Move VPHN constants into vphn.h cxl: Drop unused detach_spa() powerpc: Drop zalloc_maybe_bootmem() powerpc/powernv: Use struct opal_prd_msg in more places ...
| * powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPTRussell Currey2023-08-241-7/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lppaca_shared_proc() takes a pointer to the lppaca which is typically accessed through get_lppaca(). With DEBUG_PREEMPT enabled, this leads to checking if preemption is enabled, for example: BUG: using smp_processor_id() in preemptible [00000000] code: grep/10693 caller is lparcfg_data+0x408/0x19a0 CPU: 4 PID: 10693 Comm: grep Not tainted 6.5.0-rc3 #2 Call Trace: dump_stack_lvl+0x154/0x200 (unreliable) check_preemption_disabled+0x214/0x220 lparcfg_data+0x408/0x19a0 ... This isn't actually a problem however, as it does not matter which lppaca is accessed, the shared proc state will be the same. vcpudispatch_stats_procfs_init() already works around this by disabling preemption, but the lparcfg code does not, erroring any time /proc/powerpc/lparcfg is accessed with DEBUG_PREEMPT enabled. Instead of disabling preemption on the caller side, rework lppaca_shared_proc() to not take a pointer and instead directly access the lppaca, bypassing any potential preemption checks. Fixes: f13c13a00512 ("powerpc: Stop using non-architected shared_proc field in lppaca") Signed-off-by: Russell Currey <ruscur@russell.cc> [mpe: Rework to avoid needing a definition in paca.h and lppaca.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230823055317.751786-4-mpe@ellerman.id.au
* | Merge branches 'pm-cpuidle' and 'pm-cpufreq'Rafael J. Wysocki2023-08-253-111/+203
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge CPU power management updates for 6.6-rc1: - Rework the menu and teo cpuidle governors to avoid calling tick_nohz_get_sleep_length(), which is likely to become quite expensive going forward, too often and improve making decisions regarding whether or not to stop the scheduler tick in the teo governor (Rafael Wysocki). - Improve the performance of cpufreq_stats_create_table() in some cases (Liao Chang). - Fix two issues in the amd-pstate-ut cpufreq driver (Swapnil Sapkal). - Use clamp() helper macro to improve the code readability in cpufreq_verify_within_limits() (Liao Chang). - Set stale CPU frequency to minimum in intel_pstate (Doug Smythies). * pm-cpuidle: cpuidle: teo: Avoid unnecessary variable assignments cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases cpuidle: teo: Gather statistics regarding whether or not to stop the tick cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfront cpuidle: teo: Drop utilized from struct teo_cpu cpuidle: teo: Avoid stopping the tick unnecessarily when bailing out cpuidle: teo: Update idle duration estimate when choosing shallower state * pm-cpufreq: cpufreq: amd-pstate-ut: Fix kernel panic when loading the driver cpufreq: amd-pstate-ut: Remove module parameter access cpufreq: Use clamp() helper macro to improve the code readability cpufreq: intel_pstate: set stale CPU frequency to minimum cpufreq: stats: Improve the performance of cpufreq_stats_create_table()
| * | cpuidle: teo: Avoid unnecessary variable assignmentsRafael J. Wysocki2023-08-231-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Notice that it is not necessary to assign tick_intercept_sum in every iteration of the first loop over idle states in teo_select(), because the intercept_sum value does not change after the assignment in a given iteration of the loop, so its value after the last iteration of the loop can be used for computing the tick_intercept_sum value directly. Modify the code accordingly. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some casesRafael J. Wysocki2023-08-173-34/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because the cost of calling tick_nohz_get_sleep_length() may increase in the future, reorder the code in menu_select() so it first uses the statistics to determine the expected idle duration. If that value is higher than RESIDENCY_THRESHOLD_NS, tick_nohz_get_sleep_length() will be called to obtain the time till the closest timer and refine the idle duration prediction if necessary. This causes the governor to always take the full overhead of get_typical_interval() with the assumption that the cost will be amortized by skipping the tick_nohz_get_sleep_length() call in the cases when the predicted idle duration is relatively very small. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Doug Smythies <dsmythies@telus.net>
| * | cpuidle: teo: Gather statistics regarding whether or not to stop the tickRafael J. Wysocki2023-08-091-1/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if the target residency of the deepest idle state is less than the tick period length, which is quite likely for HZ=100, and the deepest idle state is about to be selected by the TEO idle governor, the decision on whether or not to stop the scheduler tick is based entirely on the time till the closest timer. This is often insufficient, because timers may not be in heavy use and there may be a plenty of other CPU wakeup events between the deepest idle state's target residency and the closest tick. Allow the governor to count those events by making the deepest idle state's bin effectively end at TICK_NSEC and introducing an additional "bin" for collecting "hit" events (ie. the ones in which the measured idle duration falls into the same bin as the time till the closest timer) with idle duration values past TICK_NSEC. This way the "intercepts" metric for the deepest idle state's bin becomes nonzero in general, and so it can influence the decision on whether or not to stop the tick possibly increasing the governor's accuracy in that respect. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Tested-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
| * | cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some casesRafael J. Wysocki2023-08-091-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make teo_select() avoid calling tick_nohz_get_sleep_length() if the candidate idle state to return is state 0 or if state 0 is a polling one and the target residency of the current candidate one is below a certain threshold, in which cases it may be assumed that the CPU will be woken up immediately by a non-timer wakeup source and the timers are not likely to matter. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Tested-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
| * | cpuidle: teo: Do not call tick_nohz_get_sleep_length() upfrontRafael J. Wysocki2023-08-091-61/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because the cost of calling tick_nohz_get_sleep_length() may increase in the future, reorder the code in teo_select() so it first uses the statistics to pick up a candidate idle state and applies the utilization heuristic to it and only then calls tick_nohz_get_sleep_length() to obtain the sleep length value and refine the selection if necessary. This change by itself does not cause tick_nohz_get_sleep_length() to be called less often, but it prepares the code for subsequent changes that will do so. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Tested-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
| * | cpuidle: teo: Drop utilized from struct teo_cpuRafael J. Wysocki2023-08-031-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because the utilized field in struct teo_cpu is only used locally in teo_select(), replace it with a local variable in that function. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
| * | cpuidle: teo: Avoid stopping the tick unnecessarily when bailing outRafael J. Wysocki2023-08-031-23/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When teo_select() is going to return early in some special cases, make it avoid stopping the tick if the idle state to be returned is shallow. In particular, never stop the tick if state 0 is to be returned. Link: https://lore.kernel.org/linux-pm/CAJZ5v0jJxHj65r2HXBTd3wfbZtsg=_StzwO1kA5STDnaPe_dWA@mail.gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
| * | cpuidle: teo: Update idle duration estimate when choosing shallower stateRafael J. Wysocki2023-08-031-10/+30
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TEO governor takes CPU utilization into account by refining idle state selection when the utilization is above a certain threshold. This is done by choosing an idle state shallower than the previously selected one. However, when doing this, the idle duration estimate needs to be adjusted so as to prevent the scheduler tick from being stopped when the candidate idle state is shallow, which may lead to excessive energy usage if the CPU is not woken up quickly enough going forward. Moreover, if the scheduler tick has been stopped already and the new idle duration estimate is too small, the replacement candidate state cannot be used. Modify the relevant code to take the above observations into account. Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness") Link: https://lore.kernel.org/linux-pm/CAJZ5v0jJxHj65r2HXBTd3wfbZtsg=_StzwO1kA5STDnaPe_dWA@mail.gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-and-tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
* | cpuidle: psci: Move enabling OSI mode after power domains creationMaulik Shah2023-08-081-26/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A switch from OSI to PC mode is only possible if all CPUs other than the calling one are OFF, either through a call to CPU_OFF or not yet booted. Currently OSI mode is enabled before power domains are created. In cases where CPUidle states are not using hierarchical CPU topology the bail out path tries to switch back to PC mode which gets denied by firmware since other CPUs are online at this point and creates inconsistent state as firmware is in OSI mode and Linux in PC mode. This change moves enabling OSI mode after power domains are created, this would makes sure that hierarchical CPU topology is used before switching firmware to OSI mode. Cc: stable@vger.kernel.org Fixes: 70c179b49870 ("cpuidle: psci: Allow PM domain to be initialized even if no OSI mode") Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* | cpuidle: dt_idle_genpd: Add helper function to remove genpd topologyMaulik Shah2023-08-082-0/+31
|/ | | | | | | | | | | | Genpd parent and child domain topology created using dt_idle_pd_init_topology() needs to be removed during error cases. Add new helper function dt_idle_pd_remove_topology() for same. Cc: stable@vger.kernel.org Reviewed-by: Ulf Hanssson <ulf.hansson@linaro.org> Signed-off-by: Maulik Shah <quic_mkshah@quicinc.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* cpuidle: Use local_clock_noinstr()Peter Zijlstra2023-06-052-6/+6
| | | | | | | | | | With the introduction of local_clock_noinstr(), local_clock() itself is no longer marked noinstr, use the correct function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Tested-by: Michael Kelley <mikelley@microsoft.com> # Hyper-V Link: https://lore.kernel.org/r/20230519102716.045980863@infradead.org
* RISC-V: Align SBI probe implementation with specAndrew Jones2023-04-291-1/+1
| | | | | | | | | | | | | | | | sbi_probe_extension() is specified with "Returns 0 if the given SBI extension ID (EID) is not available, or 1 if it is available unless defined as any other non-zero value by the implementation." Additionally, sbiret.value is a long. Fix the implementation to ensure any nonzero long value is considered a success, rather than only positive int values. Fixes: b9dcd9e41587 ("RISC-V: Add basic support for SBI v0.2") Signed-off-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230427163626.101042-1-ajones@ventanamicro.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
* Merge tag 'powerpc-6.4-1' of ↵Linus Torvalds2023-04-291-14/+14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add support for building the kernel using PC-relative addressing on Power10. - Allow HV KVM guests on Power10 to use prefixed instructions. - Unify support for the P2020 CPU (85xx) into a single machine description. - Always build the 64-bit kernel with 128-bit long double. - Drop support for several obsolete 2000's era development boards as identified by Paul Gortmaker. - A series fixing VFIO on Power since some generic changes. - Various other small features and fixes. Thanks to Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu, Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel Stanley, Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro Yamada, Michael Neuling, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin, Nick Desaulniers, Nysal Jan K.A, Pali Rohár, Paul Gortmaker, Paul Mackerras, Petr Vaněk, Randy Dunlap, Rob Herring, Sachin Sant, Sean Christopherson, Segher Boessenkool, and Timothy Pearson. * tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits) powerpc/64s: Disable pcrel code model on Clang powerpc: Fix merge conflict between pcrel and copy_thread changes powerpc/configs/powernv: Add IGB=y powerpc/configs/64s: Drop JFS Filesystem powerpc/configs/64s: Use EXT4 to mount EXT2 filesystems powerpc/configs: Make pseries_defconfig an alias for ppc64le_guest powerpc/configs: Make pseries_le an alias for ppc64le_guest powerpc/configs: Incorporate generic kvm_guest.config into guest configs powerpc/configs: Add IBMVETH=y and IBMVNIC=y to guest configs powerpc/configs/64s: Enable Device Mapper options powerpc/configs/64s: Enable PSTORE powerpc/configs/64s: Enable VLAN support powerpc/configs/64s: Enable BLK_DEV_NVME powerpc/configs/64s: Drop REISERFS powerpc/configs/64s: Use SHA512 for module signatures powerpc/configs/64s: Enable IO_STRICT_DEVMEM powerpc/configs/64s: Enable SCHEDSTATS powerpc/configs/64s: Enable DEBUG_VM & other options powerpc/configs/64s: Enable EMULATED_STATS powerpc/configs/64s: Enable KUNIT and most tests ...
| * cpuidle: pseries: Mark ->enter() functions as __cpuidleMichael Ellerman2023-04-201-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Code in the idle path is not allowed to be instrumented because RCU is disabled, see commit 0e985e9d2286 ("cpuidle: Add comments about noinstr/__cpuidle usage"). Mark the cpuidle ->enter() callbacks as __cpuidle and use the raw_local_irq_*() routines to ensure that is the case. Reported-by: Sachin Sant <sachinp@linux.ibm.com> Link: https://lore.kernel.org/all/4C073F6A-C812-4C4A-BB7A-ECD10B75FB88@linux.ibm.com/ Suggested-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230406144535.3786008-3-mpe@ellerman.id.au
* | Merge tag 'driver-core-6.4-rc1' of ↵Linus Torvalds2023-04-273-5/+12
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the large set of driver core changes for 6.4-rc1. Once again, a busy development cycle, with lots of changes happening in the driver core in the quest to be able to move "struct bus" and "struct class" into read-only memory, a task now complete with these changes. This will make the future rust interactions with the driver core more "provably correct" as well as providing more obvious lifetime rules for all busses and classes in the kernel. The changes required for this did touch many individual classes and busses as many callbacks were changed to take const * parameters instead. All of these changes have been submitted to the various subsystem maintainers, giving them plenty of time to review, and most of them actually did so. Other than those changes, included in here are a small set of other things: - kobject logging improvements - cacheinfo improvements and updates - obligatory fw_devlink updates and fixes - documentation updates - device property cleanups and const * changes - firwmare loader dependency fixes. All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits) device property: make device_property functions take const device * driver core: update comments in device_rename() driver core: Don't require dynamic_debug for initcall_debug probe timing firmware_loader: rework crypto dependencies firmware_loader: Strip off \n from customized path zram: fix up permission for the hot_add sysfs file cacheinfo: Add use_arch[|_cache]_info field/function arch_topology: Remove early cacheinfo error message if -ENOENT cacheinfo: Check cache properties are present in DT cacheinfo: Check sib_leaf in cache_leaves_are_shared() cacheinfo: Allow early level detection when DT/ACPI info is missing/broken cacheinfo: Add arm64 early level initializer implementation cacheinfo: Add arch specific early level initializer tty: make tty_class a static const structure driver core: class: remove struct class_interface * from callbacks driver core: class: mark the struct class in struct class_interface constant driver core: class: make class_register() take a const * driver core: class: mark class_release() as taking a const * driver core: remove incorrect comment for device_create* MIPS: vpe-cmp: remove module owner pointer from struct class usage. ...
| * \ Merge 6.3-rc5 into driver-core-nextGreg Kroah-Hartman2023-04-031-1/+2
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | We need the fixes in here for testing, as well as the driver core changes for documentation updates to build on. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>