summaryrefslogtreecommitdiffstats
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2020-06-02 22:17:23 +0200
committerLinus Torvalds <torvalds@linux-foundation.org>2020-06-02 22:17:23 +0200
commit355ba37d756c38962fe9bb616f5f48eb12a7e11e (patch)
tree74bb0fc617a9a9fb5660454c205bf3d08d95896a /Documentation/admin-guide
parentMerge tag 'platform-drivers-x86-v5.8-1' of git://git.infradead.org/linux-plat... (diff)
parentMerge branches 'pm-devfreq', 'powercap', 'pm-docs' and 'pm-tools' (diff)
downloadlinux-355ba37d756c38962fe9bb616f5f48eb12a7e11e.tar.xz
linux-355ba37d756c38962fe9bb616f5f48eb12a7e11e.zip
Merge tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki: "These rework the system-wide PM driver flags, make runtime switching of cpuidle governors easier, improve the user space hibernation interface code, add intel-speed-select interface documentation, add more debug messages to the ACPI code handling suspend to idle, update the cpufreq core and drivers, fix a minor issue in the cpuidle core and update two cpuidle drivers, improve the PM-runtime framework, update the Intel RAPL power capping driver, update devfreq core and drivers, and clean up the cpupower utility. Specifics: - Rework the system-wide PM driver flags to make them easier to understand and use and update their documentation (Rafael Wysocki, Alan Stern). - Allow cpuidle governors to be switched at run time regardless of the kernel configuration and update the related documentation accordingly (Hanjun Guo). - Improve the resume device handling in the user space hibernarion interface code (Domenico Andreoli). - Document the intel-speed-select sysfs interface (Srinivas Pandruvada). - Make the ACPI code handing suspend to idle print more debug messages to help diagnose issues with it (Rafael Wysocki). - Fix a helper routine in the cpufreq core and correct a typo in the struct cpufreq_driver kerneldoc comment (Rafael Wysocki, Wang Wenhu). - Update cpufreq drivers: - Make the intel_pstate driver start in the passive mode by default on systems without HWP (Rafael Wysocki). - Add i.MX7ULP support to the imx-cpufreq-dt driver and add i.MX7ULP to the cpufreq-dt-platdev blacklist (Peng Fan). - Convert the qoriq cpufreq driver to a platform one, make the platform code create a suitable device object for it and add platform dependencies to it (Mian Yousaf Kaukab, Geert Uytterhoeven). - Fix wrong compatible binding in the qcom driver (Ansuel Smith). - Build the omap driver by default for ARCH_OMAP2PLUS (Anders Roxell). - Add r8a7742 SoC support to the dt cpufreq driver (Lad Prabhakar). - Update cpuidle core and drivers: - Fix three reference count leaks in error code paths in the cpuidle core (Qiushi Wu). - Convert Qualcomm SPM to a generic cpuidle driver (Stephan Gerhold). - Fix up the execution order when entering a domain idle state in the PSCI driver (Ulf Hansson). - Fix a reference counting issue related to clock management and clean up two oddities in the PM-runtime framework (Rafael Wysocki, Andy Shevchenko). - Add ElkhartLake support to the Intel RAPL power capping driver and remove an unused local MSR definition from it (Jacob Pan, Sumeet Pawnikar). - Update devfreq core and drivers: - Replace strncpy() with strscpy() in the devfreq core and use lockdep asserts instead of manual checks for a locked mutex in it (Dmitry Osipenko, Krzysztof Kozlowski). - Add a generic imx bus scaling driver and make it register an interconnect device (Leonard Crestez, Gustavo A. R. Silva). - Make the cpufreq notifier in the tegra30 driver take boosting into account and delete an unuseful error message from that driver (Dmitry Osipenko, Markus Elfring). - Remove unneeded semicolon from the cpupower code (Zou Wei)" * tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (51 commits) cpuidle: Fix three reference count leaks PM: runtime: Replace pm_runtime_callbacks_present() PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR PM / devfreq: Replace strncpy with strscpy PM / devfreq: imx: Register interconnect device PM / devfreq: Add generic imx bus scaling driver PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe() PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting PM: hibernate: Restrict writes to the resume device PM: runtime: clk: Fix clk_pm_runtime_get() error path cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver ACPI: EC: PM: s2idle: Extend GPE dispatching debug message ACPI: PM: s2idle: Print type of wakeup debug messages powercap: RAPL: remove unused local MSR define PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend() Documentation: admin-guide: pm: Document intel-speed-select PM: hibernate: Split off snapshot dev option PM: hibernate: Incorporate concurrency handling Documentation: ABI: make current_governer_ro as a candidate for removal ...
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/pm/cpuidle.rst20
-rw-r--r--Documentation/admin-guide/pm/intel-speed-select.rst917
-rw-r--r--Documentation/admin-guide/pm/intel_pstate.rst32
-rw-r--r--Documentation/admin-guide/pm/working-state.rst1
4 files changed, 946 insertions, 24 deletions
diff --git a/Documentation/admin-guide/pm/cpuidle.rst b/Documentation/admin-guide/pm/cpuidle.rst
index 5605cc6f9560..a96a423e3779 100644
--- a/Documentation/admin-guide/pm/cpuidle.rst
+++ b/Documentation/admin-guide/pm/cpuidle.rst
@@ -159,17 +159,15 @@ governor uses that information depends on what algorithm is implemented by it
and that is the primary reason for having more than one governor in the
``CPUIdle`` subsystem.
-There are three ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_
-and ``ladder``. Which of them is used by default depends on the configuration
-of the kernel and in particular on whether or not the scheduler tick can be
-`stopped by the idle loop <idle-cpus-and-tick_>`_. It is possible to change the
-governor at run time if the ``cpuidle_sysfs_switch`` command line parameter has
-been passed to the kernel, but that is not safe in general, so it should not be
-done on production systems (that may change in the future, though). The name of
-the ``CPUIdle`` governor currently used by the kernel can be read from the
-:file:`current_governor_ro` (or :file:`current_governor` if
-``cpuidle_sysfs_switch`` is present in the kernel command line) file under
-:file:`/sys/devices/system/cpu/cpuidle/` in ``sysfs``.
+There are four ``CPUIdle`` governors available, ``menu``, `TEO <teo-gov_>`_,
+``ladder`` and ``haltpoll``. Which of them is used by default depends on the
+configuration of the kernel and in particular on whether or not the scheduler
+tick can be `stopped by the idle loop <idle-cpus-and-tick_>`_. Available
+governors can be read from the :file:`available_governors`, and the governor
+can be changed at runtime. The name of the ``CPUIdle`` governor currently
+used by the kernel can be read from the :file:`current_governor_ro` or
+:file:`current_governor` file under :file:`/sys/devices/system/cpu/cpuidle/`
+in ``sysfs``.
Which ``CPUIdle`` driver is used, on the other hand, usually depends on the
platform the kernel is running on, but there are platforms with more than one
diff --git a/Documentation/admin-guide/pm/intel-speed-select.rst b/Documentation/admin-guide/pm/intel-speed-select.rst
new file mode 100644
index 000000000000..b2ca601c21c6
--- /dev/null
+++ b/Documentation/admin-guide/pm/intel-speed-select.rst
@@ -0,0 +1,917 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============================================================
+Intel(R) Speed Select Technology User Guide
+============================================================
+
+The Intel(R) Speed Select Technology (Intel(R) SST) provides a powerful new
+collection of features that give more granular control over CPU performance.
+With Intel(R) SST, one server can be configured for power and performance for a
+variety of diverse workload requirements.
+
+Refer to the links below for an overview of the technology:
+
+- https://www.intel.com/content/www/us/en/architecture-and-technology/speed-select-technology-article.html
+- https://builders.intel.com/docs/networkbuilders/intel-speed-select-technology-base-frequency-enhancing-performance.pdf
+
+These capabilities are further enhanced in some of the newer generations of
+server platforms where these features can be enumerated and controlled
+dynamically without pre-configuring via BIOS setup options. This dynamic
+configuration is done via mailbox commands to the hardware. One way to enumerate
+and configure these features is by using the Intel Speed Select utility.
+
+This document explains how to use the Intel Speed Select tool to enumerate and
+control Intel(R) SST features. This document gives example commands and explains
+how these commands change the power and performance profile of the system under
+test. Using this tool as an example, customers can replicate the messaging
+implemented in the tool in their production software.
+
+intel-speed-select configuration tool
+======================================
+
+Most Linux distribution packages may include the "intel-speed-select" tool. If not,
+it can be built by downloading the Linux kernel tree from kernel.org. Once
+downloaded, the tool can be built without building the full kernel.
+
+From the kernel tree, run the following commands::
+
+# cd tools/power/x86/intel-speed-select/
+# make
+# make install
+
+Getting Help
+------------
+
+To get help with the tool, execute the command below::
+
+# intel-speed-select --help
+
+The top-level help describes arguments and features. Notice that there is a
+multi-level help structure in the tool. For example, to get help for the feature "perf-profile"::
+
+# intel-speed-select perf-profile --help
+
+To get help on a command, another level of help is provided. For example for the command info "info"::
+
+# intel-speed-select perf-profile info --help
+
+Summary of platform capability
+------------------------------
+To check the current platform and driver capaibilities, execute::
+
+#intel-speed-select --info
+
+For example on a test system::
+
+ # intel-speed-select --info
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Platform: API version : 1
+ Platform: Driver version : 1
+ Platform: mbox supported : 1
+ Platform: mmio supported : 1
+ Intel(R) SST-PP (feature perf-profile) is supported
+ TDP level change control is unlocked, max level: 4
+ Intel(R) SST-TF (feature turbo-freq) is supported
+ Intel(R) SST-BF (feature base-freq) is not supported
+ Intel(R) SST-CP (feature core-power) is supported
+
+Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP)
+------------------------------------------------------------------------
+
+This feature allows configuration of a server dynamically based on workload
+performance requirements. This helps users during deployment as they do not have
+to choose a specific server configuration statically. This Intel(R) Speed Select
+Technology - Performance Profile (Intel(R) SST-PP) feature introduces a mechanism
+that allows multiple optimized performance profiles per system. Each profile
+defines a set of CPUs that need to be online and rest offline to sustain a
+guaranteed base frequency. Once the user issues a command to use a specific
+performance profile and meet CPU online/offline requirement, the user can expect
+a change in the base frequency dynamically. This feature is called
+"perf-profile" when using the Intel Speed Select tool.
+
+Number or performance levels
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+There can be multiple performance profiles on a system. To get the number of
+profiles, execute the command below::
+
+ # intel-speed-select perf-profile get-config-levels
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ get-config-levels:4
+ package-1
+ die-0
+ cpu-14
+ get-config-levels:4
+
+On this system under test, there are 4 performance profiles in addition to the
+base performance profile (which is performance level 0).
+
+Lock/Unlock status
+~~~~~~~~~~~~~~~~~~
+
+Even if there are multiple performance profiles, it is possible that that they
+are locked. If they are locked, users cannot issue a command to change the
+performance state. It is possible that there is a BIOS setup to unlock or check
+with your system vendor.
+
+To check if the system is locked, execute the following command::
+
+ # intel-speed-select perf-profile get-lock-status
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ get-lock-status:0
+ package-1
+ die-0
+ cpu-14
+ get-lock-status:0
+
+In this case, lock status is 0, which means that the system is unlocked.
+
+Properties of a performance level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get properties of a specific performance level (For example for the level 0, below), execute the command below::
+
+ # intel-speed-select perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ cpu-count:28
+ enable-cpu-mask:000003ff,f0003fff
+ enable-cpu-list:0,1,2,3,4,5,6,7,8,9,10,11,12,13,28,29,30,31,32,33,34,35,36,37,38,39,40,41
+ thermal-design-power-ratio:26
+ base-frequency(MHz):2600
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:disabled
+ ...
+ ...
+
+Here -l option is used to specify a performance level.
+
+If the option -l is omitted, then this command will print information about all
+the performance levels. The above command is printing properties of the
+performance level 0.
+
+For this performance profile, the list of CPUs displayed by the
+"enable-cpu-mask/enable-cpu-list" at the max can be "online." When that
+condition is met, then base frequency of 2600 MHz can be maintained. To
+understand more, execute "intel-speed-select perf-profile info" for performance
+level 4::
+
+ # intel-speed-select perf-profile info -l 4
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-4
+ cpu-count:28
+ enable-cpu-mask:000000fa,f0000faf
+ enable-cpu-list:0,1,2,3,5,7,8,9,10,11,28,29,30,31,33,35,36,37,38,39
+ thermal-design-power-ratio:28
+ base-frequency(MHz):2800
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:unsupported
+ ...
+ ...
+
+There are fewer CPUs in the "enable-cpu-mask/enable-cpu-list". Consequently, if
+the user only keeps these CPUs online and the rest "offline," then the base
+frequency is increased to 2.8 GHz compared to 2.6 GHz at performance level 0.
+
+Get current performance level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get the current performance level, execute::
+
+ # intel-speed-select perf-profile get-config-current-level
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ get-config-current_level:0
+
+First verify that the base_frequency displayed by the cpufreq sysfs is correct::
+
+ # cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
+ 2600000
+
+This matches the base-frequency (MHz) field value displayed from the
+"perf-profile info" command for performance level 0(cpufreq frequency is in
+KHz).
+
+To check if the average frequency is equal to the base frequency for a 100% busy
+workload, disable turbo::
+
+# echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
+
+Then runs a busy workload on all CPUs, for example::
+
+#stress -c 64
+
+To verify the base frequency, run turbostat::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+
+ Package Core CPU Bzy_MHz
+ - - 2600
+ 0 0 0 2600
+ 0 1 1 2600
+ 0 2 2 2600
+ 0 3 3 2600
+ 0 4 4 2600
+ . . . .
+
+
+Changing performance level
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To the change the performance level to 4, execute::
+
+ # intel-speed-select -d perf-profile set-config-level -l 4 -o
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile
+ set_tdp_level:success
+
+In the command above, "-o" is optional. If it is specified, then it will also
+offline CPUs which are not present in the enable_cpu_mask for this performance
+level.
+
+Now if the base_frequency is checked::
+
+ #cat /sys/devices/system/cpu/cpu0/cpufreq/base_frequency
+ 2800000
+
+Which shows that the base frequency now increased from 2600 MHz at performance
+level 0 to 2800 MHz at performance level 4. As a result, any workload, which can
+use fewer CPUs, can see a boost of 200 MHz compared to performance level 0.
+
+Check presence of other Intel(R) SST features
+---------------------------------------------
+
+Each of the performance profiles also specifies weather there is support of
+other two Intel(R) SST features (Intel(R) Speed Select Technology - Base Frequency
+(Intel(R) SST-BF) and Intel(R) Speed Select Technology - Turbo Frequency (Intel
+SST-TF)).
+
+For example, from the output of "perf-profile info" above, for level 0 and level
+4:
+
+For level 0::
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:disabled
+
+For level 4::
+ speed-select-turbo-freq:disabled
+ speed-select-base-freq:unsupported
+
+Given these results, the "speed-select-base-freq" (Intel(R) SST-BF) in level 4
+changed from "disabled" to "unsupported" compared to performance level 0.
+
+This means that at performance level 4, the "speed-select-base-freq" feature is
+not supported. However, at performance level 0, this feature is "supported", but
+currently "disabled", meaning the user has not activated this feature. Whereas
+"speed-select-turbo-freq" (Intel(R) SST-TF) is supported at both performance
+levels, but currently not activated by the user.
+
+The Intel(R) SST-BF and the Intel(R) SST-TF features are built on a foundation
+technology called Intel(R) Speed Select Technology - Core Power (Intel(R) SST-CP).
+The platform firmware enables this feature when Intel(R) SST-BF or Intel(R) SST-TF
+is supported on a platform.
+
+Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP)
+---------------------------------------------------------------
+
+Intel(R) Speed Select Technology Core Power (Intel(R) SST-CP) is an interface that
+allows users to define per core priority. This defines a mechanism to distribute
+power among cores when there is a power constrained scenario. This defines a
+class of service (CLOS) configuration.
+
+The user can configure up to 4 class of service configurations. Each CLOS group
+configuration allows definitions of parameters, which affects how the frequency
+can be limited and power is distributed. Each CPU core can be tied to a class of
+service and hence an associated priority. The granularity is at core level not
+at per CPU level.
+
+Enable CLOS based prioritization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To use CLOS based prioritization feature, firmware must be informed to enable
+and use a priority type. There is a default per platform priority type, which
+can be changed with optional command line parameter.
+
+To enable and check the options, execute::
+
+ # intel-speed-select core-power enable --help
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Enable core-power for a package/die
+ Clos Enable: Specify priority type with [--priority|-p]
+ 0: Proportional, 1: Ordered
+
+There are two types of priority types:
+
+- Ordered
+
+Priority for ordered throttling is defined based on the index of the assigned
+CLOS group. Where CLOS0 gets highest priority (throttled last).
+
+Priority order is:
+CLOS0 > CLOS1 > CLOS2 > CLOS3.
+
+- Proportional
+
+When proportional priority is used, there is an additional parameter called
+frequency_weight, which can be specified per CLOS group. The goal of
+proportional priority is to provide each core with the requested min., then
+distribute all remaining (excess/deficit) budgets in proportion to a defined
+weight. This proportional priority can be configured using "core-power config"
+command.
+
+To enable with the platform default priority type, execute::
+
+ # intel-speed-select core-power enable
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ core-power
+ enable:success
+ package-1
+ die-0
+ cpu-6
+ core-power
+ enable:success
+
+The scope of this enable is per package or die scoped when a package contains
+multiple dies. To check if CLOS is enabled and get priority type, "core-power
+info" command can be used. For example to check the status of core-power feature
+on CPU 0, execute::
+
+ # intel-speed-select -c 0 core-power info
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ core-power
+ support-status:supported
+ enable-status:enabled
+ clos-enable-status:enabled
+ priority-type:proportional
+ package-1
+ die-0
+ cpu-24
+ core-power
+ support-status:supported
+ enable-status:enabled
+ clos-enable-status:enabled
+ priority-type:proportional
+
+Configuring CLOS groups
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Each CLOS group has its own attributes including min, max, freq_weight and
+desired. These parameters can be configured with "core-power config" command.
+Defaults will be used if user skips setting a parameter except clos id, which is
+mandatory. To check core-power config options, execute::
+
+ # intel-speed-select core-power config --help
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Set core-power configuration for one of the four clos ids
+ Specify targeted clos id with [--clos|-c]
+ Specify clos Proportional Priority [--weight|-w]
+ Specify clos min in MHz with [--min|-n]
+ Specify clos max in MHz with [--max|-m]
+
+For example::
+
+ # intel-speed-select core-power config -c 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ clos epp is not specified, default: 0
+ clos frequency weight is not specified, default: 0
+ clos min is not specified, default: 0 MHz
+ clos max is not specified, default: 25500 MHz
+ clos desired is not specified, default: 0
+ package-0
+ die-0
+ cpu-0
+ core-power
+ config:success
+ package-1
+ die-0
+ cpu-6
+ core-power
+ config:success
+
+The user has the option to change defaults. For example, the user can change the
+"min" and set the base frequency to always get guaranteed base frequency.
+
+Get the current CLOS configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To check the current configuration, "core-power get-config" can be used. For
+example, to get the configuration of CLOS 0::
+
+ # intel-speed-select core-power get-config -c 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ core-power
+ clos:0
+ epp:0
+ clos-proportional-priority:0
+ clos-min:0 MHz
+ clos-max:Max Turbo frequency
+ clos-desired:0 MHz
+ package-1
+ die-0
+ cpu-24
+ core-power
+ clos:0
+ epp:0
+ clos-proportional-priority:0
+ clos-min:0 MHz
+ clos-max:Max Turbo frequency
+ clos-desired:0 MHz
+
+Associating a CPU with a CLOS group
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To associate a CPU to a CLOS group "core-power assoc" command can be used::
+
+ # intel-speed-select core-power assoc --help
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ Associate a clos id to a CPU
+ Specify targeted clos id with [--clos|-c]
+
+
+For example to associate CPU 10 to CLOS group 3, execute::
+
+ # intel-speed-select -c 10 core-power assoc -c 3
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-10
+ core-power
+ assoc:success
+
+Once a CPU is associated, its sibling CPUs are also associated to a CLOS group.
+Once associated, avoid changing Linux "cpufreq" subsystem scaling frequency
+limits.
+
+To check the existing association for a CPU, "core-power get-assoc" command can
+be used. For example, to get association of CPU 10, execute::
+
+ # intel-speed-select -c 10 core-power get-assoc
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-1
+ die-0
+ cpu-10
+ get-assoc
+ clos:3
+
+This shows that CPU 10 is part of a CLOS group 3.
+
+
+Disable CLOS based prioritization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To disable, execute::
+
+# intel-speed-select core-power disable
+
+Some features like Intel(R) SST-TF can only be enabled when CLOS based prioritization
+is enabled. For this reason, disabling while Intel(R) SST-TF is enabled can cause
+Intel(R) SST-TF to fail. This will cause the "disable" command to display an error
+if Intel(R) SST-TF is already enabled. In turn, to disable, the Intel(R) SST-TF
+feature must be disabled first.
+
+Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF)
+-------------------------------------------------------------------
+
+The Intel(R) Speed Select Technology - Base Frequency (Intel(R) SST-BF) feature lets
+the user control base frequency. If some critical workload threads demand
+constant high guaranteed performance, then this feature can be used to execute
+the thread at higher base frequency on specific sets of CPUs (high priority
+CPUs) at the cost of lower base frequency (low priority CPUs) on other CPUs.
+This feature does not require offline of the low priority CPUs.
+
+The support of Intel(R) SST-BF depends on the Intel(R) Speed Select Technology -
+Performance Profile (Intel(R) SST-PP) performance level configuration. It is
+possible that only certain performance levels support Intel(R) SST-BF. It is also
+possible that only base performance level (level = 0) has support of Intel
+SST-BF. Consequently, first select the desired performance level to enable this
+feature.
+
+In the system under test here, Intel(R) SST-BF is supported at the base
+performance level 0, but currently disabled. For example for the level 0::
+
+ # intel-speed-select -c 0 perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ ...
+
+ speed-select-base-freq:disabled
+ ...
+
+Before enabling Intel(R) SST-BF and measuring its impact on a workload
+performance, execute some workload and measure performance and get a baseline
+performance to compare against.
+
+Here the user wants more guaranteed performance. For this reason, it is likely
+that turbo is disabled. To disable turbo, execute::
+
+#echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
+
+Based on the output of the "intel-speed-select perf-profile info -l 0" base
+frequency of guaranteed frequency 2600 MHz.
+
+
+Measure baseline performance for comparison
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To compare, pick a multi-threaded workload where each thread can be scheduled on
+separate CPUs. "Hackbench pipe" test is a good example on how to improve
+performance using Intel(R) SST-BF.
+
+Below, the workload is measuring average scheduler wakeup latency, so a lower
+number means better performance::
+
+ # taskset -c 3,4 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 6.102 [sec]
+ 6.102445 usecs/op
+ 163868 ops/sec
+
+While running the above test, if we take turbostat output, it will show us that
+2 of the CPUs are busy and reaching max. frequency (which would be the base
+frequency as the turbo is disabled). The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ 0 0 0 1000
+ 0 1 1 1005
+ 0 2 2 1000
+ 0 3 3 2600
+ 0 4 4 2600
+ 0 5 5 1000
+ 0 6 6 1000
+ 0 7 7 1005
+ 0 8 8 1005
+ 0 9 9 1000
+ 0 10 10 1000
+ 0 11 11 995
+ 0 12 12 1000
+ 0 13 13 1000
+
+From the above turbostat output, both CPU 3 and 4 are very busy and reaching
+full guaranteed frequency of 2600 MHz.
+
+Intel(R) SST-BF Capabilities
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get capabilities of Intel(R) SST-BF for the current performance level 0,
+execute::
+
+ # intel-speed-select base-freq info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ speed-select-base-freq
+ high-priority-base-frequency(MHz):3000
+ high-priority-cpu-mask:00000216,00002160
+ high-priority-cpu-list:5,6,8,13,33,34,36,41
+ low-priority-base-frequency(MHz):2400
+ tjunction-temperature(C):125
+ thermal-design-power(W):205
+
+The above capabilities show that there are some CPUs on this system that can
+offer base frequency of 3000 MHz compared to the standard base frequency at this
+performance levels. Nevertheless, these CPUs are fixed, and they are presented
+via high-priority-cpu-list/high-priority-cpu-mask. But if this Intel(R) SST-BF
+feature is selected, the low priorities CPUs (which are not in
+high-priority-cpu-list) can only offer up to 2400 MHz. As a result, if this
+clipping of low priority CPUs is acceptable, then the user can enable Intel
+SST-BF feature particularly for the above "sched pipe" workload since only two
+CPUs are used, they can be scheduled on high priority CPUs and can get boost of
+400 MHz.
+
+Enable Intel(R) SST-BF
+~~~~~~~~~~~~~~~~~~~~~~
+
+To enable Intel(R) SST-BF feature, execute::
+
+ # intel-speed-select base-freq enable -a
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ base-freq
+ enable:success
+ package-1
+ die-0
+ cpu-14
+ base-freq
+ enable:success
+
+In this case, -a option is optional. This not only enables Intel(R) SST-BF, but it
+also adjusts the priority of cores using Intel(R) Speed Select Technology Core
+Power (Intel(R) SST-CP) features. This option sets the minimum performance of each
+Intel(R) Speed Select Technology - Performance Profile (Intel(R) SST-PP) class to
+maximum performance so that the hardware will give maximum performance possible
+for each CPU.
+
+If -a option is not used, then the following steps are required before enabling
+Intel(R) SST-BF:
+
+- Discover Intel(R) SST-BF and note low and high priority base frequency
+- Note the high prioity CPU list
+- Enable CLOS using core-power feature set
+- Configure CLOS parameters. Use CLOS.min to set to minimum performance
+- Subscribe desired CPUs to CLOS groups
+
+With this configuration, if the same workload is executed by pinning the
+workload to high priority CPUs (CPU 5 and 6 in this case)::
+
+ #taskset -c 5,6 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 5.627 [sec]
+ 5.627922 usecs/op
+ 177685 ops/sec
+
+This way, by enabling Intel(R) SST-BF, the performance of this benchmark is
+improved (latency reduced) by 7.79%. From the turbostat output, it can be
+observed that the high priority CPUs reached 3000 MHz compared to 2600 MHz.
+The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ 0 0 0 2151
+ 0 1 1 2166
+ 0 2 2 2175
+ 0 3 3 2175
+ 0 4 4 2175
+ 0 5 5 3000
+ 0 6 6 3000
+ 0 7 7 2180
+ 0 8 8 2662
+ 0 9 9 2176
+ 0 10 10 2175
+ 0 11 11 2176
+ 0 12 12 2176
+ 0 13 13 2661
+
+Disable Intel(R) SST-BF
+~~~~~~~~~~~~~~~~~~~~~~~
+
+To disable the Intel(R) SST-BF feature, execute::
+
+# intel-speed-select base-freq disable -a
+
+
+Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF)
+--------------------------------------------------------------------
+
+This feature enables the ability to set different "All core turbo ratio limits"
+to cores based on the priority. By using this feature, some cores can be
+configured to get higher turbo frequency by designating them as high priority at
+the cost of lower or no turbo frequency on the low priority cores.
+
+For this reason, this feature is only useful when system is busy utilizing all
+CPUs, but the user wants some configurable option to get high performance on
+some CPUs.
+
+The support of Intel(R) Speed Select Technology - Turbo Frequency (Intel(R) SST-TF)
+depends on the Intel(R) Speed Select Technology - Performance Profile (Intel
+SST-PP) performance level configuration. It is possible that only a certain
+performance level supports Intel(R) SST-TF. It is also possible that only the base
+performance level (level = 0) has the support of Intel(R) SST-TF. Hence, first
+select the desired performance level to enable this feature.
+
+In the system under test here, Intel(R) SST-TF is supported at the base
+performance level 0, but currently disabled::
+
+ # intel-speed-select -c 0 perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ ...
+ ...
+ speed-select-turbo-freq:disabled
+ ...
+ ...
+
+
+To check if performance can be improved using Intel(R) SST-TF feature, get the turbo
+frequency properties with Intel(R) SST-TF enabled and compare to the base turbo
+capability of this system.
+
+Get Base turbo capability
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get the base turbo capability of performance level 0, execute::
+
+ # intel-speed-select perf-profile info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ perf-profile-level-0
+ ...
+ ...
+ turbo-ratio-limits-sse
+ bucket-0
+ core-count:2
+ max-turbo-frequency(MHz):3200
+ bucket-1
+ core-count:4
+ max-turbo-frequency(MHz):3100
+ bucket-2
+ core-count:6
+ max-turbo-frequency(MHz):3100
+ bucket-3
+ core-count:8
+ max-turbo-frequency(MHz):3100
+ bucket-4
+ core-count:10
+ max-turbo-frequency(MHz):3100
+ bucket-5
+ core-count:12
+ max-turbo-frequency(MHz):3100
+ bucket-6
+ core-count:14
+ max-turbo-frequency(MHz):3100
+ bucket-7
+ core-count:16
+ max-turbo-frequency(MHz):3100
+
+Based on the data above, when all the CPUS are busy, the max. frequency of 3100
+MHz can be achieved. If there is some busy workload on cpu 0 - 11 (e.g. stress)
+and on CPU 12 and 13, execute "hackbench pipe" workload::
+
+ # taskset -c 12,13 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 5.705 [sec]
+ 5.705488 usecs/op
+ 175269 ops/sec
+
+The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ 0 0 0 3000
+ 0 1 1 3000
+ 0 2 2 3000
+ 0 3 3 3000
+ 0 4 4 3000
+ 0 5 5 3100
+ 0 6 6 3100
+ 0 7 7 3000
+ 0 8 8 3100
+ 0 9 9 3000
+ 0 10 10 3000
+ 0 11 11 3000
+ 0 12 12 3100
+ 0 13 13 3100
+
+Based on turbostat output, the performance is limited by frequency cap of 3100
+MHz. To check if the hackbench performance can be improved for CPU 12 and CPU
+13, first check the capability of the Intel(R) SST-TF feature for this performance
+level.
+
+Get Intel(R) SST-TF Capability
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To get the capability, the "turbo-freq info" command can be used::
+
+ # intel-speed-select turbo-freq info -l 0
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-0
+ speed-select-turbo-freq
+ bucket-0
+ high-priority-cores-count:2
+ high-priority-max-frequency(MHz):3200
+ high-priority-max-avx2-frequency(MHz):3200
+ high-priority-max-avx512-frequency(MHz):3100
+ bucket-1
+ high-priority-cores-count:4
+ high-priority-max-frequency(MHz):3100
+ high-priority-max-avx2-frequency(MHz):3000
+ high-priority-max-avx512-frequency(MHz):2900
+ bucket-2
+ high-priority-cores-count:6
+ high-priority-max-frequency(MHz):3100
+ high-priority-max-avx2-frequency(MHz):3000
+ high-priority-max-avx512-frequency(MHz):2900
+ speed-select-turbo-freq-clip-frequencies
+ low-priority-max-frequency(MHz):2600
+ low-priority-max-avx2-frequency(MHz):2400
+ low-priority-max-avx512-frequency(MHz):2100
+
+Based on the output above, there is an Intel(R) SST-TF bucket for which there are
+two high priority cores. If only two high priority cores are set, then max.
+turbo frequency on those cores can be increased to 3200 MHz. This is 100 MHz
+more than the base turbo capability for all cores.
+
+In turn, for the hackbench workload, two CPUs can be set as high priority and
+rest as low priority. One side effect is that once enabled, the low priority
+cores will be clipped to a lower frequency of 2600 MHz.
+
+Enable Intel(R) SST-TF
+~~~~~~~~~~~~~~~~~~~~~~
+
+To enable Intel(R) SST-TF, execute::
+
+ # intel-speed-select -c 12,13 turbo-freq enable -a
+ Intel(R) Speed Select Technology
+ Executing on CPU model: X
+ package-0
+ die-0
+ cpu-12
+ turbo-freq
+ enable:success
+ package-0
+ die-0
+ cpu-13
+ turbo-freq
+ enable:success
+ package--1
+ die-0
+ cpu-63
+ turbo-freq --auto
+ enable:success
+
+In this case, the option "-a" is optional. If set, it enables Intel(R) SST-TF
+feature and also sets the CPUs to high and and low priority using Intel Speed
+Select Technology Core Power (Intel(R) SST-CP) features. The CPU numbers passed
+with "-c" arguments are marked as high priority, including its siblings.
+
+If -a option is not used, then the following steps are required before enabling
+Intel(R) SST-TF:
+
+- Discover Intel(R) SST-TF and note buckets of high priority cores and maximum frequency
+
+- Enable CLOS using core-power feature set - Configure CLOS parameters
+
+- Subscribe desired CPUs to CLOS groups making sure that high priority cores are set to the maximum frequency
+
+If the same hackbench workload is executed, schedule hackbench threads on high
+priority CPUs::
+
+ #taskset -c 12,13 perf bench -r 100 sched pipe
+ # Running 'sched/pipe' benchmark:
+ # Executed 1000000 pipe operations between two processes
+ Total time: 5.510 [sec]
+ 5.510165 usecs/op
+ 180826 ops/sec
+
+This improved performance by around 3.3% improvement on a busy system. Here the
+turbostat output will show that the CPU 12 and CPU 13 are getting 100 MHz boost.
+The turbostat output::
+
+ #turbostat -c 0-13 --show Package,Core,CPU,Bzy_MHz -i 1
+ Package Core CPU Bzy_MHz
+ ...
+ 0 12 12 3200
+ 0 13 13 3200
diff --git a/Documentation/admin-guide/pm/intel_pstate.rst b/Documentation/admin-guide/pm/intel_pstate.rst
index ad392f3aee06..39d80bc29ccd 100644
--- a/Documentation/admin-guide/pm/intel_pstate.rst
+++ b/Documentation/admin-guide/pm/intel_pstate.rst
@@ -62,9 +62,10 @@ on the capabilities of the processor.
Active Mode
-----------
-This is the default operation mode of ``intel_pstate``. If it works in this
-mode, the ``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq``
-policies contains the string "intel_pstate".
+This is the default operation mode of ``intel_pstate`` for processors with
+hardware-managed P-states (HWP) support. If it works in this mode, the
+``scaling_driver`` policy attribute in ``sysfs`` for all ``CPUFreq`` policies
+contains the string "intel_pstate".
In this mode the driver bypasses the scaling governors layer of ``CPUFreq`` and
provides its own scaling algorithms for P-state selection. Those algorithms
@@ -138,12 +139,13 @@ internal P-state selection logic to be less performance-focused.
Active Mode Without HWP
~~~~~~~~~~~~~~~~~~~~~~~
-This is the default operation mode for processors that do not support the HWP
-feature. It also is used by default with the ``intel_pstate=no_hwp`` argument
-in the kernel command line. However, in this mode ``intel_pstate`` may refuse
-to work with the given processor if it does not recognize it. [Note that
-``intel_pstate`` will never refuse to work with any processor with the HWP
-feature enabled.]
+This operation mode is optional for processors that do not support the HWP
+feature or when the ``intel_pstate=no_hwp`` argument is passed to the kernel in
+the command line. The active mode is used in those cases if the
+``intel_pstate=active`` argument is passed to the kernel in the command line.
+In this mode ``intel_pstate`` may refuse to work with processors that are not
+recognized by it. [Note that ``intel_pstate`` will never refuse to work with
+any processor with the HWP feature enabled.]
In this mode ``intel_pstate`` registers utilization update callbacks with the
CPU scheduler in order to run a P-state selection algorithm, either
@@ -188,10 +190,14 @@ is not set.
Passive Mode
------------
-This mode is used if the ``intel_pstate=passive`` argument is passed to the
-kernel in the command line (it implies the ``intel_pstate=no_hwp`` setting too).
-Like in the active mode without HWP support, in this mode ``intel_pstate`` may
-refuse to work with the given processor if it does not recognize it.
+This is the default operation mode of ``intel_pstate`` for processors without
+hardware-managed P-states (HWP) support. It is always used if the
+``intel_pstate=passive`` argument is passed to the kernel in the command line
+regardless of whether or not the given processor supports HWP. [Note that the
+``intel_pstate=no_hwp`` setting implies ``intel_pstate=passive`` if it is used
+without ``intel_pstate=active``.] Like in the active mode without HWP support,
+in this mode ``intel_pstate`` may refuse to work with processors that are not
+recognized by it.
If the driver works in this mode, the ``scaling_driver`` policy attribute in
``sysfs`` for all ``CPUFreq`` policies contains the string "intel_cpufreq".
diff --git a/Documentation/admin-guide/pm/working-state.rst b/Documentation/admin-guide/pm/working-state.rst
index 0a38cdf39df1..f40994c422dc 100644
--- a/Documentation/admin-guide/pm/working-state.rst
+++ b/Documentation/admin-guide/pm/working-state.rst
@@ -13,3 +13,4 @@ Working-State Power Management
intel_pstate
cpufreq_drivers
intel_epb
+ intel-speed-select