diff options
Diffstat (limited to 'Documentation')
162 files changed, 6604 insertions, 1702 deletions
diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd index d431e2d00472..df4afbccf037 100644 --- a/Documentation/ABI/stable/sysfs-driver-dma-idxd +++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd @@ -128,6 +128,8 @@ Date: Aug 28, 2020 KernelVersion: 5.10.0 Contact: dmaengine@vger.kernel.org Description: The last executed device administrative command's status/error. + Also last configuration error overloaded. + Writing to it will clear the status. What: /sys/bus/dsa/devices/wq<m>.<n>/block_on_fault Date: Oct 27, 2020 @@ -211,6 +213,13 @@ Contact: dmaengine@vger.kernel.org Description: Indicate whether ATS disable is turned on for the workqueue. 0 indicates ATS is on, and 1 indicates ATS is off for the workqueue. +What: /sys/bus/dsa/devices/wq<m>.<n>/occupancy +Date May 25, 2021 +KernelVersion: 5.14.0 +Contact: dmaengine@vger.kernel.org +Description: Show the current number of entries in this WQ if WQ Occupancy + Support bit WQ capabilities is 1. + What: /sys/bus/dsa/devices/engine<m>.<n>/group_id Date: Oct 25, 2019 KernelVersion: 5.6.0 diff --git a/Documentation/ABI/testing/debugfs-driver-habanalabs b/Documentation/ABI/testing/debugfs-driver-habanalabs index a5c28f606865..284e2dfa61cd 100644 --- a/Documentation/ABI/testing/debugfs-driver-habanalabs +++ b/Documentation/ABI/testing/debugfs-driver-habanalabs @@ -215,6 +215,17 @@ Description: Sets the skip reset on timeout option for the device. Value of "0" means device will be reset in case some CS has timed out, otherwise it will not be reset. +What: /sys/kernel/debug/habanalabs/hl<n>/state_dump +Date: Oct 2021 +KernelVersion: 5.15 +Contact: ynudelman@habana.ai +Description: Gets the state dump occurring on a CS timeout or failure. + State dump is used for debug and is created each time in case of + a problem in a CS execution, before reset. + Reading from the node returns the newest state dump available. + Writing an integer X discards X state dumps, so that the + next read would return X+1-st newest state dump. + What: /sys/kernel/debug/habanalabs/hl<n>/stop_on_err Date: Mar 2020 KernelVersion: 5.6 @@ -230,6 +241,14 @@ Description: Displays a list with information about the currently user pointers (user virtual addresses) that are pinned and mapped to DMA addresses +What: /sys/kernel/debug/habanalabs/hl<n>/userptr_lookup +Date: Aug 2021 +KernelVersion: 5.15 +Contact: ogabbay@kernel.org +Description: Allows to search for specific user pointers (user virtual + addresses) that are pinned and mapped to DMA addresses, and see + their resolution to the specific dma address. + What: /sys/kernel/debug/habanalabs/hl<n>/vm Date: Jan 2019 KernelVersion: 5.1 diff --git a/Documentation/ABI/testing/sysfs-bus-pci b/Documentation/ABI/testing/sysfs-bus-pci index 793cbb76cd25..d4ae03296861 100644 --- a/Documentation/ABI/testing/sysfs-bus-pci +++ b/Documentation/ABI/testing/sysfs-bus-pci @@ -121,6 +121,23 @@ Description: child buses, and re-discover devices removed earlier from this part of the device tree. +What: /sys/bus/pci/devices/.../reset_method +Date: August 2021 +Contact: Amey Narkhede <ameynarkhede03@gmail.com> +Description: + Some devices allow an individual function to be reset + without affecting other functions in the same slot. + + For devices that have this support, a file named + reset_method is present in sysfs. Reading this file + gives names of the supported and enabled reset methods and + their ordering. Writing a space-separated list of names of + reset methods sets the reset methods and ordering to be + used when resetting the device. Writing an empty string + disables the ability to reset the device. Writing + "default" enables all supported reset methods in the + default ordering. + What: /sys/bus/pci/devices/.../reset Date: July 2009 Contact: Michael S. Tsirkin <mst@redhat.com> diff --git a/Documentation/PCI/endpoint/pci-endpoint-cfs.rst b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst index db609b97ad58..fb73345cfb8a 100644 --- a/Documentation/PCI/endpoint/pci-endpoint-cfs.rst +++ b/Documentation/PCI/endpoint/pci-endpoint-cfs.rst @@ -43,6 +43,7 @@ entries corresponding to EPF driver will be created by the EPF core. .. <EPF Driver1>/ ... <EPF Device 11>/ ... <EPF Device 21>/ + ... <EPF Device 31>/ .. <EPF Driver2>/ ... <EPF Device 12>/ ... <EPF Device 22>/ @@ -68,6 +69,7 @@ created) ... subsys_vendor_id ... subsys_id ... interrupt_pin + ... <Symlink EPF Device 31>/ ... primary/ ... <Symlink EPC Device1>/ ... secondary/ @@ -79,6 +81,13 @@ interface should be added in 'primary' directory and symlink of endpoint controller connected to secondary interface should be added in 'secondary' directory. +The <EPF Device> directory can have a list of symbolic links +(<Symlink EPF Device 31>) to other <EPF Device>. These symbolic links should +be created by the user to represent the virtual functions that are bound to +the physical function. In the above directory structure <EPF Device 11> is a +physical function and <EPF Device 31> is a virtual function. An EPF device once +it's linked to another EPF device, cannot be linked to a EPC device. + EPC Device ========== @@ -98,7 +107,8 @@ entries corresponding to EPC device will be created by the EPC core. The <EPC Device> directory will have a list of symbolic links to <EPF Device>. These symbolic links should be created by the user to -represent the functions present in the endpoint device. +represent the functions present in the endpoint device. Only <EPF Device> +that represents a physical function can be linked to a EPC device. The <EPC Device> directory will also have a *start* field. Once "1" is written to this field, the endpoint device will be ready to diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index 35314b63008c..caa3c09a5c3f 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -259,7 +259,7 @@ Configuring the kernel Compiling the kernel -------------------- - - Make sure you have at least gcc 4.9 available. + - Make sure you have at least gcc 5.1 available. For more information, refer to :ref:`Documentation/process/changes.rst <changes>`. Please note that you can still run a.out user programs with this kernel. diff --git a/Documentation/admin-guide/acpi/ssdt-overlays.rst b/Documentation/admin-guide/acpi/ssdt-overlays.rst index 5d7e25988085..b5fbf54dca19 100644 --- a/Documentation/admin-guide/acpi/ssdt-overlays.rst +++ b/Documentation/admin-guide/acpi/ssdt-overlays.rst @@ -30,22 +30,21 @@ following ASL code can be used:: { Device (STAC) { - Name (_ADR, Zero) Name (_HID, "BMA222E") + Name (RBUF, ResourceTemplate () + { + I2cSerialBus (0x0018, ControllerInitiated, 0x00061A80, + AddressingMode7Bit, "\\_SB.I2C6", 0x00, + ResourceConsumer, ,) + GpioInt (Edge, ActiveHigh, Exclusive, PullDown, 0x0000, + "\\_SB.GPO2", 0x00, ResourceConsumer, , ) + { // Pin list + 0 + } + }) Method (_CRS, 0, Serialized) { - Name (RBUF, ResourceTemplate () - { - I2cSerialBus (0x0018, ControllerInitiated, 0x00061A80, - AddressingMode7Bit, "\\_SB.I2C6", 0x00, - ResourceConsumer, ,) - GpioInt (Edge, ActiveHigh, Exclusive, PullDown, 0x0000, - "\\_SB.GPO2", 0x00, ResourceConsumer, , ) - { // Pin list - 0 - } - }) Return (RBUF) } } @@ -75,7 +74,7 @@ This option allows loading of user defined SSDTs from initrd and it is useful when the system does not support EFI or when there is not enough EFI storage. It works in a similar way with initrd based ACPI tables override/upgrade: SSDT -aml code must be placed in the first, uncompressed, initrd under the +AML code must be placed in the first, uncompressed, initrd under the "kernel/firmware/acpi" path. Multiple files can be used and this will translate in loading multiple tables. Only SSDT and OEM tables are allowed. See initrd_table_override.txt for more details. @@ -103,12 +102,14 @@ This is the preferred method, when EFI is supported on the platform, because it allows a persistent, OS independent way of storing the user defined SSDTs. There is also work underway to implement EFI support for loading user defined SSDTs and using this method will make it easier to convert to the EFI loading -mechanism when that will arrive. +mechanism when that will arrive. To enable it, the +CONFIG_EFI_CUSTOM_SSDT_OVERLAYS shoyld be chosen to y. -In order to load SSDTs from an EFI variable the efivar_ssdt kernel command line -parameter can be used. The argument for the option is the variable name to -use. If there are multiple variables with the same name but with different -vendor GUIDs, all of them will be loaded. +In order to load SSDTs from an EFI variable the ``"efivar_ssdt=..."`` kernel +command line parameter can be used (the name has a limitation of 16 characters). +The argument for the option is the variable name to use. If there are multiple +variables with the same name but with different vendor GUIDs, all of them will +be loaded. In order to store the AML code in an EFI variable the efivarfs filesystem can be used. It is enabled and mounted by default in /sys/firmware/efi/efivars in all @@ -127,7 +128,7 @@ variable with the content from a given file:: #!/bin/sh -e - while ! [ -z "$1" ]; do + while [ -n "$1" ]; do case "$1" in "-f") filename="$2"; shift;; "-g") guid="$2"; shift;; @@ -167,14 +168,14 @@ variable with the content from a given file:: Loading ACPI SSDTs from configfs ================================ -This option allows loading of user defined SSDTs from userspace via the configfs +This option allows loading of user defined SSDTs from user space via the configfs interface. The CONFIG_ACPI_CONFIGFS option must be select and configfs must be mounted. In the following examples, we assume that configfs has been mounted in -/config. +/sys/kernel/config. -New tables can be loading by creating new directories in /config/acpi/table/ and -writing the SSDT aml code in the aml attribute:: +New tables can be loading by creating new directories in /sys/kernel/config/acpi/table +and writing the SSDT AML code in the aml attribute:: - cd /config/acpi/table + cd /sys/kernel/config/acpi/table mkdir my_ssdt cat ~/ssdt.aml > my_ssdt/aml diff --git a/Documentation/admin-guide/bootconfig.rst b/Documentation/admin-guide/bootconfig.rst index 6a79f2e59396..a1860fc0ca88 100644 --- a/Documentation/admin-guide/bootconfig.rst +++ b/Documentation/admin-guide/bootconfig.rst @@ -178,7 +178,7 @@ update the boot loader and the kernel image itself as long as the boot loader passes the correct initrd file size. If by any chance, the boot loader passes a longer size, the kernel fails to find the bootconfig data. -To do this operation, Linux kernel provides "bootconfig" command under +To do this operation, Linux kernel provides ``bootconfig`` command under tools/bootconfig, which allows admin to apply or delete the config file to/from initrd image. You can build it by the following command:: @@ -196,6 +196,43 @@ To remove the config from the image, you can use -d option as below:: Then add "bootconfig" on the normal kernel command line to tell the kernel to look for the bootconfig at the end of the initrd file. + +Kernel parameters via Boot Config +================================= + +In addition to the kernel command line, the boot config can be used for +passing the kernel parameters. All the key-value pairs under ``kernel`` +key will be passed to kernel cmdline directly. Moreover, the key-value +pairs under ``init`` will be passed to init process via the cmdline. +The parameters are concatinated with user-given kernel cmdline string +as the following order, so that the command line parameter can override +bootconfig parameters (this depends on how the subsystem handles parameters +but in general, earlier parameter will be overwritten by later one.):: + + [bootconfig params][cmdline params] -- [bootconfig init params][cmdline init params] + +Here is an example of the bootconfig file for kernel/init parameters.:: + + kernel { + root = 01234567-89ab-cdef-0123-456789abcd + } + init { + splash + } + +This will be copied into the kernel cmdline string as the following:: + + root="01234567-89ab-cdef-0123-456789abcd" -- splash + +If user gives some other command line like,:: + + ro bootconfig -- quiet + +The final kernel cmdline will be the following:: + + root="01234567-89ab-cdef-0123-456789abcd" ro bootconfig -- splash quiet + + Config File Limitation ====================== diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 828d11441ebf..91ba391f9b32 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1758,6 +1758,11 @@ support for the idxd driver. By default it is set to true (1). + idxd.tc_override= [HW] + Format: <bool> + Allow override of default traffic class configuration + for the device. By default it is set to false (0). + ieee754= [MIPS] Select IEEE Std 754 conformance mode Format: { strict | legacy | 2008 | relaxed } Default: strict diff --git a/Documentation/admin-guide/mm/damon/index.rst b/Documentation/admin-guide/mm/damon/index.rst new file mode 100644 index 000000000000..8c5dde3a5754 --- /dev/null +++ b/Documentation/admin-guide/mm/damon/index.rst @@ -0,0 +1,15 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================== +Monitoring Data Accesses +======================== + +:doc:`DAMON </vm/damon/index>` allows light-weight data access monitoring. +Using DAMON, users can analyze the memory access patterns of their systems and +optimize those. + +.. toctree:: + :maxdepth: 2 + + start + usage diff --git a/Documentation/admin-guide/mm/damon/start.rst b/Documentation/admin-guide/mm/damon/start.rst new file mode 100644 index 000000000000..d5eb89a8fc38 --- /dev/null +++ b/Documentation/admin-guide/mm/damon/start.rst @@ -0,0 +1,114 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============== +Getting Started +=============== + +This document briefly describes how you can use DAMON by demonstrating its +default user space tool. Please note that this document describes only a part +of its features for brevity. Please refer to :doc:`usage` for more details. + + +TL; DR +====== + +Follow the commands below to monitor and visualize the memory access pattern of +your workload. :: + + # # build the kernel with CONFIG_DAMON_*=y, install it, and reboot + # mount -t debugfs none /sys/kernel/debug/ + # git clone https://github.com/awslabs/damo + # ./damo/damo record $(pidof <your workload>) + # ./damo/damo report heat --plot_ascii + +The final command draws the access heatmap of ``<your workload>``. The heatmap +shows which memory region (x-axis) is accessed when (y-axis) and how frequently +(number; the higher the more accesses have been observed). :: + + 111111111111111111111111111111111111111111111111111111110000 + 111121111111111111111111111111211111111111111111111111110000 + 000000000000000000000000000000000000000000000000001555552000 + 000000000000000000000000000000000000000000000222223555552000 + 000000000000000000000000000000000000000011111677775000000000 + 000000000000000000000000000000000000000488888000000000000000 + 000000000000000000000000000000000177888400000000000000000000 + 000000000000000000000000000046666522222100000000000000000000 + 000000000000000000000014444344444300000000000000000000000000 + 000000000000000002222245555510000000000000000000000000000000 + # access_frequency: 0 1 2 3 4 5 6 7 8 9 + # x-axis: space (140286319947776-140286426374096: 101.496 MiB) + # y-axis: time (605442256436361-605479951866441: 37.695430s) + # resolution: 60x10 (1.692 MiB and 3.770s for each character) + + +Prerequisites +============= + +Kernel +------ + +You should first ensure your system is running on a kernel built with +``CONFIG_DAMON_*=y``. + + +User Space Tool +--------------- + +For the demonstration, we will use the default user space tool for DAMON, +called DAMON Operator (DAMO). It is available at +https://github.com/awslabs/damo. The examples below assume that ``damo`` is on +your ``$PATH``. It's not mandatory, though. + +Because DAMO is using the debugfs interface (refer to :doc:`usage` for the +detail) of DAMON, you should ensure debugfs is mounted. Mount it manually as +below:: + + # mount -t debugfs none /sys/kernel/debug/ + +or append the following line to your ``/etc/fstab`` file so that your system +can automatically mount debugfs upon booting:: + + debugfs /sys/kernel/debug debugfs defaults 0 0 + + +Recording Data Access Patterns +============================== + +The commands below record the memory access patterns of a program and save the +monitoring results to a file. :: + + $ git clone https://github.com/sjp38/masim + $ cd masim; make; ./masim ./configs/zigzag.cfg & + $ sudo damo record -o damon.data $(pidof masim) + +The first two lines of the commands download an artificial memory access +generator program and run it in the background. The generator will repeatedly +access two 100 MiB sized memory regions one by one. You can substitute this +with your real workload. The last line asks ``damo`` to record the access +pattern in the ``damon.data`` file. + + +Visualizing Recorded Patterns +============================= + +The following three commands visualize the recorded access patterns and save +the results as separate image files. :: + + $ damo report heats --heatmap access_pattern_heatmap.png + $ damo report wss --range 0 101 1 --plot wss_dist.png + $ damo report wss --range 0 101 1 --sortby time --plot wss_chron_change.png + +- ``access_pattern_heatmap.png`` will visualize the data access pattern in a + heatmap, showing which memory region (y-axis) got accessed when (x-axis) + and how frequently (color). +- ``wss_dist.png`` will show the distribution of the working set size. +- ``wss_chron_change.png`` will show how the working set size has + chronologically changed. + +You can view the visualizations of this example workload at [1]_. +Visualizations of other realistic workloads are available at [2]_ [3]_ [4]_. + +.. [1] https://damonitor.github.io/doc/html/v17/admin-guide/mm/damon/start.html#visualizing-recorded-patterns +.. [2] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html +.. [3] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html +.. [4] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst new file mode 100644 index 000000000000..a72cda374aba --- /dev/null +++ b/Documentation/admin-guide/mm/damon/usage.rst @@ -0,0 +1,112 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=============== +Detailed Usages +=============== + +DAMON provides below three interfaces for different users. + +- *DAMON user space tool.* + This is for privileged people such as system administrators who want a + just-working human-friendly interface. Using this, users can use the DAMON’s + major features in a human-friendly way. It may not be highly tuned for + special cases, though. It supports only virtual address spaces monitoring. +- *debugfs interface.* + This is for privileged user space programmers who want more optimized use of + DAMON. Using this, users can use DAMON’s major features by reading + from and writing to special debugfs files. Therefore, you can write and use + your personalized DAMON debugfs wrapper programs that reads/writes the + debugfs files instead of you. The DAMON user space tool is also a reference + implementation of such programs. It supports only virtual address spaces + monitoring. +- *Kernel Space Programming Interface.* + This is for kernel space programmers. Using this, users can utilize every + feature of DAMON most flexibly and efficiently by writing kernel space + DAMON application programs for you. You can even extend DAMON for various + address spaces. + +Nevertheless, you could write your own user space tool using the debugfs +interface. A reference implementation is available at +https://github.com/awslabs/damo. If you are a kernel programmer, you could +refer to :doc:`/vm/damon/api` for the kernel space programming interface. For +the reason, this document describes only the debugfs interface + +debugfs Interface +================= + +DAMON exports three files, ``attrs``, ``target_ids``, and ``monitor_on`` under +its debugfs directory, ``<debugfs>/damon/``. + + +Attributes +---------- + +Users can get and set the ``sampling interval``, ``aggregation interval``, +``regions update interval``, and min/max number of monitoring target regions by +reading from and writing to the ``attrs`` file. To know about the monitoring +attributes in detail, please refer to the :doc:`/vm/damon/design`. For +example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10 and +1000, and then check it again:: + + # cd <debugfs>/damon + # echo 5000 100000 1000000 10 1000 > attrs + # cat attrs + 5000 100000 1000000 10 1000 + + +Target IDs +---------- + +Some types of address spaces supports multiple monitoring target. For example, +the virtual memory address spaces monitoring can have multiple processes as the +monitoring targets. Users can set the targets by writing relevant id values of +the targets to, and get the ids of the current targets by reading from the +``target_ids`` file. In case of the virtual address spaces monitoring, the +values should be pids of the monitoring target processes. For example, below +commands set processes having pids 42 and 4242 as the monitoring targets and +check it again:: + + # cd <debugfs>/damon + # echo 42 4242 > target_ids + # cat target_ids + 42 4242 + +Note that setting the target ids doesn't start the monitoring. + + +Turning On/Off +-------------- + +Setting the files as described above doesn't incur effect unless you explicitly +start the monitoring. You can start, stop, and check the current status of the +monitoring by writing to and reading from the ``monitor_on`` file. Writing +``on`` to the file starts the monitoring of the targets with the attributes. +Writing ``off`` to the file stops those. DAMON also stops if every target +process is terminated. Below example commands turn on, off, and check the +status of DAMON:: + + # cd <debugfs>/damon + # echo on > monitor_on + # echo off > monitor_on + # cat monitor_on + off + +Please note that you cannot write to the above-mentioned debugfs files while +the monitoring is turned on. If you write to the files while DAMON is running, +an error code such as ``-EBUSY`` will be returned. + + +Tracepoint for Monitoring Results +================================= + +DAMON provides the monitoring results via a tracepoint, +``damon:damon_aggregated``. While the monitoring is turned on, you could +record the tracepoint events and show results using tracepoint supporting tools +like ``perf``. For example:: + + # echo on > monitor_on + # perf record -e damon:damon_aggregated & + # sleep 5 + # kill 9 $(pidof perf) + # echo off > monitor_on + # perf script diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst index 4b14d8b50e9e..cbd19d5e625f 100644 --- a/Documentation/admin-guide/mm/index.rst +++ b/Documentation/admin-guide/mm/index.rst @@ -27,6 +27,7 @@ the Linux memory management. concepts cma_debugfs + damon/index hugetlbpage idle_page_tracking ksm diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst index c6bae2d77160..03dfbc925252 100644 --- a/Documentation/admin-guide/mm/memory-hotplug.rst +++ b/Documentation/admin-guide/mm/memory-hotplug.rst @@ -1,466 +1,576 @@ .. _admin_guide_memory_hotplug: -============== -Memory Hotplug -============== +================== +Memory Hot(Un)Plug +================== -:Created: Jul 28 2007 -:Updated: Add some details about locking internals: Aug 20 2018 - -This document is about memory hotplug including how-to-use and current status. -Because Memory Hotplug is still under development, contents of this text will -be changed often. +This document describes generic Linux support for memory hot(un)plug with +a focus on System RAM, including ZONE_MOVABLE support. .. contents:: :local: -.. note:: +Introduction +============ - (1) x86_64's has special implementation for memory hotplug. - This text does not describe it. - (2) This text assumes that sysfs is mounted at ``/sys``. +Memory hot(un)plug allows for increasing and decreasing the size of physical +memory available to a machine at runtime. In the simplest case, it consists of +physically plugging or unplugging a DIMM at runtime, coordinated with the +operating system. +Memory hot(un)plug is used for various purposes: -Introduction -============ +- The physical memory available to a machine can be adjusted at runtime, up- or + downgrading the memory capacity. This dynamic memory resizing, sometimes + referred to as "capacity on demand", is frequently used with virtual machines + and logical partitions. + +- Replacing hardware, such as DIMMs or whole NUMA nodes, without downtime. One + example is replacing failing memory modules. -Purpose of memory hotplug -------------------------- +- Reducing energy consumption either by physically unplugging memory modules or + by logically unplugging (parts of) memory modules from Linux. -Memory Hotplug allows users to increase/decrease the amount of memory. -Generally, there are two purposes. +Further, the basic memory hot(un)plug infrastructure in Linux is nowadays also +used to expose persistent memory, other performance-differentiated memory and +reserved memory regions as ordinary system RAM to Linux. -(A) For changing the amount of memory. - This is to allow a feature like capacity on demand. -(B) For installing/removing DIMMs or NUMA-nodes physically. - This is to exchange DIMMs/NUMA-nodes, reduce power consumption, etc. +Linux only supports memory hot(un)plug on selected 64 bit architectures, such as +x86_64, arm64, ppc64, s390x and ia64. -(A) is required by highly virtualized environments and (B) is required by -hardware which supports memory power management. +Memory Hot(Un)Plug Granularity +------------------------------ -Linux memory hotplug is designed for both purpose. +Memory hot(un)plug in Linux uses the SPARSEMEM memory model, which divides the +physical memory address space into chunks of the same size: memory sections. The +size of a memory section is architecture dependent. For example, x86_64 uses +128 MiB and ppc64 uses 16 MiB. -Phases of memory hotplug +Memory sections are combined into chunks referred to as "memory blocks". The +size of a memory block is architecture dependent and corresponds to the smallest +granularity that can be hot(un)plugged. The default size of a memory block is +the same as memory section size, unless an architecture specifies otherwise. + +All memory blocks have the same size. + +Phases of Memory Hotplug ------------------------ -There are 2 phases in Memory Hotplug: +Memory hotplug consists of two phases: - 1) Physical Memory Hotplug phase - 2) Logical Memory Hotplug phase. +(1) Adding the memory to Linux +(2) Onlining memory blocks -The First phase is to communicate hardware/firmware and make/erase -environment for hotplugged memory. Basically, this phase is necessary -for the purpose (B), but this is good phase for communication between -highly virtualized environments too. +In the first phase, metadata, such as the memory map ("memmap") and page tables +for the direct mapping, is allocated and initialized, and memory blocks are +created; the latter also creates sysfs files for managing newly created memory +blocks. -When memory is hotplugged, the kernel recognizes new memory, makes new memory -management tables, and makes sysfs files for new memory's operation. +In the second phase, added memory is exposed to the page allocator. After this +phase, the memory is visible in memory statistics, such as free and total +memory, of the system. -If firmware supports notification of connection of new memory to OS, -this phase is triggered automatically. ACPI can notify this event. If not, -"probe" operation by system administration is used instead. -(see :ref:`memory_hotplug_physical_mem`). +Phases of Memory Hotunplug +-------------------------- -Logical Memory Hotplug phase is to change memory state into -available/unavailable for users. Amount of memory from user's view is -changed by this phase. The kernel makes all memory in it as free pages -when a memory range is available. +Memory hotunplug consists of two phases: -In this document, this phase is described as online/offline. +(1) Offlining memory blocks +(2) Removing the memory from Linux -Logical Memory Hotplug phase is triggered by write of sysfs file by system -administrator. For the hot-add case, it must be executed after Physical Hotplug -phase by hand. -(However, if you writes udev's hotplug scripts for memory hotplug, these -phases can be execute in seamless way.) +In the fist phase, memory is "hidden" from the page allocator again, for +example, by migrating busy memory to other memory locations and removing all +relevant free pages from the page allocator After this phase, the memory is no +longer visible in memory statistics of the system. -Unit of Memory online/offline operation ---------------------------------------- +In the second phase, the memory blocks are removed and metadata is freed. -Memory hotplug uses SPARSEMEM memory model which allows memory to be divided -into chunks of the same size. These chunks are called "sections". The size of -a memory section is architecture dependent. For example, power uses 16MiB, ia64 -uses 1GiB. +Memory Hotplug Notifications +============================ -Memory sections are combined into chunks referred to as "memory blocks". The -size of a memory block is architecture dependent and represents the logical -unit upon which memory online/offline operations are to be performed. The -default size of a memory block is the same as memory section size unless an -architecture specifies otherwise. (see :ref:`memory_hotplug_sysfs_files`.) +There are various ways how Linux is notified about memory hotplug events such +that it can start adding hotplugged memory. This description is limited to +systems that support ACPI; mechanisms specific to other firmware interfaces or +virtual machines are not described. -To determine the size (in bytes) of a memory block please read this file:: +ACPI Notifications +------------------ - /sys/devices/system/memory/block_size_bytes +Platforms that support ACPI, such as x86_64, can support memory hotplug +notifications via ACPI. -Kernel Configuration -==================== +In general, a firmware supporting memory hotplug defines a memory class object +HID "PNP0C80". When notified about hotplug of a new memory device, the ACPI +driver will hotplug the memory to Linux. -To use memory hotplug feature, kernel must be compiled with following -config options. +If the firmware supports hotplug of NUMA nodes, it defines an object _HID +"ACPI0004", "PNP0A05", or "PNP0A06". When notified about an hotplug event, all +assigned memory devices are added to Linux by the ACPI driver. -- For all memory hotplug: - - Memory model -> Sparse Memory (``CONFIG_SPARSEMEM``) - - Allow for memory hot-add (``CONFIG_MEMORY_HOTPLUG``) +Similarly, Linux can be notified about requests to hotunplug a memory device or +a NUMA node via ACPI. The ACPI driver will try offlining all relevant memory +blocks, and, if successful, hotunplug the memory from Linux. -- To enable memory removal, the following are also necessary: - - Allow for memory hot remove (``CONFIG_MEMORY_HOTREMOVE``) - - Page Migration (``CONFIG_MIGRATION``) +Manual Probing +-------------- -- For ACPI memory hotplug, the following are also necessary: - - Memory hotplug (under ACPI Support menu) (``CONFIG_ACPI_HOTPLUG_MEMORY``) - - This option can be kernel module. +On some architectures, the firmware may not be able to notify the operating +system about a memory hotplug event. Instead, the memory has to be manually +probed from user space. -- As a related configuration, if your box has a feature of NUMA-node hotplug - via ACPI, then this option is necessary too. +The probe interface is located at:: - - ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu) - (``CONFIG_ACPI_CONTAINER``). + /sys/devices/system/memory/probe - This option can be kernel module too. +Only complete memory blocks can be probed. Individual memory blocks are probed +by providing the physical start address of the memory block:: + % echo addr > /sys/devices/system/memory/probe -.. _memory_hotplug_sysfs_files: +Which results in a memory block for the range [addr, addr + memory_block_size) +being created. -sysfs files for memory hotplug -============================== +.. note:: -All memory blocks have their device information in sysfs. Each memory block -is described under ``/sys/devices/system/memory`` as:: + Using the probe interface is discouraged as it is easy to crash the kernel, + because Linux cannot validate user input; this interface might be removed in + the future. - /sys/devices/system/memory/memoryXXX +Onlining and Offlining Memory Blocks +==================================== -where XXX is the memory block id. +After a memory block has been created, Linux has to be instructed to actually +make use of that memory: the memory block has to be "online". -For the memory block covered by the sysfs directory. It is expected that all -memory sections in this range are present and no memory holes exist in the -range. Currently there is no way to determine if there is a memory hole, but -the existence of one should not affect the hotplug capabilities of the memory -block. +Before a memory block can be removed, Linux has to stop using any memory part of +the memory block: the memory block has to be "offlined". -For example, assume 1GiB memory block size. A device for a memory starting at -0x100000000 is ``/sys/device/system/memory/memory4``:: +The Linux kernel can be configured to automatically online added memory blocks +and drivers automatically trigger offlining of memory blocks when trying +hotunplug of memory. Memory blocks can only be removed once offlining succeeded +and drivers may trigger offlining of memory blocks when attempting hotunplug of +memory. - (0x100000000 / 1Gib = 4) +Onlining Memory Blocks Manually +------------------------------- -This device covers address range [0x100000000 ... 0x140000000) +If auto-onlining of memory blocks isn't enabled, user-space has to manually +trigger onlining of memory blocks. Often, udev rules are used to automate this +task in user space. -Under each memory block, you can see 5 files: +Onlining of a memory block can be triggered via:: -- ``/sys/devices/system/memory/memoryXXX/phys_index`` -- ``/sys/devices/system/memory/memoryXXX/phys_device`` -- ``/sys/devices/system/memory/memoryXXX/state`` -- ``/sys/devices/system/memory/memoryXXX/removable`` -- ``/sys/devices/system/memory/memoryXXX/valid_zones`` + % echo online > /sys/devices/system/memory/memoryXXX/state -=================== ============================================================ -``phys_index`` read-only and contains memory block id, same as XXX. -``state`` read-write +Or alternatively:: - - at read: contains online/offline state of memory. - - at write: user can specify "online_kernel", + % echo 1 > /sys/devices/system/memory/memoryXXX/online - "online_movable", "online", "offline" command - which will be performed on all sections in the block. -``phys_device`` read-only: legacy interface only ever used on s390x to - expose the covered storage increment. -``removable`` read-only: legacy interface that indicated whether a memory - block was likely to be offlineable or not. Newer kernel - versions return "1" if and only if the kernel supports - memory offlining. -``valid_zones`` read-only: designed to show by which zone memory provided by - a memory block is managed, and to show by which zone memory - provided by an offline memory block could be managed when - onlining. - - The first column shows it`s default zone. - - "memory6/valid_zones: Normal Movable" shows this memoryblock - can be onlined to ZONE_NORMAL by default and to ZONE_MOVABLE - by online_movable. - - "memory7/valid_zones: Movable Normal" shows this memoryblock - can be onlined to ZONE_MOVABLE by default and to ZONE_NORMAL - by online_kernel. -=================== ============================================================ +The kernel will select the target zone automatically, usually defaulting to +``ZONE_NORMAL`` unless ``movablecore=1`` has been specified on the kernel +command line or if the memory block would intersect the ZONE_MOVABLE already. -.. note:: +One can explicitly request to associate an offline memory block with +ZONE_MOVABLE by:: - These directories/files appear after physical memory hotplug phase. + % echo online_movable > /sys/devices/system/memory/memoryXXX/state -If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed -via symbolic links located in the ``/sys/devices/system/node/node*`` directories. +Or one can explicitly request a kernel zone (usually ZONE_NORMAL) by:: -For example:: + % echo online_kernel > /sys/devices/system/memory/memoryXXX/state - /sys/devices/system/node/node0/memory9 -> ../../memory/memory9 +In any case, if onlining succeeds, the state of the memory block is changed to +be "online". If it fails, the state of the memory block will remain unchanged +and the above commands will fail. -A backlink will also be created:: +Onlining Memory Blocks Automatically +------------------------------------ - /sys/devices/system/memory/memory9/node0 -> ../../node/node0 +The kernel can be configured to try auto-onlining of newly added memory blocks. +If this feature is disabled, the memory blocks will stay offline until +explicitly onlined from user space. -.. _memory_hotplug_physical_mem: +The configured auto-online behavior can be observed via:: -Physical memory hot-add phase -============================= + % cat /sys/devices/system/memory/auto_online_blocks -Hardware(Firmware) Support --------------------------- +Auto-onlining can be enabled by writing ``online``, ``online_kernel`` or +``online_movable`` to that file, like:: -On x86_64/ia64 platform, memory hotplug by ACPI is supported. + % echo online > /sys/devices/system/memory/auto_online_blocks -In general, the firmware (ACPI) which supports memory hotplug defines -memory class object of _HID "PNP0C80". When a notify is asserted to PNP0C80, -Linux's ACPI handler does hot-add memory to the system and calls a hotplug udev -script. This will be done automatically. +Modifying the auto-online behavior will only affect all subsequently added +memory blocks only. -But scripts for memory hotplug are not contained in generic udev package(now). -You may have to write it by yourself or online/offline memory by hand. -Please see :ref:`memory_hotplug_how_to_online_memory` and -:ref:`memory_hotplug_how_to_offline_memory`. +.. note:: -If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004", -"PNP0A05", or "PNP0A06", notification is asserted to it, and ACPI handler -calls hotplug code for all of objects which are defined in it. -If memory device is found, memory hotplug code will be called. + In corner cases, auto-onlining can fail. The kernel won't retry. Note that + auto-onlining is not expected to fail in default configurations. -Notify memory hot-add event by hand ------------------------------------ +.. note:: -On some architectures, the firmware may not notify the kernel of a memory -hotplug event. Therefore, the memory "probe" interface is supported to -explicitly notify the kernel. This interface depends on -CONFIG_ARCH_MEMORY_PROBE and can be configured on powerpc, sh, and x86 -if hotplug is supported, although for x86 this should be handled by ACPI -notification. + DLPAR on ppc64 ignores the ``offline`` setting and will still online added + memory blocks; if onlining fails, memory blocks are removed again. -Probe interface is located at:: +Offlining Memory Blocks +----------------------- - /sys/devices/system/memory/probe +In the current implementation, Linux's memory offlining will try migrating all +movable pages off the affected memory block. As most kernel allocations, such as +page tables, are unmovable, page migration can fail and, therefore, inhibit +memory offlining from succeeding. -You can tell the physical address of new memory to the kernel by:: +Having the memory provided by memory block managed by ZONE_MOVABLE significantly +increases memory offlining reliability; still, memory offlining can fail in +some corner cases. - % echo start_address_of_new_memory > /sys/devices/system/memory/probe +Further, memory offlining might retry for a long time (or even forever), until +aborted by the user. -Then, [start_address_of_new_memory, start_address_of_new_memory + -memory_block_size] memory range is hot-added. In this case, hotplug script is -not called (in current implementation). You'll have to online memory by -yourself. Please see :ref:`memory_hotplug_how_to_online_memory`. +Offlining of a memory block can be triggered via:: -Logical Memory hot-add phase -============================ + % echo offline > /sys/devices/system/memory/memoryXXX/state -State of memory ---------------- +Or alternatively:: -To see (online/offline) state of a memory block, read 'state' file:: + % echo 0 > /sys/devices/system/memory/memoryXXX/online - % cat /sys/device/system/memory/memoryXXX/state +If offlining succeeds, the state of the memory block is changed to be "offline". +If it fails, the state of the memory block will remain unchanged and the above +commands will fail, for example, via:: + bash: echo: write error: Device or resource busy -- If the memory block is online, you'll read "online". -- If the memory block is offline, you'll read "offline". +or via:: + bash: echo: write error: Invalid argument -.. _memory_hotplug_how_to_online_memory: +Observing the State of Memory Blocks +------------------------------------ -How to online memory --------------------- +The state (online/offline/going-offline) of a memory block can be observed +either via:: -When the memory is hot-added, the kernel decides whether or not to "online" -it according to the policy which can be read from "auto_online_blocks" file:: + % cat /sys/device/system/memory/memoryXXX/state - % cat /sys/devices/system/memory/auto_online_blocks +Or alternatively (1/0) via:: -The default depends on the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config -option. If it is disabled the default is "offline" which means the newly added -memory is not in a ready-to-use state and you have to "online" the newly added -memory blocks manually. Automatic onlining can be requested by writing "online" -to "auto_online_blocks" file:: + % cat /sys/device/system/memory/memoryXXX/online - % echo online > /sys/devices/system/memory/auto_online_blocks +For an online memory block, the managing zone can be observed via:: -This sets a global policy and impacts all memory blocks that will subsequently -be hotplugged. Currently offline blocks keep their state. It is possible, under -certain circumstances, that some memory blocks will be added but will fail to -online. User space tools can check their "state" files -(``/sys/devices/system/memory/memoryXXX/state``) and try to online them manually. + % cat /sys/device/system/memory/memoryXXX/valid_zones -If the automatic onlining wasn't requested, failed, or some memory block was -offlined it is possible to change the individual block's state by writing to the -"state" file:: +Configuring Memory Hot(Un)Plug +============================== - % echo online > /sys/devices/system/memory/memoryXXX/state +There are various ways how system administrators can configure memory +hot(un)plug and interact with memory blocks, especially, to online them. -This onlining will not change the ZONE type of the target memory block, -If the memory block doesn't belong to any zone an appropriate kernel zone -(usually ZONE_NORMAL) will be used unless movable_node kernel command line -option is specified when ZONE_MOVABLE will be used. +Memory Hot(Un)Plug Configuration via Sysfs +------------------------------------------ -You can explicitly request to associate it with ZONE_MOVABLE by:: +Some memory hot(un)plug properties can be configured or inspected via sysfs in:: - % echo online_movable > /sys/devices/system/memory/memoryXXX/state + /sys/devices/system/memory/ -.. note:: current limit: this memory block must be adjacent to ZONE_MOVABLE +The following files are currently defined: -Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by:: +====================== ========================================================= +``auto_online_blocks`` read-write: set or get the default state of new memory + blocks; configure auto-onlining. - % echo online_kernel > /sys/devices/system/memory/memoryXXX/state + The default value depends on the + CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel configuration + option. -.. note:: current limit: this memory block must be adjacent to ZONE_NORMAL + See the ``state`` property of memory blocks for details. +``block_size_bytes`` read-only: the size in bytes of a memory block. +``probe`` write-only: add (probe) selected memory blocks manually + from user space by supplying the physical start address. -An explicit zone onlining can fail (e.g. when the range is already within -and existing and incompatible zone already). + Availability depends on the CONFIG_ARCH_MEMORY_PROBE + kernel configuration option. +``uevent`` read-write: generic udev file for device subsystems. +====================== ========================================================= -After this, memory block XXX's state will be 'online' and the amount of -available memory will be increased. +.. note:: -This may be changed in future. + When the CONFIG_MEMORY_FAILURE kernel configuration option is enabled, two + additional files ``hard_offline_page`` and ``soft_offline_page`` are available + to trigger hwpoisoning of pages, for example, for testing purposes. Note that + this functionality is not really related to memory hot(un)plug or actual + offlining of memory blocks. -Logical memory remove -===================== +Memory Block Configuration via Sysfs +------------------------------------ -Memory offline and ZONE_MOVABLE -------------------------------- +Each memory block is represented as a memory block device that can be +onlined or offlined. All memory blocks have their device information located in +sysfs. Each present memory block is listed under +``/sys/devices/system/memory`` as:: -Memory offlining is more complicated than memory online. Because memory offline -has to make the whole memory block be unused, memory offline can fail if -the memory block includes memory which cannot be freed. + /sys/devices/system/memory/memoryXXX -In general, memory offline can use 2 techniques. +where XXX is the memory block id; the number of digits is variable. -(1) reclaim and free all memory in the memory block. -(2) migrate all pages in the memory block. +A present memory block indicates that some memory in the range is present; +however, a memory block might span memory holes. A memory block spanning memory +holes cannot be offlined. -In the current implementation, Linux's memory offline uses method (2), freeing -all pages in the memory block by page migration. But not all pages are -migratable. Under current Linux, migratable pages are anonymous pages and -page caches. For offlining a memory block by migration, the kernel has to -guarantee that the memory block contains only migratable pages. +For example, assume 1 GiB memory block size. A device for a memory starting at +0x100000000 is ``/sys/device/system/memory/memory4``:: -Now, a boot option for making a memory block which consists of migratable pages -is supported. By specifying "kernelcore=" or "movablecore=" boot option, you can -create ZONE_MOVABLE...a zone which is just used for movable pages. -(See also Documentation/admin-guide/kernel-parameters.rst) + (0x100000000 / 1Gib = 4) -Assume the system has "TOTAL" amount of memory at boot time, this boot option -creates ZONE_MOVABLE as following. +This device covers address range [0x100000000 ... 0x140000000) -1) When kernelcore=YYYY boot option is used, - Size of memory not for movable pages (not for offline) is YYYY. - Size of memory for movable pages (for offline) is TOTAL-YYYY. +The following files are currently defined: -2) When movablecore=ZZZZ boot option is used, - Size of memory not for movable pages (not for offline) is TOTAL - ZZZZ. - Size of memory for movable pages (for offline) is ZZZZ. +=================== ============================================================ +``online`` read-write: simplified interface to trigger onlining / + offlining and to observe the state of a memory block. + When onlining, the zone is selected automatically. +``phys_device`` read-only: legacy interface only ever used on s390x to + expose the covered storage increment. +``phys_index`` read-only: the memory block id (XXX). +``removable`` read-only: legacy interface that indicated whether a memory + block was likely to be offlineable or not. Nowadays, the + kernel return ``1`` if and only if it supports memory + offlining. +``state`` read-write: advanced interface to trigger onlining / + offlining and to observe the state of a memory block. + + When writing, ``online``, ``offline``, ``online_kernel`` and + ``online_movable`` are supported. + + ``online_movable`` specifies onlining to ZONE_MOVABLE. + ``online_kernel`` specifies onlining to the default kernel + zone for the memory block, such as ZONE_NORMAL. + ``online`` let's the kernel select the zone automatically. + + When reading, ``online``, ``offline`` and ``going-offline`` + may be returned. +``uevent`` read-write: generic uevent file for devices. +``valid_zones`` read-only: when a block is online, shows the zone it + belongs to; when a block is offline, shows what zone will + manage it when the block will be onlined. + + For online memory blocks, ``DMA``, ``DMA32``, ``Normal``, + ``Movable`` and ``none`` may be returned. ``none`` indicates + that memory provided by a memory block is managed by + multiple zones or spans multiple nodes; such memory blocks + cannot be offlined. ``Movable`` indicates ZONE_MOVABLE. + Other values indicate a kernel zone. + + For offline memory blocks, the first column shows the + zone the kernel would select when onlining the memory block + right now without further specifying a zone. + + Availability depends on the CONFIG_MEMORY_HOTREMOVE + kernel configuration option. +=================== ============================================================ .. note:: - Unfortunately, there is no information to show which memory block belongs - to ZONE_MOVABLE. This is TBD. + If the CONFIG_NUMA kernel configuration option is enabled, the memoryXXX/ + directories can also be accessed via symbolic links located in the + ``/sys/devices/system/node/node*`` directories. + + For example:: + + /sys/devices/system/node/node0/memory9 -> ../../memory/memory9 + + A backlink will also be created:: + + /sys/devices/system/memory/memory9/node0 -> ../../node/node0 + +Command Line Parameters +----------------------- + +Some command line parameters affect memory hot(un)plug handling. The following +command line parameters are relevant: + +======================== ======================================================= +``memhp_default_state`` configure auto-onlining by essentially setting + ``/sys/devices/system/memory/auto_online_blocks``. +``movablecore`` configure automatic zone selection of the kernel. When + set, the kernel will default to ZONE_MOVABLE, unless + other zones can be kept contiguous. +======================== ======================================================= + +Module Parameters +------------------ - Memory offlining can fail when dissolving a free huge page on ZONE_MOVABLE - and the feature of freeing unused vmemmap pages associated with each hugetlb - page is enabled. +Instead of additional command line parameters or sysfs files, the +``memory_hotplug`` subsystem now provides a dedicated namespace for module +parameters. Module parameters can be set via the command line by predicating +them with ``memory_hotplug.`` such as:: + + memory_hotplug.memmap_on_memory=1 + +and they can be observed (and some even modified at runtime) via:: + + /sys/modules/memory_hotplug/parameters/ + +The following module parameters are currently defined: + +======================== ======================================================= +``memmap_on_memory`` read-write: Allocate memory for the memmap from the + added memory block itself. Even if enabled, actual + support depends on various other system properties and + should only be regarded as a hint whether the behavior + would be desired. + + While allocating the memmap from the memory block + itself makes memory hotplug less likely to fail and + keeps the memmap on the same NUMA node in any case, it + can fragment physical memory in a way that huge pages + in bigger granularity cannot be formed on hotplugged + memory. +======================== ======================================================= + +ZONE_MOVABLE +============ + +ZONE_MOVABLE is an important mechanism for more reliable memory offlining. +Further, having system RAM managed by ZONE_MOVABLE instead of one of the +kernel zones can increase the number of possible transparent huge pages and +dynamically allocated huge pages. + +Most kernel allocations are unmovable. Important examples include the memory +map (usually 1/64ths of memory), page tables, and kmalloc(). Such allocations +can only be served from the kernel zones. + +Most user space pages, such as anonymous memory, and page cache pages are +movable. Such allocations can be served from ZONE_MOVABLE and the kernel zones. + +Only movable allocations are served from ZONE_MOVABLE, resulting in unmovable +allocations being limited to the kernel zones. Without ZONE_MOVABLE, there is +absolutely no guarantee whether a memory block can be offlined successfully. + +Zone Imbalances +--------------- - This can happen when we have plenty of ZONE_MOVABLE memory, but not enough - kernel memory to allocate vmemmmap pages. We may even be able to migrate - huge page contents, but will not be able to dissolve the source huge page. - This will prevent an offline operation and is unfortunate as memory offlining - is expected to succeed on movable zones. Users that depend on memory hotplug - to succeed for movable zones should carefully consider whether the memory - savings gained from this feature are worth the risk of possibly not being - able to offline memory in certain situations. +Having too much system RAM managed by ZONE_MOVABLE is called a zone imbalance, +which can harm the system or degrade performance. As one example, the kernel +might crash because it runs out of free memory for unmovable allocations, +although there is still plenty of free memory left in ZONE_MOVABLE. + +Usually, MOVABLE:KERNEL ratios of up to 3:1 or even 4:1 are fine. Ratios of 63:1 +are definitely impossible due to the overhead for the memory map. + +Actual safe zone ratios depend on the workload. Extreme cases, like excessive +long-term pinning of pages, might not be able to deal with ZONE_MOVABLE at all. .. note:: - Techniques that rely on long-term pinnings of memory (especially, RDMA and - vfio) are fundamentally problematic with ZONE_MOVABLE and, therefore, memory - hot remove. Pinned pages cannot reside on ZONE_MOVABLE, to guarantee that - memory can still get hot removed - be aware that pinning can fail even if - there is plenty of free memory in ZONE_MOVABLE. In addition, using - ZONE_MOVABLE might make page pinning more expensive, because pages have to be - migrated off that zone first. -.. _memory_hotplug_how_to_offline_memory: + CMA memory part of a kernel zone essentially behaves like memory in + ZONE_MOVABLE and similar considerations apply, especially when combining + CMA with ZONE_MOVABLE. -How to offline memory ---------------------- +ZONE_MOVABLE Sizing Considerations +---------------------------------- -You can offline a memory block by using the same sysfs interface that was used -in memory onlining:: +We usually expect that a large portion of available system RAM will actually +be consumed by user space, either directly or indirectly via the page cache. In +the normal case, ZONE_MOVABLE can be used when allocating such pages just fine. - % echo offline > /sys/devices/system/memory/memoryXXX/state +With that in mind, it makes sense that we can have a big portion of system RAM +managed by ZONE_MOVABLE. However, there are some things to consider when using +ZONE_MOVABLE, especially when fine-tuning zone ratios: + +- Having a lot of offline memory blocks. Even offline memory blocks consume + memory for metadata and page tables in the direct map; having a lot of offline + memory blocks is not a typical case, though. + +- Memory ballooning without balloon compaction is incompatible with + ZONE_MOVABLE. Only some implementations, such as virtio-balloon and + pseries CMM, fully support balloon compaction. + + Further, the CONFIG_BALLOON_COMPACTION kernel configuration option might be + disabled. In that case, balloon inflation will only perform unmovable + allocations and silently create a zone imbalance, usually triggered by + inflation requests from the hypervisor. + +- Gigantic pages are unmovable, resulting in user space consuming a + lot of unmovable memory. + +- Huge pages are unmovable when an architectures does not support huge + page migration, resulting in a similar issue as with gigantic pages. + +- Page tables are unmovable. Excessive swapping, mapping extremely large + files or ZONE_DEVICE memory can be problematic, although only really relevant + in corner cases. When we manage a lot of user space memory that has been + swapped out or is served from a file/persistent memory/... we still need a lot + of page tables to manage that memory once user space accessed that memory. + +- In certain DAX configurations the memory map for the device memory will be + allocated from the kernel zones. + +- KASAN can have a significant memory overhead, for example, consuming 1/8th of + the total system memory size as (unmovable) tracking metadata. + +- Long-term pinning of pages. Techniques that rely on long-term pinnings + (especially, RDMA and vfio/mdev) are fundamentally problematic with + ZONE_MOVABLE, and therefore, memory offlining. Pinned pages cannot reside + on ZONE_MOVABLE as that would turn these pages unmovable. Therefore, they + have to be migrated off that zone while pinning. Pinning a page can fail + even if there is plenty of free memory in ZONE_MOVABLE. + + In addition, using ZONE_MOVABLE might make page pinning more expensive, + because of the page migration overhead. + +By default, all the memory configured at boot time is managed by the kernel +zones and ZONE_MOVABLE is not used. + +To enable ZONE_MOVABLE to include the memory present at boot and to control the +ratio between movable and kernel zones there are two command line options: +``kernelcore=`` and ``movablecore=``. See +Documentation/admin-guide/kernel-parameters.rst for their description. + +Memory Offlining and ZONE_MOVABLE +--------------------------------- + +Even with ZONE_MOVABLE, there are some corner cases where offlining a memory +block might fail: + +- Memory blocks with memory holes; this applies to memory blocks present during + boot and can apply to memory blocks hotplugged via the XEN balloon and the + Hyper-V balloon. + +- Mixed NUMA nodes and mixed zones within a single memory block prevent memory + offlining; this applies to memory blocks present during boot only. + +- Special memory blocks prevented by the system from getting offlined. Examples + include any memory available during boot on arm64 or memory blocks spanning + the crashkernel area on s390x; this usually applies to memory blocks present + during boot only. + +- Memory blocks overlapping with CMA areas cannot be offlined, this applies to + memory blocks present during boot only. + +- Concurrent activity that operates on the same physical memory area, such as + allocating gigantic pages, can result in temporary offlining failures. + +- Out of memory when dissolving huge pages, especially when freeing unused + vmemmap pages associated with each hugetlb page is enabled. + + Offlining code may be able to migrate huge page contents, but may not be able + to dissolve the source huge page because it fails allocating (unmovable) pages + for the vmemmap, because the system might not have free memory in the kernel + zones left. + + Users that depend on memory offlining to succeed for movable zones should + carefully consider whether the memory savings gained from this feature are + worth the risk of possibly not being able to offline memory in certain + situations. + +Further, when running into out of memory situations while migrating pages, or +when still encountering permanently unmovable pages within ZONE_MOVABLE +(-> BUG), memory offlining will keep retrying until it eventually succeeds. + +When offlining is triggered from user space, the offlining context can be +terminated by sending a fatal signal. A timeout based offlining can easily be +implemented via:: -If offline succeeds, the state of the memory block is changed to be "offline". -If it fails, some error core (like -EBUSY) will be returned by the kernel. -Even if a memory block does not belong to ZONE_MOVABLE, you can try to offline -it. If it doesn't contain 'unmovable' memory, you'll get success. - -A memory block under ZONE_MOVABLE is considered to be able to be offlined -easily. But under some busy state, it may return -EBUSY. Even if a memory -block cannot be offlined due to -EBUSY, you can retry offlining it and may be -able to offline it (or not). (For example, a page is referred to by some kernel -internal call and released soon.) - -Consideration: - Memory hotplug's design direction is to make the possibility of memory - offlining higher and to guarantee unplugging memory under any situation. But - it needs more work. Returning -EBUSY under some situation may be good because - the user can decide to retry more or not by himself. Currently, memory - offlining code does some amount of retry with 120 seconds timeout. - -Physical memory remove -====================== - -Need more implementation yet.... - - Notification completion of remove works by OS to firmware. - - Guard from remove if not yet. - - -Locking Internals -================= - -When adding/removing memory that uses memory block devices (i.e. ordinary RAM), -the device_hotplug_lock should be held to: - -- synchronize against online/offline requests (e.g. via sysfs). This way, memory - block devices can only be accessed (.online/.state attributes) by user - space once memory has been fully added. And when removing memory, we - know nobody is in critical sections. -- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC) - -Especially, there is a possible lock inversion that is avoided using -device_hotplug_lock when adding memory and user space tries to online that -memory faster than expected: - -- device_online() will first take the device_lock(), followed by - mem_hotplug_lock -- add_memory_resource() will first take the mem_hotplug_lock, followed by - the device_lock() (while creating the devices, during bus_add_device()). - -As the device is visible to user space before taking the device_lock(), this -can result in a lock inversion. - -onlining/offlining of memory should be done via device_online()/ -device_offline() - to make sure it is properly synchronized to actions -via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type) - -When adding/removing/onlining/offlining memory or adding/removing -heterogeneous/device memory, we should always hold the mem_hotplug_lock in -write mode to serialise memory hotplug (e.g. access to global/zone -variables). - -In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read -mode allows for a quite efficient get_online_mems/put_online_mems -implementation, so code accessing memory can protect from that memory -vanishing. - - -Future Work -=========== - - - allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like - sysctl or new control file. - - showing memory block and physical device relationship. - - test and make it better memory offlining. - - support HugeTLB page migration and offlining. - - memmap removing at memory offline. - - physical remove memory. + % timeout $TIMEOUT offline_block | failure_handling diff --git a/Documentation/arm/marvell.rst b/Documentation/arm/marvell.rst index 85169bc3f538..56bb592dbd0c 100644 --- a/Documentation/arm/marvell.rst +++ b/Documentation/arm/marvell.rst @@ -140,6 +140,7 @@ EBU Armada family - 88F6821 Armada 382 - 88F6W21 Armada 383 - 88F6820 Armada 385 + - 88F6825 - 88F6828 Armada 388 - Product infos: https://web.archive.org/web/20181006144616/http://www.marvell.com/embedded-processors/armada-38x/ diff --git a/Documentation/block/blk-mq.rst b/Documentation/block/blk-mq.rst index d96118c73954..31f52f326971 100644 --- a/Documentation/block/blk-mq.rst +++ b/Documentation/block/blk-mq.rst @@ -54,7 +54,7 @@ layer or if we want to try to merge requests. In both cases, requests will be sent to the software queue. Then, after the requests are processed by software queues, they will be placed -at the hardware queue, a second stage queue were the hardware has direct access +at the hardware queue, a second stage queue where the hardware has direct access to process those requests. However, if the hardware does not have enough resources to accept more requests, blk-mq will places requests on a temporary queue, to be sent in the future, when the hardware is able. diff --git a/Documentation/conf.py b/Documentation/conf.py index 75650f6443af..948a97d6387d 100644 --- a/Documentation/conf.py +++ b/Documentation/conf.py @@ -463,8 +463,8 @@ latex_elements['preamble'] += ''' \\newcommand{\\kerneldocEndTC}{} \\newcommand{\\kerneldocBeginKR}{} \\newcommand{\\kerneldocEndKR}{} - \\newcommand{\\kerneldocBeginSC}{} - \\newcommand{\\kerneldocEndKR}{} + \\newcommand{\\kerneldocBeginJP}{} + \\newcommand{\\kerneldocEndJP}{} } ''' diff --git a/Documentation/core-api/cpu_hotplug.rst b/Documentation/core-api/cpu_hotplug.rst index b66e3cae1472..c6f4ba2fb32d 100644 --- a/Documentation/core-api/cpu_hotplug.rst +++ b/Documentation/core-api/cpu_hotplug.rst @@ -2,12 +2,13 @@ CPU hotplug in the Kernel ========================= -:Date: December, 2016 +:Date: September, 2021 :Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>, - Rusty Russell <rusty@rustcorp.com.au>, - Srivatsa Vaddagiri <vatsa@in.ibm.com>, - Ashok Raj <ashok.raj@intel.com>, - Joel Schopp <jschopp@austin.ibm.com> + Rusty Russell <rusty@rustcorp.com.au>, + Srivatsa Vaddagiri <vatsa@in.ibm.com>, + Ashok Raj <ashok.raj@intel.com>, + Joel Schopp <jschopp@austin.ibm.com>, + Thomas Gleixner <tglx@linutronix.de> Introduction ============ @@ -158,100 +159,480 @@ at state ``CPUHP_OFFLINE``. This includes: * Once all services are migrated, kernel calls an arch specific routine ``__cpu_disable()`` to perform arch specific cleanup. -Using the hotplug API ---------------------- - -It is possible to receive notifications once a CPU is offline or onlined. This -might be important to certain drivers which need to perform some kind of setup -or clean up functions based on the number of available CPUs:: - - #include <linux/cpuhotplug.h> - - ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "X/Y:online", - Y_online, Y_prepare_down); - -*X* is the subsystem and *Y* the particular driver. The *Y_online* callback -will be invoked during registration on all online CPUs. If an error -occurs during the online callback the *Y_prepare_down* callback will be -invoked on all CPUs on which the online callback was previously invoked. -After registration completed, the *Y_online* callback will be invoked -once a CPU is brought online and *Y_prepare_down* will be invoked when a -CPU is shutdown. All resources which were previously allocated in -*Y_online* should be released in *Y_prepare_down*. -The return value *ret* is negative if an error occurred during the -registration process. Otherwise a positive value is returned which -contains the allocated hotplug for dynamically allocated states -(*CPUHP_AP_ONLINE_DYN*). It will return zero for predefined states. - -The callback can be remove by invoking ``cpuhp_remove_state()``. In case of a -dynamically allocated state (*CPUHP_AP_ONLINE_DYN*) use the returned state. -During the removal of a hotplug state the teardown callback will be invoked. - -Multiple instances -~~~~~~~~~~~~~~~~~~ - -If a driver has multiple instances and each instance needs to perform the -callback independently then it is likely that a ''multi-state'' should be used. -First a multi-state state needs to be registered:: - - ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "X/Y:online, - Y_online, Y_prepare_down); - Y_hp_online = ret; - -The ``cpuhp_setup_state_multi()`` behaves similar to ``cpuhp_setup_state()`` -except it prepares the callbacks for a multi state and does not invoke -the callbacks. This is a one time setup. -Once a new instance is allocated, you need to register this new instance:: - - ret = cpuhp_state_add_instance(Y_hp_online, &d->node); - -This function will add this instance to your previously allocated -*Y_hp_online* state and invoke the previously registered callback -(*Y_online*) on all online CPUs. The *node* element is a ``struct -hlist_node`` member of your per-instance data structure. - -On removal of the instance:: - - cpuhp_state_remove_instance(Y_hp_online, &d->node) - -should be invoked which will invoke the teardown callback on all online -CPUs. - -Manual setup -~~~~~~~~~~~~ - -Usually it is handy to invoke setup and teardown callbacks on registration or -removal of a state because usually the operation needs to performed once a CPU -goes online (offline) and during initial setup (shutdown) of the driver. However -each registration and removal function is also available with a ``_nocalls`` -suffix which does not invoke the provided callbacks if the invocation of the -callbacks is not desired. During the manual setup (or teardown) the functions -``cpus_read_lock()`` and ``cpus_read_unlock()`` should be used to inhibit CPU -hotplug operations. - - -The ordering of the events --------------------------- - -The hotplug states are defined in ``include/linux/cpuhotplug.h``: - -* The states *CPUHP_OFFLINE* … *CPUHP_AP_OFFLINE* are invoked before the - CPU is up. -* The states *CPUHP_AP_OFFLINE* … *CPUHP_AP_ONLINE* are invoked - just the after the CPU has been brought up. The interrupts are off and - the scheduler is not yet active on this CPU. Starting with *CPUHP_AP_OFFLINE* - the callbacks are invoked on the target CPU. -* The states between *CPUHP_AP_ONLINE_DYN* and *CPUHP_AP_ONLINE_DYN_END* are - reserved for the dynamic allocation. -* The states are invoked in the reverse order on CPU shutdown starting with - *CPUHP_ONLINE* and stopping at *CPUHP_OFFLINE*. Here the callbacks are - invoked on the CPU that will be shutdown until *CPUHP_AP_OFFLINE*. - -A dynamically allocated state via *CPUHP_AP_ONLINE_DYN* is often enough. -However if an earlier invocation during the bring up or shutdown is required -then an explicit state should be acquired. An explicit state might also be -required if the hotplug event requires specific ordering in respect to -another hotplug event. + +The CPU hotplug API +=================== + +CPU hotplug state machine +------------------------- + +CPU hotplug uses a trivial state machine with a linear state space from +CPUHP_OFFLINE to CPUHP_ONLINE. Each state has a startup and a teardown +callback. + +When a CPU is onlined, the startup callbacks are invoked sequentially until +the state CPUHP_ONLINE is reached. They can also be invoked when the +callbacks of a state are set up or an instance is added to a multi-instance +state. + +When a CPU is offlined the teardown callbacks are invoked in the reverse +order sequentially until the state CPUHP_OFFLINE is reached. They can also +be invoked when the callbacks of a state are removed or an instance is +removed from a multi-instance state. + +If a usage site requires only a callback in one direction of the hotplug +operations (CPU online or CPU offline) then the other not-required callback +can be set to NULL when the state is set up. + +The state space is divided into three sections: + +* The PREPARE section + + The PREPARE section covers the state space from CPUHP_OFFLINE to + CPUHP_BRINGUP_CPU. + + The startup callbacks in this section are invoked before the CPU is + started during a CPU online operation. The teardown callbacks are invoked + after the CPU has become dysfunctional during a CPU offline operation. + + The callbacks are invoked on a control CPU as they can't obviously run on + the hotplugged CPU which is either not yet started or has become + dysfunctional already. + + The startup callbacks are used to setup resources which are required to + bring a CPU successfully online. The teardown callbacks are used to free + resources or to move pending work to an online CPU after the hotplugged + CPU became dysfunctional. + + The startup callbacks are allowed to fail. If a callback fails, the CPU + online operation is aborted and the CPU is brought down to the previous + state (usually CPUHP_OFFLINE) again. + + The teardown callbacks in this section are not allowed to fail. + +* The STARTING section + + The STARTING section covers the state space between CPUHP_BRINGUP_CPU + 1 + and CPUHP_AP_ONLINE. + + The startup callbacks in this section are invoked on the hotplugged CPU + with interrupts disabled during a CPU online operation in the early CPU + setup code. The teardown callbacks are invoked with interrupts disabled + on the hotplugged CPU during a CPU offline operation shortly before the + CPU is completely shut down. + + The callbacks in this section are not allowed to fail. + + The callbacks are used for low level hardware initialization/shutdown and + for core subsystems. + +* The ONLINE section + + The ONLINE section covers the state space between CPUHP_AP_ONLINE + 1 and + CPUHP_ONLINE. + + The startup callbacks in this section are invoked on the hotplugged CPU + during a CPU online operation. The teardown callbacks are invoked on the + hotplugged CPU during a CPU offline operation. + + The callbacks are invoked in the context of the per CPU hotplug thread, + which is pinned on the hotplugged CPU. The callbacks are invoked with + interrupts and preemption enabled. + + The callbacks are allowed to fail. When a callback fails the hotplug + operation is aborted and the CPU is brought back to the previous state. + +CPU online/offline operations +----------------------------- + +A successful online operation looks like this:: + + [CPUHP_OFFLINE] + [CPUHP_OFFLINE + 1]->startup() -> success + [CPUHP_OFFLINE + 2]->startup() -> success + [CPUHP_OFFLINE + 3] -> skipped because startup == NULL + ... + [CPUHP_BRINGUP_CPU]->startup() -> success + === End of PREPARE section + [CPUHP_BRINGUP_CPU + 1]->startup() -> success + ... + [CPUHP_AP_ONLINE]->startup() -> success + === End of STARTUP section + [CPUHP_AP_ONLINE + 1]->startup() -> success + ... + [CPUHP_ONLINE - 1]->startup() -> success + [CPUHP_ONLINE] + +A successful offline operation looks like this:: + + [CPUHP_ONLINE] + [CPUHP_ONLINE - 1]->teardown() -> success + ... + [CPUHP_AP_ONLINE + 1]->teardown() -> success + === Start of STARTUP section + [CPUHP_AP_ONLINE]->teardown() -> success + ... + [CPUHP_BRINGUP_ONLINE - 1]->teardown() + ... + === Start of PREPARE section + [CPUHP_BRINGUP_CPU]->teardown() + [CPUHP_OFFLINE + 3]->teardown() + [CPUHP_OFFLINE + 2] -> skipped because teardown == NULL + [CPUHP_OFFLINE + 1]->teardown() + [CPUHP_OFFLINE] + +A failed online operation looks like this:: + + [CPUHP_OFFLINE] + [CPUHP_OFFLINE + 1]->startup() -> success + [CPUHP_OFFLINE + 2]->startup() -> success + [CPUHP_OFFLINE + 3] -> skipped because startup == NULL + ... + [CPUHP_BRINGUP_CPU]->startup() -> success + === End of PREPARE section + [CPUHP_BRINGUP_CPU + 1]->startup() -> success + ... + [CPUHP_AP_ONLINE]->startup() -> success + === End of STARTUP section + [CPUHP_AP_ONLINE + 1]->startup() -> success + --- + [CPUHP_AP_ONLINE + N]->startup() -> fail + [CPUHP_AP_ONLINE + (N - 1)]->teardown() + ... + [CPUHP_AP_ONLINE + 1]->teardown() + === Start of STARTUP section + [CPUHP_AP_ONLINE]->teardown() + ... + [CPUHP_BRINGUP_ONLINE - 1]->teardown() + ... + === Start of PREPARE section + [CPUHP_BRINGUP_CPU]->teardown() + [CPUHP_OFFLINE + 3]->teardown() + [CPUHP_OFFLINE + 2] -> skipped because teardown == NULL + [CPUHP_OFFLINE + 1]->teardown() + [CPUHP_OFFLINE] + +A failed offline operation looks like this:: + + [CPUHP_ONLINE] + [CPUHP_ONLINE - 1]->teardown() -> success + ... + [CPUHP_ONLINE - N]->teardown() -> fail + [CPUHP_ONLINE - (N - 1)]->startup() + ... + [CPUHP_ONLINE - 1]->startup() + [CPUHP_ONLINE] + +Recursive failures cannot be handled sensibly. Look at the following +example of a recursive fail due to a failed offline operation: :: + + [CPUHP_ONLINE] + [CPUHP_ONLINE - 1]->teardown() -> success + ... + [CPUHP_ONLINE - N]->teardown() -> fail + [CPUHP_ONLINE - (N - 1)]->startup() -> success + [CPUHP_ONLINE - (N - 2)]->startup() -> fail + +The CPU hotplug state machine stops right here and does not try to go back +down again because that would likely result in an endless loop:: + + [CPUHP_ONLINE - (N - 1)]->teardown() -> success + [CPUHP_ONLINE - N]->teardown() -> fail + [CPUHP_ONLINE - (N - 1)]->startup() -> success + [CPUHP_ONLINE - (N - 2)]->startup() -> fail + [CPUHP_ONLINE - (N - 1)]->teardown() -> success + [CPUHP_ONLINE - N]->teardown() -> fail + +Lather, rinse and repeat. In this case the CPU left in state:: + + [CPUHP_ONLINE - (N - 1)] + +which at least lets the system make progress and gives the user a chance to +debug or even resolve the situation. + +Allocating a state +------------------ + +There are two ways to allocate a CPU hotplug state: + +* Static allocation + + Static allocation has to be used when the subsystem or driver has + ordering requirements versus other CPU hotplug states. E.g. the PERF core + startup callback has to be invoked before the PERF driver startup + callbacks during a CPU online operation. During a CPU offline operation + the driver teardown callbacks have to be invoked before the core teardown + callback. The statically allocated states are described by constants in + the cpuhp_state enum which can be found in include/linux/cpuhotplug.h. + + Insert the state into the enum at the proper place so the ordering + requirements are fulfilled. The state constant has to be used for state + setup and removal. + + Static allocation is also required when the state callbacks are not set + up at runtime and are part of the initializer of the CPU hotplug state + array in kernel/cpu.c. + +* Dynamic allocation + + When there are no ordering requirements for the state callbacks then + dynamic allocation is the preferred method. The state number is allocated + by the setup function and returned to the caller on success. + + Only the PREPARE and ONLINE sections provide a dynamic allocation + range. The STARTING section does not as most of the callbacks in that + section have explicit ordering requirements. + +Setup of a CPU hotplug state +---------------------------- + +The core code provides the following functions to setup a state: + +* cpuhp_setup_state(state, name, startup, teardown) +* cpuhp_setup_state_nocalls(state, name, startup, teardown) +* cpuhp_setup_state_cpuslocked(state, name, startup, teardown) +* cpuhp_setup_state_nocalls_cpuslocked(state, name, startup, teardown) + +For cases where a driver or a subsystem has multiple instances and the same +CPU hotplug state callbacks need to be invoked for each instance, the CPU +hotplug core provides multi-instance support. The advantage over driver +specific instance lists is that the instance related functions are fully +serialized against CPU hotplug operations and provide the automatic +invocations of the state callbacks on add and removal. To set up such a +multi-instance state the following function is available: + +* cpuhp_setup_state_multi(state, name, startup, teardown) + +The @state argument is either a statically allocated state or one of the +constants for dynamically allocated states - CPUHP_PREPARE_DYN, +CPUHP_ONLINE_DYN - depending on the state section (PREPARE, ONLINE) for +which a dynamic state should be allocated. + +The @name argument is used for sysfs output and for instrumentation. The +naming convention is "subsys:mode" or "subsys/driver:mode", +e.g. "perf:mode" or "perf/x86:mode". The common mode names are: + +======== ======================================================= +prepare For states in the PREPARE section + +dead For states in the PREPARE section which do not provide + a startup callback + +starting For states in the STARTING section + +dying For states in the STARTING section which do not provide + a startup callback + +online For states in the ONLINE section + +offline For states in the ONLINE section which do not provide + a startup callback +======== ======================================================= + +As the @name argument is only used for sysfs and instrumentation other mode +descriptors can be used as well if they describe the nature of the state +better than the common ones. + +Examples for @name arguments: "perf/online", "perf/x86:prepare", +"RCU/tree:dying", "sched/waitempty" + +The @startup argument is a function pointer to the callback which should be +invoked during a CPU online operation. If the usage site does not require a +startup callback set the pointer to NULL. + +The @teardown argument is a function pointer to the callback which should +be invoked during a CPU offline operation. If the usage site does not +require a teardown callback set the pointer to NULL. + +The functions differ in the way how the installed callbacks are treated: + + * cpuhp_setup_state_nocalls(), cpuhp_setup_state_nocalls_cpuslocked() + and cpuhp_setup_state_multi() only install the callbacks + + * cpuhp_setup_state() and cpuhp_setup_state_cpuslocked() install the + callbacks and invoke the @startup callback (if not NULL) for all online + CPUs which have currently a state greater than the newly installed + state. Depending on the state section the callback is either invoked on + the current CPU (PREPARE section) or on each online CPU (ONLINE + section) in the context of the CPU's hotplug thread. + + If a callback fails for CPU N then the teardown callback for CPU + 0 .. N-1 is invoked to rollback the operation. The state setup fails, + the callbacks for the state are not installed and in case of dynamic + allocation the allocated state is freed. + +The state setup and the callback invocations are serialized against CPU +hotplug operations. If the setup function has to be called from a CPU +hotplug read locked region, then the _cpuslocked() variants have to be +used. These functions cannot be used from within CPU hotplug callbacks. + +The function return values: + ======== =================================================================== + 0 Statically allocated state was successfully set up + + >0 Dynamically allocated state was successfully set up. + + The returned number is the state number which was allocated. If + the state callbacks have to be removed later, e.g. module + removal, then this number has to be saved by the caller and used + as @state argument for the state remove function. For + multi-instance states the dynamically allocated state number is + also required as @state argument for the instance add/remove + operations. + + <0 Operation failed + ======== =================================================================== + +Removal of a CPU hotplug state +------------------------------ + +To remove a previously set up state, the following functions are provided: + +* cpuhp_remove_state(state) +* cpuhp_remove_state_nocalls(state) +* cpuhp_remove_state_nocalls_cpuslocked(state) +* cpuhp_remove_multi_state(state) + +The @state argument is either a statically allocated state or the state +number which was allocated in the dynamic range by cpuhp_setup_state*(). If +the state is in the dynamic range, then the state number is freed and +available for dynamic allocation again. + +The functions differ in the way how the installed callbacks are treated: + + * cpuhp_remove_state_nocalls(), cpuhp_remove_state_nocalls_cpuslocked() + and cpuhp_remove_multi_state() only remove the callbacks. + + * cpuhp_remove_state() removes the callbacks and invokes the teardown + callback (if not NULL) for all online CPUs which have currently a state + greater than the removed state. Depending on the state section the + callback is either invoked on the current CPU (PREPARE section) or on + each online CPU (ONLINE section) in the context of the CPU's hotplug + thread. + + In order to complete the removal, the teardown callback should not fail. + +The state removal and the callback invocations are serialized against CPU +hotplug operations. If the remove function has to be called from a CPU +hotplug read locked region, then the _cpuslocked() variants have to be +used. These functions cannot be used from within CPU hotplug callbacks. + +If a multi-instance state is removed then the caller has to remove all +instances first. + +Multi-Instance state instance management +---------------------------------------- + +Once the multi-instance state is set up, instances can be added to the +state: + + * cpuhp_state_add_instance(state, node) + * cpuhp_state_add_instance_nocalls(state, node) + +The @state argument is either a statically allocated state or the state +number which was allocated in the dynamic range by cpuhp_setup_state_multi(). + +The @node argument is a pointer to an hlist_node which is embedded in the +instance's data structure. The pointer is handed to the multi-instance +state callbacks and can be used by the callback to retrieve the instance +via container_of(). + +The functions differ in the way how the installed callbacks are treated: + + * cpuhp_state_add_instance_nocalls() and only adds the instance to the + multi-instance state's node list. + + * cpuhp_state_add_instance() adds the instance and invokes the startup + callback (if not NULL) associated with @state for all online CPUs which + have currently a state greater than @state. The callback is only + invoked for the to be added instance. Depending on the state section + the callback is either invoked on the current CPU (PREPARE section) or + on each online CPU (ONLINE section) in the context of the CPU's hotplug + thread. + + If a callback fails for CPU N then the teardown callback for CPU + 0 .. N-1 is invoked to rollback the operation, the function fails and + the instance is not added to the node list of the multi-instance state. + +To remove an instance from the state's node list these functions are +available: + + * cpuhp_state_remove_instance(state, node) + * cpuhp_state_remove_instance_nocalls(state, node) + +The arguments are the same as for the the cpuhp_state_add_instance*() +variants above. + +The functions differ in the way how the installed callbacks are treated: + + * cpuhp_state_remove_instance_nocalls() only removes the instance from the + state's node list. + + * cpuhp_state_remove_instance() removes the instance and invokes the + teardown callback (if not NULL) associated with @state for all online + CPUs which have currently a state greater than @state. The callback is + only invoked for the to be removed instance. Depending on the state + section the callback is either invoked on the current CPU (PREPARE + section) or on each online CPU (ONLINE section) in the context of the + CPU's hotplug thread. + + In order to complete the removal, the teardown callback should not fail. + +The node list add/remove operations and the callback invocations are +serialized against CPU hotplug operations. These functions cannot be used +from within CPU hotplug callbacks and CPU hotplug read locked regions. + +Examples +-------- + +Setup and teardown a statically allocated state in the STARTING section for +notifications on online and offline operations:: + + ret = cpuhp_setup_state(CPUHP_SUBSYS_STARTING, "subsys:starting", subsys_cpu_starting, subsys_cpu_dying); + if (ret < 0) + return ret; + .... + cpuhp_remove_state(CPUHP_SUBSYS_STARTING); + +Setup and teardown a dynamically allocated state in the ONLINE section +for notifications on offline operations:: + + state = cpuhp_setup_state(CPUHP_ONLINE_DYN, "subsys:offline", NULL, subsys_cpu_offline); + if (state < 0) + return state; + .... + cpuhp_remove_state(state); + +Setup and teardown a dynamically allocated state in the ONLINE section +for notifications on online operations without invoking the callbacks:: + + state = cpuhp_setup_state_nocalls(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpu_online, NULL); + if (state < 0) + return state; + .... + cpuhp_remove_state_nocalls(state); + +Setup, use and teardown a dynamically allocated multi-instance state in the +ONLINE section for notifications on online and offline operation:: + + state = cpuhp_setup_state_multi(CPUHP_ONLINE_DYN, "subsys:online", subsys_cpu_online, subsys_cpu_offline); + if (state < 0) + return state; + .... + ret = cpuhp_state_add_instance(state, &inst1->node); + if (ret) + return ret; + .... + ret = cpuhp_state_add_instance(state, &inst2->node); + if (ret) + return ret; + .... + cpuhp_remove_instance(state, &inst1->node); + .... + cpuhp_remove_instance(state, &inst2->node); + .... + remove_multi_state(state); + Testing of hotplug states ========================= diff --git a/Documentation/core-api/kernel-api.rst b/Documentation/core-api/kernel-api.rst index 2a7444e3a4c2..2e7186805148 100644 --- a/Documentation/core-api/kernel-api.rst +++ b/Documentation/core-api/kernel-api.rst @@ -315,6 +315,9 @@ Block Devices .. kernel-doc:: block/genhd.c :export: +.. kernel-doc:: block/bdev.c + :export: + Char devices ============ diff --git a/Documentation/cpu-freq/cpu-drivers.rst b/Documentation/cpu-freq/cpu-drivers.rst index d84ededb66f9..3b32336a7803 100644 --- a/Documentation/cpu-freq/cpu-drivers.rst +++ b/Documentation/cpu-freq/cpu-drivers.rst @@ -75,9 +75,6 @@ And optionally .resume - A pointer to a per-policy resume function which is called with interrupts disabled and _before_ the governor is started again. - .ready - A pointer to a per-policy ready function which is called after - the policy is fully initialized. - .attr - A pointer to a NULL-terminated list of "struct freq_attr" which allow to export values to sysfs. diff --git a/Documentation/dev-tools/kfence.rst b/Documentation/dev-tools/kfence.rst index fdf04e741ea5..0fbe3308bf37 100644 --- a/Documentation/dev-tools/kfence.rst +++ b/Documentation/dev-tools/kfence.rst @@ -65,25 +65,27 @@ Error reports A typical out-of-bounds access looks like this:: ================================================================== - BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa3/0x22b + BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0xa6/0x234 - Out-of-bounds read at 0xffffffffb672efff (1B left of kfence-#17): - test_out_of_bounds_read+0xa3/0x22b - kunit_try_run_case+0x51/0x85 + Out-of-bounds read at 0xffff8c3f2e291fff (1B left of kfence-#72): + test_out_of_bounds_read+0xa6/0x234 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30 - kfence-#17 [0xffffffffb672f000-0xffffffffb672f01f, size=32, cache=kmalloc-32] allocated by task 507: - test_alloc+0xf3/0x25b - test_out_of_bounds_read+0x98/0x22b - kunit_try_run_case+0x51/0x85 + kfence-#72: 0xffff8c3f2e292000-0xffff8c3f2e29201f, size=32, cache=kmalloc-32 + + allocated by task 484 on cpu 0 at 32.919330s: + test_alloc+0xfe/0x738 + test_out_of_bounds_read+0x9b/0x234 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30 - CPU: 4 PID: 107 Comm: kunit_try_catch Not tainted 5.8.0-rc6+ #7 - Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + CPU: 0 PID: 484 Comm: kunit_try_catch Not tainted 5.13.0-rc3+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 ================================================================== The header of the report provides a short summary of the function involved in @@ -96,30 +98,32 @@ Use-after-free accesses are reported as:: ================================================================== BUG: KFENCE: use-after-free read in test_use_after_free_read+0xb3/0x143 - Use-after-free read at 0xffffffffb673dfe0 (in kfence-#24): + Use-after-free read at 0xffff8c3f2e2a0000 (in kfence-#79): test_use_after_free_read+0xb3/0x143 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30 - kfence-#24 [0xffffffffb673dfe0-0xffffffffb673dfff, size=32, cache=kmalloc-32] allocated by task 507: - test_alloc+0xf3/0x25b + kfence-#79: 0xffff8c3f2e2a0000-0xffff8c3f2e2a001f, size=32, cache=kmalloc-32 + + allocated by task 488 on cpu 2 at 33.871326s: + test_alloc+0xfe/0x738 test_use_after_free_read+0x76/0x143 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30 - freed by task 507: + freed by task 488 on cpu 2 at 33.871358s: test_use_after_free_read+0xa8/0x143 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30 - CPU: 4 PID: 109 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7 - Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + CPU: 2 PID: 488 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 ================================================================== KFENCE also reports on invalid frees, such as double-frees:: @@ -127,30 +131,32 @@ KFENCE also reports on invalid frees, such as double-frees:: ================================================================== BUG: KFENCE: invalid free in test_double_free+0xdc/0x171 - Invalid free of 0xffffffffb6741000: + Invalid free of 0xffff8c3f2e2a4000 (in kfence-#81): test_double_free+0xdc/0x171 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30 - kfence-#26 [0xffffffffb6741000-0xffffffffb674101f, size=32, cache=kmalloc-32] allocated by task 507: - test_alloc+0xf3/0x25b + kfence-#81: 0xffff8c3f2e2a4000-0xffff8c3f2e2a401f, size=32, cache=kmalloc-32 + + allocated by task 490 on cpu 1 at 34.175321s: + test_alloc+0xfe/0x738 test_double_free+0x76/0x171 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30 - freed by task 507: + freed by task 490 on cpu 1 at 34.175348s: test_double_free+0xa8/0x171 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30 - CPU: 4 PID: 111 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7 - Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + CPU: 1 PID: 490 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 ================================================================== KFENCE also uses pattern-based redzones on the other side of an object's guard @@ -160,23 +166,25 @@ These are reported on frees:: ================================================================== BUG: KFENCE: memory corruption in test_kmalloc_aligned_oob_write+0xef/0x184 - Corrupted memory at 0xffffffffb6797ff9 [ 0xac . . . . . . ] (in kfence-#69): + Corrupted memory at 0xffff8c3f2e33aff9 [ 0xac . . . . . . ] (in kfence-#156): test_kmalloc_aligned_oob_write+0xef/0x184 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30 - kfence-#69 [0xffffffffb6797fb0-0xffffffffb6797ff8, size=73, cache=kmalloc-96] allocated by task 507: - test_alloc+0xf3/0x25b + kfence-#156: 0xffff8c3f2e33afb0-0xffff8c3f2e33aff8, size=73, cache=kmalloc-96 + + allocated by task 502 on cpu 7 at 42.159302s: + test_alloc+0xfe/0x738 test_kmalloc_aligned_oob_write+0x57/0x184 - kunit_try_run_case+0x51/0x85 + kunit_try_run_case+0x61/0xa0 kunit_generic_run_threadfn_adapter+0x16/0x30 - kthread+0x137/0x160 + kthread+0x176/0x1b0 ret_from_fork+0x22/0x30 - CPU: 4 PID: 120 Comm: kunit_try_catch Tainted: G W 5.8.0-rc6+ #7 - Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 + CPU: 7 PID: 502 Comm: kunit_try_catch Tainted: G B 5.13.0-rc3+ #7 + Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 ================================================================== For such errors, the address where the corruption occurred as well as the diff --git a/Documentation/devicetree/bindings/auxdisplay/hit,hd44780.yaml b/Documentation/devicetree/bindings/auxdisplay/hit,hd44780.yaml index 9222b06e93a0..fde07e4b119d 100644 --- a/Documentation/devicetree/bindings/auxdisplay/hit,hd44780.yaml +++ b/Documentation/devicetree/bindings/auxdisplay/hit,hd44780.yaml @@ -12,7 +12,10 @@ maintainers: description: The Hitachi HD44780 Character LCD Controller is commonly used on character LCDs that can display one or more lines of text. It exposes an M6800 bus - interface, which can be used in either 4-bit or 8-bit mode. + interface, which can be used in either 4-bit or 8-bit mode. By using a + GPIO expander it is possible to use the driver with one of the popular I2C + expander boards based on the PCF8574 available for these displays. For + an example see below. properties: compatible: @@ -94,3 +97,29 @@ examples: display-height-chars = <2>; display-width-chars = <16>; }; + - | + #include <dt-bindings/gpio/gpio.h> + i2c { + #address-cells = <1>; + #size-cells = <0>; + + pcf8574: pcf8574@27 { + compatible = "nxp,pcf8574"; + reg = <0x27>; + gpio-controller; + #gpio-cells = <2>; + }; + }; + hd44780 { + compatible = "hit,hd44780"; + display-height-chars = <2>; + display-width-chars = <16>; + data-gpios = <&pcf8574 4 0>, + <&pcf8574 5 0>, + <&pcf8574 6 0>, + <&pcf8574 7 0>; + enable-gpios = <&pcf8574 2 0>; + rs-gpios = <&pcf8574 0 0>; + rw-gpios = <&pcf8574 1 0>; + backlight-gpios = <&pcf8574 3 0>; + }; diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-dt.txt b/Documentation/devicetree/bindings/cpufreq/cpufreq-dt.txt index 56f442374383..1d7e49167666 100644 --- a/Documentation/devicetree/bindings/cpufreq/cpufreq-dt.txt +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-dt.txt @@ -11,7 +11,7 @@ Required properties: - None Optional properties: -- operating-points: Refer to Documentation/devicetree/bindings/opp/opp.txt for +- operating-points: Refer to Documentation/devicetree/bindings/opp/opp-v1.yaml for details. OPPs *must* be supplied either via DT, i.e. this property, or populated at runtime. - clock-latency: Specify the possible maximum transition latency for clock, diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek-hw.yaml b/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek-hw.yaml new file mode 100644 index 000000000000..9cd42a64b13e --- /dev/null +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek-hw.yaml @@ -0,0 +1,70 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/cpufreq/cpufreq-mediatek-hw.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: MediaTek's CPUFREQ Bindings + +maintainers: + - Hector Yuan <hector.yuan@mediatek.com> + +description: + CPUFREQ HW is a hardware engine used by MediaTek SoCs to + manage frequency in hardware. It is capable of controlling + frequency for multiple clusters. + +properties: + compatible: + const: mediatek,cpufreq-hw + + reg: + minItems: 1 + maxItems: 2 + description: + Addresses and sizes for the memory of the HW bases in + each frequency domain. Each entry corresponds to + a register bank for each frequency domain present. + + "#performance-domain-cells": + description: + Number of cells in a performance domain specifier. + Set const to 1 here for nodes providing multiple + performance domains. + const: 1 + +required: + - compatible + - reg + - "#performance-domain-cells" + +additionalProperties: false + +examples: + - | + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu0: cpu@0 { + device_type = "cpu"; + compatible = "arm,cortex-a55"; + enable-method = "psci"; + performance-domains = <&performance 0>; + reg = <0x000>; + }; + }; + + /* ... */ + + soc { + #address-cells = <2>; + #size-cells = <2>; + + performance: performance-controller@11bc00 { + compatible = "mediatek,cpufreq-hw"; + reg = <0 0x0011bc10 0 0x120>, <0 0x0011bd30 0 0x120>; + + #performance-domain-cells = <1>; + }; + }; diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt b/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt index ef68711716fb..b8233ec91d3d 100644 --- a/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-mediatek.txt @@ -10,7 +10,7 @@ Required properties: transition and not stable yet. Please refer to Documentation/devicetree/bindings/clock/clock-bindings.txt for generic clock consumer properties. -- operating-points-v2: Please refer to Documentation/devicetree/bindings/opp/opp.txt +- operating-points-v2: Please refer to Documentation/devicetree/bindings/opp/opp-v2.yaml for detail. - proc-supply: Regulator for Vproc of CPU cluster. diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt b/Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt index d91a02a3b6b0..6b0b452acef0 100644 --- a/Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-st.txt @@ -6,8 +6,6 @@ from the SoC, then supplies the OPP framework with 'prop' and 'supported hardware' information respectively. The framework is then able to read the DT and operate in the usual way. -For more information about the expected DT format [See: ../opp/opp.txt]. - Frequency Scaling only ---------------------- @@ -15,7 +13,7 @@ No vendor specific driver required for this. Located in CPU's node: -- operating-points : [See: ../power/opp.txt] +- operating-points : [See: ../power/opp-v1.yaml] Example [safe] -------------- @@ -37,7 +35,7 @@ This requires the ST CPUFreq driver to supply 'process' and 'version' info. Located in CPU's node: -- operating-points-v2 : [See ../power/opp.txt] +- operating-points-v2 : [See ../power/opp-v2.yaml] Example [unsafe] ---------------- diff --git a/Documentation/devicetree/bindings/cpufreq/nvidia,tegra20-cpufreq.txt b/Documentation/devicetree/bindings/cpufreq/nvidia,tegra20-cpufreq.txt index 52a24b82fd86..bdbfd7c36101 100644 --- a/Documentation/devicetree/bindings/cpufreq/nvidia,tegra20-cpufreq.txt +++ b/Documentation/devicetree/bindings/cpufreq/nvidia,tegra20-cpufreq.txt @@ -4,7 +4,7 @@ Binding for NVIDIA Tegra20 CPUFreq Required properties: - clocks: Must contain an entry for the CPU clock. See ../clocks/clock-bindings.txt for details. -- operating-points-v2: See ../bindings/opp/opp.txt for details. +- operating-points-v2: See ../bindings/opp/opp-v2.yaml for details. - #cooling-cells: Should be 2. See ../thermal/thermal-cooling-devices.yaml for details. For each opp entry in 'operating-points-v2' table: diff --git a/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt b/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt index ac189dd82b08..3fbeb3733c48 100644 --- a/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt +++ b/Documentation/devicetree/bindings/devfreq/rk3399_dmc.txt @@ -8,7 +8,7 @@ Required properties: - clocks: Phandles for clock specified in "clock-names" property - clock-names : The name of clock used by the DFI, must be "pclk_ddr_mon"; -- operating-points-v2: Refer to Documentation/devicetree/bindings/opp/opp.txt +- operating-points-v2: Refer to Documentation/devicetree/bindings/opp/opp-v2.yaml for details. - center-supply: DMC supply node. - status: Marks the node enabled/disabled. diff --git a/Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml b/Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml index 4265399bb154..c851770bbdf2 100644 --- a/Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml +++ b/Documentation/devicetree/bindings/display/msm/dsi-phy-7nm.yaml @@ -14,10 +14,10 @@ allOf: properties: compatible: - oneOf: - - const: qcom,dsi-phy-7nm - - const: qcom,dsi-phy-7nm-8150 - - const: qcom,sc7280-dsi-phy-7nm + enum: + - qcom,dsi-phy-7nm + - qcom,dsi-phy-7nm-8150 + - qcom,sc7280-dsi-phy-7nm reg: items: diff --git a/Documentation/devicetree/bindings/dma/altr,msgdma.yaml b/Documentation/devicetree/bindings/dma/altr,msgdma.yaml index a4f9fe23dcd9..b193ee2db4a7 100644 --- a/Documentation/devicetree/bindings/dma/altr,msgdma.yaml +++ b/Documentation/devicetree/bindings/dma/altr,msgdma.yaml @@ -24,13 +24,15 @@ properties: items: - description: Control and Status Register Slave Port - description: Descriptor Slave Port - - description: Response Slave Port + - description: Response Slave Port (Optional) + minItems: 2 reg-names: items: - const: csr - const: desc - const: resp + minItems: 2 interrupts: maxItems: 1 diff --git a/Documentation/devicetree/bindings/dma/renesas,rz-dmac.yaml b/Documentation/devicetree/bindings/dma/renesas,rz-dmac.yaml new file mode 100644 index 000000000000..7a4f415d74dc --- /dev/null +++ b/Documentation/devicetree/bindings/dma/renesas,rz-dmac.yaml @@ -0,0 +1,130 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/dma/renesas,rz-dmac.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Renesas RZ/G2L DMA Controller + +maintainers: + - Biju Das <biju.das.jz@bp.renesas.com> + +allOf: + - $ref: "dma-controller.yaml#" + +properties: + compatible: + items: + - enum: + - renesas,r9a07g044-dmac # RZ/G2{L,LC} + - const: renesas,rz-dmac + + reg: + items: + - description: Control and channel register block + - description: DMA extended resource selector block + + interrupts: + maxItems: 17 + + interrupt-names: + items: + - const: error + - const: ch0 + - const: ch1 + - const: ch2 + - const: ch3 + - const: ch4 + - const: ch5 + - const: ch6 + - const: ch7 + - const: ch8 + - const: ch9 + - const: ch10 + - const: ch11 + - const: ch12 + - const: ch13 + - const: ch14 + - const: ch15 + + clocks: + items: + - description: DMA main clock + - description: DMA register access clock + + '#dma-cells': + const: 1 + description: + The cell specifies the encoded MID/RID values of the DMAC port + connected to the DMA client and the slave channel configuration + parameters. + bits[0:9] - Specifies MID/RID value + bit[10] - Specifies DMA request high enable (HIEN) + bit[11] - Specifies DMA request detection type (LVL) + bits[12:14] - Specifies DMAACK output mode (AM) + bit[15] - Specifies Transfer Mode (TM) + + dma-channels: + const: 16 + + power-domains: + maxItems: 1 + + resets: + items: + - description: Reset for DMA ARESETN reset terminal + - description: Reset for DMA RST_ASYNC reset terminal + +required: + - compatible + - reg + - interrupts + - interrupt-names + - clocks + - '#dma-cells' + - dma-channels + - power-domains + - resets + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/clock/r9a07g044-cpg.h> + + dmac: dma-controller@11820000 { + compatible = "renesas,r9a07g044-dmac", + "renesas,rz-dmac"; + reg = <0x11820000 0x10000>, + <0x11830000 0x10000>; + interrupts = <GIC_SPI 141 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 125 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 126 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 127 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 128 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 129 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 130 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 131 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 132 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 133 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 134 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 135 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 136 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 137 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 138 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 139 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 140 IRQ_TYPE_EDGE_RISING>; + interrupt-names = "error", + "ch0", "ch1", "ch2", "ch3", + "ch4", "ch5", "ch6", "ch7", + "ch8", "ch9", "ch10", "ch11", + "ch12", "ch13", "ch14", "ch15"; + clocks = <&cpg CPG_MOD R9A07G044_DMAC_ACLK>, + <&cpg CPG_MOD R9A07G044_DMAC_PCLK>; + power-domains = <&cpg>; + resets = <&cpg R9A07G044_DMAC_ARESETN>, + <&cpg R9A07G044_DMAC_RST_ASYNC>; + #dma-cells = <1>; + dma-channels = <16>; + }; diff --git a/Documentation/devicetree/bindings/dma/st,stm32-dma.yaml b/Documentation/devicetree/bindings/dma/st,stm32-dma.yaml index 2a5325f480f6..4bf676fd25dc 100644 --- a/Documentation/devicetree/bindings/dma/st,stm32-dma.yaml +++ b/Documentation/devicetree/bindings/dma/st,stm32-dma.yaml @@ -40,6 +40,13 @@ description: | 0x0: FIFO mode with threshold selectable with bit 0-1 0x1: Direct mode: each DMA request immediately initiates a transfer from/to the memory, FIFO is bypassed. + -bit 4: alternative DMA request/acknowledge protocol + 0x0: Use standard DMA ACK management, where ACK signal is maintained + up to the removal of request and transfer completion + 0x1: Use alternative DMA ACK management, where ACK de-assertion does + not wait for the de-assertion of the REQuest, ACK is only managed + by transfer completion. This must only be used on channels + managing transfers for STM32 USART/UART. maintainers: diff --git a/Documentation/devicetree/bindings/gpio/gpio-virtio.yaml b/Documentation/devicetree/bindings/gpio/gpio-virtio.yaml new file mode 100644 index 000000000000..601d85754577 --- /dev/null +++ b/Documentation/devicetree/bindings/gpio/gpio-virtio.yaml @@ -0,0 +1,59 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/gpio/gpio-virtio.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Virtio GPIO controller + +maintainers: + - Viresh Kumar <viresh.kumar@linaro.org> + +allOf: + - $ref: /schemas/virtio/virtio-device.yaml# + +description: + Virtio GPIO controller, see /schemas/virtio/virtio-device.yaml for more + details. + +properties: + $nodename: + const: gpio + + compatible: + const: virtio,device29 + + gpio-controller: true + + "#gpio-cells": + const: 2 + + interrupt-controller: true + + "#interrupt-cells": + const: 2 + +required: + - compatible + - gpio-controller + - "#gpio-cells" + +unevaluatedProperties: false + +examples: + - | + virtio@3000 { + compatible = "virtio,mmio"; + reg = <0x3000 0x100>; + interrupts = <41>; + + gpio { + compatible = "virtio,device29"; + gpio-controller; + #gpio-cells = <2>; + interrupt-controller; + #interrupt-cells = <2>; + }; + }; + +... diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml index c5f6092a2855..6f98dd55fb4c 100644 --- a/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml +++ b/Documentation/devicetree/bindings/gpu/arm,mali-bifrost.yaml @@ -137,7 +137,7 @@ examples: resets = <&reset 0>, <&reset 1>; }; - gpu_opp_table: opp_table0 { + gpu_opp_table: opp-table { compatible = "operating-points-v2"; opp-533000000 { diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.yaml b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.yaml index 696c17aedbbe..d209f272625d 100644 --- a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.yaml +++ b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.yaml @@ -160,7 +160,7 @@ examples: #cooling-cells = <2>; }; - gpu_opp_table: opp_table0 { + gpu_opp_table: opp-table { compatible = "operating-points-v2"; opp-533000000 { diff --git a/Documentation/devicetree/bindings/i2c/i2c-virtio.yaml b/Documentation/devicetree/bindings/i2c/i2c-virtio.yaml new file mode 100644 index 000000000000..7d87ed855301 --- /dev/null +++ b/Documentation/devicetree/bindings/i2c/i2c-virtio.yaml @@ -0,0 +1,51 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/i2c/i2c-virtio.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Virtio I2C Adapter + +maintainers: + - Viresh Kumar <viresh.kumar@linaro.org> + +allOf: + - $ref: /schemas/i2c/i2c-controller.yaml# + - $ref: /schemas/virtio/virtio-device.yaml# + +description: + Virtio I2C device, see /schemas/virtio/virtio-device.yaml for more details. + +properties: + $nodename: + const: i2c + + compatible: + const: virtio,device22 + +required: + - compatible + +unevaluatedProperties: false + +examples: + - | + virtio@3000 { + compatible = "virtio,mmio"; + reg = <0x3000 0x100>; + interrupts = <41>; + + i2c { + compatible = "virtio,device22"; + + #address-cells = <1>; + #size-cells = <0>; + + light-sensor@20 { + compatible = "dynaimage,al3320a"; + reg = <0x20>; + }; + }; + }; + +... diff --git a/Documentation/devicetree/bindings/input/allwinner,sun4i-a10-lradc-keys.yaml b/Documentation/devicetree/bindings/input/allwinner,sun4i-a10-lradc-keys.yaml index cffd02028d02..d74f2002409e 100644 --- a/Documentation/devicetree/bindings/input/allwinner,sun4i-a10-lradc-keys.yaml +++ b/Documentation/devicetree/bindings/input/allwinner,sun4i-a10-lradc-keys.yaml @@ -29,6 +29,8 @@ properties: description: Regulator for the LRADC reference voltage + wakeup-source: true + patternProperties: "^button-[0-9]+$": type: object diff --git a/Documentation/devicetree/bindings/input/qcom,pm8941-pwrkey.txt b/Documentation/devicetree/bindings/input/qcom,pm8941-pwrkey.txt deleted file mode 100644 index 6cd08bca2c66..000000000000 --- a/Documentation/devicetree/bindings/input/qcom,pm8941-pwrkey.txt +++ /dev/null @@ -1,55 +0,0 @@ -Qualcomm PM8941 PMIC Power Key - -PROPERTIES - -- compatible: - Usage: required - Value type: <string> - Definition: must be one of: - "qcom,pm8941-pwrkey" - "qcom,pm8941-resin" - "qcom,pmk8350-pwrkey" - "qcom,pmk8350-resin" - -- reg: - Usage: required - Value type: <prop-encoded-array> - Definition: base address of registers for block - -- interrupts: - Usage: required - Value type: <prop-encoded-array> - Definition: key change interrupt; The format of the specifier is - defined by the binding document describing the node's - interrupt parent. - -- debounce: - Usage: optional - Value type: <u32> - Definition: time in microseconds that key must be pressed or released - for state change interrupt to trigger. - -- bias-pull-up: - Usage: optional - Value type: <empty> - Definition: presence of this property indicates that the KPDPWR_N pin - should be configured for pull up. - -- linux,code: - Usage: optional - Value type: <u32> - Definition: The input key-code associated with the power key. - Use the linux event codes defined in - include/dt-bindings/input/linux-event-codes.h - When property is omitted KEY_POWER is assumed. - -EXAMPLE - - pwrkey@800 { - compatible = "qcom,pm8941-pwrkey"; - reg = <0x800>; - interrupts = <0x0 0x8 0 IRQ_TYPE_EDGE_BOTH>; - debounce = <15625>; - bias-pull-up; - linux,code = <KEY_POWER>; - }; diff --git a/Documentation/devicetree/bindings/input/qcom,pm8941-pwrkey.yaml b/Documentation/devicetree/bindings/input/qcom,pm8941-pwrkey.yaml new file mode 100644 index 000000000000..62314a5fdce5 --- /dev/null +++ b/Documentation/devicetree/bindings/input/qcom,pm8941-pwrkey.yaml @@ -0,0 +1,51 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/input/qcom,pm8941-pwrkey.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm PM8941 PMIC Power Key + +maintainers: + - Courtney Cavin <courtney.cavin@sonymobile.com> + - Vinod Koul <vkoul@kernel.org> + +allOf: + - $ref: input.yaml# + +properties: + compatible: + enum: + - qcom,pm8941-pwrkey + - qcom,pm8941-resin + - qcom,pmk8350-pwrkey + - qcom,pmk8350-resin + + interrupts: + maxItems: 1 + + debounce: + description: | + Time in microseconds that key must be pressed or + released for state change interrupt to trigger. + $ref: /schemas/types.yaml#/definitions/uint32 + + bias-pull-up: + description: | + Presence of this property indicates that the KPDPWR_N + pin should be configured for pull up. + $ref: /schemas/types.yaml#/definitions/flag + + linux,code: + description: | + The input key-code associated with the power key. + Use the linux event codes defined in + include/dt-bindings/input/linux-event-codes.h + When property is omitted KEY_POWER is assumed. + +required: + - compatible + - interrupts + +unevaluatedProperties: false +... diff --git a/Documentation/devicetree/bindings/input/regulator-haptic.txt b/Documentation/devicetree/bindings/input/regulator-haptic.txt deleted file mode 100644 index 3ed1c7eb2f97..000000000000 --- a/Documentation/devicetree/bindings/input/regulator-haptic.txt +++ /dev/null @@ -1,21 +0,0 @@ -* Regulator Haptic Device Tree Bindings - -Required Properties: - - compatible : Should be "regulator-haptic" - - haptic-supply : Power supply to the haptic motor. - [*] refer Documentation/devicetree/bindings/regulator/regulator.txt - - - max-microvolt : The maximum voltage value supplied to the haptic motor. - [The unit of the voltage is a micro] - - - min-microvolt : The minimum voltage value supplied to the haptic motor. - [The unit of the voltage is a micro] - -Example: - - haptics { - compatible = "regulator-haptic"; - haptic-supply = <&motor_regulator>; - max-microvolt = <2700000>; - min-microvolt = <1100000>; - }; diff --git a/Documentation/devicetree/bindings/input/regulator-haptic.yaml b/Documentation/devicetree/bindings/input/regulator-haptic.yaml new file mode 100644 index 000000000000..b1ae72f9cd2d --- /dev/null +++ b/Documentation/devicetree/bindings/input/regulator-haptic.yaml @@ -0,0 +1,43 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: "http://devicetree.org/schemas/input/regulator-haptic.yaml#" +$schema: "http://devicetree.org/meta-schemas/core.yaml#" + +title: Regulator Haptic Device Tree Bindings + +maintainers: + - Jaewon Kim <jaewon02.kim@samsung.com> + +properties: + compatible: + const: regulator-haptic + + haptic-supply: + description: > + Power supply to the haptic motor + + max-microvolt: + description: > + The maximum voltage value supplied to the haptic motor + + min-microvolt: + description: > + The minimum voltage value supplied to the haptic motor + +required: + - compatible + - haptic-supply + - max-microvolt + - min-microvolt + +additionalProperties: false + +examples: + - | + haptics { + compatible = "regulator-haptic"; + haptic-supply = <&motor_regulator>; + max-microvolt = <2700000>; + min-microvolt = <1100000>; + }; diff --git a/Documentation/devicetree/bindings/input/touchscreen/chipone,icn8318.yaml b/Documentation/devicetree/bindings/input/touchscreen/chipone,icn8318.yaml new file mode 100644 index 000000000000..9df685bdc5db --- /dev/null +++ b/Documentation/devicetree/bindings/input/touchscreen/chipone,icn8318.yaml @@ -0,0 +1,62 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/input/touchscreen/chipone,icn8318.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: ChipOne ICN8318 Touchscreen Controller Device Tree Bindings + +maintainers: + - Dmitry Torokhov <dmitry.torokhov@gmail.com> + +allOf: + - $ref: touchscreen.yaml# + +properties: + compatible: + const: chipone,icn8318 + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + wake-gpios: + maxItems: 1 + +unevaluatedProperties: false + +required: + - compatible + - reg + - interrupts + - wake-gpios + - touchscreen-size-x + - touchscreen-size-y + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + + i2c { + #address-cells = <1>; + #size-cells = <0>; + + touchscreen@40 { + compatible = "chipone,icn8318"; + reg = <0x40>; + interrupt-parent = <&pio>; + interrupts = <9 IRQ_TYPE_EDGE_FALLING>; /* EINT9 (PG9) */ + pinctrl-names = "default"; + pinctrl-0 = <&ts_wake_pin_p66>; + wake-gpios = <&pio 1 3 GPIO_ACTIVE_HIGH>; /* PB3 */ + touchscreen-size-x = <800>; + touchscreen-size-y = <480>; + touchscreen-inverted-x; + touchscreen-swapped-x-y; + }; + }; + +... diff --git a/Documentation/devicetree/bindings/input/touchscreen/chipone_icn8318.txt b/Documentation/devicetree/bindings/input/touchscreen/chipone_icn8318.txt deleted file mode 100644 index 38b0603f65f3..000000000000 --- a/Documentation/devicetree/bindings/input/touchscreen/chipone_icn8318.txt +++ /dev/null @@ -1,44 +0,0 @@ -* ChipOne icn8318 I2C touchscreen controller - -Required properties: - - compatible : "chipone,icn8318" - - reg : I2C slave address of the chip (0x40) - - interrupts : interrupt specification for the icn8318 interrupt - - wake-gpios : GPIO specification for the WAKE input - - touchscreen-size-x : horizontal resolution of touchscreen (in pixels) - - touchscreen-size-y : vertical resolution of touchscreen (in pixels) - -Optional properties: - - pinctrl-names : should be "default" - - pinctrl-0: : a phandle pointing to the pin settings for the - control gpios - - touchscreen-fuzz-x : horizontal noise value of the absolute input - device (in pixels) - - touchscreen-fuzz-y : vertical noise value of the absolute input - device (in pixels) - - touchscreen-inverted-x : X axis is inverted (boolean) - - touchscreen-inverted-y : Y axis is inverted (boolean) - - touchscreen-swapped-x-y : X and Y axis are swapped (boolean) - Swapping is done after inverting the axis - -Example: - -i2c@00000000 { - /* ... */ - - chipone_icn8318@40 { - compatible = "chipone,icn8318"; - reg = <0x40>; - interrupt-parent = <&pio>; - interrupts = <9 IRQ_TYPE_EDGE_FALLING>; /* EINT9 (PG9) */ - pinctrl-names = "default"; - pinctrl-0 = <&ts_wake_pin_p66>; - wake-gpios = <&pio 1 3 GPIO_ACTIVE_HIGH>; /* PB3 */ - touchscreen-size-x = <800>; - touchscreen-size-y = <480>; - touchscreen-inverted-x; - touchscreen-swapped-x-y; - }; - - /* ... */ -}; diff --git a/Documentation/devicetree/bindings/input/touchscreen/pixcir,pixcir_ts.yaml b/Documentation/devicetree/bindings/input/touchscreen/pixcir,pixcir_ts.yaml new file mode 100644 index 000000000000..f9998edbff70 --- /dev/null +++ b/Documentation/devicetree/bindings/input/touchscreen/pixcir,pixcir_ts.yaml @@ -0,0 +1,68 @@ +# SPDX-License-Identifier: GPL-2.0 +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/input/touchscreen/pixcir,pixcir_ts.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Pixcir Touchscreen Controller Device Tree Bindings + +maintainers: + - Dmitry Torokhov <dmitry.torokhov@gmail.com> + +allOf: + - $ref: touchscreen.yaml# + +properties: + compatible: + enum: + - pixcir,pixcir_ts + - pixcir,pixcir_tangoc + + reg: + maxItems: 1 + + interrupts: + maxItems: 1 + + attb-gpio: + maxItems: 1 + + reset-gpios: + maxItems: 1 + + enable-gpios: + maxItems: 1 + + wake-gpios: + maxItems: 1 + +unevaluatedProperties: false + +required: + - compatible + - reg + - interrupts + - attb-gpio + - touchscreen-size-x + - touchscreen-size-y + +examples: + - | + #include <dt-bindings/gpio/gpio.h> + #include <dt-bindings/interrupt-controller/arm-gic.h> + + i2c { + #address-cells = <1>; + #size-cells = <0>; + + touchscreen@5c { + compatible = "pixcir,pixcir_ts"; + reg = <0x5c>; + interrupts = <2 0>; + attb-gpio = <&gpf 2 0 2>; + touchscreen-size-x = <800>; + touchscreen-size-y = <600>; + }; + }; + +... diff --git a/Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt b/Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt deleted file mode 100644 index 697a3e7831e7..000000000000 --- a/Documentation/devicetree/bindings/input/touchscreen/pixcir_i2c_ts.txt +++ /dev/null @@ -1,31 +0,0 @@ -* Pixcir I2C touchscreen controllers - -Required properties: -- compatible: must be "pixcir,pixcir_ts" or "pixcir,pixcir_tangoc" -- reg: I2C address of the chip -- interrupts: interrupt to which the chip is connected -- attb-gpio: GPIO connected to the ATTB line of the chip -- touchscreen-size-x: horizontal resolution of touchscreen (in pixels) -- touchscreen-size-y: vertical resolution of touchscreen (in pixels) - -Optional properties: -- reset-gpios: GPIO connected to the RESET line of the chip -- enable-gpios: GPIO connected to the ENABLE line of the chip -- wake-gpios: GPIO connected to the WAKE line of the chip - -Example: - - i2c@00000000 { - /* ... */ - - pixcir_ts@5c { - compatible = "pixcir,pixcir_ts"; - reg = <0x5c>; - interrupts = <2 0>; - attb-gpio = <&gpf 2 0 2>; - touchscreen-size-x = <800>; - touchscreen-size-y = <600>; - }; - - /* ... */ - }; diff --git a/Documentation/devicetree/bindings/input/touchscreen/ti,tsc2005.yaml b/Documentation/devicetree/bindings/input/touchscreen/ti,tsc2005.yaml new file mode 100644 index 000000000000..938aab016cc2 --- /dev/null +++ b/Documentation/devicetree/bindings/input/touchscreen/ti,tsc2005.yaml @@ -0,0 +1,128 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/input/touchscreen/ti,tsc2005.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Texas Instruments TSC2004 and TSC2005 touchscreen controller bindings + +maintainers: + - Marek Vasut <marex@denx.de> + - Michael Welling <mwelling@ieee.org> + +properties: + $nodename: + pattern: "^touchscreen(@.*)?$" + + compatible: + enum: + - ti,tsc2004 + - ti,tsc2005 + + reg: + maxItems: 1 + description: | + I2C address when used on the I2C bus, or the SPI chip select index + when used on the SPI bus + + interrupts: + maxItems: 1 + + reset-gpios: + maxItems: 1 + description: GPIO specifier for the controller reset line + + spi-max-frequency: + description: TSC2005 SPI bus clock frequency. + maximum: 25000000 + + ti,x-plate-ohms: + description: resistance of the touchscreen's X plates in ohm (defaults to 280) + + ti,esd-recovery-timeout-ms: + description: | + if the touchscreen does not respond after the configured time + (in milli seconds), the driver will reset it. This is disabled + by default. + + vio-supply: + description: Regulator specifier + + touchscreen-fuzz-pressure: true + touchscreen-fuzz-x: true + touchscreen-fuzz-y: true + touchscreen-max-pressure: true + touchscreen-size-x: true + touchscreen-size-y: true + +allOf: + - $ref: touchscreen.yaml# + - if: + properties: + compatible: + contains: + const: ti,tsc2004 + then: + properties: + spi-max-frequency: false + +additionalProperties: false + +required: + - compatible + - reg + - interrupts + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/gpio/gpio.h> + i2c { + #address-cells = <1>; + #size-cells = <0>; + touchscreen@48 { + compatible = "ti,tsc2004"; + reg = <0x48>; + vio-supply = <&vio>; + + reset-gpios = <&gpio4 8 GPIO_ACTIVE_HIGH>; + interrupts-extended = <&gpio1 27 IRQ_TYPE_EDGE_RISING>; + + touchscreen-fuzz-x = <4>; + touchscreen-fuzz-y = <7>; + touchscreen-fuzz-pressure = <2>; + touchscreen-size-x = <4096>; + touchscreen-size-y = <4096>; + touchscreen-max-pressure = <2048>; + + ti,x-plate-ohms = <280>; + ti,esd-recovery-timeout-ms = <8000>; + }; + }; + - | + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/gpio/gpio.h> + spi { + #address-cells = <1>; + #size-cells = <0>; + touchscreen@0 { + compatible = "ti,tsc2005"; + spi-max-frequency = <6000000>; + reg = <0>; + + vio-supply = <&vio>; + + reset-gpios = <&gpio4 8 GPIO_ACTIVE_HIGH>; /* 104 */ + interrupts-extended = <&gpio4 4 IRQ_TYPE_EDGE_RISING>; /* 100 */ + + touchscreen-fuzz-x = <4>; + touchscreen-fuzz-y = <7>; + touchscreen-fuzz-pressure = <2>; + touchscreen-size-x = <4096>; + touchscreen-size-y = <4096>; + touchscreen-max-pressure = <2048>; + + ti,x-plate-ohms = <280>; + ti,esd-recovery-timeout-ms = <8000>; + }; + }; diff --git a/Documentation/devicetree/bindings/input/touchscreen/tsc2005.txt b/Documentation/devicetree/bindings/input/touchscreen/tsc2005.txt deleted file mode 100644 index b80c04b0e5c0..000000000000 --- a/Documentation/devicetree/bindings/input/touchscreen/tsc2005.txt +++ /dev/null @@ -1,64 +0,0 @@ -* Texas Instruments tsc2004 and tsc2005 touchscreen controllers - -Required properties: - - compatible : "ti,tsc2004" or "ti,tsc2005" - - reg : Device address - - interrupts : IRQ specifier - - spi-max-frequency : Maximum SPI clocking speed of the device - (for tsc2005) - -Optional properties: - - vio-supply : Regulator specifier - - reset-gpios : GPIO specifier for the controller reset line - - ti,x-plate-ohms : integer, resistance of the touchscreen's X plates - in ohm (defaults to 280) - - ti,esd-recovery-timeout-ms : integer, if the touchscreen does not respond after - the configured time (in milli seconds), the driver - will reset it. This is disabled by default. - - properties defined in touchscreen.txt - -Example: - -&i2c3 { - tsc2004@48 { - compatible = "ti,tsc2004"; - reg = <0x48>; - vio-supply = <&vio>; - - reset-gpios = <&gpio4 8 GPIO_ACTIVE_HIGH>; - interrupts-extended = <&gpio1 27 IRQ_TYPE_EDGE_RISING>; - - touchscreen-fuzz-x = <4>; - touchscreen-fuzz-y = <7>; - touchscreen-fuzz-pressure = <2>; - touchscreen-size-x = <4096>; - touchscreen-size-y = <4096>; - touchscreen-max-pressure = <2048>; - - ti,x-plate-ohms = <280>; - ti,esd-recovery-timeout-ms = <8000>; - }; -} - -&mcspi1 { - tsc2005@0 { - compatible = "ti,tsc2005"; - spi-max-frequency = <6000000>; - reg = <0>; - - vio-supply = <&vio>; - - reset-gpios = <&gpio4 8 GPIO_ACTIVE_HIGH>; /* 104 */ - interrupts-extended = <&gpio4 4 IRQ_TYPE_EDGE_RISING>; /* 100 */ - - touchscreen-fuzz-x = <4>; - touchscreen-fuzz-y = <7>; - touchscreen-fuzz-pressure = <2>; - touchscreen-size-x = <4096>; - touchscreen-size-y = <4096>; - touchscreen-max-pressure = <2048>; - - ti,x-plate-ohms = <280>; - ti,esd-recovery-timeout-ms = <8000>; - }; -} diff --git a/Documentation/devicetree/bindings/interconnect/fsl,imx8m-noc.yaml b/Documentation/devicetree/bindings/interconnect/fsl,imx8m-noc.yaml index a8873739d61a..b8204ed22dd5 100644 --- a/Documentation/devicetree/bindings/interconnect/fsl,imx8m-noc.yaml +++ b/Documentation/devicetree/bindings/interconnect/fsl,imx8m-noc.yaml @@ -81,10 +81,10 @@ examples: noc_opp_table: opp-table { compatible = "operating-points-v2"; - opp-133M { + opp-133333333 { opp-hz = /bits/ 64 <133333333>; }; - opp-800M { + opp-800000000 { opp-hz = /bits/ 64 <800000000>; }; }; diff --git a/Documentation/devicetree/bindings/opp/allwinner,sun50i-h6-operating-points.yaml b/Documentation/devicetree/bindings/opp/allwinner,sun50i-h6-operating-points.yaml index aeff2bd774dd..729ae97b63d9 100644 --- a/Documentation/devicetree/bindings/opp/allwinner,sun50i-h6-operating-points.yaml +++ b/Documentation/devicetree/bindings/opp/allwinner,sun50i-h6-operating-points.yaml @@ -18,6 +18,9 @@ description: | sun50i-cpufreq-nvmem driver reads the efuse value from the SoC to provide the OPP framework with required information. +allOf: + - $ref: opp-v2-base.yaml# + properties: compatible: const: allwinner,sun50i-h6-operating-points @@ -43,6 +46,7 @@ patternProperties: properties: opp-hz: true + clock-latency-ns: true patternProperties: "opp-microvolt-.*": true diff --git a/Documentation/devicetree/bindings/opp/opp-v1.yaml b/Documentation/devicetree/bindings/opp/opp-v1.yaml new file mode 100644 index 000000000000..d585d536a3fb --- /dev/null +++ b/Documentation/devicetree/bindings/opp/opp-v1.yaml @@ -0,0 +1,51 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/opp/opp-v1.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Generic OPP (Operating Performance Points) v1 Bindings + +maintainers: + - Viresh Kumar <viresh.kumar@linaro.org> + +description: |+ + Devices work at voltage-current-frequency combinations and some implementations + have the liberty of choosing these. These combinations are called Operating + Performance Points aka OPPs. This document defines bindings for these OPPs + applicable across wide range of devices. For illustration purpose, this document + uses CPU as a device. + + This binding only supports voltage-frequency pairs. + +select: true + +properties: + operating-points: + $ref: /schemas/types.yaml#/definitions/uint32-matrix + items: + items: + - description: Frequency in kHz + - description: Voltage for OPP in uV + + +additionalProperties: true +examples: + - | + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a9"; + device_type = "cpu"; + reg = <0>; + next-level-cache = <&L2>; + operating-points = + /* kHz uV */ + <792000 1100000>, + <396000 950000>, + <198000 850000>; + }; + }; +... diff --git a/Documentation/devicetree/bindings/opp/opp-v2-base.yaml b/Documentation/devicetree/bindings/opp/opp-v2-base.yaml new file mode 100644 index 000000000000..ae3ae4d39843 --- /dev/null +++ b/Documentation/devicetree/bindings/opp/opp-v2-base.yaml @@ -0,0 +1,214 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/opp/opp-v2-base.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Generic OPP (Operating Performance Points) Common Binding + +maintainers: + - Viresh Kumar <viresh.kumar@linaro.org> + +description: | + Devices work at voltage-current-frequency combinations and some implementations + have the liberty of choosing these. These combinations are called Operating + Performance Points aka OPPs. This document defines bindings for these OPPs + applicable across wide range of devices. For illustration purpose, this document + uses CPU as a device. + + This describes the OPPs belonging to a device. + +select: false + +properties: + $nodename: + pattern: '^opp-table(-[a-z0-9]+)?$' + + opp-shared: + description: + Indicates that device nodes using this OPP Table Node's phandle switch + their DVFS state together, i.e. they share clock/voltage/current lines. + Missing property means devices have independent clock/voltage/current + lines, but they share OPP tables. + type: boolean + +patternProperties: + '^opp-?[0-9]+$': + type: object + description: + One or more OPP nodes describing voltage-current-frequency combinations. + Their name isn't significant but their phandle can be used to reference an + OPP. These are mandatory except for the case where the OPP table is + present only to indicate dependency between devices using the opp-shared + property. + + properties: + opp-hz: + description: + Frequency in Hz, expressed as a 64-bit big-endian integer. This is a + required property for all device nodes, unless another "required" + property to uniquely identify the OPP nodes exists. Devices like power + domains must have another (implementation dependent) property. + + opp-microvolt: + description: | + Voltage for the OPP + + A single regulator's voltage is specified with an array of size one or three. + Single entry is for target voltage and three entries are for <target min max> + voltages. + + Entries for multiple regulators shall be provided in the same field separated + by angular brackets <>. The OPP binding doesn't provide any provisions to + relate the values to their power supplies or the order in which the supplies + need to be configured and that is left for the implementation specific + binding. + + Entries for all regulators shall be of the same size, i.e. either all use a + single value or triplets. + minItems: 1 + maxItems: 8 # Should be enough regulators + items: + minItems: 1 + maxItems: 3 + + opp-microamp: + description: | + The maximum current drawn by the device in microamperes considering + system specific parameters (such as transients, process, aging, + maximum operating temperature range etc.) as necessary. This may be + used to set the most efficient regulator operating mode. + + Should only be set if opp-microvolt or opp-microvolt-<name> is set for + the OPP. + + Entries for multiple regulators shall be provided in the same field + separated by angular brackets <>. If current values aren't required + for a regulator, then it shall be filled with 0. If current values + aren't required for any of the regulators, then this field is not + required. The OPP binding doesn't provide any provisions to relate the + values to their power supplies or the order in which the supplies need + to be configured and that is left for the implementation specific + binding. + minItems: 1 + maxItems: 8 # Should be enough regulators + + opp-level: + description: + A value representing the performance level of the device. + $ref: /schemas/types.yaml#/definitions/uint32 + + opp-peak-kBps: + description: + Peak bandwidth in kilobytes per second, expressed as an array of + 32-bit big-endian integers. Each element of the array represents the + peak bandwidth value of each interconnect path. The number of elements + should match the number of interconnect paths. + minItems: 1 + maxItems: 32 # Should be enough + + opp-avg-kBps: + description: + Average bandwidth in kilobytes per second, expressed as an array + of 32-bit big-endian integers. Each element of the array represents the + average bandwidth value of each interconnect path. The number of elements + should match the number of interconnect paths. This property is only + meaningful in OPP tables where opp-peak-kBps is present. + minItems: 1 + maxItems: 32 # Should be enough + + clock-latency-ns: + description: + Specifies the maximum possible transition latency (in nanoseconds) for + switching to this OPP from any other OPP. + + turbo-mode: + description: + Marks the OPP to be used only for turbo modes. Turbo mode is available + on some platforms, where the device can run over its operating + frequency for a short duration of time limited by the device's power, + current and thermal limits. + type: boolean + + opp-suspend: + description: + Marks the OPP to be used during device suspend. If multiple OPPs in + the table have this, the OPP with highest opp-hz will be used. + type: boolean + + opp-supported-hw: + description: | + This property allows a platform to enable only a subset of the OPPs + from the larger set present in the OPP table, based on the current + version of the hardware (already known to the operating system). + + Each block present in the array of blocks in this property, represents + a sub-group of hardware versions supported by the OPP. i.e. <sub-group + A>, <sub-group B>, etc. The OPP will be enabled if _any_ of these + sub-groups match the hardware's version. + + Each sub-group is a platform defined array representing the hierarchy + of hardware versions supported by the platform. For a platform with + three hierarchical levels of version (X.Y.Z), this field shall look + like + + opp-supported-hw = <X1 Y1 Z1>, <X2 Y2 Z2>, <X3 Y3 Z3>. + + Each level (eg. X1) in version hierarchy is represented by a 32 bit + value, one bit per version and so there can be maximum 32 versions per + level. Logical AND (&) operation is performed for each level with the + hardware's level version and a non-zero output for _all_ the levels in + a sub-group means the OPP is supported by hardware. A value of + 0xFFFFFFFF for each level in the sub-group will enable the OPP for all + versions for the hardware. + $ref: /schemas/types.yaml#/definitions/uint32-matrix + maxItems: 32 + items: + minItems: 1 + maxItems: 4 + + required-opps: + description: + This contains phandle to an OPP node in another device's OPP table. It + may contain an array of phandles, where each phandle points to an OPP + of a different device. It should not contain multiple phandles to the + OPP nodes in the same OPP table. This specifies the minimum required + OPP of the device(s), whose OPP's phandle is present in this property, + for the functioning of the current device at the current OPP (where + this property is present). + $ref: /schemas/types.yaml#/definitions/phandle-array + + patternProperties: + '^opp-microvolt-': + description: + Named opp-microvolt property. This is exactly similar to the above + opp-microvolt property, but allows multiple voltage ranges to be + provided for the same OPP. At runtime, the platform can pick a <name> + and matching opp-microvolt-<name> property will be enabled for all + OPPs. If the platform doesn't pick a specific <name> or the <name> + doesn't match with any opp-microvolt-<name> properties, then + opp-microvolt property shall be used, if present. + $ref: /schemas/types.yaml#/definitions/uint32-matrix + minItems: 1 + maxItems: 8 # Should be enough regulators + items: + minItems: 1 + maxItems: 3 + + '^opp-microamp-': + description: + Named opp-microamp property. Similar to opp-microvolt-<name> property, + but for microamp instead. + $ref: /schemas/types.yaml#/definitions/uint32-array + minItems: 1 + maxItems: 8 # Should be enough regulators + + dependencies: + opp-avg-kBps: [ opp-peak-kBps ] + +required: + - compatible + +additionalProperties: true + +... diff --git a/Documentation/devicetree/bindings/opp/opp-v2.yaml b/Documentation/devicetree/bindings/opp/opp-v2.yaml new file mode 100644 index 000000000000..eaf8fba2c691 --- /dev/null +++ b/Documentation/devicetree/bindings/opp/opp-v2.yaml @@ -0,0 +1,475 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/opp/opp-v2.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Generic OPP (Operating Performance Points) Bindings + +maintainers: + - Viresh Kumar <viresh.kumar@linaro.org> + +allOf: + - $ref: opp-v2-base.yaml# + +properties: + compatible: + const: operating-points-v2 + +unevaluatedProperties: false + +examples: + - | + /* + * Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states + * together. + */ + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a9"; + device_type = "cpu"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + cpu-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp_table0>; + }; + + cpu@1 { + compatible = "arm,cortex-a9"; + device_type = "cpu"; + reg = <1>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + cpu-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu0_opp_table0>; + }; + }; + + cpu0_opp_table0: opp-table { + compatible = "operating-points-v2"; + opp-shared; + + opp-1000000000 { + opp-hz = /bits/ 64 <1000000000>; + opp-microvolt = <975000 970000 985000>; + opp-microamp = <70000>; + clock-latency-ns = <300000>; + opp-suspend; + }; + opp-1100000000 { + opp-hz = /bits/ 64 <1100000000>; + opp-microvolt = <1000000 980000 1010000>; + opp-microamp = <80000>; + clock-latency-ns = <310000>; + }; + opp-1200000000 { + opp-hz = /bits/ 64 <1200000000>; + opp-microvolt = <1025000>; + clock-latency-ns = <290000>; + turbo-mode; + }; + }; + + - | + /* + * Example 2: Single cluster, Quad-core Qualcom-krait, switches DVFS states + * independently. + */ + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "qcom,krait"; + device_type = "cpu"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + cpu-supply = <&cpu_supply0>; + operating-points-v2 = <&cpu_opp_table>; + }; + + cpu@1 { + compatible = "qcom,krait"; + device_type = "cpu"; + reg = <1>; + next-level-cache = <&L2>; + clocks = <&clk_controller 1>; + clock-names = "cpu"; + cpu-supply = <&cpu_supply1>; + operating-points-v2 = <&cpu_opp_table>; + }; + + cpu@2 { + compatible = "qcom,krait"; + device_type = "cpu"; + reg = <2>; + next-level-cache = <&L2>; + clocks = <&clk_controller 2>; + clock-names = "cpu"; + cpu-supply = <&cpu_supply2>; + operating-points-v2 = <&cpu_opp_table>; + }; + + cpu@3 { + compatible = "qcom,krait"; + device_type = "cpu"; + reg = <3>; + next-level-cache = <&L2>; + clocks = <&clk_controller 3>; + clock-names = "cpu"; + cpu-supply = <&cpu_supply3>; + operating-points-v2 = <&cpu_opp_table>; + }; + }; + + cpu_opp_table: opp-table { + compatible = "operating-points-v2"; + + /* + * Missing opp-shared property means CPUs switch DVFS states + * independently. + */ + + opp-1000000000 { + opp-hz = /bits/ 64 <1000000000>; + opp-microvolt = <975000 970000 985000>; + opp-microamp = <70000>; + clock-latency-ns = <300000>; + opp-suspend; + }; + opp-1100000000 { + opp-hz = /bits/ 64 <1100000000>; + opp-microvolt = <1000000 980000 1010000>; + opp-microamp = <80000>; + clock-latency-ns = <310000>; + }; + opp-1200000000 { + opp-hz = /bits/ 64 <1200000000>; + opp-microvolt = <1025000>; + opp-microamp = <90000>; + lock-latency-ns = <290000>; + turbo-mode; + }; + }; + + - | + /* + * Example 3: Dual-cluster, Dual-core per cluster. CPUs within a cluster switch + * DVFS state together. + */ + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a7"; + device_type = "cpu"; + reg = <0>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + cpu-supply = <&cpu_supply0>; + operating-points-v2 = <&cluster0_opp>; + }; + + cpu@1 { + compatible = "arm,cortex-a7"; + device_type = "cpu"; + reg = <1>; + next-level-cache = <&L2>; + clocks = <&clk_controller 0>; + clock-names = "cpu"; + cpu-supply = <&cpu_supply0>; + operating-points-v2 = <&cluster0_opp>; + }; + + cpu@100 { + compatible = "arm,cortex-a15"; + device_type = "cpu"; + reg = <100>; + next-level-cache = <&L2>; + clocks = <&clk_controller 1>; + clock-names = "cpu"; + cpu-supply = <&cpu_supply1>; + operating-points-v2 = <&cluster1_opp>; + }; + + cpu@101 { + compatible = "arm,cortex-a15"; + device_type = "cpu"; + reg = <101>; + next-level-cache = <&L2>; + clocks = <&clk_controller 1>; + clock-names = "cpu"; + cpu-supply = <&cpu_supply1>; + operating-points-v2 = <&cluster1_opp>; + }; + }; + + cluster0_opp: opp-table-0 { + compatible = "operating-points-v2"; + opp-shared; + + opp-1000000000 { + opp-hz = /bits/ 64 <1000000000>; + opp-microvolt = <975000 970000 985000>; + opp-microamp = <70000>; + clock-latency-ns = <300000>; + opp-suspend; + }; + opp-1100000000 { + opp-hz = /bits/ 64 <1100000000>; + opp-microvolt = <1000000 980000 1010000>; + opp-microamp = <80000>; + clock-latency-ns = <310000>; + }; + opp-1200000000 { + opp-hz = /bits/ 64 <1200000000>; + opp-microvolt = <1025000>; + opp-microamp = <90000>; + clock-latency-ns = <290000>; + turbo-mode; + }; + }; + + cluster1_opp: opp-table-1 { + compatible = "operating-points-v2"; + opp-shared; + + opp-1300000000 { + opp-hz = /bits/ 64 <1300000000>; + opp-microvolt = <1050000 1045000 1055000>; + opp-microamp = <95000>; + clock-latency-ns = <400000>; + opp-suspend; + }; + opp-1400000000 { + opp-hz = /bits/ 64 <1400000000>; + opp-microvolt = <1075000>; + opp-microamp = <100000>; + clock-latency-ns = <400000>; + }; + opp-1500000000 { + opp-hz = /bits/ 64 <1500000000>; + opp-microvolt = <1100000 1010000 1110000>; + opp-microamp = <95000>; + clock-latency-ns = <400000>; + turbo-mode; + }; + }; + + - | + /* Example 4: Handling multiple regulators */ + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "foo,cpu-type"; + device_type = "cpu"; + reg = <0>; + + vcc0-supply = <&cpu_supply0>; + vcc1-supply = <&cpu_supply1>; + vcc2-supply = <&cpu_supply2>; + operating-points-v2 = <&cpu0_opp_table4>; + }; + }; + + cpu0_opp_table4: opp-table-0 { + compatible = "operating-points-v2"; + opp-shared; + + opp-1000000000 { + opp-hz = /bits/ 64 <1000000000>; + opp-microvolt = <970000>, /* Supply 0 */ + <960000>, /* Supply 1 */ + <960000>; /* Supply 2 */ + opp-microamp = <70000>, /* Supply 0 */ + <70000>, /* Supply 1 */ + <70000>; /* Supply 2 */ + clock-latency-ns = <300000>; + }; + + /* OR */ + + opp-1000000001 { + opp-hz = /bits/ 64 <1000000001>; + opp-microvolt = <975000 970000 985000>, /* Supply 0 */ + <965000 960000 975000>, /* Supply 1 */ + <965000 960000 975000>; /* Supply 2 */ + opp-microamp = <70000>, /* Supply 0 */ + <70000>, /* Supply 1 */ + <70000>; /* Supply 2 */ + clock-latency-ns = <300000>; + }; + + /* OR */ + + opp-1000000002 { + opp-hz = /bits/ 64 <1000000002>; + opp-microvolt = <975000 970000 985000>, /* Supply 0 */ + <965000 960000 975000>, /* Supply 1 */ + <965000 960000 975000>; /* Supply 2 */ + opp-microamp = <70000>, /* Supply 0 */ + <0>, /* Supply 1 doesn't need this */ + <70000>; /* Supply 2 */ + clock-latency-ns = <300000>; + }; + }; + + - | + /* + * Example 5: opp-supported-hw + * (example: three level hierarchy of versions: cuts, substrate and process) + */ + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a7"; + device_type = "cpu"; + reg = <0>; + cpu-supply = <&cpu_supply>; + operating-points-v2 = <&cpu0_opp_table_slow>; + }; + }; + + cpu0_opp_table_slow: opp-table { + compatible = "operating-points-v2"; + opp-shared; + + opp-600000000 { + /* + * Supports all substrate and process versions for 0xF + * cuts, i.e. only first four cuts. + */ + opp-supported-hw = <0xF 0xFFFFFFFF 0xFFFFFFFF>; + opp-hz = /bits/ 64 <600000000>; + }; + + opp-800000000 { + /* + * Supports: + * - cuts: only one, 6th cut (represented by 6th bit). + * - substrate: supports 16 different substrate versions + * - process: supports 9 different process versions + */ + opp-supported-hw = <0x20 0xff0000ff 0x0000f4f0>; + opp-hz = /bits/ 64 <800000000>; + }; + + opp-900000000 { + /* + * Supports: + * - All cuts and substrate where process version is 0x2. + * - All cuts and process where substrate version is 0x2. + */ + opp-supported-hw = <0xFFFFFFFF 0xFFFFFFFF 0x02>, + <0xFFFFFFFF 0x01 0xFFFFFFFF>; + opp-hz = /bits/ 64 <900000000>; + }; + }; + + - | + /* + * Example 6: opp-microvolt-<name>, opp-microamp-<name>: + * (example: device with two possible microvolt ranges: slow and fast) + */ + cpus { + #address-cells = <1>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a7"; + device_type = "cpu"; + reg = <0>; + operating-points-v2 = <&cpu0_opp_table6>; + }; + }; + + cpu0_opp_table6: opp-table-0 { + compatible = "operating-points-v2"; + opp-shared; + + opp-1000000000 { + opp-hz = /bits/ 64 <1000000000>; + opp-microvolt-slow = <915000 900000 925000>; + opp-microvolt-fast = <975000 970000 985000>; + opp-microamp-slow = <70000>; + opp-microamp-fast = <71000>; + }; + + opp-1200000000 { + opp-hz = /bits/ 64 <1200000000>; + opp-microvolt-slow = <915000 900000 925000>, /* Supply vcc0 */ + <925000 910000 935000>; /* Supply vcc1 */ + opp-microvolt-fast = <975000 970000 985000>, /* Supply vcc0 */ + <965000 960000 975000>; /* Supply vcc1 */ + opp-microamp = <70000>; /* Will be used for both slow/fast */ + }; + }; + + - | + /* + * Example 7: Single cluster Quad-core ARM cortex A53, OPP points from firmware, + * distinct clock controls but two sets of clock/voltage/current lines. + */ + cpus { + #address-cells = <2>; + #size-cells = <0>; + + cpu@0 { + compatible = "arm,cortex-a53"; + device_type = "cpu"; + reg = <0x0 0x100>; + next-level-cache = <&A53_L2>; + clocks = <&dvfs_controller 0>; + operating-points-v2 = <&cpu_opp0_table>; + }; + cpu@1 { + compatible = "arm,cortex-a53"; + device_type = "cpu"; + reg = <0x0 0x101>; + next-level-cache = <&A53_L2>; + clocks = <&dvfs_controller 1>; + operating-points-v2 = <&cpu_opp0_table>; + }; + cpu@2 { + compatible = "arm,cortex-a53"; + device_type = "cpu"; + reg = <0x0 0x102>; + next-level-cache = <&A53_L2>; + clocks = <&dvfs_controller 2>; + operating-points-v2 = <&cpu_opp1_table>; + }; + cpu@3 { + compatible = "arm,cortex-a53"; + device_type = "cpu"; + reg = <0x0 0x103>; + next-level-cache = <&A53_L2>; + clocks = <&dvfs_controller 3>; + operating-points-v2 = <&cpu_opp1_table>; + }; + + }; + + cpu_opp0_table: opp-table-0 { + compatible = "operating-points-v2"; + opp-shared; + }; + + cpu_opp1_table: opp-table-1 { + compatible = "operating-points-v2"; + opp-shared; + }; +... diff --git a/Documentation/devicetree/bindings/opp/opp.txt b/Documentation/devicetree/bindings/opp/opp.txt deleted file mode 100644 index 08b3da4736cf..000000000000 --- a/Documentation/devicetree/bindings/opp/opp.txt +++ /dev/null @@ -1,622 +0,0 @@ -Generic OPP (Operating Performance Points) Bindings ----------------------------------------------------- - -Devices work at voltage-current-frequency combinations and some implementations -have the liberty of choosing these. These combinations are called Operating -Performance Points aka OPPs. This document defines bindings for these OPPs -applicable across wide range of devices. For illustration purpose, this document -uses CPU as a device. - -This document contain multiple versions of OPP binding and only one of them -should be used per device. - -Binding 1: operating-points -============================ - -This binding only supports voltage-frequency pairs. - -Properties: -- operating-points: An array of 2-tuples items, and each item consists - of frequency and voltage like <freq-kHz vol-uV>. - freq: clock frequency in kHz - vol: voltage in microvolt - -Examples: - -cpu@0 { - compatible = "arm,cortex-a9"; - reg = <0>; - next-level-cache = <&L2>; - operating-points = < - /* kHz uV */ - 792000 1100000 - 396000 950000 - 198000 850000 - >; -}; - - -Binding 2: operating-points-v2 -============================ - -* Property: operating-points-v2 - -Devices supporting OPPs must set their "operating-points-v2" property with -phandle to a OPP table in their DT node. The OPP core will use this phandle to -find the operating points for the device. - -This can contain more than one phandle for power domain providers that provide -multiple power domains. That is, one phandle for each power domain. If only one -phandle is available, then the same OPP table will be used for all power domains -provided by the power domain provider. - -If required, this can be extended for SoC vendor specific bindings. Such bindings -should be documented as Documentation/devicetree/bindings/power/<vendor>-opp.txt -and should have a compatible description like: "operating-points-v2-<vendor>". - -* OPP Table Node - -This describes the OPPs belonging to a device. This node can have following -properties: - -Required properties: -- compatible: Allow OPPs to express their compatibility. It should be: - "operating-points-v2". - -- OPP nodes: One or more OPP nodes describing voltage-current-frequency - combinations. Their name isn't significant but their phandle can be used to - reference an OPP. These are mandatory except for the case where the OPP table - is present only to indicate dependency between devices using the opp-shared - property. - -Optional properties: -- opp-shared: Indicates that device nodes using this OPP Table Node's phandle - switch their DVFS state together, i.e. they share clock/voltage/current lines. - Missing property means devices have independent clock/voltage/current lines, - but they share OPP tables. - -- status: Marks the OPP table enabled/disabled. - - -* OPP Node - -This defines voltage-current-frequency combinations along with other related -properties. - -Required properties: -- opp-hz: Frequency in Hz, expressed as a 64-bit big-endian integer. This is a - required property for all device nodes, unless another "required" property to - uniquely identify the OPP nodes exists. Devices like power domains must have - another (implementation dependent) property. - -- opp-peak-kBps: Peak bandwidth in kilobytes per second, expressed as an array - of 32-bit big-endian integers. Each element of the array represents the - peak bandwidth value of each interconnect path. The number of elements should - match the number of interconnect paths. - -Optional properties: -- opp-microvolt: voltage in micro Volts. - - A single regulator's voltage is specified with an array of size one or three. - Single entry is for target voltage and three entries are for <target min max> - voltages. - - Entries for multiple regulators shall be provided in the same field separated - by angular brackets <>. The OPP binding doesn't provide any provisions to - relate the values to their power supplies or the order in which the supplies - need to be configured and that is left for the implementation specific - binding. - - Entries for all regulators shall be of the same size, i.e. either all use a - single value or triplets. - -- opp-microvolt-<name>: Named opp-microvolt property. This is exactly similar to - the above opp-microvolt property, but allows multiple voltage ranges to be - provided for the same OPP. At runtime, the platform can pick a <name> and - matching opp-microvolt-<name> property will be enabled for all OPPs. If the - platform doesn't pick a specific <name> or the <name> doesn't match with any - opp-microvolt-<name> properties, then opp-microvolt property shall be used, if - present. - -- opp-microamp: The maximum current drawn by the device in microamperes - considering system specific parameters (such as transients, process, aging, - maximum operating temperature range etc.) as necessary. This may be used to - set the most efficient regulator operating mode. - - Should only be set if opp-microvolt is set for the OPP. - - Entries for multiple regulators shall be provided in the same field separated - by angular brackets <>. If current values aren't required for a regulator, - then it shall be filled with 0. If current values aren't required for any of - the regulators, then this field is not required. The OPP binding doesn't - provide any provisions to relate the values to their power supplies or the - order in which the supplies need to be configured and that is left for the - implementation specific binding. - -- opp-microamp-<name>: Named opp-microamp property. Similar to - opp-microvolt-<name> property, but for microamp instead. - -- opp-level: A value representing the performance level of the device, - expressed as a 32-bit integer. - -- opp-avg-kBps: Average bandwidth in kilobytes per second, expressed as an array - of 32-bit big-endian integers. Each element of the array represents the - average bandwidth value of each interconnect path. The number of elements - should match the number of interconnect paths. This property is only - meaningful in OPP tables where opp-peak-kBps is present. - -- clock-latency-ns: Specifies the maximum possible transition latency (in - nanoseconds) for switching to this OPP from any other OPP. - -- turbo-mode: Marks the OPP to be used only for turbo modes. Turbo mode is - available on some platforms, where the device can run over its operating - frequency for a short duration of time limited by the device's power, current - and thermal limits. - -- opp-suspend: Marks the OPP to be used during device suspend. If multiple OPPs - in the table have this, the OPP with highest opp-hz will be used. - -- opp-supported-hw: This property allows a platform to enable only a subset of - the OPPs from the larger set present in the OPP table, based on the current - version of the hardware (already known to the operating system). - - Each block present in the array of blocks in this property, represents a - sub-group of hardware versions supported by the OPP. i.e. <sub-group A>, - <sub-group B>, etc. The OPP will be enabled if _any_ of these sub-groups match - the hardware's version. - - Each sub-group is a platform defined array representing the hierarchy of - hardware versions supported by the platform. For a platform with three - hierarchical levels of version (X.Y.Z), this field shall look like - - opp-supported-hw = <X1 Y1 Z1>, <X2 Y2 Z2>, <X3 Y3 Z3>. - - Each level (eg. X1) in version hierarchy is represented by a 32 bit value, one - bit per version and so there can be maximum 32 versions per level. Logical AND - (&) operation is performed for each level with the hardware's level version - and a non-zero output for _all_ the levels in a sub-group means the OPP is - supported by hardware. A value of 0xFFFFFFFF for each level in the sub-group - will enable the OPP for all versions for the hardware. - -- status: Marks the node enabled/disabled. - -- required-opps: This contains phandle to an OPP node in another device's OPP - table. It may contain an array of phandles, where each phandle points to an - OPP of a different device. It should not contain multiple phandles to the OPP - nodes in the same OPP table. This specifies the minimum required OPP of the - device(s), whose OPP's phandle is present in this property, for the - functioning of the current device at the current OPP (where this property is - present). - -Example 1: Single cluster Dual-core ARM cortex A9, switch DVFS states together. - -/ { - cpus { - #address-cells = <1>; - #size-cells = <0>; - - cpu@0 { - compatible = "arm,cortex-a9"; - reg = <0>; - next-level-cache = <&L2>; - clocks = <&clk_controller 0>; - clock-names = "cpu"; - cpu-supply = <&cpu_supply0>; - operating-points-v2 = <&cpu0_opp_table>; - }; - - cpu@1 { - compatible = "arm,cortex-a9"; - reg = <1>; - next-level-cache = <&L2>; - clocks = <&clk_controller 0>; - clock-names = "cpu"; - cpu-supply = <&cpu_supply0>; - operating-points-v2 = <&cpu0_opp_table>; - }; - }; - - cpu0_opp_table: opp_table0 { - compatible = "operating-points-v2"; - opp-shared; - - opp-1000000000 { - opp-hz = /bits/ 64 <1000000000>; - opp-microvolt = <975000 970000 985000>; - opp-microamp = <70000>; - clock-latency-ns = <300000>; - opp-suspend; - }; - opp-1100000000 { - opp-hz = /bits/ 64 <1100000000>; - opp-microvolt = <1000000 980000 1010000>; - opp-microamp = <80000>; - clock-latency-ns = <310000>; - }; - opp-1200000000 { - opp-hz = /bits/ 64 <1200000000>; - opp-microvolt = <1025000>; - clock-latency-ns = <290000>; - turbo-mode; - }; - }; -}; - -Example 2: Single cluster, Quad-core Qualcom-krait, switches DVFS states -independently. - -/ { - cpus { - #address-cells = <1>; - #size-cells = <0>; - - cpu@0 { - compatible = "qcom,krait"; - reg = <0>; - next-level-cache = <&L2>; - clocks = <&clk_controller 0>; - clock-names = "cpu"; - cpu-supply = <&cpu_supply0>; - operating-points-v2 = <&cpu_opp_table>; - }; - - cpu@1 { - compatible = "qcom,krait"; - reg = <1>; - next-level-cache = <&L2>; - clocks = <&clk_controller 1>; - clock-names = "cpu"; - cpu-supply = <&cpu_supply1>; - operating-points-v2 = <&cpu_opp_table>; - }; - - cpu@2 { - compatible = "qcom,krait"; - reg = <2>; - next-level-cache = <&L2>; - clocks = <&clk_controller 2>; - clock-names = "cpu"; - cpu-supply = <&cpu_supply2>; - operating-points-v2 = <&cpu_opp_table>; - }; - - cpu@3 { - compatible = "qcom,krait"; - reg = <3>; - next-level-cache = <&L2>; - clocks = <&clk_controller 3>; - clock-names = "cpu"; - cpu-supply = <&cpu_supply3>; - operating-points-v2 = <&cpu_opp_table>; - }; - }; - - cpu_opp_table: opp_table { - compatible = "operating-points-v2"; - - /* - * Missing opp-shared property means CPUs switch DVFS states - * independently. - */ - - opp-1000000000 { - opp-hz = /bits/ 64 <1000000000>; - opp-microvolt = <975000 970000 985000>; - opp-microamp = <70000>; - clock-latency-ns = <300000>; - opp-suspend; - }; - opp-1100000000 { - opp-hz = /bits/ 64 <1100000000>; - opp-microvolt = <1000000 980000 1010000>; - opp-microamp = <80000>; - clock-latency-ns = <310000>; - }; - opp-1200000000 { - opp-hz = /bits/ 64 <1200000000>; - opp-microvolt = <1025000>; - opp-microamp = <90000; - lock-latency-ns = <290000>; - turbo-mode; - }; - }; -}; - -Example 3: Dual-cluster, Dual-core per cluster. CPUs within a cluster switch -DVFS state together. - -/ { - cpus { - #address-cells = <1>; - #size-cells = <0>; - - cpu@0 { - compatible = "arm,cortex-a7"; - reg = <0>; - next-level-cache = <&L2>; - clocks = <&clk_controller 0>; - clock-names = "cpu"; - cpu-supply = <&cpu_supply0>; - operating-points-v2 = <&cluster0_opp>; - }; - - cpu@1 { - compatible = "arm,cortex-a7"; - reg = <1>; - next-level-cache = <&L2>; - clocks = <&clk_controller 0>; - clock-names = "cpu"; - cpu-supply = <&cpu_supply0>; - operating-points-v2 = <&cluster0_opp>; - }; - - cpu@100 { - compatible = "arm,cortex-a15"; - reg = <100>; - next-level-cache = <&L2>; - clocks = <&clk_controller 1>; - clock-names = "cpu"; - cpu-supply = <&cpu_supply1>; - operating-points-v2 = <&cluster1_opp>; - }; - - cpu@101 { - compatible = "arm,cortex-a15"; - reg = <101>; - next-level-cache = <&L2>; - clocks = <&clk_controller 1>; - clock-names = "cpu"; - cpu-supply = <&cpu_supply1>; - operating-points-v2 = <&cluster1_opp>; - }; - }; - - cluster0_opp: opp_table0 { - compatible = "operating-points-v2"; - opp-shared; - - opp-1000000000 { - opp-hz = /bits/ 64 <1000000000>; - opp-microvolt = <975000 970000 985000>; - opp-microamp = <70000>; - clock-latency-ns = <300000>; - opp-suspend; - }; - opp-1100000000 { - opp-hz = /bits/ 64 <1100000000>; - opp-microvolt = <1000000 980000 1010000>; - opp-microamp = <80000>; - clock-latency-ns = <310000>; - }; - opp-1200000000 { - opp-hz = /bits/ 64 <1200000000>; - opp-microvolt = <1025000>; - opp-microamp = <90000>; - clock-latency-ns = <290000>; - turbo-mode; - }; - }; - - cluster1_opp: opp_table1 { - compatible = "operating-points-v2"; - opp-shared; - - opp-1300000000 { - opp-hz = /bits/ 64 <1300000000>; - opp-microvolt = <1050000 1045000 1055000>; - opp-microamp = <95000>; - clock-latency-ns = <400000>; - opp-suspend; - }; - opp-1400000000 { - opp-hz = /bits/ 64 <1400000000>; - opp-microvolt = <1075000>; - opp-microamp = <100000>; - clock-latency-ns = <400000>; - }; - opp-1500000000 { - opp-hz = /bits/ 64 <1500000000>; - opp-microvolt = <1100000 1010000 1110000>; - opp-microamp = <95000>; - clock-latency-ns = <400000>; - turbo-mode; - }; - }; -}; - -Example 4: Handling multiple regulators - -/ { - cpus { - cpu@0 { - compatible = "vendor,cpu-type"; - ... - - vcc0-supply = <&cpu_supply0>; - vcc1-supply = <&cpu_supply1>; - vcc2-supply = <&cpu_supply2>; - operating-points-v2 = <&cpu0_opp_table>; - }; - }; - - cpu0_opp_table: opp_table0 { - compatible = "operating-points-v2"; - opp-shared; - - opp-1000000000 { - opp-hz = /bits/ 64 <1000000000>; - opp-microvolt = <970000>, /* Supply 0 */ - <960000>, /* Supply 1 */ - <960000>; /* Supply 2 */ - opp-microamp = <70000>, /* Supply 0 */ - <70000>, /* Supply 1 */ - <70000>; /* Supply 2 */ - clock-latency-ns = <300000>; - }; - - /* OR */ - - opp-1000000000 { - opp-hz = /bits/ 64 <1000000000>; - opp-microvolt = <975000 970000 985000>, /* Supply 0 */ - <965000 960000 975000>, /* Supply 1 */ - <965000 960000 975000>; /* Supply 2 */ - opp-microamp = <70000>, /* Supply 0 */ - <70000>, /* Supply 1 */ - <70000>; /* Supply 2 */ - clock-latency-ns = <300000>; - }; - - /* OR */ - - opp-1000000000 { - opp-hz = /bits/ 64 <1000000000>; - opp-microvolt = <975000 970000 985000>, /* Supply 0 */ - <965000 960000 975000>, /* Supply 1 */ - <965000 960000 975000>; /* Supply 2 */ - opp-microamp = <70000>, /* Supply 0 */ - <0>, /* Supply 1 doesn't need this */ - <70000>; /* Supply 2 */ - clock-latency-ns = <300000>; - }; - }; -}; - -Example 5: opp-supported-hw -(example: three level hierarchy of versions: cuts, substrate and process) - -/ { - cpus { - cpu@0 { - compatible = "arm,cortex-a7"; - ... - - cpu-supply = <&cpu_supply> - operating-points-v2 = <&cpu0_opp_table_slow>; - }; - }; - - opp_table { - compatible = "operating-points-v2"; - opp-shared; - - opp-600000000 { - /* - * Supports all substrate and process versions for 0xF - * cuts, i.e. only first four cuts. - */ - opp-supported-hw = <0xF 0xFFFFFFFF 0xFFFFFFFF> - opp-hz = /bits/ 64 <600000000>; - ... - }; - - opp-800000000 { - /* - * Supports: - * - cuts: only one, 6th cut (represented by 6th bit). - * - substrate: supports 16 different substrate versions - * - process: supports 9 different process versions - */ - opp-supported-hw = <0x20 0xff0000ff 0x0000f4f0> - opp-hz = /bits/ 64 <800000000>; - ... - }; - - opp-900000000 { - /* - * Supports: - * - All cuts and substrate where process version is 0x2. - * - All cuts and process where substrate version is 0x2. - */ - opp-supported-hw = <0xFFFFFFFF 0xFFFFFFFF 0x02>, <0xFFFFFFFF 0x01 0xFFFFFFFF> - opp-hz = /bits/ 64 <900000000>; - ... - }; - }; -}; - -Example 6: opp-microvolt-<name>, opp-microamp-<name>: -(example: device with two possible microvolt ranges: slow and fast) - -/ { - cpus { - cpu@0 { - compatible = "arm,cortex-a7"; - ... - - operating-points-v2 = <&cpu0_opp_table>; - }; - }; - - cpu0_opp_table: opp_table0 { - compatible = "operating-points-v2"; - opp-shared; - - opp-1000000000 { - opp-hz = /bits/ 64 <1000000000>; - opp-microvolt-slow = <915000 900000 925000>; - opp-microvolt-fast = <975000 970000 985000>; - opp-microamp-slow = <70000>; - opp-microamp-fast = <71000>; - }; - - opp-1200000000 { - opp-hz = /bits/ 64 <1200000000>; - opp-microvolt-slow = <915000 900000 925000>, /* Supply vcc0 */ - <925000 910000 935000>; /* Supply vcc1 */ - opp-microvolt-fast = <975000 970000 985000>, /* Supply vcc0 */ - <965000 960000 975000>; /* Supply vcc1 */ - opp-microamp = <70000>; /* Will be used for both slow/fast */ - }; - }; -}; - -Example 7: Single cluster Quad-core ARM cortex A53, OPP points from firmware, -distinct clock controls but two sets of clock/voltage/current lines. - -/ { - cpus { - #address-cells = <2>; - #size-cells = <0>; - - cpu@0 { - compatible = "arm,cortex-a53"; - reg = <0x0 0x100>; - next-level-cache = <&A53_L2>; - clocks = <&dvfs_controller 0>; - operating-points-v2 = <&cpu_opp0_table>; - }; - cpu@1 { - compatible = "arm,cortex-a53"; - reg = <0x0 0x101>; - next-level-cache = <&A53_L2>; - clocks = <&dvfs_controller 1>; - operating-points-v2 = <&cpu_opp0_table>; - }; - cpu@2 { - compatible = "arm,cortex-a53"; - reg = <0x0 0x102>; - next-level-cache = <&A53_L2>; - clocks = <&dvfs_controller 2>; - operating-points-v2 = <&cpu_opp1_table>; - }; - cpu@3 { - compatible = "arm,cortex-a53"; - reg = <0x0 0x103>; - next-level-cache = <&A53_L2>; - clocks = <&dvfs_controller 3>; - operating-points-v2 = <&cpu_opp1_table>; - }; - - }; - - cpu_opp0_table: opp0_table { - compatible = "operating-points-v2"; - opp-shared; - }; - - cpu_opp1_table: opp1_table { - compatible = "operating-points-v2"; - opp-shared; - }; -}; diff --git a/Documentation/devicetree/bindings/opp/qcom-opp.txt b/Documentation/devicetree/bindings/opp/qcom-opp.txt index 32eb0793c7e6..41d3e4ff2dc3 100644 --- a/Documentation/devicetree/bindings/opp/qcom-opp.txt +++ b/Documentation/devicetree/bindings/opp/qcom-opp.txt @@ -1,7 +1,7 @@ Qualcomm OPP bindings to describe OPP nodes The bindings are based on top of the operating-points-v2 bindings -described in Documentation/devicetree/bindings/opp/opp.txt +described in Documentation/devicetree/bindings/opp/opp-v2-base.yaml Additional properties are described below. * OPP Table Node diff --git a/Documentation/devicetree/bindings/opp/ti-omap5-opp-supply.txt b/Documentation/devicetree/bindings/opp/ti-omap5-opp-supply.txt index 832346e489a3..b70d326117cd 100644 --- a/Documentation/devicetree/bindings/opp/ti-omap5-opp-supply.txt +++ b/Documentation/devicetree/bindings/opp/ti-omap5-opp-supply.txt @@ -13,7 +13,7 @@ regulators to the device that will undergo OPP transitions we can make use of the multi regulator binding that is part of the OPP core described here [1] to describe both regulators needed by the platform. -[1] Documentation/devicetree/bindings/opp/opp.txt +[1] Documentation/devicetree/bindings/opp/opp-v2.yaml Required Properties for Device Node: - vdd-supply: phandle to regulator controlling VDD supply diff --git a/Documentation/devicetree/bindings/pci/intel,keembay-pcie-ep.yaml b/Documentation/devicetree/bindings/pci/intel,keembay-pcie-ep.yaml new file mode 100644 index 000000000000..e87ff27526ff --- /dev/null +++ b/Documentation/devicetree/bindings/pci/intel,keembay-pcie-ep.yaml @@ -0,0 +1,69 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: "http://devicetree.org/schemas/pci/intel,keembay-pcie-ep.yaml#" +$schema: "http://devicetree.org/meta-schemas/core.yaml#" + +title: Intel Keem Bay PCIe controller Endpoint mode + +maintainers: + - Wan Ahmad Zainie <wan.ahmad.zainie.wan.mohamad@intel.com> + - Srikanth Thokala <srikanth.thokala@intel.com> + +properties: + compatible: + const: intel,keembay-pcie-ep + + reg: + maxItems: 5 + + reg-names: + items: + - const: dbi + - const: dbi2 + - const: atu + - const: addr_space + - const: apb + + interrupts: + maxItems: 4 + + interrupt-names: + items: + - const: pcie + - const: pcie_ev + - const: pcie_err + - const: pcie_mem_access + + num-lanes: + description: Number of lanes to use. + enum: [ 1, 2 ] + +required: + - compatible + - reg + - reg-names + - interrupts + - interrupt-names + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/interrupt-controller/irq.h> + pcie-ep@37000000 { + compatible = "intel,keembay-pcie-ep"; + reg = <0x37000000 0x00001000>, + <0x37100000 0x00001000>, + <0x37300000 0x00001000>, + <0x36000000 0x01000000>, + <0x37800000 0x00000200>; + reg-names = "dbi", "dbi2", "atu", "addr_space", "apb"; + interrupts = <GIC_SPI 107 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 108 IRQ_TYPE_EDGE_RISING>, + <GIC_SPI 109 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "pcie", "pcie_ev", "pcie_err", "pcie_mem_access"; + num-lanes = <2>; + }; diff --git a/Documentation/devicetree/bindings/pci/intel,keembay-pcie.yaml b/Documentation/devicetree/bindings/pci/intel,keembay-pcie.yaml new file mode 100644 index 000000000000..ed4400c9ac09 --- /dev/null +++ b/Documentation/devicetree/bindings/pci/intel,keembay-pcie.yaml @@ -0,0 +1,97 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: "http://devicetree.org/schemas/pci/intel,keembay-pcie.yaml#" +$schema: "http://devicetree.org/meta-schemas/core.yaml#" + +title: Intel Keem Bay PCIe controller Root Complex mode + +maintainers: + - Wan Ahmad Zainie <wan.ahmad.zainie.wan.mohamad@intel.com> + - Srikanth Thokala <srikanth.thokala@intel.com> + +allOf: + - $ref: /schemas/pci/pci-bus.yaml# + +properties: + compatible: + const: intel,keembay-pcie + + ranges: + maxItems: 1 + + reset-gpios: + maxItems: 1 + + reg: + maxItems: 4 + + reg-names: + items: + - const: dbi + - const: atu + - const: config + - const: apb + + clocks: + maxItems: 2 + + clock-names: + items: + - const: master + - const: aux + + interrupts: + maxItems: 3 + + interrupt-names: + items: + - const: pcie + - const: pcie_ev + - const: pcie_err + + num-lanes: + description: Number of lanes to use. + enum: [ 1, 2 ] + +required: + - compatible + - reg + - reg-names + - ranges + - clocks + - clock-names + - interrupts + - interrupt-names + - reset-gpios + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/arm-gic.h> + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/gpio/gpio.h> + #define KEEM_BAY_A53_PCIE + #define KEEM_BAY_A53_AUX_PCIE + pcie@37000000 { + compatible = "intel,keembay-pcie"; + reg = <0x37000000 0x00001000>, + <0x37300000 0x00001000>, + <0x36e00000 0x00200000>, + <0x37800000 0x00000200>; + reg-names = "dbi", "atu", "config", "apb"; + #address-cells = <3>; + #size-cells = <2>; + device_type = "pci"; + ranges = <0x02000000 0 0x36000000 0x36000000 0 0x00e00000>; + interrupts = <GIC_SPI 107 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 108 IRQ_TYPE_LEVEL_HIGH>, + <GIC_SPI 109 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "pcie", "pcie_ev", "pcie_err"; + clocks = <&scmi_clk KEEM_BAY_A53_PCIE>, + <&scmi_clk KEEM_BAY_A53_AUX_PCIE>; + clock-names = "master", "aux"; + reset-gpios = <&pca2 9 GPIO_ACTIVE_LOW>; + num-lanes = <2>; + }; diff --git a/Documentation/devicetree/bindings/pci/mediatek-pcie-cfg.yaml b/Documentation/devicetree/bindings/pci/mediatek-pcie-cfg.yaml new file mode 100644 index 000000000000..841a3d284bbf --- /dev/null +++ b/Documentation/devicetree/bindings/pci/mediatek-pcie-cfg.yaml @@ -0,0 +1,39 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/pci/mediatek-pcie-cfg.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: MediaTek PCIECFG controller + +maintainers: + - Chuanjia Liu <chuanjia.liu@mediatek.com> + - Jianjun Wang <jianjun.wang@mediatek.com> + +description: | + The MediaTek PCIECFG controller controls some feature about + LTSSM, ASPM and so on. + +properties: + compatible: + items: + - enum: + - mediatek,generic-pciecfg + - const: syscon + + reg: + maxItems: 1 + +required: + - compatible + - reg + +additionalProperties: false + +examples: + - | + pciecfg: pciecfg@1a140000 { + compatible = "mediatek,generic-pciecfg", "syscon"; + reg = <0x1a140000 0x1000>; + }; +... diff --git a/Documentation/devicetree/bindings/pci/mediatek-pcie.txt b/Documentation/devicetree/bindings/pci/mediatek-pcie.txt index 7468d666763a..57ae73462272 100644 --- a/Documentation/devicetree/bindings/pci/mediatek-pcie.txt +++ b/Documentation/devicetree/bindings/pci/mediatek-pcie.txt @@ -8,7 +8,7 @@ Required properties: "mediatek,mt7623-pcie" "mediatek,mt7629-pcie" - device_type: Must be "pci" -- reg: Base addresses and lengths of the PCIe subsys and root ports. +- reg: Base addresses and lengths of the root ports. - reg-names: Names of the above areas to use during resource lookup. - #address-cells: Address representation for root ports (must be 3) - #size-cells: Size representation for root ports (must be 2) @@ -47,9 +47,12 @@ Required properties for MT7623/MT2701: - reset-names: Must be "pcie-rst0", "pcie-rst1", "pcie-rstN".. based on the number of root ports. -Required properties for MT2712/MT7622: +Required properties for MT2712/MT7622/MT7629: -interrupts: A list of interrupt outputs of the controller, must have one entry for each PCIe port +- interrupt-names: Must include the following entries: + - "pcie_irq": The interrupt that is asserted when an MSI/INTX is received +- linux,pci-domain: PCI domain ID. Should be unique for each host controller In addition, the device tree node must have sub-nodes describing each PCIe port interface, having the following mandatory properties: @@ -143,130 +146,143 @@ Examples for MT7623: Examples for MT2712: - pcie: pcie@11700000 { + pcie1: pcie@112ff000 { compatible = "mediatek,mt2712-pcie"; device_type = "pci"; - reg = <0 0x11700000 0 0x1000>, - <0 0x112ff000 0 0x1000>; - reg-names = "port0", "port1"; + reg = <0 0x112ff000 0 0x1000>; + reg-names = "port1"; + linux,pci-domain = <1>; #address-cells = <3>; #size-cells = <2>; - interrupts = <GIC_SPI 115 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 117 IRQ_TYPE_LEVEL_HIGH>; - clocks = <&topckgen CLK_TOP_PE2_MAC_P0_SEL>, - <&topckgen CLK_TOP_PE2_MAC_P1_SEL>, - <&pericfg CLK_PERI_PCIE0>, + interrupts = <GIC_SPI 117 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "pcie_irq"; + clocks = <&topckgen CLK_TOP_PE2_MAC_P1_SEL>, <&pericfg CLK_PERI_PCIE1>; - clock-names = "sys_ck0", "sys_ck1", "ahb_ck0", "ahb_ck1"; - phys = <&pcie0_phy PHY_TYPE_PCIE>, <&pcie1_phy PHY_TYPE_PCIE>; - phy-names = "pcie-phy0", "pcie-phy1"; + clock-names = "sys_ck1", "ahb_ck1"; + phys = <&u3port1 PHY_TYPE_PCIE>; + phy-names = "pcie-phy1"; bus-range = <0x00 0xff>; - ranges = <0x82000000 0 0x20000000 0x0 0x20000000 0 0x10000000>; + ranges = <0x82000000 0 0x11400000 0x0 0x11400000 0 0x300000>; + status = "disabled"; - pcie0: pcie@0,0 { - reg = <0x0000 0 0 0 0>; - #address-cells = <3>; - #size-cells = <2>; + #interrupt-cells = <1>; + interrupt-map-mask = <0 0 0 7>; + interrupt-map = <0 0 0 1 &pcie_intc1 0>, + <0 0 0 2 &pcie_intc1 1>, + <0 0 0 3 &pcie_intc1 2>, + <0 0 0 4 &pcie_intc1 3>; + pcie_intc1: interrupt-controller { + interrupt-controller; + #address-cells = <0>; #interrupt-cells = <1>; - ranges; - interrupt-map-mask = <0 0 0 7>; - interrupt-map = <0 0 0 1 &pcie_intc0 0>, - <0 0 0 2 &pcie_intc0 1>, - <0 0 0 3 &pcie_intc0 2>, - <0 0 0 4 &pcie_intc0 3>; - pcie_intc0: interrupt-controller { - interrupt-controller; - #address-cells = <0>; - #interrupt-cells = <1>; - }; }; + }; - pcie1: pcie@1,0 { - reg = <0x0800 0 0 0 0>; - #address-cells = <3>; - #size-cells = <2>; + pcie0: pcie@11700000 { + compatible = "mediatek,mt2712-pcie"; + device_type = "pci"; + reg = <0 0x11700000 0 0x1000>; + reg-names = "port0"; + linux,pci-domain = <0>; + #address-cells = <3>; + #size-cells = <2>; + interrupts = <GIC_SPI 115 IRQ_TYPE_LEVEL_HIGH>; + interrupt-names = "pcie_irq"; + clocks = <&topckgen CLK_TOP_PE2_MAC_P0_SEL>, + <&pericfg CLK_PERI_PCIE0>; + clock-names = "sys_ck0", "ahb_ck0"; + phys = <&u3port0 PHY_TYPE_PCIE>; + phy-names = "pcie-phy0"; + bus-range = <0x00 0xff>; + ranges = <0x82000000 0 0x20000000 0x0 0x20000000 0 0x10000000>; + status = "disabled"; + + #interrupt-cells = <1>; + interrupt-map-mask = <0 0 0 7>; + interrupt-map = <0 0 0 1 &pcie_intc0 0>, + <0 0 0 2 &pcie_intc0 1>, + <0 0 0 3 &pcie_intc0 2>, + <0 0 0 4 &pcie_intc0 3>; + pcie_intc0: interrupt-controller { + interrupt-controller; + #address-cells = <0>; #interrupt-cells = <1>; - ranges; - interrupt-map-mask = <0 0 0 7>; - interrupt-map = <0 0 0 1 &pcie_intc1 0>, - <0 0 0 2 &pcie_intc1 1>, - <0 0 0 3 &pcie_intc1 2>, - <0 0 0 4 &pcie_intc1 3>; - pcie_intc1: interrupt-controller { - interrupt-controller; - #address-cells = <0>; - #interrupt-cells = <1>; - }; }; }; Examples for MT7622: - pcie: pcie@1a140000 { + pcie0: pcie@1a143000 { compatible = "mediatek,mt7622-pcie"; device_type = "pci"; - reg = <0 0x1a140000 0 0x1000>, - <0 0x1a143000 0 0x1000>, - <0 0x1a145000 0 0x1000>; - reg-names = "subsys", "port0", "port1"; + reg = <0 0x1a143000 0 0x1000>; + reg-names = "port0"; + linux,pci-domain = <0>; #address-cells = <3>; #size-cells = <2>; - interrupts = <GIC_SPI 228 IRQ_TYPE_LEVEL_LOW>, - <GIC_SPI 229 IRQ_TYPE_LEVEL_LOW>; + interrupts = <GIC_SPI 228 IRQ_TYPE_LEVEL_LOW>; + interrupt-names = "pcie_irq"; clocks = <&pciesys CLK_PCIE_P0_MAC_EN>, - <&pciesys CLK_PCIE_P1_MAC_EN>, <&pciesys CLK_PCIE_P0_AHB_EN>, - <&pciesys CLK_PCIE_P1_AHB_EN>, <&pciesys CLK_PCIE_P0_AUX_EN>, - <&pciesys CLK_PCIE_P1_AUX_EN>, <&pciesys CLK_PCIE_P0_AXI_EN>, - <&pciesys CLK_PCIE_P1_AXI_EN>, <&pciesys CLK_PCIE_P0_OBFF_EN>, - <&pciesys CLK_PCIE_P1_OBFF_EN>, - <&pciesys CLK_PCIE_P0_PIPE_EN>, - <&pciesys CLK_PCIE_P1_PIPE_EN>; - clock-names = "sys_ck0", "sys_ck1", "ahb_ck0", "ahb_ck1", - "aux_ck0", "aux_ck1", "axi_ck0", "axi_ck1", - "obff_ck0", "obff_ck1", "pipe_ck0", "pipe_ck1"; - phys = <&pcie0_phy PHY_TYPE_PCIE>, <&pcie1_phy PHY_TYPE_PCIE>; - phy-names = "pcie-phy0", "pcie-phy1"; + <&pciesys CLK_PCIE_P0_PIPE_EN>; + clock-names = "sys_ck0", "ahb_ck0", "aux_ck0", + "axi_ck0", "obff_ck0", "pipe_ck0"; + power-domains = <&scpsys MT7622_POWER_DOMAIN_HIF0>; bus-range = <0x00 0xff>; - ranges = <0x82000000 0 0x20000000 0x0 0x20000000 0 0x10000000>; + ranges = <0x82000000 0 0x20000000 0x0 0x20000000 0 0x8000000>; + status = "disabled"; - pcie0: pcie@0,0 { - reg = <0x0000 0 0 0 0>; - #address-cells = <3>; - #size-cells = <2>; + #interrupt-cells = <1>; + interrupt-map-mask = <0 0 0 7>; + interrupt-map = <0 0 0 1 &pcie_intc0 0>, + <0 0 0 2 &pcie_intc0 1>, + <0 0 0 3 &pcie_intc0 2>, + <0 0 0 4 &pcie_intc0 3>; + pcie_intc0: interrupt-controller { + interrupt-controller; + #address-cells = <0>; #interrupt-cells = <1>; - ranges; - interrupt-map-mask = <0 0 0 7>; - interrupt-map = <0 0 0 1 &pcie_intc0 0>, - <0 0 0 2 &pcie_intc0 1>, - <0 0 0 3 &pcie_intc0 2>, - <0 0 0 4 &pcie_intc0 3>; - pcie_intc0: interrupt-controller { - interrupt-controller; - #address-cells = <0>; - #interrupt-cells = <1>; - }; }; + }; - pcie1: pcie@1,0 { - reg = <0x0800 0 0 0 0>; - #address-cells = <3>; - #size-cells = <2>; + pcie1: pcie@1a145000 { + compatible = "mediatek,mt7622-pcie"; + device_type = "pci"; + reg = <0 0x1a145000 0 0x1000>; + reg-names = "port1"; + linux,pci-domain = <1>; + #address-cells = <3>; + #size-cells = <2>; + interrupts = <GIC_SPI 229 IRQ_TYPE_LEVEL_LOW>; + interrupt-names = "pcie_irq"; + clocks = <&pciesys CLK_PCIE_P1_MAC_EN>, + /* designer has connect RC1 with p0_ahb clock */ + <&pciesys CLK_PCIE_P0_AHB_EN>, + <&pciesys CLK_PCIE_P1_AUX_EN>, + <&pciesys CLK_PCIE_P1_AXI_EN>, + <&pciesys CLK_PCIE_P1_OBFF_EN>, + <&pciesys CLK_PCIE_P1_PIPE_EN>; + clock-names = "sys_ck1", "ahb_ck1", "aux_ck1", + "axi_ck1", "obff_ck1", "pipe_ck1"; + + power-domains = <&scpsys MT7622_POWER_DOMAIN_HIF0>; + bus-range = <0x00 0xff>; + ranges = <0x82000000 0 0x28000000 0x0 0x28000000 0 0x8000000>; + status = "disabled"; + + #interrupt-cells = <1>; + interrupt-map-mask = <0 0 0 7>; + interrupt-map = <0 0 0 1 &pcie_intc1 0>, + <0 0 0 2 &pcie_intc1 1>, + <0 0 0 3 &pcie_intc1 2>, + <0 0 0 4 &pcie_intc1 3>; + pcie_intc1: interrupt-controller { + interrupt-controller; + #address-cells = <0>; #interrupt-cells = <1>; - ranges; - interrupt-map-mask = <0 0 0 7>; - interrupt-map = <0 0 0 1 &pcie_intc1 0>, - <0 0 0 2 &pcie_intc1 1>, - <0 0 0 3 &pcie_intc1 2>, - <0 0 0 4 &pcie_intc1 3>; - pcie_intc1: interrupt-controller { - interrupt-controller; - #address-cells = <0>; - #interrupt-cells = <1>; - }; }; }; diff --git a/Documentation/devicetree/bindings/pci/pci-ep.yaml b/Documentation/devicetree/bindings/pci/pci-ep.yaml index 7847bbcd4a03..ccec51ab5247 100644 --- a/Documentation/devicetree/bindings/pci/pci-ep.yaml +++ b/Documentation/devicetree/bindings/pci/pci-ep.yaml @@ -23,6 +23,13 @@ properties: default: 1 maximum: 255 + max-virtual-functions: + description: Array representing the number of virtual functions corresponding to each physical + function + $ref: /schemas/types.yaml#/definitions/uint8-array + minItems: 1 + maxItems: 255 + max-link-speed: $ref: /schemas/types.yaml#/definitions/uint32 enum: [ 1, 2, 3, 4 ] diff --git a/Documentation/devicetree/bindings/pci/xilinx-nwl-pcie.txt b/Documentation/devicetree/bindings/pci/xilinx-nwl-pcie.txt index 2d677e90a7e2..f56f8c58c5d9 100644 --- a/Documentation/devicetree/bindings/pci/xilinx-nwl-pcie.txt +++ b/Documentation/devicetree/bindings/pci/xilinx-nwl-pcie.txt @@ -35,6 +35,7 @@ Required properties: Optional properties: - dma-coherent: present if DMA operations are coherent +- clocks: Input clock specifier. Refer to common clock bindings Example: ++++++++ diff --git a/Documentation/devicetree/bindings/power/power-domain.yaml b/Documentation/devicetree/bindings/power/power-domain.yaml index aed51e9dcb11..3143ed9a3313 100644 --- a/Documentation/devicetree/bindings/power/power-domain.yaml +++ b/Documentation/devicetree/bindings/power/power-domain.yaml @@ -46,7 +46,7 @@ properties: Phandles to the OPP tables of power domains provided by a power domain provider. If the provider provides a single power domain only or all the power domains provided by the provider have identical OPP tables, - then this shall contain a single phandle. Refer to ../opp/opp.txt + then this shall contain a single phandle. Refer to ../opp/opp-v2-base.yaml for more information. "#power-domain-cells": diff --git a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt b/Documentation/devicetree/bindings/power/reset/qcom,pon.txt deleted file mode 100644 index 0c0dc3a1e693..000000000000 --- a/Documentation/devicetree/bindings/power/reset/qcom,pon.txt +++ /dev/null @@ -1,49 +0,0 @@ -Qualcomm PON Device - -The Power On device for Qualcomm PM8xxx is MFD supporting pwrkey -and resin along with the Android reboot-mode. - -This DT node has pwrkey and resin as sub nodes. - -Required Properties: --compatible: Must be one of: - "qcom,pm8916-pon" - "qcom,pms405-pon" - "qcom,pm8998-pon" - --reg: Specifies the physical address of the pon register - -Optional subnode: --pwrkey: Specifies the subnode pwrkey and should follow the - qcom,pm8941-pwrkey.txt description. --resin: Specifies the subnode resin and should follow the - qcom,pm8xxx-pwrkey.txt description. - -The rest of the properties should follow the generic reboot-mode description -found in reboot-mode.txt - -Example: - - pon@800 { - compatible = "qcom,pm8916-pon"; - - reg = <0x800>; - mode-bootloader = <0x2>; - mode-recovery = <0x1>; - - pwrkey { - compatible = "qcom,pm8941-pwrkey"; - interrupts = <0x0 0x8 0 IRQ_TYPE_EDGE_BOTH>; - debounce = <15625>; - bias-pull-up; - linux,code = <KEY_POWER>; - }; - - resin { - compatible = "qcom,pm8941-resin"; - interrupts = <0x0 0x8 1 IRQ_TYPE_EDGE_BOTH>; - debounce = <15625>; - bias-pull-up; - linux,code = <KEY_VOLUMEDOWN>; - }; - }; diff --git a/Documentation/devicetree/bindings/power/reset/qcom,pon.yaml b/Documentation/devicetree/bindings/power/reset/qcom,pon.yaml new file mode 100644 index 000000000000..353f155df0f4 --- /dev/null +++ b/Documentation/devicetree/bindings/power/reset/qcom,pon.yaml @@ -0,0 +1,80 @@ +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/power/reset/qcom,pon.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm PON Device + +maintainers: + - Vinod Koul <vkoul@kernel.org> + +description: | + The Power On device for Qualcomm PM8xxx is MFD supporting pwrkey + and resin along with the Android reboot-mode. + + This DT node has pwrkey and resin as sub nodes. + +allOf: + - $ref: reboot-mode.yaml# + +properties: + compatible: + enum: + - qcom,pm8916-pon + - qcom,pms405-pon + - qcom,pm8998-pon + + reg: + maxItems: 1 + + pwrkey: + type: object + $ref: "../../input/qcom,pm8941-pwrkey.yaml#" + + resin: + type: object + $ref: "../../input/qcom,pm8941-pwrkey.yaml#" + +required: + - compatible + - reg + +unevaluatedProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/irq.h> + #include <dt-bindings/input/linux-event-codes.h> + #include <dt-bindings/spmi/spmi.h> + spmi_bus: spmi@c440000 { + reg = <0x0c440000 0x1100>; + #address-cells = <2>; + #size-cells = <0>; + pmk8350: pmic@0 { + reg = <0x0 SPMI_USID>; + #address-cells = <1>; + #size-cells = <0>; + pmk8350_pon: pon_hlos@1300 { + reg = <0x1300>; + compatible = "qcom,pm8998-pon"; + + pwrkey { + compatible = "qcom,pm8941-pwrkey"; + interrupts = < 0x0 0x8 0 IRQ_TYPE_EDGE_BOTH >; + debounce = <15625>; + bias-pull-up; + linux,code = <KEY_POWER>; + }; + + resin { + compatible = "qcom,pm8941-resin"; + interrupts = <0x0 0x8 1 IRQ_TYPE_EDGE_BOTH>; + debounce = <15625>; + bias-pull-up; + linux,code = <KEY_VOLUMEDOWN>; + }; + }; + }; + }; +... diff --git a/Documentation/devicetree/bindings/power/reset/reboot-mode.yaml b/Documentation/devicetree/bindings/power/reset/reboot-mode.yaml index 9c6fda6b1dd9..ad0a0b95cec1 100644 --- a/Documentation/devicetree/bindings/power/reset/reboot-mode.yaml +++ b/Documentation/devicetree/bindings/power/reset/reboot-mode.yaml @@ -36,7 +36,7 @@ patternProperties: "^mode-.*$": $ref: /schemas/types.yaml#/definitions/uint32 -additionalProperties: false +additionalProperties: true examples: - | diff --git a/Documentation/devicetree/bindings/pwm/pwm-rockchip.yaml b/Documentation/devicetree/bindings/pwm/pwm-rockchip.yaml index 5596bee70509..81a54a4e8e3e 100644 --- a/Documentation/devicetree/bindings/pwm/pwm-rockchip.yaml +++ b/Documentation/devicetree/bindings/pwm/pwm-rockchip.yaml @@ -29,6 +29,7 @@ properties: - enum: - rockchip,px30-pwm - rockchip,rk3308-pwm + - rockchip,rk3568-pwm - const: rockchip,rk3328-pwm reg: diff --git a/Documentation/devicetree/bindings/rtc/trivial-rtc.yaml b/Documentation/devicetree/bindings/rtc/trivial-rtc.yaml index 7548d8714871..13925bb78ec7 100644 --- a/Documentation/devicetree/bindings/rtc/trivial-rtc.yaml +++ b/Documentation/devicetree/bindings/rtc/trivial-rtc.yaml @@ -32,6 +32,9 @@ properties: - dallas,ds3232 # I2C-BUS INTERFACE REAL TIME CLOCK MODULE - epson,rx8010 + # I2C-BUS INTERFACE REAL TIME CLOCK MODULE + - epson,rx8025 + - epson,rx8035 # I2C-BUS INTERFACE REAL TIME CLOCK MODULE with Battery Backed RAM - epson,rx8571 # I2C-BUS INTERFACE REAL TIME CLOCK MODULE diff --git a/Documentation/devicetree/bindings/sound/fsl,rpmsg.yaml b/Documentation/devicetree/bindings/sound/fsl,rpmsg.yaml index 61802a11baf4..d370c98a62c7 100644 --- a/Documentation/devicetree/bindings/sound/fsl,rpmsg.yaml +++ b/Documentation/devicetree/bindings/sound/fsl,rpmsg.yaml @@ -21,6 +21,7 @@ properties: - fsl,imx8mn-rpmsg-audio - fsl,imx8mm-rpmsg-audio - fsl,imx8mp-rpmsg-audio + - fsl,imx8ulp-rpmsg-audio model: $ref: /schemas/types.yaml#/definitions/string diff --git a/Documentation/devicetree/bindings/sound/mt8195-afe-pcm.yaml b/Documentation/devicetree/bindings/sound/mt8195-afe-pcm.yaml index 53e9434a6d9d..dcf790b053d2 100644 --- a/Documentation/devicetree/bindings/sound/mt8195-afe-pcm.yaml +++ b/Documentation/devicetree/bindings/sound/mt8195-afe-pcm.yaml @@ -130,36 +130,34 @@ additionalProperties: false examples: - | - #include <dt-bindings/clock/mt8195-clk.h> #include <dt-bindings/interrupt-controller/arm-gic.h> #include <dt-bindings/interrupt-controller/irq.h> - #include <dt-bindings/power/mt8195-power.h> afe: mt8195-afe-pcm@10890000 { compatible = "mediatek,mt8195-audio"; reg = <0x10890000 0x10000>; interrupts = <GIC_SPI 822 IRQ_TYPE_LEVEL_HIGH 0>; mediatek,topckgen = <&topckgen>; - power-domains = <&spm MT8195_POWER_DOMAIN_AUDIO>; + power-domains = <&spm 7>; //MT8195_POWER_DOMAIN_AUDIO clocks = <&clk26m>, - <&topckgen CLK_TOP_APLL1>, - <&topckgen CLK_TOP_APLL2>, - <&topckgen CLK_TOP_APLL12_DIV0>, - <&topckgen CLK_TOP_APLL12_DIV1>, - <&topckgen CLK_TOP_APLL12_DIV2>, - <&topckgen CLK_TOP_APLL12_DIV3>, - <&topckgen CLK_TOP_APLL12_DIV9>, - <&topckgen CLK_TOP_A1SYS_HP_SEL>, - <&topckgen CLK_TOP_AUD_INTBUS_SEL>, - <&topckgen CLK_TOP_AUDIO_H_SEL>, - <&topckgen CLK_TOP_AUDIO_LOCAL_BUS_SEL>, - <&topckgen CLK_TOP_DPTX_M_SEL>, - <&topckgen CLK_TOP_I2SO1_M_SEL>, - <&topckgen CLK_TOP_I2SO2_M_SEL>, - <&topckgen CLK_TOP_I2SI1_M_SEL>, - <&topckgen CLK_TOP_I2SI2_M_SEL>, - <&infracfg_ao CLK_INFRA_AO_AUDIO_26M_B>, - <&scp_adsp CLK_SCP_ADSP_AUDIODSP>; + <&topckgen 163>, //CLK_TOP_APLL1 + <&topckgen 166>, //CLK_TOP_APLL2 + <&topckgen 233>, //CLK_TOP_APLL12_DIV0 + <&topckgen 234>, //CLK_TOP_APLL12_DIV1 + <&topckgen 235>, //CLK_TOP_APLL12_DIV2 + <&topckgen 236>, //CLK_TOP_APLL12_DIV3 + <&topckgen 238>, //CLK_TOP_APLL12_DIV9 + <&topckgen 100>, //CLK_TOP_A1SYS_HP_SEL + <&topckgen 33>, //CLK_TOP_AUD_INTBUS_SEL + <&topckgen 34>, //CLK_TOP_AUDIO_H_SEL + <&topckgen 107>, //CLK_TOP_AUDIO_LOCAL_BUS_SEL + <&topckgen 98>, //CLK_TOP_DPTX_M_SEL + <&topckgen 94>, //CLK_TOP_I2SO1_M_SEL + <&topckgen 95>, //CLK_TOP_I2SO2_M_SEL + <&topckgen 96>, //CLK_TOP_I2SI1_M_SEL + <&topckgen 97>, //CLK_TOP_I2SI2_M_SEL + <&infracfg_ao 50>, //CLK_INFRA_AO_AUDIO_26M_B + <&scp_adsp 0>; //CLK_SCP_ADSP_AUDIODSP clock-names = "clk26m", "apll1_ck", "apll2_ck", diff --git a/Documentation/devicetree/bindings/spi/omap-spi.yaml b/Documentation/devicetree/bindings/spi/omap-spi.yaml index e55538186cf6..9952199cae11 100644 --- a/Documentation/devicetree/bindings/spi/omap-spi.yaml +++ b/Documentation/devicetree/bindings/spi/omap-spi.yaml @@ -84,9 +84,9 @@ unevaluatedProperties: false if: properties: compatible: - oneOf: - - const: ti,omap2-mcspi - - const: ti,omap4-mcspi + enum: + - ti,omap2-mcspi + - ti,omap4-mcspi then: properties: diff --git a/Documentation/devicetree/bindings/spi/spi-xilinx.yaml b/Documentation/devicetree/bindings/spi/spi-xilinx.yaml index 593f7693bace..03e5dca7e933 100644 --- a/Documentation/devicetree/bindings/spi/spi-xilinx.yaml +++ b/Documentation/devicetree/bindings/spi/spi-xilinx.yaml @@ -27,13 +27,11 @@ properties: xlnx,num-ss-bits: description: Number of chip selects used. - $ref: /schemas/types.yaml#/definitions/uint32 minimum: 1 maximum: 32 xlnx,num-transfer-bits: description: Number of bits per transfer. This will be 8 if not specified. - $ref: /schemas/types.yaml#/definitions/uint32 enum: [8, 16, 32] default: 8 diff --git a/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml b/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml new file mode 100644 index 000000000000..289e9a845600 --- /dev/null +++ b/Documentation/devicetree/bindings/thermal/qcom-lmh.yaml @@ -0,0 +1,82 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +# Copyright 2021 Linaro Ltd. +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/thermal/qcom-lmh.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm Limits Management Hardware(LMh) + +maintainers: + - Thara Gopinath <thara.gopinath@linaro.org> + +description: + Limits Management Hardware(LMh) is a hardware infrastructure on some + Qualcomm SoCs that can enforce temperature and current limits as + programmed by software for certain IPs like CPU. + +properties: + compatible: + enum: + - qcom,sdm845-lmh + + reg: + items: + - description: core registers + + interrupts: + maxItems: 1 + + '#interrupt-cells': + const: 1 + + interrupt-controller: true + + cpus: + description: + phandle of the first cpu in the LMh cluster + $ref: /schemas/types.yaml#/definitions/phandle + + qcom,lmh-temp-arm-millicelsius: + description: + An integer expressing temperature threshold at which the LMh thermal + FSM is engaged. + + qcom,lmh-temp-low-millicelsius: + description: + An integer expressing temperature threshold at which the state machine + will attempt to remove frequency throttling. + + qcom,lmh-temp-high-millicelsius: + description: + An integer expressing temperature threshold at which the state machine + will attempt to throttle the frequency. + +required: + - compatible + - reg + - interrupts + - '#interrupt-cells' + - interrupt-controller + - cpus + - qcom,lmh-temp-arm-millicelsius + - qcom,lmh-temp-low-millicelsius + - qcom,lmh-temp-high-millicelsius + +additionalProperties: false + +examples: + - | + #include <dt-bindings/interrupt-controller/arm-gic.h> + + lmh@17d70800 { + compatible = "qcom,sdm845-lmh"; + reg = <0x17d70800 0x400>; + interrupts = <GIC_SPI 33 IRQ_TYPE_LEVEL_HIGH>; + cpus = <&CPU4>; + qcom,lmh-temp-arm-millicelsius = <65000>; + qcom,lmh-temp-low-millicelsius = <94500>; + qcom,lmh-temp-high-millicelsius = <95000>; + interrupt-controller; + #interrupt-cells = <1>; + }; diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml index 164f71598c59..a07de5ed0ca6 100644 --- a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml +++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml @@ -215,7 +215,7 @@ patternProperties: - polling-delay - polling-delay-passive - thermal-sensors - - trips + additionalProperties: false additionalProperties: false diff --git a/Documentation/devicetree/bindings/virtio/mmio.yaml b/Documentation/devicetree/bindings/virtio/mmio.yaml index d46597028cf1..4b7a0273181c 100644 --- a/Documentation/devicetree/bindings/virtio/mmio.yaml +++ b/Documentation/devicetree/bindings/virtio/mmio.yaml @@ -36,7 +36,8 @@ required: - reg - interrupts -additionalProperties: false +additionalProperties: + type: object examples: - | diff --git a/Documentation/devicetree/bindings/virtio/virtio-device.yaml b/Documentation/devicetree/bindings/virtio/virtio-device.yaml new file mode 100644 index 000000000000..1778ea9b5aa5 --- /dev/null +++ b/Documentation/devicetree/bindings/virtio/virtio-device.yaml @@ -0,0 +1,41 @@ +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/virtio/virtio-device.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Virtio device bindings + +maintainers: + - Viresh Kumar <viresh.kumar@linaro.org> + +description: + These bindings are applicable to virtio devices irrespective of the bus they + are bound to, like mmio or pci. + +# We need a select here so we don't match all nodes with 'virtio,mmio' +properties: + compatible: + pattern: "^virtio,device[0-9a-f]{1,8}$" + description: Virtio device nodes. + "virtio,deviceID", where ID is the virtio device id. The textual + representation of ID shall be in lower case hexadecimal with leading + zeroes suppressed. + +required: + - compatible + +additionalProperties: true + +examples: + - | + virtio@3000 { + compatible = "virtio,mmio"; + reg = <0x3000 0x100>; + interrupts = <43>; + + i2c { + compatible = "virtio,device22"; + }; + }; +... diff --git a/Documentation/devicetree/bindings/watchdog/maxim,max63xx.yaml b/Documentation/devicetree/bindings/watchdog/maxim,max63xx.yaml index f2105eedac2c..ab9641e845db 100644 --- a/Documentation/devicetree/bindings/watchdog/maxim,max63xx.yaml +++ b/Documentation/devicetree/bindings/watchdog/maxim,max63xx.yaml @@ -15,13 +15,13 @@ maintainers: properties: compatible: - oneOf: - - const: maxim,max6369 - - const: maxim,max6370 - - const: maxim,max6371 - - const: maxim,max6372 - - const: maxim,max6373 - - const: maxim,max6374 + enum: + - maxim,max6369 + - maxim,max6370 + - maxim,max6371 + - maxim,max6372 + - maxim,max6373 + - maxim,max6374 reg: description: This is a 1-byte memory-mapped address diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst index 487ce4f41d77..50ebcda17ad0 100644 --- a/Documentation/driver-api/cxl/memory-devices.rst +++ b/Documentation/driver-api/cxl/memory-devices.rst @@ -36,9 +36,15 @@ CXL Core .. kernel-doc:: drivers/cxl/cxl.h :internal: -.. kernel-doc:: drivers/cxl/core.c +.. kernel-doc:: drivers/cxl/core/bus.c :doc: cxl core +.. kernel-doc:: drivers/cxl/core/pmem.c + :doc: cxl pmem + +.. kernel-doc:: drivers/cxl/core/regs.c + :doc: cxl registers + External Interfaces =================== diff --git a/Documentation/features/vm/ELF-ASLR/arch-support.txt b/Documentation/features/vm/ELF-ASLR/arch-support.txt index 99cb6d7f5005..2949c99fbb2f 100644 --- a/Documentation/features/vm/ELF-ASLR/arch-support.txt +++ b/Documentation/features/vm/ELF-ASLR/arch-support.txt @@ -22,7 +22,7 @@ | openrisc: | TODO | | parisc: | ok | | powerpc: | ok | - | riscv: | TODO | + | riscv: | ok | | s390: | ok | | sh: | TODO | | sparc: | TODO | diff --git a/Documentation/features/vm/huge-vmap/arch-support.txt b/Documentation/features/vm/huge-vmap/arch-support.txt index 439fd9069b8b..bc53905a0306 100644 --- a/Documentation/features/vm/huge-vmap/arch-support.txt +++ b/Documentation/features/vm/huge-vmap/arch-support.txt @@ -1,7 +1,7 @@ # # Feature name: huge-vmap # Kconfig: HAVE_ARCH_HUGE_VMAP -# description: arch supports the ioremap_pud_enabled() and ioremap_pmd_enabled() VM APIs +# description: arch supports the arch_vmap_pud_supported() and arch_vmap_pmd_supported() VM APIs # ----------------------- | arch |status| diff --git a/Documentation/filesystems/api-summary.rst b/Documentation/filesystems/api-summary.rst index 7e5c04c98619..98db2ea5fa12 100644 --- a/Documentation/filesystems/api-summary.rst +++ b/Documentation/filesystems/api-summary.rst @@ -71,9 +71,6 @@ Other Functions .. kernel-doc:: fs/fs-writeback.c :export: -.. kernel-doc:: fs/block_dev.c - :export: - .. kernel-doc:: fs/anon_inodes.c :export: diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index d5a73fa2c9ef..8126beadc7df 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -37,7 +37,7 @@ TTM initialization This section is outdated. Drivers wishing to support TTM must pass a filled :c:type:`ttm_bo_driver -<ttm_bo_driver>` structure to ttm_bo_device_init, together with an +<ttm_bo_driver>` structure to ttm_device_init, together with an initialized global reference to the memory manager. The ttm_bo_driver structure contains several fields with function pointers for initializing the TTM, allocating and freeing memory, waiting for command diff --git a/Documentation/kbuild/llvm.rst b/Documentation/kbuild/llvm.rst index e87ed5479963..d32616891dcf 100644 --- a/Documentation/kbuild/llvm.rst +++ b/Documentation/kbuild/llvm.rst @@ -130,9 +130,10 @@ Getting Help ------------ - `Website <https://clangbuiltlinux.github.io/>`_ -- `Mailing List <https://groups.google.com/forum/#!forum/clang-built-linux>`_: <clang-built-linux@googlegroups.com> +- `Mailing List <https://lore.kernel.org/llvm/>`_: <llvm@lists.linux.dev> +- `Old Mailing List Archives <https://groups.google.com/g/clang-built-linux>`_ - `Issue Tracker <https://github.com/ClangBuiltLinux/linux/issues>`_ -- IRC: #clangbuiltlinux on chat.freenode.net +- IRC: #clangbuiltlinux on irc.libera.chat - `Telegram <https://t.me/ClangBuiltLinux>`_: @ClangBuiltLinux - `Wiki <https://github.com/ClangBuiltLinux/linux/wiki>`_ - `Beginner Bugs <https://github.com/ClangBuiltLinux/linux/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22>`_ diff --git a/Documentation/kernel-hacking/hacking.rst b/Documentation/kernel-hacking/hacking.rst index df65c19aa7df..55bd37a2efb0 100644 --- a/Documentation/kernel-hacking/hacking.rst +++ b/Documentation/kernel-hacking/hacking.rst @@ -76,8 +76,8 @@ handler is never re-entered: if the same interrupt arrives, it is queued fast: frequently it simply acknowledges the interrupt, marks a 'software interrupt' for execution and exits. -You can tell you are in a hardware interrupt, because -:c:func:`in_irq()` returns true. +You can tell you are in a hardware interrupt, because in_hardirq() returns +true. .. warning:: diff --git a/Documentation/kernel-hacking/locking.rst b/Documentation/kernel-hacking/locking.rst index ed1284c6f078..90bc3f51eda9 100644 --- a/Documentation/kernel-hacking/locking.rst +++ b/Documentation/kernel-hacking/locking.rst @@ -94,16 +94,10 @@ primitives, but I'll pretend they don't exist. Locking in the Linux Kernel =========================== -If I could give you one piece of advice: never sleep with anyone crazier -than yourself. But if I had to give you advice on locking: **keep it -simple**. +If I could give you one piece of advice on locking: **keep it simple**. Be reluctant to introduce new locks. -Strangely enough, this last one is the exact reverse of my advice when -you **have** slept with someone crazier than yourself. And you should -think about getting a big dog. - Two Main Types of Kernel Locks: Spinlocks and Mutexes ----------------------------------------------------- @@ -1406,7 +1400,7 @@ bh half will be running at any time. Hardware Interrupt / Hardware IRQ - Hardware interrupt request. in_irq() returns true in a + Hardware interrupt request. in_hardirq() returns true in a hardware interrupt handler. Interrupt Context @@ -1418,7 +1412,7 @@ SMP (``CONFIG_SMP=y``). Software Interrupt / softirq - Software interrupt handler. in_irq() returns false; + Software interrupt handler. in_hardirq() returns false; in_softirq() returns true. Tasklets and softirqs both fall into the category of 'software interrupts'. diff --git a/Documentation/locking/futex-requeue-pi.rst b/Documentation/locking/futex-requeue-pi.rst index 14ab5787b9a7..dd4ecf4528a4 100644 --- a/Documentation/locking/futex-requeue-pi.rst +++ b/Documentation/locking/futex-requeue-pi.rst @@ -5,7 +5,7 @@ Futex Requeue PI Requeueing of tasks from a non-PI futex to a PI futex requires special handling in order to ensure the underlying rt_mutex is never left without an owner if it has waiters; doing so would break the PI -boosting logic [see rt-mutex-desgin.txt] For the purposes of +boosting logic [see rt-mutex-design.rst] For the purposes of brevity, this action will be referred to as "requeue_pi" throughout this document. Priority inheritance is abbreviated throughout as "PI". diff --git a/Documentation/locking/ww-mutex-design.rst b/Documentation/locking/ww-mutex-design.rst index 54d9c17bb66b..6a4d7319f8f0 100644 --- a/Documentation/locking/ww-mutex-design.rst +++ b/Documentation/locking/ww-mutex-design.rst @@ -2,7 +2,7 @@ Wound/Wait Deadlock-Proof Mutex Design ====================================== -Please read mutex-design.txt first, as it applies to wait/wound mutexes too. +Please read mutex-design.rst first, as it applies to wait/wound mutexes too. Motivation for WW-Mutexes ------------------------- diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst index 60ac091d3b0d..8a2788afe89b 100644 --- a/Documentation/power/energy-model.rst +++ b/Documentation/power/energy-model.rst @@ -101,8 +101,7 @@ subsystems which use EM might rely on this flag to check if all EM devices use the same scale. If there are different scales, these subsystems might decide to: return warning/error, stop working or panic. See Section 3. for an example of driver implementing this -callback, and kernel/power/energy_model.c for further documentation on this -API. +callback, or Section 2.4 for further documentation on this API 2.3 Accessing performance domains @@ -123,7 +122,17 @@ em_cpu_energy() API. The estimation is performed assuming that the schedutil CPUfreq governor is in use in case of CPU device. Currently this calculation is not provided for other type of devices. -More details about the above APIs can be found in include/linux/energy_model.h. +More details about the above APIs can be found in ``<linux/energy_model.h>`` +or in Section 2.4 + + +2.4 Description details of this API +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +.. kernel-doc:: include/linux/energy_model.h + :internal: + +.. kernel-doc:: kernel/power/energy_model.c + :export: 3. Example driver diff --git a/Documentation/process/applying-patches.rst b/Documentation/process/applying-patches.rst index 2e7017bef4b8..c2121c1e55d7 100644 --- a/Documentation/process/applying-patches.rst +++ b/Documentation/process/applying-patches.rst @@ -389,7 +389,7 @@ The -mm patches are experimental patches released by Andrew Morton. In the past, -mm tree were used to also test subsystem patches, but this function is now done via the -`linux-next <https://www.kernel.org/doc/man-pages/linux-next.html>` +`linux-next` (https://www.kernel.org/doc/man-pages/linux-next.html) tree. The Subsystem maintainers push their patches first to linux-next, and, during the merge window, sends them directly to Linus. diff --git a/Documentation/process/changes.rst b/Documentation/process/changes.rst index d3a8557b66a1..e35ab74a0f80 100644 --- a/Documentation/process/changes.rst +++ b/Documentation/process/changes.rst @@ -29,7 +29,7 @@ you probably needn't concern yourself with pcmciautils. ====================== =============== ======================================== Program Minimal version Command to check the version ====================== =============== ======================================== -GNU C 4.9 gcc --version +GNU C 5.1 gcc --version Clang/LLVM (optional) 10.0.1 clang --version GNU make 3.81 make --version binutils 2.23 ld -v diff --git a/Documentation/process/kernel-docs.rst b/Documentation/process/kernel-docs.rst index 22d9ace5df2a..da9527502ef0 100644 --- a/Documentation/process/kernel-docs.rst +++ b/Documentation/process/kernel-docs.rst @@ -126,15 +126,17 @@ On-line docs describes how to write user-mode utilities for communicating with Card Services. - * Title: **Linux Kernel Module Programming Guide** + * Title: **The Linux Kernel Module Programming Guide** - :Author: Ori Pomerantz. - :URL: https://tldp.org/LDP/lkmpg/2.6/html/index.html - :Date: 2001 + :Author: Peter Jay Salzman, Michael Burian, Ori Pomerantz, Bob Mottram, + Jim Huang. + :URL: https://sysprog21.github.io/lkmpg/ + :Date: 2021 :Keywords: modules, GPL book, /proc, ioctls, system calls, interrupt handlers . - :Description: Very nice 92 pages GPL book on the topic of modules - programming. Lots of examples. + :Description: A very nice GPL book on the topic of modules + programming. Lots of examples. Currently the new version is being + actively maintained at https://github.com/sysprog21/lkmpg. * Title: **Global spinlock list and usage** diff --git a/Documentation/process/maintainer-pgp-guide.rst b/Documentation/process/maintainer-pgp-guide.rst index 8f8f1fee92b8..29e7d7b1cd44 100644 --- a/Documentation/process/maintainer-pgp-guide.rst +++ b/Documentation/process/maintainer-pgp-guide.rst @@ -944,12 +944,11 @@ have on your keyring:: uid [ unknown] Linus Torvalds <torvalds@kernel.org> sub rsa2048 2011-09-20 [E] -Next, open the `PGP pathfinder`_. In the "From" field, paste the key -fingerprint of Linus Torvalds from the output above. In the "To" field, -paste the key-id you found via ``gpg --search`` of the unknown key, and -check the results: - -- `Finding paths to Linus`_ +Next, find a trust path from Linus Torvalds to the key-id you found via ``gpg +--search`` of the unknown key. For this, you can use several tools including +https://github.com/mricon/wotmate, +https://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git/tree/graphs, and +https://the.earth.li/~noodles/pathfind.html. If you get a few decent trust paths, then it's a pretty good indication that it is a valid key. You can add it to your keyring from the @@ -962,6 +961,3 @@ administrators of the PGP Pathfinder service to not be malicious (in fact, this goes against :ref:`devs_not_infra`). However, if you do not carefully maintain your own web of trust, then it is a marked improvement over blindly trusting keyservers. - -.. _`PGP pathfinder`: https://pgp.cs.uu.nl/ -.. _`Finding paths to Linus`: https://pgp.cs.uu.nl/paths/79BE3E4300411886/to/C94035C21B4F2AEB.html diff --git a/Documentation/translations/it_IT/kernel-hacking/hacking.rst b/Documentation/translations/it_IT/kernel-hacking/hacking.rst index b4ea00f1b583..d5c521327f6a 100644 --- a/Documentation/translations/it_IT/kernel-hacking/hacking.rst +++ b/Documentation/translations/it_IT/kernel-hacking/hacking.rst @@ -90,7 +90,7 @@ i gestori d'interruzioni devono essere veloci: spesso si limitano esclusivamente a notificare la presa in carico dell'interruzione, programmare una 'interruzione software' per l'esecuzione e quindi terminare. -Potete dire d'essere in una interruzione hardware perché :c:func:`in_irq()` +Potete dire d'essere in una interruzione hardware perché in_hardirq() ritorna vero. .. warning:: diff --git a/Documentation/translations/it_IT/kernel-hacking/locking.rst b/Documentation/translations/it_IT/kernel-hacking/locking.rst index 1e7c84def369..1efb8293bf1f 100644 --- a/Documentation/translations/it_IT/kernel-hacking/locking.rst +++ b/Documentation/translations/it_IT/kernel-hacking/locking.rst @@ -1459,11 +1459,11 @@ contesto utente che hardware. interruzione hardware - Richiesta di interruzione hardware. in_irq() ritorna vero in un + Richiesta di interruzione hardware. in_hardirq() ritorna vero in un gestore d'interruzioni hardware. interruzione software / softirq - Gestore di interruzioni software: in_irq() ritorna falso; + Gestore di interruzioni software: in_hardirq() ritorna falso; in_softirq() ritorna vero. I tasklet e le softirq sono entrambi considerati 'interruzioni software'. diff --git a/Documentation/translations/zh_CN/admin-guide/README.rst b/Documentation/translations/zh_CN/admin-guide/README.rst index 669a022f6817..980eb20521cf 100644 --- a/Documentation/translations/zh_CN/admin-guide/README.rst +++ b/Documentation/translations/zh_CN/admin-guide/README.rst @@ -223,7 +223,7 @@ Linux内核5.x版本 <http://kernel.org/> 编译内核 --------- - - 确保您至少有gcc 4.9可用。 + - 确保您至少有gcc 5.1可用。 有关更多信息,请参阅 :ref:`Documentation/process/changes.rst <changes>` 。 请注意,您仍然可以使用此内核运行a.out用户程序。 diff --git a/Documentation/translations/zh_CN/core-api/cachetlb.rst b/Documentation/translations/zh_CN/core-api/cachetlb.rst index 55827b8a7c53..6fee45fe5e80 100644 --- a/Documentation/translations/zh_CN/core-api/cachetlb.rst +++ b/Documentation/translations/zh_CN/core-api/cachetlb.rst @@ -80,7 +80,7 @@ cpu上对这个地址空间进行刷新。 5) ``void update_mmu_cache(struct vm_area_struct *vma, unsigned long address, pte_t *ptep)`` - 在每个页面故障结束时,这个程序被调用,以告诉体系结构特定的代码,在 + 在每个缺页异常结束时,这个程序被调用,以告诉体系结构特定的代码,在 软件页表中,在地址空间“vma->vm_mm”的虚拟地址“地址”处,现在存在 一个翻译。 diff --git a/Documentation/translations/zh_CN/core-api/index.rst b/Documentation/translations/zh_CN/core-api/index.rst index d5e947d8b6f1..72f0a36daa1c 100644 --- a/Documentation/translations/zh_CN/core-api/index.rst +++ b/Documentation/translations/zh_CN/core-api/index.rst @@ -1,10 +1,12 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../core-api/irq/index` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/core-api/index.rst -.. _cn_core-api_index.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_core-api_index.rst: =========== 核心API文档 diff --git a/Documentation/translations/zh_CN/core-api/irq/concepts.rst b/Documentation/translations/zh_CN/core-api/irq/concepts.rst index 41455bf0f783..9957f0453353 100644 --- a/Documentation/translations/zh_CN/core-api/irq/concepts.rst +++ b/Documentation/translations/zh_CN/core-api/irq/concepts.rst @@ -1,10 +1,12 @@ .. include:: ../../disclaimer-zh_CN.rst -:Original: :doc:`../../../../core-api/irq/concepts` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/core-api/irq/concepts.rst -.. _cn_concepts.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_concepts.rst: =========== 什么是IRQ? diff --git a/Documentation/translations/zh_CN/core-api/irq/index.rst b/Documentation/translations/zh_CN/core-api/irq/index.rst index 910ccabf041f..ba6acc4b48e5 100644 --- a/Documentation/translations/zh_CN/core-api/irq/index.rst +++ b/Documentation/translations/zh_CN/core-api/irq/index.rst @@ -1,7 +1,10 @@ .. include:: ../../disclaimer-zh_CN.rst -:Original: :doc:`../../../../core-api/irq/index` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/core-api/irq/index.rst + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> .. _cn_irq_index.rst: diff --git a/Documentation/translations/zh_CN/core-api/irq/irq-affinity.rst b/Documentation/translations/zh_CN/core-api/irq/irq-affinity.rst index 82a4428f22fd..7addd5f27a88 100644 --- a/Documentation/translations/zh_CN/core-api/irq/irq-affinity.rst +++ b/Documentation/translations/zh_CN/core-api/irq/irq-affinity.rst @@ -1,10 +1,12 @@ .. include:: ../../disclaimer-zh_CN.rst -:Original: :doc:`../../../../core-api/irq/irq-affinity` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/core-api/irq/irq-affinity -.. _cn_irq-affinity.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_irq-affinity.rst: ============== SMP IRQ 亲和性 diff --git a/Documentation/translations/zh_CN/core-api/irq/irq-domain.rst b/Documentation/translations/zh_CN/core-api/irq/irq-domain.rst index 3c82dd307a46..7d077742f758 100644 --- a/Documentation/translations/zh_CN/core-api/irq/irq-domain.rst +++ b/Documentation/translations/zh_CN/core-api/irq/irq-domain.rst @@ -1,10 +1,12 @@ .. include:: ../../disclaimer-zh_CN.rst -:Original: :doc:`../../../../core-api/irq/irq-domain` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/core-api/irq/irq-domain.rst -.. _cn_irq-domain.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_irq-domain.rst: ======================= irq_domain 中断号映射库 diff --git a/Documentation/translations/zh_CN/core-api/irq/irqflags-tracing.rst b/Documentation/translations/zh_CN/core-api/irq/irqflags-tracing.rst index c889bd0f65d9..9af50b4b8c2d 100644 --- a/Documentation/translations/zh_CN/core-api/irq/irqflags-tracing.rst +++ b/Documentation/translations/zh_CN/core-api/irq/irqflags-tracing.rst @@ -1,10 +1,12 @@ .. include:: ../../disclaimer-zh_CN.rst -:Original: :doc:`../../../../core-api/irq/irqflags-tracing` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/core-api/irq/irqflags-tracing.rst -.. _cn_irqflags-tracing.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_irqflags-tracing.rst: ================= IRQ-flags状态追踪 diff --git a/Documentation/translations/zh_CN/core-api/kernel-api.rst b/Documentation/translations/zh_CN/core-api/kernel-api.rst index d6f815ec265b..ab7d81889340 100644 --- a/Documentation/translations/zh_CN/core-api/kernel-api.rst +++ b/Documentation/translations/zh_CN/core-api/kernel-api.rst @@ -1,10 +1,12 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/core-api/kernel-api.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> -.. _cn_kernel-api.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_kernel-api.rst: ============ Linux内核API diff --git a/Documentation/translations/zh_CN/core-api/kobject.rst b/Documentation/translations/zh_CN/core-api/kobject.rst index f0e6a4aeb372..b7c37794cc7f 100644 --- a/Documentation/translations/zh_CN/core-api/kobject.rst +++ b/Documentation/translations/zh_CN/core-api/kobject.rst @@ -1,7 +1,10 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/core-api/kobject.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> .. _cn_core_api_kobject.rst: diff --git a/Documentation/translations/zh_CN/core-api/local_ops.rst b/Documentation/translations/zh_CN/core-api/local_ops.rst index ee67379b6869..41e4525038e8 100644 --- a/Documentation/translations/zh_CN/core-api/local_ops.rst +++ b/Documentation/translations/zh_CN/core-api/local_ops.rst @@ -1,10 +1,12 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/core-api/local_ops.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> -.. _cn_local_ops: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_local_ops: ======================== 本地原子操作的语义和行为 diff --git a/Documentation/translations/zh_CN/core-api/padata.rst b/Documentation/translations/zh_CN/core-api/padata.rst index c627f8f131f9..781d30675afd 100644 --- a/Documentation/translations/zh_CN/core-api/padata.rst +++ b/Documentation/translations/zh_CN/core-api/padata.rst @@ -3,7 +3,10 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/core-api/padata.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> .. _cn_core_api_padata.rst: diff --git a/Documentation/translations/zh_CN/core-api/printk-basics.rst b/Documentation/translations/zh_CN/core-api/printk-basics.rst index 2b20f6303a82..d574de3167c8 100644 --- a/Documentation/translations/zh_CN/core-api/printk-basics.rst +++ b/Documentation/translations/zh_CN/core-api/printk-basics.rst @@ -2,10 +2,12 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/core-api/printk-basics.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> -.. _cn_printk-basics.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_printk-basics.rst: ================== 使用printk记录消息 diff --git a/Documentation/translations/zh_CN/core-api/printk-formats.rst b/Documentation/translations/zh_CN/core-api/printk-formats.rst index a680c8f164c3..ce39c788cf5a 100644 --- a/Documentation/translations/zh_CN/core-api/printk-formats.rst +++ b/Documentation/translations/zh_CN/core-api/printk-formats.rst @@ -1,10 +1,12 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/core-api/printk-formats.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> -.. _cn_printk-formats.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_printk-formats.rst: ============================== 如何获得正确的printk格式占位符 diff --git a/Documentation/translations/zh_CN/core-api/refcount-vs-atomic.rst b/Documentation/translations/zh_CN/core-api/refcount-vs-atomic.rst index ea834e38d2f6..e2467fd26fc0 100644 --- a/Documentation/translations/zh_CN/core-api/refcount-vs-atomic.rst +++ b/Documentation/translations/zh_CN/core-api/refcount-vs-atomic.rst @@ -1,10 +1,12 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/core-api/refcount-vs-atomic.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> -.. _cn_refcount-vs-atomic: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_refcount-vs-atomic: ======================================= 与atomic_t相比,refcount_t的API是这样的 diff --git a/Documentation/translations/zh_CN/core-api/symbol-namespaces.rst b/Documentation/translations/zh_CN/core-api/symbol-namespaces.rst index ce05c29c7697..6abf7ed534ca 100644 --- a/Documentation/translations/zh_CN/core-api/symbol-namespaces.rst +++ b/Documentation/translations/zh_CN/core-api/symbol-namespaces.rst @@ -1,10 +1,12 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/core-api/symbol-namespaces.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> -.. _cn_symbol-namespaces.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_symbol-namespaces.rst: ================================= 符号命名空间(Symbol Namespaces) diff --git a/Documentation/translations/zh_CN/core-api/workqueue.rst b/Documentation/translations/zh_CN/core-api/workqueue.rst index 0b8f730db6c0..e372fa5cf101 100644 --- a/Documentation/translations/zh_CN/core-api/workqueue.rst +++ b/Documentation/translations/zh_CN/core-api/workqueue.rst @@ -2,10 +2,12 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/core-api/workqueue.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> -.. _cn_workqueue.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_workqueue.rst: ========================= 并发管理的工作队列 (cmwq) diff --git a/Documentation/translations/zh_CN/cpu-freq/core.rst b/Documentation/translations/zh_CN/cpu-freq/core.rst index 19fb9c029cfe..0c6fd447ced6 100644 --- a/Documentation/translations/zh_CN/cpu-freq/core.rst +++ b/Documentation/translations/zh_CN/cpu-freq/core.rst @@ -1,11 +1,13 @@ .. SPDX-License-Identifier: GPL-2.0 .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../cpu-freq/core` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/cpu-freq/core.rst -.. _cn_core.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_core.rst: ==================================== CPUFreq核心和CPUFreq通知器的通用说明 diff --git a/Documentation/translations/zh_CN/cpu-freq/cpu-drivers.rst b/Documentation/translations/zh_CN/cpu-freq/cpu-drivers.rst index 5ae9cfa2ec55..0fc5d1495789 100644 --- a/Documentation/translations/zh_CN/cpu-freq/cpu-drivers.rst +++ b/Documentation/translations/zh_CN/cpu-freq/cpu-drivers.rst @@ -2,11 +2,13 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../cpu-freq/cpu-drivers` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/cpu-freq/cpu-drivers.rst -.. _cn_cpu-drivers.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_cpu-drivers.rst: ======================================= 如何实现一个新的CPUFreq处理器驱动程序? @@ -80,8 +82,6 @@ CPUfreq核心层注册一个cpufreq_driver结构体。 .resume - 一个指向per-policy恢复函数的指针,该函数在关中断且在调节器再一次开始前被 调用。 - .ready - 一个指向per-policy准备函数的指针,该函数在策略完全初始化之后被调用。 - .attr - 一个指向NULL结尾的"struct freq_attr"列表的指针,该函数允许导出值到 sysfs。 diff --git a/Documentation/translations/zh_CN/cpu-freq/cpufreq-stats.rst b/Documentation/translations/zh_CN/cpu-freq/cpufreq-stats.rst index c90d1d8353ed..f14423099d4b 100644 --- a/Documentation/translations/zh_CN/cpu-freq/cpufreq-stats.rst +++ b/Documentation/translations/zh_CN/cpu-freq/cpufreq-stats.rst @@ -2,11 +2,13 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../cpu-freq/cpufreq-stats` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/cpu-freq/cpufreq-stats.rst -.. _cn_cpufreq-stats.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_cpufreq-stats.rst: ========================================== sysfs CPUFreq Stats的一般说明 diff --git a/Documentation/translations/zh_CN/cpu-freq/index.rst b/Documentation/translations/zh_CN/cpu-freq/index.rst index 65074e211940..c6e50963cd33 100644 --- a/Documentation/translations/zh_CN/cpu-freq/index.rst +++ b/Documentation/translations/zh_CN/cpu-freq/index.rst @@ -2,11 +2,13 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../cpu-freq/index` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/cpu-freq/index.rst -.. _cn_index.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_index.rst: ======================================================= Linux CPUFreq - Linux(TM)内核中的CPU频率和电压升降代码 diff --git a/Documentation/translations/zh_CN/filesystems/debugfs.rst b/Documentation/translations/zh_CN/filesystems/debugfs.rst index 822c4d42fdf9..4981a82dd651 100644 --- a/Documentation/translations/zh_CN/filesystems/debugfs.rst +++ b/Documentation/translations/zh_CN/filesystems/debugfs.rst @@ -2,7 +2,7 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../filesystems/debugfs` +:Original: Documentation/filesystems/debugfs.rst ======= Debugfs diff --git a/Documentation/translations/zh_CN/iio/ep93xx_adc.rst b/Documentation/translations/zh_CN/iio/ep93xx_adc.rst index 7e91d2197867..64f3f3508353 100644 --- a/Documentation/translations/zh_CN/iio/ep93xx_adc.rst +++ b/Documentation/translations/zh_CN/iio/ep93xx_adc.rst @@ -1,10 +1,12 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../iio/ep93xx_adc` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/iio/ep93xx_adc.rst -.. _cn_iio_ep93xx_adc: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_iio_ep93xx_adc: ================================== 思睿逻辑 EP93xx 模拟数字转换器驱动 diff --git a/Documentation/translations/zh_CN/iio/iio_configfs.rst b/Documentation/translations/zh_CN/iio/iio_configfs.rst index 274488e8dce4..d5460e951804 100644 --- a/Documentation/translations/zh_CN/iio/iio_configfs.rst +++ b/Documentation/translations/zh_CN/iio/iio_configfs.rst @@ -1,10 +1,12 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../iio/iio_configfs` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/iio/iio_configfs.rst -.. _cn_iio_configfs: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_iio_configfs: ===================== 工业 IIO configfs支持 diff --git a/Documentation/translations/zh_CN/iio/index.rst b/Documentation/translations/zh_CN/iio/index.rst index 7087076a10f6..32d69047b16a 100644 --- a/Documentation/translations/zh_CN/iio/index.rst +++ b/Documentation/translations/zh_CN/iio/index.rst @@ -2,11 +2,13 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../iio/index` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/iio/index.rst -.. _cn_iio_index: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_iio_index: ======== 工业 I/O diff --git a/Documentation/translations/zh_CN/kernel-hacking/hacking.rst b/Documentation/translations/zh_CN/kernel-hacking/hacking.rst index ab974faddecf..f2bc154c5bcc 100644 --- a/Documentation/translations/zh_CN/kernel-hacking/hacking.rst +++ b/Documentation/translations/zh_CN/kernel-hacking/hacking.rst @@ -68,7 +68,7 @@ 它将被排队(或丢弃)。因为它会关闭中断,所以处理程序必须很快:通常它只是 确认中断,标记一个“软件中断”以执行并退出。 -您可以通过 :c:func:`in_irq()` 返回真来判断您处于硬件中断状态。 +您可以通过 in_hardirq() 返回真来判断您处于硬件中断状态。 .. warning:: diff --git a/Documentation/translations/zh_CN/mips/booting.rst b/Documentation/translations/zh_CN/mips/booting.rst index 96453e1b962e..e0bbd3f20862 100644 --- a/Documentation/translations/zh_CN/mips/booting.rst +++ b/Documentation/translations/zh_CN/mips/booting.rst @@ -2,8 +2,11 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../mips/booting` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/mips/booting.rst + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> .. _cn_booting: diff --git a/Documentation/translations/zh_CN/mips/features.rst b/Documentation/translations/zh_CN/mips/features.rst index 93d93d06b1b3..b61dab06ceaf 100644 --- a/Documentation/translations/zh_CN/mips/features.rst +++ b/Documentation/translations/zh_CN/mips/features.rst @@ -2,8 +2,11 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../mips/features` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/mips/features.rst + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> .. _cn_features: diff --git a/Documentation/translations/zh_CN/mips/index.rst b/Documentation/translations/zh_CN/mips/index.rst index b85033f9d67c..192c6adbb72e 100644 --- a/Documentation/translations/zh_CN/mips/index.rst +++ b/Documentation/translations/zh_CN/mips/index.rst @@ -2,8 +2,11 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../mips/index` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/mips/index.rst + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> =========================== MIPS特性文档 diff --git a/Documentation/translations/zh_CN/mips/ingenic-tcu.rst b/Documentation/translations/zh_CN/mips/ingenic-tcu.rst index f04ba407384a..ddbe149c517b 100644 --- a/Documentation/translations/zh_CN/mips/ingenic-tcu.rst +++ b/Documentation/translations/zh_CN/mips/ingenic-tcu.rst @@ -2,8 +2,11 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../mips/ingenic-tcu` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/mips/ingenic-tcu.rst + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> .. _cn_ingenic-tcu: diff --git a/Documentation/translations/zh_CN/openrisc/index.rst b/Documentation/translations/zh_CN/openrisc/index.rst index d722642796c8..9ad6cc600884 100644 --- a/Documentation/translations/zh_CN/openrisc/index.rst +++ b/Documentation/translations/zh_CN/openrisc/index.rst @@ -2,11 +2,13 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../openrisc/index` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/openrisc/index.rst -.. _cn_openrisc_index: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_openrisc_index: ================= OpenRISC 体系架构 diff --git a/Documentation/translations/zh_CN/openrisc/openrisc_port.rst b/Documentation/translations/zh_CN/openrisc/openrisc_port.rst index e87d0eec281d..b8a67670492d 100644 --- a/Documentation/translations/zh_CN/openrisc/openrisc_port.rst +++ b/Documentation/translations/zh_CN/openrisc/openrisc_port.rst @@ -1,7 +1,10 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../openrisc/openrisc_port` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/openrisc/openrisc_port.rst + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> .. _cn_openrisc_port: diff --git a/Documentation/translations/zh_CN/openrisc/todo.rst b/Documentation/translations/zh_CN/openrisc/todo.rst index 9944ad05473b..63c38717edb1 100644 --- a/Documentation/translations/zh_CN/openrisc/todo.rst +++ b/Documentation/translations/zh_CN/openrisc/todo.rst @@ -1,7 +1,10 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../openrisc/todo` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/openrisc/todo.rst + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> .. _cn_openrisc_todo.rst: diff --git a/Documentation/translations/zh_CN/parisc/debugging.rst b/Documentation/translations/zh_CN/parisc/debugging.rst index c21beb986e15..68b73eb57105 100644 --- a/Documentation/translations/zh_CN/parisc/debugging.rst +++ b/Documentation/translations/zh_CN/parisc/debugging.rst @@ -1,7 +1,10 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/parisc/debugging.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> .. _cn_parisc_debugging: diff --git a/Documentation/translations/zh_CN/parisc/index.rst b/Documentation/translations/zh_CN/parisc/index.rst index a47454ebe32e..0cc553fc8272 100644 --- a/Documentation/translations/zh_CN/parisc/index.rst +++ b/Documentation/translations/zh_CN/parisc/index.rst @@ -2,7 +2,10 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/parisc/index.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> .. _cn_parisc_index: diff --git a/Documentation/translations/zh_CN/parisc/registers.rst b/Documentation/translations/zh_CN/parisc/registers.rst index 71e2404cd103..d2ab1874a602 100644 --- a/Documentation/translations/zh_CN/parisc/registers.rst +++ b/Documentation/translations/zh_CN/parisc/registers.rst @@ -1,7 +1,10 @@ .. include:: ../disclaimer-zh_CN.rst :Original: Documentation/parisc/registers.rst -:Translator: Yanteng Si <siyanteng@loongson.cn> + +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> .. _cn_parisc_registers: diff --git a/Documentation/translations/zh_CN/riscv/boot-image-header.rst b/Documentation/translations/zh_CN/riscv/boot-image-header.rst index 241bf9c1bcbe..0234c28a7114 100644 --- a/Documentation/translations/zh_CN/riscv/boot-image-header.rst +++ b/Documentation/translations/zh_CN/riscv/boot-image-header.rst @@ -1,10 +1,12 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../riscv/boot-image-header` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/riscv/boot-image-header.rst -.. _cn_boot-image-header.rst: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_boot-image-header.rst: ========================== RISC-V Linux启动镜像文件头 diff --git a/Documentation/translations/zh_CN/riscv/index.rst b/Documentation/translations/zh_CN/riscv/index.rst index db13b1101490..bbf5d7b3777a 100644 --- a/Documentation/translations/zh_CN/riscv/index.rst +++ b/Documentation/translations/zh_CN/riscv/index.rst @@ -2,11 +2,13 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../riscv/index` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/riscv/index.rst -.. _cn_riscv_index: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_riscv_index: =============== RISC-V 体系结构 diff --git a/Documentation/translations/zh_CN/riscv/patch-acceptance.rst b/Documentation/translations/zh_CN/riscv/patch-acceptance.rst index 9fd1c8216763..d180d24717bf 100644 --- a/Documentation/translations/zh_CN/riscv/patch-acceptance.rst +++ b/Documentation/translations/zh_CN/riscv/patch-acceptance.rst @@ -2,11 +2,13 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../riscv/patch-acceptance` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/riscv/patch-acceptance.rst -.. _cn_riscv_patch-acceptance: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_riscv_patch-acceptance: arch/riscv 开发者维护指南 ========================= diff --git a/Documentation/translations/zh_CN/riscv/pmu.rst b/Documentation/translations/zh_CN/riscv/pmu.rst index 22dcf3a9ca6e..7ec801026c4d 100644 --- a/Documentation/translations/zh_CN/riscv/pmu.rst +++ b/Documentation/translations/zh_CN/riscv/pmu.rst @@ -1,10 +1,12 @@ .. include:: ../disclaimer-zh_CN.rst -:Original: :doc:`../../../riscv/pmu` -:Translator: Yanteng Si <siyanteng@loongson.cn> +:Original: Documentation/riscv/pmu.rst -.. _cn_riscv_pmu: +:翻译: + + 司延腾 Yanteng Si <siyanteng@loongson.cn> +.. _cn_riscv_pmu: ======================== RISC-V平台上对PMUs的支持 diff --git a/Documentation/translations/zh_TW/admin-guide/README.rst b/Documentation/translations/zh_TW/admin-guide/README.rst index b752e50359e6..6ce97edbab37 100644 --- a/Documentation/translations/zh_TW/admin-guide/README.rst +++ b/Documentation/translations/zh_TW/admin-guide/README.rst @@ -226,7 +226,7 @@ Linux內核5.x版本 <http://kernel.org/> 編譯內核 --------- - - 確保您至少有gcc 4.9可用。 + - 確保您至少有gcc 5.1可用。 有關更多信息,請參閱 :ref:`Documentation/process/changes.rst <changes>` 。 請注意,您仍然可以使用此內核運行a.out用戶程序。 diff --git a/Documentation/translations/zh_TW/arm64/amu.rst b/Documentation/translations/zh_TW/arm64/amu.rst new file mode 100644 index 000000000000..ffdc466e0f62 --- /dev/null +++ b/Documentation/translations/zh_TW/arm64/amu.rst @@ -0,0 +1,104 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/arm64/amu.rst <amu_index>` + +Translator: Bailu Lin <bailu.lin@vivo.com> + Hu Haowen <src.res@email.cn> + +================================== +AArch64 Linux 中擴展的活動監控單元 +================================== + +作者: Ionela Voinescu <ionela.voinescu@arm.com> + +日期: 2019-09-10 + +本文檔簡要描述了 AArch64 Linux 支持的活動監控單元的規範。 + + +架構總述 +-------- + +活動監控是 ARMv8.4 CPU 架構引入的一個可選擴展特性。 + +活動監控單元(在每個 CPU 中實現)爲系統管理提供了性能計數器。既可以通 +過系統寄存器的方式訪問計數器,同時也支持外部內存映射的方式訪問計數器。 + +AMUv1 架構實現了一個由4個固定的64位事件計數器組成的計數器組。 + + - CPU 周期計數器:同 CPU 的頻率增長 + - 常量計數器:同固定的系統時鐘頻率增長 + - 淘汰指令計數器: 同每次架構指令執行增長 + - 內存停頓周期計數器:計算由在時鐘域內的最後一級緩存中未命中而引起 + 的指令調度停頓周期數 + +當處於 WFI 或者 WFE 狀態時,計數器不會增長。 + +AMU 架構提供了一個高達16位的事件計數器空間,未來新的 AMU 版本中可能 +用它來實現新增的事件計數器。 + +另外,AMUv1 實現了一個多達16個64位輔助事件計數器的計數器組。 + +冷復位時所有的計數器會清零。 + + +基本支持 +-------- + +內核可以安全地運行在支持 AMU 和不支持 AMU 的 CPU 組合中。 +因此,當配置 CONFIG_ARM64_AMU_EXTN 後我們無條件使能後續 +(secondary or hotplugged) CPU 檢測和使用這個特性。 + +當在 CPU 上檢測到該特性時,我們會標記爲特性可用但是不能保證計數器的功能, +僅表明有擴展屬性。 + +固件(代碼運行在高異常級別,例如 arm-tf )需支持以下功能: + + - 提供低異常級別(EL2 和 EL1)訪問 AMU 寄存器的能力。 + - 使能計數器。如果未使能,它的值應爲 0。 + - 在從電源關閉狀態啓動 CPU 前或後保存或者恢復計數器。 + +當使用使能了該特性的內核啓動但固件損壞時,訪問計數器寄存器可能會遭遇 +panic 或者死鎖。即使未發現這些症狀,計數器寄存器返回的數據結果並不一 +定能反映真實情況。通常,計數器會返回 0,表明他們未被使能。 + +如果固件沒有提供適當的支持最好關閉 CONFIG_ARM64_AMU_EXTN。 +值得注意的是,出於安全原因,不要繞過 AMUSERRENR_EL0 設置而捕獲從 +EL0(用戶空間) 訪問 EL1(內核空間)。 因此,固件應該確保訪問 AMU寄存器 +不會困在 EL2或EL3。 + +AMUv1 的固定計數器可以通過如下系統寄存器訪問: + + - SYS_AMEVCNTR0_CORE_EL0 + - SYS_AMEVCNTR0_CONST_EL0 + - SYS_AMEVCNTR0_INST_RET_EL0 + - SYS_AMEVCNTR0_MEM_STALL_EL0 + +特定輔助計數器可以通過 SYS_AMEVCNTR1_EL0(n) 訪問,其中n介於0到15。 + +詳細信息定義在目錄:arch/arm64/include/asm/sysreg.h。 + + +用戶空間訪問 +------------ + +由於以下原因,當前禁止從用戶空間訪問 AMU 的寄存器: + + - 安全因數:可能會暴露處於安全模式執行的代碼信息。 + - 意願:AMU 是用於系統管理的。 + +同樣,該功能對用戶空間不可見。 + + +虛擬化 +------ + +由於以下原因,當前禁止從 KVM 客戶端的用戶空間(EL0)和內核空間(EL1) +訪問 AMU 的寄存器: + + - 安全因數:可能會暴露給其他客戶端或主機端執行的代碼信息。 + +任何試圖訪問 AMU 寄存器的行爲都會觸發一個註冊在客戶端的未定義異常。 + diff --git a/Documentation/translations/zh_TW/arm64/booting.txt b/Documentation/translations/zh_TW/arm64/booting.txt new file mode 100644 index 000000000000..b9439dd54012 --- /dev/null +++ b/Documentation/translations/zh_TW/arm64/booting.txt @@ -0,0 +1,251 @@ +SPDX-License-Identifier: GPL-2.0 + +Chinese translated version of Documentation/arm64/booting.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +M: Will Deacon <will.deacon@arm.com> +zh_CN: Fu Wei <wefu@redhat.com> +zh_TW: Hu Haowen <src.res@email.cn> +C: 55f058e7574c3615dea4615573a19bdb258696c6 +--------------------------------------------------------------------- +Documentation/arm64/booting.rst 的中文翻譯 + +如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 +交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 +譯存在問題,請聯繫中文版維護者。 + +英文版維護者: Will Deacon <will.deacon@arm.com> +中文版維護者: 傅煒 Fu Wei <wefu@redhat.com> +中文版翻譯者: 傅煒 Fu Wei <wefu@redhat.com> +中文版校譯者: 傅煒 Fu Wei <wefu@redhat.com> +繁體中文版校譯者: 胡皓文 Hu Haowen <src.res@email.cn> +本文翻譯提交時的 Git 檢出點爲: 55f058e7574c3615dea4615573a19bdb258696c6 + +以下爲正文 +--------------------------------------------------------------------- + 啓動 AArch64 Linux + ================== + +作者: Will Deacon <will.deacon@arm.com> +日期: 2012 年 09 月 07 日 + +本文檔基於 Russell King 的 ARM 啓動文檔,且適用於所有公開發布的 +AArch64 Linux 內核代碼。 + +AArch64 異常模型由多個異常級(EL0 - EL3)組成,對於 EL0 和 EL1 異常級 +有對應的安全和非安全模式。EL2 是系統管理級,且僅存在於非安全模式下。 +EL3 是最高特權級,且僅存在於安全模式下。 + +基於本文檔的目的,我們將簡單地使用『引導裝載程序』(『boot loader』) +這個術語來定義在將控制權交給 Linux 內核前 CPU 上執行的所有軟體。 +這可能包含安全監控和系統管理代碼,或者它可能只是一些用於準備最小啓動 +環境的指令。 + +基本上,引導裝載程序(至少)應實現以下操作: + +1、設置和初始化 RAM +2、設置設備樹數據 +3、解壓內核映像 +4、調用內核映像 + + +1、設置和初始化 RAM +----------------- + +必要性: 強制 + +引導裝載程序應該找到並初始化系統中所有內核用於保持系統變量數據的 RAM。 +這個操作的執行方式因設備而異。(它可能使用內部算法來自動定位和計算所有 +RAM,或可能使用對這個設備已知的 RAM 信息,還可能是引導裝載程序設計者 +想到的任何合適的方法。) + + +2、設置設備樹數據 +--------------- + +必要性: 強制 + +設備樹數據塊(dtb)必須 8 字節對齊,且大小不能超過 2MB。由於設備樹 +數據塊將在使能緩存的情況下以 2MB 粒度被映射,故其不能被置於必須以特定 +屬性映射的2M區域內。 + +註: v4.2 之前的版本同時要求設備樹數據塊被置於從內核映像以下 +text_offset 字節處算起第一個 512MB 內。 + +3、解壓內核映像 +------------- + +必要性: 可選 + +AArch64 內核當前沒有提供自解壓代碼,因此如果使用了壓縮內核映像文件 +(比如 Image.gz),則需要通過引導裝載程序(使用 gzip 等)來進行解壓。 +若引導裝載程序沒有實現這個功能,就要使用非壓縮內核映像文件。 + + +4、調用內核映像 +------------- + +必要性: 強制 + +已解壓的內核映像包含一個 64 字節的頭,內容如下: + + u32 code0; /* 可執行代碼 */ + u32 code1; /* 可執行代碼 */ + u64 text_offset; /* 映像裝載偏移,小端模式 */ + u64 image_size; /* 映像實際大小, 小端模式 */ + u64 flags; /* 內核旗標, 小端模式 * + u64 res2 = 0; /* 保留 */ + u64 res3 = 0; /* 保留 */ + u64 res4 = 0; /* 保留 */ + u32 magic = 0x644d5241; /* 魔數, 小端, "ARM\x64" */ + u32 res5; /* 保留 (用於 PE COFF 偏移) */ + + +映像頭注釋: + +- 自 v3.17 起,除非另有說明,所有域都是小端模式。 + +- code0/code1 負責跳轉到 stext. + +- 當通過 EFI 啓動時, 最初 code0/code1 被跳過。 + res5 是到 PE 文件頭的偏移,而 PE 文件頭含有 EFI 的啓動入口點 + (efi_stub_entry)。當 stub 代碼完成了它的使命,它會跳轉到 code0 + 繼續正常的啓動流程。 + +- v3.17 之前,未明確指定 text_offset 的字節序。此時,image_size 爲零, + 且 text_offset 依照內核字節序爲 0x80000。 + 當 image_size 非零,text_offset 爲小端模式且是有效值,應被引導加載 + 程序使用。當 image_size 爲零,text_offset 可假定爲 0x80000。 + +- flags 域 (v3.17 引入) 爲 64 位小端模式,其編碼如下: + 位 0: 內核字節序。 1 表示大端模式,0 表示小端模式。 + 位 1-2: 內核頁大小。 + 0 - 未指定。 + 1 - 4K + 2 - 16K + 3 - 64K + 位 3: 內核物理位置 + 0 - 2MB 對齊基址應儘量靠近內存起始處,因爲 + 其基址以下的內存無法通過線性映射訪問 + 1 - 2MB 對齊基址可以在物理內存的任意位置 + 位 4-63: 保留。 + +- 當 image_size 爲零時,引導裝載程序應試圖在內核映像末尾之後儘可能 + 多地保留空閒內存供內核直接使用。對內存空間的需求量因所選定的內核 + 特性而異, 並無實際限制。 + +內核映像必須被放置在任意一個可用系統內存 2MB 對齊基址的 text_offset +字節處,並從該處被調用。2MB 對齊基址和內核映像起始地址之間的區域對於 +內核來說沒有特殊意義,且可能被用於其他目的。 +從映像起始地址算起,最少必須準備 image_size 字節的空閒內存供內核使用。 +註: v4.6 之前的版本無法使用內核映像物理偏移以下的內存,所以當時建議 +將映像儘量放置在靠近系統內存起始的地方。 + +任何提供給內核的內存(甚至在映像起始地址之前),若未從內核中標記爲保留 +(如在設備樹(dtb)的 memreserve 區域),都將被認爲對內核是可用。 + +在跳轉入內核前,必須符合以下狀態: + +- 停止所有 DMA 設備,這樣內存數據就不會因爲虛假網絡包或磁碟數據而 + 被破壞。這可能可以節省你許多的調試時間。 + +- 主 CPU 通用寄存器設置 + x0 = 系統 RAM 中設備樹數據塊(dtb)的物理地址。 + x1 = 0 (保留,將來可能使用) + x2 = 0 (保留,將來可能使用) + x3 = 0 (保留,將來可能使用) + +- CPU 模式 + 所有形式的中斷必須在 PSTATE.DAIF 中被屏蔽(Debug、SError、IRQ + 和 FIQ)。 + CPU 必須處於 EL2(推薦,可訪問虛擬化擴展)或非安全 EL1 模式下。 + +- 高速緩存、MMU + MMU 必須關閉。 + 指令緩存開啓或關閉皆可。 + 已載入的內核映像的相應內存區必須被清理,以達到緩存一致性點(PoC)。 + 當存在系統緩存或其他使能緩存的一致性主控器時,通常需使用虛擬地址 + 維護其緩存,而非 set/way 操作。 + 遵從通過虛擬地址操作維護構架緩存的系統緩存必須被配置,並可以被使能。 + 而不通過虛擬地址操作維護構架緩存的系統緩存(不推薦),必須被配置且 + 禁用。 + + *譯者註:對於 PoC 以及緩存相關內容,請參考 ARMv8 構架參考手冊 + ARM DDI 0487A + +- 架構計時器 + CNTFRQ 必須設定爲計時器的頻率,且 CNTVOFF 必須設定爲對所有 CPU + 都一致的值。如果在 EL1 模式下進入內核,則 CNTHCTL_EL2 中的 + EL1PCTEN (bit 0) 必須置位。 + +- 一致性 + 通過內核啓動的所有 CPU 在內核入口地址上必須處於相同的一致性域中。 + 這可能要根據具體實現來定義初始化過程,以使能每個CPU上對維護操作的 + 接收。 + +- 系統寄存器 + 在進入內核映像的異常級中,所有構架中可寫的系統寄存器必須通過軟體 + 在一個更高的異常級別下初始化,以防止在 未知 狀態下運行。 + + 對於擁有 GICv3 中斷控制器並以 v3 模式運行的系統: + - 如果 EL3 存在: + ICC_SRE_EL3.Enable (位 3) 必須初始化爲 0b1。 + ICC_SRE_EL3.SRE (位 0) 必須初始化爲 0b1。 + - 若內核運行在 EL1: + ICC_SRE_EL2.Enable (位 3) 必須初始化爲 0b1。 + ICC_SRE_EL2.SRE (位 0) 必須初始化爲 0b1。 + - 設備樹(DT)或 ACPI 表必須描述一個 GICv3 中斷控制器。 + + 對於擁有 GICv3 中斷控制器並以兼容(v2)模式運行的系統: + - 如果 EL3 存在: + ICC_SRE_EL3.SRE (位 0) 必須初始化爲 0b0。 + - 若內核運行在 EL1: + ICC_SRE_EL2.SRE (位 0) 必須初始化爲 0b0。 + - 設備樹(DT)或 ACPI 表必須描述一個 GICv2 中斷控制器。 + +以上對於 CPU 模式、高速緩存、MMU、架構計時器、一致性、系統寄存器的 +必要條件描述適用於所有 CPU。所有 CPU 必須在同一異常級別跳入內核。 + +引導裝載程序必須在每個 CPU 處於以下狀態時跳入內核入口: + +- 主 CPU 必須直接跳入內核映像的第一條指令。通過此 CPU 傳遞的設備樹 + 數據塊必須在每個 CPU 節點中包含一個 『enable-method』 屬性,所 + 支持的 enable-method 請見下文。 + + 引導裝載程序必須生成這些設備樹屬性,並在跳入內核入口之前將其插入 + 數據塊。 + +- enable-method 爲 「spin-table」 的 CPU 必須在它們的 CPU + 節點中包含一個 『cpu-release-addr』 屬性。這個屬性標識了一個 + 64 位自然對齊且初始化爲零的內存位置。 + + 這些 CPU 必須在內存保留區(通過設備樹中的 /memreserve/ 域傳遞 + 給內核)中自旋於內核之外,輪詢它們的 cpu-release-addr 位置(必須 + 包含在保留區中)。可通過插入 wfe 指令來降低忙循環開銷,而主 CPU 將 + 發出 sev 指令。當對 cpu-release-addr 所指位置的讀取操作返回非零值 + 時,CPU 必須跳入此值所指向的地址。此值爲一個單獨的 64 位小端值, + 因此 CPU 須在跳轉前將所讀取的值轉換爲其本身的端模式。 + +- enable-method 爲 「psci」 的 CPU 保持在內核外(比如,在 + memory 節點中描述爲內核空間的內存區外,或在通過設備樹 /memreserve/ + 域中描述爲內核保留區的空間中)。內核將會發起在 ARM 文檔(編號 + ARM DEN 0022A:用於 ARM 上的電源狀態協調接口系統軟體)中描述的 + CPU_ON 調用來將 CPU 帶入內核。 + + *譯者注: ARM DEN 0022A 已更新到 ARM DEN 0022C。 + + 設備樹必須包含一個 『psci』 節點,請參考以下文檔: + Documentation/devicetree/bindings/arm/psci.yaml + + +- 輔助 CPU 通用寄存器設置 + x0 = 0 (保留,將來可能使用) + x1 = 0 (保留,將來可能使用) + x2 = 0 (保留,將來可能使用) + x3 = 0 (保留,將來可能使用) + diff --git a/Documentation/translations/zh_TW/arm64/elf_hwcaps.rst b/Documentation/translations/zh_TW/arm64/elf_hwcaps.rst new file mode 100644 index 000000000000..3eb1c623ce31 --- /dev/null +++ b/Documentation/translations/zh_TW/arm64/elf_hwcaps.rst @@ -0,0 +1,244 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/arm64/elf_hwcaps.rst <elf_hwcaps_index>` + +Translator: Bailu Lin <bailu.lin@vivo.com> + Hu Haowen <src.res@email.cn> + +================ +ARM64 ELF hwcaps +================ + +這篇文檔描述了 arm64 ELF hwcaps 的用法和語義。 + + +1. 簡介 +------- + +有些硬體或軟體功能僅在某些 CPU 實現上和/或在具體某個內核配置上可用,但 +對於處於 EL0 的用戶空間代碼沒有可用的架構發現機制。內核通過在輔助向量表 +公開一組稱爲 hwcaps 的標誌而把這些功能暴露給用戶空間。 + +用戶空間軟體可以通過獲取輔助向量的 AT_HWCAP 或 AT_HWCAP2 條目來測試功能, +並測試是否設置了相關標誌,例如:: + + bool floating_point_is_present(void) + { + unsigned long hwcaps = getauxval(AT_HWCAP); + if (hwcaps & HWCAP_FP) + return true; + + return false; + } + +如果軟體依賴於 hwcap 描述的功能,在嘗試使用該功能前則應檢查相關的 hwcap +標誌以驗證該功能是否存在。 + +不能通過其他方式探查這些功能。當一個功能不可用時,嘗試使用它可能導致不可 +預測的行爲,並且無法保證能確切的知道該功能不可用,例如 SIGILL。 + + +2. Hwcaps 的說明 +---------------- + +大多數 hwcaps 旨在說明通過架構 ID 寄存器(處於 EL0 的用戶空間代碼無法訪問) +描述的功能的存在。這些 hwcap 通過 ID 寄存器欄位定義,並且應根據 ARM 體系 +結構參考手冊(ARM ARM)中定義的欄位來解釋說明。 + +這些 hwcaps 以下面的形式描述:: + + idreg.field == val 表示有某個功能。 + +當 idreg.field 中有 val 時,hwcaps 表示 ARM ARM 定義的功能是有效的,但是 +並不是說要完全和 val 相等,也不是說 idreg.field 描述的其他功能就是缺失的。 + +其他 hwcaps 可能表明無法僅由 ID 寄存器描述的功能的存在。這些 hwcaps 可能 +沒有被 ID 寄存器描述,需要參考其他文檔。 + + +3. AT_HWCAP 中揭示的 hwcaps +--------------------------- + +HWCAP_FP + ID_AA64PFR0_EL1.FP == 0b0000 表示有此功能。 + +HWCAP_ASIMD + ID_AA64PFR0_EL1.AdvSIMD == 0b0000 表示有此功能。 + +HWCAP_EVTSTRM + 通用計時器頻率配置爲大約100KHz以生成事件。 + +HWCAP_AES + ID_AA64ISAR0_EL1.AES == 0b0001 表示有此功能。 + +HWCAP_PMULL + ID_AA64ISAR0_EL1.AES == 0b0010 表示有此功能。 + +HWCAP_SHA1 + ID_AA64ISAR0_EL1.SHA1 == 0b0001 表示有此功能。 + +HWCAP_SHA2 + ID_AA64ISAR0_EL1.SHA2 == 0b0001 表示有此功能。 + +HWCAP_CRC32 + ID_AA64ISAR0_EL1.CRC32 == 0b0001 表示有此功能。 + +HWCAP_ATOMICS + ID_AA64ISAR0_EL1.Atomic == 0b0010 表示有此功能。 + +HWCAP_FPHP + ID_AA64PFR0_EL1.FP == 0b0001 表示有此功能。 + +HWCAP_ASIMDHP + ID_AA64PFR0_EL1.AdvSIMD == 0b0001 表示有此功能。 + +HWCAP_CPUID + 根據 Documentation/arm64/cpu-feature-registers.rst 描述,EL0 可以訪問 + 某些 ID 寄存器。 + + 這些 ID 寄存器可能表示功能的可用性。 + +HWCAP_ASIMDRDM + ID_AA64ISAR0_EL1.RDM == 0b0001 表示有此功能。 + +HWCAP_JSCVT + ID_AA64ISAR1_EL1.JSCVT == 0b0001 表示有此功能。 + +HWCAP_FCMA + ID_AA64ISAR1_EL1.FCMA == 0b0001 表示有此功能。 + +HWCAP_LRCPC + ID_AA64ISAR1_EL1.LRCPC == 0b0001 表示有此功能。 + +HWCAP_DCPOP + ID_AA64ISAR1_EL1.DPB == 0b0001 表示有此功能。 + +HWCAP_SHA3 + ID_AA64ISAR0_EL1.SHA3 == 0b0001 表示有此功能。 + +HWCAP_SM3 + ID_AA64ISAR0_EL1.SM3 == 0b0001 表示有此功能。 + +HWCAP_SM4 + ID_AA64ISAR0_EL1.SM4 == 0b0001 表示有此功能。 + +HWCAP_ASIMDDP + ID_AA64ISAR0_EL1.DP == 0b0001 表示有此功能。 + +HWCAP_SHA512 + ID_AA64ISAR0_EL1.SHA2 == 0b0010 表示有此功能。 + +HWCAP_SVE + ID_AA64PFR0_EL1.SVE == 0b0001 表示有此功能。 + +HWCAP_ASIMDFHM + ID_AA64ISAR0_EL1.FHM == 0b0001 表示有此功能。 + +HWCAP_DIT + ID_AA64PFR0_EL1.DIT == 0b0001 表示有此功能。 + +HWCAP_USCAT + ID_AA64MMFR2_EL1.AT == 0b0001 表示有此功能。 + +HWCAP_ILRCPC + ID_AA64ISAR1_EL1.LRCPC == 0b0010 表示有此功能。 + +HWCAP_FLAGM + ID_AA64ISAR0_EL1.TS == 0b0001 表示有此功能。 + +HWCAP_SSBS + ID_AA64PFR1_EL1.SSBS == 0b0010 表示有此功能。 + +HWCAP_SB + ID_AA64ISAR1_EL1.SB == 0b0001 表示有此功能。 + +HWCAP_PACA + 如 Documentation/arm64/pointer-authentication.rst 所描述, + ID_AA64ISAR1_EL1.APA == 0b0001 或 ID_AA64ISAR1_EL1.API == 0b0001 + 表示有此功能。 + +HWCAP_PACG + 如 Documentation/arm64/pointer-authentication.rst 所描述, + ID_AA64ISAR1_EL1.GPA == 0b0001 或 ID_AA64ISAR1_EL1.GPI == 0b0001 + 表示有此功能。 + +HWCAP2_DCPODP + + ID_AA64ISAR1_EL1.DPB == 0b0010 表示有此功能。 + +HWCAP2_SVE2 + + ID_AA64ZFR0_EL1.SVEVer == 0b0001 表示有此功能。 + +HWCAP2_SVEAES + + ID_AA64ZFR0_EL1.AES == 0b0001 表示有此功能。 + +HWCAP2_SVEPMULL + + ID_AA64ZFR0_EL1.AES == 0b0010 表示有此功能。 + +HWCAP2_SVEBITPERM + + ID_AA64ZFR0_EL1.BitPerm == 0b0001 表示有此功能。 + +HWCAP2_SVESHA3 + + ID_AA64ZFR0_EL1.SHA3 == 0b0001 表示有此功能。 + +HWCAP2_SVESM4 + + ID_AA64ZFR0_EL1.SM4 == 0b0001 表示有此功能。 + +HWCAP2_FLAGM2 + + ID_AA64ISAR0_EL1.TS == 0b0010 表示有此功能。 + +HWCAP2_FRINT + + ID_AA64ISAR1_EL1.FRINTTS == 0b0001 表示有此功能。 + +HWCAP2_SVEI8MM + + ID_AA64ZFR0_EL1.I8MM == 0b0001 表示有此功能。 + +HWCAP2_SVEF32MM + + ID_AA64ZFR0_EL1.F32MM == 0b0001 表示有此功能。 + +HWCAP2_SVEF64MM + + ID_AA64ZFR0_EL1.F64MM == 0b0001 表示有此功能。 + +HWCAP2_SVEBF16 + + ID_AA64ZFR0_EL1.BF16 == 0b0001 表示有此功能。 + +HWCAP2_I8MM + + ID_AA64ISAR1_EL1.I8MM == 0b0001 表示有此功能。 + +HWCAP2_BF16 + + ID_AA64ISAR1_EL1.BF16 == 0b0001 表示有此功能。 + +HWCAP2_DGH + + ID_AA64ISAR1_EL1.DGH == 0b0001 表示有此功能。 + +HWCAP2_RNG + + ID_AA64ISAR0_EL1.RNDR == 0b0001 表示有此功能。 + +HWCAP2_BTI + + ID_AA64PFR0_EL1.BT == 0b0001 表示有此功能。 + + +4. 未使用的 AT_HWCAP 位 +----------------------- + +爲了與用戶空間交互,內核保證 AT_HWCAP 的第62、63位將始終返回0。 + diff --git a/Documentation/translations/zh_TW/arm64/hugetlbpage.rst b/Documentation/translations/zh_TW/arm64/hugetlbpage.rst new file mode 100644 index 000000000000..846b500dae97 --- /dev/null +++ b/Documentation/translations/zh_TW/arm64/hugetlbpage.rst @@ -0,0 +1,49 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/arm64/hugetlbpage.rst <hugetlbpage_index>` + +Translator: Bailu Lin <bailu.lin@vivo.com> + Hu Haowen <src.res@email.cn> + +===================== +ARM64中的 HugeTLBpage +===================== + +大頁依靠有效利用 TLBs 來提高地址翻譯的性能。這取決於以下 +兩點 - + + - 大頁的大小 + - TLBs 支持的條目大小 + +ARM64 接口支持2種大頁方式。 + +1) pud/pmd 級別的塊映射 +----------------------- + +這是常規大頁,他們的 pmd 或 pud 頁面表條目指向一個內存塊。 +不管 TLB 中支持的條目大小如何,塊映射可以減少翻譯大頁地址 +所需遍歷的頁表深度。 + +2) 使用連續位 +------------- + +架構中轉換頁表條目(D4.5.3, ARM DDI 0487C.a)中提供一個連續 +位告訴 MMU 這個條目是一個連續條目集的一員,它可以被緩存在單 +個 TLB 條目中。 + +在 Linux 中連續位用來增加 pmd 和 pte(最後一級)級別映射的大 +小。受支持的連續頁表條目數量因頁面大小和頁表級別而異。 + + +支持以下大頁尺寸配置 - + + ====== ======== ==== ======== === + - CONT PTE PMD CONT PMD PUD + ====== ======== ==== ======== === + 4K: 64K 2M 32M 1G + 16K: 2M 32M 1G + 64K: 2M 512M 16G + ====== ======== ==== ======== === + diff --git a/Documentation/translations/zh_TW/arm64/index.rst b/Documentation/translations/zh_TW/arm64/index.rst new file mode 100644 index 000000000000..2322783f3881 --- /dev/null +++ b/Documentation/translations/zh_TW/arm64/index.rst @@ -0,0 +1,23 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/arm64/index.rst <arm64_index>` +:Translator: Bailu Lin <bailu.lin@vivo.com> + Hu Haowen <src.res@email.cn> + +.. _tw_arm64_index: + + +========== +ARM64 架構 +========== + +.. toctree:: + :maxdepth: 2 + + amu + hugetlbpage + perf + elf_hwcaps + diff --git a/Documentation/translations/zh_TW/arm64/legacy_instructions.txt b/Documentation/translations/zh_TW/arm64/legacy_instructions.txt new file mode 100644 index 000000000000..6d4454f77b9e --- /dev/null +++ b/Documentation/translations/zh_TW/arm64/legacy_instructions.txt @@ -0,0 +1,77 @@ +SPDX-License-Identifier: GPL-2.0 + +Chinese translated version of Documentation/arm64/legacy_instructions.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +Maintainer: Punit Agrawal <punit.agrawal@arm.com> + Suzuki K. Poulose <suzuki.poulose@arm.com> +Chinese maintainer: Fu Wei <wefu@redhat.com> +Traditional Chinese maintainer: Hu Haowen <src.res@email.cn> +--------------------------------------------------------------------- +Documentation/arm64/legacy_instructions.rst 的中文翻譯 + +如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 +交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 +譯存在問題,請聯繫中文版維護者。 + +本文翻譯提交時的 Git 檢出點爲: bc465aa9d045feb0e13b4a8f32cc33c1943f62d6 + +英文版維護者: Punit Agrawal <punit.agrawal@arm.com> + Suzuki K. Poulose <suzuki.poulose@arm.com> +中文版維護者: 傅煒 Fu Wei <wefu@redhat.com> +中文版翻譯者: 傅煒 Fu Wei <wefu@redhat.com> +中文版校譯者: 傅煒 Fu Wei <wefu@redhat.com> +繁體中文版校譯者:胡皓文 Hu Haowen <src.res@email.cn> + +以下爲正文 +--------------------------------------------------------------------- +Linux 內核在 arm64 上的移植提供了一個基礎框架,以支持構架中正在被淘汰或已廢棄指令的模擬執行。 +這個基礎框架的代碼使用未定義指令鉤子(hooks)來支持模擬。如果指令存在,它也允許在硬體中啓用該指令。 + +模擬模式可通過寫 sysctl 節點(/proc/sys/abi)來控制。 +不同的執行方式及 sysctl 節點的相應值,解釋如下: + +* Undef(未定義) + 值: 0 + 產生未定義指令終止異常。它是那些構架中已廢棄的指令,如 SWP,的默認處理方式。 + +* Emulate(模擬) + 值: 1 + 使用軟體模擬方式。爲解決軟體遷移問題,這種模擬指令模式的使用是被跟蹤的,並會發出速率限制警告。 + 它是那些構架中正在被淘汰的指令,如 CP15 barriers(隔離指令),的默認處理方式。 + +* Hardware Execution(硬體執行) + 值: 2 + 雖然標記爲正在被淘汰,但一些實現可能提供硬體執行這些指令的使能/禁用操作。 + 使用硬體執行一般會有更好的性能,但將無法收集運行時對正被淘汰指令的使用統計數據。 + +默認執行模式依賴於指令在構架中狀態。正在被淘汰的指令應該以模擬(Emulate)作爲默認模式, +而已廢棄的指令必須默認使用未定義(Undef)模式 + +注意:指令模擬可能無法應對所有情況。更多詳情請參考單獨的指令注釋。 + +受支持的遺留指令 +------------- +* SWP{B} +節點: /proc/sys/abi/swp +狀態: 已廢棄 +默認執行方式: Undef (0) + +* CP15 Barriers +節點: /proc/sys/abi/cp15_barrier +狀態: 正被淘汰,不推薦使用 +默認執行方式: Emulate (1) + +* SETEND +節點: /proc/sys/abi/setend +狀態: 正被淘汰,不推薦使用 +默認執行方式: Emulate (1)* +註:爲了使能這個特性,系統中的所有 CPU 必須在 EL0 支持混合字節序。 +如果一個新的 CPU (不支持混合字節序) 在使能這個特性後被熱插入系統, +在應用中可能會出現不可預期的結果。 + diff --git a/Documentation/translations/zh_TW/arm64/memory.txt b/Documentation/translations/zh_TW/arm64/memory.txt new file mode 100644 index 000000000000..99c2b78b5674 --- /dev/null +++ b/Documentation/translations/zh_TW/arm64/memory.txt @@ -0,0 +1,119 @@ +SPDX-License-Identifier: GPL-2.0 + +Chinese translated version of Documentation/arm64/memory.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +Maintainer: Catalin Marinas <catalin.marinas@arm.com> +Chinese maintainer: Fu Wei <wefu@redhat.com> +Traditional Chinese maintainer: Hu Haowen <src.res@email.cn> +--------------------------------------------------------------------- +Documentation/arm64/memory.rst 的中文翻譯 + +如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 +交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 +譯存在問題,請聯繫中文版維護者。 + +本文翻譯提交時的 Git 檢出點爲: bc465aa9d045feb0e13b4a8f32cc33c1943f62d6 + +英文版維護者: Catalin Marinas <catalin.marinas@arm.com> +中文版維護者: 傅煒 Fu Wei <wefu@redhat.com> +中文版翻譯者: 傅煒 Fu Wei <wefu@redhat.com> +中文版校譯者: 傅煒 Fu Wei <wefu@redhat.com> +繁體中文版校譯者: 胡皓文 Hu Haowen <src.res@email.cn> + +以下爲正文 +--------------------------------------------------------------------- + Linux 在 AArch64 中的內存布局 + =========================== + +作者: Catalin Marinas <catalin.marinas@arm.com> + +本文檔描述 AArch64 Linux 內核所使用的虛擬內存布局。此構架可以實現 +頁大小爲 4KB 的 4 級轉換表和頁大小爲 64KB 的 3 級轉換表。 + +AArch64 Linux 使用 3 級或 4 級轉換表,其頁大小配置爲 4KB,對於用戶和內核 +分別都有 39-bit (512GB) 或 48-bit (256TB) 的虛擬地址空間。 +對於頁大小爲 64KB的配置,僅使用 2 級轉換表,有 42-bit (4TB) 的虛擬地址空間,但內存布局相同。 + +用戶地址空間的 63:48 位爲 0,而內核地址空間的相應位爲 1。TTBRx 的 +選擇由虛擬地址的 63 位給出。swapper_pg_dir 僅包含內核(全局)映射, +而用戶 pgd 僅包含用戶(非全局)映射。swapper_pg_dir 地址被寫入 +TTBR1 中,且從不寫入 TTBR0。 + + +AArch64 Linux 在頁大小爲 4KB,並使用 3 級轉換表時的內存布局: + +起始地址 結束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 0000007fffffffff 512GB 用戶空間 +ffffff8000000000 ffffffffffffffff 512GB 內核空間 + + +AArch64 Linux 在頁大小爲 4KB,並使用 4 級轉換表時的內存布局: + +起始地址 結束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 0000ffffffffffff 256TB 用戶空間 +ffff000000000000 ffffffffffffffff 256TB 內核空間 + + +AArch64 Linux 在頁大小爲 64KB,並使用 2 級轉換表時的內存布局: + +起始地址 結束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 000003ffffffffff 4TB 用戶空間 +fffffc0000000000 ffffffffffffffff 4TB 內核空間 + + +AArch64 Linux 在頁大小爲 64KB,並使用 3 級轉換表時的內存布局: + +起始地址 結束地址 大小 用途 +----------------------------------------------------------------------- +0000000000000000 0000ffffffffffff 256TB 用戶空間 +ffff000000000000 ffffffffffffffff 256TB 內核空間 + + +更詳細的內核虛擬內存布局,請參閱內核啓動信息。 + + +4KB 頁大小的轉換表查找: + ++--------+--------+--------+--------+--------+--------+--------+--------+ +|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| ++--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | | + | | | | | v + | | | | | [11:0] 頁內偏移 + | | | | +-> [20:12] L3 索引 + | | | +-----------> [29:21] L2 索引 + | | +---------------------> [38:30] L1 索引 + | +-------------------------------> [47:39] L0 索引 + +-------------------------------------------------> [63] TTBR0/1 + + +64KB 頁大小的轉換表查找: + ++--------+--------+--------+--------+--------+--------+--------+--------+ +|63 56|55 48|47 40|39 32|31 24|23 16|15 8|7 0| ++--------+--------+--------+--------+--------+--------+--------+--------+ + | | | | | + | | | | v + | | | | [15:0] 頁內偏移 + | | | +----------> [28:16] L3 索引 + | | +--------------------------> [41:29] L2 索引 + | +-------------------------------> [47:42] L1 索引 + +-------------------------------------------------> [63] TTBR0/1 + + +當使用 KVM 時, 管理程序(hypervisor)在 EL2 中通過相對內核虛擬地址的 +一個固定偏移來映射內核頁(內核虛擬地址的高 24 位設爲零): + +起始地址 結束地址 大小 用途 +----------------------------------------------------------------------- +0000004000000000 0000007fffffffff 256GB 在 HYP 中映射的內核對象 + diff --git a/Documentation/translations/zh_TW/arm64/perf.rst b/Documentation/translations/zh_TW/arm64/perf.rst new file mode 100644 index 000000000000..f1ffd55dfe50 --- /dev/null +++ b/Documentation/translations/zh_TW/arm64/perf.rst @@ -0,0 +1,88 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/arm64/perf.rst <perf_index>` + +Translator: Bailu Lin <bailu.lin@vivo.com> + Hu Haowen <src.res@email.cn> + +============= +Perf 事件屬性 +============= + +:作者: Andrew Murray <andrew.murray@arm.com> +:日期: 2019-03-06 + +exclude_user +------------ + +該屬性排除用戶空間。 + +用戶空間始終運行在 EL0,因此該屬性將排除 EL0。 + + +exclude_kernel +-------------- + +該屬性排除內核空間。 + +打開 VHE 時內核運行在 EL2,不打開 VHE 時內核運行在 EL1。客戶機 +內核總是運行在 EL1。 + +對於宿主機,該屬性排除 EL1 和 VHE 上的 EL2。 + +對於客戶機,該屬性排除 EL1。請注意客戶機從來不會運行在 EL2。 + + +exclude_hv +---------- + +該屬性排除虛擬機監控器。 + +對於 VHE 宿主機該屬性將被忽略,此時我們認爲宿主機內核是虛擬機監 +控器。 + +對於 non-VHE 宿主機該屬性將排除 EL2,因爲虛擬機監控器運行在 EL2 +的任何代碼主要用於客戶機和宿主機的切換。 + +對於客戶機該屬性無效。請注意客戶機從來不會運行在 EL2。 + + +exclude_host / exclude_guest +---------------------------- + +這些屬性分別排除了 KVM 宿主機和客戶機。 + +KVM 宿主機可能運行在 EL0(用戶空間),EL1(non-VHE 內核)和 +EL2(VHE 內核 或 non-VHE 虛擬機監控器)。 + +KVM 客戶機可能運行在 EL0(用戶空間)和 EL1(內核)。 + +由於宿主機和客戶機之間重疊的異常級別,我們不能僅僅依靠 PMU 的硬體異 +常過濾機制-因此我們必須啓用/禁用對於客戶機進入和退出的計數。而這在 +VHE 和 non-VHE 系統上表現不同。 + +對於 non-VHE 系統的 exclude_host 屬性排除 EL2 - 在進入和退出客戶 +機時,我們會根據 exclude_host 和 exclude_guest 屬性在適當的情況下 +禁用/啓用該事件。 + +對於 VHE 系統的 exclude_guest 屬性排除 EL1,而對其中的 exclude_host +屬性同時排除 EL0,EL2。在進入和退出客戶機時,我們會適當地根據 +exclude_host 和 exclude_guest 屬性包括/排除 EL0。 + +以上聲明也適用於在 not-VHE 客戶機使用這些屬性時,但是請注意客戶機從 +來不會運行在 EL2。 + + +準確性 +------ + +在 non-VHE 宿主機上,我們在 EL2 進入/退出宿主機/客戶機的切換時啓用/ +關閉計數器 -但是在啓用/禁用計數器和進入/退出客戶機之間存在一段延時。 +對於 exclude_host, 我們可以通過過濾 EL2 消除在客戶機進入/退出邊界 +上用於計數客戶機事件的宿主機事件計數器。但是當使用 !exclude_hv 時, +在客戶機進入/退出有一個小的停電窗口無法捕獲到宿主機的事件。 + +在 VHE 系統沒有停電窗口。 + diff --git a/Documentation/translations/zh_TW/arm64/silicon-errata.txt b/Documentation/translations/zh_TW/arm64/silicon-errata.txt new file mode 100644 index 000000000000..bf2077197504 --- /dev/null +++ b/Documentation/translations/zh_TW/arm64/silicon-errata.txt @@ -0,0 +1,79 @@ +SPDX-License-Identifier: GPL-2.0 + +Chinese translated version of Documentation/arm64/silicon-errata.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +M: Will Deacon <will.deacon@arm.com> +zh_CN: Fu Wei <wefu@redhat.com> +zh_TW: Hu Haowen <src.res@email.cn> +C: 1926e54f115725a9248d0c4c65c22acaf94de4c4 +--------------------------------------------------------------------- +Documentation/arm64/silicon-errata.rst 的中文翻譯 + +如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 +交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 +譯存在問題,請聯繫中文版維護者。 + +英文版維護者: Will Deacon <will.deacon@arm.com> +中文版維護者: 傅煒 Fu Wei <wefu@redhat.com> +中文版翻譯者: 傅煒 Fu Wei <wefu@redhat.com> +中文版校譯者: 傅煒 Fu Wei <wefu@redhat.com> +繁體中文版校譯者: 胡皓文 Hu Haowen <src.res@email.cn> +本文翻譯提交時的 Git 檢出點爲: 1926e54f115725a9248d0c4c65c22acaf94de4c4 + +以下爲正文 +--------------------------------------------------------------------- + 晶片勘誤和軟體補救措施 + ================== + +作者: Will Deacon <will.deacon@arm.com> +日期: 2015年11月27日 + +一個不幸的現實:硬體經常帶有一些所謂的「瑕疵(errata)」,導致其在 +某些特定情況下會違背構架定義的行爲。就基於 ARM 的硬體而言,這些瑕疵 +大體可分爲以下幾類: + + A 類:無可行補救措施的嚴重缺陷。 + B 類:有可接受的補救措施的重大或嚴重缺陷。 + C 類:在正常操作中不會顯現的小瑕疵。 + +更多資訊,請在 infocenter.arm.com (需註冊)中查閱「軟體開發者勘誤 +筆記」(「Software Developers Errata Notice」)文檔。 + +對於 Linux 而言,B 類缺陷可能需要作業系統的某些特別處理。例如,避免 +一個特殊的代碼序列,或是以一種特定的方式配置處理器。在某種不太常見的 +情況下,爲將 A 類缺陷當作 C 類處理,可能需要用類似的手段。這些手段被 +統稱爲「軟體補救措施」,且僅在少數情況需要(例如,那些需要一個運行在 +非安全異常級的補救措施 *並且* 能被 Linux 觸發的情況)。 + +對於尚在討論中的可能對未受瑕疵影響的系統產生干擾的軟體補救措施,有一個 +相應的內核配置(Kconfig)選項被加在 「內核特性(Kernel Features)」-> +「基於可選方法框架的 ARM 瑕疵補救措施(ARM errata workarounds via +the alternatives framework)"。這些選項被默認開啓,若探測到受影響的CPU, +補丁將在運行時被使用。至於對系統運行影響較小的補救措施,內核配置選項 +並不存在,且代碼以某種規避瑕疵的方式被構造(帶注釋爲宜)。 + +這種做法對於在任意內核原始碼樹中準確地判斷出哪個瑕疵已被軟體方法所補救 +稍微有點麻煩,所以在 Linux 內核中此文件作爲軟體補救措施的註冊表, +並將在新的軟體補救措施被提交和向後移植(backported)到穩定內核時被更新。 + +| 實現者 | 受影響的組件 | 勘誤編號 | 內核配置 | ++----------------+-----------------+-----------------+-------------------------+ +| ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | +| ARM | Cortex-A53 | #827319 | ARM64_ERRATUM_827319 | +| ARM | Cortex-A53 | #824069 | ARM64_ERRATUM_824069 | +| ARM | Cortex-A53 | #819472 | ARM64_ERRATUM_819472 | +| ARM | Cortex-A53 | #845719 | ARM64_ERRATUM_845719 | +| ARM | Cortex-A53 | #843419 | ARM64_ERRATUM_843419 | +| ARM | Cortex-A57 | #832075 | ARM64_ERRATUM_832075 | +| ARM | Cortex-A57 | #852523 | N/A | +| ARM | Cortex-A57 | #834220 | ARM64_ERRATUM_834220 | +| | | | | +| Cavium | ThunderX ITS | #22375, #24313 | CAVIUM_ERRATUM_22375 | +| Cavium | ThunderX GICv3 | #23154 | CAVIUM_ERRATUM_23154 | + diff --git a/Documentation/translations/zh_TW/arm64/tagged-pointers.txt b/Documentation/translations/zh_TW/arm64/tagged-pointers.txt new file mode 100644 index 000000000000..87f88628401a --- /dev/null +++ b/Documentation/translations/zh_TW/arm64/tagged-pointers.txt @@ -0,0 +1,57 @@ +SPDX-License-Identifier: GPL-2.0 + +Chinese translated version of Documentation/arm64/tagged-pointers.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +Maintainer: Will Deacon <will.deacon@arm.com> +Chinese maintainer: Fu Wei <wefu@redhat.com> +Traditional Chinese maintainer: Hu Haowen <src.res@email.cn> +--------------------------------------------------------------------- +Documentation/arm64/tagged-pointers.rst 的中文翻譯 + +如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 +交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 +譯存在問題,請聯繫中文版維護者。 + +英文版維護者: Will Deacon <will.deacon@arm.com> +中文版維護者: 傅煒 Fu Wei <wefu@redhat.com> +中文版翻譯者: 傅煒 Fu Wei <wefu@redhat.com> +中文版校譯者: 傅煒 Fu Wei <wefu@redhat.com> +繁體中文版校譯者: 胡皓文 Hu Haowen <src.res@email.cn> + +以下爲正文 +--------------------------------------------------------------------- + Linux 在 AArch64 中帶標記的虛擬地址 + ================================= + +作者: Will Deacon <will.deacon@arm.com> +日期: 2013 年 06 月 12 日 + +本文檔簡述了在 AArch64 地址轉換系統中提供的帶標記的虛擬地址及其在 +AArch64 Linux 中的潛在用途。 + +內核提供的地址轉換表配置使通過 TTBR0 完成的虛擬地址轉換(即用戶空間 +映射),其虛擬地址的最高 8 位(63:56)會被轉換硬體所忽略。這種機制 +讓這些位可供應用程式自由使用,其注意事項如下: + + (1) 內核要求所有傳遞到 EL1 的用戶空間地址帶有 0x00 標記。 + 這意味著任何攜帶用戶空間虛擬地址的系統調用(syscall) + 參數 *必須* 在陷入內核前使它們的最高字節被清零。 + + (2) 非零標記在傳遞信號時不被保存。這意味著在應用程式中利用了 + 標記的信號處理函數無法依賴 siginfo_t 的用戶空間虛擬 + 地址所攜帶的包含其內部域信息的標記。此規則的一個例外是 + 當信號是在調試觀察點的異常處理程序中產生的,此時標記的 + 信息將被保存。 + + (3) 當使用帶標記的指針時需特別留心,因爲僅對兩個虛擬地址 + 的高字節,C 編譯器很可能無法判斷它們是不同的。 + +此構架會阻止對帶標記的 PC 指針的利用,因此在異常返回時,其高字節 +將被設置成一個爲 「55」 的擴展符。 + diff --git a/Documentation/translations/zh_TW/cpu-freq/core.rst b/Documentation/translations/zh_TW/cpu-freq/core.rst new file mode 100644 index 000000000000..3d890c2f2a61 --- /dev/null +++ b/Documentation/translations/zh_TW/cpu-freq/core.rst @@ -0,0 +1,108 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :doc:`../../../cpu-freq/core` +:Translator: Yanteng Si <siyanteng@loongson.cn> + Hu Haowen <src.res@email.cn> + +.. _tw_core.rst: + + +==================================== +CPUFreq核心和CPUFreq通知器的通用說明 +==================================== + +作者: + - Dominik Brodowski <linux@brodo.de> + - David Kimdon <dwhedon@debian.org> + - Rafael J. Wysocki <rafael.j.wysocki@intel.com> + - Viresh Kumar <viresh.kumar@linaro.org> + +.. 目錄: + + 1. CPUFreq核心和接口 + 2. CPUFreq通知器 + 3. 含有Operating Performance Point (OPP)的CPUFreq表的生成 + +1. CPUFreq核心和接口 +====================== + +cpufreq核心代碼位於drivers/cpufreq/cpufreq.c中。這些cpufreq代碼爲CPUFreq架構的驅 +動程序(那些操作硬體切換頻率的代碼)以及 "通知器 "提供了一個標準化的接口。 +這些是設備驅動程序或需要了解策略變化的其它內核部分(如 ACPI 熱量管理)或所有頻率更改(除 +計時代碼外),甚至需要強制確定速度限制的通知器(如 ARM 架構上的 LCD 驅動程序)。 +此外, 內核 "常數" loops_per_jiffy會根據頻率變化而更新。 + +cpufreq策略的引用計數由 cpufreq_cpu_get 和 cpufreq_cpu_put 來完成,以確保 cpufreq 驅 +動程序被正確地註冊到核心中,並且驅動程序在 cpufreq_put_cpu 被調用之前不會被卸載。這也保證 +了每個CPU核的cpufreq 策略在使用期間不會被釋放。 + +2. CPUFreq 通知器 +==================== + +CPUFreq通知器符合標準的內核通知器接口。 +關於通知器的細節請參閱 linux/include/linux/notifier.h。 + +這裡有兩個不同的CPUfreq通知器 - 策略通知器和轉換通知器。 + + +2.1 CPUFreq策略通知器 +---------------------------- + +當創建或移除策略時,這些都會被通知。 + +階段是在通知器的第二個參數中指定的。當第一次創建策略時,階段是CPUFREQ_CREATE_POLICY,當 +策略被移除時,階段是CPUFREQ_REMOVE_POLICY。 + +第三個參數 ``void *pointer`` 指向一個結構體cpufreq_policy,其包括min,max(新策略的下限和 +上限(單位爲kHz))這幾個值。 + + +2.2 CPUFreq轉換通知器 +-------------------------------- + +當CPUfreq驅動切換CPU核心頻率時,策略中的每個在線CPU都會收到兩次通知,這些變化沒有任何外部干 +預。 + +第二個參數指定階段 - CPUFREQ_PRECHANGE or CPUFREQ_POSTCHANGE. + +第三個參數是一個包含如下值的結構體cpufreq_freqs: + +===== ==================== +cpu 受影響cpu的編號 +old 舊頻率 +new 新頻率 +flags cpufreq驅動的標誌 +===== ==================== + +3. 含有Operating Performance Point (OPP)的CPUFreq表的生成 +================================================================== +關於OPP的細節請參閱 Documentation/power/opp.rst + +dev_pm_opp_init_cpufreq_table - + 這個功能提供了一個隨時可用的轉換程序,用來將OPP層關於可用頻率的內部信息翻譯成一種容易提供給 + cpufreq的格式。 + + .. Warning:: + + 不要在中斷上下文中使用此函數。 + + 例如:: + + soc_pm_init() + { + /* Do things */ + r = dev_pm_opp_init_cpufreq_table(dev, &freq_table); + if (!r) + policy->freq_table = freq_table; + /* Do other things */ + } + + .. note:: + + 該函數只有在CONFIG_PM_OPP之外還啓用了CONFIG_CPU_FREQ時才可用。 + +dev_pm_opp_free_cpufreq_table + 釋放dev_pm_opp_init_cpufreq_table分配的表。 + diff --git a/Documentation/translations/zh_TW/cpu-freq/cpu-drivers.rst b/Documentation/translations/zh_TW/cpu-freq/cpu-drivers.rst new file mode 100644 index 000000000000..2bb8197cd320 --- /dev/null +++ b/Documentation/translations/zh_TW/cpu-freq/cpu-drivers.rst @@ -0,0 +1,256 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :doc:`../../../cpu-freq/cpu-drivers` +:Translator: Yanteng Si <siyanteng@loongson.cn> + Hu Haowen <src.res@email.cn> + +.. _tw_cpu-drivers.rst: + + +======================================= +如何實現一個新的CPUFreq處理器驅動程序? +======================================= + +作者: + + + - Dominik Brodowski <linux@brodo.de> + - Rafael J. Wysocki <rafael.j.wysocki@intel.com> + - Viresh Kumar <viresh.kumar@linaro.org> + +.. Contents + + 1. 怎麼做? + 1.1 初始化 + 1.2 Per-CPU 初始化 + 1.3 驗證 + 1.4 target/target_index 或 setpolicy? + 1.5 target/target_index + 1.6 setpolicy + 1.7 get_intermediate 與 target_intermediate + 2. 頻率表助手 + + + +1. 怎麼做? +=========== + +如此,你剛剛得到了一個全新的CPU/晶片組及其數據手冊,並希望爲這個CPU/晶片組添加cpufreq +支持?很好,這裡有一些至關重要的提示: + + +1.1 初始化 +---------- + +首先,在__initcall_level_7 (module_init())或更靠後的函數中檢查這個內核是否 +運行在正確的CPU和正確的晶片組上。如果是,則使用cpufreq_register_driver()向 +CPUfreq核心層註冊一個cpufreq_driver結構體。 + +結構體cpufreq_driver應該包含什麼成員? + + .name - 驅動的名字。 + + .init - 一個指向per-policy初始化函數的指針。 + + .verify - 一個指向"verification"函數的指針。 + + .setpolicy 或 .fast_switch 或 .target 或 .target_index - 差異見 + 下文。 + +並且可選擇 + + .flags - cpufreq核的提示。 + + .driver_data - cpufreq驅動程序的特定數據。 + + .get_intermediate 和 target_intermediate - 用於在改變CPU頻率時切換到穩定 + 的頻率。 + + .get - 返回CPU的當前頻率。 + + .bios_limit - 返回HW/BIOS對CPU的最大頻率限制值。 + + .exit - 一個指向per-policy清理函數的指針,該函數在cpu熱插拔過程的CPU_POST_DEAD + 階段被調用。 + + .suspend - 一個指向per-policy暫停函數的指針,該函數在關中斷且在該策略的調節器停止 + 後被調用。 + + .resume - 一個指向per-policy恢復函數的指針,該函數在關中斷且在調節器再一次開始前被 + 調用。 + + .ready - 一個指向per-policy準備函數的指針,該函數在策略完全初始化之後被調用。 + + .attr - 一個指向NULL結尾的"struct freq_attr"列表的指針,該函數允許導出值到 + sysfs。 + + .boost_enabled - 如果設置,則啓用提升(boost)頻率。 + + .set_boost - 一個指向per-policy函數的指針,該函數用來開啓/關閉提升(boost)頻率功能。 + + +1.2 Per-CPU 初始化 +------------------ + +每當一個新的CPU被註冊到設備模型中,或者在cpufreq驅動註冊自己之後,如果此CPU的cpufreq策 +略不存在,則會調用per-policy的初始化函數cpufreq_driver.init。請注意,.init()和.exit()程序 +只對策略調用一次,而不是對策略管理的每個CPU調用一次。它需要一個 ``struct cpufreq_policy +*policy`` 作爲參數。現在該怎麼做呢? + +如果有必要,請在你的CPU上激活CPUfreq功能支持。 + +然後,驅動程序必須填寫以下數值: + ++-----------------------------------+--------------------------------------+ +|policy->cpuinfo.min_freq 和 | | +|policy->cpuinfo.max_freq | 該CPU支持的最低和最高頻率(kHz) | +| | | +| | | ++-----------------------------------+--------------------------------------+ +|policy->cpuinfo.transition_latency | | +| | CPU在兩個頻率之間切換所需的時間,以 | +| | 納秒爲單位(如適用,否則指定 | +| | CPUFREQ_ETERNAL) | ++-----------------------------------+--------------------------------------+ +|policy->cur | 該CPU當前的工作頻率(如適用) | +| | | ++-----------------------------------+--------------------------------------+ +|policy->min, | | +|policy->max, | | +|policy->policy and, if necessary, | | +|policy->governor | 必須包含該cpu的 「默認策略」。稍後 | +| | 會用這些值調用 | +| | cpufreq_driver.verify and either | +| | cpufreq_driver.setpolicy or | +| | cpufreq_driver.target/target_index | +| | | ++-----------------------------------+--------------------------------------+ +|policy->cpus | 用與這個CPU一起做DVFS的(在線+離線) | +| | CPU(即與它共享時鐘/電壓軌)的掩碼更新 | +| | 這個 | +| | | ++-----------------------------------+--------------------------------------+ + +對於設置其中的一些值(cpuinfo.min[max]_freq, policy->min[max]),頻率表助手可能會有幫 +助。關於它們的更多信息,請參見第2節。 + + +1.3 驗證 +-------- + +當用戶決定設置一個新的策略(由 「policy,governor,min,max組成」)時,必須對這個策略進行驗證, +以便糾正不兼容的值。爲了驗證這些值,cpufreq_verify_within_limits(``struct cpufreq_policy +*policy``, ``unsigned int min_freq``, ``unsigned int max_freq``)函數可能會有幫助。 +關於頻率表助手的詳細內容請參見第2節。 + +您需要確保至少有一個有效頻率(或工作範圍)在 policy->min 和 policy->max 範圍內。如果有必 +要,先增加policy->max,只有在沒有辦法的情況下,才減少policy->min。 + + +1.4 target 或 target_index 或 setpolicy 或 fast_switch? +------------------------------------------------------- + +大多數cpufreq驅動甚至大多數cpu頻率升降算法只允許將CPU頻率設置爲預定義的固定值。對於這些,你 +可以使用->target(),->target_index()或->fast_switch()回調。 + +有些cpufreq功能的處理器可以自己在某些限制之間切換頻率。這些應使用->setpolicy()回調。 + + +1.5. target/target_index +------------------------ + +target_index調用有兩個參數:``struct cpufreq_policy * policy``和``unsigned int`` +索引(於列出的頻率表)。 + +當調用這裡時,CPUfreq驅動必須設置新的頻率。實際頻率必須由freq_table[index].frequency決定。 + +它應該總是在錯誤的情況下恢復到之前的頻率(即policy->restore_freq),即使我們之前切換到中間頻率。 + +已棄用 +---------- +目標調用有三個參數。``struct cpufreq_policy * policy``, unsigned int target_frequency, +unsigned int relation. + +CPUfreq驅動在調用這裡時必須設置新的頻率。實際的頻率必須使用以下規則來確定。 + +- 緊跟 "目標頻率"。 +- policy->min <= new_freq <= policy->max (這必須是有效的!!!) +- 如果 relation==CPUFREQ_REL_L,嘗試選擇一個高於或等於 target_freq 的 new_freq。("L代表 + 最低,但不能低於") +- 如果 relation==CPUFREQ_REL_H,嘗試選擇一個低於或等於 target_freq 的 new_freq。("H代表 + 最高,但不能高於") + +這裡,頻率表助手可能會幫助你--詳見第2節。 + +1.6. fast_switch +---------------- + +這個函數用於從調度器的上下文進行頻率切換。並非所有的驅動都要實現它,因爲不允許在這個回調中睡眠。這 +個回調必須經過高度優化,以儘可能快地進行切換。 + +這個函數有兩個參數: ``struct cpufreq_policy *policy`` 和 ``unsigned int target_frequency``。 + + +1.7 setpolicy +------------- + +setpolicy調用只需要一個``struct cpufreq_policy * policy``作爲參數。需要將處理器內或晶片組內動態頻 +率切換的下限設置爲policy->min,上限設置爲policy->max,如果支持的話,當policy->policy爲 +CPUFREQ_POLICY_PERFORMANCE時選擇面向性能的設置,當CPUFREQ_POLICY_POWERSAVE時選擇面向省電的設置。 +也可以查看drivers/cpufreq/longrun.c中的參考實現。 + +1.8 get_intermediate 和 target_intermediate +-------------------------------------------- + +僅適用於 target_index() 和 CPUFREQ_ASYNC_NOTIFICATION 未設置的驅動。 + +get_intermediate應該返回一個平台想要切換到的穩定的中間頻率,target_intermediate()應該將CPU設置爲 +該頻率,然後再跳轉到'index'對應的頻率。核心會負責發送通知,驅動不必在target_intermediate()或 +target_index()中處理。 + +在驅動程序不想因爲某個目標頻率切換到中間頻率的情況下,它們可以從get_intermediate()中返回'0'。在這種情況 +下,核心將直接調用->target_index()。 + +注意:->target_index()應該在失敗的情況下恢復到policy->restore_freq,因爲core會爲此發送通知。 + + +2. 頻率表助手 +============= + +由於大多數cpufreq處理器只允許被設置爲幾個特定的頻率,因此,一個帶有一些函數的 「頻率表」可能會輔助處理器驅動 +程序的一些工作。這樣的 "頻率表" 由一個cpufreq_frequency_table條目構成的數組組成,"driver_data" 中包 +含了驅動程序的具體數值,"frequency" 中包含了相應的頻率,並設置了標誌。在表的最後,需要添加一個 +cpufreq_frequency_table條目,頻率設置爲CPUFREQ_TABLE_END。而如果想跳過表中的一個條目,則將頻率設置爲 +CPUFREQ_ENTRY_INVALID。這些條目不需要按照任何特定的順序排序,但如果它們是cpufreq 核心會對它們進行快速的DVFS, +因爲搜索最佳匹配會更快。 + +如果策略在其policy->freq_table欄位中包含一個有效的指針,cpufreq表就會被核心自動驗證。 + +cpufreq_frequency_table_verify()保證至少有一個有效的頻率在policy->min和policy->max範圍內,並且所有其他 +標準都被滿足。這對->verify調用很有幫助。 + +cpufreq_frequency_table_target()是對應於->target階段的頻率表助手。只要把數值傳遞給這個函數,這個函數就會返 +回包含CPU要設置的頻率的頻率表條目。 + +以下宏可以作爲cpufreq_frequency_table的疊代器。 + +cpufreq_for_each_entry(pos, table) - 遍歷頻率表的所有條目。 + +cpufreq_for_each_valid_entry(pos, table) - 該函數遍歷所有條目,不包括CPUFREQ_ENTRY_INVALID頻率。 +使用參數 "pos"-一個``cpufreq_frequency_table * `` 作爲循環變量,使用參數 "table"-作爲你想疊代 +的``cpufreq_frequency_table * `` 。 + +例如:: + + struct cpufreq_frequency_table *pos, *driver_freq_table; + + cpufreq_for_each_entry(pos, driver_freq_table) { + /* Do something with pos */ + pos->frequency = ... + } + +如果你需要在driver_freq_table中處理pos的位置,不要減去指針,因爲它的代價相當高。相反,使用宏 +cpufreq_for_each_entry_idx() 和 cpufreq_for_each_valid_entry_idx() 。 + diff --git a/Documentation/translations/zh_TW/cpu-freq/cpufreq-stats.rst b/Documentation/translations/zh_TW/cpu-freq/cpufreq-stats.rst new file mode 100644 index 000000000000..d80bfed50e8c --- /dev/null +++ b/Documentation/translations/zh_TW/cpu-freq/cpufreq-stats.rst @@ -0,0 +1,132 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :doc:`../../../cpu-freq/cpufreq-stats` +:Translator: Yanteng Si <siyanteng@loongson.cn> + Hu Haowen <src.res@email.cn> + +.. _tw_cpufreq-stats.rst: + + +========================================== +sysfs CPUFreq Stats的一般說明 +========================================== + +用戶信息 + + +作者: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> + +.. Contents + + 1. 簡介 + 2. 提供的統計數據(舉例說明) + 3. 配置cpufreq-stats + + +1. 簡介 +=============== + +cpufreq-stats是一個爲每個CPU提供CPU頻率統計的驅動。 +這些統計數據在/sysfs中以一堆只讀接口的形式提供。這個接口(在配置好後)將出現在 +/sysfs(<sysfs root>/devices/system/cpu/cpuX/cpufreq/stats/)中cpufreq下的一個單 +獨的目錄中,提供給每個CPU。 +各種統計數據將在此目錄下形成只讀文件。 + +此驅動是獨立於任何可能運行在你所用CPU上的特定cpufreq_driver而設計的。因此,它將與所有 +cpufreq_driver一起工作。 + + +2. 提供的統計數據(舉例說明) +===================================== + +cpufreq stats提供了以下統計數據(在下面詳細解釋)。 + +- time_in_state +- total_trans +- trans_table + +所有的統計數據將從統計驅動被載入的時間(或統計被重置的時間)開始,到某一統計數據被讀取的時間爲止。 +顯然,統計驅動不會有任何關於統計驅動載入之前的頻率轉換信息。 + +:: + + <mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # ls -l + total 0 + drwxr-xr-x 2 root root 0 May 14 16:06 . + drwxr-xr-x 3 root root 0 May 14 15:58 .. + --w------- 1 root root 4096 May 14 16:06 reset + -r--r--r-- 1 root root 4096 May 14 16:06 time_in_state + -r--r--r-- 1 root root 4096 May 14 16:06 total_trans + -r--r--r-- 1 root root 4096 May 14 16:06 trans_table + +- **reset** + +只寫屬性,可用於重置統計計數器。這對於評估不同調節器下的系統行爲非常有用,且無需重啓。 + + +- **time_in_state** + +此項給出了這個CPU所支持的每個頻率所花費的時間。cat輸出的每一行都會有"<frequency> +<time>"對,表示這個CPU在<frequency>上花費了<time>個usertime單位的時間。這裡的 +usertime單位是10mS(類似於/proc中輸出的其他時間)。 + +:: + + <mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # cat time_in_state + 3600000 2089 + 3400000 136 + 3200000 34 + 3000000 67 + 2800000 172488 + + +- **total_trans** + +給出了這個CPU上頻率轉換的總次數。cat的輸出將有一個單一的計數,這就是頻率轉換的總數。 + +:: + + <mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # cat total_trans + 20 + +- **trans_table** + +這將提供所有CPU頻率轉換的細粒度信息。這裡的cat輸出是一個二維矩陣,其中一個條目<i, j>(第 +i行,第j列)代表從Freq_i到Freq_j的轉換次數。Freq_i行和Freq_j列遵循驅動最初提供給cpufreq +核的頻率表的排序順序,因此可以排序(升序或降序)或不排序。 這裡的輸出也包含了每行每列的實際 +頻率值,以便更好地閱讀。 + +如果轉換表大於PAGE_SIZE,讀取時將返回一個-EFBIG錯誤。 + +:: + + <mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # cat trans_table + From : To + : 3600000 3400000 3200000 3000000 2800000 + 3600000: 0 5 0 0 0 + 3400000: 4 0 2 0 0 + 3200000: 0 1 0 2 0 + 3000000: 0 0 1 0 3 + 2800000: 0 0 0 2 0 + +3. 配置cpufreq-stats +============================ + +要在你的內核中配置cpufreq-stats:: + + Config Main Menu + Power management options (ACPI, APM) ---> + CPU Frequency scaling ---> + [*] CPU Frequency scaling + [*] CPU frequency translation statistics + + +"CPU Frequency scaling" (CONFIG_CPU_FREQ) 應該被啓用以配置cpufreq-stats。 + +"CPU frequency translation statistics" (CONFIG_CPU_FREQ_STAT)提供了包括 +time_in_state、total_trans和trans_table的統計數據。 + +一旦啓用了這個選項,並且你的CPU支持cpufrequency,你就可以在/sysfs中看到CPU頻率統計。 + diff --git a/Documentation/translations/zh_TW/cpu-freq/index.rst b/Documentation/translations/zh_TW/cpu-freq/index.rst new file mode 100644 index 000000000000..1a8e680f95ed --- /dev/null +++ b/Documentation/translations/zh_TW/cpu-freq/index.rst @@ -0,0 +1,47 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :doc:`../../../cpu-freq/index` +:Translator: Yanteng Si <siyanteng@loongson.cn> + Hu Haowen <src.res@email.cn> + +.. _tw_index.rst: + + +======================================================= +Linux CPUFreq - Linux(TM)內核中的CPU頻率和電壓升降代碼 +======================================================= + +Author: Dominik Brodowski <linux@brodo.de> + + 時鐘升降允許你在運行中改變CPU的時鐘速度。這是一個很好的節省電池電量的方法,因爲時 + 鐘速度越低,CPU消耗的電量越少。 + + +.. toctree:: + :maxdepth: 1 + + core + cpu-drivers + cpufreq-stats + +郵件列表 +------------ +這裡有一個 CPU 頻率變化的 CVS 提交和通用列表,您可以在這裡報告bug、問題或提交補丁。要發 +布消息,請發送電子郵件到 linux-pm@vger.kernel.org。 + +連結 +----- +FTP檔案: +* ftp://ftp.linux.org.uk/pub/linux/cpufreq/ + +如何訪問CVS倉庫: +* http://cvs.arm.linux.org.uk/ + +CPUFreq郵件列表: +* http://vger.kernel.org/vger-lists.html#linux-pm + +SA-1100的時鐘和電壓標度: +* http://www.lartmaker.nl/projects/scaling + diff --git a/Documentation/translations/zh_TW/filesystems/debugfs.rst b/Documentation/translations/zh_TW/filesystems/debugfs.rst new file mode 100644 index 000000000000..270dd94fddf1 --- /dev/null +++ b/Documentation/translations/zh_TW/filesystems/debugfs.rst @@ -0,0 +1,224 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :doc:`../../../filesystems/debugfs` + +======= +Debugfs +======= + +譯者 +:: + + 中文版維護者:羅楚成 Chucheng Luo <luochucheng@vivo.com> + 中文版翻譯者:羅楚成 Chucheng Luo <luochucheng@vivo.com> + 中文版校譯者: 羅楚成 Chucheng Luo <luochucheng@vivo.com> + 繁體中文版校譯者: 胡皓文 Hu Haowen <src.res@email.cn> + + + +版權所有2020 羅楚成 <luochucheng@vivo.com> +版權所有2021 胡皓文 Hu Haowen <src.res@email.cn> + + +Debugfs是內核開發人員在用戶空間獲取信息的簡單方法。與/proc不同,proc只提供進程 +信息。也不像sysfs,具有嚴格的「每個文件一個值「的規則。debugfs根本沒有規則,開發 +人員可以在這裡放置他們想要的任何信息。debugfs文件系統也不能用作穩定的ABI接口。 +從理論上講,debugfs導出文件的時候沒有任何約束。但是[1]實際情況並不總是那麼 +簡單。即使是debugfs接口,也最好根據需要進行設計,並儘量保持接口不變。 + + +Debugfs通常使用以下命令安裝:: + + mount -t debugfs none /sys/kernel/debug + +(或等效的/etc/fstab行)。 +debugfs根目錄默認僅可由root用戶訪問。要更改對文件樹的訪問,請使用「 uid」,「 gid」 +和「 mode」掛載選項。請注意,debugfs API僅按照GPL協議導出到模塊。 + +使用debugfs的代碼應包含<linux/debugfs.h>。然後,首先是創建至少一個目錄來保存 +一組debugfs文件:: + + struct dentry *debugfs_create_dir(const char *name, struct dentry *parent); + +如果成功,此調用將在指定的父目錄下創建一個名爲name的目錄。如果parent參數爲空, +則會在debugfs根目錄中創建。創建目錄成功時,返回值是一個指向dentry結構體的指針。 +該dentry結構體的指針可用於在目錄中創建文件(以及最後將其清理乾淨)。ERR_PTR +(-ERROR)返回值表明出錯。如果返回ERR_PTR(-ENODEV),則表明內核是在沒有debugfs +支持的情況下構建的,並且下述函數都不會起作用。 + +在debugfs目錄中創建文件的最通用方法是:: + + struct dentry *debugfs_create_file(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fops); + +在這裡,name是要創建的文件的名稱,mode描述了訪問文件應具有的權限,parent指向 +應該保存文件的目錄,data將存儲在產生的inode結構體的i_private欄位中,而fops是 +一組文件操作函數,這些函數中實現文件操作的具體行爲。至少,read()和/或 +write()操作應提供;其他可以根據需要包括在內。同樣的,返回值將是指向創建文件 +的dentry指針,錯誤時返回ERR_PTR(-ERROR),系統不支持debugfs時返回值爲ERR_PTR +(-ENODEV)。創建一個初始大小的文件,可以使用以下函數代替:: + + struct dentry *debugfs_create_file_size(const char *name, umode_t mode, + struct dentry *parent, void *data, + const struct file_operations *fops, + loff_t file_size); + +file_size是初始文件大小。其他參數跟函數debugfs_create_file的相同。 + +在許多情況下,沒必要自己去創建一組文件操作;對於一些簡單的情況,debugfs代碼提供 +了許多幫助函數。包含單個整數值的文件可以使用以下任何一項創建:: + + void debugfs_create_u8(const char *name, umode_t mode, + struct dentry *parent, u8 *value); + void debugfs_create_u16(const char *name, umode_t mode, + struct dentry *parent, u16 *value); + struct dentry *debugfs_create_u32(const char *name, umode_t mode, + struct dentry *parent, u32 *value); + void debugfs_create_u64(const char *name, umode_t mode, + struct dentry *parent, u64 *value); + +這些文件支持讀取和寫入給定值。如果某個文件不支持寫入,只需根據需要設置mode +參數位。這些文件中的值以十進位表示;如果需要使用十六進位,可以使用以下函數 +替代:: + + void debugfs_create_x8(const char *name, umode_t mode, + struct dentry *parent, u8 *value); + void debugfs_create_x16(const char *name, umode_t mode, + struct dentry *parent, u16 *value); + void debugfs_create_x32(const char *name, umode_t mode, + struct dentry *parent, u32 *value); + void debugfs_create_x64(const char *name, umode_t mode, + struct dentry *parent, u64 *value); + +這些功能只有在開發人員知道導出值的大小的時候才有用。某些數據類型在不同的架構上 +有不同的寬度,這樣會使情況變得有些複雜。在這種特殊情況下可以使用以下函數:: + + void debugfs_create_size_t(const char *name, umode_t mode, + struct dentry *parent, size_t *value); + +不出所料,此函數將創建一個debugfs文件來表示類型爲size_t的變量。 + +同樣地,也有導出無符號長整型變量的函數,分別以十進位和十六進位表示如下:: + + struct dentry *debugfs_create_ulong(const char *name, umode_t mode, + struct dentry *parent, + unsigned long *value); + void debugfs_create_xul(const char *name, umode_t mode, + struct dentry *parent, unsigned long *value); + +布爾值可以通過以下方式放置在debugfs中:: + + struct dentry *debugfs_create_bool(const char *name, umode_t mode, + struct dentry *parent, bool *value); + + +讀取結果文件將產生Y(對於非零值)或N,後跟換行符寫入的時候,它只接受大寫或小寫 +值或1或0。任何其他輸入將被忽略。 + +同樣,atomic_t類型的值也可以放置在debugfs中:: + + void debugfs_create_atomic_t(const char *name, umode_t mode, + struct dentry *parent, atomic_t *value) + +讀取此文件將獲得atomic_t值,寫入此文件將設置atomic_t值。 + +另一個選擇是通過以下結構體和函數導出一個任意二進位數據塊:: + + struct debugfs_blob_wrapper { + void *data; + unsigned long size; + }; + + struct dentry *debugfs_create_blob(const char *name, umode_t mode, + struct dentry *parent, + struct debugfs_blob_wrapper *blob); + +讀取此文件將返回由指針指向debugfs_blob_wrapper結構體的數據。一些驅動使用「blobs」 +作爲一種返回幾行(靜態)格式化文本的簡單方法。這個函數可用於導出二進位信息,但 +似乎在主線中沒有任何代碼這樣做。請注意,使用debugfs_create_blob()命令創建的 +所有文件是只讀的。 + +如果您要轉儲一個寄存器塊(在開發過程中經常會這麼做,但是這樣的調試代碼很少上傳 +到主線中。Debugfs提供兩個函數:一個用於創建僅寄存器文件,另一個把一個寄存器塊 +插入一個順序文件中:: + + struct debugfs_reg32 { + char *name; + unsigned long offset; + }; + + struct debugfs_regset32 { + struct debugfs_reg32 *regs; + int nregs; + void __iomem *base; + }; + + struct dentry *debugfs_create_regset32(const char *name, umode_t mode, + struct dentry *parent, + struct debugfs_regset32 *regset); + + void debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs, + int nregs, void __iomem *base, char *prefix); + +「base」參數可能爲0,但您可能需要使用__stringify構建reg32數組,實際上有許多寄存器 +名稱(宏)是寄存器塊在基址上的字節偏移量。 + +如果要在debugfs中轉儲u32數組,可以使用以下函數創建文件:: + + void debugfs_create_u32_array(const char *name, umode_t mode, + struct dentry *parent, + u32 *array, u32 elements); + +「array」參數提供數據,而「elements」參數爲數組中元素的數量。注意:數組創建後,數組 +大小無法更改。 + +有一個函數來創建與設備相關的seq_file:: + + struct dentry *debugfs_create_devm_seqfile(struct device *dev, + const char *name, + struct dentry *parent, + int (*read_fn)(struct seq_file *s, + void *data)); + +「dev」參數是與此debugfs文件相關的設備,並且「read_fn」是一個函數指針,這個函數在 +列印seq_file內容的時候被回調。 + +還有一些其他的面向目錄的函數:: + + struct dentry *debugfs_rename(struct dentry *old_dir, + struct dentry *old_dentry, + struct dentry *new_dir, + const char *new_name); + + struct dentry *debugfs_create_symlink(const char *name, + struct dentry *parent, + const char *target); + +調用debugfs_rename()將爲現有的debugfs文件重命名,可能同時切換目錄。 new_name +函數調用之前不能存在;返回值爲old_dentry,其中包含更新的信息。可以使用 +debugfs_create_symlink()創建符號連結。 + +所有debugfs用戶必須考慮的一件事是: + +debugfs不會自動清除在其中創建的任何目錄。如果一個模塊在不顯式刪除debugfs目錄的 +情況下卸載模塊,結果將會遺留很多野指針,從而導致系統不穩定。因此,所有debugfs +用戶-至少是那些可以作爲模塊構建的用戶-必須做模塊卸載的時候準備刪除在此創建的 +所有文件和目錄。一份文件可以通過以下方式刪除:: + + void debugfs_remove(struct dentry *dentry); + +dentry值可以爲NULL或錯誤值,在這種情況下,不會有任何文件被刪除。 + +很久以前,內核開發者使用debugfs時需要記錄他們創建的每個dentry指針,以便最後所有 +文件都可以被清理掉。但是,現在debugfs用戶能調用以下函數遞歸清除之前創建的文件:: + + void debugfs_remove_recursive(struct dentry *dentry); + +如果將對應頂層目錄的dentry傳遞給以上函數,則該目錄下的整個層次結構將會被刪除。 + +注釋: +[1] http://lwn.net/Articles/309298/ + diff --git a/Documentation/translations/zh_TW/filesystems/index.rst b/Documentation/translations/zh_TW/filesystems/index.rst new file mode 100644 index 000000000000..4e5dde0dca3c --- /dev/null +++ b/Documentation/translations/zh_TW/filesystems/index.rst @@ -0,0 +1,31 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/filesystems/index.rst <filesystems_index>` +:Translator: Wang Wenhu <wenhu.wang@vivo.com> + Hu Haowen <src.res@email.cn> + +.. _tw_filesystems_index: + +======================== +Linux Kernel中的文件系統 +======================== + +這份正在開發的手冊或許在未來某個輝煌的日子裡以易懂的形式將Linux虛擬\ +文件系統(VFS)層以及基於其上的各種文件系統如何工作呈現給大家。當前\ +可以看到下面的內容。 + +文件系統 +======== + +文件系統實現文檔。 + +.. toctree:: + :maxdepth: 2 + + virtiofs + debugfs + tmpfs + + diff --git a/Documentation/translations/zh_TW/filesystems/sysfs.txt b/Documentation/translations/zh_TW/filesystems/sysfs.txt new file mode 100644 index 000000000000..acd677f19d4f --- /dev/null +++ b/Documentation/translations/zh_TW/filesystems/sysfs.txt @@ -0,0 +1,377 @@ +SPDX-License-Identifier: GPL-2.0 + +Chinese translated version of Documentation/filesystems/sysfs.rst + +If you have any comment or update to the content, please contact the +original document maintainer directly. However, if you have a problem +communicating in English you can also ask the Chinese maintainer for +help. Contact the Chinese maintainer if this translation is outdated +or if there is a problem with the translation. + +Maintainer: Patrick Mochel <mochel@osdl.org> + Mike Murphy <mamurph@cs.clemson.edu> +Chinese maintainer: Fu Wei <tekkamanninja@gmail.com> +--------------------------------------------------------------------- +Documentation/filesystems/sysfs.rst 的中文翻譯 + +如果想評論或更新本文的內容,請直接聯繫原文檔的維護者。如果你使用英文 +交流有困難的話,也可以向中文版維護者求助。如果本翻譯更新不及時或者翻 +譯存在問題,請聯繫中文版維護者。 +英文版維護者: Patrick Mochel <mochel@osdl.org> + Mike Murphy <mamurph@cs.clemson.edu> +中文版維護者: 傅煒 Fu Wei <tekkamanninja@gmail.com> +中文版翻譯者: 傅煒 Fu Wei <tekkamanninja@gmail.com> +中文版校譯者: 傅煒 Fu Wei <tekkamanninja@gmail.com> +繁體中文版校譯者:胡皓文 Hu Haowen <src.res@email.cn> + + +以下爲正文 +--------------------------------------------------------------------- +sysfs - 用於導出內核對象(kobject)的文件系統 + +Patrick Mochel <mochel@osdl.org> +Mike Murphy <mamurph@cs.clemson.edu> + +修訂: 16 August 2011 +原始版本: 10 January 2003 + + +sysfs 簡介: +~~~~~~~~~~ + +sysfs 是一個最初基於 ramfs 且位於內存的文件系統。它提供導出內核 +數據結構及其屬性,以及它們之間的關聯到用戶空間的方法。 + +sysfs 始終與 kobject 的底層結構緊密相關。請閱讀 +Documentation/core-api/kobject.rst 文檔以獲得更多關於 kobject 接口的 +信息。 + + +使用 sysfs +~~~~~~~~~~~ + +只要內核配置中定義了 CONFIG_SYSFS ,sysfs 總是被編譯進內核。你可 +通過以下命令掛載它: + + mount -t sysfs sysfs /sys + + +創建目錄 +~~~~~~~~ + +任何 kobject 在系統中註冊,就會有一個目錄在 sysfs 中被創建。這個 +目錄是作爲該 kobject 的父對象所在目錄的子目錄創建的,以準確地傳遞 +內核的對象層次到用戶空間。sysfs 中的頂層目錄代表著內核對象層次的 +共同祖先;例如:某些對象屬於某個子系統。 + +Sysfs 在與其目錄關聯的 kernfs_node 對象中內部保存一個指向實現 +目錄的 kobject 的指針。以前,這個 kobject 指針被 sysfs 直接用於 +kobject 文件打開和關閉的引用計數。而現在的 sysfs 實現中,kobject +引用計數只能通過 sysfs_schedule_callback() 函數直接修改。 + + +屬性 +~~~~ + +kobject 的屬性可在文件系統中以普通文件的形式導出。Sysfs 爲屬性定義 +了面向文件 I/O 操作的方法,以提供對內核屬性的讀寫。 + + +屬性應爲 ASCII 碼文本文件。以一個文件只存儲一個屬性值爲宜。但一個 +文件只包含一個屬性值可能影響效率,所以一個包含相同數據類型的屬性值 +數組也被廣泛地接受。 + +混合類型、表達多行數據以及一些怪異的數據格式會遭到強烈反對。這樣做是 +很丟臉的,而且其代碼會在未通知作者的情況下被重寫。 + + +一個簡單的屬性結構定義如下: + +struct attribute { + char * name; + struct module *owner; + umode_t mode; +}; + + +int sysfs_create_file(struct kobject * kobj, const struct attribute * attr); +void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr); + + +一個單獨的屬性結構並不包含讀寫其屬性值的方法。子系統最好爲增刪特定 +對象類型的屬性定義自己的屬性結構體和封裝函數。 + +例如:驅動程序模型定義的 device_attribute 結構體如下: + +struct device_attribute { + struct attribute attr; + ssize_t (*show)(struct device *dev, struct device_attribute *attr, + char *buf); + ssize_t (*store)(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count); +}; + +int device_create_file(struct device *, const struct device_attribute *); +void device_remove_file(struct device *, const struct device_attribute *); + +爲了定義設備屬性,同時定義了一下輔助宏: + +#define DEVICE_ATTR(_name, _mode, _show, _store) \ +struct device_attribute dev_attr_##_name = __ATTR(_name, _mode, _show, _store) + +例如:聲明 + +static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo); + +等同於如下代碼: + +static struct device_attribute dev_attr_foo = { + .attr = { + .name = "foo", + .mode = S_IWUSR | S_IRUGO, + .show = show_foo, + .store = store_foo, + }, +}; + + +子系統特有的回調函數 +~~~~~~~~~~~~~~~~~~~ + +當一個子系統定義一個新的屬性類型時,必須實現一系列的 sysfs 操作, +以幫助讀寫調用實現屬性所有者的顯示和儲存方法。 + +struct sysfs_ops { + ssize_t (*show)(struct kobject *, struct attribute *, char *); + ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t); +}; + +[子系統應已經定義了一個 struct kobj_type 結構體作爲這個類型的 +描述符,並在此保存 sysfs_ops 的指針。更多的信息參見 kobject 的 +文檔] + +sysfs 會爲這個類型調用適當的方法。當一個文件被讀寫時,這個方法會 +將一般的kobject 和 attribute 結構體指針轉換爲適當的指針類型後 +調用相關聯的函數。 + + +示例: + +#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr) + +static ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr, + char *buf) +{ + struct device_attribute *dev_attr = to_dev_attr(attr); + struct device *dev = kobj_to_dev(kobj); + ssize_t ret = -EIO; + + if (dev_attr->show) + ret = dev_attr->show(dev, dev_attr, buf); + if (ret >= (ssize_t)PAGE_SIZE) { + printk("dev_attr_show: %pS returned bad count\n", + dev_attr->show); + } + return ret; +} + + + +讀寫屬性數據 +~~~~~~~~~~~~ + +在聲明屬性時,必須指定 show() 或 store() 方法,以實現屬性的 +讀或寫。這些方法的類型應該和以下的設備屬性定義一樣簡單。 + +ssize_t (*show)(struct device *dev, struct device_attribute *attr, char *buf); +ssize_t (*store)(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count); + +也就是說,他們應只以一個處理對象、一個屬性和一個緩衝指針作爲參數。 + +sysfs 會分配一個大小爲 (PAGE_SIZE) 的緩衝區並傳遞給這個方法。 +Sysfs 將會爲每次讀寫操作調用一次這個方法。這使得這些方法在執行時 +會出現以下的行爲: + +- 在讀方面(read(2)),show() 方法應該填充整個緩衝區。回想屬性 + 應只導出了一個屬性值或是一個同類型屬性值的數組,所以這個代價將 + 不會不太高。 + + 這使得用戶空間可以局部地讀和任意的向前搜索整個文件。如果用戶空間 + 向後搜索到零或使用『0』偏移執行一個pread(2)操作,show()方法將 + 再次被調用,以重新填充緩存。 + +- 在寫方面(write(2)),sysfs 希望在第一次寫操作時得到整個緩衝區。 + 之後 Sysfs 傳遞整個緩衝區給 store() 方法。 + + 當要寫 sysfs 文件時,用戶空間進程應首先讀取整個文件,修該想要 + 改變的值,然後回寫整個緩衝區。 + + 在讀寫屬性值時,屬性方法的執行應操作相同的緩衝區。 + +註記: + +- 寫操作導致的 show() 方法重載,會忽略當前文件位置。 + +- 緩衝區應總是 PAGE_SIZE 大小。對於i386,這個值爲4096。 + +- show() 方法應該返回寫入緩衝區的字節數,也就是 scnprintf()的 + 返回值。 + +- show() 方法在將格式化返回值返回用戶空間的時候,禁止使用snprintf()。 + 如果可以保證不會發生緩衝區溢出,可以使用sprintf(),否則必須使用 + scnprintf()。 + +- store() 應返回緩衝區的已用字節數。如果整個緩存都已填滿,只需返回 + count 參數。 + +- show() 或 store() 可以返回錯誤值。當得到一個非法值,必須返回一個 + 錯誤值。 + +- 一個傳遞給方法的對象將會通過 sysfs 調用對象內嵌的引用計數固定在 + 內存中。儘管如此,對象代表的物理實體(如設備)可能已不存在。如有必要, + 應該實現一個檢測機制。 + +一個簡單的(未經實驗證實的)設備屬性實現如下: + +static ssize_t show_name(struct device *dev, struct device_attribute *attr, + char *buf) +{ + return scnprintf(buf, PAGE_SIZE, "%s\n", dev->name); +} + +static ssize_t store_name(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + snprintf(dev->name, sizeof(dev->name), "%.*s", + (int)min(count, sizeof(dev->name) - 1), buf); + return count; +} + +static DEVICE_ATTR(name, S_IRUGO, show_name, store_name); + + +(注意:真正的實現不允許用戶空間設置設備名。) + +頂層目錄布局 +~~~~~~~~~~~~ + +sysfs 目錄的安排顯示了內核數據結構之間的關係。 + +頂層 sysfs 目錄如下: + +block/ +bus/ +class/ +dev/ +devices/ +firmware/ +net/ +fs/ + +devices/ 包含了一個設備樹的文件系統表示。他直接映射了內部的內核 +設備樹,反映了設備的層次結構。 + +bus/ 包含了內核中各種總線類型的平面目錄布局。每個總線目錄包含兩個 +子目錄: + + devices/ + drivers/ + +devices/ 包含了系統中出現的每個設備的符號連結,他們指向 root/ 下的 +設備目錄。 + +drivers/ 包含了每個已爲特定總線上的設備而掛載的驅動程序的目錄(這裡 +假定驅動沒有跨越多個總線類型)。 + +fs/ 包含了一個爲文件系統設立的目錄。現在每個想要導出屬性的文件系統必須 +在 fs/ 下創建自己的層次結構(參見Documentation/filesystems/fuse.rst)。 + +dev/ 包含兩個子目錄: char/ 和 block/。在這兩個子目錄中,有以 +<major>:<minor> 格式命名的符號連結。這些符號連結指向 sysfs 目錄 +中相應的設備。/sys/dev 提供一個通過一個 stat(2) 操作結果,查找 +設備 sysfs 接口快捷的方法。 + +更多有關 driver-model 的特性信息可以在 Documentation/driver-api/driver-model/ +中找到。 + + +TODO: 完成這一節。 + + +當前接口 +~~~~~~~~ + +以下的接口層普遍存在於當前的sysfs中: + +- 設備 (include/linux/device.h) +---------------------------------- +結構體: + +struct device_attribute { + struct attribute attr; + ssize_t (*show)(struct device *dev, struct device_attribute *attr, + char *buf); + ssize_t (*store)(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count); +}; + +聲明: + +DEVICE_ATTR(_name, _mode, _show, _store); + +增/刪屬性: + +int device_create_file(struct device *dev, const struct device_attribute * attr); +void device_remove_file(struct device *dev, const struct device_attribute * attr); + + +- 總線驅動程序 (include/linux/device.h) +-------------------------------------- +結構體: + +struct bus_attribute { + struct attribute attr; + ssize_t (*show)(struct bus_type *, char * buf); + ssize_t (*store)(struct bus_type *, const char * buf, size_t count); +}; + +聲明: + +BUS_ATTR(_name, _mode, _show, _store) + +增/刪屬性: + +int bus_create_file(struct bus_type *, struct bus_attribute *); +void bus_remove_file(struct bus_type *, struct bus_attribute *); + + +- 設備驅動程序 (include/linux/device.h) +----------------------------------------- + +結構體: + +struct driver_attribute { + struct attribute attr; + ssize_t (*show)(struct device_driver *, char * buf); + ssize_t (*store)(struct device_driver *, const char * buf, + size_t count); +}; + +聲明: + +DRIVER_ATTR(_name, _mode, _show, _store) + +增/刪屬性: + +int driver_create_file(struct device_driver *, const struct driver_attribute *); +void driver_remove_file(struct device_driver *, const struct driver_attribute *); + + +文檔 +~~~~ + +sysfs 目錄結構以及其中包含的屬性定義了一個內核與用戶空間之間的 ABI。 +對於任何 ABI,其自身的穩定和適當的文檔是非常重要的。所有新的 sysfs +屬性必須在 Documentation/ABI 中有文檔。詳見 Documentation/ABI/README。 + diff --git a/Documentation/translations/zh_TW/filesystems/tmpfs.rst b/Documentation/translations/zh_TW/filesystems/tmpfs.rst new file mode 100644 index 000000000000..8d753a34785b --- /dev/null +++ b/Documentation/translations/zh_TW/filesystems/tmpfs.rst @@ -0,0 +1,148 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: Documentation/filesystems/tmpfs.rst + +Translated by Wang Qing <wangqing@vivo.com> +and Hu Haowen <src.res@email.cn> + +===== +Tmpfs +===== + +Tmpfs是一個將所有文件都保存在虛擬內存中的文件系統。 + +tmpfs中的所有內容都是臨時的,也就是說沒有任何文件會在硬碟上創建。 +如果卸載tmpfs實例,所有保存在其中的文件都會丟失。 + +tmpfs將所有文件保存在內核緩存中,隨著文件內容增長或縮小可以將不需要的 +頁面swap出去。它具有最大限制,可以通過「mount -o remount ...」調整。 + +和ramfs(創建tmpfs的模板)相比,tmpfs包含交換和限制檢查。和tmpfs相似的另 +一個東西是RAM磁碟(/dev/ram*),可以在物理RAM中模擬固定大小的硬碟,並在 +此之上創建一個普通的文件系統。Ramdisks無法swap,因此無法調整它們的大小。 + +由於tmpfs完全保存於頁面緩存和swap中,因此所有tmpfs頁面將在/proc/meminfo +中顯示爲「Shmem」,而在free(1)中顯示爲「Shared」。請注意,這些計數還包括 +共享內存(shmem,請參閱ipcs(1))。獲得計數的最可靠方法是使用df(1)和du(1)。 + +tmpfs具有以下用途: + +1) 內核總有一個無法看到的內部掛載,用於共享匿名映射和SYSV共享內存。 + + 掛載不依賴於CONFIG_TMPFS。如果CONFIG_TMPFS未設置,tmpfs對用戶不可見。 + 但是內部機制始終存在。 + +2) glibc 2.2及更高版本期望將tmpfs掛載在/dev/shm上以用於POSIX共享內存 + (shm_open,shm_unlink)。添加內容到/etc/fstab應注意如下: + + tmpfs /dev/shm tmpfs defaults 0 0 + + 使用時需要記住創建掛載tmpfs的目錄。 + + SYSV共享內存無需掛載,內部已默認支持。(在2.3內核版本中,必須掛載 + tmpfs的前身(shm fs)才能使用SYSV共享內存) + +3) 很多人(包括我)都覺的在/tmp和/var/tmp上掛載非常方便,並具有較大的 + swap分區。目前循環掛載tmpfs可以正常工作,所以大多數發布都應當可以 + 使用mkinitrd通過/tmp訪問/tmp。 + +4) 也許還有更多我不知道的地方:-) + + +tmpfs有三個用於調整大小的掛載選項: + +========= =========================================================== +size tmpfs實例分配的字節數限制。默認值是不swap時物理RAM的一半。 + 如果tmpfs實例過大,機器將死鎖,因爲OOM處理將無法釋放該內存。 +nr_blocks 與size相同,但以PAGE_SIZE爲單位。 +nr_inodes tmpfs實例的最大inode個數。默認值是物理內存頁數的一半,或者 + (有高端內存的機器)低端內存RAM的頁數,二者以較低者為準。 +========= =========================================================== + +這些參數接受後綴k,m或g表示千,兆和千兆字節,可以在remount時更改。 +size參數也接受後綴%用來限制tmpfs實例占用物理RAM的百分比: +未指定size或nr_blocks時,默認值爲size=50% + +如果nr_blocks=0(或size=0),block個數將不受限制;如果nr_inodes=0, +inode個數將不受限制。這樣掛載通常是不明智的,因爲它允許任何具有寫權限的 +用戶通過訪問tmpfs耗盡機器上的所有內存;但同時這樣做也會增強在多個CPU的 +場景下的訪問。 + +tmpfs具有爲所有文件設置NUMA內存分配策略掛載選項(如果啓用了CONFIG_NUMA), +可以通過「mount -o remount ...」調整 + +======================== ========================= +mpol=default 採用進程分配策略 + (請參閱 set_mempolicy(2)) +mpol=prefer:Node 傾向從給定的節點分配 +mpol=bind:NodeList 只允許從指定的鍊表分配 +mpol=interleave 傾向於依次從每個節點分配 +mpol=interleave:NodeList 依次從每個節點分配 +mpol=local 優先本地節點分配內存 +======================== ========================= + +NodeList格式是以逗號分隔的十進位數字表示大小和範圍,最大和最小範圍是用- +分隔符的十進位數來表示。例如,mpol=bind0-3,5,7,9-15 + +帶有有效NodeList的內存策略將按指定格式保存,在創建文件時使用。當任務在該 +文件系統上創建文件時,會使用到掛載時的內存策略NodeList選項,如果設置的話, +由調用任務的cpuset[請參見Documentation/admin-guide/cgroup-v1/cpusets.rst] +以及下面列出的可選標誌約束。如果NodeLists爲設置爲空集,則文件的內存策略將 +恢復爲「默認」策略。 + +NUMA內存分配策略有可選標誌,可以用於模式結合。在掛載tmpfs時指定這些可選 +標誌可以在NodeList之前生效。 +Documentation/admin-guide/mm/numa_memory_policy.rst列出所有可用的內存 +分配策略模式標誌及其對內存策略。 + +:: + + =static 相當於 MPOL_F_STATIC_NODES + =relative 相當於 MPOL_F_RELATIVE_NODES + +例如,mpol=bind=staticNodeList相當於MPOL_BIND|MPOL_F_STATIC_NODES的分配策略 + +請注意,如果內核不支持NUMA,那麼使用mpol選項掛載tmpfs將會失敗;nodelist指定不 +在線的節點也會失敗。如果您的系統依賴於此,但內核會運行不帶NUMA功能(也許是安全 +revocery內核),或者具有較少的節點在線,建議從自動模式中省略mpol選項掛載選項。 +可以在以後通過「mount -o remount,mpol=Policy:NodeList MountPoint」添加到掛載點。 + +要指定初始根目錄,可以使用如下掛載選項: + +==== ==================== +模式 權限用八進位數字表示 +uid 用戶ID +gid 組ID +==== ==================== + +這些選項對remount沒有任何影響。您可以通過chmod(1),chown(1)和chgrp(1)的更改 +已經掛載的參數。 + +tmpfs具有選擇32位還是64位inode的掛載選項: + +======= ============= +inode64 使用64位inode +inode32 使用32位inode +======= ============= + +在32位內核上,默認是inode32,掛載時指定inode64會被拒絕。 +在64位內核上,默認配置是CONFIG_TMPFS_INODE64。inode64避免了單個設備上可能有多個 +具有相同inode編號的文件;比如32位應用程式使用glibc如果長期訪問tmpfs,一旦達到33 +位inode編號,就有EOVERFLOW失敗的危險,無法打開大於2GiB的文件,並返回EINVAL。 + +所以'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'將在 +/mytmpfs上掛載tmpfs實例,分配只能由root用戶訪問的10GB RAM/SWAP,可以有10240個 +inode的實例。 + + +:作者: + Christoph Rohland <cr@sap.com>, 1.12.01 +:更新: + Hugh Dickins, 4 June 2007 +:更新: + KOSAKI Motohiro, 16 Mar 2010 +:更新: + Chris Down, 13 July 2020 + diff --git a/Documentation/translations/zh_TW/filesystems/virtiofs.rst b/Documentation/translations/zh_TW/filesystems/virtiofs.rst new file mode 100644 index 000000000000..2b05e84375dd --- /dev/null +++ b/Documentation/translations/zh_TW/filesystems/virtiofs.rst @@ -0,0 +1,61 @@ +.. SPDX-License-Identifier: GPL-2.0 + +.. include:: ../disclaimer-zh_TW.rst + +:Original: :ref:`Documentation/filesystems/virtiofs.rst <virtiofs_index>` + +譯者 +:: + + 中文版維護者: 王文虎 Wang Wenhu <wenhu.wang@vivo.com> + 中文版翻譯者: 王文虎 Wang Wenhu <wenhu.wang@vivo.com> + 中文版校譯者: 王文虎 Wang Wenhu <wenhu.wang@vivo.com> + 中文版校譯者: 王文虎 Wang Wenhu <wenhu.wang@vivo.com> + 繁體中文版校譯者:胡皓文 Hu Haowen <src.res@email.cn> + +=========================================== +virtiofs: virtio-fs 主機<->客機共享文件系統 +=========================================== + +- Copyright (C) 2020 Vivo Communication Technology Co. Ltd. + +介紹 +==== +Linux的virtiofs文件系統實現了一個半虛擬化VIRTIO類型「virtio-fs」設備的驅動,通過該\ +類型設備實現客機<->主機文件系統共享。它允許客機掛載一個已經導出到主機的目錄。 + +客機通常需要訪問主機或者遠程系統上的文件。使用場景包括:在新客機安裝時讓文件對其\ +可見;從主機上的根文件系統啓動;對無狀態或臨時客機提供持久存儲和在客機之間共享目錄。 + +儘管在某些任務可能通過使用已有的網絡文件系統完成,但是卻需要非常難以自動化的配置\ +步驟,且將存儲網絡暴露給客機。而virtio-fs設備通過提供不經過網絡的文件系統訪問文件\ +的設計方式解決了這些問題。 + +另外,virto-fs設備發揮了主客機共存的優點提高了性能,並且提供了網絡文件系統所不具備 +的一些語義功能。 + +用法 +==== +以``myfs``標籤將文件系統掛載到``/mnt``: + +.. code-block:: sh + + guest# mount -t virtiofs myfs /mnt + +請查閱 https://virtio-fs.gitlab.io/ 了解配置QEMU和virtiofsd守護程序的詳細信息。 + +內幕 +==== +由於virtio-fs設備將FUSE協議用於文件系統請求,因此Linux的virtiofs文件系統與FUSE文\ +件系統客戶端緊密集成在一起。客機充當FUSE客戶端而主機充當FUSE伺服器,內核與用戶空\ +間之間的/dev/fuse接口由virtio-fs設備接口代替。 + +FUSE請求被置於虛擬隊列中由主機處理。主機填充緩衝區中的響應部分,而客機處理請求的完成部分。 + +將/dev/fuse映射到虛擬隊列需要解決/dev/fuse和虛擬隊列之間語義上的差異。每次讀取\ +/dev/fuse設備時,FUSE客戶端都可以選擇要傳輸的請求,從而可以使某些請求優先於其他\ +請求。虛擬隊列有其隊列語義,無法更改已入隊請求的順序。在虛擬隊列已滿的情況下尤 +其關鍵,因爲此時不可能加入高優先級的請求。爲了解決此差異,virtio-fs設備採用「hiprio」\ +(高優先級)虛擬隊列,專門用於有別於普通請求的高優先級請求。 + + diff --git a/Documentation/translations/zh_TW/index.rst b/Documentation/translations/zh_TW/index.rst index c02c4b5281e6..2a281036c406 100644 --- a/Documentation/translations/zh_TW/index.rst +++ b/Documentation/translations/zh_TW/index.rst @@ -89,6 +89,12 @@ TODOList: 大部分信息都是直接從內核原始碼獲取的,並根據需要添加補充材料(或者至少是在 我們設法添加的時候——可能不是所有的都是有需要的)。 +.. toctree:: + :maxdepth: 2 + + cpu-freq/index + filesystems/index + TODOList: * driver-api/index @@ -97,7 +103,6 @@ TODOList: * accounting/index * block/index * cdrom/index -* cpu-freq/index * ide/index * fb/index * fpga/index @@ -123,7 +128,6 @@ TODOList: * security/index * sound/index * crypto/index -* filesystems/index * vm/index * bpf/index * usb/index @@ -136,6 +140,11 @@ TODOList: 體系結構無關文檔 ---------------- +.. toctree:: + :maxdepth: 2 + + arm64/index + TODOList: * asm-annotations diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index 0b5eefed027e..c432be070f67 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst @@ -27,6 +27,7 @@ place where this information is gathered. iommu media/index sysfs-platform_profile + vduse .. only:: subproject and html diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index b7070d76f076..2e8134059c87 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -299,6 +299,7 @@ Code Seq# Include File Comments 'z' 10-4F drivers/s390/crypto/zcrypt_api.h conflict! '|' 00-7F linux/media.h 0x80 00-1F linux/fb.h +0x81 00-1F linux/vduse.h 0x89 00-06 arch/x86/include/asm/sockios.h 0x89 0B-DF linux/sockios.h 0x89 E0-EF linux/sockios.h SIOCPROTOPRIVATE range diff --git a/Documentation/userspace-api/vduse.rst b/Documentation/userspace-api/vduse.rst new file mode 100644 index 000000000000..42ef59ea5314 --- /dev/null +++ b/Documentation/userspace-api/vduse.rst @@ -0,0 +1,233 @@ +================================== +VDUSE - "vDPA Device in Userspace" +================================== + +vDPA (virtio data path acceleration) device is a device that uses a +datapath which complies with the virtio specifications with vendor +specific control path. vDPA devices can be both physically located on +the hardware or emulated by software. VDUSE is a framework that makes it +possible to implement software-emulated vDPA devices in userspace. And +to make the device emulation more secure, the emulated vDPA device's +control path is handled in the kernel and only the data path is +implemented in the userspace. + +Note that only virtio block device is supported by VDUSE framework now, +which can reduce security risks when the userspace process that implements +the data path is run by an unprivileged user. The support for other device +types can be added after the security issue of corresponding device driver +is clarified or fixed in the future. + +Create/Destroy VDUSE devices +------------------------ + +VDUSE devices are created as follows: + +1. Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on + /dev/vduse/control. + +2. Setup each virtqueue with ioctl(VDUSE_VQ_SETUP) on /dev/vduse/$NAME. + +3. Begin processing VDUSE messages from /dev/vduse/$NAME. The first + messages will arrive while attaching the VDUSE instance to vDPA bus. + +4. Send the VDPA_CMD_DEV_NEW netlink message to attach the VDUSE + instance to vDPA bus. + +VDUSE devices are destroyed as follows: + +1. Send the VDPA_CMD_DEV_DEL netlink message to detach the VDUSE + instance from vDPA bus. + +2. Close the file descriptor referring to /dev/vduse/$NAME. + +3. Destroy the VDUSE instance with ioctl(VDUSE_DESTROY_DEV) on + /dev/vduse/control. + +The netlink messages can be sent via vdpa tool in iproute2 or use the +below sample codes: + +.. code-block:: c + + static int netlink_add_vduse(const char *name, enum vdpa_command cmd) + { + struct nl_sock *nlsock; + struct nl_msg *msg; + int famid; + + nlsock = nl_socket_alloc(); + if (!nlsock) + return -ENOMEM; + + if (genl_connect(nlsock)) + goto free_sock; + + famid = genl_ctrl_resolve(nlsock, VDPA_GENL_NAME); + if (famid < 0) + goto close_sock; + + msg = nlmsg_alloc(); + if (!msg) + goto close_sock; + + if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, famid, 0, 0, cmd, 0)) + goto nla_put_failure; + + NLA_PUT_STRING(msg, VDPA_ATTR_DEV_NAME, name); + if (cmd == VDPA_CMD_DEV_NEW) + NLA_PUT_STRING(msg, VDPA_ATTR_MGMTDEV_DEV_NAME, "vduse"); + + if (nl_send_sync(nlsock, msg)) + goto close_sock; + + nl_close(nlsock); + nl_socket_free(nlsock); + + return 0; + nla_put_failure: + nlmsg_free(msg); + close_sock: + nl_close(nlsock); + free_sock: + nl_socket_free(nlsock); + return -1; + } + +How VDUSE works +--------------- + +As mentioned above, a VDUSE device is created by ioctl(VDUSE_CREATE_DEV) on +/dev/vduse/control. With this ioctl, userspace can specify some basic configuration +such as device name (uniquely identify a VDUSE device), virtio features, virtio +configuration space, the number of virtqueues and so on for this emulated device. +Then a char device interface (/dev/vduse/$NAME) is exported to userspace for device +emulation. Userspace can use the VDUSE_VQ_SETUP ioctl on /dev/vduse/$NAME to +add per-virtqueue configuration such as the max size of virtqueue to the device. + +After the initialization, the VDUSE device can be attached to vDPA bus via +the VDPA_CMD_DEV_NEW netlink message. Userspace needs to read()/write() on +/dev/vduse/$NAME to receive/reply some control messages from/to VDUSE kernel +module as follows: + +.. code-block:: c + + static int vduse_message_handler(int dev_fd) + { + int len; + struct vduse_dev_request req; + struct vduse_dev_response resp; + + len = read(dev_fd, &req, sizeof(req)); + if (len != sizeof(req)) + return -1; + + resp.request_id = req.request_id; + + switch (req.type) { + + /* handle different types of messages */ + + } + + len = write(dev_fd, &resp, sizeof(resp)); + if (len != sizeof(resp)) + return -1; + + return 0; + } + +There are now three types of messages introduced by VDUSE framework: + +- VDUSE_GET_VQ_STATE: Get the state for virtqueue, userspace should return + avail index for split virtqueue or the device/driver ring wrap counters and + the avail and used index for packed virtqueue. + +- VDUSE_SET_STATUS: Set the device status, userspace should follow + the virtio spec: https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html + to process this message. For example, fail to set the FEATURES_OK device + status bit if the device can not accept the negotiated virtio features + get from the VDUSE_DEV_GET_FEATURES ioctl. + +- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping for specified + IOVA range, userspace should firstly remove the old mapping, then setup the new + mapping via the VDUSE_IOTLB_GET_FD ioctl. + +After DRIVER_OK status bit is set via the VDUSE_SET_STATUS message, userspace is +able to start the dataplane processing as follows: + +1. Get the specified virtqueue's information with the VDUSE_VQ_GET_INFO ioctl, + including the size, the IOVAs of descriptor table, available ring and used ring, + the state and the ready status. + +2. Pass the above IOVAs to the VDUSE_IOTLB_GET_FD ioctl so that those IOVA regions + can be mapped into userspace. Some sample codes is shown below: + +.. code-block:: c + + static int perm_to_prot(uint8_t perm) + { + int prot = 0; + + switch (perm) { + case VDUSE_ACCESS_WO: + prot |= PROT_WRITE; + break; + case VDUSE_ACCESS_RO: + prot |= PROT_READ; + break; + case VDUSE_ACCESS_RW: + prot |= PROT_READ | PROT_WRITE; + break; + } + + return prot; + } + + static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t *len) + { + int fd; + void *addr; + size_t size; + struct vduse_iotlb_entry entry; + + entry.start = iova; + entry.last = iova; + + /* + * Find the first IOVA region that overlaps with the specified + * range [start, last] and return the corresponding file descriptor. + */ + fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD, &entry); + if (fd < 0) + return NULL; + + size = entry.last - entry.start + 1; + *len = entry.last - iova + 1; + addr = mmap(0, size, perm_to_prot(entry.perm), MAP_SHARED, + fd, entry.offset); + close(fd); + if (addr == MAP_FAILED) + return NULL; + + /* + * Using some data structures such as linked list to store + * the iotlb mapping. The munmap(2) should be called for the + * cached mapping when the corresponding VDUSE_UPDATE_IOTLB + * message is received or the device is reset. + */ + + return addr + iova - entry.start; + } + +3. Setup the kick eventfd for the specified virtqueues with the VDUSE_VQ_SETUP_KICKFD + ioctl. The kick eventfd is used by VDUSE kernel module to notify userspace to + consume the available ring. This is optional since userspace can choose to poll the + available ring instead. + +4. Listen to the kick eventfd (optional) and consume the available ring. The buffer + described by the descriptors in the descriptor table should be also mapped into + userspace via the VDUSE_IOTLB_GET_FD ioctl before accessing. + +5. Inject an interrupt for specific virtqueue with the VDUSE_INJECT_VQ_IRQ ioctl + after the used ring is filled. + +For more details on the uAPI, please see include/uapi/linux/vduse.h. diff --git a/Documentation/vm/damon/api.rst b/Documentation/vm/damon/api.rst new file mode 100644 index 000000000000..08f34df45523 --- /dev/null +++ b/Documentation/vm/damon/api.rst @@ -0,0 +1,20 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============= +API Reference +============= + +Kernel space programs can use every feature of DAMON using below APIs. All you +need to do is including ``damon.h``, which is located in ``include/linux/`` of +the source tree. + +Structures +========== + +.. kernel-doc:: include/linux/damon.h + + +Functions +========= + +.. kernel-doc:: mm/damon/core.c diff --git a/Documentation/vm/damon/design.rst b/Documentation/vm/damon/design.rst new file mode 100644 index 000000000000..b05159c295f4 --- /dev/null +++ b/Documentation/vm/damon/design.rst @@ -0,0 +1,166 @@ +.. SPDX-License-Identifier: GPL-2.0 + +====== +Design +====== + +Configurable Layers +=================== + +DAMON provides data access monitoring functionality while making the accuracy +and the overhead controllable. The fundamental access monitorings require +primitives that dependent on and optimized for the target address space. On +the other hand, the accuracy and overhead tradeoff mechanism, which is the core +of DAMON, is in the pure logic space. DAMON separates the two parts in +different layers and defines its interface to allow various low level +primitives implementations configurable with the core logic. + +Due to this separated design and the configurable interface, users can extend +DAMON for any address space by configuring the core logics with appropriate low +level primitive implementations. If appropriate one is not provided, users can +implement the primitives on their own. + +For example, physical memory, virtual memory, swap space, those for specific +processes, NUMA nodes, files, and backing memory devices would be supportable. +Also, if some architectures or devices support special optimized access check +primitives, those will be easily configurable. + + +Reference Implementations of Address Space Specific Primitives +============================================================== + +The low level primitives for the fundamental access monitoring are defined in +two parts: + +1. Identification of the monitoring target address range for the address space. +2. Access check of specific address range in the target space. + +DAMON currently provides the implementation of the primitives for only the +virtual address spaces. Below two subsections describe how it works. + + +VMA-based Target Address Range Construction +------------------------------------------- + +Only small parts in the super-huge virtual address space of the processes are +mapped to the physical memory and accessed. Thus, tracking the unmapped +address regions is just wasteful. However, because DAMON can deal with some +level of noise using the adaptive regions adjustment mechanism, tracking every +mapping is not strictly required but could even incur a high overhead in some +cases. That said, too huge unmapped areas inside the monitoring target should +be removed to not take the time for the adaptive mechanism. + +For the reason, this implementation converts the complex mappings to three +distinct regions that cover every mapped area of the address space. The two +gaps between the three regions are the two biggest unmapped areas in the given +address space. The two biggest unmapped areas would be the gap between the +heap and the uppermost mmap()-ed region, and the gap between the lowermost +mmap()-ed region and the stack in most of the cases. Because these gaps are +exceptionally huge in usual address spaces, excluding these will be sufficient +to make a reasonable trade-off. Below shows this in detail:: + + <heap> + <BIG UNMAPPED REGION 1> + <uppermost mmap()-ed region> + (small mmap()-ed regions and munmap()-ed regions) + <lowermost mmap()-ed region> + <BIG UNMAPPED REGION 2> + <stack> + + +PTE Accessed-bit Based Access Check +----------------------------------- + +The implementation for the virtual address space uses PTE Accessed-bit for +basic access checks. It finds the relevant PTE Accessed bit from the address +by walking the page table for the target task of the address. In this way, the +implementation finds and clears the bit for next sampling target address and +checks whether the bit set again after one sampling period. This could disturb +other kernel subsystems using the Accessed bits, namely Idle page tracking and +the reclaim logic. To avoid such disturbances, DAMON makes it mutually +exclusive with Idle page tracking and uses ``PG_idle`` and ``PG_young`` page +flags to solve the conflict with the reclaim logic, as Idle page tracking does. + + +Address Space Independent Core Mechanisms +========================================= + +Below four sections describe each of the DAMON core mechanisms and the five +monitoring attributes, ``sampling interval``, ``aggregation interval``, +``regions update interval``, ``minimum number of regions``, and ``maximum +number of regions``. + + +Access Frequency Monitoring +--------------------------- + +The output of DAMON says what pages are how frequently accessed for a given +duration. The resolution of the access frequency is controlled by setting +``sampling interval`` and ``aggregation interval``. In detail, DAMON checks +access to each page per ``sampling interval`` and aggregates the results. In +other words, counts the number of the accesses to each page. After each +``aggregation interval`` passes, DAMON calls callback functions that previously +registered by users so that users can read the aggregated results and then +clears the results. This can be described in below simple pseudo-code:: + + while monitoring_on: + for page in monitoring_target: + if accessed(page): + nr_accesses[page] += 1 + if time() % aggregation_interval == 0: + for callback in user_registered_callbacks: + callback(monitoring_target, nr_accesses) + for page in monitoring_target: + nr_accesses[page] = 0 + sleep(sampling interval) + +The monitoring overhead of this mechanism will arbitrarily increase as the +size of the target workload grows. + + +Region Based Sampling +--------------------- + +To avoid the unbounded increase of the overhead, DAMON groups adjacent pages +that assumed to have the same access frequencies into a region. As long as the +assumption (pages in a region have the same access frequencies) is kept, only +one page in the region is required to be checked. Thus, for each ``sampling +interval``, DAMON randomly picks one page in each region, waits for one +``sampling interval``, checks whether the page is accessed meanwhile, and +increases the access frequency of the region if so. Therefore, the monitoring +overhead is controllable by setting the number of regions. DAMON allows users +to set the minimum and the maximum number of regions for the trade-off. + +This scheme, however, cannot preserve the quality of the output if the +assumption is not guaranteed. + + +Adaptive Regions Adjustment +--------------------------- + +Even somehow the initial monitoring target regions are well constructed to +fulfill the assumption (pages in same region have similar access frequencies), +the data access pattern can be dynamically changed. This will result in low +monitoring quality. To keep the assumption as much as possible, DAMON +adaptively merges and splits each region based on their access frequency. + +For each ``aggregation interval``, it compares the access frequencies of +adjacent regions and merges those if the frequency difference is small. Then, +after it reports and clears the aggregated access frequency of each region, it +splits each region into two or three regions if the total number of regions +will not exceed the user-specified maximum number of regions after the split. + +In this way, DAMON provides its best-effort quality and minimal overhead while +keeping the bounds users set for their trade-off. + + +Dynamic Target Space Updates Handling +------------------------------------- + +The monitoring target address range could dynamically changed. For example, +virtual memory could be dynamically mapped and unmapped. Physical memory could +be hot-plugged. + +As the changes could be quite frequent in some cases, DAMON checks the dynamic +memory mapping changes and applies it to the abstracted target area only for +each of a user-specified time interval (``regions update interval``). diff --git a/Documentation/vm/damon/faq.rst b/Documentation/vm/damon/faq.rst new file mode 100644 index 000000000000..cb3d8b585a8b --- /dev/null +++ b/Documentation/vm/damon/faq.rst @@ -0,0 +1,51 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================== +Frequently Asked Questions +========================== + +Why a new subsystem, instead of extending perf or other user space tools? +========================================================================= + +First, because it needs to be lightweight as much as possible so that it can be +used online, any unnecessary overhead such as kernel - user space context +switching cost should be avoided. Second, DAMON aims to be used by other +programs including the kernel. Therefore, having a dependency on specific +tools like perf is not desirable. These are the two biggest reasons why DAMON +is implemented in the kernel space. + + +Can 'idle pages tracking' or 'perf mem' substitute DAMON? +========================================================= + +Idle page tracking is a low level primitive for access check of the physical +address space. 'perf mem' is similar, though it can use sampling to minimize +the overhead. On the other hand, DAMON is a higher-level framework for the +monitoring of various address spaces. It is focused on memory management +optimization and provides sophisticated accuracy/overhead handling mechanisms. +Therefore, 'idle pages tracking' and 'perf mem' could provide a subset of +DAMON's output, but cannot substitute DAMON. + + +Does DAMON support virtual memory only? +======================================= + +No. The core of the DAMON is address space independent. The address space +specific low level primitive parts including monitoring target regions +constructions and actual access checks can be implemented and configured on the +DAMON core by the users. In this way, DAMON users can monitor any address +space with any access check technique. + +Nonetheless, DAMON provides vma tracking and PTE Accessed bit check based +implementations of the address space dependent functions for the virtual memory +by default, for a reference and convenient use. In near future, we will +provide those for physical memory address space. + + +Can I simply monitor page granularity? +====================================== + +Yes. You can do so by setting the ``min_nr_regions`` attribute higher than the +working set size divided by the page size. Because the monitoring target +regions size is forced to be ``>=page size``, the region split will make no +effect. diff --git a/Documentation/vm/damon/index.rst b/Documentation/vm/damon/index.rst new file mode 100644 index 000000000000..a2858baf3bf1 --- /dev/null +++ b/Documentation/vm/damon/index.rst @@ -0,0 +1,30 @@ +.. SPDX-License-Identifier: GPL-2.0 + +========================== +DAMON: Data Access MONitor +========================== + +DAMON is a data access monitoring framework subsystem for the Linux kernel. +The core mechanisms of DAMON (refer to :doc:`design` for the detail) make it + + - *accurate* (the monitoring output is useful enough for DRAM level memory + management; It might not appropriate for CPU Cache levels, though), + - *light-weight* (the monitoring overhead is low enough to be applied online), + and + - *scalable* (the upper-bound of the overhead is in constant range regardless + of the size of target workloads). + +Using this framework, therefore, the kernel's memory management mechanisms can +make advanced decisions. Experimental memory management optimization works +that incurring high data accesses monitoring overhead could implemented again. +In user space, meanwhile, users who have some special workloads can write +personalized applications for better understanding and optimizations of their +workloads and systems. + +.. toctree:: + :maxdepth: 2 + + faq + design + api + plans diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst index eff5fbd492d0..b51f0d8992f8 100644 --- a/Documentation/vm/index.rst +++ b/Documentation/vm/index.rst @@ -32,6 +32,7 @@ descriptions of data structures and algorithms. arch_pgtable_helpers balance cleancache + damon/index free_page_reporting frontswap highmem diff --git a/Documentation/x86/x86_64/mm.rst b/Documentation/x86/x86_64/mm.rst index ede1875719fb..9798676bb0bf 100644 --- a/Documentation/x86/x86_64/mm.rst +++ b/Documentation/x86/x86_64/mm.rst @@ -140,10 +140,6 @@ The direct mapping covers all memory in the system up to the highest memory address (this means in some cases it can also include PCI memory holes). -vmalloc space is lazily synchronized into the different PML4/PML5 pages of -the processes using the page fault handler, with init_top_pgt as -reference. - We map EFI runtime services in the 'efi_pgd' PGD in a 64Gb large virtual memory window (this size is arbitrary, it can be raised later if needed). The mappings are not part of any other kernel PGD and are only available |