summaryrefslogtreecommitdiffstats
path: root/Documentation/virt/kvm/api.rst
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/virt/kvm/api.rst')
-rw-r--r--Documentation/virt/kvm/api.rst190
1 files changed, 116 insertions, 74 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index edc070c6e19b..454c2aaa155e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7,8 +7,19 @@ The Definitive KVM (Kernel-based Virtual Machine) API Documentation
1. General description
======================
-The kvm API is a set of ioctls that are issued to control various aspects
-of a virtual machine. The ioctls belong to the following classes:
+The kvm API is centered around different kinds of file descriptors
+and ioctls that can be issued to these file descriptors. An initial
+open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
+can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
+handle will create a VM file descriptor which can be used to issue VM
+ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will
+create a virtual cpu or device and return a file descriptor pointing to
+the new resource.
+
+In other words, the kvm API is a set of ioctls that are issued to
+different kinds of file descriptor in order to control various aspects of
+a virtual machine. Depending on the file descriptor that accepts them,
+ioctls belong to the following classes:
- System ioctls: These query and set global attributes which affect the
whole kvm subsystem. In addition a system ioctl is used to create
@@ -35,18 +46,19 @@ of a virtual machine. The ioctls belong to the following classes:
device ioctls must be issued from the same process (address space) that
was used to create the VM.
-2. File descriptors
-===================
+While most ioctls are specific to one kind of file descriptor, in some
+cases the same ioctl can belong to more than one class.
+
+The KVM API grew over time. For this reason, KVM defines many constants
+of the form ``KVM_CAP_*``, each corresponding to a set of functionality
+provided by one or more ioctls. Availability of these "capabilities" can
+be checked with :ref:`KVM_CHECK_EXTENSION <KVM_CHECK_EXTENSION>`. Some
+capabilities also need to be enabled for VMs or VCPUs where their
+functionality is desired (see :ref:`cap_enable` and :ref:`cap_enable_vm`).
-The kvm API is centered around file descriptors. An initial
-open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
-can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
-handle will create a VM file descriptor which can be used to issue VM
-ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will
-create a virtual cpu or device and return a file descriptor pointing to
-the new resource. Finally, ioctls on a vcpu or device fd can be used
-to control the vcpu or device. For vcpus, this includes the important
-task of actually running guest code.
+
+2. Restrictions
+===============
In general file descriptors can be migrated among processes by means
of fork() and the SCM_RIGHTS facility of unix domain socket. These
@@ -96,12 +108,9 @@ description:
Capability:
which KVM extension provides this ioctl. Can be 'basic',
which means that is will be provided by any kernel that supports
- API version 12 (see section 4.1), a KVM_CAP_xyz constant, which
- means availability needs to be checked with KVM_CHECK_EXTENSION
- (see section 4.4), or 'none' which means that while not all kernels
- support this ioctl, there's no capability bit to check its
- availability: for kernels that don't support the ioctl,
- the ioctl returns -ENOTTY.
+ API version 12 (see :ref:`KVM_GET_API_VERSION <KVM_GET_API_VERSION>`),
+ or a KVM_CAP_xyz constant that can be checked with
+ :ref:`KVM_CHECK_EXTENSION <KVM_CHECK_EXTENSION>`.
Architectures:
which instruction set architectures provide this ioctl.
@@ -118,6 +127,8 @@ description:
are not detailed, but errors with specific meanings are.
+.. _KVM_GET_API_VERSION:
+
4.1 KVM_GET_API_VERSION
-----------------------
@@ -246,6 +257,8 @@ This list also varies by kvm version and host processor, but does not change
otherwise.
+.. _KVM_CHECK_EXTENSION:
+
4.4 KVM_CHECK_EXTENSION
-----------------------
@@ -288,7 +301,7 @@ the VCPU file descriptor can be mmap-ed, including:
- if KVM_CAP_DIRTY_LOG_RING is available, a number of pages at
KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE. For more information on
- KVM_CAP_DIRTY_LOG_RING, see section 8.3.
+ KVM_CAP_DIRTY_LOG_RING, see :ref:`KVM_CAP_DIRTY_LOG_RING`.
4.7 KVM_CREATE_VCPU
@@ -338,8 +351,8 @@ KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual
cpu's hardware control block.
-4.8 KVM_GET_DIRTY_LOG (vm ioctl)
---------------------------------
+4.8 KVM_GET_DIRTY_LOG
+---------------------
:Capability: basic
:Architectures: all
@@ -1298,7 +1311,7 @@ See KVM_GET_VCPU_EVENTS for the data structure.
:Capability: KVM_CAP_DEBUGREGS
:Architectures: x86
-:Type: vm ioctl
+:Type: vcpu ioctl
:Parameters: struct kvm_debugregs (out)
:Returns: 0 on success, -1 on error
@@ -1320,7 +1333,7 @@ Reads debug registers from the vcpu.
:Capability: KVM_CAP_DEBUGREGS
:Architectures: x86
-:Type: vm ioctl
+:Type: vcpu ioctl
:Parameters: struct kvm_debugregs (in)
:Returns: 0 on success, -1 on error
@@ -1429,6 +1442,8 @@ because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).
+.. _KVM_ENABLE_CAP:
+
4.37 KVM_ENABLE_CAP
-------------------
@@ -2116,8 +2131,8 @@ TLB, prior to calling KVM_RUN on the associated vcpu.
The "bitmap" field is the userspace address of an array. This array
consists of a number of bits, equal to the total number of TLB entries as
-determined by the last successful call to KVM_CONFIG_TLB, rounded up to the
-nearest multiple of 64.
+determined by the last successful call to ``KVM_ENABLE_CAP(KVM_CAP_SW_TLB)``,
+rounded up to the nearest multiple of 64.
Each bit corresponds to one TLB entry, ordered the same as in the shared TLB
array.
@@ -2170,42 +2185,6 @@ userspace update the TCE table directly which is useful in some
circumstances.
-4.63 KVM_ALLOCATE_RMA
----------------------
-
-:Capability: KVM_CAP_PPC_RMA
-:Architectures: powerpc
-:Type: vm ioctl
-:Parameters: struct kvm_allocate_rma (out)
-:Returns: file descriptor for mapping the allocated RMA
-
-This allocates a Real Mode Area (RMA) from the pool allocated at boot
-time by the kernel. An RMA is a physically-contiguous, aligned region
-of memory used on older POWER processors to provide the memory which
-will be accessed by real-mode (MMU off) accesses in a KVM guest.
-POWER processors support a set of sizes for the RMA that usually
-includes 64MB, 128MB, 256MB and some larger powers of two.
-
-::
-
- /* for KVM_ALLOCATE_RMA */
- struct kvm_allocate_rma {
- __u64 rma_size;
- };
-
-The return value is a file descriptor which can be passed to mmap(2)
-to map the allocated RMA into userspace. The mapped area can then be
-passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the
-RMA for a virtual machine. The size of the RMA in bytes (which is
-fixed at host kernel boot time) is returned in the rma_size field of
-the argument structure.
-
-The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl
-is supported; 2 if the processor requires all virtual machines to have
-an RMA, or 1 if the processor can use an RMA but doesn't require it,
-because it supports the Virtual RMA (VRMA) facility.
-
-
4.64 KVM_NMI
------------
@@ -2602,7 +2581,7 @@ Specifically:
======================= ========= ===== =======================================
.. [1] These encodings are not accepted for SVE-enabled vcpus. See
- KVM_ARM_VCPU_INIT.
+ :ref:`KVM_ARM_VCPU_INIT`.
The equivalent register content can be accessed via bits [127:0] of
the corresponding SVE Zn registers instead for vcpus that have SVE
@@ -3593,6 +3572,27 @@ Errors:
This ioctl returns the guest registers that are supported for the
KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
+Note that s390 does not support KVM_GET_REG_LIST for historical reasons
+(read: nobody cared). The set of registers in kernels 4.x and newer is:
+
+- KVM_REG_S390_TODPR
+
+- KVM_REG_S390_EPOCHDIFF
+
+- KVM_REG_S390_CPU_TIMER
+
+- KVM_REG_S390_CLOCK_COMP
+
+- KVM_REG_S390_PFTOKEN
+
+- KVM_REG_S390_PFCOMPARE
+
+- KVM_REG_S390_PFSELECT
+
+- KVM_REG_S390_PP
+
+- KVM_REG_S390_GBEA
+
4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated)
-----------------------------------------
@@ -4956,8 +4956,8 @@ Coalesced pio is based on coalesced mmio. There is little difference
between coalesced mmio and pio except that coalesced pio records accesses
to I/O ports.
-4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl)
-------------------------------------
+4.117 KVM_CLEAR_DIRTY_LOG
+-------------------------
:Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
:Architectures: x86, arm64, mips
@@ -5093,8 +5093,8 @@ Recognised values for feature:
Finalizes the configuration of the specified vcpu feature.
The vcpu must already have been initialised, enabling the affected feature, by
-means of a successful KVM_ARM_VCPU_INIT call with the appropriate flag set in
-features[].
+means of a successful :ref:`KVM_ARM_VCPU_INIT <KVM_ARM_VCPU_INIT>` call with the
+appropriate flag set in features[].
For affected vcpu features, this is a mandatory step that must be performed
before the vcpu is fully usable.
@@ -5266,7 +5266,7 @@ the cpu reset definition in the POP (Principles Of Operation).
4.123 KVM_S390_INITIAL_RESET
----------------------------
-:Capability: none
+:Capability: basic
:Architectures: s390
:Type: vcpu ioctl
:Parameters: none
@@ -6205,7 +6205,7 @@ applied.
.. _KVM_ARM_GET_REG_WRITABLE_MASKS:
4.139 KVM_ARM_GET_REG_WRITABLE_MASKS
--------------------------------------------
+------------------------------------
:Capability: KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES
:Architectures: arm64
@@ -6443,6 +6443,8 @@ the capability to be present.
`flags` must currently be zero.
+.. _kvm_run:
+
5. The kvm_run structure
========================
@@ -6855,6 +6857,10 @@ the first `ndata` items (possibly zero) of the data array are valid.
the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI
specification.
+ - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2
+ if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI
+ specification.
+
- for RISC-V, data[0] is set to the value of the second argument of the
``sbi_system_reset`` call.
@@ -6888,6 +6894,12 @@ either:
- Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2
"Caller responsibilities" for possible return values.
+Hibernation using the PSCI SYSTEM_OFF2 call is enabled when PSCI v1.3
+is enabled. If a guest invokes the PSCI SYSTEM_OFF2 function, KVM will
+exit to userspace with the KVM_SYSTEM_EVENT_SHUTDOWN event type and with
+data[0] set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only
+supported hibernate type for the SYSTEM_OFF2 function is HIBERNATE_OFF.
+
::
/* KVM_EXIT_IOAPIC_EOI */
@@ -7162,11 +7174,15 @@ primary storage for certain register types. Therefore, the kernel may use the
values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
+.. _cap_enable:
+
6. Capabilities that can be enabled on vCPUs
============================================
There are certain capabilities that change the behavior of the virtual CPU or
-the virtual machine when enabled. To enable them, please see section 4.37.
+the virtual machine when enabled. To enable them, please see
+:ref:`KVM_ENABLE_CAP`.
+
Below you can find a list of capabilities and what their effect on the vCPU or
the virtual machine is when enabling them.
@@ -7375,7 +7391,7 @@ KVM API and also from the guest.
sets are supported
(bitfields defined in arch/x86/include/uapi/asm/kvm.h).
-As described above in the kvm_sync_regs struct info in section 5 (kvm_run):
+As described above in the kvm_sync_regs struct info in section :ref:`kvm_run`,
KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers
without having to call SET/GET_*REGS". This reduces overhead by eliminating
repeated ioctl calls for setting and/or getting register values. This is
@@ -7421,13 +7437,15 @@ Unused bitfields in the bitarrays must be set to zero.
This capability connects the vcpu to an in-kernel XIVE device.
+.. _cap_enable_vm:
+
7. Capabilities that can be enabled on VMs
==========================================
There are certain capabilities that change the behavior of the virtual
-machine when enabled. To enable them, please see section 4.37. Below
-you can find a list of capabilities and what their effect on the VM
-is when enabling them.
+machine when enabled. To enable them, please see section
+:ref:`KVM_ENABLE_CAP`. Below you can find a list of capabilities and
+what their effect on the VM is when enabling them.
The following information is provided along with the description:
@@ -8107,6 +8125,28 @@ KVM_X86_QUIRK_SLOT_ZAP_ALL By default, for KVM_X86_DEFAULT_VM VMs, KVM
or moved memslot isn't reachable, i.e KVM
_may_ invalidate only SPTEs related to the
memslot.
+
+KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the
+ vCPU's MSR_IA32_PERF_CAPABILITIES (0x345),
+ MSR_IA32_ARCH_CAPABILITIES (0x10a),
+ MSR_PLATFORM_INFO (0xce), and all VMX MSRs
+ (0x480..0x492) to the maximal capabilities
+ supported by KVM. KVM also sets
+ MSR_IA32_UCODE_REV (0x8b) to an arbitrary
+ value (which is different for Intel vs.
+ AMD). Lastly, when guest CPUID is set (by
+ userspace), KVM modifies select VMX MSR
+ fields to force consistency between guest
+ CPUID and L2's effective ISA. When this
+ quirk is disabled, KVM zeroes the vCPU's MSR
+ values (with two exceptions, see below),
+ i.e. treats the feature MSRs like CPUID
+ leaves and gives userspace full control of
+ the vCPU model definition. This quirk does
+ not affect VMX MSRs CR0/CR4_FIXED1 (0x487
+ and 0x489), as KVM does now allow them to
+ be set by userspace (KVM sets them based on
+ guest CPUID, for safety purposes).
=================================== ============================================
7.32 KVM_CAP_MAX_VCPU_ID
@@ -8588,6 +8628,8 @@ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf
(0x40000001). Otherwise, a guest may use the paravirtual features
regardless of what has actually been exposed through the CPUID leaf.
+.. _KVM_CAP_DIRTY_LOG_RING:
+
8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL
----------------------------------------------------------