diff options
Diffstat (limited to 'Documentation/virt/kvm/api.rst')
-rw-r--r-- | Documentation/virt/kvm/api.rst | 190 |
1 files changed, 116 insertions, 74 deletions
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index edc070c6e19b..454c2aaa155e 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -7,8 +7,19 @@ The Definitive KVM (Kernel-based Virtual Machine) API Documentation 1. General description ====================== -The kvm API is a set of ioctls that are issued to control various aspects -of a virtual machine. The ioctls belong to the following classes: +The kvm API is centered around different kinds of file descriptors +and ioctls that can be issued to these file descriptors. An initial +open("/dev/kvm") obtains a handle to the kvm subsystem; this handle +can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this +handle will create a VM file descriptor which can be used to issue VM +ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will +create a virtual cpu or device and return a file descriptor pointing to +the new resource. + +In other words, the kvm API is a set of ioctls that are issued to +different kinds of file descriptor in order to control various aspects of +a virtual machine. Depending on the file descriptor that accepts them, +ioctls belong to the following classes: - System ioctls: These query and set global attributes which affect the whole kvm subsystem. In addition a system ioctl is used to create @@ -35,18 +46,19 @@ of a virtual machine. The ioctls belong to the following classes: device ioctls must be issued from the same process (address space) that was used to create the VM. -2. File descriptors -=================== +While most ioctls are specific to one kind of file descriptor, in some +cases the same ioctl can belong to more than one class. + +The KVM API grew over time. For this reason, KVM defines many constants +of the form ``KVM_CAP_*``, each corresponding to a set of functionality +provided by one or more ioctls. Availability of these "capabilities" can +be checked with :ref:`KVM_CHECK_EXTENSION <KVM_CHECK_EXTENSION>`. Some +capabilities also need to be enabled for VMs or VCPUs where their +functionality is desired (see :ref:`cap_enable` and :ref:`cap_enable_vm`). -The kvm API is centered around file descriptors. An initial -open("/dev/kvm") obtains a handle to the kvm subsystem; this handle -can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this -handle will create a VM file descriptor which can be used to issue VM -ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will -create a virtual cpu or device and return a file descriptor pointing to -the new resource. Finally, ioctls on a vcpu or device fd can be used -to control the vcpu or device. For vcpus, this includes the important -task of actually running guest code. + +2. Restrictions +=============== In general file descriptors can be migrated among processes by means of fork() and the SCM_RIGHTS facility of unix domain socket. These @@ -96,12 +108,9 @@ description: Capability: which KVM extension provides this ioctl. Can be 'basic', which means that is will be provided by any kernel that supports - API version 12 (see section 4.1), a KVM_CAP_xyz constant, which - means availability needs to be checked with KVM_CHECK_EXTENSION - (see section 4.4), or 'none' which means that while not all kernels - support this ioctl, there's no capability bit to check its - availability: for kernels that don't support the ioctl, - the ioctl returns -ENOTTY. + API version 12 (see :ref:`KVM_GET_API_VERSION <KVM_GET_API_VERSION>`), + or a KVM_CAP_xyz constant that can be checked with + :ref:`KVM_CHECK_EXTENSION <KVM_CHECK_EXTENSION>`. Architectures: which instruction set architectures provide this ioctl. @@ -118,6 +127,8 @@ description: are not detailed, but errors with specific meanings are. +.. _KVM_GET_API_VERSION: + 4.1 KVM_GET_API_VERSION ----------------------- @@ -246,6 +257,8 @@ This list also varies by kvm version and host processor, but does not change otherwise. +.. _KVM_CHECK_EXTENSION: + 4.4 KVM_CHECK_EXTENSION ----------------------- @@ -288,7 +301,7 @@ the VCPU file descriptor can be mmap-ed, including: - if KVM_CAP_DIRTY_LOG_RING is available, a number of pages at KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE. For more information on - KVM_CAP_DIRTY_LOG_RING, see section 8.3. + KVM_CAP_DIRTY_LOG_RING, see :ref:`KVM_CAP_DIRTY_LOG_RING`. 4.7 KVM_CREATE_VCPU @@ -338,8 +351,8 @@ KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual cpu's hardware control block. -4.8 KVM_GET_DIRTY_LOG (vm ioctl) --------------------------------- +4.8 KVM_GET_DIRTY_LOG +--------------------- :Capability: basic :Architectures: all @@ -1298,7 +1311,7 @@ See KVM_GET_VCPU_EVENTS for the data structure. :Capability: KVM_CAP_DEBUGREGS :Architectures: x86 -:Type: vm ioctl +:Type: vcpu ioctl :Parameters: struct kvm_debugregs (out) :Returns: 0 on success, -1 on error @@ -1320,7 +1333,7 @@ Reads debug registers from the vcpu. :Capability: KVM_CAP_DEBUGREGS :Architectures: x86 -:Type: vm ioctl +:Type: vcpu ioctl :Parameters: struct kvm_debugregs (in) :Returns: 0 on success, -1 on error @@ -1429,6 +1442,8 @@ because of a quirk in the virtualization implementation (see the internals documentation when it pops into existence). +.. _KVM_ENABLE_CAP: + 4.37 KVM_ENABLE_CAP ------------------- @@ -2116,8 +2131,8 @@ TLB, prior to calling KVM_RUN on the associated vcpu. The "bitmap" field is the userspace address of an array. This array consists of a number of bits, equal to the total number of TLB entries as -determined by the last successful call to KVM_CONFIG_TLB, rounded up to the -nearest multiple of 64. +determined by the last successful call to ``KVM_ENABLE_CAP(KVM_CAP_SW_TLB)``, +rounded up to the nearest multiple of 64. Each bit corresponds to one TLB entry, ordered the same as in the shared TLB array. @@ -2170,42 +2185,6 @@ userspace update the TCE table directly which is useful in some circumstances. -4.63 KVM_ALLOCATE_RMA ---------------------- - -:Capability: KVM_CAP_PPC_RMA -:Architectures: powerpc -:Type: vm ioctl -:Parameters: struct kvm_allocate_rma (out) -:Returns: file descriptor for mapping the allocated RMA - -This allocates a Real Mode Area (RMA) from the pool allocated at boot -time by the kernel. An RMA is a physically-contiguous, aligned region -of memory used on older POWER processors to provide the memory which -will be accessed by real-mode (MMU off) accesses in a KVM guest. -POWER processors support a set of sizes for the RMA that usually -includes 64MB, 128MB, 256MB and some larger powers of two. - -:: - - /* for KVM_ALLOCATE_RMA */ - struct kvm_allocate_rma { - __u64 rma_size; - }; - -The return value is a file descriptor which can be passed to mmap(2) -to map the allocated RMA into userspace. The mapped area can then be -passed to the KVM_SET_USER_MEMORY_REGION ioctl to establish it as the -RMA for a virtual machine. The size of the RMA in bytes (which is -fixed at host kernel boot time) is returned in the rma_size field of -the argument structure. - -The KVM_CAP_PPC_RMA capability is 1 or 2 if the KVM_ALLOCATE_RMA ioctl -is supported; 2 if the processor requires all virtual machines to have -an RMA, or 1 if the processor can use an RMA but doesn't require it, -because it supports the Virtual RMA (VRMA) facility. - - 4.64 KVM_NMI ------------ @@ -2602,7 +2581,7 @@ Specifically: ======================= ========= ===== ======================================= .. [1] These encodings are not accepted for SVE-enabled vcpus. See - KVM_ARM_VCPU_INIT. + :ref:`KVM_ARM_VCPU_INIT`. The equivalent register content can be accessed via bits [127:0] of the corresponding SVE Zn registers instead for vcpus that have SVE @@ -3593,6 +3572,27 @@ Errors: This ioctl returns the guest registers that are supported for the KVM_GET_ONE_REG/KVM_SET_ONE_REG calls. +Note that s390 does not support KVM_GET_REG_LIST for historical reasons +(read: nobody cared). The set of registers in kernels 4.x and newer is: + +- KVM_REG_S390_TODPR + +- KVM_REG_S390_EPOCHDIFF + +- KVM_REG_S390_CPU_TIMER + +- KVM_REG_S390_CLOCK_COMP + +- KVM_REG_S390_PFTOKEN + +- KVM_REG_S390_PFCOMPARE + +- KVM_REG_S390_PFSELECT + +- KVM_REG_S390_PP + +- KVM_REG_S390_GBEA + 4.85 KVM_ARM_SET_DEVICE_ADDR (deprecated) ----------------------------------------- @@ -4956,8 +4956,8 @@ Coalesced pio is based on coalesced mmio. There is little difference between coalesced mmio and pio except that coalesced pio records accesses to I/O ports. -4.117 KVM_CLEAR_DIRTY_LOG (vm ioctl) ------------------------------------- +4.117 KVM_CLEAR_DIRTY_LOG +------------------------- :Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 :Architectures: x86, arm64, mips @@ -5093,8 +5093,8 @@ Recognised values for feature: Finalizes the configuration of the specified vcpu feature. The vcpu must already have been initialised, enabling the affected feature, by -means of a successful KVM_ARM_VCPU_INIT call with the appropriate flag set in -features[]. +means of a successful :ref:`KVM_ARM_VCPU_INIT <KVM_ARM_VCPU_INIT>` call with the +appropriate flag set in features[]. For affected vcpu features, this is a mandatory step that must be performed before the vcpu is fully usable. @@ -5266,7 +5266,7 @@ the cpu reset definition in the POP (Principles Of Operation). 4.123 KVM_S390_INITIAL_RESET ---------------------------- -:Capability: none +:Capability: basic :Architectures: s390 :Type: vcpu ioctl :Parameters: none @@ -6205,7 +6205,7 @@ applied. .. _KVM_ARM_GET_REG_WRITABLE_MASKS: 4.139 KVM_ARM_GET_REG_WRITABLE_MASKS -------------------------------------------- +------------------------------------ :Capability: KVM_CAP_ARM_SUPPORTED_REG_MASK_RANGES :Architectures: arm64 @@ -6443,6 +6443,8 @@ the capability to be present. `flags` must currently be zero. +.. _kvm_run: + 5. The kvm_run structure ======================== @@ -6855,6 +6857,10 @@ the first `ndata` items (possibly zero) of the data array are valid. the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI specification. + - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2 + if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI + specification. + - for RISC-V, data[0] is set to the value of the second argument of the ``sbi_system_reset`` call. @@ -6888,6 +6894,12 @@ either: - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2 "Caller responsibilities" for possible return values. +Hibernation using the PSCI SYSTEM_OFF2 call is enabled when PSCI v1.3 +is enabled. If a guest invokes the PSCI SYSTEM_OFF2 function, KVM will +exit to userspace with the KVM_SYSTEM_EVENT_SHUTDOWN event type and with +data[0] set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only +supported hibernate type for the SYSTEM_OFF2 function is HIBERNATE_OFF. + :: /* KVM_EXIT_IOAPIC_EOI */ @@ -7162,11 +7174,15 @@ primary storage for certain register types. Therefore, the kernel may use the values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set. +.. _cap_enable: + 6. Capabilities that can be enabled on vCPUs ============================================ There are certain capabilities that change the behavior of the virtual CPU or -the virtual machine when enabled. To enable them, please see section 4.37. +the virtual machine when enabled. To enable them, please see +:ref:`KVM_ENABLE_CAP`. + Below you can find a list of capabilities and what their effect on the vCPU or the virtual machine is when enabling them. @@ -7375,7 +7391,7 @@ KVM API and also from the guest. sets are supported (bitfields defined in arch/x86/include/uapi/asm/kvm.h). -As described above in the kvm_sync_regs struct info in section 5 (kvm_run): +As described above in the kvm_sync_regs struct info in section :ref:`kvm_run`, KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers without having to call SET/GET_*REGS". This reduces overhead by eliminating repeated ioctl calls for setting and/or getting register values. This is @@ -7421,13 +7437,15 @@ Unused bitfields in the bitarrays must be set to zero. This capability connects the vcpu to an in-kernel XIVE device. +.. _cap_enable_vm: + 7. Capabilities that can be enabled on VMs ========================================== There are certain capabilities that change the behavior of the virtual -machine when enabled. To enable them, please see section 4.37. Below -you can find a list of capabilities and what their effect on the VM -is when enabling them. +machine when enabled. To enable them, please see section +:ref:`KVM_ENABLE_CAP`. Below you can find a list of capabilities and +what their effect on the VM is when enabling them. The following information is provided along with the description: @@ -8107,6 +8125,28 @@ KVM_X86_QUIRK_SLOT_ZAP_ALL By default, for KVM_X86_DEFAULT_VM VMs, KVM or moved memslot isn't reachable, i.e KVM _may_ invalidate only SPTEs related to the memslot. + +KVM_X86_QUIRK_STUFF_FEATURE_MSRS By default, at vCPU creation, KVM sets the + vCPU's MSR_IA32_PERF_CAPABILITIES (0x345), + MSR_IA32_ARCH_CAPABILITIES (0x10a), + MSR_PLATFORM_INFO (0xce), and all VMX MSRs + (0x480..0x492) to the maximal capabilities + supported by KVM. KVM also sets + MSR_IA32_UCODE_REV (0x8b) to an arbitrary + value (which is different for Intel vs. + AMD). Lastly, when guest CPUID is set (by + userspace), KVM modifies select VMX MSR + fields to force consistency between guest + CPUID and L2's effective ISA. When this + quirk is disabled, KVM zeroes the vCPU's MSR + values (with two exceptions, see below), + i.e. treats the feature MSRs like CPUID + leaves and gives userspace full control of + the vCPU model definition. This quirk does + not affect VMX MSRs CR0/CR4_FIXED1 (0x487 + and 0x489), as KVM does now allow them to + be set by userspace (KVM sets them based on + guest CPUID, for safety purposes). =================================== ============================================ 7.32 KVM_CAP_MAX_VCPU_ID @@ -8588,6 +8628,8 @@ guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf (0x40000001). Otherwise, a guest may use the paravirtual features regardless of what has actually been exposed through the CPUID leaf. +.. _KVM_CAP_DIRTY_LOG_RING: + 8.29 KVM_CAP_DIRTY_LOG_RING/KVM_CAP_DIRTY_LOG_RING_ACQ_REL ---------------------------------------------------------- |