diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-12-20 19:44:05 +0100 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-12-20 19:44:05 +0100 |
commit | 6a447b0e3151893f6d4a889956553c06d2e775c6 (patch) | |
tree | 0f0c149c03dd8c2e9a5fbe01d6de528b2724893e /Documentation | |
parent | Merge tag 'rtc-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/abellon... (diff) | |
parent | KVM: SVM: fix 32-bit compilation (diff) | |
download | linux-6a447b0e3151893f6d4a889956553c06d2e775c6.tar.xz linux-6a447b0e3151893f6d4a889956553c06d2e775c6.zip |
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Much x86 work was pushed out to 5.12, but ARM more than made up for it.
ARM:
- PSCI relay at EL2 when "protected KVM" is enabled
- New exception injection code
- Simplification of AArch32 system register handling
- Fix PMU accesses when no PMU is enabled
- Expose CSV3 on non-Meltdown hosts
- Cache hierarchy discovery fixes
- PV steal-time cleanups
- Allow function pointers at EL2
- Various host EL2 entry cleanups
- Simplification of the EL2 vector allocation
s390:
- memcg accouting for s390 specific parts of kvm and gmap
- selftest for diag318
- new kvm_stat for when async_pf falls back to sync
x86:
- Tracepoints for the new pagetable code from 5.10
- Catch VFIO and KVM irqfd events before userspace
- Reporting dirty pages to userspace with a ring buffer
- SEV-ES host support
- Nested VMX support for wait-for-SIPI activity state
- New feature flag (AVX512 FP16)
- New system ioctl to report Hyper-V-compatible paravirtualization features
Generic:
- Selftest improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: SVM: fix 32-bit compilation
KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
KVM: SVM: Provide support to launch and run an SEV-ES guest
KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
KVM: SVM: Provide support for SEV-ES vCPU loading
KVM: SVM: Provide support for SEV-ES vCPU creation/loading
KVM: SVM: Update ASID allocation to support SEV-ES guests
KVM: SVM: Set the encryption mask for the SVM host save area
KVM: SVM: Add NMI support for an SEV-ES guest
KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
KVM: SVM: Do not report support for SMM for an SEV-ES guest
KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
KVM: SVM: Add support for EFER write traps for an SEV-ES guest
KVM: SVM: Support string IO operations for an SEV-ES guest
KVM: SVM: Support MMIO for an SEV-ES guest
KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
KVM: SVM: Create trace events for VMGEXIT processing
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/kernel-parameters.txt | 10 | ||||
-rw-r--r-- | Documentation/arm64/memory.rst | 2 | ||||
-rw-r--r-- | Documentation/virt/kvm/api.rst | 116 | ||||
-rw-r--r-- | Documentation/virt/kvm/arm/pvtime.rst | 4 |
4 files changed, 124 insertions, 8 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index d24336109c2e..c722ec19cd00 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2254,6 +2254,16 @@ for all guests. Default is 1 (enabled) if in 64-bit or 32-bit PAE mode. + kvm-arm.mode= + [KVM,ARM] Select one of KVM/arm64's modes of operation. + + protected: nVHE-based mode with support for guests whose + state is kept private from the host. + Not valid if the kernel is running in EL2. + + Defaults to VHE/nVHE based on hardware support and + the value of CONFIG_ARM64_VHE. + kvm-arm.vgic_v3_group0_trap= [KVM,ARM] Trap guest accesses to GICv3 group-0 system registers diff --git a/Documentation/arm64/memory.rst b/Documentation/arm64/memory.rst index e7522e5c8322..901cd094f4ec 100644 --- a/Documentation/arm64/memory.rst +++ b/Documentation/arm64/memory.rst @@ -97,7 +97,7 @@ hypervisor maps kernel pages in EL2 at a fixed (and potentially random) offset from the linear mapping. See the kern_hyp_va macro and kvm_update_va_mask function for more details. MMIO devices such as GICv2 gets mapped next to the HYP idmap page, as do vectors when -ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs. +ARM64_SPECTRE_V3A is enabled for particular CPUs. When using KVM with the Virtualization Host Extensions, no additional mappings are created, since the host kernel runs directly in EL2. diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index e00a66d72372..70254eaa5229 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -262,6 +262,18 @@ The KVM_RUN ioctl (cf.) communicates with userspace via a shared memory region. This ioctl returns the size of that region. See the KVM_RUN documentation for details. +Besides the size of the KVM_RUN communication region, other areas of +the VCPU file descriptor can be mmap-ed, including: + +- if KVM_CAP_COALESCED_MMIO is available, a page at + KVM_COALESCED_MMIO_PAGE_OFFSET * PAGE_SIZE; for historical reasons, + this page is included in the result of KVM_GET_VCPU_MMAP_SIZE. + KVM_CAP_COALESCED_MMIO is not documented yet. + +- if KVM_CAP_DIRTY_LOG_RING is available, a number of pages at + KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE. For more information on + KVM_CAP_DIRTY_LOG_RING, see section 8.3. + 4.6 KVM_SET_MEMORY_REGION ------------------------- @@ -4455,9 +4467,9 @@ that KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 is present. 4.118 KVM_GET_SUPPORTED_HV_CPUID -------------------------------- -:Capability: KVM_CAP_HYPERV_CPUID +:Capability: KVM_CAP_HYPERV_CPUID (vcpu), KVM_CAP_SYS_HYPERV_CPUID (system) :Architectures: x86 -:Type: vcpu ioctl +:Type: system ioctl, vcpu ioctl :Parameters: struct kvm_cpuid2 (in/out) :Returns: 0 on success, -1 on error @@ -4502,9 +4514,6 @@ Currently, the following list of CPUID leaves are returned: - HYPERV_CPUID_SYNDBG_INTERFACE - HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES -HYPERV_CPUID_NESTED_FEATURES leaf is only exposed when Enlightened VMCS was -enabled on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS). - Userspace invokes KVM_GET_SUPPORTED_HV_CPUID by passing a kvm_cpuid2 structure with the 'nent' field indicating the number of entries in the variable-size array 'entries'. If the number of entries is too low to describe all Hyper-V @@ -4515,6 +4524,15 @@ number of valid entries in the 'entries' array, which is then filled. 'index' and 'flags' fields in 'struct kvm_cpuid_entry2' are currently reserved, userspace should not expect to get any particular value there. +Note, vcpu version of KVM_GET_SUPPORTED_HV_CPUID is currently deprecated. Unlike +system ioctl which exposes all supported feature bits unconditionally, vcpu +version has the following quirks: +- HYPERV_CPUID_NESTED_FEATURES leaf and HV_X64_ENLIGHTENED_VMCS_RECOMMENDED + feature bit are only exposed when Enlightened VMCS was previously enabled + on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS). +- HV_STIMER_DIRECT_MODE_AVAILABLE bit is only exposed with in-kernel LAPIC. + (presumes KVM_CREATE_IRQCHIP has already been called). + 4.119 KVM_ARM_VCPU_FINALIZE --------------------------- @@ -6390,3 +6408,91 @@ When enabled, KVM will disable paravirtual features provided to the guest according to the bits in the KVM_CPUID_FEATURES CPUID leaf (0x40000001). Otherwise, a guest may use the paravirtual features regardless of what has actually been exposed through the CPUID leaf. + + +8.29 KVM_CAP_DIRTY_LOG_RING +--------------------------- + +:Architectures: x86 +:Parameters: args[0] - size of the dirty log ring + +KVM is capable of tracking dirty memory using ring buffers that are +mmaped into userspace; there is one dirty ring per vcpu. + +The dirty ring is available to userspace as an array of +``struct kvm_dirty_gfn``. Each dirty entry it's defined as:: + + struct kvm_dirty_gfn { + __u32 flags; + __u32 slot; /* as_id | slot_id */ + __u64 offset; + }; + +The following values are defined for the flags field to define the +current state of the entry:: + + #define KVM_DIRTY_GFN_F_DIRTY BIT(0) + #define KVM_DIRTY_GFN_F_RESET BIT(1) + #define KVM_DIRTY_GFN_F_MASK 0x3 + +Userspace should call KVM_ENABLE_CAP ioctl right after KVM_CREATE_VM +ioctl to enable this capability for the new guest and set the size of +the rings. Enabling the capability is only allowed before creating any +vCPU, and the size of the ring must be a power of two. The larger the +ring buffer, the less likely the ring is full and the VM is forced to +exit to userspace. The optimal size depends on the workload, but it is +recommended that it be at least 64 KiB (4096 entries). + +Just like for dirty page bitmaps, the buffer tracks writes to +all user memory regions for which the KVM_MEM_LOG_DIRTY_PAGES flag was +set in KVM_SET_USER_MEMORY_REGION. Once a memory region is registered +with the flag set, userspace can start harvesting dirty pages from the +ring buffer. + +An entry in the ring buffer can be unused (flag bits ``00``), +dirty (flag bits ``01``) or harvested (flag bits ``1X``). The +state machine for the entry is as follows:: + + dirtied harvested reset + 00 -----------> 01 -------------> 1X -------+ + ^ | + | | + +------------------------------------------+ + +To harvest the dirty pages, userspace accesses the mmaped ring buffer +to read the dirty GFNs. If the flags has the DIRTY bit set (at this stage +the RESET bit must be cleared), then it means this GFN is a dirty GFN. +The userspace should harvest this GFN and mark the flags from state +``01b`` to ``1Xb`` (bit 0 will be ignored by KVM, but bit 1 must be set +to show that this GFN is harvested and waiting for a reset), and move +on to the next GFN. The userspace should continue to do this until the +flags of a GFN have the DIRTY bit cleared, meaning that it has harvested +all the dirty GFNs that were available. + +It's not necessary for userspace to harvest the all dirty GFNs at once. +However it must collect the dirty GFNs in sequence, i.e., the userspace +program cannot skip one dirty GFN to collect the one next to it. + +After processing one or more entries in the ring buffer, userspace +calls the VM ioctl KVM_RESET_DIRTY_RINGS to notify the kernel about +it, so that the kernel will reprotect those collected GFNs. +Therefore, the ioctl must be called *before* reading the content of +the dirty pages. + +The dirty ring can get full. When it happens, the KVM_RUN of the +vcpu will return with exit reason KVM_EXIT_DIRTY_LOG_FULL. + +The dirty ring interface has a major difference comparing to the +KVM_GET_DIRTY_LOG interface in that, when reading the dirty ring from +userspace, it's still possible that the kernel has not yet flushed the +processor's dirty page buffers into the kernel buffer (with dirty bitmaps, the +flushing is done by the KVM_GET_DIRTY_LOG ioctl). To achieve that, one +needs to kick the vcpu out of KVM_RUN using a signal. The resulting +vmexit ensures that all dirty GFNs are flushed to the dirty rings. + +NOTE: the capability KVM_CAP_DIRTY_LOG_RING and the corresponding +ioctl KVM_RESET_DIRTY_RINGS are mutual exclusive to the existing ioctls +KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG. After enabling +KVM_CAP_DIRTY_LOG_RING with an acceptable dirty ring size, the virtual +machine will switch to ring-buffer dirty page tracking and further +KVM_GET_DIRTY_LOG or KVM_CLEAR_DIRTY_LOG ioctls will fail. diff --git a/Documentation/virt/kvm/arm/pvtime.rst b/Documentation/virt/kvm/arm/pvtime.rst index 687b60d76ca9..392521af7c90 100644 --- a/Documentation/virt/kvm/arm/pvtime.rst +++ b/Documentation/virt/kvm/arm/pvtime.rst @@ -19,8 +19,8 @@ Two new SMCCC compatible hypercalls are defined: These are only available in the SMC64/HVC64 calling convention as paravirtualized time is not available to 32 bit Arm guests. The existence of -the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES -mechanism before calling it. +the PV_TIME_FEATURES hypercall should be probed using the SMCCC 1.1 +ARCH_FEATURES mechanism before calling it. PV_TIME_FEATURES ============= ======== ========== |