path: root/arch/x86/kvm/kvm_emulate.h

Commit log (most recent first). Each entry lists the subject, author, commit date, files changed, and lines added/removed.

* KVM: x86: Use "is Intel compatible" helper to emulate SYSCALL in !64-bitSean Christopherson2024-06-101-0/+1
| | | | | | | | | | | | Use guest_cpuid_is_intel_compatible() to determine whether SYSCALL in 32-bit Protected Mode (including Compatibility Mode) should #UD or succeed. The existing code already does the exact equivalent of guest_cpuid_is_intel_compatible(), just in a rather roundabout way. No functional change intended. Link: https://lore.kernel.org/r/20240405235603.1173076-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
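A minimal sketch of the emulator-side check the changelog describes. The helper shape and the op name below are assumptions for illustration, not quoted from the diff; only the vendor-dependent #UD behavior is from the commit:

    /*
     * Illustrative sketch: outside 64-bit mode, SYSCALL #UDs on
     * Intel-compatible vCPUs but succeeds on AMD-compatible ones.
     * The op name is hypothetical.
     */
    static int em_syscall_mode_check(struct x86_emulate_ctxt *ctxt)
    {
            if (ctxt->mode == X86EMUL_MODE_PROT64)
                    return X86EMUL_CONTINUE;

            if (ctxt->ops->is_guest_vendor_intel_compat(ctxt))
                    return emulate_ud(ctxt);

            return X86EMUL_CONTINUE;
    }
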
* KVM: x86: Move nEPT exit_qualification field from kvm_vcpu_arch to x86_exception (Sean Christopherson, 2024-04-09, 1 file, +1/-0)

  Move the exit_qualification field that is used to track information about in-flight nEPT violations from "struct kvm_vcpu_arch" to "x86_exception", i.e. associate the information with the actual nEPT violation instead of the vCPU. To handle bits that are pulled from vmcs.EXIT_QUALIFICATION, i.e. that are propagated from the "original" EPT violation VM-Exit, simply grab them from the VMCS on-demand when injecting a nEPT Violation or a PML Full VM-exit.

  Aside from being ugly, having an exit_qualification field in kvm_vcpu_arch is outright dangerous, e.g. see commit d7f0a00e438d ("KVM: VMX: Report up-to-date exit qualification to userspace").

  Opportunistically add a comment to call out that PML Full and EPT Violation VM-Exits use the same bit to report NMI blocking information.

  Link: https://lore.kernel.org/r/20240209221700.393189-3-seanjc@google.com
  Signed-off-by: Sean Christopherson <seanjc@google.com>

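For reference, a sketch of where the field lands; the surrounding members follow kvm_emulate.h's x86_exception of this era, and the new field's comment is paraphrased:

    /* Sketch of struct x86_exception with the relocated field. */
    struct x86_exception {
            u8 vector;
            bool error_code_valid;
            u16 error_code;
            bool nested_page_fault;
            u64 address;            /* cr2 or nested page fault gpa */
            u8 async_page_fault;
            unsigned long exit_qualification;  /* in-flight nEPT violation info */
    };
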
* Merge tag 'kvm-x86-pmu-6.9' of https://github.com/kvm-x86/linux into HEAD (Paolo Bonzini, 2024-03-11, 1 file, +1/-1)

  KVM x86 PMU changes for 6.9:

  - Fix several bugs where KVM speciously prevents the guest from utilizing fixed counters and architectural event encodings based on whether or not guest CPUID reports support for the _architectural_ encoding.

  - Fix a variety of bugs in KVM's emulation of RDPMC, e.g. for "fast" reads, priority of VMX interception vs #GP, PMC types in architectural PMUs, etc.

  - Add a selftest to verify KVM correctly emulates RDPMC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID, i.e. are difficult to validate via KVM-Unit-Tests.

  - Zero out PMU metadata on AMD if the virtual PMU is disabled to avoid wasting cycles, e.g. when checking if a PMC event needs to be synthesized when skipping an instruction.

  - Optimize triggering of emulated events, e.g. for "count instructions" events when skipping an instruction, which yields a ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest.

  - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit.

| * KVM: x86/pmu: Prioritize VMX interception over #GP on RDPMC due to bad index (Sean Christopherson, 2024-01-31, 1 file, +1/-1)

  Apply the pre-intercept RDPMC validity check only to AMD, and rename all relevant functions to make it as clear as possible that the check is not a standard PMC index check. On Intel, the basic rule is that only invalid opcodes and privilege/permission/mode checks have priority over VM-Exit, i.e. RDPMC with an invalid index should VM-Exit, not #GP. While the SDM doesn't explicitly call out RDPMC, it _does_ explicitly use RDMSR of a non-existent MSR as an example where VM-Exit has priority over #GP, and RDPMC is effectively just a variation of RDMSR. Manually testing on various Intel CPUs confirms this behavior, and the inverted priority was introduced for SVM compatibility, i.e. was not an intentional change for Intel PMUs. On AMD, *all* exceptions on RDPMC have priority over VM-Exit.

  Check for a NULL kvm_pmu_ops.check_rdpmc_early instead of using a RET0 static call so as to provide a convenient location to document the difference between Intel and AMD, and to again try to make it as obvious as possible that the early check is a one-off thing, not a generic "is this PMC valid?" helper.

  Fixes: 8061252ee0d2 ("KVM: SVM: Add intercept checks for remaining twobyte instructions")
  Cc: Jim Mattson <jmattson@google.com>
  Tested-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
  Link: https://lore.kernel.org/r/20240109230250.424295-8-seanjc@google.com
  Signed-off-by: Sean Christopherson <seanjc@google.com>

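A hedged sketch of the NULL-hook pattern the changelog prefers over a RET0 static call; the hook name is from the changelog, the call site is illustrative:

    /* Illustrative call site, not the exact diff. */
    if (kvm_pmu_ops.check_rdpmc_early) {
            /*
             * AMD: all RDPMC exceptions, including #GP on a bad index,
             * have priority over VM-Exit, so validate the index early.
             * Intel: a bad index VM-Exits instead of #GP'ing, so the
             * hook is NULL and no early check is performed.
             */
            r = kvm_pmu_ops.check_rdpmc_early(vcpu, idx);
            if (r)
                    return r;
    }
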
* | KVM: x86: Make kvm_get_dr() return a value, not use an out parameter (Sean Christopherson, 2024-02-23, 1 file, +1/-1)

  Convert kvm_get_dr()'s output parameter to a return value, and clean up most of the mess that was created by forcing callers to provide a pointer.

  No functional change intended.

  Acked-by: Mathias Krause <minipli@grsecurity.net>
  Reviewed-by: Mathias Krause <minipli@grsecurity.net>
  Link: https://lore.kernel.org/r/20240209220752.388160-2-seanjc@google.com
  Signed-off-by: Sean Christopherson <seanjc@google.com>

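The shape of the conversion, sketched with hedged signatures (reads of an in-range DR cannot fail, so the out parameter buys nothing):

    /* Before (sketch): value returned via out parameter. */
    int kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val);

    /* After (sketch): the value is the return value. */
    unsigned long kvm_get_dr(struct kvm_vcpu *vcpu, int dr);

    /* Callers collapse accordingly, e.g.:
     *      kvm_get_dr(vcpu, 6, &dr6);   becomes   dr6 = kvm_get_dr(vcpu, 6);
     */
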
* KVM: x86: Introduce get_untagged_addr() in kvm_x86_ops and call it in emulator (Binbin Wu, 2023-11-29, 1 file, +3/-0)

  Introduce a new interface, get_untagged_addr(), to kvm_x86_ops to untag metadata from a linear address, and call it during linearization in the instruction emulator for 64-bit mode.

  When a feature such as Intel Linear Address Masking (LAM) or AMD Upper Address Ignore (UAI) is enabled, linear addresses may be tagged with metadata that needs to be dropped prior to canonicality checks, i.e. the metadata is ignored.

  Introduce get_untagged_addr() to kvm_x86_ops to hide the vendor specific code, as sadly LAM and UAI have different semantics. Pass the emulator flags to allow the vendor specific implementation to precisely identify the access type (LAM doesn't untag certain accesses).

  Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
  Reviewed-by: Chao Gao <chao.gao@intel.com>
  Tested-by: Xuelian Guo <xuelian.guo@intel.com>
  Link: https://lore.kernel.org/r/20230913124227.12574-9-binbin.wu@linux.intel.com
  [sean: massage changelog]
  Signed-off-by: Sean Christopherson <seanjc@google.com>

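A sketch of the hook and where the emulator would invoke it; the op name and the flags parameter are from the changelog, the exact signature and call-site shape are assumptions:

    /* Sketch of the emulator-facing hook (signature paraphrased). */
    gva_t (*get_untagged_addr)(struct x86_emulate_ctxt *ctxt, gva_t addr,
                               unsigned int flags);

    /*
     * Illustrative use in 64-bit linearization: drop LAM/UAI metadata
     * before the canonicality check; @flags conveys the access type,
     * since LAM does not untag certain accesses.
     */
    la = ctxt->ops->get_untagged_addr(ctxt, la, flags);
    if (emul_is_noncanonical_address(la, ctxt))
            goto bad;
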
* KVM: x86: Add X86EMUL_F_INVLPG and pass it in em_invlpg() (Binbin Wu, 2023-11-29, 1 file, +1/-0)

  Add an emulation flag, X86EMUL_F_INVLPG, which is used to identify an instruction that does TLB invalidation without a true memory access. Of the instructions implemented in the emulator, only invlpg and invlpga are of this kind, and invlpga doesn't need additional information for its emulation, so just pass the flag to em_invlpg().

  Linear Address Masking (LAM) and Linear Address Space Separation (LASS) don't apply to addresses that are inputs to TLB invalidation. The flag will be consumed to support LAM/LASS virtualization.

  Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
  Tested-by: Xuelian Guo <xuelian.guo@intel.com>
  Link: https://lore.kernel.org/r/20230913124227.12574-5-binbin.wu@linux.intel.com
  Signed-off-by: Sean Christopherson <seanjc@google.com>

* KVM: x86: Add an emulation flag for implicit system access (Binbin Wu, 2023-11-29, 1 file, +1/-0)

  Add an emulation flag X86EMUL_F_IMPLICIT to identify implicit system access in instruction emulation. Don't bother wiring up any usage at this point, as Linear Address Space Separation (LASS) will be the first "real" consumer of the flag and LASS support will require dedicated hooks, i.e. there aren't any existing calls where passing X86EMUL_F_IMPLICIT is meaningful.

  Add the IMPLICIT flag even though there's no imminent usage so that Linear Address Masking (LAM) support can reference the flag to document that addresses for implicit accesses aren't untagged.

  Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
  Tested-by: Xuelian Guo <xuelian.guo@intel.com>
  Link: https://lore.kernel.org/r/20230913124227.12574-4-binbin.wu@linux.intel.com
  Signed-off-by: Sean Christopherson <seanjc@google.com>

* KVM: x86: Consolidate flags for __linearize() (Binbin Wu, 2023-11-29, 1 file, +4/-0)

  Consolidate @write and @fetch of __linearize() into a set of flags so that additional flags can be added without needing more/new boolean parameters, to precisely identify the access type.

  No functional change intended.

  Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
  Reviewed-by: Chao Gao <chao.gao@intel.com>
  Acked-by: Kai Huang <kai.huang@intel.com>
  Tested-by: Xuelian Guo <xuelian.guo@intel.com>
  Link: https://lore.kernel.org/r/20230913124227.12574-2-binbin.wu@linux.intel.com
  Signed-off-by: Sean Christopherson <seanjc@google.com>

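Taken together with the two entries above, the flag set in kvm_emulate.h shapes up roughly as below. X86EMUL_F_IMPLICIT and X86EMUL_F_INVLPG are named in this log; the WRITE/FETCH names and all bit values are inferred from the @write/@fetch booleans this patch replaces:

    /* Sketch of the consolidated access-type flags. */
    #define X86EMUL_F_WRITE         BIT(0)
    #define X86EMUL_F_FETCH         BIT(1)
    #define X86EMUL_F_IMPLICIT      BIT(2)
    #define X86EMUL_F_INVLPG        BIT(3)

    /* __linearize() then takes one 'unsigned int flags' instead of
     * 'bool write, bool fetch', e.g. instruction-fetch linearization
     * passes X86EMUL_F_FETCH.
     */
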
* KVM: x86: Remove x86_emulate_ops::guest_has_long_mode (Michal Luczaj, 2023-08-03, 1 file, +0/-1)

  Remove x86_emulate_ops::guest_has_long_mode along with its implementation, emulator_guest_has_long_mode(). It has been unused since commit 1d0da94cdafe ("KVM: x86: do not go through ctxt->ops when emulating rsm").

  No functional change intended.

  Signed-off-by: Michal Luczaj <mhal@rbox.co>
  Link: https://lore.kernel.org/r/20230718101809.1249769-1-mhal@rbox.co
  Signed-off-by: Sean Christopherson <seanjc@google.com>

* KVM: x86: Use emulator callbacks instead of duplicating "host flags" (Maxim Levitsky, 2023-02-01, 1 file, +2/-5)

  Instead of re-defining the "host flags" bits, just expose dedicated helpers for each of the two remaining flags that are consumed by the emulator. The emulator never consumes both "is guest" and "is SMM" in close proximity, so there is no motivation to avoid additional indirect branches.

  Also while at it, garbage collect the recently removed host flags.

  No functional change is intended.

  Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
  Tested-by: Santosh Shukla <Santosh.Shukla@amd.com>
  Link: https://lore.kernel.org/r/20221129193717.513824-6-mlevitsk@redhat.com
  [sean: fix CONFIG_KVM_SMM=n builds, tweak names of wrappers]
  Signed-off-by: Sean Christopherson <seanjc@google.com>

* KVM: x86: do not define SMM-related constants if SMM disabled (Paolo Bonzini, 2022-11-09, 1 file, +0/-1)

  The hidden processor flags HF_SMM_MASK and HF_SMM_INSIDE_NMI_MASK are not needed if CONFIG_KVM_SMM is turned off. Remove the definitions altogether and the code that uses them.

  Suggested-by: Sean Christopherson <seanjc@google.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* KVM: x86: do not go through ctxt->ops when emulating rsm (Paolo Bonzini, 2022-11-09, 1 file, +0/-13)

  Now that RSM is implemented in a single emulator callback, there is no point in going through other callbacks for the sake of modifying processor state. Just invoke KVM's own internal functions directly, and remove the callbacks that were only used by em_rsm; the only substantial difference is in the handling of the segment registers and descriptor cache, which have to be parsed into a struct kvm_segment instead of a struct desc_struct.

  This also fixes a bug where emulator_set_segment was shifting the limit left by 12 if the G bit is set, but the limit had not been shifted right upon entry to SMM.

  The emulator context is still used to restore EIP and the general purpose registers.

  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
  Message-Id: <20220929172016.319443-5-pbonzini@redhat.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* KVM: x86: move SMM exit to a new file (Paolo Bonzini, 2022-11-09, 1 file, +32/-2)

  Some users of KVM implement the UEFI variable store through a paravirtual device that does not require the "SMM lockbox" component of edk2, and would like to compile out system management mode. In preparation for that, move the SMM exit code out of emulate.c and into a new file.

  The code is still written as a series of invocations of the emulator callbacks, but the two exiting_smm and leave_smm callbacks are merged into one, and all the code from em_rsm is now part of the callback. This removes all knowledge of the format of the SMM save state area from the emulator. Further patches will clean up the code and invoke KVM's own functions to access control registers, descriptor caches, etc.

  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
  Message-Id: <20220929172016.319443-4-pbonzini@redhat.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* KVM: x86: Bug the VM if the emulator accesses a non-existent GPR (Sean Christopherson, 2022-06-10, 1 file, +10/-0)

  Bug the VM, i.e. kill it, if the emulator accesses a non-existent GPR, i.e. generates an out-of-bounds GPR index. Continuing on all but guarantees some form of data corruption in the guest, e.g. even if KVM were to redirect to a dummy register, KVM would incorrectly read zeros and drop writes.

  Note, bugging the VM doesn't completely prevent data corruption, e.g. the current round of emulation will complete before the vCPU bails out to userspace. But, the very act of killing the guest can also cause data corruption, e.g. due to lack of file writeback before termination, so taking on additional complexity to cleanly bail out of the emulator isn't justified, the goal is purely to stem the bleeding and alert userspace that something has gone horribly wrong, i.e. to avoid _silent_ data corruption.

  Signed-off-by: Sean Christopherson <seanjc@google.com>
  Reviewed-by: Kees Cook <keescook@chromium.org>
  Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
  Message-Id: <20220526210817.3428868-7-seanjc@google.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

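A sketch of the guard this adds to the emulator's GPR accessors, assuming a vm_bugged-style emulator callback; close in spirit to the patch, but reconstructed rather than copied:

    /* Sketch: kill the VM instead of silently using a bad index. */
    #define KVM_EMULATOR_BUG_ON(cond, ctxt)         \
    ({                                              \
            int __ret = (cond);                     \
                                                    \
            if (WARN_ON_ONCE(__ret))                \
                    ctxt->ops->vm_bugged(ctxt);     \
            unlikely(__ret);                        \
    })

    static inline unsigned long reg_read(struct x86_emulate_ctxt *ctxt,
                                         unsigned int nr)
    {
            if (KVM_EMULATOR_BUG_ON(nr >= NR_EMULATOR_GPRS, ctxt))
                    nr &= NR_EMULATOR_GPRS - 1;  /* stem the bleeding */

            /* ... normal valid/dirty tracking elided ... */
            return ctxt->_regs[nr];
    }
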
* KVM: x86: Reduce the number of emulator GPRs to '8' for 32-bit KVM (Sean Christopherson, 2022-06-10, 1 file, +4/-3)

  Reduce the number of GPRs emulated by 32-bit KVM from 16 to 8. KVM does not support emulating 64-bit mode on 32-bit host kernels, and so should never generate accesses to R8-15.

  Opportunistically use NR_EMULATOR_GPRS in rsm_load_state_{32,64}() now that it is precise and accurate for both flavors.

  Wrap the definition with full #ifdef ugliness; sadly, IS_ENABLED() doesn't guarantee a compile-time constant as far as BUILD_BUG_ON() is concerned.

  Signed-off-by: Sean Christopherson <seanjc@google.com>
  Reviewed-by: Kees Cook <keescook@chromium.org>
  Message-Id: <20220526210817.3428868-6-seanjc@google.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

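The described #ifdef, sketched; this is the natural reading of the changelog, though the exact spelling in the header may differ:

    /* Sketch: BUILD_BUG_ON() needs a true compile-time constant, so an
     * explicit #ifdef is used instead of IS_ENABLED(CONFIG_X86_64).
     */
    #ifdef CONFIG_X86_64
    #define NR_EMULATOR_GPRS        16
    #else
    #define NR_EMULATOR_GPRS        8
    #endif
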
* KVM: x86: Use 16-bit fields to track dirty/valid emulator GPRs (Sean Christopherson, 2022-06-10, 1 file, +2/-2)

  Use a u16 instead of a u32 to track the dirty/valid status of GPRs in the emulator. Unlike struct kvm_vcpu_arch, x86_emulate_ctxt tracks only the "true" GPRs, i.e. doesn't include RIP in its array, and so only needs to track 16 registers.

  Note, maxing out at 16 GPRs is a fundamental property of x86-64 and will not change barring a massive architecture update. Legacy x86 ModRM and SIB encodings use 3 bits for GPRs, i.e. support 8 registers. x86-64 uses a single bit in the REX prefix for each possible reference type to double the number of supported GPRs to 16 registers (4 bits).

  Reviewed-by: Kees Cook <keescook@chromium.org>
  Signed-off-by: Sean Christopherson <seanjc@google.com>
  Message-Id: <20220526210817.3428868-5-seanjc@google.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

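A sketch of the tracking fields and how a bitmap like this is typically consumed on writeback; the member and op names match the emulator's conventions, but the loop is illustrative:

    /* Sketch: one bit per GPR; 16 GPRs max by architecture. */
    u16 regs_valid;  /* bitmap of registers read from the vCPU */
    u16 regs_dirty;  /* bitmap of registers that need writeback */

    /* Illustrative writeback: flush only dirty GPRs back to the vCPU. */
    unsigned long dirty = ctxt->regs_dirty;
    unsigned int reg;

    for_each_set_bit(reg, &dirty, NR_EMULATOR_GPRS)
            ctxt->ops->write_gpr(ctxt, reg, ctxt->_regs[reg]);
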
* KVM: x86: Omit VCPU_REGS_RIP from emulator's _regs array (Sean Christopherson, 2022-06-10, 1 file, +12/-1)

  Omit RIP from the emulator's _regs array, which is used only for GPRs, i.e. registers that can be referenced via ModRM and/or SIB bytes. The emulator uses the dedicated _eip field for RIP, and manually reads from _eip to handle RIP-relative addressing.

  To avoid an even bigger, slightly more dangerous change, hardcode the number of GPRs to 16 for the time being even though 32-bit KVM's emulator technically should only have 8 GPRs. Add a TODO to address that in a future commit.

  See also the comments above the read_gpr() and write_gpr() declarations, and obviously the handling in writeback_registers().

  No functional change intended.

  Signed-off-by: Sean Christopherson <seanjc@google.com>
  Reviewed-by: Kees Cook <keescook@chromium.org>
  Message-Id: <20220526210817.3428868-4-seanjc@google.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Linus Torvalds, 2022-04-02, 1 file, +3/-0)

  Pull kvm fixes from Paolo Bonzini:

  - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr
  - Documentation improvements
  - Prevent module exit until all VMs are freed
  - PMU Virtualization fixes
  - Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences
  - Other miscellaneous bugfixes

  * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits)
    KVM: x86: fix sending PV IPI
    KVM: x86/mmu: do compare-and-exchange of gPTE via the user address
    KVM: x86: Remove redundant vm_entry_controls_clearbit() call
    KVM: x86: cleanup enter_rmode()
    KVM: x86: SVM: fix tsc scaling when the host doesn't support it
    kvm: x86: SVM: remove unused defines
    KVM: x86: SVM: move tsc ratio definitions to svm.h
    KVM: x86: SVM: fix avic spec based definitions again
    KVM: MIPS: remove reference to trap&emulate virtualization
    KVM: x86: document limitations of MSR filtering
    KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr
    KVM: x86/emulator: Emulate RDPID only if it is enabled in guest
    KVM: x86/pmu: Fix and isolate TSX-specific performance event logic
    KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set
    KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs
    KVM: x86: Trace all APICv inhibit changes and capture overall status
    KVM: x86: Add wrappers for setting/clearing APICv inhibits
    KVM: x86: Make APICv inhibit reasons an enum and cleanup naming
    KVM: X86: Handle implicit supervisor access with SMAP
    KVM: X86: Rename variable smap to not_smap in permission_fault()
    ...

| * KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr (Hou Wenlong, 2022-04-02, 1 file, +2/-0)

  If MSR access is rejected by MSR filtering, kvm_set_msr()/kvm_get_msr() would return KVM_MSR_RET_FILTERED, and the return value is only handled well for rdmsr/wrmsr. However, some instruction emulation and state transitions also use kvm_set_msr()/kvm_get_msr() to do MSR access, and may trigger unexpected results if the access is rejected. E.g. RDPID emulation would inject a #UD, even though RDPID wouldn't cause an exit when RDPID is supported in hardware and ENABLE_RDTSCP is set. It would also cause failures when loading MSRs at nested entry/exit. Since MSR filtering is based on the MSR bitmap, it is better to only do MSR filtering for rdmsr/wrmsr.

  Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
  Message-Id: <2b2774154f7532c96a6f04d71c82a8bec7d9e80b.1646655860.git.houwenlong.hwl@antgroup.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

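One plausible shape for the split this describes, with filtered variants confined to the rdmsr/wrmsr emulation paths; the op names below are assumptions, not quoted from the diff (the two added lines in kvm_emulate.h would be consistent with a pair of new ops):

    /* Sketch: filtered variants for the rdmsr/wrmsr paths only. */
    int (*get_msr_with_filter)(struct x86_emulate_ctxt *ctxt,
                               u32 msr_index, u64 *pdata);
    int (*set_msr_with_filter)(struct x86_emulate_ctxt *ctxt,
                               u32 msr_index, u64 data);

    /* Internal consumers (nested entry/exit MSR loads, RDPID
     * emulation, ...) keep using the unfiltered get_msr()/set_msr().
     */
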
| * KVM: x86/emulator: Emulate RDPID only if it is enabled in guest (Hou Wenlong, 2022-04-02, 1 file, +1/-0)

  When RDTSCP is supported but RDPID is not supported in the host, RDPID emulation is available. However, __kvm_get_msr() only fails when RDTSCP and RDPID are both disabled in the guest, so the emulator wouldn't inject a #UD when RDPID is disabled but RDTSCP is enabled in the guest.

  Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
  Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
  Message-Id: <1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

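The one-line addition to kvm_emulate.h plausibly looks like the hook below, letting em_rdpid() consult guest CPUID directly instead of inferring support from an MSR read failure (hedged reconstruction):

    /* Sketch: new emulator op querying guest CPUID for RDPID. */
    bool (*guest_has_rdpid)(struct x86_emulate_ctxt *ctxt);

    /* Illustrative use in em_rdpid():
     *
     *      if (!ctxt->ops->guest_has_rdpid(ctxt))
     *              return emulate_ud(ctxt);
     */
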
* | KVM: x86: Replace memset() "optimization" with normal per-field writes (Sean Christopherson, 2022-02-14, 1 file, +1/-5)

  Explicitly zero select fields in the emulator's decode cache instead of zeroing the fields via a gross memset() that spans six fields. gcc and clang are both clever enough to batch the first five fields into a single quadword MOV, i.e. memset() and individually zeroing generate identical code.

  Removing the wart also prepares KVM for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memset().

  No functional change intended.

  Reported-by: Kees Cook <keescook@chromium.org>
  Signed-off-by: Sean Christopherson <seanjc@google.com>
  Link: https://lore.kernel.org/lkml/YR0jIEzEcUom/7rd@google.com
  Signed-off-by: Kees Cook <keescook@chromium.org>

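The pattern being replaced, sketched with assumed field names (the changelog says six fields but doesn't list them here):

    /* Before (sketch): one memset() spanning six struct members,
     * which FORTIFY_SOURCE flags as an out-of-bounds field write.
     */
    memset(&ctxt->rip_relative, 0,
           (void *)&ctxt->modrm - (void *)&ctxt->rip_relative);

    /* After (sketch): explicit per-field zeroing; the compiler batches
     * adjacent stores, so the generated code is identical.
     */
    ctxt->rip_relative = false;
    ctxt->rex_prefix = 0;
    ctxt->lock_prefix = 0;
    ctxt->rep_prefix = 0;
    ctxt->regs_valid = 0;
    ctxt->regs_dirty = 0;
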
* KVM: x86: Update vPMCs when retiring branch instructions (Eric Hankland, 2022-01-07, 1 file, +1/-0)

  When KVM retires a guest branch instruction through emulation, increment any vPMCs that are configured to monitor "branch instructions retired," and update the sample period of those counters so that they will overflow at the right time.

  Signed-off-by: Eric Hankland <ehankland@google.com>
  [jmattson:
   - Split the code to increment "branch instructions retired" into a separate commit.
   - Moved/consolidated the calls to kvm_pmu_trigger_event() in the emulation of VMLAUNCH/VMRESUME to accommodate the evolution of that code.]
  Fixes: f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")
  Signed-off-by: Jim Mattson <jmattson@google.com>
  Message-Id: <20211130074221.93635-7-likexu@tencent.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* KVM: x86: Drop "pre_" from enter/leave_smm() helpersSean Christopherson2021-06-171-2/+1
| | | | | | | | | | | | Now that .post_leave_smm() is gone, drop "pre_" from the remaining helpers. The helpers aren't invoked purely before SMI/RSM processing, e.g. both helpers are invoked after state is snapshotted (from regs or SMRAM), and the RSM helper is invoked after some amount of register state has been stuffed. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210609185619.992058-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Drop .post_leave_smm(), i.e. the manual post-RSM MMU reset (Sean Christopherson, 2021-06-17, 1 file, +0/-1)

  Drop the .post_leave_smm() emulator callback, which at this point is just a wrapper to kvm_mmu_reset_context(). The manual context reset is unnecessary, because unlike enter_smm() which calls vendor MSR/CR helpers directly, em_rsm() bounces through the KVM helpers, e.g. kvm_set_cr4(), which are responsible for processing side effects. em_rsm() is already subtly relying on this behavior as it doesn't manually do kvm_update_cpuid_runtime(), e.g. to recognize CR4.OSXSAVE changes.

  Signed-off-by: Sean Christopherson <seanjc@google.com>
  Message-Id: <20210609185619.992058-9-seanjc@google.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* KVM: x86: Replace .set_hflags() with dedicated .exiting_smm() helper (Sean Christopherson, 2021-06-17, 1 file, +1/-1)

  Replace the .set_hflags() emulator hook with a dedicated .exiting_smm(), moving the SMM and SMM_INSIDE_NMI flag handling out of the emulator in the process. This is a step towards consolidating much of the logic in kvm_smm_changed(), including the SMM hflags updates.

  No functional change intended.

  Signed-off-by: Sean Christopherson <seanjc@google.com>
  Message-Id: <20210609185619.992058-4-seanjc@google.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* KVM: x86: Emulate triple fault shutdown if RSM emulation fails (Sean Christopherson, 2021-06-17, 1 file, +1/-0)

  Use the recently introduced KVM_REQ_TRIPLE_FAULT to properly emulate shutdown if RSM from SMM fails.

  Note, entering shutdown after clearing the SMM flag and restoring NMI blocking is architecturally correct with respect to AMD's APM, which KVM also uses for SMRAM layout and RSM NMI blocking behavior. The APM says:

    An RSM causes a processor shutdown if an invalid-state condition is found in the SMRAM state-save area. Only an external reset, external processor-initialization, or non-maskable external interrupt (NMI) can cause the processor to leave the shutdown state.

  Of note is processor-initialization (INIT) as a valid shutdown wake event, as INIT is blocked by SMM, implying that entering shutdown also forces the CPU out of SMM.

  For recent Intel CPUs, restoring NMI blocking is technically wrong, but so is restoring NMI blocking in the first place, and Intel's RSM "architecture" is such a mess that just about anything is allowed and can be justified as micro-architectural behavior. Per the SDM:

    On Pentium 4 and later processors, shutdown will inhibit INTR and A20M but will not change any of the other inhibits. On these processors, NMIs will be inhibited if no action is taken in the SMI handler to uninhibit them (see Section 34.8).

  where Section 34.8 says:

    When the processor enters SMM while executing an NMI handler, the processor saves the SMRAM state save map but does not save the attribute to keep NMI interrupts disabled. Potentially, an NMI could be latched (while in SMM or upon exit) and serviced upon exit of SMM even though the previous NMI handler has still not completed.

  I.e. RSM unconditionally unblocks NMI, but shutdown on RSM does not, which is in direct contradiction of KVM's behavior. But, as mentioned above, KVM follows AMD architecture and restores NMI blocking on RSM, so that micro-architectural detail is already lost.

  And for Pentium era CPUs, SMI# can break shutdown, meaning that at least some Intel CPUs fully leave SMM when entering shutdown:

    In the shutdown state, Intel processors stop executing instructions until a RESET#, INIT# or NMI# is asserted. While Pentium family processors recognize the SMI# signal in shutdown state, P6 family and Intel486 processors do not.

  In other words, the fact that Intel CPUs have implemented the two extremes gives KVM carte blanche when it comes to honoring Intel's architecture for handling shutdown during RSM.

  Signed-off-by: Sean Christopherson <seanjc@google.com>
  Message-Id: <20210609185619.992058-3-seanjc@google.com>
  [Return X86EMUL_CONTINUE after triple fault. - Paolo]
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

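Given the one-line change to kvm_emulate.h and Paolo's note about returning X86EMUL_CONTINUE, the emulator-side hookup plausibly looks like this sketch; the op name and the helper it guards are assumptions:

    /* Sketch: new emulator op requesting a triple-fault shutdown. */
    void (*triple_fault)(struct x86_emulate_ctxt *ctxt);

    /* Illustrative use in the RSM path (load_state_from_smram() is a
     * hypothetical stand-in for the SMRAM restore step):
     *
     *      if (load_state_from_smram(ctxt) != X86EMUL_CONTINUE) {
     *              ctxt->ops->triple_fault(ctxt);  // -> KVM_REQ_TRIPLE_FAULT
     *              return X86EMUL_CONTINUE;        // per Paolo's note above
     *      }
     */
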
* KVM: x86: Move FPU register accessors into fpu.h (Siddharth Chandrasekaran, 2021-06-17, 1 file, +1/-2)

  Hyper-V XMM fast hypercalls use XMM registers to pass input/output parameters. To access these, hyperv.c can reuse some FPU register accessors defined in emulator.c. Move them to a common location so both can access them.

  While at it, reorder the parameters of these accessor methods to make them more readable.

  Cc: Alexander Graf <graf@amazon.com>
  Cc: Evgeny Iakovlev <eyakovl@amazon.de>
  Signed-off-by: Siddharth Chandrasekaran <sidcha@amazon.de>
  Message-Id: <01a85a6560714d4d3637d3d86e5eba65073318fa.1622019133.git.sidcha@amazon.de>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* KVM: X86: Kill off ctxt->ud (Wanpeng Li, 2021-05-28, 1 file, +1/-2)

  ctxt->ud is consumed only by x86_decode_insn(), so we can kill it off by passing emulation_type to x86_decode_insn() and dropping ctxt->ud altogether. Tracking that info in ctxt for literally one call is silly.

  Suggested-by: Sean Christopherson <seanjc@google.com>
  Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
  Reviewed-by: Sean Christopherson <seanjc@google.com>
  Message-Id: <1622160097-37633-2-git-send-email-wanpengli@tencent.com>

* KVM: x86: Move RDPID emulation intercept to its own enum (Sean Christopherson, 2021-05-07, 1 file, +1/-0)

  Add a dedicated intercept enum for RDPID instead of piggybacking RDTSCP. Unlike VMX's ENABLE_RDTSCP, RDPID is not bound to SVM's RDTSCP intercept.

  Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID")
  Cc: stable@vger.kernel.org
  Signed-off-by: Sean Christopherson <seanjc@google.com>
  Message-Id: <20210504171734.1434054-5-seanjc@google.com>
  Reviewed-by: Jim Mattson <jmattson@google.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* KVM: x86: reading DR cannot fail (Paolo Bonzini, 2021-02-09, 1 file, +1/-1)

  kvm_get_dr() and emulator_get_dr() expect an in-range value for the register number, so they cannot fail. Change the return type to void.

  Suggested-by: Sean Christopherson <seanjc@google.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Linus Torvalds, 2020-04-03, 1 file, +0/-1)

  Pull kvm updates from Paolo Bonzini:

  "ARM:
   - GICv4.1 support
   - 32bit host removal

  PPC:
   - secure (encrypted) VMs using the Protected Execution Framework ultravisor

  s390:
   - allow disabling GISA (hardware interrupt injection) and protected VMs/ultravisor support.

  x86:
   - New dirty bitmap flag that sets all bits in the bitmap when dirty page logging is enabled; this is faster because it doesn't require bulk modification of the page tables.
   - Initial work on making nested SVM event injection more similar to VMX, and less buggy.
   - Various cleanups to MMU code (though the big ones and related optimizations were delayed to 5.8). Instead of using cr3 in function names which occasionally means eptp, KVM too has standardized on "pgd".
   - A large refactoring of CPUID features, which now use an array that parallels the core x86_features.
   - Some removal of pointer chasing from kvm_x86_ops, which will also be switched to static calls as soon as they are available.
   - New Tigerlake CPUID features.
   - More bugfixes, optimizations and cleanups.

  Generic:
   - selftests: cleanups, new MMU notifier stress test, steal-time test
   - CSV output for kvm_stat"

  * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (277 commits)
    x86/kvm: fix a missing-prototypes "vmread_error"
    KVM: x86: Fix BUILD_BUG() in __cpuid_entry_get_reg() w/ CONFIG_UBSAN=y
    KVM: VMX: Add a trampoline to fix VMREAD error handling
    KVM: SVM: Annotate svm_x86_ops as __initdata
    KVM: VMX: Annotate vmx_x86_ops as __initdata
    KVM: x86: Drop __exit from kvm_x86_ops' hardware_unsetup()
    KVM: x86: Copy kvm_x86_ops by value to eliminate layer of indirection
    KVM: x86: Set kvm_x86_ops only after ->hardware_setup() completes
    KVM: VMX: Configure runtime hooks using vmx_x86_ops
    KVM: VMX: Move hardware_setup() definition below vmx_x86_ops
    KVM: x86: Move init-only kvm_x86_ops to separate struct
    KVM: Pass kvm_init()'s opaque param to additional arch funcs
    s390/gmap: return proper error code on ksm unsharing
    KVM: selftests: Fix cosmetic copy-paste error in vm_mem_region_move()
    KVM: Fix out of range accesses to memslots
    KVM: X86: Micro-optimize IPI fastpath delay
    KVM: X86: Delay read msr data iff writes ICR MSR
    KVM: PPC: Book3S HV: Add a capability for enabling secure guests
    KVM: arm64: GICv4.1: Expose HW-based SGIs in debugfs
    KVM: arm64: GICv4.1: Allow non-trapping WFI when using HW SGIs
    ...

* KVM: x86: Refactor kvm_cpuid() param that controls out-of-range logic (Sean Christopherson, 2020-03-16, 1 file, +1/-1)

  Invert and rename the kvm_cpuid() param that controls out-of-range logic to better reflect the semantics of the affected callers, i.e. callers that bypass the out-of-range logic do so because they are looking up an exact guest CPUID entry, e.g. to query the maxphyaddr. Similarly, rename kvm_cpuid()'s internal "found" to "exact" to clarify that it tracks whether or not the exact requested leaf was found, as opposed to any usable leaf being found.

  No functional change intended.

  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

* KVM: x86: Fix CPUID range checks for Hypervisor and Centaur classes (Sean Christopherson, 2020-03-16, 1 file, +4/-0)

  Rework the masking in the out-of-range CPUID logic to handle the Hypervisor sub-classes, as well as the Centaur class if the guest virtual CPU vendor is Centaur.

  Masking against 0x80000000 only handles basic and extended leafs, which results in Hypervisor range checks being performed against the basic CPUID class, and Centaur range checks being performed against the Extended class. E.g. if CPUID.0x40000000.EAX returns 0x4000000A and there is no entry for CPUID.0x40000006, then function 0x40000006 would be incorrectly reported as out of bounds.

  While there is no official definition of what constitutes a class, the convention established for Hypervisor classes effectively uses bits 31:8 as the mask by virtue of checking for different bases in increments of 0x100, e.g. KVM advertises its CPUID functions starting at 0x40000100 when HyperV features are advertised at the default base of 0x40000000.

  The bad range check doesn't cause functional problems for any known VMM because out-of-range semantics only come into play if the exact entry isn't found, and VMMs either support a very limited Hypervisor range, e.g. the official KVM range is 0x40000000-0x40000001 (effectively no room for undefined leafs), or explicitly define gaps to be zero, e.g. Qemu explicitly creates zeroed entries up to the Centaur and Hypervisor limits (the latter comes into play when providing HyperV features).

  The bad behavior can be visually confirmed by dumping CPUID output in the guest when running Qemu with a stable TSC, as Qemu extends the limit of range 0x40000000 to 0x40000010 to advertise VMware's cpuid_freq, without defining zeroed entries for 0x40000002 - 0x4000000f.

  Note, documentation of Centaur/VIA CPUs is hard to come by. Designating 0xc0000000 - 0xcfffffff as the Centaur class is a best guess as to the behavior of a real Centaur/VIA CPU.

  Fixes: 43561123ab37 ("kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH")
  Cc: Jim Mattson <jmattson@google.com>
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

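As a concrete reading of the masking convention described above (illustrative helper, not from the patch): Hypervisor sub-classes sit 0x100 apart, so two functions share a class only if bits 31:8 match.

    /* Illustrative: same Hypervisor sub-class iff bits 31:8 match. */
    static inline bool same_cpuid_subclass(u32 fn1, u32 fn2)
    {
            return (fn1 & 0xffffff00u) == (fn2 & 0xffffff00u);
    }

    /* E.g. 0x40000006 shares a class with the HyperV base 0x40000000,
     * while KVM's relocated base 0x40000100 does not, whereas masking
     * with 0x80000000 would lump all of them in with the basic leafs.
     */
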
* KVM: x86: Add helpers to perform CPUID-based guest vendor check (Sean Christopherson, 2020-03-16, 1 file, +24/-0)

  Add helpers to provide CPUID-based guest vendor checks, i.e. to do the ugly register comparisons. Use the new helpers to check for an AMD guest vendor in guest_cpuid_is_amd() as well as in the existing emulator flows.

  Using the new helpers fixes a _very_ theoretical bug where guest_cpuid_is_amd() would get a false positive on a non-AMD virtual CPU with a vendor string beginning with "Auth" due to the previous logic only checking EBX. It also fixes a marginally less theoretical bug where guest_cpuid_is_amd() would incorrectly return false for a guest CPU with "AMDisbetter!" as its vendor string.

  Fixes: a0c0feb57992c ("KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD")
  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

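A sketch of such a helper. CPUID.0 packs the vendor string across three registers (EBX="Auth", EDX="enti", ECX="cAMD" for AuthenticAMD), which is why an EBX-only check can false-positive on any vendor starting with "Auth". The constant names mirror kvm_emulate.h's X86EMUL_CPUID_VENDOR_* convention but are reconstructed, not quoted:

    /* Sketch: match both of AMD's known vendor strings, comparing all
     * three registers so "Auth..." lookalikes don't false-positive.
     */
    static inline bool is_guest_vendor_amd(u32 ebx, u32 ecx, u32 edx)
    {
            return (ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx &&
                    ecx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx &&
                    edx == X86EMUL_CPUID_VENDOR_AuthenticAMD_edx) ||
                   (ebx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ebx &&
                    ecx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ecx &&
                    edx == X86EMUL_CPUID_VENDOR_AMDisbetterI_edx);
    }
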
* KVM: x86: Shrink the usercopy region of the emulation context (Sean Christopherson, 2020-03-16, 1 file, +5/-3)

  Shuffle a few operand structs to the end of struct x86_emulate_ctxt and update the cache creation to whitelist only the region of the emulation context that is expected to be copied to/from user memory, e.g. the instruction operands, registers, and fetch/io/mem caches.

  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
  Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

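kmem_cache_create_usercopy() is the standard way to express such a whitelist; a sketch under the assumption that the user-visible members now start at some tail member, called 'src' here purely for illustration:

    /* Sketch: whitelist only the tail of the context for usercopy. */
    unsigned int useroffset = offsetof(struct x86_emulate_ctxt, src);
    unsigned int usersize = sizeof(struct x86_emulate_ctxt) - useroffset;

    cache = kmem_cache_create_usercopy("x86_emulator",
                                       sizeof(struct x86_emulate_ctxt),
                                       __alignof__(struct x86_emulate_ctxt),
                                       SLAB_ACCOUNT,
                                       useroffset, usersize, NULL);

Hardened usercopy then rejects copy_to_user()/copy_from_user() against any part of the allocation outside [useroffset, useroffset + usersize).
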
* KVM: x86: Move kvm_emulate.h into KVM's private directory (Sean Christopherson, 2020-03-16, 1 file, +480/-0)

  Now that the emulation context is dynamically allocated and not embedded in struct kvm_vcpu, move its header, kvm_emulate.h, out of the public asm directory and into KVM's private x86 directory.

  Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
  Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>