summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* KVM: x86: work around infinite loop in microcode when #AC is deliveredEric Northup2015-11-103-1/+13
| | | | | | | | | | | | It was found that a guest can DoS a host by triggering an infinite stream of "alignment check" (#AC) exceptions. This causes the microcode to enter an infinite loop where the core never receives another interrupt. The host kernel panics pretty quickly due to the effects (CVE-2015-5307). Signed-off-by: Eric Northup <digitaleric@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* context_tracking: avoid irq_save/irq_restore on guest entry and exitPaolo Bonzini2015-11-102-28/+44
| | | | | | | | | | | | | | | | | | | | | guest_enter and guest_exit must be called with interrupts disabled, since they take the vtime_seqlock with write_seq{lock,unlock}. Therefore, it is not necessary to check for exceptions, nor to save/restore the IRQ state, when context tracking functions are called by guest_enter and guest_exit. Split the body of context_tracking_entry and context_tracking_exit out to __-prefixed functions, and use them from KVM. Rik van Riel has measured this to speed up a tight vmentry/vmexit loop by about 2%. Cc: Andy Lutomirski <luto@kernel.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* context_tracking: remove duplicate enabled checkPaolo Bonzini2015-11-102-16/+4
| | | | | | | | | | | | | | | | | All calls to context_tracking_enter and context_tracking_exit are already checking context_tracking_is_enabled, except the context_tracking_user_enter and context_tracking_user_exit functions left in for the benefit of assembly calls. Pull the check up to those functions, by making them simple wrappers around the user_enter and user_exit inline functions. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Rik van Riel <riel@redhat.com> Tested-by: Rik van Riel <riel@redhat.com> Acked-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: VMX: Dump TSC multiplier in dump_vmcs()Haozhong Zhang2015-11-101-0/+3
| | | | | | | | This patch enhances dump_vmcs() to dump the value of TSC multiplier field in VMCS. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: VMX: Use a scaled host TSC for guest readings of MSR_IA32_TSCHaozhong Zhang2015-11-101-4/+5
| | | | | | | | This patch makes kvm-intel to return a scaled host TSC plus the TSC offset when handling guest readings to MSR_IA32_TSC. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: VMX: Setup TSC scaling ratio when a vcpu is loadedHaozhong Zhang2015-11-101-0/+6
| | | | | | | | | This patch makes kvm-intel module to load TSC scaling ratio into TSC multiplier field of VMCS when a vcpu is loaded, so that TSC scaling ratio can take effect if VMX TSC scaling is enabled. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: VMX: Enable and initialize VMX TSC scalingHaozhong Zhang2015-11-102-1/+19
| | | | | | | | This patch exhances kvm-intel module to enable VMX TSC scaling and collects information of TSC scaling ratio during initialization. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Use the correct vcpu's TSC rate to compute time scaleHaozhong Zhang2015-11-101-2/+4
| | | | | | | | This patch makes KVM use virtual_tsc_khz rather than the host TSC rate as vcpu's TSC rate to compute the time scale if TSC scaling is enabled. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Move TSC scaling logic out of call-back read_l1_tsc()Haozhong Zhang2015-11-104-7/+12
| | | | | | | | | Both VMX and SVM scales the host TSC in the same way in call-back read_l1_tsc(), so this patch moves the scaling logic from call-back read_l1_tsc() to a common function kvm_read_l1_tsc(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Move TSC scaling logic out of call-back adjust_tsc_offset()Haozhong Zhang2015-11-104-22/+19
| | | | | | | | | | | For both VMX and SVM, if the 2nd argument of call-back adjust_tsc_offset() is the host TSC, then adjust_tsc_offset() will scale it first. This patch moves this common TSC scaling logic to its caller adjust_tsc_offset_host() and rename the call-back adjust_tsc_offset() to adjust_tsc_offset_guest(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Replace call-back compute_tsc_offset() with a common functionHaozhong Zhang2015-11-104-20/+12
| | | | | | | | | Both VMX and SVM calculate the tsc-offset in the same way, so this patch removes the call-back compute_tsc_offset() and replaces it with a common function kvm_compute_tsc_offset(). Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Replace call-back set_tsc_khz() with a common functionHaozhong Zhang2015-11-105-59/+70
| | | | | | | | | Both VMX and SVM propagate virtual_tsc_khz in the same way, so this patch removes the call-back set_tsc_khz() and replaces it with a common function. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Add a common TSC scaling functionHaozhong Zhang2015-11-105-45/+97
| | | | | | | | | | VMX and SVM calculate the TSC scaling ratio in a similar logic, so this patch generalizes it to a common TSC scaling function. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> [Inline the multiplication and shift steps into mul_u64_u64_shr. Remove BUG_ON. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Add a common TSC scaling ratio field in kvm_vcpu_archHaozhong Zhang2015-11-103-16/+20
| | | | | | | | This patch moves the field of TSC scaling ratio from the architecture struct vcpu_svm to the common struct kvm_vcpu_arch. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: Collect information for setting TSC scaling ratioHaozhong Zhang2015-11-103-0/+11
| | | | | | | | | | The number of bits of the fractional part of the 64-bit TSC scaling ratio in VMX and SVM is different. This patch makes the architecture code to collect the number of fractional bits and other related information into variables that can be accessed in the common code. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: declare a few variables as __read_mostlyPaolo Bonzini2015-11-102-9/+7
| | | | | | | These include module parameters and variables that are set by kvm_x86_ops->hardware_setup. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: merge handle_mmio_page_fault and handle_mmio_page_fault_commonPaolo Bonzini2015-11-104-21/+10
| | | | | | | | | They are exactly the same, except that handle_mmio_page_fault has an unused argument and a call to WARN_ON. Remove the unused argument from the callers, and move the warning to (the former) handle_mmio_page_fault_common. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: VMX: Fix commit which broke PMLKai Huang2015-11-051-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found PML was broken since below commit: commit feda805fe7c4ed9cf78158e73b1218752e3b4314 Author: Xiao Guangrong <guangrong.xiao@linux.intel.com> Date: Wed Sep 9 14:05:55 2015 +0800 KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update Unify the update in vmx_cpuid_update() Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> [Rewrite to use vmcs_set_secondary_exec_control. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> The reason is in above commit vmx_cpuid_update calls vmx_secondary_exec_control, in which currently SECONDARY_EXEC_ENABLE_PML bit is cleared unconditionally (as PML is enabled in creating vcpu). Therefore if vcpu_cpuid_update is called after vcpu is created, PML will be disabled unexpectedly while log-dirty code still thinks PML is used. Fix this by clearing SECONDARY_EXEC_ENABLE_PML in vmx_secondary_exec_control only when PML is not supported or not enabled (!enable_pml). This is more reasonable as PML is currently either always enabled or disabled. With this explicit updating SECONDARY_EXEC_ENABLE_PML in vmx_enable{disable}_pml is not needed so also rename vmx_enable{disable}_pml to vmx_create{destroy}_pml_buffer. Fixes: feda805fe7c4ed9cf78158e73b1218752e3b4314 Signed-off-by: Kai Huang <kai.huang@linux.intel.com> [While at it, change a wrong ASSERT to an "if". The condition can happen if creating the VCPU fails with ENOMEM. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()Laszlo Ersek2015-11-041-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b18d5431acc7 ("KVM: x86: fix CR0.CD virtualization") was technically correct, but it broke OVMF guests by slowing down various parts of the firmware. Commit fb279950ba02 ("KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED") quirked the first function modified by b18d5431acc7, vmx_get_mt_mask(), for OVMF's sake. This restored the speed of the OVMF code that runs before PlatformPei (including the memory intensive LZMA decompression in SEC). This patch extends the quirk to the second function modified by b18d5431acc7, kvm_set_cr0(). It eliminates the intrusive slowdown that hits the EFI_MP_SERVICES_PROTOCOL implementation of edk2's UefiCpuPkg/CpuDxe -- which is built into OVMF --, when CpuDxe starts up all APs at once for initialization, in order to count them. We also carry over the kvm_arch_has_noncoherent_dma() sub-condition from the other half of the original commit b18d5431acc7. Fixes: b18d5431acc7a2fd22767925f3a6f597aa4bd29e Cc: stable@vger.kernel.org Cc: Jordan Justen <jordan.l.justen@intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Tested-by: Janusz Mocek <januszmk6@gmail.com> Signed-off-by: Laszlo Ersek <lersek@redhat.com># Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: allow RSM from 64-bit modePaolo Bonzini2015-11-041-5/+25
| | | | | | | | | | | | | | The SDM says that exiting system management mode from 64-bit mode is invalid, but that would be too good to be true. But actually, most of the code is already there to support exiting from compat mode (EFER.LME=1, EFER.LMA=0). Getting all the way from 64-bit mode to real mode only requires clearing CS.L and CR4.PCIDE. Cc: stable@vger.kernel.org Fixes: 660a5d517aaab9187f93854425c4c63f4a09195c Tested-by: Laszlo Ersek <lersek@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: VMX: fix SMEP and SMAP without EPTRadim Krčmář2015-11-041-9/+10
| | | | | | | | | | | | The comment in code had it mostly right, but we enable paging for emulated real mode regardless of EPT. Without EPT (which implies emulated real mode), secondary VCPUs won't start unless we disable SM[AE]P when the guest doesn't use paging. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: move kvm_set_irq_inatomic to legacy device assignmentPaolo Bonzini2015-11-043-35/+37
| | | | | | | | | The function is not used outside device assignment, and kvm_arch_set_irq_inatomic has a different prototype. Move it here and make it static to avoid confusion. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: device assignment: remove pointless #ifdefsPaolo Bonzini2015-11-041-25/+0
| | | | | | | The symbols are always defined. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomicPaolo Bonzini2015-11-043-17/+15
| | | | | | | | | | | We do not want to do too much work in atomic context, in particular not walking all the VCPUs of the virtual machine. So we want to distinguish the architecture-specific injection function for irqfd from kvm_set_msi. Since it's still empty, reuse the newly added kvm_arch_set_irq and rename it to kvm_arch_set_irq_inatomic. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: zero apic_arb_prio on resetRadim Krčmář2015-11-041-0/+2
| | | | | | | | | | | | | | BSP doesn't get INIT so its apic_arb_prio isn't zeroed after reboot. BSP won't get lowest priority interrupts until other VCPUs get enough interrupts to match their pre-reboot apic_arb_prio. That behavior doesn't fit into KVM's round-robin-like interpretation of lowest priority delivery ... userspace should KVM_SET_LAPIC on reset, so just zero apic_arb_prio there. Reported-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* drivers/hv: share Hyper-V SynIC constants with userspaceAndrey Smetanin2015-11-043-5/+13
| | | | | | | | | | | | | | | | Moved Hyper-V synic contants from guest Hyper-V drivers private header into x86 arch uapi Hyper-V header. Added Hyper-V synic msr's flags into x86 arch uapi Hyper-V header. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Vitaly Kuznetsov <vkuznets@redhat.com> CC: "K. Y. Srinivasan" <kys@microsoft.com> CC: Gleb Natapov <gleb@kernel.org> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: handle SMBASE as physical address in RSMRadim Krčmář2015-11-041-4/+3
| | | | | | | | | | | | GET_SMSTATE depends on real mode to ensure that smbase+offset is treated as a physical address, which has already caused a bug after shuffling the code. Enforce physical addressing. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reported-by: Laszlo Ersek <lersek@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: add read_phys to x86_emulate_opsRadim Krčmář2015-11-042-0/+20
| | | | | | | | | | | | We want to read the physical memory when emulating RSM. X86EMUL_IO_NEEDED is returned on all errors for consistency with other helpers. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: x86: removing unused variableSaurabh Sengar2015-11-041-11/+5
| | | | | | | removing unused variables, found by coccinelle Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configsJan Beulich2015-11-041-1/+1
| | | | | | | The symbol was missing a KVM dependency. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'kvm-arm-for-4.4' of ↵Paolo Bonzini2015-11-0425-297/+676
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for v4.4-rc1 Includes a number of fixes for the arch-timer, introducing proper level-triggered semantics for the arch-timers, a series of patches to synchronously halt a guest (prerequisite for IRQ forwarding), some tracepoint improvements, a tweak for the EL2 panic handlers, some more VGIC cleanups getting rid of redundant state, and finally a stylistic change that gets rid of some ctags warnings. Conflicts: arch/x86/include/asm/kvm_host.h
| * KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()Pavel Fedin2015-11-044-23/+2
| | | | | | | | | | | | | | | | | | Now we see that vgic_set_lr() and vgic_sync_lr_elrsr() are always used together. Merge them into one function, saving from second vgic_ops dereferencing every time. Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Clean up vgic_retire_lr() and surroundingsPavel Fedin2015-11-041-27/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. Remove unnecessary 'irq' argument, because irq number can be retrieved from the LR. 2. Since cff9211eb1a1f58ce7f5a2d596b617928fd4be0e ("arm/arm64: KVM: Fix arch timer behavior for disabled interrupts ") LR_STATE_PENDING is queued back by vgic_retire_lr() itself. Also, it clears vlr.state itself. Therefore, we remove the same, now duplicated, check with all accompanying bit manipulations from vgic_unqueue_irqs(). 3. vgic_retire_lr() is always accompanied by vgic_irq_clear_queued(). Since it already does more than just clearing the LR, move vgic_irq_clear_queued() inside of it. Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Optimize away redundant LR trackingPavel Fedin2015-11-044-44/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we use vgic_irq_lr_map in order to track which LRs hold which IRQs, and lr_used bitmap in order to track which LRs are used or free. vgic_irq_lr_map is actually used only for piggy-back optimization, and can be easily replaced by iteration over lr_used. This is good because in future, when LPI support is introduced, number of IRQs will grow up to at least 16384, while numbers from 1024 to 8192 are never going to be used. This would be a huge memory waste. In its turn, lr_used is also completely redundant since ae705930fca6322600690df9dc1c7d0516145a93 ("arm/arm64: KVM: Keep elrsr/aisr in sync with software model"), because together with lr_used we also update elrsr. This allows to easily replace lr_used with elrsr, inverting all conditions (because in elrsr '1' means 'free'). Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm: Do not indent the arguments of DECLARE_BITMAPMichal Marek2015-10-221-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Besides being a coding style issue, it confuses make tags: ctags: Warning: include/kvm/arm_vgic.h:307: null expansion of name pattern "\1" ctags: Warning: include/kvm/arm_vgic.h:308: null expansion of name pattern "\1" ctags: Warning: include/kvm/arm_vgic.h:309: null expansion of name pattern "\1" ctags: Warning: include/kvm/arm_vgic.h:317: null expansion of name pattern "\1" Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm64: kvm: restore EL1N SP for panicMark Rutland2015-10-221-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we panic in hyp mode, we inject a call to panic() into the EL1N host kernel. If a guest context is active, we first attempt to restore the minimal amount of state necessary to execute the host kernel with restore_sysregs. However, the SP is restored as part of restore_common_regs, and so we may return to the host's panic() function with the SP of the guest. Any calculations based on the SP will be bogus, and any attempt to access the stack will result in recursive data aborts. When running Linux as a guest, the guest's EL1N SP is like to be some valid kernel address. In this case, the host kernel may use that region as a stack for panic(), corrupting it in the process. Avoid the problem by restoring the host SP prior to returning to the host. To prevent misleading backtraces in the host, the FP is zeroed at the same time. We don't need any of the other "common" registers in order to panic successfully. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: <kvmarm@lists.cs.columbia.edu> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: Add tracepoints for vgic and timerChristoffer Dall2015-10-223-0/+72
| | | | | | | | | | | | | | | | | | | | | | The VGIC and timer code for KVM arm/arm64 doesn't have any tracepoints or tracepoint infrastructure defined. Rewriting some of the timer code handling showed me how much we need this, so let's add these simple trace points once and for all and we can easily expand with additional trace points in these files as we go along. Cc: Wei Huang <wei@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: Improve kvm_exit tracepointChristoffer Dall2015-10-224-4/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ARM architecture only saves the exit class to the HSR (ESR_EL2 for arm64) on synchronous exceptions, not on asynchronous exceptions like an IRQ. However, we only report the exception class on kvm_exit, which is confusing because an IRQ looks like it exited at some PC with the same reason as the previous exit. Add a lookup table for the exception index and prepend the kvm_exit tracepoint text with the exception type to clarify this situation. Also resolve the exception class (EC) to a human-friendly text version so the trace output becomes immediately usable for debugging this code. Cc: Wei Huang <wei@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: Fix vGIC documentationPavel Fedin2015-10-221-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Correct some old mistakes in the API documentation: 1. VCPU is identified by index (using kvm_get_vcpu() function), but "cpu id" can be mistaken for affinity ID. 2. Some error codes are wrong. [ Slightly tweaked some grammer and did some s/CPU index/vcpu_index/ in the descriptions. -Christoffer ] Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: implement kvm_arm_[halt,resume]_guestEric Auger2015-10-223-4/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We introduce kvm_arm_halt_guest and resume functions. They will be used for IRQ forward state change. Halt is synchronous and prevents the guest from being re-entered. We use the same mechanism put in place for PSCI former pause, now renamed power_off. A new flag is introduced in arch vcpu state, pause, only meant to be used by those functions. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: check power_off in critical section before VCPU runEric Auger2015-10-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | In case a vcpu off PSCI call is called just after we executed the vcpu_sleep check, we can enter the guest although power_off is set. Let's check the power_off state in the critical section, just before entering the guest. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reported-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: check power_off in kvm_arch_vcpu_runnableEric Auger2015-10-221-1/+2
| | | | | | | | | | | | | | | | | | kvm_arch_vcpu_runnable now also checks whether the power_off flag is set. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: rename pause into power_offEric Auger2015-10-224-19/+19
| | | | | | | | | | | | | | | | | | | | | | The kvm_vcpu_arch pause field is renamed into power_off to prepare for the introduction of a new pause field. Also vcpu_pause is renamed into vcpu_sleep since we will sleep until both power_off and pause are false. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM : Enable vhost device selection under KVM config menuWei Huang2015-10-222-0/+4
| | | | | | | | | | | | | | | | | | | | vhost drivers provide guest VMs with better I/O performance and lower CPU utilization. This patch allows users to select vhost devices under KVM configuration menu on ARM. This makes vhost support on arm/arm64 on a par with other architectures (e.g. x86, ppc). Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: Support edge-triggered forwarded interruptsChristoffer Dall2015-10-221-18/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We mark edge-triggered interrupts with the HW bit set as queued to prevent the VGIC code from injecting LRs with both the Active and Pending bits set at the same time while also setting the HW bit, because the hardware does not support this. However, this means that we must also clear the queued flag when we sync back a LR where the state on the physical distributor went from active to inactive because the guest deactivated the interrupt. At this point we must also check if the interrupt is pending on the distributor, and tell the VGIC to queue it again if it is. Since these actions on the sync path are extremely close to those for level-triggered interrupts, rename process_level_irq to process_queued_irq, allowing it to cater for both cases. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: Rework the arch timer to use level-triggered semanticsChristoffer Dall2015-10-225-108/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The arch timer currently uses edge-triggered semantics in the sense that the line is never sampled by the vgic and lowering the line from the timer to the vgic doesn't have any effect on the pending state of virtual interrupts in the vgic. This means that we do not support a guest with the otherwise valid behavior of (1) disable interrupts (2) enable the timer (3) disable the timer (4) enable interrupts. Such a guest would validly not expect to see any interrupts on real hardware, but will see interrupts on KVM. This patch fixes this shortcoming through the following series of changes. First, we change the flow of the timer/vgic sync/flush operations. Now the timer is always flushed/synced before the vgic, because the vgic samples the state of the timer output. This has the implication that we move the timer operations in to non-preempible sections, but that is fine after the previous commit getting rid of hrtimer schedules on every entry/exit. Second, we change the internal behavior of the timer, letting the timer keep track of its previous output state, and only lower/raise the line to the vgic when the state changes. Note that in theory this could have been accomplished more simply by signalling the vgic every time the state *potentially* changed, but we don't want to be hitting the vgic more often than necessary. Third, we get rid of the use of the map->active field in the vgic and instead simply set the interrupt as active on the physical distributor whenever the input to the GIC is asserted and conversely clear the physical active state when the input to the GIC is deasserted. Fourth, and finally, we now initialize the timer PPIs (and all the other unused PPIs for now), to be level-triggered, and modify the sync code to sample the line state on HW sync and re-inject a new interrupt if it is still pending at that time. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: Add forwarded physical interrupts documentationChristoffer Dall2015-10-221-0/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Forwarded physical interrupts on arm/arm64 is a tricky concept and the way we deal with them is not apparently easy to understand by reading various specs. Therefore, add a proper documentation file explaining the flow and rationale of the behavior of the vgic. Some of this text was contributed by Marc Zyngier and edited by me. Omissions and errors are all mine. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: Use appropriate define in VGIC reset codeChristoffer Dall2015-10-221-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | We currently initialize the SGIs to be enabled in the VGIC code, but we use the VGIC_NR_PPIS define for this purpose, instead of the the more natural VGIC_NR_SGIS. Change this slightly confusing use of the defines. Note: This should have no functional change, as both names are defined to the number 16. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIsChristoffer Dall2015-10-221-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GICD_ICFGR allows the bits for the SGIs and PPIs to be read only. We currently simulate this behavior by writing a hardcoded value to the register for the SGIs and PPIs on every write of these bits to the register (ignoring what the guest actually wrote), and by writing the same value as the reset value to the register. This is a bit counter-intuitive, as the register is RO for these bits, and we can just implement it that way, allowing us to control the value of the bits purely in the reset code. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: vgic: Factor out level irq processing on guest exitChristoffer Dall2015-10-221-38/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently vgic_process_maintenance() processes dealing with a completed level-triggered interrupt directly, but we are soon going to reuse this logic for level-triggered mapped interrupts with the HW bit set, so move this logic into a separate static function. Probably the most scary part of this commit is convincing yourself that the current flow is safe compared to the old one. In the following I try to list the changes and why they are harmless: Move vgic_irq_clear_queued after kvm_notify_acked_irq: Harmless because the only potential effect of clearing the queued flag wrt. kvm_set_irq is that vgic_update_irq_pending does not set the pending bit on the emulated CPU interface or in the pending_on_cpu bitmask if the function is called with level=1. However, the point of kvm_notify_acked_irq is to call kvm_set_irq with level=0, and we set the queued flag again in __kvm_vgic_sync_hwstate later on if the level is stil high. Move vgic_set_lr before kvm_notify_acked_irq: Also, harmless because the LR are cpu-local operations and kvm_notify_acked only affects the dist Move vgic_dist_irq_clear_soft_pend after kvm_notify_acked_irq: Also harmless, because now we check the level state in the clear_soft_pend function and lower the pending bits if the level is low. Reviewed-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>