Commit message | Author | Age | Files | Lines
* VT-d: Change {un}map_range functions to implement {un}map interface | Joerg Roedel | 2010-03-07 | 1 | -10/+12
| | | | | | | | | This patch changes the iommu-api functions for mapping and unmapping page ranges to use the new page-size based interface. This allows the range-based functions to be removed later. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* iommu-api: Add ->{un}map callbacks to iommu_ops | Joerg Roedel | 2010-03-07 | 2 | -0/+10
| | | | | | | | | This patch adds new callbacks for mapping and unmapping pages to the iommu_ops structure. These callbacks are aware of page sizes which makes them different to the ->{un}map_range callbacks. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* iommu-api: Add iommu_map and iommu_unmap functions | Joerg Roedel | 2010-03-07 | 2 | -0/+47
| | | | | | | | | | | | | These two functions map physical addresses to I/O virtual addresses and unmap them again. Unlike iommu_(un)map_range(), the new functions take a gfp_order parameter instead of a size, which makes it easier for IOMMU backend implementations to detect whether a given range can be mapped with larger page sizes. These new functions should replace the old ones in the long term. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
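The gfp_order convention described in this message can be sketched in user-space C. This is a minimal illustration, not the kernel implementation: it assumes 4 KiB base pages, and the names SKETCH_PAGE_SHIFT, order_to_size() and range_is_naturally_aligned() are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as on x86. */
#define SKETCH_PAGE_SHIFT 12

/* Bytes covered by a mapping of the given order: PAGE_SIZE << order. */
static uint64_t order_to_size(unsigned int gfp_order)
{
    assert(gfp_order < 64 - SKETCH_PAGE_SHIFT); /* keep the shift defined */
    return (uint64_t)1 << (SKETCH_PAGE_SHIFT + gfp_order);
}

/*
 * One large page can cover the whole request only if both the I/O
 * virtual address and the physical address are aligned to the size
 * implied by the order.
 */
static bool range_is_naturally_aligned(uint64_t iova, uint64_t paddr,
                                       unsigned int gfp_order)
{
    uint64_t mask = order_to_size(gfp_order) - 1;

    return ((iova | paddr) & mask) == 0;
}
```

With an order instead of a byte count, the size is always a power of two, so a backend only needs a single alignment test like the one above to decide whether a larger page size is usable.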
* iommu-api: Rename ->{un}map function pointers to ->{un}map_range | Joerg Roedel | 2010-03-07 | 4 | -10/+10
| | | | | | | | | The new function pointer names match better with the top-level functions of the iommu-api which are using them. Main intention of this change is to make the ->{un}map pointer names free for two new mapping functions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP | Jan Kiszka | 2010-03-01 | 2 | -0/+2
| | | | | | | | | | This marks the guest single-step API improvement of 94fe45da and 91586a3b with a capability flag to allow reliable detection by user space. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: stable@kernel.org (2.6.33) Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Update instruction length on intercepted BP | Jan Kiszka | 2010-03-01 | 1 | -0/+13
| | | | | | | | | | | | We intercept #BP while in guest debugging mode. As VM exits due to intercepted exceptions do not necessarily come with valid idt_vectoring, we have to update event_exit_inst_len explicitly in such cases. At least in the absence of migration, this ensures that re-injections of #BP will find and use the correct instruction length. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: stable@kernel.org (2.6.32, 2.6.33) Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix emulate_sys[call, enter, exit]()'s fault handling | Takuya Yoshikawa | 2010-03-01 | 1 | -17/+20
| | | | | | | | | | | | | | | | | | This patch fixes emulate_syscall(), emulate_sysenter() and emulate_sysexit() to handle injected faults properly. Although the original code injects faults in these functions, the callers cannot handle them unless the return value differs from the UNHANDLEABLE case. So this patch uses X86EMUL_* codes instead of -1 and 0 and makes x86_emulate_insn() handle these propagated faults. Note that, in x86_emulate_insn(), goto cannot_emulate and goto done with rc equal to X86EMUL_UNHANDLEABLE have the same effect. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix segment descriptor loading | Gleb Natapov | 2010-03-01 | 3 | -59/+151
| | | | | | | | | | | Add proper error and permission checking. This patch also changes the task switching code to load segment selectors before segment descriptors, as the SDM requires; otherwise permission checking during segment descriptor loading will be incorrect. Cc: stable@kernel.org (2.6.33, 2.6.32) Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix load_guest_segment_descriptor() to inject page fault | Takuya Yoshikawa | 2010-03-01 | 1 | -3/+10
| | | | | | | | | | | | | | | | | | | | | | This patch injects a page fault when reading the descriptor in load_guest_segment_descriptor() fails with FAULT. Effects of this injection: this function is used by kvm_load_segment_descriptor(), which is necessary for the following instructions: mov seg,r/m16; jmp far; pop ?s. This patch makes it possible to emulate the page faults generated by these instructions. Note, however, that unless we change kvm_load_segment_descriptor()'s return value propagation, this patch has no effect. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Forbid modifying CS segment register by mov instruction | Gleb Natapov | 2010-03-01 | 1 | -0/+6
| | | | | | | | | Inject #UD if the guest attempts to do so. This is in accordance with the Intel SDM. Cc: stable@kernel.org (2.6.33, 2.6.32) Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Convert kvm->requests_lock to raw_spinlock_t | Avi Kivity | 2010-03-01 | 2 | -4/+4
| | | | | | | | The code relies on kvm->requests_lock inhibiting preemption. Noted by Jan Kiszka. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Convert i8254/i8259 locks to raw_spinlocks | Thomas Gleixner | 2010-03-01 | 5 | -26/+27
| | | | | | | | The i8254/i8259 locks need to be real spinlocks on preempt-rt. Convert them to raw_spinlock. No change for !RT kernels. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: disallow opcode 82 in 64-bit mode | Gleb Natapov | 2010-03-01 | 1 | -8/+8
| | | | | | | Instructions with opcode 82 are not valid in 64 bit mode. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: code style cleanup | Wei Yongjun | 2010-03-01 | 1 | -1/+1
| | | | | | | Just remove redundant semicolon. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Plan obsolescence of kernel allocated slots, paravirt mmu | Avi Kivity | 2010-03-01 | 1 | -0/+30
| | | | | | | These features are unused by modern userspace and can go away. Paravirt mmu needs to stay a little longer for live migration. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Add LOCK prefix validity checking | Gleb Natapov | 2010-03-01 | 1 | -41/+56
| | | | | | | | | | Instructions which are not allowed to have LOCK prefix should generate #UD if one is used. [avi: fold opcode 82 fix from another patch] Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Check CPL level during privilege instruction emulation | Gleb Natapov | 2010-03-01 | 1 | -15/+20
| | | | | | | | | Add CPL checking in case emulator is tricked into emulating privilege instruction from userspace. Signed-off-by: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Fix popf emulation | Gleb Natapov | 2010-03-01 | 1 | -1/+54
| | | | | | | | | POPF behaves differently depending on current CPU mode. Emulate correct logic to prevent guest from changing flags that it can't change otherwise. Signed-off-by: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
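The mode dependence mentioned in this message can be sketched as a mask of EFLAGS bits that POPF is allowed to change. This is a simplified illustration based on the architectural rules, not the code from the patch: it ignores VME and operand-size details and leaves RF, VM, VIF and VIP untouched. The names popf_change_mask(), popf_merge() and the cpu_mode enum are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural EFLAGS bit positions. */
#define FLG_CF   (1u << 0)
#define FLG_PF   (1u << 2)
#define FLG_AF   (1u << 4)
#define FLG_ZF   (1u << 6)
#define FLG_SF   (1u << 7)
#define FLG_TF   (1u << 8)
#define FLG_IF   (1u << 9)
#define FLG_DF   (1u << 10)
#define FLG_OF   (1u << 11)
#define FLG_IOPL (3u << 12)
#define FLG_NT   (1u << 14)
#define FLG_AC   (1u << 18)
#define FLG_ID   (1u << 21)

enum cpu_mode { MODE_REAL, MODE_VM86, MODE_PROT };

/*
 * Returns the mask of EFLAGS bits POPF may change, or 0 if POPF must
 * fault (virtual-8086 mode with IOPL < 3, VME not modelled).
 */
static uint32_t popf_change_mask(enum cpu_mode mode, unsigned cpl, unsigned iopl)
{
    uint32_t mask = FLG_CF | FLG_PF | FLG_AF | FLG_ZF | FLG_SF | FLG_TF |
                    FLG_DF | FLG_OF | FLG_NT | FLG_AC | FLG_ID;

    assert(cpl <= 3 && iopl <= 3);

    switch (mode) {
    case MODE_PROT:
        if (cpl == 0)
            mask |= FLG_IOPL;   /* only ring 0 may change IOPL */
        if (cpl <= iopl)
            mask |= FLG_IF;     /* IF needs sufficient I/O privilege */
        break;
    case MODE_VM86:
        if (iopl < 3)
            return 0;           /* caller would inject #GP(0) */
        mask |= FLG_IF;
        break;
    case MODE_REAL:
        mask |= FLG_IOPL | FLG_IF;
        break;
    }
    return mask;
}

/* Merge the popped value into the old flags under the given mask. */
static uint32_t popf_merge(uint32_t old_flags, uint32_t popped, uint32_t mask)
{
    return (old_flags & ~mask) | (popped & mask);
}
```

A guest running at CPL 3 with IOPL 0 therefore keeps its current IF and IOPL bits no matter what value it pops, which is the behaviour the fix is meant to preserve.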
* KVM: x86 emulator: Check IOPL level during io instruction emulation | Gleb Natapov | 2010-03-01 | 3 | -13/+87
| | | | | | | | | Make emulator check that vcpu is allowed to execute IN, INS, OUT, OUTS, CLI, STI. Signed-off-by: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
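For the port I/O instructions in this list (IN, INS, OUT, OUTS), the architectural rule can be sketched as follows. This is an assumption-laden illustration rather than the patch's code: it only decides whether the TSS I/O permission bitmap has to be consulted, it does not model the bitmap lookup itself, and it does not cover CLI/STI, which fault instead of falling back to the bitmap. The names io_needs_bitmap_check() and cpu_mode are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum cpu_mode { MODE_REAL, MODE_VM86, MODE_PROT };

/*
 * True when CPL/IOPL alone does not grant port access, so the I/O
 * permission bitmap in the TSS has to be checked before the access.
 */
static bool io_needs_bitmap_check(enum cpu_mode mode, unsigned cpl, unsigned iopl)
{
    assert(cpl <= 3 && iopl <= 3);

    switch (mode) {
    case MODE_REAL:
        return false;       /* no I/O protection in real mode */
    case MODE_VM86:
        return true;        /* bitmap is always consulted in VM86 */
    case MODE_PROT:
        return cpl > iopl;  /* insufficient privilege: fall back to bitmap */
    }
    return true;            /* unknown mode: be conservative */
}
```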
* KVM: x86 emulator: fix memory access during x86 emulation | Gleb Natapov | 2010-03-01 | 7 | -50/+142
| | | | | | | | | | | Currently when x86 emulator needs to access memory, page walk is done with broadest permission possible, so if emulated instruction was executed by userspace process it can still access kernel memory. Fix that by providing correct memory access to page walker during emulation. Signed-off-by: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Add Virtual-8086 mode of emulation | Gleb Natapov | 2010-03-01 | 3 | -6/+10
| | | | | | | | | | For some instructions CPU behaves differently for real-mode and virtual 8086. Let emulator know which mode cpu is in, so it will not poke into vcpu state directly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Add group9 instruction decoding | Gleb Natapov | 2010-03-01 | 1 | -2/+7
| | | | | | | | Use groups mechanism to decode 0F C7 instructions. Signed-off-by: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Add group8 instruction decoding | Gleb Natapov | 2010-03-01 | 1 | -1/+6
| | | | | | | | Use groups mechanism to decode 0F BA instructions. Signed-off-by: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: do not store wqh in irqfd | Michael S. Tsirkin | 2010-03-01 | 1 | -3/+0
| | | | | | | wqh is unused, so we do not need to store it in irqfd anymore Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest | Liu Yu | 2010-03-01 | 3 | -17/+48
| | | | | | | | | | | The old method sets ESR and DEAR prematurely. Move this part to after we decide to inject the interrupt, which is closer to how the hardware behaves. Signed-off-by: Liu Yu <yu.liu@freescale.com> Acked-by: Hollis Blanchard <hollis@penguinppc.org> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ia64: destroy ioapic device if fail to setup default irq routing | Wei Yongjun | 2010-03-01 | 1 | -1/+1
| | | | | | | | | | If KVM_CREATE_IRQCHIP fails due to kvm_setup_default_irq_routing(), the ioapic device is not destroyed and kvm->arch.vioapic is not set to NULL, which may cause KVM_GET_IRQCHIP and KVM_SET_IRQCHIP to access unexpected memory. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrl | Wei Yongjun | 2010-03-01 | 5 | -4/+28
| | | | | | | | | If we fail to init the ioapic device or fail to set up the default irq routing, the devices registered by kvm_create_pic() and kvm_ioapic_init() remain registered. This patch unregisters them on the failure path. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failure | Wei Yongjun | 2010-03-01 | 1 | -1/+3
| | | | | | | | kvm->arch.vioapic should be NULL if kvm_ioapic_init() fails because the io device cannot be registered. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PIT: unregister kvm irq notifier if fail to create pit | Wei Yongjun | 2010-03-01 | 1 | -2/+3
| | | | | | | | | If we fail to create the pit, we should unregister the kvm irq notifier that was registered in kvm_create_pit(). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BIT | Sheng Yang | 2010-03-01 | 2 | -3/+3
| | | | | | | Following the new SDM. Now the bit is named "Ignore PAT memory type". Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Add tracepoint for guest page aging | Avi Kivity | 2010-03-01 | 2 | -3/+30
| | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix Codestyle in virt/kvm/coalesced_mmio.c | Jochen Maes | 2010-03-01 | 1 | -2/+2
| | | | | | | Fixed 2 codestyle issues in virt/kvm/coalesced_mmio.c Signed-off-by: Jochen Maes <jochen.maes@sejo.be> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Remove redundant reading of rax on OUT instructions | Takuya Yoshikawa | 2010-03-01 | 1 | -2/+4
| kvm_emulate_pio() and complete_pio() both read out the RAX register value and copy it to a place into which the value read out from the port will be copied later. This patch removes this redundancy.
|
| /*** snippet from arch/x86/kvm/x86.c ***/
| int complete_pio(struct kvm_vcpu *vcpu)
| {
|         ...
|         if (!io->string) {
|                 if (io->in) {
|                         val = kvm_register_read(vcpu, VCPU_REGS_RAX);
|                         memcpy(&val, vcpu->arch.pio_data, io->size);
|                         kvm_register_write(vcpu, VCPU_REGS_RAX, val);
|                 }
|         ...
|
| Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: emulate accessed bit for EPT | Rik van Riel | 2010-03-01 | 1 | -2/+8
| | | | | | | | | | | | | | | | | | | Currently KVM pretends that pages with EPT mappings never got accessed. This has some side effects in the VM, like swapping out actively used guest pages and needlessly breaking up actively used hugepages. We can avoid those very costly side effects by emulating the accessed bit for EPT PTEs, which should only be slightly costly because pages pass through page_referenced infrequently. TLB flushing is taken care of by kvm_mmu_notifier_clear_flush_young(). This seems to help prevent KVM guests from being swapped out when they should not on my system. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Introduce kvm_host_page_size | Joerg Roedel | 2010-03-01 | 3 | -16/+28
| | | | | | | | | | This patch introduces a generic function to find out the host page size for a given gfn. This function is needed by the kvm iommu code. This patch also simplifies the x86 host_mapping_level function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Remove redundant test in vmx_set_efer() | Julia Lawall | 2010-03-01 | 1 | -2/+0
| msr was tested above, so the second test is not needed.
|
| A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/)
|
| // <smpl>
| @r@
| expression *x;
| expression e;
| identifier l;
| @@
|
| if (x == NULL || ...) {
|     ... when forall
|     return ...; }
| ... when != goto l;
|     when != x = e
|     when != &x
| *x == NULL
| // </smpl>
|
| Signed-off-by: Julia Lawall <julia@diku.dk>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: ia64: Fix string literal continuation lines | Joe Perches | 2010-03-01 | 2 | -4/+4
| | | | | | | | String constants that are continued on subsequent lines with \ are not good. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Wire up .fpu_activate() callback | Avi Kivity | 2010-03-01 | 1 | -0/+1
| | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: fix kvm_fix_hypercall() to return X86EMUL_* | Takuya Yoshikawa | 2010-03-01 | 1 | -6/+1
| | | | | | | | | | | | | | | This patch fixes kvm_fix_hypercall() to propagate X86EMUL_* info generated by emulator_write_emulated() to its callers: suggested by Marcelo. The effect of this is x86_emulate_insn() will begin to handle the page faults which occur in emulator_write_emulated(): this should be OK because emulator_write_emulated_onepage() always injects page fault when emulator_write_emulated() returns X86EMUL_PROPAGATE_FAULT. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: fix load_guest_segment_descriptor() to return X86EMUL_* | Takuya Yoshikawa | 2010-03-01 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes load_guest_segment_descriptor() to return X86EMUL_PROPAGATE_FAULT when it tries to access the descriptor table beyond its limit: suggested by Marcelo. I have checked current callers of this helper function, - kvm_load_segment_descriptor() - kvm_task_switch() and confirmed that this patch will change nothing in the upper layers if we do not change the handling of this return value from load_guest_segment_descriptor(). Next step: Although fixing kvm_task_switch() to handle the propagated faults properly seems difficult, and maybe not worth it because the TSS is not commonly used these days, we can fix kvm_load_segment_descriptor(). By doing so, the injected #GP can then be handled by the guest. The only problem with this is how to differentiate this fault from the page faults generated by kvm_read_guest_virt(). We may have to split this function to achieve this goal. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
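The "beyond the limit" condition mentioned in this message can be sketched as a bounds check on the descriptor table. This is a hedged user-space illustration of the architectural rule, not the kernel function: struct seg_desc, selector_index() and descriptor_within_limit() are hypothetical names, and the table limit is taken to be the offset of the last valid byte, as in GDTR/LDTR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Layout of a legacy 8-byte segment descriptor (fields for illustration). */
struct seg_desc {
    uint16_t limit0;
    uint16_t base0;
    uint8_t  base1;
    uint8_t  type_attr;
    uint8_t  limit1_flags;
    uint8_t  base2;
};

static_assert(sizeof(struct seg_desc) == 8, "descriptors are 8 bytes");

/* Selector bits 15..3 index the table; bit 2 is TI, bits 1..0 are RPL. */
static unsigned selector_index(uint16_t selector)
{
    return selector >> 3;
}

/*
 * The whole descriptor must lie inside the table: the offset of its
 * last byte may not exceed the table limit. A failing check is where
 * a fault would be propagated instead of reading the descriptor.
 */
static bool descriptor_within_limit(uint16_t selector, uint32_t table_limit)
{
    uint32_t last_byte = selector_index(selector) * (uint32_t)sizeof(struct seg_desc)
                         + (uint32_t)sizeof(struct seg_desc) - 1;

    return last_byte <= table_limit;
}
```

For example, selector 0x10 refers to entry 2, whose last byte sits at offset 23, so a table limit of 22 is too small while a limit of 23 is just sufficient.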
* KVM: enable PCI multiple-segments for pass-through device | Zhai, Edwin | 2010-03-01 | 5 | -5/+14
| | | | | | | | | Enable optional parameter (default 0) - PCI segment (or domain) besides BDF, when assigning PCI device to guest. Signed-off-by: Zhai Edwin <edwin.zhai@intel.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Remove redundant check in vm_need_virtualize_apic_accesses() | Gui Jianfeng | 2010-03-01 | 1 | -3/+1
| | | | | | | | flexpriority_enabled implies cpu_has_vmx_virtualize_apic_accesses() returning true, so we don't need this check here. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Trace failed msr reads and writes | Avi Kivity | 2010-03-01 | 3 | -13/+22
| | | | | | | Record failed msr reads and writes, along with the fact that they failed. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Fix msr trace | Avi Kivity | 2010-03-01 | 1 | -8/+8
| | | | | | | | - data is 64 bits wide, not unsigned long - rw is confusingly named Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: mark segments accessed on HW task switch | Gleb Natapov | 2010-03-01 | 1 | -13/+9
| | | | | | | | On HW task switch, newly loaded segments should be marked as accessed. Reported-by: Lorenzo Martignoni <martignlo@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Pass cr0.mp through to the guest when the fpu is active | Avi Kivity | 2010-03-01 | 1 | -6/+9
| | | | | | | | | | | | | | When cr0.mp is clear, the guest doesn't expect a #NM in response to a WAIT instruction. Because we always keep cr0.mp set, it will get a #NM, and potentially be confused. Fix by keeping cr0.mp set only when the fpu is inactive, and passing it through when active. Reported-by: Lorenzo Martignoni <martignlo@gmail.com> Analyzed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: PPC E500: fix tlbcfg emulation | Liu Yu | 2010-03-01 | 3 | -18/+10
| | | | | | | | | | | | | Commit 55fb1027c1cf9797dbdeab48180da530e81b1c39 doesn't update tlbcfg correctly. Fix it. Since the guest OS expects 'fixed' hardware, initializing tlbcfg every time the guest accesses it is useless, so move this part to the init code. Signed-off-by: Liu Yu <yu.liu@freescale.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: PPC: Add PVR/PIR init for E500 | Liu Yu | 2010-03-01 | 1 | -0/+6
| | | | | | | | | | Commit 513579e3a391a3874c478a8493080822069976e8 changed the way we emulate PVR/PIR, which left PVR/PIR uninitialized on E500 and confused the guest. Signed-off-by: Liu Yu <yu.liu@freescale.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: PPC E500: Add register l1csr0 emulation | Liu Yu | 2010-03-01 | 2 | -0/+7
| | | | | | | | | Recent kernels have started to access l1csr0 to control L1. We just tell the guest that no operation is ongoing. Signed-off-by: Liu Yu <yu.liu@freescale.com> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: Remove some useless code from alloc_mmu_pages() | Wei Yongjun | 2010-03-01 | 1 | -5/+2
| | | | | | | | | If we fail to allocate the page for vcpu->arch.mmu.pae_root, the call to free_mmu_pages() is unnecessary, since it only frees the page allocated for vcpu->arch.mmu.pae_root. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>