summaryrefslogtreecommitdiffstats
path: root/arch/x86 (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'kvm-updates-2.6.26' of ↵Linus Torvalds2008-04-2727-548/+3365
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (147 commits) KVM: kill file->f_count abuse in kvm KVM: MMU: kvm_pv_mmu_op should not take mmap_sem KVM: SVM: remove selective CR0 comment KVM: SVM: remove now obsolete FIXME comment KVM: SVM: disable CR8 intercept when tpr is not masking interrupts KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabled KVM: export kvm_lapic_set_tpr() to modules KVM: SVM: sync TPR value to V_TPR field in the VMCB KVM: ppc: PowerPC 440 KVM implementation KVM: Add MAINTAINERS entry for PowerPC KVM KVM: ppc: Add DCR access information to struct kvm_run ppc: Export tlb_44x_hwater for KVM KVM: Rename debugfs_dir to kvm_debugfs_dir KVM: x86 emulator: fix lea to really get the effective address KVM: x86 emulator: fix smsw and lmsw with a memory operand KVM: x86 emulator: initialize src.val and dst.val for register operands KVM: SVM: force a new asid when initializing the vmcb KVM: fix kvm_vcpu_kick vs __vcpu_run race KVM: add ioctls to save/store mpstate KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_* ...
| * KVM: MMU: kvm_pv_mmu_op should not take mmap_semMarcelo Tosatti2008-04-271-3/+0
| | | | | | | | | | | | | | | | | | | | | | kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down in the MMU processing will take it if necessary, so as it is it can deadlock. Apparently a leftover from the days before slots_lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: SVM: remove selective CR0 commentJoerg Roedel2008-04-271-11/+0
| | | | | | | | | | | | | | | | | | | | | | There is not selective cr0 intercept bug. The code in the comment sets the CR0.PG bit. But KVM sets the CR4.PG bit for SVM always to implement the paged real mode. So the 'mov %eax,%cr0' instruction does not change the CR0.PG bit. Selective CR0 intercepts only occur when a bit is actually changed. So its the right behavior that there is no intercept on this instruction. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: SVM: remove now obsolete FIXME commentJoerg Roedel2008-04-271-7/+0
| | | | | | | | | | | | | | With the usage of the V_TPR field this comment is now obsolete. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: SVM: disable CR8 intercept when tpr is not masking interruptsJoerg Roedel2008-04-271-4/+27
| | | | | | | | | | | | | | | | | | This patch disables the intercept of CR8 writes if the TPR is not masking interrupts. This reduces the total number CR8 intercepts to below 1 percent of what we have without this patch using Windows 64 bit guests. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: SVM: sync V_TPR with LAPIC.TPR if CR8 write intercept is disabledJoerg Roedel2008-04-271-0/+12
| | | | | | | | | | | | | | | | If the CR8 write intercept is disabled the V_TPR field of the VMCB needs to be synced with the TPR field in the local apic. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: export kvm_lapic_set_tpr() to modulesJoerg Roedel2008-04-271-0/+1
| | | | | | | | | | | | | | | | This patch exports the kvm_lapic_set_tpr() function from the lapic code to modules. It is required in the kvm-amd module to optimize CR8 intercepts. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: SVM: sync TPR value to V_TPR field in the VMCBJoerg Roedel2008-04-271-2/+16
| | | | | | | | | | | | | | | | This patch adds syncing of the lapic.tpr field to the V_TPR field of the VMCB. With this change we can safely remove the CR8 read intercept. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86 emulator: fix lea to really get the effective addressAvi Kivity2008-04-271-1/+1
| | | | | | | | | | | | We never hit this, since there is currently no reason to emulate lea. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86 emulator: fix smsw and lmsw with a memory operandAvi Kivity2008-04-271-12/+17
| | | | | | | | | | | | | | | | lmsw and smsw were implemented only with a register operand. Extend them to support a memory operand as well. Fixes Windows running some display compatibility test on AMD hosts. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86 emulator: initialize src.val and dst.val for register operandsAvi Kivity2008-04-271-0/+2
| | | | | | | | | | | | This lets us treat the case where mod == 3 in the same manner as other cases. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: SVM: force a new asid when initializing the vmcbAvi Kivity2008-04-271-1/+1
| | | | | | | | | | | | | | Shutdown interception clears the vmcb, leaving the asid at zero (which is illegal. so force a new asid on vmcb initialization. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: fix kvm_vcpu_kick vs __vcpu_run raceMarcelo Tosatti2008-04-271-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a window open between testing of pending IRQ's and assignment of guest_mode in __vcpu_run. Injection of IRQ's can race with __vcpu_run as follows: CPU0 CPU1 kvm_x86_ops->run() vcpu->guest_mode = 0 SET_IRQ_LINE ioctl .. kvm_x86_ops->inject_pending_irq kvm_cpu_has_interrupt() apic_test_and_set_irr() kvm_vcpu_kick if (vcpu->guest_mode) send_ipi() vcpu->guest_mode = 1 So move guest_mode=1 assignment before ->inject_pending_irq, and make sure that it won't reorder after it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: add ioctls to save/store mpstateMarcelo Tosatti2008-04-271-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | So userspace can save/restore the mpstate during migration. [avi: export the #define constants describing the value] [christian: add s390 stubs] [avi: ditto for ia64] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: Rename VCPU_MP_STATE_* to KVM_MP_STATE_*Avi Kivity2008-04-273-18/+18
| | | | | | | | | | | | We wish to export it to userspace, so move it into the kvm namespace. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: hlt emulation should take in-kernel APIC/PIT timers into accountMarcelo Tosatti2008-04-274-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Timers that fire between guest hlt and vcpu_block's add_wait_queue() are ignored, possibly resulting in hangs. Also make sure that atomic_inc and waitqueue_active tests happen in the specified order, otherwise the following race is open: CPU0 CPU1 if (waitqueue_active(wq)) add_wait_queue() if (!atomic_read(pit_timer->pending)) schedule() atomic_inc(pit_timer->pending) Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: SVM: do not intercept task switch with NPTJoerg Roedel2008-04-271-0/+1
| | | | | | | | | | | | | | | | When KVM uses NPT there is no reason to intercept task switches. This patch removes the intercept for it in that case. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: Add kvm trace userspace interfaceFeng(Eric) Liu2008-04-272-0/+14
| | | | | | | | | | | | | | | | This interface allows user a space application to read the trace of kvm related events through relayfs. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: Add trace markersFeng (Eric) Liu2008-04-272-1/+60
| | | | | | | | | | | | | | | | Trace markers allow userspace to trace execution of a virtual machine in order to monitor its performance. Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: SVM: add intercept for machine check exceptionJoerg Roedel2008-04-271-1/+16
| | | | | | | | | | | | | | | | | | To properly forward a MCE occured while the guest is running to the host, we have to intercept this exception and call the host handler by hand. This is implemented by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: SVM: align shadow CR4.MCE with hostJoerg Roedel2008-04-271-0/+3
| | | | | | | | | | | | | | | | This patch aligns the host version of the CR4.MCE bit with the CR4 active in the guest. This is necessary to get MCE exceptions when the guest is running. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: SVM: indent svm_set_cr4 with tabs instead of spacesJoerg Roedel2008-04-271-4/+4
| | | | | | | | | | | | | | | | The svm_set_cr4 function is indented with spaces. This patch replaces them with tabs. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: Don't assume struct page for x86Anthony Liguori2008-04-272-59/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a gfn_to_pfn() function and corresponding functions like kvm_release_pfn_dirty(). Using these new functions, we can modify the x86 MMU to no longer assume that it can always get a struct page for any given gfn. We don't want to eliminate gfn_to_page() entirely because a number of places assume they can do gfn_to_page() and then kmap() the results. When we support IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will succeed. This does not implement support for avoiding reference counting for reserved RAM or for IO memory. However, it should make those things pretty straight forward. Since we're only introducing new common symbols, I don't think it will break the non-x86 architectures but I haven't tested those. I've tested Intel, AMD, NPT, and hugetlbfs with Windows and Linux guests. [avi: fix overflow when shifting left pfns by adding casts] Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: prepopulate guest pages after write-protectingMarcelo Tosatti2008-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Zdenek reported a bug where a looping "dmsetup status" eventually hangs on SMP guests. The problem is that kvm_mmu_get_page() prepopulates the shadow MMU before write protecting the guest page tables. By doing so, it leaves a window open where the guest can mark a pte as present while the host has shadow cached such pte as "notrap". Accesses to such address will fault in the guest without the host having a chance to fix the situation. Fix by moving the write protection before the pte prefetch. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: Only mark_page_accessed() if the page was accessed by the guestAvi Kivity2008-04-271-1/+2
| | | | | | | | | | | | | | | | | | | | If the accessed bit is not set, the guest has never accessed this page (at least through this spte), so there's no need to mark the page accessed. This provides more accurate data for the eviction algortithm. Noted by Andrea Arcangeli. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: Free apic access page on vm destructionAvi Kivity2008-04-271-0/+2
| | | | | | | | | | | | Noticed by Marcelo Tosatti. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: allow the vm to shrink the kvm mmu shadow cachesIzik Eidus2008-04-271-2/+56
| | | | | | | | | | | | | | Allow the Linux memory manager to reclaim memory in the kvm shadow cache. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: unify slots_lock usageMarcelo Tosatti2008-04-274-51/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | Unify slots_lock acquision around vcpu_run(). This is simpler and less error-prone. Also fix some callsites that were not grabbing the lock properly. [avi: drop slots_lock while in guest mode to avoid holding the lock for indefinite periods] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: VMX: Enable MSR Bitmap featureSheng Yang2008-04-271-7/+60
| | | | | | | | | | | | | | | | | | MSR Bitmap controls whether the accessing of an MSR causes VM Exit. Eliminating exits on automatically saved and restored MSRs yields a small performance gain. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86: hardware task switching supportIzik Eidus2008-04-275-3/+498
| | | | | | | | | | | | | | | | | | This emulates the x86 hardware task switch mechanism in software, as it is unsupported by either vmx or svm. It allows operating systems which use it, like freedos, to run as kvm guests. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: x86: add functions to get the cpl of vcpuIzik Eidus2008-04-272-0/+23
| | | | | | | | | | Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: VMX: Add module option to disable flexpriorityAvi Kivity2008-04-271-2/+6
| | | | | | | | | | | | Useful for debugging. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: no longer EXPERIMENTALAvi Kivity2008-04-271-1/+1
| | | | | | | | | | | | Long overdue. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: Introduce and use spte_to_page()Avi Kivity2008-04-271-5/+12
| | | | | | | | | | | | Encapsulate the pte mask'n'shift in a function. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: fix dirty bit setting when removing write permissionsIzik Eidus2008-04-271-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When mmu_set_spte() checks if a page related to spte should be release as dirty or clean, it check if the shadow pte was writeble, but in case rmap_write_protect() is called called it is possible for shadow ptes that were writeble to become readonly and therefor mmu_set_spte will release the pages as clean. This patch fix this issue by marking the page as dirty inside rmap_write_protect(). Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: Set the accessed bit on non-speculative shadow ptesAvi Kivity2008-04-272-5/+7
| | | | | | | | | | | | | | | | | | If we populate a shadow pte due to a fault (and not speculatively due to a pte write) then we can set the accessed bit on it, as we know it will be set immediately on the next guest instruction. This saves a read-modify-write operation. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * x86: KVM guest: disable clock before rebooting.Glauber Costa2008-04-271-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch writes 0 (actually, what really matters is that the LSB is cleared) to the system time msr before shutting down the machine for kexec. Without it, we can have a random memory location being written when the guest comes back It overrides the functions shutdown, used in the path of kernel_kexec() (sys.c) and crash_shutdown, used in the path of crash_kexec() (kexec.c) Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * x86: make native_machine_shutdown non-staticGlauber Costa2008-04-271-1/+1
| | | | | | | | | | | | | | | | | | | | it will allow external users to call it. It is mainly useful for routines that will override its machine_ops field for its own special purposes, but want to call the normal shutdown routine after they're done Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * x86: allow machine_crash_shutdown to be replacedGlauber Costa2008-04-272-2/+12
| | | | | | | | | | | | | | | | | | This patch a llows machine_crash_shutdown to be replaced, just like any of the other functions in machine_ops Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * x86: KVM guest: hypercall batchingMarcelo Tosatti2008-04-271-2/+60
| | | | | | | | | | | | | | | | | | | | | | Batch pte updates and tlb flushes in lazy MMU mode. [avi: - adjust to mmu_op - helper for getting para_state without debug warnings] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * x86: KVM guest: hypercall based pte updates and TLB flushesMarcelo Tosatti2008-04-271-0/+138
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hypercall based pte updates are faster than faults, and also allow use of the lazy MMU mode to batch operations. Don't report the feature if two dimensional paging is enabled. [avi: - guest/host split - fix 32-bit truncation issues - adjust to mmu_op - adjust to ->release_*() renamed - add ->release_pud()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: MMU: hypercall based pte updates and TLB flushesMarcelo Tosatti2008-04-272-2/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hypercall based pte updates are faster than faults, and also allow use of the lazy MMU mode to batch operations. Don't report the feature if two dimensional paging is enabled. [avi: - one mmu_op hypercall instead of one per op - allow 64-bit gpa on hypercall - don't pass host errors (-ENOMEM) to guest] [akpm: warning fix on i386] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: Provide unlocked version of emulator_write_phys()Avi Kivity2008-04-271-7/+14
| | | | | | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
| * x86: KVM guest: add basic paravirt supportMarcelo Tosatti2008-04-275-0/+64
| | | | | | | | | | | | | | Add basic KVM paravirt support. Avoid vm-exits on IO delays. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: add basic paravirt supportMarcelo Tosatti2008-04-271-0/+1
| | | | | | | | | | | | | | Add basic KVM paravirt support. Avoid vm-exits on IO delays. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: Add reset support for in kernel PITSheng Yang2008-04-272-11/+20
| | | | | | | | | | | | | | Separate the reset part and prepare for reset support. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: Add save/restore supporting of in kernel PITSheng Yang2008-04-273-0/+56
| | | | | | | | | | Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: In kernel PIT modelSheng Yang2008-04-275-1/+661
| | | | | | | | | | | | | | | | | | | | | | | | The patch moves the PIT model from userspace to kernel, and increases the timer accuracy greatly. [marcelo: make last_injected_time per-guest] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: Remove pointless desc_ptr #ifdefAvi Kivity2008-04-271-4/+0
| | | | | | | | | | | | The desc_struct changes left an unnecessary #ifdef; remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * KVM: VMX: Don't adjust tsc offset forwardAvi Kivity2008-04-271-3/+6
| | | | | | | | | | | | | | | | Most Intel hosts have a stable tsc, and playing with the offset only reduces accuracy. By limiting tsc offset adjustment only to forward updates, we effectively disable tsc offset adjustment on these hosts. Signed-off-by: Avi Kivity <avi@qumranet.com>