linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	2012-07-24	12	-79/+260
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull KVM updates from Avi Kivity: "Highlights include - full big real mode emulation on pre-Westmere Intel hosts (can be disabled with emulate_invalid_guest_state=0) - relatively small ppc and s390 updates - PCID/INVPCID support in guests - EOI avoidance; 3.6 guests should perform better on 3.6 hosts on interrupt intensive workloads) - Lockless write faults during live migration - EPT accessed/dirty bits support for new Intel processors" Fix up conflicts in: - Documentation/virtual/kvm/api.txt: Stupid subchapter numbering, added next to each other. - arch/powerpc/kvm/booke_interrupts.S: PPC asm changes clashing with the KVM fixes - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c: Duplicated commits through the kvm tree and the s390 tree, with subsequent edits in the KVM tree. * tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) KVM: fix race with level interrupts x86, hyper: fix build with !CONFIG_KVM_GUEST Revert "apic: fix kvm build on UP without IOAPIC" KVM guest: switch to apic_set_eoi_write, apic_write apic: add apic_set_eoi_write for PV use KVM: VMX: Implement PCID/INVPCID for guests with EPT KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check KVM: PPC: Critical interrupt emulation support KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests KVM: PPC64: booke: Set interrupt computation mode for 64-bit host KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt KVM: PPC: bookehv64: Add support for std/ld emulation. booke: Added crit/mc exception handler for e500v2 booke/bookehv: Add host crit-watchdog exception support KVM: MMU: document mmu-lock and fast page fault KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint KVM: MMU: trace fast page fault KVM: MMU: fast path of handling guest page fault KVM: MMU: introduce SPTE_MMU_WRITEABLE bit KVM: MMU: fold tlb flush judgement into mmu_spte_update ...
\| *	KVM: PPC: Critical interrupt emulation support	Bharat Bhushan	2012-07-11	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	rfci instruction and CSRR0/1 registers are emulated. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests	Mihai Caraman	2012-07-11	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tlbilxva emulation was using an u32 variable for guest effective address. Replace it with gva_t type to handle 64-bit guests. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC64: booke: Set interrupt computation mode for 64-bit host	Mihai Caraman	2012-07-11	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	64-bit host needs to remain in 64-bit mode when an exception take place. Set interrupt computaion mode in EPCR register. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt	Mihai Caraman	2012-07-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ESR register is required by Data Storage Interrupt handling code. Add the specific flag to the interrupt handler. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: bookehv64: Add support for std/ld emulation.	Varun Sethi	2012-07-11	1	-0/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for std/ld emulation. Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	booke: Added crit/mc exception handler for e500v2	Bharat Bhushan	2012-07-11	1	-28/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Watchdog is taken at critical exception level. So this patch is tested with host watchdog exception happening when guest is running. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	booke/bookehv: Add host crit-watchdog exception support	Bharat Bhushan	2012-07-11	1	-0/+21
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: booke: Added DECAR support	Bharat Bhushan	2012-05-30	3	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Added the decrementer auto-reload support. DECAR is readable on e500v2/e500mc and later cpus. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: Book3S HV: Make the guest hash table size configurable	Paul Mackerras	2012-05-30	5	-48/+153
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a new ioctl to enable userspace to control the size of the guest hashed page table (HPT) and to clear it out when resetting the guest. The KVM_PPC_ALLOCATE_HTAB ioctl is a VM ioctl and takes as its parameter a pointer to a u32 containing the desired order of the HPT (log base 2 of the size in bytes), which is updated on successful return to the actual order of the HPT which was allocated. There must be no vcpus running at the time of this ioctl. To enforce this, we now keep a count of the number of vcpus running in kvm->arch.vcpus_running. If the ioctl is called when a HPT has already been allocated, we don't reallocate the HPT but just clear it out. We first clear the kvm->arch.rma_setup_done flag, which has two effects: (a) since we hold the kvm->lock mutex, it will prevent any vcpus from starting to run until we're done, and (b) it means that the first vcpu to run after we're done will re-establish the VRMA if necessary. If userspace doesn't call this ioctl before running the first vcpu, the kernel will allocate a default-sized HPT at that point. We do it then rather than when creating the VM, as the code did previously, so that userspace has a chance to do the ioctl if it wants. When allocating the HPT, we can allocate either from the kernel page allocator, or from the preallocated pool. If userspace is asking for a different size from the preallocated HPTs, we first try to allocate using the kernel page allocator. Then we try to allocate from the preallocated pool, and then if that fails, we try allocating decreasing sizes from the kernel page allocator, down to the minimum size allowed (256kB). Note that the kernel page allocator limits allocations to 1 << CONFIG_FORCE_MAX_ZONEORDER pages, which by default corresponds to 16MB (on 64-bit powerpc, at least). Signed-off-by: Paul Mackerras <paulus@samba.org> [agraf: fix module compilation] Signed-off-by: Alexander Graf <agraf@suse.de>
* \|	Merge branch 'next' of ↵	Linus Torvalds	2012-07-24	6	-416/+400
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "Notable highlights: - iommu improvements from Anton removing the per-iommu global lock in favor of dividing the DMA space into pools, each with its own lock, and hashed on the CPU number. Along with making the locking more fine grained, this gives significant improvements in multiqueue networking scalability. - Still from Anton, we know provide a vdso based variant of getcpu which makes sched_getcpu with the appropriate glibc patch something like 18 times faster. - More anton goodness (he's been busy !) in other areas such as a faster __clear_user and copy_page on P7, various perf fixes to improve sampling quality, etc... - One more step toward removing legacy i2c interfaces by using new device-tree based probing of platform devices for the AOA audio drivers - A nice series of patches from Michael Neuling that helps avoiding confusion between register numbers and litterals in assembly code, trying to enforce the use of "%rN" register names in gas rather than plain numbers. - A pile of FSL updates - The usual bunch of small fixes, cleanups etc... You may spot a change to drivers/char/mem. The patch got no comment or ack from outside, it's a trivial patch to allow the architecture to skip creating /dev/port, which we use to disable it on ppc64 that don't have a legacy brige. On those, IO ports 0...64K are not mapped in kernel space at all, so accesses to /dev/port cause oopses (and yes, distros -still- ship userspace that bangs hard coded ports such as kbdrate)." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits) powerpc/mpic: Create a revmap with enough entries for IPIs and timers Remove stale .rej file powerpc/iommu: Fix iommu pool initialization powerpc/eeh: Check handle_eeh_events() return value powerpc/85xx: Add phy nodes in SGMII mode for MPC8536/44/72DS & P2020DS powerpc/e500: add paravirt QEMU platform powerpc/mpc85xx_ds: convert to unified PCI init powerpc/fsl-pci: get PCI init out of board files powerpc/85xx: Update corenet64_smp_defconfig powerpc/85xx: Update corenet32_smp_defconfig powerpc/85xx: Rename P1021RDB-PC device trees to be consistent powerpc/watchdog: move booke watchdog param related code to setup-common.c sound/aoa: Adapt to new i2c probing scheme i2c/powermac: Improve detection of devices from device-tree powerpc: Disable /dev/port interface on systems without an ISA bridge of: Improve prom_update_property() function powerpc: Add "memory" attribute for mfmsr() powerpc/ftrace: Fix assembly trampoline register usage powerpc/hw_breakpoints: Fix incorrect pointer access powerpc: Put the gpr save/restore functions in their own section ...
\| * \|	powerpc: Add VDSO version of getcpu	Anton Blanchard	2012-07-11	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have a request for a fast method of getting CPU and NUMA node IDs from userspace. This patch implements a getcpu VDSO function, similar to x86. Ben suggested we use SPRG3 which is userspace readable. SPRG3 can be modified by a KVM guest, so we save the SPRG3 value in the paca and restore it when transitioning from the guest to the host. I have a glibc patch that implements sched_getcpu on top of this. Testing on a POWER7: baseline: 538 cycles vdso: 30 cycles Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
\| * \|	powerpc: Use CURRENT_THREAD_INFO instead of open coded assembly	Stuart Yoder	2012-07-11	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
\| * \|	powerpc: Enforce usage of R0-R31 where possible	Michael Neuling	2012-07-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enforce the use of R0-R31 in macros where possible now we have all the fixes in. R0-R31 macros are removed here so that can't be used anymore. They should not be defined anywhere. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
\| * \|	powerpc: Move and fix MTMSR_EERI definition	Benjamin Herrenschmidt	2012-07-10	2	-3/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move this duplicated definition to ppc_asm.h and remove the braces which prevent the use of %rN register names Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
\| * \|	powerpc: Merge VCPU_GPR	Michael Neuling	2012-07-10	4	-14/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge the defines of VCPU_GPR from different places. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
\| * \|	powerpc: Fix usage of register macros getting ready for %r0 change	Michael Neuling	2012-07-10	4	-393/+393
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Anything that uses a constructed instruction (ie. from ppc-opcode.h), need to use the new R0 macro, as %r0 is not going to work. Also convert usages of macros where we are just determining an offset (usually for a load/store), like: std r14,STK_REG(r14)(r1) Can't use STK_REG(r14) as %r14 doesn't work in the STK_REG macro since it's just calculating an offset. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* \| \|	Merge branch 'for-upstream-master' of git://github.com/agraf/linux-2.6	Avi Kivity	2012-07-11	1	-0/+1
\|\ \ \ \| \|/ / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PPC fix from Alex Graf: "It contains an important bug fix which can lead to guest freezes when using PAPR guests with PR KVM." * 'for-upstream-master' of git://github.com/agraf/linux-2.6: powerpc/kvm: Fix "PR" KVM implementation of H_CEDE Signed-off-by: Avi Kivity <avi@redhat.com>
\| * \|	powerpc/kvm: Fix "PR" KVM implementation of H_CEDE	Benjamin Herrenschmidt	2012-07-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	H_CEDE should enable the vcpu's MSR:EE bit. It does on "HV" KVM (it's burried in the assembly code though) and as far as I can tell, qemu does it as well. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* \| \|	powerpc/kvm: sldi should be sld	Michael Neuling	2012-07-02	1	-1/+1
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since we are taking a registers, this should never have been an sldi. Talking to paulus offline, this is the correct fix. Was introduced by: commit 19ccb76a1938ab364a412253daec64613acbf3df Author: Paul Mackerras <paulus@samba.org> Date: Sat Jul 23 17:42:46 2011 +1000 Talking to paulus, this shouldn't be a literal. Signed-off-by: Michael Neuling <mikey@neuling.org> CC: <stable@kernel.org> [v3.2+] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* /	KVM: PPC: Book3S HV: Drop locks around call to kvmppc_pin_guest_page	Paul Mackerras	2012-06-19	1	-30/+66
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At the moment we call kvmppc_pin_guest_page() in kvmppc_update_vpa() with two spinlocks held: the vcore lock and the vcpu->vpa_update_lock. This is not good, since kvmppc_pin_guest_page() calls down_read() and get_user_pages_fast(), both of which can sleep. This bug was introduced in 2e25aa5f ("KVM: PPC: Book3S HV: Make virtual processor area registration more robust"). This arranges to drop those spinlocks before calling kvmppc_pin_guest_page() and re-take them afterwards. Dropping the vcore lock in kvmppc_run_core() means we have to set the vcore_state field to VCORE_RUNNING before we drop the lock, so that other vcpus won't try to run this vcore. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
*	Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	2012-05-25	30	-1281/+3583
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull KVM changes from Avi Kivity: "Changes include additional instruction emulation, page-crossing MMIO, faster dirty logging, preventing the watchdog from killing a stopped guest, module autoload, a new MSI ABI, and some minor optimizations and fixes. Outside x86 we have a small s390 and a very large ppc update. Regarding the new (for kvm) rebaseless workflow, some of the patches that were merged before we switch trees had to be rebased, while others are true pulls. In either case the signoffs should be correct now." Fix up trivial conflicts in Documentation/feature-removal-schedule.txt arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h. I suspect the kvm_para.h resolution ends up doing the "do I have cpuid" check effectively twice (it was done differently in two different commits), but better safe than sorry ;) * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits) KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block KVM: s390: onereg for timer related registers KVM: s390: epoch difference and TOD programmable field KVM: s390: KVM_GET/SET_ONEREG for s390 KVM: s390: add capability indicating COW support KVM: Fix mmu_reload() clash with nested vmx event injection KVM: MMU: Don't use RCU for lockless shadow walking KVM: VMX: Optimize %ds, %es reload KVM: VMX: Fix %ds/%es clobber KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte() KVM: VMX: unlike vmcs on fail path KVM: PPC: Emulator: clean up SPR reads and writes KVM: PPC: Emulator: clean up instruction parsing kvm/powerpc: Add new ioctl to retreive server MMU infos kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler KVM: PPC: Book3S: Enable IRQs during exit handling KVM: PPC: Fix PR KVM on POWER7 bare metal KVM: PPC: Fix stbux emulation KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields ...
\| *	KVM: PPC: Emulator: clean up SPR reads and writes	Alexander Graf	2012-05-06	7	-142/+186
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When reading and writing SPRs, every SPR emulation piece had to read or write the respective GPR the value was read from or stored in itself. This approach is pretty prone to failure. What if we accidentally implement mfspr emulation where we just do "break" and nothing else? Suddenly we would get a random value in the return register - which is always a bad idea. So let's consolidate the generic code paths and only give the core specific SPR handling code readily made variables to read/write from/to. Functionally, this patch doesn't change anything, but it increases the readability of the code and makes is less prone to bugs. Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: Emulator: clean up instruction parsing	Alexander Graf	2012-05-06	5	-137/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instructions on PPC are pretty similarly encoded. So instead of every instruction emulation code decoding the instruction fields itself, we can move that code to more generic places and rely on the compiler to optimize the unused bits away. This has 2 advantages. It makes the code smaller and it makes the code less error prone, as the instruction fields are always available, so accidental misusage is reduced. Functionally, this patch doesn't change anything. Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	kvm/powerpc: Add new ioctl to retreive server MMU infos	Benjamin Herrenschmidt	2012-05-06	3	-1/+74
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is necessary for qemu to be able to pass the right information to the guest, such as the supported page sizes and corresponding encodings in the SLB and hash table, which can vary depending on the processor type, the type of KVM used (PR vs HV) and the version of KVM Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [agraf: fix compilation on hv, adjust for newer ioctl numbers] Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM	Benjamin Herrenschmidt	2012-05-06	7	-111/+186
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is nothing in the code for emulating TCE tables in the kernel that prevents it from working on "PR" KVM... other than ifdef's and location of the code. This and moves the bulk of the code there to a new file called book3s_64_vio.c. This speeds things up a bit on my G5. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [agraf: fix for hv kvm, 32bit, whitespace] Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler	Mihai Caraman	2012-05-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Guest r8 register is held in the scratch register and stored correctly, so remove the instruction that clobbers it. Guest r13 was missing from vcpu, store it there. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: Book3S: Enable IRQs during exit handling	Alexander Graf	2012-05-06	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	While handling an exit, we should listen for interrupts and make sure to receive them when they arrive, to keep our latencies low. Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: Fix PR KVM on POWER7 bare metal	Alexander Graf	2012-05-06	1	-13/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When running on a system that is HV capable, some interrupts use HSRR SPRs instead of the normal SRR SPRs. These are also used in the Linux handlers to jump back to code after an interrupt got processed. Unfortunately, in our "jump back to the real host handler after we've done the context switch" code, we were only setting the SRR SPRs, rendering Linux to jump back to some invalid IP after it's processed the interrupt. This fixes random crashes on p7 opal mode with PR KVM for me. Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: Fix stbux emulation	Alexander Graf	2012-05-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Stbux writes the address it's operating on to the register specified in ra, not into the data source register. Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields	Mihai Caraman	2012-05-06	1	-8/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Interrupt code used PPC_LL/PPC_STL macros to load/store some of u32 fields which led to memory overflow on 64-bit. Use lwz/stw instead. Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: Book3S: PR: No isync in slbie path	Alexander Graf	2012-05-06	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While messing around with the SLBs we're running in real mode. The entry to guest space goes through rfid, which is context synchronizing, so there's no need to manually synchronize anything through isync. With this patch and a simple priviledged SPR access loop guest, I get a speed bump from 2035607 to 2181301 exits per second. Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: Book3S: PR: Optimize entry path	Alexander Graf	2012-05-06	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By shuffling a few instructions around we can execute more memory loads in parallel, giving us a small performance boost. With this patch and a simple priviledged SPR access loop guest, I get a speed bump from 2013052 to 2035607 exits per second. Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: booke(hv): Fix save/restore of guest accessible SPRGs.	Varun Sethi	2012-05-06	2	-12/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For Guest accessible SPRGs 4-7, save/restore must be handled differently for 64bit and non-64 bit case. Use the PPC_STD/PPC_LD macros for saving/restoring to/from these registers. Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: bookehv: Use a Macro for saving/restoring guest registers to/from ↵	Varun Sethi	2012-05-06	1	-20/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	their 64 bit copies. Introduced PPC_STD/PPC_LD macros for saving/restoring guest registers to/from their 64 bit copies. Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: PPC: Use clockevent multiplier and shifter for decrementer	Bharat Bhushan	2012-05-06	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Time for which the hrtimer is started for decrementer emulation is calculated using tb_ticks_per_usec. While hrtimer uses the clockevent for DEC reprogramming (if needed) and which calculate timebase ticks using the multiplier and shifter mechanism implemented within clockevent layer. It was observed that this conversion (timebase->time->timebase) are not correct because the mechanism are not consistent. In our setup it adds 2% jitter. With this patch clockevent multiplier and shifter mechanism are used when starting hrtimer for decrementer emulation. Now the jitter is < 0.5%. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	KVM: Use minimum and maximum address mapped by TLB1	Bharat Bhushan	2012-05-06	2	-2/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Keep track of minimum and maximum address mapped by tlb1. This helps in TLBMISS handling in KVM to quick check whether the address lies in mapped range. If address does not lies in this range then no need to look in each tlb1 entry of tlb1 array. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
\| *	powerpc/kvm: Fix magic page vs. 32-bit RTAS on ppc64	Benjamin Herrenschmidt	2012-04-08	2	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the kernel calls into RTAS, it switches to 32-bit mode. The magic page was is longer accessible in that case, causing the patched instructions in the RTAS call wrapper to crash. This fixes it by making available a 32-bit mapping of the magic page in that case. This mapping is flushed whenever we switch the kernel back to 64-bit mode. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [agraf: add a check if the magic page is mapped] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: PPC: Ignore unhalt request from kvm_vcpu_block	Alexander Graf	2012-04-08	3	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When running kvm_vcpu_block and it realizes that the CPU is actually good to run, we get a request bit set for KVM_REQ_UNHALT. Right now, there's nothing we can do with that bit, so let's unset it right after the call again so we don't get confused in our later checks for pending work. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: PPC: Book3s: PR: Add HV traps so we can run in HV=1 mode on p7	Alexander Graf	2012-04-08	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When running PR KVM on a p7 system in bare metal, we get HV exits instead of normal supervisor traps. Semantically they are identical though and the HSRR vs SRR difference is already taken care of in the exit code. So all we need to do is handle them in addition to our normal exits. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: PPC: Emulate tw and td instructions	Alexander Graf	2012-04-08	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are 4 conditional trapping instructions: tw, twi, td, tdi. The ones with an i take an immediate comparison, the others compare two registers. All of them arrive in the emulator when the condition to trap was successfully fulfilled. Unfortunately, we were only implementing the i versions so far, so let's also add support for the other two. This fixes kernel booting with recents book3s_32 guest kernels. Reported-by: Jörg Sommer <joerg@alea.gnuu.de> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: PPC: Pass EA to updating emulation ops	Alexander Graf	2012-04-08	4	-30/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When emulating updating load/store instructions (lwzu, stwu, ...) we need to write the effective address of the load/store into a register. Currently, we write the physical address in there, which is very wrong. So instead let's save off where the virtual fault was on MMIO and use that information as value to put into the register. While at it, also move the XOP variants of the above instructions to the new scheme of using the already known vaddr instead of calculating it themselves. Reported-by: Jörg Sommer <joerg@alea.gnuu.de> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: PPC: Work around POWER7 DABR corruption problem	Paul Mackerras	2012-04-08	2	-41/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It turns out that on POWER7, writing to the DABR can cause a corrupted value to be written if the PMU is active and updating SDAR in continuous sampling mode. To work around this, we make sure that the PMU is inactive and SDAR updates are disabled (via MMCRA) when we are context-switching DABR. When the guest sets DABR via the H_SET_DABR hypercall, we use a slightly different workaround, which is to read back the DABR and write it again if it got corrupted. While we are at it, make it consistent that the saving and restoring of the guest's non-volatile GPRs and the FPRs are done with the guest setup of the PMU active. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	Restore guest CR after exit timing calculation	Bharat Bhushan	2012-04-08	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	No instruction which can change Condition Register (CR) should be executed after Guest CR is loaded. So the guest CR is restored after the Exit Timing in lightweight_exit executes cmpw, which can clobber CR. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: PPC: Book3S HV: Report stolen time to guest through dispatch trace log	Paul Mackerras	2012-04-08	1	-1/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds code to measure "stolen" time per virtual core in units of timebase ticks, and to report the stolen time to the guest using the dispatch trace log (DTL). The guest can register an area of memory for the DTL for a given vcpu. The DTL is a ring buffer where KVM fills in one entry every time it enters the guest for that vcpu. Stolen time is measured as time when the virtual core is not running, either because the vcore is not runnable (e.g. some of its vcpus are executing elsewhere in the kernel or in userspace), or when the vcpu thread that is running the vcore is preempted. This includes time when all the vcpus are idle (i.e. have executed the H_CEDE hypercall), which is OK because the guest accounts stolen time while idle as idle time. Each vcpu keeps a record of how much stolen time has been reported to the guest for that vcpu so far. When we are about to enter the guest, we create a new DTL entry (if the guest vcpu has a DTL) and report the difference between total stolen time for the vcore and stolen time reported so far for the vcpu as the "enqueue to dispatch" time in the DTL entry. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: PPC: Book3S HV: Make virtual processor area registration more robust	Paul Mackerras	2012-04-08	1	-69/+158
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PAPR API allows three sorts of per-virtual-processor areas to be registered (VPA, SLB shadow buffer, and dispatch trace log), and furthermore, these can be registered and unregistered for another virtual CPU. Currently we just update the vcpu fields pointing to these areas at the time of registration or unregistration. If this is done on another vcpu, there is the possibility that the target vcpu is using those fields at the time and could end up using a bogus pointer and corrupting memory. This fixes the race by making the target cpu itself do the update, so we can be sure that the update happens at a time when the fields aren't being used. Each area now has a struct kvmppc_vpa which is used to manage these updates. There is also a spinlock which protects access to all of the kvmppc_vpa structs, other than to the pinned_addr fields. (We could have just taken the spinlock when using the vpa, slb_shadow or dtl fields, but that would mean taking the spinlock on every guest entry and exit.) This also changes 'struct dtl' (which was undefined) to 'struct dtl_entry', which is what the rest of the kernel uses. Thanks to Michael Ellerman <michael@ellerman.id.au> for pointing out the need to initialize vcpu->arch.vpa_update_lock. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIs	Paul Mackerras	2012-04-08	2	-41/+100
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently on POWER7, if we are running the guest on a core and we don't need all the hardware threads, we do nothing to ensure that the unused threads aren't executing in the kernel (other than checking that they are offline). We just assume they're napping and we don't do anything to stop them trying to enter the kernel while the guest is running. This means that a stray IPI can wake up the hardware thread and it will then try to enter the kernel, but since the core is in guest context, it will execute code from the guest in hypervisor mode once it turns the MMU on, which tends to lead to crashes or hangs in the host. This fixes the problem by adding two new one-byte flags in the kvmppc_host_state structure in the PACA which are used to interlock between the primary thread and the unused secondary threads when entering the guest. With these flags, the primary thread can ensure that the unused secondaries are not already in kernel mode (i.e. handling a stray IPI) and then indicate that they should not try to enter the kernel if they do get woken for any reason. Instead they will go into KVM code, find that there is no vcpu to run, acknowledge and clear the IPI and go back to nap mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: PPC: Save/Restore CR over vcpu_run	Alexander Graf	2012-04-08	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On PPC, CR2-CR4 are nonvolatile, thus have to be saved across function calls. We didn't respect that for any architecture until Paul spotted it in his patch for Book3S-HV. This patch saves/restores CR for all KVM capable PPC hosts. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: PPC: Book3s: PR: Add SPAPR H_BULK_REMOVE support	Matt Evans	2012-04-08	1	-4/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SPAPR support includes various in-kernel hypercalls, improving performance by cutting out the exit to userspace. H_BULK_REMOVE is implemented in this patch. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
\| *	KVM: PPC: Booke: only prepare to enter when we enter	Alexander Graf	2012-04-08	1	-8/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	So far, we've always called prepare_to_enter even when all we did was return to the host. This patch changes that semantic to only call prepare_to_enter when we actually want to get back into the guest. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>