summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'kvm-arm-for-4.1' of ↵Paolo Bonzini2015-04-0747-703/+1012
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups
| * KVM: arm/arm64: enable KVM_CAP_IOEVENTFDNikolay Nikolaev2015-03-301-0/+1
| | | | | | | | | | | | | | | | | | As the infrastructure for eventfd has now been merged, report the ioeventfd capability as being supported. Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> [maz: grouped the case entry with the others, fixed commit log] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: rework MMIO abort handling to use KVM MMIO busAndre Przywara2015-03-308-221/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we have struct kvm_exit_mmio for encapsulating MMIO abort data to be passed on from syndrome decoding all the way down to the VGIC register handlers. Now as we switch the MMIO handling to be routed through the KVM MMIO bus, it does not make sense anymore to use that structure already from the beginning. So we keep the data in local variables until we put them into the kvm_io_bus framework. Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC private structure. On that way we replace the data buffer in that structure with a pointer pointing to a single location in a local variable, so we get rid of some copying on the way. With all of the virtual GIC emulation code now being registered with the kvm_io_bus, we can remove all of the old MMIO handling code and its dispatching functionality. I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something), because that touches a lot of code lines without any good reason. This is based on an original patch by Nikolay. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Cc: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: prepare GICv3 emulation to use kvm_io_bus MMIO handlingAndre Przywara2015-03-302-1/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | Using the framework provided by the recent vgic.c changes, we register a kvm_io_bus device on mapping the virtual GICv3 resources. The distributor mapping is pretty straight forward, but the redistributors need some more love, since they need to be tagged with the respective redistributor (read: VCPU) they are connected with. We use the kvm_io_bus framework to register one devices per VCPU. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: merge GICv3 RD_base and SGI_base register framesAndre Przywara2015-03-301-91/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently we handle the redistributor registers in two separate MMIO regions, one for the overall behaviour and SPIs and one for the SGIs/PPIs. That latter forces the creation of _two_ KVM I/O bus devices for each redistributor. Since the spec mandates those two pages to be contigious, we could as well merge them and save the churn with the second KVM I/O bus device. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: prepare GICv2 emulation to be handled by kvm_io_busAndre Przywara2015-03-262-6/+17
| | | | | | | | | | | | | | | | Using the framework provided by the recent vgic.c changes we register a kvm_io_bus device when initializing the virtual GICv2. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: implement kvm_io_bus MMIO handling for the VGICAndre Przywara2015-03-263-0/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we use a lot of VGIC specific code to do the MMIO dispatching. Use the previous reworks to add kvm_io_bus style MMIO handlers. Those are not yet called by the MMIO abort handler, also the actual VGIC emulator function do not make use of it yet, but will be enabled with the following patches. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: simplify vgic_find_range() and callersAndre Przywara2015-03-263-17/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | The vgic_find_range() function in vgic.c takes a struct kvm_exit_mmio argument, but actually only used the length field in there. Since we need to get rid of that structure in that part of the code anyway, let's rework the function (and it's callers) to pass the length argument to the function directly. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: rename struct kvm_mmio_range to vgic_io_rangeAndre Przywara2015-03-264-22/+22
| | | | | | | | | | | | | | | | | | | | | | The name "kvm_mmio_range" is a bit bold, given that it only covers the VGIC's MMIO ranges. To avoid confusion with kvm_io_range, rename it to vgic_io_range. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: x86: remove now unneeded include directory from MakefileAndre Przywara2015-03-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | virt/kvm was never really a good include directory for anything else than locally included headers. With the move of iodev.h there is no need anymore to add this directory the compiler's include path, so remove it from the x86 kvm Makefile. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: remove now unneeded include directory from MakefileAndre Przywara2015-03-262-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | virt/kvm was never really a good include directory for anything else than locally included headers. With the move of iodev.h there is no need anymore to add this directory the compiler's include path, so remove it from the arm and arm64 kvm Makefile. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: move iodev.h from virt/kvm/ to include/kvmAndre Przywara2015-03-269-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | iodev.h contains definitions for the kvm_io_bus framework. This is needed both by the generic KVM code in virt/kvm as well as by architecture specific code under arch/. Putting the header file in virt/kvm and using local includes in the architecture part seems at least dodgy to me, so let's move the file into include/kvm, so that a more natural "#include <kvm/iodev.h>" can be used by all of the code. This also solves a problem later when using struct kvm_io_device in arm_vgic.h. Fixing up the FSF address in the GPL header and a wrong include path on the way. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks.Nikolay Nikolaev2015-03-2614-64/+79
| | | | | | | | | | | | | | | | | | | | | | | | This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access. Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * arm/arm64: KVM: Fix migration race in the arch timerChristoffer Dall2015-03-143-10/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a VCPU is no longer running, we currently check to see if it has a timer scheduled in the future, and if it does, we schedule a host hrtimer to notify is in case the timer expires while the VCPU is still not running. When the hrtimer fires, we mask the guest's timer and inject the timer IRQ (still relying on the guest unmasking the time when it receives the IRQ). This is all good and fine, but when migration a VM (checkpoint/restore) this introduces a race. It is unlikely, but possible, for the following sequence of events to happen: 1. Userspace stops the VM 2. Hrtimer for VCPU is scheduled 3. Userspace checkpoints the VGIC state (no pending timer interrupts) 4. The hrtimer fires, schedules work in a workqueue 5. Workqueue function runs, masks the timer and injects timer interrupt 6. Userspace checkpoints the timer state (timer masked) At restore time, you end up with a masked timer without any timer interrupts and your guest halts never receiving timer interrupts. Fix this by only kicking the VCPU in the workqueue function, and sample the expired state of the timer when entering the guest again and inject the interrupt and mask the timer only then. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: support for un-queuing active IRQsChristoffer Dall2015-03-144-38/+212
| | | | | | | | | | | | | | | | | | | | Migrating active interrupts causes the active state to be lost completely. This implements some additional bitmaps to track the active state on the distributor and export this to user space. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: add a common vgic_queue_irq_to_lr fnAlex Bennée2015-03-141-7/+17
| | | | | | | | | | | | | | | | This helps re-factor away some of the repetitive code and makes the code flow more nicely. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: export VCPU power state via MP_STATE ioctlAlex Bennée2015-03-142-6/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | To cleanly restore an SMP VM we need to ensure that the current pause state of each vcpu is correctly recorded. Things could get confused if the CPU starts running after migration restore completes when it was paused before it state was captured. We use the existing KVM_GET/SET_MP_STATE ioctl to do this. The arm/arm64 interface is a lot simpler as the only valid states are KVM_MP_STATE_RUNNABLE and KVM_MP_STATE_STOPPED. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: Optimize handling of Access Flag faultsMarc Zyngier2015-03-122-0/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have page aging in Stage-2, it becomes obvious that we're doing way too much work handling the fault. The page is not going anywhere (it is still mapped), the page tables are already allocated, and all we want is to flip a bit in the PMD or PTE. Also, we can avoid any form of TLB invalidation, since a page with the AF bit off is not allowed to be cached. An obvious solution is to have a separate handler for FSC_ACCESS, where we pride ourselves to only do the very minimum amount of work. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: Implement Stage-2 page agingMarc Zyngier2015-03-127-23/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, KVM/arm didn't care much for page aging (who was swapping anyway?), and simply provided empty hooks to the core KVM code. With server-type systems now being available, things are quite different. This patch implements very simple support for page aging, by clearing the Access flag in the Stage-2 page tables. On access fault, the current fault handling will write the PTE or PMD again, putting the Access flag back on. It should be possible to implement a much faster handling for Access faults, but that's left for a later patch. With this in place, performance in VMs is degraded much more gracefully. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: Allow handle_hva_to_gpa to return a valueMarc Zyngier2015-03-121-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | So far, handle_hva_to_gpa was never required to return a value. As we prepare to age pages at Stage-2, we need to be able to return a value from the iterator (kvm_test_age_hva). Adapt the code to handle this situation. No semantic change. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: add irqfd supportEric Auger2015-03-129-3/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables irqfd on arm/arm64. Both irqfd and resamplefd are supported. Injection is implemented in vgic.c without routing. This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD. KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set. Irqfd injection is restricted to SPI. The rationale behind not supporting PPI irqfd injection is that any device using a PPI would be a private-to-the-CPU device (timer for instance), so its state would have to be context-switched along with the VCPU and would require in-kernel wiring anyhow. It is not a relevant use case for irqfds. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: remove coarse grain dist locking at kvm_vgic_sync_hwstateEric Auger2015-03-121-8/+5
| | | | | | | | | | | | | | | | | | | | | | To prepare for irqfd addition, coarse grain locking is removed at kvm_vgic_sync_hwstate level and finer grain locking is introduced in vgic_process_maintenance only. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: implement kvm_arch_intc_initializedEric Auger2015-03-123-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | On arm/arm64 the VGIC is dynamically instantiated and it is useful to expose its state, especially for irqfd setup. This patch defines __KVM_HAVE_ARCH_INTC_INITIALIZED and implements kvm_arch_intc_initialized. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: introduce kvm_arch_intc_initialized and use it in irqfdEric Auger2015-03-122-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce __KVM_HAVE_ARCH_INTC_INITIALIZED define and associated kvm_arch_intc_initialized function. This latter allows to test whether the virtual interrupt controller is initialized and ready to accept virtual IRQ injection. On some architectures, the virtual interrupt controller is dynamically instantiated, justifying that kind of check. The new function can now be used by irqfd to check whether the virtual interrupt controller is ready on KVM_IRQFD request. If not, KVM_IRQFD returns -EAGAIN. Signed-off-by: Eric Auger <eric.auger@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * KVM: arm/arm64: unset CONFIG_HAVE_KVM_IRQCHIPEric Auger2015-03-122-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_HAVE_KVM_IRQCHIP is needed to support IRQ routing (along with irq_comm.c and irqchip.c usage). This is not the case for arm/arm64 currently. This patch unsets the flag for both arm and arm64. Signed-off-by: Eric Auger <eric.auger@linaro.org> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * arm/arm64: KVM: Kill CONFIG_KVM_ARM_{VGIC,TIMER}Christoffer Dall2015-03-1210-202/+20
| | | | | | | | | | | | | | | | | | | | | | | | We can definitely decide at run-time whether to use the GIC and timers or not, and the extra code and data structures that we allocate space for is really negligable with this config option, so I don't think it's worth the extra complexity of always having to define stub static inlines. The !CONFIG_KVM_ARM_VGIC/TIMER case is pretty much an untested code path anyway, so we're better off just getting rid of it. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: prefer IS_ENABLED to a static variablePaolo Bonzini2015-03-111-12/+5
| | | | | | | | | | | | | | | | | | | | IS_ENABLED gives compile-time checking and keeps the code clearer. The one exception is inside kvm_vm_ioctl_check_extension, where the established idiom is to wrap the case labels with an #ifdef. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: vgic: add virt-capable compatible stringsMark Rutland2015-03-111-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several dts only list "arm,cortex-a7-gic" or "arm,gic-400" in their GIC compatible list, and while this is correct (and supported by the GIC driver), KVM will fail to detect that it can support these cases. This patch adds the missing strings to the VGIC code. The of_device_id entries are padded to keep the probe function data aligned. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Michal Simek <monstr@monstr.eu> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* | Merge tag 'kvm-arm-fixes-4.0-rc5' of ↵Paolo Bonzini2015-04-078-75/+105
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' Fixes for KVM/ARM for 4.0-rc5. Fixes page refcounting issues in our Stage-2 page table management code, fixes a missing unlock in a gicv3 error path, and fixes a race that can cause lost interrupts if signals are pending just prior to entering the guest.
| * | arm/arm64: KVM: Keep elrsr/aisr in sync with software modelChristoffer Dall2015-03-144-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is an interesting bug in the vgic code, which manifests itself when the KVM run loop has a signal pending or needs a vmid generation rollover after having disabled interrupts but before actually switching to the guest. In this case, we flush the vgic as usual, but we sync back the vgic state and exit to userspace before entering the guest. The consequence is that we will be syncing the list registers back to the software model using the GICH_ELRSR and GICH_EISR from the last execution of the guest, potentially overwriting a list register containing an interrupt. This showed up during migration testing where we would capture a state where the VM has masked the arch timer but there were no interrupts, resulting in a hung test. Cc: Marc Zyngier <marc.zyngier@arm.com> Reported-by: Alex Bennee <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | arm/arm64: KVM: fix missing unlock on error in kvm_vgic_create()Wei Yongjun2015-03-131-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | Add the missing unlock before return from function kvm_vgic_create() in the error handling case. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | arm64: KVM: Fix outdated comment about VTCR_EL2.PSMarc Zyngier2015-03-111-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 87366d8cf7b3 ("arm64: Add boot time configuration of Intermediate Physical Address size") removed the hardcoded setting of VTCR_EL2.PS to use ID_AA64MMFR0_EL1.PARange instead, but didn't remove the (now rather misleading) comment. Fix the comments to match reality (at least for the next few minutes). Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | arm64: KVM: Do not use pgd_index to index stage-2 pgdMarc Zyngier2015-03-113-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel's pgd_index macro is designed to index a normal, page sized array. KVM is a bit diffferent, as we can use concatenated pages to have a bigger address space (for example 40bit IPA with 4kB pages gives us an 8kB PGD. In the above case, the use of pgd_index will always return an index inside the first 4kB, which makes a guest that has memory above 0x8000000000 rather unhappy, as it spins forever in a page fault, whist the host happilly corrupts the lower pgd. The obvious fix is to get our own kvm_pgd_index that does the right thing(tm). Tested on X-Gene with a hacked kvmtool that put memory at a stupidly high address. Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| * | arm64: KVM: Fix stage-2 PGD allocation to have per-page refcountingMarc Zyngier2015-03-113-66/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're using __get_free_pages with to allocate the guest's stage-2 PGD. The standard behaviour of this function is to return a set of pages where only the head page has a valid refcount. This behaviour gets us into trouble when we're trying to increment the refcount on a non-head page: page:ffff7c00cfb693c0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x4000000000000000() page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0) BUG: failure at include/linux/mm.h:548/get_page()! Kernel panic - not syncing: BUG! CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825 Hardware name: APM X-Gene Mustang board (DT) Call trace: [<ffff80000008a09c>] dump_backtrace+0x0/0x13c [<ffff80000008a1e8>] show_stack+0x10/0x1c [<ffff800000691da8>] dump_stack+0x74/0x94 [<ffff800000690d78>] panic+0x100/0x240 [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0 [<ffff8000000a420c>] handle_exit+0x58/0x180 [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754 [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8 [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78 CPU0: stopping A possible approach for this is to split the compound page using split_page() at allocation time, and change the teardown path to free one page at a time. It turns out that alloc_pages_exact() and free_pages_exact() does exactly that. While we're at it, the PGD allocation code is reworked to reduce duplication. This has been tested on an X-Gene platform with a 4kB/48bit-VA host kernel, and kvmtool hacked to place memory in the second page of the hardware PGD (PUD for the host kernel). Also regression-tested on a Cubietruck (Cortex-A7). [ Reworked to use alloc_pages_exact() and free_pages_exact() and to return pointers directly instead of by reference as arguments - Christoffer ] Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
* | | x86: Use bool function return values of true/false not 1/0Joe Perches2015-03-313-38/+38
| | | | | | | | | | | | | | | | | | | | | | | | Use the normal return values for bool functions Signed-off-by: Joe Perches <joe@perches.com> Message-Id: <9f593eb2f43b456851cd73f7ed09654ca58fb570.1427759009.git.joe@perches.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: remove useless check of "ret" variable prior to returning the same valueEugene Korenevsky2015-03-301-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | A trivial code cleanup. This `if` is redundant. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Message-Id: <20150328222717.GA6508@gnote> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: Remove redundant definitionsNadav Amit2015-03-303-15/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some constants are redfined in emulate.c. Avoid it. s/SELECTOR_RPL_MASK/SEGMENT_RPL_MASK s/SELECTOR_TI_MASK/SEGMENT_TI_MASK No functional change. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427635984-8113-3-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: removing redundant eflags bits definitionsNadav Amit2015-03-302-61/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The eflags are redefined (using other defines) in emulate.c. Use the definition from processor-flags.h as some mess already started. No functional change. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427635984-8113-2-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: BSF and BSR emulation change register unnecassarilyNadav Amit2015-03-301-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the source of BSF and BSR is zero, the destination register should not change. That is how real hardware behaves. If we set the destination even with the same value that we had before, we may clear bits [63:32] unnecassarily. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427719163-5429-4-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: POPA emulation may not clear bits [63:32]Nadav Amit2015-03-301-16/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POPA should assign the values to the registers as usual registers are assigned. In other words, 32-bits register assignments should clear bits [63:32] of the register. Split the code of register assignments that will be used by future changes as well. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427719163-5429-3-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: CMOV emulation on legacy mode is wrongNadav Amit2015-03-301-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On legacy mode CMOV emulation should still clear bits [63:32] even if the assignment is not done. The previous fix 140bad89fd ("KVM: x86: emulation of dword cmov on long-mode should clear [63:32]") was incomplete. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427719163-5429-2-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | kvm: x86: i8259: return initialized data on invalid-size readPetr Matousek2015-03-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If data is read from PIC with invalid access size, the return data stays uninitialized even though success is returned. Fix this by always initializing the data. Signed-off-by: Petr Matousek <pmatouse@redhat.com> Reported-by: Nadav Amit <nadav.amit@gmail.com> Message-Id: <20150311111609.GG8544@dhcp-25-225.brq.redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | Merge tag 'kvm_mips_20150327' of ↵Paolo Bonzini2015-03-3023-366/+1877
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/kvm-mips into kvm-next MIPS KVM Guest FPU & SIMD (MSA) Support Add guest FPU and MIPS SIMD Architecture (MSA) support to MIPS KVM, by enabling the host FPU/MSA while in guest mode. This adds two new KVM capabilities, KVM_CAP_MIPS_FPU & KVM_CAP_MIPS_MSA, and supports the 3 FP register modes (FR=0, FR=1, FRE=1), and 128-bit MSA vector registers, with lazy FPU/MSA context save and restore. Some required MIPS FP/MSA fixes are merged in from a branch in the MIPS tree first.
| * | | MIPS: KVM: Wire up MSA capabilityJames Hogan2015-03-273-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the code is in place for KVM to support MIPS SIMD Architecutre (MSA) in MIPS guests, wire up the new KVM_CAP_MIPS_MSA capability. For backwards compatibility, the capability must be explicitly enabled in order to detect or make use of MSA from the guest. The capability is not supported if the hardware supports MSA vector partitioning, since the extra support cannot be tested yet and it extends the state that the userland program would have to save. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: linux-api@vger.kernel.org Cc: linux-doc@vger.kernel.org
| * | | MIPS: KVM: Expose MSA registersJames Hogan2015-03-273-3/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add KVM register numbers for the MIPS SIMD Architecture (MSA) registers, and implement access to them with the KVM_GET_ONE_REG / KVM_SET_ONE_REG ioctls when the MSA capability is enabled (exposed in a later patch) and present in the guest according to its Config3.MSAP bit. The MSA vector registers use the same register numbers as the FPU registers except with a different size (128bits). Since MSA depends on Status.FR=1, these registers are inaccessible when Status.FR=0. These registers are returned as a single native endian 128bit value, rather than least significant half first with each 64-bit half native endian as the kernel uses internally. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: linux-api@vger.kernel.org Cc: linux-doc@vger.kernel.org
| * | | MIPS: KVM: Add MSA exception handlingJames Hogan2015-03-275-2/+140
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add guest exception handling for MIPS SIMD Architecture (MSA) floating point exceptions and MSA disabled exceptions. MSA floating point exceptions from the guest need passing to the guest kernel, so for these a guest MSAFPE is emulated. MSA disabled exceptions are normally handled by passing a reserved instruction exception to the guest (because no guest MSA was supported), but the hypervisor can now handle them if the guest has MSA by passing an MSA disabled exception to the guest, or if the guest has MSA enabled by transparently restoring the guest MSA context and enabling MSA and the FPU. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
| * | | MIPS: KVM: Emulate MSA bits in COP0 interfaceJames Hogan2015-03-271-2/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Emulate MSA related parts of COP0 interface so that the guest will be able to enable/disable MSA (Config5.MSAEn) once the MSA capability has been wired up. As with the FPU (Status.CU1) setting Config5.MSAEn has no immediate effect if the MSA state isn't live, as MSA state is restored lazily on first use. Changes after the MSA state has been restored take immediate effect, so that the guest can start getting MSA disabled exceptions right away for guest MSA operations. The MSA state is saved lazily too, as MSA may get re-enabled in the near future anyway. A special case is also added for when Status.CU1 is set while FR=0 and the MSA state is live. In this case we are at risk of getting reserved instruction exceptions if we try and save the MSA state, so we lose the MSA state sooner while MSA is still usable. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
| * | | MIPS: KVM: Add base guest MSA supportJames Hogan2015-03-276-19/+323
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add base code for supporting the MIPS SIMD Architecture (MSA) in MIPS KVM guests. MSA cannot yet be enabled in the guest, we're just laying the groundwork. As with the FPU, whether the guest's MSA context is loaded is stored in another bit in the fpu_inuse vcpu member. This allows MSA to be disabled when the guest disables it, but keeping the MSA context loaded so it doesn't have to be reloaded if the guest re-enables it. New assembly code is added for saving and restoring the MSA context, restoring only the upper half of the MSA context (for if the FPU context is already loaded) and for saving/clearing and restoring MSACSR (which can itself cause an MSA FP exception depending on the value). The MSACSR is restored before returning to the guest if MSA is already enabled, and the existing FP exception die notifier is extended to catch the possible MSA FP exception and step over the ctcmsa instruction. The helper function kvm_own_msa() is added to enable MSA and restore the MSA context if it isn't already loaded, which will be used in a later patch when the guest attempts to use MSA for the first time and triggers an MSA disabled exception. The existing FPU helpers are extended to handle MSA. kvm_lose_fpu() saves the full MSA context if it is loaded (which includes the FPU context) and both kvm_lose_fpu() and kvm_drop_fpu() disable MSA. kvm_own_fpu() also needs to lose any MSA context if FR=0, since there would be a risk of getting reserved instruction exceptions if CU1 is enabled and we later try and save the MSA context. We shouldn't usually hit this case since it will be handled when emulating CU1 changes, however there's nothing to stop the guest modifying the Status register directly via the comm page, which will cause this case to get hit. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org
| * | | MIPS: KVM: Wire up FPU capabilityJames Hogan2015-03-273-0/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the code is in place for KVM to support FPU in MIPS KVM guests, wire up the new KVM_CAP_MIPS_FPU capability. For backwards compatibility, the capability must be explicitly enabled in order to detect or make use of the FPU from the guest. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: linux-api@vger.kernel.org Cc: linux-doc@vger.kernel.org
| * | | MIPS: KVM: Expose FPU registersJames Hogan2015-03-273-11/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add KVM register numbers for the MIPS FPU registers, and implement access to them with the KVM_GET_ONE_REG / KVM_SET_ONE_REG ioctls when the FPU capability is enabled (exposed in a later patch) and present in the guest according to its Config1.FP bit. The registers are accessible in the current mode of the guest, with each sized access showing what the guest would see with an equivalent access, and like the architecture they may become UNPREDICTABLE if the FR mode is changed. When FR=0, odd doubles are inaccessible as they do not exist in that mode. Signed-off-by: James Hogan <james.hogan@imgtec.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: linux-api@vger.kernel.org Cc: linux-doc@vger.kernel.org