summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* KVM: x86: Fix immediate_exit handling for uninitialized APJan H. Schönherr2017-09-131-0/+4
| | | | | | | | | | | | When user space sets kvm_run->immediate_exit, KVM is supposed to return quickly. However, when a vCPU is in KVM_MP_STATE_UNINITIALIZED, the value is not considered and the vCPU blocks. Fix that oversight. Fixes: 460df4c1fc7c008 ("KVM: race-free exit from KVM_RUN without POSIX signals") Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: x86: Fix handling of pending signal on uninitialized APJan H. Schönherr2017-09-131-0/+5
| | | | | | | | | | | | KVM API says that KVM_RUN will return with -EINTR when a signal is pending. However, if a vCPU is in KVM_MP_STATE_UNINITIALIZED, then the return value is unconditionally -EAGAIN. Copy over some code from vcpu_run(), so that the case of a pending signal results in the expected return value. Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: SVM: Add a missing 'break' statementJan H. Schönherr2017-09-131-0/+1
| | | | | | | Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de> Fixes: f6511935f424 ("KVM: SVM: Add checks for IO instructions") Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* KVM: x86: Remove .get_pkru() from kvm_x86_opsJoerg Roedel2017-09-131-1/+0
| | | | | | | | | | | | The commit 9dd21e104bc ('KVM: x86: simplify handling of PKRU') removed all users and providers of that call-back, but didn't remove it. Remove it now. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* Merge tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2017-09-0964-768/+1479
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Radim Krčmář: "First batch of KVM changes for 4.14 Common: - improve heuristic for boosting preempted spinlocks by ignoring VCPUs in user mode ARM: - fix for decoding external abort types from guests - added support for migrating the active priority of interrupts when running a GICv2 guest on a GICv3 host - minor cleanup PPC: - expose storage keys to userspace - merge kvm-ppc-fixes with a fix that missed 4.13 because of vacations - fixes s390: - merge of kvm/master to avoid conflicts with additional sthyi fixes - wire up the no-dat enhancements in KVM - multiple epoch facility (z14 feature) - Configuration z/Architecture Mode - more sthyi fixes - gdb server range checking fix - small code cleanups x86: - emulate Hyper-V TSC frequency MSRs - add nested INVPCID - emulate EPTP switching VMFUNC - support Virtual GIF - support 5 level page tables - speedup nested VM exits by packing byte operations - speedup MMIO by using hardware provided physical address - a lot of fixes and cleanups, especially nested" * tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits) KVM: arm/arm64: Support uaccess of GICC_APRn KVM: arm/arm64: Extract GICv3 max APRn index calculation KVM: arm/arm64: vITS: Drop its_ite->lpi field KVM: arm/arm64: vgic: constify seq_operations and file_operations KVM: arm/arm64: Fix guest external abort matching KVM: PPC: Book3S HV: Fix memory leak in kvm_vm_ioctl_get_htab_fd KVM: s390: vsie: cleanup mcck reinjection KVM: s390: use WARN_ON_ONCE only for checking KVM: s390: guestdbg: fix range check KVM: PPC: Book3S HV: Report storage key support to userspace KVM: PPC: Book3S HV: Fix case where HDEC is treated as 32-bit on POWER9 KVM: PPC: Book3S HV: Fix invalid use of register expression KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation KVM: PPC: Book3S HV: Fix setting of storage key in H_ENTER KVM: PPC: e500mc: Fix a NULL dereference KVM: PPC: e500: Fix some NULL dereferences on error KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list KVM: s390: we are always in czam mode KVM: s390: expose no-DAT to guest and migration support KVM: s390: sthyi: remove invalid guest write access ...
| * Merge branch 'kvm-ppc-fixes' of ↵Radim Krčmář2017-09-08488-2101/+6346
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc This fix was intended for 4.13, but didn't get in because both maintainers were on vacation. Paul Mackerras: "It adds mutual exclusion between list_add_rcu and list_del_rcu calls on the kvm->arch.spapr_tce_tables list. Without this, userspace could potentially trigger corruption of the list and cause a host crash or worse."
| | * KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables listPaul Mackerras2017-08-301-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Al Viro pointed out that while one thread of a process is executing in kvm_vm_ioctl_create_spapr_tce(), another thread could guess the file descriptor returned by anon_inode_getfd() and close() it before the first thread has added it to the kvm->arch.spapr_tce_tables list. That highlights a more general problem: there is no mutual exclusion between writers to the spapr_tce_tables list, leading to the possibility of the list becoming corrupted, which could cause a host kernel crash. To fix the mutual exclusion problem, we add a mutex_lock/unlock pair around the list_del_rce in kvm_spapr_tce_release(). Also, this moves the call to anon_inode_getfd() inside the region protected by the kvm->lock mutex, after we have done the check for a duplicate LIOBN. This means that if another thread does guess the file descriptor and closes it, its call to kvm_spapr_tce_release() will not do any harm because it will have to wait until the first thread has released kvm->lock. With this, there are no failure points in kvm_vm_ioctl_create_spapr_tce() after the call to anon_inode_getfd(). The other things that the second thread could do with the guessed file descriptor are to mmap it or to pass it as a parameter to a KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl on a KVM device fd. An mmap call won't cause any harm because kvm_spapr_tce_mmap() and kvm_spapr_tce_fault() don't access the spapr_tce_tables list or the kvmppc_spapr_tce_table.list field, and the fields that they do use have been properly initialized by the time of the anon_inode_getfd() call. The KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl calls kvm_spapr_tce_attach_iommu_group(), which scans the spapr_tce_tables list looking for the kvmppc_spapr_tce_table struct corresponding to the fd given as the parameter. Either it will find the new entry or it won't; if it doesn't, it just returns an error, and if it does, it will function normally. So, in each case there is no harmful effect. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | Merge branch 'kvm-ppc-next' of ↵Radim Krčmář2017-09-0721-75/+183
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc KVM/PPC update for 4.14 There are various minor fixes and cleanups. The only new feature is that we now export information about storage key support to userspace, so it can advertise it to the guest. I have pulled in Michael Ellerman's topic/ppc-kvm branch from the powerpc tree to get a couple of fixes that touch both KVM PPC code and other PPC code. That's why there is some arch/powerpc stuff in the diffstat that isn't arch/powerpc/kvm.
| | * | KVM: PPC: Book3S HV: Fix memory leak in kvm_vm_ioctl_get_htab_fdnixiaoming2017-09-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do ctx = kzalloc(sizeof(*ctx), GFP_KERNEL) and then later on call anon_inode_getfd(), but if that fails we don't free ctx, so that memory gets leaked. To fix it, this adds kfree(ctx) in the failure path. Signed-off-by: nixiaoming <nixiaoming@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras2017-08-3117-67/+151
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This merges in the 'ppc-kvm' topic branch from the powerpc tree in order to bring in some fixes which touch both powerpc and KVM code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | | KVM: PPC: Book3S HV: Report storage key support to userspacePaul Mackerras2017-08-312-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds information about storage keys to the struct returned by the KVM_PPC_GET_SMMU_INFO ioctl. The new fields replace a pad field, which was zeroed by previous kernel versions. Thus userspace that knows about the new fields will see zeroes when running on an older kernel, indicating that storage keys are not supported. The size of the structure has not changed. The number of keys is hard-coded for the CPUs supported by HV KVM, which is just POWER7, POWER8 and POWER9. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | | KVM: PPC: Book3S HV: Fix case where HDEC is treated as 32-bit on POWER9Paul Mackerras2017-08-311-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2f2724630f7a ("KVM: PPC: Book3S HV: Cope with host using large decrementer mode", 2017-05-22) added code to treat the hypervisor decrementer (HDEC) as a 64-bit value on POWER9 rather than 32-bit. Unfortunately, that commit missed one place where HDEC is treated as a 32-bit value. This fixes it. This bug should not have any user-visible consequences that I can think of, beyond an occasional unnecessary exit to the host kernel. If the hypervisor decrementer has gone negative, then the bottom 32 bits will be negative for about 4 seconds after that, so as long as we get out of the guest within those 4 seconds we won't conclude that the HDEC interrupt is spurious. Reported-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Fixes: 2f2724630f7a ("KVM: PPC: Book3S HV: Cope with host using large decrementer mode") Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | | KVM: PPC: Book3S HV: Fix invalid use of register expressionAndreas Schwab2017-08-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | binutils >= 2.26 now warns about misuse of register expressions in assembler operands that are actually literals. In this instance r0 is being used where a literal 0 should be used. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> [mpe: Split into separate KVM patch, tweak change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | | KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validationNicholas Piggin2017-08-311-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM currently validates the size of the VPA registered by the client against sizeof(struct lppaca), however we align (and therefore size) that struct to 1kB to avoid crossing a 4kB boundary in the client. PAPR calls for sizes >= 640 bytes to be accepted. Hard code this with a comment. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | | KVM: PPC: Book3S HV: Fix setting of storage key in H_ENTERRam Pai2017-08-312-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In handling a H_ENTER hypercall, the code in kvmppc_do_h_enter clobbers the high-order two bits of the storage key, which is stored in a split field in the second doubleword of the HPTE. Any storage key number above 7 hence fails to operate correctly. This makes sure we preserve all the bits of the storage key. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | | KVM: PPC: e500mc: Fix a NULL dereferenceDan Carpenter2017-08-311-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should set "err = -ENOMEM;", otherwise it means we're returning ERR_PTR(0) which is NULL. It results in a NULL pointer dereference in the caller. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | | KVM: PPC: e500: Fix some NULL dereferences on errorDan Carpenter2017-08-311-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are some error paths in kvmppc_core_vcpu_create_e500() where we forget to set the error code. It means that we return ERR_PTR(0) which is NULL and it results in a NULL pointer dereference in the caller. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | Merge tag 'kvm-arm-for-v4.14' of ↵Radim Krčmář2017-09-0710-69/+125
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Changes for v4.14 Two minor cleanups and improvements, a fix for decoding external abort types from guests, and added support for migrating the active priority of interrupts when running a GICv2 guest on a GICv3 host.
| | * | | | KVM: arm/arm64: Support uaccess of GICC_APRnChristoffer Dall2017-09-052-1/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When migrating guests around we need to know the active priorities to ensure functional virtual interrupt prioritization by the GIC. This commit clarifies the API and how active priorities of interrupts in different groups are represented, and implements the accessor functions for the uaccess register range. We live with a slight layering violation in accessing GICv3 data structures from vgic-mmio-v2.c, because anything else just adds too much complexity for us to deal with (it's not like there's a benefit elsewhere in the code of an intermediate representation as is the case with the VMCR). We accept this, because while doing v3 processing from a file named something-v2.c can look strange at first, this really is specific to dealing with the user space interface for something that looks like a GICv2. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
| | * | | | KVM: arm/arm64: Extract GICv3 max APRn index calculationChristoffer Dall2017-09-052-20/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we are about to access the APRs from the GICv2 uaccess interface, make this logic generally available. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
| | * | | | KVM: arm/arm64: vITS: Drop its_ite->lpi fieldMarc Zyngier2017-09-051-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For unknown reasons, the its_ite data structure carries an "lpi" field which contains the intid of the LPI. This is an obvious duplication of the vgic_irq->intid field, so let's fix the only user and remove the now useless field. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
| | * | | | KVM: arm/arm64: vgic: constify seq_operations and file_operationsArvind Yadav2017-09-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vgic_debug_seq_ops and file_operations are not supposed to change at runtime and none of the structures is modified. Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
| | * | | | KVM: arm/arm64: Fix guest external abort matchingJames Morse2017-09-054-40/+49
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ARM-ARM has two bits in the ESR/HSR relevant to external aborts. A range of {I,D}FSC values (of which bit 5 is always set) and bit 9 'EA' which provides: > an IMPLEMENTATION DEFINED classification of External Aborts. This bit is in addition to the {I,D}FSC range, and has an implementation defined meaning. KVM should always ignore this bit when handling external aborts from a guest. Remove the ESR_ELx_EA definition and rewrite its helper kvm_vcpu_dabt_isextabt() to check the {I,D}FSC range. This merges kvm_vcpu_dabt_isextabt() and the recently added is_abort_sea() helper. CC: Tyler Baicar <tbaicar@codeaurora.org> Reported-by: gengdongjiu <gengdj.1984@gmail.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
| * | | | Merge tag 'kvm-s390-next-4.14-2' of ↵Radim Krčmář2017-09-0725-124/+597
| |\ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: Fixes and features for 4.14 - merge of topic branch tlb-flushing from the s390 tree to get the no-dat base features - merge of kvm/master to avoid conflicts with additional sthyi fixes - wire up the no-dat enhancements in KVM - multiple epoch facility (z14 feature) - Configuration z/Architecture Mode - more sthyi fixes - gdb server range checking fix - small code cleanups
| | * | | KVM: s390: vsie: cleanup mcck reinjectionDavid Hildenbrand2017-08-311-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The machine check information is part of the vsie_page. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170830160603.5452-4-david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | | KVM: s390: use WARN_ON_ONCE only for checkingDavid Hildenbrand2017-08-311-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the real logic that always has to be executed out of the WARN_ON_ONCE. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170830160603.5452-3-david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | | KVM: s390: guestdbg: fix range checkDavid Hildenbrand2017-08-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Looks like the "overflowing" range check is wrong. |=======b-------a=======| addr >= a || addr <= b Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170830160603.5452-2-david@redhat.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | | KVM: s390: we are always in czam modeDavid Hildenbrand2017-08-292-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Independent of the underlying hardware, kvm will now always handle SIGP SET ARCHITECTURE as if czam were enabled. Therefore, let's not only forward that bit but always set it. While at it, add a comment regarding STHYI. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20170829143108.14703-1-david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | | KVM: s390: expose no-DAT to guest and migration supportClaudio Imbrenda2017-08-294-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The STFLE bit 147 indicates whether the ESSA no-DAT operation code is valid, the bit is not normally provided to the host; the host is instead provided with an SCLP bit that indicates whether guests can support the feature. This patch: * enables the STFLE bit in the guest if the corresponding SCLP bit is present in the host. * adds support for migrating the no-DAT bit in the PGSTEs * fixes the software interpretation of the ESSA instruction that is used when migrating, both for the new operation code and for the old "set stable", as per specifications. Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | | KVM: s390: sthyi: remove invalid guest write accessHeiko Carstens2017-08-291-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | handle_sthyi() always writes to guest memory if the sthyi function code is zero in order to fault in the page that later is written to. However a function code of zero does not necessarily mean that a write to guest memory happens: if the KVM host is running as a second level guest under z/VM 6.2 the sthyi instruction is indicated to be available to the KVM host, however if the instruction is executed it will always return with a return code that indicates "unsupported function code". In such a case handle_sthyi() must not write to guest memory. This means that the prior write access to fault in the guest page may result in invalid guest exceptions, and/or invalid data modification. In order to be architecture compliant simply remove the write_guest() call. Given that the guest assumed a write access anyway, this fix does not qualify for -stable. This just makes sure the sthyi handler is architecture compliant. Fixes: 95ca2cb57985 ("KVM: s390: Add sthyi emulation") Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | | KVM: s390: Multiple Epoch Facility supportCollin L. Walling2017-08-297-2/+138
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow for the enablement of MEF and the support for the extended epoch in SIE and VSIE for the extended guest TOD-Clock. A new interface is used for getting/setting a guest's extended TOD-Clock that uses a single ioctl invocation, KVM_S390_VM_TOD_EXT. Since the host time is a moving target that might see an epoch switch or STP sync checks we need an atomic ioctl and cannot use the exisiting two interfaces. The old method of getting and setting the guest TOD-Clock is still retained and is used when the old ioctls are called. Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Reviewed-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | | KVM: s390: Support Configuration z/Architecture ModeJason J. Herne2017-08-282-19/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm has always supported the concept of starting in z/Arch mode so let's reflect the feature bit to the guest. Also, we change sigp set architecture to reject any request to change architecture modes. Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | | Merge tag 'kvm-s390-master-4.13-2' into kvms390/nextChristian Borntraeger2017-08-282-4/+11
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Additional fixes on top of these two - missing inline assembly constraint - wrong exception handling are necessary
| | * \ \ \ Merge branch 'tlb-flushing' of ↵Christian Borntraeger2017-07-2513-81/+411
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux into kernelorgnext base patches for tlb flushing optimizations
| * | | | | | kvm: nVMX: Validate the virtual-APIC address on nested VM-entryJim Mattson2017-08-251-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the SDM, if the "use TPR shadow" VM-execution control is 1, bits 11:0 of the virtual-APIC address must be 0 and the address should set any bits beyond the processor's physical-address width. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: nVMX: Fix trying to cancel vmlauch/vmresumeWanpeng Li2017-08-241-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ------------[ cut here ]------------ WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 nested_vmx_vmexit+0x176e/0x1980 [kvm_intel] CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: G W OE 4.13.0-rc4+ #11 RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel] Call Trace: ? kvm_multiple_exception+0x149/0x170 [kvm] ? handle_emulation_failure+0x79/0x230 [kvm] ? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel] ? check_chain_key+0x137/0x1e0 ? reexecute_instruction.part.168+0x130/0x130 [kvm] nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel] ? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel] vmx_queue_exception+0x197/0x300 [kvm_intel] kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm] ? kvm_arch_vcpu_runnable+0x220/0x220 [kvm] ? preempt_count_sub+0x18/0xc0 ? restart_apic_timer+0x17d/0x300 [kvm] ? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm] ? kvm_arch_vcpu_load+0x1d8/0x350 [kvm] kvm_vcpu_ioctl+0x4e4/0x910 [kvm] ? kvm_vcpu_ioctl+0x4e4/0x910 [kvm] ? kvm_dev_ioctl+0xbe0/0xbe0 [kvm] The flag "nested_run_pending", which can override the decision of which should run next, L1 or L2. nested_run_pending=1 means that we *must* run L2 next, not L1. This is necessary in particular when L1 did a VMLAUNCH of L2 and therefore expects L2 to be run (and perhaps be injected with an event it specified, etc.). Nested_run_pending is especially intended to avoid switching to L1 in the injection decision-point. This can be handled just like the other cases in vmx_check_nested_events, instead of having a special case in vmx_queue_exception. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: X86: Fix loss of exception which has not yet been injectedWanpeng Li2017-08-245-32/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vmx_complete_interrupts() assumes that the exception is always injected, so it can be dropped by kvm_clear_exception_queue(). However, an exception cannot be injected immediately if it is: 1) originally destined to a nested guest; 2) trapped to cause a vmexit; 3) happening right after VMLAUNCH/VMRESUME, i.e. when nested_run_pending is true. This patch applies to exceptions the same algorithm that is used for NMIs, replacing exception.reinject with "exception.injected" (equivalent to nmi_injected). exception.pending now represents an exception that is queued and whose side effects (e.g., update RFLAGS.RF or DR7) have not been applied yet. If exception.pending is true, the exception might result in a nested vmexit instead, too (in which case the side effects must not be applied). exception.injected instead represents an exception that is going to be injected into the guest at the next vmentry. Reported-by: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: VMX: use kvm_event_needs_reinjectionWanpeng Li2017-08-241-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use kvm_event_needs_reinjection() encapsulation. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: MMU: speedup update_permission_bitmaskPaolo Bonzini2017-08-241-51/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | update_permission_bitmask currently does a 128-iteration loop to, essentially, compute a constant array. Computing the 8 bits in parallel reduces it to 16 iterations, and is enough to speed it up substantially because many boolean operations in the inner loop become constants or simplify noticeably. Because update_permission_bitmask is actually the top item in the profile for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand clock cycles, or up to 30%: before after cpuid 35173 25954 vmcall 35122 27079 inl_from_pmtimer 52635 42675 inl_from_qemu 53604 44599 inl_from_kernel 38498 30798 outl_to_kernel 34508 28816 wr_tsc_adjust_msr 34185 26818 rd_tsc_adjust_msr 37409 27049 mmio-no-eventfd:pci-mem 50563 45276 mmio-wildcard-eventfd:pci-mem 34495 30823 mmio-datamatch-eventfd:pci-mem 35612 31071 portio-no-eventfd:pci-io 44925 40661 portio-wildcard-eventfd:pci-io 29708 27269 portio-datamatch-eventfd:pci-io 31135 27164 (I wrote a small C program to compare the tables for all values of CR0.WP, CR4.SMAP and CR4.SMEP, and they match). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: MMU: Expose the LA57 feature to VM.Yu Zhang2017-08-247-36/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch exposes 5 level page table feature to the VM. At the same time, the canonical virtual address checking is extended to support both 48-bits and 57-bits address width. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: MMU: Add 5 level EPT & Shadow page table support.Yu Zhang2017-08-249-29/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extends the shadow paging code, so that 5 level shadow page table can be constructed if VM is running in 5 level paging mode. Also extends the ept code, so that 5 level ept table can be constructed if maxphysaddr of VM exceeds 48 bits. Unlike the shadow logic, KVM should still use 4 level ept table for a VM whose physical address width is less than 48 bits, even when the VM is running in 5 level paging mode. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> [Unconditionally reset the MMU context in kvm_cpuid_update. Changing MAXPHYADDR invalidates the reserved bit bitmasks. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: MMU: Rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL.Yu Zhang2017-08-245-23/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now we have 4 level page table and 5 level page table in 64 bits long mode, let's rename the PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL, then we can use PT64_ROOT_5LEVEL for 5 level page table, it's helpful to make the code more clear. Also PT64_ROOT_MAX_LEVEL is defined as 4, so that we can just redefine it to 5 whenever a replacement is needed for 5 level paging. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: MMU: check guest CR3 reserved bits based on its physical address width.Yu Zhang2017-08-244-7/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the reserved bits in CR3. Yet the length of reserved bits in guest CR3 should be based on the physical address width exposed to the VM. This patch changes CR3 check logic to calculate the reserved bits at runtime. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: Add return value to kvm_cpuid().Yu Zhang2017-08-247-21/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return false in kvm_cpuid() when it fails to find the cpuid entry. Also, this routine(and its caller) is optimized with a new argument - check_limit, so that the check_cpuid_limit() fall back can be avoided. Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | kvm: vmx: Raise #UD on unsupported XSAVES/XRSTORSPaolo Bonzini2017-08-241-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A guest may not be configured to support XSAVES/XRSTORS, even when the host does. If the guest does not support XSAVES/XRSTORS, clear the secondary execution control so that the processor will raise #UD. Also clear the "allowed-1" bit for XSAVES/XRSTORS exiting in the IA32_VMX_PROCBASED_CTLS2 MSR, and pass through VMCS12's control in the VMCS02. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | kvm: vmx: Raise #UD on unsupported RDSEEDJim Mattson2017-08-241-1/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A guest may not be configured to support RDSEED, even when the host does. If the guest does not support RDSEED, intercept the instruction and synthesize #UD. Also clear the "allowed-1" bit for RDSEED exiting in the IA32_VMX_PROCBASED_CTLS2 MSR. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | kvm: vmx: Raise #UD on unsupported RDRANDJim Mattson2017-08-241-1/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A guest may not be configured to support RDRAND, even when the host does. If the guest does not support RDRAND, intercept the instruction and synthesize #UD. Also clear the "allowed-1" bit for RDRAND exiting in the IA32_VMX_PROCBASED_CTLS2 MSR. Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: VMX: cache secondary exec controlsPaolo Bonzini2017-08-241-46/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, secondary execution controls are divided in three groups: - static, depending mostly on the module arguments or the processor (vmx_secondary_exec_control) - static, depending on CPUID (vmx_cpuid_update) - dynamic, depending on nested VMX or local APIC state Because walking CPUID is expensive, prepare_vmcs02 is using only the first group. This however is unnecessarily complicated. Just cache the static secondary execution controls, and then prepare_vmcs02 does not need to compute them every time. Computation of all static secondary execution controls is now kept in a single function, vmx_compute_secondary_exec_control. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: SVM: Enable Virtual GIF featureJanakarajan Natarajan2017-08-232-6/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable the Virtual GIF feature. This is done by setting bit 25 at position 60h in the vmcb. With this feature enabled, the processor uses bit 9 at position 60h as the virtual GIF when executing STGI/CLGI instructions. Since the execution of STGI by the L1 hypervisor does not cause a return to the outermost (L0) hypervisor, the enable_irq_window and enable_nmi_window are modified. The IRQ window will be opened even if GIF is not set, under the assumption that on resuming the L1 hypervisor the IRQ will be held pending until the processor executes the STGI instruction. For the NMI window, the STGI intercept is set. This will assist in opening the window only when GIF=1. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: SVM: Add Virtual GIF feature definitionJanakarajan Natarajan2017-08-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new cpufeature definition for Virtual GIF. Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>