| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
| |
When user space sets kvm_run->immediate_exit, KVM is supposed to
return quickly. However, when a vCPU is in KVM_MP_STATE_UNINITIALIZED,
the value is not considered and the vCPU blocks.
Fix that oversight.
Fixes: 460df4c1fc7c008 ("KVM: race-free exit from KVM_RUN without POSIX signals")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
KVM API says that KVM_RUN will return with -EINTR when a signal is
pending. However, if a vCPU is in KVM_MP_STATE_UNINITIALIZED, then
the return value is unconditionally -EAGAIN.
Copy over some code from vcpu_run(), so that the case of a pending
signal results in the expected return value.
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
|
|
|
|
|
| |
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Fixes: f6511935f424 ("KVM: SVM: Add checks for IO instructions")
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The commit
9dd21e104bc ('KVM: x86: simplify handling of PKRU')
removed all users and providers of that call-back, but
didn't remove it. Remove it now.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull KVM updates from Radim Krčmář:
"First batch of KVM changes for 4.14
Common:
- improve heuristic for boosting preempted spinlocks by ignoring
VCPUs in user mode
ARM:
- fix for decoding external abort types from guests
- added support for migrating the active priority of interrupts when
running a GICv2 guest on a GICv3 host
- minor cleanup
PPC:
- expose storage keys to userspace
- merge kvm-ppc-fixes with a fix that missed 4.13 because of
vacations
- fixes
s390:
- merge of kvm/master to avoid conflicts with additional sthyi fixes
- wire up the no-dat enhancements in KVM
- multiple epoch facility (z14 feature)
- Configuration z/Architecture Mode
- more sthyi fixes
- gdb server range checking fix
- small code cleanups
x86:
- emulate Hyper-V TSC frequency MSRs
- add nested INVPCID
- emulate EPTP switching VMFUNC
- support Virtual GIF
- support 5 level page tables
- speedup nested VM exits by packing byte operations
- speedup MMIO by using hardware provided physical address
- a lot of fixes and cleanups, especially nested"
* tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits)
KVM: arm/arm64: Support uaccess of GICC_APRn
KVM: arm/arm64: Extract GICv3 max APRn index calculation
KVM: arm/arm64: vITS: Drop its_ite->lpi field
KVM: arm/arm64: vgic: constify seq_operations and file_operations
KVM: arm/arm64: Fix guest external abort matching
KVM: PPC: Book3S HV: Fix memory leak in kvm_vm_ioctl_get_htab_fd
KVM: s390: vsie: cleanup mcck reinjection
KVM: s390: use WARN_ON_ONCE only for checking
KVM: s390: guestdbg: fix range check
KVM: PPC: Book3S HV: Report storage key support to userspace
KVM: PPC: Book3S HV: Fix case where HDEC is treated as 32-bit on POWER9
KVM: PPC: Book3S HV: Fix invalid use of register expression
KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation
KVM: PPC: Book3S HV: Fix setting of storage key in H_ENTER
KVM: PPC: e500mc: Fix a NULL dereference
KVM: PPC: e500: Fix some NULL dereferences on error
KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list
KVM: s390: we are always in czam mode
KVM: s390: expose no-DAT to guest and migration support
KVM: s390: sthyi: remove invalid guest write access
...
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
This fix was intended for 4.13, but didn't get in because both
maintainers were on vacation.
Paul Mackerras:
"It adds mutual exclusion between list_add_rcu and list_del_rcu calls
on the kvm->arch.spapr_tce_tables list. Without this, userspace could
potentially trigger corruption of the list and cause a host crash or
worse."
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Al Viro pointed out that while one thread of a process is executing
in kvm_vm_ioctl_create_spapr_tce(), another thread could guess the
file descriptor returned by anon_inode_getfd() and close() it before
the first thread has added it to the kvm->arch.spapr_tce_tables list.
That highlights a more general problem: there is no mutual exclusion
between writers to the spapr_tce_tables list, leading to the
possibility of the list becoming corrupted, which could cause a
host kernel crash.
To fix the mutual exclusion problem, we add a mutex_lock/unlock
pair around the list_del_rce in kvm_spapr_tce_release(). Also,
this moves the call to anon_inode_getfd() inside the region
protected by the kvm->lock mutex, after we have done the check for
a duplicate LIOBN. This means that if another thread does guess the
file descriptor and closes it, its call to kvm_spapr_tce_release()
will not do any harm because it will have to wait until the first
thread has released kvm->lock. With this, there are no failure
points in kvm_vm_ioctl_create_spapr_tce() after the call to
anon_inode_getfd().
The other things that the second thread could do with the guessed
file descriptor are to mmap it or to pass it as a parameter to a
KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl on a KVM device fd. An mmap
call won't cause any harm because kvm_spapr_tce_mmap() and
kvm_spapr_tce_fault() don't access the spapr_tce_tables list or
the kvmppc_spapr_tce_table.list field, and the fields that they do use
have been properly initialized by the time of the anon_inode_getfd()
call.
The KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl calls
kvm_spapr_tce_attach_iommu_group(), which scans the spapr_tce_tables
list looking for the kvmppc_spapr_tce_table struct corresponding to
the fd given as the parameter. Either it will find the new entry
or it won't; if it doesn't, it just returns an error, and if it
does, it will function normally. So, in each case there is no
harmful effect.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
KVM/PPC update for 4.14
There are various minor fixes and cleanups. The only new feature is
that we now export information about storage key support to userspace,
so it can advertise it to the guest.
I have pulled in Michael Ellerman's topic/ppc-kvm branch from the
powerpc tree to get a couple of fixes that touch both KVM PPC code and
other PPC code. That's why there is some arch/powerpc stuff in the
diffstat that isn't arch/powerpc/kvm.
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We do ctx = kzalloc(sizeof(*ctx), GFP_KERNEL) and then later on call
anon_inode_getfd(), but if that fails we don't free ctx, so that
memory gets leaked. To fix it, this adds kfree(ctx) in the failure
path.
Signed-off-by: nixiaoming <nixiaoming@huawei.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| | |\ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This merges in the 'ppc-kvm' topic branch from the powerpc tree in
order to bring in some fixes which touch both powerpc and KVM code.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This adds information about storage keys to the struct returned by
the KVM_PPC_GET_SMMU_INFO ioctl. The new fields replace a pad field,
which was zeroed by previous kernel versions. Thus userspace that
knows about the new fields will see zeroes when running on an older
kernel, indicating that storage keys are not supported. The size of
the structure has not changed.
The number of keys is hard-coded for the CPUs supported by HV KVM,
which is just POWER7, POWER8 and POWER9.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Commit 2f2724630f7a ("KVM: PPC: Book3S HV: Cope with host using large
decrementer mode", 2017-05-22) added code to treat the hypervisor
decrementer (HDEC) as a 64-bit value on POWER9 rather than 32-bit.
Unfortunately, that commit missed one place where HDEC is treated
as a 32-bit value. This fixes it.
This bug should not have any user-visible consequences that I can
think of, beyond an occasional unnecessary exit to the host kernel.
If the hypervisor decrementer has gone negative, then the bottom
32 bits will be negative for about 4 seconds after that, so as
long as we get out of the guest within those 4 seconds we won't
conclude that the HDEC interrupt is spurious.
Reported-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Fixes: 2f2724630f7a ("KVM: PPC: Book3S HV: Cope with host using large decrementer mode")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
binutils >= 2.26 now warns about misuse of register expressions in
assembler operands that are actually literals. In this instance r0 is
being used where a literal 0 should be used.
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
[mpe: Split into separate KVM patch, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
KVM currently validates the size of the VPA registered by the client
against sizeof(struct lppaca), however we align (and therefore size)
that struct to 1kB to avoid crossing a 4kB boundary in the client.
PAPR calls for sizes >= 640 bytes to be accepted. Hard code this with
a comment.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In handling a H_ENTER hypercall, the code in kvmppc_do_h_enter
clobbers the high-order two bits of the storage key, which is stored
in a split field in the second doubleword of the HPTE. Any storage
key number above 7 hence fails to operate correctly.
This makes sure we preserve all the bits of the storage key.
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We should set "err = -ENOMEM;", otherwise it means we're returning
ERR_PTR(0) which is NULL. It results in a NULL pointer dereference in
the caller.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
There are some error paths in kvmppc_core_vcpu_create_e500() where we
forget to set the error code. It means that we return ERR_PTR(0) which
is NULL and it results in a NULL pointer dereference in the caller.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM Changes for v4.14
Two minor cleanups and improvements, a fix for decoding external abort
types from guests, and added support for migrating the active priority
of interrupts when running a GICv2 guest on a GICv3 host.
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When migrating guests around we need to know the active priorities to
ensure functional virtual interrupt prioritization by the GIC.
This commit clarifies the API and how active priorities of interrupts in
different groups are represented, and implements the accessor functions
for the uaccess register range.
We live with a slight layering violation in accessing GICv3 data
structures from vgic-mmio-v2.c, because anything else just adds too much
complexity for us to deal with (it's not like there's a benefit
elsewhere in the code of an intermediate representation as is the case
with the VMCR). We accept this, because while doing v3 processing from
a file named something-v2.c can look strange at first, this really is
specific to dealing with the user space interface for something that
looks like a GICv2.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
As we are about to access the APRs from the GICv2 uaccess interface,
make this logic generally available.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
For unknown reasons, the its_ite data structure carries an "lpi" field
which contains the intid of the LPI. This is an obvious duplication
of the vgic_irq->intid field, so let's fix the only user and remove
the now useless field.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
vgic_debug_seq_ops and file_operations are not supposed to change
at runtime and none of the structures is modified.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
|
| | |/ / /
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The ARM-ARM has two bits in the ESR/HSR relevant to external aborts.
A range of {I,D}FSC values (of which bit 5 is always set) and bit 9 'EA'
which provides:
> an IMPLEMENTATION DEFINED classification of External Aborts.
This bit is in addition to the {I,D}FSC range, and has an implementation
defined meaning. KVM should always ignore this bit when handling external
aborts from a guest.
Remove the ESR_ELx_EA definition and rewrite its helper
kvm_vcpu_dabt_isextabt() to check the {I,D}FSC range. This merges
kvm_vcpu_dabt_isextabt() and the recently added is_abort_sea() helper.
CC: Tyler Baicar <tbaicar@codeaurora.org>
Reported-by: gengdongjiu <gengdj.1984@gmail.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
|
| |\ \ \ \
| | |/ / /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux
KVM: s390: Fixes and features for 4.14
- merge of topic branch tlb-flushing from the s390 tree to get the
no-dat base features
- merge of kvm/master to avoid conflicts with additional sthyi fixes
- wire up the no-dat enhancements in KVM
- multiple epoch facility (z14 feature)
- Configuration z/Architecture Mode
- more sthyi fixes
- gdb server range checking fix
- small code cleanups
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The machine check information is part of the vsie_page.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170830160603.5452-4-david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Move the real logic that always has to be executed out of the
WARN_ON_ONCE.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170830160603.5452-3-david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Looks like the "overflowing" range check is wrong.
|=======b-------a=======|
addr >= a || addr <= b
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170830160603.5452-2-david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Independent of the underlying hardware, kvm will now always handle
SIGP SET ARCHITECTURE as if czam were enabled. Therefore, let's not
only forward that bit but always set it.
While at it, add a comment regarding STHYI.
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170829143108.14703-1-david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The STFLE bit 147 indicates whether the ESSA no-DAT operation code is
valid, the bit is not normally provided to the host; the host is
instead provided with an SCLP bit that indicates whether guests can
support the feature.
This patch:
* enables the STFLE bit in the guest if the corresponding SCLP bit is
present in the host.
* adds support for migrating the no-DAT bit in the PGSTEs
* fixes the software interpretation of the ESSA instruction that is
used when migrating, both for the new operation code and for the old
"set stable", as per specifications.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
handle_sthyi() always writes to guest memory if the sthyi function
code is zero in order to fault in the page that later is written to.
However a function code of zero does not necessarily mean that a write
to guest memory happens: if the KVM host is running as a second level
guest under z/VM 6.2 the sthyi instruction is indicated to be
available to the KVM host, however if the instruction is executed it
will always return with a return code that indicates "unsupported
function code".
In such a case handle_sthyi() must not write to guest memory. This
means that the prior write access to fault in the guest page may
result in invalid guest exceptions, and/or invalid data modification.
In order to be architecture compliant simply remove the write_guest()
call.
Given that the guest assumed a write access anyway, this fix does not
qualify for -stable. This just makes sure the sthyi handler is
architecture compliant.
Fixes: 95ca2cb57985 ("KVM: s390: Add sthyi emulation")
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Allow for the enablement of MEF and the support for the extended
epoch in SIE and VSIE for the extended guest TOD-Clock.
A new interface is used for getting/setting a guest's extended TOD-Clock
that uses a single ioctl invocation, KVM_S390_VM_TOD_EXT. Since the
host time is a moving target that might see an epoch switch or STP sync
checks we need an atomic ioctl and cannot use the exisiting two
interfaces. The old method of getting and setting the guest TOD-Clock is
still retained and is used when the old ioctls are called.
Signed-off-by: Collin L. Walling <walling@linux.vnet.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
kvm has always supported the concept of starting in z/Arch mode so let's
reflect the feature bit to the guest.
Also, we change sigp set architecture to reject any request to change
architecture modes.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
|
| | |\ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Additional fixes on top of these two
- missing inline assembly constraint
- wrong exception handling
are necessary
|
| | |\ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux into kernelorgnext
base patches for tlb flushing optimizations
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
According to the SDM, if the "use TPR shadow" VM-execution control is
1, bits 11:0 of the virtual-APIC address must be 0 and the address
should set any bits beyond the processor's physical-address width.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
------------[ cut here ]------------
WARNING: CPU: 7 PID: 3861 at /home/kernel/ssd/kvm/arch/x86/kvm//vmx.c:11299 nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
CPU: 7 PID: 3861 Comm: qemu-system-x86 Tainted: G W OE 4.13.0-rc4+ #11
RIP: 0010:nested_vmx_vmexit+0x176e/0x1980 [kvm_intel]
Call Trace:
? kvm_multiple_exception+0x149/0x170 [kvm]
? handle_emulation_failure+0x79/0x230 [kvm]
? load_vmcs12_host_state+0xa80/0xa80 [kvm_intel]
? check_chain_key+0x137/0x1e0
? reexecute_instruction.part.168+0x130/0x130 [kvm]
nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
? nested_vmx_inject_exception_vmexit+0xb7/0x100 [kvm_intel]
vmx_queue_exception+0x197/0x300 [kvm_intel]
kvm_arch_vcpu_ioctl_run+0x1b0c/0x2c90 [kvm]
? kvm_arch_vcpu_runnable+0x220/0x220 [kvm]
? preempt_count_sub+0x18/0xc0
? restart_apic_timer+0x17d/0x300 [kvm]
? kvm_lapic_restart_hv_timer+0x37/0x50 [kvm]
? kvm_arch_vcpu_load+0x1d8/0x350 [kvm]
kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
? kvm_vcpu_ioctl+0x4e4/0x910 [kvm]
? kvm_dev_ioctl+0xbe0/0xbe0 [kvm]
The flag "nested_run_pending", which can override the decision of which should run
next, L1 or L2. nested_run_pending=1 means that we *must* run L2 next, not L1. This
is necessary in particular when L1 did a VMLAUNCH of L2 and therefore expects L2 to
be run (and perhaps be injected with an event it specified, etc.). Nested_run_pending
is especially intended to avoid switching to L1 in the injection decision-point.
This can be handled just like the other cases in vmx_check_nested_events, instead of
having a special case in vmx_queue_exception.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
vmx_complete_interrupts() assumes that the exception is always injected,
so it can be dropped by kvm_clear_exception_queue(). However,
an exception cannot be injected immediately if it is: 1) originally
destined to a nested guest; 2) trapped to cause a vmexit; 3) happening
right after VMLAUNCH/VMRESUME, i.e. when nested_run_pending is true.
This patch applies to exceptions the same algorithm that is used for
NMIs, replacing exception.reinject with "exception.injected" (equivalent
to nmi_injected).
exception.pending now represents an exception that is queued and whose
side effects (e.g., update RFLAGS.RF or DR7) have not been applied yet.
If exception.pending is true, the exception might result in a nested
vmexit instead, too (in which case the side effects must not be applied).
exception.injected instead represents an exception that is going to be
injected into the guest at the next vmentry.
Reported-by: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Use kvm_event_needs_reinjection() encapsulation.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
update_permission_bitmask currently does a 128-iteration loop to,
essentially, compute a constant array. Computing the 8 bits in parallel
reduces it to 16 iterations, and is enough to speed it up substantially
because many boolean operations in the inner loop become constants or
simplify noticeably.
Because update_permission_bitmask is actually the top item in the profile
for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
clock cycles, or up to 30%:
before after
cpuid 35173 25954
vmcall 35122 27079
inl_from_pmtimer 52635 42675
inl_from_qemu 53604 44599
inl_from_kernel 38498 30798
outl_to_kernel 34508 28816
wr_tsc_adjust_msr 34185 26818
rd_tsc_adjust_msr 37409 27049
mmio-no-eventfd:pci-mem 50563 45276
mmio-wildcard-eventfd:pci-mem 34495 30823
mmio-datamatch-eventfd:pci-mem 35612 31071
portio-no-eventfd:pci-io 44925 40661
portio-wildcard-eventfd:pci-io 29708 27269
portio-datamatch-eventfd:pci-io 31135 27164
(I wrote a small C program to compare the tables for all values of CR0.WP,
CR4.SMAP and CR4.SMEP, and they match).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This patch exposes 5 level page table feature to the VM.
At the same time, the canonical virtual address checking is
extended to support both 48-bits and 57-bits address width.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Extends the shadow paging code, so that 5 level shadow page
table can be constructed if VM is running in 5 level paging
mode.
Also extends the ept code, so that 5 level ept table can be
constructed if maxphysaddr of VM exceeds 48 bits. Unlike the
shadow logic, KVM should still use 4 level ept table for a VM
whose physical address width is less than 48 bits, even when
the VM is running in 5 level paging mode.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
[Unconditionally reset the MMU context in kvm_cpuid_update.
Changing MAXPHYADDR invalidates the reserved bit bitmasks.
- Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Now we have 4 level page table and 5 level page table in 64 bits
long mode, let's rename the PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL,
then we can use PT64_ROOT_5LEVEL for 5 level page table, it's
helpful to make the code more clear.
Also PT64_ROOT_MAX_LEVEL is defined as 4, so that we can just
redefine it to 5 whenever a replacement is needed for 5 level
paging.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Currently, KVM uses CR3_L_MODE_RESERVED_BITS to check the
reserved bits in CR3. Yet the length of reserved bits in
guest CR3 should be based on the physical address width
exposed to the VM. This patch changes CR3 check logic to
calculate the reserved bits at runtime.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Return false in kvm_cpuid() when it fails to find the cpuid
entry. Also, this routine(and its caller) is optimized with
a new argument - check_limit, so that the check_cpuid_limit()
fall back can be avoided.
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
A guest may not be configured to support XSAVES/XRSTORS, even when the host
does. If the guest does not support XSAVES/XRSTORS, clear the secondary
execution control so that the processor will raise #UD.
Also clear the "allowed-1" bit for XSAVES/XRSTORS exiting in the
IA32_VMX_PROCBASED_CTLS2 MSR, and pass through VMCS12's control in
the VMCS02.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
A guest may not be configured to support RDSEED, even when the host
does. If the guest does not support RDSEED, intercept the instruction
and synthesize #UD. Also clear the "allowed-1" bit for RDSEED exiting
in the IA32_VMX_PROCBASED_CTLS2 MSR.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
A guest may not be configured to support RDRAND, even when the host
does. If the guest does not support RDRAND, intercept the instruction
and synthesize #UD. Also clear the "allowed-1" bit for RDRAND exiting
in the IA32_VMX_PROCBASED_CTLS2 MSR.
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Currently, secondary execution controls are divided in three groups:
- static, depending mostly on the module arguments or the processor
(vmx_secondary_exec_control)
- static, depending on CPUID (vmx_cpuid_update)
- dynamic, depending on nested VMX or local APIC state
Because walking CPUID is expensive, prepare_vmcs02 is using only
the first group. This however is unnecessarily complicated. Just
cache the static secondary execution controls, and then prepare_vmcs02
does not need to compute them every time. Computation of all static
secondary execution controls is now kept in a single function,
vmx_compute_secondary_exec_control.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Enable the Virtual GIF feature. This is done by setting bit 25 at position
60h in the vmcb.
With this feature enabled, the processor uses bit 9 at position 60h as the
virtual GIF when executing STGI/CLGI instructions.
Since the execution of STGI by the L1 hypervisor does not cause a return to
the outermost (L0) hypervisor, the enable_irq_window and enable_nmi_window
are modified.
The IRQ window will be opened even if GIF is not set, under the assumption
that on resuming the L1 hypervisor the IRQ will be held pending until the
processor executes the STGI instruction.
For the NMI window, the STGI intercept is set. This will assist in opening
the window only when GIF=1.
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Add a new cpufeature definition for Virtual GIF.
Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|