summaryrefslogtreecommitdiffstats
path: root/include/kvm (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2022-01-161-1/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm updates from Paolo Bonzini: "RISCV: - Use common KVM implementation of MMU memory caches - SBI v0.2 support for Guest - Initial KVM selftests support - Fix to avoid spurious virtual interrupts after clearing hideleg CSR - Update email address for Anup and Atish ARM: - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update s390: - fix sigp sense/start/stop/inconsistency - cleanups x86: - Clean up some function prototypes more - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery - completely remove potential TOC/TOU races in nested SVM consistency checks - update some PMCs on emulated instructions - Intel AMX support (joint work between Thomas and Intel) - large MMU cleanups - module parameter to disable PMU virtualization - cleanup register cache - first part of halt handling cleanups - Hyper-V enlightened MSR bitmap support for nested hypervisors Generic: - clean up Makefiles - introduce CONFIG_HAVE_KVM_DIRTY_RING - optimize memslot lookup using a tree - optimize vCPU array usage by converting to xarray" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits) x86/fpu: Fix inline prefix warnings selftest: kvm: Add amx selftest selftest: kvm: Move struct kvm_x86_state to header selftest: kvm: Reorder vcpu_load_state steps for AMX kvm: x86: Disable interception for IA32_XFD on demand x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state() kvm: selftests: Add support for KVM_CAP_XSAVE2 kvm: x86: Add support for getting/setting expanded xstate buffer x86/fpu: Add uabi_size to guest_fpu kvm: x86: Add CPUID support for Intel AMX kvm: x86: Add XCR0 support for Intel AMX kvm: x86: Disable RDMSR interception of IA32_XFD_ERR kvm: x86: Emulate IA32_XFD_ERR for guest kvm: x86: Intercept #NM for saving IA32_XFD_ERR x86/fpu: Prepare xfd_err in struct fpu_guest kvm: x86: Add emulation for IA32_XFD x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2 x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM x86/fpu: Add guest support to xfd_enable_feature() ...
| * KVM: arm64: vgic: Replace kernel.h with the necessary inclusionsAndy Shevchenko2022-01-041-1/+3
| | | | | | | | | | | | | | | | | | arm_vgic.h does not require all the stuff that kernel.h provides. Replace kernel.h inclusion with the list of what is really being used. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220104151940.55399-1-andriy.shevchenko@linux.intel.com
* | KVM: arm64: Hide kvm_arm_pmu_available behind CONFIG_HW_PERF_EVENTS=ySean Christopherson2021-11-171-7/+12
|/ | | | | | | | | | | Move the definition of kvm_arm_pmu_available to pmu-emul.c and, out of "necessity", hide it behind CONFIG_HW_PERF_EVENTS. Provide a stub for the key's wrapper, kvm_arm_support_pmu_v3(). Moving the key's definition out of perf.c will allow a future commit to delete perf.c entirely. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211111020738.2512932-16-seanjc@google.com
* KVM: arm64: Fix PMU probe orderingMarc Zyngier2021-09-201-3/+0
| | | | | | | | | | | | | | | | | | | | | | Russell reported that since 5.13, KVM's probing of the PMU has started to fail on his HW. As it turns out, there is an implicit ordering dependency between the architectural PMU probing code and and KVM's own probing. If, due to probe ordering reasons, KVM probes before the PMU driver, it will fail to detect the PMU and prevent it from being advertised to guests as well as the VMM. Obviously, this is one probing too many, and we should be able to deal with any ordering. Add a callback from the PMU code into KVM to advertise the registration of a host CPU PMU, allowing for any probing order. Fixes: 5421db1be3b1 ("KVM: arm64: Divorce the perf code from oprofile helpers") Reported-by: "Russell King (Oracle)" <linux@armlinux.org.uk> Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/YUYRKVflRtUytzy5@shell.armlinux.org.uk Cc: stable@vger.kernel.org
* KVM: arm64: vgic: Implement SW-driven deactivationMarc Zyngier2021-06-011-0/+10
| | | | | | | | | | | | | | | | In order to deal with these systems that do not offer HW-based deactivation of interrupts, let implement a SW-based approach: - When the irq is queued into a LR, treat it as a pure virtual interrupt and set the EOI flag in the LR. - When the interrupt state is read back from the LR, force a deactivation when the state is invalid (neither active nor pending) Interrupts requiring such treatment get the VGIC_SW_RESAMPLE flag. Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: vgic: move irq->get_input_level into an ops structureMarc Zyngier2021-06-011-11/+17
| | | | | | | | | | | We already have the option to attach a callback to an interrupt to retrieve its pending state. As we are planning to expand this facility, move this callback into its own data structure. This will limit the size of individual interrupts as the ops structures can be shared across multiple interrupts. Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: vgic: Let an interrupt controller advertise lack of HW deactivationMarc Zyngier2021-06-011-0/+3
| | | | | | | | | | | | | | The vGIC, as architected by ARM, allows a virtual interrupt to trigger the deactivation of a physical interrupt. This allows the following interrupt to be delivered without requiring an exit. However, some implementations have choosen not to implement this, meaning that we will need some unsavoury workarounds to deal with this. On detecting such a case, taint the kernel and spit a nastygram. We'll deal with this in later patches. Signed-off-by: Marc Zyngier <maz@kernel.org>
* Merge branch 'kvm-arm64/kill_oprofile_dependency' into kvmarm-master/nextMarc Zyngier2021-04-221-0/+4
|\ | | | | | | Signed-off-by: Marc Zyngier <maz@kernel.org>
| * KVM: arm64: Divorce the perf code from oprofile helpersMarc Zyngier2021-04-221-0/+4
| | | | | | | | | | | | | | | | | | | | KVM/arm64 is the sole user of perf_num_counters(), and really could do without it. Stop using the obsolete API by relying on the existing probing code. Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20210414134409.1266357-2-maz@kernel.org
* | KVM: arm64: vgic-v3: Expose GICR_TYPER.Last for userspaceEric Auger2021-04-061-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | Commit 23bde34771f1 ("KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace") temporarily fixed a bug identified when attempting to access the GICR_TYPER register before the redistributor region setting, but dropped the support of the LAST bit. Emulating the GICR_TYPER.Last bit still makes sense for architecture compliance though. This patch restores its support (if the redistributor region was set) while keeping the code safe. We introduce a new helper, vgic_mmio_vcpu_rdist_is_last() which computes whether a redistributor is the highest one of a series of redistributor contributor pages. With this new implementation we do not need to have a uaccess read accessor anymore. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20210405163941.510258-9-eric.auger@redhat.com
* KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static keyMarc Zyngier2021-03-061-2/+7
| | | | | | | | | | | | | | | | We currently find out about the presence of a HW PMU (or the handling of that PMU by perf, which amounts to the same thing) in a fairly roundabout way, by checking the number of counters available to perf. That's good enough for now, but we will soon need to find about about that on paths where perf is out of reach (in the world switch). Instead, let's turn kvm_arm_support_pmu_v3() into a static key. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Link: https://lore.kernel.org/r/20210209114844.3278746-2-maz@kernel.org Message-Id: <20210305185254.3730990-5-maz@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: arm64: Replace KVM_ARM_PMU with HW_PERF_EVENTSMarc Zyngier2021-01-041-1/+1
| | | | | | | KVM_ARM_PMU only existed for the benefit of 32bit ARM hosts, and makes no sense now that we are 64bit only. Get rid of it. Signed-off-by: Marc Zyngier <maz@kernel.org>
* Merge remote-tracking branch 'origin/kvm-arm64/misc-5.11' into ↵Marc Zyngier2020-12-041-0/+1
|\ | | | | | | | | | | kvmarm-master/queue Signed-off-by: Marc Zyngier <maz@kernel.org>
| * KVM: arm64: Delay the polling of the GICR_VPENDBASER.Dirty bitShenming Lu2020-11-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to reduce the impact of the VPT parsing happening on the GIC, we can split the vcpu reseidency in two phases: - programming GICR_VPENDBASER: this still happens in vcpu_load() - checking for the VPT parsing to be complete: this can happen on vcpu entry (in kvm_vgic_flush_hwstate()) This allows the GIC and the CPU to work in parallel, rewmoving some of the entry overhead. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Shenming Lu <lushenming@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201128141857.983-3-lushenming@huawei.com
* | KVM: arm64: Get rid of the PMU ready stateMarc Zyngier2020-11-271-3/+0
|/ | | | | | | The PMU ready state has no user left. Goodbye. Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2020-10-231-0/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "For x86, there is a new alternative and (in the future) more scalable implementation of extended page tables that does not need a reverse map from guest physical addresses to host physical addresses. For now it is disabled by default because it is still lacking a few of the existing MMU's bells and whistles. However it is a very solid piece of work and it is already available for people to hammer on it. Other updates: ARM: - New page table code for both hypervisor and guest stage-2 - Introduction of a new EL2-private host context - Allow EL2 to have its own private per-CPU variables - Support of PMU event filtering - Complete rework of the Spectre mitigation PPC: - Fix for running nested guests with in-kernel IRQ chip - Fix race condition causing occasional host hard lockup - Minor cleanups and bugfixes x86: - allow trapping unknown MSRs to userspace - allow userspace to force #GP on specific MSRs - INVPCID support on AMD - nested AMD cleanup, on demand allocation of nested SVM state - hide PV MSRs and hypercalls for features not enabled in CPUID - new test for MSR_IA32_TSC writes from host and guest - cleanups: MMU, CPUID, shared MSRs - LAPIC latency optimizations ad bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits) kvm: x86/mmu: NX largepage recovery for TDP MMU kvm: x86/mmu: Don't clear write flooding count for direct roots kvm: x86/mmu: Support MMIO in the TDP MMU kvm: x86/mmu: Support write protection for nesting in tdp MMU kvm: x86/mmu: Support disabling dirty logging for the tdp MMU kvm: x86/mmu: Support dirty logging for the TDP MMU kvm: x86/mmu: Support changed pte notifier in tdp MMU kvm: x86/mmu: Add access tracking for tdp_mmu kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU kvm: x86/mmu: Add TDP MMU PF handler kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg kvm: x86/mmu: Support zapping SPTEs in the TDP MMU KVM: Cache as_id in kvm_memory_slot kvm: x86/mmu: Add functions to handle changed TDP SPTEs kvm: x86/mmu: Allocate and free TDP MMU roots kvm: x86/mmu: Init / Uninit the TDP MMU kvm: x86/mmu: Introduce tdp_iter KVM: mmu: extract spte.h and spte.c KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp ...
| * KVM: arm64: Mask out filtered events in PCMEID{0,1}_EL1Marc Zyngier2020-09-291-0/+5
| | | | | | | | | | | | | | | | | | | | | | As we can now hide events from the guest, let's also adjust its view of PCMEID{0,1}_EL1 so that it can figure out why some common events are not counting as they should. The astute user can still look into the TRM for their CPU and find out they've been cheated, though. Nobody's perfect. Signed-off-by: Marc Zyngier <maz@kernel.org>
* | KVM: arm64: pmu: Make overflow handler NMI safeJulien Thierry2020-09-281-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from NMI context, defer waking the vcpu to an irq_work queue. A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent running the irq_work for a non-existent vcpu by calling irq_work_sync() on the PMU destroy path. [Alexandru E.: Added irq_work_sync()] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox) Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Pouloze <suzuki.poulose@arm.com> Cc: kvm@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20200924110706.254996-6-alexandru.elisei@arm.com Signed-off-by: Will Deacon <will@kernel.org>
* KVM: arm64: timers: Move timer registers to the sys_regs fileMarc Zyngier2020-07-071-7/+4
| | | | | | | | | | | Move the timer gsisters to the sysreg file. This will further help when they are directly changed by a nesting hypervisor in the VNCR page. This requires moving the initialisation of the timer struct so that some of the helpers (such as arch_timer_ctx_index) can work correctly at an early stage. Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: timers: Rename kvm_timer_sync_hwstate to kvm_timer_sync_userMarc Zyngier2020-07-071-1/+1
| | | | | | | | | kvm_timer_sync_hwstate() has nothing to do with the timer HW state, but more to do with the state of a userspace interrupt controller. Change the suffix from _hwstate to_user, in keeping with the rest of the code. Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: vgic-v3: Take cpu_if pointer directly instead of vcpuChristoffer Dall2020-05-281-1/+4
| | | | | | | | | | | | | | | | | | | | | | | If we move the used_lrs field to the version-specific cpu interface structure, the following functions only operate on the struct vgic_v3_cpu_if and not the full vcpu: __vgic_v3_save_state __vgic_v3_restore_state __vgic_v3_activate_traps __vgic_v3_deactivate_traps __vgic_v3_save_aprs __vgic_v3_restore_aprs This is going to be very useful for nested virt, so move the used_lrs field and change the prototypes and implementations of these functions to take the cpu_if parameter directly. No functional change. Reviewed-by: James Morse <james.morse@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm64: GICv4.1: Allow SGIs to switch between HW and SW interruptsMarc Zyngier2020-03-241-0/+3
| | | | | | | | | | | | | | In order to let a guest buy in the new, active-less SGIs, we need to be able to switch between the two modes. Handle this by stopping all guest activity, transfer the state from one mode to the other, and resume the guest. Nothing calls this code so far, but a later patch will plug it into the MMIO emulation. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20200304203330.4967-20-maz@kernel.org
* irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layerMarc Zyngier2020-03-241-0/+1
| | | | | | | | | | | | | | In order to hide some of the differences between v4.0 and v4.1, move the doorbell management out of the KVM code, and into the GICv4-specific layer. This allows the calling code to ask for the doorbell when blocking, and otherwise to leave the doorbell permanently disabled. This matches the v4.1 code perfectly, and only results in a minor refactoring of the v4.0 code. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Link: https://lore.kernel.org/r/20200304203330.4967-14-maz@kernel.org
* Merge remote-tracking branch 'kvmarm/misc-5.5' into kvmarm/nextMarc Zyngier2019-11-081-5/+3
|\
| * KVM: arm/arm64: vgic: Fix some comments typoZenghui Yu2019-10-291-1/+1
| | | | | | | | | | | | | | | | | | Fix various comments, including wrong function names, grammar mistakes and specification references. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191029071919.177-3-yuzenghui@huawei.com
| * KVM: arm/arm64: vgic: Remove the declaration of kvm_send_userspace_msi()Zenghui Yu2019-10-291-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | The callsite of kvm_send_userspace_msi() is currently arch agnostic. There seems no reason to keep an extra declaration of it in arm_vgic.h (we already have one in include/linux/kvm_host.h). Remove it. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20191029071919.177-2-yuzenghui@huawei.com
| * KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/putMarc Zyngier2019-10-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the VHE code was reworked, a lot of the vgic stuff was moved around, but the GICv4 residency code did stay untouched, meaning that we come in and out of residency on each flush/sync, which is obviously suboptimal. To address this, let's move things around a bit: - Residency entry (flush) moves to vcpu_load - Residency exit (sync) moves to vcpu_put - On blocking (entry to WFI), we "put" - On unblocking (exit from WFI), we "load" Because these can nest (load/block/put/load/unblock/put, for example), we now have per-VPE tracking of the residency state. Additionally, vgic_v4_put gains a "need doorbell" parameter, which only gets set to true when blocking because of a WFI. This allows a finer control of the doorbell, which now also gets disabled as soon as it gets signaled. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
* | KVM: arm/arm64: Factor out hypercall handling from PSCI codeChristoffer Dall2019-10-212-1/+44
|/ | | | | | | | | | | | | | | We currently intertwine the KVM PSCI implementation with the general dispatch of hypercall handling, which makes perfect sense because PSCI is the only category of hypercalls we support. However, as we are about to support additional hypercalls, factor out this functionality into a separate hypercall handler file. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [steven.price@arm.com: rebased] Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm/arm64: vgic: Use a single IO device per redistributorEric Auger2019-08-251-1/+0
| | | | | | | | | | | | | | | At the moment we use 2 IO devices per GICv3 redistributor: one one for the RD_base frame and one for the SGI_base frame. Instead we can use a single IO device per redistributor (the 2 frames are contiguous). This saves slots on the KVM_MMIO_BUS which is currently limited to NR_IOBUS_DEVS (1000). This change allows to instantiate up to 512 redistributors and may speed the guest boot with a large number of VCPUs. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm/arm64: vgic: Add LPI translation cache definitionMarc Zyngier2019-08-181-0/+3
| | | | | | | | | | | Add the basic data structure that expresses an MSI to LPI translation as well as the allocation/release hooks. The size of the cache is arbitrarily defined as 16*nr_vcpus. Tested-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to blockMarc Zyngier2019-08-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit commit 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or its GICv2 equivalent) loaded as long as we can, only syncing it back when we're scheduled out. There is a small snag with that though: kvm_vgic_vcpu_pending_irq(), which is indirectly called from kvm_vcpu_check_block(), needs to evaluate the guest's view of ICC_PMR_EL1. At the point were we call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever changes to PMR is not visible in memory until we do a vcpu_put(). Things go really south if the guest does the following: mov x0, #0 // or any small value masking interrupts msr ICC_PMR_EL1, x0 [vcpu preempted, then rescheduled, VMCR sampled] mov x0, #ff // allow all interrupts msr ICC_PMR_EL1, x0 wfi // traps to EL2, so samping of VMCR [interrupt arrives just after WFI] Here, the hypervisor's view of PMR is zero, while the guest has enabled its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no interrupts are pending (despite an interrupt being received) and we'll block for no reason. If the guest doesn't have a periodic interrupt firing once it has blocked, it will stay there forever. To avoid this unfortuante situation, let's resync VMCR from kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block() will observe the latest value of PMR. This has been found by booting an arm64 Linux guest with the pseudo NMI feature, and thus using interrupt priorities to mask interrupts instead of the usual PSTATE masking. Cc: stable@vger.kernel.org # 4.12 Fixes: 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put") Signed-off-by: Marc Zyngier <maz@kernel.org>
* KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter indexZenghui Yu2019-07-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | We use "pmc->idx" and the "chained" bitmap to determine if the pmc is chained, in kvm_pmu_pmc_is_chained(). But idx might be uninitialized (and random) when we doing this decision, through a KVM_ARM_VCPU_INIT ioctl -> kvm_pmu_vcpu_reset(). And the test_bit() against this random idx will potentially hit a KASAN BUG [1]. In general, idx is the static property of a PMU counter that is not expected to be modified across resets, as suggested by Julien. It looks more reasonable if we can setup the PMU counter idx for a vcpu in its creation time. Introduce a new function - kvm_pmu_vcpu_init() for this basic setup. Oh, and the KASAN BUG will get fixed this way. [1] https://www.spinics.net/lists/kvm-arm/msg36700.html Fixes: 80f393a23be6 ("KVM: arm/arm64: Support chained PMU counters") Suggested-by: Andrew Murray <andrew.murray@arm.com> Suggested-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* KVM: arm/arm64: Support chained PMU countersAndrew Murray2019-07-051-0/+2
| | | | | | | | | | | | | | | | | | ARMv8 provides support for chained PMU counters, where an event type of 0x001E is set for odd-numbered counters, the event counter will increment by one for each overflow of the preceding even-numbered counter. Let's emulate this in KVM by creating a 64 bit perf counter when a user chains two emulated counters together. For chained events we only support generating an overflow interrupt on the high counter. We use the attributes of the low counter to determine the attributes of the perf event. Suggested-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* KVM: arm/arm64: Remove pmc->bitmaskAndrew Murray2019-07-051-1/+0
| | | | | | | | | | | | We currently use pmc->bitmask to determine the width of the pmc - however it's superfluous as the pmc index already describes if the pmc is a cycle counter or event counter. The architecture clearly describes the widths of these counters. Let's remove the bitmask to simplify the code. Signed-off-by: Andrew Murray <andrew.murray@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* KVM: arm/arm64: Rename kvm_pmu_{enable/disable}_counter functionsAndrew Murray2019-07-051-4/+4
| | | | | | | | | | | | The kvm_pmu_{enable/disable}_counter functions can enable/disable multiple counters at once as they operate on a bitmask. Let's make this clearer by renaming the function. Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Andrew Murray <andrew.murray@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234Thomas Gleixner2019-06-193-36/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 503 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Enrico Weigelt <info@metux.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190602204653.811534538@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 342Thomas Gleixner2019-06-051-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not see http www gnu org licenses extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 15 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190530000437.237481593@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 333Thomas Gleixner2019-06-051-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation inc 59 temple place suite 330 boston ma 02111 1307 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 136 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190530000436.384967451@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2019-03-151-19/+49
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "ARM: - some cleanups - direct physical timer assignment - cache sanitization for 32-bit guests s390: - interrupt cleanup - introduction of the Guest Information Block - preparation for processor subfunctions in cpu models PPC: - bug fixes and improvements, especially related to machine checks and protection keys x86: - many, many cleanups, including removing a bunch of MMU code for unnecessary optimizations - AVIC fixes Generic: - memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits) kvm: vmx: fix formatting of a comment KVM: doc: Document the life cycle of a VM and its resources MAINTAINERS: Add KVM selftests to existing KVM entry Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()" KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char() KVM: PPC: Fix compilation when KVM is not enabled KVM: Minor cleanups for kvm_main.c KVM: s390: add debug logging for cpu model subfunctions KVM: s390: implement subfunction processor calls arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2 KVM: arm/arm64: Remove unused timer variable KVM: PPC: Book3S: Improve KVM reference counting KVM: PPC: Book3S HV: Fix build failure without IOMMU support Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()" x86: kvmguest: use TSC clocksource if invariant TSC is exposed KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes() KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children ...
| * KVM: arm/arm64: Rework the timer code to use a timer_mapChristoffer Dall2019-02-191-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are currently emulating two timers in two different ways. When we add support for nested virtualization in the future, we are going to be emulating either two timers in two diffferent ways, or four timers in a single way. We need a unified data structure to keep track of how we map virtual state to physical state and we need to cleanup some of the timer code to operate more independently on a struct arch_timer_context instead of trying to consider the global state of the VCPU and recomputing all state. Co-written with Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
| * KVM: arm/arm64: arch_timer: Assign the phys timer on VHE systemsChristoffer Dall2019-02-191-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VHE systems don't have to emulate the physical timer, we can simply assign the EL1 physical timer directly to the VM as the host always uses the EL2 timers. In order to minimize the amount of cruft, AArch32 gets definitions for the physical timer too, but is should be generally unused on this architecture. Co-written with Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
| * KVM: arm/arm64: timer: Rework data structures for multiple timersChristoffer Dall2019-02-191-20/+21
| | | | | | | | | | | | | | | | | | | | Prepare for having 4 timer data structures (2 for now). Move loaded to the cpu data structure and not the individual timer structure, in preparation for assigning the EL1 phys timer as well. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: consolidate arch timer trap handlersAndre Przywara2019-02-191-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment we have separate system register emulation handlers for each timer register. Actually they are quite similar, and we rely on kvm_arm_timer_[gs]et_reg() for the actual emulation anyways, so let's just merge all of those handlers into one function, which just marshalls the arguments and then hands off to a set of common accessors. This makes extending the emulation to include EL2 timers much easier. Signed-off-by: Andre Przywara <andre.przywara@arm.com> [Fixed 32-bit VM breakage and reduced to reworking existing code] Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [Fixed 32bit host, general cleanup] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: Simplify bg_timer programmingChristoffer Dall2019-02-191-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of calling into kvm_timer_[un]schedule from the main kvm blocking path, test if the VCPU is on the wait queue from the load/put path and perform the background timer setup/cancel in this path. This has the distinct advantage that we no longer race between load/put and schedule/unschedule and programming and canceling of the bg_timer always happens when the timer state is not loaded. Note that we must now remove the checks in kvm_timer_blocking that do not schedule a background timer if one of the timers can fire, because we no longer have a guarantee that kvm_vcpu_check_block() will be called before kvm_timer_blocking. Reported-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* | KVM: arm/arm64: vgic: Make vgic_cpu->ap_list_lock a raw_spinlockJulien Thierry2019-01-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | vgic_cpu->ap_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
* | KVM: arm/arm64: vgic: Make vgic_dist->lpi_list_lock a raw_spinlockJulien Thierry2019-01-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | vgic_dist->lpi_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
* | KVM: arm/arm64: vgic: Make vgic_irq->irq_lock a raw_spinlockJulien Thierry2019-01-241-1/+1
|/ | | | | | | | | | | | | vgic_irq->irq_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
* KVM: arm/arm64: Remove arch timer workqueueChristoffer Dall2018-12-191-4/+0
| | | | | | | | | | | | | | The use of a work queue in the hrtimer expire function for the bg_timer is a leftover from the time when we would inject interrupts when the bg_timer expired. Since we are no longer doing that, we can instead call kvm_vcpu_wake_up() directly from the hrtimer function and remove all workqueue functionality from the arch timer code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* KVM: arm/arm64: vgic-v3: Add core support for Group0 SGIsMarc Zyngier2018-08-121-1/+1
| | | | | | | | | | | | | | | | | | | Although vgic-v3 now supports Group0 interrupts, it still doesn't deal with Group0 SGIs. As usually with the GIC, nothing is simple: - ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1 with KVM (as per 8.1.10, Non-secure EL1 access) - ICC_SGI0R can only generate Group0 SGIs - ICC_ASGI1R sees its scope refocussed to generate only Group0 SGIs (as per the note at the bottom of Table 8-14) We only support Group1 SGIs so far, so no material change. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* KVM: arm/arm64: vgic: Let userspace opt-in to writable v2 IGROUPRChristoffer Dall2018-07-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | Simply letting IGROUPR be writable from userspace would break migration from old kernels to newer kernels, because old kernels incorrectly report interrupt groups as group 1. This would not be a big problem if userspace wrote GICD_IIDR as read from the kernel, because we could detect the incompatibility and return an error to userspace. Unfortunately, this is not the case with current userspace implementations and simply letting IGROUPR be writable from userspace for an emulated GICv2 silently breaks migration and causes the destination VM to no longer run after migration. We now encourage userspace to write the read and expected value of GICD_IIDR as the first part of a GIC register restore, and if we observe a write to GICD_IIDR we know that userspace has been updated and has had a chance to cope with older kernels (VGICv2 IIDR.Revision == 0) incorrectly reporting interrupts as group 1, and therefore we now allow groups to be user writable. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>