summaryrefslogtreecommitdiffstats
path: root/virt (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'kvmarm-fixes-for-5.1' of ↵Paolo Bonzini2019-03-285-72/+106
|\ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/ARM fixes for 5.1 - Fix THP handling in the presence of pre-existing PTEs - Honor request for PTE mappings even when THPs are available - GICv4 performance improvement - Take the srcu lock when writing to guest-controlled ITS data structures - Reset the virtual PMU in preemptible context - Various cleanups
| * KVM: arm/arm64: Comments cleanup in mmu.cZenghui Yu2019-03-281-14/+9
| | | | | | | | | | | | | | | | | | | | Some comments in virt/kvm/arm/mmu.c are outdated. Update them to reflect the current state of the code. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> [maz: commit message tidy-up] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: vgic-its: Make attribute accessors staticYueHaibing2019-03-201-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Fix sparse warnings: arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1732:5: warning: symbol 'vgic_its_has_attr_regs' was not declared. Should it be static? arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-its.c:1753:5: warning: symbol 'vgic_its_attr_regs_access' was not declared. Should it be static? Signed-off-by: YueHaibing <yuehaibing@huawei.com> [maz: fixed subject] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: Fix handling of stage2 huge mappingsSuzuki K Poulose2019-03-201-16/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We rely on the mmu_notifier call backs to handle the split/merge of huge pages and thus we are guaranteed that, while creating a block mapping, either the entire block is unmapped at stage2 or it is missing permission. However, we miss a case where the block mapping is split for dirty logging case and then could later be made block mapping, if we cancel the dirty logging. This not only creates inconsistent TLB entries for the pages in the the block, but also leakes the table pages for PMD level. Handle this corner case for the huge mappings at stage2 by unmapping the non-huge mapping for the block. This could potentially release the upper level table. So we need to restart the table walk once we unmap the range. Fixes : ad361f093c1e31d ("KVM: ARM: Support hugetlbfs backed huge pages") Reported-by: Zheng Xiang <zhengxiang9@huawei.com> Cc: Zheng Xiang <zhengxiang9@huawei.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: Enforce PTE mappings at stage2 when neededSuzuki K Poulose2019-03-191-22/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings") made the checks to skip huge mappings, stricter. However it introduced a bug where we still use huge mappings, ignoring the flag to use PTE mappings, by not reseting the vma_pagesize to PAGE_SIZE. Also, the checks do not cover the PUD huge pages, that was under review during the same period. This patch fixes both the issues. Fixes : 6794ad5443a2118 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings") Reported-by: Zenghui Yu <yuzenghui@huawei.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: vgic-its: Take the srcu lock when parsing the memslotsMarc Zyngier2019-03-191-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling kvm_is_visible_gfn() implies that we're parsing the memslots, and doing this without the srcu lock is frown upon: [12704.164532] ============================= [12704.164544] WARNING: suspicious RCU usage [12704.164560] 5.1.0-rc1-00008-g600025238f51-dirty #16 Tainted: G W [12704.164573] ----------------------------- [12704.164589] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage! [12704.164602] other info that might help us debug this: [12704.164616] rcu_scheduler_active = 2, debug_locks = 1 [12704.164631] 6 locks held by qemu-system-aar/13968: [12704.164644] #0: 000000007ebdae4f (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0 [12704.164691] #1: 000000007d751022 (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0 [12704.164726] #2: 00000000219d2706 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164761] #3: 00000000a760aecd (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164794] #4: 000000000ef8e31d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164827] #5: 000000007a872093 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164861] stack backtrace: [12704.164878] CPU: 2 PID: 13968 Comm: qemu-system-aar Tainted: G W 5.1.0-rc1-00008-g600025238f51-dirty #16 [12704.164887] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019 [12704.164896] Call trace: [12704.164910] dump_backtrace+0x0/0x138 [12704.164920] show_stack+0x24/0x30 [12704.164934] dump_stack+0xbc/0x104 [12704.164946] lockdep_rcu_suspicious+0xcc/0x110 [12704.164958] gfn_to_memslot+0x174/0x190 [12704.164969] kvm_is_visible_gfn+0x28/0x70 [12704.164980] vgic_its_check_id.isra.0+0xec/0x1e8 [12704.164991] vgic_its_save_tables_v0+0x1ac/0x330 [12704.165001] vgic_its_set_attr+0x298/0x3a0 [12704.165012] kvm_device_ioctl_attr+0x9c/0xd8 [12704.165022] kvm_device_ioctl+0x8c/0xf8 [12704.165035] do_vfs_ioctl+0xc8/0x960 [12704.165045] ksys_ioctl+0x8c/0xa0 [12704.165055] __arm64_sys_ioctl+0x28/0x38 [12704.165067] el0_svc_common+0xd8/0x138 [12704.165078] el0_svc_handler+0x38/0x78 [12704.165089] el0_svc+0x8/0xc Make sure the lock is taken when doing this. Fixes: bf308242ab98 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock") Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: vgic-its: Take the srcu lock when writing to guest memoryMarc Zyngier2019-03-192-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When halting a guest, QEMU flushes the virtual ITS caches, which amounts to writing to the various tables that the guest has allocated. When doing this, we fail to take the srcu lock, and the kernel shouts loudly if running a lockdep kernel: [ 69.680416] ============================= [ 69.680819] WARNING: suspicious RCU usage [ 69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted [ 69.682096] ----------------------------- [ 69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage! [ 69.683225] [ 69.683225] other info that might help us debug this: [ 69.683225] [ 69.683975] [ 69.683975] rcu_scheduler_active = 2, debug_locks = 1 [ 69.684598] 6 locks held by qemu-system-aar/4097: [ 69.685059] #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0 [ 69.686087] #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0 [ 69.686919] #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.687698] #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.688475] #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.689978] #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.690729] [ 69.690729] stack backtrace: [ 69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18 [ 69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019 [ 69.692831] Call trace: [ 69.694072] lockdep_rcu_suspicious+0xcc/0x110 [ 69.694490] gfn_to_memslot+0x174/0x190 [ 69.694853] kvm_write_guest+0x50/0xb0 [ 69.695209] vgic_its_save_tables_v0+0x248/0x330 [ 69.695639] vgic_its_set_attr+0x298/0x3a0 [ 69.696024] kvm_device_ioctl_attr+0x9c/0xd8 [ 69.696424] kvm_device_ioctl+0x8c/0xf8 [ 69.696788] do_vfs_ioctl+0xc8/0x960 [ 69.697128] ksys_ioctl+0x8c/0xa0 [ 69.697445] __arm64_sys_ioctl+0x28/0x38 [ 69.697817] el0_svc_common+0xd8/0x138 [ 69.698173] el0_svc_handler+0x38/0x78 [ 69.698528] el0_svc+0x8/0xc The fix is to obviously take the srcu lock, just like we do on the read side of things since bf308242ab98. One wonders why this wasn't fixed at the same time, but hey... Fixes: bf308242ab98 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock") Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * arm64: KVM: Always set ICH_HCR_EL2.EN if GICv4 is enabledMarc Zyngier2019-03-192-6/+12
| | | | | | | | | | | | | | | | | | | | | | | | The normal interrupt flow is not to enable the vgic when no virtual interrupt is to be injected (i.e. the LRs are empty). But when a guest is likely to use GICv4 for LPIs, we absolutely need to switch it on at all times. Otherwise, VLPIs only get delivered when there is something in the LRs, which doesn't happen very often. Reported-by: Nianyao Tang <tangnianyao@huawei.com> Tested-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
* | kvm: don't redefine flags as something elseSebastian Andrzej Siewior2019-03-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The function irqfd_wakeup() has flags defined as __poll_t and then it has additional flags which is used for irqflags. Redefine the inner flags variable as iflags so it does not shadow the outer flags. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: Reject device ioctls from processes other than the VM's creatorSean Christopherson2019-03-281-0/+3
|/ | | | | | | | | | | | | | KVM's API requires thats ioctls must be issued from the same process that created the VM. In other words, userspace can play games with a VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the creator can do anything useful. Explicitly reject device ioctls that are issued by a process other than the VM's creator, and update KVM's API documentation to extend its requirements to device ioctls. Fixes: 852b6d57dc7f ("kvm: add device control API") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2019-03-1511-295/+631
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "ARM: - some cleanups - direct physical timer assignment - cache sanitization for 32-bit guests s390: - interrupt cleanup - introduction of the Guest Information Block - preparation for processor subfunctions in cpu models PPC: - bug fixes and improvements, especially related to machine checks and protection keys x86: - many, many cleanups, including removing a bunch of MMU code for unnecessary optimizations - AVIC fixes Generic: - memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits) kvm: vmx: fix formatting of a comment KVM: doc: Document the life cycle of a VM and its resources MAINTAINERS: Add KVM selftests to existing KVM entry Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()" KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char() KVM: PPC: Fix compilation when KVM is not enabled KVM: Minor cleanups for kvm_main.c KVM: s390: add debug logging for cpu model subfunctions KVM: s390: implement subfunction processor calls arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2 KVM: arm/arm64: Remove unused timer variable KVM: PPC: Book3S: Improve KVM reference counting KVM: PPC: Book3S HV: Fix build failure without IOMMU support Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()" x86: kvmguest: use TSC clocksource if invariant TSC is exposed KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes() KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children ...
| * Merge tag 'kvmarm-for-v5.1' of ↵Paolo Bonzini2019-02-226-237/+566
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next KVM/arm updates for Linux v5.1 - A number of pre-nested code rework - Direct physical timer assignment on VHE systems - kvm_call_hyp type safety enforcement - Set/Way cache sanitisation for 32bit guests - Build system cleanups - A bunch of janitorial fixes
| | * KVM: arm/arm64: Remove unused timer variableShaokun Zhang2019-02-221-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'timer' local variable became unused after commit bee038a67487 ("KVM: arm/arm64: Rework the timer code to use a timer_map"). Remove it to avoid [-Wunused-but-set-variable] warning. Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Pouloze <suzuki.poulose@arm.com> Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: Remove unused gpa_end variableShaokun Zhang2019-02-201-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | The 'gpa_end' local variable is never used and let's remove it. Cc: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: fix spelling mistake: "auxilary" -> "auxiliary"Colin Ian King2019-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | There is a spelling mistake in a kvm_err error message. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: Fix TRACE_INCLUDE_PATHMasahiro Yamada2019-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the comment block in include/trace/define_trace.h says, TRACE_INCLUDE_PATH should be a relative path to the define_trace.h ../../virt/kvm/arm is the correct relative path. ../../../virt/kvm/arm is working by coincidence because the top Makefile adds -I$(srctree)/arch/$(SRCARCH)/include as a header search path, but we should not rely on it. Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: arch_timer: Mark physical interrupt active when a virtual ↵Marc Zyngier2019-02-191-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | interrupt is pending When a guest gets scheduled, KVM performs a "load" operation, which for the timer includes evaluating the virtual "active" state of the interrupt, and replicating it on the physical side. This ensures that the deactivation in the guest will also take place in the physical GIC distributor. If the interrupt is not yet active, we flag it as inactive on the physical side. This means that on restoring the timer registers, if the timer has expired, we'll immediately take an interrupt. That's absolutely fine, as the interrupt will then be flagged as active on the physical side. What this assumes though is that we'll enter the guest right after having taken the interrupt, and that the guest will quickly ACK the interrupt, making it active at on the virtual side. It turns out that quite often, this assumption doesn't really hold. The guest may be preempted on the back on this interrupt, either from kernel space or whilst running at EL1 when a host interrupt fires. When this happens, we repeat the whole sequence on the next load (interrupt marked as inactive, timer registers restored, interrupt fires). And if it takes a really long time for a guest to activate the interrupt (as it does with nested virt), we end-up with many such events in quick succession, leading to the guest only making very slow progress. This can also be seen with the number of virtual timer interrupt on the host being far greater than the same number in the guest. An easy way to fix this is to evaluate the timer state when performing the "load" operation, just like we do when the interrupt actually fires. If the timer has a pending virtual interrupt at this stage, then we can safely flag the physical interrupt as being active, which prevents spurious exits. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: Move kvm_is_write_fault to header fileChristoffer Dall2019-02-191-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | Move this little function to the header files for arm/arm64 so other code can make use of it directly. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: Rework the timer code to use a timer_mapChristoffer Dall2019-02-192-135/+265
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are currently emulating two timers in two different ways. When we add support for nested virtualization in the future, we are going to be emulating either two timers in two diffferent ways, or four timers in a single way. We need a unified data structure to keep track of how we map virtual state to physical state and we need to cleanup some of the timer code to operate more independently on a struct arch_timer_context instead of trying to consider the global state of the VCPU and recomputing all state. Co-written with Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
| | * KVM: arm/arm64: arch_timer: Assign the phys timer on VHE systemsChristoffer Dall2019-02-191-49/+170
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VHE systems don't have to emulate the physical timer, we can simply assign the EL1 physical timer directly to the VM as the host always uses the EL2 timers. In order to minimize the amount of cruft, AArch32 gets definitions for the physical timer too, but is should be generally unused on this architecture. Co-written with Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
| | * KVM: arm/arm64: timer: Rework data structures for multiple timersChristoffer Dall2019-02-191-28/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prepare for having 4 timer data structures (2 for now). Move loaded to the cpu data structure and not the individual timer structure, in preparation for assigning the EL1 phys timer as well. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: consolidate arch timer trap handlersAndre Przywara2019-02-191-17/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment we have separate system register emulation handlers for each timer register. Actually they are quite similar, and we rely on kvm_arm_timer_[gs]et_reg() for the actual emulation anyways, so let's just merge all of those handlers into one function, which just marshalls the arguments and then hands off to a set of common accessors. This makes extending the emulation to include EL2 timers much easier. Signed-off-by: Andre Przywara <andre.przywara@arm.com> [Fixed 32-bit VM breakage and reduced to reworking existing code] Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> [Fixed 32bit host, general cleanup] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm64: Fix ICH_ELRSR_EL2 sysreg namingMarc Zyngier2019-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | We previously incorrectly named the define for this system register. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
| | * KVM: arm/arm64: Simplify bg_timer programmingChristoffer Dall2019-02-192-23/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of calling into kvm_timer_[un]schedule from the main kvm blocking path, test if the VCPU is on the wait queue from the load/put path and perform the background timer setup/cancel in this path. This has the distinct advantage that we no longer race between load/put and schedule/unschedule and programming and canceling of the bg_timer always happens when the timer state is not loaded. Note that we must now remove the checks in kvm_timer_blocking that do not schedule a background timer if one of the timers can fire, because we no longer have a guarantee that kvm_vcpu_check_block() will be called before kvm_timer_blocking. Reported-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * KVM: arm/arm64: Factor out VMID into struct kvm_vmidChristoffer Dall2019-02-192-37/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for nested virtualization where we are going to have more than a single VMID per VM, let's factor out the VMID data into a separate VMID data structure and change the VMID allocator to operate on this new structure instead of using a struct kvm. This also means that udate_vttbr now becomes update_vmid, and that the vttbr itself is generated on the fly based on the stage 2 page table base address and the vmid. We cache the physical address of the pgd when allocating the pgd to avoid doing the calculation on every entry to the guest and to avoid calling into potentially non-hyp-mapped code from hyp/EL2. If we wanted to merge the VMID allocator with the arm64 ASID allocator at some point in the future, it should actually become easier to do that after this patch. Note that to avoid mapping the kvm_vmid_bits variable into hyp, we simply forego the masking of the vmid value in kvm_get_vttbr and rely on update_vmid to always assign a valid vmid value (within the supported range). Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> [maz: minor cleanups] Reviewed-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| | * arm/arm64: KVM: Statically configure the host's view of MPIDRMarc Zyngier2019-02-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently eagerly save/restore MPIDR. It turns out to be slightly pointless: - On the host, this value is known as soon as we're scheduled on a physical CPU - In the guest, this value cannot change, as it is set by KVM (and this is a read-only register) The result of the above is that we can perfectly avoid the eager saving of MPIDR_EL1, and only keep the restore. We just have to setup the host contexts appropriately at boot time. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
| | * arm/arm64: KVM: Introduce kvm_call_hyp_ret()Marc Zyngier2019-02-192-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, we haven't differentiated between HYP calls that have a return value and those who don't. As we're about to change this, introduce kvm_call_hyp_ret(), and change all call sites that actually make use of a return value. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
| * | KVM: Minor cleanups for kvm_main.cLeo Yan2019-02-221-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch contains two minor cleanups: firstly it puts exported symbol for kvm_io_bus_write() by following the function definition; secondly it removes a redundant blank line. Signed-off-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"Lan Tianyu2019-02-201-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The value of "dirty_bitmap[i]" is already check before setting its value to mask. The following check of "mask" is redundant. The check of "mask" was introduced by commit 58d2930f4ee3 ("KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"), revert it. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_startNir Weiner2019-02-201-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | grow_halt_poll_ns() have a strange behaviour in case (vcpu->halt_poll_ns != 0) && (vcpu->halt_poll_ns < halt_poll_ns_grow_start). In this case, vcpu->halt_poll_ns will be multiplied by grow factor (halt_poll_ns_grow) which will require several grow iteration in order to reach a value bigger than halt_poll_ns_grow_start. This means that growing vcpu->halt_poll_ns from value of 0 is slower than growing it from a positive value less than halt_poll_ns_grow_start. Which is misleading and inaccurate. Fix issue by changing grow_halt_poll_ns() to set vcpu->halt_poll_ns to halt_poll_ns_grow_start in any case that (vcpu->halt_poll_ns < halt_poll_ns_grow_start). Regardless if vcpu->halt_poll_ns is 0. use READ_ONCE to get a consistent number for all cases. Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Nir Weiner <nir.weiner@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameterNir Weiner2019-02-201-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hard-coded value 10000 in grow_halt_poll_ns() stands for the initial start value when raising up vcpu->halt_poll_ns. It actually sets the first timeout to the first polling session. This value has significant effect on how tolerant we are to outliers. On the standard case, higher value is better - we will spend more time in the polling busyloop, handle events/interrupts faster and result in better performance. But on outliers it puts us in a busy loop that does nothing. Even if the shrink factor is zero, we will still waste time on the first iteration. The optimal value changes between different workloads. It depends on outliers rate and polling sessions length. As this value has significant effect on the dynamic halt-polling algorithm, it should be configurable and exposed. Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Nir Weiner <nir.weiner@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_nsNir Weiner2019-02-201-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | grow_halt_poll_ns() have a strange behavior in case (halt_poll_ns_grow == 0) && (vcpu->halt_poll_ns != 0). In this case, vcpu->halt_pol_ns will be set to zero. That results in shrinking instead of growing. Fix issue by changing grow_halt_poll_ns() to not modify vcpu->halt_poll_ns in case halt_poll_ns_grow is zero Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Nir Weiner <nir.weiner@oracle.com> Suggested-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: Move the memslot update in-progress flag to bit 63Sean Christopherson2019-02-201-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ...now that KVM won't explode by moving it out of bit 0. Using bit 63 eliminates the need to jump over bit 0, e.g. when calculating a new memslots generation or when propagating the memslots generation to an MMIO spte. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: Remove the hack to trigger memslot generation wraparoundSean Christopherson2019-02-201-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | x86 captures a subset of the memslot generation (19 bits) in its MMIO sptes so that it can expedite emulated MMIO handling by checking only the releveant spte, i.e. doesn't need to do a full page fault walk. Because the MMIO sptes capture only 19 bits (due to limited space in the sptes), there is a non-zero probability that the MMIO generation could wrap, e.g. after 500k memslot updates. Since normal usage is extremely unlikely to result in 500k memslot updates, a hack was added by commit 69c9ea93eaea ("KVM: MMU: init kvm generation close to mmio wrap-around value") to offset the MMIO generation in order to trigger a wraparound, e.g. after 150 memslot updates. When separate memslot generation sequences were assigned to each address space, commit 00f034a12fdd ("KVM: do not bias the generation number in kvm_current_mmio_generation") moved the offset logic into the initialization of the memslot generation itself so that the per-address space bit(s) were not dropped/corrupted by the MMIO shenanigans. Remove the offset hack for three reasons: - While it does exercise x86's kvm_mmu_invalidate_mmio_sptes(), simply wrapping the generation doesn't actually test the interesting case of having stale MMIO sptes with the new generation number, e.g. old sptes with a generation number of 0. - Triggering kvm_mmu_invalidate_mmio_sptes() prematurely makes its performance rather important since the probability of invalidating MMIO sptes jumps from "effectively never" to "fairly likely". This limits what can be done in future patches, e.g. to simplify the invalidation code, as doing so without proper caution could lead to a noticeable performance regression. - Forcing the memslots generation, which is a 64-bit number, to wrap prevents KVM from assuming the memslots generation will never wrap. This in turn prevents KVM from using an arbitrary bit for the "update in-progress" flag, e.g. using bit 63 would immediately collide with using a large value as the starting generation number. The "update in-progress" flag is effectively forced into bit 0 so that it's (subtly) taken into account when incrementing the generation. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: Explicitly define the "memslot update in-progress" bitSean Christopherson2019-02-201-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM uses bit 0 of the memslots generation as an "update in-progress" flag, which is used by x86 to prevent caching MMIO access while the memslots are changing. Although the intended behavior is flag-like, e.g. MMIO sptes intentionally drop the in-progress bit so as to avoid caching data from in-flux memslots, the implementation oftentimes treats the bit as part of the generation number itself, e.g. incrementing the generation increments twice, once to set the flag and once to clear it. Prior to commit 4bd518f1598d ("KVM: use separate generations for each address space"), incorporating the "update in-progress" bit into the generation number largely made sense, e.g. "real" generations are even, "bogus" generations are odd, most code doesn't need to be aware of the bit, etc... Now that unique memslots generation numbers are assigned to each address space, stealthing the in-progress status into the generation number results in a wide variety of subtle code, e.g. kvm_create_vm() jumps over bit 0 when initializing the memslots generation without any hint as to why. Explicitly define the flag and convert as much code as possible (which isn't much) to actually treat it like a flag. This paves the way for eventually using a different bit for "update in-progress" so that it can be a flag in truth instead of a awkward extension to the generation number. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: Call kvm_arch_memslots_updated() before updating memslotsSean Christopherson2019-02-202-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_arch_memslots_updated() is at this point in time an x86-specific hook for handling MMIO generation wraparound. x86 stashes 19 bits of the memslots generation number in its MMIO sptes in order to avoid full page fault walks for repeat faults on emulated MMIO addresses. Because only 19 bits are used, wrapping the MMIO generation number is possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that the generation has changed so that it can invalidate all MMIO sptes in case the effective MMIO generation has wrapped so as to avoid using a stale spte, e.g. a (very) old spte that was created with generation==0. Given that the purpose of kvm_arch_memslots_updated() is to prevent consuming stale entries, it needs to be called before the new generation is propagated to memslots. Invalidating the MMIO sptes after updating memslots means that there is a window where a vCPU could dereference the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO spte that was created with (pre-wrap) generation==0. Fixes: e59dbe09f8e6 ("KVM: Introduce kvm_arch_memslots_updated()") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | kvm: Add memcg accounting to KVM allocationsBen Gardon2019-02-205-22/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are many KVM kernel memory allocations which are tied to the life of the VM process and should be charged to the VM process's cgroup. If the allocations aren't tied to the process, the OOM killer will not know that killing the process will free the associated kernel memory. Add __GFP_ACCOUNT flags to many of the allocations which are not yet being charged to the VM process's cgroup. Tested: Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch introduced no new failures. Ran a kernel memory accounting test which creates a VM to touch memory and then checks that the kernel memory allocated for the process is within certain bounds. With this patch we account for much more of the vmalloc and slab memory allocated for the VM. There remain a few allocations which should be charged to the VM's cgroup but are not. In they include: vcpu->run kvm->coalesced_mmio_ring There allocations are unaccounted in this patch because they are mapped to userspace, and accounting them to a cgroup causes problems. This should be addressed in a future patch. Signed-off-by: Ben Gardon <bgardon@google.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | kvm: Use struct_size() in kmalloc()Gustavo A. R. Silva2019-02-201-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | Merge tag 'acpi-5.1-rc1' of ↵Linus Torvalds2019-03-061-2/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI updates from Rafael Wysocki: "These are ACPICA updates including ACPI 6.3 support among other things, APEI updates including the ARM Software Delegated Exception Interface (SDEI) support, ACPI EC driver fixes and cleanups and other assorted improvements. Specifics: - Update the ACPICA code in the kernel to upstream revision 20190215 including ACPI 6.3 support and more: * New predefined methods: _NBS, _NCH, _NIC, _NIH, and _NIG (Erik Schmauss). * Update of the PCC Identifier structure in PDTT (Erik Schmauss). * Support for new Generic Affinity Structure subtable in SRAT (Erik Schmauss). * New PCC operation region support (Erik Schmauss). * Support for GICC statistical profiling for MADT (Erik Schmauss). * New Error Disconnect Recover notification support (Erik Schmauss). * New PPTT Processor Structure Flags fields support (Erik Schmauss). * ACPI 6.3 HMAT updates (Erik Schmauss). * GTDT Revision 3 support (Erik Schmauss). * Legacy module-level code (MLC) support removal (Erik Schmauss). * Update/clarification of messages for control method failures (Bob Moore). * Warning on creation of a zero-length opregion (Bob Moore). * acpiexec option to dump extra info for memory leaks (Bob Moore). * More ACPI error to firmware error conversions (Bob Moore). * Debugger fix (Bob Moore). * Copyrights update (Bob Moore) - Clean up sleep states support code in ACPICA (Christoph Hellwig) - Rework in_nmi() handling in the APEI code and add suppor for the ARM Software Delegated Exception Interface (SDEI) to it (James Morse) - Fix possible out-of-bounds accesses in BERT-related core (Ross Lagerwall) - Fix the APEI code parsing HEST that includes a Deferred Machine Check subtable (Yazen Ghannam) - Use DEFINE_DEBUGFS_ATTRIBUTE for APEI-related debugfs files (YueHaibing) - Switch the APEI ERST code to the new generic UUID API (Andy Shevchenko) - Update the MAINTAINERS entry for APEI (Borislav Petkov) - Fix and clean up the ACPI EC driver (Rafael Wysocki, Zhang Rui) - Fix DMI checks handling in the ACPI backlight driver and add the "Lunch Box" chassis-type check to it (Hans de Goede) - Add support for using ACPI table overrides included in built-in initrd images (Shunyong Yang) - Update ACPI device enumeration to treat the PWM2 device as "always present" on Lenovo Yoga Book (Yauhen Kharuzhy) - Fix up the enumeration of device objects with the PRP0001 device ID (Andy Shevchenko) - Clean up PPTT parsing error messages (John Garry) - Clean up debugfs files creation handling (Greg Kroah-Hartman, Rafael Wysocki) - Clean up the ACPI DPTF Makefile (Masahiro Yamada)" * tag 'acpi-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits) ACPI / bus: Respect PRP0001 when retrieving device match data ACPICA: Update version to 20190215 ACPI/ACPICA: Trivial: fix spelling mistakes and fix whitespace formatting ACPICA: ACPI 6.3: add GTDT Revision 3 support ACPICA: ACPI 6.3: HMAT updates ACPICA: ACPI 6.3: PPTT add additional fields in Processor Structure Flags ACPICA: ACPI 6.3: add Error Disconnect Recover Notification value ACPICA: ACPI 6.3: MADT: add support for statistical profiling in GICC ACPICA: ACPI 6.3: add PCC operation region support for AML interpreter efi: cper: Fix possible out-of-bounds access ACPI: APEI: Fix possible out-of-bounds access to BERT region ACPICA: ACPI 6.3: SRAT: add Generic Affinity Structure subtable ACPICA: ACPI 6.3: Add Trigger order to PCC Identifier structure in PDTT ACPICA: ACPI 6.3: Adding predefined methods _NBS, _NCH, _NIC, _NIH, and _NIG ACPICA: Update/clarify messages for control method failures ACPICA: Debugger: Fix possible fault with the "test objects" command ACPICA: Interpreter: Emit warning for creation of a zero-length op region ACPICA: Remove legacy module-level code support ACPI / x86: Make PWM2 device always present at Lenovo Yoga Book ACPI / video: Extend chassis-type detection with a "Lunch Box" check ..
| * \ \ Merge branch 'acpi-apei'Rafael J. Wysocki2019-03-041-2/+2
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * acpi-apei: (29 commits) efi: cper: Fix possible out-of-bounds access ACPI: APEI: Fix possible out-of-bounds access to BERT region MAINTAINERS: Add James Morse to the list of APEI reviewers ACPI / APEI: Add support for the SDEI GHES Notification type firmware: arm_sdei: Add ACPI GHES registration helper ACPI / APEI: Use separate fixmap pages for arm64 NMI-like notifications ACPI / APEI: Only use queued estatus entry during in_nmi_queue_one_entry() ACPI / APEI: Split ghes_read_estatus() to allow a peek at the CPER length ACPI / APEI: Make GHES estatus header validation more user friendly ACPI / APEI: Pass ghes and estatus separately to avoid a later copy ACPI / APEI: Let the notification helper specify the fixmap slot ACPI / APEI: Move locking to the notification helper arm64: KVM/mm: Move SEA handling behind a single 'claim' interface KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue ACPI / APEI: Move NOTIFY_SEA between the estatus-queue and NOTIFY_NMI ACPI / APEI: Don't allow ghes_ack_error() to mask earlier errors ACPI / APEI: Generalise the estatus queue's notify code ACPI / APEI: Don't update struct ghes' flags in read/clear estatus ACPI / APEI: Remove spurious GHES_TO_CLEAR check ...
| | * | | KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbingJames Morse2019-02-071-2/+2
| | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To split up APEIs in_nmi() path, the caller needs to always be in_nmi(). KVM shouldn't have to know about this, pull the RAS plumbing out into a header file. Currently guest synchronous external aborts are claimed as RAS notifications by handle_guest_sea(), which is hidden in the arch codes mm/fault.c. 32bit gets a dummy declaration in system_misc.h. There is going to be more of this in the future if/when the kernel supports the SError-based firmware-first notification mechanism and/or kernel-first notifications for both synchronous external abort and SError. Each of these will come with some Kconfig symbols and a handful of header files. Create a header file for all this. This patch gives handle_guest_sea() a 'kvm_' prefix, and moves the declarations to kvm_ras.h as preparation for a future patch that moves the ACPI-specific RAS code out of mm/fault.c. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | | | Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2019-03-051-1/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main RCU related changes in this cycle were: - Additional cleanups after RCU flavor consolidation - Grace-period forward-progress cleanups and improvements - Documentation updates - Miscellaneous fixes - spin_is_locked() conversions to lockdep - SPDX changes to RCU source and header files - SRCU updates - Torture-test updates, including nolibc updates and moving nolibc to tools/include" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) locking/locktorture: Convert to SPDX license identifier linux/torture: Convert to SPDX license identifier torture: Convert to SPDX license identifier linux/srcu: Convert to SPDX license identifier linux/rcutree: Convert to SPDX license identifier linux/rcutiny: Convert to SPDX license identifier linux/rcu_sync: Convert to SPDX license identifier linux/rcu_segcblist: Convert to SPDX license identifier linux/rcupdate: Convert to SPDX license identifier linux/rcu_node_tree: Convert to SPDX license identifier rcu/update: Convert to SPDX license identifier rcu/tree: Convert to SPDX license identifier rcu/tiny: Convert to SPDX license identifier rcu/sync: Convert to SPDX license identifier rcu/srcu: Convert to SPDX license identifier rcu/rcutorture: Convert to SPDX license identifier rcu/rcu_segcblist: Convert to SPDX license identifier rcu/rcuperf: Convert to SPDX license identifier rcu/rcu.h: Convert to SPDX license identifier RCU/torture.txt: Remove section MODULE PARAMETERS ...
| * \ \ \ Merge branch 'rcu-next' of ↵Ingo Molnar2019-02-131-1/+1
| |\ \ \ \ | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull the latest RCU tree from Paul E. McKenney: - Additional cleanups after RCU flavor consolidation - Grace-period forward-progress cleanups and improvements - Documentation updates - Miscellaneous fixes - spin_is_locked() conversions to lockdep - SPDX changes to RCU source and header files - SRCU updates - Torture-test updates, including nolibc updates and moving nolibc to tools/include Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | virt/kvm: Replace spin_is_locked() with lockdepPaul E. McKenney2019-01-261-1/+1
| | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockdep_assert_held() is better suited to checking locking requirements, since it only checks if the current thread holds the lock regardless of whether someone else does. This is also a step towards possibly removing spin_is_locked(). Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: <kvm@vger.kernel.org> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | kvm: properly check debugfs dentry before using itGreg Kroah-Hartman2019-02-281-1/+1
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | debugfs can now report an error code if something went wrong instead of just NULL. So if the return value is to be used as a "real" dentry, it needs to be checked if it is an error before dereferencing it. This is now happening because of ff9fb72bc077 ("debugfs: return error values, not NULL"). syzbot has found a way to trigger multiple debugfs files attempting to be created, which fails, and then the error code gets passed to dentry_path_raw() which obviously does not like it. Reported-by: Eric Biggers <ebiggers@kernel.org> Reported-and-tested-by: syzbot+7857962b4d45e602b8ad@syzkaller.appspotmail.com Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'kvm-arm-fixes-for-5.0' of ↵Paolo Bonzini2019-02-1312-143/+158
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/ARM fixes for 5.0: - Fix the way we reset vcpus, plugging the race that could happen on VHE - Fix potentially inconsistent group setting for private interrupts - Don't generate UNDEF when LORegion feature is present - Relax the restriction on using stage2 PUD huge mapping - Turn some spinlocks into raw_spinlocks to help RT compliance
| * | KVM: arm64: Relax the restriction on using stage2 PUD huge mappingSuzuki K Poulose2019-02-071-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We restrict mapping the PUD huge pages in stage2 to only when the stage2 has 4 level page table, leaving the feature unused with the default IPA size. But we could use it even with a 3 level page table, i.e, when the PUD level is folded into PGD, just like the stage1. Relax the condition to allow using the PUD huge page mappings at stage2 when it is possible. Cc: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | KVM: arm/arm64: vgic: Always initialize the group of private IRQsChristoffer Dall2019-02-071-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently initialize the group of private IRQs during kvm_vgic_vcpu_init, and the value of the group depends on the GIC model we are emulating. However, CPUs created before creating (and initializing) the VGIC might end up with the wrong group if the VGIC is created as GICv3 later. Since we have no enforced ordering of creating the VGIC and creating VCPUs, we can end up with part the VCPUs being properly intialized and the remaining incorrectly initialized. That also means that we have no single place to do the per-cpu data structure initialization which depends on knowing the emulated GIC model (which is only the group field). This patch removes the incorrect comment from kvm_vgic_vcpu_init and initializes the group of all previously created VCPUs's private interrupts in vgic_init in addition to the existing initialization in kvm_vgic_vcpu_init. Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
| * | arm/arm64: KVM: Allow a VCPU to fully reset itselfMarc Zyngier2019-02-072-20/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current kvm_psci_vcpu_on implementation will directly try to manipulate the state of the VCPU to reset it. However, since this is not done on the thread that runs the VCPU, we can end up in a strangely corrupted state when the source and target VCPUs are running at the same time. Fix this by factoring out all reset logic from the PSCI implementation and forwarding the required information along with a request to the target VCPU. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
| * | KVM: arm/arm64: vgic: Make vgic_cpu->ap_list_lock a raw_spinlockJulien Thierry2019-01-242-19/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vgic_cpu->ap_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: Julien Thierry <julien.thierry@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>