summaryrefslogtreecommitdiffstats
path: root/virt (follow)
Commit message (Collapse)AuthorAgeFilesLines
* KVM: Improve create VCPU parameter (CVE-2013-4587)Andy Honig2013-12-121-0/+3
| | | | | | | | | | | | | | In multiple functions the vcpu_id is used as an offset into a bitfield. Ag malicious user could specify a vcpu_id greater than 255 in order to set or clear bits in kernel memory. This could be used to elevate priveges in the kernel. This patch verifies that the vcpu_id provided is less than 255. The api documentation already specifies that the vcpu_id must be less than max_vcpus, but this is currently not checked. Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: kvm_clear_guest_page(): fix empty_zero_page usageHeiko Carstens2013-11-211-2/+3
| | | | | | | | | | | | | | Using the address of 'empty_zero_page' as source address in order to clear a page is wrong. On some architectures empty_zero_page is only the pointer to the struct page of the empty_zero_page. Therefore the clear page operation would copy the contents of a couple of struct pages instead of clearing a page. For kvm only arm/arm64 are affected by this bug. To fix this use the ZERO_PAGE macro instead which will return the struct page address of the empty_zero_page on all architectures. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-11-155-113/+348
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM changes from Paolo Bonzini: "Here are the 3.13 KVM changes. There was a lot of work on the PPC side: the HV and emulation flavors can now coexist in a single kernel is probably the most interesting change from a user point of view. On the x86 side there are nested virtualization improvements and a few bugfixes. ARM got transparent huge page support, improved overcommit, and support for big endian guests. Finally, there is a new interface to connect KVM with VFIO. This helps with devices that use NoSnoop PCI transactions, letting the driver in the guest execute WBINVD instructions. This includes some nVidia cards on Windows, that fail to start without these patches and the corresponding userspace changes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) kvm, vmx: Fix lazy FPU on nested guest arm/arm64: KVM: PSCI: propagate caller endianness to the incoming vcpu arm/arm64: KVM: MMIO support for BE guest kvm, cpuid: Fix sparse warning kvm: Delete prototype for non-existent function kvm_check_iopl kvm: Delete prototype for non-existent function complete_pio hung_task: add method to reset detector pvclock: detect watchdog reset at pvclock read kvm: optimize out smp_mb after srcu_read_unlock srcu: API for barrier after srcu read unlock KVM: remove vm mmap method KVM: IOMMU: hva align mapping page size KVM: x86: trace cpuid emulation when called from emulator KVM: emulator: cleanup decode_register_operand() a bit KVM: emulator: check rex prefix inside decode_register() KVM: x86: fix emulation of "movzbl %bpl, %eax" kvm_host: typo fix KVM: x86: emulate SAHF instruction MAINTAINERS: add tree for kvm.git Documentation/kvm: add a 00-INDEX file ...
| * KVM: remove vm mmap methodGleb Natapov2013-11-061-32/+0
| | | | | | | | | | | | | | | | It was used in conjunction with KVM_SET_MEMORY_REGION ioctl which was removed by b74a07beed0 in 2010, QEMU stopped using it in 2008, so it is time to remove the code finally. Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * KVM: IOMMU: hva align mapping page sizeGreg Edwards2013-11-051-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When determining the page size we could use to map with the IOMMU, the page size should also be aligned with the hva, not just the gfn. The gfn may not reflect the real alignment within the hugetlbfs file. Most of the time, this works fine. However, if the hugetlbfs file is backed by non-contiguous huge pages, a multi-huge page memslot starts at an unaligned offset within the hugetlbfs file, and the gfn is aligned with respect to the huge page size, kvm_host_page_size() will return the huge page size and we will use that to map with the IOMMU. When we later unpin that same memslot, the IOMMU returns the unmap size as the huge page size, and we happily unpin that many pfns in monotonically increasing order, not realizing we are spanning non-contiguous huge pages and partially unpin the wrong huge page. Ensure the IOMMU mapping page size is aligned with the hva corresponding to the gfn, which does reflect the alignment within the hugetlbfs file. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Edwards <gedwards@ddn.com> Cc: stable@vger.kernel.org Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * Merge branch 'kvm-ppc-queue' of git://github.com/agraf/linux-2.6 into queueGleb Natapov2013-11-041-6/+10
| |\ | | | | | | | | | | | | Conflicts: arch/powerpc/include/asm/processor.h
| | * kvm: Add struct kvm arg to memslot APIsAneesh Kumar K.V2013-10-171-6/+6
| | | | | | | | | | | | | | | | | | | | | We will use that in the later patch to find the kvm ops handler Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| | * kvm: powerpc: book3s: Support building HV and PR KVM as moduleAneesh Kumar K.V2013-10-171-0/+4
| | | | | | | | | | | | | | | | | | Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [agraf: squash in compile fix] Signed-off-by: Alexander Graf <agraf@suse.de>
| * | kvm: Create non-coherent DMA registerationAlex Williamson2013-10-302-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently use some ad-hoc arch variables tied to legacy KVM device assignment to manage emulation of instructions that depend on whether non-coherent DMA is present. Create an interface for this, adapting legacy KVM device assignment and adding VFIO via the KVM-VFIO device. For now we assume that non-coherent DMA is possible any time we have a VFIO group. Eventually an interface can be developed as part of the VFIO external user interface to query the coherency of a group. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | kvm/x86: Convert iommu_flags to iommu_noncoherentAlex Williamson2013-10-301-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | Default to operating in coherent mode. This simplifies the logic when we switch to a model of registering and unregistering noncoherent I/O with KVM. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | kvm: Add VFIO deviceAlex Williamson2013-10-303-0/+228
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far we've succeeded at making KVM and VFIO mostly unaware of each other, but areas are cropping up where a connection beyond eventfds and irqfds needs to be made. This patch introduces a KVM-VFIO device that is meant to be a gateway for such interaction. The user creates the device and can add and remove VFIO groups to it via file descriptors. When a group is added, KVM verifies the group is valid and gets a reference to it via the VFIO external user interface. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: Mapping IOMMU pages after updating memslotYang Zhang2013-10-281-15/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In kvm_iommu_map_pages(), we need to know the page size via call kvm_host_page_size(). And it will check whether the target slot is valid before return the right page size. Currently, we will map the iommu pages when creating a new slot. But we call kvm_iommu_map_pages() during preparing the new slot. At that time, the new slot is not visible by domain(still in preparing). So we cannot get the right page size from kvm_host_page_size() and this will break the IOMMU super page logic. The solution is to map the iommu pages after we insert the new slot into domain. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Tested-by: Patrick Lu <patrick.lu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | Powerpc KVM work is based on a commit after rc4.Gleb Natapov2013-10-171-2/+4
| |\ \ | | |/ | |/| | | | | | | | | | | | | Merging master into next to satisfy the dependencies. Conflicts: arch/arm/kvm/reset.c
| * | KVM: Drop FOLL_GET in GUP when doing async page faultchai wen2013-10-151-12/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Page pinning is not mandatory in kvm async page fault processing since after async page fault event is delivered to a guest it accesses page once again and does its own GUP. Drop the FOLL_GET flag in GUP in async_pf code, and do some simplifying in check/clear processing. Suggested-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Gu zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: chai wen <chaiw.fnst@cn.fujitsu.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * | virt/kvm/iommu.c: Add leading zeros to device's BDF notation in debug messagesAndre Richter2013-10-031-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When KVM (de)assigns PCI(e) devices to VMs, a debug message is printed including the BDF notation of the respective device. Currently, the BDF notation does not have the commonly used leading zeros. This produces messages like "assign device 0:1:8.0", which look strange at first sight. The patch fixes this by exchanging the printk(KERN_DEBUG ...) with dev_info() and also inserts "kvm" into the debug message, so that it is obvious where the message comes from. Also reduces LoC. Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Andre Richter <andre.o.richter@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * | KVM: Convert kvm_lock back to non-raw spinlockPaolo Bonzini2013-09-301-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit e935b8372cf8 ("KVM: Convert kvm_lock to raw_spinlock"), the kvm_lock was made a raw lock. However, the kvm mmu_shrink() function tries to grab the (non-raw) mmu_lock within the scope of the raw locked kvm_lock being held. This leads to the following: BUG: sleeping function called from invalid context at kernel/rtmutex.c:659 in_atomic(): 1, irqs_disabled(): 0, pid: 55, name: kswapd0 Preemption disabled at:[<ffffffffa0376eac>] mmu_shrink+0x5c/0x1b0 [kvm] Pid: 55, comm: kswapd0 Not tainted 3.4.34_preempt-rt Call Trace: [<ffffffff8106f2ad>] __might_sleep+0xfd/0x160 [<ffffffff817d8d64>] rt_spin_lock+0x24/0x50 [<ffffffffa0376f3c>] mmu_shrink+0xec/0x1b0 [kvm] [<ffffffff8111455d>] shrink_slab+0x17d/0x3a0 [<ffffffff81151f00>] ? mem_cgroup_iter+0x130/0x260 [<ffffffff8111824a>] balance_pgdat+0x54a/0x730 [<ffffffff8111fe47>] ? set_pgdat_percpu_threshold+0xa7/0xd0 [<ffffffff811185bf>] kswapd+0x18f/0x490 [<ffffffff81070961>] ? get_parent_ip+0x11/0x50 [<ffffffff81061970>] ? __init_waitqueue_head+0x50/0x50 [<ffffffff81118430>] ? balance_pgdat+0x730/0x730 [<ffffffff81060d2b>] kthread+0xdb/0xe0 [<ffffffff8106e122>] ? finish_task_switch+0x52/0x100 [<ffffffff817e1e94>] kernel_thread_helper+0x4/0x10 [<ffffffff81060c50>] ? __init_kthread_worker+0x After the previous patch, kvm_lock need not be a raw spinlock anymore, so change it back. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: kvm@vger.kernel.org Cc: gleb@redhat.com Cc: jan.kiszka@siemens.com Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: protect kvm_usage_count with its own spinlockPaolo Bonzini2013-09-301-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VM list need not be protected by a raw spinlock. Separate the two so that kvm_lock can be made non-raw. Cc: kvm@vger.kernel.org Cc: gleb@redhat.com Cc: jan.kiszka@siemens.com Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: cleanup (physical) CPU hotplugPaolo Bonzini2013-09-301-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the useless argument, and do not do anything if there are no VMs running at the time of the hotplug. Cc: kvm@vger.kernel.org Cc: gleb@redhat.com Cc: jan.kiszka@siemens.com Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | kvm: remove .done from struct kvm_async_pfRadim Krčmář2013-09-241-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | '.done' is used to mark the completion of 'async_pf_execute()', but 'cancel_work_sync()' returns true when the work was canceled, so we use it instead. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: use a more sensible error number when debugfs directory creation failsPaolo Bonzini2013-10-301-1/+1
| |/ |/| | | | | | | | | | | | | | | | | | | I don't know if this was due to cut and paste, or somebody was really using a D20 to pick the error code for kvm_init_debugfs as suggested by Linus (EFAULT is 14, so the possibility cannot be entirely ruled out). In any case, this patch fixes it. Reported-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Fix NULL dereference in gfn_to_hva_prot()Gleb Natapov2013-10-031-2/+4
|/ | | | | | | | gfn_to_memslot() can return NULL or invalid slot. We need to check slot validity before accessing it. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* kvm: free resources after canceling async_pfRadim Krčmář2013-09-171-1/+4
| | | | | | | | | | | When we cancel 'async_pf_execute()', we should behave as if the work was never scheduled in 'kvm_setup_async_pf()'. Fixes a bug when we can't unload module because the vm wasn't destroyed. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: mmu: allow page tables to be in read-only slotsPaolo Bonzini2013-09-171-5/+9
| | | | | | | | | | | | | Page tables in a read-only memory slot will currently cause a triple fault because the page walker uses gfn_to_hva and it fails on such a slot. OVMF uses such a page table; however, real hardware seems to be fine with that as long as the accessed/dirty bits are set. Save whether the slot is readonly, and later check it when updating the accessed and dirty bits. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-09-051-10/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile 1 from Al Viro: "Unfortunately, this merge window it'll have a be a lot of small piles - my fault, actually, for not keeping #for-next in anything that would resemble a sane shape ;-/ This pile: assorted fixes (the first 3 are -stable fodder, IMO) and cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last components) + several long-standing patches from various folks. There definitely will be a lot more (starting with Miklos' check_submount_and_drop() series)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) direct-io: Handle O_(D)SYNC AIO direct-io: Implement generic deferred AIO completions add formats for dentry/file pathnames kvm eventfd: switch to fdget powerpc kvm: use fdget switch fchmod() to fdget switch epoll_ctl() to fdget switch copy_module_from_fd() to fdget git simplify nilfs check for busy subtree ibmasmfs: don't bother passing superblock when not needed don't pass superblock to hypfs_{mkdir,create*} don't pass superblock to hypfs_diag_create_files don't pass superblock to hypfs_vm_create_files() oprofile: get rid of pointless forward declarations of struct super_block oprofilefs_create_...() do not need superblock argument oprofilefs_mkdir() doesn't need superblock argument don't bother with passing superblock to oprofile_create_stats_files() oprofile: don't bother with passing superblock to ->create_files() don't bother passing sb to oprofile_create_files() coh901318: don't open-code simple_read_from_buffer() ...
| * kvm eventfd: switch to fdgetAl Viro2013-09-041-10/+10
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regsChristoffer Dall2013-08-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | For bytemaps each IRQ field is 1 byte wide, so we pack 4 irq fields in one word and since there are 32 private (per cpu) irqs, we have 8 private u32 fields on the vgic_bytemap struct. We shift the offset from the base of the register group right by 2, giving us the word index instead of the field index. But then there are 8 private words, not 4, which is also why we subtract 8 words from the offset of the shared words. Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | ARM: KVM: vgic: fix GICD_ICFGRn accessMarc Zyngier2013-08-301-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | All the code in handle_mmio_cfg_reg() assumes the offset has been shifted right to accomodate for the 2:1 bit compression, but this is only done when getting the register address. Shift the offset early so the code works mostly unchanged. Reported-by: Zhaobo (Bob, ERC) <zhaobo@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | ARM: KVM: vgic: simplify vgic_get_target_regMarc Zyngier2013-08-301-9/+3
| | | | | | | | | | | | | | | | | | vgic_get_target_reg is quite complicated, for no good reason. Actually, it is fairly easy to write it in a much more efficient way by using the target CPU array instead of the bitmap. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | KVM: rename __kvm_io_bus_sort_cmp to kvm_io_bus_cmpPaolo Bonzini2013-08-281-8/+8
| | | | | | | | | | | | | | | | This is the type-safe comparison function, so the double-underscore is not related. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | kvm: optimize away THP checks in kvm_is_mmio_pfn()Andrea Arcangeli2013-08-271-22/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The checks on PG_reserved in the page structure on head and tail pages aren't necessary because split_huge_page wouldn't transfer the PG_reserved bit from head to tail anyway. This was a forward-thinking check done in the case PageReserved was set by a driver-owned page mapped in userland with something like remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not possible right now). It was meant to be very safe, but it's overkill as it's unlikely split_huge_page could ever run without the driver noticing and tearing down the hugepage itself. And if a driver in the future will really want to map a reserved hugepage in userland using an huge pmd it should simply take care of marking all subpages reserved too to keep KVM safe. This of course would require such a hypothetical driver to tear down the huge pmd itself and splitting the hugepage itself, instead of relaying on split_huge_page, but that sounds very reasonable, especially considering split_huge_page wouldn't currently transfer the reserved bit anyway. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | kvm: use anon_inode_getfd() with O_CLOEXEC flagYann Droneaud2013-08-261-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM uses anon_inode_get() to allocate file descriptors as part of some of its ioctls. But those ioctls are lacking a flag argument allowing userspace to choose options for the newly opened file descriptor. In such case it's advised to use O_CLOEXEC by default so that userspace is allowed to choose, without race, if the file descriptor is going to be inherited across exec(). This patch set O_CLOEXEC flag on all file descriptors created with anon_inode_getfd() to not leak file descriptors across exec(). Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Link: http://lkml.kernel.org/r/cover.1377372576.git.ydroneaud@opteya.com Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | KVM: introduce __kvm_io_bus_sort_cmpPaolo Bonzini2013-07-291-9/+12
| | | | | | | | | | | | | | | | | | kvm_io_bus_sort_cmp is used also directly, not just as a callback for sort and bsearch. In these cases, it is handy to have a type-safe variant. This patch introduces such a variant, __kvm_io_bus_sort_cmp, and uses it throughout kvm_main.c. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: Introduce kvm_arch_memslots_updated()Takuya Yoshikawa2013-07-181-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is called right after the memslots is updated, i.e. when the result of update_memslots() gets installed in install_new_memslots(). Since the memslots needs to be updated twice when we delete or move a memslot, kvm_arch_commit_memory_region() does not correspond to this exactly. In the following patch, x86 will use this new API to check if the mmio generation has reached its maximum value, in which case mmio sptes need to be flushed out. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Acked-by: Alexander Graf <agraf@suse.de> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: kvm-io: support cookiesCornelia Huck2013-07-181-16/+92
|/ | | | | | | | | | Add new functions kvm_io_bus_{read,write}_cookie() that allows users of the kvm io infrastructure to use a cookie value to speed up lookup of a device on an io bus. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-07-034-1/+1788
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Paolo Bonzini: "On the x86 side, there are some optimizations and documentation updates. The big ARM/KVM change for 3.11, support for AArch64, will come through Catalin Marinas's tree. s390 and PPC have misc cleanups and bugfixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (87 commits) KVM: PPC: Ignore PIR writes KVM: PPC: Book3S PR: Invalidate SLB entries properly KVM: PPC: Book3S PR: Allow guest to use 1TB segments KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry KVM: PPC: Book3S PR: Fix proto-VSID calculations KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL KVM: Fix RTC interrupt coalescing tracking kvm: Add a tracepoint write_tsc_offset KVM: MMU: Inform users of mmio generation wraparound KVM: MMU: document fast invalidate all mmio sptes KVM: MMU: document fast invalidate all pages KVM: MMU: document fast page fault KVM: MMU: document mmio page fault KVM: MMU: document write_flooding_count KVM: MMU: document clear_spte_count KVM: MMU: drop kvm_mmu_zap_mmio_sptes KVM: MMU: init kvm generation close to mmio wrap-around value KVM: MMU: add tracepoint for check_mmio_spte KVM: MMU: fast invalidate all mmio sptes ...
| * Merge git://git.linaro.org/people/cdall/linux-kvm-arm.git tags/kvm-arm-3.11 ↵Gleb Natapov2013-06-271-9/+20
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into queue KVM/ARM pull request for 3.11 merge window * tag 'kvm-arm-3.11' of git://git.linaro.org/people/cdall/linux-kvm-arm.git: ARM: kvm: don't include drivers/virtio/Kconfig Update MAINTAINERS: KVM/ARM work now funded by Linaro arm/kvm: Cleanup KVM_ARM_MAX_VCPUS logic ARM: KVM: clear exclusive monitor on all exception returns ARM: KVM: add missing dsb before invalidating Stage-2 TLBs ARM: KVM: perform save/restore of PAR ARM: KVM: get rid of S2_PGD_SIZE ARM: KVM: don't special case PC when doing an MMIO ARM: KVM: use phys_addr_t instead of unsigned long long for HYP PGDs ARM: KVM: remove dead prototype for __kvm_tlb_flush_vmid ARM: KVM: Don't handle PSCI calls via SMC ARM: KVM: Allow host virt timer irq to be different from guest timer virt irq
| | * ARM: KVM: Allow host virt timer irq to be different from guest timer virt irqAnup Patel2013-06-261-9/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The arch_timer irq numbers (or PPI numbers) are implementation dependent, so the host virtual timer irq number can be different from guest virtual timer irq number. This patch ensures that host virtual timer irq number is read from DTB and guest virtual timer irq is determined based on vcpu target type. Signed-off-by: Anup Patel <anup.patel@linaro.org> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
| * | KVM: Fix RTC interrupt coalescing trackingGleb Natapov2013-06-271-6/+3
| |/ | | | | | | | | | | | | | | | | | | This reverts most of the f1ed0450a5fac7067590317cbf027f566b6ccbca. After the commit kvm_apic_set_irq() no longer returns accurate information about interrupt injection status if injection is done into disabled APIC. RTC interrupt coalescing tracking relies on the information to be accurate and cannot recover if it is not. Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * kvm: exclude ioeventfd from counting kvm_io_range limitAmos Kong2013-06-042-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can easily reach the 1000 limit by start VM with a couple hundred I/O devices (multifunction=on). The hardcode limit already been adjusted 3 times (6 ~ 200 ~ 300 ~ 1000). In userspace, we already have maximum file descriptor to limit ioeventfd count. But kvm_io_bus devices also are used for pit, pic, ioapic, coalesced_mmio. They couldn't be limited by maximum file descriptor. Currently only ioeventfds take too much kvm_io_bus devices, so just exclude it from counting kvm_io_range limit. Also fixed one indent issue in kvm_host.h Signed-off-by: Amos Kong <akong@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * ARM: KVM: move GIC/timer code to a common locationMarc Zyngier2013-05-192-0/+1771
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As KVM/arm64 is looming on the horizon, it makes sense to move some of the common code to a single location in order to reduce duplication. The code could live anywhere. Actually, most of KVM is already built with a bunch of ugly ../../.. hacks in the various Makefiles, so we're not exactly talking about style here. But maybe it is time to start moving into a less ugly direction. The include files must be in a "public" location, as they are accessed from non-KVM files (arch/arm/kernel/asm-offsets.c). For this purpose, introduce two new locations: - virt/kvm/arm/ : x86 and ia64 already share the ioapic code in virt/kvm, so this could be seen as a (very ugly) precedent. - include/kvm/ : there is already an include/xen, and while the intent is slightly different, this seems as good a location as any Eventually, we should probably have independant Makefiles at every levels (just like everywhere else in the kernel), but this is just the first step. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * KVM: x86: Remove support for reporting coalesced APIC IRQsJan Kiszka2013-05-141-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | Since the arrival of posted interrupt support we can no longer guarantee that coalesced IRQs are always reported to the IRQ source. Moreover, accumulated APIC timer events could cause a busy loop when a VCPU should rather be halted. The consensus is to remove coalesced tracking from the LAPIC. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * KVM: add missing misc_deregister() on error in kvm_init()Wei Yongjun2013-05-121-0/+1
|/ | | | | | | | Add the missing misc_deregister() before return from kvm_init() in the debugfs init error handling case. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* Merge tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-05-101-5/+13
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fixes from Gleb Natapov: "Most of the fixes are in the emulator since now we emulate more than we did before for correctness sake we see more bugs there, but there is also an OOPS fixed and corruption of xcr0 register." * tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: emulator: emulate SALC KVM: emulator: emulate XLAT KVM: emulator: emulate AAM KVM: VMX: fix halt emulation while emulating invalid guest sate KVM: Fix kvm_irqfd_init initialization KVM: x86: fix maintenance of guest/host xcr0 state
| * KVM: Fix kvm_irqfd_init initializationAsias He2013-05-081-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit a0f155e96 'KVM: Initialize irqfd from kvm_init()', when kvm_init() is called the second time (e.g kvm-amd.ko and kvm-intel.ko), kvm_arch_init() will fail with -EEXIST, then kvm_irqfd_exit() will be called on the error handling path. This way, the kvm_irqfd system will not be ready. This patch fix the following: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30 PGD 0 Oops: 0002 [#1] SMP Modules linked in: vhost_net CPU 6 Pid: 4257, comm: qemu-system-x86 Not tainted 3.9.0-rc3+ #757 Dell Inc. OptiPlex 790/0V5HMK RIP: 0010:[<ffffffff81c0721e>] [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30 RSP: 0018:ffff880221721cc8 EFLAGS: 00010046 RAX: 0000000000000100 RBX: ffff88022dcc003f RCX: ffff880221734950 RDX: ffff8802208f6ca8 RSI: 000000007fffffff RDI: 0000000000000000 RBP: ffff880221721cc8 R08: 0000000000000002 R09: 0000000000000002 R10: 00007f7fd01087e0 R11: 0000000000000246 R12: ffff8802208f6ca8 R13: 0000000000000080 R14: ffff880223e2a900 R15: 0000000000000000 FS: 00007f7fd38488e0(0000) GS:ffff88022dcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000022309f000 CR4: 00000000000427e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process qemu-system-x86 (pid: 4257, threadinfo ffff880221720000, task ffff880222bd5640) Stack: ffff880221721d08 ffffffff810ac5c5 ffff88022431dc00 0000000000000086 0000000000000080 ffff880223e2a900 ffff8802208f6ca8 0000000000000000 ffff880221721d48 ffffffff810ac8fe 0000000000000000 ffff880221734000 Call Trace: [<ffffffff810ac5c5>] __queue_work+0x45/0x2d0 [<ffffffff810ac8fe>] queue_work_on+0x8e/0xa0 [<ffffffff810ac949>] queue_work+0x19/0x20 [<ffffffff81009b6b>] irqfd_deactivate+0x4b/0x60 [<ffffffff8100a69d>] kvm_irqfd+0x39d/0x580 [<ffffffff81007a27>] kvm_vm_ioctl+0x207/0x5b0 [<ffffffff810c9545>] ? update_curr+0xf5/0x180 [<ffffffff811b66e8>] do_vfs_ioctl+0x98/0x550 [<ffffffff810c1f5e>] ? finish_task_switch+0x4e/0xe0 [<ffffffff81c054aa>] ? __schedule+0x2ea/0x710 [<ffffffff811b6bf7>] sys_ioctl+0x57/0x90 [<ffffffff8140ae9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c [<ffffffff81c0f602>] system_call_fastpath+0x16/0x1b Code: c1 ea 08 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b c9 c3 55 48 89 e5 66 66 66 66 90 b8 00 01 00 00 <f0> 66 0f c1 07 89 c2 66 c1 ea 08 38 c2 74 0c 0f 1f 00 f3 90 0f RIP [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30 RSP <ffff880221721cc8> CR2: 0000000000000000 ---[ end trace 13fb1e4b6e5ab21f ]--- Signed-off-by: Asias He <asias@redhat.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linusLinus Torvalds2013-05-101-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull MIPS updates from Ralf Baechle: - More work on DT support for various platforms - Various fixes that were to late to make it straight into 3.9 - Improved platform support, in particular the Netlogic XLR and BCM63xx, and the SEAD3 and Malta eval boards. - Support for several Ralink SOC families. - Complete support for the microMIPS ASE which basically reencodes the existing MIPS32/MIPS64 ISA to use non-constant size instructions. - Some fallout from LTO work which remove old cruft and will generally make the MIPS kernel easier to maintain and resistant to compiler optimization, even in absence of LTO. - KVM support. While MIPS has announced hardware virtualization extensions this KVM extension uses trap and emulate mode for virtualization of MIPS32. More KVM work to add support for VZ hardware virtualizaiton extensions and MIPS64 will probably already be merged for 3.11. Most of this has been sitting in -next for a long time. All defconfigs have been build or run time tested except three for which fixes are being sent by other maintainers. Semantic conflict with kvm updates done as per Ralf * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (118 commits) MIPS: Add new GIC clockevent driver. MIPS: Formatting clean-ups for clocksources. MIPS: Refactor GIC clocksource code. MIPS: Move 'gic_frequency' to common location. MIPS: Move 'gic_present' to common location. MIPS: MIPS16e: Add unaligned access support. MIPS: MIPS16e: Support handling of delay slots. MIPS: MIPS16e: Add instruction formats. MIPS: microMIPS: Optimise 'strnlen' core library function. MIPS: microMIPS: Optimise 'strlen' core library function. MIPS: microMIPS: Optimise 'strncpy' core library function. MIPS: microMIPS: Optimise 'memset' core library function. MIPS: microMIPS: Add configuration option for microMIPS kernel. MIPS: microMIPS: Disable LL/SC and fix linker bug. MIPS: microMIPS: Add vdso support. MIPS: microMIPS: Add unaligned access support. MIPS: microMIPS: Support handling of delay slots. MIPS: microMIPS: Add support for exception handling. MIPS: microMIPS: Floating point support. MIPS: microMIPS: Fix macro naming in micro-assembler. ...
| * Merge branch 'next/kvm' into mips-for-linux-nextRalf Baechle2013-05-091-1/+1
| |\
| | * KVM/MIPS32: Do not call vcpu_load when injecting interrupts.Sanjay Lal2013-05-091-1/+1
| | | | | | | | | | | | | | | | | | | | | Signed-off-by: Sanjay Lal <sanjayl@kymasys.com> Cc: kvm@vger.kernel.org Cc: linux-mips@linux-mips.org Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* | | Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-05-058-340/+659
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm updates from Gleb Natapov: "Highlights of the updates are: general: - new emulated device API - legacy device assignment is now optional - irqfd interface is more generic and can be shared between arches x86: - VMCS shadow support and other nested VMX improvements - APIC virtualization and Posted Interrupt hardware support - Optimize mmio spte zapping ppc: - BookE: in-kernel MPIC emulation with irqfd support - Book3S: in-kernel XICS emulation (incomplete) - Book3S: HV: migration fixes - BookE: more debug support preparation - BookE: e6500 support ARM: - reworking of Hyp idmaps s390: - ioeventfd for virtio-ccw And many other bug fixes, cleanups and improvements" * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) kvm: Add compat_ioctl for device control API KVM: x86: Account for failing enable_irq_window for NMI window request KVM: PPC: Book3S: Add API for in-kernel XICS emulation kvm/ppc/mpic: fix missing unlock in set_base_addr() kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write kvm/ppc/mpic: remove users kvm/ppc/mpic: fix mmio region lists when multiple guests used kvm/ppc/mpic: remove default routes from documentation kvm: KVM_CAP_IOMMU only available with device assignment ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ...
| * | kvm: Add compat_ioctl for device control APIScott Wood2013-05-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | This API shouldn't have 32/64-bit issues, but VFS assumes it does unless told otherwise. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * | KVM: PPC: Book3S: Add API for in-kernel XICS emulationPaul Mackerras2013-05-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the API for userspace to instantiate an XICS device in a VM and connect VCPUs to it. The API consists of a new device type for the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl, which is used to assert and deassert interrupt inputs of the XICS. The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES. Each attribute within this group corresponds to the state of one interrupt source. The attribute number is the same as the interrupt source number. This does not support irq routing or irqfd yet. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>