summaryrefslogtreecommitdiffstats
path: root/virt (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-05-058-340/+659
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm updates from Gleb Natapov: "Highlights of the updates are: general: - new emulated device API - legacy device assignment is now optional - irqfd interface is more generic and can be shared between arches x86: - VMCS shadow support and other nested VMX improvements - APIC virtualization and Posted Interrupt hardware support - Optimize mmio spte zapping ppc: - BookE: in-kernel MPIC emulation with irqfd support - Book3S: in-kernel XICS emulation (incomplete) - Book3S: HV: migration fixes - BookE: more debug support preparation - BookE: e6500 support ARM: - reworking of Hyp idmaps s390: - ioeventfd for virtio-ccw And many other bug fixes, cleanups and improvements" * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) kvm: Add compat_ioctl for device control API KVM: x86: Account for failing enable_irq_window for NMI window request KVM: PPC: Book3S: Add API for in-kernel XICS emulation kvm/ppc/mpic: fix missing unlock in set_base_addr() kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write kvm/ppc/mpic: remove users kvm/ppc/mpic: fix mmio region lists when multiple guests used kvm/ppc/mpic: remove default routes from documentation kvm: KVM_CAP_IOMMU only available with device assignment ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ...
| * kvm: Add compat_ioctl for device control APIScott Wood2013-05-051-0/+3
| | | | | | | | | | | | | | | | This API shouldn't have 32/64-bit issues, but VFS assumes it does unless told otherwise. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * KVM: PPC: Book3S: Add API for in-kernel XICS emulationPaul Mackerras2013-05-021-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the API for userspace to instantiate an XICS device in a VM and connect VCPUs to it. The API consists of a new device type for the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl, which is used to assert and deassert interrupt inputs of the XICS. The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES. Each attribute within this group corresponds to the state of one interrupt source. The attribute number is the same as the interrupt source number. This does not support irq routing or irqfd yet. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexander Graf <agraf@suse.de>
| * kvm: destroy emulated devices on VM exitScott Wood2013-04-261-13/+16
| | | | | | | | | | | | | | | | The hassle of getting refcounting right was greater than the hassle of keeping a list of devices to destroy on VM exit. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| * kvm/ppc/mpic: in-kernel MPIC emulationScott Wood2013-04-261-0/+6
| | | | | | | | | | | | | | | | Hook the MPIC code up to the KVM interfaces, add locking, etc. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: add stub function for kvmppc_mpic_set_epr, non-booke, 64bit] Signed-off-by: Alexander Graf <agraf@suse.de>
| * kvm: add device control APIScott Wood2013-04-261-0/+129
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, devices that are emulated inside KVM are configured in a hardcoded manner based on an assumption that any given architecture only has one way to do it. If there's any need to access device state, it is done through inflexible one-purpose-only IOCTLs (e.g. KVM_GET/SET_LAPIC). Defining new IOCTLs for every little thing is cumbersome and depletes a limited numberspace. This API provides a mechanism to instantiate a device of a certain type, returning an ID that can be used to set/get attributes of the device. Attributes may include configuration parameters (e.g. register base address), device state, operational commands, etc. It is similar to the ONE_REG API, except that it acts on devices rather than vcpus. Both device types and individual attributes can be tested without having to create the device or get/set the attribute, without the need for separately managing enumerated capabilities. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
| * KVM: Move irqfd resample cap handling to generic codeAlexander Graf2013-04-261-0/+3
| | | | | | | | | | | | | | | | Now that we have most irqfd code completely platform agnostic, let's move irqfd's resample capability return to generic code as well. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
| * KVM: Move irq routing setup to irqchip.cAlexander Graf2013-04-262-73/+88
| | | | | | | | | | | | | | | | | | Setting up IRQ routes is nothing IOAPIC specific. Extract everything that really is generic code into irqchip.c and only leave the ioapic specific bits to irq_comm.c. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
| * KVM: Extract generic irqchip logic into irqchip.cAlexander Graf2013-04-262-118/+152
| | | | | | | | | | | | | | | | | | | | | | The current irq_comm.c file contains pieces of code that are generic across different irqchip implementations, as well as code that is fully IOAPIC specific. Split the generic bits out into irqchip.c. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
| * KVM: Move irq routing to generic codeAlexander Graf2013-04-262-30/+30
| | | | | | | | | | | | | | | | | | The IRQ routing set ioctl lives in the hacky device assignment code inside of KVM today. This is definitely the wrong place for it. Move it to the much more natural kvm_main.c. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
| * KVM: Introduce CONFIG_HAVE_KVM_IRQ_ROUTINGAlexander Graf2013-04-263-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | Quite a bit of code in KVM has been conditionalized on availability of IOAPIC emulation. However, most of it is generically applicable to platforms that don't have an IOPIC, but a different type of irq chip. Make code that only relies on IRQ routing, not an APIC itself, on CONFIG_HAVE_KVM_IRQ_ROUTING, so that we can reuse it later. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
| * KVM: Add KVM_IRQCHIP_NUM_PINS in addition to KVM_IOAPIC_NUM_PINSAlexander Graf2013-04-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The concept of routing interrupt lines to an irqchip is nothing that is IOAPIC specific. Every irqchip has a maximum number of pins that can be linked to irq lines. So let's add a new define that allows us to reuse generic code for non-IOAPIC platforms. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Michael S. Tsirkin <mst@redhat.com>
| * KVM: VMX: Add the deliver posted interrupt algorithmYang Zhang2013-04-161-0/+1
| | | | | | | | | | | | | | | | | | Only deliver the posted interrupt when target vcpu is running and there is no previous interrupt pending in pir. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Set TMR when programming ioapic entryYang Zhang2013-04-162-4/+11
| | | | | | | | | | | | | | | | | | | | We already know the trigger mode of a given interrupt when programming the ioapice entry. So it's not necessary to set it in each interrupt delivery. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Call common update function when ioapic entry changed.Yang Zhang2013-04-164-17/+19
| | | | | | | | | | | | | | | | | | | | Both TMR and EOI exit bitmap need to be updated when ioapic changed or vcpu's id/ldr/dfr changed. So use common function instead eoi exit bitmap specific function. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Use eoi to track RTC interrupt delivery statusYang Zhang2013-04-161-1/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Current interrupt coalescing logci which only used by RTC has conflict with Posted Interrupt. This patch introduces a new mechinism to use eoi to track interrupt: When delivering an interrupt to vcpu, the pending_eoi set to number of vcpu that received the interrupt. And decrease it when each vcpu writing eoi. No subsequent RTC interrupt can deliver to vcpu until all vcpus write eoi. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Let ioapic know the irq line statusYang Zhang2013-04-166-29/+41
| | | | | | | | | | | | | | | | | | Userspace may deliver RTC interrupt without query the status. So we want to track RTC EOI for this case. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Force vmexit with virtual interrupt deliveryYang Zhang2013-04-161-1/+1
| | | | | | | | | | | | | | | | | | Need the EOI to track interrupt deliver status, so force vmexit on EOI for rtc interrupt when enabling virtual interrupt delivery. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Add reset/restore rtc_status supportYang Zhang2013-04-162-0/+59
| | | | | | | | | | | | | | | | restore rtc_status from migration or save/restore Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Return destination vcpu on interrupt injectionYang Zhang2013-04-163-8/+8
| | | | | | | | | | | | | | | | Add a new parameter to know vcpus who received the interrupt. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Introduce struct rtc_statusYang Zhang2013-04-161-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | rtc_status is used to track RTC interrupt delivery status. The pending_eoi will be increased by vcpu who received RTC interrupt and will be decreased when EOI to this interrupt. Also, we use dest_map to record the destination vcpu to avoid the case that vcpu who didn't get the RTC interupt, but issued EOI with same vector of RTC and descreased pending_eoi by mistake. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Add vcpu info to ioapic_update_eoi()Yang Zhang2013-04-162-7/+8
| | | | | | | | | | | | | | | | | | Add vcpu info to ioapic_update_eoi, so we can know which vcpu issued this EOI. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: Move kvm_spurious_fault to x86.cGeoff Levand2013-04-081-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | The routine kvm_spurious_fault() is an x86 specific routine, so move it from virt/kvm/kvm_main.c to arch/x86/kvm/x86.c. Fixes this sparse warning when building on arm64: virt/kvm/kvm_main.c:warning: symbol 'kvm_spurious_fault' was not declared. Should it be static? Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * KVM: Make local routines staticGeoff Levand2013-04-081-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | The routines get_user_page_nowait(), kvm_io_bus_sort_cmp(), kvm_io_bus_insert_dev() and kvm_io_bus_get_first_dev() are only referenced within kvm_main.c, so give them static linkage. Fixes sparse warnings like these: virt/kvm/kvm_main.c: warning: symbol 'get_user_page_nowait' was not declared. Should it be static? Signed-off-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * kvm: fix MMIO/PIO collision misdetectionMichael S. Tsirkin2013-04-071-2/+6
| | | | | | | | | | | | | | | | | | | | | | PIO and MMIO are separate address spaces, but ioeventfd registration code mistakenly detected two eventfds as duplicate if they use the same address, even if one is PIO and another one MMIO. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * KVM: Call kvm_apic_match_dest() to check destination vcpuYang Zhang2013-04-071-6/+3
| | | | | | | | | | | | | | | | | | For a given vcpu, kvm_apic_match_dest() will tell you whether the vcpu in the destination list quickly. Drop kvm_calculate_eoi_exitmap() and use kvm_apic_match_dest() instead. Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * Merge remote-tracking branch 'upstream/master' into queueMarcelo Tosatti2013-03-211-2/+5
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: From: Alexander Graf <agraf@suse.de> "Just recently this really important patch got pulled into Linus' tree for 3.9: commit 1674400aaee5b466c595a8fc310488263ce888c7 Author: Anton Blanchard <anton <at> samba.org> Date: Tue Mar 12 01:51:51 2013 +0000 Without that commit, I can not boot my G5, thus I can't run automated tests on it against my queue. Could you please merge kvm/next against linus/master, so that I can base my trees against that?" * upstream/master: (653 commits) PCI: Use ROM images from firmware only if no other ROM source available sparc: remove unused "config BITS" sparc: delete "if !ULTRA_HAS_POPULATION_COUNT" KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798) KVM: x86: Convert MSR_KVM_SYSTEM_TIME to use gfn_to_hva_cache functions (CVE-2013-1797) KVM: x86: fix for buffer overflow in handling of MSR_KVM_SYSTEM_TIME (CVE-2013-1796) arm64: Kconfig.debug: Remove unused CONFIG_DEBUG_ERRORS arm64: Do not select GENERIC_HARDIRQS_NO_DEPRECATED inet: limit length of fragment queue hash table bucket lists qeth: Fix scatter-gather regression qeth: Fix invalid router settings handling qeth: delay feature trace sgy-cts1000: Remove __dev* attributes KVM: x86: fix deadlock in clock-in-progress request handling KVM: allow host header to be included even for !CONFIG_KVM hwmon: (lm75) Fix tcn75 prefix hwmon: (lm75.h) Update header inclusion MAINTAINERS: Remove Mark M. Hoffman xfs: ensure we capture IO errors correctly xfs: fix xfs_iomap_eof_prealloc_initial_size type ... Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | kvm: Iterate over only vcpus that are preemptedRaghavendra K T2013-03-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This helps in filtering out the eligible candidates further and thus potentially helps in quickly allowing preempted lockholders to run. Note that if a vcpu was spinning during preemption we filter them by checking whether they are preempted due to pause loop exit. Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * | kvm: Record the preemption status of vcpus using preempt notifiersRaghavendra K T2013-03-111-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that we mark as preempted only when vcpu's task state was Running during preemption. Thanks Jiannan, Avi for preemption notifier ideas. Thanks Gleb, PeterZ for their precious suggestions. Thanks Srikar for an idea on avoiding rcu lock while checking task state that improved overcommit numbers. Reviewed-by: Chegu Vinod <chegu_vinod@hp.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
| * | KVM: ioeventfd for virtio-ccw devices.Cornelia Huck2013-03-051-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | Enhance KVM_IOEVENTFD with a new flag that allows to attach to virtio-ccw devices on s390 via the KVM_VIRTIO_CCW_NOTIFY_BUS. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | KVM: Initialize irqfd from kvm_init().Cornelia Huck2013-03-052-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, eventfd introduces module_init/module_exit functions to initialize/cleanup the irqfd workqueue. This only works, however, if no other module_init/module_exit functions are built into the same module. Let's just move the initialization and cleanup to kvm_init and kvm_exit. This way, it is also clearer where kvm startup may fail. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | KVM: set_memory_region: Refactor commit_memory_region()Takuya Yoshikawa2013-03-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes the parameter old a const pointer to the old memory slot and adds a new parameter named change to know the change being requested: the former is for removing extra copying and the latter is for cleaning up the code. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | KVM: set_memory_region: Refactor prepare_memory_region()Takuya Yoshikawa2013-03-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch drops the parameter old, a copy of the old memory slot, and adds a new parameter named change to know the change being requested. This not only cleans up the code but also removes extra copying of the memory slot structure. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | KVM: set_memory_region: Make kvm_mr_change available to arch codeTakuya Yoshikawa2013-03-051-18/+0
| | | | | | | | | | | | | | | | | | | | | This will be used for cleaning up prepare/commit_memory_region() later. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | KVM: set_memory_region: Drop user_alloc from set_memory_region()Takuya Yoshikawa2013-03-051-11/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | Except ia64's stale code, KVM_SET_MEMORY_REGION support, this is only used for sanity checks in __kvm_set_memory_region() which can easily be changed to use slot id instead. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | KVM: set_memory_region: Drop user_alloc from prepare/commit_memory_region()Takuya Yoshikawa2013-03-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | X86 does not use this any more. The remaining user, s390's !user_alloc check, can be simply removed since KVM_SET_MEMORY_REGION ioctl is no longer supported. Note: fixed powerpc's indentations with spaces to suppress checkpatch errors. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | | KVM: Allow cross page reads and writes from cached translations.Andrew Honig2013-04-071-10/+37
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | This patch adds support for kvm_gfn_to_hva_cache_init functions for reads and writes that will cross a page. If the range falls within the same memslot, then this will be a fast operation. If the range is split between two memslots, then the slower kvm_read_guest and kvm_write_guest are used. Tested: Test against kvm_clock unit tests. Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | KVM: Fix bounds checking in ioapic indirect register reads (CVE-2013-1798)Andy Honig2013-03-191-2/+5
|/ | | | | | | | | | | | | | | If the guest specifies a IOAPIC_REG_SELECT with an invalid value and follows that with a read of the IOAPIC_REG_WINDOW KVM does not properly validate that request. ioapic_read_indirect contains an ASSERT(redir_index < IOAPIC_NUM_PINS), but the ASSERT has no effect in non-debug builds. In recent kernels this allows a guest to cause a kernel oops by reading invalid memory. In older kernels (pre-3.3) this allows a guest to read from large ranges of host memory. Tested: tested against apic unit tests. Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* hlist: drop the node parameter from iteratorsSasha Levin2013-02-282-14/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* KVM: Remove user_alloc from struct kvm_memory_slotTakuya Yoshikawa2013-02-111-1/+0
| | | | | | | | | | | | | | | | | | | This field was needed to differentiate memory slots created by the new API, KVM_SET_USER_MEMORY_REGION, from those by the old equivalent, KVM_SET_MEMORY_REGION, whose support was dropped long before: commit b74a07beed0e64bfba413dcb70dd6749c57f43dc KVM: Remove kernel-allocated memory regions Although we also have private memory slots to which KVM allocates memory with vm_mmap(), !user_alloc slots in other words, the slot id should be enough for differentiating them. Note: corresponding function parameters will be removed later. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: set_memory_region: Disallow changing read-only attribute laterTakuya Yoshikawa2013-02-051-23/+12
| | | | | | | | | | | | | | | | | | As Xiao pointed out, there are a few problems with it: - kvm_arch_commit_memory_region() write protects the memory slot only for GET_DIRTY_LOG when modifying the flags. - FNAME(sync_page) uses the old spte value to set a new one without checking KVM_MEM_READONLY flag. Since we flush all shadow pages when creating a new slot, the simplest fix is to disallow such problematic flag changes: this is safe because no one is doing such things. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: set_memory_region: Identify the requested change explicitlyTakuya Yoshikawa2013-02-051-20/+44
| | | | | | | | | | | | | | | | | | | | | KVM_SET_USER_MEMORY_REGION forces __kvm_set_memory_region() to identify what kind of change is being requested by checking the arguments. The current code does this checking at various points in code and each condition being used there is not easy to understand at first glance. This patch consolidates these checks and introduces an enum to name the possible changes to clean up the code. Although this does not introduce any functional changes, there is one change which optimizes the code a bit: if we have nothing to change, the new code returns 0 immediately. Note that the return value for this case cannot be changed since QEMU relies on it: we noticed this when we changed it to -EINVAL and got a section mismatch error at the final stage of live migration. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* kvm: Handle yield_to failure return code for potential undercommit caseRaghavendra K T2013-01-291-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | yield_to returns -ESRCH, When source and target of yield_to run queue length is one. When we see three successive failures of yield_to we assume we are in potential undercommit case and abort from PLE handler. The assumption is backed by low probability of wrong decision for even worst case scenarios such as average runqueue length between 1 and 2. More detail on rationale behind using three tries: if p is the probability of finding rq length one on a particular cpu, and if we do n tries, then probability of exiting ple handler is: p^(n+1) [ because we would have come across one source with rq length 1 and n target cpu rqs with length 1 ] so num tries: probability of aborting ple handler (1.5x overcommit) 1 1/4 2 1/8 3 1/16 We can increase this probability with more tries, but the problem is the overhead. Also, If we have tried three times that means we would have iterated over 3 good eligible vcpus along with many non-eligible candidates. In worst case if we iterate all the vcpus, we reduce 1x performance and overcommit performance get hit. note that we do not update last boosted vcpu in failure cases. Thank Avi for raising question on aborting after first fail from yield_to. Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Tested-by: Chegu Vinod <chegu_vinod@hp.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* x86, apicv: add virtual interrupt delivery supportYang Zhang2013-01-294-0/+73
| | | | | | | | | | | | | | | | | | | Virtual interrupt delivery avoids KVM to inject vAPIC interrupts manually, which is fully taken care of by the hardware. This needs some special awareness into existing interrupr injection path: - for pending interrupt, instead of direct injection, we may need update architecture specific indicators before resuming to guest. - A pending interrupt, which is masked by ISR, should be also considered in above update action, since hardware will decide when to inject it at right time. Current has_interrupt and get_interrupt only returns a valid vector from injection p.o.v. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* kvm: Obey read-only mappings in iommuAlex Williamson2013-01-271-1/+3
| | | | | | | | | We've been ignoring read-only mappings and programming everything into the iommu as read-write. Fix this to only include the write access flag when read-only is not set. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* kvm: Force IOMMU remapping on memory slot read-only flag changesAlex Williamson2013-01-271-4/+24
| | | | | | | | | | | | Memory slot flags can be altered without changing other parameters of the slot. The read-only attribute is the only one the IOMMU cares about, so generate an un-map, re-map when this occurs. This also avoid unnecessarily re-mapping the slot when no IOMMU visible changes are made. Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: set_memory_region: Remove unnecessary variable memslotTakuya Yoshikawa2013-01-171-6/+5
| | | | | | | | | | One such variable, slot, is enough for holding a pointer temporarily. We also remove another local variable named slot, which is limited in a block, since it is confusing to have the same name in this function. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: set_memory_region: Don't check for overlaps unless we create or move a slotTakuya Yoshikawa2013-01-171-8/+10
| | | | | | | | | Don't need the check for deleting an existing slot or just modifiying the flags. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: set_memory_region: Don't jump to out_free unnecessarilyTakuya Yoshikawa2013-01-171-4/+3
| | | | | | | | | This makes the separation between the sanity checks and the rest of the code a bit clearer. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* KVM: Write protect the updated slot only when dirty logging is enabledTakuya Yoshikawa2013-01-141-1/+0
| | | | | | | | | | | | | | Calling kvm_mmu_slot_remove_write_access() for a deleted slot does nothing but search for non-existent mmu pages which have mappings to that deleted memory; this is safe but a waste of time. Since we want to make the function rmap based in a later patch, in a manner which makes it unsafe to be called for a deleted slot, we makes the caller see if the slot is non-zero and being dirty logged. Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Signed-off-by: Gleb Natapov <gleb@redhat.com>