summaryrefslogtreecommitdiffstats
path: root/virt (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'kvm-arm-for-v4.12-round2' of ↵Paolo Bonzini2017-05-0916-277/+5725
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD Second round of KVM/ARM Changes for v4.12. Changes include: - A fix related to the 32-bit idmap stub - A fix to the bitmask used to deode the operands of an AArch32 CP instruction - We have moved the files shared between arch/arm/kvm and arch/arm64/kvm to virt/kvm/arm - We add support for saving/restoring the virtual ITS state to userspace
| * KVM: arm/arm64: vgic-its: Cleanup after failed ITT restoreChristoffer Dall2017-05-091-13/+22
| | | | | | | | | | | | | | | | | | | | | | When failing to restore the ITT for a DTE, we should remove the failed device entry from the list and free the object. We slightly refactor vgic_its_destroy to be able to reuse the now separate vgic_its_free_dte() function. Signed-off-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * KVM: arm/arm64: Don't call map_resources when restoring ITS tablesChristoffer Dall2017-05-091-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only reason we called kvm_vgic_map_resources() when restoring the ITS tables was because we wanted to have the KVM iodevs registered in the KVM IO bus framework at the time when the ITS was restored such that a restored and active device can inject MSIs prior to otherwise calling kvm_vgic_map_resources() from the first run of a VCPU. Since we now register the KVM iodevs for the redestributors and ITS as soon as possible (when setting the base addresses), we no longer need this call and kvm_vgic_map_resources() is again called only when first running a VCPU. Signed-off-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * KVM: arm/arm64: Register ITS iodev when setting base addressChristoffer Dall2017-05-093-43/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have to register the ITS iodevice before running the VM, because in migration scenarios, we may be restoring a live device that wishes to inject MSIs before the VCPUs have started. All we need to register the ITS io device is the base address of the ITS, so we can simply register that when the base address of the ITS is set. [ Code to fix concurrency issues when setting the ITS base address and to fix the undef base address check written by Marc Zyngier ] Signed-off-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * KVM: arm/arm64: Get rid of its->initialized fieldMarc Zyngier2017-05-091-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The its->initialized doesn't bring much to the table, and creates unnecessary ordering between setting the address and initializing it (which amounts to exactly nothing). Let's kill it altogether, making KVM_DEV_ARM_VGIC_CTRL_INIT the no-op it deserves to be. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * KVM: arm/arm64: Register iodevs when setting redist base and creating VCPUsChristoffer Dall2017-05-096-11/+71
| | | | | | | | | | | | | | | | | | | | | | | | Instead of waiting with registering KVM iodevs until the first VCPU is run, we can actually create the iodevs when the redist base address is set. The only downside is that we must now also check if we need to do this for VCPUs which are created after creating the VGIC, because there is no enforced ordering between creating the VGIC (and setting its base addresses) and creating the VCPUs. Signed-off-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * KVM: arm/arm64: Slightly rework kvm_vgic_addrChristoffer Dall2017-05-091-9/+13
| | | | | | | | | | | | | | | | | | | | As we are about to handle setting the address for the redistributor base region separately from some of the other base addresses, let's rework this function to leave a little more room for being flexible in what each type of base address does. Signed-off-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * KVM: arm/arm64: Make vgic_v3_check_base more broadly usableChristoffer Dall2017-05-092-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As we are about to fiddle with the IO device registration mechanism, let's be a little more careful when setting base addresses as early as possible. When setting a base address, we can check that there's address space enough for its scope and when the last of the two base addresses (dist and redist) get set, we can also check if the regions overlap at that time. This allows us to provide error messages to the user at time when trying to set the base address, as opposed to later when trying to run the VM. To do this, we make vgic_v3_check_base available in the core vgic-v3 code as well as in the other parts of the GICv3 code, namely the MMIO config code. We also return true for undefined base addresses so that the function can be used before all base addresses are set; all callers already check for uninitialized addresses before calling this function. Signed-off-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * KVM: arm/arm64: Refactor vgic_register_redist_iodevsChristoffer Dall2017-05-093-44/+68
| | | | | | | | | | | | | | | | | | Split out the function to register all the redistributor iodevs into a function that handles a single redistributor at a time in preparation for being able to call this per VCPU as these get created. Signed-off-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * KVM: arm/arm64: vgic: Rename kvm_vgic_vcpu_init to kvm_vgic_vcpu_enableChristoffer Dall2017-05-091-6/+2
| | | | | | | | | | | | | | | | | | This function really doesn't init anything, it enables the CPU interface, so name it as such, which gives us the name to use for actual init work later on. Signed-off-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLESEric Auger2017-05-083-0/+72
| | | | | | | | | | | | | | | | | | | | This patch adds a new attribute to GICV3 KVM device KVM_DEV_ARM_VGIC_GRP_CTRL group. This allows userspace to flush all GICR pending tables into guest RAM. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm64: vgic-its: Fix pending table syncEric Auger2017-05-081-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In its_sync_lpi_pending_table() we currently ignore the target_vcpu of the LPIs. We sync the pending bit found in the vcpu pending table even if the LPI is not targeting it. Also in vgic_its_cmd_handle_invall() we are supposed to read the config table data for the LPIs associated to the collection ID. At the moment we refresh all LPI config information. This patch passes a vpcu to vgic_copy_lpi_list() so that this latter returns a snapshot of the LPIs targeting this CPU and only those. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm64: vgic-its: ITT save and restoreEric Auger2017-05-082-3/+117
| | | | | | | | | | | | | | | | Implement routines to save and restore device ITT and their interrupt table entries (ITE). Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org>
| * KVM: arm64: vgic-its: Device table save/restoreEric Auger2017-05-082-5/+199
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch saves the device table entries into guest RAM. Both flat table and 2 stage tables are supported. DeviceId indexing is used. For each device listed in the device table, we also save the translation table using the vgic_its_save/restore_itt routines. Those functions will be implemented in a subsequent patch. On restore, devices are re-allocated and their itt are re-built. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org>
| * KVM: arm64: vgic-its: vgic_its_check_id returns the entry's GPAEric Auger2017-05-081-3/+8
| | | | | | | | | | | | | | | | | | As vgic_its_check_id() computes the device/collection entry's GPA, let's return it so that new callers can retrieve it easily. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Christoffer Dall <cdall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm64: vgic-its: Collection table save/restoreEric Auger2017-05-082-2/+107
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The save path copies the collection entries into guest RAM at the GPA specified in the BASER register. This obviously requires the BASER to be set. The last written element is a dummy collection table entry. We do not index by collection ID as the collection entry can fit into 8 bytes while containing the collection ID. On restore path we re-allocate the collection objects. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm64: vgic-its: Add infrastructure for table lookupEric Auger2017-05-081-0/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a generic scan_its_table() helper whose role consists in scanning a contiguous table located in guest RAM and applying a callback on each entry. Entries can be handled as linked lists since the callback may return an id offset to the next entry and also indicate whether the entry is the last one. Helper functions also are added to compute the device/event ID offset to the next DTE/ITE. compute_next_devid_offset, compute_next_eventid_offset and scan_table will become static in subsequent patches Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm64: vgic-its: vgic_its_alloc_ite/deviceEric Auger2017-05-081-21/+47
| | | | | | | | | | | | | | | | | | Add two new helpers to allocate an its ite and an its device. This will avoid duplication on restore path. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm64: vgic-its: KVM_DEV_ARM_ITS_SAVE/RESTORE_TABLESEric Auger2017-05-081-4/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce new attributes in KVM_DEV_ARM_VGIC_GRP_CTRL group: - KVM_DEV_ARM_ITS_SAVE_TABLES: saves the ITS tables into guest RAM - KVM_DEV_ARM_ITS_RESTORE_TABLES: restores them into VGIC internal structures. We hold the vcpus lock during the save and restore to make sure no vcpu is running. At this stage the functionality is not yet implemented. Only the skeleton is put in place. Signed-off-by: Eric Auger <eric.auger@redhat.com> [Given we will move the iodev register until setting the base addr] Reviewed-by: Christoffer Dall <cdall@linaro.org>
| * KVM: arm64: vgic-its: Read config and pending bit in add_lpi()Eric Auger2017-05-081-11/+24
| | | | | | | | | | | | | | | | | | When creating the lpi we now ask the redistributor what is the state of the LPI (priority, enabled, pending). Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org>
| * KVM: arm64: vgic-v3: vgic_v3_lpi_sync_pending_statusEric Auger2017-05-083-4/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | this new helper synchronizes the irq pending_latch with the LPI pending bit status found in rdist pending table. As the status is consumed, we reset the bit in pending table. As we need the PENDBASER_ADDRESS() in vgic-v3, let's move its definition in the irqchip header. We restore the full length of the field, ie [51:16]. Same for PROPBASER_ADDRESS with full field length of [51:12]. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org>
| * KVM: arm64: vgic-its: Check the device id matches TYPER DEVBITS rangeEric Auger2017-05-081-5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | On MAPD we currently check the device id can be stored in the device table. Let's first check it can be encoded within the range defined by TYPER DEVBITS. Also check the collection ID belongs to the 16 bit range as GITS_TYPER CIL field equals to 0. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm64: vgic-its: Interpret MAPD ITT_addr fieldEric Auger2017-05-081-0/+4
| | | | | | | | | | | | | | | | | | Up to now the MAPD ITT_addr had been ignored. We will need it for save/restore. Let's record it in the its_device struct. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm64: vgic-its: Interpret MAPD Size field and check related errorsEric Auger2017-05-081-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | Up to now the MAPD's ITT size field has been ignored. It encodes the number of eventid bit minus 1. It should be used to check the eventid when a MAPTI command is issued on a device. Let's store the number of eventid bits in the its_device and do the check on MAPTI. Also make sure the ITT size field does not exceed the GITS_TYPER IDBITS field. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm64: vgic-its: Implement vgic_mmio_uaccess_write_its_iidrEric Auger2017-05-081-3/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | The GITS_IIDR revision field is used to encode the migration ABI revision. So we need to restore it to check the table layout is readable by the destination. By writing the IIDR, userspace thus forces the ABI revision to be used and this must be less than or equal to the max revision KVM supports. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org>
| * KVM: arm64: vgic-its: Introduce migration ABI infrastructureEric Auger2017-05-081-4/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We plan to support different migration ABIs, ie. characterizing the ITS table layout format in guest RAM. For example, a new ABI will be needed if vLPIs get supported for nested use case. So let's introduce an array of supported ABIs (at the moment a single ABI is supported though). The following characteristics are foreseen to vary with the ABI: size of table entries, save/restore operation, the way abi settings are applied. By default the MAX_ABI_REV is applied on its creation. In subsequent patches we will introduce a way for the userspace to change the ABI in use. The entry sizes now are set according to the ABI version and not hardcoded anymore. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org>
| * KVM: arm64: vgic-its: Implement vgic_mmio_uaccess_write_its_creadrEric Auger2017-05-081-2/+40
| | | | | | | | | | | | | | | | | | | | GITS_CREADR needs to be restored so let's implement the associated uaccess_write_its callback. The write only is allowed if the its is disabled. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org>
| * KVM: arm64: vgic-its: Implement vgic_its_has_attr_regs and attr_regs_accessEric Auger2017-05-082-4/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements vgic_its_has_attr_regs and vgic_its_attr_regs_access upon the MMIO framework. VGIC ITS KVM device KVM_DEV_ARM_VGIC_GRP_ITS_REGS group becomes functional. At least GITS_CREADR and GITS_IIDR require to differentiate a guest write action from a user access. As such let's introduce a new uaccess_its_write vgic_register_region callback. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
| * KVM: arm/arm64: vgic: expose (un)lock_all_vcpusEric Auger2017-05-082-2/+5
| | | | | | | | | | | | | | | | | | We need to use those helpers in vgic-its.c so let's expose them in the private vgic header. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <cdall@linaro.org>
| * KVM: arm64: vgic-its: KVM_DEV_ARM_VGIC_GRP_ITS_REGS groupEric Auger2017-05-081-1/+35
| | | | | | | | | | | | | | | | | | | | | | | | The ITS KVM device exposes a new KVM_DEV_ARM_VGIC_GRP_ITS_REGS group which allows the userspace to save/restore ITS registers. At this stage the get/set/has operations are not yet implemented. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <cdall@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com>
| * arm/arm64: vgic: turn vgic_find_mmio_region into publicEric Auger2017-05-082-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | We plan to use vgic_find_mmio_region in vgic-its.c so let's turn it into a public function. Also let's take the opportunity to rename the region parameter into regions to emphasize this latter is an array of regions. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <cdall@linaro.org>
| * KVM: arm/arm64: vgic-its: rename itte into iteEric Auger2017-05-081-74/+74
| | | | | | | | | | | | | | | | | | The actual abbreviation for the interrupt translation table entry is ITE. Let's rename all itte instances by ite. Signed-off-by: Eric Auger <eric.auger@redhat.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <cdall@linaro.org>
| * KVM: arm/arm64: Move shared files to virt/kvm/armChristoffer Dall2017-05-048-12/+4328
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For some time now we have been having a lot of shared functionality between the arm and arm64 KVM support in arch/arm, which not only required a horrible inter-arch reference from the Makefile in arch/arm64/kvm, but also created confusion for newcomers to the code base, as was recently seen on the mailing list. Further, it causes confusion for things like cscope, which needs special attention to index specific shared files for arm64 from the arm tree. Move the shared files into virt/kvm/arm and move the trace points along with it. When moving the tracepoints we have to modify the way the vgic creates definitions of the trace points, so we take the chance to include the VGIC tracepoints in its very own special vgic trace.h file. Signed-off-by: Christoffer Dall <cdall@linaro.org>
* | Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini2017-05-091-4/+0
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD The main thing here is a new implementation of the in-kernel XICS interrupt controller emulation for POWER9 machines, from Ben Herrenschmidt. POWER9 has a new interrupt controller called XIVE (eXternal Interrupt Virtualization Engine) which is able to deliver interrupts directly to guest virtual CPUs in hardware without hypervisor intervention. With this new code, the guest still sees the old XICS interface but performance is better because the XICS emulation in the host uses the XIVE directly rather than going through a XICS emulation in firmware. Conflicts: arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix] arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
| * \ Merge remote-tracking branch 'remotes/powerpc/topic/xive' into kvm-ppc-nextPaul Mackerras2017-04-281-4/+0
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This merges in the powerpc topic/xive branch to bring in the code for the in-kernel XICS interrupt controller emulation to use the new XIVE (eXternal Interrupt Virtualization Engine) hardware in the POWER9 chip directly, rather than via a XICS emulation in firmware. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controllerBenjamin Herrenschmidt2017-04-271-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | | | KVM: set no_llseek in stat_fops_per_vmGeliang Tang2017-05-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In vm_stat_get_per_vm_fops and vcpu_stat_get_per_vm_fops, since we use nonseekable_open() to open, we should use no_llseek() to seek, not generic_file_llseek(). Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2017-05-091-15/+3
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge more updates from Andrew Morton: - the rest of MM - various misc things - procfs updates - lib/ updates - checkpatch updates - kdump/kexec updates - add kvmalloc helpers, use them - time helper updates for Y2038 issues. We're almost ready to remove current_fs_time() but that awaits a btrfs merge. - add tracepoints to DAX * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits) drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4 selftests/vm: add a test for virtual address range mapping dax: add tracepoint to dax_insert_mapping() dax: add tracepoint to dax_writeback_one() dax: add tracepoints to dax_writeback_mapping_range() dax: add tracepoints to dax_load_hole() dax: add tracepoints to dax_pfn_mkwrite() dax: add tracepoints to dax_iomap_pte_fault() mtd: nand: nandsim: convert to memalloc_noreclaim_*() treewide: convert PF_MEMALLOC manipulations to new helpers mm: introduce memalloc_noreclaim_{save,restore} mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required mm/huge_memory.c: use zap_deposited_table() more time: delete CURRENT_TIME_SEC and CURRENT_TIME gfs2: replace CURRENT_TIME with current_time apparmorfs: replace CURRENT_TIME with current_time() lustre: replace CURRENT_TIME macro fs: ubifs: replace CURRENT_TIME_SEC with current_time fs: ufs: use ktime_get_real_ts64() for birthtime ...
| * | | | mm: introduce kv[mz]alloc helpersMichal Hocko2017-05-091-15/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "kvmalloc", v5. There are many open coded kmalloc with vmalloc fallback instances in the tree. Most of them are not careful enough or simply do not care about the underlying semantic of the kmalloc/page allocator which means that a) some vmalloc fallbacks are basically unreachable because the kmalloc part will keep retrying until it succeeds b) the page allocator can invoke a really disruptive steps like the OOM killer to move forward which doesn't sound appropriate when we consider that the vmalloc fallback is available. As it can be seen implementing kvmalloc requires quite an intimate knowledge if the page allocator and the memory reclaim internals which strongly suggests that a helper should be implemented in the memory subsystem proper. Most callers, I could find, have been converted to use the helper instead. This is patch 6. There are some more relying on __GFP_REPEAT in the networking stack which I have converted as well and Eric Dumazet was not opposed [2] to convert them as well. [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com This patch (of 9): Using kmalloc with the vmalloc fallback for larger allocations is a common pattern in the kernel code. Yet we do not have any common helper for that and so users have invented their own helpers. Some of them are really creative when doing so. Let's just add kv[mz]alloc and make sure it is implemented properly. This implementation makes sure to not make a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also to not warn about allocation failures. This also rules out the OOM killer as the vmalloc is a more approapriate fallback than a disruptive user visible action. This patch also changes some existing users and removes helpers which are specific for them. In some cases this is not possible (e.g. ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and require GFP_NO{FS,IO} context which is not vmalloc compatible in general (note that the page table allocation is GFP_KERNEL). Those need to be fixed separately. While we are at it, document that __vmalloc{_node} about unsupported gfp mask because there seems to be a lot of confusion out there. kvmalloc_node will warn about GFP_KERNEL incompatible (which are not superset) flags to catch new abusers. Existing ones would have to die slowly. [sfr@canb.auug.org.au: f2fs fixup] Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Andreas Dilger <adilger@dilger.ca> [ext4 part] Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: John Hubbard <jhubbard@nvidia.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2017-05-0813-369/+540
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "ARM: - HYP mode stub supports kexec/kdump on 32-bit - improved PMU support - virtual interrupt controller performance improvements - support for userspace virtual interrupt controller (slower, but necessary for KVM on the weird Broadcom SoCs used by the Raspberry Pi 3) MIPS: - basic support for hardware virtualization (ImgTec P5600/P6600/I6400 and Cavium Octeon III) PPC: - in-kernel acceleration for VFIO s390: - support for guests without storage keys - adapter interruption suppression x86: - usual range of nVMX improvements, notably nested EPT support for accessed and dirty bits - emulation of CPL3 CPUID faulting generic: - first part of VCPU thread request API - kvm_stat improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) kvm: nVMX: Don't validate disabled secondary controls KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick Revert "KVM: Support vCPU-based gfn->hva cache" tools/kvm: fix top level makefile KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING KVM: Documentation: remove VM mmap documentation kvm: nVMX: Remove superfluous VMX instruction fault checks KVM: x86: fix emulation of RSM and IRET instructions KVM: mark requests that need synchronization KVM: return if kvm_vcpu_wake_up() did wake up the VCPU KVM: add explicit barrier to kvm_vcpu_kick KVM: perform a wake_up in kvm_make_all_cpus_request KVM: mark requests that do not need a wakeup KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up KVM: x86: always use kvm_make_request instead of set_bit KVM: add kvm_{test,clear}_request to replace {test,clear}_bit s390: kvm: Cpu model support for msa6, msa7 and msa8 KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK kvm: better MWAIT emulation for guests KVM: x86: virtualize cpuid faulting ...
| * | | | KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kickPaolo Bonzini2017-05-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The #ifndef was removed in 75aaafb79f73516b69d5639ad30a72d72e75c8b4, but it was also protecting smp_send_reschedule() in kvm_vcpu_kick(). Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | Revert "KVM: Support vCPU-based gfn->hva cache"Paolo Bonzini2017-05-031-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bbd6411513aa8ef3ea02abab61318daf87c1af1e. I've been sitting on this revert for too long and it unfortunately missed 4.11. It's also the reason why I haven't merged ring-based dirty tracking for 4.12. Using kvm_vcpu_memslots in kvm_gfn_to_hva_cache_init and kvm_vcpu_write_guest_offset_cached means that the MSR value can now be used to access SMRAM, simply by making it point to an SMRAM physical address. This is problematic because it lets the guest OS overwrite memory that it shouldn't be able to touch. Cc: stable@vger.kernel.org Fixes: bbd6411513aa8ef3ea02abab61318daf87c1af1e Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTINGDavid Hildenbrand2017-05-022-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We needed the lock to avoid racing with creation of the irqchip on x86. As kvm_set_irq_routing() calls srcu_synchronize_expedited(), this lock might be held for a longer time. Let's introduce an arch specific callback to check if we can actually add irq routes. For x86, all we have to do is check if we have an irqchip in the kernel. We don't need kvm->lock at that point as the irqchip is marked as inititalized only when actually fully created. Reported-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Fixes: 1df6ddede10a ("KVM: x86: race between KVM_SET_GSI_ROUTING and KVM_CREATE_IRQCHIP") Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | Merge tag 'kvm-arm-for-v4.12' of ↵Paolo Bonzini2017-04-279-315/+366
| |\ \ \ \ | | | |_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM Changes for v4.12. Changes include: - Using the common sysreg definitions between KVM and arm64 - Improved hyp-stub implementation with support for kexec and kdump on the 32-bit side - Proper PMU exception handling - Performance improvements of our GIC handling - Support for irqchip in userspace with in-kernel arch-timers and PMU support - A fix for a race condition in our PSCI code Conflicts: Documentation/virtual/kvm/api.txt include/uapi/linux/kvm.h
| | * | | KVM: arm/arm64: vgic-v3: Fix off-by-one LR accessMarc Zyngier2017-04-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When iterating over the used LRs, be careful not to try to access an unused LR, or even an unimplemented one if you're unlucky... Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
| | * | | KVM: arm/arm64: vgic-v3: De-optimize VMCR save/restore when emulating a GICv2Marc Zyngier2017-04-192-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When emulating a GICv2-on-GICv3, special care must be taken to only save/restore VMCR_EL2 when ICC_SRE_EL1.SRE is cleared. Otherwise, all Group-0 interrupts end-up being delivered as FIQ, which is probably not what the guest expects, as demonstrated here with an unhappy EFI: FIQ Exception at 0x000000013BD21CC4 This means that we cannot perform the load/put trick when dealing with VMCR_EL2 (because the host has SRE set), and we have to deal with it in the world-switch. Fortunately, this is not the most common case (modern guests should be able to deal with GICv3 directly), and the performance is not worse than what it was before the VMCR optimization. Reviewed-by: Christoffer Dall <cdall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>
| | * | | KVM: arm/arm64: Report PMU overflow interrupts to userspace irqchipChristoffer Dall2017-04-092-7/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When not using an in-kernel VGIC, but instead emulating an interrupt controller in userspace, we should report the PMU overflow status to that userspace interrupt controller using the KVM_CAP_ARM_USER_IRQ feature. Reviewed-by: Alexander Graf <agraf@suse.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | KVM: arm/arm64: Support arch timers with a userspace gicAlexander Graf2017-04-091-23/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you're running with a userspace gic or other interrupt controller (that is no vgic in the kernel), then you have so far not been able to use the architected timers, because the output of the architected timers, which are driven inside the kernel, was a kernel-only construct between the arch timer code and the vgic. This patch implements the new KVM_CAP_ARM_USER_IRQ feature, where we use a side channel on the kvm_run structure, run->s.regs.device_irq_level, to always notify userspace of the timer output levels when using a userspace irqchip. This works by ensuring that before we enter the guest, if the timer output level has changed compared to what we last told userspace, we don't enter the guest, but instead return to userspace to notify it of the new level. If we are exiting, because of an MMIO for example, and the level changed at the same time, the value is also updated and userspace can sample the line as it needs. This is nicely achieved simply always updating the timer_irq_level field after the main run loop. Note that the kvm_timer_update_irq trace event is changed to show the host IRQ number for the timer instead of the guest IRQ number, because the kernel no longer know which IRQ userspace wires up the timer signal to. Also note that this patch implements all required functionality but does not yet advertise the capability. Reviewed-by: Alexander Graf <agraf@suse.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | KVM: arm/arm64: Cleanup the arch timer code's irqchip checkingChristoffer Dall2017-04-091-8/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we check if we have an in-kernel irqchip and if the vgic was properly implemented several places in the arch timer code. But, we already predicate our enablement of the arm timers on having a valid and initialized gic, so we can simply check if the timers are enabled or not. This also gets rid of the ugly "error that's not an error but used to signal that the timer shouldn't poke the gic" construct we have. Reviewed-by: Alexander Graf <agraf@suse.de> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | KVM: arm/arm64: vgic: Improve sync_hwstate performanceChristoffer Dall2017-04-093-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to call any functions to fold LRs when we don't use any LRs and we don't need to mess with overflow flags, take spinlocks, or prune the AP list if the AP list is empty. Note: list_empty is a single atomic read (uses READ_ONCE) and can therefore check if a list is empty or not without the need to take the spinlock protecting the list. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@linaro.org>