summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'arc-4.11-rc1' of ↵Linus Torvalds2017-02-2218-151/+138
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC updates from Vineet Gupta: - Intc imporvements [Yuriy] - VDK platform updates [Alexey] * tag 'arc-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: [plat-*] ARC_HAS_COH_CACHES no longer relevant ARCv2: intc: Delete useless comments in Device Trees ARCv2: IDU-intc: Delete deprecated parameters in Device Trees ARCv2: IDU-intc: mask all common interrupts by default ARCv2: IDU-intc: Use build registers for getting numbers of interrupts ARCv2: intc: Set default priority for all core interrupts ARCv2: intc: Use runtime value of irq count for setting up intc ARCv2: intc: Rework the build time irq count information ARC: [intc-*]: confine NR_CPU_IRQS to intc code ARCv2: intc: Use ARC_REG_STATUS32 for addressing STATUS32 reg arc: vdk: Add support of UIO arc: vdk: Add support of MMC controller arc: vdk: Disable halt on reset
| * ARC: [plat-*] ARC_HAS_COH_CACHES no longer relevantVineet Gupta2017-02-063-8/+0
| | | | | | | | | | | | | | | | | | | | A typical SMP system expects cache coherency. Initial NPS platform support was slated to be SMP w/o cache coherency. However it seems the platform now selects that option, so there is no point in keeping it around. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * ARCv2: intc: Delete useless comments in Device TreesYuriy Kolerov2017-02-062-2/+0
| | | | | | | | | | Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * ARCv2: IDU-intc: Delete deprecated parameters in Device TreesYuriy Kolerov2017-02-067-101/+21
| | | | | | | | | | | | | | | | | | | | | | | | No need for specifying a list of interrupts in the declaration of IDU interrupt controller anymore since the kernel can obtain a number of supported interrupts from the build register. Also delete support of the second parameter for devices which are connected to IDU because it is not used anywhere. Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * ARCv2: IDU-intc: mask all common interrupts by defaultYuriy Kolerov2017-02-061-2/+10
| | | | | | | | | | | | Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: broken off from a bigger patch]
| * ARCv2: IDU-intc: Use build registers for getting numbers of interruptsYuriy Kolerov2017-02-062-10/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This enhancement is needed to allow masking all available common interrupts in IDU interrupt controller in boot time since the kernel can discover a number of them from the build register. Also now there is no need to specify in device tree a list of used core interrupts by IDU. E.g. before: idu_intc: idu-interrupt-controller { compatible = "snps,archs-idu-intc"; interrupt-controller; interrupt-parent = <&core_intc>; #interrupt-cells = <2>; interrupts = <24 25 26 27 28 29 30 31>; }; and after: idu_intc: idu-interrupt-controller { compatible = "snps,archs-idu-intc"; interrupt-controller; interrupt-parent = <&core_intc>; #interrupt-cells = <2>; }; Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * ARCv2: intc: Set default priority for all core interruptsYuriy Kolerov2017-02-061-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After reset all interrupts in the core interrupt controller has the highest priority P0. If the platform supports Fast IRQs and has more than 1 banks of registers then CPU automatically switch banks of registers when P0 interrupt comes. The problem is that the kernel expects that by default switching of banks is not used by all interrupts. It is necessary to set a default nonzero priority for all available interrupts to avoid undefined behaviour. Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * ARCv2: intc: Use runtime value of irq count for setting up intcVineet Gupta2017-02-062-11/+18
| | | | | | | | Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * ARCv2: intc: Rework the build time irq count informationYuriy Kolerov2017-02-063-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently Kconfig knob ARC_NUMBER_OF_INTERRUPTS is used as indicator of hard irq count. But it is flawed that it doesn't affect - NR_IRQS : for number of virtual interrupts - NR_CPU_IRQS : for number of hardware interrupts Moreover the actual hardware irq count might still not be same as ARC_NUMBER_OF_INTERRUPTS. So use the information availble in the Build Configuration Registers and get rid of the Kconfig option. We still need "some" build time info about irq count to set up sufficient number of vector table entries. This is done with a sufficiently large NR_CPU_IRQS which will eventually be used soley for that purpose (subsequent patches will remove its usage elsewhere) So to summarize what this patch does: * NR_CPU_IRQS defines a maximum number of hardware interrupts. * Remove ARC_NUMBER_OF_INTERRUPTS option and create interrupts table for all possible hardware interrupts. * Increase a maximum number of virtual IRQs to 512. ARCv2 can support 240 interrupts in the core interrupts controllers and 128 interrupts in IDU. Thus 512 virtual IRQs must be enough for most configurations of boards. This patch leads to NR_CPU_IRQS in 2 places, to reduce the overall churn. The next patch will remove the 2nd definition anyways. Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: reworked the changelog a bit]
| * ARC: [intc-*]: confine NR_CPU_IRQS to intc codeVineet Gupta2017-02-063-1/+3
| | | | | | | | | | | | | | And even this willl change in subsequent patches where we resort to using run time info instead... Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * ARCv2: intc: Use ARC_REG_STATUS32 for addressing STATUS32 regYuriy Kolerov2017-02-062-1/+4
| | | | | | | | | | | | | | It is better to use it instead of magic numbers. Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * arc: vdk: Add support of UIOAlexey Brodkin2017-02-062-0/+10
| | | | | | | | | | | | | | | | ARC VDK for EVSS uses UIO for communication with Embedded Vision Subsystem. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * arc: vdk: Add support of MMC controllerAlexey Brodkin2017-02-062-0/+22
| | | | | | | | | | | | | | | | ARC VDK virtual platform emulates host MMC controller (DW Mobile Storage) and moreover rootfs is situated on that virtual card. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * arc: vdk: Disable halt on resetAlexey Brodkin2017-02-061-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In recent VDKs ARC cores are configured as "run on reset" which made existing kernel configuration outdated to effect that slave cores never start execution of the code keeping only master online. With that fix we're again in sync with VDK platform. And while at it we regenerate defconfig via savedefconfig so default options are now excluded. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* | Merge tag 'powerpc-4.11-1' of ↵Linus Torvalds2017-02-22144-1832/+4502
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights include: - Support for direct mapped LPC on POWER9, giving Linux direct access to devices that may be on there such as a UART. - Memory hotplug support for the Power9 Radix MMU. - Add new AUX vectors describing the processor's cache geometry, to be used by glibc. - The ability for a guest to ask the hypervisor to resize the guest's hash table, and in addition support for doing so automatically when memory is hotplugged into/out-of the guest. This allows the hash table to be sized based on the current memory usage of the guest, rather than the maximum possible memory usage. - Implementation of optprobes (kprobe optimisation) for powerpc. In addition there's the topic branch shared with the KVM tree, which includes support for guests to use the Radix MMU on Power9. Thanks to: Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T, Anton Blanchard, Benjamin Herrenschmidt, Chris Packham, Daniel Axtens, Daniel Borkmann, David Gibson, Finn Thain, Gautham R. Shenoy, Gavin Shan, Greg Kurz, Joel Stanley, John Allen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Ravi Bangoria, Reza Arbab, Shailendra Singh, Vaibhav Jain, Wei Yongjun" * tag 'powerpc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (129 commits) powerpc/mm/radix: Skip ptesync in pte update helpers powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm powerpc/mm/radix: Update pte update sequence for pte clear case powerpc/mm: Update PROTFAULT handling in the page fault path powerpc/xmon: Fix data-breakpoint powerpc/mm: Fix build break with BOOK3S_64=n and MEMORY_HOTPLUG=y powerpc/mm: Fix build break when CMA=n && SPAPR_TCE_IOMMU=y powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n powerpc/pseries: Fix typo in parameter description powerpc/kprobes: Remove kprobe_exceptions_notify() kprobes: Introduce weak variant of kprobe_exceptions_notify() powerpc/ftrace: Fix confusing help text for DISABLE_MPROFILE_KERNEL powerpc/powernv: Fix opal_exit tracepoint opcode powerpc: Add a prototype for mcount() so it can be versioned powerpc: Drop GPL from of_node_to_nid() export to match other arches powerpc/kprobes: Optimize kprobe in kretprobe_trampoline() powerpc/kprobes: Implement Optprobes powerpc/kprobes: Fixes for kprobe_lookup_name() on BE powerpc: Add helper to check if offset is within relative branch range powerpc/bpf: Introduce __PPC_SH64() ...
| * | powerpc/mm/radix: Skip ptesync in pte update helpersAneesh Kumar K.V2017-02-151-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | We do them at the start of tlb flush, and we are sure a pte update will be followed by a tlbflush. Hence we can skip the ptesync in pte update helpers. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mmAneesh Kumar K.V2017-02-152-1/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | This helps us to do some optimization for application exit case, where we can skip the DD1 style pte update sequence. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/mm/radix: Update pte update sequence for pte clear caseAneesh Kumar K.V2017-02-151-9/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the kernel we do follow the below sequence in different code paths. pte = ptep_get_clear(ptep) .... set_pte_at(ptep, pte) We do that for mremap, autonuma protection update and softdirty clearing. This implies our optimization to skip a tlb flush when clearing a pte update is not valid, because for DD1 system that followup set_pte_at will be done witout doing the required tlbflush. Fix that by always doing the dd1 style pte update irrespective of new_pte value. In a later patch we will optimize the application exit case. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/mm: Update PROTFAULT handling in the page fault pathAneesh Kumar K.V2017-02-152-14/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With radix, we can get page fault with DSISR_PROTFAULT value set in case of PROT_NONE or autonuma mapping. The PROT_NONE case in handled by the vma check where we consider the access bad. For autonuma we should fall through and fixup the access mask correctly. Without this patch we trigger the WARN_ON() on radix. This code moves that WARN_ON() within a radix_enabled() check. I also moved the WARN_ON() outside the if condition making it apply for all type of faults (exec/write/read). It is also conditionalized for book3s, because BOOK3E can also get a PROTFAULT to handle the D/I cache sync. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/xmon: Fix data-breakpointRavi Bangoria2017-02-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently xmon data-breakpoint feature is broken. Whenever there is a watchpoint match occurs, hw_breakpoint_handler will be called by do_break via notifier chains mechanism. If watchpoint is registered by xmon, hw_breakpoint_handler won't find any associated perf_event and returns immediately with NOTIFY_STOP. Similarly, do_break also returns without notifying to xmon. Solve this by returning NOTIFY_DONE when hw_breakpoint_handler does not find any perf_event associated with matched watchpoint, rather than NOTIFY_STOP, which tells the core code to continue calling the other breakpoint handlers including the xmon one. Cc: stable@vger.kernel.org Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/mm: Fix build break with BOOK3S_64=n and MEMORY_HOTPLUG=yMichael Ellerman2017-02-151-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recently merged HPT (Hash Page Table) resize support broke the build when BOOK3S_64=n (ie. 32-bit or 64-bit Book3E) and MEMORY_HOTPLUG=y: arch/powerpc/mm/mem.o: In function `.arch_add_memory': (.text+0x4e4): undefined reference to `.resize_hpt_for_hotplug' Fix it by adding a dummy version. Fixes: 438cc81a41e8 ("powerpc/pseries: Automatically resize HPT for memory hot add/remove") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | Merge branch 'topic/ppc-kvm' into nextMichael Ellerman2017-02-1437-249/+1619
| |\ \ | | | | | | | | | | | | Merge the topic branch we're sharing with the kvm-ppc tree.
| | * | powerpc/powernv: Remove separate entry for OPAL real mode callsBenjamin Herrenschmidt2017-02-076-86/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All entry points already read the MSR so they can easily do the right thing. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | powerpc/64: CONFIG_RELOCATABLE support for hmi interruptsNicholas Piggin2017-02-072-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The branch from hmi_exception_early to hmi_exception_realmode must use a "relocatable-style" branch, because it is branching from unrelocated exception code to beyond __end_interrupts. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Enable radix guest supportPaul Mackerras2017-01-315-27/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a few last pieces of the support for radix guests: * Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and KVM_PPC_GET_RMMU_INFO ioctls for radix guests * On POWER9, allow secondary threads to be on/off-lined while guests are running. * Set up LPCR and the partition table entry for radix guests. * Don't allocate the rmap array in the kvm_memory_slot structure on radix. * Don't try to initialize the HPT for radix guests, since they don't have an HPT. * Take out the code that prevents the HV KVM module from initializing on radix hosts. At this stage, we only support radix guests if the host is running in radix mode, and only support HPT guests if the host is running in HPT mode. Thus a guest cannot switch from one mode to the other, which enables some simplifications. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Invalidate ERAT on guest entry/exit for POWER9 DD1Paul Mackerras2017-01-311-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On POWER9 DD1, we need to invalidate the ERAT (effective to real address translation cache) when changing the PIDR register, which we do as part of guest entry and exit. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Allow guest exit path to have MMU onPaul Mackerras2017-01-313-17/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we allow LPCR[AIL] to be set for radix guests, then interrupts from the guest to the host can be delivered by the hardware with relocation on, and thus the code path starting at kvmppc_interrupt_hv can be executed in virtual mode (MMU on) for radix guests (previously it was only ever executed in real mode). Most of the code is indifferent to whether the MMU is on or off, but the calls to OPAL that use the real-mode OPAL entry code need to be switched to use the virtual-mode code instead. The affected calls are the calls to the OPAL XICS emulation functions in kvmppc_read_one_intr() and related functions. We test the MSR[IR] bit to detect whether we are in real or virtual mode, and call the opal_rm_* or opal_* function as appropriate. The other place that depends on the MMU being off is the optimization where the guest exit code jumps to the external interrupt vector or hypervisor doorbell interrupt vector, or returns to its caller (which is __kvmppc_vcore_entry). If the MMU is on and we are returning to the caller, then we don't need to use an rfid instruction since the MMU is already on; a simple blr suffices. If there is an external or hypervisor doorbell interrupt to handle, we branch to the relocation-on version of the interrupt vector. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movementPaul Mackerras2017-01-314-14/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With radix, the guest can do TLB invalidations itself using the tlbie (global) and tlbiel (local) TLB invalidation instructions. Linux guests use local TLB invalidations for translations that have only ever been accessed on one vcpu. However, that doesn't mean that the translations have only been accessed on one physical cpu (pcpu) since vcpus can move around from one pcpu to another. Thus a tlbiel might leave behind stale TLB entries on a pcpu where the vcpu previously ran, and if that task then moves back to that previous pcpu, it could see those stale TLB entries and thus access memory incorrectly. The usual symptom of this is random segfaults in userspace programs in the guest. To cope with this, we detect when a vcpu is about to start executing on a thread in a core that is a different core from the last time it executed. If that is the case, then we mark the core as needing a TLB flush and then send an interrupt to any thread in the core that is currently running a vcpu from the same guest. This will get those vcpus out of the guest, and the first one to re-enter the guest will do the TLB flush. The reason for interrupting the vcpus executing on the old core is to cope with the following scenario: CPU 0 CPU 1 CPU 4 (core 0) (core 0) (core 1) VCPU 0 runs task X VCPU 1 runs core 0 TLB gets entries from task X VCPU 0 moves to CPU 4 VCPU 0 runs task X Unmap pages of task X tlbiel (still VCPU 1) task X moves to VCPU 1 task X runs task X sees stale TLB entries That is, as soon as the VCPU starts executing on the new core, it could unmap and tlbiel some page table entries, and then the task could migrate to one of the VCPUs running on the old core and potentially see stale TLB entries. Since the TLB is shared between all the threads in a core, we only use the bit of kvm->arch.need_tlb_flush corresponding to the first thread in the core. To ensure that we don't have a window where we can miss a flush, this moves the clearing of the bit from before the actual flush to after it. This way, two threads might both do the flush, but we prevent the situation where one thread can enter the guest before the flush is finished. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix modePaul Mackerras2017-01-311-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the guest is in radix mode, then it doesn't have a hashed page table (HPT), so all of the hypercalls that manipulate the HPT can't work and should return an error. This adds checks to make them return H_FUNCTION ("function not supported"). Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Implement dirty page logging for radix guestsPaul Mackerras2017-01-314-33/+144
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds code to keep track of dirty pages when requested (that is, when memslot->dirty_bitmap is non-NULL) for radix guests. We use the dirty bits in the PTEs in the second-level (partition-scoped) page tables, together with a bitmap of pages that were dirty when their PTE was invalidated (e.g., when the page was paged out). This bitmap is stored in the first half of the memslot->dirty_bitmap area, and kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the bitmap that gets returned to userspace. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: MMU notifier callbacks for radix guestsPaul Mackerras2017-01-313-21/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adapts our implementations of the MMU notifier callbacks (unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva) to call radix functions when the guest is using radix. These implementations are much simpler than for HPT guests because we have only one PTE to deal with, so we don't need to traverse rmap chains. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Page table construction and page faults for radix guestsPaul Mackerras2017-01-315-3/+415
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the code to construct the second-level ("partition-scoped" in architecturese) page tables for guests using the radix MMU. Apart from the PGD level, which is allocated when the guest is created, the rest of the tree is all constructed in response to hypervisor page faults. As well as hypervisor page faults for missing pages, we also get faults for reference/change (RC) bits needing to be set, as well as various other error conditions. For now, we only set the R or C bit in the guest page table if the same bit is set in the host PTE for the backing page. This code can take advantage of the guest being backed with either transparent or ordinary 2MB huge pages, and insert 2MB page entries into the guest page tables. There is no support for 1GB huge pages yet. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle radix guestsPaul Mackerras2017-01-313-11/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds code to branch around the parts that radix guests don't need - clearing and loading the SLB with the guest SLB contents, saving the guest SLB contents on exit, and restoring the host SLB contents. Since the host is now using radix, we need to save and restore the host value for the PID register. On hypervisor data/instruction storage interrupts, we don't do the guest HPT lookup on radix, but just save the guest physical address for the fault (from the ASDR register) in the vcpu struct. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Add basic infrastructure for radix guestsPaul Mackerras2017-01-316-3/+160
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a field in struct kvm_arch and an inline helper to indicate whether a guest is a radix guest or not, plus a new file to contain the radix MMU code, which currently contains just a translate function which knows how to traverse the guest page tables to translate an address. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9Paul Mackerras2017-01-311-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | POWER9 adds a register called ASDR (Access Segment Descriptor Register), which is set by hypervisor data/instruction storage interrupts to contain the segment descriptor for the address being accessed, assuming the guest is using HPT translation. (For radix guests, it contains the guest real address of the access.) Thus, for HPT guests on POWER9, we can use this register rather than looking up the SLB with the slbfee. instruction. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9Paul Mackerras2017-01-313-5/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl for HPT guests on POWER9. With this, we can return 1 for the KVM_CAP_PPC_MMU_HASH_V3 capability. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMUPaul Mackerras2017-01-316-0/+156
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds two capabilities and two ioctls to allow userspace to find out about and configure the POWER9 MMU in a guest. The two capabilities tell userspace whether KVM can support a guest using the radix MMU, or using the hashed page table (HPT) MMU with a process table and segment tables. (Note that the MMUs in the POWER9 processor cores do not use the process and segment tables when in HPT mode, but the nest MMU does). The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify whether a guest will use the radix MMU or the HPT MMU, and to specify the size and location (in guest space) of the process table. The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about the radix MMU. It returns a list of supported radix tree geometries (base page size and number of bits indexed at each level of the radix tree) and the encoding used to specify the various page sizes for the TLB invalidate entry instruction. Initially, both capabilities return 0 and the ioctls return -EINVAL, until the necessary infrastructure for them to operate correctly is added. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | powerpc/64: Allow for relocation-on interrupts from guest to hostPaul Mackerras2017-01-312-29/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With host and guest both using radix translation, it is feasible for the host to take interrupts that come from the guest with relocation on, and that is in fact what the POWER9 hardware will do when LPCR[AIL] = 3. All such interrupts use HSRR0/1 not SRR0/1 except for system call with LEV=1 (hcall). Therefore this adds the KVM tests to the _HV variants of the relocation-on interrupt handlers, and adds the KVM test to the relocation-on system call entry point. We also instantiate the relocation-on versions of the hypervisor data storage and instruction interrupt handlers, since these can occur with relocation on in radix guests. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | powerpc/64: Make type of partition table flush depend on partition typePaul Mackerras2017-01-311-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When changing a partition table entry on POWER9, we do a particular form of the tlbie instruction which flushes all TLBs and caches of the partition table for a given logical partition ID (LPID). This instruction has a field in the instruction word, labelled R (radix), which should be 1 if the partition was previously a radix partition and 0 if it was a HPT partition. This implements that logic. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | powerpc/64: Export pgtable_cache and pgtable_cache_add for KVMPaul Mackerras2017-01-311-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This exports the pgtable_cache array and the pgtable_cache_add function so that HV KVM can use them for allocating radix page tables for guests. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | powerpc/64: More definitions for POWER9Paul Mackerras2017-01-312-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds definitions for bits in the DSISR register which are used by POWER9 for various translation-related exception conditions, and for some more bits in the partition table entry that will be needed by KVM. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | powerpc/64: Enable use of radix MMU under hypervisor on POWER9Paul Mackerras2017-01-317-6/+81
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To use radix as a guest, we first need to tell the hypervisor via the ibm,client-architecture call first that we support POWER9 and architecture v3.00, and that we can do either radix or hash and that we would like to choose later using an hcall (the H_REGISTER_PROC_TBL hcall). Then we need to check whether the hypervisor agreed to us using radix. We need to do this very early on in the kernel boot process before any of the MMU initialization is done. If the hypervisor doesn't agree, we can't use radix and therefore clear the radix MMU feature bit. Later, when we have set up our process table, which points to the radix tree for each process, we need to install that using the H_REGISTER_PROC_TBL hcall. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | powerpc/pseries: Fixes for the "ibm,architecture-vec-5" optionsPaul Mackerras2017-01-312-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes the byte index values for some of the option bits in the "ibm,architectur-vec-5" property. The "platform facilities options" bits are in byte 17 not byte 14, so the upper 8 bits of their definitions need to be 0x11 not 0x0E. The "sub processor support" option is in byte 21 not byte 15. Note none of these options are actually looked up in "ibm,architecture-vec-5" at this time, so there is no bug. When checking whether option bits are set, we should check that the offset of the byte being checked is less than the vector length that we got from the hypervisor. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | powerpc/64: Don't try to use radix MMU under a hypervisorPaul Mackerras2017-01-311-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if the kernel is running on a POWER9 processor under a hypervisor, it will try to use the radix MMU even though it doesn't have the necessary code to use radix under a hypervisor (it doesn't negotiate use of radix, and it doesn't do the H_REGISTER_PROC_TBL hcall). The result is that the guest kernel will crash when it tries to turn on the MMU. This fixes it by looking for the /chosen/ibm,architecture-vec-5 property, and if it exists, clears the radix MMU feature bit, before we decide whether to initialize for radix or HPT. This property is created by the hypervisor as a result of the guest calling the ibm,client-architecture-support method to indicate its capabilities, so it will indicate whether the hypervisor agreed to us using radix. Systems without a hypervisor may have this property also (for example, skiboot creates it), so we check the HV bit in the MSR to see whether we are running as a guest or not. If we are in hypervisor mode, then we can do whatever we like including using the radix MMU. The reason for using this property is that in future, when we have support for using radix under a hypervisor, we will need to check this property to see whether the hypervisor agreed to us using radix. Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interruptsNicholas Piggin2017-01-314-8/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 64-bit Book3S exception handlers must find the dynamic kernel base to add to the target address when branching beyond __end_interrupts, in order to support kernel running at non-0 physical address. Support this in KVM by branching with CTR, similarly to regular interrupt handlers. The guest CTR saved in HSTATE_SCRATCH1 and restored after the branch. Without this, the host kernel hangs and crashes randomly when it is running at a non-0 address and a KVM guest is started. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S: Move 64-bit KVM interrupt handler out from alt sectionNicholas Piggin2017-01-272-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A subsequent patch to make KVM handlers relocation-safe makes them unusable from within alt section "else" cases (due to the way fixed addresses are taken from within fixed section head code). Stop open-coding the KVM handlers, and add them both as normal. A more optimal fix may be to allow some level of alternate feature patching in the exception macros themselves, but for now this will do. The TRAMP_KVM handlers must be moved to the "virt" fixed section area (name is arbitrary) in order to be closer to .text and avoid the dreaded "relocation truncated to fit" error. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| | * | KVM: PPC: Book3S: Change interrupt call to reduce scratch space use on HVNicholas Piggin2017-01-273-27/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the calling convention to put the trap number together with CR in two halves of r12, which frees up HSTATE_SCRATCH2 in the HV handler. The 64-bit PR handler entry translates the calling convention back to match the previous call convention (i.e., shared with 32-bit), for simplicity. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | powerpc/mm: Fix build break when CMA=n && SPAPR_TCE_IOMMU=yMichael Ellerman2017-02-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the build breaks if CMA=n and SPAPR_TCE_IOMMU=y: arch/powerpc/mm/mmu_context_iommu.c: In function ‘mm_iommu_get’: arch/powerpc/mm/mmu_context_iommu.c:193:42: error: ‘MIGRATE_CMA’ undeclared (first use in this function) if (get_pageblock_migratetype(page) == MIGRATE_CMA) { ^~~~~~~~~~~ Fix it by using the existing is_migrate_cma_page(), which evaulates to false when CMA=n. Fixes: 2e5bbb5461f1 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=nMichael Ellerman2017-02-142-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we enable RADIX but disable HUGETLBFS, the build breaks with: arch/powerpc/mm/pgtable-radix.c:557:7: error: implicit declaration of function 'pmd_huge' arch/powerpc/mm/pgtable-radix.c:588:7: error: implicit declaration of function 'pud_huge' Fix it by stubbing those functions when HUGETLBFS=n. Fixes: 4b5d62ca17a1 ("powerpc/mm: add radix__remove_section_mapping()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | powerpc/pseries: Fix typo in parameter descriptionWei Yongjun2017-02-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix typo in "hotplug_delay" parameter description. This allows modinfo to match the help text to the parameter. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>