linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	powerpc: Exception hooks for context tracking subsystem	Li Zhong	2013-05-14	3	-45/+112
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the exception hooks for context tracking subsystem, including data access, program check, single step, instruction breakpoint, machine check, alignment, fp unavailable, altivec assist, unknown exception, whose handlers might use RCU. This patch corresponds to [PATCH] x86: Exception hooks for userspace RCU extended QS commit 6ba3c97a38803883c2eee489505796cb0a727122 But after the exception handling moved to generic code, and some changes in following two commits: 56dd9470d7c8734f055da2a6bac553caf4a468eb context_tracking: Move exception handling to generic code 6c1e0256fad84a843d915414e4b5973b7443d48d context_tracking: Restore correct previous context state on exception exit it is able for exception hooks to use the generic code above instead of a redundant arch implementation. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Syscall hooks for context tracking subsystem	Li Zhong	2013-05-14	2	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the syscall slow path hooks for context tracking subsystem, corresponding to [PATCH] x86: Syscall hooks for userspace RCU extended QS commit bf5a3c13b939813d28ce26c01425054c740d6731 TIF_MEMDIE is moved to the second 16-bits (with value 17), as it seems there is no asm code using it. TIF_NOHZ is added to _TIF_SYCALL_T_OR_A, so it is better for it to be in the same 16 bits with others in the group, so in the asm code, andi. with this group could work. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/booke64: Fix kernel hangs at kernel_dbg_exc	Scott Wood	2013-05-14	2	-4/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	MSR_DE is not cleared on entry to the kernel, and we don't clear it explicitly outside of debug code. If we have MSR_DE set in prime_debug_regs(), and the new thread has events enabled in DBCR0 (e.g. ICMP is set in thread->dbsr0, even though it was cleared in the real DBCR0 when the thread got scheduled out), we'll end up taking a debug exception in the kernel when DBCR0 is loaded. DSRR0 will not point to an exception vector, and the kernel ends up hanging at kernel_dbg_exc. Fix this by always clearing MSR_DE when we load new debug state. Another observed source of kernel_dbg_exc hangs is with the branch taken event. If this event is active, but we take a non-debug trap (e.g. a TLB miss or an asynchronous interrupt) before the next branch. We end up taking a branch-taken debug exception on the initial branch instruction of the exception vector, but because the debug exception is DBSR_BT rather than DBSR_IC we branch to kernel_dbg_exc before even checking the DSRR0 address. Fix this by checking for DBSR_BT as well as DBSR_IC, which is what 32-bit does and what the comments suggest was intended in the 64-bit code as well. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Fix irq_set_affinity() return values	Alexander Gordeev	2013-05-14	4	-4/+4
\| \| \| \| \|	Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Provide __bswapdi2	David Woodhouse	2013-05-14	3	-1/+24
\| \| \| \| \| \| \| \| \| \|	Some versions of GCC apparently expect this to be provided by libgcc. Updates from Mikey to fix 32 bit version and adding "r" to registers. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Fix starting of secondary CPUs on OPALv2 and v3	Benjamin Herrenschmidt	2013-05-14	1	-6/+56
\| \| \| \| \| \| \| \| \|	The current code fails to handle kexec on OPALv2. This fixes it and adds code to improve the situation on OPALv3 where we can query the CPU status from the firmware and decide what to do based on that. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Detect OPAL v3 API version	Benjamin Herrenschmidt	2013-05-14	4	-4/+13
\| \| \| \| \| \|	Future firmwares will support that new version Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Fix MAX_STACK_TRACE_ENTRIES too low warning again	Li Zhong	2013-05-14	3	-4/+1
\| \| \| \| \| \| \| \| \| \| \|	Saw this warning again, and this time from the ret_from_fork path. It seems we could clear the back chain earlier in copy_thread(), which could cover both path, and also fix potential lockdep usage in schedule_tail(), or exception occurred before we clear the back chain. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Make CONFIG_RTAS_PROC depend on CONFIG_PROC_FS	Michael Ellerman	2013-05-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We are getting build errors with CONFIG_PROC_FS=n: arch/powerpc/kernel/rtas_flash.c In function 'rtas_flash_init': 745:33: error: unused variable 'f' [-Werror=unused-variable] But rtas_flash.c should not be built when CONFIG_PROC_FS=n, beacause all it does is provide a /proc interface to the RTAS flash routines. CONFIG_RTAS_FLASH already depends on CONFIG_RTAS_PROC, to indicate that it depends on the RTAS proc support, but CONFIG_RTAS_PROC does not depend on CONFIG_PROC_FS. So fix that. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Bring all threads online prior to migration/hibernation	Robert Jennings	2013-05-14	3	-0/+137
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch brings online all threads which are present but not online prior to migration/hibernation. After migration/hibernation those threads are taken back offline. During migration/hibernation all online CPUs must call H_JOIN, this is required by the hypervisor. Without this patch, threads that are offline (H_CEDE'd) will not be woken to make the H_JOIN call and the OS will be deadlocked (all threads either JOIN'd or CEDE'd). Cc: <stable@kernel.org> Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/rtas_flash: Fix validate_flash buffer overflow issue	Vasant Hegde	2013-05-14	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ibm,validate-flash-image RTAS call output buffer contains 150 - 200 bytes of data on latest system. Presently we have output buffer size as 64 bytes and we use sprintf to copy data from RTAS buffer to local buffer. This causes kernel oops (see below call trace). This patch increases local buffer size to 256 and also uses snprintf instead of sprintf to copy data from RTAS buffer. Kernel call trace : ------------------- Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=1024 NUMA pSeries Modules linked in: nfs fscache lockd auth_rpcgss nfs_acl sunrpc fuse loop dm_mod ipv6 ipv6_lib usb_storage ehea(X) sr_mod qlge ses cdrom enclosure st be2net sg ext3 jbd mbcache usbhid hid ohci_hcd ehci_hcd usbcore qla2xxx usb_common sd_mod crc_t10dif scsi_dh_hp_sw scsi_dh_rdac scsi_dh_alua scsi_dh_emc scsi_dh lpfc scsi_transport_fc scsi_tgt ipr(X) libata scsi_mod Supported: Yes NIP: 4520323031333130 LR: 4520323031333130 CTR: 0000000000000000 REGS: c0000001b91779b0 TRAP: 0400 Tainted: G X (3.0.13-0.27-ppc64) MSR: 8000000040009032 <EE,ME,IR,DR> CR: 44022488 XER: 20000018 TASK = c0000001bca1aba0[4736] 'cat' THREAD: c0000001b9174000 CPU: 36 GPR00: 4520323031333130 c0000001b9177c30 c000000000f87c98 000000000000009b GPR04: c0000001b9177c4a 000000000000000b 3520323031333130 2032303133313031 GPR08: 3133313031350a4d 000000000000009b 0000000000000000 c0000000003664a4 GPR12: 0000000022022448 c000000003ee6c00 0000000000000002 00000000100e8a90 GPR16: 00000000100cb9d8 0000000010093370 000000001001d310 0000000000000000 GPR20: 0000000000008000 00000000100fae60 000000000000005e 0000000000000000 GPR24: 0000000010129350 46573738302e3030 2046573738302e30 300a4d4720323031 GPR28: 333130313520554e 4b4e4f574e0a4d47 2032303133313031 3520323031333130 NIP [4520323031333130] 0x4520323031333130 LR [4520323031333130] 0x4520323031333130 Call Trace: [c0000001b9177c30] [4520323031333130] 0x4520323031333130 (unreliable) Instruction dump: XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/kexec: Fix kexec when using VMX optimised memcpy	Anton Blanchard	2013-05-14	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit b3f271e86e5a (powerpc: POWER7 optimised memcpy using VMX and enhanced prefetch) uses VMX when it is safe to do so (ie not in interrupt). It also looks at the task struct to decide if we have to save the current tasks' VMX state. kexec calls memcpy() at a point where the task struct may have been overwritten by the new kexec segments. If it has been overwritten then when memcpy -> enable_altivec looks up current->thread.regs->msr we get a cryptic oops or lockup. I also notice we aren't initialising thread_info->cpu, which means smp_processor_id is broken. Fix that too. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@vger.kernel.org> # 3.6+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Fix build errors STRICT_MM_TYPECHECKS	Aneesh Kumar K.V	2013-05-14	3	-5/+5
\| \| \| \| \|	Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/mm: Use the correct mask value when looking at pgtable address	Aneesh Kumar K.V	2013-05-14	1	-1/+1
\| \| \| \| \| \| \| \|	Our pgtable are 2sizeof(pte_t)PTRS_PER_PTE which is PTE_FRAG_SIZE. Instead of depending on frag size, mask with PMD_MASKED_BITS. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: hard_irq_disable(): Call trace_hardirqs_off after disabling	Scott Wood	2013-05-10	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	lockdep.c has this: /* * So we're supposed to get called after you mask local IRQs, * but for some reason the hardware doesn't quite think you did * a proper job. */ if (DEBUG_LOCKS_WARN_ON(!irqs_disabled())) return; Since irqs_disabled() is based on soft_enabled(), that (not just the hard EE bit) needs to be 0 before we call trace_hardirqs_off. Signed-off-by: Scott Wood <scottwood@freescale.com>
*	powerpc/powernv: Improve kexec reliability	Benjamin Herrenschmidt	2013-05-10	7	-0/+56
\| \| \| \| \| \| \| \| \| \|	We add a machine_shutdown hook that frees the OPAL interrupts (so they get masked at the source and don't fire while kexec'ing) and which triggers an IODA reset on all the PCIe host bridges which will have the effect of blocking all DMAs and subsequent PCIs interrupts. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Properly drop characters if console is closed	Benjamin Herrenschmidt	2013-05-08	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If the firmware returns an error such as "closed" (or hardware error), we should drop characters. Currently we only do that when a firmware compatible with OPAL v2 APIs is detected, in the code that calls opal_console_write_buffer_space(), which didn't exist with OPAL v1 (or didn't work). However, when enabling early debug consoles, the flag indicating that v2 is supported isn't set yet, causing us, in case of errors or closed console, to spin forever. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Add an in memory udbg console	Alistair Popple	2013-05-07	5	-0/+134
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a new udbg early debug console which utilises statically defined input and output buffers stored within the kernel BSS. It is primarily designed to assist with bring up of new hardware which may not have a working console but which has a method of reading/writing kernel memory. This version incorporates comments made by Ben H (thanks!). Changes from v1: - Add memory barriers. - Ensure updating of read/write positions is atomic. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Make hard_irq_disable() do the right thing vs. irq tracing	Benjamin Herrenschmidt	2013-05-07	1	-9/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If hard_irq_disable() is called while interrupts are already soft-disabled (which is the most common case) all is already well. However you can (and in some cases want) to call it while everything is enabled (to make sure you don't get a lazy even, for example before entry into KVM guests) and in this case we need to inform the irq tracer that the irqs are going off. We have to change the inline into a macro to avoid an include circular dependency hell hole. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/topology: Fix spurr attribute permission	Benjamin Herrenschmidt	2013-05-06	1	-1/+1
\| \| \| \| \| \| \| \|	We are registering the attribute with permission 0600 but it doesn't have a store callback, which causes WARN_ON's during boot. Fix the permission. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pci: Support per-aperture memory offset	Benjamin Herrenschmidt	2013-05-06	10	-104/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PCI core supports an offset per aperture nowadays but our arch code still has a single offset per host bridge representing the difference betwen CPU memory addresses and PCI MMIO addresses. This is a problem as new machines and hypervisor versions are coming out where the 64-bit windows will have a different offset (basically mapped 1:1) from the 32-bit windows. This fixes it by using separate offsets. In the long run, we probably want to get rid of that intermediary struct pci_controller and have those directly stored into the pci_host_bridge as they are parsed but this will be a more invasive change. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/cell/iommu: Improve error message for missing node	Benjamin Herrenschmidt	2013-05-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	Some devices don't have a correct node ID and thus can't be attached to an iommu. The message displayed by the iommu code isn't very useful if you don't have a device-tree node as it tries to print the device-tree path but not the struct device name. Improve this by printing the device name as well. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/cell/spufs: Fix status attribute permission	Benjamin Herrenschmidt	2013-05-06	1	-1/+1
\| \| \| \| \| \| \| \|	We are registering the attribute with permission 0644 but it doesn't have a store callback, which causes WARN_ON's during boot. Fix the permission. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	irqdomain: Allow quiet failure mode	Benjamin Herrenschmidt	2013-05-06	1	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some interrupt controllers refuse to map interrupts marked as "protected" by firwmare. Since we try to map everyting in the device-tree on some platforms, we end up with a lot of nasty WARN's in the boot log for what is a normal situation on those machines. This defines a specific return code (-EPERM) from the host map() callback which cause irqdomain to fail silently. MPIC is updated to return this when hitting a protected source printing only a single line message for diagnostic purposes. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pnv: Fix "compatible" property for P8 PHB	Benjamin Herrenschmidt	2013-05-06	1	-1/+1
\| \| \| \| \| \| \|	The property should be "ibm,power8-pciex", not "ibm,p8-pciex". The latter was changed in FW because it was inconsistent with the rest of the nodes. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pci: Don't add bogus empty resources to PHBs	Benjamin Herrenschmidt	2013-05-06	1	-14/+16
\| \| \| \| \| \| \| \|	When converting to use the new pci_add_resource_offset() we didn't properly account for empty resources (0 flags) and add those bogons to the PHBs. The result is some annoying messages in the log. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powerpnv: Properly handle failure starting CPUs	Benjamin Herrenschmidt	2013-05-06	1	-1/+3
\| \| \| \| \| \| \| \|	If OPAL returns an error, propagate it upward rather than spinning seconds waiting for a CPU that will never show up Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/cputable: Advertise support for ISEL/HTM/DSCR/TAR on POWER8	Nishanth Aravamudan	2013-05-06	1	-0/+5
\| \| \| \| \|	Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/cputable: Advertise ISEL support on appropriate embedded processors	Nishanth Aravamudan	2013-05-06	1	-0/+5
\| \| \| \| \|	Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/cputable: Advertise DSCR support on P7/P7+	Nishanth Aravamudan	2013-05-06	1	-0/+4
\| \| \| \| \|	Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/cputable: Reserve bits in HWCAP2 for new features	Nishanth Aravamudan	2013-05-06	2	-0/+11
\| \| \| \| \| \| \|	Also, make HTM's presence dependent on the .config option. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pseries: Perform proper max_bus_speed detection	Kleber Sacilotto de Souza	2013-05-06	5	-0/+70
\| \| \| \| \| \| \| \| \| \| \| \| \|	On pseries machines the detection for max_bus_speed should be done through an OpenFirmware property. This patch adds a function to perform this detection and a hook to perform dynamic adding of the function only for pseries. This is done by overwriting the weak pcibios_root_bridge_prepare function which is called by pci_create_root_bus(). From: Lucas Kannebley Tavares <lucaskt@linux.vnet.ibm.com> Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pseries: Force 32 bit MSIs for devices that require it	Brian King	2013-05-06	2	-3/+20
\| \| \| \| \| \| \| \| \| \| \|	The following patch implements a new PAPR change which allows the OS to force the use of 32 bit MSIs, regardless of what the PCI capabilities indicate. This is required for some devices that advertise support for 64 bit MSIs but don't actually support them. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/tm: Fix null pointer deference in flush_hash_page	Michael Neuling	2013-05-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Make sure that current->thread.reg exists before we deference it in flush_hash_page. Signed-off-by: Michael Neuling <mikey@neuling.org> Reported-by: John J Miller <millerjo@us.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Defer OPAL exception handler registration	Jeremy Kerr	2013-05-06	1	-2/+13
\| \| \| \| \| \| \| \| \| \| \| \|	Currently, the OPAL exception vectors are registered before the feature fixups are processed. This means that the now-firmware-owned vectors will likely be overwritten by the kernel. This change moves the exception registration code to an early initcall, rather than at machine_init time. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Emulate non privileged DSCR read and write	Anton Blanchard	2013-05-06	2	-2/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	POWER8 allows read and write of the DSCR in userspace. We added kernel emulation so applications could always use the instructions regardless of the CPU type. Unfortunately there are two SPRs for the DSCR and we only added emulation for the privileged one. Add code to match the non privileged one. A simple test was created to verify the fix: http://ozlabs.org/~anton/junkcode/user_dscr_test.c Without the patch we get a SIGILL and it passes with the patch. Signed-off-by: Anton Blanchard <anton@samba.org> Cc: <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	Merge tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	2013-05-05	93	-1849/+7424
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull kvm updates from Gleb Natapov: "Highlights of the updates are: general: - new emulated device API - legacy device assignment is now optional - irqfd interface is more generic and can be shared between arches x86: - VMCS shadow support and other nested VMX improvements - APIC virtualization and Posted Interrupt hardware support - Optimize mmio spte zapping ppc: - BookE: in-kernel MPIC emulation with irqfd support - Book3S: in-kernel XICS emulation (incomplete) - Book3S: HV: migration fixes - BookE: more debug support preparation - BookE: e6500 support ARM: - reworking of Hyp idmaps s390: - ioeventfd for virtio-ccw And many other bug fixes, cleanups and improvements" * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits) kvm: Add compat_ioctl for device control API KVM: x86: Account for failing enable_irq_window for NMI window request KVM: PPC: Book3S: Add API for in-kernel XICS emulation kvm/ppc/mpic: fix missing unlock in set_base_addr() kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write kvm/ppc/mpic: remove users kvm/ppc/mpic: fix mmio region lists when multiple guests used kvm/ppc/mpic: remove default routes from documentation kvm: KVM_CAP_IOMMU only available with device assignment ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ...
\| *	Merge branch 'kvm-arm-for-3.10' of git://github.com/columbia/linux-kvm-arm ↵	Marcelo Tosatti	2013-05-03	14	-320/+526
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	into queue * 'kvm-arm-for-3.10' of git://github.com/columbia/linux-kvm-arm: ARM: KVM: iterate over all CPUs for CPU compatibility check KVM: ARM: Fix spelling in error message ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally KVM: ARM: Fix API documentation for ONE_REG encoding ARM: KVM: promote vfp_host pointer to generic host cpu context ARM: KVM: add architecture specific hook for capabilities ARM: KVM: perform HYP initilization for hotplugged CPUs ARM: KVM: switch to a dual-step HYP init code ARM: KVM: rework HYP page table freeing ARM: KVM: enforce maximum size for identity mapped code ARM: KVM: move to a KVM provided HYP idmap ARM: KVM: fix HYP mapping limitations around zero ARM: KVM: simplify HYP mapping population ARM: KVM: arch_timer: use symbolic constants ARM: KVM: add support for minimal host vs guest profiling
\| \| *	ARM: KVM: iterate over all CPUs for CPU compatibility check	Andre Przywara	2013-04-29	1	-3/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kvm_target_cpus() checks the compatibility of the used CPU with KVM, which is currently limited to ARM Cortex-A15 cores. However by calling it only once on any random CPU it assumes that all cores are the same, which is not necessarily the case (for example in Big.Little). [ I cut some of the commit message and changed the formatting of the code slightly to pass checkpatch and look more like the rest of the kvm/arm init code - Christoffer ] Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
\| \| *	KVM: ARM: Fix spelling in error message	Christoffer Dall	2013-04-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	s/unkown/unknown/ Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
\| \| *	ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally	Arnd Bergmann	2013-04-29	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The CONFIG_KVM_ARM_MAX_VCPUS symbol is needed in order to build the kernel/context_tracking.c code, which includes the vgic data structures implictly through the kvm headers. Definining the symbol to zero on builds without KVM resolves this build error: In file included from include/linux/kvm_host.h:33:0, from kernel/context_tracking.c:18: arch/arm/include/asm/kvm_host.h:28:23: warning: "CONFIG_KVM_ARM_MAX_VCPUS" is not defined [-Wundef] #define KVM_MAX_VCPUS CONFIG_KVM_ARM_MAX_VCPUS ^ arch/arm/include/asm/kvm_vgic.h:34:24: note: in expansion of macro 'KVM_MAX_VCPUS' #define VGIC_MAX_CPUS KVM_MAX_VCPUS ^ arch/arm/include/asm/kvm_vgic.h:38:6: note: in expansion of macro 'VGIC_MAX_CPUS' #if (VGIC_MAX_CPUS > 8) ^ In file included from arch/arm/include/asm/kvm_host.h:41:0, from include/linux/kvm_host.h:33, from kernel/context_tracking.c:18: arch/arm/include/asm/kvm_vgic.h:59:11: error: 'CONFIG_KVM_ARM_MAX_VCPUS' undeclared here (not in a function) } percpu[VGIC_MAX_CPUS]; ^ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <cdall@cs.columbia.edu> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
\| \| *	ARM: KVM: promote vfp_host pointer to generic host cpu context	Marc Zyngier	2013-04-29	3	-18/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We use the vfp_host pointer to store the host VFP context, should the guest start using VFP itself. Actually, we can use this pointer in a more generic way to store CPU speficic data, and arm64 is using it to dump the whole host state before switching to the guest. Simply rename the vfp_host field to host_cpu_context, and the corresponding type to kvm_cpu_context_t. No change in functionnality. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
\| \| *	ARM: KVM: add architecture specific hook for capabilities	Marc Zyngier	2013-04-29	2	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most of the capabilities are common to both arm and arm64, but we still need to handle the exceptions. Introduce kvm_arch_dev_ioctl_check_extension, which both architectures implement (in the 32bit case, it just returns 0). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
\| \| *	ARM: KVM: perform HYP initilization for hotplugged CPUs	Marc Zyngier	2013-04-29	3	-22/+63
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we have the necessary infrastructure to boot a hotplugged CPU at any point in time, wire a CPU notifier that will perform the HYP init for the incoming CPU. Note that this depends on the platform code and/or firmware to boot the incoming CPU with HYP mode enabled and return to the kernel by following the normal boot path (HYP stub installed). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
\| \| *	ARM: KVM: switch to a dual-step HYP init code	Marc Zyngier	2013-04-29	5	-79/+197
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Our HYP init code suffers from two major design issues: - it cannot support CPU hotplug, as we tear down the idmap very early - it cannot perform a TLB invalidation when switching from init to runtime mappings, as pages are manipulated from PL1 exclusively The hotplug problem mandates that we keep two sets of page tables (boot and runtime). The TLB problem mandates that we're able to transition from one PGD to another while in HYP, invalidating the TLBs in the process. To be able to do this, we need to share a page between the two page tables. A page that will have the same VA in both configurations. All we need is a VA that has the following properties: - This VA can't be used to represent a kernel mapping. - This VA will not conflict with the physical address of the kernel text The vectors page seems to satisfy this requirement: - The kernel never maps anything else there - The kernel text being copied at the beginning of the physical memory, it is unlikely to use the last 64kB (I doubt we'll ever support KVM on a system with something like 4MB of RAM, but patches are very welcome). Let's call this VA the trampoline VA. Now, we map our init page at 3 locations: - idmap in the boot pgd - trampoline VA in the boot pgd - trampoline VA in the runtime pgd The init scenario is now the following: - We jump in HYP with four parameters: boot HYP pgd, runtime HYP pgd, runtime stack, runtime vectors - Enable the MMU with the boot pgd - Jump to a target into the trampoline page (remember, this is the same physical page!) - Now switch to the runtime pgd (same VA, and still the same physical page!) - Invalidate TLBs - Set stack and vectors - Profit! (or eret, if you only care about the code). Note that we keep the boot mapping permanently (it is not strictly an idmap anymore) to allow for CPU hotplug in later patches. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
\| \| *	ARM: KVM: rework HYP page table freeing	Marc Zyngier	2013-04-29	3	-103/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no point in freeing HYP page tables differently from Stage-2. They now have the same requirements, and should be dealt with the same way. Promote unmap_stage2_range to be The One True Way, and get rid of a number of nasty bugs in the process (good thing we never actually called free_hyp_pmds before...). Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
\| \| *	ARM: KVM: enforce maximum size for identity mapped code	Marc Zyngier	2013-04-29	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're about to move to an init procedure where we rely on the fact that the init code fits in a single page. Make sure we align the idmap text on a vector alignment, and that the code is not bigger than a single page. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
\| \| *	ARM: KVM: move to a KVM provided HYP idmap	Marc Zyngier	2013-04-29	4	-34/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After the HYP page table rework, it is pretty easy to let the KVM code provide its own idmap, rather than expecting the kernel to provide it. It takes actually less code to do so. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
\| \| *	ARM: KVM: fix HYP mapping limitations around zero	Marc Zyngier	2013-04-29	1	-11/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current code for creating HYP mapping doesn't like to wrap around zero, which prevents from mapping anything into the last page of the virtual address space. It doesn't take much effort to remove this limitation, making the code more consistent with the rest of the kernel in the process. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>
\| \| *	ARM: KVM: simplify HYP mapping population	Marc Zyngier	2013-04-29	1	-60/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The way we populate HYP mappings is a bit convoluted, to say the least. Passing a pointer around to keep track of the current PFN is quite odd, and we end-up having two different PTE accessors for no good reason. Simplify the whole thing by unifying the two PTE accessors, passing a pgprot_t around, and moving the various validity checks to the upper layers. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu>