linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	ARM: dove: fix legacy dove IRQ numbers	Russell King	2015-06-24	2	-63/+63
\| \| \| \| \| \| \| \| \| \| \| \| \|	v3.18 changed handle_IRQ() to call __handle_domain_irq(), which now rejects attempts to deliver IRQ0. Since IRQ 0 is used as the timer interrupt (just like the PIT on x86), this causes boot to fail as the bogomips calibration never completes. Fix this by shuffling all interrupts up by one. Fixes: a71b092a9c68 ("ARM: Convert handle_IRQ to use __handle_domain_irq") Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
*	ARM: mvebu: fix suspend to RAM on big-endian configurations	Thomas Petazzoni	2015-06-17	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current Armada XP suspend to RAM implementation, as added in commit 27432825ae19f ("ARM: mvebu: Armada XP GP specific suspend/resume code") does not handle big-endian configurations properly: the small bit of assembly code putting the DRAM in self-refresh and toggling the GPIOs to turn off power forgets to convert the values to little-endian. This commit fixes that by making sure the two values we will write to the DRAM controller register and GPIO register are already in little-endian before entering the critical assembly code. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: <stable@vger.kernel.org> # v3.19+ Fixes: 27432825ae19f ("ARM: mvebu: Armada XP GP specific suspend/resume code")
*	ARM: mvebu: armada-xp-linksys-mamba: Disable internal RTC	Imre Kaloz	2015-05-28	1	-0/+5
\| \| \| \| \| \| \| \| \| \|	The Mamba (like the OpenBlocks AX3) doesn't have a crystal connected to the internal RTC - let's prevent the kernel from probing it. Signed-off-by: Imre Kaloz <kaloz@openwrt.org> Cc: <stable@vger.kernel.org> # v4.0 + Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
*	ARM: dove: Add clock-names to CuBox Si5351 clk generator	Sebastian Hesselbarth	2015-05-11	1	-0/+1
\| \| \| \| \| \| \|	Si5351 clock generator on CuBox uses XTAL as clock reference, name the clock phandle accordingly. Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
*	ARM: mvebu: Fix the main PLL frequency on Armada 375, 38x and 39x SoCs	Gregory CLEMENT	2015-05-01	3	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Whereas for Armada 370 and XP the main PLL frequency was 2GHz for the Armada 375, 38x and 39x, the frequency is 1GHz. When writing support for these last SoCs, there was no official value for the PLL. Now that we have it, this patch fixes it in the device tree. This value is currently only used by the NAND driver for the setting the NAND timing. Fortunately it is not actually used: all the mainline board with a NAND flash comes with a NAND device tree node using the "marvell,nand-keep-config" property. With this property the timings are not modified in the kernel driver and are kept from the bootloader. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Acked-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Marcin Wojtas <mw@semihalf.com>
*	ARM: mvebu: armada-xp-openblocks-ax3-4: Disable internal RTC	Gregory CLEMENT	2015-04-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \|	There is no crystal connected to the internal RTC on the Open Block AX3. So let's disable it in order to prevent the kernel probing the driver uselessly. Eventually this patches removes the following warning message from the boot log: "rtc-mv d0010300.rtc: internal RTC not ticking" Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Cc: <stable@vger.kernel.org> # v3.8 +
*	x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue	Andy Lutomirski	2015-04-27	5	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with SS == 0 results in an invalid usermode state in which SS is apparently equal to __USER_DS but causes #SS if used. Work around the issue by setting SS to __KERNEL_DS __switch_to, thus ensuring that SYSRET never happens with SS set to NULL. This was exposed by a recent vDSO cleanup. Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2015-04-27	4	-22/+22
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only
\| *	VFS: assorted d_backing_inode() annotations	David Howells	2015-04-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	VFS: assorted weird filesystems: d_inode() annotations	David Howells	2015-04-15	3	-21/+21
\| \| \| \| \| \| \| \| \| \|	Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6	Linus Torvalds	2015-04-26	1	-1/+1
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull crypto fixes from Herbert Xu: "This push fixes a build problem with img-hash under non-standard configurations and a serious regression with sha512_ssse3 which can lead to boot failures" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: img-hash - CRYPTO_DEV_IMGTEC_HASH should depend on HAS_DMA crypto: x86/sha512_ssse3 - fixup for asm function prototype change
\| * \|	crypto: x86/sha512_ssse3 - fixup for asm function prototype change	Ard Biesheuvel	2015-04-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Patch e68410ebf626 ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer") changed the prototypes of the core asm SHA-512 implementations so that they are compatible with the prototype used by the base layer. However, in one instance, the register that was used for passing the input buffer was reused as a scratch register later on in the code, and since the input buffer param changed places with the digest param -which needs to be written back before the function returns- this resulted in the scratch register to be dereferenced in a memory write operation, causing a GPF. Fix this by changing the scratch register to use the same register as the input buffer param again. Fixes: e68410ebf626 ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer") Reported-By: Bobby Powers <bobbypowers@gmail.com> Tested-By: Bobby Powers <bobbypowers@gmail.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* \| \|	Merge tag 'cris-for-4.1' of ↵	Linus Torvalds	2015-04-26	47	-1145/+285
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris Pull arch/cris updates from Jesper Nilsson: "Some much needed love for the CRIS-port. There's a bunch of changes this time, giving the CRISv32 port a bit of modern makeover with device-tree, irq domain and gpiolib support, and more switchover to generic frameworks. Some small fixes and removal of the theoretical SMP support brings up the rear" * tag 'cris-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jesper/cris: cris: fix integer overflow in ELF_ET_DYN_BASE CRISv32: use GENERIC_SCHED_CLOCK CRISv32: use MMIO clocksource CRISv32: use generic clockevents CRIS: use generic headers via Kbuild CRIS: use generic cmpxchg.h CRIS: use generic atomic.h CRIS: use generic atomic bitops CRISv10: remove redundant macros from system.h CRIS: remove SMP code CRISv32: don't enable irqs in INIT_THREAD CRISv32: handle multiple signals CRISv32: prevent bogus restarts on sigreturn CRISv32: don't attempt syscall restart on irq exit Add binding documentation for CRIS CRIS: add Axis 88 board device tree CRISv32: add device tree support CRISv32: add irq domains support CRIS: enable GPIOLIB
\| * \| \|	cris: fix integer overflow in ELF_ET_DYN_BASE	Andrey Ryabinin	2015-03-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Almost all arches define ELF_ET_DYN_BASE as 2/3 of TASK_SIZE. Though it seems that some architectures do this in a wrong way. The problem is that 2TASK_SIZE may overflow 32-bits so the real ELF_ET_DYN_BASE becomes wrong. Fix this overflow by dividing TASK_SIZE prior to multiplying: (TASK_SIZE / 3 2) Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRISv32: use GENERIC_SCHED_CLOCK	Rabin Vincent	2015-03-25	3	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Provide a fast sched clock using the free-running timer and the generic sched_clock infrastructure. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRISv32: use MMIO clocksource	Rabin Vincent	2015-03-25	2	-21/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use a generic MMIO clocksource and get rid of some lines of code. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRISv32: use generic clockevents	Rabin Vincent	2015-03-25	2	-60/+86
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement a oneshot-capable clockevents device so we get support for things like hrtimers and NOHZ. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRIS: use generic headers via Kbuild	Rabin Vincent	2015-03-25	13	-53/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Delete headers which do nothing but include the asm-generic versions and use Kbuild magic instead. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRIS: use generic cmpxchg.h	Rabin Vincent	2015-03-25	2	-51/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CRIS can use asm-generic's cmpxchg.h Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRIS: use generic atomic.h	Rabin Vincent	2015-03-25	4	-165/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CRIS can use asm-generic's atomic.h. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRIS: use generic atomic bitops	Rabin Vincent	2015-03-25	1	-110/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The generic atomic bitops are the same as the CRIS-specific ones. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRISv10: remove redundant macros from system.h	Rabin Vincent	2015-03-25	1	-8/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	All of these are either unused or already provided by other headers, so they can be removed. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRIS: remove SMP code	Rabin Vincent	2015-03-25	17	-638/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The CRIS SMP code cannot be built since there is no (and appears to never have been) a CONFIG_SMP Kconfig option in arch/cris/. Remove it. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRISv32: don't enable irqs in INIT_THREAD	Rabin Vincent	2015-03-25	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	INIT_THREAD enables interrupts in the thread_struct's saved flags. This means that interrupts get enabled in the middle of context_switch() while switching to new tasks that get forked off the init task during boot. Don't do this. Fixes the following splat on boot with spinlock debugging on: BUG: spinlock cpu recursion on CPU#0, swapper/2 lock: runqueues+0x0/0x47c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0 CPU: 0 PID: 2 Comm: swapper Not tainted 3.19.0-08796-ga747b55 #285 Call Trace: [<c0032b80>] spin_bug+0x2a/0x36 [<c0032c98>] do_raw_spin_lock+0xa2/0x126 [<c01964b0>] _raw_spin_lock+0x20/0x2a [<c00286c8>] scheduler_tick+0x22/0x76 [<c003db2c>] update_process_times+0x5e/0x72 [<c0007a94>] timer_interrupt+0x4e/0x6a [<c00378d6>] handle_irq_event_percpu+0x54/0xf2 [<c00379c4>] handle_irq_event+0x50/0x74 [<c003988e>] handle_simple_irq+0x6c/0xbe [<c0037270>] generic_handle_irq+0x2a/0x36 [<c0004c40>] do_IRQ+0x38/0x84 [<c000662e>] crisv32_do_IRQ+0x54/0x60 [<c0006204>] IRQ0x4b_interrupt+0x34/0x3c [<c0192baa>] __schedule+0x24a/0x532 [<c00056b4>] ret_from_kernel_thread+0x0/0x14 Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRISv32: handle multiple signals	Rabin Vincent	2015-03-25	2	-32/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Al Viro noted that CRIS fails to handle multiple signals. This fixes the problem for CRISv32 by making it use a C work_pending handling loop similar to the ARM implementation in 0a267fa6a15d41c ("ARM: 7472/1: pull all work_pending logics into C function"). This also happens to fixes the warnings which currently trigger on CRISv32 due to do_signal() being called with interrupts disabled. Test case (should die of the SIGSEGV which gets raised when setting up the stack for SIGALRM, but instead reaches and executes the _exit(1)): #include <unistd.h> #include <signal.h> #include <sys/time.h> #include <err.h> static void handler(int sig) { } int main(int argc, char *argv[]) { int ret; struct itimerval t1 = { .it_value = {1} }; stack_t ss = { .ss_sp = NULL, .ss_size = SIGSTKSZ, }; struct sigaction action = { .sa_handler = handler, .sa_flags = SA_ONSTACK, }; ret = sigaltstack(&ss, NULL); if (ret < 0) err(1, "sigaltstack"); sigaction(SIGALRM, &action, NULL); setitimer(ITIMER_REAL, &t1, NULL); pause(); _exit(1); return 0; } Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Link: http://lkml.kernel.org/r/20121208074429.GC4939@ZenIV.linux.org.uk Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRISv32: prevent bogus restarts on sigreturn	Rabin Vincent	2015-03-25	2	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Al Viro noted that CRIS is vulnerable to bogus restarts on sigreturn. The fixes CRISv32 by using regs->exs as an additional indicator to whether we should attempt to restart the syscall or not. EXS is only used in the sigtrap handling, and in that path we already have r9 (the other indicator, which indicates if we're in a syscall or not) cleared. Test case, a port of Al's ARM version from 653d48b22166db2d8 ("arm: fix really nasty sigreturn bug"): #include <unistd.h> #include <signal.h> #include <stdlib.h> #include <sys/time.h> #include <errno.h> void f(int n) { register int r10 asm ("r10") = n; __asm__ __volatile__( "ba 1f \n" "nop \n" "break 8 \n" "1: ba . \n" "nop \n" : : "r" (r10) : "memory"); } void handler1(int sig) { } void handler2(int sig) { raise(1); } void handler3(int sig) { exit(0); } int main(int argc, char argv[]) { struct sigaction s = {.sa_handler = handler2}; struct itimerval t1 = { .it_value = {1} }; struct itimerval t2 = { .it_value = {2} }; signal(1, handler1); sigemptyset(&s.sa_mask); sigaddset(&s.sa_mask, 1); sigaction(SIGALRM, &s, NULL); signal(SIGVTALRM, handler3); setitimer(ITIMER_REAL, &t1, NULL); setitimer(ITIMER_VIRTUAL, &t2, NULL); f(-513); / -ERESTARTNOINTR */ return 0; } Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Link: http://lkml.kernel.org/r/20121208074429.GC4939@ZenIV.linux.org.uk Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRISv32: don't attempt syscall restart on irq exit	Rabin Vincent	2015-03-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r9 is used to determine whether syscall restarting must be performed or not. Unfortunately, r9 is never set to zero in the non-syscall path, and r9 is on top of that a callee-saved register which can be set to non-zero by the C functions that are called during IRQ handling. This means that if r10 (used for the syscall return value) is one of the -ERESTART* values when a hardware interrupt occurs which leads to a signal being delivered to the process, the kernel will "restart" a syscall which never occurred. This will lead to the PC being moved back by 2 on return to user space. Fix the problem by setting r9 to zero in the interrupt path. Test case (should loop forever but ends up executing the break 8 trap instruction): #include <signal.h> #include <stdlib.h> #include <sys/time.h> void f(int n) { register int r9 asm ("r9") = 1; register int r10 asm ("r10") = n; __asm__ __volatile__( "ba 1f \n" "nop \n" "break 8 \n" "1: ba . \n" "nop \n" : : "r" (r9), "r" (r10) : "memory"); } void handler1(int sig) { } int main(int argc, char argv[]) { struct itimerval t1 = { .it_value = {1} }; signal(SIGALRM, handler1); setitimer(ITIMER_REAL, &t1, NULL); f(-513); / -ERESTARTNOINTR */ return 0; } Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRIS: add Axis 88 board device tree	Rabin Vincent	2015-03-25	2	-0/+56
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a minimal device tree for the ETRAX FS SoC and the Axis 88 developer board. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jespern@axis.com>
\| * \| \|	CRISv32: add device tree support	Rabin Vincent	2015-03-25	6	-0/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for booting CRISv32 with a built-in device tree. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
\| * \| \|	CRISv32: add irq domains support	Rabin Vincent	2015-03-25	2	-3/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for IRQ domains to the CRISv32 interrupt controller. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
\| * \| \|	CRIS: enable GPIOLIB	Rabin Vincent	2015-03-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable GPIOLIB on CRIS so that we can use the generic GPIO APIs. Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
* \| \| \|	Merge tag 'powerpc-4.1-2' of ↵	Linus Torvalds	2015-04-26	11	-117/+137
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux Pull powerpc fixes from Michael Ellerman: - fix for mm_dec_nr_pmds() from Scott. - fixes for oopses seen with KVM + THP from Aneesh. - build fixes from Aneesh & Shreyas. * tag 'powerpc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error powerpc/mm/thp: Return pte address if we find trans_splitting. powerpc/mm/thp: Make page table walk safe against thp split/collapse KVM: PPC: Remove page table walk helpers KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()
\| * \| \| \|	powerpc/mm: Fix build error with CONFIG_PPC_TRANSACTIONAL_MEM disabled	Aneesh Kumar K.V	2015-04-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fix the below build error arch/powerpc/mm/hash_utils_64.c: In function ‘flush_hash_hugepage’: arch/powerpc/mm/hash_utils_64.c:1381:1: error: label at end of compound statement tm_abort: ^ make[1]: *** [arch/powerpc/mm/hash_utils_64.o] Error 1 Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
\| * \| \| \|	powerpc/kvm: Fix ppc64_defconfig + PPC_POWERNV=n build error	Shreyas B. Prabhu	2015-04-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kvm_no_guest() calls power7_wakeup_loss() to put the thread into the deepest supported idle state. power7_wakeup_loss() is defined in arch/powerpc/kernel/idle_power7.S, which is compiled only when PPC_P7_NAP=y. And PPC_P7_NAP is selected when PPC_POWERNV=y. Hence in cases where PPC_POWERNV=n and KVM_BOOK3S_64_HV=y we see the following error: arch/powerpc/kvm/built-in.o: In function `kvm_no_guest': arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x42c): undefined reference to `power7_wakeup_loss' Fix this by adding PPC_POWERNV as a dependency for KVM_BOOK3S_64_HV. Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
\| * \| \| \|	powerpc/mm/thp: Return pte address if we find trans_splitting.	Aneesh Kumar K.V	2015-04-17	4	-22/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For THP that is marked trans splitting, we return the pte. This require the callers to handle the pmd_trans_splitting scenario, if they care. All the current callers are either looking at pfn or write_ok, hence we don't need to update them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
\| * \| \| \|	powerpc/mm/thp: Make page table walk safe against thp split/collapse	Aneesh Kumar K.V	2015-04-17	9	-36/+88
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can disable a THP split or a hugepage collapse by disabling irq. We do send IPI to all the cpus in the early part of split/collapse, and disabling local irq ensure we don't make progress with split/collapse. If the THP is getting split we return NULL from find_linux_pte_or_hugepte(). For all the current callers it should be ok. We need to be careful if we want to use returned pte_t pointer outside the irq disabled region. W.r.t to THP split, the pfn remains the same, but then a hugepage collapse will result in a pfn change. There are few steps we can take to avoid a hugepage collapse.One way is to take page reference inside the irq disable region. Other option is to take mmap_sem so that a parallel collapse will not happen. We can also disable collapse by taking pmd_lock. Another method used by kvm subsystem is to check whether we had a mmu_notifer update in between using mmu_notifier_retry(). Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
\| * \| \| \|	KVM: PPC: Remove page table walk helpers	Aneesh Kumar K.V	2015-04-17	3	-57/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch remove helpers which we had used only once in the code. Limiting page table walk variants help in ensuring that we won't end up with code walking page table with wrong assumptions. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
\| * \| \| \|	KVM: PPC: Use READ_ONCE when dereferencing pte_t pointer	Aneesh Kumar K.V	2015-04-17	2	-9/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pte can get updated from other CPUs as part of multiple activities like THP split, huge page collapse, unmap. We need to make sure we don't reload the pte value again and again for different checks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
\| * \| \| \|	Merge branch 'master' of ↵	Michael Ellerman	2015-04-17	1	-0/+1
\| \|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into fixes
\| \| * \| \| \|	powerpc/hugetlb: Call mm_dec_nr_pmds() in hugetlb_free_pmd_range()	Scott Wood	2015-04-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit dc6c9a35b66b5 ("mm: account pmd page tables to the process") added a counter that is incremented whenever a PMD is allocated and decremented whenever a PMD is freed. For hugepages on PPC, common code is used to allocated PMDs, but arch-specific code is used to free PMDs. This results in kernel output such as "BUG: non-zero nr_pmds on freeing mm: 1" when using hugepages. Update the PPC hugepage PMD freeing code to decrement the count, just as the above commit did for free_pmd_range(). Fixes: dc6c9a35b66b5 ("mm: account pmd page tables to the process") Signed-off-by: Scott Wood <scottwood@freescale.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # 4.0.x
* \| \| \| \| \|	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	2015-04-26	29	-391/+1607
\|\ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull second batch of KVM changes from Paolo Bonzini: "This mostly includes the PPC changes for 4.1, which this time cover Book3S HV only (debugging aids, minor performance improvements and some cleanups). But there are also bug fixes and small cleanups for ARM, x86 and s390. The task_migration_notifier revert and real fix is still pending review, but I'll send it as soon as possible after -rc1" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits) KVM: arm/arm64: check IRQ number on userland injection KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi KVM: VMX: Preserve host CR4.MCE value while in guest mode. KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C KVM: PPC: Book3S HV: Streamline guest entry and exit KVM: PPC: Book3S HV: Use bitmap of active threads rather than count KVM: PPC: Book3S HV: Use decrementer to wake napping threads KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu KVM: PPC: Book3S HV: Minor cleanups KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update KVM: PPC: Book3S HV: Accumulate timing information for real-mode code KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT KVM: PPC: Book3S HV: Add ICP real mode counters KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock KVM: PPC: Book3S HV: Add guest->host real mode completion counters KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte ...
\| * \ \ \ \ \	Merge tag 'kvm-arm-for-4.1-take2' of ↵	Paolo Bonzini	2015-04-22	3	-4/+15
\| \|\ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/ARM changes for v4.1, take #2: Rather small this time: - a fix for a nasty bug with virtual IRQ injection - a fix for irqfd
\| \| * \| \| \| \| \|	KVM: arm/arm64: check IRQ number on userland injection	Andre Przywara	2015-04-22	3	-4/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently only check it against a fixed limit, which historically is set to 127. With the new dynamic IRQ allocation the effective limit may actually be smaller (64). So when now a malicious or buggy userland injects a SPI in that range, we spill over on our VGIC bitmaps and bytemaps memory. I could trigger a host kernel NULL pointer dereference with current mainline by injecting some bogus IRQ number from a hacked kvmtool: ----------------- .... DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1) DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1) DEBUG: IRQ #114 still in the game, writing to bytemap now... Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = ffffffc07652e000 [00000000] pgd=00000000f658b003, pud=00000000f658b003, *pmd=0000000000000000 Internal error: Oops: 96000006 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027 Hardware name: FVP Base (DT) task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000 PC is at kvm_vgic_inject_irq+0x234/0x310 LR is at kvm_vgic_inject_irq+0x30c/0x310 pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145 ..... So this patch fixes this by checking the SPI number against the actual limit. Also we remove the former legacy hard limit of 127 in the ioctl code. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18 [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__, as suggested by Christopher Covington] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
\| * \| \| \| \| \| \|	Merge tag 'signed-kvm-ppc-queue' of git://github.com/agraf/linux-2.6 into ↵	Paolo Bonzini	2015-04-21	22	-364/+1561
\| \|\ \ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kvm-master Patch queue for ppc - 2015-04-21 This is the latest queue for KVM on PowerPC changes. Highlights this time around: - Book3S HV: Debugging aids - Book3S HV: Minor performance improvements - Book3S HV: Cleanups
\| \| * \| \| \| \| \| \|	KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8	Paul Mackerras	2015-04-21	4	-22/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This uses msgsnd where possible for signalling other threads within the same core on POWER8 systems, rather than IPIs through the XICS interrupt controller. This includes waking secondary threads to run the guest, the interrupts generated by the virtual XICS, and the interrupts to bring the other threads out of the guest when exiting. Aggregated statistics from debugfs across vcpus for a guest with 32 vcpus, 8 threads/vcore, running on a POWER8, show this before the change: rm_entry: 3387.6ns (228 - 86600, 1008969 samples) rm_exit: 4561.5ns (12 - 3477452, 1009402 samples) rm_intr: 1660.0ns (12 - 553050, 3600051 samples) and this after the change: rm_entry: 3060.1ns (212 - 65138, 953873 samples) rm_exit: 4244.1ns (12 - 9693408, 954331 samples) rm_intr: 1342.3ns (12 - 1104718, 3405326 samples) for a test of booting Fedora 20 big-endian to the login prompt. The time taken for a H_PROD hcall (which is handled in the host kernel) went down from about 35 microseconds to about 16 microseconds with this change. The noinline added to kvmppc_run_core turned out to be necessary for good performance, at least with gcc 4.9.2 as packaged with Fedora 21 and a little-endian POWER8 host. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
\| \| * \| \| \| \| \| \|	KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C	Paul Mackerras	2015-04-21	4	-68/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This replaces the assembler code for kvmhv_commence_exit() with C code in book3s_hv_builtin.c. It also moves the IPI sending code that was in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function so it can be used by kvmhv_commence_exit() as well as icp_rm_set_vcpu_irq(). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
\| \| * \| \| \| \| \| \|	KVM: PPC: Book3S HV: Streamline guest entry and exit	Paul Mackerras	2015-04-21	1	-86/+126
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On entry to the guest, secondary threads now wait for the primary to switch the MMU after loading up most of their state, rather than before. This means that the secondary threads get into the guest sooner, in the common case where the secondary threads get to kvmppc_hv_entry before the primary thread. On exit, the first thread out increments the exit count and interrupts the other threads (to get them out of the guest) before saving most of its state, rather than after. That means that the other threads exit sooner and means that the first thread doesn't spend so much time waiting for the other threads at the point where the MMU gets switched back to the host. This pulls out the code that increments the exit count and interrupts other threads into a separate function, kvmhv_commence_exit(). This also makes sure that r12 and vcpu->arch.trap are set correctly in some corner cases. Statistics from /sys/kernel/debug/kvm/vm/vcpu/timings show the improvement. Aggregating across vcpus for a guest with 32 vcpus, 8 threads/vcore, running on a POWER8, gives this before the change: rm_entry: avg 4537.3ns (222 - 48444, 1068878 samples) rm_exit: avg 4787.6ns (152 - 165490, 1010717 samples) rm_intr: avg 1673.6ns (12 - 341304, 3818691 samples) and this after the change: rm_entry: avg 3427.7ns (232 - 68150, 1118921 samples) rm_exit: avg 4716.0ns (12 - 150720, 1119477 samples) rm_intr: avg 1614.8ns (12 - 522436, 3850432 samples) showing a substantial reduction in the time spent per guest entry in the real-mode guest entry code, and smaller reductions in the real mode guest exit and interrupt handling times. (The test was to start the guest and boot Fedora 20 big-endian to the login prompt.) Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
\| \| * \| \| \| \| \| \|	KVM: PPC: Book3S HV: Use bitmap of active threads rather than count	Paul Mackerras	2015-04-21	5	-49/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the entry_exit_count field in the kvmppc_vcore struct contains two 8-bit counts, one of the threads that have started entering the guest, and one of the threads that have started exiting the guest. This changes it to an entry_exit_map field which contains two bitmaps of 8 bits each. The advantage of doing this is that it gives us a bitmap of which threads need to be signalled when exiting the guest. That means that we no longer need to use the trick of setting the HDEC to 0 to pull the other threads out of the guest, which led in some cases to a spurious HDEC interrupt on the next guest entry. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
\| \| * \| \| \| \| \| \|	KVM: PPC: Book3S HV: Use decrementer to wake napping threads	Paul Mackerras	2015-04-21	1	-2/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This arranges for threads that are napping due to their vcpu having ceded or due to not having a vcpu to wake up at the end of the guest's timeslice without having to be poked with an IPI. We do that by arranging for the decrementer to contain a value no greater than the number of timebase ticks remaining until the end of the timeslice. In the case of a thread with no vcpu, this number is in the hypervisor decrementer already. In the case of a ceded vcpu, we use the smaller of the HDEC value and the DEC value. Using the DEC like this when ceded means we need to save and restore the guest decrementer value around the nap. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
\| \| * \| \| \| \| \| \|	KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI	Paul Mackerras	2015-04-21	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When running a multi-threaded guest and vcpu 0 in a virtual core is not running in the guest (i.e. it is busy elsewhere in the host), thread 0 of the physical core will switch the MMU to the guest and then go to nap mode in the code at kvm_do_nap. If the guest sends an IPI to thread 0 using the msgsndp instruction, that will wake up thread 0 and cause all the threads in the guest to exit to the host unnecessarily. To avoid the unnecessary exit, this arranges for the PECEDP bit to be cleared in this situation. When napping due to a H_CEDE from the guest, we still set PECEDP so that the thread will wake up on an IPI sent using msgsndp. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>