summaryrefslogtreecommitdiffstats
path: root/arch (follow)
Commit message (Collapse)AuthorAgeFilesLines
* ARC: [plat-hsdk] initial port for HSDK boardAlexey Brodkin2017-09-018-2/+358
| | | | | | | | | | | | | | | | | | | | | | | This initial port adds support of ARC HS Development Kit board with some basic features such serial port, USB, SD/MMC and Ethernet. Essentially we run Linux kernel on all 4 cores (i.e. utilize SMP) and heavily use IO Coherency for speeding-up DMA-aware peripherals. Note as opposed to other ARC boards we link Linux kernel to 0x9000_0000 intentionally because cores 1 and 3 configured with DCCM situated at our more usual link base 0x8000_0000. We still can use memory region starting at 0x8000_0000 as we reallocate DCCM in our platform code. Note that PAE remapping for DMA clients does not work due to an RTL bug, so CREG_PAE register must be programmed to all zeroes, otherwise it will cause problems with DMA to/from peripherals even if PAE40 is not used. Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: mm: Decouple RAM base address from kernel link addressEugeniy Paltsev2017-09-018-10/+18
| | | | | | | | | | | | | | | | | | [Needed for HSDK] Currently the first page of system (hence RAM base) is assumed to be @ CONFIG_LINUX_LINK_BASE, where kernel itself is linked. However is case of HSDK platform, for reasons explained in that patch, this is not true. kernel needs to be linked @ 0x9000_0000 while DDR is still wired at 0x8000_0000. To properly account for this 256M of RAM, we need to introduce a new option and base page frame accountiing off of it. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: renamed CONFIG_KERNEL_RAM_BASE_ADDRESS => CONFIG_LINUX_RAM_BASE : simplified changelog]
* ARCv2: IOC: Tighten up the contraints (specifically base / size alignment)Eugeniy Paltsev2017-09-011-8/+19
| | | | | | | | | | | | | | | [Needed for HSDK] - Currently IOC base is hardcoded to 0x8000_0000 which is default value of LINUX_LINK_BASE, but may not always be the case - IOC programming model imposes the constraint that IOC aperture size needs to be aligned to IOC base address, which we were not checking so far. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: reworked the changelog]
* ARC: [plat-axs103] refactor the DT fudging codeVineet Gupta2017-09-011-24/+22
| | | | | | | | with clk frequency setting code gone by prev commits, we can elide the unconditonal DT parsing to the specific case of quad core config where we possibly need to fudge the DT value. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-axs103] use clk driver #2: Add core pll node to DT to manage cpu clkEugeniy Paltsev2017-09-012-4/+18
| | | | | | | | | | Add core pll node (core_clk) to manage cpu frequency. core_clk represents pll itself. input_clk represents clock signal source (basically xtal) which comes to pll input. Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-axs103] use clk driver #1: Get rid of platform specific cpu clk ↵Eugeniy Paltsev2017-09-011-113/+5
| | | | | | | | | | | | | | setting historically axs103 platform code used to set the cpu clk by writing to PLL registers directly. however the axs10x clk driver is now upstream so no need to do this amymore. Driver is selected automatically when CONFIG_ARC_PLAT_AXS10X is set Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: deleted more code not needed anymore]
* ARCv2: SLC: provide a line based flush routine for debuggingVineet Gupta2017-08-302-1/+55
| | | | Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: Hardcode ARCH_DMA_MINALIGN to max line length we may haveAlexey Brodkin2017-08-301-1/+2
| | | | | | | | | | | | | | | | | | | | | | Current implementation relies on L1 line length which might easily be smaller than L2 line (which is usually the case BTW). Imagine this typical case: L2 line is 128 bytes while L1 line is 64-bytes. Now we want to allocate small buffer and later use it for DMA (consider IOC is not available). kmalloc() allocates small KMALLOC_MIN_SIZE-sized, KMALLOC_MIN_SIZE-aligned That way if buffer happens to be aligned to L1 line and not L2 line we'll be flushing and invalidating extra portions of data from L2 which will cause cache coherency issues. And since KMALLOC_MIN_SIZE is bound to ARCH_DMA_MINALIGN the fix could be simple - set ARCH_DMA_MINALIGN to the largest cache line we may ever get. As of today neither L1 of ARC700 and ARC HS38 nor SLC might not be longer than 128 bytes. Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-eznps] handle extra aux regs #2: kernel/entry exitLiav Rehana2017-08-293-0/+33
| | | | | | | | | | | | | | | | Preserve eflags and gpa1 aux during entry/exit into kernel as these could be modified by kernel mode These registers used by compare exchange instructions. - GPA1 is used for compare value, - EFLAGS got bit reflects atomic operation response. EFLAGS is zeroed for each new user task so it won't get its parent value. Signed-off-by: Liav Rehana <liavr@mellanox.com> Signed-off-by: Noam Camus <noamc@ezchip.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-eznps] handle extra aux regs #1: save/restore on context switchNoam Camus2017-08-294-1/+52
| | | | | | | | | save EFLAGS, and GPA1 auxiliary registers during context switch, since they may be changed by the new task in kernel mode, while using atomic ops e.g. cmpxchg. Signed-off-by: Noam Camus <noamc@ezchip.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-eznps] avoid toggling of DPC registerElad Kanfi2017-08-292-0/+13
| | | | | | | | | | | | HW bug description: in case of HW thread context switch the dpc configuration of the exiting thread is dragged one cycle into the next thread. In order to avoid the consequences of this bug, the DPC register is set to an initial value, and not changed afterwards. Signed-off-by: Elad Kanfi <eladkan@mellanox.com> Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-eznps] Update the init sequence of aux regs per cpu.Liav Rehana2017-08-292-1/+12
| | | | | | | | | | | | | | | This commit add new configuration that enables us to distinguish between building the kernel for platforms that have a different set of auxiliary registers for each cpu and platforms that have a shared set of auxiliary registers across every thread in each core. On platforms that implement a different set of auxiliary registers disabling this configuration insures that we initialize registers on every cpu and not just for the first thread of the core. Example for non shared registers is working with EZsim (non silicon) Signed-off-by: Liav Rehana <liavr@mellanox.com> Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-eznps] new command line argument for HW scheduler at MTMNoam Camus2017-08-291-2/+25
| | | | | | | | | | | We add ability for all cores at NPS SoC to control the number of cycles HW thread can execute before it is replace with another eligible HW thread within the same core. The replacement is done by the HW scheduler. Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: simplified handlign of out of range argument value]
* ARC: set boot print log level to PR_INFONoam Camus2017-08-293-5/+5
| | | | | | | | | | | | | Some of the boot printing code had printk() w/o explicit log level. This patch introduces consistency allowing platforms to switch to less verbose console logging using cmdline. NPS400 with 4K CPUs needs to avoid the cpu info printing for faster bootup. Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-eznps] Handle user memory error same in simulation and siliconNoam Camus2017-08-293-1/+21
| | | | | | | | | | | | | | | | | | On ARC700 (and nSIM), user mode memory error triggers an L2 interrupt which is handled gracefully by kernel (or it tries to despite this being imprecise, and error could get charged to kernel itself). The offending task is killed and kernel moves on. NPS hardware however raises a Machine Check exception for same error which is NOT recoverable by kernel. This patch aligns kernel handling for nSIM case, to same as hardware by overriding the default user space bus error handler. Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Elad Kanfi <eladkan@mellanox.com> [vgupta: rewrote changelog] Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-eznps] use schd.wft instruction instead of sleep at idle taskNoam Camus2017-08-292-1/+13
| | | | | | | | | | | | When HW threads are active we want CPU to enter idle state only for the calling HW thread and not to put on sleep all HW threads sharing this core. For this need the NPS400 got dedicated instruction so only calling thread is entring sleep and all other are still awake and can execute instructions. Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: reworked patch to not use inline ifdef but a new function itself]
* ARC: create cpu specific version of arch_cpu_idle()Vineet Gupta2017-08-293-7/+16
| | | | | | | This paves way for creating a 3rd variant needed for NPS ARC700 without littering ifdey'ery all over the place Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-eznps] spinlock aware for MTMNoam Camus2017-08-291-0/+6
| | | | | | | | | | This way when we execute "ex" during trying to hold lock we can switch to other HW thread and utilize the core intead of just spinning on a lock. We noticed about 10% improvement of execution time with hackbench test. Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: spinlock: Document the EX based spin_unlockVineet Gupta2017-08-291-0/+6
| | | | Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-eznps] disabled stall counter due to a HW bugNoam Camus2017-08-291-2/+0
| | | | | | | | | | | | | | This counter represents threshold for consecutive stall which would trigger HW threads scheduling. However when enabled, low threshhold values cause performance degradation and in the worst case even livelock. So disable it by resorting to HW reset value Signed-off-by: Noam Camus <noamca@mellanox.com> Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: fixed changelog]
* ARC: [plat-eznps] Fix TLB ErrataNoam Camus2017-08-291-0/+9
| | | | | | | | Due to a HW bug in NPS400 we get from time to time false TLB miss. Workaround this by validating each miss. Signed-off-by: Noam Camus <noamca@mellanox.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: [plat-eznps] typo fix at KconfigNoam Camus2017-08-291-2/+2
| | | | | | Signed-off-by: Noam Camus <noamca@mellanox.com> Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: typos fix in kernel/entry-compact.SLiav Rehana2017-08-291-11/+11
| | | | | | | Signed-off-by: Liav Rehana <liavr@mellanox.com> Signed-off-by: Noam Camus <noamca@mellanox.com> Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* ARC: typo fix in mm/fault.cLiav Rehana2017-08-291-1/+1
| | | | | | | Signed-off-by: Liav Rehana <liavr@mellanox.com> Signed-off-by: Noam Camus <noamca@mellanox.com> Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2017-08-261-3/+1
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two fixes: one for an ldt_struct handling bug and a cherry-picked objtool fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Fix use-after-free of ldt_struct objtool: Fix '-mtune=atom' decoding support in objtool 2.0
| * x86/mm: Fix use-after-free of ldt_structEric Biggers2017-08-251-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commit: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") renamed init_new_context() to init_new_context_ldt() and added a new init_new_context() which calls init_new_context_ldt(). However, the error code of init_new_context_ldt() was ignored. Consequently, if a memory allocation in alloc_ldt_struct() failed during a fork(), the ->context.ldt of the new task remained the same as that of the old task (due to the memcpy() in dup_mm()). ldt_struct's are not intended to be shared, so a use-after-free occurred after one task exited. Fix the bug by making init_new_context() pass through the error code of init_new_context_ldt(). This bug was found by syzkaller, which encountered the following splat: BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710 CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:16 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:52 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x24e/0x340 mm/kasan/report.c:409 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429 free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] exec_mmap fs/exec.c:1061 [inline] flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291 load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855 search_binary_handler+0x142/0x6b0 fs/exec.c:1652 exec_binprm fs/exec.c:1694 [inline] do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816 do_execve+0x31/0x40 fs/exec.c:1860 call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Allocated by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627 kmalloc include/linux/slab.h:493 [inline] alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67 write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277 sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307 entry_SYSCALL_64_fastpath+0x1f/0xbe Freed by task 3700: save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3503 [inline] kfree+0xca/0x250 mm/slab.c:3820 free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121 free_ldt_struct arch/x86/kernel/ldt.c:173 [inline] destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171 destroy_context arch/x86/include/asm/mmu_context.h:157 [inline] __mmdrop+0xe9/0x530 kernel/fork.c:889 mmdrop include/linux/sched/mm.h:42 [inline] __mmput kernel/fork.c:916 [inline] mmput+0x541/0x6e0 kernel/fork.c:927 copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931 copy_process kernel/fork.c:1546 [inline] _do_fork+0x1ef/0xfb0 kernel/fork.c:2025 SYSC_clone kernel/fork.c:2135 [inline] SyS_clone+0x37/0x50 kernel/fork.c:2129 do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287 return_from_SYSCALL_64+0x0/0x7a Here is a C reproducer: #include <asm/ldt.h> #include <pthread.h> #include <signal.h> #include <stdlib.h> #include <sys/syscall.h> #include <sys/wait.h> #include <unistd.h> static void *fork_thread(void *_arg) { fork(); } int main(void) { struct user_desc desc = { .entry_number = 8191 }; syscall(__NR_modify_ldt, 1, &desc, sizeof(desc)); for (;;) { if (fork() == 0) { pthread_t t; srand(getpid()); pthread_create(&t, NULL, fork_thread, NULL); usleep(rand() % 10000); syscall(__NR_exit_group, 0); } wait(NULL); } } Note: the reproducer takes advantage of the fact that alloc_ldt_struct() may use vmalloc() to allocate a large ->entries array, and after commit: 5d17a73a2ebe ("vmalloc: back off when the current task is killed") it is possible for userspace to fail a task's vmalloc() by sending a fatal signal, e.g. via exit_group(). It would be more difficult to reproduce this bug on kernels without that commit. This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: <stable@vger.kernel.org> [v4.6+] Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init") Link: http://lkml.kernel.org/r/20170824175029.76040-1-ebiggers3@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2017-08-2612-64/+135
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull Paolo Bonzini: "Bugfixes for x86, PPC and s390" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce() KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state KVM: x86: simplify handling of PKRU KVM: x86: block guest protection keys unless the host has them enabled KVM: PPC: Book3S HV: Add missing barriers to XIVE code and document them KVM: PPC: Book3S HV: Workaround POWER9 DD1.0 bug causing IPB bit loss KVM: PPC: Book3S HV: Use msgsync with hypervisor doorbells on POWER9 KVM: s390: sthyi: fix specification exception detection KVM: s390: sthyi: fix sthyi inline assembly
| * | KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()Paul Mackerras2017-08-251-22/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nixiaoming pointed out that there is a memory leak in kvm_vm_ioctl_create_spapr_tce() if the call to anon_inode_getfd() fails; the memory allocated for the kvmppc_spapr_tce_table struct is not freed, and nor are the pages allocated for the iommu tables. In addition, we have already incremented the process's count of locked memory pages, and this doesn't get restored on error. David Hildenbrand pointed out that there is a race in that the function checks early on that there is not already an entry in the stt->iommu_tables list with the same LIOBN, but an entry with the same LIOBN could get added between then and when the new entry is added to the list. This fixes all three problems. To simplify things, we now call anon_inode_getfd() before placing the new entry in the list. The check for an existing entry is done while holding the kvm->lock mutex, immediately before adding the new entry to the list. Finally, on failure we now call kvmppc_account_memlimit to decrement the process's count of locked memory pages. Reported-by: Nixiaoming <nixiaoming@huawei.com> Reported-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.statePaolo Bonzini2017-08-252-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The host pkru is restored right after vcpu exit (commit 1be0e61), so KVM_GET_XSAVE will return the host PKRU value instead. Fix this by using the guest PKRU explicitly in fill_xsave and load_xsave. This part is based on a patch by Junkang Fu. The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state, because it could have been changed by userspace since the last time it was saved, so skip loading it in kvm_load_guest_fpu. Reported-by: Junkang Fu <junkang.fjk@alibaba-inc.com> Cc: Yang Zhang <zy107165@alibaba-inc.com> Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870 Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: simplify handling of PKRUPaolo Bonzini2017-08-255-30/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move it to struct kvm_arch_vcpu, replacing guest_pkru_valid with a simple comparison against the host value of the register. The write of PKRU in addition can be skipped if the guest has not enabled the feature. Once we do this, we need not test OSPKE in the host anymore, because guest_CR4.PKE=1 implies host_CR4.PKE=1. The static PKU test is kept to elide the code on older CPUs. Suggested-by: Yang Zhang <zy107165@alibaba-inc.com> Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870 Cc: stable@vger.kernel.org Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: x86: block guest protection keys unless the host has them enabledPaolo Bonzini2017-08-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the host has protection keys disabled, we cannot read and write the guest PKRU---RDPKRU and WRPKRU fail with #GP(0) if CR4.PKE=0. Block the PKU cpuid bit in that case. This ensures that guest_CR4.PKE=1 implies host_CR4.PKE=1. Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870 Cc: stable@vger.kernel.org Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | KVM: PPC: Book3S HV: Add missing barriers to XIVE code and document themBenjamin Herrenschmidt2017-08-241-2/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds missing memory barriers to order updates/tests of the virtual CPPR and MFRR, thus fixing a lost IPI problem. While at it also document all barriers in this file. This fixes a bug causing guest IPIs to occasionally get lost. The symptom then is hangs or stalls in the guest. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | KVM: PPC: Book3S HV: Workaround POWER9 DD1.0 bug causing IPB bit lossBenjamin Herrenschmidt2017-08-241-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a workaround for a bug in POWER9 DD1 chips where changing the CPPR (Current Processor Priority Register) can cause bits in the IPB (Interrupt Pending Buffer) to get lost. Thankfully it only happens when manually manipulating CPPR which is quite rare. When it does happen it can cause interrupts to be delayed or lost. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | KVM: PPC: Book3S HV: Use msgsync with hypervisor doorbells on POWER9Nicholas Piggin2017-08-241-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When msgsnd is used for IPIs to other cores, msgsync must be executed by the target to order stores performed on the source before its msgsnd (provided the source executes the appropriate sync). Fixes: 1704a81ccebc ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | Merge tag 'kvm-s390-master-4.13-2' of ↵Radim Krčmář2017-08-211-2/+5
| |\ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: two fixes for sthyi emulation - missing inline assembly constraint - wrong exception handling
| | * KVM: s390: sthyi: fix specification exception detectionHeiko Carstens2017-08-211-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sthyi should only generate a specification exception if the function code is zero and the response buffer is not on a 4k boundary. The current code would also test for unknown function codes if the response buffer, that is currently only defined for function code 0, is not on a 4k boundary and incorrectly inject a specification exception instead of returning with condition code 3 and return code 4 (unsupported function code). Fix this by moving the boundary check. Fixes: 95ca2cb57985 ("KVM: s390: Add sthyi emulation") Cc: <stable@vger.kernel.org> # 4.8+ Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * KVM: s390: sthyi: fix sthyi inline assemblyHeiko Carstens2017-08-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sthyi inline assembly misses register r3 within the clobber list. The sthyi instruction will always write a return code to register "R2+1", which in this case would be r3. Due to that we may have register corruption and see host crashes or data corruption depending on how gcc decided to allocate and use registers during compile time. Fixes: 95ca2cb57985 ("KVM: s390: Add sthyi emulation") Cc: <stable@vger.kernel.org> # 4.8+ Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* | | Merge tag 'powerpc-4.13-8' of ↵Linus Torvalds2017-08-263-0/+20
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: "Just one fix, to add a barrier in the switch_mm() code to make sure the mm cpumask update is ordered vs the MMU starting to load translations. As far as we know no one's actually hit the bug, but that's just luck. Thanks to Benjamin Herrenschmidt, Nicholas Piggin" * tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Ensure cpumask update is ordered
| * | | powerpc/mm: Ensure cpumask update is orderedBenjamin Herrenschmidt2017-08-183-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no guarantee that the various isync's involved with the context switch will order the update of the CPU mask with the first TLB entry for the new context being loaded by the HW. Be safe here and add a memory barrier to order any subsequent load/store which may bring entries into the TLB. The corresponding barrier on the other side already exists as pte updates use pte_xchg() which uses __cmpxchg_u64 which has a sync after the atomic operation. Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add comments in the code] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | | | Merge tag 'armsoc-fixes' of ↵Linus Torvalds2017-08-243-1/+14
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Arnd Bergmann: "A small number of bugfixes, again nothing serious. - Alexander Dahl found multiple bugs in the Atmel memory interface driver - A randconfig build fix for at91 was incomplete, the second attempt fixes the remaining corner case - One fix for the TI Keystone queue handler - The Odroid XU4 HDMI port (added in 4.13) needs a small DT fix" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: dts: exynos: add needs-hpd for Odroid-XU3/4 ARM: at91: don't select CONFIG_ARM_CPU_SUSPEND for old platforms soc: ti: knav: Add a NULL pointer check for kdev in knav_pool_create memory: atmel-ebi: Fix smc cycle xlate converter memory: atmel-ebi: Allow t_DF timings of zero ns memory: atmel-ebi: Fix smc timing return value evaluation
| * | | | ARM: dts: exynos: add needs-hpd for Odroid-XU3/4Hans Verkuil2017-08-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CEC support was added for Exynos5 in 4.13, but for the Odroids we need to set 'needs-hpd' as well since CEC is disabled when there is no HDMI hotplug signal, just as for the exynos4 Odroid-U3. This is due to the level-shifter that is disabled when there is no HPD, thus blocking the CEC signal as well. Same close-but-no-cigar board design as the Odroid-U3. Tested with my Odroid XU4. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
| * | | | ARM: at91: don't select CONFIG_ARM_CPU_SUSPEND for old platformsArnd Bergmann2017-08-232-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My previous patch fixed a link error for all at91 platforms when CONFIG_ARM_CPU_SUSPEND was not set, however this caused another problem on a configuration that enabled CONFIG_ARCH_AT91 but none of the individual SoCs, and that also enabled CPU_ARM720 as the only CPU: warning: (ARCH_AT91 && SOC_IMX23 && SOC_IMX28 && ARCH_PXA && MACH_MVEBU_V7 && SOC_IMX6 && ARCH_OMAP3 && ARCH_OMAP4 && SOC_OMAP5 && SOC_AM33XX && SOC_DRA7XX && ARCH_EXYNOS3 && ARCH_EXYNOS4 && EXYNOS5420_MCPM && EXYNOS_CPU_SUSPEND && ARCH_VEXPRESS_TC2_PM && ARM_BIG_LITTLE_CPUIDLE && ARM_HIGHBANK_CPUIDLE && QCOM_PM) selects ARM_CPU_SUSPEND which has unmet direct dependencies (ARCH_SUSPEND_POSSIBLE) arch/arm/kernel/sleep.o: In function `cpu_resume': (.text+0xf0): undefined reference to `cpu_arm720_suspend_size' arch/arm/kernel/suspend.o: In function `__cpu_suspend_save': suspend.c:(.text+0x134): undefined reference to `cpu_arm720_do_suspend' This improves the hack some more by only selecting ARM_CPU_SUSPEND for the part that requires it, and changing pm.c to drop the contents of unused init functions so we no longer refer to cpu_resume on at91 platforms that don't need it. Fixes: cc7a938f5f30 ("ARM: at91: select CONFIG_ARM_CPU_SUSPEND") Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
* | | | | Merge tag 'arm64-fixes' of ↵Linus Torvalds2017-08-234-11/+17
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Late arm64 fixes. They fix very early boot failures with KASLR where the early mapping of the kernel is incorrect, so the failure mode looks like a hang with no output. There's also a signal-handling fix when a uaccess routine faults with a fatal signal pending, which could be used to create unkillable user tasks using userfaultfd and finally a state leak fix for the floating pointer registers across a call to exec(). We're still seeing some random issues crop up (inode memory corruption and spinlock recursion) but we've not managed to reproduce things reliably enough to debug or bisect them yet. Summary: - Fix very early boot failures with KASLR enabled - Fix fatal signal handling on userspace access from kernel - Fix leakage of floating point register state across exec()" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: kaslr: Adjust the offset to avoid Image across alignment boundary arm64: kaslr: ignore modulo offset when validating virtual displacement arm64: mm: abort uaccess retries upon fatal signal arm64: fpsimd: Prevent registers leaking across exec
| * | | | | arm64: kaslr: Adjust the offset to avoid Image across alignment boundaryCatalin Marinas2017-08-221-7/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With 16KB pages and a kernel Image larger than 16MB, the current kaslr_early_init() logic for avoiding mappings across swapper table boundaries fails since increasing the offset by kimg_sz just moves the problem to the next boundary. This patch rounds the offset down to (1 << SWAPPER_TABLE_SHIFT) if the Image crosses a PMD_SIZE boundary. Fixes: afd0e5a87670 ("arm64: kaslr: Fix up the kernel image alignment") Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Neeraj Upadhyay <neeraju@codeaurora.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * | | | | arm64: kaslr: ignore modulo offset when validating virtual displacementArd Biesheuvel2017-08-222-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the KASLR setup routine, we ensure that the early virtual mapping of the kernel image does not cover more than a single table entry at the level above the swapper block level, so that the assembler routines involved in setting up this mapping can remain simple. In this calculation we add the proposed KASLR offset to the values of the _text and _end markers, and reject it if they would end up falling in different swapper table sized windows. However, when taking the addresses of _text and _end, the modulo offset (the physical displacement modulo 2 MB) is already accounted for, and so adding it again results in incorrect results. So disregard the modulo offset from the calculation. Fixes: 08cdac619c81 ("arm64: relocatable: deal with physically misaligned ...") Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * | | | | arm64: mm: abort uaccess retries upon fatal signalMark Rutland2017-08-221-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When there's a fatal signal pending, arm64's do_page_fault() implementation returns 0. The intent is that we'll return to the faulting userspace instruction, delivering the signal on the way. However, if we take a fatal signal during fixing up a uaccess, this results in a return to the faulting kernel instruction, which will be instantly retried, resulting in the same fault being taken forever. As the task never reaches userspace, the signal is not delivered, and the task is left unkillable. While the task is stuck in this state, it can inhibit the forward progress of the system. To avoid this, we must ensure that when a fatal signal is pending, we apply any necessary fixup for a faulting kernel instruction. Thus we will return to an error path, and it is up to that code to make forward progress towards delivering the fatal signal. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Laura Abbott <labbott@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Steve Capper <steve.capper@arm.com> Tested-by: Steve Capper <steve.capper@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Tested-by: James Morse <james.morse@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * | | | | arm64: fpsimd: Prevent registers leaking across execDave Martin2017-08-221-0/+2
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are some tricky dependencies between the different stages of flushing the FPSIMD register state during exec, and these can race with context switch in ways that can cause the old task's regs to leak across. In particular, a context switch during the memset() can cause some of the task's old FPSIMD registers to reappear. Disabling preemption for this small window would be no big deal for performance: preemption is already disabled for similar scenarios like updating the FPSIMD registers in sigreturn. So, instead of rearranging things in ways that might swap existing subtle bugs for new ones, this patch just disables preemption around the FPSIMD state flushing so that races of this type can't occur here. This brings fpsimd_flush_thread() into line with other code paths. Cc: stable@vger.kernel.org Fixes: 674c242c9323 ("arm64: flush FP/SIMD state correctly after execve()") Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Dave Martin <Dave.Martin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds2017-08-214-15/+15
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull sparc fixes from David Miller: "Just a couple small fixes, two of which have to do with gcc-7: 1) Don't clobber kernel fixed registers in __multi4 libgcc helper. 2) Fix a new uninitialized variable warning on sparc32 with gcc-7, from Thomas Petazzoni. 3) Adjust pmd_t initializer on sparc32 to make gcc happy. 4) If ATU isn't available, don't bark in the logs. From Tushar Dave" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus() sparc64: remove unnecessary log message sparc64: Don't clibber fixed registers in __multi4. mm: add pmd_t initializer __pmd() to work around a GCC bug.
| * | | | | sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus()Thomas Petazzoni2017-08-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When building the kernel for Sparc using gcc 7.x, the build fails with: arch/sparc/kernel/pcic.c: In function ‘pcibios_fixup_bus’: arch/sparc/kernel/pcic.c:647:8: error: ‘cmd’ may be used uninitialized in this function [-Werror=maybe-uninitialized] cmd |= PCI_COMMAND_IO; ^~ The simplified code looks like this: unsigned int cmd; [...] pcic_read_config(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd); [...] cmd |= PCI_COMMAND_IO; I.e, the code assumes that pcic_read_config() will always initialize cmd. But it's not the case. Looking at pcic_read_config(), if bus->number is != 0 or if the size is not one of 1, 2 or 4, *val will not be initialized. As a simple fix, we initialize cmd to zero at the beginning of pcibios_fixup_bus. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | sparc64: remove unnecessary log messageTushar Dave2017-08-161-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to log message if ATU hvapi couldn't get register. Unlike PCI hvapi, ATU hvapi registration failure is not hard error. Even if ATU hvapi registration fails (on system with ATU or without ATU) system continues with legacy IOMMU. So only log message when ATU hvapi successfully get registered. Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>