summaryrefslogtreecommitdiffstats
path: root/arch/x86/include (follow)
Commit message (Collapse)AuthorAgeFilesLines
* x86/coredump: Always use user_regs_struct for compat_elf_gregset_tDmitry Safonov2016-11-241-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: 90954e7b9407 ("x86/coredump: Use pr_reg size, rather that TIF_IA32 flag") changed the coredumping code to construct the elf coredump file according to register set size - and that's good: if binary crashes with 32-bit code selector, generate 32-bit ELF core, otherwise - 64-bit core. That was made for restoring 32-bit applications on x86_64: we want 32-bit application after restore to generate 32-bit ELF dump on crash. All was quite good and recently I started reworking 32-bit applications dumping part of CRIU: now it has two parasites (32 and 64) for seizing compat/native tasks, after rework it'll have one parasite, working in 64-bit mode, to which 32-bit prologue long-jumps during infection. And while it has worked for my work machine, in VM with !CONFIG_X86_X32_ABI during reworking I faced that segfault in 32-bit binary, that has long-jumped to 64-bit mode results in dereference of garbage: 32-victim[19266]: segfault at f775ef65 ip 00000000f775ef65 sp 00000000f776aa50 error 14 BUG: unable to handle kernel paging request at ffffffffffffffff IP: [<ffffffff81332ce0>] strlen+0x0/0x20 [...] Call Trace: [] elf_core_dump+0x11a9/0x1480 [] do_coredump+0xa6b/0xe60 [] get_signal+0x1a8/0x5c0 [] do_signal+0x23/0x660 [] exit_to_usermode_loop+0x34/0x65 [] prepare_exit_to_usermode+0x2f/0x40 [] retint_user+0x8/0x10 That's because we have 64-bit registers set (with according total size) and we're writing it to elf_thread_core_info which has smaller size on !CONFIG_X86_X32_ABI. That lead to overwriting ELF notes part. Tested on 32-, 64-bit ELF crashes and on 32-bit binaries that have jumped with 64-bit code selector - all is readable with gdb. Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: 90954e7b9407 ("x86/coredump: Use pr_reg size, rather that TIF_IA32 flag") Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/platform/intel-mid: Retrofit pci_platform_pm_ops ->get_state hookLukas Wunner2016-11-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit cc7cc02bada8 ("PCI: Query platform firmware for device power state") augmented struct pci_platform_pm_ops with a ->get_state hook and implemented it for acpi_pci_platform_pm, the only pci_platform_pm_ops existing till v4.7. However v4.8 introduced another pci_platform_pm_ops for Intel Mobile Internet Devices with commit 5823d0893ec2 ("x86/platform/intel-mid: Add Power Management Unit driver"). It is missing the ->get_state hook, which is fatal since pci_set_platform_pm() enforces its presence. Andy Shevchenko reports that without the present commit, such a device "crashes without even a character printed out on serial console and reboots (since watchdog)". Retrofit mid_pci_platform_pm with the missing callback to fix the breakage. Acked-and-tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Fixes: cc7cc02bada8 ("PCI: Query platform firmware for device power state") Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-pci@vger.kernel.org Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: http://lkml.kernel.org/r/7c1567d4c49303a4aada94ba16275cbf56b8976b.1477221514.git.lukas@wunner.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2016-11-041-3/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "One NULL pointer dereference, and two fixes for regressions introduced during the merge window. The rest are fixes for MIPS, s390 and nested VMX" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm: x86: Check memopp before dereference (CVE-2016-8630) kvm: nVMX: VMCLEAR an active shadow VMCS after last use KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCK KVM: x86: fix wbinvd_dirty_mask use-after-free kvm/x86: Show WRMSR data is in hex kvm: nVMX: Fix kernel panics induced by illegal INVEPT/INVVPID types KVM: document lock orders KVM: fix OOPS on flush_work KVM: s390: Fix STHYI buffer alignment for diag224 KVM: MIPS: Precalculate MMIO load resume PC KVM: MIPS: Make ERET handle ERL before EXL KVM: MIPS: Fix lazy user ASID regenerate for SMP
| * KVM: x86: drop TSC offsetting kvm_x86_ops to fix KVM_GET/SET_CLOCKPaolo Bonzini2016-11-021-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit a545ab6a0085 ("kvm: x86: add tsc_offset field to struct kvm_vcpu_arch", 2016-09-07) the offset between host and L1 TSC is cached and need not be fished out of the VMCS or VMCB. This means that we can implement adjust_tsc_offset_guest and read_l1_tsc entirely in generic code. The simplification is particularly significant for VMX code, where vmx->nested.vmcs01_tsc_offset was duplicating what is now in vcpu->arch.tsc_offset. Therefore the vmcs01_tsc_offset can be dropped completely. More importantly, this fixes KVM_GET_CLOCK/KVM_SET_CLOCK which, after commit 108b249c453d ("KVM: x86: introduce get_kvmclock_ns", 2016-09-01) called read_l1_tsc while the VMCS was not loaded. It thus returned bogus values on Intel CPUs. Fixes: 108b249c453dd7132599ab6dc7e435a7036c193f Reported-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | x86/io: add interface to reserve io memtype for a resource range. (v1.1)Dave Airlie2016-10-261-0/+6
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent change to the mm code in: 87744ab3832b mm: fix cache mode tracking in vm_insert_mixed() started enforcing checking the memory type against the registered list for amixed pfn insertion mappings. It happens that the drm drivers for a number of gpus relied on this being broken. Currently the driver only inserted VRAM mappings into the tracking table when they came from the kernel, and userspace mappings never landed in the table. This led to a regression where all the mapping end up as UC instead of WC now. I've considered a number of solutions but since this needs to be fixed in fixes and not next, and some of the solutions were going to introduce overhead that hadn't been there before I didn't consider them viable at this stage. These mainly concerned hooking into the TTM io reserve APIs, but these API have a bunch of fast paths I didn't want to unwind to add this to. The solution I've decided on is to add a new API like the arch_phys_wc APIs (these would have worked but wc_del didn't take a range), and use them from the drivers to add a WC compatible mapping to the table for all VRAM on those GPUs. This means we can then create userspace mapping that won't get degraded to UC. v1.1: use CONFIG_X86_PAT + add some comments in io.h Cc: Toshi Kani <toshi.kani@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: x86@kernel.org Cc: mcgrof@suse.com Cc: Dan Williams <dan.j.williams@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
* sched/core, x86: Make struct thread_info arch specific againHeiko Carstens2016-10-201-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commit: c65eacbe290b ("sched/core: Allow putting thread_info into task_struct") ... made 'struct thread_info' a generic struct with only a single ::flags member, if CONFIG_THREAD_INFO_IN_TASK_STRUCT=y is selected. This change however seems to be quite x86 centric, since at least the generic preemption code (asm-generic/preempt.h) assumes that struct thread_info also has a preempt_count member, which apparently was not true for x86. We could add a bit more #ifdefs to solve this problem too, but it seems to be much simpler to make struct thread_info arch specific again. This also makes the conversion to THREAD_INFO_IN_TASK_STRUCT a bit easier for architectures that have a couple of arch specific stuff in their thread_info definition. The arch specific stuff _could_ be moved to thread_struct. However keeping them in thread_info makes it easier: accessing thread_info members is simple, since it is at the beginning of the task_struct, while the thread_struct is at the end. At least on s390 the offsets needed to access members of the thread_struct (with task_struct as base) are too large for various asm instructions. This is not a problem when keeping these members within thread_info. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: keescook@chromium.org Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/1476901693-8492-2-git-send-email-mark.rutland@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/cpufeature: Add AVX512_4VNNIW and AVX512_4FMAPS featuresPiotr Luc2016-10-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AVX512_4VNNIW - Vector instructions for deep learning enhanced word variable precision. AVX512_4FMAPS - Vector instructions for deep learning floating-point single precision. These new instructions are to be used in future Intel Xeon & Xeon Phi processors. The bits 2&3 of CPUID[level:0x07, EDX] inform that new instructions are supported by a processor. The spec can be found in the Intel Software Developer Manual (SDM) or in the Instruction Set Extensions Programming Reference (ISE). Define new feature flags to enumerate the new instructions in /proc/cpuinfo accordingly to CPUID bits and add the required xsave extensions which are required for proper operation. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20161018150111.29926-1-piotr.luc@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2016-10-182-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes, plus hw-enablement changes: - fix persistent RAM handling - remove pkeys warning - remove duplicate macro - fix debug warning in irq handler - add new 'Knights Mill' CPU related constants and enable the perf bits" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Add Knights Mill CPUID perf/x86/intel/rapl: Add Knights Mill CPUID perf/x86/intel: Add Knights Mill CPUID x86/cpu/intel: Add Knights Mill to Intel family x86/e820: Don't merge consecutive E820_PRAM ranges pkeys: Remove easily triggered WARN x86: Remove duplicate rtit status MSR macro x86/smp: Add irq_enter/exit() in smp_reschedule_interrupt()
| * x86/cpu/intel: Add Knights Mill to Intel familyPiotr Luc2016-10-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add CPUID of Knights Mill (KNM) processor to Intel family list. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20161012180520.30976-1-piotr.luc@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Merge tag 'v4.9-rc1' into x86/urgent, to pick up updatesIngo Molnar2016-10-1618-100/+214
| |\ | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86: Remove duplicate rtit status MSR macroLongpeng(Mike)2016-10-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MSR_IA32_RTIT_STATUS is defined twice, so remove one. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: len.brown@intel.com Cc: peterz@infradead.org Cc: rafael.j.wysocki@intel.com Cc: alexander.shishkin@linux.intel.com Cc: ray.huang@amd.com Cc: Aravind.Gopalakrishnan@amd.com Cc: wu.wubin@huawei.com Cc: srinivas.pandruvada@linux.intel.com Cc: zhaoshenglong@huawei.com Cc: vladimir_zapolskiy@mentor.com Link: http://lkml.kernel.org/r/1476405740-80816-1-git-send-email-longpeng2@huawei.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | locking/rwsem/x86: Add stack frame dependency for ____down_write()Josh Poimboeuf2016-10-181-2/+4
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arnd reported the following objtool warning: kernel/locking/rwsem.o: warning: objtool: down_write_killable()+0x16: call without frame pointer save/setup The warning means gcc placed the ____down_write() inline asm (and its call instruction) before the frame pointer setup in down_write_killable(), which breaks frame pointer convention and can result in incorrect stack traces. Force the stack frame to be created before the call instruction by listing the stack pointer as an output operand in the inline asm statement. Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1188b7015f04baf361e59de499ee2d7272c59dce.1476393828.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge branch 'kbuild' of ↵Linus Torvalds2016-10-141-0/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kbuild updates from Michal Marek: - EXPORT_SYMBOL for asm source by Al Viro. This does bring a regression, because genksyms no longer generates checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is working on a patch to fix this. Plus, we are talking about functions like strcpy(), which rarely change prototypes. - Fixes for PPC fallout of the above by Stephen Rothwell and Nick Piggin - fixdep speedup by Alexey Dobriyan. - preparatory work by Nick Piggin to allow architectures to build with -ffunction-sections, -fdata-sections and --gc-sections - CONFIG_THIN_ARCHIVES support by Stephen Rothwell - fix for filenames with colons in the initramfs source by me. * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits) initramfs: Escape colons in depfile ppc: there is no clear_pages to export powerpc/64: whitelist unresolved modversions CRCs kbuild: -ffunction-sections fix for archs with conflicting sections kbuild: add arch specific post-link Makefile kbuild: allow archs to select link dead code/data elimination kbuild: allow architectures to use thin archives instead of ld -r kbuild: Regenerate genksyms lexer kbuild: genksyms fix for typeof handling fixdep: faster CONFIG_ search ia64: move exports to definitions sparc32: debride memcpy.S a bit [sparc] unify 32bit and 64bit string.h sparc: move exports to definitions ppc: move exports to definitions arm: move exports to definitions s390: move exports to definitions m68k: move exports to definitions alpha: move exports to actual definitions x86: move exports to actual definitions ...
| * | x86: move exports to actual definitionsAl Viro2016-08-081-0/+4
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge branch 'for-4.9' of ↵Linus Torvalds2016-10-141-2/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: - Nick improved generic implementations of percpu operations which modify the variable and return so that they calculate the physical address only once. - percpu_ref percpu <-> atomic mode switching improvements. The patchset was originally posted about a year ago but fell through the crack. - misc non-critical fixes. * 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk() mm/percpu.c: correct max_distance calculation for pcpu_embed_first_chunk() percpu: eliminate two sparse warnings percpu: improve generic percpu modify-return implementation percpu-refcount: init ->confirm_switch member properly percpu_ref: allow operation mode switching operations to be called concurrently percpu_ref: restructure operation mode switching percpu_ref: unify staggered atomic switching wait behavior percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic() percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation
| * | | percpu: eliminate two sparse warningsLance Richardson2016-09-221-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix two cases where a __percpu pointer cast drops __percpu. Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | | Merge branch 'work.uaccess2' of ↵Linus Torvalds2016-10-124-33/+37
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull uaccess.h prepwork from Al Viro: "Preparations to tree-wide switch to use of linux/uaccess.h (which, obviously, will allow to start unifying stuff for real). The last step there, ie PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ `git grep -l "$PATT"|grep -v ^include/linux/uaccess.h` is not taken here - I would prefer to do it once just before or just after -rc1. However, everything should be ready for it" * 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: remove a stray reference to asm/uaccess.h in docs sparc64: separate extable_64.h, switch elf_64.h to it score: separate extable.h, switch module.h to it mips: separate extable.h, switch module.h to it x86: separate extable.h, switch sections.h to it remove stray include of asm/uaccess.h from cacheflush.h mn10300: remove a bogus processor.h->uaccess.h include xtensa: split uaccess.h into C and asm sides bonding: quit messing with IOCTL kill __kernel_ds_p off mn10300: finish verify_area() off frv: move HAVE_ARCH_UNMAPPED_AREA to pgtable.h exceptions: detritus removal
| * | | | x86: separate extable.h, switch sections.h to itAl Viro2016-09-283-32/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drivers/platform/x86/dell-smo8800.c is touched due to the following obscenity: drivers/platform/x86/dell-smo8800.c -> linux/interrupt.h -> linux/hardirq.h -> asm/hardirq.h -> linux/irq.h -> asm/hw_irq.h -> asm/sections.h -> asm/uaccess.h is the only chain of includes pulling asm/uaccess.h there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | remove stray include of asm/uaccess.h from cacheflush.hAl Viro2016-09-281-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was introduced in "arch, x86: pmem api for ensuring durability of persistent memory updates" in July 2015, along with the code that really used that stuff. Three weeks later all that code got moved to asm/pmem.h by "pmem, x86: move x86 PMEM API to new pmem.h header"; include, however, was left behind. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | x86/panic: replace smp_send_stop() with kdump friendly version in panic pathHidehiro Kawai2016-10-122-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Walker reported problems which happens when crash_kexec_post_notifiers kernel option is enabled (https://lkml.org/lkml/2015/6/24/44). In that case, smp_send_stop() is called before entering kdump routines which assume other CPUs are still online. As the result, for x86, kdump routines fail to save other CPUs' registers and disable virtualization extensions. To fix this problem, call a new kdump friendly function, crash_smp_send_stop(), instead of the smp_send_stop() when crash_kexec_post_notifiers is enabled. crash_smp_send_stop() is a weak function, and it just call smp_send_stop(). Architecture codes should override it so that kdump can work appropriately. This patch only provides x86-specific version. For Xen's PV kernel, just keep the current behavior. NOTES: - Right solution would be to place crash_smp_send_stop() before __crash_kexec() invocation in all cases and remove smp_send_stop(), but we can't do that until all architectures implement own crash_smp_send_stop() - crash_smp_send_stop()-like work is still needed by machine_crash_shutdown() because crash_kexec() can be called without entering panic() Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option) Link: http://lkml.kernel.org/r/20160810080948.11028.15344.stgit@sysi4-13.yrl.intra.hitachi.co.jp Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Reported-by: Daniel Walker <dwalker@fifo99.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Daniel Walker <dwalker@fifo99.com> Cc: Xunlei Pang <xpang@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: "Steven J. Hill" <steven.hill@cavium.com> Cc: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge branch 'mm-pkeys-for-linus' of ↵Linus Torvalds2016-10-103-7/+99
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull protection keys syscall interface from Thomas Gleixner: "This is the final step of Protection Keys support which adds the syscalls so user space can actually allocate keys and protect memory areas with them. Details and usage examples can be found in the documentation. The mm side of this has been acked by Mel" * 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pkeys: Update documentation x86/mm/pkeys: Do not skip PKRU register if debug registers are not used x86/pkeys: Fix pkeys build breakage for some non-x86 arches x86/pkeys: Add self-tests x86/pkeys: Allow configuration of init_pkru x86/pkeys: Default to a restrictive init PKRU pkeys: Add details of system call use to Documentation/ generic syscalls: Wire up memory protection keys syscalls x86: Wire up protection keys system calls x86/pkeys: Allocation/free syscalls x86/pkeys: Make mprotect_key() mask off additional vm_flags mm: Implement new pkey_mprotect() system call x86/pkeys: Add fault handling for PF_PK page fault bit
| * | | | | x86/pkeys: Default to a restrictive init PKRUDave Hansen2016-09-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PKRU is the register that lets you disallow writes or all access to a given protection key. The XSAVE hardware defines an "init state" of 0 for PKRU: its most permissive state, allowing access/writes to everything. Since we start off all new processes with the init state, we start all processes off with the most permissive possible PKRU. This is unfortunate. If a thread is clone()'d [1] before a program has time to set PKRU to a restrictive value, that thread will be able to write to all data, no matter what pkey is set on it. This weakens any integrity guarantees that we want pkeys to provide. To fix this, we define a very restrictive PKRU to override the XSAVE-provided value when we create a new FPU context. We choose a value that only allows access to pkey 0, which is as restrictive as we can practically make it. This does not cause any practical problems with applications using protection keys because we require them to specify initial permissions for each key when it is allocated, which override the restrictive default. In the end, this ensures that threads which do not know how to manage their own pkey rights can not do damage to data which is pkey-protected. I would have thought this was a pretty contrived scenario, except that I heard a bug report from an MPX user who was creating threads in some very early code before main(). It may be crazy, but folks evidently _do_ it. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: linux-arch@vger.kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: mgorman@techsingularity.net Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163021.F3C25D4A@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | x86/pkeys: Allocation/free syscallsDave Hansen2016-09-093-7/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds two new system calls: int pkey_alloc(unsigned long flags, unsigned long init_access_rights) int pkey_free(int pkey); These implement an "allocator" for the protection keys themselves, which can be thought of as analogous to the allocator that the kernel has for file descriptors. The kernel tracks which numbers are in use, and only allows operations on keys that are valid. A key which was not obtained by pkey_alloc() may not, for instance, be passed to pkey_mprotect(). These system calls are also very important given the kernel's use of pkeys to implement execute-only support. These help ensure that userspace can never assume that it has control of a key unless it first asks the kernel. The kernel does not promise to preserve PKRU (right register) contents except for allocated pkeys. The 'init_access_rights' argument to pkey_alloc() specifies the rights that will be established for the returned pkey. For instance: pkey = pkey_alloc(flags, PKEY_DENY_WRITE); will allocate 'pkey', but also sets the bits in PKRU[1] such that writing to 'pkey' is already denied. The kernel does not prevent pkey_free() from successfully freeing in-use pkeys (those still assigned to a memory range by pkey_mprotect()). It would be expensive to implement the checks for this, so we instead say, "Just don't do it" since sane software will never do it anyway. Any piece of userspace calling pkey_alloc() needs to be prepared for it to fail. Why? pkey_alloc() returns the same error code (ENOSPC) when there are no pkeys and when pkeys are unsupported. They can be unsupported for a whole host of reasons, so apps must be prepared for this. Also, libraries or LD_PRELOADs might steal keys before an application gets access to them. This allocation mechanism could be implemented in userspace. Even if we did it in userspace, we would still need additional user/kernel interfaces to tell userspace which keys are being used by the kernel internally (such as for execute-only mappings). Having the kernel provide this facility completely removes the need for these additional interfaces, or having an implementation of this in userspace at all. Note that we have to make changes to all of the architectures that do not use mman-common.h because we use the new PKEY_DENY_ACCESS/WRITE macros in arch-independent code. 1. PKRU is the Protection Key Rights User register. It is a usermode-accessible register that controls whether writes and/or access to each individual pkey is allowed or denied. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: linux-arch@vger.kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163015.444FE75F@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | x86/pkeys: Make mprotect_key() mask off additional vm_flagsDave Hansen2016-09-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Today, mprotect() takes 4 bits of data: PROT_READ/WRITE/EXEC/NONE. Three of those bits: READ/WRITE/EXEC get translated directly in to vma->vm_flags by calc_vm_prot_bits(). If a bit is unset in mprotect()'s 'prot' argument then it must be cleared in vma->vm_flags during the mprotect() call. We do this clearing today by first calculating the VMA flags we want set, then clearing the ones we do not want to inherit from the original VMA: vm_flags = calc_vm_prot_bits(prot, key); ... newflags = vm_flags; newflags |= (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC)); However, we *also* want to mask off the original VMA's vm_flags in which we store the protection key. To do that, this patch adds a new macro: ARCH_VM_PKEY_FLAGS which allows the architecture to specify additional bits that it would like cleared. We use that to ensure that the VM_PKEY_BIT* bits get cleared. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163013.E48D6981@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | mm: Implement new pkey_mprotect() system callDave Hansen2016-09-092-7/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pkey_mprotect() is just like mprotect, except it also takes a protection key as an argument. On systems that do not support protection keys, it still works, but requires that key=0. Otherwise it does exactly what mprotect does. I expect it to get used like this, if you want to guarantee that any mapping you create can *never* be accessed without the right protection keys set up. int real_prot = PROT_READ|PROT_WRITE; pkey = pkey_alloc(0, PKEY_DENY_ACCESS); ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey); This way, there is *no* window where the mapping is accessible since it was always either PROT_NONE or had a protection key set that denied all access. We settled on 'unsigned long' for the type of the key here. We only need 4 bits on x86 today, but I figured that other architectures might need some more space. Semantically, we have a bit of a problem if we combine this syscall with our previously-introduced execute-only support: What do we do when we mix execute-only pkey use with pkey_mprotect() use? For instance: pkey_mprotect(ptr, PAGE_SIZE, PROT_WRITE, 6); // set pkey=6 mprotect(ptr, PAGE_SIZE, PROT_EXEC); // set pkey=X_ONLY_PKEY? mprotect(ptr, PAGE_SIZE, PROT_WRITE); // is pkey=6 again? To solve that, we make the plain-mprotect()-initiated execute-only support only apply to VMAs that have the default protection key (0) set on them. Proposed semantics: 1. protection key 0 is special and represents the default, "unassigned" protection key. It is always allocated. 2. mprotect() never affects a mapping's pkey_mprotect()-assigned protection key. A protection key of 0 (even if set explicitly) represents an unassigned protection key. 2a. mprotect(PROT_EXEC) on a mapping with an assigned protection key may or may not result in a mapping with execute-only properties. pkey_mprotect() plus pkey_set() on all threads should be used to _guarantee_ execute-only semantics if this is not a strong enough semantic. 3. mprotect(PROT_EXEC) may result in an "execute-only" mapping. The kernel will internally attempt to allocate and dedicate a protection key for the purpose of execute-only mappings. This may not be possible in cases where there are no free protection keys available. It can also happen, of course, in situations where there is no hardware support for protection keys. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: linux-arch@vger.kernel.org Cc: Dave Hansen <dave@sr71.net> Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163012.3DDD36C4@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2016-10-102-15/+6
|\ \ \ \ \ \ | | |_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Thomas Gleixner: "A pile of regression fixes and updates: - address the fallout of the patches which made the cpuid - nodeid relation permanent: Handling of invalid APIC ids and preventing pointless warning messages. - force eager FPU when protection keys are enabled. Protection keys are not generating FPU exceptions so they cannot work with the lazy FPU mechanism. - prevent force migration of interrupts which are not part of the CPU vector domain. - handle the fact that APIC ids are not updated in the ACPI/MADT tables on physical CPU hotplug - remove bash-isms from syscall table generator script - use the hypervisor supplied APIC frequency when running on VMware" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pkeys: Make protection keys an "eager" feature x86/apic: Prevent pointless warning messages x86/acpi: Prevent LAPIC id 0xff from being accounted arch/x86: Handle non enumerated CPU after physical hotplug x86/unwind: Fix oprofile module link error x86/vmware: Skip lapic calibration on VMware x86/syscalls: Remove bash-isms in syscall table generator x86/irq: Prevent force migration of irqs which are not in the vector domain
| * | | | | x86/pkeys: Make protection keys an "eager" featureDave Hansen2016-10-081-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our XSAVE features are divided into two categories: those that generate FPU exceptions, and those that do not. MPX and pkeys do not generate FPU exceptions and thus can not be used lazily. We disable them when lazy mode is forced on. We have a pair of masks to collect these two sets of features, but XFEATURE_MASK_PKRU was added to the wrong mask: XFEATURE_MASK_LAZY. Fix it by moving the feature to XFEATURE_MASK_EAGER. Note: this only causes problem if you boot with lazy FPU mode (eagerfpu=off) which is *not* the default. It also only affects hardware which is not currently publicly available. It looks like eager mode is going away, but we still need this patch applied to any kernel that has protection keys and lazy mode, which is 4.6 through 4.8 at this point, and 4.9 if the lazy removal isn't sent to Linus for 4.9. Fixes: c8df40098451 ("x86/fpu, x86/mm/pkeys: Add PKRU xsave fields and data structures") Signed-off-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20161007162342.28A49813@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | x86/unwind: Fix oprofile module link errorJosh Poimboeuf2016-10-061-12/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling on x86 with CONFIG_OPROFILE=m and CONFIG_FRAME_POINTER=n, the oprofile module fails to link: ERROR: ftrace_graph_ret_addr" [arch/x86/oprofile/oprofile.ko] undefined! The problem was introduced when oprofile was converted to use the new x86 unwinder. When frame pointers are disabled, the "guess" unwinder's unwind_get_return_address() is an inline function which calls ftrace_graph_ret_addr(), which is not exported. Fix it by converting the "guess" version of unwind_get_return_address() to an exported out-of-line function, just like its frame pointer counterpart. Reported-by: Karl Beldan <karl.beldan@gmail.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: ec2ad9ccf12d ("oprofile/x86: Convert x86_backtrace() to use the new unwinder") Link: http://lkml.kernel.org/r/be08d589f6474df78364e081c42777e382af9352.1475731632.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | nmi_backtrace: generate one-line reports for idle cpusChris Metcalf2016-10-081-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing an nmi backtrace of many cores, most of which are idle, the output is a little overwhelming and very uninformative. Suppress messages for cpus that are idling when they are interrupted and just emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN". We do this by grouping all the cpuidle code together into a new .cpuidle.text section, and then checking the address of the interrupted PC to see if it lies within that section. This commit suitably tags x86 and tile idle routines, and only adds in the minimal framework for other architectures. Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Tested-by: Petr Mladek <pmladek@suse.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | nmi_backtrace: add more trigger_*_cpu_backtrace() methodsChris Metcalf2016-10-081-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "improvements to the nmi_backtrace code" v9. This patch series modifies the trigger_xxx_backtrace() NMI-based remote backtracing code to make it more flexible, and makes a few small improvements along the way. The motivation comes from the task isolation code, where there are scenarios where we want to be able to diagnose a case where some cpu is about to interrupt a task-isolated cpu. It can be helpful to see both where the interrupting cpu is, and also an approximation of where the cpu that is being interrupted is. The nmi_backtrace framework allows us to discover the stack of the interrupted cpu. I've tested that the change works as desired on tile, and build-tested x86, arm, mips, and sparc64. For x86 I confirmed that the generic cpuidle stuff as well as the architecture-specific routines are in the new cpuidle section. For arm, mips, and sparc I just build-tested it and made sure the generic cpuidle routines were in the new cpuidle section, but I didn't attempt to figure out which the platform-specific idle routines might be. That might be more usefully done by someone with platform experience in follow-up patches. This patch (of 4): Currently you can only request a backtrace of either all cpus, or all cpus but yourself. It can also be helpful to request a remote backtrace of a single cpu, and since we want that, the logical extension is to support a cpumask as the underlying primitive. This change modifies the existing lib/nmi_backtrace.c code to take a cpumask as its basic primitive, and modifies the linux/nmi.h code to use the new "cpumask" method instead. The existing clients of nmi_backtrace (arm and x86) are converted to using the new cpumask approach in this change. The other users of the backtracing API (sparc64 and mips) are converted to use the cpumask approach rather than the all/allbutself approach. The mips code ignored the "include_self" boolean but with this change it will now also dump a local backtrace if requested. Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | mm: move phys_mem_access_prot_allowed() declaration to pgtable.hBaoyou Xie2016-10-081-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We get 1 warning when building kernel with W=1: drivers/char/mem.c:220:12: warning: no previous prototype for 'phys_mem_access_prot_allowed' [-Wmissing-prototypes] int __weak phys_mem_access_prot_allowed(struct file *file, In fact, its declaration is spreading to several header files in different architecture, but need to be declare in common header file. So this patch moves phys_mem_access_prot_allowed() to pgtable.h. Link: http://lkml.kernel.org/r/1473751597-12139-1-git-send-email-baoyou.xie@linaro.org Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge tag 'pci-v4.9-changes' of ↵Linus Torvalds2016-10-071-0/+14
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Summary of PCI changes for the v4.9 merge window: Enumeration: - microblaze: Add multidomain support for procfs (Bharat Kumar Gogada) Resource management: - Ignore requested alignment for PROBE_ONLY and fixed resources (Yongji Xie) - Ignore requested alignment for VF BARs (Yongji Xie) PCI device hotplug: - Make core explicitly non-modular (Paul Gortmaker) PCIe native device hotplug: - Rename pcie_isr() locals for clarity (Bjorn Helgaas) - Return IRQ_NONE when we can't read interrupt status (Bjorn Helgaas) - Remove unnecessary guard (Bjorn Helgaas) - Clean up dmesg "Slot(%s)" messages (Bjorn Helgaas) - Remove useless pciehp_get_latch_status() calls (Bjorn Helgaas) - Clear attention LED on device add (Keith Busch) - Allow exclusive userspace control of indicators (Keith Busch) - Process all hotplug events before looking for new ones (Mayurkumar Patel) - Don't re-read Slot Status when queuing hotplug event (Mayurkumar Patel) - Don't re-read Slot Status when handling surprise event (Mayurkumar Patel) - Make explicitly non-modular (Paul Gortmaker) Power management: - Afford direct-complete to devices with non-standard PM (Lukas Wunner) - Query platform firmware for device power state (Lukas Wunner) - Recognize D3cold in pci_update_current_state() (Lukas Wunner) - Avoid unnecessary resume after direct-complete (Lukas Wunner) - Make explicitly non-modular (Paul Gortmaker) Virtualization: - Mark Atheros AR9580 to avoid bus reset (Maik Broemme) - Check for pci_setup_device() failure in pci_iov_add_virtfn() (Po Liu) MSI: - Enable PCI_MSI_IRQ_DOMAIN support for ARC (Joao Pinto) AER: - Remove aerdriver.nosourceid kernel parameter (Bjorn Helgaas) - Remove aerdriver.forceload kernel parameter (Bjorn Helgaas) - Fix aer_probe() kernel-doc comment (Cao jin) - Add bus flag to skip source ID matching (Jon Derrick) - Avoid memory allocation in interrupt handling path (Jon Derrick) - Cache capability position (Keith Busch) - Make explicitly non-modular (Paul Gortmaker) - Remove duplicate AER severity translation (Tyler Baicar) - Send correct severity to calculate AER severity (Tyler Baicar) Precision Time Measurement: - Add Precision Time Measurement (PTM) support (Jonathan Yong) - Add PTM clock granularity information (Bjorn Helgaas) - Add pci_enable_ptm() for drivers to enable PTM on endpoints (Bjorn Helgaas) Generic host bridge driver: - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) - Make explicitly non-modular (Paul Gortmaker) Altera host bridge driver: - Remove redundant platform_get_resource() return value check (Bjorn Helgaas) - Poll for link training status after retraining the link (Ley Foon Tan) - Rework config accessors for use without a struct pci_bus (Ley Foon Tan) - Move retrain from fixup to altera_pcie_host_init() (Ley Foon Tan) - Make MSI explicitly non-modular (Paul Gortmaker) - Make explicitly non-modular (Paul Gortmaker) - Relax device number checking to allow SR-IOV (Po Liu) ARM Versatile host bridge driver: - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) Axis ARTPEC-6 host bridge driver: - Drop __init from artpec6_add_pcie_port() (Niklas Cassel) Freescale i.MX6 host bridge driver: - Make explicitly non-modular (Paul Gortmaker) Intel VMD host bridge driver: - Add quirk for AER to ignore source ID (Jon Derrick) - Allocate IRQ lists with correct MSI-X count (Jon Derrick) - Convert to use pci_alloc_irq_vectors() API (Jon Derrick) - Eliminate vmd_vector member from list type (Jon Derrick) - Eliminate index member from IRQ list (Jon Derrick) - Synchronize with RCU freeing MSI IRQ descs (Keith Busch) - Request userspace control of PCIe hotplug indicators (Keith Busch) - Move VMD driver to drivers/pci/host (Keith Busch) Marvell Aardvark host bridge driver: - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) - Remove redundant dev_err call in advk_pcie_probe() (Wei Yongjun) Microsoft Hyper-V host bridge driver: - Use zero-length array in struct pci_packet (Dexuan Cui) - Use pci_function_description[0] in struct definitions (Dexuan Cui) - Remove the unused 'wrk' in struct hv_pcibus_device (Dexuan Cui) - Handle vmbus_sendpacket() failure in hv_compose_msi_msg() (Dexuan Cui) - Handle hv_pci_generic_compl() error case (Dexuan Cui) - Use list_move_tail() instead of list_del() + list_add_tail() (Wei Yongjun) NVIDIA Tegra host bridge driver: - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) - Remove redundant _data suffix (Thierry Reding) - Use of_device_get_match_data() (Thierry Reding) Qualcomm host bridge driver: - Make explicitly non-modular (Paul Gortmaker) Renesas R-Car host bridge driver: - Consolidate register space lookup and ioremap (Bjorn Helgaas) - Don't disable/unprepare clocks on prepare/enable failure (Geert Uytterhoeven) - Add multi-MSI support (Grigory Kletsko) - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) - Fix some checkpatch warnings (Sergei Shtylyov) - Try increasing PCIe link speed to 5 GT/s at boot (Sergei Shtylyov) Rockchip host bridge driver: - Add DT bindings for Rockchip PCIe controller (Shawn Lin) - Add Rockchip PCIe controller support (Shawn Lin) - Improve the deassert sequence of four reset pins (Shawn Lin) - Fix wrong transmitted FTS count (Shawn Lin) - Increase the Max Credit update interval (Rajat Jain) Samsung Exynos host bridge driver: - Make explicitly non-modular (Paul Gortmaker) ST Microelectronics SPEAr13xx host bridge driver: - Make explicitly non-modular (Paul Gortmaker) Synopsys DesignWare host bridge driver: - Return data directly from dw_pcie_readl_rc() (Bjorn Helgaas) - Exchange viewport of `MEMORYs' and `CFGs/IOs' (Dong Bo) - Check LTSSM training bit before deciding link is up (Jisheng Zhang) - Move link wait definitions to .c file (Joao Pinto) - Wait for iATU enable (Joao Pinto) - Add iATU Unroll feature (Joao Pinto) - Fix pci_remap_iospace() failure path (Lorenzo Pieralisi) - Make explicitly non-modular (Paul Gortmaker) - Relax device number checking to allow SR-IOV (Po Liu) - Keep viewport fixed for IO transaction if num_viewport > 2 (Pratyush Anand) - Remove redundant platform_get_resource() return value check (Wei Yongjun) TI DRA7xx host bridge driver: - Make explicitly non-modular (Paul Gortmaker) TI Keystone host bridge driver: - Propagate request_irq() failure (Wei Yongjun) Xilinx AXI host bridge driver: - Keep both legacy and MSI interrupt domain references (Bharat Kumar Gogada) - Clear interrupt register for invalid interrupt (Bharat Kumar Gogada) - Clear correct MSI set bit (Bharat Kumar Gogada) - Dispose of MSI virtual IRQ (Bharat Kumar Gogada) - Make explicitly non-modular (Paul Gortmaker) - Relax device number checking to allow SR-IOV (Po Liu) Xilinx NWL host bridge driver: - Expand error logging (Bharat Kumar Gogada) - Enable all MSI interrupts using MSI mask (Bharat Kumar Gogada) - Make explicitly non-modular (Paul Gortmaker) Miscellaneous: - Drop CONFIG_KEXEC_CORE ifdeffery (Lukas Wunner) - portdrv: Make explicitly non-modular (Paul Gortmaker) - Make DPC explicitly non-modular (Paul Gortmaker)" * tag 'pci-v4.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (105 commits) x86/PCI: VMD: Move VMD driver to drivers/pci/host PCI: rockchip: Fix wrong transmitted FTS count PCI: rockchip: Improve the deassert sequence of four reset pins PCI: rockchip: Increase the Max Credit update interval PCI: rcar: Try increasing PCIe link speed to 5 GT/s at boot PCI/AER: Fix aer_probe() kernel-doc comment PCI: Ignore requested alignment for VF BARs PCI: Ignore requested alignment for PROBE_ONLY and fixed resources PCI: Avoid unnecessary resume after direct-complete PCI: Recognize D3cold in pci_update_current_state() PCI: Query platform firmware for device power state PCI: Afford direct-complete to devices with non-standard PM PCI/AER: Cache capability position PCI/AER: Avoid memory allocation in interrupt handling path x86/PCI: VMD: Request userspace control of PCIe hotplug indicators PCI: pciehp: Allow exclusive userspace control of indicators ACPI / APEI: Send correct severity to calculate AER severity PCI/AER: Remove duplicate AER severity translation x86/PCI: VMD: Synchronize with RCU freeing MSI IRQ descs x86/PCI: VMD: Eliminate index member from IRQ list ...
| * | | | | | x86/PCI: VMD: Request userspace control of PCIe hotplug indicatorsKeith Busch2016-09-231-0/+14
| | |_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add set_dev_domain_options() to set PCI domain-specific options as devices are added. The first usage is to request exclusive userspace control of PCIe hotplug indicators in VMD domains. Devices in a VMD domain use PCIe hotplug Attention and Power Indicators in a non-standard way; tell pciehp to ignore the indicators so userspace can control them via the sysfs "attention" file. To determine whether a bus is within a VMD domain, add a bool to the pci_sysdata structure that the VMD driver sets during initialization. [bhelgaas: changelog] Requested-by: Kapil Karkra <kapil.karkra@intel.com> Tested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | | | | | Merge tag 'for-linus-4.9-rc0-tag' of ↵Linus Torvalds2016-10-061-11/+0
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from David Vrabel: "xen features and fixes for 4.9: - switch to new CPU hotplug mechanism - support driver_override in pciback - require vector callback for HVM guests (the alternate mechanism via the platform device has been broken for ages)" * tag 'for-linus-4.9-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/x86: Update topology map for PV VCPUs xen/x86: Initialize per_cpu(xen_vcpu, 0) a little earlier xen/pciback: support driver_override xen/pciback: avoid multiple entries in slot list xen/pciback: simplify pcistub device handling xen: Remove event channel notification through Xen PCI platform device xen/events: Convert to hotplug state machine xen/x86: Convert to hotplug state machine x86/xen: add missing \n at end of printk warning message xen/grant-table: Use kmalloc_array() in arch_gnttab_valloc() xen: Make VPMU init message look less scary xen: rename xen_pmu_init() in sys-hypervisor.c hotplug: Prevent alloc/free of irq descriptors during cpu up/down (again) xen/x86: Move irq allocation from Xen smp_op.cpu_up()
| * | | | | | xen: Remove event channel notification through Xen PCI platform deviceKarimAllah Ahmed2016-09-301-11/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ever since commit 254d1a3f02eb ("xen/pv-on-hvm kexec: shutdown watches from old kernel") using the INTx interrupt from Xen PCI platform device for event channel notification would just lockup the guest during bootup. postcore_initcall now calls xs_reset_watches which will eventually try to read a value from XenStore and will get stuck on read_reply at XenBus forever since the platform driver is not probed yet and its INTx interrupt handler is not registered yet. That means that the guest can not be notified at this moment of any pending event channels and none of the per-event handlers will ever be invoked (including the XenStore one) and the reply will never be picked up by the kernel. The exact stack where things get stuck during xenbus_init: -xenbus_init -xs_init -xs_reset_watches -xenbus_scanf -xenbus_read -xs_single -xs_single -xs_talkv Vector callbacks have always been the favourite event notification mechanism since their introduction in commit 38e20b07efd5 ("x86/xen: event channels delivery on HVM.") and the vector callback feature has always been advertised for quite some time by Xen that's why INTx was broken for several years now without impacting anyone. Luckily this also means that event channel notification through INTx is basically dead-code which can be safely removed without impacting anybody since it has been effectively disabled for more than 4 years with nobody complaining about it (at least as far as I'm aware of). This commit removes event channel notification through Xen PCI platform device. Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Julien Grall <julien.grall@citrix.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Ross Lagerwall <ross.lagerwall@citrix.com> Cc: xen-devel@lists.xenproject.org Cc: linux-kernel@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: Anthony Liguori <aliguori@amazon.com> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | | | | | | Merge tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2016-10-062-39/+44
|\ \ \ \ \ \ \ | |_|_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Radim Krčmář: "All architectures: - move `make kvmconfig` stubs from x86 - use 64 bits for debugfs stats ARM: - Important fixes for not using an in-kernel irqchip - handle SError exceptions and present them to guests if appropriate - proxying of GICV access at EL2 if guest mappings are unsafe - GICv3 on AArch32 on ARMv8 - preparations for GICv3 save/restore, including ABI docs - cleanups and a bit of optimizations MIPS: - A couple of fixes in preparation for supporting MIPS EVA host kernels - MIPS SMP host & TLB invalidation fixes PPC: - Fix the bug which caused guests to falsely report lockups - other minor fixes - a small optimization s390: - Lazy enablement of runtime instrumentation - up to 255 CPUs for nested guests - rework of machine check deliver - cleanups and fixes x86: - IOMMU part of AMD's AVIC for vmexit-less interrupt delivery - Hyper-V TSC page - per-vcpu tsc_offset in debugfs - accelerated INS/OUTS in nVMX - cleanups and fixes" * tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (140 commits) KVM: MIPS: Drop dubious EntryHi optimisation KVM: MIPS: Invalidate TLB by regenerating ASIDs KVM: MIPS: Split kernel/user ASID regeneration KVM: MIPS: Drop other CPU ASIDs on guest MMU changes KVM: arm/arm64: vgic: Don't flush/sync without a working vgic KVM: arm64: Require in-kernel irqchip for PMU support KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie KVM: PPC: BookE: Fix a sanity check KVM: PPC: Book3S HV: Take out virtual core piggybacking code KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread ARM: gic-v3: Work around definition of gic_write_bpr1 KVM: nVMX: Fix the NMI IDT-vectoring handling KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive KVM: nVMX: Fix reload apic access page warning kvmconfig: add virtio-gpu to config fragment config: move x86 kvm_guest.config to a common location arm64: KVM: Remove duplicating init code for setting VMID ARM: KVM: Support vgic-v3 ...
| * | | | | | KVM: x86: Hyper-V tsc page setupPaolo Bonzini2016-09-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lately tsc page was implemented but filled with empty values. This patch setup tsc page scale and offset based on vcpu tsc, tsc_khz and HV_X64_MSR_TIME_REF_COUNT value. The valid tsc page drops HV_X64_MSR_TIME_REF_COUNT msr reads count to zero which potentially improves performance. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Reviewed-by: Peter Hornyack <peterhornyack@google.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Roman Kagan <rkagan@virtuozzo.com> CC: Denis V. Lunev <den@openvz.org> [Computation of TSC page parameters rewritten to use the Linux timekeeper parameters. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: introduce get_kvmclock_nsPaolo Bonzini2016-09-201-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a function that reads the exact nanoseconds value that is provided to the guest in kvmclock. This crystallizes the notion of kvmclock as a thin veneer over a stable TSC, that the guest will (hopefully) convert with NTP. In other words, kvmclock is *not* a paravirtualized host-to-guest NTP. Drop the get_kernel_ns() function, that was used both to get the base value of the master clock and to get the current value of kvmclock. The former use is replaced by ktime_get_boot_ns(), the latter is the purpose of get_kernel_ns(). This also allows KVM to provide a Hyper-V time reference counter that is synchronized with the time that is computed from the TSC page. Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | kvm: x86: drop read_tsc_offset()Luiz Capitulino2016-09-161-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TSC offset can now be read directly from struct kvm_arch_vcpu. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | kvm: x86: add tsc_offset field to struct kvm_vcpu_archLuiz Capitulino2016-09-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A future commit will want to easily read a vCPU's TSC offset, so we store it in struct kvm_arch_vcpu_arch for easy access. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini2016-09-131-36/+36
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD Paul Mackerras writes: The highlights are: * Reduced latency for interrupts from PCI pass-through devices, from Suresh Warrier and me. * Halt-polling implementation from Suraj Jitindar Singh. * 64-bit VCPU statistics, also from Suraj. * Various other minor fixes and improvements.
| | * | | | | | KVM: Add provisioning for ulong vm stats and u64 vcpu statsSuraj Jitindar Singh2016-09-081-36/+36
| | |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vms and vcpus have statistics associated with them which can be viewed within the debugfs. Currently it is assumed within the vcpu_stat_get() and vm_stat_get() functions that all of these statistics are represented as u32s, however the next patch adds some u64 vcpu statistics. Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly. Since vcpu statistics are per vcpu, they will only be updated by a single vcpu at a time so this shouldn't present a problem on 32-bit machines which can't atomically increment 64-bit numbers. However vm statistics could potentially be updated by multiple vcpus from that vm at a time. To avoid the overhead of atomics make all vm statistics ulong such that they are 64-bit on 64-bit systems where they can be atomically incremented and are 32-bit on 32-bit systems which may not be able to atomically increment 64-bit numbers. Modify vm_stat_get() to expect ulongs. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Matlack <dmatlack@google.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | | | | svm: Introduce AMD IOMMU avic_ga_log_notifierSuravee Suthikulpanit2016-09-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces avic_ga_log_notifier, which will be called by IOMMU driver whenever it handles the Guest vAPIC (GA) log entry. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | svm: Introduces AVIC per-VM IDSuravee Suthikulpanit2016-09-081-0/+1
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduces per-VM AVIC ID and helper functions to manage the IDs. Currently, the ID will be used to implement 32-bit AVIC IOMMU GA tag. The ID is 24-bit one-based indexing value, and is managed via helper functions to get the next ID, or to free an ID once a VM is destroyed. There should be no ID conflict for any active VMs. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | | | Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds2016-10-045-4/+22
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso updates from Ingo Molnar: "The main changes in this cycle centered around adding support for 32-bit compatible C/R of the vDSO on 64-bit kernels, by Dmitry Safonov" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Use CONFIG_X86_X32_ABI to enable vdso prctl x86/vdso: Only define map_vdso_randomized() if CONFIG_X86_64 x86/vdso: Only define prctl_map_vdso() if CONFIG_CHECKPOINT_RESTORE x86/signal: Add SA_{X32,IA32}_ABI sa_flags x86/ptrace: Down with test_thread_flag(TIF_IA32) x86/coredump: Use pr_reg size, rather that TIF_IA32 flag x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_* x86/vdso: Replace calculate_addr in map_vdso() with addr x86/vdso: Unmap vdso blob on vvar mapping failure
| * | | | | | x86/signal: Add SA_{X32,IA32}_ABI sa_flagsDmitry Safonov2016-09-142-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce new flags that defines which ABI to use on creating sigframe. Those flags kernel will set according to sigaction syscall ABI, which set handler for the signal being delivered. So that will drop the dependency on TIF_IA32/TIF_X32 flags on signal deliver. Those flags will be used only under CONFIG_COMPAT. Similar way ARM uses sa_flags to differ in which mode deliver signal for 26-bit applications (look at SA_THIRYTWO). Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: 0x7f454c46@gmail.com Cc: oleg@redhat.com Cc: linux-mm@kvack.org Cc: gorcunov@openvz.org Cc: xemul@virtuozzo.com Link: http://lkml.kernel.org/r/20160905133308.28234-7-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | x86/coredump: Use pr_reg size, rather that TIF_IA32 flagDmitry Safonov2016-09-141-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Killed PR_REG_SIZE and PR_REG_PTR macro as we can get regset size from regset view. I wish I could also kill PRSTATUS_SIZE nicely. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: 0x7f454c46@gmail.com Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: gorcunov@openvz.org Cc: xemul@virtuozzo.com Link: http://lkml.kernel.org/r/20160905133308.28234-5-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_*Dmitry Safonov2016-09-142-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add API to change vdso blob type with arch_prctl. As this is usefull only by needs of CRIU, expose this interface under CONFIG_CHECKPOINT_RESTORE. Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: 0x7f454c46@gmail.com Cc: oleg@redhat.com Cc: linux-mm@kvack.org Cc: gorcunov@openvz.org Cc: xemul@virtuozzo.com Link: http://lkml.kernel.org/r/20160905133308.28234-4-dsafonov@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | | | Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds2016-10-044-5/+48
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: "The main changes in this cycle were: - SGI UV updates (Andrew Banman) - Intel MID updates (Andy Shevchenko) - Initial Mellanox systems platform (Vadim Pasternak)" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/mellanox: Fix return value check in mlxplat_init() x86/platform/mellanox: Introduce support for Mellanox systems platform x86/platform/uv/BAU: Add UV4-specific functions x86/platform/uv/BAU: Fix payload queue setup on UV4 hardware x86/platform/uv/BAU: Disable software timeout on UV4 hardware x86/platform/uv/BAU: Populate ->uvhub_version with UV4 version information x86/platform/uv/BAU: Use generic function pointers x86/platform/uv/BAU: Add generic function pointers x86/platform/uv/BAU: Convert uv_physnodeaddr() use to uv_gpa_to_offset() x86/platform/uv/BAU: Clean up pq_init() x86/platform/uv/BAU: Clean up and update printks x86/platform/uv/BAU: Clean up vertical alignment x86/platform/intel-mid: Keep SRAM powered on at boot x86/platform/intel-mid: Add Intel Penwell to ID table x86/cpu: Rename Merrifield2 to Moorefield x86/platform/intel-mid: Implement power off sequence x86/platform/intel-mid: Enable SD card detection on Merrifield x86/platform/intel-mid: Enable WiFi on Intel Edison x86/platform/intel-mid: Run PWRMU command immediately
| * | | | | | | x86/platform/uv/BAU: Add UV4-specific functionsAndrew Banman2016-09-221-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the UV4-specific function definitions and define an operations struct to implement them in the BAU driver. Many BAU MMRs, although functionally the same, have new addresses on UV4 due to hardware changes. Each MMR requires new read/write functions, but their implementation in the driver does not change. Thus, it is enough to enumerate them in the operations struct for the changes to take effect. Signed-off-by: Andrew Banman <abanman@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <travis@sgi.com> Acked-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: akpm@linux-foundation.org Cc: rja@sgi.com Link: http://lkml.kernel.org/r/1474474161-265604-11-git-send-email-abanman@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>