summaryrefslogtreecommitdiffstats
path: root/arch (follow)
Commit message (Collapse)AuthorAgeFilesLines
* kbuild: remove {CLEAN,MRPROPER,DISTCLEAN}_DIRSMasahiro Yamada2020-05-251-1/+1
| | | | | | | | | Merge {CLEAN,MRPROPER,DISTCLEAN}_DIRS into {CLEAN,MRPROPER,DISTCLEAN}_FILES because the difference is just the -r option passed to the 'rm' command. Do likewise as commit 1634f2bfdb84 ("kbuild: remove clean-dirs syntax"). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* kbuild: remove unused AS assignmentMasahiro Yamada2020-05-122-4/+0
| | | | | | | | | | | $(AS) is not used anywhere in the kernel build, hence commit aa824e0c962b ("kbuild: remove AS variable") killed it. Remove the left-over code in arch/{arm,arm64}/Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Will Deacon <will@kernel.org>
* unicore32: do not evaluate compiler's library path when cleaningMasahiro Yamada2020-05-121-2/+2
| | | | | | | | | | | | | | | | | | | Since commit a83e4ca26af8 ("kbuild: remove cc-option switch from -Wframe-larger-than="), 'make ARCH=unicore32 clean' emits error messages as follows: $ make ARCH=unicore32 clean gcc: error: missing argument to '-Wframe-larger-than=' gcc: error: missing argument to '-Wframe-larger-than=' We do not care compiler flags when cleaning. Use the '=' operator for lazy expansion because we do not use GNU_LIBC_A or GNU_LIBGCC_A when cleaning. Fixes: a83e4ca26af8 ("kbuild: remove cc-option switch from -Wframe-larger-than=") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
* h8300: suppress error messages for 'make clean'Masahiro Yamada2020-05-121-1/+1
| | | | | | | | | | | | | 'make ARCH=h8300 clean' emits error messages as follows: $ make ARCH=h8300 clean gcc: error: missing argument to '-Wframe-larger-than=' gcc: error: unrecognized command line option '-mint32' You can suppress the second one by setting the correct CROSS_COMPILE=, but we should not require any compiler for cleaning. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* hexagon: suppress error message for 'make clean'Masahiro Yamada2020-05-121-1/+1
| | | | | | | | | | | | | 'make ARCH=hexagon clean' emits an error message as follows: $ make ARCH=hexagon clean gcc: error: unrecognized command line option '-G0' You can suppress it by setting the correct CROSS_COMPILE=, but we should not require any compiler for cleaning. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Brian Cain <bcain@codeaurora.org>
* um: do not evaluate compiler's library path when cleaningMasahiro Yamada2020-05-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | Since commit a83e4ca26af8 ("kbuild: remove cc-option switch from -Wframe-larger-than="), 'make ARCH=um clean' emits an error message as follows: $ make ARCH=um clean gcc: error: missing argument to '-Wframe-larger-than=' We do not care compiler flags when cleaning. Use the '=' operator for lazy expansion because we do not use LDFLAGS_pcap.o or LDFLAGS_vde.o when cleaning. While I was here, I removed the redundant -r option because it already exists in the recipe. Fixes: a83e4ca26af8 ("kbuild: remove cc-option switch from -Wframe-larger-than=") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> [build]
* Merge tag 'x86-urgent-2020-05-10' of ↵Linus Torvalds2020-05-109-86/+133
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes for x86: - Ensure that direct mapping alias is always flushed when changing page attributes. The optimization for small ranges failed to do so when the virtual address was in the vmalloc or module space. - Unbreak the trace event registration for syscalls without arguments caused by the refactoring of the SYSCALL_DEFINE0() macro. - Move the printk in the TSC deadline timer code to a place where it is guaranteed to only be called once during boot and cannot be rearmed by clearing warn_once after boot. If it's invoked post boot then lockdep rightfully complains about a potential deadlock as the calling context is different. - A series of fixes for objtool and the ORC unwinder addressing variety of small issues: - Stack offset tracking for indirect CFAs in objtool ignored subsequent pushs and pops - Repair the unwind hints in the register clearing entry ASM code - Make the unwinding in the low level exit to usermode code stop after switching to the trampoline stack. The unwind hint is no longer valid and the ORC unwinder emits a warning as it can't find the registers anymore. - Fix unwind hints in switch_to_asm() and rewind_stack_do_exit() which caused objtool to generate bogus ORC data. - Prevent unwinder warnings when dumping the stack of a non-current task as there is no way to be sure about the validity because the dumped stack can be a moving target. - Make the ORC unwinder behave the same way as the frame pointer unwinder when dumping an inactive tasks stack and do not skip the first frame. - Prevent ORC unwinding before ORC data has been initialized - Immediately terminate unwinding when a unknown ORC entry type is found. - Prevent premature stop of the unwinder caused by IRET frames. - Fix another infinite loop in objtool caused by a negative offset which was not catched. - Address a few build warnings in the ORC unwinder and add missing static/ro_after_init annotations" * tag 'x86-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULES x86/apic: Move TSC deadline timer debug printk ftrace/x86: Fix trace event registration for syscalls without arguments x86/mm/cpa: Flush direct map alias during cpa objtool: Fix infinite loop in for_offset_range() x86/unwind/orc: Fix premature unwind stoppage due to IRET frames x86/unwind/orc: Fix error path for bad ORC entry type x86/unwind/orc: Prevent unwinding before ORC initialization x86/unwind/orc: Don't skip the first frame for inactive tasks x86/unwind: Prevent false warnings for non-current tasks x86/unwind/orc: Convert global variables to static x86/entry/64: Fix unwind hints in rewind_stack_do_exit() x86/entry/64: Fix unwind hints in __switch_to_asm() x86/entry/64: Fix unwind hints in kernel exit path x86/entry/64: Fix unwind hints in register clearing code objtool: Fix stack offset tracking for indirect CFAs
| * x86/unwind/orc: Move ORC sorting variables under !CONFIG_MODULESJosh Poimboeuf2020-05-031-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following warnings seen with !CONFIG_MODULES: arch/x86/kernel/unwind_orc.c:29:26: warning: 'cur_orc_table' defined but not used [-Wunused-variable] 29 | static struct orc_entry *cur_orc_table = __start_orc_unwind; | ^~~~~~~~~~~~~ arch/x86/kernel/unwind_orc.c:28:13: warning: 'cur_orc_ip_table' defined but not used [-Wunused-variable] 28 | static int *cur_orc_ip_table = __start_orc_unwind_ip; | ^~~~~~~~~~~~~~~~ Fixes: 153eb2223c79 ("x86/unwind/orc: Convert global variables to static") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linux Next Mailing List <linux-next@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200428071640.psn5m7eh3zt2in4v@treble
| * x86/apic: Move TSC deadline timer debug printkThomas Gleixner2020-05-011-13/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Leon reported that the printk_once() in __setup_APIC_LVTT() triggers a lockdep splat due to a lock order violation between hrtimer_base::lock and console_sem, when the 'once' condition is reset via /sys/kernel/debug/clear_warn_once after boot. The initial printk cannot trigger this because that happens during boot when the local APIC timer is set up on the boot CPU. Prevent it by moving the printk to a place which is guaranteed to be only called once during boot. Mark the deadline timer check related functions and data __init while at it. Reported-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87y2qhoshi.fsf@nanos.tec.linutronix.de
| * ftrace/x86: Fix trace event registration for syscalls without argumentsKonstantin Khlebnikov2020-05-011-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The refactoring of SYSCALL_DEFINE0() macros removed the ABI stubs and simply defines __abi_sys_$NAME as alias of __do_sys_$NAME. As a result kallsyms_lookup() returns "__do_sys_$NAME" which does not match with the declared trace event name. See also commit 1c758a2202a6 ("tracing/x86: Update syscall trace events to handle new prefixed syscall func names"). Add __do_sys_ to the valid prefixes which are checked in arch_syscall_match_sym_name(). Fixes: d2b5de495ee9 ("x86/entry: Refactor SYSCALL_DEFINE0 macros") Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lkml.kernel.org/r/158636958997.7900.16485049455470033557.stgit@buzz
| * x86/mm/cpa: Flush direct map alias during cpaRick Edgecombe2020-04-301-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As an optimization, cpa_flush() was changed to optionally only flush the range in @cpa if it was small enough. However, this range does not include any direct map aliases changed in cpa_process_alias(). So small set_memory_() calls that touch that alias don't get the direct map changes flushed. This situation can happen when the virtual address taking variants are passed an address in vmalloc or modules space. In these cases, force a full TLB flush. Note this issue does not extend to cases where the set_memory_() calls are passed a direct map address, or page array, etc, as the primary target. In those cases the direct map would be flushed. Fixes: 935f5839827e ("x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation") Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200424105343.GA20730@hirez.programming.kicks-ass.net
| * x86/unwind/orc: Fix premature unwind stoppage due to IRET framesJosh Poimboeuf2020-04-253-14/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following execution path is possible: fsnotify() [ realign the stack and store previous SP in R10 ] <IRQ> [ only IRET regs saved ] common_interrupt() interrupt_entry() <NMI> [ full pt_regs saved ] ... [ unwind stack ] When the unwinder goes through the NMI and the IRQ on the stack, and then sees fsnotify(), it doesn't have access to the value of R10, because it only has the five IRET registers. So the unwind stops prematurely. However, because the interrupt_entry() code is careful not to clobber R10 before saving the full regs, the unwinder should be able to read R10 from the previously saved full pt_regs associated with the NMI. Handle this case properly. When encountering an IRET regs frame immediately after a full pt_regs frame, use the pt_regs as a backup which can be used to get the C register values. Also, note that a call frame resets the 'prev_regs' value, because a function is free to clobber the registers. For this fix to work, the IRET and full regs frames must be adjacent, with no FUNC frames in between. So replace the FUNC hint in interrupt_entry() with an IRET_REGS hint. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/97a408167cc09f1cfa0de31a7b70dd88868d743f.1587808742.git.jpoimboe@redhat.com
| * x86/unwind/orc: Fix error path for bad ORC entry typeJosh Poimboeuf2020-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the ORC entry type is unknown, nothing else can be done other than reporting an error. Exit the function instead of breaking out of the switch statement. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/a7fa668ca6eabbe81ab18b2424f15adbbfdc810a.1587808742.git.jpoimboe@redhat.com
| * x86/unwind/orc: Prevent unwinding before ORC initializationJosh Poimboeuf2020-04-251-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the unwinder is called before the ORC data has been initialized, orc_find() returns NULL, and it tries to fall back to using frame pointers. This can cause some unexpected warnings during boot. Move the 'orc_init' check from orc_find() to __unwind_init(), so that it doesn't even try to unwind from an uninitialized state. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/069d1499ad606d85532eb32ce39b2441679667d5.1587808742.git.jpoimboe@redhat.com
| * x86/unwind/orc: Don't skip the first frame for inactive tasksMiroslav Benes2020-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When unwinding an inactive task, the ORC unwinder skips the first frame by default. If both the 'regs' and 'first_frame' parameters of unwind_start() are NULL, 'state->sp' and 'first_frame' are later initialized to the same value for an inactive task. Given there is a "less than or equal to" comparison used at the end of __unwind_start() for skipping stack frames, the first frame is skipped. Drop the equal part of the comparison and make the behavior equivalent to the frame pointer unwinder. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/7f08db872ab59e807016910acdbe82f744de7065.1587808742.git.jpoimboe@redhat.com
| * x86/unwind: Prevent false warnings for non-current tasksJosh Poimboeuf2020-04-253-18/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's some daring kernel code out there which dumps the stack of another task without first making sure the task is inactive. If the task happens to be running while the unwinder is reading the stack, unusual unwinder warnings can result. There's no race-free way for the unwinder to know whether such a warning is legitimate, so just disable unwinder warnings for all non-current tasks. Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/ec424a2aea1d461eb30cab48a28c6433de2ab784.1587808742.git.jpoimboe@redhat.com
| * x86/unwind/orc: Convert global variables to staticJosh Poimboeuf2020-04-251-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These variables aren't used outside of unwind_orc.c, make them static. Also annotate some of them with '__ro_after_init', as applicable. Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/43ae310bf7822b9862e571f36ae3474cfde8f301.1587808742.git.jpoimboe@redhat.com
| * x86/entry/64: Fix unwind hints in rewind_stack_do_exit()Jann Horn2020-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The LEAQ instruction in rewind_stack_do_exit() moves the stack pointer directly below the pt_regs at the top of the task stack before calling do_exit(). Tell the unwinder to expect pt_regs. Fixes: 8c1f75587a18 ("x86/entry/64: Add unwind hint annotations") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/68c33e17ae5963854916a46f522624f8e1d264f2.1587808742.git.jpoimboe@redhat.com
| * x86/entry/64: Fix unwind hints in __switch_to_asm()Josh Poimboeuf2020-04-251-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UNWIND_HINT_FUNC has some limitations: specifically, it doesn't reset all the registers to undefined. This causes objtool to get confused about the RBP push in __switch_to_asm(), resulting in bad ORC data. While __switch_to_asm() does do some stack magic, it's otherwise a normal callable-from-C function, so just annotate it as a function, which makes objtool happy and allows it to produces the correct hints automatically. Fixes: 8c1f75587a18 ("x86/entry/64: Add unwind hint annotations") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/03d0411920d10f7418f2e909210d8e9a3b2ab081.1587808742.git.jpoimboe@redhat.com
| * x86/entry/64: Fix unwind hints in kernel exit pathJosh Poimboeuf2020-04-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In swapgs_restore_regs_and_return_to_usermode, after the stack is switched to the trampoline stack, the existing UNWIND_HINT_REGS hint is no longer valid, which can result in the following ORC unwinder warning: WARNING: can't dereference registers at 000000003aeb0cdd for ip swapgs_restore_regs_and_return_to_usermode+0x93/0xa0 For full correctness, we could try to add complicated unwind hints so the unwinder could continue to find the registers, but when when it's this close to kernel exit, unwind hints aren't really needed anymore and it's fine to just use an empty hint which tells the unwinder to stop. For consistency, also move the UNWIND_HINT_EMPTY in entry_SYSCALL_64_after_hwframe to a similar location. Fixes: 3e3b9293d392 ("x86/entry/64: Return to userspace from the trampoline stack") Reported-by: Vince Weaver <vincent.weaver@maine.edu> Reported-by: Dave Jones <dsj@fb.com> Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reported-by: Joe Mario <jmario@redhat.com> Reported-by: Jann Horn <jannh@google.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/60ea8f562987ed2d9ace2977502fe481c0d7c9a0.1587808742.git.jpoimboe@redhat.com
| * x86/entry/64: Fix unwind hints in register clearing codeJosh Poimboeuf2020-04-251-19/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PUSH_AND_CLEAR_REGS macro zeroes each register immediately after pushing it. If an NMI or exception hits after a register is cleared, but before the UNWIND_HINT_REGS annotation, the ORC unwinder will wrongly think the previous value of the register was zero. This can confuse the unwinding process and cause it to exit early. Because ORC is simpler than DWARF, there are a limited number of unwind annotation states, so it's not possible to add an individual unwind hint after each push/clear combination. Instead, the register clearing instructions need to be consolidated and moved to after the UNWIND_HINT_REGS annotation. Fixes: 3f01daecd545 ("x86/entry/64: Introduce the PUSH_AND_CLEAN_REGS macro") Reviewed-by: Miroslav Benes <mbenes@suse.cz> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dave Jones <dsj@fb.com> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: https://lore.kernel.org/r/68fd3d0bc92ae2d62ff7879d15d3684217d51f08.1587808742.git.jpoimboe@redhat.com
* | Merge tag 'locking-urgent-2020-05-10' of ↵Linus Torvalds2020-05-101-2/+7
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Thomas Gleixner: "A single fix for the fallout of the recent futex uacess rework. With those changes GCC9 fails to analyze arch_futex_atomic_op_inuser() correctly and emits a 'maybe unitialized' warning. While we usually ignore compiler stupidity the conditional store is pointless anyway because the correct case has to store. For the fault case the extra store does no harm" * tag 'locking-urgent-2020-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ARM: futex: Address build warning
| * | ARM: futex: Address build warningThomas Gleixner2020-05-071-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stephen reported the following build warning on a ARM multi_v7_defconfig build with GCC 9.2.1: kernel/futex.c: In function 'do_futex': kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this function [-Wmaybe-uninitialized] 1676 | return oldval == cmparg; | ~~~~~~~^~~~~~~~~ kernel/futex.c:1652:6: note: 'oldval' was declared here 1652 | int oldval, ret; | ^~~~~~ introduced by commit a08971e9488d ("futex: arch_futex_atomic_op_inuser() calling conventions change"). While that change should not make any difference it confuses GCC which fails to work out that oldval is not referenced when the return value is not zero. GCC fails to properly analyze arch_futex_atomic_op_inuser(). It's not the early return, the issue is with the assembly macros. GCC fails to detect that those either set 'ret' to 0 and set oldval or set 'ret' to -EFAULT which makes oldval uninteresting. The store to the callsite supplied oldval pointer is conditional on ret == 0. The straight forward way to solve this is to make the store unconditional. Aside of addressing the build warning this makes sense anyway because it removes the conditional from the fastpath. In the error case the stored value is uninteresting and the extra store does not matter at all. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/87pncao2ph.fsf@nanos.tec.linutronix.de
* | | Merge tag 'riscv-for-linus-5.7-rc5' of ↵Linus Torvalds2020-05-109-34/+121
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: "A smattering of fixes and cleanups: - Dead code removal. - Exporting riscv_cpuid_to_hartid_mask for modules. - Per-CPU tracking of ISA features. - Setting max_pfn correctly when probing memory. - Adding a note to the VDSO so glibc can check the kernel's version without a uname(). - A fix to force the bootloader to initialize the boot spin tables, which still get used as a fallback when SBI-0.1 is enabled" * tag 'riscv-for-linus-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: RISC-V: Remove unused code from STRICT_KERNEL_RWX riscv: force __cpu_up_ variables to put in data section riscv: add Linux note to vdso riscv: set max_pfn to the PFN of the last page RISC-V: Remove N-extension related defines RISC-V: Add bitmap reprensenting ISA features common across CPUs RISC-V: Export riscv_cpuid_to_hartid_mask() API
| * | | RISC-V: Remove unused code from STRICT_KERNEL_RWXAtish Patra2020-05-062-24/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the unused functions set_kernel_text_rw/ro. Currently, it is not being invoked from anywhere and no other architecture (except arm) uses this code. Even in ARM, these functions are not invoked from anywhere currently. Fixes: d27c3c90817e ("riscv: add STRICT_KERNEL_RWX support") Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Zong Li <zong.li@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
| * | | riscv: force __cpu_up_ variables to put in data sectionZong Li2020-05-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Put __cpu_up_stack_pointer and __cpu_up_task_pointer in data section. Currently, these two variables are put in bss section, there is a potential risk that secondary harts get the uninitialized value before main hart finishing the bss clearing. In this case, all secondary harts would pass the waiting loop and enable the MMU before main hart set up the page table. This issue happens on random booting of multiple harts, which means it will manifest for BBL and OpenSBI v0.6 (or older version). In OpenSBI v0.7 (or higher version), we have HSM extension so all the secondary harts are brought-up by Linux kernel in an orderly fashion. This means we don't need this change for OpenSBI v0.7 (or higher version). Signed-off-by: Zong Li <zong.li@sifive.com> Reviewed-by: Greentime Hu <greentime.hu@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Atish Patra <atish.patra@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
| * | | riscv: add Linux note to vdsoAndreas Schwab2020-05-042-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Linux note in the vdso allows glibc to check the running kernel version without having to issue the uname syscall. Signed-off-by: Andreas Schwab <schwab@suse.de> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
| * | | riscv: set max_pfn to the PFN of the last pageVincent Chen2020-05-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current max_pfn equals to zero. In this case, I found it caused users cannot get some page information through /proc such as kpagecount in v5.6 kernel because of new sanity checks. The following message is displayed by stress-ng test suite with the command "stress-ng --verbose --physpage 1 -t 1" on HiFive unleashed board. # stress-ng --verbose --physpage 1 -t 1 stress-ng: debug: [109] 4 processors online, 4 processors configured stress-ng: info: [109] dispatching hogs: 1 physpage stress-ng: debug: [109] cache allocate: reducing cache level from L3 (too high) to L0 stress-ng: debug: [109] get_cpu_cache: invalid cache_level: 0 stress-ng: info: [109] cache allocate: using built-in defaults as no suitable cache found stress-ng: debug: [109] cache allocate: default cache size: 2048K stress-ng: debug: [109] starting stressors stress-ng: debug: [109] 1 stressor spawned stress-ng: debug: [110] stress-ng-physpage: started [110] (instance 0) stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd34de000 in /proc/kpagecount, errno=0 (Success) stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success) ... stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success) stress-ng: debug: [110] stress-ng-physpage: exited [110] (instance 0) stress-ng: debug: [109] process [110] terminated stress-ng: info: [109] successful run completed in 1.00s # After applying this patch, the kernel can pass the test. # stress-ng --verbose --physpage 1 -t 1 stress-ng: debug: [104] 4 processors online, 4 processors configured stress-ng: info: [104] dispatching hogs: 1 physpage stress-ng: info: [104] cache allocate: using defaults, can't determine cache details from sysfs stress-ng: debug: [104] cache allocate: default cache size: 2048K stress-ng: debug: [104] starting stressors stress-ng: debug: [104] 1 stressor spawned stress-ng: debug: [105] stress-ng-physpage: started [105] (instance 0) stress-ng: debug: [105] stress-ng-physpage: exited [105] (instance 0) stress-ng: debug: [104] process [105] terminated stress-ng: info: [104] successful run completed in 1.01s # Cc: stable@vger.kernel.org Signed-off-by: Vincent Chen <vincent.chen@sifive.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Yash Shah <yash.shah@sifive.com> Tested-by: Yash Shah <yash.shah@sifive.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
| * | | RISC-V: Remove N-extension related definesAnup Patel2020-05-041-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RISC-V N-extension is still in draft state hence remove N-extension related defines from asm/csr.h. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
| * | | RISC-V: Add bitmap reprensenting ISA features common across CPUsAnup Patel2020-05-042-3/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds riscv_isa bitmap which represents Host ISA features common across all Host CPUs. The riscv_isa is not same as elf_hwcap because elf_hwcap will only have ISA features relevant for user-space apps whereas riscv_isa will have ISA features relevant to both kernel and user-space apps. One of the use-case for riscv_isa bitmap is in KVM hypervisor where we will use it to do following operations: 1. Check whether hypervisor extension is available 2. Find ISA features that need to be virtualized (e.g. floating point support, vector extension, etc.) Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Atish Patra <atish.patra@wdc.com> Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
| * | | RISC-V: Export riscv_cpuid_to_hartid_mask() APIAnup Patel2020-05-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The riscv_cpuid_to_hartid_mask() API should be exported to allow building KVM RISC-V as loadable module. Signed-off-by: Anup Patel <anup.patel@wdc.com> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
* | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2020-05-081-1/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "14 fixes and one selftest to verify the ipc fixes herein" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: limit boost_watermark on small zones ubsan: disable UBSAN_ALIGNMENT under COMPILE_TEST mm/vmscan: remove unnecessary argument description of isolate_lru_pages() epoll: atomically remove wait entry on wake up kselftests: introduce new epoll60 testcase for catching lost wakeups percpu: make pcpu_alloc() aware of current gfp context mm/slub: fix incorrect interpretation of s->offset scripts/gdb: repair rb_first() and rb_last() eventpoll: fix missing wakeup for ovflist in ep_poll_callback arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() scripts/decodecode: fix trapping instruction formatting kernel/kcov.c: fix typos in kcov_remote_start documentation mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() mm, memcg: fix error return value of mem_cgroup_css_alloc() ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
| * | | | arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()Janakarajan Natarajan2020-05-081-1/+1
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When trying to lock read-only pages, sev_pin_memory() fails because FOLL_WRITE is used as the flag for get_user_pages_fast(). Commit 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'") updated the get_user_pages_fast() call sites to use flags, but incorrectly updated the call in sev_pin_memory(). As the original coding of this call was correct, revert the change made by that commit. Fixes: 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'") Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Mike Marshall <hubcap@omnibond.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Link: http://lkml.kernel.org/r/20200423152419.87202-1-Janakarajan.Natarajan@amd.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'arm64-fixes' of ↵Linus Torvalds2020-05-071-0/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fix from Catalin Marinas: "Avoid potential NULL dereference in huge_pte_alloc() on pmd_alloc() failure" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: hugetlb: avoid potential NULL dereference
| * | | | arm64: hugetlb: avoid potential NULL dereferenceMark Rutland2020-05-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The static analyzer in GCC 10 spotted that in huge_pte_alloc() we may pass a NULL pmdp into pte_alloc_map() when pmd_alloc() returns NULL: | CC arch/arm64/mm/pageattr.o | CC arch/arm64/mm/hugetlbpage.o | from arch/arm64/mm/hugetlbpage.c:10: | arch/arm64/mm/hugetlbpage.c: In function ‘huge_pte_alloc’: | ./arch/arm64/include/asm/pgtable-types.h:28:24: warning: dereference of NULL ‘pmdp’ [CWE-690] [-Wanalyzer-null-dereference] | ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’ | arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’ | |arch/arm64/mm/hugetlbpage.c:232:10: | |./arch/arm64/include/asm/pgtable-types.h:28:24: | ./arch/arm64/include/asm/pgtable.h:436:26: note: in expansion of macro ‘pmd_val’ | arch/arm64/mm/hugetlbpage.c:242:10: note: in expansion of macro ‘pte_alloc_map’ This can only occur when the kernel cannot allocate a page, and so is unlikely to happen in practice before other systems start failing. We can avoid this by bailing out if pmd_alloc() fails, as we do earlier in the function if pud_alloc() fails. Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Kyrill Tkachov <kyrylo.tkachov@arm.com> Cc: <stable@vger.kernel.org> # 4.5.x- Cc: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2020-05-0713-39/+57
|\ \ \ \ \ | |_|/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm fixes from Paolo Bonzini: "Bugfixes, mostly for ARM and AMD, and more documentation. Slightly bigger than usual because I couldn't send out what was pending for rc4, but there is nothing worrisome going on. I have more fixes pending for guest debugging support (gdbstub) but I will send them next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly KVM: selftests: Fix build for evmcs.h kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path docs/virt/kvm: Document configuring and running nested guests KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts KVM: x86: Fixes posted interrupt check for IRQs delivery modes KVM: SVM: fill in kvm_run->debug.arch.dr[67] KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning KVM: arm64: Fix 32bit PC wrap-around KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS KVM: arm64: Save/restore sp_el0 as part of __guest_enter KVM: arm64: Delete duplicated label in invalid_vector KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi() KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests ...
| * | | | Merge tag 'kvm-s390-master-5.7-3' of ↵Paolo Bonzini2020-05-061-1/+3
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix for running nested uner z/VM There are circumstances when running nested under z/VM that would trigger a WARN_ON_ONCE. Remove the WARN_ON_ONCE. Long term we certainly want to make this code more robust and flexible, but just returning instead of WARNING makes guest bootable again.
| | * | | | KVM: s390: Remove false WARN_ON_ONCE for the PQAP instructionChristian Borntraeger2020-05-051-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In LPAR we will only get an intercept for FC==3 for the PQAP instruction. Running nested under z/VM can result in other intercepts as well as ECA_APIE is an effective bit: If one hypervisor layer has turned this bit off, the end result will be that we will get intercepts for all function codes. Usually the first one will be a query like PQAP(QCI). So the WARN_ON_ONCE is not right. Let us simply remove it. Cc: Pierre Morel <pmorel@linux.ibm.com> Cc: Tony Krowiak <akrowiak@linux.ibm.com> Cc: stable@vger.kernel.org # v5.3+ Fixes: e5282de93105 ("s390: ap: kvm: add PQAP interception for AQIC") Link: https://lore.kernel.org/kvm/20200505083515.2720-1-borntraeger@de.ibm.com Reported-by: Qian Cai <cailca@icloud.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | | | | KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properlyPeter Xu2020-05-063-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM_CAP_SET_GUEST_DEBUG should be supported for x86 however it's not declared as supported. My wild guess is that userspaces like QEMU are using "#ifdef KVM_CAP_SET_GUEST_DEBUG" to check for the capability instead, but that could be wrong because the compilation host may not be the runtime host. The userspace might still want to keep the old "#ifdef" though to not break the guest debug on old kernels. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20200505154750.126300-1-peterx@redhat.com> [Do the same for PPC and s390. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bitsPaolo Bonzini2020-05-061-15/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using CPUID data can be useful for the processor compatibility check, but that's it. Using it to compute guest-reserved bits can have both false positives (such as LA57 and UMIP which we are already handling) and false negatives: in particular, with this patch we don't allow anymore a KVM guest to set CR4.PKE when CR4.PKE is clear on the host. Fixes: b9dd21e104bc ("KVM: x86: simplify handling of PKRU") Reported-by: Jim Mattson <jmattson@google.com> Tested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB pathSean Christopherson2020-05-061-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clear CF and ZF in the VM-Exit path after doing __FILL_RETURN_BUFFER so that KVM doesn't interpret clobbered RFLAGS as a VM-Fail. Filling the RSB has always clobbered RFLAGS, its current incarnation just happens clear CF and ZF in the processs. Relying on the macro to clear CF and ZF is extremely fragile, e.g. commit 089dd8e53126e ("x86/speculation: Change FILL_RETURN_BUFFER to work with objtool") tweaks the loop such that the ZF flag is always set. Reported-by: Qian Cai <cai@lca.pw> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Fixes: f2fde6a5bcfcf ("KVM: VMX: Move RSB stuffing to before the first RET after VM-Exit") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200506035355.2242-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | kvm: ioapic: Restrict lazy EOI update to edge-triggered interruptsPaolo Bonzini2020-05-041-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI") introduces the following infinite loop: BUG: stack guard page was hit at 000000008f595917 \ (stack is 00000000bdefe5a4..00000000ae2b06f5) kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI RIP: 0010:kvm_set_irq+0x51/0x160 [kvm] Call Trace: irqfd_resampler_ack+0x32/0x90 [kvm] kvm_notify_acked_irq+0x62/0xd0 [kvm] kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm] ioapic_set_irq+0x20e/0x240 [kvm] kvm_ioapic_set_irq+0x5c/0x80 [kvm] kvm_set_irq+0xbb/0x160 [kvm] ? kvm_hv_set_sint+0x20/0x20 [kvm] irqfd_resampler_ack+0x32/0x90 [kvm] kvm_notify_acked_irq+0x62/0xd0 [kvm] kvm_ioapic_update_eoi_one.isra.0+0x30/0x120 [kvm] ioapic_set_irq+0x20e/0x240 [kvm] kvm_ioapic_set_irq+0x5c/0x80 [kvm] kvm_set_irq+0xbb/0x160 [kvm] ? kvm_hv_set_sint+0x20/0x20 [kvm] .... The re-entrancy happens because the irq state is the OR of the interrupt state and the resamplefd state. That is, we don't want to show the state as 0 until we've had a chance to set the resamplefd. But if the interrupt has _not_ gone low then ioapic_set_irq is invoked again, causing an infinite loop. This can only happen for a level-triggered interrupt, otherwise irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID high and then low. Fortunately, in the case of level-triggered interrupts the VMEXIT already happens because TMR is set. Thus, fix the bug by restricting the lazy invocation of the ack notifier to edge-triggered interrupts, the only ones that need it. Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reported-by: borisvk@bstnet.org Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://www.spinics.net/lists/kvm/msg213512.html Fixes: f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=207489 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | KVM: x86: Fixes posted interrupt check for IRQs delivery modesSuravee Suthikulpanit2020-05-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current logic incorrectly uses the enum ioapic_irq_destination_types to check the posted interrupt destination types. However, the value was set using APIC_DM_XXX macros, which are left-shifted by 8 bits. Fixes by using the APIC_DM_FIXED and APIC_DM_LOWEST instead. Fixes: (fdcf75621375 'KVM: x86: Disable posted interrupts for non-standard IRQs delivery modes') Cc: Alexander Graf <graf@amazon.com> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Message-Id: <1586239989-58305-1-git-send-email-suravee.suthikulpanit@amd.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Tested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | Merge tag 'kvmarm-fixes-5.7-2' of ↵Paolo Bonzini2020-05-044-15/+33
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master KVM/arm fixes for Linux 5.7, take #2 - Fix compilation with Clang - Correctly initialize GICv4.1 in the absence of a virtual ITS - Move SP_EL0 save/restore to the guest entry/exit code - Handle PC wrap around on 32bit guests, and narrow all 32bit registers on userspace access
| | * | | | | KVM: arm64: Fix 32bit PC wrap-aroundMarc Zyngier2020-05-011-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the unlikely event that a 32bit vcpu traps into the hypervisor on an instruction that is located right at the end of the 32bit range, the emulation of that instruction is going to increment PC past the 32bit range. This isn't great, as userspace can then observe this value and get a bit confused. Conversly, userspace can do things like (in the context of a 64bit guest that is capable of 32bit EL0) setting PSTATE to AArch64-EL0, set PC to a 64bit value, change PSTATE to AArch32-USR, and observe that PC hasn't been truncated. More confusion. Fix both by: - truncating PC increments for 32bit guests - sanitizing all 32bit regs every time a core reg is changed by userspace, and that PSTATE indicates a 32bit mode. Cc: stable@vger.kernel.org Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * | | | | KVM: arm64: Save/restore sp_el0 as part of __guest_enterMarc Zyngier2020-04-302-14/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently save/restore sp_el0 in C code. This is a bit unsafe, as a lot of the C code expects 'current' to be accessible from there (and the opportunity to run kernel code in HYP is specially great with VHE). Instead, let's move the save/restore of sp_el0 to the assembly code (in __guest_enter), making sure that sp_el0 is correct very early on when we exit the guest, and is preserved as long as possible to its host value when we enter the guest. Reviewed-by: Andrew Jones <drjones@redhat.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org>
| | * | | | | KVM: arm64: Delete duplicated label in invalid_vectorFangrui Song2020-04-301-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SYM_CODE_START defines \label , so it is redundant to define \label again. A redefinition at the same place is accepted by GNU as (https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=159fbb6088f17a341bcaaac960623cab881b4981) but rejected by the clang integrated assembler. Fixes: 617a2f392c92 ("arm64: kvm: Annotate assembly using modern annoations") Signed-off-by: Fangrui Song <maskray@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/988 Link: https://lore.kernel.org/r/20200413231016.250737-1-maskray@google.com
| * | | | | | KVM: SVM: fill in kvm_run->debug.arch.dr[67]Paolo Bonzini2020-05-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The corresponding code was added for VMX in commit 42dbaa5a057 ("KVM: x86: Virtualize debug registers, 2008-12-15) but never for AMD. Fix this. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warningSean Christopherson2020-05-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use BUG() in the impossible-to-hit default case when switching on the scope of INVEPT to squash a warning with clang 11 due to clang treating the BUG_ON() as conditional. >> arch/x86/kvm/vmx/nested.c:5246:3: warning: variable 'roots_to_free' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] BUG_ON(1); Reported-by: kbuild test robot <lkp@intel.com> Fixes: ce8fe7b77bd8 ("KVM: nVMX: Free only the affected contexts when emulating INVEPT") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200504153506.28898-1-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | | | | | Merge branch 'linus' of ↵Linus Torvalds2020-05-0611-34/+69
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a potential scheduling latency problem for the algorithms used by WireGuard" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: arch/nhpoly1305 - process in explicit 4k chunks crypto: arch/lib - limit simd usage to 4k chunks