| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit aa47a7c215e7 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
in the cpumask operations potentially becoming hugely less efficient,
because suddenly the cpumask was always considered to be variable-sized.
The optimization was then later added back in a limited form by commit
6f9c07be9d02 ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
FORCE_NR_CPUS option is not useful in a generic kernel and more of a
special case for embedded situations with fixed hardware.
Instead, just re-introduce the optimization, with some changes.
Instead of depending on CPUMASK_OFFSTACK being false, and then always
using the full constant cpumask width, this introduces three different
cpumask "sizes":
- the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.
This is used for situations where we should use the exact size.
- the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
fits in a single word and the bitmap operations thus end up able
to trigger the "small_const_nbits()" optimizations.
This is used for the operations that have optimized single-word
cases that get inlined, notably the bit find and scanning functions.
- the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
is an sufficiently small constant that makes simple "copy" and
"clear" operations more efficient.
This is arbitrarily set at four words or less.
As a an example of this situation, without this fixed size optimization,
cpumask_clear() will generate code like
movl nr_cpu_ids(%rip), %edx
addq $63, %rdx
shrq $3, %rdx
andl $-8, %edx
callq memset@PLT
on x86-64, because it would calculate the "exact" number of longwords
that need to be cleared.
In contrast, with this patch, using a MAX_CPU of 64 (which is quite a
reasonable value to use), the above becomes a single
movq $0,cpumask
instruction instead, because instead of caring to figure out exactly how
many CPU's the system has, it just knows that the cpumask will be a
single word and can just clear it all.
Note that this does end up tightening the rules a bit from the original
version in another way: operations that set bits in the cpumask are now
limited to the actual nr_cpu_ids limit, whereas we used to do the
nr_cpumask_bits thing almost everywhere in the cpumask code.
But if you just clear bits, or scan for bits, we can use the simpler
compile-time constants.
In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
which were not useful, and which fundamentally have to be limited to
'nr_cpu_ids'. Better remove them now than have somebody introduce use
of them later.
Of course, on x86-64 with MAXSMP there is no sane small compile-time
constant for the cpumask sizes, and we end up using the actual CPU bits,
and will generate the above kind of horrors regardless. Please don't
use MAXSMP unless you really expect to have machines with thousands of
cores.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 updates from Thomas Gleixner:
"A small set of updates for x86:
- Return -EIO instead of success when the certificate buffer for SEV
guests is not large enough
- Allow STIPB to be enabled with legacy IBSR. Legacy IBRS is cleared
on return to userspace for performance reasons, but the leaves user
space vulnerable to cross-thread attacks which STIBP prevents.
Update the documentation accordingly"
* tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
virt/sev-guest: Return -EIO if certificate buffer is not large enough
Documentation/hw-vuln: Document the interaction between IBRS and STIBP
x86/speculation: Allow enabling STIBP with legacy IBRS
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When plain IBRS is enabled (not enhanced IBRS), the logic in
spectre_v2_user_select_mitigation() determines that STIBP is not needed.
The IBRS bit implicitly protects against cross-thread branch target
injection. However, with legacy IBRS, the IBRS bit is cleared on
returning to userspace for performance reasons which leaves userspace
threads vulnerable to cross-thread branch target injection against which
STIBP protects.
Exclude IBRS from the spectre_v2_in_ibrs_mode() check to allow for
enabling STIBP (through seccomp/prctl() by default or always-on, if
selected by spectre_v2_user kernel cmdline parameter).
[ bp: Massage. ]
Fixes: 7c693f54c873 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")
Reported-by: José Oliveira <joseloliveira11@gmail.com>
Reported-by: Rodrigo Branco <rodrigo@kernelhacking.com>
Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230220120127.1975241-1-kpsingh@kernel.org
Link: https://lore.kernel.org/r/20230221184908.2349578-1-kpsingh@kernel.org
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull VM_FAULT_RETRY fixes from Al Viro:
"Some of the page fault handlers do not deal with the following case
correctly:
- handle_mm_fault() has returned VM_FAULT_RETRY
- there is a pending fatal signal
- fault had happened in kernel mode
Correct action in such case is not "return unconditionally" - fatal
signals are handled only upon return to userland and something like
copy_to_user() would end up retrying the faulting instruction and
triggering the same fault again and again.
What we need to do in such case is to make the caller to treat that as
failed uaccess attempt - handle exception if there is an exception
handler for faulting instruction or oops if there isn't one.
Over the years some architectures had been fixed and now are handling
that case properly; some still do not. This series should fix the
remaining ones.
Status:
- m68k, riscv, hexagon, parisc: tested/acked by maintainers.
- alpha, sparc32, sparc64: tested locally - bug has been reproduced
on the unpatched kernel and verified to be fixed by this series.
- ia64, microblaze, nios2, openrisc: build, but otherwise completely
untested"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
openrisc: fix livelock in uaccess
nios2: fix livelock in uaccess
microblaze: fix livelock in uaccess
ia64: fix livelock in uaccess
sparc: fix livelock in uaccess
alpha: fix livelock in uaccess
parisc: fix livelock in uaccess
hexagon: fix livelock in uaccess
riscv: fix livelock in uaccess
m68k: fix livelock in uaccess
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
openrisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
nios2 equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
microblaze equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ia64 equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
sparc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
alpha equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
hexagon equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Acked-by: Brian Cain <bcain@quicinc.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
riscv equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Tested-by: Björn Töpel <bjorn@kernel.org>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
m68k equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.
Tested-by: Finn Thain <fthain@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
include/linux/compiler-intel.h had no update in the past 3 years.
We often forget about the third C compiler to build the kernel.
For example, commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO")
only mentioned GCC and Clang.
init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC,
and nobody has reported any issue.
I guess the Intel Compiler support is broken, and nobody is caring
about it.
Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is
deprecated:
$ icc -v
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is
deprecated and will be removed from product release in the second half
of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended
compiler moving forward. Please transition to use this compiler. Use
'-diag-disable=10441' to disable this message.
icc version 2021.7.0 (gcc version 12.1.0 compatibility)
Arnd Bergmann provided a link to the article, "Intel C/C++ compilers
complete adoption of LLVM".
lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept
untouched for better sync with https://github.com/facebook/zstd
Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes.
Eight are for MM and seven are for other parts of the kernel. Seven
are cc:stable and eight address post-6.3 issues or were judged
unsuitable for -stable backporting"
* tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: map Dikshita Agarwal's old address to his current one
mailmap: map Vikash Garodia's old address to his current one
fs/cramfs/inode.c: initialize file_ra_state
fs: hfsplus: fix UAF issue in hfsplus_put_super
panic: fix the panic_print NMI backtrace setting
lib: parser: update documentation for match_NUMBER functions
kasan, x86: don't rename memintrinsics in uninstrumented files
kasan: test: fix test for new meminstrinsic instrumentation
kasan: treat meminstrinsic as builtins in uninstrumented files
kasan: emit different calls for instrumentable memintrinsics
ocfs2: fix non-auto defrag path not working issue
ocfs2: fix defrag path triggering jbd2 ASSERT
mailmap: map Georgi Djakov's old Linaro address to his current one
mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON
lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH
mm/damon/paddr: fix missing folio_put()
mm/mremap: fix dup_anon_vma() in vma_merge() case 4
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Now that memcpy/memset/memmove are no longer overridden by KASAN, we can
just use the normal symbol names in uninstrumented files.
Drop the preprocessor redefinitions.
Link: https://lkml.kernel.org/r/20230224085942.1791837-4-elver@google.com
Fixes: 69d4c0d32186 ("entry, kasan, x86: Disallow overriding mem*() functions")
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linux Kernel Functional Testing <lkft@linaro.org>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Drop orphaned VAS MAINTAINERS entry
- Fix build errors with clang and KCSAN
- Avoid build errors seen with LD_DEAD_CODE_DATA_ELIMINATION together
with recordmcount
Thanks to Nathan Chancellor.
* tag 'powerpc-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Avoid dead code/data elimination when using recordmcount
powerpc/vmlinux.lds: Add .text.asan/tsan sections
powerpc: Drop orphaned VAS MAINTAINERS entry
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Although powerpc now has objtool mcount support, it's not enabled in all
configurations due to dependencies.
On those configurations, with some linkers (binutils 2.37 at least),
it's still possible to hit the dreaded "recordmcount bug", eg. errors
such as:
CC kernel/kexec_file.o
Cannot find symbol for section 10: .text.unlikely.
kernel/kexec_file.o: failed
make[1]: *** [scripts/Makefile.build:287 : kernel/kexec_file.o] Error 1
Those errors are much more prevalent when building with
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, because it places every function
in a separate section.
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is marked experimental and is not
enabled in any powerpc defconfigs or by major distros. Although it does
have at least some users on 32-bit where kernel size tends to be more
important.
Avoid the build errors by blocking CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
when the build is using recordmcount, rather than objtool. In practice
that means for 64-bit big endian builds, or 64-bit clang builds - both
because they lack CONFIG_MPROFILE_KERNEL.
On 32-bit objtool is always used, so
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is still available there.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230221130331.2714199-1-mpe@ellerman.id.au
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When KASAN/KCSAN are enabled clang generates .text.asan/tsan sections.
Because they are not mentioned in the linker script warnings are
generated, and when orphan handling is set to error that becomes a build
error, eg:
ld.lld: error: vmlinux.a(init/main.o):(.text.tsan.module_ctor) is
being placed in '.text.tsan.module_ctor' ld.lld: error:
vmlinux.a(init/version.o):(.text.tsan.module_ctor) is being placed in
'.text.tsan.module_ctor'
Fix it by adding the sections to our linker script, similar to the
generic change made in 848378812e40 ("vmlinux.lds.h: Handle clang's
module.{c,d}tor sections").
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20230222060037.2897169-1-mpe@ellerman.id.au
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- Add empty command line parameter handling stubs to kernel for all
command line parameters which are handled in the decompressor. This
avoids invalid "Unknown kernel command line parameters" messages from
the kernel, and also avoids that these will be incorrectly passed to
user space. This caused already confusion, therefore add the empty
stubs
- Add missing phys_to_virt() handling to machine check handler
- Introduce and use a union to be used for zcrypt inline assemblies.
This makes sure that only a register wide member of the union is
passed as input and output parameter to inline assemblies, while
usual C code uses other members of the union to access bit fields of
it
- Add and use a READ_ONCE_ALIGNED_128() macro, which can be used to
atomically read a 128-bit value from memory. This replaces the
(mis-)use of the 128-bit cmpxchg operation to do the same in cpum_sf
code. Currently gcc does not generate the used lpq instruction if
__READ_ONCE() is used for aligned 128-bit accesses, therefore use
this s390 specific helper
- Simplify machine check handler code if a task needs to be killed
because of e.g. register corruption due to a machine malfunction
- Perform CPU reset to clear pending interrupts and TLB entries on an
already stopped target CPU before delegating work to it
- Generate arch/s390/boot/vmlinux.map link map for the decompressor,
when CONFIG_VMLINUX_MAP is enabled for debugging purposes
- Fix segment type handling for dcssblk devices. It incorrectly always
returned type "READ/WRITE" even for read-only segements, which can
result in a kernel panic if somebody tries to write to a read-only
device
- Sort config S390 select list again
- Fix two kprobe reenter bugs revealed by a recently added kprobe kunit
test
* tag 's390-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/kprobes: fix current_kprobe never cleared after kprobes reenter
s390/kprobes: fix irq mask clobbering on kprobe reenter from post_handler
s390/Kconfig: sort config S390 select list again
s390/extmem: return correct segment type in __segment_load()
s390/decompressor: add link map saving
s390/smp: perform cpu reset before delegating work to target cpu
s390/mcck: cleanup user process termination path
s390/cpum_sf: use READ_ONCE_ALIGNED_128() instead of 128-bit cmpxchg
s390/rwonce: add READ_ONCE_ALIGNED_128() macro
s390/ap,zcrypt,vfio: introduce and use ap_queue_status_reg union
s390/nmi: fix virtual-physical address confusion
s390/setup: do not complain about parameters handled in decompressor
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Recent test_kprobe_missed kprobes kunit test uncovers the following
problem. Once kprobe is triggered from another kprobe (kprobe reenter),
all future kprobes on this cpu are considered as kprobe reenter, thus
pre_handler and post_handler are not being called and kprobes are counted
as "missed".
Commit b9599798f953 ("[S390] kprobes: activation and deactivation")
introduced a simpler scheme for kprobes (de)activation and status
tracking by using push_kprobe/pop_kprobe, which supposed to work for
both initial kprobe entry as well as kprobe reentry and helps to avoid
handling those two cases differently. The problem is that a sequence of
calls in case of kprobes reenter:
push_kprobe() <- NULL (current_kprobe)
push_kprobe() <- kprobe1 (current_kprobe)
pop_kprobe() -> kprobe1 (current_kprobe)
pop_kprobe() -> kprobe1 (current_kprobe)
leaves "kprobe1" as "current_kprobe" on this cpu, instead of setting it
to NULL. In fact push_kprobe/pop_kprobe can only store a single state
(there is just one prev_kprobe in kprobe_ctlblk). Which is a hack but
sufficient, there is no need to have another prev_kprobe just to store
NULL. To make a simple and backportable fix simply reset "prev_kprobe"
when kprobe is poped from this "stack". No need to worry about
"kprobe_status" in this case, because its value is only checked when
current_kprobe != NULL.
Cc: stable@vger.kernel.org
Fixes: b9599798f953 ("[S390] kprobes: activation and deactivation")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Recent test_kprobe_missed kprobes kunit test uncovers the following error
(reported when CONFIG_DEBUG_ATOMIC_SLEEP is enabled):
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 662, name: kunit_try_catch
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
no locks held by kunit_try_catch/662.
irq event stamp: 280
hardirqs last enabled at (279): [<00000003e60a3d42>] __do_pgm_check+0x17a/0x1c0
hardirqs last disabled at (280): [<00000003e3bd774a>] kprobe_exceptions_notify+0x27a/0x318
softirqs last enabled at (0): [<00000003e3c5c890>] copy_process+0x14a8/0x4c80
softirqs last disabled at (0): [<0000000000000000>] 0x0
CPU: 46 PID: 662 Comm: kunit_try_catch Tainted: G N 6.2.0-173644-g44c18d77f0c0 #2
Hardware name: IBM 3931 A01 704 (LPAR)
Call Trace:
[<00000003e60a3a00>] dump_stack_lvl+0x120/0x198
[<00000003e3d02e82>] __might_resched+0x60a/0x668
[<00000003e60b9908>] __mutex_lock+0xc0/0x14e0
[<00000003e60bad5a>] mutex_lock_nested+0x32/0x40
[<00000003e3f7b460>] unregister_kprobe+0x30/0xd8
[<00000003e51b2602>] test_kprobe_missed+0xf2/0x268
[<00000003e51b5406>] kunit_try_run_case+0x10e/0x290
[<00000003e51b7dfa>] kunit_generic_run_threadfn_adapter+0x62/0xb8
[<00000003e3ce30f8>] kthread+0x2d0/0x398
[<00000003e3b96afa>] __ret_from_fork+0x8a/0xe8
[<00000003e60ccada>] ret_from_fork+0xa/0x40
The reason for this error report is that kprobes handling code failed
to restore irqs.
The problem is that when kprobe is triggered from another kprobe
post_handler current sequence of enable_singlestep / disable_singlestep
is the following:
enable_singlestep <- original kprobe (saves kprobe_saved_imask)
enable_singlestep <- kprobe triggered from post_handler (clobbers kprobe_saved_imask)
disable_singlestep <- kprobe triggered from post_handler (restores kprobe_saved_imask)
disable_singlestep <- original kprobe (restores wrong clobbered kprobe_saved_imask)
There is just one kprobe_ctlblk per cpu and both calls saves and
loads irq mask to kprobe_saved_imask. To fix the problem simply move
resume_execution (which calls disable_singlestep) before calling
post_handler. This also fixes the problem that post_handler is called
with pt_regs which were not yet adjusted after single-stepping.
Cc: stable@vger.kernel.org
Fixes: 4ba069b802c2 ("[S390] add kprobes support.")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Keep the config S390 select list sorted.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Commit f05f62d04271f ("s390/vmem: get rid of memory segment list")
reshuffled the call to vmem_add_mapping() in __segment_load(), which now
overwrites rc after it was set to contain the segment type code.
As result, __segment_load() will now always return 0 on success, which
corresponds to the segment type code SEG_TYPE_SW, i.e. a writeable
segment. This results in a kernel crash when loading a read-only segment
as dcssblk block device, and trying to write to it.
Instead of reshuffling code again, make sure to return the segment type
on success, and also describe this rather delicate and unexpected logic
in the function comment. Also initialize new segtype variable with
invalid value, to prevent possible future confusion.
Fixes: f05f62d04271 ("s390/vmem: get rid of memory segment list")
Cc: <stable@vger.kernel.org> # 5.9+
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Produce arch/s390/boot/vmlinux.map link map for the decompressor, when
CONFIG_VMLINUX_MAP option is enabled.
Link map is quite useful during making kernel changes related to how
the decompressor is composed and debugging linker scripts.
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Clear CPU state (e.g. all TLB entries, prefetched instructions, etc.)
of the target CPU, however without clearing register contents before
starting any work on it.
This puts the target CPU in a more defined state compared to the
current Stop + Restart sigp orders.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
If a machine check interrupt hits while user process is
running __s390_handle_mcck() helper function is called
directly from the interrupt handler and terminates the
current process by calling make_task_dead() routine.
The make_task_dead() is not allowed to be called from
interrupt context which forces the machine check handler
switch to the kernel stack and enable local interrupts
first.
The __s390_handle_mcck() could also be called to service
pending work, but this time from the external interrupts
handler. It is the machine check handler that establishes
the work and schedules the external interrupt, therefore
the machine check interrupt itself should be disabled
while reading out the corresponding variable:
local_mcck_disable();
mcck = *this_cpu_ptr(&cpu_mcck);
memset(this_cpu_ptr(&cpu_mcck), 0, sizeof(mcck));
local_mcck_enable();
However, local_mcck_disable() does not have effect when
__s390_handle_mcck() is called directly form the machine
check handler, since the machine check interrupt is still
disabled. Therefore, it is not the opening bracket to the
following local_mcck_enable() function.
Simplify the user process termination flow by scheduling
the external interrupt and killing the affected process
from the interrupt context.
Assume a kernel-generated signal is always delivered and
ignore a value returned by do_send_sig_info() funciton.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Use READ_ONCE_ALIGNED_128() to read the previous value in front of a
128-bit cmpxchg loop, instead of (mis-)using a 128-bit cmpxchg operation to
do the same.
This makes the code more readable and is faster.
Link: https://lore.kernel.org/r/20230224100237.3247871-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Add an s390 specific READ_ONCE_ALIGNED_128() helper, which can be used for
fast block concurrent (atomic) 128-bit accesses.
The used lpq instruction requires 128-bit alignment. This is also the
reason why the compiler doesn't emit this instruction if __READ_ONCE() is
used for 128-bit accesses.
Link: https://lore.kernel.org/r/20230224100237.3247871-2-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Introduce a new ap queue status register wrapper union to access register
wide values. So the inline assembler only sees register wide values but the
surrounding code may use a more structured view of the same value and a
reader of the code (and the compiler) gets a clear understanding about the
mapping between fields and register values.
All the changes to access the ap queue status are local to the inline
functions within ap.h. However, the struct ap_qirq_ctrl has been replaces
by a union for same reason and this needed slight adaptions in the calling
code.
Suggested-by: Halil Pasic <pasic@linux.ibm.com>
Suggested-by: Andreas Arnez <arnez@linux.ibm.com>
Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Holger Dengler <dengler@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When a machine check is received while in SIE, it is reinjected into the
guest in some cases. The respective code needs to access the sie_block,
which is taken from the backed up R14.
Since reinjection only occurs while we are in SIE (i.e. between the
labels sie_entry and sie_leave in entry.S and thus if CIF_MCCK_GUEST is
set), the backed up R14 will always contain a physical address in
s390_backup_mcck_info.
This currently works, because virtual and physical addresses are
the same.
Add phys_to_virt() to resolve the virtual-physical confusion.
Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Link: https://lore.kernel.org/r/20230216121208.4390-2-nrb@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Currently there are several kernel command line parameters which are
only parsed and handled in decompressor and not known to the kernel.
This leads to the following error message during kernel boot:
Unknown kernel command line parameters "mem=3G nokaslr", will be passed
to user space.
To avoid confusion, register those parameters with an empty stub so that
kernel does not complain about them.
Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- Some cleanups and fixes for the Zbb-optimized string routines
- Support for custom (vendor or implementation defined) perf events
- COMMAND_LINE_SIZE has been increased to 1024
* tag 'riscv-for-linus-6.3-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Bump COMMAND_LINE_SIZE value to 1024
drivers/perf: RISC-V: Allow programming custom firmware events
riscv, lib: Fix Zbb strncmp
RISC-V: improve string-function assembly
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Increase COMMAND_LINE_SIZE as the current default value is too low
for syzbot kernel command line.
There has been considerable discussion on this patch that has led to a
larger patch set removing COMMAND_LINE_SIZE from the uapi headers on all
ports. That's not quite done yet, but it's gotten far enough we're
confident this is not a uABI change so this is safe.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Link: https://lore.kernel.org/r/20210316193420.904-1-alex@ghiti.fr
[Palmer: it's not uabi]
Link: https://lore.kernel.org/linux-riscv/874b8076-b0d1-4aaa-bcd8-05d523060152@app.fastmail.com/#t
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The Zbb optimized strncmp has two parts; a fast path that does XLEN/8B
per iteration, and a slow that does one byte per iteration.
The idea is to compare aligned XLEN chunks for most of strings, and do
the remainder tail in the slow path.
The Zbb strncmp has two issues in the fast path:
Incorrect remainder handling (wrong compare): Assume that the string
length is 9. On 64b systems, the fast path should do one iteration,
and one iteration in the slow path. Instead, both were done in the
fast path, which lead to incorrect results. An example:
strncmp("/dev/vda", "/dev/", 5);
Correct by changing "bgt" to "bge".
Missing NULL checks in the second string: This could lead to incorrect
results for:
strncmp("/dev/vda", "/dev/vda\0", 8);
Correct by adding an additional check.
Fixes: b6fcdb191e36 ("RISC-V: add zbb support to string functions")
Suggested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230228184211.1585641-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Adapt the suggestions for the assembly string functions that Andrew
suggested but that I didn't manage to include into the series that
got applied.
This includes improvements to two comments, removal of unneeded labels
and moving one instruction slightly higher to contradict an
explanatory comment.
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230208225328.1636017-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- In copy_highpage(), only reset the tag of the destination pointer if
KASAN_HW_TAGS is enabled so that user-space MTE does not interfere
with KASAN_SW_TAGS (which relies on top-byte-ignore).
- Remove warning if SME is detected without SVE, the kernel can cope
with such configuration (though none in the field currently).
- In cfi_handler(), pass the ESR_EL1 value to die() for consistency
with other die() callers.
- Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP on arm64 since the pte
manipulation from the generic vmemmap_remap_pte() does not follow the
required ARM break-before-make sequence (clear the pte, flush the
TLBs, set the new pte). It may be re-enabled once this sequence is
sorted.
- Fix possible memory leak in the arm64 ACPI code if the SMCCC version
and conduit checks fail.
- Forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE since gcc ignores
-falign-functions=N with -Os.
- Don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN as no
randomisation would actually take place.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kaslr: don't pretend KASLR is enabled if offset < MIN_KIMG_ALIGN
arm64: ftrace: forbid CALL_OPS with CC_OPTIMIZE_FOR_SIZE
arm64: acpi: Fix possible memory leak of ffh_ctxt
arm64: mm: hugetlb: Disable HUGETLB_PAGE_OPTIMIZE_VMEMMAP
arm64: pass ESR_ELx to die() of cfi_handler
arm64/fpsimd: Remove warning for SME without SVE
arm64: Reset KASAN tag in copy_highpage with HW tags only
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Our virtual KASLR displacement is a randomly chosen multiple of
2 MiB plus an offset that is equal to the physical placement modulo 2
MiB. This arrangement ensures that we can always use 2 MiB block
mappings (or contiguous PTE mappings for 16k or 64k pages) to map the
kernel.
This means that a KASLR offset of less than 2 MiB is simply the product
of this physical displacement, and no randomization has actually taken
place. Currently, we use 'kaslr_offset() > 0' to decide whether or not
randomization has occurred, and so we misidentify this case.
If the kernel image placement is not randomized, modules are allocated
from a dedicated region below the kernel mapping, which is only used for
modules and not for other vmalloc() or vmap() calls.
When randomization is enabled, the kernel image is vmap()'ed randomly
inside the vmalloc region, and modules are allocated in the vicinity of
this mapping to ensure that relative references are always in range.
However, unlike the dedicated module region below the vmalloc region,
this region is not reserved exclusively for modules, and so ordinary
vmalloc() calls may end up overlapping with it. This should rarely
happen, given that vmalloc allocates bottom up, although it cannot be
ruled out entirely.
The misidentified case results in a placement of the kernel image within
2 MiB of its default address. However, the logic that randomizes the
module region is still invoked, and this could result in the module
region overlapping with the start of the vmalloc region, instead of
using the dedicated region below it. If this happens, a single large
vmalloc() or vmap() call will use up the entire region, and leave no
space for loading modules after that.
Since commit 82046702e288 ("efi/libstub/arm64: Replace 'preferred'
offset with alignment check"), this is much more likely to occur on
systems that boot via EFI but lack an implementation of the EFI RNG
protocol, as in that case, the EFI stub will decide to leave the image
where it found it, and the EFI firmware uses 64k alignment only.
Fix this, by correctly identifying the case where the virtual
displacement is a result of the physical displacement only.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230223204101.1500373-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Florian reports that when building with CONFIG_CC_OPTIMIZE_FOR_SIZE=y,
he sees "Misaligned patch-site" warnings at boot, e.g.
| Misaligned patch-site bcm2836_arm_irqchip_handle_irq+0x0/0x88
| WARNING: CPU: 0 PID: 0 at arch/arm64/kernel/ftrace.c:120 ftrace_call_adjust+0x4c/0x70
This is because GCC will silently ignore `-falign-functions=N` when
passed `-Os`, resulting in functions not being aligned as we expect.
This is a known issue, and to account for this we modified the kernel to
avoid `-Os` generally. Unfortunately we forgot to account for
CONFIG_CC_OPTIMIZE_FOR_SIZE.
Forbid the use of CALL_OPS with CONFIG_CC_OPTIMIZE_FOR_SIZE=y to prevent
this issue. All exising ftrace features will work as before, though
without the performance benefit of CALL_OPS.
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Link: http://lore.kernel.org/linux-arm-kernel/2d9284c3-3805-402b-5423-520ced56d047@gmail.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Stefan Wahren <stefan.wahren@i2se.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230227115819.365630-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Allocated 'ffh_ctxt' memory leak is possible if the SMCCC version
and conduit checks fail and -EOPNOTSUPP is returned without freeing the
allocated memory.
Fix the same by moving the allocation after the SMCCC version and
conduit checks.
Fixes: 1d280ce099db ("arm64: Add architecture specific ACPI FFH Opregion callbacks")
Cc: <stable@vger.kernel.org> # 6.2.x
Cc: Will Deacon <will@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Suggested-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202302191417.dAl9NuE8-lkp@intel.com/
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20230223135742.2952091-1-sudeep.holla@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Revert the HUGETLB_PAGE_FREE_VMEMMAP selection from commit 1e63ac088f20
("arm64: mm: hugetlb: enable HUGETLB_PAGE_FREE_VMEMMAP for arm64") but
keep the flush_dcache_page() compound_head() change as it aligns with
the corresponding check in the __sync_icache_dcache() function.
The original config option was renamed in commit 47010c040dec ("mm:
hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*") to
HUGETLB_PAGE_OPTIMIZE_VMEMMAP and the flush_dcache_page() check was
further simplified by commit 2da1c30929a2 ("mm: hugetlb_vmemmap: delete
hugetlb_optimize_vmemmap_enabled()").
The reason for the revert is that the generic vmemmap_remap_pte()
function changes both the permissions (writeable to read-only) and the
output address (pfn) of the vmemmap ptes. This is deemed UNPREDICTABLE
by the Arm architecture without a break-before-make sequence (make the
PTE invalid, TLBI, write the new valid PTE). However, such sequence is
not possible since the vmemmap may be concurrently accessed by the
kernel. Disable the optimisation until a better solution is found.
Fixes: 1e63ac088f20 ("arm64: mm: hugetlb: enable HUGETLB_PAGE_FREE_VMEMMAP for arm64")
Cc: <stable@vger.kernel.org> # 5.19.x
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Will Deacon <will@kernel.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/Y9pZALdn3pKiJUeQ@arm.com
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20230222175232.540851-1-catalin.marinas@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Commit 0f2cb928a154 ("arm64: consistently pass ESR_ELx to die()") caused
all callers to pass the ESR_ELx value to die().
For consistency, this patch also adds esr to die() call of cfi_handler.
Also, when CFI error occurs, die handlers can use ESR_ELx value.
Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230220073441.2753-1-sangmoon.kim@samsung.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Support for SME without SVE is architecturally valid and has now been tested
well enough so let's remove the warning message that is displayed at boot.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230209-arm64-sme-no-sve-v1-1-74eb3df2f878@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
During page migration, the copy_highpage function is used to copy the
page data to the target page. If the source page is a userspace page
with MTE tags, the KASAN tag of the target page must have the match-all
tag in order to avoid tag check faults during subsequent accesses to the
page by the kernel. However, the target page may have been allocated in
a number of ways, some of which will use the KASAN allocator and will
therefore end up setting the KASAN tag to a non-match-all tag. Therefore,
update the target page's KASAN tag to match the source page.
We ended up unintentionally fixing this issue as a result of a bad
merge conflict resolution between commit e059853d14ca ("arm64: mte:
Fix/clarify the PG_mte_tagged semantics") and commit 20794545c146 ("arm64:
kasan: Revert "arm64: mte: reset the page tag in page->flags""), which
preserved a tag reset for PG_mte_tagged pages which was considered to be
unnecessary at the time. Because SW tags KASAN uses separate tag storage,
update the code to only reset the tags when HW tags KASAN is enabled.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/If303d8a709438d3ff5af5fd85706505830f52e0c
Reported-by: "Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee@mediatek.com>
Cc: <stable@vger.kernel.org> # 6.1
Fixes: 20794545c146 ("arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"")
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20230215050911.1433132-1-pcc@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|\ \ \ \ \ \ \ \
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull more MIPS updates from Thomas Bogendoerfer:
"A few more cleanups and fixes"
* tag 'mips_6.3_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Workaround clang inline compat branch issue
mips: dts: ralink: mt7621: add phandle to system controller node for watchdog
mips: dts: ralink: mt7621: rename watchdog node from 'wdt' into 'watchdog'
mips: ralink: make SOC_MT7621 select PINCTRL
mips: remove SYS_HAS_CPU_MIPS32_R1 from RALINK
MIPS: cevt-r4k: Offset the value used to clear compare interrupt
MIPS: smp-cps: Don't rely on CP0_CMGCRBASE
MIPS: Remove DMA_PERDEV_COHERENT
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Clang is unable to handle the situation that a chunk of inline
assembly ends with a compat branch instruction and then compiler
generates another control transfer instruction immediately after
this compat branch. The later instruction will end up in forbidden
slot and cause exception.
Workaround by add a option to control the use of compact branch.
Currently it's selected by CC_IS_CLANG and hopefully we can change
it to a version check in future if clang manages to fix it.
Fix boot on boston board.
Link: https://github.com/llvm/llvm-project/issues/61045
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
To allow to access system controller registers from watchdog driver code
add a phandle in the watchdog 'wdt' node. This avoid using arch dependent
operations in driver code.
Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Watchdog nodes must use 'watchdog' for node name. When a 'make dtbs_check'
is performed the following warning appears:
wdt@100: $nodename:0: 'wdt@100' does not match '^watchdog(@.*|-[0-9a-f])?$'
Fix this warning up properly renaming the node into 'watchdog'.
Reviewed-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Currently, out of every Ralink SoC, only the dt-binding of the MT7621 SoC
uses pinctrl. Because of this, PINCTRL is not selected at all. Make
SOC_MT7621 select PINCTRL.
Remove PINCTRL_MT7621, enabling it for the MT7621 SoC will be handled under
the PINCTRL_MT7621 option.
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Acked-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|