| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A trivial compile test fix for x86:
When CONFIG_AMD_NB is not set a COMPILE_TEST of an AMD specific driver
fails due to a missing inline stub. Add the stub to cure it"
* tag 'x86-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/amd_nb: Fix compile-testing without CONFIG_AMD_NB
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
node_to_amd_nb() is defined to NULL in non-AMD configs:
drivers/platform/x86/amd/hsmp/plat.c: In function 'init_platform_device':
drivers/platform/x86/amd/hsmp/plat.c:165:68: error: dereferencing 'void *' pointer [-Werror]
165 | sock->root = node_to_amd_nb(i)->root;
| ^~
drivers/platform/x86/amd/hsmp/plat.c:165:68: error: request for member 'root' in something not a structure or union
Users of the interface who also allow COMPILE_TEST will cause the above build
error so provide an inline stub to fix that.
[ bp: Massage commit message. ]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20241029092329.3857004-1-arnd@kernel.org
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:
- Avoid build errors with old 'rustc's without LLVM patch version
(important since it impacts people that do not even enable Rust)
- Update LLVM version for 'HAVE_CFI_ICALL_NORMALIZE_INTEGERS' in
'depends on' condition (the fix was eventually backported rather
than land in LLVM 19)"
* tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linux:
cfi: tweak llvm version for HAVE_CFI_ICALL_NORMALIZE_INTEGERS
kbuild: rust: avoid errors with old `rustc`s without LLVM patch version
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The llvm fix [1] did not make it for 19.0.0, but ended up getting
backported to llvm 19.1.3 [2]. Thus, fix the version requirement to
correctly specify which versions have the bug.
Link: https://github.com/llvm/llvm-project/pull/104826 [1]
Link: https://github.com/llvm/llvm-project/pull/113938 [2]
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202410281414.c351044e-oliver.sang@intel.com
Fixes: 8b8ca9c25fe6 ("cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS")
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20241030-cfi-icall-1913-v1-1-ab8a26e13733@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Palmer Dabbelt:
- Avoid accessing the early boot ACPI tables via unsafe memory
attributes, which can result in incorrect ACPI table data appearing.
This can cause all sorts of bad behavior.
- Avoid compiler-inserted library calls in the VDSO.
- GCC+Rust builds have been disabled, to avoid issues related to ISA
string mismatched between the GCC and LLVM Rust implementations.
- The NX flag is now set in the EFI PE/COFF headers, which is necessary
for some distro GRUB versions to boot images.
- A fix to avoid leaking DT node reference counts on ACPI systems
during cache info parsing.
- CPU numbers are now printed as unsigned values during hotplug.
- A pair of build fixes for usused macros, which can trigger warnings
on some configurations.
* tag 'riscv-for-linus-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: Remove duplicated GET_RM
riscv: Remove unused GENERATING_ASM_OFFSETS
riscv: Use '%u' to format the output of 'cpu'
riscv: Prevent a bad reference count on CPU nodes
riscv: efi: Set NX compat flag in PE/COFF header
RISC-V: disallow gcc + rust builds
riscv: Do not use fortify in early code
RISC-V: ACPI: fix early_ioremap to early_memremap
riscv: vdso: Prevent the compiler from inserting calls to memset()
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This fix is part of a series on for-next, but it fixes broken builds so
I'm picking it up as a fix.
* commit 'bf40167d54d5':
riscv: vdso: Prevent the compiler from inserting calls to memset()
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The compiler is smart enough to insert a call to memset() in
riscv_vdso_get_cpus(), which generates a dynamic relocation.
So prevent this by using -fno-builtin option.
Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20241016083625.136311-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The macro GET_RM defined twice in this file, one can be removed.
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Fixes: 956d705dd279 ("riscv: Unaligned load/store handling for M_MODE")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20241008094141.549248-3-zhangchunyan@iscas.ac.cn
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The macro is not used in the current version of kernel, it looks like
can be removed to avoid a build warning:
../arch/riscv/kernel/asm-offsets.c: At top level:
../arch/riscv/kernel/asm-offsets.c:7: warning: macro "GENERATING_ASM_OFFSETS" is not used [-Wunused-macros]
7 | #define GENERATING_ASM_OFFSETS
Fixes: 9639a44394b9 ("RISC-V: Provide a cleaner raw_smp_processor_id()")
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Link: https://lore.kernel.org/r/20241008094141.549248-2-zhangchunyan@iscas.ac.cn
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
'cpu' is an unsigned integer, so its conversion specifier should
be %u, not %d.
Suggested-by: Wentao Guan <guanwentao@uniontech.com>
Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk>
Link: https://lore.kernel.org/all/alpine.DEB.2.21.2409122309090.40372@angie.orcam.me.uk/
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Fixes: f1e58583b9c7 ("RISC-V: Support cpu hotplug")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/4C127DEECDA287C8+20241017032010.96772-1-wangyuli@uniontech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When populating cache leaves we previously fetched the CPU device node
at the very beginning. But when ACPI is enabled we go through a
specific branch which returns early and does not call 'of_node_put' for
the node that was acquired.
Since we are not using a CPU device node for the ACPI code anyways, we
can simply move the initialization of it just passed the ACPI block, and
we are guaranteed to have an 'of_node_put' call for the acquired node.
This prevents a bad reference count of the CPU device node.
Moreover, the previous function did not check for errors when acquiring
the device node, so a return -ENOENT has been added for that case.
Signed-off-by: Miquel Sabaté Solà <mikisabate@gmail.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Fixes: 604f32ea6909 ("riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240913080053.36636-1-mikisabate@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The IMAGE_DLLCHARACTERISTICS_NX_COMPAT informs the firmware that the
EFI binary does not rely on pages that are both executable and
writable.
The flag is used by some distro versions of GRUB to decide if the EFI
binary may be executed.
As the Linux kernel neither has RWX sections nor needs RWX pages for
relocation we should set the flag.
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Fixes: cb7d2dd5612a ("RISC-V: Add PE/COFF header for EFI stub")
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20240929140233.211800-1-heinrich.schuchardt@canonical.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
During the discussion before supporting rust on riscv, it was decided
not to support gcc yet, due to differences in extension handling
compared to llvm (only the version of libclang matching the c compiler
is supported). Recently Jason Montleon reported [1] that building with
gcc caused build issues, due to unsupported arguments being passed to
libclang. After some discussion between myself and Miguel, it is better
to disable gcc + rust builds to match the original intent, and
subsequently support it when an appropriate set of extensions can be
deduced from the version of libclang.
Closes: https://lore.kernel.org/all/20240917000848.720765-2-jmontleo@redhat.com/ [1]
Link: https://lore.kernel.org/all/20240926-battering-revolt-6c6a7827413e@spud/ [2]
Fixes: 70a57b247251a ("RISC-V: enable building 64-bit kernels with rust support")
Cc: stable@vger.kernel.org
Reported-by: Jason Montleon <jmontleo@redhat.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20241001-playlist-deceiving-16ece2f440f5@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Early code designates the code executed when the MMU is not yet enabled,
and this comes with some limitations (see
Documentation/arch/riscv/boot.rst, section "Pre-MMU execution").
FORTIFY_SOURCE must be disabled then since it can trigger kernel panics
as reported in [1].
Reported-by: Jason Montleon <jmontleo@redhat.com>
Closes: https://lore.kernel.org/linux-riscv/CAJD_bPJes4QhmXY5f63GHV9B9HFkSCoaZjk-qCT2NGS7Q9HODg@mail.gmail.com/ [1]
Fixes: a35707c3d850 ("riscv: add memory-type errata for T-Head")
Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20241009072749.45006-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When SVPBMT is enabled, __acpi_map_table() will directly access the
data in DDR through the IO attribute, rather than through hardware
cache consistency, resulting in incorrect data in the obtained ACPI
table.
The log: ACPI: [ACPI:0x18] Invalid zero length.
We do not assume whether the bootloader flushes or not. We should
access in a cacheable way instead of maintaining cache consistency
by software.
Fixes: 3b426d4b5b14 ("RISC-V: ACPI : Fix for usage of pointers in different address space")
Cc: stable@vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20241014130141.86426-1-cuiyunhui@bytedance.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The important one is a change to the way in which we handle protection
keys around signal delivery so that we're more closely aligned with
the x86 behaviour, however there is also a revert of the previous fix
to disable software tag-based KASAN with GCC, since a workaround
materialised shortly afterwards.
I'd love to say we're done with 6.12, but we're aware of some
longstanding fpsimd register corruption issues that we're almost at
the bottom of resolving.
Summary:
- Fix handling of POR_EL0 during signal delivery so that pushing the
signal context doesn't fail based on the pkey configuration of the
interrupted context and align our user-visible behaviour with that
of x86.
- Fix a bogus pointer being passed to the CPU hotplug code from the
Arm SDEI driver.
- Re-enable software tag-based KASAN with GCC by using an alternative
implementation of '__no_sanitize_address'"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state()
Revert "kasan: Disable Software Tag-Based KASAN with GCC"
kasan: Fix Software Tag-Based KASAN with GCC
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Reset POR_EL0 to "allow all" before writing the signal frame, preventing
spurious uaccess failures.
When POE is supported, the POR_EL0 register constrains memory
accesses based on the target page's POIndex (pkey). This raises the
question: what constraints should apply to a signal handler? The
current answer is that POR_EL0 is reset to POR_EL0_INIT when
invoking the handler, giving it full access to POIndex 0. This is in
line with x86's MPK support and remains unchanged.
This is only part of the story, though. POR_EL0 constrains all
unprivileged memory accesses, meaning that uaccess routines such as
put_user() are also impacted. As a result POR_EL0 may prevent the
signal frame from being written to the signal stack (ultimately
causing a SIGSEGV). This is especially concerning when an alternate
signal stack is used, because userspace may want to prevent access
to it outside of signal handlers. There is currently no provision
for that: POR_EL0 is reset after writing to the stack, and
POR_EL0_INIT only enables access to POIndex 0.
This patch ensures that POR_EL0 is reset to its most permissive
state before the signal stack is accessed. Once the signal frame has
been fully written, POR_EL0 is still set to POR_EL0_INIT - it is up
to the signal handler to enable access to additional pkeys if
needed. As to sigreturn(), it expects having access to the stack
like any other syscall; we only need to ensure that POR_EL0 is
restored from the signal frame after all uaccess calls. This
approach is in line with the recent x86/pkeys series [1].
Resetting POR_EL0 early introduces some complications, in that we
can no longer read the register directly in preserve_poe_context().
This is addressed by introducing a struct (user_access_state)
and helpers to manage any such register impacting user accesses
(uaccess and accesses in userspace). Things look like this on signal
delivery:
1. Save original POR_EL0 into struct [save_reset_user_access_state()]
2. Set POR_EL0 to "allow all" [save_reset_user_access_state()]
3. Create signal frame
4. Write saved POR_EL0 value to the signal frame [preserve_poe_context()]
5. Finalise signal frame
6. If all operations succeeded:
a. Set POR_EL0 to POR_EL0_INIT [set_handler_user_access_state()]
b. Else reset POR_EL0 to its original value [restore_user_access_state()]
If any step fails when setting up the signal frame, the process will
be sent a SIGSEGV, which it may be able to handle. Step 6.b ensures
that the original POR_EL0 is saved in the signal frame when
delivering that SIGSEGV (so that the original value is restored by
sigreturn).
The return path (sys_rt_sigreturn) doesn't strictly require any change
since restore_poe_context() is already called last. However, to
avoid uaccess calls being accidentally added after that point, we
use the same approach as in the delivery path, i.e. separating
uaccess from writing to the register:
1. Read saved POR_EL0 value from the signal frame [restore_poe_context()]
2. Set POR_EL0 to the saved value [restore_user_access_state()]
[1] https://lore.kernel.org/lkml/20240802061318.2140081-1-aruna.ramakrishna@oracle.com/
Fixes: 9160f7e909e1 ("arm64: add POE signal support")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Link: https://lore.kernel.org/r/20241029144539.111155-2-kevin.brodsky@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more stability fixes. There's one patch adding export of MIPS
cmpxchg helper, used in the error propagation fix.
- fix error propagation from split bios to the original btrfs bio
- fix merging of adjacent extents (normal operation, defragmentation)
- fix potential use after free after freeing btrfs device structures"
* tag 'for-6.12-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix defrag not merging contiguous extents due to merged extent maps
btrfs: fix extent map merging not happening for adjacent extents
btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()
btrfs: fix error propagation of split bios
MIPS: export __cmpxchg_small()
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Export the symbol __cmpxchg_small() for btrfs.ko that uses it to store
blk_status_t, which is u8. Reported by LKP:
>> ERROR: modpost: "__cmpxchg_small" [fs/btrfs/btrfs.ko] undefined!
Patch using the cmpxchg() https://lore.kernel.org/linux-btrfs/1d4f72f7fee285b2ddf4bf62b0ac0fd89def5417.1728575379.git.naohiro.aota@wdc.com/
Link: https://lore.kernel.org/all/20241016134919.GO1609@suse.cz/
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: David Sterba <dsterba@suse.com>
|
| |_|_|_|_|/
|/| | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
During x86_64 kernel build with CONFIG_KMSAN, the objtool warns following:
AR built-in.a
AR vmlinux.a
LD vmlinux.o
vmlinux.o: warning: objtool: handle_bug+0x4: call to
kmsan_unpoison_entry_regs() leaves .noinstr.text section
OBJCOPY modules.builtin.modinfo
GEN modules.builtin
MODPOST Module.symvers
CC .vmlinux.export.o
Moving kmsan_unpoison_entry_regs() _after_ instrumentation_begin() fixes
the warning.
There is decode_bug(regs->ip, &imm) is left before KMSAN unpoisoining, but
it has the return condition and if we include it after
instrumentation_begin() it results the warning "return with
instrumentation enabled", hence, I'm concerned that regs will not be KMSAN
unpoisoned if `ud_type == BUG_NONE` is true.
Link: https://lkml.kernel.org/r/20241016152407.3149001-1-snovitoll@gmail.com
Fixes: ba54d194f8da ("x86/traps: avoid KMSAN bugs originating from handle_bug()")
Signed-off-by: Sabyrzhan Tasbolatov <snovitoll@gmail.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Prevent a certain range of pages which get marked as hypervisor-only,
to get allocated to a CoCo (SNP) guest which cannot use them and thus
fail booting
- Fix the microcode loader on AMD to pay attention to the stepping of a
patch and to handle the case where a BIOS config option splits the
machine into logical NUMA nodes per L3 cache slice
- Disable LAM from being built by default due to security concerns
* tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Ensure that RMP table fixups are reserved
x86/microcode/AMD: Split load_microcode_amd()
x86/microcode/AMD: Pay attention to the stepping dynamically
x86/lam: Disable ADDRESS_MASKING in most cases
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The BIOS reserves RMP table memory via e820 reservations. This can still lead
to RMP page faults during kexec if the host tries to access memory within the
same 2MB region.
Commit
400fea4b9651 ("x86/sev: Add callback to apply RMP table fixups for kexec"
adjusts the e820 reservations for the RMP table so that the entire 2MB range
at the start/end of the RMP table is marked reserved.
The e820 reservations are then passed to firmware via SNP_INIT where they get
marked HV-Fixed.
The RMP table fixups are done after the e820 ranges have been added to
memblock, allowing the fixup ranges to still be allocated and used by the
system.
The problem is that this memory range is now marked reserved in the e820
tables and during SNP initialization these reserved ranges are marked as
HV-Fixed. This means that the pages cannot be used by an SNP guest, only by
the hypervisor.
However, the memory management subsystem does not make this distinction and
can allocate one of those pages to an SNP guest. This will ultimately result
in RMPUPDATE failures associated with the guest, causing it to fail to start
or terminate when accessing the HV-Fixed page.
The issue is captured below with memblock=debug:
[ 0.000000] SEV-SNP: *** DEBUG: snp_probe_rmptable_info:352 - rmp_base=0x280d4800000, rmp_end=0x28357efffff
...
[ 0.000000] BIOS-provided physical RAM map:
...
[ 0.000000] BIOS-e820: [mem 0x00000280d4800000-0x0000028357efffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000028357f00000-0x0000028357ffffff] usable
...
...
[ 0.183593] memblock add: [0x0000028357f00000-0x0000028357ffffff] e820__memblock_setup+0x74/0xb0
...
[ 0.203179] MEMBLOCK configuration:
[ 0.207057] memory size = 0x0000027d0d194000 reserved size = 0x0000000009ed2c00
[ 0.215299] memory.cnt = 0xb
...
[ 0.311192] memory[0x9] [0x0000028357f00000-0x0000028357ffffff], 0x0000000000100000 bytes flags: 0x0
...
...
[ 0.419110] SEV-SNP: Reserving start/end of RMP table on a 2MB boundary [0x0000028357e00000]
[ 0.428514] e820: update [mem 0x28357e00000-0x28357ffffff] usable ==> reserved
[ 0.428517] e820: update [mem 0x28357e00000-0x28357ffffff] usable ==> reserved
[ 0.428520] e820: update [mem 0x28357e00000-0x28357ffffff] usable ==> reserved
...
...
[ 5.604051] MEMBLOCK configuration:
[ 5.607922] memory size = 0x0000027d0d194000 reserved size = 0x0000000011faae02
[ 5.616163] memory.cnt = 0xe
...
[ 5.754525] memory[0xc] [0x0000028357f00000-0x0000028357ffffff], 0x0000000000100000 bytes on node 0 flags: 0x0
...
...
[ 10.080295] Early memory node ranges[ 10.168065]
...
node 0: [mem 0x0000028357f00000-0x0000028357ffffff]
...
...
[ 8149.348948] SEV-SNP: RMPUPDATE failed for PFN 28357f7c, pg_level: 1, ret: 2
As shown above, the memblock allocations show 1MB after the end of the RMP as
available for allocation, which is what the RMP table fixups have reserved.
This memory range subsequently gets allocated as SNP guest memory, resulting
in an RMPUPDATE failure.
This can potentially be fixed by not reserving the memory range in the e820
table, but that causes kexec failures when using the KEXEC_FILE_LOAD syscall.
The solution is to use memblock_reserve() to mark the memory reserved for the
system, ensuring that it cannot be allocated to an SNP guest.
Since HV-Fixed memory is still readable/writable by the host, this only ends
up being a problem if the memory in this range requires a page state change,
which generally will only happen when allocating memory in this range to be
used for running SNP guests, which is now possible with the SNP hypervisor
support in kernel 6.11.
Backporter note:
Fixes tag points to a 6.9 change but as the last paragraph above explains,
this whole thing can happen after 6.11 received SNP HV support, therefore
backporting to 6.9 is not really necessary.
[ bp: Massage commit message. ]
Fixes: 400fea4b9651 ("x86/sev: Add callback to apply RMP table fixups for kexec")
Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@kernel.org> # 6.11, see Backporter note above.
Link: https://lore.kernel.org/r/20240815221630.131133-1-Ashish.Kalra@amd.com
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This function should've been split a long time ago because it is used in
two paths:
1) On the late loading path, when the microcode is loaded through the
request_firmware interface
2) In the save_microcode_in_initrd() path which collects all the
microcode patches which are relevant for the current system before
the initrd with the microcode container has been jettisoned.
In that path, it is not really necessary to iterate over the nodes on
a system and match a patch however it didn't cause any trouble so it
was left for a later cleanup
However, that later cleanup was expedited by the fact that Jens was
enabling "Use L3 as a NUMA node" in the BIOS setting in his machine and
so this causes the NUMA CPU masks used in cpumask_of_node() to be
generated *after* 2) above happened on the first node. Which means, all
those masks were funky, wrong, uninitialized and whatnot, leading to
explosions when dereffing c->microcode in load_microcode_amd().
So split that function and do only the necessary work needed at each
stage.
Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Commit in Fixes changed how a microcode patch is loaded on Zen and newer but
the patch matching needs to happen with different rigidity, depending on what
is being done:
1) When the patch is added to the patches cache, the stepping must be ignored
because the driver still supports different steppings per system
2) When the patch is matched for loading, then the stepping must be taken into
account because each CPU needs the patch matching its exact stepping
Take care of that by making the matching smarter.
Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Linear Address Masking (LAM) has a weakness related to transient
execution as described in the SLAM paper[1]. Unless Linear Address
Space Separation (LASS) is enabled this weakness may be exploitable.
Until kernel adds support for LASS[2], only allow LAM for COMPILE_TEST,
or when speculation mitigations have been disabled at compile time,
otherwise keep LAM disabled.
There are no processors in market that support LAM yet, so currently
nobody is affected by this issue.
[1] SLAM: https://download.vusec.net/papers/slam_sp24.pdf
[2] LASS: https://lore.kernel.org/lkml/20230609183632.48706-1-alexander.shishkin@linux.intel.com/
[ dhansen: update SPECULATION_MITIGATIONS -> CPU_MITIGATIONS ]
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/5373262886f2783f054256babdf5a98545dc986b.1706068222.git.pawan.kumar.gupta%40linux.intel.com
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The x86 user pointer validation changes made me look at compiler output
a lot, and the wrong indentation for the ".popsection" in the generated
assembler triggered me.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
It turns out that AMD has a "Meltdown Lite(tm)" issue with non-canonical
accesses in kernel space. And so using just the high bit to decide
whether an access is in user space or kernel space ends up with the good
old "leak speculative data" if you have the right gadget using the
result:
CVE-2020-12965 “Transient Execution of Non-Canonical Accesses“
Now, the kernel surrounds the access with a STAC/CLAC pair, and those
instructions end up serializing execution on older Zen architectures,
which closes the speculation window.
But that was true only up until Zen 5, which renames the AC bit [1].
That improves performance of STAC/CLAC a lot, but also means that the
speculation window is now open.
Note that this affects not just the new address masking, but also the
regular valid_user_address() check used by access_ok(), and the asm
version of the sign bit check in the get_user() helpers.
It does not affect put_user() or clear_user() variants, since there's no
speculative result to be used in a gadget for those operations.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Link: https://lore.kernel.org/all/80d94591-1297-4afb-b510-c665efd37f10@citrix.com/
Link: https://lore.kernel.org/all/20241023094448.GAZxjFkEOOF_DM83TQ@fat_crate.local/ [1]
Link: https://www.amd.com/en/resources/product-security/bulletin/amd-sb-1010.html
Link: https://arxiv.org/pdf/2108.10771
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> # LAM case
Fixes: 2865baf54077 ("x86: support user address masking instead of non-speculative conditional")
Fixes: 6014bc27561f ("x86-64: make access_ok() independent of LAM")
Fixes: b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Pull bpf fixes from Daniel Borkmann:
- Fix an out-of-bounds read in bpf_link_show_fdinfo for BPF sockmap
link file descriptors (Hou Tao)
- Fix BPF arm64 JIT's address emission with tag-based KASAN enabled
reserving not enough size (Peter Collingbourne)
- Fix BPF verifier do_misc_fixups patching for inlining of the
bpf_get_branch_snapshot BPF helper (Andrii Nakryiko)
- Fix a BPF verifier bug and reject BPF program write attempts into
read-only marked BPF maps (Daniel Borkmann)
- Fix perf_event_detach_bpf_prog error handling by removing an invalid
check which would skip BPF program release (Jiri Olsa)
- Fix memory leak when parsing mount options for the BPF filesystem
(Hou Tao)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Check validity of link->type in bpf_link_show_fdinfo()
bpf: Add the missing BPF_LINK_TYPE invocation for sockmap
bpf: fix do_misc_fixups() for bpf_get_branch_snapshot()
bpf,perf: Fix perf_event_detach_bpf_prog error handling
selftests/bpf: Add test for passing in uninit mtu_len
selftests/bpf: Add test for writes to .rodata
bpf: Remove MEM_UNINIT from skb/xdp MTU helpers
bpf: Fix overloading of MEM_UNINIT's meaning
bpf: Add MEM_WRITE attribute
bpf: Preserve param->string when parsing mount options
bpf, arm64: Fix address emission with tag-based KASAN enabled
|
| | |_|_|_|_|/
| |/| | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
When BPF_TRAMP_F_CALL_ORIG is enabled, the address of a bpf_tramp_image
struct on the stack is passed during the size calculation pass and
an address on the heap is passed during code generation. This may
cause a heap buffer overflow if the heap address is tagged because
emit_a64_mov_i64() will emit longer code than it did during the size
calculation pass. The same problem could occur without tag-based
KASAN if one of the 16-bit words of the stack address happened to
be all-ones during the size calculation pass. Fix the problem by
assuming the worst case (4 instructions) when calculating the size
of the bpf_tramp_image address emission.
Fixes: 19d3c179a377 ("bpf, arm64: Fix trampoline for BPF_TRAMP_F_CALL_ORIG")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://linux-review.googlesource.com/id/I1496f2bc24fba7a1d492e16e2b94cf43714f2d3c
Link: https://lore.kernel.org/bpf/20241018221644.3240898-1-pcc@google.com
|
|\ \ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Get correct cores_per_package for SMT systems, enable IRQ if do_ale()
triggered in irq-enabled context, and fix some bugs about vDSO, memory
managenent, hrtimer in KVM, etc"
* tag 'loongarch-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Mark hrtimer to expire in hard interrupt context
LoongArch: Make KASAN usable for variable cpu_vabits
LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
LoongArch: Don't crash in stack_top() for tasks without vDSO
LoongArch: Set correct size for vDSO code mapping
LoongArch: Enable IRQ if do_ale() triggered in irq-enabled context
LoongArch: Get correct cores_per_package for SMT systems
LoongArch: Use "Exception return address" to comment ERA
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Like commit 2c0d278f3293f ("KVM: LAPIC: Mark hrtimer to expire in hard
interrupt context") and commit 9090825fa9974 ("KVM: arm/arm64: Let the
timer expire in hardirq context on RT"), On PREEMPT_RT enabled kernels
unmarked hrtimers are moved into soft interrupt expiry mode by default.
Then the timers are canceled from an preempt-notifier which is invoked
with disabled preemption which is not allowed on PREEMPT_RT.
The timer callback is short so in could be invoked in hard-IRQ context.
So let the timer expire on hard-IRQ context even on -RT.
This fix a "scheduling while atomic" bug for PREEMPT_RT enabled kernels:
BUG: scheduling while atomic: qemu-system-loo/1011/0x00000002
Modules linked in: amdgpu rfkill nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat ns
CPU: 1 UID: 0 PID: 1011 Comm: qemu-system-loo Tainted: G W 6.12.0-rc2+ #1774
Tainted: [W]=WARN
Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022
Stack : ffffffffffffffff 0000000000000000 9000000004e3ea38 9000000116744000
90000001167475a0 0000000000000000 90000001167475a8 9000000005644830
90000000058dc000 90000000058dbff8 9000000116747420 0000000000000001
0000000000000001 6a613fc938313980 000000000790c000 90000001001c1140
00000000000003fe 0000000000000001 000000000000000d 0000000000000003
0000000000000030 00000000000003f3 000000000790c000 9000000116747830
90000000057ef000 0000000000000000 9000000005644830 0000000000000004
0000000000000000 90000000057f4b58 0000000000000001 9000000116747868
900000000451b600 9000000005644830 9000000003a13998 0000000010000020
00000000000000b0 0000000000000004 0000000000000000 0000000000071c1d
...
Call Trace:
[<9000000003a13998>] show_stack+0x38/0x180
[<9000000004e3ea34>] dump_stack_lvl+0x84/0xc0
[<9000000003a71708>] __schedule_bug+0x48/0x60
[<9000000004e45734>] __schedule+0x1114/0x1660
[<9000000004e46040>] schedule_rtlock+0x20/0x60
[<9000000004e4e330>] rtlock_slowlock_locked+0x3f0/0x10a0
[<9000000004e4f038>] rt_spin_lock+0x58/0x80
[<9000000003b02d68>] hrtimer_cancel_wait_running+0x68/0xc0
[<9000000003b02e30>] hrtimer_cancel+0x70/0x80
[<ffff80000235eb70>] kvm_restore_timer+0x50/0x1a0 [kvm]
[<ffff8000023616c8>] kvm_arch_vcpu_load+0x68/0x2a0 [kvm]
[<ffff80000234c2d4>] kvm_sched_in+0x34/0x60 [kvm]
[<9000000003a749a0>] finish_task_switch.isra.0+0x140/0x2e0
[<9000000004e44a70>] __schedule+0x450/0x1660
[<9000000004e45cb0>] schedule+0x30/0x180
[<ffff800002354c70>] kvm_vcpu_block+0x70/0x120 [kvm]
[<ffff800002354d80>] kvm_vcpu_halt+0x60/0x3e0 [kvm]
[<ffff80000235b194>] kvm_handle_gspr+0x3f4/0x4e0 [kvm]
[<ffff80000235f548>] kvm_handle_exit+0x1c8/0x260 [kvm]
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Currently, KASAN on LoongArch assume the CPU VA bits is 48, which is
true for Loongson-3 series, but not for Loongson-2 series (only 40 or
lower), this patch fix that issue and make KASAN usable for variable
cpu_vabits.
Solution is very simple: Just define XRANGE_SHADOW_SHIFT which means
valid address length from VA_BITS to min(cpu_vabits, VA_BITS).
Cc: stable@vger.kernel.org
Signed-off-by: Kanglong Wang <wangkanglong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
There are two pages in one TLB entry on LoongArch system. For kernel
space, it requires both two pte entries (buddies) with PAGE_GLOBAL bit
set, otherwise HW treats it as non-global tlb, there will be potential
problems if tlb entry for kernel space is not global. Such as fail to
flush kernel tlb with the function local_flush_tlb_kernel_range() which
supposed only flush tlb with global bit.
Kernel address space areas include percpu, vmalloc, vmemmap, fixmap and
kasan areas. For these areas both two consecutive page table entries
should be enabled with PAGE_GLOBAL bit. So with function set_pte() and
pte_clear(), pte buddy entry is checked and set besides its own pte
entry. However it is not atomic operation to set both two pte entries,
there is problem with test_vmalloc test case.
So function kernel_pte_init() is added to init a pte table when it is
created for kernel address space, and the default initial pte value is
PAGE_GLOBAL rather than zero at beginning. Then only its own pte entry
need update with function set_pte() and pte_clear(), nothing to do with
the pte buddy entry.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Not all tasks have a vDSO mapped, for example kthreads never do. If such
a task ever ends up calling stack_top(), it will derefence the NULL vdso
pointer and crash.
This can for example happen when using kunit:
[<9000000000203874>] stack_top+0x58/0xa8
[<90000000002956cc>] arch_pick_mmap_layout+0x164/0x220
[<90000000003c284c>] kunit_vm_mmap_init+0x108/0x12c
[<90000000003c1fbc>] __kunit_add_resource+0x38/0x8c
[<90000000003c2704>] kunit_vm_mmap+0x88/0xc8
[<9000000000410b14>] usercopy_test_init+0xbc/0x25c
[<90000000003c1db4>] kunit_try_run_case+0x5c/0x184
[<90000000003c3d54>] kunit_generic_run_threadfn_adapter+0x24/0x48
[<900000000022e4bc>] kthread+0xc8/0xd4
[<9000000000200ce8>] ret_from_kernel_thread+0xc/0xa4
Fixes: 803b0fc5c3f2 ("LoongArch: Add process management")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The current size of vDSO code mapping is hardcoded to PAGE_SIZE. This
cannot work for 4KB page size after commit 18efd0b10e0fd77 ("LoongArch:
vDSO: Wire up getrandom() vDSO implementation") because the code size
increases to 8KB. Thus set the code mapping size to its real size, i.e.
PAGE_ALIGN(vdso_end - vdso_start).
Fixes: 18efd0b10e0fd77 ("LoongArch: vDSO: Wire up getrandom() vDSO implementation")
Reviewed-by: Xi Ruoyao <xry111@xry111.site>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Unaligned access exception can be triggered in irq-enabled context such
as user mode, in this case do_ale() may call get_user() which may cause
sleep. Then we will get:
BUG: sleeping function called from invalid context at arch/loongarch/kernel/access-helper.h:7
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 129, name: modprobe
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
CPU: 0 UID: 0 PID: 129 Comm: modprobe Tainted: G W 6.12.0-rc1+ #1723
Tainted: [W]=WARN
Stack : 9000000105e0bd48 0000000000000000 9000000003803944 9000000105e08000
9000000105e0bc70 9000000105e0bc78 0000000000000000 0000000000000000
9000000105e0bc78 0000000000000001 9000000185e0ba07 9000000105e0b890
ffffffffffffffff 9000000105e0bc78 73924b81763be05b 9000000100194500
000000000000020c 000000000000000a 0000000000000000 0000000000000003
00000000000023f0 00000000000e1401 00000000072f8000 0000007ffbb0e260
0000000000000000 0000000000000000 9000000005437650 90000000055d5000
0000000000000000 0000000000000003 0000007ffbb0e1f0 0000000000000000
0000005567b00490 0000000000000000 9000000003803964 0000007ffbb0dfec
00000000000000b0 0000000000000007 0000000000000003 0000000000071c1d
...
Call Trace:
[<9000000003803964>] show_stack+0x64/0x1a0
[<9000000004c57464>] dump_stack_lvl+0x74/0xb0
[<9000000003861ab4>] __might_resched+0x154/0x1a0
[<900000000380c96c>] emulate_load_store_insn+0x6c/0xf60
[<9000000004c58118>] do_ale+0x78/0x180
[<9000000003801bc8>] handle_ale+0x128/0x1e0
So enable IRQ if unaligned access exception is triggered in irq-enabled
context to fix it.
Cc: stable@vger.kernel.org
Reported-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
In loongson_sysconf, The "core" of cores_per_node and cores_per_package
stands for a logical core, which means in a SMT system it stands for a
thread indeed. This information is gotten from SMBIOS Type4 Structure,
so in order to get a correct cores_per_package for both SMT and non-SMT
systems in parse_cpu_table() we should use SMBIOS_THREAD_PACKAGE_OFFSET
instead of SMBIOS_CORE_PACKAGE_OFFSET.
Cc: stable@vger.kernel.org
Reported-by: Chao Li <lichao@loongson.cn>
Tested-by: Chao Li <lichao@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
| |/ / / / / /
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The information contained in the comment for LOONGARCH_CSR_ERA is even
less informative than the macro itself, which can cause confusion for
junior developers. Let's use the full English term.
Signed-off-by: Yanteng Si <siyanteng@cqsoftware.com.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|\ \ \ \ \ \ \
| |/ / / / / /
|/| | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Fix the guest view of the ID registers, making the relevant fields
writable from userspace (affecting ID_AA64DFR0_EL1 and
ID_AA64PFR1_EL1)
- Correcly expose S1PIE to guests, fixing a regression introduced in
6.12-rc1 with the S1POE support
- Fix the recycling of stage-2 shadow MMUs by tracking the context
(are we allowed to block or not) as well as the recycling state
- Address a couple of issues with the vgic when userspace
misconfigures the emulation, resulting in various splats. Headaches
courtesy of our Syzkaller friends
- Stop wasting space in the HYP idmap, as we are dangerously close to
the 4kB limit, and this has already exploded in -next
- Fix another race in vgic_init()
- Fix a UBSAN error when faking the cache topology with MTE enabled
RISCV:
- RISCV: KVM: use raw_spinlock for critical section in imsic
x86:
- A bandaid for lack of XCR0 setup in selftests, which causes trouble
if the compiler is configured to have x86-64-v3 (with AVX) as the
default ISA. Proper XCR0 setup will come in the next merge window.
- Fix an issue where KVM would not ignore low bits of the nested CR3
and potentially leak up to 31 bytes out of the guest memory's
bounds
- Fix case in which an out-of-date cached value for the segments
could by returned by KVM_GET_SREGS.
- More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL
- Override MTRR state for KVM confidential guests, making it WB by
default as is already the case for Hyper-V guests.
Generic:
- Remove a couple of unused functions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
RISCV: KVM: use raw_spinlock for critical section in imsic
KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
KVM: selftests: x86: Avoid using SSE/AVX instructions
KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
x86/kvm: Override default caching mode for SEV-SNP and TDX
KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
KVM: Remove unused kvm_vcpu_gfn_to_pfn
KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS
KVM: arm64: Fix shift-out-of-bounds bug
KVM: arm64: Shave a few bytes from the EL2 idmap code
KVM: arm64: Don't eagerly teardown the vgic on init error
KVM: arm64: Expose S1PIE to guests
KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule
KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
...
|
| |\ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.12, take #3
- Stop wasting space in the HYP idmap, as we are dangerously close
to the 4kB limit, and this has already exploded in -next
- Fix another race in vgic_init()
- Fix a UBSAN error when faking the cache topology with MTE
enabled
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
kvm_vgic_map_resources() prematurely marks the distributor as 'ready',
potentially allowing vCPUs to enter the guest before the distributor's
MMIO registration has been made visible.
Plug the race by marking the distributor as ready only after MMIO
registration is completed. Rely on the implied ordering of
synchronize_srcu() to ensure the MMIO registration is visible before
vgic_dist::ready. This also means that writers to vgic_dist::ready are
now serialized by the slots_lock, which was effectively the case already
as all writers held the slots_lock in addition to the config_lock.
Fixes: 59112e9c390b ("KVM: arm64: vgic: Fix a circular locking issue")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241017001947.2707312-3-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
KVM commits to a particular sizing of SPIs when the vgic is initialized,
which is before the point a vgic becomes ready. On top of that, KVM
supplies a default amount of SPIs should userspace not explicitly
configure this.
As such, the check for vgic_ready() in the handling of
KVM_DEV_ARM_VGIC_GRP_NR_IRQS is completely wrong, and testing if nr_spis
is nonzero is sufficient for preventing userspace from playing games
with us.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241017001947.2707312-2-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Fix a shift-out-of-bounds bug reported by UBSAN when running
VM with MTE enabled host kernel.
UBSAN: shift-out-of-bounds in arch/arm64/kvm/sys_regs.c:1988:14
shift exponent 33 is too large for 32-bit type 'int'
CPU: 26 UID: 0 PID: 7629 Comm: qemu-kvm Not tainted 6.12.0-rc2 #34
Hardware name: IEI NF5280R7/Mitchell MB, BIOS 00.00. 2024-10-12 09:28:54 10/14/2024
Call trace:
dump_backtrace+0xa0/0x128
show_stack+0x20/0x38
dump_stack_lvl+0x74/0x90
dump_stack+0x18/0x28
__ubsan_handle_shift_out_of_bounds+0xf8/0x1e0
reset_clidr+0x10c/0x1c8
kvm_reset_sys_regs+0x50/0x1c8
kvm_reset_vcpu+0xec/0x2b0
__kvm_vcpu_set_target+0x84/0x158
kvm_vcpu_set_target+0x138/0x168
kvm_arch_vcpu_ioctl_vcpu_init+0x40/0x2b0
kvm_arch_vcpu_ioctl+0x28c/0x4b8
kvm_vcpu_ioctl+0x4bc/0x7a8
__arm64_sys_ioctl+0xb4/0x100
invoke_syscall+0x70/0x100
el0_svc_common.constprop.0+0x48/0xf0
do_el0_svc+0x24/0x38
el0_svc+0x3c/0x158
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x194/0x198
Fixes: 7af0c2534f4c ("KVM: arm64: Normalize cache configuration")
Cc: stable@vger.kernel.org
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20241017025701.67936-1-ilkka@os.amperecomputing.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Our idmap is becoming too big, to the point where it doesn't fit in
a 4kB page anymore.
There are some low-hanging fruits though, such as the el2_init_state
horror that is expanded 3 times in the kernel. Let's at least limit
ourselves to two copies, which makes the kernel link again.
At some point, we'll have to have a better way of doing this.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20241009204903.GA3353168@thelio-3990X
|
| |\| | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.12, take #2
- Fix the guest view of the ID registers, making the relevant fields
writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1)
- Correcly expose S1PIE to guests, fixing a regression introduced
in 6.12-rc1 with the S1POE support
- Fix the recycling of stage-2 shadow MMUs by tracking the context
(are we allowed to block or not) as well as the recycling state
- Address a couple of issues with the vgic when userspace misconfigures
the emulation, resulting in various splats. Headaches courtesy
of our Syzkaller friends
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
As there is very little ordering in the KVM API, userspace can
instanciate a half-baked GIC (missing its memory map, for example)
at almost any time.
This means that, with the right timing, a thread running vcpu-0
can enter the kernel without a GIC configured and get a GIC created
behind its back by another thread. Amusingly, it will pick up
that GIC and start messing with the data structures without the
GIC having been fully initialised.
Similarly, a thread running vcpu-1 can enter the kernel, and try
to init the GIC that was previously created. Since this GIC isn't
properly configured (no memory map), it fails to correctly initialise.
And that's the point where we decide to teardown the GIC, freeing all
its resources. Behind vcpu-0's back. Things stop pretty abruptly,
with a variety of symptoms. Clearly, this isn't good, we should be
a bit more careful about this.
It is obvious that this guest is not viable, as it is missing some
important part of its configuration. So instead of trying to tear
bits of it down, let's just mark it as *dead*. It means that any
further interaction from userspace will result in -EIO. The memory
will be released on the "normal" path, when userspace gives up.
Cc: stable@vger.kernel.org
Reported-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241009183603.3221824-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Prior to commit 70ed7238297f ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1")
we just exposed the santised view of ID_AA64MMFR3_EL1 to guests, meaning
that they saw both TCRX and S1PIE if present on the host machine. That
commit added VMM control over the contents of the register and exposed
S1POE but removed S1PIE, meaning that the extension is no longer visible
to guests. Reenable support for S1PIE with VMM control.
Fixes: 70ed7238297f ("KVM: arm64: Sanitise ID_AA64MMFR3_EL1")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20241005-kvm-arm64-fix-s1pie-v1-1-5901f02de749@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
There's been a decent amount of attention around unmaps of nested MMUs,
and TLBI handling is no exception to this. Add a comment clarifying why
it is safe to reschedule during a TLBI unmap, even without a reference
on the MMU in progress.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241007233028.2236133-5-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Currently, when a nested MMU is repurposed for some other MMU context,
KVM unmaps everything during vcpu_load() while holding the MMU lock for
write. This is quite a performance bottleneck for large nested VMs, as
all vCPU scheduling will spin until the unmap completes.
Start punting the MMU cleanup to a vCPU request, where it is then
possible to periodically release the MMU lock and CPU in the presence of
contention.
Ensure that no vCPU winds up using a stale MMU by tracking the pending
unmap on the S2 MMU itself and requesting an unmap on every vCPU that
finds it.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241007233028.2236133-4-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Right now the nested code allows unmap operations on a shadow stage-2 to
block unconditionally. This is wrong in a couple places, such as a
non-blocking MMU notifier or on the back of a sched_in() notifier as
part of shadow MMU recycling.
Carry through whether or not blocking is allowed to
kvm_pgtable_stage2_unmap(). This 'fixes' an issue where stage-2 MMU
reclaim would precipitate a stack overflow from a pile of kvm_sched_in()
callbacks, all trying to recycle a stage-2 MMU.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20241007233028.2236133-3-oliver.upton@linux.dev
Signed-off-by: Marc Zyngier <maz@kernel.org>
|