| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently there are several kernel command line parameters which are
only parsed and handled in decompressor and not known to the kernel.
This leads to the following error message during kernel boot:
Unknown kernel command line parameters "mem=3G nokaslr", will be passed
to user space.
To avoid confusion, register those parameters with an empty stub so that
kernel does not complain about them.
Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
RDP instruction allows to reset DAT-protection bit in a PTE, with less
CPU synchronization overhead than IPTE instruction. In particular, IPTE
can cause machine-wide synchronization overhead, and excessive IPTE usage
can negatively impact machine performance.
RDP can be used instead of IPTE, if the new PTE only differs in SW bits
and _PAGE_PROTECT HW bit, for PTE protection changes from RO to RW.
SW PTE bit changes are allowed, e.g. for dirty and young tracking, but none
of the other HW-defined part of the PTE must change. This is because the
architecture forbids such changes to an active and valid PTE, which
is why invalidation with IPTE is always used first, before writing a new
entry.
The RDP optimization helps mainly for fault-driven SW dirty-bit tracking.
Writable PTEs are initially always mapped with HW _PAGE_PROTECT bit set,
to allow SW dirty-bit accounting on first write protection fault, where
the DAT-protection would then be reset. The reset is now done with RDP
instead of IPTE, if RDP instruction is available.
RDP cannot always guarantee that the DAT-protection reset is propagated
to all CPUs immediately. This means that spurious TLB protection faults
on other CPUs can now occur. For this, common code provides a
flush_tlb_fix_spurious_fault() handler, which will now be used to do a
CPU-local TLB flush. However, this will clear the whole TLB of a CPU, and
not just the affected entry. For more fine-grained flushing, by simply
doing a (local) RDP again, flush_tlb_fix_spurious_fault() would need to
also provide the PTE pointer.
Note that spurious TLB protection faults cannot really be distinguished
from racing pagetable updates, where another thread already installed the
correct PTE. In such a case, the local TLB flush would be unnecessary
overhead, but overall reduction of CPU synchronization overhead by not
using IPTE is still expected to be beneficial.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The setup of the kernel virtual address space is spread
throughout the sources, boot stages and config options
like this:
1. The available physical memory regions are queried
and stored as mem_detect information for later use
in the decompressor.
2. Based on the physical memory availability the virtual
memory layout is established in the decompressor;
3. If CONFIG_KASAN is disabled the kernel paging setup
code populates kernel pgtables and turns DAT mode on.
It uses the information stored at step [1].
4. If CONFIG_KASAN is enabled the kernel early boot
kasan setup populates kernel pgtables and turns DAT
mode on. It uses the information stored at step [1].
The kasan setup creates early_pg_dir directory and
directly overwrites swapper_pg_dir entries to make
shadow memory pages available.
Move the kernel virtual memory setup to the decompressor
and start the kernel with DAT turned on right from the
very first istruction. That completely eliminates the
boot phase when the kernel runs in DAT-off mode, simplies
the overall design and consolidates pgtables setup.
The identity mapping is created in the decompressor, while
kasan shadow mappings are still created by the early boot
kernel code.
Share with decompressor the existing kasan memory allocator.
It decreases the size of a newly requested memory block from
pgalloc_pos and ensures that kernel image is not overwritten.
pgalloc_low and pgalloc_pos pointers are made preserved boot
variables for that.
Use the bootdata infrastructure to setup swapper_pg_dir
and invalid_pg_dir directories used by the kernel later.
The interim early_pg_dir directory established by the
kasan initialization code gets eliminated as result.
As the kernel runs in DAT-on mode only the PSW_KERNEL_BITS
define gets PSW_MASK_DAT bit by default. Additionally, the
setup_lowcore_dat_off() and setup_lowcore_dat_on() routines
get merged, since there is no DAT-off mode stage anymore.
The memory mappings are created with RW+X protection that
allows the early boot code setting up all necessary data
and services for the kernel being booted. Just before the
paging is enabled the memory protection is changed to
RO+X for text, RO+NX for read-only data and RW+NX for
kernel data and the identity mapping.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit ada1da31ce34 ("s390/sclp: sort out physical vs
virtual pointers usage") fixed the notion of virtual
address for sclp_early_sccb pointer. However, it did
not take into account that kasan_early_init() can also
output messages and sclp_early_sccb should be adjusted
by the time kasan_early_init() is called.
Currently it is not a problem, since virtual and physical
addresses on s390 are the same. Nevertheless, should they
ever differ, this would cause an invalid pointer access.
Fixes: ada1da31ce34 ("s390/sclp: sort out physical vs virtual pointers usage")
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://lore.kernel.org/r/20220818205948.6360-1-wsa+renesas@sang-engineering.com
Link: https://lore.kernel.org/r/20220818210102.7301-1-wsa+renesas@sang-engineering.com
[gor@linux.ibm.com: squashed two changes linked above together]
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Due to historic reasons the base program check handler calls a
configurable function. Given that there is only the early program
check handler left, simplify the code by directly calling that
function.
The only other user was removed with commit d485235b0054 ("s390:
assume diag308 set always works").
Also rename all functions and the asm file to reflect this.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Add and use fixup_exception helper function in order to remove the
duplicated exception handler fixup code at several places.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pass pt_regs to early program check handler like it is done for every
other interrupt and exception handler.
Also the passed pt_regs can be changed by the called function and the
changes register contents and psw contents will be taken into account
when returning. In addition the return psw will not be copied to the
program check old psw in lowcore, but to the usual return psw
location, like it is also done by the regular program check handler.
This allows also to get rid of the code that disabled lowcore
protection when changing the return address.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Follow arm64 and riscv and move the EX_TABLE define to asm-extable.h
which is a lot less generic than the current linkage.h.
Also make sure that all files which contain EX_TABLE usages actually
include the new header file. This should make sure that the files
always compile and there won't be any random compile breakage due to
other header file dependencies.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The early program check handler is active before the amode31 extable
is sorted. Therefore in case a program check happens early within the
amode31 code the extable entry might not be found.
Fix this by sorting the amode31 extable early.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Remove my old invalid email address which can be found in a couple of
files. Instead of updating it, just remove my contact data completely
from source files.
We have git and other tools which allow to figure out who is responsible
for what with recent contact data.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently s390 supports a fixed maximum command line length of 896
bytes. This isn't enough as some installers are trying to pass all
configuration data via kernel command line, and even with zfcp alone
it is easy to generate really long command lines. Therefore extend
the command line to 4 kbytes.
In the parm area where the command line is stored there is no indication
of the maximum allowed length, so a new field which contains the maximum
length is added.
The parm area has always been initialized to zero, so with old kernels
this field would read zero. This is important because tools like zipl
could read this field. If it contains a number larger than zero zipl
knows the maximum length that can be stored in the parm area, otherwise
it must assume that it is booting a legacy kernel and only 896 bytes are
available.
The removing of trailing whitespace in head.S is also removed because
code to do this is already present in setup_boot_command_line().
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Provide physical addresses whenever the hardware interface
expects it or a 32-bit value used for tracking.
Variable sclp_early_sccb gets initialized in the decompressor
and points to an address in physcal memory. Yet, it is used
as virtual memory pointer and therefore should be converted.
Note, the other two __bootdata variables sclp_info_sccb and
sclp_info_sccb_valid contain plain data, but no pointers and
do need any special care.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Kernel support for the newer PCI mio instructions can be toggled off
with the pci=nomio command line option which needs to integrate with
common code PCI option parsing. However this option then toggles static
branches which can't be toggled yet in an early_param() call.
Thus commit 9964f396f1d0 ("s390: fix setting of mio addressing control")
moved toggling the static branches to the PCI init routine.
With this setup however we can't check for mio support outside the PCI
code during early boot, i.e. before switching the static branches, which
we need to be able to export this as an ELF HWCAP.
Improve on this by turning mio availability into a machine flag that
gets initially set based on CONFIG_PCI and the facility bit and gets
toggled off if pci=nomio is found during PCI option parsing allowing
simple access to this machine flag after early init.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The magic string "S390EP" at offset 0x10008 indicated to the decompressed
kernel that it was booted by the decompressor. Introduce a new bootdata
flag instead which conveys the same information in an explicit and
a cleaner way. But keep the magic string because it is a kernel ABI.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
With gcc-11, there are a lot of warnings because the facility functions
are accessing lowcore through a null pointer. Fix this by moving the
facility arrays away from lowcore.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Convert tod_clock_base to union tod_clock. This simplifies quite a bit
of code and also fixes a bug in read_persistent_clock64();
void read_persistent_clock64(struct timespec64 *ts)
{
__u64 delta;
delta = initial_leap_seconds + TOD_UNIX_EPOCH;
get_tod_clock_ext(clk);
*(__u64 *) &clk[1] -= delta;
if (*(__u64 *) &clk[1] > delta)
clk[0]--;
ext_to_timespec64(clk, ts);
}
Assume &clk[1] == 3 and delta == 2; then after the substraction the if
condition becomes true and the epoch part of the clock is decremented
by one because of an assumed overflow, even though there is none.
Fix this by using 128 bit arithmetics and let the compiler do the
right thing:
void read_persistent_clock64(struct timespec64 *ts)
{
union tod_clock clk;
u64 delta;
delta = initial_leap_seconds + TOD_UNIX_EPOCH;
store_tod_clock_ext(&clk);
clk.eitod -= delta;
ext_to_timespec64(&clk, ts);
}
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Rename store_tod_clock_ext() to store_tod_clock_ext_cc() to reflect
that it returns a condition code and also use union tod_clock as
parameter.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
Use STORE CLOCK EXTENDED instead of STORE CLOCK in early tod clock
setup. This is just to remove another usage of stck, trying to remove
all usages of STORE CLOCK. This doesn't fix anything.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
s390_base_ext_handler_fn haven't been used since its introduction in
commit ab14de6c37fa ("[S390] Convert memory detection into C code.").
s390_base_ext_handler itself is currently falsely storing 16 registers
at __LC_SAVE_AREA_ASYNC rewriting several following lowcore values:
cpu_flags, return_psw, return_mcck_psw, sync_enter_timer and
async_enter_timer.
Besides that s390_base_ext_handler itself is only potentially hiding
EXT interrupts which should not have happen in the first place. Any
piece of code which requires EXT interrupts before fully functional
ext_int_handler is enabled has to do it on its own, like this is done
by sclp_early_cmd() which is doing EXT interrupts handling synchronously
in sclp_early_wait_irq().
With s390_base_ext_handler removed unexpected EXT interrupt leads
to disabled wait with the address 0x1b0 (__LC_EXT_NEW_PSW), which is
currently setup in the decompressor.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
remove the cad command line option as the instruction was never
published and never used by userspace.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently if early_pgm_check_handler is called it ends up in pgm check
loop. The problem is that early_pgm_check_handler is instrumented by
KASAN but executed without DAT flag enabled which leads to addressing
exception when KASAN checks try to access shadow memory.
Fix that by executing early handlers with DAT flag on under KASAN as
expected.
Reported-and-tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
diag 0x44 is a voluntary undirected yield of a virtual CPU. This has
caused a lot of performance issues in the past.
There is only one caller left, and that one is only executed if diag
0x9c (directed yield) is not present. Given that all hypervisors
implement diag 0x9c anyway, remove the last diag 0x44 to avoid that
more callers will be added.
Worst case that could happen now, if diag 0x9c is not present, is that
a virtual CPU would loop a bit instead of giving its time slice up.
diag 0x44 statistics in debugfs are kept and will always be zero, so
that user space can tell that there are no calls.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
| |
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
| |
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
"noexec" option is already parsed during startup and its value is
exposed via noexec_disabled variable. Simply reuse that value during
machine facilities detection.
Suggested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clean uncompressed kernel .bss section in the startup code before
the uncompressed kernel is executed. At this point of time initrd and
certificates have been already rescued. Uncompressed kernel .bss size
is known from vmlinux_info. It is also taken into consideration during
uncompressed kernel positioning by kaslr (so it is safe to clean it).
With that uncompressed kernel is starting with .bss section zeroed and
no .bss section usage restrictions apply. Which makes chkbss checks for
uncompressed kernel objects obsolete and they can be removed.
early_nobss.c is also not needed anymore. Parts of it which are still
relevant are moved to early.c. Kasan initialization code is now called
directly from head64 (early.c is instrumented and should not be
executed before kasan shadow memory is set up).
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move enablement of mio addressing control from detect_machine_facilities
to pci_base_init. detect_machine_facilities runs so early that the
static branches have not been toggled yet, thus mio addressing control
was always off. In pci_base_init we have to use the SMP aware
ctl_set_bit though.
Fixes: 833b441ec0f6 ("s390: enable processes for mio instructions")
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The disabled_wait() function uses its argument as the PSW address when
it stops the CPU with a wait PSW that is disabled for interrupts.
The different callers sometimes use a specific number like 0xdeadbeef
to indicate a specific failure, the early boot code uses 0 and some
other calls sites use __builtin_return_address(0).
At the time a dump is created the current PSW and the registers of a
CPU are written to lowcore to make them avaiable to the dump analysis
tool. For a CPU stopped with disabled_wait the PSW and the registers
do not really make sense together, the PSW address does not point to
the function the registers belong to.
Simplify disabled_wait() by using _THIS_IP_ for the PSW address and
drop the argument to the function.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With a relocatable kernel that could reside at any place in memory, code
and data that has to stay below 2 GB needs special handling.
This patch introduces .dma sections for such text, data and ex_table.
The sections will be part of the decompressor kernel, so they will not
be relocated and stay below 2 GB. Their location is passed over to the
decompressed / relocated kernel via the .boot.preserved.data section.
The duald and aste for control register setup also need to stay below
2 GB, so move the setup code from arch/s390/kernel/head64.S to
arch/s390/boot/head.S. The duct and linkage_stack could reside above
2 GB, but their content has to be preserved for the decompresed kernel,
so they are also moved into the .dma section.
The start and end address of the .dma sections is added to vmcoreinfo,
for crash support, to help debugging in case the kernel crashed there.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Reviewed-by: Philipp Rudo <prudo@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
| |
Allow for userspace to use PCI MIO instructions.
Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
.boot.preserved.data is a better fit for ipl block than .boot.data
which is discarded after init. Reusing .boot.preserved.data allows to
simplify code a little bit and avoid copying data from .boot.data to
persistent variables.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- A copy of Arnds compat wrapper generation series
- Pass information about the KVM guest to the host in form the control
program code and the control program version code
- Map IOV resources to support PCI physical functions on s390
- Add vector load and store alignment hints to improve performance
- Use the "jdd" constraint with gcc 9 to make jump labels working again
- Remove amode workaround for old z/VM releases from the DCSS code
- Add support for in-kernel performance measurements using the CPU
measurement counter facility
- Introduce a new PMU device cpum_cf_diag to capture counters and store
thenn as event raw data.
- Bug fixes and cleanups
* tag 's390-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
Revert "s390/cpum_cf: Add kernel message exaplanations"
s390/dasd: fix read device characteristic with CONFIG_VMAP_STACK=y
s390/suspend: fix prefix register reset in swsusp_arch_resume
s390: warn about clearing als implied facilities
s390: allow overriding facilities via command line
s390: clean up redundant facilities list setup
s390/als: remove duplicated in-place implementation of stfle
s390/cio: Use cpa range elsewhere within vfio-ccw
s390/cio: Fix vfio-ccw handling of recursive TICs
s390: vfio_ap: link the vfio_ap devices to the vfio_ap bus subsystem
s390/cpum_cf: Handle EBUSY return code from CPU counter facility reservation
s390/cpum_cf: Add kernel message exaplanations
s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace
s390/cpum_cf: add ctr_stcctm() function
s390/cpum_cf: move common functions into a separate file
s390/cpum_cf: introduce kernel_cpumcf_avail() function
s390/cpu_mf: replace stcctm5() with the stcctm() function
s390/cpu_mf: add store cpu counter multiple instruction support
s390/cpum_cf: Add minimal in-kernel interface for counter measurements
s390/cpum_cf: introduce kernel_cpumcf_alert() to obtain measurement alerts
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Facilities list in the lowcore is initially set up by verify_facilities
from als.c and later initializations are redundant, so cleaning them up.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
| |
Right now the early machine detection code check stsi 3.2.2 for "KVM"
and set MACHINE_IS_VM if this is different. As the console detection
uses diagnose 8 if MACHINE_IS_VM returns true this will crash Linux
early for any non z/VM system that sets a different value than KVM.
So instead of assuming z/VM, do not set any of MACHINE_IS_LPAR,
MACHINE_IS_VM, or MACHINE_IS_KVM.
CC: stable@vger.kernel.org
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To distinguish zfcpdump case and to be able to parse some of the kernel
command line arguments early (e.g. mem=) ipl block retrieval and command
line construction code is moved to the early boot phase.
"memory_end" is set up correctly respecting "mem=" and hsa_size in case
of the zfcpdump.
arch/s390/boot/string.c is introduced to provide string handling and
command line parsing functions to early boot phase code for the compressed
kernel image case.
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since the plain vmlinux ELF file no longer carries all necessary parts
for starting up (like the entry point and decompressor), add a check
which would block boot process and encourage users to use bzImage or
arch/s390/boot/compressed/vmlinux instead.
The check relies on s390 linux entry point ABI definition, which is only
present in bzImage and arch/s390/boot/compressed/vmlinux.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
Move functions which may not access bss section to extra file. This
makes it easier to verify that all early functions which may not rely
on an initialized bss section are not accessing it.
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Use IS_ENABLED to get rid of an #ifdef statement.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
memmove_early was introduced with commit d543a106f96d6 ("s390: fix
initrd corruptions with gcov/kcov instrumented kernels"). The reason
for writing this extra memmove implementation was to be able to move
memory from an unknown location (aka SCSI IPL parameter block) to a
known location.
This had to done early before it was known if the SCSI IPL parameter
block pointer was valid or not, and therefore the memmove
implementation was supposed to be able to handle program checks.
The code has been changed and especially with commit d08091ac9654
("s390/ipl: rely on diag308 store to get ipl info") and
commit b4623d4e5b23 ("s390: provide memmove implementation") there
is no need to have a memmove version that can handle program checks,
and in addition it cannot be gcov/kcov instrumented anymore.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
For both ccw and fcp boot retrieve ipl info from ipl block received via
diag308 store. Old scsi ipl parm block handling and cio_get_iplinfo are
removed. Ipl type is deducted from ipl block (if valid).
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
| |
With the new macros for CPU alternatives the MACHINE_FLAG_LPP check
around the LPP instruction can be optimized. After this is done the
flag can be removed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Common code defines linker symbols which denote sections start/end in
a form of char []. Referencing those symbols as _symbol or &_symbol
yields the same result, but "_symbol" form is more widespread across
newly written code. Convert s390 specific code to this style.
Also removes unused _text symbol definition in boot/compressed/misc.c.
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the PPA instruction to the system entry and exit path to switch
the kernel to a different branch prediction behaviour. The instructions
are added via CPU alternatives and can be disabled with the "nospec"
or the "nobp=0" kernel parameter. If the default behaviour selected
with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
used to enable the changed kernel branch prediction.
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
To be able to switch off specific CPU alternatives with kernel parameters
make a copy of the facility bit mask provided by STFLE and use the copy
for the decision to apply an alternative.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
"Since Martin is on vacation you get the s390 pull request for the
v4.15 merge window this time from me.
Besides a lot of cleanups and bug fixes these are the most important
changes:
- a new regset for runtime instrumentation registers
- hardware accelerated AES-GCM support for the aes_s390 module
- support for the new CEX6S crypto cards
- support for FORTIFY_SOURCE
- addition of missing z13 and new z14 instructions to the in-kernel
disassembler
- generate opcode tables for the in-kernel disassembler out of a
simple text file instead of having to manually maintain those
tables
- fast memset16, memset32 and memset64 implementations
- removal of named saved segment support
- hardware counter support for z14
- queued spinlocks and queued rwlocks implementations for s390
- use the stack_depth tracking feature for s390 BPF JIT
- a new s390_sthyi system call which emulates the sthyi (store
hypervisor information) instruction
- removal of the old KVM virtio transport
- an s390 specific CPU alternatives implementation which is used in
the new spinlock code"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (88 commits)
MAINTAINERS: add virtio-ccw.h to virtio/s390 section
s390/noexec: execute kexec datamover without DAT
s390: fix transactional execution control register handling
s390/bpf: take advantage of stack_depth tracking
s390: simplify transactional execution elf hwcap handling
s390/zcrypt: Rework struct ap_qact_ap_info.
s390/virtio: remove unused header file kvm_virtio.h
s390: avoid undefined behaviour
s390/disassembler: generate opcode tables from text file
s390/disassembler: remove insn_to_mnemonic()
s390/dasd: avoid calling do_gettimeofday()
s390: vfio-ccw: Do not attempt to free no-op, test and tic cda.
s390: remove named saved segment support
s390/archrandom: Reconsider s390 arch random implementation
s390/pci: do not require AIS facility
s390/qdio: sanitize put_indicator
s390/qdio: use atomic_cmpxchg
s390/nmi: avoid using long-displacement facility
s390: pass endianness info to sparse
s390/decompressor: remove informational messages
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Dan Horák reported the following crash related to transactional execution:
User process fault: interruption code 0013 ilc:3 in libpthread-2.26.so[3ff93c00000+1b000]
CPU: 2 PID: 1 Comm: /init Not tainted 4.13.4-300.fc27.s390x #1
Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
task: 00000000fafc8000 task.stack: 00000000fafc4000
User PSW : 0705200180000000 000003ff93c14e70
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
User GPRS: 0000000000000077 000003ff00000000 000003ff93144d48 000003ff93144d5e
0000000000000000 0000000000000002 0000000000000000 000003ff00000000
0000000000000000 0000000000000418 0000000000000000 000003ffcc9fe770
000003ff93d28f50 000003ff9310acf0 000003ff92b0319a 000003ffcc9fe6d0
User Code: 000003ff93c14e62: 60e0b030 std %f14,48(%r11)
000003ff93c14e66: 60f0b038 std %f15,56(%r11)
#000003ff93c14e6a: e5600000ff0e tbegin 0,65294
>000003ff93c14e70: a7740006 brc 7,3ff93c14e7c
000003ff93c14e74: a7080000 lhi %r0,0
000003ff93c14e78: a7f40023 brc 15,3ff93c14ebe
000003ff93c14e7c: b2220000 ipm %r0
000003ff93c14e80: 8800001c srl %r0,28
There are several bugs with control register handling with respect to
transactional execution:
- on task switch update_per_regs() is only called if the next task has
an mm (is not a kernel thread). This however is incorrect. This
breaks e.g. for user mode helper handling, where the kernel creates
a kernel thread and then execve's a user space program. Control
register contents related to transactional execution won't be
updated on execve. If the previous task ran with transactional
execution disabled then the new task will also run with
transactional execution disabled, which is incorrect. Therefore call
update_per_regs() unconditionally within switch_to().
- on startup the transactional execution facility is not enabled for
the idle thread. This is not really a bug, but an inconsistency to
other facilities. Therefore enable the facility if it is available.
- on fork the new thread's per_flags field is not cleared. This means
that a child process inherits the PER_FLAG_NO_TE flag. This flag can
be set with a ptrace request to disable transactional execution for
the current process. It should not be inherited by new child
processes in order to be consistent with the handling of all other
PER related debugging options. Therefore clear the per_flags field in
copy_thread_tls().
Reported-and-tested-by: Dan Horák <dan@danny.cz>
Fixes: d35339a42dd1 ("s390: add support for transactional memory")
Cc: <stable@vger.kernel.org> # v3.7+
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Remove the support to create a z/VM named saved segment (NSS). This
feature is not supported since quite a while in favour of jump labels,
function tracing and (now) CPU alternatives. All of these features
require to write to the kernel text section which is not possible if
the kernel is contained within an NSS.
Given that memory savings are minimal if kernel images are shared and
in addition updates of shared images are painful, the NSS feature can
be removed.
Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If running on machines that do not provide topology information we
currently generate a "fake" topology which defines the maximum
distance between each cpu: each cpu will be put into an own drawer.
Historically this used to be the best option for (virtual) machines in
overcommited hypervisors.
For some workloads however it is better to generate a different
topology where all cpus are siblings within a package (all cpus are
core siblings). This shows performance improvements of up to 10%,
depending on the workload.
In order to keep the current behaviour, but also allow to switch to
the different core sibling topology use the existing "topology="
kernel parameter:
Specifying "topology=on" on machines without topology information will
generate the core siblings (fake) topology information, instead of the
default topology information where all cpus have the maximum distance.
On machines which provide topology information specifying
"topology=on" does not have any effect.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|