summaryrefslogtreecommitdiffstats
path: root/arch/s390 (follow)
Commit message (Collapse)AuthorAgeFilesLines
* s390/pci: fix hot-plug of PCI function missing busNiklas Schnelle2020-11-031-0/+4
| | | | | | | | | | | | | | | | | | | | | Under some circumstances in particular with "Reconfigure I/O Path" a zPCI function may first appear in Standby through a PCI event with PEC 0x0302 which initially makes it visible to the zPCI subsystem, Only after that is it configured with a zPCI event with PEC 0x0301. If the zbus is still missing a PCI function zero (devfn == 0) when the PCI event 0x0301 is handled zdev->zbus->bus is still NULL and gets dereferenced in common code. Check for this case and enable but don't scan the zPCI function. This matches what would happen if we immediately got the 0x0301 configuration request or the function was included in CLP List PCI. In all cases the PCI functions with devfn != 0 will be scanned once function 0 appears. Fixes: 3047766bc6ec ("s390/pci: fix enabling a reserved PCI function") Cc: <stable@vger.kernel.org> # 5.8 Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Acked-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/smp: move rcu_cpu_starting() earlierQian Cai2020-11-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The call to rcu_cpu_starting() in smp_init_secondary() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows: WARNING: suspicious RCU usage ----------------------------- kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/1/0. Call Trace: show_stack+0x158/0x1f0 dump_stack+0x1f2/0x238 __lock_acquire+0x2640/0x4dd0 lock_acquire+0x3a8/0xd08 _raw_spin_lock_irqsave+0xc0/0xf0 clockevents_register_device+0xa8/0x528 init_cpu_timer+0x33e/0x468 smp_init_secondary+0x11a/0x328 smp_start_secondary+0x82/0x88 This is avoided by moving the call to rcu_cpu_starting up near the beginning of the smp_init_secondary() function. Note that the raw_smp_processor_id() is required in order to avoid calling into lockdep before RCU has declared the CPU to be watched for readers. Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/ Signed-off-by: Qian Cai <cai@redhat.com> Acked-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390: update defconfigsHeiko Carstens2020-11-033-9/+12
| | | | Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/vdso: remove unused constantsHeiko Carstens2020-11-031-8/+0
| | | | Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/vdso: remove empty unused fileHeiko Carstens2020-11-031-0/+0
| | | | Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/mm: make pmd/pud_deref() large page awareGerald Schaefer2020-11-031-22/+30
| | | | | | | | | | | | | | | | | | | | | | | | | pmd/pud_deref() assume that they will never operate on large pmd/pud entries, and therefore only use the non-large _xxx_ENTRY_ORIGIN mask. With commit 9ec8fa8dc331b ("s390/vmemmap: extend modify_pagetable() to handle vmemmap"), that assumption is no longer true, at least for pmd_deref(). In theory, we could end up with wrong addresses because some of the non-address bits of a large entry would not be masked out. In practice, this does not (yet) show any impact, because vmemmap_free() is currently never used for s390. Fix pmd/pud_deref() to check for the entry type and use the _xxx_ENTRY_ORIGIN_LARGE mask for large entries. While at it, also move pmd/pud_pfn() around, in order to avoid code duplication, because they do the same thing. Fixes: 9ec8fa8dc331b ("s390/vmemmap: extend modify_pagetable() to handle vmemmap") Cc: <stable@vger.kernel.org> # 5.9 Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390: correct __bootdata / __bootdata_preserved macrosVasily Gorbik2020-10-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently s390 build is broken. SECTCMP .boot.data error: section .boot.data differs between vmlinux and arch/s390/boot/compressed/vmlinux make[2]: *** [arch/s390/boot/section_cmp.boot.data] Error 1 SECTCMP .boot.preserved.data error: section .boot.preserved.data differs between vmlinux and arch/s390/boot/compressed/vmlinux make[2]: *** [arch/s390/boot/section_cmp.boot.preserved.data] Error 1 make[1]: *** [bzImage] Error 2 Commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") converted all __section(foo) to __section("foo"). This is wrong for __bootdata / __bootdata_preserved macros which want variable names to be a part of intermediate section names .boot.data.<var name> and .boot.preserved.data.<var name>. Those sections are later sorted by alignment + name and merged together into final .boot.data / .boot.preserved.data sections. Those sections must be identical in the decompressor and the decompressed kernel (that is checked during the build). Fixes: 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* treewide: Convert macro and uses of __section(foo) to __section("foo")Joe Perches2020-10-254-5/+5
| | | | | | | | | | | | | | | | | | | | Use a more generic form for __section that requires quotes to avoid complications with clang and gcc differences. Remove the quote operator # from compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo") even if the __attribute__ has multiple list entry forms. Conversion done using the script at: https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2020-10-232-0/+12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: "vhost, vdpa, and virtio cleanups and fixes A very quiet cycle, no new features" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: MAINTAINERS: add URL for virtio-mem vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call vringh: fix __vringh_iov() when riov and wiov are different vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK s390: virtio: PV needs VIRTIO I/O device protection virtio: let arch advertise guest's memory access restrictions vhost_vdpa: Fix duplicate included kernel.h vhost: reduce stack usage in log_used virtio-mem: Constify mem_id_table virtio_input: Constify id_table virtio-balloon: Constify id_table vdpa/mlx5: Fix failure to bring link up vdpa/mlx5: Make use of a specific 16 bit endianness API
| * s390: virtio: PV needs VIRTIO I/O device protectionPierre Morel2020-10-212-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If protected virtualization is active on s390, VIRTIO has only retricted access to the guest memory. Define CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS and export arch_has_restricted_virtio_memory_access to advertize VIRTIO if that's the case. Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Link: https://lore.kernel.org/r/1599728030-17085-3-git-send-email-pmorel@linux.ibm.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
* | Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-blockLinus Torvalds2020-10-231-1/+0
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull arch task_work cleanups from Jens Axboe: "Two cleanups that don't fit other categories: - Finally get the task_work_add() cleanup done properly, so we don't have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates all callers, and also fixes up the documentation for task_work_add(). - While working on some TIF related changes for 5.11, this TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch duplication for how that is handled" * tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block: task_work: cleanup notification modes tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
| * | tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()Jens Axboe2020-10-171-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | All the callers currently do this, clean it up and move the clearing into tracehook_notify_resume() instead. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | Merge tag 'kbuild-v5.10' of ↵Linus Torvalds2020-10-221-2/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Support 'make compile_commands.json' to generate the compilation database more easily, avoiding stale entries - Support 'make clang-analyzer' and 'make clang-tidy' for static checks using clang-tidy - Preprocess scripts/modules.lds.S to allow CONFIG options in the module linker script - Drop cc-option tests from compiler flags supported by our minimal GCC/Clang versions - Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y - Use sha1 build id for both BFD linker and LLD - Improve deb-pkg for reproducible builds and rootless builds - Remove stale, useless scripts/namespace.pl - Turn -Wreturn-type warning into error - Fix build error of deb-pkg when CONFIG_MODULES=n - Replace 'hostname' command with more portable 'uname -n' - Various Makefile cleanups * tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits) kbuild: Use uname for LINUX_COMPILE_HOST detection kbuild: Only add -fno-var-tracking-assignments for old GCC versions kbuild: remove leftover comment for filechk utility treewide: remove DISABLE_LTO kbuild: deb-pkg: clean up package name variables kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n kbuild: enforce -Werror=return-type scripts: remove namespace.pl builddeb: Add support for all required debian/rules targets builddeb: Enable rootless builds builddeb: Pass -n to gzip for reproducible packages kbuild: split the build log of kallsyms kbuild: explicitly specify the build id style scripts/setlocalversion: make git describe output more reliable kbuild: remove cc-option test of -Werror=date-time kbuild: remove cc-option test of -fno-stack-check kbuild: remove cc-option test of -fno-strict-overflow kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan kbuild: do not create built-in objects for external module builds ...
| * | | kbuild: explicitly specify the build id styleBill Wendling2020-10-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ld's --build-id defaults to "sha1" style, while lld defaults to "fast". The build IDs are very different between the two, which may confuse programs that reference them. Signed-off-by: Bill Wendling <morbo@google.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
| * | | arch: vdso: add vdso linker script to 'targets' instead of extra-yMasahiro Yamada2020-09-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The vdso linker script is preprocessed on demand. Adding it to 'targets' is enough to include the .cmd file. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Greentime Hu <green.hu@gmail.com>
* | | | Merge tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfioLinus Torvalds2020-10-223-3/+8
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull VFIO updates from Alex Williamson: - New fsl-mc vfio bus driver supporting userspace drivers of objects within NXP's DPAA2 architecture (Diana Craciun) - Support for exposing zPCI information on s390 (Matthew Rosato) - Fixes for "detached" VFs on s390 (Matthew Rosato) - Fixes for pin-pages and dma-rw accesses (Yan Zhao) - Cleanups and optimize vconfig regen (Zenghui Yu) - Fix duplicate irq-bypass token registration (Alex Williamson) * tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits) vfio iommu type1: Fix memory leak in vfio_iommu_type1_pin_pages vfio/pci: Clear token on bypass registration failure vfio/fsl-mc: fix the return of the uninitialized variable ret vfio/fsl-mc: Fix the dead code in vfio_fsl_mc_set_irq_trigger vfio/fsl-mc: Fixed vfio-fsl-mc driver compilation on 32 bit MAINTAINERS: Add entry for s390 vfio-pci vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO vfio/fsl-mc: Add support for device reset vfio/fsl-mc: Add read/write support for fsl-mc devices vfio/fsl-mc: trigger an interrupt via eventfd vfio/fsl-mc: Add irq infrastructure for fsl-mc devices vfio/fsl-mc: Added lock support in preparation for interrupt handling vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call vfio/fsl-mc: Implement VFIO_DEVICE_GET_INFO ioctl vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFO s390/pci: track whether util_str is valid in the zpci_dev s390/pci: stash version in the zpci_dev vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices ...
| | \ \ \
| | \ \ \
| *-. \ \ \ Merge branches 'v5.10/vfio/fsl-mc-v6' and 'v5.10/vfio/zpci-info-v3' into ↵Alex Williamson2020-10-122-1/+5
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | v5.10/vfio/next
| | | * | | | s390/pci: track whether util_str is valid in the zpci_devMatthew Rosato2020-10-072-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We'll need to keep track of whether or not the byte string in util_str is valid and thus needs to be passed to a vfio-pci passthrough device. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| | | * | | | s390/pci: stash version in the zpci_devMatthew Rosato2020-10-072-0/+2
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for passing the info on to vfio-pci devices, stash the supported PCI version for the target device in the zpci_dev. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Niklas Schnelle <schnelle@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * / / / / s390/pci: Mark all VFs as not implementing PCI_COMMAND_MEMORYMatthew Rosato2020-09-221-2/+3
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For s390 we can have VFs that are passed-through without the associated PF. Firmware provides an emulation layer to allow these devices to operate independently, but is missing emulation of the Memory Space Enable bit. For these as well as linked VFs, set no_command_memory which specifies these devices do not implement PCI_COMMAND_MEMORY. Fixes: abafbc551fdd ("vfio-pci: Invalidate mmaps and block MMIO access on disabled memory") Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | | | | Merge branch 'work.set_fs' of ↵Linus Torvalds2020-10-221-0/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull initial set_fs() removal from Al Viro: "Christoph's set_fs base series + fixups" * 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Allow a NULL pos pointer to __kernel_read fs: Allow a NULL pos pointer to __kernel_write powerpc: remove address space overrides using set_fs() powerpc: use non-set_fs based maccess routines x86: remove address space overrides using set_fs() x86: make TASK_SIZE_MAX usable from assembly code x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h lkdtm: remove set_fs-based tests test_bitmap: remove user bitmap tests uaccess: add infrastructure for kernel builds with set_fs() fs: don't allow splice read/write without explicit ops fs: don't allow kernel reads and writes without iter ops sysctl: Convert to iter interfaces proc: add a read_iter method to proc proc_ops proc: cleanup the compat vs no compat file ops proc: remove a level of indentation in proc_get_inode
| * | | | | uaccess: add infrastructure for kernel builds with set_fs()Christoph Hellwig2020-09-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a CONFIG_SET_FS option that is selected by architecturess that implement set_fs, which is all of them initially. If the option is not set stubs for routines related to overriding the address space are provided so that architectures can start to opt out of providing set_fs. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | mm/madvise: introduce process_madvise() syscall: an external memory hinting APIMinchan Kim2020-10-181-0/+1
| |_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is usecase that System Management Software(SMS) want to give a memory hint like MADV_[COLD|PAGEEOUT] to other processes and in the case of Android, it is the ActivityManagerService. The information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon(ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall process_madvise(2). It uses pidfd of an external process to give the hint. It also supports vector address range because Android app has thousands of vmas due to zygote so it's totally waste of CPU and power if we should call the syscall one by one for each vma.(With testing 2000-vma syscall vs 1-vector syscall, it showed 15% performance improvement. I think it would be bigger in real practice because the testing ran very cache friendly environment). Another potential use case for the vector range is to amortize the cost ofTLB shootdowns for multiple ranges when using MADV_DONTNEED; this could benefit users like TCP receive zerocopy and malloc implementations. In future, we could find more usecases for other advises so let's make it happens as API since we introduce a new syscall at this moment. With that, existing madvise(2) user could replace it with process_madvise(2) with their own pid if they want to have batch address ranges support feature. ince it could affect other process's address range, only privileged process(PTRACE_MODE_ATTACH_FSCREDS) or something else(e.g., being the same UID) gives it the right to ptrace the process could use it successfully. The flag argument is reserved for future use if we need to extend the API. I think supporting all hints madvise has/will supported/support to process_madvise is rather risky. Because we are not sure all hints make sense from external process and implementation for the hint may rely on the caller being in the current context so it could be error-prone. Thus, I just limited hints as MADV_[COLD|PAGEOUT] in this patch. If someone want to add other hints, we could hear the usecase and review it for each hint. It's safer for maintenance rather than introducing a buggy syscall but hard to fix it later. So finally, the API is as follows, ssize_t process_madvise(int pidfd, const struct iovec *iovec, unsigned long vlen, int advice, unsigned int flags); DESCRIPTION The process_madvise() system call is used to give advice or directions to the kernel about the address ranges from external process as well as local process. It provides the advice to address ranges of process described by iovec and vlen. The goal of such advice is to improve system or application performance. The pidfd selects the process referred to by the PID file descriptor specified in pidfd. (See pidofd_open(2) for further information) The pointer iovec points to an array of iovec structures, defined in <sys/uio.h> as: struct iovec { void *iov_base; /* starting address */ size_t iov_len; /* number of bytes to be advised */ }; The iovec describes address ranges beginning at address(iov_base) and with size length of bytes(iov_len). The vlen represents the number of elements in iovec. The advice is indicated in the advice argument, which is one of the following at this moment if the target process specified by pidfd is external. MADV_COLD MADV_PAGEOUT Permission to provide a hint to external process is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). The process_madvise supports every advice madvise(2) has if target process is in same thread group with calling process so user could use process_madvise(2) to extend existing madvise(2) to support vector address ranges. RETURN VALUE On success, process_madvise() returns the number of bytes advised. This return value may be less than the total number of requested bytes, if an error occurred. The caller should check return value to determine whether a partial advice occurred. FAQ: Q.1 - Why does any external entity have better knowledge? Quote from Sandeep "For Android, every application (including the special SystemServer) are forked from Zygote. The reason of course is to share as many libraries and classes between the two as possible to benefit from the preloading during boot. After applications start, (almost) all of the APIs end up calling into this SystemServer process over IPC (binder) and back to the application. In a fully running system, the SystemServer monitors every single process periodically to calculate their PSS / RSS and also decides which process is "important" to the user for interactivity. So, because of how these processes start _and_ the fact that the SystemServer is looping to monitor each process, it does tend to *know* which address range of the application is not used / useful. Besides, we can never rely on applications to clean things up themselves. We've had the "hey app1, the system is low on memory, please trim your memory usage down" notifications for a long time[1]. They rely on applications honoring the broadcasts and very few do. So, if we want to avoid the inevitable killing of the application and restarting it, some way to be able to tell the OS about unimportant memory in these applications will be useful. - ssp Q.2 - How to guarantee the race(i.e., object validation) between when giving a hint from an external process and get the hint from the target process? process_madvise operates on the target process's address space as it exists at the instant that process_madvise is called. If the space target process can run between the time the process_madvise process inspects the target process address space and the time that process_madvise is actually called, process_madvise may operate on memory regions that the calling process does not expect. It's the responsibility of the process calling process_madvise to close this race condition. For example, the calling process can suspend the target process with ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an opportunity to change its own address space before process_madvise is called. Another option is to operate on memory regions that the caller knows a priori will be unchanged in the target process. Yet another option is to accept the race for certain process_madvise calls after reasoning that mistargeting will do no harm. The suggested API itself does not provide synchronization. It also apply other APIs like move_pages, process_vm_write. The race isn't really a problem though. Why is it so wrong to require that callers do their own synchronization in some manner? Nobody objects to write(2) merely because it's possible for two processes to open the same file and clobber each other's writes --- instead, we tell people to use flock or something. Think about mmap. It never guarantees newly allocated address space is still valid when the user tries to access it because other threads could unmap the memory right before. That's where we need synchronization by using other API or design from userside. It shouldn't be part of API itself. If someone needs more fine-grained synchronization rather than process level, there were two ideas suggested - cookie[2] and anon-fd[3]. Both are applicable via using last reserved argument of the API but I don't think it's necessary right now since we have already ways to prevent the race so don't want to add additional complexity with more fine-grained optimization model. To make the API extend, it reserved an unsigned long as last argument so we could support it in future if someone really needs it. Q.3 - Why doesn't ptrace work? Injecting an madvise in the target process using ptrace would not work for us because such injected madvise would have to be executed by the target process, which means that process would have to be runnable and that creates the risk of the abovementioned race and hinting a wrong VMA. Furthermore, we want to act the hint in caller's context, not the callee's, because the callee is usually limited in cpuset/cgroups or even freezed state so they can't act by themselves quick enough, which causes more thrashing/kill. It doesn't work if the target process are ptraced(e.g., strace, debugger, minidump) because a process can have at most one ptracer. [1] https://developer.android.com/topic/performance/memory" [2] process_getinfo for getting the cookie which is updated whenever vma of process address layout are changed - Daniel Colascione - https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224 [3] anonymous fd which is used for the object(i.e., address range) validation - Michal Hocko - https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/ [minchan@kernel.org: fix process_madvise build break for arm64] Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com [minchan@kernel.org: fix build error for mips of process_madvise] Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com [akpm@linux-foundation.org: fix patch ordering issue] [akpm@linux-foundation.org: fix arm64 whoops] [minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian] [akpm@linux-foundation.org: fix i386 build] [sfr@canb.auug.org.au: fix syscall numbering] Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au [sfr@canb.auug.org.au: madvise.c needs compat.h] Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au [minchan@kernel.org: fix mips build] Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com [yuehaibing@huawei.com: remove duplicate header which is included twice] Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com [minchan@kernel.org: do not use helper functions for process_madvise] Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com [akpm@linux-foundation.org: pidfd_get_pid() gained an argument] [sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"] Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Christian Brauner <christian@brauner.io> Cc: Daniel Colascione <dancol@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: <linux-man@vger.kernel.org> Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 's390-5.10-1' of ↵Linus Torvalds2020-10-1687-1202/+1745
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Remove address space overrides using set_fs() - Convert to generic vDSO - Convert to generic page table dumper - Add ARCH_HAS_DEBUG_WX support - Add leap seconds handling support - Add NVMe firmware-assisted kernel dump support - Extend NVMe boot support with memory clearing control and addition of kernel parameters - AP bus and zcrypt api code rework. Add adapter configure/deconfigure interface. Extend debug features. Add failure injection support - Add ECC secure private keys support - Add KASan support for running protected virtualization host with 4-level paging - Utilize destroy page ultravisor call to speed up secure guests shutdown - Implement ioremap_wc() and ioremap_prot() with MIO in PCI code - Various checksum improvements - Other small various fixes and improvements all over the code * tag 's390-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (85 commits) s390/uaccess: fix indentation s390/uaccess: add default cases for __put_user_fn()/__get_user_fn() s390/zcrypt: fix wrong format specifications s390/kprobes: move insn_page to text segment s390/sie: fix typo in SIGP code description s390/lib: fix kernel doc for memcmp() s390/zcrypt: Introduce Failure Injection feature s390/zcrypt: move ap_msg param one level up the call chain s390/ap/zcrypt: revisit ap and zcrypt error handling s390/ap: Support AP card SCLP config and deconfig operations s390/sclp: Add support for SCLP AP adapter config/deconfig s390/ap: add card/queue deconfig state s390/ap: add error response code field for ap queue devices s390/ap: split ap queue state machine state from device state s390/zcrypt: New config switch CONFIG_ZCRYPT_DEBUG s390/zcrypt: introduce msg tracking in zcrypt functions s390/startup: correct early pgm check info formatting s390: remove orphaned extern variables declarations s390/kasan: make sure int handler always run with DAT on s390/ipl: add support to control memory clearing for nvme re-IPL ...
| * | | | | s390/uaccess: fix indentationHeiko Carstens2020-10-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/uaccess: add default cases for __put_user_fn()/__get_user_fn()Heiko Carstens2020-10-091-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add default cases for __put_user_fn()/__get_user_fn(). This doesn't fix anything since the functions are only called with sane values. However we get rid of smatch warnings: ./arch/s390/include/asm/uaccess.h:143 __get_user_fn() error: uninitialized symbol 'rc'. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/kprobes: move insn_page to text segmentHeiko Carstens2020-10-094-4/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the in-kernel kprobes insn page to text segment. Rationale: having that page in rw data segment is suboptimal, since as soon as a kprobe is set, this will split the 1:1 kernel mapping for a single page which get new permissions. Note: there is always at least one kprobe present for the kretprobe trampoline; so the mapping will always be split into smaller 4k mappings because of this. Moving the kprobes insn page into text segment makes sure that the page is mapped RO/X in any case, and avoids that the 1:1 mapping is split. The kprobe insn_page is defined as a dummy function which is filled with "br %r14" instructions. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/sie: fix typo in SIGP code descriptionJulian Wiedmann2020-10-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s/ait address/at address Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/lib: fix kernel doc for memcmp()Julian Wiedmann2020-10-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s/count/n Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/sclp: Add support for SCLP AP adapter config/deconfigHarald Freudenberger2020-10-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for AP bus adapter config and deconfig to the sclp core code. The code is statically build into the kernel when ZCRYPT is configured either as module or with static support. This is the base functionality for having configure/deconfigure support in the AP bus and card code. Another patch will exploit this soon. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Suggested-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/startup: correct early pgm check info formattingVasily Gorbik2020-10-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Early sclp console messages are printed in line mode on z/VM and LPAR, but under kvm newlines matter. Add a missing newline between "kernel version" and "Kernel fault". Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390: remove orphaned extern variables declarationsVasily Gorbik2020-10-022-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arch/s390/kernel/entry.h: suspend_zero_pages - only declaration left after commit 394216275c7d ("s390: remove broken hibernate / power management support") arch/s390/include/asm/setup.h: vmhalt_cmd - only declaration left after commit 99ca4e582d4a ("[S390] kernel: Shutdown Actions Interface") arch/s390/include/asm/setup.h: vmpoff_cmd - only declaration left after commit 99ca4e582d4a ("[S390] kernel: Shutdown Actions Interface") Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/kasan: make sure int handler always run with DAT onVasily Gorbik2020-10-021-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 998f5bbe3dbd ("s390/kasan: fix early pgm check handler execution") early pgm check handler is executed with DAT on if Kasan is enabled. Still there is a window between setup_lowcore_dat_off() and setup_lowcore_dat_on() when int handlers could be executed with DAT off under Kasan. If this happens the kernel ends up in pgm check loop due to Kasan shadow memory access attempts. With Kasan enabled paging is initialized much earlier and DAT flag has to be on at all times instrumented code is executed. Make sure int handlers are set up to be called with DAT on right away in this case. Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/ipl: add support to control memory clearing for nvme re-IPLGerald Schaefer2020-10-021-6/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-IPL for nvme is currently done by using diag 308 with the "Load Clear" subcode, which means that all memory will be cleared. This can increase re-IPL duration considerably on very large machines. For list-directed IPL like nvme or fcp IPL, a "Load Normal" subcode was introduced with z14. The "Load Normal" diag 308 subcode allows to re-IPL without clearing memory. This patch adds a new "clear" sysfs attribute to /sys/firmware/reipl/nvme, which can be set to either "0" or "1" to disable or enable re-IPL with memory clearing. The default value is "0", which disables memory clearing. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/nvme: support firmware-assisted dump to NVMe disksAlexander Egorenkov2020-10-025-19/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From the kernel perspective NVMe dump works exactly like zFCP dump. Therefore, adapt all places where code explicitly tests only for IPL of type FCP DUMP. And also set the memory end correctly in this case. Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/ipl: support NVMe IPL kernel parametersAlexander Egorenkov2020-10-021-8/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable extracting of extra kernel command-line parameters from the NVMe IPL block passed by the firmware to the kernel at boot. Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390: nvme dump supportJason J. Herne2020-10-022-1/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the nvme dump ipl type, associated data, and sysfs entries. This allows booting into a stand alone dump environment that resides on an nvme device. Signed-off-by: Jason J. Herne <jjherne@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390: remove orphaned function declarationsVasily Gorbik2020-09-308-14/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arch/s390/pci/pci_bus.h: zpci_bus_init - only declaration left after commit 05bc1be6db4b ("s390/pci: create zPCI bus") arch/s390/include/asm/gmap.h: gmap_pte_notify - only declaration left after commit 4be130a08420 ("s390/mm: add shadow gmap support") arch/s390/include/asm/pgalloc.h: rcu_table_freelist_finish - only declaration left after commit 36409f6353fc ("[S390] use generic RCU page-table freeing code") arch/s390/include/asm/tlbflush.h: smp_ptlb_all - only declaration left after commit 5a79859ae0f3 ("s390: remove 31 bit support") arch/s390/include/asm/vtimer.h: init_cpu_vtimer - only declaration left after commit b5f87f15e200 ("s390/idle: consolidate idle functions and definitions") arch/s390/include/asm/pci.h: zpci_debug_info - only declaration left after commit 386aa051fb4b ("s390/pci: remove per device debug attribute") arch/s390/include/asm/vdso.h: vdso_alloc_boot_cpu - only declaration left after commit 4bff8cb54502 ("s390: convert to GENERIC_VDSO") arch/s390/include/asm/smp.h: smp_vcpu_scheduled - only declaration left after commit 67626fadd269 ("s390: enforce CONFIG_SMP") arch/s390/kernel/entry.h: restart_call_handler - only declaration left after commit 8b646bd75908 ("[S390] rework smp code") arch/s390/kernel/entry.h: startup_init_nobss - only declaration left after commit 2e83e0eb85ca ("s390: clean .bss before running uncompressed kernel") arch/s390/kernel/entry.h: s390_early_resume - only declaration left after commit 394216275c7d ("s390: remove broken hibernate / power management support") drivers/s390/char/raw3270.h: raw3270_request_alloc_bootmem - only declaration left after commit 33403dcfcdfd ("[S390] 3270 console: convert from bootmem to slab") drivers/s390/cio/device.h: ccw_device_schedule_sch_unregister - only declaration left after commit 37de53bb5290 ("[S390] cio: introduce ccw device todos") drivers/s390/char/tape.h: tape_hotplug_event - has only declaration since recorded git history. drivers/s390/char/tape.h: tape_oper_handler - has only declaration since recorded git history. drivers/s390/char/tape.h: tape_noper_handler - has only declaration since recorded git history. drivers/s390/char/tape_std.h: tape_std_check_locate - only declaration left after commit 161beff8f40d ("s390/tape: remove tape block leftovers") drivers/s390/char/tape_std.h: tape_std_default_handler - has only declaration since recorded git history. drivers/s390/char/tape_std.h: tape_std_unexpect_uchk_handler - has only declaration since recorded git history. drivers/s390/char/tape_std.h: tape_std_irq - has only declaration since recorded git history. drivers/s390/char/tape_std.h: tape_std_error_recovery - has only declaration since recorded git history. drivers/s390/char/tape_std.h: tape_std_error_recovery_has_failed - has only declaration since recorded git history. drivers/s390/char/tape_std.h: tape_std_error_recovery_succeded - has only declaration since recorded git history. drivers/s390/char/tape_std.h: tape_std_error_recovery_do_retry - has only declaration since recorded git history. drivers/s390/char/tape_std.h: tape_std_error_recovery_read_opposite - has only declaration since recorded git history. drivers/s390/char/tape_std.h: tape_std_error_recovery_HWBUG - has only declaration since recorded git history. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/startup: add kaslr_offset to pgm check info printVasily Gorbik2020-09-301-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | startup pgm check handler is active since the very beginning of kernel code execution until uncompressed kernel sets up s390_base_pgm_handler. It is useful not just for the decompressor debugging itself, but also for early code of uncompressed kernel, in particular Kasan initialization. But since there is no stack trace or symbolic representation of failing psw address it is impossible to figure out faulty code location without knowing Kaslr kernel base. So, let's add it to the startup pgm check info printed as well. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/startup: correct "dfltcc" option parsingVasily Gorbik2020-09-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently if just "dfltcc" is passed as a kernel command line option "val" going to be NULL, this leads to reading at address 0 in strcmp(val, "off") Fix that by making sure "val" is not NULL. This does not affect option handling logic. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/vdso: remove orphaned declarationsVasily Gorbik2020-09-301-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove couple of declarations which are unused since commit 4bff8cb54502 ("s390: convert to GENERIC_VDSO"). Acked-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/cio: remove unused channel_subsystem_reinitVasily Gorbik2020-09-301-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added with commit 77e844b96440 ("s390/hibernate: add early resume function") unused since commit 394216275c7d ("s390: remove broken hibernate / power management support"). Reviewed-by: Vineeth Vijayan <vneethv@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390: remove cad commandline optionSven Schnelle2020-09-301-13/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | remove the cad command line option as the instruction was never published and never used by userspace. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/startup: avoid save_area_sync overflowVasily Gorbik2020-09-291-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we overflow save_area_sync and write over save_area_async. Although this is not a real problem make startup_pgm_check_handler consistent with late pgm check handler and store [%r0,%r7] directly into gpregs_save_area. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390: remove unused _swsusp_reset_dmaVasily Gorbik2020-09-294-21/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 394216275c7d ("s390: remove broken hibernate / power management support") _swsusp_reset_dma is unused and could be safely removed. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/kaslr: correct and explain randomization base generationVasily Gorbik2020-09-291-38/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there are several minor problems with randomization base generation code: 1. It might misbehave in low memory conditions. In particular there might be enough space for the kernel on [0, block_sum] but after if (base < safe_addr) base = safe_addr; it might not be enough anymore. 2. It does not correctly handle minimal address constraint. In condition if (base < safe_addr) base = safe_addr; a synthetic value is compared with an address. If we have a memory setup with memory holes due to offline memory regions, and safe_addr is close to the end of the first online memory block - we might position the kernel in invalid memory. 3. block_sum calculation logic contains off-by-one error. Let's say we have a memory block in which the kernel fits perfectly (end - start == kernel_size). In this case: if (end - start < kernel_size) continue; block_sum += end - start - kernel_size; block_sum is not increased, while it is a valid kernel position. So, address problems listed and explain algorithm used. Besides that restructuring the code makes it possible to extend kernel positioning algorithm further. Currently we pick position in between single [min, max] range (min = safe_addr, max = memory_limit). In future we can do that for multiple ranges as well (by calling count_valid_kernel_positions for each range). Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/kaslr: avoid mixing valid random value and an error codeVasily Gorbik2020-09-291-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 0 is a valid random value. To avoid mixing it with error code 0 as an return code make get_random() take extra argument to output random value and return an error code. Reviewed-by: Philipp Rudo <prudo@linux.ibm.com> Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/stp: unify stp_work_mutex and clock_sync_mutexSven Schnelle2020-09-261-27/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to have two mutexes, and while at it rename it to stp_mutex. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/stp: add sysfs file to show scheduled leap secondsSven Schnelle2020-09-261-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces /sys/devices/system/stp/scheduled_leap_seconds, which will contain either 0,0 if no leap second is scheduled, or the UTC timestamp + leap second offset. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
| * | | | | s390/stp: add support for leap secondsSven Schnelle2020-09-263-10/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current implementation, leap seconds are only synchronized during the bootup process when the STP clock is synced. If the Leap second offset (LSO) changes the machine must be rebooted, which is not desired. This patch adds the required code to handle Leap second changes during runtime. If the Leap second changes, a Configuration change machine check is triggered. The STP code than schedules a Leap second insertion/deletion with do_adjtimex(). Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>