diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-05-13 17:33:52 +0200 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-05-13 17:33:52 +0200 |
commit | d65e1a0f305ba3e7aabf6261a37bb871790d9f93 (patch) | |
tree | 4033039381e4a35eb76158e5b5130f05ae204139 /arch/s390/mm | |
parent | Linux 6.9 (diff) | |
parent | Revert "s390: Relocate vmlinux ELF data to virtual address space" (diff) | |
download | linux-d65e1a0f305ba3e7aabf6261a37bb871790d9f93.tar.xz linux-d65e1a0f305ba3e7aabf6261a37bb871790d9f93.zip |
Merge tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Store AP Query Configuration Information in a static buffer
- Rework the AP initialization and add missing cleanups to the error
path
- Swap IRQ and AP bus/device registration to avoid race conditions
- Export prot_virt_guest symbol
- Introduce AP configuration changes notifier interface to facilitate
modularization of the AP bus
- Add CONFIG_AP kernel configuration option to allow modularization of
the AP bus
- Rework CONFIG_ZCRYPT_DEBUG kernel configuration option description
and dependency and rename it to CONFIG_AP_DEBUG
- Convert sprintf() and snprintf() to sysfs_emit() in CIO code
- Adjust indentation of RELOCS command build step
- Make crypto performance counters upward compatible
- Convert make_page_secure() and gmap_make_secure() to use folio
- Rework channel-utilization-block (CUB) handling in preparation of
introducing additional CUBs
- Use attribute groups to simplify registration, removal and extension
of measurement-related channel-path sysfs attributes
- Add a per-channel-path binary "ext_measurement" sysfs attribute that
provides access to extended channel-path measurement data
- Export measurement data for all channel-measurement-groups (CMG), not
only for a specific ones. This enables support of new CMG data
formats in userspace without the need for kernel changes
- Add a per-channel-path sysfs attribute "speed_bps" that provides the
operating speed in bits per second or 0 if the operating speed is not
available
- The CIO tracepoint subchannel-type field "st" is incorrectly set to
the value of subchannel-enabled SCHIB "ena" field. Fix that
- Do not forcefully limit vmemmap starting address to MAX_PHYSMEM_BITS
- Consider the maximum physical address available to a DCSS segment
(512GB) when memory layout is set up
- Simplify the virtual memory layout setup by reducing the size of
identity mapping vs vmemmap overlap
- Swap vmalloc and Lowcore/Real Memory Copy areas in virtual memory.
This will allow to place the kernel image next to kernel modules
- Move everyting KASLR related from <asm/setup.h> to <asm/page.h>
- Put virtual memory layout information into a structure to improve
code generation
- Currently __kaslr_offset is the kernel offset in both physical and
virtual memory spaces. Uncouple these offsets to allow uncoupling of
the addresses spaces
- Currently the identity mapping base address is implicit and is always
set to zero. Make it explicit by putting into __identity_base
persistent boot variable and use it in proper context
- Introduce .amode31 section start and end macros AMODE31_START and
AMODE31_END
- Introduce OS_INFO entries that do not reference any data in memory,
but rather provide only values
- Store virtual memory layout in OS_INFO. It is read out by
makedumpfile, crash and other tools
- Store virtual memory layout in VMCORE_INFO. It is read out by crash
and other tools when /proc/kcore device is used
- Create additional PT_LOAD ELF program header that covers kernel image
only, so that vmcore tools could locate kernel text and data when
virtual and physical memory spaces are uncoupled
- Uncouple physical and virtual address spaces
- Map kernel at fixed location when KASLR mode is disabled. The
location is defined by CONFIG_KERNEL_IMAGE_BASE kernel configuration
value.
- Rework deployment of kernel image for both compressed and
uncompressed variants as defined by CONFIG_KERNEL_UNCOMPRESSED kernel
configuration value
- Move .vmlinux.relocs section in front of the compressed kernel. The
interim section rescue step is avoided as result
- Correct modules thunk offset calculation when branch target is more
than 2GB away
- Kernel modules contain their own set of expoline thunks. Now that the
kernel modules area is less than 4GB away from kernel expoline
thunks, make modules use kernel expolines. Also make EXPOLINE_EXTERN
the default if the compiler supports it
- userfaultfd can insert shared zeropages into processes running VMs,
but that is not allowed for s390. Fallback to allocating a fresh
zeroed anonymous folio and insert that instead
- Re-enable shared zeropages for non-PV and non-skeys KVM guests
- Rename hex2bitmap() to ap_hex2bitmap() and export it for external use
- Add ap_config sysfs attribute to provide the means for setting or
displaying adapters, domains and control domains assigned to a
vfio-ap mediated device in a single operation
- Make vfio_ap_mdev_link_queue() ignore duplicate link requests
- Add write support to ap_config sysfs attribute to allow atomic update
a vfio-ap mediated device state
- Document ap_config sysfs attribute
- Function os_info_old_init() is expected to be called only from a
regular kdump kernel. Enable it to be called from a stand-alone dump
kernel
- Address gcc -Warray-bounds warning and fix array size in struct
os_info
- s390 does not support SMBIOS, so drop unneeded CONFIG_DMI checks
- Use unwinder instead of __builtin_return_address() with ftrace to
prevent returning of undefined values
- Sections .hash and .gnu.hash are only created when CONFIG_PIE_BUILD
kernel is enabled. Drop these for the case CONFIG_PIE_BUILD is
disabled
- Compile kernel with -fPIC and link with -no-pie to allow kpatch
feature always succeed and drop the whole CONFIG_PIE_BUILD
option-enabled code
- Add missing virt_to_phys() converter for VSIE facility and crypto
control blocks
* tag 's390-6.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
Revert "s390: Relocate vmlinux ELF data to virtual address space"
KVM: s390: vsie: Use virt_to_phys for crypto control block
s390: Relocate vmlinux ELF data to virtual address space
s390: Compile kernel with -fPIC and link with -no-pie
s390: vmlinux.lds.S: Drop .hash and .gnu.hash for !CONFIG_PIE_BUILD
s390/ftrace: Use unwinder instead of __builtin_return_address()
s390/pci: Drop unneeded reference to CONFIG_DMI
s390/os_info: Fix array size in struct os_info
s390/os_info: Initialize old os_info in standalone dump kernel
docs: Update s390 vfio-ap doc for ap_config sysfs attribute
s390/vfio-ap: Add write support to sysfs attr ap_config
s390/vfio-ap: Ignore duplicate link requests in vfio_ap_mdev_link_queue
s390/vfio-ap: Add sysfs attr, ap_config, to export mdev state
s390/ap: Externalize AP bus specific bitmap reading function
s390/mm: Re-enable the shared zeropage for !PV and !skeys KVM guests
mm/userfaultfd: Do not place zeropages when zeropages are disallowed
s390/expoline: Make modules use kernel expolines
s390/nospec: Correct modules thunk offset calculation
s390/boot: Do not rescue .vmlinux.relocs section
s390/boot: Rework deployment of the kernel image
...
Diffstat (limited to 'arch/s390/mm')
-rw-r--r-- | arch/s390/mm/gmap.c | 165 | ||||
-rw-r--r-- | arch/s390/mm/vmem.c | 5 |
2 files changed, 129 insertions, 41 deletions
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index 12d22a7fa32f..474a25ca5c48 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -2550,41 +2550,6 @@ static inline void thp_split_mm(struct mm_struct *mm) #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ /* - * Remove all empty zero pages from the mapping for lazy refaulting - * - This must be called after mm->context.has_pgste is set, to avoid - * future creation of zero pages - * - This must be called after THP was disabled. - * - * mm contracts with s390, that even if mm were to remove a page table, - * racing with the loop below and so causing pte_offset_map_lock() to fail, - * it will never insert a page table containing empty zero pages once - * mm_forbids_zeropage(mm) i.e. mm->context.has_pgste is set. - */ -static int __zap_zero_pages(pmd_t *pmd, unsigned long start, - unsigned long end, struct mm_walk *walk) -{ - unsigned long addr; - - for (addr = start; addr != end; addr += PAGE_SIZE) { - pte_t *ptep; - spinlock_t *ptl; - - ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl); - if (!ptep) - break; - if (is_zero_pfn(pte_pfn(*ptep))) - ptep_xchg_direct(walk->mm, addr, ptep, __pte(_PAGE_INVALID)); - pte_unmap_unlock(ptep, ptl); - } - return 0; -} - -static const struct mm_walk_ops zap_zero_walk_ops = { - .pmd_entry = __zap_zero_pages, - .walk_lock = PGWALK_WRLOCK, -}; - -/* * switch on pgstes for its userspace process (for kvm) */ int s390_enable_sie(void) @@ -2601,22 +2566,142 @@ int s390_enable_sie(void) mm->context.has_pgste = 1; /* split thp mappings and disable thp for future mappings */ thp_split_mm(mm); - walk_page_range(mm, 0, TASK_SIZE, &zap_zero_walk_ops, NULL); mmap_write_unlock(mm); return 0; } EXPORT_SYMBOL_GPL(s390_enable_sie); -int gmap_mark_unmergeable(void) +static int find_zeropage_pte_entry(pte_t *pte, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + unsigned long *found_addr = walk->private; + + /* Return 1 of the page is a zeropage. */ + if (is_zero_pfn(pte_pfn(*pte))) { + /* + * Shared zeropage in e.g., a FS DAX mapping? We cannot do the + * right thing and likely don't care: FAULT_FLAG_UNSHARE + * currently only works in COW mappings, which is also where + * mm_forbids_zeropage() is checked. + */ + if (!is_cow_mapping(walk->vma->vm_flags)) + return -EFAULT; + + *found_addr = addr; + return 1; + } + return 0; +} + +static const struct mm_walk_ops find_zeropage_ops = { + .pte_entry = find_zeropage_pte_entry, + .walk_lock = PGWALK_WRLOCK, +}; + +/* + * Unshare all shared zeropages, replacing them by anonymous pages. Note that + * we cannot simply zap all shared zeropages, because this could later + * trigger unexpected userfaultfd missing events. + * + * This must be called after mm->context.allow_cow_sharing was + * set to 0, to avoid future mappings of shared zeropages. + * + * mm contracts with s390, that even if mm were to remove a page table, + * and racing with walk_page_range_vma() calling pte_offset_map_lock() + * would fail, it will never insert a page table containing empty zero + * pages once mm_forbids_zeropage(mm) i.e. + * mm->context.allow_cow_sharing is set to 0. + */ +static int __s390_unshare_zeropages(struct mm_struct *mm) +{ + struct vm_area_struct *vma; + VMA_ITERATOR(vmi, mm, 0); + unsigned long addr; + vm_fault_t fault; + int rc; + + for_each_vma(vmi, vma) { + /* + * We could only look at COW mappings, but it's more future + * proof to catch unexpected zeropages in other mappings and + * fail. + */ + if ((vma->vm_flags & VM_PFNMAP) || is_vm_hugetlb_page(vma)) + continue; + addr = vma->vm_start; + +retry: + rc = walk_page_range_vma(vma, addr, vma->vm_end, + &find_zeropage_ops, &addr); + if (rc < 0) + return rc; + else if (!rc) + continue; + + /* addr was updated by find_zeropage_pte_entry() */ + fault = handle_mm_fault(vma, addr, + FAULT_FLAG_UNSHARE | FAULT_FLAG_REMOTE, + NULL); + if (fault & VM_FAULT_OOM) + return -ENOMEM; + /* + * See break_ksm(): even after handle_mm_fault() returned 0, we + * must start the lookup from the current address, because + * handle_mm_fault() may back out if there's any difficulty. + * + * VM_FAULT_SIGBUS and VM_FAULT_SIGSEGV are unexpected but + * maybe they could trigger in the future on concurrent + * truncation. In that case, the shared zeropage would be gone + * and we can simply retry and make progress. + */ + cond_resched(); + goto retry; + } + + return 0; +} + +static int __s390_disable_cow_sharing(struct mm_struct *mm) { + int rc; + + if (!mm->context.allow_cow_sharing) + return 0; + + mm->context.allow_cow_sharing = 0; + + /* Replace all shared zeropages by anonymous pages. */ + rc = __s390_unshare_zeropages(mm); /* * Make sure to disable KSM (if enabled for the whole process or * individual VMAs). Note that nothing currently hinders user space * from re-enabling it. */ - return ksm_disable(current->mm); + if (!rc) + rc = ksm_disable(mm); + if (rc) + mm->context.allow_cow_sharing = 1; + return rc; +} + +/* + * Disable most COW-sharing of memory pages for the whole process: + * (1) Disable KSM and unmerge/unshare any KSM pages. + * (2) Disallow shared zeropages and unshare any zerpages that are mapped. + * + * Not that we currently don't bother with COW-shared pages that are shared + * with parent/child processes due to fork(). + */ +int s390_disable_cow_sharing(void) +{ + int rc; + + mmap_write_lock(current->mm); + rc = __s390_disable_cow_sharing(current->mm); + mmap_write_unlock(current->mm); + return rc; } -EXPORT_SYMBOL_GPL(gmap_mark_unmergeable); +EXPORT_SYMBOL_GPL(s390_disable_cow_sharing); /* * Enable storage key handling from now on and initialize the storage @@ -2685,7 +2770,7 @@ int s390_enable_skey(void) goto out_up; mm->context.uses_skeys = 1; - rc = gmap_mark_unmergeable(); + rc = __s390_disable_cow_sharing(mm); if (rc) { mm->context.uses_skeys = 0; goto out_up; diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c index 85cddf904cb2..41c714e21292 100644 --- a/arch/s390/mm/vmem.c +++ b/arch/s390/mm/vmem.c @@ -13,7 +13,9 @@ #include <linux/slab.h> #include <linux/sort.h> #include <asm/page-states.h> +#include <asm/abs_lowcore.h> #include <asm/cacheflush.h> +#include <asm/maccess.h> #include <asm/nospec-branch.h> #include <asm/ctlreg.h> #include <asm/pgalloc.h> @@ -21,6 +23,7 @@ #include <asm/tlbflush.h> #include <asm/sections.h> #include <asm/set_memory.h> +#include <asm/physmem_info.h> static DEFINE_MUTEX(vmem_mutex); @@ -436,7 +439,7 @@ static int modify_pagetable(unsigned long start, unsigned long end, bool add, if (WARN_ON_ONCE(!PAGE_ALIGNED(start | end))) return -EINVAL; /* Don't mess with any tables not fully in 1:1 mapping & vmemmap area */ - if (WARN_ON_ONCE(end > VMALLOC_START)) + if (WARN_ON_ONCE(end > __abs_lowcore)) return -EINVAL; for (addr = start; addr < end; addr = next) { next = pgd_addr_end(addr, end); |