summaryrefslogtreecommitdiffstats
path: root/arch/s390 (follow)
Commit message (Collapse)AuthorAgeFilesLines
* KVM: Don't actually set a request when evicting vCPUs for GFN cache invdSean Christopherson2022-04-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't actually set a request bit in vcpu->requests when making a request purely to force a vCPU to exit the guest. Logging a request but not actually consuming it would cause the vCPU to get stuck in an infinite loop during KVM_RUN because KVM would see the pending request and bail from VM-Enter to service the request. Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as nothing in KVM is wired up to set guest_uses_pa=true. But, it'd be all too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init() without implementing handling of the request, especially since getting test coverage of MMU notifier interaction with specific KVM features usually requires a directed test. Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus to evict_vcpus. The purpose of the request is to get vCPUs out of guest mode, it's supposed to _avoid_ waking vCPUs that are blocking. Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to what it wants to accomplish, and to genericize the name so that it can used for similar but unrelated scenarios, should they arise in the future. Add a comment and documentation to explain why the "no action" request exists. Add compile-time assertions to help detect improper usage. Use the inner assertless helper in the one s390 path that makes requests without a hardcoded request. Cc: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220223165302.3205276-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'kvm-s390-next-5.18-2' of ↵Paolo Bonzini2022-03-155-15/+100
|\ | | | | | | | | | | | | | | | | | | https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix, test and feature for 5.18 part 2 - memop selftest - fix SCK locking - adapter interruptions virtualization for secure guests
| * KVM: s390x: fix SCK lockingClaudio Imbrenda2022-03-143-6/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When handling the SCK instruction, the kvm lock is taken, even though the vcpu lock is already being held. The normal locking order is kvm lock first and then vcpu lock. This is can (and in some circumstances does) lead to deadlocks. The function kvm_s390_set_tod_clock is called both by the SCK handler and by some IOCTLs to set the clock. The IOCTLs will not hold the vcpu lock, so they can safely take the kvm lock. The SCK handler holds the vcpu lock, but will also somehow need to acquire the kvm lock without relinquishing the vcpu lock. The solution is to factor out the code to set the clock, and provide two wrappers. One is called like the original function and does the locking, the other is called kvm_s390_try_set_tod_clock and uses trylock to try to acquire the kvm lock. This new wrapper is then used in the SCK handler. If locking fails, -EAGAIN is returned, which is eventually propagated to userspace, thus also freeing the vcpu lock and allowing for forward progress. This is not the most efficient or elegant way to solve this issue, but the SCK instruction is deprecated and its performance is not critical. The goal of this patch is just to provide a simple but correct way to fix the bug. Fixes: 6a3f95a6b04c ("KVM: s390: Intercept SCK instruction") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220301143340.111129-1-imbrenda@linux.ibm.com Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
| * KVM: s390: pv: make use of ultravisor AIV supportMichael Mueller2022-02-254-9/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables the ultravisor adapter interruption vitualization support indicated by UV feature BIT_UV_FEAT_AIV. This allows ISC interruption injection directly into the GISA IPM for PV kvm guests. Hardware that does not support this feature will continue to use the UV interruption interception method to deliver ISC interruptions to PV kvm guests. For this purpose, the ECA_AIV bit for all guest cpus will be cleared and the GISA will be disabled during PV CPU setup. In addition a check in __inject_io() has been removed. That reduces the required instructions for interruption handling for PV and traditional kvm guests. Signed-off-by: Michael Mueller <mimu@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20220209152217.1793281-2-mimu@linux.ibm.com Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* | KVM: s390: Replace KVM_REQ_MMU_RELOAD usage with arch specific requestSean Christopherson2022-03-013-5/+7
|/ | | | | | | | | | | | | | | Add an arch request, KVM_REQ_REFRESH_GUEST_PREFIX, to deal with guest prefix changes instead of piggybacking KVM_REQ_MMU_RELOAD. This will allow for the removal of the generic KVM_REQ_MMU_RELOAD, which isn't actually used by generic KVM. No functional change intended. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220225182248.3812651-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: s390: Add missing vm MEM_OP size checkJanis Schoetterl-Glausch2022-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check that size is not zero, preventing the following warning: WARNING: CPU: 0 PID: 9692 at mm/vmalloc.c:3059 __vmalloc_node_range+0x528/0x648 Modules linked in: CPU: 0 PID: 9692 Comm: memop Not tainted 5.17.0-rc3-e4+ #80 Hardware name: IBM 8561 T01 701 (LPAR) Krnl PSW : 0704c00180000000 0000000082dc584c (__vmalloc_node_range+0x52c/0x648) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000083 ffffffffffffffff 0000000000000000 0000000000000001 0000038000000000 000003ff80000000 0000000000000cc0 000000008ebb8000 0000000087a8a700 000000004040aeb1 000003ffd9f7dec8 000000008ebb8000 000000009d9b8000 000000000102a1b4 00000380035afb68 00000380035afaa8 Krnl Code: 0000000082dc583e: d028a7f4ff80 trtr 2036(41,%r10),3968(%r15) 0000000082dc5844: af000000 mc 0,0 #0000000082dc5848: af000000 mc 0,0 >0000000082dc584c: a7d90000 lghi %r13,0 0000000082dc5850: b904002d lgr %r2,%r13 0000000082dc5854: eb6ff1080004 lmg %r6,%r15,264(%r15) 0000000082dc585a: 07fe bcr 15,%r14 0000000082dc585c: 47000700 bc 0,1792 Call Trace: [<0000000082dc584c>] __vmalloc_node_range+0x52c/0x648 [<0000000082dc5b62>] vmalloc+0x5a/0x68 [<000003ff8067f4ca>] kvm_arch_vm_ioctl+0x2da/0x2a30 [kvm] [<000003ff806705bc>] kvm_vm_ioctl+0x4ec/0x978 [kvm] [<0000000082e562fe>] __s390x_sys_ioctl+0xbe/0x100 [<000000008360a9bc>] __do_syscall+0x1d4/0x200 [<0000000083618bd2>] system_call+0x82/0xb0 Last Breaking-Event-Address: [<0000000082dc5348>] __vmalloc_node_range+0x28/0x648 Other than the warning, there is no ill effect from the missing check, the condition is detected by subsequent code and causes a return with ENOMEM. Fixes: ef11c9463ae0 (KVM: s390: Add vm IOCTL for key checked guest absolute memory access) Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220221163237.4122868-1-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* KVM: s390: Add capability for storage key extension of MEM_OP IOCTLJanis Schoetterl-Glausch2022-02-141-0/+1
| | | | | | | | | | | | Availability of the KVM_CAP_S390_MEM_OP_EXTENSION capability signals that: * The vcpu MEM_OP IOCTL supports storage key checking. * The vm MEM_OP IOCTL exists. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-9-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* KVM: s390: Rename existing vcpu memop functionsJanis Schoetterl-Glausch2022-02-141-9/+10
| | | | | | | | | | Makes the naming consistent, now that we also have a vm ioctl. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-8-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* KVM: s390: Add vm IOCTL for key checked guest absolute memory accessJanis Schoetterl-Glausch2022-02-143-0/+159
| | | | | | | | | | | | | | | | | | | Channel I/O honors storage keys and is performed on absolute memory. For I/O emulation user space therefore needs to be able to do key checked accesses. The vm IOCTL supports read/write accesses, as well as checking if an access would succeed. Unlike relying on KVM_S390_GET_SKEYS for key checking would, the vm IOCTL performs the check in lockstep with the read or write, by, ultimately, mapping the access to move instructions that support key protection checking with a supplied key. Fetch and storage protection override are not applicable to absolute accesses and so are not applied as they are when using the vcpu memop. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-7-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* KVM: s390: Add optional storage key checking to MEMOP IOCTLJanis Schoetterl-Glausch2022-02-141-10/+21
| | | | | | | | | | | | | | | | | User space needs a mechanism to perform key checked accesses when emulating instructions. The key can be passed as an additional argument. Having an additional argument is flexible, as user space can pass the guest PSW's key, in order to make an access the same way the CPU would, or pass another key if necessary. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-6-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* KVM: s390: handle_tprot: Honor storage keysJanis Schoetterl-Glausch2022-02-143-43/+35
| | | | | | | | | | | | | | | | | | Use the access key operand to check for key protection when translating guest addresses. Since the translation code checks for accessing exceptions/error hvas, we can remove the check here and simplify the control flow. Keep checking if the memory is read-only even if such memslots are currently not supported. handle_tprot was the last user of guest_translate_address, so remove it. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-4-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* KVM: s390: Honor storage keys when accessing guest memoryJanis Schoetterl-Glausch2022-02-146-31/+253
| | | | | | | | | | | | | | | | | | | | | Storage key checking had not been implemented for instructions emulated by KVM. Implement it by enhancing the functions used for guest access, in particular those making use of access_guest which has been renamed to access_guest_with_key. Accesses via access_guest_real should not be key checked. For actual accesses, key checking is done by copy_from/to_user_key (which internally uses MVCOS/MVCP/MVCS). In cases where accessibility is checked without an actual access, this is performed by getting the storage key and checking if the access key matches. In both cases, if applicable, storage and fetch protection override are honored. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-3-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* s390/uaccess: Add copy_from/to_user_key functionsJanis Schoetterl-Glausch2022-02-142-18/+85
| | | | | | | | | | | | | | | | | | | | | Add copy_from/to_user_key functions, which perform storage key checking. These functions can be used by KVM for emulating instructions that need to be key checked. These functions differ from their non _key counterparts in include/linux/uaccess.h only in the additional key argument and must be kept in sync with those. Since the existing uaccess implementation on s390 makes use of move instructions that support having an additional access key supplied, we can implement raw_copy_from/to_user_key by enhancing the existing implementation. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-2-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* KVM: s390: Return error on SIDA memop on normal guestJanis Schoetterl-Glausch2022-02-021-0/+2
| | | | | | | | | | | Refuse SIDA memops on guests which are not protected. For normal guests, the secure instruction data address designation, which determines the location we access, is not under control of KVM. Fixes: 19e122776886 (KVM: S390: protvirt: Introduce instruction data area bounce buffer) Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
* s390/hypfs: include z/VM guests with access control group setVasily Gorbik2022-01-261-2/+4
| | | | | | | | | | | | | | | | Currently if z/VM guest is allowed to retrieve hypervisor performance data globally for all guests (privilege class B) the query is formed in a way to include all guests but the group name is left empty. This leads to that z/VM guests which have access control group set not being included in the results (even local vm). Change the query group identifier from empty to "any" to retrieve information about all guests from any groups (or without a group set). Cc: stable@vger.kernel.org Fixes: 31cb4bd31a48 ("[S390] Hypervisor filesystem (s390_hypfs) for z/VM") Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
* s390: update defconfigsHeiko Carstens2022-01-243-17/+22
| | | | Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/module: test loading modules with a lot of relocationsIlya Leoshkevich2022-01-245-0/+116
| | | | | | | | | | Add a test in order to prevent regressions. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/module: fix loading modules with a lot of relocationsIlya Leoshkevich2022-01-241-19/+18
| | | | | | | | | | | | | | | | | If the size of the PLT entries generated by apply_rela() exceeds 64KiB, the first ones can no longer reach __jump_r1 with brc. Fix by using brcl. An alternative solution is to add a __jump_r1 copy after every 64KiB, however, the space savings are quite small and do not justify the additional complexity. Fixes: f19fbd5ed642 ("s390: introduce execute-trampolines for branches") Cc: stable@vger.kernel.org Reported-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/uaccess: fix compile errorHeiko Carstens2022-01-231-2/+2
| | | | | | | | | | | | | | | | Compiling with e.g MARCH=z900 results in compile errors: arch/s390/lib/uaccess.c: In function 'copy_from_user_mvcos': >> arch/s390/lib/uaccess.c:65:15: error: variable 'spec' has initializer but incomplete type 65 | union oac spec = { Therefore make definition of union oac visible for all MARCHs. Reported-by: kernel test robot <lkp@intel.com> Cc: Nico Boehr <nrb@linux.ibm.com> Cc: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Fixes: 012a224e1fa3 ("s390/uaccess: introduce bit field for OAC specifier") Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/nmi: handle vector validity failures for KVM guestsChristian Borntraeger2022-01-231-1/+8
| | | | | | | | | | | | | The machine check validity bit tells about the context. If a KVM guest was running the bit tells about the guest validity and the host state is not affected. As a guest can disable the guest validity this might result in unwanted host errors on machine checks. Cc: stable@vger.kernel.org Fixes: c929500d7a5a ("s390/nmi: s390: New low level handling for machine check happening in guest") Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* s390/nmi: handle guarded storage validity failures for KVM guestsChristian Borntraeger2022-01-231-4/+14
| | | | | | | | | | | | | | | | | | machine check validity bits reflect the state of the machine check. If a guest does not make use of guarded storage, the validity bit might be off. We can not use the host CR bit to decide if the validity bit must be on. So ignore "invalid" guarded storage controls for KVM guests in the host and rely on the machine check being forwarded to the guest. If no other errors happen from a host perspective everything is fine and no process must be killed and the host can continue to run. Cc: stable@vger.kernel.org Fixes: c929500d7a5a ("s390/nmi: s390: New low level handling for machine check happening in guest") Reported-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Tested-by: Carsten Otte <cotte@de.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linuxLinus Torvalds2022-01-233-3/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull bitmap updates from Yury Norov: - introduce for_each_set_bitrange() - use find_first_*_bit() instead of find_next_*_bit() where possible - unify for_each_bit() macros * tag 'bitmap-5.17-rc1' of git://github.com/norov/linux: vsprintf: rework bitmap_list_string lib: bitmap: add performance test for bitmap_print_to_pagebuf bitmap: unify find_bit operations mm/percpu: micro-optimize pcpu_is_populated() Replace for_each_*_bit_from() with for_each_*_bit() where appropriate find: micro-optimize for_each_{set,clear}_bit() include/linux: move for_each_bit() macros from bitops.h to find.h cpumask: replace cpumask_next_* with cpumask_first_* where appropriate tools: sync tools/bitmap with mother linux all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate cpumask: use find_first_and_bit() lib: add find_first_and_bit() arch: remove GENERIC_FIND_FIRST_BIT entirely include: move find.h from asm_generic to linux bitops: move find_bit_*_le functions from le.h to find.h bitops: protect find_first_{,zero}_bit properly
| * all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriateYury Norov2022-01-151-1/+1
| | | | | | | | | | | | | | | | | | find_first{,_zero}_bit is a more effective analogue of 'next' version if start == 0. This patch replaces 'next' with 'first' where things look trivial. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
| * arch: remove GENERIC_FIND_FIRST_BIT entirelyYury Norov2022-01-151-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 5.12 cycle we enabled GENERIC_FIND_FIRST_BIT config option for ARM64 and MIPS. It increased performance and shrunk .text size; and so far I didn't receive any negative feedback on the change. https://lore.kernel.org/linux-arch/20210225135700.1381396-1-yury.norov@gmail.com/ Now I think it's a good time to switch all architectures to use find_{first,last}_bit() unconditionally, and so remove corresponding config option. The patch does't introduce functioal changes for arc, arm, arm64, mips, m68k, s390 and x86, for other architectures I expect improvement both in performance and .text size. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Alexander Lobakin <alobakin@pm.me> (mips) Reviewed-by: Alexander Lobakin <alobakin@pm.me> (mips) Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Will Deacon <will@kernel.org> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
| * include: move find.h from asm_generic to linuxYury Norov2022-01-151-1/+0
| | | | | | | | | | | | | | | | | | | | | | find_bit API and bitmap API are closely related, but inclusion paths are different - include/asm-generic and include/linux, correspondingly. In the past it made a lot of troubles due to circular dependencies and/or undefined symbols. Fix this by moving find.h under include/linux. Signed-off-by: Yury Norov <yury.norov@gmail.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
* | mm: remove cleancacheChristoph Hellwig2022-01-222-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "remove Xen tmem leftovers". Since the removal of the Xen tmem driver in 2019, the cleancache hooks are entirely unused, as are large parts of frontswap. This series against linux-next (with the folio changes included) removes cleancaches, and cuts down frontswap to the bits actually used by zswap. This patch (of 13): The cleancache subsystem is unused since the removal of Xen tmem driver in commit 814bbf49dcd0 ("xen: remove tmem driver"). [akpm@linux-foundation.org: remove now-unreachable code] Link: https://lkml.kernel.org/r/20211224062246.1258487-1-hch@lst.de Link: https://lkml.kernel.org/r/20211224062246.1258487-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 's390-5.17-2' of ↵Linus Torvalds2022-01-216-56/+104
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: - add Sven Schnelle as reviewer for s390 code - make uaccess code more readable - change cpu measurement facility code to also support counter second version number 7, and add discard support for limited samples * tag 's390-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: add Sven Schnelle as reviewer s390/uaccess: introduce bit field for OAC specifier s390/cpumf: Support for CPU Measurement Sampling Facility LS bit s390/cpumf: Support for CPU Measurement Facility CSVN 7
| * | s390/uaccess: introduce bit field for OAC specifierNico Boehr2022-01-172-49/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we've used magic values to specify the OAC (operand-access control) for mvcos. Instead we introduce a bit field for it. When using a bit field, we cannot use an immediate value with K constraint anymore, since GCC older than 10 doesn't recognize the bit field union as a compile time constant. To make things work with older compilers, load the OAC value through a register. Bloat-o-meter reports a slight increase in kernel size with this change: Total: Before=15692135, After=15693015, chg +0.01% Signed-off-by: Nico Boehr <nrb@linux.ibm.com> Co-developed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220111100003.743116-1-scgl@linux.ibm.com Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | s390/cpumf: Support for CPU Measurement Sampling Facility LS bitThomas Richter2022-01-172-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for the CPU Measurement Sampling Facility limit sampling bit in the sampling device driver. Limited samples have no valueable information are not collected. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
| * | s390/cpumf: Support for CPU Measurement Facility CSVN 7Thomas Richter2022-01-172-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for the CPU Measurement Counter Facility second version number 7. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
* | | Merge tag 'kbuild-v5.17' of ↵Linus Torvalds2022-01-191-15/+13
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add new kconfig target 'make mod2noconfig', which will be useful to speed up the build and test iteration. - Raise the minimum supported version of LLVM to 11.0.0 - Refactor certs/Makefile - Change the format of include/config/auto.conf to stop double-quoting string type CONFIG options. - Fix ARCH=sh builds in dash - Separate compression macros for general purposes (cmd_bzip2 etc.) and the ones for decompressors (cmd_bzip2_with_size etc.) - Misc Makefile cleanups * tag 'kbuild-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits) kbuild: add cmd_file_size arch: decompressor: remove useless vmlinux.bin.all-y kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22} kbuild: drop $(size_append) from cmd_zstd sh: rename suffix-y to suffix_y doc: kbuild: fix default in `imply` table microblaze: use built-in function to get CPU_{MAJOR,MINOR,REV} certs: move scripts/extract-cert to certs/ kbuild: do not quote string values in include/config/auto.conf kbuild: do not include include/config/auto.conf from shell scripts certs: simplify $(srctree)/ handling and remove config_filename macro kbuild: stop using config_filename in scripts/Makefile.modsign certs: remove misleading comments about GCC PR certs: refactor file cleaning certs: remove unneeded -I$(srctree) option for system_certificates.o certs: unify duplicated cmd_extract_certs and improve the log certs: use $< and $@ to simplify the key generation rule kbuild: remove headers_check stub kbuild: move headers_check.pl to usr/include/ certs: use if_changed to re-generate the key when the key type is changed ...
| * | | arch: decompressor: remove useless vmlinux.bin.all-yMasahiro Yamada2022-01-131-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Presumably, arch/{parisc,s390,sh}/boot/compressed/Makefile copied arch/x86/boot/compressed/Makefile, but vmlinux.bin.all-y is useless here because it is the same as $(obj)/vmlinux.bin. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de>
| * | | kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22}Masahiro Yamada2022-01-131-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GZIP-compressed files end with 4 byte data that represents the size of the original input. The decompressors (the self-extracting kernel) exploit it to know the vmlinux size beforehand. To mimic the GZIP's trailer, Kbuild provides cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22}. Unfortunately these macros are used everywhere despite the appended size data is only useful for the decompressors. There is no guarantee that such hand-crafted trailers are safely ignored. In fact, the kernel refuses compressed initramdfs with the garbage data. That is why usr/Makefile overrides size_append to make it no-op. To limit the use of such broken compressed files, this commit renames the existing macros as follows: cmd_bzip2 --> cmd_bzip2_with_size cmd_lzma --> cmd_lzma_with_size cmd_lzo --> cmd_lzo_with_size cmd_lz4 --> cmd_lz4_with_size cmd_xzkern --> cmd_xzkern_with_size cmd_zstd22 --> cmd_zstd22_with_size To keep the decompressors working, I updated the following Makefiles accordingly: arch/arm/boot/compressed/Makefile arch/h8300/boot/compressed/Makefile arch/mips/boot/compressed/Makefile arch/parisc/boot/compressed/Makefile arch/s390/boot/compressed/Makefile arch/sh/boot/compressed/Makefile arch/x86/boot/compressed/Makefile I reused the current macro names for the normal usecases; they produce the compressed data in the proper format. I did not touch the following: arch/arc/boot/Makefile arch/arm64/boot/Makefile arch/csky/boot/Makefile arch/mips/boot/Makefile arch/riscv/boot/Makefile arch/sh/boot/Makefile kernel/Makefile This means those Makefiles will stop appending the size data. I dropped the 'override size_append' hack from usr/Makefile. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <n.schier@avm.de>
* | | | Merge branch 'signal-for-v5.17' of ↵Linus Torvalds2022-01-172-2/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull signal/exit/ptrace updates from Eric Biederman: "This set of changes deletes some dead code, makes a lot of cleanups which hopefully make the code easier to follow, and fixes bugs found along the way. The end-game which I have not yet reached yet is for fatal signals that generate coredumps to be short-circuit deliverable from complete_signal, for force_siginfo_to_task not to require changing userspace configured signal delivery state, and for the ptrace stops to always happen in locations where we can guarantee on all architectures that the all of the registers are saved and available on the stack. Removal of profile_task_ext, profile_munmap, and profile_handoff_task are the big successes for dead code removal this round. A bunch of small bug fixes are included, as most of the issues reported were small enough that they would not affect bisection so I simply added the fixes and did not fold the fixes into the changes they were fixing. There was a bug that broke coredumps piped to systemd-coredump. I dropped the change that caused that bug and replaced it entirely with something much more restrained. Unfortunately that required some rebasing. Some successes after this set of changes: There are few enough calls to do_exit to audit in a reasonable amount of time. The lifetime of struct kthread now matches the lifetime of struct task, and the pointer to struct kthread is no longer stored in set_child_tid. The flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is removed. Issues where task->exit_code was examined with signal->group_exit_code should been examined were fixed. There are several loosely related changes included because I am cleaning up and if I don't include them they will probably get lost. The original postings of these changes can be found at: https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org I trimmed back the last set of changes to only the obviously correct once. Simply because there was less time for review than I had hoped" * 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits) ptrace/m68k: Stop open coding ptrace_report_syscall ptrace: Remove unused regs argument from ptrace_report_syscall ptrace: Remove second setting of PT_SEIZED in ptrace_attach taskstats: Cleanup the use of task->exit_code exit: Use the correct exit_code in /proc/<pid>/stat exit: Fix the exit_code for wait_task_zombie exit: Coredumps reach do_group_exit exit: Remove profile_handoff_task exit: Remove profile_task_exit & profile_munmap signal: clean up kernel-doc comments signal: Remove the helper signal_group_exit signal: Rename group_exit_task group_exec_task coredump: Stop setting signal->group_exit_task signal: Remove SIGNAL_GROUP_COREDUMP signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process signal: Make coredump handling explicit in complete_signal signal: Have prepare_signal detect coredumps using signal->core_state signal: Have the oom killer detect coredumps using signal->core_state exit: Move force_uaccess back into do_exit exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit ...
| * | | | exit: Add and use make_task_dead.Eric W. Biederman2021-12-132-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two big uses of do_exit. The first is it's design use to be the guts of the exit(2) system call. The second use is to terminate a task after something catastrophic has happened like a NULL pointer in kernel code. Add a function make_task_dead that is initialy exactly the same as do_exit to cover the cases where do_exit is called to handle catastrophic failure. In time this can probably be reduced to just a light wrapper around do_task_dead. For now keep it exactly the same so that there will be no behavioral differences introducing this new concept. Replace all of the uses of do_exit that use it for catastraphic task cleanup with make_task_dead to make it clear what the code is doing. As part of this rename rewind_stack_do_exit rewind_stack_and_make_dead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * | | | exit/s390: Remove dead reference to do_exit from copy_threadEric W. Biederman2021-12-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My s390 assembly is not particularly good so I have read the history of the reference to do_exit copy_thread and have been able to verify that do_exit is not used. The general argument is that s390 has been changed to use the generic kernel_thread and kernel_execve and the generic versions do not call do_exit. So it is strange to see a do_exit reference sitting there. The history of the do_exit reference in s390's version of copy_thread seems conclusive that the do_exit reference is something that lingers and should have been removed several years ago. Up through 8d19f15a60be ("[PATCH] s390 update (1/27): arch.") the s390 code made a call to the exit(2) system call when a kernel thread finished. Then kernel_thread_starter was added which branched directly to the value in register 11 when the kernel thread finshed. The value in register 11 was set in kernel_thread to "regs.gprs[11] = (unsigned long) do_exit" In commit 37fe5d41f640 ("s390: fold kernel_thread_helper() into ret_from_fork()") kernel_thread_starter was moved into entry.S and entry64.S unchanged (except for the syntax differences between inline assemly and in the assembly file). In commit f9a7e025dfc2 ("s390: switch to generic kernel_thread()") the assignment to "gprs[11]" was moved into copy_thread from the old kernel_thread. The helper kernel_thread_starter was still being used and was still branching to "%r11" at the end. In commit 30dcb0996e40 ("s390: switch to saner kernel_execve() semantics") kernel_thread_starter was changed to unconditionally branch to sysc_tracenogo instead to %r11 which held the value of do_exit. Unfortunately copy_thread was not updated to stop passing do_exit in "gprs[11]". In commit 56e62a737028 ("s390: convert to generic entry") kernel_thread_starter was replaced by __ret_from_fork. And the code still continued to pass do_exit in "gprs[11]" despite __ret_from_fork not caring in the slightest. Remove this dead reference to do_exit to make it clear that s390 is not doing anything with do_exit in copy_thread. Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: 30dcb0996e40 ("s390: switch to saner kernel_execve() semantics") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2022-01-1610-196/+231
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull kvm updates from Paolo Bonzini: "RISCV: - Use common KVM implementation of MMU memory caches - SBI v0.2 support for Guest - Initial KVM selftests support - Fix to avoid spurious virtual interrupts after clearing hideleg CSR - Update email address for Anup and Atish ARM: - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update s390: - fix sigp sense/start/stop/inconsistency - cleanups x86: - Clean up some function prototypes more - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery - completely remove potential TOC/TOU races in nested SVM consistency checks - update some PMCs on emulated instructions - Intel AMX support (joint work between Thomas and Intel) - large MMU cleanups - module parameter to disable PMU virtualization - cleanup register cache - first part of halt handling cleanups - Hyper-V enlightened MSR bitmap support for nested hypervisors Generic: - clean up Makefiles - introduce CONFIG_HAVE_KVM_DIRTY_RING - optimize memslot lookup using a tree - optimize vCPU array usage by converting to xarray" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits) x86/fpu: Fix inline prefix warnings selftest: kvm: Add amx selftest selftest: kvm: Move struct kvm_x86_state to header selftest: kvm: Reorder vcpu_load_state steps for AMX kvm: x86: Disable interception for IA32_XFD on demand x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state() kvm: selftests: Add support for KVM_CAP_XSAVE2 kvm: x86: Add support for getting/setting expanded xstate buffer x86/fpu: Add uabi_size to guest_fpu kvm: x86: Add CPUID support for Intel AMX kvm: x86: Add XCR0 support for Intel AMX kvm: x86: Disable RDMSR interception of IA32_XFD_ERR kvm: x86: Emulate IA32_XFD_ERR for guest kvm: x86: Intercept #NM for saving IA32_XFD_ERR x86/fpu: Prepare xfd_err in struct fpu_guest kvm: x86: Add emulation for IA32_XFD x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2 x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM x86/fpu: Add guest support to xfd_enable_feature() ...
| * \ \ \ \ Merge tag 'kvm-s390-next-5.17-1' of ↵Paolo Bonzini2021-12-216-85/+152
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fix and cleanup - fix sigp sense/start/stop/inconsistency - cleanups
| | * | | | | KVM: s390: Clarify SIGP orders versus STOP/RESTARTEric Farman2021-12-174-2/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With KVM_CAP_S390_USER_SIGP, there are only five Signal Processor orders (CONDITIONAL EMERGENCY SIGNAL, EMERGENCY SIGNAL, EXTERNAL CALL, SENSE, and SENSE RUNNING STATUS) which are intended for frequent use and thus are processed in-kernel. The remainder are sent to userspace with the KVM_CAP_S390_USER_SIGP capability. Of those, three orders (RESTART, STOP, and STOP AND STORE STATUS) have the potential to inject work back into the kernel, and thus are asynchronous. Let's look for those pending IRQs when processing one of the in-kernel SIGP orders, and return BUSY (CC2) if one is in process. This is in agreement with the Principles of Operation, which states that only one order can be "active" on a CPU at a time. Cc: stable@vger.kernel.org Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211213210550.856213-2-farman@linux.ibm.com [borntraeger@linux.ibm.com: add stable tag] Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
| | * | | | | s390: uv: Add offset comments to UV query struct and fix namingJanosch Frank2021-12-171-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes to the struct are easier to manage with offset comments so let's add some. And now that we know that the last struct member has the wrong name let's also fix this. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
| | * | | | | KVM: s390: gaccess: Cleanup access to guest pagesJanis Schoetterl-Glausch2021-12-171-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a helper function for guest frame access. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-Id: <20211126164549.7046-4-scgl@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
| | * | | | | KVM: s390: gaccess: Refactor access address range checkJanis Schoetterl-Glausch2021-12-171-53/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not round down the first address to the page boundary, just translate it normally, which gives the value we care about in the first place. Given this, translating a single address is just the special case of translating a range spanning a single page. Make the output optional, so the function can be used to just check a range. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-Id: <20211126164549.7046-3-scgl@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
| | * | | | | KVM: s390: gaccess: Refactor gpa and length calculationJanis Schoetterl-Glausch2021-12-171-15/+17
| | | |/ / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve readability by renaming the length variable and not calculating the offset manually. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-Id: <20211126164549.7046-2-scgl@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
| * | | | | KVM: s390: Use Makefile.kvm for common filesDavid Woodhouse2021-12-091-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Message-Id: <20211121125451.9489-4-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | Merge branch 'kvm-on-hv-msrbm-fix' into HEADPaolo Bonzini2021-12-085-8/+23
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge bugfix for enlightened MSR Bitmap, before adding support to KVM for exposing the feature to nested guests.
| * | | | | | KVM: Rename kvm_vcpu_block() => kvm_vcpu_halt()Sean Christopherson2021-12-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename kvm_vcpu_block() to kvm_vcpu_halt() in preparation for splitting the actual "block" sequences into a separate helper (to be named kvm_vcpu_block()). x86 will use the standalone block-only path to handle non-halt cases where the vCPU is not runnable. Rename block_ns to halt_ns to match the new function name. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-14-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: Drop obsolete kvm_arch_vcpu_block_finish()Sean Christopherson2021-12-082-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drop kvm_arch_vcpu_block_finish() now that all arch implementations are nops. No functional change intended. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: s390: Clear valid_wakeup in kvm_s390_handle_wait(), not in arch hookSean Christopherson2021-12-082-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the clearing of valid_wakeup from kvm_arch_vcpu_block_finish() so that a future patch can drop said arch hook. Unlike the other blocking- related arch hooks, vcpu_blocking/unblocking(), vcpu_block_finish() needs to be called even if the KVM doesn't actually block the vCPU. This will allow future patches to differentiate between truly blocking the vCPU and emulating a halt condition without introducing a contradiction. Alternatively, the hook could be renamed to kvm_arch_vcpu_halt_finish(), but there's literally one call site in s390, and future cleanup can also be done to handle valid_wakeup fully within kvm_s390_handle_wait() and allow generic KVM to drop vcpu_valid_wakeup(). No functional change intended. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-9-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: s390: Ensure kvm_arch_no_poll() is read once when blocking vCPUSean Christopherson2021-12-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wrap s390's halt_poll_max_steal with READ_ONCE and snapshot the result of kvm_arch_no_poll() in kvm_vcpu_block() to avoid a mostly-theoretical, largely benign bug on s390 where the result of kvm_arch_no_poll() could change due to userspace modifying halt_poll_max_steal while the vCPU is blocking. The bug is largely benign as it will either cause KVM to skip updating halt-polling times (no_poll toggles false=>true) or to update halt-polling times with a slightly flawed block_ns. Note, READ_ONCE is unnecessary in the current code, add it in case the arch hook is ever inlined, and to provide a hint that userspace can change the param at will. Fixes: 8b905d28ee17 ("KVM: s390: provide kvm_arch_no_poll function") Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20211009021236.4122790-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: Keep memslots in tree-based structures instead of array-based onesMaciej S. Szmigiero2021-12-082-15/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current memslot code uses a (reverse gfn-ordered) memslot array for keeping track of them. Because the memslot array that is currently in use cannot be modified every memslot management operation (create, delete, move, change flags) has to make a copy of the whole array so it has a scratch copy to work on. Strictly speaking, however, it is only necessary to make copy of the memslot that is being modified, copying all the memslots currently present is just a limitation of the array-based memslot implementation. Two memslot sets, however, are still needed so the VM continues to run on the currently active set while the requested operation is being performed on the second, currently inactive one. In order to have two memslot sets, but only one copy of actual memslots it is necessary to split out the memslot data from the memslot sets. The memslots themselves should be also kept independent of each other so they can be individually added or deleted. These two memslot sets should normally point to the same set of memslots. They can, however, be desynchronized when performing a memslot management operation by replacing the memslot to be modified by its copy. After the operation is complete, both memslot sets once again point to the same, common set of memslot data. This commit implements the aforementioned idea. For tracking of gfns an ordinary rbtree is used since memslots cannot overlap in the guest address space and so this data structure is sufficient for ensuring that lookups are done quickly. The "last used slot" mini-caches (both per-slot set one and per-vCPU one), that keep track of the last found-by-gfn memslot, are still present in the new code. Co-developed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <17c0cf3663b760a0d3753d4ac08c0753e941b811.1638817641.git.maciej.szmigiero@oracle.com>