summaryrefslogtreecommitdiffstats
path: root/arch/powerpc (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2019-11-2619-176/+295
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "ARM: - data abort report and injection - steal time support - GICv4 performance improvements - vgic ITS emulation fixes - simplify FWB handling - enable halt polling counters - make the emulated timer PREEMPT_RT compliant s390: - small fixes and cleanups - selftest improvements - yield improvements PPC: - add capability to tell userspace whether we can single-step the guest - improve the allocation of XIVE virtual processor IDs - rewrite interrupt synthesis code to deliver interrupts in virtual mode when appropriate. - minor cleanups and improvements. x86: - XSAVES support for AMD - more accurate report of nested guest TSC to the nested hypervisor - retpoline optimizations - support for nested 5-level page tables - PMU virtualization optimizations, and improved support for nested PMU virtualization - correct latching of INITs for nested virtualization - IOAPIC optimization - TSX_CTRL virtualization for more TAA happiness - improved allocation and flushing of SEV ASIDs - many bugfixes and cleanups" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits) kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraints KVM: x86: Grab KVM's srcu lock when setting nested state KVM: x86: Open code shared_msr_update() in its only caller KVM: Fix jump label out_free_* in kvm_init() KVM: x86: Remove a spurious export of a static function KVM: x86: create mmu/ subdirectory KVM: nVMX: Remove unnecessary TLB flushes on L1<->L2 switches when L1 use apic-access-page KVM: x86: remove set but not used variable 'called' KVM: nVMX: Do not mark vmcs02->apic_access_page as dirty when unpinning KVM: vmx: use MSR_IA32_TSX_CTRL to hard-disable TSX on guest that lack it KVM: vmx: implement MSR_IA32_TSX_CTRL disable RTM functionality KVM: x86: implement MSR_IA32_TSX_CTRL effect on CPUID KVM: x86: do not modify masked bits of shared MSRs KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one KVM: nVMX: Assume TLB entries of L1 and L2 are tagged differently if L0 use EPT KVM: x86: Unexport kvm_vcpu_reload_apic_access_page() KVM: nVMX: add CR4_LA57 bit to nested CR4_FIXED1 KVM: nVMX: Use semi-colon instead of comma for exit-handlers initialization ...
| * Merge tag 'kvm-ppc-next-5.5-2' of ↵Paolo Bonzini2019-11-251-15/+29
| |\ | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD Second KVM PPC update for 5.5 - Two fixes from Greg Kurz to fix memory leak bugs in the XIVE code.
| | * KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error pathGreg Kurz2019-11-211-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to check the host page size is big enough to accomodate the EQ. Let's do this before taking a reference on the EQ page to avoid a potential leak if the check fails. Cc: stable@vger.kernel.org # v5.2 Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new oneGreg Kurz2019-11-211-9/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The EQ page is allocated by the guest and then passed to the hypervisor with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page before handing it over to the HW. This reference is dropped either when the guest issues the H_INT_RESET hcall or when the KVM device is released. But, the guest can legitimately call H_INT_SET_QUEUE_CONFIG several times, either to reset the EQ (vCPU hot unplug) or to set a new EQ (guest reboot). In both cases the existing EQ page reference is leaked because we simply overwrite it in the XIVE queue structure without calling put_page(). This is especially visible when the guest memory is backed with huge pages: start a VM up to the guest userspace, either reboot it or unplug a vCPU, quit QEMU. The leak is observed by comparing the value of HugePages_Free in /proc/meminfo before and after the VM is run. Ideally we'd want the XIVE code to handle the EQ page de-allocation at the platform level. This isn't the case right now because the various XIVE drivers have different allocation needs. It could maybe worth introducing hooks for this purpose instead of exposing XIVE internals to the drivers, but this is certainly a huge work to be done later. In the meantime, for easier backport, fix both vCPU unplug and guest reboot leaks by introducing a wrapper around xive_native_configure_queue() that does the necessary cleanup. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Cc: stable@vger.kernel.org # v5.2 Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Tested-by: Lijun Pan <ljp@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | Merge branch 'kvm-tsx-ctrl' into HEADPaolo Bonzini2019-11-217-18/+70
| |\ \ | | | | | | | | | | | | | | | | Conflicts: arch/x86/kvm/vmx/vmx.c
| * \ \ Merge tag 'kvm-ppc-next-5.5-1' of ↵Paolo Bonzini2019-11-0118-159/+264
| |\ \ \ | | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD KVM PPC update for 5.5 * Add capability to tell userspace whether we can single-step the guest. * Improve the allocation of XIVE virtual processor IDs, to reduce the risk of running out of IDs when running many VMs on POWER9. * Rewrite interrupt synthesis code to deliver interrupts in virtual mode when appropriate. * Minor cleanups and improvements.
| | * | KVM: PPC: Book3S HV: Reject mflags=2 (LPCR[AIL]=2) ADDR_TRANS_MODE modeNicholas Piggin2019-10-221-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AIL=2 mode has no known users, so is not well tested or supported. Disallow guests from selecting this mode because it may become deprecated in future versions of the architecture. This policy decision is not left to QEMU because KVM support is required for AIL=2 (when injecting interrupts). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Book3S HV: Implement LPCR[AIL]=3 mode for injected interruptsNicholas Piggin2019-10-221-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvmppc_inject_interrupt does not implement LPCR[AIL]!=0 modes, which can result in the guest receiving interrupts as if LPCR[AIL]=0 contrary to the ISA. In practice, Linux guests cope with this deviation, but it should be fixed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Book3S HV: Reuse kvmppc_inject_interrupt for async guest deliveryNicholas Piggin2019-10-223-57/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This consolidates the HV interrupt delivery logic into one place. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Book3S: Replace reset_msr mmu op with inject_interrupt arch opNicholas Piggin2019-10-228-62/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | reset_msr sets the MSR for interrupt injection, but it's cleaner and more flexible to provide a single op to set both MSR and PC for the interrupt. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Book3S: Define and use SRR1_MSR_BITSNicholas Piggin2019-10-223-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Book3S HV: XIVE: Allow userspace to set the # of VPsGreg Kurz2019-10-223-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new attribute to both XIVE and XICS-on-XIVE KVM devices so that userspace can tell how many interrupt servers it needs. If a VM needs less than the current default of KVM_MAX_VCPUS (2048), we can allocate less VPs in OPAL. Combined with a core stride (VSMT) that matches the number of guest threads per core, this may substantially increases the number of VMs that can run concurrently with an in-kernel XIVE device. Since the legacy XIVE KVM device is exposed to userspace through the XICS KVM API, a new attribute group is added to it for this purpose. While here, fix the syntax of the existing KVM_DEV_XICS_GRP_SOURCES in the XICS documentation. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Book3S HV: XIVE: Make VP block size configurableGreg Kurz2019-10-223-25/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The XIVE VP is an internal structure which allow the XIVE interrupt controller to maintain the interrupt context state of vCPUs non dispatched on HW threads. When a guest is started, the XIVE KVM device allocates a block of XIVE VPs in OPAL, enough to accommodate the highest possible vCPU id KVM_MAX_VCPU_ID (16384) packed down to KVM_MAX_VCPUS (2048). With a guest's core stride of 8 and a threading mode of 1 (QEMU's default), a VM must run at least 256 vCPUs to actually need such a range of VPs. A POWER9 system has a limited XIVE VP space : 512k and KVM is currently wasting this HW resource with large VP allocations, especially since a typical VM likely runs with a lot less vCPUs. Make the size of the VP block configurable. Add an nr_servers field to the XIVE structure and a function to set it for this purpose. Split VP allocation out of the device create function. Since the VP block isn't used before the first vCPU connects to the XIVE KVM device, allocation is now performed by kvmppc_xive_connect_vcpu(). This gives the opportunity to set nr_servers in between: kvmppc_xive_create() / kvmppc_xive_native_create() . . kvmppc_xive_set_nr_servers() . . kvmppc_xive_connect_vcpu() / kvmppc_xive_native_connect_vcpu() The connect_vcpu() functions check that the vCPU id is below nr_servers and if it is the first vCPU they allocate the VP block. This is protected against a concurrent update of nr_servers by kvmppc_xive_set_nr_servers() with the xive->lock mutex. Also, the block is allocated once for the device lifetime: nr_servers should stay constant otherwise connect_vcpu() could generate a boggus VP id and likely crash OPAL. It is thus forbidden to update nr_servers once the block is allocated. If the VP allocation fail, return ENOSPC which seems more appropriate to report the depletion of system wide HW resource than ENOMEM or ENXIO. A VM using a stride of 8 and 1 thread per core with 32 vCPUs would hence only need 256 VPs instead of 2048. If the stride is set to match the number of threads per core, this goes further down to 32. This will be exposed to userspace by a subsequent patch. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Book3S HV: XIVE: Compute the VP id in a common helperGreg Kurz2019-10-223-18/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduce code duplication by consolidating the checking of vCPU ids and VP ids to a common helper used by both legacy and native XIVE KVM devices. And explain the magic with a comment. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Book3S HV: XIVE: Show VP id in debugfsGreg Kurz2019-10-222-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Print out the VP id of each connected vCPU, this allow to see: - the VP block base in which OPAL encodes information that may be useful when debugging - the packed vCPU id which may differ from the raw vCPU id if the latter is >= KVM_MAX_VCPUS (2048) Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Book3S HV: XIVE: Set kvm->arch.xive when VPs are allocatedGreg Kurz2019-10-222-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we cannot allocate the XIVE VPs in OPAL, the creation of a XIVE or XICS-on-XIVE device is aborted as expected, but we leave kvm->arch.xive set forever since the release method isn't called in this case. Any subsequent tentative to create a XIVE or XICS-on-XIVE for this VM will thus always fail (DoS). This is a problem for QEMU since it destroys and re-creates these devices when the VM is reset: the VM would be restricted to using the much slower emulated XIVE or XICS forever. As an alternative to adding rollback, do not assign kvm->arch.xive before making sure the XIVE VPs are allocated in OPAL. Cc: stable@vger.kernel.org # v5.2 Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method") Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: E500: Replace current->mm by kvm->mmLeonardo Bras2019-10-221-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Given that in kvm_create_vm() there is: kvm->mm = current->mm; And that on every kvm_*_ioctl we have: if (kvm->mm != current->mm) return -EIO; I see no reason to keep using current->mm instead of kvm->mm. By doing so, we would reduce the use of 'global' variables on code, relying more in the contents of kvm struct. Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Reduce calls to get current->mm by storing the value locallyLeonardo Bras2019-10-221-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reduces the number of calls to get_current() in order to get the value of current->mm by doing it once and storing the value, since it is not supposed to change inside the same process). Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| | * | KVM: PPC: Report single stepping capabilityFabiano Rosas2019-10-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request the next instruction to be single stepped via the KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure. This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order to inform userspace about the state of single stepping support. We currently don't have support for guest single stepping implemented in Book3S HV so the capability is only present for Book3S PR and BookE. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
| * | | KVM: Add separate helper for putting borrowed reference to kvmSean Christopherson2019-10-222-2/+2
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new helper, kvm_put_kvm_no_destroy(), to handle putting a borrowed reference[*] to the VM when installing a new file descriptor fails. KVM expects the refcount to remain valid in this case, as the in-progress ioctl() has an explicit reference to the VM. The primary motiviation for the helper is to document that the 'kvm' pointer is still valid after putting the borrowed reference, e.g. to document that doing mutex(&kvm->lock) immediately after putting a ref to kvm isn't broken. [*] When exposing a new object to userspace via a file descriptor, e.g. a new vcpu, KVM grabs a reference to itself (the VM) prior to making the object visible to userspace to avoid prematurely freeing the VM in the scenario where userspace immediately closes file descriptor. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | Merge tag 'arm64-upstream' of ↵Linus Torvalds2019-11-262-14/+15
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Apart from the arm64-specific bits (core arch and perf, new arm64 selftests), it touches the generic cow_user_page() (reviewed by Kirill) together with a macro for x86 to preserve the existing behaviour on this architecture. Summary: - On ARMv8 CPUs without hardware updates of the access flag, avoid failing cow_user_page() on PFN mappings if the pte is old. The patches introduce an arch_faults_on_old_pte() macro, defined as false on x86. When true, cow_user_page() makes the pte young before attempting __copy_from_user_inatomic(). - Covert the synchronous exception handling paths in arch/arm64/kernel/entry.S to C. - FTRACE_WITH_REGS support for arm64. - ZONE_DMA re-introduced on arm64 to support Raspberry Pi 4 - Several kselftest cases specific to arm64, together with a MAINTAINERS update for these files (moved to the ARM64 PORT entry). - Workaround for a Neoverse-N1 erratum where the CPU may fetch stale instructions under certain conditions. - Workaround for Cortex-A57 and A72 errata where the CPU may speculatively execute an AT instruction and associate a VMID with the wrong guest page tables (corrupting the TLB). - Perf updates for arm64: additional PMU topologies on HiSilicon platforms, support for CCN-512 interconnect, AXI ID filtering in the IMX8 DDR PMU, support for the CCPI2 uncore PMU in ThunderX2. - GICv3 optimisation to avoid a heavy barrier when accessing the ICC_PMR_EL1 register. - ELF HWCAP documentation updates and clean-up. - SMC calling convention conduit code clean-up. - KASLR diagnostics printed during boot - NVIDIA Carmel CPU added to the KPTI whitelist - Some arm64 mm clean-ups: use generic free_initrd_mem(), remove stale macro, simplify calculation in __create_pgd_mapping(), typos. - Kconfig clean-ups: CMDLINE_FORCE to depend on CMDLINE, choice for endinanness to help with allmodconfig" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits) arm64: Kconfig: add a choice for endianness kselftest: arm64: fix spelling mistake "contiguos" -> "contiguous" arm64: Kconfig: make CMDLINE_FORCE depend on CMDLINE MAINTAINERS: Add arm64 selftests to the ARM64 PORT entry arm64: kaslr: Check command line before looking for a seed arm64: kaslr: Announce KASLR status on boot kselftest: arm64: fake_sigreturn_misaligned_sp kselftest: arm64: fake_sigreturn_bad_size kselftest: arm64: fake_sigreturn_duplicated_fpsimd kselftest: arm64: fake_sigreturn_missing_fpsimd kselftest: arm64: fake_sigreturn_bad_size_for_magic0 kselftest: arm64: fake_sigreturn_bad_magic kselftest: arm64: add helper get_current_context kselftest: arm64: extend test_init functionalities kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht] kselftest: arm64: mangle_pstate_invalid_daif_bits kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils kselftest: arm64: extend toplevel skeleton Makefile drivers/perf: hisi: update the sccl_id/ccl_id for certain HiSilicon platform arm64: mm: reserve CMA and crashkernel in ZONE_DMA32 ...
| * | dma/direct: turn ARCH_ZONE_DMA_BITS into a variableNicolas Saenz Julienne2019-11-012-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | Some architectures, notably ARM, are interested in tweaking this depending on their runtime DMA addressing limitations. Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds2019-11-091-0/+13
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) BPF sample build fixes from Björn Töpel 2) Fix powerpc bpf tail call implementation, from Eric Dumazet. 3) DCCP leaks jiffies on the wire, fix also from Eric Dumazet. 4) Fix crash in ebtables when using dnat target, from Florian Westphal. 5) Fix port disable handling whne removing bcm_sf2 driver, from Florian Fainelli. 6) Fix kTLS sk_msg trim on fallback to copy mode, from Jakub Kicinski. 7) Various KCSAN fixes all over the networking, from Eric Dumazet. 8) Memory leaks in mlx5 driver, from Alex Vesker. 9) SMC interface refcounting fix, from Ursula Braun. 10) TSO descriptor handling fixes in stmmac driver, from Jose Abreu. 11) Add a TX lock to synchonize the kTLS TX path properly with crypto operations. From Jakub Kicinski. 12) Sock refcount during shutdown fix in vsock/virtio code, from Stefano Garzarella. 13) Infinite loop in Intel ice driver, from Colin Ian King. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (108 commits) ixgbe: need_wakeup flag might not be set for Tx i40e: need_wakeup flag might not be set for Tx igb/igc: use ktime accessors for skb->tstamp i40e: Fix for ethtool -m issue on X722 NIC iavf: initialize ITRN registers with correct values ice: fix potential infinite loop because loop counter being too small qede: fix NULL pointer deref in __qede_remove() net: fix data-race in neigh_event_send() vsock/virtio: fix sock refcnt holding during the shutdown net: ethernet: octeon_mgmt: Account for second possible VLAN header mac80211: fix station inactive_time shortly after boot net/fq_impl: Switch to kvmalloc() for memory allocation mac80211: fix ieee80211_txq_setup_flows() failure path ipv4: Fix table id reference in fib_sync_down_addr ipv6: fixes rt6_probe() and fib6_nh->last_probe init net: hns: Fix the stray netpoll locks causing deadlock in NAPI path net: usb: qmi_wwan: add support for DW5821e with eSIM support CDC-NCM: handle incomplete transfer of MTU nfc: netlink: fix double device reference drop NFC: st21nfca: fix double free ...
| * \ \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2019-11-061-0/+13
| |\ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2019-11-02 The following pull-request contains BPF updates for your *net* tree. We've added 6 non-merge commits during the last 6 day(s) which contain a total of 8 files changed, 35 insertions(+), 9 deletions(-). The main changes are: 1) Fix ppc BPF JIT's tail call implementation by performing a second pass to gather a stable JIT context before opcode emission, from Eric Dumazet. 2) Fix build of BPF samples sys_perf_event_open() usage to compiled out unavailable test_attr__{enabled,open} checks. Also fix potential overflows in bpf_map_{area_alloc,charge_init} on 32 bit archs, from Björn Töpel. 3) Fix narrow loads of bpf_sysctl context fields with offset > 0 on big endian archs like s390x and also improve the test coverage, from Ilya Leoshkevich. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | powerpc/bpf: Fix tail call implementationEric Dumazet2019-11-021-0/+13
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have seen many crashes on powerpc hosts while loading bpf programs. The problem here is that bpf_int_jit_compile() does a first pass to compute the program length. Then it allocates memory to store the generated program and calls bpf_jit_build_body() a second time (and a third time later) What I have observed is that the second bpf_jit_build_body() could end up using few more words than expected. If bpf_jit_binary_alloc() put the space for the program at the end of the allocated page, we then write on a non mapped memory. It appears that bpf_jit_emit_tail_call() calls bpf_jit_emit_common_epilogue() while ctx->seen might not be stable. Only after the second pass we can be sure ctx->seen wont be changed. Trying to avoid a second pass seems quite complex and probably not worth it. Fixes: ce0761419faef ("powerpc/bpf: Implement support for tail calls") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com> Cc: Sandipan Das <sandipan@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Song Liu <songliubraving@fb.com> Cc: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191101033444.143741-1-edumazet@google.com
* | | Merge tag 'powerpc-5.4-4' of ↵Linus Torvalds2019-11-026-18/+57
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Our recent cleanup of EEH led to an oops on bare metal machines when the cxl (CAPI) driver creates virtual devices for an attached FPGA accelerator. The "secure virtual machine" support we added in v5.4 had a bug if the kernel was relocated (moved during boot), in those cases the signature of the kernel text wouldn't verify and the Ultravisor would refuse to run the VM. A recent change to disable interrupts before calling arch_cpu_idle_dead() caused a WARN_ON() in our bare metal CPU offline code to always trigger. The KUAP (SMAP) support we added for 32-bit Book3S had a bug if the address range crossed a segment (256MB) boundary which could lead to spurious faults. Thanks to: Christophe Leroy, Frederic Barrat, Michael Anderson, Nicholas Piggin, Sam Bobroff, Thiago Jung Bauermann" * tag 'powerpc-5.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv: Fix CPU idle to be called with IRQs disabled powerpc/prom_init: Undo relocation before entering secure mode powerpc/powernv/eeh: Fix oops when probing cxl devices powerpc/32s: fix allow/prevent_user_access() when crossing segment boundaries.
| * | powerpc/powernv: Fix CPU idle to be called with IRQs disabledNicholas Piggin2019-10-291-16/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e78a7614f3876 ("idle: Prevent late-arriving interrupts from disrupting offline") changes arch_cpu_idle_dead to be called with interrupts disabled, which triggers the WARN in pnv_smp_cpu_kill_self. Fix this by fixing up irq_happened after hard disabling, rather than requiring there are no pending interrupts, similarly to what was done done until commit 2525db04d1cc5 ("powerpc/powernv: Simplify lazy IRQ handling in CPU offline"). Fixes: e78a7614f3876 ("idle: Prevent late-arriving interrupts from disrupting offline") Reported-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add unexpected_mask rather than checking for known bad values, change the WARN_ON() to a WARN_ON_ONCE()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191022115814.22456-1-npiggin@gmail.com
| * | powerpc/prom_init: Undo relocation before entering secure modeThiago Jung Bauermann2019-10-293-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ultravisor will do an integrity check of the kernel image but we relocated it so the check will fail. Restore the original image by relocating it back to the kernel virtual base address. This works because during build vmlinux is linked with an expected virtual runtime address of KERNELBASE. Fixes: 6a9c930bd775 ("powerpc/prom_init: Add the ESM call to prom_init") Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Tested-by: Michael Anderson <andmike@linux.ibm.com> [mpe: Add IS_ENABLED() to fix the CONFIG_RELOCATABLE=n build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190911163433.12822-1-bauerman@linux.ibm.com
| * | powerpc/powernv/eeh: Fix oops when probing cxl devicesFrederic Barrat2019-10-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent cleanup in the way EEH support is added to a device causes a kernel oops when the cxl driver probes a device and creates virtual devices discovered on the FPGA: BUG: Kernel NULL pointer dereference at 0x000000a0 Faulting instruction address: 0xc000000000048070 Oops: Kernel access of bad area, sig: 7 [#1] ... NIP eeh_add_device_late.part.9+0x50/0x1e0 LR eeh_add_device_late.part.9+0x3c/0x1e0 Call Trace: _dev_info+0x5c/0x6c (unreliable) pnv_pcibios_bus_add_device+0x60/0xb0 pcibios_bus_add_device+0x40/0x60 pci_bus_add_device+0x30/0x100 pci_bus_add_devices+0x64/0xd0 cxl_pci_vphb_add+0xe0/0x130 [cxl] cxl_probe+0x504/0x5b0 [cxl] local_pci_probe+0x6c/0x110 work_for_cpu_fn+0x38/0x60 The root cause is that those cxl virtual devices don't have a representation in the device tree and therefore no associated pci_dn structure. In eeh_add_device_late(), pdn is NULL, so edev is NULL and we oops. We never had explicit support for EEH for those virtual devices. Instead, EEH events are reported to the (real) pci device and handled by the cxl driver. Which can then forward to the virtual devices and handle dependencies. The fact that we try adding EEH support for the virtual devices is new and a side-effect of the recent cleanup. This patch fixes it by skipping adding EEH support on powernv for devices which don't have a pci_dn structure. The cxl driver doesn't create virtual devices on pseries so this patch doesn't fix it there intentionally. Fixes: b905f8cdca77 ("powerpc/eeh: EEH for pSeries hot plug") Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191016162833.22509-1-fbarrat@linux.ibm.com
| * | powerpc/32s: fix allow/prevent_user_access() when crossing segment boundaries.Christophe Leroy2019-10-161-0/+1
| |/ | | | | | | | | | | | | | | | | | | | | Make sure starting addr is aligned to segment boundary so that when incrementing the segment, the starting address of the new segment is below the end address. Otherwise the last segment might get missed. Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/067a1b09f15f421d40797c2d04c22d4049a1cee8.1571071875.git.christophe.leroy@c-s.fr
* / KVM: PPC: Book3S HV: XIVE: Ensure VP isn't already in useGreg Kurz2019-10-153-10/+32
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Connecting a vCPU to a XIVE KVM device means establishing a 1:1 association between a vCPU id and the offset (VP id) of a VP structure within a fixed size block of VPs. We currently try to enforce the 1:1 relationship by checking that a vCPU with the same id isn't already connected. This is good but unfortunately not enough because we don't map VP ids to raw vCPU ids but to packed vCPU ids, and the packing function kvmppc_pack_vcpu_id() isn't bijective by design. We got away with it because QEMU passes vCPU ids that fit well in the packing pattern. But nothing prevents userspace to come up with a forged vCPU id resulting in a packed id collision which causes the KVM device to associate two vCPUs to the same VP. This greatly confuses the irq layer and ultimately crashes the kernel, as shown below. Example: a guest with 1 guest thread per core, a core stride of 8 and 300 vCPUs has vCPU ids 0,8,16...2392. If QEMU is patched to inject at some point an invalid vCPU id 348, which is the packed version of itself and 2392, we get: genirq: Flags mismatch irq 199. 00010000 (kvm-2-2392) vs. 00010000 (kvm-2-348) CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38 Call Trace: [c000003f7f9937e0] [c000000000c0110c] dump_stack+0xb0/0xf4 (unreliable) [c000003f7f993820] [c0000000001cb480] __setup_irq+0xa70/0xad0 [c000003f7f9938d0] [c0000000001cb75c] request_threaded_irq+0x13c/0x260 [c000003f7f993940] [c00800000d44e7ac] kvmppc_xive_attach_escalation+0x104/0x270 [kvm] [c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm] [c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm] [c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm] [c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30 [c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120 [c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80 [c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68 xive-kvm: Failed to request escalation interrupt for queue 0 of VCPU 2392 ------------[ cut here ]------------ remove_proc_entry: removing non-empty directory 'irq/199', leaking at least 'kvm-2-348' WARNING: CPU: 24 PID: 88176 at /home/greg/Work/linux/kernel-kvm-ppc/fs/proc/generic.c:684 remove_proc_entry+0x1ec/0x200 Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm] CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38 NIP: c00000000053b0cc LR: c00000000053b0c8 CTR: c0000000000ba3b0 REGS: c000003f7f9934b0 TRAP: 0700 Not tainted (5.3.0-xive-nr-servers-5.3-gku+) MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 48228222 XER: 20040000 CFAR: c000000000131a50 IRQMASK: 0 GPR00: c00000000053b0c8 c000003f7f993740 c0000000015ec500 0000000000000057 GPR04: 0000000000000001 0000000000000000 000049fb98484262 0000000000001bcf GPR08: 0000000000000007 0000000000000007 0000000000000001 9000000000001033 GPR12: 0000000000008000 c000003ffffeb800 0000000000000000 000000012f4ce5a1 GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000 GPR20: 000000012f45d918 c000003f863758b0 c000003f86375870 0000000000000006 GPR24: c000003f86375a30 0000000000000007 c0002039373d9020 c0000000014c4a48 GPR28: 0000000000000001 c000003fe62a4f6b c00020394b2e9fab c000003fe62a4ec0 NIP [c00000000053b0cc] remove_proc_entry+0x1ec/0x200 LR [c00000000053b0c8] remove_proc_entry+0x1e8/0x200 Call Trace: [c000003f7f993740] [c00000000053b0c8] remove_proc_entry+0x1e8/0x200 (unreliable) [c000003f7f9937e0] [c0000000001d3654] unregister_irq_proc+0x114/0x150 [c000003f7f993880] [c0000000001c6284] free_desc+0x54/0xb0 [c000003f7f9938c0] [c0000000001c65ec] irq_free_descs+0xac/0x100 [c000003f7f993910] [c0000000001d1ff8] irq_dispose_mapping+0x68/0x80 [c000003f7f993940] [c00800000d44e8a4] kvmppc_xive_attach_escalation+0x1fc/0x270 [kvm] [c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm] [c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm] [c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm] [c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30 [c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120 [c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80 [c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68 Instruction dump: 2c230000 41820008 3923ff78 e8e900a0 3c82ff69 3c62ff8d 7fa6eb78 7fc5f378 3884f080 3863b948 4bbf6925 60000000 <0fe00000> 4bffff7c fba10088 4bbf6e41 ---[ end trace b925b67a74a1d8d1 ]--- BUG: Kernel NULL pointer dereference at 0x00000010 Faulting instruction address: 0xc00800000d44fc04 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm] CPU: 24 PID: 88176 Comm: qemu-system-ppc Tainted: G W 5.3.0-xive-nr-servers-5.3-gku+ #38 NIP: c00800000d44fc04 LR: c00800000d44fc00 CTR: c0000000001cd970 REGS: c000003f7f9938e0 TRAP: 0300 Tainted: G W (5.3.0-xive-nr-servers-5.3-gku+) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24228882 XER: 20040000 CFAR: c0000000001cd9ac DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0 GPR00: c00800000d44fc00 c000003f7f993b70 c00800000d468300 0000000000000000 GPR04: 00000000000000c7 0000000000000000 0000000000000000 c000003ffacd06d8 GPR08: 0000000000000000 c000003ffacd0738 0000000000000000 fffffffffffffffd GPR12: 0000000000000040 c000003ffffeb800 0000000000000000 000000012f4ce5a1 GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000 GPR20: 000000012f45d918 00007ffffe0d9a80 000000012f4f5df0 000000012ef8c9f8 GPR24: 0000000000000001 0000000000000000 c000003fe4501ed0 c000003f8b1d0000 GPR28: c0000033314689c0 c000003fe4501c00 c000003fe4501e70 c000003fe4501e90 NIP [c00800000d44fc04] kvmppc_xive_cleanup_vcpu+0xfc/0x210 [kvm] LR [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm] Call Trace: [c000003f7f993b70] [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm] (unreliable) [c000003f7f993bd0] [c00800000d450bd4] kvmppc_xive_release+0xdc/0x1b0 [kvm] [c000003f7f993c30] [c00800000d436a98] kvm_device_release+0xb0/0x110 [kvm] [c000003f7f993c70] [c00000000046730c] __fput+0xec/0x320 [c000003f7f993cd0] [c000000000164ae0] task_work_run+0x150/0x1c0 [c000003f7f993d30] [c000000000025034] do_notify_resume+0x304/0x440 [c000003f7f993e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74 Instruction dump: 3bff0008 7fbfd040 419e0054 847e0004 2fa30000 419effec e93d0000 8929203c 2f890000 419effb8 4800821d e8410018 <e9230010> e9490008 9b2a0039 7c0004ac ---[ end trace b925b67a74a1d8d2 ]--- Kernel panic - not syncing: Fatal exception This affects both XIVE and XICS-on-XIVE devices since the beginning. Check the VP id instead of the vCPU id when a new vCPU is connected. The allocation of the XIVE CPU structure in kvmppc_xive_connect_vcpu() is moved after the check to avoid the need for rollback. Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* spufs: fix a crash in spufs_create_root()Emmanuel Nicolet2019-10-111-0/+1
| | | | | | | | | | | The spu_fs_context was not set in fc->fs_private, this caused a crash when accessing ctx->mode in spufs_create_root(). Fixes: d2e0981c3b9a ("vfs: Convert spufs to use the new mount API") Signed-off-by: Emmanuel Nicolet <emmanuel.nicolet@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20191008141342.GA266797@gmail.com
* powerpc/kvm: Fix kvmppc_vcore->in_guest value in kvmhv_switch_to_hostJordan Niethe2019-10-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvmhv_switch_to_host() in arch/powerpc/kvm/book3s_hv_rmhandlers.S needs to set kvmppc_vcore->in_guest to 0 to signal secondary CPUs to continue. This happens after resetting the PCR. Before commit 13c7bb3c57dc ("powerpc/64s: Set reserved PCR bits"), r0 would always be 0 before it was stored to kvmppc_vcore->in_guest. However because of this change in the commit: /* Reset PCR */ ld r0, VCORE_PCR(r5) - cmpdi r0, 0 + LOAD_REG_IMMEDIATE(r6, PCR_MASK) + cmpld r0, r6 beq 18f - li r0, 0 - mtspr SPRN_PCR, r0 + mtspr SPRN_PCR, r6 18: /* Signal secondary CPUs to continue */ stb r0,VCORE_IN_GUEST(r5) We are no longer comparing r0 against 0 and loading it with 0 if it contains something else. Hence when we store r0 to kvmppc_vcore->in_guest, it might not be 0. This means that secondary CPUs will not be signalled to continue. Those CPUs get stuck and errors like the following are logged: KVM: CPU 1 seems to be stuck KVM: CPU 2 seems to be stuck KVM: CPU 3 seems to be stuck KVM: CPU 4 seems to be stuck KVM: CPU 5 seems to be stuck KVM: CPU 6 seems to be stuck KVM: CPU 7 seems to be stuck This can be reproduced with: $ for i in `seq 1 7` ; do chcpu -d $i ; done ; $ taskset -c 0 qemu-system-ppc64 -smp 8,threads=8 \ -M pseries,accel=kvm,kvm-type=HV -m 1G -nographic -vga none \ -kernel vmlinux -initrd initrd.cpio.xz Fix by making sure r0 is 0 before storing it to kvmppc_vcore->in_guest. Fixes: 13c7bb3c57dc ("powerpc/64s: Set reserved PCR bits") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191004025317.19340-1-jniethe5@gmail.com
* powerpc/pseries: Remove confusing warning message.Laurent Dufour2019-10-091-0/+3
| | | | | | | | | | | | | | | | | | | | | Since commit 1211ee61b4a8 ("powerpc/pseries: Read TLB Block Invalidate Characteristics"), a warning message is displayed when booting a guest on top of KVM: lpar: arch/powerpc/platforms/pseries/lpar.c pseries_lpar_read_hblkrm_characteristics Error calling get-system-parameter (0xfffffffd) This message is displayed because this hypervisor is not supporting the H_BLOCK_REMOVE hcall and thus is not exposing the corresponding feature. Reading the TLB Block Invalidate Characteristics should not be done if the feature is not exposed. Fixes: 1211ee61b4a8 ("powerpc/pseries: Read TLB Block Invalidate Characteristics") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191001132928.72555-1-ldufour@linux.ibm.com
* powerpc/64s/radix: Fix build failure with RADIX_MMU=nStephen Rothwell2019-10-091-0/+4
| | | | | | | | | | | | | | | | | | | | After merging the powerpc tree, today's linux-next build (powerpc64 allnoconfig) failed like this: arch/powerpc/mm/book3s64/pgtable.c:216:3: error: implicit declaration of function 'radix__flush_all_lpid_guest' radix__flush_all_lpid_guest() is only declared for CONFIG_PPC_RADIX_MMU which is not set for this build. Fix it by adding an empty version for the RADIX_MMU=n case, which should never be called. Fixes: 99161de3a283 ("powerpc/64s/radix: tidy up TLB flushing code") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> [mpe: Munge change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190930101342.36c1afa0@canb.auug.org.au
* Merge tag 'kbuild-fixes-v5.4' of ↵Linus Torvalds2019-10-051-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - remove unneeded ar-option and KBUILD_ARFLAGS - remove long-deprecated SUBDIRS - fix modpost to suppress false-positive warnings for UML builds - fix namespace.pl to handle relative paths to ${objtree}, ${srctree} - make setlocalversion work for /bin/sh - make header archive reproducible - fix some Makefiles and documents * tag 'kbuild-fixes-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kheaders: make headers archive reproducible kbuild: update compile-test header list for v5.4-rc2 kbuild: two minor updates for Documentation/kbuild/modules.rst scripts/setlocalversion: clear local variable to make it work for sh namespace: fix namespace.pl script to support relative paths video/logo: do not generate unneeded logo C files video/logo: remove unneeded *.o pattern from clean-files integrity: remove pointless subdir-$(CONFIG_...) integrity: remove unneeded, broken attempt to add -fshort-wchar modpost: fix static EXPORT_SYMBOL warnings for UML build kbuild: correct formatting of header in kbuild module docs kbuild: remove SUBDIRS support kbuild: remove ar-option and KBUILD_ARFLAGS
| * kbuild: remove ar-option and KBUILD_ARFLAGSMasahiro Yamada2019-10-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 40df759e2b9e ("kbuild: Fix build with binutils <= 2.19") introduced ar-option and KBUILD_ARFLAGS to deal with old binutils. According to Documentation/process/changes.rst, the current minimal supported version of binutils is 2.21 so you can assume the 'D' option is always supported. Not only GNU ar but also llvm-ar supports it. With the 'D' option hard-coded, there is no more user of ar-option or KBUILD_ARFLAGS. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2019-10-041-4/+4
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Paolo Bonzini: "ARM and x86 bugfixes of all kinds. The most visible one is that migrating a nested hypervisor has always been busted on Broadwell and newer processors, and that has finally been fixed" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) KVM: x86: omit "impossible" pmu MSRs from MSR list KVM: nVMX: Fix consistency check on injected exception error code KVM: x86: omit absent pmu MSRs from MSR list selftests: kvm: Fix libkvm build error kvm: vmx: Limit guest PMCs to those supported on the host kvm: x86, powerpc: do not allow clearing largepages debugfs entry KVM: selftests: x86: clarify what is reported on KVM_GET_MSRS failure KVM: VMX: Set VMENTER_L1D_FLUSH_NOT_REQUIRED if !X86_BUG_L1TF selftests: kvm: add test for dirty logging inside nested guests KVM: x86: fix nested guest live migration with PML KVM: x86: assign two bits to track SPTE kinds KVM: x86: Expose XSAVEERPTR to the guest kvm: x86: Enumerate support for CLZERO instruction kvm: x86: Use AMD CPUID semantics for AMD vCPUs kvm: x86: Improve emulation of CPUID leaves 0BH and 1FH KVM: X86: Fix userspace set invalid CR4 kvm: x86: Fix a spurious -E2BIG in __do_cpuid_func KVM: LAPIC: Loosen filter for adaptive tuning of lapic_timer_advance_ns KVM: arm/arm64: vgic: Use the appropriate TRACE_INCLUDE_PATH arm64: KVM: Kill hyp_alternate_select() ...
| * kvm: x86, powerpc: do not allow clearing largepages debugfs entryPaolo Bonzini2019-09-301-4/+4
| | | | | | | | | | | | | | | | | | | | | | The largepages debugfs entry is incremented/decremented as shadow pages are created or destroyed. Clearing it will result in an underflow, which is harmless to KVM but ugly (and could be misinterpreted by tools that use debugfs information), so make this particular statistic read-only. Cc: kvm-ppc@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | Merge tag 'libnvdimm-fixes-5.4-rc1' of ↵Linus Torvalds2019-09-294-9/+25
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm More libnvdimm updates from Dan Williams: - Complete the reworks to interoperate with powerpc dynamic huge page sizes - Fix a crash due to missed accounting for the powerpc 'struct page'-memmap mapping granularity - Fix badblock initialization for volatile (DRAM emulated) pmem ranges - Stop triggering request_key() notifications to userspace when NVDIMM-security is disabled / not present - Miscellaneous small fixups * tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm/region: Enable MAP_SYNC for volatile regions libnvdimm: prevent nvdimm from requesting key when security is disabled libnvdimm/region: Initialize bad block for volatile namespaces libnvdimm/nfit_test: Fix acpi_handle redefinition libnvdimm/altmap: Track namespace boundaries in altmap libnvdimm: Fix endian conversion issues  libnvdimm/dax: Pick the right alignment default when creating dax devices powerpc/book3s64: Export has_transparent_hugepage() related functions.
| * | libnvdimm/altmap: Track namespace boundaries in altmapAneesh Kumar K.V2019-09-241-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With PFN_MODE_PMEM namespace, the memmap area is allocated from the device area. Some architectures map the memmap area with large page size. On architectures like ppc64, 16MB page for memap mapping can map 262144 pfns. This maps a namespace size of 16G. When populating memmap region with 16MB page from the device area, make sure the allocated space is not used to map resources outside this namespace. Such usage of device area will prevent a namespace destroy. Add resource end pnf in altmap and use that to check if the memmap area allocation can map pfn outside the namespace. On ppc64 in such case we fallback to allocation from memory. This fix kernel crash reported below: [ 132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 devm_memremap_pages_release+0x2d8/0x2e0 [ 133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000 [ 133.464760] Faulting instruction address: 0xc00000000007580c [ 133.464766] Oops: Kernel access of bad area, sig: 11 [#1] [ 133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries ..... [ 133.464901] NIP [c00000000007580c] vmemmap_free+0x2ac/0x3d0 [ 133.464906] LR [c0000000000757f8] vmemmap_free+0x298/0x3d0 [ 133.464910] Call Trace: [ 133.464914] [c000007cbfd0f7b0] [c0000000000757f8] vmemmap_free+0x298/0x3d0 (unreliable) [ 133.464921] [c000007cbfd0f8d0] [c000000000370a44] section_deactivate+0x1a4/0x240 [ 133.464928] [c000007cbfd0f980] [c000000000386270] __remove_pages+0x3a0/0x590 [ 133.464935] [c000007cbfd0fa50] [c000000000074158] arch_remove_memory+0x88/0x160 [ 133.464942] [c000007cbfd0fae0] [c0000000003be8c0] devm_memremap_pages_release+0x150/0x2e0 [ 133.464949] [c000007cbfd0fb70] [c000000000738ea0] devm_action_release+0x30/0x50 [ 133.464955] [c000007cbfd0fb90] [c00000000073a5a4] release_nodes+0x344/0x400 [ 133.464961] [c000007cbfd0fc40] [c00000000073378c] device_release_driver_internal+0x15c/0x250 [ 133.464968] [c000007cbfd0fc80] [c00000000072fd14] unbind_store+0x104/0x110 [ 133.464973] [c000007cbfd0fcd0] [c00000000072ee24] drv_attr_store+0x44/0x70 [ 133.464981] [c000007cbfd0fcf0] [c0000000004a32bc] sysfs_kf_write+0x6c/0xa0 [ 133.464987] [c000007cbfd0fd10] [c0000000004a1dfc] kernfs_fop_write+0x17c/0x250 [ 133.464993] [c000007cbfd0fd60] [c0000000003c348c] __vfs_write+0x3c/0x70 [ 133.464999] [c000007cbfd0fd80] [c0000000003c75d0] vfs_write+0xd0/0x250 djbw: Aneesh notes that this crash can likely be triggered in any kernel that supports 'papr_scm', so flagging that commit for -stable consideration. Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Cc: <stable@vger.kernel.org> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Tested-by: Santosh Sivaraj <santosh@fossix.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Link: https://lore.kernel.org/r/20190910062826.10041-1-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * | powerpc/book3s64: Export has_transparent_hugepage() related functions.Aneesh Kumar K.V2019-09-243-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In later patch, we want to use hash_transparent_hugepage() in a kernel module. Export two related functions. Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20190924042440.27946-1-aneesh.kumar@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | Merge tag 'powerpc-5.4-2' of ↵Linus Torvalds2019-09-2823-92/+559
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "An assortment of fixes that were either missed by me, or didn't arrive quite in time for the first v5.4 pull. - Most notable is a fix for an issue with tlbie (broadcast TLB invalidation) on Power9, when using the Radix MMU. The tlbie can race with an mtpid (move to PID register, essentially MMU context switch) on another thread of the core, which can cause stores to continue to go to a page after it's unmapped. - A fix in our KVM code to add a missing barrier, the lack of which has been observed to cause missed IPIs and subsequently stuck CPUs in the host. - A change to the way we initialise PCR (Processor Compatibility Register) to make it forward compatible with future CPUs. - On some older PowerVM systems our H_BLOCK_REMOVE support could oops, fix it to detect such systems and fallback to the old invalidation method. - A fix for an oops seen on some machines when using KASAN on 32-bit. - A handful of other minor fixes, and two new selftests. Thanks to: Alistair Popple, Aneesh Kumar K.V, Christophe Leroy, Gustavo Romero, Joel Stanley, Jordan Niethe, Laurent Dufour, Michael Roth, Oliver O'Halloran" * tag 'powerpc-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/eeh: Fix eeh eeh_debugfs_break_device() with SRIOV devices powerpc/nvdimm: use H_SCM_QUERY hcall on H_OVERLAP error powerpc/nvdimm: Use HCALL error as the return value selftests/powerpc: Add test case for tlbie vs mtpidr ordering issue powerpc/mm: Fixup tlbie vs mtpidr/mtlpidr ordering issue on POWER9 powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flag powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisions powerpc/pseries: Call H_BLOCK_REMOVE when supported powerpc/pseries: Read TLB Block Invalidate Characteristics KVM: PPC: Book3S HV: use smp_mb() when setting/clearing host_ipi flag powerpc/mm: Fix an Oops in kasan_mmu_init() powerpc/mm: Add a helper to select PAGE_KERNEL_RO or PAGE_READONLY powerpc/64s: Set reserved PCR bits powerpc: Fix definition of PCR bits to work with old binutils powerpc/book3s64/radix: Remove WARN_ON in destroy_context() powerpc/tm: Add tm-poison test
| * | | powerpc/eeh: Fix eeh eeh_debugfs_break_device() with SRIOV devicesOliver O'Halloran2019-09-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | s/CONFIG_IOV/CONFIG_PCI_IOV/ Whoops. Fixes: bd6461cc7b3c ("powerpc/eeh: Add a eeh_dev_break debugfs interface") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Fixup the #endif comment as well] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190926122502.14826-1-oohall@gmail.com
| * | | powerpc/nvdimm: use H_SCM_QUERY hcall on H_OVERLAP errorAneesh Kumar K.V2019-09-251-8/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now we force an unbind of SCM memory at drcindex on H_OVERLAP error. This really slows down operations like kexec where we get the H_OVERLAP error because we don't go through a full hypervisor re init. H_OVERLAP error for a H_SCM_BIND_MEM hcall indicates that SCM memory at drc index is already bound. Since we don't specify a logical memory address for bind hcall, we can use the H_SCM_QUERY hcall to query the already bound logical address. Boot time difference with and without patch is: [ 5.583617] IOMMU table initialized, virtual merging enabled [ 5.603041] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Retrying bind after unbinding [ 301.514221] papr_scm ibm,persistent-memory:ibm,pmemory@44108001: Retrying bind after unbinding [ 340.057238] hv-24x7: read 1530 catalog entries, created 537 event attrs (0 failures), 275 descs after fix [ 5.101572] IOMMU table initialized, virtual merging enabled [ 5.116984] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Querying SCM details [ 5.117223] papr_scm ibm,persistent-memory:ibm,pmemory@44108001: Querying SCM details [ 5.120530] hv-24x7: read 1530 catalog entries, created 537 event attrs (0 failures), 275 descs Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190903123452.28620-2-aneesh.kumar@linux.ibm.com
| * | | powerpc/nvdimm: Use HCALL error as the return valueAneesh Kumar K.V2019-09-251-15/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This simplifies the error handling and also enable us to switch to H_SCM_QUERY hcall in a later patch on H_OVERLAP error. We also do some kernel print formatting fixup in this patch. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190903123452.28620-1-aneesh.kumar@linux.ibm.com
| * | | powerpc/mm: Fixup tlbie vs mtpidr/mtlpidr ordering issue on POWER9Aneesh Kumar K.V2019-09-245-22/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On POWER9, under some circumstances, a broadcast TLB invalidation will fail to invalidate the ERAT cache on some threads when there are parallel mtpidr/mtlpidr happening on other threads of the same core. This can cause stores to continue to go to a page after it's unmapped. The workaround is to force an ERAT flush using PID=0 or LPID=0 tlbie flush. This additional TLB flush will cause the ERAT cache invalidation. Since we are using PID=0 or LPID=0, we don't get filtered out by the TLB snoop filtering logic. We need to still follow this up with another tlbie to take care of store vs tlbie ordering issue explained in commit: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9"). The presence of ERAT cache implies we can still get new stores and they may miss store queue marking flush. Cc: stable@vger.kernel.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-3-aneesh.kumar@linux.ibm.com
| * | | powerpc/book3s64/radix: Rename CPU_FTR_P9_TLBIE_BUG feature flagAneesh Kumar K.V2019-09-245-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename the #define to indicate this is related to store vs tlbie ordering issue. In the next patch, we will be adding another feature flag that is used to handles ERAT flush vs tlbie ordering issue. Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com
| * | | powerpc/book3s64/mm: Don't do tlbie fixup for some hardware revisionsAneesh Kumar K.V2019-09-241-2/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The store ordering vs tlbie issue mentioned in commit a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") is fixed for Nimbus 2.3 and Cumulus 1.3 revisions. We don't need to apply the fixup if we are running on them We can only do this on PowerNV. On pseries guest with KVM we still don't support redoing the feature fixup after migration. So we should be enabling all the workarounds needed, because whe can possibly migrate between DD 2.3 and DD 2.2 Fixes: a5d4b5891c2f ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-1-aneesh.kumar@linux.ibm.com
| * | | powerpc/pseries: Call H_BLOCK_REMOVE when supportedLaurent Dufour2019-09-241-2/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Depending on the hardware and the hypervisor, the hcall H_BLOCK_REMOVE may not be able to process all the page sizes for a segment base page size, as reported by the TLB Invalidate Characteristics. For each pair of base segment page size and actual page size, this characteristic tells us the size of the block the hcall supports. In the case, the hcall is not supporting a pair of base segment page size, actual page size, it is returning H_PARAM which leads to a panic like this: kernel BUG at /home/srikar/work/linux.git/arch/powerpc/platforms/pseries/lpar.c:466! Oops: Exception in kernel mode, sig: 5 [#1] BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 28 PID: 583 Comm: modprobe Not tainted 5.2.0-master #5 NIP: c0000000000be8dc LR: c0000000000be880 CTR: 0000000000000000 REGS: c0000007e77fb130 TRAP: 0700 Not tainted (5.2.0-master) MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 42224824 XER: 20000000 CFAR: c0000000000be8fc IRQMASK: 0 GPR00: 0000000022224828 c0000007e77fb3c0 c000000001434d00 0000000000000005 GPR04: 9000000004fa8c00 0000000000000000 0000000000000003 0000000000000001 GPR08: c0000007e77fb450 0000000000000000 0000000000000001 ffffffffffffffff GPR12: c0000007e77fb450 c00000000edfcb80 0000cd7d3ea30000 c0000000016022b0 GPR16: 00000000000000b0 0000cd7d3ea30000 0000000000000001 c080001f04f00105 GPR20: 0000000000000003 0000000000000004 c000000fbeb05f58 c000000001602200 GPR24: 0000000000000000 0000000000000004 8800000000000000 c000000000c5d148 GPR28: c000000000000000 8000000000000000 a000000000000000 c0000007e77fb580 NIP [c0000000000be8dc] .call_block_remove+0x12c/0x220 LR [c0000000000be880] .call_block_remove+0xd0/0x220 Call Trace: 0xc000000fb8c00240 (unreliable) .pSeries_lpar_flush_hash_range+0x578/0x670 .flush_hash_range+0x44/0x100 .__flush_tlb_pending+0x3c/0xc0 .zap_pte_range+0x7ec/0x830 .unmap_page_range+0x3f4/0x540 .unmap_vmas+0x94/0x120 .exit_mmap+0xac/0x1f0 .mmput+0x9c/0x1f0 .do_exit+0x388/0xd60 .do_group_exit+0x54/0x100 .__se_sys_exit_group+0x14/0x20 system_call+0x5c/0x70 Instruction dump: 39400001 38a00000 4800003c 60000000 60420000 7fa9e800 38e00000 419e0014 7d29d278 7d290074 7929d182 69270001 <0b070000> 7d495378 394a0001 7fa93040 The call to H_BLOCK_REMOVE should only be made for the supported pair of base segment page size, actual page size and using the correct maximum block size. Due to the required complexity in do_block_remove() and call_block_remove(), and the fact that currently a block size of 8 is returned by the hypervisor, we are only supporting 8 size block to the H_BLOCK_REMOVE hcall. In order to identify this limitation easily in the code, a local define HBLKR_SUPPORTED_SIZE defining the currently supported block size, and a dedicated checking helper is_supported_hlbkr() are introduced. For regular pages and hugetlb, the assumption is made that the page size is equal to the base page size. For THP the page size is assumed to be 16M. Fixes: ba2dd8a26baa ("powerpc/pseries/mm: call H_BLOCK_REMOVE") Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190920130523.20441-3-ldufour@linux.ibm.com