summaryrefslogtreecommitdiffstats
path: root/arch (follow)
Commit message (Collapse)AuthorAgeFilesLines
* powerpc/configs: Update for symbol movement onlyMichael Ellerman2017-08-2844-114/+114
| | | | | | | Update defconfigs for symbols that have moved around, without their value changing. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/oops: Line up NIP & MSR with other rowsMichael Ellerman2017-08-281-2/+2
| | | | | | | | | | | | | | | | | | | | | This is purely cosmetic, but does look nicer IMHO: Before: task: c000000001453400 task.stack: c000000001c6c000 NIP: c000000000a0fbfc LR: c000000000a0fbf4 CTR: c000000000ba6220 REGS: c0000001fffef820 TRAP: 0300 Not tainted (4.13.0-rc6-gcc-6.3.1-00234-g423af27f7d81) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 88088242 XER: 00000000 CFAR: c0000000000b3488 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0 After: task: c000000001453400 task.stack: c000000001c6c000 NIP: c000000000a0fbfc LR: c000000000a0fbf4 CTR: c000000000ba6220 REGS: c0000001fffef820 TRAP: 0300 Not tainted (4.13.0-rc6-gcc-6.3.1-00234-g423af27f7d81-dirty) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 88088242 XER: 00000000 CFAR: c0000000000b34a4 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 0 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/oops: Print CR/XER on same line as MSRMichael Ellerman2017-08-281-1/+1
| | | | | | | | | Somehow we missed this when the pr_cont() changes went in. Fix CR/XER to go on the same line as MSR, as they have historically, eg: MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI> CR: 4804408a XER: 20000000 Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/oops: Use IS_ENABLED() for oops markersMichael Ellerman2017-08-281-9/+10
| | | | | | Just because it looks less gross. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/oops: Print the kernel's endian in the oopsMichael Ellerman2017-08-281-0/+6
| | | | | | | | | Although the MSR tells you what endian you're in it's possible that isn't the same endian the kernel was built for, and if that happens you're usually having a very bad day. So print a marker to make it 100% clear which endian the kernel was built for. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/oops: Fix the oops markers to use pr_cont()Michael Ellerman2017-08-281-5/+5
| | | | | | | | When we oops we print a few markers for significant config options such as PREEMPT, SMP etc. Currently these appear on separate lines because we're not using pr_cont() properly. Fix it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Fix build error in opal-imc.c when NUMA=nLABBE Corentin2017-08-281-1/+1
| | | | | | | | | | | | | | | | | | | | When building a random powerpc kernel I hit this build error: arch/powerpc/platforms/powernv/opal-imc.c:130:13: error : assignment discards « const » qualifier from pointer target type [-Werror=discarded-qualifiers] l_cpumask = cpumask_of_node(nid); ^ This happens because when CONFIG_NUMA=n cpumask_of_node() returns a const pointer. This patch simply adds const to l_cpumask to fix this issue. Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Enable removal of memory for in memory tracingRashmica Gupta2017-08-243-0/+291
| | | | | | | | | | | | | | | | | | | | The hardware trace macro feature requires access to a chunk of real memory. This patch provides a debugfs interface to do this. By writing an integer containing the size of memory to be unplugged into /sys/kernel/debug/powerpc/memtrace/enable, the code will attempt to remove that much memory from the end of each NUMA node. This patch also adds additional debugsfs files for each node that allows the tracer to interact with the removed memory, as well as a trace file that allows userspace to read the generated trace. Note that this patch does not invoke the hardware trace macro, it only allows memory to be removed during runtime for the trace macro to utilise. Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> [mpe: Minor formatting etc fixups] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/uprobes: Implement arch_uretprobe_is_alive()Naveen N. Rao2017-08-241-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This helper is used to detect if a uprobe'd function has returned through a setjmp/longjmp, rather than branching to the LR that was updated previously by us. This fixes a SIGSEGV that gets generated when programs use setjmp/longjmp with uretprobes. We use the arm64 model (arch/arm64/kernel/probes/uprobes.c: arch_uretprobe_is_alive()) for detecting when stack frames have been removed from under us. Reference: https://marc.info/?l=linux-kernel&m=143748610330073 commit 7b868e4802a86 ("uprobes/x86: Reimplement arch_uretprobe_is_alive()") commit db087ef69a2b1 ("uprobes/x86: Make arch_uretprobe_is_alive(RP_CHECK_CALL) more clever") Tested with the test program from: https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=blob;f=testsuite/systemtap.base/bz5274.c;hb=HEAD And this script: $ cat test.sh #!/bin/bash perf probe -x ./bz5274 -a bz5274_main_return=main%return perf probe -x ./bz5274 -a bz5274_funca_return=funca%return perf probe -x ./bz5274 -a bz5274_funcb_return=funcb%return perf probe -x ./bz5274 -a bz5274_funcc_return=funcc%return perf probe -x ./bz5274 -a bz5274_funcd_return=funcd%return perf record -e 'probe_bz5274:*' -aR ./bz5274 Reported-by: Gustavo Luiz Duarte <gduarte@redhat.com> Reported-by: zsun@redhat.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/kprobes: Don't save/restore DAR/DSISR to/from pt_regs for optprobesNaveen N. Rao2017-08-241-8/+0
| | | | | | | We don't save/restore these across a trap, or with KPROBES_ON_FTRACE. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/xive: Fix the size of the cpumask used in xive_find_target_in_mask()Cédric Le Goater2017-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When called from xive_irq_startup(), the size of the cpumask can be larger than nr_cpu_ids. This can result in a WARN_ON such as: WARNING: CPU: 10 PID: 1 at ../arch/powerpc/sysdev/xive/common.c:476 xive_find_target_in_mask+0x110/0x2f0 ... NIP [c00000000008a310] xive_find_target_in_mask+0x110/0x2f0 LR [c00000000008a2e4] xive_find_target_in_mask+0xe4/0x2f0 Call Trace: xive_find_target_in_mask+0x74/0x2f0 (unreliable) xive_pick_irq_target.isra.1+0x200/0x230 xive_irq_startup+0x60/0x180 irq_startup+0x70/0xd0 __setup_irq+0x7bc/0x880 request_threaded_irq+0x14c/0x2c0 request_event_sources_irqs+0x100/0x180 __machine_initcall_pseries_init_ras_IRQ+0x104/0x134 do_one_initcall+0x68/0x1d0 kernel_init_freeable+0x290/0x374 kernel_init+0x24/0x170 ret_from_kernel_thread+0x5c/0x74 This happens because we're being called with our affinity mask set to irq_default_affinity. That in turn was populated using cpumask_setall(), which sets NR_CPUs worth of bits, not nr_cpu_ids worth. Finally cpumask_weight() will return > nr_cpu_ids when passed a mask which has > nr_cpu_ids bits set. Fix it by limiting the value returned by cpumask_weight(). Signed-off-by: Cédric Le Goater <clg@kaod.org> [mpe: Add change log details on actual cause] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64: Optimise set/clear of CTRL[RUN] (runlatch)Nicholas Piggin2017-08-231-8/+27
| | | | | | | | | | | | | On modern CPUs the CTRL register is read-only except bit 63 which is the run latch control. This means it can be updated with a mtspr rather than mfspr/mtspr. To accomodate older CPUs (Cell at least), where there are other bits in the register, we still do a read/modify/write on pre 2.06 CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Update change log to mention 2.06 workaround] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Remove spurious IRQ reason in IRQ replayNicholas Piggin2017-08-231-2/+0
| | | | | | | HVI interrupts have always used 0x500, so remove the dead branch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64: Remove redundant instruction in interrupt replayNicholas Piggin2017-08-231-1/+0
| | | | | Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Use the HV handler for external IRQ replay in HV mode on POWER9Nicholas Piggin2017-08-231-0/+4
| | | | | | | | | | POWER9 host external interrupts use the h_virt_irq_common handler, so use that to replay them rather than using the hardware_interrupt_common handler. Both call do_IRQ, but using the correct handler reduces i-cache footprint. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Merge HV and non-HV paths for doorbell IRQ replayNicholas Piggin2017-08-233-8/+2
| | | | | | | | | | This results in smaller code, and fewer branches. This relies on the fact that both the 0xe80 and 0xa00 handlers call the same upper level code, namely doorbell_exception(). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Mention we rely on the implementation of the 0xe80/0xa00 handlers] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64: Cleanup __check_irq_replay()Nicholas Piggin2017-08-231-22/+23
| | | | | | | | | Move the clearing of irq_happened bits into the condition where they were found to be set. This reduces instruction count slightly, and reduces stores into irq_happened. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: masked_interrupt() returns to kernel so avoid restoring r13Nicholas Piggin2017-08-231-1/+1
| | | | | | | | | | Places in the kernel where r13 is not the PACA pointer must have maskable interrupts disabled, so r13 does not have to be restored when returning from a soft-masked interrupt. We should never have interrupts soft disabled when we're in user space. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Optimise clearing of MSR_EE in masked_[H]interrupt()Nicholas Piggin2017-08-231-2/+1
| | | | | | | | MSR_EE is always enabled in SRR1 for masked interrupts, so we can use xor to clear it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Avoid a branch in masked_[H]interrupt()Nicholas Piggin2017-08-231-4/+2
| | | | | | | | Interrupts which do not require EE to be cleared can all be tested with a single bitwise test. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Make switch_mm_irqs_off() out of lineBenjamin Herrenschmidt2017-08-233-88/+102
| | | | | | | | | It's too big to be inline, there is no reason to keep it that way. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Rework to incorporate the comment changes via fixes branch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Optimize detection of thread local mm'sBenjamin Herrenschmidt2017-08-234-1/+24
| | | | | | | | | | Instead of comparing the whole CPU mask every time, let's keep a counter of how many bits are set in the mask. Thus testing for a local mm only requires testing if that counter is 1 and the current CPU bit is set in the mask. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Use mm_is_thread_local() instread of open-codingBenjamin Herrenschmidt2017-08-234-14/+6
| | | | | | | | We open-code testing for the mm being local to the current CPU in a few places. Use our existing helper instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Avoid double irq save/restore in activate_mmBenjamin Herrenschmidt2017-08-231-4/+0
| | | | | | | | It calls switch_mm() which already does the irq save/restore these days. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Move pgdir setting into a helperBenjamin Herrenschmidt2017-08-231-8/+22
| | | | | | | Makes switch_mm_irqs_off() a bit more readable Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Fix replay interrupt return label nameMichael Ellerman2017-08-231-2/+2
| | | | | | | | | | In __replay_interrupt() we take the address of a local label so we can return to it later. However the assembler turns the local label into a symbol with a name like ".L1^B42" - where "^B" is literally "\002". This does not make for pleasant stack traces. Fix it by giving the label a sensible name. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: pseries: remove dlpar_attach_node dependency on full pathRob Herring2017-08-234-7/+5
| | | | | | | | | | | | | | | In preparation to stop storing the full node path in full_name, remove the dependency on full_name from dlpar_attach_node(). Callers of dlpar_attach_node() already have the parent device_node, so just pass the parent node into dlpar_attach_node instead of the path. This avoids doing a lookup of the parent node by the path. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Convert to using %pOF instead of full_nameRob Herring2017-08-2379-487/+461
| | | | | | | | | | | | | | | | | | Now that we have a custom printf format specifier, convert users of full_name to use %pOF instead. This is preparation to remove storing of the full path string for each node. Signed-off-by: Rob Herring <robh@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Anatolij Gustschin <agust@denx.de> Cc: Scott Wood <oss@buserror.net> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linuxppc-dev@lists.ozlabs.org Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/vio: Use device_type to detect familyMichael Ellerman2017-08-231-2/+2
| | | | | | | | | | | | | | | | | | Currently in the vio.c code we use a comparision against the parent device node's full path to decide if the device is a PFO or VIO family device. Both the ibm,platform-facilities and vdevice nodes are defined by PAPR, and must have a matching device_type. So instead of using the path we can instead compare the device_type. I've checked Qemu and kvmtool both do this correctly, and all the PowerVM systems I have access to do also. So it seems to be safe. This removes the dependency on full_name, which is being removed upstream. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge branch 'fixes' into nextMichael Ellerman2017-08-2327-127/+363
|\ | | | | | | | | | | There's a non-trivial dependency between some commits we want to put in next and the KVM prefetch work around that went into fixes. So merge fixes into next.
| * powerpc/mm: Ensure cpumask update is orderedBenjamin Herrenschmidt2017-08-183-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no guarantee that the various isync's involved with the context switch will order the update of the CPU mask with the first TLB entry for the new context being loaded by the HW. Be safe here and add a memory barrier to order any subsequent load/store which may bring entries into the TLB. The corresponding barrier on the other side already exists as pte updates use pte_xchg() which uses __cmpxchg_u64 which has a sync after the atomic operation. Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add comments in the code] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VECBenjamin Herrenschmidt2017-08-161-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VSX uses a combination of the old vector registers, the old FP registers and new "second halves" of the FP registers. Thus when we need to see the VSX state in the thread struct (flush_vsx_to_thread()) or when we'll use the VSX in the kernel (enable_kernel_vsx()) we need to ensure they are all flushed into the thread struct if either of them is individually enabled. Unfortunately we only tested if the whole VSX was enabled, not if they were individually enabled. Fixes: 72cd7b44bc99 ("powerpc: Uncomment and make enable_kernel_vsx() routine available") Cc: stable@vger.kernel.org # v4.3+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/watchdog: add locking around init/exit functionsNicholas Piggin2017-08-091-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | When CPUs start and stop the watchdog, they manipulate shared data that is normally protected by the lock. Other CPUs can be running concurrently at this time, so it's a good idea to use locking here to be on the safe side. Remove the barrier which is undocumented and didn't do anything. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/watchdog: Fix marking of stuck CPUsNicholas Piggin2017-08-091-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the SMP detector finds other CPUs stuck, it iterates over them and marks them as stuck. This pulls them out of the pending mask and allows the detector to continue with remaining good CPUs (if nmi_watchdog=panic is not enabled). The code to dothat was buggy because when setting a CPU stuck, if the pending mask became empty, it resets it to keep the watchdog running. However the iterator will continue to run over the new pending mask and mark remaining good CPUs sas stuck. Fix this by doing it with cpumask bitwise operations. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/watchdog: Fix final-check recovered caseNicholas Piggin2017-08-091-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | When the watchdog decides to panic, it takes the lock and double checks everything (to avoid races with the CPU being unstuck or panic()ed by something else). The exit label was misplaced and would result in all-CPUs backtrace and watchdog panic even in the case that the condition was found to be resolved. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/watchdog: Moderate touch_nmi_watchdog overheadNicholas Piggin2017-08-091-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some code can go into a tight loop calling touch_nmi_watchdog (e.g., stop_machine CPU hotplug code). This can cause contention on watchdog locks particularly if all CPUs with watchdog enabled are spinning in the loops. Avoid this storm of activity by running the watchdog timer callback from this path if we have exceeded the timer period since it was last run. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/watchdog: Improve watchdog lock primitiveNicholas Piggin2017-08-091-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Hard-disable interrupts before taking the lock, which prevents soft-NMI re-entrancy and therefore can prevent deadlocks. - Use raw_ variants of local_irq_disable to avoid irq debugging. - When the lock is contended, spin at low SMT priority, using loads only, and with interrupts enabled (where possible). Some stalls have been noticed at high loads that go away with improved locking. There should not be so much locking contention in the first place (which is addressed in a subsequent patch), but locking should still be improved. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc: NMI IPI improve lock primitiveNicholas Piggin2017-08-091-3/+3
| | | | | | | | | | | | | | | | | | | | When the NMI IPI lock is contended, spin at low SMT priority, using loads only, and with interrupts enabled (where possible). This improves behaviour under high contention (e.g., a system crash when a number of CPUs are trying to enter the debugger). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/configs: Re-enable HARD/SOFT lockup detectorsMichael Ellerman2017-08-093-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 05a4a9527931 ("kernel/watchdog: split up config options"), CONFIG_LOCKUP_DETECTOR was split into two separate config options, CONFIG_HARDLOCKUP_DETECTOR and CONFIG_SOFTLOCKUP_DETECTOR. Our defconfigs still have CONFIG_LOCKUP_DETECTOR=y, but that is no longer user selectable, and we don't mention the new options, so we end up with none of them enabled. So update the defconfigs to turn on the new SOFT and HARD options, the end result being the same as what we had previously. Fixes: 05a4a9527931 ("kernel/watchdog: split up config options") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api failsGautham R. Shenoy2017-08-081-3/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we use the opal call opal_slw_set_reg() to inform the Sleep-Winkle Engine (SLW) to restore the contents of some of the Hypervisor state on wakeup from deep idle states that lose full hypervisor context (characterized by the flag OPAL_PM_LOSE_FULL_CONTEXT). However, the current code has a bug in that if opal_slw_set_reg() fails, we don't disable the use of these deep states (winkle on POWER8, stop4 onwards on POWER9). This patch fixes this bug by ensuring that if programing the sleep-winkle engine to restore the hypervisor states in pnv_save_sprs_for_deep_states() fails, then we exclude such states by clearing the OPAL_PM_LOSE_FULL_CONTEXT flag from supported_cpuidle_states. As a result POWER8 will be prevented from using winkle for CPU-Hotplug, and POWER9 will put the offlined CPUs to the default stop state when available. Further, we ensure in the initialization of the cpuidle-powernv driver to only include those states whose flags are present in supported_cpuidle_states, thereby skipping OPAL_PM_LOSE_FULL_CONTEXT states when they have been disabled due to stop-api failure. Fixes: 1e1601b38e6 ("powerpc/powernv/idle: Restore SPRs for deep idle states via stop API.") Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"Michael Ellerman2017-08-072-46/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bc4f65e4cf9d6cc43e0e9ba0b8648cf9201cd55f. As reported by Andreas, this commit is causing unrecoverable SLB misses in the system call exit path: Unrecoverable exception 4100 at c00000000000a1ec Oops: Unrecoverable exception, sig: 6 [#1] SMP NR_CPUS=2 PowerMac ... CPU: 0 PID: 18626 Comm: rm Not tainted 4.13.0-rc3 #1 task: c00000018335e080 task.stack: c000000139e50000 NIP: c00000000000a1ec LR: c00000000000a118 CTR: 0000000000000000 REGS: c000000139e53bb0 TRAP: 4100 Not tainted (4.13.0-rc3) MSR: 9000000000001030 <SF,HV,ME,IR,DR> CR: 24000044 XER: 20000000 SOFTE: 1 GPR00: 0000000000000000 c000000139e53e30 c000000000abb500 fffffffffffffffe GPR04: c0000001eb866298 0000000000000000 0000000000000000 c00000018335e080 GPR08: 900000000000d032 0000000000000000 0000000000000002 fffffffffffff001 GPR12: c000000139e50000 c00000000ffff000 00003fffa8c0dca0 00003fffa8c0dc88 GPR16: 0000000010000000 0000000000000001 00003fffa8c0eaa0 0000000000000000 GPR20: 00003fffa8c27528 00003fffa8c27b00 0000000000000000 0000000000000000 GPR24: 00003fffa8c0d918 00003ffff1b3efa0 00003fffa8c26d68 0000000000000000 GPR28: 00003fffa8c249e8 00003fffa8c263d0 00003fffa8c27550 00003ffff1b3ef10 NIP [c00000000000a1ec] system_call_exit+0xc0/0x21c LR [c00000000000a118] system_call+0x58/0x6c Call Trace: [c000000139e53e30] [c00000000000a118] system_call+0x58/0x6c (unreliable) Instruction dump: 64a51000 7c6300d0 f8a101a0 4bffff9c 3c000000 60000006 780007c6 64000000 60000000 7c004039 4082001c e8ed0170 <88070b78> 88c70b79 7c003214 2c200000 This is caused by us trying to load THREAD_LOAD_FP with MSR_RI=0, and taking an SLB miss on the thread struct. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Diagnosed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/64: Fix __check_irq_replay missing decrementer interruptNicholas Piggin2017-08-041-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the decrementer wraps again and de-asserts the decrementer exception while hard-disabled, __check_irq_replay() has a test to notice the wrap when interrupts are re-enabled. The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked because the decrementer interrupt was always the first one checked after clearing the hard disable flag, but HMI check was moved ahead of that, which introduced this bug. This can cause a missed decrementer interrupt if we soft-disable interrupts then take an HMI which is recorded in irq_happened, then hard-disable interrupts for > 4s to wrap the decrementer. Fixes: e0e0d6b7390b ("powerpc/64: Replay hypervisor maintenance interrupt first") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf: POWER9 PMU stops after idle workaroundNicholas Piggin2017-08-041-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | POWER9 DD2 PMU can stop after a state-loss idle in some conditions. A solution is to set then clear MMCRA[60] after wake from state-loss idle. MMCRA[60] is a non-architected bit, see the user manual for details. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error checkSergei Shtylyov2017-08-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | of_irq_to_resource() has recently been fixed to return negative error #'s along with 0 in case of failure, however the Freescale MPC832x RDB board code still only regards 0 as a failure indication -- fix it up. Fixes: 7a4228bbff76 ("of: irq: use of_irq_get() in of_irq_to_resource()") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Scott Wood <oss@buserror.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/64s: Fix stack setup in watchdog soft_nmi_common()Nicholas Piggin2017-07-311-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The watchdog soft-NMI exception stack setup loads a stack pointer twice, which is an obvious error. It ends up using the system reset interrupt (true-NMI) stack, which is also a bug because the watchdog could be preempted by a system reset interrupt that overwrites the NMI stack. Change the soft-NMI to use the "emergency stack". The current kernel stack is not used, because of the longer-term goal to prevent asynchronous stack access using soft-disable. Fixes: 2104180a5369 ("powerpc/64s: implement arch-specific hardlockup watchdog") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * Merge tag 'v4.13-rc1' into fixesMichael Ellerman2017-07-31316-3232/+4156
| |\ | | | | | | | | | | | | | | | | | | | | | The fixes branch is based off a random pre-rc1 commit, because we had some fixes that needed to go in before rc1 was released. However we now need to fix some code that went in after that point, but before rc1, so merge rc1 to get that code into fixes so we can fix it!
| * | powerpc/powernv/pci: Return failure for some uses of dma_set_mask()Alistair Popple2017-07-281-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8e3f1b1d8255 ("powerpc/powernv/pci: Enable 64-bit devices to access >4GB DMA space") introduced the ability for PCI device drivers to request a DMA mask between 64 and 32 bits and actually get a mask greater than 32-bits. However currently if certain machine configuration dependent conditions are not meet the code silently falls back to a 32-bit mask. This makes it hard for device drivers to detect which mask they actually got. Instead we should return an error when the request could not be fulfilled which allows drivers to either fallback or implement other workarounds as documented in DMA-API-HOWTO.txt. Signed-off-by: Alistair Popple <alistair@popple.id.au> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compilerMichael Ellerman2017-07-281-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically the boot wrapper was always built 32-bit big endian, even for 64-bit kernels. That was because old firmwares didn't necessarily support booting a 64-bit image. Because of that arch/powerpc/boot/Makefile uses CROSS32CC for compilation. However when we added 64-bit little endian support, we also added support for building the boot wrapper 64-bit. However we kept using CROSS32CC, because in most cases it is just CC and everything works. However if the user doesn't specify CROSS32_COMPILE (which no one ever does AFAIK), and CC is *not* biarch (32/64-bit capable), then CROSS32CC becomes just "gcc". On native systems that is probably OK, but if we're cross building it definitely isn't, leading to eg: gcc ... -m64 -mlittle-endian -mabi=elfv2 ... arch/powerpc/boot/cpm-serial.c gcc: error: unrecognized argument in option ‘-mabi=elfv2’ gcc: error: unrecognized command line option ‘-mlittle-endian’ make: *** [zImage] Error 2 To fix it, stop using CROSS32CC, because we may or may not be building 32-bit. Instead setup a BOOTCC, which defaults to CC, and only use CROSS32_COMPILE if it's set and we're building for 32-bit. Fixes: 147c05168fc8 ("powerpc/boot: Add support for 64bit little endian wrapper") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
| * | powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPUMichael Ellerman2017-07-281-10/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In smp_cpus_done() we need to call smp_ops->setup_cpu() for the boot CPU, which means it has to run *on* the boot CPU. In the past we ensured it ran on the boot CPU by changing the CPU affinity mask of current directly. That was removed in commit 6d11b87d55eb ("powerpc/smp: Replace open coded task affinity logic"), and replaced with a work queue call. Unfortunately using a work queue leads to a lockdep warning, now that the CPU hotplug lock is a regular semaphore: ====================================================== WARNING: possible circular locking dependency detected ... kworker/0:1/971 is trying to acquire lock: (cpu_hotplug_lock.rw_sem){++++++}, at: [<c000000000100974>] apply_workqueue_attrs+0x34/0xa0 but task is already holding lock: ((&wfc.work)){+.+.+.}, at: [<c0000000000fdb2c>] process_one_work+0x25c/0x800 ... CPU0 CPU1 ---- ---- lock((&wfc.work)); lock(cpu_hotplug_lock.rw_sem); lock((&wfc.work)); lock(cpu_hotplug_lock.rw_sem); Although the deadlock can't happen in practice, because smp_cpus_done() only runs in early boot before CPU hotplug is allowed, lockdep can't tell that. Luckily in commit 8fb12156b8db ("init: Pin init task to the boot CPU, initially") tglx changed the generic code to pin init to the boot CPU to begin with. The unpinning of init from the boot CPU happens in sched_init_smp(), which is called after smp_cpus_done(). So smp_cpus_done() is always called on the boot CPU, which means we don't need the work queue call at all - and the lockdep warning goes away. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
| * | powerpc/tm: Fix saving of TM SPRs in core dumpGustavo Romero2017-07-281-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently flush_tmregs_to_thread() does not save the TM SPRs (TFHAR, TFIAR, TEXASR) to the thread struct, unless the process is currently inside a suspended transaction. If the process is core dumping, and the TM SPRs have changed since the last time the process was context switched, then we will save stale values of the TM SPRs to the core dump. Fix it by saving the live register state to the thread struct in that case. Fixes: 08e1c01d6aed ("powerpc/ptrace: Enable support for TM SPR state") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Gustavo Romero <gromero@linux.vnet.ibm.com> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>