* powerpc/rtas: clean up includes (Nathan Lynch, 2022-12-07; 1 file, -26/+16)

  rtas.c used to host complex code related to pseries-specific guest migration
  and suspend, which used atomics, completions, hcalls, and CPU hotplug APIs.
  That's all been deleted or moved, so remove the include directives that have
  been rendered unnecessary. Sort the remainder (with linux/ before asm/) to
  impose some order on where future additions go.

  Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
  Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221118150751.469393-8-nathanl@linux.ibm.com
* powerpc/rtas: clean up rtas_error_log_max initialization (Nathan Lynch, 2022-12-07; 1 file, -11/+26)

  The code in rtas_get_error_log_max() doesn't cause problems in practice, but
  there are no measures to ensure that the lazy initialization of the static
  rtas_error_log_max variable is atomic, and it's not worth adding them.

  Initialize the static rtas_error_log_max variable at boot when we're
  single-threaded instead of lazily on first use.

  Use the more appropriate of_property_read_u32() API instead of rtas_token()
  to consult the "rtas-error-log-max" property, which is not the name of an
  RTAS function. Convert use of printk() to pr_warn() and distinguish the
  possible error cases.

  Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
  Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221118150751.469393-7-nathanl@linux.ibm.com
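  A minimal sketch of the boot-time initialization this describes, assuming the
  rtas.dev device-node pointer and an RTAS_ERROR_LOG_MAX upper bound as used
  elsewhere in rtas.c; the helper name and messages are illustrative, not the
  exact patch:

      static u32 rtas_error_log_max __ro_after_init;

      u32 rtas_get_error_log_max(void)
      {
              return rtas_error_log_max;
      }

      static void __init init_error_log_max(void)
      {
              static const char propname[] __initconst = "rtas-error-log-max";
              u32 max;

              if (of_property_read_u32(rtas.dev, propname, &max)) {
                      pr_warn("%s not found\n", propname);
                      max = RTAS_ERROR_LOG_MAX;
              } else if (max > RTAS_ERROR_LOG_MAX) {
                      pr_warn("%s = %u, clamping to %u\n",
                              propname, max, RTAS_ERROR_LOG_MAX);
                      max = RTAS_ERROR_LOG_MAX;
              }

              rtas_error_log_max = max;
      }

  Called once during early boot (e.g. from rtas_initialize()) while still
  single-threaded, this removes any need for atomicity around the old lazy
  initialization.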
* powerpc/pseries/eeh: use correct API for error log size (Nathan Lynch, 2022-12-07; 1 file, -10/+1)

  rtas-error-log-max is not the name of an RTAS function, so rtas_token() is
  not the appropriate API for retrieving its value. We already have
  rtas_get_error_log_max() which returns a sensible value if the property is
  absent for any reason, so use that instead.

  Fixes: 8d633291b4fc ("powerpc/eeh: pseries platform EEH error log retrieval")
  Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
  [mpe: Drop no-longer possible error handling as noticed by ajd]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221118150751.469393-6-nathanl@linux.ibm.com
* powerpc/rtas: avoid scheduling in rtas_os_term() (Nathan Lynch, 2022-12-07; 1 file, -1/+6)

  It's unsafe to use rtas_busy_delay() to handle a busy status from the
  ibm,os-term RTAS function in rtas_os_term():

      Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:618
      in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
      preempt_count: 2, expected: 0
      CPU: 7 PID: 1 Comm: swapper/0 Tainted: G D 6.0.0-rc5-02182-gf8553a572277-dirty #9
      Call Trace:
      [c000000007b8f000] [c000000001337110] dump_stack_lvl+0xb4/0x110 (unreliable)
      [c000000007b8f040] [c0000000002440e4] __might_resched+0x394/0x3c0
      [c000000007b8f0e0] [c00000000004f680] rtas_busy_delay+0x120/0x1b0
      [c000000007b8f100] [c000000000052d04] rtas_os_term+0xb8/0xf4
      [c000000007b8f180] [c0000000001150fc] pseries_panic+0x50/0x68
      [c000000007b8f1f0] [c000000000036354] ppc_panic_platform_handler+0x34/0x50
      [c000000007b8f210] [c0000000002303c4] notifier_call_chain+0xd4/0x1c0
      [c000000007b8f2b0] [c0000000002306cc] atomic_notifier_call_chain+0xac/0x1c0
      [c000000007b8f2f0] [c0000000001d62b8] panic+0x228/0x4d0
      [c000000007b8f390] [c0000000001e573c] do_exit+0x140c/0x1420
      [c000000007b8f480] [c0000000001e586c] make_task_dead+0xdc/0x200

  Use rtas_busy_delay_time() instead, which signals without side effects
  whether to attempt the ibm,os-term RTAS call again.

  Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221118150751.469393-5-nathanl@linux.ibm.com
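  A sketch of the panic-safe retry loop this implies: rtas_busy_delay_time()
  only classifies the status (returning a suggested delay in milliseconds) and
  never sleeps, unlike rtas_busy_delay(), which may call msleep(). The token
  and buffer names here are illustrative:

      do {
              status = rtas_call(ibm_os_term_token, 1, 1, NULL,
                                 __pa(rtas_os_term_buf));
      } while (rtas_busy_delay_time(status));

  In a panic path it is acceptable to retry immediately rather than honour the
  suggested delay, since the system is not going to make further progress
  anyway.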
* powerpc/rtas: avoid device tree lookups in rtas_os_term() (Nathan Lynch, 2022-12-07; 1 file, -3/+10)

  rtas_os_term() is called during panic. Its behavior depends on a couple of
  conditions in the /rtas node of the device tree, the traversal of which
  entails locking and local IRQ state changes. If the kernel panics while
  devtree_lock is held, rtas_os_term() as currently written could hang.

  Instead of discovering the relevant characteristics at panic time, cache
  them in file-static variables at boot. Note the lookup for
  "ibm,extended-os-term" is converted to of_property_read_bool() since it is a
  boolean property, not an RTAS function token.

  Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
  [mpe: Incorporate suggested change from Nick]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221118150751.469393-4-nathanl@linux.ibm.com
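  A hedged sketch of the boot-time caching described above, with illustrative
  names for the file-static variables and the init helper:

      static bool ibm_extended_os_term;
      static int ibm_os_term_token = RTAS_UNKNOWN_SERVICE;

      static void __init rtas_os_term_init(struct device_node *rtas_node)
      {
              ibm_os_term_token = rtas_token("ibm,os-term");
              /* a boolean property, not an RTAS function token */
              ibm_extended_os_term =
                      of_property_read_bool(rtas_node, "ibm,extended-os-term");
      }

  rtas_os_term() then consults only these file-static variables, never taking
  devtree_lock while the kernel is panicking.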
* powerpc/rtasd: use correct OF API for event scan rate (Nathan Lynch, 2022-12-07; 1 file, -2/+5)

  rtas_token() should be used only for properties that are RTAS function
  tokens. "rtas-event-scan-rate" does not contain a function token, but it has
  the same size/format as token properties so reading it with rtas_token()
  happens to work.

  Convert to of_property_read_u32().

  Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
  Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221118150751.469393-3-nathanl@linux.ibm.com
* powerpc/rtas: document rtas_call() (Nathan Lynch, 2022-12-07; 2 files, -15/+58)

  rtas_call() has a complex calling convention, non-standard return values,
  and many users. Add kernel-doc for it and remove the less structured
  commentary from rtas.h.

  Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
  Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221118150751.469393-2-nathanl@linux.ibm.com
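  As an illustration of the kind of kernel-doc the commit adds (the actual
  wording in rtas.c is longer and may differ in detail):

      /**
       * rtas_call() - Invoke an RTAS firmware function.
       * @token: Identifies the function being invoked.
       * @nargs: Number of input parameters, not counting @token.
       * @nret: Number of output parameters, including the call status.
       * @outputs: Array of @nret - 1 elements for additional outputs,
       *           or NULL if @nret is 1.
       *
       * Invokes the RTAS function indicated by @token, which the caller
       * should obtain via rtas_token().
       *
       * Return: the status word from the RTAS call, or -1 on internal
       *         error such as an invalid token.
       */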
* powerpc/pseries: unregister VPA when hot unplugging a CPU (Laurent Dufour, 2022-12-07; 1 file, -0/+1)

  The VPA should unregister when offlining a CPU. Otherwise there could be a
  short window where 2 CPUs could share the same VPA. This happens because the
  hypervisor is still keeping the VPA attached to the vCPU even if it became
  offline. Here is a potential situation:

  1. remove proc A,
  2. add proc B. If proc B gets proc A's place in cpu_present_mask, then it
     registers proc A's VPAs.
  3. If proc B is then re-added to the LP, its threads are sharing VPAs with
     proc A briefly as they come online.

  As the hypervisor may check for the VPA's yield_count field oddity, it may
  detect an unexpected value and kill the LPAR.

  Suggested-by: Nathan Lynch <nathanl@linux.ibm.com>
  Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
  Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
  [mpe: s/cpu_present_map/cpu_present_mask/ in change log]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221114160150.13554-1-ldufour@linux.ibm.com
* powerpc/pseries: reset the RCU watchdogs after a LPM (Laurent Dufour, 2022-12-07; 1 file, -2/+5)

  The RCU watchdog timer should be reset when restarting the CPU after a Live
  Partition Mobility operation.

  Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
  Acked-by: Nicholas Piggin <npiggin@gmail.com>
  [mpe: Combine comments into a single comment block]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221125173204.15329-1-ldufour@linux.ibm.com
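  A hedged sketch of the resume path this implies; the exact call site in the
  mobility code may differ:

      /*
       * The partition may have been suspended for an arbitrary wall-clock
       * time during the migration; push the next RCU stall check far into
       * the future so it does not fire spuriously as CPUs come back.
       */
      rcu_cpu_stall_reset();

  rcu_cpu_stall_reset() is the existing RCU API for exactly this situation,
  which is the "reset" the subject refers to.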
* powerpc: Take in account addition CPU node when building kexec FDT (Laurent Dufour, 2022-12-07; 1 file, -1/+58)

  On a system with a large number of CPUs, the creation of the FDT for a kexec
  kernel may fail because the allocated FDT is not large enough. When this
  happens, a message like the following is displayed on the console:

      Unable to add ibm,processor-vadd-size property: FDT_ERR_NOSPACE

  The property name in the message may vary depending on when the buffer
  overrun is detected. Obviously the created FDT is missing information, and
  the system dump or kexec kernel can be expected to fail.

  When the FDT is allocated, the size of the FDT the kernel received at boot
  time is used, and an extra size can be applied. Currently, only memory added
  after boot time is taken into account, not the CPU nodes. The extra size
  should take these additional CPU nodes into account when computing the
  required extra space. To achieve that, the size of a CPU node, including its
  subnodes, is computed once and multiplied by the number of additional CPU
  nodes. The assumption is that the size of the CPU node is the same for all
  nodes; the only variable part should be the name "PowerPC,POWERxx@##" where
  "##" may vary a little.

  Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
  [mpe: Don't shadow function name w/variable, minor coding style changes]
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221110180619.15796-3-ldufour@linux.ibm.com
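  An illustrative sketch of the sizing logic; kexec_extra_fdt_size_ppc64() is
  the real 64-bit hook, but cpu_node_size() and boot_cpu_node_count are shown
  here as assumptions derived from the description above:

      unsigned int kexec_extra_fdt_size_ppc64(struct kimage *image)
      {
              unsigned int cpu_nodes = 0, extra_size = 0;
              struct device_node *dn;

              /* ... existing extra space for memory added after boot ... */

              /*
               * Assume every CPU node ("PowerPC,POWERxx@##") has roughly
               * the same size, measure one, and reserve space for nodes
               * that were not present in the FDT received at boot.
               */
              for_each_node_by_type(dn, "cpu")
                      cpu_nodes++;

              if (cpu_nodes > boot_cpu_node_count)
                      extra_size += (cpu_nodes - boot_cpu_node_count) *
                                    cpu_node_size();

              return extra_size;
      }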
* powerpc: export the CPU node count (Laurent Dufour, 2022-12-07; 2 files, -0/+4)

  At boot time, the FDT is parsed to compute the number of CPUs. In addition,
  count the number of CPU nodes and export it. This is useful when building
  the FDT for a kexec'ed kernel, since we need to take into account any CPU
  nodes added by CPU hotplug operations since boot.

  Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221110180619.15796-2-ldufour@linux.ibm.com
* powerpc/cpuidle: Set CPUIDLE_FLAG_POLLING for snooze state (Aboorva Devarajan, 2022-12-06; 2 files, -3/+10)

  During a comparative study of cpuidle governors, it was noticed that the
  menu governor does not select the CEDE state in some scenarios even though
  the sleep duration of the CPU exceeds the target residency of the CEDE idle
  state. This is because the CPU exits the snooze "polling" state when the
  snooze time limit is reached in snooze_loop(), which is not a real wakeup;
  it just means that the polling state selection was not adequate.

  cpuidle governors rely on the CPUIDLE_FLAG_POLLING flag being set on polling
  states to handle the condition mentioned above. Hence, set the
  CPUIDLE_FLAG_POLLING flag for the snooze (polling) state on powerpc to make
  the cpuidle governors work as expected.

  Reference commits:
  - Timeout enabled for snooze state: commit 78eaa10f027c ("cpuidle:
    powernv/pseries: Auto-promotion of snooze to deeper idle state")
  - commit dc2251bf98c6 ("cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START
    symbol")
  - Fix wakeup stats in governor for polling states: commit 5f26bdceb9c0
    ("cpuidle: menu: Fix wakeup statistics updates for polling state")

  Signed-off-by: Aboorva Devarajan <aboorvad@linux.vnet.ibm.com>
  Tested-by: Vishal Chourasia <vishalc@linux.vnet.ibm.com>
  Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
  Reviewed-by: Vishal Chourasia <vishalc@linux.vnet.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221114145611.37669-1-aboorvad@linux.vnet.ibm.com
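  A sketch of the fix in the shape of a typical pseries state-table entry; the
  surrounding fields mirror the driver but are abbreviated here:

      static struct cpuidle_state pseries_idle_states[] = {
              {
                      .name = "snooze",
                      .desc = "snooze",
                      .exit_latency = 0,
                      .target_residency = 0,
                      .flags = CPUIDLE_FLAG_POLLING, /* the fix: mark as polling */
                      .enter = snooze_loop,
              },
              /* ... CEDE and deeper states ... */
      };

  With the flag set, the menu governor attributes a timed-out exit from snooze
  to inadequate state selection rather than to a genuine wakeup, and will
  promote to CEDE when the predicted sleep duration warrants it.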
* powerpc/dts/fsl: Fix pca954x i2c-mux node names (Geert Uytterhoeven, 2022-12-06; 6 files, -6/+6)

  "make dtbs_check":

      arch/powerpc/boot/dts/fsl/t1040rdb-rev-a.dtb: pca9546@77: $nodename:0: 'pca9546@77' does not match '^(i2c-?)?mux'
          From schema: Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
      arch/powerpc/boot/dts/fsl/t1024qds.dtb: pca9547@77: Unevaluated properties are not allowed ('#address-cells', '#size-cells', 'i2c@0', 'i2c@2', 'i2c@3' were unexpected)
          From schema: Documentation/devicetree/bindings/i2c/i2c-mux-pca954x.yaml
      ...

  Fix this by renaming pca954x nodes to "i2c-mux", to match the I2C bus
  multiplexer/switch DT bindings and the Generic Names Recommendation in the
  Devicetree Specification.

  Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/6c5d86c49ac170e9d56ab121ea0602f3873849ca.1669999298.git.geert+renesas@glider.be
* cxl: Remove unnecessary cxl_pci_window_alignment() (Bjorn Helgaas, 2022-12-06; 1 file, -7/+0)

  cxl_pci_window_alignment() is referenced only via the struct
  pci_controller_ops.window_alignment function pointer, and only in the
  powerpc implementation of pcibios_window_alignment().

  pcibios_window_alignment() defaults to returning 1 if the function pointer
  is NULL, which is the same as what cxl_pci_window_alignment() does.

  cxl_pci_window_alignment() is unnecessary, so remove it. No functional
  change intended.

  Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
  Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221205223231.1268085-1-helgaas@kernel.org
* selftests/powerpc: Fix resource leaks (Miaoqian Lin, 2022-12-05; 1 file, -1/+4)

  In check_all_cpu_dscr_defaults, opendir() opens the directory stream. Add
  the missing closedir() in the error path to release it.

  In check_cpu_dscr_default, open() creates an open file descriptor. Add the
  missing close() in the error path to release it.

  Fixes: ebd5858c904b ("selftests/powerpc: Add test for all DSCR sysfs interfaces")
  Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221205084429.570654-1-linmq006@gmail.com
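  An illustrative shape of one of the two fixes, assuming the selftest's
  helper signature; the details around the read are abbreviated:

      #include <fcntl.h>
      #include <unistd.h>

      static int check_cpu_dscr_default(const char *file, unsigned long val)
      {
              char buf[16];
              int fd, rc;

              fd = open(file, O_RDONLY);
              if (fd == -1)
                      return 1;

              rc = read(fd, buf, sizeof(buf) - 1);
              if (rc == -1) {
                      close(fd);      /* previously leaked on this path */
                      return 1;
              }
              close(fd);
              buf[rc] = '\0';
              /* ... compare buf against val ... */
              return 0;
      }

  check_all_cpu_dscr_defaults() gets the matching treatment, with closedir()
  added to its error path.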
* powerpc/code-patching: Remove protection against patching init addresses after init (Christophe Leroy, 2022-12-02; 3 files, -15/+1)

  Once the init section is freed, attempting to patch init code ends up in the
  weeds. Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init
  sections") protected patch_instruction() against that, but it is the
  responsibility of the caller to ensure that the patched memory is valid.

  All callers have now been verified and fixed, so the check can be removed.
  This improves ftrace activation by about 2% on 8xx.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/504310828f473d424e2ed229eff57bf075f52796.1669969781.git.christophe.leroy@csgroup.eu
* powerpc/feature-fixups: Do not patch init section after init (Christophe Leroy, 2022-12-02; 1 file, -0/+12)

  Once the init section is freed, attempting to patch init code ends up in the
  weeds. Commit 51c3c62b58b3 ("powerpc: Avoid code patching freed init
  sections") protected patch_instruction() against that, but it is the
  responsibility of the caller to ensure that the patched memory is valid.

  In the same spirit as jump_label with its jump_label_can_update() function,
  add an is_fixup_addr_valid() function to skip patching on freed init
  sections.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/8e9311fc1b057e4e6a2a3a0701ebcc74b787affe.1669969781.git.christophe.leroy@csgroup.eu
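  A sketch of the guard, in the spirit of jump_label_can_update(); it matches
  the described intent, though details may differ from the patch:

      static bool is_fixup_addr_valid(void *dest, size_t size)
      {
              /* skip patching once the init sections have been freed */
              return system_state < SYSTEM_FREEING_INITMEM ||
                     !init_section_contains(dest, size);
      }

  Each fixup loop then silently skips entries whose target now lies in freed
  init memory instead of patching into the weeds.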
* powerpc/feature-fixups: Refactor other fixups patching (Christophe Leroy, 2022-12-02; 1 file, -49/+28)

  Several functions have the same loop for patching instructions. Introduce
  function do_patch_fixups() to refactor those loops.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/58ab36949c18f94d466fc98d6c085783b0cd474f.1669969781.git.christophe.leroy@csgroup.eu
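  An illustrative version of the shared loop; the real signature in
  feature-fixups.c may differ:

      static int do_patch_fixups(long *start, long *end,
                                 unsigned int *instrs, int num)
      {
              long *pstart;

              for (pstart = start; pstart < end; pstart++) {
                      /* entries are self-relative offsets to the patch site */
                      u32 *dest = (void *)pstart + *pstart;
                      int i;

                      for (i = 0; i < num; i++)
                              patch_instruction(dest + i, ppc_inst(instrs[i]));
              }
              return 0;
      }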
* powerpc/feature-fixups: Refactor entry fixups patching (Christophe Leroy, 2022-12-02; 1 file, -52/+32)

  Several functions have the same loop for patching instructions. Introduce
  function do_patch_entry_fixups() to refactor those loops.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/79eeff7b20a98f7136da5f79b1f7c436928f27f3.1669969781.git.christophe.leroy@csgroup.eu
* powerpc/code-patching: Remove #ifdef CONFIG_STRICT_KERNEL_RWX (Christophe Leroy, 2022-12-02; 1 file, -11/+5)

  No need to have one implementation of patch_instruction() for
  CONFIG_STRICT_KERNEL_RWX and one for !CONFIG_STRICT_KERNEL_RWX.

  In patch_instruction(), call raw_patch_instruction() when
  !CONFIG_STRICT_KERNEL_RWX. In poking_init(), bail out immediately; it will
  be equivalent to the weak default implementation. Everything else is
  declared static and will be discarded by GCC when
  !CONFIG_STRICT_KERNEL_RWX.

  Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/f67d2a109404d03e8fdf1ea15388c8778337a76b.1669969781.git.christophe.leroy@csgroup.eu
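  A sketch of the resulting single implementation; both poking_init_done and
  do_patch_instruction() are shown as assumed helper names:

      int patch_instruction(u32 *addr, ppc_inst_t instr)
      {
              /*
               * With STRICT_KERNEL_RWX disabled, the text-poking machinery
               * below is dead code that GCC discards, and this collapses
               * to raw_patch_instruction().
               */
              if (!IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) ||
                  !static_branch_likely(&poking_init_done))
                      return raw_patch_instruction(addr, instr);

              return do_patch_instruction(addr, instr);
      }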
* powerpc/ftrace: fix syscall tracing on PPC64_ELF_ABI_V1 (Michael Jeanson, 2022-12-02; 1 file, -12/+0)

  In v5.7 the powerpc syscall entry/exit logic was rewritten in C; on
  PPC64_ELF_ABI_V1 this resulted in the symbols in the syscall table changing
  from their dot-prefixed variant to the non-prefixed ones.

  Since ftrace prefixes a dot to the syscall names when matching them to build
  its syscall event list, this resulted in no syscall events being available.

  Remove the PPC64_ELF_ABI_V1-specific version of
  arch_syscall_match_sym_name to have the same behavior across all powerpc
  variants.

  Fixes: 68b34588e202 ("powerpc/64/sycall: Implement syscall entry/exit logic in C")
  Cc: stable@vger.kernel.org # v5.7+
  Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
  Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221201161442.2127231-1-mjeanson@efficios.com
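  For context, a hedged sketch of the kind of ABI-v1 matcher being removed;
  the exact removed body may have differed:

      #ifdef PPC64_ELF_ABI_V1
      static inline bool
      arch_syscall_match_sym_name(const char *sym, const char *name)
      {
              /*
               * ABI v1 text symbols carried a leading dot (".sys_foo"),
               * so skip it before comparing -- wrong once the C syscall
               * rewrite made the table hold undotted names.
               */
              return !strcmp(sym + 1, name);
      }
      #endif

  With it gone, the generic matcher (a plain name comparison) applies on all
  powerpc variants.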
* powerpc/64: Sanitise user registers on interrupt in pseries, POWERNV (Rohan McLure, 2022-12-02; 1 file, -1/+1)

  Cause pseries and POWERNV platforms to default to zeroising all potentially
  user-defined registers when entering the kernel by means of any interrupt
  source, reducing user influence on the kernel and the likelihood of
  producing speculation gadgets.

  Acked-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221201071019.1953023-7-rmclure@linux.ibm.com
* powerpc/64e: Clear gprs on interrupt routine entry on Book3E (Rohan McLure, 2022-12-02; 1 file, -2/+2)

  Zero GPRS r14-r31 on entry into the kernel for interrupt sources to limit
  influence of user-space values in potential speculation gadgets. Prior to
  this commit, all other GPRS are reassigned during the common prologue to
  interrupt handlers and so need not be zeroised explicitly.

  This may be done safely, without loss of register state prior to the
  interrupt, as the common prologue saves the initial values of
  non-volatiles, which are unconditionally restored in interrupt_64.S.
  Mitigation defaults to enabled by INTERRUPT_SANITIZE_REGISTERS.

  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221201071019.1953023-6-rmclure@linux.ibm.com
* powerpc/64s: Zeroise gprs on interrupt routine entry on Book3S (Rohan McLure, 2022-12-02; 2 files, -11/+32)

  Zeroise user state in gprs (assign to zero) to reduce the influence of user
  registers on speculation within kernel syscall handlers. Clears occur at the
  very beginning of the sc and scv 0 interrupt handlers, with restores
  occurring following the execution of the syscall handler.

  Zeroise GPRS r0, r2-r11, r14-r31, on entry into the kernel for all other
  interrupt sources. The remaining gprs are overwritten by entry macros to
  interrupt handlers, irrespective of whether or not a given handler consumes
  these register values. If an interrupt does not select the IMSR_R12
  IOption, zeroise r12.

  Prior to this commit, r14-r31 are restored on a per-interrupt basis at
  exit, but now they are always restored on 64-bit Book3S. Remove explicit
  REST_NVGPRS invocations on 64-bit Book3S. 32-bit systems do not clear user
  registers on interrupt, and continue to depend on the return value of
  interrupt_exit_user_prepare to determine whether or not to restore
  non-volatiles.

  The mmap_bench benchmark in selftests should rapidly invoke pagefaults. See
  ~0.8% performance regression with this mitigation, but this indicates the
  worst-case performance due to heavier-weight interrupt handlers. This
  mitigation is able to be enabled/disabled through
  CONFIG_INTERRUPT_SANITIZE_REGISTERS.

  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221201071019.1953023-5-rmclure@linux.ibm.com
* powerpc/64s: IOption for MSR stored in r12 (Rohan McLure, 2022-12-02; 1 file, -0/+7)

  Interrupt handlers in asm/exceptions-64s.S contain a great deal of common
  code produced by the GEN_COMMON macros. Currently, at the exit point of the
  macro, r12 will contain the contents of the MSR. A future patch will cause
  these macros to zeroise architected registers to avoid potential
  speculation influence of user data.

  Provide an IOption that signals that r12 must be retained, as the interrupt
  handler assumes it to hold the contents of the MSR.

  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221201071019.1953023-4-rmclure@linux.ibm.com
* powerpc/64: Sanitise common exit code for interrupts (Rohan McLure, 2022-12-02; 1 file, -0/+6)

  Interrupt code is shared between Book3E/S 64-bit systems for interrupt
  handlers. Ensure that exit code correctly restores non-volatile gprs on each
  system when CONFIG_INTERRUPT_SANITIZE_REGISTERS is enabled.

  Also introduce macros for clearing/restoring registers on interrupt entry
  for when this configuration option is either disabled or enabled.

  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221201071019.1953023-3-rmclure@linux.ibm.com
* powerpc/64: Add interrupt register sanitisation macros (Rohan McLure, 2022-12-02; 1 file, -0/+19)

  Include in asm/ppc_asm.h macros to be used in multiple successive patches
  to implement zeroising architected registers in interrupt handlers.
  Registers will be sanitised in this fashion in future patches to reduce the
  speculation influence of user-controlled register values. These mitigations
  will be configurable through the CONFIG_INTERRUPT_SANITIZE_REGISTERS
  Kconfig option.

  Included are macros for conditionally zeroising registers and restoring as
  required with the mitigation enabled. With the mitigation disabled,
  non-volatiles must be restored on demand at separate locations to those
  required by the mitigation.

  Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221201071019.1953023-2-rmclure@linux.ibm.com
* powerpc/64: Add INTERRUPT_SANITIZE_REGISTERS Kconfig (Rohan McLure, 2022-12-02; 1 file, -0/+9)

  Add a Kconfig option for enabling clearing of registers on arrival in an
  interrupt handler. This reduces the speculation influence of registers on
  kernel internals. The option will be consumed by 64-bit systems that
  feature speculation and wish to implement this mitigation. This patch only
  introduces the Kconfig option; no actual mitigations.

  The primary overhead of this mitigation lies in an increased number of
  registers that must be saved and restored by interrupt handlers on Book3S
  systems. Enable by default on Book3E systems, which prior to this patch
  eagerly save and restore register state, meaning that the mitigation when
  implemented will have minimal overhead.

  Acked-by: Nicholas Piggin <npiggin@gmail.com>
  Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221201071019.1953023-1-rmclure@linux.ibm.com
* powerpc/hv-gpci: Fix hv_gpci event list (Kajol Jain, 2022-12-02; 4 files, -2/+58)

  Based on the getPerfCountInfo v1.018 documentation, some of the hv_gpci
  events were deprecated for platform firmware that supports
  counter_info_version 0x8 or above.

  Fix the hv_gpci event list by adding a new attribute group called
  "hv_gpci_event_attrs_v6" and an "ENABLE_EVENTS_COUNTERINFO_V6" macro to
  enable these events for platform firmware that supports
  counter_info_version 0x6 or below, and assign the hv_gpci event list based
  on the counter info version reported by the underlying platform.

  Fixes: 97bf2640184f ("powerpc/perf/hv-gpci: add the remaining gpci requests")
  Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
  Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
  Reviewed-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221130174513.87501-1-kjain@linux.ibm.com
* powerpc/83xx/mpc832x_rdb: call platform_device_put() in error case in of_fsl_spi_probe() (Yang Yingliang, 2022-12-02; 1 file, -1/+1)

  If platform_device_add() is not called or fails, platform_device_del()
  cannot be used to clean up; platform_device_put() must be called instead in
  the error path.

  Fixes: 26f6cb999366 ("[POWERPC] fsl_soc: add support for fsl_spi")
  Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Link: https://lore.kernel.org/r/20221029111626.429971-1-yangyingliang@huawei.com
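  An illustrative reduction of the corrected error handling; the device name
  and surrounding probe logic are abbreviated:

      static int of_fsl_spi_probe_sketch(void)
      {
              struct platform_device *pdev;
              int ret;

              pdev = platform_device_alloc("mpc83xx_spi", 0); /* name illustrative */
              if (!pdev)
                      return -ENOMEM;

              ret = platform_device_add(pdev);
              if (ret)
                      goto err_put;

              return 0;

      err_put:
              /* never (successfully) added: drop the reference, don't _del() */
              platform_device_put(pdev);
              return ret;
      }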
* Merge branch 'topic/qspinlock' into next (Michael Ellerman, 2022-12-02; 8 files, -63/+1215)
|\
  Merge Nick's powerpc qspinlock implementation. From his cover letter:

  This replaces the generic queued spinlock code (like s390 does) with our
  own implementation. Generic PV qspinlock code is causing latency /
  starvation regressions on large systems that are resulting in hard lockups
  reported (mostly in pathological cases).

  The generic qspinlock code has a number of issues important for powerpc
  hardware and hypervisors that aren't easily solved without changing code
  that would impact other architectures. Follow s390's lead and implement our
  own for now.

  Issues for powerpc using generic qspinlocks:

  - The previous lock value should not be loaded with simple loads, and need
    not be passed around from previous loads or cmpxchg results, because
    powerpc uses ll/sc-style atomics which can perform more complex
    operations that do not require this. powerpc implementations tend to
    prefer that loads use larx for improved coherency performance.
  - The queueing process should absolutely minimise the number of stores to
    the lock word to reduce exclusive coherency probes, important for large
    system scalability. The pending logic is counter-productive here.
  - Non-atomic unlock for paravirt locks is important (atomic instructions
    tend to still be more expensive than on x86 CPUs).
  - Yielding to the lock owner is important in the oversubscribed paravirt
    case, which requires storing the owner CPU in the lock word.
  - More control of lock stealing for the paravirt case is important to keep
    latency down on large systems.
  - The lock acquisition operation should always be made with a special
    variant of atomic instructions with the lock hint bit set, including
    (especially) in the queueing paths. This is more a matter of adding more
    arch lock helpers, so not an insurmountable problem for generic code.
| * powerpc/qspinlock: add compile-time tuning adjustments (Nicholas Piggin, 2022-12-02; 2 files, -6/+94)

|   This adds compile-time options that allow the EH lock hint bit to be
|   enabled or disabled, and adds some new options that may or may not help
|   matters. To help with experimentation and tuning.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-18-npiggin@gmail.com
| * powerpc/qspinlock: provide accounting and options for sleepy locks (Nicholas Piggin, 2022-12-02; 2 files, -19/+230)

|   Finding the owner or a queued waiter on a lock with a preempted vcpu is
|   indicative of an oversubscribed guest causing the lock to get into
|   trouble. Provide some options to detect this situation and have new CPUs
|   avoid queueing for a longer time (more steal iterations) to minimise the
|   problems caused by vcpu preemption on the queue.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-17-npiggin@gmail.com
| * powerpc/qspinlock: allow indefinite spinning on a preempted owner (Nicholas Piggin, 2022-12-02; 1 file, -15/+62)

|   Provide an option that holds off queueing indefinitely while the lock
|   owner is preempted. This could reduce queueing latencies for very
|   overcommitted vcpu situations. This is disabled by default.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-16-npiggin@gmail.com
| * powerpc/qspinlock: reduce remote node steal spins (Nicholas Piggin, 2022-12-02; 1 file, -3/+40)

|   Allow for a reduction in the number of times a CPU from a different node
|   than the owner can attempt to steal the lock before queueing. This could
|   bias the transfer behaviour of the lock across the machine and reduce
|   NUMA crossings.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-15-npiggin@gmail.com
| * powerpc/qspinlock: use spin_begin/end API (Nicholas Piggin, 2022-12-02; 1 file, -4/+35)

|   Use the spin_begin/spin_cpu_relax/spin_end APIs in qspinlock, which helps
|   to prevent threads issuing a lot of expensive priority nops which may not
|   have much effect due to immediately executing low then medium priority.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-14-npiggin@gmail.com
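  A sketch of the pattern, assuming the struct qspinlock val field;
  spin_begin(), spin_cpu_relax() and spin_end() are the existing powerpc
  helpers:

      static void wait_until_unlocked(struct qspinlock *lock)
      {
              spin_begin();                   /* drop to low SMT priority once */
              while (atomic_read(&lock->val))
                      spin_cpu_relax();       /* low-priority busy-wait */
              spin_end();                     /* restore medium priority once */
      }

  This avoids repeatedly executing low-then-medium priority nops inside the
  tight loop, which is the waste the commit message describes.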
| * powerpc/qspinlock: allow lock stealing in trylock and lock fastpath (Nicholas Piggin, 2022-12-02; 2 files, -2/+29)

|   This change allows trylock to steal the lock. It also allows the initial
|   lock attempt to steal the lock rather than bailing out and going to the
|   slow path.

|   This gives trylock more strength: without this a continually-contended
|   lock will never permit a trylock to succeed. With this change, the
|   trylock has a small but non-zero chance.

|   It also gives the lock fastpath most of the benefit of passing the
|   reservation back through to the steal loop in the slow path without the
|   complexity.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-13-npiggin@gmail.com
| * powerpc/qspinlock: add ability to prod new queue head CPU (Nicholas Piggin, 2022-12-02; 1 file, -2/+29)

|   After the head of the queue acquires the lock, it releases the next
|   waiter in the queue to become the new head. Add an option to prod the new
|   head if its vCPU was preempted. This may only have an effect if queue
|   waiters are yielding.

|   Disable this option by default for now, i.e., no logical change.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-12-npiggin@gmail.com
| * powerpc/qspinlock: allow propagation of yield CPU down the queue (Nicholas Piggin, 2022-12-02; 1 file, -0/+79)

|   Having all CPUs poll the lock word for the owner CPU that should be
|   yielded to defeats most of the purpose of using MCS queueing for
|   scalability. Yet it may be desirable for queued waiters to yield to a
|   preempted owner.

|   With this change, queue waiters never sample the owner CPU directly from
|   the lock word. The queue head (which is spinning on the lock) propagates
|   the owner CPU back to the next waiter if it finds the owner has been
|   preempted. That waiter then propagates the owner CPU back to the next
|   waiter, and so on.

|   s390 addresses this problem differently, by having queued waiters sample
|   the lock word to find the owner at a low frequency. That has the
|   advantage of being simpler; the advantage of propagation is that the lock
|   word never has to be accessed by queued waiters, and the transfer of
|   cache lines to transmit the owner data is only required when lock holder
|   vCPU preemption occurs.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-11-npiggin@gmail.com
| * powerpc/qspinlock: allow stealing when head of queue yields (Nicholas Piggin, 2022-12-02; 1 file, -3/+56)

|   If the head of queue is preventing stealing but it finds the owner vCPU
|   is preempted, it will yield its cycles to the owner which could cause it
|   to become preempted. Add an option to re-allow stealers before yielding,
|   and disallow them again after returning from the yield.

|   Disable this option by default for now, i.e., no logical change.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-10-npiggin@gmail.com
| * powerpc/qspinlock: implement option to yield to previous node (Nicholas Piggin, 2022-12-02; 1 file, -1/+45)

|   Queued waiters which are not at the head of the queue don't spin on the
|   lock word but their qnode lock word, waiting for the previous queued CPU
|   to release them. Add an option which allows these waiters to yield to the
|   previous CPU if its vCPU is preempted.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-9-npiggin@gmail.com
| * powerpc/qspinlock: paravirt yield to lock owner (Nicholas Piggin, 2022-12-02; 1 file, -12/+87)

|   Waiters spinning on the lock word should yield to the lock owner if the
|   vCPU is preempted. This improves performance when the hypervisor has
|   oversubscribed physical CPUs.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-8-npiggin@gmail.com
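  A hedged sketch of the yield decision; yield_count_of() and
  yield_to_preempted() are the pseries paravirt helpers, while
  decode_owner_cpu() stands in for however the owner is extracted from the
  lock word:

      static void yield_to_locked_owner(struct qspinlock *lock, u32 val)
      {
              int owner = decode_owner_cpu(val);      /* hypothetical decode */
              u32 yield_count = yield_count_of(owner);

              /* an odd yield count means the owner vCPU is descheduled */
              if (yield_count & 1)
                      yield_to_preempted(owner, yield_count);
      }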
| * powerpc/qspinlock: store owner CPU in lock word (Nicholas Piggin, 2022-12-02; 3 files, -4/+22)

|   Store the owner CPU number in the lock word so it may be yielded to, as
|   powerpc's paravirtualised simple spinlocks do.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-7-npiggin@gmail.com
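  An illustrative encoding only; the real bit layout lives in the powerpc
  qspinlock headers and may differ:

      #define _Q_LOCKED_VAL   1U
      #define _Q_OWNER_SHIFT  1
      #define _Q_OWNER_MASK   (0x7fffU << _Q_OWNER_SHIFT)

      /* value stored by the acquiring CPU: locked bit plus its own number */
      static inline u32 queued_spin_encode_locked_val(void)
      {
              return _Q_LOCKED_VAL | (smp_processor_id() << _Q_OWNER_SHIFT);
      }

  Recording the owner in the lock word is what makes the later yield-to-owner
  patches possible.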
| * powerpc/qspinlock: theft prevention to control latency (Nicholas Piggin, 2022-12-02; 2 files, -1/+60)

|   Give the queue head the ability to stop stealers. After a number of spins
|   without successfully acquiring the lock, the queue head sets this, which
|   halts stealing and will assure it is the next owner.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-6-npiggin@gmail.com
| * powerpc/qspinlock: allow new waiters to steal the lock before queueing (Nicholas Piggin, 2022-12-02; 2 files, -9/+124)

|   Allow new waiters to "steal" the lock before queueing. That is, to
|   acquire it while other CPUs have queued. This particularly helps paravirt
|   performance when physical CPUs are oversubscribed, by keeping the lock
|   from becoming a strict FIFO and vCPU preemption causing queue train
|   wrecks.

|   The new __queued_spin_trylock_steal() function is put in qspinlock.h to
|   save having to move it, because it will be used there by a later change.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-5-npiggin@gmail.com
| * powerpc/qspinlock: convert atomic operations to assembly (Nicholas Piggin, 2022-12-02; 3 files, -42/+68)

|   This uses more optimal ll/sc style access patterns (rather than cmpxchg),
|   and also sets the EH=1 lock hint on those operations which acquire
|   ownership of the lock.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-4-npiggin@gmail.com
| * powerpc/qspinlock: use a half-word store to unlock to avoid larx/stcx. (Nicholas Piggin, 2022-12-02; 2 files, -7/+18)

|   The first 16 bits of the lock are only modified by the owner, and other
|   modifications always use atomic operations on the entire 32 bits, so
|   unlocks can use plain stores on the 16 bits. This is the same kind of
|   optimisation done by core qspinlock code.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-3-npiggin@gmail.com
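  A sketch of the idea with an assumed union layout; only the owner ever
  writes the low half-word, so no larx/stcx. reservation is needed to clear
  it:

      struct qspinlock_sketch {
              union {
                      atomic_t val;
                      struct {
      #ifdef __BIG_ENDIAN__
                              u16 tail;
                              u16 locked;
      #else
                              u16 locked;
                              u16 tail;
      #endif
                      };
              };
      };

      static inline void queued_spin_unlock(struct qspinlock_sketch *lock)
      {
              /* release store pairs with the acquire on the lock side */
              smp_store_release(&lock->locked, 0);
      }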
| * powerpc/qspinlock: add mcs queueing for contended waiters (Nicholas Piggin, 2022-12-02; 3 files, -6/+214)

|   This forms the basis of the qspinlock slow path.

|   Like generic qspinlocks and unlike the vanilla MCS algorithm, the lock
|   owner does not participate in the queue, only waiters. The first waiter
|   spins on the lock word, then when the lock is released it takes ownership
|   and unqueues the next waiter. This is how qspinlocks can be implemented
|   with the spinlock API -- lock owners don't need a node, only waiters do.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221126095932.1234527-2-npiggin@gmail.com
| * powerpc/qspinlock: powerpc qspinlock implementation (Nicholas Piggin, 2022-12-02; 8 files, -69/+65)

|   Add a powerpc specific implementation of queued spinlocks. This is the
|   build framework with a very simple (non-queued) spinlock implementation
|   to begin with. Later changes add queueing, and other features and
|   optimisations one-at-a-time. It is done this way to more easily see how
|   the queued spinlocks are built, and to make performance and correctness
|   bisects more useful.

|   Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
|   [mpe: Drop paravirt.h & processor.h changes to fix 32-bit build]
|   [mpe: Fix 32-bit build of qspinlock.o & disallow GENERIC_LOCKBREAK per Nick]
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/CONLLQB6DCJU.2ZPOS7T6S5GRR@bobo
* | selftests/powerpc: Add ptrace setup_core_pattern() null-terminator (Benjamin Gray, 2022-12-02; 1 file, -1/+3)

|   - malloc() does not zero the buffer,
|   - fread() does not null-terminate its output,
|   - `cat /proc/sys/kernel/core_pattern | hexdump -C` shows the file is not
|     inherently null-terminated

|   So using string operations on the buffer is risky. Explicitly add a null
|   character to the end to make it safer.

|   Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
|   Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|   Link: https://lore.kernel.org/r/20221128041948.58339-3-bgray@linux.ibm.com
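  A sketch of the safer read as a standalone user-space illustration; the
  selftest's own helper differs in detail:

      #include <limits.h>
      #include <stdio.h>
      #include <stdlib.h>

      static char *read_core_pattern(void)
      {
              FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
              char *buf;
              size_t len;

              if (!f)
                      return NULL;

              buf = malloc(PATH_MAX);         /* contents are uninitialized */
              if (!buf) {
                      fclose(f);
                      return NULL;
              }

              len = fread(buf, 1, PATH_MAX - 1, f);   /* adds no '\0' */
              fclose(f);
              buf[len] = '\0';                /* now safe for str*() use */
              return buf;
      }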