linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	PCI/hotplug: PowerPC PowerNV PCI hotplug driver	Gavin Shan	2016-06-21	4	-0/+750
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds standalone driver to support PCI hotplug for PowerPC PowerNV platform that runs on top of skiboot firmware. The firmware identifies hotpluggable slots and marked their device tree node with proper "ibm,slot-pluggable" and "ibm,reset-by-firmware". The driver scans device tree nodes to create/register PCI hotplug slot accordingly. The PCI slots are organized in fashion of tree, which means one PCI slot might have parent PCI slot and parent PCI slot possibly contains multiple child PCI slots. At the plugging time, the parent PCI slot is populated before its children. The child PCI slots are removed before their parent PCI slot can be removed from the system. If the skiboot firmware doesn't support slot status retrieval, the PCI slot device node shouldn't have property "ibm,reset-by-firmware". In that case, none of valid PCI slots will be detected from device tree. The skiboot firmware doesn't export the capability to access attention LEDs yet and it's something for TBD. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Functions to get/set PCI slot state	Gavin Shan	2016-06-21	5	-1/+115
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This exports 4 functions, which base on the corresponding OPAL APIs to get/set PCI slot status. Those functions are going to be used by PowerNV PCI hotplug driver: pnv_pci_get_device_tree() opal_get_device_tree() pnv_pci_get_presence_state() opal_pci_get_presence_state() pnv_pci_get_power_state() opal_pci_get_power_state() pnv_pci_set_power_state() opal_pci_set_power_state() Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Introduce pnv_pci_get_slot_id()	Gavin Shan	2016-06-21	2	-0/+40
\| \| \| \| \| \| \| \| \| \|	This introduces pnv_pci_get_slot_id() to get the hotpluggable PCI slot ID from the corresponding device node. It will be used by hotplug driver. Requested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Use PCI slot reset infrastructure	Gavin Shan	2016-06-21	1	-1/+40
\| \| \| \| \| \| \| \| \| \| \| \| \|	The (OPAL) firmware might provide the PCI slot reset capability which is identified by property "ibm,reset-by-firmware" on the PCI slot associated device node. This routes the reset request to firmware if "ibm,reset-by-firmware" exists in the PCI slot device node. Otherwise, the reset is done inside kernel as before. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Support PCI slot ID	Gavin Shan	2016-06-21	3	-6/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The reset and poll functionality from (OPAL) firmware supports PHB and PCI slot at same time. They are identified by ID. This supports PCI slot ID by: * Rename the argument name for opal_pci_reset() and opal_pci_poll() accordingly * Rename pnv_eeh_phb_poll() to pnv_eeh_poll() and adjust its argument name. * One macro is added to produce PCI slot ID. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pci: Delay populating pdn	Gavin Shan	2016-06-21	9	-59/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The pdn (struct pci_dn) instances are allocated from memblock or bootmem when creating PCI controller (hoses) in setup_arch(). PCI hotplug, which will be supported by proceeding patches, releases PCI device nodes and their corresponding pdn on unplugging event. The memory chunks for pdn instances allocated from memblock or bootmem are hard to reused after being released. This delays creating pdn by pci_devs_phb_init() from setup_arch() to core_initcall() so that they are allocated from slab. The memory consumed by pdn can be released to system without problem during PCI unplugging time. It indicates that pci_dn is unavailable in setup_arch() and the the fixup on pdn (like AGP's) can't be carried out that time. We have to do that in pcibios_root_bridge_prepare() on maple/pasemi/powermac platforms where/when the pdn is available. pcibios_root_bridge_prepare is called from subsys_initcall() which is executed after core_initcall() so the code flow does not change. At the mean while, the EEH device is created when pdn is populated, meaning pdn and EEH device have same life cycle. In turn, we needn't call eeh_dev_init() to create EEH device explicitly. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pci: Update bridge windows on PCI plug	Gavin Shan	2016-06-21	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On the PCI plugging event, PCI slot's subordinate devices are scanned and their (IO and MMIO) resources are assigned. Platform dependent resources (PE#, IO/MMIO/DMA windows) are allocated or created on updating windows of the slot's upstream bridge. This updates the windows of the hot plugged slot's upstream bridge in pcibios_finish_adding_to_bus() so that the platform resources (PE#, IO/MMIO/DMA segments) are allocated or created accordingly. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Dynamically release PE	Gavin Shan	2016-06-21	2	-0/+175
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This supports releasing PEs dynamically. A reference count is introduced to PE representing number of PCI devices associated with the PE. The reference count is increased when PCI device joins the PE and decreased when PCI device leaves the PE in pnv_pci_release_device(). When the count becomes zero, the PE and its consumed resources are released. Note that the count is accessed concurrently. So a counter with "int" type is enough here. In order to release the sources consumed by the PE, couple of helper functions are introduced as below: * pnv_pci_ioda1_unset_window() - Unset IODA1 DMA32 window * pnv_pci_ioda1_release_dma_pe() - Release IODA1 DMA32 segments * pnv_pci_ioda2_release_dma_pe() - Release IODA2 DMA resource * pnv_ioda_release_pe_seg() - Unmap IO/M32/M64 segments Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Make pnv_ioda_deconfigure_pe() visible	Gavin Shan	2016-06-21	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \|	pnv_ioda_deconfigure_pe() is visible only when CONFIG_PCI_IOV is enabled. The function will be used to tear down PE's associated mapping in PCI hotplug path that doesn't depend on CONFIG_PCI_IOV. This makes pnv_ioda_deconfigure_pe() visible and not depend on CONFIG_PCI_IOV. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Extend PCI bridge resources	Gavin Shan	2016-06-21	1	-0/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PCI slots are associated with root port or downstream ports of the PCIe switch connected to root port. When adapter is hot added to the PCI slot, it usually requests more IO or memory resource from the directly connected parent bridge (port) and update the bridge's windows accordingly. The resource windows of upstream bridges can't be updated automatically. It possibly leads to unbalanced resource across the bridges: The window of downstream bridge is overruning that of upstream bridge. The IO or MMIO path won't work. This resolves the above issue by extending bridge windows of root port and upstream port of the PCIe switch connected to the root port to PHB's windows. The windows of root port and bridge behind that are extended to the PHB's windows to accomodate the PCI hotplug happening in future. The PHB's 64KB 32-bits MSI region is included in bridge's M32 windows (in hardware) though it's excluded in the corresponding resource, as the bridge's M32 windows have 1MB as their minimal alignment. We observed EEH error during system boot when the MSI region is included in bridge's M32 window. This excludes top 1MB (including 64KB 32-bits MSI region) region from bridge's M32 windows when extending them. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Setup PE for root bus	Gavin Shan	2016-06-21	2	-10/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no parent bridge for root bus, meaning pcibios_setup_bridge() isn't invoked for root bus. The PE for root bus is the ancestor of other PEs in PELTV. It means we need PE for root bus populated before all others. This populates the PE for root bus in pcibios_setup_bridge() path if it's not populated yet. The PE number next to the reserved one is used as the PE# to avoid holes in continuous M64 space. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Create PEs in pcibios_setup_bridge()	Gavin Shan	2016-06-21	1	-115/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the PEs and their associated resources are assigned in ppc_md.pcibios_fixup() except those used by SRIOV VFs. The function is called for once after PCI probing and resources assignment is completed. So it's obviously not hotplug friendly. This creates PEs dynamically in pcibios_setup_bridge() that is called for the event during system bootup and PCI hotplug: updating PCI bridge's windows after resource assignment/reassignment are done. In partial hotplug case, not all PCI devices included to one particular PE are unplugged and plugged again, we just need unbinding/binding the hot added PCI devices with the corresponding PE without creating new one. The change is applied to IODA1 and IODA2 PHBs only. The behaviour on NPU PHBs aren't changed. There are no PCI bridges on NPU PHBs, meaning pcibios_setup_bridge() won't be invoked there. We have to use old path (pnv_pci_ioda_fixup()) to setup PEs on NPU PHBs. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Allocate PE# in reverse order	Gavin Shan	2016-06-21	1	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PE number for one particular PE can be allocated dynamically or reserved according to the consumed M64 (64-bits prefetchable) segments of the PE. The M64 segment can't be remapped to arbitrary PE, meaning the PE number is determined according to the index of the consumed M64 segment. As below figure shows, M64 resource grows from low to high end, meaning the PE (number) reserved according to M64 segment grows from low to high end as well, so does the dynamically allocated PE number. It will lead to conflict: PE number (M64 segment) reserved by dynamic allocation is required by hot added PCI adapter at later point. It fails the PCI hotplug because of the PE number can't be reserved based on the index of the consumed M64 segment. +---+---+---+---+---+--------------------------------+-----+ \| 0 \| 1 \| 2 \| 3 \| 4 \| ....... \| 255 \| +---+---+---+---+---+--------------------------------+-----+ PE number for dynamic allocation -----------------> PE number reserved for M64 segment -----------------> To resolve above conflicts, this forces the PE number to be allocated dynamically in reverse order. With this patch applied, the PE numbers are reserved in ascending order, but allocated dynamically in reverse order. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Increase PE# capacity	Gavin Shan	2016-06-21	2	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Each PHB maintains an array helping to translate 2-bytes Request ID (RID) to PE# with the assumption that PE# takes one byte, meaning that we can't have more than 256 PEs. However, pci_dn->pe_number already had 4-bytes for the PE#. This extends the PE# capacity for every PHB. After that, the PE number is represented by 4-bytes value. Then we can reuse IODA_INVALID_PE to check the PE# in phb->pe_rmap[] is valid or not. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Move pnv_pci_ioda_setup_opal_tce_kill() around	Gavin Shan	2016-06-21	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pnv_pci_ioda_setup_opal_tce_kill() called by pnv_ioda_setup_dma() to remap the TCE kill regiter. What's done in pnv_ioda_setup_dma() will be covered in pcibios_setup_bridge() which is invoked on each PCI bridge. It means we will possibly remap the TCE kill register for multiple times and it's unnecessary. This moves pnv_pci_ioda_setup_opal_tce_kill() to where the PHB is initialized (pnv_pci_init_ioda_phb()) to avoid above issue. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/powernv: Remove PCI_RESET_DELAY_US	Gavin Shan	2016-06-21	1	-3/+0
\| \| \| \| \| \| \| \| \|	The macro defined in arch/powerpc/platforms/powernv/pci.c isn't used by anyone. Just remove it. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pci: Override pcibios_setup_bridge()	Gavin Shan	2016-06-21	2	-0/+10
\| \| \| \| \| \| \| \| \| \| \|	This overrides pcibios_setup_bridge() that is called to update PCI bridge windows when PCI resource assignment is completed, to assign PE and setup various (resource) mapping for the PE in subsequent patches. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	PCI: Add pcibios_setup_bridge()	Gavin Shan	2016-06-21	2	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, PowerPC PowerNV platform utilizes ppc_md.pcibios_fixup(), which is called for once after PCI probing and resource assignment are completed, to allocate platform required resources for PCI devices: PE#, IO and MMIO mapping, DMA address translation (TCE) table etc. Obviously, it's not hotplug friendly. This adds weak function pcibios_setup_bridge(), which is called by pci_setup_bridge(). PowerPC PowerNV platform will reuse the function to assign above platform required resources to newly plugged PCI devices during PCI hotplug in subsequent patches. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: export cpu_to_core_id()	Mauricio Faria de Oliveira	2016-06-21	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Export cpu_to_core_id(). This will be used by the lpfc driver. This enables topology_core_id() from <linux/topology.h> (defined to cpu_to_core_id() in arch/powerpc/include/asm/topology.h) to be used by (non-builtin) modules. That is arch-neutral, already used by eg, drivers/base/topology.c, but it is builtin (obj-y in Makefile) thus didn't need the export. Since the module uses topology_core_id() and this is defined to cpu_to_core_id(), it needs the export, otherwise: ERROR: "cpu_to_core_id" [drivers/scsi/lpfc/lpfc.ko] undefined! Tested on next-20160601. Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	selftests/powerpc: Load Monitor Register Tests	Jack Miller	2016-06-21	6	-1/+227
\| \| \| \| \| \| \| \| \| \| \|	Adds two tests. One is a simple test to ensure that the new registers LMRR and LMSER are properly maintained. The other actually uses the existing EBB test infrastructure to test that LMRR and LMSER behave as documented. Signed-off-by: Jack Miller <jack@codezen.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Load Monitor Register Support	Jack Miller	2016-06-21	4	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This enables new registers, LMRR and LMSER, that can trigger an EBB in userspace code when a monitored load (via the new ldmx instruction) loads memory from a monitored space. This facility is controlled by a new FSCR bit, LM. This patch disables the FSCR LM control bit on task init and enables that bit when a load monitor facility unavailable exception is taken for using it. On context switch, this bit is then used to determine whether the two relevant registers are saved and restored. This is done lazily for performance reasons. Signed-off-by: Jack Miller <jack@codezen.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Improve FSCR init and context switching	Michael Neuling	2016-06-21	3	-9/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a few issues with FSCR init and switching. In commit 152d523e6307 ("powerpc: Create context switch helpers save_sprs() and restore_sprs()") we moved the setting of the FSCR register from inside an CPU_FTR_ARCH_207S section to inside just a CPU_FTR_ARCH_DSCR section. Hence we are setting FSCR on POWER6/7 where the FSCR doesn't exist. This is harmless but we shouldn't do it. Also, we can simplify the FSCR context switch. We don't need to go through the calculation involving dscr_inherit. We can just restore what we saved last time. We also set an initial value in INIT_THREAD, so that pid 1 which is cloned from that gets a sane value. Based on patch by Jack Miller. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Fix misleading comment in early_setup_secondary()	Madhavan Srinivasan	2016-06-21	1	-1/+1
\| \| \| \| \| \| \| \| \|	Current comment in the early_setup_secondary() for paca->soft_enabled update is misleading. Comment should say to Mark interrupts "disabled" instead of "enabled". Fix the typo. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/kprobes: Remove kretprobe_trampoline_holder.	Thiago Jung Bauermann	2016-06-21	1	-6/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes the following testsuite failure: $ sudo ./perf test -v kallsyms 1: vmlinux symtab matches kallsyms : --- start --- test child forked, pid 12489 Using /proc/kcore for kernel object code Looking at the vmlinux_path (8 entries long) Using /boot/vmlinux for symbols 0xc00000000003d300: diff name v: .kretprobe_trampoline_holder k: kretprobe_trampoline Maps only in vmlinux: c00000000086ca38-c000000000879b6c 87ca38 [kernel].text.unlikely c000000000879b6c-c000000000bf0000 889b6c [kernel].meminit.text c000000000bf0000-c000000000c53264 c00000 [kernel].init.text c000000000c53264-d000000004250000 c63264 [kernel].exit.text d000000004250000-d000000004450000 0 [libcrc32c] d000000004450000-d000000004620000 0 [xfs] d000000004620000-d000000004680000 0 [autofs4] d000000004680000-d0000000046e0000 0 [x_tables] d0000000046e0000-d000000004780000 0 [ip_tables] d000000004780000-d0000000047e0000 0 [rng_core] d0000000047e0000-ffffffffffffffff 0 [pseries_rng] Maps in vmlinux with a different name in kallsyms: Maps only in kallsyms: d000000000000000-f000000000000000 1000000000010000 [kernel.kallsyms] f000000000000000-ffffffffffffffff 3000000000010000 [kernel.kallsyms] test child finished with -1 ---- end ---- vmlinux symtab matches kallsyms: FAILED! The problem is that the kretprobe_trampoline symbol looks like this: $ eu-readelf -s /boot/vmlinux G kretprobe_trampoline 2431: c000000001302368 24 NOTYPE LOCAL DEFAULT 37 kretprobe_trampoline_holder 2432: c00000000003d300 8 FUNC LOCAL DEFAULT 1 .kretprobe_trampoline_holder 97543: c00000000003d300 0 NOTYPE GLOBAL DEFAULT 1 kretprobe_trampoline Its type is NOTYPE, and its size is 0, and this is a problem because symbol-elf.c:dso__load_sym skips function symbols that are not STT_FUNC or STT_GNU_IFUNC (this is determined by elf_sym__is_function). Even if the type is changed to STT_FUNC, when dso__load_sym calls symbols__fixup_duplicate, the kretprobe_trampoline symbol is dropped in favour of .kretprobe_trampoline_holder because the latter has non-zero size (as determined by choose_best_symbol). With this patch, all vmlinux symbols match /proc/kallsyms and the testcase passes. Commit c1c355ce14c0 ("x86/kprobes: Get rid of kretprobe_trampoline_holder()") gets rid of kretprobe_trampoline_holder altogether on x86. This commit does the same on powerpc. This change introduces no regressions on the perf and ftracetest testsuite results. Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pci: Fix SRIOV not building without EEH enabled	Russell Currey	2016-06-17	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Book3E CPUs (and possibly other configs), it is possible to have SRIOV (CONFIG_PCI_IOV) set without CONFIG_EEH. The SRIOV code does not check for this, and if EEH is disabled, pci_dn.c fails to build. Fix this by gating the EEH-specific code in the SRIOV implementation behind CONFIG_EEH. Fixes: 39218cd0 ("powerpc/eeh: EEH device for VF") Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: Make vPHB device node match adapter's	Frederic Barrat	2016-06-16	1	-11/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On bare-metal, when a device is attached to the cxl card, lsvpd shows a location code such as (with cxlflash): # lsvpd -l sg22 ... YL U78CB.001.WZS0073-P1-C33-B0-T0-L0 which makes it hard to easily identify the cxl adapter owning the flash device, since in this example C33 refers to a P8 processor. lsvpd looks in the parent devices until it finds a location code, so the device node for the vPHB ends up being used. By reusing the device node of the adapter for the vPHB, lsvpd shows: # lsvpd -l sg16 ... YL U78C9.001.WZS09XA-P1-C7-B1-T0-L3 where C7 is the PCI slot of the cxl adapter. On powerVM, the vPHB was already using the adapter device node, so there's no change there. Tested by cxlflash on bare-metal and powerVM. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: Add support for CAPP DMA mode	Ian Munsie	2016-06-16	4	-2/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds support for using CAPP DMA mode, which is required for XSL based cards such as the Mellanox CX4 to function. This is currently an RFC as it depends on the corresponding support to be merged into skiboot first, which was submitted here: http://patchwork.ozlabs.org/patch/625582/ In the event that the skiboot on the system does not have the above support, it will indicate as such in the kernel log and abort the init process. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: Abstract the differences between the PSL and XSL	Frederic Barrat	2016-06-16	4	-48/+218
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The XSL (Translation Service Layer) is a stripped down version of the PSL (Power Service Layer) used in some cards such as the Mellanox CX4. Like the PSL, it implements the CAIA architecture, but has a number of differences, mostly in it's implementation dependent registers. This adds an ops structure to abstract these differences to bring initial support for XSL CAPI devices. The XSL does not implement the optional architected SERR register, however while it treats it as a reserved register and should work with no special treatment, attempting to access it will cause the XSL_FEC (First Error Capture) register to be filled out, preventing it from capturing any subsequent errors. Therefore, this patch also prevents the kernel from trying to set up the SERR register so that the FEC register may still be useful, and to save one interrupt. The XSL also uses a special DMA cxl mode, which uses a slightly different init sequence for the CAPP and PHB. The kernel support for this will be in a future patch once the corresponding support has been merged into skiboot. Co-authored-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: Update process element after allocating interrupts	Ian Munsie	2016-06-16	4	-17/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the kernel API, it is possible to attempt to allocate AFU interrupts after already starting a context. Since the process element structure used by the hardware is only filled out at the time the context is started, it will not be updated with the interrupt numbers that have just been allocated and therefore AFU interrupts will not work unless they were allocated prior to starting the context. This can present some difficulties as each CAPI enabled PCI device in the kernel API has a default context, which may need to be started very early to enable translations, potentially before interrupts can easily be set up. This patch makes the API more flexible to allow interrupts to be allocated after a context has already been started and takes care of updating the PE structure used by the hardware and notifying it to discard any cached copy it may have. The update is currently performed via a terminate/remove/add sequence. This is necessary on some hardware such as the XSL that does not properly support the update LLCMD. Note that this is only supported on powernv at present - attempting to perform this ordering on PowerVM will raise a warning. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	cxl: static-ify variables to fix sparse warnings	Andrew Donnellan	2016-06-16	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Make a couple more variables static. Found by sparse. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: fbarrat@linux.vnet.ibm.com Reviewed-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/align: Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE	Daniel Axtens	2016-06-16	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sparse complains that it doesn't know what REG_BYTE is: arch/powerpc/kernel/align.c:313:29: error: undefined identifier 'REG_BYTE' REG_BYTE is defined differently based on whether we're compiling for LE, BE32 or BE64. Sparse apparently doesn't provide __BIG_ENDIAN__ or __LITTLE_ENDIAN__, which means we get no definition. Rather than check for __BIG_ENDIAN__ and then separately for __LITTLE_ENDIAN__, just switch the #ifdef to check for __BIG_ENDIAN__ and then #else we define the little endian version. Technically that's dicey because PDP_ENDIAN is also a possibility, but we already do it in a lot of places so one more hardly matters. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/sparse: Include headers containing prototypes	Daniel Axtens	2016-06-16	2	-0/+3
\| \| \| \| \| \| \| \| \| \|	Sometimes headers that provide prototypes for functions are accidentally omitted from the files that define the functions. Fix a couple of times that occurs. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Introduce asm-prototypes.h	Daniel Axtens	2016-06-16	7	-0/+81
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Sparse picked up a number of functions that are implemented in C and then only referred to in asm code. This introduces asm-prototypes.h, which provides a place for prototypes of these functions. This silences some sparse warnings. Signed-off-by: Daniel Axtens <dja@axtens.net> [mpe: Add include guards, clean up copyright & GPL text] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/sparse: make some things static	Daniel Axtens	2016-06-16	4	-7/+7
\| \| \| \| \| \| \| \|	This is just a smattering of things picked up by sparse that should be made static. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Add array bounds checking to crash_shutdown_handlers	Suraj Jitindar Singh	2016-06-16	1	-4/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The array crash_shutdown_handles is an array of size CRASH_HANDLER_MAX+1 containing up to CRASH_HANDLER_MAX shutdown_handlers. It is assumed to be NULL terminated, which it is under normal circumstances. Array accesses in the functions crash_shutdown_unregister() and default_machine_crash_shutdown() rely on this NULL termination property when traversing this list and don't protect again out of bounds accesses. If the NULL terminator were somehow overwritten these functions could potentially access out of the bounds of the array. Shrink the array to size CRASH_HANDLER_MAX and implement explicit array bounds checking when accessing the elements of the crash_shutdown_handles[] array in crash_shutdown_unregister() and default_machine_crash_shutdown(). Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/mm: Ensure "special" zones are empty	Oliver O'Halloran	2016-06-16	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The mm zone mechanism was traditionally used by arch specific code to partition memory into allocation zones. However there are several zones that are managed by the mm subsystem rather than the architecture. Most architectures set the max PFN of these special zones to zero, however on powerpc we set them to ~0ul. This, in conjunction with a bug in free_area_init_nodes() results in all of system memory being placed in ZONE_DEVICE when enabled. Device memory cannot be used for regular kernel memory allocations so this will cause a kernel panic at boot. Given the planned addition of more mm managed zones (ZONE_CMA) we should aim to be consistent with every other architecture and set the max PFN for these zones to zero. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/asm: Remove unused symbols in asm-offsets.c	Rashmica Gupta	2016-06-16	1	-49/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	THREAD_DSCR: Added in efcac6589a27 "powerpc: Per process DSCR + some fixes (try#4)" Last usage removed in 152d523e6307 "powerpc: Create context switch helpers save_sprs() and restore_sprs()" THREAD_DSCR_INHERIT: Added in 714332858bfd "powerpc: Restore correct DSCR in context switch" Last usage removed in 152d523e6307 "powerpc: Create context switch helpers save_sprs() and restore_sprs()" THREAD_TAR: Added in 2468dcf641e4 "powerpc: Add support for context switching the TAR register" Last usage removed in 152d523e6307 "powerpc: Create context switch helpers save_sprs() and restore_sprs()" THREAD_BESCR, THREAD_EBBHR and THREAD_EBBRR: Added in 9353374b8e15 "powerpc: Context switch the new EBB SPRs" Last usage removed in 152d523e6307 "powerpc: Create context switch helpers save_sprs() and restore_sprs()" THREAD_SIAR, THREAD_SDAR, THREAD_SIER, THREAD_MMCR0, and THREAD_MMCR2: Added in 59affcd3e460 "powerpc: Context switch more PMU related SPRs" Last usage removed in b11ae95100f7 "powerpc: Partial revert of "Context switch more PMU related SPRs"" PACA_LOCK_TOKEN: Added in 9e368f291560 "KVM: PPC: book3s_hv: Add support for PPC970-family processors" Last usage removed in c17b98cf6028 "KVM: PPC: Book3S HV: Remove code for PPC970 processors" HCALL_STAT_SIZE, HCALL_STAT_CALLS, HCALL_STAT_TB and HCALL_STAT_PURR: Added in 57852a853b0d "[POWERPC] powerpc: Instrument Hypervisor Calls" Last usage removed in c8cd093a6e9f "powerpc: tracing: Add hypervisor call tracepoints" VCPU_EPLC: Added in d30f6e480055 "KVM: PPC: booke: category E.HV (GS-mode) support" Never used. CPU_DOWN_FLUSH: Added in e7affb1dba0e "powerpc/cache: add cache flush operation for various e500" Never used. CFG_STAMP_XSEC: Added in 14cf11af6cf6 "powerpc: Merge enough to start building in arch/powerpc." Last usage removed in 0e469db8f70c "powerpc: Rework VDSO gettimeofday to prevent time going backwards" KVM_LPCR: Added in aa04b4cc5be6 "KVM: PPC: Allocate RMAs (Real Mode Areas) at boot for use by guests" Last usage removed in a0144e2a6b0b "KVM: PPC: Book3S HV: Store LPCR value for each virtual core" GPR15, GPR16, GPR17, GPR18, GPR19, GPR20, GPR21, GPR22, GPR23, GPR24, GPR25, GPR26, GPR27, GPR28, GPR29, GPR30 and GPR31: Added in 14cf11af6cf6 "powerpc: Merge enough to start building in arch/powerpc." Never used. VCPU_SHADOW_FSCR: Added in 616dff860282 "KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR" Never used. VCPU_SHADOW_SRR1: Added in a2d56020d1d9 "KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu" Never used. KVM_SPLIT_SIZE: Added in b4deba5c41e9 "KVM: PPC: Book3S HV: Implement dynamicmicro-threading on POWER8" Never used. VCPU_VCPUID: Added in de56a948b918 "KVM: PPC: Add support for Book3S processors in hypervisor mode" Last usage removed 1b400ba0cd24 "KVM: PPC: Book3S HV: Improve handling of local vs. global TLB invalidations" _MQ: Added in 14cf11af6cf6 "powerpc: Merge enough to start building in arch/powerpc." Never used. AUDITCONTEXT: Added in 14cf11af6cf6 "powerpc: Merge enough to start building in arch/powerpc." Last usage removed in 401d1f029beb "[PATCH] syscall entry/exit revamp" CLONE_VM: Added in 14cf11af6cf6 "powerpc: Merge enough to start building in arch/powerpc." Currently unused. CLONE_UNTRACED: Added in 14cf11af6cf6 "powerpc: Merge enough to start building in arch/powerpc." Currently unused. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> [mpe: Munge change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/numa: Fix multiple bugs in memory_hotplug_max()	Bharata B Rao	2016-06-14	1	-1/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	memory_hotplug_max() uses hot_add_drconf_memory_max() to get maxmimum addressable memory by referring to ibm,dyanamic-memory property. There are three problems with the current approach: 1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes all the LMBs of the guest, but that is not true for PowerKVM which populates only DR LMBs (LMBs that can be hotplugged/removed) in that property. 2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive at the max possible address. Since ibm,dynamic-memory doesn't include RMA LMBs, the address thus obtained will be less than the actual max address. For example, if max possible memory size is 32G, with lmb-size of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA which won't be present here). hot_add_drconf_memory_max() would then return the max addressable memory as 127 * 256MB = 31.75GB, the max address should have been 32G which is what ibm,lrdr-capacity shows. 3 In PowerKVM, there can be a gap between the end of boot time RAM and beginning of hotplug RAM area. So just multiplying lmb-count with lmb-size will not provide the correct max possible address for PowerKVM. This patch fixes 1 by using ibm,lrdr-capacity property to return the max addressable memory whenever the property is present. Then it fixes 2 & 3 by fetching the address of the last LMB in ibm,dynamic-memory property. Fixes: cd34206e949b ("powerpc: Add memory_hotplug_max()") Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/numa: Fix whitespace in hot_add_drconf_memory_max()	Bharata B Rao	2016-06-14	1	-10/+10
\| \| \| \| \| \|	Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/spinlock: Fix spin_unlock_wait()	Boqun Feng	2016-06-14	2	-22/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is an ordering issue with spin_unlock_wait() on powerpc, because the spin_lock primitive is an ACQUIRE and an ACQUIRE is only ordering the load part of the operation with memory operations following it. Therefore the following event sequence can happen: CPU 1 CPU 2 CPU 3 ================== ==================== ============== spin_unlock(&lock); spin_lock(&lock): r1 = lock; // r1 == 0; o = object; o = READ_ONCE(object); // reordered here object = NULL; smp_mb(); spin_unlock_wait(&lock); lock = 1; smp_mb(); o->dead = true; < o = READ_ONCE(object); > // reordered upwards if (o) // true BUG_ON(o->dead); // true!! To fix this, we add a "nop" ll/sc loop in arch_spin_unlock_wait() on ppc, the "nop" ll/sc loop reads the lock value and writes it back atomically, in this way it will synchronize the view of the lock on CPU1 with that on CPU2. Therefore in the scenario above, either CPU2 will fail to get the lock at first or CPU1 will see the lock acquired by CPU2, both cases will eliminate this bug. This is a similar idea as what Will Deacon did for ARM64 in: d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers") Furthermore, if the "nop" ll/sc figures out the lock is locked, we actually don't need to do the "nop" ll/sc trick again, we can just do a normal load+check loop for the lock to be released, because in that case, spin_unlock_wait() is called when someone is holding the lock, and the store part of the "nop" ll/sc happens before the lock release of the current lock holder: "nop" ll/sc -> spin_unlock() and the lock release happens before the next lock acquisition: spin_unlock() -> spin_lock() <next holder> which means the "nop" ll/sc happens before the next lock acquisition: "nop" ll/sc -> spin_unlock() -> spin_lock() <next holder> With a smp_mb() preceding spin_unlock_wait(), the store of object is guaranteed to be observed by the next lock holder: STORE -> smp_mb() -> "nop" ll/sc -> spin_unlock() -> spin_lock() <next holder> This patch therefore fixes the issue and also cleans the arch_spin_unlock_wait() a little bit by removing superfluous memory barriers in loops and consolidating the implementations for PPC32 and PPC64 into one. Suggested-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> [mpe: Inline the "nop" ll/sc loop and set EH=0, munge change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pseries: Fix trivial typo in function name	Greg Kurz	2016-06-14	1	-3/+3
\| \| \| \| \|	Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pseries: Remove unused pstore headers in nvram.c	Geliang Tang	2016-06-14	1	-2/+0
\| \| \| \| \| \| \| \|	Since the pstore code has moved away from nvram.c, remove unused pstore headers pstore.h and kmsg_dump.h. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Define and use PPC64_ELF_ABI_v2/v1	Michael Ellerman	2016-06-14	16	-35/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're approaching 20 locations where we need to check for ELF ABI v2. That's fine, except the logic is a bit awkward, because we have to check that _CALL_ELF is defined and then what its value is. So check it once in asm/types.h and define PPC64_ELF_ABI_v2 when ELF ABI v2 is detected. We also have a few places where what we're really trying to check is that we are using the 64-bit v1 ABI, ie. function descriptors. So also add a #define for that, which simplifies several checks. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pseries: Remove MPIC from pseries event sources	Rashmica Gupta	2016-06-14	1	-40/+13
\| \| \| \| \| \| \| \| \|	MPIC was only used by Power3 which is now unsupported, so remove MPIC code. XICS is now the only supported interrupt controller for pSeries so do some cleanups too. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pseries: Remove MPIC from pseries cpu hotplug	Rashmica Gupta	2016-06-14	1	-13/+0
\| \| \| \| \| \| \| \|	MPIC was only used by Power3 which is now unsupported, so remove MPIC code. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pseries: Remove MPIC from pseries kexec	Rashmica Gupta	2016-06-14	3	-30/+3
\| \| \| \| \| \| \| \| \|	MPIC was only used by Power3 which is now unsupported, so remove MPIC code. XICS is now the only supported interrupt controller for pSeries so do some cleanups too. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pseries: Remove MPIC from pseries smp	Rashmica Gupta	2016-06-14	3	-31/+8
\| \| \| \| \| \| \| \| \|	MPIC was only used by Power3 which is now unsupported, so remove MPIC code. XICS is now the only supported interrupt controller for pSeries so do some cleanups too. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/pseries: Drop support for MPIC in pseries	Rashmica Gupta	2016-06-14	1	-72/+5
\| \| \| \| \| \| \| \| \|	MPIC was only used by Power3 which is now unsupported, so drop support for MPIC. XICS is now the only supported interrupt controller for pSeries so make the XICS functions generic. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc: Various typo fixes	Michael Ellerman	2016-06-14	36	-40/+40
\| \| \| \| \|	Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
*	powerpc/32: Get rid of sub_reloc_offset()	Christophe Leroy	2016-06-14	1	-14/+0
\| \| \| \| \| \| \| \| \|	sub_reloc_offset() has not been used since commit 917f0af9e5a9 ("powerpc: Remove arch/ppc and include/asm-ppc") which removed include/asm-ppc/prom.h. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>