summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* genalloc:support memory-allocation with bytes-alignment to genallocZhao Qiang2015-12-232-10/+78
| | | | | | | | | | | Bytes alignment is required to manage some special RAM, so add gen_pool_first_fit_align to genalloc, meanwhile add gen_pool_alloc_algo to pass algo in case user layer using more than one algo, and pass data to gen_pool_first_fit_align(modify gen_pool_alloc as a wrapper) Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
* powerpc/pseries: Enable kernel CPU dlpar from sysfsNathan Fontenot2015-12-171-0/+6
| | | | | | | | Enable new kernel cpu hotplug functionality by allowing cpu dlpar requests to be initiated from sysfs. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pseries: Add CPU dlpar add functionalityNathan Fontenot2015-12-171-0/+116
| | | | | | | | | Add the ability to hotplug add cpus via rtas hotplug events by either specifying the drc index of the CPU to add, or providing a count of the number of CPUs to add. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pseries: Add CPU dlpar remove functionalityNathan Fontenot2015-12-172-0/+156
| | | | | | | | | | | | | | Add the ability to dlpar remove CPUs via hotplug rtas events, either by specifying the drc-index of the CPU to remove or providing a count of cpus to remove. To remove multiple cpus in a single request we create a list of possible DR (Dynamic Reconfiguration) cpus and their drc indexes that can be removed. We can then traverse the list remove each cpu and easily clean up in any cases of failure. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pseries: Update CPU hotplug error recoveryNathan Fontenot2015-12-171-13/+63
| | | | | | | | Update the cpu dlpar add/remove paths to do better error recovery when a failure occurs during the add/remove operation. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pseries: Factor out common cpu hotplug codeNathan Fontenot2015-12-171-31/+39
| | | | | | | | | | | Re-factor the cpu hotplug code to support doing cpu hotplug completely in the kernel and using the existing sysfs probe/release interfaces. This patch pulls out pieces of existing cpu hotplug code into common routines, dlpar_cpu_add() and dlpar_cpu_remove(), to be used by both interfaces. There are no functional changes introduced. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pseries: Consolidate CPU hotplug code to hotplug-cpu.cNathan Fontenot2015-12-172-225/+219
| | | | | | | | | No functional changes, this patch is simply a move of the cpu hotplug code from pseries/dlpar.c to pseries/hotplug-cpu.c. This is in an effort to consolidate all of the cpu hotplug code in a common place. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pseries: Verify CPU doesn't exist before addingNathan Fontenot2015-12-171-4/+39
| | | | | | | | | | | | | | | | When DLPAR adding a CPU we should verify that the CPU does not already exist. Failure to do so can generate a kernel oops; [ 9.465585] kernel BUG at arch/powerpc/platforms/pseries/dlpar.c:382! [ 9.465796] Oops: Exception in kernel mode, sig: 5 [#1] This oops can be generated by causing a probe to be performed on a cpu by writing to the sysfs cpu probe file (/sys/devices/system/cpu/probe). This patch adds a check for the existence of cpu prior to probing the cpu so userspace doing the wrong thing won't trigger a BUG_ON(). Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/476fpe: Add support for kexecAlistair Popple2015-12-171-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | PPC476FPE has a different PVR from previous PPC476 processors. The kexec code checks the PVR in order to correctly setup the MMU. When the initial support for 476FPE processors was added the corresponding change in the kexec code was missed. This patch simply adds the check and solves the following bug on kexec: kexec: Starting new kernel Bye! Unable to handle kernel paging request for instruction fetch Faulting instruction address: 0xee9a50f8 cpu 0x0: Vector: 400 (Instruction Access) at [ee9d7d20] pc: ee9a50f8 lr: ee9a50e4 sp: ee9d7dd0 msr: 21020 current = 0xee40f000 pid = 960, comm = kexec enter ? for help [link register ] ee9a50e4 [ee9d7dd0] c0013748 default_machine_kexec+0x58/0x70 (unreliable) [ee9d7df0] c0012f04 machine_kexec+0x34/0x40 [ee9d7e00] c00aa1ec kernel_kexec+0x9c/0xb0 [ee9d7e20] c005d704 SyS_reboot+0x1f4/0x220 [ee9d7f40] c000db68 ret_from_syscall+0x0/0x3c Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Add support for Nvlink NPUsAlistair Popple2015-12-176-13/+502
| | | | | | | | | | | | | | | | | | | | | | NVLink is a high speed interconnect that is used in conjunction with a PCI-E connection to create an interface between CPU and GPU that provides very high data bandwidth. A PCI-E connection to a GPU is used as the control path to initiate and report status of large data transfers sent via the NVLink. On IBM Power systems the NVLink processing unit (NPU) is similar to the existing PHB3. This patch adds support for a new NPU PHB type. DMA operations on the NPU are not supported as this patch sets the TCE translation tables to be the same as the related GPU PCIe device for each NVLink. Therefore all DMA operations are setup and controlled via the PCIe device. EEH is not presently supported for the NPU devices, although it may be added in future. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Add __raw_rm_writeq() functionAlistair Popple2015-12-172-10/+11
| | | | | | | | Move __raw_rm_writeq() from platforms/powernv/pci-ioda.c to include/asm/io.h so that it can be used by other code. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Revert "powerpc/pci: Remove unused struct pci_dn.pcidev field"Alistair Popple2015-12-172-0/+2
| | | | | | | | | | | | | This commit removed the pcidev field from struct pci_dn as it was no longer in use by the kernel. However to support finding the association of Nvlink devices to GPU devices from the device-tree this field is required. This reverts commit 250c7b277c65 ("powerpc/pci: Remove unused struct pci_dn.pcidev field"). Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Fix M64 resource name in /proc/iomemGavin Shan2015-12-171-0/+1
| | | | | | | | | | | | | | | | | | | | The name of PCI root bus's M64 resource isn't initialized properly. When dumping "/proc/iomem", "<BAD>" is seen for those M64 resources on PCI root buses. ~# cat /proc/iomem | grep -e "BAD" 3b0000000000-3b0fefffffff : <BAD> 3b1000000000-3b1fefffffff : <BAD> 3c0000000000-3c0fefffffff : <BAD> 3c1000000000-3c1fefffffff : <BAD> 3c2000000000-3c2fefffffff : <BAD> This fixes the issue by setting the name of PCI root bus's M64 resource to that of PHB's device node full name. With the patch, no "<BAD>" is seen from "/proc/iomem". Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Add page soft dirty trackingLaurent Dufour2015-12-175-8/+56
| | | | | | | | | | | | | | | | | | | | | | User space checkpoint and restart tool (CRIU) needs the page's change to be soft tracked. This allows to do a pre checkpoint and then dump only touched pages. This is done by using a newly assigned PTE bit (_PAGE_SOFT_DIRTY) when the page is backed in memory, and a new _PAGE_SWP_SOFT_DIRTY bit when the page is swapped out. To introduce a new PTE _PAGE_SOFT_DIRTY bit value common to hash 4k and hash 64k pte, the bits already defined in hash-*4k.h should be shifted left by one. The _PAGE_SWP_SOFT_DIRTY bit is dynamically put after the swap type in the swap pte. A check is added to ensure that the bit is not overwritten by _PAGE_HPTEFLAGS. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/kernel: Combine vec/loc for STD_EXCEPTION_PSERIESMichael Ellerman2015-12-172-10/+10
| | | | | | | | | | | | | The STD_EXCEPTION_PSERIES macro takes both a vector number, and a location (memory address). However both are always identical, so combine them to save repeating ourselves. This does mean an exception handler must always exist at the location in memory that matches its vector number. But that's OK because this is the "STD" macro (standard), which does exactly that. We have other macros for the other cases, eg. STD_EXCEPTION_PSERIES_OOL (out of line). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/kernel: Open code SET_DEFAULT_THREAD_PPRMichael Ellerman2015-12-172-14/+7
| | | | | | This is only used in one location, open code it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/kernel: Open code HMT_MEDIUM_LOW_HAS_PPRMichael Ellerman2015-12-172-6/+5
| | | | | | HMT_MEDIUM_LOW_HAS_PPR is only used in once place, open code it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/kernel: Drop HMT_MEDIUM_PPR_DISCARDMichael Ellerman2015-12-172-24/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HMT_MEDIUM_PPR_DISCARD is a macro which is present at the start of most of our first level exception handlers. It conditionally executes a HMT_MEDIUM instruction, which sets the processor priority to medium. On on modern systems, ie. Power7 and later, it is nop'ed out at boot. All it does is make the exception vectors more cramped, and consume 4 bytes of icache. On old systems it has the effect of boosting the processor priority at the start of exception processing. If we were previously in the idle loop for example, we may be at low or very low priority. This is desirable as we want to process the exception as fast as possible. However looking closely at the generated code, we see that in all cases we execute another HMT_MEDIUM just four instructions later. With code patching applied, the final code on an old (Power6) system will look like, eg: c000000000000300 <data_access_pSeries>: c000000000000300: 7c 42 13 78 mr r2,r2 <- c000000000000304: 7d b2 43 a6 mtsprg 2,r13 c000000000000308: 7d b1 42 a6 mfsprg r13,1 c00000000000030c: f9 2d 00 80 std r9,128(r13) c000000000000310: 60 00 00 00 nop c000000000000314: 7c 42 13 78 mr r2,r2 <- So I suggest that the added code complexity of HMT_MEDIUM_PPR_DISCARD is not justified by the benefit of boosting the processor priority for the duration of four instructions, and therefore we drop it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/rtas: Make enter_rtas() privateMichael Ellerman2015-12-172-1/+3
| | | | | | | | | There are no longer any users of enter_rtas() outside of rtas.c, so make it "private", by moving the declaration inside rtas.c. Hopefully this will encourage people to use one of the wrappers which takes the sharp edges off the RTAS calling sequence. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/rtas: Use rtas_call_unlocked() in call_rtas_display_status()Michael Ellerman2015-12-171-10/+2
| | | | | | | | | | Although call_rtas_display_status() does actually want to use the regular RTAS locking, it doesn't want the extra logic that is in rtas_call(), so currently it open codes the logic. Instead we can use rtas_call_unlocked(), after taking the RTAS lock. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pseries: Use rtas_call_unlocked() in pseries hotplugMichael Ellerman2015-12-171-8/+3
| | | | | | Avoid open coding the logic by using rtas_call_unlocked(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/xmon: Use rtas_call_unlocked() in xmonMichael Ellerman2015-12-171-10/+6
| | | | | | Avoid open coding the logic by using rtas_call_unlocked(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/rtas: Add rtas_call_unlocked()Michael Ellerman2015-12-172-11/+35
| | | | | | | | | | | | | | | | | | | | | | | | Most users of RTAS (Run-Time Abstraction Services) use rtas_call(), which deals with locking as well as endian handling. However we have two users outside of rtas.c that can't use rtas_call() because they have different locking requirements. The hotplug CPU code can't take the RTAS lock because the CPU would go offline with the lock held and no other CPUs would be able to call RTAS until the CPU came back online. The xmon code doesn't want to take the lock because it would risk dead locking when we are trying to recover from a crash. Both sites required multiple patches when we added little endian support, proving that programmers can't do endian right. Although that ship has sailed, we can still clean the code up by providing an unlocked version of rtas_call() which avoids the need to open code the logic elsewhere. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPALStewart Smith2015-12-1710-70/+54
| | | | | | | | | | | | Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is just OPALv3, with nobody ever expecting anything on pre-OPALv3 to be cared about or supported by mainline kernels. So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL exclusively. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: Remove OPALv2 firmware define and referencesStewart Smith2015-12-174-15/+5
| | | | | | | | | | | | | | OPALv2 only ever existed in the lab and didn't escape to the world. All OPAL systems in the wild are OPALv3. The probability of there being an OPALv2 system still powered on anywhere inside IBM is approximately zero, let alone anyone expecting to run mainline kernels. So, start to remove references to OPALv2. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/powernv: panic() on OPAL < V3Stewart Smith2015-12-171-4/+1
| | | | | | | | | | | | | The OpenPower Abstraction Layer firmware went through a couple of iterations in the lab before being released. What we now know as OPAL advertises itself as OPALv3. OPALv2 and OPALv1 never made it outside the lab, and the possibility of anyone at all ever building a mainline kernel today and expecting it to boot on such hardware is zero. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add script to test HMI functionalityDaniel Axtens2015-12-171-0/+89
| | | | | | | | | | | | | | | | | | | | | | | | HMIs (Hypervisor Management|Maintenance Interrupts) are a class of interrupt on POWER systems. HMI support has traditionally been exceptionally difficult to test, however Skiboot ships a tool that, with the correct magic numbers, will inject them. This, therefore, is a first pass at a script to inject HMIs and monitor Linux's response. It injects an HMI on each core on every chip in turn It then watches dmesg to see if it's acknowledged by Linux. On a Tuletta, I observed that we see 8 (or sometimes 9 or more) events per injection, regardless of SMT setting, so we wait for 8 before progressing. It sits in a new scripts/ directory in selftests/powerpc, because it's not designed to be run as part of the regular make selftests process. In particular, it is quite possibly going to end up garding lots of your CPUs, so it should only be run if you know how to undo that. CC: Mahesh J Salgaonkar <mahesh.salgaonkar@in.ibm.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Make context_switch touch FP/altivec/vector by defaultMichael Ellerman2015-12-171-6/+6
| | | | | | | | Simply because it touches more code paths that way, and therefore tests more things. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Anton Blanchard <anton@samba.org>
* selftests/powerpc: Make context_switch do something with no argsMichael Ellerman2015-12-172-9/+24
| | | | | | | | | | | For ease of use make the context_switch test do something useful when called with no arguments. Default to a 30 second run, using threads, doing yield, and use any online cpu. Make it print out what it's doing to avoid confusion. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Anton Blanchard <anton@samba.org>
* selftests/powerpc: Import Anton's context_switch2 benchmarkMichael Ellerman2015-12-173-1/+456
| | | | | | | | | | | This gets referred to a lot in commit messages, so let's pull it into the selftests. Almost vanilla from: http://ozlabs.org/~anton/junkcode/context_switch2.c Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Anton Blanchard <anton@samba.org>
* selftests/powerpc: Move pick_online_cpu() up into utils.cMichael Ellerman2015-12-175-28/+31
| | | | | | | We want to use this in another test, so make it available at the top of the powerpc selftests tree. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Remove broken GregorianDay()Daniel Axtens2015-12-165-39/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GregorianDay() is supposed to calculate the day of the week (tm->tm_wday) for a given day/month/year. In that calcuation it indexed into an array called MonthOffset using tm->tm_mon-1. However tm_mon is zero-based, not one-based, so this is off-by-one. It also means that every January, GregoiranDay() will access element -1 of the MonthOffset array. It also doesn't appear to be a correct algorithm either: see in contrast kernel/time/timeconv.c's time_to_tm function. It's been broken forever, which suggests no-one in userland uses this. It looks like no-one in the kernel uses tm->tm_wday either (see e.g. drivers/rtc/rtc-ds1305.c:319). tm->tm_wday is conventionally set to -1 when not available in hardware so we can simply set it to -1 and drop the function. (There are over a dozen other drivers in drivers/rtc that do this.) Found using UBSAN. Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrew Morton <akpm@linux-foundation.org> # as an example of what UBSan finds. Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: rtc-linux@googlegroups.com Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add test to check if VSRs are corruptedRashmica Gupta2015-12-143-1/+105
| | | | | | | | | | | | | | | | | When a transaction is aborted, VSR values should rollback to the checkpointed values before the transaction began. VSRs used elsewhere in the kernel during a transaction, or while the transaction is suspended should not affect the checkpointed values. Prior to the bug fix in commit d31626f70b61 ("powerpc: Don't corrupt transactional state when using FP/VMX in kernel") when VMX was requested by the kernel the .vr_state (which held the checkpointed state of VSRs before the transaction) was overwritten with the current state from outside the transation. Thus if the transaction did not complete, the VSR values would be "rolled back" to potentially incorrect values. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/xmon: Append linux_banner to exception information in xmon.Rashmica Gupta2015-12-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | Currently if you are in xmon without an oops etc. to view the kernel version you have to type "d $linux_banner" - not necessarily obvious. As this is useful information, append to the output of "e" command. Example output: $mon> e cpu 0x1: Vector: 0 at [c0000000f879ba80] pc: c000000000081718: sysrq_handle_xmon+0x68/0x80 lr: c000000000081718: sysrq_handle_xmon+0x68/0x80 sp: c0000000f879bbe0 msr: 8000000000009033 current = 0xc0000000f604d5c0 paca = 0xc00000000fdc0480 softe: 0 irq_happened: 0x01 pid = 2467, comm = bash Linux version 4.4.0-rc2-00008-gc51af91c3ab3-dirty (rashmica@circle) (gcc version 5.1.1 20150629 (GCC) ) #45 SMP Wed Nov 25 10:25:12 AEDT 2015 Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/cell: Remove the Cell QPACE codeRashmica Gupta2015-12-146-161/+0
| | | | | | | | All users of QPACE have upgraded to QPACE2 so remove the Cell QPACE code. Signed-off-by: Rashmica Gupta <rashmicy@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/pseries: Limit EPOW reset event warningsVipin K Parashar2015-12-141-24/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kernel prints respective warnings about various EPOW events for user information/action after parsing EPOW interrupts. At times below EPOW reset event warning is seen to be flooding kernel log over a period of time. May 25 03:46:34 alp kernel: Non critical power or cooling issue cleared May 25 03:46:52 alp kernel: Non critical power or cooling issue cleared May 25 03:53:48 alp kernel: Non critical power or cooling issue cleared May 25 03:55:46 alp kernel: Non critical power or cooling issue cleared May 25 03:56:34 alp kernel: Non critical power or cooling issue cleared May 25 03:59:04 alp kernel: Non critical power or cooling issue cleared May 25 04:02:01 alp kernel: Non critical power or cooling issue cleared These EPOW reset events are spurious in nature and are triggered by firmware without an actual EPOW event being reset. This patch avoids these multiple EPOW reset warnings by using a counter variable. This variable is incremented every time an EPOW event is reported. Upon receiving a EPOW reset event the same variable is checked to filter out spurious events and decremented accordingly. This patch also improves log messages to better describe EPOW event being reported. Merged adjacent log messages into single one to reduce number of lines printed per event. Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add TM signal with invalid stack testMichael Neuling2015-12-143-1/+78
| | | | | | | | | | Test the kernels signal generation code to ensure it can handle an invalid stack pointer when transactional. Signed-off-by: Michael Neuling <mikey@neuling.org> Tested-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Skip if we don't have TM] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add TM signal return testMichael Neuling2015-12-143-1/+76
| | | | | | | | | | | Test the kernel's signal return code to ensure that it doesn't crash when both the transactional and suspend MSR bits are set in the signal context. Signed-off-by: Michael Neuling <mikey@neuling.org> Tested-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Skip if we don't have TM] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Skip tm-resched-dscr if we don't have TMMichael Ellerman2015-12-142-2/+5
| | | | Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Move TM helpers into tm.hMichael Ellerman2015-12-142-11/+35
| | | | | | | Move have_htm_nosc() into a new tm.h, and add a new helper, have_htm() which we'll use in the next patch. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Add have_hwcap2() helperMichael Ellerman2015-12-143-4/+8
| | | | | | We already do this twice and want to add another so add a helper. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* selftests/powerpc: Move get_auxv_entry() into utils.cMichael Ellerman2015-12-145-45/+63
| | | | | | | This doesn't really belong in harness.c, it's a helper function. So move it into utils.c. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge tag 'powerpc-4.4-3' into nextMichael Ellerman2015-12-144-5/+32
|\ | | | | | | | | | | | | | | | | | | Merge the two TM fixes we merged in 4.4. We are about to merge selftests for these, and without the fixes the selftests will oops. powerpc fixes for 4.4 #2 - tm: Block signal return from setting invalid MSR state from Michael Neuling - tm: Check for already reclaimed tasks from Michael Neuling
| * powerpc/tm: Check for already reclaimed tasksMichael Neuling2015-11-231-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we can hit a scenario where we'll tm_reclaim() twice. This results in a TM bad thing exception because the second reclaim occurs when not in suspend mode. The scenario in which this can happen is the following. We attempt to deliver a signal to userspace. To do this we need obtain the stack pointer to write the signal context. To get this stack pointer we must tm_reclaim() in case we need to use the checkpointed stack pointer (see get_tm_stackpointer()). Normally we'd then return directly to userspace to deliver the signal without going through __switch_to(). Unfortunatley, if at this point we get an error (such as a bad userspace stack pointer), we need to exit the process. The exit will result in a __switch_to(). __switch_to() will attempt to save the process state which results in another tm_reclaim(). This tm_reclaim() now causes a TM Bad Thing exception as this state has already been saved and the processor is no longer in TM suspend mode. Whee! This patch checks the state of the MSR to ensure we are TM suspended before we attempt the tm_reclaim(). If we've already saved the state away, we should no longer be in TM suspend mode. This has the additional advantage of checking for a potential TM Bad Thing exception. Found using syscall fuzzer. Fixes: fb09692e71f1 ("powerpc: Add reclaim and recheckpoint functions for context switching transactional memory processes") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/tm: Block signal return setting invalid MSR stateMichael Neuling2015-11-233-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we allow both the MSR T and S bits to be set by userspace on a signal return. Unfortunately this is a reserved configuration and will cause a TM Bad Thing exception if attempted (via rfid). This patch checks for this case in both the 32 and 64 bit signals code. If both T and S are set, we mark the context as invalid. Found using a syscall fuzzer. Fixes: 2b0a576d15e0 ("powerpc: Add new transactional memory state to the signal context") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc: Print MSR TM bits in oops messagesMichael Neuling2015-12-141-8/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Print MSR TM bits in oops messages. This appends them to the end like this: MSR: 8000000502823031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[TE]> You get the TM[] only if at least one TM MSR bit is set. Inside the TM[], E means Enabled (bit 32), S means Suspended (bit 33), and T means Transactional (bit 34) If no bits are set, you get no TM[] output. Include rework of printbits() to handle this case. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc: Make {cmp}xchg* and their atomic_ versions fully orderedBoqun Feng2015-12-141-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_ versions all need to be fully ordered, however they are now just RELEASE+ACQUIRE, which are not fully ordered. So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics") This patch depends on patch "powerpc: Make value-returning atomics fully ordered" for PPC_ATOMIC_ENTRY_BARRIER definition. Cc: stable@vger.kernel.org # 3.2+ Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc: Make value-returning atomics fully orderedBoqun Feng2015-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to memory-barriers.txt: > Any atomic operation that modifies some state in memory and returns > information about the state (old or new) implies an SMP-conditional > general memory barrier (smp_mb()) on each side of the actual > operation ... Which mean these operations should be fully ordered. However on PPC, PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation, which is currently "lwsync" if SMP=y. The leading "lwsync" can not guarantee fully ordered atomics, according to Paul Mckenney: https://lkml.org/lkml/2015/10/14/970 To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee the fully-ordered semantics. This also makes futex atomics fully ordered, which can avoid possible memory ordering problems if userspace code relies on futex system call for fully ordered semantics. Fixes: b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics") Cc: stable@vger.kernel.org # 3.2+ Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mm: Don't open code pgtable_t sizeAneesh Kumar K.V2015-12-141-1/+1
| | | | | | | | | | | | | | | | | | The slot information of base page size hash pte is stored in the pgtable_t w.r.t transparent hugepage. We need to make sure we don't index beyond pgtable_t size. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mm: Use H_READ with H_READ_4Aneesh Kumar K.V2015-12-142-27/+44
| | | | | | | | | | | | | | This will bulk read 4 hash pte slot entries and should reduce the loop Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>