linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	mac: Make cuda_init_via() __init	Geert Uytterhoeven	2013-07-01	1	-1/+1
\| \| \| \| \| \| \|	cuda_init_via() is called from find_via_cuda() only, which is __init. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Delete __cpuinit usage from all users	Paul Gortmaker	2013-07-01	19	-50/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the powerpc uses of the __cpuinit macros. There are no __CPUINIT users in assembly files in powerpc. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	macintosh: Convert use of typedef ctl_table to struct ctl_table	Joe Perches	2013-07-01	1	-4/+4
\| \| \| \| \| \| \|	This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/idle: Convert use of typedef ctl_table to struct ctl_table	Joe Perches	2013-07-01	1	-2/+2
\| \| \| \| \| \| \|	This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/iommu: Remove unused pci_iommu_init() and pci_direct_iommu_init()	Bjorn Helgaas	2013-07-01	1	-7/+0
\| \| \| \| \| \| \| \|	pci_iommu_init() and pci_direct_iommu_init() are not referenced anywhere, so remove them. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Don't flush/invalidate the d/icache for an unknown relocation type	Kevin Hao	2013-07-01	1	-1/+2
	For an unknown relocation type, the value of r4 is just the 8-bit relocation type, so the sum of r4 and r7 may yield an invalid memory address. For example:
	    In the normal case:
	        r4 = c00xxxxx
	        r7 = 40000000
	        r4 + r7 = 000xxxxx
	    For an unknown relocation type:
	        r4 = 000000xx
	        r7 = 40000000
	        r4 + r7 = 400000xx
	400000xx is an invalid memory address for a board which has just 512M of memory. Operations such as dcbst or icbi may cause a bus error on an invalid memory address on some platforms, which then resets the board. So we should skip the d/icache flush/invalidate for an unknown relocation type. Signed-off-by: Kevin Hao <haokexin@gmail.com> Acked-by: Suzuki K. Poulose <suzuki@in.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
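The register arithmetic in this message can be checked with a small user-space sketch. It is only an illustration: the constant names and the 512M limit are taken from the example in the message, the helper names are hypothetical, and none of this is the kernel's relocation code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the commit message example; the names are hypothetical. */
#define RELOC_OFFSET 0x40000000u /* r7 in the example */
#define RAM_SIZE     0x20000000u /* a board with 512M of memory */

/* 32-bit register addition wraps modulo 2^32, as r4 + r7 does. */
static uint32_t reloc_target(uint32_t r4, uint32_t r7)
{
    return (uint32_t)(r4 + r7);
}

/* True if the address falls inside the board's RAM. */
static bool target_in_ram(uint32_t addr)
{
    return addr < RAM_SIZE;
}
```

With a kernel virtual address such as c0012345 in r4 the sum wraps to a low physical address, while a bare relocation type such as ab yields 400000ab, which lies outside 512M of RAM.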
*	powerpc/windfarm: Fix overtemperature clearing	Aaro Koskinen	2013-07-01	3	-3/+15
	With pm81/pm91/pm121, when the overtemperature state is entered, and when it remains on after skipped ticks, the driver will try to leave it too soon (immediately on the next tick). This is because the active FAILURE_OVERTEMP state is not visible in "new_failure" variable of the current tick. Furthermore, the driver will keep trying to clear condition in subsequent ticks as FAILURE_OVERTEMP remains set in the "last_failure" variable. These will start to trigger WARNINGS from windfarm core:
	    [ 100.082735] windfarm: Clamping CPU frequency to minimum !
	    [ 100.108132] windfarm: Overtemp condition detected !
	    [ 101.952908] windfarm: Overtemp condition cleared !
	    [...]
	    [ 102.980388] WARNING: at drivers/macintosh/windfarm_core.c:463
	    [...]
	    [ 103.982227] WARNING: at drivers/macintosh/windfarm_core.c:463
	    [...]
	    [ 105.030494] WARNING: at drivers/macintosh/windfarm_core.c:463
	    [...]
	    [ 105.973666] WARNING: at drivers/macintosh/windfarm_core.c:463
	    [...]
	    [ 106.977913] WARNING: at drivers/macintosh/windfarm_core.c:463
	Fix by adding a helper global variable. We leave the overtemp state only after all failure bits have been cleared. I saw this error on iMac G5 iSight (pm121). Also pm81/pm91 are fixed based on the observation that these are almost identical/copy-pasted code. Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Use dev-node in PCI config accessors	Gavin Shan	2013-07-01	3	-89/+79
	Currently, we're using the combo (PCI bus + devfn) in the PCI config accessors, and the PCI config accessors in EEH depend on them. However, it's not safe to refer to the PCI bus, which might have been removed during hotplug. So we use the device node in the PCI config accessors and the corresponding backends just reuse them. The patch also fixes one potential risk: we may have a frozen PE during early PCI probe time, before the PE mapping has been set up. So the errors should be counted against PE#0. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Avoid build warnings	Gavin Shan	2013-07-01	3	-4/+4
	The patch is for avoiding the following build warnings:
	    The function .pnv_pci_ioda_fixup() references the function __init .eeh_init(). This is often because .pnv_pci_ioda_fixup lacks a __init
	    The function .pnv_pci_ioda_fixup() references the function __init .eeh_addr_cache_build(). This is often because .pnv_pci_ioda_fixup lacks a __init
	Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Refactor the output message	Gavin Shan	2013-07-01	3	-16/+41
	We don't need the whole backtrace, only a one-line message, in the error reporting interrupt handler. For errors triggered by accessing PCI config space or MMIO, we replace "WARN(1, ...)" with pr_err() and dump_stack(). The patch also adds more output messages to indicate what the EEH core is doing. Besides, some printk() calls are replaced with pr_warning(). Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Fix address catch for PowerNV	Gavin Shan	2013-07-01	2	-1/+2
\| \| \| \| \| \| \| \| \|	On the PowerNV platform, the EEH address cache isn't built correctly because we skipped the EEH devices without binding PE. The patch fixes that. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Replace variables with flags	Gavin Shan	2013-07-01	3	-8/+11
	We have 2 fields in "struct pnv_phb" to trace the states. The patch replaces the fields with one and introduces flags for that. The patch doesn't impact the logic. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Check PCIe link after reset	Gavin Shan	2013-07-01	1	-13/+144
	After a reset (e.g. a complete reset) to bring the fenced PHB back, the PCIe link might not be ready yet. The patch makes sure the PCIe link is ready before accessing its subordinate PCI devices. The patch also fixes wrong values being restored to the PCI_COMMAND register for PCI bridges. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Don't collect PCI-CFG data on PHB	Gavin Shan	2013-07-01	1	-9/+23
	When the PHB is fenced or dead, it's pointless to collect the data from the PCI config space of subordinate PCI devices since it should return 0xFF's. The patch also fixes a buffer overwrite while getting PCI config data. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/tm: Clear MSR RI in non-recoverable TM code	Michael Neuling	2013-06-30	1	-2/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we treclaim and trecheckpoint there's an unavoidable period when r1 will not be a valid kernel stack pointer. This patch clears the MSR recoverable interrupt (RI) bit over these regions to indicate we have an invalid kernel stack pointer. For treclaim, the region over which we clear MSR RI is larger than required to avoid the need for an extra costly mtmsrd. Thanks to Paulus for suggesting this change. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Fix string instr. emulation for 32-bit processes on ppc64	James Yang	2013-06-30	1	-0/+4
	String instruction emulation would erroneously result in a segfault if the upper bits of the EA are set and the EA is so high that it fails the access check. Truncate the EA to 32 bits if the process is 32-bit. Signed-off-by: James Yang <James.Yang@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
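The truncation described here amounts to masking off the upper half of the effective address for a 32-bit task. A minimal sketch, assuming a hypothetical helper name and a plain boolean in place of the kernel's own 32-bit task test:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: keep only the low 32 bits of the effective
 * address (EA) when the emulated instruction belongs to a 32-bit task.
 */
static uint64_t truncate_ea(uint64_t ea, bool is_32bit_task)
{
    if (is_32bit_task)
        ea &= 0xffffffffULL;
    return ea;
}
```

For a 64-bit task the address is returned unchanged, so stale upper bits only matter in the 32-bit case.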
*	trivial: powerpc: Fix typo in ioei_interrupt() description	Sebastien Bessiere	2013-06-30	1	-1/+1
\| \| \| \| \|	Signed-off-by: Sebastien Bessiere <sebastien.bessiere@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	mm/thp: define HPAGE_PMD_* constants as BUILD_BUG() if !THP	Kirill A. Shutemov	2013-06-26	1	-1/+5
	Currently, HPAGE_PMD_* constants rely on PMD_SHIFT regardless of CONFIG_TRANSPARENT_HUGEPAGE. PMD_SHIFT is not defined everywhere (e.g. arm nommu case). It means we can't use anything like this in generic code:
	    if (PageTransHuge(page))
	        zero_huge_user(page, 0, HPAGE_PMD_SIZE);
	    else
	        clear_highpage(page);
	For the !THP case, PageTransHuge() is 0 and the compiler can eliminate the zero_huge_user() call. But it still needs to be a valid C expression, which means HPAGE_PMD_SIZE has to expand to something the compiler can understand. Previously, HPAGE_PMD_* were defined to BUILD_BUG() for !THP. Let's come back to it. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Use interruptible sleep in keehd	Gavin Shan	2013-06-25	1	-1/+2
	Replace down() with down_interruptible() to avoid the following warning:
	    [c00000007ba7b710] [c000000000014410] .__switch_to+0x1b0/0x380
	    [c00000007ba7b7c0] [c0000000007b408c] .__schedule+0x3ec/0x970
	    [c00000007ba7ba50] [c0000000007b1f24] .schedule_timeout+0x1a4/0x2b0
	    [c00000007ba7bb30] [c0000000007b34a4] .__down+0xa4/0x104
	    [c00000007ba7bbf0] [c0000000000b9230] .down+0x60/0x70
	    [c00000007ba7bc80] [c0000000000336d0] .eeh_event_handler+0x70/0x190
	    [c00000007ba7bd30] [c0000000000b1a58] .kthread+0xe8/0xf0
	    [c00000007ba7be30] [c00000000000a05c] .ret_from_kernel_thread+0x5c/0x8
	This also avoids keeping the load average up while doing nothing. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Remove eeh_mutex	Gavin Shan	2013-06-25	3	-46/+1
	Originally, eeh_mutex was introduced to protect the PE hierarchy tree and the attached EEH devices because the EEH core could be running multiple threads that access the PE hierarchy tree. However, we now have only one kthread in the EEH core. So we no longer need eeh_mutex and just remove it. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/mm: Fix build warnings with CONFIG_TRANSPARENT_HUGEPAGE disabled	Nathan Fontenot	2013-06-25	1	-0/+1
	Building with CONFIG_TRANSPARENT_HUGEPAGE disabled causes the following build warnings:
	    powerpc/arch/powerpc/include/asm/mmu-hash64.h: In function ‘__hash_page_thp’:
	    powerpc/arch/powerpc/include/asm/mmu-hash64.h:354: warning: no return statement in function returning non-void
	This patch adds a return -1 to the static inline for __hash_page_thp() to correct the warnings. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pseries: Enable PSTORE in pseries_defconfig	Aruna Balakrishnaiah	2013-06-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	Since now we have pstore support for nvram in pseries, enable it in the default config. With this config option enabled, pstore infra-structure will be used to read/write the messages from/to nvram. Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/hw_brk: Fix clearing of extraneous IRQ	Michael Neuling	2013-06-25	1	-0/+1
	In 9422de3 "powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers" we changed the way we mark extraneous irqs with this:
	    - info->extraneous_interrupt = !((bp->attr.bp_addr <= dar) &&
	    -     (dar - bp->attr.bp_addr < bp->attr.bp_len));
	    + if (!((bp->attr.bp_addr <= dar) &&
	    +     (dar - bp->attr.bp_addr < bp->attr.bp_len)))
	    +     info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
	Unfortunately this is bogus as it never clears extraneous IRQ if it's already set. This correctly clears extraneous IRQ before possibly setting it. Signed-off-by: Michael Neuling <mikey@neuling.org> Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
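The clear-before-set pattern the message describes can be sketched in isolation. This is an illustration only: the struct, the function name and the flag's bit value are assumptions, and the range test is copied from the snippet quoted in the message.

```c
#include <stdint.h>

/* Assumed bit value for illustration; check the kernel header for the real one. */
#define HW_BRK_TYPE_EXTRANEOUS_IRQ 0x80u

/* Hypothetical stand-in for the per-breakpoint state. */
struct brk_state {
    unsigned int type;
};

/* Clear the flag first, then set it again only if dar is outside the range. */
static void mark_extraneous(struct brk_state *info, uint64_t bp_addr,
                            uint64_t bp_len, uint64_t dar)
{
    info->type &= ~HW_BRK_TYPE_EXTRANEOUS_IRQ;
    if (!((bp_addr <= dar) && (dar - bp_addr < bp_len)))
        info->type |= HW_BRK_TYPE_EXTRANEOUS_IRQ;
}
```

Without the initial clear, a flag set by one out-of-range hit would stay set for every later in-range hit.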
*	powerpc/hw_brk: Fix setting of length for exact mode breakpoints	Michael Neuling	2013-06-25	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The smallest match region for both the DABR and DAWR is 8 bytes, so the kernel needs to filter matches when users want to look at regions smaller than this. Currently we set the length of PPC_BREAKPOINT_MODE_EXACT breakpoints to 8. This is wrong as in exact mode we should only match on 1 address, hence the length should be 1. This ensures that the kernel will filter out any exact mode hardware breakpoint matches on any addresses other than the requested one. Signed-off-by: Michael Neuling <mikey@neuling.org> Reported-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
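A small sketch of why the length matters in exact mode. It assumes, as the message states, that the hardware matches on an 8-byte region, and models that as an aligned doubleword; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed hardware behaviour: any access in the same aligned 8 bytes matches. */
static bool hw_region_match(uint64_t bp_addr, uint64_t access_addr)
{
    return (bp_addr & ~7ULL) == (access_addr & ~7ULL);
}

/* Software filter: accept only accesses inside [bp_addr, bp_addr + bp_len). */
static bool sw_filter_accepts(uint64_t bp_addr, uint64_t bp_len,
                              uint64_t access_addr)
{
    return bp_addr <= access_addr && access_addr - bp_addr < bp_len;
}
```

An access at bp_addr + 3 triggers the hardware match either way; with a length of 8 the filter lets it through, with a length of 1 it is dropped and only bp_addr itself is reported.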
*	macintosh/adb: Replace __WAITQUEUE_INITIALIZER with more standard DECLARE_WAITQUEUE.	Robert P. J. Day	2013-06-25	1	-1/+1
	Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Optimize hugepage invalidate	Aneesh Kumar K.V	2013-06-21	4	-9/+201
	Hugepage invalidate involves invalidating multiple hpte entries. Optimize the operation using H_BULK_REMOVE on lpar platforms. On native, reduce the number of tlb flushes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/THP: Enable THP on PPC64	Aneesh Kumar K.V	2013-06-21	2	-2/+30
	We enable it only if we support the 16MB page size. Reviewed-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: split hugepage when using subpage protection	Aneesh Kumar K.V	2013-06-21	1	-0/+48
\| \| \| \| \| \| \| \| \| \|	We find all the overlapping vma and mark them such that we don't allocate hugepage in that range. Also we split existing huge page so that the normal page hash can be invalidated and new page faulted in with new protection bits. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: disable assert_pte_locked for collapse_huge_page	Aneesh Kumar K.V	2013-06-21	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \|	With THP we set pmd to none, before we do pte_clear. Hence we can't walk page table to get the pte lock ptr and verify whether it is locked. THP do take pte lock before calling pte_clear. So we don't change the locking rules here. It is that we can't use page table walking to check whether pte locks are held with THP. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Prevent gcc to re-read the pagetables	Aneesh Kumar K.V	2013-06-21	2	-5/+5
	GCC is very likely to read the pagetables just once and cache them on the local stack or in a register, but it can also decide to re-read the pagetables. The problem is that the pagetable in those places can change from under gcc. With THP/hugetlbfs the pmd (and pgd for hugetlbfs giga pages) can change under gup_fast. The pages won't be freed until we finish gup fast because we have irqs disabled and we free these pages via an rcu callback. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Make linux pagetable walk safe with THP enabled	Aneesh Kumar K.V	2013-06-21	4	-38/+68
	We need to have irqs disabled to handle all the possible parallel updates to the linux page table without holding locks. Events that we are interested in while walking page tables are:
	    1) Page fault
	    2) unmap
	    3) THP split
	    4) THP collapse
	A) local_irq_disabled:
	------------------------
	1) Page fault: A none to valid transition via page fault is not an issue because we would see either a none or a valid entry. If it is none, we would error out the page table walk. We may need to use on-stack values when checking the type of page table elements, because if we do
	    if (!is_hugepd()) {
	        if (!pmd_none()) {
	            if (pmd_bad()) {
	we could take that bad condition because the pmd got converted to a hugepd after the !is_hugepd check via a hugetlb fault. The right way would be to check for pmd_none higher up or use an on-stack value.
	2) A valid to none conversion via unmap: We can safely walk the upper level table, because we don't remove the page table entries until the rcu grace period. So even if we followed a wrong pointer, the pointer stays valid until the grace period. A returned PTE pointer needs to be atomically checked for _PAGE_PRESENT and _PAGE_BUSY. A valid pointer returned could become none later. To prevent pte_clear we take _PAGE_BUSY.
	3) THP split: A valid transparent hugepage is converted to normal pages. Before we split we do pmd_splitting_flush, which sets the hugepage PTE to _PAGE_SPLITTING. So when walking the page table we need to check for pmd_trans_splitting and handle that. The pte returned also needs to be checked for _PAGE_SPLITTING before setting _PAGE_BUSY, similar to _PAGE_PRESENT. We save the value of the PTE on the stack and check for the flag in the local pte value. If the flag is not set we can safely operate on the local pte value and we atomically set _PAGE_BUSY.
	4) THP collapse: A normal page gets converted to a hugepage. In the collapse path, we mark the pmd none early (pmdp_clear_flush). With irqs disabled, if we are already walking the page table we would see the pmd_none and won't continue. If we see a valid PMD, we should still check for _PAGE_PRESENT before setting _PAGE_BUSY, to make sure we didn't collapse the PTE to a Huge PTE. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
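The "snapshot on the stack, check the flags, then atomically set busy" step can be sketched in user space with C11 atomics. This is an illustration under stated assumptions, not the kernel implementation: the bit values and the function name are hypothetical, and a busy entry is simply reported as a failure where a real walker would retry.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, chosen only for this sketch. */
#define PTE_PRESENT   0x1ULL
#define PTE_BUSY      0x2ULL
#define PTE_SPLITTING 0x4ULL

/*
 * Take one snapshot of the entry, test the flags on that local copy,
 * and set the busy bit only if the entry still equals the snapshot.
 */
static bool pte_try_set_busy(_Atomic uint64_t *ptep, uint64_t *snapshot)
{
    for (;;) {
        uint64_t old = atomic_load(ptep); /* single read, kept on the stack */

        if (!(old & PTE_PRESENT) || (old & PTE_SPLITTING))
            return false; /* unmapped, collapsed, or being split */
        if (old & PTE_BUSY)
            return false; /* a real walker would retry here */

        /* Fails (and we loop) if the entry changed since the snapshot. */
        if (atomic_compare_exchange_weak(ptep, &old, old | PTE_BUSY)) {
            *snapshot = old;
            return true;
        }
    }
}
```

Because every flag test runs on the local copy, a concurrent unmap, split or collapse between the read and the update makes the compare-and-swap fail rather than letting the walker act on stale flags.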
*	powerpc/THP: Add code to handle HPTE faults for hugepages	Aneesh Kumar K.V	2013-06-21	4	-4/+203
	The deposited PTE page in the second half of the PMD table is used to track the state of hash PTEs. After updating the HPTE, we mark the corresponding slot in the deposited PTE page valid. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Update gup_pmd_range to handle transparent hugepages	Aneesh Kumar K.V	2013-06-21	1	-2/+8
\| \| \| \| \| \|	Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/kvm: Handle transparent hugepage in KVM	Aneesh Kumar K.V	2013-06-21	3	-34/+44
\| \| \| \| \| \| \| \|	We can find pte that are splitting while walking page tables. Return None pte in that case. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Replace find_linux_pte with find_linux_pte_or_hugepte	Aneesh Kumar K.V	2013-06-21	7	-33/+36
\| \| \| \| \| \| \| \|	Replace find_linux_pte with find_linux_pte_or_hugepte and explicitly document why we don't need to handle transparent hugepages at callsites. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Update find_linux_pte_or_hugepte to handle transparent hugepages	Aneesh Kumar K.V	2013-06-21	1	-6/+26
\| \| \| \| \| \|	Reviewed-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: move find_linux_pte_or_hugepte and gup_hugepte to common code	Aneesh Kumar K.V	2013-06-21	5	-138/+138
\| \| \| \| \| \| \| \|	We will use this in the later patch for handling THP pages Reviewed-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/THP: Implement transparent hugepages for ppc64	Aneesh Kumar K.V	2013-06-21	6	-2/+625
	We now have pmd entries covering a 16MB range, and the PMD table doubles its original size. We use the second half of the PMD table to deposit the pgtable (PTE page). The deposited PTE page is further used to track the HPTE information. The information includes [ secondary group | 3 bit hidx | valid ]. We use one byte per HPTE entry. With a 16MB hugepage and 64K HPTEs we need 256 entries, and with 4K HPTEs we need 4096 entries. Both will fit in a 4K PTE page. On hugepage invalidate we need to walk the PTE page and invalidate all valid HPTEs. This patch implements the necessary arch specific functions for THP support and also the hugepage invalidate logic. These PMD related functions are intentionally kept similar to their PTE counterparts. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
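The entry counts quoted in this message follow from dividing the hugepage size by the HPTE size, at one tracking byte per HPTE. A quick sketch of that arithmetic, with hypothetical macro and function names:

```c
#include <stdbool.h>

#define HUGEPAGE_SIZE  (16UL << 20) /* 16MB hugepage */
#define PTE_PAGE_SIZE  4096UL       /* 4K deposited PTE page */
#define BYTES_PER_HPTE 1UL          /* one tracking byte per HPTE */

/* Number of HPTEs needed to map one hugepage at the given HPTE size. */
static unsigned long hpte_slots(unsigned long hpte_size)
{
    return HUGEPAGE_SIZE / hpte_size;
}

/* True if the per-HPTE tracking bytes fit in the deposited PTE page. */
static bool slots_fit_in_pte_page(unsigned long hpte_size)
{
    return hpte_slots(hpte_size) * BYTES_PER_HPTE <= PTE_PAGE_SIZE;
}
```

64K HPTEs give 256 slots and 4K HPTEs give 4096 slots, so the 4K case fills the deposited page exactly.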
*	powerpc/THP: Double the PMD table size for THP	Aneesh Kumar K.V	2013-06-21	4	-8/+16
	THP code does PTE page allocation along with the large page request and deposits them for later use. This is to ensure that we won't have any failures when we split hugepages into regular pages. On powerpc we want to use the deposited PTE page for storing the hash pte slot and secondary bit information for the HPTEs. We use the second half of the pmd table to save the deposited PTE page. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/mm: handle hugepage size correctly when invalidating hpte entries	Aneesh Kumar K.V	2013-06-21	9	-107/+95
	If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating a pte, even if the HPTE entry is not valid we should do a tlb invalidate. With hugepages, we need to pass the correct actual page size value for tlb invalidation. This change updates the patch 0608d692463598c1d6e826d9dd7283381b4f246c "powerpc/mm: Always invalidate tlb on hpte invalidate and update" to handle transparent hugepages correctly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Debugfs for error injection	Gavin Shan	2013-06-21	1	-0/+31
\| \| \| \| \| \| \| \|	The patch creates debugfs entries (powerpc/PCIxxxx/err_injct) for injecting EEH errors for testing purpose. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/powernv: Debugfs directory for PHB	Gavin Shan	2013-06-21	2	-0/+27
\| \| \| \| \| \| \| \| \|	The patch creates one debugfs directory ("powerpc/PCIxxxx") for each PHB so that we can hook EEH error injection debugfs entry there in proceeding patch. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Register OPAL notifier for PCI error	Gavin Shan	2013-06-21	1	-1/+40
	The patch registers an OPAL event notifier and processes the PCI errors from firmware. If we have pending PCI errors, a special EEH event (without a bound PE) will be sent to the EEH core for processing. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powernv/opal: Disable OPAL notifier upon poweroff	Gavin Shan	2013-06-21	1	-0/+4
	While we're restarting or powering off the system, we no longer need the OPAL notifier. So just disable it. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powernv/opal: Notifier for OPAL events	Gavin Shan	2013-06-21	2	-1/+73
\| \| \| \| \| \| \| \| \| \| \| \|	This patch implements a notifier to receive a notification on OPAL event mask changes. The notifier is only called as a result of an OPAL interrupt, which will happen upon reception of FSP messages or PCI errors. Any event mask change detected as a result of opal_poll_events() will not result in a notifier call. [benh: changelog] Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Allow to check fenced PHB proactively	Gavin Shan	2013-06-20	1	-0/+60
	It's meaningless to handle a frozen PE if we already have a fenced PHB. The patch checks the PHB state before checking the PE. If the PHB has been put into the fenced state, we need to take care of that first. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Enable EEH check for config access	Gavin Shan	2013-06-20	1	-1/+39
	The patch enables the EEH check and lets the EEH core process EEH errors for the PowerNV platform while accessing config space. Originally, the implementation already had a mechanism to check EEH errors and tried to recover from them. However, we never let the EEH core handle the EEH errors. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: Initialization for PowerNV	Gavin Shan	2013-06-20	2	-5/+17
	The patch initializes EEH for the PowerNV platform. Because the OPAL APIs require the HUB ID, we need to track that through struct pnv_phb. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: PowerNV EEH backends	Gavin Shan	2013-06-20	2	-1/+420
\| \| \| \| \| \| \| \| \| \|	The patch adds EEH backends for PowerNV platform. It's notable that part of those EEH backends call to the I/O chip dependent backends. [Removed pointless change to eeh_pseries.c -- BenH] Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/eeh: I/O chip next error	Gavin Shan	2013-06-20	2	-2/+333
	The patch implements the backend for EEH core to retrieve next EEH error to handle. For the informational errors, we won't bother the EEH core. Otherwise, the EEH should take appropriate actions depending on the return value:
	    0 - No further errors detected
	    1 - Frozen PE
	    2 - Fenced PHB
	    3 - Dead PHB
	    4 - Dead IOC
	Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
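The return values listed in this message map naturally onto an enumeration. The sketch below is illustrative only: the enumerator and function names are hypothetical and are not taken from the patch, which describes the values by number.

```c
/* Hypothetical names for the return values listed in the message. */
enum next_error {
    NEXT_ERR_NONE       = 0, /* no further errors detected */
    NEXT_ERR_FROZEN_PE  = 1,
    NEXT_ERR_FENCED_PHB = 2,
    NEXT_ERR_DEAD_PHB   = 3,
    NEXT_ERR_DEAD_IOC   = 4
};

/* Map a return value to a short description for logging. */
static const char *next_error_name(int rc)
{
    switch (rc) {
    case NEXT_ERR_NONE:       return "no further errors";
    case NEXT_ERR_FROZEN_PE:  return "frozen PE";
    case NEXT_ERR_FENCED_PHB: return "fenced PHB";
    case NEXT_ERR_DEAD_PHB:   return "dead PHB";
    case NEXT_ERR_DEAD_IOC:   return "dead IOC";
    default:                  return "unknown";
    }
}
```

The values run from the narrowest scope (a single PE) to the widest (the whole IOC), so a caller can dispatch on the value to pick a recovery action of matching scope.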