summaryrefslogtreecommitdiffstats
path: root/arch (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-linus' of ↵Linus Torvalds2014-04-0836-958/+892
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull second set of s390 patches from Martin Schwidefsky: "The second part of Heikos uaccess rework, the page table walker for uaccess is now a thing of the past (yay!) The code change to fix the theoretical TLB flush problem allows us to add a TLB flush optimization for zEC12, this machine has new instructions that allow to do CPU local TLB flushes for single pages and for all pages of a specific address space. Plus the usual bug fixing and some more cleanup" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/uaccess: rework uaccess code - fix locking issues s390/mm,tlb: optimize TLB flushing for zEC12 s390/mm,tlb: safeguard against speculative TLB creation s390/irq: Use defines for external interruption codes s390/irq: Add defines for external interruption codes s390/sclp: add timeout for queued requests kvm/s390: also set guest pages back to stable on kexec/kdump lcs: Add missing destroy_timer_on_stack() s390/tape: Add missing destroy_timer_on_stack() s390/tape: Use del_timer_sync() s390/3270: fix crash with multiple reset device requests s390/bitops,atomic: add missing memory barriers s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6
| * s390/uaccess: rework uaccess code - fix locking issuesHeiko Carstens2014-04-0315-815/+556
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current uaccess code uses a page table walk in some circumstances, e.g. in case of the in atomic futex operations or if running on old hardware which doesn't support the mvcos instruction. However it turned out that the page table walk code does not correctly lock page tables when accessing page table entries. In other words: a different cpu may invalidate a page table entry while the current cpu inspects the pte. This may lead to random data corruption. Adding correct locking however isn't trivial for all uaccess operations. Especially copy_in_user() is problematic since that requires to hold at least two locks, but must be protected against ABBA deadlock when a different cpu also performs a copy_in_user() operation. So the solution is a different approach where we change address spaces: User space runs in primary address mode, or access register mode within vdso code, like it currently already does. The kernel usually also runs in home space mode, however when accessing user space the kernel switches to primary or secondary address mode if the mvcos instruction is not available or if a compare-and-swap (futex) instruction on a user space address is performed. KVM however is special, since that requires the kernel to run in home address space while implicitly accessing user space with the sie instruction. So we end up with: User space: - runs in primary or access register mode - cr1 contains the user asce - cr7 contains the user asce - cr13 contains the kernel asce Kernel space: - runs in home space mode - cr1 contains the user or kernel asce -> the kernel asce is loaded when a uaccess requires primary or secondary address mode - cr7 contains the user or kernel asce, (changed with set_fs()) - cr13 contains the kernel asce In case of uaccess the kernel changes to: - primary space mode in case of a uaccess (copy_to_user) and uses e.g. the mvcp instruction to access user space. However the kernel will stay in home space mode if the mvcos instruction is available - secondary space mode in case of futex atomic operations, so that the instructions come from primary address space and data from secondary space In case of kvm the kernel runs in home space mode, but cr1 gets switched to contain the gmap asce before the sie instruction gets executed. When the sie instruction is finished cr1 will be switched back to contain the user asce. A context switch between two processes will always load the kernel asce for the next process in cr1. So the first exit to user space is a bit more expensive (one extra load control register instruction) than before, however keeps the code rather simple. In sum this means there is no need to perform any error prone page table walks anymore when accessing user space. The patch seems to be rather large, however it mainly removes the the page table walk code and restores the previously deleted "standard" uaccess code, with a couple of changes. The uaccess without mvcos mode can be enforced with the "uaccess_primary" kernel parameter. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/mm,tlb: optimize TLB flushing for zEC12Martin Schwidefsky2014-04-0311-66/+213
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The zEC12 machines introduced the local-clearing control for the IDTE and IPTE instruction. If the control is set only the TLB of the local CPU is cleared of entries, either all entries of a single address space for IDTE, or the entry for a single page-table entry for IPTE. Without the local-clearing control the TLB flush is broadcasted to all CPUs in the configuration, which is expensive. The reset of the bit mask of the CPUs that need flushing after a non-local IDTE is tricky. As TLB entries for an address space remain in the TLB even if the address space is detached a new bit field is required to keep track of attached CPUs vs. CPUs in the need of a flush. After a non-local flush with IDTE the bit-field of attached CPUs is copied to the bit-field of CPUs in need of a flush. The ordering of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is such that an underindication in mm_cpumask(mm) is prevented but an overindication in mm_cpumask(mm) is possible. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/mm,tlb: safeguard against speculative TLB creationMartin Schwidefsky2014-04-033-18/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The principles of operations states that the CPU is allowed to create TLB entries for an address space anytime while an ASCE is loaded to the control register. This is true even if the CPU is running in the kernel and the user address space is not (actively) accessed. In theory this can affect two aspects of the TLB flush logic. For full-mm flushes the ASCE of the dying process is still attached. The approach to flush first with IDTE and then just free all page tables can in theory lead to stale TLB entries. Use the batched free of page tables for the full-mm flushes as well. For operations that can have a stale ASCE in the control register, e.g. a delayed update_user_asce in switch_mm, load the kernel ASCE to prevent invalid TLBs from being created. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/irq: Use defines for external interruption codesThomas Huth2014-04-0310-23/+30
| | | | | | | | | | | | | | | | | | | | | | Use the new defines for external interruption codes to get rid of "magic" numbers in the s390 source code. And while we're at it, also rename the (un-)register_external_interrupt function to something shorter so that this patch does not exceed the 80 columns all over the place. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/irq: Add defines for external interruption codesThomas Huth2014-04-031-0/+13
| | | | | | | | | | | | | | | | Introduce defines for external interruption codes so that we can get rid of some "magic" numbers in the s390 source code. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * kvm/s390: also set guest pages back to stable on kexec/kdumpChristian Borntraeger2014-04-011-0/+4
| | | | | | | | | | | | | | | | | | We need to reset the usage state of the pages on kexec/kdump, which use subcode 0 and 1. We will only do the cmma reset in the kernel, everything else is done in userspace as before. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * s390/bitops,atomic: add missing memory barriersHeiko Carstens2014-04-012-46/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When reworking the bitops and atomic ops I missed that those instructions that got atomic behaviour only perform a "specific-operand-serialization" instead of a full "serialization". The compare-and-swap instruction used before performs a full serialization before and after the instruction is executed, which means it has full memory barrier semantics. In order to give the new bitops and atomic ops functions also full memory barrier semantics add a "bcr 14,0" before and after each of those new instructions which performs full serialization as well. This restores memory barrier semantics for bitops and atomic ops functions which return values, like e.g. atomic_add_return(), but not for functions which do not return a value, like e.g. atomic_add(). This is consistent to other architectures and what common code requires. Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* | Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds2014-04-084-0/+209
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm updates from Dave Airlie: "Highlights: - drm: Generic display port aux features, primary plane support, drm master management fixes, logging cleanups, enforced locking checks (instead of docs), documentation improvements, minor number handling cleanup, pseudofs for shared inodes. - ttm: add ability to allocate from both ends - i915: broadwell features, power domain and runtime pm, per-process address space infrastructure (not enabled) - msm: power management, hdmi audio support - nouveau: ongoing GPU fault recovery, initial maxwell support, random fixes - exynos: refactored driver to clean up a lot of abstraction, DP support moved into drm, LVDS bridge support added, parallel panel support - gma500: SGX MMU support, SGX irq handling, asle irq work fixes - radeon: video engine bringup, ring handling fixes, use dp aux helpers - vmwgfx: add rendernode support" * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (849 commits) DRM: armada: fix corruption while loading cursors drm/dp_helper: don't return EPROTO for defers (v2) drm/bridge: export ptn3460_init function drm/exynos: remove MODULE_DEVICE_TABLE definitions ARM: dts: exynos4412-trats2: enable exynos/fimd node ARM: dts: exynos4210-trats: enable exynos/fimd node ARM: dts: exynos4412-trats2: add panel node ARM: dts: exynos4210-trats: add panel node ARM: dts: exynos4: add MIPI DSI Master node drm/panel: add S6E8AA0 driver ARM: dts: exynos4210-universal_c210: add proper panel node drm/panel: add ld9040 driver panel/ld9040: add DT bindings panel/s6e8aa0: add DT bindings drm/exynos: add DSIM driver exynos/dsim: add DT bindings drm/exynos: disallow fbdev initialization if no device is connected drm/mipi_dsi: create dsi devices only for nodes with reg property drm/mipi_dsi: add flags to DSI messages Skip intel_crt_init for Dell XPS 8700 ...
| * \ Merge tag 'drm-intel-fixes-2014-04-04' of ↵Dave Airlie2014-04-0533-251/+105
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://anongit.freedesktop.org/drm-intel into drm-next Merge window -fixes pull request as usual. Well, I did sneak in Jani's drm_i915_private_t typedef removal, need to have fun with a big sed job too ;-) Otherwise: - hdmi interlaced fixes (Jesse&Ville) - pipe error/underrun/crc tracking fixes, regression in late 3.14-rc (but not cc: stable since only really relevant for igt runs) - large cursor wm fixes (Chris) - fix gpu turbo boost/throttle again, was getting stuck due to vlv rps patches (Chris+Imre) - fix runtime pm fallout (Paulo) - bios framebuffer inherit fix (Chris) - a few smaller things * tag 'drm-intel-fixes-2014-04-04' of git://anongit.freedesktop.org/drm-intel: (196 commits) Skip intel_crt_init for Dell XPS 8700 drm/i915: vlv: fix RPS interrupt mask setting Revert "drm/i915/vlv: fixup DDR freq detection per Punit spec" drm/i915: move power domain init earlier during system resume drm/i915: Fix the computation of required fb size for pipe drm/i915: don't get/put runtime PM at the debugfs forcewake file drm/i915: fix WARNs when reading DDI state while suspended drm/i915: don't read cursor registers on powered down pipes drm/i915: get runtime PM at i915_display_info drm/i915: don't read pp_ctrl_reg if we're suspended drm/i915: get runtime PM at i915_reg_read_ioctl drm/i915: don't schedule force_wake_timer at gen6_read drm/i915: vlv: reserve the GT power context only once during driver init drm/i915: prefer struct drm_i915_private to drm_i915_private_t drm/i915/overlay: prefer struct drm_i915_private to drm_i915_private_t drm/i915/ringbuffer: prefer struct drm_i915_private to drm_i915_private_t drm/i915/display: prefer struct drm_i915_private to drm_i915_private_t drm/i915/irq: prefer struct drm_i915_private to drm_i915_private_t drm/i915/gem: prefer struct drm_i915_private to drm_i915_private_t drm/i915/dma: prefer struct drm_i915_private to drm_i915_private_t ...
| * | | ARM: dts: exynos4412-trats2: enable exynos/fimd nodeAndrzej Hajda2014-04-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch changes fimd node status to OK. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
| * | | ARM: dts: exynos4210-trats: enable exynos/fimd nodeAndrzej Hajda2014-04-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch changes fimd node status to OK. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
| * | | ARM: dts: exynos4412-trats2: add panel nodeAndrzej Hajda2014-04-041-0/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds s6e8aa0 panel node for trats2. It adds also trats2 specific properties for DSI and regulator required by panel. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
| * | | ARM: dts: exynos4210-trats: add panel nodeAndrzej Hajda2014-04-041-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds s6e8aa0 panel node for trats. It adds also trats specific properties for DSI. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
| * | | ARM: dts: exynos4: add MIPI DSI Master nodeAndrzej Hajda2014-04-041-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a common part of DSI node for all Exynos4 boards. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
| * | | ARM: dts: exynos4210-universal_c210: add proper panel nodeAndrzej Hajda2014-04-041-17/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces panel bindings for panel initialized by boot loader with bindings to proper ld9040 panel. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
| * | | ARM: dts: exynos4210-universal: add exynos/fimd nodeAndrzej Hajda2014-03-231-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds fimd node with display timings for exynos4210-universal device. It also makes LCD regulators always on. This allow to re-use panel initialized by boot loader. Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Reviewed-by: Tomasz Figa <t.figa@samsung.com> Signed-off-by: Inki Dae <inki.dae@samsung.com>
* | | | Merge tag 'arc-v3.15-rc1' of ↵Linus Torvalds2014-04-0821-115/+114
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC changes from Vineet Gupta: - Support for external initrd from Noam - Fix broken serial console in nsimosci Virtual Platform - Reuse of ENTRY/END assembler macros across hand asm code - Other minor fixes here and there * tag 'arc-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: [nsimosci] Unbork console ARC: [nsimosci] Change .dts to use generic 8250 UART ARC: [SMP] General Fixes ARC: Remove unused DT template file ARC: [clockevent] simplify timer ISR ARC: [clockevent] can't be SoC specific ARC: Remove ARC_HAS_COH_RTSC ARC: switch to generic ENTRY/END assembler annotations ARC: support external initrd ARC: add uImage to .gitignore ARC: [arcfpga] Fix __initconst data const-correctness
| * | | | ARC: [nsimosci] Unbork consoleVineet Gupta2014-04-052-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Despite the switch to right UART driver (prev patch), serial console still doesn't work due to missing CONFIG_SERIAL_OF_PLATFORM Also fix the default cmdline in DT to not refer to out-of-tree ARC framebuffer driver for console. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: <stable@vger.kernel.org> #3.10, 3.12, 3.13, 3.14 Cc: Francois Bedard <Francois.Bedard@synopsys.com>
| * | | | ARC: [nsimosci] Change .dts to use generic 8250 UARTMischa Jonker2014-04-051-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Synopsys APB DW UART has a couple of special features that are not in the System C model. In 3.8, the 8250_dw driver didn't really use these features, but from 3.9 onwards, the 8250_dw driver has become incompatible with our model. Signed-off-by: Mischa Jonker <mjonker@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: <stable@vger.kernel.org> #3.10, 3.12, 3.13, 3.14 Cc: Francois Bedard <Francois.Bedard@synopsys.com>
| * | | | ARC: [SMP] General FixesVineet Gupta2014-04-052-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | -Pass the expected arg to non-boot park'ing routine (It worked so far because existing SMP backends don't use the arg) -CONFIG_DEBUG_PREEMPT warning
| * | | | ARC: Remove unused DT template fileVineet Gupta2014-03-311-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit abe11ddea1d759f9 "Enabling DeviceTree for Angel4 board" made angel4.dts as default leaving the orphan skeleton file behind. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * | | | ARC: [clockevent] simplify timer ISRVineet Gupta2014-03-261-19/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Remove one liner IRQ ACK accessor, it was coming in the way of readability. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * | | | ARC: [clockevent] can't be SoC specificVineet Gupta2014-03-261-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So no point keeping it weak Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * | | | ARC: Remove ARC_HAS_COH_RTSCRichard Weinberger2014-03-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The symbol is an orphan, get rid of it. Fixes: 7d0857a54aed ("ARC: [SMP] Disallow RTSC") Signed-off-by: Richard Weinberger <richard@nod.at> Acked-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * | | | ARC: switch to generic ENTRY/END assembler annotationsVineet Gupta2014-03-2611-70/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With commit 9df62f054406 "arch: use ASM_NL instead of ';'" the generic macros can handle the arch specific newline quirk. Hence we can get rid of ARC asm macros and use the "C" style macros. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * | | | ARC: support external initrdNoam Camus2014-03-261-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently ARC only supports embedded initrd. This patch enables external ones too. [vgupta: Changed from "rt_start"=start/"rd_size"=sz to unified "initrd"=start,sz] Signed-off-by: Noam Camus <noamc@ezchip.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * | | | ARC: add uImage to .gitignoreMischa Jonker2014-03-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Mischa Jonker <mjonker@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
| * | | | ARC: [arcfpga] Fix __initconst data const-correctnessMischa Jonker2014-03-261-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Mischa Jonker <mjonker@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
* | | | | Merge branch 'akpm' (incoming from Andrew)Linus Torvalds2014-04-0856-393/+297
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge second patch-bomb from Andrew Morton: - the rest of MM - zram updates - zswap updates - exit - procfs - exec - wait - crash dump - lib/idr - rapidio - adfs, affs, bfs, ufs - cris - Kconfig things - initramfs - small amount of IPC material - percpu enhancements - early ioremap support - various other misc things * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits) MAINTAINERS: update Intel C600 SAS driver maintainers fs/ufs: remove unused ufs_super_block_third pointer fs/ufs: remove unused ufs_super_block_second pointer fs/ufs: remove unused ufs_super_block_first pointer fs/ufs/super.c: add __init to init_inodecache() doc/kernel-parameters.txt: add early_ioremap_debug arm64: add early_ioremap support arm64: initialize pgprot info earlier in boot x86: use generic early_ioremap mm: create generic early_ioremap() support x86/mm: sparse warning fix for early_memremap lglock: map to spinlock when !CONFIG_SMP percpu: add preemption checks to __this_cpu ops vmstat: use raw_cpu_ops to avoid false positives on preemption checks slub: use raw_cpu_inc for incrementing statistics net: replace __this_cpu_inc in route.c with raw_cpu_inc modules: use raw_cpu_write for initialization of per cpu refcount. mm: use raw_cpu ops for determining current NUMA node percpu: add raw_cpu_ops slub: fix leak of 'name' in sysfs_slab_add ...
| * | | | | arm64: add early_ioremap supportMark Salter2014-04-0810-50/+167
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for early IO or memory mappings which are needed before the normal ioremap() is usable. This also adds fixmap support for permanent fixed mappings such as that used by the earlyprintk device register region. Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | arm64: initialize pgprot info earlier in bootMark Salter2014-04-083-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Presently, paging_init() calls init_mem_pgprot() to initialize pgprot values used by macros such as PAGE_KERNEL, PAGE_KERNEL_EXEC, etc. The new fixmap and early_ioremap support also needs to use these macros before paging_init() is called. This patch moves the init_mem_pgprot() call out of paging_init() and into setup_arch() so that pgprot_default gets initialized in time for fixmap and early_ioremap. Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | x86: use generic early_ioremapMark Salter2014-04-086-240/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move x86 over to the generic early ioremap implementation. Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | x86/mm: sparse warning fix for early_memremapDave Young2014-04-082-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch series takes the common bits from the x86 early ioremap implementation and creates a generic implementation which may be used by other architectures. The early ioremap interfaces are intended for situations where boot code needs to make temporary virtual mappings before the normal ioremap interfaces are available. Typically, this means before paging_init() has run. This patch (of 6): There's a lot of sparse warnings for code like below: void *a = early_memremap(phys_addr, size); early_memremap intend to map kernel memory with ioremap facility, the return pointer should be a kernel ram pointer instead of iomem one. For making the function clearer and supressing sparse warnings this patch do below two things: 1. cast to (__force void *) for the return value of early_memremap 2. add early_memunmap function and pass (__force void __iomem *) to iounmap From Boris: "Ingo told me yesterday, it makes sense too. I'd guess we can try it. FWIW, all callers of early_memremap use the memory they get remapped as normal memory so we should be safe" Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Mark Salter <msalter@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | percpu: add raw_cpu_opsChristoph Lameter2014-04-082-57/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel has never been audited to ensure that this_cpu operations are consistently used throughout the kernel. The code generated in many places can be improved through the use of this_cpu operations (which uses a segment register for relocation of per cpu offsets instead of performing address calculations). The patch set also addresses various consistency issues in general with the per cpu macros. A. The semantics of __this_cpu_ptr() differs from this_cpu_ptr only because checks are skipped. This is typically shown through a raw_ prefix. So this patch set changes the places where __this_cpu_ptr() is used to raw_cpu_ptr(). B. There has been the long term wish by some that __this_cpu operations would check for preemption. However, there are cases where preemption checks need to be skipped. This patch set adds raw_cpu operations that do not check for preemption and then adds preemption checks to the __this_cpu operations. C. The use of __get_cpu_var is always a reference to a percpu variable that can also be handled via a this_cpu operation. This patch set replaces all uses of __get_cpu_var with this_cpu operations. D. We can then use this_cpu RMW operations in various places replacing sequences of instructions by a single one. E. The use of this_cpu operations throughout will allow other arches than x86 to implement optimized references and RMV operations to work with per cpu local data. F. The use of this_cpu operations opens up the possibility to further optimize code that relies on synchronization through per cpu data. The patch set works in a couple of stages: I. Patch 1 adds the additional raw_cpu operations and raw_cpu_ptr(). Also converts the existing __this_cpu_xx_# primitive in the x86 code to raw_cpu_xx_#. II. Patch 2-4 use the raw_cpu operations in places that would give us false positives once they are enabled. III. Patch 5 adds preemption checks to __this_cpu operations to allow checking if preemption is properly disabled when these functions are used. IV. Patches 6-20 are patches that simply replace uses of __get_cpu_var with this_cpu_ptr. They do not depend on any changes to the percpu code. No preemption tests are skipped if they are applied. V. Patches 21-46 are conversion patches that use this_cpu operations in various kernel subsystems/drivers or arch code. VI. Patches 47/48 (not included in this series) remove no longer used functions (__this_cpu_ptr and __get_cpu_var). These should only be applied after all the conversion patches have made it and after we have done additional passes through the kernel to ensure that none of the uses of these functions remain. This patch (of 46): The patches following this one will add preemption checks to __this_cpu ops so we need to have an alternative way to use this_cpu operations without preemption checks. raw_cpu_ops will be the basis for all other ops since these will be the operations that do not implement any checks. Primitive operations are renamed by this patch from __this_cpu_xxx to raw_cpu_xxxx. Also change the uses of the x86 percpu primitives in preempt.h. These depend directly on asm/percpu.h (header #include nesting issue). Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Alex Shi <alex.shi@intel.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bryan Wu <cooloney@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: David Daney <david.daney@cavium.com> Cc: David Miller <davem@davemloft.net> Cc: David S. Miller <davem@davemloft.net> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Hedi Berriche <hedi@sgi.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: John Stultz <john.stultz@linaro.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Mike Travis <travis@sgi.com> Cc: Neil Brown <neilb@suse.de> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Robert Richter <rric@kernel.org> Cc: Russell King <linux@arm.linux.org.uk> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | asm/system.h: um: arch_align_stack() moved to asm/exec.hDavid Howells2014-04-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arch_align_stack() moved to asm/exec.h, so change the comment referring to asm/system.h which no longer exists. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAPUwe Kleine-König2014-04-0829-43/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the renamed symbol is defined lib/iomap.c implements ioport_map and ioport_unmap and currently (nearly) all platforms define the port accessor functions outb/inb and friend unconditionally. So HAS_IOPORT_MAP is the better name for this. Consequently NO_IOPORT is renamed to NO_IOPORT_MAP. The motivation for this change is to reintroduce a symbol HAS_IOPORT that signals if outb/int et al are available. I will address that at least one merge window later though to keep surprises to a minimum and catch new introductions of (HAS|NO)_IOPORT. The changes in this commit were done using: $ git grep -l -E '(NO|HAS)_IOPORT' | xargs perl -p -i -e 's/\b((?:CONFIG_)?(?:NO|HAS)_IOPORT)\b/$1_MAP/' Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | x86: always define BUG() and HAVE_ARCH_BUG, even with !CONFIG_BUGJosh Triplett2014-04-081-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This ensures that BUG() always has a definition that causes a trap (via an undefined instruction), and that the compiler still recognizes the code following BUG() as unreachable, avoiding warnings that would otherwise appear (such as on non-void functions that don't return a value after BUG()). In addition to saving a few bytes over the generic infinite-loop implementation, this implementation traps rather than looping, which potentially allows for better error-recovery behavior (such as by rebooting). Signed-off-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | ppc: make PPC_BOOK3S_64 select IRQ_WORKJosh Triplett2014-04-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix breakage which will be exposed by the patch "kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT". arch/powerpc/kernel/mce.c, compiled in for PPC_BOOK3S_64, calls functions only built when IRQ_WORK, so select it. Fixes the following build error: arch/powerpc/kernel/built-in.o: In function `.machine_check_queue_event': (.text+0x11260): undefined reference to `.irq_work_queue' Signed-off-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | ia64: select CONFIG_TTY for use of tty_write_message in unalignedJosh Triplett2014-04-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix breakage which will be exposed by the patch "kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT". arch/ia64/kernel/unaligned.c uses tty_write_message to print an unaligned access exception to the TTY of the current user process. Enable TTY to prevent a build error. Minimal fix, on the basis that few people on ia64 will care deeply about kernel size enough to turn off TTY. Ideally, I'd instead suggest dropping the tty_write_message entirely, and just leaving the printk. Bonus: no need to sprintf first. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | cris: cpuinfo_op should depend on CONFIG_PROC_FSGeert Uytterhoeven2014-04-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix breakage which will be exposed by the patch "kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT". Now allnoconfig started disabling CONFIG_PROC_FS: arch/cris/kernel/built-in.o:(.rodata+0xc): undefined reference to `show_cpuinfo' make: *** [vmlinux] Error 1 Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | cris: make ETRAX_ARCH_V10 select TTY for use in debugportJosh Triplett2014-04-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix breakage which will be exposed by the patch "kconfig: make allnoconfig disable options behind EMBEDDED and EXPERT". arch/cris/arch-v10/kernel/debugport.c, compiled in unconditionally with ETRAX_ARCH_V10, requires TTY, so select TTY to avoid a build failure. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | rapidio: rework device hierarchy and introduce mport class of devicesAlexandre Bounine2014-04-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes an artificial RapidIO bus root device and establishes actual device hierarchy by providing reference to real parent devices. It also introduces device class for RapidIO controller devices (on-chip or an eternal bridge, known as "mport"). Existing implementation was sufficient for SoC-based platforms that have a single RapidIO controller. With introduction of devices using multiple RapidIO controllers and PCIe-to-RapidIO bridges the old scheme is very limiting or does not work at all. The implemented changes allow to properly reference platform's local RapidIO mport devices and provide device details needed for upper layers. This change to RapidIO device hierarchy does not break any known existing kernel or user space interfaces. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Stef van Os <stef.van.os@prodrive-technologies.com> Cc: Jerry Jacobs <jerry.jacobs@prodrive-technologies.com> Cc: Arno Tiemersma <arno.tiemersma@prodrive-technologies.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | include/linux/crash_dump.h: add vmcore_cleanup() prototypeRashika Kheria2014-04-081-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eliminate the following warning in proc/vmcore.c: fs/proc/vmcore.c:1088:6: warning: no previous prototype for `vmcore_cleanup' [-Wmissing-prototypes] [akpm@linux-foundation.org: clean up powerpc, remove unneeded EXPORT_SYMBOL] Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | mm: per-thread vma cachingDavidlohr Bueso2014-04-081-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is a continuation of efforts trying to optimize find_vma(), avoiding potentially expensive rbtree walks to locate a vma upon faults. The original approach (https://lkml.org/lkml/2013/11/1/410), where the largest vma was also cached, ended up being too specific and random, thus further comparison with other approaches were needed. There are two things to consider when dealing with this, the cache hit rate and the latency of find_vma(). Improving the hit-rate does not necessarily translate in finding the vma any faster, as the overhead of any fancy caching schemes can be too high to consider. We currently cache the last used vma for the whole address space, which provides a nice optimization, reducing the total cycles in find_vma() by up to 250%, for workloads with good locality. On the other hand, this simple scheme is pretty much useless for workloads with poor locality. Analyzing ebizzy runs shows that, no matter how many threads are running, the mmap_cache hit rate is less than 2%, and in many situations below 1%. The proposed approach is to replace this scheme with a small per-thread cache, maximizing hit rates at a very low maintenance cost. Invalidations are performed by simply bumping up a 32-bit sequence number. The only expensive operation is in the rare case of a seq number overflow, where all caches that share the same address space are flushed. Upon a miss, the proposed replacement policy is based on the page number that contains the virtual address in question. Concretely, the following results are seen on an 80 core, 8 socket x86-64 box: 1) System bootup: Most programs are single threaded, so the per-thread scheme does improve ~50% hit rate by just adding a few more slots to the cache. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 50.61% | 19.90 | | patched | 73.45% | 13.58 | +----------------+----------+------------------+ 2) Kernel build: This one is already pretty good with the current approach as we're dealing with good locality. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 75.28% | 11.03 | | patched | 88.09% | 9.31 | +----------------+----------+------------------+ 3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 70.66% | 17.14 | | patched | 91.15% | 12.57 | +----------------+----------+------------------+ 4) Ebizzy: There's a fair amount of variation from run to run, but this approach always shows nearly perfect hit rates, while baseline is just about non-existent. The amounts of cycles can fluctuate between anywhere from ~60 to ~116 for the baseline scheme, but this approach reduces it considerably. For instance, with 80 threads: +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 1.06% | 91.54 | | patched | 99.97% | 14.18 | +----------------+----------+------------------+ [akpm@linux-foundation.org: fix nommu build, per Davidlohr] [akpm@linux-foundation.org: document vmacache_valid() logic] [akpm@linux-foundation.org: attempt to untangle header files] [akpm@linux-foundation.org: add vmacache_find() BUG_ON] [hughd@google.com: add vmacache_valid_mm() (from Oleg)] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: adjust and enhance comments] Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Tested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | mm: revert "thp: make MADV_HUGEPAGE check for mm->def_flags"Alex Thorlton2014-04-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main motivation behind this patch is to provide a way to disable THP for jobs where the code cannot be modified, and using a malloc hook with madvise is not an option (i.e. statically allocated data). This patch allows us to do just that, without affecting other jobs running on the system. We need to do this sort of thing for jobs where THP hurts performance, due to the possibility of increased remote memory accesses that can be created by situations such as the following: When you touch 1 byte of an untouched, contiguous 2MB chunk, a THP will be handed out, and the THP will be stuck on whatever node the chunk was originally referenced from. If many remote nodes need to do work on that same chunk, they'll be making remote accesses. With THP disabled, 4K pages can be handed out to separate nodes as they're needed, greatly reducing the amount of remote accesses to memory. This patch is based on some of my work combined with some suggestions/patches given by Oleg Nesterov. The main goal here is to add a prctl switch to allow us to disable to THP on a per mm_struct basis. Here's a bit of test data with the new patch in place... First with the flag unset: # perf stat -a ./prctl_wrapper_mmv3 0 ./thp_pthread -C 0 -m 0 -c 512 -b 256g Setting thp_disabled for this task... thp_disable: 0 Set thp_disabled state to 0 Process pid = 18027 PF/ MAX MIN TOTCPU/ TOT_PF/ TOT_PF/ WSEC/ TYPE: CPUS WALL WALL SYS USER TOTCPU CPU WALL_SEC SYS_SEC CPU NODES 512 1.120 0.060 0.000 0.110 0.110 0.000 28571428864 -9223372036854775808 55803572 23 Performance counter stats for './prctl_wrapper_mmv3_hack 0 ./thp_pthread -C 0 -m 0 -c 512 -b 256g': 273719072.841402 task-clock # 641.026 CPUs utilized [100.00%] 1,008,986 context-switches # 0.000 M/sec [100.00%] 7,717 CPU-migrations # 0.000 M/sec [100.00%] 1,698,932 page-faults # 0.000 M/sec 355,222,544,890,379 cycles # 1.298 GHz [100.00%] 536,445,412,234,588 stalled-cycles-frontend # 151.02% frontend cycles idle [100.00%] 409,110,531,310,223 stalled-cycles-backend # 115.17% backend cycles idle [100.00%] 148,286,797,266,411 instructions # 0.42 insns per cycle # 3.62 stalled cycles per insn [100.00%] 27,061,793,159,503 branches # 98.867 M/sec [100.00%] 1,188,655,196 branch-misses # 0.00% of all branches 427.001706337 seconds time elapsed Now with the flag set: # perf stat -a ./prctl_wrapper_mmv3 1 ./thp_pthread -C 0 -m 0 -c 512 -b 256g Setting thp_disabled for this task... thp_disable: 1 Set thp_disabled state to 1 Process pid = 144957 PF/ MAX MIN TOTCPU/ TOT_PF/ TOT_PF/ WSEC/ TYPE: CPUS WALL WALL SYS USER TOTCPU CPU WALL_SEC SYS_SEC CPU NODES 512 0.620 0.260 0.250 0.320 0.570 0.001 51612901376 128000000000 100806448 23 Performance counter stats for './prctl_wrapper_mmv3_hack 1 ./thp_pthread -C 0 -m 0 -c 512 -b 256g': 138789390.540183 task-clock # 641.959 CPUs utilized [100.00%] 534,205 context-switches # 0.000 M/sec [100.00%] 4,595 CPU-migrations # 0.000 M/sec [100.00%] 63,133,119 page-faults # 0.000 M/sec 147,977,747,269,768 cycles # 1.066 GHz [100.00%] 200,524,196,493,108 stalled-cycles-frontend # 135.51% frontend cycles idle [100.00%] 105,175,163,716,388 stalled-cycles-backend # 71.07% backend cycles idle [100.00%] 180,916,213,503,160 instructions # 1.22 insns per cycle # 1.11 stalled cycles per insn [100.00%] 26,999,511,005,868 branches # 194.536 M/sec [100.00%] 714,066,351 branch-misses # 0.00% of all branches 216.196778807 seconds time elapsed As with previous versions of the patch, We're getting about a 2x performance increase here. Here's a link to the test case I used, along with the little wrapper to activate the flag: http://oss.sgi.com/projects/memtests/thp_pthread_mmprctlv3.tar.gz This patch (of 3): Revert commit 8e72033f2a48 and add in code to fix up any issues caused by the revert. The revert is necessary because hugepage_madvise would return -EINVAL when VM_NOHUGEPAGE is set, which will break subsequent chunks of this patch set. Here's a snip of an e-mail from Gerald detailing the original purpose of this code, and providing justification for the revert: "The intent of commit 8e72033f2a48 was to guard against any future programming errors that may result in an madvice(MADV_HUGEPAGE) on guest mappings, which would crash the kernel. Martin suggested adding the bit to arch/s390/mm/pgtable.c, if 8e72033f2a48 was to be reverted, because that check will also prevent a kernel crash in the case described above, it will now send a SIGSEGV instead. This would now also allow to do the madvise on other parts, if needed, so it is a more flexible approach. One could also say that it would have been better to do it this way right from the beginning..." Signed-off-by: Alex Thorlton <athorlton@sgi.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge tag 'cpu-hotplug-3.15-rc1' of ↵Linus Torvalds2014-04-0726-70/+158
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull CPU hotplug notifiers registration fixes from Rafael Wysocki: "The purpose of this single series of commits from Srivatsa S Bhat (with a small piece from Gautham R Shenoy) touching multiple subsystems that use CPU hotplug notifiers is to provide a way to register them that will not lead to deadlocks with CPU online/offline operations as described in the changelog of commit 93ae4f978ca7f ("CPU hotplug: Provide lockless versions of callback registration functions"). The first three commits in the series introduce the API and document it and the rest simply goes through the users of CPU hotplug notifiers and converts them to using the new method" * tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits) net/iucv/iucv.c: Fix CPU hotplug callback registration net/core/flow.c: Fix CPU hotplug callback registration mm, zswap: Fix CPU hotplug callback registration mm, vmstat: Fix CPU hotplug callback registration profile: Fix CPU hotplug callback registration trace, ring-buffer: Fix CPU hotplug callback registration xen, balloon: Fix CPU hotplug callback registration hwmon, via-cputemp: Fix CPU hotplug callback registration hwmon, coretemp: Fix CPU hotplug callback registration thermal, x86-pkg-temp: Fix CPU hotplug callback registration octeon, watchdog: Fix CPU hotplug callback registration oprofile, nmi-timer: Fix CPU hotplug callback registration intel-idle: Fix CPU hotplug callback registration clocksource, dummy-timer: Fix CPU hotplug callback registration drivers/base/topology.c: Fix CPU hotplug callback registration acpi-cpufreq: Fix CPU hotplug callback registration zsmalloc: Fix CPU hotplug callback registration scsi, fcoe: Fix CPU hotplug callback registration scsi, bnx2fc: Fix CPU hotplug callback registration scsi, bnx2i: Fix CPU hotplug callback registration ...
| * | | | | | arm64, debug-monitors: Fix CPU hotplug callback registrationSrivatsa S. Bhat2014-03-201-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the debug-monitors code in arm64 by using this latter form of callback registration. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | arm64, hw_breakpoint.c: Fix CPU hotplug callback registrationSrivatsa S. Bhat2014-03-201-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the hw-breakpoint code in arm64 by using this latter form of callback registration. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com> Cc: Ingo Molnar <mingo@kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | | | | | x86, kvm: Fix CPU hotplug callback registrationSrivatsa S. Bhat2014-03-201-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the kvm code in x86 by using this latter form of callback registration. Cc: Gleb Natapov <gleb@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>