summaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2012-08-202-0/+12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar. A x32 socket ABI fix with a -stable backport tag among other fixes. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x32: Use compat shims for {g,s}etsockopt Revert "x86-64/efi: Use EFI to deal with platform wall clock" x86, apic: fix broken legacy interrupts in the logical apic mode x86, build: Globally set -fno-pic x86, avx: don't use avx instructions with "noxsave" boot param
| * x86, apic: fix broken legacy interrupts in the logical apic modeSuresh Siddha2012-08-141-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent commit 332afa656e76458ee9cf0f0d123016a0658539e4 cleaned up a workaround that updates irq_cfg domain for legacy irq's that are handled by the IO-APIC. This was assuming that the recent changes in assign_irq_vector() were sufficient to remove the workaround. But this broke couple of AMD platforms. One of them seems to be sending interrupts to the offline cpu's, resulting in spurious "No irq handler for vector xx (irq -1)" messages when those cpu's come online. And the other platform seems to always send the interrupt to the last logical CPU (cpu-7). Recent changes had an unintended side effect of using only logical cpu-0 in the IO-APIC RTE (during boot for the legacy interrupts) and this broke the legacy interrupts not getting routed to the cpu-7 on the AMD platform, resulting in a boot hang. For now, reintroduce the removed workaround, (essentially not allowing the vector to change for legacy irq's when io-apic starts to handle the irq. Which also addressed the uninteded sife effect of just specifying cpu-0 in the IO-APIC RTE for those irq's during boot). Reported-and-tested-by: Robert Richter <robert.richter@amd.com> Reported-and-tested-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1344453412.29170.5.camel@sbsiddha-desk.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * x86, avx: don't use avx instructions with "noxsave" boot paramSuresh Siddha2012-08-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clear AVX, AVX2 features along with clearing XSAVE feature bits, as part of the parsing "noxsave" parameter. Fixes the kernel boot panic with "noxsave" boot parameter. We could have checked cpu_has_osxsave along with cpu_has_avx etc, but Peter mentioned clearing the feature bits will be better for uses like static_cpu_has() etc. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1343755754.2041.2.camel@sbsiddha-desk.sc.intel.com Cc: <stable@vger.kernel.org> # v3.5 Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | perf/x86: disable PEBS on a guest entry.Gleb Natapov2012-08-131-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | If PMU counter has PEBS enabled it is not enough to disable counter on a guest entry since PEBS memory write can overshoot guest entry and corrupt guest memory. Disabling PEBS during guest entry solves the problem. Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120809085234.GI3341@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | perf/x86: Add Intel Westmere-EX uncore supportYan, Zheng2012-08-132-33/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Westmere-EX uncore is similar to the Nehalem-EX uncore. The differences are: - Westmere-EX uncore has 10 instances of Cbox. The MSRs for Cbox8 and Cbox9 in the Westmere-EX aren't contiguous with Cbox 0~7. - The fvid field in the ZDP_CTL_FVC register in the Mbox is different. It's 5 bits in the Nehalem-EX, 6 bits in the Westmere-EX. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1344229882-3907-3-git-send-email-zheng.z.yan@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | perf/x86: Fixes for Nehalem-EX uncore driverYan, Zheng2012-08-132-109/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch includes following fixes and update: - Only some events in the Sbox and Mbox can use the match/mask registers, add code to check this. - The format definitions for xbr_mm_cfg and xbr_match registers in the Rbox are wrong, xbr_mm_cfg should use 32 bits, xbr_match should use 64 bits. - Cleanup the Rbox code. Compute the addresses extra registers in the enable_event function instead of the hw_config function. This simplifies the code in nhmex_rbox_alter_er(). Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1344229882-3907-2-git-send-email-zheng.z.yan@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | perf, x86: Fix uncore_types_exit section mismatchBorislav Petkov2012-08-131-1/+1
|/ | | | | | | | | | | | | | | | | | | | Fix the following section mismatch: WARNING: arch/x86/kernel/cpu/built-in.o(.text+0x7ad9): Section mismatch in reference from the function uncore_types_exit() to the function .init.text:uncore_type_exit() The function uncore_types_exit() references the function __init uncore_type_exit(). This is often because uncore_types_exit lacks a __init annotation or the annotation of uncore_type_exit is wrong. caused by 14371cce03c2 ("perf: Add generic PCI uncore PMU device support"). Cc: Zheng Yan <zheng.z.yan@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1339741902-8449-8-git-send-email-zheng.z.yan@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'release' of ↵Linus Torvalds2012-08-034-8/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull ACPI and power management fixes from Len Brown: "A 3.3 sleep regression fixed, numa bugfix, plus some minor cleanups" * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: ACPI processor: Fix tick_broadcast_mask online/offline regression ACPI: Only count valid srat memory structures ACPI: Untangle a return statement for better readability ACPI / PCI: Do not try to acquire _OSC control if that is hopeless ACPI: delete _GTS/_BFS support ACPI/x86: revert 'x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C function from assembler' ACPI: replace strlen("string") with sizeof("string") -1 ACPI / PM: Fix build warning in sleep.c for CONFIG_ACPI_SLEEP unset
| *-. Merge branches 'delete-gts-bfs', 'misc', 'novell-bugzilla-757888-numa' and ↵Len Brown2012-08-0370-1386/+5119
| |\ \ | | | | | | | | | | | | 'osc-pcie' into base
| * | | ACPI/x86: revert 'x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C ↵Len Brown2012-07-314-8/+6
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | function from assembler' cd74257b974d6d26442c97891c4d05772748b177 patched up GTS/BFS -- a feature we want to remove. So revert it (by hand, due to conflict in sleep.h) to prepare for GTS/BFS removal. Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2012-08-036-15/+48
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Various fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, kcmp: The kcmp system call can be common arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error case x86/mce: Add quirk for instruction recovery on Sandy Bridge processors x86/mce: Move MCACOD defines from mce-severity.c to <asm/mce.h> x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs x86, nops: Missing break resulting in incorrect selection on Intel x86: CONFIG_CC_STACKPROTECTOR=y is no longer experimental
| * | | arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error caseJulia Lawall2012-07-261-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Typically, the return value desired for the failure of a function with an integer return value is a negative integer. In these cases, the return value is sometimes a negative integer and sometimes 0, due to a subsequent initialization of the return variable within the loop. A simplified version of the semantic match that finds this problem is: (http://coccinelle.lip6.fr/) //<smpl> @r exists@ identifier ret; position p; constant C; expression e1,e3,e4; statement S; @@ ret = -C ... when != ret = e3 when any if@p (...) S ... when any if (\(ret != 0\|ret < 0\|ret > 0\) || ...) { ... return ...; } ... when != ret = e3 when any *if@p (...) { ... when != ret = e4 return ret; } //</smpl> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Link: http://lkml.kernel.org/r/1342284188-19176-7-git-send-email-Julia.Lawall@lip6.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | x86/mce: Add quirk for instruction recovery on Sandy Bridge processorsTony Luck2012-07-261-3/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sandy Bridge processors follow the SDM (Vol 3B, Table 15-20) and set both the RIPV and EIPV bits in the MCG_STATUS register to zero for machine checks during instruction fetch. This is more than a little counter-intuitive and means that Linux cannot recover from these errors. Rather than insert special case code at several places in mce.c and mce-severity.c, we pretend the EIPV bit was set for just this case early in processing the machine check. Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Chen Gong <gong.chen@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Link: http://lkml.kernel.org/r/180a06f3f357cf9f78259ae443a082b14a29535b.1343078495.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | x86/mce: Move MCACOD defines from mce-severity.c to <asm/mce.h>Tony Luck2012-07-261-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We will need some of these values in mce.c. Move them to the appropriate header file so they are available. Acked-by: Borislav Petkov <bp@amd64.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Chen Gong <gong.chen@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Link: http://lkml.kernel.org/r/0ccfb1af5fe35e537b7cd8e4d448bf7d851dbfb9.1343078495.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqsTomoki Sekiyama2012-07-262-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current kernel, percpu variable `vector_irq' is not always cleared when a CPU is offlined. If the CPU that has the disabled irqs in vector_irq is hotplugged again, __setup_vector_irq() hits invalid irq vector and may crash. This bug can be reproduced as following; # echo 0 > /sys/devices/system/cpu/cpu7/online # modprobe -r some_driver_using_interrupts # vector_irq@cpu7 uncleared # echo 1 > /sys/devices/system/cpu/cpu7/online # kernel may crash To fix this problem, this patch clears vector_irq in __fixup_irqs() when the CPU is offlined. This also reverts commit f6175f5bfb4c, which partially fixes this bug by clearing vector in __clear_irq_vector(). But in environments with IOMMU IRQ remapper, it could fail because cfg->domain doesn't contain offlined CPUs. With this patch, the fix in __clear_irq_vector() can be reverted because every vector_irq is already cleared in __fixup_irqs() on offlined CPUs. Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Yinghai Lu <yinghai@kernel.org> Cc: Alexander Gordeev <agordeev@redhat.com> Link: http://lkml.kernel.org/r/20120726104732.2889.19144.stgit@kvmdev Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | Merge branch 'linus' into x86/urgentIngo Molnar2012-07-2559-1065/+3362
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | Merge in Linus's tree to avoid a conflict. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | x86, nops: Missing break resulting in incorrect selection on IntelAlan Cox2012-07-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Intel case falls through into the generic case which then changes the values. For cases like the P6 it doesn't do the right thing so this seems to be a screwup. Signed-off-by: Alan Cox <alan@linux.intel.com> Link: http://lkml.kernel.org/n/tip-lww2uirad4skzjlmrm0vru8o@git.kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@vger.kernel.org>
* | | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2012-08-035-15/+107
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Fix merge window fallout and fix sleep profiling (this was always broken, so it's not a fix for the merge window - we can skip this one from the head of the tree)." * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/trace: Add ability to set a target task for events perf/x86: Fix USER/KERNEL tagging of samples properly perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bit
| * | | | perf/x86: Fix USER/KERNEL tagging of samples properlyPeter Zijlstra2012-07-314-14/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some PMUs don't provide a full register set for their sample, specifically 'advanced' PMUs like AMD IBS and Intel PEBS which provide 'better' than regular interrupt accuracy. In this case we use the interrupt regs as basis and over-write some fields (typically IP) with different information. The perf core however uses user_mode() to distinguish user/kernel samples, user_mode() relies on regs->cs. If the interrupt skid pushed us over a boundary the new IP might not be in the same domain as the interrupt. Commit ce5c1fe9a9e ("perf/x86: Fix USER/KERNEL tagging of samples") tried to fix this by making the perf core use kernel_ip(). This however is wrong (TM), as pointed out by Linus, since it doesn't allow for VM86 and non-zero based segments in IA32 mode. Therefore, provide a new helper to set the regs->ip field, set_linear_ip(), which massages the regs into a suitable state assuming the provided IP is in fact a linear address. Also modify perf_instruction_pointer() and perf_callchain_user() to deal with segments base offsets. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1341910954.3462.102.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bitAndrew Morton2012-07-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | i386 allmodconfig: arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function 'uncore_pmu_hrtimer': arch/x86/kernel/cpu/perf_event_intel_uncore.c:728: warning: integer overflow in expression arch/x86/kernel/cpu/perf_event_intel_uncore.c: In function 'uncore_pmu_start_hrtimer': arch/x86/kernel/cpu/perf_event_intel_uncore.c:735: warning: integer overflow in expression Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-h84qlqj02zrojmxxybzmy9hi@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2012-08-014-145/+1472
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The biggest changes are Intel Nehalem-EX PMU uncore support, uprobes updates/cleanups/fixes from Oleg and diverse tooling updates (mostly fixes) now that Arnaldo is back from vacation." * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) uprobes: __replace_page() needs munlock_vma_page() uprobes: Rename vma_address() and make it return "unsigned long" uprobes: Fix register_for_each_vma()->vma_address() check uprobes: Introduce vaddr_to_offset(vma, vaddr) uprobes: Teach build_probe_list() to consider the range uprobes: Remove insert_vm_struct()->uprobe_mmap() uprobes: Remove copy_vma()->uprobe_mmap() uprobes: Fix overflow in vma_address()/find_active_uprobe() uprobes: Suppress uprobe_munmap() from mmput() uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe() uprobes: Clean up and document write_opcode()->lock_page(old_page) uprobes: Kill write_opcode()->lock_page(new_page) uprobes: __replace_page() should not use page_address_in_vma() uprobes: Don't recheck vma/f_mapping in write_opcode() perf/x86: Fix missing struct before structure name perf/x86: Fix format definition of SNB-EP uncore QPI box perf/x86: Make bitfield unsigned perf/x86: Fix LLC-* and node-* events on Intel SandyBridge perf/x86: Add Intel Nehalem-EX uncore support perf/x86: Fix typo in format definition of uncore PCU filter ...
| * | | | perf/x86: Fix format definition of SNB-EP uncore QPI boxYan, Zheng2012-07-262-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The event control register of SNB-EP uncore QPI box has a one bit extension at bit position 21. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1343097850-4348-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | perf/x86: Make bitfield unsignedPeter Zijlstra2012-07-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix: arch/x86/kernel/cpu/perf_event.h:377:43: sparse: dubious one-bit signed bitfield Cc: Borislav Petkov <bp@amd64.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-2jxkmktkppkclj1qe6qxd7ah@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | perf/x86: Fix LLC-* and node-* events on Intel SandyBridgeYan, Zheng2012-07-261-6/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LLC-* and node-* events require using the OFFCORE_RESPONSE events on SandyBridge, but the hw_cache_extra_regs is left uninitialized. This patch adds the missing extra register configure table for SandyBridge. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1342517275-2875-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | perf/x86: Add Intel Nehalem-EX uncore supportYan, Zheng2012-07-262-129/+1352
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The uncore subsystem in Nehalem-EX consists of 7 components (U-Box, C-Box, B-Box, S-Box, R-Box, M-Box and W-Box). This patch is large because the way to program these boxes is diverse. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FF534F1.3030307@intel.com [ Improved the code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | perf/x86: Fix typo in format definition of uncore PCU filterYan, Zheng2012-07-261-8/+8
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The format definition of uncore PCU filter should be filter_band* instead of filter_brand*. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1343024611-4692-1-git-send-email-zheng.z.yan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | firmware_map: make firmware_map_add_early() argument consistent with ↵Yasuaki Ishimatsu2012-07-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | firmware_map_add_hotplug() There are two ways to create /sys/firmware/memmap/X sysfs: - firmware_map_add_early When the system starts, it is calledd from e820_reserve_resources() - firmware_map_add_hotplug When the memory is hot plugged, it is called from add_memory() But these functions are called without unifying value of end argument as below: - end argument of firmware_map_add_early() : start + size - 1 - end argument of firmware_map_add_hogplug() : start + size The patch unifies them to "start + size". Even if applying the patch, /sys/firmware/memmap/X/end file content does not change. [akpm@linux-foundation.org: clarify comments] Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2012-07-268-99/+224
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/mm changes from Peter Anvin: "The big change here is the patchset by Alex Shi to use INVLPG to flush only the affected pages when we only need to flush a small page range. It also removes the special INVALIDATE_TLB_VECTOR interrupts (32 vectors!) and replace it with an ordinary IPI function call." Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next to changed line) * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tlb: Fix build warning and crash when building for !SMP x86/tlb: do flush_tlb_kernel_range by 'invlpg' x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR x86/tlb: enable tlb flush range support for x86 mm/mmu_gather: enable tlb flush range in generic mmu_gather x86/tlb: add tlb_flushall_shift knob into debugfs x86/tlb: add tlb_flushall_shift for specific CPU x86/tlb: fall back to flush all when meet a THP large page x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range x86/tlb_info: get last level TLB entry number of CPU x86: Add read_mostly declaration/definition to variables from smp.h x86: Define early read-mostly per-cpu macros
| * | | | x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTORAlex Shi2012-06-282-91/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are 32 INVALIDATE_TLB_VECTOR now in kernel. That is quite big amount of vector in IDT. But it is still not enough, since modern x86 sever has more cpu number. That still causes heavy lock contention in TLB flushing. The patch using generic smp call function to replace it. That saved 32 vector number in IDT, and resolved the lock contention in TLB flushing on large system. In the NHM EX machine 4P * 8cores * HT = 64 CPUs, hackbench pthread has 3% performance increase. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-9-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | x86/tlb: add tlb_flushall_shift for specific CPUAlex Shi2012-06-282-2/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Testing show different CPU type(micro architectures and NUMA mode) has different balance points between the TLB flush all and multiple invlpg. And there also has cases the tlb flush change has no any help. This patch give a interface to let x86 vendor developers have a chance to set different shift for different CPU type. like some machine in my hands, balance points is 16 entries on Romely-EP; while it is at 8 entries on Bloomfield NHM-EP; and is 256 on IVB mobile CPU. but on model 15 core2 Xeon using invlpg has nothing help. For untested machine, do a conservative optimization, same as NHM CPU. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | x86/tlb_info: get last level TLB entry number of CPUAlex Shi2012-06-283-0/+172
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For 4KB pages, x86 CPU has 2 or 1 level TLB, first level is data TLB and instruction TLB, second level is shared TLB for both data and instructions. For hupe page TLB, usually there is just one level and seperated by 2MB/4MB and 1GB. Although each levels TLB size is important for performance tuning, but for genernal and rude optimizing, last level TLB entry number is suitable. And in fact, last level TLB always has the biggest entry number. This patch will get the biggest TLB entry number and use it in furture TLB optimizing. Accroding Borislav's suggestion, except tlb_ll[i/d]_* array, other function and data will be released after system boot up. For all kinds of x86 vendor friendly, vendor specific code was moved to its specific files. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-2-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| * | | | x86: Add read_mostly declaration/definition to variables from smp.hVlad Zolotarov2012-06-143-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add "read-mostly" qualifier to the following variables in smp.h: - cpu_sibling_map - cpu_core_map - cpu_llc_shared_map - cpu_llc_id - cpu_number - x86_cpu_to_apicid - x86_bios_cpu_apicid - x86_cpu_to_logical_apicid As long as all the variables above are only written during the initialization, this change is meant to prevent the false sharing. More specifically, on vSMP Foundation platform x86_cpu_to_apicid shared the same internode_cache_line with frequently written lapic_events. From the analysis of the first 33 per_cpu variables out of 219 (memories they describe, to be more specific) the 8 have read_mostly nature (tlb_vector_offset, cpu_loops_per_jiffy, xen_debug_irq, etc.) and 25 are frequently written (irq_stack_union, gdt_page, exception_stacks, idt_desc, etc.). Assuming that the spread of the rest of the per_cpu variables is similar, identifying the read mostly memories will make more sense in terms of long-term code maintenance comparing to identifying frequently written memories. Signed-off-by: Vlad Zolotarov <vlad@scalemp.com> Acked-by: Shai Fultheim <shai@scalemp.com> Cc: Shai Fultheim (Shai@ScaleMP.com) <Shai@scalemp.com> Cc: ido@wizery.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1719258.EYKzE4Zbq5@vlad Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2012-07-262-56/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The biggest change is a performance improvement on SMP systems: | 4 socket 40 core + SMT Westmere box, single 30 sec tbench | runs, higher is better: | | clients 1 2 4 8 16 32 64 128 |.......................................................................... | pre 30 41 118 645 3769 6214 12233 14312 | post 299 603 1211 2418 4697 6847 11606 14557 | | A nice increase in performance. which speedup is particularly noticeable on heavily interacting few-tasks workloads, so the changes should help desktop-style Xorg workloads and interactivity as well, on multi-core CPUs. There are also cpuset suspend behavior fixes/restructuring and various smaller tweaks." * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix race in task_group() sched: Improve balance_cpu() to consider other cpus in its group as target of (pinned) task sched: Reset loop counters if all tasks are pinned and we need to redo load balance sched: Reorder 'struct lb_env' members to reduce its size sched: Improve scalability via 'CPU buddies', which withstand random perturbations cpusets: Remove/update outdated comments cpusets, hotplug: Restructure functions that are invoked during hotplug cpusets, hotplug: Implement cpuset tree traversal in a helper function CPU hotplug, cpusets, suspend: Don't modify cpusets during suspend/resume sched/x86: Remove broken power estimation
| * | | | | sched/x86: Remove broken power estimationPeter Zijlstra2012-07-242-56/+1
| | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The x86 sched power implementation has been broken forever and gets in the way of other stuff, remove it. [ For archaeological interest, fixing this code would require dealing with the cross-cpu calling of these functions and more importantly, we need to filter idle time out of the a/m-perf stuff because the ratio will go down to 0 when idle, giving a 0 capacity which is not what we'd want. ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Link: http://lkml.kernel.org/r/1339594110.8980.38.camel@twins Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge tag 'iommu-updates-v3.6-rc1' of ↵Linus Torvalds2012-07-251-11/+0
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "The most important part of these updates is the IOMMU groups code enhancement written by Alex Williamson. It abstracts the problem that a given hardware IOMMU can't isolate any given device from any other device (e.g. 32 bit PCI devices can't usually be isolated). Devices that can't be isolated are grouped together. This code is required for the upcoming VFIO framework. Another IOMMU-API change written by me is the introduction of domain attributes. This makes it easier to handle GART-like IOMMUs with the IOMMU-API because now the start-address and the size of the domain address space can be queried. Besides that there are a few cleanups and fixes for the NVidia Tegra IOMMU drivers and the reworked init-code for the AMD IOMMU. The latter is from my patch-set to support interrupt remapping. The rest of this patch-set requires x86 changes which are not mergabe yet. So full support for interrupt remapping with AMD IOMMUs will come in a future merge window." * tag 'iommu-updates-v3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (33 commits) iommu/amd: Fix hotplug with iommu=pt iommu/amd: Add missing spin_lock initialization iommu/amd: Convert iommu initialization to state machine iommu/amd: Introduce amd_iommu_init_dma routine iommu/amd: Move unmap_flush message to amd_iommu_init_dma_ops() iommu/amd: Split enable_iommus() routine iommu/amd: Introduce early_amd_iommu_init routine iommu/amd: Move informational prinks out of iommu_enable iommu/amd: Split out PCI related parts of IOMMU initialization iommu/amd: Use acpi_get_table instead of acpi_table_parse iommu/amd: Fix sparse warnings iommu/tegra: Don't call alloc_pdir with as->lock iommu/tegra: smmu: Fix unsleepable memory allocation at alloc_pdir() iommu/tegra: smmu: Remove unnecessary sanity check at alloc_pdir() iommu/exynos: Implement DOMAIN_ATTR_GEOMETRY attribute iommu/tegra: Implement DOMAIN_ATTR_GEOMETRY attribute iommu/msm: Implement DOMAIN_ATTR_GEOMETRY attribute iommu/omap: Implement DOMAIN_ATTR_GEOMETRY attribute iommu/vt-d: Implement DOMAIN_ATTR_GEOMETRY attribute iommu/amd: Implement DOMAIN_ATTR_GEOMETRY attribute ...
| | \ \ \ \
| | \ \ \ \
| | \ \ \ \
| | \ \ \ \
| *---. \ \ \ \ Merge branches 'iommu/fixes', 'x86/amd', 'groups', 'arm/tegra' and ↵Joerg Roedel2012-07-231-11/+0
| |\ \ \ \ \ \ \ | | |_|_|/ / / / | |/| | | | | / | | | | | |_|/ | | | | |/| | | | | | | | | | | | | | | | | | | | | | | 'api/domain-attr' into next Conflicts: drivers/iommu/iommu.c include/linux/iommu.h
| | | * | | | iommu: Remove group_mfAlex Williamson2012-06-251-11/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The iommu=group_mf is really no longer needed with the addition of ACS support in IOMMU drivers creating groups. Most multifunction devices will now be grouped already. If a device has gone to the trouble of exposing ACS, trust that it works. We can use the device specific ACS function for fixing devices we trust individually. This largely reverts bcb71abe. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* | | | | | | Merge tag 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pciLinus Torvalds2012-07-251-1/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull PCI changes from Bjorn Helgaas: "Host bridge hotplug: - Add MMCONFIG support for hot-added host bridges (Jiang Liu) Device hotplug: - Move fixups from __init to __devinit (Sebastian Andrzej Siewior) - Call FINAL fixups for hot-added devices, too (Myron Stowe) - Factor out generic code for P2P bridge hot-add (Yinghai Lu) - Remove all functions in a slot, not just those with _EJx (Amos Kong) Dynamic resource management: - Track bus number allocation (struct resource tree per domain) (Yinghai Lu) - Make P2P bridge 1K I/O windows work with resource reassignment (Bjorn Helgaas, Yinghai Lu) - Disable decoding while updating 64-bit BARs (Bjorn Helgaas) Power management: - Add PCIe runtime D3cold support (Huang Ying) Virtualization: - Add VFIO infrastructure (ACS, DMA source ID quirks) (Alex Williamson) - Add quirks for devices with broken INTx masking (Jan Kiszka) Miscellaneous: - Fix some PCI Express capability version issues (Myron Stowe) - Factor out some arch code with a weak, generic, pcibios_setup() (Myron Stowe)" * tag 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (122 commits) PCI: hotplug: ensure a consistent return value in error case PCI: fix undefined reference to 'pci_fixup_final_inited' PCI: build resource code for M68K architecture PCI: pciehp: remove unused pciehp_get_max_lnk_width(), pciehp_get_cur_lnk_width() PCI: reorder __pci_assign_resource() (no change) PCI: fix truncation of resource size to 32 bits PCI: acpiphp: merge acpiphp_debug and debug PCI: acpiphp: remove unused res_lock sparc/PCI: replace pci_cfg_fake_ranges() with pci_read_bridge_bases() PCI: call final fixups hot-added devices PCI: move final fixups from __init to __devinit x86/PCI: move final fixups from __init to __devinit MIPS/PCI: move final fixups from __init to __devinit PCI: support sizing P2P bridge I/O windows with 1K granularity PCI: reimplement P2P bridge 1K I/O windows (Intel P64H2) PCI: disable MEM decoding while updating 64-bit MEM BARs PCI: leave MEM and IO decoding disabled during 64-bit BAR sizing, too PCI: never discard enable/suspend/resume_early/resume fixups PCI: release temporary reference in __nv_msi_ht_cap_quirk() PCI: restructure 'pci_do_fixups()' ...
| * | | | | | | x86/PCI: move fixup hooks from __init to __devinitSebastian Andrzej Siewior2012-06-121-1/+1
| | |_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fixups are executed once the pci-device is found which is during boot process so __init seems fine as long as the platform does not support hotplug. However it is possible to remove the PCI bus at run time and have it rediscovered again via "echo 1 > /sys/bus/pci/rescan" and this will call the fixups again. Cc: x86@kernel.org Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-07-241-1/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree from Jiri Kosina: "Trivial updates all over the place as usual." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (29 commits) Fix typo in include/linux/clk.h . pci: hotplug: Fix typo in pci iommu: Fix typo in iommu video: Fix typo in drivers/video Documentation: Add newline at end-of-file to files lacking one arm,unicore32: Remove obsolete "select MISC_DEVICES" module.c: spelling s/postition/position/g cpufreq: Fix typo in cpufreq driver trivial: typo in comment in mksysmap mach-omap2: Fix typo in debug message and comment scsi: aha152x: Fix sparse warning and make printing pointer address more portable. Change email address for Steve Glendinning Btrfs: fix typo in convert_extent_bit via: Remove bogus if check netprio_cgroup.c: fix comment typo backlight: fix memory leak on obscure error path Documentation: asus-laptop.txt references an obsolete Kconfig item Documentation: ManagementStyle: fixed typo mm/vmscan: cleanup comment error in balance_pgdat mm: cleanup on the comments of zone_reclaim_stat ...
| * | | | | | | module.c: spelling s/postition/position/gGeert Uytterhoeven2012-07-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* | | | | | | | Merge tag 'stable/for-linus-3.6-rc0-tag' of ↵Linus Torvalds2012-07-242-4/+22
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen update from Konrad Rzeszutek Wilk: "Features: * Performance improvement to lower the amount of traps the hypervisor has to do 32-bit guests. Mainly for setting PTE entries and updating TLS descriptors. * MCE polling driver to collect hypervisor MCE buffer and present them to /dev/mcelog. * Physical CPU online/offline support. When an privileged guest is booted it is present with virtual CPUs, which might have an 1:1 to physical CPUs but usually don't. This provides mechanism to offline/online physical CPUs. Bug-fixes for: * Coverity found fixes in the console and ACPI processor driver. * PVonHVM kexec fixes along with some cleanups. * Pages that fall within E820 gaps and non-RAM regions (and had been released to hypervisor) would be populated back, but potentially in non-RAM regions." * tag 'stable/for-linus-3.6-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: populate correct number of pages when across mem boundary (v2) xen PVonHVM: move shared_info to MMIO before kexec xen: simplify init_hvm_pv_info xen: remove cast from HYPERVISOR_shared_info assignment xen: enable platform-pci only in a Xen guest xen/pv-on-hvm kexec: shutdown watches from old kernel xen/x86: avoid updating TLS descriptors if they haven't changed xen/x86: add desc_equal() to compare GDT descriptors xen/mm: zero PTEs for non-present MFNs in the initial page table xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable xen/hvc: Fix up checks when the info is allocated. xen/acpi: Fix potential memory leak. xen/mce: add .poll method for mcelog device driver xen/mce: schedule a workqueue to avoid sleep in atomic context xen/pcpu: Xen physical cpus online/offline sys interface xen/mce: Register native mce handler as vMCE bounce back point x86, MCE, AMD: Adjust initcall sequence for xen xen/mce: Add mcelog support for Xen platform
| * | | | | | | | x86, MCE, AMD: Adjust initcall sequence for xenLiu, Jinsong2012-07-191-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | there are 3 funcs which need to be _initcalled in a logic sequence: 1. xen_late_init_mcelog 2. mcheck_init_device 3. threshold_init_device xen_late_init_mcelog must register xen_mce_chrdev_device before native mce_chrdev_device registration if running under xen platform; mcheck_init_device should be inited before threshold_init_device to initialize mce_device, otherwise a a NULL ptr dereference will cause panic. so we use following _initcalls 1. device_initcall(xen_late_init_mcelog); 2. device_initcall_sync(mcheck_init_device); 3. late_initcall(threshold_init_device); when running under xen, the initcall order is 1,2,3; on baremetal, we skip 1 and we do only 2 and 3. Acked-and-tested-by: Borislav Petkov <bp@amd64.org> Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | | | | | xen/mce: Add mcelog support for Xen platformLiu, Jinsong2012-07-191-3/+1
| | |_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When MCA error occurs, it would be handled by Xen hypervisor first, and then the error information would be sent to initial domain for logging. This patch gets error information from Xen hypervisor and convert Xen format error into Linux format mcelog. This logic is basically self-contained, not touching other kernel components. By using tools like mcelog tool users could read specific error information, like what they did under native Linux. To test follow directions outlined in Documentation/acpi/apei/einj.txt Acked-and-tested-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Ke, Liping <liping.ke@intel.com> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | | | | | | Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2012-07-243-3/+81
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Avi Kivity: "Highlights include - full big real mode emulation on pre-Westmere Intel hosts (can be disabled with emulate_invalid_guest_state=0) - relatively small ppc and s390 updates - PCID/INVPCID support in guests - EOI avoidance; 3.6 guests should perform better on 3.6 hosts on interrupt intensive workloads) - Lockless write faults during live migration - EPT accessed/dirty bits support for new Intel processors" Fix up conflicts in: - Documentation/virtual/kvm/api.txt: Stupid subchapter numbering, added next to each other. - arch/powerpc/kvm/booke_interrupts.S: PPC asm changes clashing with the KVM fixes - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c: Duplicated commits through the kvm tree and the s390 tree, with subsequent edits in the KVM tree. * tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits) KVM: fix race with level interrupts x86, hyper: fix build with !CONFIG_KVM_GUEST Revert "apic: fix kvm build on UP without IOAPIC" KVM guest: switch to apic_set_eoi_write, apic_write apic: add apic_set_eoi_write for PV use KVM: VMX: Implement PCID/INVPCID for guests with EPT KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check KVM: PPC: Critical interrupt emulation support KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests KVM: PPC64: booke: Set interrupt computation mode for 64-bit host KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt KVM: PPC: bookehv64: Add support for std/ld emulation. booke: Added crit/mc exception handler for e500v2 booke/bookehv: Add host crit-watchdog exception support KVM: MMU: document mmu-lock and fast page fault KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint KVM: MMU: trace fast page fault KVM: MMU: fast path of handling guest page fault KVM: MMU: introduce SPTE_MMU_WRITEABLE bit KVM: MMU: fold tlb flush judgement into mmu_spte_update ...
| * | | | | | | | x86, hyper: fix build with !CONFIG_KVM_GUESTAvi Kivity2012-07-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | | | | | KVM guest: switch to apic_set_eoi_write, apic_writeMichael S. Tsirkin2012-07-161-10/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use apic_set_eoi_write, apic_write to avoid meedling in core apic driver data structures directly. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | | | | | apic: add apic_set_eoi_write for PV useMichael S. Tsirkin2012-07-161-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM PV EOI optimization overrides eoi_write apic op with its own version. Add an API for this to avoid meddling with core x86 apic driver data structures directly. For KVM use, we don't need any guarantees about when the switch to the new op will take place, so it could in theory use this API after SMP init, but it currently doesn't, and restricting callers to early init makes it clear that it's safe as it won't race with actual APIC driver use. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | | | | | KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform checkPrarit Bhargava2012-07-112-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While debugging I noticed that unlike all the other hypervisor code in the kernel, kvm does not have an entry for x86_hyper which is used in detect_hypervisor_platform() which results in a nice printk in the syslog. This is only really a stub function but it does make kvm more consistent with the other hypervisors. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Marcelo Tostatti <mtosatti@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | | | | | KVM guest: guest side for eoi avoidanceMichael S. Tsirkin2012-06-251-3/+54
| | |_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The idea is simple: there's a bit, per APIC, in guest memory, that tells the guest that it does not need EOI. Guest tests it using a single est and clear operation - this is necessary so that host can detect interrupt nesting - and if set, it can skip the EOI MSR. I run a simple microbenchmark to show exit reduction (note: for testing, need to apply follow-up patch 'kvm: host side for eoi optimization' + a qemu patch I posted separately, on host): Before: Performance counter stats for 'sleep 1s': 47,357 kvm:kvm_entry [99.98%] 0 kvm:kvm_hypercall [99.98%] 0 kvm:kvm_hv_hypercall [99.98%] 5,001 kvm:kvm_pio [99.98%] 0 kvm:kvm_cpuid [99.98%] 22,124 kvm:kvm_apic [99.98%] 49,849 kvm:kvm_exit [99.98%] 21,115 kvm:kvm_inj_virq [99.98%] 0 kvm:kvm_inj_exception [99.98%] 0 kvm:kvm_page_fault [99.98%] 22,937 kvm:kvm_msr [99.98%] 0 kvm:kvm_cr [99.98%] 0 kvm:kvm_pic_set_irq [99.98%] 0 kvm:kvm_apic_ipi [99.98%] 22,207 kvm:kvm_apic_accept_irq [99.98%] 22,421 kvm:kvm_eoi [99.98%] 0 kvm:kvm_pv_eoi [99.99%] 0 kvm:kvm_nested_vmrun [99.99%] 0 kvm:kvm_nested_intercepts [99.99%] 0 kvm:kvm_nested_vmexit [99.99%] 0 kvm:kvm_nested_vmexit_inject [99.99%] 0 kvm:kvm_nested_intr_vmexit [99.99%] 0 kvm:kvm_invlpga [99.99%] 0 kvm:kvm_skinit [99.99%] 57 kvm:kvm_emulate_insn [99.99%] 0 kvm:vcpu_match_mmio [99.99%] 0 kvm:kvm_userspace_exit [99.99%] 2 kvm:kvm_set_irq [99.99%] 2 kvm:kvm_ioapic_set_irq [99.99%] 23,609 kvm:kvm_msi_set_irq [99.99%] 1 kvm:kvm_ack_irq [99.99%] 131 kvm:kvm_mmio [99.99%] 226 kvm:kvm_fpu [100.00%] 0 kvm:kvm_age_page [100.00%] 0 kvm:kvm_try_async_get_page [100.00%] 0 kvm:kvm_async_pf_doublefault [100.00%] 0 kvm:kvm_async_pf_not_present [100.00%] 0 kvm:kvm_async_pf_ready [100.00%] 0 kvm:kvm_async_pf_completed 1.002100578 seconds time elapsed After: Performance counter stats for 'sleep 1s': 28,354 kvm:kvm_entry [99.98%] 0 kvm:kvm_hypercall [99.98%] 0 kvm:kvm_hv_hypercall [99.98%] 1,347 kvm:kvm_pio [99.98%] 0 kvm:kvm_cpuid [99.98%] 1,931 kvm:kvm_apic [99.98%] 29,595 kvm:kvm_exit [99.98%] 24,884 kvm:kvm_inj_virq [99.98%] 0 kvm:kvm_inj_exception [99.98%] 0 kvm:kvm_page_fault [99.98%] 1,986 kvm:kvm_msr [99.98%] 0 kvm:kvm_cr [99.98%] 0 kvm:kvm_pic_set_irq [99.98%] 0 kvm:kvm_apic_ipi [99.99%] 25,953 kvm:kvm_apic_accept_irq [99.99%] 26,132 kvm:kvm_eoi [99.99%] 26,593 kvm:kvm_pv_eoi [99.99%] 0 kvm:kvm_nested_vmrun [99.99%] 0 kvm:kvm_nested_intercepts [99.99%] 0 kvm:kvm_nested_vmexit [99.99%] 0 kvm:kvm_nested_vmexit_inject [99.99%] 0 kvm:kvm_nested_intr_vmexit [99.99%] 0 kvm:kvm_invlpga [99.99%] 0 kvm:kvm_skinit [99.99%] 284 kvm:kvm_emulate_insn [99.99%] 68 kvm:vcpu_match_mmio [99.99%] 68 kvm:kvm_userspace_exit [99.99%] 2 kvm:kvm_set_irq [99.99%] 2 kvm:kvm_ioapic_set_irq [99.99%] 28,288 kvm:kvm_msi_set_irq [99.99%] 1 kvm:kvm_ack_irq [99.99%] 131 kvm:kvm_mmio [100.00%] 588 kvm:kvm_fpu [100.00%] 0 kvm:kvm_age_page [100.00%] 0 kvm:kvm_try_async_get_page [100.00%] 0 kvm:kvm_async_pf_doublefault [100.00%] 0 kvm:kvm_async_pf_not_present [100.00%] 0 kvm:kvm_async_pf_ready [100.00%] 0 kvm:kvm_async_pf_completed 1.002039622 seconds time elapsed We see that # of exits is almost halved. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>