path: root/arch/x86/mm
Commit message  [Author, Age, Files, Lines]
*---------------. Merge branches 'x86/apic', 'x86/cleanups', 'x86/cpufeature', 'x86/crashdump', 'x86/debug', 'x86/defconfig', 'x86/detect-hyper', 'x86/doc', 'x86/dumpstack', 'x86/early-printk', 'x86/fpu', 'x86/idle', 'x86/io', 'x86/memory-corruption-check', 'x86/microcode', 'x86/mm', 'x86/mtrr', 'x86/nmi-watchdog', 'x86/pat2', 'x86/pci-ioapic-boot-irq-quirks', 'x86/ptrace', 'x86/quirks', 'x86/reboot', 'x86/setup-memory', 'x86/signal', 'x86/sparse-fixes', 'x86/time', 'x86/uv' and 'x86/xen' into x86/core  [Ingo Molnar, 2008-12-23, 5 files, -18/+266]
|\ \ \ \ \ \ \ \ \
| | | | | | | | | * x86: PAT: remove follow_pfnmap_pte in favor of follow_phys  [venkatesh.pallipadi@intel.com, 2008-12-20, 1 file, -19/+11]
    Impact: Cleanup - removes a new function in favor of a recently modified older one.

    Replace follow_pfnmap_pte in pat code with follow_phys. follow_phys also returns protection, eliminating the need for a pte_pgprot call. Using follow_phys also eliminates the need for pte_pa.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | | | | | | | | * x86: PAT: add pgprot_writecombine() interface for drivers - v3  [venkatesh.pallipadi@intel.com, 2008-12-18, 1 file, -0/+8]
    Impact: New mm functionality.

    Add pgprot_writecombine. pgprot_writecombine will be aliased to pgprot_noncached when not supported by the architecture.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | | | | | | | | * x86: PAT: implement track/untrack of pfnmap regions for x86 - v3  [venkatesh.pallipadi@intel.com, 2008-12-18, 1 file, -0/+236]
| |_|_|_|_|_|_|_|/
|/| | | | | | | |
    Impact: New mm functionality.

    Hookup remap_pfn_range and vm_insert_pfn and corresponding copy and free routines with reserve and free tracking.

    reserve and free here only takes care of non RAM region mapping. For RAM region, driver should use set_memory_[uc|wc|wb] to set the cache type and then setup the mapping for user pte. We can bypass below reserve/free in that case.

    Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
    Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: H. Peter Anvin <hpa@zytor.com>
| | | | | | * | | Merge branch 'linus' into x86/memory-corruption-check  [Ingo Molnar, 2008-11-20, 9 files, -44/+166]
| | | | | | |\ \ \
| |_|_|_|_|_|/ / /
|/| | | | | | | |
| | | | | | * | | x86: corruption check: run the corruption checks from a work queue  [Arjan van de Ven, 2008-10-27, 2 files, -4/+0]
| | | | | | |/ /
    Impact: change the implementation of the debug feature

    the periodic corruption checks are better off run from a work queue; there's nothing time critical about them and this way the amount of interrupt-context work is reduced.

    Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | | | | * | | x86, dumpstack: let signr=0 signal no do_exit  [Alexander van Heukelum, 2008-10-22, 1 file, -4/+7]
    Change oops_end such that signr=0 signals that do_exit is not to be called.

    Currently, each use of __die is soon followed by a call to oops_end and 'regs' is set to NULL if oops_end is expected not to call do_exit. Change all such pairs to set signr=0 instead. On x86_64 oops_end is used 'bare' in die_nmi; use signr=0 instead of regs=NULL there, too.

    Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
    Acked-by: Neil Horman <nhorman@tuxdriver.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | | * | | | | x86, 32-bit: add some compile time checks to mem_init()  [Jan Beulich, 2008-12-16, 1 file, -1/+15]
    Some of the inconsistencies checked for at run time can be detected at build time already, so duplicate the checks done at run time to also be done at build time.

    Signed-off-by: Jan Beulich <jbeulich@novell.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
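The idea described in the commit above, repeating a run-time consistency check at build time, can be sketched in plain C11. This is a userspace illustration only: every `EXAMPLE_*` constant and the `layout_is_consistent()` helper are hypothetical and are not the names or values that mem_init() actually checks. Kernel code typically expresses such checks with `BUILD_BUG_ON()`; `static_assert` is used here as the standard-C stand-in.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>

/* Hypothetical address-space layout constants, for illustration only. */
#define EXAMPLE_PAGE_SIZE      0x1000UL
#define EXAMPLE_VMALLOC_START  0xf8000000UL
#define EXAMPLE_VMALLOC_END    0xff000000UL
#define EXAMPLE_FIXADDR_START  0xfff00000UL

/* Build-time checks: compilation fails if the constants are inconsistent. */
static_assert(EXAMPLE_VMALLOC_START < EXAMPLE_VMALLOC_END,
              "vmalloc area must not be empty");
static_assert(EXAMPLE_VMALLOC_END <= EXAMPLE_FIXADDR_START,
              "vmalloc area must end at or below the fixmap area");
static_assert(EXAMPLE_VMALLOC_START % EXAMPLE_PAGE_SIZE == 0,
              "vmalloc area must start on a page boundary");

/* Run-time twin of the first two checks, for values only known at boot. */
static bool layout_is_consistent(unsigned long vmalloc_start,
                                 unsigned long vmalloc_end,
                                 unsigned long fixaddr_start)
{
    return vmalloc_start < vmalloc_end && vmalloc_end <= fixaddr_start;
}
```

Keeping both forms means a bad constant is caught by the compiler, while values that depend on the booted machine are still checked when they become known.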
| | | * | | | | x86: soften multi-BAR mapping sanity check warning message  [Ingo Molnar, 2008-12-12, 1 file, -1/+2]
| |_|/ / / / /
|/| | | | | |
    Impact: make debug warning less scary

    The ioremap() time multi-BAR map warning has been causing false positives:

        http://lkml.org/lkml/2008/12/10/432
        http://lkml.org/lkml/2008/12/11/136

    So make it less scary by making it once-per-boot, by making it KERN_INFO and by adding this text:

        "Info: mapping multiple BARs. Your kernel is fine."

    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | x86, 32-bit: simplify alloc_low_page()  [Jan Beulich, 2008-12-16, 1 file, -8/+4]
    Impact: cleanup

    Neither of the callers really needs the physical address this function returns, so eliminate the pointless argument.

    Signed-off-by: Jan Beulich <jbeulich@novell.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | Merge commit 'v2.6.28-rc7' into x86/cleanups  [Ingo Molnar, 2008-12-04, 1 file, -0/+35]
| |\ \ \ \ \ \ \
| |/ / / / / /
|/| | | | | |
| * | | | | | Merge branch 'linus' into x86/cleanups  [Ingo Molnar, 2008-11-08, 8 files, -44/+131]
| |\| | | | |
| * | | | | | x86: avoid duplicate running of pud_offset and pmd_offset in one_md_table_init()  [Zhaolei, 2008-10-31, 1 file, -0/+2]
| | |_|_|/ /
| |/| | | |
    Impact: simplify implementation, cleanup

    If !(pgd_val(*pgd) & _PAGE_PRESENT) in PAE mode, we need not get value of pmd_table again.

    Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | | x86, hibernate: fix breakage on x86_32 with CONFIG_NUMA set  [Rafael J. Wysocki, 2008-11-12, 1 file, -0/+35]
| |/ / / /
|/| | | |
    Impact: fix crash during hibernation on 32-bit NUMA

    The NUMA code on x86_32 creates special memory mapping that allows each node's pgdat to be located in this node's memory. For this purpose it allocates a memory area at the end of each node's memory and maps this area so that it is accessible with virtual addresses belonging to low memory. As a result, if there is high memory, these NUMA-allocated areas are physically located in high memory, although they are mapped to low memory addresses.

    Our hibernation code does not take that into account and for this reason hibernation fails on all x86_32 systems with CONFIG_NUMA=y and with high memory present. Fix this by adding a special mapping for the NUMA-allocated memory areas to the temporary page tables created during the last phase of resume.

    Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  [Linus Torvalds, 2008-11-07, 1 file, -4/+4]
|\ \ \ \ \
    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
      Revert "x86: default to reboot via ACPI"
      x86: align DirectMap in /proc/meminfo
      AMD IOMMU: fix lazy IO/TLB flushing in unmap path
      x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR
      x86: remove VISWS and PARAVIRT around NR_IRQS puzzle
      x86: mention ACPI in top-level Kconfig menu
      x86: size NR_IRQS on 32-bit systems the same way as 64-bit
      x86: don't allow nr_irqs > NR_IRQS
      x86/docs: remove noirqbalance param docs
      x86: don't use tsc_khz to calculate lpj if notsc is passed
      x86, voyager: fix smp_intr_init() compile breakage
      AMD IOMMU: fix detection of NP capable IOMMUs
| * | | | | x86: align DirectMap in /proc/meminfo  [Hugh Dickins, 2008-11-06, 1 file, -4/+4]
| | |/ / /
| |/| | |
    Impact: right-align /proc/meminfo consistent with other fields

    When the split-LRU patches added Inactive(anon) and Inactive(file) lines to /proc/meminfo, all counts were moved two columns rightwards to fit in. Now move x86's DirectMap lines two columns rightwards to line up.

    Signed-off-by: Hugh Dickins <hugh@veritas.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* / | | | x86: add iomap_atomic*()/iounmap_atomic() on 32-bit using fixmaps  [Keith Packard, 2008-10-31, 3 files, -3/+61]
|/ / / /
    Impact: introduce new APIs, separate kmap code from CONFIG_HIGHMEM

    This takes the code used for CONFIG_HIGHMEM memory mappings except that it's designed for dynamic IO resource mapping. These fixmaps are available even with CONFIG_HIGHMEM turned off.

    Signed-off-by: Keith Packard <keithp@keithp.com>
    Signed-off-by: Eric Anholt <eric@anholt.net>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86: fix /dev/mem mmap breakage when PAT is disabled  [Ravikiran G Thirumalai, 2008-10-30, 1 file, -0/+4]
    Impact: allow /dev/mem mmaps on non-PAT CPUs/platforms

    Fix mmap to /dev/mem when CONFIG_X86_PAT is off and CONFIG_STRICT_DEVMEM is off.

    mmap to /dev/mem on kernel memory has been failing since the introduction of PAT (CONFIG_STRICT_DEVMEM=n case). Seems like the check to avoid cache aliasing with PAT is kicking in even when PAT is disabled. The bug seems to have crept in 2.6.26.

    This patch makes sure that the mmap to regular kernel memory succeeds if CONFIG_STRICT_DEVMEM=n and PAT is disabled, and that the checks to avoid cache aliasing still happen if PAT is enabled.

    Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
    Tested-by: Tim Sirianni <tim@scalemp.com>
    Cc: <stable@kernel.org>
    Acked-by: H. Peter Anvin <hpa@zytor.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86: remove debug code from arch_add_memory()  [Gary Hade, 2008-10-29, 1 file, -1/+1]
    Impact: remove incorrect WARN_ON(1)

    Gets rid of dmesg spam created during physical memory hot-add which will very likely confuse users. The change removes what appears to be debugging code which I assume was unintentionally included in:

        x86: arch/x86/mm/init_64.c printk fixes
        commit 10f22dde556d1ed41d55355d1fb8ad495f9810c8

    Signed-off-by: Gary Hade <garyhade@us.ibm.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86: start annotating early ioremap pointers with __iomem  [Harvey Harrison, 2008-10-29, 1 file, -11/+11]
    Impact: some new sparse warnings in e820.c etc, but no functional change.

    As with regular ioremap, iounmap etc, annotate with __iomem.

    Fixes the following sparse warnings, will produce some new ones elsewhere in arch/x86 that will get worked out over time.

        arch/x86/mm/ioremap.c:402:9: warning: cast removes address space of expression
        arch/x86/mm/ioremap.c:406:10: warning: cast adds address space to expression (<asn:2>)
        arch/x86/mm/ioremap.c:782:19: warning: Using plain integer as NULL pointer

    Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86: two trivial sparse annotations  [Harvey Harrison, 2008-10-29, 1 file, -1/+1]
    Impact: fewer sparse warnings, no functional changes

        arch/x86/kernel/vsmp_64.c:87:14: warning: incorrect type in argument 1 (different address spaces)
        arch/x86/kernel/vsmp_64.c:87:14:    expected void const volatile [noderef] <asn:2>*addr
        arch/x86/kernel/vsmp_64.c:87:14:    got void *[assigned] address
        arch/x86/kernel/vsmp_64.c:88:22: warning: incorrect type in argument 1 (different address spaces)
        arch/x86/kernel/vsmp_64.c:88:22:    expected void const volatile [noderef] <asn:2>*addr
        arch/x86/kernel/vsmp_64.c:88:22:    got void *
        arch/x86/kernel/vsmp_64.c:100:23: warning: incorrect type in argument 2 (different address spaces)
        arch/x86/kernel/vsmp_64.c:100:23:    expected void volatile [noderef] <asn:2>*addr
        arch/x86/kernel/vsmp_64.c:100:23:    got void *
        arch/x86/kernel/vsmp_64.c:101:23: warning: incorrect type in argument 1 (different address spaces)
        arch/x86/kernel/vsmp_64.c:101:23:    expected void const volatile [noderef] <asn:2>*addr
        arch/x86/kernel/vsmp_64.c:101:23:    got void *
        arch/x86/mm/gup.c:235:6: warning: incorrect type in argument 1 (different base types)
        arch/x86/mm/gup.c:235:6:    expected void const volatile [noderef] <asn:1>*<noident>
        arch/x86/mm/gup.c:235:6:    got unsigned long [unsigned] [assigned] start

    Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86: fix init_memory_mapping for [dc000000 - e0000000) - v2  [Yinghai Lu, 2008-10-28, 1 file, -17/+33]
    Impact: change over-mapping to precise mapping, fix /proc/meminfo output

    v2: fix less than 1G ram system handling

    When the gart aperture is 0xdc000000 - 0xe0000000, it returns 0xc0000000 - 0xe0000000, which is not right.

    This patch fixes that.

    With the patch, a 256g system with that aperture gets an exact mapping:

        LBSuse:~ # cat /proc/meminfo
        MemTotal: 264742432 kB
        MemFree: 263920628 kB
        Buffers: 1416 kB
        Cached: 24468 kB
        ...
        DirectMap4k: 5760 kB
        DirectMap2M: 3205120 kB
        DirectMap1G: 265289728 kB

    which is consistent with

        LBSuse:~ # cat /sys/kernel/debug/kernel_page_tables
        ..
        ---[ Low Kernel Mapping ]---
        0xffff880000000000-0xffff880000200000     2M RW         GLB x  pte
        0xffff880000200000-0xffff880040000000  1022M RW     PSE GLB x  pmd
        0xffff880040000000-0xffff8800c0000000     2G RW     PSE GLB NX pud
        0xffff8800c0000000-0xffff8800d7e00000   382M RW     PSE GLB NX pmd
        0xffff8800d7e00000-0xffff8800d7fa0000  1664K RW         GLB NX pte
        0xffff8800d7fa0000-0xffff8800d8000000   384K                   pte
        0xffff8800d8000000-0xffff8800dc000000    64M                   pmd
        0xffff8800dc000000-0xffff8800e0000000    64M RW     PSE GLB NX pmd
        0xffff8800e0000000-0xffff880100000000   512M                   pmd
        0xffff880100000000-0xffff880800000000    28G RW     PSE GLB NX pud
        0xffff880800000000-0xffff880824600000   582M RW     PSE GLB NX pmd
        0xffff880824600000-0xffff8808247f0000  1984K RW         GLB NX pte
        0xffff8808247f0000-0xffff880824800000    64K RW PCD     GLB NX pte
        0xffff880824800000-0xffff880840000000   440M RW     PSE GLB NX pmd
        0xffff880840000000-0xffff884000000000   223G RW     PSE GLB NX pud
        0xffff884000000000-0xffff884028000000   640M RW     PSE GLB NX pmd
        0xffff884028000000-0xffff884040000000   384M                   pmd
        0xffff884040000000-0xffff888000000000   255G                   pud
        0xffff888000000000-0xffffc20000000000 58880G                   pgd

    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86: 64 bit print out absent pages num too  [Yinghai Lu, 2008-10-28, 1 file, -3/+6]
| |_|/
|/| |
    so users are not confused with memhole causing big total ram

    we don't need to worry about 32 bit, because memhole is always above max_low_pfn.

    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | x86, memory hotplug: remove wrong -1 in calling init_memory_mapping()  [Shaohua Li, 2008-10-28, 1 file, -1/+1]
    Impact: fix crash with memory hotplug

    Shaohua Li found:

    | I just did some experiments on a desktop for memory hotplug and this bug
    | triggered a crash in my test.
    |
    | Yinghai's suggestion also fixed the bug.

    We don't need to round it, just remove that extra -1

    Signed-off-by: Yinghai <yinghai@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | x86: keep the /proc/meminfo page count correct  [Yinghai Lu, 2008-10-27, 1 file, -3/+9]
|/ /
    Impact: get correct page count in /proc/meminfo

    found page count in /proc/meminfo is not correct on 1G system in VirtualBox 2.0.4

        # cat /proc/meminfo
        MemTotal: 1017508 kB
        MemFree: 822700 kB
        Buffers: 1456 kB
        Cached: 26632 kB
        SwapCached: 0 kB
        ...
        Hugepagesize: 2048 kB
        DirectMap4k: 4032 kB
        DirectMap2M: 18446744073709549568 kB

    with this patch get:

        ...
        DirectMap4k: 4032 kB
        DirectMap2M: 1044480 kB

    which is consistent with kernel_page_tables

        ---[ Low Kernel Mapping ]---
        0xffff880000000000-0xffff880000001000     4K RW PCD     GLB x  pte
        0xffff880000001000-0xffff88000009f000   632K RW         GLB x  pte
        0xffff88000009f000-0xffff8800000a0000     4K RW PCD     GLB x  pte
        0xffff8800000a0000-0xffff880000200000  1408K RW         GLB x  pte
        0xffff880000200000-0xffff88003fe00000  1020M RW     PSE GLB x  pmd
        0xffff88003fe00000-0xffff88003fff0000  1984K RW         GLB NX pte
        0xffff88003fff0000-0xffff880040000000    64K                   pte
        0xffff880040000000-0xffff888000000000   511G                   pud
        0xffff888000000000-0xffffc20000000000 58880G                   pgd

    Signed-off-by: Yinghai Lu <yinghai@kernel.org>
    Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
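The bogus DirectMap2M value in the report above equals 2^64 - 2048, which is what an unsigned 64-bit page counter holding the bit pattern of -1 looks like once it is converted to kB. The following userspace sketch reproduces that arithmetic. It assumes (the commit message does not say so explicitly) that the cause was an unsigned counter being decremented below zero; the helper names `counter_sub()` and `direct_map_2m_kb()` are hypothetical.

```c
#include <stdint.h>

/* kB covered by a number of 2 MiB pages: each page is 2048 kB (shift by 11). */
static uint64_t direct_map_2m_kb(uint64_t nr_2m_pages)
{
    return nr_2m_pages << 11;
}

/*
 * Hypothetical counter update. Subtracting from an unsigned counter
 * wraps around modulo 2^64 instead of going negative.
 */
static uint64_t counter_sub(uint64_t counter, uint64_t n)
{
    return counter - n;
}
```

With these helpers, `direct_map_2m_kb(counter_sub(0, 1))` evaluates to 18446744073709549568, the value shown before the patch, while `direct_map_2m_kb(510)` gives 1044480, matching the 1020M of 2M mappings in the page-table dump.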
* | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  [Linus Torvalds, 2008-10-23, 1 file, -4/+3]
|\ \
    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
      x86: fix section mismatch warning - apic_x2apic_phys
      x86: fix section mismatch warning - apic_x2apic_cluster
      x86: fix section mismatch warning - apic_x2apic_uv_x
      x86: fix section mismatch warning - apic_physflat
      x86: fix section mismatch warning - apic_flat
      x86: memtest fix use of reserve_early()
      x86 syscall.h: fix argument order
      x86/tlb_uv: remove strange mc146818rtc include
      x86: remove redundant KERN_DEBUG on pr_debug
      x86: do_boot_cpu - check if we have ESR register
      x86: MAINTAINERS change for AMD microcode patch loader
      x86/proc: fix /proc/cpuinfo cpu offline bug
      x86: call dmi-quirks for HP Laptops after early-quirks are executed
      x86, kexec: fix hang on i386 when panic occurs while console_sem is held
      MCE: Don't run 32bit machine checks with interrupts on
      x86: SB600: skip IRQ0 override if it is not routed to INT2 of IOAPIC
      x86: make variables static
| * | x86: memtest fix use of reserve_early()  [Daniele Calore, 2008-10-22, 1 file, -4/+3]
| |/
    Hi all,

    Wrong usage of 2nd parameter in reserve_early call.

    66/75: reserve_early(start_bad, last_bad - start_bad, "BAD RAM");
                                    ^^^^^^^^^^^^^^^^^^^^

    The correct way is to use 'end' address and not 'size'.

    As a bonus a fix to the printk format.

    Signed-off-by: Daniele Calore <orkaan@orkaan.org>
    Acked-by: Yinghai Lu <yinghai@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
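The bug class fixed above, passing a size where an interface expects an end address, can be illustrated with a small userspace model. The `struct range`, `make_range()` and `range_size()` names are hypothetical and only mimic the (start, end) calling convention mentioned in the commit; treating the end as exclusive is an assumption of this sketch, and this is not the kernel's reserve_early().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open address range [start, end). */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Takes an END address, like the interface in the commit, not a size. */
static struct range make_range(uint64_t start, uint64_t end)
{
    assert(start <= end);   /* a size passed as 'end' usually trips this */
    struct range r = { start, end };
    return r;
}

static uint64_t range_size(struct range r)
{
    return r.end - r.start;
}
```

For a bad region from 0x5000 to 0x6000, the correct call is `make_range(0x5000, 0x6000)`. The mistaken form `make_range(0x5000, 0x6000 - 0x5000)` passes 0x1000 as the end, which lies below the start; an interface without such a precondition check would silently record a malformed range.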
* / proc: switch /proc/meminfo to seq_file  [Alexey Dobriyan, 2008-10-23, 1 file, -6/+5]
|/
    and move it to fs/proc/meminfo.c while I'm at it.

    Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
* Merge branch 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  [Linus Torvalds, 2008-10-20, 3 files, -103/+109]
|\
    * 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (131 commits)
      tracing/fastboot: improve help text
      tracing/stacktrace: improve help text
      tracing/fastboot: fix initcalls disposition in bootgraph.pl
      tracing/fastboot: fix bootgraph.pl initcall name regexp
      tracing/fastboot: fix issues and improve output of bootgraph.pl
      tracepoints: synchronize unregister static inline
      tracepoints: tracepoint_synchronize_unregister()
      ftrace: make ftrace_test_p6nop disassembler-friendly
      markers: fix synchronize marker unregister static inline
      tracing/fastboot: add better resolution to initcall debug/tracing
      trace: add build-time check to avoid overrunning hex buffer
      ftrace: fix hex output mode of ftrace
      tracing/fastboot: fix initcalls disposition in bootgraph.pl
      tracing/fastboot: fix printk format typo in boot tracer
      ftrace: return an error when setting a nonexistent tracer
      ftrace: make some tracers reentrant
      ring-buffer: make reentrant
      ring-buffer: move page indexes into page headers
      tracing/fastboot: only trace non-module initcalls
      ftrace: move pc counter in irqtrace
      ...

    Manually fix conflicts:
     - init/main.c: initcall tracing
     - kernel/module.c: verbose level vs tracepoints
     - scripts/bootgraph.pl: fallout from cherry-picking commits.
| * mmiotrace: remove left-over marker cruft  [Pekka Paalanen, 2008-10-14, 1 file, -64/+0]
    Signed-off-by: Pekka Paalanen <pq@iki.fi>
    Acked-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * x86 mmiotrace: implement mmiotrace_printk()  [Pekka Paalanen, 2008-10-14, 2 files, -1/+22]
    Offer mmiotrace users a function to inject markers from inside the kernel. This depends on the trace_vprintk() patch.

    Signed-off-by: Pekka Paalanen <pq@iki.fi>
    Acked-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * x86 mmiotrace: fix a rare memory leak  [Pekka Paalanen, 2008-10-14, 1 file, -1/+3]
    Signed-off-by: Pekka Paalanen <pq@iki.fi>
    Acked-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * x86: fix mmiotrace 8-bit register decoding  [Pekka Paalanen, 2008-10-14, 1 file, -37/+84]
    When SIL, DIL, BPL or SPL registers were used in MMIO, the datum was extracted from AH, BH, CH, or DH, which is incorrect.

    Signed-off-by: Pekka Paalanen <pq@iki.fi>
    Cc: "Vegard Nossum" <vegard.nossum@gmail.com>
    Cc: "Steven Rostedt" <srostedt@redhat.com>
    Cc: proski@gnu.org
    Cc: "Pekka Enberg" <penberg@cs.helsinki.fi>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
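The confusion described above comes from the x86-64 instruction encoding: for 8-bit operands, register numbers 4 to 7 select AH, CH, DH and BH when the instruction has no REX prefix, and SPL, BPL, SIL and DIL when it has one. A minimal lookup sketch follows; `reg8_name()` is a hypothetical helper, not the mmiotrace decoder, and it ignores the REX extension bits that select r8b to r15b.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Name of the 8-bit register selected by a 3-bit register number.
 * has_rex says whether the instruction carries any REX prefix.
 * Hypothetical helper for illustration; r8b..r15b are not handled.
 */
static const char *reg8_name(unsigned int code, bool has_rex)
{
    static const char *const legacy[8] = {
        "al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"
    };
    static const char *const with_rex[8] = {
        "al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"
    };

    assert(code < 8);
    return has_rex ? with_rex[code] : legacy[code];
}
```

A decoder that always used the legacy table would read register number 6 as DH (bits 8 to 15 of RDX) even when the instruction actually named SIL (the low byte of RSI), which is the kind of mix-up the commit describes.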
* | mm: rewrite vmap layer  [Nick Piggin, 2008-10-20, 1 file, -0/+2]
    Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and provide a fast, scalable percpu frontend for small vmaps (requires a slightly different API, though).

    The biggest problem with vmap is actually vunmap. Presently this requires a global kernel TLB flush, which on most architectures is a broadcast IPI to all CPUs to flush the cache. This is all done under a global lock. As the number of CPUs increases, so will the number of vunmaps a scaled workload will want to perform, and so will the cost of a global TLB flush. This gives terrible quadratic scalability characteristics.

    Another problem is that the entire vmap subsystem works under a single lock. It is a rwlock, but it is actually taken for write in all the fast paths, and the read locking would likely never be run concurrently anyway, so it's just pointless.

    This is a rewrite of vmap subsystem to solve those problems. The existing vmalloc API is implemented on top of the rewritten subsystem.

    The TLB flushing problem is solved by using lazy TLB unmapping. vmap addresses do not have to be flushed immediately when they are vunmapped, because the kernel will not reuse them again (would be a use-after-free) until they are reallocated. So the addresses aren't allocated again until a subsequent TLB flush. A single TLB flush then can flush multiple vunmaps from each CPU.

    XEN and PAT and such do not like deferred TLB flushing because they can't always handle multiple aliasing virtual addresses to a physical address. They now call vm_unmap_aliases() in order to flush any deferred mappings. That call is very expensive (well, actually not a lot more expensive than a single vunmap under the old scheme), however it should be OK if not called too often.

    The virtual memory extent information is stored in an rbtree rather than a linked list to improve the algorithmic scalability.

    There is a per-CPU allocator for small vmaps, which amortizes or avoids global locking.

    To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces must be used in place of vmap and vunmap. Vmalloc does not use these interfaces at the moment, so it will not be quite so scalable (although it will use lazy TLB flushing).

    As a quick test of performance, I ran a test that loops in the kernel, linearly mapping then touching then unmapping 4 pages. Different numbers of tests were run in parallel on a 4 core, 2 socket Opteron. Results are in nanoseconds per map+touch+unmap.

        threads  vanilla  vmap rewrite
        1          14700          2900
        2          33600          3000
        4          49500          2800
        8          70631          2900

    So with 8 cores, the rewritten version is already 25x faster.

    In a slightly more realistic test (although with an older and less scalable version of the patch), I ripped the not-very-good vunmap batching code out of XFS, and implemented the large buffer mapping with vm_map_ram and vm_unmap_ram... along with a couple of other tricks, I was able to speed up a large directory workload by 20x on a 64 CPU system. I believe vmap/vunmap is actually sped up a lot more than 20x on such a system, but I'm running into other locks now. vmap is pretty well blown off the profiles.

    Before:
        1352059 total                      0.1401
         798784 _write_lock             8320.6667 <- vmlist_lock
         529313 default_idle            1181.5022
          15242 smp_call_function         15.8771 <- vmap tlb flushing
           2472 __get_vm_area_node         1.9312 <- vmap
           1762 remove_vm_area             4.5885 <- vunmap
            316 map_vm_area                0.2297 <- vmap
            312 kfree                      0.1950
            300 _spin_lock                 3.1250
            252 sn_send_IPI_phys           0.4375 <- tlb flushing
            238 vmap                       0.8264 <- vmap
            216 find_lock_page             0.5192
            196 find_next_bit              0.3603
            136 sn2_send_IPI               0.2024
            130 pio_phys_write_mmr         2.0312
            118 unmap_kernel_range         0.1229

    After:
          78406 total                      0.0081
          40053 default_idle              89.4040
          33576 ia64_spinlock_contention 349.7500
           1650 _spin_lock                17.1875
            319 __reg_op                   0.5538
            281 _atomic_dec_and_lock       1.0977
            153 mutex_unlock               1.5938
            123 iget_locked                0.1671
            117 xfs_dir_lookup             0.1662
            117 dput                       0.1406
            114 xfs_iget_core              0.0268
             92 xfs_da_hashname            0.1917
             75 d_alloc                    0.0670
             68 vmap_page_range            0.0462 <- vmap
             58 kmem_cache_alloc           0.0604
             57 memset                     0.0540
             52 rb_next                    0.1625
             50 __copy_user                0.0208
             49 bitmap_find_free_region    0.2188 <- vmap
             46 ia64_sn_udelay             0.1106
             45 find_inode_fast            0.1406
             42 memcmp                     0.2188
             42 finish_task_switch         0.1094
             42 __d_lookup                 0.0410
             40 radix_tree_lookup_slot     0.1250
             37 _spin_unlock_irqrestore    0.3854
             36 xfs_bmapi                  0.0050
             36 kmem_cache_free            0.0256
             35 xfs_vn_getattr             0.0322
             34 radix_tree_lookup          0.1062
             33 __link_path_walk           0.0035
             31 xfs_da_do_buf              0.0091
             30 _xfs_buf_find              0.0204
             28 find_get_page              0.0875
             27 xfs_iread                  0.0241
             27 __strncpy_from_user        0.2812
             26 _xfs_buf_initialize        0.0406
             24 _xfs_buf_lookup_pages      0.0179
             24 vunmap_page_range          0.0250 <- vunmap
             23 find_lock_page             0.0799
             22 vm_map_ram                 0.0087 <- vmap
             20 kfree                      0.0125
             19 put_page                   0.0330
             18 __kmalloc                  0.0176
             17 xfs_da_node_lookup_int     0.0086
             17 _read_lock                 0.0885
             17 page_waitqueue             0.0664

    vmap has gone from being the top 5 on the profiles and flushing the crap out of all TLBs, to using less than 1% of kernel time.

    [akpm@linux-foundation.org: cleanups, section fix]
    [akpm@linux-foundation.org: fix build on alpha]
    Signed-off-by: Nick Piggin <npiggin@suse.de>
    Cc: Jeremy Fitzhardinge <jeremy@goop.org>
    Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Export kmap_atomic_pfn for DRM-GEM.  [Eric Anholt, 2008-10-17, 1 file, -0/+1]
    The driver would like to map IO space directly for copying data in when appropriate, to avoid CPU cache flushing for streaming writes. kmap_atomic_pfn lets us avoid IPIs associated with ioremap for this process.

    Signed-off-by: Eric Anholt <eric@anholt.net>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
* | Merge branch 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  [Linus Torvalds, 2008-10-17, 1 file, -0/+6]
|\ \
    * 'core-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
      do_generic_file_read: s/EINTR/EIO/ if lock_page_killable() fails
      softirq, warning fix: correct a format to avoid a warning
      softirqs, debug: preemption check
      x86, pci-hotplug, calgary / rio: fix EBDA ioremap()
      IO resources, x86: ioremap sanity check to catch mapping requests exceeding, fix
      IO resources, x86: ioremap sanity check to catch mapping requests exceeding the BAR sizes
      softlockup: Documentation/sysctl/kernel.txt: fix softlockup_thresh description
      dmi scan: warn about too early calls to dmi_check_system()
      generic: redefine resource_size_t as phys_addr_t
      generic: make PFN_PHYS explicitly return phys_addr_t
      generic: add phys_addr_t for holding physical addresses
      softirq: allocate less vectors
      IO resources: fix/remove printk
      printk: robustify printk, update comment
      printk: robustify printk, fix #2
      printk: robustify printk, fix
      printk: robustify printk

    Fixed up conflicts in:
        arch/powerpc/include/asm/types.h
        arch/powerpc/platforms/Kconfig.cputype
    manually.
| | \
| | \
| | \
| | \
| *---. \ Merge branches 'core/softlockup', 'core/softirq', 'core/resources', 'core/printk' and 'core/misc' into core-v28-for-linus  [Ingo Molnar, 2008-10-15, 1 file, -0/+6]
| |\ \ \ \
| | |_|_|/
| |/| | |
| | | * | IO resources, x86: ioremap sanity check to catch mapping requests exceeding the BAR sizes  [Suresh Siddha, 2008-09-26, 1 file, -0/+6]
| | |/ /
    Go through the iomem resource tree to check if any of the ioremap() requests span more than any slot in the iomem resource tree and do a WARN_ON() if we hit this check.

    This will raise a red-flag, if some driver is mapping more than what is needed. And hopefully identify possible corruptions much earlier.

    Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
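The check described above boils down to an interval test: a mapping request is suspicious when it overlaps a resource slot but is not fully contained in it. The sketch below models that test in userspace. `struct io_res` and `request_exceeds_slot()` are hypothetical names, the inclusive `end` is an assumption of this sketch, and the code is not the kernel's implementation of the sanity check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource slot covering [start, end], end inclusive. */
struct io_res {
    uint64_t start;
    uint64_t end;
};

/*
 * True when the request [addr, addr + size) touches the slot but also
 * reaches outside it. Assumes size > 0 and no address wrap-around.
 */
static bool request_exceeds_slot(const struct io_res *r,
                                 uint64_t addr, uint64_t size)
{
    assert(size > 0);
    uint64_t last = addr + size - 1;
    bool overlaps = addr <= r->end && last >= r->start;
    bool contained = addr >= r->start && last <= r->end;

    return overlaps && !contained;
}
```

For a 4 KiB slot at 0xe0000000, a request for 0x1000 bytes at the slot start is fine, while a request for 0x2000 bytes spills past the end of the slot and would be flagged.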
* | | | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip (Linus Torvalds, 2008-10-17, 1 file, -28/+17)
|\ \ \ \
| |/ / /
|/| | |
| | | |   * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
| | | |     x86: fix compat-vdso
| | | |     x86/mm: unify init task OOM handling
| | | |     x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault
| * | | x86/mm: unify init task OOM handling (Ingo Molnar, 2008-10-13, 1 file, -9/+6)
| | | |
| | | |   Linus noticed that the "again:" versus "survive:" OOM logic for the
| | | |   init task was arbitrarily different.
| | | |
| | | |   The 64-bit codepath is the better one, because it correctly
| | | |   re-lookups the vma after having dropped the ->mmap_sem.
| | | |
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | | |   Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | x86/mm: do not trigger a kernel warning if user-space disables interrupts and generates a page fault (Linus Torvalds, 2008-10-13, 1 file, -19/+11)
| | | |
| | | |   Arjan reported a spike in the following bug pattern in v2.6.27:
| | | |
| | | |      http://www.kerneloops.org/searchweek.php?search=lock_page
| | | |
| | | |   which happens because hwclock started triggering warnings due to
| | | |   a (correct) might_sleep() check in the MM code.
| | | |
| | | |   The warning occurs because hwclock uses this dubious sequence of
| | | |   code to run "atomic" code:
| | | |
| | | |     static unsigned long
| | | |     atomic(const char *name, unsigned long (*op)(unsigned long),
| | | |            unsigned long arg)
| | | |     {
| | | |       unsigned long v;
| | | |       __asm__ volatile ("cli");
| | | |       v = (*op)(arg);
| | | |       __asm__ volatile ("sti");
| | | |       return v;
| | | |     }
| | | |
| | | |   Then it pagefaults in that "atomic" section, triggering the warning.
| | | |
| | | |   There is no way the kernel could provide "atomicity" in this path,
| | | |   a page fault is a cannot-continue machine event so the kernel has to
| | | |   wait for the page to be filled in.
| | | |
| | | |   Even if it was just a minor fault we'd have to take locks and might
| | | |   have to spend quite a bit of time with interrupts disabled - not nice
| | | |   to irq latencies in general.
| | | |
| | | |   So instead just enable interrupts in the pagefault path unconditionally
| | | |   if we come from user-space, and handle the fault.
| | | |
| | | |   Also, while touching this code, unify some trivial parts of the x86
| | | |   VM paths at the same time.
| | | |
| | | |   Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| | | |   Reported-by: Arjan van de Ven <arjan@infradead.org>
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86: change early_ioremap to use slots instead of nesting (Yinghai Lu, 2008-10-13, 1 file, -20/+57)
| | | |
| | | |   so we could remove the requirement that one needs to call
| | | |   early_iounmap() in exactly reverse order of early_ioremap().
| | | |
| | | |   Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
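Why slots remove the ordering requirement can be shown with a small user-space sketch. It is not the kernel implementation: NR_SLOTS, slot_addr, slot_map() and slot_unmap() are hypothetical names, and a plain pointer table stands in for the fixmap-backed mappings.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 4                  /* hypothetical number of slots */

static void *slot_addr[NR_SLOTS];   /* NULL marks a free slot */

/* Claim the first free slot for addr; return its index, or -1 if full. */
static int slot_map(void *addr)
{
    assert(addr != NULL);
    for (int i = 0; i < NR_SLOTS; i++) {
        if (slot_addr[i] == NULL) {
            slot_addr[i] = addr;
            return i;
        }
    }
    return -1;
}

/* Release whichever slot holds addr; return its index, or -1 if none. */
static int slot_unmap(void *addr)
{
    if (addr == NULL)
        return -1;
    for (int i = 0; i < NR_SLOTS; i++) {
        if (slot_addr[i] == addr) {
            slot_addr[i] = NULL;
            return i;
        }
    }
    return -1;
}
```

Because each mapping is found by its address rather than by a nesting depth, two mappings can be released in either order, and a freed slot is reused by the next request.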
* | | | x86: fix virt_addr_valid() with CONFIG_DEBUG_VIRTUAL=y, v2 (Vegard Nossum, 2008-10-13, 1 file, -2/+34)
| | | |
| | | |   virt_addr_valid() calls __pa(), which calls __phys_addr(). With
| | | |   CONFIG_DEBUG_VIRTUAL=y, __phys_addr() will kill the kernel if the
| | | |   address *isn't* valid. That's clearly wrong for virt_addr_valid().
| | | |
| | | |   We also incorporate the debugging checks into virt_addr_valid().
| | | |
| | | |   Signed-off-by: Vegard Nossum <vegardno@ben.ifi.uio.no>
| | | |   Acked-by: Jiri Slaby <jirislaby@gmail.com>
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | traps: x86: remove trace_hardirqs_fixup from pagefault handler (Alexander van Heukelum, 2008-10-13, 1 file, -5/+0)
| | | |
| | | |   The last use of trace_hardirqs_fixup is unnecessary, because the trap
| | | |   is taken with interrupt off on i386 as well as x86_64, and the
| | | |   irq-tracer is notified of this from the assembly code.
| | | |
| | | |   trace_hardirqs_fixup and trace_hardirqs_fixup_flags are removed from
| | | |   include/asm-x86/irqflags.h as they are no longer used.
| | | |
| | | |   Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86, uv: add early detection of UV system types (Jack Steiner, 2008-10-13, 1 file, -1/+1)
| | | |
| | | |   Portions of the ACPI code need to know if a system is a UV system
| | | |   prior to genapic initialization. This patch adds a call to
| | | |   early_acpi_boot_init() so that the apic type is discovered earlier.
| | | |
| | | |   V2 of the patch adding fixes from Yinghai Lu. Much cleaner and smaller.
| | | |
| | | |   Signed-off-by: Jack Steiner <steiner@sgi.com>
| | | |   Acked-by: Yinghai Lu <yinghai@kernel.org>
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86: make mm/gup.c more virtualization friendly (Jan Beulich, 2008-10-13, 1 file, -5/+5)
| | | |
| | | |   Since pte_flags() is much cheaper than pte_val() in some virtualized
| | | |   environments (namely, Xen), use the former wherever possible.
| | | |
| | | |   Signed-off-by: Jan Beulich <jbeulich@novell.com>
| | | |   Cc: "Nick Piggin" <npiggin@suse.de>
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86-64: fix combining of regions in init_memory_mapping() (Jan Beulich, 2008-10-13, 1 file, -1/+1)
| | | |
| | | |   When nr_range gets decremented, the same slot must be considered for
| | | |   coalescing with its new successor again.
| | | |
| | | |   The issue is apparently pretty benign to native code, but surfaces as
| | | |   a boot time crash in our forward ported Xen tree (where the page table
| | | |   setup overall works differently than in native).
| | | |
| | | |   Signed-off-by: Jan Beulich <jbeulich@novell.com>
| | | |   Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
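The coalescing rule stated above (after a merge, re-examine the same slot) can be sketched as follows. This is a minimal user-space illustration, not the code from init_memory_mapping(): the struct layout and the name merge_ranges() are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's range descriptor. */
struct map_range {
    unsigned long start;
    unsigned long end;
    unsigned int page_size_mask;
};

/* Merge adjacent, compatible ranges in place; return the new count. */
static size_t merge_ranges(struct map_range *mr, size_t nr_range)
{
    size_t i = 0;

    assert(mr != NULL || nr_range == 0);

    while (nr_range > 1 && i < nr_range - 1) {
        if (mr[i].end != mr[i + 1].start ||
            mr[i].page_size_mask != mr[i + 1].page_size_mask) {
            i++;    /* not mergeable: move on to the next slot */
            continue;
        }
        /* Absorb slot i + 1 into slot i, then close the gap. */
        mr[i].end = mr[i + 1].end;
        memmove(&mr[i + 1], &mr[i + 2],
                (nr_range - i - 2) * sizeof(*mr));
        nr_range--;
        /* Do not advance i: slot i now has a new successor to check. */
    }
    return nr_range;
}
```

With three contiguous ranges of the same page size, a loop that always advanced after a merge would stop at two ranges; keeping the index in place lets all three collapse into one.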
* | | | x86-64: don't check for map replacement (Jeremy Fitzhardinge, 2008-10-13, 1 file, -3/+0)
| | | |
| | | |   The check prevents flags on mappings from being changed, which is not
| | | |   desirable. There's no need to check for replacing a mapping, and
| | | |   x86-32 does not do this check.
| | | |
| | | |   Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86: add early_memremap() (Jeremy Fitzhardinge, 2008-10-13, 2 files, -6/+18)
| | | |
| | | |   early_ioremap() is also used to map normal memory when constructing
| | | |   the linear memory mapping. However, since we sometimes need to be
| | | |   able to distinguish between actual IO mappings and normal memory
| | | |   mappings, add an early_memremap() call, which maps with PAGE_KERNEL
| | | |   (as opposed to PAGE_KERNEL_IO for early_ioremap()), and use it when
| | | |   constructing pagetables.
| | | |
| | | |   Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | x86: add _PAGE_IOMAP pte flag for IO mappings (Jeremy Fitzhardinge, 2008-10-13, 3 files, -6/+6)
| | | |
| | | |   Use one of the software-defined PTE bits to indicate that a mapping
| | | |   is intended for an IO address. On native hardware this is irrelevant,
| | | |   since a physical address is a physical address. But in a virtual
| | | |   environment, physical addresses are also virtualized, so there needs
| | | |   to be some way to distinguish between pseudo-physical addresses and
| | | |   actual hardware addresses; _PAGE_IOMAP indicates this intent.
| | | |
| | | |   By default, __supported_pte_mask masks out _PAGE_IOMAP, so it doesn't
| | | |   even appear in the final pagetable.
| | | |
| | | |   Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
| | | |   Signed-off-by: Ingo Molnar <mingo@elte.hu>
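The masking behaviour in the last paragraph can be illustrated with a short sketch. The bit positions, the helper make_pte() and the un-prefixed names are assumptions chosen for the example; only the idea that a global mask strips the IO flag by default comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits; the positions are assumptions for this sketch. */
#define PAGE_PRESENT (1ULL << 0)
#define PAGE_RW      (1ULL << 1)
#define PAGE_IOMAP   (1ULL << 10)   /* software-defined "IO mapping" bit */

/* By default the mask clears PAGE_IOMAP, so it never reaches a PTE. */
static uint64_t supported_pte_mask = ~PAGE_IOMAP;

/* Build a PTE value from a physical address and flags, applying the mask. */
static uint64_t make_pte(uint64_t phys, uint64_t flags)
{
    return (phys | flags) & supported_pte_mask;
}
```

In this sketch a virtualized environment that cares about the distinction would set the bit in supported_pte_mask, after which make_pte() preserves PAGE_IOMAP; on native hardware the default mask drops it.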