diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2014-12-13 22:00:36 +0100 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2014-12-13 22:00:36 +0100 |
commit | 78a45c6f067824cf5d0a9fedea7339ac2e28603c (patch) | |
tree | b4f78c8b6b9059ddace0a18c11629b8d2045f793 /Documentation/vm | |
parent | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (diff) | |
parent | cgroups: Documentation: fix trivial typos and wrong paragraph numberings (diff) | |
download | linux-78a45c6f067824cf5d0a9fedea7339ac2e28603c.tar.xz linux-78a45c6f067824cf5d0a9fedea7339ac2e28603c.zip |
Merge branch 'akpm' (second patch-bomb from Andrew)
Merge second patchbomb from Andrew Morton:
- the rest of MM
- misc fs fixes
- add execveat() syscall
- new ratelimit feature for fault-injection
- decompressor updates
- ipc/ updates
- fallocate feature creep
- fsnotify cleanups
- a few other misc things
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits)
cgroups: Documentation: fix trivial typos and wrong paragraph numberings
parisc: percpu: update comments referring to __get_cpu_var
percpu: update local_ops.txt to reflect this_cpu operations
percpu: remove __get_cpu_var and __raw_get_cpu_var macros
fsnotify: remove destroy_list from fsnotify_mark
fsnotify: unify inode and mount marks handling
fallocate: create FAN_MODIFY and IN_MODIFY events
mm/cma: make kmemleak ignore CMA regions
slub: fix cpuset check in get_any_partial
slab: fix cpuset check in fallback_alloc
shmdt: use i_size_read() instead of ->i_size
ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
ipc/msg: increase MSGMNI, remove scaling
ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
lib/decompress.c: consistency of compress formats for kernel image
decompress_bunzip2: off by one in get_next_block()
usr/Kconfig: make initrd compression algorithm selection not expert
fault-inject: add ratelimit option
ratelimit: add initialization macro
...
Diffstat (limited to 'Documentation/vm')
-rw-r--r-- | Documentation/vm/page_owner.txt | 81 |
1 files changed, 81 insertions, 0 deletions
diff --git a/Documentation/vm/page_owner.txt b/Documentation/vm/page_owner.txt new file mode 100644 index 000000000000..8f3ce9b3aa11 --- /dev/null +++ b/Documentation/vm/page_owner.txt @@ -0,0 +1,81 @@ +page owner: Tracking about who allocated each page +----------------------------------------------------------- + +* Introduction + +page owner is for the tracking about who allocated each page. +It can be used to debug memory leak or to find a memory hogger. +When allocation happens, information about allocation such as call stack +and order of pages is stored into certain storage for each page. +When we need to know about status of all pages, we can get and analyze +this information. + +Although we already have tracepoint for tracing page allocation/free, +using it for analyzing who allocate each page is rather complex. We need +to enlarge the trace buffer for preventing overlapping until userspace +program launched. And, launched program continually dump out the trace +buffer for later analysis and it would change system behviour with more +possibility rather than just keeping it in memory, so bad for debugging. + +page owner can also be used for various purposes. For example, accurate +fragmentation statistics can be obtained through gfp flag information of +each page. It is already implemented and activated if page owner is +enabled. Other usages are more than welcome. + +page owner is disabled in default. So, if you'd like to use it, you need +to add "page_owner=on" into your boot cmdline. If the kernel is built +with page owner and page owner is disabled in runtime due to no enabling +boot option, runtime overhead is marginal. If disabled in runtime, it +doesn't require memory to store owner information, so there is no runtime +memory overhead. And, page owner inserts just two unlikely branches into +the page allocator hotpath and if it returns false then allocation is +done like as the kernel without page owner. These two unlikely branches +would not affect to allocation performance. Following is the kernel's +code size change due to this facility. + +- Without page owner + text data bss dec hex filename + 40662 1493 644 42799 a72f mm/page_alloc.o + +- With page owner + text data bss dec hex filename + 40892 1493 644 43029 a815 mm/page_alloc.o + 1427 24 8 1459 5b3 mm/page_ext.o + 2722 50 0 2772 ad4 mm/page_owner.o + +Although, roughly, 4 KB code is added in total, page_alloc.o increase by +230 bytes and only half of it is in hotpath. Building the kernel with +page owner and turning it on if needed would be great option to debug +kernel memory problem. + +There is one notice that is caused by implementation detail. page owner +stores information into the memory from struct page extension. This memory +is initialized some time later than that page allocator starts in sparse +memory system, so, until initialization, many pages can be allocated and +they would have no owner information. To fix it up, these early allocated +pages are investigated and marked as allocated in initialization phase. +Although it doesn't mean that they have the right owner information, +at least, we can tell whether the page is allocated or not, +more accurately. On 2GB memory x86-64 VM box, 13343 early allocated pages +are catched and marked, although they are mostly allocated from struct +page extension feature. Anyway, after that, no page is left in +un-tracking state. + +* Usage + +1) Build user-space helper + cd tools/vm + make page_owner_sort + +2) Enable page owner + Add "page_owner=on" to boot cmdline. + +3) Do the job what you want to debug + +4) Analyze information from page owner + cat /sys/kernel/debug/page_owner > page_owner_full.txt + grep -v ^PFN page_owner_full.txt > page_owner.txt + ./page_owner_sort page_owner.txt sorted_page_owner.txt + + See the result about who allocated each page + in the sorted_page_owner.txt. |