summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* fs/proc/mmu.c: headers butcheryAlexey Dobriyan2007-10-171-19/+2
| | | | | | | | | | | | | | | | | | | | | | | fs/proc/mmu.c consists of only one function which uses only: 1) struct vmalloc_info * 2) struct vm_struct * 3) struct vmalloc_info 4) vmlist 5) VMALLOC_TOTAL, VMALLOC_START, VMALLOC_END 6) read_lock, read_unlock 7) vmlist_lock 8) struct vm_struct This gives us linux/spinlock.h, asm/pgtable.h, "internal.h", linux/vmalloc.h. asm/pgtable.h uses PKMAP_BASE on i386, for which asm/highmem.h is needed. But, linux/highmem.h is actually used to make it compile everywhere. I'll deal later with this particular i386 surprise. Cross-compile tested on many archs and configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* do_poll: return -EINTR when signalledOleg Nesterov2007-10-171-7/+7
| | | | | | | | | | | | | | do_poll() checks signal_pending() but returns 0 when interrupted. This means the caller has to check signal_pending() again. Change it to return -EINTR when signal_pending() and count == 0. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Andi Kleen <ak@suse.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Vadim Lobanov <vlobanov@speakeasy.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* do_sys_poll: simplify playing with on-stack dataOleg Nesterov2007-10-171-56/+36
| | | | | | | | | | | | | | | | | | | Cleanup. Lessens both the source and compiled code (100 bytes) and imho makes the code much more understandable. With this patch "struct poll_list *head" always points to on-stack stack_pps, so we can remove all "is it on-stack" and "was it initialized" checks. Also, move poll_initwait/poll_freewait and -EINTR detection closer to the do_poll()'s callsite. [akpm@linux-foundation.org: fix warning (size_t != uint)] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Looks-good-to: Andi Kleen <ak@suse.de> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Vadim Lobanov <vlobanov@speakeasy.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Remove unneeded lock_kernel() in driver/block/loop.cDiego Woitasen2007-10-171-2/+0
| | | | | | | Signed-off-by: Diego Woitasen <diego@woitasen.com.ar> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* KCONFIG: Make "Instrumentation support" non-EXPERIMENTALRobert P. J. Day2007-10-176-6/+0
| | | | | | | | | | | | | | It makes more sense to make instrumentation support experimental on a case-by-case basis. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: mark nibblemap constPhilippe De Muyter2007-10-175-5/+5
| | | | | | Signed-off-by: Philippe De Muyter <phdm@macqel.be> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/romfs/inode.c: trivial improvementsWANG Cong2007-10-171-2/+2
| | | | | | | | | | | | - There are no lists in fs/romfs/inode.c, so using list_entry is a bit confusing. Replace it with container_of. - It is unnecessary to cast the return value of kmem_cache_alloc, since it returns a void* pointer. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nbd: change a parameter's type to remove a memcpy callDenis Cheng2007-10-171-5/+3
| | | | | | | | | | | | | | This memcpy looks so strange, in fact it's merely a pointer dereference, so I change the parameter's type to refer it more directly, this could make the memcpy not needed anymore. In the function nbd_read_stat where nbd_find_request is only once called, the parameter served should be transformed accordingly. Signed-off-by: Denis Cheng <crquan@gmail.com> Cc: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nbd: use list_for_each_entry_safe to make it more consolidated and readableDenis Cheng2007-10-171-4/+2
| | | | | | | | | Thus the traverse of the loop may delete nodes, use the safe version. Signed-off-by: Denis Cheng <crquan@gmail.com> Cc: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kill DECLARE_MUTEX_LOCKEDChristoph Hellwig2007-10-1724-24/+3
| | | | | | | | | | | | | DECLARE_MUTEX_LOCKED was used for semaphores used as completions and we've got rid of them. Well, except for one in libusual that the maintainer explicitly wants to keep as semaphore. So convert that useage to an explicit sema_init and kill of DECLARE_MUTEX_LOCKED so that new code is reminded to use a completion. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: "Satyam Sharma" <satyam.sharma@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* unexport asm/shmparam.hOlaf Hering2007-10-174-4/+2
| | | | | | | | | | | | | | SHMLBA cant possible be used in userspace, see sparc versions of that header. Do not export asm/shmparam.h during make headers_install_all This removes another uservisible place of PAGE_SIZE Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Driver for the Atmel on-chip SSC on AT32AP and AT91Hans-Christian Egtvedt2007-10-174-0/+499
| | | | | | | | | | | | | | | | | | | | | | | | The Synchronous Serial Controller (SSC) on Atmel microprocessors are capable of tranceiving many frame based protocols, like I2S. Tested on the AT32AP7000/ATSTK1000. This driver is used in the ALSA sound driver for the AT73C213 external DAC on the ATSTK1000 development board for AVR32. This sound driver will be submitted soon. Hardware documentation can be found in the AT32AP7000 data sheet, which can be downloaded from http://www.atmel.com/dyn/products/datasheets.asp?family_id=682 [akpm@linux-foundation.org: init spinlock at compile time] Signed-off-by: Hans-Christian Egtvedt <hcegtvedt@atmel.com> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: David Brownell <david-b@pacbell.net> Cc: Andrew Victor <andrew@sanpeople.com> Cc: Patrice Vilchez <patrice.vilchez@rfo.atmel.com> Cc: Nicolas Ferre <nicolas.ferre@rfo.atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Force erroneous inclusions of compiler-*.h files to be errorsRobert P. J. Day2007-10-174-4/+12
| | | | | | | | | | Replace worthless comments with actual preprocessor errors when including the wrong versions of the compiler.h files. [akpm@linux-foundation.org: make it work] Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zisofs use mutex instead of semaphoreDave Young2007-10-171-20/+5
| | | | | | | | | | Use mutex instead of semaphore in fs/isofs/compress.c, and remove an unnecessary variable. Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* SLAB_PANIC more (proc, posix-timers, shmem)Alexey Dobriyan2007-10-173-7/+4
| | | | | | | | These aren't modular, so SLAB_PANIC is OK. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* softlockup: add a /proc tuning parameterRavikiran G Thirumalai2007-10-174-9/+40
| | | | | | | | | | | | | | | | | | Control the trigger limit for softlockup warnings. This is useful for debugging softlockups, by lowering the softlockup_thresh to identify possible softlockups earlier. This patch: 1. Adds a sysctl softlockup_thresh with valid values of 1-60s (Higher value to disable false positives) 2. Changes the softlockup printk to print the cpu softlockup time [akpm@linux-foundation.org: Fix various warnings and add definition of "two"] Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* softlockup watchdog: style cleanupsIngo Molnar2007-10-171-7/+7
| | | | | | | | | | | | | | | | kernel/softirq.c grew a few style uncleanlinesses in the past few months, clean that up. No functional changes: text data bss dec hex filename 1126 76 4 1206 4b6 softlockup.o.before 1129 76 4 1209 4b9 softlockup.o.after ( the 3 bytes .text increase is due to the "<1>" appended to one of the printk messages. ) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* softlockup: improve debug outputIngo Molnar2007-10-171-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve the debuggability of kernel lockups by enhancing the debug output of the softlockup detector: print the task that causes the lockup and try to print a more intelligent backtrace. The old format was: BUG: soft lockup detected on CPU#1! [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c0105f59>] dump_stack+0x14/0x16 [<c015f6bc>] softlockup_tick+0xbe/0xd0 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fb8>] tick_sched_timer+0x7c/0xc0 [<c0140a75>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= The new format is: BUG: soft lockup detected on CPU#1! [prctl:2363] Pid: 2363, comm: prctl EIP: 0060:[<c013915f>] CPU: 1 EIP is at sys_prctl+0x24/0x18c EFLAGS: 00000213 Not tainted (2.6.22-cfs-v20 #26) EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000 ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8 CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0 [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c01040be>] show_regs+0x1ab/0x1b3 [<c015f807>] softlockup_tick+0xef/0x108 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fcc>] tick_sched_timer+0x7c/0xc0 [<c0140a89>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= Note that in the old format we only knew that some system call locked up, we didnt know _which_. With the new format we know that it's at a specific place in sys_prctl(). [which was where i created an artificial kernel lockup to test the new format.] This is also useful if the lockup happens in user-space - the user-space EIP (and other registers) will be printed too. (such a lockup would either suggest that the task was running at SCHED_FIFO:99 and looping for more than 10 seconds, or that the softlockup detector has a false-positive.) The task name is printed too first, just in case we dont manage to print a useful backtrace. [satyam@infradead.org: fix warning] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* softlockup: make asm/irq_regs.h available on every platformIngo Molnar2007-10-174-0/+4
| | | | | | | | | | | | | | | | | | The softlockup detector would like to use get_irq_regs(), so generalize the availability on every Linux architecture. (It is fine for an architecture to always return NULL to get_irq_regs(), which it does by default.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Ian Molton <spyro@f2s.com> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Mikael Starvik <starvik@axis.com> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fix the softlockup watchdog to actually workIngo Molnar2007-10-171-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | this Xen related commit: commit 966812dc98e6a7fcdf759cbfa0efab77500a8868 Author: Jeremy Fitzhardinge <jeremy@goop.org> Date: Tue May 8 00:28:02 2007 -0700 Ignore stolen time in the softlockup watchdog broke the softlockup watchdog to never report any lockups. (!) print_timestamp defaults to 0, this makes the following condition always true: if (print_timestamp < (touch_timestamp + 1) || and we'll in essence never report soft lockups. apparently the functionality of the soft lockup watchdog was never actually tested with that patch applied ... Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* softlockup: use cpu_clock() instead of sched_clock()Ingo Molnar2007-10-171-4/+6
| | | | | | | | sched_clock() is not a reliable time-source, use cpu_clock() instead. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Remove workaround for unimmunized rcu_dereference from mce_log()Paul E. McKenney2007-10-171-3/+0
| | | | | | | | | | | Remove the rmb() from mce_log(), since the immunized version of rcu_dereference() makes it unnecessary. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Immunize rcu_dereference() against crazy compiler writersPaul E. McKenney2007-10-171-1/+13
| | | | | | | | | | | | | Turns out that compiler writers are a bit more aggressive about optimizing than one might expect. This patch prevents a number of such optimizations from messing up rcu_deference(). This is not merely a theoretical problem, as evidenced by the rmb() in mce_log(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Josh Triplett <josh@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Make unregister_binfmt() return voidAlexey Dobriyan2007-10-173-4/+3
| | | | | | | | | | | list_del() hardly can fail, so checking for return value is pointless (and current code always return 0). Nobody really cared that return value anyway. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use list_head in binfmt handlingAlexey Dobriyan2007-10-174-33/+17
| | | | | | | | | | | | | | Switch single-linked binfmt formats list to usual list_head's. This leads to one-liners in register_binfmt() and unregister_binfmt(). The downside is one pointer more in struct linux_binfmt. This is not a problem, since the set of registered binfmts on typical box is very small -- (ELF + something distro enabled for you). Test-booted, played with executable .txt files, modprobe/rmmod binfmt_misc. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/reiserfs/: cleanupsAdrian Bunk2007-10-174-67/+4
| | | | | | | | | | | | | | | | | - remove the following no longer used functions: - bitmap.c: reiserfs_claim_blocks_to_be_allocated() - bitmap.c: reiserfs_release_claimed_blocks() - bitmap.c: reiserfs_can_fit_pages() - make the following functions static: - inode.c: restart_transaction() - journal.c: reiserfs_async_progress_wait() Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Vladimir V. Saveliev <vs@namesys.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* writeback: don't propagate AOP_WRITEPAGE_ACTIVATEAndrew Morton2007-10-171-1/+3
| | | | | | | | | This is a writeback-internal marker but we're propagating it all the way back to userspace!. Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: document tree_lock->zone.lock lockorderNick Piggin2007-10-172-0/+2
| | | | | | | | | | | | | zone->lock is quite an "inner" lock and mostly constrained to page alloc as well, so like slab locks, it probably isn't something that is critically important to document here. However unlike slab locks, zone lock could be used more widely in future, and page_alloc.c might possibly have more business to do tricky things with pagecache than does slab. So... I don't think it hurts to document it. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: test and set zone reclaim lock before starting reclaimDavid Rientjes2007-10-172-10/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | Introduces new zone flag interface for testing and setting flags: int zone_test_and_set_flag(struct zone *zone, zone_flags_t flag) Instead of setting and clearing ZONE_RECLAIM_LOCKED each time shrink_zone() is called, this flag is test and set before starting zone reclaim. Zone reclaim starts in __alloc_pages() when a zone's watermark fails and the system is in zone_reclaim_mode. If it's already in reclaim, there's no need to start again so it is simply considered full for that allocation attempt. There is a change of behavior with regard to concurrent zone shrinking. It is now possible for try_to_free_pages() or kswapd to already be shrinking a particular zone when __alloc_pages() starts zone reclaim. In this case, it is possible for two concurrent threads to invoke shrink_zone() for a single zone. This change forbids a zone to be in zone reclaim twice, which was always the behavior, but allows for concurrent try_to_free_pages() or kswapd shrinking when starting zone reclaim. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: convert zone_scan_lock from mutex to spinlockDavid Rientjes2007-10-171-5/+5
| | | | | | | | | | | | | | There's no reason to sleep in try_set_zone_oom() or clear_zonelist_oom() if the lock can't be acquired; it will be available soon enough once the zonelist scanning is done. All other threads waiting for the OOM killer are also contingent on the exiting task being able to acquire the lock in clear_zonelist_oom() so it doesn't make sense to put it to sleep. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: add header file to Kbuild as unifdefDavid Rientjes2007-10-171-0/+1
| | | | | | | | | | | Preprocess include/linux/oom.h before exporting it to userspace. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: prevent including sched.h in header fileDavid Rientjes2007-10-171-2/+5
| | | | | | | | | | | | | It's not necessary to include all of linux/sched.h in linux/oom.h. Instead, simply include prototypes for the relevant structs and include linux/types.h for gfp_t. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: do not take callback_mutexDavid Rientjes2007-10-171-3/+0
| | | | | | | | | | | | Since no task descriptor's 'cpuset' field is dereferenced in the execution of the OOM killer anymore, it is no longer necessary to take callback_mutex. [akpm@linux-foundation.org: restore cpuset_lock for other patches] Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: compare cpuset mems_allowed instead of exclusive ancestorsDavid Rientjes2007-10-173-35/+16
| | | | | | | | | | | | | | | | | | | Instead of testing for overlap in the memory nodes of the the nearest exclusive ancestor of both current and the candidate task, it is better to simply test for intersection between the task's mems_allowed in their task descriptors. This does not require taking callback_mutex since it is only used as a hint in the badness scoring. Tasks that do not have an intersection in their mems_allowed with the current task are not explicitly restricted from being OOM killed because it is quite possible that the candidate task has allocated memory there before and has since changed its mems_allowed. Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: suppress extraneous stack and memory dumpDavid Rientjes2007-10-171-13/+14
| | | | | | | | | | | | | Suppresses the extraneous stack and memory dump when a parallel OOM killing has been found. There's no need to fill the ring buffer with this information if its already been printed and the condition that triggered the previous OOM killer has not yet been alleviated. Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: add oom_kill_allocating_task sysctlDavid Rientjes2007-10-173-5/+39
| | | | | | | | | | | | | | Adds a new sysctl, 'oom_kill_allocating_task', which will automatically kill the OOM-triggering task instead of scanning through the tasklist to find a memory-hogging target. This is helpful for systems with an insanely large number of tasks where scanning the tasklist significantly degrades performance. Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: serialize out of memory callsDavid Rientjes2007-10-171-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | A final allocation attempt with a very high watermark needs to be attempted before invoking out_of_memory(). OOM killer serialization needs to occur before this final attempt, otherwise tasks attempting to OOM-lock all zones in its zonelist may spin and acquire the lock unnecessarily after the OOM condition has already been alleviated. If the final allocation does succeed, the zonelist is simply OOM-unlocked and __alloc_pages() returns the page. Otherwise, the OOM killer is invoked. If the task cannot acquire OOM-locks on all zones in its zonelist, it is put to sleep and the allocation is retried when it gets rescheduled. One of its zones is already marked as being in the OOM killer so it'll hopefully be getting some free memory soon, at least enough to satisfy a high watermark allocation attempt. This prevents needlessly killing a task when the OOM condition would have already been alleviated if it had simply been given enough time. Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: add per-zone lockingDavid Rientjes2007-10-173-0/+60
| | | | | | | | | | | | | | | | | | | | | | | | OOM killer synchronization should be done with zone granularity so that memory policy and cpuset allocations may have their corresponding zones locked and allow parallel kills for other OOM conditions that may exist elsewhere in the system. DMA allocations can be targeted at the zone level, which would not be possible if locking was done in nodes or globally. Synchronization shall be done with a variation of "trylocks." The goal is to put the current task to sleep and restart the failed allocation attempt later if the trylock fails. Otherwise, the OOM killer is invoked. Each zone in the zonelist that __alloc_pages() was called with is checked for the newly-introduced ZONE_OOM_LOCKED flag. If any zone has this flag present, the "trylock" to serialize the OOM killer fails and returns zero. Otherwise, all the zones have ZONE_OOM_LOCKED set and the try_set_zone_oom() function returns non-zero. Cc: Andrea Arcangeli <andrea@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: change all_unreclaimable zone member to flagsDavid Rientjes2007-10-174-21/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the int all_unreclaimable member of struct zone to unsigned long flags. This can now be used to specify several different zone flags such as all_unreclaimable and reclaim_in_progress, which can now be removed and converted to a per-zone flag. Flags are set and cleared as follows: zone_set_flag(struct zone *zone, zone_flags_t flag) zone_clear_flag(struct zone *zone, zone_flags_t flag) Defines the first zone flags, ZONE_ALL_UNRECLAIMABLE and ZONE_RECLAIM_LOCKED, which have the same semantics as the old zone->all_unreclaimable and zone->reclaim_in_progress, respectively. Also converts all current users that set or clear either flag to use the new interface. Helper functions are defined to test the flags: int zone_is_all_unreclaimable(const struct zone *zone) int zone_is_reclaim_locked(const struct zone *zone) All flag operators are of the atomic variety because there are currently readers that are implemented that do not take zone->lock. [akpm@linux-foundation.org: add needed include] Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: move constraints to enumDavid Rientjes2007-10-172-9/+12
| | | | | | | | | | | The OOM killer's CONSTRAINT definitions are really more appropriate in an enum, so define them in include/linux/oom.h. Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* oom: move prototypes to appropriate header fileDavid Rientjes2007-10-175-6/+13
| | | | | | | | | | | | | Move the OOM killer's extern function prototypes to include/linux/oom.h and include it where necessary. [clg@fr.ibm.com: build fix] Cc: Andrea Arcangeli <andrea@suse.de> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Slab API: remove useless ctor parameter and reorder parametersChristoph Lameter2007-10-1767-99/+83
| | | | | | | | | | | | | | | | | | | | | Slab constructors currently have a flags parameter that is never used. And the order of the arguments is opposite to other slab functions. The object pointer is placed before the kmem_cache pointer. Convert ctor(void *object, struct kmem_cache *s, unsigned long flags) to ctor(struct kmem_cache *s, void *object) throughout the kernel [akpm@linux-foundation.org: coupla fixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* SLUB: simplify IRQ off handlingChristoph Lameter2007-10-171-11/+7
| | | | | | | | | | Move irq handling out of new slab into __slab_alloc. That is useful for Mathieu's cmpxchg_local patchset and also allows us to remove the crude local_irq_off in early_kmem_cache_alloc(). Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: dirty balancing for tasksPeter Zijlstra2007-10-174-1/+62
| | | | | | | | | | | | | | | | | | | | Based on ideas of Andrew: http://marc.info/?l=linux-kernel&m=102912915020543&w=2 Scale the bdi dirty limit inversly with the tasks dirty rate. This makes heavy writers have a lower dirty limit than the occasional writer. Andrea proposed something similar: http://lwn.net/Articles/152277/ The main disadvantage to his patch is that he uses an unrelated quantity to measure time, which leaves him with a workload dependant tunable. Other than that the two approaches appear quite similar. [akpm@linux-foundation.org: fix warning] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: per device dirty thresholdPeter Zijlstra2007-10-175-38/+194
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scale writeback cache per backing device, proportional to its writeout speed. By decoupling the BDI dirty thresholds a number of problems we currently have will go away, namely: - mutual interference starvation (for any number of BDIs); - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts). It might be that all dirty pages are for a single BDI while other BDIs are idling. By giving each BDI a 'fair' share of the dirty limit, each one can have dirty pages outstanding and make progress. A global threshold also creates a deadlock for stacked BDIs; when A writes to B, and A generates enough dirty pages to get throttled, B will never start writeback until the dirty pages go away. Again, by giving each BDI its own 'independent' dirty limit, this problem is avoided. So the problem is to determine how to distribute the total dirty limit across the BDIs fairly and efficiently. A DBI that has a large dirty limit but does not have any dirty pages outstanding is a waste. What is done is to keep a floating proportion between the DBIs based on writeback completions. This way faster/more active devices get a larger share than slower/idle devices. [akpm@linux-foundation.org: fix warnings] [hugh@veritas.com: Fix occasional hang when a task couldn't get out of balance_dirty_pages] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lib: floating proportionsPeter Zijlstra2007-10-173-1/+505
| | | | | | | | | | | | | | | | | Given a set of objects, floating proportions aims to efficiently give the proportional 'activity' of a single item as compared to the whole set. Where 'activity' is a measure of a temporal property of the items. It is efficient in that it need not inspect any other items of the set in order to provide the answer. It is not even needed to know how many other items there are. It has one parameter, and that is the period of 'time' over which the 'activity' is measured. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: count writeback pages per BDIPeter Zijlstra2007-10-172-2/+11
| | | | | | | | Count per BDI writeback pages. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: count reclaimable pages per BDIPeter Zijlstra2007-10-175-0/+16
| | | | | | | | Count per BDI reclaimable pages; nr_reclaimable = nr_dirty + nr_unstable. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: scalable bdi statistics countersPeter Zijlstra2007-10-172-3/+109
| | | | | | | | Provide scalable per backing_dev_info statistics counters. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: bdi init hooksPeter Zijlstra2007-10-1719-7/+131
| | | | | | | | | provide BDI constructor/destructor hooks [akpm@linux-foundation.org: compile fix] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>