summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* block: make unplug timer trace event correspond to the schedule() unplugJens Axboe2011-04-163-18/+31
| | | | | | | | | | | It's a pretty close match to what we had before - the timer triggering would mean that nobody unplugged the plug in due time, in the new scheme this matches very closely what the schedule() unplug now is. It's essentially the difference between an explicit unplug (IO unplug) or an implicit unplug (timer unplug, we scheduled with pending IO queued). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block: let io_schedule() flush the plug inlineJens Axboe2011-04-162-1/+14
| | | | | | | | | | | | | | Linus correctly observes that the most important dispatch cases are now done from kblockd, this isn't ideal for latency reasons. The original reason for switching dispatches out-of-line was to avoid too deep a stack, so by _only_ letting the "accidental" flush directly in schedule() be guarded by offload to kblockd, we should be able to get the best of both worlds. So add a blk_schedule_flush_plug() that offloads to kblockd, and only use that from the schedule() path. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2011-04-156-84/+78
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: only force kblockd unplugging from the schedule() path block: cleanup the block plug helper functions block, blk-sysfs: Use the variable directly instead of a function call block: move queue run on unplug to kblockd block: kill queue_sync_plugs() block: readd plug trace event block: add callback function for unplug notification block: add comment on why we save and disable interrupts in flush_plug_list() block: fixup block IO unplug trace call block: remove block_unplug_timer() trace point block: splice plug list to local context
| * block: only force kblockd unplugging from the schedule() pathJens Axboe2011-04-152-8/+9
| | | | | | | | | | | | | | | | | | For the explicit unplugging, we'd prefer to kick things off immediately and not pay the penalty of the latency to switch to kblockd. So let blk_finish_plug() do the run inline, while the implicit-on-schedule-out unplug will punt to kblockd. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: cleanup the block plug helper functionsChristoph Hellwig2011-04-152-21/+9
| | | | | | | | | | | | | | | | | | | | | | | | It's a bit of a mess currently. task->plug is being cleared and reset in __blk_finish_plug(), and blk_finish_plug() is testing for a NULL plug which cannot happen even from schedule() anymore since it uses blk_needs_flush_plug() to determine whether to call into this function at all. So get rid of some of the cruft. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block, blk-sysfs: Use the variable directly instead of a function callLiu Yuan2011-04-131-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | In the function blk_register_queue(), var _dev_ is already assigned by disk_to_dev().So use it directly instead of calling disk_to_dev() again. Signed-off-by: Liu Yuan <tailai.ly@taobao.com> Modified by me to delete an empty line in the same function while in there anyway. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: move queue run on unplug to kblockdJens Axboe2011-04-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are worries that we are now consuming a lot more stack in some cases, since we potentially call into IO dispatch from schedule() or io_schedule(). We can reduce this problem by moving the running of the queue to kblockd, like the old plugging scheme did as well. This may or may not be a good idea from a performance perspective, depending on how many tasks have queue plugs running at the same time. For even the slightly contended case, doing just a single queue run from kblockd instead of multiple runs directly from the unpluggers will be faster. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: kill queue_sync_plugs()Jens Axboe2011-04-121-14/+0
| | | | | | | | | | | | | | | | The original use for this dates back to when we had to track write requests for serializing around barriers. That's not needed anymore, so kill it. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: readd plug trace eventJens Axboe2011-04-121-1/+9
| | | | | | | | | | | | | | | | | | This was removed with the queue plug state. But we can easily readd by checking if this is the first request going to this queue. It's good information to have when tracing to see how effective the plugging is. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: add callback function for unplug notificationJens Axboe2011-04-123-0/+22
| | | | | | | | | | | | | | MD would like to know when a queue is unplugged, so it can flush it's bitmap writes. Add such a callback. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: add comment on why we save and disable interrupts in flush_plug_list()Jens Axboe2011-04-121-0/+5
| | | | | | | | | | | | It's done at the top to avoid doing it for every queue we unplug. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: fixup block IO unplug trace callJens Axboe2011-04-123-10/+22
| | | | | | | | | | | | | | It was removed with the on-stack plugging, readd it and track the depth of requests added when flushing the plug. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: remove block_unplug_timer() trace pointJens Axboe2011-04-122-31/+0
| | | | | | | | | | | | | | We no longer have an unplug timer running, so no point in keeping the trace point. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * block: splice plug list to local contextNeilBrown2011-04-111-5/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the request_fn ends up blocking, we could be re-entering the plug flush. Since the list is protected by explicitly not allowing schedule events, this isn't a terribly good idea. Additionally, it can cause us to recurse. As request_fn called by __blk_run_queue is allowed to 'schedule()' (after dropping the queue lock of course), it is possible to get a recursive call: schedule -> blk_flush_plug -> __blk_finish_plug -> flush_plug_list -> __blk_run_queue -> request_fn -> schedule We must make sure that the second schedule does not call into blk_flush_plug again. So instead of leaving the list of requests on blk_plug->list, move them to a separate list leaving blk_plug->list empty. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6Linus Torvalds2011-04-152-58/+97
|\ \ | | | | | | | | | | | | | | | * 'linux-next' of git://git.infradead.org/ubifs-2.6: UBIFS: fix compilation warnings when compiling with gcc 4.5 UBIFS: fix oops when R/O file-system is fsync'ed
| * | UBIFS: fix compilation warnings when compiling with gcc 4.5Maksim Rayskiy2011-04-131-58/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiling UBIFS with CONFIG_UBIFS_FS_DEBUG not set, gcc-4.5.2 generates a slew of "warning: statement with no effect" on references to non-void functions defined as 0. To avoid these warnings, replace #defines with dummy inline functions. Artem: massage the patch a bit, also remove the duplicate 'dbg_check_lprops()' prototype. Signed-off-by: Maksim Rayskiy <maksim.rayskiy@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | UBIFS: fix oops when R/O file-system is fsync'edArtem Bityutskiy2011-04-131-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes severe UBIFS bug: UBIFS oopses when we 'fsync()' an file on R/O-mounter file-system. We (the UBIFS authors) incorrectly thought that VFS would not propagate 'fsync()' down to the file-system if it is read-only, but this is not the case. It is easy to exploit this bug using the following simple perl script: use strict; use File::Sync qw(fsync sync); die "File path is not specified" if not defined $ARGV[0]; my $path = $ARGV[0]; open FILE, "<", "$path" or die "Cannot open $path: $!"; fsync(\*FILE) or die "cannot fsync $path: $!"; close FILE or die "Cannot close $path: $!"; Thanks to Reuben Dowle <Reuben.Dowle@navico.com> for reporting about this issue. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reported-by: Reuben Dowle <Reuben.Dowle@navico.com> Cc: stable@kernel.org
* | | vfs: fix incorrect dentry_update_name_case() BUG_ON() testLinus Torvalds2011-04-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The case we should be verifying when updating the dentry name is that the _parent_ inode (the directory) semaphore is held, not the semaphore for the dentry itself. It's the directory locking that rename and readdir() etc all care about. The comment just above even says so - but then the BUG_ON() still checked the dentry inode itself. Very few people noticed, because this helper function really isn't used for very much, so you had to be using ncpfs to ever hit it. I think I should just remove the BUG_ON (the function really has just one user), but let's run with it fixed for a while before getting rid of it entirely. Reported-and-tested-by: Bongani Hlope <bonganih@bankservafrica.com> Reported-and-tested-by: Bernd Feige <bernd.feige@uniklinik-freiburg.de> Cc: Petr Vandrovec <petr@vandrovec.name>, Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Nick Piggin <npiggin@kernel.dk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2011-04-154-23/+42
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin: Blackfin: SMP: fix cache flush loop Blackfin: time-ts: ack gptimer sooner to avoid missing short ints Blackfin: gptimers: fix thinko when disabling timers Blackfin: SMP: make all barriers handle cache issues
| * | | Blackfin: SMP: fix cache flush loopSonic Zhang2011-04-141-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent commit (10774912647781) wasn't entirely correct. While it fixed some issues, it introduced others. So pull in the fixes from the public cache flush functions, and document why we need to call things directly ourselves. Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
| * | | Blackfin: time-ts: ack gptimer sooner to avoid missing short intsMike Frysinger2011-04-141-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the period of a gptimer is fairly low, we might miss an interrupt by acking it too late (we end up acking the new int as well). Reported-by: Isabelle Leonardi <i.leonardi@detracom.fr> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
| * | | Blackfin: gptimers: fix thinko when disabling timersMike Frysinger2011-04-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only want to clear the run bit for this one timer, not all status bits. So don't read the whole reg and then write all the bits back out. Reported-by: Isabelle Leonardi <i.leonardi@detracom.fr> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
| * | | Blackfin: SMP: make all barriers handle cache issuesGraf Yang2011-04-141-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When suspending/resuming, the common task freezing code will run in parallel and freeze processes on each core. This is because the code uses the non-smp version of memory barriers (as well it should). The Blackfin smp barrier logic at the moment contains the cache sync logic, but the non-smp barriers do not. This is incorrect as Rafel summarized: > ... > The existing memory barriers are SMP barriers too, but they are more > than _just_ SMP barriers. At least that's how it is _supposed_ to be > (eg. rmb() is supposed to be stronger than smp_rmb()). > ... > However, looking at the blackfin's definitions of SMP barriers I see > that it uses extra stuff that should _also_ be used in the definitions > of the mandatory barriers. > ... URL: http://lkml.org/lkml/2011/4/13/11 LKML-Reference: <BANLkTi=F-C-vwX4PGGfbkdTBw3OWL-twfg@mail.gmail.com> Signed-off-by: Graf Yang <graf.yang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-04-151-3/+9
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix linger request requeueing
| * | | | libceph: fix linger request requeueingSage Weil2011-04-061-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the request transition from linger -> normal request. The key is to preserve r_osd and requeue on the same OSD. Reregister as a normal request, add the request to the proper queues, then unregister the linger. Fix the unregister helper to avoid clearing r_osd (and also simplify the parallel check in __unregister_request()). Reported-by: Henry Chang <henry.cy.chang@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | | | | mm/thp: use conventional format for boolean attributesBen Hutchings2011-04-151-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conventional format for boolean attributes in sysfs is numeric ("0" or "1" followed by new-line). Any boolean attribute can then be read and written using a generic function. Using the strings "yes [no]", "[yes] no" (read), "yes" and "no" (write) will frustrate this. [akpm@linux-foundation.org: use kstrtoul()] [akpm@linux-foundation.org: test_bit() doesn't return 1/0, per Neil] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hugh Dickins <hughd@google.com> Tested-by: David Rientjes <rientjes@google.com> Cc: NeilBrown <neilb@suse.de> Cc: <stable@kernel.org> [2.6.38.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | ramfs: fix memleak on no-mmu archBob Liu2011-04-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On no-mmu arch, there is a memleak during shmem test. The cause of this memleak is ramfs_nommu_expand_for_mapping() added page refcount to 2 which makes iput() can't free that pages. The simple test file is like this: int main(void) { int i; key_t k = ftok("/etc", 42); for ( i=0; i<100; ++i) { int id = shmget(k, 10000, 0644|IPC_CREAT); if (id == -1) { printf("shmget error\n"); } if(shmctl(id, IPC_RMID, NULL ) == -1) { printf("shm rm error\n"); return -1; } } printf("run ok...\n"); return 0; } And the result: root:/> free total used free shared buffers Mem: 60320 17912 42408 0 0 -/+ buffers: 17912 42408 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 19096 41224 0 0 -/+ buffers: 19096 41224 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 20296 40024 0 0 -/+ buffers: 20296 40024 ... After this patch the test result is:(no memleak anymore) root:/> free total used free shared buffers Mem: 60320 16668 43652 0 0 -/+ buffers: 16668 43652 root:/> shmem run ok... root:/> free total used free shared buffers Mem: 60320 16668 43652 0 0 -/+ buffers: 16668 43652 Signed-off-by: Bob Liu <lliubbo@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | um: disable CONFIG_CMPXCHG_LOCALRichard Weinberger2011-04-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8a5ec0ba "Lockless (and preemptless) fastpaths for slub" makes use of this_cpu_cmpxchg_double() which needs this_cpu_cmpxchg16b_emu() on x86_64. Implementing cmpxchg16b emulation for UML would introduce too much complexity. So just disable it. Signed-off-by: Richard Weinberger <richard@nod.at> Reported-by: Sergei Trofimovich <slyich@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | um: fix call tracer and bug handlerRichard Weinberger2011-04-151-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1de1502c ("x86, um: now we can get rid of trivial uml headers") removed accidentally bug.h which broke UML's call tracer and bug handler. Without asm-generic/bug.h UML uses BUG() from arch/x86/ which makes use of ud2. UML cannot use ud2, it raises SIGILL in user mode. As UML has a different stack for handling signals the call trace will be cut off. Signed-off-by: Richard Weinberger <richard@nod.at> Reported-by: Sergei Trofimovich <slyich@gmail.com> Tested-by: Sergei Trofimovich <slyich@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | fs/fhandle.c: add <linux/personality.h> for ia64Jeff Mahoney2011-04-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | force_o_largefile() on ia64 is defined in <asm/fcntl.h> and requires <linux/personality.h>. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | MAINTAINERS: change mail adress of Hans J. KochHans J. Koch2011-04-151-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | My old mail address doesn't exist anymore. This patch changes all occurences in MAINTAINERS to my new address. Signed-off-by: Hans J. Koch <hjk@hansjkoch.de> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | RapidIO/mpc85xx: fix possible mport registration problemsAlexandre Bounine2011-04-153-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a possible problem with mport registration left non-cleared after fsl_rio_setup() exits on link error. Abort mport initialization if registration failed. This patch is applicable to 2.6.39-rc1 only. The problem does not exist for earlier versions. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | RapidIO: add IDT CPS-1432 switch definitionsAlexandre Bounine2011-04-152-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | oom-kill: remove boost_dying_task_prio()KOSAKI Motohiro2011-04-151-28/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an almost-revert of commit 93b43fa ("oom: give the dying task a higher priority"). That commit dramatically improved oom killer logic when a fork-bomb occurs. But I've found that it has nasty corner case. Now cpu cgroup has strange default RT runtime. It's 0! That said, if a process under cpu cgroup promote RT scheduling class, the process never run at all. If an admin inserts a !RT process into a cpu cgroup by setting rtruntime=0, usually it runs perfectly because a !RT task isn't affected by the rtruntime knob. But if it promotes an RT task via an explicit setscheduler() syscall or an OOM, the task can't run at all. In short, the oom killer doesn't work at all if admins are using cpu cgroup and don't touch the rtruntime knob. Eventually, kernel may hang up when oom kill occur. I and the original author Luis agreed to disable this logic. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Luis Claudio R. Goncalves <lclaudio@uudg.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | vmscan: all_unreclaimable() use zone->all_unreclaimable as a nameKOSAKI Motohiro2011-04-151-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | all_unreclaimable check in direct reclaim has been introduced at 2.6.19 by following commit. 2006 Sep 25; commit 408d8544; oom: use unreclaimable info And it went through strange history. firstly, following commit broke the logic unintentionally. 2008 Apr 29; commit a41f24ea; page allocator: smarter retry of costly-order allocations Two years later, I've found obvious meaningless code fragment and restored original intention by following commit. 2010 Jun 04; commit bb21c7ce; vmscan: fix do_try_to_free_pages() return value when priority==0 But, the logic didn't works when 32bit highmem system goes hibernation and Minchan slightly changed the algorithm and fixed it . 2010 Sep 22: commit d1908362: vmscan: check all_unreclaimable in direct reclaim path But, recently, Andrey Vagin found the new corner case. Look, struct zone { .. int all_unreclaimable; .. unsigned long pages_scanned; .. } zone->all_unreclaimable and zone->pages_scanned are neigher atomic variables nor protected by lock. Therefore zones can become a state of zone->page_scanned=0 and zone->all_unreclaimable=1. In this case, current all_unreclaimable() return false even though zone->all_unreclaimabe=1. This resulted in the kernel hanging up when executing a loop of the form 1. fork 2. mmap 3. touch memory 4. read memory 5. munmmap as described in http://www.gossamer-threads.com/lists/linux/kernel/1348725#1348725 Is this ignorable minor issue? No. Unfortunately, x86 has very small dma zone and it become zone->all_unreclamble=1 easily. and if it become all_unreclaimable=1, it never restore all_unreclaimable=0. Why? if all_unreclaimable=1, vmscan only try DEF_PRIORITY reclaim and a-few-lru-pages>>DEF_PRIORITY always makes 0. that mean no page scan at all! Eventually, oom-killer never works on such systems. That said, we can't use zone->pages_scanned for this purpose. This patch restore all_unreclaimable() use zone->all_unreclaimable as old. and in addition, to add oom_killer_disabled check to avoid reintroduce the issue of commit d1908362 ("vmscan: check all_unreclaimable in direct reclaim path"). Reported-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | mm: check that we have the right vma in __access_remote_vm()Michael Ellerman2011-04-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In __access_remote_vm() we need to check that we have found the right vma, not the following vma before we try to access it. Otherwise we might call the vma's access routine with an address which does not fall inside the vma. It was discovered on a current kernel but with an unreleased driver, from memory it was strace leading to a kernel bad access, but it obviously depends on what the access implementation does. Looking at other access implementations I only see: $ git grep -A 5 vm_operations|grep access arch/powerpc/platforms/cell/spufs/file.c- .access = spufs_mem_mmap_access, arch/x86/pci/i386.c- .access = generic_access_phys, drivers/char/mem.c- .access = generic_access_phys fs/sysfs/bin.c- .access = bin_access, The spufs one looks like it might behave badly given the wrong vma, it assumes vma->vm_file->private_data is a spu_context, and looks like it would probably blow up pretty quickly if it wasn't. generic_access_phys() only uses the vma to check vm_flags and get the mm, and then walks page tables using the address. So it should bail on the vm_flags check, or at worst let you access some other VM_IO mapping. And bin_access() just proxies to another access implementation. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | brk: COMPAT_BRK: fix detection of randomized brkJiri Kosina2011-04-153-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 5520e89 ("brk: fix min_brk lower bound computation for COMPAT_BRK") tried to get the whole logic of brk randomization for legacy (libc5-based) applications finally right. It turns out that the way to detect whether brk has actually been randomized in the end or not introduced by that patch still doesn't work for those binaries, as reported by Geert: : /sbin/init from my old m68k ramdisk exists prematurely. : : Before the patch: : : | brk(0x80005c8e) = 0x80006000 : : After the patch: : : | brk(0x80005c8e) = 0x80005c8e : : Old libc5 considers brk() to have failed if the return value is not : identical to the requested value. I don't like it, but currently see no better option than a bit flag in task_struct to catch the CONFIG_COMPAT_BRK && randomize_va_space == 2 case. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | drivers/misc/sgi-gru/grufile.c: fix the wrong members of gru_chipWanlong Gao2011-04-151-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the wrong members and the wrong function's definition, since the irq_chip had changed. Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | tmpfs: fix off-by-one in max_blocks checksHugh Dickins2011-04-151-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you fill up a tmpfs, df was showing tmpfs 460800 - - - /tmp because of an off-by-one in the max_blocks checks. Fix it so df shows tmpfs 460800 460800 0 100% /tmp Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | MAINTAINERS: update STABLE BRANCH infoRandy Dunlap2011-04-151-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drop Chris Wright from STABLE maintainers. He hasn't done STABLE release work for quite some time. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | mm: add VM counters for transparent hugepagesAndi Kleen2011-04-153-4/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found it difficult to make sense of transparent huge pages without having any counters for its actions. Add some counters to vmstat for allocation of transparent hugepages and fallback to smaller pages. Optional patch, but useful for development and understanding the system. Contains improvements from Andrea Arcangeli and Johannes Weiner [akpm@linux-foundation.org: coding-style fixes] [hannes@cmpxchg.org: fix vmstat_text[] entries] Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | MAINTAINERS: update various tty patternsJoe Perches2011-04-151-16/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commits 4a6514e6d0 ("tty: move obsolete and broken tty drivers to drivers/staging/tty/") and a6afd9f3e8 ("tty: move a number of tty drivers from drivers/char/ to drivers/tty/") moved files around. Update patterns and orphan some files that were moved to staging. Signed-off-by: Joe Perches <joe@perches.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | MAINTAINERS: update m68knommu patternsJoe Perches2011-04-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 66d857b08b ("m68k: merge m68k and m68knommu arch directories") moved the files around. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | MAINTAINERS: add ARM/ts78xx-setup platform maintainerAlexander Clouter2011-04-151-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Alexander Clouter <alex@digriz.org.uk> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | kstrtox: simpler code in _kstrtoull()Alexey Dobriyan2011-04-151-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | kstrtox: fix compile warnings in testAlexey Dobriyan2011-04-151-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following warnings: CC [M] lib/test-kstrtox.o lib/test-kstrtox.c: In function 'test_kstrtou64_ok': lib/test-kstrtox.c:318: warning: this decimal constant is unsigned only in ISO C90 ... Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | leds/leds-regulator.c: fix handling of already enabled regulatorsAntonio Ospite2011-04-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the driver aware of the initial status of the regulator. The leds-regulator driver was ignoring the initial status of the regulator; this resulted in rdev->use_count being incremented to 2 after calling regulator_led_set_value() in the .probe method when a regulator was already enabled at insmod time, which made it impossible to ever disable the regulator. Signed-off-by: Antonio Ospite <ospite@studenti.unina.it> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Antonio Ospite <ospite@studenti.unina.it> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Liam Girdwood <lrg@slimlogic.co.uk> Cc: Daniel Ribeiro <drwyrm@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | vmstat: update comment regarding stat_thresholdChristoph Lameter2011-04-151-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | mm/page_alloc.c: silence build_all_zonelists() section mismatchPaul Mundt2011-04-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The memory hotplug case involves calling to build_all_zonelists() which in turns calls in to setup_zone_pageset(). The latter is marked __meminit while build_all_zonelists() itself has no particular annotation. build_all_zonelists() is only handed a non-NULL pointer in the case of memory hotplug through an existing __meminit path, so the setup_zone_pageset() reference is always safe. The options as such are either to flag build_all_zonelists() as __ref (as per __build_all_zonelists()), or to simply discard the __meminit annotation from setup_zone_pageset(). Signed-off-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | fs/partitions/ldm.c: fix oops caused by corrupted partition tableTimo Warns2011-04-151-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel automatically evaluates partition tables of storage devices. The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains a bug that causes a kernel oops on certain corrupted LDM partitions. A kernel subsystem seems to crash, because, after the oops, the kernel no longer recognizes newly connected storage devices. The patch validates the value of vblk_size. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Timo Warns <warns@pre-sense.de> Cc: Eugene Teo <eugeneteo@kernel.sg> Cc: Harvey Harrison <harvey.harrison@gmail.com> Cc: Richard Russon <rich@flatcap.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>