summaryrefslogtreecommitdiffstats
path: root/kernel (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'timers-for-linus' of ↵Linus Torvalds2011-01-066-102/+63
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: MAINTAINERS: Update timer related entries timers: Use this_cpu_read timerqueue: Make timerqueue_getnext() static inline hrtimer: fix timerqueue conversion flub hrtimers: Convert hrtimers to use timerlist infrastructure timers: Fixup allmodconfig build issue timers: Rename timerlist infrastructure to timerqueue timers: Introduce timerlist infrastructure. hrtimer: Remove stale comment on curr_timer timer: Warn when del_timer_sync() is called in hardirq context timer: Del_timer_sync() can be used in softirq context timer: Make try_to_del_timer_sync() the same on SMP and UP posix-timers: Annotate lock_timer() timer: Permit statically-declared work with deferrable timers time: Use ARRAY_SIZE macro in timecompare.c timer: Initialize the field slack of timer_list timer_list: Remove alignment padding on 64 bit when CONFIG_TIMER_STATS time: Compensate for rounding on odd-frequency clocksources Fix up trivial conflict in MAINTAINERS
| * timers: Use this_cpu_readChristoph Lameter2010-12-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eric asked for this. [tglx: Because it generates faster code according to Erics ] Signed-off-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tejun Heo <tj@kernel.org> Cc: linux-mm@kvack.org LKML-Reference: <alpine.DEB.2.00.1011301404490.4039@router.home> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * hrtimer: fix timerqueue conversion flubJohn Stultz2010-12-111-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In converting the hrtimers to timerqueue, I missed a spot in hrtimer_run_queues where we loop running timers. We end up not pulling the new next value out and instead just use the last next value, causing boot time hangs in some cases. The proper fix is to pull timerqueue_getnext each iteration instead of using a local next value. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * hrtimers: Convert hrtimers to use timerlist infrastructureJohn Stultz2010-12-102-60/+34
| | | | | | | | | | | | | | | | | | | | | | Converts the hrtimer code to use the new timerlist infrastructure Signed-off-by: John Stultz <john.stultz@linaro.org> LKML Reference: <1290136329-18291-3-git-send-email-john.stultz@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> CC: Alessandro Zummo <a.zummo@towertech.it> CC: Thomas Gleixner <tglx@linutronix.de> CC: Richard Cochran <richardcochran@gmail.com>
| * timer: Warn when del_timer_sync() is called in hardirq contextYong Zhang2010-10-221-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | Add explict warning when del_timer_sync() is called in hardirq context. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * timer: Del_timer_sync() can be used in softirq contextYong Zhang2010-10-221-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Actually we have used del_timer_sync() in softirq context for a long time, e.g. in __dst_free()::cancel_delayed_work(). So change the comments of it to warn on hardirq context only, and make lockdep know about this change. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * timer: Make try_to_del_timer_sync() the same on SMP and UPYong Zhang2010-10-221-14/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On UP try_to_del_timer_sync() is mapped to del_timer() which does not take the running timer callback into account, so it has different semantics. Remove the SMP dependency of try_to_del_timer_sync() by using base->running_timer in the UP case as well. [ tglx: Removed set_running_timer() inline and tweaked the changelog ] Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * posix-timers: Annotate lock_timer()Namhyung Kim2010-10-211-2/+8
| | | | | | | | | | | | | | | | | | | | | | lock_timer() conditionally grabs it_lock in case of returning non-NULL but unlock_timer() releases it unconditionally. This leads sparse to complain about the lock context imbalance. Rename and wrap lock_timer using __cond_lock() macro to make sparse happy. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * timer: Permit statically-declared work with deferrable timersPhil Carmody2010-10-211-14/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, you have to just define a delayed_work uninitialised, and then initialise it before first use. That's a tad clumsy. At risk of playing mind-games with the compiler, fooling it into doing pointer arithmetic with compile-time-constants, this lets clients properly initialise delayed work with deferrable timers statically. This patch was inspired by the issues which lead Artem Bityutskiy to commit 8eab945c5616fc984 ("sunrpc: make the cache cleaner workqueue deferrable"). Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com> Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * time: Use ARRAY_SIZE macro in timecompare.cNikitas Angelinas2010-10-211-2/+3
| | | | | | | | | | | | | | | | | | Replace sizeof(buffer)/sizeof(buffer[0]) with ARRAY_SIZE(buffer) in kernel/time/timecompare.c Signed-off-by: Nikitas Angelinas <nikitasangelinas@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * time: Compensate for rounding on odd-frequency clocksourcesKasper Pedersen2010-10-211-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the clocksource is not a multiple of HZ, the clock will be off. For acpi_pm, HZ=1000 the error is 127.111 ppm: The rounding of cycle_interval ends up generating a false error term in ntp_error accumulation since xtime_interval is not exactly 1/HZ. So, we subtract out the error caused by the rounding. This has been visible since 2.6.32-rc2 commit a092ff0f90cae22b2ac8028ecd2c6f6c1a9e4601 time: Implement logarithmic time accumulation That commit raised NTP_INTERVAL_FREQ and exposed the rounding error. testing tool: http://n1.taur.dk/permanent/testpmt.c Also tested with ntpd and a frequency counter. Signed-off-by: Kasper Pedersen <kkp2010@kasperkp.dk> Acked-by: john stultz <johnstul@us.ibm.com> Cc: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2011-01-0619-585/+785
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits) sched: Change wait_for_completion_*_timeout() to return a signed long sched, autogroup: Fix reference leak sched, autogroup: Fix potential access to freed memory sched: Remove redundant CONFIG_CGROUP_SCHED ifdef sched: Fix interactivity bug by charging unaccounted run-time on entity re-weight sched: Move periodic share updates to entity_tick() printk: Use this_cpu_{read|write} api on printk_pending sched: Make pushable_tasks CONFIG_SMP dependant sched: Add 'autogroup' scheduling feature: automated per session task groups sched: Fix unregister_fair_sched_group() sched: Remove unused argument dest_cpu to migrate_task() mutexes, sched: Introduce arch_mutex_cpu_relax() sched: Add some clock info to sched_debug cpu: Remove incorrect BUG_ON cpu: Remove unused variable sched: Fix UP build breakage sched: Make task dump print all 15 chars of proc comm sched: Update tg->shares after cpu.shares write sched: Allow update_cfs_load() to update global load sched: Implement demand based update_cfs_load() ...
| * | sched: Change wait_for_completion_*_timeout() to return a signed longNeilBrown2011-01-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | wait_for_completion_*_timeout() can return: 0: if the wait timed out -ve: if the wait was interrupted +ve: if the completion was completed. As they currently return an 'unsigned long', the last two cases are not easily distinguished which can easily result in buggy code, as is the case for the recently added wait_for_completion_interruptible_timeout() call in net/sunrpc/cache.c So change them both to return 'long'. As MAX_SCHEDULE_TIMEOUT is LONG_MAX, a large +ve return value should never overflow. Signed-off-by: NeilBrown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20110105125016.64ccab0e@notabene.brown> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | Merge commit 'v2.6.37' into sched/coreIngo Molnar2011-01-0513-171/+361
| |\ \ | | | | | | | | | | | | | | | | | | | | Merge reason: Merge the final .37 tree. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | sched, autogroup: Fix reference leakMike Galbraith2011-01-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cgroup exit mess also uncovered a struct autogroup reference leak. copy_process() was simply freeing vs putting the signal_struct, stranding a reference. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> LKML-Reference: <1293784350.6839.2.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | sched, autogroup: Fix potential access to freed memoryMike Galbraith2011-01-041-8/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oleg pointed out that the /proc interface kref_get() useage may race with the final put during autogroup_move_group(). A signal->autogroup assignment may be in flight when the /proc interface dereference, leaving them taking a reference to an already dead group. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1292508592.5940.28.camel@maggy.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | sched: Remove redundant CONFIG_CGROUP_SCHED ifdefYong Zhang2011-01-041-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_[FAIR|RT]_GROUP_SCHED always means CONFIG_CGROUP_SCHED Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1293803938-8157-1-git-send-email-yong.zhang0@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | sched: Fix interactivity bug by charging unaccounted run-time on entity ↵Paul Turner2010-12-191-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | re-weight Mike Galbraith reported poor interactivity[*] when the new shares distribution code was combined with autogroups. The root cause turns out to be a mis-ordering of accounting accrued execution time and shares updates. Since update_curr() is issued hierarchically, updating the parent entity weights to reflect child enqueue/dequeue results in the parent's unaccounted execution time then being accrued (vs vruntime) at the new weight as opposed to the weight present at accumulation. While this doesn't have much effect on processes with timeslices that cross a tick, it is particularly problematic for an interactive process (e.g. Xorg) which incurs many (tiny) timeslices. In this scenario almost all updates are at dequeue which can result in significant fairness perturbation (especially if it is the only thread, resulting in potential {tg->shares, MIN_SHARES} transitions). Correct this by ensuring unaccounted time is accumulated prior to manipulating an entity's weight. [*] http://xkcd.com/619/ is perversely Nostradamian here. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20101216031038.159704378@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | sched: Move periodic share updates to entity_tick()Paul Turner2010-12-191-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Long running entities that do not block (dequeue) require periodic updates to maintain accurate share values. (Note: group entities with several threads are quite likely to be non-blocking in many circumstances). By virtue of being long-running however, we will see entity ticks (otherwise the required update occurs in dequeue/put and we are done). Thus we can move the detection (and associated work) for these updates into the periodic path. This restores the 'atomicity' of update_curr() with respect to accounting. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101216031038.067028969@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | Merge commit 'v2.6.37-rc6' into sched/coreIngo Molnar2010-12-191-3/+4
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: Update to the latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | printk: Use this_cpu_{read|write} api on printk_pendingEric Dumazet2010-12-081-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __get_cpu_var() is a bit inefficient, lets use __this_cpu_read() and __this_cpu_write() to manipulate printk_pending. printk_needs_cpu(cpu) is called only for the current cpu : Use faster __this_cpu_read(). Remove the redundant unlikely on (cpu_is_offline(cpu)) test: # size kernel/printk.o* text data bss dec hex filename 9942 756 263488 274186 42f0a kernel/printk.o.new 9990 756 263488 274234 42f3a kernel/printk.o.old Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290788536.2855.237.camel@edumazet-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | sched: Make pushable_tasks CONFIG_SMP dependantDario Faggioli2010-12-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As noted by Peter Zijlstra at https://lkml.org/lkml/2010/11/10/391 (while reviewing other stuff, though), tracking pushable tasks only makes sense on SMP systems. Signed-off-by: Dario Faggioli <raistlin@linux.it> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1291143093.2697.298.camel@Palantir> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | Merge branch 'linus' into sched/coreIngo Molnar2010-12-0814-60/+188
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: we want to queue up dependent cleanup Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | sched: Add 'autogroup' scheduling feature: automated per session task groupsMike Galbraith2010-11-307-49/+292
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recurring complaint from CFS users is that parallel kbuild has a negative impact on desktop interactivity. This patch implements an idea from Linus, to automatically create task groups. Currently, only per session autogroups are implemented, but the patch leaves the way open for enhancement. Implementation: each task's signal struct contains an inherited pointer to a refcounted autogroup struct containing a task group pointer, the default for all tasks pointing to the init_task_group. When a task calls setsid(), a new task group is created, the process is moved into the new task group, and a reference to the preveious task group is dropped. Child processes inherit this task group thereafter, and increase it's refcount. When the last thread of a process exits, the process's reference is dropped, such that when the last process referencing an autogroup exits, the autogroup is destroyed. At runqueue selection time, IFF a task has no cgroup assignment, its current autogroup is used. Autogroup bandwidth is controllable via setting it's nice level through the proc filesystem: cat /proc/<pid>/autogroup Displays the task's group and the group's nice level. echo <nice level> > /proc/<pid>/autogroup Sets the task group's shares to the weight of nice <level> task. Setting nice level is rate limited for !admin users due to the abuse risk of task group locking. The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via the boot option noautogroup, and can also be turned on/off on the fly via: echo [01] > /proc/sys/kernel/sched_autogroup_enabled ... which will automatically move tasks to/from the root task group. Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Paul Turner <pjt@google.com> Cc: Oleg Nesterov <oleg@redhat.com> [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ] Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | sched: Fix unregister_fair_sched_group()Paul Turner2010-11-301-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the flipping and flopping between calling unregister_fair_sched_group() on a per-cpu versus per-group basis we ended up in a bad state. Remove from the list for the passed cpu as opposed to some arbitrary index. ( This fixes explosions w/ autogroup as well as a group creation/destruction stress test. ) Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <20101130005740.080828123@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | sched: Remove unused argument dest_cpu to migrate_task()Nikanth Karthikesan2010-11-261-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove unused argument, 'dest_cpu' of migrate_task(), and pass runqueue, as it is always known at the call site. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <201011261237.09187.knikanth@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | mutexes, sched: Introduce arch_mutex_cpu_relax()Gerald Schaefer2010-11-262-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The spinning mutex implementation uses cpu_relax() in busy loops as a compiler barrier. Depending on the architecture, cpu_relax() may do more than needed in this specific mutex spin loops. On System z we also give up the time slice of the virtual cpu in cpu_relax(), which prevents effective spinning on the mutex. This patch replaces cpu_relax() in the spinning mutex code with arch_mutex_cpu_relax(), which can be defined by each architecture that selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so this patch should not affect other architectures than System z for now. Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290437256.7455.4.camel@thinkpad> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | Merge commit 'v2.6.37-rc3' into sched/coreIngo Molnar2010-11-2611-39/+84
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: Pick up latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Add some clock info to sched_debugPeter Zijlstra2010-11-232-5/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add more clock information to /proc/sched_debug, Thomas wanted to see the sched_clock_stable state. Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | cpu: Remove incorrect BUG_ONPeter Zijlstra2010-11-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oleg mentioned that there is no actual guarantee the dying cpu's migration thread is actually finished running when we get there, so replace the BUG_ON() with a spinloop waiting for it. Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | cpu: Remove unused variableDhaval Giani2010-11-231-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC warns us about: kernel/cpu.c: In function ‘take_cpu_down’: kernel/cpu.c:200:15: warning: unused variable ‘cpu’ This variable is unused since param->hcpu is directly used later on in cpu_notify. Signed-off-by: Dhaval Giani <dhaval_giani@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290091494.1145.5.camel@gondor.retis> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Fix UP build breakagePeter Zijlstra2010-11-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent cgroup-scheduling rework caused a UP build problem. Cc: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Make task dump print all 15 chars of proc commErik Gilling2010-11-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Erik Gilling <konkers@android.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290218934-8544-3-git-send-email-john.stultz@linaro.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Update tg->shares after cpu.shares writePaul Turner2010-11-181-31/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Formerly sched_group_set_shares would force a rebalance by overflowing domain share sums. Now that per-cpu averages are maintained we can set the true value by issuing an update_cfs_shares() following a tg->shares update. Also initialize tg se->load to 0 for consistency since we'll now set correct weights on enqueue. Signed-off-by: Paul Turner <pjt@google.com?> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.465521344@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Allow update_cfs_load() to update global loadPaul Turner2010-11-181-15/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor the global load updates from update_shares_cpu() so that update_cfs_load() can update global load when it is more than ~10% out of sync. The new global_load parameter allows us to force an update, regardless of the error factor so that we can synchronize w/ update_shares(). Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.377473595@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Implement demand based update_cfs_load()Paul Turner2010-11-182-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the system is busy, dilation of rq->next_balance makes lb->update_shares() insufficiently frequent for threads which don't sleep (no dequeue/enqueue updates). Adjust for this by making demand based updates based on the accumulation of execution time sufficient to wrap our averaging window. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.291159744@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Update shares on idle_balancePaul Turner2010-11-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since shares updates are no longer expensive and effectively local, update them at idle_balance(). This allows us to more quickly redistribute shares to another cpu when our load becomes idle. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.204191702@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Add sysctl_sched_shares_windowPaul Turner2010-11-182-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new sysctl for the shares window and disambiguate it from sched_time_avg. A 10ms window appears to be a good compromise between accuracy and performance. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.112173964@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Introduce hierarchal order on shares update listPaul Turner2010-11-181-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid duplicate shares update calls by ensuring children always appear before parents in rq->leaf_cfs_rq_list. This allows us to do a single in-order traversal for update_shares(). Since we always enqueue in bottom-up order this reduces to 2 cases: 1) Our parent is already in the list, e.g. root \ b /\ c d* (root->b->c already enqueued) Since d's parent is enqueued we push it to the head of the list, implicitly ahead of b. 2) Our parent does not appear in the list (or we have no parent) In this case we enqueue to the tail of the list, if our parent is subsequently enqueued (bottom-up) it will appear to our right by the same rule. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234938.022488865@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Fix update_cfs_load() synchronizationPaul Turner2010-11-182-13/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using cfs_rq->nr_running is not sufficient to synchronize update_cfs_load with the put path since nr_running accounting occurs at deactivation. It's also not safe to make the removal decision based on load_avg as this fails with both high periods and low shares. Resolve this by clipping history after 4 periods without activity. Note: the above will always occur from update_shares() since in the last-task-sleep-case that task will still be cfs_rq->curr when update_cfs_load is called. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.933428187@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Fix load corruption from update_cfs_shares()Paul Turner2010-11-181-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of enqueue_entity both a new entity weight and its contribution to the queuing cfs_rq / rq are updated. Since update_cfs_shares will only update the queueing weights when the entity is on_rq (which in this case it is not yet), there's a dependency loop here: update_cfs_shares needs account_entity_enqueue to update cfs_rq->load.weight account_entity_enqueue needs the updated weight for the queuing cfs_rq load[*] Fix this and avoid spurious dequeue/enqueues by issuing update_cfs_shares as if we had accounted the enqueue already. This was also resulting in rq->load corruption previously. [*]: this dependency also exists when using the group cfs_rq w/ update_cfs_shares as the weight of the enqueued entity changes without the load being updated. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.844900206@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Make tg_shares_up() walk on-demandPeter Zijlstra2010-11-182-67/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make tg_shares_up() use the active cgroup list, this means we cannot do a strict bottom-up walk of the hierarchy, but assuming its a very wide tree with a small number of active groups it should be a win. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.754159484@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Implement on-demand (active) cfs_rq listPeter Zijlstra2010-11-183-83/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make certain load-balance actions scale per number of active cgroups instead of the number of existing cgroups. This makes wakeup/sleep paths more expensive, but is a win for systems where the vast majority of existing cgroups are idle. Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.666535048@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Rewrite tg_shares_up)Peter Zijlstra2010-11-185-211/+162
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By tracking a per-cpu load-avg for each cfs_rq and folding it into a global task_group load on each tick we can rework tg_shares_up to be strictly per-cpu. This should improve cpu-cgroup performance for smp systems significantly. [ Paul: changed to use queueing cfs_rq + bug fixes ] Signed-off-by: Paul Turner <pjt@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101115234937.580480400@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | sched: Simplify cpu-hot-unplug task migrationPeter Zijlstra2010-11-182-155/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While discussing the need for sched_idle_next(), Oleg remarked that since try_to_wake_up() ensures sleeping tasks will end up running on a sane cpu, we can do away with migrate_live_tasks(). If we then extend the existing hack of migrating current from CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the need for the sched_idle_next() abomination disappears as well, since idle will be the only possible thread left after the migration thread stops. This greatly simplifies the hot-unplug task migration path, as can be seen from the resulting code reduction (and about half the new lines are comments). Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1289851597.2109.547.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | Merge commit 'v2.6.37-rc2' into sched/coreIngo Molnar2010-11-1862-911/+1301
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: Move to a .37-rc base. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | sched: Make sched_param argument static in sched_setscheduler() callersKOSAKI Motohiro2010-10-236-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrew Morton pointed out almost all sched_setscheduler() callers are using fixed parameters and can be converted to static. It reduces runtime memory use a little. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: James Morris <jmorris@namei.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | | | | Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2011-01-0613-348/+894
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (146 commits) tools, perf: Documentation for the power events API perf: Add calls to suspend trace point perf script: Make some lists static perf script: Use the default lost event handler perf session: Warn about errors when processing pipe events too perf tools: Fix perf_event.h header usage perf test: Clarify some error reports in the open syscall test x86, NMI: Add touch_nmi_watchdog to io_check_error delay x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time x86: Only call smp_processor_id in non-preempt cases perf timechart: Adjust perf timechart to the new power events perf: Clean up power events by introducing new, more generic ones perf: Do not export power_frequency, but power_start event perf test: Add test for counting open syscalls perf evsel: Auto allocate resources needed for some methods perf evsel: Use {cpu,thread}_map to shorten list of parameters perf tools: Refactor all_tids to hold nr and the map perf tools: Refactor cpumap to hold nr and the map perf evsel: Introduce per cpu and per thread open helpers perf evsel: Steal the counter reading routines from stat ...
| * | | | | | | | perf: Add calls to suspend trace pointJean Pihet2011-01-051-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Uses the machine_suspend trace point, called from the generic kernel suspend_devices_and_enter function. Signed-off-by: Jean Pihet <j-pihet@ti.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Arjan van de Ven <arjan@linux.intel.com> CC: Thomas Renninger <trenn@suse.de> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: linux-pm@lists.linux-foundation.org LKML-Reference: <1294253342-29056-2-git-send-email-j-pihet@ti.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | | Merge commit 'v2.6.37' into perf/coreIngo Molnar2011-01-052-1/+3
| |\ \ \ \ \ \ \ \ | | | |_|_|_|_|/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge reason: Add the final .37 tree. Signed-off-by: Ingo Molnar <mingo@elte.hu>