summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* sched: remove the !PREEMPT_BKL codeIngo Molnar2008-01-255-161/+5
| | | | | | | | remove the !PREEMPT_BKL code. this removes 160 lines of legacy code. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: make PREEMPT_BKL the defaultIngo Molnar2008-01-251-8/+1
| | | | | | | | make PREEMPT_BKL the default. precursor to removal of the !PREEMPT_BKL code. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* debug: track and print last unloaded module in the oops traceArjan van de Ven2008-01-251-0/+6
| | | | | | | | | | | | | | | | | | | | Based on a suggestion from Andi: In various cases, the unload of a module may leave some bad state around that causes a kernel crash AFTER a module is unloaded; and it's then hard to find which module caused that. This patch tracks the last unloaded module, and prints this as part of the module list in the oops trace. Right now, only the last 1 module is tracked; I expect that this is enough for the vast majority of cases where this information matters; if it turns out that tracking more is important, we can always extend it to that. [ mingo@elte.hu: build fix ] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* debug: show being-loaded/being-unloaded indicator for modulesArjan van de Ven2008-01-251-6/+15
| | | | | | | | | | | | | | | | | | | | It's rather common that an oops/WARN_ON/BUG happens during the load or unload of a module. Unfortunatly, it's not always easy to see directly which module is being loaded/unloaded from the oops itself. Worse, it's not even always possible to ask the bug reporter, since there are so many components (udev etc) that auto-load modules that there's a good chance that even the reporter doesn't know which module this is. This patch extends the existing "show if it's tainting" print code, which is used as part of printing the modules in the oops/BUG/WARN_ON to include a "+" for "being loaded" and a "-" for "being unloaded". As a result this extension, the "taint_flags()" function gets renamed to "module_flags()" (and takes a module struct as argument, not a taint flags int). Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: rt-watchdog: fix .rlim_max = RLIM_INFINITYPeter Zijlstra2008-01-252-8/+3
| | | | | | | | | | | | | Remove the curious logic to set it_sched_expires in the future. It useless because rt.timeout wouldn't be incremented anyway. Explicity check for RLIM_INFINITY as a test programm that had a 1s soft limit and a inf hard limit would SIGKILL at 1s. This is because RLIM_INFINITY+d-1 is d-2. Signed-off-by: Peter Zijlsta <a.p.zijlstra@chello.nl> CC: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: rt-group: reduce reschedulingPeter Zijlstra2008-01-251-1/+4
| | | | | | | Only reschedule if the new group has a higher prio task. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* hrtimer: unlock hrtimer_wakeupPeter Zijlstra2008-01-251-1/+3
| | | | | | | | | | | | | | | | hrtimer_wakeup creates a base->lock rq->lock lock dependancy. Avoid this by switching to HRTIMER_CB_IRQSAFE_NO_SOFTIRQ which doesn't hold base->lock. This fully untangles hrtimer locks from the scheduler locks, and allows hrtimer usage in the scheduler proper. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* hrtimer: fixup the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ fallbackPeter Zijlstra2008-01-253-135/+143
| | | | | | | | | | Currently all highres=off timers are run from softirq context, but HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timers expect to run from irq context. Fix this up by splitting it similar to the highres=on case. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* hrtimer: clean up cpu->base locking tricksPeter Zijlstra2008-01-252-9/+19
| | | | | | | | In order to more easily allow for the scheduler to use timers, clean up the locking a bit. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: rt throttling vs no_hzPeter Zijlstra2008-01-254-15/+45
| | | | | | | We need to teach no_hz about the rt throttling because its tick driven. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: pull_rt_task() cleanupMike Galbraith2008-01-251-6/+4
| | | | | | | "goto out" is an odd way to spell "skip". Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: rt group schedulingPeter Zijlstra2008-01-255-206/+549
| | | | | | | | | | | | | | | | | Extend group scheduling to also cover the realtime classes. It uses the time limiting introduced by the previous patch to allow multiple realtime groups. The hard time limit is required to keep behaviour deterministic. The algorithms used make the realtime scheduler O(tg), linear scaling wrt the number of task groups. This is the worst case behaviour I can't seem to get out of, the avg. case of the algorithms can be improved, I focused on correctness and worst case. [ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: rt time limitPeter Zijlstra2008-01-254-21/+122
| | | | | | | | | | Very simple time limit on the realtime scheduling classes. Allow the rq's realtime class to consume sched_rt_ratio of every sched_rt_period slice. If the class exceeds this quota the fair class will preempt the realtime class. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: high-res preemption tickPeter Zijlstra2008-01-2512-21/+295
| | | | | | | | | | | | | | | | Use HR-timers (when available) to deliver an accurate preemption tick. The regular scheduler tick that runs at 1/HZ can be too coarse when nice level are used. The fairness system will still keep the cpu utilisation 'fair' by then delaying the task that got an excessive amount of CPU time but try to minimize this by delivering preemption points spot-on. The average frequency of this extra interrupt is sched_latency / nr_latency. Which need not be higher than 1/HZ, its just that the distribution within the sched_latency period is important. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: do not do cond_resched() when CONFIG_PREEMPTHerbert Xu2008-01-253-5/+18
| | | | | | | | | Why do we even have cond_resched when real preemption is on? It seems to be a waste of space and time. remove cond_resched with CONFIG_PREEMPT on. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: documentation, whitespace fixesIngo Molnar2008-01-251-4/+4
| | | | | | whitespace fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: SCHED_FIFO/SCHED_RR watchdog timerPeter Zijlstra2008-01-254-2/+63
| | | | | | | | | | | | | | | | Introduce a new rlimit that allows the user to set a runtime timeout on real-time tasks their slice. Once this limit is exceeded the task will receive SIGXCPU. So it measures runtime since the last sleep. Input and ideas by Thomas Gleixner and Lennart Poettering. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Lennart Poettering <mzxreary@0pointer.de> CC: Michael Kerrisk <mtk.manpages@googlemail.com> CC: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: sched_rt_entityPeter Zijlstra2008-01-255-16/+21
| | | | | | | | | | Move the task_struct members specific to rt scheduling together. A future optimization could be to put sched_entity and sched_rt_entity into a union. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* uids: merge multiple error paths in alloc_uid() into onePavel Emelyanov2008-01-251-27/+20
| | | | | | | | | | | | | | There are already 4 error paths in alloc_uid() that do incremental rollbacks. I think it's time to merge them. This costs us 8 lines of code :) Maybe it would be better to merge this patch with the previous one, but I remember that some time ago I sent a similar patch (fixing the error path and cleaning it), but I was told to make two patches in such cases. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: dynamically update the root-domain span/online mapsGregory Haskins2008-01-251-12/+19
| | | | | | | | | | The baseline code statically builds the span maps when the domain is formed. Previous attempts at dynamically updating the maps caused a suspend-to-ram regression, which should now be fixed. Signed-off-by: Gregory Haskins <ghaskins@novell.com> CC: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: update RCU Documentation.Paul E. McKenney2008-01-253-19/+221
| | | | | | | | | | This patch updates the RCU documentation to reflect preemptible RCU as well as recent publications. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: CPU Hotplug handlingPaul E. McKenney2008-01-251-5/+142
| | | | | | | | | | | This patch allows preemptible RCU to tolerate CPU-hotplug operations. It accomplishes this by maintaining a local copy of a map of online CPUs, which it accesses under its own lock. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: implementationPaul E. McKenney2008-01-2513-7/+1394
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements a new version of RCU which allows its read-side critical sections to be preempted. It uses a set of counter pairs to keep track of the read-side critical sections and flips them when all tasks exit read-side critical section. The details of this implementation can be found in this paper - http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf and the article- http://lwn.net/Articles/253651/ This patch was developed as a part of the -rt kernel development and meant to provide better latencies when read-side critical sections of RCU don't disable preemption. As a consequence of keeping track of RCU readers, the readers have a slight overhead (optimizations in the paper). This implementation co-exists with the "classic" RCU implementations and can be switched to at compiler. Also includes RCU tracing summarized in debugfs. [ akpm@linux-foundation.org: build fixes on non-preempt architectures ] Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: fix rcu_barrier for preemptive environment.Paul E. McKenney2008-01-252-1/+11
| | | | | | | | | | | | Fix rcu_barrier() to work properly in preemptive kernel environment. Also, the ordering of callback must be preserved while moving callbacks to another CPU during CPU hotplug. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: reorganize RCU code into rcuclassic.c and rcupdate.cPaul E. McKenney2008-01-255-670/+812
| | | | | | | | | | | | | This patch re-organizes the RCU code to enable multiple implementations of RCU. Users of RCU continues to include rcupdate.h and the RCU interfaces remain the same. This is in preparation for subsequently merging the preemptible RCU implementation. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: Use softirq instead of tasklets forDipankar Sarma2008-01-252-8/+18
| | | | | | | | | | | | | | | | This patch makes RCU use softirq instead of tasklets. It also adds a memory barrier after raising the softirq inorder to ensure that the cpu sees the most recently updated value of rcu->cur while processing callbacks. The discussion of the related theoretical race pointed out by James Huang can be found here --> http://lkml.org/lkml/2007/11/20/603 Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove some old cpuset logicGregory Haskins2008-01-251-33/+0
| | | | | | | | | We had support for overlapping cpuset based rto logic in early prototypes that is no longer used, so remove it. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: RT-balance, only adjust overload state when changingGregory Haskins2008-01-251-3/+5
| | | | | | | | | | | | The overload set/clears were originally idempotent when this logic was first implemented. But that is no longer true due to the addition of the atomic counter and this logic was never updated to work properly with that change. So only adjust the overload state if it is actually changing to avoid getting out of sync. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: RT-balance, add new methods to sched_classSteven Rostedt2008-01-255-22/+186
| | | | | | | | | | | | | | | | | Dmitry Adamushko found that the current implementation of the RT balancing code left out changes to the sched_setscheduler and rt_mutex_setprio. This patch addresses this issue by adding methods to the schedule classes to handle being switched out of (switched_from) and being switched into (switched_to) a sched_class. Also a method for changing of priorities is also added (prio_changed). This patch also removes some duplicate logic between rt_mutex_setprio and sched_setscheduler. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: RT-balance, replace hooks with pre/post schedule and wakeup methodsSteven Rostedt2008-01-253-14/+26
| | | | | | | | | | | To make the main sched.c code more agnostic to the schedule classes. Instead of having specific hooks in the schedule code for the RT class balancing. They are replaced with a pre_schedule, post_schedule and task_wake_up methods. These methods may be used by any of the classes but currently, only the sched_rt class implements them. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove do_div() from __sched_slice()Peter Zijlstra2008-01-251-1/+1
| | | | | | | | | | | Yanmin Zhang noticed a nice optimization: p = l * nr / nl, nl = l/g -> p = g * nr which eliminates a do_div() from __sched_period(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: get rid of 'new_cpu' in try_to_wake_up()Dmitry Adamushko2008-01-251-8/+3
| | | | | | | | | | Clean-up try_to_wake_up(). Get rid of the 'new_cpu' variable in try_to_wake_up() [ that's, one #ifdef section less ]. Also remove a few redundant blank lines. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: no need for 'affine wakeup' balancingDmitry Adamushko2008-01-251-0/+3
| | | | | | | | | | | No need to do a check for 'affine wakeup and passive balancing possibilities' in select_task_rq_fair() when task_cpu(p) == this_cpu. I guess, this part got missed upon introduction of per-sched_class select_task_rq() in try_to_wake_up(). Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: whitespace cleanups in topology.hIngo Molnar2008-01-251-1/+1
| | | | | | whitespace cleanups in topology.h. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: reactivate fork balancingIngo Molnar2008-01-251-0/+3
| | | | | | reactivate fork balancing. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: add credits for RT balancing improvementsIngo Molnar2008-01-251-0/+2
| | | | | | add credits for RT balancing improvements. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: style cleanup, #2Ingo Molnar2008-01-251-15/+17
| | | | | | | | | | | | style cleanup of various changes that were done recently. no code changed: text data bss dec hex filename 26399 2578 48 29025 7161 sched.o.before 26399 2578 48 29025 7161 sched.o.after Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove unused JIFFIES_TO_NS() macroIngo Molnar2008-01-251-2/+1
| | | | | | remove unused JIFFIES_TO_NS() macro. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix sched_rt.c:join/leave_domainIngo Molnar2008-01-251-17/+16
| | | | | | | fix build bug in sched_rt.c:join/leave_domain and make them only be included on SMP builds. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: only balance our RT tasks within our domainGregory Haskins2008-01-252-26/+38
| | | | | | | | | | | | | | | | | | | | | | We move the rt-overload data as the first global to per-domain reclassification. This limits the scope of overload related cache-line bouncing to stay with a specified partition instead of affecting all cpus in the system. Finally, we limit the scope of find_lowest_cpu searches to the domain instead of the entire system. Note that we would always respect domain boundaries even without this patch, but we first would scan potentially all cpus before whittling the list down. Now we can avoid looking at RQs that are out of scope, again reducing cache-line hits. Note: In some cases, task->cpus_allowed will effectively reduce our search to within our domain. However, I believe there are cases where the cpus_allowed mask may be all ones and therefore we err on the side of caution. If it can be optimized later, so be it. Signed-off-by: Gregory Haskins <ghaskins@novell.com> CC: Christoph Lameter <clameter@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: add sched-domain rootsGregory Haskins2008-01-252-3/+121
| | | | | | | | | | | | | | | | | | | | | | | | | | We add the notion of a root-domain which will be used later to rescope global variables to per-domain variables. Each exclusive cpuset essentially defines an island domain by fully partitioning the member cpus from any other cpuset. However, we currently still maintain some policy/state as global variables which transcend all cpusets. Consider, for instance, rt-overload state. Whenever a new exclusive cpuset is created, we also create a new root-domain object and move each cpu member to the root-domain's span. By default the system creates a single root-domain with all cpus as members (mimicking the global state we have today). We add some plumbing for storing class specific data in our root-domain. Whenever a RQ is switching root-domains (because of repartitioning) we give each sched_class the opportunity to remove any state from its old domain and add state to the new one. This logic doesn't have any clients yet but it will later in the series. Signed-off-by: Gregory Haskins <ghaskins@novell.com> CC: Christoph Lameter <clameter@sgi.com> CC: Paul Jackson <pj@sgi.com> CC: Simon Derr <simon.derr@bull.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up schedule_balance_rt()Ingo Molnar2008-01-251-4/+2
| | | | | | clean up schedule_balance_rt(). Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up pull_rt_task()Ingo Molnar2008-01-251-12/+10
| | | | | | clean up pull_rt_task(). Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove leftover debuggingIngo Molnar2008-01-251-8/+0
| | | | | | remove leftover debugging. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: remove rt_overload()Ingo Molnar2008-01-251-9/+1
| | | | | | remove rt_overload() - it's an unnecessary indirection. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up kernel/sched_rt.cIngo Molnar2008-01-251-0/+9
| | | | | | clean up whitespace damage and missing comments in kernel/sched_rt.c. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up overlong line in kernel/sched_debug.cIngo Molnar2008-01-251-2/+4
| | | | | | clean up overlong line in kernel/sched_debug.c. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up find_lock_lowest_rq()Ingo Molnar2008-01-251-4/+5
| | | | | | clean up find_lock_lowest_rq(). Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: clean up pick_next_highest_task_rt()Ingo Molnar2008-01-251-3/+3
| | | | | | clean up pick_next_highest_task_rt(). Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: RT-balance on new taskSteven Rostedt2008-01-251-0/+1
| | | | | | rt-balance when creating new tasks. Signed-off-by: Ingo Molnar <mingo@elte.hu>