summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* nohz: Rename CONFIG_NO_HZ to CONFIG_NO_HZ_COMMONFrederic Weisbecker2013-04-0315-46/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are planning to convert the dynticks Kconfig options layout into a choice menu. The user must be able to easily pick any of the following implementations: constant periodic tick, idle dynticks, full dynticks. As this implies a mutual exclusion, the two dynticks implementions need to converge on the selection of a common Kconfig option in order to ease the sharing of a common infrastructure. It would thus seem pretty natural to reuse CONFIG_NO_HZ to that end. It already implements all the idle dynticks code and the full dynticks depends on all that code for now. So ideally the choice menu would propose CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED then both would select CONFIG_NO_HZ. On the other hand we want to stay backward compatible: if CONFIG_NO_HZ is set in an older config file, we want to enable CONFIG_NO_HZ_IDLE by default. But we can't afford both at the same time or we run into a circular dependency: 1) CONFIG_NO_HZ_IDLE and CONFIG_NO_HZ_EXTENDED both select CONFIG_NO_HZ 2) If CONFIG_NO_HZ is set, we default to CONFIG_NO_HZ_IDLE We might be able to support that from Kconfig/Kbuild but it may not be wise to introduce such a confusing behaviour. So to solve this, create a new CONFIG_NO_HZ_COMMON option which gathers the common code between idle and full dynticks (that common code for now is simply the idle dynticks code) and select it from their referring Kconfig. Then we'll later create CONFIG_NO_HZ_IDLE and map CONFIG_NO_HZ to it for backward compatibility. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
* nohz: Unhide full dynticks feature from its dependenciesFrederic Weisbecker2013-04-021-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The full dynticks feature only shows up when all its Kconfig dependencies are met (RCU nocbs, RCU user mode, ...) This is far from being user friendly as those who want to activate this feature need to look into the Kconfig files and iterate through each dependency then activate these by hand in order to show and select the full dynticks Kconfig option. So process the other way around: show up the Kconfig option if the minimal low level dependencies are met and activate the high level ones when we enable the feature. Note there is one exception in the picture: CONFIG_VIRT_CPU_ACCOUNTING_GEN is part of a Kconfig choice menu and it appears we can't select it from another Kconfig selection when it's under such layout. So for now this particular item stays as a passive dependency. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
* nohz: Wake up full dynticks CPUs when a timer gets enqueuedFrederic Weisbecker2013-03-213-9/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wake up a CPU when a timer list timer is enqueued there and the target is part of the full dynticks range. Sending an IPI to it makes it reconsidering the next timer to program on top of recent updates. This may later be improved by checking if the tick is really stopped on the target. This would need some careful synchronization though. So deal with such optimization later and start simple. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
* nohz: Assign timekeeping duty to a CPU outside the full dynticks rangeFrederic Weisbecker2013-03-213-4/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This way the full nohz CPUs can safely run with the tick stopped with a guarantee that somebody else is taking care of the jiffies and GTOD progression. Once the duty is attributed to a CPU, it won't change. Also that CPU can't enter into dyntick idle mode or be hot unplugged. This may later be improved from a power consumption POV. At least we should be able to share the duty amongst all CPUs outside the full dynticks range. Then the duty could even be shared with full dynticks CPUs when those can't stop their tick for any reason. But let's start with that very simple approach first. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> [fix have_nohz_full_mask offcase] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* nohz: Basic full dynticks interfaceFrederic Weisbecker2013-03-214-0/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For extreme usecases such as Real Time or HPC, having the ability to shutdown the tick when a single task runs on a CPU is a desired feature: * Reducing the amount of interrupts improves throughput for CPU-bound tasks. The CPU is less distracted from its real job, from an execution time and from the cache point of views. * This also improve latency response as we have less critical sections. Start with introducing a very simple interface to define full dynticks CPU: use a boot time option defined cpumask through the "nohz_extended=" kernel parameter. CPUs that are part of this range will have their tick shutdown whenever possible: provided they run a single task and they don't do kernel activity that require the periodic tick. These details will be later documented in Documentation/* An online CPU must be kept outside this range to handle the timekeeping. Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Gilad Ben Yossef <gilad@benyossef.com> Cc: Hakan Akkan <hakanakkan@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kevin Hilman <khilman@linaro.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
* sched/tracing: Allow tracing the preemption decision on wakeupPeter Zijlstra2013-03-181-1/+1
| | | | | | | | | | | | | | | | Thomas noted that we do the wakeup preemption check after the wakeup trace point, this means the tracepoint cannot test/report this decision; which is rather important for latency sensitive workloads. Therefore move the tracepoint after doing the preemption check. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Paul Turner <pjt@google.com> Cc: Mike Galbraith <efault@gmx.de> Link: http://lkml.kernel.org/r/1363254519.26965.9.camel@laptop Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge branch 'sched/core' of ↵Ingo Molnar2013-03-183-19/+65
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core Pull CPU runtime stats/accounting fixes from Frederic Weisbecker: " Some users are complaining that their threadgroup's runtime accounting freezes after a week or so of intense cpu-bound workload. This set tries to fix the issue by reducing the risk of multiplication overflow in the cputime scaling code. " Stanislaw Gruszka further explained the historic context and impact of the bug: " Commit 0cf55e1ec08bb5a22e068309e2d8ba1180ab4239 start to use scalling for whole thread group, so increase chances of hitting multiplication overflow, depending on how many CPUs are on the system. We have multiplication utime * rtime for one thread since commit b27f03d4bdc145a09fb7b0c0e004b29f1ee555fa. Overflow will happen after: rtime * utime > 0xffffffffffffffff jiffies if thread utilize 100% of CPU time, that gives: rtime > sqrt(0xffffffffffffffff) jiffies ritme > sqrt(0xffffffffffffffff) / (24 * 60 * 60 * HZ) days For HZ 100 it will be 497 days for HZ 1000 it will be 49 days. Bug affect only users, who run CPU intensive application for that long period. Also they have to be interested on utime,stime values, as bug has no other visible effect as making those values incorrect. " Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * sched: Lower chances of cputime scaling overflowFrederic Weisbecker2013-03-131-12/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some users have reported that after running a process with hundreds of threads on intensive CPU-bound loads, the cputime of the group started to freeze after a few days. This is due to how we scale the tick-based cputime against the scheduler precise execution time value. We add the values of all threads in the group and we multiply that against the sum of the scheduler exec runtime of the whole group. This easily overflows after a few days/weeks of execution. A proposed solution to solve this was to compute that multiplication on stime instead of utime: 62188451f0d63add7ad0cd2a1ae269d600c1663d ("cputime: Avoid multiplication overflow on utime scaling") The rationale behind that was that it's easy for a thread to spend most of its time in userspace under intensive CPU-bound workload but it's much harder to do CPU-bound intensive long run in the kernel. This postulate got defeated when a user recently reported he was still seeing cputime freezes after the above patch. The workload that triggers this issue relates to intensive networking workloads where most of the cputime is consumed in the kernel. To reduce much more the opportunities for multiplication overflow, lets reduce the multiplication factors to the remainders of the division between sched exec runtime and cputime. Assuming the difference between these shouldn't ever be that large, it could work on many situations. This gets the same results as in the upstream scaling code except for a small difference: the upstream code always rounds the results to the nearest integer not greater to what would be the precise result. The new code rounds to the nearest integer either greater or not greater. In practice this difference probably shouldn't matter but it's worth mentioning. If this solution appears not to be enough in the end, we'll need to partly revert back to the behaviour prior to commit 0cf55e1ec08bb5a22e068309e2d8ba1180ab4239 ("sched, cputime: Introduce thread_group_times()") Back then, the scaling was done on exit() time before adding the cputime of an exiting thread to the signal struct. And then we'll need to scale one-by-one the live threads cputime in thread_group_cputime(). The drawback may be a slightly slower code on exit time. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org>
| * math64: New div64_u64_rem helperFrederic Weisbecker2013-03-132-7/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide an extended version of div64_u64() that also returns the remainder of the division. We are going to need this to refine the cputime scaling code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org>
* | sched: Fix variable name misnomer, add commentsAndrei Epure2013-03-141-4/+5
|/ | | | | | | | | | The min_vruntime variable actually stores the maximum value. The added comment was taken from place_entity function. Signed-off-by: Andrei Epure <epure.andrei@gmail.com> Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1363115544-1964-1-git-send-email-epure.andrei@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched: Spelling fixAndrei Epure2013-03-111-1/+1
| | | | | | | | Signed-off-by: Andrei Epure <epure.andrei@gmail.com> Cc: trivial@kernel.org Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1362996200-2674-1-git-send-email-epure.andrei@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched: Fix update_group_power() prototype placement to fix build warning ↵Li Zefan2013-03-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | when !CONFIG_SMP All warnings: In file included from kernel/sched/core.c:85:0: kernel/sched/sched.h:1036:39: warning: 'struct sched_domain' declared inside parameter list kernel/sched/sched.h:1036:39: warning: its scope is only this definition or declaration, which is probably not what you want It's because struct sched_domain is defined inside #if CONFIG_SMP, while update_group_power() is declared unconditionally. Fix this warning by declaring update_group_power() only if CONFIG_SMP=n. Build tested with CONFIG_SMP enabled and then disabled. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5137F4BA.2060101@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge branch 'sched/cputime' of ↵Ingo Molnar2013-03-0810-133/+163
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core Pull cputime changes from Frederic Weisbecker: * Generalize exception handling * Fix race in context tracking state restore on return from exception and irq exit kernel preemption * Fix cputime scaling in full dynticks accounting dynamic off-case * Fix default Kconfig value Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * context_tracking: Enable probes by default for selftestingFrederic Weisbecker2013-03-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until we provide the nohz_mask boot parameter, keeping the context tracking probes disabled by default is pointless since what we want is to runtime test this code anyway. It's furthermore confusing for the users which don't expect the probes to be off when they select RCU user mode or full dynticks cputime accounting. Let's enable these probes selftests by default for now. Suggested: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Mats Liljegren <mats.liljegren@enea.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| * cputime: Dynamically scale cputime for full dynticks accountingFrederic Weisbecker2013-03-073-77/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The full dynticks cputime accounting is able to account either using the tick or the context tracking subsystem. This way the housekeeping CPU can keep the low overhead tick based solution. This latter mode has a low jiffies resolution granularity and need to be scaled against CFS precise runtime accounting to improve its result. We are doing this for CONFIG_TICK_CPU_ACCOUNTING, now we also need to expand it to full dynticks accounting dynamic off-case as well. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Mats Liljegren <mats.liljegren@enea.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| * context_tracking: Restore preempted context state after preempt_schedule_irq()Frederic Weisbecker2013-03-071-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From the context tracking POV, preempt_schedule_irq() behaves pretty much like an exception: It can be called anytime and schedule another task. But currently it doesn't restore the context tracking state of the preempted code on preempt_schedule_irq() return. As a result, if preempt_schedule_irq() is called in the tiny frame between user_enter() and the actual return to userspace, we resume userspace with the wrong context tracking state. Fix this by using exception_enter/exit() which are a perfect fit for this kind of issue. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Mats Liljegren <mats.liljegren@enea.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| * context_tracking: Restore correct previous context state on exception exitFrederic Weisbecker2013-03-074-35/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On exception exit, we restore the previous context tracking state based on the regs of the interrupted frame. Iff that frame is in user mode as stated by user_mode() helper, we restore the context tracking user mode. However there is a tiny chunck of low level arch code after we pass through user_enter() and until the CPU eventually resumes userspace. If an exception happens in this tiny area, exception_enter() correctly exits the context tracking user mode but exception_exit() won't restore it because of the value returned by user_mode(regs). As a result we may return to userspace with the wrong context tracking state. To fix this, change exception_enter() to return the context tracking state prior to its call and pass this saved state to exception_exit(). This restores the real context tracking state of the interrupted frame. (May be this patch was suggested to me, I don't recall exactly. If so, sorry for the missing credit). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Mats Liljegren <mats.liljegren@enea.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| * context_tracking: Move exception handling to generic codeFrederic Weisbecker2013-03-075-26/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Exceptions handling on context tracking should share common treatment: on entry we exit user mode if the exception triggered in that context. Then on exception exit we return to that previous context. Generalize this to avoid duplication across archs. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Mats Liljegren <mats.liljegren@enea.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | sched: Remove double declaration of root_task_groupLi Zefan2013-03-062-5/+4
| | | | | | | | | | | | | | | | | | It's already declared in include/linux/sched.h Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A7D8.7000107@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched: Move group scheduling functions out of include/linux/sched.hLi Zefan2013-03-063-26/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Make sched_group_{set_,}runtime(), sched_group_{set_,}period() and sched_rt_can_attach() static. - Move sched_{create,destroy,online,offline}_group() to kernel/sched/sched.h. - Remove declaration of sched_group_shares(). Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A7C5.3000708@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched: Make default_scale_freq_power() staticLi Zefan2013-03-062-6/+3
| | | | | | | | | | | | | | | | | | | | | | As default_scale_{freq,smt}_power() and update_rt_power() are used in kernel/sched/fair.c only, annotate them as static functions. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A7AF.8010900@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched: Move struct sched_class to kernel/sched/sched.hLi Zefan2013-03-062-59/+55
| | | | | | | | | | | | | | | | | | It's used internally only. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A79F.8090502@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched: Move wake flags to kernel/sched/sched.hLi Zefan2013-03-062-7/+7
| | | | | | | | | | | | | | | | | | They are used internally only. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A78E.7040609@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched: Move struct sched_group to kernel/sched/sched.hLi Zefan2013-03-062-56/+58
| | | | | | | | | | | | | | | | | | | | | | Move struct sched_group_power and sched_group and related inline functions to kernel/sched/sched.h, as they are used internally only. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A77F.2010705@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched: Move SCHED_LOAD_SHIFT macros to kernel/sched/sched.hLi Zefan2013-03-062-26/+25
| | | | | | | | | | | | | | | | | | They are used internally only. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A771.4070104@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched: Remove test_sd_parent()Li Zefan2013-03-061-9/+0
| | | | | | | | | | | | | | | | | | It's unused. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A75F.4070202@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | sched: Remove some dummy functionsLi Zefan2013-03-061-12/+0
|/ | | | | | | | | No one will call those functions if CONFIG_SCHED_DEBUG=n. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/5135A748.3050206@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Linux 3.9-rc1v3.9-rc1Linus Torvalds2013-03-041-2/+2
|
* Merge tag 'disintegrate-fbdev-20121220' of ↵Linus Torvalds2013-03-038-254/+284
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.infradead.org/users/dhowells/linux-headers Pull fbdev UAPI disintegration from David Howells: "You'll be glad to here that the end is nigh for the UAPI patches. Only the fbdev/framebuffer piece remains now that the SCSI stuff has gone in. Here are the UAPI disintegration bits for the fbdev drivers. It appears that Florian hasn't had time to deal with my patch, but back in December he did say he didn't mind if I pushed it forward." Yay. No more uapi movement. And hopefully no more big header file cleanups coming up either, it just tends to be very painful. * tag 'disintegrate-fbdev-20121220' of git://git.infradead.org/users/dhowells/linux-headers: UAPI: (Scripted) Disintegrate include/video
| * UAPI: (Scripted) Disintegrate include/videoDavid Howells2012-12-208-254/+284
| | | | | | | | | | | | | | | | | | Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Dave Jones <davej@redhat.com>
* | Merge tag 'stable/for-linus-3.9-rc1-tag' of ↵Linus Torvalds2013-03-035-49/+57
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen bug-fixes from Konrad Rzeszutek Wilk: - Update the Xen ACPI memory and CPU hotplug locking mechanism. - Fix PAT issues wherein various applications would not start - Fix handling of multiple MSI as AHCI now does it. - Fix ARM compile failures. * tag 'stable/for-linus-3.9-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xenbus: fix compile failure on ARM with Xen enabled xen/pci: We don't do multiple MSI's. xen/pat: Disable PAT using pat_enabled value. xen/acpi: xen cpu hotplug minor updates xen/acpi: xen memory hotplug minor updates
| * | xenbus: fix compile failure on ARM with Xen enabledSteven Noonan2013-03-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding an include of linux/mm.h resolves this: drivers/xen/xenbus/xenbus_client.c: In function ‘xenbus_map_ring_valloc_hvm’: drivers/xen/xenbus/xenbus_client.c:532:66: error: implicit declaration of function ‘page_to_section’ [-Werror=implicit-function-declaration] CC: stable@vger.kernel.org Signed-off-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen/pci: We don't do multiple MSI's.Konrad Rzeszutek Wilk2013-03-011-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no hypercall to setup multiple MSI per PCI device. As such with these two new commits: - 08261d87f7d1b6253ab3223756625a5c74532293 PCI/MSI: Enable multiple MSIs with pci_enable_msi_block_auto() - 5ca72c4f7c412c2002363218901eba5516c476b1 AHCI: Support multiple MSIs we would call the PHYSDEVOP_map_pirq 'nvec' times with the same contents of the PCI device. Sander discovered that we would get the same PIRQ value 'nvec' times and return said values to the caller. That of course meant that the device was configured only with one MSI and AHCI would fail with: ahci 0000:00:11.0: version 3.0 xen: registering gsi 19 triggering 0 polarity 1 xen: --> pirq=19 -> irq=19 (gsi=19) (XEN) [2013-02-27 19:43:07] IOAPIC[0]: Set PCI routing entry (6-19 -> 0x99 -> IRQ 19 Mode:1 Active:1) ahci 0000:00:11.0: AHCI 0001.0200 32 slots 4 ports 6 Gbps 0xf impl SATA mode ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part ahci: probe of 0000:00:11.0 failed with error -22 That is b/c in ahci_host_activate the second call to devm_request_threaded_irq would return -EINVAL as we passed in (on the second run) an IRQ that was never initialized. CC: stable@vger.kernel.org Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen/pat: Disable PAT using pat_enabled value.Konrad Rzeszutek Wilk2013-02-281-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The git commit 8eaffa67b43e99ae581622c5133e20b0f48bcef1 (xen/pat: Disable PAT support for now) explains in details why we want to disable PAT for right now. However that change was not enough and we should have also disabled the pat_enabled value. Otherwise we end up with: mmap-example:3481 map pfn expected mapping type write-back for [mem 0x00010000-0x00010fff], got uncached-minus ------------[ cut here ]------------ WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0() mem 0x00010000-0x00010fff], got uncached-minus ------------[ cut here ]------------ WARNING: at /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774 untrack_pfn+0xb8/0xd0() ... Pid: 3481, comm: mmap-example Tainted: GF 3.8.0-6-generic #13-Ubuntu Call Trace: [<ffffffff8105879f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff810587fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff8104bcc8>] untrack_pfn+0xb8/0xd0 [<ffffffff81156c1c>] unmap_single_vma+0xac/0x100 [<ffffffff81157459>] unmap_vmas+0x49/0x90 [<ffffffff8115f808>] exit_mmap+0x98/0x170 [<ffffffff810559a4>] mmput+0x64/0x100 [<ffffffff810560f5>] dup_mm+0x445/0x660 [<ffffffff81056d9f>] copy_process.part.22+0xa5f/0x1510 [<ffffffff81057931>] do_fork+0x91/0x350 [<ffffffff81057c76>] sys_clone+0x16/0x20 [<ffffffff816ccbf9>] stub_clone+0x69/0x90 [<ffffffff816cc89d>] ? system_call_fastpath+0x1a/0x1f ---[ end trace 4918cdd0a4c9fea4 ]--- (a similar message shows up if you end up launching 'mcelog') The call chain is (as analyzed by Liu, Jinsong): do_fork --> copy_process --> dup_mm --> dup_mmap --> copy_page_range --> track_pfn_copy --> reserve_pfn_range --> line 624: flags != want_flags It comes from different memory types of page table (_PAGE_CACHE_WB) and MTRR (_PAGE_CACHE_UC_MINUS). Stefan Bader dug in this deep and found out that: "That makes it clearer as this will do reserve_memtype(...) --> pat_x_mtrr_type --> mtrr_type_lookup --> __mtrr_type_lookup And that can return -1/0xff in case of MTRR not being enabled/initialized. Which is not the case (given there are no messages for it in dmesg). This is not equal to MTRR_TYPE_WRBACK and thus becomes _PAGE_CACHE_UC_MINUS. It looks like the problem starts early in reserve_memtype: if (!pat_enabled) { /* This is identical to page table setting without PAT */ if (new_type) { if (req_type == _PAGE_CACHE_WC) *new_type = _PAGE_CACHE_UC_MINUS; else *new_type = req_type & _PAGE_CACHE_MASK; } return 0; } This would be what we want, that is clearing the PWT and PCD flags from the supported flags - if pat_enabled is disabled." This patch does that - disabling PAT. CC: stable@vger.kernel.org # 3.3 and further Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Reported-and-Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reported-and-Tested-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen/acpi: xen cpu hotplug minor updatesLiu Jinsong2013-02-251-22/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently at native Rafael did some cleanup for acpi, say, drop acpi_bus_add, remove unnecessary argument of acpi_bus_scan, and run acpi_bus_scan under acpi_scan_lock. This patch does similar cleanup for xen cpu hotplug, removing redundant logic, and adding lock. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | xen/acpi: xen memory hotplug minor updatesLiu Jinsong2013-02-251-26/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dan Carpenter found current xen memory hotplug logic has potential issue: at func acpi_memory_get_device() *mem_device = acpi_driver_data(device); while the device may be NULL and then dereference. At native side, Rafael recently updated acpi_memory_get_device(), dropping acpi_bus_add, adding lock, and avoiding above issue. This patch updates xen memory hotplug logic accordingly, removing redundant logic, adding lock, and avoiding dereference. Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2013-03-0326-171/+108
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more VFS bits from Al Viro: "Unfortunately, it looks like xattr series will have to wait until the next cycle ;-/ This pile contains 9p cleanups and fixes (races in v9fs_fid_add() etc), fixup for nommu breakage in shmem.c, several cleanups and a bit more file_inode() work" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: constify path_get/path_put and fs_struct.c stuff fix nommu breakage in shmem.c cache the value of file_inode() in struct file 9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry 9p: make sure ->lookup() adds fid to the right dentry 9p: untangle ->lookup() a bit 9p: double iput() in ->lookup() if d_materialise_unique() fails 9p: v9fs_fid_add() can't fail now v9fs: get rid of v9fs_dentry 9p: turn fid->dlist into hlist 9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine more file_inode() open-coded instances selinux: opened file can't have NULL or negative ->f_path.dentry (In the meantime, the hlist traversal macros have changed, so this required a semantic conflict fixup for the newly hlistified fid->dlist)
| * | | constify path_get/path_put and fs_struct.c stuffAl Viro2013-03-025-10/+10
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | fix nommu breakage in shmem.cAl Viro2013-03-021-3/+2
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | cache the value of file_inode() in struct fileAl Viro2013-03-023-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that this thing does *not* contribute to inode refcount; it's pinned down by dentry. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | 9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentryAl Viro2013-02-281-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ... otherwise the path we'd built isn't worth much. Don't accept such fids obtained from paths unless dentry is still alived by the end of the work. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | 9p: make sure ->lookup() adds fid to the right dentryAl Viro2013-02-281-2/+5
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | 9p: untangle ->lookup() a bitAl Viro2013-02-281-18/+9
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | 9p: double iput() in ->lookup() if d_materialise_unique() failsAl Viro2013-02-281-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | d_materialise_unique() does iput() itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | 9p: v9fs_fid_add() can't fail nowAl Viro2013-02-284-22/+11
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | v9fs: get rid of v9fs_dentryAl Viro2013-02-283-51/+9
| | | | | | | | | | | | | | | | | | | | | | | | ->d_fsdata can act as hlist_head... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | 9p: turn fid->dlist into hlistAl Viro2013-02-284-11/+9
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | 9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do ↵Al Viro2013-02-282-7/+6
| | | | | | | | | | | | | | | | | | | | | | | | just fine Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | more file_inode() open-coded instancesAl Viro2013-02-2710-45/+38
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | selinux: opened file can't have NULL or negative ->f_path.dentryAl Viro2013-02-271-9/+0
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>