summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* sched, cleanup, treewide: Remove set_current_state(TASK_RUNNING) after ↵Kirill Tkhai2014-09-1921-29/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | schedule() schedule(), io_schedule() and schedule_timeout() always return with TASK_RUNNING state set, so one more setting is unnecessary. (All places in patch are visible good, only exception is kiblnd_scheduler() from: drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c Its schedule() is one line above standard 3 lines of unified diff) No places where set_current_state() is used for mb(). Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1410529254.3569.23.camel@tkhai Cc: Alasdair Kergon <agk@redhat.com> Cc: Anil Belur <askb23@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: David Airlie <airlied@linux.ie> Cc: David Howells <dhowells@redhat.com> Cc: Dmitry Eremin <dmitry.eremin@intel.com> Cc: Frank Blaschka <blaschka@linux.vnet.ibm.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Isaac Huang <he.huang@intel.com> Cc: James E.J. Bottomley <JBottomley@parallels.com> Cc: James E.J. Bottomley <jejb@parisc-linux.org> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Liang Zhen <liang.zhen@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Masaru Nomura <massa.nomura@gmail.com> Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleg Drokin <green@linuxhacker.ru> Cc: Peng Tao <bergwolf@gmail.com> Cc: Richard Weinberger <richard@nod.at> Cc: Robert Love <robert.w.love@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Ursula Braun <ursula.braun@de.ibm.com> Cc: Zi Shen Lim <zlim.lnx@gmail.com> Cc: devel@driverdev.osuosl.org Cc: dm-devel@redhat.com Cc: dri-devel@lists.freedesktop.org Cc: fcoe-devel@open-fcoe.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux390@de.ibm.com Cc: linux-afs@lists.infradead.org Cc: linux-cris-kernel@axis.com Cc: linux-kernel@vger.kernel.org Cc: linux-nfs@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linux-raid@vger.kernel.org Cc: linux-s390@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: qla2xxx-upstream@qlogic.com Cc: user-mode-linux-devel@lists.sourceforge.net Cc: user-mode-linux-user@lists.sourceforge.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched, time: Fix lock inversion in thread_group_cputime()Rik van Riel2014-09-191-2/+3
| | | | | | | | | | | | | | | | | | | | | | The sig->stats_lock nests inside the tasklist_lock and the sighand->siglock in __exit_signal and wait_task_zombie. However, both of those locks can be taken from irq context, which means we need to use the interrupt safe variant of read_seqbegin_or_lock. This blocks interrupts when the "lock" branch is taken (seq is odd), preventing the lock inversion. On the first (lockless) pass through the loop, irqs are not blocked. Reported-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: prarit@redhat.com Cc: oleg@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1410527535-9814-3-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* seqlock: Add irqsave variant of read_seqbegin_or_lock()Rik van Riel2014-09-191-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | There are cases where read_seqbegin_or_lock() needs to block irqs, because the seqlock in question nests inside a lock that is also be taken from irq context. Add read_seqbegin_or_lock_irqsave() and done_seqretry_irqrestore(), which are almost identical to read_seqbegin_or_lock() and done_seqretry(). Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: prarit@redhat.com Cc: oleg@redhat.com Cc: sgruszka@redhat.com Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Link: http://lkml.kernel.org/r/1410527535-9814-2-git-send-email-riel@redhat.com [ Improved the readability of the code a bit. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* cpuidle: Use wake_up_all_idle_cpus() to wake up all idle cpusChuansheng Liu2014-09-191-7/+2
| | | | | | | | | | | | | | | | | | | | | | Currently kick_all_cpus_sync() or smp_call_function() can not break the polling idle cpu immediately. Instead using wake_up_all_idle_cpus() which can wake up the polling idle cpu quickly is much more helpful for power. Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-pm@vger.kernel.org Cc: changcheng.liu@intel.com Cc: xiaoming.wang@intel.com Cc: souvik.k.chakravarty@intel.com Cc: luto@amacapital.net Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Link: http://lkml.kernel.org/r/1409815075-4180-3-git-send-email-chuansheng.liu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* smp: Add new wake_up_all_idle_cpus() functionChuansheng Liu2014-09-192-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently kick_all_cpus_sync() can break non-polling idle cpus thru IPI interrupts. But sometimes we need to break the polling idle cpus immediately to reselect the suitable c-state, also for non-idle cpus, we need to do nothing if we try to wake up them. Here adding one new function wake_up_all_idle_cpus() to let all cpus out of idle based on function wake_up_if_idle(). Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: daniel.lezcano@linaro.org Cc: rjw@rjwysocki.net Cc: linux-pm@vger.kernel.org Cc: changcheng.liu@intel.com Cc: xiaoming.wang@intel.com Cc: souvik.k.chakravarty@intel.com Cc: luto@amacapital.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@fb.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1409815075-4180-2-git-send-email-chuansheng.liu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched: Add new API wake_up_if_idle() to wake up the idle cpuChuansheng Liu2014-09-192-0/+20
| | | | | | | | | | | | | | | | | | | Implementing one new API wake_up_if_idle(), which is used to wake up the idle CPU. Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: daniel.lezcano@linaro.org Cc: rjw@rjwysocki.net Cc: linux-pm@vger.kernel.org Cc: changcheng.liu@intel.com Cc: xiaoming.wang@intel.com Cc: souvik.k.chakravarty@intel.com Cc: chuansheng.liu@intel.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409815075-4180-1-git-send-email-chuansheng.liu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/numa: Use select_idle_sibling() to select a destination for ↵Rik van Riel2014-09-191-0/+8
| | | | | | | | | | | | | | | | | | | | | | task_numa_move() The code in task_numa_compare() will only examine at most one idle CPU per node, because they all have the same score. However, some idle CPUs are better candidates than others, due to busy or idle SMT siblings, etc... The scheduler has logic to find the best CPU within an LLC to place a task. The NUMA code should probably use it. This seems to reduce the standard deviation for single instance SPECjbb2005 with a low warehouse count on my 4 node test system. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: mgorman@suse.de Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140904163530.189d410a@cuia.bos.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Documentation/scheduler/sched-deadline.txt: Add minimal main() appendixJuri Lelli2014-09-161-0/+126
| | | | | | | | | | | | | | | | | Add an appendix providing a simple self-contained code snippet showing how SCHED_DEADLINE reservations can be created by application developers. Signed-off-by: Juri Lelli <juri.lelli@arm.com> Reviewed-by: Henrik Austad <henrik@austad.us> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1410256636-26171-6-git-send-email-juri.lelli@arm.com [ Fixed some whitespace inconsistencies. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Documentation/scheduler/sched-deadline.txt: Add tests suite appendixJuri Lelli2014-09-161-0/+52
| | | | | | | | | | | | | | | | Add an appendix briefly describing tools that can be used to test SCHED_DEADLINE (and the scheduler in general). Links to where source code of the tools is hosted are also provided. Signed-off-by: Juri Lelli <juri.lelli@arm.com> Reviewed-by: Henrik Austad <henrik@austad.us> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1410256636-26171-5-git-send-email-juri.lelli@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Documentation/scheduler/sched-deadline.txt: Improve and clarify AC bitsLuca Abeni2014-09-161-17/+82
| | | | | | | | | | | | | | | | | | | | Admission control is of key importance for SCHED_DEADLINE, since it guarantees system schedulability (or tells us something about the degree of guarantees we can provide to the user). This patch improves and clarifies bits and pieces regarding AC, both for UP and SMP systems. Signed-off-by: Luca Abeni <luca.abeni@unitn.it> Signed-off-by: Juri Lelli <juri.lelli@arm.com> Reviewed-by: Henrik Austad <henrik@austad.us> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1410256636-26171-4-git-send-email-juri.lelli@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Documentation/scheduler/sched-deadline.txt: Rewrite section 4 introJuri Lelli2014-09-161-26/+25
| | | | | | | | | | | | | | | | Section 4 intro was still describing the old interface. Rewrite it. Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Luca Abeni <luca.abeni@unitn.it> Reviewed-by: Henrik Austad <henrik@austad.us> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1410256636-26171-3-git-send-email-juri.lelli@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Documentation/scheduler/sched-deadline.txt: Fix terminology and improve clarityLuca Abeni2014-09-161-18/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | Several small changes regarding SCHED_DEADLINE documentation that fix terminology and improve clarity and readability: - "current runtime" becomes "remaining runtime" - readablity of an equation is improved by introducing more spacing - clarify when admission control will certainly fail - new URL for CBS technical report - substitue "smallest" with "earliest" Signed-off-by: Luca Abeni <luca.abeni@unitn.it> Signed-off-by: Juri Lelli <juri.lelli@arm.com> Reviewed-by: Henrik Austad <henrik@austad.us> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dario Faggioli <raistlin@linux.it> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1410256636-26171-2-git-send-email-juri.lelli@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched: Reduce contention in update_cfs_rq_blocked_load()Jason Low2014-09-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running workloads on 2+ socket systems, based on perf profiles, the update_cfs_rq_blocked_load() function often shows up as taking up a noticeable % of run time. Much of the contention is in __update_cfs_rq_tg_load_contrib() when we update the tg load contribution stats. However, it turns out that in many cases, they don't need to be updated and "tg_contrib" is 0. This patch adds a check in __update_cfs_rq_tg_load_contrib() to skip updating tg load contribution stats when nothing needs to be updated. This reduces the cacheline contention that would be unnecessary. Reviewed-by: Ben Segall <bsegall@google.com> Reviewed-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Jason Low <jason.low2@hp.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: jason.low2@hp.com Cc: Yuyang Du <yuyang.du@intel.com> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Chegu Vinod <chegu_vinod@hp.com> Cc: Scott J Norton <scott.norton@hp.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409643684.19197.15.camel@j-VirtualBox Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched: Migrate waking tasksLai Jiangshan2014-09-091-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | Current code can fail to migrate a waking task (silently) when TTWU_QUEUE is enabled. When a task is waking, it is pending on the wake_list of the rq, but it is not queued (task->on_rq == 0). In this case, set_cpus_allowed_ptr() and __migrate_task() will not migrate it because its invisible to them. This behavior is incorrect, because the task has been already woken, it will be running on the wrong CPU without correct placement until the next wake-up or update for cpus_allowed. To fix this problem, we need to finish the wakeup (so they appear on the runqueue) before we migrate them. Reported-by: Sasha Levin <sasha.levin@oracle.com> Reported-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Tested-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/538ED7EB.5050303@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched, time: Atomically increment stime & utimeRik van Riel2014-09-081-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The functions task_cputime_adjusted and thread_group_cputime_adjusted() can be called locklessly, as well as concurrently on many different CPUs. This can occasionally lead to the utime and stime reported by times(), and other syscalls like it, going backward. The cause for this appears to be multiple threads racing in cputime_adjust(), both with values for utime or stime that is larger than the original, but each with a different value. Sometimes the larger value gets saved first, only to be immediately overwritten with a smaller value by another thread. Using atomic exchange prevents that problem, and ensures time progresses monotonically. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: umgwanakikbuti@gmail.com Cc: fweisbec@gmail.com Cc: akpm@linux-foundation.org Cc: srao@redhat.com Cc: lwoodman@redhat.com Cc: atheurer@redhat.com Cc: oleg@redhat.com Link: http://lkml.kernel.org/r/1408133138-22048-4-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* time, signal: Protect resource use statistics with seqlockRik van Riel2014-09-086-29/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID) have scalability issues on large systems, due to both functions being serialized with a lock. The lock protects against reporting a wrong value, due to a thread in the task group exiting, its statistics reporting up to the signal struct, and that exited task's statistics being counted twice (or not at all). Protecting that with a lock results in times() and clock_gettime() being completely serialized on large systems. This can be fixed by using a seqlock around the events that gather and propagate statistics. As an additional benefit, the protection code can be moved into thread_group_cputime(), slightly simplifying the calling functions. In the case of posix_cpu_clock_get_task() things can be simplified a lot, because the calling function already ensures that the task sticks around, and the rest is now taken care of in thread_group_cputime(). This way the statistics reporting code can run lockless. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alex Thorlton <athorlton@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Daeseok Youn <daeseok.youn@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guillaume Morin <guillaume@morinfr.org> Cc: Ionut Alexa <ionut.m.alexa@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Michal Schmidt <mschmidt@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: umgwanakikbuti@gmail.com Cc: fweisbec@gmail.com Cc: srao@redhat.com Cc: lwoodman@redhat.com Cc: atheurer@redhat.com Link: http://lkml.kernel.org/r/20140816134010.26a9b572@annuminas.surriel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* exit: Always reap resource stats in __exit_signal()Rik van Riel2014-09-081-22/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oleg pointed out that wait_task_zombie adds a task's usage statistics to the parent's signal struct, but the task's own signal struct should also propagate the statistics at exit time. This allows thread_group_cputime(reaped_zombie) to get the statistics after __unhash_process() has made the task invisible to for_each_thread, but before the thread has actually been rcu freed, making sure no non-monotonic results are returned inside that window. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Guillaume Morin <guillaume@morinfr.org> Cc: Ionut Alexa <ionut.m.alexa@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Li Zefan <lizefan@huawei.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Michal Schmidt <mschmidt@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: umgwanakikbuti@gmail.com Cc: fweisbec@gmail.com Cc: srao@redhat.com Cc: lwoodman@redhat.com Cc: atheurer@redhat.com Link: http://lkml.kernel.org/r/1408133138-22048-2-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge tag 'v3.17-rc4' into sched/core, to prevent conflicts with upcoming ↵Ingo Molnar2014-09-089224-441113/+375888
|\ | | | | | | | | | | patches, and to refresh the tree Linux 3.17-rc4
| * Linux 3.17-rc4v3.17-rc4Linus Torvalds2014-09-081-1/+1
| |
| * Documentation: new page link in SubmittingPatchesSudip Mukherjee2014-09-081-0/+1
| | | | | | | | | | | | | | | | new link for - How to piss off a Linux kernel subsystem maintainer Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Documentation: NFS/RDMA: Document separate Kconfig symbolsPaul Bolle2014-09-081-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The NFS/RDMA Kconfig symbol was split into separate options for client and server in commit 2e8c12e1b765 ("xprtrdma: add separate Kconfig options for NFSoRDMA client and server support"). Update the documentation to reflect this split. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "J. Bruce Fields" <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Documentation: misc-devices: Rename freefall.c from hpfall.c in lis2lv02dMasanari Iida2014-09-081-1/+1
| | | | | | | | | | | | | | | | | | hpfall.c was renamed to freefall.c in 3.16, but this file still refer to hpfall.c instead of freefall.c Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Documentation: i2c: rename variable "register" to "reg"Jose Manuel Alarcon Roldan2014-09-081-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The example code provided with the i2c device interface documentation won't compile since it uses the reserved word "register" to name a variable. The compiler fails with this error message: error: expected identifier or '(' before '=' token __u8 register = 0x20; /* Device register to access */ ^ Rename the variable "register" to simply "reg" in the example code. Another couple of typos has been fixed as well. [Change "! =" to "!=".] Signed-off-by: Jose Alarcon Roldan <jose.alarcon.roldan@gmail.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Documentation: seq_file: Document seq_open_private(), seq_release_private()Rob Jones2014-09-081-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | Despite the fact that these functions have been around for years, they are little used (only 15 uses in 13 files at the preseht time) even though many other files use work-arounds to achieve the same result. By documenting them, hopefully they will become more widely used. Signed-off-by: Rob Jones <rob.jones@codethink.co.uk> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * Merge tag 'pm+acpi-3.17-rc4' of ↵Linus Torvalds2014-09-0712-34/+122
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management fixes from Rafael Wysocki: "These are regression fixes (ACPI sysfs, ACPI video, suspend test), ACPI cpuidle deadlock fix, missing runtime validation of ACPI _DSD output, a fix and a new CPU ID for the RAPL driver, new blacklist entry for the ACPI EC driver and a couple of trivial cleanups (intel_pstate and generic PM domains). Specifics: - Fix for recently broken test_suspend= command line argument (Rafael Wysocki). - Fixes for regressions related to the ACPI video driver caused by switching the default to native backlight handling in 3.16 from Hans de Goede. - Fix for a sysfs attribute of ACPI device objects that returns stale values sometimes due to the fact that they are cached instead of executing the appropriate method (_SUN) every time (broken in 3.14). From Yasuaki Ishimatsu. - Fix for a deadlock between cpuidle_lock and cpu_hotplug.lock in the ACPI processor driver from Jiri Kosina. - Runtime output validation for the ACPI _DSD device configuration object missing from the support for it that has been introduced recently. From Mika Westerberg. - Fix for an unuseful and misleading RAPL (Running Average Power Limit) domain detection message in the RAPL driver from Jacob Pan. - New Intel Haswell CPU ID for the RAPL driver from Jason Baron. - New Clevo W350etq blacklist entry for the ACPI EC driver from Lan Tianyu. - Cleanup for the intel_pstate driver and the core generic PM domains code from Gabriele Mazzotta and Geert Uytterhoeven" * tag 'pm+acpi-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock ACPI / scan: not cache _SUN value in struct acpi_device_pnp cpufreq: intel_pstate: Remove unneeded variable powercap / RAPL: change domain detection message powercap / RAPL: add support for CPU model 0x3f PM / domains: Make generic_pm_domain.name const PM / sleep: Fix test_suspend= command line option ACPI / EC: Add msi quirk for Clevo W350etq ACPI / video: Disable native_backlight on HP ENVY 15 Notebook PC ACPI / video: Add a disable_native_backlight quirk ACPI / video: Fix use_native_backlight selection logic ACPICA: ACPI 5.1: Add support for runtime validation of _DSD package.
| | *-----. Merge branches 'pm-sleep', 'powercap', 'pm-domains' and 'pm-cpufreq'Rafael J. Wysocki2014-09-056-22/+26
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * pm-sleep: PM / sleep: Fix test_suspend= command line option * powercap: powercap / RAPL: change domain detection message powercap / RAPL: add support for CPU model 0x3f * pm-domains: PM / domains: Make generic_pm_domain.name const * pm-cpufreq: cpufreq: intel_pstate: Remove unneeded variable
| | | | | | * cpufreq: intel_pstate: Remove unneeded variableGabriele Mazzotta2014-09-031-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It should have been removed with commit d1b6848590af ("cpufreq / intel_pstate: Optimize intel_pstate_set_policy") Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | | * | PM / domains: Make generic_pm_domain.name constGeert Uytterhoeven2014-09-031-1/+1
| | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | * | powercap / RAPL: change domain detection messageJacob Pan2014-09-031-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many CPUs do not support complete set of RAPL domains, as a result this detection failed message is very misleading and can be annoying. [ 5.082632] intel_rapl: RAPL domain core detection failed [ 5.088370] intel_rapl: RAPL domain uncore detection failed So lower the warning message to info and only print out the RAPL domains that are supported. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | | * | powercap / RAPL: add support for CPU model 0x3fJason Baron2014-09-031-0/+1
| | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've confirmed that monitoring the package power usage as well as setting power limits appear to be working as expected. Supports the package and dram domains. Tested aginst cpu: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz Signed-off-by: Jason Baron <jbaron@akamai.com> Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * / PM / sleep: Fix test_suspend= command line optionRafael J. Wysocki2014-09-033-13/+21
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit d431cbc53cb7 (PM / sleep: Simplify sleep states sysfs interface code) the pm_states[] array is not populated initially, which causes setup_test_suspend() to always fail and the suspend testing during boot doesn't work any more. Fix the problem by using pm_labels[] instead of pm_states[] in setup_test_suspend() and storing a pointer to the label of the sleep state to test rather than the number representing it, because the connection between the state numbers and labels is only established by suspend_set_ops(). Fixes: d431cbc53cb7 (PM / sleep: Simplify sleep states sysfs interface code) Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | |
| | | \
| | *-. \ Merge branches 'acpi-video' and 'acpi-ec'Rafael J. Wysocki2014-09-052-2/+47
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * acpi-video: ACPI / video: Disable native_backlight on HP ENVY 15 Notebook PC ACPI / video: Add a disable_native_backlight quirk ACPI / video: Fix use_native_backlight selection logic * acpi-ec: ACPI / EC: Add msi quirk for Clevo W350etq
| | | | * | ACPI / EC: Add msi quirk for Clevo W350etqLan Tianyu2014-09-021-0/+4
| | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clevo W350etq's EC will not produce GPE interrupt some time after booting. The ACPI notify event won't trigger when the issue takes place. After debugging, adding msi quirk for the machine can fix the issue. This patch is to add msi quirk for the machine. Link: https://bugzilla.kernel.org/show_bug.cgi?id=77431 Reported-and-tested-by: qbanin@gmail.com Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | ACPI / video: Disable native_backlight on HP ENVY 15 Notebook PCHans de Goede2014-09-021-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Link: https://bugs.freedesktop.org/show_bug.cgi?id=81515 Reported-and-tested-by: Hohahiu <rakothedin@gmail.com> Cc: 3.16+ <stable@vger.kernel.org> # 3.16+ Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | ACPI / video: Add a disable_native_backlight quirkHans de Goede2014-09-021-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some laptops have a working acpi_video backlight control, and using native backlight on these causes a regression where backlight control does not work when userspace is not handling brightness key events. Disable native_backlight on these to fix this. Link: https://bugzilla.kernel.org/show_bug.cgi?id=81691 Reported-and-tested-by: Andre Müller <andre.muller@web.de> Cc: 3.16+ <stable@vger.kernel.org> # 3.16+ Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * | ACPI / video: Fix use_native_backlight selection logicHans de Goede2014-09-021-2/+2
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 751109aad583 ("ACPI / video: Change the default for video.use_native_backlight to 1") has changed the default for use_native_backlight from 0 to 1, but instead of changing use_native_backlight_dmi to true, and leaving use_native_backlight_param at -1, it has changed use_native_backlight_param to 1. This causes acpi_video_use_native_backlight() to always think that a value was specified through the param, making it impossible to add a dmi based quirk to force 0 now that the default is 1. This fixes this by restoring the use_native_backlight_param default to -1, and instead setting the use_native_backlight_dmi default to true. Fixes: 751109aad583 (ACPI / video: Change the default for video.use_native_backlight to 1) Cc: 3.16+ <stable@vger.kernel.org> # 3.16+ Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | |
| | | \
| | *-. \ Merge branches 'acpica', 'acpi-processor' and 'acpi-scan'Rafael J. Wysocki2014-09-053-10/+10
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * acpica: ACPICA: ACPI 5.1: Add support for runtime validation of _DSD package. * acpi-processor: ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock * acpi-scan: ACPI / scan: not cache _SUN value in struct acpi_device_pnp
| | | | * | ACPI / scan: not cache _SUN value in struct acpi_device_pnpYasuaki Ishimatsu2014-09-032-8/+8
| | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The _SUN device indentification object is not guaranteed to return the same value every time it is executed, so we should not cache its return value, but rather execute it every time as needed. If it is cached, an incorrect stale value may be used in some situations. This issue was exposed by commit 202317a573b2 (ACPI / scan: Add acpi_device objects for all device nodes in the namespace). Fix it by avoiding to cache the return value of _SUN. Fixes: 202317a573b2 (ACPI / scan: Add acpi_device objects for all device nodes in the namespace) Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | * / ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lockJiri Kosina2014-09-031-2/+2
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a following AB-BA dependency between cpu_hotplug.lock and cpuidle_lock: 1) cpu_hotplug.lock -> cpuidle_lock enable_nonboot_cpus() _cpu_up() cpu_hotplug_begin() LOCK(cpu_hotplug.lock) cpu_notify() ... acpi_processor_hotplug() cpuidle_pause_and_lock() LOCK(cpuidle_lock) 2) cpuidle_lock -> cpu_hotplug.lock acpi_os_execute_deferred() workqueue ... acpi_processor_cst_has_changed() cpuidle_pause_and_lock() LOCK(cpuidle_lock) get_online_cpus() LOCK(cpu_hotplug.lock) Fix this by reversing the order acpi_processor_cst_has_changed() does thigs -- let it first execute the protection against CPU hotplug by calling get_online_cpus() and obtain the cpuidle lock only after that (and perform the symmentric change when allowing CPUs hotplug again and dropping cpuidle lock). Spotted by lockdep. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | * / ACPICA: ACPI 5.1: Add support for runtime validation of _DSD package.Mika Westerberg2014-09-021-0/+39
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds ACPICA kernel runtime support to validate contents/format of the _DSD package, similar to the iASL support. Ported by Mika Westerberg. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * | Merge branch 'for-linus' of ↵Linus Torvalds2014-09-074-14/+18
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull filesystem fixes from Al Viro: "Several bugfixes (all of them -stable fodder). Alexey's one deals with double mutex_lock() in UFS (apparently, nobody has tried to test "ufs: sb mutex merge + mutex_destroy" on something like file creation/removal on ufs). Mine deal with two kinds of umount bugs, in umount propagation and in handling of automounted submounts, both resulting in bogus transient EBUSY from umount" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs: fix deadlocks introduced by sb mutex merge fix EBUSY on umount() from MNT_SHRINKABLE get rid of propagate_umount() mistakenly treating slaves as busy.
| | * | ufs: fix deadlocks introduced by sb mutex mergeAlexey Khoroshilov2014-09-072-13/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0244756edc4b ("ufs: sb mutex merge + mutex_destroy") introduces deadlocks in ufs_new_inode() and ufs_free_inode(). Most callers of that functions acqure the mutex by themselves and ufs_{new,free}_inode() do that via lock_ufs(), i.e we have an unavoidable double lock. The patch proposes to resolve the issue by making sure that ufs_{new,free}_inode() are not called with the mutex held. Found by Linux Driver Verification project (linuxtesting.org). Cc: stable@vger.kernel.org # 3.16 Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | fix EBUSY on umount() from MNT_SHRINKABLEAl Viro2014-08-311-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need the parents of victims alive until namespace_unlock() gets to dput() of the (ex-)mountpoints. However, that screws up the "is it busy" checks in case when we have shrinkable mounts that need to be killed. Solution: go ahead and decrement refcounts of parents right in umount_tree(), increment them again just before dropping rwsem in namespace_unlock() (and let the loop in the end of namespace_unlock() finally drop those references for good, as we do now). Parents can't get freed until we drop rwsem - at least one reference is kept until then, both in case when parent is among the victims and when it is not. So they'll still be around when we get to namespace_unlock(). Cc: stable@vger.kernel.org # 3.12+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | * | get rid of propagate_umount() mistakenly treating slaves as busy.Al Viro2014-08-312-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The check in __propagate_umount() ("has somebody explicitly mounted something on that slave?") is done *before* taking the already doomed victims out of the child lists. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | Merge branch 'core-urgent-for-linus' of ↵Linus Torvalds2014-09-072-12/+12
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU fix from Ingo Molnar: "A boot hang fix for the offloaded callback RCU model (RCU_NOCB_CPU=y && (TREE_CPU=y || TREE_PREEMPT_RC)) in certain bootup scenarios" * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rcu: Make nocb leader kthreads process pending callbacks after spawning
| | * \ \ Merge branch 'rcu/urgent' of ↵Ingo Molnar2014-09-032-12/+12
| | |\ \ \ | | | |_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent Pull an RCU fix from Paul E. McKenney: "This series contains a single commit fixing an initialization bug reported by Amit Shah and fixed by Pranith Kumar (and tested by Amit). This bug results in a boot-time hang in callback-offloaded configurations where callbacks were posted before the offloading ('rcuo') kthreads were created." Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | * | rcu: Make nocb leader kthreads process pending callbacks after spawningPranith Kumar2014-08-282-12/+12
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nocb callbacks generated before the nocb kthreads are spawned are enqueued in the nocb queue for later processing. Commit fbce7497ee5af ("rcu: Parallelize and economize NOCB kthread wakeups") introduced nocb leader kthreads which checked the nocb_leader_wake flag to see if there were any such pending callbacks. A case was reported in which newly spawned leader kthreads were not processing the pending callbacks as this flag was not set, which led to a boot hang. The following commit ensures that the newly spawned nocb kthreads process the pending callbacks by allowing the kthreads to run immediately after spawning instead of waiting. This is done by inverting the logic of nocb_leader_wake tests to nocb_leader_sleep which allows us to use the default initialization of this flag to 0 to let the kthreads run. Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Link: http://www.spinics.net/lists/kernel/msg1802899.html [ paulmck: Backported to v3.17-rc2. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Amit Shah <amit.shah@redhat.com>
| * | | Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2014-09-074-11/+39
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Three fixlets from the timer departement: - Update the timekeeper before updating vsyscall and pvclock. This fixes the kvm-clock regression reported by Chris and Paolo. - Use the proper irq work interface from NMI. This fixes the regression reported by Catalin and Dave. - Clarify the compat_nanosleep error handling mechanism to avoid future confusion" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timekeeping: Update timekeeper before updating vsyscall and pvclock compat: nanosleep: Clarify error handling nohz: Restore NMI safe local irq work for local nohz kick
| | * | | timekeeping: Update timekeeper before updating vsyscall and pvclockThomas Gleixner2014-09-061-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The update_walltime() code works on the shadow timekeeper to make the seqcount protected region as short as possible. But that update to the shadow timekeeper does not update all timekeeper fields because it's sufficient to do that once before it becomes life. One of these fields is tkr.base_mono. That stays stale in the shadow timekeeper unless an operation happens which copies the real timekeeper to the shadow. The update function is called after the update calls to vsyscall and pvclock. While not correct, it did not cause any problems because none of the invoked update functions used base_mono. commit cbcf2dd3b3d4 (x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based) changed that in the kvm pvclock update function, so the stale mono_base value got used and caused kvm-clock to malfunction. Put the update where it belongs and fix the issue. Reported-by: Chris J Arges <chris.j.arges@canonical.com> Reported-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | | compat: nanosleep: Clarify error handlingThomas Gleixner2014-09-061-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The error handling in compat_sys_nanosleep() is correct, but completely non obvious. Document it and restrict it to the -ERESTART_RESTARTBLOCK return value for clarity. Reported-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>