summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2013-04-3035-385/+1178
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core timer updates from Ingo Molnar: "The main changes in this cycle's merge are: - Implement shadow timekeeper to shorten in kernel reader side blocking, by Thomas Gleixner. - Posix timers enhancements by Pavel Emelyanov: - allocate timer ID per process, so that exact timer ID allocations can be re-created be checkpoint/restore code. - debuggability and tooling (/proc/PID/timers, etc.) improvements. - suspend/resume enhancements by Feng Tang: on certain new Intel Atom processors (Penwell and Cloverview), there is a feature that the TSC won't stop in S3 state, so the TSC value won't be reset to 0 after resume. This can be taken advantage of by the generic via the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to recover/approximate sleep time, the main (and precise) clocksource can be used. - Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many CPUs the file goes beyond 4MB of size and thus the current simplistic seqfile approach fails. Convert /proc/timer_list to a proper seq_file with its own iterator. - Cleanups and refactorings of the core timekeeping code by John Stultz. - International Atomic Clock time is managed by the NTP code internally currently but not exposed externally. Separate the TAI code out and add CLOCK_TAI support and TAI support to the hrtimer and posix-timer code, by John Stultz. - Add deep idle support enhacement to the broadcast clockevents core timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ clockevents feature (which will be utilized by future clockevents driver updates), which allows the use of IRQ affinities to avoid spurious wakeups of idle CPUs - the right CPU with an expiring timer will be woken. - Add new ARM bcm281xx clocksource driver, by Christian Daudt - ... various other fixes and cleanups" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) clockevents: Set dummy handler on CPU_DEAD shutdown timekeeping: Update tk->cycle_last in resume posix-timers: Remove unused variable clockevents: Switch into oneshot mode even if broadcast registered late timer_list: Convert timer list to be a proper seq_file timer_list: Split timer_list_show_tickdevices posix-timers: Show sigevent info in proc file posix-timers: Introduce /proc/PID/timers file posix timers: Allocate timer id per process (v2) timekeeping: Make sure to notify hrtimers when TAI offset changes hrtimer: Fix ktime_add_ns() overflow on 32bit architectures hrtimer: Add expiry time overflow check in hrtimer_interrupt timekeeping: Shorten seq_count region timekeeping: Implement a shadow timekeeper timekeeping: Delay update of clock->cycle_last timekeeping: Store cycle_last value in timekeeper struct as well ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state timekeeping: Simplify tai updating from do_adjtimex timekeeping: Hold timekeepering locks in do_adjtimex and hardpps timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex() ...
| * clockevents: Set dummy handler on CPU_DEAD shutdownThomas Gleixner2013-04-252-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vitaliy reported that a per cpu HPET timer interrupt crashes the system during hibernation. What happens is that the per cpu HPET timer gets shut down when the nonboot cpus are stopped. When the nonboot cpus are onlined again the HPET code sets up the MSI interrupt which fires before the clock event device is registered. The event handler is still set to hrtimer_interrupt, which then crashes the machine due to highres mode not being active. See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=700333 There is no real good way to avoid that in the HPET code. The HPET code alrady has a mechanism to detect spurious interrupts when event handler == NULL for a similar reason. We can handle that in the clockevent/tick layer and replace the previous functional handler with a dummy handler like we do in tick_setup_new_device(). The original clockevents code did this in clockevents_exchange_device(), but that got removed by commit 7c1e76897 (clockevents: prevent clockevent event_handler ending up handler_noop) which forgot to fix it up in tick_shutdown(). Same issue with the broadcast device. Reported-by: Vitaliy Fillipov <vitalif@yourcmc.ru> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: stable@vger.kernel.org Cc: 700333@bugs.debian.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * Merge branch 'linus' into timers/coreThomas Gleixner2013-04-241262-7800/+13864
| |\ | | | | | | | | | | | | | | | Reason: Get upstream fixes before adding conflicting code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timekeeping: Update tk->cycle_last in resumeThomas Gleixner2013-04-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 7ec98e15aa (timekeeping: Delay update of clock->cycle_last) forgot to update tk->cycle_last in the resume path. This results in a stale value versus clock->cycle_last and prevents resume in the worst case. Reported-by: Jiri Slaby <jslaby@suse.cz> Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Linux-pm mailing list <linux-pm@lists.linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304211648150.21884@ionos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | posix-timers: Remove unused variableThomas Gleixner2013-04-181-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | Remove the unused variable *node introduced by commit 5ed67f05 (posix timers: Allocate timer id per process) Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Pavel Emelyanov <xemul@parallels.com>
| * | clockevents: Switch into oneshot mode even if broadcast registered lateStephen Boyd2013-04-171-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tick_oneshot_notify() is used to notify a particular CPU to try to switch into oneshot mode after a oneshot capable tick device is registered and tick_clock_notify() is used to notify all CPUs to try to switch into oneshot mode after a high res clocksource is registered. There is one caveat; if the tick devices suffer from FEAT_C3_STOP we don't try to switch into oneshot mode unless we have a oneshot capable broadcast device already registered. If the broadcast device is registered after the tick devices that have FEAT_C3_STOP we'll never try to switch into oneshot mode again, causing us to be stuck in periodic mode forever. Avoid this scenario by calling tick_clock_notify() after we register the broadcast device so that we try to switch into oneshot mode on all CPUs one more time. [ tglx: Adopted to timers/core and added a comment ] Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/1366219566-29783-1-git-send-email-sboyd@codeaurora.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timer_list: Convert timer list to be a proper seq_fileNathan Zimmer2013-04-171-12/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running with 4096 cores attemping to read /proc/timer_list will fail with an ENOMEM condition. On a sufficantly large systems the total amount of data is more then 4mb, so it won't fit into a single buffer. The failure can also occur on smaller systems when memory fragmentation is high as reported by Dave Jones. Convert /proc/timer_list to a proper seq_file with its own iterator. This is a little more complex given that we have to make two passes with two separate headers. sysrq_timer_list_show also needed to be updated to reflect the fact that now timer_list_show only does one cpu at at time. Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Reported-by: Dave Jones <davej@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/1364345790-14577-3-git-send-email-nzimmer@sgi.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timer_list: Split timer_list_show_tickdevicesNathan Zimmer2013-04-171-12/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split timer_list_show_tickdevices() into the header printout and pull the rest up to timer_list_show. This is a preparatory patch for converting timer_list to a proper seqfile with its own iterator Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Reported-by: Dave Jones <davej@redhat.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Link: http://lkml.kernel.org/r/1364345790-14577-2-git-send-email-nzimmer@sgi.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | posix-timers: Show sigevent info in proc filePavel Emelyanov2013-04-171-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previous patch added proc file to list posix timers created by task. Expand the information provided in this file by adding info about notification method, with which timers were created. I.e. after the "ID:" line there go 1. "signal:" line, that shows signal number and sigval bits; 2. "notify:" line, that shows the timer notification method. Thus the timer entry would looke like this: ID: 123 signal: 14/0000000000b005d0 notify: signal/pid.732 This information is enough to understand how timer_create() was called for each particular timer. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Matthew Helsley <matt.helsley@gmail.com> Link: http://lkml.kernel.org/r/513DA024.80404@parallels.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | posix-timers: Introduce /proc/PID/timers filePavel Emelyanov2013-04-171-0/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently kernel doesn't provide any API for getting info about what posix timers are configured by processes. It's implied, that a process which configured some timers, knows what it did. However, for external tools it's impossible to get this information. In particular, this is critical for checkpoint-restore project to have this info. Introduce a per-pid proc file with information about posix timers. Since these timers are shared between threads, this file is present on tgid level only, no such thing in tid subdirs. The file format is expected to be the "/proc/<pid>/smaps"-like, i.e. each timer will occupy seveal lines to allow for future extending. Each new timer entry starts with the ID: <number> line which is added by this patch. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Matthew Helsley <matt.helsley@gmail.com> Link: http://lkml.kernel.org/r/513DA00D.6070009@parallels.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | posix timers: Allocate timer id per process (v2)Pavel Emelyanov2013-04-173-38/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently kernel generates IDs for posix timers in a global manner -- there's a kernel-wide IDR tree from which IDs are created. This makes it impossible to recreate a timer with a desired ID (in particular this is done by the CRIU checkpoint-restore project) -- since these IDs are global it may happen, that at the time we recreate a timer, the ID we want for it is already busy by some other timer. In order to address this, replace the IDR tree with a global hash table for timers and makes timer IDs unique per signal_struct (to which timers are linked anyway). With this, two timers belonging to different processes may have equal IDs and we can recreate either of them with the ID we want. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Matthew Helsley <matt.helsley@gmail.com> Link: http://lkml.kernel.org/r/513D9FF5.9010004@parallels.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | timekeeping: Make sure to notify hrtimers when TAI offset changesJohn Stultz2013-04-111-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have CLOCK_TAI timers, make sure we notify hrtimer code when TAI offset is changed. Signed-off-by: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1365622909-953-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | hrtimer: Fix ktime_add_ns() overflow on 32bit architecturesDavid Engraf2013-04-081-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One can trigger an overflow when using ktime_add_ns() on a 32bit architecture not supporting CONFIG_KTIME_SCALAR. When passing a very high value for u64 nsec, e.g. 7881299347898368000 the do_div() function converts this value to seconds (7881299347) which is still to high to pass to the ktime_set() function as long. The result in is a negative value. The problem on my system occurs in the tick-sched.c, tick_nohz_stop_sched_tick() when time_delta is set to timekeeping_max_deferment(). The check for time_delta < KTIME_MAX is valid, thus ktime_add_ns() is called with a too large value resulting in a negative expire value. This leads to an endless loop in the ticker code: time_delta: 7881299347898368000 expires = ktime_add_ns(last_update, time_delta) expires: negative value This fix caps the value to KTIME_MAX. This error doesn't occurs on 64bit or architectures supporting CONFIG_KTIME_SCALAR (e.g. ARM, x86-32). Cc: stable@vger.kernel.org Signed-off-by: David Engraf <david.engraf@sysgo.com> [jstultz: Minor tweaks to commit message & header] Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | hrtimer: Add expiry time overflow check in hrtimer_interruptPrarit Bhargava2013-04-081-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The settimeofday01 test in the LTP testsuite effectively does gettimeofday(current time); settimeofday(Jan 1, 1970 + 100 seconds); settimeofday(current time); This test causes a stack trace to be displayed on the console during the setting of timeofday to Jan 1, 1970 + 100 seconds: [ 131.066751] ------------[ cut here ]------------ [ 131.096448] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x135/0x140() [ 131.104935] Hardware name: Dinar [ 131.108150] Modules linked in: sg nfsv3 nfs_acl nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables kvm_amd kvm sp5100_tco bnx2 i2c_piix4 crc32c_intel k10temp fam15h_power ghash_clmulni_intel amd64_edac_mod pcspkr serio_raw edac_mce_amd edac_core microcode xfs libcrc32c sr_mod sd_mod cdrom ata_generic crc_t10dif pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm ahci pata_atiixp libahci libata usb_storage i2c_core dm_mirror dm_region_hash dm_log dm_mod [ 131.176784] Pid: 0, comm: swapper/28 Not tainted 3.8.0+ #6 [ 131.182248] Call Trace: [ 131.184684] <IRQ> [<ffffffff810612af>] warn_slowpath_common+0x7f/0xc0 [ 131.191312] [<ffffffff8106130a>] warn_slowpath_null+0x1a/0x20 [ 131.197131] [<ffffffff810b9fd5>] clockevents_program_event+0x135/0x140 [ 131.203721] [<ffffffff810bb584>] tick_program_event+0x24/0x30 [ 131.209534] [<ffffffff81089ab1>] hrtimer_interrupt+0x131/0x230 [ 131.215437] [<ffffffff814b9600>] ? cpufreq_p4_target+0x130/0x130 [ 131.221509] [<ffffffff81619119>] smp_apic_timer_interrupt+0x69/0x99 [ 131.227839] [<ffffffff8161805d>] apic_timer_interrupt+0x6d/0x80 [ 131.233816] <EOI> [<ffffffff81099745>] ? sched_clock_cpu+0xc5/0x120 [ 131.240267] [<ffffffff814b9ff0>] ? cpuidle_wrap_enter+0x50/0xa0 [ 131.246252] [<ffffffff814b9fe9>] ? cpuidle_wrap_enter+0x49/0xa0 [ 131.252238] [<ffffffff814ba050>] cpuidle_enter_tk+0x10/0x20 [ 131.257877] [<ffffffff814b9c89>] cpuidle_idle_call+0xa9/0x260 [ 131.263692] [<ffffffff8101c42f>] cpu_idle+0xaf/0x120 [ 131.268727] [<ffffffff815f8971>] start_secondary+0x255/0x257 [ 131.274449] ---[ end trace 1151a50552231615 ]--- When we change the system time to a low value like this, the value of timekeeper->offs_real will be a negative value. It seems that the WARN occurs because an hrtimer has been started in the time between the releasing of the timekeeper lock and the IPI call (via a call to on_each_cpu) in clock_was_set() in the do_settimeofday() code. The end result is that a REALTIME_CLOCK timer has been added with softexpires = expires = KTIME_MAX. The hrtimer_interrupt() fires/is called and the loop at kernel/hrtimer.c:1289 is executed. In this loop the code subtracts the clock base's offset (which was set to timekeeper->offs_real in do_settimeofday()) from the current hrtimer_cpu_base->expiry value (which was KTIME_MAX): KTIME_MAX - (a negative value) = overflow A simple check for an overflow can resolve this problem. Using KTIME_MAX instead of the overflow value will result in the hrtimer function being run, and the reprogramming of the timer after that. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> [jstultz: Tweaked commit subject] Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | timekeeping: Shorten seq_count regionThomas Gleixner2013-04-041-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shorten the seqcount write hold region to the actual update of the timekeeper and the related data (e.g vsyscall). On a contemporary x86 system this reduces the maximum latencies on Preempt-RT from 8us to 4us on the non-timekeeping cores. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | timekeeping: Implement a shadow timekeeperThomas Gleixner2013-04-041-12/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the shadow timekeeper to do the update_wall_time() adjustments and then copy it over to the real timekeeper. Keep the shadow timekeeper in sync when updating stuff outside of update_wall_time(). This allows us to limit the timekeeper_seq hold time to the update of the real timekeeper and the vsyscall data in the next patch. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | timekeeping: Delay update of clock->cycle_lastThomas Gleixner2013-04-041-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | For calculating the new timekeeper values store the new cycle_last value in the timekeeper and update the clock->cycle_last just when we actually update the new values. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | timekeeping: Store cycle_last value in timekeeper struct as wellThomas Gleixner2013-04-042-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For implementing a shadow timekeeper and a split calculation/update region we need to store the cycle_last value in the timekeeper and update the value in the clocksource struct only in the update region. Add the extra storage to the timekeeper. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | ntp: Remove ntp_lock, using the timekeeping locks to protect ntp stateJohn Stultz2013-04-041-37/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to properly handle the NTP state in future changes to the timekeeping lock management, this patch moves the management of all of the ntp state under the timekeeping locks. This allows us to remove the ntp_lock. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | timekeeping: Simplify tai updating from do_adjtimexJohn Stultz2013-04-041-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we are taking the timekeeping locks, just go ahead and update any tai change directly, rather then dropping the lock and calling a function that will just take it again. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | timekeeping: Hold timekeepering locks in do_adjtimex and hardppsJohn Stultz2013-04-041-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In moving the NTP state to be protected by the timekeeping locks, be sure to acquire the timekeeping locks prior to calling ntp functions. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex()John Stultz2013-04-042-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since ADJ_SETOFFSET adjusts the timekeeping state, process it as part of the top level do_adjtimex() function in timekeeping.c. This avoids deadlocks that could occur once we change the ntp locking rules. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | ntp: Rework do_adjtimex to take timespec and tai argumentsJohn Stultz2013-04-043-16/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to change the locking rules, we need to provide the timespec and tai values rather then having the ntp logic acquire these values itself. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | ntp: Move timex validation to timekeeping do_adjtimex call.John Stultz2013-04-043-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move logic that does not need the ntp state to be done in the timekeeping do_adjtimex() call. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | ntp: Move do_adjtimex() and hardpps() functions to timekeeping.cJohn Stultz2013-04-044-12/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for changing the ntp locking rules, move do_adjtimex and hardpps accessor functions to timekeeping.c, but keep the code logic in ntp.c. This patch also introduces a ntp_internal.h file so timekeeping specific interfaces of ntp.c can be more limitedly shared with timekeeping.c. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | ntp: Split out timex validation from do_adjtimexJohn Stultz2013-04-041-12/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split out the timex validation done in do_adjtimex into a separate function. This will help simplify logic in following patches. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | Merge branch 'fortglx/3.10/time' of ↵Thomas Gleixner2013-04-03130-968/+1676
| |\ \ | | | | | | | | | | | | git://git.linaro.org/people/jstultz/linux into timers/core
| | * | ARM: bcm281xx: Add timer driver (driver portion)Christian Daudt2013-03-284-5/+215
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for the Broadcom timer, used in the following SoCs: BCM11130, BCM11140, BCM11351, BCM28145, BCM28155 Updates from V6: - Split DT portion into a separate patch Updates from V5: - Rebase to latest arm-soc/for-next Updates from V4: - Switch code to use CLOCKSOURCE_OF_DECLARE Updates from V3: - Migrate to 3.9 timer framework updates Updates from V2: - prepend static fns + fields with kona_ Updates from V1: - Rename bcm_timer.c to bcm_kona_timer.c - Pull .h into bcm_kona_timer.c - Make timers static - Clean up comment block - Switched to using clockevents_config_and_register - Added an error to the get_timer loop if it repeats too much - Added to Documentation/devicetree/bindings/arm/bcm/bcm,kona-timer.txt - Added missing readl to timer_disable_and_clear Note: bcm,kona-timer was kept as the 'compatible' field to make it specific enough for when there are multiple bcm timers (bcm,timer is too generic). Signed-off-by: Christian Daudt <csd@broadcom.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: John Stultz <john.stultz@linaro.org> Reviewed-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | timekeeping: __timekeeping_set_tai_offset can be staticFengguang Wu2013-03-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Yet again, the kbuild test robot saves the day, noting I left out defining __timekeeping_set_tai_offset as static. It even sent me this patch. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | timekeeping: Split timekeeper_lock into lock and seqcountThomas Gleixner2013-03-231-59/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to shorten the seqcount write hold time. So split the seqlock into a lock and a seqcount. Open code the seqwrite_lock in the places which matter and drop the sequence counter update where it's pointless. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [jstultz: Merge fixups from CLOCK_TAI collisions] Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | timekeeping: Move lock out of timekeeper structThomas Gleixner2013-03-232-57/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the lock a separate entity. Preparatory patch for shadow timekeeper structure. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [Merged with CLOCK_TAI changes] Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | timekeeping: Make jiffies_lock internalThomas Gleixner2013-03-233-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Nothing outside of the timekeeping core needs that lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | timekeeping: Calc stuff onceThomas Gleixner2013-03-231-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calculate the cycle interval shifted value once. No functional change, just makes the code more readable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | hrtimer: Add hrtimer support for CLOCK_TAIJohn Stultz2013-03-235-3/+44
| | | | | | | | | | | | | | | | | | | | | | | | Add hrtimer support for CLOCK_TAI, as well as posix timer interfaces. Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | timekeeping: Add CLOCK_TAI clockidJohn Stultz2013-03-234-4/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This add a CLOCK_TAI clockid and the needed accessors. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | timekeeping: Move TAI managment into timekeeping core from ntpJohn Stultz2013-03-234-8/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently NTP manages the TAI offset. Since there's plans for a CLOCK_TAI clockid, push the TAI management into the timekeeping core. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | timekeeping: utilize the suspend-nonstop clocksource to count suspended timeFeng Tang2013-03-161-7/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are some new processors whose TSC clocksource won't stop during suspend. Currently, after system resumes, kernel will use persistent clock or RTC to compensate the sleep time, but with these nonstop clocksources, we could skip the special compensation from external sources, and just use current clocksource for time recounting. This can solve some time drift bugs caused by some not-so-accurate or error-prone RTC devices. The current way to count suspended time is first try to use the persistent clock, and then try the RTC if persistent clock can't be used. This patch will change the trying order to: suspend-nonstop clocksource -> persistent clock -> RTC When counting the sleep time with nonstop clocksource, use an accurate way suggested by Jason Gunthorpe to cover very large delta cycles. Signed-off-by: Feng Tang <feng.tang@intel.com> [jstultz: Small optimization, avoiding re-reading the clocksource] Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | x86: tsc: Add support for new S3_NONSTOP featureFeng Tang2013-03-161-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for new S3_NONSTOP feature Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | clocksource: Add new feature flag CLOCK_SOURCE_SUSPEND_NONSTOPFeng Tang2013-03-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some x86 processors have a TSC clocksource, which continues to run even when system is suspended. Also most OMAP platforms have a 32 KHz timer which has similar capability. Add a feature flag so that it could be utilized. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | x86: Add cpu capability flag X86_FEATURE_NONSTOP_TSC_S3Feng Tang2013-03-162-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some new Intel Atom processors (Penwell and Cloverview), there is a feature that the TSC won't stop in S3 state, say the TSC value won't be reset to 0 after resume. This feature makes TSC a more reliable clocksource and could benefit the timekeeping code during system suspend/resume cycle, so add a flag for it. Signed-off-by: Feng Tang <feng.tang@intel.com> [jstultz: Fix checkpatch warning] Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | x86: Do full rtc synchronization with ntpPrarit Bhargava2013-03-164-83/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Every 11 minutes ntp attempts to update the x86 rtc with the current system time. Currently, the x86 code only updates the rtc if the system time is within +/-15 minutes of the current value of the rtc. This was done originally to avoid setting the RTC if the RTC was in localtime mode (common with Windows dualbooting). Other architectures do a full synchronization and now that we have better infrastructure to detect when the RTC is in localtime, there is no reason that x86 should be software limited to a 30 minute window. This patch changes the behavior of the kernel to do a full synchronization (year, month, day, hour, minute, and second) of the rtc when ntp requests a synchronization between the system time and the rtc. I've used the RTC library functions in this patchset as they do all the required bounds checking. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: x86@kernel.org Cc: Matt Fleming <matt.fleming@intel.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: linux-efi@vger.kernel.org Signed-off-by: Prarit Bhargava <prarit@redhat.com> [jstultz: Tweak commit message, fold in build fix found by fengguang Also add select RTC_LIB to X86, per new dependency, as found by prarit] Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | timekeeping: Use inject_offset in warp_clockJohn Stultz2013-03-161-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When warping the clock (from a local time RTC), use timekeeping_inject_offset() to atomically add the offset. This avoids any minor time error caused by the delay between reading the time, and then setting the adjusted time. Signed-off-by: John Stultz <john.stultz@linaro.org>
| | * | timekeeping: Avoid adjust kernel time once hwclock kept in UTC timeDong Zhu2013-03-161-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the Hardware Clock kept in local time,kernel will adjust the time to be UTC time.But if Hardware Clock kept in UTC time,system will make a dummy settimeofday call first (sys_tz.tz_minuteswest = 0) to make sure the time is not shifted,so at this point I think maybe it is not necessary to set the kernel time once the sys_tz.tz_minuteswest is zero. Signed-off-by: Dong Zhu <bluezhudong@gmail.com> [jstultz: Updated to merge with conflicting changes ] Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | | tick: Change log level of NOHZ: local_softirq_pending messageRado Vrbovsky2013-03-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "NOHZ: local_softirq_pending" message is a largely informational message. This makes extra work for customers that have a policy of investigating all kernel log messages logged at <= KERN_ERR log level. This patch sets the message to a different log level. [ tglx: Use pr_warn() ] Signed-off-by: Rado Vrbovsky <rvrbovsk@redhat.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/r/2037057938.893524.1360345050772.JavaMail.root@redhat.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | hrtimer/trivial: Fix comment text in hrtimer.cDavid Daney2013-03-251-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comments mention HRTIMER_ABS and HRTIMER_REL, these symbols don't exist, the proper names are HRTIMER_MODE_ABS and HRTIMER_MODE_REL. Signed-off-by: David Daney <david.daney@cavium.com> Cc: Jiri Kosina <trivial@kernel.org> Link: http://lkml.kernel.org/r/1363202438-21234-1-git-send-email-ddaney.cavm@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | clockevents: Add missing tick_check_broadcast_expired() for CLOCKEVENTS=nThomas Gleixner2013-03-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fengs build robot reports: arch/arm/kernel/process.c: In function 'cpu_idle': arch/arm/kernel/process.c:211:4: error: implicit declaration of function 'tick_check_broadcast_expired' [-Werror=implicit-function-declaration] Add the missing inline function for non clockevent builds Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | x86: Use tick broadcast expired checkThomas Gleixner2013-03-131-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid going back into deep idle if the tick broadcast IPI is about to fire. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Arjan van de Veen <arjan@infradead.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20130306111537.702278273@linutronix.de
| * | | arm: Use tick broadcast expired checkThomas Gleixner2013-03-131-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid going back into deep idle if the tick broadcast IPI is about to fire. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Jason Liu <liu.h.jason@gmail.com> Link: http://lkml.kernel.org/r/20130306111537.640722922@linutronix.de
| * | | tick: Provide a check for a forced broadcast pendingThomas Gleixner2013-03-132-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On the CPU which gets woken along with the target CPU of the broadcast the following happens: deep_idle() <-- spurious wakeup broadcast_exit() set forced bit enable interrupts <-- Nothing happens disable interrupts broadcast_enter() <-- Here we observe the forced bit is set deep_idle() Now after that the target CPU of the broadcast runs the broadcast handler and finds the other CPU in both the broadcast and the forced mask, sends the IPI and stuff gets back to normal. So it's not actually harmful, just more evidence for the theory, that hardware designers have access to very special drug supplies. Now there is no point in going back to deep idle just to wake up again right away via an IPI. Provide a check which allows the idle code to avoid the deep idle transition. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Jason Liu <liu.h.jason@gmail.com> Link: http://lkml.kernel.org/r/20130306111537.565418308@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | tick: Handle broadcast wakeup of multiple cpusThomas Gleixner2013-03-131-1/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some brilliant hardware implementations wake multiple cores when the broadcast timer fires. This leads to the following interesting problem: CPU0 CPU1 wakeup from idle wakeup from idle leave broadcast mode leave broadcast mode restart per cpu timer restart per cpu timer go back to idle handle broadcast (empty mask) enter broadcast mode programm broadcast device enter broadcast mode programm broadcast device So what happens is that due to the forced reprogramming of the cpu local timer, we need to set a event in the future. Now if we manage to go back to idle before the timer fires, we switch off the timer and arm the broadcast device with an already expired time (covered by forced mode). So in the worst case we repeat the above ping pong forever. Unfortunately we have no information about what caused the wakeup, but we can check current time against the expiry time of the local cpu. If the local event is already in the past, we know that the broadcast timer is about to fire and send an IPI. So we mark ourself as an IPI target even if we left broadcast mode and avoid the reprogramming of the local cpu timer. This still leaves the possibility that a CPU which is not handling the broadcast interrupt is going to reach idle again before the IPI arrives. This can't be solved in the core code and will be handled in follow up patches. Reported-by: Jason Liu <liu.h.jason@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: LAK <linux-arm-kernel@lists.infradead.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Arjan van de Veen <arjan@infradead.org> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Link: http://lkml.kernel.org/r/20130306111537.492045206@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>