perf: Drop sample rate when sampling is too slow

This patch keeps track of how long perf's NMI handler is taking, and also calculates how many samples perf can take per second. If the sample length times the expected max number of samples exceeds a configurable threshold, it drops the sample rate. This way, we don't have a runaway sampling process eating up the CPU.

This patch can tend to drop the sample rate down to a level where perf doesn't work very well. *BUT* the alternative is that my system hangs because it spends all of its time handling NMIs.

I'll take a busted performance tool over an entire system that's busted and undebuggable any day.

BTW, my suspicion is that there's still an underlying bug here. Using the HPET instead of the TSC is definitely a contributing factor, but I suspect there are some other things going on. But I can't go dig down on a bug like that with my machine hanging all the time.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
[ Prettified it a bit. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
author: Dave Hansen <dave.hansen@linux.intel.com> 2013-06-21 17:51:36 +0200
committer: Ingo Molnar <mingo@kernel.org> 2013-06-23 11:52:57 +0200
commit: 14c63f17b1fde5a575a28e96547a22b451c71fb5 (patch)
tree: 781f7327f4341a3d27197e88994b1859e9b51722 /Documentation
parent: x86: Warn when NMI handlers take large amounts of time (diff)
download: linux-14c63f17b1fde5a575a28e96547a22b451c71fb5.tar.xz
linux-14c63f17b1fde5a575a28e96547a22b451c71fb5.zip
1 files changed, 26 insertions, 0 deletions
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index bcff3f9de550..ab7d16efa96b 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -427,6 +427,32 @@ This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled.
 
 ==============================================================
 
+perf_cpu_time_max_percent:
+
+Hints to the kernel how much CPU time it should be allowed to
+use to handle perf sampling events.  If the perf subsystem
+is informed that its samples are exceeding this limit, it
+will drop its sampling frequency to attempt to reduce its CPU
+usage.
+
+Some perf sampling happens in NMIs.  If these samples
+unexpectedly take too long to execute, the NMIs can become
+stacked up next to each other so much that nothing else is
+allowed to execute.
+
+0: disable the mechanism.  Do not monitor or correct perf's
+   sampling rate no matter how much CPU time it takes.
+
+1-100: attempt to throttle perf's sample rate to this
+   percentage of CPU.  Note: the kernel calculates an
+   "expected" length of each sample event.  100 here means
+   100% of that expected length.  Even if this is set to
+   100, you may still see sample throttling if this
+   length is exceeded.  Set to 0 if you truly do not care
+   how much CPU is consumed.
+
+==============================================================
+
 
 pid_max:
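The check described in the commit message (sample length times expected max samples per second, compared against a configurable CPU budget) can be sketched roughly as below. This is a minimal userspace illustration, not the code from this patch: the struct, the function names, the halving step, and the `min_rate_hz` floor are all assumptions made for the example. Only the comparison itself and the meaning of 0 (mechanism disabled) come from the text above.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical state for the sketch; these are not the kernel's names. */
struct sample_throttle {
	uint64_t sample_rate_hz;           /* expected max samples per second */
	unsigned int cpu_time_max_percent; /* 0 disables the mechanism */
	uint64_t min_rate_hz;              /* assumed floor for the rate */
};

/* CPU time (ns per second) that sampling is allowed to consume. */
static uint64_t sampling_budget_ns(const struct sample_throttle *t)
{
	return NSEC_PER_SEC * t->cpu_time_max_percent / 100;
}

/*
 * Compare (average sample length * max samples per second) against the
 * budget.  Returns 1 if the sample rate was lowered, 0 otherwise.
 * Halving the rate is an assumption of this sketch.
 */
static int throttle_check(struct sample_throttle *t, uint64_t avg_sample_ns)
{
	uint64_t used_ns;

	if (t->cpu_time_max_percent == 0)
		return 0;	/* mechanism disabled: never monitor or correct */

	used_ns = avg_sample_ns * t->sample_rate_hz;
	if (used_ns <= sampling_budget_ns(t))
		return 0;	/* within budget */

	if (t->sample_rate_hz <= t->min_rate_hz)
		return 0;	/* already at the assumed floor */

	t->sample_rate_hz /= 2;
	if (t->sample_rate_hz < t->min_rate_hz)
		t->sample_rate_hz = t->min_rate_hz;
	return 1;
}
```

With a rate of 100000 samples per second and a 25% budget, an average sample length of 1000 ns uses 10% of a CPU and leaves the rate alone. At 5000 ns per sample the product is 50% of a CPU, so the sketch halves the rate to 50000.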