tracing: Initialize iter->seq after zeroing in tracing_read_pipe()

A customer reported the following softlockup:

[899688.160002] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [test.sh:16464]
[899688.160002] CPU: 0 PID: 16464 Comm: test.sh Not tainted 4.12.14-6.23-azure #1 SLE12-SP4
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] Kernel panic - not syncing: softlockup: hung tasks
[899688.160002] RIP: 0010:up_write+0x1a/0x30
[899688.160002] RSP: 0018:ffffa86784d4fde8 EFLAGS: 00000257 ORIG_RAX: ffffffffffffff12
[899688.160002] RAX: ffffffff970fea00 RBX: 0000000000000001 RCX: 0000000000000000
[899688.160002] RDX: ffffffff00000001 RSI: 0000000000000080 RDI: ffffffff970fea00
[899688.160002] RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
[899688.160002] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b59014720d8
[899688.160002] R13: ffff8b59014720c0 R14: ffff8b5901471090 R15: ffff8b5901470000
[899688.160002]  tracing_read_pipe+0x336/0x3c0
[899688.160002]  __vfs_read+0x26/0x140
[899688.160002]  vfs_read+0x87/0x130
[899688.160002]  SyS_read+0x42/0x90
[899688.160002]  do_syscall_64+0x74/0x160

The watchdog caught the process in the middle of trace_access_unlock().
That function contains no loop, so the task must have been looping in
its caller, tracing_read_pipe(), via the "waitagain" label.

Crash dump analysis showed that iter->seq was completely zeroed at this
point, including iter->seq.seq.size. As a result, print_trace_line()
could never print anything, and there was no forward progress.

The culprit seems to be this code:

	/* reset all but tr, trace, and overruns */
	memset(&iter->seq, 0,
	       sizeof(struct trace_iterator) -
	       offsetof(struct trace_iterator, seq));

It was added by commit 53d0aa773053ab182877 ("ftrace: add logic to
record overruns") in v2.6.27-rc1. At that time, iter->seq looked like:

	struct trace_seq {
		unsigned char		buffer[PAGE_SIZE];
		unsigned int		len;
	};

There was no "size" field, so zeroing was perfectly fine.

The solution is to reinitialize the structure after zeroing it, or to
initialize it without zeroing.
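The failure mode above can be illustrated with a small userspace sketch. It is a simplified model, not kernel code. The names `model_seq_buf`, `model_trace_seq`, `model_iter`, `model_seq_init()`, `model_seq_puts()`, `model_reset_buggy()`, `model_reset_fixed()` and `MODEL_PAGE_SIZE` are all hypothetical. The sketch assumes only what the message states: the write capacity now lives in a `size` field inside `seq`, and the tail-of-struct memset clears it. `model_reset_buggy()` mirrors the original memset. `model_reset_fixed()` adds a re-initialization step in the spirit of `trace_seq_init()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64 /* small stand-in for PAGE_SIZE (assumption) */

/* Hypothetical, simplified model: the capacity sits next to the buffer
 * pointer, so zeroing the struct also zeroes the capacity. */
struct model_seq_buf {
	char *buffer;
	size_t size; /* capacity; 0 means nothing can ever be written */
	size_t len;  /* bytes used so far */
};

struct model_trace_seq {
	char buffer[MODEL_PAGE_SIZE];
	struct model_seq_buf seq;
};

/* Hypothetical stand-in for struct trace_iterator. */
struct model_iter {
	void *tr;                   /* fields before 'seq' survive the reset */
	long overruns;
	struct model_trace_seq seq; /* 'seq' and everything after it is zeroed */
	long pos;
};

static void model_seq_init(struct model_trace_seq *s)
{
	s->seq.buffer = s->buffer;
	s->seq.size = sizeof(s->buffer);
	s->seq.len = 0;
}

/* Append str if it fits; return the number of bytes written (0 if none). */
static size_t model_seq_puts(struct model_trace_seq *s, const char *str)
{
	size_t n = strlen(str);

	assert(s->seq.len <= s->seq.size);
	if (n == 0 || n > s->seq.size - s->seq.len)
		return 0;
	memcpy(s->seq.buffer + s->seq.len, str, n);
	s->seq.len += n;
	return n;
}

/* Original pattern: zero from 'seq' to the end of the structure and
 * stop there, leaving size == 0 and buffer == NULL. */
static void model_reset_buggy(struct model_iter *it)
{
	memset(&it->seq, 0,
	       sizeof(struct model_iter) - offsetof(struct model_iter, seq));
}

/* Fixed pattern: zero the tail, then restore the seq invariants. */
static void model_reset_fixed(struct model_iter *it)
{
	model_reset_buggy(it);
	model_seq_init(&it->seq);
}
```

After `model_reset_buggy()`, the capacity is 0 and every `model_seq_puts()` call writes nothing. A reader loop that retries until something is printed would therefore spin forever. After `model_reset_fixed()`, the fields placed before `seq` (here `overruns`) are still preserved, and writes succeed again.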

Link: http://lkml.kernel.org/r/20191011142134.11997-1-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
author: Petr Mladek <pmladek@suse.com> 2019-10-11 16:21:34 +0200
committer: Steven Rostedt (VMware) <rostedt@goodmis.org> 2019-10-13 02:49:34 +0200
commit: d303de1fcf344ff7c15ed64c3f48a991c9958775
tree: 7de6bf675dbe5550409121f4583fc9fd81c6b852
parent: tracing/hwlat: Don't ignore outer-loop duration when calculating max_latency
1 file changed, 1 insertion, 0 deletions
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2b4eff383505..6a0ee9178365 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6036,6 +6036,7 @@ waitagain:
 	       sizeof(struct trace_iterator) -
 	       offsetof(struct trace_iterator, seq));
 	cpumask_clear(iter->started);
+	trace_seq_init(&iter->seq);
 	iter->pos = -1;
 
 	trace_event_read_lock();