timers/migration: Fix endless timer requeue after idle interrupts

When a CPU is an idle migrator, but another CPU wakes up before it,
becomes an active migrator and handles the queue, the initial idle
migrator may end up endlessly reprogramming its clockevent, chasing ghost
timers forever, as in the following scenario:

               [GRP0:0]
             migrator = 0
             active   = 0
             nextevt  = T1
              /         \
             0           1
          active        idle (T1)

0) CPU 1 is idle and has a timer queued (T1). CPU 0 is active and is
the active migrator.

               [GRP0:0]
             migrator = NONE
             active   = NONE
             nextevt  = T1
              /         \
             0           1
          idle        idle (T1)
          wakeup = T1

1) CPU 0 is now idle and is therefore the idle migrator. It has
programmed its next timer interrupt to handle T1.

               [GRP0:0]
             migrator = 1
             active   = 1
             nextevt  = KTIME_MAX
              /         \
             0           1
          idle        active
          wakeup = T1

2) CPU 1 has woken up. It is now active and has just handled its own
timer T1.

3) CPU 0 gets a timer interrupt to handle T1, but tmigr_handle_remote()
realizes it is not the migrator anymore. So it returns early without
observing that T1 has already expired, and therefore without updating
its ->wakeup value.

4) CPU 0 goes into tmigr_cpu_new_timer(), which also returns early
because it doesn't queue a timer of its own. So ->wakeup is left
unchanged and the next timer is programmed to fire now.

5) goto 3) forever

This results in timer interrupt storms in idle and also in nohz_full (as
observed in rcutorture's TREE07 scenario).

Fix this by forcing a re-evaluation of tmc->wakeup while trying remote
timer handling when the CPU isn't the migrator anymore. The check is
inherently racy, but in the worst case the CPU just races setting the
KTIME_MAX value that a remote expiry also tries to set.

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20240318230729.15497-2-frederic@kernel.org
author: Frederic Weisbecker <frederic@kernel.org> 2024-03-19 00:07:28 +0100
committer: Thomas Gleixner <tglx@linutronix.de> 2024-03-19 10:14:55 +0100
commit: f55acb1e44f3d4bf1ca7926d777895a67d4ec606 (patch)
tree: 0cac0331e0a5bcc0d8b0125f513e9af235dec0b8
parent: timer/migration: Remove buggy early return on deactivation (diff)
download: linux-f55acb1e44f3d4bf1ca7926d777895a67d4ec606.tar.xz
linux-f55acb1e44f3d4bf1ca7926d777895a67d4ec606.zip
1 files changed, 9 insertions, 2 deletions
diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 611cd904f035..c63a0afdcebe 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -1038,8 +1038,15 @@ void tmigr_handle_remote(void)
 	 * in tmigr_handle_remote_up() anyway. Keep this check to speed up the
 	 * return when nothing has to be done.
 	 */
-	if (!tmigr_check_migrator(tmc->tmgroup, tmc->childmask))
-		return;
+	if (!tmigr_check_migrator(tmc->tmgroup, tmc->childmask)) {
+		/*
+		 * If this CPU was an idle migrator, make sure to clear its wakeup
+		 * value so it won't chase timers that have already expired elsewhere.
+		 * This avoids endless requeue from tmigr_new_timer().
+		 */
+		if (READ_ONCE(tmc->wakeup) == KTIME_MAX)
+			return;
+	}
 
 	data.now = get_jiffies_update(&data.basej);
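
The requeue loop from steps 3) to 5) of the commit message, and the way the new early-return condition breaks it, can be illustrated with a small userspace toy model. This is only a sketch and not kernel code: `struct toy_cpu`, `toy_handle_remote()`, `toy_count_interrupts()` and `TOY_KTIME_MAX` are hypothetical names invented for this example. It assumes that the remote-handling path of a non-migrator finds nothing left to expire and so ends up resetting the wakeup value to the "no event" sentinel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for KTIME_MAX ("no pending event"). */
#define TOY_KTIME_MAX INT64_MAX

/* Hypothetical, heavily simplified view of the per-CPU state. */
struct toy_cpu {
	bool is_migrator;	/* does this CPU currently hold migrator duty? */
	int64_t wakeup;		/* expiry this CPU believes it must wake up for */
};

/*
 * Toy version of the remote handling entry point.
 * with_fix == false models the old unconditional early return,
 * with_fix == true models the early return guarded by the wakeup check.
 */
static void toy_handle_remote(struct toy_cpu *cpu, bool with_fix)
{
	if (!cpu->is_migrator) {
		/* Old behaviour: bail out and leave a stale wakeup behind. */
		if (!with_fix)
			return;
		/* New behaviour: bail out only if nothing is pending. */
		if (cpu->wakeup == TOY_KTIME_MAX)
			return;
	}

	/*
	 * Assumption of this sketch: the re-evaluation finds no timer left
	 * for this CPU to handle, so the wakeup value is reset.
	 */
	cpu->wakeup = TOY_KTIME_MAX;
}

/*
 * Count the timer interrupts CPU 0 takes while its wakeup value is already
 * in the past. The cap keeps the "forever" case finite.
 */
static int toy_count_interrupts(bool with_fix, int cap)
{
	/* State after step 2): CPU 0 lost migrator duty, wakeup is still T1. */
	struct toy_cpu cpu0 = { .is_migrator = false, .wakeup = 100 /* T1 */ };
	const int64_t now = 100;	/* T1 has been reached */
	int irqs = 0;

	while (cpu0.wakeup <= now && irqs < cap) {
		irqs++;				/* step 3): interrupt fires */
		toy_handle_remote(&cpu0, with_fix);
		/* step 4): no own timer queued, so wakeup is not touched */
	}
	return irqs;
}
```

Calling `toy_count_interrupts(false, 1000)` runs into the cap, which mirrors the interrupt storm, while `toy_count_interrupts(true, 1000)` stops after a single interrupt because the stale wakeup value gets cleared on the first pass.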