powerpc/opal-irqchip: Fix deadlock introduced by "Fix double endian conversion"

Commit 25642e1459ac ("powerpc/opal-irqchip: Fix double endian
conversion") fixed an endian bug by calling opal_handle_events() in
opal_event_unmask(). However this introduced a deadlock if we find
an event is active during unmasking and call opal_handle_events()
again. The bad call sequence is:

  opal_interrupt()
  -> opal_handle_events()
     -> generic_handle_irq()
        -> handle_level_irq()
           raw_spin_lock(&desc->lock)
           handle_irq_event(desc)
           unmask_irq(desc)
           -> opal_event_unmask()
              -> opal_handle_events()
                 -> generic_handle_irq()
                    -> handle_level_irq()
                       raw_spin_lock(&desc->lock) (BOOM)

When generating multiple opal events in quick succession this would lead
to the following stall warnings:

EEH: Fenced PHB#0 detected, location: U78C9.001.WZS09XA-P1-C32
INFO: rcu_sched detected stalls on CPUs/tasks:
 12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=2065
 15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=2065
 (detected by 13, t=2102 jiffies, g=1325, c=1324, q=602)
NMI watchdog: BUG: soft lockup - CPU#18 stuck for 22s! [irqbalance:2696]
INFO: rcu_sched detected stalls on CPUs/tasks:
 12-...: (1 GPs behind) idle=68f/140000000000001/0 softirq=860/861 fqs=8371
 15-...: (1 GPs behind) idle=be5/140000000000001/0 softirq=1142/1143 fqs=8371
 (detected by 20, t=8407 jiffies, g=1325, c=1324, q=1290)

This patch corrects the problem by queuing the work if an event is
active during unmasking, which is similar to the pre-endian fix
behaviour.

Fixes: 25642e1459ac ("powerpc/opal-irqchip: Fix double endian conversion")
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
author: Alistair Popple <alistair@popple.id.au> 2015-12-18 07:16:17 +0100
committer: Michael Ellerman <mpe@ellerman.id.au> 2015-12-18 12:24:15 +0100
commit: 036592fbbe753d236402a0ae68148e7c143a0f0e (patch)
tree: d2f05d8738a0e78ed8227f10c0b55eff3b23cfaf /arch/powerpc/platforms/powernv/opal-irqchip.c
parent: powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type (diff)
download: linux-036592fbbe753d236402a0ae68148e7c143a0f0e.tar.xz
linux-036592fbbe753d236402a0ae68148e7c143a0f0e.zip
1 files changed, 13 insertions, 1 deletions
diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c b/arch/powerpc/platforms/powernv/opal-irqchip.c
index 0a00e2aed393..e505223b4ec5 100644
--- a/arch/powerpc/platforms/powernv/opal-irqchip.c
+++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
@@ -83,7 +83,19 @@ static void opal_event_unmask(struct irq_data *d)
 	set_bit(d->hwirq, &opal_event_irqchip.mask);
 
 	opal_poll_events(&events);
-	opal_handle_events(be64_to_cpu(events));
+	last_outstanding_events = be64_to_cpu(events);
+
+	/*
+	 * We can't just handle the events now with opal_handle_events().
+	 * If we did we would deadlock when opal_event_unmask() is called from
+	 * handle_level_irq() with the irq descriptor lock held, because
+	 * calling opal_handle_events() would call generic_handle_irq() and
+	 * then handle_level_irq() which would try to take the descriptor lock
+	 * again. Instead queue the events for later.
+	 */
+	if (last_outstanding_events & opal_event_irqchip.mask)
+		/* Need to retrigger the interrupt */
+		irq_work_queue(&opal_event_irq_work);
 }
 
 static int opal_event_set_type(struct irq_data *d, unsigned int flow_type)
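
The re-entrancy problem and the deferral that avoids it can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: every model_* name is hypothetical, the descriptor lock is reduced to a flag that counts a second acquisition instead of spinning, and a plain loop stands in for the irq_work being run later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Everything prefixed model_ is a hypothetical stand-in, not a kernel API. */
struct model_result {
	int handled;    /* handler invocations that ran */
	int deadlocks;  /* attempts to take the lock while already held */
};

static bool model_locked;       /* stands in for desc->lock (not recursive) */
static bool model_defer;        /* true: queue work, as the patch does */
static bool model_work_queued;  /* stands in for a pending irq_work */
static int model_events_left;   /* events the firmware still reports */
static const uint64_t model_mask = 1;   /* one unmasked event bit */
static struct model_result model_res;

/* Stands in for opal_poll_events(): bit 0 is set while events remain. */
static uint64_t model_poll(void)
{
	return model_events_left > 0 ? 1 : 0;
}

static void model_handle_events(uint64_t events);

/* Stands in for opal_event_unmask(), called with the lock held. */
static void model_unmask(void)
{
	uint64_t events = model_poll();

	if (!(events & model_mask))
		return;
	if (model_defer)
		model_work_queued = true;       /* patched: handle later */
	else
		model_handle_events(events);    /* pre-patch: re-enter now */
}

/* Stands in for handle_level_irq(). */
static void model_handle_level_irq(void)
{
	if (model_locked) {
		/* A real raw spinlock would spin here forever. */
		model_res.deadlocks++;
		return;
	}
	model_locked = true;
	model_events_left--;    /* the handler consumes one event */
	model_res.handled++;
	model_unmask();
	model_locked = false;
}

static void model_handle_events(uint64_t events)
{
	if (events & model_mask)
		model_handle_level_irq();
}

/* Run one interrupt with nr_events pending, then drain deferred work. */
struct model_result model_run(bool defer, int nr_events)
{
	model_locked = false;
	model_defer = defer;
	model_work_queued = false;
	model_events_left = nr_events;
	model_res = (struct model_result){ 0, 0 };

	model_handle_events(model_poll());
	while (model_work_queued) {
		model_work_queued = false;
		model_handle_events(model_poll());
	}
	return model_res;
}

#ifdef MODEL_STANDALONE
int main(void)
{
	struct model_result r = model_run(false, 2);

	printf("pre-patch: handled=%d deadlocks=%d\n", r.handled, r.deadlocks);
	assert(r.deadlocks == 1);
	r = model_run(true, 2);
	printf("patched:   handled=%d deadlocks=%d\n", r.handled, r.deadlocks);
	assert(r.deadlocks == 0);
	return 0;
}
#endif
```

Building with -DMODEL_STANDALONE runs both variants. In this model, a second event arriving while the first is being handled makes the pre-patch path try to take the lock again, whereas the deferred path handles both events with the lock released in between. With a single event neither variant re-enters, which matches the commit message's condition that an event must be active during unmasking.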