diff options
author | Thomas Gleixner <tglx@linutronix.de> | 2023-04-26 18:37:00 +0200 |
---|---|---|
committer | Thomas Gleixner <tglx@linutronix.de> | 2023-06-20 14:51:46 +0200 |
commit | 1f5e7eb7868e42227ac426c96d437117e6e06e8e (patch) | |
tree | c38bd16c3881d89d1a2a1b03fa613aca9417d7d7 /arch/x86/kernel/process.c | |
parent | Linux 6.4-rc4 (diff) | |
download | linux-1f5e7eb7868e42227ac426c96d437117e6e06e8e.tar.xz linux-1f5e7eb7868e42227ac426c96d437117e6e06e8e.zip |
x86/smp: Make stop_other_cpus() more robust
Tony reported intermittent lockups on poweroff. His analysis identified the
wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that
on SME enabled machines a kexec() does not leave any stale data in the
caches when switching from encrypted to non-encrypted mode or vice versa.
That wbinvd() is conditional on the SME feature bit which is read directly
from CPUID. But that readout does not check whether the CPUID leaf is
available or not. If it's not available the CPU will return the value of
the highest supported leaf instead. Depending on the content the "SME" bit
might be set or not.
That's incorrect but harmless. Making the CPUID readout conditional makes
the observed hangs go away, but it does not fix the underlying problem:
CPU0 CPU1
stop_other_cpus()
send_IPIs(REBOOT); stop_this_cpu()
while (num_online_cpus() > 1); set_online(false);
proceed... -> hang
wbinvd()
WBINVD is an expensive operation and if multiple CPUs issue it at the same
time the resulting delays are even larger.
But CPU0 already observed num_online_cpus() going down to 1 and proceeds
which causes the system to hang.
This issue exists independent of WBINVD, but the delays caused by WBINVD
make it more prominent.
Make this more robust by adding a cpumask which is initialized to the
online CPU mask before sending the IPIs and CPUs clear their bit in
stop_this_cpu() after the WBINVD completed. Check for that cpumask to
become empty in stop_other_cpus() instead of watching num_online_cpus().
The cpumask cannot plug all holes either, but it's better than a raw
counter and allows to restrict the NMI fallback IPI to be sent only the
CPUs which have not reported within the timeout window.
Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Reported-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
Link: https://lore.kernel.org/r/87h6r770bv.ffs@tglx
Diffstat (limited to 'arch/x86/kernel/process.c')
-rw-r--r-- | arch/x86/kernel/process.c | 23 |
1 files changed, 21 insertions, 2 deletions
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index dac41a0072ea..05924bc4b3af 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -759,13 +759,23 @@ bool xen_set_default_idle(void) } #endif +struct cpumask cpus_stop_mask; + void __noreturn stop_this_cpu(void *dummy) { + unsigned int cpu = smp_processor_id(); + local_irq_disable(); + /* - * Remove this CPU: + * Remove this CPU from the online mask and disable it + * unconditionally. This might be redundant in case that the reboot + * vector was handled late and stop_other_cpus() sent an NMI. + * + * According to SDM and APM NMIs can be accepted even after soft + * disabling the local APIC. */ - set_cpu_online(smp_processor_id(), false); + set_cpu_online(cpu, false); disable_local_APIC(); mcheck_cpu_clear(this_cpu_ptr(&cpu_info)); @@ -783,6 +793,15 @@ void __noreturn stop_this_cpu(void *dummy) */ if (cpuid_eax(0x8000001f) & BIT(0)) native_wbinvd(); + + /* + * This brings a cache line back and dirties it, but + * native_stop_other_cpus() will overwrite cpus_stop_mask after it + * observed that all CPUs reported stop. This write will invalidate + * the related cache line on this CPU. + */ + cpumask_clear_cpu(cpu, &cpus_stop_mask); + for (;;) { /* * Use native_halt() so that memory contents don't change |