Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs

vmbus_wait_for_unload() may be called in the panic path after other CPUs are stopped. vmbus_wait_for_unload() currently loops through online CPUs looking for the UNLOAD response message. But the values of CONFIG_KEXEC_CORE and crash_kexec_post_notifiers affect the path used to stop the other CPUs, and in one of the paths the stopped CPUs are removed from cpu_online_mask. This removal happens in both x86/x64 and arm64 architectures. In such a case, vmbus_wait_for_unload() only checks the panic'ing CPU, and misses the UNLOAD response message except when the panic'ing CPU is CPU 0. vmbus_wait_for_unload() eventually times out, but only after waiting 100 seconds. Fix this by looping through *present* CPUs in vmbus_wait_for_unload(). The cpu_present_mask is not modified by stopping the other CPUs in the panic path, nor should it be. Also, in a CoCo VM the synic_message_page is not allocated in hv_synic_alloc(), but is set and cleared in hv_synic_enable_regs() and hv_synic_disable_regs() such that it is set only when the CPU is online. If not all present CPUs are online when vmbus_wait_for_unload() is called, the synic_message_page might be NULL. Add a check for this. Fixes: cd95aad55793 ("Drivers: hv: vmbus: handle various crash scenarios") Cc: stable@vger.kernel.org Reported-by: John Starks <jostarks@microsoft.com> Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Link: https://lore.kernel.org/r/1684422832-38476-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org>
author: Michael Kelley <mikelley@microsoft.com> 2023-05-18 17:13:52 +0200
committer: Wei Liu <wei.liu@kernel.org> 2023-05-23 20:53:16 +0200
commit: 320805ab61e5f1e2a5729ae266e16bec2904050c (patch)
tree: d23598814e22eb256c4f0887b875f386db82db49 /drivers/hv/channel_mgmt.c
parent: Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails (diff)
download: linux-320805ab61e5f1e2a5729ae266e16bec2904050c.tar.xz
linux-320805ab61e5f1e2a5729ae266e16bec2904050c.zip
1 files changed, 16 insertions, 2 deletions
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 007f26d5f1a4..2f4d09ce027a 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -829,11 +829,22 @@ static void vmbus_wait_for_unload(void)
 		if (completion_done(&vmbus_connection.unload_event))
 			goto completed;
 
-		for_each_online_cpu(cpu) {
+		for_each_present_cpu(cpu) {
 			struct hv_per_cpu_context *hv_cpu
 				= per_cpu_ptr(hv_context.cpu_context, cpu);
 
+			/*
+			 * In a CoCo VM the synic_message_page is not allocated
+			 * in hv_synic_alloc(). Instead it is set/cleared in
+			 * hv_synic_enable_regs() and hv_synic_disable_regs()
+			 * such that it is set only when the CPU is online. If
+			 * not all present CPUs are online, the message page
+			 * might be NULL, so skip such CPUs.
+			 */
 			page_addr = hv_cpu->synic_message_page;
+			if (!page_addr)
+				continue;
+
 			msg = (struct hv_message *)page_addr
 				+ VMBUS_MESSAGE_SINT;
 
@@ -867,11 +878,14 @@ completed:
 	 * maybe-pending messages on all CPUs to be able to receive new
 	 * messages after we reconnect.
 	 */
-	for_each_online_cpu(cpu) {
+	for_each_present_cpu(cpu) {
 		struct hv_per_cpu_context *hv_cpu
 			= per_cpu_ptr(hv_context.cpu_context, cpu);
 
 		page_addr = hv_cpu->synic_message_page;
+		if (!page_addr)
+			continue;
+
 		msg = (struct hv_message *)page_addr + VMBUS_MESSAGE_SINT;
 		msg->header.message_type = HVMSG_NONE;
 	}
author	Michael Kelley <mikelley@microsoft.com>	2023-05-18 17:13:52 +0200
committer	Wei Liu <wei.liu@kernel.org>	2023-05-23 20:53:16 +0200
commit	320805ab61e5f1e2a5729ae266e16bec2904050c (patch)
tree	d23598814e22eb256c4f0887b875f386db82db49 /drivers/hv/channel_mgmt.c
parent	Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails (diff)
download	linux-320805ab61e5f1e2a5729ae266e16bec2904050c.tar.xz linux-320805ab61e5f1e2a5729ae266e16bec2904050c.zip