diff options
author | Tomer Tayar <ttayar@habana.ai> | 2022-01-18 11:31:15 +0100 |
---|---|---|
committer | Oded Gabbay <ogabbay@kernel.org> | 2022-02-28 13:22:03 +0100 |
commit | 930feb41efe2e799992ae07c1a274f68be7980ea (patch) | |
tree | 537333bbfe3b8a4cbcd21d4ea7620a00b393ab9b /drivers/misc | |
parent | habanalabs: fix race between wait and irq (diff) | |
download | linux-930feb41efe2e799992ae07c1a274f68be7980ea.tar.xz linux-930feb41efe2e799992ae07c1a274f68be7980ea.zip |
habanalabs: prevent false heartbeat failure during soft-reset
The heartbeat thread is active during soft-reset, and it tries to send
messages to CPU-CP core.
Within the soft-reset, in the time window in which the device is marked
as disabled, any CPU-CP command is "silently" skipped and a success
value it returned.
However, in addition to the return value, the heartbeat function also
checks the F/W result, but because no command is sent in this time
window, the result variable won't hold the expected value and we will
have a false heartbeat failure.
To avoid it, modify the "silent" skip to be done only in hard-reset.
The CPU-CP should be able to handle messages during soft-reset.
In addition to the heartbeat problem, this should also solve other
issues in other flows that send messages during soft-reset and use the
F/W result as it w/o being aware to the reset.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Diffstat (limited to 'drivers/misc')
-rw-r--r-- | drivers/misc/habanalabs/common/firmware_if.c | 7 |
1 files changed, 5 insertions, 2 deletions
diff --git a/drivers/misc/habanalabs/common/firmware_if.c b/drivers/misc/habanalabs/common/firmware_if.c index 39de9d86ee6c..11957d36c6a9 100644 --- a/drivers/misc/habanalabs/common/firmware_if.c +++ b/drivers/misc/habanalabs/common/firmware_if.c @@ -214,7 +214,7 @@ int hl_fw_send_cpu_message(struct hl_device *hdev, u32 hw_queue_id, u32 *msg, dma_addr_t pkt_dma_addr; struct hl_bd *sent_bd; u32 tmp, expected_ack_val, pi; - int rc = 0; + int rc; pkt = hdev->asic_funcs->cpu_accessible_dma_pool_alloc(hdev, len, &pkt_dma_addr); @@ -228,8 +228,11 @@ int hl_fw_send_cpu_message(struct hl_device *hdev, u32 hw_queue_id, u32 *msg, mutex_lock(&hdev->send_cpu_message_lock); - if (hdev->disabled) + /* CPU-CP messages can be sent during soft-reset */ + if (hdev->disabled && !hdev->reset_info.is_in_soft_reset) { + rc = 0; goto out; + } if (hdev->device_cpu_disabled) { rc = -EIO; |