summaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorYuri Nudelman <ynudelman@habana.ai>2021-06-06 09:28:51 +0200
committerOded Gabbay <ogabbay@kernel.org>2021-08-29 08:47:46 +0200
commit938b793fdede518e10c67dc1dcfba1804faf98a6 (patch)
tree640575a63c1b0c204cdbacab5f2d0bb1c4cce3d7 /Documentation
parenthabanalabs: use get_task_pid() to take PID (diff)
downloadlinux-938b793fdede518e10c67dc1dcfba1804faf98a6.tar.xz
linux-938b793fdede518e10c67dc1dcfba1804faf98a6.zip
habanalabs: expose state dump
To improve the user's ability to debug the case where a workload that is part of executing training/inference of a topology is getting stuck, we need to add a 'core dump' each time a CS times-out. The 'core dump' shall contain all relevant Sync Manager information and corresponding fence values. The most recent dumps shall be accessible via debugfs, under 'state_dump' node. Reading from the node will provide the oldest dump available. Writing an integer value X will discard X dumps, starting with the oldest one, i.e. subsequent read will now return newer dumps. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/debugfs-driver-habanalabs11
1 files changed, 11 insertions, 0 deletions
diff --git a/Documentation/ABI/testing/debugfs-driver-habanalabs b/Documentation/ABI/testing/debugfs-driver-habanalabs
index a5c28f606865..e29156511388 100644
--- a/Documentation/ABI/testing/debugfs-driver-habanalabs
+++ b/Documentation/ABI/testing/debugfs-driver-habanalabs
@@ -215,6 +215,17 @@ Description: Sets the skip reset on timeout option for the device. Value of
"0" means device will be reset in case some CS has timed out,
otherwise it will not be reset.
+What: /sys/kernel/debug/habanalabs/hl<n>/state_dump
+Date: Oct 2021
+KernelVersion: 5.15
+Contact: ynudelman@habana.ai
+Description: Gets the state dump occurring on a CS timeout or failure.
+ State dump is used for debug and is created each time in case of
+ a problem in a CS execution, before reset.
+ Reading from the node returns the newest state dump available.
+ Writing an integer X discards X state dumps, so that the
+ next read would return X+1-st newest state dump.
+
What: /sys/kernel/debug/habanalabs/hl<n>/stop_on_err
Date: Mar 2020
KernelVersion: 5.6