summaryrefslogtreecommitdiffstats
path: root/drivers/misc/habanalabs/gaudi (follow)
Commit message (Collapse)AuthorAgeFilesLines
* habanalabs: use common wrapper for MMU cache invalidationOded Gabbay2022-02-281-3/+3
| | | | | | | | | We have a common function that wraps the call to the MMU cache invalidation function, which is ASIC-specific. The wrapper checks the return value and prints error if necessary. For consistency, try to use the wrapper when possible. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: remove power9 workaround for dma supportOded Gabbay2022-02-281-8/+1
| | | | | | We don't need this workaround anymore. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: add vrm version to sysfsOded Gabbay2022-02-282-4/+25
| | | | | | | infineon version is only applicable to GOYA and GAUDI. For later ASICs, we display the Voltage Regulator Monitor f/w version. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: remove asic callback set_pll_profile()Oded Gabbay2022-02-281-2/+1
| | | | | | | | Setting PLL profile is the same for all ASICs, except for GOYA. However, because this function is never called from common code, there is no need to have an asic-specific callback function. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: remove hwmgr.cOded Gabbay2022-02-281-1/+1
| | | | | | | The two remaining functions in this file belong to firmware_if.c, as they communicate with the firmware. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: get clk is common functionOded Gabbay2022-02-281-1/+0
| | | | | | | | | | | Retrieving the clock from the f/w is done exactly the same in ALL our ASICs. Therefore, no real justification for doing it as an ASIC-specific function. The only thing is we need to check if we are running on simulator, which doesn't require ASIC-specific callback. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: sysfs functions should be in sysfs.cOded Gabbay2022-02-281-1/+1
| | | | | | | | | Move common sysfs store/show functions to sysfs.c file for consistency. This is part of a patch-set to remove hwmgr.c Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: remove ASIC functions of clock gatingOded Gabbay2022-02-282-121/+5
| | | | | | | | Now that clock gating is permanently disabled in GAUDI, no need for the ASIC functions of setting and disabling clock gating, as this was a unique scenario in GAUDI. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: disable CGM permanentlyOded Gabbay2022-02-282-167/+64
| | | | | | | | Due to the need of SynapseAI to configure all TPC engines from a single QMAN, the driver must disable CGM and never allow the user to enable it. Otherwise, the configuration of the TPC engines will fail. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: refactor reset information variablesOfir Bitton2021-12-261-4/+4
| | | | | | | | | Unify variables related to device reset, which will help us to add some new reset functionality in future patches. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: clean MMU headers definitionsOhad Sharabi2021-12-261-12/+12
| | | | | | | | | | | | | | | | | | | | | During the MMU development the MMU header files were left with unclean definitions: - MMU "version specific" definitions that were left in the mmu_general file - unused definitions This patch attempts, where possible, to keep definitions that can serve multiple MMU versions (but that are not tightly bound with specific MMU arch) in the mmu_general header file (e.g. different definitions for number of HOPs). Otherwise, move MMU version specific definitions (e.g. HOPs masks and shifts) to the specific MMU version file. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: fix etr asid configurationOded Gabbay2021-12-262-4/+4
| | | | | | | | | | Pass the user's context pointer into the etr configuration function to extract its ASID. Using the compute_ctx pointer is an error as it is just an indication of whether a user has opened the compute device. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: return EPERM on non hard-resetOded Gabbay2021-12-261-6/+2
| | | | | | | GAUDI supports only hard-reset. Therefore, this function should return an error of operation not permitted. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: rename late init after reset functionOded Gabbay2021-12-261-2/+2
| | | | | | | | | | | The ASIC-specific soft_reset_late_init() is now called after either soft-reset or reset-upon-device-release. Therefore, it needs a more appropriate name. No need to split it to two functions, as an ASIC either supports soft-reset or reset-upon-device-release. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: Move frequency change thread to goya_late_initRajaravi Krishna Katta2021-12-261-0/+2
| | | | | | | | | | | | | | | | | | Changing the frequency automatically is only done in Goya. In future ASICs this is done inside the firmware. Therefore, move the common code into the Goya specific files. Main changes as part of the commit are: 1. The thread for setting frequency is moved from device_late_init to goya_late_init 2. hl_device_set_frequency is removed from hl_device_open as it is not relevant for other ASICs and for Goya it is taken care by the thread 3. hl_device_set_frequency is renamed as goya_set_frequency Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: fix possible deadlock in cache invl failureOfir Bitton2021-12-261-6/+0
| | | | | | | | | | | | | Currently there is a deadlock in driver in scenarios where MMU cache invalidation fails. The issue is basically device reset being performed without releasing the MMU mutex. The solution is to skip device reset as it is not necessary. In addition we introduce a slight code refactor that prints the invalidation error from a single location. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: skip PLL freq fetchOhad Sharabi2021-12-261-0/+5
| | | | | | | | | | | | Getting the used PLL index with which to send the CPUPU packet relies on the CPUCP info packet. In case CPU queues are not enabled getting the PLL index will issue an error and in some ASICs will also fail the driver load. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: add support for fetching historic errorsDani Liberman2021-12-261-43/+124
| | | | | | | | | | | | | | | | | | | | | A new uAPI is added for debug purposes of the user-space to retrieve errors related data from previous session (before device reset was performed). Inforamtion is filled when a razwi or CS timeout happens and can contain one of the following: 1. Retrieve timestamp of last time the device was opened and razwi or CS timeout happened. 2. Retrieve information about last CS timeout. 3. Retrieve information about last razwi error. This information doesn't contain user data, so no danger of data leakage between users. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: Fix collective wait bugfarah kassabri2021-12-261-4/+11
| | | | | | | | | | In Signaling-From-Graph case, the driver didn't set the hw_sob pointer at the right place, which is needed for the cs completion check prior to start sending all the master/slaves jobs to device. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: expand clock throttling information uAPIOfir Bitton2021-12-261-4/+18
| | | | | | | | | | In addition to the clock throttling reason, user should be able to obtain also the start time and the duration of the throttling event. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: rename reset flagsBharat Jauhari2021-12-261-6/+8
| | | | | | | | | Rename reset flags for better readability as compared to HL_RESET_CAUSE* enum shared with the f/w. Signed-off-by: Bharat Jauhari <bjauhari@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: add dedicated message towards f/w to set powerRajaravi Krishna Katta2021-12-261-0/+2
| | | | | | | | | | | | | | | | CPUCP_PACKET_POWER_GET packet type was used for both hl_get_power() and hl_set_power(). To align with other sensor functions hl_set_power() should use CPUCP_PACKET_POWER_SET. This packet will only be used with newer ASICs, so need to add a compatibility flag to the asic properties to indicate whether to use this packet or the GET packet. Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: adding indication of boot fit loadedOhad Sharabi2021-12-261-2/+2
| | | | | | | | | | | | | Up until now the driver stored indication if Linux was loaded on the device CPU. This was needed in order to coordinate some tasks that are performed by the Linux. In future ASICs, many of those tasks will be performed by the boot fit, so now we need the same indication of boot fit load status. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: add enum mmu_op_flagsYuri Nudelman2021-12-261-2/+2
| | | | | | | | | | | The enum vm_type was abused, used once as a value (indication memory type for map) and once as a flag (for cache invalidation). This makes it hard to add new and still keep it meaningful, hence it is better to split into one enum for values and one for flags. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: make last_mask an MMU propertyYuri Nudelman2021-12-261-0/+1
| | | | | | | | | Currently LAST_MASK is a global, but really it is an MMU implementation specific. We need this change for future ASICs. Signed-off-by: Yuri Nudelman <ynudelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: fix debugfs dma channel selectionGuy Zadicario2021-12-261-2/+9
| | | | | | | | | Do not use a dma channel for debugfs requested transfer if it's QM is not idle. Signed-off-by: Guy Zadicario <gzadicario@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: recover from CPU WD eventOded Gabbay2021-12-261-1/+19
| | | | | | | | | | | | | | | | | There are rare cases where the device CPU's watchdog has expired and as a result, the watchdog reset has happened and the CPU will now move to running its preboot f/w. When that happens, the driver will only know that a heartbeat failure occurred. As a result, the driver will send a message to the CPU's main f/w asking it to reset the device, but because the CPU is now running preboot, it won't respond and the re-initialization process will later fail when trying to load the f/w. The solution is to send the request to the preboot as well, only if the reset was caused because of HB failure. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: Unify frequency set/get functionalityRajaravi Krishna Katta2021-10-184-131/+7
| | | | | | | | | Make the frequency set/get functionality common to all ASICs. This makes more code reusable when adding support for newer ASICs. Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: add support for dma-buf exporterTomer Tayar2021-10-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement the calls to the dma-buf kernel api to create a dma-buf object backed by FD. We block the option to mmap the DMA-BUF object because we don't support DIRECT_IO and implicit P2P. We only implement support for explicit P2P through importing the FD of the DMA-BUF. In the export phase, we provide to the DMA-BUF object an array of pages that represent the device's memory area. During the map callback, we convert the array of pages into an SGT. We split/merge the pages according to the dma max segment size of the importer. To get the DMA address of the PCI bar, we use the dma_map_resources() kernel API, because our device memory is not backed by page struct and this API doesn't need page struct to map the physical address to a DMA address. We set the orig_nents member of the SGT to be 0, to indicate to other drivers that we don't support CPU mappings. Note that in Habanalabs's ASICs, the device memory is pinned and immutable. Therefore, there is no need for dynamic mappings and pinning callbacks. Also note that in GAUDI we don't have an MMU towards the device memory and the user works on physical addresses. Therefore, the user doesn't pass through the kernel driver to allocate memory there. As a result, only for GAUDI we receive from the user a device memory physical address (instead of a handle) and a size. We check the p2p distance using pci_p2pdma_distance_many() and refusing to map dmabuf in case the distance doesn't allow p2p. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Gal Pressman <galpress@amazon.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: use only u32Oded Gabbay2021-10-181-1/+1
| | | | | | In the kernel it is common to use u32 and not uint32_t. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: bypass reset for continuous h/w error eventBharat Jauhari2021-10-181-2/+6
| | | | | | | | | | | | | | | There may be a situation where drivers receives continuous fatal H/W error events from FW immediately post reset cycle. This may be due to some fault on the silicon itself. In such case its better to bypass reset cycle so we won't be stuck in endless loop of resets. This commit bypasses reset request in case driver received two back to back FW fatal error before first occurrence of heartbeat event. Signed-off-by: Bharat Jauhari <bjauhari@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: fix LBW RR configurationOded Gabbay2021-09-141-48/+67
| | | | | | | | | | | Couple of fixes to the LBW RR configuration: 1. Add missing configuration of the SM RR registers in the DMA_IF. 2. Remove HBW range that doesn't belong. 3. Add entire gap + DBG area, from end of TPC7 to end of entire DBG space. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: Fix spelling mistake "FEADBACK" -> "FEEDBACK"Colin Ian King2021-09-141-1/+1
| | | | | | | | There is a spelling mistake in a literal string. Fix it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: use direct MSI in single modeOmer Shpigelman2021-09-141-3/+6
| | | | | | | | | | | | Due to FLR scenario when running inside a VM, we must not use indirect MSI because it might cause some issues on VM destroy. In a VM we use single MSI mode in contrary to multi MSI mode which is used in bare-metal. Hence direct MSI should be used in single MSI mode only. Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: hwmon default card nameRajaravi Krishna Katta2021-09-011-1/+1
| | | | | | | | This commit corrects CARD NAME for Gaudi as "HL205" Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: add support for f/w resetOded Gabbay2021-09-011-9/+24
| | | | | | | | | | | | When the f/w runs in secured mode, it can reset the ASIC when certain events occur. In unsecured mode, the driver asks the f/w to reset the ASIC for those events. We need to perform the entire reset procedure but without accessing the ASIC. i.e. without halting the engines and without sending messages to the f/w. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: block ICACHE_BASE_ADDERESS_HIGH in TPCOded Gabbay2021-09-011-0/+8
| | | | | | | This register shouldn't be modified by user. Prefetch is disabled in Gaudi. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: invalidate PMMU mem cache on initOded Gabbay2021-09-011-0/+3
| | | | | | | This must be done to clear the internal mem cache so we won't get ecc errors on the first invalidation. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: size should be printed in decimalOded Gabbay2021-09-011-2/+2
| | | | | | It's more readable for the size to be in decimal. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: define DC POWER for secured PMCOded Gabbay2021-09-012-1/+7
| | | | | | | | In secured mode, the CGM is disabled. Therefore, the DC power is higher. Without taking it into consideration, the utilization is 12-15% at idle. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: unmask out of bounds SLM access interruptTomer Tayar2021-09-011-1/+1
| | | | | | | | | | The out of bounds SLM access TPC interrupt indicates a severe compiler bug and needs to be informed to user. This interrupt is currently masked so unmask it. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: fetch TPC/MME ECC errors from F/WOfir Bitton2021-09-011-0/+6
| | | | | | | | | In case F/W security is enabled driver cannot access ECC registers, hence driver must fetch the ECC info from F/W. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: modify multi-CS to wait on stream mastersOhad Sharabi2021-09-012-1/+23
| | | | | | | | | | | | During the integration, the multi-CS requirements were refined: - The multi CS call shall wait on "per-ASIC" predefined stream masters instead of set of streams. - Stream masters are set of QIDs used by the upper SW layers (synapse) for completion (must be an external/HW queue). Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: add monitored SOBs to state dumpAlon Mizrahi2021-09-012-2/+37
| | | | | | | | | Current "state dump" is lacking of monitored SOB IDs. Add for convenience. Signed-off-by: Alon Mizrahi <amizrahi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: restore user registers when context opensOded Gabbay2021-09-011-2/+12
| | | | | | | | | Because we don't have multiple contexts in GAUDI, and to minimize calls to is_idle function (which uses many register reads), move the call to clear the user registers to the opening of the single user context. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: increase boot fit timeoutOded Gabbay2021-09-011-1/+1
| | | | | | | Various f/w versions have different timeouts, so increase the default timeout to accommodate all the options. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: minimize number of register readsOded Gabbay2021-09-012-9/+11
| | | | | | | | | Because the register reads might be trapped by the hypervisor in certain deployments, minimize the number of reads during runtime by moving static initializations to functions that occur during device initialization instead of context open. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: move scrubbing to late initOded Gabbay2021-09-011-5/+5
| | | | | | | HW init is mostly about configuring registers. Therefore, it is better to activate DMAs only in late init and afterwards. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs/gaudi: scrub HBM to a specific valueOfir Bitton2021-09-011-2/+7
| | | | | | | | | | In order to enhance debuggability, we will scrub the whole HBM to a specific value, in case HBM scrubbing is enabled. Scrubbing will be performed after reset and after user closes the FD. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* habanalabs: add validity check for event ID received from F/WOfir Bitton2021-09-011-0/+6
| | | | | | | | | Currently there is no validity check for event ID received from F/W, Thus exposing driver to memory overrun. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>