linux - linux

	Commit message	Author	Age	Files	Lines
*	Merge tag 'char-misc-6.2-rc1' of ↵	Linus Torvalds	2022-12-16	11	-233/+677
\|\	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
\|	Pull char/misc driver updates from Greg KH:
\|	"Here is the large set of char/misc and other driver subsystem changes for 6.2-rc1. Nothing earth-shattering in here at all, just a lot of new driver development and minor fixes.
\|	Highlights include:
\|	 - fastrpc driver updates
\|	 - iio new drivers and updates
\|	 - habanalabs driver updates for new hardware and features
\|	 - slimbus driver updates
\|	 - speakup module parameters added to aid in boot time configuration
\|	 - i2c probe_new conversions for lots of different drivers
\|	 - other small driver fixes and additions
\|	One semi-interesting change in here is the increase of the number of misc dynamic minors available to 1048448 to handle new huge-cpu systems.
\|	All of these have been in linux-next for a while with no reported problems"
\|	* tag 'char-misc-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (521 commits)
\|	  extcon: usbc-tusb320: Convert to i2c's .probe_new()
\|	  extcon: rt8973: Convert to i2c's .probe_new()
\|	  extcon: fsa9480: Convert to i2c's .probe_new()
\|	  extcon: max77843: Replace irqchip mask_invert with unmask_base
\|	  chardev: fix error handling in cdev_device_add()
\|	  mcb: mcb-parse: fix error handing in chameleon_parse_gdd()
\|	  drivers: mcb: fix resource leak in mcb_probe()
\|	  coresight: etm4x: fix repeated words in comments
\|	  coresight: cti: Fix null pointer error on CTI init before ETM
\|	  coresight: trbe: remove cpuhp instance node before remove cpuhp state
\|	  counter: stm32-lptimer-cnt: fix the check on arr and cmp registers update
\|	  misc: fastrpc: Add dma_mask to fastrpc_channel_ctx
\|	  misc: fastrpc: Add mmap request assigning for static PD pool
\|	  misc: fastrpc: Safekeep mmaps on interrupted invoke
\|	  misc: fastrpc: Add support for audiopd
\|	  misc: fastrpc: Rework fastrpc_req_munmap
\|	  misc: fastrpc: Use fastrpc_map_put in fastrpc_map_create on fail
\|	  misc: fastrpc: Add fastrpc_remote_heap_alloc
\|	  misc: fastrpc: Add reserved mem support
\|	  misc: fastrpc: Rename audio protection domain to root
\|	  ...
\| *	habanalabs: fix VA range calculation	Ohad Sharabi	2022-11-23	1	-8/+4
\|	Current implementation is fixing the page size to PAGE_SIZE whereas the input page size may be different. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: fail driver load if EEPROM errors detected	Ofir Bitton	2022-11-23	1	-12/+11
\|	In case EEPROM is not burned, firmware sets default EEPROM values. As this is not valid in production, driver should fail load upon any EEPROM error reported by firmware. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: make print of engines idle mask more readable	Tomer Tayar	2022-11-23	1	-6/+21
\|	The engines idle mask was increased to be an array of 4 u64 entries. To make the print of this mask more readable, remove the "0x" prefix, and zero-pad each u64 to 16 hex digits if either it isn't zero or any of the higher-order u64s isn't zero. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
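The padding rule described above can be sketched in userspace C. This is only an illustration of the rule as the commit message states it, not the driver's code: `format_idle_mask`, `IDLE_MASK_ENTRIES` and the "0" output for an all-zero mask are assumptions made for the example.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define IDLE_MASK_ENTRIES 4                        /* mask is 4 u64 entries */
#define IDLE_MASK_STR_LEN (IDLE_MASK_ENTRIES * 16 + 1)

/*
 * Write the mask into buf as hex, highest-order entry first, with no "0x"
 * prefix. An entry is zero-padded to 16 hex digits once it, or any
 * higher-order entry, is non-zero. An all-zero mask is written as "0"
 * (an assumption; the commit message does not cover that case).
 * Returns 0 on success, -1 if buf is too small for a full mask.
 */
static int format_idle_mask(char *buf, size_t size,
                            const uint64_t mask[IDLE_MASK_ENTRIES])
{
    size_t len = 0;
    int seen_nonzero = 0;

    if (size < IDLE_MASK_STR_LEN)
        return -1;

    for (int i = IDLE_MASK_ENTRIES - 1; i >= 0; i--) {
        if (mask[i])
            seen_nonzero = 1;
        if (seen_nonzero)
            len += (size_t)snprintf(buf + len, size - len,
                                    "%016" PRIx64, mask[i]);
    }

    if (!seen_nonzero)
        snprintf(buf, size, "0");

    return 0;
}
```

Skipping leading all-zero entries keeps the common case (few engines busy) short, while the fixed 16-digit width keeps entry boundaries visible once more than one entry is printed.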
\| *	habanalabs: clear non-released encapsulated signals	Tomer Tayar	2022-11-23	3	-31/+71
\|	Reserved encapsulated signals which were not released hold the context refcount, leading to a failure when killing the user process on device reset or device fini. Add the release of these leftover signals in the CS roll-back process. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: don't put context in hl_encaps_handle_do_release_sob()	Tomer Tayar	2022-11-23	1	-1/+0
\|	hl_encaps_handle_do_release_sob() can be called only when the last reference to the context object is released and hl_ctx_do_release() is initiated, and therefore it shouldn't call hl_ctx_put(). Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: print context refcount value if hard reset fails	Tomer Tayar	2022-11-23	1	-3/+15
\|	Failing to kill a user process during a hard reset can be due to a reference to the user context which isn't released. To make it easier to understand if this is the reason for the failure and not something else, add a print of the context refcount value. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: add RMWREG32_SHIFTED to set a val within a mask	Dafna Hirschfeld	2022-11-23	1	-7/+3
\|	This is similar to RMWREG32, but the given 'val' is already shifted according to the mask. This allows several 'ORed' vals and masks to be set at once. The patch also fixes wrong usage of RMWREG32 by replacing it with RMWREG32_SHIFTED. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
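The difference between the two read-modify-write flavours can be sketched in userspace C against a plain variable standing in for a register. The helper names `rmw32`, `rmw32_shifted` and `lowest_set_bit` are hypothetical stand-ins for the driver macros, and the `val & mask` clamp is a defensive addition of this sketch.

```c
#include <stdint.h>

/* Index of the lowest set bit in mask; mask must be non-zero. */
static unsigned int lowest_set_bit(uint32_t mask)
{
    unsigned int shift = 0;

    while (!(mask & 1u)) {
        mask >>= 1;
        shift++;
    }
    return shift;
}

/*
 * 'val' is already positioned inside 'mask', so several fields can be
 * ORed together and written with one read-modify-write.
 */
static void rmw32_shifted(volatile uint32_t *reg, uint32_t val, uint32_t mask)
{
    *reg = (*reg & ~mask) | (val & mask);
}

/* 'val' is a field value that still has to be shifted into 'mask'. */
static void rmw32(volatile uint32_t *reg, uint32_t val, uint32_t mask)
{
    rmw32_shifted(reg, val << lowest_set_bit(mask), mask);
}
```

With the unshifted variant, a caller that ORs two fields together would have the combined value shifted by the lowest bit of the combined mask, which is the kind of misuse the commit message says it replaces.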
\| *	habanalabs: fix rc when new CPUCP opcodes are not supported	Tomer Tayar	2022-11-23	1	-0/+1
\|	When the new CPUCP opcodes are not supported and a CPUCP packet fails, the return value is the F/W error response, which is a positive value. If this packet is sent from an IOCTL and the positive value is used, the IOCTL will not be considered unsuccessful. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
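A minimal sketch of the idea, assuming the positive firmware response is mapped to a generic negative errno before it reaches the IOCTL path. The function name `fw_rc_to_errno` and the choice of `-EIO` are assumptions for illustration; the commit message does not name the value used.

```c
#include <errno.h>

/*
 * Map a return code to the kernel convention: 0 on success, negative
 * errno on failure. A positive value here is a raw firmware error
 * response and must not leak to the caller as if it were success.
 */
static int fw_rc_to_errno(int rc)
{
    return rc > 0 ? -EIO : rc;
}
```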
\| *	habanalabs: added return value check for hl_fw_dynamic_send_clear_cmd()	Marco Pagani	2022-11-23	1	-0/+2
\|	The clang-analyzer reported a warning: "Value stored to 'rc' is never read". The return value check for the first hl_fw_dynamic_send_clear_cmd() call in hl_fw_dynamic_send_protocol_cmd() appears to be missing. Signed-off-by: Marco Pagani <marpagan@redhat.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: increase the size of busy engines mask	Tomer Tayar	2022-11-23	1	-4/+5
\|	Increase the size of the busy engines mask in 'struct hl_info_hw_idle', for future ASICs with more than 128 engines. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: extend process wait timeout in device fine	Oded Gabbay	2022-11-23	2	-5/+12
\|	Processes that use our device are likely to use at the same time other devices such as remote storage. In case our device is removed and a user process is still using the device, we need to kill the user process. However, if that process has a thread waiting for i/o to complete on remote storage, for example, the process won't terminate. Let's give it enough time to terminate before giving up. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Tomer Tayar <ttayar@habana.ai>
\| *	habanalabs: check schedule_hard_reset correctly	Oded Gabbay	2022-11-23	1	-12/+13
\|	schedule_hard_reset can be true only if we didn't do hard-reset. Therefore, there is no point in checking it in case hard_reset is true. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Tomer Tayar <ttayar@habana.ai>
\| *	habanalabs: reset device if still in use when released	Tomer Tayar	2022-11-23	1	-3/+4
\|	If the device file is released while a context is still held, it won't be possible to reopen it until the context is eventually released. If that doesn't happen, only a device reset will revert it back to an operational state, i.e. need to wait for a CS timeout or an error, or to wait for an external intervention of injecting a reset via sysfs. At this stage, after the device was released by user, context is held either because of CS which were left running on the device and are not relevant anymore, or due to missing cleanup steps from user side. All of this is in any case handled in the device reset flow, so initiate the reset at this point instead of waiting for it. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: skip events info ioctl if not supported	Ohad Sharabi	2022-11-23	1	-0/+4
\|	Some ASICs haven't yet implemented this functionality and so the ioctl call should fail and the user should be notified of the reason. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: fix firmware descriptor copy operation	farah kassabri	2022-11-23	1	-2/+24
\|	This is needed to allow adding more data to the lkd_fw_comms_desc structure. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs/gaudi2: add razwi notify event	Dani Liberman	2022-11-23	1	-1/+3
\|	Each time a razwi (read-as-zero, write-ignored) event happens, besides capturing its data, also notify the user about it. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs/gaudi: add page fault notify event	Dani Liberman	2022-11-23	2	-0/+11
\|	Each time a page fault happens, besides capturing its data, also notify the user about it. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: use single threaded WQ for event handling	Dani Liberman	2022-11-23	1	-1/+1
\|	Creating the event queue workqueue using alloc_workqueue made it run in multi-threaded mode, which caused parallel dumping of events as well as parallel notification of events to the user, causing logs with multiple events to be out of order. Fixed by creating the event queue workqueue as a single-threaded workqueue. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs/gaudi: add razwi notify event	Dani Liberman	2022-11-23	2	-0/+10
\|	Each time a razwi (read-as-zero, write-ignored) event happens, besides capturing its data, also notify the user about it. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs/gaudi2: add PCI revision 2 support	Ofir Bitton	2022-11-23	6	-10/+31
\|	Add support for Gaudi2 device with PCI revision 2. Functionality is exactly the same as revision 1; the only difference is the device name exposed to the user. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: remove redundant gaudi2_sec asic type	Ofir Bitton	2022-11-23	4	-8/+0
\|	As Gaudi2 has a single PCI id, the secured asic type is redundant. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: add warning print upon a PCI error	Ofir Bitton	2022-11-23	1	-2/+7
\|	In order to know if the driver catches PCI errors correctly, we need to print a warning for each error. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: fix PCIe access to SRAM via debugfs	Tomer Tayar	2022-11-23	1	-3/+3
\|	hl_access_sram_dram_region() uses a region base which is set within the hl_set_dram_bar() function. However, for SRAM access this function is not called, and we end up with a wrong value of region base and with a badly calculated address. Fix it by initializing the region base value independently of whether hl_set_dram_bar() is called or not. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: zero ts registration buff when allocated	farah kassabri	2022-11-23	1	-1/+1
\|	To avoid memory corruption in kernel memory while using timestamp registration nodes, zero the kernel buff memory when it's allocated. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: no consecutive err when user context is enabled	Tal Cohen	2022-11-23	1	-0/+4
\|	The consecutive-error mechanism protects against a device reset loop being triggered due to h/w issues by putting the device into an unavailable state. When the user may be the cause of the error, an unavailable state will prevent the user from running workloads. The commit prevents entering the consecutive-error state when a user context is enabled. Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: use graceful hard reset for CS timeouts	Tomer Tayar	2022-11-23	1	-8/+8
\|	Use graceful hard reset when detecting a CS timeout that requires a device reset. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: add an option to control watchdog timeout via debugfs	Tomer Tayar	2022-11-23	1	-0/+5
\|	Add an option to control the timeout value for the driver's watchdog of the reset process. The timeout represents the amount of time the user has to close his process once he gets a device reset notification from the driver. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: add support for graceful hard reset	Tomer Tayar	2022-11-23	2	-15/+140
\|	Calling hl_device_reset() for a hard reset will lead to an almost immediate device reset and to killing the user process. For resets that follow errors, this removes the option to debug the errors on both the device side and the user application side. This patch adds a 'graceful hard reset' option and a new hl_device_cond_reset() function. Under some conditions, mainly if there is no user process or if it is not registered to driver notifications, this function will execute hard reset as usual. Otherwise, the reset will be postponed and a notification will be sent to the user, to let him perform post-error actions and then release the device, after which the reset will take place. If the device is not released by the user within some defined time, a watchdog work will execute the reset in any case. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
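The decision flow described above can be sketched as two small predicates in userspace C. This is a simplified model under stated assumptions, not the driver's hl_device_cond_reset(): the struct, enum and function names are hypothetical, and the commit message says "under some conditions, mainly ...", so the real function may check more than these two flags.

```c
#include <stdbool.h>

enum reset_action {
    RESET_IMMEDIATE,    /* hard reset right away, as before            */
    RESET_POSTPONED     /* notify the user and wait for device release */
};

/* Hypothetical summary of the state the decision depends on. */
struct reset_ctx {
    bool user_process_open;       /* a user process holds the device   */
    bool user_registered_events;  /* it registered for notifications   */
};

/* Decide how to handle a hard reset request that follows an error. */
static enum reset_action cond_reset_decide(const struct reset_ctx *ctx)
{
    if (!ctx->user_process_open || !ctx->user_registered_events)
        return RESET_IMMEDIATE;

    return RESET_POSTPONED;
}

/*
 * A postponed reset runs when the user releases the device, or when the
 * watchdog timeout expires, whichever comes first.
 */
static bool postponed_reset_due(bool device_released,
                                unsigned int elapsed_sec,
                                unsigned int watchdog_timeout_sec)
{
    return device_released || elapsed_sec >= watchdog_timeout_sec;
}
```

Keeping the watchdog as an unconditional fallback means a misbehaving user process can delay the reset but never prevent it.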
\| *	habanalabs: avoid divide by zero in device utilization	Ohad Sharabi	2022-11-23	1	-2/+7
\|	Currently there is no verification whether the divisor is legal. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
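A minimal sketch of guarding such a division, assuming utilization is a percentage derived from current, idle (DC) and maximum power. The formula, the parameter names and the `-EINVAL` return are assumptions for illustration; the commit message only states that the divisor was not verified.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Compute utilization in percent as
 *     (curr_power - dc_power) * 100 / (max_power - dc_power)
 * and refuse to divide when the divisor would be zero or negative.
 * Returns 0 and stores the result on success, -EINVAL otherwise.
 */
static int device_utilization(uint64_t curr_power, uint64_t dc_power,
                              uint64_t max_power, uint32_t *utilization)
{
    if (max_power <= dc_power)
        return -EINVAL;             /* illegal divisor */

    /* Clamp to the valid range so the subtraction cannot wrap. */
    if (curr_power < dc_power)
        curr_power = dc_power;
    if (curr_power > max_power)
        curr_power = max_power;

    *utilization = (uint32_t)(((curr_power - dc_power) * 100) /
                              (max_power - dc_power));
    return 0;
}
```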
\| *	habanalabs: fix user mappings calculation in case of page fault	Dani Liberman	2022-11-23	1	-2/+7
\|	As there are 2 types of user mappings, pmmu and hmmu, calculate only the relevant mappings for the requested type. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: allow setting HBM BAR to other regions	Ohad Sharabi	2022-11-23	2	-13/+18
\|	Up until now the use-case in the driver was that the HBM is accessed using the HBM BAR, yet the BAR sometimes cannot cover the whole HBM and so we needed to set the BAR to another HBM offset. Now we are facing the need to access other PCI memory regions that can be covered by the HBM BAR. To answer that we are allowing the caller to determine if the HBM BAR needs to be set or not regardless of the PCI memory region. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: fix using freed pointer	Ohad Sharabi	2022-11-23	1	-1/+4
\|	The code uses the pointer for trace purposes (without actually dereferencing it) but still gets a static analysis warning. This patch eliminates the warning. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: verify no zero event is sent	Tal Cohen	2022-11-23	1	-0/+5
\|	The event notifier mechanism should not raise an empty event (event equals zero). Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: handle HBM MMU when capturing page fault data	Dani Liberman	2022-11-23	1	-8/+21
\|	In case of HBM MMU page fault, capture its relevant mappings. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: move reset workqueue to be under hl_device	Tomer Tayar	2022-11-23	2	-15/+12
\|	'struct hl_device_reset_work' is used as a wrapper for the reset work and its parameters, including the reset workqueue on which it runs. In a future commit, another reset related work with similar parameters is going to be added, but it won't use the reset workqueue. As in any case there is a single reset workqueue, and to allow the reuse of this structure, move the reset workqueue to 'struct hl_device'. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: allow unregistering eventfd when device non-operational	Tomer Tayar	2022-11-23	1	-3/+3
\|	Unregistering eventfd is for releasing host resources and doesn't involve an access to the device. As such, there is no reason to disallow it when device isn't operational. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: skip idle status check if reset on device release	Tomer Tayar	2022-11-23	1	-9/+7
\|	If reset upon device release is enabled, there is no need to check the device idle status in hpriv_release(), because device is going to be reset in any case. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: replace 'pf' to 'prefetch'	Dafna Hirschfeld	2022-11-23	3	-22/+22
\|	pf was an abbreviation for prefetch but because pf already stands for 'physical function', we decided to change it to 'prefetch'. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: add page fault info uapi	Dani Liberman	2022-11-23	4	-1/+122
\|	Only the first page fault will be saved. Besides the address which caused the page fault, the driver captures all of the mmu user mappings. User can retrieve this data via the new uapi (new opcode in INFO ioctl). Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: use lower_32_bits()	Bharat Jauhari	2022-11-23	1	-3/+3
\|	This fixes a sparse warning on doing a cast to 32 bits. Signed-off-by: Bharat Jauhari <bjauhari@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
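For readers outside the kernel tree, the helper can be sketched in userspace C. The macros below are a re-creation for illustration (using `uint32_t` in place of the kernel's `u32`), and `split_addr` is a hypothetical caller, not code from this commit.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel helpers of the same names. */
#define lower_32_bits(n) ((uint32_t)((n) & 0xffffffff))
/* Two 16-bit shifts avoid an undefined 32-bit shift if n is 32 bits wide. */
#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))

/* Split a 64-bit address into the two halves a 32-bit register pair takes. */
static void split_addr(uint64_t addr, uint32_t *lo, uint32_t *hi)
{
    *lo = lower_32_bits(addr);  /* explicit mask: truncation is intended */
    *hi = upper_32_bits(addr);
}
```

Masking before the cast makes the truncation explicit, which is what lets a static checker distinguish it from an accidental loss of the upper bits.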
\| *	habanalabs: refactor razwi event notification	Dani Liberman	2022-11-23	4	-35/+32
\|	This event notification was compatible only with gaudi, where razwi and page fault happen together. To make it compatible with all ASICs, this refactor contains: 1. Razwi notification will only notify about razwi info. A new notification will be added in a future patch, to retrieve data about page fault errors. 2. Changed razwi info structure to support all ASICs. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: Use simplified API for p2p dist calc	Oded Gabbay	2022-11-23	1	-1/+1
\|	Use the simplified API that calculates distance between two devices. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: allow control device open during reset	Ofir Bitton	2022-11-23	3	-2/+26
\|	Monitoring apps would like to query device state at any time so we should allow it also during reset because it doesn't involve accessing the h/w. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
\| *	habanalabs: fix return value check in hl_fw_get_sec_attest_data()	Yang Yingliang	2022-11-23	1	-1/+1
\|	If hl_cpu_accessible_dma_pool_alloc() fails, we should check 'req_cpu_addr', fix it. Fixes: 0c88760f8f5e ("habanalabs/gaudi2: add secured attestation info uapi") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
* \|	habanalabs: remove FOLL_FORCE usage	David Hildenbrand	2022-12-01	1	-2/+1
\|/	FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages using unpin_user_pages_dirty_lock(true), the assumption is that all these pages are writable. FOLL_FORCE in this case seems to be due to copy-and-paste from other drivers. Let's just remove it. Link: https://lkml.kernel.org/r/20221116102659.70287-20-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Oded Gabbay <ogabbay@kernel.org> Cc: Oded Gabbay <ogabbay@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
*	habanalabs: eliminate aggregate use warning	Oded Gabbay	2022-09-20	5	-10/+8
\|	When doing sizeof() and giving as argument a dereference of a pointer-to-a-pointer object, clang will issue a warning. Eliminate the warning by passing struct <name>*. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
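The pattern can be sketched in userspace C as follows. The `struct engine` type and `alloc_engine_table` are hypothetical, and `calloc` stands in for the kernel allocator; the sketch only shows the spelling change the commit message describes, since both forms evaluate to the size of one pointer.

```c
#include <stdlib.h>

struct engine {
    int id;
};

/*
 * Allocate a zeroed table of 'count' pointers to struct engine.
 * The element size is spelled as the pointer type itself rather than
 * as sizeof(*table) on a pointer-to-pointer, which is the form the
 * commit message says triggers the warning.
 */
static struct engine **alloc_engine_table(size_t count)
{
    return calloc(count, sizeof(struct engine *));
}
```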
*	habanalabs: remove some f/w descriptor validations	farah kassabri	2022-09-20	1	-29/+14
\|	To be forward-backward compatible with the firmware in the initial communication during preboot, we need to remove the validation of the header size. This will allow us to add more fields to the lkd_fw_comms_desc structure. Instead of the validation of the header size, we just print a warning when some mismatch in the descriptor has been revealed, and we calculate the CRC based on the descriptor size reported by the firmware instead of calculating it ourselves. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
*	habanalabs: failure to open device due to reset is debug level	Oded Gabbay	2022-09-19	1	-2/+2
\|	If the user wants to open the device, and the device is currently in reset, the user will get an error from the open(). We don't need to display an error in the dmesg for that as it is not a real error and we could otherwise spam the kernel log with this message. Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
*	habanalabs/gaudi2: add secured attestation info uapi	Dani Liberman	2022-09-19	3	-0/+101
\|	User will provide a nonce via the ioctl, and will retrieve secured attestation data of the boot, generated using given nonce. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>