diff options
author | Carl Vanderlip <quic_carlv@quicinc.com> | 2023-10-16 19:00:36 +0200 |
---|---|---|
committer | Jeffrey Hugo <quic_jhugo@quicinc.com> | 2023-10-27 17:39:39 +0200 |
commit | bb8e97e26ce6437d2f57f37e8ba767a2b9cf0d65 (patch) | |
tree | 07adcb149ad7ecad7f61e218d17ac4b79dccfa75 /Documentation/accel | |
parent | drm/doc: describe PATH format for DP MST (diff) | |
download | linux-bb8e97e26ce6437d2f57f37e8ba767a2b9cf0d65.tar.xz linux-bb8e97e26ce6437d2f57f37e8ba767a2b9cf0d65.zip |
accel/qaic: Enable 1 MSI fallback mode
Several virtualization use-cases either don't support 32 MultiMSIs
(Xen/VMware) or have significant drawbacks to their use (KVM's vIOMMU,
which is required to support 32 MSI, needs to allocate an alternate
system memory space for each device using vIOMMU (e.g. 8GB VM mem and
2 cards => 8 + 2 * 8 = 24GB host memory required)). Support these
cases by enabling a 1 MSI fallback mode.
Whenever all 32 MSIs requested are not available, a second request for
a single MSI is made. Its success is the initiator of single MSI mode.
This mode causes all interrupts generated by the device to be directed
to the 0th MSI (firmware >=v1.10 will do this as a response to the PCIe
MSI capability configuration). Likewise, all interrupt handlers for the
device are registered to the 0th MSI.
Since the DBC interrupt handler checks if the DBC is in use or if
there is any pending changes, the 'spurious' interrupts are
disregarded. If there is work to be done, the standard threaded IRQ
handler is dispatched.
On every interrupt, the MHI handler wakes up its threaded interrupt
handler, and attempts to wake any waiters for MHI state events.
Performance is within +-0.6% for test cases that typify real world
use. Larger differences ([-4,+132]%, avg +47%) exist for very simple
tasks (e.g. addition) compiled for single NSPs. It is assumed that the
small work and many interrupts typically cause contention (e.g. 16 NSPs
vs 4 CPUs), as evidenced by the standard deviation between runs also
decreasing (r=-0.48 between delta(Performace_test) and
delta(StdDev_test/Avg_test))
Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231016170036.5409-1-quic_jhugo@quicinc.com
Diffstat (limited to 'Documentation/accel')
-rw-r--r-- | Documentation/accel/qaic/aic100.rst | 5 | ||||
-rw-r--r-- | Documentation/accel/qaic/qaic.rst | 23 |
2 files changed, 26 insertions, 2 deletions
diff --git a/Documentation/accel/qaic/aic100.rst b/Documentation/accel/qaic/aic100.rst index c80d0f1307db..a5fef0869aab 100644 --- a/Documentation/accel/qaic/aic100.rst +++ b/Documentation/accel/qaic/aic100.rst @@ -36,8 +36,9 @@ AIC100 DID (0xa100). AIC100 does not implement FLR (function level reset). -AIC100 implements MSI but does not implement MSI-X. AIC100 requires 17 MSIs to -operate (1 for MHI, 16 for the DMA Bridge). +AIC100 implements MSI but does not implement MSI-X. AIC100 prefers 17 MSIs to +operate (1 for MHI, 16 for the DMA Bridge). Falling back to 1 MSI is possible in +scenarios where reserving 32 MSIs isn't feasible. As a PCIe device, AIC100 utilizes BARs to provide host interfaces to the device hardware. AIC100 provides 3, 64-bit BARs. diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst index c88502383136..9ccbfea86f5a 100644 --- a/Documentation/accel/qaic/qaic.rst +++ b/Documentation/accel/qaic/qaic.rst @@ -10,6 +10,9 @@ accelerator products. Interrupts ========== +IRQ Storm Mitigation +-------------------- + While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation mechanism, it is still possible for an IRQ storm to occur. A storm can happen if the workload is particularly quick, and the host is responsive. If the host @@ -35,6 +38,26 @@ generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64 IRQs over 5 minutes while keeping the host system stable, and having the same workload throughput performance (within run to run noise variation). +Single MSI Mode +--------------- + +MultiMSI is not well supported on all systems; virtualized ones even less so +(circa 2023). Between hypervisors masking the PCIe MSI capability structure to +large memory requirements for vIOMMUs (required for supporting MultiMSI), it is +useful to be able to fall back to a single MSI when needed. + +To support this fallback, we allow the case where only one MSI is able to be +allocated, and share that one MSI between MHI and the DBCs. The device detects +when only one MSI has been configured and directs the interrupts for the DBCs +to the interrupt normally used for MHI. Unfortunately this means that the +interrupt handlers for every DBC and MHI wake up for every interrupt that +arrives; however, the DBC threaded irq handlers only are started when work to be +done is detected (MHI will always start its threaded handler). + +If the DBC is configured to force MSI interrupts, this can circumvent the +software IRQ storm mitigation mentioned above. Since the MSI is shared it is +never disabled, allowing each new entry to the FIFO to trigger a new interrupt. + Neural Network Control (NNC) Protocol ===================================== |