summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2022-03-24 20:35:59 +0100
committerLinus Torvalds <torvalds@linux-foundation.org>2022-03-24 20:35:59 +0100
commit7403e6d8263937dea206dd201fed1ceed190ca18 (patch)
tree72e84c7bc56998c9998e95a4f14ebdc252dded41
parentMerge tag 'hyperv-next-signed-20220322' of git://git.kernel.org/pub/scm/linux... (diff)
parentvfio-pci: Provide reviewers and acceptance criteria for variant drivers (diff)
downloadlinux-7403e6d8263937dea206dd201fed1ceed190ca18.tar.xz
linux-7403e6d8263937dea206dd201fed1ceed190ca18.zip
Merge tag 'vfio-v5.18-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson: - Introduce new device migration uAPI and implement device specific mlx5 vfio-pci variant driver supporting new protocol (Jason Gunthorpe, Yishai Hadas, Leon Romanovsky) - New HiSilicon acc vfio-pci variant driver, also supporting migration interface (Shameer Kolothum, Longfang Liu) - D3hot fixes for vfio-pci-core (Abhishek Sahu) - Document new vfio-pci variant driver acceptance criteria (Alex Williamson) - Fix UML build unresolved ioport_{un}map() functions (Alex Williamson) - Fix MAINTAINERS due to header movement (Lukas Bulwahn) * tag 'vfio-v5.18-rc1' of https://github.com/awilliam/linux-vfio: (31 commits) vfio-pci: Provide reviewers and acceptance criteria for variant drivers MAINTAINERS: adjust entry for header movement in hisilicon qm driver hisi_acc_vfio_pci: Use its own PCI reset_done error handler hisi_acc_vfio_pci: Add support for VFIO live migration crypto: hisilicon/qm: Set the VF QM state register hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices hisi_acc_qm: Move VF PCI device IDs to common header crypto: hisilicon/qm: Move few definitions to common header crypto: hisilicon/qm: Move the QM header to include/linux vfio/mlx5: Fix to not use 0 as NULL pointer PCI/IOV: Fix wrong kernel-doc identifier vfio/mlx5: Use its own PCI reset_done error handler vfio/pci: Expose vfio_pci_core_aer_err_detected() vfio/mlx5: Implement vfio_pci driver for mlx5 devices vfio/mlx5: Expose migration commands over mlx5 device vfio: Remove migration protocol v1 documentation vfio: Extend the device migration protocol with RUNNING_P2P vfio: Define device migration protocol v2 ...
-rw-r--r--Documentation/driver-api/index.rst1
-rw-r--r--Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst35
-rw-r--r--Documentation/maintainer/maintainer-entry-profile.rst1
-rw-r--r--MAINTAINERS25
-rw-r--r--drivers/crypto/hisilicon/hpre/hpre.h2
-rw-r--r--drivers/crypto/hisilicon/hpre/hpre_main.c19
-rw-r--r--drivers/crypto/hisilicon/qm.c68
-rw-r--r--drivers/crypto/hisilicon/sec2/sec.h2
-rw-r--r--drivers/crypto/hisilicon/sec2/sec_main.c21
-rw-r--r--drivers/crypto/hisilicon/sgl.c2
-rw-r--r--drivers/crypto/hisilicon/zip/zip.h2
-rw-r--r--drivers/crypto/hisilicon/zip/zip_main.c17
-rw-r--r--drivers/net/ethernet/mellanox/mlx5/core/cmd.c10
-rw-r--r--drivers/net/ethernet/mellanox/mlx5/core/main.c45
-rw-r--r--drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h1
-rw-r--r--drivers/net/ethernet/mellanox/mlx5/core/sriov.c17
-rw-r--r--drivers/pci/iov.c43
-rw-r--r--drivers/vfio/pci/Kconfig5
-rw-r--r--drivers/vfio/pci/Makefile4
-rw-r--r--drivers/vfio/pci/hisilicon/Kconfig15
-rw-r--r--drivers/vfio/pci/hisilicon/Makefile4
-rw-r--r--drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c1326
-rw-r--r--drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h116
-rw-r--r--drivers/vfio/pci/mlx5/Kconfig10
-rw-r--r--drivers/vfio/pci/mlx5/Makefile4
-rw-r--r--drivers/vfio/pci/mlx5/cmd.c259
-rw-r--r--drivers/vfio/pci/mlx5/cmd.h36
-rw-r--r--drivers/vfio/pci/mlx5/main.c676
-rw-r--r--drivers/vfio/pci/vfio_pci.c1
-rw-r--r--drivers/vfio/pci/vfio_pci_core.c162
-rw-r--r--drivers/vfio/pci/vfio_pci_rdwr.c2
-rw-r--r--drivers/vfio/vfio.c296
-rw-r--r--include/linux/hisi_acc_qm.h (renamed from drivers/crypto/hisilicon/qm.h)49
-rw-r--r--include/linux/mlx5/driver.h3
-rw-r--r--include/linux/mlx5/mlx5_ifc.h147
-rw-r--r--include/linux/pci.h15
-rw-r--r--include/linux/pci_ids.h3
-rw-r--r--include/linux/vfio.h53
-rw-r--r--include/linux/vfio_pci_core.h13
-rw-r--r--include/uapi/linux/vfio.h406
40 files changed, 3558 insertions, 358 deletions
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index c57c609ad2eb..a7b0223e2886 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -103,6 +103,7 @@ available subsections can be seen below.
sync_file
vfio-mediated-device
vfio
+ vfio-pci-device-specific-driver-acceptance
xilinx/index
xillybus
zorro
diff --git a/Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst b/Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst
new file mode 100644
index 000000000000..b7b99b876b50
--- /dev/null
+++ b/Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst
@@ -0,0 +1,35 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Acceptance criteria for vfio-pci device specific driver variants
+================================================================
+
+Overview
+--------
+The vfio-pci driver exists as a device agnostic driver using the
+system IOMMU and relying on the robustness of platform fault
+handling to provide isolated device access to userspace. While the
+vfio-pci driver does include some device specific support, further
+extensions for yet more advanced device specific features are not
+sustainable. The vfio-pci driver has therefore split out
+vfio-pci-core as a library that may be reused to implement features
+requiring device specific knowledge, ex. saving and loading device
+state for the purposes of supporting migration.
+
+In support of such features, it's expected that some device specific
+variants may interact with parent devices (ex. SR-IOV PF in support of
+a user assigned VF) or other extensions that may not be otherwise
+accessible via the vfio-pci base driver. Authors of such drivers
+should be diligent not to create exploitable interfaces via these
+interactions or allow unchecked userspace data to have an effect
+beyond the scope of the assigned device.
+
+New driver submissions are therefore requested to have approval via
+sign-off/ack/review/etc for any interactions with parent drivers.
+Additionally, drivers should make an attempt to provide sufficient
+documentation for reviewers to understand the device specific
+extensions, for example in the case of migration data, how is the
+device state composed and consumed, which portions are not otherwise
+available to the user via vfio-pci, what safeguards exist to validate
+the data, etc. To that extent, authors should additionally expect to
+require reviews from at least one of the listed reviewers, in addition
+to the overall vfio maintainer.
diff --git a/Documentation/maintainer/maintainer-entry-profile.rst b/Documentation/maintainer/maintainer-entry-profile.rst
index 5d5cc3acdf85..93b2ae6c34a9 100644
--- a/Documentation/maintainer/maintainer-entry-profile.rst
+++ b/Documentation/maintainer/maintainer-entry-profile.rst
@@ -103,3 +103,4 @@ to do something different in the near future.
../nvdimm/maintainer-entry-profile
../riscv/patch-acceptance
../driver-api/media/maintainer-entry-profile
+ ../driver-api/vfio-pci-device-specific-driver-acceptance
diff --git a/MAINTAINERS b/MAINTAINERS
index 40fa38a11209..3230ad5db30c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8722,9 +8722,9 @@ L: linux-crypto@vger.kernel.org
S: Maintained
F: Documentation/ABI/testing/debugfs-hisi-zip
F: drivers/crypto/hisilicon/qm.c
-F: drivers/crypto/hisilicon/qm.h
F: drivers/crypto/hisilicon/sgl.c
F: drivers/crypto/hisilicon/zip/
+F: include/linux/hisi_acc_qm.h
HISILICON ROCE DRIVER
M: Wenpeng Liang <liangwenpeng@huawei.com>
@@ -20399,6 +20399,13 @@ L: kvm@vger.kernel.org
S: Maintained
F: drivers/vfio/fsl-mc/
+VFIO HISILICON PCI DRIVER
+M: Longfang Liu <liulongfang@huawei.com>
+M: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
+L: kvm@vger.kernel.org
+S: Maintained
+F: drivers/vfio/pci/hisilicon/
+
VFIO MEDIATED DEVICE DRIVERS
M: Kirti Wankhede <kwankhede@nvidia.com>
L: kvm@vger.kernel.org
@@ -20408,12 +20415,28 @@ F: drivers/vfio/mdev/
F: include/linux/mdev.h
F: samples/vfio-mdev/
+VFIO PCI DEVICE SPECIFIC DRIVERS
+R: Jason Gunthorpe <jgg@nvidia.com>
+R: Yishai Hadas <yishaih@nvidia.com>
+R: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
+R: Kevin Tian <kevin.tian@intel.com>
+L: kvm@vger.kernel.org
+S: Maintained
+P: Documentation/driver-api/vfio-pci-device-specific-driver-acceptance.rst
+F: drivers/vfio/pci/*/
+
VFIO PLATFORM DRIVER
M: Eric Auger <eric.auger@redhat.com>
L: kvm@vger.kernel.org
S: Maintained
F: drivers/vfio/platform/
+VFIO MLX5 PCI DRIVER
+M: Yishai Hadas <yishaih@nvidia.com>
+L: kvm@vger.kernel.org
+S: Maintained
+F: drivers/vfio/pci/mlx5/
+
VGA_SWITCHEROO
R: Lukas Wunner <lukas@wunner.de>
S: Maintained
diff --git a/drivers/crypto/hisilicon/hpre/hpre.h b/drivers/crypto/hisilicon/hpre/hpre.h
index e0b4a1982ee9..9a0558ed82f9 100644
--- a/drivers/crypto/hisilicon/hpre/hpre.h
+++ b/drivers/crypto/hisilicon/hpre/hpre.h
@@ -4,7 +4,7 @@
#define __HISI_HPRE_H
#include <linux/list.h>
-#include "../qm.h"
+#include <linux/hisi_acc_qm.h>
#define HPRE_SQE_SIZE sizeof(struct hpre_sqe)
#define HPRE_PF_DEF_Q_NUM 64
diff --git a/drivers/crypto/hisilicon/hpre/hpre_main.c b/drivers/crypto/hisilicon/hpre/hpre_main.c
index ebfab3e14499..36ab30e9e654 100644
--- a/drivers/crypto/hisilicon/hpre/hpre_main.c
+++ b/drivers/crypto/hisilicon/hpre/hpre_main.c
@@ -68,8 +68,7 @@
#define HPRE_REG_RD_INTVRL_US 10
#define HPRE_REG_RD_TMOUT_US 1000
#define HPRE_DBGFS_VAL_MAX_LEN 20
-#define HPRE_PCI_DEVICE_ID 0xa258
-#define HPRE_PCI_VF_DEVICE_ID 0xa259
+#define PCI_DEVICE_ID_HUAWEI_HPRE_PF 0xa258
#define HPRE_QM_USR_CFG_MASK GENMASK(31, 1)
#define HPRE_QM_AXI_CFG_MASK GENMASK(15, 0)
#define HPRE_QM_VFG_AX_MASK GENMASK(7, 0)
@@ -111,8 +110,8 @@
static const char hpre_name[] = "hisi_hpre";
static struct dentry *hpre_debugfs_root;
static const struct pci_device_id hpre_dev_ids[] = {
- { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, HPRE_PCI_DEVICE_ID) },
- { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, HPRE_PCI_VF_DEVICE_ID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_HPRE_PF) },
+ { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_HPRE_VF) },
{ 0, }
};
@@ -242,7 +241,7 @@ MODULE_PARM_DESC(uacce_mode, UACCE_MODE_DESC);
static int pf_q_num_set(const char *val, const struct kernel_param *kp)
{
- return q_num_set(val, kp, HPRE_PCI_DEVICE_ID);
+ return q_num_set(val, kp, PCI_DEVICE_ID_HUAWEI_HPRE_PF);
}
static const struct kernel_param_ops hpre_pf_q_num_ops = {
@@ -921,7 +920,7 @@ static int hpre_debugfs_init(struct hisi_qm *qm)
qm->debug.sqe_mask_len = HPRE_SQE_MASK_LEN;
hisi_qm_debug_init(qm);
- if (qm->pdev->device == HPRE_PCI_DEVICE_ID) {
+ if (qm->pdev->device == PCI_DEVICE_ID_HUAWEI_HPRE_PF) {
ret = hpre_ctrl_debug_init(qm);
if (ret)
goto failed_to_create;
@@ -958,7 +957,7 @@ static int hpre_qm_init(struct hisi_qm *qm, struct pci_dev *pdev)
qm->sqe_size = HPRE_SQE_SIZE;
qm->dev_name = hpre_name;
- qm->fun_type = (pdev->device == HPRE_PCI_DEVICE_ID) ?
+ qm->fun_type = (pdev->device == PCI_DEVICE_ID_HUAWEI_HPRE_PF) ?
QM_HW_PF : QM_HW_VF;
if (qm->fun_type == QM_HW_PF) {
qm->qp_base = HPRE_PF_DEF_Q_BASE;
@@ -1191,6 +1190,12 @@ static struct pci_driver hpre_pci_driver = {
.driver.pm = &hpre_pm_ops,
};
+struct pci_driver *hisi_hpre_get_pf_driver(void)
+{
+ return &hpre_pci_driver;
+}
+EXPORT_SYMBOL_GPL(hisi_hpre_get_pf_driver);
+
static void hpre_register_debugfs(void)
{
if (!debugfs_initialized())
diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index 453390044181..009132333d2b 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -15,7 +15,7 @@
#include <linux/uacce.h>
#include <linux/uaccess.h>
#include <uapi/misc/uacce/hisi_qm.h>
-#include "qm.h"
+#include <linux/hisi_acc_qm.h>
/* eq/aeq irq enable */
#define QM_VF_AEQ_INT_SOURCE 0x0
@@ -33,23 +33,6 @@
#define QM_ABNORMAL_EVENT_IRQ_VECTOR 3
/* mailbox */
-#define QM_MB_CMD_SQC 0x0
-#define QM_MB_CMD_CQC 0x1
-#define QM_MB_CMD_EQC 0x2
-#define QM_MB_CMD_AEQC 0x3
-#define QM_MB_CMD_SQC_BT 0x4
-#define QM_MB_CMD_CQC_BT 0x5
-#define QM_MB_CMD_SQC_VFT_V2 0x6
-#define QM_MB_CMD_STOP_QP 0x8
-#define QM_MB_CMD_SRC 0xc
-#define QM_MB_CMD_DST 0xd
-
-#define QM_MB_CMD_SEND_BASE 0x300
-#define QM_MB_EVENT_SHIFT 8
-#define QM_MB_BUSY_SHIFT 13
-#define QM_MB_OP_SHIFT 14
-#define QM_MB_CMD_DATA_ADDR_L 0x304
-#define QM_MB_CMD_DATA_ADDR_H 0x308
#define QM_MB_PING_ALL_VFS 0xffff
#define QM_MB_CMD_DATA_SHIFT 32
#define QM_MB_CMD_DATA_MASK GENMASK(31, 0)
@@ -103,19 +86,12 @@
#define QM_DB_CMD_SHIFT_V1 16
#define QM_DB_INDEX_SHIFT_V1 32
#define QM_DB_PRIORITY_SHIFT_V1 48
-#define QM_DOORBELL_SQ_CQ_BASE_V2 0x1000
-#define QM_DOORBELL_EQ_AEQ_BASE_V2 0x2000
#define QM_QUE_ISO_CFG_V 0x0030
#define QM_PAGE_SIZE 0x0034
#define QM_QUE_ISO_EN 0x100154
#define QM_CAPBILITY 0x100158
#define QM_QP_NUN_MASK GENMASK(10, 0)
#define QM_QP_DB_INTERVAL 0x10000
-#define QM_QP_MAX_NUM_SHIFT 11
-#define QM_DB_CMD_SHIFT_V2 12
-#define QM_DB_RAND_SHIFT_V2 16
-#define QM_DB_INDEX_SHIFT_V2 32
-#define QM_DB_PRIORITY_SHIFT_V2 48
#define QM_MEM_START_INIT 0x100040
#define QM_MEM_INIT_DONE 0x100044
@@ -693,7 +669,7 @@ static void qm_mb_pre_init(struct qm_mailbox *mailbox, u8 cmd,
}
/* return 0 mailbox ready, -ETIMEDOUT hardware timeout */
-static int qm_wait_mb_ready(struct hisi_qm *qm)
+int hisi_qm_wait_mb_ready(struct hisi_qm *qm)
{
u32 val;
@@ -701,6 +677,7 @@ static int qm_wait_mb_ready(struct hisi_qm *qm)
val, !((val >> QM_MB_BUSY_SHIFT) &
0x1), POLL_PERIOD, POLL_TIMEOUT);
}
+EXPORT_SYMBOL_GPL(hisi_qm_wait_mb_ready);
/* 128 bit should be written to hardware at one time to trigger a mailbox */
static void qm_mb_write(struct hisi_qm *qm, const void *src)
@@ -726,14 +703,14 @@ static void qm_mb_write(struct hisi_qm *qm, const void *src)
static int qm_mb_nolock(struct hisi_qm *qm, struct qm_mailbox *mailbox)
{
- if (unlikely(qm_wait_mb_ready(qm))) {
+ if (unlikely(hisi_qm_wait_mb_ready(qm))) {
dev_err(&qm->pdev->dev, "QM mailbox is busy to start!\n");
goto mb_busy;
}
qm_mb_write(qm, mailbox);
- if (unlikely(qm_wait_mb_ready(qm))) {
+ if (unlikely(hisi_qm_wait_mb_ready(qm))) {
dev_err(&qm->pdev->dev, "QM mailbox operation timeout!\n");
goto mb_busy;
}
@@ -745,8 +722,8 @@ mb_busy:
return -EBUSY;
}
-static int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
- bool op)
+int hisi_qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
+ bool op)
{
struct qm_mailbox mailbox;
int ret;
@@ -762,6 +739,7 @@ static int qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
return ret;
}
+EXPORT_SYMBOL_GPL(hisi_qm_mb);
static void qm_db_v1(struct hisi_qm *qm, u16 qn, u8 cmd, u16 index, u8 priority)
{
@@ -1351,7 +1329,7 @@ static int qm_get_vft_v2(struct hisi_qm *qm, u32 *base, u32 *number)
u64 sqc_vft;
int ret;
- ret = qm_mb(qm, QM_MB_CMD_SQC_VFT_V2, 0, 0, 1);
+ ret = hisi_qm_mb(qm, QM_MB_CMD_SQC_VFT_V2, 0, 0, 1);
if (ret)
return ret;
@@ -1725,12 +1703,12 @@ static int dump_show(struct hisi_qm *qm, void *info,
static int qm_dump_sqc_raw(struct hisi_qm *qm, dma_addr_t dma_addr, u16 qp_id)
{
- return qm_mb(qm, QM_MB_CMD_SQC, dma_addr, qp_id, 1);
+ return hisi_qm_mb(qm, QM_MB_CMD_SQC, dma_addr, qp_id, 1);
}
static int qm_dump_cqc_raw(struct hisi_qm *qm, dma_addr_t dma_addr, u16 qp_id)
{
- return qm_mb(qm, QM_MB_CMD_CQC, dma_addr, qp_id, 1);
+ return hisi_qm_mb(qm, QM_MB_CMD_CQC, dma_addr, qp_id, 1);
}
static int qm_sqc_dump(struct hisi_qm *qm, const char *s)
@@ -1842,7 +1820,7 @@ static int qm_eqc_aeqc_dump(struct hisi_qm *qm, char *s, size_t size,
if (IS_ERR(xeqc))
return PTR_ERR(xeqc);
- ret = qm_mb(qm, cmd, xeqc_dma, 0, 1);
+ ret = hisi_qm_mb(qm, cmd, xeqc_dma, 0, 1);
if (ret)
goto err_free_ctx;
@@ -2495,7 +2473,7 @@ unlock:
static int qm_stop_qp(struct hisi_qp *qp)
{
- return qm_mb(qp->qm, QM_MB_CMD_STOP_QP, 0, qp->qp_id, 0);
+ return hisi_qm_mb(qp->qm, QM_MB_CMD_STOP_QP, 0, qp->qp_id, 0);
}
static int qm_set_msi(struct hisi_qm *qm, bool set)
@@ -2763,7 +2741,7 @@ static int qm_sq_ctx_cfg(struct hisi_qp *qp, int qp_id, u32 pasid)
return -ENOMEM;
}
- ret = qm_mb(qm, QM_MB_CMD_SQC, sqc_dma, qp_id, 0);
+ ret = hisi_qm_mb(qm, QM_MB_CMD_SQC, sqc_dma, qp_id, 0);
dma_unmap_single(dev, sqc_dma, sizeof(struct qm_sqc), DMA_TO_DEVICE);
kfree(sqc);
@@ -2804,7 +2782,7 @@ static int qm_cq_ctx_cfg(struct hisi_qp *qp, int qp_id, u32 pasid)
return -ENOMEM;
}
- ret = qm_mb(qm, QM_MB_CMD_CQC, cqc_dma, qp_id, 0);
+ ret = hisi_qm_mb(qm, QM_MB_CMD_CQC, cqc_dma, qp_id, 0);
dma_unmap_single(dev, cqc_dma, sizeof(struct qm_cqc), DMA_TO_DEVICE);
kfree(cqc);
@@ -3514,6 +3492,12 @@ static void hisi_qm_pci_uninit(struct hisi_qm *qm)
pci_disable_device(pdev);
}
+static void hisi_qm_set_state(struct hisi_qm *qm, u8 state)
+{
+ if (qm->ver > QM_HW_V2 && qm->fun_type == QM_HW_VF)
+ writel(state, qm->io_base + QM_VF_STATE);
+}
+
/**
* hisi_qm_uninit() - Uninitialize qm.
* @qm: The qm needed uninit.
@@ -3542,6 +3526,7 @@ void hisi_qm_uninit(struct hisi_qm *qm)
dma_free_coherent(dev, qm->qdma.size,
qm->qdma.va, qm->qdma.dma);
}
+ hisi_qm_set_state(qm, QM_NOT_READY);
up_write(&qm->qps_lock);
qm_irq_unregister(qm);
@@ -3655,7 +3640,7 @@ static int qm_eq_ctx_cfg(struct hisi_qm *qm)
return -ENOMEM;
}
- ret = qm_mb(qm, QM_MB_CMD_EQC, eqc_dma, 0, 0);
+ ret = hisi_qm_mb(qm, QM_MB_CMD_EQC, eqc_dma, 0, 0);
dma_unmap_single(dev, eqc_dma, sizeof(struct qm_eqc), DMA_TO_DEVICE);
kfree(eqc);
@@ -3684,7 +3669,7 @@ static int qm_aeq_ctx_cfg(struct hisi_qm *qm)
return -ENOMEM;
}
- ret = qm_mb(qm, QM_MB_CMD_AEQC, aeqc_dma, 0, 0);
+ ret = hisi_qm_mb(qm, QM_MB_CMD_AEQC, aeqc_dma, 0, 0);
dma_unmap_single(dev, aeqc_dma, sizeof(struct qm_aeqc), DMA_TO_DEVICE);
kfree(aeqc);
@@ -3723,11 +3708,11 @@ static int __hisi_qm_start(struct hisi_qm *qm)
if (ret)
return ret;
- ret = qm_mb(qm, QM_MB_CMD_SQC_BT, qm->sqc_dma, 0, 0);
+ ret = hisi_qm_mb(qm, QM_MB_CMD_SQC_BT, qm->sqc_dma, 0, 0);
if (ret)
return ret;
- ret = qm_mb(qm, QM_MB_CMD_CQC_BT, qm->cqc_dma, 0, 0);
+ ret = hisi_qm_mb(qm, QM_MB_CMD_CQC_BT, qm->cqc_dma, 0, 0);
if (ret)
return ret;
@@ -3767,6 +3752,7 @@ int hisi_qm_start(struct hisi_qm *qm)
if (!ret)
atomic_set(&qm->status.flags, QM_START);
+ hisi_qm_set_state(qm, QM_READY);
err_unlock:
up_write(&qm->qps_lock);
return ret;
diff --git a/drivers/crypto/hisilicon/sec2/sec.h b/drivers/crypto/hisilicon/sec2/sec.h
index d97cf02b1df7..c2e9b01187a7 100644
--- a/drivers/crypto/hisilicon/sec2/sec.h
+++ b/drivers/crypto/hisilicon/sec2/sec.h
@@ -4,7 +4,7 @@
#ifndef __HISI_SEC_V2_H
#define __HISI_SEC_V2_H
-#include "../qm.h"
+#include <linux/hisi_acc_qm.h>
#include "sec_crypto.h"
/* Algorithm resource per hardware SEC queue */
diff --git a/drivers/crypto/hisilicon/sec2/sec_main.c b/drivers/crypto/hisilicon/sec2/sec_main.c
index 0b9906ff69e3..92fae706bdb2 100644
--- a/drivers/crypto/hisilicon/sec2/sec_main.c
+++ b/drivers/crypto/hisilicon/sec2/sec_main.c
@@ -20,8 +20,7 @@
#define SEC_VF_NUM 63
#define SEC_QUEUE_NUM_V1 4096
-#define SEC_PF_PCI_DEVICE_ID 0xa255
-#define SEC_VF_PCI_DEVICE_ID 0xa256
+#define PCI_DEVICE_ID_HUAWEI_SEC_PF 0xa255
#define SEC_BD_ERR_CHK_EN0 0xEFFFFFFF
#define SEC_BD_ERR_CHK_EN1 0x7ffff7fd
@@ -229,7 +228,7 @@ static const struct debugfs_reg32 sec_dfx_regs[] = {
static int sec_pf_q_num_set(const char *val, const struct kernel_param *kp)
{
- return q_num_set(val, kp, SEC_PF_PCI_DEVICE_ID);
+ return q_num_set(val, kp, PCI_DEVICE_ID_HUAWEI_SEC_PF);
}
static const struct kernel_param_ops sec_pf_q_num_ops = {
@@ -317,8 +316,8 @@ module_param_cb(uacce_mode, &sec_uacce_mode_ops, &uacce_mode, 0444);
MODULE_PARM_DESC(uacce_mode, UACCE_MODE_DESC);
static const struct pci_device_id sec_dev_ids[] = {
- { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, SEC_PF_PCI_DEVICE_ID) },
- { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, SEC_VF_PCI_DEVICE_ID) },
+ { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_SEC_PF) },
+ { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_SEC_VF) },
{ 0, }
};
MODULE_DEVICE_TABLE(pci, sec_dev_ids);
@@ -748,7 +747,7 @@ static int sec_core_debug_init(struct hisi_qm *qm)
regset->base = qm->io_base;
regset->dev = dev;
- if (qm->pdev->device == SEC_PF_PCI_DEVICE_ID)
+ if (qm->pdev->device == PCI_DEVICE_ID_HUAWEI_SEC_PF)
debugfs_create_file("regs", 0444, tmp_d, regset, &sec_regs_fops);
for (i = 0; i < ARRAY_SIZE(sec_dfx_labels); i++) {
@@ -766,7 +765,7 @@ static int sec_debug_init(struct hisi_qm *qm)
struct sec_dev *sec = container_of(qm, struct sec_dev, qm);
int i;
- if (qm->pdev->device == SEC_PF_PCI_DEVICE_ID) {
+ if (qm->pdev->device == PCI_DEVICE_ID_HUAWEI_SEC_PF) {
for (i = SEC_CLEAR_ENABLE; i < SEC_DEBUG_FILE_NUM; i++) {
spin_lock_init(&sec->debug.files[i].lock);
sec->debug.files[i].index = i;
@@ -908,7 +907,7 @@ static int sec_qm_init(struct hisi_qm *qm, struct pci_dev *pdev)
qm->sqe_size = SEC_SQE_SIZE;
qm->dev_name = sec_name;
- qm->fun_type = (pdev->device == SEC_PF_PCI_DEVICE_ID) ?
+ qm->fun_type = (pdev->device == PCI_DEVICE_ID_HUAWEI_SEC_PF) ?
QM_HW_PF : QM_HW_VF;
if (qm->fun_type == QM_HW_PF) {
qm->qp_base = SEC_PF_DEF_Q_BASE;
@@ -1120,6 +1119,12 @@ static struct pci_driver sec_pci_driver = {
.driver.pm = &sec_pm_ops,
};
+struct pci_driver *hisi_sec_get_pf_driver(void)
+{
+ return &sec_pci_driver;
+}
+EXPORT_SYMBOL_GPL(hisi_sec_get_pf_driver);
+
static void sec_register_debugfs(void)
{
if (!debugfs_initialized())
diff --git a/drivers/crypto/hisilicon/sgl.c b/drivers/crypto/hisilicon/sgl.c
index 057273769f26..f7efc02b065f 100644
--- a/drivers/crypto/hisilicon/sgl.c
+++ b/drivers/crypto/hisilicon/sgl.c
@@ -1,9 +1,9 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 HiSilicon Limited. */
#include <linux/dma-mapping.h>
+#include <linux/hisi_acc_qm.h>
#include <linux/module.h>
#include <linux/slab.h>
-#include "qm.h"
#define HISI_ACC_SGL_SGE_NR_MIN 1
#define HISI_ACC_SGL_NR_MAX 256
diff --git a/drivers/crypto/hisilicon/zip/zip.h b/drivers/crypto/hisilicon/zip/zip.h
index 517fdbdff3ea..3dfd3bac5a33 100644
--- a/drivers/crypto/hisilicon/zip/zip.h
+++ b/drivers/crypto/hisilicon/zip/zip.h
@@ -7,7 +7,7 @@
#define pr_fmt(fmt) "hisi_zip: " fmt
#include <linux/list.h>
-#include "../qm.h"
+#include <linux/hisi_acc_qm.h>
enum hisi_zip_error_type {
/* negative compression */
diff --git a/drivers/crypto/hisilicon/zip/zip_main.c b/drivers/crypto/hisilicon/zip/zip_main.c
index 678f8b58ec42..4534e1e107d1 100644
--- a/drivers/crypto/hisilicon/zip/zip_main.c
+++ b/drivers/crypto/hisilicon/zip/zip_main.c
@@ -15,8 +15,7 @@
#include <linux/uacce.h>
#include "zip.h"
-#define PCI_DEVICE_ID_ZIP_PF 0xa250
-#define PCI_DEVICE_ID_ZIP_VF 0xa251
+#define PCI_DEVICE_ID_HUAWEI_ZIP_PF 0xa250
#define HZIP_QUEUE_NUM_V1 4096
@@ -246,7 +245,7 @@ MODULE_PARM_DESC(uacce_mode, UACCE_MODE_DESC);
static int pf_q_num_set(const char *val, const struct kernel_param *kp)
{
- return q_num_set(val, kp, PCI_DEVICE_ID_ZIP_PF);
+ return q_num_set(val, kp, PCI_DEVICE_ID_HUAWEI_ZIP_PF);
}
static const struct kernel_param_ops pf_q_num_ops = {
@@ -268,8 +267,8 @@ module_param_cb(vfs_num, &vfs_num_ops, &vfs_num, 0444);
MODULE_PARM_DESC(vfs_num, "Number of VFs to enable(1-63), 0(default)");
static const struct pci_device_id hisi_zip_dev_ids[] = {
- { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_ZIP_PF) },
- { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_ZIP_VF) },
+ { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_ZIP_PF) },
+ { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_ZIP_VF) },
{ 0, }
};
MODULE_DEVICE_TABLE(pci, hisi_zip_dev_ids);
@@ -838,7 +837,7 @@ static int hisi_zip_qm_init(struct hisi_qm *qm, struct pci_dev *pdev)
qm->sqe_size = HZIP_SQE_SIZE;
qm->dev_name = hisi_zip_name;
- qm->fun_type = (pdev->device == PCI_DEVICE_ID_ZIP_PF) ?
+ qm->fun_type = (pdev->device == PCI_DEVICE_ID_HUAWEI_ZIP_PF) ?
QM_HW_PF : QM_HW_VF;
if (qm->fun_type == QM_HW_PF) {
qm->qp_base = HZIP_PF_DEF_Q_BASE;
@@ -1013,6 +1012,12 @@ static struct pci_driver hisi_zip_pci_driver = {
.driver.pm = &hisi_zip_pm_ops,
};
+struct pci_driver *hisi_zip_get_pf_driver(void)
+{
+ return &hisi_zip_pci_driver;
+}
+EXPORT_SYMBOL_GPL(hisi_zip_get_pf_driver);
+
static void hisi_zip_register_debugfs(void)
{
if (!debugfs_initialized())
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 3eacd8739929..fc19a0762af2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -478,6 +478,11 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op,
case MLX5_CMD_OP_QUERY_VHCA_STATE:
case MLX5_CMD_OP_MODIFY_VHCA_STATE:
case MLX5_CMD_OP_ALLOC_SF:
+ case MLX5_CMD_OP_SUSPEND_VHCA:
+ case MLX5_CMD_OP_RESUME_VHCA:
+ case MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE:
+ case MLX5_CMD_OP_SAVE_VHCA_STATE:
+ case MLX5_CMD_OP_LOAD_VHCA_STATE:
*status = MLX5_DRIVER_STATUS_ABORTED;
*synd = MLX5_DRIVER_SYND;
return -EIO;
@@ -675,6 +680,11 @@ const char *mlx5_command_str(int command)
MLX5_COMMAND_STR_CASE(MODIFY_VHCA_STATE);
MLX5_COMMAND_STR_CASE(ALLOC_SF);
MLX5_COMMAND_STR_CASE(DEALLOC_SF);
+ MLX5_COMMAND_STR_CASE(SUSPEND_VHCA);
+ MLX5_COMMAND_STR_CASE(RESUME_VHCA);
+ MLX5_COMMAND_STR_CASE(QUERY_VHCA_MIGRATION_STATE);
+ MLX5_COMMAND_STR_CASE(SAVE_VHCA_STATE);
+ MLX5_COMMAND_STR_CASE(LOAD_VHCA_STATE);
default: return "unknown command opcode";
}
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index bba72b220cc3..976578d1904c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1620,6 +1620,7 @@ static void remove_one(struct pci_dev *pdev)
struct devlink *devlink = priv_to_devlink(dev);
devlink_unregister(devlink);
+ mlx5_sriov_disable(pdev);
mlx5_crdump_disable(dev);
mlx5_drain_health_wq(dev);
mlx5_uninit_one(dev);
@@ -1882,6 +1883,50 @@ static struct pci_driver mlx5_core_driver = {
.sriov_set_msix_vec_count = mlx5_core_sriov_set_msix_vec_count,
};
+/**
+ * mlx5_vf_get_core_dev - Get the mlx5 core device from a given VF PCI device if
+ * mlx5_core is its driver.
+ * @pdev: The associated PCI device.
+ *
+ * Upon return the interface state lock stay held to let caller uses it safely.
+ * Caller must ensure to use the returned mlx5 device for a narrow window
+ * and put it back with mlx5_vf_put_core_dev() immediately once usage was over.
+ *
+ * Return: Pointer to the associated mlx5_core_dev or NULL.
+ */
+struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev)
+ __acquires(&mdev->intf_state_mutex)
+{
+ struct mlx5_core_dev *mdev;
+
+ mdev = pci_iov_get_pf_drvdata(pdev, &mlx5_core_driver);
+ if (IS_ERR(mdev))
+ return NULL;
+
+ mutex_lock(&mdev->intf_state_mutex);
+ if (!test_bit(MLX5_INTERFACE_STATE_UP, &mdev->intf_state)) {
+ mutex_unlock(&mdev->intf_state_mutex);
+ return NULL;
+ }
+
+ return mdev;
+}
+EXPORT_SYMBOL(mlx5_vf_get_core_dev);
+
+/**
+ * mlx5_vf_put_core_dev - Put the mlx5 core device back.
+ * @mdev: The mlx5 core device.
+ *
+ * Upon return the interface state lock is unlocked and caller should not
+ * access the mdev any more.
+ */
+void mlx5_vf_put_core_dev(struct mlx5_core_dev *mdev)
+ __releases(&mdev->intf_state_mutex)
+{
+ mutex_unlock(&mdev->intf_state_mutex);
+}
+EXPORT_SYMBOL(mlx5_vf_put_core_dev);
+
static void mlx5_core_verify_params(void)
{
if (prof_sel >= ARRAY_SIZE(profile)) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 6f8baa0f2a73..37b2805b3bf3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -164,6 +164,7 @@ void mlx5_sriov_cleanup(struct mlx5_core_dev *dev);
int mlx5_sriov_attach(struct mlx5_core_dev *dev);
void mlx5_sriov_detach(struct mlx5_core_dev *dev);
int mlx5_core_sriov_configure(struct pci_dev *dev, int num_vfs);
+void mlx5_sriov_disable(struct pci_dev *pdev);
int mlx5_core_sriov_set_msix_vec_count(struct pci_dev *vf, int msix_vec_count);
int mlx5_core_enable_hca(struct mlx5_core_dev *dev, u16 func_id);
int mlx5_core_disable_hca(struct mlx5_core_dev *dev, u16 func_id);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
index e8185b69ac6c..887ee0f729d1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
@@ -161,7 +161,7 @@ static int mlx5_sriov_enable(struct pci_dev *pdev, int num_vfs)
return err;
}
-static void mlx5_sriov_disable(struct pci_dev *pdev)
+void mlx5_sriov_disable(struct pci_dev *pdev)
{
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
int num_vfs = pci_num_vf(dev->pdev);
@@ -205,19 +205,8 @@ int mlx5_core_sriov_set_msix_vec_count(struct pci_dev *vf, int msix_vec_count)
mlx5_get_default_msix_vec_count(dev, pci_num_vf(pf));
sriov = &dev->priv.sriov;
-
- /* Reversed translation of PCI VF function number to the internal
- * function_id, which exists in the name of virtfn symlink.
- */
- for (id = 0; id < pci_num_vf(pf); id++) {
- if (!sriov->vfs_ctx[id].enabled)
- continue;
-
- if (vf->devfn == pci_iov_virtfn_devfn(pf, id))
- break;
- }
-
- if (id == pci_num_vf(pf) || !sriov->vfs_ctx[id].enabled)
+ id = pci_iov_vf_id(vf);
+ if (id < 0 || !sriov->vfs_ctx[id].enabled)
return -EINVAL;
return mlx5_set_msix_vec_count(dev, id + 1, msix_vec_count);
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 0267977c9f17..952217572113 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -33,6 +33,49 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
}
EXPORT_SYMBOL_GPL(pci_iov_virtfn_devfn);
+int pci_iov_vf_id(struct pci_dev *dev)
+{
+ struct pci_dev *pf;
+
+ if (!dev->is_virtfn)
+ return -EINVAL;
+
+ pf = pci_physfn(dev);
+ return (((dev->bus->number << 8) + dev->devfn) -
+ ((pf->bus->number << 8) + pf->devfn + pf->sriov->offset)) /
+ pf->sriov->stride;
+}
+EXPORT_SYMBOL_GPL(pci_iov_vf_id);
+
+/**
+ * pci_iov_get_pf_drvdata - Return the drvdata of a PF
+ * @dev: VF pci_dev
+ * @pf_driver: Device driver required to own the PF
+ *
+ * This must be called from a context that ensures that a VF driver is attached.
+ * The value returned is invalid once the VF driver completes its remove()
+ * callback.
+ *
+ * Locking is achieved by the driver core. A VF driver cannot be probed until
+ * pci_enable_sriov() is called and pci_disable_sriov() does not return until
+ * all VF drivers have completed their remove().
+ *
+ * The PF driver must call pci_disable_sriov() before it begins to destroy the
+ * drvdata.
+ */
+void *pci_iov_get_pf_drvdata(struct pci_dev *dev, struct pci_driver *pf_driver)
+{
+ struct pci_dev *pf_dev;
+
+ if (!dev->is_virtfn)
+ return ERR_PTR(-EINVAL);
+ pf_dev = dev->physfn;
+ if (pf_dev->driver != pf_driver)
+ return ERR_PTR(-EINVAL);
+ return pci_get_drvdata(pf_dev);
+}
+EXPORT_SYMBOL_GPL(pci_iov_get_pf_drvdata);
+
/*
* Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may
* change when NumVFs changes.
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 860424ccda1b..4da1914425e1 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -43,4 +43,9 @@ config VFIO_PCI_IGD
To enable Intel IGD assignment through vfio-pci, say Y.
endif
+
+source "drivers/vfio/pci/mlx5/Kconfig"
+
+source "drivers/vfio/pci/hisilicon/Kconfig"
+
endif
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 349d68d242b4..7052ebd893e0 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -7,3 +7,7 @@ obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
vfio-pci-y := vfio_pci.o
vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
+
+obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/
+
+obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/
diff --git a/drivers/vfio/pci/hisilicon/Kconfig b/drivers/vfio/pci/hisilicon/Kconfig
new file mode 100644
index 000000000000..5daa0f45d2f9
--- /dev/null
+++ b/drivers/vfio/pci/hisilicon/Kconfig
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config HISI_ACC_VFIO_PCI
+ tristate "VFIO PCI support for HiSilicon ACC devices"
+ depends on ARM64 || (COMPILE_TEST && 64BIT)
+ depends on VFIO_PCI_CORE
+ depends on PCI_MSI
+ depends on CRYPTO_DEV_HISI_QM
+ depends on CRYPTO_DEV_HISI_HPRE
+ depends on CRYPTO_DEV_HISI_SEC2
+ depends on CRYPTO_DEV_HISI_ZIP
+ help
+ This provides generic PCI support for HiSilicon ACC devices
+ using the VFIO framework.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/pci/hisilicon/Makefile b/drivers/vfio/pci/hisilicon/Makefile
new file mode 100644
index 000000000000..c66b3783f2f9
--- /dev/null
+++ b/drivers/vfio/pci/hisilicon/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisi-acc-vfio-pci.o
+hisi-acc-vfio-pci-y := hisi_acc_vfio_pci.o
+
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
new file mode 100644
index 000000000000..767b5d47631a
--- /dev/null
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -0,0 +1,1326 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021, HiSilicon Ltd.
+ */
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/hisi_acc_qm.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/vfio.h>
+#include <linux/vfio_pci_core.h>
+#include <linux/anon_inodes.h>
+
+#include "hisi_acc_vfio_pci.h"
+
+/* return 0 on VM acc device ready, -ETIMEDOUT hardware timeout */
+static int qm_wait_dev_not_ready(struct hisi_qm *qm)
+{
+ u32 val;
+
+ return readl_relaxed_poll_timeout(qm->io_base + QM_VF_STATE,
+ val, !(val & 0x1), MB_POLL_PERIOD_US,
+ MB_POLL_TIMEOUT_US);
+}
+
+/*
+ * Each state Reg is checked 100 times,
+ * with a delay of 100 microseconds after each check
+ */
+static u32 qm_check_reg_state(struct hisi_qm *qm, u32 regs)
+{
+ int check_times = 0;
+ u32 state;
+
+ state = readl(qm->io_base + regs);
+ while (state && check_times < ERROR_CHECK_TIMEOUT) {
+ udelay(CHECK_DELAY_TIME);
+ state = readl(qm->io_base + regs);
+ check_times++;
+ }
+
+ return state;
+}
+
+static int qm_read_regs(struct hisi_qm *qm, u32 reg_addr,
+ u32 *data, u8 nums)
+{
+ int i;
+
+ if (nums < 1 || nums > QM_REGS_MAX_LEN)
+ return -EINVAL;
+
+ for (i = 0; i < nums; i++) {
+ data[i] = readl(qm->io_base + reg_addr);
+ reg_addr += QM_REG_ADDR_OFFSET;
+ }
+
+ return 0;
+}
+
+static int qm_write_regs(struct hisi_qm *qm, u32 reg,
+ u32 *data, u8 nums)
+{
+ int i;
+
+ if (nums < 1 || nums > QM_REGS_MAX_LEN)
+ return -EINVAL;
+
+ for (i = 0; i < nums; i++)
+ writel(data[i], qm->io_base + reg + i * QM_REG_ADDR_OFFSET);
+
+ return 0;
+}
+
+static int qm_get_vft(struct hisi_qm *qm, u32 *base)
+{
+ u64 sqc_vft;
+ u32 qp_num;
+ int ret;
+
+ ret = hisi_qm_mb(qm, QM_MB_CMD_SQC_VFT_V2, 0, 0, 1);
+ if (ret)
+ return ret;
+
+ sqc_vft = readl(qm->io_base + QM_MB_CMD_DATA_ADDR_L) |
+ ((u64)readl(qm->io_base + QM_MB_CMD_DATA_ADDR_H) <<
+ QM_XQC_ADDR_OFFSET);
+ *base = QM_SQC_VFT_BASE_MASK_V2 & (sqc_vft >> QM_SQC_VFT_BASE_SHIFT_V2);
+ qp_num = (QM_SQC_VFT_NUM_MASK_V2 &
+ (sqc_vft >> QM_SQC_VFT_NUM_SHIFT_V2)) + 1;
+
+ return qp_num;
+}
+
+static int qm_get_sqc(struct hisi_qm *qm, u64 *addr)
+{
+ int ret;
+
+ ret = hisi_qm_mb(qm, QM_MB_CMD_SQC_BT, 0, 0, 1);
+ if (ret)
+ return ret;
+
+ *addr = readl(qm->io_base + QM_MB_CMD_DATA_ADDR_L) |
+ ((u64)readl(qm->io_base + QM_MB_CMD_DATA_ADDR_H) <<
+ QM_XQC_ADDR_OFFSET);
+
+ return 0;
+}
+
+static int qm_get_cqc(struct hisi_qm *qm, u64 *addr)
+{
+ int ret;
+
+ ret = hisi_qm_mb(qm, QM_MB_CMD_CQC_BT, 0, 0, 1);
+ if (ret)
+ return ret;
+
+ *addr = readl(qm->io_base + QM_MB_CMD_DATA_ADDR_L) |
+ ((u64)readl(qm->io_base + QM_MB_CMD_DATA_ADDR_H) <<
+ QM_XQC_ADDR_OFFSET);
+
+ return 0;
+}
+
+static int qm_get_regs(struct hisi_qm *qm, struct acc_vf_data *vf_data)
+{
+ struct device *dev = &qm->pdev->dev;
+ int ret;
+
+ ret = qm_read_regs(qm, QM_VF_AEQ_INT_MASK, &vf_data->aeq_int_mask, 1);
+ if (ret) {
+ dev_err(dev, "failed to read QM_VF_AEQ_INT_MASK\n");
+ return ret;
+ }
+
+ ret = qm_read_regs(qm, QM_VF_EQ_INT_MASK, &vf_data->eq_int_mask, 1);
+ if (ret) {
+ dev_err(dev, "failed to read QM_VF_EQ_INT_MASK\n");
+ return ret;
+ }
+
+ ret = qm_read_regs(qm, QM_IFC_INT_SOURCE_V,
+ &vf_data->ifc_int_source, 1);
+ if (ret) {
+ dev_err(dev, "failed to read QM_IFC_INT_SOURCE_V\n");
+ return ret;
+ }
+
+ ret = qm_read_regs(qm, QM_IFC_INT_MASK, &vf_data->ifc_int_mask, 1);
+ if (ret) {
+ dev_err(dev, "failed to read QM_IFC_INT_MASK\n");
+ return ret;
+ }
+
+ ret = qm_read_regs(qm, QM_IFC_INT_SET_V, &vf_data->ifc_int_set, 1);
+ if (ret) {
+ dev_err(dev, "failed to read QM_IFC_INT_SET_V\n");
+ return ret;
+ }
+
+ ret = qm_read_regs(qm, QM_PAGE_SIZE, &vf_data->page_size, 1);
+ if (ret) {
+ dev_err(dev, "failed to read QM_PAGE_SIZE\n");
+ return ret;
+ }
+
+ /* QM_EQC_DW has 7 regs */
+ ret = qm_read_regs(qm, QM_EQC_DW0, vf_data->qm_eqc_dw, 7);
+ if (ret) {
+ dev_err(dev, "failed to read QM_EQC_DW\n");
+ return ret;
+ }
+
+ /* QM_AEQC_DW has 7 regs */
+ ret = qm_read_regs(qm, QM_AEQC_DW0, vf_data->qm_aeqc_dw, 7);
+ if (ret) {
+ dev_err(dev, "failed to read QM_AEQC_DW\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+static int qm_set_regs(struct hisi_qm *qm, struct acc_vf_data *vf_data)
+{
+ struct device *dev = &qm->pdev->dev;
+ int ret;
+
+ /* check VF state */
+ if (unlikely(hisi_qm_wait_mb_ready(qm))) {
+ dev_err(&qm->pdev->dev, "QM device is not ready to write\n");
+ return -EBUSY;
+ }
+
+ ret = qm_write_regs(qm, QM_VF_AEQ_INT_MASK, &vf_data->aeq_int_mask, 1);
+ if (ret) {
+ dev_err(dev, "failed to write QM_VF_AEQ_INT_MASK\n");
+ return ret;
+ }
+
+ ret = qm_write_regs(qm, QM_VF_EQ_INT_MASK, &vf_data->eq_int_mask, 1);
+ if (ret) {
+ dev_err(dev, "failed to write QM_VF_EQ_INT_MASK\n");
+ return ret;
+ }
+
+ ret = qm_write_regs(qm, QM_IFC_INT_SOURCE_V,
+ &vf_data->ifc_int_source, 1);
+ if (ret) {
+ dev_err(dev, "failed to write QM_IFC_INT_SOURCE_V\n");
+ return ret;
+ }
+
+ ret = qm_write_regs(qm, QM_IFC_INT_MASK, &vf_data->ifc_int_mask, 1);
+ if (ret) {
+ dev_err(dev, "failed to write QM_IFC_INT_MASK\n");
+ return ret;
+ }
+
+ ret = qm_write_regs(qm, QM_IFC_INT_SET_V, &vf_data->ifc_int_set, 1);
+ if (ret) {
+ dev_err(dev, "failed to write QM_IFC_INT_SET_V\n");
+ return ret;
+ }
+
+ ret = qm_write_regs(qm, QM_QUE_ISO_CFG_V, &vf_data->que_iso_cfg, 1);
+ if (ret) {
+ dev_err(dev, "failed to write QM_QUE_ISO_CFG_V\n");
+ return ret;
+ }
+
+ ret = qm_write_regs(qm, QM_PAGE_SIZE, &vf_data->page_size, 1);
+ if (ret) {
+ dev_err(dev, "failed to write QM_PAGE_SIZE\n");
+ return ret;
+ }
+
+ /* QM_EQC_DW has 7 regs */
+ ret = qm_write_regs(qm, QM_EQC_DW0, vf_data->qm_eqc_dw, 7);
+ if (ret) {
+ dev_err(dev, "failed to write QM_EQC_DW\n");
+ return ret;
+ }
+
+ /* QM_AEQC_DW has 7 regs */
+ ret = qm_write_regs(qm, QM_AEQC_DW0, vf_data->qm_aeqc_dw, 7);
+ if (ret) {
+ dev_err(dev, "failed to write QM_AEQC_DW\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+static void qm_db(struct hisi_qm *qm, u16 qn, u8 cmd,
+ u16 index, u8 priority)
+{
+ u64 doorbell;
+ u64 dbase;
+ u16 randata = 0;
+
+ if (cmd == QM_DOORBELL_CMD_SQ || cmd == QM_DOORBELL_CMD_CQ)
+ dbase = QM_DOORBELL_SQ_CQ_BASE_V2;
+ else
+ dbase = QM_DOORBELL_EQ_AEQ_BASE_V2;
+
+ doorbell = qn | ((u64)cmd << QM_DB_CMD_SHIFT_V2) |
+ ((u64)randata << QM_DB_RAND_SHIFT_V2) |
+ ((u64)index << QM_DB_INDEX_SHIFT_V2) |
+ ((u64)priority << QM_DB_PRIORITY_SHIFT_V2);
+
+ writeq(doorbell, qm->io_base + dbase);
+}
+
+static int pf_qm_get_qp_num(struct hisi_qm *qm, int vf_id, u32 *rbase)
+{
+ unsigned int val;
+ u64 sqc_vft;
+ u32 qp_num;
+ int ret;
+
+ ret = readl_relaxed_poll_timeout(qm->io_base + QM_VFT_CFG_RDY, val,
+ val & BIT(0), MB_POLL_PERIOD_US,
+ MB_POLL_TIMEOUT_US);
+ if (ret)
+ return ret;
+
+ writel(0x1, qm->io_base + QM_VFT_CFG_OP_WR);
+ /* 0 mean SQC VFT */
+ writel(0x0, qm->io_base + QM_VFT_CFG_TYPE);
+ writel(vf_id, qm->io_base + QM_VFT_CFG);
+
+ writel(0x0, qm->io_base + QM_VFT_CFG_RDY);
+ writel(0x1, qm->io_base + QM_VFT_CFG_OP_ENABLE);
+
+ ret = readl_relaxed_poll_timeout(qm->io_base + QM_VFT_CFG_RDY, val,
+ val & BIT(0), MB_POLL_PERIOD_US,
+ MB_POLL_TIMEOUT_US);
+ if (ret)
+ return ret;
+
+ sqc_vft = readl(qm->io_base + QM_VFT_CFG_DATA_L) |
+ ((u64)readl(qm->io_base + QM_VFT_CFG_DATA_H) <<
+ QM_XQC_ADDR_OFFSET);
+ *rbase = QM_SQC_VFT_BASE_MASK_V2 &
+ (sqc_vft >> QM_SQC_VFT_BASE_SHIFT_V2);
+ qp_num = (QM_SQC_VFT_NUM_MASK_V2 &
+ (sqc_vft >> QM_SQC_VFT_NUM_SHIFT_V2)) + 1;
+
+ return qp_num;
+}
+
+static void qm_dev_cmd_init(struct hisi_qm *qm)
+{
+ /* Clear VF communication status registers. */
+ writel(0x1, qm->io_base + QM_IFC_INT_SOURCE_V);
+
+ /* Enable pf and vf communication. */
+ writel(0x0, qm->io_base + QM_IFC_INT_MASK);
+}
+
+static int vf_qm_cache_wb(struct hisi_qm *qm)
+{
+ unsigned int val;
+
+ writel(0x1, qm->io_base + QM_CACHE_WB_START);
+ if (readl_relaxed_poll_timeout(qm->io_base + QM_CACHE_WB_DONE,
+ val, val & BIT(0), MB_POLL_PERIOD_US,
+ MB_POLL_TIMEOUT_US)) {
+ dev_err(&qm->pdev->dev, "vf QM writeback sqc cache fail\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void vf_qm_fun_reset(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+ struct hisi_qm *qm)
+{
+ int i;
+
+ for (i = 0; i < qm->qp_num; i++)
+ qm_db(qm, i, QM_DOORBELL_CMD_SQ, 0, 1);
+}
+
+static int vf_qm_func_stop(struct hisi_qm *qm)
+{
+ return hisi_qm_mb(qm, QM_MB_CMD_PAUSE_QM, 0, 0, 0);
+}
+
+static int vf_qm_check_match(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+ struct hisi_acc_vf_migration_file *migf)
+{
+ struct acc_vf_data *vf_data = &migf->vf_data;
+ struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+ struct hisi_qm *pf_qm = hisi_acc_vdev->pf_qm;
+ struct device *dev = &vf_qm->pdev->dev;
+ u32 que_iso_state;
+ int ret;
+
+ if (migf->total_length < QM_MATCH_SIZE)
+ return -EINVAL;
+
+ if (vf_data->acc_magic != ACC_DEV_MAGIC) {
+ dev_err(dev, "failed to match ACC_DEV_MAGIC\n");
+ return -EINVAL;
+ }
+
+ if (vf_data->dev_id != hisi_acc_vdev->vf_dev->device) {
+ dev_err(dev, "failed to match VF devices\n");
+ return -EINVAL;
+ }
+
+ /* vf qp num check */
+ ret = qm_get_vft(vf_qm, &vf_qm->qp_base);
+ if (ret <= 0) {
+ dev_err(dev, "failed to get vft qp nums\n");
+ return -EINVAL;
+ }
+
+ if (ret != vf_data->qp_num) {
+ dev_err(dev, "failed to match VF qp num\n");
+ return -EINVAL;
+ }
+
+ vf_qm->qp_num = ret;
+
+ /* vf isolation state check */
+ ret = qm_read_regs(pf_qm, QM_QUE_ISO_CFG_V, &que_iso_state, 1);
+ if (ret) {
+ dev_err(dev, "failed to read QM_QUE_ISO_CFG_V\n");
+ return ret;
+ }
+
+ if (vf_data->que_iso_cfg != que_iso_state) {
+ dev_err(dev, "failed to match isolation state\n");
+ return ret;
+ }
+
+ ret = qm_write_regs(vf_qm, QM_VF_STATE, &vf_data->vf_qm_state, 1);
+ if (ret) {
+ dev_err(dev, "failed to write QM_VF_STATE\n");
+ return ret;
+ }
+
+ hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
+ return 0;
+}
+
+static int vf_qm_get_match_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+ struct acc_vf_data *vf_data)
+{
+ struct hisi_qm *pf_qm = hisi_acc_vdev->pf_qm;
+ struct device *dev = &pf_qm->pdev->dev;
+ int vf_id = hisi_acc_vdev->vf_id;
+ int ret;
+
+ vf_data->acc_magic = ACC_DEV_MAGIC;
+ /* save device id */
+ vf_data->dev_id = hisi_acc_vdev->vf_dev->device;
+
+ /* vf qp num save from PF */
+ ret = pf_qm_get_qp_num(pf_qm, vf_id, &vf_data->qp_base);
+ if (ret <= 0) {
+ dev_err(dev, "failed to get vft qp nums!\n");
+ return -EINVAL;
+ }
+
+ vf_data->qp_num = ret;
+
+ /* VF isolation state save from PF */
+ ret = qm_read_regs(pf_qm, QM_QUE_ISO_CFG_V, &vf_data->que_iso_cfg, 1);
+ if (ret) {
+ dev_err(dev, "failed to read QM_QUE_ISO_CFG_V!\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+static int vf_qm_load_data(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+ struct hisi_acc_vf_migration_file *migf)
+{
+ struct hisi_qm *qm = &hisi_acc_vdev->vf_qm;
+ struct device *dev = &qm->pdev->dev;
+ struct acc_vf_data *vf_data = &migf->vf_data;
+ int ret;
+
+ /* Return if only match data was transferred */
+ if (migf->total_length == QM_MATCH_SIZE)
+ return 0;
+
+ if (migf->total_length < sizeof(struct acc_vf_data))
+ return -EINVAL;
+
+ qm->eqe_dma = vf_data->eqe_dma;
+ qm->aeqe_dma = vf_data->aeqe_dma;
+ qm->sqc_dma = vf_data->sqc_dma;
+ qm->cqc_dma = vf_data->cqc_dma;
+
+ qm->qp_base = vf_data->qp_base;
+ qm->qp_num = vf_data->qp_num;
+
+ ret = qm_set_regs(qm, vf_data);
+ if (ret) {
+ dev_err(dev, "Set VF regs failed\n");
+ return ret;
+ }
+
+ ret = hisi_qm_mb(qm, QM_MB_CMD_SQC_BT, qm->sqc_dma, 0, 0);
+ if (ret) {
+ dev_err(dev, "Set sqc failed\n");
+ return ret;
+ }
+
+ ret = hisi_qm_mb(qm, QM_MB_CMD_CQC_BT, qm->cqc_dma, 0, 0);
+ if (ret) {
+ dev_err(dev, "Set cqc failed\n");
+ return ret;
+ }
+
+ qm_dev_cmd_init(qm);
+ return 0;
+}
+
+static int vf_qm_state_save(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+ struct hisi_acc_vf_migration_file *migf)
+{
+ struct acc_vf_data *vf_data = &migf->vf_data;
+ struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+ struct device *dev = &vf_qm->pdev->dev;
+ int ret;
+
+ ret = vf_qm_get_match_data(hisi_acc_vdev, vf_data);
+ if (ret)
+ return ret;
+
+ if (unlikely(qm_wait_dev_not_ready(vf_qm))) {
+ /* Update state and return with match data */
+ vf_data->vf_qm_state = QM_NOT_READY;
+ hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
+ migf->total_length = QM_MATCH_SIZE;
+ return 0;
+ }
+
+ vf_data->vf_qm_state = QM_READY;
+ hisi_acc_vdev->vf_qm_state = vf_data->vf_qm_state;
+
+ ret = vf_qm_cache_wb(vf_qm);
+ if (ret) {
+ dev_err(dev, "failed to writeback QM Cache!\n");
+ return ret;
+ }
+
+ ret = qm_get_regs(vf_qm, vf_data);
+ if (ret)
+ return -EINVAL;
+
+ /* Every reg is 32 bit, the dma address is 64 bit. */
+ vf_data->eqe_dma = vf_data->qm_eqc_dw[2];
+ vf_data->eqe_dma <<= QM_XQC_ADDR_OFFSET;
+ vf_data->eqe_dma |= vf_data->qm_eqc_dw[1];
+ vf_data->aeqe_dma = vf_data->qm_aeqc_dw[2];
+ vf_data->aeqe_dma <<= QM_XQC_ADDR_OFFSET;
+ vf_data->aeqe_dma |= vf_data->qm_aeqc_dw[1];
+
+ /* Through SQC_BT/CQC_BT to get sqc and cqc address */
+ ret = qm_get_sqc(vf_qm, &vf_data->sqc_dma);
+ if (ret) {
+ dev_err(dev, "failed to read SQC addr!\n");
+ return -EINVAL;
+ }
+
+ ret = qm_get_cqc(vf_qm, &vf_data->cqc_dma);
+ if (ret) {
+ dev_err(dev, "failed to read CQC addr!\n");
+ return -EINVAL;
+ }
+
+ migf->total_length = sizeof(struct acc_vf_data);
+ return 0;
+}
+
+/* Check the PF's RAS state and Function INT state */
+static int
+hisi_acc_check_int_state(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+ struct hisi_qm *vfqm = &hisi_acc_vdev->vf_qm;
+ struct hisi_qm *qm = hisi_acc_vdev->pf_qm;
+ struct pci_dev *vf_pdev = hisi_acc_vdev->vf_dev;
+ struct device *dev = &qm->pdev->dev;
+ u32 state;
+
+ /* Check RAS state */
+ state = qm_check_reg_state(qm, QM_ABNORMAL_INT_STATUS);
+ if (state) {
+ dev_err(dev, "failed to check QM RAS state!\n");
+ return -EBUSY;
+ }
+
+ /* Check Function Communication state between PF and VF */
+ state = qm_check_reg_state(vfqm, QM_IFC_INT_STATUS);
+ if (state) {
+ dev_err(dev, "failed to check QM IFC INT state!\n");
+ return -EBUSY;
+ }
+ state = qm_check_reg_state(vfqm, QM_IFC_INT_SET_V);
+ if (state) {
+ dev_err(dev, "failed to check QM IFC INT SET state!\n");
+ return -EBUSY;
+ }
+
+ /* Check submodule task state */
+ switch (vf_pdev->device) {
+ case PCI_DEVICE_ID_HUAWEI_SEC_VF:
+ state = qm_check_reg_state(qm, SEC_CORE_INT_STATUS);
+ if (state) {
+ dev_err(dev, "failed to check QM SEC Core INT state!\n");
+ return -EBUSY;
+ }
+ return 0;
+ case PCI_DEVICE_ID_HUAWEI_HPRE_VF:
+ state = qm_check_reg_state(qm, HPRE_HAC_INT_STATUS);
+ if (state) {
+ dev_err(dev, "failed to check QM HPRE HAC INT state!\n");
+ return -EBUSY;
+ }
+ return 0;
+ case PCI_DEVICE_ID_HUAWEI_ZIP_VF:
+ state = qm_check_reg_state(qm, HZIP_CORE_INT_STATUS);
+ if (state) {
+ dev_err(dev, "failed to check QM ZIP Core INT state!\n");
+ return -EBUSY;
+ }
+ return 0;
+ default:
+ dev_err(dev, "failed to detect acc module type!\n");
+ return -EINVAL;
+ }
+}
+
+static void hisi_acc_vf_disable_fd(struct hisi_acc_vf_migration_file *migf)
+{
+ mutex_lock(&migf->lock);
+ migf->disabled = true;
+ migf->total_length = 0;
+ migf->filp->f_pos = 0;
+ mutex_unlock(&migf->lock);
+}
+
+static void hisi_acc_vf_disable_fds(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+ if (hisi_acc_vdev->resuming_migf) {
+ hisi_acc_vf_disable_fd(hisi_acc_vdev->resuming_migf);
+ fput(hisi_acc_vdev->resuming_migf->filp);
+ hisi_acc_vdev->resuming_migf = NULL;
+ }
+
+ if (hisi_acc_vdev->saving_migf) {
+ hisi_acc_vf_disable_fd(hisi_acc_vdev->saving_migf);
+ fput(hisi_acc_vdev->saving_migf->filp);
+ hisi_acc_vdev->saving_migf = NULL;
+ }
+}
+
+/*
+ * This function is called in all state_mutex unlock cases to
+ * handle a 'deferred_reset' if exists.
+ */
+static void
+hisi_acc_vf_state_mutex_unlock(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+again:
+ spin_lock(&hisi_acc_vdev->reset_lock);
+ if (hisi_acc_vdev->deferred_reset) {
+ hisi_acc_vdev->deferred_reset = false;
+ spin_unlock(&hisi_acc_vdev->reset_lock);
+ hisi_acc_vdev->vf_qm_state = QM_NOT_READY;
+ hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+ hisi_acc_vf_disable_fds(hisi_acc_vdev);
+ goto again;
+ }
+ mutex_unlock(&hisi_acc_vdev->state_mutex);
+ spin_unlock(&hisi_acc_vdev->reset_lock);
+}
+
+static void hisi_acc_vf_start_device(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+ struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+
+ if (hisi_acc_vdev->vf_qm_state != QM_READY)
+ return;
+
+ vf_qm_fun_reset(hisi_acc_vdev, vf_qm);
+}
+
+static int hisi_acc_vf_load_state(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+ struct device *dev = &hisi_acc_vdev->vf_dev->dev;
+ struct hisi_acc_vf_migration_file *migf = hisi_acc_vdev->resuming_migf;
+ int ret;
+
+ /* Check dev compatibility */
+ ret = vf_qm_check_match(hisi_acc_vdev, migf);
+ if (ret) {
+ dev_err(dev, "failed to match the VF!\n");
+ return ret;
+ }
+ /* Recover data to VF */
+ ret = vf_qm_load_data(hisi_acc_vdev, migf);
+ if (ret) {
+ dev_err(dev, "failed to recover the VF!\n");
+ return ret;
+ }
+
+ return 0;
+}
+
+static int hisi_acc_vf_release_file(struct inode *inode, struct file *filp)
+{
+ struct hisi_acc_vf_migration_file *migf = filp->private_data;
+
+ hisi_acc_vf_disable_fd(migf);
+ mutex_destroy(&migf->lock);
+ kfree(migf);
+ return 0;
+}
+
+static ssize_t hisi_acc_vf_resume_write(struct file *filp, const char __user *buf,
+ size_t len, loff_t *pos)
+{
+ struct hisi_acc_vf_migration_file *migf = filp->private_data;
+ loff_t requested_length;
+ ssize_t done = 0;
+ int ret;
+
+ if (pos)
+ return -ESPIPE;
+ pos = &filp->f_pos;
+
+ if (*pos < 0 ||
+ check_add_overflow((loff_t)len, *pos, &requested_length))
+ return -EINVAL;
+
+ if (requested_length > sizeof(struct acc_vf_data))
+ return -ENOMEM;
+
+ mutex_lock(&migf->lock);
+ if (migf->disabled) {
+ done = -ENODEV;
+ goto out_unlock;
+ }
+
+ ret = copy_from_user(&migf->vf_data, buf, len);
+ if (ret) {
+ done = -EFAULT;
+ goto out_unlock;
+ }
+ *pos += len;
+ done = len;
+ migf->total_length += len;
+out_unlock:
+ mutex_unlock(&migf->lock);
+ return done;
+}
+
+static const struct file_operations hisi_acc_vf_resume_fops = {
+ .owner = THIS_MODULE,
+ .write = hisi_acc_vf_resume_write,
+ .release = hisi_acc_vf_release_file,
+ .llseek = no_llseek,
+};
+
+static struct hisi_acc_vf_migration_file *
+hisi_acc_vf_pci_resume(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+ struct hisi_acc_vf_migration_file *migf;
+
+ migf = kzalloc(sizeof(*migf), GFP_KERNEL);
+ if (!migf)
+ return ERR_PTR(-ENOMEM);
+
+ migf->filp = anon_inode_getfile("hisi_acc_vf_mig", &hisi_acc_vf_resume_fops, migf,
+ O_WRONLY);
+ if (IS_ERR(migf->filp)) {
+ int err = PTR_ERR(migf->filp);
+
+ kfree(migf);
+ return ERR_PTR(err);
+ }
+
+ stream_open(migf->filp->f_inode, migf->filp);
+ mutex_init(&migf->lock);
+ return migf;
+}
+
+static ssize_t hisi_acc_vf_save_read(struct file *filp, char __user *buf, size_t len,
+ loff_t *pos)
+{
+ struct hisi_acc_vf_migration_file *migf = filp->private_data;
+ ssize_t done = 0;
+ int ret;
+
+ if (pos)
+ return -ESPIPE;
+ pos = &filp->f_pos;
+
+ mutex_lock(&migf->lock);
+ if (*pos > migf->total_length) {
+ done = -EINVAL;
+ goto out_unlock;
+ }
+
+ if (migf->disabled) {
+ done = -ENODEV;
+ goto out_unlock;
+ }
+
+ len = min_t(size_t, migf->total_length - *pos, len);
+ if (len) {
+ ret = copy_to_user(buf, &migf->vf_data, len);
+ if (ret) {
+ done = -EFAULT;
+ goto out_unlock;
+ }
+ *pos += len;
+ done = len;
+ }
+out_unlock:
+ mutex_unlock(&migf->lock);
+ return done;
+}
+
+static const struct file_operations hisi_acc_vf_save_fops = {
+ .owner = THIS_MODULE,
+ .read = hisi_acc_vf_save_read,
+ .release = hisi_acc_vf_release_file,
+ .llseek = no_llseek,
+};
+
+static struct hisi_acc_vf_migration_file *
+hisi_acc_vf_stop_copy(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+ struct hisi_acc_vf_migration_file *migf;
+ int ret;
+
+ migf = kzalloc(sizeof(*migf), GFP_KERNEL);
+ if (!migf)
+ return ERR_PTR(-ENOMEM);
+
+ migf->filp = anon_inode_getfile("hisi_acc_vf_mig", &hisi_acc_vf_save_fops, migf,
+ O_RDONLY);
+ if (IS_ERR(migf->filp)) {
+ int err = PTR_ERR(migf->filp);
+
+ kfree(migf);
+ return ERR_PTR(err);
+ }
+
+ stream_open(migf->filp->f_inode, migf->filp);
+ mutex_init(&migf->lock);
+
+ ret = vf_qm_state_save(hisi_acc_vdev, migf);
+ if (ret) {
+ fput(migf->filp);
+ return ERR_PTR(ret);
+ }
+
+ return migf;
+}
+
+static int hisi_acc_vf_stop_device(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+ struct device *dev = &hisi_acc_vdev->vf_dev->dev;
+ struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+ int ret;
+
+ ret = vf_qm_func_stop(vf_qm);
+ if (ret) {
+ dev_err(dev, "failed to stop QM VF function!\n");
+ return ret;
+ }
+
+ ret = hisi_acc_check_int_state(hisi_acc_vdev);
+ if (ret) {
+ dev_err(dev, "failed to check QM INT state!\n");
+ return ret;
+ }
+ return 0;
+}
+
+static struct file *
+hisi_acc_vf_set_device_state(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+ u32 new)
+{
+ u32 cur = hisi_acc_vdev->mig_state;
+ int ret;
+
+ if (cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_STOP) {
+ ret = hisi_acc_vf_stop_device(hisi_acc_vdev);
+ if (ret)
+ return ERR_PTR(ret);
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_STOP_COPY) {
+ struct hisi_acc_vf_migration_file *migf;
+
+ migf = hisi_acc_vf_stop_copy(hisi_acc_vdev);
+ if (IS_ERR(migf))
+ return ERR_CAST(migf);
+ get_file(migf->filp);
+ hisi_acc_vdev->saving_migf = migf;
+ return migf->filp;
+ }
+
+ if ((cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP)) {
+ hisi_acc_vf_disable_fds(hisi_acc_vdev);
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RESUMING) {
+ struct hisi_acc_vf_migration_file *migf;
+
+ migf = hisi_acc_vf_pci_resume(hisi_acc_vdev);
+ if (IS_ERR(migf))
+ return ERR_CAST(migf);
+ get_file(migf->filp);
+ hisi_acc_vdev->resuming_migf = migf;
+ return migf->filp;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
+ ret = hisi_acc_vf_load_state(hisi_acc_vdev);
+ if (ret)
+ return ERR_PTR(ret);
+ hisi_acc_vf_disable_fds(hisi_acc_vdev);
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RUNNING) {
+ hisi_acc_vf_start_device(hisi_acc_vdev);
+ return NULL;
+ }
+
+ /*
+ * vfio_mig_get_next_state() does not use arcs other than the above
+ */
+ WARN_ON(true);
+ return ERR_PTR(-EINVAL);
+}
+
+static struct file *
+hisi_acc_vfio_pci_set_device_state(struct vfio_device *vdev,
+ enum vfio_device_mig_state new_state)
+{
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(vdev,
+ struct hisi_acc_vf_core_device, core_device.vdev);
+ enum vfio_device_mig_state next_state;
+ struct file *res = NULL;
+ int ret;
+
+ mutex_lock(&hisi_acc_vdev->state_mutex);
+ while (new_state != hisi_acc_vdev->mig_state) {
+ ret = vfio_mig_get_next_state(vdev,
+ hisi_acc_vdev->mig_state,
+ new_state, &next_state);
+ if (ret) {
+ res = ERR_PTR(-EINVAL);
+ break;
+ }
+
+ res = hisi_acc_vf_set_device_state(hisi_acc_vdev, next_state);
+ if (IS_ERR(res))
+ break;
+ hisi_acc_vdev->mig_state = next_state;
+ if (WARN_ON(res && new_state != hisi_acc_vdev->mig_state)) {
+ fput(res);
+ res = ERR_PTR(-EINVAL);
+ break;
+ }
+ }
+ hisi_acc_vf_state_mutex_unlock(hisi_acc_vdev);
+ return res;
+}
+
+static int
+hisi_acc_vfio_pci_get_device_state(struct vfio_device *vdev,
+ enum vfio_device_mig_state *curr_state)
+{
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(vdev,
+ struct hisi_acc_vf_core_device, core_device.vdev);
+
+ mutex_lock(&hisi_acc_vdev->state_mutex);
+ *curr_state = hisi_acc_vdev->mig_state;
+ hisi_acc_vf_state_mutex_unlock(hisi_acc_vdev);
+ return 0;
+}
+
+static void hisi_acc_vf_pci_aer_reset_done(struct pci_dev *pdev)
+{
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = dev_get_drvdata(&pdev->dev);
+
+ if (hisi_acc_vdev->core_device.vdev.migration_flags !=
+ VFIO_MIGRATION_STOP_COPY)
+ return;
+
+ /*
+ * As the higher VFIO layers are holding locks across reset and using
+ * those same locks with the mm_lock we need to prevent ABBA deadlock
+ * with the state_mutex and mm_lock.
+ * In case the state_mutex was taken already we defer the cleanup work
+ * to the unlock flow of the other running context.
+ */
+ spin_lock(&hisi_acc_vdev->reset_lock);
+ hisi_acc_vdev->deferred_reset = true;
+ if (!mutex_trylock(&hisi_acc_vdev->state_mutex)) {
+ spin_unlock(&hisi_acc_vdev->reset_lock);
+ return;
+ }
+ spin_unlock(&hisi_acc_vdev->reset_lock);
+ hisi_acc_vf_state_mutex_unlock(hisi_acc_vdev);
+}
+
+static int hisi_acc_vf_qm_init(struct hisi_acc_vf_core_device *hisi_acc_vdev)
+{
+ struct vfio_pci_core_device *vdev = &hisi_acc_vdev->core_device;
+ struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+ struct pci_dev *vf_dev = vdev->pdev;
+
+ /*
+ * ACC VF dev BAR2 region consists of both functional register space
+ * and migration control register space. For migration to work, we
+ * need access to both. Hence, we map the entire BAR2 region here.
+ * But unnecessarily exposing the migration BAR region to the Guest
+ * has the potential to prevent/corrupt the Guest migration. Hence,
+ * we restrict access to the migration control space from
+ * Guest(Please see mmap/ioctl/read/write override functions).
+ *
+ * Please note that it is OK to expose the entire VF BAR if migration
+ * is not supported or required as this cannot affect the ACC PF
+ * configurations.
+ *
+ * Also the HiSilicon ACC VF devices supported by this driver on
+ * HiSilicon hardware platforms are integrated end point devices
+ * and the platform lacks the capability to perform any PCIe P2P
+ * between these devices.
+ */
+
+ vf_qm->io_base =
+ ioremap(pci_resource_start(vf_dev, VFIO_PCI_BAR2_REGION_INDEX),
+ pci_resource_len(vf_dev, VFIO_PCI_BAR2_REGION_INDEX));
+ if (!vf_qm->io_base)
+ return -EIO;
+
+ vf_qm->fun_type = QM_HW_VF;
+ vf_qm->pdev = vf_dev;
+ mutex_init(&vf_qm->mailbox_lock);
+
+ return 0;
+}
+
+static struct hisi_qm *hisi_acc_get_pf_qm(struct pci_dev *pdev)
+{
+ struct hisi_qm *pf_qm;
+ struct pci_driver *pf_driver;
+
+ if (!pdev->is_virtfn)
+ return NULL;
+
+ switch (pdev->device) {
+ case PCI_DEVICE_ID_HUAWEI_SEC_VF:
+ pf_driver = hisi_sec_get_pf_driver();
+ break;
+ case PCI_DEVICE_ID_HUAWEI_HPRE_VF:
+ pf_driver = hisi_hpre_get_pf_driver();
+ break;
+ case PCI_DEVICE_ID_HUAWEI_ZIP_VF:
+ pf_driver = hisi_zip_get_pf_driver();
+ break;
+ default:
+ return NULL;
+ }
+
+ if (!pf_driver)
+ return NULL;
+
+ pf_qm = pci_iov_get_pf_drvdata(pdev, pf_driver);
+
+ return !IS_ERR(pf_qm) ? pf_qm : NULL;
+}
+
+static int hisi_acc_pci_rw_access_check(struct vfio_device *core_vdev,
+ size_t count, loff_t *ppos,
+ size_t *new_count)
+{
+ unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
+
+ if (index == VFIO_PCI_BAR2_REGION_INDEX) {
+ loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+ resource_size_t end = pci_resource_len(vdev->pdev, index) / 2;
+
+ /* Check if access is for migration control region */
+ if (pos >= end)
+ return -EINVAL;
+
+ *new_count = min(count, (size_t)(end - pos));
+ }
+
+ return 0;
+}
+
+static int hisi_acc_vfio_pci_mmap(struct vfio_device *core_vdev,
+ struct vm_area_struct *vma)
+{
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
+ unsigned int index;
+
+ index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
+ if (index == VFIO_PCI_BAR2_REGION_INDEX) {
+ u64 req_len, pgoff, req_start;
+ resource_size_t end = pci_resource_len(vdev->pdev, index) / 2;
+
+ req_len = vma->vm_end - vma->vm_start;
+ pgoff = vma->vm_pgoff &
+ ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+ req_start = pgoff << PAGE_SHIFT;
+
+ if (req_start + req_len > end)
+ return -EINVAL;
+ }
+
+ return vfio_pci_core_mmap(core_vdev, vma);
+}
+
+static ssize_t hisi_acc_vfio_pci_write(struct vfio_device *core_vdev,
+ const char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ size_t new_count = count;
+ int ret;
+
+ ret = hisi_acc_pci_rw_access_check(core_vdev, count, ppos, &new_count);
+ if (ret)
+ return ret;
+
+ return vfio_pci_core_write(core_vdev, buf, new_count, ppos);
+}
+
+static ssize_t hisi_acc_vfio_pci_read(struct vfio_device *core_vdev,
+ char __user *buf, size_t count,
+ loff_t *ppos)
+{
+ size_t new_count = count;
+ int ret;
+
+ ret = hisi_acc_pci_rw_access_check(core_vdev, count, ppos, &new_count);
+ if (ret)
+ return ret;
+
+ return vfio_pci_core_read(core_vdev, buf, new_count, ppos);
+}
+
+static long hisi_acc_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
+ unsigned long arg)
+{
+ if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
+ struct vfio_pci_core_device *vdev =
+ container_of(core_vdev, struct vfio_pci_core_device, vdev);
+ struct pci_dev *pdev = vdev->pdev;
+ struct vfio_region_info info;
+ unsigned long minsz;
+
+ minsz = offsetofend(struct vfio_region_info, offset);
+
+ if (copy_from_user(&info, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (info.argsz < minsz)
+ return -EINVAL;
+
+ if (info.index == VFIO_PCI_BAR2_REGION_INDEX) {
+ info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+
+ /*
+ * ACC VF dev BAR2 region consists of both functional
+ * register space and migration control register space.
+ * Report only the functional region to Guest.
+ */
+ info.size = pci_resource_len(pdev, info.index) / 2;
+
+ info.flags = VFIO_REGION_INFO_FLAG_READ |
+ VFIO_REGION_INFO_FLAG_WRITE |
+ VFIO_REGION_INFO_FLAG_MMAP;
+
+ return copy_to_user((void __user *)arg, &info, minsz) ?
+ -EFAULT : 0;
+ }
+ }
+ return vfio_pci_core_ioctl(core_vdev, cmd, arg);
+}
+
+static int hisi_acc_vfio_pci_open_device(struct vfio_device *core_vdev)
+{
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(core_vdev,
+ struct hisi_acc_vf_core_device, core_device.vdev);
+ struct vfio_pci_core_device *vdev = &hisi_acc_vdev->core_device;
+ int ret;
+
+ ret = vfio_pci_core_enable(vdev);
+ if (ret)
+ return ret;
+
+ if (core_vdev->ops->migration_set_state) {
+ ret = hisi_acc_vf_qm_init(hisi_acc_vdev);
+ if (ret) {
+ vfio_pci_core_disable(vdev);
+ return ret;
+ }
+ hisi_acc_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+ }
+
+ vfio_pci_core_finish_enable(vdev);
+ return 0;
+}
+
+static void hisi_acc_vfio_pci_close_device(struct vfio_device *core_vdev)
+{
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = container_of(core_vdev,
+ struct hisi_acc_vf_core_device, core_device.vdev);
+ struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm;
+
+ iounmap(vf_qm->io_base);
+ vfio_pci_core_close_device(core_vdev);
+}
+
+static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
+ .name = "hisi-acc-vfio-pci-migration",
+ .open_device = hisi_acc_vfio_pci_open_device,
+ .close_device = hisi_acc_vfio_pci_close_device,
+ .ioctl = hisi_acc_vfio_pci_ioctl,
+ .device_feature = vfio_pci_core_ioctl_feature,
+ .read = hisi_acc_vfio_pci_read,
+ .write = hisi_acc_vfio_pci_write,
+ .mmap = hisi_acc_vfio_pci_mmap,
+ .request = vfio_pci_core_request,
+ .match = vfio_pci_core_match,
+ .migration_set_state = hisi_acc_vfio_pci_set_device_state,
+ .migration_get_state = hisi_acc_vfio_pci_get_device_state,
+};
+
+static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
+ .name = "hisi-acc-vfio-pci",
+ .open_device = hisi_acc_vfio_pci_open_device,
+ .close_device = vfio_pci_core_close_device,
+ .ioctl = vfio_pci_core_ioctl,
+ .device_feature = vfio_pci_core_ioctl_feature,
+ .read = vfio_pci_core_read,
+ .write = vfio_pci_core_write,
+ .mmap = vfio_pci_core_mmap,
+ .request = vfio_pci_core_request,
+ .match = vfio_pci_core_match,
+};
+
+static int
+hisi_acc_vfio_pci_migrn_init(struct hisi_acc_vf_core_device *hisi_acc_vdev,
+ struct pci_dev *pdev, struct hisi_qm *pf_qm)
+{
+ int vf_id;
+
+ vf_id = pci_iov_vf_id(pdev);
+ if (vf_id < 0)
+ return vf_id;
+
+ hisi_acc_vdev->vf_id = vf_id + 1;
+ hisi_acc_vdev->core_device.vdev.migration_flags =
+ VFIO_MIGRATION_STOP_COPY;
+ hisi_acc_vdev->pf_qm = pf_qm;
+ hisi_acc_vdev->vf_dev = pdev;
+ mutex_init(&hisi_acc_vdev->state_mutex);
+
+ return 0;
+}
+
+static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+ struct hisi_acc_vf_core_device *hisi_acc_vdev;
+ struct hisi_qm *pf_qm;
+ int ret;
+
+ hisi_acc_vdev = kzalloc(sizeof(*hisi_acc_vdev), GFP_KERNEL);
+ if (!hisi_acc_vdev)
+ return -ENOMEM;
+
+ pf_qm = hisi_acc_get_pf_qm(pdev);
+ if (pf_qm && pf_qm->ver >= QM_HW_V3) {
+ ret = hisi_acc_vfio_pci_migrn_init(hisi_acc_vdev, pdev, pf_qm);
+ if (!ret) {
+ vfio_pci_core_init_device(&hisi_acc_vdev->core_device, pdev,
+ &hisi_acc_vfio_pci_migrn_ops);
+ } else {
+ pci_warn(pdev, "migration support failed, continue with generic interface\n");
+ vfio_pci_core_init_device(&hisi_acc_vdev->core_device, pdev,
+ &hisi_acc_vfio_pci_ops);
+ }
+ } else {
+ vfio_pci_core_init_device(&hisi_acc_vdev->core_device, pdev,
+ &hisi_acc_vfio_pci_ops);
+ }
+
+ ret = vfio_pci_core_register_device(&hisi_acc_vdev->core_device);
+ if (ret)
+ goto out_free;
+
+ dev_set_drvdata(&pdev->dev, hisi_acc_vdev);
+ return 0;
+
+out_free:
+ vfio_pci_core_uninit_device(&hisi_acc_vdev->core_device);
+ kfree(hisi_acc_vdev);
+ return ret;
+}
+
+static void hisi_acc_vfio_pci_remove(struct pci_dev *pdev)
+{
+ struct hisi_acc_vf_core_device *hisi_acc_vdev = dev_get_drvdata(&pdev->dev);
+
+ vfio_pci_core_unregister_device(&hisi_acc_vdev->core_device);
+ vfio_pci_core_uninit_device(&hisi_acc_vdev->core_device);
+ kfree(hisi_acc_vdev);
+}
+
+static const struct pci_device_id hisi_acc_vfio_pci_table[] = {
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_SEC_VF) },
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_HPRE_VF) },
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_HUAWEI, PCI_DEVICE_ID_HUAWEI_ZIP_VF) },
+ { }
+};
+
+MODULE_DEVICE_TABLE(pci, hisi_acc_vfio_pci_table);
+
+static const struct pci_error_handlers hisi_acc_vf_err_handlers = {
+ .reset_done = hisi_acc_vf_pci_aer_reset_done,
+ .error_detected = vfio_pci_core_aer_err_detected,
+};
+
+static struct pci_driver hisi_acc_vfio_pci_driver = {
+ .name = KBUILD_MODNAME,
+ .id_table = hisi_acc_vfio_pci_table,
+ .probe = hisi_acc_vfio_pci_probe,
+ .remove = hisi_acc_vfio_pci_remove,
+ .err_handler = &hisi_acc_vf_err_handlers,
+};
+
+module_pci_driver(hisi_acc_vfio_pci_driver);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Liu Longfang <liulongfang@huawei.com>");
+MODULE_AUTHOR("Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>");
+MODULE_DESCRIPTION("HiSilicon VFIO PCI - VFIO PCI driver with live migration support for HiSilicon ACC device family");
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
new file mode 100644
index 000000000000..5494f4983bbe
--- /dev/null
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2021 HiSilicon Ltd. */
+
+#ifndef HISI_ACC_VFIO_PCI_H
+#define HISI_ACC_VFIO_PCI_H
+
+#include <linux/hisi_acc_qm.h>
+
+#define MB_POLL_PERIOD_US 10
+#define MB_POLL_TIMEOUT_US 1000
+#define QM_CACHE_WB_START 0x204
+#define QM_CACHE_WB_DONE 0x208
+#define QM_MB_CMD_PAUSE_QM 0xe
+#define QM_ABNORMAL_INT_STATUS 0x100008
+#define QM_IFC_INT_STATUS 0x0028
+#define SEC_CORE_INT_STATUS 0x301008
+#define HPRE_HAC_INT_STATUS 0x301800
+#define HZIP_CORE_INT_STATUS 0x3010AC
+#define QM_QUE_ISO_CFG 0x301154
+
+#define QM_VFT_CFG_RDY 0x10006c
+#define QM_VFT_CFG_OP_WR 0x100058
+#define QM_VFT_CFG_TYPE 0x10005c
+#define QM_VFT_CFG 0x100060
+#define QM_VFT_CFG_OP_ENABLE 0x100054
+#define QM_VFT_CFG_DATA_L 0x100064
+#define QM_VFT_CFG_DATA_H 0x100068
+
+#define ERROR_CHECK_TIMEOUT 100
+#define CHECK_DELAY_TIME 100
+
+#define QM_SQC_VFT_BASE_SHIFT_V2 28
+#define QM_SQC_VFT_BASE_MASK_V2 GENMASK(15, 0)
+#define QM_SQC_VFT_NUM_SHIFT_V2 45
+#define QM_SQC_VFT_NUM_MASK_V2 GENMASK(9, 0)
+
+/* RW regs */
+#define QM_REGS_MAX_LEN 7
+#define QM_REG_ADDR_OFFSET 0x0004
+
+#define QM_XQC_ADDR_OFFSET 32U
+#define QM_VF_AEQ_INT_MASK 0x0004
+#define QM_VF_EQ_INT_MASK 0x000c
+#define QM_IFC_INT_SOURCE_V 0x0020
+#define QM_IFC_INT_MASK 0x0024
+#define QM_IFC_INT_SET_V 0x002c
+#define QM_QUE_ISO_CFG_V 0x0030
+#define QM_PAGE_SIZE 0x0034
+
+#define QM_EQC_DW0 0X8000
+#define QM_AEQC_DW0 0X8020
+
+struct acc_vf_data {
+#define QM_MATCH_SIZE offsetofend(struct acc_vf_data, qm_rsv_state)
+ /* QM match information */
+#define ACC_DEV_MAGIC 0XCDCDCDCDFEEDAACC
+ u64 acc_magic;
+ u32 qp_num;
+ u32 dev_id;
+ u32 que_iso_cfg;
+ u32 qp_base;
+ u32 vf_qm_state;
+ /* QM reserved match information */
+ u32 qm_rsv_state[3];
+
+ /* QM RW regs */
+ u32 aeq_int_mask;
+ u32 eq_int_mask;
+ u32 ifc_int_source;
+ u32 ifc_int_mask;
+ u32 ifc_int_set;
+ u32 page_size;
+
+ /* QM_EQC_DW has 7 regs */
+ u32 qm_eqc_dw[7];
+
+ /* QM_AEQC_DW has 7 regs */
+ u32 qm_aeqc_dw[7];
+
+ /* QM reserved 5 regs */
+ u32 qm_rsv_regs[5];
+ u32 padding;
+ /* qm memory init information */
+ u64 eqe_dma;
+ u64 aeqe_dma;
+ u64 sqc_dma;
+ u64 cqc_dma;
+};
+
+struct hisi_acc_vf_migration_file {
+ struct file *filp;
+ struct mutex lock;
+ bool disabled;
+
+ struct acc_vf_data vf_data;
+ size_t total_length;
+};
+
+struct hisi_acc_vf_core_device {
+ struct vfio_pci_core_device core_device;
+ u8 deferred_reset:1;
+ /* for migration state */
+ struct mutex state_mutex;
+ enum vfio_device_mig_state mig_state;
+ struct pci_dev *pf_dev;
+ struct pci_dev *vf_dev;
+ struct hisi_qm *pf_qm;
+ struct hisi_qm vf_qm;
+ u32 vf_qm_state;
+ int vf_id;
+ /* for reset handler */
+ spinlock_t reset_lock;
+ struct hisi_acc_vf_migration_file *resuming_migf;
+ struct hisi_acc_vf_migration_file *saving_migf;
+};
+#endif /* HISI_ACC_VFIO_PCI_H */
diff --git a/drivers/vfio/pci/mlx5/Kconfig b/drivers/vfio/pci/mlx5/Kconfig
new file mode 100644
index 000000000000..29ba9c504a75
--- /dev/null
+++ b/drivers/vfio/pci/mlx5/Kconfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config MLX5_VFIO_PCI
+ tristate "VFIO support for MLX5 PCI devices"
+ depends on MLX5_CORE
+ depends on VFIO_PCI_CORE
+ help
+ This provides migration support for MLX5 devices using the VFIO
+ framework.
+
+ If you don't know what to do here, say N.
diff --git a/drivers/vfio/pci/mlx5/Makefile b/drivers/vfio/pci/mlx5/Makefile
new file mode 100644
index 000000000000..689627da7ff5
--- /dev/null
+++ b/drivers/vfio/pci/mlx5/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5-vfio-pci.o
+mlx5-vfio-pci-y := main.o cmd.o
+
diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
new file mode 100644
index 000000000000..5c9f9218cc1d
--- /dev/null
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -0,0 +1,259 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include "cmd.h"
+
+int mlx5vf_cmd_suspend_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod)
+{
+ struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
+ u32 out[MLX5_ST_SZ_DW(suspend_vhca_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(suspend_vhca_in)] = {};
+ int ret;
+
+ if (!mdev)
+ return -ENOTCONN;
+
+ MLX5_SET(suspend_vhca_in, in, opcode, MLX5_CMD_OP_SUSPEND_VHCA);
+ MLX5_SET(suspend_vhca_in, in, vhca_id, vhca_id);
+ MLX5_SET(suspend_vhca_in, in, op_mod, op_mod);
+
+ ret = mlx5_cmd_exec_inout(mdev, suspend_vhca, in, out);
+ mlx5_vf_put_core_dev(mdev);
+ return ret;
+}
+
+int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod)
+{
+ struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
+ u32 out[MLX5_ST_SZ_DW(resume_vhca_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(resume_vhca_in)] = {};
+ int ret;
+
+ if (!mdev)
+ return -ENOTCONN;
+
+ MLX5_SET(resume_vhca_in, in, opcode, MLX5_CMD_OP_RESUME_VHCA);
+ MLX5_SET(resume_vhca_in, in, vhca_id, vhca_id);
+ MLX5_SET(resume_vhca_in, in, op_mod, op_mod);
+
+ ret = mlx5_cmd_exec_inout(mdev, resume_vhca, in, out);
+ mlx5_vf_put_core_dev(mdev);
+ return ret;
+}
+
+int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
+ size_t *state_size)
+{
+ struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
+ u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {};
+ int ret;
+
+ if (!mdev)
+ return -ENOTCONN;
+
+ MLX5_SET(query_vhca_migration_state_in, in, opcode,
+ MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE);
+ MLX5_SET(query_vhca_migration_state_in, in, vhca_id, vhca_id);
+ MLX5_SET(query_vhca_migration_state_in, in, op_mod, 0);
+
+ ret = mlx5_cmd_exec_inout(mdev, query_vhca_migration_state, in, out);
+ if (ret)
+ goto end;
+
+ *state_size = MLX5_GET(query_vhca_migration_state_out, out,
+ required_umem_size);
+
+end:
+ mlx5_vf_put_core_dev(mdev);
+ return ret;
+}
+
+int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id)
+{
+ struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
+ u32 in[MLX5_ST_SZ_DW(query_hca_cap_in)] = {};
+ int out_size;
+ void *out;
+ int ret;
+
+ if (!mdev)
+ return -ENOTCONN;
+
+ out_size = MLX5_ST_SZ_BYTES(query_hca_cap_out);
+ out = kzalloc(out_size, GFP_KERNEL);
+ if (!out) {
+ ret = -ENOMEM;
+ goto end;
+ }
+
+ MLX5_SET(query_hca_cap_in, in, opcode, MLX5_CMD_OP_QUERY_HCA_CAP);
+ MLX5_SET(query_hca_cap_in, in, other_function, 1);
+ MLX5_SET(query_hca_cap_in, in, function_id, function_id);
+ MLX5_SET(query_hca_cap_in, in, op_mod,
+ MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE << 1 |
+ HCA_CAP_OPMOD_GET_CUR);
+
+ ret = mlx5_cmd_exec_inout(mdev, query_hca_cap, in, out);
+ if (ret)
+ goto err_exec;
+
+ *vhca_id = MLX5_GET(query_hca_cap_out, out,
+ capability.cmd_hca_cap.vhca_id);
+
+err_exec:
+ kfree(out);
+end:
+ mlx5_vf_put_core_dev(mdev);
+ return ret;
+}
+
+static int _create_state_mkey(struct mlx5_core_dev *mdev, u32 pdn,
+ struct mlx5_vf_migration_file *migf, u32 *mkey)
+{
+ size_t npages = DIV_ROUND_UP(migf->total_length, PAGE_SIZE);
+ struct sg_dma_page_iter dma_iter;
+ int err = 0, inlen;
+ __be64 *mtt;
+ void *mkc;
+ u32 *in;
+
+ inlen = MLX5_ST_SZ_BYTES(create_mkey_in) +
+ sizeof(*mtt) * round_up(npages, 2);
+
+ in = kvzalloc(inlen, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
+ DIV_ROUND_UP(npages, 2));
+ mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
+
+ for_each_sgtable_dma_page(&migf->table.sgt, &dma_iter, 0)
+ *mtt++ = cpu_to_be64(sg_page_iter_dma_address(&dma_iter));
+
+ mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+ MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT);
+ MLX5_SET(mkc, mkc, lr, 1);
+ MLX5_SET(mkc, mkc, lw, 1);
+ MLX5_SET(mkc, mkc, rr, 1);
+ MLX5_SET(mkc, mkc, rw, 1);
+ MLX5_SET(mkc, mkc, pd, pdn);
+ MLX5_SET(mkc, mkc, bsf_octword_size, 0);
+ MLX5_SET(mkc, mkc, qpn, 0xffffff);
+ MLX5_SET(mkc, mkc, log_page_size, PAGE_SHIFT);
+ MLX5_SET(mkc, mkc, translations_octword_size, DIV_ROUND_UP(npages, 2));
+ MLX5_SET64(mkc, mkc, len, migf->total_length);
+ err = mlx5_core_create_mkey(mdev, mkey, in, inlen);
+ kvfree(in);
+ return err;
+}
+
+int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
+ struct mlx5_vf_migration_file *migf)
+{
+ struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
+ u32 out[MLX5_ST_SZ_DW(save_vhca_state_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
+ u32 pdn, mkey;
+ int err;
+
+ if (!mdev)
+ return -ENOTCONN;
+
+ err = mlx5_core_alloc_pd(mdev, &pdn);
+ if (err)
+ goto end;
+
+ err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE,
+ 0);
+ if (err)
+ goto err_dma_map;
+
+ err = _create_state_mkey(mdev, pdn, migf, &mkey);
+ if (err)
+ goto err_create_mkey;
+
+ MLX5_SET(save_vhca_state_in, in, opcode,
+ MLX5_CMD_OP_SAVE_VHCA_STATE);
+ MLX5_SET(save_vhca_state_in, in, op_mod, 0);
+ MLX5_SET(save_vhca_state_in, in, vhca_id, vhca_id);
+ MLX5_SET(save_vhca_state_in, in, mkey, mkey);
+ MLX5_SET(save_vhca_state_in, in, size, migf->total_length);
+
+ err = mlx5_cmd_exec_inout(mdev, save_vhca_state, in, out);
+ if (err)
+ goto err_exec;
+
+ migf->total_length =
+ MLX5_GET(save_vhca_state_out, out, actual_image_size);
+
+ mlx5_core_destroy_mkey(mdev, mkey);
+ mlx5_core_dealloc_pd(mdev, pdn);
+ dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
+ mlx5_vf_put_core_dev(mdev);
+
+ return 0;
+
+err_exec:
+ mlx5_core_destroy_mkey(mdev, mkey);
+err_create_mkey:
+ dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_FROM_DEVICE, 0);
+err_dma_map:
+ mlx5_core_dealloc_pd(mdev, pdn);
+end:
+ mlx5_vf_put_core_dev(mdev);
+ return err;
+}
+
+int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
+ struct mlx5_vf_migration_file *migf)
+{
+ struct mlx5_core_dev *mdev = mlx5_vf_get_core_dev(pdev);
+ u32 out[MLX5_ST_SZ_DW(save_vhca_state_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(save_vhca_state_in)] = {};
+ u32 pdn, mkey;
+ int err;
+
+ if (!mdev)
+ return -ENOTCONN;
+
+ mutex_lock(&migf->lock);
+ if (!migf->total_length) {
+ err = -EINVAL;
+ goto end;
+ }
+
+ err = mlx5_core_alloc_pd(mdev, &pdn);
+ if (err)
+ goto end;
+
+ err = dma_map_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0);
+ if (err)
+ goto err_reg;
+
+ err = _create_state_mkey(mdev, pdn, migf, &mkey);
+ if (err)
+ goto err_mkey;
+
+ MLX5_SET(load_vhca_state_in, in, opcode,
+ MLX5_CMD_OP_LOAD_VHCA_STATE);
+ MLX5_SET(load_vhca_state_in, in, op_mod, 0);
+ MLX5_SET(load_vhca_state_in, in, vhca_id, vhca_id);
+ MLX5_SET(load_vhca_state_in, in, mkey, mkey);
+ MLX5_SET(load_vhca_state_in, in, size, migf->total_length);
+
+ err = mlx5_cmd_exec_inout(mdev, load_vhca_state, in, out);
+
+ mlx5_core_destroy_mkey(mdev, mkey);
+err_mkey:
+ dma_unmap_sgtable(mdev->device, &migf->table.sgt, DMA_TO_DEVICE, 0);
+err_reg:
+ mlx5_core_dealloc_pd(mdev, pdn);
+end:
+ mlx5_vf_put_core_dev(mdev);
+ mutex_unlock(&migf->lock);
+ return err;
+}
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
new file mode 100644
index 000000000000..1392a11a9cc0
--- /dev/null
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/*
+ * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ */
+
+#ifndef MLX5_VFIO_CMD_H
+#define MLX5_VFIO_CMD_H
+
+#include <linux/kernel.h>
+#include <linux/mlx5/driver.h>
+
+struct mlx5_vf_migration_file {
+ struct file *filp;
+ struct mutex lock;
+ bool disabled;
+
+ struct sg_append_table table;
+ size_t total_length;
+ size_t allocated_length;
+
+ /* Optimize mlx5vf_get_migration_page() for sequential access */
+ struct scatterlist *last_offset_sg;
+ unsigned int sg_last_entry;
+ unsigned long last_offset;
+};
+
+int mlx5vf_cmd_suspend_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
+int mlx5vf_cmd_resume_vhca(struct pci_dev *pdev, u16 vhca_id, u16 op_mod);
+int mlx5vf_cmd_query_vhca_migration_state(struct pci_dev *pdev, u16 vhca_id,
+ size_t *state_size);
+int mlx5vf_cmd_get_vhca_id(struct pci_dev *pdev, u16 function_id, u16 *vhca_id);
+int mlx5vf_cmd_save_vhca_state(struct pci_dev *pdev, u16 vhca_id,
+ struct mlx5_vf_migration_file *migf);
+int mlx5vf_cmd_load_vhca_state(struct pci_dev *pdev, u16 vhca_id,
+ struct mlx5_vf_migration_file *migf);
+#endif /* MLX5_VFIO_CMD_H */
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
new file mode 100644
index 000000000000..bbec5d288fee
--- /dev/null
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -0,0 +1,676 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved
+ */
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/sched/mm.h>
+#include <linux/vfio_pci_core.h>
+#include <linux/anon_inodes.h>
+
+#include "cmd.h"
+
+/* Arbitrary to prevent userspace from consuming endless memory */
+#define MAX_MIGRATION_SIZE (512*1024*1024)
+
+struct mlx5vf_pci_core_device {
+ struct vfio_pci_core_device core_device;
+ u16 vhca_id;
+ u8 migrate_cap:1;
+ u8 deferred_reset:1;
+ /* protect migration state */
+ struct mutex state_mutex;
+ enum vfio_device_mig_state mig_state;
+ /* protect the reset_done flow */
+ spinlock_t reset_lock;
+ struct mlx5_vf_migration_file *resuming_migf;
+ struct mlx5_vf_migration_file *saving_migf;
+};
+
+static struct page *
+mlx5vf_get_migration_page(struct mlx5_vf_migration_file *migf,
+ unsigned long offset)
+{
+ unsigned long cur_offset = 0;
+ struct scatterlist *sg;
+ unsigned int i;
+
+ /* All accesses are sequential */
+ if (offset < migf->last_offset || !migf->last_offset_sg) {
+ migf->last_offset = 0;
+ migf->last_offset_sg = migf->table.sgt.sgl;
+ migf->sg_last_entry = 0;
+ }
+
+ cur_offset = migf->last_offset;
+
+ for_each_sg(migf->last_offset_sg, sg,
+ migf->table.sgt.orig_nents - migf->sg_last_entry, i) {
+ if (offset < sg->length + cur_offset) {
+ migf->last_offset_sg = sg;
+ migf->sg_last_entry += i;
+ migf->last_offset = cur_offset;
+ return nth_page(sg_page(sg),
+ (offset - cur_offset) / PAGE_SIZE);
+ }
+ cur_offset += sg->length;
+ }
+ return NULL;
+}
+
+static int mlx5vf_add_migration_pages(struct mlx5_vf_migration_file *migf,
+ unsigned int npages)
+{
+ unsigned int to_alloc = npages;
+ struct page **page_list;
+ unsigned long filled;
+ unsigned int to_fill;
+ int ret;
+
+ to_fill = min_t(unsigned int, npages, PAGE_SIZE / sizeof(*page_list));
+ page_list = kvzalloc(to_fill * sizeof(*page_list), GFP_KERNEL);
+ if (!page_list)
+ return -ENOMEM;
+
+ do {
+ filled = alloc_pages_bulk_array(GFP_KERNEL, to_fill, page_list);
+ if (!filled) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ to_alloc -= filled;
+ ret = sg_alloc_append_table_from_pages(
+ &migf->table, page_list, filled, 0,
+ filled << PAGE_SHIFT, UINT_MAX, SG_MAX_SINGLE_ALLOC,
+ GFP_KERNEL);
+
+ if (ret)
+ goto err;
+ migf->allocated_length += filled * PAGE_SIZE;
+ /* clean input for another bulk allocation */
+ memset(page_list, 0, filled * sizeof(*page_list));
+ to_fill = min_t(unsigned int, to_alloc,
+ PAGE_SIZE / sizeof(*page_list));
+ } while (to_alloc > 0);
+
+ kvfree(page_list);
+ return 0;
+
+err:
+ kvfree(page_list);
+ return ret;
+}
+
+static void mlx5vf_disable_fd(struct mlx5_vf_migration_file *migf)
+{
+ struct sg_page_iter sg_iter;
+
+ mutex_lock(&migf->lock);
+ /* Undo alloc_pages_bulk_array() */
+ for_each_sgtable_page(&migf->table.sgt, &sg_iter, 0)
+ __free_page(sg_page_iter_page(&sg_iter));
+ sg_free_append_table(&migf->table);
+ migf->disabled = true;
+ migf->total_length = 0;
+ migf->allocated_length = 0;
+ migf->filp->f_pos = 0;
+ mutex_unlock(&migf->lock);
+}
+
+static int mlx5vf_release_file(struct inode *inode, struct file *filp)
+{
+ struct mlx5_vf_migration_file *migf = filp->private_data;
+
+ mlx5vf_disable_fd(migf);
+ mutex_destroy(&migf->lock);
+ kfree(migf);
+ return 0;
+}
+
+static ssize_t mlx5vf_save_read(struct file *filp, char __user *buf, size_t len,
+ loff_t *pos)
+{
+ struct mlx5_vf_migration_file *migf = filp->private_data;
+ ssize_t done = 0;
+
+ if (pos)
+ return -ESPIPE;
+ pos = &filp->f_pos;
+
+ mutex_lock(&migf->lock);
+ if (*pos > migf->total_length) {
+ done = -EINVAL;
+ goto out_unlock;
+ }
+ if (migf->disabled) {
+ done = -ENODEV;
+ goto out_unlock;
+ }
+
+ len = min_t(size_t, migf->total_length - *pos, len);
+ while (len) {
+ size_t page_offset;
+ struct page *page;
+ size_t page_len;
+ u8 *from_buff;
+ int ret;
+
+ page_offset = (*pos) % PAGE_SIZE;
+ page = mlx5vf_get_migration_page(migf, *pos - page_offset);
+ if (!page) {
+ if (done == 0)
+ done = -EINVAL;
+ goto out_unlock;
+ }
+
+ page_len = min_t(size_t, len, PAGE_SIZE - page_offset);
+ from_buff = kmap_local_page(page);
+ ret = copy_to_user(buf, from_buff + page_offset, page_len);
+ kunmap_local(from_buff);
+ if (ret) {
+ done = -EFAULT;
+ goto out_unlock;
+ }
+ *pos += page_len;
+ len -= page_len;
+ done += page_len;
+ buf += page_len;
+ }
+
+out_unlock:
+ mutex_unlock(&migf->lock);
+ return done;
+}
+
+static const struct file_operations mlx5vf_save_fops = {
+ .owner = THIS_MODULE,
+ .read = mlx5vf_save_read,
+ .release = mlx5vf_release_file,
+ .llseek = no_llseek,
+};
+
+static struct mlx5_vf_migration_file *
+mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev)
+{
+ struct mlx5_vf_migration_file *migf;
+ int ret;
+
+ migf = kzalloc(sizeof(*migf), GFP_KERNEL);
+ if (!migf)
+ return ERR_PTR(-ENOMEM);
+
+ migf->filp = anon_inode_getfile("mlx5vf_mig", &mlx5vf_save_fops, migf,
+ O_RDONLY);
+ if (IS_ERR(migf->filp)) {
+ int err = PTR_ERR(migf->filp);
+
+ kfree(migf);
+ return ERR_PTR(err);
+ }
+
+ stream_open(migf->filp->f_inode, migf->filp);
+ mutex_init(&migf->lock);
+
+ ret = mlx5vf_cmd_query_vhca_migration_state(
+ mvdev->core_device.pdev, mvdev->vhca_id, &migf->total_length);
+ if (ret)
+ goto out_free;
+
+ ret = mlx5vf_add_migration_pages(
+ migf, DIV_ROUND_UP_ULL(migf->total_length, PAGE_SIZE));
+ if (ret)
+ goto out_free;
+
+ ret = mlx5vf_cmd_save_vhca_state(mvdev->core_device.pdev,
+ mvdev->vhca_id, migf);
+ if (ret)
+ goto out_free;
+ return migf;
+out_free:
+ fput(migf->filp);
+ return ERR_PTR(ret);
+}
+
+static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
+ size_t len, loff_t *pos)
+{
+ struct mlx5_vf_migration_file *migf = filp->private_data;
+ loff_t requested_length;
+ ssize_t done = 0;
+
+ if (pos)
+ return -ESPIPE;
+ pos = &filp->f_pos;
+
+ if (*pos < 0 ||
+ check_add_overflow((loff_t)len, *pos, &requested_length))
+ return -EINVAL;
+
+ if (requested_length > MAX_MIGRATION_SIZE)
+ return -ENOMEM;
+
+ mutex_lock(&migf->lock);
+ if (migf->disabled) {
+ done = -ENODEV;
+ goto out_unlock;
+ }
+
+ if (migf->allocated_length < requested_length) {
+ done = mlx5vf_add_migration_pages(
+ migf,
+ DIV_ROUND_UP(requested_length - migf->allocated_length,
+ PAGE_SIZE));
+ if (done)
+ goto out_unlock;
+ }
+
+ while (len) {
+ size_t page_offset;
+ struct page *page;
+ size_t page_len;
+ u8 *to_buff;
+ int ret;
+
+ page_offset = (*pos) % PAGE_SIZE;
+ page = mlx5vf_get_migration_page(migf, *pos - page_offset);
+ if (!page) {
+ if (done == 0)
+ done = -EINVAL;
+ goto out_unlock;
+ }
+
+ page_len = min_t(size_t, len, PAGE_SIZE - page_offset);
+ to_buff = kmap_local_page(page);
+ ret = copy_from_user(to_buff + page_offset, buf, page_len);
+ kunmap_local(to_buff);
+ if (ret) {
+ done = -EFAULT;
+ goto out_unlock;
+ }
+ *pos += page_len;
+ len -= page_len;
+ done += page_len;
+ buf += page_len;
+ migf->total_length += page_len;
+ }
+out_unlock:
+ mutex_unlock(&migf->lock);
+ return done;
+}
+
+static const struct file_operations mlx5vf_resume_fops = {
+ .owner = THIS_MODULE,
+ .write = mlx5vf_resume_write,
+ .release = mlx5vf_release_file,
+ .llseek = no_llseek,
+};
+
+static struct mlx5_vf_migration_file *
+mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
+{
+ struct mlx5_vf_migration_file *migf;
+
+ migf = kzalloc(sizeof(*migf), GFP_KERNEL);
+ if (!migf)
+ return ERR_PTR(-ENOMEM);
+
+ migf->filp = anon_inode_getfile("mlx5vf_mig", &mlx5vf_resume_fops, migf,
+ O_WRONLY);
+ if (IS_ERR(migf->filp)) {
+ int err = PTR_ERR(migf->filp);
+
+ kfree(migf);
+ return ERR_PTR(err);
+ }
+ stream_open(migf->filp->f_inode, migf->filp);
+ mutex_init(&migf->lock);
+ return migf;
+}
+
+static void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev)
+{
+ if (mvdev->resuming_migf) {
+ mlx5vf_disable_fd(mvdev->resuming_migf);
+ fput(mvdev->resuming_migf->filp);
+ mvdev->resuming_migf = NULL;
+ }
+ if (mvdev->saving_migf) {
+ mlx5vf_disable_fd(mvdev->saving_migf);
+ fput(mvdev->saving_migf->filp);
+ mvdev->saving_migf = NULL;
+ }
+}
+
+static struct file *
+mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
+ u32 new)
+{
+ u32 cur = mvdev->mig_state;
+ int ret;
+
+ if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_STOP) {
+ ret = mlx5vf_cmd_suspend_vhca(
+ mvdev->core_device.pdev, mvdev->vhca_id,
+ MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_RESPONDER);
+ if (ret)
+ return ERR_PTR(ret);
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RUNNING_P2P) {
+ ret = mlx5vf_cmd_resume_vhca(
+ mvdev->core_device.pdev, mvdev->vhca_id,
+ MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_RESPONDER);
+ if (ret)
+ return ERR_PTR(ret);
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_RUNNING && new == VFIO_DEVICE_STATE_RUNNING_P2P) {
+ ret = mlx5vf_cmd_suspend_vhca(
+ mvdev->core_device.pdev, mvdev->vhca_id,
+ MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_INITIATOR);
+ if (ret)
+ return ERR_PTR(ret);
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && new == VFIO_DEVICE_STATE_RUNNING) {
+ ret = mlx5vf_cmd_resume_vhca(
+ mvdev->core_device.pdev, mvdev->vhca_id,
+ MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_INITIATOR);
+ if (ret)
+ return ERR_PTR(ret);
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_STOP_COPY) {
+ struct mlx5_vf_migration_file *migf;
+
+ migf = mlx5vf_pci_save_device_data(mvdev);
+ if (IS_ERR(migf))
+ return ERR_CAST(migf);
+ get_file(migf->filp);
+ mvdev->saving_migf = migf;
+ return migf->filp;
+ }
+
+ if ((cur == VFIO_DEVICE_STATE_STOP_COPY && new == VFIO_DEVICE_STATE_STOP)) {
+ mlx5vf_disable_fds(mvdev);
+ return NULL;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_STOP && new == VFIO_DEVICE_STATE_RESUMING) {
+ struct mlx5_vf_migration_file *migf;
+
+ migf = mlx5vf_pci_resume_device_data(mvdev);
+ if (IS_ERR(migf))
+ return ERR_CAST(migf);
+ get_file(migf->filp);
+ mvdev->resuming_migf = migf;
+ return migf->filp;
+ }
+
+ if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) {
+ ret = mlx5vf_cmd_load_vhca_state(mvdev->core_device.pdev,
+ mvdev->vhca_id,
+ mvdev->resuming_migf);
+ if (ret)
+ return ERR_PTR(ret);
+ mlx5vf_disable_fds(mvdev);
+ return NULL;
+ }
+
+ /*
+ * vfio_mig_get_next_state() does not use arcs other than the above
+ */
+ WARN_ON(true);
+ return ERR_PTR(-EINVAL);
+}
+
+/*
+ * This function is called in all state_mutex unlock cases to
+ * handle a 'deferred_reset' if exists.
+ */
+static void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev)
+{
+again:
+ spin_lock(&mvdev->reset_lock);
+ if (mvdev->deferred_reset) {
+ mvdev->deferred_reset = false;
+ spin_unlock(&mvdev->reset_lock);
+ mvdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+ mlx5vf_disable_fds(mvdev);
+ goto again;
+ }
+ mutex_unlock(&mvdev->state_mutex);
+ spin_unlock(&mvdev->reset_lock);
+}
+
+static struct file *
+mlx5vf_pci_set_device_state(struct vfio_device *vdev,
+ enum vfio_device_mig_state new_state)
+{
+ struct mlx5vf_pci_core_device *mvdev = container_of(
+ vdev, struct mlx5vf_pci_core_device, core_device.vdev);
+ enum vfio_device_mig_state next_state;
+ struct file *res = NULL;
+ int ret;
+
+ mutex_lock(&mvdev->state_mutex);
+ while (new_state != mvdev->mig_state) {
+ ret = vfio_mig_get_next_state(vdev, mvdev->mig_state,
+ new_state, &next_state);
+ if (ret) {
+ res = ERR_PTR(ret);
+ break;
+ }
+ res = mlx5vf_pci_step_device_state_locked(mvdev, next_state);
+ if (IS_ERR(res))
+ break;
+ mvdev->mig_state = next_state;
+ if (WARN_ON(res && new_state != mvdev->mig_state)) {
+ fput(res);
+ res = ERR_PTR(-EINVAL);
+ break;
+ }
+ }
+ mlx5vf_state_mutex_unlock(mvdev);
+ return res;
+}
+
+static int mlx5vf_pci_get_device_state(struct vfio_device *vdev,
+ enum vfio_device_mig_state *curr_state)
+{
+ struct mlx5vf_pci_core_device *mvdev = container_of(
+ vdev, struct mlx5vf_pci_core_device, core_device.vdev);
+
+ mutex_lock(&mvdev->state_mutex);
+ *curr_state = mvdev->mig_state;
+ mlx5vf_state_mutex_unlock(mvdev);
+ return 0;
+}
+
+static void mlx5vf_pci_aer_reset_done(struct pci_dev *pdev)
+{
+ struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
+
+ if (!mvdev->migrate_cap)
+ return;
+
+ /*
+ * As the higher VFIO layers are holding locks across reset and using
+ * those same locks with the mm_lock we need to prevent ABBA deadlock
+ * with the state_mutex and mm_lock.
+ * In case the state_mutex was taken already we defer the cleanup work
+ * to the unlock flow of the other running context.
+ */
+ spin_lock(&mvdev->reset_lock);
+ mvdev->deferred_reset = true;
+ if (!mutex_trylock(&mvdev->state_mutex)) {
+ spin_unlock(&mvdev->reset_lock);
+ return;
+ }
+ spin_unlock(&mvdev->reset_lock);
+ mlx5vf_state_mutex_unlock(mvdev);
+}
+
+static int mlx5vf_pci_open_device(struct vfio_device *core_vdev)
+{
+ struct mlx5vf_pci_core_device *mvdev = container_of(
+ core_vdev, struct mlx5vf_pci_core_device, core_device.vdev);
+ struct vfio_pci_core_device *vdev = &mvdev->core_device;
+ int vf_id;
+ int ret;
+
+ ret = vfio_pci_core_enable(vdev);
+ if (ret)
+ return ret;
+
+ if (!mvdev->migrate_cap) {
+ vfio_pci_core_finish_enable(vdev);
+ return 0;
+ }
+
+ vf_id = pci_iov_vf_id(vdev->pdev);
+ if (vf_id < 0) {
+ ret = vf_id;
+ goto out_disable;
+ }
+
+ ret = mlx5vf_cmd_get_vhca_id(vdev->pdev, vf_id + 1, &mvdev->vhca_id);
+ if (ret)
+ goto out_disable;
+
+ mvdev->mig_state = VFIO_DEVICE_STATE_RUNNING;
+ vfio_pci_core_finish_enable(vdev);
+ return 0;
+out_disable:
+ vfio_pci_core_disable(vdev);
+ return ret;
+}
+
+static void mlx5vf_pci_close_device(struct vfio_device *core_vdev)
+{
+ struct mlx5vf_pci_core_device *mvdev = container_of(
+ core_vdev, struct mlx5vf_pci_core_device, core_device.vdev);
+
+ mlx5vf_disable_fds(mvdev);
+ vfio_pci_core_close_device(core_vdev);
+}
+
+static const struct vfio_device_ops mlx5vf_pci_ops = {
+ .name = "mlx5-vfio-pci",
+ .open_device = mlx5vf_pci_open_device,
+ .close_device = mlx5vf_pci_close_device,
+ .ioctl = vfio_pci_core_ioctl,
+ .device_feature = vfio_pci_core_ioctl_feature,
+ .read = vfio_pci_core_read,
+ .write = vfio_pci_core_write,
+ .mmap = vfio_pci_core_mmap,
+ .request = vfio_pci_core_request,
+ .match = vfio_pci_core_match,
+ .migration_set_state = mlx5vf_pci_set_device_state,
+ .migration_get_state = mlx5vf_pci_get_device_state,
+};
+
+static int mlx5vf_pci_probe(struct pci_dev *pdev,
+ const struct pci_device_id *id)
+{
+ struct mlx5vf_pci_core_device *mvdev;
+ int ret;
+
+ mvdev = kzalloc(sizeof(*mvdev), GFP_KERNEL);
+ if (!mvdev)
+ return -ENOMEM;
+ vfio_pci_core_init_device(&mvdev->core_device, pdev, &mlx5vf_pci_ops);
+
+ if (pdev->is_virtfn) {
+ struct mlx5_core_dev *mdev =
+ mlx5_vf_get_core_dev(pdev);
+
+ if (mdev) {
+ if (MLX5_CAP_GEN(mdev, migration)) {
+ mvdev->migrate_cap = 1;
+ mvdev->core_device.vdev.migration_flags =
+ VFIO_MIGRATION_STOP_COPY |
+ VFIO_MIGRATION_P2P;
+ mutex_init(&mvdev->state_mutex);
+ spin_lock_init(&mvdev->reset_lock);
+ }
+ mlx5_vf_put_core_dev(mdev);
+ }
+ }
+
+ ret = vfio_pci_core_register_device(&mvdev->core_device);
+ if (ret)
+ goto out_free;
+
+ dev_set_drvdata(&pdev->dev, mvdev);
+ return 0;
+
+out_free:
+ vfio_pci_core_uninit_device(&mvdev->core_device);
+ kfree(mvdev);
+ return ret;
+}
+
+static void mlx5vf_pci_remove(struct pci_dev *pdev)
+{
+ struct mlx5vf_pci_core_device *mvdev = dev_get_drvdata(&pdev->dev);
+
+ vfio_pci_core_unregister_device(&mvdev->core_device);
+ vfio_pci_core_uninit_device(&mvdev->core_device);
+ kfree(mvdev);
+}
+
+static const struct pci_device_id mlx5vf_pci_table[] = {
+ { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_MELLANOX, 0x101e) }, /* ConnectX Family mlx5Gen Virtual Function */
+ {}
+};
+
+MODULE_DEVICE_TABLE(pci, mlx5vf_pci_table);
+
+static const struct pci_error_handlers mlx5vf_err_handlers = {
+ .reset_done = mlx5vf_pci_aer_reset_done,
+ .error_detected = vfio_pci_core_aer_err_detected,
+};
+
+static struct pci_driver mlx5vf_pci_driver = {
+ .name = KBUILD_MODNAME,
+ .id_table = mlx5vf_pci_table,
+ .probe = mlx5vf_pci_probe,
+ .remove = mlx5vf_pci_remove,
+ .err_handler = &mlx5vf_err_handlers,
+};
+
+static void __exit mlx5vf_pci_cleanup(void)
+{
+ pci_unregister_driver(&mlx5vf_pci_driver);
+}
+
+static int __init mlx5vf_pci_init(void)
+{
+ return pci_register_driver(&mlx5vf_pci_driver);
+}
+
+module_init(mlx5vf_pci_init);
+module_exit(mlx5vf_pci_cleanup);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Max Gurtovoy <mgurtovoy@nvidia.com>");
+MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>");
+MODULE_DESCRIPTION(
+ "MLX5 VFIO PCI - User Level meta-driver for MLX5 device family");
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index a5ce92beb655..2b047469e02f 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -130,6 +130,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
.open_device = vfio_pci_open_device,
.close_device = vfio_pci_core_close_device,
.ioctl = vfio_pci_core_ioctl,
+ .device_feature = vfio_pci_core_ioctl_feature,
.read = vfio_pci_core_read,
.write = vfio_pci_core_write,
.mmap = vfio_pci_core_mmap,
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index f948e6cd2993..b7bb16f92ac6 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -228,6 +228,19 @@ int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, pci_power_t stat
if (!ret) {
/* D3 might be unsupported via quirk, skip unless in D3 */
if (needs_save && pdev->current_state >= PCI_D3hot) {
+ /*
+ * The current PCI state will be saved locally in
+ * 'pm_save' during the D3hot transition. When the
+ * device state is changed to D0 again with the current
+ * function, then pci_store_saved_state() will restore
+ * the state and will free the memory pointed by
+ * 'pm_save'. There are few cases where the PCI power
+ * state can be changed to D0 without the involvement
+ * of the driver. For these cases, free the earlier
+ * allocated memory first before overwriting 'pm_save'
+ * to prevent the memory leak.
+ */
+ kfree(vdev->pm_save);
vdev->pm_save = pci_store_saved_state(pdev);
} else if (needs_restore) {
pci_load_and_free_saved_state(pdev, &vdev->pm_save);
@@ -322,6 +335,17 @@ void vfio_pci_core_disable(struct vfio_pci_core_device *vdev)
/* For needs_reset */
lockdep_assert_held(&vdev->vdev.dev_set->lock);
+ /*
+ * This function can be invoked while the power state is non-D0.
+ * This function calls __pci_reset_function_locked() which internally
+ * can use pci_pm_reset() for the function reset. pci_pm_reset() will
+ * fail if the power state is non-D0. Also, for the devices which
+ * have NoSoftRst-, the reset function can cause the PCI config space
+ * reset without restoring the original state (saved locally in
+ * 'vdev->pm_save').
+ */
+ vfio_pci_set_power_state(vdev, PCI_D0);
+
/* Stop the device from further DMA */
pci_clear_master(pdev);
@@ -921,6 +945,19 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
return -EINVAL;
vfio_pci_zap_and_down_write_memory_lock(vdev);
+
+ /*
+ * This function can be invoked while the power state is non-D0.
+ * If pci_try_reset_function() has been called while the power
+ * state is non-D0, then pci_try_reset_function() will
+ * internally set the power state to D0 without vfio driver
+ * involvement. For the devices which have NoSoftRst-, the
+ * reset function can cause the PCI config space reset without
+ * restoring the original state (saved locally in
+ * 'vdev->pm_save').
+ */
+ vfio_pci_set_power_state(vdev, PCI_D0);
+
ret = pci_try_reset_function(vdev->pdev);
up_write(&vdev->memory_lock);
@@ -1114,70 +1151,50 @@ hot_reset_release:
return vfio_pci_ioeventfd(vdev, ioeventfd.offset,
ioeventfd.data, count, ioeventfd.fd);
- } else if (cmd == VFIO_DEVICE_FEATURE) {
- struct vfio_device_feature feature;
- uuid_t uuid;
-
- minsz = offsetofend(struct vfio_device_feature, flags);
-
- if (copy_from_user(&feature, (void __user *)arg, minsz))
- return -EFAULT;
-
- if (feature.argsz < minsz)
- return -EINVAL;
-
- /* Check unknown flags */
- if (feature.flags & ~(VFIO_DEVICE_FEATURE_MASK |
- VFIO_DEVICE_FEATURE_SET |
- VFIO_DEVICE_FEATURE_GET |
- VFIO_DEVICE_FEATURE_PROBE))
- return -EINVAL;
-
- /* GET & SET are mutually exclusive except with PROBE */
- if (!(feature.flags & VFIO_DEVICE_FEATURE_PROBE) &&
- (feature.flags & VFIO_DEVICE_FEATURE_SET) &&
- (feature.flags & VFIO_DEVICE_FEATURE_GET))
- return -EINVAL;
-
- switch (feature.flags & VFIO_DEVICE_FEATURE_MASK) {
- case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
- if (!vdev->vf_token)
- return -ENOTTY;
-
- /*
- * We do not support GET of the VF Token UUID as this
- * could expose the token of the previous device user.
- */
- if (feature.flags & VFIO_DEVICE_FEATURE_GET)
- return -EINVAL;
-
- if (feature.flags & VFIO_DEVICE_FEATURE_PROBE)
- return 0;
+ }
+ return -ENOTTY;
+}
+EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);
- /* Don't SET unless told to do so */
- if (!(feature.flags & VFIO_DEVICE_FEATURE_SET))
- return -EINVAL;
+static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
+ void __user *arg, size_t argsz)
+{
+ struct vfio_pci_core_device *vdev =
+ container_of(device, struct vfio_pci_core_device, vdev);
+ uuid_t uuid;
+ int ret;
- if (feature.argsz < minsz + sizeof(uuid))
- return -EINVAL;
+ if (!vdev->vf_token)
+ return -ENOTTY;
+ /*
+ * We do not support GET of the VF Token UUID as this could
+ * expose the token of the previous device user.
+ */
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
+ sizeof(uuid));
+ if (ret != 1)
+ return ret;
- if (copy_from_user(&uuid, (void __user *)(arg + minsz),
- sizeof(uuid)))
- return -EFAULT;
+ if (copy_from_user(&uuid, arg, sizeof(uuid)))
+ return -EFAULT;
- mutex_lock(&vdev->vf_token->lock);
- uuid_copy(&vdev->vf_token->uuid, &uuid);
- mutex_unlock(&vdev->vf_token->lock);
+ mutex_lock(&vdev->vf_token->lock);
+ uuid_copy(&vdev->vf_token->uuid, &uuid);
+ mutex_unlock(&vdev->vf_token->lock);
+ return 0;
+}
- return 0;
- default:
- return -ENOTTY;
- }
+int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
+ void __user *arg, size_t argsz)
+{
+ switch (flags & VFIO_DEVICE_FEATURE_MASK) {
+ case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
+ return vfio_pci_core_feature_token(device, flags, arg, argsz);
+ default:
+ return -ENOTTY;
}
-
- return -ENOTTY;
}
-EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);
+EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl_feature);
static ssize_t vfio_pci_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
@@ -1891,8 +1908,8 @@ void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev)
}
EXPORT_SYMBOL_GPL(vfio_pci_core_unregister_device);
-static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
- pci_channel_state_t state)
+pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
+ pci_channel_state_t state)
{
struct vfio_pci_core_device *vdev;
struct vfio_device *device;
@@ -1914,6 +1931,7 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
return PCI_ERS_RESULT_CAN_RECOVER;
}
+EXPORT_SYMBOL_GPL(vfio_pci_core_aer_err_detected);
int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
{
@@ -1936,7 +1954,7 @@ int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn)
EXPORT_SYMBOL_GPL(vfio_pci_core_sriov_configure);
const struct pci_error_handlers vfio_pci_core_err_handlers = {
- .error_detected = vfio_pci_aer_err_detected,
+ .error_detected = vfio_pci_core_aer_err_detected,
};
EXPORT_SYMBOL_GPL(vfio_pci_core_err_handlers);
@@ -2055,6 +2073,18 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
}
cur_mem = NULL;
+ /*
+ * The pci_reset_bus() will reset all the devices in the bus.
+ * The power state can be non-D0 for some of the devices in the bus.
+ * For these devices, the pci_reset_bus() will internally set
+ * the power state to D0 without vfio driver involvement.
+ * For the devices which have NoSoftRst-, the reset function can
+ * cause the PCI config space reset without restoring the original
+ * state (saved locally in 'vdev->pm_save').
+ */
+ list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
+ vfio_pci_set_power_state(cur, PCI_D0);
+
ret = pci_reset_bus(pdev);
err_undo:
@@ -2108,6 +2138,18 @@ static bool vfio_pci_dev_set_try_reset(struct vfio_device_set *dev_set)
if (!pdev)
return false;
+ /*
+ * The pci_reset_bus() will reset all the devices in the bus.
+ * The power state can be non-D0 for some of the devices in the bus.
+ * For these devices, the pci_reset_bus() will internally set
+ * the power state to D0 without vfio driver involvement.
+ * For the devices which have NoSoftRst-, the reset function can
+ * cause the PCI config space reset without restoring the original
+ * state (saved locally in 'vdev->pm_save').
+ */
+ list_for_each_entry(cur, &dev_set->device_list, vdev.dev_set_list)
+ vfio_pci_set_power_state(cur, PCI_D0);
+
ret = pci_reset_bus(pdev);
if (ret)
return false;
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 57d3b2cbbd8e..82ac1569deb0 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -288,6 +288,7 @@ out:
return done;
}
+#ifdef CONFIG_VFIO_PCI_VGA
ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite)
{
@@ -355,6 +356,7 @@ ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
return done;
}
+#endif
static void vfio_pci_ioeventfd_do_write(struct vfio_pci_ioeventfd *ioeventfd,
bool test_mem)
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 735d1d344af9..a4555014bd1e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1557,15 +1557,303 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
return 0;
}
+/*
+ * vfio_mig_get_next_state - Compute the next step in the FSM
+ * @cur_fsm - The current state the device is in
+ * @new_fsm - The target state to reach
+ * @next_fsm - Pointer to the next step to get to new_fsm
+ *
+ * Return 0 upon success, otherwise -errno
+ * Upon success the next step in the state progression between cur_fsm and
+ * new_fsm will be set in next_fsm.
+ *
+ * This breaks down requests for combination transitions into smaller steps and
+ * returns the next step to get to new_fsm. The function may need to be called
+ * multiple times before reaching new_fsm.
+ *
+ */
+int vfio_mig_get_next_state(struct vfio_device *device,
+ enum vfio_device_mig_state cur_fsm,
+ enum vfio_device_mig_state new_fsm,
+ enum vfio_device_mig_state *next_fsm)
+{
+ enum { VFIO_DEVICE_NUM_STATES = VFIO_DEVICE_STATE_RUNNING_P2P + 1 };
+ /*
+ * The coding in this table requires the driver to implement the
+ * following FSM arcs:
+ * RESUMING -> STOP
+ * STOP -> RESUMING
+ * STOP -> STOP_COPY
+ * STOP_COPY -> STOP
+ *
+ * If P2P is supported then the driver must also implement these FSM
+ * arcs:
+ * RUNNING -> RUNNING_P2P
+ * RUNNING_P2P -> RUNNING
+ * RUNNING_P2P -> STOP
+ * STOP -> RUNNING_P2P
+ * Without P2P the driver must implement:
+ * RUNNING -> STOP
+ * STOP -> RUNNING
+ *
+ * The coding will step through multiple states for some combination
+ * transitions; if all optional features are supported, this means the
+ * following ones:
+ * RESUMING -> STOP -> RUNNING_P2P
+ * RESUMING -> STOP -> RUNNING_P2P -> RUNNING
+ * RESUMING -> STOP -> STOP_COPY
+ * RUNNING -> RUNNING_P2P -> STOP
+ * RUNNING -> RUNNING_P2P -> STOP -> RESUMING
+ * RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY
+ * RUNNING_P2P -> STOP -> RESUMING
+ * RUNNING_P2P -> STOP -> STOP_COPY
+ * STOP -> RUNNING_P2P -> RUNNING
+ * STOP_COPY -> STOP -> RESUMING
+ * STOP_COPY -> STOP -> RUNNING_P2P
+ * STOP_COPY -> STOP -> RUNNING_P2P -> RUNNING
+ */
+ static const u8 vfio_from_fsm_table[VFIO_DEVICE_NUM_STATES][VFIO_DEVICE_NUM_STATES] = {
+ [VFIO_DEVICE_STATE_STOP] = {
+ [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+ [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
+ [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
+ [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
+ [VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+ },
+ [VFIO_DEVICE_STATE_RUNNING] = {
+ [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_RUNNING_P2P,
+ [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+ [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_RUNNING_P2P,
+ [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RUNNING_P2P,
+ [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
+ [VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+ },
+ [VFIO_DEVICE_STATE_STOP_COPY] = {
+ [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP_COPY,
+ [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+ },
+ [VFIO_DEVICE_STATE_RESUMING] = {
+ [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_RESUMING,
+ [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+ },
+ [VFIO_DEVICE_STATE_RUNNING_P2P] = {
+ [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_RUNNING,
+ [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_STOP,
+ [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_RUNNING_P2P,
+ [VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+ },
+ [VFIO_DEVICE_STATE_ERROR] = {
+ [VFIO_DEVICE_STATE_STOP] = VFIO_DEVICE_STATE_ERROR,
+ [VFIO_DEVICE_STATE_RUNNING] = VFIO_DEVICE_STATE_ERROR,
+ [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_DEVICE_STATE_ERROR,
+ [VFIO_DEVICE_STATE_RESUMING] = VFIO_DEVICE_STATE_ERROR,
+ [VFIO_DEVICE_STATE_RUNNING_P2P] = VFIO_DEVICE_STATE_ERROR,
+ [VFIO_DEVICE_STATE_ERROR] = VFIO_DEVICE_STATE_ERROR,
+ },
+ };
+
+ static const unsigned int state_flags_table[VFIO_DEVICE_NUM_STATES] = {
+ [VFIO_DEVICE_STATE_STOP] = VFIO_MIGRATION_STOP_COPY,
+ [VFIO_DEVICE_STATE_RUNNING] = VFIO_MIGRATION_STOP_COPY,
+ [VFIO_DEVICE_STATE_STOP_COPY] = VFIO_MIGRATION_STOP_COPY,
+ [VFIO_DEVICE_STATE_RESUMING] = VFIO_MIGRATION_STOP_COPY,
+ [VFIO_DEVICE_STATE_RUNNING_P2P] =
+ VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P,
+ [VFIO_DEVICE_STATE_ERROR] = ~0U,
+ };
+
+ if (WARN_ON(cur_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
+ (state_flags_table[cur_fsm] & device->migration_flags) !=
+ state_flags_table[cur_fsm]))
+ return -EINVAL;
+
+ if (new_fsm >= ARRAY_SIZE(vfio_from_fsm_table) ||
+ (state_flags_table[new_fsm] & device->migration_flags) !=
+ state_flags_table[new_fsm])
+ return -EINVAL;
+
+ /*
+ * Arcs touching optional and unsupported states are skipped over. The
+ * driver will instead see an arc from the original state to the next
+ * logical state, as per the above comment.
+ */
+ *next_fsm = vfio_from_fsm_table[cur_fsm][new_fsm];
+ while ((state_flags_table[*next_fsm] & device->migration_flags) !=
+ state_flags_table[*next_fsm])
+ *next_fsm = vfio_from_fsm_table[*next_fsm][new_fsm];
+
+ return (*next_fsm != VFIO_DEVICE_STATE_ERROR) ? 0 : -EINVAL;
+}
+EXPORT_SYMBOL_GPL(vfio_mig_get_next_state);
+
+/*
+ * Convert the drivers's struct file into a FD number and return it to userspace
+ */
+static int vfio_ioct_mig_return_fd(struct file *filp, void __user *arg,
+ struct vfio_device_feature_mig_state *mig)
+{
+ int ret;
+ int fd;
+
+ fd = get_unused_fd_flags(O_CLOEXEC);
+ if (fd < 0) {
+ ret = fd;
+ goto out_fput;
+ }
+
+ mig->data_fd = fd;
+ if (copy_to_user(arg, mig, sizeof(*mig))) {
+ ret = -EFAULT;
+ goto out_put_unused;
+ }
+ fd_install(fd, filp);
+ return 0;
+
+out_put_unused:
+ put_unused_fd(fd);
+out_fput:
+ fput(filp);
+ return ret;
+}
+
+static int
+vfio_ioctl_device_feature_mig_device_state(struct vfio_device *device,
+ u32 flags, void __user *arg,
+ size_t argsz)
+{
+ size_t minsz =
+ offsetofend(struct vfio_device_feature_mig_state, data_fd);
+ struct vfio_device_feature_mig_state mig;
+ struct file *filp = NULL;
+ int ret;
+
+ if (!device->ops->migration_set_state ||
+ !device->ops->migration_get_state)
+ return -ENOTTY;
+
+ ret = vfio_check_feature(flags, argsz,
+ VFIO_DEVICE_FEATURE_SET |
+ VFIO_DEVICE_FEATURE_GET,
+ sizeof(mig));
+ if (ret != 1)
+ return ret;
+
+ if (copy_from_user(&mig, arg, minsz))
+ return -EFAULT;
+
+ if (flags & VFIO_DEVICE_FEATURE_GET) {
+ enum vfio_device_mig_state curr_state;
+
+ ret = device->ops->migration_get_state(device, &curr_state);
+ if (ret)
+ return ret;
+ mig.device_state = curr_state;
+ goto out_copy;
+ }
+
+ /* Handle the VFIO_DEVICE_FEATURE_SET */
+ filp = device->ops->migration_set_state(device, mig.device_state);
+ if (IS_ERR(filp) || !filp)
+ goto out_copy;
+
+ return vfio_ioct_mig_return_fd(filp, arg, &mig);
+out_copy:
+ mig.data_fd = -1;
+ if (copy_to_user(arg, &mig, sizeof(mig)))
+ return -EFAULT;
+ if (IS_ERR(filp))
+ return PTR_ERR(filp);
+ return 0;
+}
+
+static int vfio_ioctl_device_feature_migration(struct vfio_device *device,
+ u32 flags, void __user *arg,
+ size_t argsz)
+{
+ struct vfio_device_feature_migration mig = {
+ .flags = device->migration_flags,
+ };
+ int ret;
+
+ if (!device->ops->migration_set_state ||
+ !device->ops->migration_get_state)
+ return -ENOTTY;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
+ sizeof(mig));
+ if (ret != 1)
+ return ret;
+ if (copy_to_user(arg, &mig, sizeof(mig)))
+ return -EFAULT;
+ return 0;
+}
+
+static int vfio_ioctl_device_feature(struct vfio_device *device,
+ struct vfio_device_feature __user *arg)
+{
+ size_t minsz = offsetofend(struct vfio_device_feature, flags);
+ struct vfio_device_feature feature;
+
+ if (copy_from_user(&feature, arg, minsz))
+ return -EFAULT;
+
+ if (feature.argsz < minsz)
+ return -EINVAL;
+
+ /* Check unknown flags */
+ if (feature.flags &
+ ~(VFIO_DEVICE_FEATURE_MASK | VFIO_DEVICE_FEATURE_SET |
+ VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_PROBE))
+ return -EINVAL;
+
+ /* GET & SET are mutually exclusive except with PROBE */
+ if (!(feature.flags & VFIO_DEVICE_FEATURE_PROBE) &&
+ (feature.flags & VFIO_DEVICE_FEATURE_SET) &&
+ (feature.flags & VFIO_DEVICE_FEATURE_GET))
+ return -EINVAL;
+
+ switch (feature.flags & VFIO_DEVICE_FEATURE_MASK) {
+ case VFIO_DEVICE_FEATURE_MIGRATION:
+ return vfio_ioctl_device_feature_migration(
+ device, feature.flags, arg->data,
+ feature.argsz - minsz);
+ case VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE:
+ return vfio_ioctl_device_feature_mig_device_state(
+ device, feature.flags, arg->data,
+ feature.argsz - minsz);
+ default:
+ if (unlikely(!device->ops->device_feature))
+ return -EINVAL;
+ return device->ops->device_feature(device, feature.flags,
+ arg->data,
+ feature.argsz - minsz);
+ }
+}
+
static long vfio_device_fops_unl_ioctl(struct file *filep,
unsigned int cmd, unsigned long arg)
{
struct vfio_device *device = filep->private_data;
- if (unlikely(!device->ops->ioctl))
- return -EINVAL;
-
- return device->ops->ioctl(device, cmd, arg);
+ switch (cmd) {
+ case VFIO_DEVICE_FEATURE:
+ return vfio_ioctl_device_feature(device, (void __user *)arg);
+ default:
+ if (unlikely(!device->ops->ioctl))
+ return -EINVAL;
+ return device->ops->ioctl(device, cmd, arg);
+ }
}
static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
diff --git a/drivers/crypto/hisilicon/qm.h b/include/linux/hisi_acc_qm.h
index 3068093229a5..177f7b7cd414 100644
--- a/drivers/crypto/hisilicon/qm.h
+++ b/include/linux/hisi_acc_qm.h
@@ -34,6 +34,41 @@
#define QM_WUSER_M_CFG_ENABLE 0x1000a8
#define WUSER_M_CFG_ENABLE 0xffffffff
+/* mailbox */
+#define QM_MB_CMD_SQC 0x0
+#define QM_MB_CMD_CQC 0x1
+#define QM_MB_CMD_EQC 0x2
+#define QM_MB_CMD_AEQC 0x3
+#define QM_MB_CMD_SQC_BT 0x4
+#define QM_MB_CMD_CQC_BT 0x5
+#define QM_MB_CMD_SQC_VFT_V2 0x6
+#define QM_MB_CMD_STOP_QP 0x8
+#define QM_MB_CMD_SRC 0xc
+#define QM_MB_CMD_DST 0xd
+
+#define QM_MB_CMD_SEND_BASE 0x300
+#define QM_MB_EVENT_SHIFT 8
+#define QM_MB_BUSY_SHIFT 13
+#define QM_MB_OP_SHIFT 14
+#define QM_MB_CMD_DATA_ADDR_L 0x304
+#define QM_MB_CMD_DATA_ADDR_H 0x308
+#define QM_MB_MAX_WAIT_CNT 6000
+
+/* doorbell */
+#define QM_DOORBELL_CMD_SQ 0
+#define QM_DOORBELL_CMD_CQ 1
+#define QM_DOORBELL_CMD_EQ 2
+#define QM_DOORBELL_CMD_AEQ 3
+
+#define QM_DOORBELL_SQ_CQ_BASE_V2 0x1000
+#define QM_DOORBELL_EQ_AEQ_BASE_V2 0x2000
+#define QM_QP_MAX_NUM_SHIFT 11
+#define QM_DB_CMD_SHIFT_V2 12
+#define QM_DB_RAND_SHIFT_V2 16
+#define QM_DB_INDEX_SHIFT_V2 32
+#define QM_DB_PRIORITY_SHIFT_V2 48
+#define QM_VF_STATE 0x60
+
/* qm cache */
#define QM_CACHE_CTL 0x100050
#define SQC_CACHE_ENABLE BIT(0)
@@ -128,6 +163,11 @@ enum qm_debug_file {
DEBUG_FILE_NUM,
};
+enum qm_vf_state {
+ QM_READY = 0,
+ QM_NOT_READY,
+};
+
struct qm_dfx {
atomic64_t err_irq_cnt;
atomic64_t aeq_irq_cnt;
@@ -414,6 +454,10 @@ pci_ers_result_t hisi_qm_dev_slot_reset(struct pci_dev *pdev);
void hisi_qm_reset_prepare(struct pci_dev *pdev);
void hisi_qm_reset_done(struct pci_dev *pdev);
+int hisi_qm_wait_mb_ready(struct hisi_qm *qm);
+int hisi_qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue,
+ bool op);
+
struct hisi_acc_sgl_pool;
struct hisi_acc_hw_sgl *hisi_acc_sg_buf_map_to_hw_sgl(struct device *dev,
struct scatterlist *sgl, struct hisi_acc_sgl_pool *pool,
@@ -438,4 +482,9 @@ void hisi_qm_pm_init(struct hisi_qm *qm);
int hisi_qm_get_dfx_access(struct hisi_qm *qm);
void hisi_qm_put_dfx_access(struct hisi_qm *qm);
void hisi_qm_regs_dump(struct seq_file *s, struct debugfs_regset32 *regset);
+
+/* Used by VFIO ACC live migration driver */
+struct pci_driver *hisi_sec_get_pf_driver(void);
+struct pci_driver *hisi_hpre_get_pf_driver(void);
+struct pci_driver *hisi_zip_get_pf_driver(void);
#endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 78655d8d13a7..319322a8ff94 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1143,6 +1143,9 @@ int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type,
int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type,
u64 length, u16 uid, phys_addr_t addr, u32 obj_id);
+struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev);
+void mlx5_vf_put_core_dev(struct mlx5_core_dev *mdev);
+
#ifdef CONFIG_MLX5_CORE_IPOIB
struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
struct ib_device *ibdev,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 49a48d7709ac..0235054fe55c 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -127,6 +127,11 @@ enum {
MLX5_CMD_OP_QUERY_SF_PARTITION = 0x111,
MLX5_CMD_OP_ALLOC_SF = 0x113,
MLX5_CMD_OP_DEALLOC_SF = 0x114,
+ MLX5_CMD_OP_SUSPEND_VHCA = 0x115,
+ MLX5_CMD_OP_RESUME_VHCA = 0x116,
+ MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE = 0x117,
+ MLX5_CMD_OP_SAVE_VHCA_STATE = 0x118,
+ MLX5_CMD_OP_LOAD_VHCA_STATE = 0x119,
MLX5_CMD_OP_CREATE_MKEY = 0x200,
MLX5_CMD_OP_QUERY_MKEY = 0x201,
MLX5_CMD_OP_DESTROY_MKEY = 0x202,
@@ -1757,7 +1762,9 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 reserved_at_682[0x1];
u8 log_max_sf[0x5];
u8 apu[0x1];
- u8 reserved_at_689[0x7];
+ u8 reserved_at_689[0x4];
+ u8 migration[0x1];
+ u8 reserved_at_68e[0x2];
u8 log_min_sf_size[0x8];
u8 max_num_sf_partitions[0x8];
@@ -11518,4 +11525,142 @@ enum {
MLX5_MTT_PERM_RW = MLX5_MTT_PERM_READ | MLX5_MTT_PERM_WRITE,
};
+enum {
+ MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_INITIATOR = 0x0,
+ MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_RESPONDER = 0x1,
+};
+
+struct mlx5_ifc_suspend_vhca_in_bits {
+ u8 opcode[0x10];
+ u8 uid[0x10];
+
+ u8 reserved_at_20[0x10];
+ u8 op_mod[0x10];
+
+ u8 reserved_at_40[0x10];
+ u8 vhca_id[0x10];
+
+ u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_suspend_vhca_out_bits {
+ u8 status[0x8];
+ u8 reserved_at_8[0x18];
+
+ u8 syndrome[0x20];
+
+ u8 reserved_at_40[0x40];
+};
+
+enum {
+ MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_RESPONDER = 0x0,
+ MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_INITIATOR = 0x1,
+};
+
+struct mlx5_ifc_resume_vhca_in_bits {
+ u8 opcode[0x10];
+ u8 uid[0x10];
+
+ u8 reserved_at_20[0x10];
+ u8 op_mod[0x10];
+
+ u8 reserved_at_40[0x10];
+ u8 vhca_id[0x10];
+
+ u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_resume_vhca_out_bits {
+ u8 status[0x8];
+ u8 reserved_at_8[0x18];
+
+ u8 syndrome[0x20];
+
+ u8 reserved_at_40[0x40];
+};
+
+struct mlx5_ifc_query_vhca_migration_state_in_bits {
+ u8 opcode[0x10];
+ u8 uid[0x10];
+
+ u8 reserved_at_20[0x10];
+ u8 op_mod[0x10];
+
+ u8 reserved_at_40[0x10];
+ u8 vhca_id[0x10];
+
+ u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_query_vhca_migration_state_out_bits {
+ u8 status[0x8];
+ u8 reserved_at_8[0x18];
+
+ u8 syndrome[0x20];
+
+ u8 reserved_at_40[0x40];
+
+ u8 required_umem_size[0x20];
+
+ u8 reserved_at_a0[0x160];
+};
+
+struct mlx5_ifc_save_vhca_state_in_bits {
+ u8 opcode[0x10];
+ u8 uid[0x10];
+
+ u8 reserved_at_20[0x10];
+ u8 op_mod[0x10];
+
+ u8 reserved_at_40[0x10];
+ u8 vhca_id[0x10];
+
+ u8 reserved_at_60[0x20];
+
+ u8 va[0x40];
+
+ u8 mkey[0x20];
+
+ u8 size[0x20];
+};
+
+struct mlx5_ifc_save_vhca_state_out_bits {
+ u8 status[0x8];
+ u8 reserved_at_8[0x18];
+
+ u8 syndrome[0x20];
+
+ u8 actual_image_size[0x20];
+
+ u8 reserved_at_60[0x20];
+};
+
+struct mlx5_ifc_load_vhca_state_in_bits {
+ u8 opcode[0x10];
+ u8 uid[0x10];
+
+ u8 reserved_at_20[0x10];
+ u8 op_mod[0x10];
+
+ u8 reserved_at_40[0x10];
+ u8 vhca_id[0x10];
+
+ u8 reserved_at_60[0x20];
+
+ u8 va[0x40];
+
+ u8 mkey[0x20];
+
+ u8 size[0x20];
+};
+
+struct mlx5_ifc_load_vhca_state_out_bits {
+ u8 status[0x8];
+ u8 reserved_at_8[0x18];
+
+ u8 syndrome[0x20];
+
+ u8 reserved_at_40[0x40];
+};
+
#endif /* MLX5_IFC_H */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8253a5413d7c..60d423d8f0c4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2166,7 +2166,8 @@ void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
#ifdef CONFIG_PCI_IOV
int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
-
+int pci_iov_vf_id(struct pci_dev *dev);
+void *pci_iov_get_pf_drvdata(struct pci_dev *dev, struct pci_driver *pf_driver);
int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
void pci_disable_sriov(struct pci_dev *dev);
@@ -2194,6 +2195,18 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
{
return -ENOSYS;
}
+
+static inline int pci_iov_vf_id(struct pci_dev *dev)
+{
+ return -ENOSYS;
+}
+
+static inline void *pci_iov_get_pf_drvdata(struct pci_dev *dev,
+ struct pci_driver *pf_driver)
+{
+ return ERR_PTR(-EINVAL);
+}
+
static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
{ return -ENODEV; }
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index aad54c666407..31dee2b65a62 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2529,6 +2529,9 @@
#define PCI_DEVICE_ID_KORENIX_JETCARDF3 0x17ff
#define PCI_VENDOR_ID_HUAWEI 0x19e5
+#define PCI_DEVICE_ID_HUAWEI_ZIP_VF 0xa251
+#define PCI_DEVICE_ID_HUAWEI_SEC_VF 0xa256
+#define PCI_DEVICE_ID_HUAWEI_HPRE_VF 0xa259
#define PCI_VENDOR_ID_NETRONOME 0x19ee
#define PCI_DEVICE_ID_NETRONOME_NFP4000 0x4000
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 76191d7abed1..66dda06ec42d 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -33,6 +33,7 @@ struct vfio_device {
struct vfio_group *group;
struct vfio_device_set *dev_set;
struct list_head dev_set_list;
+ unsigned int migration_flags;
/* Members below here are private, not for driver use */
refcount_t refcount;
@@ -55,6 +56,17 @@ struct vfio_device {
* @match: Optional device name match callback (return: 0 for no-match, >0 for
* match, -errno for abort (ex. match with insufficient or incorrect
* additional args)
+ * @device_feature: Optional, fill in the VFIO_DEVICE_FEATURE ioctl
+ * @migration_set_state: Optional callback to change the migration state for
+ * devices that support migration. It's mandatory for
+ * VFIO_DEVICE_FEATURE_MIGRATION migration support.
+ * The returned FD is used for data transfer according to the FSM
+ * definition. The driver is responsible to ensure that FD reaches end
+ * of stream or error whenever the migration FSM leaves a data transfer
+ * state or before close_device() returns.
+ * @migration_get_state: Optional callback to get the migration state for
+ * devices that support migration. It's mandatory for
+ * VFIO_DEVICE_FEATURE_MIGRATION migration support.
*/
struct vfio_device_ops {
char *name;
@@ -69,8 +81,44 @@ struct vfio_device_ops {
int (*mmap)(struct vfio_device *vdev, struct vm_area_struct *vma);
void (*request)(struct vfio_device *vdev, unsigned int count);
int (*match)(struct vfio_device *vdev, char *buf);
+ int (*device_feature)(struct vfio_device *device, u32 flags,
+ void __user *arg, size_t argsz);
+ struct file *(*migration_set_state)(
+ struct vfio_device *device,
+ enum vfio_device_mig_state new_state);
+ int (*migration_get_state)(struct vfio_device *device,
+ enum vfio_device_mig_state *curr_state);
};
+/**
+ * vfio_check_feature - Validate user input for the VFIO_DEVICE_FEATURE ioctl
+ * @flags: Arg from the device_feature op
+ * @argsz: Arg from the device_feature op
+ * @supported_ops: Combination of VFIO_DEVICE_FEATURE_GET and SET the driver
+ * supports
+ * @minsz: Minimum data size the driver accepts
+ *
+ * For use in a driver's device_feature op. Checks that the inputs to the
+ * VFIO_DEVICE_FEATURE ioctl are correct for the driver's feature. Returns 1 if
+ * the driver should execute the get or set, otherwise the relevant
+ * value should be returned.
+ */
+static inline int vfio_check_feature(u32 flags, size_t argsz, u32 supported_ops,
+ size_t minsz)
+{
+ if ((flags & (VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_SET)) &
+ ~supported_ops)
+ return -EINVAL;
+ if (flags & VFIO_DEVICE_FEATURE_PROBE)
+ return 0;
+ /* Without PROBE one of GET or SET must be requested */
+ if (!(flags & (VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_SET)))
+ return -EINVAL;
+ if (argsz < minsz)
+ return -EINVAL;
+ return 1;
+}
+
void vfio_init_group_dev(struct vfio_device *device, struct device *dev,
const struct vfio_device_ops *ops);
void vfio_uninit_group_dev(struct vfio_device *device);
@@ -82,6 +130,11 @@ extern void vfio_device_put(struct vfio_device *device);
int vfio_assign_device_set(struct vfio_device *device, void *set_id);
+int vfio_mig_get_next_state(struct vfio_device *device,
+ enum vfio_device_mig_state cur_fsm,
+ enum vfio_device_mig_state new_fsm,
+ enum vfio_device_mig_state *next_fsm);
+
/*
* External user API
*/
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index ef9a44b6cf5d..74a4a0f17b28 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -159,8 +159,17 @@ extern ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev,
extern ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
+#ifdef CONFIG_VFIO_PCI_VGA
extern ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
+#else
+static inline ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev,
+ char __user *buf, size_t count,
+ loff_t *ppos, bool iswrite)
+{
+ return -EINVAL;
+}
+#endif
extern long vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset,
uint64_t data, int count, int fd);
@@ -220,6 +229,8 @@ int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn);
extern const struct pci_error_handlers vfio_pci_core_err_handlers;
long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
unsigned long arg);
+int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
+ void __user *arg, size_t argsz);
ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
size_t count, loff_t *ppos);
ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
@@ -230,6 +241,8 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
+pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
+ pci_channel_state_t state);
static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
{
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index ef33ea002b0b..fea86061b44e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -323,7 +323,7 @@ struct vfio_region_info_cap_type {
#define VFIO_REGION_TYPE_PCI_VENDOR_MASK (0xffff)
#define VFIO_REGION_TYPE_GFX (1)
#define VFIO_REGION_TYPE_CCW (2)
-#define VFIO_REGION_TYPE_MIGRATION (3)
+#define VFIO_REGION_TYPE_MIGRATION_DEPRECATED (3)
/* sub-types for VFIO_REGION_TYPE_PCI_* */
@@ -405,225 +405,29 @@ struct vfio_region_gfx_edid {
#define VFIO_REGION_SUBTYPE_CCW_CRW (3)
/* sub-types for VFIO_REGION_TYPE_MIGRATION */
-#define VFIO_REGION_SUBTYPE_MIGRATION (1)
-
-/*
- * The structure vfio_device_migration_info is placed at the 0th offset of
- * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
- * migration information. Field accesses from this structure are only supported
- * at their native width and alignment. Otherwise, the result is undefined and
- * vendor drivers should return an error.
- *
- * device_state: (read/write)
- * - The user application writes to this field to inform the vendor driver
- * about the device state to be transitioned to.
- * - The vendor driver should take the necessary actions to change the
- * device state. After successful transition to a given state, the
- * vendor driver should return success on write(device_state, state)
- * system call. If the device state transition fails, the vendor driver
- * should return an appropriate -errno for the fault condition.
- * - On the user application side, if the device state transition fails,
- * that is, if write(device_state, state) returns an error, read
- * device_state again to determine the current state of the device from
- * the vendor driver.
- * - The vendor driver should return previous state of the device unless
- * the vendor driver has encountered an internal error, in which case
- * the vendor driver may report the device_state VFIO_DEVICE_STATE_ERROR.
- * - The user application must use the device reset ioctl to recover the
- * device from VFIO_DEVICE_STATE_ERROR state. If the device is
- * indicated to be in a valid device state by reading device_state, the
- * user application may attempt to transition the device to any valid
- * state reachable from the current state or terminate itself.
- *
- * device_state consists of 3 bits:
- * - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
- * it indicates the _STOP state. When the device state is changed to
- * _STOP, driver should stop the device before write() returns.
- * - If bit 1 is set, it indicates the _SAVING state, which means that the
- * driver should start gathering device state information that will be
- * provided to the VFIO user application to save the device's state.
- * - If bit 2 is set, it indicates the _RESUMING state, which means that
- * the driver should prepare to resume the device. Data provided through
- * the migration region should be used to resume the device.
- * Bits 3 - 31 are reserved for future use. To preserve them, the user
- * application should perform a read-modify-write operation on this
- * field when modifying the specified bits.
- *
- * +------- _RESUMING
- * |+------ _SAVING
- * ||+----- _RUNNING
- * |||
- * 000b => Device Stopped, not saving or resuming
- * 001b => Device running, which is the default state
- * 010b => Stop the device & save the device state, stop-and-copy state
- * 011b => Device running and save the device state, pre-copy state
- * 100b => Device stopped and the device state is resuming
- * 101b => Invalid state
- * 110b => Error state
- * 111b => Invalid state
- *
- * State transitions:
- *
- * _RESUMING _RUNNING Pre-copy Stop-and-copy _STOP
- * (100b) (001b) (011b) (010b) (000b)
- * 0. Running or default state
- * |
- *
- * 1. Normal Shutdown (optional)
- * |------------------------------------->|
- *
- * 2. Save the state or suspend
- * |------------------------->|---------->|
- *
- * 3. Save the state during live migration
- * |----------->|------------>|---------->|
- *
- * 4. Resuming
- * |<---------|
- *
- * 5. Resumed
- * |--------->|
- *
- * 0. Default state of VFIO device is _RUNNING when the user application starts.
- * 1. During normal shutdown of the user application, the user application may
- * optionally change the VFIO device state from _RUNNING to _STOP. This
- * transition is optional. The vendor driver must support this transition but
- * must not require it.
- * 2. When the user application saves state or suspends the application, the
- * device state transitions from _RUNNING to stop-and-copy and then to _STOP.
- * On state transition from _RUNNING to stop-and-copy, driver must stop the
- * device, save the device state and send it to the application through the
- * migration region. The sequence to be followed for such transition is given
- * below.
- * 3. In live migration of user application, the state transitions from _RUNNING
- * to pre-copy, to stop-and-copy, and to _STOP.
- * On state transition from _RUNNING to pre-copy, the driver should start
- * gathering the device state while the application is still running and send
- * the device state data to application through the migration region.
- * On state transition from pre-copy to stop-and-copy, the driver must stop
- * the device, save the device state and send it to the user application
- * through the migration region.
- * Vendor drivers must support the pre-copy state even for implementations
- * where no data is provided to the user before the stop-and-copy state. The
- * user must not be required to consume all migration data before the device
- * transitions to a new state, including the stop-and-copy state.
- * The sequence to be followed for above two transitions is given below.
- * 4. To start the resuming phase, the device state should be transitioned from
- * the _RUNNING to the _RESUMING state.
- * In the _RESUMING state, the driver should use the device state data
- * received through the migration region to resume the device.
- * 5. After providing saved device data to the driver, the application should
- * change the state from _RESUMING to _RUNNING.
- *
- * reserved:
- * Reads on this field return zero and writes are ignored.
- *
- * pending_bytes: (read only)
- * The number of pending bytes still to be migrated from the vendor driver.
- *
- * data_offset: (read only)
- * The user application should read data_offset field from the migration
- * region. The user application should read the device data from this
- * offset within the migration region during the _SAVING state or write
- * the device data during the _RESUMING state. See below for details of
- * sequence to be followed.
- *
- * data_size: (read/write)
- * The user application should read data_size to get the size in bytes of
- * the data copied in the migration region during the _SAVING state and
- * write the size in bytes of the data copied in the migration region
- * during the _RESUMING state.
- *
- * The format of the migration region is as follows:
- * ------------------------------------------------------------------
- * |vfio_device_migration_info| data section |
- * | | /////////////////////////////// |
- * ------------------------------------------------------------------
- * ^ ^
- * offset 0-trapped part data_offset
- *
- * The structure vfio_device_migration_info is always followed by the data
- * section in the region, so data_offset will always be nonzero. The offset
- * from where the data is copied is decided by the kernel driver. The data
- * section can be trapped, mmapped, or partitioned, depending on how the kernel
- * driver defines the data section. The data section partition can be defined
- * as mapped by the sparse mmap capability. If mmapped, data_offset must be
- * page aligned, whereas initial section which contains the
- * vfio_device_migration_info structure, might not end at the offset, which is
- * page aligned. The user is not required to access through mmap regardless
- * of the capabilities of the region mmap.
- * The vendor driver should determine whether and how to partition the data
- * section. The vendor driver should return data_offset accordingly.
- *
- * The sequence to be followed while in pre-copy state and stop-and-copy state
- * is as follows:
- * a. Read pending_bytes, indicating the start of a new iteration to get device
- * data. Repeated read on pending_bytes at this stage should have no side
- * effects.
- * If pending_bytes == 0, the user application should not iterate to get data
- * for that device.
- * If pending_bytes > 0, perform the following steps.
- * b. Read data_offset, indicating that the vendor driver should make data
- * available through the data section. The vendor driver should return this
- * read operation only after data is available from (region + data_offset)
- * to (region + data_offset + data_size).
- * c. Read data_size, which is the amount of data in bytes available through
- * the migration region.
- * Read on data_offset and data_size should return the offset and size of
- * the current buffer if the user application reads data_offset and
- * data_size more than once here.
- * d. Read data_size bytes of data from (region + data_offset) from the
- * migration region.
- * e. Process the data.
- * f. Read pending_bytes, which indicates that the data from the previous
- * iteration has been read. If pending_bytes > 0, go to step b.
- *
- * The user application can transition from the _SAVING|_RUNNING
- * (pre-copy state) to the _SAVING (stop-and-copy) state regardless of the
- * number of pending bytes. The user application should iterate in _SAVING
- * (stop-and-copy) until pending_bytes is 0.
- *
- * The sequence to be followed while _RESUMING device state is as follows:
- * While data for this device is available, repeat the following steps:
- * a. Read data_offset from where the user application should write data.
- * b. Write migration data starting at the migration region + data_offset for
- * the length determined by data_size from the migration source.
- * c. Write data_size, which indicates to the vendor driver that data is
- * written in the migration region. Vendor driver must return this write
- * operations on consuming data. Vendor driver should apply the
- * user-provided migration region data to the device resume state.
- *
- * If an error occurs during the above sequences, the vendor driver can return
- * an error code for next read() or write() operation, which will terminate the
- * loop. The user application should then take the next necessary action, for
- * example, failing migration or terminating the user application.
- *
- * For the user application, data is opaque. The user application should write
- * data in the same order as the data is received and the data should be of
- * same transaction size at the source.
- */
+#define VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED (1)
struct vfio_device_migration_info {
__u32 device_state; /* VFIO device state */
-#define VFIO_DEVICE_STATE_STOP (0)
-#define VFIO_DEVICE_STATE_RUNNING (1 << 0)
-#define VFIO_DEVICE_STATE_SAVING (1 << 1)
-#define VFIO_DEVICE_STATE_RESUMING (1 << 2)
-#define VFIO_DEVICE_STATE_MASK (VFIO_DEVICE_STATE_RUNNING | \
- VFIO_DEVICE_STATE_SAVING | \
- VFIO_DEVICE_STATE_RESUMING)
+#define VFIO_DEVICE_STATE_V1_STOP (0)
+#define VFIO_DEVICE_STATE_V1_RUNNING (1 << 0)
+#define VFIO_DEVICE_STATE_V1_SAVING (1 << 1)
+#define VFIO_DEVICE_STATE_V1_RESUMING (1 << 2)
+#define VFIO_DEVICE_STATE_MASK (VFIO_DEVICE_STATE_V1_RUNNING | \
+ VFIO_DEVICE_STATE_V1_SAVING | \
+ VFIO_DEVICE_STATE_V1_RESUMING)
#define VFIO_DEVICE_STATE_VALID(state) \
- (state & VFIO_DEVICE_STATE_RESUMING ? \
- (state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_RESUMING : 1)
+ (state & VFIO_DEVICE_STATE_V1_RESUMING ? \
+ (state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_V1_RESUMING : 1)
#define VFIO_DEVICE_STATE_IS_ERROR(state) \
- ((state & VFIO_DEVICE_STATE_MASK) == (VFIO_DEVICE_STATE_SAVING | \
- VFIO_DEVICE_STATE_RESUMING))
+ ((state & VFIO_DEVICE_STATE_MASK) == (VFIO_DEVICE_STATE_V1_SAVING | \
+ VFIO_DEVICE_STATE_V1_RESUMING))
#define VFIO_DEVICE_STATE_SET_ERROR(state) \
- ((state & ~VFIO_DEVICE_STATE_MASK) | VFIO_DEVICE_SATE_SAVING | \
- VFIO_DEVICE_STATE_RESUMING)
+ ((state & ~VFIO_DEVICE_STATE_MASK) | VFIO_DEVICE_STATE_V1_SAVING | \
+ VFIO_DEVICE_STATE_V1_RESUMING)
__u32 reserved;
__u64 pending_bytes;
@@ -1002,6 +806,186 @@ struct vfio_device_feature {
*/
#define VFIO_DEVICE_FEATURE_PCI_VF_TOKEN (0)
+/*
+ * Indicates the device can support the migration API through
+ * VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE. If this GET succeeds, the RUNNING and
+ * ERROR states are always supported. Support for additional states is
+ * indicated via the flags field; at least VFIO_MIGRATION_STOP_COPY must be
+ * set.
+ *
+ * VFIO_MIGRATION_STOP_COPY means that STOP, STOP_COPY and
+ * RESUMING are supported.
+ *
+ * VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P means that RUNNING_P2P
+ * is supported in addition to the STOP_COPY states.
+ *
+ * Other combinations of flags have behavior to be defined in the future.
+ */
+struct vfio_device_feature_migration {
+ __aligned_u64 flags;
+#define VFIO_MIGRATION_STOP_COPY (1 << 0)
+#define VFIO_MIGRATION_P2P (1 << 1)
+};
+#define VFIO_DEVICE_FEATURE_MIGRATION 1
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_SET, execute a migration state change on the VFIO
+ * device. The new state is supplied in device_state, see enum
+ * vfio_device_mig_state for details
+ *
+ * The kernel migration driver must fully transition the device to the new state
+ * value before the operation returns to the user.
+ *
+ * The kernel migration driver must not generate asynchronous device state
+ * transitions outside of manipulation by the user or the VFIO_DEVICE_RESET
+ * ioctl as described above.
+ *
+ * If this function fails then current device_state may be the original
+ * operating state or some other state along the combination transition path.
+ * The user can then decide if it should execute a VFIO_DEVICE_RESET, attempt
+ * to return to the original state, or attempt to return to some other state
+ * such as RUNNING or STOP.
+ *
+ * If the new_state starts a new data transfer session then the FD associated
+ * with that session is returned in data_fd. The user is responsible to close
+ * this FD when it is finished. The user must consider the migration data stream
+ * carried over the FD to be opaque and must preserve the byte order of the
+ * stream. The user is not required to preserve buffer segmentation when writing
+ * the data stream during the RESUMING operation.
+ *
+ * Upon VFIO_DEVICE_FEATURE_GET, get the current migration state of the VFIO
+ * device, data_fd will be -1.
+ */
+struct vfio_device_feature_mig_state {
+ __u32 device_state; /* From enum vfio_device_mig_state */
+ __s32 data_fd;
+};
+#define VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE 2
+
+/*
+ * The device migration Finite State Machine is described by the enum
+ * vfio_device_mig_state. Some of the FSM arcs will create a migration data
+ * transfer session by returning a FD, in this case the migration data will
+ * flow over the FD using read() and write() as discussed below.
+ *
+ * There are 5 states to support VFIO_MIGRATION_STOP_COPY:
+ * RUNNING - The device is running normally
+ * STOP - The device does not change the internal or external state
+ * STOP_COPY - The device internal state can be read out
+ * RESUMING - The device is stopped and is loading a new internal state
+ * ERROR - The device has failed and must be reset
+ *
+ * And 1 optional state to support VFIO_MIGRATION_P2P:
+ * RUNNING_P2P - RUNNING, except the device cannot do peer to peer DMA
+ *
+ * The FSM takes actions on the arcs between FSM states. The driver implements
+ * the following behavior for the FSM arcs:
+ *
+ * RUNNING_P2P -> STOP
+ * STOP_COPY -> STOP
+ * While in STOP the device must stop the operation of the device. The device
+ * must not generate interrupts, DMA, or any other change to external state.
+ * It must not change its internal state. When stopped the device and kernel
+ * migration driver must accept and respond to interaction to support external
+ * subsystems in the STOP state, for example PCI MSI-X and PCI config space.
+ * Failure by the user to restrict device access while in STOP must not result
+ * in error conditions outside the user context (ex. host system faults).
+ *
+ * The STOP_COPY arc will terminate a data transfer session.
+ *
+ * RESUMING -> STOP
+ * Leaving RESUMING terminates a data transfer session and indicates the
+ * device should complete processing of the data delivered by write(). The
+ * kernel migration driver should complete the incorporation of data written
+ * to the data transfer FD into the device internal state and perform
+ * final validity and consistency checking of the new device state. If the
+ * user provided data is found to be incomplete, inconsistent, or otherwise
+ * invalid, the migration driver must fail the SET_STATE ioctl and
+ * optionally go to the ERROR state as described below.
+ *
+ * While in STOP the device has the same behavior as other STOP states
+ * described above.
+ *
+ * To abort a RESUMING session the device must be reset.
+ *
+ * RUNNING_P2P -> RUNNING
+ * While in RUNNING the device is fully operational, the device may generate
+ * interrupts, DMA, respond to MMIO, all vfio device regions are functional,
+ * and the device may advance its internal state.
+ *
+ * RUNNING -> RUNNING_P2P
+ * STOP -> RUNNING_P2P
+ * While in RUNNING_P2P the device is partially running in the P2P quiescent
+ * state defined below.
+ *
+ * STOP -> STOP_COPY
+ * This arc begin the process of saving the device state and will return a
+ * new data_fd.
+ *
+ * While in the STOP_COPY state the device has the same behavior as STOP
+ * with the addition that the data transfers session continues to stream the
+ * migration state. End of stream on the FD indicates the entire device
+ * state has been transferred.
+ *
+ * The user should take steps to restrict access to vfio device regions while
+ * the device is in STOP_COPY or risk corruption of the device migration data
+ * stream.
+ *
+ * STOP -> RESUMING
+ * Entering the RESUMING state starts a process of restoring the device state
+ * and will return a new data_fd. The data stream fed into the data_fd should
+ * be taken from the data transfer output of a single FD during saving from
+ * a compatible device. The migration driver may alter/reset the internal
+ * device state for this arc if required to prepare the device to receive the
+ * migration data.
+ *
+ * any -> ERROR
+ * ERROR cannot be specified as a device state, however any transition request
+ * can be failed with an errno return and may then move the device_state into
+ * ERROR. In this case the device was unable to execute the requested arc and
+ * was also unable to restore the device to any valid device_state.
+ * To recover from ERROR VFIO_DEVICE_RESET must be used to return the
+ * device_state back to RUNNING.
+ *
+ * The optional peer to peer (P2P) quiescent state is intended to be a quiescent
+ * state for the device for the purposes of managing multiple devices within a
+ * user context where peer-to-peer DMA between devices may be active. The
+ * RUNNING_P2P states must prevent the device from initiating
+ * any new P2P DMA transactions. If the device can identify P2P transactions
+ * then it can stop only P2P DMA, otherwise it must stop all DMA. The migration
+ * driver must complete any such outstanding operations prior to completing the
+ * FSM arc into a P2P state. For the purpose of specification the states
+ * behave as though the device was fully running if not supported. Like while in
+ * STOP or STOP_COPY the user must not touch the device, otherwise the state
+ * can be exited.
+ *
+ * The remaining possible transitions are interpreted as combinations of the
+ * above FSM arcs. As there are multiple paths through the FSM arcs the path
+ * should be selected based on the following rules:
+ * - Select the shortest path.
+ * Refer to vfio_mig_get_next_state() for the result of the algorithm.
+ *
+ * The automatic transit through the FSM arcs that make up the combination
+ * transition is invisible to the user. When working with combination arcs the
+ * user may see any step along the path in the device_state if SET_STATE
+ * fails. When handling these types of errors users should anticipate future
+ * revisions of this protocol using new states and those states becoming
+ * visible in this case.
+ *
+ * The optional states cannot be used with SET_STATE if the device does not
+ * support them. The user can discover if these states are supported by using
+ * VFIO_DEVICE_FEATURE_MIGRATION. By using combination transitions the user can
+ * avoid knowing about these optional states if the kernel driver supports them.
+ */
+enum vfio_device_mig_state {
+ VFIO_DEVICE_STATE_ERROR = 0,
+ VFIO_DEVICE_STATE_STOP = 1,
+ VFIO_DEVICE_STATE_RUNNING = 2,
+ VFIO_DEVICE_STATE_STOP_COPY = 3,
+ VFIO_DEVICE_STATE_RESUMING = 4,
+ VFIO_DEVICE_STATE_RUNNING_P2P = 5,
+};
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**