summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'vfio-v6.3-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds2023-02-2533-422/+756
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull VFIO updates from Alex Williamson: - Remove redundant resource check in vfio-platform (Angus Chen) - Use GFP_KERNEL_ACCOUNT for persistent userspace allocations, allowing removal of arbitrary kernel limits in favor of cgroup control (Yishai Hadas) - mdev tidy-ups, including removing the module-only build restriction for sample drivers, Kconfig changes to select mdev support, documentation movement to keep sample driver usage instructions with sample drivers rather than with API docs, remove references to out-of-tree drivers in docs (Christoph Hellwig) - Fix collateral breakages from mdev Kconfig changes (Arnd Bergmann) - Make mlx5 migration support match device support, improve source and target flows to improve pre-copy support and reduce downtime (Yishai Hadas) - Convert additional mdev sysfs case to use sysfs_emit() (Bo Liu) - Resolve copy-paste error in mdev mbochs sample driver Kconfig (Ye Xingchen) - Avoid propagating missing reset error in vfio-platform if reset requirement is relaxed by module option (Tomasz Duszynski) - Range size fixes in mlx5 variant driver for missed last byte and stricter range calculation (Yishai Hadas) - Fixes to suspended vaddr support and locked_vm accounting, excluding mdev configurations from the former due to potential to indefinitely block kernel threads, fix underflow and restore locked_vm on new mm (Steve Sistare) - Update outdated vfio documentation due to new IOMMUFD interfaces in recent kernels (Yi Liu) - Resolve deadlock between group_lock and kvm_lock, finally (Matthew Rosato) - Fix NULL pointer in group initialization error path with IOMMUFD (Yan Zhao) * tag 'vfio-v6.3-rc1' of https://github.com/awilliam/linux-vfio: (32 commits) vfio: Fix NULL pointer dereference caused by uninitialized group->iommufd docs: vfio: Update vfio.rst per latest interfaces vfio: Update the kdoc for vfio_device_ops vfio/mlx5: Fix range size calculation upon tracker creation vfio: no need to pass kvm pointer during device open vfio: fix deadlock between group lock and kvm lock vfio: revert "iommu driver notify callback" vfio/type1: revert "implement notify callback" vfio/type1: revert "block on invalid vaddr" vfio/type1: restore locked_vm vfio/type1: track locked_vm per dma vfio/type1: prevent underflow of locked_vm via exec() vfio/type1: exclude mdevs from VFIO_UPDATE_VADDR vfio: platform: ignore missing reset if disabled at module init vfio/mlx5: Improve the target side flow to reduce downtime vfio/mlx5: Improve the source side flow upon pre_copy vfio/mlx5: Check whether VF is migratable samples: fix the prompt about SAMPLE_VFIO_MDEV_MBOCHS vfio/mdev: Use sysfs_emit() to instead of sprintf() vfio-mdev: add back CONFIG_VFIO dependency ...
| * vfio: Fix NULL pointer dereference caused by uninitialized group->iommufdYan Zhao2023-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | group->iommufd is not initialized for the iommufd_ctx_put() [20018.331541] BUG: kernel NULL pointer dereference, address: 0000000000000000 [20018.377508] RIP: 0010:iommufd_ctx_put+0x5/0x10 [iommufd] ... [20018.476483] Call Trace: [20018.479214] <TASK> [20018.481555] vfio_group_fops_unl_ioctl+0x506/0x690 [vfio] [20018.487586] __x64_sys_ioctl+0x6a/0xb0 [20018.491773] ? trace_hardirqs_on+0xc5/0xe0 [20018.496347] do_syscall_64+0x67/0x90 [20018.500340] entry_SYSCALL_64_after_hwframe+0x4b/0xb5 Fixes: 9eefba8002c2 ("vfio: Move vfio group specific code into group.c") Cc: stable@vger.kernel.org Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230222074938.13681-1-yan.y.zhao@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * docs: vfio: Update vfio.rst per latest interfacesYi Liu2023-02-091-22/+60
| | | | | | | | | | | | | | | | | | this imports the latest vfio_device_ops definition to vfio.rst. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230209081210.141372-3-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio: Update the kdoc for vfio_device_opsYi Liu2023-02-091-0/+4
| | | | | | | | | | | | | | | | | | this is missed when adding bind_iommufd/unbind_iommufd and attach_ioas. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Link: https://lore.kernel.org/r/20230209081210.141372-2-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/mlx5: Fix range size calculation upon tracker creationYishai Hadas2023-02-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | Fix range size calculation to include the last byte of each range. In addition, log round up the length of the total ranges to be stricter. Fixes: c1d050b0d169 ("vfio/mlx5: Create and destroy page tracker object") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20230208152234.32370-1-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio: no need to pass kvm pointer during device openMatthew Rosato2023-02-093-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | Nothing uses this value during vfio_device_open anymore so it's safe to remove it. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230203215027.151988-3-mjrosato@linux.ibm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio: fix deadlock between group lock and kvm lockMatthew Rosato2023-02-094-15/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After 51cdc8bc120e, we have another deadlock scenario between the kvm->lock and the vfio group_lock with two different codepaths acquiring the locks in different order. Specifically in vfio_open_device, vfio holds the vfio group_lock when issuing device->ops->open_device but some drivers (like vfio-ap) need to acquire kvm->lock during their open_device routine; Meanwhile, kvm_vfio_release will acquire the kvm->lock first before calling vfio_file_set_kvm which will acquire the vfio group_lock. To resolve this, let's remove the need for the vfio group_lock from the kvm_vfio_release codepath. This is done by introducing a new spinlock to protect modifications to the vfio group kvm pointer, and acquiring a kvm ref from within vfio while holding this spinlock, with the reference held until the last close for the device in question. Fixes: 51cdc8bc120e ("kvm/vfio: Fix potential deadlock on vfio group_lock") Reported-by: Anthony Krowiak <akrowiak@linux.ibm.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Tested-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20230203215027.151988-2-mjrosato@linux.ibm.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio: revert "iommu driver notify callback"Steve Sistare2023-02-092-12/+0
| | | | | | | | | | | | | | | | | | | | | | Revert this dead code: commit ec5e32940cc9 ("vfio: iommu driver notify callback") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1675184289-267876-8-git-send-email-steven.sistare@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/type1: revert "implement notify callback"Steve Sistare2023-02-091-15/+0
| | | | | | | | | | | | | | | | | | | | | | This is dead code. Revert it. commit 487ace134053 ("vfio/type1: implement notify callback") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1675184289-267876-7-git-send-email-steven.sistare@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/type1: revert "block on invalid vaddr"Steve Sistare2023-02-091-89/+5
| | | | | | | | | | | | | | | | | | | | | | Revert this dead code: commit 898b9eaeb3fe ("vfio/type1: block on invalid vaddr") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1675184289-267876-6-git-send-email-steven.sistare@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/type1: restore locked_vmSteve Sistare2023-02-091-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a vfio container is preserved across exec or fork-exec, the new task's mm has a locked_vm count of 0. After a dma vaddr is updated using VFIO_DMA_MAP_FLAG_VADDR, locked_vm remains 0, and the pinned memory does not count against the task's RLIMIT_MEMLOCK. To restore the correct locked_vm count, when VFIO_DMA_MAP_FLAG_VADDR is used and the dma's mm has changed, add the dma's locked_vm count to the new mm->locked_vm, subject to the rlimit, and subtract it from the old mm->locked_vm. Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") Cc: stable@vger.kernel.org Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1675184289-267876-5-git-send-email-steven.sistare@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/type1: track locked_vm per dmaSteve Sistare2023-02-091-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | Track locked_vm per dma struct, and create a new subroutine, both for use in a subsequent patch. No functional change. Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") Cc: stable@vger.kernel.org Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1675184289-267876-4-git-send-email-steven.sistare@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/type1: prevent underflow of locked_vm via exec()Steve Sistare2023-02-091-27/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a vfio container is preserved across exec, the task does not change, but it gets a new mm with locked_vm=0, and loses the count from existing dma mappings. If the user later unmaps a dma mapping, locked_vm underflows to a large unsigned value, and a subsequent dma map request fails with ENOMEM in __account_locked_vm. To avoid underflow, grab and save the mm at the time a dma is mapped. Use that mm when adjusting locked_vm, rather than re-acquiring the saved task's mm, which may have changed. If the saved mm is dead, do nothing. locked_vm is incremented for existing mappings in a subsequent patch. Fixes: 73fa0d10d077 ("vfio: Type1 IOMMU implementation") Cc: stable@vger.kernel.org Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1675184289-267876-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/type1: exclude mdevs from VFIO_UPDATE_VADDRSteve Sistare2023-02-092-8/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disable the VFIO_UPDATE_VADDR capability if mediated devices are present. Their kernel threads could be blocked indefinitely by a misbehaving userland while trying to pin/unpin pages while vaddrs are being updated. Do not allow groups to be added to the container while vaddr's are invalid, so we never need to block user threads from pinning, and can delete the vaddr-waiting code in a subsequent patch. Fixes: c3cbab24db38 ("vfio/type1: implement interfaces to update vaddr") Cc: stable@vger.kernel.org Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/1675184289-267876-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio: platform: ignore missing reset if disabled at module initTomasz Duszynski2023-02-011-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | If reset requirement was relaxed via module parameter, errors caused by missing reset should not be propagated down to the vfio core. Otherwise initialization will fail. Signed-off-by: Tomasz Duszynski <tduszynski@marvell.com> Fixes: 5f6c7e0831a1 ("vfio/platform: Use the new device life cycle helpers") Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20230131083349.2027189-1-tduszynski@marvell.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/mlx5: Improve the target side flow to reduce downtimeYishai Hadas2023-01-302-12/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve the target side flow to reduce downtime as of below. - Support reading an optional record which includes the expected stop_copy size. - Once the source sends this record data, which expects to be sent as part of the pre_copy flow, prepare the data buffers that may be large enough to hold the final stop_copy data. The above reduces the migration downtime as the relevant stuff that is needed to load the image data is prepared ahead as part of pre_copy. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20230124144955.139901-4-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/mlx5: Improve the source side flow upon pre_copyYishai Hadas2023-01-303-34/+151
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve the source side flow upon pre_copy as of below. - Prepare the stop_copy buffers as part of moving to pre_copy. - Send to the target a record that includes the expected stop_copy size to let it optimize its stop_copy flow as well. As for sending the target this new record type (i.e. MLX5_MIGF_HEADER_TAG_STOP_COPY_SIZE) we split the current 64 header flags bits into 32 flags bits and another 32 tag bits, each record may have a tag and a flag whether it's optional or mandatory. Optional records will be ignored in the target. The above reduces the downtime upon stop_copy as the relevant data stuff is prepared ahead as part of pre_copy. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20230124144955.139901-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/mlx5: Check whether VF is migratableShay Drory2023-01-302-0/+28
| | | | | | | | | | | | | | | | | | | | Add a check whether VF is migratable. Only if VF is migratable, mark the VFIO device as migration capable. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20230124144955.139901-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * samples: fix the prompt about SAMPLE_VFIO_MDEV_MBOCHSye xingchen2023-01-301-1/+1
| | | | | | | | | | | | | | | | | | Change the prompt about SAMPLE_VFIO_MDEV_MBOCHS as 'Build VFIO mbochs example mediated device sample code'. Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Link: https://lore.kernel.org/r/202301301013518438986@zte.com.cn Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/mdev: Use sysfs_emit() to instead of sprintf()Bo Liu2023-01-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Bo Liu <liubo03@inspur.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230129084117.2384-1-liubo03@inspur.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio-mdev: add back CONFIG_VFIO dependencyArnd Bergmann2023-01-303-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_VFIO_MDEV cannot be selected when VFIO itself is disabled, otherwise we get a link failure: WARNING: unmet direct dependencies detected for VFIO_MDEV Depends on [n]: VFIO [=n] Selected by [y]: - SAMPLE_VFIO_MDEV_MTTY [=y] && SAMPLES [=y] - SAMPLE_VFIO_MDEV_MDPY [=y] && SAMPLES [=y] - SAMPLE_VFIO_MDEV_MBOCHS [=y] && SAMPLES [=y] /home/arnd/cross/arm64/gcc-13.0.1-nolibc/x86_64-linux/bin/x86_64-linux-ld: samples/vfio-mdev/mdpy.o: in function `mdpy_remove': mdpy.c:(.text+0x1e1): undefined reference to `vfio_unregister_group_dev' /home/arnd/cross/arm64/gcc-13.0.1-nolibc/x86_64-linux/bin/x86_64-linux-ld: samples/vfio-mdev/mdpy.o: in function `mdpy_probe': mdpy.c:(.text+0x149e): undefined reference to `_vfio_alloc_device' Fixes: 8bf8c5ee1f38 ("vfio-mdev: turn VFIO_MDEV into a selectable symbol") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20230126211211.1762319-1-arnd@kernel.org Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/platform: Use GFP_KERNEL_ACCOUNT for userspace persistent allocationsYishai Hadas2023-01-232-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use GFP_KERNEL_ACCOUNT for userspace persistent allocations. The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this is untrusted allocation triggered from userspace and should be a subject of kmem accounting, and as such it is controlled by the cgroup mechanism. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230108154427.32609-7-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/fsl-mc: Use GFP_KERNEL_ACCOUNT for userspace persistent allocationsYishai Hadas2023-01-232-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use GFP_KERNEL_ACCOUNT for userspace persistent allocations. The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this is untrusted allocation triggered from userspace and should be a subject of kmem accounting, and as such it is controlled by the cgroup mechanism. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230108154427.32609-6-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/hisi: Use GFP_KERNEL_ACCOUNT for userspace persistent allocationsYishai Hadas2023-01-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use GFP_KERNEL_ACCOUNT for userspace persistent allocations. The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this is untrusted allocation triggered from userspace and should be a subject of kmem accounting, and as such it is controlled by the cgroup mechanism. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230108154427.32609-5-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio: Use GFP_KERNEL_ACCOUNT for userspace persistent allocationsJason Gunthorpe2023-01-237-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use GFP_KERNEL_ACCOUNT for userspace persistent allocations. The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this is untrusted allocation triggered from userspace and should be a subject of kmem accounting, and as such it is controlled by the cgroup mechanism. The way to find the relevant allocations was for example to look at the close_device function and trace back all the kfrees to their allocations. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230108154427.32609-4-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/mlx5: Allow loading of larger images than 512 MBYishai Hadas2023-01-232-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow loading of larger images than 512 MB by dropping the arbitrary hard-coded value that we have today and move to use the max device loading value which is for now 4GB. As part of that we move to use the GFP_KERNEL_ACCOUNT option upon allocating the persistent data of mlx5 and rely on the cgroup to provide the memory limit for the given user. The GFP_KERNEL_ACCOUNT option lets the memory allocator know that this is untrusted allocation triggered from userspace and should be a subject of kmem accounting, and as such it is controlled by the cgroup mechanism. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230108154427.32609-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio/mlx5: Fix UBSAN noteYishai Hadas2023-01-231-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent calling roundup_pow_of_two() with value of 0 as it causes the below UBSAN note. Move this code and its few extra related lines to be called only when it's really applicable. UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 15 PID: 1639 Comm: live_migration Not tainted 6.1.0-rc4 #1116 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x59 ubsan_epilogue+0x5/0x36 __ubsan_handle_shift_out_of_bounds.cold+0x61/0xef ? lock_is_held_type+0x98/0x110 ? rcu_read_lock_sched_held+0x3f/0x70 mlx5vf_create_rc_qp.cold+0xe4/0xf2 [mlx5_vfio_pci] mlx5vf_start_page_tracker+0x769/0xcd0 [mlx5_vfio_pci] vfio_device_fops_unl_ioctl+0x63f/0x700 [vfio] __x64_sys_ioctl+0x433/0x9a0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd </TASK> Fixes: 79c3cf279926 ("vfio/mlx5: Init QP based resources for dirty tracking") Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230108154427.32609-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio-mdev: remove an non-existing driver from vfio-mediated-deviceChristoph Hellwig2023-01-231-7/+1
| | | | | | | | | | | | | | | | | | | | The nvidia mdev driver does not actually exist anywhere in the tree. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Link: https://lore.kernel.org/r/20230110091009.474427-5-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio-mdev: move the mtty usage documentationChristoph Hellwig2023-01-232-100/+100
| | | | | | | | | | | | | | | | | | | | | | Move the documentation on how to use mtty to samples/vfio-mdev/README.rst as it is in no way related to the vfio API. This matches how the bpf and pktgen samples are documented. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20230110091009.474427-4-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio-mdev: turn VFIO_MDEV into a selectable symbolChristoph Hellwig2023-01-237-16/+9
| | | | | | | | | | | | | | | | | | | | | | VFIO_MDEV is just a library with helpers for the drivers. Stop making it a user choice and just select it by the drivers that use the helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Link: https://lore.kernel.org/r/20230110091009.474427-3-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio-mdev: allow building the samples into the kernelChristoph Hellwig2023-01-231-8/+8
| | | | | | | | | | | | | | | | | | | | | | There is nothing in the vfio-mdev sample drivers that requires building them as modules, so remove that restriction. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Link: https://lore.kernel.org/r/20230110091009.474427-2-hch@lst.de Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * vfio: platform: No need to check res againAngus Chen2023-01-231-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | In function vfio_platform_regions_init(),we did check res implied by using while loop, so no need to check whether res be null or not again. No functional change intended. Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20230107034721.2127-1-angus.chen@jaguarmicro.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * MAINTAINERS: step down as vfio reviewerCornelia Huck2023-01-231-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | As my focus has shifted in recent months, my involvement with vfio has decreased to occasionally reviewing some simpler patches, which is probably less than you'd expect when you cc: someone for review. Given that I currently don't have spare time to invest in looking at vfio things, let's adjust the entry to match reality. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20230112145707.27941-1-cohuck@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2023-02-2547-502/+3535
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: - device feature provisioning in ifcvf, mlx5 - new SolidNET driver - support for zoned block device in virtio blk - numa support in virtio pmem - VIRTIO_F_RING_RESET support in vhost-net - more debugfs entries in mlx5 - resume support in vdpa - completion batching in virtio blk - cleanup of dma api use in vdpa - now simulating more features in vdpa-sim - documentation, features, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits) vdpa/mlx5: support device features provisioning vdpa/mlx5: make MTU/STATUS presence conditional on feature bits vdpa: validate device feature provisioning against supported class vdpa: validate provisioned device features against specified attribute vdpa: conditionally read STATUS in config space vdpa: fix improper error message when adding vdpa dev vdpa/mlx5: Initialize CVQ iotlb spinlock vdpa/mlx5: Don't clear mr struct on destroy MR vdpa/mlx5: Directly assign memory key tools/virtio: enable to build with retpoline vringh: fix a typo in comments for vringh_kiov vhost-vdpa: print warning when vhost_vdpa_alloc_domain fails scsi: virtio_scsi: fix handling of kmalloc failure vdpa: Fix a couple of spelling mistakes in some messages vhost-net: support VIRTIO_F_RING_RESET vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit vdpa: mlx5: support per virtqueue dma device vdpa: set dma mask for vDPA device virtio-vdpa: support per vq dma device vdpa: introduce get_vq_dma_device() ...
| * | vdpa/mlx5: support device features provisioningSi-Wei Liu2023-02-211-8/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements features provisioning for mlx5_vdpa. 1) Validate the provisioned features are a subset of the parent features. 2) Clearing features that are not wanted by userspace. For example: # vdpa mgmtdev show pci/0000:41:04.2: supported_classes net max_supported_vqs 65 dev_features CSUM GUEST_CSUM MTU MAC HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ CTRL_VLAN MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM 1) Provision vDPA device with all features derived from the parent # vdpa dev add name vdpa1 mgmtdev pci/0000:41:04.2 # vdpa dev config show vdpa1: mac e4:11:c6:d3:45:f0 link up link_announce false max_vq_pairs 1 mtu 1500 negotiated_features CSUM GUEST_CSUM MTU HOST_TSO4 HOST_TSO6 STATUS CTRL_VQ CTRL_VLAN MQ CTRL_MAC_ADDR VERSION_1 ACCESS_PLATFORM 2) Provision vDPA device with a subset of parent features # vdpa dev add name vdpa1 mgmtdev pci/0000:41:04.2 device_features 0x300020000 # vdpa dev config show vdpa1: negotiated_features CTRL_VQ VERSION_1 ACCESS_PLATFORM Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-7-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vdpa/mlx5: make MTU/STATUS presence conditional on feature bitsSi-Wei Liu2023-02-211-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The spec says: mtu only exists if VIRTIO_NET_F_MTU is set status only exists if VIRTIO_NET_F_STATUS is set We should only present MTU and STATUS conditionally depending on the feature bits. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-6-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vdpa: validate device feature provisioning against supported classSi-Wei Liu2023-02-211-9/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Today when device features are explicitly provisioned, the features user supplied may contain device class specific features that are not supported by the parent management device. On the other hand, when parent management device supports more than one class, the device features to provision may be ambiguous if none of the class specific attributes is provided at the same time. Validate these cases and prompt appropriate user errors accordingly. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Message-Id: <1675725124-7375-5-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vdpa: validate provisioned device features against specified attributeSi-Wei Liu2023-02-211-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With device feature provisioning, there's a chance for misconfiguration that the vdpa feature attribute supplied in 'vdpa dev add' command doesn't get selected on the device_features to be provisioned. For instance, when a @mac attribute is specified, the corresponding feature bit _F_MAC in device_features should be set for consistency. If there's conflict on provisioned features against the attribute, it should be treated as an error to fail the ambiguous command. Noted the opposite is not necessarily true, for e.g. it's okay to have _F_MAC set in device_features without providing a corresponding @mac attribute, in which case the vdpa vendor driver could load certain default value for attribute that is not explicitly specified. Generalize this check in vdpa core so that there's no duplicate code in each vendor driver. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-4-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vdpa: conditionally read STATUS in config spaceSi-Wei Liu2023-02-211-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The spec says: status only exists if VIRTIO_NET_F_STATUS is set Similar to MAC and MTU, vdpa_dev_net_config_fill() should read STATUS conditionally depending on the feature bits. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-3-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vdpa: fix improper error message when adding vdpa devSi-Wei Liu2023-02-211-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In below example, before the fix, mtu attribute is supported by the parent mgmtdev, but the error message showing "All provided are not supported" is just misleading. $ vdpa mgmtdev show vdpasim_net: supported_classes net max_supported_vqs 3 dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM $ vdpa dev add mgmtdev vdpasim_net name vdpasim0 mtu 5000 max_vqp 2 Error: vdpa: All provided attributes are not supported. kernel answers: Operation not supported After fix, the relevant error message will be like: $ vdpa dev add mgmtdev vdpasim_net name vdpasim0 mtu 5000 max_vqp 2 Error: vdpa: Some provided attributes are not supported: 0x1000. kernel answers: Operation not supported Fixes: d8ca2fa5be1b ("vdpa: Enable user to set mac and mtu of vdpa device") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Message-Id: <1675725124-7375-2-git-send-email-si-wei.liu@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vdpa/mlx5: Initialize CVQ iotlb spinlockEli Cohen2023-02-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Initialize itolb spinlock. Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230206122016.1149373-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vdpa/mlx5: Don't clear mr struct on destroy MREli Cohen2023-02-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clearing the mr struct erases the lock owner and causes warnings to be emitted. It is not required to clear the mr so remove the memset call. Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code") Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230206121956.1149356-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vdpa/mlx5: Directly assign memory keyEli Cohen2023-02-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating a memory key, the key value should be assigned to the passed pointer and not or'ed to. No functional issue was observed due to this bug. Fixes: 29064bfdabd5 ("vdpa/mlx5: Add support library for mlx5 VDPA implementation") Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20230205072906.1108194-1-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | tools/virtio: enable to build with retpolineShunsuke Mie2023-02-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add build options to bring it close to a linux kernel. It allows for testing that is close to reality. Signed-off-by: Shunsuke Mie <mie@igel.co.jp> Message-Id: <20230202104538.2041879-1-mie@igel.co.jp> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vringh: fix a typo in comments for vringh_kiovShunsuke Mie2023-02-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Probably it is a simple copy error from struct vring_iov. Signed-off-by: Shunsuke Mie <mie@igel.co.jp> Message-Id: <20230202104248.2040652-1-mie@igel.co.jp> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vhost-vdpa: print warning when vhost_vdpa_alloc_domain failsAlvaro Karsz2023-02-211-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a print explaining why vhost_vdpa_alloc_domain failed if the device is not IOMMU cache coherent capable. Without this print, we have no hint why the operation failed. For example: $ virsh start <domain> error: Failed to start domain <domain> error: Unable to open '/dev/vhost-vdpa-<idx>' for vdpa device: Unknown error 524 Suggested-by: Eugenio Perez Martin <eperezma@redhat.com> Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20230202084212.1328530-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio PĂ©rez Martin <eperezma@redhat.com>
| * | scsi: virtio_scsi: fix handling of kmalloc failureZheng Wang2023-02-211-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no check about the return value of kmalloc in virtscsi_rescan_hotunplug. Add the check to avoid use of null pointer 'inq_result' in case of the failure of kmalloc. Signed-off-by: Zheng Wang <zyytlz.wz@163.com> Message-Id: <20230202064124.22277-1-zyytlz.wz@163.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vdpa: Fix a couple of spelling mistakes in some messagesColin Ian King2023-02-212-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | There are two spelling mistakes in some literal strings. Fix them. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Message-Id: <20230130092644.37002-1-colin.i.king@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
| * | vhost-net: support VIRTIO_F_RING_RESETKangjie Xu2023-02-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add VIRTIO_F_RING_RESET, which indicates that the driver can reset a queue individually. VIRTIO_F_RING_RESET feature is added to virtio-spec 1.2. The relevant information is in oasis-tcs/virtio-spec#124 oasis-tcs/virtio-spec#139 The implementation only adds the feature bit in supported features. It does not require any other changes because we reuse the existing vhost protocol. The virtqueue reset process can be concluded as two parts: 1. The driver can reset a virtqueue. When it is triggered, we use the set_backend to disable the virtqueue. 2. After the virtqueue is disabled, the driver may optionally re-enable it. The process is basically similar to when the device is started, except that the restart process does not need to set features and set mem table since they do not change. QEMU will send messages containing size, base, addr, kickfd and callfd of the virtqueue in order. Specifically, the host kernel will receive these messages in order: a. VHOST_SET_VRING_NUM b. VHOST_SET_VRING_BASE c. VHOST_SET_VRING_ADDR d. VHOST_SET_VRING_KICK e. VHOST_SET_VRING_CALL f. VHOST_NET_SET_BACKEND Finally, after we use set_backend to attach the virtqueue, the virtqueue will be enabled and start to work. Signed-off-by: Kangjie Xu <kangjie.xu@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Message-Id: <20220825085610.80315-1-kangjie.xu@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
| * | vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emitBo Liu2023-02-211-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Follow the advice of the Documentation/filesystems/sysfs.rst and show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. Signed-off-by: Bo Liu <liubo03@inspur.com> Message-Id: <20230129091145.2837-1-liubo03@inspur.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>