summaryrefslogtreecommitdiffstats
path: root/drivers/vhost (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2021-02-251-6/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: - new vdpa features to allow creation and deletion of new devices - virtio-blk support per-device queue depth - fixes, cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (31 commits) virtio-input: add multi-touch support virtio_mmio: fix one typo vdpa/mlx5: fix param validation in mlx5_vdpa_get_config() virtio_net: Fix fall-through warnings for Clang virtio_input: Prevent EV_MSC/MSC_TIMESTAMP loop storm for MT. virtio-blk: support per-device queue depth virtio_vdpa: don't warn when fail to disable vq virtio-pci: introduce modern device module virito-pci-modern: rename map_capability() to vp_modern_map_capability() virtio-pci-modern: introduce helper to get notification offset virtio-pci-modern: introduce helper for getting queue nums virtio-pci-modern: introduce helper for setting/geting queue size virtio-pci-modern: introduce helper to set/get queue_enable virtio-pci-modern: introduce vp_modern_queue_address() virtio-pci-modern: introduce vp_modern_set_queue_vector() virtio-pci-modern: introduce vp_modern_generation() virtio-pci-modern: introduce helpers for setting and getting features virtio-pci-modern: introduce helpers for setting and getting status virtio-pci-modern: introduce helper to set config vector virtio-pci-modern: introduce vp_modern_remove() ...
| * vhost scsi: alloc vhost_scsi with kvzalloc() to avoid delayDongli Zhang2021-02-231-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The size of 'struct vhost_scsi' is order-10 (~2.3MB). It may take long time delay by kzalloc() to compact memory pages by retrying multiple times when there is a lack of high-order pages. As a result, there is latency to create a VM (with vhost-scsi) or to hotadd vhost-scsi-based storage. The prior commit 595cb754983d ("vhost/scsi: use vmalloc for order-10 allocation") prefers to fallback only when really needed, while this patch allocates with kvzalloc() with __GFP_NORETRY implicitly set to avoid retrying memory pages compact for multiple times. The __GFP_NORETRY is implicitly set if the size to allocate is more than PAGE_SZIE and when __GFP_RETRY_MAYFAIL is not explicitly set. Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com> Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Link: https://lore.kernel.org/r/20210123080853.4214-1-dongli.zhang@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* | vhost_net: avoid tx queue stuck when sendmsg failsYunjian Wang2021-01-191-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the driver doesn't drop a packet which can't be sent by tun (e.g bad packet). In this case, the driver will always process the same packet lead to the tx queue stuck. To fix this issue: 1. in the case of persistent failure (e.g bad packet), the driver can skip this descriptor by ignoring the error. 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM), the driver schedules the worker to try again. Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/1610685980-38608-1-git-send-email-wangyunjian@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2021-01-081-3/+65
|\| | | | | | | | | | | | | | | | | Trivial conflict in CAN on file rename. Conflicts: drivers/net/can/m_can/tcan4x5x-core.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2021-01-051-3/+65
| |\ | | | | | | | | | | | | | | | | | | | | | Pull vhost bugfix from Michael Tsirkin: "This fixes configs with vhost vsock behind a viommu" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost/vsock: add IOTLB API support
| | * vhost/vsock: add IOTLB API supportStefano Garzarella2020-12-271-3/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables the IOTLB API support for vhost-vsock devices, allowing the userspace to emulate an IOMMU for the guest. These changes were made following vhost-net, in details this patch: - exposes VIRTIO_F_ACCESS_PLATFORM feature and inits the iotlb device if the feature is acked - implements VHOST_GET_BACKEND_FEATURES and VHOST_SET_BACKEND_FEATURES ioctls - calls vq_meta_prefetch() before vq processing to prefetch vq metadata address in IOTLB - provides .read_iter, .write_iter, and .poll callbacks for the chardev; they are used by the userspace to exchange IOTLB messages This patch was tested specifying "intel_iommu=strict" in the guest kernel command line. I used QEMU with a patch applied [1] to fix a simple issue (that patch was merged in QEMU v5.2.0): $ qemu -M q35,accel=kvm,kernel-irqchip=split \ -drive file=fedora.qcow2,format=qcow2,if=virtio \ -device intel-iommu,intremap=on,device-iotlb=on \ -device vhost-vsock-pci,guest-cid=3,iommu_platform=on,ats=on [1] https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg09077.html Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201223143638.123417-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* | | tap/tun: add skb_zcopy_init() helper for initialization.Jonathan Lemon2021-01-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Replace direct assignments with skb_zcopy_init() for zerocopy cases where a new skb is initialized, without changing the reference counts. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | skbuff: Add skb parameter to the ubuf zerocopy callbackJonathan Lemon2021-01-081-1/+2
|/ / | | | | | | | | | | | | | | | | | | Add an optional skb parameter to the zerocopy callback parameter, which is passed down from skb_zcopy_clear(). This gives access to the original skb, which is needed for upcoming RX zero-copy error handling. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge tag 'net-5.11-rc3' of ↵Linus Torvalds2021-01-051-3/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes, including fixes from netfilter, wireless and bpf trees. Current release - regressions: - mt76: fix NULL pointer dereference in mt76u_status_worker and mt76s_process_tx_queue - net: ipa: fix interconnect enable bug Current release - always broken: - netfilter: fixes possible oops in mtype_resize in ipset - ath11k: fix number of coding issues found by static analysis tools and spurious error messages Previous releases - regressions: - e1000e: re-enable s0ix power saving flows for systems with the Intel i219-LM Ethernet controllers to fix power use regression - virtio_net: fix recursive call to cpus_read_lock() to avoid a deadlock - ipv4: ignore ECN bits for fib lookups in fib_compute_spec_dst() - sysfs: take the rtnl lock around XPS configuration - xsk: fix memory leak for failed bind and rollback reservation at NETDEV_TX_BUSY - r8169: work around power-saving bug on some chip versions Previous releases - always broken: - dcb: validate netlink message in DCB handler - tun: fix return value when the number of iovs exceeds MAX_SKB_FRAGS to prevent unnecessary retries - vhost_net: fix ubuf refcount when sendmsg fails - bpf: save correct stopping point in file seq iteration - ncsi: use real net-device for response handler - neighbor: fix div by zero caused by a data race (TOCTOU) - bareudp: fix use of incorrect min_headroom size and a false positive lockdep splat from the TX lock - mvpp2: - clear force link UP during port init procedure in case bootloader had set it - add TCAM entry to drop flow control pause frames - fix PPPoE with ipv6 packet parsing - fix GoP Networking Complex Control config of port 3 - fix pkt coalescing IRQ-threshold configuration - xsk: fix race in SKB mode transmit with shared cq - ionic: account for vlan tag len in rx buffer len - stmmac: ignore the second clock input, current clock framework does not handle exclusive clock use well, other drivers may reconfigure the second clock Misc: - ppp: change PPPIOCUNBRIDGECHAN ioctl request number to follow existing scheme" * tag 'net-5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) net: dsa: lantiq_gswip: Fix GSWIP_MII_CFG(p) register access net: dsa: lantiq_gswip: Enable GSWIP_MII_CFG_EN also for internal PHYs net: lapb: Decrease the refcount of "struct lapb_cb" in lapb_device_event r8169: work around power-saving bug on some chip versions net: usb: qmi_wwan: add Quectel EM160R-GL selftests: mlxsw: Set headroom size of correct port net: macb: Correct usage of MACB_CAPS_CLK_HW_CHG flag ibmvnic: fix: NULL pointer dereference. docs: networking: packet_mmap: fix old config reference docs: networking: packet_mmap: fix formatting for C macros vhost_net: fix ubuf refcount incorrectly when sendmsg fails bareudp: Fix use of incorrect min_headroom size bareudp: set NETIF_F_LLTX flag net: hdlc_ppp: Fix issues when mod_timer is called while timer is running atlantic: remove architecture depends erspan: fix version 1 check in gre_parse_header() net: hns: fix return value check in __lb_other_process() net: sched: prevent invalid Scell_log shift count net: neighbor: fix a crash caused by mod zero ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst() ...
| * | vhost_net: fix ubuf refcount incorrectly when sendmsg failsYunjian Wang2021-01-041-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the vhost_zerocopy_callback() maybe be called to decrease the refcount when sendmsg fails in tun. The error handling in vhost handle_tx_zerocopy() will try to decrease the same refcount again. This is wrong. To fix this issue, we only call vhost_net_ubuf_put() when vq->heads[nvq->desc].len == VHOST_DMA_IN_PROGRESS. Fixes: bab632d69ee4 ("vhost: vhost TX zero-copy support") Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/1609207308-20544-1-git-send-email-wangyunjian@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2020-12-242-8/+5
|\ \ \ | |/ / |/| / | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: - vdpa sim refactoring - virtio mem: Big Block Mode support - misc cleanus, fixes * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (61 commits) vdpa: Use simpler version of ida allocation vdpa: Add missing comment for virtqueue count uapi: virtio_ids: add missing device type IDs from OASIS spec uapi: virtio_ids.h: consistent indentions vhost scsi: fix error return code in vhost_scsi_set_endpoint() virtio_ring: Fix two use after free bugs virtio_net: Fix error code in probe() virtio_ring: Cut and paste bugs in vring_create_virtqueue_packed() tools/virtio: add barrier for aarch64 tools/virtio: add krealloc_array tools/virtio: include asm/bug.h vdpa/mlx5: Use write memory barrier after updating CQ index vdpa: split vdpasim to core and net modules vdpa_sim: split vdpasim_virtqueue's iov field in out_iov and in_iov vdpa_sim: make vdpasim->buffer size configurable vdpa_sim: use kvmalloc to allocate vdpasim->buffer vdpa_sim: set vringh notify callback vdpa_sim: add set_config callback in vdpasim_dev_attr vdpa_sim: add get_config callback in vdpasim_dev_attr vdpa_sim: make 'config' generic and usable for any device type ...
| * vhost scsi: fix error return code in vhost_scsi_set_endpoint()Zhang Changzhong2020-12-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 25b98b64e284 ("vhost scsi: alloc cmds per vq instead of session") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Link: https://lore.kernel.org/r/1607071411-33484-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost_vdpa: switch to vmemdup_user()Tian Tao2020-12-181-7/+3
| | | | | | | | | | | | | | | | | | Replace opencoded alloc and copy with vmemdup_user() Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Link: https://lore.kernel.org/r/1605057288-60400-1-git-send-email-tiantao6@hisilicon.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
* | vhost: vringh: use krealloc_array()Bartosz Golaszewski2020-12-151-1/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the helper that checks for overflows internally instead of manually calculating the size of the new array. Link: https://lkml.kernel.org/r/20201109110654.12547-5-brgl@bgdev.pl Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Christian Knig <christian.koenig@amd.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@linux.ie> Cc: David Rientjes <rientjes@google.com> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: James Morse <james.morse@arm.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Maxime Ripard <mripard@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Robert Richter <rric@kernel.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: Takashi Iwai <tiwai@suse.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vhost_vdpa: return -EFAULT if copy_to_user() failsDan Carpenter2020-12-021-1/+3
| | | | | | | | | | | | The copy_to_user() function returns the number of bytes remaining to be copied but this should return -EFAULT to the user. Fixes: 1b48dc03e575 ("vhost: vdpa: report iova range") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/X8c32z5EtDsMyyIL@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
* vhost-vdpa: fix page pinning leakage in error path (rework)Si-Wei Liu2020-11-251-18/+62
| | | | | | | | | | | | | | | | | | Pinned pages are not properly accounted particularly when mapping error occurs on IOTLB update. Clean up dangling pinned pages for the error path. The memory usage for bookkeeping pinned pages is reverted to what it was before: only one single free page is needed. This helps reduce the host memory demand for VM with a large amount of memory, or in the situation where host is running short of free memory. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Link: https://lore.kernel.org/r/1604618793-4681-1-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* vringh: fix vringh_iov_push_*() documentationStefano Garzarella2020-11-251-3/+3
| | | | | | | | | | | vringh_iov_push_*() functions don't have 'dst' parameter, but have the 'src' parameter. Replace 'dst' description with 'src' description. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201116161653.102904-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost scsi: fix lun reset completion handlingMike Christie2020-11-251-1/+3
| | | | | | | | | | | | | | | vhost scsi owns the scsi se_cmd but lio frees the se_cmd->se_tmr before calling release_cmd, so while with normal cmd completion we can access the se_cmd from the vhost work, we can't do the same with se_cmd->se_tmr. This has us copy the tmf response in vhost_scsi_queue_tm_rsp to our internal vhost-scsi tmf struct for when it gets sent to the guest from our worker thread. Fixes: efd838fec17b ("vhost scsi: Add support for LUN resets.") Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/1605887459-3864-1-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost scsi: Add support for LUN resets.Mike Christie2020-11-151-13/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | In newer versions of virtio-scsi we just reset the timer when an a command times out, so TMFs are never sent for the cmd time out case. However, in older kernels and for the TMF inject cases, we can still get resets and we end up just failing immediately so the guest might see the device get offlined and IO errors. For the older kernel cases, we want the same end result as the modern virtio-scsi driver where we let the lower levels fire their error handling and handle the problem. And at the upper levels we want to wait. This patch ties the LUN reset handling into the LIO TMF code which will just wait for outstanding commands to complete like we are doing in the modern virtio-scsi case. Note: I did not handle the ABORT case to keep this simple. For ABORTs LIO just waits on the cmd like how it does for the RESET case. If an ABORT fails, the guest OS ends up escalating to LUN RESET, so in the end we get the same behavior where we wait on the outstanding cmds. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/1604986403-4931-6-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
* vhost scsi: add lun parser helperMike Christie2020-11-151-2/+7
| | | | | | | | | | | | Move code to parse lun from req's lun_buf to helper, so tmf code can use it in the next patch. Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/1604986403-4931-5-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
* vhost scsi: fix cmd completion raceMike Christie2020-11-151-27/+15
| | | | | | | | | | | | | | | | | We might not do the final se_cmd put from vhost_scsi_complete_cmd_work. When the last put happens a little later then we could race where vhost_scsi_complete_cmd_work does vhost_signal, the guest runs and sends more IO, and vhost_scsi_handle_vq runs but does not find any free cmds. This patch has us delay completing the cmd until the last lio core ref is dropped. We then know that once we signal to the guest that the cmd is completed that if it queues a new command it will find a free cmd. Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Maurizio Lombardi <mlombard@redhat.com> Link: https://lore.kernel.org/r/1604986403-4931-4-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
* vhost scsi: alloc cmds per vq instead of sessionMike Christie2020-11-151-79/+128
| | | | | | | | | | | | | | | | | | | | | | We currently are limited to 256 cmds per session. This leads to problems where if the user has increased virtqueue_size to more than 2 or cmd_per_lun to more than 256 vhost_scsi_get_tag can fail and the guest will get IO errors. This patch moves the cmd allocation to per vq so we can easily match whatever the user has specified for num_queues and virtqueue_size/cmd_per_lun. It also makes it easier to control how much memory we preallocate. For cases, where perf is not as important and we can use the current defaults (1 vq and 128 cmds per vq) memory use from preallocate cmds is cut in half. For cases, where we are willing to use more memory for higher perf, cmd mem use will now increase as the num queues and queue depth increases. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/1604986403-4931-3-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Maurizio Lombardi <mlombard@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
* vhost: add helper to check if a vq has been setupMike Christie2020-11-152-0/+7
| | | | | | | | | | | | | | This adds a helper check if a vq has been setup. The next patches will use this when we move the vhost scsi cmd preallocation from per session to per vq. In the per vq case, we only want to allocate cmds for vqs that have actually been setup and not for all the possible vqs. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/1604986403-4931-2-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
* vdpa: handle irq bypass register failure caseZhu Lingshan2020-10-301-0/+3
| | | | | | | | | | | | | | | | LKP considered variable 'ret' in vhost_vdpa_setup_vq_irq() as a unused variable, so suggest we remove it. Actually it stores return value of irq_bypass_register_producer(), but we did not check it, we should handle the failure case. This commit will print a message if irq bypass register producer fail, in this case, vqs still remain functional. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20201023104046.404794-1-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* Revert "vhost-vdpa: fix page pinning leakage in error path"Michael S. Tsirkin2020-10-301-71/+48
| | | | | | | | | | | This reverts commit 7ed9e3d97c32d969caded2dfb6e67c1a2cc5a0b1. The patch creates a DoS risk since it can result in a high order memory allocation. Fixes: 7ed9e3d97c32d ("vhost-vdpa: fix page pinning leakage in error path") Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost_vdpa: Return -EFAULT if copy_from_user() failsDan Carpenter2020-10-301-5/+5
| | | | | | | | | | | | The copy_to/from_user() functions return the number of bytes which we weren't able to copy but the ioctl should return -EFAULT if they fail. Fixes: a127c5bbb6a8 ("vhost-vdpa: fix backend feature ioctls") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20201023120853.GI282278@mwanda Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org Acked-by: Jason Wang <jasowang@redhat.com>
* vhost: vdpa: report iova rangeJason Wang2020-10-231-0/+41
| | | | | | | | | | | | | | | | | This patch introduces a new ioctl for vhost-vdpa device that can report the iova range by the device. For device that implements get_iova_range() method, we fetch it from the vDPA device. If device doesn't implement get_iova_range() but depends on platform IOMMU, we will query via DOMAIN_ATTR_GEOMETRY, otherwise [0, ULLONG_MAX] is assumed. For safety, this patch also rules out the map request which is not in the valid range. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20201023090043.14430-3-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost_vdpa: remove unnecessary spin_lock in vhost_vring_callZhu Lingshan2020-10-213-11/+1
| | | | | | | | | | | | This commit removed unnecessary spin_locks in vhost_vring_call and related operations. Because we manipulate irq offloading contents in vhost_vdpa ioctl code path which is already protected by dev mutex and vq mutex. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Link: https://lore.kernel.org/r/20200909065234.3313-1-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* vringh: fix __vringh_iov() when riov and wiov are differentStefano Garzarella2020-10-211-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | If riov and wiov are both defined and they point to different objects, only riov is initialized. If the wiov is not initialized by the caller, the function fails returning -EINVAL and printing "Readable desc 0x... after writable" error message. This issue happens when descriptors have both readable and writable buffers (eg. virtio-blk devices has virtio_blk_outhdr in the readable buffer and status as last byte of writable buffer) and we call __vringh_iov() to get both type of buffers in two different iovecs. Let's replace the 'else if' clause with 'if' to initialize both riov and wiov if they are not NULL. As checkpatch pointed out, we also avoid crashing the kernel when riov and wiov are both NULL, replacing BUG() with WARN_ON() and returning -EINVAL. Fixes: f87d0fbb5798 ("vringh: host-side implementation of virtio rings.") Cc: stable@vger.kernel.org Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201008204256.162292-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost_vdpa: Fix duplicate included kernel.hTian Tao2020-10-211-1/+0
| | | | | | | | | | linux/kernel.h is included more than once, Remove the one that isn't necessary. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Link: https://lore.kernel.org/r/1600131102-24672-1-git-send-email-tiantao6@hisilicon.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* vhost: reduce stack usage in log_usedLi Wang2020-10-212-1/+2
| | | | | | | | | | | | | Fix the warning: [-Werror=-Wframe-larger-than=] drivers/vhost/vhost.c: In function log_used: drivers/vhost/vhost.c:1906:1: warning: the frame size of 1040 bytes is larger than 1024 bytes Signed-off-by: Li Wang <li.wang@windriver.com> Link: https://lore.kernel.org/r/1600106889-25013-1-git-send-email-li.wang@windriver.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* vhost-vdpa: fix page pinning leakage in error pathSi-Wei Liu2020-10-041-48/+71
| | | | | | | | | | | | | | | | | | | Pinned pages are not properly accounted particularly when mapping error occurs on IOTLB update. Clean up dangling pinned pages for the error path. As the inflight pinned pages, specifically for memory region that strides across multiple chunks, would need more than one free page for book keeping and accounting. For simplicity, pin pages for all memory in the IOVA range in one go rather than have multiple pin_user_pages calls to make up the entire region. This way it's easier to track and account the pages already mapped, particularly for clean-up in the error path. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Link: https://lore.kernel.org/r/1601701330-16837-3-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-vdpa: fix vhost_vdpa_map() on error conditionSi-Wei Liu2020-10-041-0/+3
| | | | | | | | | | vhost_vdpa_map() should remove the iotlb entry just added if the corresponding mapping fails to set up properly. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Link: https://lore.kernel.org/r/1601701330-16837-2-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost: Don't call log_access_ok() when using IOTLBGreg Kurz2020-10-041-5/+18
| | | | | | | | | | | | | | | | | | When the IOTLB device is enabled, the log_guest_addr that is passed by userspace to the VHOST_SET_VRING_ADDR ioctl, and which is then written to vq->log_addr, is a GIOVA. All writes to this address are translated by log_user() to writes to an HVA, and then ultimately logged through the corresponding GPAs in log_write_hva(). No logging will ever occur with vq->log_addr in this case. It is thus wrong to pass vq->log_addr and log_guest_addr to log_access_vq() which assumes they are actual GPAs. Introduce a new vq_log_used_access_ok() helper that only checks accesses to the log for the used structure when there isn't an IOTLB device around. Signed-off-by: Greg Kurz <groug@kaod.org> Link: https://lore.kernel.org/r/160171933385.284610.10189082586063280867.stgit@bahia.lan Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost: Use vhost_get_used_size() in vhost_vring_set_addr()Greg Kurz2020-10-041-2/+1
| | | | | | | | | | | | The open-coded computation of the used size doesn't take the event into account when the VIRTIO_RING_F_EVENT_IDX feature is present. Fix that by using vhost_get_used_size(). Fixes: 8ea8cf89e19a ("vhost: support event index") Cc: stable@vger.kernel.org Signed-off-by: Greg Kurz <groug@kaod.org> Link: https://lore.kernel.org/r/160171932300.284610.11846106312938909461.stgit@bahia.lan Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost: Don't call access_ok() when using IOTLBGreg Kurz2020-10-041-4/+5
| | | | | | | | | | | | | | | | | | | | | When the IOTLB device is enabled, the vring addresses we get from userspace are GIOVAs. It is thus wrong to pass them down to access_ok() which only takes HVAs. Access validation is done at prefetch time with IOTLB. Teach vq_access_ok() about that by moving the (vq->iotlb) check from vhost_vq_access_ok() to vq_access_ok(). This prevents vhost_vring_set_addr() to fail when verifying the accesses. No behavior change for vhost_vq_access_ok(). BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1883084 Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API") Cc: jasowang@redhat.com CC: stable@vger.kernel.org # 4.14+ Signed-off-by: Greg Kurz <groug@kaod.org> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/160171931213.284610.2052489816407219136.stgit@bahia.lan Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost vdpa: fix vhost_vdpa_open error handlingMike Christie2020-09-301-0/+1
| | | | | | | | | | We must free the vqs array in the open failure path, because vhost_vdpa_release will not be called. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/1600712588-9514-2-git-send-email-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
* vhost-vdpa: fix backend feature ioctlsJason Wang2020-09-241-14/+16
| | | | | | | | | | | | | | | | | | | Commit 653055b9acd4 ("vhost-vdpa: support get/set backend features") introduces two malfunction backend features ioctls: 1) the ioctls was blindly added to vring ioctl instead of vdpa device ioctl 2) vhost_set_backend_features() was called when dev mutex has already been held which will lead a deadlock This patch fixes the above issues. Cc: Eli Cohen <elic@nvidia.com> Reported-by: Zhu Lingshan <lingshan.zhu@intel.com> Fixes: 653055b9acd4 ("vhost-vdpa: support get/set backend features") Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200907104343.31141-1-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost: Fix documentationEli Cohen2020-09-241-2/+2
| | | | | | | | | | Fix documentation to match actual function prototypes "end" used instead of "last". Fix that. Signed-off-by: Eli Cohen <eli@mellanox.com> Link: https://lore.kernel.org/r/20200630052925.GA157062@mtl-vdi-166.wap.labs.mlnx Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds2020-09-041-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi Kivilinna. 2) Fix loss of RTT samples in rxrpc, from David Howells. 3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu. 4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka. 5) We disable BH for too lokng in sctp_get_port_local(), add a cond_resched() here as well, from Xin Long. 6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu. 7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from Yonghong Song. 8) Missing of_node_put() in mt7530 DSA driver, from Sumera Priyadarsini. 9) Fix crash in bnxt_fw_reset_task(), from Michael Chan. 10) Fix geneve tunnel checksumming bug in hns3, from Yi Li. 11) Memory leak in rxkad_verify_response, from Dinghao Liu. 12) In tipc, don't use smp_processor_id() in preemptible context. From Tuong Lien. 13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu. 14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter. 15) Fix ABI mismatch between driver and firmware in nfp, from Louis Peens. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits) net/smc: fix sock refcounting in case of termination net/smc: reset sndbuf_desc if freed net/smc: set rx_off for SMCR explicitly net/smc: fix toleration of fake add_link messages tg3: Fix soft lockup when tg3_reset_task() fails. doc: net: dsa: Fix typo in config code sample net: dp83867: Fix WoL SecureOn password nfp: flower: fix ABI mismatch between driver and firmware tipc: fix shutdown() of connectionless socket ipv6: Fix sysctl max for fib_multipath_hash_policy drivers/net/wan/hdlc: Change the default of hard_header_len to 0 net: gemini: Fix another missing clk_disable_unprepare() in probe net: bcmgenet: fix mask check in bcmgenet_validate_flow() amd-xgbe: Add support for new port mode net: usb: dm9601: Add USB ID of Keenetic Plus DSL vhost: fix typo in error message net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init() pktgen: fix error message with wrong function name net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode cxgb4: fix thermal zone device registration ...
| * vhost: fix typo in error messageYunsheng Lin2020-09-021-1/+1
| | | | | | | | | | | | | | | | | | | | "enable" should be "disable" when the function name is vhost_disable_notify(), which does the disabling work. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vhost-iotlb: fix vhost_iotlb_itree_next() documentationStefano Garzarella2020-08-261-2/+2
|/ | | | | | | | | | This patch contains trivial changes for the vhost_iotlb_itree_next() documentation, fixing the function name and the description of first argument (@map). Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20200825130543.43308-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2020-08-115-87/+169
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio updates from Michael Tsirkin: - IRQ bypass support for vdpa and IFC - MLX5 vdpa driver - Endianness fixes for virtio drivers - Misc other fixes * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits) vdpa/mlx5: fix up endian-ness for mtu vdpa: Fix pointer math bug in vdpasim_get_config() vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config() vdpa/mlx5: fix memory allocation failure checks vdpa/mlx5: Fix uninitialised variable in core/mr.c vdpa_sim: init iommu lock virtio_config: fix up warnings on parisc vdpa/mlx5: Add VDPA driver for supported mlx5 devices vdpa/mlx5: Add shared memory registration code vdpa/mlx5: Add support library for mlx5 VDPA implementation vdpa/mlx5: Add hardware descriptive header file vdpa: Modify get_vq_state() to return error code net/vdpa: Use struct for set/get vq state vdpa: remove hard coded virtq num vdpasim: support batch updating vhost-vdpa: support IOTLB batching hints vhost-vdpa: support get/set backend features vhost: generialize backend features setting/getting vhost-vdpa: refine ioctl pre-processing vDPA: dont change vq irq after DRIVER_OK ...
| * vdpa: Modify get_vq_state() to return error codeEli Cohen2020-08-061-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | Modify get_vq_state() so it returns an error code. In case of hardware acceleration, the available index may be retrieved from the device, an operation that can possibly fail. Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Link: https://lore.kernel.org/r/20200804162048.22587-9-eli@mellanox.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
| * net/vdpa: Use struct for set/get vq stateEli Cohen2020-08-061-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | For now VQ state involves 16 bit available index value encoded in u64 variable. In the future it will be extended to contain more fields. Use struct to contain the state, now containing only a single u16 for the available index. In the future we can add fields to this struct. Reviewed-by: Parav Pandit <parav@mellanox.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <eli@mellanox.com> Link: https://lore.kernel.org/r/20200804162048.22587-8-eli@mellanox.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vdpa: remove hard coded virtq numMax Gurtovoy2020-08-061-6/+3
| | | | | | | | | | | | | | | | | | | | This will enable vdpa providers to add support for multi queue feature and publish it to upper layers (vhost and virtio). Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200804162048.22587-7-eli@mellanox.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost-vdpa: support IOTLB batching hintsJason Wang2020-08-061-9/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patches extend the vhost IOTLB API to accept batch updating hints form userspace. When userspace wants update the device IOTLB in a batch, it may do: 1) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_BEGIN flag 2) Perform a batch of IOTLB updating via VHOST_IOTLB_UPDATE/INVALIDATE 3) Write vhost_iotlb_msg with VHOST_IOTLB_BATCH_END flag Vhost-vdpa may decide to batch the IOMMU/IOTLB updating in step 3 when vDPA device support set_map() ops. This is useful for the vDPA device that want to know all the mappings to tweak their own DMA translation logic. For vDPA device that doesn't require set_map(), no behavior changes. This capability is advertised via VHOST_BACKEND_F_IOTLB_BATCH capability. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200804162048.22587-5-eli@mellanox.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost-vdpa: support get/set backend featuresJason Wang2020-08-061-0/+18
| | | | | | | | | | | | | | | | | | | | This patch makes userspace can get and set backend features to vhost-vdpa. Signed-off-by: Cindy Lu <lulu@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200804162048.22587-4-eli@mellanox.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost: generialize backend features setting/gettingJason Wang2020-08-063-16/+19
| | | | | | | | | | | | | | | | | | Move the backend features setting/getting from net.c to vhost.c to be reused by vhost-vdpa. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200804162048.22587-3-eli@mellanox.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost-vdpa: refine ioctl pre-processingJason Wang2020-08-061-4/+5
| | | | | | | | | | | | | | | | Switch to use 'switch' to make the codes more easier to be extended. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20200804162048.22587-2-eli@mellanox.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>