summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2020-08-07166-7006/+6382
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull rdma updates from Jason Gunthorpe: "A quiet cycle after the larger 5.8 effort. Substantially cleanup and driver work with a few smaller features this time. - Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re - Removal of dead or redundant code across the drivers - RAW resource tracker dumps to include a device specific data blob for device objects to aide device debugging - Further advance the IOCTL interface, remove the ability to turn it off. Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands - Remove stubs related to devices with no pkey table - A shared CQ scheme to allow multiple ULPs to share the CQ rings of a device to give higher performance - Several more static checker, syzkaller and rare crashers fixed" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (121 commits) RDMA/mlx5: Fix flow destination setting for RDMA TX flow table RDMA/rxe: Remove pkey table RDMA/umem: Add a schedule point in ib_umem_get() RDMA/hns: Fix the unneeded process when getting a general type of CQE error RDMA/hns: Fix error during modify qp RTS2RTS RDMA/hns: Delete unnecessary memset when allocating VF resource RDMA/hns: Remove redundant parameters in set_rc_wqe() RDMA/hns: Remove support for HIP08_A RDMA/hns: Refactor hns_roce_v2_set_hem() RDMA/hns: Remove redundant hardware opcode definitions RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QP RDMA/include: Replace license text with SPDX tags RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting RDMA/cma: Execute rdma_cm destruction from a handler properly RDMA/cma: Remove unneeded locking for req paths RDMA/cma: Using the standard locking pattern when delivering the removal event RDMA/cma: Simplify DEVICE_REMOVAL for internal_id RDMA/efa: Add EFA 0xefa1 PCI ID RDMA/efa: User/kernel compatibility handshake mechanism ...
| * RDMA/mlx5: Fix flow destination setting for RDMA TX flow tableMichael Guralnik2020-08-061-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For RDMA TX flow table, set destination type to be 'port' and prevent creation of flows with TIR destination. As RDMA TX is an egress flow table the rules on this flow table should not forward traffic back to the NIC and should set the destination to be the port. Without the setting of this destination type flow rules on the RDMA TX flow tables are not created as FW invokes a syndrome for undefined destination for the rule. Fixes: 24670b1a3166 ("net/mlx5: Add support for RDMA TX steering") Link: https://lore.kernel.org/r/20200803055849.14947-1-leon@kernel.org Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/rxe: Remove pkey tableKamal Heib2020-07-316-77/+13
| | | | | | | | | | | | | | | | | | | | | | | | The RoCE spec requires RoCE devices to support only the default pkey. However the rxe driver maintains a 64 enties pkey table and uses only the first entry. Remove the pkey table and hard code a table of length one hard wired with the default pkey. Replace all checks of the pkey_table with a comparison to the default_pkey instead. Link: https://lore.kernel.org/r/20200721101618.686110-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/umem: Add a schedule point in ib_umem_get()Eric Dumazet2020-07-311-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mapping as little as 64GB can take more than 10 seconds, triggering issues on kernels with CONFIG_PREEMPT_NONE=y. ib_umem_get() already splits the work in 2MB units on x86_64, adding a cond_resched() in the long-lasting loop is enough to solve the issue. Note that sg_alloc_table() can still use more than 100 ms, which is also problematic. This might be addressed later in ib_umem_add_sg_table(), adding new blocks in sgl on demand. Link: https://lore.kernel.org/r/20200730015755.1827498-1-edumazet@google.com Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/hns: Fix the unneeded process when getting a general type of CQE errorXi Wang2020-07-302-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | If the hns ROCEE reports a general error CQE (types not specified by the IB General Specifications), it's no need to change the QP state to error, and the driver should just skip it. Fixes: 7c044adca272 ("RDMA/hns: Simplify the cqe code of poll cq") Link: https://lore.kernel.org/r/1595932941-40613-8-git-send-email-liweihang@huawei.com Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/hns: Fix error during modify qp RTS2RTSLang Cheng2020-07-301-1/+3
| | | | | | | | | | | | | | | | | | | | One qp state migrations legal configuration was deleted mistakenly. Fixes: 357f34294686 ("RDMA/hns: Simplify the state judgment code of qp") Link: https://lore.kernel.org/r/1595932941-40613-7-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/hns: Delete unnecessary memset when allocating VF resourceLang Cheng2020-07-301-2/+0
| | | | | | | | | | | | | | | | | | | | The hns_roce_cmq_setup_basic_desc() can clear the whole desc, so removes these redundant memset operations. Link: https://lore.kernel.org/r/1595932941-40613-6-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/hns: Remove redundant parameters in set_rc_wqe()Weihang Li2020-07-301-13/+17
| | | | | | | | | | | | | | | | | | | | | | There are some functions called by set_rc_wqe() use two parameters: "void *wqe" and "struct hns_roce_v2_rc_send_wqe *rc_sq_wqe", but the first one can be got from the second one. So remove the redundant wqe from related functions. Link: https://lore.kernel.org/r/1595932941-40613-5-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/hns: Remove support for HIP08_ALang Cheng2020-07-303-65/+47
| | | | | | | | | | | | | | | | | | | | | | HIP08_A is an temporary version and all features of it are supported by HIP08_B. So remove the relevant code. Link: https://lore.kernel.org/r/1595932941-40613-4-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/hns: Refactor hns_roce_v2_set_hem()Weihang Li2020-07-301-19/+26
| | | | | | | | | | | | | | | | | | | | The parts about preparing and sending mailbox to hardware is not strongly related to other codes in hns_roce_v2_set_hem(), and can be encapsulated into a separate function. Link: https://lore.kernel.org/r/1595932941-40613-3-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/hns: Remove redundant hardware opcode definitionsLang Cheng2020-07-302-30/+14
| | | | | | | | | | | | | | | | | | | | | | HNS_ROCE_SQ_OPCODE_XXXs and HNS_ROCE_V2_WQE_OP_XXXs have same values, so remove a set of redundant definitions. In addition, remove the suffix of HNS_ROCE_V2_WQE_OP_BIND_MW_TYPE. Link: https://lore.kernel.org/r/1595932941-40613-2-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QPMark Zhang2020-07-291-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | When dumping QPs bound to a counter, raw QPs should be allowed to dump without the CAP_NET_RAW privilege. This is consistent with what "rdma res show qp" does. Fixes: c4ffee7c9bdb ("RDMA/netlink: Implement counter dumpit calback") Link: https://lore.kernel.org/r/20200727095828.496195-1-leon@kernel.org Signed-off-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/include: Replace license text with SPDX tagsLeon Romanovsky2020-07-2931-946/+59
| | | | | | | | | | | | | | | | | | | | | | The header files in RDMA subsystem are dual licensed and can be described by simple SPDX tag, so replace all of them at once together with making them use the same coding style for header guard defines. Link: https://lore.kernel.org/r/20200719072521.135260-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wqJack Wang2020-07-292-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockdep triggers a warning from time to time when running a regression test: rnbd_client L685: </dev/nullb0@bla> Device disconnected. rnbd_client L1756: Unloading module workqueue: WQ_MEM_RECLAIM rtrs_client_wq:rtrs_clt_reconnect_work [rtrs_client] is flushing !WQ_MEM_RECLAIM ib_addr:process_one_req [ib_core] WARNING: CPU: 2 PID: 18824 at kernel/workqueue.c:2517 check_flush_dependency+0xad/0x130 The root cause is workqueue core expect flushing should not be done for a !WQ_MEM_RECLAIM wq from a WQ_MEM_RECLAIM workqueue. In above case ib_addr workqueue without WQ_MEM_RECLAIM, but rtrs_wq WQ_MEM_RECLAIM. To avoid the warning, remove the WQ_MEM_RECLAIM flag. Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20200724111508.15734-4-haris.iqbal@cloud.ionos.com Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/rtrs-clt: add an additional random 8 seconds before reconnectingDanil Kipnis2020-07-291-2/+12
| | | | | | | | | | | | | | | | | | | | | | In order to avoid all the clients to start reconnecting at the same time schedule the reconnect dwork with a random jitter of +[0,8] seconds. Fixes: 6a98d71daea1 ("RDMA/rtrs: client: main functionality") Link: https://lore.kernel.org/r/20200724111508.15734-2-haris.iqbal@cloud.ionos.com Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com> Signed-off-by: Md Haris Iqbal <haris.iqbal@cloud.ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/cma: Execute rdma_cm destruction from a handler properlyJason Gunthorpe2020-07-291-90/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a rdma_cm_id needs to be destroyed after a handler callback fails, part of the destruction pattern is open coded into each call site. Unfortunately the blind assignment to state discards important information needed to do cma_cancel_operation(). This results in active operations being left running after rdma_destroy_id() completes, and the use-after-free bugs from KASAN. Consolidate this entire pattern into destroy_id_handler_unlock() and manage the locking correctly. The state should be set to RDMA_CM_DESTROYING under the handler_lock to atomically ensure no futher handlers are called. Link: https://lore.kernel.org/r/20200723070707.1771101-5-leon@kernel.org Reported-by: syzbot+08092148130652a6faae@syzkaller.appspotmail.com Reported-by: syzbot+a929647172775e335941@syzkaller.appspotmail.com Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/cma: Remove unneeded locking for req pathsJason Gunthorpe2020-07-291-25/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The REQ flows are concerned that once the handler is called on the new cm_id the ULP can choose to trigger a rdma_destroy_id() concurrently at any time. However, this is not true, while the ULP can call rdma_destroy_id(), it immediately blocks on the handler_mutex which prevents anything harmful from running concurrently. Remove the confusing extra locking and refcounts and make the handler_mutex protecting state during destroy more clear. Link: https://lore.kernel.org/r/20200723070707.1771101-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/cma: Using the standard locking pattern when delivering the removal eventJason Gunthorpe2020-07-291-26/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whenever an event is delivered to the handler it should be done under the handler_mutex and upon any non-zero return from the handler it should trigger destruction of the cm_id. cma_process_remove() skips some steps here, it is not necessarily wrong since the state change should prevent any races, but it is confusing and unnecessary. Follow the standard pattern here, with the slight twist that the transition to RDMA_CM_DEVICE_REMOVAL includes a cma_cancel_operation(). Link: https://lore.kernel.org/r/20200723070707.1771101-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/cma: Simplify DEVICE_REMOVAL for internal_idJason Gunthorpe2020-07-291-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cma_process_remove() triggers an unconditional rdma_destroy_id() for internal_id's and skips the event deliver and transition through RDMA_CM_DEVICE_REMOVAL. This is confusing and unnecessary. internal_id always has cma_listen_handler() as the handler, have it catch the RDMA_CM_DEVICE_REMOVAL event and directly consume it and signal removal. This way the FSM sequence never skips the DEVICE_REMOVAL case and the logic in this hard to test area is simplified. Link: https://lore.kernel.org/r/20200723070707.1771101-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/efa: Add EFA 0xefa1 PCI IDGal Pressman2020-07-291-2/+4
| | | | | | | | | | | | | | | | | | | | Add support for 0xefa1 devices. Link: https://lore.kernel.org/r/20200722140312.3651-5-galpress@amazon.com Reviewed-by: Shadi Ammouri <sammouri@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/efa: User/kernel compatibility handshake mechanismGal Pressman2020-07-292-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a mechanism that performs an handshake between the userspace provider and kernel driver which verifies that the user supports all required features in order to operate correctly. The handshake verifies the needed functionality by comparing the reported device caps and the provider caps. If the device reports a non-zero capability the appropriate comp mask is required from the userspace provider in order to allocate the context. Link: https://lore.kernel.org/r/20200722140312.3651-4-galpress@amazon.com Reviewed-by: Shadi Ammouri <sammouri@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/efa: Expose minimum SQ sizeGal Pressman2020-07-295-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | The device reports the minimum SQ size required for creation. This patch queries the min SQ size and reports it back to the userspace library. Link: https://lore.kernel.org/r/20200722140312.3651-3-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Shadi Ammouri <sammouri@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/efa: Expose maximum TX doorbell batchGal Pressman2020-07-295-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The device reports the maximum number of bytes to be written before ringing the doorbell (zero means unlimited). This patch queries the max batch size and reports it back to the userspace library. Link: https://lore.kernel.org/r/20200722140312.3651-2-galpress@amazon.com Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Firas JahJah <firasj@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * IB/srpt: use new shared CQ mechanismYamin Friedman2020-07-292-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | Have the driver use shared CQs provided by the rdma core driver. This provides the advantage of improved efficiency handling interrupts. Link: https://lore.kernel.org/r/20200722135629.49467-3-maxg@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * IB/isert: use new shared CQ mechanismYamin Friedman2020-07-292-153/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Have the driver use shared CQs provided by the rdma core driver. Since this provides similar functionality to iser_comp it has been removed. Now there is no reason to allocate very large CQs when the driver is loaded while gaining the advantage of shared CQs. Previously when a single connection was opened a CQ was opened for every core with enough space for eight connections, this is a very large overhead that in most cases will not be utilized. Link: https://lore.kernel.org/r/20200722135629.49467-2-maxg@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * IB/iser: use new shared CQ mechanismYamin Friedman2020-07-292-106/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | Have the driver use shared CQs provided by the rdma core driver. Since this provides similar functionality to iser_comp it has been removed. Now there is no reason to allocate very large CQs when the driver is loaded while gaining the advantage of shared CQs. Link: https://lore.kernel.org/r/20200722135629.49467-1-maxg@mellanox.com Signed-off-by: Yamin Friedman <yaminf@mellanox.com> Acked-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/mlx5: Delete unreachable codeLeon Romanovsky2020-07-282-9/+4
| | | | | | | | | | | | | | | | Delete two occurrences of unreachable code discovered by the Coverity. Link: https://lore.kernel.org/r/20200727095746.495915-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * RDMA/core: Fix return error value in _ib_modify_qp() to negativeLi Heng2020-07-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | The error codes in _ib_modify_qp() are supposed to be negative errno. Fixes: 7a5c938b9ed0 ("IB/core: Check for rdma_protocol_ib only after validating port_num") Link: https://lore.kernel.org/r/1595645787-20375-1-git-send-email-liheng40@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Li Heng <liheng40@huawei.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * Merge branch 'mlx5_uar' into rdma.git /for-nextJason Gunthorpe2020-07-277-49/+180
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Meir Lichtinger says: ==================== ConnectX-7 supports setting relaxed ordering read/write mkey attribute by UMR, indicated by new HCA capabilities, so extend mlx5_ib driver to configure UMR control segment ==================== Based on the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux due to dependencies. * branch 'mlx5_uar': RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7 RDMA/mlx5: Use MLX5_SET macro instead of local structure RDMA/mlx5: ConnectX-7 new capabilities to set relaxed ordering by UMR
| | * RDMA/mlx5: Set mkey relaxed ordering by UMR with ConnectX-7Meir Lichtinger2020-07-273-12/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to ConnectX-7 UMR is not used when user passes relaxed ordering access flag. ConnectX-7 supports setting relaxed ordering read/write mkey attribute by UMR, indicated by new HCA capabilities. With ConnectX-7 driver uses UMR when user set relaxed ordering access flag, in contrast to previous silicon models. Specifically it includes setting relvant flags of mkey context mask in UMR control segment, and relaxed ordering write and read flags in UMR mkey context segment. Link: https://lore.kernel.org/r/20200716105248.1423452-4-leon@kernel.org Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Reviewed-by: Michael Guralnik <michaelgur@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| | * RDMA/mlx5: Use MLX5_SET macro instead of local structureMeir Lichtinger2020-07-273-20/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use generic mlx5 structure defined in mlx5_ifc.h to represent ConnectX device data structures instead of using structure defined specifically for mlx5_ib module. Link: https://lore.kernel.org/r/20200716105248.1423452-3-leon@kernel.org Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| | * RDMA/mlx5: ConnectX-7 new capabilities to set relaxed ordering by UMRMeir Lichtinger2020-07-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to ConnectX-7 setting mkey relaxed ordering read/write attributes by UMR is not supported. ConnectX-7 supports this option, which is indicated by two new HCA capabilities - relaxed_ordering_write_umr and relaxed_ordering_read_umr. Signed-off-by: Meir Lichtinger <meirl@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| | * net/mlx5: Enable count action for rules with allow actionMichael Guralnik2020-07-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Enable the creation of rules with allow and count actions. This enables using counters on egress flow tables. Signed-off-by: Michael Guralnik <michaelgur@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Add interface changes required for VDPAEli Cohen2020-07-162-16/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename mlx5_ifc_device_virtio_emulation_cap_bits to mlx5_ifc_virtio_emulation_cap_bits to match names produced by the tools producing these auto generated files. In addition missing capabilities that will be required by VDPA implementation. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Add VDPA interface type to supported enumerationsEli Cohen2020-07-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | VDPA is a new interface that will be added in subsequent patches. It uses mlx5 core devices and resources. Add an interface type for it. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | * net/mlx5: Support setting access rights of dma addressesEli Cohen2020-07-163-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mlx5_fill_page_frag_array() is used to populate dma addresses to resources that require it, such as QPs, RQs etc. When the resource is used, PA list permissions are ignored. For resources that use MTT list, the user is required to provide the access rights. Subsequent patches use resources that require MTT lists, so modify API and implementation to support that. Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | RDMA/mlx5: Fix typo in enum namePavel Machek2020-07-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Nnothing uses the enum name, so this is harmless. Fixes: 322694412400 ("IB/mlx5: Introduce driver create and destroy flow methods") Link: https://lore.kernel.org/r/20200724084112.GC31930@amd Signed-off-by: Pavel Machek (CIP) <pavel@denx.de> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | IB/hfi1: Use fallthrough pseudo-keywordGustavo A. R. Silva2020-07-2411-60/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7-rc7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Link: https://lore.kernel.org/r/20200721133455.GA14363@embeddedor Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA/uverbs: Silence shiftTooManyBitsSigned warningLeon Romanovsky2020-07-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix reported by kbuild warning. drivers/infiniband/core/uverbs_cmd.c:1897:47: warning: Shifting signed 32-bit value by 31 bits is undefined behaviour [shiftTooManyBitsSigned] BUILD_BUG_ON(IB_USER_LAST_QP_ATTR_MASK == (1 << 31)); ^ Link: https://lore.kernel.org/r/20200720175627.1273096-3-leon@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA/uverbs: Remove redundant assignmentsLeon Romanovsky2020-07-241-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kbuild reported the following warning, so clean whole uverbs_cmd.c file. drivers/infiniband/core/uverbs_cmd.c:1066:6: warning: Variable 'ret' is reassigned a value before the old one has been used. [redundantAssignment] ret = uverbs_request(attrs, &cmd, sizeof(cmd)); ^ drivers/infiniband/core/uverbs_cmd.c:1064:0: note: Variable 'ret' is reassigned a value before the old one has been used. int ret = -EINVAL; ^ Link: https://lore.kernel.org/r/20200720175627.1273096-2-leon@kernel.org Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA/mlx5: Add missing srcu_read_lock in ODP implicit flowMaor Gottlieb2020-07-243-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the locking scheme, mlx5_ib_update_xlt() should be called with srcu_read_lock(dev->odp->srcu). Prefetch missed this. This fixes the below WARN from lockdep_assert_held(): WARNING: CPU: 1 PID: 1130 at drivers/infiniband/hw/mlx5/odp.c:132 mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib] Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter overlay ib_srp scsi_transport_srp rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core mlx5_core mlxfw ptp pps_core CPU: 1 PID: 1130 Comm: kworker/u16:11 Tainted: G W 5.8.0-rc5_for_upstream_debug_2020_07_13_11_04 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 Workqueue: events_unbound mlx5_ib_prefetch_mr_work [mlx5_ib] RIP: 0010:mlx5_odp_populate_xlt+0x175/0x180 [mlx5_ib] Code: 08 e2 85 c0 0f 84 65 ff ff ff 49 8b 87 60 01 00 00 be ff ff ff ff 48 8d b8 b0 39 00 00 e8 93 e0 50 e1 85 c0 0f 85 45 ff ff ff <0f> 0b e9 3e ff ff ff 0f 0b eb c7 0f 1f 44 00 00 48 8b 87 98 0f 00 RSP: 0018:ffff88840f44fc68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88840cc9d000 RCX: ffff88840efcd940 RDX: 0000000000000000 RSI: ffff88844871b9b0 RDI: ffff88840efce100 RBP: ffff88840cc9d040 R08: 0000000000000040 R09: 0000000000000001 R10: ffff88846ced3068 R11: 0000000000000000 R12: 00000000000156ec R13: 0000000000000004 R14: 0000000000000004 R15: ffff888439941000 FS: 0000000000000000(0000) GS:ffff88846fa80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8536d12430 CR3: 0000000437a5e006 CR4: 0000000000360ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: mlx5_ib_update_xlt+0x37c/0x7c0 [mlx5_ib] pagefault_mr+0x315/0x440 [mlx5_ib] mlx5_ib_prefetch_mr_work+0x56/0xa0 [mlx5_ib] process_one_work+0x215/0x5c0 worker_thread+0x3c/0x380 ? process_one_work+0x5c0/0x5c0 kthread+0x133/0x150 ? kthread_park+0x90/0x90 ret_from_fork+0x1f/0x30 Hold the SRCU during prefetch, even though it strictly isn't needed since prefetch is holding the num_deferred_work it does make it easier to reason about. Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy") Link: https://lore.kernel.org/r/20200719065747.131157-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA/core: Update write interface to use automatic object lifetimeLeon Romanovsky2020-07-241-207/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The automatic object lifetime model allows us to change the write() interface to have the same logic as the ioctl() path. Update the create/alloc functions to be in the following format, so the code flow will be the same: * Allocate objects * Initialize them * Call to the drivers, this is last step that is allowed to fail * Finalize object * Return response and allow to core code to handle abort/commit respectively. Link: https://lore.kernel.org/r/20200719052223.75245-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA/core: Align abort/commit object scheme for write() and ioctl() pathsLeon Romanovsky2020-07-244-1/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create the same logic flow for the write() interface as we have for the ioctl() path by making sure that the object is committed or aborted automatically after HW object creation. Link: https://lore.kernel.org/r/20200719052223.75245-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA/mlx5: Allow SQ modificationMaor Gottlieb2020-07-241-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the SQ is set to a ready state when the RAW QP is modified to INIT. When the TIS is modified, e.g. to change the lag_tx_affinity, then SQs which are already in the ready state will not be affected. Open a window to modify the SQ behavior by setting the SQ as ready only when QP was modified to RTS. Link: https://lore.kernel.org/r/20200716105416.1423826-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reviewed-by: Mark Zhang <markz@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA: rdma_user_ioctl.h: fix a duplicated word + clarifyRandy Dunlap2020-07-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Change the repeated word "it" in a comment to "it to". Also insert a dash in the sentence to add clarity. Link: https://lore.kernel.org/r/20200719003220.21250-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA/bnxt_re: Update maintainers for Broadcom rdma driverDevesh Sharma2020-07-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Adding a new co-maintainer for Broadcom's RDMA driver. Link: https://lore.kernel.org/r/1594822619-4098-7-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA/bnxt_re: Change wr posting logic to accommodate variable wqesDevesh Sharma2020-07-205-173/+398
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modifying the post-send and post-recv to initialize the wqes slot by slot dynamically depending on the number of max sges requested by consumer at the time of QP creation. Changed the QP creation logic to determine the size of SQ and RQ in 16B slots based on the number of wqe and number of SGEs requested by consumer Link: https://lore.kernel.org/r/1594822619-4098-6-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA/bnxt_re: Add helper data structuresDevesh Sharma2020-07-202-0/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding few helper data structure which are useful to initialize hardware send wqe in variable wqe mode. Adding a qp flag in HSI to indicate variable wqe is enabled for this qp. Link: https://lore.kernel.org/r/1594822619-4098-5-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA/bnxt_re: Pull psn buffer dynamically based on prodDevesh Sharma2020-07-202-58/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changing the PSN management memory buffers from statically initialized to dynamic pull scheme. During create qp only the start pointers are initialized and during post-send the psn buffer is pulled based on current producer index. Adjusting post_send code to accommodate dynamic psn-pull and changing post_recv code to match post-send code wrt pseudo flush wqe generation. Link: https://lore.kernel.org/r/1594822619-4098-4-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
| * | RDMA/bnxt_re: introduce a function to allocate swqDevesh Sharma2020-07-203-171/+207
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bnxt_re driver now allocates shadow sq and rq to maintain per wqe wr_id and few other flags required to support variable wqe. Segregated the allocation of shadow queue in a separate function and adjust the cqe polling logic. The new polling logic is based on shadow queue indices. Link: https://lore.kernel.org/r/1594822619-4098-3-git-send-email-devesh.sharma@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>