Commit message | Author | Age | Files | Lines
* IB/cm: Fix sleeping in atomic when RoCE is usedRoland Dreier2017-08-311-19/+44
A couple of places in the CM do

    spin_lock_irq(&cm_id_priv->lock);
    ...
    if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))

However when the underlying transport is RoCE, this leads to a sleeping function being called with the lock held - the callchain is

    cm_alloc_response_msg() ->
      ib_create_ah_from_wc() ->
        ib_init_ah_from_wc() ->
          rdma_addr_find_l2_eth_by_grh() ->
            rdma_resolve_ip()

and rdma_resolve_ip() starts out by doing

    req = kzalloc(sizeof *req, GFP_KERNEL);

not to mention rdma_addr_find_l2_eth_by_grh() doing

    wait_for_completion(&ctx.comp);

to wait for the task that rdma_resolve_ip() queues up.

Fix this by moving the AH creation out of the lock.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
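The reordering described above can be sketched in userspace C. This is only an illustration under stated assumptions: `fake_lock()`, `may_sleep_alloc()` and `send_response()` are hypothetical stand-ins, not the CM code, and the "lock" is a plain flag so the sketch can assert that nothing that may sleep runs while it is held.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a spinlock: 1 while "held". */
static int lock_held;

static void fake_lock(void)   { lock_held = 1; }
static void fake_unlock(void) { lock_held = 0; }

/* Stand-in for a function that may sleep (like a GFP_KERNEL allocation).
 * Calling it with the lock held is the bug, so it asserts. */
static void *may_sleep_alloc(size_t size)
{
    assert(!lock_held);
    return calloc(1, size);
}

struct response { int state; };

/* Fixed ordering: allocate first, then take the lock for the state update.
 * Returns 0 on success, -1 if the allocation failed. */
static int send_response(int *shared_state, struct response **out)
{
    struct response *msg = may_sleep_alloc(sizeof(*msg));

    if (!msg)
        return -1;

    fake_lock();
    msg->state = *shared_state;   /* only non-sleeping work under the lock */
    *shared_state += 1;
    fake_unlock();

    *out = msg;
    return 0;
}
```

Moving the `may_sleep_alloc()` call between `fake_lock()` and `fake_unlock()` would trip the assertion, which mirrors the sleeping-in-atomic condition the commit removes.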
* IB/core: Add support to finalize objects in one transactionMatan Barak2017-08-303-1/+114
The new ioctl based infrastructure either commits or rolls back all objects of the method as one transaction. In order to do that, we introduce a notion of dealing with a collection of objects that are related to a specific method.

This also requires adding a notion of a method and attribute. A method contains a hash of attributes, where each bucket contains several attributes. The attributes are hashed according to their namespace which resides in the four upper bits of the id.

For example, an object could be a CQ, which has an action of CREATE_CQ. This action has multiple attributes. For example, the CQ's new handle and the comp_channel. Each layer in this hierarchy - objects, methods and attributes - is split into namespaces. The basic example for that is one namespace representing the default entities and another one representing the driver specific entities.

When declaring these methods and attributes, we actually declare their specifications. When a method is executed, we allocate some space to hold auxiliary information. This auxiliary information contains meta-data about the required objects, such as pointers to their type information, pointers to the uobjects themselves (if they exist), etc. The specification, along with the auxiliary information we allocated and filled, is given to the finalize_objects function.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
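Splitting an id into its namespace and its index within that namespace could look like the sketch below. The 16-bit id width and the names `attr_namespace()` and `attr_index()` are assumptions made for illustration; the commit message only states that the namespace resides in the four upper bits of the id.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ids are 16 bits wide, so the four upper bits are bits 12..15. */
#define SKETCH_NS_SHIFT 12
#define SKETCH_NS_MASK  0xF000u

/* Namespace (hash bucket) an attribute id belongs to. */
static unsigned int attr_namespace(uint16_t id)
{
    unsigned int ns = (id & SKETCH_NS_MASK) >> SKETCH_NS_SHIFT;

    assert(ns < 16);   /* four bits can only encode 16 namespaces */
    return ns;
}

/* Position of the attribute inside its namespace. */
static unsigned int attr_index(uint16_t id)
{
    return id & 0x0FFFu;
}
```

With this layout, id `0x1003` would be attribute 3 of namespace 1, for example a driver specific attribute if namespace 0 holds the default entities.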
* IB/core: Add a generic way to execute an operation on a uobjectMatan Barak2017-08-303-0/+127
The ioctl infrastructure treats all user-objects in the same manner. It gets object ids from user-space and, by using the object type and type attributes mentioned in the object specification, it executes the required method.

Passing an object id from user-space as an attribute is carried out in three stages. The first is carried out before the actual handler and the last is carried out afterwards.

The different supported operations are read, write, destroy and create. In the first stage, the former three actions just fetch the object from the repository (by using its id) and lock it. The last action allocates a new uobject. Afterwards, the second stage is carried out when the handler itself carries out the required modification of the object. The last stage is carried out after the handler finishes and commits the result. The former two operations just unlock the object. Destroy calls the "free object" operation, taking into account the object's type, and releases the uobject as well. Creation just adds the new uobject to the repository, making the object visible to the application.

In order to abstract these details from the ioctl infrastructure layer, we add the uverbs_get_uobject_from_context and uverbs_finalize_object functions, which correspond to the first and last stages respectively.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* Documentation: Hardware tag matchingArtemy Kovalyov2017-08-291-0/+64
| | | | | | | | | | Add document providing definitions of terms and core explanations for tag matching (TM) protocols, eager and rendezvous, TM application header, tag list manipulations and matching process. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/mlx5: Support IB_SRQT_TMArtemy Kovalyov2017-08-292-5/+22
| | | | | | | | | | Pass to mlx5_core flag to enable rendezvous offload, list_size and CQ when SRQ created with IB_SRQT_TM. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* net/mlx5: Add XRQ supportArtemy Kovalyov2017-08-293-10/+146
Add support for the new XRQ (eXtended shared Receive Queue) hardware object. It supports SRQ semantics with the addition of extended receive buffer topologies and offloads.

Currently supports tag matching topology and rendezvous offload.

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/mlx5: Fill XRQ capabilitiesArtemy Kovalyov2017-08-292-0/+15
| | | | | | | | | Provide driver specific values for XRQ capabilities. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/uverbs: Expose XRQ capabilitiesArtemy Kovalyov2017-08-292-0/+25
| | | | | | | | | Make XRQ capabilities available via ibv_query_device() verb. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/uverbs: Add new SRQ type IB_SRQT_TMArtemy Kovalyov2017-08-291-1/+5
| | | | | | | | | | | | | | | | | | | Add new SRQ type capable of new tag matching feature. When SRQ receives a message it will search through the matching list for the corresponding posted receive buffer. The process of searching the matching list is called tag matching. In case the tag matching results in a match, the received message will be placed in the address specified by the receive buffer. In case no match was found the message will be placed in a generic buffer until the corresponding receive buffer will be posted. These messages are called unexpected and their set is called an unexpected list. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/uverbs: Add XRQ creation parameter to UAPIArtemy Kovalyov2017-08-291-1/+1
| | | | | | | | | | | Add tm_list_size parameter to struct ib_uverbs_create_xsrq. If SRQ type is tag-matching this field defines maximum size of tag matching list. Otherwise, it is expected to be zero. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/core: Add new SRQ type IB_SRQT_TMArtemy Kovalyov2017-08-291-2/+8
| | | | | | | | | | | | | | | | | | | This patch adds new SRQ type - IB_SRQT_TM. The new SRQ type supports tag matching and rendezvous offloads for MPI applications. When SRQ receives a message it will search through the matching list for the corresponding posted receive buffer. The process of searching the matching list is called tag matching. In case the tag matching results in a match, the received message will be placed in the address specified by the receive buffer. In case no match was found the message will be placed in a generic buffer until the corresponding receive buffer will be posted. These messages are called unexpected and their set is called an unexpected list. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
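The matching flow described above can be sketched as two small lists. This is a simplified model, not the hardware or verbs interface: all names (`tm_srq`, `tm_receive()`, `tm_post_recv()`) are hypothetical, tags are compared for exact equality, and both lists have an arbitrary fixed capacity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TM_MAX 8   /* arbitrary capacity for the sketch */

struct tm_list {
    uint64_t tags[TM_MAX];
    size_t count;
};

/* Append a tag; returns 0 on success, -1 if the list is full. */
static int tm_list_add(struct tm_list *l, uint64_t tag)
{
    assert(l->count <= TM_MAX);
    if (l->count == TM_MAX)
        return -1;
    l->tags[l->count++] = tag;
    return 0;
}

/* Remove the first entry equal to tag; returns 1 if found, 0 otherwise. */
static int tm_list_take(struct tm_list *l, uint64_t tag)
{
    for (size_t i = 0; i < l->count; i++) {
        if (l->tags[i] == tag) {
            for (size_t j = i + 1; j < l->count; j++)
                l->tags[j - 1] = l->tags[j];
            l->count--;
            return 1;
        }
    }
    return 0;
}

struct tm_srq {
    struct tm_list posted;      /* matching list of posted receive buffers */
    struct tm_list unexpected;  /* messages that arrived before their receive */
};

/* Incoming message: 1 if it matched a posted receive, 0 if it was queued
 * as unexpected, -1 if the unexpected list is full. */
static int tm_receive(struct tm_srq *q, uint64_t tag)
{
    if (tm_list_take(&q->posted, tag))
        return 1;
    return tm_list_add(&q->unexpected, tag) == 0 ? 0 : -1;
}

/* Posting a receive: 1 if it consumed an unexpected message, 0 if it was
 * added to the matching list, -1 if the matching list is full. */
static int tm_post_recv(struct tm_srq *q, uint64_t tag)
{
    if (tm_list_take(&q->unexpected, tag))
        return 1;
    return tm_list_add(&q->posted, tag) == 0 ? 0 : -1;
}
```

A message whose tag is already posted is delivered straight to that buffer; otherwise it waits on the unexpected list until a receive with the same tag is posted.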
* IB/core: Separate CQ handle in SRQ contextArtemy Kovalyov2017-08-296-39/+60
| | | | | | | | | | | Before this change CQ attached to SRQ was part of XRC specific extension. Moving CQ handle out makes it available to other types extending SRQ functionality. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Reviewed-by: Yossi Itigin <yosefe@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/core: Add XRQ capabilitiesArtemy Kovalyov2017-08-291-0/+19
This patch adds following TM XRQ capabilities:

* max_rndv_hdr_size - Max size of rendezvous request message
* max_num_tags - Max number of entries in tag matching list
* max_ops - Max number of outstanding list operations
* max_sge - Max number of SGE in tag matching entry
* flags - the following flags are currently defined:
  - IB_TM_CAP_RC - Support tag matching on RC transport

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* net/mlx5: Update HW layout definitionsArtemy Kovalyov2017-08-291-2/+7
* add offload_type field to mlx5_ifc_qpc_bits
* update mlx5_ifc_xrqc_bits layout

Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/rxe: Handle NETDEV_CHANGE eventsAndrew Boyer2017-08-291-1/+6
| | | | | | | | | | Without this fix, ports configured on top of ixgbe miss link up notifications. ibv_query_port() will continue to return IBV_PORT_DOWN even though the port is up and working. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/rxe: Avoid ICRC errors by copying into the skb firstAndrew Boyer2017-08-291-6/+6
| | | | | | | | | | | | | | | | | | The current process is to first calculate the CRC and then copy the client data into the packet. This leaves a window in which the packet contents and CRC can get out of sync, if the client changes the data after the CRC is calculated but before the data is copied. By copying the data into the packet and then calculating the CRC directly from the packet contents we eliminate the window. This can be seen with qperf's ud_bi_bw test. This seems like very strange/reckless client behavior, but whether the client has mangled its data or not RXE should be able to transfer it reliably. Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
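The copy-then-checksum ordering can be illustrated with a small userspace sketch. It uses a generic bitwise CRC-32 as a stand-in for the ICRC, and `struct packet`, `build_packet()` and `packet_crc_ok()` are hypothetical names, not rxe functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_MAX 64   /* arbitrary payload capacity for the sketch */

struct packet {
    uint8_t data[PKT_MAX];
    size_t len;
    uint32_t crc;
};

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t sketch_crc32(const uint8_t *p, size_t n)
{
    uint32_t c = 0xFFFFFFFFu;

    for (size_t i = 0; i < n; i++) {
        c ^= p[i];
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
    return ~c;
}

/* Copy the client data first, then checksum the packet's own bytes, so a
 * later change to the client buffer cannot desynchronize data and CRC. */
static int build_packet(struct packet *pkt, const uint8_t *payload, size_t len)
{
    if (len > PKT_MAX)
        return -1;
    memcpy(pkt->data, payload, len);
    pkt->len = len;
    pkt->crc = sketch_crc32(pkt->data, pkt->len);
    return 0;
}

/* Returns 1 when the stored CRC still matches the packet contents. */
static int packet_crc_ok(const struct packet *pkt)
{
    assert(pkt->len <= PKT_MAX);
    return sketch_crc32(pkt->data, pkt->len) == pkt->crc;
}
```

Because the CRC is computed from `pkt->data` rather than from the caller's buffer, the client can overwrite its buffer at any point after `memcpy()` without affecting the packet.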
* IB/rxe: Another fix for broken receive queue drainingAndrew Boyer2017-08-291-1/+3
| | | | | | | | | This fixes another path in rxe_requester() that might overlook stale SKBs, preventing cleanup. Fixes: 1217197142d1 ("rxe: fix broken receive queue draining") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/rxe: Remove unneeded initialization in prepare6()Andrew Boyer2017-08-291-1/+1
| | | | | | | Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/rxe: Fix up rxe_qp_cleanup()Andrew Boyer2017-08-291-7/+2
| | | | | | | | | | Replace sk_dst_get()/dst_release() in rxe_qp_cleanup() with sk_dst_reset(). sk_dst_get() takes a new reference on dst, so the dst_release() doesn't actually release the original reference, which was the design intent. Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
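Why a get followed by a release leaves the cached reference in place can be seen in a toy refcount model. The types and helpers below (`sk_dst`, `cache_get()`, `cache_reset()`) are hypothetical and only mimic the behaviour the commit message describes; they are not the kernel's socket or dst code.

```c
#include <assert.h>
#include <stddef.h>

struct sk_dst { int refcnt; };

/* The cache itself owns one reference to the dst it points at. */
struct sk_sock { struct sk_dst *cached; };

static void dst_put(struct sk_dst *d)
{
    assert(d->refcnt > 0);
    d->refcnt--;
}

/* "Get": hand out the cached dst with an extra reference for the caller. */
static struct sk_dst *cache_get(struct sk_sock *s)
{
    if (s->cached)
        s->cached->refcnt++;
    return s->cached;
}

/* "Reset": detach the cached dst and drop the cache's own reference. */
static void cache_reset(struct sk_sock *s)
{
    struct sk_dst *old = s->cached;

    s->cached = NULL;
    if (old)
        dst_put(old);
}
```

In this model `dst_put(cache_get(&s))` is a net no-op on the count, while `cache_reset(&s)` is the call that actually gives up the reference held by the cache.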
* IB/rxe: Add dst_clone() in prepare_ipv6_hdr()Andrew Boyer2017-08-291-1/+1
| | | | | | | | Otherwise the reference count goes negative as IPv6 packets complete. Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/rxe: Fix destination cache for IPv6Andrew Boyer2017-08-292-1/+7
| | | | | | | | | | | | | | To successfully match an IPv6 path, the path cookie must match. Store it in the QP so that the IPv6 path can be reused. Replace open-coded version of dst_check() with the actual call, fixing the logic. The open-coded version skips the check call if dst->obsolete is 0 (DST_OBSOLETE_NONE), proceeding to replace the route. DST_OBSOLETE_NONE means that the route may continue to be used, though. Fixes: 4ed6ad1eb30e ("IB/rxe: Cache dst in QP instead of getting it...") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/rxe: Fix up the responder's find_resources() functionAndrew Boyer2017-08-291-1/+1
| | | | | | | | | | | | The resource array is sized by max_dest_rd_atomic, not max_rd_atomic. Iterating over max_rd_atomic entries of qp->resp.resources[] will cause incorrect behavior when the two attributes are different (or even crash if max_rd_atomic is larger). Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/rxe: Remove dangling prototypeAndrew Boyer2017-08-291-2/+0
| | | | | | | | Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Acked-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/rxe: Disable completion upcalls when a CQ is destroyedAndrew Boyer2017-08-294-0/+24
This prevents the stack from accessing userspace objects while they are being torn down.

One possible sequence of events:
- Userspace program exits
- ib_uverbs_cleanup_ucontext() runs, calling ib_destroy_qp(), ib_destroy_cq(), etc. and releasing/freeing the UCQ
- The QP still has tasklets running, so it isn't destroyed yet
- The CQ is referenced by the QP, so the CQ isn't destroyed yet
- The UCQ is kfree()'d anyway
- A send work request completes
- rxe_send_complete() calls cq->ibcq.comp_handler()
- ib_uverbs_comp_handler() runs and crashes; the event queue is checked for is_closed, but it has no way to check the ib_ucq_object before accessing it

The reference counting on the CQ doesn't protect against this since the CQ hasn't been destroyed yet.

There's no available interface to deregister the UCQ from the CQ, and it didn't appear that attempting to add reference counting to the UCQ was going to be a good way to go since this solution is much simpler.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Andrew Boyer <andrew.boyer@dell.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/rxe: Move refcounting earlier in rxe_send()Andrew Boyer2017-08-291-3/+5
| | | | | | | | | | | | | | The network stack will call nskb's destructor, rxe_skb_tx_dtor(), if the packet gets dropped by ip_local_out()/ip6_local_out(). Thus we need to add the QP ref before output to avoid extra dereferences during network congestion. This could lead to unwanted destruction of the QP. Fix up the skb_out accounting, too. Fixes: fda85ce91240 ("IB/rxe: Fix kernel panic from skb destructor") Signed-off-by: Andrew Boyer <andrew.boyer@dell.com> Acked-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/rdmavt: Handle dereg of inuse MRs properlyMike Marciniszyn2017-08-294-21/+216
A destroy of an MR prior to destroying the QP can cause the following diagnostic if the QP is referencing the MR being de-registered:

hfi1 0000:05:00.0: hfi1_0: rvt_dereg_mr timeout mr ffff880856210800 pd ffff880859b20b00

The solution is that when a non-zero refcount is encountered while the MR is being destroyed, the QPs need to be iterated, looking for QPs in the same PD as the MR. If rvt_qp_mr_clean() detects that any such QP references the rkey/lkey, the QP needs to be put into an error state via a call to rvt_qp_error(), which will trigger the clean up of any stuck references.

This solution is as specified in IBTA 1.3 Volume 1 11.2.10.5.

[This is reproduced with the 0.4.9 version of qperf and the rc_bw test]

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/qib: Convert qp_stats debugfs interface to use new iterator APIMike Marciniszyn2017-08-293-63/+16
| | | | | | | | | Continue porting copy/paste code into rdmavt from qib. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Convert qp_stats debugfs interface to use new iterator APIMike Marciniszyn2017-08-293-107/+14
| | | | | | | | | Continue moving copy/paste code into rdmavt. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Convert hfi1_error_port_qps() to use new QP iteratorMike Marciniszyn2017-08-291-38/+41
| | | | | | | | | | Change hfi1_error_port_qps() to use the new rvt_qp_iter() in its QP scanning. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/rdmavt: Add QP iterator API for QPsMike Marciniszyn2017-08-292-0/+173
There are currently 3 spots in the qib and hfi1 driver that have knowledge of the internal QP hash list that should only be in scope to rdmavt QP code.

Add an iterator API for processing all QPs to hide the nature of the RCU hashlist.

The API consists of:
- rvt_qp_iter_init()
  * For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter_next()
  * For iterating QPs one at a time for seq_file semantics
- rvt_qp_iter()
  * For iterating all QPs

The first two are used for things like seq_file prints.

The last is for code that just needs to iterate all QPs in the system.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Use accessor to determine ring sizeKaike Wan2017-08-291-1/+1
| | | | | | | | | | | The qp_stats print will soon be moving to rdmavt, so use the proper accessor to get the ring size rather than a driver supplied constant. Fixes: Commit ff8d836efe06 ("IB/hfi1: Add receiving queue info to qp_stats") Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/qib: Stricter bounds checking for copy to bufferKamenee Arumugam2017-08-291-2/+2
| | | | | | | | | | Replace 'strcpy' with 'strncpy' to restrict the number of bytes copied to the buffer. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
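One point worth keeping in mind with this kind of change is that strncpy() does not NUL-terminate the destination when the source is at least as long as the limit. A minimal sketch of a bounded copy that always terminates follows; `bounded_copy()` is a hypothetical helper and not the code from the qib patch.

```c
#include <assert.h>
#include <string.h>

/* Copy at most size - 1 bytes from src and always NUL-terminate dst.
 * size is the full capacity of dst and must be at least 1. */
static void bounded_copy(char *dst, size_t size, const char *src)
{
    assert(size > 0);
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';   /* strncpy does not terminate on truncation */
}
```

Called as `bounded_copy(buf, sizeof(buf), name)`, the copy can never write past `buf`, and the result is a valid C string even when `name` had to be truncated.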
* IB/hif1: Remove static tracing from SDMA hot pathMichael J. Ruhl2017-08-297-29/+255
| | | | | | | | | | | | | The hfi1_cdbg() macro can be instantiated in the hot path even when it is not in use. This shows up on perf profiles. Rework the macros (for SDMA and MMU), to use the trace interface directly to eliminate this performance hit. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Acquire QSFP cable information on loopbackJan Sokolowski2017-08-291-0/+15
| | | | | | | | | | | | | Currently, QSFP information is not queried in cases where loopback was set up and QSFP module is present. Acquire QSFP information in case of loopback. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* i40iw: make some structures constBhumika Goyal2017-08-291-3/+3
| | | | | | | | Make some structures const as they are only used during a copy operation. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: constify vm_operations_structArvind Yadav2017-08-291-1/+1
vm_operations_struct structures are not supposed to change at runtime. The vm_area_struct structure works with a const vm_operations_struct, so mark the non-const vm_operations_struct structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/bnxt_re: remove unnecessary call to memsetHimanshu Jha2017-08-291-2/+0
A call to memset to assign 0 immediately after allocating memory with kzalloc is unnecessary, as kzalloc already returns zero-filled memory.

Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/usnic: check for allocation failureDan Carpenter2017-08-291-0/+2
| | | | | | | | | | usnic_uiom_get_dev_list() can return ERR_PTR(-ENOMEM) so we should check for that. Fixes: e3cf00d0a87f ("IB/usnic: Add Cisco VIC low-level hardware driver") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
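The reason a plain NULL check is not enough for such a return value can be shown with a userspace imitation of the error-pointer convention. The helpers below (`sk_err_ptr()`, `sk_is_err()`, `sk_ptr_err()`, `use_dev_list()`) are hypothetical re-creations written for this sketch; they are not the kernel's macros and not the usnic code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SK_MAX_ERRNO 4095   /* assumption: errno values fit in 1..4095 */

/* Encode a negative errno value as a pointer near the top of the
 * address space. */
static void *sk_err_ptr(long err)
{
    assert(err < 0 && err >= -SK_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* Decode the errno value stored in an error pointer. */
static long sk_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if p falls in the range reserved for encoded errors. */
static int sk_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SK_MAX_ERRNO;
}

/* Caller-side check in the style the commit adds: propagate the error
 * instead of dereferencing an error pointer. */
static int use_dev_list(void *list)
{
    if (sk_is_err(list))
        return (int)sk_ptr_err(list);
    return 0;   /* the list would be used here */
}
```

An encoded `-ENOMEM` is a non-NULL pointer, so only the range check in `sk_is_err()` catches it before the caller touches the list.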
* IB/hfi1: Add opcode states to qp_statsMike Marciniszyn2017-08-291-1/+3
| | | | | | | | | These fields allow for debugging send engine processing. Reviewed-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Add received request info to qp_statsKaike Wan2017-08-291-2/+9
| | | | | | | | | | | | The rvt_ack_entry pointed to by s_tail_ack_queue provides important info about the request that has just been processed or is being processed on the responder side of a RC connection. This patch adds this info to the qp_stats to assist debugging. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Fix whitespace alignment issue for MADKamenee Arumugam2017-08-291-1/+1
| | | | | | | | | | Fix a tab alignment issue present in pr_err_ratelimited error message. Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Move structure and MACRO definitions in user_sdma.c to user_sdma.hHarish Chegondi2017-08-292-168/+166
| | | | | | | | | | Clean up user_sdma.c by moving the structure and MACRO definitions into the header file user_sdma.h Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Move structure definitions from user_exp_rcv.c to user_exp_rcv.hHarish Chegondi2017-08-294-19/+19
| | | | | | | | | | | | | Clean up user_exp_rcv.c file by moving structure definitions into header file user_exp_rcv.h. Since these structure definitions depend on the structure definitions in mmu_rb.h, move #include "mmu_rb.h" above the include "user_exp_rcv.h" or include of header files that include user_exp_rcv.h Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Remove duplicate definitions of num_user_pages() functionHarish Chegondi2017-08-293-20/+12
| | | | | | | | | | | num_user_pages() function has been defined in both user_exp_rcv.c file and user_sdma.c file. Move the function definition to a header file so there is only one definition in the source repo. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Fix the bail out code in pin_vector_pages() functionHarish Chegondi2017-08-291-7/+10
In pin_vector_pages() function, if there is any error while pinning the pages or while adding a pinned buffer to the cache, the bail out code needs to unpin any pinned pages that are not in the cache and adjust the n_locked counter that counts the total pages pinned. The current bail out code doesn't seem to be doing it right in two cases:

1. Before pinning required pages for a buffer, the SDMA pinned buffer cache is searched to see if the virtual address range that needs to be pinned is already pinned. If there isn't a hit in the cache, a new node is created for the buffer and is added to the cache after the buffer is pinned. If adding the new node to the cache fails, the n_locked count is decremented properly but the pinned pages are not freed. This commit fixes this issue.

2. If there is a hit in the SDMA cache, but the cached buffer doesn't have enough pages to cover the entire address range that needs to be pinned, the node for the cached buffer is extracted from the cache, remaining pages needed are pinned and added to the node. The node is finally added back into the cache. If there is an error pinning the extra pages, the bail out code frees all the pages in the node but the n_locked count is not being decremented by the number of pages in the node that are freed. This commit fixes this issue.

This commit fixes the above two issues by creating a new function that frees the pages in a node and decrements the n_locked count by the number of pages freed.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
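The idea of one helper that frees a node's pages and adjusts the counter together can be modelled with plain counters. The structures and functions below (`pin_node`, `pin_state`, `pin_pages()`, `unpin_node()`) are hypothetical simplifications; they track page counts only and do not pin real memory.

```c
#include <assert.h>
#include <stddef.h>

struct pin_node  { size_t npages; };    /* pages pinned for one buffer */
struct pin_state { size_t n_locked; };  /* total pages pinned overall */

/* Pin count more pages for a node and account for them globally. */
static void pin_pages(struct pin_state *st, struct pin_node *n, size_t count)
{
    n->npages += count;
    st->n_locked += count;
}

/* Bail-out helper: release everything the node holds and decrement the
 * global counter by exactly the number of pages released. */
static void unpin_node(struct pin_state *st, struct pin_node *n)
{
    assert(st->n_locked >= n->npages);
    st->n_locked -= n->npages;
    n->npages = 0;
}
```

Keeping both updates inside `unpin_node()` means no error path can free pages without decrementing the counter, or decrement the counter without freeing pages, which are the two mismatches listed above.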
* IB/hfi1: Clean up pin_vector_pages() functionHarish Chegondi2017-08-291-34/+45
| | | | | | | | | | Clean up pin_vector_pages() function by moving page pinning related code to a separate function since it really stands on its own. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Clean up user_sdma_send_pkts() functionHarish Chegondi2017-08-291-59/+82
| | | | | | | | | | user_sdma_send_pkts() function is unnecessarily long. Clean it up by moving some of its code into separate functions. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Clean up hfi1_user_exp_rcv_setup functionHarish Chegondi2017-08-292-88/+153
| | | | | | | | | | | | Clean up hfi1_user_exp_rcv_setup function by moving page pinning and unpinning related code to separate functions. In order to reduce the number of parameters passed between functions, a new data structure struct tid_user_buf is defined and used. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Improve local kmem_cache_alloc performanceMichael J. Ruhl2017-08-293-27/+6
| | | | | | | | | | | | | | Performance analysis shows that the cache callback function sdma_kmem_cache_ctor contributes to 1/2 of the kmem_cache_allocs time. Since all of the fields in the allocated data structure are initialized in the code path, remove the _ctor function. Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/hfi1: Ratelimit prints from sdma_interruptGrzegorz Morys2017-08-292-2/+8
| | | | | | | | | | Ratelimit error prints from sdma_interrupt function that could swarm dmesg otherwise. Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Grzegorz Morys <grzegorz.morys@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>