summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/cxgb4/qp.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* RDMA/cxgb4: Remove kref accounting for sync operationLeon Romanovsky2019-02-161-3/+0
| | | | | | | | | | | Ucontext allocation and release aren't async events and don't need kref accounting. The common layer of RDMA subsystem ensures that dealloc ucontext will be called after all other objects are released. Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* IB/{hw,sw}: Remove 'uobject->context' dependency in object creation APIsShamir Rabinovitch2019-02-151-4/+5
| | | | | | | | | | | Now when we have the udata passed to all the ib_xxx object creation APIs and the additional macro 'rdma_udata_to_drv_context' to get the ib_ucontext from ib_udata stored in uverbs_attr_bundle, we can finally start to remove the dependency of the drivers in the ib_xxx->uobject->context. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* Merge branch 'devx-async' into k.o/for-nextJason Gunthorpe2019-01-291-3/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Yishai Hadas says: Enable DEVX asynchronous query commands This series enables querying a DEVX object in an asynchronous mode. The userspace application won't block when calling the firmware and it will be able to get the response back once that it will be ready. To enable the above functionality: - DEVX asynchronous command completion FD object was introduced. - The applicable file operations were implemented to enable using it by the user application. - Query asynchronous method was added to the DEVX object, it will call the firmware asynchronously and manages the response on the given input FD. - Hot unplug support was added for the FD to work properly upon unbind/disassociate. - mlx5 core fence for asynchronous commands was implemented and used to prevent racing upon unbind/disassociate. This branch is based on mlx5-next & v5.0-rc2 due to dependencies, from git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * branch 'devx-async': IB/mlx5: Implement DEVX hot unplug for async command FD IB/mlx5: Implement the file ops of DEVX async command FD IB/mlx5: Introduce async DEVX obj query API IB/mlx5: Introduce MLX5_IB_OBJECT_DEVX_ASYNC_CMD_FD Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * cross-tree: phase out dma_zalloc_coherent()Luis Chamberlain2019-01-081-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We already need to zero out memory for dma_alloc_coherent(), as such using dma_zalloc_coherent() is superflous. Phase it out. This change was generated with the following Coccinelle SmPL patch: @ replace_dma_zalloc_coherent @ expression dev, size, data, handle, flags; @@ -dma_zalloc_coherent(dev, size, handle, flags) +dma_alloc_coherent(dev, size, handle, flags) Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> [hch: re-ran the script on the latest tree] Signed-off-by: Christoph Hellwig <hch@lst.de>
* | RDMA/iw_cxgb4: Drop __GFP_NOFAILJason Gunthorpe2019-01-241-1/+1
| | | | | | | | | | | | | | | | | | | | There is no reason for this __GFP_NOFAIL, none of the other routines in this file use it, and there is an error unwind here. NOFAIL should be reserved for special cases, not used by network drivers. Fixes: 6a0b6174d35a ("rdma/cxgb4: Add support for kernel mode SRQ's") Reported-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | iw_cxgb4: Check for send WR also while posting write with completion WRPotnuri Bharat Teja2019-01-071-6/+13
|/ | | | | | | | | | | | | Inorder to optimize the NVMEoF read IOPs, iw_cxgb4 posts a FW Write with Completion WQE that combines an RDMA Write WR and the subsequent RDMA Send with Invalidate WR. This patch is an extension to it, where it posts a Write with completion for RDMA WRITE WR + RDMA SEND WR combination as well. Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA: Cleanup undesired pd->uobject usageShamir Rabinovitch2018-12-191-2/+2
| | | | | | | | | | | | Drivers should be using udata to determine if a method is invoked from user space or kernel space. A pd does not necessarily say a different objects is kernel or user. Transforming the tests to use udata eliminates a large number of uobject references from the drivers. Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* infiniband/hw/cxgb4/qp.c: Use dma_zalloc_coherentSabyasachi Gupta2018-11-211-2/+1
| | | | | | | | Replaced dma_alloc_coherent + memset with dma_zalloc_coherent Signed-off-by: Sabyasachi Gupta <sabyasachi.linux@gmail.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds2018-10-261-5/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle with many of the commits being smallish code fixes and improvements across the drivers. - Driver updates for bnxt_re, cxgb4, hfi1, hns, mlx5, nes, qedr, and rxe - Memory window support in hns - mlx5 user API 'flow mutate/steering' allows accessing the full packet mangling and matching machinery from user space - Support inter-working with verbs API calls in the 'devx' mlx5 user API, and provide options to use devx with less privilege - Modernize the use of syfs and the device interface to use attribute groups and cdev properly for uverbs, and clean up some of the core code's device list management - More progress on net namespaces for RDMA devices - Consolidate driver BAR mmapping support into core code helpers and rework how RDMA holds poitners to mm_struct for get_user_pages cases - First pass to use 'dev_name' instead of ib_device->name - Device renaming for RDMA devices" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (242 commits) IB/mlx5: Add support for extended atomic operations RDMA/core: Fix comment for hw stats init for port == 0 RDMA/core: Refactor ib_register_device() function RDMA/core: Fix unwinding flow in case of error to register device ib_srp: Remove WARN_ON in srp_terminate_io() IB/mlx5: Allow scatter to CQE without global signaled WRs IB/mlx5: Verify that driver supports user flags IB/mlx5: Support scatter to CQE for DC transport type RDMA/drivers: Use core provided API for registering device attributes RDMA/core: Allow existing drivers to set one sysfs group per device IB/rxe: Remove unnecessary enum values RDMA/umad: Use kernel API to allocate umad indexes RDMA/uverbs: Use kernel API to allocate uverbs indexes RDMA/core: Increase total number of RDMA ports across all devices IB/mlx4: Add port and TID to MAD debug print IB/mlx4: Enable debug print of SMPs RDMA/core: Rename ports_parent to ports_kobj RDMA/core: Do not expose unsupported counters IB/mlx4: Refer to the device kobject instead of ports_parent RDMA/nldev: Allow IB device rename through RDMA netlink ...
| * iw_cxgb4: Use proper enumerated type in c4iw_bar2_addrsNathan Chancellor2018-09-251-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clang warns when one enumerated type is implicitly converted to another. drivers/infiniband/hw/cxgb4/qp.c:287:8: warning: implicit conversion from enumeration type 'enum t4_bar2_qtype' to different enumeration type 'enum cxgb4_bar2_qtype' [-Wenum-conversion] T4_BAR2_QTYPE_EGRESS, ^~~~~~~~~~~~~~~~~~~~ c4iw_bar2_addrs expects a value from enum cxgb4_bar2_qtype so use the corresponding values from that type so Clang is satisfied without changing the meaning of the code. T4_BAR2_QTYPE_EGRESS = CXGB4_BAR2_QTYPE_EGRESS = 0 T4_BAR2_QTYPE_INGRESS = CXGB4_BAR2_QTYPE_INGRESS = 1 Reported-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * RDMA/cxgb4: remove redundant null pointer check before kfree_skbzhong jiang2018-09-211-2/+1
| | | | | | | | | | | | | | | | | | kfree_skb has taken the null pointer into account. hence it is safe to remove the redundant null pointer check before kfree_skb. Signed-off-by: zhong jiang <zhongjiang@huawei.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | PCI: Remove pci_unmap_addr() wrappers for DMA APIChristoph Hellwig2018-10-101-5/+5
|/ | | | | | | Only some of these were still used by the cxgb4 driver, and that despite the fact that the driver otherwise uses the generic DMA API. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* iw_cxgb4: only allow 1 flush on user qpsSteve Wise2018-09-041-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once the qp has been flushed, it cannot be flushed again. The user qp flush logic wasn't enforcing it however. The bug can cause touch-after-free crashes like: Unable to handle kernel paging request for data at address 0x000001ec Faulting instruction address: 0xc008000016069100 Oops: Kernel access of bad area, sig: 11 [#1] ... NIP [c008000016069100] flush_qp+0x80/0x480 [iw_cxgb4] LR [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4] Call Trace: [c00800001606cd6c] c4iw_modify_qp+0x71c/0x11d0 [iw_cxgb4] [c00800001606e868] c4iw_ib_modify_qp+0x118/0x200 [iw_cxgb4] [c0080000119eae80] ib_security_modify_qp+0xd0/0x3d0 [ib_core] [c0080000119c4e24] ib_modify_qp+0xc4/0x2c0 [ib_core] [c008000011df0284] iwcm_modify_qp_err+0x44/0x70 [iw_cm] [c008000011df0fec] destroy_cm_id+0xcc/0x370 [iw_cm] [c008000011ed4358] rdma_destroy_id+0x3c8/0x520 [rdma_cm] [c0080000134b0540] ucma_close+0x90/0x1b0 [rdma_ucm] [c000000000444da4] __fput+0xe4/0x2f0 So fix flush_qp() to only flush the wq once. Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* iw_cxgb4: Support FW write completion WRPotnuri Bharat Teja2018-08-031-1/+146
| | | | | | | | | | | | | To optimize NVME-oF READ IOPs, use a specialized WQE that combines the RDMA WRITE and SEND_INV WR chain submitted by the NVME-oF target driver. This reduces uP overhead per NVME-oF IO, and results in over 10% improvement in NVME-oF 4K READ IOPs. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* iw_cxgb4: RDMA write with immediate supportPotnuri Bharat Teja2018-08-031-8/+29
| | | | | | | | Adds iw_cxgb4 functionality to support RDMA_WRITE_WITH_IMMEDATE opcode. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* rdma/cxgb4: fix some info leaksDan Carpenter2018-08-031-4/+3
| | | | | | | | | | | | | | | | In c4iw_create_qp() there are several struct members which potentially aren't inintialized like uresp.rq_key. I've fixed this code before in in commit ae1fe07f3f42 ("RDMA/cxgb4: Fix stack info leak in c4iw_create_qp()") so this time I'm just going to take a big hammer approach and memset the whole struct to zero. Hopefully, it will stay fixed this time. In c4iw_create_srq() we don't clear uresp.reserved. Fixes: 6a0b6174d35a ("rdma/cxgb4: Add support for kernel mode SRQ's") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* rdma/cxgb4: Simplify a structure initializationBart Van Assche2018-08-011-1/+1
| | | | | | | | | | | This patch avoids that sparse reports the following warning: drivers/infiniband/hw/cxgb4/qp.c:2269:34: warning: Using plain integer as NULL pointer Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Raju Rangoju <rajur@chelsio.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments constBart Van Assche2018-07-311-12/+15
| | | | | | | | | | | | | | | | | | | | | | Since neither ib_post_send() nor ib_post_recv() modify the data structure their second argument points at, declare that argument const. This change makes it necessary to declare the 'bad_wr' argument const too and also to modify all ULPs that call ib_post_send(), ib_post_recv() or ib_post_srq_recv(). This patch does not change any functionality but makes it possible for the compiler to verify whether the ib_post_(send|recv|srq_recv) really do not modify the posted work request. To make this possible, only one cast had to be introduce that casts away constness, namely in rpcrdma_post_recvs(). The only way I can think of to avoid that cast is to introduce an additional loop in that function or to change the data type of bad_wr from struct ib_recv_wr ** into int (an index that refers to an element in the work request list). However, both approaches would require even more extensive changes than this patch. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* RDMA: Constify the argument of the work request conversion functionsBart Van Assche2018-07-311-9/+12
| | | | | | | | | | | | When posting a send work request, the work request that is posted is not modified by any of the RDMA drivers. Make this explicit by constifying most ib_send_wr pointers in RDMA transport drivers. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* rdma/cxgb4: Add support for kernel mode SRQ'sRaju Rangoju2018-07-261-127/+691
| | | | | | | | | | | | | | | | This patch implements the srq specific verbs such as create/destroy/modify and post_srq_recv. And adds srq specific structures and defines to t4.h and uapi. Also updates the cq poll logic to deal with completions that are associated with the SRQ's. This patch also handles kernel mode SRQ_LIMIT events as well as flushed SRQ buffers Signed-off-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* treewide: kzalloc() -> kcalloc()Kees Cook2018-06-131-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kzalloc() function has a 2-factor argument form, kcalloc(). This patch replaces cases of: kzalloc(a * b, gfp) with: kcalloc(a * b, gfp) as well as handling cases of: kzalloc(a * b * c, gfp) with: kzalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kzalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kzalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kzalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kzalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kzalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kzalloc( - sizeof(u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kzalloc( - sizeof(char) * COUNT + COUNT , ...) | kzalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kzalloc + kcalloc ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kzalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kzalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kzalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kzalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kzalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kzalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kzalloc(C1 * C2 * C3, ...) | kzalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kzalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kzalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kzalloc(sizeof(THING) * C2, ...) | kzalloc(sizeof(TYPE) * C2, ...) | kzalloc(C1 * C2 * C3, ...) | kzalloc(C1 * C2, ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kzalloc + kcalloc ( - (E1) * E2 + E1, E2 , ...) | - kzalloc + kcalloc ( - (E1) * (E2) + E1, E2 , ...) | - kzalloc + kcalloc ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>
* Merge branch 'k.o/for-rc' into k.o/wip/dl-for-nextDoug Ledford2018-05-091-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several items of conflict have arisen between the RDMA stack's for-rc branch and upcoming for-next work: 9fd4350ba895 ("IB/rxe: avoid double kfree_skb") directly conflicts with 2e47350789eb ("IB/rxe: optimize the function duplicate_request") Patches already submitted by Intel for the hfi1 driver will fail to apply cleanly without this merge Other people on the mailing list have notified that their upcoming patches also fail to apply cleanly without this merge Signed-off-by: Doug Ledford <dledford@redhat.com>
| * iw_cxgb4: Atomically flush per QP HW CQEsBharat Potnuri2018-04-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | When a CQ is shared by multiple QPs, c4iw_flush_hw_cq() needs to acquire corresponding QP lock before moving the CQEs into its corresponding SW queue and accessing the SQ contents for completing a WR. Ignore CQEs if corresponding QP is already flushed. Cc: stable@vger.kernel.org Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | IB/cxgb4: use skb_put_zero()/__skb_put_zeroYueHaibing2018-05-011-6/+3
|/ | | | | | | | | Use the recently introduced helper to replace the pattern of skb_put_zero/__skb_put() && memset(). Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* Merge branch 'from-rc' of ↵Jason Gunthorpe2017-12-281-11/+83
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git Patches for 4.16 that are dependent on patches sent to 4.15-rc. These are small clean ups for the vmw_pvrdma and i40iw drivers. * 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git: RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file i40iw: Change accelerated flag to bool
| * iw_cxgb4: when flushing, complete all wrs in a chainSteve Wise2017-12-221-2/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If a wr chain was posted and needed to be flushed, only the first wr in the chain was completed with FLUSHED status. The rest were never completed. This caused isert to hang on shutdown due to the missing completions which left iscsi IO commands referenced, stalling the shutdown. Fixes: 4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * iw_cxgb4: reflect the original WR opcode in drain cqesSteve Wise2017-12-221-4/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The flush/drain logic was not retaining the original wr opcode in its completion. This can cause problems if the application uses the completion opcode to make decisions. Use bit 10 of the CQE header word to indicate the CQE is a special drain completion, and save the original WR opcode in the cqe header opcode field. Fixes: 4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * iw_cxgb4: only insert drain cqes if wq is flushedSteve Wise2017-12-111-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only insert our special drain CQEs to support ib_drain_sq/rq() after the wq is flushed. Otherwise, existing but not yet polled CQEs can be returned out of order to the user application. This can happen when the QP has exited RTS but not yet flushed the QP, which can happen during a normal close (vs abortive close). In addition never count the drain CQEs when determining how many CQEs need to be synthesized during the flush operation. This latter issue should never happen if the QP is properly flushed before inserting the drain CQE, but I wanted to avoid corrupting the CQ state. So we handle it and log a warning once. Fixes: 4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Cc: stable@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * iw_cxgb4: only clear the ARMED bit if a notification is neededSteve Wise2017-12-071-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In __flush_qp(), the CQ ARMED bit was being cleared regardless of whether any notification is actually needed. This resulted in the iser termination logic getting stuck in ib_drain_sq() because the CQ was not marked ARMED and thus the drain CQE notification wasn't triggered. This new bug was exposed when this commit was merged: commit cbb40fadd31c ("iw_cxgb4: only call the cq comp_handler when the cq is armed") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | infiniband: cxgb4: use ktime_get for timestampsArnd Bergmann2017-12-131-3/+3
|/ | | | | | | | | | | | | | | | | | | | The debugfs file prints the difference between host timestamps as a seconds/nanoseconds tuple, along with a 64-bit nanoseconds hardware timestamp. The host time is read using getnstimeofday() which is deprecated because of the y2038 overflow, and it suffers from time jumps during settimeofday() and leap seconds. Converting to ktime_get_ts64() would solve those two, but I'm going a little further here by changing to ktime_get() and printing 64-bit nanoseconds on both host and hw timestamps. This simplifies the code further and makes the output easier to understand. The format of the debugfs file obviously changes here, but this should only be read by humans and not scripts, so I assume it's fine. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* iw_cxgb4: atomically flush the qpSteve Wise2017-11-131-8/+11
| | | | | | | | | | | | | __flush_qp() has a race condition where during the flush operation, the qp lock is released allowing another thread to possibly post a WR, which corrupts the queue state, possibly causing crashes. The lock was released to preserve the cq/qp locking hierarchy of cq first, then qp. However releasing the qp lock is not necessary; both RQ and SQ CQ locks can be acquired first, followed by the qp lock, and then the RQ and SQ flushing can be done w/o unlocking. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: only call the cq comp_handler when the cq is armedSteve Wise2017-11-131-8/+12
| | | | | | | | The ULPs completion handler should only be called if the CQ is armed for notification. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: remove BUG_ON() usage.Steve Wise2017-11-131-3/+0
| | | | | | | | | | | | | | | | | iw_cxgb4 has many BUG_ON()s that were left over from various enhancemnets made over the years. Almost all of them should just be removed. Some, however indicate a ULP usage error and can be handled w/o bringing down the system. If the condition cannot happen with correctly implemented cxgb4 sw/fw, then remove the BUG_ON. If the condition indicates a misbehaving ULP (like CQ overflows), add proper recovery logic. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/cxgb4: Protect from possible dereferenceLeon Romanovsky2017-11-131-1/+1
| | | | | | | | | | | | Smatch tool reports the following error: drivers/infiniband/hw/cxgb4/qp.c:1886 c4iw_create_qp() error: we previously assumed 'ucontext' could be null (see line 1804) Cc: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* Merge branch 'timer_setup' into for-nextDoug Ledford2017-10-181-1/+0
|\ | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/infiniband/hw/cxgb4/cm.c drivers/infiniband/hw/qib/qib_driver.c drivers/infiniband/hw/qib/qib_mad.c There were minor fixups needed in these files. Just minor context diffs due to patches from independent sources touching the same basic area. Signed-off-by: Doug Ledford <dledford@redhat.com>
| * RDMA/cxgb4: Convert timers to use timer_setup()Kees Cook2017-10-181-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Also removes an unused timer and drops a redundant initialization. Cc: Steve Wise <swise@chelsio.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | iw_cxgb4: add referencing to wait objectsSteve Wise2017-09-291-21/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For messages sent from the host to fw that solicit a reply from fw, the c4iw_wr_wait struct pointer is passed in the host->fw message, and included in the fw->host fw6_msg reply. This allows the sender to wait until the reply is received, and the code processing the ingress reply to wake up the sender. If c4iw_wait_for_reply() times out, however, we need to keep the c4iw_wr_wait object around in case the reply eventually does arrive. Otherwise we have touch-after-free bugs in the wake_up paths. This was hit due to a bad kernel driver that blocked ingress processing of cxgb4 for a long time, causing iw_cxgb4 timeouts, but eventually resuming ingress processing and thus hitting the touch-after-free bug. So I want to fix iw_cxgb4 such that we'll at least keep the wait object around until the reply comes. If it never comes we leak a small amount of memory, but if it does come late, we won't potentially crash the system. So add a kref struct in the c4iw_wr_wait struct, and take a reference before sending a message to FW that will generate a FW6 reply. And remove the reference (and potentially free the wait object) when the reply is processed. The ep code also uses the wr_wait for non FW6 CPL messages and doesn't embed the c4iw_wr_wait object in the message sent to firmware. So for those cases we add c4iw_wake_up_noref(). The mr/mw, cq, and qp object create/destroy paths do need this reference logic. For these paths, c4iw_ref_send_wait() is introduced to take the wr_wait reference, send the msg to fw, and then wait for the reply. So going forward, iw_cxgb4 either uses c4iw_ofld_send(), c4iw_wait_for_reply() and c4iw_wake_up_noref() like is done in the some of the endpoint logic, or c4iw_ref_send_wait() and c4iw_wake_up_deref() (formerly c4iw_wake_up()) when sending messages with the c4iw_wr_wait object pointer embedded in the message and resulting FW6 reply. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | iw_cxgb4: allocate wait object for each ep objectSteve Wise2017-09-291-4/+4
| | | | | | | | | | | | | | | | Remove the embedded c4iw_wr_wait object in preparation for correctly handling timeouts. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | iw_cxgb4: allocate wait object for each qp objectSteve Wise2017-09-291-22/+33
| | | | | | | | | | | | | | | | | | | | Remove the local stack allocated c4iw_wr_wait object in preparation for correctly handling timeouts. Also cleaned up some error path unwind logic to make it more readable. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | iw_cxgb4: change pr_debug to appropriate log levelBharat Potnuri2017-09-271-2/+2
| | | | | | | | | | | | | | | | | | Error logs of iw_cxgb4 needs to be printed by default. This patch changes the necessary pr_debug() to appropriate pr_<log level>. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | iw_cxgb4: Remove __func__ parameter from pr_debug()Bharat Potnuri2017-09-271-30/+25
|/ | | | | | | | | pr_debug() can be enabled to print function names, So removing the unwanted __func__ parameters from debug logs. Realign function parameters. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: don't use WR keys/addrs for 0 byte readsGanesh Goudar2017-07-201-1/+1
| | | | | | | | | | | | Only use the read sge lkey/addr and the remote rkey/addr if the length of the read is not zero. Otherwise the read response might be treated as the RTR read response and not delivered to the application. Or worse Terminator hardware will fail a 0B read if the STAG is 0 even if the read length is 0. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* net: introduce __skb_put_[zero, data, u8]yuan linyu2017-06-201-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | follow Johannes Berg, semantic patch file as below, @@ identifier p, p2; expression len; expression skb; type t, t2; @@ ( -p = __skb_put(skb, len); +p = __skb_put_zero(skb, len); | -p = (t)__skb_put(skb, len); +p = __skb_put_zero(skb, len); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, len); | -memset(p, 0, len); ) @@ identifier p; expression len; expression skb; type t; @@ ( -t p = __skb_put(skb, len); +t p = __skb_put_zero(skb, len); ) ... when != p ( -memset(p, 0, len); ) @@ type t, t2; identifier p, p2; expression skb; @@ t *p; ... ( -p = __skb_put(skb, sizeof(t)); +p = __skb_put_zero(skb, sizeof(t)); | -p = (t *)__skb_put(skb, sizeof(t)); +p = __skb_put_zero(skb, sizeof(t)); ) ... when != p ( p2 = (t2)p; -memset(p2, 0, sizeof(*p)); | -memset(p, 0, sizeof(*p)); ) @@ expression skb, len; @@ -memset(__skb_put(skb, len), 0, len); +__skb_put_zero(skb, len); @@ expression skb, len, data; @@ -memcpy(__skb_put(skb, len), data, len); +__skb_put_data(skb, data, len); @@ expression SKB, C, S; typedef u8; identifier fn = {__skb_put}; fresh identifier fn2 = fn ## "_u8"; @@ - *(u8 *)fn(SKB, S) = C; + fn2(SKB, C); Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
* networking: make skb_put & friends return void pointersJohannes Berg2017-06-161-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems like a historic accident that these return unsigned char *, and in many places that means casts are required, more often than not. Make these functions (skb_put, __skb_put and pskb_put) return void * and remove all the casts across the tree, adding a (u8 *) cast only where the unsigned char pointer was used directly, all done with the following spatch: @@ expression SKB, LEN; typedef u8; identifier fn = { skb_put, __skb_put }; @@ - *(fn(SKB, LEN)) + *(u8 *)fn(SKB, LEN) @@ expression E, SKB, LEN; identifier fn = { skb_put, __skb_put }; type T; @@ - E = ((T *)(fn(SKB, LEN))) + E = fn(SKB, LEN) which actually doesn't cover pskb_put since there are only three users overall. A handful of stragglers were converted manually, notably a macro in drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many instances in net/bluetooth/hci_sock.c. In the former file, I also had to fix one whitespace problem spatch introduced. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* cxgb4: Convert PDBG to pr_debugJoe Perches2017-04-201-44/+47
| | | | | | | | | | | | | | Use a more typical logging style. Miscellanea: o Obsolete the c4iw_debug module parameter o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* cxgb4: Use more common logging styleJoe Perches2017-04-201-3/+2
| | | | | | | | | | | | | Convert printks to pr_<level> Miscellanea: o Coalesce formats o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* Merge branch 'locking-core-for-linus' of ↵Linus Torvalds2017-02-201-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - Implement wraparound-safe refcount_t and kref_t types based on generic atomic primitives (Peter Zijlstra) - Improve and fix the ww_mutex code (Nicolai Hähnle) - Add self-tests to the ww_mutex code (Chris Wilson) - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr Bueso) - Micro-optimize the current-task logic all around the core kernel (Davidlohr Bueso) - Tidy up after recent optimizations: remove stale code and APIs, clean up the code (Waiman Long) - ... plus misc fixes, updates and cleanups" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) fork: Fix task_struct alignment locking/spinlock/debug: Remove spinlock lockup detection code lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS lkdtm: Convert to refcount_t testing kref: Implement 'struct kref' using refcount_t refcount_t: Introduce a special purpose refcount type sched/wake_q: Clarify queue reinit comment sched/wait, rcuwait: Fix typo in comment locking/mutex: Fix lockdep_assert_held() fail locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock() locking/rwsem: Reinit wake_q after use locking/rwsem: Remove unnecessary atomic_long_t casts jump_labels: Move header guard #endif down where it belongs locking/atomic, kref: Implement kref_put_lock() locking/ww_mutex: Turn off __must_check for now locking/atomic, kref: Avoid more abuse locking/atomic, kref: Use kref_get_unless_zero() more locking/atomic, kref: Kill kref_sub() locking/atomic, kref: Add kref_read() locking/atomic, kref: Add KREF_INIT() ...
| * locking/atomic, kref: Add kref_read()Peter Zijlstra2017-01-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we need to change the implementation, stop exposing internals. Provide kref_read() to read the current reference count; typically used for debug messages. Kills two anti-patterns: atomic_read(&kref->refcount) kref->refcount.counter Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | iw_cxgb4: free EQ queue memory on last derefSteve Wise2017-01-101-9/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") introduced a bug where the RDMA QP EQ queue memory (and QIDs) are possibly freed before the underlying connection has been fully shutdown. The result being a possible DMA read issued by HW after the queue memory has been unmapped and freed. This results in possible WR corruption in the worst case, system bus errors if an IOMMU is in use, and SGE "bad WR" errors reported in the very least. The fix is to defer unmap/free of queue memory and QID resources until the QP struct has been fully dereferenced. To do this, the c4iw_ucontext must also be kept around until the last QP that references it is fully freed. In addition, since the last QP deref can happen in an IRQ disabled context, we need a new workqueue thread to do the final unmap/free of the EQ queue memory. Fixes: ad61a4c7a9b7 ("iw_cxgb4: don't block in destroy_qp awaiting the last deref") Cc: stable@vger.kernel.org Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | iw_cxgb4: refactor sq/rq drain logicSteve Wise2017-01-101-44/+68
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the addition of the IB/Core drain API, iw_cxgb4 supported drain by watching the CQs when the QP was out of RTS and signalling "drain complete" when the last CQE is polled. This, however, doesn't fully support the drain semantics. Namely, the drain logic is supposed to signal "drain complete" only when the application has _processed_ the last CQE, not just removed them from the CQ. Thus a small timing hole exists that can cause touch after free type bugs in applications using the drain API (nvmf, iSER, for example). So iw_cxgb4 needs a better solution. The iWARP Verbs spec mandates that "_at some point_ after the QP is moved to ERROR", the iWARP driver MUST synchronously fail post_send and post_recv calls. iw_cxgb4 was currently not allowing any posts once the QP is in ERROR. This was in part due to the fact that the HW queues for the QP in ERROR state are disabled at this point, so there wasn't much else to do but fail the post operation synchronously. This restriction is what drove the first drain implementation in iw_cxgb4 that has the above mentioned flaw. This patch changes iw_cxgb4 to allow post_send and post_recv WRs after the QP is moved to ERROR state for kernel mode users, thus still adhering to the Verbs spec for user mode users, but allowing flush WRs for kernel users. Since the HW queues are disabled, we just synthesize a CQE for this post, queue it to the SW CQ, and then call the CQ event handler. This enables proper drain operations for the various storage applications. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>