summaryrefslogtreecommitdiffstats
path: root/include/rdma
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2017-05-03 21:45:55 +0200
committerLinus Torvalds <torvalds@linux-foundation.org>2017-05-03 21:45:55 +0200
commit1684096b1ed813f621fb6cbd06e72235c1c2a0ca (patch)
tree13a228c35d6344f5d23b2c195aa3b026e42aac4b /include/rdma
parentMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dto... (diff)
parentinfiniband: avoid dereferencing uninitialized dst on error path (diff)
downloadlinux-1684096b1ed813f621fb6cbd06e72235c1c2a0ca.tar.xz
linux-1684096b1ed813f621fb6cbd06e72235c1c2a0ca.zip
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford: "More exchaustive description of primary updates in this release: - Lots of driver fixes and misc fixes across the board. - I had to base on a net-next tree because the IPoIB Accelorator patches needed it. Unfortunately, it was known to Mellanox that there would need to be an IPoIB accelorator patch to the net tree (which left some functions turned off by an #ifdef construct to avoid warnings about defined but unused functions), then one to the RDMA tree, then a fixup that went back and re-enabled the functions in the net tree and enabled their use in the rdma tree Also, a sparse fix was sent to the net tree after I did my pull, and the fixup patch conflicts quite directly with that sparse fix, so I'm going to submit the fixup patch towards the end of the merge window by itself and based upon your master branch at the time. - Two separate rounds of hfi1 fixes, one that got dropped from last release because it came in just a day or two before the end of the merge window and then the one from this release cycle. Of note is that I now have a third series that just landed from Intel yesterday. It is not included in this pull request, but I may submit it by the end of the week. I'll talk to Intel about improving the timing of thier submissions for my workflow. - Changes to our idr usage in the RDMA subsystem that will tie into our cgroup management and also into the upcoming changes for the RDMA kernel<->userspace API. - Addition of support for a netdev to be tied to an RDMA device at the core level - Addition of the VNIC driver from Intel. While IPoIB provides IP over InfiniBand (and *only* IP, no lower layer protocol headers are allowed or supported), the VNIC driver presents a virtual Ethernet device with support for things like varying Ethertypes, VLANs, priorities and other features of Ethernet. The virtual devices are centrally managed by the OPA fabric manager, making this (for the time being) a strictly OPA specific feature. - Improvements to the On-Demand Paging support in the RDMA subsystem. - Addition of three significant OPA changes. While we added OPA support some time ago (via the hfi1 driver), the RDMA subsystem has so far glossed over the areas where OPA and InfiniBand differ. With this release we are starting to add support for the OPA extensions into the RDMA core in the following area: Extended port information for OPA is now supported, extended Address Handle attributes for OPA are now supported, and extended SA Queries to get OPA specific subnet information is now supported. Concise summary from the tag: - idr usage and locking changes - build fix for hns - ipoib debug path record file fix - hfi1 updates - core RDMA netdev addition - Intel VNIC driver addition - Enhanced accelerators for IPoIB addition - Debug cleanups in cxgb3/4 - Trivial cleanups from SF Markus Elfring - Misc rxe fixes from Mellanox - Misc ipoib fixes from Mellanox - Lots of mlx4/mlx5 changes from Mellanox - Misc fixes across the RDMA subsystem - ODP paging fixes and improvements - qedr updates - hfi1 updates - OPA port info patches - OPA AH patches - OPA SA Query patches" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (191 commits) infiniband: avoid dereferencing uninitialized dst on error path IB/SA: Add OPA addr header IB/mlx5: Add port_xmit_wait to counter registers read IB/ocrdma: fix out of bounds access to local buffer IB/mlx4: Fix incorrect order of formal and actual parameters IB/mlx4: Change flush logic so it adheres to the variable name mlx5: Fix mlx5_ib_map_mr_sg mr length IB/rxe: Don't clamp residual length to mtu IB/SA: Add support to query OPA path records IB/SA: Add OPA path record type IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields IB/SA: Introduce path record specific types IB/SA: Rename ib_sa_path_rec to sa_path_rec IB/CM: Add braces when using sizeof IB/core: Define 'opa' rdma_ah_attr type IB/core: Define 'ib' and 'roce' rdma_ah_attr types IB/core: Use rdma_ah_attr accessor functions IB/core: Add accessor functions for rdma_ah_attr fields IB/PVRDMA: Rename ib_ah_attr related functions IB/mthca: Rename to_ib_ah_attr to to_rdma_ah_attr ...
Diffstat (limited to 'include/rdma')
-rw-r--r--include/rdma/ib_cm.h14
-rw-r--r--include/rdma/ib_hdrs.h66
-rw-r--r--include/rdma/ib_mad.h40
-rw-r--r--include/rdma/ib_marshall.h6
-rw-r--r--include/rdma/ib_pack.h2
-rw-r--r--include/rdma/ib_sa.h304
-rw-r--r--include/rdma/ib_umem.h8
-rw-r--r--include/rdma/ib_umem_odp.h6
-rw-r--r--include/rdma/ib_verbs.h325
-rw-r--r--include/rdma/opa_addr.h79
-rw-r--r--include/rdma/opa_port_info.h3
-rw-r--r--include/rdma/opa_vnic.h141
-rw-r--r--include/rdma/rdma_cm.h4
-rw-r--r--include/rdma/rdma_cm_ib.h2
-rw-r--r--include/rdma/rdma_vt.h11
-rw-r--r--include/rdma/rdmavt_qp.h18
-rw-r--r--include/rdma/uverbs_std_types.h114
-rw-r--r--include/rdma/uverbs_types.h172
18 files changed, 1222 insertions, 93 deletions
diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h
index b49258b16f4e..7979cb04f529 100644
--- a/include/rdma/ib_cm.h
+++ b/include/rdma/ib_cm.h
@@ -117,8 +117,8 @@ struct ib_cm_req_event_param {
u8 port;
- struct ib_sa_path_rec *primary_path;
- struct ib_sa_path_rec *alternate_path;
+ struct sa_path_rec *primary_path;
+ struct sa_path_rec *alternate_path;
__be64 remote_ca_guid;
u32 remote_qkey;
@@ -197,7 +197,7 @@ struct ib_cm_mra_event_param {
};
struct ib_cm_lap_event_param {
- struct ib_sa_path_rec *alternate_path;
+ struct sa_path_rec *alternate_path;
};
enum ib_cm_apr_status {
@@ -363,8 +363,8 @@ struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device,
__be64 service_id);
struct ib_cm_req_param {
- struct ib_sa_path_rec *primary_path;
- struct ib_sa_path_rec *alternate_path;
+ struct sa_path_rec *primary_path;
+ struct sa_path_rec *alternate_path;
__be64 service_id;
u32 qp_num;
enum ib_qp_type qp_type;
@@ -521,7 +521,7 @@ int ib_send_cm_mra(struct ib_cm_id *cm_id,
* @private_data_len: Size of the private data buffer, in bytes.
*/
int ib_send_cm_lap(struct ib_cm_id *cm_id,
- struct ib_sa_path_rec *alternate_path,
+ struct sa_path_rec *alternate_path,
const void *private_data,
u8 private_data_len);
@@ -565,7 +565,7 @@ int ib_send_cm_apr(struct ib_cm_id *cm_id,
u8 private_data_len);
struct ib_cm_sidr_req_param {
- struct ib_sa_path_rec *path;
+ struct sa_path_rec *path;
__be64 service_id;
int timeout_ms;
const void *private_data;
diff --git a/include/rdma/ib_hdrs.h b/include/rdma/ib_hdrs.h
index c755325f0831..5519f31f043a 100644
--- a/include/rdma/ib_hdrs.h
+++ b/include/rdma/ib_hdrs.h
@@ -74,6 +74,12 @@
#define IB_GRH_FLOW_MASK 0xFFFFF
#define IB_GRH_FLOW_SHIFT 0
#define IB_GRH_NEXT_HDR 0x1B
+#define IB_FECN_SHIFT 31
+#define IB_FECN_MASK 1
+#define IB_FECN_SMASK BIT(IB_FECN_SHIFT)
+#define IB_BECN_SHIFT 30
+#define IB_BECN_MASK 1
+#define IB_BECN_SMASK BIT(IB_BECN_SHIFT)
#define IB_AETH_CREDIT_SHIFT 24
#define IB_AETH_CREDIT_MASK 0x1F
@@ -181,4 +187,64 @@ static inline void put_ib_ateth_compare(u64 val, struct ib_atomic_eth *ateth)
ib_u64_put(val, &ateth->compare_data);
}
+/*
+ * 9B/IB Packet Format
+ */
+#define IB_LNH_MASK 3
+#define IB_SC_MASK 0xf
+#define IB_SC_SHIFT 12
+#define IB_SL_MASK 0xf
+#define IB_SL_SHIFT 4
+
+static inline u8 ib_get_lnh(struct ib_header *hdr)
+{
+ return (be16_to_cpu(hdr->lrh[0]) & IB_LNH_MASK);
+}
+
+static inline u8 ib_get_sc(struct ib_header *hdr)
+{
+ return ((be16_to_cpu(hdr->lrh[0]) >> IB_SC_SHIFT) & IB_SC_MASK);
+}
+
+static inline u8 ib_get_sl(struct ib_header *hdr)
+{
+ return ((be16_to_cpu(hdr->lrh[0]) >> IB_SL_SHIFT) & IB_SL_MASK);
+}
+
+static inline u16 ib_get_dlid(struct ib_header *hdr)
+{
+ return (be16_to_cpu(hdr->lrh[1]));
+}
+
+static inline u16 ib_get_slid(struct ib_header *hdr)
+{
+ return (be16_to_cpu(hdr->lrh[3]));
+}
+
+/*
+ * BTH
+ */
+#define IB_BTH_OPCODE_MASK 0xff
+#define IB_BTH_OPCODE_SHIFT 24
+#define IB_BTH_PAD_MASK 3
+#define IB_BTH_PKEY_MASK 0xffff
+#define IB_BTH_PAD_SHIFT 20
+
+static inline u8 ib_bth_get_pad(struct ib_other_headers *ohdr)
+{
+ return ((be32_to_cpu(ohdr->bth[0]) >> IB_BTH_PAD_SHIFT) &
+ IB_BTH_PAD_MASK);
+}
+
+static inline u16 ib_bth_get_pkey(struct ib_other_headers *ohdr)
+{
+ return (be32_to_cpu(ohdr->bth[0]) & IB_BTH_PKEY_MASK);
+}
+
+static inline u8 ib_bth_get_opcode(struct ib_other_headers *ohdr)
+{
+ return ((be32_to_cpu(ohdr->bth[0]) >> IB_BTH_OPCODE_SHIFT) &
+ IB_BTH_OPCODE_MASK);
+}
+
#endif /* IB_HDRS_H */
diff --git a/include/rdma/ib_mad.h b/include/rdma/ib_mad.h
index 981214b3790c..d67b11b72029 100644
--- a/include/rdma/ib_mad.h
+++ b/include/rdma/ib_mad.h
@@ -262,6 +262,33 @@ struct ib_class_port_info {
__be32 trap_qkey;
};
+#define OPA_CLASS_PORT_INFO_PR_SUPPORT BIT(26)
+
+struct opa_class_port_info {
+ u8 base_version;
+ u8 class_version;
+ __be16 cap_mask;
+ __be32 cap_mask2_resp_time;
+
+ u8 redirect_gid[16];
+ __be32 redirect_tc_fl;
+ __be32 redirect_lid;
+ __be32 redirect_sl_qp;
+ __be32 redirect_qkey;
+
+ u8 trap_gid[16];
+ __be32 trap_tc_fl;
+ __be32 trap_lid;
+ __be32 trap_hl_qp;
+ __be32 trap_qkey;
+
+ __be16 trap_pkey;
+ __be16 redirect_pkey;
+
+ u8 trap_sl_rsvd;
+ u8 reserved[3];
+} __packed;
+
/**
* ib_get_cpi_resp_time - Returns the resp_time value from
* cap_mask2_resp_time in ib_class_port_info.
@@ -315,6 +342,17 @@ static inline void ib_set_cpi_capmask2(struct ib_class_port_info *cpi,
IB_CLASS_PORT_INFO_RESP_TIME_FIELD_SIZE);
}
+/**
+ * opa_get_cpi_capmask2 - Returns the capmask2 value from
+ * cap_mask2_resp_time in ib_class_port_info.
+ * @cpi: A struct opa_class_port_info mad.
+ */
+static inline u32 opa_get_cpi_capmask2(struct opa_class_port_info *cpi)
+{
+ return (be32_to_cpu(cpi->cap_mask2_resp_time) >>
+ IB_CLASS_PORT_INFO_RESP_TIME_FIELD_SIZE);
+}
+
struct ib_mad_notice_attr {
u8 generic_type;
u8 prod_type_msb;
@@ -673,7 +711,7 @@ struct ib_mad_agent *ib_register_mad_snoop(struct ib_device *device,
* After invoking this routine, MAD services are no longer usable by the
* client on the associated QP.
*/
-int ib_unregister_mad_agent(struct ib_mad_agent *mad_agent);
+void ib_unregister_mad_agent(struct ib_mad_agent *mad_agent);
/**
* ib_post_send_mad - Posts MAD(s) to the send queue of the QP associated
diff --git a/include/rdma/ib_marshall.h b/include/rdma/ib_marshall.h
index db037205c9e8..68cef3bd50fb 100644
--- a/include/rdma/ib_marshall.h
+++ b/include/rdma/ib_marshall.h
@@ -42,12 +42,12 @@ void ib_copy_qp_attr_to_user(struct ib_uverbs_qp_attr *dst,
struct ib_qp_attr *src);
void ib_copy_ah_attr_to_user(struct ib_uverbs_ah_attr *dst,
- struct ib_ah_attr *src);
+ struct rdma_ah_attr *src);
void ib_copy_path_rec_to_user(struct ib_user_path_rec *dst,
- struct ib_sa_path_rec *src);
+ struct sa_path_rec *src);
-void ib_copy_path_rec_from_user(struct ib_sa_path_rec *dst,
+void ib_copy_path_rec_from_user(struct sa_path_rec *dst,
struct ib_user_path_rec *src);
#endif /* IB_USER_MARSHALL_H */
diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h
index b13419ce99ff..36655899ee02 100644
--- a/include/rdma/ib_pack.h
+++ b/include/rdma/ib_pack.h
@@ -80,6 +80,8 @@ enum {
IB_OPCODE_UD = 0x60,
/* per IBTA 1.3 vol 1 Table 38, A10.3.2 */
IB_OPCODE_CNP = 0x80,
+ /* Manufacturer specific */
+ IB_OPCODE_MSP = 0xe0,
/* operations -- just used to define real constants */
IB_OPCODE_SEND_FIRST = 0x00,
diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h
index fd0e53219f93..f5f70e345318 100644
--- a/include/rdma/ib_sa.h
+++ b/include/rdma/ib_sa.h
@@ -43,6 +43,8 @@
#include <rdma/ib_verbs.h>
#include <rdma/ib_mad.h>
+#include <rdma/ib_addr.h>
+#include <rdma/opa_addr.h>
enum {
IB_SA_CLASS_VERSION = 2, /* IB spec version 1.1/1.2 */
@@ -56,6 +58,7 @@ enum {
IB_SA_METHOD_GET_TRACE_TBL = 0x13
};
+#define OPA_SA_CLASS_VERSION 0x80
enum {
IB_SA_ATTR_CLASS_PORTINFO = 0x01,
IB_SA_ATTR_NOTICE = 0x02,
@@ -147,13 +150,45 @@ enum ib_sa_mc_join_states {
#define IB_SA_PATH_REC_PACKET_LIFE_TIME IB_SA_COMP_MASK(21)
#define IB_SA_PATH_REC_PREFERENCE IB_SA_COMP_MASK(22)
-struct ib_sa_path_rec {
+enum sa_path_rec_type {
+ SA_PATH_REC_TYPE_IB,
+ SA_PATH_REC_TYPE_ROCE_V1,
+ SA_PATH_REC_TYPE_ROCE_V2,
+ SA_PATH_REC_TYPE_OPA
+};
+
+struct sa_path_rec_ib {
__be64 service_id;
- union ib_gid dgid;
- union ib_gid sgid;
__be16 dlid;
__be16 slid;
u8 raw_traffic;
+};
+
+struct sa_path_rec_roce {
+ u8 dmac[ETH_ALEN];
+ /* ignored in IB */
+ int ifindex;
+ /* ignored in IB */
+ struct net *net;
+
+};
+
+struct sa_path_rec_opa {
+ __be64 service_id;
+ __be32 dlid;
+ __be32 slid;
+ u8 raw_traffic;
+ u8 l2_8B;
+ u8 l2_10B;
+ u8 l2_9B;
+ u8 l2_16B;
+ u8 qos_type;
+ u8 qos_priority;
+};
+
+struct sa_path_rec {
+ union ib_gid dgid;
+ union ib_gid sgid;
/* reserved */
__be32 flow_label;
u8 hop_limit;
@@ -170,17 +205,109 @@ struct ib_sa_path_rec {
u8 packet_life_time_selector;
u8 packet_life_time;
u8 preference;
- u8 dmac[ETH_ALEN];
- /* ignored in IB */
- int ifindex;
- /* ignored in IB */
- struct net *net;
- enum ib_gid_type gid_type;
+ union {
+ struct sa_path_rec_ib ib;
+ struct sa_path_rec_roce roce;
+ struct sa_path_rec_opa opa;
+ };
+ enum sa_path_rec_type rec_type;
};
-static inline struct net_device *ib_get_ndev_from_path(struct ib_sa_path_rec *rec)
+static inline enum ib_gid_type
+ sa_conv_pathrec_to_gid_type(struct sa_path_rec *rec)
+{
+ switch (rec->rec_type) {
+ case SA_PATH_REC_TYPE_ROCE_V1:
+ return IB_GID_TYPE_ROCE;
+ case SA_PATH_REC_TYPE_ROCE_V2:
+ return IB_GID_TYPE_ROCE_UDP_ENCAP;
+ default:
+ return IB_GID_TYPE_IB;
+ }
+}
+
+static inline enum sa_path_rec_type
+ sa_conv_gid_to_pathrec_type(enum ib_gid_type type)
+{
+ switch (type) {
+ case IB_GID_TYPE_ROCE:
+ return SA_PATH_REC_TYPE_ROCE_V1;
+ case IB_GID_TYPE_ROCE_UDP_ENCAP:
+ return SA_PATH_REC_TYPE_ROCE_V2;
+ default:
+ return SA_PATH_REC_TYPE_IB;
+ }
+}
+
+static inline void path_conv_opa_to_ib(struct sa_path_rec *ib,
+ struct sa_path_rec *opa)
+{
+ if ((be32_to_cpu(opa->opa.dlid) >=
+ be16_to_cpu(IB_MULTICAST_LID_BASE)) ||
+ (be32_to_cpu(opa->opa.slid) >=
+ be16_to_cpu(IB_MULTICAST_LID_BASE))) {
+ /* Create OPA GID and zero out the LID */
+ ib->dgid.global.interface_id
+ = OPA_MAKE_ID(be32_to_cpu(opa->opa.dlid));
+ ib->dgid.global.subnet_prefix
+ = opa->dgid.global.subnet_prefix;
+ ib->sgid.global.interface_id
+ = OPA_MAKE_ID(be32_to_cpu(opa->opa.slid));
+ ib->dgid.global.subnet_prefix
+ = opa->dgid.global.subnet_prefix;
+ ib->ib.dlid = 0;
+
+ ib->ib.slid = 0;
+ } else {
+ ib->ib.dlid = htons(ntohl(opa->opa.dlid));
+ ib->ib.slid = htons(ntohl(opa->opa.slid));
+ }
+ ib->ib.service_id = opa->opa.service_id;
+ ib->ib.raw_traffic = opa->opa.raw_traffic;
+}
+
+static inline void path_conv_ib_to_opa(struct sa_path_rec *opa,
+ struct sa_path_rec *ib)
{
- return rec->net ? dev_get_by_index(rec->net, rec->ifindex) : NULL;
+ __be32 slid, dlid;
+
+ if ((ib_is_opa_gid(&ib->sgid)) ||
+ (ib_is_opa_gid(&ib->dgid))) {
+ slid = htonl(opa_get_lid_from_gid(&ib->sgid));
+ dlid = htonl(opa_get_lid_from_gid(&ib->dgid));
+ } else {
+ slid = htonl(ntohs(ib->ib.slid));
+ dlid = htonl(ntohs(ib->ib.dlid));
+ }
+ opa->opa.slid = slid;
+ opa->opa.dlid = dlid;
+ opa->opa.service_id = ib->ib.service_id;
+ opa->opa.raw_traffic = ib->ib.raw_traffic;
+}
+
+/* Convert from OPA to IB path record */
+static inline void sa_convert_path_opa_to_ib(struct sa_path_rec *dest,
+ struct sa_path_rec *src)
+{
+ if (src->rec_type != SA_PATH_REC_TYPE_OPA)
+ return;
+
+ *dest = *src;
+ dest->rec_type = SA_PATH_REC_TYPE_IB;
+ path_conv_opa_to_ib(dest, src);
+}
+
+/* Convert from IB to OPA path record */
+static inline void sa_convert_path_ib_to_opa(struct sa_path_rec *dest,
+ struct sa_path_rec *src)
+{
+ if (src->rec_type != SA_PATH_REC_TYPE_IB)
+ return;
+
+ /* Do a structure copy and overwrite the relevant fields */
+ *dest = *src;
+ dest->rec_type = SA_PATH_REC_TYPE_OPA;
+ path_conv_ib_to_opa(dest, src);
}
#define IB_SA_MCMEMBER_REC_MGID IB_SA_COMP_MASK( 0)
@@ -322,11 +449,11 @@ void ib_sa_cancel_query(int id, struct ib_sa_query *query);
int ib_sa_path_rec_get(struct ib_sa_client *client,
struct ib_device *device, u8 port_num,
- struct ib_sa_path_rec *rec,
+ struct sa_path_rec *rec,
ib_sa_comp_mask comp_mask,
int timeout_ms, gfp_t gfp_mask,
void (*callback)(int status,
- struct ib_sa_path_rec *resp,
+ struct sa_path_rec *resp,
void *context),
void *context,
struct ib_sa_query **query);
@@ -420,27 +547,27 @@ int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num,
struct ib_sa_mcmember_rec *rec,
struct net_device *ndev,
enum ib_gid_type gid_type,
- struct ib_ah_attr *ah_attr);
+ struct rdma_ah_attr *ah_attr);
/**
* ib_init_ah_from_path - Initialize address handle attributes based on an SA
* path record.
*/
int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
- struct ib_sa_path_rec *rec,
- struct ib_ah_attr *ah_attr);
+ struct sa_path_rec *rec,
+ struct rdma_ah_attr *ah_attr);
/**
* ib_sa_pack_path - Conert a path record from struct ib_sa_path_rec
* to IB MAD wire format.
*/
-void ib_sa_pack_path(struct ib_sa_path_rec *rec, void *attribute);
+void ib_sa_pack_path(struct sa_path_rec *rec, void *attribute);
/**
* ib_sa_unpack_path - Convert a path record from MAD format to struct
* ib_sa_path_rec.
*/
-void ib_sa_unpack_path(void *attribute, struct ib_sa_path_rec *rec);
+void ib_sa_unpack_path(void *attribute, struct sa_path_rec *rec);
/* Support GuidInfoRecord */
int ib_sa_guid_info_rec_query(struct ib_sa_client *client,
@@ -454,14 +581,137 @@ int ib_sa_guid_info_rec_query(struct ib_sa_client *client,
void *context,
struct ib_sa_query **sa_query);
-/* Support get SA ClassPortInfo */
-int ib_sa_classport_info_rec_query(struct ib_sa_client *client,
- struct ib_device *device, u8 port_num,
- int timeout_ms, gfp_t gfp_mask,
- void (*callback)(int status,
- struct ib_class_port_info *resp,
- void *context),
- void *context,
- struct ib_sa_query **sa_query);
+bool ib_sa_sendonly_fullmem_support(struct ib_sa_client *client,
+ struct ib_device *device,
+ u8 port_num);
+
+static inline bool sa_path_is_roce(struct sa_path_rec *rec)
+{
+ return ((rec->rec_type == SA_PATH_REC_TYPE_ROCE_V1) ||
+ (rec->rec_type == SA_PATH_REC_TYPE_ROCE_V2));
+}
+
+static inline void sa_path_set_service_id(struct sa_path_rec *rec,
+ __be64 service_id)
+{
+ if (rec->rec_type == SA_PATH_REC_TYPE_IB)
+ rec->ib.service_id = service_id;
+ else if (rec->rec_type == SA_PATH_REC_TYPE_OPA)
+ rec->opa.service_id = service_id;
+}
+
+static inline void sa_path_set_slid(struct sa_path_rec *rec, __be32 slid)
+{
+ if (rec->rec_type == SA_PATH_REC_TYPE_IB)
+ rec->ib.slid = htons(ntohl(slid));
+ else if (rec->rec_type == SA_PATH_REC_TYPE_OPA)
+ rec->opa.slid = slid;
+}
+
+static inline void sa_path_set_dlid(struct sa_path_rec *rec, __be32 dlid)
+{
+ if (rec->rec_type == SA_PATH_REC_TYPE_IB)
+ rec->ib.dlid = htons(ntohl(dlid));
+ else if (rec->rec_type == SA_PATH_REC_TYPE_OPA)
+ rec->opa.dlid = dlid;
+}
+
+static inline void sa_path_set_raw_traffic(struct sa_path_rec *rec,
+ u8 raw_traffic)
+{
+ if (rec->rec_type == SA_PATH_REC_TYPE_IB)
+ rec->ib.raw_traffic = raw_traffic;
+ else if (rec->rec_type == SA_PATH_REC_TYPE_OPA)
+ rec->opa.raw_traffic = raw_traffic;
+}
+
+static inline __be64 sa_path_get_service_id(struct sa_path_rec *rec)
+{
+ if (rec->rec_type == SA_PATH_REC_TYPE_IB)
+ return rec->ib.service_id;
+ else if (rec->rec_type == SA_PATH_REC_TYPE_OPA)
+ return rec->opa.service_id;
+ return 0;
+}
+
+static inline __be32 sa_path_get_slid(struct sa_path_rec *rec)
+{
+ if (rec->rec_type == SA_PATH_REC_TYPE_IB)
+ return htonl(ntohs(rec->ib.slid));
+ else if (rec->rec_type == SA_PATH_REC_TYPE_OPA)
+ return rec->opa.slid;
+ return 0;
+}
+
+static inline __be32 sa_path_get_dlid(struct sa_path_rec *rec)
+{
+ if (rec->rec_type == SA_PATH_REC_TYPE_IB)
+ return htonl(ntohs(rec->ib.dlid));
+ else if (rec->rec_type == SA_PATH_REC_TYPE_OPA)
+ return rec->opa.dlid;
+ return 0;
+}
+
+static inline u8 sa_path_get_raw_traffic(struct sa_path_rec *rec)
+{
+ if (rec->rec_type == SA_PATH_REC_TYPE_IB)
+ return rec->ib.raw_traffic;
+ else if (rec->rec_type == SA_PATH_REC_TYPE_OPA)
+ return rec->opa.raw_traffic;
+ return 0;
+}
+
+static inline void sa_path_set_dmac(struct sa_path_rec *rec, u8 *dmac)
+{
+ if (sa_path_is_roce(rec))
+ memcpy(rec->roce.dmac, dmac, ETH_ALEN);
+}
+
+static inline void sa_path_set_dmac_zero(struct sa_path_rec *rec)
+{
+ if (sa_path_is_roce(rec))
+ eth_zero_addr(rec->roce.dmac);
+}
+
+static inline void sa_path_set_ifindex(struct sa_path_rec *rec, int ifindex)
+{
+ if (sa_path_is_roce(rec))
+ rec->roce.ifindex = ifindex;
+}
+
+static inline void sa_path_set_ndev(struct sa_path_rec *rec, struct net *net)
+{
+ if (sa_path_is_roce(rec))
+ rec->roce.net = net;
+}
+
+static inline u8 *sa_path_get_dmac(struct sa_path_rec *rec)
+{
+ if (sa_path_is_roce(rec))
+ return rec->roce.dmac;
+ return NULL;
+}
+
+static inline int sa_path_get_ifindex(struct sa_path_rec *rec)
+{
+ if (sa_path_is_roce(rec))
+ return rec->roce.ifindex;
+ return 0;
+}
+
+static inline struct net *sa_path_get_ndev(struct sa_path_rec *rec)
+{
+ if (sa_path_is_roce(rec))
+ return rec->roce.net;
+ return NULL;
+}
+
+static inline struct net_device *ib_get_ndev_from_path(struct sa_path_rec *rec)
+{
+ return sa_path_get_ndev(rec) ?
+ dev_get_by_index(sa_path_get_ndev(rec),
+ sa_path_get_ifindex(rec))
+ : NULL;
+}
#endif /* IB_SA_H */
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 2d83cfd7e6ce..23159dd5be18 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -44,7 +44,7 @@ struct ib_umem {
struct ib_ucontext *context;
size_t length;
unsigned long address;
- int page_size;
+ int page_shift;
int writable;
int hugetlb;
struct work_struct work;
@@ -60,7 +60,7 @@ struct ib_umem {
/* Returns the offset of the umem start relative to the first page. */
static inline int ib_umem_offset(struct ib_umem *umem)
{
- return umem->address & ((unsigned long)umem->page_size - 1);
+ return umem->address & (BIT(umem->page_shift) - 1);
}
/* Returns the first page of an ODP umem. */
@@ -72,12 +72,12 @@ static inline unsigned long ib_umem_start(struct ib_umem *umem)
/* Returns the address of the page after the last one of an ODP umem. */
static inline unsigned long ib_umem_end(struct ib_umem *umem)
{
- return PAGE_ALIGN(umem->address + umem->length);
+ return ALIGN(umem->address + umem->length, BIT(umem->page_shift));
}
static inline size_t ib_umem_num_pages(struct ib_umem *umem)
{
- return (ib_umem_end(umem) - ib_umem_start(umem)) >> PAGE_SHIFT;
+ return (ib_umem_end(umem) - ib_umem_start(umem)) >> umem->page_shift;
}
#ifdef CONFIG_INFINIBAND_USER_MEM
diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
index 542cd8b3414c..fb67554aabd6 100644
--- a/include/rdma/ib_umem_odp.h
+++ b/include/rdma/ib_umem_odp.h
@@ -84,7 +84,8 @@ struct ib_umem_odp {
#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
-int ib_umem_odp_get(struct ib_ucontext *context, struct ib_umem *umem);
+int ib_umem_odp_get(struct ib_ucontext *context, struct ib_umem *umem,
+ int access);
struct ib_umem *ib_alloc_odp_umem(struct ib_ucontext *context,
unsigned long addr,
size_t size);
@@ -154,7 +155,8 @@ static inline int ib_umem_mmu_notifier_retry(struct ib_umem *item,
#else /* CONFIG_INFINIBAND_ON_DEMAND_PAGING */
static inline int ib_umem_odp_get(struct ib_ucontext *context,
- struct ib_umem *umem)
+ struct ib_umem *umem,
+ int access)
{
return -EINVAL;
}
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 99e4423eb2b8..f0cb4906478a 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -55,6 +55,7 @@
#include <net/ip.h>
#include <linux/string.h>
#include <linux/slab.h>
+#include <linux/netdevice.h>
#include <linux/if_link.h>
#include <linux/atomic.h>
@@ -224,6 +225,7 @@ enum ib_device_cap_flags {
IB_DEVICE_VIRTUAL_FUNCTION = (1ULL << 33),
/* Deprecated. Please use IB_RAW_PACKET_CAP_SCATTER_FCS. */
IB_DEVICE_RAW_SCATTER_FCS = (1ULL << 34),
+ IB_DEVICE_RDMA_NETDEV_OPA_VNIC = (1ULL << 35),
};
enum ib_signature_prot_cap {
@@ -431,7 +433,8 @@ enum ib_port_speed {
IB_SPEED_QDR = 4,
IB_SPEED_FDR10 = 8,
IB_SPEED_FDR = 16,
- IB_SPEED_EDR = 32
+ IB_SPEED_EDR = 32,
+ IB_SPEED_HDR = 64
};
/**
@@ -498,6 +501,7 @@ static inline struct rdma_hw_stats *rdma_alloc_hw_stats_struct(
/* Address format 0x000FF000 */
#define RDMA_CORE_CAP_AF_IB 0x00001000
#define RDMA_CORE_CAP_ETH_AH 0x00002000
+#define RDMA_CORE_CAP_OPA_AH 0x00004000
/* Protocol 0xFFF00000 */
#define RDMA_CORE_CAP_PROT_IB 0x00100000
@@ -836,15 +840,38 @@ struct ib_mr_status {
*/
__attribute_const__ enum ib_rate mult_to_ib_rate(int mult);
+enum rdma_ah_attr_type {
+ RDMA_AH_ATTR_TYPE_IB,
+ RDMA_AH_ATTR_TYPE_ROCE,
+ RDMA_AH_ATTR_TYPE_OPA,
+};
+
struct ib_ah_attr {
- struct ib_global_route grh;
u16 dlid;
- u8 sl;
u8 src_path_bits;
+};
+
+struct roce_ah_attr {
+ u8 dmac[ETH_ALEN];
+};
+
+struct opa_ah_attr {
+ u32 dlid;
+ u8 src_path_bits;
+};
+
+struct rdma_ah_attr {
+ struct ib_global_route grh;
+ u8 sl;
u8 static_rate;
- u8 ah_flags;
u8 port_num;
- u8 dmac[ETH_ALEN];
+ u8 ah_flags;
+ enum rdma_ah_attr_type type;
+ union {
+ struct ib_ah_attr ib;
+ struct roce_ah_attr roce;
+ struct opa_ah_attr opa;
+ };
};
enum ib_wc_status {
@@ -1163,8 +1190,8 @@ struct ib_qp_attr {
u32 dest_qp_num;
int qp_access_flags;
struct ib_qp_cap cap;
- struct ib_ah_attr ah_attr;
- struct ib_ah_attr alt_ah_attr;
+ struct rdma_ah_attr ah_attr;
+ struct rdma_ah_attr alt_ah_attr;
u16 pkey_index;
u16 alt_pkey_index;
u8 en_sqd_async_notify;
@@ -1336,6 +1363,7 @@ enum ib_access_flags {
IB_ACCESS_MW_BIND = (1<<4),
IB_ZERO_BASED = (1<<5),
IB_ACCESS_ON_DEMAND = (1<<6),
+ IB_ACCESS_HUGETLB = (1<<7),
};
/*
@@ -1357,6 +1385,17 @@ struct ib_fmr_attr {
struct ib_umem;
+enum rdma_remove_reason {
+ /* Userspace requested uobject deletion. Call could fail */
+ RDMA_REMOVE_DESTROY,
+ /* Context deletion. This call should delete the actual object itself */
+ RDMA_REMOVE_CLOSE,
+ /* Driver is being hot-unplugged. This call should delete the actual object itself */
+ RDMA_REMOVE_DRIVER_REMOVE,
+ /* Context is being cleaned-up, but commit was just completed */
+ RDMA_REMOVE_DURING_CLEANUP,
+};
+
struct ib_rdmacg_object {
#ifdef CONFIG_CGROUP_RDMA
struct rdma_cgroup *cg; /* owner rdma cgroup */
@@ -1365,19 +1404,16 @@ struct ib_rdmacg_object {
struct ib_ucontext {
struct ib_device *device;
- struct list_head pd_list;
- struct list_head mr_list;
- struct list_head mw_list;
- struct list_head cq_list;
- struct list_head qp_list;
- struct list_head srq_list;
- struct list_head ah_list;
- struct list_head xrcd_list;
- struct list_head rule_list;
- struct list_head wq_list;
- struct list_head rwq_ind_tbl_list;
+ struct ib_uverbs_file *ufile;
int closing;
+ /* locking the uobjects_list */
+ struct mutex uobjects_lock;
+ struct list_head uobjects;
+ /* protects cleanup process from other actions */
+ struct rw_semaphore cleanup_rwsem;
+ enum rdma_remove_reason cleanup_reason;
+
struct pid *tgid;
#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
struct rb_root umem_tree;
@@ -1407,9 +1443,16 @@ struct ib_uobject {
struct ib_rdmacg_object cg_obj; /* rdmacg object */
int id; /* index into kernel idr */
struct kref ref;
- struct rw_semaphore mutex; /* protects .live */
+ atomic_t usecnt; /* protects exclusive access */
struct rcu_head rcu; /* kfree_rcu() overhead */
- int live;
+
+ const struct uverbs_obj_type *type;
+};
+
+struct ib_uobject_file {
+ struct ib_uobject uobj;
+ /* ufile contains the lock between context release and file close */
+ struct ib_uverbs_file *ufile;
};
struct ib_udata {
@@ -1447,6 +1490,7 @@ struct ib_ah {
struct ib_device *device;
struct ib_pd *pd;
struct ib_uobject *uobject;
+ enum rdma_ah_attr_type type;
};
typedef void (*ib_comp_handler)(struct ib_cq *cq, void *cq_context);
@@ -1662,6 +1706,7 @@ enum ib_flow_spec_type {
IB_FLOW_SPEC_INNER = 0x100,
/* Actions */
IB_FLOW_SPEC_ACTION_TAG = 0x1000,
+ IB_FLOW_SPEC_ACTION_DROP = 0x1001,
};
#define IB_FLOW_SPEC_LAYER_MASK 0xF0
#define IB_FLOW_SPEC_SUPPORT_LAYERS 8
@@ -1790,6 +1835,11 @@ struct ib_flow_spec_action_tag {
u32 tag_id;
};
+struct ib_flow_spec_action_drop {
+ enum ib_flow_spec_type type;
+ u16 size;
+};
+
union ib_flow_spec {
struct {
u32 type;
@@ -1802,6 +1852,7 @@ union ib_flow_spec {
struct ib_flow_spec_ipv6 ipv6;
struct ib_flow_spec_tunnel tunnel;
struct ib_flow_spec_action_tag flow_tag;
+ struct ib_flow_spec_action_drop drop;
};
struct ib_flow_attr {
@@ -1862,6 +1913,34 @@ struct ib_port_immutable {
u32 max_mad_size;
};
+/* rdma netdev type - specifies protocol type */
+enum rdma_netdev_t {
+ RDMA_NETDEV_OPA_VNIC,
+ RDMA_NETDEV_IPOIB,
+};
+
+/**
+ * struct rdma_netdev - rdma netdev
+ * For cases where netstack interfacing is required.
+ */
+struct rdma_netdev {
+ void *clnt_priv;
+ struct ib_device *hca;
+ u8 port_num;
+
+ /* control functions */
+ void (*set_id)(struct net_device *netdev, int id);
+ /* send packet */
+ int (*send)(struct net_device *dev, struct sk_buff *skb,
+ struct ib_ah *address, u32 dqpn);
+ /* multicast */
+ int (*attach_mcast)(struct net_device *dev, struct ib_device *hca,
+ union ib_gid *gid, u16 mlid,
+ int set_qkey, u32 qkey);
+ int (*detach_mcast)(struct net_device *dev, struct ib_device *hca,
+ union ib_gid *gid, u16 mlid);
+};
+
struct ib_device {
/* Do not access @dma_device directly from ULP nor from HW drivers. */
struct device *dma_device;
@@ -1977,12 +2056,12 @@ struct ib_device {
struct ib_udata *udata);
int (*dealloc_pd)(struct ib_pd *pd);
struct ib_ah * (*create_ah)(struct ib_pd *pd,
- struct ib_ah_attr *ah_attr,
+ struct rdma_ah_attr *ah_attr,
struct ib_udata *udata);
int (*modify_ah)(struct ib_ah *ah,
- struct ib_ah_attr *ah_attr);
+ struct rdma_ah_attr *ah_attr);
int (*query_ah)(struct ib_ah *ah,
- struct ib_ah_attr *ah_attr);
+ struct rdma_ah_attr *ah_attr);
int (*destroy_ah)(struct ib_ah *ah);
struct ib_srq * (*create_srq)(struct ib_pd *pd,
struct ib_srq_init_attr *srq_init_attr,
@@ -2115,6 +2194,20 @@ struct ib_device {
struct ib_rwq_ind_table_init_attr *init_attr,
struct ib_udata *udata);
int (*destroy_rwq_ind_table)(struct ib_rwq_ind_table *wq_ind_table);
+ /**
+ * rdma netdev operations
+ *
+ * Driver implementing alloc_rdma_netdev must return -EOPNOTSUPP if it
+ * doesn't support the specified rdma netdev type.
+ */
+ struct net_device *(*alloc_rdma_netdev)(
+ struct ib_device *device,
+ u8 port_num,
+ enum rdma_netdev_t type,
+ const char *name,
+ unsigned char name_assign_type,
+ void (*setup)(struct net_device *));
+ void (*free_rdma_netdev)(struct net_device *netdev);
struct module *owner;
struct device dev;
@@ -2537,6 +2630,21 @@ static inline bool rdma_cap_eth_ah(const struct ib_device *device, u8 port_num)
}
/**
+ * rdma_cap_opa_ah - Check if the port of device supports
+ * OPA Address handles
+ * @device: Device to check
+ * @port_num: Port number to check
+ *
+ * Return: true if we are running on an OPA device which supports
+ * the extended OPA addressing.
+ */
+static inline bool rdma_cap_opa_ah(struct ib_device *device, u8 port_num)
+{
+ return (device->port_immutable[port_num].core_cap_flags &
+ RDMA_CORE_CAP_OPA_AH) == RDMA_CORE_CAP_OPA_AH;
+}
+
+/**
* rdma_max_mad_size - Return the max MAD size required by this RDMA Port.
*
* @device: Device
@@ -2636,14 +2744,14 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
void ib_dealloc_pd(struct ib_pd *pd);
/**
- * ib_create_ah - Creates an address handle for the given address vector.
+ * rdma_create_ah - Creates an address handle for the given address vector.
* @pd: The protection domain associated with the address handle.
* @ah_attr: The attributes of the address vector.
*
* The address handle is used to reference a local or global destination
* in all UD QP post sends.
*/
-struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
+struct ib_ah *rdma_create_ah(struct ib_pd *pd, struct rdma_ah_attr *ah_attr);
/**
* ib_get_gids_from_rdma_hdr - Get sgid and dgid from GRH or IPv4 header
@@ -2676,7 +2784,7 @@ int ib_get_rdma_header_version(const union rdma_network_hdr *hdr);
*/
int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
const struct ib_wc *wc, const struct ib_grh *grh,
- struct ib_ah_attr *ah_attr);
+ struct rdma_ah_attr *ah_attr);
/**
* ib_create_ah_from_wc - Creates an address handle associated with the
@@ -2694,28 +2802,28 @@ struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, const struct ib_wc *wc,
const struct ib_grh *grh, u8 port_num);
/**
- * ib_modify_ah - Modifies the address vector associated with an address
+ * rdma_modify_ah - Modifies the address vector associated with an address
* handle.
* @ah: The address handle to modify.
* @ah_attr: The new address vector attributes to associate with the
* address handle.
*/
-int ib_modify_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr);
+int rdma_modify_ah(struct ib_ah *ah, struct rdma_ah_attr *ah_attr);
/**
- * ib_query_ah - Queries the address vector associated with an address
+ * rdma_query_ah - Queries the address vector associated with an address
* handle.
* @ah: The address handle to query.
* @ah_attr: The address vector attributes associated with the address
* handle.
*/
-int ib_query_ah(struct ib_ah *ah, struct ib_ah_attr *ah_attr);
+int rdma_query_ah(struct ib_ah *ah, struct rdma_ah_attr *ah_attr);
/**
- * ib_destroy_ah - Destroys an address handle.
+ * rdma_destroy_ah - Destroys an address handle.
* @ah: The address handle to destroy.
*/
-int ib_destroy_ah(struct ib_ah *ah);
+int rdma_destroy_ah(struct ib_ah *ah);
/**
* ib_create_srq - Creates a SRQ associated with the specified protection
@@ -3380,5 +3488,156 @@ void ib_drain_sq(struct ib_qp *qp);
void ib_drain_qp(struct ib_qp *qp);
int ib_resolve_eth_dmac(struct ib_device *device,
- struct ib_ah_attr *ah_attr);
+ struct rdma_ah_attr *ah_attr);
+
+static inline u8 *rdma_ah_retrieve_dmac(struct rdma_ah_attr *attr)
+{
+ if (attr->type == RDMA_AH_ATTR_TYPE_ROCE)
+ return attr->roce.dmac;
+ return NULL;
+}
+
+static inline void rdma_ah_set_dlid(struct rdma_ah_attr *attr, u32 dlid)
+{
+ if (attr->type == RDMA_AH_ATTR_TYPE_IB)
+ attr->ib.dlid = (u16)dlid;
+ else if (attr->type == RDMA_AH_ATTR_TYPE_OPA)
+ attr->opa.dlid = dlid;
+}
+
+static inline u32 rdma_ah_get_dlid(const struct rdma_ah_attr *attr)
+{
+ if (attr->type == RDMA_AH_ATTR_TYPE_IB)
+ return attr->ib.dlid;
+ else if (attr->type == RDMA_AH_ATTR_TYPE_OPA)
+ return attr->opa.dlid;
+ return 0;
+}
+
+static inline void rdma_ah_set_sl(struct rdma_ah_attr *attr, u8 sl)
+{
+ attr->sl = sl;
+}
+
+static inline u8 rdma_ah_get_sl(const struct rdma_ah_attr *attr)
+{
+ return attr->sl;
+}
+
+static inline void rdma_ah_set_path_bits(struct rdma_ah_attr *attr,
+ u8 src_path_bits)
+{
+ if (attr->type == RDMA_AH_ATTR_TYPE_IB)
+ attr->ib.src_path_bits = src_path_bits;
+ else if (attr->type == RDMA_AH_ATTR_TYPE_OPA)
+ attr->opa.src_path_bits = src_path_bits;
+}
+
+static inline u8 rdma_ah_get_path_bits(const struct rdma_ah_attr *attr)
+{
+ if (attr->type == RDMA_AH_ATTR_TYPE_IB)
+ return attr->ib.src_path_bits;
+ else if (attr->type == RDMA_AH_ATTR_TYPE_OPA)
+ return attr->opa.src_path_bits;
+ return 0;
+}
+
+static inline void rdma_ah_set_port_num(struct rdma_ah_attr *attr, u8 port_num)
+{
+ attr->port_num = port_num;
+}
+
+static inline u8 rdma_ah_get_port_num(const struct rdma_ah_attr *attr)
+{
+ return attr->port_num;
+}
+
+static inline void rdma_ah_set_static_rate(struct rdma_ah_attr *attr,
+ u8 static_rate)
+{
+ attr->static_rate = static_rate;
+}
+
+static inline u8 rdma_ah_get_static_rate(const struct rdma_ah_attr *attr)
+{
+ return attr->static_rate;
+}
+
+static inline void rdma_ah_set_ah_flags(struct rdma_ah_attr *attr,
+ enum ib_ah_flags flag)
+{
+ attr->ah_flags = flag;
+}
+
+static inline enum ib_ah_flags
+ rdma_ah_get_ah_flags(const struct rdma_ah_attr *attr)
+{
+ return attr->ah_flags;
+}
+
+static inline const struct ib_global_route
+ *rdma_ah_read_grh(const struct rdma_ah_attr *attr)
+{
+ return &attr->grh;
+}
+
+/*To retrieve and modify the grh */
+static inline struct ib_global_route
+ *rdma_ah_retrieve_grh(struct rdma_ah_attr *attr)
+{
+ return &attr->grh;
+}
+
+static inline void rdma_ah_set_dgid_raw(struct rdma_ah_attr *attr, void *dgid)
+{
+ struct ib_global_route *grh = rdma_ah_retrieve_grh(attr);
+
+ memcpy(grh->dgid.raw, dgid, sizeof(grh->dgid));
+}
+
+static inline void rdma_ah_set_subnet_prefix(struct rdma_ah_attr *attr,
+ __be64 prefix)
+{
+ struct ib_global_route *grh = rdma_ah_retrieve_grh(attr);
+
+ grh->dgid.global.subnet_prefix = prefix;
+}
+
+static inline void rdma_ah_set_interface_id(struct rdma_ah_attr *attr,
+ __be64 if_id)
+{
+ struct ib_global_route *grh = rdma_ah_retrieve_grh(attr);
+
+ grh->dgid.global.interface_id = if_id;
+}
+
+static inline void rdma_ah_set_grh(struct rdma_ah_attr *attr,
+ union ib_gid *dgid, u32 flow_label,
+ u8 sgid_index, u8 hop_limit,
+ u8 traffic_class)
+{
+ struct ib_global_route *grh = rdma_ah_retrieve_grh(attr);
+
+ attr->ah_flags = IB_AH_GRH;
+ if (dgid)
+ grh->dgid = *dgid;
+ grh->flow_label = flow_label;
+ grh->sgid_index = sgid_index;
+ grh->hop_limit = hop_limit;
+ grh->traffic_class = traffic_class;
+}
+
+/*Get AH type */
+static inline enum rdma_ah_attr_type rdma_ah_find_type(struct ib_device *dev,
+ u32 port_num)
+{
+ if ((rdma_protocol_roce(dev, port_num)) ||
+ (rdma_protocol_iwarp(dev, port_num)))
+ return RDMA_AH_ATTR_TYPE_ROCE;
+ else if ((rdma_protocol_ib(dev, port_num)) &&
+ (rdma_cap_opa_ah(dev, port_num)))
+ return RDMA_AH_ATTR_TYPE_OPA;
+ else
+ return RDMA_AH_ATTR_TYPE_IB;
+}
#endif /* IB_VERBS_H */
diff --git a/include/rdma/opa_addr.h b/include/rdma/opa_addr.h
new file mode 100644
index 000000000000..eace28f1555d
--- /dev/null
+++ b/include/rdma/opa_addr.h
@@ -0,0 +1,79 @@
+/*
+ * Copyright(c) 2017 Intel Corporation.
+ *
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * - Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * - Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef OPA_ADDR_H
+#define OPA_ADDR_H
+
+#define OPA_SPECIAL_OUI (0x00066AULL)
+#define OPA_MAKE_ID(x) (cpu_to_be64(OPA_SPECIAL_OUI << 40 | (x)))
+
+/**
+ * ib_is_opa_gid: Returns true if the top 24 bits of the gid
+ * contains the OPA_STL_OUI identifier. This identifies that
+ * the provided gid is a special purpose GID meant to carry
+ * extended LID information.
+ *
+ * @gid: The Global identifier
+ */
+static inline bool ib_is_opa_gid(union ib_gid *gid)
+{
+ return ((be64_to_cpu(gid->global.interface_id) >> 40) ==
+ OPA_SPECIAL_OUI);
+}
+
+/**
+ * opa_get_lid_from_gid: Returns the last 32 bits of the gid.
+ * OPA devices use one of the gids in the gid table to also
+ * store the lid.
+ *
+ * @gid: The Global identifier
+ */
+static inline u32 opa_get_lid_from_gid(union ib_gid *gid)
+{
+ return be64_to_cpu(gid->global.interface_id) & 0xFFFFFFFF;
+}
+#endif /* OPA_ADDR_H */
diff --git a/include/rdma/opa_port_info.h b/include/rdma/opa_port_info.h
index 9303e0e4f508..b4f0ac02f283 100644
--- a/include/rdma/opa_port_info.h
+++ b/include/rdma/opa_port_info.h
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2014 Intel Corporation. All rights reserved.
+ * Copyright (c) 2014-2017 Intel Corporation. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
@@ -127,6 +127,7 @@
#define OPA_LINK_WIDTH_3X 0x0004
#define OPA_LINK_WIDTH_4X 0x0008
+#define OPA_CAP_MASK3_IsEthOnFabricSupported (1 << 13)
#define OPA_CAP_MASK3_IsSnoopSupported (1 << 7)
#define OPA_CAP_MASK3_IsAsyncSC2VLSupported (1 << 6)
#define OPA_CAP_MASK3_IsAddrRangeConfigSupported (1 << 5)
diff --git a/include/rdma/opa_vnic.h b/include/rdma/opa_vnic.h
new file mode 100644
index 000000000000..39d6890616a6
--- /dev/null
+++ b/include/rdma/opa_vnic.h
@@ -0,0 +1,141 @@
+#ifndef _OPA_VNIC_H
+#define _OPA_VNIC_H
+/*
+ * Copyright(c) 2017 Intel Corporation.
+ *
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * BSD LICENSE
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * - Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * - Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+/*
+ * This file contains Intel Omni-Path (OPA) Virtual Network Interface
+ * Controller (VNIC) specific declarations.
+ */
+
+#include <rdma/ib_verbs.h>
+
+/* VNIC uses 16B header format */
+#define OPA_VNIC_L2_TYPE 0x2
+
+/* 16 header bytes + 2 reserved bytes */
+#define OPA_VNIC_L2_HDR_LEN (16 + 2)
+
+#define OPA_VNIC_L4_HDR_LEN 2
+
+#define OPA_VNIC_HDR_LEN (OPA_VNIC_L2_HDR_LEN + \
+ OPA_VNIC_L4_HDR_LEN)
+
+#define OPA_VNIC_L4_ETHR 0x78
+
+#define OPA_VNIC_ICRC_LEN 4
+#define OPA_VNIC_TAIL_LEN 1
+#define OPA_VNIC_ICRC_TAIL_LEN (OPA_VNIC_ICRC_LEN + OPA_VNIC_TAIL_LEN)
+
+#define OPA_VNIC_SKB_MDATA_LEN 4
+#define OPA_VNIC_SKB_MDATA_ENCAP_ERR 0x1
+
+/* opa vnic rdma netdev's private data structure */
+struct opa_vnic_rdma_netdev {
+ struct rdma_netdev rn; /* keep this first */
+ /* followed by device private data */
+ char *dev_priv[0];
+};
+
+static inline void *opa_vnic_priv(const struct net_device *dev)
+{
+ struct rdma_netdev *rn = netdev_priv(dev);
+
+ return rn->clnt_priv;
+}
+
+static inline void *opa_vnic_dev_priv(const struct net_device *dev)
+{
+ struct opa_vnic_rdma_netdev *oparn = netdev_priv(dev);
+
+ return oparn->dev_priv;
+}
+
+/* opa_vnic skb meta data structrue */
+struct opa_vnic_skb_mdata {
+ u8 vl;
+ u8 entropy;
+ u8 flags;
+ u8 rsvd;
+} __packed;
+
+/* OPA VNIC group statistics */
+struct opa_vnic_grp_stats {
+ u64 unicast;
+ u64 mcastbcast;
+ u64 untagged;
+ u64 vlan;
+ u64 s_64;
+ u64 s_65_127;
+ u64 s_128_255;
+ u64 s_256_511;
+ u64 s_512_1023;
+ u64 s_1024_1518;
+ u64 s_1519_max;
+};
+
+struct opa_vnic_stats {
+ /* standard netdev statistics */
+ struct rtnl_link_stats64 netstats;
+
+ /* OPA VNIC statistics */
+ struct opa_vnic_grp_stats tx_grp;
+ struct opa_vnic_grp_stats rx_grp;
+ u64 tx_dlid_zero;
+ u64 tx_drop_state;
+ u64 rx_drop_state;
+ u64 rx_runt;
+ u64 rx_oversize;
+};
+
+static inline bool rdma_cap_opa_vnic(struct ib_device *device)
+{
+ return !!(device->attrs.device_cap_flags &
+ IB_DEVICE_RDMA_NETDEV_OPA_VNIC);
+}
+
+#endif /* _OPA_VNIC_H */
diff --git a/include/rdma/rdma_cm.h b/include/rdma/rdma_cm.h
index d3968b561f86..3d2eed3c4e75 100644
--- a/include/rdma/rdma_cm.h
+++ b/include/rdma/rdma_cm.h
@@ -85,7 +85,7 @@ struct rdma_addr {
struct rdma_route {
struct rdma_addr addr;
- struct ib_sa_path_rec *path_rec;
+ struct sa_path_rec *path_rec;
int num_paths;
};
@@ -106,7 +106,7 @@ struct rdma_conn_param {
struct rdma_ud_param {
const void *private_data;
u8 private_data_len;
- struct ib_ah_attr ah_attr;
+ struct rdma_ah_attr ah_attr;
u32 qp_num;
u32 qkey;
};
diff --git a/include/rdma/rdma_cm_ib.h b/include/rdma/rdma_cm_ib.h
index 2389c3b45404..6947a6ba2557 100644
--- a/include/rdma/rdma_cm_ib.h
+++ b/include/rdma/rdma_cm_ib.h
@@ -46,7 +46,7 @@
* connection and replaces the call to rdma_resolve_route.
*/
int rdma_set_ib_paths(struct rdma_cm_id *id,
- struct ib_sa_path_rec *path_rec, int num_paths);
+ struct sa_path_rec *path_rec, int num_paths);
/* Global qkey for UDP QPs and multicast groups. */
#define RDMA_UDP_QKEY 0x01234567
diff --git a/include/rdma/rdma_vt.h b/include/rdma/rdma_vt.h
index 8fc1ca7b6f23..4878aaf7bdff 100644
--- a/include/rdma/rdma_vt.h
+++ b/include/rdma/rdma_vt.h
@@ -170,7 +170,7 @@ struct rvt_pd {
/* Address handle */
struct rvt_ah {
struct ib_ah ibah;
- struct ib_ah_attr attr;
+ struct rdma_ah_attr attr;
atomic_t refcount;
u8 vl;
u8 log_pmtu;
@@ -311,10 +311,10 @@ struct rvt_driver_provided {
unsigned (*free_all_qps)(struct rvt_dev_info *rdi);
/* Driver specific AH validation */
- int (*check_ah)(struct ib_device *, struct ib_ah_attr *);
+ int (*check_ah)(struct ib_device *, struct rdma_ah_attr *);
/* Inform the driver a new AH has been created */
- void (*notify_new_ah)(struct ib_device *, struct ib_ah_attr *,
+ void (*notify_new_ah)(struct ib_device *, struct rdma_ah_attr *,
struct rvt_ah *);
/* Let the driver pick the next queue pair number*/
@@ -506,7 +506,7 @@ struct rvt_dev_info *rvt_alloc_device(size_t size, int nports);
void rvt_dealloc_device(struct rvt_dev_info *rdi);
int rvt_register_device(struct rvt_dev_info *rvd);
void rvt_unregister_device(struct rvt_dev_info *rvd);
-int rvt_check_ah(struct ib_device *ibdev, struct ib_ah_attr *ah_attr);
+int rvt_check_ah(struct ib_device *ibdev, struct rdma_ah_attr *ah_attr);
int rvt_init_port(struct rvt_dev_info *rdi, struct rvt_ibport *port,
int port_index, u16 *pkey_table);
int rvt_fast_reg_mr(struct rvt_qp *qp, struct ib_mr *ibmr, u32 key,
@@ -516,6 +516,7 @@ int rvt_rkey_ok(struct rvt_qp *qp, struct rvt_sge *sge,
u32 len, u64 vaddr, u32 rkey, int acc);
int rvt_lkey_ok(struct rvt_lkey_table *rkt, struct rvt_pd *pd,
struct rvt_sge *isge, struct ib_sge *sge, int acc);
-struct rvt_mcast *rvt_mcast_find(struct rvt_ibport *ibp, union ib_gid *mgid);
+struct rvt_mcast *rvt_mcast_find(struct rvt_ibport *ibp, union ib_gid *mgid,
+ u16 lid);
#endif /* DEF_RDMA_VT_H */
diff --git a/include/rdma/rdmavt_qp.h b/include/rdma/rdmavt_qp.h
index f3816396c76a..1d8141a88d3c 100644
--- a/include/rdma/rdmavt_qp.h
+++ b/include/rdma/rdmavt_qp.h
@@ -2,7 +2,7 @@
#define DEF_RDMAVT_INCQP_H
/*
- * Copyright(c) 2016 Intel Corporation.
+ * Copyright(c) 2016, 2017 Intel Corporation.
*
* This file is provided under a dual BSD/GPLv2 license. When using or
* redistributing this file, you may do so under either license.
@@ -269,8 +269,8 @@ struct rvt_qp {
struct ib_qp ibqp;
void *priv; /* Driver private data */
/* read mostly fields above and below */
- struct ib_ah_attr remote_ah_attr;
- struct ib_ah_attr alt_ah_attr;
+ struct rdma_ah_attr remote_ah_attr;
+ struct rdma_ah_attr alt_ah_attr;
struct rvt_qp __rcu *next; /* link list for QPN hash table */
struct rvt_swqe *s_wq; /* send work queue */
struct rvt_mmap_info *ip;
@@ -435,9 +435,14 @@ struct rvt_mcast_qp {
struct rvt_qp *qp;
};
+struct rvt_mcast_addr {
+ union ib_gid mgid;
+ u16 lid;
+};
+
struct rvt_mcast {
struct rb_node rb_node;
- union ib_gid mgid;
+ struct rvt_mcast_addr mcast_addr;
struct list_head qp_list;
wait_queue_head_t wait;
atomic_t refcount;
@@ -526,7 +531,6 @@ static inline void rvt_qp_wqe_reserve(
struct rvt_qp *qp,
struct rvt_swqe *wqe)
{
- wqe->wr.send_flags |= RVT_SEND_RESERVE_USED;
atomic_inc(&qp->s_reserved_used);
}
@@ -550,7 +554,6 @@ static inline void rvt_qp_wqe_unreserve(
struct rvt_swqe *wqe)
{
if (unlikely(wqe->wr.send_flags & RVT_SEND_RESERVE_USED)) {
- wqe->wr.send_flags &= ~RVT_SEND_RESERVE_USED;
atomic_dec(&qp->s_reserved_used);
/* insure no compiler re-order up to s_last change */
smp_mb__after_atomic();
@@ -574,6 +577,7 @@ extern const enum ib_wc_opcode ib_rvt_wc_opcode[];
static inline void rvt_qp_swqe_complete(
struct rvt_qp *qp,
struct rvt_swqe *wqe,
+ enum ib_wc_opcode opcode,
enum ib_wc_status status)
{
if (unlikely(wqe->wr.send_flags & RVT_SEND_RESERVE_USED))
@@ -586,7 +590,7 @@ static inline void rvt_qp_swqe_complete(
memset(&wc, 0, sizeof(wc));
wc.wr_id = wqe->wr.wr_id;
wc.status = status;
- wc.opcode = ib_rvt_wc_opcode[wqe->wr.opcode];
+ wc.opcode = opcode;
wc.qp = &qp->ibqp;
wc.byte_len = wqe->length;
rvt_cq_enter(ibcq_to_rvtcq(qp->ibqp.send_cq), &wc,
diff --git a/include/rdma/uverbs_std_types.h b/include/rdma/uverbs_std_types.h
new file mode 100644
index 000000000000..7771ce966952
--- /dev/null
+++ b/include/rdma/uverbs_std_types.h
@@ -0,0 +1,114 @@
+/*
+ * Copyright (c) 2017, Mellanox Technologies inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UVERBS_STD_TYPES__
+#define _UVERBS_STD_TYPES__
+
+#include <rdma/uverbs_types.h>
+
+extern const struct uverbs_obj_fd_type uverbs_type_attrs_comp_channel;
+extern const struct uverbs_obj_idr_type uverbs_type_attrs_cq;
+extern const struct uverbs_obj_idr_type uverbs_type_attrs_qp;
+extern const struct uverbs_obj_idr_type uverbs_type_attrs_rwq_ind_table;
+extern const struct uverbs_obj_idr_type uverbs_type_attrs_wq;
+extern const struct uverbs_obj_idr_type uverbs_type_attrs_srq;
+extern const struct uverbs_obj_idr_type uverbs_type_attrs_ah;
+extern const struct uverbs_obj_idr_type uverbs_type_attrs_flow;
+extern const struct uverbs_obj_idr_type uverbs_type_attrs_mr;
+extern const struct uverbs_obj_idr_type uverbs_type_attrs_mw;
+extern const struct uverbs_obj_idr_type uverbs_type_attrs_pd;
+extern const struct uverbs_obj_idr_type uverbs_type_attrs_xrcd;
+
+static inline struct ib_uobject *__uobj_get(const struct uverbs_obj_type *type,
+ bool write,
+ struct ib_ucontext *ucontext,
+ int id)
+{
+ return rdma_lookup_get_uobject(type, ucontext, id, write);
+}
+
+#define uobj_get_type(_type) uverbs_type_attrs_##_type.type
+
+#define uobj_get_read(_type, _id, _ucontext) \
+ __uobj_get(&(_type), false, _ucontext, _id)
+
+#define uobj_get_obj_read(_type, _id, _ucontext) \
+({ \
+ struct ib_uobject *uobj = \
+ __uobj_get(&uobj_get_type(_type), \
+ false, _ucontext, _id); \
+ \
+ (struct ib_##_type *)(IS_ERR(uobj) ? NULL : uobj->object); \
+})
+
+#define uobj_get_write(_type, _id, _ucontext) \
+ __uobj_get(&(_type), true, _ucontext, _id)
+
+static inline void uobj_put_read(struct ib_uobject *uobj)
+{
+ rdma_lookup_put_uobject(uobj, false);
+}
+
+#define uobj_put_obj_read(_obj) \
+ uobj_put_read((_obj)->uobject)
+
+static inline void uobj_put_write(struct ib_uobject *uobj)
+{
+ rdma_lookup_put_uobject(uobj, true);
+}
+
+static inline int __must_check uobj_remove_commit(struct ib_uobject *uobj)
+{
+ return rdma_remove_commit_uobject(uobj);
+}
+
+static inline void uobj_alloc_commit(struct ib_uobject *uobj)
+{
+ rdma_alloc_commit_uobject(uobj);
+}
+
+static inline void uobj_alloc_abort(struct ib_uobject *uobj)
+{
+ rdma_alloc_abort_uobject(uobj);
+}
+
+static inline struct ib_uobject *__uobj_alloc(const struct uverbs_obj_type *type,
+ struct ib_ucontext *ucontext)
+{
+ return rdma_alloc_begin_uobject(type, ucontext);
+}
+
+#define uobj_alloc(_type, ucontext) \
+ __uobj_alloc(&(_type), ucontext)
+
+#endif
+
diff --git a/include/rdma/uverbs_types.h b/include/rdma/uverbs_types.h
new file mode 100644
index 000000000000..351ea185df44
--- /dev/null
+++ b/include/rdma/uverbs_types.h
@@ -0,0 +1,172 @@
+/*
+ * Copyright (c) 2017, Mellanox Technologies inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UVERBS_TYPES_
+#define _UVERBS_TYPES_
+
+#include <linux/kernel.h>
+#include <rdma/ib_verbs.h>
+
+struct uverbs_obj_type;
+
+struct uverbs_obj_type_class {
+ /*
+ * Get an ib_uobject that corresponds to the given id from ucontext,
+ * These functions could create or destroy objects if required.
+ * The action will be finalized only when commit, abort or put fops are
+ * called.
+ * The flow of the different actions is:
+ * [alloc]: Starts with alloc_begin. The handlers logic is than
+ * executed. If the handler is successful, alloc_commit
+ * is called and the object is inserted to the repository.
+ * Once alloc_commit completes the object is visible to
+ * other threads and userspace.
+ e Otherwise, alloc_abort is called and the object is
+ * destroyed.
+ * [lookup]: Starts with lookup_get which fetches and locks the
+ * object. After the handler finished using the object, it
+ * needs to call lookup_put to unlock it. The exclusive
+ * flag indicates if the object is locked for exclusive
+ * access.
+ * [remove]: Starts with lookup_get with exclusive flag set. This
+ * locks the object for exclusive access. If the handler
+ * code completed successfully, remove_commit is called
+ * and the ib_uobject is removed from the context's
+ * uobjects repository and put. The object itself is
+ * destroyed as well. Once remove succeeds new krefs to
+ * the object cannot be acquired by other threads or
+ * userspace and the hardware driver is removed from the
+ * object. Other krefs on the object may still exist.
+ * If the handler code failed, lookup_put should be
+ * called. This callback is used when the context
+ * is destroyed as well (process termination,
+ * reset flow).
+ */
+ struct ib_uobject *(*alloc_begin)(const struct uverbs_obj_type *type,
+ struct ib_ucontext *ucontext);
+ void (*alloc_commit)(struct ib_uobject *uobj);
+ void (*alloc_abort)(struct ib_uobject *uobj);
+
+ struct ib_uobject *(*lookup_get)(const struct uverbs_obj_type *type,
+ struct ib_ucontext *ucontext, int id,
+ bool exclusive);
+ void (*lookup_put)(struct ib_uobject *uobj, bool exclusive);
+ /*
+ * Must be called with the exclusive lock held. If successful uobj is
+ * invalid on return. On failure uobject is left completely
+ * unchanged
+ */
+ int __must_check (*remove_commit)(struct ib_uobject *uobj,
+ enum rdma_remove_reason why);
+ u8 needs_kfree_rcu;
+};
+
+struct uverbs_obj_type {
+ const struct uverbs_obj_type_class * const type_class;
+ size_t obj_size;
+ unsigned int destroy_order;
+};
+
+/*
+ * Objects type classes which support a detach state (object is still alive but
+ * it's not attached to any context need to make sure:
+ * (a) no call through to a driver after a detach is called
+ * (b) detach isn't called concurrently with context_cleanup
+ */
+
+struct uverbs_obj_idr_type {
+ /*
+ * In idr based objects, uverbs_obj_type_class points to a generic
+ * idr operations. In order to specialize the underlying types (e.g. CQ,
+ * QPs, etc.), we add destroy_object specific callbacks.
+ */
+ struct uverbs_obj_type type;
+
+ /* Free driver resources from the uobject, make the driver uncallable,
+ * and move the uobject to the detached state. If the object was
+ * destroyed by the user's request, a failure should leave the uobject
+ * completely unchanged.
+ */
+ int __must_check (*destroy_object)(struct ib_uobject *uobj,
+ enum rdma_remove_reason why);
+};
+
+struct ib_uobject *rdma_lookup_get_uobject(const struct uverbs_obj_type *type,
+ struct ib_ucontext *ucontext,
+ int id, bool exclusive);
+void rdma_lookup_put_uobject(struct ib_uobject *uobj, bool exclusive);
+struct ib_uobject *rdma_alloc_begin_uobject(const struct uverbs_obj_type *type,
+ struct ib_ucontext *ucontext);
+void rdma_alloc_abort_uobject(struct ib_uobject *uobj);
+int __must_check rdma_remove_commit_uobject(struct ib_uobject *uobj);
+int rdma_alloc_commit_uobject(struct ib_uobject *uobj);
+
+struct uverbs_obj_fd_type {
+ /*
+ * In fd based objects, uverbs_obj_type_ops points to generic
+ * fd operations. In order to specialize the underlying types (e.g.
+ * completion_channel), we use fops, name and flags for fd creation.
+ * context_closed is called when the context is closed either when
+ * the driver is removed or the process terminated.
+ */
+ struct uverbs_obj_type type;
+ int (*context_closed)(struct ib_uobject_file *uobj_file,
+ enum rdma_remove_reason why);
+ const struct file_operations *fops;
+ const char *name;
+ int flags;
+};
+
+extern const struct uverbs_obj_type_class uverbs_idr_class;
+extern const struct uverbs_obj_type_class uverbs_fd_class;
+
+#define UVERBS_BUILD_BUG_ON(cond) (sizeof(char[1 - 2 * !!(cond)]) - \
+ sizeof(char))
+#define UVERBS_TYPE_ALLOC_FD(_size, _order) \
+ { \
+ .destroy_order = _order, \
+ .type_class = &uverbs_fd_class, \
+ .obj_size = (_size) + \
+ UVERBS_BUILD_BUG_ON((_size) < \
+ sizeof(struct ib_uobject_file)),\
+ }
+#define UVERBS_TYPE_ALLOC_IDR_SZ(_size, _order) \
+ { \
+ .destroy_order = _order, \
+ .type_class = &uverbs_idr_class, \
+ .obj_size = (_size) + \
+ UVERBS_BUILD_BUG_ON((_size) < \
+ sizeof(struct ib_uobject)), \
+ }
+#define UVERBS_TYPE_ALLOC_IDR(_order) \
+ UVERBS_TYPE_ALLOC_IDR_SZ(sizeof(struct ib_uobject), _order)
+#endif