Commit message (Collapse)AuthorAgeFilesLines
* i40e: properly cleanup on allocation failure in i40e_sync_vsi_filtersJacob Keller2016-10-311-17/+22
| | | | | | | | | | | | | | | | | Currently, we fail to correctly restore filters on the temporary add list when we fail to allocate memory either for deletion or addition. Replace calls to "goto out;" with calls to a new location that correctly handles memory allocation failures. Note that it is safe for us to call i40e_undo_filter_entries on the tmp_del_list even after we've deleted filters because at this point it will be empty, so we don't need to separate the logic for add and delete failure. Change-Id: Iee107fd219c6e03e2fd9645c2debf8e8384a8521 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* i40e: store MAC/VLAN filters in a hash with the MAC Address as keyJacob Keller2016-10-315-107/+161
| | | | | | | | | | | | Replace the mac_filter_list with a static size hash table of 8bits. The primary advantage of this is a decrease in latency of operations related to searching for specific MAC filters, including .set_rx_mode. Using a linked list resulted in several locations which were O(n^2). Using a hash table should give us latency growth closer to O(n*log(n)). Change-ID: I5330bd04053b880e670210933e35830b95948ebb Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
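The lookup change described above can be pictured with a small userspace sketch. It assumes a fixed table of 2^8 buckets keyed by the MAC address; the struct, the function names and the hash function are hypothetical and are not taken from the driver, which uses the kernel's own hashtable helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN   6
#define HASH_BITS 8                      /* 2^8 = 256 buckets */
#define HASH_SIZE (1u << HASH_BITS)

/* Hypothetical filter entry, not the driver's struct i40e_mac_filter. */
struct mac_filter {
    uint8_t mac[MAC_LEN];
    int16_t vlan;
    struct mac_filter *next;             /* chain within one bucket */
};

struct mac_table {
    struct mac_filter *bucket[HASH_SIZE];
};

/* Fold the six address bytes into a bucket index (illustrative hash only). */
static unsigned mac_hash(const uint8_t *mac)
{
    unsigned h = 0;

    for (int i = 0; i < MAC_LEN; i++)
        h = h * 31u + mac[i];
    return h & (HASH_SIZE - 1);
}

/* Insert at the head of the bucket the MAC hashes to. */
static void mac_table_add(struct mac_table *t, struct mac_filter *f)
{
    unsigned h = mac_hash(f->mac);

    f->next = t->bucket[h];
    t->bucket[h] = f;
}

/* Walk only the single bucket for this MAC instead of the whole list. */
static struct mac_filter *mac_table_find(struct mac_table *t,
                                         const uint8_t *mac, int16_t vlan)
{
    for (struct mac_filter *f = t->bucket[mac_hash(mac)]; f; f = f->next)
        if (f->vlan == vlan && memcmp(f->mac, mac, MAC_LEN) == 0)
            return f;
    return NULL;
}
```

Because the key is the MAC address alone, all VLAN variants of one address share a bucket, so a per-MAC search touches one short chain.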
* i40e: implement __i40e_del_filter and use where applicableJacob Keller2016-10-311-33/+59
| | | | | | | | | | | | | | | | When inside a loop where we call i40e_del_filter we use an O(n^2) pattern where i40e_del_filter calls i40e_find_filter for us. We can avoid this O(n^2) logic by factoring a function, __i40e_del_filter() out from the i40e_del_filter code. This allows us to re-use the delete logic where appropriate without having to search for the filter twice. This new function benefits several functions including i40e_vsi_add_vlan, i40e_vsi_kill_vlan, i40e_del_mac_vlan_all, and i40e_vsi_release. Change-ID: I75fabe0f53bf73f56b80d342e5fdcfcc28f4d3eb Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* i40e: When searching all MAC/VLAN filters, ignore removed filtersJacob Keller2016-10-311-0/+6
| When adding new MAC address filters, the driver determines if it should behave in VLAN mode (where all MAC addresses get assigned to every existing VLAN) or in non-VLAN mode where MAC addresses get assigned the VLAN_ANY identifier.
|
| Under some circumstances it is possible that a VLAN has been marked for removal (such that all filters of that VLAN are set to I40E_FILTER_REMOVE), and a subsequent call to i40e_put_mac_in_vlan may occur prior to the driver subtask that syncs filters to the hardware. In this case, we may add filters to the now-removed VLAN, even though it should have been removed.
|
| This is most obvious when first adding a new VLAN. We will delete all filters which are in I40E_VLAN_ANY (-1) and then re-add them as in VLAN 0 (untagged). Then, before we sync filters, we will add a new MAC address filter, which will be added to every VLAN that exists. Unfortunately, this will include I40E_VLAN_ANY, so we will end up incorrectly adding filters to the -1 VLAN.
|
| This can be fixed by simply skipping all filters which are marked for removal. A similar check is not necessary in i40e_del_mac_all_vlan, since we are deleting, and any filter which we find already marked for removal would simply be deleted again, which doesn't cause any issues.
|
| Change-Id: I7962154013ce02fe950584690aeeb3ed853d0086
| Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
| Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
| Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* i40e: refactor i40e_put_mac_in_vlan to avoid changing f->vlanJacob Keller2016-10-311-12/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a PVID has been assigned to a VSI, the function i40e_put_mac_in_vlan arbitrarily modifies all filters to have the same VLAN. This is obviously incorrect because it could be modifying active filters without putting them into the NEW state. The correct method is to remove then re-add filters which is already done in the code where we assign the PVID. Fix this issue and a few other minor nits at the same time. First, when we have a PVID don't even bother looping and simply add the filter with the PVID immediately. In the case of the loop, we now can remove several checks. We also don't need to use i40e_find_filter first before calling i40e_add_filter, since i40e_add_filter implicitly does a lookup already. Finally, update the return semantics of this function so that on failure to add a filter it returns NULL, but on success, it returns the last filter added. Otherwise, we're just returning the last filter in the list. An alternative fix might be to return 0 or an error code, but this is pretty invasive to every call site. Change-ID: I2325dfd843aec76d89fb0d7cb0e7c4f290a34840 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* i40e: move i40e_put_mac_in_vlan and i40e_del_mac_all_vlanJacob Keller2016-10-311-57/+56
| | | | | | | | | | | | A future patch will be modifying these functions and making a call to a static function which currently is defined after these functions. Move them in a separate patch to ease review and ensure the moved code is correct. Change-ID: I2ca7fd4e10c0c07ed2291db1ea41bf5987fc6474 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* i40e: make use of __dev_uc_sync and __dev_mc_syncJacob Keller2016-10-312-54/+69
| | | | | | | | | | | | | | | | | | | | | The kernel provides __dev_uc_sync and __dev_mc_sync in order for drivers which need individual notification of add and delete for each filter. These functions allow us to vastly simplify our .set_rx_mode handler. We need to implement two functions for sync and unsync which add and remove filters respectively. This change avoids a very complex and inefficient algorithm which resulted in an abnormal latency for the .set_rx_mode NDO operation. The resulting code after this change is more readable, more efficient, and less code. Due to the callback signature used by these functions we also must update several other functions to take a const u8 * pointer. Change-Id: I2ca7fd4e10c0c07ed2291db1ea41bf5987fc6474 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
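As a rough illustration of the sync/unsync callback pattern mentioned above, here is a simplified userspace model. It is not the kernel's implementation of __dev_uc_sync: the entry layout, the `wanted`/`synced` flags and all function names are assumptions made for this sketch. Only the idea is shown, namely that the driver supplies one callback per direction and is told about each address exactly once.

```c
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/* Callback shape modelled on the text: one address per call, const pointer. */
typedef int (*addr_cb)(void *dev, const uint8_t *addr);

/* Hypothetical bookkeeping for one address in the stack's list. */
struct addr_entry {
    uint8_t addr[MAC_LEN];
    int wanted;    /* still requested by the stack */
    int synced;    /* already pushed to the device */
};

/* Example "device" that only counts notifications. */
struct counters {
    int added;
    int removed;
};

static int count_add(void *dev, const uint8_t *addr)
{
    (void)addr;
    ((struct counters *)dev)->added++;
    return 0;
}

static int count_del(void *dev, const uint8_t *addr)
{
    (void)addr;
    ((struct counters *)dev)->removed++;
    return 0;
}

/* Single linear walk per direction; no list-against-list comparison needed. */
static int addr_list_sync(void *dev, struct addr_entry *list, size_t n,
                          addr_cb sync, addr_cb unsync)
{
    /* First drop addresses the stack no longer wants. */
    for (size_t i = 0; i < n; i++) {
        if (!list[i].wanted && list[i].synced) {
            unsync(dev, list[i].addr);
            list[i].synced = 0;
        }
    }
    /* Then push addresses the device has not seen yet. */
    for (size_t i = 0; i < n; i++) {
        if (list[i].wanted && !list[i].synced) {
            int err = sync(dev, list[i].addr);

            if (err)
                return err;            /* stop on the first failed add */
            list[i].synced = 1;
        }
    }
    return 0;
}
```

A second call with an unchanged list triggers no callbacks, which is the property that lets a .set_rx_mode handler stay short.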
* i40e: drop is_vf and is_netdev fields in struct i40e_mac_filterJacob Keller2016-10-315-272/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally the is_vf and is_netdev fields were added in order to distinguish between VF and netdev filters in a single VSI. However, it can be noted that we use separate VSI for SRIOV VFs and for netdev VSI. Thus, since a single VSI should only ever have one type of filter, we can simply remove the checks and remove the typing. In a similar fashion, we can note that the only remaining way to get multiple filters of a single type is through a debug command that was added to debugfs. This command is useless in practice, and results in causing bugs if we keep counter tracking but lose the is_vf and is_netdev protections as desired above. Since the only time we'd actually have a counter value besides 0 and 1 is through use of this debugfs hook, we can remove this unnecessary command, and the entire counter logic it required. We vastly simplify mac filters by removing (a) the distinction between VF and netdev filters (b) counting logic (c) the ability to add and remove filters bypassing the stack via debugfs Change-ID: Idf916dd2a1159b1188ddbab5bef6b85ea6bf27d9 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* i40e: Add missing \n to end of dev_err messageColin Ian King2016-10-311-1/+1
| Trivial fix: the dev_err message is missing a \n, so add it.
|
| Signed-off-by: Colin Ian King <colin.king@canonical.com>
| Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
| Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* Merge branch 'bridge-PIM-hello'David S. Miller2016-10-314-12/+95
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nikolay Aleksandrov says: ==================== bridge: add support for PIM hello router ports The first 3 patches of this set do minor cleanups and add some helpers to the PIM header file. Patch 4 adds a way to detect mcast router ports via PIM hello messages, they're marked as temporary and are not considered for querier. There's more detailed information in patch 4's commit message. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * bridge: mcast: add router port on PIM hello messageNikolay Aleksandrov2016-10-311-1/+21
| | When we receive a PIM Hello message on a port we can consider that it has a multicast router attached, thus it is correct to add it to the router list. The only catch is that it shouldn't be considered for a querier.
| |
| | Using Daniel's description:
| |
| |     leaf-11  leaf-12  leaf-13
| |          \      |      /
| |             bridge-1
| |             /      \
| |        host-11    host-12
| |
| | - all ports in bridge-1 are in a single vlan aware bridge
| | - leaf-11 is the IGMP querier
| | - leaf-13 is the PIM DR
| | - host-11 TXes packets to 226.10.10.10
| | - bridge-1 only forwards the 226.10.10.10 traffic out the port to leaf-11, it should also forward this traffic out the port to leaf-13
| |
| | Suggested-by: Daniel Walton <dwalton@cumulusnetworks.com>
| | Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
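A minimal sketch of how a PIM Hello could be recognised, assuming the RFC 7761 header layout (version in the high nibble of the first byte, type in the low nibble, Hello being type 0) and the ALL-PIM-ROUTERS group 224.0.0.13. The function and macro names are hypothetical and this is not the bridge code itself.

```c
#include <stddef.h>
#include <stdint.h>

#define PIM_VERSION     2
#define PIM_TYPE_HELLO  0
#define ALL_PIM_ROUTERS 0xE000000Du      /* 224.0.0.13 in host byte order */

/*
 * Returns 1 if the packet looks like a PIMv2 Hello sent to ALL-PIM-ROUTERS.
 * daddr_host is the IPv4 destination in host byte order; pim points at the
 * start of the PIM header and len is the number of bytes available there.
 */
static int is_pim_hello(uint32_t daddr_host, const uint8_t *pim, size_t len)
{
    if (len < 4)                         /* fixed PIM header is 4 bytes */
        return 0;
    if (daddr_host != ALL_PIM_ROUTERS)
        return 0;
    /* First byte: version in the high nibble, type in the low nibble. */
    return (pim[0] >> 4) == PIM_VERSION &&
           (pim[0] & 0x0f) == PIM_TYPE_HELLO;
}
```

In the topology above, a Hello from leaf-13 passing such a check is what would mark its port as a (temporary) router port.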
| * net: pim: add all RFC7761 message typesNikolay Aleksandrov2016-10-313-3/+32
| | | | | | | | | | Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: pim: add a helper to check for IPv4 all pim routers addressNikolay Aleksandrov2016-10-311-0/+6
| | | | | | | | | | Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: pim: add common pimhdr struct and helpersNikolay Aleksandrov2016-10-311-8/+36
|/ | | | | | | | Add the common pimhdr structure and helpers to access it, also cleanup the format of the header file. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'qed-next'David S. Miller2016-10-3116-97/+1404
|\
| | Yuval Mintz says:
| |
| | ====================
| | qed*: Patch series
| |
| | This series does several things. The bigger changes:
| |
| | - Add new notification APIs [& Defaults] for various fields. The series then utilizes some of those qed <-> qede APIs to base WoL support upon.
| | - Change the resource allocation scheme to receive the values from management firmware, instead of equally sharing resources between functions [that might not need those]. That would, e.g., allow us to configure additional filters to network interfaces in presence of storage [PCI] functions from the same adapter.
| | ====================
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: Learn resources from management firmwareTomer Tayar2016-10-317-63/+341
| | Currently, each interface assumes it receives an equal portion of HW/FW resources, but this is wasteful - different partitions [and specifically, partitions exposing different protocol support] might require different resources.
| |
| | Implement a new resource learning scheme where the information is received directly from the management firmware [which has knowledge of all of the functions and can serve as arbiter].
| |
| | Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
| | Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: Use VF-queue featureMintz, Yuval2016-10-314-16/+54
| | The driver sets several restrictions on the number of supported VFs according to available HW/FW resources. This creates a problem as there are constellations which can't be supported [as the limitations don't accurately describe the resources], as well as holes where enabling IOV would fail due to a supposed lack of resources.
| |
| | This introduces a new internal feature - vf-queues, which would be used to lift some of the restrictions and accurately enumerate the queues that can be used by a given PF's VFs.
| |
| | Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: Learn of RDMA capabilities per-deviceMintz, Yuval2016-10-312-8/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | Today, RDMA capabilities are learned from management firmware which provides a per-device indication for all interfaces. Newer management firmware is capable of providing a per-device indication [would later be extended to either RoCE/iWARP]. Try using this newer learning mechanism, but fallback in case management firmware is too old to retain current functionality. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qede: Decouple ethtool caps from qedMintz, Yuval2016-10-311-2/+2
| | | | | | | | | | | | | | | | | | While the qed_lm_maps is closely tied with the QED_LM_* defines, when iterating over the array use actual size instead of the qed define to prevent future possible issues. Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed*: Add support for WoLMintz, Yuval2016-10-319-5/+176
| | | | | | | | | | Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: Add nvram selftestMintz, Yuval2016-10-318-0/+267
| | | | | | | | | | Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed*: Management firmware - notifications and defaultsSudarsana Kalluru2016-10-319-3/+487
|/ Management firmware is interested in various tidbits about the driver - including the driver state & several configuration related fields [MTU, primary MAC, etc.]. This adds the necessary logic to update MFW with such configurations, some of which are passed directly via qed while for others APIs are provided so that qede would be able to later configure if needed.
|
| This also introduces a new default configuration for MTU which would replace the default inherited by being an ethernet device.
|
| Signed-off-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
| Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* solos-pci: use permission-specific DEVICE_ATTR variantsJulia Lawall2016-10-311-1/+1
| Use DEVICE_ATTR_RW for read-write attributes. This simplifies the source code, improves readability, and reduces the chance of inconsistencies.
|
| The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)
|
| // <smpl>
| @rw@
| declarer name DEVICE_ATTR;
| identifier x,x_show,x_store;
| @@
|
| DEVICE_ATTR(x, \(0644\|S_IRUGO|S_IWUSR\), x_show, x_store);
|
| @script:ocaml@
| x << rw.x;
| x_show << rw.x_show;
| x_store << rw.x_store;
| @@
|
| if not (x^"_show" = x_show && x^"_store" = x_store)
| then Coccilib.include_match false
|
| @@
| declarer name DEVICE_ATTR_RW;
| identifier rw.x,rw.x_show,rw.x_store;
| @@
|
| - DEVICE_ATTR(x, \(0644\|S_IRUGO|S_IWUSR\), x_show, x_store);
| + DEVICE_ATTR_RW(x);
| // </smpl>
|
| Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* ptp: use permission-specific DEVICE_ATTR variantsJulia Lawall2016-10-311-1/+1
| Use DEVICE_ATTR_RO for read only attributes. This simplifies the source code, improves readability, and reduces the chance of inconsistencies.
|
| The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/)
|
| // <smpl>
| @ro@
| declarer name DEVICE_ATTR;
| identifier x,x_show;
| @@
|
| DEVICE_ATTR(x, \(0444\|S_IRUGO\), x_show, NULL);
|
| @script:ocaml@
| x << ro.x;
| x_show << ro.x_show;
| @@
|
| if not (x^"_show" = x_show) then Coccilib.include_match false
|
| @@
| declarer name DEVICE_ATTR_RO;
| identifier ro.x,ro.x_show;
| @@
|
| - DEVICE_ATTR(x, \(0444\|S_IRUGO\), x_show, NULL);
| + DEVICE_ATTR_RO(x);
| // </smpl>
|
| Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
| Acked-by: Richard Cochran <richardcochran@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf, inode: add support for symlinks and fix mtime/ctimeDaniel Borkmann2016-10-311-6/+39
| While commit bb35a6ef7da4 ("bpf, inode: allow for rename and link ops") added support for hard links that can be used for prog and map nodes, this work adds simple symlink support, which can be used e.g. for directories, also when unprivileged, and works with cmdline tooling that understands S_IFLNK anyway. Since the switch in e27f4a942a0e ("bpf: Use mount_nodev not mount_ns to mount the bpf filesystem"), there can be various mount instances with mount_nodev() and thus the hierarchy can be flattened to facilitate object sharing. Thus, we can keep bpf tooling also working by repointing paths.
|
| Most of the functionality can be used from vfs library operations. The symlink is stored in the inode itself, that is in i_link, which is sufficient in our case as opposed to storing it in the page cache. While at it, I noticed that bpf_mkdir() and bpf_mkobj() don't update the directory's mtime and ctime, so add a common helper for it called bpf_dentry_finalize() that takes care of it for all cases now.
|
| Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| Acked-by: Alexei Starovoitov <ast@kernel.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* ldmvsw: tx queue stuck in stopped state after LDC resetAaron Young2016-10-311-3/+13
| The following patch fixes an issue with the ldmvsw driver where the network connection of a guest domain becomes non-functional after the guest domain has panic'd and rebooted.
|
| The root cause was determined to be from the following series of events:
|
| 1. Guest domain panics - resulting in the guest no longer processing network packets (from ldmvsw driver)
| 2. The ldmvsw driver (in the control domain) eventually exerts flow control due to no more available tx drings and stops the tx queue for the guest domain
| 3. The LDC of the network connection for the guest is reset when the guest domain reboots after the panic.
| 4. The LDC reset event is received by the ldmvsw driver and the ldmvsw responds by clearing the tx queue for the guest.
| 5. ldmvsw waits indefinitely for a DATA ACK from the guest - which is the normal method to re-enable the tx queue. But the ACK never comes because the tx queue was cleared due to the LDC reset.
|
| To fix this issue, in addition to clearing the tx queue, re-enable the tx queue on an LDC reset. This prevents the ldmvsw from getting caught in this deadlocked state of waiting for a DATA ACK which will never come.
|
| Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
| Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'xps-DCB'David S. Miller2016-10-313-94/+219
|\
| | Alexander Duyck says:
| |
| | ====================
| | Add support for XPS when using DCB
| |
| | This patch series enables proper isolation between traffic classes when using XPS while DCB is enabled. Previously enabling XPS would cause the traffic to be potentially pulled from one traffic class into another on egress. This change essentially multiplies the XPS map by the number of traffic classes and allows us to do a lookup per traffic class for a given CPU. To guarantee the isolation I invalidate the XPS map for any queues that are moved from one traffic class to another, or if we change the number of traffic classes.
| |
| | v2: Added sysfs to display traffic class
| |     Replaced do/while with for loop
| |     Cleaned up several other for loops throughout the patch
| | ====================
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Add support for XPS with QoS via traffic classesAlexander Duyck2016-10-313-47/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for setting and using XPS when QoS via traffic classes is enabled. With this change we will factor in the priority and traffic class mapping of the packet and use that information to correctly select the queue. This allows us to define a set of queues for a given traffic class via mqprio and then configure the XPS mapping for those queues so that the traffic flows can avoid head-of-line blocking between the individual CPUs if so desired. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
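The per-traffic-class lookup can be sketched as a flattened two-dimensional index. This assumes each CPU owns one map slot per traffic class, laid out contiguously; the function names are hypothetical and the real data structures in the networking core are more involved.

```c
#include <stddef.h>

/* Flattened [cpu][tc] index: each CPU owns num_tc consecutive slots. */
static size_t xps_map_index(unsigned cpu, unsigned tc, unsigned num_tc)
{
    return (size_t)cpu * num_tc + tc;
}

/* Slots needed in total: the per-CPU map "multiplied" by num_tc. */
static size_t xps_map_slots(unsigned num_cpus, unsigned num_tc)
{
    return (size_t)num_cpus * num_tc;
}
```

With such a layout, changing the number of traffic classes changes every index, which is one way to see why the existing map has to be invalidated when num_tc changes.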
| * net: Refactor removal of queues from XPS map and apply on num_tc changesAlexander Duyck2016-10-311-23/+50
| | | | | | | | | | | | | | | | | | | | This patch updates the code for removing queues from the XPS map and makes it so that we can apply the code any time we change either the number of traffic classes or the mapping of a given block of queues. This way we avoid having queues pulling traffic from a foreign traffic class. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Add sysfs value to determine queue traffic classAlexander Duyck2016-10-313-1/+37
| | | | | | | | | | | | | | | | | | | | | | Add a sysfs attribute for a Tx queue that allows us to determine the traffic class for a given queue. This will allow us to more easily determine this in the future. It is needed as XPS will take the traffic class for a group of queues into account in order to avoid pulling traffic from one traffic class into another. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Move functions for configuring traffic classes out of inline headersAlexander Duyck2016-10-312-28/+32
|/ | | | | | | | | | The functions for configuring the traffic class to queue mappings have other effects that need to be addressed. Instead of trying to export a bunch of new functions just relocate the functions so that we can instrument them directly with the functionality they will need. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* driver: tun: Use new macro SOCK_IOC_TYPE instead of literal number 0x89Gao Feng2016-10-312-1/+3
| The current code uses _IOC_TYPE(cmd) == 0x89 to check whether cmd is a socket ioctl command such as SIOCGIFHWADDR. But the literal number 0x89 may confuse readers, so create a macro SOCK_IOC_TYPE to improve readability.
|
| Signed-off-by: Gao Feng <fgao@ikuai8.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
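A small userspace sketch of the check described above. The helper name is hypothetical; the macro is guarded with #ifndef because newer kernel headers already provide SOCK_IOC_TYPE, while older ones do not.

```c
#include <linux/ioctl.h>     /* _IOC_TYPE */
#include <linux/sockios.h>   /* SIOCGIFHWADDR and friends */

#ifndef SOCK_IOC_TYPE
#define SOCK_IOC_TYPE 0x89   /* type byte shared by the SIOC* socket ioctls */
#endif

/* Returns non-zero if cmd belongs to the socket ioctl family. */
static int is_socket_ioctl(unsigned int cmd)
{
    return _IOC_TYPE(cmd) == SOCK_IOC_TYPE;
}
```

For example, SIOCGIFHWADDR is 0x8927, so its type byte is 0x89 and the helper accepts it, whereas a terminal ioctl such as 0x5401 is rejected.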
* net: add an ioctl to get a socket network namespaceAndrey Vagin2016-10-314-1/+19
| Each socket operates in the network namespace where it was created, so if we want to dump and restore a socket, we have to know its network namespace.
|
| We have socket_diag to get information about sockets, but it doesn't report sockets which are not bound or connected.
|
| This patch introduces a new socket ioctl, which is called SIOCGSKNS and used to get a file descriptor for a socket network namespace.
|
| A task must have CAP_NET_ADMIN in a target network namespace to use this ioctl.
|
| Cc: "David S. Miller" <davem@davemloft.net>
| Cc: Eric W. Biederman <ebiederm@xmission.com>
| Signed-off-by: Andrei Vagin <avagin@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
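From userspace, the new ioctl might be used roughly as follows. This is a sketch: the wrapper name is hypothetical, and the fallback definition of SIOCGSKNS assumes the value found in linux/sockios.h for libc headers that predate it.

```c
#include <sys/ioctl.h>
#include <sys/socket.h>

#ifndef SIOCGSKNS
#define SIOCGSKNS 0x894C     /* assumed to match linux/sockios.h */
#endif

/*
 * Returns a new file descriptor referring to the network namespace of the
 * given socket, or -1 with errno set (for instance when the caller lacks
 * CAP_NET_ADMIN in that namespace, or the kernel predates the ioctl).
 */
static int socket_netns_fd(int sock)
{
    return ioctl(sock, SIOCGSKNS);
}
```

A checkpoint/restore tool could pass the returned descriptor to setns(2) before recreating the socket; the caller is responsible for closing it.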
* mv643xx_eth: Properly resolve merge conflict.David S. Miller2016-10-311-2/+0
| | | | | | | | The second SET_NETDEV_DEV() in the hunk should be removed. Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* mv643xx_eth: Fix merge error.David S. Miller2016-10-311-3/+0
| | | | | | | One merge conflict block wasn't resolved. Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'shared-for-4.10-1' of ↵David S. Miller2016-10-3022-289/+927
|\
| | git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma
| |
| | Saeed Mahameed says:
| |
| | ====================
| | Mellanox mlx5 core driver updates 2016-10-25
| |
| | This series contains some updates and fixes of mlx5 core and IB drivers with the addition of two features that demand new low level commands and infrastructure updates.
| | - SRIOV VF max rate limit support
| | - mlx5e tc support for FWD rules with counter.
| |
| | Needed for both net and rdma subsystems.
| |
| | Updates and Fixes:
| | From Saeed Mahameed (2):
| | - mlx5 IB: Skip handling unknown mlx5 events
| | - Add ConnectX-5 PCIe 4.0 VF device ID
| |
| | From Artemy Kovalyov (2):
| | - Update struct mlx5_ifc_xrqc_bits
| | - Ensure SRQ physical address structure endianness
| |
| | From Eugenia Emantayev (1):
| | - Fix length of async_event_mask
| |
| | New Features:
| | From Mohamad Haj Yahia (3): mlx5 SRIOV VF max rate limit support
| | - Introduce TSAR manipulation firmware commands
| | - Introduce E-switch QoS management
| | - Add SRIOV VF max rate configuration support
| |
| | From Mark Bloch (7): mlx5e Tc support for FWD rule with counter
| | - Don't unlock fte while still using it
| | - Use fte status to decide on firmware command
| | - Refactor find_flow_rule
| | - Group similar rules under the same fte
| | - Add multi dest support
| | - Add option to add fwd rule with counter
| | - mlx5e tc support for FWD rule with counter
| |
| | Mark here fixed two trivial issues with the flow steering core, and did some refactoring in the flow steering API to support adding multi-destination rules to the same hardware flow table entry at once. The last two patches add the ability to populate a flow rule with a flow counter to the same flow entry.
| |
| | V2: Dropped some patches that added new structures without adding any usage of them.
| |     Added SRIOV VF max rate configuration support patch that introduces the usage of the TSAR infrastructure.
| |     Added flow steering fixes and refactoring in addition to mlx5 tc support for forward rule with counter.
| | ====================
| |
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Add tc support for FWD rule with counterMark Bloch2016-10-302-10/+13
| | | | | | | | | | | | | | | | | | When creating a FWD rule using tc create also a HW counter for this rule. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Add option to add fwd rule with counterMark Bloch2016-10-301-6/+18
| | | | | | | | | | | | | | | | | | Currently the code supports only drop rules to possess counters, add that ability also for fwd rules. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Add multi dest supportMark Bloch2016-10-3014-254/+374
| | Currently when calling mlx5_add_flow_rule we accept only one flow destination; this commit allows passing multiple destinations.
| |
| | This change forces us to change the return structure to a more flexible one. We introduce a flow handle (struct mlx5_flow_handle), which internally holds the number of rules created and an array where each cell points to a flow rule.
| |
| | From the consumers' (of mlx5_add_flow_rule) point of view this change is only cosmetic and requires only changing the type of the returned value they store.
| |
| | From the core point of view, we now need to use a loop when allocating and deleting rules (e.g. given a flow handle).
| |
| | Signed-off-by: Mark Bloch <markb@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Group similar rules under the same fteMark Bloch2016-10-301-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When adding a new rule, if we can match it with compare_match_value and flow tag we might be able to insert the rule to the same fte. In order to do that, there must be an overlap between the actions of the fte and the new rule. When updating the action of an existing fte, we must tell the firmware we are doing so. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Refactor find_flow_ruleMark Bloch2016-10-301-9/+20
| | | | | | | | | | | | | | | | | | | | The way we compare between two dests will need to be used in other places in the future, so we factor out the comparison logic between two dests into a separate function. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Use fte status to decide on firmware commandMark Bloch2016-10-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | An fte status becomes FS_FTE_STATUS_EXISTING only after it was created in HW. We can use this in order to simplify the logic on what firmware command to use. If the status isn't FS_FTE_STATUS_EXISTING we need to create the fte, otherwise we need only to update it. Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Don't unlock fte while still using itMark Bloch2016-10-301-2/+4
| | | | | | | | | | | | | | | | | | | | When adding a new rule to an fte, we need to hold the fte lock until we add that rule to the fte and increase the fte ref count. Fixes: 0c56b97503fd ("net/mlx5_core: Introduce flow steering API") Signed-off-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Add SRIOV VF max rate configuration supportMohamad Haj Yahia2016-10-303-0/+80
| | | | | | | | | | | | | | | | Implement the vf set rate ndo by modifying the TSAR vport rate limit. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Introduce E-switch QoS managementMohamad Haj Yahia2016-10-302-1/+124
| | Add TSAR to the eswitch which will act as the vports rate limiter. Create/Destroy TSAR on Enable/Disable SRIOV. Attach/Detach vport to eswitch TSAR on Enable/Disable vport.
| |
| | Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Introduce TSAR manipulation firmware commandsMohamad Haj Yahia2016-10-304-5/+279
| | TSAR (stands for Transmit Scheduling ARbiter) is a hardware component that is responsible for selecting the next entity to serve on the transmit path. The arbitration defines the QoS policy between the agents connected to the TSAR.
| |
| | The TSAR consists of two main features:
| |
| | 1) BW Allocation between agents: The TSAR implements a deficit weighted round robin between the agents. Each agent attached to the TSAR is assigned a weight and is awarded transmission tokens according to this weight.
| | 2) Rate limiter per agent: Each agent attached to the TSAR is (optionally) assigned a rate limit. The TSAR will not allow scheduling for an agent exceeding its defined rate limit.
| |
| | In this patch we implement the API for manipulating the TSAR.
| |
| | Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | Signed-off-by: Leon Romanovsky <leon@kernel.org>
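The deficit weighted round robin mentioned above can be modelled in software as follows. This is a generic textbook-style sketch, not a description of how the hardware or firmware implements the TSAR; the struct, its fields and the fixed per-packet cost are assumptions made for illustration, and the optional per-agent rate limit is left out.

```c
#include <stddef.h>

/* Hypothetical model of one agent attached to the arbiter. */
struct agent {
    unsigned weight;     /* tokens awarded per round */
    unsigned deficit;    /* tokens currently available */
    unsigned backlog;    /* queued packets, each costing pkt_cost tokens */
    unsigned sent;       /* packets transmitted so far */
};

/* One round: top up every backlogged agent, then let it spend its tokens. */
static void dwrr_round(struct agent *a, size_t n, unsigned pkt_cost)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i].backlog == 0) {
            a[i].deficit = 0;          /* idle agents do not hoard tokens */
            continue;
        }
        a[i].deficit += a[i].weight;
        while (a[i].backlog > 0 && a[i].deficit >= pkt_cost) {
            a[i].deficit -= pkt_cost;
            a[i].backlog--;
            a[i].sent++;
        }
    }
}
```

With two permanently backlogged agents of weight 300 and 100 and a packet cost of 100, each round serves three packets from the first agent and one from the second, so bandwidth is shared 3:1.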
| * net/mlx5: Add ConnectX-5 PCIe 4.0 VF device IDSaeed Mahameed2016-10-301-0/+1
| | | | | | | | | | | | | | | | For the mlx5 driver to support ConnectX-5 PCIe 4.0 VFs, we add the device ID "0x101a" to mlx5_core_pci_table. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Fix length of async_event_maskEugenia Emantayev2016-10-301-1/+1
| | According to the PRM, async_event_mask has to be 64 bits long.
| |
| | Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
| | Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| | Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Ensure SRQ physical address structure endiannessArtemy Kovalyov2016-10-301-1/+1
| | | | | | | | | | | | | | | | SRQ physical address structure field should be in big-endian format. Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Update struct mlx5_ifc_xrqc_bitsArtemy Kovalyov2016-10-301-1/+1
| | Update struct mlx5_ifc_xrqc_bits according to the latest specification.
| |
| | Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
| | Signed-off-by: Leon Romanovsky <leon@kernel.org>