linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	sock: enable MSG_ZEROCOPY	Willem de Bruijn	2017-08-04	5	-33/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Prepare the datapath for refcounted ubuf_info. Clone ubuf_info with skb_zerocopy_clone() wherever needed due to skb split, merge, resize or clone. Split skb_orphan_frags into two variants. The split, merge, etc. paths support reference counted zerocopy buffers, so do not do a deep copy. Add skb_orphan_frags_rx for paths that may loop packets to receive sockets. That is not allowed, as it may cause unbounded latency. Deep copy all zerocopy buffers, ref-counted or not, in this path. The exact locations to modify were chosen by exhaustively searching through all code that might modify skb_frag references and/or the SKBTX_DEV_ZEROCOPY tx_flags bit. The changes err on the safe side, in two ways. (1) legacy ubuf_info paths virtio and tap are not modified. They keep a 1:1 ubuf_info to sk_buff relationship. Calls to skb_orphan_frags still call skb_copy_ubufs and thus copy frags in this case. (2) not all copies deep in the stack are addressed yet. skb_shift, skb_split and skb_try_coalesce can be refined to avoid copying. These are not in the hot path and this patch is hairy enough as is, so that is left for future refinement. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sock: add SOCK_ZEROCOPY sockopt	Willem de Bruijn	2017-08-04	13	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \|	The send call ignores unknown flags. Legacy applications may already unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a socket opts in to zerocopy. Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing. Processes can also query this socket option to detect kernel support for the feature. Older kernels will return ENOPROTOOPT. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	sock: add MSG_ZEROCOPY	Willem de Bruijn	2017-08-04	7	-21/+235
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The kernel supports zerocopy sendmsg in virtio and tap. Expand the infrastructure to support other socket types. Introduce a completion notification channel over the socket error queue. Notifications are returned with ee_origin SO_EE_ORIGIN_ZEROCOPY. ee_errno is 0 to avoid blocking the send/recv path on receiving notifications. Add reference counting, to support the skb split, merge, resize and clone operations possible with SOCK_STREAM and other socket types. The patch does not yet modify any datapaths. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
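The reference counting described above can be pictured with a small model. This is only a sketch in Python, not kernel code: the class `UbufInfo`, its `get`/`put` methods and the dictionary-shaped notification are hypothetical names chosen for illustration. The one behaviour taken from the commit message is that a single completion notification, with origin SO_EE_ORIGIN_ZEROCOPY and ee_errno 0, is queued on the socket error queue once no reference to the user buffer remains.

```python
from collections import deque


class UbufInfo:
    """Toy stand-in for a reference-counted zerocopy buffer descriptor."""

    def __init__(self, ident, error_queue):
        self.ident = ident              # hypothetical id of the send call
        self.refcnt = 1                 # reference held by the original skb
        self.error_queue = error_queue  # models the socket error queue

    def get(self):
        """Take a reference, e.g. when an skb is cloned or split."""
        self.refcnt += 1
        return self

    def put(self):
        """Drop a reference; notify once the last user is gone."""
        self.refcnt -= 1
        if self.refcnt == 0:
            self.error_queue.append(
                {"ee_origin": "SO_EE_ORIGIN_ZEROCOPY", "ee_errno": 0, "id": self.ident}
            )


error_queue = deque()
ubuf = UbufInfo(ident=0, error_queue=error_queue)
clone = ubuf.get()   # a cloned skb shares the same user pages
ubuf.put()           # original skb freed, clone still in flight
pending_after_first_put = len(error_queue)
clone.put()          # last reference dropped, completion is queued
```

In this model the application may only reuse its buffer after the notification shows up, which is why the count has to follow every clone and split.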
*	sock: skb_copy_ubufs support for compound pages	Willem de Bruijn	2017-08-04	2	-17/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Refine skb_copy_ubufs to support compound pages. With upcoming TCP zerocopy sendmsg, such fragments may appear. The existing code replaces each page one for one. Splitting each compound page into an independent number of regular pages can result in exceeding the MAX_SKB_FRAGS limit if data is not exactly page aligned. Instead, fill all destination pages but the last to PAGE_SIZE. Split the existing alloc + copy loop into separate stages: 1. compute the byte length and the minimum number of pages needed to store it; 2. allocate; 3. copy, filling each page except the last to PAGE_SIZE bytes; 4. update the skb frag array. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
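The four stages listed in this commit message can be sketched outside the kernel. The Python below is a hypothetical model, not the actual skb_copy_ubufs code: `copy_frags` and the 4096-byte `PAGE_SIZE` are assumptions made for illustration, and source fragments are represented as plain byte strings of arbitrary length.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def copy_frags(frags, page_size=PAGE_SIZE):
    """Pack source fragments into the minimum number of full pages."""
    # Stage 1: total byte length and minimum number of destination pages.
    total = sum(len(frag) for frag in frags)
    n_pages = -(-total // page_size)  # ceiling division

    # Stage 2: allocate all destination pages up front.
    pages = [bytearray() for _ in range(n_pages)]

    # Stage 3: copy, filling every page except the last to page_size bytes.
    idx = 0
    for frag in frags:
        off = 0
        while off < len(frag):
            if len(pages[idx]) == page_size:
                idx += 1  # current page is full, move to the next one
            room = page_size - len(pages[idx])
            chunk = frag[off:off + room]
            pages[idx] += chunk
            off += len(chunk)

    # Stage 4: the new page list replaces the old frag array.
    return [bytes(page) for page in pages]


pages = copy_frags([b"a" * 6000, b"b" * 3000])
page_lengths = [len(page) for page in pages]
```

Because destination pages are filled back to back, the number of pages depends only on the total length, not on how the source data happened to be aligned.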
*	sock: allocate skbs from optmem	Willem de Bruijn	2017-08-04	2	-0/+29
\| \| \| \| \| \| \| \| \| \| \| \|	Add sock_omalloc and sock_ofree to be able to allocate control skbs, for instance for looping errors onto sk_error_queue. The transmit budget (sk_wmem_alloc) is involved in transmit skb shaping, most notably in TCP Small Queues. Using this budget for control packets would impact transmission. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'mlxsw-Support-for-IPv6-UC-router'	David S. Miller	2017-08-04	25	-177/+1490
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jiri Pirko says: ==================== mlxsw: Support for IPv6 UC router Ido says: This set adds support for IPv6 unicast routes offload. The first four patches make the FIB notification chain generic so that it could be used by address families other than IPv4. This is done by having each address family register its callbacks with the common code, so that its FIB tables and rules could be dumped upon registration to the chain, while ensuring the integrity of the dump. The exact mechanics are explained in detail in the first patch. The next six patches build upon this work and add the necessary callbacks in IPv6 code. This allows listeners of the chain to receive notifications about IPv6 routes addition, deletion and replacement as well as FIB rules notifications. Unlike user space notifications for IPv6 multipath routes, the FIB notification chain notifies these on a per-nexthop basis. This allows us to keep the common code lean and is also unnecessary, as notifications are serialized by each table's lock whereas applications maintaining netlink caches may suffer from concurrent dumps and deletions / additions of routes. The next five patches audit the different code paths reading the route's reference count (rt6i_ref) and remove assumptions regarding its meaning. This is needed since non-FIB users need to be able to hold a reference on the route and a non-zero reference count no longer means the route is in the FIB. The last six patches enable the mlxsw driver to offload IPv6 unicast routes to the Spectrum ASIC. 
Without resorting to ACLs, lookup is done solely based on the destination IP, so the abort mechanism is invoked upon the addition of source-specific routes. Follow-up patch sets will increase the scale of gatewayed routes by consolidating identical nexthop groups to one adjacency entry in the device's adjacency table (as in IPv4), as well as add support for NH_{ADD,DEL} events which enable support for the 'ignore_routes_with_linkdown' sysctl. Changes in v2: * Provide offload indication for individual nexthops (David Ahern). * Use existing route reference count instead of adding another one. This resulted in several new patches to remove assumptions regarding current semantics of the existing reference count (David Ahern). * Add helpers to allow non-FIB users to take a reference on route. * Remove use of tb6_lock in mlxsw (David Ahern). * Add IPv6 dependency to mlxsw. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_router: Don't ignore IPv6 notifications	Ido Schimmel	2017-08-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We now have all the necessary IPv6 infrastructure in place, so stop ignoring these notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_router: Abort on source-specific routes	Ido Schimmel	2017-08-04	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Without resorting to ACLs, the device performs route lookup solely based on the destination IP address. In case source-specific routing is needed, an error is returned and the abort mechanism is activated, thus allowing the kernel to take over forwarding decisions. Instead of aborting, we can trap specific destination prefixes where source-specific routes are present, but this will result in a lot more code that is unlikely to ever be used. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_router: Add support for route replace	Ido Schimmel	2017-08-04	1	-14/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case we got a replace event, then the replaced route must exist. If the route isn't capable of multipath, then replace first matching non-multipath capable route. If the route is capable of multipath and matching multipath capable route is found, then replace it. Otherwise, replace first matching non-multipath capable route. The new route is inserted before the replaced one. In case the replaced route is currently offloaded, then it's overwritten in the device's table by the new route and later deleted, thus not impacting routed traffic. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
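The selection rule in this message reads as a short search over the routes stored for one prefix. The sketch below is a Python model rather than the driver's code: the `Route` type, the `find_replaced` function and the definition of "matching" as equal table ID and metric are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    tb_id: int               # routing table ID
    metric: int
    multipath_capable: bool


def matches(a: Route, b: Route) -> bool:
    # Assumption: two routes "match" when table ID and metric are equal.
    return a.tb_id == b.tb_id and a.metric == b.metric


def find_replaced(routes: List[Route], new: Route) -> Optional[Route]:
    """Pick the existing route that a replace event should overwrite."""
    fallback = None
    for route in routes:
        if not matches(route, new):
            continue
        if route.multipath_capable:
            if new.multipath_capable:
                return route      # multipath replaces matching multipath
        elif fallback is None:
            fallback = route      # first matching non-multipath route
    return fallback               # None means no route to replace


routes = [Route(254, 10, False), Route(254, 10, True), Route(254, 20, False)]
```

A `None` result corresponds to the case the message rules out ("the replaced route must exist"), so a caller would treat it as an error.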
\| *	mlxsw: spectrum_router: Add support for IPv6 routes addition / deletion	Ido Schimmel	2017-08-04	2	-3/+724
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow directly connected and remote unicast IPv6 routes to be programmed to the device's tables. As with IPv4, identical routes - sharing the same destination prefix - are ordered in a FIB node according to their table ID and then the metric. While the kernel doesn't share the same trie for the local and main table, this does happen in the device, so ordering according to table ID is needed. Since individual nexthops can be added and deleted in IPv6, each FIB entry stores a linked list of the rt6_info structs it represents. Upon the addition or deletion of a nexthop, a new nexthop group is allocated according to the new configuration and the old one is destroyed. Identical groups aren't currently consolidated, but will be in a follow-up patchset. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_router: Sanitize IPv6 FIB rules	Ido Schimmel	2017-08-04	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We only allow FIB offload in the presence of default rules or an l3mdev rule. In a similar fashion to IPv4 FIB rules, sanitize IPv6 rules. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_router: Demultiplex FIB event based on family	Ido Schimmel	2017-08-04	1	-21/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The FIB notification block currently only handles IPv4 events, but we want to start handling IPv6 events soon, so lay the groundwork now. Do that by preparing the work item and process it according to the notified address family. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: fib: Add helpers to hold / drop a reference on rt6_info	Ido Schimmel	2017-08-04	2	-10/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Similar to commit 1c677b3d2828 ("ipv4: fib: Add fib_info_hold() helper") and commit b423cb10807b ("ipv4: fib: Export free_fib_info()") add a helper to hold a reference on rt6_info and export rt6_release() to drop it and potentially release the route. This is needed so that drivers capable of FIB offload could hold a reference on the route before queueing it for offload and drop it after the route has been programmed to the device's tables. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: Regenerate host route according to node pointer upon interface up	Ido Schimmel	2017-08-04	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an interface is brought back up, the kernel tries to restore the host routes tied to its permanent addresses. However, if the host route was removed from the FIB, then we need to reinsert it. This is done by releasing the current dst and allocating a new, so as to not reuse a dst with obsolete values. Since this function is called under RTNL and using the same explanation from the previous patch, we can test if the route is in the FIB by checking its node pointer instead of its reference count. Tested using the following script and Andrey's reproducer mentioned in commit 8048ced9beb2 ("net: ipv6: regenerate host route if moved to gc list") and linked below: $ ip link set dev lo up $ ip link add dummy1 type dummy $ ip -6 address add cafe::1/64 dev dummy1 $ ip link set dev lo down # cafe::1/128 is removed $ ip link set dev dummy1 up $ ip link set dev lo up The host route is correctly regenerated. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Link: http://lkml.kernel.org/r/CAAeHK+zSe82vc5gCRgr_EoUwiALPnWVdWJBPwJZBpbxYz=kGJw@mail.gmail.com Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: Regenerate host route according to node pointer upon loopback up	Ido Schimmel	2017-08-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the loopback device is brought back up we need to check if the host route attached to the address is still in the FIB and regenerate one in case it's not. Host routes using the loopback device are always inserted into and removed from the FIB under RTNL (under which this function is called), so we can test their node pointer instead of the reference count in order to check if the route is in the FIB or not. Tested using the following script from Nicolas mentioned in commit a220445f9f43 ("ipv6: correctly add local routes when lo goes up"): $ ip link add dummy1 type dummy $ ip link set dummy1 up $ ip link set lo down ; ip link set lo up The host route is correctly regenerated. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: fib: Unlink replaced routes from their nodes	Ido Schimmel	2017-08-04	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a route is deleted its node pointer is set to NULL to indicate it's no longer linked to its node. Do the same for routes that are replaced. This will later allow us to test if a route is still in the FIB by checking its node pointer instead of its reference count. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: fib: Don't assume only nodes hold a reference on routes	Ido Schimmel	2017-08-04	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The code currently assumes that only FIB nodes can hold a reference on routes. Therefore, after fib6_purge_rt() has run and the route is no longer present in any intermediate nodes, it's assumed that its reference count would be 1 - taken by the node where it's currently stored. However, we're going to allow users other than the FIB to take a reference on a route, so this assumption is no longer valid and the BUG_ON() needs to be removed. Note that purging only takes place if the initial reference count is different than 1. I've left that check intact, as in the majority of systems (where routes are only referenced by the FIB), it does actually mean the route is present in intermediate nodes. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: fib: Add offload indication to routes	Ido Schimmel	2017-08-04	2	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow user space applications to see which routes are offloaded and which aren't by setting the RTNH_F_OFFLOAD flag when dumping them. To be consistent with IPv4, offload indication is provided on a per-nexthop basis. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: fib: Dump tables during registration to FIB chain	Ido Schimmel	2017-08-04	3	-2/+104
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Dump all the FIB tables in each net namespace upon registration to the FIB notification chain so that the callee will have a complete view of the tables. The integrity of the dump is ensured by a per-table sequence counter that is incremented (under write lock) whenever a route is added or deleted from the table. All the sequence counters are read (under each table's read lock) and summed, prior and after the dump. In case the counters differ, then the dump is either restarted or the registration fails. While it's possible for a table to be modified after its counter has been read, this isn't really a problem. In case it happened before it was read the second time, then the comparison at the end will fail. If it happened afterwards, then we're guaranteed to be notified about the change, as the notification block is registered prior to the second read. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
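The consistency check described here follows a read, dump, register, re-read pattern. The Python below is a simplified model of it, not the kernel implementation: `Table`, `register_with_dump`, the retry limit and the `during_dump` test hook are hypothetical, and locking is left out entirely.

```python
class Table:
    """Toy FIB table with a per-table sequence counter."""

    def __init__(self, routes=()):
        self.routes = list(routes)
        self.seq = 0

    def add(self, route):
        self.routes.append(route)
        self.seq += 1          # bumped on every add

    def delete(self, route):
        self.routes.remove(route)
        self.seq += 1          # bumped on every delete


def seq_sum(tables):
    return sum(table.seq for table in tables)


def register_with_dump(tables, listeners, listener, max_retries=5, during_dump=None):
    """Return a consistent snapshot and register the listener, or None."""
    for attempt in range(max_retries):
        before = seq_sum(tables)                       # first read
        snapshot = [r for t in tables for r in t.routes]
        if during_dump is not None:
            during_dump(attempt)                       # simulates a concurrent writer
        listeners.append(listener)                     # register before the second read
        if seq_sum(tables) == before:                  # second read
            return snapshot
        listeners.remove(listener)                     # dump was stale, retry
    return None


main = Table(["2001:db8::/64"])
listeners = []


def racer(attempt):
    if attempt == 0:
        main.add("2001:db8:1::/64")   # change lands during the first dump only


snapshot = register_with_dump([main], listeners, print, during_dump=racer)
```

Registering before the second read is what closes the window: a change made after that read reaches the listener as a normal notification.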
\| *	ipv6: fib_rules: Dump rules during registration to FIB chain	Ido Schimmel	2017-08-04	3	-2/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow users of the FIB notification chain to receive a complete view of the IPv6 FIB rules upon registration to the chain. The integrity of the dump is ensured by a per-family sequence counter that is incremented (under RTNL) whenever a rule is added or deleted. All the sequence counters are read (under RTNL) and summed, prior and after the dump. In case the counters differ, then the dump is either restarted or the registration fails. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: fib: Add in-kernel notifications for route add / delete	Ido Schimmel	2017-08-04	2	-0/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As with IPv4, allow listeners of the FIB notification chain to receive notifications whenever a route is added, replaced or deleted. This is done by placing calls to the FIB notification chain in the two lowest level functions that end up performing these operations - namely, fib6_add_rt2node() and fib6_del_route(). Unlike IPv4, APPEND notifications aren't sent as the kernel doesn't distinguish between "append" (NLM_F_CREATE\|NLM_F_APPEND) and "prepend" (NLM_F_CREATE). If NLM_F_EXCL isn't set, duplicate routes are always added after the existing duplicate routes. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: fib: Add FIB notifiers callbacks	Ido Schimmel	2017-08-04	5	-1/+75
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're about to add IPv6 FIB offload support, so implement the necessary callbacks in IPv6 code, which will later allow us to add routes and rules notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	ipv6: fib_rules: Check if rule is a default rule	Ido Schimmel	2017-08-04	2	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As explained in commit 3c71006d15fd ("ipv4: fib_rules: Check if rule is a default rule"), drivers supporting IPv6 FIB offload need to be able to sanitize the rules they don't support and potentially flush their tables. Add an IPv6 helper to check if a FIB rule is a default rule. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: fib_rules: Implement notification logic in core	Ido Schimmel	2017-08-04	5	-49/+101
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unlike the routing tables, the FIB rules share a common core, so instead of replicating the same logic for each address family we can simply dump the rules and send notifications from the core itself. To protect the integrity of the dump, a rules-specific sequence counter is added for each address family and incremented whenever a rule is added or deleted (under RTNL). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	rocker: Ignore address families other than IPv4	Ido Schimmel	2017-08-04	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As in previous patch, ignore IPv6 notifications since the driver doesn't support these. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	mlxsw: spectrum_router: Ignore address families other than IPv4	Ido Schimmel	2017-08-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're about to add IPv6 notifications in the FIB notification chain, but the driver currently doesn't support these, so ignore them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: core: Make the FIB notification chain generic	Ido Schimmel	2017-08-04	13	-93/+282
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The FIB notification chain is currently solely used by IPv4 code. However, we're going to introduce IPv6 FIB offload support, which requires these notifications as well. As explained in commit c3852ef7f2f8 ("ipv4: fib: Replay events when registering FIB notifier"), upon registration to the chain, the callee receives a full dump of the FIB tables and rules by traversing all the net namespaces. The integrity of the dump is ensured by a per-namespace sequence counter that is incremented whenever a change to the tables or rules occurs. In order to allow more address families to use the chain, each family is expected to register its fib_notifier_ops in its pernet init. These operations allow the common code to read the family's sequence counter as well as dump its tables and rules in the given net namespace. Additionally, a 'family' parameter is added to sent notifications, so that listeners could distinguish between the different families. Implement the common code that allows listeners to register to the chain and for address families to register their fib_notifier_ops. Subsequent patches will implement these operations in IPv6. In the future, ipmr and ip6mr will be extended to provide these notifications as well. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
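The per-family registration described above amounts to a small callback registry. The following Python sketch only illustrates that shape: `FibChain`, `make_ops`, and the field names `seq_read` and `dump` are hypothetical, the sequence counter is a constant placeholder, and net namespaces and the dump-consistency retry are not modelled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FibNotifierOps:
    family: str
    seq_read: Callable[[], int]        # read the family's sequence counter
    dump: Callable[[Callable], None]   # replay tables and rules to a listener


class FibChain:
    """Common code: families register ops, listeners register callbacks."""

    def __init__(self):
        self.ops = {}
        self.listeners = []

    def register_ops(self, ops):
        self.ops[ops.family] = ops

    def seq_sum(self):
        return sum(ops.seq_read() for ops in self.ops.values())

    def register_listener(self, listener):
        for ops in self.ops.values():
            ops.dump(listener)         # full dump of every registered family
        self.listeners.append(listener)

    def notify(self, family, event, info):
        for listener in self.listeners:
            listener(family, event, info)   # family lets listeners filter


def make_ops(family, routes):
    def dump(listener):
        for route in routes:
            listener(family, "add", route)
    return FibNotifierOps(family=family, seq_read=lambda: 0, dump=dump)


chain = FibChain()
chain.register_ops(make_ops("ipv4", ["192.0.2.0/24"]))
chain.register_ops(make_ops("ipv6", ["2001:db8::/32"]))
seen = []
chain.register_listener(lambda family, event, info: seen.append((family, event, info)))
chain.notify("ipv6", "del", "2001:db8::/32")
```

A listener that only handles one family can return early on the `family` argument, which is what the "ignore address families other than IPv4" patches in this series do for drivers that do not yet handle IPv6.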
*	Merge branch 'mvpp2-add-TX-interrupts-support'	David S. Miller	2017-08-04	2	-151/+488
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Thomas Petazzoni says: ==================== net: mvpp2: add TX interrupts support So far, the mvpp2 driver was using an hrtimer to handle TX completion. This patch series adds support for using TX interrupts (for each CPU) on PPv2.2, the variant of the IP used on Marvell Armada 7K/8K. Dave: this version can be applied right away, it no longer depends on Antoine's patch series. Antoine series had some comments, so he will have to respin later on. Therefore, let's merge this smaller patch series first. Changes since v1: - Rebased on top of net-next, instead of on top of Antoine's series. - Removed the Device Tree patch, as it shouldn't go through the net tree. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	dt-bindings: net: marvell-pp2: update interrupt-names with TX interrupts	Thomas Petazzoni	2017-08-04	1	-3/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PPv2.2 unit has several interrupts used for TX completion notification. This commit updates the Device Tree binding describing this HW block to mention such interrupts. While at it, we update the example to use a recent Device Tree example, that uses interrupts going through the ICU, and not to the GIC directly. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: mvpp2: add support for TX interrupts and RX queue distribution modes	Thomas Petazzoni	2017-08-04	1	-29/+246
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds the support for two related features: - Support for TX interrupts, with one interrupt for each CPU - Support for different RX queue distribution modes MVPP2_QDIST_SINGLE_MODE where a single interrupt, shared by all CPUs, receives the RX events, and MVPP2_QDIST_MULTI_MODE, where the per-CPU interrupts used for TX events are also used for RX events. Since additional interrupts are needed, an update to the Device Tree binding is needed. However, backward compatibility is preserved with the old Device Tree binding, by gracefully degrading to the original behavior, with only one RX interrupt, and TX completion being handled by an hrtimer. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: mvpp2: introduce queue_vector concept	Thomas Petazzoni	2017-08-04	1	-54/+169
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In preparation to the introduction of TX interrupts and improved RX queue distribution, this commit introduces the concept of "queue vector". A queue vector represents a number of RX and/or TX queues, and an associated NAPI instance and interrupt. This commit currently only creates a single queue_vector, so there are no changes in behavior, but it paves the way for additional queue_vector in the next commits. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: mvpp2: move from cpu-centric naming to "software thread" naming	Thomas Petazzoni	2017-08-04	1	-12/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The PPv2.2 IP has a concept of "software thread", with all registers of the PPv2.2 mapped 8 times, for concurrent accesses by 8 "software threads". In addition, interrupts on RX queues are associated to such "software thread". For most cases, we map a "software thread" to the more conventional concept of CPU, but we will soon have one exception: we will have a model where we have one TX interrupt per CPU (each using one software thread), and all RX events mapped to another software thread (associated to another interrupt). In preparation for this change, it makes sense to change the naming from MVPP2_MAX_CPUS to MVPP2_MAX_THREADS, and plan for 8 software threads instead of 4 currently. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: mvpp2: introduce per-port nrxqs/ntxqs variables	Thomas Petazzoni	2017-08-04	1	-42/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the global variables rxq_number and txq_number hold the number of per-port TXQs and RXQs. Until now, such numbers were constant regardless of the driver configuration. As we are going to introduce different modes for TX and RX queues, these numbers will depend on the configuration (PPv2.1 vs. PPv2.2, exact queue distribution logic). Therefore, as a preparation, we move the number of RXQs and TXQs in the 'struct mvpp2_port' structure, next to the RXQs and TXQs descriptor arrays. For now, they remain initialized to the same default values as rxq_number/txq_number used to be initialized, but this will change in future commits. The only non-mechanical change in this patch is that the check to verify hardware constraints on the number of RXQs and TXQs is moved from mvpp2_probe() to mvpp2_port_probe(), since it's now in mvpp2_port_probe() that we initialize the per-port count of RXQ and TXQ. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: mvpp2: remove RX queue group reset code	Thomas Petazzoni	2017-08-04	1	-17/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The RX queue group allocation is anyway re-done later in mvpp2_port_init(), so resetting it in mvpp2_init() is not very useful, and will be annoying as we are going to rework the RX queue group allocation logic. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: mvpp2: fix MVPP21_ISR_RXQ_GROUP_REG definition	Thomas Petazzoni	2017-08-04	1	-1/+1
\|/ \| \| \| \| \| \| \| \|	The MVPP21_ISR_RXQ_GROUP_REG register is not indexed by rxq, but by port, so we fix the parameter name accordingly. There are no functional changes. Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: arc_emac: Add support for ndo_do_ioctl net_device_ops operation	Romain Perier	2017-08-04	1	-0/+13
\| \| \| \| \| \| \| \| \| \|	This operation is required for handling ioctl commands like SIOCGMIIREG, when debugging MDIO registers from userspace. This commit adds support for this operation. Signed-off-by: Romain Perier <romain.perier@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'hns3-ethernet-driver'	David S. Miller	2017-08-04	18	-0/+11953
\|\	Salil Mehta says:
====================
Hisilicon Network Subsystem 3 Ethernet Driver

This patch-set adds support for the HNS3 (Hisilicon Network Subsystem 3) Ethernet driver for the hip08 family of SoCs and future SoCs. Hisilicon's new hip08 SoCs have integrated Ethernet based on PCI Express, hence the need for a new driver alongside the previous HNS driver, which is already part of the Linux mainline. This new driver is NOT backward compatible with HNS.

The current driver is meant to control the Physical Function (PF); a separate driver for the Virtual Function is to follow once this base PF driver has been accepted. This driver is ongoing development work, and the HNS3 Ethernet driver will be incrementally enhanced with new features.

High Level Architecture (layers, top to bottom):

  [ Ethtool ]
  [Ethernet Client]  [ODP/UIO Client] . . . [ RoCE Client ]
  [ HNAE Device ]
  ---------------------------------------------
  [ HNAE3 Framework (Register/unregister) ]
  ---------------------------------------------
  [ HCLGE Layer]
  [ MDIO ]  [ Scheduler/Shaper ]  [ Debugfs* ]
  [ IMP command Interface ]
  ---------------------------------------------
  HIP08 H A R D W A R E

Current patch-set broadly adds support for the following PF functionality:
1. Basic Rx and Tx functionality
2. TSO support
3. Ethtool support
4. * Debugfs support -> this patch for now has been taken off.
5. HNAE framework and hardware compatibility layer
6. Scheduler and Shaper support in transmit function
7. MDIO support

Change Log:
V5->V6: Addressed below comments:
* Andrew Lunn: Comments on MDIO and ethtool link mode
* Leon Romanovsky: Some comments on HNAE layer tidy-up
* Internal comments on redundant code removal, fixing error types etc.
V4->V5: Addressed below concerns:
* Florian Fainelli: Miscellaneous comments on ethtool & enet layer
* Stephen Hemminger: comment on netdev stats in ethtool layer
* Leon Romanovsky: Comments on Driver Version String, naming & Kconfig
* Richard Cochran: Redundant function prototype
V3->V4: Addressed below comments:
* Andrew Lunn: Various comments on MDIO, ethtool, ENET driver etc.
* Stephen Hemminger: change access and update to 64-bit statistics
* Bo You: some spelling mistakes and checkpatch.pl errors.
V2->V3: Addressed comments
* Yuval Mintz: Removal of redundant userprio-to-tc code
* Stephen Hemminger: Ethtool & interrupt enable
* Andrew Lunn: On C45/C22 PHY support, HNAE, ethtool
* Florian Fainelli: C45/C22 and phy_connect/attach
* Intel kbuild errors
V1->V2: Addressed some comments by kbuild, Yuval Mintz, Andrew Lunn & Florian Fainelli in the following patches:
* Add support of HNS3 Ethernet Driver for hip08 SoC
* Add MDIO support to HNS3 Ethernet driver for hip08 SoC
* Add support of debugfs interface to HNS3 driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: hns3: Add HNS3 driver to kernel build framework & MAINTAINERS	Salil	2017-08-04	5	-0/+54
\| \|	This patch updates the MAINTAINERS file with the HNS3 Ethernet driver maintainers' names and other details. It also introduces the new Makefiles required to build the HNS3 Ethernet driver and updates the existing Kconfig file in the hisilicon folder. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: hns3: Add Ethtool support to HNS3 driver	Salil	2017-08-04	1	-0/+482
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the support of the Ethtool interface to the HNS3 Ethernet driver. Various commands to read the statistics, configure the offloading, loopback selftest etc. are supported. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC	Salil	2017-08-04	2	-0/+230
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the support of MDIO bus interface for HNS3 driver. Code provides various interfaces to start and stop the PHY layer and to read and write the MDIO bus or PHY. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver	Salil	2017-08-04	2	-0/+1121
\| \|	This patch adds support for the scheduling and shaping functionality in the transmit leg. It also adds support for Pause at the MAC level. (Pause at the per-priority level shall be added later along with the DCB feature.) The hardware consists of two types of configuration of 6 level schedulers. Algorithms vary according to the level and type of scheduler being used. The current patch initializes the mapping, the algorithms (like SP, DWRR etc.) and the shapers (CIR, PIR etc.) being used. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
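The commit message above names DWRR (deficit weighted round robin) as one of the scheduling algorithms the hardware can be configured with. As a rough illustration of the general technique only, and not of the HNS3 hardware or driver code, a minimal sketch might look like the following. All names (`ex_queue`, `ex_dwrr_round`) are hypothetical, and every packet in a queue is assumed to have the same length to keep the example short.

```c
/* Hypothetical per-queue state for a textbook DWRR scheduler.
 * None of these names come from the HNS3 driver. */
struct ex_queue {
	unsigned int quantum; /* bytes of credit added per round (the weight) */
	unsigned int deficit; /* unused credit carried into the next round */
	unsigned int pkt_len; /* assumed fixed packet length for this queue */
	unsigned int backlog; /* packets still waiting */
	unsigned int sent;    /* packets sent so far */
};

/* Serve every queue once. A backlogged queue earns its quantum and then
 * sends packets for as long as its accumulated credit covers the next one. */
static void ex_dwrr_round(struct ex_queue *q, unsigned int nq)
{
	for (unsigned int i = 0; i < nq; i++) {
		if (q[i].backlog == 0) {
			q[i].deficit = 0; /* idle queues do not hoard credit */
			continue;
		}
		q[i].deficit += q[i].quantum;
		while (q[i].backlog > 0 && q[i].deficit >= q[i].pkt_len) {
			q[i].deficit -= q[i].pkt_len;
			q[i].backlog--;
			q[i].sent++;
		}
		if (q[i].backlog == 0)
			q[i].deficit = 0;
	}
}
```

With two backlogged queues whose quanta are in a 3:1 ratio and whose packets are the same size, repeated calls to `ex_dwrr_round()` send packets in roughly that 3:1 ratio, which is the property that distinguishes DWRR from strict priority (SP).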
\| *	net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support	Salil	2017-08-04	2	-0/+4786
\| \|	This patch adds support for the Hisilicon Network Subsystem Acceleration Engine and the common operations to access it. This layer provides access to the hardware configuration and hardware statistics. It is also responsible for triggering the initialization of the PHY layer through the MDIO layer below it. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: hns3: Add HNS3 IMP(Integrated Mgmt Proc) Cmd Interface Support	Salil	2017-08-04	2	-0/+1096
\| \|	This patch adds support for the IMP (Integrated Management Processor) command interface to the HNS3 driver. Each PF/VF supports a CQP (Command Queue Pair) ring interface. Each CQP consists of a send queue (CSQ) and a receive queue (CRQ). A PF/VF may support various commands, such as Flow Table manipulation, Device management, Packet buffer allocation, Forwarding, VLAN config, Tunneling/Overlays etc. This patch contains code to initialize the command queue, manage the command queue descriptors and run the Rx/Tx protocol with the command processor in the form of various commands/results and acknowledgements. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
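To make the CQP description above more concrete, here is a minimal sketch of a command queue pair built from two descriptor rings, one for sending (CSQ) and one for receiving (CRQ). It is an assumption-laden illustration, not the driver's code: the descriptor layout, the ring depth and all `ex_` names are hypothetical, and real hardware rings also involve DMA memory and doorbell registers that are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_RING_SIZE 4 /* hypothetical ring depth */

/* Hypothetical command descriptor; the real layout is defined by the hardware. */
struct ex_cmd_desc {
	uint16_t opcode;
	uint16_t retval;
	uint32_t data[2];
};

/* One ring: the producer advances next_to_use, the consumer next_to_clean. */
struct ex_cmd_ring {
	struct ex_cmd_desc desc[EX_RING_SIZE];
	unsigned int next_to_use;
	unsigned int next_to_clean;
};

/* A command queue pair: send queue (CSQ) plus receive queue (CRQ). */
struct ex_cqp {
	struct ex_cmd_ring csq;
	struct ex_cmd_ring crq;
};

/* Free slots; one slot stays empty so that "full" and "empty" differ. */
static unsigned int ex_ring_space(const struct ex_cmd_ring *r)
{
	unsigned int used =
		(r->next_to_use + EX_RING_SIZE - r->next_to_clean) % EX_RING_SIZE;

	return EX_RING_SIZE - 1 - used;
}

/* Queue one descriptor; returns false if the ring is full. */
static bool ex_ring_push(struct ex_cmd_ring *r, const struct ex_cmd_desc *d)
{
	if (ex_ring_space(r) == 0)
		return false;
	r->desc[r->next_to_use] = *d;
	r->next_to_use = (r->next_to_use + 1) % EX_RING_SIZE;
	return true;
}

/* Take the oldest descriptor; returns false if the ring is empty. */
static bool ex_ring_pop(struct ex_cmd_ring *r, struct ex_cmd_desc *out)
{
	if (r->next_to_clean == r->next_to_use)
		return false;
	*out = r->desc[r->next_to_clean];
	r->next_to_clean = (r->next_to_clean + 1) % EX_RING_SIZE;
	return true;
}
```

In this sketch a caller would push a command onto `csq` and later pop the matching result, while messages from the management processor would arrive on `crq` through the same two helpers.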
\| *	net: hns3: Add support of the HNAE3 framework	Salil	2017-08-04	2	-0/+744
\| \|	This patch adds support for the HNAE3 (Hisilicon Network Acceleration Engine 3) framework to the HNS3 driver. The framework lets clients such as ENET (the HNS3 Ethernet driver), RoCE and user-space Ethernet drivers (like ODP etc.) register with HNAE3 devices and their associated operations. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC	Salil	2017-08-04	2	-0/+3440
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the support of Hisilicon Network Subsystem 3 Ethernet driver to hip08 family of SoCs. This driver includes basic Rx/Tx functionality. It also includes the client registration code with the HNAE3(Hisilicon Network Acceleration Engine 3) framework. This work provides the initial support to the hip08 SoC and would incrementally add features or enhancements. Signed-off-by: Daode Huang <huangdaode@hisilicon.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'sctp-remove-typedefs-from-structures-part-4'	David S. Miller	2017-08-03	8	-157/+162
\|\	Xin Long says: ==================== sctp: remove typedefs from structures part 4 Using typedefs is discouraged in the kernel, and checkpatch.pl warns about them. sctp currently uses typedefs for many structures, and all of these uses should be removed. This patchset is part 4 and removes the typedefs for another 14 basic structures from linux/sctp.h. After this patchset, all typedefs in linux/sctp.h are cleaned up. As in parts 1-3, no code logic is changed in these patches; they are cleanup only. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sctp: remove the typedef sctp_auth_chunk_t	Xin Long	2017-08-03	3	-7/+8
\| \|	This patch removes the typedef sctp_auth_chunk_t and replaces it with struct sctp_auth_chunk wherever the typedef is used. It also uses sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
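The two cleanups this series describes, dropping a typedef in favour of the plain struct name and preferring sizeof(variable) over sizeof(type), can be sketched as follows. The structure below is a hypothetical stand-in invented for illustration; it is not the real definition from linux/sctp.h, and `example_clear_hdr` is not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header structure, for illustration only. */
struct example_authhdr {
	uint16_t shkey_id;
	uint16_t hmac_id;
};

/*
 * Before the cleanup, code in this style would have had
 *     typedef struct example_authhdr example_authhdr_t;
 * and written sizeof(example_authhdr_t). Afterwards, callers spell out
 * "struct example_authhdr" and take the size from the object itself.
 */
static size_t example_clear_hdr(struct example_authhdr *hdr)
{
	/* sizeof(*hdr) tracks the pointee type even if the struct is renamed. */
	memset(hdr, 0, sizeof(*hdr));
	return sizeof(*hdr);
}
```

The design point is that `sizeof(*hdr)` cannot drift out of sync with the declaration of `hdr`, whereas `sizeof(type)` has to be updated by hand whenever the type name changes.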
\| *	sctp: remove the typedef sctp_authhdr_t	Xin Long	2017-08-03	2	-7/+7
\| \|	This patch removes the typedef sctp_authhdr_t and replaces it with struct sctp_authhdr wherever the typedef is used. It also uses sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sctp: remove the typedef sctp_addip_chunk_t	Xin Long	2017-08-03	4	-10/+15
\| \|	This patch removes the typedef sctp_addip_chunk_t and replaces it with struct sctp_addip_chunk wherever the typedef is used. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	sctp: remove the typedef sctp_addiphdr_t	Xin Long	2017-08-03	3	-15/+15
\| \|	This patch removes the typedef sctp_addiphdr_t and replaces it with struct sctp_addiphdr wherever the typedef is used. It also uses sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>