summaryrefslogtreecommitdiffstats
path: root/net/tipc/bearer.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* tipc: Remove unused function declarationsYue Haibing2023-08-031-2/+0
| | | | | | | | | | | | | | | | | Commit d50ccc2d3909 ("tipc: add 128-bit node identifier") declared but never implemented tipc_node_id2hash(). Also commit 5c216e1d28c8 ("tipc: Allow run-time alteration of default link settings") never implemented tipc_media_set_priority() and tipc_media_set_window(), commit cad2929dc432 ("tipc: update a binding service via broadcast") only declared tipc_named_bcast(). Since commit be07f056396d ("tipc: simplify the finalize work queue") tipc_sched_net_finalize() is removed and declaration is unused. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230802034659.39840-1-yuehaibing@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
* tipc: delete tipc_mtu_bad from tipc_udp_enableXin Long2023-05-311-2/+2
| | | | | | | | | | | | | | | | | | | Since commit a4dfa72d0acd ("tipc: set default MTU for UDP media"), it's been no longer using dev->mtu for b->mtu, and the issue described in commit 3de81b758853 ("tipc: check minimum bearer MTU") doesn't exist in UDP bearer any more. Besides, dev->mtu can still be changed to a too small mtu after the UDP bearer is created even with tipc_mtu_bad() check in tipc_udp_enable(). Note that NETDEV_CHANGEMTU event processing in tipc_l2_device_event() doesn't really work for UDP bearer. So this patch deletes the unnecessary tipc_mtu_bad from tipc_udp_enable. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Link: https://lore.kernel.org/r/282f1f5cc40e6cad385aa1c60569e6c5b70e2fb3.1685371933.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tipc: add tipc_bearer_min_mtu to calculate min mtuXin Long2023-05-151-0/+3
| | | | | | | | | | | | | As different media may requires different min mtu, and even the same media with different net family requires different min mtu, add tipc_bearer_min_mtu() to calculate min mtu accordingly. This API will be used to check the new mtu when doing the link mtu negotiation in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: constify dev_addr passingJakub Kicinski2021-10-131-1/+1
| | | | | | | In preparation for netdev->dev_addr being constant make all relevant arguments in tipc constant. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* net: tipc: Fix spelling errors in net/tipc moduleZheng Yongjun2021-04-071-3/+3
| | | | | | | | These patches fix a series of spelling errors in net/tipc module. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/tipc: fix tipc header files for kernel-docRandy Dunlap2020-12-021-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix tipc header files for adding to the networking docbook. Remove some uses of "/**" that were not kernel-doc notation. Fix some source formatting to eliminate Sphinx warnings. Add missing struct member and function argument kernel-doc descriptions. Correct the description of a couple of struct members that were marked as "(FIXME)". Documentation/networking/tipc:18: ../net/tipc/name_table.h:65: WARNING: Unexpected indentation. Documentation/networking/tipc:18: ../net/tipc/name_table.h:66: WARNING: Block quote ends without a blank line; unexpected unindent. ../net/tipc/bearer.h:128: warning: Function parameter or member 'min_win' not described in 'tipc_media' ../net/tipc/bearer.h:128: warning: Function parameter or member 'max_win' not described in 'tipc_media' ../net/tipc/bearer.h:171: warning: Function parameter or member 'min_win' not described in 'tipc_bearer' ../net/tipc/bearer.h:171: warning: Function parameter or member 'max_win' not described in 'tipc_bearer' ../net/tipc/bearer.h:171: warning: Function parameter or member 'disc' not described in 'tipc_bearer' ../net/tipc/bearer.h:171: warning: Function parameter or member 'up' not described in 'tipc_bearer' ../net/tipc/bearer.h:171: warning: Function parameter or member 'refcnt' not described in 'tipc_bearer' ../net/tipc/name_distr.h:68: warning: Function parameter or member 'port' not described in 'distr_item' ../net/tipc/name_table.h:111: warning: Function parameter or member 'services' not described in 'name_table' ../net/tipc/name_table.h:111: warning: Function parameter or member 'cluster_scope_lock' not described in 'name_table' ../net/tipc/name_table.h:111: warning: Function parameter or member 'rc_dests' not described in 'name_table' ../net/tipc/name_table.h:111: warning: Function parameter or member 'snd_nxt' not described in 'name_table' ../net/tipc/subscr.h:67: warning: Function parameter or member 'kref' not described in 'tipc_subscription' ../net/tipc/subscr.h:67: warning: Function parameter or member 'net' not described in 'tipc_subscription' ../net/tipc/subscr.h:67: warning: Function parameter or member 'service_list' not described in 'tipc_subscription' ../net/tipc/subscr.h:67: warning: Function parameter or member 'conid' not described in 'tipc_subscription' ../net/tipc/subscr.h:67: warning: Function parameter or member 'inactive' not described in 'tipc_subscription' ../net/tipc/subscr.h:67: warning: Function parameter or member 'lock' not described in 'tipc_subscription' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* tipc: introduce variable window congestion controlJon Maloy2019-12-111-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We introduce a simple variable window congestion control for links. The algorithm is inspired by the Reno algorithm, covering both 'slow start', 'congestion avoidance', and 'fast recovery' modes. - We introduce hard lower and upper window limits per link, still different and configurable per bearer type. - We introduce a 'slow start theshold' variable, initially set to the maximum window size. - We let a link start at the minimum congestion window, i.e. in slow start mode, and then let is grow rapidly (+1 per rceived ACK) until it reaches the slow start threshold and enters congestion avoidance mode. - In congestion avoidance mode we increment the congestion window for each window-size number of acked packets, up to a possible maximum equal to the configured maximum window. - For each non-duplicate NACK received, we drop back to fast recovery mode, by setting the both the slow start threshold to and the congestion window to (current_congestion_window / 2). - If the timeout handler finds that the transmit queue has not moved since the previous timeout, it drops the link back to slow start and forces a probe containing the last sent sequence number to the sent to the peer, so that this can discover the stale situation. This change does in reality have effect only on unicast ethernet transport, as we have seen that there is no room whatsoever for increasing the window max size for the UDP bearer. For now, we also choose to keep the limits for the broadcast link unchanged and equal. This algorithm seems to give a 50-100% throughput improvement for messages larger than MTU. Suggested-by: Xin Long <lucien.xin@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: introduce TIPC encryption & authenticationTuong Lien2019-11-081-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit offers an option to encrypt and authenticate all messaging, including the neighbor discovery messages. The currently most advanced algorithm supported is the AEAD AES-GCM (like IPSec or TLS). All encryption/decryption is done at the bearer layer, just before leaving or after entering TIPC. Supported features: - Encryption & authentication of all TIPC messages (header + data); - Two symmetric-key modes: Cluster and Per-node; - Automatic key switching; - Key-expired revoking (sequence number wrapped); - Lock-free encryption/decryption (RCU); - Asynchronous crypto, Intel AES-NI supported; - Multiple cipher transforms; - Logs & statistics; Two key modes: - Cluster key mode: One single key is used for both TX & RX in all nodes in the cluster. - Per-node key mode: Each nodes in the cluster has one specific TX key. For RX, a node requires its peers' TX key to be able to decrypt the messages from those peers. Key setting from user-space is performed via netlink by a user program (e.g. the iproute2 'tipc' tool). Internal key state machine: Attach Align(RX) +-+ +-+ | V | V +---------+ Attach +---------+ | IDLE |---------------->| PENDING |(user = 0) +---------+ +---------+ A A Switch| A | | | | | | Free(switch/revoked) | | (Free)| +----------------------+ | |Timeout | (TX) | | |(RX) | | | | | | v | +---------+ Switch +---------+ | PASSIVE |<----------------| ACTIVE | +---------+ (RX) +---------+ (user = 1) (user >= 1) The number of TFMs is 10 by default and can be changed via the procfs 'net/tipc/max_tfms'. At this moment, as for simplicity, this file is also used to print the crypto statistics at runtime: echo 0xfff1 > /proc/sys/net/tipc/max_tfms The patch defines a new TIPC version (v7) for the encryption message (- backward compatibility as well). The message is basically encapsulated as follows: +----------------------------------------------------------+ | TIPCv7 encryption | Original TIPCv2 | Authentication | | header | packet (encrypted) | Tag | +----------------------------------------------------------+ The throughput is about ~40% for small messages (compared with non- encryption) and ~9% for large messages. With the support from hardware crypto i.e. the Intel AES-NI CPU instructions, the throughput increases upto ~85% for small messages and ~55% for large messages. By default, the new feature is inactive (i.e. no encryption) until user sets a key for TIPC. There is however also a new option - "TIPC_CRYPTO" in the kernel configuration to enable/disable the new code when needed. MAINTAINERS | add two new files 'crypto.h' & 'crypto.c' in tipc Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: add reference counter to bearerTuong Lien2019-11-081-0/+3
| | | | | | | | | | | | | | | | As a need to support the crypto asynchronous operations in the later commits, apart from the current RCU mechanism for bearer pointer, we add a 'refcnt' to the bearer object as well. So, a bearer can be hold via 'tipc_bearer_hold()' without being freed even though the bearer or interface can be disabled in the meanwhile. If that happens, the bearer will be released then when the crypto operation is completed and 'tipc_bearer_put()' is called. Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: add loopback device trackingJohn Rutherford2019-08-091-0/+10
| | | | | | | | | | | | | | | | | | | | | Since node internal messages are passed directly to the socket, it is not possible to observe those messages via tcpdump or wireshark. We now remedy this by making it possible to clone such messages and send the clones to the loopback interface. The clones are dropped at reception and have no functional role except making the traffic visible. The feature is enabled if network taps are active for the loopback device. pcap filtering restrictions require the messages to be presented to the receiving side of the loopback device. v3 - Function dev_nit_active used to check for network taps. - Procedure netif_rx_ni used to send cloned messages to loopback device. Signed-off-by: John Rutherford <john.rutherford@dektech.com.au> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: enable tracepoints in tipcTuong Lien2018-12-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As for the sake of debugging/tracing, the commit enables tracepoints in TIPC along with some general trace_events as shown below. It also defines some 'tipc_*_dump()' functions that allow to dump TIPC object data whenever needed, that is, for general debug purposes, ie. not just for the trace_events. The following trace_events are now available: - trace_tipc_skb_dump(): allows to trace and dump TIPC msg & skb data, e.g. message type, user, droppable, skb truesize, cloned skb, etc. - trace_tipc_list_dump(): allows to trace and dump any TIPC buffers or queues, e.g. TIPC link transmq, socket receive queue, etc. - trace_tipc_sk_dump(): allows to trace and dump TIPC socket data, e.g. sk state, sk type, connection type, rmem_alloc, socket queues, etc. - trace_tipc_link_dump(): allows to trace and dump TIPC link data, e.g. link state, silent_intv_cnt, gap, bc_gap, link queues, etc. - trace_tipc_node_dump(): allows to trace and dump TIPC node data, e.g. node state, active links, capabilities, link entries, etc. How to use: Put the trace functions at any places where we want to dump TIPC data or events. Note: a) The dump functions will generate raw data only, that is, to offload the trace event's processing, it can require a tool or script to parse the data but this should be simple. b) The trace_tipc_*_dump() should be reserved for a failure cases only (e.g. the retransmission failure case) or where we do not expect to happen too often, then we can consider enabling these events by default since they will almost not take any effects under normal conditions, but once the rare condition or failure occurs, we get the dumped data fully for post-analysis. For other trace purposes, we can reuse these trace classes as template but different events. c) A trace_event is only effective when we enable it. To enable the TIPC trace_events, echo 1 to 'enable' files in the events/tipc/ directory in the 'debugfs' file system. Normally, they are located at: /sys/kernel/debug/tracing/events/tipc/ For example: To enable the tipc_link_dump event: echo 1 > /sys/kernel/debug/tracing/events/tipc/tipc_link_dump/enable To enable all the TIPC trace_events: echo 1 > /sys/kernel/debug/tracing/events/tipc/enable To collect the trace data: cat trace or cat trace_pipe > /trace.out & To disable all the TIPC trace_events: echo 0 > /sys/kernel/debug/tracing/events/tipc/enable To clear the trace buffer: echo > trace d) Like the other trace_events, the feature like 'filter' or 'trigger' is also usable for the tipc trace_events. For more details, have a look at: Documentation/trace/ftrace.txt MAINTAINERS | add two new files 'trace.h' & 'trace.c' in tipc Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: implement configuration of UDP media MTUGhantaKrishnamurthy MohanKrishna2018-04-201-0/+3
| | | | | | | | | | | | | | In previous commit, we changed the default emulated MTU for UDP bearers to 14k. This commit adds the functionality to set/change the default value by configuring new MTU for UDP media. UDP bearer(s) have to be disabled and enabled back for the new MTU to take effect. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: some cleanups in the file discover.cJon Maloy2018-03-231-1/+1
| | | | | | | | | | | To facilitate the coming changes in the neighbor discovery functionality we make some renaming and refactoring of that code. The functional changes in this commit are trivial, e.g., that we move the message sending call in tipc_disc_timeout() outside the spinlock protected region. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: Introduce __tipc_nl_media_setYing Xue2018-02-141-0/+1
| | | | | | | Introduce __tipc_nl_media_set() which doesn't hold RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: Introduce __tipc_nl_bearer_setYing Xue2018-02-141-0/+1
| | | | | | | Introduce __tipc_nl_bearer_set() which doesn't holding RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: Introduce __tipc_nl_bearer_enableYing Xue2018-02-141-0/+1
| | | | | | | Introduce __tipc_nl_bearer_enable() which doesn't hold RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: Introduce __tipc_nl_bearer_disableYing Xue2018-02-141-0/+1
| | | | | | | Introduce __tipc_nl_bearer_disable() which doesn't hold RTNL lock. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2017-09-021-0/+2
|\ | | | | | | | | | | Three cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * tipc: permit bond slave as bearerParthasarathy Bhuvaragan2017-08-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | For a bond slave device as a tipc bearer, the dev represents the bond interface and orig_dev represents the slave in tipc_l2_rcv_msg(). Since we decode the tipc_ptr from bonding device (dev), we fail to find the bearer and thus tipc links are not established. In this commit, we register the tipc protocol callback per device and look for tipc bearer from both the devices. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tipc: don't reset stale broadcast send linkJon Paul Maloy2017-08-211-1/+0
|/ | | | | | | | | | | | | | | | | | | | | | | | | When the broadcast send link after 100 attempts has failed to transfer a packet to all peers, we consider it stale, and reset it. Thereafter it needs to re-synchronize with the peers, something currently done by just resetting and re-establishing all links to all peers. This has turned out to be overkill, with potentially unwanted consequences for the remaining cluster. A closer analysis reveals that this can be done much simpler. When this kind of failure happens, for reasons that may lie outside the TIPC protocol, it is typically only one peer which is failing to receive and acknowledge packets. It is hence sufficient to identify and reset the links only to that peer to resolve the situation, without having to reset the broadcast link at all. This solution entails a much lower risk of negative consequences for the own node as well as for the overall cluster. We implement this change in this commit. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: add function for checking broadcast support in bearerJon Paul Maloy2017-01-201-1/+7
| | | | | | | | | | | | | | | | | As a preparation for the 'replicast' functionality we are going to introduce in the next commits, we need the broadcast base structure to store whether bearer broadcast is available at all from the currently used bearer or bearers. We do this by adding a new function tipc_bearer_bcast_support() to the bearer layer, and letting the bearer selection function in bcast.c use this to give a new boolean field, 'bcast_support' the appropriate value. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: check minimum bearer MTUMichal Kubeček2016-12-021-0/+13
| | | | | | | | | | | | | | | | | | | | Qian Zhang (张谦) reported a potential socket buffer overflow in tipc_msg_build() which is also known as CVE-2016-8632: due to insufficient checks, a buffer overflow can occur if MTU is too short for even tipc headers. As anyone can set device MTU in a user/net namespace, this issue can be abused by a regular user. As agreed in the discussion on Ben Hutchings' original patch, we should check the MTU at the moment a bearer is attached rather than for each processed packet. We also need to repeat the check when bearer MTU is adjusted to new device MTU. UDP case also needs a check to avoid overflow when calculating bearer MTU. Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: introduce UDP replicastRichard Alpe2016-08-271-0/+1
| | | | | | | | | | | | | | This patch introduces UDP replicast. A concept where we emulate multicast by sending multiple unicast messages to configured peers. The purpose of replicast is mainly to be able to use TIPC in cloud environments where IP multicast is disabled. Using replicas to unicast multicast messages is costly as we have to copy each skb and send the copies individually. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: make bearer packet filtering genericJon Paul Maloy2016-08-191-0/+1
| | | | | | | | | | | | | | | | | | | | | In commit 5b7066c3dd24 ("tipc: stricter filtering of packets in bearer layer") we introduced a method of filtering out messages while a bearer is being reset, to avoid that links may be re-created and come back in working state while we are still in the process of shutting them down. This solution works well, but is limited to only work with L2 media, which is insufficient with the increasing use of UDP as carrier media. We now replace this solution with a more generic one, by introducing a new flag "up" in the generic struct tipc_bearer. This field will be set and reset at the same locations as with the previous solution, while the packet filtering is moved to the generic code for the sending side. On the receiving side, the filtering is still done in media specific code, but now including the UDP bearer. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: add a function to get the bearer nameParthasarathy Bhuvaragan2016-07-261-0/+1
| | | | | | | | | Introduce a new function to get the bearer name from its id. This is used in subsequent commit. Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-07-241-0/+1
|\ | | | | | | | | | | Just several instances of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * tipc: reset all unicast links when broadcast send link failsJon Paul Maloy2016-07-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In test situations with many nodes and a heavily stressed system we have observed that the transmission broadcast link may fail due to an excessive number of retransmissions of the same packet. In such situations we need to reset all unicast links to all peers, in order to reset and re-synchronize the broadcast link. In this commit, we add a new function tipc_bearer_reset_all() to be used in such situations. The function scans across all bearers and resets all their pertaining links. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tipc: add neighbor monitoring frameworkJon Paul Maloy2016-06-151-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TIPC based clusters are by default set up with full-mesh link connectivity between all nodes. Those links are expected to provide a short failure detection time, by default set to 1500 ms. Because of this, the background load for neighbor monitoring in an N-node cluster increases with a factor N on each node, while the overall monitoring traffic through the network infrastructure increases at a ~(N * (N - 1)) rate. Experience has shown that such clusters don't scale well beyond ~100 nodes unless we significantly increase failure discovery tolerance. This commit introduces a framework and an algorithm that drastically reduces this background load, while basically maintaining the original failure detection times across the whole cluster. Using this algorithm, background load will now grow at a rate of ~(2 * sqrt(N)) per node, and at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will now have to actively monitor 38 neighbors in a 400-node cluster, instead of as before 399. This "Overlapping Ring Supervision Algorithm" is completely distributed and employs no centralized or coordinated state. It goes as follows: - Each node makes up a linearly ascending, circular list of all its N known neighbors, based on their TIPC node identity. This algorithm must be the same on all nodes. - The node then selects the next M = sqrt(N) - 1 nodes downstream from itself in the list, and chooses to actively monitor those. This is called its "local monitoring domain". - It creates a domain record describing the monitoring domain, and piggy-backs this in the data area of all neighbor monitoring messages (LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in the cluster eventually (default within 400 ms) will learn about its monitoring domain. - Whenever a node discovers a change in its local domain, e.g., a node has been added or has gone down, it creates and sends out a new version of its node record to inform all neighbors about the change. - A node receiving a domain record from anybody outside its local domain matches this against its own list (which may not look the same), and chooses to not actively monitor those members of the received domain record that are also present in its own list. Instead, it relies on indications from the direct monitoring nodes if an indirectly monitored node has gone up or down. If a node is indicated lost, the receiving node temporarily activates its own direct monitoring towards that node in order to confirm, or not, that it is actually gone. - Since each node is actively monitoring sqrt(N) downstream neighbors, each node is also actively monitored by the same number of upstream neighbors. This means that all non-direct monitoring nodes normally will receive sqrt(N) indications that a node is gone. - A major drawback with ring monitoring is how it handles failures that cause massive network partitionings. If both a lost node and all its direct monitoring neighbors are inside the lost partition, the nodes in the remaining partition will never receive indications about the loss. To overcome this, each node also chooses to actively monitor some nodes outside its local domain. Those nodes are called remote domain "heads", and are selected in such a way that no node in the cluster will be more than two direct monitoring hops away. Because of this, each node, apart from monitoring the member of its local domain, will also typically monitor sqrt(N) remote head nodes. - As an optimization, local list status, domain status and domain records are marked with a generation number. This saves senders from unnecessarily conveying unaltered domain records, and receivers from performing unneeded re-adaptations of their node monitoring list, such as re-assigning domain heads. - As a measure of caution we have added the possibility to disable the new algorithm through configuration. We do this by keeping a threshold value for the cluster size; a cluster that grows beyond this value will switch from full-mesh to ring monitoring, and vice versa when it shrinks below the value. This means that if the threshold is set to a value larger than any anticipated cluster size (default size is 32) the new algorithm is effectively disabled. A patch set for altering the threshold value and for listing the table contents will follow shortly. - This change is fully backwards compatible. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: remove remnants of old broadcast codeJon Paul Maloy2016-04-131-15/+0
| | | | | | | | | | We remove a couple of leftover fields in struct tipc_bearer. Those were used by the old broadcast implementation, and are not needed any longer. There is no functional changes in this commit. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: eliminate remnants of hungarian notationJon Paul Maloy2015-11-201-4/+4
| | | | | | | | | | | | The number of variables with Hungarian notation (l_ptr, n_ptr etc.) has been significantly reduced over the last couple of years. We now root out the last traces of this practice. There are no functional changes in this commit. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: clean up unused code and structuresJon Paul Maloy2015-10-241-2/+0
| | | | | | | | | | | | After the previous changes in this series, we can now remove some unused code and structures, both in the broadcast, link aggregation and link code. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: let neighbor discoverer tranmsit consumable buffersJon Paul Maloy2015-10-241-0/+3
| | | | | | | | | | | | | | | | | | The neighbor discovery function currently uses the function tipc_bearer_send() for transmitting packets, assuming that the sent buffers are not consumed by the called function. We want to change this, in order to avoid unnecessary buffer cloning elswhere in the code. This commit introduces a new function tipc_bearer_skb() which consumes the sent buffers, and let the discoverer functions use this new call instead. The discoverer does now itself perform the cloning when that is necessary. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: introduce jumbo frame support for broadcastJon Paul Maloy2015-10-241-0/+1
| | | | | | | | | | | | | Until now, we have only been supporting a fix MTU size of 1500 bytes for all broadcast media, irrespective of their actual capability. We now make the broadcast MTU adaptable to the carrying media, i.e., we use the smallest MTU supported by any of the interfaces attached to TIPC. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: simplify bearer level broadcastJon Paul Maloy2015-10-241-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, we have been keeping track of the exact set of broadcast destinations though the help structure tipc_node_map. This leads us to have to maintain a whole infrastructure for supporting this, including a pseudo-bearer and a number of functions to manipulate both the bearers and the node map correctly. Apart from the complexity, this approach is also limiting, as struct tipc_node_map only can support cluster local broadcast if we want to avoid it becoming excessively large. We want to eliminate this limitation, in order to enable introduction of scoped multicast in the future. A closer analysis reveals that it is unnecessary maintaining this "full set" overview; it is sufficient to keep a counter per bearer, indicating how many nodes can be reached via this bearer at the moment. The protocol is now robust enough to handle transitional discrepancies between the nominal number of reachable destinations, as expected by the broadcast protocol itself, and the number which is actually reachable at the moment. The initial broadcast synchronization, in conjunction with the retransmission mechanism, ensures that all packets will eventually be acknowledged by the correct set of destinations. This commit introduces these changes. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: make media xmit call outside node spinlock contextJon Paul Maloy2015-07-211-0/+3
| | | | | | | | | | | | | | | | | | | | | Currently, message sending is performed through a deep call chain, where the node spinlock is grabbed and held during a significant part of the transmission time. This is clearly detrimental to overall throughput performance; it would be better if we could send the message after the spinlock has been released. In this commit, we do instead let the call revert on the stack after the buffer chain has been added to the transmission queue, whereafter clones of the buffers are transmitted to the device layer outside the spinlock scope. As a further step in our effort to separate the roles of the node and link entities we also move the function tipc_link_xmit() to node.c, and rename it to tipc_node_xmit(). Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: simplify include dependenciesJon Paul Maloy2015-05-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | When we try to add new inline functions in the code, we sometimes run into circular include dependencies. The main problem is that the file core.h, which really should be at the root of the dependency chain, instead is a leaf. I.e., core.h includes a number of header files that themselves should be allowed to include core.h. In reality this is unnecessary, because core.h does not need to know the full signature of any of the structs it refers to, only their type declaration. In this commit, we remove all dependencies from core.h towards any other tipc header file. As a consequence of this change, we can now move the function tipc_own_addr(net) from addr.c to addr.h, and make it inline. There are no functional changes in this commit. Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: add ip/udp media typeErik Hugne2015-03-061-3/+9
| | | | | | | | | | | | | | | The ip/udp bearer can be configured in a point-to-point mode by specifying both local and remote ip/hostname, or it can be enabled in multicast mode, where links are established to all tipc nodes that have joined the same multicast group. The multicast IP address is generated based on the TIPC network ID, but can be overridden by using another multicast address as remote ip. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: make media address offset a common defineErik Hugne2015-02-281-0/+1
| | | | | | | | | | With the exception of infiniband media which does not use media offsets, the media address is always located at offset 4 in the media info field as defined by the protocol, so we move the definition to the generic bearer.h Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: rename media/msg related definitionsErik Hugne2015-02-281-2/+2
| | | | | | | | | | The TIPC_MEDIA_ADDR_SIZE and TIPC_MEDIA_ADDR_OFFSET names are misleading, as they actually define the size and offset of the whole media info field and not the address part. This patch does not have any functional changes. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: convert legacy nl media dump to nl compatRichard Alpe2015-02-091-1/+0
| | | | | | | | | | Convert TIPC_CMD_GET_MEDIA_NAMES to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: convert legacy nl bearer enable/disable to nl compatRichard Alpe2015-02-091-3/+0
| | | | | | | | | | | | | | | | | | Introduce a framework for transcoding legacy nl action into actions (.doit) calls from the new nl API. This is done by converting the incoming TLV data into netlink data with nested netlink attributes. Unfortunately due to the randomness of the legacy API we can't do this generically so each legacy netlink command requires a specific transcoding recipe. In this case for bearer enable and bearer disable. Convert TIPC_CMD_ENABLE_BEARER and TIPC_CMD_DISABLE_BEARER into doit compat calls. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: convert legacy nl bearer dump to nl compatRichard Alpe2015-02-091-1/+0
| | | | | | | | | | | | | | | | | | | Introduce a framework for dumping netlink data from the new netlink API and formatting it to the old legacy API format. This is done by looping the dump data and calling a format handler for each entity, in this case a bearer. We dump until either all data is dumped or we reach the limited buffer size of the legacy API. Remember, the legacy API doesn't scale. In this commit we convert TIPC_CMD_GET_BEARER_NAMES to use the compat layer. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: make tipc broadcast link support net namespaceYing Xue2015-01-121-6/+15
| | | | | | | | | | | | | | TIPC broadcast link is statically established and its relevant states are maintained with the global variables: "bcbearer", "bclink" and "bcl". Allowing different namespace to own different broadcast link instances, these variables must be moved to tipc_net structure and broadcast link instances would be allocated and initialized when namespace is created. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: make bearer list support net namespaceYing Xue2015-01-121-9/+7
| | | | | | | | | | | | | | Bearer list defined as a global variable is used to store bearer instances. When tipc supports net namespace, bearers created in one namespace must be isolated with others allocated in other namespaces, which requires us that the bearer list(bearer_list) must be moved to tipc_net structure. As a result, a net namespace pointer has to be passed to functions which access the bearer list. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: make tipc node table aware of net namespaceYing Xue2015-01-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | Global variables associated with node table are below: - node table list (node_htable) - node hash table list (tipc_node_list) - node table lock (node_list_lock) - node number counter (tipc_num_nodes) - node link number counter (tipc_num_links) To make node table support namespace, above global variables must be moved to tipc_net structure in order to keep secret for different namespaces. As a consequence, these variables are allocated and initialized when namespace is created, and deallocated when namespace is destroyed. After the change, functions associated with these variables have to utilize a namespace pointer to access them. So adding namespace pointer as a parameter of these functions is the major change made in the commit. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: involve namespace infrastructureYing Xue2015-01-121-2/+3
| | | | | | | | | | | | | Involve namespace infrastructure, make the "tipc_net_id" global variable aware of per namespace, and rename it to "net_id". In order that the conversion can be successfully done, an instance of networking namespace must be passed to relevant functions, allowing them to access the "net_id" variable of per namespace. Signed-off-by: Ying Xue <ying.xue@windriver.com> Tested-by: Tero Aho <Tero.Aho@coriant.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: use generic SKB list APIs to manage link receive queueYing Xue2014-11-261-1/+1
| | | | | | | | | Use standard SKB list APIs associated with struct sk_buff_head to manage link's receive queue to simplify its relevant code cemplexity. Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: add media set to new netlink apiRichard Alpe2014-11-211-0/+1
| | | | | | | | | | | | | | | | | | | | | Add TIPC_NL_MEDIA_SET command to the new tipc netlink API. This command can set one or more link properties for a particular media. Netlink logical layout of bearer set message: -> media -> name -> link properties [ -> tolerance ] [ -> priority ] [ -> window ] Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: add media get/dump to new netlink apiRichard Alpe2014-11-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | Add TIPC_NL_MEDIA_GET command to the new tipc netlink API. This command supports dumping all information about all defined media as well as getting all information about a specific media. The information about a media includes name and link properties. Netlink logical layout of media get response message: -> media -> name -> link properties -> tolerance -> priority -> window Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tipc: add bearer set to new netlink apiRichard Alpe2014-11-211-0/+1
| | | | | | | | | | | | | | | | | | | | | Add TIPC_NL_BEARER_SET command to the new tipc netlink API. This command can set one or more link properties for a particular bearer. Netlink logical layout of bearer set message: -> bearer -> name -> link properties [ -> tolerance ] [ -> priority ] [ -> window ] Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>