path: root/bgpd/bgp_evpn_private.h
Commit message (Author, Age, Files, Lines)
* bgpd: Move sticky, default_gw, router_flag into a single flags variable (Donatas Abraitis, 2024-07-04, 1 file, -1/+1)
    Instead of using three uint8_t variables under struct attr, use a single uint8_t as the flags, saving 2 bytes. Not a big deal, but it makes EVPN-related flags/variables easier to track.

    Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
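The consolidation described in this commit can be sketched as below. This is an illustration only: the struct names, the `evpn_flags` field, and the bit values are assumptions made for the example, not the definitions used in bgpd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments; the real names and values live in bgpd. */
#define EVPN_FLAG_STICKY     (1 << 0)
#define EVPN_FLAG_DEFAULT_GW (1 << 1)
#define EVPN_FLAG_ROUTER     (1 << 2)

/* Before: one byte per boolean property. */
struct attr_before {
	uint8_t sticky;
	uint8_t default_gw;
	uint8_t router_flag;
};

/* After: a single byte carrying all three properties as bits. */
struct attr_after {
	uint8_t evpn_flags;
};

static void evpn_flag_set(struct attr_after *a, uint8_t flag)
{
	a->evpn_flags |= flag;
}

static void evpn_flag_unset(struct attr_after *a, uint8_t flag)
{
	a->evpn_flags &= (uint8_t)~flag;
}

static bool evpn_flag_check(const struct attr_after *a, uint8_t flag)
{
	return (a->evpn_flags & flag) != 0;
}
```

In this sketch the two layouts differ by exactly two bytes, which corresponds to the saving mentioned in the commit message.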
* bgpd: bgp_best_selection is inherently pi based (Donald Sharp, 2024-04-01, 1 file, -1/+2)
    Currently evpn code calls bgp_best_selection for local decisions for local tables to figure out what to do. This is also pi based so let's note that the pi has been changed before calling bgp_best_selection.

    Signed-off-by: Donald Sharp <sharpd@nvidia.com>
* bgpd: Convert from struct bgp_node to struct bgp_dest (Yuqing Zhao, 2023-08-22, 1 file, -1/+1)
    This is based on @donaldsharp's work.

    The current code base uses the struct bgp_node data structure. The problem with this is that it creates a bunch of extra data per route_node. The table structure generates ‘holder’ nodes that are never going to receive bgp routes, yet the memory of those nodes is allocated as if they were a full bgp_node.

    After splitting up the bgp_node into bgp_dest and route_node, a ‘holder’ node that has no bgp data is allocated as a route_node, not a bgp_node, and the memory usage is reduced.

    The memory usage of a BGP node is reduced from 200B to 96B. The total memory usage optimization of this part is ~16.00%.

    Signed-off-by: Donald Sharp <sharpd@nvidia.com>
    Signed-off-by: Yuqing Zhao <xiaopanghu99@163.com>
* bgpd: bgp_path_info_extra memory optimization (Valerian_He, 2023-08-08, 1 file, -8/+8)
    Even if some of the attributes in bgp_path_info_extra are not used, their memory is still allocated every time. This wastes memory.

    This commit deletes all unnecessary attributes and changes the optional attributes to pointer storage. Memory is only allocated when they are actually used. After optimization, extra-info-related memory is reduced by about half (~400B -> ~200B).

    Signed-off-by: Valerian_He <1826906282@qq.com>
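The "pointer storage" idea can be sketched as follows. The struct and function names here are hypothetical and only illustrate allocating an optional sub-structure on first use; they are not the actual bgp_path_info_extra layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical optional block that only some paths need. */
struct extra_evpn {
	uint32_t mm_seqnum;
	uint8_t mac[6];
};

/* Hypothetical "extra" container: the optional block is held by pointer
 * and stays NULL until something actually needs it. */
struct path_extra {
	uint32_t num_labels;     /* always present */
	struct extra_evpn *evpn; /* allocated lazily */
};

/* Return the optional block, allocating it (zeroed) on first use.
 * Returns NULL only if allocation fails. */
static struct extra_evpn *path_extra_evpn_get(struct path_extra *e)
{
	if (!e->evpn)
		e->evpn = calloc(1, sizeof(*e->evpn));
	return e->evpn;
}

/* Release the optional block, if any. */
static void path_extra_free(struct path_extra *e)
{
	free(e->evpn);
	e->evpn = NULL;
}
```

Paths that never touch the optional block pay only for one pointer, which is the trade-off the commit message describes.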
* bgpd: Add MAC-VRF Site-of-Origin support (Trey Aspelund, 2023-05-30, 1 file, -0/+9)
    Initial support for configuring an SoO for all MAC-VRFs (EVIs/L2VNIs). This provides a topology-independent method of preventing EVPN routes from one MAC-VRF "site" (an L2 domain) from being imported by other PEs in the same MAC-VRF "site", similar to how SoO is traditionally used in L3VPN to identify and break loops for an L3/IP-VRF "site".

    One example of where a MAC-VRF SoO can be used to avoid an L2 control plane loop is with Active/Active MLAG VTEPs. For a given L2 site only one control plane should be active. SoO can be used to ID/ignore entries originated from the local MAC-VRF site so that EVPN will not attempt to manage entries that are already handled by MLAG.

    Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
* bgpd: Drop afi_t from bgp_evpn_global_node_lookup() (Donatas Abraitis, 2023-03-14, 1 file, -5/+3)
    Not used.

    Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
* Merge pull request #12248 from pguibert6WIND/bgpasdot (Russ White, 2023-02-21, 1 file, -0/+1)
|\    lib, bgp: add initial support for asdot format
| * bgpd: store the route-distinguisher from config as a string (Philippe Guibert, 2023-02-10, 1 file, -0/+1)
      The route-distinguisher string can be expressed in different ways when the AS number is part of the RD, and the configured string value has to be kept intact.

      The following vty commands store the string value internally:
      - router bgp / address-family ipv4 unicast / rd vpn export <>
      - router bgp / address-family l2vpn evpn / rd <>
      - router bgp / address-family l2vpn evpn / vni <> / rd <>

      An RD configured in the following places is not covered:
      - router bgp / rfapi related commands
      - router bgp / address-family xxx xxx / network .. rd <>

      Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
* | *: auto-convert to SPDX License IDs (David Lamparter, 2023-02-09, 1 file, -17/+1)
|/    Done with a combination of regex'ing and banging my head against a wall.

      Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
* Merge pull request #12081 from sworleys/EMM-upstream (Donatas Abraitis, 2022-11-17, 1 file, -12/+109)
|\    Rework of Various Handling in EVPN for Extended Mac Mobility
| * bgpd,zebra,lib: bgp evpn vni macip into two tables (Stephen Worley, 2022-10-11, 1 file, -19/+73)
      Re-work the bgp vni table to use separately keyed tables for type2 routes. So, with type2 routes, we have the main table keyed off of the IP and a new MAC table keyed off of MACs.

      By separating out the two, we are able to run path selection separately for the neigh and mac. Keeping the two separate is also more in line with what happens in zebra (they are managed completely separately).

      With this change type2 routes go into each table like so:

      ```
      Remote MAC-IP -> IP Table & MAC Table
      Remote MAC -> MAC Table
      Local MAC-IP -> IP Table
      Local MAC -> MAC Table
      ```

      The difference for local is necessary because we should not ever allow multiple paths for a local MAC.

      Also cleaned up the commands for querying the vni tables:

      ```
      show bgp vni all type ...
      show bgp vni VNI type ...
      ```

      Old commands will be deprecated in a separate commit.

      Signed-off-by: Stephen Worley <sworley@nvidia.com>
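The table placement listed in this commit message can be written as a small decision function. The enum and function names below are invented for illustration; only the four-row mapping comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bitmask naming the two per-VNI tables. */
enum vni_table {
	VNI_TABLE_IP = 1 << 0,
	VNI_TABLE_MAC = 1 << 1,
};

/* Return the set of tables a type-2 route goes into, following the
 * mapping in the commit message:
 *   Remote MAC-IP -> IP table and MAC table
 *   Remote MAC    -> MAC table
 *   Local MAC-IP  -> IP table
 *   Local MAC     -> MAC table
 */
static unsigned int type2_route_tables(bool is_local, bool has_ip)
{
	if (!has_ip)
		return VNI_TABLE_MAC;
	if (is_local)
		return VNI_TABLE_IP;
	return VNI_TABLE_IP | VNI_TABLE_MAC;
}
```

The asymmetry is the local MAC-IP case: it stays out of the MAC table, which the commit message ties to never allowing multiple paths for a local MAC.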
| * bgpd: rework VNI table for type2/macip routes (Stephen Worley, 2022-10-11, 1 file, -6/+49)
      Use the IP addr of type2/macip routes only for the hash/key of the VNI table and carry the MAC in a path_info_extra attribute.

      There exist situations that can be hit during extended MAC mobility events where two MACs could be pointing to the same IP in our global table. It requires very specific timings. When that happens, BGP would (because we keyed on both MAC and IP) install both into its VNI table as separate entries, but zebra only knows/needs to know about a single IP -> MAC relationship for its VNI table's type2 routes. So it was completely nondeterministic which one zebra would end up with in these timing situations.

      With these changes, we move BGP's VNI table to be keyed the same as zebra's, and now a single IP will have multiple path_infos, each with a path_info_extra that carries the MAC info for that path. BGP will then run best path to deterministically decide which one to send to zebra on the occasions where two possible MACs exist.

      Signed-off-by: Stephen Worley <sworley@nvidia.com>
* | bgpd: evpn L3 RT auto config and wildcard implementation (Stephen Worley, 2022-08-23, 1 file, -1/+7)
      Implement forcing L3 auto derivation via configs even when RTs are set manually. This allows both to coexist in BGP RTs.

      Without the auto config command, auto-derived RTs are removed when you manually configure your own. To allow both, use the auto command on import/export/both.

      Implement '*' wildcard import L3 RTs so we can import a route into any AS. This is necessary so that a user does not have to configure an L3 RT for every AS they want to import evpn routes from.

      Signed-off-by: Stephen Worley <sworley@nvidia.com>
* | bgpd: abstract ecom into struct for l3 route targets (Stephen Worley, 2022-08-23, 1 file, -1/+13)
|/    Abstract the ecommunity into a container struct for L3 route targets so that we can add some additional info via flags to go along with RT configs without modifying the ecommunity struct, which is used elsewhere.

      This functions as a wrapper everywhere it's used, including the import/export lists. The flags will be used in later commits to change behavior when importing/exporting routes.

      Signed-off-by: Stephen Worley <sworley@nvidia.com>
* bgpd: remove unnecessary check for evpn (anlan_cs, 2022-05-11, 1 file, -2/+1)
    In current code, `build_evpn_type2_prefix()` doesn't distinguish ARP according to the `ip` parameter. The `ip` parameter from the caller is always non-NULL.

    To stay consistent and avoid confusion, just remove the unnecessary check.

    Signed-off-by: anlan_cs <vic.lan@pica8.com>
* bgpd: remove dead code for evpn (anlan_cs, 2022-03-26, 1 file, -6/+0)
    `is_vni_param_configured()` is used to check whether RD/RT is configured for a specific evpn vni. There seems to be no need for this mixed check.

    No caller for about 5 years, just remove it.

    Signed-off-by: anlan_cs <vic.lan@pica8.com>
* lib, bgpd: changes for EAD-per-ES fragmentation (Anuradha Karuppiah, 2022-03-18, 1 file, -0/+1)
    The EAD-per-ES route carries ECs for all the ES-EVI RTs. As the number of VNIs increases, all RTs do not fit into a standard BGP UPDATE (4K), so the route needs to be fragmented.

    Each fragment is associated with a separate RD and frag-id -
    1. Local ES-per-EAD -
       ES route table - {ES-frag-ID, ESI, ET=0xffffffff, VTEP-IP}
       global route table - {RD-=ES-frag-RD, ESI, ET=0xffffffff}
    2. Remote ES-per-EAD -
       VNI route table - {ESI, ET=0xffffffff, VTEP-IP}
       global route table - {RD-=ES-frag-RD, ESI, ET=0xffffffff}

    Note: The fragment ID is abandoned in the per-VNI routing table. At this point that is acceptable as we don't expect more than one ES-per-EAD fragment to be imported into the per-VNI routing table. But that may need to be re-worked at a later point.

    CLI changes (sample with 4 VNIs per-fragment for experimental purposes) -
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    root@torm-11:mgmt:~# vtysh -c "show bgp l2vpn evpn es 03:44:38:39:ff:ff:01:00:00:01"
    ESI: 03:44:38:39:ff:ff:01:00:00:01
     Type: LR
     RD: 27.0.0.21:3
     Originator-IP: 27.0.0.21
     Local ES DF preference: 50000
     VNI Count: 10
     Remote VNI Count: 10
     VRF Count: 3
     MACIP EVI Path Count: 33
     MACIP Global Path Count: 198
     Inconsistent VNI VTEP Count: 0
     Inconsistencies: -
     Fragments: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
      27.0.0.21:3 EVIs: 4
      27.0.0.21:13 EVIs: 4
      27.0.0.21:22 EVIs: 2
     VTEPs:
      27.0.0.22 flags: EA df_alg: preference df_pref: 32767
      27.0.0.23 flags: EA df_alg: preference df_pref: 32767

    root@torm-11:mgmt:~# vtysh -c "show bgp l2vpn evpn es-evi vni 1002 detail"
    VNI: 1002 ESI: 03:44:38:39:ff:ff:01:00:00:01
     Type: LR
     ES fragment RD: 27.0.0.21:13 >>>>>>>>>>>>>>>>>>>>>>>>>
     Inconsistencies: -
     VTEPs: 27.0.0.22(EV),27.0.0.23(EV)
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    PS: The number of EVIs per-fragment has been set to 128 and may need further tuning.

    Ticket: #2632967

    Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
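The fragment arithmetic implied by the sample output (10 EVIs at 4 per fragment giving fragments of 4, 4 and 2) can be sketched as below. The function names are hypothetical, and how bgpd actually assigns EVIs to fragments, or treats an ES with no EVIs, is not stated in the commit message; this only shows the ceiling division.

```c
#include <assert.h>
#include <stddef.h>

/* Number of EAD-per-ES fragments needed to carry n_evis EVIs when each
 * fragment holds at most evis_per_frag of them (ceiling division). */
static size_t ead_es_frag_count(size_t n_evis, size_t evis_per_frag)
{
	if (evis_per_frag == 0)
		return 0;
	return (n_evis + evis_per_frag - 1) / evis_per_frag;
}

/* Number of EVIs carried by fragment frag_idx (0-based), assuming the
 * fragments are filled in order; 0 if the index is out of range. */
static size_t ead_es_frag_evis(size_t n_evis, size_t evis_per_frag,
			       size_t frag_idx)
{
	size_t start;
	size_t left;

	if (evis_per_frag == 0)
		return 0;
	start = frag_idx * evis_per_frag;
	if (start >= n_evis)
		return 0;
	left = n_evis - start;
	return left < evis_per_frag ? left : evis_per_frag;
}
```

With the 128 EVIs per fragment mentioned in the PS, an ES with 129 EVIs would need two fragments under this scheme.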
* bgpd: evpn mh changes to advertise EAD routes with user configured export-rt (Anuradha Karuppiah, 2022-03-18, 1 file, -0/+3)
    This is an alternative to EAD route fragmentation and allows the user to limit the route to a single UPDATE (<4K) independent of the number of EVIs.

    Sample config (add one l2-vni RT from each VRF) -
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    !
    router bgp 5556
     !
     address-family l2vpn evpn
      ead-es-route-target export 5556:1001
      ead-es-route-target export 5556:1004
      ead-es-route-target export 5556:1008
     exit-address-family
    !
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    Sample route
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
       Network          Next Hop            Metric LocPrf Weight Path
    *> [1]:[4294967295]:[03:44:38:39:ff:ff:01:00:00:01]:[32]:[27.0.0.21]
                        27.0.0.21                          32768 i
                        ET:8 ESI-label-Rt:AA RT:5556:1001 RT:5556:1004 RT:5556:1008
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    When configured, the ead-es-route-target is used instead of the auto-generated version that includes all associated EVI's RTs.

    Ticket: #2632967

    Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
* bgpd: remove dead code (anlan_cs, 2022-03-16, 1 file, -1/+0)
    `bgp_evpn_import_route_in_vrfs()` is a special (l2vpn) form of `install_uninstall_evpn_route()` with the `AFI_L2VPN` and `SAFI_EVPN` family.

    No caller, just remove it.

    Signed-off-by: anlan_cs <vic.lan@pica8.com>
* bgpd: EVPN route type-5 to type-2 recursive resolution using gateway IP (Ameya Dharkar, 2021-06-08, 1 file, -0/+15)
    When an EVPN prefix route with a gateway IP overlay index is imported into the IP vrf at the ingress PE, the BGP nexthop of this route is set to the gateway IP. For this vrf route to be valid, the following conditions must be met.
    - The gateway IP nexthop of this route should be L3 reachable, i.e., this route should be resolved in RIB.
    - A remote MAC/IP route should be present for the gateway IP address in the EVI (L2VPN table).

    To check for the first condition, the gateway IP is registered with nht (nexthop tracking) to receive the reachability notifications for this IP from zebra RIB. If the gateway IP is reachable, zebra sends the reachability information (i.e., nexthop interface) for the gateway IP. This nexthop interface should be the SVI interface.

    Now, to find the type-2 route corresponding to the gateway IP, we need to fetch the VNI for the above SVI. To do this VNI lookup efficiently, define a hashtable of struct bgpevpn with svi_ifindex as key.

        struct hash *vni_svi_hash;

    An EVI instance is added to vni_svi_hash if its svi_ifindex is nonzero. Using this hash, we obtain the struct bgpevpn corresponding to the gateway IP.

    For the gateway IP overlay index recursive lookup, once we find the correct EVI, we have to look up its route table for a MAC/IP prefix. As we have to iterate the entire route table for every lookup, this lookup is expensive. We can optimize this lookup by adding all the remote IP addresses to a hash table.

    The following hash table is defined for this purpose in struct bgpevpn:

        struct hash *remote_ip_hash;

    When a MAC/IP route is installed in the EVI table, it is also added to remote_ip_hash. It is possible to have multiple MAC/IP routes with the same IP address because of host move scenarios. Thus, for every address addr in remote_ip_hash, we maintain a list of all the MAC/IP routes having addr as their IP address.

    The following structure defines an address in remote_ip_hash.

        struct evpn_remote_ip {
                struct ipaddr addr;
                struct list *macip_path_list;
        };

    A Boolean field is added to struct bgp_nexthop_cache to indicate that the nexthop is an EVPN gateway IP overlay index.

        bool is_evpn_gwip_nexthop;

    A flag BGP_NEXTHOP_EVPN_INCOMPLETE is added to struct bgp_nexthop_cache. This flag is set when the gateway IP is L3 reachable but not yet resolved by a MAC/IP route.

    The following table explains the combination of L3 and L2 reachability w.r.t. the BGP_NEXTHOP_VALID and BGP_NEXTHOP_EVPN_INCOMPLETE flags:

    *                | MACIP resolved | MACIP unresolved
    *----------------|----------------|------------------
    * L3 reachable   | VALID = 1      | VALID = 0
    *                | INCOMPLETE = 0 | INCOMPLETE = 1
    * ---------------|----------------|--------------------
    * L3 unreachable | VALID = 0      | VALID = 0
    *                | INCOMPLETE = 0 | INCOMPLETE = 0

    Procedure that we use to check if the gateway IP is resolvable by a MAC/IP route:
    - Find the EVI/L2VRF that belongs to the nexthop SVI using vni_svi_hash.
    - Check if the gateway IP is present in remote_ip_hash in this EVI.

    When the gateway IP is L3 reachable and it is also resolved by a MAC/IP route, unset the BGP_NEXTHOP_EVPN_INCOMPLETE flag and set the BGP_NEXTHOP_VALID flag.

    Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
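The VALID/INCOMPLETE table in this commit message reduces to a two-input function. The sketch below uses shortened flag names with made-up bit values (the commit message names the real flags BGP_NEXTHOP_VALID and BGP_NEXTHOP_EVPN_INCOMPLETE but does not give their values), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for BGP_NEXTHOP_VALID and BGP_NEXTHOP_EVPN_INCOMPLETE;
 * the bit positions are assumptions for this example. */
#define NH_VALID           (1 << 0)
#define NH_EVPN_INCOMPLETE (1 << 1)

/* Compute the flag combination for a gateway-IP nexthop:
 *   L3 reachable + MAC/IP resolved   -> VALID
 *   L3 reachable + MAC/IP unresolved -> INCOMPLETE
 *   L3 unreachable (either case)     -> neither flag
 */
static uint32_t gwip_nexthop_flags(bool l3_reachable, bool macip_resolved)
{
	if (!l3_reachable)
		return 0;
	return macip_resolved ? NH_VALID : NH_EVPN_INCOMPLETE;
}
```

Note that the two flags are never set together in the table: INCOMPLETE marks the intermediate state where only the L3 half of the check has passed.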
* bgpd, zebra: Add svi_interface to zebra VNI and bgp EVPN structures (Ameya Dharkar, 2021-06-08, 1 file, -1/+3)
    SVI ifindex for L2VNI is required in BGP to perform EVPN type-5 to type-2 recursive resolution using gateway IP overlay index.

    Program this svi_ifindex in struct zebra_vni_t as well as in struct bgpevpn.

    Changes include:
    1. Add svi_if field to struct zebra_evpn_t
    2. Add svi_ifindex field to struct bgpevpn
    3. When SVI (bridge or VLAN) is bound to a VxLAN interface, store it in the zebra_evpn_t structure.
    4. Add this SVI ifindex to ZEBRA_VNI_ADD
    5. Store svi_ifindex in struct bgpevpn

    Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
* bgpd: handle local ES del or transition to LACP bypass (Anuradha Karuppiah, 2021-03-26, 1 file, -0/+1)
    1. When a local ES is deleted or the ES-bond goes into bypass we treat imported MAC-IP routes with that ES destination as remote routes instead of sync routes. This requires a re-evaluation of the routes as "non-local-dest" and an update to zebra.
    2. When an ES is attached to an access port or the ES-bond transitions from bypass to LACP-up we treat imported MAC-IP routes with that ES destination as sync routes. This requires a re-evaluation of the routes as "local-dest" and an update to zebra.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* bgpd: re-eval use-l3nhg when a remote ES is [de]activated in a VRF (Anuradha Karuppiah, 2021-03-26, 1 file, -0/+3)
    There are two changes in this commit -
    1. Maintain a list of global MAC-IP routes per-ES. This list is maintained for quick processing on the following events -
       a. When the first VTEP/PE becomes active in the ES-VRF, the L3 NHG is activated and the route can be sent to zebra.
       b. When there are no active PEs in the ES-VRF the L3 NHG is de-activated and -
          - If the ES is present in the VRF - The route is not installed in zebra as there are no active PEs for the ES-VRF
          - If the ES is not present in the VRF - The route is installed with a flat multi-path list i.e. without L3NHG. This is to handle the case where there are no locally attached L2VNIs on the ES (for that tenant VRF).
    2. Reinstall VRF route when an ES is installed or uninstalled in a tenant VRF (the global MAC-IP list in #1 is used for this purpose also). If an ES is present in the VRF we use L3NHG to enable fast-failover of routed traffic.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* bgpd: on ES down re-advertise the MAC-IP entry without the L3 ECOM (Anuradha Karuppiah, 2021-03-26, 1 file, -0/+5)
    When an ES goes down the MAC-IP route must be updated to remove it from the tenant VRF routing table. This is because the fast-failover (via EAD-per-ES withdraw) procedures described in RFC 7432 are only applicable to L2 forwarding/MAC-ECMP. For L3/routed traffic (in a sym-IRB setup) failover, individual paths need to be withdrawn.

    To handle this difference in L2/L3 requirements BGP updates the MAC-IP route to include the L3 ECOM if local destination ES is oper-up and to exclude the L3 ECOM if local ES is oper-down.

    Ticket: CM-30935

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* *: require semicolon after DEFINE_QOBJ & co. (David Lamparter, 2021-03-17, 1 file, -2/+2)
    Again, see previous commits.

    Signed-off-by: David Lamparter <equinox@diac24.net>
* bgpd: Remove the double declaration of bgp_global_evpn_node_lookup (Donald Sharp, 2021-02-07, 1 file, -8/+0)
    Signed-off-by: Donald Sharp <sharpd@nvidia.com>
* *: Replace s_addr check against 0 with INADDR_ANY (Donatas Abraitis, 2020-12-14, 1 file, -1/+1)
    Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
* bgpd: Handle ES VTEP add/del to a host route (Anuradha Karuppiah, 2020-11-24, 1 file, -0/+9)
    1. MAC-IP routes in the VPN routing table are linked to the destination ES for efficient handling for remote ES link flaps.
    2. Only MAC-IP paths whose nexthops are active (added via EAD-ES) are imported into the VRF routing table.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* bgpd: L3NHG infrastructure for host routes in EVPN (Anuradha Karuppiah, 2020-11-24, 1 file, -0/+7)
    ES-VRF entries are maintained for the purpose of L3-NHG creation -
    1. Each ES-EVI entry is associated with a tenant VRF. This association triggers the creation of an ES-VRF entry.
    2. Type-2/MAC-IP routes are imported into a tenant VRF and programmed as a /32 or host route entry in the dataplane. If the destination of the host route is a remote-ES the route is programmed with the corresponding (keyed in by {vrf,ES-id}) L3-NHG.
    3. The reason for this indirection (route->L3-NHG, L3-NHG->list-of-VTEPs) is to avoid route updates to the dplane when a remote-ES link flaps i.e. instead of updating all the dependent routes the NHG's contents are updated. This reduces the number of dataplane updates (fewer nhg updates vs. route updates) allowing for a faster failover.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
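The indirection in point 3 (route -> L3-NHG -> list of VTEPs) can be illustrated with a toy model. All names and the fixed-size array below are assumptions made for the example; the point is only that a VTEP flap modifies one shared group object while the routes that reference it stay untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define L3NHG_MAX_VTEPS 8 /* arbitrary limit for the sketch */

/* Hypothetical nexthop group shared by all host routes behind one ES. */
struct l3nhg {
	uint32_t vteps[L3NHG_MAX_VTEPS];
	size_t n_vteps;
	unsigned int updates; /* counts dataplane-style updates to the group */
};

/* Hypothetical host route: it refers to the group, not to VTEPs directly. */
struct host_route {
	uint32_t prefix;
	const struct l3nhg *nhg;
};

/* Add a VTEP to the group; returns false if it is full or already present. */
static bool l3nhg_add_vtep(struct l3nhg *g, uint32_t vtep)
{
	for (size_t i = 0; i < g->n_vteps; i++)
		if (g->vteps[i] == vtep)
			return false;
	if (g->n_vteps == L3NHG_MAX_VTEPS)
		return false;
	g->vteps[g->n_vteps++] = vtep;
	g->updates++;
	return true;
}

/* Remove a VTEP from the group; returns false if it was not a member. */
static bool l3nhg_del_vtep(struct l3nhg *g, uint32_t vtep)
{
	for (size_t i = 0; i < g->n_vteps; i++) {
		if (g->vteps[i] == vtep) {
			g->vteps[i] = g->vteps[g->n_vteps - 1];
			g->n_vteps--;
			g->updates++;
			return true;
		}
	}
	return false;
}
```

However many host routes point at the group, a single flap costs one group update in this model, which is the saving the commit message describes.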
* bgpd: support for DF election in EVPN-MH (Anuradha Karuppiah, 2020-10-26, 1 file, -0/+11)
    DF (Designated Forwarder) election is used for picking a single BUM-traffic forwarder per ES. RFC 7432 specifies a mechanism called service carving for DF election. However that mechanism has many disadvantages -
    1. LBs poorly.
    2. Doesn't allow for a controlled failover needed in upgrade scenarios.
    3. Not easy to hw accelerate.

    To fix the poor performance of service carving, alternate DF mechanisms have been proposed via the following drafts -
    draft-ietf-bess-evpn-df-election-framework
    draft-ietf-bess-evpn-pref-df

    This commit adds support for the pref-df election mechanism which is used as the default. Other mechanisms including service-carving may be added later.

    In this mechanism one switch on an ES is elected as DF based on the preference value; higher preference wins with IP address acting as the tie-breaker (lower-IP wins if pref value is the same).

    Sample output
    =============
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    torm-11# sh bgp l2vpn evpn es 03:00:00:00:00:01:11:00:00:01
    ESI: 03:00:00:00:00:01:11:00:00:01
     Type: LR
     RD: 27.0.0.15:6
     Originator-IP: 27.0.0.15
     Local ES DF preference: 100
     VNI Count: 10
     Remote VNI Count: 10
     Inconsistent VNI VTEP Count: 0
     Inconsistencies: -
     VTEPs:
      27.0.0.16 flags: EA df_alg: preference df_pref: 32767
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
    torm-11# sh bgp l2vpn evpn route esi 03:00:00:00:00:01:11:00:00:01
    *> [4]:[03:00:00:00:00:01:11:00:00:01]:[32]:[27.0.0.15]
                        27.0.0.15                          32768 i
                        ET:8 ES-Import-Rt:00:00:00:00:01:11 DF: (alg: 2, pref: 100)
    >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
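The election rule stated above (higher preference wins, lower IP breaks ties) can be sketched as a simple scan over the candidates. The types and names are invented for the example, addresses are compared as host-order integers, and other inputs the drafts may define are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from dotted-quad components. */
#define IPV4(a, b, c, d)                                                       \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) \
	 | (uint32_t)(d))

/* Hypothetical DF candidate: one VTEP attached to the ES. */
struct df_candidate {
	uint32_t ip;   /* host byte order, so "lower IP" is a plain compare */
	uint16_t pref; /* DF preference value */
};

/* Pick the DF among n candidates: highest preference wins; on equal
 * preference the numerically lower IP wins. Returns NULL if n == 0. */
static const struct df_candidate *df_elect(const struct df_candidate *c,
					   size_t n)
{
	const struct df_candidate *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!best || c[i].pref > best->pref
		    || (c[i].pref == best->pref && c[i].ip < best->ip))
			best = &c[i];
	}
	return best;
}
```

Applied to the sample output, the local switch (27.0.0.15, preference 100) would lose to the peer 27.0.0.16 with preference 32767 under this rule.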
* bgpd: More bgp_node -> bgp_dest cleanup (Donald Sharp, 2020-10-17, 1 file, -7/+9)
    Some more of the bgp_node usage snuck in from big commits in the past month or so from feature work. Do some work to put it back to bgp_dest for incoming future work.

    Signed-off-by: Donald Sharp <sharpd@nvidia.com>
* bgpd, lib: move EVPN route type def to lib and use it in the prefix macros (Anuradha Karuppiah, 2020-08-05, 1 file, -9/+0)
    Use route names instead of route type number in the EVPN prefix macros.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* bgpd: Type-2/MAC-IP SYNC route handling (Anuradha Karuppiah, 2020-08-05, 1 file, -1/+11)
    SYNC routes are paths received from a local-ES peer. These routes result in the installation of local dataplane entries i.e. with access port as destination (vs. the remote-VTEP destination that results in the packet being sent via the VxLAN overlay).

    If a SYNC path is selected as the best path it is always turned around into a local path, which immediately lowers the status of the SYNC path to non-best. However we need to keep track of the highest MM seq-number and peer activity to continue advertising the local path. In order to do that we need information from the "second-best" SYNC path to be bubbled up to the local best path.

    This "SYNC" info is then consolidated and sent to zebra which is responsible for the MM handling and local path management.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* lib, bgpd: Remove unused variable from structure (Donald Sharp, 2020-08-05, 1 file, -3/+0)
    The `struct evpn_ead_addr` structure had a prefix length associated with it. This value was only ever set, never used. Remove this from our system.

    The other nice thing about this change is that it puts the sizeof struct route_node back to 192 bytes.

    Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* bgpd: support for Ethernet Segments and Type-1/EAD routes (Anuradha Karuppiah, 2020-08-05, 1 file, -41/+84)
    This is the base patch that brings in support for Type-1 routes. It includes support for -
    - Ethernet Segment (ES) management
    - EAD route handling
    - MAC-IP (Type-2) routes with a non-zero ESI i.e. Aliasing for active-active multihoming
    - Initial infra for consistency checking. Consistency checking is a fundamental feature for active-active solutions like MLAG. We will try to leverage the info in the EAD-ES/EAD-EVI routes to detect inconsistencies in access config across VTEPs attached to the same Ethernet Segment.

    Functionality Overview -
    ========================
    1. Ethernet segments are created in zebra and associated with access VLANs. zebra sends that info as ES and ES-EVI objects to BGP.
    2. BGP advertises EAD-ES and EAD-EVI routes for the locally attached ethernet segments.
    3. Similarly BGP processes EAD-ES and EAD-EVI routes from peers and translates them into ES-VTEP objects which are then sent to zebra as remote ESs.
    4. Each ES in zebra is associated with a list of active VTEPs which is then translated into a L2-NHG (nexthop group). This is the ES "Alias" entry.
    5. MAC-IP routes with a non-zero ESI use the alias entry created in (4.) to forward traffic i.e. a MAC-ECMP is done to these remote-ES destinations.

    EAD route management (route table and key) -
    ============================================
    1. Local EAD-ES routes
       a. route-table: per-ES route-table
          key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP}
       b. route-table: per-VNI route-table
          Not added
       c. route-table: global route-table
          key: {RD=ES-RD, ESI, ET=0xffffffff}
    2. Remote EAD-ES routes
       a. route-table: per-ES route-table
          Not added
       b. route-table: per-VNI route-table
          key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP}
       c. route-table: global route-table
          key: {RD=ES-RD, ESI, ET=0xffffffff}
    3. Local EAD-EVI routes
       a. route-table: per-ES route-table
          Not added
       b. route-table: per-VNI route-table
          key: {RD=0, ESI, ET=0, VTEP-IP}
       c. route-table: global route-table
          key: {RD=L2-VNI-RD, ESI, ET=0}
    4. Remote EAD-EVI routes
       a. route-table: per-ES route-table
          Not added
       b. route-table: per-VNI route-table
          key: {RD=0, ESI, ET=0, VTEP-IP}
       c. route-table: global route-table
          key: {RD=L2-VNI-RD, ESI, ET=0}

    Please refer to bgp_evpn_mh.h for info on how the data-structures are organized.

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* bgpd: pull the multihoming code out to a separate file (Anuradha Karuppiah, 2020-08-05, 1 file, -3/+3)
    Re-org only; no other code changes. This is being done to make maintenance of MH functionality (which will have more code added to it) easy.

    The code moved here was originally committed via -
    'commit 50f74cf13105 ("*: support for evpn type-4 route")'

    Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* bgpd: EVPN RT-2 advertised with 2 labels for prefix-routes-only config (Ameya Dharkar, 2020-05-09, 1 file, -2/+6)
    L3VNI is configured with the "prefix-routes-only" flag. Even in this case, intermittently, we observed that local EVPN MACIP routes are installed and advertised with 2 labels and 2 export RTs.

    This is a sequencing issue. Consider the following case where L2VNI 200 and L3VNI 1000 are configured for tenant vrf vrf-blue. The bug is observed for the following sequence of events:
    1. vrf-blue BGP instance is created.
    2. L2VNI is created in bgp for vni 200. It is linked to the tenant vrf vrf-blue in function bgpevpn_link_to_l3vni. The following code sets the "VNI_FLAG_USE_TWO_LABELS" flag for vni 200 as L3VNI is not yet attached to the vrf-blue BGP instance.

        /* check if we are advertising two labels for this vpn */
        if (!CHECK_FLAG(bgp_vrf->vrf_flags, BGP_VRF_L3VNI_PREFIX_ROUTES_ONLY))
                SET_FLAG(vpn->flags, VNI_FLAG_USE_TWO_LABELS);

    3. Now L3VNI is attached to the vrf-blue BGP instance. In this case, we set the BGP_VRF_L3VNI_PREFIX_ROUTES_ONLY flag for vrf-blue but we do not clear the VNI_FLAG_USE_TWO_LABELS flag set on the corresponding L2VNIs.

    This fix resolves the following 2 issues observed above.
    1. When L2VNI is created in BGP, flag VNI_FLAG_USE_TWO_LABELS should not be set for this VNI if the BGP vrf is not attached to any L3VNI.
    2. When L3VNI is attached to the BGP vrf, set the "VNI_FLAG_USE_TWO_LABELS" flag if "prefix-routes-only" is not set for the vrf.

    UT cases:
    1. Flap "prefix-routes-only" config for a vrf.
    2. Test the following triggers for vrfs with and without "prefix-routes-only"
       - Flap L2VNI from kernel.
       - Flap L3VNI from kernel.

    Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
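The two rules in the fix can be folded into one re-evaluation helper that is run on both events (L2VNI creation and L3VNI attach). This is a sketch of the stated intent, not the actual patch: the struct layouts, the helper name, and the flag bit values are assumptions, with only the two flag names taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag names from the commit message; the bit values are assumptions. */
#define BGP_VRF_L3VNI_PREFIX_ROUTES_ONLY (1 << 0)
#define VNI_FLAG_USE_TWO_LABELS          (1 << 0)

/* Hypothetical, trimmed-down views of the tenant VRF and the L2VNI. */
struct vrf_sketch {
	uint32_t l3vni;     /* 0 means no L3VNI attached yet */
	uint32_t vrf_flags;
};

struct vni_sketch {
	uint32_t flags;
};

/* Re-evaluate the two-label flag for an L2VNI against its tenant VRF:
 * two labels are used only when an L3VNI is attached and the VRF is not
 * configured as prefix-routes-only; otherwise the flag is cleared. */
static void vni_eval_two_labels(struct vni_sketch *vpn,
				const struct vrf_sketch *vrf)
{
	bool two_labels =
		vrf->l3vni != 0
		&& !(vrf->vrf_flags & BGP_VRF_L3VNI_PREFIX_ROUTES_ONLY);

	if (two_labels)
		vpn->flags |= VNI_FLAG_USE_TWO_LABELS;
	else
		vpn->flags &= ~(uint32_t)VNI_FLAG_USE_TWO_LABELS;
}
```

Calling the helper again after the L3VNI is attached covers the sequencing problem described above, because a stale flag from the earlier event is cleared rather than left in place.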
* bgpd: More `const struct prefix` work (Donald Sharp, 2020-03-22, 1 file, -5/+6)
    Modify more code to use `const struct prefix` throughout bgp. This is all prep work for adding an accessor function for bgp_node to get the prefix and reduce all the places that code needs to be touched when we get that work done.

    Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* bgpd: evpn pip handle svi ip route (Chirag Shah, 2019-11-22, 1 file, -0/+11)
    By default announce self Type-2 routes with system IP as nexthop and system MAC as RMAC.

    An API to check whether a type-2 route is a self route by checking its ipv4/ipv6 address against the connected interfaces list.

    An API to extract RMAC and nexthop for type-2 routes based on whether the advertise-svi-ip knob is enabled.

    When advertise-pip is enabled/disabled, trigger a type-2 route update so that self type-2 routes use anycast or individual (rmac, nexthop) addresses.

    Ticket:CM-26190
    Reviewed By:
    Testing Done:

    Enable 'advertise-svi-ip' knob in bgp default instance. The vrf instance svi ip is advertised with nexthop as default instance router-id and RMAC as system MAC.

    Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
* bgpd: evpn pip data struct and cli (Chirag Shah, 2019-11-22, 1 file, -0/+10)
    Evpn Primary IP advertisement feature uses individual system IP and system MAC for prefix (type-5) and self type-2 routes.

    The PIP knob is enabled by default for bgp vrf instance.

    Configuration CLI for enable/disable PIP feature knob. User can configure PIP system IP and MAC to retain as permanent values.

    For the PIP IP, the default behavior is to accept bgp default instance's router-id. When the default instance router-id change, reflect PIP IP assignment.

    Reflect type-5 to use system-IP and system MAC as nexthop and RMAC values.

    Ticket:CM-26190

    Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
* bgpd: Ensure correct checks for EVPN route import (vivek, 2019-08-19, 1 file, -0/+5)
    In a situation where a VRF has configured route targets for importing EVPN routes, this configuration may exist prior to the VRF being ready to have EVPN routes installed into it - e.g., still missing the L3VNI configuration or associated interface information. Ensure that this is taken into account during EVPN route import and unimport.

    Without this fix, EVPN routes would end up being prematurely imported into the VRF routing table and consequently installed as inactive (because the nexthop information would be incorrect when BGP informs zebra).

    Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
* Merge pull request #4025 from AnuradhaKaruppiah/pim-evpnJafar Al-Gharaibeh2019-04-221-2/+6
|\ | | | | pim-evpn: Forwarding overlay BUM traffic via multicast VxLAN tunnels in the underlay
| * bgpd: maintain flood mcast group per-l2-vniAnuradha Karuppiah2019-04-201-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | If PIM-SM is used for BUM flooding, the multicast group address can be configured per-vxlan-device. BGP receives this config from zebra via the L2 VNI add/update. Sample output - root@TORS1:~# vtysh -c "show bgp l2vpn evpn vni 1000" |grep Mcast Mcast group: 239.1.1.100 root@TORS1:~# Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* | bgpd: lock the tenant-vrf associated with the l2-vniAnuradha Karuppiah2019-04-201-1/+2
|/ | | | | | | | | | | | | | | | | | | The l2vni (bgpevpn instance) was maintaining a back pointer to the tenant vrf without locking it. This would result in bgp_terminate crashing as the tenant-vrf is released before the underlay-vrf (vpn->bgp_vrf->l2vnis is NULL). Call stack - BGP: [bt 3] /lib/libfrr.so.0(listnode_delete+0x11) [0x7f041c967f51] BGP: [bt 4] /usr/lib/frr/bgpd(bgp_evpn_free+0x26) [0x55e3428eea46] BGP: [bt 5] /lib/libfrr.so.0(hash_iterate+0x4a) [0x7f041c95f00a] BGP: [bt 6] /usr/lib/frr/bgpd(bgp_evpn_cleanup+0x22) [0x55e3428f0a72] BGP: [bt 7] /usr/lib/frr/bgpd(bgp_free+0x180) [0x55e342955f50] PIM: vxlan SG (*,239.1.1.111) term mroute-up del BGP: [bt 8] /usr/lib/frr/bgpd(bgp_delete+0x43a) [0x55e342959d7a] BGP: [bt 9] /usr/lib/frr/bgpd(sigint+0xee) [0x55e3428d6a5e] Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Reviewed-by: Vivek Venkataraman <vivek@cumulusnetworks.com> Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
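The fix amounts to taking a reference whenever the back pointer is set. A minimal C sketch, assuming a plain integer lock count; `tenant_vrf`, `l2vni`, and the three helpers are hypothetical names, not the bgpd lock/unlock API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tenant VRF with a simple reference (lock) count. */
struct tenant_vrf {
	int lock_count;
};

/* Hypothetical L2 VNI holding a back pointer to its tenant VRF. */
struct l2vni {
	struct tenant_vrf *bgp_vrf;
};

/* Setting the back pointer also takes a lock on the tenant VRF. */
static void l2vni_link_vrf(struct l2vni *vpn, struct tenant_vrf *vrf)
{
	vrf->lock_count++;
	vpn->bgp_vrf = vrf;
}

/* Clearing the back pointer drops the lock taken in l2vni_link_vrf(). */
static void l2vni_unlink_vrf(struct l2vni *vpn)
{
	if (vpn->bgp_vrf == NULL)
		return;
	assert(vpn->bgp_vrf->lock_count > 0);
	vpn->bgp_vrf->lock_count--;
	vpn->bgp_vrf = NULL;
}

/* The tenant VRF may only be released once nobody references it. */
static int tenant_vrf_can_free(const struct tenant_vrf *vrf)
{
	return vrf->lock_count == 0;
}
```

With this pattern the tenant VRF cannot be released while any L2 VNI still points at it, which is the ordering problem the call stack above shows.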
* bgp: fix misc evpn prefix match problems caused by using incorrect prefixlenAnuradha Karuppiah2019-03-131-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | The evpn route prefix len was being hardcoded to 224 bits while the length of a mac-ip addr is actually 288. Because of this many problems were seen in the evpn-tests. The sample below is from a test that does a vm-move to verify extended-evpn-mac-mobility - IP1-M1 => IP2->M1. You can see two local neighs but only one was inserted into the per-vni route table. root@TORC11:~# net show evpn arp vni 1001 |grep "2001:fee1:0:1::10\|2001:fee1:0:1::11" 2001:fee1:0:1::10 local active 00:54:6f:7c:74:64 2001:fee1:0:1::11 local active 00:54:6f:7c:74:64 root@TORC11:~# net show bgp l2vpn evpn route vni 1001 |grep "2001:fee1:0:1::10\|2001:fee1:0:1::11" *> [2]:[0]:[48]:[00:54:6f:7c:74:64]:[128]:[2001:fee1:0:1::11] root@TORC11:~# Similarly other traffic loss problems were seen because of one prefix updating another prefix's route. I think the 224-bits came from the packet format definition of type-2 routes. However the way FRR maintains the key is very different than the format in the packet so it seems best to just sizeof the addr. Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
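To see why a hardcoded 224-bit length lets one prefix overwrite another, consider the C sketch below. It assumes a hypothetical opaque 36-byte (288-bit) key, `macip_key`, and a hypothetical comparison helper, `key_prefix_equal`; it does not reproduce FRR's actual in-memory address layout or its prefix-match code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory key: 36 bytes, i.e. 288 bits. */
struct macip_key {
	uint8_t bytes[36];
};

/* Derive the length from the type instead of hardcoding a number. */
#define MACIP_KEY_BITS (sizeof(struct macip_key) * 8)

/* Compare only the first nbits of two keys, as a prefix match would. */
static bool key_prefix_equal(const struct macip_key *a,
			     const struct macip_key *b, size_t nbits)
{
	size_t full = nbits / 8; /* whole bytes covered by the prefix */
	size_t rem = nbits % 8;  /* leftover bits in the next byte */
	uint8_t mask;

	assert(nbits <= MACIP_KEY_BITS);
	if (memcmp(a->bytes, b->bytes, full) != 0)
		return false;
	if (rem == 0)
		return true;
	mask = (uint8_t)(0xFF << (8 - rem));
	return (a->bytes[full] & mask) == (b->bytes[full] & mask);
}
```

Two keys that differ only in their last byte, like the ::10 and ::11 addresses in the sample, compare equal at 224 bits and distinct at `MACIP_KEY_BITS`, so only the full-length comparison keeps them as separate routes.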
* bgpd: advertise svi ip as macip config cmdChirag Shah2019-02-071-0/+7
| | | | | | Ticket:CM-23782 Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
* bgpd: The default IP route not advertised with configured RDKishore Aramalla2018-11-291-0/+3
| | | | | | | | | | | | | When "default-originate ipv4" is configured, a type-5 route is installed in the local node and advertised to the peer with auto-rd. When this was followed by configuring an RD in the IP VRF, type-5 routes were generated only for the non-default routes. Fix this by withdrawing the default route with auto-rd and advertising the route with the configured RD. Signed-off-by: Kishore Aramalla karamalla@vmware.com
* bgpd: dup addr detect data struct for cfgChirag Shah2018-11-181-0/+18
| | | | | | | | | | | | | | Enable/disable duplicate address detection. There are 3 actions: warning-only: the default action, which only generates an frr warning (syslog) to the user for any duplicate detection; freeze: permanently freezes the address, manual intervention required; freeze with time: the address recovers once the time has expired (auto-recovery). Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
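The three actions map naturally onto an enum plus a small config struct. The C sketch below is an assumption-laden illustration: `dad_action`, `dad_config`, `freeze_time_sec`, and `dad_auto_recovered` are hypothetical names, and the timing rule is the simplest reading of "recover once the time has expired".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of the three actions listed in the message. */
enum dad_action {
	DAD_WARN_ONLY = 0,    /* default: log a warning only */
	DAD_FREEZE_PERMANENT, /* freeze until manual intervention */
	DAD_FREEZE_TIMED,     /* freeze, then auto-recover after a timeout */
};

/* Hypothetical configuration block for duplicate address detection. */
struct dad_config {
	bool enabled;
	enum dad_action action;
	uint32_t freeze_time_sec; /* only meaningful for DAD_FREEZE_TIMED */
};

/*
 * Returns true if an address frozen at frozen_at_sec may recover on its
 * own at now_sec. Only the timed freeze ever auto-recovers.
 */
static bool dad_auto_recovered(const struct dad_config *cfg,
			       uint64_t frozen_at_sec, uint64_t now_sec)
{
	if (!cfg->enabled || cfg->action != DAD_FREEZE_TIMED)
		return false;
	if (now_sec < frozen_at_sec)
		return false;
	return now_sec - frozen_at_sec >= cfg->freeze_time_sec;
}
```

Keeping `DAD_WARN_ONLY` as value 0 means a zero-initialized config falls back to the default action named in the message.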
* bgpd: check existing l3vni for any l2vni creationChirag Shah2018-08-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | Scan all bgp vrf instances and respective L3VNI against the VNI which is being configured. Ticket:CM-21859 Testing Done: Configure l3vni, try to configure same vni as l2vni under router bgp, address-family l2vpn evpn. The configuration is rejected. show evpn vni VNI Type VxLAN IF # MACs # ARPs # Remote VTEPs Tenant VRF 4001 L3 vx-4001 0 0 n/a vrf1 TOR(config)# router bgp 5546 TOR(config-router)# address-family l2vpn evpn TOR(config-router-af)# vni 4001 % Failed to create VNI Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
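The scan described here can be sketched in C as a linear search over the VRF instances. `vrf_entry`, `find_vrf_using_l3vni`, and `l2vni_create_allowed` are hypothetical names; the real code walks bgpd's own instance list rather than an array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified VRF instance record. */
struct vrf_entry {
	const char *name;
	uint32_t l3vni; /* 0: this VRF has no L3VNI */
};

/* Return the VRF already using vni as its L3VNI, or NULL if none does. */
static const struct vrf_entry *find_vrf_using_l3vni(const struct vrf_entry *vrfs,
						    size_t count, uint32_t vni)
{
	for (size_t i = 0; i < count; i++) {
		if (vrfs[i].l3vni != 0 && vrfs[i].l3vni == vni)
			return &vrfs[i];
	}
	return NULL;
}

/* An L2VNI may be created only if no VRF uses the same VNI as its L3VNI. */
static bool l2vni_create_allowed(const struct vrf_entry *vrfs, size_t count,
				 uint32_t vni)
{
	return find_vrf_using_l3vni(vrfs, count, vni) == NULL;
}
```

With a table holding vrf1 and L3VNI 4001, as in the test output above, a request to create L2VNI 4001 is refused while other VNIs are accepted.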
* Merge pull request #2665 from chiragshah6/evpn_devRuss White2018-07-241-0/+10
|\ | | | | bgpd: support evpn nd ext community