author Donald Sharp <sharpd@cumulusnetworks.com> 2015-05-20 02:40:31 +0200
committer Donald Sharp <sharpd@cumulusnetworks.com> 2015-05-20 02:40:31 +0200
commit 5e242b0dd303f20b0b1986bcf2f62914757b2054
tree 31d8160325457e30d170b58ac5ad76a973a32311 /bgpd/bgp_mpath.c
parent Add set ipv6 next-hop peer-address command.
cluster-id length equality for multipath
A fat tree topology running IBGP gets into two issues with anycast address
routing. Consider the following topology:

              R9      R10
                x    x
        R3   R4        R7   R8
           x              x
        R1   R2        R5   R6
        |    |         |    |
       10/8 10/8     10/8   S

Let's remind ourselves of the BGP decision process steps:

1. Highest Local Preference
2. Shortest AS Path Length
3. Lowest Origin Type
4. Lowest MED (Multi-Exit Discriminator)
5. Prefer External to Internal
6. Closest Egress (Lowest IGP Distance)
7. Tie Breaking (Lowest-Router-ID)
8. Tie Breaking (Lowest-cluster-list length)
9. Tie Breaking (Lowest-neighbor-address)

Without any policies, steps 1-6 will almost always evaluate identically for
all paths received on any router in the above topology. Let's assume that the
router-ids satisfy the following inequality: R1 < R2 < R5 < R6. Owing to the
7th step above, all routers will now choose R1's path as the best. This is
undesirable. As an example, traffic from S to 10/8 will follow the path
S -> R6 -> R7 -> R9 -> R4 -> R2 -> 10/8 instead of S -> R6 -> R7 -> R5 -> 10/8.

Furthermore, once R7 (& R8) chooses R1's path as the best, it would withdraw
its path learned through (R5, R6) from (R9, R10). This leads to inefficient
load balancing - e.g. R9 can't do ECMP across all available egresses -
(R1, R2, R5).

The patch addresses these issues by noting that the cluster list is always
carried along with the routes and its length is a good indicator of IBGP hops.
It thus makes sense to compare that as an extension to the metric after
step 6. That automatically ensures correct multipath computation.

Unfortunately, a partial deployment of this in a generic topology (note:
fat-tree/clos topologies work fine) may lead to potential loops. It needs to
be looked into.

Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
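
The multipath equality check described in the commit message can be sketched
roughly as follows. This is a minimal illustration under stated assumptions,
not the code from bgpd: struct path_sketch and paths_multipath_equal() are
hypothetical names, and both paths are assumed to have already tied on
steps 1-5 of the decision process.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a path; not bgpd's own path structure. */
struct path_sketch
{
  uint32_t igp_metric;       /* step 6: IGP distance to the egress */
  uint32_t router_id;        /* step 7: router-id, ignored for multipath */
  unsigned int cluster_len;  /* number of entries in the cluster list */
};

/* Return true when two IBGP paths that already tie on steps 1-5 may share
   an ECMP group.  With check_cluster_len set, the cluster-list length is
   treated as an extension of the metric and must match as well.  */
static bool
paths_multipath_equal (const struct path_sketch *a,
                       const struct path_sketch *b,
                       bool check_cluster_len)
{
  if (a->igp_metric != b->igp_metric)
    return false;
  if (check_cluster_len && a->cluster_len != b->cluster_len)
    return false;
  return true;
}
```

With the check enabled, a path with an empty cluster list and a path that
crossed two route reflectors no longer count as equal, even when their IGP
metrics match.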
Diffstat (limited to 'bgpd/bgp_mpath.c')
-rw-r--r-- bgpd/bgp_mpath.c 4
1 files changed, 3 insertions, 1 deletions
diff --git a/bgpd/bgp_mpath.c b/bgpd/bgp_mpath.c
index 7999d16b6..5590b0513 100644
--- a/bgpd/bgp_mpath.c
+++ b/bgpd/bgp_mpath.c
@@ -46,7 +46,7 @@
  */
 int
 bgp_maximum_paths_set (struct bgp *bgp, afi_t afi, safi_t safi,
-                       int peertype, u_int16_t maxpaths)
+                       int peertype, u_int16_t maxpaths, u_int16_t options)
 {
   if (!bgp || (afi >= AFI_MAX) || (safi >= SAFI_MAX))
     return -1;
@@ -55,6 +55,7 @@ bgp_maximum_paths_set (struct bgp *bgp, afi_t afi, safi_t safi,
     {
     case BGP_PEER_IBGP:
       bgp->maxpaths[afi][safi].maxpaths_ibgp = maxpaths;
+      bgp->maxpaths[afi][safi].ibgp_flags |= options;
       break;
     case BGP_PEER_EBGP:
       bgp->maxpaths[afi][safi].maxpaths_ebgp = maxpaths;
@@ -82,6 +83,7 @@ bgp_maximum_paths_unset (struct bgp *bgp, afi_t afi, safi_t safi,
     {
     case BGP_PEER_IBGP:
       bgp->maxpaths[afi][safi].maxpaths_ibgp = BGP_DEFAULT_MAXPATHS;
+      bgp->maxpaths[afi][safi].ibgp_flags = 0;
       break;
     case BGP_PEER_EBGP:
       bgp->maxpaths[afi][safi].maxpaths_ebgp = BGP_DEFAULT_MAXPATHS;
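
The flag handling in the two hunks above can be illustrated with a small
standalone sketch. The names below (struct maxpaths_sketch,
SKETCH_FLAG_SAME_CLUSTERLEN, SKETCH_DEFAULT_MAXPATHS and its value of 1) are
assumptions made for this example, not identifiers from bgpd. The point it
shows is that the set path ORs options into ibgp_flags, so a later call with
options == 0 leaves earlier bits in place, while the unset path clears them.

```c
#include <stdint.h>

#define SKETCH_DEFAULT_MAXPATHS 1            /* assumed default, sketch only */
#define SKETCH_FLAG_SAME_CLUSTERLEN (1 << 0) /* hypothetical option bit */

/* Hypothetical stand-in for one bgp->maxpaths[afi][safi] entry. */
struct maxpaths_sketch
{
  uint16_t maxpaths_ibgp;
  uint16_t ibgp_flags;
};

/* Mirrors the BGP_PEER_IBGP case of the set function in the diff. */
static void
sketch_ibgp_set (struct maxpaths_sketch *mp, uint16_t maxpaths,
                 uint16_t options)
{
  mp->maxpaths_ibgp = maxpaths;
  mp->ibgp_flags |= options;   /* OR-in: bits from earlier calls persist */
}

/* Mirrors the BGP_PEER_IBGP case of the unset function in the diff. */
static void
sketch_ibgp_unset (struct maxpaths_sketch *mp)
{
  mp->maxpaths_ibgp = SKETCH_DEFAULT_MAXPATHS;
  mp->ibgp_flags = 0;          /* unset resets every option bit */
}
```

A caller that wants to drop the option while keeping a non-default path
count would, under this reading, need to unset first and then set again
without the option.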