author | Donald Sharp <sharpd@cumulusnetworks.com> | 2015-05-20 02:40:31 +0200 |
---|---|---|
committer | Donald Sharp <sharpd@cumulusnetworks.com> | 2015-05-20 02:40:31 +0200 |
commit | 5e242b0dd303f20b0b1986bcf2f62914757b2054 (patch) | |
tree | 31d8160325457e30d170b58ac5ad76a973a32311 /tests | |
parent | Add set ipv6 next-hop peer-address command. (diff) | |
download | frr-5e242b0dd303f20b0b1986bcf2f62914757b2054.tar.xz frr-5e242b0dd303f20b0b1986bcf2f62914757b2054.zip |
cluster-id length equality for multipath
A fat-tree topology running IBGP runs into two issues with anycast address
routing. Consider the following topology:
R9 R10
x x
R3 R4 R7 R8
x x
R1 R2 R5 R6
| | | |
10/8 10/8 10/8 S
Let's remind ourselves of the BGP decision process steps:
1. Highest Local Preference
2. Shortest AS Path Length
3. Lowest Origin Type
4. Lowest MED (Multi-Exit Discriminator)
5. Prefer External to Internal
6. Closest Egress (Lowest IGP Distance)
7. Tie Breaking (Lowest-Router-ID)
8. Tie Breaking (Lowest-cluster-list length)
9. Tie Breaking (Lowest-neighbor-address)
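The ordering listed above can be sketched in C as a chain of comparisons. This is only an illustration of the nine steps as written here, not the bgpd implementation: the `path_attrs` struct, its field names, and `path_cmp` are hypothetical, and real BGP applies extra conditions (for example, when MED is comparable) that are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified attributes; one field per decision step. */
struct path_attrs {
    uint32_t local_pref;    /* step 1: highest wins */
    uint32_t as_path_len;   /* step 2: shortest wins */
    uint8_t  origin;        /* step 3: lowest wins */
    uint32_t med;           /* step 4: lowest wins */
    bool     is_ebgp;       /* step 5: external beats internal */
    uint32_t igp_metric;    /* step 6: lowest wins */
    uint32_t router_id;     /* step 7: lowest wins */
    uint32_t cluster_len;   /* step 8: lowest wins */
    uint32_t neighbor_addr; /* step 9: lowest wins */
};

/* Negative if a < b, positive if a > b, zero if equal. */
static int cmp_low(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

/* Returns <0 if path a is preferred, >0 if path b is preferred, 0 on a tie. */
int path_cmp(const struct path_attrs *a, const struct path_attrs *b)
{
    int r;

    /* Arguments swapped so that the higher local preference wins. */
    if ((r = cmp_low(b->local_pref, a->local_pref)) != 0)
        return r;
    if ((r = cmp_low(a->as_path_len, b->as_path_len)) != 0)
        return r;
    if ((r = cmp_low(a->origin, b->origin)) != 0)
        return r;
    if ((r = cmp_low(a->med, b->med)) != 0)
        return r;
    if (a->is_ebgp != b->is_ebgp)
        return a->is_ebgp ? -1 : 1;
    if ((r = cmp_low(a->igp_metric, b->igp_metric)) != 0)
        return r;
    if ((r = cmp_low(a->router_id, b->router_id)) != 0)
        return r;
    if ((r = cmp_low(a->cluster_len, b->cluster_len)) != 0)
        return r;
    return cmp_low(a->neighbor_addr, b->neighbor_addr);
}
```

With this ordering, a path with a lower router-id wins before the cluster list length is ever consulted, even if its cluster list is longer.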
Without any policies, steps 1-6 will almost always evaluate identically for
all paths received on any router in the above topology. Assume that the
router-ids are ordered as follows: R1 < R2 < R5 < R6. Owing to the 7th step
above, all routers will then choose R1's path as the best. This is
undesirable. For example, traffic from S to 10/8 will follow the path
S -> R6 -> R7 -> R9 -> R4 -> R2 -> 10/8 instead of S -> R6 -> R7 -> R5 -> 10/8.
Furthermore, once R7 (and R8) choose R1's path as the best, they withdraw
the path learned through (R5, R6) from (R9, R10). This leads to inefficient
load balancing: for example, R9 can't do ECMP across all available egresses
(R1, R2, R5).
The patch addresses these issues by noting that the cluster list is always
carried along with the routes and its length is a good indicator of IBGP
hops. It thus makes sense to compare that as an extension to the metric after
step 6. That automatically ensures correct multipath computation.
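A minimal sketch of that idea in C, assuming a hypothetical `ibgp_path` struct and helper names that are not taken from the bgpd sources: the cluster list length is compared immediately after the IGP metric and before the router-id, and two paths are treated as multipath candidates only when both values match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of an IBGP path for which steps 1-5 already tie. */
struct ibgp_path {
    uint32_t igp_metric;  /* step 6 */
    uint32_t cluster_len; /* number of entries in the cluster list */
    uint32_t router_id;   /* step 7 */
};

/* Negative if a < b, positive if a > b, zero if equal. */
static int cmp_u32(uint32_t a, uint32_t b)
{
    return (a > b) - (a < b);
}

/* Returns <0 if a is preferred, >0 if b is preferred, 0 on a tie. */
int ibgp_path_cmp(const struct ibgp_path *a, const struct ibgp_path *b)
{
    int r;

    if ((r = cmp_u32(a->igp_metric, b->igp_metric)) != 0)
        return r;
    /* Cluster list length acts as an extension to the metric. */
    if ((r = cmp_u32(a->cluster_len, b->cluster_len)) != 0)
        return r;
    return cmp_u32(a->router_id, b->router_id);
}

/* Two paths can share load only if metric and cluster length are equal. */
bool multipath_eligible(const struct ibgp_path *a, const struct ibgp_path *b)
{
    return a->igp_metric == b->igp_metric &&
           a->cluster_len == b->cluster_len;
}
```

As an illustration with assumed values: if R7 sees R5's path with a cluster list length of 0 and R1's path with a length of 2, `ibgp_path_cmp` prefers R5's path even though R1 has the lower router-id.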
Unfortunately, a partial deployment of this in a generic topology (note:
fat-tree/Clos topologies work fine) may lead to routing loops. This needs
further investigation.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Reviewed-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Diffstat (limited to 'tests')
-rw-r--r-- | tests/bgp_mpath_test.c | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/tests/bgp_mpath_test.c b/tests/bgp_mpath_test.c
index 3d0ecb78b..a6ca9c537 100644
--- a/tests/bgp_mpath_test.c
+++ b/tests/bgp_mpath_test.c
@@ -157,9 +157,9 @@ run_bgp_cfg_maximum_paths (testcase_t *t)
       for (safi = SAFI_UNICAST; safi < SAFI_MAX; safi++)
         {
           /* test bgp_maximum_paths_set */
-          api_result = bgp_maximum_paths_set (bgp, afi, safi, BGP_PEER_EBGP, 10);
+          api_result = bgp_maximum_paths_set (bgp, afi, safi, BGP_PEER_EBGP, 10, 0);
           EXPECT_TRUE (api_result == 0, test_result);
-          api_result = bgp_maximum_paths_set (bgp, afi, safi, BGP_PEER_IBGP, 10);
+          api_result = bgp_maximum_paths_set (bgp, afi, safi, BGP_PEER_IBGP, 10, 0);
           EXPECT_TRUE (api_result == 0, test_result);
           EXPECT_TRUE (bgp->maxpaths[afi][safi].maxpaths_ebgp == 10, test_result);
           EXPECT_TRUE (bgp->maxpaths[afi][safi].maxpaths_ibgp == 10, test_result);