diff options
author | Donald Sharp <sharpd@cumulusnetworks.com> | 2015-05-20 02:40:34 +0200 |
---|---|---|
committer | Donald Sharp <sharpd@cumulusnetworks.com> | 2015-05-20 02:40:34 +0200 |
commit | fb018d251e5fc2fee0814a63c4e0a44274c98bb9 (patch) | |
tree | 59351ede3cd497fbb05c2b6906de549f17686f55 /doc | |
parent | bgpd: bgpd-update-delay.patch (diff) | |
download | frr-fb018d251e5fc2fee0814a63c4e0a44274c98bb9.tar.xz frr-fb018d251e5fc2fee0814a63c4e0a44274c98bb9.zip |
nexthop-tracking.patch
quagga: nexthop-tracking.patch
Add next hop tracking support to Quagga. Complete documentation in doc/next-hop-tracking.txt.
Signed-off-by: Pradosh Mohapatra <pmohapat@cumulusnetworks.com>
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Diffstat (limited to 'doc')
-rw-r--r-- | doc/next-hop-tracking.txt | 326 |
1 files changed, 326 insertions, 0 deletions
diff --git a/doc/next-hop-tracking.txt b/doc/next-hop-tracking.txt new file mode 100644 index 000000000..d157866e8 --- /dev/null +++ b/doc/next-hop-tracking.txt @@ -0,0 +1,326 @@ +0. Introduction + +This is the design specification for next hop tracking feature in +Quagga. + +1. Background + +Recursive routes are of the form: + + p/m --> n + [Ex: 1.1.0.0/16 --> 2.2.2.2] + +where 'n' itself is resolved through another route as follows: + + p2/m --> h, interface + [Ex: 2.2.2.0/24 --> 3.3.3.3, eth0] + +Usually, BGP routes are recursive in nature and BGP nexthops get +resolved through an IGP route. IGP usually adds its routes pointing to +an interface (these are called non-recursive routes). + +When BGP receives a recursive route from a peer, it needs to validate +the nexthop. The path is marked valid or invalid based on the +reachability status of the nexthop. Nexthop validation is also +important for BGP decision process as the metric to reach the nexthop +is a parameter to best path selection process. + +As it goes with routing, this is a dynamic process. Route to the +nexthop can change. The nexthop can become unreachable or +reachable. In the current BGP implementation, the nexthop validation +is done periodically in the scanner run. The default scanner run +interval is one minute. Every minute, the scanner task walks the +entire BGP table. It checks the validity of each nexthop with Zebra +(the routing table manager) through a request and response message +exchange between BGP and Zebra process. BGP process is blocked for +that duration. The mechanism has two major drawbacks: + +(1) The scanner task runs to completion. That can potentially starve + the other tasks for long periods of time, based on the BGP table + size and number of nexthops. + +(2) Convergence around routing changes that affect the nexthops can be + long (around a minute with the default intervals). The interval + can be shortened to achieve faster reaction time, but it makes the + first problem worse, with the scanner task consuming most of the + CPU resources. + +"Next hop tracking" feature makes this process event-driven. It +eliminates periodic nexthop validation and introduces an asynchronous +communication path between BGP and Zebra for route change notifications +that can then be acted upon. + +2. Goal + +Stating the obvious, the main goal is to remove the two limitations we +discussed in the previous section. The goals, in a constructive tone, +are the following: + +- fairness: the scanner run should not consume an unjustly high amount + of CPU time. This should give an overall good performance and + response time to other events (route changes, session events, + IO/user interface). + +- convergence: BGP must react to nexthop changes instantly and provide + sub-second convergence. This may involve diverting the routes from + one nexthop to another. + +3. Overview of the changes + +The changes are in both BGP and Zebra modules. The short summary is +the following: + +- Zebra implements a registration mechanism by which clients can + register for next hop notification. Consequently, it maintains a + separate table, per (VRF, AF) pair, of next hops and interested + client-list per next hop. + +- When the main routing table changes in Zebra, it evaluates the next + hop table: for each next hop, it checks if the route table + modifications have changed its state. If so, it notifies the + interested clients. + +- BGP is one such client. It registers the next hops corresponding to + all of its received routes/paths. It also threads the paths against + each nexthop structure. + +- When BGP receives a next hop notification from Zebra, it walks the + corresponding path list. It makes them valid or invalid depending + on the next hop notification. It then re-computes best path for the + corresponding destination. This may result in re-announcing those + destinations to peers. + +4. Design + +4.1. Modules + +The core design introduces an "nht" (next hop tracking) module in BGP +and "rnh" (recursive nexthop) module in Zebra. The "nht" module +provides the following APIs: + +bgp_find_or_add_nexthop() : find or add a nexthop in BGP nexthop table +bgp_find_nexthop() : find a nexthop in BGP nexthop table +bgp_parse_nexthop_update() : parse a nexthop update message coming + from zebra + +The "rnh" module provides the following APIs: + +zebra_add_rnh() : add a recursive nexthop +zebra_delete_rnh() : delete a recursive nexthop +zebra_lookup_rnh() : lookup a recursive nexthop + +zebra_add_rnh_client() : register a client for nexthop notifications + against a recursive nexthop + +zebra_remove_rnh_client(): remove the client registration for a + recursive nexthop + +zebra_evaluate_rnh_table(): (re)evaluate the recursive nexthop table + (most probably because the main routing + table has changed). + +zebra_cleanup_rnh_client(): Cleanup a client from the "rnh" module + data structures (most probably because the + client is going away). + +4.2. Control flow + +The next hop registration control flow is the following: + +<==== BGP Process ====>|<==== Zebra Process ====> + | +receive module nht module | zserv module rnh module +---------------------------------------------------------------------- + | | | +bgp_update_ | | | + main() | bgp_find_or_add_ | | + | nexthop() | | + | | | + | | zserv_nexthop_ | + | | register() | + | | | zebra_add_rnh() + | | | + + +The next hop notification control flow is the following: + +<==== Zebra Process ====>|<==== BGP Process ====> + | +rib module rnh module | zebra module nht module +---------------------------------------------------------------------- + | | | +meta_queue_ | | | + process() | zebra_evaluate_ | | + | rnh_table() | | + | | | + | | bgp_read_nexthop_ | + | | update() | + | | | bgp_parse_ + | | | nexthop_update() + | | | + + +4.3. zclient message format + +ZEBRA_NEXTHOP_REGISTER and ZEBRA_NEXTHOP_UNREGISTER messages are +encoded in the following way: + +/* + * 0 1 2 3 + * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | AF | prefix len | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * . Nexthop prefix . + * . . + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * . . + * . . + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | AF | prefix len | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * . Nexthop prefix . + * . . + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + */ + +ZEBRA_NEXTHOP_UPDATE message is encoded as follows: + +/* + * 0 1 2 3 + * 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | AF | prefix len | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * . Nexthop prefix getting resolved . + * . . + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | metric | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | #nexthops | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | nexthop type | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * . resolving Nexthop details . + * . . + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * . . + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * | nexthop type | + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + * . resolving Nexthop details . + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + */ + +4.4. BGP data structure + +Legend: + +/\ struct bgp_node: a BGP destination/route/prefix +\/ + +[ ] struct bgp_info: a BGP path (e.g. route received from a peer) + + _ +(_) struct bgp_nexthop_cache: a BGP nexthop + + + + /\ NULL + \/--+ ^ + | : + +--[ ]--[ ]--[ ]--> NULL + /\ : + \/--+ : + | : + +--[ ]--[ ]--> NULL + : + _ : + (_)............. + + +4.5. Zebra data structure + +rnh table: + + O + / \ + O O + / \ + O O + + struct rnh + { + u_char flags; + struct rib *state; + struct list *client_list; + struct route_node *node; + }; + +5. User interface changes + +quagga# show ip nht +3.3.3.3 + resolved via kernel + via 11.0.0.6, swp1 + Client list: bgp(fd 12) +11.0.0.10 + resolved via connected + is directly connected, swp2 + Client list: bgp(fd 12) +11.0.0.18 + resolved via connected + is directly connected, swp4 + Client list: bgp(fd 12) +11.11.11.11 + resolved via kernel + via 10.0.1.2, eth0 + Client list: bgp(fd 12) + +quagga# show ip bgp nexthop +Current BGP nexthop cache: + 3.3.3.3 valid [IGP metric 0], #paths 3 + Last update: Wed Oct 16 04:43:49 2013 + + 11.0.0.10 valid [IGP metric 1], #paths 1 + Last update: Wed Oct 16 04:43:51 2013 + + 11.0.0.18 valid [IGP metric 1], #paths 2 + Last update: Wed Oct 16 04:43:47 2013 + + 11.11.11.11 valid [IGP metric 0], #paths 1 + Last update: Wed Oct 16 04:43:47 2013 + +quagga# show ipv6 nht +quagga# show ip bgp nexthop detail + +quagga# debug bgp nht +quagga# debug zebra nht + +6. Sample test cases + + r2----r3 + / \ / + r1----r4 + +- Verify that a change in IGP cost triggers NHT + + shutdown the r1-r4 and r2-r4 links + + no shut the r1-r4 and r2-r4 links and wait for OSPF to come back + up + + We should be back to the original nexthop via r4 now +- Verify that a NH becoming unreachable triggers NHT + + Shutdown all links to r4 +- Verify that a NH becoming reachable triggers NHT + + no shut all links to r4 + +7. Future work + +- route-policy for next hop validation (e.g. ignore default route) +- damping for rapid next hop changes +- prioritized handling of nexthop changes ((un)reachability vs. metric + changes) +- handling recursion loop, e.g. + 11.11.11.11/32 -> 12.12.12.12 + 12.12.12.12/32 -> 11.11.11.11 + 11.0.0.0/8 -> <interface> +- better statistics |