author | Marcin Siodelski <marcin@isc.org> | 2018-04-09 11:53:40 +0200 |
---|---|---|
committer | Marcin Siodelski <marcin@isc.org> | 2018-05-10 18:03:56 +0200 |
commit | 25ced3639b000355b5bc61c3bee1befca01f4215 | |
tree | 3fdff318092696e9a524c68f3db0a7536423ba7b /doc | |
parent | [5478] Initial documentation for the HA hook library. | |
[5478] Load balancing configuration described.
Diffstat (limited to 'doc')
-rw-r--r-- | doc/guide/hooks.xml | 426 |
1 files changed, 421 insertions, 5 deletions
diff --git a/doc/guide/hooks.xml b/doc/guide/hooks.xml
index bb7768dfbf..340811910c 100644
--- a/doc/guide/hooks.xml
+++ b/doc/guide/hooks.xml
@@ -2906,11 +2906,427 @@ both the command and the response.
       <title>Server States</title>
       <para>The DHCP server operating within an HA setup runs a state machine and
       the state of the server can be retrieved by its peers using the
-      'ha-heartbeat' command sent over the RESTful API. If the partner server
-      doesn't respond to the 'ha-heartbeat' command longer than configured
-      amount of time, the communication is considered interrupted and the
-      server may (depending on the configuration) use additional measures to
-      verify if the partner is still operating.</para>
+      <command>ha-heartbeat</command> command sent over the RESTful API. If
+      the partner server doesn't respond to the <command>ha-heartbeat</command>
+      command for longer than the configured amount of time, the communication
+      is considered interrupted and the server may (depending on the
+      configuration) use additional measures to verify whether the partner is
+      still operating. If it finds that the partner is not operating, the
+      server transitions to the <command>partner-down</command> state to
+      handle all DHCP traffic directed to the system.</para>
+
+      <para>In this case, the surviving server continues to send the
+      <command>ha-heartbeat</command> command to detect when the partner comes
+      back online. The partner synchronizes its lease database and, when it is
+      finally ready to operate, the surviving server returns to normal
+      operation, i.e. the <command>load-balancing</command> or
+      <command>hot-standby</command> state.</para>
+
+      <para>The following is the list of all possible states into which the
+      servers may transition:
+
+      <itemizedlist mark="bullet">
+        <listitem><para><command>backup</command> - normal operation of the
+        backup server. In this state it receives lease updates from the active
+        servers.</para></listitem>
+
+        <listitem><para><command>hot-standby</command> - normal operation of
+        the active server running in the hot standby mode. Both the primary and
+        the standby server are in this state during their normal operation.
+        The primary server responds to DHCP queries and sends lease updates
+        to the standby server and to the backup servers, if any backup servers
+        are present.</para></listitem>
+
+        <listitem><para><command>load-balancing</command> - normal operation
+        of the active server running in the load balancing mode. Both the primary
+        and the secondary server are in this state during their normal operation.
+        Both servers respond to DHCP queries and send lease updates
+        to each other and to the backup servers, if any backup servers are
+        present.</para></listitem>
+
+        <listitem><para><command>partner-down</command> - an active server
+        transitions to this state after detecting that its partner (another
+        active server) is offline. The server doesn't transition to this state
+        if only a backup server is unavailable. In the
+        <command>partner-down</command> state the server responds to all DHCP
+        queries, including those which are normally handled by the active
+        server that is now unavailable.</para></listitem>
+
+        <listitem><para><command>ready</command> - an active server transitions
+        to this state after synchronizing its lease database with an active
+        partner. This state indicates to the partner (which is likely in the
+        <command>partner-down</command> state) that it may return to
+        normal operation. When it does, the server in the
+        <command>ready</command> state also starts normal operation.</para>
+        </listitem>
+
+        <listitem><para><command>syncing</command> - an active server
+        transitions to this state to fetch leases from the active partner
+        and update the local lease database. While in this state, it
+        issues the <command>dhcp-disable</command> command to disable the DHCP
+        service of the partner from which the leases are fetched. The DHCP
+        service is disabled for a maximum of 60 seconds, after which
+        it is automatically re-enabled, in case the syncing partner dies
+        again and fails to re-enable the service. When the synchronization is
+        completed, the syncing server issues the <command>dhcp-enable</command>
+        command to re-enable the DHCP service of the partner. The
+        syncing operation is synchronous: the server waits for an
+        answer from the partner and does nothing else while the
+        lease synchronization takes place.</para></listitem>
+
+        <listitem><para><command>waiting</command> - every server instance
+        enters this state at startup. A backup server transitions
+        directly from this state to the <command>backup</command> state.
+        An active server sends a heartbeat to its partner to check its
+        state. If the partner appears to be unavailable, the server
+        transitions to the <command>partner-down</command> state; otherwise it
+        transitions to the <command>syncing</command> state and attempts
+        to synchronize the lease database. If both servers appear to be
+        in this state (concurrent startup), the primary server
+        synchronizes first. The secondary or standby server remains
+        in the <command>waiting</command> state until the primary has
+        synchronized its database.</para></listitem>
+      </itemizedlist>
+
+      <para>Whether the server responds to DHCP queries, and which
+      queries it responds to, depends on the server's state, unless an
+      administrative action has been taken to configure the server
+      otherwise. The following table shows the default behavior in the
+      various states.</para>
+
+      <para>
+        <table frame="all" xml:id="ha-default-states-behavior">
+          <title>Default behavior of the server in various HA states</title>
+          <tgroup cols="4">
+            <colspec colname="state"/>
+            <colspec colname="server type" align="center"/>
+            <colspec colname="dhcp-service" align="center"/>
+            <colspec colname="dhcp-service-scopes" align="center"/>
+            <thead>
+              <row>
+                <entry>State</entry>
+                <entry>Server Type</entry>
+                <entry>DHCP Service</entry>
+                <entry>DHCP Service Scopes</entry>
+              </row>
+            </thead>
+            <tbody>
+              <row>
+                <entry>backup</entry>
+                <entry>backup server</entry>
+                <entry>disabled</entry>
+                <entry>none</entry>
+              </row>
+              <row>
+                <entry>hot-standby</entry>
+                <entry>primary or standby (hot standby mode)</entry>
+                <entry>enabled</entry>
+                <entry><command>ha_server1</command> if primary, none otherwise</entry>
+              </row>
+              <row>
+                <entry>load-balancing</entry>
+                <entry>primary or secondary (load balancing mode)</entry>
+                <entry>enabled</entry>
+                <entry><command>ha_server1</command> if primary, <command>ha_server2</command> if secondary</entry>
+              </row>
+              <row>
+                <entry>partner-down</entry>
+                <entry>active server</entry>
+                <entry>enabled</entry>
+                <entry>all scopes</entry>
+              </row>
+              <row>
+                <entry>ready</entry>
+                <entry>active server</entry>
+                <entry>disabled</entry>
+                <entry>none</entry>
+              </row>
+              <row>
+                <entry>syncing</entry>
+                <entry>active server</entry>
+                <entry>disabled</entry>
+                <entry>none</entry>
+              </row>
+              <row>
+                <entry>waiting</entry>
+                <entry>any server</entry>
+                <entry>disabled</entry>
+                <entry>none</entry>
+              </row>
+            </tbody>
+          </tgroup>
+        </table>
+      </para>
+
+      </para>
+
+      <para>The DHCP service scopes require some explanation. The HA
+      configuration must specify a unique name for each server within
+      the HA setup. This document uses the following convention in the
+      provided examples: <command>server1</command> for the primary server,
+      <command>server2</command> for the secondary or standby server, and
+      <command>server3</command> for the backup server. In real deployments
+      any names can be used, as long as they are unique.</para>
+
+      <para>In the load balancing mode there are two scopes named after
+      the active servers: <command>ha_server1</command> and
+      <command>ha_server2</command>. The DHCP queries load balanced to
+      <command>server1</command> belong to the <command>ha_server1</command>
+      scope and the queries load balanced to <command>server2</command>
+      belong to the <command>ha_server2</command> scope. If either of the
+      servers is in the <command>partner-down</command> state, it is
+      responsible for serving both scopes.</para>
+
+      <para>In the hot standby mode there is only one scope,
+      <command>ha_server1</command>, because only <command>server1</command>
+      responds to DHCP queries. If that server crashes,
+      <command>server2</command> becomes responsible for this scope.
+      </para>
+
+      <para>The backup servers do not have their own scopes. In some
+      cases they can be used to respond to the queries belonging to
+      the scopes of the active servers. Also, a server which is neither
+      in the <command>partner-down</command> state nor in normal operation
+      serves no scopes.</para>
+
+      <para>The scope names can be used to associate pools, subnets
+      and networks with certain servers, so that only these servers
+      can allocate addresses or prefixes from those pools, subnets
+      or networks. This is done via the client classification mechanism
+      (see below).</para>
+    </section>
+
+    <section xml:id="ha-load-balancing-config">
+      <title>Load Balancing Configuration</title>
+      <para>The following is a configuration snippet which enables
+      high availability on the primary server in the load balancing
+      configuration. The same configuration should be applied on the
+      secondary and the backup server, the only difference being that
+      <command>this-server-name</command> should be set to
+      <command>server2</command> and <command>server3</command>
+      on those servers, respectively.</para>
+<screen>
+{
+"Dhcp4": {
+
+    ...
+
+    "hooks-libraries": [
+        {
+            "library": "/usr/lib/hooks/libdhcp_lease_cmds.so",
+            "parameters": { }
+        },
+        {
+            "library": "/usr/lib/hooks/libdhcp_ha.so",
+            "parameters": {
+                "high-availability": [ {
+                    "this-server-name": "server1",
+                    "mode": "load-balancing",
+                    "heartbeat-delay": 10,
+                    "max-response-delay": 10,
+                    "max-ack-delay": 5,
+                    "max-unacked-clients": 5,
+                    "peers": [
+                        {
+                            "name": "server1",
+                            "url": "http://192.168.56.33:8080/",
+                            "role": "primary",
+                            "auto-failover": true
+                        },
+                        {
+                            "name": "server2",
+                            "url": "http://192.168.56.66:8080/",
+                            "role": "secondary",
+                            "auto-failover": true
+                        },
+                        {
+                            "name": "server3",
+                            "url": "http://192.168.56.99:8080/",
+                            "role": "backup",
+                            "auto-failover": false
+                        }
+                    ]
+                } ]
+            }
+        }
+    ],
+
+    "subnet4": [
+        {
+            "subnet": "192.0.3.0/24",
+            "pools": [
+                {
+                    "pool": "192.0.3.100 - 192.0.3.150",
+                    "client-class": "ha_server1"
+                },
+                {
+                    "pool": "192.0.3.200 - 192.0.3.250",
+                    "client-class": "ha_server2"
+                }
+            ],
+
+            "option-data": [
+                {
+                    "name": "routers",
+                    "data": "192.0.3.1"
+                }
+            ],
+
+            "relay": { "ip-address": "10.1.2.3" }
+        }
+    ],
+
+    ...
+
+}
+
+}
+</screen>
+
+      <para>Two hook libraries must be loaded to enable HA:
+      <filename>libdhcp_lease_cmds.so</filename> and
+      <filename>libdhcp_ha.so</filename>. The latter provides the
+      implementation of the HA feature. The former enables control
+      commands required by HA to fetch and manipulate leases on the
+      remote servers. In the example provided above, it is assumed that
+      Kea libraries are installed in the <filename>/usr/lib</filename>
+      directory. If Kea is not installed in the /usr directory, the
+      hook library locations must be updated accordingly.
+      </para>
+
+      <para>The HA configuration is specified within the scope of
+      <filename>libdhcp_ha.so</filename>. Note that the top-level
+      parameter <command>high-availability</command> is a list, even
+      though it currently contains only one entry. In the future this
+      configuration is likely to be extended to contain more entries,
+      if a particular server can participate in more than one
+      HA relationship.</para>
+
+      <para>The following are the global parameters which control the server's
+      behavior with respect to HA:
+      <itemizedlist mark="bullet">
+        <listitem><para><command>this-server-name</command> - a unique
+        identifier of the server within this HA setup. It must match one
+        of the servers specified in the <command>peers</command> list.
+        </para></listitem>
+
+        <listitem><para><command>mode</command> - specifies an HA mode
+        of operation. Currently supported modes are
+        <command>load-balancing</command> and <command>hot-standby</command>.</para></listitem>
+
+        <listitem><para><command>heartbeat-delay</command> - specifies
+        the duration in seconds between the last heartbeat (or other command sent
+        to the partner) and sending the next heartbeat. The heartbeats are sent
+        periodically to gather the status of the partner and to verify whether
+        the partner is still operating.</para></listitem>
+
+        <listitem><para><command>max-response-delay</command> - specifies the
+        duration in seconds since the last successful communication with the
+        partner, after which the server assumes that the communication with
+        the partner is interrupted. This duration should be greater than
+        <command>heartbeat-delay</command>, and usually spans several
+        heartbeat intervals.
+        When the server detects that the communication is interrupted, it
+        may transition to the <command>partner-down</command> state (when
+        <command>max-unacked-clients</command> is 0) or trigger the failure
+        detection procedure using the values of the two parameters below.
+        </para></listitem>
+
+        <listitem><para><command>max-ack-delay</command> - one of
+        the parameters controlling partner failure detection. When the
+        communication with the partner is interrupted, the server examines the values
+        of the <command>secs</command> field (DHCPv4) or the <command>Elapsed Time</command>
+        option (DHCPv6), which denote how long the DHCP client has been
+        trying to communicate with the DHCP server. This parameter specifies the
+        maximum time for the client to try to communicate with the DHCP server,
+        after which this server assumes that the client failed to communicate
+        with the DHCP server (is "unacked").</para></listitem>
+
+        <listitem><para><command>max-unacked-clients</command> - specifies
+        how many "unacked" clients are allowed (see <command>max-ack-delay</command>)
+        before this server assumes that the partner is offline and transitions
+        to the <command>partner-down</command> state. The special value of 0
+        is allowed for this parameter and disables the failure detection
+        mechanism. In this case, a server which can't communicate with the
+        partner over the control channel assumes that the partner server is
+        down and transitions to the <command>partner-down</command> state
+        immediately.</para></listitem>
+
+      </itemizedlist>
+      </para>
+
+      <para>
+      The values of <command>max-ack-delay</command> and
+      <command>max-unacked-clients</command> must be selected carefully, taking
+      into account the specifics of the network in which the DHCP servers are
+      operating. Note that the server in question may not respond to some
+      of the DHCP clients because these clients are not to be serviced
+      by this server (per administrative policy). The server may also
+      drop malformed queries from the clients. Therefore, selecting too
+      low a value for <command>max-unacked-clients</command> may
+      result in transitioning to the <command>partner-down</command>
+      state even though the partner is still operating. On the other
+      hand, selecting too high a value may result in never transitioning
+      to the <command>partner-down</command> state if the DHCP
+      traffic in the network is very low (e.g. at night), because the
+      number of distinct clients trying to communicate with the server
+      could be lower than <command>max-unacked-clients</command>.
+      </para>
+
+      <para>In some cases it may be useful to disable the failure detection
+      mechanism altogether, if the servers are located very close to each
+      other and network partitioning is unlikely, i.e. failure to
+      respond to heartbeats is only possible when the partner is offline.
+      In such cases, set <command>max-unacked-clients</command> to 0.
+      </para>
+
+      <para>The <command>peers</command> parameter contains a list of servers
+      within this HA setup. In this configuration it must contain exactly
+      one primary and one secondary server. It may also contain an unlimited
+      number of backup servers. In this example there is one backup server
+      which receives lease updates from the active servers.</para>
+
+      <para>The following parameters are specified for each of the
+      peers within this list:
+
+      <itemizedlist mark="bullet">
+        <listitem><para><command>name</command> - specifies a unique name for
+        the server.</para></listitem>
+
+        <listitem><para><command>url</command> - specifies the URL to be used to
+        contact this server over the control channel. Other servers use this
+        URL to send control commands to that server.</para></listitem>
+
+        <listitem><para><command>role</command> - denotes the role of the
+        server in the HA setup. The following roles are supported in the
+        load balancing configuration: <command>primary</command>,
+        <command>secondary</command> and <command>backup</command>.
+        There must be exactly one primary and one secondary server in the
+        load balancing setup.</para></listitem>
+
+        <listitem><para><command>auto-failover</command> - a boolean value
+        which denotes whether the server detecting a partner's failure should
+        automatically start serving its clients.</para></listitem>
+
+      </itemizedlist>
+      </para>
+
+      <para>In our example configuration, both active servers can allocate
+      leases from the subnet "192.0.3.0/24". This subnet contains two
+      address pools: "192.0.3.100 - 192.0.3.150" and "192.0.3.200 - 192.0.3.250",
+      which are associated with the HA server scopes using client classification.
+      When <command>server1</command> processes a DHCP query, it uses
+      the first pool for lease allocation. Conversely, when
+      <command>server2</command> processes a DHCP query, it uses the
+      second pool. When either of the servers is in the
+      <command>partner-down</command> state, it can serve leases from both pools and
+      it selects the pool which is appropriate for the received query. In
+      other words, if the query would normally be processed by
+      <command>server2</command>, but this server has crashed,
+      <command>server1</command> allocates the lease from the pool
+      "192.0.3.200 - 192.0.3.250".
+      </para>
+    </section>
   </section> <!-- end of high-availability-library -->
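The state descriptions in this patch can be condensed into a transition function. The following Python sketch is a simplified model, not Kea's implementation. It makes these assumptions:

- The `State` enum, the `next_state` function and the role strings (`primary`, `secondary`, `standby`, `backup`) are hypothetical names.
- `partner` is `None` once the partner is considered offline.
- Lease synchronization is assumed to succeed.
- Two servers that are both in `ready` start normal operation. The text does not spell this rule out.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    """The HA states listed in the documentation."""
    BACKUP = "backup"
    HOT_STANDBY = "hot-standby"
    LOAD_BALANCING = "load-balancing"
    PARTNER_DOWN = "partner-down"
    READY = "ready"
    SYNCING = "syncing"
    WAITING = "waiting"


# Normal-operation state for each HA mode.
NORMAL_STATE = {
    "load-balancing": State.LOAD_BALANCING,
    "hot-standby": State.HOT_STANDBY,
}


def next_state(role: str, mode: str, current: State,
               partner: Optional[State]) -> State:
    """Return the next state of this server (hypothetical model).

    partner is the partner's last reported state, or None when the partner
    is considered offline.
    """
    normal = NORMAL_STATE[mode]
    if role == "backup":
        # A backup server moves directly from waiting to backup.
        return State.BACKUP
    if current is State.WAITING:
        if partner is None:
            return State.PARTNER_DOWN
        if partner in (State.WAITING, State.SYNCING) and role != "primary":
            # Concurrent startup: the primary synchronizes first.
            return State.WAITING
        return State.SYNCING
    if current is State.SYNCING:
        # Assumes the lease synchronization has completed successfully.
        return State.READY
    if current is State.READY:
        # Start normal operation once the partner has returned to it.
        # Assumption: two servers that are both ready also proceed.
        if partner in (normal, State.READY):
            return normal
        return State.READY
    if current is State.PARTNER_DOWN:
        # The partner signals it is ready, so return to normal operation.
        return normal if partner is State.READY else State.PARTNER_DOWN
    # Normal operation: an offline partner leads to partner-down.
    return State.PARTNER_DOWN if partner is None else current
```

Here is how `server2` rejoins after a crash while `server1` is in `partner-down`. It moves from `waiting` to `syncing` to `ready`. Then `server1` sees the `ready` partner and returns to `load-balancing`, and `server2` follows.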
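The steps of the `syncing` state can be sketched in the same way. In this Python sketch, `send`, `fetch_leases` and `store_lease` are hypothetical callbacks. They stand in for the control channel and the local lease database. The `max-period` argument name for `dhcp-disable` is an assumption to check against the Kea command reference. Following the text, `dhcp-enable` is sent only after a completed synchronization. If fetching fails, the partner relies on the 60-second limit to re-enable its service.

```python
from typing import Callable, Dict, List

Command = Dict[str, object]


def sync_leases(send: Callable[[Command], dict],
                fetch_leases: Callable[[], List[dict]],
                store_lease: Callable[[dict], None],
                max_period: int = 60) -> int:
    """Run one synchronous lease synchronization with the partner.

    All three callbacks are hypothetical placeholders. Returns the number
    of leases stored locally.
    """
    # Disable the partner's DHCP service. The (assumed) max-period argument
    # lets the partner re-enable itself if this server dies mid-sync.
    send({"command": "dhcp-disable", "arguments": {"max-period": max_period}})
    # Blocking fetch: nothing else happens until the leases have arrived.
    leases = fetch_leases()
    for lease in leases:
        store_lease(lease)
    # Synchronization completed: re-enable the partner's DHCP service.
    send({"command": "dhcp-enable"})
    return len(leases)
```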
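The table "Default behavior of the server in various HA states" can be expressed as a lookup. This sketch makes the following assumptions:

- Scopes follow the `ha_<server name>` naming used in the examples.
- The active server names are passed with the primary first.
- `default_behavior` and its parameters are hypothetical.
- Administrative overrides are not modeled.

```python
from typing import FrozenSet, List, Tuple


def scope_name(server_name: str) -> str:
    # Naming convention from the examples: "server1" -> "ha_server1".
    return "ha_" + server_name


def all_scopes(mode: str, active_names: List[str]) -> FrozenSet[str]:
    # Hot standby has only the primary's scope; load balancing has one
    # scope per active server. active_names lists the primary first.
    if mode == "hot-standby":
        return frozenset({scope_name(active_names[0])})
    return frozenset(scope_name(name) for name in active_names)


def default_behavior(state: str, mode: str, role: str, this_name: str,
                     active_names: List[str]) -> Tuple[bool, FrozenSet[str]]:
    """Return (DHCP service enabled, scopes served) per the state table."""
    if state == "load-balancing":
        # Each active server serves the scope named after itself.
        return True, frozenset({scope_name(this_name)})
    if state == "hot-standby":
        # Service is enabled on both, but only the primary has a scope.
        served = all_scopes(mode, active_names) if role == "primary" else frozenset()
        return True, served
    if state == "partner-down":
        return True, all_scopes(mode, active_names)
    # backup, ready, syncing and waiting: service disabled, no scopes.
    return False, frozenset()
```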
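The primary, secondary and backup servers share one configuration that differs only in `this-server-name`. The per-server configurations can therefore be derived from a single template. This is a minimal Python sketch with these assumptions:

- The configuration has already been parsed into a dictionary, for example with `json.load`.
- The first `high-availability` entry is the relevant one.
- `ha_entry`, `per_server_configs` and `BASE` are hypothetical names.
- `BASE` is trimmed from the example configuration in this patch.

```python
import copy
from typing import Dict


def ha_entry(config: dict) -> dict:
    """Return the first "high-availability" entry found in hooks-libraries."""
    for library in config["Dhcp4"]["hooks-libraries"]:
        entries = library.get("parameters", {}).get("high-availability")
        if entries:
            return entries[0]  # the list currently holds a single entry
    raise KeyError("no high-availability configuration found")


def per_server_configs(config: dict) -> Dict[str, dict]:
    """Build one deep copy of the configuration per peer, differing only
    in this-server-name; the template itself is left untouched."""
    result = {}
    for peer in ha_entry(config)["peers"]:
        server_config = copy.deepcopy(config)
        ha_entry(server_config)["this-server-name"] = peer["name"]
        result[peer["name"]] = server_config
    return result


# Trimmed template based on the example configuration.
BASE = {
    "Dhcp4": {
        "hooks-libraries": [
            {"library": "/usr/lib/hooks/libdhcp_lease_cmds.so", "parameters": {}},
            {"library": "/usr/lib/hooks/libdhcp_ha.so", "parameters": {
                "high-availability": [{
                    "this-server-name": "server1",
                    "mode": "load-balancing",
                    "peers": [
                        {"name": "server1", "role": "primary"},
                        {"name": "server2", "role": "secondary"},
                        {"name": "server3", "role": "backup"},
                    ],
                }],
            }},
        ],
    },
}
```

Each resulting dictionary can then be written out per server with `json.dumps(cfg, indent=4)`.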
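The failure detection parameters interact as follows. This Python sketch is one reading of the text, not Kea's code:

- `FailureDetection` and `partner_down` are hypothetical names.
- Elapsed times stand for the DHCPv4 `secs` field or the DHCPv6 Elapsed Time option.
- Only distinct clients are counted.
- A client counts as unacked once its elapsed time strictly exceeds `max-ack-delay`.
- Partner-down is triggered when the number of unacked clients exceeds `max-unacked-clients`.

The test values match the example configuration: `max-response-delay` 10, `max-ack-delay` 5 and `max-unacked-clients` 5.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class FailureDetection:
    max_response_delay: float  # seconds since last successful contact
    max_ack_delay: float       # seconds a client may keep trying
    max_unacked_clients: int   # unacked clients tolerated


def partner_down(cfg: FailureDetection, since_last_contact: float,
                 queries: Iterable[Tuple[str, float]]) -> bool:
    """Decide whether to transition to partner-down.

    queries holds (client identifier, elapsed seconds) pairs taken from
    DHCP queries seen while communication with the partner is interrupted.
    """
    if since_last_contact <= cfg.max_response_delay:
        # Communication is not considered interrupted yet.
        return False
    if cfg.max_unacked_clients == 0:
        # Failure detection disabled: go to partner-down immediately.
        return True
    # Distinct clients that have been trying for longer than max-ack-delay.
    unacked = {client for client, elapsed in queries
               if elapsed > cfg.max_ack_delay}
    return len(unacked) > cfg.max_unacked_clients
```

Counting distinct clients means one client retrying many times cannot trigger the transition alone. As the text warns, this is also why very low overnight traffic may never reach the threshold.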
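The rules for `this-server-name` and `peers` lend themselves to a check before deployment. This Python sketch validates one `high-availability` entry against the constraints stated in the text:

- The mode is one of the supported modes.
- Peer names are unique.
- `this-server-name` matches a peer.
- In load balancing mode there is exactly one primary and one secondary, and only the roles listed for that mode appear.

`validate_ha_entry` is hypothetical. Kea performs its own validation when loading the configuration, and it may report different or additional errors.

```python
from typing import List

SUPPORTED_MODES = {"load-balancing", "hot-standby"}
LOAD_BALANCING_ROLES = {"primary", "secondary", "backup"}


def validate_ha_entry(ha: dict) -> List[str]:
    """Return a list of problems found in one "high-availability" entry."""
    errors = []
    mode = ha.get("mode")
    if mode not in SUPPORTED_MODES:
        errors.append("unsupported mode: %r" % mode)
    peers = ha.get("peers", [])
    names = [peer.get("name") for peer in peers]
    if len(set(names)) != len(names):
        errors.append("peer names must be unique")
    if ha.get("this-server-name") not in names:
        errors.append("this-server-name does not match any peer")
    if mode == "load-balancing":
        roles = [peer.get("role") for peer in peers]
        unknown = sorted(str(r) for r in set(roles) - LOAD_BALANCING_ROLES)
        if unknown:
            errors.append("unsupported roles: " + ", ".join(unknown))
        if roles.count("primary") != 1 or roles.count("secondary") != 1:
            errors.append("exactly one primary and one secondary are required")
    return errors


# The HA entry from the example configuration (URLs omitted).
EXAMPLE = {
    "this-server-name": "server1",
    "mode": "load-balancing",
    "peers": [
        {"name": "server1", "role": "primary", "auto-failover": True},
        {"name": "server2", "role": "secondary", "auto-failover": True},
        {"name": "server3", "role": "backup", "auto-failover": False},
    ],
}
```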
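The pool selection described for the example configuration can be modeled with a small function. The sketch makes these assumptions:

- Each query has already been assigned to one scope, `ha_server1` or `ha_server2`.
- The server answers only if it serves that scope.
- The server allocates only from a pool whose `client-class` matches that scope.

`select_pool` is hypothetical and ignores everything else that Kea's classification and allocation engine does.

```python
from typing import Iterable, List, Optional

# The pools from the example configuration, tagged with HA scope classes.
POOLS = [
    {"pool": "192.0.3.100 - 192.0.3.150", "client-class": "ha_server1"},
    {"pool": "192.0.3.200 - 192.0.3.250", "client-class": "ha_server2"},
]


def select_pool(pools: List[dict], query_scope: str,
                served_scopes: Iterable[str]) -> Optional[str]:
    """Return the pool to allocate from, or None if this server should not
    answer. query_scope is the scope the query was load balanced to."""
    if query_scope not in set(served_scopes):
        # The query belongs to a scope this server does not serve.
        return None
    for pool in pools:
        if pool.get("client-class") == query_scope:
            return pool["pool"]
    return None
```

In normal load balancing, `server1` serves only `{"ha_server1"}` and ignores `ha_server2` queries. In `partner-down` it serves both scopes, so a query meant for the crashed `server2` is answered from "192.0.3.200 - 192.0.3.250".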