summaryrefslogtreecommitdiffstats
path: root/doc/sphinx/arm
diff options
context:
space:
mode:
authorThomas Markwalder <tmark@isc.org>2024-04-03 19:31:48 +0200
committerThomas Markwalder <tmark@isc.org>2024-04-03 19:31:48 +0200
commitf39b4692f8f707cda35c34c2e839b8da34517e58 (patch)
tree0a8b0cc17c8fb983a3127d8edfa0df399e188e19 /doc/sphinx/arm
parent[#3297] Fixed broken test (diff)
downloadkea-f39b4692f8f707cda35c34c2e839b8da34517e58.tar.xz
kea-f39b4692f8f707cda35c34c2e839b8da34517e58.zip
[#3297] Updated ARM, added a ChangeLog
Diffstat (limited to 'doc/sphinx/arm')
-rw-r--r--doc/sphinx/arm/hooks-perfmon.rst135
1 files changed, 120 insertions, 15 deletions
diff --git a/doc/sphinx/arm/hooks-perfmon.rst b/doc/sphinx/arm/hooks-perfmon.rst
index 224a0ad4d4..8e27f28724 100644
--- a/doc/sphinx/arm/hooks-perfmon.rst
+++ b/doc/sphinx/arm/hooks-perfmon.rst
@@ -9,7 +9,8 @@ to extend them with the ability to track and report performance related data.
.. note::
- This library is currently under development and not fully functional.
+ This library is experimental and not recommended for use in production
+ environments.
Overview
~~~~~~~~
@@ -93,23 +94,127 @@ uniquely identified by a "duration key" which consists of the following values:
Once the server has finished constructing a response to a query, the query's event stack
is processed into a series of updates to monitored durations. If upon updating, a
duration's sample interval is found to have been completed, it is sent to reporting
-and a new sample interval is begun.
-
-.. Note:
- Monitoring is not yet functional.
+and a new sample interval is begun. The interval width is dictacted by configuration
+parameter ``interval-width-secs``.
+
+The event stack for the multi-threaded mode DHCPDISCOVER/DHCPOFFER cycle shown above
+contains the following events:
+
+ +-----------------------------+--------------------+
+ | Event Timestamp | Event Label |
+ +=============================+====================+
+ | 2024-Mar-20 14:52:20.067563 | socket_received |
+ +-----------------------------+--------------------+
+ | 2024-Mar-20 14:52:20.067810 | buffer_read |
+ +-----------------------------+--------------------+
+ | 2024-Mar-20 14:52:20.067897 | mt_queued |
+ +-----------------------------+--------------------+
+ | 2024-Mar-20 14:52:20.067952 | process_started |
+ +-----------------------------+--------------------+
+ | 2024-Mar-20 14:52:20.069614 | process_completed |
+ +-----------------------------+--------------------+
+
+Assuming the selected subnet's ID was 100, the duration updates formed by PerfMon
+from these events are shown below:
+
+ +--------------------------------------------------------------+--------------+
+ | Duration Keys for SubnetID 100 | Update in |
+ | | microseconds |
+ +==============================================================+==============+
+ | DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.100 | 247 |
+ +--------------------------------------------------------------+--------------+
+ | DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.100 | 87 |
+ +--------------------------------------------------------------+--------------+
+ | DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.100 | 55 |
+ +--------------------------------------------------------------+--------------+
+ | DHCPDISCOVER.DHCPOFFER.process_started-process_completed.100 | 1662 |
+ +--------------------------------------------------------------+--------------+
+ | DHCPDISCOVER.DHCPOFFER.composite-total_response.100 | 2051 |
+ +--------------------------------------------------------------+--------------+
+
+Notice that in addition to the adjacent event updates, there is an additional duration
+update for the total duration of the entire stack whose key contains the event-pair
+``composite-total_response``. This tracks the total time to responds from query
+received until the response is ready to send. Finally, there would also be global
+duration updates for each of the above:
+
+ +--------------------------------------------------------------+--------------+
+ | Global Duration Keys | Update in |
+ | | milliseconds |
+ +==============================================================+==============+
+ | DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.0 | 247 |
+ +--------------------------------------------------------------+--------------+
+ | DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.0 | 87 |
+ +--------------------------------------------------------------+--------------+
+ | DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.0 | 55 |
+ +--------------------------------------------------------------+--------------+
+ | DHCPDISCOVER.DHCPOFFER.process_started-process_completed.0 | 1662 |
+ +--------------------------------------------------------------+--------------+
+ | DHCPDISCOVER.DHCPOFFER.composite-total_response.0 | 2051 |
+ +--------------------------------------------------------------+--------------+
+
+Statistics Reporting
+~~~~~~~~~~~~~~~~~~~~
+
+When enabled (see ``stats-mgr-reporting``), PerfMon will report a duration's data
+data each time the duration completes a sampling interval. Each statistic employs
+the following naming convention:
+
+::
+
+ {subnet-id[x]}.perfmon.<query type>-<response type>.<start event>-<end event>.<value-name>
+
+And there will be both a global and a subnet-specific value for each. Currently the only
+value reported for a given duration key is "average-ms". This statistic is the average time
+between the duration's event pair over the most recently completed interval. In other
+words if during a given interval there were 7 occurrences (i.e. updates) totaling
+350ms, the the average-ms reported would be 50ms. Continuing with example above, the
+statistics reported would be named as follows for subnet level values:
+
+::
+
+ subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.average-ms
+ subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.average-ms
+ subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.average-ms
+ subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.process_started-process_completed.average-ms
+ subnet[100].perfmon.DHCPDISCOVER.DHCPOFFER.composite-total_response.average-ms
+
+and as shown for global values:
+
+::
+
+ perfmon.DHCPDISCOVER.DHCPOFFER.socket_received-buffer_read.average-ms
+ perfmon.DHCPDISCOVER.DHCPOFFER.buffer_read-mt_queue.average-ms
+ perfmon.DHCPDISCOVER.DHCPOFFER.mt_queued-process_started.average-ms
+ perfmon.DHCPDISCOVER.DHCPOFFER.process_started-process_completed.average-ms
+ perfmon.DHCPDISCOVER.DHCPOFFER.composite-total_response.average-ms
+
+Since they are reported to StatsMgr they may be fetched using the commands :isccmd:`statistic-get-all`
+or :isccmd:`statistic-get`.
Alarms
~~~~~~
-Alarms may be defined to watch specific durations. Each alarm defines a high-water mark and a
-low-water mark. When the reported average value for duration exceeds the high-water mark, a
-WARN level alarm log will be emitted at which point the alarm is considered "triggered". Once
-triggered the WARN level log will be repeated at a specified alarm report interval as long the
-reported average for the duration remains above the low-water mark. Once the average falls
-below the low-water mark the alarm is "cleared" and an INFO level log will be emitted.
+Alarms may be defined to watch specific durations. Each alarm defines a high-water mark,
+``high-water-ms`` and a low-water mark, ``low-water-ms``. When the reported average value
+for duration exceeds the high-water mark, a WARN level alarm log will be emitted at which
+point the alarm is considered "triggered". Once triggered the WARN level log will be
+repeated at a specified alarm report interval, ``alarm-report-secs`` as long the reported
+average for the duration remains above the low-water mark. Once the average falls below the
+low-water mark the alarm is "cleared" and an INFO level log will be emitted.
+
+The alarm triggered WARN log will look similar to the following:
+
+::
+
+ 2024-03-20 10:22:14.030 WARN [kea-dhcp6.leases/47195.139913679886272] PERFMON_ALARM_TRIGGERED Alarm for DHCPDISCOVER.DHCPOFFER.composite-total_response.0 has been triggered since 2024-03-20 10:18:20.070000, reported average duration 00:00:00.700000 exceeds high-water-ms: 500
+
+
+and he alarm cleared INFO log will look similar to the following:
+
+::
-.. Note:
- Alarms are not yet functional.
+ 2024-03-20 10:30:14.030 INFO [kea-dhcp6.leases/47195.139913679886272] PERFMON_ALARM_CLEARED Alarm for DHCPDISCOVER.DHCPOFFER.composite-total_response.0 has been cleared, reported average duration 00:00:00.010000 is now below low-water-ms: 25
API Commands
~~~~~~~~~~~~
@@ -153,7 +258,7 @@ An example of the anticipated configuration is shown below:
Where:
* enable-monitoring
- Enables event data aggregation for reporting, statisitcs, and alarms. Defaults to false.
+ Enables event data aggregation for reporting, statistics, and alarms. Defaults to false.
* interval-width-secs
The amount of time, in seconds, that individual task durations are accumulated into an
aggregate before it is reported. Default is 60 seconds.
@@ -167,7 +272,7 @@ Where:
defined by the following:
* duration-key
- Idenitifies the monitored duration to watch
+ Identifies the monitored duration to watch
* query-type - Message type of the client query (e.g.DHCPDISCOVER, DHCPV6_REQUEST)
* response-type - Message type of the server response (e.g. DHCPOFFER, DHCPV6_REPLY)