.\" This file was originally generated by help2man 1.36.
.TH WATCHFRR 8 "July 2010"
.SH NAME
watchfrr \- a program to monitor the status of FRR daemons
.SH SYNOPSIS
.B watchfrr
.RI [ option ...]
.IR daemon ...
.br
.B watchfrr
.BR \-h " | " \-v
.SH DESCRIPTION
.B watchfrr
is a watchdog program that monitors the status of the supplied FRR
.IR daemon s
and tries to restart them if they become unresponsive or shut down.
.PP
To determine whether a daemon is running, watchfrr tries to connect to the
daemon's VTY UNIX stream socket and sends echo commands to verify that the
daemon responds. When a daemon crashes, watchfrr receives EOF on the socket
and can react immediately.
.PP
This program can run in one of the following five modes:
.TP
.B Mode 0: monitor
In this mode, the program serves as a monitor and reports status changes.
.IP
Example usage: watchfrr \-d zebra ospfd bgpd
.TP
.B Mode 1: global restart
In this mode, whenever a daemon hangs or crashes, the given command is used
to restart all watched daemons.
.IP
Example usage: watchfrr \-dz \e
.br
\-R '/sbin/service zebra restart; /sbin/service ospfd restart' \e
.br
zebra ospfd
.TP
.B Mode 2: individual daemon restart
In this mode, whenever a single daemon hangs or crashes, the given command
is used to restart this daemon only.
.IP
Example usage: watchfrr \-dz \-r '/sbin/service %s restart' \e
.br
zebra ospfd bgpd
.TP
.B Mode 3: phased zebra restart
In this mode, whenever a single daemon hangs or crashes, the given command
is used to restart this daemon only. The only exception is the zebra
daemon. When zebra hangs or crashes, the following steps are taken: (1) all
other daemons are stopped, (2) zebra is restarted, and (3) the other
daemons are started again.
.IP
Example usage: watchfrr \-adz \-r '/sbin/service %s restart' \e
.br
\-s '/sbin/service %s start' \e
.br
\-k '/sbin/service %s stop' zebra ospfd bgpd
.TP
.B Mode 4: phased global restart for any failure
In this mode, whenever a single daemon hangs or crashes, the following
steps are taken: (1) all other daemons are stopped, (2) zebra is restarted,
and (3) other daemons are started again.
.IP
Example usage: watchfrr \-Adz \-r '/sbin/service %s restart' \e
.br
\-s '/sbin/service %s start' \e
.br
\-k '/sbin/service %s stop' zebra ospfd bgpd
.PP
Important: It is believed that mode 2 (individual daemon restart) is not
safe, and mode 3 (phased zebra restart) may not be safe with certain
routing daemons.
.PP
In order to avoid restarting the daemons in quick succession, you can
supply the
.B \-m
and
.B \-M
options to set the minimum and maximum delay between the restart commands.
The minimum restart delay is recalculated each time a restart is attempted.
If the time since the last restart attempt exceeds twice the value of
.BR \-M ,
the restart delay is set to the value of
.BR \-m ;
otherwise, the delay is doubled (but capped at the value of
.BR \-M ).
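.PP
The delay rule above can be sketched in Python as follows. This is an
illustration only, not the watchfrr source: the function name
.I next_restart_delay
and its parameters are hypothetical, and the default arguments simply mirror
the documented defaults of
.B \-m
and
.BR \-M .
.PP
.nf
```python
def next_restart_delay(current, elapsed, min_delay=60, max_delay=600):
    """Return the delay, in seconds, to apply before the next restart.

    current   -- delay used for the previous restart attempt
    elapsed   -- seconds since the previous restart attempt
    min_delay -- value of the -m option (hypothetical parameter name)
    max_delay -- value of the -M option (hypothetical parameter name)
    """
    if elapsed > 2 * max_delay:
        # The last attempt was long ago: fall back to the minimum delay.
        return min_delay
    # The last attempt was recent: double the delay, capped at the maximum.
    return min(2 * current, max_delay)
```
.fi
.PP
In this sketch, starting from a delay of 60 seconds, restarts that follow
one another in quick succession yield delays of 120, 240, 480, and then 600
seconds.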
.SH OPTIONS
.TP
.BR \-d ", " \-\-daemon
Run in daemon mode. When supplied, error messages are sent to syslog
instead of standard output (stdout).
.TP
.BI \-S " directory" "\fR, \fB\-\-statedir " directory
Set the VTY socket
.I directory
(the default value is "/var/run/frr").
.TP
.BR \-e ", " \-\-no\-echo
Do not ping the daemons to test whether they respond. This option is
necessary if one or more daemons do not support the echo command.
.TP
.BI \-l " level" "\fR, \fB\-\-loglevel " level
Set the logging
.I level
(the default value is "6"). The value should range from 0 (LOG_EMERG) to 7
(LOG_DEBUG), but a higher number can be supplied if extra debugging messages
are required.
.TP
.BI \-m " number" "\fR, \fB\-\-min\-restart\-interval " number
Set the minimum
.I number
of seconds to wait between invocations of the daemon restart commands (the
default value is "60").
.TP
.BI \-M " number" "\fR, \fB\-\-max\-restart\-interval " number
Set the maximum
.I number
of seconds to wait between invocations of the daemon restart commands (the
default value is "600").
.TP
.BI \-i " number" "\fR, \fB\-\-interval " number
Set the status polling interval in seconds (the default value is "5").
.TP
.BI \-t " number" "\fR, \fB\-\-timeout " number
Set the unresponsiveness timeout in seconds (the default value is "10").
.TP
.BI \-T " number" "\fR, \fB\-\-restart\-timeout " number
Set the restart (kill) timeout in seconds (the default value is "20"). If
any background jobs are still running after this period has elapsed, they
will be killed.
.TP
.BI \-r " command" "\fR, \fB\-\-restart " command
Supply a Bourne shell
.I command
to restart a single daemon. The command string should contain the '%s'
placeholder to be substituted with the daemon name.
.IP
Note that the
.B \-r
and
.B \-R
options are mutually exclusive.
.TP
.BI \-s " command" "\fR, \fB\-\-start\-command " command
Supply a Bourne shell
.I command
to start a single daemon. The command string should contain the '%s'
placeholder to be substituted with the daemon name.
.TP
.BI \-k " command" "\fR, \fB\-\-kill\-command " command
Supply a Bourne shell
.I command
to stop a single daemon. The command string should contain the '%s'
placeholder to be substituted with the daemon name.
.TP
.BR \-R ", " \-\-restart\-all
When one or more daemons are shut down, try to restart them using the
Bourne shell command supplied on the command line.
.IP
Note that the
.B \-r
and
.B \-R
options are mutually exclusive.
.TP
.BR \-z ", " \-\-unresponsive\-restart
When a daemon is unresponsive, treat it as shut down for restart
purposes.
.TP
.BR \-a ", " \-\-all\-restart
When zebra hangs or crashes, restart all daemons by taking the following
steps: (1) stop all other daemons, (2) restart zebra, and (3) start the
other daemons again.
.IP
Note that this option also requires
.BR \-r ,
.BR \-s ,
and
.B \-k
options to be specified.
.TP
.BR \-A ", " \-\-always\-all\-restart
When any daemon (i.e., not just zebra) hangs or crashes, restart all
daemons by taking the following steps: (1) stop all other daemons,
(2) restart zebra, and (3) start the other daemons again.
.IP
Note that this option also requires
.BR \-r ,
.BR \-s ,
and
.B \-k
options to be specified.
.TP
.BI \-p " filename" "\fR, \fB\-\-pid\-file " filename
Set the process identifier
.I filename
(the default value is "/var/run/frr/watchfrr.pid").
.TP
.BI \-b " string" "\fR, \fB\-\-blank\-string " string
When the supplied
.I string
is found in any of the command line option arguments (i.e.,
.BR \-r ,
.BR \-s ,
.BR \-k ,
or
.BR \-R ),
replace it with a space.
.IP
This is an ugly hack to circumvent problems with passing command-line
arguments that contain embedded spaces.
.TP
.BR \-v ", " \-\-version
Display the version information and exit.
.TP
.BR \-h ", " \-\-help
Display the usage information and exit.
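.PP
The placeholder handling described for the
.BR \-r ,
.BR \-s ,
.BR \-k ,
and
.B \-b
options can be illustrated with a short Python sketch. The function name
.I expand_command
is hypothetical, plain string replacement stands in for whatever formatting
watchfrr performs internally, and the order of the two steps is an
assumption.
.PP
.nf
```python
def expand_command(template, daemon, blank=None):
    """Expand a command template for one daemon (illustrative only).

    template -- command string containing the "%s" placeholder
    daemon   -- daemon name substituted for "%s"
    blank    -- optional string (as given to -b) replaced with a space
    """
    if blank:
        # Turn every occurrence of the blank string into a real space.
        template = template.replace(blank, " ")
    # Substitute the daemon name for the placeholder.
    return template.replace("%s", daemon)


print(expand_command("/sbin/service_%s_restart", "zebra", blank="_"))
```
.fi
.PP
With a blank string of "_", the template above expands to the same command
line as the template "/sbin/service %s restart" used in the mode examples.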
.SH SEE ALSO
.BR zebra (8),
.BR bgpd (8),
.BR isisd (8),
.BR ospfd (8),
.BR ospf6d (8),
.BR ripd (8),
.BR ripngd (8)
.PP
See the project homepage at <@PACKAGE_URL@>.
.SH AUTHORS
Copyright 2004 Andrew J. Schorr