25.5. Detecting a Dead Agent or Log Source
A common requirement is to detect when a source stops sending log messages. Silence usually indicates a problem such as a broken network connection, a server that is down, or a stuck application or system service. Such problems should normally be caught by monitoring tools (such as Nagios or OpenView), but the absence of logs is also a good reason to investigate.
Note: The im_mark module is related. It can generate messages periodically to show that the agent is still functioning.
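As a minimal sketch of the im_mark approach, the following input instance emits a marker message at a fixed interval. The instance name mark and the one-minute interval are illustrative assumptions. MarkInterval is given in minutes, and the message text falls back to the module default when the Mark directive is omitted.

```
<Input mark>
    Module        im_mark
    # Emit one marker message per minute (illustrative value)
    MarkInterval  1
</Input>
```

Routing this instance to the same output as the regular log sources lets the receiving side treat a missing marker as a sign that the agent itself has stopped.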
One solution combines statistical counters with scheduled checks. The input module updates a statistical counter configured to calculate events per hour, and a Schedule block in the same input module periodically checks the value of that counter. When the event rate is zero or drops below a certain limit, an appropriate action can be taken, such as sending an alert email or generating an internal warning message. There are other ways to solve this problem, and this method may not be optimal for all situations.
The following configuration example creates a statistical counter in the context of the im_tcp module to calculate the number of events received per hour. The Schedule block within the same module instance checks the value of the msgrate statistical counter and generates an internal error message when no logs were received in the past hour.
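<Input in>
    Module  im_tcp
    Port    2345
    <Exec>
        # Count every received event in an hourly rate counter
        create_stat("msgrate", "RATE", 3600);
        add_stat("msgrate", 1);
    </Exec>
    <Schedule>
        Every   3600 sec
        <Exec>
            # Create the counter here too, in case no event has arrived yet
            create_stat("msgrate", "RATE", 3600);
            add_stat("msgrate", 0);
            # Adjust the threshold to the expected traffic of the source
            if defined get_stat("msgrate") and get_stat("msgrate") <= 1
                log_error("No messages received from the source!");
        </Exec>
    </Schedule>
</Input>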
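The same pattern can also cover the case where the rate drops below a limit instead of reaching zero. The following is a minimal sketch under stated assumptions, not a tested configuration: the instance name in_lowrate, port 2346, and the limit of 100 events per hour are arbitrary placeholders, and log_warning is used in place of log_error to produce an internal warning message.

```
<Input in_lowrate>
    Module  im_tcp
    Port    2346
    <Exec>
        create_stat("msgrate", "RATE", 3600);
        add_stat("msgrate", 1);
    </Exec>
    <Schedule>
        Every   3600 sec
        <Exec>
            create_stat("msgrate", "RATE", 3600);
            add_stat("msgrate", 0);
            # 100 events per hour is a placeholder limit, not a recommendation
            if defined get_stat("msgrate") and get_stat("msgrate") < 100
                log_warning("Message rate is below the expected limit!");
        </Exec>
    </Schedule>
</Input>
```

A sensible limit depends on the normal traffic of the source, so it is worth observing the typical hourly rate before choosing a value.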