30. Reliable Message Delivery
Sometimes regulatory compliance or other requirements mandate that the logging infrastructure function in an ultra-reliable manner. NXLog Enterprise Edition can be configured to guarantee that:
- log data is safe even in case of a crash,
- no messages are lost due to intermittent network issues, and
- there is no message duplication.
See also Using Buffers.
30.1. Crash-Safe Operation
A host or NXLog crash can happen for various reasons, including power failures without a UPS, kernel panics, and software bugs. To protect against data loss in these situations, the following techniques are implemented in NXLog Enterprise Edition.
- Log messages are buffered in various places in NXLog, and buffered messages can be lost in the case of a crash. Persistent module message queues can be enabled so that these messages are stored on disk instead of in memory. Each log message is removed from the queue only after successful delivery. See the PersistLogqueue and SyncLogqueue global configuration directives, and the PersistLogqueue and SyncLogqueue module directives.

  Warning: Log message removal from queues in processor modules happens before delivery. This can result in potential data loss. Do not use processor modules when high reliability operation is required.
- Input positions (for im_file and other modules) are saved in the cache file, and by default this file is only saved to disk on shutdown. In case of a crash some events may be duplicated or lost depending on the value of the ReadFromLast directive. This data can be periodically flushed and synced to disk using the CacheFlushInterval and CacheSync directives.
In this example, the log queues are synced to disk after each successful delivery. The cache file containing the current event ID is also flushed and synced to disk after each event is read from the database. Note that these reliability features, when enabled, significantly reduce the processing speed.
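The remove-after-delivery behavior of a persistent queue can be sketched in Python. This is only an illustration of the idea, not NXLog's queue format or code: the `PersistentQueue` class, its JSON-lines file, and the `send` callback are all hypothetical. The `sync` flag plays a role loosely comparable to SyncLogqueue, forcing every change to disk at the cost of throughput.

```python
import json
import os


class PersistentQueue:
    """Hypothetical disk-backed FIFO; not NXLog's actual queue format."""

    def __init__(self, path, sync=True):
        self.path = path
        self.sync = sync  # fsync after every change, trading speed for safety
        self.items = []
        if os.path.exists(path):
            # Reload whatever survived the previous run (or crash).
            with open(path, encoding="utf-8") as f:
                self.items = [json.loads(line) for line in f if line.strip()]

    def _save(self):
        # Write a new file and atomically swap it in, so a crash in the
        # middle of a write leaves the previous queue contents intact.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for item in self.items:
                f.write(json.dumps(item) + "\n")
            f.flush()
            if self.sync:
                os.fsync(f.fileno())
        os.replace(tmp, self.path)

    def push(self, message):
        self.items.append(message)
        self._save()

    def deliver_next(self, send):
        """Send the oldest message; remove it only if send() returns True."""
        if not self.items:
            return False
        if send(self.items[0]):
            self.items.pop(0)
            self._save()
            return True
        return False
```

Because the message is removed only after `send` reports success, a crash between the two steps leaves the message on disk and it is sent again after a restart. Rewriting the whole file on every change keeps the sketch short but would be far too slow for real use.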
30.2. Reliable Network Delivery
The TCP protocol provides guaranteed packet delivery via packet level acknowledgment. Unfortunately, if the receiver closes the TCP connection prematurely while messages are being transmitted, unsent data stored in the socket buffers will be lost, because these buffers are managed by the operating system rather than by the application (NXLog). This can result in message loss and affects im_tcp, om_tcp, im_ssl, and om_ssl. See the diagram in All Buffers in a Basic Route.
The solution to this unreliability in the TCP protocol is application-level acknowledgment. NXLog provides two pairs of modules for this purpose.
- NXLog can use the HTTP/HTTPS protocol to provide guaranteed message delivery over the network, optionally with TLS/SSL. The client (om_http) sends the event in an HTTP POST request. The server (im_http, only available in NXLog Enterprise Edition) responds with a status code indicating successful message reception.

  Example 152. HTTPS Log Transfer

  In the following configuration example, a client reads logs from a file and transmits the logs over an SSL-secured HTTP connection.

  <Input in>
      Module  im_file
      File    'input.log'
  </Input>

  <Output out>
      Module              om_http
      URL                 https://10.0.0.1:8080/
      HTTPSCertFile       %CERTDIR%/client-cert.pem
      HTTPSCertKeyFile    %CERTDIR%/client-key.pem
      HTTPSCAFile         %CERTDIR%/ca.pem
  </Output>

  The server side accepts the HTTPS connections and stores the received messages in a file. The contents of input.log will be replicated in output.log.

- The om_batchcompress and im_batchcompress modules, available in NXLog Enterprise Edition, also provide acknowledgment as part of the batchcompress protocol.

  Example 153. Batched Log Transfer

  With the following configuration, a client reads logs from a file and transmits the logs in compressed batches to a remote NXLog agent.

  <Input in>
      Module  im_file
      File    'input.log'
  </Input>

  <Output out>
      Module      om_batchcompress
      Host        10.0.0.1
      UseSSL      true
      CertFile    %CERTDIR%/client-cert.pem
      CertKeyFile %CERTDIR%/client-key.pem
      CAFile      %CERTDIR%/ca.pem
  </Output>

  The remote NXLog agent receives and decompresses the received message batches and stores the individual messages in a file. The contents of input.log will be replicated in output.log.
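The idea of application-level acknowledgment can be illustrated with a short Python sketch. It is not how om_http or om_batchcompress are implemented. The `send_with_ack` function, the `post` callable (standing in for an HTTP POST that returns a status code), and the `max_attempts` limit are assumptions made for illustration.

```python
def send_with_ack(pending, post, max_attempts=3):
    """Post each message; keep it in `pending` until a 2xx status confirms receipt.

    `post` is a hypothetical callable that returns an HTTP status code or
    raises ConnectionError; it stands in for an HTTP POST to the receiver.
    """
    while pending:
        message = pending[0]
        for _ in range(max_attempts):
            try:
                status = post(message)
            except ConnectionError:
                continue  # connection dropped: the message is still pending
            if 200 <= status < 300:
                pending.pop(0)  # acknowledged by the receiving application
                break
        else:
            # No attempt was acknowledged; leave the message queued for later.
            return False
    return True
```

The sender treats a message as delivered only when the receiving application says so, not when the operating system accepts the bytes. The cost is that a message may be retransmitted if the acknowledgment itself is lost, which is why the receiver also needs protection against duplicates.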
30.3. Protection Against Duplication
If the contents of the cache file (which stores the event position) are lost, the module can either read everything from the beginning or risk losing some messages. In the former case, messages may be duplicated. With persistent queues, the output module removes a message from the queue only after delivering it. If a crash occurs after delivery but before the removal, the message will be sent again (a duplicate) after the restart.
In some cases it may be very important that a log message not be duplicated. For example, a duplicated message may trigger the same alarm a second time or cause an extra entry in a financial transaction log. NXLog Enterprise Edition can be configured to prevent duplicate messages from occurring.
The best way to prevent duplicated messages is with a serial number, as it is only possible to detect duplicates at the receiver. The receiver can keep track of what has been received by storing the serial number of the last message. If a message is received with the same or a lower serial number from the same source, the message is simply discarded.
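The receiver-side check described above can be sketched in Python as follows. The `DuplicateFilter` class and its method names are hypothetical and only illustrate the rule of discarding any message whose serial number is not higher than the last one seen from the same source.

```python
class DuplicateFilter:
    """Tracks the last serial number accepted from each source."""

    def __init__(self):
        self.last_serial = {}  # source identifier -> last accepted serial

    def accept(self, source, serial):
        """Return True for a new message, False for a duplicate to discard."""
        last = self.last_serial.get(source)
        if last is not None and serial <= last:
            return False  # same or lower serial from this source: duplicate
        self.last_serial[source] = serial
        return True
```

The filter keeps only one number per source, so it relies on each source sending serial numbers in increasing order and on the source identifier being unique.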
In NXLog Enterprise Edition, duplication prevention works as follows.
- Each module which receives a message directly from an input source or from another module in the route assigns a field named $__SERIAL__$ with a monotonically increasing serial number. The serial number is taken from a global generator and is increased after each fetch so that two messages received at two modules simultaneously will not have the same serial number. The serial number is initialized to the seconds elapsed since epoch when NXLog is started. This way it can provide 1,000,000 serial numbers per second without problems in case it is stopped and restarted. Otherwise the value would need to be saved and synced to disk after each serial number fetch, which would again result in a huge performance hit. When a module receives a message it checks the value of the field named $__SERIAL__$ against the last saved value.
- The im_http module keeps the value of the last $__SERIAL__$ for each client. It is only possible to know and identify the client (om_http sender) in HTTPS mode. The Common Name (CN) in the certificate subject is used and is assumed to uniquely identify the client.

  Note: The remote IP and port number cannot be used to identify the remote sender because the remote port is assigned dynamically and changes for every connection. Thus if a client sends a message, disconnects, reconnects, and then sends the same message again, it is impossible to know if this is the same client or another. For this reason it is not possible to protect against message duplication with plain TCP or HTTP when multiple clients connect from the same IP. The im_ssl and im_batchcompress modules do not have the certificate subject extraction implemented at this time.
- All other non-network modules use the value of SourceModuleName, which is automatically set to the name of the module instance generating the log message. This value is assumed to uniquely identify the source. The value of SourceModuleName is not overwritten if it already exists. Note that this may present problems in some complex setups.
- The algorithm is implemented in one procedure call named duplicate_guard(), which can be used in modules to prevent message duplication. The dropped() function can then be used to test whether the current log message has been dropped.
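The reason for seeding the serial number from the clock can be shown with a small Python sketch. It rests on an assumption that the text does not state outright: that the start time in seconds is scaled by 1,000,000, so that a restarted generator begins above every serial issued earlier as long as fewer than 1,000,000 serials per second were handed out. The `SerialGenerator` class and the `SERIALS_PER_SECOND` constant are hypothetical names, not part of NXLog.

```python
import threading
import time

SERIALS_PER_SECOND = 1_000_000  # assumed scale factor, inferred from the text


class SerialGenerator:
    """Process-wide counter seeded from the start time instead of from disk."""

    def __init__(self, now=time.time):
        self._lock = threading.Lock()
        # Seed from whole seconds since the epoch, scaled (assumption) so each
        # second of wall-clock time leaves room for SERIALS_PER_SECOND serials.
        self._next = int(now()) * SERIALS_PER_SECOND

    def fetch(self):
        # The lock ensures two modules fetching at once get different serials.
        with self._lock:
            serial = self._next
            self._next += 1
            return serial
```

Under this assumption nothing has to be written to disk on each fetch: a generator started one second later already begins 1,000,000 serials higher than the previous one did.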
The following client and server configuration examples extend the earlier HTTPS example to provide an ultra-reliable operation where messages cannot be lost locally due to a crash, lost over the network, or duplicated.
PersistLogqueue     TRUE
SyncLogqueue        TRUE
CacheFlushInterval  always
CacheSync           TRUE

<Input in>
    Module  im_file
    File    'input.log'
</Input>

<Output out>
    Module              om_http
    URL                 https://10.0.0.1:8080/
    HTTPSCertFile       %CERTDIR%/client-cert.pem
    HTTPSCertKeyFile    %CERTDIR%/client-key.pem
    HTTPSCAFile         %CERTDIR%/ca.pem
    Exec                duplicate_guard();
</Output>
The server side accepts the HTTPS connections and stores the received messages in a file. The contents of input.log will be replicated in output.log.
PersistLogqueue     TRUE
SyncLogqueue        TRUE
CacheFlushInterval  always
CacheSync           TRUE

<Input in>
    Module              im_http
    ListenAddr          0.0.0.0
    Port                8080
    HTTPSCertFile       %CERTDIR%/server-cert.pem
    HTTPSCertKeyFile    %CERTDIR%/server-key.pem
    HTTPSCAFile         %CERTDIR%/ca.pem
    Exec                duplicate_guard();
</Input>

<Output out>
    Module  om_file
    File    'output.log'
    Exec    duplicate_guard();
</Output>