Centralized log collection, log aggregation, or log centralization is the process of sending event log data to a dedicated server or service for storage and, optionally, search and analytics. Storing logs on a centralized system provides several benefits over storing the data locally.
- Event data can be accessed even if the originating server is offline, compromised, or decommissioned.
- Data can be analyzed and correlated across more than one system.
- It is more difficult for malicious actors to remove evidence from logs that have already been forwarded.
- Incident investigation and auditing are easier, as all event data is collected in one location.
- Scalability, high availability, and redundancy are easier to implement and maintain because they can be handled at the collection server.
- Compliance with internal and external standards for log data retention can be managed at a single point.
The following diagram shows a simple centralized log architecture. The single central server collects logs from servers, applications, and network devices. After collection, the logs can be forwarded as required for further analysis or storage.
This chapter is concerned with the left half of the diagram: collecting logs from clients.
In practice, network topology and other requirements may dictate that additional servers, such as relays, be added for log handling. In those cases, functionality beyond what is covered here (such as buffering) may be necessary.
In the context of clients generating logs, NXLog supports both "agent-based" and "agent-less" log collection, and it is possible to set up a system to work in mixed mode. In brief, these modes differ as follows (see the Log Processing Modes section for more details).
Agent-based log collection requires that an NXLog agent be installed on the client. With a local agent, collection is much more flexible, providing features such as filtering on the source system to send only the required data, format conversion, compression, encryption, and delivery reliability, among others. It is generally recommended that NXLog be deployed as an agent wherever possible.
With agent-based log collection, NXLog agents are installed on both the client and the central server. Here, the im_batchcompress and om_batchcompress modules are used to transport logs both compressed and encrypted. These modules preserve all the fields in the event record.
<Output batch>
    Module      om_batchcompress
    Host        192.168.56.101
    Port        2514
    UseSSL      TRUE
    CAFile      /opt/openssl_rootca/rootCA.pem
    CertFile    /opt/openssl_server/server.crt
    CertKeyFile /opt/openssl_server/server.key
</Output>
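On the central server, the matching input side might look like the following. This is a minimal sketch rather than a definitive configuration: the instance name batch_in, the listen address, and the reuse of the client example's certificate paths are assumptions for illustration, and a real server would normally have its own certificate and key.

```
<Input batch_in>
    Module      im_batchcompress
    # Assumed: listen on all interfaces, on the port the client sends to
    ListenAddr  0.0.0.0
    Port        2514
    # Assumed: paths mirror the client example; adjust to the server's own files
    CAFile      /opt/openssl_rootca/rootCA.pem
    CertFile    /opt/openssl_server/server.crt
    CertKeyFile /opt/openssl_server/server.key
</Input>
```

Because the batch compression transport carries the event fields with it, no additional parsing step should be needed on the server for events received this way.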
In agent-less mode, there is no NXLog agent installed on the client. Instead, the client forwards events to the central server in a native format. On the central server, NXLog accepts and parses the logs received. Often there is limited control over the log format used, and it may not be possible to implement encryption, compression, delivery reliability, or other features.
With agent-less collection, NXLog is installed on the central server but not on the client. Clients can be configured to send UDP Syslog messages to the central server using their native logging functionality. The im_udp module below could be replaced with im_tcp or im_ssl, depending on which protocol the clients support.
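A server-side input for this case could be sketched as follows. The instance name udp_in, the listen address, and the port are assumptions (514 is the conventional Syslog UDP port), and depending on the NXLog version the listen address may be specified with a different directive, so the im_udp reference for the installed version should be checked.

```
<Extension _syslog>
    Module  xm_syslog
</Extension>

<Input udp_in>
    Module  im_udp
    # Assumed listen address and port
    Host    0.0.0.0
    Port    514
    # Parse each received message as Syslog into event fields
    Exec    parse_syslog();
</Input>
```

The xm_syslog extension is loaded here only to make parse_syslog() available. Without it, the received message would remain unparsed in $raw_event.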
Note: UDP transport does not provide any guarantee of delivery. Network congestion or other issues may result in lost log data.
It is common for logs to be collected using a mix of different modes among the various clients, network devices, relays, and log servers in a network. For example, an NXLog relay may be configured to collect logs from both agents and agent-less sources and do filtering and processing before forwarding the data to a central server.
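As a rough illustration of such a relay, a single route can take events from both an agent-based input and an agent-less input and forward them to one output. The instance names batch_in, udp_in, and central_out are hypothetical, the two inputs are assumed to be defined elsewhere in the same configuration, and the address 192.0.2.10 is a placeholder from the documentation address range.

```
<Output central_out>
    Module  om_batchcompress
    # Placeholder address of the central server
    Host    192.0.2.10
    Port    2514
</Output>

<Route relay>
    # batch_in: events from NXLog agents (assumed im_batchcompress input)
    # udp_in:   Syslog from agent-less clients (assumed im_udp input)
    Path    batch_in, udp_in => central_out
</Route>
```

Any filtering or processing would be added as Exec statements on the inputs or the output. It is omitted here to keep the sketch focused on the routing.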
Various logging requirements may dictate particular details about the chosen logging architecture. The following are important things to consider when deciding how to set up centralized log collection. In some cases, these requirements can only be met when using agent-based collection.
- Reliability
UDP does not guarantee message delivery and should be avoided if log data loss is not acceptable. TCP (and therefore TLS, which runs over TCP) provides reliable, acknowledged delivery at the transport level. In addition, with agent-based collection, NXLog can provide application-level guaranteed delivery. See Reliable Network Delivery for more information.
- Structured data
Correlating data across multiple log sources requires parsing event data into a common set of fields. Event fields are a core part of NXLog processing, and an NXLog agent can be configured to parse events at any point along the log path. Often, parsing is done as early as possible (at the source, for agent-based collection) to simplify later classification and to reduce the processing load on log servers as logs are transported. See Parsing Various Formats and Message Classification.
- Encryption
To maintain confidentiality of log data, TLS can be used during transport.
- Compression
If bandwidth is a concern, log data compression may be desirable. Event data is normally quite compressible, allowing bandwidth requirements to be reduced significantly. The im_batchcompress and om_batchcompress modules provide a batched, compressed transport for log data between NXLog agents.
- Storage format
When log sources are heterogeneous, data should normally be converted to and stored in a unified format.
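To make the reliability and encryption considerations above concrete, an agent can send over TLS instead of UDP by using the om_ssl module. The following is only a sketch: the instance name tls_out and port 6514 are assumptions, and the host and certificate paths are borrowed from the earlier client example.

```
<Output tls_out>
    Module          om_ssl
    # Assumed: same central server as in the earlier example
    Host            192.168.56.101
    Port            6514
    CAFile          /opt/openssl_rootca/rootCA.pem
    CertFile        /opt/openssl_server/server.crt
    CertKeyFile     /opt/openssl_server/server.key
    # Refuse to connect if the server certificate cannot be verified
    AllowUntrusted  FALSE
</Output>
```

The central server would need a corresponding im_ssl input listening on the same port.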
When using agent-based collection, it is often desirable to convert the data prior to transfer. In this case, structured data is often sent using one of these formats.
- Batch compression modules
- NXLog binary format
NXLog has its own binary format (see Binary InputType and Binary OutputType) that retains all the fields of an event and can be used to send logs via TCP, UDP, or TLS (or with other stream-oriented modules).
- JSON
JSON is easy to generate and parse and has become a de facto standard for logging. It has some limitations, such as the lack of a native datetime type. See the JSON section.
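The last two formats above might be selected on the agent as shown below. The two outputs are alternatives, not a pair; the instance names, the ports, and the use of plain om_tcp are assumptions made to keep the sketch short.

```
<Extension _json>
    Module  xm_json
</Extension>

# Alternative 1: NXLog binary format, which retains all event fields
<Output binary_out>
    Module      om_tcp
    Host        192.168.56.101
    Port        1514
    OutputType  Binary
</Output>

# Alternative 2: JSON, one object per event
<Output json_out>
    Module  om_tcp
    Host    192.168.56.101
    Port    1515
    # to_json() serializes the event fields into $raw_event
    Exec    to_json();
</Output>
```

On the receiving side, the binary variant would be read with InputType Binary on the corresponding input, while the JSON variant would be parsed with parse_json() from the same xm_json extension.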
Agent-less collection is restricted to formats supported by the clients. The following are a few common formats, but many more are supported. See also the OS Support chapters.
- Syslog
Using Syslog has become common practice, and many SIEM vendors and products support (and may even require) Syslog. See the Syslog chapter for more details. Syslog contains free-form message data that typically needs to be parsed to extract more information for further analysis. Syslog often uses UDP, TCP, or TLS as transport.
- Snare
The Snare format is commonly used to transport Windows EventLog, with or without Syslog headers.
- Windows Event Forwarding (WEF)
Windows EventLog can be forwarded over HTTPS with Windows Event Forwarding. See the Windows Event Log chapter.