25.1. Parsing Various Formats
After an input module has received a log message and generated an event record for it, additional parsing may be required. This parsing can be implemented by a dedicated module, or in the NXLog language with regular expressions and other string manipulation functions.
The following sections provide configuration examples for parsing log formats commonly used by applications.
25.1.1. Common & Combined Log Formats
The Common Log Format (or NCSA Common Log Format) and Combined Log Format are access log formats used by web servers. The two formats are identical, except that the Combined Log Format adds two fields at the end: referer and user-agent.
host ident authuser [date] "request" status size
host ident authuser [date] "request" status size "referer" "user-agent"
If a field is not available, a hyphen (-) is used as a placeholder.
Field | Description |
---|---|
host | IP address of the client |
ident | RFC 1413 identity of the client |
authuser | Username of the user accessing the document (not applicable for public documents) |
date | Timestamp of the request |
request | Request line received from the client |
status | HTTP status code returned to the client |
size | Size of the object returned to the client (measured in bytes) |
referer | URL from which the user was referred |
user-agent | User agent string sent by the client |
This configuration uses a regular expression to parse the fields in each record. The parsedate() function is used to convert the timestamp string into a datetime type for later processing or conversion as required.
<Input access_log>
    Module  im_file
    File    "/var/log/apache2/access.log"
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)/
        {
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
        }
    </Exec>
</Input>
This example is similar to the previous one, but also parses the two additional fields of the Combined Log Format. An om_file instance is also shown here: it discards all events not related to the user john and writes the remaining events to a file in JSON format.
<Extension _json>
    Module  xm_json
</Extension>

<Input access_log>
    Module  im_file
    File    "/var/log/apache2/access.log"
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)\ \"([^\"]+)\"
                          \ \"([^\"]+)\"/
        {
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
            if $8 != '-' $HTTPReferer = $8;
            if $9 != '-' $HTTPUserAgent = $9;
        }
    </Exec>
</Input>

<Output out>
    Module  om_file
    File    '/var/log/john_access.log'
    <Exec>
        if not (defined($AccountName) and ($AccountName == 'john')) drop();
        to_json();
    </Exec>
</Output>
For information about using the Common and Combined Log Formats with the Apache HTTP Server, see the Apache HTTP Server chapter.
25.1.2. Field Delimited Formats (CSV)
Comma-, space-, or semicolon-separated field list formats are frequently used. The xm_csv module can both generate and parse these formats. Multiple xm_csv instances can be used to reorder, add, remove, or modify fields before outputting to a different CSV format.
This example reads from the input file and parses it with the csv1 instance. The $date field is then set to the current time and the $number field is set to 0 if it is not already defined. Finally, the csv2 instance is used to generate output with the additional date field, a different delimiter, and a different field order.
<Extension csv1>
    Module      xm_csv
    Fields      $id, $name, $number
    FieldTypes  integer, string, integer
    Delimiter   ,
</Extension>

<Extension csv2>
    Module      xm_csv
    Fields      $id, $number, $name, $date
    Delimiter   ;
</Extension>

<Input filein>
    Module  im_file
    File    "/tmp/input"
    <Exec>
        csv1->parse_csv();
        $date = now();
        if not defined $number $number = 0;
        csv2->to_csv();
    </Exec>
</Input>

<Output fileout>
    Module  om_file
    File    "/tmp/output"
</Output>
Input sample:
1, "John K.", 42
2, "Joe F.", 43

Output sample:
1;42;"John K.";2011-01-15 23:45:20
2;43;"Joe F.";2011-01-15 23:45:20
25.1.3. JSON
The xm_json module provides procedures for generating and parsing log data in JSON format.
This example reads JSON-formatted messages from a file with the im_file module. The parse_json() procedure then parses the data, setting a field in the event record for each JSON field.
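The configuration for this example is not shown in this section, so the following is only a minimal sketch of what it might look like. The instance names (_json, json_in) and the file path are hypothetical, and the sketch assumes that each line of the file holds one complete JSON object, since im_file reads line by line by default.

```
<Extension _json>
    Module  xm_json
</Extension>

<Input json_in>
    Module  im_file
    # Hypothetical path; point this at the actual JSON log file
    File    "/var/log/app.json"
    # Parse $raw_event as JSON and set one event field per JSON key
    Exec    parse_json();
</Input>
```

Under these assumptions, the parsed fields would be available to later processing in the same way as the fields set by the regular expression examples above.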
25.1.4. W3C Extended Log File Format
See the specification draft of the W3C format. The dedicated xm_w3c parser module can be used to process W3C formatted logs. See also the W3C section in the Microsoft IIS chapter.
#Version: 1.0
#Date: 2011-07-01 00:00:00
#Fields: date time cs-method cs-uri
2011-07-01 00:34:23 GET /foo/bar1.html
2011-07-01 12:21:16 GET /foo/bar2.html
2011-07-01 12:45:52 GET /foo/bar3.html
2011-07-01 12:57:34 GET /foo/bar4.html
This configuration reads the W3C format log file and parses it with the xm_w3c module. The fields in the event record are converted to JSON and the logs are forwarded via TCP.
<Extension _json>
    Module  xm_json
</Extension>

<Extension w3c_parser>
    Module  xm_w3c
</Extension>

<Input w3c>
    Module      im_file
    File        '/var/log/httpd-log'
    InputType   w3c_parser
</Input>

<Output tcp>
    Module  om_tcp
    Host    192.168.12.1
    Port    1514
    Exec    to_json();
</Output>
The W3C format can also be parsed with the xm_csv module if using NXLog Community Edition.
The following configuration reads a W3C file and tokenizes it with the CSV parser. Header lines starting with a leading hash mark (#) are ignored. The $EventTime field is set from the parsed date and time fields.
Note: The fields in the xm_csv module instance below must be updated to correspond with the fields in the W3C file to be parsed.
<Extension w3c_parser>
    Module          xm_csv
    Fields          $date, $time, $HTTPMethod, $HTTPURL
    FieldTypes      string, string, string, string
    Delimiter       ' '
    EscapeChar      '"'
    QuoteChar       '"'
    EscapeControl   FALSE
    UndefValue      -
</Extension>

<Extension _json>
    Module  xm_json
</Extension>

<Input w3c>
    Module  im_file
    File    '/var/log/httpd-log'
    <Exec>
        if $raw_event =~ /^#/ drop();
        else
        {
            w3c_parser->parse_csv();
            $EventTime = parsedate($date + " " + $time);
        }
    </Exec>
</Input>
25.1.5. XML
The xm_xml module can be used for generating and parsing structured data in XML format.
This configuration uses the im_file module to read from a file. The parse_xml() procedure then parses the XML into fields in the event record.
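The configuration itself is not shown in this section, so the following is only a minimal sketch of how it might look. The instance names (_xml, xml_in) and the file path are hypothetical, and the sketch assumes that each line of the file holds one complete XML record, since im_file reads line by line by default.

```
<Extension _xml>
    Module  xm_xml
</Extension>

<Input xml_in>
    Module  im_file
    # Hypothetical path; point this at the actual XML log file
    File    "/var/log/app.xml"
    # Parse $raw_event as XML and set event fields from its elements
    Exec    parse_xml();
</Input>
```

If the XML records span multiple lines, this sketch would not be sufficient as written, because each line would be passed to parse_xml() separately.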