
25.1. Parsing Various Formats

After an input module has received a log message and generated an event record for it, additional parsing may be required. This parsing can be implemented by a dedicated module, or in the NXLog language using regular expressions and other string manipulation functions.

The following sections provide configuration examples for parsing log formats commonly used by applications.

25.1.1. Common & Combined Log Formats

The Common Log Format (or NCSA Common Log Format) and Combined Log Format are access log formats used by web servers. The two are identical, except that the Combined Log Format appends two fields: referer and user-agent.

Common Log Format Syntax
host ident authuser [date] "request" status size
Combined Log Format Syntax
host ident authuser [date] "request" status size "referer" "user-agent"

If a field is not available, a hyphen (-) is used as a placeholder.

Table 24. Fields

Field       Description
host        IP address of the client
ident       RFC 1413 identity of the client
authuser    Username of the user accessing the document (not applicable for public documents)
date        Timestamp of the request
request     Request line received from the client
status      HTTP status code returned to the client
size        Size of the object returned to the client (measured in bytes)
referer     URL from which the user was referred
user-agent  User agent string sent by the client

Example 56. Parsing the Common Log Format

This configuration uses a regular expression to parse the fields in each record. The parsedate() function is used to convert the timestamp string into a datetime type for later processing or conversion as required.

nxlog.conf
<Input access_log>
    Module  im_file
    File    "/var/log/apache2/access.log"
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)/
        {
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
        }
    </Exec>
</Input>
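
To make the capture groups easier to follow, the matching step can be reproduced outside NXLog. The following is a minimal Python sketch, not part of NXLog: the function name parse_common_log and the sample line are assumptions made for illustration, and the timestamp layout is assumed to be the usual Apache one. The field names mirror those assigned in the configuration above.

```python
import re
from datetime import datetime

# Same capture groups as the NXLog regular expression above:
# host, authuser, date, method, URL, status, size.
# The ident field is matched but not captured.
CLF_PATTERN = re.compile(
    r'^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (.+) HTTP/\d\.\d" (\S+) (\S+)'
)


def parse_common_log(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    host, user, date, method, url, status, size = match.groups()
    record = {
        "Hostname": host,
        # Assumed timestamp layout, e.g. 10/Oct/2000:13:55:36 -0700
        "EventTime": datetime.strptime(date, "%d/%b/%Y:%H:%M:%S %z"),
        "HTTPMethod": method,
        "HTTPURL": url,
        "HTTPResponseStatus": status,
    }
    # A hyphen is a placeholder for a missing value, so such fields stay unset.
    if user != "-":
        record["AccountName"] = user
    if size != "-":
        record["FileSize"] = size
    return record


# Hypothetical sample line, used only for illustration.
sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(parse_common_log(sample))
```

As in the NXLog configuration, a line that does not match the pattern yields no parsed fields at all.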
Example 57. Parsing the Combined Log Format

This example extends the previous one by parsing the two additional fields of the Combined Log Format. It also shows an om_file instance that drops every event not associated with the user john and writes the remaining events to a file in JSON format.

nxlog.conf
<Extension _json>
    Module  xm_json
</Extension>

<Input access_log>
    Module  im_file
    File    "/var/log/apache2/access.log"
    <Exec>
        if $raw_event =~ /(?x)^(\S+)\ \S+\ (\S+)\ \[([^\]]+)\]\ \"(\S+)\ (.+)
                          \ HTTP\/\d\.\d\"\ (\S+)\ (\S+)\ \"([^\"]+)\"
                          \ \"([^\"]+)\"/
        {
            $Hostname = $1;
            if $2 != '-' $AccountName = $2;
            $EventTime = parsedate($3);
            $HTTPMethod = $4;
            $HTTPURL = $5;
            $HTTPResponseStatus = $6;
            if $7 != '-' $FileSize = $7;
            if $8 != '-' $HTTPReferer = $8;
            if $9 != '-' $HTTPUserAgent = $9;
        }
    </Exec>
</Input>

<Output out>
    Module  om_file
    File    '/var/log/john_access.log'
    <Exec>
        if not (defined($AccountName) and ($AccountName == 'john')) drop();
        to_json();
    </Exec>
</Output>
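
The parse, filter, and serialize steps of this example can be sketched in Python as follows. This only illustrates the logic and is not how NXLog implements it. The names parse_combined and john_events_as_json and the sample lines are hypothetical, and the timestamp is left as a plain string for brevity.

```python
import json
import re

# Combined Log Format: the Common Log Format pattern plus referer and user-agent.
COMBINED_PATTERN = re.compile(
    r'^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (.+) HTTP/\d\.\d" (\S+) (\S+)'
    r' "([^"]+)" "([^"]+)"'
)
FIELD_NAMES = ("Hostname", "AccountName", "EventTime", "HTTPMethod", "HTTPURL",
               "HTTPResponseStatus", "FileSize", "HTTPReferer", "HTTPUserAgent")
# Fields that the configuration above leaves unset when the value is a hyphen.
OPTIONAL = {"AccountName", "FileSize", "HTTPReferer", "HTTPUserAgent"}


def parse_combined(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED_PATTERN.match(line)
    if match is None:
        return None
    return {
        name: value
        for name, value in zip(FIELD_NAMES, match.groups())
        if not (name in OPTIONAL and value == "-")
    }


def john_events_as_json(lines, user="john"):
    """Keep only events whose AccountName equals `user`, one JSON string each."""
    result = []
    for line in lines:
        record = parse_combined(line)
        # Unparsed lines have no AccountName, so they are discarded as well.
        if record is None or record.get("AccountName") != user:
            continue
        result.append(json.dumps(record))
    return result


# Hypothetical sample lines, used only for illustration.
lines = [
    '192.0.2.10 - john [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.1" '
    '200 512 "http://www.example.com/start.html" "Mozilla/5.0"',
    '192.0.2.11 - - [10/Oct/2000:13:56:01 -0700] "GET /logo.png HTTP/1.1" '
    '200 2326 "-" "curl/8.0"',
]
for event in john_events_as_json(lines):
    print(event)
```

Only the first sample line is kept: the second has a hyphen in the authuser position, so AccountName is never set and the event is dropped.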

For information about using the Common and Combined Log Formats with the Apache HTTP Server, see Apache HTTP Server.

25.1.2. Field Delimited Formats (CSV)

Comma-, space-, or semicolon-separated field list formats are frequently used. The xm_csv module can both generate and parse these formats. Multiple xm_csv instances can be used to reorder, add, remove, or modify fields before outputting to a different CSV format.

Example 58. Complex CSV Format Conversion

This example reads from the input file and parses it with the csv1 instance. The $date field is then set to the current time and the $number field is set to 0 if it is not already defined. Finally, the csv2 instance is used to generate output with the additional date field, a different delimiter, and a different field order.

nxlog.conf
<Extension csv1>
    Module      xm_csv
    Fields      $id, $name, $number
    FieldTypes  integer, string, integer
    Delimiter   ,
</Extension>

<Extension csv2>
    Module      xm_csv
    Fields      $id, $number, $name, $date
    Delimiter   ;
</Extension>

<Input filein>
    Module      im_file
    File        "/tmp/input"
    <Exec>
        csv1->parse_csv();
        $date = now();
        if not defined $number $number = 0;
        csv2->to_csv();
    </Exec>
</Input>

<Output fileout>
    Module      om_file
    File        "/tmp/output"
</Output>
Input Sample
1, "John K.", 42
2, "Joe F.", 43
Output Sample
1;42;"John K.";2011-01-15 23:45:20
2;43;"Joe F.";2011-01-15 23:45:20
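
The same conversion can be sketched with Python's standard csv module to show what happens to each field. This is an illustration under stated assumptions, not NXLog behavior: the function name convert is hypothetical, the timestamp is passed in rather than read from the clock so the result is reproducible, and Python's QUOTE_NONNUMERIC setting also quotes the date string, which the NXLog output sample above does not.

```python
import csv
import io


def convert(text, now):
    """Parse 'id, name, number' rows and emit 'id;number;name;date' rows."""
    # skipinitialspace lets the reader accept the space after each comma.
    reader = csv.reader(io.StringIO(text), delimiter=",", skipinitialspace=True)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", quoting=csv.QUOTE_NONNUMERIC,
                        lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows so a missing number can fall back to the default.
        id_, name, number = (row + ["", "", ""])[:3]
        number = int(number) if number else 0  # default to 0 when undefined
        writer.writerow([int(id_), number, name, now])
    return out.getvalue()


source = '1, "John K.", 42\n2, "Joe F.", 43\n'
print(convert(source, "2011-01-15 23:45:20"), end="")
```

Reordering happens only at write time, which matches the role of the second xm_csv instance: parsing and generation are independent steps that share the event's fields.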

25.1.3. JSON

The xm_json module provides procedures for generating and parsing log data in JSON format.

Example 59. Using the xm_json Module for Parsing JSON

This example reads JSON-formatted messages from a file with the im_file module. The parse_json() procedure then parses each message, storing every JSON key as a field in the event record.

nxlog.conf
<Extension _json>
    Module  xm_json
</Extension>

<Input in>
    Module  im_file
    File    "/var/log/app.json"
    Exec    parse_json();
</Input>
Example 60. Using the xm_json Module for Generating JSON

Here, the to_json() procedure writes all the event record fields to $raw_event in JSON format. The result is then written to a file by the om_file module.

nxlog.conf
<Extension _json>
    Module  xm_json
</Extension>

<Output out>
    Module  om_file
    File    "/var/log/json.log"
    Exec    to_json();
</Output>
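
Conceptually, parsing turns one JSON object per line into a set of named fields, and generation does the reverse. The Python sketch below illustrates that round trip with the standard json module. The helper names, the sample message, and the added Hostname value are assumptions made for illustration only.

```python
import json


def parse_json_line(raw_event):
    """Turn one JSON object into a dict of event fields."""
    fields = json.loads(raw_event)
    if not isinstance(fields, dict):
        raise ValueError("expected a JSON object with one key per field")
    return fields


def to_json_line(fields):
    """Serialize all event fields back into a single JSON line."""
    return json.dumps(fields, separators=(",", ":"))


# Hypothetical sample message, used only for illustration.
raw = '{"EventTime":"2011-07-01 00:34:23","Severity":"ERROR","Message":"disk full"}'
event = parse_json_line(raw)
event["Hostname"] = "web01"  # fields can be added or changed between the two steps
print(to_json_line(event))
```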

25.1.4. W3C Extended Log File Format

The W3C Extended Log File Format is defined in a W3C specification draft. The dedicated xm_w3c parser module can be used to process W3C-formatted logs. See also the W3C section in the Microsoft IIS chapter.

Log Sample
#Version: 1.0
#Date: 2011-07-01 00:00:00
#Fields: date time cs-method cs-uri
2011-07-01 00:34:23 GET /foo/bar1.html
2011-07-01 12:21:16 GET /foo/bar2.html
2011-07-01 12:45:52 GET /foo/bar3.html
2011-07-01 12:57:34 GET /foo/bar4.html
Example 61. Parsing W3C Format With xm_w3c

This configuration reads the W3C format log file and parses it with the xm_w3c module. The fields in the event record are converted to JSON and the logs are forwarded via TCP.

nxlog.conf
<Extension _json>
    Module      xm_json
</Extension>

<Extension w3c_parser>
    Module      xm_w3c
</Extension>

<Input w3c>
    Module      im_file
    File        '/var/log/httpd-log'
    InputType   w3c_parser
</Input>

<Output tcp>
    Module      om_tcp
    Host        192.168.12.1
    Port        1514
    Exec        to_json();
</Output>

The W3C format can also be parsed with the xm_csv module if using NXLog Community Edition.

Example 62. Parsing W3C Format With xm_csv

The following configuration reads a W3C file and tokenizes it with the CSV parser. Header lines beginning with a hash mark (#) are dropped. The $EventTime field is set from the parsed date and time fields.

Note: The fields in the xm_csv module instance below must be updated to correspond with the fields in the W3C file to be parsed.
nxlog.conf
<Extension w3c_parser>
    Module          xm_csv
    Fields          $date, $time, $HTTPMethod, $HTTPURL
    FieldTypes      string, string, string, string
    Delimiter       ' '
    EscapeChar      '"'
    QuoteChar       '"'
    EscapeControl   FALSE
    UndefValue      -
</Extension>

<Extension _json>
    Module          xm_json
</Extension>

<Input w3c>
    Module          im_file
    File            '/var/log/httpd-log'
    <Exec>
        if $raw_event =~ /^#/ drop();
        else
        {
            w3c_parser->parse_csv();
            $EventTime = parsedate($date + " " + $time);
        }
    </Exec>
</Input>
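
The tokenizing approach can be sketched in Python as follows. Unlike the xm_csv configuration above, this sketch reads the field names from the #Fields directive instead of hard-coding them. That is a design choice of the sketch, not a description of either NXLog module. The function name parse_w3c is hypothetical, and quoted values containing spaces are not handled.

```python
from datetime import datetime


def parse_w3c(lines):
    """Parse W3C Extended Log File Format lines into a list of dicts."""
    fields = []
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Only the #Fields directive is used; other directives are skipped.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split(" ")
        # A hyphen marks an undefined value, so such fields are left out.
        record = {name: value for name, value in zip(fields, values)
                  if value != "-"}
        if "date" in record and "time" in record:
            record["EventTime"] = datetime.strptime(
                record["date"] + " " + record["time"], "%Y-%m-%d %H:%M:%S")
        records.append(record)
    return records


log_sample = """#Version: 1.0
#Date: 2011-07-01 00:00:00
#Fields: date time cs-method cs-uri
2011-07-01 00:34:23 GET /foo/bar1.html
2011-07-01 12:21:16 GET /foo/bar2.html
"""
for record in parse_w3c(log_sample.splitlines()):
    print(record)
```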

25.1.5. XML

The xm_xml module can be used for generating and parsing structured data in XML format.

Example 63. Using the xm_xml Module for Parsing XML

This configuration uses the im_file module to read from a file. The parse_xml() procedure then parses the XML into fields in the event record.

nxlog.conf
<Extension _xml>
    Module  xm_xml
</Extension>

<Input in>
    Module  im_file
    File    "/var/log/app.xml"
    Exec    parse_xml();
</Input>
Example 64. Using the xm_xml Module for Generating XML

Here, the to_xml() procedure generates XML from the fields in the event record, and the om_file module writes the result to a file.

nxlog.conf
<Extension _xml>
    Module  xm_xml
</Extension>

<Output out>
    Module  om_file
    File    "/var/log/logs.xml"
    Exec    to_xml();
</Output>
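
The mapping between XML and event fields can be illustrated with a short Python sketch. It assumes a flat layout with one child element per field under a root element named Event. That layout, the helper names, and the sample record are assumptions made for illustration, so check the xm_xml documentation for the exact structure it reads and writes.

```python
import xml.etree.ElementTree as ET


def parse_xml_event(raw_event):
    """Turn a flat XML record into a dict with one entry per child element."""
    root = ET.fromstring(raw_event)
    return {child.tag: child.text for child in root}


def to_xml_event(fields, root_tag="Event"):
    """Build a flat XML record with one child element per field."""
    root = ET.Element(root_tag)
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record, used only for illustration.
raw = ("<Event><EventTime>2011-07-01 00:34:23</EventTime>"
       "<Severity>ERROR</Severity><Message>disk full</Message></Event>")
event = parse_xml_event(raw)
print(event)
print(to_xml_event(event))
```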