Return to
Portfolio

108.7. Delimiter-Separated Values (xm_csv)

This module provides functions and procedures for working with data formatted as comma-separated values (CSV). CSV input can be parsed into fields and CSV output can be generated. Delimiters other than the comma can be used also.

The pm_transformer module provides a simple interface to parse and generate CSV format, but the xm_csv module exports an API that can be used to solve more complex tasks involving CSV formatted data.

Note
It is possible to use more than one xm_csv module instance with different options in order to support different CSV formats at the same time. For this reason, functions and procedures exported by the module are public and must be referenced by the module instance name.

108.7.1. Configuration

The xm_csv module accepts the following directives in addition to the common module directives. The Fields directive is required.

Fields

This mandatory directive accepts a comma-separated list of fields which will be filled from the input parsed. Field names with or without the dollar sign ($) are accepted. The fields will be stored as strings unless their types are explicitly specified with the FieldTypes directive.


Delimiter

This optional directive takes a single character (see below) as argument to specify the delimiter character used to separate fields. The default delimiter character is the comma (,). Note that there is no delimiter after the last field.

EscapeChar

This optional directive takes a single character (see below) as argument to specify the escape character used to escape special characters. The escape character is used to prefix the following characters: the escape character itself, the quote character, and the delimiter character. If EscapeControl is TRUE, the newline (\n), carriage return (\r), tab (\t), and backspace (\b) control characters are also escaped. The default escape character is the backslash character (\).

EscapeControl

If this optional boolean directive is set to TRUE, control characters are also escaped. See the EscapeChar directive for details. The default is TRUE: control characters are escaped. Note that this is necessary to allow single line CSV field lists which contain line-breaks.

FieldTypes

This optional directive specifies the list of types corresponding to the field names defined in Fields. If specified, the number of types must match the number of field names specified with Fields. If this directive is omitted, all fields will be stored as strings. This directive has no effect on the fields-to-CSV conversion.

QuoteChar

This optional directive takes a single character (see below) as argument to specify the quote character used to enclose fields. If QuoteOptional is TRUE, then only string type fields are quoted. The default is the double-quote character (").

QuoteMethod

This optional directive can take the following values:

All

All fields will be quoted.

None

Nothing will be quoted. This can be problematic if a field value (typically text that can contain any character) contains the delimiter character. Make sure that this is escaped or replaced with something else.

String

Only string type fields will be quoted. This has the same effect as QuoteOptional set to TRUE and is the default behavior if the QuoteMethod directive is not specified.

Note that this directive only effects CSV generation when using to_csv(). The CSV parser can automatically detect the quotation.

QuoteOptional

This directive has been deprecated in favor of QuoteMethod, which should be used instead.

StrictMode

If this optional boolean directive is set to TRUE, the CSV parser will fail to parse CSV lines that do not contain the required number of fields. When this is set to FALSE and the input contains fewer fields than specified in Field, the rest of the fields will be unset. The default value is FALSE.

UndefValue

This optional directive specifies a string which will be treated as an undefined value. This is particularly useful when parsing the W3C format where the dash (-) marks an omitted field.

108.7.1.1. Specifying Quote, Escape, and Delimiter Characters

The QuoteChar, EscapeChar, and Delimiter directives can be specified in several ways.

Unquoted single character

Any printable character can be specified as an unquoted character, except for the backslash (\):

Delimiter ;
Control characters

The following non-printable characters can be specified with escape sequences:

\a

audible alert (bell)

\b

backspace

\t

horizontal tab

\n

newline

\v

vertical tab

\f

formfeed

\r

carriage return

For example, to use TAB delimiting:

Delimiter \t
A character in single quotes

The configuration parser strips whitespace, so it is not possible to define a space as the delimiter unless it is enclosed within quotes:

Delimiter ' '

Printable characters can also be enclosed:

Delimiter ';'

The backslash can be specified when enclosed within quotes:

Delimiter '\'
A character in double quotes

Double quotes can be used like single quotes:

Delimiter " "

The backslash can be specified when enclosed within double quotes:

Delimiter "\"
A hexadecimal ASCII code

Hexadecimal ASCII character codes can also be used by prepending 0x. For example, the space can be specified as:

Delimiter 0x20

This is equivalent to:

Delimiter " "

108.7.2. Functions

The following functions are exported by xm_csv.

string to_csv()

Convert the specified fields to a single CSV formatted string.

108.7.3. Procedures

The following procedures are exported by xm_csv.

parse_csv();

Parse the $raw_event field as CSV input.

parse_csv(string source);

Parse the given string as CSV format.

to_csv();

Format the specified fields as CSV and put this into the $raw_event field.

108.7.4. Examples

Example 490. Complex CSV Format Conversion

This example shows that the xm_csv module can not only parse and create CSV formatted input and output, but with multiple xm_csv modules it is also possible to reorder, add, remove, or modify fields before outputting to a different CSV format.

nxlog.conf [Download file]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
<Extension csv1>
    Module      xm_csv
    Fields      $id, $name, $number
    FieldTypes  integer, string, integer
    Delimiter   ,
</Extension>

<Extension csv2>
    Module      xm_csv
    Fields      $id, $number, $name, $date
    Delimiter   ;
</Extension>

<Input in>
    Module      im_file
    File        "tmp/input"
    <Exec>
        csv1->parse_csv();
        $date = now();
        if not defined $number $number = 0;
        csv2->to_csv();
    </Exec>
</Input>

<Output out>
    Module      om_file
    File        "tmp/output"
</Output>
Input Sample
1, "John K.", 42
2, "Joe F.", 43
Output Sample
1;42;"John K.";2011-01-15 23:45:20
2;43;"Joe F.";2011-01-15 23:45:20