NXLog User Guide

108.6. Character Set Conversion (xm_charconv)

This module provides tools for converting strings between different character sets (codepages). All encodings available to iconv are supported; run iconv -l to list the encoding names.

108.6.1. Configuration

The xm_charconv module accepts the following directives in addition to the common module directives.

AutodetectCharsets: This optional directive accepts a comma-separated list of character set names. When auto is specified as the source encoding for convert() or convert_fields(), these character sets will be tried for conversion. This directive is not related to the LineReader directive or the corresponding InputType registered by the module.

BigEndian: This optional boolean directive specifies the endianness to use during the encoding conversion. If this directive is not specified, it defaults to the host’s endianness. This directive only affects the registered InputType and is only applicable if the LineReader directive is set to a non-Unicode encoding and CharBytes is set to 2 or 4.

CharBytes: This optional integer directive specifies the byte-width of the encoding to use during conversion. Acceptable values are 1 (the default), 2, and 4. Most variable width encodings will work with the default value. This directive only affects the registered InputType and is only applicable if the LineReader directive is set to a non-Unicode encoding.

LineReader: If this optional directive is specified with an encoding, an InputType will be registered using the name of the xm_charconv module instance. The following Unicode encodings are supported: UTF-8, UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, and UTF-7. For other encodings, it may be necessary to also set BigEndian and/or CharBytes.
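
As a rough illustration of how the registered InputType takes its name from the module instance, the fragment below names the instance utf16reader and refers to it from an input. The instance name and the file path are assumptions made for this sketch, not values from the guide.

```
<Extension utf16reader>
    Module      xm_charconv
    # UTF-16LE is one of the listed Unicode encodings, so BigEndian and
    # CharBytes are left unset here.
    LineReader  UTF-16LE
</Extension>

<Input in>
    Module      im_file
    # Hypothetical path to a UTF-16LE encoded log file
    File        '/var/log/app/utf16le.log'
    # Same name as the xm_charconv instance above
    InputType   utf16reader
</Input>
```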

108.6.2. Functions

The following functions are exported by xm_charconv.

string convert(string source, string srcencoding, string dstencoding): Convert the source string from the srcencoding character set to dstencoding and return the result. The srcencoding argument can be set to auto to request auto-detection.
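
A minimal sketch of calling convert() from an Exec directive, assuming an input file whose lines are encoded in ISO-8859-2. The file path is hypothetical; $raw_event is the field that im_file fills with each line it reads.

```
<Extension charconv>
    Module  xm_charconv
</Extension>

<Input in>
    Module  im_file
    # Hypothetical path; lines are assumed to be ISO-8859-2 encoded
    File    '/var/log/app/latin2.log'
    # Replace the raw event text with its UTF-8 equivalent
    Exec    $raw_event = convert($raw_event, "ISO-8859-2", "utf-8");
</Input>
```

Because convert() is a function that returns a string, its result has to be assigned to a field for the conversion to have any effect.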

108.6.3. Procedures

The following procedures are exported by xm_charconv.

convert_fields(string srcencoding, string dstencoding): Convert all string-type fields of a log message from srcencoding to dstencoding. The srcencoding argument can be set to auto to request auto-detection.
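
When the source encoding is known in advance, it can be passed explicitly instead of auto. The sketch below assumes ISO-8859-2 input and a hypothetical file path. Unlike convert(), the procedure is called as a statement of its own rather than assigned to a field.

```
<Extension charconv>
    Module  xm_charconv
</Extension>

<Input in>
    Module  im_file
    # Hypothetical path; lines are assumed to be ISO-8859-2 encoded
    File    '/var/log/app/latin2.log'
    # Convert every string field of the event to UTF-8
    Exec    convert_fields("ISO-8859-2", "utf-8");
</Input>
```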

108.6.4. Examples

Example 488. Character set auto-detection of various input encodings

This configuration shows an example of character set auto-detection. The input file can contain differently encoded lines, and the module normalizes output to UTF-8.

nxlog.conf


<Extension charconv>
    Module              xm_charconv
    AutodetectCharsets  utf-8, euc-jp, utf-16, utf-32, iso8859-2
</Extension>

<Input filein>
    Module              im_file
    File                "tmp/input"
    # "auto" makes the module try the AutodetectCharsets list above
    Exec                convert_fields("auto", "utf-8");
</Input>

<Output fileout>
    Module              om_file
    File                "tmp/output"
</Output>

<Route r>
    Path                filein => fileout
</Route>

Example 489. Registering and using an InputType

This configuration uses the InputType registered via the LineReader directive to read a file with the ISO-8859-2 encoding.

nxlog.conf


<Extension charconv>
    Module      xm_charconv
    LineReader  ISO-8859-2
</Extension>

<Input in>
    Module      im_file
    File        'modules/extension/charconv/iso-8859-2.in'
    # The InputType name is the name of the xm_charconv instance
    InputType   charconv
</Input>