108.20. Pattern Matcher (xm_pattern)
This module makes it possible to execute pattern matching with a pattern database file in XML format. Using xm_pattern is more efficient than having NXLog regular expression rules listed in Exec directives, because it was designed in such a way that patterns do not need to be matched linearly. Regular expression sub-capturing can be used to set additional fields in the event record and arbitrary fields can be added under the scope of a pattern match for message classification. In addition, the module does an automatic on-the-fly pattern reordering internally for further speed improvements.
Other techniques, such as radix trees, also solve the linearity problem; the drawback is that they usually require the user to learn a special syntax for specifying patterns. If the log message is already parsed into fields rather than treated as a single string, it is possible to process only a subset of the patterns, which partially solves the linearity problem. Combined with the other performance improvements in the xm_pattern module, this makes its speed comparable to that of the other techniques. At the same time, the xm_pattern module uses regular expressions, which are familiar to users and can easily be migrated from other tools.
Traditionally, pattern matching on log messages has employed a technique where the log message was one string and the pattern (regular expression or radix tree based pattern) was executed against it. To match patterns against logs which contain structured data (such as the Windows EventLog), this structured data (the fields of the log) must be converted to a single string. This is a simple but inefficient method used by many tools.
The NXLog patterns defined in the XML pattern database file can contain more than one field. This allows multi-dimensional pattern matching. Thus with NXLog’s xm_pattern module there is no need to convert all fields into a single string as it can work with multiple fields.
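As a rough illustration of matching on more than one field, the sketch below shows a pattern with two matchfield blocks, one per field. The element names are the ones used in the pattern database example later in this section; the group name, pattern name, IDs, regular expression, and field value are hypothetical, and placing several matchfield blocks in a single pattern is an assumption based on the description above.

```xml
<?xml version='1.0' encoding='UTF-8'?>
<!-- Illustrative sketch only: names, IDs, and values are hypothetical. -->
<patterndb>
  <created>2018-01-01 01:02:03</created>
  <version>4</version>
  <group>
    <name>multi-field example</name>
    <id>1</id>
    <pattern>
      <id>1</id>
      <name>sshd connection closed</name>
      <!-- First dimension: the already parsed $SourceName field -->
      <matchfield>
        <name>SourceName</name>
        <type>exact</type>
        <value>sshd</value>
      </matchfield>
      <!-- Second dimension: the $Message field (assumed to be allowed
           alongside the first matchfield in the same pattern) -->
      <matchfield>
        <name>Message</name>
        <type>regexp</type>
        <value>^Connection closed by \S+</value>
      </matchfield>
      <set>
        <field>
          <name>TaxonomyAction</name>
          <value>disconnect</value>
          <type>string</type>
        </field>
      </set>
    </pattern>
  </group>
</patterndb>
```

Under these assumptions, the pattern is intended to match only when both conditions hold, so the fields never need to be joined into one string first.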
Patterns can be grouped together under pattern groups. Pattern groups serve an optimization purpose. The group can have an optional matchfield block which can check a condition. If the condition (such as $SourceName matches sshd) is satisfied, the xm_pattern module will descend into the group and check each pattern against the log. If the pattern group’s condition did not match ($SourceName was not sshd), the module can skip all patterns in the group without having to check each pattern individually.
When the xm_pattern module finds a matching pattern, the $PatternID and $PatternName fields are set on the log message. These can be used later in conditional processing and correlation rules of the pm_evcorr module, for example.
Note
The xm_pattern module does not process all patterns. It exits after the first matching pattern is found, so at most one pattern can match a log message. Multiple patterns that can match the same subset of logs should be avoided. For example, with the two regular expression patterns ^\d+ and ^\d\d, only one will be matched, and which one is not consistent, because xm_pattern changes the internal order of patterns and pattern groups dynamically (patterns with the highest match count are placed and tried first). For a strictly linearly executing pattern matcher, see the Exec directive.
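One way to avoid such overlap is to write the regular expressions so that no message can satisfy more than one of them. The sketch below is a hypothetical rewrite of the two overlapping patterns above: the first matches messages that start with two or more digits, the second matches messages that start with exactly one digit. The group name, pattern names, and IDs are invented for illustration, and the negative lookahead assumes PCRE-style regular expression syntax.

```xml
<?xml version='1.0' encoding='UTF-8'?>
<!-- Illustrative sketch only: names, IDs, and regular expressions are hypothetical. -->
<patterndb>
  <created>2018-01-01 01:02:03</created>
  <version>4</version>
  <group>
    <name>leading digits</name>
    <id>1</id>
    <pattern>
      <id>1</id>
      <name>two or more leading digits</name>
      <matchfield>
        <name>Message</name>
        <type>regexp</type>
        <value>^\d\d</value>
      </matchfield>
    </pattern>
    <pattern>
      <id>2</id>
      <name>exactly one leading digit</name>
      <matchfield>
        <name>Message</name>
        <type>regexp</type>
        <!-- (?!\d) is a negative lookahead: the first digit must not be
             followed by another digit (assumes PCRE-style syntax) -->
        <value>^\d(?!\d)</value>
      </matchfield>
    </pattern>
  </group>
</patterndb>
```

Because the two expressions are mutually exclusive, the result should no longer depend on the order in which xm_pattern tries them.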
108.20.1. Configuration
The xm_pattern module accepts the following directives in addition to the common module directives.
PatternFile
This mandatory directive specifies the name of the pattern database file.
108.20.2. Functions
The following functions are exported by xm_pattern.
boolean match_pattern()
Execute the match_pattern() procedure. If the event is successfully matched, return TRUE, otherwise FALSE.
108.20.3. Procedures
The following procedures are exported by xm_pattern.
match_pattern();
Attempt to match the current event according to the PatternFile. Execute statements and add fields as specified.
108.20.4. Fields
The following fields are used by xm_pattern.
$PatternID (type: integer)
The ID of the pattern that matched the event.
$PatternName (type: string)
The name of the pattern that matched the event.
108.20.5. Examples
This configuration reads Syslog messages from a file and parses them with parse_syslog(). The events are then further processed with a pattern file and the corresponding match_pattern() procedure to add additional fields to SSH authentication success or failure events. The matching is done against the $SourceName and $Message fields, so the Syslog parsing must be performed before the pattern matching will work.
<Extension _syslog>
    Module      xm_syslog
</Extension>

<Extension pattern>
    Module      xm_pattern
    PatternFile modules/extension/pattern/patterndb2-3.xml
</Extension>

<Input in>
    Module      im_file
    File        'test2.log'
    <Exec>
        # Parse the Syslog line first so that $SourceName and $Message exist
        parse_syslog();
        # Then classify the event using the pattern database
        match_pattern();
    </Exec>
</Input>
The following pattern database contains two patterns to match SSH authentication messages. The patterns are under a group named ssh which checks whether the $SourceName field is sshd and only tries to match the patterns if the logs are indeed from sshd. The patterns both extract $AuthMethod, $AccountName, and $SourceIP4Address fields from the log message when the pattern matches the log. Additionally, $TaxonomyStatus and $TaxonomyAction are set. The second pattern shows an Exec block example, which is evaluated when the pattern matches.
<?xml version='1.0' encoding='UTF-8'?>
<patterndb>
  <created>2018-01-01 01:02:03</created>
  <version>4</version>
  <group>
    <name>ssh</name>
    <id>1</id>
    <matchfield>
      <name>SourceName</name>
      <type>exact</type>
      <value>sshd</value>
    </matchfield>
    <pattern>
      <id>1</id>
      <name>ssh auth success</name>
      <matchfield>
        <name>Message</name>
        <type>regexp</type>
        <value>^Accepted (\S+) for (\S+) from (\S+) port \d+ ssh2</value>
        <capturedfield>
          <name>AuthMethod</name>
          <type>string</type>
        </capturedfield>
        <capturedfield>
          <name>AccountName</name>
          <type>string</type>
        </capturedfield>
        <capturedfield>
          <name>SourceIP4Address</name>
          <type>ip4addr</type>
        </capturedfield>
      </matchfield>
      <set>
        <field>
          <name>TaxonomyStatus</name>
          <value>success</value>
          <type>string</type>
        </field>
        <field>
          <name>TaxonomyAction</name>
          <value>authenticate</value>
          <type>string</type>
        </field>
      </set>
    </pattern>
    <pattern>
      <id>2</id>
      <name>ssh auth failure</name>
      <matchfield>
        <name>Message</name>
        <type>regexp</type>
        <value>^Failed (\S+) for invalid user (\S+) from (\S+) port \d+ ssh2</value>
        <capturedfield>
          <name>AuthMethod</name>
          <type>string</type>
        </capturedfield>
        <capturedfield>
          <name>AccountName</name>
          <type>string</type>
        </capturedfield>
        <capturedfield>
          <name>SourceIP4Address</name>
          <type>ip4addr</type>
        </capturedfield>
      </matchfield>
      <set>
        <field>
          <name>TaxonomyStatus</name>
          <value>failure</value>
          <type>string</type>
        </field>
        <field>
          <name>TaxonomyAction</name>
          <value>authenticate</value>
          <type>string</type>
        </field>
      </set>
      <exec>
        $TestField = 'test';
        $TestField = $TestField + 'value';
      </exec>
    </pattern>
  </group>
</patterndb>
This example is the same as the previous one, and uses the same pattern file, but it uses the match_pattern() function to discard any event that is not matched by the pattern file.
<Extension _syslog>
    Module      xm_syslog
</Extension>

<Extension pattern>
    Module      xm_pattern
    PatternFile modules/extension/pattern/patterndb2-3.xml
</Extension>

<Input in>
    Module      im_file
    File        'test2.log'
    <Exec>
        # Parse the Syslog line first so that $SourceName and $Message exist
        parse_syslog();
        # Discard any event that no pattern in the database matches
        if not match_pattern() drop();
    </Exec>
</Input>