125. Amazon S3

This add-on can be downloaded from the nxlog-public/contrib repository, according to the license and terms specified there.

NXLog can both receive events from and send events to Amazon S3 cloud storage. This is implemented with the NXLog Python modules for input and output (im_python and om_python), together with Boto3, the AWS SDK for Python. For more information about Boto3, see AWS SDK for Python (Boto3) on Amazon AWS.

125.1. Setting Up Boto3

  1. Boto3 can be installed with pip or the system package manager.

    • pip: pip install boto3

    • APT on a Debian-based distribution: apt-get install python-boto3

    • Yum on a Red Hat-based distribution: yum install python2-boto3

      Note
      The python2-boto3 package requires the installation of the EPEL repository.
  2. Make sure an AWS service account has been created.

  3. Set the default region and credentials in ~/.aws/. This can be done interactively if the AWS CLI is installed, or by editing the files shown below. Credentials for the AWS account can be found in the IAM Console. Either create a new user or use an existing one, then go to "Manage access keys" and generate a new set of keys. More information about the initial setup and the credentials can be found in the Boto3 Quickstart and Credentials documents.

    ~/.aws/config
    [default]
    region=eu-central-1
    ~/.aws/credentials
    [default]
    aws_access_key_id = YOUR_ACCESS_KEY
    aws_secret_access_key = YOUR_SECRET_KEY
    Note
    The region and credential configuration can also be hardcoded in the scripts, but this is not recommended.
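
To verify that Boto3 picks up the region and credentials, a short check such as the one below can be used. This is a minimal sketch and not part of the add-on; it only creates an S3 client and prints the names of the buckets the account can access.

Verifying the Boto3 Setup (sketch)
import boto3

# The client is built from the region and credentials in ~/.aws/
client = boto3.client('s3')

# Listing the buckets confirms that the credentials are accepted
response = client.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])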

125.2. AWS S3 Buckets, Objects, Keys, and Structure

Amazon S3 stores objects inside containers called buckets. The number of buckets available to a user is limited, but each bucket can hold an unlimited number of objects. More general information about Amazon S3 can be found at Getting Started with Amazon Simple Storage Service on Amazon AWS.

Both the input and output Python scripts interact with a single bucket on Amazon S3. The scripts will not create, delete, or alter the bucket or any of its properties, permissions, or management options. It is the responsibility of the user to create the bucket, provide the appropriate permissions (ACL), and further configure any lifecycle, replication, encryption, or other options. Similarly, the scripts do not alter the storage class of the objects stored or any other properties or permissions.

We selected a scheme that stores all events in a single bucket, where each object key references the server (or service) name, the date, and the time the event was received. Although Amazon S3 uses a flat structure to store objects, objects with similar key prefixes are grouped together, resembling the structure of a file system. The following is a visual representation of the naming scheme used; a short sketch of building such a key follows the listing. Note that the key name at the deepest level represents a time. However, since Amazon S3 treats the colon (:) as a special character, the dot (.) character is used instead to avoid escaping.

  • MYBUCKET/

    • SERVER01/

      • 2018-05-17/

        • 12.36.34.1

        • 12.36.35.1

      • 2018-05-18/

        • 10.46.34.1

        • 10.46.35.1

        • 10.46.35.2

        • 10.46.36.1

    • SERVER02/

      • 2018-05-16/

        • 14.23.12.1

      • 2018-05-17/

        • 17.03.52.1

        • 17.03.52.2

        • 17.03.52.3
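
For illustration, the following sketch shows how an object key following this scheme could be built. The make_key() helper is hypothetical and not part of the contrib scripts; it only demonstrates the SERVER/DATE/TIME.COUNTER layout with dots substituted for colons.

Building an Object Key (sketch)
from datetime import datetime

def make_key(server, timestamp, counter):
    # Hypothetical helper: builds SERVER/YYYY-MM-DD/HH.MM.SS.COUNTER,
    # using dots instead of colons in the time portion because the
    # colon is a special character in Amazon S3 key names.
    date_part = timestamp.strftime('%Y-%m-%d')
    time_part = timestamp.strftime('%H.%M.%S')
    return '{0}/{1}/{2}.{3}'.format(server, date_part, time_part, counter)

# Produces 'SERVER01/2018-05-17/12.36.34.1'
key = make_key('SERVER01', datetime(2018, 5, 17, 12, 36, 34), 1)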

125.3. Sending Events to S3

Events can be sent to Amazon S3 cloud object storage as follows.

Events are stored in the Amazon S3 bucket with object key names composed of the server name, the date in YYYY-MM-DD format, the time in HH.MM.SS format, and a counter (since multiple events can be received during the same second); a sketch of the corresponding upload call follows the steps below.

  1. Copy the s3_write.py script to a location that is accessible by NXLog.

  2. Edit the BUCKET and SERVER variables in the code.

  3. Configure NXLog with an om_python instance.
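
The following sketch illustrates the kind of Boto3 call involved. The store_event() helper, the counter handling, and the variable values are assumptions for illustration only and do not reproduce the contrib script itself.

Uploading Events With a Counter (sketch)
import boto3
from datetime import datetime

BUCKET = 'MYBUCKET'   # assumed value; set to the bucket created for this purpose
SERVER = 'SERVER01'   # assumed value; the server or service name used in the keys

client = boto3.client('s3')
counter = 0
last_prefix = None

def store_event(raw_event):
    # Hypothetical helper: the counter is incremented when several
    # events arrive within the same second, so each event gets a
    # unique SERVER/YYYY-MM-DD/HH.MM.SS.COUNTER key.
    global counter, last_prefix
    now = datetime.now()
    prefix = '{0}/{1}/{2}'.format(SERVER,
                                  now.strftime('%Y-%m-%d'),
                                  now.strftime('%H.%M.%S'))
    counter = counter + 1 if prefix == last_prefix else 1
    last_prefix = prefix
    client.put_object(Body=raw_event, Bucket=BUCKET,
                      Key='{0}.{1}'.format(prefix, counter))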

Example 621. Sending Events From File to S3

This configuration reads raw events from a file with im_file and uses om_python to forward them, without any additional processing, to the configured S3 storage.

nxlog.conf
<Input file>
    Module          im_file
    File            "input.log"
    # These may be helpful for testing
    SavePos         FALSE
    ReadFromLast    FALSE
</Input>

<Output s3>
    Module          om_python
    PythonCode      s3_write.py
</Output>

<Route file_to_s3>
    Path            file => s3
</Route>

125.4. Retrieving Events From S3

Events can be retrieved from Amazon S3 cloud object storage as follows.

The script keeps track of the last object retrieved from Amazon S3 by means of a file called lastkey.log, which is stored locally. Even in the event of an abnormal termination, the script will continue from where it stopped. The lastkey.log file can be deleted to reset that behavior (or edited if necessary).
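
The following sketch illustrates the general idea behind this bookkeeping. The helper functions and the exact logic are assumptions for illustration and not a copy of the contrib script; only the lastkey.log file name and the use of Boto3 listing calls come from the description above.

Tracking the Last Retrieved Key (sketch)
import os
import boto3

BUCKET = 'MYBUCKET'          # assumed value; the bucket to read from
LASTKEY_FILE = 'lastkey.log'
MAXKEYS = 1000

client = boto3.client('s3')

def load_last_key():
    # Resume from the key recorded by the previous run, if any
    if os.path.exists(LASTKEY_FILE):
        with open(LASTKEY_FILE) as f:
            return f.read().strip()
    return ''

def fetch_new_objects():
    # List only keys that sort after the last processed one
    kwargs = {'Bucket': BUCKET, 'MaxKeys': MAXKEYS}
    last_key = load_last_key()
    if last_key:
        kwargs['StartAfter'] = last_key
    response = client.list_objects_v2(**kwargs)
    for obj in response.get('Contents', []):
        data = client.get_object(Bucket=BUCKET, Key=obj['Key'])['Body'].read()
        # ... hand the data over to NXLog here ...
        with open(LASTKEY_FILE, 'w') as f:
            f.write(obj['Key'])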

  1. Copy the s3_read.py script to a location that is accessible by NXLog.

  2. Edit the BUCKET, SERVER, and POLL_INTERVAL variables in the code. POLL_INTERVAL is the time the script waits before checking again for new events. The default MAXKEYS value of 1000 keys should be suitable in all cases.

  3. Configure NXLog with an im_python instance.

Example 622. Reading Events From S3 and Saving to File

This configuration collects events from the configured S3 storage with im_python and writes the raw events to file with om_file (without performing any additional processing).

nxlog.conf
<Input s3>
    Module      im_python
    PythonCode  s3_read.py
</Input>

<Output file>
    Module      om_file
    File        "output.log"
</Output>

<Route s3_to_file>
    Path        s3 => file
</Route>

125.4.1. Serialization and Compression

In the previous examples, only the $raw_event field was stored in the objects. An easy way to store more than one field is to "pickle" (or "serialize" or "marshal") all the fields of an event.

Pickling Events
import pickle

# Collect every field of the event into a dictionary
fields = {}
for field in event.get_names():
    fields.update({field: event.get_field(field)})

# Serialize the dictionary so all fields are preserved in the object
newraw = pickle.dumps(fields)

client.put_object(Body=newraw, Bucket=BUCKET, Key=key)

Compressing the events with gzip is also possible.

Compressing Events With gzip
import StringIO
import gzip

# Compress the pickled event data in memory before uploading
out = StringIO.StringIO()
with gzip.GzipFile(fileobj=out, mode="w") as f:
    f.write(newraw)

gzallraw = out.getvalue()

client.put_object(Body=gzallraw, Bucket=BUCKET, Key=key)
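
For completeness, the read side has to reverse both steps. The following is a minimal sketch, assuming the object was written with the gzip-over-pickle approach shown above; it uses the same Python 2 StringIO interface as the compression example.

Decompressing and Unpickling Events (sketch)
import StringIO
import gzip
import pickle

# Download the object, then undo the compression and the pickling
obj = client.get_object(Bucket=BUCKET, Key=key)
gzallraw = obj['Body'].read()

with gzip.GzipFile(fileobj=StringIO.StringIO(gzallraw), mode="r") as f:
    newraw = f.read()

fields = pickle.loads(newraw)  # dictionary of field names and values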