125. Amazon S3
NXLog can both receive events from and send events to Amazon S3 cloud storage. The NXLog Python modules for input and output (im_python and om_python) are used for this, as well as Boto3, the AWS SDK for Python. For more information about Boto3, see AWS SDK for Python (Boto3) on Amazon AWS.
125.1. Setting Up Boto3
- Boto3 can be installed with pip or the system package manager.

  - pip:

    pip install boto3

  - APT on a Debian-based distribution:

    apt-get install python-boto3

  - Yum on a Red Hat-based distribution:

    yum install python2-boto3

    Note: The python2-boto3 package requires the EPEL repository to be installed.

- Make sure an AWS service account has been created.

- Set the default region and credentials in ~/.aws/. This can be done interactively if the AWS CLI is installed, or by editing the files shown below. Credentials for the AWS account can be found in the IAM Console. A new user can be created, or an existing user can be used. Go to "manage access keys" and generate a new set of keys. More information about the initial setup and the credentials can be found in the Boto3 Quickstart and Credentials documents.

  ~/.aws/config:

    [default]
    region=eu-central-1

  ~/.aws/credentials:

    [default]
    aws_access_key_id = YOUR_ACCESS_KEY
    aws_secret_access_key = YOUR_SECRET_KEY

  Note: The region and credential configuration can also be hardcoded in the scripts, but this is not recommended.
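Once the region and credentials are in place, Boto3 can reach S3 without any credentials appearing in code. The short check below is not part of the NXLog scripts; it is only a sketch that lists the buckets visible to the configured account to confirm that the setup works.

import boto3

# Uses the default region and credentials configured under ~/.aws/
client = boto3.client('s3')

# Print the name of each bucket the account can see
for bucket in client.list_buckets()['Buckets']:
    print(bucket['Name'])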
125.2. AWS S3 Buckets, Objects, Keys, and Structure
Amazon S3 stores objects inside containers called buckets. The number of buckets available to a user is limited, but each bucket can store a practically unlimited number of objects. More general information about Amazon S3 can be found at Getting Started with Amazon Simple Storage Service on Amazon AWS.
Both the input and output Python scripts interact with a single bucket on Amazon S3. The scripts will not create, delete, or alter the bucket or any of its properties, permissions, or management options. It is the responsibility of the user to create the bucket, provide the appropriate permissions (ACL), and further configure any lifecycle, replication, encryption, or other options. Similarly, the scripts do not alter the storage class of the objects stored or any other properties or permissions.
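For reference, the bucket can be created ahead of time with the AWS Console, the AWS CLI, or a short Boto3 call such as the sketch below. The bucket name and region are placeholders; real bucket names must be globally unique and lowercase.

import boto3

client = boto3.client('s3')

# Create the bucket used by the scripts (placeholder name and region)
client.create_bucket(
    Bucket='mybucket',
    CreateBucketConfiguration={'LocationConstraint': 'eu-central-1'}
)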
We selected a schema that stores events in a single bucket, where each object has a key referencing the server (or service) name, the date, and the time the event was received. Although Amazon S3 uses a flat structure to store objects, objects with similar key prefixes are grouped together, resembling the structure of a file system. The following is a visual representation of the naming scheme used. Note that the key name at the deepest level represents a time; however, because Amazon S3 treats the colon (:) as a special character, the dot (.) is used in its place to avoid escaping.
- MYBUCKET/
  - SERVER01/
    - 2018-05-17/
      - 12.36.34.1
      - 12.36.35.1
    - 2018-05-18/
      - 10.46.34.1
      - 10.46.35.1
      - 10.46.35.2
      - 10.46.36.1
  - SERVER02/
    - 2018-05-16/
      - 14.23.12.1
    - 2018-05-17/
      - 17.03.52.1
      - 17.03.52.2
      - 17.03.52.3
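For illustration, a key following this scheme could be built as in the sketch below. The SERVER value and the trailing counter are assumptions based on the naming described above, not code taken from the scripts.

from datetime import datetime

SERVER = 'SERVER01'  # placeholder server (or service) name
counter = 1          # distinguishes events received within the same second

now = datetime.now()
# Dots replace colons in the time portion because the colon is special in S3 keys
key = '%s/%s/%s.%d' % (SERVER, now.strftime('%Y-%m-%d'), now.strftime('%H.%M.%S'), counter)
# Result, for example: 'SERVER01/2018-05-17/12.36.34.1'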
125.3. Sending Events to S3
Events can be sent to Amazon S3 cloud object storage as follows.
Events are stored in the Amazon S3 bucket with object key names composed of the server name, the date in YYYY-MM-DD format, the time in HH.MM.SS format, and a counter (since multiple events can be received during the same second).
- Copy the s3_write.py script to a location that is accessible by NXLog.

- Edit the BUCKET and SERVER variables in the code.

- Configure NXLog with an om_python instance (see the example configuration below).
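A minimal output configuration could resemble the following sketch. The instance name and the script path are placeholders and should be adapted to where s3_write.py was copied.

<Output to_s3>
    Module      om_python
    PythonCode  /opt/nxlog/etc/s3_write.py
</Output>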
125.4. Retrieving Events From S3
Events can be retrieved from Amazon S3 cloud object storage as follows.
The script keeps track of the last object retrieved from Amazon S3 by means of a file called lastkey.log, which is stored locally. Even in the event of an abnormal termination, the script will continue from where it stopped. The lastkey.log file can be deleted to reset that behavior (or edited, if necessary).
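The actual logic lives in s3_read.py; the following is only a minimal sketch of the idea, assuming Boto3's list_objects_v2() call and a plain-text lastkey.log file.

import os
import boto3

BUCKET = 'mybucket'           # placeholder bucket name
LASTKEY_FILE = 'lastkey.log'  # stores the key of the last object processed

client = boto3.client('s3')

# Resume from the last processed key, if one was saved
lastkey = ''
if os.path.isfile(LASTKEY_FILE):
    with open(LASTKEY_FILE) as f:
        lastkey = f.read().strip()

kwargs = {'Bucket': BUCKET, 'MaxKeys': 1000}
if lastkey:
    kwargs['StartAfter'] = lastkey  # only list objects sorting after the last key

for obj in client.list_objects_v2(**kwargs).get('Contents', []):
    data = client.get_object(Bucket=BUCKET, Key=obj['Key'])['Body'].read()
    # ... hand the event data over to NXLog here ...
    lastkey = obj['Key']

# Persist the last key so processing continues from there after a restart
with open(LASTKEY_FILE, 'w') as f:
    f.write(lastkey)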
- Copy the s3_read.py script to a location that is accessible by NXLog.

- Edit the BUCKET, SERVER, and POLL_INTERVAL variables in the code. The POLL_INTERVAL is the time the script will wait before checking again for new events. The MAXKEYS variable should be fine in all cases with the default value of 1000 keys.

- Configure NXLog with an im_python instance (see the example configuration below).
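Similarly, a minimal input configuration could resemble the following sketch, with the instance name and script path again being placeholders.

<Input from_s3>
    Module      im_python
    PythonCode  /opt/nxlog/etc/s3_read.py
</Input>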
125.4.1. Serialization and Compression
In the previous examples, only the $raw_event field was stored in the objects. An easy way to store more than one field is to "pickle" (or "serialize" or "marshal") all the fields of an event.
import pickle

# Collect every field of the event into a dictionary
fields = {}
for field in event.get_names():
    fields.update({field: event.get_field(field)})

# Serialize the dictionary and store it as the object's body
newraw = pickle.dumps(fields)
client.put_object(Body=newraw, Bucket=BUCKET, Key=key)
Compressing the events with gzip is also possible.
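For example, the pickled event could be compressed before it is uploaded and decompressed after it is downloaded. This is a sketch only, reusing the names from the snippet above and assuming Python 3's gzip.compress() and gzip.decompress().

import gzip
import pickle

# Compress the pickled event before uploading it
client.put_object(Body=gzip.compress(newraw), Bucket=BUCKET, Key=key)

# When reading the object back, decompress it before unpickling
data = client.get_object(Bucket=BUCKET, Key=key)['Body'].read()
fields = pickle.loads(gzip.decompress(data))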