104. Debugging NXLog

When other troubleshooting fails to identify (or resolve) an issue, inspecting the NXLog agent itself can prove useful. Some techniques are outlined below.

104.1. Dump Debug Info to NXLog’s Internal Log

A simple way to quickly get a more complete picture of NXLog’s current status is to dump debug info into the internal log. This information can be helpful in debugging, for example, why an input module is not sending to an output module. Normally, internal events are written to the log file configured with the LogFile directive.

  • On Linux, send SIGUSR1 to the application.

    # kill -SIGUSR1 $PID
  • On Windows, send service control command 200 to the application.

    > sc control nxlog 200
Dumped debug info example
2017-03-29 10:05:19 INFO event queue has 2 events;jobgroup with priority 10;job of module in/im_file, events: 0;job of module out/om_null, events: 0;non-module job, events: 0;jobgroup with priority 99;non-module job, events: 0;[route 1]; - in: type INPUT, status: RUNNING queuesize: 0; - out: type OUTPUT, status: RUNNING queuesize: 0;

The status is the most important piece of information in the dumped log entries. A status of PAUSED means the input module is not able to send because the output module queue is full. In such a case the queuesize for the corresponding output(s) would be over 99. A status of STOPPED means the module is fully stopped, usually due to an error (e.g. TCP disconnection for om_tcp).
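
To make the status fields easier to read, the dumped entry can be split on its semicolons. The following is a minimal sketch, assuming the entry has the same layout as the example above; the function name nxlog_module_status is hypothetical and not part of NXLog.

```shell
# Read dumped debug entries on stdin and print one line per module instance:
#   <instance> <type> <status> <queuesize>
# Assumes the "- name: type X, status: Y queuesize: N" layout shown in the
# example entry; other fields (jobgroups, event counts) are ignored.
nxlog_module_status() {
    tr ';' '\n' |
        sed -n 's/^ *- \([^:]*\): type \([A-Z]*\), status: \([A-Z]*\) queuesize: \([0-9]*\).*/\1 \2 \3 \4/p'
}

# Example use (the log file path is an assumption; use the LogFile value):
#   grep 'event queue has' /opt/nxlog/var/log/nxlog/nxlog.log | tail -n 1 | nxlog_module_status
```

Any line whose third column is not RUNNING points at the module instance to investigate first.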

104.2. Switch to DEBUG Log Level

NXLog’s log level can be switched to DEBUG level without requiring a restart. For extended debugging sessions, consider setting the LogLevel directive to DEBUG.

  • On Linux, send SIGUSR2.

    # kill -SIGUSR2 $PID
  • On Windows, send service control command 201.

    > sc control nxlog 201
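
The Linux commands in this section and the previous one need the PID of the running agent. One way to obtain it is from the PID file, sketched below. The default path used here is an assumption and should be replaced with the value of the PidFile directive; the function name nxlog_pid is hypothetical.

```shell
# Print the PID stored in an NXLog PID file, but only if that process exists.
# The default path is an assumption; pass the PidFile value as the argument.
nxlog_pid() {
    pidfile=${1:-/opt/nxlog/var/run/nxlog/nxlog.pid}
    [ -r "$pidfile" ] || { echo "cannot read $pidfile" >&2; return 1; }
    pid=$(tr -d '[:space:]' < "$pidfile")
    case $pid in
        ''|*[!0-9]*) echo "no valid PID in $pidfile" >&2; return 1 ;;
    esac
    # kill -0 sends no signal; it only checks that the process can be signalled.
    kill -0 "$pid" 2>/dev/null || { echo "process $pid is not running" >&2; return 1; }
    printf '%s\n' "$pid"
}

# Example use:
#   kill -USR1 "$(nxlog_pid)"    # dump debug info
#   kill -USR2 "$(nxlog_pid)"    # switch to DEBUG log level
```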

104.3. Generate Core Dumps

Core dumps help the NXLog development and support teams debug issues.

104.3.1. Core Dumps on Linux

Note
It is necessary to install the NXLog debug symbols package in order to produce useful core dump files.
  1. Remove the User and Group directives from the configuration. NXLog needs to be running as root:root to produce a core dump.

  2. Use ulimit to remove the core file size limit.

    # ulimit -c unlimited
  3. Run NXLog manually to test that it can create a core dump.

    # /opt/nxlog/bin/nxlog -f
  4. Find the NXLog process and kill it with the SIGABRT signal.

    # kill -ABRT $(ps aux | grep '[/]opt/nxlog/bin/nxlog' | awk '{print $2}')
  5. Verify that a core dump file was created at /opt/nxlog/var/spool/nxlog/core.

    # ls -l /opt/nxlog/var/spool/nxlog/
    total 26708
    -rw------- 1 root root 27348992 Oct 30 08:51 core
  6. If the core dump file was created successfully, run NXLog again as root in order to catch the next crash.

    # /opt/nxlog/bin/nxlog -f
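
Before starting NXLog for the final run, it can help to confirm that the shell still meets the two preconditions from the steps above. The following is a minimal sketch; the function name core_dump_precheck is hypothetical.

```shell
# Check that the current shell can start NXLog in a way that allows a core
# dump: running as root and no core file size limit. Returns 0 only if both
# checks pass. The limit applies to this shell and the processes it starts,
# so NXLog must be launched from the same shell.
core_dump_precheck() {
    rc=0
    if [ "$(id -u)" -ne 0 ]; then
        echo "not running as root" >&2
        rc=1
    fi
    if [ "$(ulimit -c)" != "unlimited" ]; then
        echo "core file size limit is $(ulimit -c), expected unlimited" >&2
        rc=1
    fi
    return "$rc"
}

# Example use:
#   core_dump_precheck && /opt/nxlog/bin/nxlog -f
```

If no core file appears even though both checks pass, inspect /proc/sys/kernel/core_pattern: when its value starts with a pipe character, the kernel hands the dump to a helper program instead of writing a file named core.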

104.3.2. Core Dumps on Windows

Core dumps can be generated on Windows by using ProcDump from Microsoft Sysinternals.

Note
ProcDump runs on Windows Vista and higher, and Windows Server 2008 and higher.

For example, run the following to write a full dump of the nxlog process when its handle count exceeds 10,000:

> procdump -ma nxlog -p "\Process(nxlog)\Handle Count" 10000

104.4. Inspect Memory Leaks

If NXLog’s memory usage exceeds 200 MB, there is likely a memory leak.
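
On Linux, the threshold can be checked from the command line. The following is a minimal sketch, assuming that resident set size (VmRSS in /proc) is the figure of interest; the function names rss_kb and rss_exceeds_mb are hypothetical.

```shell
# Print the resident set size (VmRSS) of a process in KiB, read from /proc.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Succeed if the process with the given PID uses more than the given number
# of MiB. Returns 2 if the process status cannot be read.
rss_exceeds_mb() {
    kb=$(rss_kb "$1" 2>/dev/null) || return 2
    [ -n "$kb" ] && [ "$kb" -gt $(( $2 * 1024 )) ]
}

# Example use ($PID is the NXLog process ID):
#   rss_exceeds_mb "$PID" 200 && echo "NXLog is above 200 MB"
```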

104.4.1. Inspecting Memory Leaks on Linux

We recommend using Valgrind on GNU/Linux to debug memory leaks.

  1. Install the debug symbols (-dbg) package (for example, nxlog-dbg_3.0.1759_amd64.deb).

    Note
    The NXLog debug symbols package is currently only available for Linux. This package is not included with NXLog by default, but can be provided on request.
  2. Install Valgrind.

  3. Set the NoFreeOnExit directive to TRUE in the NXLog configuration file. This directive ensures that modules are not unloaded when NXLog is stopped, which allows Valgrind to properly resolve backtraces into modules.

  4. Start NXLog under Valgrind with the following command. If User is set to nxlog in the configuration, then the command must be executed with su, otherwise Valgrind will not be able to create the massif.out file at the end of the sampling process.

    # cd /tmp
    # su -lc "valgrind --tool=massif --pages-as-heap=yes --massif-out-file=/tmp/massif.out.%p /opt/nxlog/bin/nxlog -f" nxlog
  5. Let NXLog run for a while until the Valgrind process shows the memory increase, then interrupt it with Ctrl+C. The output is written to /tmp/massif.out.xxxx.

  6. Send the massif.out.xxxx file with a bug report.

  7. Optionally, create a report from the massif.out.xxxx file with the ms_print command:

    # ms_print massif.out.xxxx

    The output of the ms_print report contains an ASCII chart at the top showing the increase in memory usage. The chart shows the sample number with the highest memory usage—marked with (peak). This is normally at the end of the chart (the last sample). The backtrace from this sample indicates where the most memory is allocated.
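
For a quick look at the peak without reading the full report, the snapshot with the largest heap size can be pulled from the output file directly. This is a rough sketch that assumes the file contains snapshot=N and mem_heap_B=N lines, as current Valgrind versions write them; ms_print remains the supported way to read the file. The function name massif_peak is hypothetical.

```shell
# Print the snapshot number and heap size in bytes of the largest snapshot
# recorded in a Massif output file. If several snapshots share the maximum,
# the last one is reported.
massif_peak() {
    awk -F= '
        $1 == "snapshot"   { snap = $2 }
        $1 == "mem_heap_B" && $2 + 0 >= max + 0 { max = $2; peak = snap }
        END { if (peak != "") print peak, max }
    ' "$1"
}

# Example use:
#   massif_peak massif.out.xxxx
```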

104.4.2. Inspecting Memory Leaks on Windows

Windows Process Explorer from Microsoft Sysinternals can be used to inspect memory use of all running programs.

Once a potential source of excessive memory use has been determined, use DebugView from Microsoft Sysinternals to inspect the application’s debug output.