Opsview Knowledge Center

NRD Architecture

Learn about the technical architecture of the Nagios Result Distributor

From Opsview Monitor 3.11.0, NRD (Nagios Result Distributor) is used to send results from slaves back up to the Opsview Monitor master.
NRD has a client/server design: a client running on each slave server sends results to the nrd process on the Opsview Monitor master.
Opsview Monitor's implementation of NRD has the following benefits over NSCA:

  • Can send multi-line plugin output back to master
  • A 16K limit on the amount of data sent, up from the previous 511 bytes
  • 50% performance improvement over NSCA communication
  • Results are queued on the slave, so sending results now runs in parallel with other Nagios(R) Core responsibilities
  • Results received on the Opsview Monitor master are written directly to the check results queue, reducing workload on Nagios Core on the master
  • Results are sent transactionally, so if a sending failure occurs, the whole transaction is aborted and retried
  • The NRD daemon on the Opsview Monitor master can dynamically prefork extra servers based on load
  • Result timestamps are now based on the time the results were stored on the slave, rather than the time they were received on the master
  • Packet sizes are now flexible, so only the amount of data required is sent (66% less data than NSCA)
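Flexible packet sizes and multi-line output can be illustrated with length-prefixed framing. The sketch below is a hypothetical illustration and not NRD's actual wire format. It serializes results as JSON and prefixes the payload with a 4-byte length. It also rejects payloads above an assumed 16K cap (`MAX_PAYLOAD` is a name chosen here). NRD encrypts the data on top of this, and the sketch leaves that step out.

```python
import json
import struct

# Hypothetical cap based on the 16K limit quoted above; the exact
# boundary NRD enforces is an assumption.
MAX_PAYLOAD = 16 * 1024

def frame(results):
    """Serialize results as JSON and prefix the payload with its length."""
    payload = json.dumps(results).encode("utf-8")
    if len(payload) > MAX_PAYLOAD:
        raise ValueError(f"payload of {len(payload)} bytes exceeds {MAX_PAYLOAD}")
    # 4-byte big-endian length header, followed by exactly that many bytes,
    # so only the data actually needed goes on the wire.
    return struct.pack("!I", len(payload)) + payload

def unframe(data):
    """Read one length-prefixed frame; return (results, remaining bytes)."""
    (length,) = struct.unpack("!I", data[:4])
    payload = data[4:4 + length]
    if len(payload) != length:
        raise ValueError("incomplete frame")
    return json.loads(payload.decode("utf-8")), data[4 + length:]
```

JSON escapes embedded newlines, so a multi-line plugin output survives the round trip unchanged. A fixed-size packet would instead reserve space for the largest possible output on every send.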

Architecture

Server

The server daemon is /usr/local/nagios/bin/nrd. This is started and stopped with the rest of Opsview Monitor using /etc/init.d/opsview.
The configuration file used is /usr/local/nagios/etc/nrd.conf. However, this file is regenerated by every Opsview Monitor reload, so any changes made directly to it will be overwritten.
Opsview Monitor runs NRD in prefork mode, with up to 12 nrd processes created dynamically as required. If the configuration changes (for example, if the shared password is changed), the nrd process must be restarted.
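You can think of dynamic preforking as a spare-server policy. The server keeps a few idle children ready and forks more as they become busy, never exceeding the cap. This is a hypothetical sketch. The function name, the `min_spare` parameter and the decision rule are all assumptions. Only the limit of 12 processes comes from this article.

```python
# Only the cap of 12 comes from the article; the policy itself is assumed.
MAX_SERVERS = 12

def servers_to_spawn(busy, idle, min_spare=2, max_servers=MAX_SERVERS):
    """Return how many extra nrd children to fork so that at least
    `min_spare` are idle, without exceeding `max_servers` in total."""
    total = busy + idle
    wanted = max(0, min_spare - idle)          # children missing from the idle pool
    headroom = max(0, max_servers - total)     # room left under the cap
    return min(wanted, headroom)
```

For example, with 11 busy children and none idle, only one more can be forked before the cap is reached.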
The server logs information to /var/log/opsview/opsviewd.log. To increase logging, edit /usr/local/nagios/etc/nrd.conf and set the log_level attribute to 4 for maximum debug output; this requires a restart of the NRD daemon. Because Log4perl has its own filtering, you also need to add the following line to /usr/local/nagios/etc/Log4perl.conf:

log4perl.logger.nrd=DEBUG

Changes to this file do not require a restart of NRD, but can take up to 30 seconds to take effect.
Information sent between the client and server is encrypted using a shared password, which can be changed in the opsview.conf file.

Client

The client is implemented as a daemon called import_slaveresultsd. You can also send results to NRD on the master manually with send_nrd:

printf "host1\t0\toutput message" | /usr/local/nagios/bin/send_nrd -c /usr/local/nagios/etc/send_nrd.cfg

However, import_slaveresultsd is a long-running daemon that already includes all the necessary libraries, so it avoids the invocation penalty of running send_nrd for every result.
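For scripting, the `printf | send_nrd` invocation above can be wrapped as follows. This sketch only covers the three-field host result format shown in the example (host, return code, output). The helper names are hypothetical. Each call to `send_host_result` starts a new send_nrd process, and that per-result cost is exactly what the daemon avoids.

```python
import subprocess

SEND_NRD = "/usr/local/nagios/bin/send_nrd"
SEND_NRD_CFG = "/usr/local/nagios/etc/send_nrd.cfg"

def host_result_line(host, return_code, output):
    """Build one tab-separated host result line, as in the printf example."""
    for field in (host, output):
        if "\t" in field:
            raise ValueError("fields must not contain tab characters")
    return f"{host}\t{return_code}\t{output}"

def send_host_result(host, return_code, output):
    """Pipe a single result to send_nrd (spawns one process per call)."""
    line = host_result_line(host, return_code, output)
    subprocess.run([SEND_NRD, "-c", SEND_NRD_CFG], input=line, text=True, check=True)
```

`check=True` makes a non-zero exit from send_nrd raise `subprocess.CalledProcessError` rather than failing silently.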
The daemon continually checks the directory /usr/local/nagios/var/slaveresults for new files. Nagios Core writes results to this directory every five seconds, and each file is named after the timestamp of its creation.
When the daemon finds files, it processes them oldest first. For each file, it will:

  • discard the file if it is older than the value of $slave_results_max_cache in opsview.conf (five minutes by default)
  • otherwise, send the file up to the master
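One pass of the daemon's loop can be sketched as follows. This is a hypothetical Python illustration and not import_slaveresultsd's actual code. It orders files by modification time rather than by parsing timestamped file names. The `send` argument is a placeholder for the upload to the master. When a send fails, the file stays in place and is retried on the next pass, which mirrors the transactional retry described earlier.

```python
import os
import time

RESULTS_DIR = "/usr/local/nagios/var/slaveresults"
MAX_CACHE = 300  # $slave_results_max_cache default: five minutes

def process_once(results_dir, send, max_cache=MAX_CACHE, now=None):
    """One pass over the results directory, oldest file first.

    `send` is a hypothetical callable that uploads one file to the master
    and returns True on success. Returns (sent, discarded) lists of names.
    """
    now = time.time() if now is None else now
    paths = [os.path.join(results_dir, n) for n in os.listdir(results_dir)]
    paths = [p for p in paths if os.path.isfile(p)]
    paths.sort(key=os.path.getmtime)  # oldest first
    sent, discarded = [], []
    for path in paths:
        if now - os.path.getmtime(path) > max_cache:
            os.remove(path)  # too old to be useful: drop it
            discarded.append(os.path.basename(path))
        elif send(path):
            os.remove(path)  # removed only once the master has accepted it
            sent.append(os.path.basename(path))
        else:
            break  # keep this and later files; retry on the next pass
    return sent, discarded
```

A daemon would call `process_once(RESULTS_DIR, send)` in a loop with a short sleep between passes. Stopping at the first failure, rather than skipping ahead, keeps results in order.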

The client log file is located at /usr/local/nagios/var/log/opsview-slave.log.
You can increase the debugging of import_slaveresultsd by uncommenting the following line in /usr/local/nagios/etc/Log4perl-slave.conf:

log4perl.logger.import_slaveresultsd=DEBUG

This does not require a restart of the daemon, though it can take up to 30 seconds to take effect.
Be aware that, while debugging is enabled, every successfully sent result file is archived to the directory /usr/local/nagios/var/slaveresults.archive, so remember to disable debugging again afterwards.

Communication Flow

When a client connects, it exchanges some initial information with the server before any results are sent. Internally, the results are sent as JSON data, encrypted.
The server receives each result and writes the results into Nagios Core's checkresults directory, bypassing the Nagios Core named pipe, which is a known bottleneck. The results are not considered 'ready' until the client signals the end of the data, which ensures data integrity.
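The "not ready until end of data" behaviour fits Nagios Core's check result spool. Nagios Core processes a result file there only once a matching `.ok` marker file exists. The sketch below assumes that mechanism. The JSON field names (`host`, `service`, `return_code`, `output`, `timestamp`) are hypothetical, as is the exact set of keys written. Both may differ from what NRD actually writes, and the escaping of multi-line output is also an assumption.

```python
import json
import os
import secrets
import string
import time

def _new_result_file(checkresults_dir):
    """Create a uniquely named 'cXXXXXX' file, the naming Nagios Core uses."""
    alphabet = string.ascii_letters + string.digits
    while True:
        name = "c" + "".join(secrets.choice(alphabet) for _ in range(6))
        path = os.path.join(checkresults_dir, name)
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            continue  # name collision: try another
        return fd, path

class CheckResultBatch:
    """Accumulate one client's results; publish them only on commit()."""

    def __init__(self, checkresults_dir):
        fd, self.path = _new_result_file(checkresults_dir)
        self.fh = os.fdopen(fd, "w")

    def add(self, message):
        """Append one JSON-encoded (already decrypted) result to the file."""
        r = json.loads(message)
        ts = float(r.get("timestamp", time.time()))  # time stored on the slave
        self.fh.write("### Passive Check Result ###\n")  # comment line
        self.fh.write(f"host_name={r['host']}\n")
        if r.get("service"):
            self.fh.write(f"service_description={r['service']}\n")
        self.fh.write("check_type=1\n")  # 1 = passive check
        self.fh.write(f"start_time={ts:.6f}\nfinish_time={ts:.6f}\n")
        self.fh.write(f"return_code={r['return_code']}\n")
        # Assumption: multi-line output is kept on one line as escaped "\n".
        self.fh.write("output=" + r["output"].replace("\n", "\\n") + "\n\n")

    def commit(self):
        """Client signalled end of data: close the file, then create the .ok marker."""
        self.fh.close()
        open(self.path + ".ok", "w").close()

    def abort(self):
        """Transfer failed: discard everything so Nagios never sees a partial batch."""
        self.fh.close()
        os.remove(self.path)
```

Creating the marker only after the file is closed means Nagios Core never reads a half-written batch. Calling `abort()` on a failed transfer leaves nothing behind for it to pick up.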

Monitoring

Monitoring is done on the Opsview Monitor master via the Slave-node checks. These return the number of backlogged files in the results directory, and also alert if the oldest file is older than 70 seconds, as this indicates that the daemon is not working.
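The backlog check can be sketched in the style of a Nagios plugin. This is an illustration and not the actual Slave-node check. The function name is hypothetical, file age is taken from modification time, and returning CRITICAL when the oldest file exceeds 70 seconds is an assumed severity.

```python
import os
import time

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes

def check_slave_results(results_dir, max_age=70, now=None):
    """Return (exit_code, message) for the slave results backlog."""
    now = time.time() if now is None else now
    files = [os.path.join(results_dir, n) for n in os.listdir(results_dir)]
    files = [f for f in files if os.path.isfile(f)]
    if not files:
        return OK, "OK - 0 backlogged files"
    oldest_age = now - min(os.path.getmtime(f) for f in files)
    msg = f"{len(files)} backlogged files, oldest {oldest_age:.0f}s"
    if oldest_age > max_age:
        # A stale file means import_slaveresultsd is not draining the queue.
        return CRITICAL, f"CRITICAL - {msg}"
    return OK, f"OK - {msg}"
```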

Troubleshooting

Slave results not getting to master

If you get errors on the master in /var/log/opsview/opsviewd.log like:

[2012/01/31 13:48:01] [nrd] [ERROR] Couldn't unserialize a request: malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "\x{19cb}\x{285}E\x{529}...") at /usr/local/nagios/bin/../perl/lib/NRD/Serialize/plain.pm line 28, <GEN25> line 4.

Then restart NRD on the master with /etc/init.d/opsview restart.
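To check whether the master is hitting this error, you can scan the log for the message. This is a minimal sketch that assumes the message format shown above. The function name is hypothetical.

```python
import re

UNSERIALIZE_ERROR = re.compile(r"\[nrd\] \[ERROR\] Couldn't unserialize a request")

def find_unserialize_errors(lines):
    """Return the log lines that contain the NRD unserialize error."""
    return [line for line in lines if UNSERIALIZE_ERROR.search(line)]
```

For example, `find_unserialize_errors(open("/var/log/opsview/opsviewd.log"))` lists the matching lines, trailing newlines included.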
