Opsview Knowledge Center

Graphing Data Engine

Learn about Opsview Monitor's Graphing Data Engine

Timeseries

The Opsview Monitor Timeseries graphing data engine, included in Opsview Monitor from version 5.2.0, is a flexible service for storing the data used by the graphing features in the UI.

In the default configuration, all data is stored on the master server, exactly as in previous versions. However, if you experience high I/O or load on the master server, the graphing data engine can now be moved onto another server.

Installation

The graphing data engine is provided in 4 packages that are all installed via your normal OS package manager. They will be installed by default on the master server.

  • opsview-timeseries - request dispatcher
  • opsview-timeseries-enqueuer - request queuing and caching daemon
  • opsview-timeseries-lib - shared libraries between the other timeseries packages
  • opsview-timeseries-rrd - provides the RRD based data storage

All of these packages install under /opt/opsview and the directory names match the package names:

  • timeseries
  • timeseriesenqueuer
  • timeserieslib
  • timeseriesrrd

Each package uses the same directory structure, and they all log to syslog (usually into log files within /var/log, depending on how your system is configured).

Processes

All of the timeseries processes are stopped and started using the Opsview Monitor Watchdog. You can check them by running the following as the nagios user:

$ opsview_watchdog summary
+----------------------------------------+------------+-------------------+
| Service                                | Status     | Monitoring Status |
+----------------------------------------+------------+-------------------+
.... cut ....
| Process 'opsview-timeseriesrrdupdates' | Running    | Monitored         |
| Process 'opsview-timeseriesrrdqueries' | Running    | Monitored         |
| Process 'opsview-timeseriesenqueuer'   | Running    | Monitored         |
| Process 'opsview-timeseries'           | Running    | Monitored         |

The processes can be stopped, started and restarted individually, if required, e.g.:

$ opsview_watchdog opsview-timeseries restart

Configuration

All the daemon packages (that is, all packages except opsview-timeseries-lib) provide two configuration files in their etc directory (for example, /opt/opsview/timeseries/etc):

  • <package name>.defaults.yaml contains all the default settings for the package. Do not modify this file in any way: any changes to it will be lost when the package is upgraded.
  • <package name>.yaml.example can be copied to <package name>.yaml and amended for local configuration changes. The local <package name>.yaml file is not overwritten on upgrade.

If you need to change any of the default settings, copy the specific lines into the locally copied <package name>.yaml file.

These are YAML files, so the formatting is strict: indent lines with spaces, not tabs. After making changes, restart the relevant daemon using opsview_watchdog as outlined in the Processes section above.
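As a minimal sketch of the steps above, the following shell functions create a local <package name>.yaml from the shipped example without overwriting an existing one, and check a file for tab characters. The function names are hypothetical helpers, not Opsview commands; the etc directory follows the /opt/opsview/<package>/etc layout described above.

```shell
#!/usr/bin/env bash
# Sketch only: make_local_config and check_no_tabs are hypothetical helpers,
# not part of Opsview Monitor.

# Copy <package>.yaml.example to <package>.yaml in the given etc directory,
# refusing to overwrite a local file that already exists.
make_local_config() {
    local etc_dir=$1 pkg=$2
    if [ -e "$etc_dir/$pkg.yaml" ]; then
        echo "$etc_dir/$pkg.yaml already exists; leaving it unchanged" >&2
        return 0
    fi
    cp "$etc_dir/$pkg.yaml.example" "$etc_dir/$pkg.yaml"
}

# Succeed only if the file contains no tab characters; otherwise print the
# offending lines (with line numbers) and fail.
check_no_tabs() {
    local file=$1 tab
    tab=$(printf '\t')
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    if grep -n "$tab" "$file"; then
        echo "tab characters found in $file; indent with spaces instead" >&2
        return 1
    fi
}
```

For example, as a user with write access to the etc directory: make_local_config /opt/opsview/timeseries/etc timeseries && check_no_tabs /opt/opsview/timeseries/etc/timeseries.yaml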

Moving Timeseries to another server

There are a number of steps involved in moving Timeseries to another server.

The first step will be to manually install the 4 packages (and their prerequisites) on the new server as per the Installation section above (you cannot use the autoinstall method for this at this time). You should not remove any of the timeseries packages from the master.

If you already have graphing data on your master server, you must transfer all of the files to the new Timeseries server using rsync (or similar); otherwise, all graphing history will be lost. On a master upgraded from a version earlier than 5.2, the data is in /usr/local/nagios/var/rrd (on a new 5.2 or later installation it is in /opt/opsview/timeseriesrrd/var/data). On a newly installed, separate Timeseries server, the target directory is /opt/opsview/timeseriesrrd/var/data.
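Because the data can live in either of two places, it helps to confirm which directory actually contains RRD files before copying. A minimal sketch, assuming only the two default locations named in this article; find_rrd_dir is a hypothetical helper, and NEW_SERVER in the rsync comment is a placeholder for your Timeseries server.

```shell
#!/usr/bin/env bash
# Sketch only: find_rrd_dir is a hypothetical helper, not an Opsview command.
# Print the first default RRD directory that holds at least one value.rrd file.
# $1 is an optional filesystem root prefix (empty by default), useful for testing.
find_rrd_dir() {
    local root=${1:-} dir
    for dir in /usr/local/nagios/var/rrd /opt/opsview/timeseriesrrd/var/data; do
        # An empty or missing directory is skipped, so a leftover empty
        # directory does not hide the one that really holds the data.
        if [ -d "$root$dir" ] && [ -n "$(find "$root$dir" -name value.rrd -print -quit)" ]; then
            echo "$dir"
            return 0
        fi
    done
    echo "no RRD data found in the default locations" >&2
    return 1
}

# Example transfer, run as root on the master (NEW_SERVER is a placeholder);
# the trailing slash on the source copies the directory's contents:
#   src=$(find_rrd_dir) && rsync -a "$src/" NEW_SERVER:/opt/opsview/timeseriesrrd/var/data/
```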

On the new Timeseries server you must amend /opt/opsview/timeseries/etc/timeseries.yaml to set the correct listening address (as by default Timeseries only listens on the loopback interface). To do this, amend the file as follows:

timeseries:
    server:
        host: 0.0.0.0

and restart the timeseries daemons - you can do this as root by running:

/opt/opsview/watchdog/bin/opsview-monit restart all

Then, you must amend /usr/local/nagios/etc/opsview.conf on the Opsview Monitor server to set the following variable to point to the correct server IP address and port:

$timeseries_url = 'http://192.168.10.22:1600';

and restart all Opsview Monitor daemons - you can do this as root by running:

/opt/opsview/watchdog/bin/opsview-monit restart all
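Before shutting down the daemons on the master, you may want to confirm that the master can reach the new Timeseries server at the address you configured. A minimal sketch, assuming $timeseries_url is assigned on a single line as in the example above and includes an explicit port; the helpers are hypothetical, and the connection test relies on bash's /dev/tcp feature and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical, not Opsview tools.

# Print the last value assigned to $timeseries_url in opsview.conf.
get_timeseries_url() {
    local conf=${1:-/usr/local/nagios/etc/opsview.conf}
    sed -n "s/^[[:space:]]*\\\$timeseries_url[[:space:]]*=[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$conf" | tail -n 1
}

# Split a URL such as http://192.168.10.22:1600 into "host port".
url_host_port() {
    local hostport=${1#*://}      # strip the scheme
    hostport=${hostport%%/*}      # strip any path
    echo "${hostport%:*} ${hostport##*:}"
}

# Open a TCP connection with bash's /dev/tcp; timeout stops it hanging on a
# filtered port. Returns 0 if the connection succeeds.
check_timeseries_port() {
    local hp
    hp=$(url_host_port "$1")
    timeout 5 bash -c "</dev/tcp/${hp% *}/${hp#* }" 2>/dev/null
}
```

For example, on the master: check_timeseries_port "$(get_timeseries_url)" && echo reachable || echo "not reachable"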

Finally, shut down all of the Timeseries daemons on the master server:

/opt/opsview/watchdog/bin/opsview-monit stop opsview-timeseriesrrdupdates
/opt/opsview/watchdog/bin/opsview-monit stop opsview-timeseriesrrdqueries
/opt/opsview/watchdog/bin/opsview-monit stop opsview-timeseriesenqueuer
/opt/opsview/watchdog/bin/opsview-monit stop opsview-timeseries

Graphing data should now be provided from the new Timeseries server.

Data Flow

The daemon process 'import_perfdatarrd' reads files from /usr/local/nagios/var/perfdatarrd and then passes the data on to the timeseries manager daemon on port 1600 on the configured host (localhost by default).

The Timeseries manager process launches and monitors worker processes (four by default), which parse and dispatch incoming requests. Write requests (new metrics from import_perfdatarrd) are dispatched to the Timeseries Enqueuer (localhost port 1620 by default), while queries are dispatched to Timeseries RRD Queries (localhost port 1660 by default).

The Timeseries Enqueuer passes the data to all configured RRD Updater workers simultaneously (localhost ports 1640-1643 by default).
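To check that the components described above are listening, you can compare the default ports against the sockets reported by ss (from iproute2). A minimal sketch, assuming all components run on the local host with their default ports; if you have moved components to other servers, pass the ports you expect on each host instead. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Default ports from the data flow above: manager 1600, enqueuer 1620,
# RRD updaters 1640-1643, RRD queries 1660.
DEFAULT_TIMESERIES_PORTS="1600 1620 1640 1641 1642 1643 1660"

# Sketch only (hypothetical helper): print "<port> LISTEN" or "<port> MISSING"
# for each expected port, based on the listening TCP sockets from 'ss -ltn'.
check_timeseries_listeners() {
    local listening port
    # Column 4 of 'ss -ltn' is Local Address:Port; keep only the port after
    # the last colon (this also handles IPv6 forms such as [::1]:1620).
    listening=$(ss -ltn | awk 'NR > 1 { n = split($4, a, ":"); print a[n] }')
    for port in ${1:-$DEFAULT_TIMESERIES_PORTS}; do
        if printf '%s\n' "$listening" | grep -qx "$port"; then
            echo "$port LISTEN"
        else
            echo "$port MISSING"
        fi
    done
}
```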

The timeseries RRD update worker writes out the data into the rrd files. On an upgraded system (one that previously ran an older version of Opsview Monitor) the rrd files are stored in /usr/local/nagios/var/rrd/<hostname>/<servicename>/<metric>/value.rrd, whereas on a new Opsview Monitor 5.2 (or later) installation, the data is stored in /opt/opsview/timeseriesrrd/var/data/<hostname>/<servicename>/<metric>/value.rrd.
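Given the <hostname>/<servicename>/<metric>/value.rrd layout, a short find is enough to see which metrics are stored for a host. A minimal sketch; list_host_metrics is a hypothetical helper, and it reports directory names exactly as they appear on disk, which may differ from the names shown in the UI if special characters are encoded.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper): list "<servicename>/<metric>" for every
# value.rrd stored under <base>/<hostname>, sorted.
list_host_metrics() {
    local base=$1 host=$2
    # ./<servicename>/<metric>/value.rrd sits exactly three levels down.
    ( cd "$base/$host" && find . -mindepth 3 -maxdepth 3 -type f -name value.rrd ) \
        | sed -e 's|^\./||' -e 's|/value\.rrd$||' | sort
}
```

For example: list_host_metrics /opt/opsview/timeseriesrrd/var/data switch1.opsview.com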

The timeseries manager, enqueuer and RRD writer daemons can all be installed on separate hosts. However, to reduce network bandwidth usage, it is generally better to keep the enqueuer and RRD daemons on the same machine.

Data Storage - RRD

When using RRD (Round Robin Database), numerical values are stored in "time buckets" so there is a single value for each of these buckets. These are the default values used by Opsview:

  • Expects a 5 minute interval for values
  • Will keep 5 minute buckets for the last 50 hours
  • Will keep 30 minute buckets for the last 2 weeks
  • Will keep 2 hour buckets for 2 months
  • Will keep 1 day buckets for 2 years

This means the resolution of data gradually gets "thinned out" over time. When calculating a "bigger bucket" (such as taking six 5 minute buckets and consolidating into a single 30 minute bucket), the average value will be used.
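The retention periods above translate into a fixed number of buckets (rows) per archive: the retention period divided by the bucket size. A minimal sketch of that arithmetic, assuming a 300-second step, 2 months = 60 days and 2 years = 730 days; the rrdtool command in the comment is an illustrative definition of the same shape, not necessarily the one Opsview creates.

```shell
#!/usr/bin/env bash
# Number of buckets needed to cover a retention period, rounded up.
rra_rows() {
    local bucket=$1 retention=$2
    echo $(( (retention + bucket - 1) / bucket ))
}

# Print steps-per-bucket and rows for the default retention listed above,
# assuming a 300 s step, 2 months = 60 days and 2 years = 730 days.
print_default_rras() {
    local step=300 bucket retention
    while read -r bucket retention; do
        echo "bucket=${bucket}s steps=$(( bucket / step )) rows=$(rra_rows "$bucket" "$retention")"
    done <<EOF
300 $((50 * 3600))
1800 $((14 * 86400))
7200 $((60 * 86400))
86400 $((730 * 86400))
EOF
}

# Illustrative rrdtool definition with the same shape (the data source name
# "value" and the 0.5 xff are assumptions, not Opsview's exact settings):
#   rrdtool create example.rrd --step 300 DS:value:GAUGE:4200:U:U \
#     RRA:AVERAGE:0.5:1:600 RRA:AVERAGE:0.5:6:672 \
#     RRA:AVERAGE:0.5:24:720 RRA:AVERAGE:0.5:288:730
```

With those assumptions, the four archives hold 600, 672, 720 and 730 buckets, and each 30 minute bucket is the average of 6 five-minute values.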

Note that the "RRD heartbeat" is set to 4200 seconds (an hour and 10 minutes) by default. If no value is received for longer than this, there will be a gap in the data. If a value is received within this time, all the buckets since the previous value will be filled with this value.
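4200 seconds is 70 minutes (4200 / 60), hence "an hour and 10 minutes". The rule can be sketched as a comparison between consecutive update timestamps (epoch seconds): an interval longer than the heartbeat becomes a gap, anything else is filled. This is a simplified model of RRD behaviour, and classify_interval is a hypothetical helper.

```shell
#!/usr/bin/env bash
HEARTBEAT=4200   # default RRD heartbeat in seconds (70 minutes)

# Sketch only: classify the interval between two consecutive updates.
# Prints "gap" if it exceeds the heartbeat (stored as unknown), else "filled".
classify_interval() {
    local previous=$1 current=$2 heartbeat=${3:-$HEARTBEAT}
    if [ $(( current - previous )) -gt "$heartbeat" ]; then
        echo gap
    else
        echo filled
    fi
}
```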

InfluxDB

Introduction

InfluxDB is a timeseries database created by InfluxData. It is part of their set of tools for collecting, storing and visualising performance data and raising alerts on it. We do not provide InfluxDB directly; instead, we provide a client component that communicates with InfluxDB to store and query data.

Opsview Monitor 5.3 and later provides full support for InfluxDB versions 1.1.x and 1.2.x, including the required configuration options.

RRD remains the default timeseries engine.

InfluxDB differs from RRD in the following ways:

  • InfluxDB stores the raw values received, whereas RRD averages them according to the intervals it is defined with. This means RRD may return non-round numbers for things that should be whole (e.g. number of bits transferred or number of users), whereas InfluxDB returns whole numbers when the granularity is small enough (there may still be fractional numbers when querying, for example, the average over a whole day). For example, for a plugin that returns the hour in which it runs, RRD shows an average value of 9.420 at 10:00, whereas InfluxDB shows the value 10 at 10:00.

  • RRD has a value for every time slot going back over the last year, even if that value is NULL. InfluxDB only returns NULL points when it has some data for the requested range.
  • For counters, RRD stores the last counter value and records the difference based on the step size. InfluxDB stores the actual value of each counter reading and returns the derivative at query time. If a counter is reset, the difference from the previous value is negative. This can be a normal scenario (e.g. a device restart resets its counters), so in these cases Opsview assumes the same rate as the previous value. For an initial difference that is negative, Opsview returns a NULL point.
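The counter rule can be illustrated with a small awk filter that turns raw counter readings into per-second rates, reusing the previous rate after a reset. This is a sketch of the behaviour described above, not Opsview's implementation; it treats the "initial" negative case as the first computed difference being negative, which is an assumption. For comparison, InfluxQL's built-in NON_NEGATIVE_DERIVATIVE() function simply drops negative rates.

```shell
#!/usr/bin/env bash
# Sketch only: counter_rates is a hypothetical helper.
# Input:  lines of "<epoch-seconds> <raw-counter-value>", oldest first.
# Output: "<epoch-seconds> <per-second rate>" for every point after the first,
#         or "<epoch-seconds> NULL" when no rate can be derived.
counter_rates() {
    awk '
        NR == 1 { prev_t = $1; prev_v = $2; next }
        {
            dt = $1 - prev_t
            diff = $2 - prev_v
            if (dt <= 0) {
                print $1, "NULL"        # timestamps must increase
            } else if (diff >= 0) {
                rate = diff / dt
                have_rate = 1
                print $1, rate
            } else if (have_rate) {
                print $1, rate          # counter reset: reuse the previous rate
            } else {
                print $1, "NULL"        # negative with no earlier rate to reuse
            }
            prev_t = $1; prev_v = $2
        }'
}
```

For example, readings of 100, 700, 1600, 50 and 350 taken 300 seconds apart give rates of 2, 3, 3 (reset, previous rate reused) and 1 per second.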

Migration from RRDTool

Make sure your system is running the latest Opsview Monitor 5.3 packages (see our Installation/Upgrade instructions). Example commands for Debian-based OSes:

  1. Upgrade all Opsview packages

    sudo apt-get update
    sudo apt-get install opsview-timeseries opsview-timeseries-enqueuer opsview-timeseries-rrd 
    sudo apt-get install opsview
    
  2. Pause performance metrics processing

    # stop processing new performance data
    sudo /opt/opsview/watchdog/bin/opsview-monit stop import_perfdatarrd
    # stop Timeseries RRD
    sudo /opt/opsview/watchdog/bin/opsview-monit stop opsview-timeseriesrrdupdates
    sudo /opt/opsview/watchdog/bin/opsview-monit stop opsview-timeseriesrrdqueries
    
  3. Extract the existing timeseries data from the RRD files (this may take a while!)

    sudo -iu opsview /opt/opsview/timeseriesrrd/utils/rrd2perfdata.pl -i $PATH_TO_RRD_DIR$ -o $OUTPUT_DIR$
    

    You will need to identify the correct paths for your installation:

    # most common paths
    PATH_TO_RRD_DIR
      /usr/local/nagios/var/rrd (systems upgraded to 5.3 from a version earlier than 5.2)
      /opt/opsview/timeseriesrrd/var/data (default for new installations of 5.2 or later)
    OUTPUT_DIR
      /tmp/rrd_data
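Since extraction time grows with the number of metrics, it can be useful to count the RRD files before starting. A minimal sketch; count_rrd_files is a hypothetical helper, and it relies on each metric being stored in its own value.rrd file, as described under Data Flow.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper): count value.rrd files below a directory.
# tr removes the padding some wc implementations add.
count_rrd_files() {
    find "$1" -type f -name value.rrd | wc -l | tr -d ' '
}

# Example, using one of the common paths listed above:
#   PATH_TO_RRD_DIR=/usr/local/nagios/var/rrd
#   count_rrd_files "$PATH_TO_RRD_DIR"
```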
    

Installing InfluxDB

  1. Download the package for your platform and follow the documentation from InfluxDB: https://docs.influxdata.com/influxdb/v1.2/introduction/installation
  2. Create the opsview database

    curl -i -XPOST http://127.0.0.1:8086/query --data-urlencode "q=CREATE DATABASE opsview"
    
  3. Install the InfluxDB timeseries package. Note that this will also uninstall the RRD timeseries components if they are installed!

    sudo apt-get install opsview-timeseries-influxdb
    
  4. Configure the daemon by copying the example configuration file

    sudo -iu opsview cp /opt/opsview/timeseriesinfluxdb/etc/timeseriesinfluxdb.yaml.example /opt/opsview/timeseriesinfluxdb/etc/timeseriesinfluxdb.yaml
    
  5. The example file should work without any modifications, but please verify it, then restart the services

    sudo -iu opsview vim /opt/opsview/timeseriesinfluxdb/etc/timeseriesinfluxdb.yaml
    # restart services
    sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-timeseriesinfluxdbupdates
    sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-timeseriesinfluxdbqueries
    
  6. Restore the previous timeseries data

    # copy files into the processing directory ($OUTPUT_DIR$ is the output path used in the extraction step)
    find $OUTPUT_DIR$/perfdatarrd/ -type f -print0 | sudo xargs -0 -I{} cp -v {} /usr/local/nagios/var/perfdatarrd/
    sudo chown -R nagios:nagios /usr/local/nagios/var/perfdatarrd/
    # restore history
    sudo -iu nagios /usr/local/nagios/utils/import_servicecheck_interval_history $OUTPUT_DIR$/interval-history.tsv
    
  7. Configure Opsview

    # set timeseries providers
    sudo -iu nagios vim /usr/local/nagios/etc/opsview.conf
    # and add following line:
    $timeseries_provider = "influxdb";
    # restart opsview processes
    sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-web
    sudo /opt/opsview/watchdog/bin/opsview-monit restart opsviewd
    sudo /opt/opsview/watchdog/bin/opsview-monit restart import_ndologsd
    sudo /opt/opsview/watchdog/bin/opsview-monit restart import_ndoconfigend
    # reload opsview - this is required to get the correct interval for 
    # the current service checks.
    sudo -iu nagios /usr/local/nagios/bin/rc.opsview gen_config
    

    At this point you should be able to use the UI and see historical graphing data. If you cannot, check that the timeseries daemons are all running using 'opsview_watchdog' and perform a reload in the UI before continuing.

  8. Enable Timeseries Processing

    # restart processing
    sudo /opt/opsview/watchdog/bin/opsview-monit start import_perfdatarrd
    
  9. Re-process the migrated data. Depending on the number of metrics migrated, this may take some time. The following command shows the current progress:

    # report on number of files remaining to be processed
    watch "sudo ls /usr/local/nagios/var/perfdatarrd | wc -l"
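
As an alternative to watch, the following sketch polls the processing directory and returns once the backlog has drained. wait_for_drain is a hypothetical helper; the default path is the perfdatarrd directory used above. Because new check results keep arriving during re-processing, a small non-zero threshold may suit your system better than 0 (an assumption to tune). Run it as root or the nagios user, since the directory may not be readable otherwise.

```shell
#!/usr/bin/env bash
# Sketch only (hypothetical helper): poll a directory until at most THRESHOLD
# files remain, printing the remaining count on each pass.
# Usage: wait_for_drain [DIR] [THRESHOLD] [INTERVAL_SECONDS]
wait_for_drain() {
    local dir=${1:-/usr/local/nagios/var/perfdatarrd} threshold=${2:-0} interval=${3:-30}
    local remaining
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    while :; do
        remaining=$(find "$dir" -mindepth 1 -maxdepth 1 -type f | wc -l | tr -d ' ')
        echo "$(date '+%H:%M:%S') $remaining files remaining"
        if [ "$remaining" -le "$threshold" ]; then
            return 0
        fi
        sleep "$interval"
    done
}
```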
    

Troubleshooting

Drop the whole database and recreate it

# drop
curl -i -XPOST http://127.0.0.1:8086/query --data-urlencode "q=DROP DATABASE opsview"
sudo rm -rf /opt/opsview/timeseriesinfluxdb/var/data/*
# recreate
curl -i -XPOST http://127.0.0.1:8086/query --data-urlencode "q=CREATE DATABASE opsview"
# restart services
sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-timeseriesinfluxdbupdates
sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-timeseriesinfluxdbqueries

Drop all metrics for a specific host

$ influx -database opsview
Connected to http://localhost:8086 version 1.2.0
InfluxDB shell version: 1.2.0
> DROP MEASUREMENT "switch1.opsview.com";

Drop all metrics for a specific service check on a specific host

$ influx -database opsview
Connected to http://localhost:8086 version 1.2.0
InfluxDB shell version: 1.2.0
> DELETE FROM "switch1.opsview.com" WHERE service = 'Connectivity - Lan';

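NOTE: Single and double quotes are not interchangeable! In InfluxQL, double quotes enclose identifiers such as measurement names, while single quotes enclose string values. See the InfluxDB documentation for more information.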
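The same deletion can be sent over the InfluxDB 1.x HTTP API, which makes the quoting rules explicit: the measurement (host) name is double-quoted and the service tag value is single-quoted. A minimal sketch, assuming InfluxDB listens on 127.0.0.1:8086 as in the examples above and that the names contain no quote characters themselves; the function names are hypothetical.

```shell
#!/usr/bin/env bash
INFLUX_URL=${INFLUX_URL:-http://127.0.0.1:8086}
INFLUX_DB=${INFLUX_DB:-opsview}

# Build the InfluxQL statement: "measurement" in double quotes,
# 'string value' in single quotes.
delete_service_metrics_query() {
    printf 'DELETE FROM "%s" WHERE service = '\''%s'\''' "$1" "$2"
}

# Send it to the /query endpoint; DELETE statements must use POST.
delete_service_metrics() {
    curl -sS -XPOST "$INFLUX_URL/query?db=$INFLUX_DB" \
        --data-urlencode "q=$(delete_service_metrics_query "$1" "$2")"
}
```

For example: delete_service_metrics switch1.opsview.com 'Connectivity - Lan'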
