Timeseries Graphing Engine

Learn about the Opsview Monitor graphing data engine

Timeseries

The Timeseries graphing data engine, included in Opsview Monitor from version 5.2.0, provides a flexible service for storing the data used by the graphing services in the UI.

In the default configuration, all data is stored on the master server in exactly the same way as in previous versions. However, if you experience high IO or load on the master server, the graphing data engine can now be moved onto another server.

Installation

The graphing data engine is provided in 4 packages that are all installed via your normal OS package manager. They will be installed by default on the master server.

  • opsview-timeseries - request dispatcher
  • opsview-timeseries-enqueuer - request queuing and caching daemon
  • opsview-timeseries-lib - libraries shared by the other timeseries packages
  • opsview-timeseries-rrd - provides the RRD based data storage

All of these packages install under /opt/opsview, in directories named after the packages (without the opsview- prefix and without hyphens):

  • timeseries
  • timeseriesenqueuer
  • timeserieslib
  • timeseriesrrd

Each package uses the same directory structure and they all log to syslog (usually into log files within /var/log, depending on how your system is configured).

Processes

All of the timeseries processes are stopped and started using the Opsview Monitor Watchdog. You can check them by running the following as the opsview user:

$ opsview_watchdog summary
+----------------------------------------+------------+-------------------+
| Service                                | Status     | Monitoring Status |
+----------------------------------------+------------+-------------------+
.... cut ....
| Process 'opsview-timeseriesrrdupdates' | Running    | Monitored         |
| Process 'opsview-timeseriesrrdqueries' | Running    | Monitored         |
| Process 'opsview-timeseriesenqueuer'   | Running    | Monitored         |
| Process 'opsview-timeseries'           | Running    | Monitored         |

The processes can be stopped, started and restarted individually, if required, e.g.:

$ opsview_watchdog opsview-timeseries restart

Configuration

All the daemon packages (i.e. all packages except opsview-timeseries-lib) provide two configuration files within their etc directory:

  • the <package name>.defaults.yaml file contains all default settings for the package. Do not modify this file, as any changes will be lost when the package is upgraded.
  • the <package name>.yaml.example file can be copied to <package name>.yaml and amended for local configuration changes - this file will not be overwritten on an upgrade.

If you need to change any of the default settings, copy the specific lines into the locally copied <package name>.yaml file.

The formatting of these files is very specific - spaces should be used to indent lines, not tabs. When changes have been made, restart the relevant daemon using opsview_watchdog as outlined in the Processes section above.
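Because a stray tab is easy to miss, it can help to check a local configuration file before restarting a daemon. The following is a minimal sketch, not an Opsview tool: check_no_tabs is a hypothetical helper name, and the path in the usage comment is simply the Timeseries configuration file under /opt/opsview.

```shell
# Hypothetical helper (not shipped with Opsview): report any lines in a
# YAML file that contain a tab character. Prints the offending lines with
# their line numbers and returns non-zero if any are found.
check_no_tabs() {
    file="$1"
    tab="$(printf '\t')"
    if grep -n "$tab" "$file"; then
        echo "tabs found in $file" >&2
        return 1
    fi
    return 0
}

# Example usage:
# check_no_tabs /opt/opsview/timeseries/etc/timeseries.yaml
```

If the function returns non-zero, replace the reported tabs with spaces before restarting the daemon.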

Moving Timeseries to another server

There are a number of steps involved in moving Timeseries to another server.

The first step will be to manually install the 4 packages (and their prerequisites) on the new server as per the Installation section above (you cannot use the autoinstall method for this at this time). You should not remove any of the timeseries packages from the master.

If you already have graphing data on your master server, you must transfer all the files to the new timeseries server using rsync (or similar), otherwise all graphing history will be lost. By default, the Timeseries RRD uses the /opt/opsview/timeseriesrrd/var/data directory.
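As an illustration of the transfer, the sketch below wraps rsync in a small function. The function name sync_rrd_data, the opsview login on the remote side and the example address are assumptions for illustration; only the data directory comes from the text above. Setting RSYNC=echo prints the arguments instead of running rsync.

```shell
# Sketch only: copy existing RRD data from the master to the new
# Timeseries server. The remote user "opsview" is an assumption.
sync_rrd_data() {
    new_server="$1"
    data_dir="${2:-/opt/opsview/timeseriesrrd/var/data}"
    # -a preserves permissions, ownership and timestamps; the trailing
    # slashes copy the directory contents rather than the directory itself.
    ${RSYNC:-rsync} -a "$data_dir/" "opsview@${new_server}:$data_dir/"
}

# Dry illustration: print the rsync arguments without transferring anything.
RSYNC=echo sync_rrd_data 192.168.10.22
# prints: -a /opt/opsview/timeseriesrrd/var/data/ opsview@192.168.10.22:/opt/opsview/timeseriesrrd/var/data/
```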

On the new Timeseries server you must amend /opt/opsview/timeseries/etc/timeseries.yaml to set the correct listening address (as by default Timeseries only listens on the loopback interface). To do this, amend the file as follows:

timeseries:
    server:
        host: 0.0.0.0

and restart the timeseries daemons - you can do this as root by running:

/opt/opsview/watchdog/bin/opsview-monit restart all

Then, you must amend /opt/opsview/coreutils/etc/opsview.conf on the Opsview Monitor server to set the following variable to point to the correct server IP address and port:

$timeseries_url = 'http://192.168.10.22:1600';

and restart all Opsview Monitor daemons - you can do this as root by running:

/opt/opsview/watchdog/bin/opsview-monit restart all

Finally, you can shut down all of the Timeseries daemons on the master server:

/opt/opsview/watchdog/bin/opsview-monit unmonitor opsview-timeseriesrrdupdates
/opt/opsview/watchdog/bin/opsview-monit unmonitor opsview-timeseriesrrdqueries
/opt/opsview/watchdog/bin/opsview-monit unmonitor opsview-timeseriesenqueuer
/opt/opsview/watchdog/bin/opsview-monit unmonitor opsview-timeseries

Graphing data should now be provided from the new Timeseries server.

Data Flow

The Results-Performance component reads the results from the MessageQueue and then passes the data on to the Timeseries manager daemon on port 1600 on the configured host (localhost by default).

The Timeseries manager process launches and monitors worker processes (four by default) which are responsible for parsing and dispatching incoming requests. Write requests (adding more metrics from Results-Performance) are dispatched to Timeseries Enqueuers (localhost on port 1620 by default), while the queries are dispatched to Timeseries RRD Queries (localhost port 1660 by default).

Timeseries Enqueuer passes the data to all configured RRD Updater workers simultaneously (localhost on ports 1640-1643 by default).

The Timeseries RRD update worker writes the data out into the RRD files. Opsview Monitor stores RRD data in /opt/opsview/timeseriesrrd/var/data/<hostname>/<servicename>/<metric>/value.rrd.

The Timeseries manager, enqueuer and RRD writer daemons can all be installed on separate hosts. However, to limit network bandwidth usage it is generally better to keep the enqueuer and RRD daemons on the same machine.
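When tracing a problem along this data flow, a quick first check is whether each default port accepts connections. The sketch below is an illustration under assumptions: the function names are hypothetical, the port list assumes the defaults above have not been changed, and the probe relies on the /dev/tcp feature of bash (in other shells every port is reported as closed).

```shell
# Default ports named in the Data Flow section (unmodified configuration).
timeseries_default_ports() {
    echo "1600 1620 1640 1641 1642 1643 1660"
}

# Hypothetical helper: try to open a TCP connection to each default port
# on the given host and report whether it succeeded.
check_timeseries_ports() {
    host="${1:-localhost}"
    for port in $(timeseries_default_ports); do
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            echo "$port open"
        else
            echo "$port closed"
        fi
    done
}

# Example usage:
# check_timeseries_ports localhost
```

Port 1600 is the manager, 1620 the enqueuer, 1640-1643 the RRD update workers and 1660 the RRD queries daemon.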

Data Storage - RRD

When using RRD (Round Robin Database), numerical values are stored in "time buckets" so there is a single value for each of these buckets. These are the default values used by Opsview:

  • Expects a 5 minute interval for values
  • Will keep 5 minute buckets for the last 50 hours
  • Will keep 30 minute buckets for the last 2 weeks
  • Will keep 2 hour buckets for 2 months
  • Will keep 1 day buckets for 2 years

This means the resolution of data gradually gets "thinned out" over time. When calculating a "bigger bucket" (such as taking six 5 minute buckets and consolidating into a single 30 minute bucket), the average value will be used.

Note that the "RRD heartbeat" is set to 4200 seconds (1 hour and 10 minutes) by default. If no value is received within this time, there will be gaps in the data. If a value is received during this time, all the buckets in the last hour and 10 minutes will be filled with this value.
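The retention figures above can be turned into bucket counts with simple arithmetic. The following sketch only illustrates that calculation and is not taken from the Opsview RRD definitions; it assumes "2 months" means 60 days and "2 years" means 730 days, and rows is a hypothetical helper name.

```shell
# Number of buckets in an archive = retention period / bucket size,
# with both values given in seconds.
rows() { echo $(( $1 / $2 )); }

rows $((50 * 3600))   300     # 5 minute buckets for 50 hours      -> 600
rows $((14 * 86400))  1800    # 30 minute buckets for 2 weeks      -> 672
rows $((60 * 86400))  7200    # 2 hour buckets for 60 days         -> 720
rows $((730 * 86400)) 86400   # 1 day buckets for 730 days         -> 730

# The heartbeat of 4200 seconds expressed in minutes and in 5 minute buckets.
echo $((4200 / 60))    # 70
echo $((4200 / 300))   # 14
```

In other words, a single late value can fill up to 14 consecutive 5 minute buckets before a gap appears.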

InfluxDB

Introduction

InfluxDB is a timeseries database created by InfluxData. It is part of their set of tools for collecting, storing, visualising and alerting on performance data. We do not provide InfluxDB itself; instead, we provide a client component that communicates with InfluxDB to query and store data.

Full support for InfluxDB version 1.2.x has been available since 6.0-EA, which includes the required configuration options.

RRD remains the default timeseries engine.

InfluxDB differs from RRD in the following ways:

  • InfluxDB stores the raw value received, whereas RRD applies averaging based on the intervals it is defined with. This means RRD may return non-round numbers for things that should be round (e.g. number of bits transferred or number of users), whereas InfluxDB returns whole numbers when the granularity is small enough (there may still be fractional numbers when querying the average over a whole day). For example, for a plugin that returns the hour it is run in, RRD shows an average value of 9.420 at 10:00, whereas InfluxDB shows the value 10 at 10:00.

  • RRD has a value for all times going back over the last year, even if that value is NULL. InfluxDB only returns NULL points when it has some data for the requested range.
  • For counters, RRD stores the last counter value and records the difference based on the step size. InfluxDB stores the actual value of each counter and returns the derivative at query time. If a counter is reset, this gives a negative difference from the previous value. However, this can be a normal scenario (e.g. a device restart resets its counters) - in these cases, we assume the same rate as the previous value. If the initial difference is negative, Opsview returns a NULL point.
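The counter handling described in the last point can be sketched as follows. This is an illustration of the described behaviour only, not Opsview's implementation: counter_rates is a hypothetical name, the input format (one "timestamp value" pair per line) is an assumption, and timestamps are assumed to be strictly increasing.

```shell
# Reads "timestamp value" pairs and prints "timestamp rate" for each pair
# after the first. A negative rate (counter reset) is replaced by the
# previous rate, or by NULL if there is no previous rate yet.
counter_rates() {
    awk '
        NR == 1 { prev_t = $1; prev_v = $2; next }
        {
            rate = ($2 - prev_v) / ($1 - prev_t)
            if (rate >= 0) {
                last_rate = rate; have_rate = 1
                print $1, rate
            } else if (have_rate) {
                print $1, last_rate
            } else {
                print $1, "NULL"
            }
            prev_t = $1; prev_v = $2
        }
    '
}

# The counter drops from 1000 to 10 at t=900, so the previous rate is reused.
printf '0 100\n300 400\n600 1000\n900 10\n1200 310\n' | counter_rates
# prints:
# 300 1
# 600 2
# 900 2
# 1200 1
```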

Migration from RRDTool

Preparation

Make sure the system is running the latest Opsview Monitor 6.0 packages - see our Installation/Upgrade instructions. Example commands for Debian-based OSes:

# upgrade all Opsview packages
sudo apt-get update
sudo apt-get install opsview-timeseries opsview-timeseries-enqueuer opsview-timeseries-rrd 
sudo apt-get install opsview

Pausing performance metrics processing

# stop processing new performance data
sudo /opt/opsview/watchdog/bin/opsview-monit stop opsview-resultsperformance

# stop Timeseries RRD
sudo /opt/opsview/watchdog/bin/opsview-monit stop opsview-timeseriesrrdupdates
sudo /opt/opsview/watchdog/bin/opsview-monit stop opsview-timeseriesrrdqueries

Extracting existing timeseries data from RRDs

sudo -iu opsview /opt/opsview/timeseriesrrd/utils/rrd2perfdata.pl -i $PATH_TO_RRD_DIR$ -o $OUTPUT_DIR$

You will need to identify the correct paths for your installation:

# most common paths
$PATH_TO_RRD_DIR$
    /usr/local/nagios/var/rrd (systems pre-5.2 upgraded to 5.3)
    /opt/opsview/timeseriesrrd/var/data (default 5.2 directory and onward)
$OUTPUT_DIR$
    /tmp/rrd_data

Installing InfluxDB

# download package for your platform and follow the install documentation from InfluxDB
https://docs.influxdata.com/influxdb/v1.2/introduction/installation

# create opsview database
curl -i -XPOST http://127.0.0.1:8086/query --data-urlencode "q=CREATE DATABASE opsview"

Opsview Timeseries InfluxDB

# install new processor - this will uninstall RRDs if they are installed as well!

# for RPM based systems
sudo yum remove opsview-timeseries-rrd
sudo yum install opsview-timeseries-influxdb

# for DPKG based systems
sudo apt-get remove opsview-timeseries-rrd
sudo apt-get install opsview-timeseries-influxdb

# configure
sudo cp -p /opt/opsview/timeseriesinfluxdb/etc/timeseriesinfluxdb.yaml.example /opt/opsview/timeseriesinfluxdb/etc/timeseriesinfluxdb.yaml

# Edit this file and add lines 'user' and 'password' with their corresponding values.
sudo -iu opsview vi /opt/opsview/timeseriesinfluxdb/etc/timeseriesinfluxdb.yaml

# The user can be obtained from file opsview.conf
grep timeseries_username /opt/opsview/coreutils/etc/opsview.conf

# The password is located in the file user_secrets.yml
grep opsview_timeseries_password /opt/opsview/deploy/etc/user_secrets.yml

# The top of your file should look like this
# Note: Indentation is important, please only use spaces, no tabs!
---
timeseriesinfluxdb:
    server:
        user: opsview
        password: WFC8Ut25vwefwefwefVT
        updates:

# restart services
sudo /opt/opsview/watchdog/bin/opsview-monit reload
sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-timeseriesinfluxdbupdates
sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-timeseriesinfluxdbqueries
sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-timeseriesenqueuer
sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-timeseries

Restoring previous timeseries data

Substitute <user>:<passwd> with the values you obtained for the file timeseriesinfluxdb.yaml.

# post the extracted files to Timeseries ($OUTPUT_DIR$ is the path from the extraction command)
find $OUTPUT_DIR$/perfdatarrd/ -type f -print0 | xargs -0 -I{} curl -XPOST -u<user>:<passwd> --data-binary "@{}" -H "Content-type: text/plain" http://localhost:1600/write

# restore history
sudo -iu opsview /opt/opsview/coreutils/utils/import_servicecheck_interval_history $OUTPUT_DIR$/interval-history.tsv

Configure Opsview

# set timeseries providers
sudo vi /opt/opsview/coreutils/etc/opsview.conf
# and add/amend the following line anywhere above the final line containing 1; (which must be at the end of the file):
$timeseries_provider = "influxdb";


# restart opsview processes
sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-web

# reload opsview - this is required to get the correct interval for the current service checks.
sudo -iu opsview /opt/opsview/coreutils/bin/rc.opsview gen_config

Enable Timeseries Processing

# restart processing
sudo /opt/opsview/watchdog/bin/opsview-monit start opsview-resultsperformance

Troubleshooting

Drop the whole database and recreate

# drop
curl -i -XPOST http://127.0.0.1:8086/query --data-urlencode "q=DROP DATABASE opsview"
sudo rm -rf /opt/opsview/timeseriesinfluxdb/var/data/*
# recreate
curl -i -XPOST http://127.0.0.1:8086/query --data-urlencode "q=CREATE DATABASE opsview"
# restart services
sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-timeseriesinfluxdbupdates
sudo /opt/opsview/watchdog/bin/opsview-monit restart opsview-timeseriesinfluxdbqueries

Drop all metrics for specific host

$ influx -database opsview
Connected to http://localhost:8086 version 1.6.0
InfluxDB shell version: 1.6.0
> DROP MEASUREMENT "switch1.opsview.com";

Drop all metrics for a specific servicecheck on specific host

$ influx -database opsview
Connected to http://localhost:8086 version 1.6.0
InfluxDB shell version: 1.6.0
> DELETE FROM "switch1.opsview.com" WHERE service = 'Connectivity - Lan';

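NOTE: Single and double quotes are not interchangeable! In InfluxQL, double quotes delimit identifiers (such as the measurement name) and single quotes delimit string values. See the InfluxDB documentation for more information.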
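To avoid mixing up the two kinds of quotes when working from a shell, the statement can be built by a small function. This is a sketch under assumptions: influx_delete_statement is a hypothetical helper name, and it does not escape quote characters inside the host or service name.

```shell
# Build the InfluxQL statement that deletes the metrics of one service
# check on one host. The host is the measurement name (an identifier), so
# it goes in double quotes; the service name is a string value, so it goes
# in single quotes.
influx_delete_statement() {
    host="$1"
    service="$2"
    printf "DELETE FROM \"%s\" WHERE service = '%s';\n" "$host" "$service"
}

influx_delete_statement "switch1.opsview.com" "Connectivity - Lan"
# prints: DELETE FROM "switch1.opsview.com" WHERE service = 'Connectivity - Lan';

# The statement could then be passed to the influx client, for example:
# influx -database opsview -execute "$(influx_delete_statement switch1.opsview.com 'Connectivity - Lan')"
```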