Opsview Knowledge Center

Distributed Monitoring

Generic architecture and principles

Overview

Distributed Monitoring is a feature of Opsview Monitor that allows checks and notifications to be executed from remote servers, thus giving the capability of scaling your Opsview Monitor system by spreading the load and reducing latency. This is useful when:

  • You have a large number of Hosts
  • You have several datacenters spread across different geographic locations
  • You have networks that have secured zones or firewall rules to segregate Hosts

Opsview Monitor uses Collectors to handle the execution and collection of results.

For additional failover and load balancing capabilities, Collectors may be grouped together to form a Cluster.

There should always be an odd number of nodes within a Collector Cluster (1, 3, 5, and so on). This helps with resiliency and avoids split-brain issues.
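The arithmetic behind the odd-number guidance can be sketched as follows. This is only an illustration and assumes a majority-based quorum, a common way of avoiding split-brain; the mechanism Opsview actually uses is not described here, and the function names are hypothetical. It concerns agreement between nodes during a network partition, not how many Collector failures monitoring can survive.

```python
def majority(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_losses(nodes: int) -> int:
    """Nodes that can become unreachable while a strict majority remains."""
    return nodes - majority(nodes)


# Assumption: quorum is decided by strict majority (not confirmed by the source).
for n in (1, 2, 3, 4, 5):
    print(f"nodes={n} majority={majority(n)} tolerated_losses={tolerated_losses(n)}")
```

Under this assumption, growing a cluster from 3 to 4 nodes does not increase the number of tolerated losses, and a cluster with an even node count can split into two equal halves where neither side holds a majority.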

Each Host in Opsview is assigned to a Cluster. The Host will be actively checked by any Collector in that Cluster.

Note: In a new system, the first Cluster with a Collector registered into Opsview Monitor assumes the role of 'Master Cluster'. When using the Advanced Automated Installation, the 'Master Cluster' forms part of the 'Opsview Monitor Primary Server'.

To set up additional Collectors, you need to:

  • Install the software
  • Register it in Opsview Monitor
  • Assign it to a Cluster

These steps are detailed below in Managing Collector Servers.

Opsview Monitor Orchestrator

The host where Opsview is first installed is called the Opsview Monitor Primary Server. This host has all the necessary software packages installed so that it can function as a single Opsview system, but you can separate out the functional components onto other hosts to spread the load and decrease latency.

The Opsview Monitor Primary Server will have the following Host Templates assigned to it:

  • Application - Opsview
  • Opsview - Component - Agent
  • Opsview - Component - Autodiscovery Manager
  • Opsview - Component - BSM
  • Opsview - Component - Datastore
  • Opsview - Component - Downtime Manager
  • Opsview - Component - Executor
  • Opsview - Component - Flow Collector
  • Opsview - Component - Freshness Checker
  • Opsview - Component - License Manager
  • Opsview - Component - Load Balancer
  • Opsview - Component - Machine Stats
  • Opsview - Component - MessageQueue
  • Opsview - Component - Notification Center
  • Opsview - Component - Orchestrator
  • Opsview - Component - Registry
  • Opsview - Component - Results Dispatcher
  • Opsview - Component - Results Flow
  • Opsview - Component - Results Forwarder
  • Opsview - Component - Results Live
  • Opsview - Component - Results Performance
  • Opsview - Component - Results Recent
  • Opsview - Component - Results Sender
  • Opsview - Component - Results SNMP
  • Opsview - Component - Scheduler
  • Opsview - Component - SNMP Traps Collector
  • Opsview - Component - SNMP Traps
  • Opsview - Component - SSH Tunnels
  • Opsview - Component - State Changes
  • Opsview - Component - TimeSeries Enqueuer
  • Opsview - Component - TimeSeries
  • Opsview - Component - TimeSeries RRD
  • Opsview - Component - TimeSeries InfluxDB
  • Opsview - Component - Watchdog
  • Opsview - Component - Web
  • OS - Unix Base
  • Network - Base

This host is automatically assigned to the Master Cluster, and will normally monitor itself.

To add, remove, or register Clusters and Collectors, see Managing Collector Servers.

Troubleshooting

The most common problem is misconfiguration of the Components that require access to the Master MessageQueue Server: Scheduler and Results-Sender. Check /var/log/opsview/opsview.log for detailed errors.
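A quick way to narrow down the log is to filter for error lines that mention the affected Components. The sketch below is illustrative only: the log path comes from the text above, but the assumption that the word "error" and the component names appear literally in log lines is not confirmed by the source, and the function name is hypothetical.

```python
from pathlib import Path

LOG_PATH = Path("/var/log/opsview/opsview.log")  # path taken from the text above
# Assumption: these names appear literally in the relevant log lines.
KEYWORDS = ("scheduler", "results-sender", "messagequeue")


def find_suspect_lines(lines, keywords=KEYWORDS):
    """Return lines containing 'error' and at least one keyword (case-insensitive)."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if "error" in lowered and any(k in lowered for k in keywords):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    try:
        with LOG_PATH.open(errors="replace") as handle:
            for hit in find_suspect_lines(handle):
                print(hit)
    except OSError as exc:
        print(f"Could not read {LOG_PATH}: {exc}")
```

Adjust the keywords to match the wording your log actually uses.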

Architecture

Opsview Scheduler is the main component of a Collector. It receives commands and configuration from the Orchestrator and schedules execution of monitoring plugins, event handlers and notification scripts.

Plugins are executed by Opsview Executor, whose only job is to run the commands requested by Scheduler. Results are then sent back to the Opsview Scheduler that requested those commands.

This approach allows multiple Opsview Executors to be shared among all Collectors of a given Cluster: point all Components to the same Cluster MessageQueue, and automatic load-balancing will be available.
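The load-balancing effect of a shared queue can be illustrated with a small in-process model. This is a sketch of the general "competing consumers" pattern, not of Opsview's implementation: Python's `queue.Queue` stands in for the Cluster MessageQueue, threads stand in for Executors, and all names are hypothetical.

```python
import queue
import threading


def run_executors(commands, executor_count):
    """Let several executors pull commands from one shared queue."""
    work = queue.Queue()
    results = queue.Queue()
    for command in commands:
        work.put(command)

    def executor(name):
        # Each executor takes the next available command until none remain.
        while True:
            try:
                command = work.get_nowait()
            except queue.Empty:
                return
            results.put((name, command, f"ran {command}"))

    threads = [
        threading.Thread(target=executor, args=(f"executor-{i}",))
        for i in range(executor_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected


if __name__ == "__main__":
    for name, command, output in run_executors(["check_ping", "check_http", "check_disk"], 2):
        print(name, command, output)
```

Because every executor reads from the same queue, whichever one is free takes the next command; no explicit assignment of commands to executors is needed.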

Opsview Scheduler sends the results to Opsview Results-Sender, which will forward them to the Results Processors. In the case of a network outage, the Results-Sender will hold the results for a configurable amount of time.
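The hold behaviour can be pictured as a time-limited buffer. The following is a minimal sketch under stated assumptions: results older than the hold period are discarded, and delivery is retried in order. Neither detail is confirmed by the source, and the class and parameter names are hypothetical.

```python
import time
from collections import deque


class HoldBuffer:
    """Keep undelivered results for at most `hold_seconds`, then drop them."""

    def __init__(self, hold_seconds, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self._clock = clock
        self._items = deque()  # (enqueue_time, result), oldest first

    def add(self, result):
        self._items.append((self._clock(), result))

    def _expire(self):
        # Assumption: results held longer than the limit are discarded.
        cutoff = self._clock() - self.hold_seconds
        while self._items and self._items[0][0] < cutoff:
            self._items.popleft()

    def flush(self, send):
        """Deliver held results in order; stop at the first failed send."""
        self._expire()
        delivered = 0
        while self._items:
            _, result = self._items[0]
            if not send(result):
                break
            self._items.popleft()
            delivered += 1
        return delivered
```

Here `send` is any callable that returns True on successful delivery; during an outage it returns False and the results stay in the buffer until the next `flush` or until they expire.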

Scalability

For high availability, we recommend having a single monitoring Cluster per monitored location (e.g. datacenter) with as many Collector nodes as required. All Collectors should point to a single Cluster MessageQueue Server. For more information and assistance, contact our Customer Success Team.

Security

To secure communication over the network, refer to the Securing Network Connection documentation.

Failure Scenarios

Opsview 6 can handle n-1 Collector failures within a Monitoring Cluster. Since there is no upper limit on the number of Collectors in a Cluster, we recommend having at least three nodes per Cluster.

If a Collector fails, the Orchestrator will detect this within 60 seconds and automatically re-assign the hosts monitored by the failed Collector to the remaining Collectors of the Cluster. The re-assignment uses the current known state of the objects and the configuration from the last time you performed an Apply Changes from the Configuration menu. Re-assigned hosts and their services are immediately re-checked.

When the Collector recovers, the Orchestrator will automatically re-assign the hosts back to it.
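The re-assignment on failure can be sketched as below. This is an illustration only: the least-loaded placement rule is an assumption (the source does not say how the Orchestrator chooses among the remaining Collectors), and the function and collector names are hypothetical.

```python
def reassign_on_failure(assignment, failed, survivors):
    """Move only the failed collector's hosts, each to the least-loaded survivor."""
    if not survivors:
        raise ValueError("no surviving collectors in the cluster")

    # Count how many hosts each surviving collector already monitors.
    load = {collector: 0 for collector in survivors}
    for collector in assignment.values():
        if collector in load:
            load[collector] += 1

    updated = dict(assignment)
    for host in sorted(assignment):
        if assignment[host] == failed:
            # Assumption: pick the least-loaded survivor; ties go to the first name.
            target = min(sorted(load), key=load.get)
            updated[host] = target
            load[target] += 1
    return updated


if __name__ == "__main__":
    current = {
        "db1": "collector-a", "db2": "collector-b", "web1": "collector-c",
        "web2": "collector-a", "web3": "collector-b", "web4": "collector-c",
    }
    print(reassign_on_failure(current, "collector-b", ["collector-a", "collector-c"]))
```

Hosts on healthy Collectors keep their assignment; only the failed Collector's hosts move. With a single survivor, every displaced host lands on it, which corresponds to the n-1 case.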
