Opsview Knowledge Center

Installing Opsview Monitor Slaves

Learn how to install Opsview Monitor Slave servers to create a distributed monitoring environment

Within large organizations where there are large numbers of devices to monitor, or in a distributed datacenter that can span continents, countries or cities, a single Opsview Monitor server can get bogged down trying to process all the necessary device checks. As a result, it may fall behind, leading to delayed checks and notifications. Also, networks that make use of secured zones or firewall rules to segregate devices and services should not have that security compromised by having to open significant numbers of ports to enable monitoring. Adding Opsview Monitor slaves to the monitoring hierarchy can help to spread the load and reduce latency, as well as reduce network administration issues by ensuring that all devices can be adequately monitored.

Note: All slaves must be available when Opsview Monitor reloads in order for configuration to be synchronized. If a slave is not working, then the reload will fail. It is possible to temporarily deactivate a slave system to allow the reload to continue.

Note: Each slave instance is a separate instance of Nagios(R) Core, but has its state synchronized with the master. See the section Slave State Synchronization for some known limitations.

Note: It is not possible to have Notification suppression based on parent/child relationships for Hosts outside of a slave, because slaves only know about their own Hosts. If Notifications are sent from the slave, there may be more Notifications than if they were sent from the master.

Note: The OS time zone of the slave should match the master. If not, there could be issues with freshness checking for services that only run at certain times, as the master will be expecting results which the slaves will not send.

Note: There is a queuing system implemented on the slave to cater for a master failure or a network connection error. Any results older than five minutes will be dropped and a log entry will be created in /usr/local/nagios/var/log/opsview-slave.log. This threshold can be changed via the slave_results_max_cache variable; see the section Config Files.
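As an illustrative sketch only (the exact file, location and units are documented in the Config Files section; this assumes the Perl-style opsview.conf format shown later in this document and a value in seconds):

$slave_results_max_cache = 600;    # hypothetical: keep queued results for 10 minutes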

Architecture

The NRD mechanism is described in its own section.

Restrictions

Slave servers currently must have the same architecture and OS build as the master server (including prerequisite software). They must also have their date and time in sync with the master, and must have an SSH server listening on port 22.
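As a quick pre-install sanity check, you can compare these attributes from the master using standard commands (a sketch; {slave_hostname} is a placeholder):

ssh {slave_hostname} 'uname -m; cat /etc/os-release; date'
uname -m; cat /etc/os-release; date    # run locally on the master and compare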

Prerequisites

On all slave servers, you should not install the main Opsview Monitor packages or the Opsview Agent package. The Opsview Monitor code will be pushed from the master server to each slave during initial configuration and upgrades. When creating slaves on Red Hat Enterprise Linux 7, please make sure that the optional and extras repositories are enabled on the server; this allows Opsview Monitor to install all the dependencies required by the Opsview Monitor slave. The repositories can be enabled using the following command:

subscription-manager repos --enable=rhel-7-server-extras-rpms --enable=rhel-7-server-optional-rpms

The repositories have an opsview-slave package that ensures all dependencies are installed correctly. Note that the package itself does not install any files - it is only used for the checking and installation of dependencies.

Note: If you have previously installed the opsview-agent package, remove it and then ensure the nagios user and groups are removed.
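If you need to do this, a minimal cleanup sketch for an RPM-based slave (use apt-get remove on Debian/Ubuntu, and adjust names to whatever exists on your system):

yum remove opsview-agent
userdel nagios
groupdel nagios     # if the group still exists
groupdel nagcmd     # if the group still exists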

For Debian and Ubuntu, set up your repository configuration for your OS as per this section and then run the following command:

apt-get install opsview-slave

For Red Hat and CentOS, set up the YUM repositories and run:

yum install opsview-slave

Pre-install Tasks

These steps are to be performed on the new slave server, unless otherwise stated.

  1. Create the nagios and nagcmd groups:

groupadd nagios
groupadd nagcmd
  2. Create the nagios user and set its password to a known and secure value:
useradd -g nagios -G nagcmd -d /var/log/nagios -m nagios
passwd nagios
  3. Ensure the Nagios user has root access for specific commands via sudo (note also the Nagios user should have sudo on its PATH to use this correctly):
visudo

Comment out the following line if it is set:

#Defaults requiretty

Add the following line:

nagios ALL = NOPASSWD: /usr/local/nagios/bin/install_slave
  4. On the master server, copy the nagios SSH public key from the master to the slave server:
su - nagios
ssh-keygen -t rsa  # Creates your SSH public/private keys if they do not currently exist
ssh-copy-id -i ~/.ssh/id_rsa.pub {slave_hostname}

The .ssh directory should be mode 0700 and the id_rsa file should be 0600. You should be able to connect to the slave server from the Opsview Monitor master server without passwords:

root@master$ su - nagios
nagios@master$ ssh {slave_hostname}
# Should be logged into slave system
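If the passwordless login fails, the usual cause is the key file permissions described above; as the nagios user on the master, they can be corrected with:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_rsa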
  5. Copy the check_reqs and profile scripts from the master onto the slave as the Nagios user. This should work without prompting for authentication:
nagios@master$ scp /usr/local/nagios/installer/check_reqs /usr/local/nagios/bin/profile {slave}:

On the slave as the Nagios user, source the profile then run check_reqs:

nagios@slave$ . ./profile
nagios@slave$ ./check_reqs slave

If the above steps fail, fix any dependency issues listed (see section below). Specifically verify the opsview-slave package is installed.

  6. Set up the profile for the Nagios user on the slave server to be sourced at login:
nagios@slave$ echo "test -f /usr/local/nagios/bin/profile && . /usr/local/nagios/bin/profile" >> ~nagios/.profile
nagios@slave$ chown nagios:nagios ~nagios/.profile

Note: The /usr/local/nagios/bin/profile file will be installed a bit later on; this step sets it up for future logins.

  7. Create the temporary drop directory on the slave to put the transfer files in:
su - root
mkdir -p /usr/local/nagios/tmp 
chown nagios:nagios /usr/local/nagios /usr/local/nagios/tmp
chmod 755 /usr/local/nagios /usr/local/nagios/tmp
  8. Check SSH TCP Port Forwarding on the slave. In order to communicate with the master server, port forwarding must be enabled in /etc/ssh/sshd_config on the slave server. Ensure that the following option is set to yes (default is yes):
AllowTcpForwarding yes

Restart the SSH server if this is changed.
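To confirm the setting and restart the SSH daemon (a sketch; the service is typically named sshd on Red Hat/CentOS and ssh on Debian/Ubuntu):

grep -i AllowTcpForwarding /etc/ssh/sshd_config
systemctl restart sshd    # or: service sshd restart on non-systemd systems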

Setup of the Slave

  1. Within the master web interface, ensure that the slave Host is set up on the master server:

    • Settings > Hosts Settings > Add New...
    • Make sure you assign the Application - Opsview Common Host template against the Host.

The Host used for the slaves must have at least one service associated with it, otherwise a configuration reload will fail.

  2. Within the master web interface, add the slave Host as a monitoring server via 'Settings > Monitoring Servers > Add New'
  3. From the master server, test the connection to the slave Host:
su - nagios
/usr/local/nagios/bin/send2slaves -t [opsview slave node name] # Connection test

If you get an error like:

Host key verification failed.
Error connecting to opslave.opsview.com

This is likely because the SSH Host key is not set up. Test with:

ssh opslave.opsview.com

Accept any Host keys if required.

  4. From the master server, send all the program and configuration files to the slave Host:

/usr/local/nagios/bin/send2slaves [opsview slave node name]

The slave node name is optional and can be used when multiple slaves have been defined. In the example above, the slave node name is opslave. The connection test should show:

Connected to opslave...

The send code step should produce an error:

Errors requiring manual intervention: 1

and will detail the commands to run in the next step.

  5. On the slave server, run the setup program:

su - root
cd /usr/local/nagios/tmp && ./install_slave

Note: The opsview-slave startup script is only required if you have a reverse SSH connection setup to your slave systems. (See below for more information).

  6. Within the master web interface, reload the Opsview Monitor configuration on the master server via 'Settings > Reload'. When this completes, the slave will automatically start the Nagios daemon and will start sending data back to the master.

Upgrading Slaves

Slaves are upgraded as part of the master upgrade (though slaves must be in a working state at the time of the master upgrade). You can resend the latest master files to the slave by running:

su - nagios
/usr/local/nagios/bin/send2slaves {slavename}

Note that the Opsview Agent configuration, nrpe.cfg, on the master is not sent to the slave, so you might want to compare any changes that have been made to it with its equivalent on each slave. See, for example, our recent changes to the Opsview Agent in Opsview Monitor 4.6.3.
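One way to compare the two files (a sketch assuming nrpe.cfg lives in /usr/local/nagios/etc on both systems, which may differ in your installation):

nagios@master$ ssh {slave_hostname} cat /usr/local/nagios/etc/nrpe.cfg | diff /usr/local/nagios/etc/nrpe.cfg -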

Monitoring Slaves

Some services will be automatically created on the master to monitor each slave. These are called "Slave-node: {hostname}" and run a plugin called check_opsview_slave_node, which checks:

  • if all slaves are contactable
  • if their time is synchronized
  • if NSCA has errored
  • if the Nagios daemon is running correctly

You should make sure that you receive alerts from this service, as it will be the first warning of problems with a slave.

Slave Failures

When a slave fails, this is the sequence of actions:

  • The "Slave-node: {hostname}" check will go into a critical state - make sure you get notifications for this service
  • After 30 minutes, all Hosts and services on the failed slave will go into an UNKNOWN stale state with the text of UNKNOWN: Service results are stale. We think this is reflective of the situation as the service states are not up to date, and there is a single failure that needs resolving

Note: All Hosts monitored by that slave will not change state. It is not possible to set a freshness value on the Host state as Hosts are only checked on demand and thus will not have regular results sent to the Opsview Monitor master.

Note: While a slave is down, a reload will fail. This is because the reload expects to be able to communicate with all slave nodes. If the slave is expected to be offline for a long period, you can disable the slave by marking it as not activated (done by editing the slave entry on the 'Monitoring Servers' settings page and unselecting the 'active' checkbox). If the slave is restarted independently, the Nagios daemon on the slave will continue to run checks but results will not be received by the master. You may still get notifications from this slave server - remember to stop Nagios!

Slave Clusters

If more than one server is selected during configuration of a monitoring server, a 'slave cluster' will be created. This provides automatic load balancing between nodes within the cluster, and all checks will automatically fail over if one of the nodes in that cluster fails. See slave clusters for more information on setting up and configuring slave clusters.

Using The Slave

To make use of the slave Host, amend each Host configuration and set the Monitored By value to the correct server: Configuration > Hosts > {Edit Host} > Monitored By > {slave_host}. Alternatively, you can drag and drop the Host between servers on the Monitoring Servers list page.

Using Slaves with Reversed SSH Tunnels

Reverse SSH tunnels are useful if security policy only allows slaves to initiate connections to the master. A tunnel is started from slave to master, and then the master is able to start new communications with slaves as required.

Note: Reverse SSH functionality applies to all nodes of the selected monitoring cluster; the default setting is to use forward SSH tunnels.

Create a new monitoring node/cluster or edit an existing one and set the SSH tunnel configuration option to Reverse tunnel. On the list page, hover over the slave node; a popup will appear with the slave_port_number.

You will need to access the slave in some way for the initial install. Create the nagios groups and user and exchange SSH keys: the slave needs the public key of the master, and the master needs the public key of the slave. On the slave, run the following as the nagios user:

ssh -N -R {slave_port_number}:127.0.0.1:22 {opsviewmaster}

This process will not exit on the slave. On the master, run the following as the nagios user:

/usr/local/nagios/bin/dosh -i -s {slavename} uname -a

This command checks connectivity to the slave. Install Opsview Monitor with:

send2slaves {slave_name}

If this is a new slave, you will need to intervene manually as root - check the instructions onscreen. If this is a pre-existing slave, rerun

send2slaves {slave-name}

again. When installed, stop the earlier SSH command on the slave. Test that it has been cancelled by running the following on the master and getting the expected error:

$ /usr/local/nagios/bin/dosh -i -s {name} uname -a
ssh: connect to host 127.0.0.1 port 25801: Connection refused

On the slave, create /usr/local/nagios/etc/opsview-slave.conf with the following contents:

MASTER={masteripaddress}
SLAVE_PORT={slave_port_number}

Test from the slave with:

/etc/init.d/opsview-slave test
/etc/init.d/opsview-slave start
/etc/init.d/opsview-slave status

Now repeat these steps for all slaves in Opsview Monitor. On the master, use

dosh uname -a

to test connectivity to all slaves. All communications with the master should now work correctly and a reload can now be performed.

Note: If you change a slave from reverse to forward, you must remove the opsview-slave.conf file so that Opsview Monitor will not attempt to restart the tunnel.

Changing the base port number

Reverse SSH requires starting from a specific TCP port number. However, this may clash with other applications, so this value is configurable. The default value is 28500 and the actual port used will be the base port plus the slave node's id. To change the base port, on the Opsview master, edit opsview.conf and set:

$slave_base_port = 11100;

Restart opsview and opsview-web. You will then need to change the port number in the opsview-slave.conf file on all your slave nodes.
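For example, for a slave node whose id is 3 (the id here is hypothetical), the default base port of 28500 gives tunnel port 28503; after setting the base port to 11100 as above, the same node would use port 11103, so its opsview-slave.conf would need to be updated to match:

MASTER={masteripaddress}
SLAVE_PORT=11103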

Troubleshooting

'Could not change group' upon a reload

This happens when the opsview-agent package has been installed and removed prior to installing opsview-slave on the slave node. Check to ensure the group nagcmd exists on the server and the user nagios is a member of it, along with the nagios group.
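You can verify this with standard commands (not Opsview-specific), for example:

getent group nagcmd nagios
id nagios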

Host key verification failed

When you run the following on the master:

dosh uname -a

and you get the error:

Host key verification failed

Then you probably haven't created all the necessary Host keys. For that particular slave, run:

/usr/local/nagios/bin/dosh -i -s {slavename} uname -a

This will prompt for a password if the SSH key exchange hasn't been done yet. It will also automatically add the Host key if not already done.

perl: undefined symbol

If you get this error while installing Opsview Monitor on the slave:

/usr/bin/perl: symbol lookup error: /usr/local/nagios/perl/lib/i486-linux-gnu-thread-multi/auto/Time/HiRes/HiRes.so: undefined symbol: Perl_Istack_sp_ptr

This can be due to different versions of Perl on the master and the slave. There is a restriction that the OS distribution must be at the same level on the master and the slave.

ssh: Connection timed out during banner exchange

This appears if there is too much load on the slave. Try again. Keep an eye on resources as you do not want this to occur often.

Missing prerequisites - libexpat is not installed

When running

check_reqs slave

your environment may be identified as missing the libexpat XML parsing libraries, even if you already have the libexpat or libexpat1 packages installed. The solution is to also install the dev version of this package. When using apt, the package name is libexpat-dev or libexpat1-dev, depending upon your OS version. When using yum, the package name is expat-devel.
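For example:

apt-get install libexpat1-dev    # Debian/Ubuntu; libexpat-dev on some older releases
yum install expat-devel          # Red Hat/CentOS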

Plugin output is not preserved completely on Master view

Plugins run on the slaves are passed back to the master via NRD. The flow is:

  1. Plugin executed via slave
  2. Result stored as a Nagios performance result using the performance template
  3. File moved to /usr/local/nagios/var/slaveresults
  4. File picked up by import_slaveresultsd
  5. Results sent to NRD on the master
  6. NRD places the result into Nagios' checkresult directory
  7. Nagios reads these results from the checkresult directory

However, at step 2, the output is passed through a filter where certain characters are removed. Also, linefeeds are changed to \n and all backslashes are converted to double backslashes. Because of this, the output will not be exactly the same between master and slave.
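As a purely illustrative example (not taken from a real plugin), slave output such as:

DISK OK - C:\temp has 10% free
second line of output

might be stored and forwarded to the master as:

DISK OK - C:\\temp has 10% free\nsecond line of output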

Suppressing Message of the Day/Login Banner

To suppress login banners when connecting from the Opsview master to slave servers (banners are printed to STDERR and so end up logged in opsviewd.log), the suggested approach when the banners cannot be disabled is to create a ~nagios/.hushlogin file on each slave:

touch ~nagios/.hushlogin
