Opsview Knowledge Center

Troubleshooting

How to investigate Opsview Monitor problems

Opsview Watchdog

The first page to load when you encounter issues with Opsview Monitor is the 'My System > Monitoring Engine' page.

Within the 'Monitoring Engine' page is the 'Process Information' table. If there are any problems with your Opsview Monitor system, it is likely that a process is not running or is encountering problems.

The 'Monitoring Engine' tab also has a button labeled 'Download Now', which downloads a .tar.gz file containing all the relevant syslogs and related files.

Naturally, if the web interface is the problem then you won't be able to access this page. In that case, run the command /opt/opsview/watchdog/bin/opsview-monit summary as the root user:

# /opt/opsview/watchdog/bin/opsview-monit summary
The Monit daemon 5.14 uptime: 4d 6h 49m
Process 'opsview-web'               Running
Filesystem 'rootfs'                 Accessible
Filesystem 'varfs'                  Accessible
Filesystem 'optfs'                  Accessible
Process 'opsviewmd'                 Running
Process 'import_ndologsd'           Running
Process 'import_perfdatarrd'        Running
Process 'import_ndoconfigend'       Running
Process 'opsviewadmd'               Running
Process 'nsca'                      Running
Process 'nrd'                       Running
Process 'opsviewnfd'                Not monitored
Process 'nagios'                    Running
Process 'opsviewd'                  Running
Process 'opsviewhd'                 Running
Process 'opsview-agent'             Running
System 'inst-debian8-64'            Running
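
On a system with many entries it can help to filter the summary down to the items that need attention. The following is a minimal sketch, assuming the summary layout shown above; the helper name unhealthy_checks is hypothetical and is not part of Opsview Monitor.

```shell
#!/bin/sh
# Read `opsview-monit summary` output on stdin and print every Process,
# Filesystem or System line whose status is neither "Running" nor "Accessible".
# unhealthy_checks is a hypothetical helper name used for illustration only.
unhealthy_checks() {
    awk '
        # Only look at lines that describe a monitored item.
        $1 == "Process" || $1 == "Filesystem" || $1 == "System" {
            # The status is the tail of the line; its last word is enough
            # to tell healthy entries apart from everything else.
            if ($NF != "Running" && $NF != "Accessible") print
        }
    '
}

# Typical use, as root on the Opsview Monitor server:
#   /opt/opsview/watchdog/bin/opsview-monit summary | unhealthy_checks
```

With the sample output above, only the 'opsviewnfd' line (Not monitored) would be printed.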

If any of the processes above has a problem (for example, opsview-web), you can restart that individual process with the command:

# /opt/opsview/watchdog/bin/opsview-monit restart opsview-web

You can also view detailed information about each process by running the following command as the root user:

# /opt/opsview/watchdog/bin/opsview-monit status opsview-web
The Monit daemon 5.14 uptime: 4d 6h 54m
Process 'opsview-web'
  status                            Running
  monitoring status                 Monitored
  pid                               3451
  parent pid                        1
  uid                               999
  effective uid                     999
  gid                               998
  uptime                            3h 21m
  children                          3
  memory                            250.8 MB
  memory total                      1.0 GB
  memory percent                    5.0%
  memory percent total              20.7%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Tue, 15 Sep 2015 16:16:39
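
If you only need one value from the status report, for example to record memory usage over time, the output can be filtered. This is a rough sketch, assuming each field name is separated from its value by at least two spaces as in the sample above; status_field is a hypothetical helper name.

```shell
#!/bin/sh
# Read `opsview-monit status <process>` output on stdin and print the value
# of one field. $1 is the field name exactly as it appears in the left-hand
# column (for example "memory percent").
# status_field is a hypothetical helper name used for illustration only.
status_field() {
    awk -v name="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)        # ignore indentation
            prefix = name "  "              # field name plus two spaces
            if (index(line, prefix) == 1) { # line starts with this exact field
                value = substr(line, length(prefix) + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
                exit
            }
        }
    '
}

# Typical use, as root:
#   /opt/opsview/watchdog/bin/opsview-monit status opsview-web | status_field "memory percent"
```

Requiring two spaces after the name keeps "memory percent" from also matching the "memory percent total" line.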

If you encounter the error:

# /opt/opsview/watchdog/bin/opsview-monit summary
Status not available -- the monit daemon is not running

then the opsview-watchdog service is not running. To start it, run the following commands as root:

# pkill -u opsview
# pkill -u nagios
# service opsview-watchdog start
# /opt/opsview/watchdog/bin/opsview-monit start all

This kills any leftover processes, starts the daemon, and then starts all the services that the watchdog controls.

If the watchdog process starts but the processes it is monitoring do not, check that your sudo configuration (using the command 'visudo') does not have 'Defaults requiretty' enabled. If it does, disable it (by commenting it out with a '#' character) and rerun the following:

# /opt/opsview/watchdog/bin/opsview-monit validate
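
To check for the requiretty setting without opening an editor, something like the sketch below may be enough. It assumes the simple form 'Defaults requiretty' on a line of its own; combined or per-user Defaults entries, and files under /etc/sudoers.d, are not covered. The helper name has_requiretty is hypothetical.

```shell
#!/bin/sh
# Read sudoers content on stdin and succeed (exit status 0) if it contains an
# active, uncommented "Defaults requiretty" line.
# has_requiretty is a hypothetical helper name used for illustration only.
has_requiretty() {
    grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty([[:space:]]|$)'
}

# Typical use, as root:
#   has_requiretty < /etc/sudoers && echo "requiretty is enabled"
```

A line that has already been commented out with '#' does not match, so the check reports only settings that are still in effect.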

Finally, if your watchdog services start (as shown by the 'summary' command) but shut down after a minute, you may not have enough free disk space. Opsview Monitor requires a minimum of 2 GB of free space. If free space falls below this threshold, Opsview Monitor shuts down gracefully rather than crashing, so that the system is not left in a problematic state once the disk space issue is resolved.

To confirm you are encountering the disk space issue, run the command:

# grep "resource limit" /var/log/syslog
Sep 15 12:06:47 ov-author opsview-monit[4362]: 'rootfs' space free 1.7 GB matches resource limit [space free>2.0 GB]
Sep 15 12:06:47 ov-author opsview-monit[4362]: 'varfs' space free 1.7 GB matches resource limit [space free>2.0 GB]
Sep 15 12:06:47 ov-author opsview-monit[4362]: 'optfs' space free 1.7 GB matches resource limit [space free>2.0 GB]

If you see these errors, you should check your free disk space using df:

# df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/ovauthorvg-rootlv  9.3G  7.6G  1.2G  87% /
none                           4.0K     0  4.0K   0% /sys/fs/cgroup
udev                           2.5G  4.0K  2.5G   1% /dev
tmpfs                          497M  352K  496M   1% /run
none                           5.0M     0  5.0M   0% /run/lock
none                           2.5G     0  2.5G   0% /run/shm
none                           100M     0  100M   0% /run/user
/dev/mapper/ovauthorvg-bootlv  233M   38M  179M  18% /boot
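
To compare free space against the 2 GB minimum directly, the df output can be filtered. This is a minimal sketch, assuming POSIX-format output from df -Pk and mount points without spaces; the helper name low_space_mounts is hypothetical, and which mount points correspond to 'rootfs', 'varfs' and 'optfs' on your system (presumably /, /var and /opt) is also an assumption.

```shell
#!/bin/sh
# Read `df -Pk` output on stdin and print "<mount point> <free KB>" for every
# filesystem with less free space than the given minimum.
# $1 = minimum free space in KB (default 2097152 KB = 2 GB).
# low_space_mounts is a hypothetical helper name used for illustration only.
low_space_mounts() {
    min_kb="${1:-2097152}"
    awk -v min="$min_kb" '
        # Skip the header line; in -P format column 4 is the available space
        # and column 6 is the mount point.
        NR > 1 && ($4 + 0) < (min + 0) { print $6, $4 }
    '
}

# Typical use:
#   df -Pk / /var /opt | low_space_mounts
```

No output means every listed filesystem is above the threshold.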

Logs

Logs are always a good place to start when it comes to troubleshooting. Opsview Monitor writes to numerous logs:

  • /var/log/daemon.log
  • /var/log/auth.log
  • /var/log/syslog
  • /var/log/opsview/opsview-web.log
  • /var/log/opsview/opsviewd.log
  • /usr/local/nagios/var/nagios.log
  • /usr/local/nagios/var/perfdata.log
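
Searching all of these logs for the same pattern in one pass can save time. The sketch below assumes nothing beyond standard grep; the helper name scan_logs is hypothetical, and the search pattern in the usage line is only an example.

```shell
#!/bin/sh
# Search each readable log file for a case-insensitive pattern and print the
# matching lines prefixed with the file name. Unreadable or missing files are
# skipped silently.
# $1 = pattern, remaining arguments = log files.
# scan_logs is a hypothetical helper name used for illustration only.
scan_logs() {
    pattern="$1"
    shift
    for log in "$@"; do
        [ -r "$log" ] || continue
        # Passing /dev/null as a second file makes grep print the file name.
        grep -i -- "$pattern" "$log" /dev/null || true
    done
}

# Typical use, as root:
#   scan_logs error /var/log/syslog /var/log/opsview/opsview-web.log \
#       /var/log/opsview/opsviewd.log /usr/local/nagios/var/nagios.log
```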

Databases

We have seen issues where a database has a bad schema and indexes have the wrong names. This causes problems for the upgrade scripts, which expect specific names to exist when upgrading.

Follow this process to reset the schema while retaining the existing data. You should not normally have to do this.

  • Stop opsview and opsview-web
  • Take a backup of the opsview database: /usr/local/nagios/bin/db_opsview db_backup > /tmp/opsview.db
  • Take another backup, for comparing differences: mysqldump -u {user} -p{password} --skip-extended-insert opsview > /tmp/opsview.diff
  • Export just data from the database: mysqldump --skip-extended-insert -t -c -u {user} -p{password} opsview > /tmp/opsview.data
  • Create the database from scratch: /usr/local/nagios/bin/db_opsview db_install
  • Export the schema information from a fresh install: mysqldump -d -u {user} -p{password} opsview > /tmp/opsview.schema
  • Delete and recreate just the database: echo 'drop database opsview; create database opsview' | mysql -u {user} -p{password}
  • Import the fresh schema information: mysql -u {user} -p{password} opsview < /tmp/opsview.schema
  • Import the data: mysql -u {user} -p{password} opsview < /tmp/opsview.data
  • Take a new backup: /usr/local/nagios/bin/db_opsview db_backup > /tmp/opsview_post.db
  • Take another backup, for comparing: mysqldump -u {user} -p{password} --skip-extended-insert opsview > /tmp/opsview2.diff
  • Compare to check differences: diff -u /tmp/opsview.diff /tmp/opsview2.diff
  • Restart Opsview Monitor

Other

If the steps above do not resolve your problem, the sections below cover further troubleshooting steps.

SNMP MIBs

The Debian squeeze distribution does not install Simple Network Management Protocol (SNMP) Management Information Bases (MIBs) by default. To resolve missing MIBs, install them manually with this command:

# apt-get install snmp-mibs-downloader

Note: You may still receive several MIB errors, as shown below, but these can be ignored.

$ snmpwalk -mALL -v2c -cpublic host | head
Unlinked OID in IPATM-IPMC-MIB: marsMIB ::= { mib-2 57 }
Undefined identifier: mib-2 near line 18 of /usr/share/mibs/ietf/IPATM-IPMC-MIB
Bad operator (INTEGER): At line 73 in /usr/share/mibs/ietf/SNMPv2-PDU
Undefined OBJECT-GROUP (diffServMIBMultiFieldClfrGroup): At line 2195 in /usr/share/mibs/ietf/IPSEC-SPD-MIB
Undefined OBJECT-GROUP (diffServMultiFieldClfrNextFree): At line 2157 in /usr/share/mibs/ietf/IPSEC-SPD-MIB
Undefined OBJECT-GROUP (diffServMIBMultiFieldClfrGroup): At line 2062 in /usr/share/mibs/ietf/IPSEC-SPD-MIB
Expected "::=" (RFC5644): At line 493 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Expected "{" (EOF): At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Bad object identifier: At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
Bad parse of OBJECT-IDENTITY: At line 651 in /usr/share/mibs/iana/IANA-IPPM-METRICS-REGISTRY-MIB
RFC1213-MIB::sysDescr.0 = STRING: "Cisco Internetwork Operating System Software
IOS (tm) C2600 Software (C2600-J1S3-M), Version 12.2(15)T7, RELEASE SOFTWARE (fc2)
TAC Support: http://www.cisco.com/tac
Copyright (c) 1986-2003 by cisco Systems, Inc.
Compiled Sat 09-Aug-03 07:18 by ccai"
RFC1213-MIB::sysObjectID.0 = OID: SNMPv2-SMI::enterprises.9.1.186
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (813825536) 94 days, 4:37:35.36
RFC1213-MIB::sysContact.0 = STRING: "support@opsview.com"
RFC1213-MIB::sysName.0 = STRING: "2611"
RFC1213-MIB::sysLocation.0 = STRING: "Reading"

CentOS/RHEL - Automatic dependencies

yum should automatically resolve all dependencies when installing Opsview Monitor. However, if the packages selected during installation do not include opsview-base, opsview-perl and so on, ensure that yum-updatesd-helper is not running and execute the following commands:

# yum remove opsview
# yum clean all
# yum makecache

Finally, running the commands shown below should show the correct dependencies and allow Opsview Monitor to install correctly:

# yum deplist opsview
# yum install opsview

Access denied for some files within the repositories

If you replicate our public repository to a server on your own network, you may find you get 'Access Denied' errors when trying to copy some files.

This is expected behavior, as some files within the repository are restricted to customers who have purchased additional modules.
