Opsview Knowledge Center

Host

Brief Overview of a Host in Opsview Monitor

In Opsview Monitor, a Host is defined as:

'An autonomous computing device, such as (but not limited to), a server, virtual server, a slave server, database server, workstation, PC, network device, storage device, sensor, tablet or mobile device.'

Hosts are effectively logical end-points, meaning if you wish to monitor an Oracle database on a Host you add the Host. Conversely, if you wish to monitor a VMware vSphere server running 64 guests, you can add that server as a single Host, or you can add each guest individually as a Host, which allows monitoring and alerting on per-guest metrics such as CPU usage.

The modification, addition and deletion of Hosts is done primarily via the 'Hosts' section within 'Settings':

You can also choose to add Hosts via an Auto Discovery scan, outlined here. The main benefit of Auto Discovery is the bulk addition of Hosts discovered via a network scan, compared to the Hosts settings page where Hosts are added on an individual basis. However, Hosts can be bulk-edited once they are added to Opsview Monitor.

The Host settings section comprises a sortable/filterable grid view containing all of the Hosts within the Opsview Monitor system. Each column header can be filtered on relevant information, e.g. a User can filter the list to show only Hosts that have a given Host Template applied, or show only Hosts that are members of a given Host Group.

In the above screen we have the Host settings page of a newly installed Opsview Monitor system. The screen contains a range of information and options.

In the top right, there is a drop-down menu allowing the User to configure how many Hosts are visible on the page. There are options for 25, 50, 100 and 200 Hosts per page. There is also a string of text highlighting the limit of Hosts that can be added to this system; this information is derived from the Opsview Monitor software key.

In the top left, there are four buttons:

  • Add new: This button loads a modal window which allows you to add a new Host.
  • Bulk edit: This button loads a modal window allowing the bulk-editing of selected Hosts. The selection of the Hosts is performed once the bulk edit window is loaded.
  • Export: This button loads a drop down window offering three options: .csv, json and xml. When one of these options is chosen the list of Hosts and their relevant data (Host Group, Host template(s), etc) is exported in the chosen format.
  • Clear filters: This button will clear all filters applied via the column headers.

On the bottom row are the standard Opsview Monitor pagination controls. These controls enable you to move quickly between the paginated lists when a large number of Hosts are in the system.

The Host list page by default will list all Hosts within Opsview Monitor that your role can see, as shown below:

However, when one or more Hosts have been modified and are in a 'pending' state, i.e. changes have been applied but a 'Reload' hasn't been performed, the Host list page will have a new section added at the top which will display all modified Hosts.

This allows you to easily see which Hosts have been modified and are pending a reload, and is especially useful if you are searching for the 10 modified Hosts out of a list of 5,000+. The Hosts pending a reload will also have the yellow 'pending' badge in the far right column and there is a link to the 'Reload' action page at the bottom of the 'Modified Hosts' section, for ease of use.

Host Edit/Add Modal Window

When a Host is edited or the 'Add New' button is clicked, a modal window will appear.

The modal window is split into five tabs (six tabs if you have a subscription for the Network Analyzer):

  • Host
  • Notifications
  • Service Checks
  • SNMP
  • Attributes
  • NetAudit (only displayed when the Opsview Monitor system has a valid Network Analyzer subscription)

Host Tab

The Host tab is the main configuration window when adding or configuring a Host. It is split into two sections, 'Basic' and 'Advanced'. The Basic section contains the main settings a Host needs to have configured. Items denoted with a red star (*) are mandatory fields.

In the Basic section there are four options that need to be configured:

  • Primary Hostname/IP: This field should contain the network address of the Host; either an IP address or a domain name resolvable by the Opsview Monitor server. The network address entered in this field is used by the system macro '$HOSTADDRESS$', which is used throughout the entire Opsview Monitor system.
  • Host Title: This is the 'friendly' name of the Host and is what is displayed in the Opsview Monitor software. If your network address is '192.168.123.123' you may want to assign the Host a 'friendlier' name, i.e. 'The Router'. This field is limited to 63 characters.
  • Host Group: As covered in the earlier section, a Host Group is a container for one or more Hosts. In this drop-down list, all available Host Groups will be listed. Host Groups containing other Host Groups won't be listed, as covered in Section 4.3. (Host Groups can only contain either all Hosts or all Host Groups, not a mixture.)
  • Host Templates: Covered in a separate section, Host Templates are a group of Service Checks that can be applied to a given Host. Host Templates provide the ability to monitor certain technologies; for example, if the Host you are adding is an Oracle database, apply the 'Database - Oracle' Host template by selecting it in the left-hand column and clicking the 'right arrow'.

In the Advanced section there is a range of optional settings that can be configured for the Host:

  • Other Hostnames/IPs: This is a comma-separated list of other network addresses relating to the Host. For example, if the Host has two IP addresses, you may enter the first IP address in the 'Basic > Primary Hostname/IP' field, and the second IP address in this field. The primary Hostname/IP is addressed using the $HOSTADDRESS$ macro, whereas the comma-separated values entered in this field are addressed as $ADDRESS1$, $ADDRESS2$ and so forth. To use these values in a Service Check instead of the Primary Hostname/IP, simply replace $HOSTADDRESS$ with $ADDRESS1$, for example. If an $ADDRESSx$ macro is used but no corresponding address is specified in this field, Opsview Monitor will default the value to the Primary Hostname/IP instead. This field is also used to relate these IP addresses to this Host for the purpose of SNMP trap processing.
  • Description: Free text entry field, this field is purely for describing the Host and is not used elsewhere within Opsview Monitor.
  • Host Check Command: Covered in detail in Section 6. The Host Check Command is used to determine the Host status, which can be one of three statuses: 'UP' (responding to the Host Check Command), 'DOWN' (no response), or 'UNREACHABLE' (the Host has a parent relationship configured, and the parent is in a DOWN state).

By default, the Host Check Command is ping, therefore if ICMP traffic is blocked between the Host and the Opsview Monitor server you should change the Host Check Command to one that is allowed to traverse the network, inbound to the Host.
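The address macros described under 'Other Hostnames/IPs' can be illustrated with a short sketch. This is not Opsview Monitor's own code; the function name and its parameters are hypothetical, and it only models the behaviour described above: $HOSTADDRESS$ resolves to the primary address, $ADDRESSn$ resolves to the nth comma-separated entry, and a missing entry falls back to the primary address.

```python
import re


def expand_address_macros(args: str, primary: str, others: str = "") -> str:
    """Expand $HOSTADDRESS$ and $ADDRESSn$ in a Service Check argument string.

    'primary' models the Primary Hostname/IP field and 'others' models the
    comma-separated Other Hostnames/IPs field (both names are hypothetical).
    """
    extra = [item.strip() for item in others.split(",") if item.strip()]

    def address_n(match: re.Match) -> str:
        index = int(match.group(1))
        if 1 <= index <= len(extra):
            return extra[index - 1]
        # No such entry: fall back to the primary address, as the text describes.
        return primary

    expanded = args.replace("$HOSTADDRESS$", primary)
    return re.sub(r"\$ADDRESS(\d+)\$", address_n, expanded)


print(expand_address_macros("-H $ADDRESS1$", "192.168.123.123", "10.0.0.5, 10.0.0.6"))
print(expand_address_macros("-H $ADDRESS3$", "192.168.123.123", "10.0.0.5, 10.0.0.6"))
```

The first call uses the first additional address; the second asks for a third address that does not exist and therefore resolves to the primary one.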

  • Icon: The icon is used within the 'Host Groups, Hosts and Services' section within 'Monitoring' to identify the Host, and is also visible in the Host list page. Opsview Monitor ships with a series of default icons that can be chosen via the drop-down box. To upload your own icon, use the 'hosticon_admin' script via the command line. This script is located within /usr/local/nagios/bin/. As the 'nagios' user, run the following command, where 'LOGO - Hosticon' is the name you wish the icon to show as within the drop-down menu and /path/to/Hosticon.png is the location of the image you wish to convert into an icon:

    hosticon_admin add 'LOGO - Hosticon' /path/to/Hosticon.png

To delete a Host icon, run the following command as the nagios user:

    hosticon_admin remove 'LOGO - Hosticon'

To list all of the icons within Opsview Monitor, run the command:

    hosticon_admin list

  • Check Period: The check period is a list of 'Time Periods' available within the Opsview Monitor system. Time Periods are covered in detail in a later section. However, they are essentially a weekly format which allows a user to create a time period called 'working hours', for example, that is Monday to Friday, 9:00 am to 5:00 pm. When this time period is then applied to a Host, this Host is only monitored during the 'working hours' specified, i.e. Monday to Friday, 9:00 am to 5:00 pm.
  • Check Interval: Working in combination with the check period and the Host Check Command, the Check Interval is how regularly the Host is checked using the Host Check Command during the specified time period. If set to '5m' (default) and all settings are left to default, the Host will be pinged once every five minutes when the time period is valid (i.e. the Host is being monitored). This field allows for hours (h), minutes (m) or seconds (s) i.e. '24h' means once a day, '30s' means every 30 seconds. The field can also be set to '0' which means the Host is always considered 'UP' unless a check has been manually requested (i.e. 'Recheck' is run against the Host via its contextual menu).
  • Max Check Attempts: This field determines the number of times a Host Check Command has to fail for the Host to change into a 'hard' state. In Opsview Monitor there is the concept of 'Soft' and 'Hard' states. When a Host check fails and the Host changes into the 'DOWN' state, it is initially considered a 'Soft' state. Once the Host Check Command has failed the number of times specified in this field, the Host is considered to be in a 'Hard' state, i.e. the failure is not a temporary blip. You can use hard states so that users are only notified when a Host is truly down. The interval used between these attempts is not the 'Check Interval' but the 'Retry Interval'.
  • Retry Interval: A separate field to the 'Check Interval', the 'Retry Interval' is only used when a Host goes into the 'DOWN' / 'UNREACHABLE' state. For a Host to go from a 'soft' state to a 'hard' state, the Host Check Command must fail $X times, where $X is the value set in the 'Max Check Attempts' field; the Retry Interval determines how long Opsview Monitor waits between those attempts. For example, if the Retry Interval is 1m and Max Check Attempts is set to 3, the Host Check Command will run once a minute for two further minutes (the first failure is what triggers the retries), after which, if the Host is still 'DOWN', it will change from a 'soft DOWN' to a 'hard DOWN'.
  • Hashtags: Covered in greater detail within the 'Hashtags' section, this drop-down is a list of all Hashtags within the Opsview Monitor system. By selecting one or more Hashtags from this drop down menu you are 'tagging' the Host with the Hashtag. This means when you tag a Host with 'linux-systems', anyone whose role allows them to view Hosts tagged with 'linux-systems' will be able to view this Host. Similar logic applies for Notifications.
  • Globally Applied Hashtags: When a Hashtag is applied from the 'Settings > Hashtags' section and not via the Host, it will appear in this list. To remove the Hashtag from the Host simply edit the relevant Hashtag via 'Settings > Hashtag >' and click on the Hashtag in question.
  • Event Handler: Covered in greater detail in the 'Event Handler' section of the User Guide, Event Handlers are scripts that can be triggered when a Host goes into a 'DOWN' or 'UNREACHABLE' state (soft/hard, depending on the event handler script). The script can do anything you like, but a common usage is restarting a service or server (a virtual machine, for example) via an API.
  • Parents: This relationship is used to calculate whether a Host is DOWN or UNREACHABLE, i.e. whether the Host is really down or whether something between Opsview Monitor and the Host is hiding its true state. Use this relationship to minimise Notifications, as you can disable Notifications for UNREACHABLE Hosts.

For example, if you have a switch as the parent of 10 Hosts and the switch is marked as DOWN, then when the 10 Hosts are checked and considered DOWN, they will be marked as UNREACHABLE instead and you will only get one Notification for the switch instead of 10 Host Notifications. Note that there may be a delay before this condition is reached, as results will be coming in at different times. You can select multiple parents if you have a failover capability.
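The interaction between the Check Interval format, the Retry Interval and Max Check Attempts described above can be worked through with a small sketch. The function names are hypothetical and this is not Opsview Monitor code; it assumes an interval is a whole number followed by a single unit (h, m or s), or the literal '0', as the field description states.

```python
import re

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_interval(value: str) -> int:
    """Convert an interval such as '5m', '24h', '30s' or '0' into seconds."""
    value = value.strip()
    if value == "0":
        return 0
    match = re.fullmatch(r"(\d+)([hms])", value)
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def seconds_until_hard_state(retry_interval: str, max_check_attempts: int) -> int:
    """Time from the first failed check to the check that confirms a hard state.

    The first failure counts as attempt one, so only the remaining
    attempts are spaced by the Retry Interval.
    """
    retries = max(max_check_attempts - 1, 0)
    return retries * parse_interval(retry_interval)


print(parse_interval("24h"))               # 86400 seconds, i.e. once a day
print(seconds_until_hard_state("1m", 3))   # 120 seconds, i.e. two further minutes
```

With a Retry Interval of 1m and Max Check Attempts of 3, the sketch gives 120 seconds, matching the 'two further minutes' in the Retry Interval example.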

Notifications Tab

The Notifications tab contains various settings relating to when and why Notifications are sent for this particular Host.

  • Notify On: This section determines which states the Host should notify on, i.e. only on 'DOWN' or 'UNREACHABLE', for example. If a Host does not notify on any states, then the services on that Host will also not send any notifications.
  • Notification Period: This field uses the 'Time Periods' already defined within the Opsview Monitor system, and determines when notifications are allowed to be sent to users.
  • Re-notification Interval: This field determines the period of time (in hours, minutes or seconds) after which a notification is re-sent if the Host is still unhandled (i.e. the problem has not been ACKNOWLEDGED or put into Downtime). If this is set to '0', only the first notification is sent (when the Host changes to the 'HARD' state).
  • Flap Detection: This checkbox toggles flap detection on and off. Flap Detection is used in notifications and other areas of Opsview Monitor, i.e. don't send me an alert if the Host is flapping. A Host is considered 'flapping' if it changes state between OK and non-OK more than seven times in the last 20 checks.
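The flapping rule quoted under Flap Detection (more than seven state changes in the last 20 checks) can be expressed as a simple count. The sketch below is illustrative only: the function name, the state labels and the idea of passing the check history as a list are assumptions, not the product's internal implementation.

```python
HEALTHY_STATES = {"OK", "UP"}


def is_flapping(states, window=20, max_changes=7):
    """Return True if the last 'window' checks changed between healthy and
    unhealthy more than 'max_changes' times."""
    recent = states[-window:]
    changes = sum(
        1
        for previous, current in zip(recent, recent[1:])
        if (previous in HEALTHY_STATES) != (current in HEALTHY_STATES)
    )
    return changes > max_changes


stable = ["UP"] * 20
noisy = ["UP", "DOWN"] * 10
print(is_flapping(stable))  # False
print(is_flapping(noisy))   # True
```

A history with exactly seven changes is not treated as flapping in this sketch, because the text says 'more than seven'.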

Service Checks Tab

The Service Checks tab is designed to give you the ability to:

  • Add Service Checks to a Host.
  • Modify Service Checks on a Host basis, i.e. use different arguments just for this Host.
  • Omit Service Checks that have been inherited via a Host Template; i.e. 'we don't want this service check on this Host but we want the rest from the Host Template'.
  • Test Service Checks against a Host before submitting the change and reloading.

The left-hand section of the Service Checks tab displays the Service Check 'tree'. Service Checks reside within 'Service Groups' (covered in a separate section), e.g. the checks visible above, such as 'CPU statistics', live within the service group 'OS - Base Unix Agent'. The algorithm behind the tree structure creation uses the hyphens as the separator, therefore 'OS - Base Unix Agent' becomes 'OS' at the top level, and 'Base Unix Agent' at the 2nd level down.
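As a rough illustration of how a Service Group name such as 'OS - Base Unix Agent' could be turned into tree levels, consider the sketch below. It assumes the separator is a hyphen surrounded by spaces; the function name and the '_checks' key are hypothetical and are not taken from Opsview Monitor.

```python
def build_service_tree(checks):
    """Build a nested dict from a mapping of Service Check name to Service Group name.

    Each ' - ' in the group name starts a new level of the tree.
    """
    tree = {}
    for check_name, group_name in checks.items():
        node = tree
        for level in (part.strip() for part in group_name.split(" - ")):
            node = node.setdefault(level, {})
        # Leaf list holding the checks that live in this group.
        node.setdefault("_checks", []).append(check_name)
    return tree


tree = build_service_tree({"CPU statistics": "OS - Base Unix Agent"})
print(tree)
```

Here 'OS' becomes the top-level node and 'Base Unix Agent' the second level, with the 'CPU statistics' check listed beneath it.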

In the tree, on the far right of each Service Check's row (the items with the check boxes), one of two icons may be displayed. These icons depict whether the Service Check is inherited from a Host Template, or whether it was originally inherited from a Host Template and has since been 'omitted', i.e. 'don't apply this Service Check to this Host'. This 'omit' option is toggled by using the 'Remove Service Check from Host Template' option within the Service Check, and only becomes visible when the Service Check is checked in the left-hand section.

A Service Check inherited from the 'OS - Unix Base' Host Template that has been omitted, i.e. won't be applied to this Host.

A Service Check inherited from the 'OS - Unix Base' Host Template.

If a Service Check is inherited from a Host template yet isn't 'checked' in the left-hand section, the 'Exceptions' section will not be editable and the 'Remove Service Check from Host Template' toggle button will not be visible. To edit these items, check the box next to the Service Check; this tells Opsview Monitor to look at this section for information on this Service Check instead of the Host Template.

The right-hand section of the Service Checks tab is populated with information and options relevant to the selected Service Check and is commonly referred to as the 'Service Check information panel'.
When no Service Check is selected, this section will contain a message informing you to select a Service Check first.

Service checks tab without a service check selected

When a Service Check is selected and checked in the left-hand tree panel, the Service Check information panel will populate with relevant data:

Example Service Check
The 'Service Check information' panel contains:

  • Service Check name
  • Service Check description
  • Plugin
  • Default arguments

This information cannot be edited from within the Service Check information panel and is included in this view for information purposes only.

The 'Service Check information' panel also contains:

  • Plugin and Macro Help buttons
  • Test Service Check: arguments
  • Test Service Check: button
  • Variables drawer
  • Exceptions drawer

The 'Test Service Check' button, 'Plugin and Macro Help' and 'Test Service Check: arguments' sections are designed to provide the ability to test a Service Check against the relevant Host before submitting the changes ('Submit Changes') and reloading Opsview Monitor. Proving that the Service Check works as expected before a reload takes place can save a significant amount of time.

Example 'Test Service Check' output

The 'Plugin' help button will load a new modal window displaying the 'Help file' for the plugin. The 'Macro' help button will load a new modal window displaying all of the available system macros (system 'variables', essentially).

Example plugin help output.

Example macro help output

The 'Test Service Check arguments' input box allows for the modification of the arguments before they are passed to the plugin to be executed within the 'Test Service Check' box. Here you can modify items relevant to the Service Check, i.e. a different port, file name, etc.

Example 'Test Service Check arguments' box

The 'Variables' drawer contains all variables that the Service Check may be using. 'Variables' are covered in great detail within their relevant User Guide section; however, in essence they act like standard computer science 'Variables', in that you can configure '-p %PORT%' instead of '-p 9200' for an Elasticsearch Service Check's arguments. The benefit of this is that by using a Variable instead of hard coding the port, you can apply the Service Check to hundreds of Hosts and simply add the '%PORT%' Variable to the Hosts which don't have Elasticsearch on port 9200.
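The %PORT% example can be sketched as a simple substitution in which a value set on the Host takes precedence over a system-wide default. This is only an illustration under stated assumptions: the function name, the two dictionaries and the precedence order are hypothetical, and Opsview Monitor's real expansion logic may differ.

```python
import re


def expand_variables(args, host_variables, global_defaults):
    """Replace %NAME% placeholders in a Service Check argument string.

    A value defined on the Host wins; otherwise the global default is used.
    """

    def lookup(match):
        name = match.group(1)
        if name in host_variables:
            return str(host_variables[name])
        if name in global_defaults:
            return str(global_defaults[name])
        raise KeyError(f"no value available for Variable {name}")

    return re.sub(r"%([A-Z0-9_]+)%", lookup, args)


defaults = {"PORT": 9200}
print(expand_variables("-p %PORT%", {}, defaults))              # -p 9200
print(expand_variables("-p %PORT%", {"PORT": 9250}, defaults))  # -p 9250
```

Only the Hosts that run Elasticsearch on a non-default port need their own value; every other Host picks up 9200.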

If a Service Check requires a Variable in order to successfully work, then the Variable will be listed within the 'Variables drawer'. In the example Service Check below we are applying a Service Check to monitor the number of Bytes received for a MySQL database. The syntax for this Service Check is:

-H $HOSTADDRESS$ -u %MYSQLCREDENTIALS:1% -p %MYSQLCREDENTIALS:2% --metricname=Bytes_received

This means that the username field (-u) and the password field (-p) are taken from the %MYSQLCREDENTIALS% Variable. By default, the Variables drawer will be empty. It will only be populated with the Variables required once you have pressed the 'Test' button:

Example of Variables drawer where there are either no variables required, or the 'Test' button has not been pressed.

Example of a Variables drawer where the test button has been pressed and %MYSQLCREDENTIALS% is required.

If these Variables are not populated with global defaults (Settings > Variables), then the Service Check will fail as there is no means to log in to the MySQL database in order to monitor it. If that is the case, you will need to click the 'Add' button next to the Variable, which will navigate to the 'Variables' tab and add a new Variable as below:

Here you will need to enter a value (not relevant to this Service Check so enter anything) and click 'Save'. Once saved the 'Host variable details' panel will populate. Here you can now check both 'Override username' and 'Override password' and enter the correct login information for this database.

Once the correct information is added, navigate back to the Service Checks tab and click 'Test' again and if you have added the correct credentials, the Service Check should now successfully work:

Example 'Test Service Check' output using the correct Variables

You can now safely submit the changes ('Submit Changes') and reload Opsview Monitor knowing that the Service Check will work post-reload.

If you wish to change the actual plugin arguments themselves (i.e. add a warning/critical level (-w/-c) to the Service Check), then you can do so via the Exceptions drawer.
As covered earlier in this section, the Exceptions drawer will not be modifiable until the Service Check is checked in the left-hand tree pane. Once the Service Check is checked, the Exceptions drawer should look similar to the one below:

The three options are:

  • Exception: change the arguments of the Service Check regardless of Time Period.
  • Timed Exception: Change the arguments of the Service Check to be what is entered in the text box, but only change the arguments during the chosen Time Period.
  • Event Handler: Override the Service Check's Event Handler with a custom option.

For the 'MySQL Bytes Received' Service Check you may wish to add a '-w' and '-c' option, e.g. if the number of Bytes received is over 'X' set the status to WARNING, and over 'Y' set the status to CRITICAL. You can choose to do this by selecting 'Exception' and modifying the arguments, using the 'Plugin Help' modal for direction on how to modify these arguments:

When we next run the 'Test Service Check', the command run will be modified to use the arguments specified here instead of the 'default' arguments:

The 'Timed Exception' option works in exactly the same way; however, the defined arguments will not be 'injected' into the Service Check until the relevant time period begins.

SNMP Tab

The 'SNMP' tab is where you can configure SNMP settings for a Host. For example, if you wish to use plugins or Service Checks which rely on SNMP then the relevant SNMP credentials will first need to be configured and tested within this section.

The tab is split into three sections:

  • Credentials
  • Interfaces
  • SNMP Traps

These sections are visible when 'Enable SNMP:' is checked. When it is not checked, the three drawers are hidden.

Credentials

The Credentials drawer is where you should select the version of SNMP used, along with the relevant authentication information. For SNMP v1 or v2c, only the port and community string need to be specified. For SNMP v3, a port, username, authentication protocol, authentication password, privacy protocol and privacy password are all required.
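The per-version requirements in the paragraph above can be captured as a small lookup. The field names used here ('port', 'community', 'auth_protocol' and so on) are hypothetical labels chosen for illustration, not Opsview Monitor's internal or API field names.

```python
REQUIRED_FIELDS = {
    "1": ("port", "community"),
    "2c": ("port", "community"),
    "3": (
        "port",
        "username",
        "auth_protocol",
        "auth_password",
        "priv_protocol",
        "priv_password",
    ),
}


def missing_snmp_fields(version, settings):
    """Return the required fields that are absent or empty for an SNMP version."""
    if version not in REQUIRED_FIELDS:
        raise ValueError(f"unknown SNMP version: {version!r}")
    return [name for name in REQUIRED_FIELDS[version] if not settings.get(name)]


print(missing_snmp_fields("2c", {"port": 161, "community": "example"}))
print(missing_snmp_fields("3", {"port": 161, "username": "monitor"}))
```

The first call returns an empty list; the second lists the four authentication and privacy fields that still need values.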

On first entry to the 'Credentials drawer', the SNMP community string for v2c (for example) will say 'SNMP community encrypted - click to reset'. This message is displayed because a secure, encrypted placeholder is used until a valid community string is set. Simply click on the button as directed and enter the community string, then click 'Test SNMP Connection' to ensure the credentials have been entered correctly. Once authentication data has been entered into the UI it cannot be retrieved (for security reasons); however, it can be reset.

Example of a successful connection

Example of a failed connection.

For a breakdown of the relevant credential information see the tables below:

SNMP v1 and v2c

Field | Details
SNMP Port | This defines the port number to connect to the SNMP device. Default is 161.
SNMP Community | This defines the community string to connect to the SNMP device. This value will be encrypted in the Opsview Monitor database. After this value has been saved, it cannot be retrieved back in the user interface. If you want to change the value, click the Reset button to change it.

SNMP v3

Field | Details
SNMPv3 Username | This defines the SNMPv3 username to connect to the SNMP device.
SNMPv3 Authentication Protocol | This defines the SNMPv3 protocol to connect to the SNMP device to authenticate the User.
SNMPv3 Authentication Password | This defines the SNMPv3 password to connect to the SNMP device to authenticate the User. This value will be encrypted in the Opsview Monitor database. After this value has been saved, it cannot be retrieved back in the user interface. If you want to change the value, click the Reset button to change it.
SNMPv3 Privacy Protocol | This defines the SNMPv3 protocol to encrypt traffic between Opsview Monitor and the SNMP device.
SNMPv3 Privacy Password | This defines the SNMPv3 password to encrypt traffic between Opsview Monitor and the SNMP device. If this is not set, then no attempt to encrypt traffic will take place. For devices using Net-SNMP, an empty privacy password will still allow connection to the device even if a privacy password is defined for a user. This value will be encrypted in the Opsview Monitor database. After this value has been saved, it cannot be retrieved back in the user interface. If you want to change the value, click the Reset button to change it.

Interfaces

The second drawer within SNMP is the 'Interfaces' drawer. This drawer is used to list all the available interfaces on this Host. The list is gathered via SNMP, so correct credentials are a prerequisite.

Example of a Cisco router's interfaces

Important: Please note that in order to monitor the interfaces of a Host, you must apply the 'SNMP - MIB-II' Host template before a reload is undertaken. This template comprises the Service Checks 'Interface Poller', 'Interface', 'Discards' and 'Errors'.

To view the interfaces of a Host click on the 'Query Host' button which will populate the table with the available interfaces. There are a few options you may wish to modify before running the query:

  • Extended Throughput Data: If this option is enabled then the Interface Service Check will also return unicast, multicast and broadcast performance data. This will be in the form of bits per second based on the interface speed.
  • SNMP Message Size: Some SNMP devices can return a significant amount of data which fills the standard SNMP buffer size of around 500 octets. Many devices cannot cope with setting the maximum buffer size so this option allows the size to be tailored to each device. The units are in Kio which are multiples of 1024.
  • Modify ifDescr Level: Some SNMP devices can have very long descriptions (ifDescr) for each interface on a device, mostly made up from common words. There is a limit in Opsview Monitor that this description shouldn't exceed 52 characters, otherwise monitoring the interface will not work as expected (a 'duplicate interface' error may be shown at the bottom of the screen). Setting this option can remove common words to reduce the length of each interface ifDescr and help to avoid duplicate interfaces.

The settings are as follows:

Setting | Words Removed
Off (default) | None
Level 1 | 'Nortel Ethernet', 'Nortel', 'Routing', 'Module'
Level 2 | Trailing spaces removed
Level 3 | 'PCI Express', 'Quad Port', 'Gigabit', 'Server'
Level 4 | 'Corrigent systems', ', , '
Level 5 | 'Ethernet', 'Frontpanel', 'RJ45', '1000BASE-T', '- no sfp inserted'

Levels are cumulative. Further levels may be added in the future. The level should not be changed once monitoring is working to prevent loss of historical data.
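Because the levels are cumulative, choosing Level 3 also applies Levels 1 and 2. The sketch below illustrates that idea only. It assumes plain, case-sensitive substring removal, applies the Level 2 trailing-space trim at the end, and includes only 'Corrigent systems' for Level 4 because the second Level 4 entry is unclear in the table; the real plugin may behave differently, and the function name is hypothetical.

```python
LEVEL_WORDS = {
    1: ["Nortel Ethernet", "Nortel", "Routing", "Module"],
    3: ["PCI Express", "Quad Port", "Gigabit", "Server"],
    4: ["Corrigent systems"],  # second Level 4 entry omitted (unclear in the table)
    5: ["Ethernet", "Frontpanel", "RJ45", "1000BASE-T", "- no sfp inserted"],
}


def shorten_ifdescr(descr, level):
    """Apply every level up to and including 'level' to an interface description."""
    for current in range(1, level + 1):
        for word in LEVEL_WORDS.get(current, []):
            descr = descr.replace(word, "")
    if level >= 2:
        # Level 2: trailing spaces removed.
        descr = descr.rstrip()
    return descr


print(repr(shorten_ifdescr("Port 1 Module", 1)))  # 'Port 1 '
print(repr(shorten_ifdescr("Port 1 Module", 2)))  # 'Port 1'
```

The example shows why Level 2 exists: removing a word at Level 1 can leave a trailing space behind, which Level 2 then trims.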

Example output of 'Query Host'

The table section of the 'Interfaces drawer' has five main columns:

  • Selection box: Check box; check this to monitor the interface. If you select an interface using the check box beside the name, Opsview Monitor will create a service for each interface after a reload. This will monitor throughput, errors and discards. Use the checkbox in the column header to toggle all interface checkboxes.
  • Interfaces to poll: The description of the interface.
  • Throughput
  • Errors
  • Discards

For the discards, errors and throughput fields a threshold can be set. For any selected interface, if the cell is empty, the threshold value will be taken from the default line. If a cell is set to -, then no threshold will be set. This is equivalent to saying 'I do not want to set a warning threshold'.
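The rule for empty cells and cells set to '-' can be summarised in a few lines. The function name is hypothetical, and the sketch assumes each cell and the default line are available as plain strings.

```python
def effective_threshold(cell, default):
    """Resolve the threshold for one interface cell.

    An empty cell inherits the default line; '-' means no threshold at all.
    """
    cell = cell.strip()
    if cell == "-":
        return None
    if cell == "":
        return default
    return cell


print(effective_threshold("", "75%"))     # 75%
print(effective_threshold("-", "75%"))    # None
print(effective_threshold("50%", "75%"))  # 50%
```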

Throughput is monitored from the 'multiple' Service Check called Interface. This calculates the rate of throughput between checks and returns the input and output information. If the rate is above the threshold value, then an alert will be raised at the appropriate level.

Performance data will be returned based on the input and output rate in octets per second. If the threshold is specified as a percentage value, the performance data returned will be a percentage value instead.

If a percentage threshold is specified and it is not possible to work out the interface speed (e.g. VLANs), then the plugin will return a WARNING with the message:

INTERFACENAME throughput (in/out) X bps/Y bps but has an interface speed of 0, so cannot check a percentage threshold

In this case, set the threshold for this interface in bits per second rather than as a percentage.

It is possible to use advanced syntax for more complicated threshold checking. For example:

IN 10:50% - alert if input throughput is below 10% or above 50%
OUT 30000:50000 - alert if output throughput is below 30,000 bits/sec or above 50,000 bits/sec
IN 10:50% and OUT 30:55% - alert if both input throughput is below 10% or above 50% and output throughput is below 30% or above 55%
IN 10:50% or OUT 30:55% - alert if either input throughput is below 10% or above 50% or output throughput is below 30% or above 55%
40:60% - this is the same as IN 40:60% or OUT 40:60%
75% - this is the same as 0:75% which was the old behaviour.

Most whitespace is ignored. Note that you cannot mix percentage and bits per second values in the same threshold.
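To make the threshold syntax above more concrete, here is a minimal parser and evaluator. It is a sketch under several assumptions rather than the plugin's actual parser: clauses are evaluated left to right with no operator precedence, the caller supplies the input and output values in the same unit as the threshold (percent or bits per second), and all function names are hypothetical.

```python
import re

CLAUSE = re.compile(
    r"^(IN|OUT)?\s*(?:(\d+(?:\.\d+)?)\s*:)?\s*(\d+(?:\.\d+)?)\s*(%?)$",
    re.IGNORECASE,
)


def parse_clause(text):
    """Parse one clause such as 'IN 10:50%', 'OUT 30000:50000' or '75%'."""
    match = CLAUSE.match(text.strip())
    if match is None:
        raise ValueError(f"cannot parse threshold clause: {text!r}")
    direction, low, high, percent = match.groups()
    return {
        "direction": direction.upper() if direction else None,
        "low": float(low) if low is not None else 0.0,  # '75%' means '0:75%'
        "high": float(high),
        "percent": percent == "%",
    }


def breaches(clause, in_value, out_value):
    """True if the relevant value is below the low bound or above the high bound."""

    def outside(value):
        return value < clause["low"] or value > clause["high"]

    if clause["direction"] == "IN":
        return outside(in_value)
    if clause["direction"] == "OUT":
        return outside(out_value)
    # No direction: same as 'IN x or OUT x'.
    return outside(in_value) or outside(out_value)


def should_alert(threshold, in_value, out_value):
    """Evaluate a full threshold expression joined by 'and' / 'or'."""
    parts = re.split(r"\s+(and|or)\s+", threshold.strip(), flags=re.IGNORECASE)
    clauses = [parse_clause(part) for part in parts[0::2]]
    operators = [op.lower() for op in parts[1::2]]
    if len({clause["percent"] for clause in clauses}) > 1:
        raise ValueError("cannot mix percentage and bits per second values")
    result = breaches(clauses[0], in_value, out_value)
    for op, clause in zip(operators, clauses[1:]):
        hit = breaches(clause, in_value, out_value)
        result = (result and hit) if op == "and" else (result or hit)
    return result


print(should_alert("IN 10:50% and OUT 30:55%", 60, 40))  # False
print(should_alert("IN 10:50% or OUT 30:55%", 60, 40))   # True
```

With input at 60% and output at 40%, only the input clause is breached, so the 'and' form does not alert while the 'or' form does.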

Errors is monitored from the 'multiple' Service Check called Errors. This calculates the average number of errors per minute between checks, and returns the input and output error per minute information. If the rate is above the threshold, then an alert will be raised at the appropriate level. Performance data will be returned based on the input and output errors per minute.

Discards is monitored from the 'multiple' Service Check called Discards. This calculates the average number of discards per minute between checks, and returns the input and output discards per minute information. If the rate is above the threshold, then an alert will be raised at the appropriate level. Performance data will be returned based on the input and output discards per minute.
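The per-minute averages used by the Errors and Discards checks amount to a counter difference scaled to 60 seconds. The sketch below shows the arithmetic only; the function name is hypothetical and counter resets or wraps are not handled.

```python
def rate_per_minute(count_before, count_after, seconds_elapsed):
    """Average number of events per minute between two counter readings."""
    if seconds_elapsed <= 0:
        raise ValueError("seconds_elapsed must be positive")
    return (count_after - count_before) * 60 / seconds_elapsed


# 30 new errors over a five-minute gap between checks.
print(rate_per_minute(100, 130, 300))  # 6.0
```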

If there is a Host you do not want to monitor throughput, errors or discards on, you can simply remove the service check 'Interface', 'Errors' or 'Discards' from the Host.

Note: If the interface is down, then the state of Errors / Discards will be set to OK and the output will say Interface NAME is down. Also, you should set maximum check attempts to one because a subsequent invocation may have no errors and a notification will not get raised.

SNMP Traps

The third drawer within SNMP is the 'SNMP Traps' drawer. This drawer is used to list all the available options relating to SNMP Traps and this Host.

Currently there is only one option, called 'Enable tracing'. Once enabled, Opsview Monitor will begin tracing traps for the Host. To collect the traps for the Host, click on the 'collect' button, which will pull the debugging information to the master server in a distributed setup and parse the traps received.

SNMP Limitations

You need to have SNMPv2c if you are monitoring an interface of 100 Mbps or over. This is because SNMPv2c supports 64-bit counters, but SNMPv1 doesn't. If you use SNMPv1, your graphs are likely to have gaps in them.
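A quick calculation shows why 32-bit counters struggle at these speeds. The sketch assumes an octet counter on a link running continuously at full line rate; the function name is hypothetical.

```python
def seconds_to_wrap(counter_bits, link_bps):
    """Seconds for an octet counter of the given width to wrap at full line rate."""
    octets = 2 ** counter_bits
    return octets * 8 / link_bps


print(round(seconds_to_wrap(32, 100_000_000), 1))    # 343.6
print(round(seconds_to_wrap(32, 1_000_000_000), 1))  # 34.4
```

At 100 Mbps a 32-bit octet counter can wrap in under six minutes, so two readings a few minutes apart may straddle a wrap; at 1 Gbps it can wrap in about 34 seconds. A 64-bit counter does not have this problem at practical link speeds.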

Interfaces are monitored by name, so if the SNMP index position changes (which could happen on a router reboot), then a rescan of the device will occur to check (Opsview Monitor treats the SNMP index as an internal number which a system does not need to know about. By working with names only, Opsview Monitor can automatically follow any changes to the SNMP index position without human intervention).

If there are multiple interfaces with the same name, the ifIndex will also be passed to the plugin to check. If the ifName does not match the expected interface name for this ifIndex, an alert will be raised which says:

WARNING - Interface name $user_specified_ifname expected at index $user_specified_index, but got $name!

You will need to run Query Host to list the interfaces to check again.
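The name and index handling described above can be sketched roughly as below. This is an illustration only, not the plugin's implementation: the function name, the dictionary used to stand in for the device's interface table and the return format are all assumptions.

```python
def check_interface(if_table, expected_name, expected_index=None):
    """Locate an interface in if_table, a dict of ifIndex -> interface name.

    expected_index is only supplied when several interfaces share a name.
    Returns a (message, index) tuple.
    """
    if expected_index is None:
        # Unique name: follow it to whichever index it currently occupies.
        for index, name in if_table.items():
            if name == expected_name:
                return "OK", index
        return "UNKNOWN - interface not found", None

    # Duplicate names: the name must still be found at the recorded index.
    actual = if_table.get(expected_index)
    if actual != expected_name:
        message = (
            "WARNING - Interface name {} expected at index {}, but got {}!"
        ).format(expected_name, expected_index, actual)
        return message, None
    return "OK", expected_index


# After a reboot the indexes have swapped, but the name is still followed.
print(check_interface({1: "eth1", 2: "eth0"}, "eth1"))

# With duplicate names, a different name at the recorded index raises a warning.
print(check_interface({3: "Port", 4: "Uplink"}, "Port", expected_index=4))
```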

Note: If the index moves to a position with the same interface name, then Opsview Monitor will not see a change and will continue monitoring this interface as usual even though it could be a different interface.
If you have a Cisco router, please check this Cisco support article regarding ifIndex persistence.

Why aren't my interfaces being monitored?

The services are only created if the Host has the 'SNMP - MIB-II' Host template applied, or has the Interface Poller, Interface, Discards and Errors Service Checks associated to the Host directly via the 'Service Checks' tab.

I'm getting thresholds that are over 100%

For each interface, Opsview Monitor will work out the utilization of an interface based on the number of bytes transferred as reported by SNMP, divided by the time difference between the two values, as a percentage of the interface speed as reported by SNMP's ifSpeed counter.
We have compared the figures produced by Opsview Monitor's plugins with a regularly executed snmpwalk and found that the data values are exactly the same, so we are confident that the collection of data and the calculation of the utilization is correct.
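A minimal sketch of this calculation is shown below. It assumes the byte counts are converted to bits before being compared with ifSpeed, which is reported in bits per second. The function name and the sample figures are hypothetical.

```python
def utilization_percent(prev_octets, curr_octets, elapsed_seconds, if_speed_bps):
    """Utilization between two octet counter samples as a percentage of ifSpeed."""
    if if_speed_bps == 0:
        raise ValueError("interface speed is 0, cannot compute a percentage")
    bits_per_second = (curr_octets - prev_octets) * 8 / elapsed_seconds
    return 100.0 * bits_per_second / if_speed_bps


# 37,500,000 octets in 300 seconds on an interface reporting 10 Mbps
print(utilization_percent(0, 37_500_000, 300, 10_000_000))

# If the device under-reports ifSpeed, the same formula can exceed 100%
print(utilization_percent(0, 450_000_000, 300, 10_000_000))
```

The second call illustrates how a wrongly reported ifSpeed alone is enough to push the result past 100%.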

There are several possible reasons why you can get over 100% utilization:

  • The wrong ifSpeed is reported by the device. This can sometimes occur with Net-SNMP, but it is possible to set the speed correctly in the configuration file
  • Some speeds are not the maximum possible throughput. ifSpeed is defined as 'An estimate of the interface's current bandwidth in bits per second'
  • Full duplex may skew the results as you may be able to get more transfer in one direction than in another
  • Some devices only update the SNMP counters at certain intervals. This means you could see sudden spikes in utilization if Opsview gathers data at different intervals

If you have interfaces that are consistently reporting more than 100% utilization, please contact Opsview Monitor Customer Success who can assist.

Plugin raises a WARNING about an interface with 0 speed

You may see a warning such as:

INTERFACENAME throughput (in/out) 0 bps/0 bps but has an interface speed of 0, so cannot check a percentage threshold

When a threshold is specified as a percentage value, Opsview Monitor works out the percent utilization based on the speed. However, if the speed is zero, this is not possible.

Possible resolutions:

  • The device is reporting the incorrect speed - contact the device manufacturer. If the device is a Unix server running Net-SNMP, you can force Net-SNMP to set a specific speed per interface
  • The interface is not valid for monitoring - uncheck the interface from being monitored
  • You still want to monitor the interface status - set the threshold to a dash (which means that no threshold check will be required) or set an absolute threshold rather than a percentage, so the speed check is ignored

There are duplicate names in the interface SNMP table, which has some limitations

Interfaces are tracked by their name rather than their ID as provided by the device being monitored. This is because some devices reallocate IDs on a reboot.

Opsview tracks these interfaces by fetching each interface's 'ifDescr', shortening it to 52 characters and storing it as the 'short interface name'. This limit is the standard length of interface description supported by the majority of devices. However, this can appear to cause duplicate interface names if the ifDescr contains unnecessary duplicate text, e.g.

Nortel Ethernet Routing Switch 5510-48T Module - Unit 1 Port 1
Nortel Ethernet Routing Switch 5510-48T Module - Unit 1 Port 2
Nortel Ethernet Routing Switch 5510-48T Module - Unit 1 Port 3

would all be shortened to

Nortel Ethernet Routing Switch 5510-48T Module - Un

You can either reconfigure all the interface ifDescr values on the device to contain only short unique names such as

5510-48T Unit 1 Port 13

and then re-run 'Query Host' on the Host configuration SNMP page, or set the 'Modify ifDescr Level' option, which attempts to remove certain 'common' words. See the section 'INTERFACES' above for more details.
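To check a device's descriptions for this problem before adding it, something like the following sketch could be used. It assumes the shortening is a plain cut at 52 characters, which may differ in detail from what Opsview Monitor does internally. The function and constant names are hypothetical.

```python
from collections import defaultdict

SHORT_NAME_LIMIT = 52  # length limit quoted in the text above


def find_colliding_short_names(descriptions, limit=SHORT_NAME_LIMIT):
    """Group descriptions that become identical once cut to `limit` characters."""
    groups = defaultdict(list)
    for descr in descriptions:
        groups[descr[:limit]].append(descr)
    # Keep only the short names shared by more than one interface.
    return {short: full for short, full in groups.items() if len(full) > 1}


long_names = [
    "Nortel Ethernet Routing Switch 5510-48T Module - Unit 1 Port 1",
    "Nortel Ethernet Routing Switch 5510-48T Module - Unit 1 Port 2",
    "Nortel Ethernet Routing Switch 5510-48T Module - Unit 1 Port 3",
]
short_names = ["5510-48T Unit 1 Port 1", "5510-48T Unit 1 Port 2"]

print(len(find_colliding_short_names(long_names)))   # one shared short name
print(len(find_colliding_short_names(short_names)))  # no collisions
```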

Variables Tab

As mentioned in the 'Service Checks' section, 'Variables' are covered in great detail within their relevant User Guide section. However, in essence they act like standard computer science 'Variables', in that you can configure '-p %PORT%' instead of '-p 9200' for an Elasticsearch Service Check's arguments. The benefit of this is that by using a Variable instead of hard coding the port, you can apply the Service Check to hundreds of Hosts and simply add the '%PORT%' variable to the Hosts that don't have Elasticsearch on port 9200.

In our example above, we have added the 'Database - MySQL' Host template which requires the %MYSQLCREDENTIALS% variable to be populated with relevant username/password data.

This can be configured at a global level via 'Settings > Variables > %MYSQLCREDENTIALS%', which means any Host that has Service Checks/Host templates using the %MYSQLCREDENTIALS% variable will use the values set there (the global 'defaults'). However, if a Host has a different set of credentials, you can choose to add %MYSQLCREDENTIALS% locally via the 'Variables' tab. If the Variable is added to the Host locally, the Host-level values are used first.
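The precedence between global defaults and Host-level values can be pictured with the sketch below. It only illustrates the idea and is not how Opsview Monitor expands Variables internally: the function name, the dictionaries and the port values are assumptions.

```python
import re


def expand_arguments(template, global_vars, host_vars=None):
    """Replace %NAME% placeholders, preferring Host-level values over globals."""
    merged = {**global_vars, **(host_vars or {})}

    def substitute(match):
        name = match.group(1)
        if name not in merged:
            raise KeyError("no value defined for %{}%".format(name))
        return str(merged[name])

    return re.sub(r"%([A-Z0-9_]+)%", substitute, template)


global_defaults = {"PORT": "9200"}

# A Host without a local Variable falls back to the global default.
print(expand_arguments("-p %PORT%", global_defaults))

# A Host with its own %PORT% Variable uses the local value first.
print(expand_arguments("-p %PORT%", global_defaults, {"PORT": "9201"}))
```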

For the Host in the screen above, you can choose to override the username/password with custom, Host-specific ones by checking the 'Override username' and 'Override password' fields, respectively. The 'Password' field has been set to an 'encrypted' one at the Variable level, which means once the value is overridden and the 'Submit changes' button has been pressed, the value entered cannot be retrieved, only overwritten.

NetAudit Tab

The NetAudit tab is an optional tab present only for Users who have purchased the Network Analyzer module for Opsview Monitor.

In this tab Users can configure the settings needed in order to allow Opsview Monitor to log in to the Host and back up the network device's configuration.

NetAudit is covered in greater detail within the dedicated User Guide section.
