Opsview Knowledge Center

Monitoring Plugins

Overview of Plugins in Opsview Monitor

Adding a New Plugin to Opsview Monitor

Monitoring plugins are stored within /usr/local/nagios/libexec on both Opsview Monitor master and slave servers. To add a new plugin to Opsview Monitor, copy it to this folder and set the permissions appropriately.

For example, to copy it from Mac/Linux to the Opsview Master you should run the following command (using SCP):

scp -P 22 new_plugin.pl user@opsview.monitor:/usr/local/nagios/libexec

Once the plugin is successfully copied, SSH to the server and navigate to the plugin directory:

ssh user@opsview.monitor -p 22
cd /usr/local/nagios/libexec

Once within the directory, set the ownership and file permissions respectively:

chown nagios:nagios new_plugin.pl
chmod +x new_plugin.pl

Finally, change to the 'nagios' user and attempt to run the plugin:

su - nagios
cd /usr/local/nagios/libexec
./new_plugin.pl

If the plugin execution fails, ensure that the permissions are set appropriately, as above, and read the error message for any dependencies that need to be installed, e.g. SDKs or prerequisite packages.

Copying plugins from a Windows desktop to Opsview Monitor is very similar; however, you must use WinSCP instead (or FTP/CIFS, if they are configured on the Opsview Monitor server).

What are Monitoring Plugins?

Monitoring plugins, sometimes known as 'Nagios plugins', are the most common and popular way of monitoring hosts, aside from SNMP.

Monitoring plugins can be written in any language, from bash and C to Perl and Python, meaning anyone who can script can create plugins! For a detailed look at writing Monitoring Plugins, see this great guide.

In essence, a monitoring plugin is a translator that resides between Opsview Monitor and the item we wish to monitor. The plugin speaks both languages: it knows how to speak to Opsview Monitor in Opsview Monitor's language, and it knows how to talk to the Host in the Host's language.

For example, if Opsview Monitor wants to talk to a Windows Host, it will need to know how to 'talk Windows'. This is where a plugin comes in. Opsview Monitor simply asks the question, 'Hey, go and find out how full the C:/ drive is'. The plugin goes to the Windows Host, asks the question, gets the answer, and converts it into a format that Opsview Monitor understands and can process for alerts, graphs and more.

Most, if not all, monitoring plugins require input in order to run. For example, the Windows C:/ service check above will likely require a username and password to authenticate, as well as the name of the drive that needs to be monitored. These pieces of information are known as 'arguments'. Arguments provide the plugin with the information required to run correctly. Each plugin generally comes with a help file, visible within Opsview Monitor via 'Show plugin help', which explains what options are needed, what options are available, and how to set them.

Common options, also known as 'flags', are:

  • -H: The Host flag. After this flag, enter the address of the Host to run the plugin against.
  • -w: The warning flag. If monitoring a number, e.g. a temperature, the warning flag is followed by a number that, if exceeded, will result in the Service Check changing to a WARNING status. For example, '-w 55' means that if the temperature is above 55, the state is set to WARNING (unless the temperature also exceeds the CRITICAL level, as set below, in which case the Service Check will change to a CRITICAL state).
  • -c: The critical flag. The number immediately following the -c option tells Opsview: 'if the temperature is above this number, change the state of the Service Check to CRITICAL', e.g. '-c 75'. There are often plugin-specific flags that need to be set, such as -u/-p (username/password), and more.
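As an illustration of how a plugin might consume these flags, here is a minimal Python sketch. It is not taken from any shipped plugin: the temperature check, the function names and the address 192.0.2.10 (a reserved documentation address) are all assumptions made for the example.

```python
import argparse


def parse_args(argv):
    """Parse the common -H, -w and -c flags described above."""
    parser = argparse.ArgumentParser(description="Hypothetical temperature check")
    parser.add_argument("-H", dest="host", required=True, help="Host address")
    parser.add_argument("-w", dest="warning", type=float, required=True,
                        help="WARNING threshold")
    parser.add_argument("-c", dest="critical", type=float, required=True,
                        help="CRITICAL threshold")
    return parser.parse_args(argv)


def evaluate(value, warning, critical):
    """Return the state for a reading; CRITICAL takes precedence over WARNING."""
    if value > critical:
        return "CRITICAL"
    if value > warning:
        return "WARNING"
    return "OK"


# Equivalent to running: hypothetical_check -H 192.0.2.10 -w 55 -c 75
args = parse_args(["-H", "192.0.2.10", "-w", "55", "-c", "75"])
for reading in (40.0, 60.0, 80.0):
    print(args.host, reading, evaluate(reading, args.warning, args.critical))
```

The critical threshold is tested first so that a reading above both thresholds is reported as CRITICAL rather than WARNING, matching the description of the -w flag above.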

A plugin-based Service Check comprises two parts:

<plugin> <arguments>

For example, the Service Check below returns the number of users connected to an Oracle database:

check_oracle_health --connect=$HOSTADDRESS$ --user=%ORACREDENTIALS:1% --password=%ORACREDENTIALS:2% --name=system --mode=connected-users

Here 'check_oracle_health' is the plugin, and the options from '--connect' onwards are the arguments needed by the plugin in order to successfully log in and retrieve the number of users. (Items wrapped in $ symbols are known as Macros, and items wrapped in % symbols are known as Variables; both are covered in the section 'What are variables and how do they work?'.)
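The plugin/arguments split can be shown with a short Python sketch. The helper name split_service_check is hypothetical, and the sketch only tokenises the line; it makes no attempt to expand Macros or Variables.

```python
import shlex


def split_service_check(command_line):
    """Split a service check definition into (plugin, arguments)."""
    tokens = shlex.split(command_line)
    if not tokens:
        raise ValueError("empty service check definition")
    return tokens[0], tokens[1:]


command = (
    "check_oracle_health --connect=$HOSTADDRESS$ "
    "--user=%ORACREDENTIALS:1% --password=%ORACREDENTIALS:2% "
    "--name=system --mode=connected-users"
)
plugin, arguments = split_service_check(command)
print("plugin:", plugin)
for argument in arguments:
    print("argument:", argument)
```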

Plugins Directory

Plugins live on the master server (or on slave servers, which are managed by the master). They live within the folder /usr/local/nagios/libexec, must be owned by nagios:nagios, and must have the file permissions 755 (rwxr-xr-x), as below:

-rwxr-xr-x 1 nagios nagios 11072 Aug 14 11:03 check_solr

If the permissions and ownership are not set correctly, the plugin may not execute or may return errors, depending on the plugin.
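A quick way to spot the problems described above is to inspect the file mode and owner. The following Python sketch assumes a Unix-like server; the function name plugin_problems is hypothetical, and checking only the owner's execute bit is a simplification chosen for the example.

```python
import os
import stat
import tempfile


def plugin_problems(path, expected_owner=None):
    """Return a list of human-readable problems found with a plugin file."""
    problems = []
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)
    # The listing above shows rwxr-xr-x, so the owner must be able to execute.
    if not mode & stat.S_IXUSR:
        problems.append(f"owner cannot execute (mode {mode:o})")
    if expected_owner is not None:
        import pwd  # Unix only; imported here so the rest works elsewhere
        owner = pwd.getpwuid(info.st_uid).pw_name
        if owner != expected_owner:
            problems.append(f"owned by {owner}, expected {expected_owner}")
    return problems


# Demonstration on a temporary file standing in for a plugin.
with tempfile.NamedTemporaryFile(suffix=".pl") as handle:
    os.chmod(handle.name, 0o644)
    print(plugin_problems(handle.name))
    os.chmod(handle.name, 0o755)
    print(plugin_problems(handle.name))
```

On a real server the call would look like plugin_problems("/usr/local/nagios/libexec/check_solr", expected_owner="nagios").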

Status Codes

All plugins return a status code. This status code is what Opsview Monitor uses to determine the state of the service check.

Status code '0' means that the Service Check is running successfully and without errors, thus 'OK':

$ check_icmp -H -w 100.0,20% -c 500.0,60%
Output: OK - : rta 0.287ms, lost 0%|rta=0.287ms;100.000;500.000;0; pl=0%;20;60;; rtmax=0.453ms;;;; rtmin=0.242ms;;;;
Return code: 0

Status code '1' means that the Service Check has exceeded the warning level determined or that the Service Check is not executing properly (misconfigured, etc) as below:

$ check_http -H -w 5 -c 10
Output: HTTP WARNING: HTTP/1.1 401 Unauthorized - 192 bytes in 0.017 second response time |time=0.017094s;5.000000;10.000000;0.000000 size=192B;;;0
Errors:
Return code: 1

Status code '2' means that the Service Check has exceeded the critical level determined, or that the Service Check is not executing properly or has been denied access, as below:

$ check_nrpe -H -c check_load -a '-w 5,5,5 -c 9,9,9'
Output: Connection refused by host
Errors:
Return code: 2

Status code '3' means that the Service Check is returning in an 'UNKNOWN' state. This may indicate that the Service Check is misconfigured or that there is an issue with the monitored Host:

$ check_apache_performance -H -m bytes_per_request -t 60
Output: APACHE STATUS UNKNOWN - 404 Not Found
Errors:
Return code: 3
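The four status codes above can be sketched in Python as follows. This is an illustrative outline rather than the code of any real plugin: the 'EXAMPLE' label, the function names and the sample reading of 42.0 are assumptions.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check(read_value, warning, critical):
    """Run a check and return (status_code, output_line)."""
    try:
        value = read_value()
    except Exception as error:
        # Anything that prevents a measurement is reported as UNKNOWN.
        return UNKNOWN, f"EXAMPLE UNKNOWN - {error}"
    if value > critical:
        code = CRITICAL
    elif value > warning:
        code = WARNING
    else:
        code = OK
    return code, f"EXAMPLE {STATE_NAMES[code]} - value is {value}"


def main():
    """Print the output line and return the status code."""
    code, output = check(lambda: 42.0, warning=55, critical=75)
    print(output)
    return code


# A real plugin would end with: raise SystemExit(main())
print("Return code:", main())
```

The important point is that the state is carried by the process exit code, not by the text of the output line; the text is for people, the code is for Opsview Monitor.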

Performance Data
Most plugins return what is known as performance data. This data is listed after the pipe symbol, '|', and is picked up by Opsview Monitor and stored for graphing purposes.

$ check_icmp -H -w 100.0,20% -c 500.0,60%
Output: OK - : rta 0.287ms, lost 0%|rta=0.287ms;100.000;500.000;0; pl=0%;20;60;; rtmax=0.453ms;;;; rtmin=0.242ms;;;;
Errors:
Return code: 0

In the example above, the performance data is everything after the pipe symbol. It is easier to read within the 'investigate mode' for the Service Check (covered later in this section).

If there is no performance data present, the 'Graph' tab will be hidden (as there is no data to plot on the graph).
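To make the layout of performance data concrete, the Python sketch below splits the check_icmp output shown above into its individual metrics. It is a simplified reading of the format (label=value with an optional unit, followed by semicolon-separated warning, critical, minimum and maximum fields) and does not handle quoted labels containing spaces; the function name parse_perfdata is hypothetical.

```python
import re

# label=value[unit], e.g. "rta=0.287ms" or "pl=0%"
PERF_ITEM = re.compile(r"^(?P<label>[^=]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<unit>.*)$")


def parse_perfdata(plugin_output):
    """Return {label: {...}} for the metrics found after the pipe symbol."""
    if "|" not in plugin_output:
        return {}
    perf_text = plugin_output.split("|", 1)[1]
    metrics = {}
    for item in perf_text.split():
        fields = item.split(";")
        match = PERF_ITEM.match(fields[0])
        if match is None:
            continue  # skip anything that is not label=value[unit]
        # Pad so that missing trailing fields become empty strings.
        warn, crit, minimum, maximum = (fields[1:5] + [""] * 4)[:4]
        metrics[match["label"]] = {
            "value": float(match["value"]),
            "unit": match["unit"],
            "warn": warn,
            "crit": crit,
            "min": minimum,
            "max": maximum,
        }
    return metrics


sample = ("OK - : rta 0.287ms, lost 0%|rta=0.287ms;100.000;500.000;0; "
          "pl=0%;20;60;; rtmax=0.453ms;;;; rtmin=0.242ms;;;;")
for label, metric in parse_perfdata(sample).items():
    print(label, metric)
```

Output with no pipe symbol yields an empty result, which corresponds to the case where the 'Graph' tab is hidden.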

Configuring a New Plugin-based Service Check

To configure a new plugin-based Service Check, navigate to Service Checks; this is located within the 'Settings' tab in the overlay menu, as below:

Once within the Service Checks window, click on the 'Add New' button in the top level, and then click on 'Plugin Check'.

Once 'Plugin Check' has been clicked, a window similar to the one below will load:

The window is split into two tabs:

  • Details: This is where you can configure various Service Check-related fields, such as the name, description, its Service Group, its Host templates and more
  • Plugin and Arguments: The plugin-specific tab. This is where the plugin is selected and the arguments for the plugin are entered.

Details Tab: Advanced

The items within 'Advanced' are the less used, more 'advanced' Service Check options:

  • Hashtags: The Hashtags to which this Service Check will belong, when applied to one or more Hosts.
  • Globally applied hashtags: If the Service Check has been added to a Hashtag via the 'Settings > Hashtags' section instead of the selection box above, then the hashtags will be listed here. To remove the Service Check from the Hashtag listed here, you should edit the hashtag within 'Settings > Hashtags'.
  • Dependencies: Dependencies allow you to set a parent/child relationship for the Service Check. For example, for an SNMP polling check, we may choose to have a parent Service Check of 'TCP Port 161'. This means that if the Service Check 'TCP Port 161' changes to a CRITICAL state (i.e. SNMP is down), then this Service Check and all other Service Checks that are children of that parent will change to an UNKNOWN state and will not resume normal running until the parent Service Check returns to an 'OK' state. This not only reduces the workload of the Opsview Monitor server but also reduces alerts: Opsview Monitor will only alert for the 'TCP Port 161' failure and not for all of its dependent children.
  • Maximum check attempts: This field determines the number of times a Service Check has to fail before it changes into a 'hard' state. In Opsview Monitor 5.0 there is the concept of 'soft' and 'hard' states. When a Service Check first fails and changes into the 'CRITICAL' state, it is considered to be in a 'soft' state. Once the Service Check has failed the number of times specified in this field, it is considered to be in a 'hard' state, i.e. not a temporary blip. You can use hard states so that you are only notified when a Service Check is truly CRITICAL. The interval used between these attempts is not the 'Check interval' but the 'Retry interval'.
  • Retry interval: A separate field from the 'Check interval', the 'Retry interval' is only used when a Service Check goes into a 'CRITICAL', 'WARNING' or 'UNKNOWN' state. For a Service Check to go from a 'soft' state to a 'hard' state, it must fail $X times, where $X is the value set in 'Maximum check attempts'; the 'Retry interval' sets the time between those attempts. For example, if the Retry interval is 1m and Maximum check attempts is set to three, the Service Check will run once a minute for three minutes, after which, if the Service Check is still 'CRITICAL', it will change from a 'soft' CRITICAL to a 'hard' CRITICAL.
  • Notify for service on: This section determines which states the Service Check should notify on, e.g. only on 'CRITICAL' or 'UNKNOWN'. Note: If a Host does not notify on any states, then the Service Checks on that Host will also not send any Notifications.
  • Notification period: This field uses the 'Time Periods' already defined within the Opsview Monitor system, and determines when Notifications are allowed to be sent to users.
  • Re-notification interval: This field determines the period of time (in hours, minutes or seconds) after which a Notification is re-sent if the Host is still unhandled (i.e. the problem has not been ACKNOWLEDGED). If this is set to '0', only the first notification is sent (when the Host changes to the 'HARD' state).
  • Create Multiple Services: If a Variable is selected within this drop-down, a new Service Check will be added for each Variable of the selected type, with the value of the Variable appended to the Service Check name. For example, if we have 'Disk Capacity' as a Service Check with '%DISK%' selected in the 'Create Multiple Services' drop-down, and four Variables are added via the 'Variables' tab, then four Service Checks will be added: 'Disk Capacity: Value1', 'Disk Capacity: Value2', and so forth.
  • Flap Detection: A service is considered flapping if its state changes too much. If this option is set, any services will be checked for this flapping condition and an icon will appear for the service and notifications will be temporarily disabled until the service comes out of a flapping state. We recommend that flap detection is enabled for active checks. However if you find a service is flapping frequently, there is probably another issue that needs investigating. We recommend that flap detection is disabled for passive checks.
  • Sensitive arguments: If the Service Check is a plugin-based one, then the Sensitive Arguments checkbox allows you to determine whether the arguments for the Service Check are displayed within the 'Test Service Check' tab within the investigate mode. If the box is checked, the arguments will be hidden; if unchecked, the arguments will be shown. If you have TESTCHANGE set within your Role, you will be able to modify the arguments before testing the Service Check.
  • Record Output Changes: Normally, the output of a Service Check is only recorded when the state of that service changes. For example, assuming a new check has been set up:

State      Output                  Output Recorded
OK         Service OK: 10%         Yes
OK         Service OK: 15%         No
OK         Service OK: 15%         No
OK         Service OK: 20%         No
CRITICAL   Service warning: 80%    Yes
CRITICAL   Service warning: 75%    No
WARNING    Service warning: 70%    Yes
WARNING    Service warning: 40%    No
WARNING    Service warning: 40%    No
OK         Service OK: 20%         Yes
OK         Service OK: 18%         No

This option instead causes every change of output to be logged regardless of change of state (for the selected state changes). For example, for the same sequence above with OK and WARNING selected:

State      Output                  Output Recorded
OK         Service OK: 10%         Yes
OK         Service OK: 15%         Yes
OK         Service OK: 15%         No
OK         Service OK: 20%         Yes
CRITICAL   Service warning: 80%    Yes
CRITICAL   Service warning: 75%    No (CRITICAL option was not selected)
WARNING    Service warning: 70%    Yes
WARNING    Service warning: 40%    Yes
WARNING    Service warning: 40%    No
OK         Service OK: 20%         Yes
OK         Service OK: 18%         Yes
  • Alert every failure: This option forces a Notification to be sent on every check in a non-OK state. This is useful if you have a passive Service Check which receives results.
    There are three states for this option:
    • Disabled: only get alerts on state changes
    • Enabled: get alerts for every failed state. This overrides the re-notification interval option
    • Enabled with re-notification interval: get alerts for every failed state as long as the re-notification interval has passed. This is useful if you get a lot of results in quick succession.
    Note: The notification number will increase for every non-OK result and only gets reset to zero when an OK state is received.
  • Event handler: Covered in greater detail in the 'Event Handler' section of the User Guide, Event Handlers are scripts that can be triggered when a Service Check goes into or returns from a problem state, such as 'CRITICAL' or 'WARNING'. The script can do anything you like, but a common use is restarting a service or server (a virtual machine, for example) via an API.
  • Markdown filter: If this option is chosen, then the service output will be filtered through the Markdown plugin. This allows you to mark up the output with bold, italics and URL links. For instance, if the output is:
**Disk failure** on *sd1* - see internal wiki

This will be displayed with 'Disk failure' in bold and 'sd1' in italics:

Disk failure on sd1 - see internal wiki

Use http://daringfireball.net/projects/markdown/dingus to test your plugin output. Bear in mind that you cannot use the pipe symbol as Nagios Core interprets this as the start of performance data.
Also, < and > characters are converted to the HTML entities so you cannot embed other HTML tags, and finally you should keep to only one line due to NSCA limitations in a distributed environment.
Therefore, you should stick to using just bold, italics and links in your output.

Note: If your plugin returns HTML output, this will be displayed as the text. You must use markdown format if you want to use links.
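The 'Record Output Changes' behaviour shown in the two tables earlier in this list can be summarised as a small rule: an output is recorded when the state changes, or when the state is one of the selected states and the output text differs from the previous one. The Python sketch below encodes that rule as inferred from the tables; it is not Opsview Monitor's actual implementation, and the function name is hypothetical.

```python
def recorded_flags(results, record_changes_for=()):
    """Return a list of booleans: is each (state, output) result recorded?"""
    flags = []
    previous_state = None
    previous_output = None
    for state, output in results:
        state_changed = state != previous_state
        output_changed = output != previous_output
        flags.append(state_changed or (state in record_changes_for and output_changed))
        previous_state, previous_output = state, output
    return flags


# The sequence of results used in both tables.
RESULTS = [
    ("OK", "Service OK: 10%"),
    ("OK", "Service OK: 15%"),
    ("OK", "Service OK: 15%"),
    ("OK", "Service OK: 20%"),
    ("CRITICAL", "Service warning: 80%"),
    ("CRITICAL", "Service warning: 75%"),
    ("WARNING", "Service warning: 70%"),
    ("WARNING", "Service warning: 40%"),
    ("WARNING", "Service warning: 40%"),
    ("OK", "Service OK: 20%"),
    ("OK", "Service OK: 18%"),
]

default_flags = recorded_flags(RESULTS)
selected_flags = recorded_flags(RESULTS, record_changes_for=("OK", "WARNING"))
for (state, output), default, selected in zip(RESULTS, default_flags, selected_flags):
    print(f"{state:<9} {output:<22} default={default!s:<5} OK+WARNING={selected}")
```

Running the rule over the sample sequence gives the same Yes/No pattern as the 'Output Recorded' column in each table.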

Plugin and Arguments Tab

Once you have configured the relevant options within the 'Details' tab, you can click on the Plugin and Arguments tab:

The main steps for a user configuring a plugin-based Service Check are:

  • Select a plugin from the drop-down
  • View the 'Plugin Help'
  • Enter the relevant arguments
  • Submit changes

'Invert Plugin Results' is a checkbox that, when checked, will invert certain result codes from a plugin; for example, a CRITICAL result can be inverted to OK and vice versa.
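As a rough illustration, the inversion can be thought of as a mapping over status codes. The Python sketch below swaps only OK and CRITICAL, as in the example above; leaving WARNING and UNKNOWN untouched is an assumption of this sketch, not a documented detail of the checkbox.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def invert_result(code):
    """Swap OK and CRITICAL; leave every other code unchanged (an assumption)."""
    swapped = {OK: CRITICAL, CRITICAL: OK}
    return swapped.get(code, code)


for code in (OK, WARNING, CRITICAL, UNKNOWN):
    print(code, "->", invert_result(code))
```

This kind of inversion suits checks where failure is the healthy outcome, such as confirming that a port is closed.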

Note: If the plugin is not listed in the 'Plugin:' drop-down, then ensure that the permissions are set correctly and that the plugin name starts with 'check'. Any script in libexec that does not start with 'check' will not be listed.

Note: There are restrictions on which characters can be placed within the 'value' field. The value field can only consist of alphanumeric characters, spaces, periods ('.'), forward slashes ('/') or dashes ('-'), up to 63 characters. This is because the value could be used in the Service Check name, which has restrictions on the characters used. Any trailing spaces will be removed. The argument field allows any characters, but be aware that Nagios Core may process some special characters like !, $ and \. These special characters must be 'escaped'; for example, 'Password11!' becomes 'Password11\!', and 'PAS$WORD' becomes 'PAS$$WORD'.
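The two rules in the note above can be sketched in Python as follows. Both helper names are hypothetical. The sketch treats 'alphanumeric' as ASCII letters and digits and applies the 63-character limit after trailing spaces are removed, both of which are assumptions. It escapes only '!' and '$', the two cases the note gives examples for, and leaves backslashes untouched. The escaping helper is meant for literal values such as passwords, not for arguments containing Macros like $HOSTADDRESS$.

```python
import re

# Letters, digits, space, period, forward slash and dash; at most 63 characters.
VALUE_PATTERN = re.compile(r"[A-Za-z0-9 ./-]{0,63}")


def clean_value(value):
    """Strip trailing spaces and return the value, or raise ValueError."""
    value = value.rstrip(" ")
    if not VALUE_PATTERN.fullmatch(value):
        raise ValueError(f"value not allowed: {value!r}")
    return value


def escape_argument(text):
    """Escape '!' and '$' in a literal argument, as in the examples above."""
    return text.replace("$", "$$").replace("!", "\\!")


print(clean_value("Disk /var  "))
print(escape_argument("Password11!"))
print(escape_argument("PAS$WORD"))
```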

Once the Service Check and its options have been configured, it can be applied to one or more Hosts. See Section Service Checks Tab for guides on how to add the newly-created Service Check to a Host.
