Passive Checks are designed to allow in-bound data from other software such as NSCA (Nagios Service Check Acceptor) or other Monitoring tools, such as Nagios Core ©.
A passive check is an empty Service Check: Opsview Monitor creates the Service Check, but the results are expected to be pushed into Nagios Core by an external system through the Nagios Core command pipe. See the Nagios Core documentation for more details.
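As a rough illustration of pushing a result through the command pipe, the sketch below builds a line for the Nagios Core `PROCESS_SERVICE_CHECK_RESULT` external command and writes it to the command file. The function names are hypothetical helpers, and the location of the command file depends on your installation, so it is passed in rather than assumed.

```python
import time


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    if timestamp is None:
        timestamp = int(time.time())
    # The command is newline-terminated, so keep the output on a single line.
    output = " ".join(str(output).splitlines())
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}\n"


def submit_passive_result(command_file, line):
    """Write a prepared command line to the command pipe.

    command_file is site-specific; no default path is assumed here.
    """
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

For example, `format_passive_result("web01", "Backup job", 2, "Backup failed")` produces a CRITICAL result for the 'Backup job' service on the host 'web01'; both names are placeholders and must match a Host and passive Service Check that already exist.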
As an aside, in a distributed environment, Opsview Monitor actually creates passive services for all Hosts that are monitored by a slave and then uses the results from the slave to update the master's passive services.
Users can specify a passive Service Check to get its results from an active check. It is up to the plugin run by the active Service Check to return information into this particular passive Service Check.
For example, the Interface, Errors and Discards Service Checks will be defined as passive checks, where the results will be provided by the Interface Polling Service Check which runs the check_snmp_interfaces_cascade plugin.
If the 'Cascaded from' field is set, then when a passive service is rechecked, the correct active service will be run.
To configure a new passive check, navigate to Service Checks; this is located within the 'Settings' tab in the overlay menu, as shown below:
Menu with 'Service Checks' highlighted
Once within the Service Checks window, click on the 'Add New' button at the top and then click on 'Passive Checks':
'Add New > Passive Check' within Service Checks window
Once 'Passive Checks' has been clicked a window similar to below will load:
New 'Passive Check' Service Check
The window is split into two tabs:
- Details: This is where you can configure various Service Check-related fields, such as the name, description, its Service Group, its Host templates and more.
- Passive checks: The passive checks specific section; this is where the 'Cascaded from:' option is selected.
The Details tab is split into two drawers, 'Basic' and 'Advanced'.
The items within 'Basic' are the most commonly used fields for Service Check configuration:
- Name: The name of the Service Check, e.g. 'Cisco 3750 Stack configuration status'.
- Description: A friendly description of the Service Check, e.g. 'A custom SNMP check that returns the status of the Switch in the context of its stack configuration. Apply this to all stacked Cisco 3750s.'
- Service group: Covered in Section, a Service Group is a container for one or more Service Checks and is used for alerting and access control, amongst others.
- Host templates: Covered in Section, a Host template can contain one or more Service Checks from any Service Group. While a Service Check can only ever belong to one Service Group, it can belong to as many Host templates as you desire.
The items within 'Advanced' are the less used, more 'advanced' Service Check options:
- Hashtags: The hashtags which this Service Check will belong to, when applied to one or more Hosts.
- Globally applied hashtags: If the Service Check has been added to a Hashtag via the 'Settings > Hashtags' section instead of the selection box above, then the Hashtags will be listed here. To remove the Service Check from the Hashtag listed here, you should edit the Hashtag within 'Settings > Hashtags'.
- Dependencies: Dependencies allow you to set a parent/child relationship for the Service Check. For example, for this SNMP polling check, we may choose to have a parent Service Check of 'TCP Port 161'. This means that if the Service Check 'TCP Port 161' changes to a CRITICAL state (i.e. SNMP is down), then this Service Check and all other Service Checks that are children of 'TCP Port 161' will change to an UNKNOWN state and will not resume normal running until the parent Service Check returns to an 'OK' state. This not only reduces the workload of the Opsview Monitor server but also reduces alerts; Opsview Monitor will only alert for the 'TCP Port 161' failure and not for all of its dependent children.
- Notify for service on: This section determines which states the Service Check should notify on, for example only on 'CRITICAL' or 'UNKNOWN'. Note: If a Host does not notify on any states, then the Service Checks on that Host will also not send any Notifications.
- Notification period: This field uses the 'Time Periods' already defined within the Opsview Monitor system, and determines when Notifications are allowed to be sent to Users.
- Re-notification interval: This field determines the period of time (in hours, minutes or seconds) after which a Notification is re-sent if the service is still unhandled (i.e. the problem has not been ACKNOWLEDGED). If this is set to '0', only the first notification is sent (when the service changes to the 'HARD' state).
- Create Multiple Services: If a Variable is selected within this drop-down, then for each Variable of the selected type that is added, a new Service Check will be created with the Variable's value appended to the Service Check name. For example, if we have 'Disk Capacity' as a Service Check with '%DISK%' selected in the 'Create Multiple Services' drop-down, then if four Variables are added via the 'Variables' tab, four Service Checks will be added: 'Disk Capacity: Value1', 'Disk Capacity: Value2', and so forth.
- Flap Detection: A service is considered flapping if its state changes too much. If this option is set, any services will be checked for this flapping condition and an icon will appear for the service and notifications will be temporarily disabled until the service comes out of a flapping state. We recommend that flap detection is enabled for active checks. However if you find a service is flapping frequently, there is probably another issue that needs investigating. We recommend that flap detection is disabled for passive checks.
- Sensitive arguments: If the Service Check is a plugin-based one, then the Sensitive Arguments checkbox allows you to determine whether the arguments for the Service Check are displayed within the 'Test Service Check' tab in the investigate mode. If the flag is checked, the arguments will be hidden; if unchecked, the arguments will be shown. If you have TESTCHANGE set within your Role, you will be able to modify the arguments before testing the Service Check.
- Record Output Changes: Normally, the output of a Service Check is only recorded when the state of that service changes. For example, assuming a new check has been set up:
Service OK: 10%
Service OK: 15%
Service OK: 15%
Service OK: 20%
Service warning: 80%
Service warning: 75%
Service warning: 70%
Service warning: 40%
Service warning: 40%
Service OK: 20%
Service OK: 18%
- This option instead causes every change of output to be logged regardless of change of state (for the selected state changes). For example, for the same sequence above with OK and WARNING selected:
|State|Output|Recorded?|
|---|---|---|
|OK|Service OK: 10%|Yes|
|OK|Service OK: 15%|Yes|
|OK|Service OK: 15%|No|
|OK|Service OK: 20%|Yes|
|CRITICAL|Service warning: 80%|Yes|
|CRITICAL|Service warning: 75%|No (CRITICAL option was not selected)|
|WARNING|Service warning: 70%|Yes|
|WARNING|Service warning: 40%|Yes|
|WARNING|Service warning: 40%|No|
|OK|Service OK: 20%|Yes|
|OK|Service OK: 18%|Yes|
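The recording rule illustrated by the table can be sketched as a small function. This is only a model of the behaviour described here, not Opsview Monitor's implementation: the function name is hypothetical, and it assumes that each output is compared with the immediately preceding result.

```python
def recorded_results(results, selected_states):
    """Return, for each (state, output) result, whether it would be recorded.

    Assumptions: a state change is always recorded, and a change of output
    within the same state is recorded only if that state was selected.
    """
    decisions = []
    previous = None  # a new check has no previous result
    for state, output in results:
        if previous is None or state != previous[0]:
            recorded = True  # first result or change of state
        else:
            recorded = output != previous[1] and state in selected_states
        decisions.append(recorded)
        previous = (state, output)
    return decisions
```

Feeding it the sequence above with OK and WARNING selected gives the same Yes/No column as the table, including the unrecorded CRITICAL result at 75%.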
- Alert every failure: This option forces a Notification to be sent on every check in a non-OK state. This is useful if you have a passive Service Check which receives results.
There are three states for this option:
- Disabled: only get alerts on state changes
- Enabled: get alerts for every failed state. This overrides the re-notification interval option
- Enabled with re-notification interval: get alerts for every failed state as long as the re-notification interval has passed. This is useful if you get a lot of results in quick succession
Note: The Notification number will increase for every non-OK result and only gets reset to zero when an OK state is received.
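The three states and the notification counter can be modelled roughly as follows. The names are hypothetical and the logic is a simplified reading of the descriptions above. In particular, it assumes that the 'Enabled with re-notification interval' state measures the interval from the last alert that was sent.

```python
from enum import Enum


class AlertEveryFailure(Enum):
    DISABLED = "disabled"
    ENABLED = "enabled"
    ENABLED_WITH_INTERVAL = "enabled with re-notification interval"


def should_alert(mode, state, previous_state,
                 seconds_since_last_alert=None, renotification_interval=0):
    """Decide whether a newly received result raises an alert (simplified model)."""
    state_changed = state != previous_state
    if mode is AlertEveryFailure.DISABLED or state == "OK":
        # Only state changes alert; the other modes add alerts for non-OK results.
        return state_changed
    if mode is AlertEveryFailure.ENABLED:
        return True  # every failed result alerts, ignoring the interval
    # ENABLED_WITH_INTERVAL: alert only once the interval has passed (assumption:
    # measured from the last alert; None means no alert has been sent yet).
    if seconds_since_last_alert is None:
        return True
    return seconds_since_last_alert >= renotification_interval


def next_notification_number(current, state):
    """Increase the counter on every non-OK result; reset it to zero on OK."""
    return 0 if state == "OK" else current + 1
```

With a re-notification interval of 300 seconds, a second CRITICAL result arriving 30 seconds after the last alert would be suppressed in the third state but would still alert in the 'Enabled' state.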
- Event handler: Covered in greater detail in the 'Event handler' section of the User Guide, Event Handlers are scripts that can be triggered when a Service Check goes into or recovers from a problem state, such as 'WARNING' or 'CRITICAL'. The script can do anything you like, but a common usage includes restarting a service or server (virtual machine, for example) via an API.
- Markdown filter: If this option is chosen, then the service output will be filtered through the Markdown plugin. This allows you to mark up the output with bold, italics and URL links. For instance, if the output is:
**Disk failure** on *sd1* - see internal wiki
This will be displayed as:
**Disk failure** on *sd1* - see internal wiki
Use http://daringfireball.net/projects/markdown/dingus to test your plugin output. Bear in mind that you cannot use the pipe symbol as Nagios Core interprets this as the start of performance data.
Also, < and > characters are converted to the HTML entities so you cannot embed other HTML tags, and finally you should keep to only one line due to NSCA limitations in a distributed environment.
Therefore, you should stick to using just bold, italics and links in your output.
Note: If your plugin returns HTML output, this will be displayed as the text. You must use markdown format if you want to use links.
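A plugin author could check output against the restrictions above before returning it. The helper below is a hypothetical sketch that only encodes the three constraints mentioned here (no pipe symbol, no '<' or '>', a single line). It does not validate Markdown itself.

```python
def markdown_output_problems(output):
    """List the ways a plugin output line breaks the restrictions described above."""
    problems = []
    if "|" in output:
        problems.append("contains '|', which Nagios Core treats as the start of performance data")
    if "<" in output or ">" in output:
        problems.append("contains '<' or '>', which are converted to HTML entities")
    if len(output.strip().splitlines()) > 1:
        problems.append("spans more than one line")
    return problems
```

An empty list means the output stays within the restrictions; anything else is a reason to simplify the output to plain bold, italics and links on one line.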
- Check Freshness: If you are receiving passive results, you may want to check that you are getting results within a certain timeframe. From Opsview Monitor 3.5.1, you can configure this to take an action. You can enable freshness checking which means that if this service has not been updated for this amount of time, then Nagios Core will force a stale result for the service based on the configuration. There are two actions that can be taken:
Note: Due to a limitation in Nagios Core, only one of these actions can be chosen.
- Resend Notifications
When a passive check is received, there are normally no others that follow. The status on screen will show this last state. An alert will also be raised through the usual mechanism. However, you do not get a re-notification unless the service fails again.
If this option is selected, then we implement this Service Check with a freshness check that just submits the same result back to Nagios Core. The freshness threshold is set to the notification interval so it looks like the service has received the same result again. However, a side effect of this is that if the passive check is run on a slave, the master will not get a stale result for this service. Do not enable this option if you expect regular passive results to arrive.
Note: This feature is available in Opsview Monitor 3.5.0, but the user interface options will change in Opsview Monitor 3.5.1 onwards.
This submits a result back into Nagios Core if the freshness timeout value has been reached, so you can either change the state to display an error or perhaps reset the state of a service back to OK.
This is the amount of time before Nagios Core considers a service to be not fresh. You can enter this value in a duration format, such as 10m for 10 minutes or 48h 15m for 48 hours and 15 minutes.
Note: Due to the way that Nagios Core calculates this value, the stale action will run a few minutes after this timeout value.
Choose the appropriate state that you want the service to change to when it passes the freshness threshold. You may want to automatically set a service back to OK after one hour for certain types of checks.
Choose what text you want to set as the output. The text will be added to the end of the state phrase (OK, WARNING, CRITICAL, UNKNOWN).
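To make the duration format and the freshness threshold concrete, here is a minimal sketch that converts values such as '10m' or '48h 15m' to seconds and tests whether a service has gone stale. The function names are hypothetical, and accepting an 's' suffix for seconds is an assumption, since only hours and minutes appear in the examples above. The sketch also ignores the extra few minutes that Nagios Core takes before running the stale action.

```python
import re

# Seconds per unit; the 's' suffix is an assumption.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    """Convert a duration such as '10m' or '48h 15m' to a number of seconds."""
    parts = text.split()
    if not parts:
        raise ValueError("empty duration")
    total = 0
    for part in parts:
        match = re.fullmatch(r"(\d+)([hms])", part)
        if match is None:
            raise ValueError(f"unrecognised duration component: {part!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total


def is_stale(last_result_time, now, timeout_seconds):
    """Return True once no result has arrived within the freshness timeout."""
    return now - last_result_time > timeout_seconds
```

For instance, a timeout of '48h 15m' is 173700 seconds, so a service whose last result arrived 49 hours ago would be treated as stale.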
Once you have configured the relevant options within the 'Details' tab, you can click on the 'Passive checks' tab:
This window contains just two options:
- Cascaded From: Designates an active check that is responsible for submitting passive results into your new passive Service Check. This ensures that when a recheck is performed, the correct active check runs for your passive check.
- For example, the Interface, Errors and Discards Service Checks will be defined as passive checks, where the results will be provided by the Interface Polling Service Check which runs the check_snmp_interfaces_cascade plugin.
- If this field is set, then when a service is rechecked, the correct active service will be run.
- Alert from Failure: You can define how many passive failures are received before an alert is raised. This is similar to how maximum check attempts is defined.
- We recommend you use a value of 1 if you have passive results coming from another system (such as log alerts). If you are using cascade checks, then you may want this value to be higher so you are only alerted after multiple failures.
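The effect of the 'Alert from Failure' value can be sketched as a counter of failed passive results. This is an illustrative model with a hypothetical function name. It assumes that failures must be consecutive and that an OK result resets the count, by analogy with maximum check attempts.

```python
def first_alert_index(states, alert_from_failure):
    """Return the index of the result that first raises an alert, or None."""
    consecutive_failures = 0
    for index, state in enumerate(states):
        if state == "OK":
            consecutive_failures = 0  # assumption: an OK result resets the count
            continue
        consecutive_failures += 1
        if consecutive_failures == alert_from_failure:
            return index
    return None
```

With a value of 1, the first failed result alerts immediately, which suits one-off results such as log alerts. With a value of 3, a cascaded check only alerts on the third failure in a row.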
Once the passive check and its options have been configured, it can be applied to one or more Hosts. See 'Section Service Checks Tab' for guides on how to add the newly created Service Check to a Host.