Opsview Knowledge Center

Analyzing the Data

Overview of Service Checks and Checker in Opsview Monitor

Host Groups, Hosts and Services – Checker Overview

You should now be comfortable with the creation, removal and modification of Service Groups, Host Templates and Service Checks, and the application of these items to Hosts. After the Service Checks and Host Templates have been configured and applied to Hosts, Users can begin to interpret the Service Checks and analyze the monitored data within the 'Host Groups, Hosts and Services' section of 'Monitoring':

This section is the default view for all Host Group, Host and Service Check analysis and allows a range of functions, including investigation of Hosts and services, 'actions' at a Host Group, Host or Service Check level, and more. The 'Host Groups, Hosts and Services' section is split into two 'sections': the top half is known as the 'Navigator' and the bottom half is known as the 'Checker'.

Example 'Host Group, Hosts and Services' section with one host selected


The Navigator contains the Host Group hierarchy, with the Hosts as the 'end point'. In the example screen shown below, there is a Host Group hierarchy containing four Hosts:

Example 'Navigator' with four Host Groups expanded to reveal four Hosts


To view the details of a Host, you should check the 'View' box next to the Host's contextual menu. Checking the box will load the Checker window and populate it with the Service Checks of the Host(s) selected.

For more information on the Navigator and how to analyze Hosts within the 'Host Groups, Hosts and Services' window, see Section 'Analyzing the data (Monitoring Host groups, Hosts and services)'.

The Checker contains the Service Checks from the Hosts that have been selected within the Navigator. For example, if you select the Host 'opsview', the Checker will appear and display all of the Service Checks for the 'opsview' Host. These checks could have been applied directly via the Host edit section or via a Host Template applied to the Host.

For information on:

  • Adding one or more Hosts' Service Checks to the Checker
  • Filtering and sorting the list of Service Checks
  • Removing all selected Hosts (clearing selections)
  • Actions and icons next to Service Checks

See Section 'Viewing a hosts service checks'.

The next sections will cover the Service Check contextual menu and actions.

Contextual Menu: Service Checks

There are eight actions available from the contextual menu of Host Groups, Hosts and services:

  • Investigate
  • Schedule Downtime
  • Re-Check
  • Acknowledge
  • Set Service State
  • Troubleshoot
  • Edit configuration for Host
  • Edit configuration for Service

For detailed information around these items, other than 'Investigate', 'Troubleshoot' and 'Edit configuration', see the sections below:

This section will cover the investigate mode in detail, along with a section on how to use each of the options available within the contextual menu.

Investigate

Clicking on ‘Investigate’ will load a modal window with eight tabs.

Investigate - Info Tab

The first of those tabs is the 'Info' tab as below:

The Info tab is a one-stop shop for all information relating to the Service Check. For an explanation of what each field means, see below:

  • Service State: The state of the Service Check, i.e. 'OK', 'CRITICAL', 'WARNING' or 'UNKNOWN'. Also displays how long the Service Check has been in the given state, i.e. 'OK for 2 days ..'.
  • Status information: The output of the Service Check. In the example above, 'Opsview DB Connections', a status is returned showing the performance.
  • Performance data: If the Service Check returns data in a 'performance data' ('perfdata') format, it will be displayed here.
  • Current Attempt: The current attempt number. If the Service Check is 'OK', this value will always be 1. If the Service Check is 'WARNING', 'UNKNOWN' or 'CRITICAL', this number will be between 1 and the number defined in the 'Max Attempts' field.
  • Max Attempts: The number of attempts required for the Service Check to be converted from a 'SOFT' state to a 'HARD' state.
  • State Type: Hard or Soft. If 'OK', the Service Check will always be in a 'HARD' state (see Section Host Check Commands for more information around this concept). If 'CRITICAL' or 'WARNING', for example, the Service Check will be in a 'SOFT' state until the number of Max Attempts has been met, at which point it will convert from SOFT to HARD.
  • Last Check: The date and time of the last check of this Service Check, i.e. the last time the Service Check was run against the host.
  • Check Type: Whether the check is active or passive.
  • Monitored By: The name of the Monitoring Server that is monitoring the Host and its Service Checks. If monitored by a slave cluster, the slave cluster name will be returned here instead of the individual slave server.
  • Latency: The time, in milliseconds, it took Opsview Monitor to execute the Service Check.
  • Duration: The time it took Opsview Monitor to execute the Service Check and get a result.
  • Next Scheduled Check: The date and time of the next scheduled execution of the Service Check.
  • Last State Change: The date and time of the last state change, i.e. the date and time of when the Service Check changed from 'CRITICAL' back to 'OK', for example.
  • Last Notification: The date and time of when the last Notification regarding a non-OK Service Check status was sent, i.e. the last time an email was sent to inform you of the Service Check being in a non-OK state.
  • Notification Number: If the Service Check is currently 'CRITICAL'/'WARNING'/'UNKNOWN' and sending Notifications, this number denotes the number of alerts sent. For example, if the Service Check has been 'CRITICAL' for six hours and Opsview Monitor is configured to send an alert every hour, this number would be six (i.e. six Notifications sent). If the Service Check is 'OK', then no Notifications are being sent and as such the number will be 0.
  • Is This Service Flapping?: A 'Yes' or 'No' label relating to Flap Detection, which is configured within the 'Notifications' tab of the edit window for the Host. See Section Details Tab: Advanced for more information. If the Service Check is marked as 'flapping', this field will change to 'Yes'.
  • In Scheduled Downtime?: A 'Yes' or 'No' label relating to whether the Service Check is in a state of downtime or not. If the Service Check is in an active period of downtime (i.e. the current date and time falls within a downtime period's configured start and end times), the label will read 'Yes'.
  • Last update: The date and time of when a result was received for this Service Check.
  • Active Checks: An 'Enabled' or 'Disabled' label relating to whether active checks are currently allowed for this Service Check. This is configured via the 'Actions' tab. For more information, see 'Investigate mode: Service Check - Actions tab'.
  • Passive Checks: An 'Enabled' or 'Disabled' label relating to whether passive checks are currently allowed for this Service Check. This is configured via the 'Actions' tab. For more information, see 'Investigate mode: Service Check - Actions tab'.
  • Notifications: An 'Enabled' or 'Disabled' label relating to whether Notifications are currently enabled or disabled for this Service Check. This is configured via the 'Actions' tab. For more information, see 'Investigate mode: Service Check - Actions tab'.
  • Event Handler: An 'Enabled' or 'Disabled' label relating to whether an Event Handler is currently allowed for this Service Check. This is configured via the 'Actions' tab. For more information, see 'Investigate mode: Service Check - Actions tab'.
  • Flap Detection: An 'Enabled' or 'Disabled' label relating to whether flap detection is currently enabled or disabled on this Service Check. This is configured via the 'Actions' tab. For more information, see 'Investigate mode: Service Check - Actions tab'.
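
The relationship between the 'Current Attempt', 'Max Attempts' and 'State Type' fields described above can be illustrated with a minimal Python sketch. This is not Opsview Monitor code; the names `CheckState` and `apply_result` are hypothetical, and the logic simply follows the field descriptions in this section (an 'OK' result is always 'HARD' with attempt 1, and a non-OK result stays 'SOFT' until 'Max Attempts' is reached).

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    """Hypothetical snapshot of the three Info tab fields."""
    state: str = "OK"          # 'OK', 'WARNING', 'CRITICAL' or 'UNKNOWN'
    state_type: str = "HARD"   # 'SOFT' or 'HARD'
    current_attempt: int = 1


def apply_result(prev: CheckState, new_state: str, max_attempts: int) -> CheckState:
    """Return the fields after one more check result, per the descriptions above."""
    if new_state == "OK":
        # An OK Service Check is always HARD with attempt 1.
        return CheckState("OK", "HARD", 1)
    if prev.state == "OK":
        attempt = 1  # first non-OK result after an OK state
    else:
        attempt = min(prev.current_attempt + 1, max_attempts)
    state_type = "HARD" if attempt >= max_attempts else "SOFT"
    return CheckState(new_state, state_type, attempt)


# Three consecutive CRITICAL results with Max Attempts set to 3
state = CheckState()
for _ in range(3):
    state = apply_result(state, "CRITICAL", max_attempts=3)
    print(state.state, state.state_type, state.current_attempt)
```

With 'Max Attempts' set to 3, the first two CRITICAL results in this sketch are reported as SOFT and the third as HARD.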

Investigate - Actions Tab

The second of the tabs within the 'Investigate' view is the 'Actions' tab as below:

The Actions tab allows you to change certain settings relating to the Service Check, such as whether active checks are enabled for the Service Check, or whether flap detection is enabled.

There are two boxes below the 'toggle buttons' panel, the first of which allows for the rescheduling of the next check of the Service Check to a specific date and time:

The second box allows for the submission of a passive check result for the Service Check, i.e. change the Service Check from an 'OK' to a 'CRITICAL' state with a User-defined 'output' and 'performance data' value:

Clicking the 'Reset' button will clear all values entered into the 'Reschedule' and 'Submit Passive Check' boxes. Toggle switches, however, are actioned immediately: if 'Accept Passive Checks?' is toggled from 'Enabled' to 'Disabled', the Service Check stops accepting passive check results straight away, without the need for the 'Commit' button to be pressed.

Clicking 'Commit' will submit the information from the 'Reschedule' OR 'Submit Passive Check Result' boxes, depending on which one is enabled via the radio button.

Investigate - Graphs Tab

The third of the tabs within the 'Investigate' view is the 'Graph' tab as below:

The 'Graph' tab will display the performance data gathered by the Service Check within the Opsview Monitor graphing framework. This performance data is visible on the 'Info' tab.

If there is no performance data gathered by the Service Check then no graphs will be available, and thus the Graph tab will be automatically hidden.
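
To make the link between performance data and graphs more concrete, here is a minimal Python sketch that extracts metrics from a plugin's output line. It assumes the common Nagios-style plugin convention in which performance data follows a `|` separator as space-separated `label=value[unit];warn;crit;min;max` items; the function name `parse_perfdata` and the sample output string are hypothetical and are not taken from Opsview Monitor itself.

```python
import re

# One perfdata item: label, '=', numeric value, optional unit of measure.
# Thresholds after the first ';' are ignored in this sketch.
PERF_ITEM = re.compile(
    r"(?P<label>'[^']+'|[^\s=]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[^;\s]*)"
)


def parse_perfdata(plugin_output: str) -> dict[str, tuple[float, str]]:
    """Return {label: (value, unit)}; empty when the output has no perfdata."""
    _, separator, perfdata = plugin_output.partition("|")
    if not separator:
        return {}  # nothing to graph
    metrics = {}
    for match in PERF_ITEM.finditer(perfdata):
        label = match.group("label").strip("'")
        metrics[label] = (float(match.group("value")), match.group("uom"))
    return metrics


# Hypothetical output line for illustration only
sample = "DB OK - 5 connections | connections=5;20;30 time=0.12s"
print(parse_perfdata(sample))
print(parse_perfdata("PING OK"))  # no '|' section, so no metrics
```

An empty result here corresponds to the case described above, where a Service Check gathers no performance data and so has nothing to graph.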

The date range shown in the Graph tab defaults to 1d (one day); however, it can be changed to 1w (one week), 1m (one month) or 1y (one year), or set manually by specifying a date range (from and to) in the bottom left corner.

The values of the graph are shown on the 'Value' row, with the numbers here changing when the mouse is moved throughout the graph.

Finally, you can choose to 'annotate', 'print', 'save data as ..' and 'download' the graph using the circle with the 'downwards' arrow, located in the top right hand side of the graph.

Investigate - Troubleshoot Tab

The fourth of the tabs within the 'Investigate' view is the 'Troubleshoot' tab as below:

The 'Troubleshoot' tab allows you to test the Service Check, as it would run on the command line, via the Opsview Monitor interface. Combined with the 'Macro Help' and 'Plugin Help' windows, you can modify arguments and click 'Submit' to test if the Service Check will work.

Simply modify the arguments using the text entry box at the top as per the plugin help file and click 'Submit' to test various combinations that the plugin may allow.

Investigate - Notifications Tab

The fifth of the tabs within the 'Investigate' view is the 'Notifications' tab as below:

This tab will show all Notifications sent relating to either the Host or one of the Host's Service Checks.

  • Time: The date and time the Notification was sent.
  • Status: The status of the Service Check or Host check at the point of the Notification; i.e. CRITICAL, DOWN, etc.
  • Users: The number of Users to whom the specific Notification was sent. The number is clickable, at which point a new modal window will appear displaying the username, profile name and Notification methods used to notify the User. These Notification methods are displayed as icons, which have a description in the tooltip when the mouse is hovered over the icon:
  • Notification Type: The type of notification. Most of the time, this will be 'NORMAL', but other types include: 'ACKNOWLEDGEMENT', 'FLAPPING STARTED', 'FLAPPING STOPPED', 'FLAPPING DISABLED', 'DOWNTIME STARTED', 'DOWNTIME STOPPED', 'DOWNTIME CANCELLED' and 'CUSTOM'.

The list of Notifications can be exported by clicking on the 'Export' button, at which point you are prompted to choose one of three export formats: csv, json and xml. When the format is selected, the Notifications list will be generated in the given format and downloaded to the user's desktop/device via the browser.

Investigate - Events Tab

The sixth of the tabs within the 'Investigate' view is the 'Events' tab as shown below:

Essentially a different way of analyzing the history of a Service Check, the Events tab allows Users to choose a date using the date picker on the left hand side, which then re-populates the bar graph with the events (if any) for the chosen date. In the screen above, we have 1 'OK' event and 1 'WARNING' event at 18:00.

By default, the bar graph is displayed 'full tab', with the event checker minimized. Hovering the mouse over a bar reveals the number of events in that given state, i.e. 1 'OK' event in the above example. When one or more bars are clicked, the event checker will be populated with the events from the selected bars:

In the above example we have clicked on the bar for the 1 'WARNING' event, which has loaded the event checker with that event. To clear the event checker and minimize it, we can simply re-click on the same bar, which will deselect it. When the event checker is empty it will automatically minimize.

Within the event bar, located in the top right, is a 'downwards' arrow. When moused-over, this arrow will reveal four contextual menu options:

  • Download as: Allows you to choose from one of four formats: png, jpg, svg or pdf.
  • Save data: Allows you to choose from one of three options: .csv, .xlsx or .json.
  • Annotate: When selected, allows you to draw on and annotate the bar graph. Once annotated, the bar graph can be downloaded using the 'Download as' button.
  • Print: Allows you to print the bar graph as an image.

Investigate - Notes Tab

The 'Notes' tab is the second to last tab within the Investigate mode window:

The Notes tab for a Service Check is very similar to the one for Host Groups and Hosts, in that it allows you to enter text in a WYSIWYG editor which can be seen and edited by other users of Opsview Monitor (who have permission to view the relevant Host/Service Check). This is a great way to leave notes about what the Service Check is, e.g. 'Interface throughput monitor. This is Tim's Tyres' router, they are located in London, UK and have an internal subnet of 192.168.1.0/24 with the router's IP being 1.254.'.

Investigate - History Tab

The 'History' tab is the last tab within the Investigate mode window:

The History tab will show the history of the Service Check within a tabular format. The 'State' and 'Type' columns can be filtered via the columns contextual menu as below:

To filter on the date and time, you can use the filter toolbar at the top of the table:

To apply the entered date and time parameters, you should click on the 'search' icon. To clear the entered results and reset the values in the fields you should click on the 'cross' icon.

If you attempt to filter on a range where the 'Start date' (From) is later than the 'End date' (Until), Opsview Monitor will display the following error:

The history list can be exported by clicking on the 'Export' button, at which point you are prompted to choose one of three export formats: csv, json and xml. When the format is selected, the history list will be generated in the given format and downloaded to the User's desktop/device via the browser.
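
The date range rule described for the History filter (the 'From' value must not be later than the 'Until' value) can be sketched in Python as follows. The function name `validate_history_range` and the error text are hypothetical; the actual message shown by Opsview Monitor is not reproduced here.

```python
from datetime import datetime


def validate_history_range(start: datetime, until: datetime) -> tuple[datetime, datetime]:
    """Reject a filter whose 'From' value is later than its 'Until' value."""
    if start > until:
        # Hypothetical wording, for illustration only
        raise ValueError("'From' must not be later than 'Until'")
    return start, until


# A valid range passes through unchanged
validate_history_range(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 2, 9, 0))

# A reversed range is rejected
try:
    validate_history_range(datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 1, 9, 0))
except ValueError as error:
    print("Filter rejected:", error)
```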

Schedule Downtime

The 'Schedule Downtime' option within a Service Check's contextual menu will load a modal window when clicked, which will look similar to the screen shown below:

The window allows you to select various fields in the top half:

  • Start time: When the downtime should begin (i.e. the start of the maintenance window).
  • End time: When the downtime should end (i.e. the end of the maintenance window).
  • Comment: User entered free-text describing the reason for the downtime.

Clicking the 'Schedule Downtime' button will submit the action to Opsview Monitor, which will result in either the 'pending downtime' or 'active downtime' icon appearing against the Service Check (these icons were outlined in 'Overview of icons').
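
The difference between 'pending' and 'active' downtime comes down to where the current time falls relative to the start and end times entered in the window. The following Python sketch illustrates that comparison; the function name `downtime_status` and the 'expired' label are hypothetical and are not part of Opsview Monitor.

```python
from datetime import datetime


def downtime_status(now: datetime, start: datetime, end: datetime) -> str:
    """Classify a downtime window relative to the current time."""
    if end <= start:
        raise ValueError("End time must be after start time")
    if now < start:
        return "pending"   # scheduled, but the maintenance window has not begun
    if now < end:
        return "active"    # current time falls inside the maintenance window
    return "expired"       # hypothetical label for a window that has finished


window_start = datetime(2024, 3, 1, 22, 0)
window_end = datetime(2024, 3, 1, 23, 30)
print(downtime_status(datetime(2024, 3, 1, 21, 0), window_start, window_end))   # pending
print(downtime_status(datetime(2024, 3, 1, 22, 15), window_start, window_end))  # active
```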

Re-Check

The 'Re-Check' option within a Service Check's contextual menu will load a modal window when clicked, which will look similar to the screen shown below:

The window allows you to force a re-check of a selected Service Check. On 'Submit', Opsview Monitor will forcibly recheck the Service Check in question.

Acknowledge

The 'Acknowledge' option within a Service Check's contextual menu will load a modal window when clicked, which will look similar to the screen shown below:

The window allows you to acknowledge the Service Check (and thus convert it to a 'Handled' status).

As covered in Section 'Acknowledgments', only Service Checks that are not 'OK' can be acknowledged.

You can also choose to send an acknowledgement Notification ('Ack Notification') from within the modal window, which if checked, will send a Notification to the appropriate Users (i.e. those who are set up to be notified about the given Host/Service Checks).

Finally, there is the option to set the acknowledgement as 'Sticky'. As covered in Section 'Acknowledgments':

'A sticky acknowledgement means that only when the Host or Service Check returns to an 'UP' or 'OK' state will the acknowledgement be cleared. This is great for when there is a Host flapping between a warning state (i.e. I'm too busy!) and a CRITICAL state (i.e. I'm REALLY too busy!).'
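
The effect of the 'Sticky' option can be sketched as a small decision function. The sticky behaviour follows the quoted description above; the non-sticky branch (the acknowledgement is cleared on any change of state) is an assumption based on common Nagios-style behaviour and is not stated in this section. The function name `acknowledgement_survives` is hypothetical.

```python
def acknowledgement_survives(sticky: bool, old_state: str, new_state: str) -> bool:
    """Return True if an existing acknowledgement remains after a new result."""
    if new_state == "OK":
        return False  # recovery always clears the acknowledgement
    if sticky:
        return True   # sticky: kept across WARNING <-> CRITICAL changes
    # Assumption: a non-sticky acknowledgement is cleared on any state change
    return new_state == old_state


# A check flapping between WARNING and CRITICAL
print(acknowledgement_survives(True, "WARNING", "CRITICAL"))   # True
print(acknowledgement_survives(False, "WARNING", "CRITICAL"))  # False
```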

Set Service Status

The 'Set Service Status' option within a Service Check's contextual menu will load a modal window when clicked, which will look similar to the screen shown below:

The window allows you to change the status of the Service Check, i.e. change a 'CRITICAL' Service Check to an 'OK' state, as with the screen above.

Please note, if an OK Service Check is changed to a non-OK status, i.e. CRITICAL, WARNING or UNKNOWN, the check interval will change to that of the 'Retry interval'. This means that a Service Check you have changed from OK to CRITICAL, for example, will likely return to the OK state within one minute, unless the retry interval is modified.
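
The timing effect described in the note above can be illustrated with a short Python sketch. It assumes a simple model in which an OK Service Check is re-run after its normal check interval and a non-OK Service Check after its retry interval; the function name `next_check_time` and the five-minute check interval are hypothetical values chosen for the example.

```python
from datetime import datetime, timedelta


def next_check_time(last_check: datetime, state: str,
                    check_interval: timedelta, retry_interval: timedelta) -> datetime:
    """Pick the interval by state: normal when OK, retry interval otherwise."""
    interval = check_interval if state == "OK" else retry_interval
    return last_check + interval


last = datetime(2024, 3, 1, 12, 0)
normal = timedelta(minutes=5)   # hypothetical check interval
retry = timedelta(minutes=1)    # one-minute retry interval, as in the note above

print(next_check_time(last, "OK", normal, retry))        # 2024-03-01 12:05:00
print(next_check_time(last, "CRITICAL", normal, retry))  # 2024-03-01 12:01:00
```

In this model, a status that has been manually set to CRITICAL is re-evaluated one minute later, at which point the real check result replaces the manually set one.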
