Opsview Knowledge Center

Event Handlers

Learn about Event Handlers

Overview

Event Handlers are a feature of Opsview Monitor that moves your monitoring solution from a 'detect and alert' system towards a more proactive monitoring tool. For example, if Opsview Monitor detects that the web service is not running on a monitored host, it can not only alert you but also restart the web service automatically. You still know that a problem occurred, so you can diagnose it and ensure it doesn't happen again, but your users are not impacted because the web server is back online within seconds of the outage. This is done via an Event Handler.

Event Handlers are scripts (Perl, Python, etc.) that Opsview Monitor can run automatically when it detects that a host or service check has failed (i.e. gone into a non-'OK' or non-'UP' state). For reference, the Event Handler commands are executed when a host or service check:

  • Goes into a 'soft' error state (each subsequent soft error state also invokes the handler)
  • Goes into a 'hard' error state
  • Recovers from a 'soft' or 'hard' error state.

Note: Event Handlers still run for service checks that are in downtime or have been acknowledged.
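Because handlers still run during downtime, a handler that should stay quiet in a maintenance window has to check for that itself. The following is a minimal sketch, assuming that $NAGIOS_SERVICEDOWNTIME holds the scheduled downtime depth as it does in Nagios (0 when the service is not in downtime); the function name in_downtime is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the service is in scheduled downtime.
# Assumption: NAGIOS_SERVICEDOWNTIME is a downtime depth counter (0 = none).
in_downtime() {
    case "${NAGIOS_SERVICEDOWNTIME:-0}" in
        *[!0-9]*) return 1 ;;   # non-numeric value: treat as "not in downtime"
    esac
    [ "${NAGIOS_SERVICEDOWNTIME:-0}" -gt 0 ]
}

if in_downtime; then
    echo "service is in downtime, skipping Event Handler action"
else
    echo "not in downtime, Event Handler action may proceed"
fi
```

Whether skipping is desirable depends on the handler; a restart during planned maintenance is usually unwanted, whereas a clean-up task may be harmless.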

Event Handlers sit on the monitoring server and are invoked by Opsview Monitor. To run successfully, an Event Handler must be stored within /opt/opsview/monitoringscripts/eventhandlers/ on the Collector, owned by 'opsview:opsview', with file permissions of 0500 so that the 'opsview' user can execute it.
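Putting a script in place could look like the sketch below. It uses mode 500, the value shown in the Debugging section, since the script needs the execute bit. The function name install_handler and its arguments are hypothetical, and the ownership change is optional because it normally requires root:

```shell
#!/bin/bash
# Hypothetical helper: copy a handler script into place with the expected mode.
# Usage: install_handler SOURCE [DEST_DIR] [OWNER:GROUP]
install_handler() {
    local src="$1"
    local dest_dir="${2:-/opt/opsview/monitoringscripts/eventhandlers}"
    local owner="${3:-}"
    local dest
    dest="$dest_dir/$(basename "$src")"

    # Mode 500 = read and execute for the owner only (-r-x------)
    install -m 500 "$src" "$dest" || return 1

    # Changing ownership normally requires root, so only do it when asked.
    if [ -n "$owner" ]; then
        chown "$owner" "$dest" || return 1
    fi
    echo "$dest"
}
```

For example, `install_handler ./apache_restart "" opsview:opsview`, run as root on the Collector, would copy the script into the default directory and set both ownership and mode.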

The graphic above shows the relationship between the Opsview Monitor software, the Opsview Agent and the Event Handler. The Collector runs the Event Handler when the service changes to a non-OK state. Meanwhile, the 'retry interval' applies, so the Collector is likely rechecking the server at one-minute intervals (if the default value is unmodified). Once the Event Handler has run, Opsview Monitor should therefore detect that the service is back up, and the service check should return to an 'OK' state (unless something prevents the service from restarting, such as a misconfiguration).

In the example above, we have chosen to run an Event Handler on the 'Apache service status' service check. However, Event Handlers can be run on any host or service check; e.g. you may create an Event Handler that clears /tmp or the 'Recycle Bin' when the 'Disk capacity' check changes to WARNING or CRITICAL. Alternatively, you may wish to create an Event Handler that flashes a series of lights red when a service check monitoring the number of 'Severity 1 tickets' changes from zero to one or more, in order to alert your support team quickly.
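As an illustration of the disk capacity case, the sketch below removes old files when the check reports WARNING or CRITICAL. Deleting only regular files older than a number of days is an assumption made here for safety (the text above simply says "clears /tmp"), and the function names and the 7-day default are hypothetical:

```shell
#!/bin/bash
# Hypothetical clean-up action: delete regular files older than DAYS days.
# Usage: clean_old_files DIR [DAYS]
clean_old_files() {
    local dir="$1"
    local days="${2:-7}"
    [ -d "$dir" ] || return 1
    # -xdev stays on one filesystem; -type f leaves directories and sockets alone
    find "$dir" -xdev -type f -mtime +"$days" -exec rm -f {} +
}

# Only act when the check reports WARNING or CRITICAL.
# Usage: disk_handler [DIR] [DAYS]
disk_handler() {
    case "${NAGIOS_SERVICESTATE:-}" in
        WARNING|CRITICAL) clean_old_files "${1:-/tmp}" "${2:-7}" ;;
        *) return 0 ;;
    esac
}
```

A real handler would end with a call such as `disk_handler /tmp 7`; it is left out here so that the functions can be tried against a scratch directory first.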

Creating a New Event Handler

Event Handlers, as explained in Introduction to Event Handlers, are scripts that can be written in any language understandable by the host operating system; e.g. Perl or Python. In most cases, however, Event Handlers tend to be either shell or Perl scripts.

Event Handlers should use the available macros within the environment to ensure that they only run when required. The main macros are:

  • $NAGIOS_HOSTSTATE (UP, DOWN or UNREACHABLE)
  • $NAGIOS_HOSTSTATETYPE (SOFT or HARD)
  • $NAGIOS_HOSTATTEMPT (number, starts from 1)
  • $NAGIOS_SERVICESTATE (OK, WARNING, CRITICAL or UNKNOWN)
  • $NAGIOS_SERVICESTATETYPE (SOFT or HARD)
  • $NAGIOS_SERVICEATTEMPT (number, starts from 1)
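The service macros above can be combined into a single guard. The sketch below acts on a CRITICAL state once it is HARD, or once a SOFT state has reached a given attempt number. The threshold of 3 is an assumption (the right value depends on the check's configured maximum attempts) and the function name should_act is hypothetical:

```shell
#!/bin/bash
# Hypothetical guard: succeed only when the handler should take action.
# Usage: should_act [SOFT_ATTEMPT_THRESHOLD]
should_act() {
    local state="${NAGIOS_SERVICESTATE:-}"
    local type="${NAGIOS_SERVICESTATETYPE:-}"
    local attempt="${NAGIOS_SERVICEATTEMPT:-0}"
    local threshold="${1:-3}"

    # Ignore OK, WARNING and UNKNOWN in this sketch.
    [ "$state" = "CRITICAL" ] || return 1

    case "$type" in
        HARD) return 0 ;;                           # confirmed problem
        SOFT) [ "$attempt" -ge "$threshold" ] ;;    # late soft retry
        *)    return 1 ;;
    esac
}
```

A handler would then wrap its action in `if should_act; then ...; fi`. The host macros can be tested in the same way by substituting $NAGIOS_HOSTSTATE, $NAGIOS_HOSTSTATETYPE and $NAGIOS_HOSTATTEMPT.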

Other macros available within Opsview Monitor are:

  • $NAGIOS_CONTACTALIAS
  • $NAGIOS_CONTACTEMAIL
  • $NAGIOS_CONTACTGROUPLIST
  • $NAGIOS_CONTACTNAME
  • $NAGIOS_CONTACTPAGER
  • $NAGIOS_HOSTACKAUTHOR
  • $NAGIOS_HOSTACKCOMMENT
  • $NAGIOS_HOSTADDRESS
  • $NAGIOS_HOSTALIAS
  • $NAGIOS_HOSTDOWNTIME
  • $NAGIOS_HOSTDURATION
  • $NAGIOS_HOSTGROUPALIAS
  • $NAGIOS_HOSTGROUPNAME
  • $NAGIOS_HOSTNAME
  • $NAGIOS_HOSTNOTIFICATIONNUMBER
  • $NAGIOS_HOSTOUTPUT
  • $NAGIOS_HOSTPROBLEMID
  • $NAGIOS_HOSTSTATEID
  • $NAGIOS_LASTHOSTCHECK
  • $NAGIOS_LASTHOSTDOWN
  • $NAGIOS_LASTHOSTPROBLEMID
  • $NAGIOS_LASTHOSTSTATE
  • $NAGIOS_LASTHOSTSTATECHANGE
  • $NAGIOS_LASTHOSTUNREACHABLE
  • $NAGIOS_LASTHOSTUP
  • $NAGIOS_LASTSERVICECHECK
  • $NAGIOS_LASTSERVICECRITICAL
  • $NAGIOS_LASTSERVICEOK
  • $NAGIOS_LASTSERVICEPROBLEMID
  • $NAGIOS_LASTSERVICESTATE
  • $NAGIOS_LASTSERVICESTATECHANGE
  • $NAGIOS_LASTSERVICEWARNING
  • $NAGIOS_LASTSTATECHANGE
  • $NAGIOS_LONGDATETIME
  • $NAGIOS_LONGHOSTOUTPUT
  • $NAGIOS_LONGSERVICEOUTPUT
  • $NAGIOS_NOTIFICATIONAUTHOR
  • $NAGIOS_NOTIFICATIONCOMMENT
  • $NAGIOS_NOTIFICATIONNUMBER
  • $NAGIOS_NOTIFICATIONTYPE
  • $NAGIOS_SERVICEACKAUTHOR
  • $NAGIOS_SERVICEACKCOMMENT
  • $NAGIOS_SERVICEDESC
  • $NAGIOS_SERVICEDOWNTIME
  • $NAGIOS_SERVICEDURATION
  • $NAGIOS_SERVICENOTES
  • $NAGIOS_SERVICENOTIFICATIONNUMBER
  • $NAGIOS_SERVICEOUTPUT
  • $NAGIOS_SERVICEPROBLEMID
  • $NAGIOS_SERVICESTATEID
  • $NAGIOS_SHORTDATETIME
  • $NAGIOS_TIMET

In the example script below, we are restarting the Apache service per the scenario in Section 4.5.1:

#!/bin/bash
# Uncomment below to get debug information about the environment variables set
# { date; env | sort; echo; } >> /tmp/handler.log
# If Service State is CRITICAL (options are OK, WARNING, CRITICAL and UNKNOWN)
# and Service State Type is HARD (options are HARD and SOFT)
# then execute Event Handler action
if [[ "$NAGIOS_SERVICESTATE" = "CRITICAL" && "$NAGIOS_SERVICESTATETYPE" = "HARD" ]]
then
        echo "restarting apache"
        # insert Event Handler action here...
        /opt/opsview/monitoringscripts/plugins/check_nrpe -H "$NAGIOS_HOSTADDRESS" -c eh_apache_restart >/dev/null 2>&1
        # record event to syslog
        logger "Apache 2 restarted by Opsview on $NAGIOS_HOSTADDRESS"
fi

This Event Handler is located within /opt/opsview/monitoringscripts/eventhandlers/ on the Opsview Monitor primary server. The first part of the Event Handler checks that the service state is both CRITICAL and HARD, so that it does not act on a service that has only stopped temporarily; this condition can easily be changed.

Once these criteria are met, the Event Handler echoes 'restarting apache' and then runs the command 'eh_apache_restart' on the host in question, redirecting the command's output to /dev/null (i.e. hiding it). Finally, it records the restart in syslog.
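The script above logs a restart whether or not the remote command succeeded. One possible refinement, sketched below, is to log according to the exit status of check_nrpe. This assumes that check_nrpe's exit status reflects the result of the remote 'eh_apache_restart' command. The CHECK_NRPE and LOGGER variables and the function name restart_apache are hypothetical and exist only so the function can be tried without a real agent:

```shell
#!/bin/bash
# Overridable so the function can be exercised without a real NRPE agent.
CHECK_NRPE="${CHECK_NRPE:-/opt/opsview/monitoringscripts/plugins/check_nrpe}"
LOGGER="${LOGGER:-logger}"

# Usage: restart_apache HOSTADDRESS
restart_apache() {
    local host="$1"
    if "$CHECK_NRPE" -H "$host" -c eh_apache_restart >/dev/null 2>&1; then
        # Assumption: a zero exit status means the remote restart worked.
        "$LOGGER" "Apache restart requested by Opsview on $host"
    else
        "$LOGGER" "Apache restart via NRPE FAILED on $host"
        return 1
    fi
}
```

Inside the handler's `if` block, the call would be `restart_apache "$NAGIOS_HOSTADDRESS"`.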

In the Opsview Monitor user interface, the Event Handler can be configured either globally for the service check (for example, 'if this service check changes to a CRITICAL state on any host, run this Event Handler') or individually ('if this service check changes to a CRITICAL state on just this host'). This allows for bespoke Event Handlers that are specific to individual hosts.

Applying an Event Handler to a Service Check

This applies the event handler to all Hosts that use the modified service check.

Go to 'Configuration > Service Checks' and edit the service check to which you want to apply the Event Handler. In our example it will be 'Apache active sessions'.

On the Service Check edit window, go to the 'Details' tab and click on the 'Advanced' section:

Once you have clicked 'Submit Changes', any host that has the 'Apache active sessions' service check applied will have the Event Handler enabled for its service check.

Applying an Event Handler to a Host

This applies the event handler to the one Host for which you modify the service check.

Go to 'Configuration > Hosts' and edit the host to which you want to apply the Event Handler.

On the Host edit window, click on the 'Service Checks' tab, and navigate to the service you want to add the Event Handler to using the tree panel on the left hand side.
Note: Ensure the service check is checked in the left hand panel; if the service check is not checked then the 'Exceptions' drawer will not be enabled.

Once 'within' the service check, click on the 'Exceptions' drawer and check the 'Event Handler' checkbox as shown above.

Finally, enter the name of the Event Handler and click 'Submit Changes'; this enables the 'restart_apache' Event Handler just for this service check on the host 'Opsview'.

Debugging Event Handlers

There are a few helpful tips that can assist you in debugging Event Handlers that are not working. First, ensure that Opsview-Executor log level is set to debug.

Secondly, ensure that the scripts are placed in /opt/opsview/monitoringscripts/eventhandlers/ and that they are owned by 'opsview:opsview' and have the file permissions '500' as below:

root@system:/opt/opsview/monitoringscripts/eventhandlers# ls -la
total 64
drwxr-xr-x 2 opsview opsview 4096 Jun 29 14:17 .
drwxrwxr-x 6 opsview opsview 36864 Aug 11 11:15 ..
-r-x------ 1 opsview opsview 1610 Feb  2  2012 apache_restart
-r-x------ 1 opsview opsview 217 Jan 29  2013 windows_service_restart

Thirdly, you can test the Event Handler by passing through the environment variables (macros) to the script to simulate a check execution using a command such as:

NAGIOS_SERVICESTATE=CRITICAL NAGIOS_SERVICESTATETYPE=HARD NAGIOS_SERVICEATTEMPT=3 /opt/opsview/monitoringscripts/eventhandlers/apache_restart

Running this against our apache_restart script we can see:

opsview@system:/opt/opsview/monitoringscripts/eventhandlers$ NAGIOS_SERVICESTATE=CRITICAL NAGIOS_SERVICESTATETYPE=HARD NAGIOS_SERVICEATTEMPT=3 /opt/opsview/monitoringscripts/eventhandlers/apache_restart
restarting apache
opsview@system:/opt/opsview/monitoringscripts/eventhandlers$
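To try several state combinations without retyping the variables, the command can be wrapped in a small helper. This is a sketch with a hypothetical function name. It also sets NAGIOS_HOSTADDRESS, which the example script passes to check_nrpe; the 127.0.0.1 default is only a placeholder:

```shell
#!/bin/bash
# Hypothetical test helper: run HANDLER once with simulated service macros.
# Usage: simulate_handler HANDLER STATE STATETYPE ATTEMPT [HOSTADDRESS]
simulate_handler() {
    local handler="$1"
    local state="$2"
    local type="$3"
    local attempt="$4"
    local address="${5:-127.0.0.1}"   # placeholder address

    # env sets the variables for this one invocation only.
    env NAGIOS_SERVICESTATE="$state" \
        NAGIOS_SERVICESTATETYPE="$type" \
        NAGIOS_SERVICEATTEMPT="$attempt" \
        NAGIOS_HOSTADDRESS="$address" \
        "$handler"
}
```

For example, `simulate_handler /opt/opsview/monitoringscripts/eventhandlers/apache_restart CRITICAL SOFT 1` should print nothing for the script shown earlier, because it only acts on HARD states.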

Fourthly, check the 'opsview.log' file on the appropriate Collector for the message:

[1224616751] SERVICE EVENT HANDLER: opsview:Pipe Check;CRITICAL;SOFT;1;host1_service83_eh-hander_cmdpipe.sh

This means the Event Handler was called successfully. You can look for these messages using a command similar to the one shown below:

opsview@system:/opt/opsview$ tail -n2000 /var/log/opsview/opsview.log | grep EVENT
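The matching lines can be reduced to the fields of interest with awk. The sketch below assumes every entry follows the layout of the sample message above, with state, state type, attempt and handler command as semicolon-separated fields; the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: read log lines on stdin and print, for each
# EVENT HANDLER entry, the state, state type, attempt and handler command.
summarise_handler_events() {
    awk -F';' '/EVENT HANDLER/ { print $2, $3, $4, $5 }'
}
```

It can be fed from the same tail command, e.g. `tail -n2000 /var/log/opsview/opsview.log | summarise_handler_events`.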

Finally, if you see a log message similar to the one shown below:

[1224616753] Warning: Attempting to execute the command "/opt/opsview/monitoringscripts/eventhandlers/event_handler_for_tcpip" resulted in a return code of 127.  Make sure the script or binary you are trying to execute actually exists...

Then check that the script exists at the path shown and that its ownership and permissions are correct (see above for setting permissions and ownership).
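By shell convention, a return code of 127 means the command could not be found, while 126 means it was found but could not be executed. A quick way to tell the two cases apart is sketched below; the function name is hypothetical, and it should be run as the 'opsview' user because the execute test applies to the current user:

```shell
#!/bin/bash
# Hypothetical helper: report why a handler script might fail to start.
# Usage: diagnose_handler PATH
diagnose_handler() {
    local path="$1"
    if [ ! -e "$path" ]; then
        echo "missing"          # a shell reports 127 (command not found)
    elif [ ! -x "$path" ]; then
        echo "not executable"   # a shell reports 126 (cannot execute)
    else
        echo "ok"
    fi
}
```

For example, `diagnose_handler /opt/opsview/monitoringscripts/eventhandlers/event_handler_for_tcpip` would distinguish a wrong path from a missing execute bit for the handler named in the log message.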
